mirror of
https://github.com/microsoft/SkillOpt.git
synced 2026-07-03 14:02:58 +08:00
Critical correctness fix found by debugging the thorough-analyst failure: * `claude -p` was running with the AMBIENT Claude Code project context (the repo's CLAUDE.md, installed skills, tools). The optimizer/target calls were polluted — reflect once replied with a list of the user's installed skills instead of JSON edits. Now ClaudeCliBackend._call runs ISOLATED: a clean temp cwd, --disallowedTools '*', --exclude-dynamic-system-prompt-sections. This is essential for the backend to be trustworthy and reproducible. * reflect prompt: translate failing rule-judge criteria into plain English (max_chars=1200 -> "the ENTIRE response must be at most 1200 characters") and require CONCRETE, verbatim thresholds in proposed rules (not "respect limits"). * attempt prompt: treat the Learned-preferences block as HARD CONSTRAINTS that override earlier conflicting skill text. Earlier Claude results predate this fix and are being re-validated clean; the Codex backend was never affected (it runs in its own exec context). Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>