microsoft-SkillOpt

mirror of https://github.com/microsoft/SkillOpt.git synced 2026-07-03 14:02:58 +08:00

Files

Yifan Yang defb4566ea fix(sleep): isolate claude CLI calls; concrete+override-aware reflect; honor hard constraints

Critical correctness fix found by debugging the thorough-analyst failure:

* `claude -p` was running with the AMBIENT Claude Code project context (the
  repo's CLAUDE.md, installed skills, tools). The optimizer/target calls were
  polluted — reflect once replied with a list of the user's installed skills
  instead of JSON edits. Now ClaudeCliBackend._call runs ISOLATED: a clean temp
  cwd, --disallowedTools '*', --exclude-dynamic-system-prompt-sections. This is
  essential for the backend to be trustworthy and reproducible.

* reflect prompt: translate failing rule-judge criteria into plain English
  (max_chars=1200 -> "the ENTIRE response must be at most 1200 characters") and
  require CONCRETE, verbatim thresholds in proposed rules (not "respect limits").

* attempt prompt: treat the Learned-preferences block as HARD CONSTRAINTS that
  override earlier conflicting skill text.

Earlier Claude results predate this fix and are being re-validated clean; the
Codex backend was never affected (it runs in its own exec context).

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

2026-06-08 14:31:51 +00:00