Files
microsoft-SkillOpt/skillopt
Yifan Yang a0419bfdbb feat(sleep): benchmark sweep + report tooling; override-aware reflect prompt
- sweep.py: run many (backend, model, seed, transfer-pair) configs sequentially,
  append each result to JSONL incrementally (resumable, interrupt-safe).
- report.py: render the sweep JSONL into a presented Markdown scorecard with
  direct-improvement and cross-model-transfer tables.
- reflect prompt now tells the optimizer its edits are APPENDED (can't delete the
  base skill text), so on a conflict it must write a forceful OVERRIDE rule.
  Diagnosed from a real failure: thorough-analyst (needs <=1200 chars) kept its
  edits rejected because the base "be exhaustive" line won; a verified override
  ("HARD LIMIT ... supersedes") makes Haiku obey (1194/880 chars -> hard=1.0).

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
2026-06-08 14:31:51 +00:00
..
2026-05-21 17:22:04 +00:00