Mirror of https://github.com/microsoft/SkillOpt.git (synced 2026-07-03 14:02:58 +08:00)
Strong-optimizer/weak-target runs (Sonnet optimizer -> Haiku target), fully isolated

Held-out scores: brief-writer, advisor, and thorough-analyst all go from 0.00 to 1.00. thorough-analyst converges over two nights (0.33 -> 1.00). Codex-self-optimized brief-writer also goes from 0.00 to 1.00.

Key finding for the optimizer/target-split request: the OPTIMIZER MODEL is decisive. With weak Haiku as the optimizer, results are flaky (either 0.00 or 1.00 depending on the run); with strong Sonnet as the optimizer, every seed reliably reaches 1.00.

Raw logs are under docs/sleep/raw/.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>