Yifan Yang e2de84d36f docs(sleep): real Claude<->Codex cross-validation of the new features
Three live runs exercise the new code paths on both runtimes:
  A) Claude Sonnet->Haiku, gate=OFF + rollouts_k=2: brief-writer test 0->1.00,
     action 'greedy_improved', val & test both reported (3-way split works).
  B) Codex, gate=ON + rollouts_k=2: brief-writer test 0->1.00 in 2 nights.
  C) Claude Sonnet->Haiku, thorough-analyst, 3 nights: slow-update fires and
     distils a durable cross-night meta-rule (general, not task-specific).

Confirms that the gate-off greedy path, the 3-way val/test split, multi-rollout,
and the gate-independent slow-update all work with real models on both Claude
and Codex.
Raw logs under docs/sleep/raw/crosscheck_*.txt.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
2026-06-08 14:31:51 +00:00