fix(sleep): retry reflect on non-JSON reply; honest report narrative

- reflect() now retries once with a firmer "JSON only" instruction when the
  first reply doesn't parse to a non-empty array. A single transient non-JSON
  reply would otherwise waste a whole night (the gate sees no edits and
  rejects), which made weaker optimizer models (e.g. Haiku) flaky across runs.
- FINAL_REPORT.md: document the context-leak discovery honestly; Codex cells
  stand (clean), Claude cells recomputed under strict isolation.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Yifan Yang
2026-06-08 14:31:51 +00:00
parent c80914b036
commit d75863eb6f
2 changed files with 31 additions and 12 deletions

@@ -373,9 +373,20 @@ class CliBackend(Backend):
             f"{criteria_text}\n\n"
             f"# Recurring failures\n{fail_text}"
         )
-        raw = self._call(prompt, max_tokens=1024)
-        self._tokens += len(prompt) // 4 + len(raw) // 4
-        arr = _extract_json(raw, "array")
+        # Call with one retry: transient non-JSON replies otherwise waste a whole
+        # night (the gate sees no edits and rejects). A firmer second prompt
+        # recovers most of these.
+        arr = None
+        for attempt in range(2):
+            p = prompt if attempt == 0 else (
+                prompt + "\n\nIMPORTANT: your previous reply was not valid JSON. "
+                "Reply with ONLY the JSON array, no prose, no markdown fences."
+            )
+            raw = self._call(p, max_tokens=1024)
+            self._tokens += len(p) // 4 + len(raw) // 4
+            arr = _extract_json(raw, "array")
+            if isinstance(arr, list) and arr:
+                break
         edits: List[EditRecord] = []
         if isinstance(arr, list):
             for e in arr[:edit_budget]:
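The retry loop in this hunk can be exercised on its own. What follows is a minimal, self-contained sketch under stated assumptions, not the repository's code. `extract_json_array` is a hypothetical stand-in for `_extract_json(raw, "array")`, whose real parsing rules are not shown in this diff. `call` is a plain callable standing in for `self._call(p, max_tokens=1024)`. Token accounting is omitted.

```python
import json
import re
from typing import Callable, List, Optional, Tuple

# HYPOTHETICAL stand-in for the repo's _extract_json(raw, "array"):
# drop markdown fences, take the outermost [...] span, and parse it.
# Returns None when no JSON list can be recovered.
def extract_json_array(raw: str) -> Optional[list]:
    text = re.sub(r"```(?:json)?", "", raw).strip()
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end <= start:
        return None
    try:
        value = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, list) else None

# Same firmer suffix the diff appends on the second attempt.
FIRM_SUFFIX = (
    "\n\nIMPORTANT: your previous reply was not valid JSON. "
    "Reply with ONLY the JSON array, no prose, no markdown fences."
)

def reflect_with_retry(
    call: Callable[[str], str], prompt: str, attempts: int = 2
) -> Tuple[Optional[list], List[str]]:
    """Return (parsed array or None, prompts actually sent).

    Stops at the first reply that parses to a non-empty list; otherwise
    re-asks with the firmer suffix until `attempts` calls have been made.
    """
    arr: Optional[list] = None
    sent: List[str] = []
    for attempt in range(attempts):
        p = prompt if attempt == 0 else prompt + FIRM_SUFFIX
        sent.append(p)
        arr = extract_json_array(call(p))
        if isinstance(arr, list) and arr:
            break
    return arr, sent

# Usage: a fake model that answers with prose first, then with JSON.
replies = iter(["Sure! Here are some edits...", '[{"op": "add"}]'])
arr, sent = reflect_with_retry(lambda p: next(replies), "Propose edits.")
print(arr, len(sent))
```

As in the diff, an empty array also triggers the second attempt, even though the firmer prompt says the previous reply "was not valid JSON". Because the loop stops at the first non-empty list, a well-behaved model still costs only one call.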