fix(sleep): retry reflect on non-JSON reply; honest report narrative

- reflect() now retries once with a firmer "JSON only" instruction when the first reply doesn't parse to a non-empty array. A transient non-JSON reply otherwise wastes a whole night (gate sees no edits -> reject), which made weak optimizers (Haiku) flaky across runs. - FINAL_REPORT.md: document the context-leak discovery honestly; Codex cells stand (clean), Claude cells recomputed under strict isolation. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
2026-07-03 14:02:58 +08:00 · 2026-06-08 14:31:51 +00:00
parent c80914b036
commit d75863eb6f
2 changed files with 31 additions and 12 deletions
--- a/skillopt/sleep/backend.py
+++ b/skillopt/sleep/backend.py
@@ -373,9 +373,20 @@ class CliBackend(Backend):
            f"{criteria_text}\n\n"
            f"# Recurring failures\n{fail_text}"
        )
-        raw = self._call(prompt, max_tokens=1024)
-        self._tokens += len(prompt) // 4 + len(raw) // 4
-        arr = _extract_json(raw, "array")
+        # Call with one retry: transient non-JSON replies otherwise waste a whole
+        # night (the gate sees no edits and rejects). A firmer second prompt
+        # recovers most of these.
+        arr = None
+        for attempt in range(2):
+            p = prompt if attempt == 0 else (
+                prompt + "\n\nIMPORTANT: your previous reply was not valid JSON. "
+                "Reply with ONLY the JSON array, no prose, no markdown fences."
+            )
+            raw = self._call(p, max_tokens=1024)
+            self._tokens += len(p) // 4 + len(raw) // 4
+            arr = _extract_json(raw, "array")
+            if isinstance(arr, list) and arr:
+                break
        edits: List[EditRecord] = []
        if isinstance(arr, list):
            for e in arr[:edit_budget]: