* feat(telemetry): carry observation volume on rollups so cache-value survives migration The context-cache-value, per-user-savings, and observation-type-by-model metrics were derivable only from the legacy per-occurrence events (context_injected, session_compressed), which decay to zero as the fleet upgrades to 13.7.0 and switches to the rollups. The rollups already received the underlying records — they just didn't aggregate the observation fields. - observer_turn_rollup: add observations_created (Σ per-turn observation count, distinct from the rollup's turn `count`) + summed obs_type_* buckets, so cost-per-observation (total_cost_usd / observations_created) and observation-type-by-top_model are derivable from the rollup alone. - context_injected_rollup: add total_observations_injected (cache-reuse count) + total_tokens_saved_vs_naive (windowed savings sum). - scrub.ts: whitelist the three new emitted keys (obs_type_* already allowed; deny-by-default whitelist would drop them otherwise). - docs: correct the rollup field tables — the prior context_injected_rollup row documented fields the code never actually emitted. - tests: assert both new aggregations (167 telemetry tests pass). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LEZpnYz9z4TjKcG19qHFrJ * build(telemetry): regenerate plugin bundles for rollup observation fields worker-service.cjs and transcript-watcher.cjs rebuilt via `npm run build` to bundle the new observation aggregation. Incidental, telemetry-unrelated churn in the other service/UI bundles was left out to keep the diff meaningful. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LEZpnYz9z4TjKcG19qHFrJ --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
6.2 KiB
TODO (next): make the observer IGNORE output it can't parse — kill the "poisoned" respawn heuristic
Status: Not started. Spec for a follow-up cleanup pass. Owner principle (from @thedotmack): The observer parser should IGNORE any output it can't parse. That is the whole job. It must NOT inspect the content of unparseable output and act on it.
The problem
When the observer SDK returns something that isn't parseable <observation>/<summary> XML, the code is supposed to drop it. Instead, a substring classifier labels some of it poisoned and kills + respawns the SDK session. "Poisoned" is decided by a case-insensitive includes() against 11 hardcoded English phrases (session exhausted, context window, prompt is too long, this session has ended, …) — see src/sdk/output-classifier.ts:20-32.
This is a brittle keyword match standing in for "ignore what you can't parse," and it over-fires badly.
Evidence it's broken (live PostHog, 30d, project CMEM)
poisoned aborts as a share of all compression turns, by model:
| Model | poisoned % of turns | context |
|---|---|---|
| haiku (4.5) | 41.1% | 200K |
| claude-haiku-4-5-20251001 | 33.9% (9.0M turns) | 200K |
| claude-sonnet-4-5 | 17.7% | 200K–1M |
| claude-sonnet-4-6 | 9.6% | 200K–1M |
| claude-opus-4-7 | 5.9% | 200K |
| gemini-2.5-flash-lite | 0.1% | 1M |
| openai/gpt-oss-120b:free | 0.1% | — |
| gemini-3-flash-preview / xiaomi mimo-flash:free | ~0% | small |
Total ≈ 13.2M poisoned aborts / 30d ≈ 22% of all turns.
This is the inverse of real context exhaustion. If "poisoned" meant running out of room, the small-context models would top the list. Instead they're ~0% and the large-context Claude models are 6–41%. The classifier is matching Claude's closure-phrase wording (e.g. "I cannot continue this session"), not session health. Net effect: claude-mem kills+respawns ~1/3 of Haiku sessions on a keyword false positive, throwing away work and re-spending tokens.
(Note: the word "poisoned" is also just wrong — it implies tainted/corrupted input, not a wedged session. The whole concept should go.)
The fix — "parse XML or drop it," nothing content-based
Where it lives
src/services/worker/agents/ResponseProcessor.ts:56-120— the!parsed.validbranch. The correct behavior already exists at:115-120("Plain-text skip responses are intentionally ignored" →confirmClaimedMessagesdrops the batch, returns).:75-77— the offending trigger:const mustRespawn = outputClass === 'poisoned' || // ← remove session.consecutiveInvalidOutputs >= INVALID_OUTPUT_RESPAWN_THRESHOLD; // ← see decision:79-113— the respawn + telemetry block.:26—INVALID_OUTPUT_RESPAWN_THRESHOLD = 3.
src/sdk/output-classifier.ts—POISONED_MARKERS(:20-32),poisonedin theObserverOutputClassunion (:13), the precedence check (:67-73).src/services/worker/SessionManager.ts—respawnPoisonedSession(:252),session.abortReason = 'poisoned'(:273).src/services/worker/http/routes/SessionRoutes.ts:37,44—abort_reasonenum mapping incl.poisoned.src/services/telemetry/scrub.ts:90-99— enum doc comments + whitelisted keysinvalid_output_class,abort_reason,respawn_triggered,consecutive_invalid_outputs.src/npx-cli/commands/telemetry.ts:71,74— field docs listingpoisoned.- Tests:
output-classifiertests + anyResponseProcessorrespawn tests.
Recommended change: PURE IGNORE (remove ALL content-driven respawn)
output-classifier.ts— deletePOISONED_MARKERSand thepoisonedclass. Classifier collapses toxml(parseable) vs not. KeeppreviewOutput()for log visibility only — it must never drive behavior.ResponseProcessor.ts— remove the wholemustRespawn/respawn block (:71-113). Unparseable output always falls through to the existing drop path (:115-120). Keep the warn-log +previewOutputso drops stay visible.SessionManager.respawnPoisonedSession— remove if it becomes dead code (confirm no other callers;respawnPoisonedSessionis referenced from the Phase-2 buffer-flush guards — do NOT flush a rollup for it; just delete the call path cleanly).scrub.ts/SessionRoutes.ts/ CLI docs — drop thepoisonedenum value and, if respawn goes entirely, the now-unusedrespawn_triggered/invalid_output_class/abort_reasonkeys (or keep them whitelisted but unused — safer to keep keys, remove only thepoisonedvalue from docs).- Tests — delete poison-classification + immediate-respawn tests; add a test asserting unparseable output is dropped (batch confirmed, no respawn) regardless of content.
Why pure-ignore is loop-safe: the drop path calls confirmClaimedMessages — the batch is NOT re-queued, so a genuinely-stuck session can't loop "until quota exhausted." The 3-strikes respawn was guarding a loop that the drop path already prevents.
If a safety net is still wanted (fallback option)
Keep only the content-agnostic consecutiveInvalidOutputs >= 3 structural respawn; remove just the outputClass === 'poisoned' || clause and the POISONED_MARKERS list. This kills the 33–41% false positives but retains a structural circuit-breaker. (Owner leans toward pure-ignore above.)
Verification
bun test(output-classifier + ResponseProcessor + telemetry suites) green;tsc --noEmitclean.- Grep guard:
grep -rin "poison" srcreturns nothing behavioral (only history/comments if intentionally kept). - Manual: feed the observer non-XML output containing the string "context window" → asserted ignored/dropped, session NOT respawned.
- Post-ship PostHog:
invalid_output_class='poisoned'/abort_reason='poisoned'volume goes to zero on updated installs; Haiku turn throughput should rise (fewer needless respawns).
Dashboard follow-up (separate, non-blocking)
The "Session abort reasons" and "Compression invalid-output classes" tiles on dashboard 1739781 will lose the poisoned slice as installs update — expected. No tile change needed; the decay itself is the confirmation signal.