* feat(telemetry): backfill historical token-savings economics

The 8-month historical_activity backfill rolled up observation counts but never the token economics, so tokens_saved_vs_naive only existed for the ~6 days since context_injected went live (Jun 9). This adds the historical counterpart using the SAME formula live uses:

    read_tokens = sum(ceil(len(observations.text)/CHARS_PER_TOKEN_ESTIMATE))
    tokens_saved_vs_naive = max(0, discovery_tokens - read_tokens) per UTC day

Both inputs are already persisted in SQLite (session_summaries.discovery_tokens and observations.text), so the savings series now extends across the full backfill window instead of starting at Jun 9. Conservative by construction (once-per-observation read cost, not a replay of real injections) and flagged backfilled:true.

Generation-side cost (cost_usd/tokens_input/tokens_output) is NOT recoverable here (it was never written to SQLite), so it is intentionally excluded; see plans/generation-cost-persistence.md for the forward-only fix.

- backfill.ts: read_tokens rollup + per-day savings derivation, CHARS_PER_TOKEN_ESTIMATE import
- scrub.ts: whitelist read_tokens
- tests: fixture gains observations.text; +2 cases (ceil math + floor-at-0)
- plans/: spec for the generation-cost persistence follow-up

https://claude.ai/code/session_01YWJPckEtd2sLAtng39rasu

* fix(telemetry): version the backfill marker so enriched rollup reaches existing installs

The historical backfill was gated solely on the marker file existing, so every install that already backfilled under the prior version (#2912) would hit the one-shot return and never receive the new read_tokens / tokens_saved_vs_naive economics; the enriched series would only ever reach fresh installs.

Add BACKFILL_VERSION to the marker. isBackfillComplete() now treats a marker written by an older version (or a legacy marker with no version field, i.e. version 1) as incomplete, so already-backfilled installs re-run and pick up the enriched keys.

The re-run is idempotent and does not double count: every event keeps its deterministic per-(installId, event, day) uuid, so PostHog's historical-migration dedup replaces each event in place rather than appending a second row.

- backfill.ts: BACKFILL_VERSION constant, version on BackfillMarker, version-aware gate, stamp version at both marker-write sites
- tests: current-version marker skips; legacy (versionless) and older-version markers re-run and rewrite at the current version; assert version stamped

https://claude.ai/code/session_01YWJPckEtd2sLAtng39rasu

---------

Co-authored-by: Claude <noreply@anthropic.com>
Spec: persist generation cost (the second telemetry gap)
Problem
The growth/value story has two data layers with very different histories:
| Layer | Event | Span | Recoverable historically? |
|---|---|---|---|
| Observation growth | historical_activity | Oct 2025 → now | n/a (already backfilled) |
| Injection value (read-cost avoided) | context_injected | live, Jun 9+ | Yes: done in this PR via backfill.ts (read_tokens from observations.text, savings = discovery − read) |
| Generation cost (what it cost to produce observations) | session_compressed | live, Jun 7+ | No (see below) |
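The read-cost savings derivation named in the table (read_tokens from observations.text, savings = discovery − read) can be illustrated with a minimal TypeScript sketch. It works on in-memory rows rather than SQLite. The row shapes, the `rollupSavings` name, the value 4 for `CHARS_PER_TOKEN_ESTIMATE`, and the assumption that epochs are in milliseconds are all illustrative and not taken from backfill.ts.

```typescript
// Assumed value for illustration only; the real constant is imported in backfill.ts.
const CHARS_PER_TOKEN_ESTIMATE = 4;

// Hypothetical row shapes; epochs are assumed to be milliseconds.
interface ObservationRow { text: string; createdAtEpoch: number }
interface SummaryRow { discoveryTokens: number; createdAtEpoch: number }

interface DailySavings {
  read_tokens: number;
  discovery_tokens: number;
  tokens_saved_vs_naive: number;
}

// Bucket key: the UTC calendar day, e.g. "2025-10-01".
const utcDay = (epochMs: number): string =>
  new Date(epochMs).toISOString().slice(0, 10);

function rollupSavings(
  observations: ObservationRow[],
  summaries: SummaryRow[],
): Map<string, DailySavings> {
  const days = new Map<string, DailySavings>();
  const bucket = (day: string): DailySavings => {
    let entry = days.get(day);
    if (entry === undefined) {
      entry = { read_tokens: 0, discovery_tokens: 0, tokens_saved_vs_naive: 0 };
      days.set(day, entry);
    }
    return entry;
  };

  // Read cost is charged once per observation: ceil(len(text) / chars-per-token).
  // Note: .length counts UTF-16 code units, which may differ from SQLite's length().
  for (const o of observations) {
    bucket(utcDay(o.createdAtEpoch)).read_tokens +=
      Math.ceil(o.text.length / CHARS_PER_TOKEN_ESTIMATE);
  }
  for (const s of summaries) {
    bucket(utcDay(s.createdAtEpoch)).discovery_tokens += s.discoveryTokens;
  }
  // Savings are floored at zero per day.
  days.forEach((d) => {
    d.tokens_saved_vs_naive = Math.max(0, d.discovery_tokens - d.read_tokens);
  });
  return days;
}
```

Under these assumptions, a day with one 5-character observation and 10 discovery tokens yields read_tokens = 2 and tokens_saved_vs_naive = 8; a day where read cost exceeds discovery cost reports 0 rather than a negative value.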
session_compressed carries tokens_input, tokens_output, cost_usd, computed in
ResponseProcessor from the SDK result message (Claude path) or
usage.cost (OpenRouter). These values are emitted to PostHog and then discarded —
they are never written to SQLite. observations has no token/cost column;
sdk_sessions has none; session_summaries only has discovery_tokens (read cost,
not generation cost).
Consequence: there is no way to backfill generation cost for the ~8 months before
session_compressed started firing. The data simply does not exist on disk. The
cost/economics panel can therefore only ever be a live, forward-looking metric
unless we start persisting it now.
Goal
Persist per-compression generation cost so that:
- future backfills/audits can roll it up per day (same shape as the other rollups), and
- a "what it cost to produce vs what it saved" panel can be built from local data.
This does not recover the past. It stops the bleed going forward.
Design
A compression turn is not 1:1 with an observation row (one turn → N observations + 1
summary), so generation cost belongs on a per-turn record, not smeared across
observation rows (that was the discovery_tokens multi-count trap — see backfill.ts
comments). Two viable homes:
- Option A (preferred): new compression_events table. One row per session_compressed, keyed by memory_session_id + turn. Columns: tokens_input INTEGER, tokens_output INTEGER, cost_usd REAL, model TEXT, provider TEXT, outcome TEXT, created_at_epoch INTEGER. Clean, append-only, trivially rolled up. Mirrors the event we already emit.
- Option B: columns on session_summaries. Cheaper migration, but summaries are per-session-summary not per-turn, and outcome='invalid_output' turns have no summary row, so those costs would be lost. Rejected for that reason.
Go with Option A.
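One possible shape for Option A, as a sketch only: the column names and types for the cost fields follow the list above, while the `id` column, the type of `turn`, the NOT NULL and UNIQUE constraints, the index name, and the TypeScript identifiers are assumptions made for illustration.

```typescript
// Hypothetical row type mirroring the proposed compression_events columns.
// Token/cost fields are nullable so a turn without usage data can still be recorded.
interface CompressionEventRow {
  memory_session_id: string;
  turn: number;
  tokens_input: number | null;
  tokens_output: number | null;
  cost_usd: number | null;
  model: string | null;
  provider: string | null;
  outcome: string | null;
  created_at_epoch: number;
}

// Assumed DDL; constraint choices and the index name are not from the spec.
const CREATE_COMPRESSION_EVENTS_SQL = `
CREATE TABLE IF NOT EXISTS compression_events (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  memory_session_id TEXT NOT NULL,
  turn INTEGER NOT NULL,
  tokens_input INTEGER,
  tokens_output INTEGER,
  cost_usd REAL,
  model TEXT,
  provider TEXT,
  outcome TEXT,
  created_at_epoch INTEGER NOT NULL,
  UNIQUE (memory_session_id, turn)
);
CREATE INDEX IF NOT EXISTS idx_compression_events_created_at
  ON compression_events (created_at_epoch DESC);
`;
```

Keeping the cost columns nullable is what lets a turn with no usage data occupy a row without inventing zeros.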
Steps
- Schema migration (SessionStore.ts, next version after 32): ensureCompressionEventsTable(), which runs CREATE TABLE IF NOT EXISTS compression_events (...) with an index on created_at_epoch DESC. Best-effort, same pattern as the other ensure* migrations.
- Write path (ResponseProcessor.ts): at every existing captureEvent('session_compressed', …) site, also INSERT INTO compression_events the same tokens_input/tokens_output/cost_usd/model/provider/outcome. The Claude path stashes the event on session.pendingCompressionEvent and fires it from ClaudeProvider once the result message lands; write to SQLite at that same point so the token/cost fields are populated, not the early-stream placeholders. Guard on a real cost (the abort/kill path ships without token fields; write the row with NULLs rather than dropping it, to keep turn counts honest).
- Backfill rollup (backfill.ts → collectDailyRollups): add a block summing tokens_input, tokens_output, cost_usd from compression_events per day into counters gen_tokens_input, gen_tokens_output, gen_cost_usd. Wrapped in the same try/catch (older installs without the table skip the block). This only produces data for days on/after this ships; that is expected and correct.
- Whitelist (scrub.ts): add gen_tokens_input, gen_tokens_output, gen_cost_usd to the backfill section. cost_usd/tokens_input/tokens_output are already whitelisted for the live event; the gen_* names disambiguate the per-day rollup from the per-event live values to avoid semantic collisions in PostHog (same reasoning as keeping read_tokens distinct).
- Docs (docs/public/telemetry.mdx): document the new table and the three rollup keys. Counts/sums only, no content, consistent with the existing privacy contract.
- Tests (tests/telemetry/backfill.test.ts): fixture row in compression_events, assert per-day gen_cost_usd/gen_tokens_* sums; assert the block no-ops when the table is absent.
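The write-path and rollup steps above can be sketched in TypeScript without a database, using in-memory rows. Everything named here (`CompressionEvent`, `toRow`, `rollupGenerationCost`, millisecond epochs) is hypothetical; only the column names and the gen_* counter names come from the steps.

```typescript
// Hypothetical live payload; the abort/kill path is assumed to omit token/cost fields.
interface CompressionEvent {
  tokens_input?: number;
  tokens_output?: number;
  cost_usd?: number;
  model: string;
  provider: string;
  outcome: string;
}

interface CompressionEventRow {
  memory_session_id: string;
  turn: number;
  tokens_input: number | null;
  tokens_output: number | null;
  cost_usd: number | null;
  model: string;
  provider: string;
  outcome: string;
  created_at_epoch: number; // assumed to be epoch milliseconds
}

// Write path: keep the turn even when usage is missing, storing NULLs instead of dropping it.
function toRow(
  event: CompressionEvent,
  memorySessionId: string,
  turn: number,
  nowEpochMs: number,
): CompressionEventRow {
  return {
    memory_session_id: memorySessionId,
    turn,
    tokens_input: event.tokens_input ?? null,
    tokens_output: event.tokens_output ?? null,
    cost_usd: event.cost_usd ?? null,
    model: event.model,
    provider: event.provider,
    outcome: event.outcome,
    created_at_epoch: nowEpochMs,
  };
}

interface GenRollup {
  gen_tokens_input: number;
  gen_tokens_output: number;
  gen_cost_usd: number;
}

// Backfill rollup: per-UTC-day sums; NULL fields contribute nothing to the totals.
function rollupGenerationCost(rows: CompressionEventRow[]): Map<string, GenRollup> {
  const days = new Map<string, GenRollup>();
  for (const row of rows) {
    const day = new Date(row.created_at_epoch).toISOString().slice(0, 10);
    const acc = days.get(day) ??
      { gen_tokens_input: 0, gen_tokens_output: 0, gen_cost_usd: 0 };
    acc.gen_tokens_input += row.tokens_input ?? 0;
    acc.gen_tokens_output += row.tokens_output ?? 0;
    acc.gen_cost_usd += row.cost_usd ?? 0;
    days.set(day, acc);
  }
  return days;
}
```

In a real implementation the sums would more likely be computed in SQL with a GROUP BY on the day; the in-memory version is only meant to make the NULL handling and the per-day bucketing explicit.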
Privacy
compression_events stores integers, a float cost, and two closed-enum strings
(model, provider) already shipped on the live event. No project names, no text,
no prompts. Rollups remain counts/sums per UTC day. No new PII surface.
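To illustrate the "counts/sums only" contract, here is a minimal allow-list filter in TypeScript. It is not the scrub.ts implementation: the `scrubRollup` function, the `BACKFILL_ALLOWED_KEYS` set, and the rule of dropping non-numeric values are assumptions; only the key names appear in this spec and the sibling PR.

```typescript
// Hypothetical allow-list for per-day backfill rollup properties.
const BACKFILL_ALLOWED_KEYS: ReadonlySet<string> = new Set([
  "read_tokens",
  "tokens_saved_vs_naive",
  "gen_tokens_input",
  "gen_tokens_output",
  "gen_cost_usd",
]);

// Keep only allow-listed keys whose values are finite numbers;
// anything else (strings, objects, unknown keys) is dropped.
function scrubRollup(props: Record<string, unknown>): Record<string, number> {
  const out: Record<string, number> = {};
  for (const key of Object.keys(props)) {
    const value = props[key];
    if (
      BACKFILL_ALLOWED_KEYS.has(key) &&
      typeof value === "number" &&
      Number.isFinite(value)
    ) {
      out[key] = value;
    }
  }
  return out;
}
```

With this sketch, a payload such as `{ gen_cost_usd: 0.5, project: "secret" }` would be reduced to `{ gen_cost_usd: 0.5 }`, so an accidental text field cannot ride along with the rollup.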
Out of scope
Recovering pre-instrumentation cost. It was never written down; there is nothing to recover. The honest framing for any dashboard is: generation cost is measured from the date this ships forward; the observation-growth arc and (as of the sibling PR) the read-cost-savings series extend back to Oct 2025.