* feat(telemetry): add PostHog historical backfill module (Phase 1) One-time anonymized daily-rollup backfill: collectDailyRollups with pinned aggregation semantics, sessions-first install-day inference, deterministic v5 event uuids, 9-step gated transport (marker/consent/debug gates, historicalMigration client, marker only on clean shutdown). Whitelist additions in scrub.ts/common.ts, extended PostHog test mock, full (a)-(k) test coverage. Per plans/2026-06-11-posthog-historical-backfill.md. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * feat(telemetry): wire historical backfill into worker startup + docs disclosure Fire-and-forget runHistoricalBackfill call in initializeBackground after the worker_started capture (non-blocking, marker-gated one-shot, retries on next start if delivery fails). New "Historical backfill" section in docs/public/telemetry.mdx documenting what is sent, the once-per-install marker, identical consent gates, and upload-time geo caveat. Phases 3-4 of plans/2026-06-11-posthog-historical-backfill.md. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * docs(telemetry): document dual-layer error handling on backfill call site runHistoricalBackfill never rejects by contract; the .catch is the unhandled-rejection backstop for the fire-and-forget call. Addresses Greptile review on #2912. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
33 KiB
PostHog Historical Backfill — Anonymized Observation Metadata Import
Goal: One-time, per-install backfill of anonymized daily activity rollups from the local SQLite DB into PostHog, using PostHog's historical-migration ingestion mode, so growth (installs over time, reconstructed WAU/MAU, cohort retention) is visible for activity that predates telemetry shipping.
What gets sent (anonymous counts/sums only — never titles, text, prompts, project names, or any raw string column):
- One
historical_activityevent per active day per install, profile-less ($process_person_profile: false), carrying ONLY rollup counters +backfilled: true(nobuildBaseProperties()— stamping the current version/os onto 2025-dated events would permanently poison version-over-time charts):{ observation_count, session_count, summary_count, prompt_count, project_count, discovery_tokens, obs_type_bugfix, obs_type_discovery, obs_type_decision, obs_type_refactor, obs_type_other, session_completed_count, session_failed_count, sessions_claude_count, sessions_codex_count, sessions_gemini_count, sessions_other_platform_count, subagent_obs_count, backfilled: true }. - One
install_inferredevent at noon UTC of the install day,person: true, withfirst_active_datein properties and$set— this single event draws the adoption curve.
Known caveats (accept, do not solve):
- Survivorship bias — only currently-installed, telemetry-consenting users backfill. The curve shows the history of the retained base.
- The last ~2.5 days before each install's first post-upgrade worker start are never backfilled (PostHog's 48h rule + whole-day windowing). Do NOT "fix" this by re-running without the marker — duplicates are worse than the gap.
- Geo properties on backfilled events reflect location at upload time, not the historical
date (
disableGeoip: falsekept for consistency with live telemetry). - Dashboards slicing by
versionmust filterbackfilled != true; combined install metrics must dedupeinstall_inferred∪install_completed(is_update=false)per distinct_id (telemetry-era installs emit both). - Event-style live metrics (searches, injections, compression spend) are intentionally absent from backfill — they are not stored locally and cannot be reconstructed.
Phase 0: Documentation Discovery — COMPLETE (findings consolidated below)
All implementation phases MUST use only the APIs listed here. Citations were verified against
the working tree, the published posthog-node v5.36.15 source, and the real
~/.claude-mem/claude-mem.db (read-only) on 2026-06-11, then adversarially re-verified on
2026-06-12 (see Review log at the bottom).
Precondition: this worktree has no node_modules — run bun install (or npm install)
before Phase 1 so imports and bun test resolve.
Allowed APIs
posthog-node ^5.36.15 (root package.json:143, imported in
src/services/telemetry/telemetry.ts:1; caret range — if the installed major ever drifts,
re-verify historicalMigration forwarding in dist/, search for historical_migration):
new PostHog(apiKey, options)—PostHogOptions = Omit<PostHogCoreOptions, 'before_send'> & {...}.PostHogCoreOptionsincludeshistoricalMigration?: boolean("Special flag to indicate ingested data is for a historical migration", default false). It is forwarded ashistorical_migration: trueon every/batchrequest body (verified in@posthog/core@1.32.1shipped source). Also valid:host,flushAt,maxBatchSize,maxQueueSize,disableGeoip.client.capture(msg: EventMessage)—EventMessageincludesdistinctId,event,properties,timestamp?: Date(aDateobject, NOT an ISO string),uuid?: string. Both are forwarded end-to-end to the payload.- capture() enqueues asynchronously (a multi-microtask promise chain runs before the event
reaches the queue). A bare
await client.flush()does NOT join those pending captures and can resolve while events are still un-enqueued. Onlyclient.shutdown()joins pending capture promises and then loops flush until the queue is empty — shutdown() is the only delivery barrier. shutdown()SWALLOWS fetch errors internally (logFlushError); its resolution proves nothing about delivery. Delivery failure is observable only via the public error emitter:client.on('error', handler)(@posthog/coreposthog-core-stateless.ts:301).- The in-memory queue silently drops the OLDEST event past
maxQueueSize(default 1,000). Multi-year installs can exceed 1,000 active days —maxQueueSizemust be raised. - Background flushes fire automatically when the queue reaches
flushAt; their errors are swallowed AND a non-network (4xx) failure removes the batch from the queue before throwing. Therefore: setflushAt/maxBatchSize/maxQueueSizeall to 5000 so NO background flush ever fires and the entire backfill goes as one request at shutdown (the SDK auto-halves the batch on HTTP 413). One request also makes a cross-restart retry byte-identical — the best possible dedupe-key match.
PostHog historical-migration rules (https://posthog.com/docs/migrate):
- Set
historicalMigration: trueso events bypass standard ingestion/billing handling. - Event timestamps must be at least 48 hours in the past. Server behavior for violating
events is undocumented (rejection, billing, silent acceptance all possible) — the
client-side day window is the only guard, and it applies to every event including
install_inferred. - "There is no way to selectively delete event data in PostHog" — idempotency is mandatory
before anything ships (deterministic
uuid+ completion marker). - uuid dedupe (https://posthog.com/docs/data/events) is eventual and best-effort
(ClickHouse merge-time, not query-time) and keyed on
(toDate(timestamp), event, distinct_id, uuid)— retried events must carry byte-identical timestamp+uuid+event+distinctId. The marker (not the uuid) is the PRIMARY idempotency gate; the uuid minimizes damage in the crash-retry window. Residual accepted risk: a crash between server-side ingest and marker write can leave transient duplicates until ClickHouse merges.
Existing telemetry modules to copy from (do not reinvent):
- Consent:
resolveTelemetryConsent(process.env, loadTelemetryConfig())—src/services/telemetry/consent.ts:68-73. Install UUID:getOrCreateInstallId()—consent.ts:114-124(random v4 UUID persisted totelemetry.jsonviagetTelemetryConfigPath()). - Scrub:
scrubProperties(props)whitelist scrubber —src/services/telemetry/scrub.ts:126-149; whitelist setALLOWED_PROPERTY_KEYSatscrub.ts:8-117. Properties not in the whitelist are silently dropped — new keys MUST be added there. Already present (verified): theobs_type_*family (livecontext_injectedships them),session_count,observation_count. Confirmed ABSENT (must add):discovery_tokens. - Key/host:
getTelemetryApiKey(),getTelemetryHost()—src/services/telemetry/common.ts:22-28. Note:getTelemetryApiKey()falls back to the embedded production key and is never falsy — every worker boot anywhere can send real data (see Phase 2/3 sequencing). Base props:buildBaseProperties()—common.ts:100-113(returns CURRENT version, os_version, runtime_version, locale, is_ci — historical events must NOT carry it). Person-prop subset:PERSON_PROPERTY_KEYS+buildPersonSet()—common.ts:36-76.buildPersonSetonly copies keys PRESENT on the event's properties — a person trait that is never assigned to the event silently never ships. - Capture-path conventions (consent gate → scrub → debug-mode stderr print → no-key no-op →
capture; swallow all errors):
captureEvent()—src/services/telemetry/telemetry.ts:72-117. - DB-reading telemetry pattern:
collectInstallStats(db: Database)—src/services/telemetry/install-stats.ts:29-99. Per-block try/catch; typed.get()casts. Keep the per-block try/catch in the rollup too: older installs may lack a table or column — skip that block's keys, never throw. - Epoch normalization (legacy rows store seconds, newer rows milliseconds — verified
against a real DB where 273 rows render as 1970 without it):
asMs(col)→`CASE WHEN ${col} < 1000000000000 THEN ${col} * 1000 ELSE ${col} END`—install-stats.ts:23-25.DAY_MS = 86_400_000—install-stats.ts:27. NoteMIN(asMs(x))applies normalization INSIDE the MIN (correct);asMs(MIN(x))would not be. - Deterministic UUID:
Bun.randomUUIDv5(name, namespace)— exists and is deterministic on the worker's embedded Bun 1.3.9 (verified by execution). Use it with one fixed namespace UUID constant in the module. The npmuuidpackage is not installed and must not be added. - Discovery-token storage semantics (load-bearing for aggregation):
src/services/sqlite/SessionStore.ts:1901-2015— ONEdiscoveryTokensvalue per compression turn is written identically to EVERY observation row of that turn (line 1962) AND to the turn'ssession_summariesrow (line 2004). Summing across observations multi-counts by the obs-per-turn factor. Sumsession_summaries.discovery_tokensonly.
Database access (bun:sqlite, synchronous):
import { Database } from 'bun:sqlite'— pattern atsrc/services/sqlite/SessionStore.ts:1-2.- Query:
db.query(sql).get(...)/.all(...)withastype casts — e.g.src/services/worker-service.ts:508-513,install-stats.ts:36-50. - Relevant columns: base schema in
src/services/sqlite/migrations/runner.ts:43-112—observations(project, created_at_epoch, memory_session_id, type, agent_type),sdk_sessions(project, started_at_epoch, memory_session_id, status, platform_source),session_summaries(project, created_at_epoch),user_prompts(created_at_epoch)(counts only — NEVER selectprompt_text).discovery_tokensis NOT in 43-112: it is added byensureDiscoveryTokensColumnatrunner.ts:381-402(ALTER TABLE ... ADD COLUMN discovery_tokens INTEGER DEFAULT 0on both observations and session_summaries). Query only columns created in runner.ts — the dev DB has extra columns other installs lack. - Observation
typebuckets: use the closedSTAT_TYPE_BUCKETSset fromsrc/services/context/ContextBuilder.ts(bugfix/discovery/decision/refactor/other) so the backfill vocabulary is identical to livecontext_injected. platform_sourceis a user-influenceable string — bucket in JS to the closed enum {claude, codex, gemini, other}; never ship the raw value.
Worker integration points (src/services/worker-service.ts):
worker_startedcapture: lines 532-540 (insideinitializeBackground()).- Fire-and-forget background-task precedent to copy:
ChromaSync.backfillAllProjects(...).then/.catchblock at lines 551-556. - Daily heartbeat precedent:
setInterval(..., 24*60*60*1000)+.unref?.()at lines 544-547. - Shutdown order:
worker_stoppedcapture BEFOREshutdownTelemetry()(lines 699-703).
State files in ~/.claude-mem:
- Path root:
DATA_DIR/paths.dataDir()—src/shared/paths.ts:40, 129-151. - Read with
readJsonSafe<T>(path, fallback)fromsrc/utils/json-utils.js(asconsent.ts:5imports it). There is no JSON-write helper — write withmkdirSync+writeFileSyncexactly assaveTelemetryConfigdoes atconsent.ts:103-107. - The marker is its own file
backfill.json— do NOT merge it intotelemetry.json(the consent save path would clobber it).
Logger (console.* is forbidden in services — enforced by
scripts/check-hook-io-discipline.cjs / tests/logger-usage-standards.test.ts):
import { logger } from '../utils/logger.js';logger.info('SYSTEM', msg, ctx, err?).'TELEMETRY'is NOT in theComponentunion (src/utils/logger.ts:15-52) — use'SYSTEM'.
Tests (bun test; global posthog-node mock):
- Global mock of
posthog-nodeintests/preload.ts:45-55recordspostHogConstructorCalls/postHogCaptureCalls— extend, don't replace. The mock class has NOflush(),shutdown()race-compatible signature, oron()— add no-opflush()/shutdown()/on()to it. - State reset:
__resetTelemetryForTests()—src/services/telemetry/telemetry.ts:125-129. - Env/temp-dir isolation pattern:
tests/telemetry/telemetry-client.test.ts:30-53(CLAUDE_MEM_DATA_DIR = mkdtempSync(...)). - In-memory DB schema pattern:
tests/telemetry/install-stats.test.ts:8-28— but itsmakeDbschema LACKSdiscovery_tokens,memory_session_id,type,agent_type,status,platform_source, and theuser_promptstable. Extend the test schema with every column the rollup queries touch before writing tests, or the queries throwno such column(and the per-block try/catch would mask it as silently-empty rollups).
Anti-patterns (DO NOT)
- ❌
historical_migration(snake_case) as a posthog-node constructor option — the SDK option ishistoricalMigration(camelCase). Snake_case is only for the raw/batchHTTP API. - ❌ Passing
timestampas an ISO string toclient.capture()—EventMessage.timestampis typedDate. Constructnew Date(...). - ❌ Reusing the live singleton client in
telemetry.tsfor backfill — it lackshistoricalMigrationand itsisShutdownlatch must stay untouched. Build a dedicated, short-lived client. - ❌ Sending properties without adding them to
ALLOWED_PROPERTY_KEYS— the scrubber drops them silently and the backfill would ship empty events. - ❌ Raw
created_at_epoch/started_at_epochmath withoutasMs()— legacy second-unit rows land in 1970 and poison the earliest cohort. Additionally apply the project-epoch floor (below): corrupt epochs can also land on plausible-looking wrong days, not just 1970. - ❌ Filtering rollup rows by raw epoch against a cutoff instant — that ships a permanently truncated PARTIAL day. Always include whole UTC day buckets only (window rule in Phase 1).
- ❌ Summing
discovery_tokensacross observations AND session_summaries — the same per-turn value is stored on every row of the turn (see Phase 0); sum summaries only. - ❌
buildBaseProperties()onhistorical_activityevents — current version/os on 2025-dated events is permanently wrong version-over-time data. - ❌ The npm
uuidpackage — useBun.randomUUIDv5(verified present and deterministic). - ❌
console.*anywhere insrc/services/— uselogger(debug-mode payload printing is the one exception and must useprocess.stderr.write, copyingtelemetry.ts:97-103). - ❌ Writing the completion marker before delivery is confirmed — and "delivery confirmed"
means
await client.shutdown()resolved AND zeroclient.on('error')events, NOT a bareflush()(see Phase 0 SDK facts). A crash mid-send must retry on next startup. - ❌ Booting the worker on the dev machine during Phases 1–3 without
CLAUDE_MEM_TELEMETRY=0(orCLAUDE_MEM_TELEMETRY_DEBUG=1) exported — the embedded production key + default-on consent means a casualnpm run build-and-syncships the dev machine's entire history to production PostHog before the dry-run gate.
Phase 1: Backfill module + tests (rollups, events, transport, marker)
Create src/services/telemetry/backfill.ts (one module, one test file).
1.1 Day window (one rule, used everywhere)
lastFullDay = utcDayString(nowMs - 60 * 3_600_000)— 60h = 48h (PostHog contract) + 12h (noon-UTC event timestamps). Noon of any included day is then guaranteed ≥48h old.PROJECT_EPOCH_FLOOR = Date.parse('2024-01-01T00:00:00Z')— predates claude-mem's first release; rows with normalized epoch below it are corrupt and ignored everywhere (rollups AND first-activity MIN).installDay = utcDayString(firstActivityEpochMs)(1.3). Include only whole UTC days whereinstallDay <= day <= lastFullDay, comparing day strings (YYYY-MM-DD compares lexicographically) — never raw epochs, so no partial days can ship. The lower bound discards backdated artifact rows (verified on the reference DB: obs ids 66888/66889 carry a 2025-08-12 epoch but belong to a session started 2026-04-10).
1.2 collectDailyRollups(db: Database, lastFullDay: string, installDay: string): DailyRollup[]
Copy the query style of collectInstallStats (per-block try/catch — a missing table/column
skips that block's keys), bucketing every table by
date(<asMs(col)>/1000,'unixepoch') AS day, merged in a Map<day, rollup>. Pinned
semantics — these exact aggregations, so any two implementers ship identical numbers:
| key | source (per day) |
|---|---|
observation_count |
COUNT(*) FROM observations |
obs_type_bugfix/discovery/decision/refactor/other |
observations GROUP BY day, type, bucketed via STAT_TYPE_BUCKETS |
subagent_obs_count |
COUNT(*) FROM observations WHERE agent_type IS NOT NULL |
session_count |
COUNT(*) FROM sdk_sessions only (do NOT add observations' distinct memory_session_id — same sessions, double count) |
session_completed_count / session_failed_count |
sdk_sessions GROUP BY day, status (closed enum) |
sessions_claude_count / sessions_codex_count / sessions_gemini_count / sessions_other_platform_count |
sdk_sessions platform_source, bucketed in JS to the closed enum |
summary_count |
COUNT(*) FROM session_summaries |
discovery_tokens |
SUM(discovery_tokens) FROM session_summaries only (per-turn cost — see Phase 0 storage semantics) |
prompt_count |
COUNT(*) FROM user_prompts (count only — never prompt_text) |
project_count |
COUNT(DISTINCT project) over (SELECT day, project FROM observations UNION SELECT day, project FROM sdk_sessions) — cross-table distinct in ONE query; never sum per-table distincts (triple-counts the same project) |
Omitted on purpose: session durations (verified dirty — 22% of completed sessions show >24h stale spans), dev-only columns not in runner.ts, anything string-valued.
1.3 findFirstActivityEpochMs(db: Database): number | null
SELECT MIN(asMs(started_at_epoch)) FROM sdk_sessions WHERE asMs(started_at_epoch) >= FLOOR
(copy install-stats.ts:56-61 — note asMs INSIDE the MIN), falling back to the observations
MIN only if sdk_sessions is empty. Keep sessions-first: session timestamps are write-time
and trustworthy, while observation epochs can be backdated artifacts (verified — see 1.1);
a cross-table MIN would bake an artifact date into undeletable data.
1.4 deterministicEventUuid(installId, event, day): string
Bun.randomUUIDv5(${installId}|${event}|${day}, BACKFILL_NAMESPACE) where
BACKFILL_NAMESPACE is one fixed UUID constant in the module. One line; deterministic;
no hand-rolled hashing.
1.5 buildBackfillEvents(db, installId, nowMs): EventMessage-like[]
Pure assembly:
- Each rollup →
historical_activitywithtimestamp: new Date(day + 'T12:00:00Z')and the deterministic uuid. Noon UTC is load-bearing TWICE: it keeps the event inside its day for dashboard timezones in UTC-12..+11 (keep the PostHog project timezone on UTC), and it makes the timestamp retry-stable, which the dedupe key requires — do not "simplify" to a non-deterministic timestamp. Properties: the rollup counters +backfilled: true, passed throughscrubProperties(...), then$process_person_profile: false. NobuildBaseProperties()(see anti-patterns). - One
install_inferredwithtimestamp: new Date(installDay + 'T12:00:00Z')(noon UTC — retry-stable even if the sessions/observations MIN source flips between runs) and uuid keyed on day'install'. Properties:{ ...buildBaseProperties(), first_active_date: installDay, backfilled: true }throughscrubProperties, then$set = buildPersonSet(props)(copytelemetry.ts:91-95).first_active_dateMUST be assigned here —buildPersonSetonly copies keys present on the event, and a whitelisted-but-never-assigned trait silently never ships. Base props are fine on this one person-event ($set = current person state). - If
installDay > lastFullDay(install younger than ~60h): return[]. Such installs have live telemetry coverage for their entire life — there is no pre-telemetry history to reconstruct, and shipping a <48h timestamp violates the migration contract.
1.6 Whitelist additions
Add to ALLOWED_PROPERTY_KEYS (scrub.ts:8-117) — verify each against the file first:
discovery_tokens (confirmed absent), summary_count, prompt_count, project_count,
backfilled, first_active_date, session_completed_count, session_failed_count,
sessions_claude_count, sessions_codex_count, sessions_gemini_count,
sessions_other_platform_count, subagent_obs_count.
Already whitelisted — reuse, do not re-add: observation_count, session_count,
the obs_type_* family. (session_count/observation_count carry different semantics on
live context_injected — acceptable: PostHog properties are filtered per-event in practice;
note the collision in the PR description.)
Add first_active_date to PERSON_PROPERTY_KEYS (common.ts:36-60).
1.7 Marker + transport: runHistoricalBackfill(db: Database): Promise<void>
Marker file ~/.claude-mem/backfill.json at join(dataDir, 'backfill.json') (mirror
getTelemetryConfigPath(), consent.ts:76-78). Shape:
{ completedAt: string, throughDay: string, eventCount: number, installId: string }.
Read with readJsonSafe; write with mkdirSync + writeFileSync (as
saveTelemetryConfig, consent.ts:103-107).
Gate sequence (ORDER MATTERS — debug must precede every marker write):
- Marker exists → return (idempotency gate #1).
resolveTelemetryConsent(process.env, loadTelemetryConfig())false → return without writing the marker (a later opt-in still backfills).- Build events via 1.5.
CLAUDE_MEM_TELEMETRY_DEBUG === '1'→process.stderr.writeone summary line (event count + day range) then one line per event (copytelemetry.ts:97-103), do NOT send, do NOT write marker — even when the event list is empty. This dry-run intentionally re-runs on every debug-mode worker start; the marker must never latch from debug mode. (Only the exact value'1'activates it, matchingcaptureEvent.)- Zero events → write marker, return. (Fresh installs land here: nothing pre-telemetry
exists and live
install_completed/daily events cover them from day 0 —install_inferredis intentionally NOT emitted for them.) !getTelemetryApiKey()→ return without marker. (Vestigial — the embedded key makes this unreachable; keep the one-liner for symmetry withcaptureEvent, write no test for it.)- Dedicated client:
new PostHog(getTelemetryApiKey(), { host: getTelemetryHost(), historicalMigration: true, flushAt: 5000, maxBatchSize: 5000, maxQueueSize: 5000, disableGeoip: false })— the 5000s guarantee a single batch, no swallowed background flushes, and no silent queue-cap drops (see Phase 0 SDK facts). const errors: unknown[] = []; client.on('error', e => errors.push(e)), thenclient.capture({ distinctId: getOrCreateInstallId(), event, properties, timestamp, uuid })per event, thenawait client.shutdown()— NO separateflush()call (it is not a delivery barrier) and NO 3s race (this runs fire-and-forget in the background; the SDK's default shutdown timeout is fine).- Write the marker ONLY if shutdown resolved AND
errors.length === 0. Wrap the whole body in try/catch that logs vialogger('SYSTEM') and never throws (telemetry must never break the worker —telemetry.ts:114-116).
Verification checklist:
- New test
tests/telemetry/backfill.test.tsusing the:memory:pattern fromtests/telemetry/install-stats.test.ts:8-28with the schema EXTENDED per Phase 0 (discovery_tokens, memory_session_id, type, agent_type, status, platform_source, user_prompts table). Cases: (a) mixed second/ms epochs land on the correct day (insertcreated_at_epoch = 1755000000seconds; assert no 1970 day); (b) day window: a row 47h old is excluded AND no partial day ever ships (whole-day buckets only); a row beforePROJECT_EPOCH_FLOORor beforeinstallDayproduces no day; (c)deterministicEventUuidstable across calls, general UUID shape (it is a v5 — do not assert a v4 version nibble); (d) every property in built events survivesscrubProperties(no silent drops); (e) empty DB → zero events, no throw; (f) discovery_tokens dedupe: one turn = 3 observations + 1 summary all carrying 100 → day total 100, not 400; (g) one session + one observation in it, same day →session_count1 andproject_count1 (no double counting); (h)install_inferreduses the sessions MIN, is stamped noon UTC ofinstallDay, and its$setcontainsfirst_active_date; (i) first activity 1h ago → zero events; (j) consent-off → zeropostHogCaptureCalls, no marker; marker present → zero calls; debug mode → zero calls, no marker including on an empty DB; second invocation after success → zero calls; (k) happy path → constructor receivedhistoricalMigration: true, every capture hasuuid+Datetimestamp, marker written with correctthroughDay; anerroremitted via the mock'sonhandler → NO marker. bun test tests/telemetry/passes.
Anti-pattern guards: no marker write reachable before the debug gate; sum
summaries-only for discovery_tokens; day-string windowing only; no buildBaseProperties on
historical_activity; no console.*; no raw epoch math without asMs.
Phase 2: Live dry-run against the real DB — THE SHIP/NO-SHIP GATE
This phase MUST complete before Phase 3 wires anything into the worker, because PostHog data
cannot be selectively deleted and getTelemetryApiKey() always returns the embedded
production key.
- Run
runHistoricalBackfillagainst the real~/.claude-mem/claude-mem.dbwithCLAUDE_MEM_TELEMETRY_DEBUG=1(small runner script or one-off test), and eyeball the stderr payload dump:- First day = 2025-10-19 for this machine (NOT Aug 2025 — the two 2025-08-12 rows are verified backdated artifacts and must be absent thanks to the installDay clamp).
- No day newer than
lastFullDay(≈ T-2.5 days), no 1970 days, no day beforeinstall_inferred'sfirst_active_date. - Plausible counts (~215 active days, ~93k total observations on this machine).
- Exactly one
install_inferred,first_active_date: 2025-10-19, noon-UTC timestamp. - No string property that looks like user content; every property is a number, boolean, or
the single
first_active_datedate string.
- Repeat runs are expected and safe (debug mode never writes the marker). If a non-debug test
run ever latches the marker locally, the reset is
rm ~/.claude-mem/backfill.json.
Phase 3: Worker wiring + telemetry docs disclosure
Export CLAUDE_MEM_TELEMETRY=0 (or CLAUDE_MEM_TELEMETRY_DEBUG=1) in the dev shell for
every worker boot in this phase — npm run build-and-sync restarts the worker, and an
unguarded boot performs the real production send from the dev machine.
-
Wire into startup: in
src/services/worker-service.ts, immediately after theworker_startedcapture (lines 532-540) and alongside the ChromaSync fire-and-forget precedent (lines 551-556), add:runHistoricalBackfill(this.dbManager.getConnection()).catch(error => { logger.error('SYSTEM', 'Telemetry historical backfill failed (non-blocking)', {}, error as Error); });Non-blocking, after core init. Do not add it to the heartbeat — it is one-shot by marker (a failed run retries on the NEXT worker start because no marker was written).
-
Docs: update
docs/public/telemetry.mdx— add a short "Historical backfill" section: what is sent (daily anonymous counts + inferred install date), that it runs once, that it honors the same consent gates (DO_NOT_TRACK,CLAUDE_MEM_TELEMETRY=0,telemetry.json), that opting out before first worker start after upgrade prevents it entirely, and that geo properties on backfilled events reflect upload-time location. Follow the existing page's tone/structure (read it first).
Verification checklist:
bun test(full suite) passes — especiallytests/logger-usage-standards.test.ts.grep -rn "runHistoricalBackfill" src/shows exactly two hits: definition + the one worker-service call site.- Worker boots clean with telemetry disabled in the shell:
CLAUDE_MEM_TELEMETRY=0 npm run build-and-sync, then confirm via worker log (~/.claude-mem/logs/) that startup completes and no backfill error is logged.
Anti-pattern guards: do not block initializeBackground() on the backfill promise; do not
capture worker_stopped-style events from inside backfill; no unguarded dev-machine boots.
Phase 4: Final Verification
- Anti-pattern greps (all must return nothing):
grep -rn "historical_migration" src/(wrong spelling for SDK path)grep -rn "console\." src/services/telemetry/backfill.tsgrep -rn "from 'uuid'" src/grep -n "buildBaseProperties" src/services/telemetry/backfill.ts→ must appear ONLY in the install_inferred builder, never for historical_activity.
- Whitelist proof: test asserting
scrubProperties(buildBackfillEvents(...)[i].properties)retains every expected key on both event types. - Full suite:
bun testgreen;npm run build-and-sync(telemetry-guarded shell) succeeds; worker starts. - Re-run the Phase 2 dry-run one final time on the release build (the marker is still
absent on the dev machine if Phase 3 boots were guarded — if not,
rm ~/.claude-mem/backfill.jsonfirst). - Post-ship validation (manual, in PostHog UI after release):
- BEFORE trusting reconstructed WAU: build a unique-users trend mixing one person event
(
worker_started) and one profile-less event (session_compressed) for a known installId and confirm it counts 1 user — this validates that profile-lesshistorical_activityand personinstall_inferredmerge as one unique user. - Adoption curve =
install_inferred∪install_completed(is_update=false), deduped per distinct_id (telemetry-era installs emit both — document on the dashboard). - Trend on unique
historical_activityusers by week = reconstructed WAU. - Annotate dashboards: survivorship bias;
versionslicing must filterbackfilled != true; geo on backfilled events is upload-time.
- BEFORE trusting reconstructed WAU: build a unique-users trend mixing one person event
(
Phase ordering & session boundaries
Phase 1 (module + tests) → Phase 2 (live dry-run gate, BEFORE any wiring) → Phase 3
(wiring + docs, telemetry-guarded) → Phase 4 (verification + post-ship). An executor starting
any phase cold should read this file's Phase 0 plus the cited source files for that phase
before writing code. Run bun install first — the worktree has no node_modules.
Review log (2026-06-12)
Adversarially reviewed by a 55-agent workflow (5 dimensions × independent skeptic verification; 26 findings confirmed, 6 downgraded, 1 disputed, 0 refuted). Material changes vs. the first draft:
- Pinned rollup semantics (session_count from sdk_sessions only; project_count via UNION distinct; discovery_tokens from summaries only — was multi-counted ~Nx per obs-per-turn).
- Day window redefined (whole-day buckets ≤
utcDay(now-60h); was an ambiguous row-level 48h filter that could ship truncated days and <48h timestamps). - install_inferred: kept sessions-first MIN (a proposed cross-table MIN would have baked
two verified backdated artifact rows — 2025-08-12 epochs written by a 2026-04-10 session —
into undeletable data); added installDay clamp on rollups, 60h skip rule, noon-UTC
retry-stable timestamp, and explicit
first_active_dateassignment (was whitelisted but never attached to any event). - Marker gating rebuilt on real SDK semantics: bare
flush()is not a delivery barrier andshutdown()swallows errors — single-batch config +on('error')latch + marker only on clean shutdown. Dropped the 3s race and the dead "flush every 5,000" advice. - Phase order fixed: the live dry-run now precedes worker wiring (the embedded prod key + default-on consent meant the old Phase 3 "boot the worker" step performed the real irreversible send before the old Phase 4 dry-run, whose marker then made that dry-run a silent no-op).
- More usage data, same privacy posture: added prompt_count, obs_type_* breakdown (whitelist keys already exist from live telemetry), session outcome counts, platform buckets, subagent count; project_count now includes session-only days.
- Simplifications: Bun.randomUUIDv5 one-liner replaces hand-rolled sha256 nibble-forcing (the "invented API" anti-pattern claim was false — verified on Bun 1.3.9); deleted dead whitelist keys (backfill_days, backfill_events); old Phases 1+2 merged; debug gate moved ahead of all marker writes; corrected citations (discovery_tokens lives in a migration at runner.ts:381-402, not the base schema; there is no JSON-write helper).