Opt-in PostHog product analytics living in the long-running worker. Consent precedence DO_NOT_TRACK > CLAUDE_MEM_TELEMETRY > telemetry.json > default OFF; whitelist-only scrubber; claude-mem telemetry CLI; install-flow consent prompt as the final step; async dependency installs with live spinner heartbeat; full privacy docs. Ships dark until the publishable key lands.
9.7 KiB
Opt-In PostHog Telemetry for claude-mem
Goal: One mergeable PR adding privacy-respecting, OPT-IN product analytics to claude-mem using the official posthog-node SDK, gated by a consent resolver and a whitelist property scrubber. Ships dark: default OFF, nothing leaves the machine without explicit consent.
Origin: Research on branch claude/analytics-platforms-comparison-o58erd (commit 256b3584f, claude-mem-analytics-research-transcript.txt) + the official PostHog skill (npx skills use posthog/ai-plugin@instrument-product-analytics).
Architecture decision (locked — do not re-litigate)
The research transcript proposed a SQLite telemetry_events spool table drained by a worker flush job. We are NOT building that. The spool solved "short-lived hook processes exit before the SDK's in-memory queue flushes" — but in claude-mem all the events worth tracking (compression, search, startup, errors) occur inside the long-running worker service, which is exactly where a posthog-node client is designed to live. So:
posthog-nodeSDK initialized lazily, once, in the worker. SDK handles batching/retry/flush.- NO new SQLite migration. NO custom
/batch/wire format. NO custom retry loop. - Every capture goes through ONE wrapper: consent gate → whitelist scrub → debug-print or
posthog.capture(). The wrapper never throws and never blocks. cli.commandevents from short-lived npx processes are scope-cut from v1 (future: POST to worker HTTP). Keeps the PR small.
What we keep from the research (the trust-critical layer):
- Opt-in, default OFF. Consent precedence:
DO_NOT_TRACK(truthy → forced off) >CLAUDE_MEM_TELEMETRYenv (0/false/off→ off,1/true/on→ on) >telemetry.jsonconfig > default OFF. - Random install UUID (crypto.randomUUID, first use) + consent stored in
~/.claude-mem/telemetry.json(data dir via existing helper; survives DB resets). UUID = PostHogdistinctId. - Whitelist scrubber. Allowed property keys ONLY:
version,os,arch,runtime,runtime_version,duration_ms,outcome,error_category,locale,is_ci, plus the event name. Everything else dropped. NEVER: paths, project names, git remotes, prompts, source code, IPs, hardware IDs (even hashed), env values, emails. CLAUDE_MEM_TELEMETRY_DEBUG=1→ print would-be payloads to stderr, send nothing.claude-mem telemetry [status|enable|disable]CLI command.docs/public/telemetry.mdxenumerating every field collected and not-collected.- Unit tests for the consent resolver and the scrubber.
Allowed APIs (verified)
posthog-node (https://posthog.com/docs/libraries/node — re-read before coding):
new PostHog(apiKey, { host, flushAt, flushInterval })— host defaulthttps://us.i.posthog.com; key/host viaCLAUDE_MEM_TELEMETRY_KEY/CLAUDE_MEM_TELEMETRY_HOSTenv with a hardcoded publishable project token fallback constant (publishable tokens are safe to embed — verified at posthog.com/docs/api: capture endpoints are public POST-only, no rate limits, 20MB body cap).client.capture({ distinctId, event, properties })— fire-and-forget, queues in memory.- Set
properties.$process_person_profile = falseon every event (anonymous events — cheaper, privacy-aligned). await client.shutdown()— flush on worker graceful stop ONLY. Never on the capture path.
Codebase facts (verified 2026-06-09):
- Migration runner is at version 34 (
src/services/sqlite/migrations/runner.ts) — we deliberately do not touch it. - Tests:
bun test, files attests/**/*.test.ts, import frombun:test; copy style fromtests/json-utils.test.ts. - Worker code:
src/services/worker/(Expresshttp/,SettingsManager.ts, providers). CLI commands:src/npx-cli/commands/(e.g.doctor.ts). @clack/promptsis already a dependency (used by installer flows).- Event naming: snake_case (per official PostHog skill).
Phase 1 — Discovery + consent & scrub core (pure logic, no network, no SDK)
- Grep for existing
posthog/telemetryreferences (expect only incidental SDK-flag mentions inProviderObservationGenerator.ts,ClaudeProvider.ts,ChromaMcpManager.ts— read them to confirm they're unrelated, do not modify). - Find the existing data-dir helper that resolves
CLAUDE_MEM_DATA_DIR/~/.claude-mem(grepCLAUDE_MEM_DATA_DIRinsrc/), and thereadJsonSafeutil (src/utils/json-utils.ts). Reuse both. - Create
src/services/telemetry/consent.ts:resolveTelemetryConsent(env, config): boolean— pure, precedence as locked above.loadTelemetryConfig() / saveTelemetryConfig()—telemetry.jsonin data dir:{ enabled: boolean, installId: string, decidedAt: string }.getOrCreateInstallId(): string.
- Create
src/services/telemetry/scrub.ts:scrubProperties(props): Record<string, string|number|boolean>— whitelist filter, primitive-only values, truncate strings >200 chars. - Tests:
tests/telemetry/consent.test.ts,tests/telemetry/scrub.test.ts(style oftests/json-utils.test.ts). Cover: DO_NOT_TRACK=1 beats enabled config; env off beats config on; default off; unknown keys dropped; nested objects dropped; denylisted-looking keys (path, cwd, prompt) dropped even if someone whitelists by mistake — add an explicit denylist assertion test.
Verify: bun test tests/telemetry/ green. No imports from posthog-node anywhere yet.
Phase 2 — SDK wrapper + worker capture sites
npm install posthog-node(package manager command, do not hand-edit package.json). Checkesbuild/build config: if worker bundle has anexternalarray pattern (it did forbullmq/pg), determine whetherposthog-nodebundles cleanly; mirror the existing convention.- Create
src/services/telemetry/telemetry.ts:- Lazy singleton
getClient()— constructs PostHog only on first consented capture. captureEvent(event: string, props?: Record<string, unknown>): void— consent gate → base props (version from package.json/version helper,os,arch,runtime: 'bun'|'node',runtime_version,is_ci,locale) → scrub → ifCLAUDE_MEM_TELEMETRY_DEBUG=1print JSON to stderr and return → elseclient.capture({...}). Entire body in try/catch that swallows (debug-log only). Synchronous, O(1).shutdownTelemetry(): Promise<void>— flush with a 3s race timeout; never rejects.
- Lazy singleton
- Wire capture sites (find exact locations by reading worker code; keep each insertion to 1-3 lines):
worker_started— worker service startup completion (src/services/worker-service.tsor equivalent startup path).session_compressed— where a session compression/summarization completes successfully (withduration_ms,outcome).search_performed—SearchManager/ search HTTP route entry (NO query text — onlyoutcome).error_occurred— central worker error handler if one exists (witherror_categoryonly — a coarse enum string, nevererror.message).shutdownTelemetry()in the worker's graceful-stop path (find existing SIGTERM/stop handler).
- Anti-pattern guards: no
posthogimport outsidesrc/services/telemetry/; nocapture(call site passing raw error messages, paths, or query strings; key only via env/constant.
Verify: npm run build (or the repo's build script) passes. bun test green. Grep checks: grep -rn "posthog" src/ | grep -v services/telemetry returns nothing; with consent absent, boot worker with CLAUDE_MEM_TELEMETRY_DEBUG=1 → zero telemetry stderr lines (gate is before debug print? NO — debug prints only when consented; with no consent, nothing prints. Confirm that ordering in code: consent gate FIRST, debug second).
Phase 3 — CLI command + docs
- Read
src/npx-cli/commands/doctor.tsand the command router that registers it; copy the registration pattern exactly. - Create
src/npx-cli/commands/telemetry.ts:status(default): prints enabled/disabled, which layer decided it (DO_NOT_TRACK / env / config / default), install ID, and the docs URL.enable:@clack/promptsconfirm showing the exact field list collected + "no prompts, paths, code, or project names — ever" + docs link; on yes, write config with installId.disable: writeenabled: false. No prompt.
- Create
docs/public/telemetry.mdx+ add nav entry indocs/public/docs.json(check existingdocs.jsonstructure first): tables of every field collected / explicitly never collected, all four disable methods (DO_NOT_TRACK, env var, config, CLI command), debug mode, where telemetry.json lives, the fact it's opt-in and ships disabled.
Verify: run the CLI command locally (bun src/npx-cli/... telemetry status per repo's dev pattern) — status reflects env overrides correctly (test with DO_NOT_TRACK=1).
Phase 4 — Verification + PR
- Full
bun testand the build script. Fix anything broken by the new dep. - Anti-pattern sweep (all must pass):
grep -rn "from 'posthog-node'" src/ | grep -v services/telemetry/telemetry→ empty.grep -rn "captureEvent(" src/→ every call site's props are literal objects containing only whitelisted keys.- No personal API key (
phx_) anywhere; only publishable (phc_) token constant/env. telemetry.jsonnever written without explicit user action (enable command) — grepsaveTelemetryConfigcall sites.
- Branch
feat/opt-in-telemetryfrommain, commit (conventional:feat(telemetry): opt-in anonymous usage analytics via PostHog), push,gh pr createwith: summary, the architecture-decision paragraph (worker-resident SDK, no spool), the collected/never-collected table, and screenshots/output oftelemetry status. - Hand off to
claude-mem:babysitfor the PR.
Done means: PR open, CI green, telemetry provably inert by default (fresh install sends nothing), and the privacy docs page renders.