feat(telemetry): person profiles on lifecycle events to unlock retention/cohort analytics

PostHog cannot compute retention, stickiness, lifecycle, or cohort
insights on profile-less events — exactly the charts growth reporting
needs. Lifecycle events (install_*, uninstall_completed, worker_started;
~1-2/day/install) now build a person profile keyed to the anonymous
install UUID with $set restricted to whitelisted enums. High-volume
operational events stay $process_person_profile:false for cost.

Adds plans/2026-06-09-telemetry-metrics-spec.md mapping every event to
the growth/retention/activation/reliability metric it powers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Alex Newman
2026-06-09 18:13:24 -07:00
parent 482a6c921d
commit 4bdea1e21c
9 changed files with 310 additions and 163 deletions

View File

@@ -17,7 +17,9 @@ The standard [`DO_NOT_TRACK`](https://consoledonottrack.com) environment variabl
## What is collected
When enabled, events are anonymous and identified only by a random install UUID (`crypto.randomUUID()`, generated locally on first use). Events are sent with `$process_person_profile: false`, so PostHog never builds a person profile.
When enabled, events are anonymous and identified only by a random install UUID (`crypto.randomUUID()`, generated locally on first use).
Low-volume lifecycle events (`install_*`, `uninstall_completed`, `worker_started`) build an analytics profile keyed to that random UUID so aggregate retention and cohort statistics are computable — the profile contains nothing beyond the whitelisted fields below (platform, version, IDE/provider choice). It is not, and cannot be, connected to you: there is no name, email, IP, hardware ID, or any other identifier. All high-volume events (`session_compressed`, `search_performed`, `context_injected`, `error_occurred`) are sent with `$process_person_profile: false` and build no profile at all.
Every event property passes through a strict whitelist scrubber — any key not in this table is silently dropped before sending:

View File

@@ -0,0 +1,89 @@
# Telemetry Metrics Spec — the story the data must tell
Audience: us, when building the PostHog dashboard and the fundraise narrative.
Premise: 82k GitHub stars, zero analytics history. The dataset starts the day
this ships, so every chart below is designed to be meaningful within 48 weeks
of data and to compound from there.
## The narrative arc (what a deck slide needs to say)
1. **Reach** — "X installs/week and growing N% w/w, across 12 IDEs."
2. **Habit** — "Installs come back: D30 retention X%, DAU/MAU X%."
3. **Value loop** — "Memory isn't shelfware: X% of installs reach the aha
moment, and active installs read memory back X times/day."
4. **Reliability** — "Core pipeline succeeds X% of the time at scale."
Everything below maps an event to one of those four sentences. If a metric
doesn't feed a sentence, it doesn't go on the dashboard.
## Unit of measure — be precise with VCs
The `distinct_id` is an **install** (one machine + one `~/.claude-mem`), not a
human. Quote "active installs", never "users". This is the honest dev-tool
convention (Homebrew, VS Code extensions count the same way) and diligence
will check. Reinstalls keep the same ID (uninstall preserves the data dir), so
returning installs are not double-counted.
Always filter `is_ci = false` on every insight. CI noise inflates everything.
## Event → metric map
### Reach (growth accounting)
| Metric | Definition |
|---|---|
| New installs/week | unique `distinct_id` on `install_completed` where `is_update = false` |
| Upgrade adoption | `install_completed` where `is_update = true`, broken down by `version` |
| Active installs (WAU/MAU) | unique `distinct_id` on `worker_started` (start + daily heartbeat = presence signal) |
| Churn | `uninstall_completed` count; net growth = new uninstalls |
| Surface mix | `install_completed` breakdown by `ide`, `provider`, `runtime_mode` |
### Habit (retention — the slide that raises the round)
| Metric | Definition |
|---|---|
| D1/D7/D30 retention | PostHog Retention insight: first `install_completed` → returning on `worker_started`. Requires person profiles — that's why lifecycle events set them. |
| Stickiness (DAU/MAU) | PostHog Stickiness insight on `worker_started` |
| Lifecycle | PostHog Lifecycle insight on `worker_started` (new / returning / resurrecting / dormant) |
| Retention by segment | same retention insight broken down by person property `ide` or `provider` — "Cursor installs retain 2×" is a fundable sentence |
### Value loop (activation + engagement)
| Metric | Definition |
|---|---|
| Activation funnel | Funnel: `install_completed` → first `session_compressed` → first `context_injected`. The third step is the aha moment: stored memory actually used. |
| Time-to-value | median time from `install_completed` to first `context_injected` |
| Engagement depth | `session_compressed` count per active install per day; `context_injected` per active install per day |
| Read/write ratio | `context_injected` ÷ `session_compressed` — memory being consumed, not hoarded |
| Feature adoption | `search_performed` breakdown by `endpoint` |
### Reliability (diligence armor)
| Metric | Definition |
|---|---|
| Compression success rate | `session_compressed` outcome ok ÷ all, by `version` and `provider` |
| Error rate | `error_occurred` per active install, by `error_category` and `version` |
| Latency health | p50/p95 `duration_ms` on `session_compressed`, `search_performed`, `context_injected` |
| Install success rate | `install_completed` ÷ (`install_completed` + `install_failed`), failures by `error_category` |
## Person-profile design (cost control)
Only lifecycle events (`install_*`, `uninstall_completed`, `worker_started`)
carry person profiles — ~12 events/day/install, so profile-priced ingestion
stays bounded even at 100k installs. High-volume operational events are
profile-less (cheaper tier). Person properties are the whitelisted enums only:
`version`, `os`, `arch`, `runtime`, `locale`, `ide`, `provider`, `runtime_mode`.
## Caveats to state proactively in diligence
- Telemetry is opt-out (`DO_NOT_TRACK` honored, one-command disable); numbers
undercount by the opt-out rate. That's the credible direction to undercount.
- Data starts <date this ships>; star history is the pre-telemetry proxy.
- One human can be several installs (work + home). Quote installs.
## Dashboard build order (PostHog UI, ~30 min)
1. Trends: weekly unique `worker_started` (active installs) + weekly
`install_completed` where `is_update=false` (new installs).
2. Retention: `install_completed``worker_started`, weekly, breakdown `ide`.
3. Funnel: `install_completed``session_compressed``context_injected`,
14-day window.
4. Stickiness + Lifecycle on `worker_started`.
5. Trends: `session_compressed` outcome error ÷ total (reliability), p95
`duration_ms` (latency).

File diff suppressed because one or more lines are too long

View File

@@ -1374,7 +1374,7 @@ export async function runInstallCommand(options: InstallOptions = {}): Promise<v
} catch (error: unknown) {
if (error instanceof InstallAbortError) {
// error.category.id is OUR taxonomy id (error-taxonomy.ts), never a message.
await captureCliEvent('install_failed', { error_category: error.category.id });
await captureCliEvent('install_failed', { error_category: error.category.id }, { person: true });
// Flush whatever warnings accrued before the abort, then print the
// remediation headline and exit non-zero. ABORT must never reach the
// "Installation Complete" path.
@@ -1816,7 +1816,7 @@ async function runInstallCommandInner(options: InstallOptions, summary: InstallS
is_update: alreadyInstalled,
outcome: failedIDEs.length > 0 ? 'partial' : 'ok',
duration_ms: Date.now() - installStartedAt,
});
}, { person: true });
}
export async function runRepairCommand(): Promise<void> {

View File

@@ -400,7 +400,7 @@ export async function runUninstallCommand(): Promise<void> {
// Capture BEFORE the data dir note becomes stale advice: consent and the
// install ID still live in ~/.claude-mem, which uninstall preserves.
await captureCliEvent('uninstall_completed');
await captureCliEvent('uninstall_completed', {}, { person: true });
p.outro(pc.green('claude-mem has been uninstalled.'));
}

View File

@@ -10,7 +10,7 @@
import { resolveTelemetryConsent, loadTelemetryConfig, getOrCreateInstallId } from './consent.js';
import { scrubProperties } from './scrub.js';
import { getTelemetryApiKey, getTelemetryHost, buildBaseProperties } from './common.js';
import { getTelemetryApiKey, getTelemetryHost, buildBaseProperties, buildPersonSet } from './common.js';
const CAPTURE_TIMEOUT_MS = 2000;
@@ -21,7 +21,8 @@ const CAPTURE_TIMEOUT_MS = 2000;
*/
export async function captureCliEvent(
event: string,
props?: Record<string, unknown>
props?: Record<string, unknown>,
opts?: { person?: boolean }
): Promise<void> {
try {
if (!resolveTelemetryConsent(process.env, loadTelemetryConfig())) {
@@ -32,7 +33,13 @@ export async function captureCliEvent(
...buildBaseProperties(),
...(props ?? {}),
});
// Lifecycle events (install_* / uninstall) build the anonymous person
// profile that powers retention and cohort insights; see telemetry.ts.
if (opts?.person) {
properties.$set = buildPersonSet(properties);
} else {
properties.$process_person_profile = false;
}
if (process.env.CLAUDE_MEM_TELEMETRY_DEBUG === '1') {
process.stderr.write('[telemetry] ' + JSON.stringify({ event, properties }) + '\n');

View File

@@ -25,6 +25,39 @@ export function getTelemetryHost(): string {
return process.env.CLAUDE_MEM_TELEMETRY_HOST || DEFAULT_TELEMETRY_HOST;
}
/**
* Whitelisted properties that may also be set as PostHog person properties on
* lifecycle events (install_*, worker_started). The "person" is the anonymous
* install UUID — these traits make retention/cohort insights sliceable by
* platform and product choices. Strict subset of the scrub whitelist.
*/
export const PERSON_PROPERTY_KEYS = [
'version',
'os',
'arch',
'runtime',
'locale',
'ide',
'provider',
'runtime_mode',
] as const;
/**
* Splits already-scrubbed properties into a $set object for person-profile
* events. Lifecycle events are low-volume (~1-2/day/install), so the
* person-profile ingestion cost is bounded while unlocking PostHog's native
* retention, stickiness, lifecycle, and cohort insights.
*/
export function buildPersonSet(
scrubbed: Record<string, unknown>
): Record<string, unknown> {
const set: Record<string, unknown> = {};
for (const key of PERSON_PROPERTY_KEYS) {
if (scrubbed[key] !== undefined) set[key] = scrubbed[key];
}
return set;
}
export function buildBaseProperties(): Record<string, unknown> {
return {
version: packageVersion,

View File

@@ -5,7 +5,7 @@ import {
getOrCreateInstallId,
} from './consent.js';
import { scrubProperties } from './scrub.js';
import { getTelemetryApiKey, getTelemetryHost, buildBaseProperties } from './common.js';
import { getTelemetryApiKey, getTelemetryHost, buildBaseProperties, buildPersonSet } from './common.js';
let client: PostHog | null = null;
let isShutdown = false;
@@ -51,8 +51,20 @@ function getClient(): PostHog {
* 4. No API key configured — no-op (telemetry ships dark until the
* publishable token lands).
* 5. posthog.capture() — SDK queues in memory and batches in background.
*
* Two event classes (opts.person):
* - Lifecycle events (worker_started, install_*) pass person: true. They are
* low-volume and build an anonymous person profile keyed to the random
* install UUID, which is what makes PostHog's retention / stickiness /
* lifecycle / cohort insights work. Person properties are restricted to
* PERSON_PROPERTY_KEYS — the same whitelisted enums as event properties.
* - Everything else (high-volume operational events) stays profile-less.
*/
export function captureEvent(event: string, props?: Record<string, unknown>): void {
export function captureEvent(
event: string,
props?: Record<string, unknown>,
opts?: { person?: boolean }
): void {
try {
// Once shutdown has flushed the client, late events (e.g. a request that
// raced graceful stop) are dropped rather than queued in a new client
@@ -65,9 +77,13 @@ export function captureEvent(event: string, props?: Record<string, unknown>): vo
...buildBaseProperties(),
...(props ?? {}),
});
// Anonymous events: no person profile processing. Added AFTER scrubbing —
// $-prefixed PostHog directives are not user data and bypass the whitelist.
// $-prefixed PostHog directives are not user data and bypass the whitelist;
// they are added AFTER scrubbing.
if (opts?.person) {
properties.$set = buildPersonSet(properties);
} else {
properties.$process_person_profile = false;
}
if (process.env.CLAUDE_MEM_TELEMETRY_DEBUG === '1') {
// Direct stderr write (not console.*): debug mode is a human running the

View File

@@ -298,13 +298,13 @@ export class WorkerService implements WorkerRef {
captureEvent('worker_started', {
trigger: 'start',
duration_ms: Date.now() - this.startTime,
});
}, { person: true });
// Long-lived workers would otherwise look like a single day of activity.
// A daily heartbeat makes DAU/WAU/retention computable from distinct_id.
// unref() so the timer never keeps a stopping process alive.
this.telemetryHeartbeat = setInterval(() => {
captureEvent('worker_started', { trigger: 'heartbeat' });
captureEvent('worker_started', { trigger: 'heartbeat' }, { person: true });
}, 24 * 60 * 60 * 1000);
this.telemetryHeartbeat.unref?.();