Plan: Remove the entire duplicate SQLite stack + rewrite its tests against the shipping path
Scope: src/services/sqlite/ — delete the dead duplicate DB/migration stack AND the parallel
free-function CRUD API it propped up. Delete every test coupled to that duplicate and write fresh tests
against SessionStore (the path the worker actually runs). Out of scope: installer unification,
SearchManager dedup (separate plans).
Philosophy (per maintainer): Do NOT adapt old tests written against the dead abstraction — that's
busywork. Delete them, write new ones against SessionStore. The old tests' value is their
behavior checklist, not their code; that checklist is captured in Phase 1.
Goal: Remove ~3,000+ src lines of duplication, zero production behavior change, and replace ~2,300 lines of misdirected tests with a smaller, focused suite that exercises the real worker path.
Why this exists: Two parallel SQLite systems live in src/services/sqlite/:
- Shipping (KEEP): `SessionStore`. The worker does `new Database(DB_PATH)` → `new SessionStore(db)`, and SessionStore's constructor runs its own inline imperative migrations. All production reads/writes go through `SessionStore` methods.
- Dead duplicate (DELETE): `ClaudeMemDatabase` → `MigrationRunner` (a second, drifted migration engine) plus a free-function CRUD API (`Sessions.ts`/`Observations.ts`/… barrels + submodules + `transactions.ts`) that re-implements the SessionStore methods. Reachable only from tests.
Phase 0 — Ground Truth (verified by 6 discovery agents + direct grep; re-confirm gates as you go)
Shipping path (KEEP, never touch)
- `src/services/worker/DatabaseManager.ts:17-32` → `new Database(DB_PATH)` then `new SessionStore(this.db)`. The `SessionStore` ctor (`SessionStore.ts:34`, chain ~49-73) runs the inline migrations (schema_versions 4-32).
- CLI server: `worker-service.ts:855` opens `new Database(DB_PATH,…)`; it also uses no `MigrationRunner`/`ClaudeMemDatabase`.
- The worker never imports `ClaudeMemDatabase`, `MigrationRunner`, `getDatabase`, `initializeDatabase`, the `services/sqlite/index.ts` barrel, or the free-function CRUD API. Production's only direct imports into the free-fn tree are 5 helper functions (see the keep-set below).
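The "inline migrations" the SessionStore ctor runs can be pictured as an ordered chain of version-gated steps that is safe to re-run on every boot. A minimal sketch of that pattern only; `MiniDb`, `MigrationStep`, and `runInlineMigrations` are hypothetical names, and the in-memory `MiniDb` stands in for a real `bun:sqlite` handle and its schema_versions table (this is not the chain at `SessionStore.ts` ~49-73):

```typescript
// Hypothetical stand-in for a database handle: it records applied schema
// versions and executed statements instead of talking to SQLite.
interface MiniDb {
  appliedVersions: Set<number>; // plays the role of the schema_versions table
  executed: string[]; // statements "run" so far, in order
}

// One imperative migration step, keyed by the schema version it produces.
interface MigrationStep {
  version: number;
  apply: (db: MiniDb) => void;
}

// Applies every step whose version is not yet recorded, in ascending order,
// and records it afterwards. A second call is a no-op, which is what makes
// running the chain from a constructor on every boot safe.
function runInlineMigrations(db: MiniDb, steps: MigrationStep[]): number[] {
  const ran: number[] = [];
  const ordered = [...steps].sort((a, b) => a.version - b.version);
  for (const step of ordered) {
    if (db.appliedVersions.has(step.version)) continue; // already migrated
    step.apply(db);
    db.appliedVersions.add(step.version);
    ran.push(step.version);
  }
  return ran;
}

// Illustrative two-step chain; the statements are placeholders, not real DDL.
const steps: MigrationStep[] = [
  { version: 4, apply: (db) => { db.executed.push("v4 statements"); } },
  { version: 5, apply: (db) => { db.executed.push("v5 statements"); } },
];

// A database that already has version 4 recorded.
const db: MiniDb = { appliedVersions: new Set([4]), executed: [] };
const firstRun = runInlineMigrations(db, steps); // applies only version 5
const secondRun = runInlineMigrations(db, steps); // nothing left to apply
```

Because the gate is "is this version recorded?", a legacy database seeded at an older version only gets the missing tail of the chain.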
SessionStore API the new tests will call (verified signatures)
Constructor: constructor(dbPathOrDb: string | Database = DB_PATH) — accepts ':memory:' AND a raw
bun:sqlite Database (adopts it, then migrates → use this to seed a legacy schema first). Public raw
handle: store.db. Has close().
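The `string | Database` constructor is what lets a test seed a legacy schema on a raw handle before the store migrates it. A minimal sketch of that adopt-or-open control flow; `FakeDatabase`, `StoreSketch`, `DEFAULT_DB_PATH`, and `migrate()` are hypothetical stand-ins (no `bun:sqlite` import), not SessionStore's real code:

```typescript
// Hypothetical stand-in for a bun:sqlite Database: just a path, a table set,
// and a closed flag.
class FakeDatabase {
  readonly path: string;
  readonly tables = new Set<string>();
  closed = false;

  constructor(path: string) {
    this.path = path;
  }

  close(): void {
    this.closed = true;
  }
}

const DEFAULT_DB_PATH = "/tmp/example.db"; // hypothetical, plays the role of DB_PATH

class StoreSketch {
  readonly db: FakeDatabase; // public raw handle, like store.db

  // A string opens a fresh handle; an existing handle is adopted as-is.
  constructor(dbPathOrDb: string | FakeDatabase = DEFAULT_DB_PATH) {
    this.db = typeof dbPathOrDb === "string" ? new FakeDatabase(dbPathOrDb) : dbPathOrDb;
    this.migrate(); // migrations always run against whichever handle we hold
  }

  private migrate(): void {
    // Placeholder for the version-gated migration chain: bring the handle
    // up to the current schema without dropping pre-seeded tables.
    this.db.tables.add("observations");
  }

  close(): void {
    this.db.close();
  }
}

// Seed a legacy schema on a raw handle first, then let the store adopt and migrate it.
const legacy = new FakeDatabase(":memory:");
legacy.tables.add("legacy_table");
const adopted = new StoreSketch(legacy);

// Passing a path string opens a new handle instead.
const fresh = new StoreSketch(":memory:");
```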
| Domain | Method (SessionStore.ts line) |
|---|---|
| Sessions | createSDKSession(contentSessionId, project, userPrompt, customTitle?, platformSource?): number (1692); updateMemorySessionId(sessionDbId, memId \| null) (1030); markSessionCompleted(sessionDbId) (1038); ensureMemorySessionIdRegistered(sessionDbId, memId, workerPort?) (1048); getSessionById(id) (1624) |
| Observations | storeObservation(memId, project, observation{type,title,subtitle,facts[],narrative,concepts[],files_read[],files_modified[],agent_type?,agent_id?,metadata?}, promptNumber?, discoveryTokens?, overrideTimestampEpoch?, generatedByModel?): {id,createdAtEpoch} (1781); storeObservations(memId, project, observations[], summary|null, promptNumber?, …): {observationIds[],summaryId,createdAtEpoch} (1901); getObservationById(id) (1475) |
| Summaries | storeSummary(memId, project, summary{request,investigated,learned,completed,next_steps,notes}, promptNumber?, …): {id,createdAtEpoch} (1855); getSummaryForSession(memId) (1555) |
| Prompts | saveUserPrompt(contentSessionId, promptNumber, promptText): number (1754); getPromptNumberFromUserPrompts(contentSessionId): number (1685); findRecentDuplicateUserPrompt(contentSessionId, promptText, windowMs) (1406) |
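The Phase 1 observation spec expects `overrideTimestampEpoch` to be honored for both the epoch and the ISO value, with "now" as the default. A minimal sketch of that rule; `resolveCreatedAt` and `TimestampPair` are hypothetical, and it assumes the epoch is in milliseconds and the ISO string is derived from the same instant (check `storeObservation` at line 1781 for the real units):

```typescript
interface TimestampPair {
  createdAtEpoch: number; // assumed: milliseconds since the Unix epoch
  createdAt: string; // ISO-8601 rendering of the same instant
}

// Hypothetical helper: use the override when given, otherwise "now", and
// derive both stored representations from that one value so they never disagree.
function resolveCreatedAt(
  overrideTimestampEpoch?: number,
  now: () => number = Date.now,
): TimestampPair {
  const createdAtEpoch = overrideTimestampEpoch ?? now();
  return { createdAtEpoch, createdAt: new Date(createdAtEpoch).toISOString() };
}

const fixed = resolveCreatedAt(1700000000000);
const defaulted = resolveCreatedAt(undefined, () => 42); // injected clock for determinism
```

Using `??` rather than `||` means an explicit override of `0` is still honored; injecting the clock is what makes the "default = now" case assertable in a test.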
3 old free-functions have no SessionStore method — new tests that need them import the leaf
function and pass store.db: computeObservationContentHash (observations/store.ts:8),
getFirstObservationCreatedAt (observations/recent.ts:36), getObservationsByFilePath
(observations/get.ts:97). All three are in the keep-set anyway.
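`computeObservationContentHash` is named here, but its inputs are not. A plausible shape is a deterministic hash over an observation's content fields, useful for spotting duplicates. The sketch below is entirely hypothetical: the field list, the `\u0000` separator, and the FNV-1a 32-bit hash (chosen only to stay dependency-free; the real helper may well use a cryptographic hash) are all assumptions.

```typescript
// Hypothetical subset of observation fields that define "same content".
interface ObservationContent {
  type: string;
  title: string;
  narrative: string | null;
  facts: string[];
}

// FNV-1a, 32-bit, over UTF-16 code units. Fast and dependency-free but not
// collision-resistant; shown only to illustrate the shape of a content hash.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep 32 bits unsigned
  }
  return hash.toString(16).padStart(8, "0");
}

// Joins fields with a separator that should not appear in normal text, so
// ("ab", "c") and ("a", "bc") do not collapse to the same input string.
function contentHashSketch(obs: ObservationContent): string {
  const parts = [obs.type, obs.title, obs.narrative ?? "", ...obs.facts];
  return fnv1a32(parts.join("\u0000"));
}

const a = contentHashSketch({ type: "discovery", title: "T", narrative: null, facts: ["x"] });
const b = contentHashSketch({ type: "discovery", title: "T", narrative: null, facts: ["x"] });
const c = contentHashSketch({ type: "discovery", title: "T", narrative: "changed", facts: ["x"] });
```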
Test gotchas (from source, save debugging time):
- `getSessionSummaryById` class method (`SessionStore.ts:2435`) queries non-existent columns → throws; don't call it (it's deleted in Phase 5 regardless).
- `getRecentObservations` reads the legacy `text` column → NULL for rows written by `storeObservation`. Assert content via `getObservationById` (title/subtitle/narrative), not `text`.
- Empty-project guard (`project || cwd-derived`) exists only in the leaf `observations/store.ts`, not in `SessionStore.storeObservation`/`.storeObservations`. The worker path stores an empty project as-is. Test the leaf for the guard; don't assert it on SessionStore.
- `pending_messages` has no SessionStore store method; seed it via raw `store.db.prepare(INSERT…)`.
- For the cleanup test, the observer cascade needs `memory_session_id` set: `createSDKSession` inserts NULL, so call `updateMemorySessionId(id,'obs-memory-N')` before `storeObservation`.
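What the leaf's `project || cwd-derived` guard amounts to, as a minimal sketch. `resolveProjectName` is hypothetical, and it assumes "cwd-derived" means the last segment of the working directory (with an invented `"unknown"` fallback for a root path); the leaf's actual derivation may differ:

```typescript
// Hypothetical: fall back to the working directory's last path segment when
// no project name was supplied. Accepts "/" and "\\" separators and ignores
// trailing separators.
function resolveProjectName(project: string | null | undefined, cwd: string): string {
  if (project) return project; // a non-empty project wins, as in `project || ...`
  const segments = cwd.split(/[\\/]+/).filter((s) => s.length > 0);
  return segments.length > 0 ? segments[segments.length - 1] : "unknown"; // assumed fallback
}
```

A test for the guard exercises the empty-string case specifically, since that is the value SessionStore would otherwise store as-is.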
Dead duplicate — DELETE (all production-dead, grep-confirmed)
- DB/migration stack: `Database.ts` (`ClaudeMemDatabase`; the sqlite `DatabaseManager` singleton, NOT the worker's; `getDatabase`; `initializeDatabase`; `Migration`), `migrations/runner.ts` (`MigrationRunner`, 1147 lines), `index.ts` (barrel, 0 importers), and `SchemaRepair` (`openWithSchemaRepair`, imported only by `Database.ts`).
- Free-function CRUD API: all 6 barrels + submodule files + `transactions.ts` (see the KEEP/DELETE map). `transactions.ts` is confirmed dead: 0 direct prod importers; the worker uses `sessionStore.storeObservations`.
- Dead SessionStore methods: `getSessionSummaryById` (2435-2476, dead and broken) and `storeObservationsAndMarkComplete` (2017-2143, 0 call sites: `.storeObservationsAndMarkComplete(` greps to zero).
Free-function tree — KEEP/DELETE map
KEEP (5 files, trimmed to ONLY the live export — production imports these directly):
| File | Keep ONLY | Live importer |
|---|---|---|
| `observations/store.ts` | `computeObservationContentHash` | `SessionStore.ts:15` |
| `observations/files.ts` | `parseFileList` | `SessionStore.ts:16`, `ChromaSync.ts:7` |
| `observations/get.ts` | `getObservationsByFilePath` | `DataRoutes.ts:17` |
| `observations/recent.ts` | `getFirstObservationCreatedAt` | `DataRoutes.ts:18` |
| `prompts/get.ts` | `findRecentDuplicateUserPrompt` | `SessionStore.ts:18` |
Each survivor's only in-tree dep is a type-only ./types.js import; after trimming, those type imports
become removable (the survivors use types from ../../../types/database.js, out of tree). Net: keep 5
function bodies, drop their ./types.js import lines.
DELETE ENTIRELY (18 files — only importers are the dead barrels/index/tests):
Observations.ts Sessions.ts Summaries.ts Prompts.ts Timeline.ts Import.ts # 6 barrels
transactions.ts
sessions/create.ts sessions/get.ts sessions/types.ts
summaries/store.ts summaries/get.ts summaries/recent.ts summaries/types.ts
prompts/store.ts prompts/types.ts
timeline/queries.ts
import/bulk.ts
sessions/, summaries/, timeline/, import/ dirs become empty → remove.
The one risk — CONFIRMED SAFE
Server tables (projects, server_sessions, memory_items, teams, api_keys, audit_log, …) are
created by ensureServerStorageSchema (src/storage/sqlite/schema.ts:21-305), called from 13 live
server-repo sites — not by MigrationRunner. Deleting the duplicate doesn't touch them.
Anti-pattern guards (every phase)
- ❌ Never delete a `src/services/sqlite/` file before grepping its direct importers (excluding the dead `index.ts`).
- ❌ Don't touch `worker/DatabaseManager.ts`, SessionStore's migration chain, or `src/storage/sqlite/*`.
- ❌ Don't write new tests against the leaf free-functions where a SessionStore method exists: the worker uses methods, so test the method. (Exceptions: the 3 leaf-only helpers above.)
- ❌ Don't port schema-repair or MigrationRunner-version-specific assertions — those test code the worker never runs.
Phase 1 — Write NEW tests against SessionStore (FIRST, so coverage exists before any deletion)
These must pass against the current tree (SessionStore + the 5 survivor leaf functions all exist now). They become the regression guard for Phases 2-5. New-file layout + behavior spec (KEEP items from the old suites, retargeted):
tests/sqlite/session-store-observations.test.ts — new SessionStore(':memory:')
- store returns `{id>0, createdAtEpoch>0}`; all fields round-trip via `getObservationById`
- `overrideTimestampEpoch` honored (epoch + ISO); default = now when omitted
- null subtitle/narrative stored OK; `getObservationById` returns null for missing id
- subagent: `agent_type`/`agent_id` stored when provided; default NULL when omitted; `agent_type` alone OK
- `getFirstObservationCreatedAt(store.db)` → null when empty, earliest ISO otherwise (leaf import)
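The two timestamp assertions above are easy to get subtly wrong (an override of `0`, or returning the latest row instead of the earliest). A minimal in-memory sketch of the expected semantics, assuming epochs are milliseconds as produced by `Date.now()` and that the earliest-row helper returns an ISO string. `resolveCreatedAt` and `firstCreatedAtIso` are hypothetical names modeling the contract, not SessionStore's code.

```typescript
// Hypothetical in-memory model of the timestamp rules the observations suite asserts.
// Assumption: epochs are milliseconds, as produced by Date.now().
interface CreatedAt {
  createdAtEpoch: number;
  createdAt: string; // ISO-8601 rendering of the same instant
}

// overrideTimestampEpoch wins when provided (even 0); otherwise default to "now".
function resolveCreatedAt(overrideTimestampEpoch?: number, now: number = Date.now()): CreatedAt {
  const epoch = overrideTimestampEpoch ?? now;
  return { createdAtEpoch: epoch, createdAt: new Date(epoch).toISOString() };
}

// Asserted contract of getFirstObservationCreatedAt: null for an empty table,
// otherwise the earliest row's timestamp as an ISO string.
function firstCreatedAtIso(rows: ReadonlyArray<{ created_at_epoch: number }>): string | null {
  if (rows.length === 0) return null;
  const earliest = Math.min(...rows.map((r) => r.created_at_epoch));
  return new Date(earliest).toISOString();
}
```

For example, `resolveCreatedAt(1_700_000_000_000).createdAt` is `"2023-11-14T22:13:20.000Z"`, which is the kind of fixed value a test can assert on both the epoch and ISO fields.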
tests/sqlite/session-store-dedup.test.ts
- `computeObservationContentHash` (leaf import): deterministic, 16 chars, different content → different hash, null title/narrative OK, no field-boundary collision (`\x00` separator → 4 distinct hashes; keep verbatim)
- identical `(memId, title, narrative)` dedupes to the same id regardless of time gap (collapse the old two "30s window" tests into ONE: dedup is the UNIQUE index, not time-based)
- different content at same timestamp → distinct ids; `content_hash` populated (16 chars) on new rows
- `storeObservations` batch: 3 identical inputs → 3 equal ids, 1 physical row (real worker hot path)
- dedup unaffected by agent fields: 2nd insert w/ different `agent_type` returns existing id, count stays 1, original agent fields preserved
- (optional, leaf) empty-project guard on the `observations/store.ts` leaf only; note SessionStore stores empty as-is
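The field-boundary test exists because plain concatenation makes `("ab", "c")` and `("a", "bc")` hash identically. A minimal sketch of a boundary-safe content hash, assuming the hashed fields are `(memorySessionId, title, narrative)`, that null fields become empty strings, and that the digest is SHA-256 truncated to 16 hex chars. `contentHashSketch` is a hypothetical stand-in; the real `computeObservationContentHash` may hash different fields or use a different digest.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for computeObservationContentHash. Assumptions: the
// hashed fields are (memorySessionId, title, narrative), null becomes "",
// and the digest is SHA-256 truncated to 16 hex chars.
function contentHashSketch(
  memorySessionId: string,
  title: string | null,
  narrative: string | null,
): string {
  // "\x00" is not expected in ordinary text, so it marks field boundaries:
  // ("ab", "c") and ("a", "bc") feed different bytes to the digest.
  const input = [memorySessionId, title ?? "", narrative ?? ""].join("\x00");
  return createHash("sha256").update(input).digest("hex").slice(0, 16);
}
```

Without the separator both pairs above would reduce to the single string `"abc"` and collide, which is exactly what the kept-verbatim test guards against.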
tests/sqlite/session-store-sessions.test.ts
- `createSDKSession` → id>0; idempotent (same `content_session_id` → same id); different → different
- persisted `user_prompt` tag-stripped + bounded to `MAX_STORED_PROMPT_CHARS` ending…
- `getSessionById` round-trips fields; `memory_session_id` defaults null; null for missing
- `custom_title`: stored at creation; defaults null; backfilled on idempotent call if unset; not overwritten if set; empty → null
- `platform_source`: defaults `'claude'`; preserves non-default when legacy caller omits it; throws `/Platform source conflict/` on explicit conflict
- `updateMemorySessionId` sets + allows re-update to different value
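The `custom_title` and `platform_source` rules read most clearly as pure merge functions over (stored value, incoming value). A sketch under stated assumptions: `mergeCustomTitle` and `resolvePlatformSource` are hypothetical names, and "explicit conflict" is taken to mean an incoming value that differs from the stored one. Only the `'claude'` default and the error-message pattern come from the spec above.

```typescript
// Hypothetical model of the session-field rules asserted by the sessions suite.

// custom_title: "" normalizes to null; a stored title is never overwritten,
// but a missing one is backfilled on an idempotent createSDKSession call.
function mergeCustomTitle(existing: string | null, incoming: string | null | undefined): string | null {
  const normalized = incoming === undefined || incoming === "" ? null : incoming;
  return existing ?? normalized;
}

const DEFAULT_PLATFORM_SOURCE = "claude";

// platform_source: new rows default to 'claude'; a legacy caller that omits the
// field keeps whatever is stored; an explicit, different value throws.
// Assumption: "explicit conflict" means incoming !== stored.
function resolvePlatformSource(existing: string | null, incoming: string | undefined): string {
  if (existing === null) return incoming ?? DEFAULT_PLATFORM_SOURCE;
  if (incoming === undefined || incoming === existing) return existing;
  throw new Error(`Platform source conflict: stored '${existing}', got '${incoming}'`);
}
```

Keeping the "omitted" case (`undefined`) distinct from an explicit value is what lets a legacy caller coexist with a non-default stored source without tripping the conflict error.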
tests/sqlite/session-store-prompts.test.ts
- `saveUserPrompt` → id>0, incrementing, distinct across sessions; prompt_text tag-stripped + bounded
- `findRecentDuplicateUserPrompt` finds dup in window (id/prompt_number/prompt_text)
- `getPromptNumberFromUserPrompts`: 0 when none; counts; session-isolated; handles 100 prompts
tests/sqlite/session-store-summaries.test.ts
- `storeSummary` → `{id>0, createdAtEpoch>0}`; all fields + `prompt_number` round-trip via `getSummaryForSession`
- `overrideTimestampEpoch` honored; default = now; null notes preserved
- `getSummaryForSession`: by memId; null when none; returns MOST RECENT when multiple
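"Most recent" needs a tie-break for the test to be deterministic, because two summaries can be written in the same millisecond. A sketch of the selection contract, assuming recency means highest `created_at_epoch` with ties going to the higher id. `latestSummary` and `SummaryRow` are hypothetical; SessionStore's actual ORDER BY may differ and should be checked before pinning a tie case in a test.

```typescript
// Hypothetical model of the getSummaryForSession contract: filter by memory
// session id, null when none match, else the most recent row.
// Assumption: "most recent" = highest created_at_epoch, ties broken by higher id.
interface SummaryRow {
  id: number;
  memory_session_id: string;
  created_at_epoch: number;
  notes: string | null;
}

function latestSummary(rows: readonly SummaryRow[], memorySessionId: string): SummaryRow | null {
  let best: SummaryRow | null = null;
  for (const row of rows) {
    if (row.memory_session_id !== memorySessionId) continue; // other sessions are invisible
    if (
      best === null ||
      row.created_at_epoch > best.created_at_epoch ||
      (row.created_at_epoch === best.created_at_epoch && row.id > best.id)
    ) {
      best = row;
    }
  }
  return best;
}
```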
tests/sqlite/session-store-transactions.test.ts — target sessionStore.storeObservations (the real path)
- stores N atomically → ids + null summaryId when no summary; correct `createdAtEpoch`
- all observations in a batch share one timestamp
- observations + summary together → summaryId non-null, summary retrievable
- empty observations array → 0 ids, null summary; summary-only → 0 ids, summaryId set
- `promptNumber` applied to all in batch
- DROP the old `storeObservationsAndMarkComplete` queue-delete/rollback tests: that path is dead (the worker completes via `SessionCompletionHandler`, not this function)
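The batch behaviors above (shared timestamp, one `promptNumber` for every row, identical inputs collapsing to one row with equal ids) can be seen together in a small in-memory model. This is a sketch, not SessionStore: it assumes the dedup key is `(memorySessionId, contentHash)` enforced by a UNIQUE index, and `BatchStoreModel`, `ObsInput`, and the `{ text }` summary shape are hypothetical. Atomicity is not modeled; the real method presumably relies on a single SQLite transaction.

```typescript
// Hypothetical in-memory model of the sessionStore.storeObservations contract.
// Assumption: the dedup key is (memorySessionId, contentHash), a UNIQUE index in SQLite.
interface ObsInput {
  contentHash: string;
  title: string;
}

interface ObsRow extends ObsInput {
  id: number;
  memorySessionId: string;
  createdAtEpoch: number;
  promptNumber: number | null;
}

interface BatchResult {
  observationIds: number[];
  summaryId: number | null;
  createdAtEpoch: number;
}

class BatchStoreModel {
  private readonly rows: ObsRow[] = [];
  private readonly uniqueIndex = new Map<string, number>(); // stands in for the UNIQUE index
  private nextObservationId = 1;
  private nextSummaryId = 1;

  storeObservations(
    memorySessionId: string,
    inputs: readonly ObsInput[],
    summary: { text: string } | null,
    promptNumber: number | null = null,
    now: number = Date.now(),
  ): BatchResult {
    const createdAtEpoch = now; // read once, so every row in the batch shares it
    const observationIds = inputs.map((input) => {
      const key = `${memorySessionId}\x00${input.contentHash}`;
      const existing = this.uniqueIndex.get(key);
      if (existing !== undefined) return existing; // duplicate: return the existing id
      const id = this.nextObservationId++;
      this.rows.push({ ...input, id, memorySessionId, createdAtEpoch, promptNumber });
      this.uniqueIndex.set(key, id);
      return id;
    });
    const summaryId = summary !== null ? this.nextSummaryId++ : null;
    return { observationIds, summaryId, createdAtEpoch };
  }

  get rowCount(): number {
    return this.rows.length;
  }

  promptNumbers(): Array<number | null> {
    return this.rows.map((r) => r.promptNumber);
  }
}
```

Because dedup is keyed on content rather than time, passing three identical inputs yields three equal ids but `rowCount === 1`, which is the "1 physical row" assertion.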
tests/sqlite/session-store-migrations.test.ts — seed legacy via new Database(':memory:') then new SessionStore(rawDb)
- legacy NULL `content_hash` rows → rewritten to `__null_migration_<id>__` (preserved), non-NULL dups deduped to one, `ux_observations_session_hash` UNIQUE index created (this is SessionStore's v29 `addObservationsUniqueContentHashIndex`; the existing data-integrity "Migration parity" test is the canonical source, so port it here)
- idempotency: constructing SessionStore twice over the same db → no throw, identical schema/version set, data unchanged
- fresh-DB init creates SessionStore's core tables (`schema_versions`, `sdk_sessions`, `observations`, `session_summaries`, `user_prompts`, `pending_messages`); assert SessionStore's tables, NOT MigrationRunner's server tables
- (optional) `PRAGMA foreign_key_list` shows `on_update=CASCADE`, `on_delete=CASCADE` on a fresh SessionStore db
- (optional) seed a legacy `pending_messages` with `retry_count`/`completed_at_epoch`/`worker_pid` → SessionStore drops them (v31/v32)
- DROP: MigrationRunner-only server tables (v33/34), specific cross-stack version-number lists, mig-24 drift, #979 old-DatabaseManager conflicts, crash-recovery `_new` temp tables
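The v29 assertions follow from one rule: NULL hashes get a per-row sentinel so the new UNIQUE index cannot collapse them, while real duplicate hashes collapse to one row. A sketch of that transformation on plain rows rather than SQL, assuming the unique key is `(memory_session_id, content_hash)` and that the lowest id survives a duplicate group. `migrateContentHashes` and `LegacyObs` are hypothetical names.

```typescript
// Hypothetical in-memory model of what the v29 migration test asserts.
// Assumptions: unique key = (memory_session_id, content_hash); lowest id wins.
interface LegacyObs {
  id: number;
  memory_session_id: string;
  content_hash: string | null;
}

function migrateContentHashes(rows: readonly LegacyObs[]): LegacyObs[] {
  const seen = new Set<string>();
  const kept: LegacyObs[] = [];
  // Process in id order so "first seen" means "lowest id".
  for (const row of [...rows].sort((a, b) => a.id - b.id)) {
    // NULL hashes become a per-row sentinel, so they are preserved rather than
    // treated as duplicates of each other.
    const hash = row.content_hash ?? `__null_migration_${row.id}__`;
    const key = `${row.memory_session_id}\x00${hash}`;
    if (seen.has(key)) continue; // non-NULL duplicate: drop it
    seen.add(key);
    kept.push({ ...row, content_hash: hash });
  }
  return kept;
}
```

Running it a second time over its own output changes nothing, since every hash is now non-NULL and unique per key; that is the same property the idempotency bullet checks by constructing SessionStore twice.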
tests/infrastructure/cleanup-v12_4_3.test.ts: REWRITE in place. Rewrite the `seedDatabase` helper to seed via `new SessionStore(dbPath)` + its methods (raw `store.db` INSERT for `pending_messages` only). All behaviors KEEP:
- missing DB → marker `skipped:'no-db'`, null `backupPath`, zero counts
- purges observer sessions (`OBSERVER_SESSIONS_PROJECT`) + cascade rows, purges stuck pending (COUNT>=10), wipes chroma dir + sync-state, writes backup; real-project rows survive
- pending preserved when stuck count < 10 (9 survives); idempotent (2nd run no-ops, no 2nd backup)
- proceeds on non-credible `statfsSync` (`bsize=0`) with a WARN containing `'non-credible'` and `{bsize:0}` (keep the spy assertion)
- honors `CLAUDE_MEM_SKIP_CLEANUP_V12_4_3=1` (exits, no marker, observer intact)
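Three of the cleanup behaviors are pure decisions that a rewritten test can pin at their boundaries (9 vs 10 stuck rows, env var set vs unset, `bsize` zero vs non-zero). A sketch of those decisions with hypothetical helper names; the real cleanup module's functions, log text, and where it applies the free-space result are not shown in this plan. Node's `fs.statfsSync` returns `bsize` and `bavail` fields, which is the input shape assumed here.

```typescript
// Hypothetical decision helpers mirroring behaviors the cleanup test pins down.
// Names and shapes are illustrative, not the cleanup module's API.
const STUCK_PENDING_THRESHOLD = 10; // spec: purge at COUNT >= 10; 9 survives

function shouldSkipCleanup(env: Record<string, string | undefined>): boolean {
  return env["CLAUDE_MEM_SKIP_CLEANUP_V12_4_3"] === "1";
}

function shouldPurgeStuckPending(stuckCount: number): boolean {
  return stuckCount >= STUCK_PENDING_THRESHOLD;
}

// With bsize=0, bsize * bavail is 0 free bytes, which a naive check would read
// as a full disk. Treat it as non-credible: warn, return null, and let the
// caller proceed without a free-space check.
function freeBytesOrNull(
  stats: { bsize: number; bavail: number },
  warn: (message: string, detail: { bsize: number }) => void,
): number | null {
  if (stats.bsize <= 0) {
    warn("statfs result is non-credible; skipping free-space check", { bsize: stats.bsize });
    return null;
  }
  return stats.bsize * stats.bavail;
}
```

Passing `warn` in as a parameter is what makes the "keep the spy assertion" bullet cheap: the test supplies a recording function instead of patching a global logger.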
Verify Phase 1: bun test tests/sqlite/session-store-*.test.ts tests/infrastructure/cleanup-v12_4_3.test.ts → all green against the current tree. New tests must import ONLY SessionStore + the 5 survivor leaf functions — grep them to confirm no import of a to-be-deleted barrel/transactions.ts.
Phase 2 — Delete the old tests coupled to the dead stack
rm tests/sqlite/observations.test.ts tests/sqlite/transactions.test.ts \
tests/sqlite/sessions.test.ts tests/sqlite/prompts.test.ts \
tests/sqlite/summaries.test.ts tests/sqlite/data-integrity.test.ts \
tests/services/sqlite/observations/store-subagent-label.test.ts \
tests/services/sqlite/migration-runner.test.ts \
tests/services/sqlite/schema-repair.test.ts
(schema-repair.test.ts has NO replacement — it tests ClaudeMemDatabase.openWithSchemaRepair, which the worker never uses.)
Verify: grep -rn "ClaudeMemDatabase\|MigrationRunner\|runAllMigrations\|sqlite/transactions\|sqlite/Sessions\|sqlite/Observations" tests/ → ZERO. bun test tests/ green (new suite carries the coverage).
Phase 3 — Delete the dead DB/migration stack
Gate:
grep -rn "from .*services/sqlite/Database" src/ --include=*.ts | grep -v "Database.ts:" # ZERO
grep -rn "services/sqlite/index" src/ tests/ --include=*.ts # ZERO
grep -rn "ensureServerStorageSchema" src/ --include=*.ts # 13 callers, none in runner.ts
rm src/services/sqlite/Database.ts src/services/sqlite/migrations/runner.ts src/services/sqlite/index.ts
rmdir src/services/sqlite/migrations
grep -rn "SchemaRepair\|openWithSchemaRepair" src/ --include=*.ts # expect ZERO after Database.ts gone
rm src/services/sqlite/SchemaRepair.ts # use actual path from grep; only if zero importers
Verify: bunx tsc --noEmit — no dangling imports.
Phase 4 — Delete the duplicate free-function CRUD API + trim survivors
Gate (per group, must be zero non-barrel/non-test importers):
grep -rn "sqlite/transactions\|sqlite/Sessions\|sqlite/Observations\|sqlite/Summaries\|sqlite/Prompts\|sqlite/Timeline\|sqlite/Import" src/ --include=*.ts
grep -rn "sessions/create\|sessions/get\|summaries/store\|summaries/get\|summaries/recent\|prompts/store\|timeline/queries\|import/bulk" src/ --include=*.ts
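Every gate in Phases 3 and 4 asks the same question: does any file outside the delete set import a file inside it? A sketch of that closure check over an already-extracted import graph (for example, built from the grep output above). `findLiveImporters` and `ImportGraph` are hypothetical; the plan itself uses grep, and this only makes the rule explicit: importers that are themselves being deleted do not block a deletion.

```typescript
// Hypothetical closure check behind the Phase 3/4 gates. The graph maps each
// file to the files it imports; tests removed in Phase 2 no longer appear in it.
type ImportGraph = ReadonlyMap<string, readonly string[]>;

// Returns "importer -> target" for every live file that imports a doomed file.
// An empty result means the delete set is closed and safe to remove.
function findLiveImporters(graph: ImportGraph, deleteSet: ReadonlySet<string>): string[] {
  const offenders: string[] = [];
  for (const [importer, targets] of graph) {
    if (deleteSet.has(importer)) continue; // dead importers (barrels, index.ts) don't count
    for (const target of targets) {
      if (deleteSet.has(target)) offenders.push(`${importer} -> ${target}`);
    }
  }
  return offenders.sort();
}
```

This is why a barrel like `Observations.ts` importing a survivor such as `observations/store.ts` is harmless, while a single production file importing a deleted submodule fails the gate.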
Delete (18 files):
rm src/services/sqlite/{Observations,Sessions,Summaries,Prompts,Timeline,Import}.ts
rm src/services/sqlite/transactions.ts
rm src/services/sqlite/sessions/{create,get,types}.ts
rm src/services/sqlite/summaries/{store,get,recent,types}.ts
rm src/services/sqlite/prompts/{store,types}.ts
rm src/services/sqlite/timeline/queries.ts src/services/sqlite/import/bulk.ts
rmdir src/services/sqlite/sessions src/services/sqlite/summaries src/services/sqlite/timeline src/services/sqlite/import
Trim the 5 survivor files to keep only the live export + remove their now-dead ./types.js imports:
observations/store.ts → keep computeObservationContentHash; delete storeObservation; drop ./types.js import
observations/files.ts → keep parseFileList; delete getFilesForSession; drop ./types.js import
observations/get.ts → keep getObservationsByFilePath; delete getObservationById/getObservationsByIds/getObservationsForSession; drop ./types.js import
observations/recent.ts → keep getFirstObservationCreatedAt; delete getRecentObservations/getAllRecentObservations; drop ./types.js import
prompts/get.ts → keep findRecentDuplicateUserPrompt; delete the other 7 prompt getters; drop ./types.js import (it uses LatestPromptResult from ../../../types/database.js)
Then observations/types.ts and prompts/types.ts should have no remaining consumer → grep and delete if zero.
Verify: bunx tsc --noEmit clean; bun test tests/sqlite/ green. Confirm the 5 live anchors still resolve (SessionStore.ts:15/16/18, ChromaSync.ts:7, DataRoutes.ts:17/18).
Phase 5 — Delete the dead SessionStore methods
Gate: grep -rn "\.getSessionSummaryById(\|\.storeObservationsAndMarkComplete(" src/ tests/ --include=*.ts → ZERO
(the standalone `sessions/get.ts:getSessionSummaryById` was already deleted in Phase 4, so the only remaining definition is the SessionStore method; confirm nothing else references either name).
Delete (higher range first, so removing it doesn't shift the lower range's line numbers):
- `SessionStore.ts` lines 2435-2476 (`getSessionSummaryById`)
- `SessionStore.ts` lines 2017-2143 (`storeObservationsAndMarkComplete`)

Verify: `bunx tsc --noEmit` clean; `bun test tests/` green.
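The "higher range first" rule generalizes to any multi-range deletion: applying ranges in descending start order means no deletion shifts a range not yet applied. A sketch with a hypothetical `deleteLineRanges` helper, assuming 1-indexed, inclusive, non-overlapping ranges. Sorting a copy internally means the caller's ordering no longer matters.

```typescript
// Hypothetical helper: delete 1-indexed, inclusive, non-overlapping line ranges.
// Ranges are applied from the highest start downward, so each deletion only
// affects lines after every range that is still pending.
function deleteLineRanges(lines: readonly string[], ranges: ReadonlyArray<[number, number]>): string[] {
  const out = [...lines];
  const ordered = [...ranges].sort((a, b) => b[0] - a[0]); // highest range first
  for (const [start, end] of ordered) {
    if (start < 1 || end < start || end > out.length) throw new Error(`bad range ${start}-${end}`);
    out.splice(start - 1, end - start + 1); // remove end - start + 1 lines
  }
  return out;
}
```

For the two Phase 5 ranges this removes (2476 − 2435 + 1) + (2143 − 2017 + 1) = 42 + 127 = 169 lines, matching the "SessionStore dead methods" row in the line accounting.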
Phase 6 — Final verification gate
- `npm run build-and-sync` succeeds; worker restarts.
- Worker boots + migrates a fresh DB (temp `~/.claude-mem` path); no errors in worker log.
- Worker migrates a seeded legacy v22 DB without error (reuse the Phase 1 migrations-test seed helper).
- `bun test`: full suite green.
- `bunx tsc --noEmit`: no dangling imports of `ClaudeMemDatabase`, `MigrationRunner`, `getDatabase`, `initializeDatabase`, `Database.ts`, `SchemaRepair`, any deleted barrel/submodule, or the two SessionStore methods.
- Dead-reference sweep: `grep -rn "ClaudeMemDatabase\|MigrationRunner\|sqlite/Database\|sqlite/index\|sqlite/transactions\|sqlite/migrations" src/ tests/ --include=*.ts` → ZERO.
Done when: all 6 pass; the only migration engine + CRUD API in the tree is SessionStore; the new
session-store-* suite covers the shipping behaviors; the 5 leaf helpers remain for their prod callers.
Line accounting (approx)
| Delete (src) | Lines |
|---|---|
| `migrations/runner.ts` | 1147 |
| `Database.ts` | 211 |
| `index.ts` | 22 |
| `SchemaRepair.ts` (orphaned) | ~? |
| free-fn CRUD API (18 files) | ~1,500 |
| survivor trims | ~150 |
| SessionStore dead methods | 169 |
| src subtotal | ~3,200 |
| Tests | Lines |
|---|---|
| old suites deleted (9 files) | ~2,340 |
| new session-store-* suites added | ~+1,000 |
| test net | ~−1,340 |
Net: ~−4,500 lines, zero production behavior change, real path now directly tested.
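The totals can be reproduced from the table rows. `SchemaRepair.ts` is left out because its size is unknown (`~?`), which is part of why the src subtotal is approximate:

$$
\begin{aligned}
\text{SessionStore methods} &= (2476 - 2435 + 1) + (2143 - 2017 + 1) = 42 + 127 = 169 \\
\text{src} &\approx 1147 + 211 + 22 + 1500 + 150 + 169 = 3199 \approx 3200 \quad (\text{excluding } \texttt{SchemaRepair.ts}) \\
\text{tests} &\approx 2340 - 1000 = 1340 \\
\text{net} &\approx 3200 + 1340 = 4540 \approx 4500 \text{ lines removed}
\end{aligned}
$$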
Execution order rationale
New tests first (Phase 1) → they pass on the current tree and guard every subsequent deletion. Old tests next (Phase 2) → nothing then references the duplicate. Then peel the duplicate in dependency order: DB stack (3) → free-fn API + survivor trim (4) → dead methods (5) → gate (6). The suite stays green at every step.