fix: sequoia-territory bug-fix bundle (chroma, env, build, MCP, worker) (#2394)

* fix(mcp): drop ${_R%/} parameter-expansion trim that trips Claude Code MCP validator

The POSIX suffix-removal expansion ${_R%/} is misread by Claude Code's
MCP-config validator as a required env var named "_R%/", causing /doctor
to flag mcp-search as invalid on every install. POSIX path resolution
treats consecutive slashes as a single slash, so the trim was cosmetic.
Drop it and the validator passes.
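
As an illustration of the failure mode, here is a hypothetical scanner (not Claude Code's actual validator, whose implementation is not shown in this commit) that treats everything between `${` and either `:-` or `}` as a variable name. The function name `extractEnvRefs` and its regex are assumptions made for this sketch:

```typescript
// Hypothetical scanner, for illustration only. It is NOT Claude Code's validator.
function extractEnvRefs(command: string): string[] {
  const refs: string[] = [];
  // Accepts ${NAME} and ${NAME:-default}; any other text before "}" is taken as the name.
  const pattern = /\$\{([^}:]+)(?::-[^}]*)?\}/g;
  for (const match of command.matchAll(pattern)) {
    refs.push(match[1]);
  }
  return refs;
}

// Shortened fragments of the shell prelude before and after the fix.
const before = '_R="${_R%/}"; [ -d "$_R/plugin/scripts" ]';
const after = '[ -d "$_R/plugin/scripts" ]';

console.log(extractEnvRefs(before)); // prints [ '_R%/' ]
console.log(extractEnvRefs(after)); // prints []
```

A scanner of this shape cannot tell a suffix-removal operator from a variable name, which is consistent with the reported "_R%/" env var.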

Fixes #2350, #2354, #2356.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(env): block ANTHROPIC_BASE_URL leak + three-branch OAuth-skip predicate

Issue #2375: parent-shell ANTHROPIC_BASE_URL leaked through to subprocess
isolatedEnv, while ANTHROPIC_AUTH_TOKEN was blocked. The OAuth-skip
predicate fired on bare BASE_URL, but no auth credential reached the
subprocess -> "Not logged in". Add ANTHROPIC_BASE_URL to BLOCKED_ENV_VARS
so it can only enter isolatedEnv via ~/.claude-mem/.env.
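
A minimal sketch of the blocklist behaviour described above, assuming a simplified two-entry blocklist and a hypothetical `buildIsolatedEnv` helper (the real EnvManager internals are not shown in this commit):

```typescript
// Hypothetical subset of the blocklist; the real BLOCKED_ENV_VARS may hold more entries.
const BLOCKED_ENV_VARS: ReadonlySet<string> = new Set([
  "ANTHROPIC_AUTH_TOKEN",
  "ANTHROPIC_BASE_URL",
]);

type Env = Record<string, string | undefined>;

// Copy the parent environment minus blocked keys, then overlay values read from the .env file.
function buildIsolatedEnv(parentEnv: Env, dotEnvValues: Record<string, string>): Record<string, string> {
  const isolated: Record<string, string> = {};
  for (const [key, value] of Object.entries(parentEnv)) {
    if (value !== undefined && !BLOCKED_ENV_VARS.has(key)) {
      isolated[key] = value;
    }
  }
  return { ...isolated, ...dotEnvValues };
}

const isolatedEnv = buildIsolatedEnv(
  { PATH: "/usr/bin", ANTHROPIC_BASE_URL: "https://parent-shell.example" },
  {},
);
// isolatedEnv.ANTHROPIC_BASE_URL is undefined: the parent-shell value was dropped.
```

With this shape, a blocked variable reaches the subprocess only when it is present in the .env overlay.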

Replace the OAuth-skip predicate with three branches to prevent a
second-order security regression: a user with a tokenless gateway
configured in .env (BASE_URL only, no token) would otherwise have their
Anthropic OAuth token fetched and sent to their gateway, leaking the
token to a third party. Three-branch predicate:

1. BASE_URL set -> return without OAuth (custom gateway, never leak token)
2. API_KEY or AUTH_TOKEN set -> return without OAuth (explicit credentials)
3. Otherwise -> OAuth lookup for api.anthropic.com
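
The three branches above can be sketched as a pure function. The name `decideAuthPath`, the return labels, and the full variable name `ANTHROPIC_API_KEY` are assumptions for illustration; the actual predicate lives in EnvManager and may differ in shape:

```typescript
type AuthPath = "custom_gateway_no_oauth" | "explicit_credentials_no_oauth" | "oauth_lookup";

// `env` stands for the isolated environment after the blocklist and .env overlay are applied.
function decideAuthPath(env: Record<string, string | undefined>): AuthPath {
  // Branch 1: a custom gateway is configured, so never fetch (and never leak) the OAuth token.
  if (env.ANTHROPIC_BASE_URL) return "custom_gateway_no_oauth";
  // Branch 2: explicit credentials are present, so OAuth is unnecessary.
  if (env.ANTHROPIC_API_KEY || env.ANTHROPIC_AUTH_TOKEN) return "explicit_credentials_no_oauth";
  // Branch 3: nothing configured, fall back to the OAuth lookup for api.anthropic.com.
  return "oauth_lookup";
}

console.log(decideAuthPath({ ANTHROPIC_BASE_URL: "https://gw.example" })); // prints custom_gateway_no_oauth
console.log(decideAuthPath({})); // prints oauth_lookup
```

Checking BASE_URL first is the part that closes the leak: a tokenless gateway returns at branch 1 before any OAuth token is fetched.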

Adds tests/env-isolation.test.ts.

Fixes #2375.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(worker): classify Claude SDK HTTP 400 as unrecoverable

ClaudeProvider previously had no explicit HTTP 400 handling: the
default branch classified all errors as `transient`, so a permanent
400 (e.g., model rejecting an `effort` parameter forwarded from a
leaked CLAUDE_CODE_EFFORT_LEVEL) would be retried indefinitely
(1874+ retries observed in one session, per #2357).

Mirror GeminiProvider/OpenRouterProvider's pattern: classify 400 as
`unrecoverable`, 401/403 as `auth_invalid`, 429 as `rate_limit`,
default to `transient`. When the 400 body matches the
"effort parameter" signature, emit a one-time SDK warn log pointing
at the env-leak fix in ~/.claude-mem/.env.
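
A minimal sketch of the status-to-class mapping described above. The function name `classifyHttpStatus` and the `ErrorClass` type are hypothetical; only the four class labels and their status codes come from this commit:

```typescript
type ErrorClass = "unrecoverable" | "auth_invalid" | "rate_limit" | "transient";

// Map an HTTP status (if any) to a retry class; unknown statuses stay retryable.
function classifyHttpStatus(status: number | undefined): ErrorClass {
  switch (status) {
    case 400:
      return "unrecoverable"; // a permanent request error: retrying cannot succeed
    case 401:
    case 403:
      return "auth_invalid";
    case 429:
      return "rate_limit";
    default:
      return "transient";
  }
}

console.log(classifyHttpStatus(400)); // prints unrecoverable
console.log(classifyHttpStatus(503)); // prints transient
```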

Adds tests/claude-provider-error-classifier.test.ts.

Fixes #2357.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroma): pin onnxruntime>=1.20 + protobuf<7 to fix INVALID_PROTOBUF on macOS arm64

The shipped all-MiniLM-L6-v2 model has pytorch-2.0 IR. chroma-mcp 0.2.6
transitively depends on `chromadb>=1.0.16` which only requires
`onnxruntime>=1.14.1` — uv can therefore resolve to an onnxruntime old
enough to fail every embedding add with `[ONNXRuntimeError] : 7 :
INVALID_PROTOBUF` on macOS arm64 / Python 3.13. Semantic search silently
degraded to FTS-only and smart backfill broke (#2371).

Path B (override) was required because chroma-mcp 0.2.6 is the latest
PyPI release — no upstream bump exists.

Inject `--with onnxruntime>=1.20 --with protobuf<7` into the uvx spawn
args (both persistent and remote modes). The protobuf cap is essential:
forcing only `onnxruntime>=1.20` causes uv to re-resolve and land on
protobuf 7.x, which trips opentelemetry's `_pb2` stubs with `TypeError:
Descriptors cannot be created directly` because they were generated
with protoc <3.19. Capping below 7 lands on protobuf 6.x which
opentelemetry tolerates.
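
A short sketch of how the override list expands into uvx arguments, mirroring the `flatMap` used in ChromaMcpManager. The `uvxArgs` array is illustrative and shorter than the manager's real argument list:

```typescript
const CHROMA_MCP_DEP_OVERRIDES: ReadonlyArray<string> = ["onnxruntime>=1.20", "protobuf<7"];

// Each override becomes its own `--with <spec>` pair.
const depOverrideFlags = CHROMA_MCP_DEP_OVERRIDES.flatMap((spec) => ["--with", spec]);

// Illustrative argument list; the overrides must precede the package spec.
const uvxArgs = ["--python", "3.13", ...depOverrideFlags, "chroma-mcp==0.2.6", "--help"];

console.log(uvxArgs.join(" "));
// prints: --python 3.13 --with onnxruntime>=1.20 --with protobuf<7 chroma-mcp==0.2.6 --help
```

Passing the arguments as an array (rather than a shell string) means the `>` and `<` characters in the specifiers need no quoting.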

Verified end-to-end: ONNX model loads, embeddings produce a 384-dim
vector, PersistentClient init / add / query roundtrip succeeds:

    uvx --python 3.13 --with "onnxruntime>=1.20" --with "protobuf<7" \
        chroma-mcp==0.2.6 --help     # clean
    # programmatic test: onnxruntime 1.26.0, protobuf 6.33.6,
    # embedding ok 384, query ok ids=[['1']]

Fixes #2371.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroma): enforce single chroma-mcp subprocess per worker (#2313)

Root cause: every reconnect path in ChromaMcpManager — connectInternal's
re-entry, the connect-timeout catch, callTool's transport-error retry, and
the transport.onclose handler — used to abandon `this.transport`/`this.client`
by calling at most `transport.close()` and nulling the handles. The MCP SDK's
StdioClientTransport.close() only signals the direct child (uvx); on Linux the
grandchildren (uv -> python -> chroma-mcp) re-parent to init and survive
because the SDK does not put the subprocess in its own process group. Each
reconnect therefore leaked a full chroma-mcp tree, accumulating 20+ instances
per session.

Fix: introduce a private disposeCurrentSubprocess() helper that always tree-
kills via the existing killProcessTree primitive before nulling the transport
reference, and route every "abandon current transport" path (reconnect,
connect-timeout, transport error, onclose, stop) through it. The existing
`connecting: Promise<void> | null` lock continues to serialize concurrent
ensureConnected() callers into a single spawn.
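
The promise lock can be sketched in isolation. This is a simplified stand-in, assuming a hypothetical `SingleSpawnConnector` class with a counter in place of the real uvx spawn; it omits the tree-kill and transport handling of ChromaMcpManager:

```typescript
class SingleSpawnConnector {
  private connecting: Promise<void> | null = null;
  private connected = false;
  spawnCount = 0; // stands in for "number of subprocesses spawned"

  async ensureConnected(): Promise<void> {
    if (this.connected) return;
    if (!this.connecting) {
      // First caller starts the connect; later callers reuse the same promise.
      this.connecting = this.connectInternal().finally(() => {
        this.connecting = null;
      });
    }
    return this.connecting;
  }

  private async connectInternal(): Promise<void> {
    this.spawnCount++; // placeholder for the real subprocess spawn
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    this.connected = true;
  }
}

const connector = new SingleSpawnConnector();
const pending = Array.from({ length: 5 }, () => connector.ensureConnected());
void Promise.all(pending).then(() => console.log(connector.spawnCount)); // prints 1
```

Because the lock is assigned synchronously before the first `await`, callers that arrive in the same tick cannot race past it.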

Adds tests/services/sync/chroma-mcp-manager-singleton.test.ts covering:
- 5 parallel ensureConnected() calls produce exactly one spawn
- a transport-error reconnect tree-kills the prior subprocess pid before
  spawning a replacement
- stop() disposes state including any pending connecting promise

Manual verification needed on Linux: after a long session with multiple
tool uses, `ps aux | grep chroma-mcp | wc -l` should return 1, not 20+.

Fixes #2313.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build): polyfill import.meta.url to __filename in CJS worker bundle

The worker bundles ESM dependencies (notably @anthropic-ai/claude-agent-sdk's
*.mjs files) into CJS output. Those modules call createRequire(import.meta.url)
at module-load time. esbuild's CJS output left this as createRequire(ute.url)
— where `ute` is its `import.meta` polyfill `{}` — so `ute.url` was undefined
and module-load crashed with:

  TypeError: The argument 'filename' must be a file URL object, file URL
  string, or absolute path string. Received undefined
  code: ERR_INVALID_ARG_VALUE

Every Stop hook and every worker subprocess invocation hit this. Fix is the
esbuild `define` option mapping `import.meta.url` to `__filename` (provided as
a real absolute path by the existing CJS prelude in the banner).
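
A small reproduction of the crash, assuming Node-compatible `createRequire` behaviour. `emptyImportMeta` stands in for esbuild's `{}` polyfill, and `bundlePath` is an illustrative path that need not exist:

```typescript
import { createRequire } from "node:module";
import { resolve } from "node:path";

// What the unpatched bundle effectively evaluated: import.meta became {}, so .url was undefined.
const emptyImportMeta: { url?: string } = {};

let loadError: unknown = null;
try {
  createRequire(emptyImportMeta.url as string);
} catch (error) {
  loadError = error; // the module-load failure quoted above
}

// With an absolute path, which is what __filename provides in CJS, createRequire succeeds.
const bundlePath = resolve("worker-service.cjs");
const requireFromBundle = createRequire(bundlePath);

console.log(loadError instanceof Error, typeof requireFromBundle);
```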

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: daily dep bump per CLAUDE.md maintenance policy

Root: @anthropic-ai/claude-agent-sdk, @clack/prompts, @types/node,
dompurify, postcss, react, react-dom, yaml, zod.
plugin/: tree-sitter-cli, zod.
openclaw/: @types/node.

All patch/minor bumps; no major version changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* build: regenerate plugin artifacts after env/chroma/mcp fixes

Built artifacts are committed so the marketplace-installable plugin
ships with the runtime bundles. Picks up:
- d7b145e9 .mcp.json shell-prelude trim drop
- a8cbd651 EnvManager BASE_URL block + 3-branch predicate
- 8cb73b8c ClaudeProvider HTTP 400 unrecoverable classifier
- ecd5b802 ChromaMcpManager onnxruntime/protobuf overrides
- c79324ea ChromaMcpManager singleton enforcement
- e8376f46 esbuild import.meta.url -> __filename polyfill
- a7541d71 daily dep bump

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* build: regenerate plugin artifacts after main merge

Bundles now include both v13.0.0 server-beta runtime (server-beta-service.cjs
+ updated mcp-server.cjs / worker-service.cjs) and this branch's chroma /
env / build / Claude SDK fixes.

Verified: bun test tests/env-isolation.test.ts \
  tests/claude-provider-error-classifier.test.ts \
  tests/services/sync/chroma-mcp-manager-singleton.test.ts
→ 13/13 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): address CodeRabbit findings on PR #2394

1. scripts/build-hooks.js — `import.meta.url` now maps to a file:// URL
   (via pathToFileURL(__filename).href in the CJS banner) instead of the
   raw __filename path. Preserves URL semantics for any bundled ESM dep
   that does `new URL(rel, import.meta.url)`. createRequire still works.

2. src/shared/EnvManager.ts — added envFilePath() that resolves
   CLAUDE_MEM_ENV_FILE lazily (falling back to paths.envFile()), and
   switched internal load/save call sites to use it. ENV_FILE_PATH is
   kept as a deprecated snapshot for back-compat. Lets tests target a
   temp file without depending on module-load order.

3. tests/env-isolation.test.ts — redirects to a temp dir via
   CLAUDE_MEM_ENV_FILE in beforeAll, removes all mutation of the real
   ~/.claude-mem/.env, and wraps the OAuth-spy assertion in try/finally
   so the spy is always restored even if the test fails.
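
To make finding 1 concrete, here is a sketch of why a file:// URL is the safer value for `import.meta.url`. The path `/opt/claude-mem/worker-service.cjs` is an illustrative assumption, not a path used by the build:

```typescript
import { createRequire } from "node:module";
import { pathToFileURL } from "node:url";

const bundlePath = "/opt/claude-mem/worker-service.cjs"; // illustrative absolute path
const importMetaUrl = pathToFileURL(bundlePath).href;

// A raw filesystem path is not a valid base URL, so relative resolution throws.
let rawPathFailed = false;
try {
  new URL("./data.json", bundlePath);
} catch {
  rawPathFailed = true;
}

// With a file:// base, relative resolution works as ESM code expects.
const sibling = new URL("./data.json", importMetaUrl);

// createRequire accepts a file URL string as well as an absolute path.
const requireFromBundle = createRequire(importMetaUrl);

console.log(rawPathFailed, sibling.href); // on POSIX: true file:///opt/claude-mem/data.json
```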

Verified:
  bun test tests/env-isolation.test.ts \
    tests/claude-provider-error-classifier.test.ts \
    tests/services/sync/chroma-mcp-manager-singleton.test.ts
  → 13/13 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Alex Newman
2026-05-09 18:05:48 -07:00
committed by GitHub
parent 13d5fa71c2
commit 5533412984
14 changed files with 1064 additions and 417 deletions

View File

@@ -5,7 +5,7 @@
"command": "sh",
"args": [
"-c",
"_C=\"${CLAUDE_CONFIG_DIR:-$HOME/.claude}\"; _E=\"${CLAUDE_PLUGIN_ROOT:-${PLUGIN_ROOT:-}}\"; _P=$({ [ -n \"$_E\" ] && printf '%s\\n' \"$_E\"; printf '%s\\n' \"$PWD/plugin\" \"$PWD\"; ls -dt \"$HOME/.codex/plugins/cache/claude-mem-local/claude-mem\"/[0-9]*/ \"$HOME/.codex/plugins/cache/thedotmack/claude-mem\"/[0-9]*/ \"$_C/plugins/cache/thedotmack/claude-mem\"/[0-9]*/ 2>/dev/null; printf '%s\\n' \"$_C/plugins/marketplaces/thedotmack/plugin\"; } | while IFS= read -r _R; do _R=\"${_R%/}\"; [ -d \"$_R/plugin/scripts\" ] && _Q=\"$_R/plugin\" || _Q=\"$_R\"; [ -f \"$_Q/scripts/mcp-server.cjs\" ] && { printf '%s\\n' \"$_Q\"; break; }; done); [ -n \"$_P\" ] || { echo \"claude-mem: mcp server not found\" >&2; exit 1; }; exec node \"$_P/scripts/mcp-server.cjs\""
"_C=\"${CLAUDE_CONFIG_DIR:-$HOME/.claude}\"; _E=\"${CLAUDE_PLUGIN_ROOT:-${PLUGIN_ROOT:-}}\"; _P=$({ [ -n \"$_E\" ] && printf '%s\\n' \"$_E\"; printf '%s\\n' \"$PWD/plugin\" \"$PWD\"; ls -dt \"$HOME/.codex/plugins/cache/claude-mem-local/claude-mem\"/[0-9]*/ \"$HOME/.codex/plugins/cache/thedotmack/claude-mem\"/[0-9]*/ \"$_C/plugins/cache/thedotmack/claude-mem\"/[0-9]*/ 2>/dev/null; printf '%s\\n' \"$_C/plugins/marketplaces/thedotmack/plugin\"; } | while IFS= read -r _R; do [ -d \"$_R/plugin/scripts\" ] && _Q=\"$_R/plugin\" || _Q=\"$_R\"; [ -f \"$_Q/scripts/mcp-server.cjs\" ] && { printf '%s\\n' \"$_Q\"; break; }; done); [ -n \"$_P\" ] || { echo \"claude-mem: mcp server not found\" >&2; exit 1; }; exec node \"$_P/scripts/mcp-server.cjs\""
]
}
}

View File

@@ -10,7 +10,7 @@
"test": "tsc && node --test dist/index.test.js"
},
"devDependencies": {
"@types/node": "^25.6.0",
"@types/node": "^25.6.2",
"typescript": "^6.0.3"
},
"openclaw": {

View File

@@ -123,26 +123,26 @@
"2fa": false
},
"dependencies": {
"@anthropic-ai/claude-agent-sdk": "^0.2.119",
"@anthropic-ai/claude-agent-sdk": "^0.2.138",
"@better-auth/api-key": "^1.6.9",
"@clack/prompts": "^1.2.0",
"@clack/prompts": "^1.3.0",
"@modelcontextprotocol/sdk": "^1.29.0",
"ansi-to-html": "^0.7.2",
"better-auth": "^1.6.9",
"bullmq": "^5.76.6",
"cors": "^2.8.6",
"dompurify": "^3.4.1",
"dompurify": "^3.4.2",
"express": "^5.2.1",
"glob": "^13.0.6",
"handlebars": "^4.7.9",
"ioredis": "^5.10.1",
"pg": "^8.20.0",
"picocolors": "^1.1.1",
"react": "^19.2.5",
"react-dom": "^19.2.5",
"react": "^19.2.6",
"react-dom": "^19.2.6",
"shell-quote": "^1.8.3",
"yaml": "^2.8.3",
"zod": "^4.3.6",
"yaml": "^2.8.4",
"zod": "^4.4.3",
"zod-to-json-schema": "^3.25.2"
},
"devDependencies": {
@@ -156,7 +156,7 @@
"@types/cors": "^2.8.19",
"@types/dompurify": "^3.2.0",
"@types/express": "^5.0.6",
"@types/node": "^25.6.0",
"@types/node": "^25.6.2",
"@types/pg": "^8.20.0",
"@types/react": "^19.2.14",
"@types/react-dom": "^19.2.3",
@@ -164,7 +164,7 @@
"jimp": "^1.6.1",
"np": "^11.2.0",
"parse5": "^8.0.1",
"postcss": "^8.5.13",
"postcss": "^8.5.14",
"remark-mdx": "^3.1.1",
"remark-parse": "^11.0.0",
"tree-sitter-bash": "^0.25.1",

View File

@@ -5,7 +5,7 @@
"command": "sh",
"args": [
"-c",
"_C=\"${CLAUDE_CONFIG_DIR:-$HOME/.claude}\"; _E=\"${CLAUDE_PLUGIN_ROOT:-${PLUGIN_ROOT:-}}\"; _P=$({ [ -n \"$_E\" ] && printf '%s\\n' \"$_E\"; printf '%s\\n' \"$PWD/plugin\" \"$PWD\"; ls -dt \"$HOME/.codex/plugins/cache/claude-mem-local/claude-mem\"/[0-9]*/ \"$HOME/.codex/plugins/cache/thedotmack/claude-mem\"/[0-9]*/ \"$_C/plugins/cache/thedotmack/claude-mem\"/[0-9]*/ 2>/dev/null; printf '%s\\n' \"$_C/plugins/marketplaces/thedotmack/plugin\"; } | while IFS= read -r _R; do _R=\"${_R%/}\"; [ -d \"$_R/plugin/scripts\" ] && _Q=\"$_R/plugin\" || _Q=\"$_R\"; [ -f \"$_Q/scripts/mcp-server.cjs\" ] && { printf '%s\\n' \"$_Q\"; break; }; done); [ -n \"$_P\" ] || { echo \"claude-mem: mcp server not found\" >&2; exit 1; }; exec node \"$_P/scripts/mcp-server.cjs\""
"_C=\"${CLAUDE_CONFIG_DIR:-$HOME/.claude}\"; _E=\"${CLAUDE_PLUGIN_ROOT:-${PLUGIN_ROOT:-}}\"; _P=$({ [ -n \"$_E\" ] && printf '%s\\n' \"$_E\"; printf '%s\\n' \"$PWD/plugin\" \"$PWD\"; ls -dt \"$HOME/.codex/plugins/cache/claude-mem-local/claude-mem\"/[0-9]*/ \"$HOME/.codex/plugins/cache/thedotmack/claude-mem\"/[0-9]*/ \"$_C/plugins/cache/thedotmack/claude-mem\"/[0-9]*/ 2>/dev/null; printf '%s\\n' \"$_C/plugins/marketplaces/thedotmack/plugin\"; } | while IFS= read -r _R; do [ -d \"$_R/plugin/scripts\" ] && _Q=\"$_R/plugin\" || _Q=\"$_R\"; [ -f \"$_Q/scripts/mcp-server.cjs\" ] && { printf '%s\\n' \"$_Q\"; break; }; done); [ -n \"$_P\" ] || { echo \"claude-mem: mcp server not found\" >&2; exit 1; }; exec node \"$_P/scripts/mcp-server.cjs\""
]
}
}

View File

@@ -212,7 +212,7 @@ ${f}`}let a=s.lineStart;for(let u=s.lineStart-1;u>=0;u--){let l=i[u].trim();if(l
${c}`}var u_=new Set([".js",".jsx",".ts",".tsx",".mjs",".cjs",".py",".pyw",".go",".rs",".rb",".java",".cs",".cpp",".cc",".cxx",".c",".h",".hpp",".hh",".swift",".kt",".kts",".php",".vue",".svelte",".ex",".exs",".lua",".scala",".sc",".sh",".bash",".zsh",".hs",".zig",".css",".scss",".toml",".yml",".yaml",".sql",".md",".mdx"]),Gx=new Set(["node_modules",".git","dist","build",".next","__pycache__",".venv","venv","env",".env","target","vendor",".cache",".turbo","coverage",".nyc_output",".claude",".smart-file-read"]),Kx=512*1024;async function*l_(t,e,r=20,n){if(r<=0)return;let o;try{o=await(0,Ar.readdir)(t,{withFileTypes:!0})}catch(s){S.debug("WORKER",`walkDir: failed to read directory ${t}`,void 0,s instanceof Error?s:void 0);return}for(let s of o){if(s.name.startsWith(".")&&s.name!=="."||Gx.has(s.name))continue;let i=(0,Vn.join)(t,s.name);if(s.isDirectory())yield*l_(i,e,r-1,n);else if(s.isFile()){let a=s.name.slice(s.name.lastIndexOf("."));(u_.has(a)||n&&n.has(a))&&(yield i)}}}async function Jx(t){try{let e=await(0,Ar.stat)(t);if(e.size>Kx||e.size===0)return null;let r=await(0,Ar.readFile)(t,"utf-8");return r.slice(0,1e3).includes("\0")?null:r}catch(e){return S.debug("WORKER",`safeReadFile: failed to read ${t}`,void 0,e instanceof Error?e:void 0),null}}async function d_(t,e,r={}){let n=r.maxResults||20,o=e.toLowerCase(),s=o.split(/[\s_\-./]+/).filter(w=>w.length>0),i=r.projectRoot||t,a=Wn(i),c=new Set;for(let w of Object.values(a.grammars))for(let v of w.extensions)u_.has(v)||c.add(v);let u=[];for await(let w of l_(t,t,20,c.size>0?c:void 0)){if(r.filePattern&&!(0,Vn.relative)(t,w).toLowerCase().includes(r.filePattern.toLowerCase()))continue;let v=await Jx(w);v&&u.push({absolutePath:w,relativePath:(0,Vn.relative)(t,w),content:v})}let l=i_(u,i),d=[],p=[],f=0;for(let[w,v]of l){f+=Bx(v);let k=Ps(w.toLowerCase(),s)>0,_e=[],Ee=(Dt,Qt)=>{for(let ae of Dt){let St=0,Ve="",Cr=Ps(ae.name.toLowerCase(),s);Cr>0&&(St+=Cr*3,Ve="name 
match"),ae.signature.toLowerCase().includes(o)&&(St+=2,Ve=Ve?`${Ve} + signature`:"signature match"),ae.jsdoc&&ae.jsdoc.toLowerCase().includes(o)&&(St+=1,Ve=Ve?`${Ve} + jsdoc`:"jsdoc match"),St>0&&(k=!0,_e.push({filePath:w,symbolName:Qt?`${Qt}.${ae.name}`:ae.name,kind:ae.kind,signature:ae.signature,jsdoc:ae.jsdoc,lineStart:ae.lineStart,lineEnd:ae.lineEnd,matchReason:Ve})),ae.children&&Ee(ae.children,ae.name)}};Ee(v.symbols),k&&(d.push(v),p.push(..._e))}p.sort((w,v)=>{let x=Ps(w.symbolName.toLowerCase(),s);return Ps(v.symbolName.toLowerCase(),s)-x});let m=p.slice(0,n),_=new Set(m.map(w=>w.filePath)),y=d.filter(w=>_.has(w.filePath)).slice(0,n),b=y.reduce((w,v)=>w+v.foldedTokenEstimate,0);return{foldedFiles:y,matchingSymbols:m,totalFilesScanned:u.length,totalSymbolsFound:f,tokenEstimate:b}}function Ps(t,e){let r=0;for(let n of e)if(t===n)r+=10;else if(t.includes(n))r+=5;else{let o=0,s=0;for(let i of n){let a=t.indexOf(i,o);a!==-1&&(s++,o=a+1)}s===n.length&&(r+=1)}return r}function Bx(t){let e=t.symbols.length;for(let r of t.symbols)r.children&&(e+=r.children.length);return e}function p_(t,e){let r=[];if(r.push(`\u{1F50D} Smart Search: "${e}"`),r.push(` Scanned ${t.totalFilesScanned} files, found ${t.totalSymbolsFound} symbols`),r.push(` ${t.matchingSymbols.length} matches across ${t.foldedFiles.length} files (~${t.tokenEstimate} tokens for folded view)`),r.push(""),t.matchingSymbols.length===0)return r.push(" No matching symbols found."),r.join(`
`);r.push("\u2500\u2500 Matching Symbols \u2500\u2500"),r.push("");for(let n of t.matchingSymbols){if(r.push(` ${n.kind} ${n.symbolName} (${n.filePath}:${n.lineStart+1})`),r.push(` ${n.signature}`),n.jsdoc){let o=n.jsdoc.split(`
`).find(s=>s.replace(/^[\s*/]+/,"").trim().length>0);o&&r.push(` \u{1F4AC} ${o.replace(/^[\s*/]+/,"").trim()}`)}r.push("")}r.push("\u2500\u2500 Folded File Views \u2500\u2500"),r.push("");for(let n of t.foldedFiles)r.push(Ir(n)),r.push("");return r.push("\u2500\u2500 Actions \u2500\u2500"),r.push(" To see full implementation: use smart_unfold with file path and symbol name"),r.join(`
`)}var iu=require("node:fs/promises"),zs=require("node:fs"),Qe=require("node:path"),h_=require("node:os"),g_=require("node:url"),cP={},Yx="12.7.5";console.log=(...t)=>{S.error("CONSOLE","Intercepted console output (MCP protocol protection)",void 0,{args:t})};var __=!1,y_=(()=>{if(typeof __dirname<"u")return __dirname;try{return(0,Qe.dirname)((0,g_.fileURLToPath)(cP.url))}catch{return __=!0,process.cwd()}})(),au=(0,Qe.resolve)(y_,"worker-service.cjs");function Xx(){__&&((0,zs.existsSync)(au)||S.error("SYSTEM","mcp-server: dirname resolution failed (both __dirname and import.meta.url are unavailable). Fell back to process.cwd() and the resolved WORKER_SCRIPT_PATH does not exist. This is the actual problem \u2014 the worker bundle is fine, but mcp-server cannot locate it. Worker auto-start will fail until the dirname-resolution path is fixed.",{workerScriptPath:au,mcpServerDir:y_}))}var f_={search:"/api/search",timeline:"/api/timeline"};async function su(t,e){S.debug("SYSTEM","\u2192 Worker API",void 0,{endpoint:t,params:e});let r=new URLSearchParams;for(let[o,s]of Object.entries(e))s!=null&&r.append(o,String(s));let n=`${t}?${r}`;try{let o=await $s(n);if(!o.ok){let i=await o.text();throw new Error(`Worker API error (${o.status}): ${i}`)}let s=await o.json();return S.debug("SYSTEM","\u2190 Worker API success",void 0,{endpoint:t}),s}catch(o){return S.error("SYSTEM","\u2190 Worker API error",{endpoint:t},o instanceof Error?o:new Error(String(o))),{content:[{type:"text",text:`Error calling Worker API: ${o instanceof Error?o.message:String(o)}`}],isError:!0}}}async function Qx(t,e){let r=await $s(t,{method:"POST",headers:{"Content-Type":"application/json"},body:JSON.stringify(e)});if(!r.ok){let o=await r.text();throw new Error(`Worker API error (${r.status}): ${o}`)}let n=await r.json();return S.debug("HTTP","Worker API success (POST)",void 0,{endpoint:t}),{content:[{type:"text",text:JSON.stringify(n,null,2)}]}}async function Mr(t,e){S.debug("HTTP","Worker API request 
(POST)",void 0,{endpoint:t});try{return await Qx(t,e)}catch(r){return S.error("HTTP","Worker API error (POST)",{endpoint:t},r instanceof Error?r:new Error(String(r))),{content:[{type:"text",text:`Error calling Worker API: ${r instanceof Error?r.message:String(r)}`}],isError:!0}}}async function eP(){try{return(await $s("/api/health")).ok}catch(t){return S.debug("SYSTEM","Worker health check failed",{},t instanceof Error?t:new Error(String(t))),!1}}async function tP(){if(await eP())return!0;S.warn("SYSTEM","Worker not available, attempting auto-start for MCP client"),Xx();try{let t=Jc(),e=await Kg(t,au);return e==="dead"&&S.error("SYSTEM","Worker auto-start failed \u2014 MCP tools that require the worker (search, timeline, get_observations) will fail until the worker is running. Check earlier log lines for the specific failure reason (Bun not found, missing worker bundle, port conflict, etc.)."),e!=="dead"}catch(t){return S.error("SYSTEM","Worker auto-start threw \u2014 MCP tools that require the worker (search, timeline, get_observations) will fail until the worker is running.",void 0,t instanceof Error?t:new Error(String(t))),!1}}var S_=[{name:"__IMPORTANT",description:`3-LAYER WORKFLOW (ALWAYS FOLLOW):
`)}var iu=require("node:fs/promises"),zs=require("node:fs"),Qe=require("node:path"),h_=require("node:os"),g_=require("node:url"),cP={},Yx="13.0.0";console.log=(...t)=>{S.error("CONSOLE","Intercepted console output (MCP protocol protection)",void 0,{args:t})};var __=!1,y_=(()=>{if(typeof __dirname<"u")return __dirname;try{return(0,Qe.dirname)((0,g_.fileURLToPath)(cP.url))}catch{return __=!0,process.cwd()}})(),au=(0,Qe.resolve)(y_,"worker-service.cjs");function Xx(){__&&((0,zs.existsSync)(au)||S.error("SYSTEM","mcp-server: dirname resolution failed (both __dirname and import.meta.url are unavailable). Fell back to process.cwd() and the resolved WORKER_SCRIPT_PATH does not exist. This is the actual problem \u2014 the worker bundle is fine, but mcp-server cannot locate it. Worker auto-start will fail until the dirname-resolution path is fixed.",{workerScriptPath:au,mcpServerDir:y_}))}var f_={search:"/api/search",timeline:"/api/timeline"};async function su(t,e){S.debug("SYSTEM","\u2192 Worker API",void 0,{endpoint:t,params:e});let r=new URLSearchParams;for(let[o,s]of Object.entries(e))s!=null&&r.append(o,String(s));let n=`${t}?${r}`;try{let o=await $s(n);if(!o.ok){let i=await o.text();throw new Error(`Worker API error (${o.status}): ${i}`)}let s=await o.json();return S.debug("SYSTEM","\u2190 Worker API success",void 0,{endpoint:t}),s}catch(o){return S.error("SYSTEM","\u2190 Worker API error",{endpoint:t},o instanceof Error?o:new Error(String(o))),{content:[{type:"text",text:`Error calling Worker API: ${o instanceof Error?o.message:String(o)}`}],isError:!0}}}async function Qx(t,e){let r=await $s(t,{method:"POST",headers:{"Content-Type":"application/json"},body:JSON.stringify(e)});if(!r.ok){let o=await r.text();throw new Error(`Worker API error (${r.status}): ${o}`)}let n=await r.json();return S.debug("HTTP","Worker API success (POST)",void 0,{endpoint:t}),{content:[{type:"text",text:JSON.stringify(n,null,2)}]}}async function Mr(t,e){S.debug("HTTP","Worker API request 
(POST)",void 0,{endpoint:t});try{return await Qx(t,e)}catch(r){return S.error("HTTP","Worker API error (POST)",{endpoint:t},r instanceof Error?r:new Error(String(r))),{content:[{type:"text",text:`Error calling Worker API: ${r instanceof Error?r.message:String(r)}`}],isError:!0}}}async function eP(){try{return(await $s("/api/health")).ok}catch(t){return S.debug("SYSTEM","Worker health check failed",{},t instanceof Error?t:new Error(String(t))),!1}}async function tP(){if(await eP())return!0;S.warn("SYSTEM","Worker not available, attempting auto-start for MCP client"),Xx();try{let t=Jc(),e=await Kg(t,au);return e==="dead"&&S.error("SYSTEM","Worker auto-start failed \u2014 MCP tools that require the worker (search, timeline, get_observations) will fail until the worker is running. Check earlier log lines for the specific failure reason (Bun not found, missing worker bundle, port conflict, etc.)."),e!=="dead"}catch(t){return S.error("SYSTEM","Worker auto-start threw \u2014 MCP tools that require the worker (search, timeline, get_observations) will fail until the worker is running.",void 0,t instanceof Error?t:new Error(String(t))),!1}}var S_=[{name:"__IMPORTANT",description:`3-LAYER WORKFLOW (ALWAYS FOLLOW):
1. search(query) \u2192 Get index with IDs (~50-100 tokens/result)
2. timeline(anchor=ID) \u2192 Get context around interesting results
3. get_observations([IDs]) \u2192 Fetch full details ONLY for filtered IDs

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -151,13 +151,19 @@ async function buildHooks() {
'onnxruntime-node'
],
define: {
'__DEFAULT_PACKAGE_VERSION__': `"${version}"`
'__DEFAULT_PACKAGE_VERSION__': `"${version}"`,
// Polyfill import.meta.url for ESM deps bundled into CJS output.
// @anthropic-ai/claude-agent-sdk's *.mjs files use createRequire(import.meta.url)
// and `new URL(rel, import.meta.url)`. We map import.meta.url to a file:// URL
// (not the raw __filename path) so URL construction preserves its semantics.
'import.meta.url': '__IMPORT_META_URL__'
},
banner: {
js: [
'#!/usr/bin/env bun',
'var __filename = __filename || require("node:path").resolve(process.argv[1] || "");',
'var __dirname = __dirname || require("node:path").dirname(__filename);'
'var __dirname = __dirname || require("node:path").dirname(__filename);',
'var __IMPORT_META_URL__ = require("node:url").pathToFileURL(__filename).href;'
].join('\n')
}
});

View File

@@ -23,6 +23,26 @@ const CHROMA_SUPERVISOR_ID = 'chroma-mcp';
const CHROMA_MCP_PINNED_VERSION = '0.2.6';
// Override transitive dep resolutions for chroma-mcp 0.2.6 (issue #2371).
//
// Why onnxruntime>=1.20: the shipped all-MiniLM-L6-v2 model has pytorch-2.0
// IR. Older onnxruntime versions can't parse it and fail every embedding
// add with `[ONNXRuntimeError] : 7 : INVALID_PROTOBUF`. uv may otherwise
// resolve to a too-old onnxruntime on macOS arm64 / Python 3.13 depending
// on cache state, so we force a floor.
//
// Why protobuf<7: protobuf 7.x's stricter generated-file check rejects
// opentelemetry's _pb2 stubs (generated with protoc <3.19), throwing
// `TypeError: Descriptors cannot be created directly` at chromadb import.
// Capping below 7 lands on protobuf 6.x which opentelemetry tolerates.
//
// These pins are runtime-only (uvx --with) so we don't have to fork
// chroma-mcp upstream — they apply only to claude-mem's spawned subprocess.
const CHROMA_MCP_DEP_OVERRIDES: ReadonlyArray<string> = [
'onnxruntime>=1.20',
'protobuf<7',
];
export class ChromaMcpManager {
private static instance: ChromaMcpManager | null = null;
private client: Client | null = null;
@@ -72,15 +92,14 @@ export class ChromaMcpManager {
}
private async connectInternal(): Promise<void> {
if (this.transport) {
try { await this.transport.close(); } catch { /* already dead */ }
}
if (this.client) {
try { await this.client.close(); } catch { /* already dead */ }
}
this.client = null;
this.transport = null;
this.connected = false;
// Singleton invariant (#2313): kill any pre-existing chroma-mcp subprocess
// tree before spawning a new one. The MCP SDK's transport.close() only
// signals the direct child (uvx); on Linux the grandchildren (uv, python,
// chroma-mcp) get re-parented to init and survive, accumulating 20+
// instances per session if reconnects fire repeatedly. Reuse the same
// tree-kill primitive used by stop() so reconnect can never leave
// orphans behind.
await this.disposeCurrentSubprocess();
const commandArgs = this.buildCommandArgs();
const spawnEnvironment = this.getSpawnEnv();
@@ -121,14 +140,12 @@ export class ChromaMcpManager {
await Promise.race([mcpConnectionPromise, timeoutPromise]);
} catch (connectionError) {
clearTimeout(timeoutId!);
logger.warn('CHROMA_MCP', 'Connection failed, killing subprocess to prevent zombie', {
logger.warn('CHROMA_MCP', 'Connection failed, killing subprocess tree to prevent zombie', {
error: connectionError instanceof Error ? connectionError.message : String(connectionError)
});
try { await this.transport.close(); } catch { /* best effort */ }
try { await this.client.close(); } catch { /* best effort */ }
this.client = null;
this.transport = null;
this.connected = false;
// Tree-kill (not just transport.close) so failed-connect descendants
// can't survive on Linux (#2313).
await this.disposeCurrentSubprocess();
throw connectionError;
}
clearTimeout(timeoutId!);
@@ -139,6 +156,7 @@ export class ChromaMcpManager {
logger.info('CHROMA_MCP', 'Connected to chroma-mcp successfully');
const currentTransport = this.transport;
const currentTrackedPid = (this.transport as unknown as { _process?: ChildProcess })._process?.pid;
this.transport.onclose = () => {
if (this.transport !== currentTransport) {
logger.debug('CHROMA_MCP', 'Ignoring stale onclose from previous transport');
@@ -150,6 +168,20 @@ export class ChromaMcpManager {
this.client = null;
this.transport = null;
this.lastConnectionFailureTimestamp = Date.now();
// Direct child (uvx) emitted close, but on Linux the grandchildren
// (uv/python/chroma-mcp) often outlive their parent because MCP SDK
// does not use process groups. Sweep the descendant tree using the
// captured PID — best-effort; pgrep returns nothing if everything
// already exited (#2313).
if (currentTrackedPid) {
ChromaMcpManager.killProcessTree(currentTrackedPid).catch((error) => {
logger.debug('CHROMA_MCP', 'Background tree-kill after onclose finished (best-effort)', {
pid: currentTrackedPid,
error: error instanceof Error ? error.message : String(error)
});
});
}
};
}
@@ -158,6 +190,8 @@ export class ChromaMcpManager {
const chromaMode = settings.CLAUDE_MEM_CHROMA_MODE || 'local';
const pythonVersion = process.env.CLAUDE_MEM_PYTHON_VERSION || settings.CLAUDE_MEM_PYTHON_VERSION || '3.13';
const depOverrideFlags = CHROMA_MCP_DEP_OVERRIDES.flatMap(spec => ['--with', spec]);
if (chromaMode === 'remote') {
const chromaHost = settings.CLAUDE_MEM_CHROMA_HOST || '127.0.0.1';
const chromaPort = settings.CLAUDE_MEM_CHROMA_PORT || '8000';
@@ -168,6 +202,7 @@ export class ChromaMcpManager {
const args = [
'--python', pythonVersion,
...depOverrideFlags,
`chroma-mcp==${CHROMA_MCP_PINNED_VERSION}`,
'--client-type', 'http',
'--host', chromaHost,
@@ -193,6 +228,7 @@ export class ChromaMcpManager {
return [
'--python', pythonVersion,
...depOverrideFlags,
`chroma-mcp==${CHROMA_MCP_PINNED_VERSION}`,
'--client-type', 'persistent',
'--data-dir', DEFAULT_CHROMA_DATA_DIR.replace(/\\/g, '/')
@@ -213,14 +249,15 @@ export class ChromaMcpManager {
arguments: toolArguments
});
} catch (transportError) {
this.connected = false;
this.client = null;
this.transport = null;
logger.warn('CHROMA_MCP', `Transport error during "${toolName}", reconnecting and retrying once`, {
error: transportError instanceof Error ? transportError.message : String(transportError)
});
// Tree-kill the dying subprocess before reconnect. Previously this path
// just nulled the handle, which on Linux leaks the uv/python/chroma-mcp
// descendants every time a transport error happens (#2313).
await this.disposeCurrentSubprocess();
try {
await this.ensureConnected();
result = await this.client!.callTool({
@@ -328,6 +365,53 @@ export class ChromaMcpManager {
}
}
/**
* Singleton enforcement helper (#2313): tree-kill the currently tracked
* chroma-mcp subprocess and reset all state so the next spawn starts clean.
*
* Why this is the singleton invariant: every code path that intends to
* abandon `this.transport` / `this.client` (reconnect, transport error,
* connect-timeout, onclose, stop()) MUST funnel through here. The MCP
* SDK's transport.close() only signals the direct child (uvx); on Linux
* the grandchildren (uv, python, chroma-mcp) re-parent to init and
* accumulate. Calling killProcessTree() against the captured PID before
* we drop the reference is the only way to guarantee at most one
* chroma-mcp subprocess tree exists per worker process.
*
* Idempotent and best-effort — safe to call when there is no active
* subprocess (no-op in that case).
*/
private async disposeCurrentSubprocess(): Promise<void> {
const chromaProcess = (this.transport as unknown as { _process?: ChildProcess })?._process;
const trackedPid = chromaProcess?.pid;
if (trackedPid) {
try {
await ChromaMcpManager.killProcessTree(trackedPid);
} catch (error) {
logger.warn('CHROMA_MCP', 'failed to kill prior chroma-mcp tree (best-effort)', {
pid: trackedPid,
error: error instanceof Error ? error.message : String(error)
});
}
}
if (this.transport) {
try { await this.transport.close(); } catch { /* already dead */ }
}
if (this.client) {
try { await this.client.close(); } catch { /* already dead */ }
}
if (trackedPid) {
getSupervisor().unregisterProcess(CHROMA_SUPERVISOR_ID);
}
this.client = null;
this.transport = null;
this.connected = false;
}
/**
* Gracefully stop the MCP connection and kill the chroma-mcp subprocess tree.
*
@@ -341,34 +425,15 @@ export class ChromaMcpManager {
* pattern from shutdown.ts (Principle 5: OS-supervised teardown).
*/
async stop(): Promise<void> {
if (!this.client) {
if (!this.client && !this.transport) {
logger.debug('CHROMA_MCP', 'No active MCP connection to stop');
this.connecting = null;
return;
}
logger.info('CHROMA_MCP', 'Stopping chroma-mcp MCP connection');
// Kill the entire process tree before closing the MCP client so
// descendants (uv, python, chroma-mcp) don't become orphans.
const chromaProcess = (this.transport as unknown as { _process?: ChildProcess })?._process;
if (chromaProcess?.pid) {
await ChromaMcpManager.killProcessTree(chromaProcess.pid);
}
try {
await this.client.close();
} catch (error) {
if (error instanceof Error) {
logger.debug('CHROMA_MCP', 'Error during client close (subprocess may already be dead)', {}, error);
} else {
logger.debug('CHROMA_MCP', 'Error during client close (subprocess may already be dead)', { error: String(error) });
}
}
getSupervisor().unregisterProcess(CHROMA_SUPERVISOR_ID);
this.client = null;
this.transport = null;
this.connected = false;
await this.disposeCurrentSubprocess();
this.connecting = null;
logger.info('CHROMA_MCP', 'chroma-mcp MCP connection stopped');
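
Review note: the hunks above route every teardown path through `disposeCurrentSubprocess()`. The sketch below is a minimal, self-contained illustration of that "single idempotent dispose funnel" idea, not the actual ChromaMcpManager code. `SubprocessOwner`, `KillTree`, `attach`, and the `pid` field on the transport are hypothetical names introduced only for this example; the real manager reads the PID from the SDK transport's internal `_process` field and also unregisters the process from the supervisor.

```typescript
// Hypothetical stand-ins for the MCP transport/client; illustrative only.
interface Closable {
  close(): Promise<void>;
}
interface TrackedTransport extends Closable {
  pid?: number;
}
type KillTree = (pid: number) => Promise<void>;

class SubprocessOwner {
  private transport: TrackedTransport | null = null;
  private client: Closable | null = null;
  private killTree: KillTree;
  connected = false;

  constructor(killTree: KillTree) {
    this.killTree = killTree;
  }

  attach(transport: TrackedTransport, client: Closable): void {
    this.transport = transport;
    this.client = client;
    this.connected = true;
  }

  // Single teardown path: kill the tree first, then close handles, then
  // reset state. Calling it with nothing attached is a no-op.
  async dispose(): Promise<void> {
    const pid = this.transport?.pid;
    if (pid !== undefined) {
      try {
        await this.killTree(pid);
      } catch {
        // Best-effort: a failed kill must not block the state reset below.
      }
    }
    if (this.transport) {
      try { await this.transport.close(); } catch { /* already dead */ }
    }
    if (this.client) {
      try { await this.client.close(); } catch { /* already dead */ }
    }
    this.transport = null;
    this.client = null;
    this.connected = false;
  }
}
```

Because the handles are nulled at the end, a second `dispose()` call finds no PID and no handles, which is what makes it safe to call from reconnect, error, and `stop()` paths alike.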


@@ -27,6 +27,19 @@ import {
import { query } from '@anthropic-ai/claude-agent-sdk';
import { ClassifiedProviderError } from './provider-errors.js';
/**
* Module-scoped guard so the "effort parameter" hint only fires once per
* worker process. The underlying cause (a leaked CLAUDE_CODE_EFFORT_LEVEL in
* ~/.claude-mem/.env, see #2357) is environmental — re-logging it on every
* SDK call would spam the logs without adding signal.
*
* Exported solely for tests to reset the latch between cases.
*/
let effortHintLogged = false;
export function __resetEffortHintLatchForTesting(): void {
effortHintLogged = false;
}
/**
* Classify a ClaudeProvider error (executable spawn failures, SDK errors,
* Anthropic API errors). Provider-specific because it relies on:
@@ -36,7 +49,7 @@ import { ClassifiedProviderError } from './provider-errors.js';
*/
export function classifyClaudeError(err: unknown): ClassifiedProviderError {
const message = err instanceof Error ? err.message : String(err);
const errAny = err as { name?: string; status?: number; error?: { type?: string } };
const errAny = err as { name?: string; status?: number; error?: { type?: string }; body?: unknown };
// Executable / spawn issues — unrecoverable, no point retrying.
if (
@@ -88,6 +101,39 @@ export function classifyClaudeError(err: unknown): ClassifiedProviderError {
return new ClassifiedProviderError(message, { kind: 'unrecoverable', cause: err });
}
// HTTP 400 from the Anthropic SDK — bad request, never recoverable. Mirrors
// the pattern in GeminiProvider.classifyGeminiError / classifyOpenRouterError
// (see #2357: the SDK forwards `effort` to the Messages API when
// CLAUDE_CODE_EFFORT_LEVEL leaks into the subprocess env, and models like
// Haiku/Sonnet 4.5 reject with 400 — without this branch the default
// `transient` classification retried indefinitely).
if (errAny.status === 400) {
// Inspect both the message and any structured body for the effort marker.
const bodyText = (() => {
const body = errAny.body;
if (typeof body === 'string') return body;
if (body && typeof body === 'object') {
try { return JSON.stringify(body); } catch { return ''; }
}
return '';
})();
const haystack = `${message}\n${bodyText}`;
if (/effort parameter/i.test(haystack) && !effortHintLogged) {
effortHintLogged = true;
logger.warn(
'SDK',
'Anthropic API rejected request with HTTP 400: this model does not support the `effort` parameter. ' +
'CLAUDE_CODE_EFFORT_LEVEL is likely leaking into the SDK subprocess env via ~/.claude-mem/.env — ' +
'remove it or scope it to models that support effort. See https://github.com/thedotmack/claude-mem/issues/2357.',
{ status: 400 }
);
}
return new ClassifiedProviderError(
message || 'Anthropic bad request (status 400)',
{ kind: 'unrecoverable', cause: err },
);
}
// Server errors → transient.
if (typeof errAny.status === 'number' && errAny.status >= 500 && errAny.status < 600) {
return new ClassifiedProviderError(message, { kind: 'transient', cause: err });
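
Review note: the 400 branch above combines two ideas, a permanent classification and a once-per-process hint. The following is a simplified sketch of that combination under stated assumptions: `classifyStatus`, `bodyToText`, `ErrorKind`, and the injected `warn` callback are hypothetical names for this example, and the real `classifyClaudeError` handles more cases (spawn failures, 401, 429, 5xx) and returns a `ClassifiedProviderError`.

```typescript
type ErrorKind = "unrecoverable" | "transient";

// Module-scoped latch: the hint is emitted at most once per process.
let hintLogged = false;

// Flatten a string or structured body into searchable text.
function bodyToText(body: unknown): string {
  if (typeof body === "string") return body;
  if (body && typeof body === "object") {
    try { return JSON.stringify(body); } catch { return ""; }
  }
  return "";
}

function classifyStatus(
  err: { status?: number; message: string; body?: unknown },
  warn: (msg: string) => void,
): ErrorKind {
  if (err.status === 400) {
    const haystack = `${err.message}\n${bodyToText(err.body)}`;
    if (/effort parameter/i.test(haystack) && !hintLogged) {
      hintLogged = true;
      warn("model rejected the effort parameter; check the subprocess env");
    }
    // A bad request will not succeed on retry, with or without the hint.
    return "unrecoverable";
  }
  // Simplification: everything else is treated as retryable in this sketch.
  return "transient";
}
```

Keeping the latch separate from the classification means repeated 400s still classify correctly; only the log line is throttled.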


@@ -9,7 +9,16 @@ import {
type OAuthTokenResult,
} from './oauth-token.js';
export const ENV_FILE_PATH = paths.envFile();
// Resolved lazily so tests (and any rare runtime path-overrides) can target a
// temp file via CLAUDE_MEM_ENV_FILE without depending on module-load order.
// Production callers see the canonical ~/.claude-mem/.env path through
// paths.envFile() unchanged.
export function envFilePath(): string {
return process.env.CLAUDE_MEM_ENV_FILE ?? paths.envFile();
}
/** @deprecated Prefer envFilePath(); kept as a snapshot for back-compat. */
export const ENV_FILE_PATH = envFilePath();
const BLOCKED_ENV_VARS = [
'ANTHROPIC_API_KEY', // Issue #733: Prevent auto-discovery from project .env files
@@ -17,6 +26,10 @@ const BLOCKED_ENV_VARS = [
// shell would otherwise short-circuit OAuth lookup at spawn time.
// The fresh token from ~/.claude-mem/.env is re-injected below
// when explicit gateway credentials are configured.
'ANTHROPIC_BASE_URL', // Issue #2375: same leak class as AUTH_TOKEN. A leaked BASE_URL
// alone (no token) was enough to trigger the OAuth-skip path,
// sending the subprocess to a proxy with no credentials.
// Re-injected from ~/.claude-mem/.env when configured.
'CLAUDECODE', // Prevent "cannot be launched inside another Claude Code session" error
'CLAUDE_CODE_OAUTH_TOKEN', // Issue #2215: prevent stale parent-process token from leaking into
// isolated env. The fresh token is read from the keychain at spawn
@@ -77,12 +90,13 @@ function serializeEnvFile(env: Record<string, string>): string {
}
export function loadClaudeMemEnv(): ClaudeMemEnv {
if (!existsSync(ENV_FILE_PATH)) {
const envFile = envFilePath();
if (!existsSync(envFile)) {
return {};
}
try {
const content = readFileSync(ENV_FILE_PATH, 'utf-8');
const content = readFileSync(envFile, 'utf-8');
const parsed = parseEnvFile(content);
const result: ClaudeMemEnv = {};
@@ -94,12 +108,13 @@ export function loadClaudeMemEnv(): ClaudeMemEnv {
return result;
} catch (error: unknown) {
logger.warn('ENV', 'Failed to load .env file', { path: ENV_FILE_PATH }, error instanceof Error ? error : new Error(String(error)));
logger.warn('ENV', 'Failed to load .env file', { path: envFile }, error instanceof Error ? error : new Error(String(error)));
return {};
}
}
export function saveClaudeMemEnv(env: ClaudeMemEnv): void {
const envFile = envFilePath();
let existing: Record<string, string> = {};
try {
if (!existsSync(paths.dataDir())) {
@@ -107,8 +122,8 @@ export function saveClaudeMemEnv(env: ClaudeMemEnv): void {
}
chmodSync(paths.dataDir(), 0o700);
existing = existsSync(ENV_FILE_PATH)
? parseEnvFile(readFileSync(ENV_FILE_PATH, 'utf-8'))
existing = existsSync(envFile)
? parseEnvFile(readFileSync(envFile, 'utf-8'))
: {};
} catch (error) {
const normalizedError = error instanceof Error ? error : new Error(String(error));
@@ -155,10 +170,10 @@ export function saveClaudeMemEnv(env: ClaudeMemEnv): void {
}
try {
writeFileSync(ENV_FILE_PATH, serializeEnvFile(updated), { encoding: 'utf-8', mode: 0o600 });
chmodSync(ENV_FILE_PATH, 0o600);
writeFileSync(envFile, serializeEnvFile(updated), { encoding: 'utf-8', mode: 0o600 });
chmodSync(envFile, 0o600);
} catch (error: unknown) {
logger.error('ENV', 'Failed to save .env file', { path: ENV_FILE_PATH }, error instanceof Error ? error : new Error(String(error)));
logger.error('ENV', 'Failed to save .env file', { path: envFile }, error instanceof Error ? error : new Error(String(error)));
throw error;
}
}
@@ -230,15 +245,17 @@ export async function buildIsolatedEnvWithFreshOAuth(
if (!includeCredentials) return isolatedEnv;
// If the user already configured explicit Anthropic/gateway credentials in
// ~/.claude-mem/.env, honor those and skip OAuth lookup entirely. A bare
// ANTHROPIC_BASE_URL counts because gateways may be tokenless, and falling
// back to OAuth would silently route requests to api.anthropic.com.
if (
isolatedEnv.ANTHROPIC_API_KEY ||
isolatedEnv.ANTHROPIC_BASE_URL ||
isolatedEnv.ANTHROPIC_AUTH_TOKEN
) {
// Custom gateway: never inject OAuth (would leak the user's Anthropic OAuth
// token to a third-party gateway). The user must explicitly configure a
// gateway-appropriate token in ~/.claude-mem/.env if their gateway requires
// one. A bare BASE_URL with no token = tokenless gateway (e.g. mTLS at the
// network boundary).
if (isolatedEnv.ANTHROPIC_BASE_URL) {
clearStaleMarker();
return isolatedEnv;
}
// Direct API with explicit credentials: skip OAuth lookup.
if (isolatedEnv.ANTHROPIC_API_KEY || isolatedEnv.ANTHROPIC_AUTH_TOKEN) {
clearStaleMarker();
return isolatedEnv;
}
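
Review note: the three-branch predicate is order-sensitive, so it may help to see it as a pure function. This is a sketch only; `decideAuth`, `AuthDecision`, and `IsolatedEnvLike` are hypothetical names for this example, and the real code returns `isolatedEnv` after calling `clearStaleMarker()` rather than returning a label.

```typescript
type AuthDecision = "gateway-no-oauth" | "explicit-credentials" | "oauth-lookup";

interface IsolatedEnvLike {
  ANTHROPIC_BASE_URL?: string;
  ANTHROPIC_API_KEY?: string;
  ANTHROPIC_AUTH_TOKEN?: string;
}

function decideAuth(env: IsolatedEnvLike): AuthDecision {
  // Branch 1: a custom gateway is configured. Never fetch OAuth here, even
  // when no token is set, so the Anthropic token cannot reach a third party.
  if (env.ANTHROPIC_BASE_URL) return "gateway-no-oauth";
  // Branch 2: explicit credentials for the direct API.
  if (env.ANTHROPIC_API_KEY || env.ANTHROPIC_AUTH_TOKEN) return "explicit-credentials";
  // Branch 3: nothing configured, fall back to the OAuth lookup.
  return "oauth-lookup";
}
```

The BASE_URL check has to come first: a gateway plus a token and a gateway with no token both land in branch 1, so the OAuth lookup is unreachable whenever a gateway is set.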


@@ -0,0 +1,122 @@
import { describe, it, expect, beforeEach, afterEach, spyOn } from 'bun:test';
import {
classifyClaudeError,
__resetEffortHintLatchForTesting,
} from '../src/services/worker/ClaudeProvider.js';
import { isClassified } from '../src/services/worker/provider-errors.js';
import { logger } from '../src/utils/logger.js';
/**
* Tests for HTTP 400 classification in ClaudeProvider's classifyClaudeError.
*
* Regression coverage for #2357: ClaudeProvider previously had no explicit
* HTTP 400 handling, so the default branch classified all 400s as `transient`
* and the retry loop would hammer a permanent error indefinitely (e.g. when
* CLAUDE_CODE_EFFORT_LEVEL leaks into the SDK subprocess and the model
* rejects the `effort` parameter).
*/
describe('classifyClaudeError — HTTP 400 handling (#2357)', () => {
let warnSpy: ReturnType<typeof spyOn>;
beforeEach(() => {
__resetEffortHintLatchForTesting();
warnSpy = spyOn(logger, 'warn').mockImplementation(() => {});
});
afterEach(() => {
warnSpy.mockRestore();
__resetEffortHintLatchForTesting();
});
it('classifies 400 with "effort parameter" body as unrecoverable AND logs an SDK warn once', () => {
const sdkErr = Object.assign(
new Error('This model does not support the effort parameter.'),
{ status: 400 },
);
const classified = classifyClaudeError(sdkErr);
expect(isClassified(classified)).toBe(true);
expect(classified.kind).toBe('unrecoverable');
expect(warnSpy).toHaveBeenCalledTimes(1);
// First positional arg of logger.warn is the component category.
const [component, hintMessage] = warnSpy.mock.calls[0] as [string, string, ...unknown[]];
expect(component).toBe('SDK');
expect(hintMessage).toMatch(/effort/i);
expect(hintMessage).toMatch(/2357/);
});
it('classifies 400 with effort marker in a structured body field', () => {
const sdkErr = Object.assign(
new Error('Bad request'),
{
status: 400,
body: { error: { message: 'This model does not support the effort parameter.' } },
},
);
const classified = classifyClaudeError(sdkErr);
expect(classified.kind).toBe('unrecoverable');
expect(warnSpy).toHaveBeenCalledTimes(1);
});
it('classifies 400 without effort body as unrecoverable WITHOUT firing the effort hint', () => {
const sdkErr = Object.assign(
new Error('some other 400 error'),
{ status: 400 },
);
const classified = classifyClaudeError(sdkErr);
expect(classified.kind).toBe('unrecoverable');
expect(warnSpy).not.toHaveBeenCalled();
});
it('throttles the effort hint to one log per process even on repeated 400s', () => {
const sdkErr = Object.assign(
new Error('This model does not support the effort parameter.'),
{ status: 400 },
);
for (let i = 0; i < 5; i++) {
const classified = classifyClaudeError(sdkErr);
expect(classified.kind).toBe('unrecoverable');
}
expect(warnSpy).toHaveBeenCalledTimes(1);
});
});
describe('classifyClaudeError — sibling status codes (regression sanity)', () => {
let warnSpy: ReturnType<typeof spyOn>;
beforeEach(() => {
__resetEffortHintLatchForTesting();
warnSpy = spyOn(logger, 'warn').mockImplementation(() => {});
});
afterEach(() => {
warnSpy.mockRestore();
__resetEffortHintLatchForTesting();
});
it('classifies status=401 as auth_invalid', () => {
const sdkErr = Object.assign(new Error('unauthorized'), { status: 401 });
const classified = classifyClaudeError(sdkErr);
expect(classified.kind).toBe('auth_invalid');
});
it('classifies status=429 as rate_limit', () => {
const sdkErr = Object.assign(new Error('rate limited'), { status: 429 });
const classified = classifyClaudeError(sdkErr);
expect(classified.kind).toBe('rate_limit');
});
it('classifies a network error with no status as transient', () => {
const networkErr = new Error('ECONNRESET: socket hang up');
const classified = classifyClaudeError(networkErr);
expect(classified.kind).toBe('transient');
});
});

tests/env-isolation.test.ts

@@ -0,0 +1,155 @@
import { describe, it, expect, beforeAll, afterAll, beforeEach, afterEach, spyOn } from 'bun:test';
import * as fs from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';
import {
envFilePath,
buildIsolatedEnv,
buildIsolatedEnvWithFreshOAuth,
} from '../src/shared/EnvManager.js';
import * as oauthToken from '../src/shared/oauth-token.js';
/**
* Tests for issue #2375: ANTHROPIC_BASE_URL must not leak from the parent
* shell into the spawned worker's isolatedEnv, AND the OAuth-skip predicate
* must not inject the user's Anthropic OAuth token onto a custom gateway URL
* (which would be a token leak to a third party).
*
* Redirect EnvManager to a per-suite temp file via CLAUDE_MEM_ENV_FILE so
* the user's real ~/.claude-mem/.env is never read or mutated even if a test
* fails mid-flight. envFilePath() resolves the override on every call, so
* this works regardless of the order other tests imported the module.
*/
const TEST_DATA_DIR = fs.mkdtempSync(join(tmpdir(), 'claude-mem-env-isolation-'));
const TEST_ENV_FILE = join(TEST_DATA_DIR, '.env');
const ORIGINAL_ENV_FILE = process.env.CLAUDE_MEM_ENV_FILE;
const ORIGINAL_BASE_URL = process.env.ANTHROPIC_BASE_URL;
const ORIGINAL_API_KEY = process.env.ANTHROPIC_API_KEY;
const ORIGINAL_AUTH_TOKEN = process.env.ANTHROPIC_AUTH_TOKEN;
const ORIGINAL_OAUTH_TOKEN = process.env.CLAUDE_CODE_OAUTH_TOKEN;
function clearEnvFile(): void {
if (fs.existsSync(TEST_ENV_FILE)) {
fs.unlinkSync(TEST_ENV_FILE);
}
}
function clearAnthropicEnv(): void {
delete process.env.ANTHROPIC_BASE_URL;
delete process.env.ANTHROPIC_API_KEY;
delete process.env.ANTHROPIC_AUTH_TOKEN;
delete process.env.CLAUDE_CODE_OAUTH_TOKEN;
}
function restoreOriginalEnv(): void {
if (ORIGINAL_BASE_URL === undefined) {
delete process.env.ANTHROPIC_BASE_URL;
} else {
process.env.ANTHROPIC_BASE_URL = ORIGINAL_BASE_URL;
}
if (ORIGINAL_API_KEY === undefined) {
delete process.env.ANTHROPIC_API_KEY;
} else {
process.env.ANTHROPIC_API_KEY = ORIGINAL_API_KEY;
}
if (ORIGINAL_AUTH_TOKEN === undefined) {
delete process.env.ANTHROPIC_AUTH_TOKEN;
} else {
process.env.ANTHROPIC_AUTH_TOKEN = ORIGINAL_AUTH_TOKEN;
}
if (ORIGINAL_OAUTH_TOKEN === undefined) {
delete process.env.CLAUDE_CODE_OAUTH_TOKEN;
} else {
process.env.CLAUDE_CODE_OAUTH_TOKEN = ORIGINAL_OAUTH_TOKEN;
}
}
describe('Issue #2375: ANTHROPIC_BASE_URL env-var isolation', () => {
beforeAll(() => {
fs.mkdirSync(TEST_DATA_DIR, { recursive: true, mode: 0o700 });
process.env.CLAUDE_MEM_ENV_FILE = TEST_ENV_FILE;
expect(envFilePath()).toBe(TEST_ENV_FILE);
});
afterAll(() => {
fs.rmSync(TEST_DATA_DIR, { recursive: true, force: true });
if (ORIGINAL_ENV_FILE === undefined) {
delete process.env.CLAUDE_MEM_ENV_FILE;
} else {
process.env.CLAUDE_MEM_ENV_FILE = ORIGINAL_ENV_FILE;
}
});
beforeEach(() => {
clearEnvFile();
clearAnthropicEnv();
});
afterEach(() => {
clearEnvFile();
restoreOriginalEnv();
});
it('leaked ANTHROPIC_BASE_URL is stripped from isolatedEnv', () => {
// No .env file exists. The parent shell sets a stray ANTHROPIC_BASE_URL —
// this MUST NOT propagate into the subprocess isolatedEnv, because doing
// so used to trigger the OAuth-skip path and leave the worker with no
// credentials at all.
process.env.ANTHROPIC_BASE_URL = 'https://shouldnotleak.example';
const result = buildIsolatedEnv();
expect(result.ANTHROPIC_BASE_URL).toBeUndefined();
});
it('~/.claude-mem/.env BASE_URL + AUTH_TOKEN reaches isolatedEnv', () => {
// User intentionally configured a gateway with a gateway-appropriate
// auth token. Both must be re-injected into isolatedEnv.
fs.writeFileSync(
TEST_ENV_FILE,
'ANTHROPIC_BASE_URL=https://gateway.example\nANTHROPIC_AUTH_TOKEN=test-token\n',
{ mode: 0o600 },
);
const result = buildIsolatedEnv();
expect(result.ANTHROPIC_BASE_URL).toBe('https://gateway.example');
expect(result.ANTHROPIC_AUTH_TOKEN).toBe('test-token');
});
it('bare .env BASE_URL alone does not trigger OAuth fetch', async () => {
// A user with a tokenless gateway (e.g. mTLS at the network boundary)
// configures BASE_URL only. The three-branch predicate must hit the
// BASE_URL-set branch BEFORE OAuth lookup, so CLAUDE_CODE_OAUTH_TOKEN
// must NOT appear in the result. This is the security-regression guard
// against a token leak to a third-party gateway.
//
// Note: EnvManager captures readClaudeOAuthToken via a named import at
// module load, so spyOn on the namespace export only weakly observes
// the call (the binding inside EnvManager is independent). The
// behavioral assertions (BASE_URL re-injected AND OAuth token NOT
// injected) are the load-bearing checks: in the no-OAuth-injection
// outcome, the only execution path that produces this combination is
// the new BASE_URL-first branch returning early.
fs.writeFileSync(
TEST_ENV_FILE,
'ANTHROPIC_BASE_URL=https://gateway.example\n',
{ mode: 0o600 },
);
const oauthSpy = spyOn(oauthToken, 'readClaudeOAuthToken');
try {
const result = await buildIsolatedEnvWithFreshOAuth();
expect(result.ANTHROPIC_BASE_URL).toBe('https://gateway.example');
expect(result.CLAUDE_CODE_OAUTH_TOKEN).toBeUndefined();
// Best-effort sanity check; see note above.
expect(oauthSpy).not.toHaveBeenCalled();
} finally {
oauthSpy.mockRestore();
}
});
});
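
Review note: these tests depend on `envFilePath()` resolving the override on every call rather than once at module load. A minimal sketch of the difference follows, assuming a plain object in place of `process.env`; `overrides`, `DEFAULT_PATH`, `resolvePath`, and `SNAPSHOT` are hypothetical names for this example and the default path is illustrative.

```typescript
// Stand-in for process.env so the sketch has no runtime dependencies.
const overrides: Record<string, string | undefined> = {};
const DEFAULT_PATH = "/home/user/.claude-mem/.env"; // illustrative default

// Lazy: consults the override each time it is called.
function resolvePath(): string {
  return overrides.ENV_FILE ?? DEFAULT_PATH;
}

// Eager: captured once, before any test had a chance to set the override.
const SNAPSHOT = resolvePath();

// A test sets the override after the module has already been evaluated.
overrides.ENV_FILE = "/tmp/test-suite/.env";

const lazyResult = resolvePath(); // sees the override
const eagerResult = SNAPSHOT;     // still the default
```

This is why the deprecated `ENV_FILE_PATH` constant can point at the real file while `envFilePath()` follows the test's temp file.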


@@ -0,0 +1,228 @@
import { describe, it, expect, beforeEach, mock } from 'bun:test';
// Singleton enforcement regression coverage for issue #2313.
//
// Hypothesis under test: prior to the fix, ChromaMcpManager could leak its
// chroma-mcp subprocess tree on every reconnect / transport error, accumulating
// 20+ instances per session on Linux because the MCP SDK's transport.close()
// only signals the direct child (uvx). The fix routes every "abandon current
// transport" path through disposeCurrentSubprocess(), which tree-kills via
// killProcessTree() before nulling the handles.
let transportCount = 0;
const transportInstances: Array<FakeTransport> = [];
interface FakeChildProcess {
pid: number;
once: (event: string, _cb: (...args: unknown[]) => void) => FakeChildProcess;
on: (event: string, _cb: (...args: unknown[]) => void) => FakeChildProcess;
}
class FakeTransport {
static nextPid = 100_000;
onclose: (() => void) | null = null;
closed = false;
// Mimic StdioClientTransport's internal `_process` field that the manager
// pokes into via `(this.transport as unknown as { _process })._process`.
_process: FakeChildProcess;
constructor(_opts: { command: string; args: string[] }) {
transportCount += 1;
const pid = FakeTransport.nextPid++;
const child: FakeChildProcess = {
pid,
once: function (this: FakeChildProcess) { return this; },
on: function (this: FakeChildProcess) { return this; },
};
this._process = child;
transportInstances.push(this);
}
async close(): Promise<void> {
this.closed = true;
}
}
mock.module('@modelcontextprotocol/sdk/client/stdio.js', () => ({
StdioClientTransport: FakeTransport,
}));
let connectImpl: () => Promise<void> = async () => {};
let callToolImpl: () => Promise<unknown> = async () => ({
content: [{ type: 'text', text: '{}' }],
});
class FakeClient {
closed = false;
async connect(): Promise<void> {
await connectImpl();
}
async callTool(): Promise<unknown> {
return await callToolImpl();
}
async close(): Promise<void> {
this.closed = true;
}
}
mock.module('@modelcontextprotocol/sdk/client/index.js', () => ({
Client: FakeClient,
}));
mock.module('../../../src/shared/SettingsDefaultsManager.js', () => ({
SettingsDefaultsManager: {
get: () => '',
getInt: () => 0,
loadFromFile: () => ({}),
},
}));
mock.module('../../../src/shared/paths.js', () => ({
USER_SETTINGS_PATH: '/tmp/fake-settings.json',
paths: {
chroma: () => '/tmp/fake-chroma',
combinedCerts: () => '/tmp/fake-combined-certs.pem',
},
}));
mock.module('../../../src/utils/logger.js', () => ({
logger: {
info: () => {},
debug: () => {},
warn: () => {},
error: () => {},
failure: () => {},
},
}));
// Track tree-kill invocations and the transport whose subprocess was killed.
const killTreeCalls: number[] = [];
mock.module('../../../src/supervisor/index.ts', () => ({
getSupervisor: () => ({
assertCanSpawn: () => {},
registerProcess: () => {},
unregisterProcess: () => {},
}),
}));
mock.module('../../../src/supervisor/env-sanitizer.js', () => ({
sanitizeEnv: (env: NodeJS.ProcessEnv) => env,
}));
// Replace child_process.execFile so the static killProcessTree implementation
// can be observed without actually shelling out. We feed pgrep an empty stdout
// (no descendants) so the only signal target is the root pid.
mock.module('child_process', () => {
const original = require('node:child_process');
return {
...original,
execFile: (
cmd: string,
args: string[],
_opts: unknown,
cb: (err: Error | null, stdout: { stdout: string; stderr: string }) => void
) => {
// Bun's promisify path will call this as if it were a Node-style callback.
if (cmd === 'pgrep') {
cb(null, { stdout: '', stderr: '' } as any);
} else {
cb(null, { stdout: '', stderr: '' } as any);
}
},
execSync: () => '',
};
});
// Stub process.kill so the tree-kill path can record targets without crashing
// the test runner if the synthetic PID happens to collide with a real one.
const realProcessKill = process.kill.bind(process);
const stubbedProcessKill = ((pid: number, _signal?: string | number) => {
killTreeCalls.push(pid);
return true;
}) as typeof process.kill;
process.kill = stubbedProcessKill;
import { ChromaMcpManager } from '../../../src/services/sync/ChromaMcpManager.js';
function resetState(): void {
transportCount = 0;
transportInstances.length = 0;
killTreeCalls.length = 0;
connectImpl = async () => {};
callToolImpl = async () => ({ content: [{ type: 'text', text: '{}' }] });
}
describe('ChromaMcpManager singleton enforcement (#2313)', () => {
beforeEach(async () => {
await ChromaMcpManager.reset();
resetState();
});
it('serializes concurrent ensureConnected() calls into one spawn', async () => {
const mgr = ChromaMcpManager.getInstance();
// Five parallel callers race ensureConnected via callTool — only one
// chroma-mcp subprocess (one transport) should be spawned.
await Promise.all(
Array.from({ length: 5 }, () =>
mgr.callTool('chroma_list_collections', { limit: 1 })
)
);
expect(transportCount).toBe(1);
});
it('kills the prior subprocess tree before a reconnect spawn', async () => {
const mgr = ChromaMcpManager.getInstance();
// First call: opens transport #1.
await mgr.callTool('chroma_list_collections', { limit: 1 });
expect(transportInstances.length).toBe(1);
const firstPid = transportInstances[0]._process.pid;
// Second call: rig callTool to throw a transport error on the FIRST attempt
// so the manager runs its reconnect-and-retry path. The retry should
// dispose the prior subprocess tree (firstPid) before spawning a new one.
let invocations = 0;
callToolImpl = async () => {
invocations += 1;
if (invocations === 1) {
throw new Error('Connection closed');
}
return { content: [{ type: 'text', text: '{}' }] };
};
await mgr.callTool('chroma_list_collections', { limit: 1 });
expect(transportInstances.length).toBe(2);
// The first transport's pid must have been signaled by killProcessTree
// before the second transport spawned.
expect(killTreeCalls).toContain(firstPid);
});
it('stop() disposes state including any pending connecting promise', async () => {
const mgr = ChromaMcpManager.getInstance();
await mgr.callTool('chroma_list_collections', { limit: 1 });
expect(transportInstances.length).toBe(1);
const subprocessPid = transportInstances[0]._process.pid;
await mgr.stop();
// After stop(), every internal handle should be cleared and the prior
// subprocess tree must have been signaled.
expect(killTreeCalls).toContain(subprocessPid);
// A subsequent ensureConnected must spawn a fresh transport (not reuse
// a stale one).
await mgr.callTool('chroma_list_collections', { limit: 1 });
expect(transportInstances.length).toBe(2);
});
});
// Restore the real process.kill once the test module finishes evaluating any
// late-arriving microtasks.
process.on('exit', () => {
process.kill = realProcessKill;
});
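
Review note: the first test above expects five concurrent callers to produce one spawn. One common way to get that behavior is to share a single in-flight promise. The sketch below assumes that approach; the diff only shows that the manager has a `connecting` field, so `SingleFlightConnector`, `spawnCount`, and the injected `spawn` function are hypothetical and this is not a copy of `ensureConnected()`.

```typescript
class SingleFlightConnector {
  private connecting: Promise<void> | null = null;
  private connected = false;
  private spawn: () => Promise<void>;
  spawnCount = 0;

  constructor(spawn: () => Promise<void>) {
    this.spawn = spawn;
  }

  async ensureConnected(): Promise<void> {
    if (this.connected) return;
    if (!this.connecting) {
      // The first caller starts the connection; later callers reuse the
      // same promise. Clear it when settled so a failure can be retried.
      this.connecting = this.connect().finally(() => {
        this.connecting = null;
      });
    }
    return this.connecting;
  }

  private async connect(): Promise<void> {
    this.spawnCount += 1;
    await this.spawn();
    this.connected = true;
  }
}
```

The `connecting` field is assigned synchronously inside the first call, before any `await` yields, so callers that arrive in the same tick cannot start a second spawn.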