diff --git a/docs/public/docs.json b/docs/public/docs.json index 5707cda7e..baf786283 100644 --- a/docs/public/docs.json +++ b/docs/public/docs.json @@ -90,6 +90,13 @@ "openclaw-integration" ] }, + { + "group": "Hosted Server", + "icon": "cloud", + "pages": [ + "hosted-server" + ] + }, { "group": "SDK & Embedding", "icon": "code", diff --git a/docs/public/hosted-server.mdx b/docs/public/hosted-server.mdx new file mode 100644 index 000000000..374b77046 --- /dev/null +++ b/docs/public/hosted-server.mdx @@ -0,0 +1,251 @@ +--- +title: "Hosted Server (Beta)" +description: "Remote authenticated MCP recall, usage metering + quotas, and data deletion — how claude-mem's cloud server works today." +--- + +# Hosted Server (Beta) + + +**This is early and moving fast.** The hosted server's capture, recall, metering, +and deletion paths described below are real and tested, but the **UX and developer +experience around them are still being built** — there's no polished dashboard, +onboarding flow, or self-serve signup yet. Expect the *plumbing* to be solid and +the *paving* to be unfinished. Routes, env var names, and the first-key bootstrap +flow may shift as we wire up the dashboard. Pin a version if you're integrating. + + +The hosted server is the cloud side of claude-mem: a Postgres-backed HTTP service +(`/v1`) plus a separate BullMQ generation worker. Where the local plugin keeps +memory in `~/.claude-mem/claude-mem.db` on your machine, the hosted server keeps +it per **team** and per **project** in Postgres, and exposes it back to any MCP +client over an authenticated link. + +Three capabilities landed together and are documented here: + + + + Paste an authenticated link into Claude Code to recall your cloud memory — + read-only, team/project-scoped. + + + Opt-in rate limiting, monthly request/token quotas, and usage metering — + the guards a paid tier needs. + + + Right-to-erasure: forget a single memory, or purge everything captured for a + project. 
+ + + +## The shape of the system + +``` + Claude Code (or any MCP client) + │ Authorization: Bearer cm_... + ▼ + ┌─────────────────────────────┐ ┌──────────────────────────┐ + │ HTTP server (/v1) │ jobs │ BullMQ generation worker │ + │ - auth (api-key mode) ├───────▶│ claude-mem server │ + │ - rate limit / quota / meter │ │ worker start │ + │ - REST + /v1/mcp recall │ │ - provider call │ + │ - data deletion │ │ - writes observations │ + └──────────────┬───────────────┘ └────────────┬─────────────┘ + │ │ + ▼ ▼ + ┌───────────────────────────────────────────────────┐ + │ Postgres (teams, projects, observations, │ + │ agent_events, server_sessions, generation jobs, │ + │ api_keys, usage_events, audit_log) │ + └───────────────────────────────────────────────────┘ +``` + +Every row is scoped by `(team_id, project_id)`. An API key carries a **team** +(always) and an optional **project** scope; that scoping bounds every read, +write, and delete. + +### Authentication + +Set `CLAUDE_MEM_AUTH_MODE=api-key` and send `Authorization: Bearer ` on every +request. Scopes gate access: + +- **Read** endpoints (search, context, recall, usage) require `memories:read`. +- **Write** endpoints (ingest, key issuance, deletion) require `memories:write`. + +Keys are stored as SHA-256 hashes in the `api_keys` table; the raw `cm_...` value +is shown exactly once, at mint time. + +## Remote authenticated MCP recall + +`/v1/mcp` is a streamable-HTTP [MCP](https://modelcontextprotocol.io) server. It's +the secure link a user pastes into Claude Code to recall their cloud memory. It is +**read-only** and authenticated by the same API key as the REST routes +(`memories:read`); the key's team — and project, if the key is project-scoped — +bounds every read. + +```bash +claude mcp add --transport http claude-mem /v1/mcp \ + --header "Authorization: Bearer cm_..." 
+``` + +Three tools are exposed, each mirroring an existing REST path: + +| Tool | Arguments | Returns | +|-----------|------------------------------------|---------| +| `search` | `{ projectId, query, limit? }` | Matching observations (full-text search). | +| `context` | `{ projectId, query, limit? }` | Observations **plus** a concatenated `context` string ready for prompt injection. | +| `recent` | `{ projectId, limit? }` | The newest observations for a project. | + + +The transport is **stateless** — one MCP server + transport per request — so it +needs no session affinity behind a load balancer. Mutating tools are +intentionally absent: a pasted recall link can never write or delete. Every read +is written to `audit_log` as an `observation.read` event, the same as +`POST /v1/search`. + + +## Connecting a client: key issuance + connect + +Two routes turn "I have a server" into "Claude Code is recalling my cloud memory": + +- **`POST /v1/keys`** (requires `memories:write`) mints a **read-only** API key for + the caller's team and returns a paste-ready connect command. The raw key appears + **once**. Body: `{ "expiresInDays"?: number }`. Minting requires write scope so a + read-only key can't escalate itself into more keys. + + ```json + { + "id": "...", + "apiKey": "cm_...", + "scopes": ["memories:read"], + "expiresAt": null, + "mcpUrl": "https:///v1/mcp", + "connectCommand": "claude mcp add --transport http claude-mem https:///v1/mcp --header \"Authorization: Bearer cm_...\"" + } + ``` + +- **`GET /v1/connect`** (requires `memories:read`) returns the same command with a + `` placeholder — a GET never mints. The `mcpUrl` is built from + `CLAUDE_MEM_PUBLIC_URL` (recommended when behind a proxy or load balancer) or, + failing that, the request host. + + +**First-key bootstrap is the rough edge.** Minting a team's *very first* key still +needs a session-gated path (a web dashboard), because `POST /v1/keys` itself +requires a write-scoped key. 
better-auth's `apiKey()` plugin exists but writes to +a different store than the Postgres `api_keys` these routes authenticate against — +wiring the better-auth org → team mapping is the remaining piece, and the biggest +part of the devex work still ahead. + + +## Paid-readiness: rate limiting, quotas, metering + +These guards run **after** auth and are **opt-in via environment variables**. Unset +(the default) means no rate limit, no quota, and no metering — behavior is +identical to a server without them. Every guard **fails open**: a backing-store +error never blocks a legitimate request. + +| Env var | Effect | Response when exceeded | +|---------|--------|------------------------| +| `CLAUDE_MEM_RATE_LIMIT_PER_MIN` | Max requests per **API key** per minute. | `429` with `Retry-After` and `X-RateLimit-*` headers. | +| `CLAUDE_MEM_MONTHLY_REQUEST_CAP` | Max requests per **team** per calendar month (UTC). | `402 quota_exceeded`. | +| `CLAUDE_MEM_MONTHLY_TOKEN_CAP` | Max provider **tokens** per team per month. Gates **writes only** — reads stay open so a team over budget can still recall. | `402` at the cap. | +| `CLAUDE_MEM_USAGE_METERING=1` | Records one `request` usage event per authenticated call (fire-and-forget). | — | + +Token and observation metering is written to the same `usage_events` table from +the generation worker, so usage reflects real provider spend, not just HTTP calls. + +`GET /v1/usage` returns the caller team's per-kind totals for the current month: + +```json +{ "since": "2026-06-01T00:00:00.000Z", "usage": { "request": 1280, "observation": 44 } } +``` + + +"Gates writes only" is deliberate: ingestion is what drives generation, which is +what costs tokens. A team that blows its token budget can still **read** its +existing memory — you never lock someone out of their own data over billing. + + +## Data deletion (forget) + +Right-to-erasure. Both routes require `memories:write` and are scoped to the +caller's team. 
Both write an `audit_log` entry. + +- **`DELETE /v1/memories/:id`** — delete a single observation; its + `observation_sources` cascade. Returns `404` if no such observation exists for + the team. Audited as `observation.deleted`. + +- **`DELETE /v1/projects/:projectId/memory`** — purge **all** captured content for + a project in one transaction: observations, raw agent events, server sessions, + and generation jobs. The project shell (config/membership) is kept so the team + can keep using it. Returns per-table `counts`. Returns `404` if the project + doesn't belong to the team. Audited as `project.memory_purged`. + + ```json + { "purged": true, "projectId": "...", "counts": { "observations": 42, "agentEvents": 17, "sessions": 3, "jobs": 17 } } + ``` + + +Deletion is team-scoped at the SQL layer, so a key can only ever erase its own +team's data — a cross-team or nonexistent `projectId` returns `404` rather than a +misleading success. + + +## Event generation semantics + +Ingestion (`POST /v1/events`) accepts two query flags that control observation +generation: + +- `generate=false` — write the event but do **not** enqueue a generation job. +- `wait=true` — return the `generationJob` descriptor so callers can poll + `GET /v1/jobs/:id` for completion. + +Without `wait=true`, the response includes the new event row plus a best-effort +`generationJob` field. With `wait=true`, that field is always populated (or `null` +only when generation was explicitly disabled). The actual provider call happens in +the separate BullMQ worker (`claude-mem server worker start`) — the HTTP path +**never blocks** on a provider response. + +## Endpoint reference + +All endpoints are mounted under `/v1`; legacy worker routes remain under `/api`. 
+ +``` +GET /healthz +GET /v1/info +GET /v1/projects +POST /v1/projects +GET /v1/projects/:id +POST /v1/sessions/start +POST /v1/sessions/:id/end +GET /v1/sessions/:id +POST /v1/events # ?generate= ?wait= +POST /v1/events/batch +GET /v1/events/:id +POST /v1/memories +GET /v1/memories/:id +PATCH /v1/memories/:id +DELETE /v1/memories/:id # forget one observation +POST /v1/search +POST /v1/context +ALL /v1/mcp # remote authenticated MCP recall +POST /v1/keys # mint a read-only key (write scope) +GET /v1/connect # connect command with key placeholder +GET /v1/usage # current-month usage totals +DELETE /v1/projects/:projectId/memory # purge a whole project +GET /v1/audit?projectId= +``` + +## What's solid vs. what's coming + + +**Solid today:** Postgres-backed multi-tenant storage, api-key auth with +read/write scopes, the `/v1/mcp` recall link, opt-in rate limiting + quotas + +metering, and audited data deletion. All covered by the Postgres-gated e2e suite. + +**Still being built (UX / devex):** a web dashboard for the first-key bootstrap and +key management, self-serve onboarding, a billing/plan UI on top of the metering +primitives, and a smoother "connect Claude Code to my cloud memory" flow than +pasting a CLI command. These are the next focus — the primitives above are the +foundation they'll sit on. +
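The rate-limit and quota responses listed under "Paid-readiness" (`429` with `Retry-After`, `402 quota_exceeded`) are something a client has to branch on. The following is a minimal client-side sketch of that branching, not code from claude-mem: the `GuardOutcome` type and `classifyGuardResponse` function are hypothetical names, and it assumes `Retry-After` carries a number of seconds and falls back to a one-second wait otherwise.

```typescript
// Hypothetical client-side helper; these names are illustrative and not part of claude-mem.
type GuardOutcome =
  | { kind: "ok" }
  | { kind: "retry"; afterMs: number } // 429: rate limit hit, wait and retry
  | { kind: "quota_exceeded" }         // 402: monthly request or token cap reached
  | { kind: "error"; status: number }; // anything else is left to the caller

// Assumed fallback wait when Retry-After is missing or not a plain number of seconds.
const DEFAULT_RETRY_MS = 1000;

function classifyGuardResponse(status: number, retryAfter: string | null): GuardOutcome {
  if (status >= 200 && status < 300) {
    return { kind: "ok" };
  }
  if (status === 429) {
    const raw = retryAfter === null ? "" : retryAfter.trim();
    const seconds = raw === "" ? NaN : Number(raw);
    const afterMs =
      Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : DEFAULT_RETRY_MS;
    return { kind: "retry", afterMs };
  }
  if (status === 402) {
    // Retrying will not help until the month rolls over or the cap is raised.
    return { kind: "quota_exceeded" };
  }
  return { kind: "error", status };
}
```

A caller would pass `response.status` and `response.headers.get("Retry-After")` from a `fetch` call, sleep for `afterMs` on `retry`, and surface `quota_exceeded` to the user rather than retrying. Because the token cap gates writes only, a client that sees `402` on ingestion can reasonably keep issuing reads.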