docs: add Hosted Server (Beta) page — MCP recall, paid-readiness, data deletion

Documents the cloud server's current state across the three merged features
(#3070/#3078/#3087): remote authenticated /v1/mcp recall, opt-in rate
limiting/quotas/usage metering, and audited data deletion. Includes the
explicit caveat that the UX/devex flow (dashboard, first-key bootstrap,
onboarding, billing UI) is still being built.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Alex Newman
2026-06-29 19:46:58 -07:00
parent 482b4f0a15
commit 5def052993
2 changed files with 258 additions and 0 deletions

@@ -90,6 +90,13 @@
"openclaw-integration"
]
},
{
"group": "Hosted Server",
"icon": "cloud",
"pages": [
"hosted-server"
]
},
{
"group": "SDK & Embedding",
"icon": "code",

@@ -0,0 +1,251 @@
---
title: "Hosted Server (Beta)"
description: "Remote authenticated MCP recall, usage metering + quotas, and data deletion — how claude-mem's cloud server works today."
---
# Hosted Server (Beta)
<Warning>
**This is early and moving fast.** The hosted server's capture, recall, metering,
and deletion paths described below are real and tested, but the **UX and developer
experience around them are still being built** — there's no polished dashboard,
onboarding flow, or self-serve signup yet. Expect the *plumbing* to be solid and
the *paving* to be unfinished. Routes, env var names, and the first-key bootstrap
flow may shift as we wire up the dashboard. Pin a version if you're integrating.
</Warning>
The hosted server is the cloud side of claude-mem: a Postgres-backed HTTP service
(`/v1`) plus a separate BullMQ generation worker. Where the local plugin keeps
memory in `~/.claude-mem/claude-mem.db` on your machine, the hosted server keeps
it per **team** and per **project** in Postgres, and exposes it back to any MCP
client over an authenticated link.
Three capabilities landed together and are documented here:
<CardGroup cols={3}>
<Card title="Remote MCP recall" icon="plug">
Paste an authenticated link into Claude Code to recall your cloud memory —
read-only, team/project-scoped.
</Card>
<Card title="Paid-readiness" icon="gauge">
Opt-in rate limiting, monthly request/token quotas, and usage metering —
the guards a paid tier needs.
</Card>
<Card title="Data deletion" icon="trash">
Right-to-erasure: forget a single memory, or purge everything captured for a
project.
</Card>
</CardGroup>
## The shape of the system
```
Claude Code (or any MCP client)
               │  Authorization: Bearer cm_...
┌──────────────────────────────┐        ┌──────────────────────────┐
│ HTTP server (/v1)            │  jobs  │ BullMQ generation worker │
│ - auth (api-key mode)        ├───────▶│ claude-mem server        │
│ - rate limit / quota / meter │        │ worker start             │
│ - REST + /v1/mcp recall      │        │ - provider call          │
│ - data deletion              │        │ - writes observations    │
└──────────────┬───────────────┘        └────────────┬─────────────┘
               │                                     │
               ▼                                     ▼
        ┌───────────────────────────────────────────────────┐
        │  Postgres (teams, projects, observations,         │
        │  agent_events, server_sessions, generation jobs,  │
        │  api_keys, usage_events, audit_log)               │
        └───────────────────────────────────────────────────┘
```
Every row is scoped by `(team_id, project_id)`. An API key carries a **team**
(always) and an optional **project** scope; that scoping bounds every read,
write, and delete.
### Authentication
Set `CLAUDE_MEM_AUTH_MODE=api-key` and send `Authorization: Bearer <key>` on every
request. Scopes gate access:
- **Read** endpoints (search, context, recall, usage) require `memories:read`.
- **Write** endpoints (ingest, key issuance, deletion) require `memories:write`.
Keys are stored as SHA-256 hashes in the `api_keys` table; the raw `cm_...` value
is shown exactly once, at mint time.
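
The hashing and scope rules above can be sketched in a few lines. This is a minimal illustration, not the server's code: `API_KEYS`, `register_key`, and `authorize` are hypothetical names, the dict stands in for the Postgres `api_keys` table, and hex encoding of the digest is an assumption.

```python
import hashlib

# Hypothetical in-memory stand-in for the Postgres `api_keys` table:
# maps the SHA-256 hex digest of a key to its team, optional project, and scopes.
API_KEYS = {}


def hash_key(raw_key):
    """Return the SHA-256 hex digest of a raw key (hex encoding is an assumption)."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def register_key(raw_key, team_id, scopes, project_id=None):
    """Store only the hash; the raw value is never persisted."""
    API_KEYS[hash_key(raw_key)] = {
        "team_id": team_id,
        "project_id": project_id,
        "scopes": set(scopes),
    }


def authorize(authorization_header, required_scope):
    """Return the key record if the bearer token is known and carries the scope."""
    prefix = "Bearer "
    if not authorization_header or not authorization_header.startswith(prefix):
        return None
    record = API_KEYS.get(hash_key(authorization_header[len(prefix):]))
    if record is None or required_scope not in record["scopes"]:
        return None
    return record
```

Because only the digest is stored, a lost key cannot be recovered from the table; it has to be replaced with a newly minted one.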
## Remote authenticated MCP recall
`/v1/mcp` is a streamable-HTTP [MCP](https://modelcontextprotocol.io) server. It's
the secure link a user pastes into Claude Code to recall their cloud memory. It is
**read-only** and authenticated by the same API key as the REST routes
(`memories:read`); the key's team — and project, if the key is project-scoped —
bounds every read.
```bash
claude mcp add --transport http claude-mem <server-base>/v1/mcp \
--header "Authorization: Bearer cm_..."
```
Three tools are exposed, each mirroring an existing REST path:
| Tool | Arguments | Returns |
|-----------|------------------------------------|---------|
| `search` | `{ projectId, query, limit? }` | Matching observations (full-text search). |
| `context` | `{ projectId, query, limit? }` | Observations **plus** a concatenated `context` string ready for prompt injection. |
| `recent` | `{ projectId, limit? }` | The newest observations for a project. |
<Note>
The transport is **stateless**: each request gets its own MCP server and
transport, so it needs no session affinity behind a load balancer. Mutating tools
are intentionally absent, so a pasted recall link can never write or delete. Every
read is written to `audit_log` as an `observation.read` event, the same as
`POST /v1/search`.
</Note>
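
For clients that are not Claude Code, a tool invocation is an ordinary MCP `tools/call` JSON-RPC message sent to `/v1/mcp`. The sketch below only builds the request and does not send it. `build_tool_call` is a hypothetical helper, and whether this server accepts a tool call without a prior `initialize` exchange is an assumption to verify against your deployment.

```python
import json
import urllib.request


def build_tool_call(base_url, api_key, tool, arguments, request_id=1):
    """Build (but do not send) a JSON-RPC 2.0 `tools/call` request for /v1/mcp."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/mcp",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Streamable HTTP clients advertise both response formats.
            "Accept": "application/json, text/event-stream",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; the reply may arrive as plain JSON or as a server-sent event stream, depending on the server.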
## Connecting a client: key issuance + connect
Two routes turn "I have a server" into "Claude Code is recalling my cloud memory":
- **`POST /v1/keys`** (requires `memories:write`) mints a **read-only** API key for
the caller's team and returns a paste-ready connect command. The raw key appears
**once**. Body: `{ "expiresInDays"?: number }`. Minting requires write scope so a
read-only key can't escalate itself into more keys.
```json
{
"id": "...",
"apiKey": "cm_...",
"scopes": ["memories:read"],
"expiresAt": null,
"mcpUrl": "https://<host>/v1/mcp",
"connectCommand": "claude mcp add --transport http claude-mem https://<host>/v1/mcp --header \"Authorization: Bearer cm_...\""
}
```
- **`GET /v1/connect`** (requires `memories:read`) returns the same command with a
`<YOUR_API_KEY>` placeholder — a GET never mints. The `mcpUrl` is built from
`CLAUDE_MEM_PUBLIC_URL` (recommended when behind a proxy or load balancer) or,
failing that, the request host.
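
The URL fallback and the two command variants described above can be illustrated as follows. This is a sketch under stated assumptions: `mcp_url` and `connect_command` are hypothetical names, and the `https://` scheme used for the request-host fallback is a guess.

```python
def mcp_url(request_host, public_url=None):
    """Prefer the configured public URL; fall back to the request host."""
    # Assumption: the fallback uses https; the real server may derive the scheme.
    base = public_url.rstrip("/") if public_url else f"https://{request_host}"
    return f"{base}/v1/mcp"


def connect_command(url, api_key="<YOUR_API_KEY>"):
    """Return the paste-ready command; the default mirrors GET /v1/connect."""
    return (
        f"claude mcp add --transport http claude-mem {url} "
        f'--header "Authorization: Bearer {api_key}"'
    )
```

Keeping the placeholder as the default argument reflects the rule that only the minting route ever places a real key in the command.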
<Warning>
**First-key bootstrap is the rough edge.** Minting a team's *very first* key still
needs a session-gated path (a web dashboard), because `POST /v1/keys` itself
requires a write-scoped key. better-auth's `apiKey()` plugin exists but writes to
a different store than the Postgres `api_keys` these routes authenticate against —
wiring the better-auth org → team mapping is the remaining piece, and the biggest
part of the devex work still ahead.
</Warning>
## Paid-readiness: rate limiting, quotas, metering
These guards run **after** auth and are **opt-in via environment variables**. Unset
(the default) means no rate limit, no quota, and no metering — behavior is
identical to a server without them. Every guard **fails open**: a backing-store
error never blocks a legitimate request.
| Env var | Effect | Response when exceeded |
|---------|--------|------------------------|
| `CLAUDE_MEM_RATE_LIMIT_PER_MIN` | Max requests per **API key** per minute. | `429` with `Retry-After` and `X-RateLimit-*` headers. |
| `CLAUDE_MEM_MONTHLY_REQUEST_CAP` | Max requests per **team** per calendar month (UTC). | `402 quota_exceeded`. |
| `CLAUDE_MEM_MONTHLY_TOKEN_CAP` | Max provider **tokens** per team per month. Gates **writes only** — reads stay open so a team over budget can still recall. | `402` at the cap. |
| `CLAUDE_MEM_USAGE_METERING=1` | Records one `request` usage event per authenticated call (fire-and-forget). | — |
Token and observation metering is written to the same `usage_events` table from
the generation worker, so usage reflects real provider spend, not just HTTP calls.
`GET /v1/usage` returns the caller team's per-kind totals for the current month:
```json
{ "since": "2026-06-01T00:00:00.000Z", "usage": { "request": 1280, "observation": 44 } }
```
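
To make the response shape concrete, here is one way the per-kind totals could be derived from usage events. The event fields (`team_id`, `kind`, `at`, `quantity`) and the function names are hypothetical; only the `since` and `usage` keys come from the response above.

```python
from collections import Counter
from datetime import datetime, timezone


def month_start(now):
    """First instant of the current calendar month in UTC."""
    utc_now = now.astimezone(timezone.utc)
    return utc_now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def usage_totals(events, team_id, now):
    """Sum usage per kind for one team since the start of the month."""
    since = month_start(now)
    totals = Counter()
    for event in events:
        if event["team_id"] == team_id and event["at"] >= since:
            totals[event["kind"]] += event.get("quantity", 1)
    stamp = since.strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {"since": stamp, "usage": dict(totals)}
```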
<Note>
"Gates writes only" is deliberate: ingestion is what drives generation, which is
what costs tokens. A team that blows its token budget can still **read** its
existing memory — you never lock someone out of their own data over billing.
</Note>
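
The guard behavior in this section (opt-in, fail-open, token cap on writes only) can be sketched as a single decision function. This is an illustration under assumptions, not the server's implementation: `CounterStore` and `check_guards` are hypothetical, the fixed one-minute window is a guess at the rate-limit algorithm, and the `X-RateLimit-*` headers are omitted.

```python
import time


class CounterStore:
    """Hypothetical in-memory counters; a real deployment needs a shared store."""

    def __init__(self):
        self.counts = {}

    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def get(self, key):
        return self.counts.get(key, 0)


def check_guards(store, env, api_key_id, team_id, is_write, now=None):
    """Return (status, headers); 200 means the request may proceed."""
    now = time.time() if now is None else now
    month = time.strftime("%Y-%m", time.gmtime(now))  # calendar month in UTC
    try:
        per_min = env.get("CLAUDE_MEM_RATE_LIMIT_PER_MIN")
        if per_min:
            window = int(now // 60)
            if store.incr(f"rate:{api_key_id}:{window}") > int(per_min):
                return 429, {"Retry-After": str(60 - int(now % 60))}

        request_cap = env.get("CLAUDE_MEM_MONTHLY_REQUEST_CAP")
        if request_cap:
            if store.incr(f"requests:{team_id}:{month}") > int(request_cap):
                return 402, {}

        token_cap = env.get("CLAUDE_MEM_MONTHLY_TOKEN_CAP")
        if token_cap and is_write:
            # Reads skip this check, so an over-budget team can still recall.
            if store.get(f"tokens:{team_id}:{month}") >= int(token_cap):
                return 402, {}
    except Exception:
        # Fail open: a store error must not block a legitimate request.
        return 200, {}
    return 200, {}
```

With an empty `env`, every branch is skipped and the function always returns `200`, matching the unset default.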
## Data deletion (forget)
Right-to-erasure. Both routes require `memories:write` and are scoped to the
caller's team. Both write an `audit_log` entry.
- **`DELETE /v1/memories/:id`** — delete a single observation; its
`observation_sources` cascade. Returns `404` if no such observation exists for
the team. Audited as `observation.deleted`.
- **`DELETE /v1/projects/:projectId/memory`** — purge **all** captured content for
a project in one transaction: observations, raw agent events, server sessions,
and generation jobs. The project shell (config/membership) is kept so the team
can keep using it. Returns per-table `counts`. Returns `404` if the project
doesn't belong to the team. Audited as `project.memory_purged`.
```json
{ "purged": true, "projectId": "...", "counts": { "observations": 42, "agentEvents": 17, "sessions": 3, "jobs": 17 } }
```
<Note>
Deletion is team-scoped at the SQL layer, so a key can only ever erase its own
team's data — a cross-team or nonexistent `projectId` returns `404` rather than a
misleading success.
</Note>
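
The project purge can be pictured as one team-scoped transaction. The sketch below uses Python's built-in `sqlite3` as a stand-in for Postgres so it runs anywhere. The column layout, the `generation_jobs` table name, the `None` body on `404`, and the function names are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical table names keyed by the labels used in `counts`.
PURGE_TABLES = {
    "observations": "observations",
    "agentEvents": "agent_events",
    "sessions": "server_sessions",
    "jobs": "generation_jobs",
}


def create_schema(conn):
    """Create a minimal stand-in schema for the example."""
    conn.execute("CREATE TABLE projects (id TEXT, team_id TEXT)")
    conn.execute("CREATE TABLE audit_log (team_id TEXT, project_id TEXT, action TEXT)")
    for table in PURGE_TABLES.values():
        conn.execute(f"CREATE TABLE {table} (team_id TEXT, project_id TEXT)")


def purge_project(conn, team_id, project_id):
    """Delete all captured content for a project; return (status, body)."""
    owned = conn.execute(
        "SELECT 1 FROM projects WHERE id = ? AND team_id = ?",
        (project_id, team_id),
    ).fetchone()
    if owned is None:
        return 404, None  # cross-team and nonexistent projects look the same
    counts = {}
    with conn:  # one transaction: commits on success, rolls back on error
        for label, table in PURGE_TABLES.items():
            cursor = conn.execute(
                f"DELETE FROM {table} WHERE team_id = ? AND project_id = ?",
                (team_id, project_id),
            )
            counts[label] = cursor.rowcount
        conn.execute(
            "INSERT INTO audit_log (team_id, project_id, action) VALUES (?, ?, ?)",
            (team_id, project_id, "project.memory_purged"),
        )
    return 200, {"purged": True, "projectId": project_id, "counts": counts}
```

The `projects` row is never touched, which is what keeps the project shell usable after a purge.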
## Event generation semantics
Ingestion (`POST /v1/events`) accepts two query flags that control observation
generation:
- `generate=false` — write the event but do **not** enqueue a generation job.
- `wait=true` — return the `generationJob` descriptor so callers can poll
`GET /v1/jobs/:id` for completion.
Without `wait=true`, the response includes the new event row plus a best-effort
`generationJob` field. With `wait=true`, that field is always populated; it is
`null` only when generation was explicitly disabled. The actual provider call
happens in the separate BullMQ worker (`claude-mem server worker start`), so the
HTTP path **never blocks** on a provider response.
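
A client-side sketch of the two flags and the polling loop follows. `events_url` and `poll_job` are hypothetical helpers, `fetch_job` is a caller-supplied function wrapping `GET /v1/jobs/:id`, and the `status` field with its `completed` and `failed` values is an assumption about the job descriptor.

```python
import time
from urllib.parse import urlencode


def events_url(base_url, generate=True, wait=False):
    """Build the POST /v1/events URL with the two query flags."""
    params = {}
    if not generate:
        params["generate"] = "false"
    if wait:
        params["wait"] = "true"
    query = f"?{urlencode(params)}" if params else ""
    return f"{base_url.rstrip('/')}/v1/events{query}"


def poll_job(fetch_job, job_id, attempts=10, delay=0.5):
    """Poll until the job reports a terminal status or attempts run out."""
    for _ in range(attempts):
        job = fetch_job(job_id)
        # Hypothetical terminal states; check the real descriptor's fields.
        if job.get("status") in ("completed", "failed"):
            return job
        time.sleep(delay)
    return None
```

Bounding the number of attempts keeps a stalled worker from hanging the caller indefinitely.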
## Endpoint reference
All API endpoints are mounted under `/v1` (the `/healthz` health check sits at the
root); legacy worker routes remain under `/api`.
```
GET /healthz
GET /v1/info
GET /v1/projects
POST /v1/projects
GET /v1/projects/:id
POST /v1/sessions/start
POST /v1/sessions/:id/end
GET /v1/sessions/:id
POST /v1/events # ?generate= ?wait=
POST /v1/events/batch
GET /v1/events/:id
POST /v1/memories
GET /v1/memories/:id
PATCH /v1/memories/:id
DELETE /v1/memories/:id # forget one observation
POST /v1/search
POST /v1/context
ALL /v1/mcp # remote authenticated MCP recall
POST /v1/keys # mint a read-only key (write scope)
GET /v1/connect # connect command with key placeholder
GET /v1/usage # current-month usage totals
DELETE /v1/projects/:projectId/memory # purge a whole project
GET /v1/audit?projectId=<id>
```
## What's solid vs. what's coming
<Note>
**Solid today:** Postgres-backed multi-tenant storage, api-key auth with
read/write scopes, the `/v1/mcp` recall link, opt-in rate limiting + quotas +
metering, and audited data deletion. All covered by the Postgres-gated e2e suite.

**Still being built (UX / devex):** a web dashboard for the first-key bootstrap and
key management, self-serve onboarding, a billing/plan UI on top of the metering
primitives, and a smoother "connect Claude Code to my cloud memory" flow than
pasting a CLI command. These are the next focus; the primitives above are the
foundation they'll sit on.
</Note>