diff --git a/docs/public/docs.json b/docs/public/docs.json
index 5707cda7e..baf786283 100644
--- a/docs/public/docs.json
+++ b/docs/public/docs.json
@@ -90,6 +90,13 @@
"openclaw-integration"
]
},
+ {
+ "group": "Hosted Server",
+ "icon": "cloud",
+ "pages": [
+ "hosted-server"
+ ]
+ },
{
"group": "SDK & Embedding",
"icon": "code",
diff --git a/docs/public/hosted-server.mdx b/docs/public/hosted-server.mdx
new file mode 100644
index 000000000..374b77046
--- /dev/null
+++ b/docs/public/hosted-server.mdx
@@ -0,0 +1,251 @@
+---
+title: "Hosted Server (Beta)"
+description: "Remote authenticated MCP recall, usage metering + quotas, and data deletion — how claude-mem's cloud server works today."
+---
+
+# Hosted Server (Beta)
+
+
+**This is early and moving fast.** The hosted server's capture, recall, metering,
+and deletion paths described below are real and tested, but the **UX and developer
+experience around them are still being built** — there's no polished dashboard,
+onboarding flow, or self-serve signup yet. Expect the *plumbing* to be solid and
+the *paving* to be unfinished. Routes, env var names, and the first-key bootstrap
+flow may shift as we wire up the dashboard. Pin a version if you're integrating.
+
+
+The hosted server is the cloud side of claude-mem: a Postgres-backed HTTP service
+(`/v1`) plus a separate BullMQ generation worker. Where the local plugin keeps
+memory in `~/.claude-mem/claude-mem.db` on your machine, the hosted server keeps
+it per **team** and per **project** in Postgres, and exposes it back to any MCP
+client over an authenticated link.
+
+Three capabilities landed together and are documented here:
+
+
+- **Remote authenticated MCP recall.**
+  Paste an authenticated link into Claude Code to recall your cloud memory:
+  read-only, team/project-scoped.
+
+- **Paid-readiness.**
+  Opt-in rate limiting, monthly request/token quotas, and usage metering:
+  the guards a paid tier needs.
+
+- **Data deletion.**
+  Right-to-erasure: forget a single memory, or purge everything captured for a
+  project.
+
+
+
+## The shape of the system
+
+```
+ Claude Code (or any MCP client)
+                │  Authorization: Bearer cm_...
+                ▼
+ ┌──────────────────────────────┐        ┌──────────────────────────┐
+ │ HTTP server (/v1)            │  jobs  │ BullMQ generation worker │
+ │ - auth (api-key mode)        ├───────▶│   claude-mem server      │
+ │ - rate limit / quota / meter │        │   worker start           │
+ │ - REST + /v1/mcp recall      │        │ - provider call          │
+ │ - data deletion              │        │ - writes observations    │
+ └──────────────┬───────────────┘        └────────────┬─────────────┘
+                │                                     │
+                ▼                                     ▼
+     ┌───────────────────────────────────────────────────┐
+     │ Postgres (teams, projects, observations,          │
+     │ agent_events, server_sessions, generation jobs,   │
+     │ api_keys, usage_events, audit_log)                │
+     └───────────────────────────────────────────────────┘
+```
+
+Every row is scoped by `(team_id, project_id)`. An API key carries a **team**
+(always) and an optional **project** scope; that scoping bounds every read,
+write, and delete.
+
+### Authentication
+
+Set `CLAUDE_MEM_AUTH_MODE=api-key` and send `Authorization: Bearer cm_...` on
+every request. Scopes gate access:
+
+- **Read** endpoints (search, context, recall, usage) require `memories:read`.
+- **Write** endpoints (ingest, key issuance, deletion) require `memories:write`.
+
+Keys are stored as SHA-256 hashes in the `api_keys` table; the raw `cm_...` value
+is shown exactly once, at mint time.
+
+## Remote authenticated MCP recall
+
+`/v1/mcp` is a streamable-HTTP [MCP](https://modelcontextprotocol.io) server. It's
+the secure link a user pastes into Claude Code to recall their cloud memory. It is
+**read-only** and authenticated by the same API key as the REST routes
+(`memories:read`); the key's team — and project, if the key is project-scoped —
+bounds every read.
+
+```bash
+claude mcp add --transport http claude-mem https://YOUR_SERVER/v1/mcp \
+ --header "Authorization: Bearer cm_..."
+```
+
+Three tools are exposed, each mirroring an existing REST path:
+
+| Tool | Arguments | Returns |
+|-----------|------------------------------------|---------|
+| `search` | `{ projectId, query, limit? }` | Matching observations (full-text search). |
+| `context` | `{ projectId, query, limit? }` | Observations **plus** a concatenated `context` string ready for prompt injection. |
+| `recent` | `{ projectId, limit? }` | The newest observations for a project. |
+
+
+The transport is **stateless** — one MCP server + transport per request — so it
+needs no session affinity behind a load balancer. Mutating tools are
+intentionally absent: a pasted recall link can never write or delete. Every read
+is written to `audit_log` as an `observation.read` event, the same as
+`POST /v1/search`.
+
+
+## Connecting a client: key issuance + connect
+
+Two routes turn "I have a server" into "Claude Code is recalling my cloud memory":
+
+- **`POST /v1/keys`** (requires `memories:write`) mints a **read-only** API key for
+ the caller's team and returns a paste-ready connect command. The raw key appears
+ **once**. Body: `{ "expiresInDays"?: number }`. Minting requires write scope so a
+ read-only key can't escalate itself into more keys.
+
+ ```json
+ {
+ "id": "...",
+ "apiKey": "cm_...",
+ "scopes": ["memories:read"],
+ "expiresAt": null,
+ "mcpUrl": "https://YOUR_SERVER/v1/mcp",
+ "connectCommand": "claude mcp add --transport http claude-mem https://YOUR_SERVER/v1/mcp --header \"Authorization: Bearer cm_...\""
+ }
+ ```
+
+- **`GET /v1/connect`** (requires `memories:read`) returns the same command with a
+  placeholder where the key belongs, because a GET never mints. The `mcpUrl` is
+  built from `CLAUDE_MEM_PUBLIC_URL` (recommended when behind a proxy or load
+  balancer) or, failing that, the request host.
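+
+The two routes can be exercised from a shell roughly as follows. This is a
+sketch, not an official client: `BASE_URL`, `WRITE_KEY`, the helper names, the
+example hostname, and the 30-day expiry are assumptions made for illustration.
+
+```bash
+# Hypothetical deployment URL and write-scoped key; substitute your own.
+BASE_URL="${BASE_URL:-https://claude-mem.example.com}"
+WRITE_KEY="${WRITE_KEY:-cm_...}"
+
+# Build the MCP URL for a base URL, tolerating one trailing slash.
+mcp_url() {
+  printf '%s/v1/mcp\n' "${1%/}"
+}
+
+# POST /v1/keys: mint a read-only key that expires in 30 days.
+mint_read_key() {
+  curl -sS -X POST "$BASE_URL/v1/keys" \
+    -H "Authorization: Bearer $WRITE_KEY" \
+    -H "Content-Type: application/json" \
+    -d '{"expiresInDays": 30}'
+}
+
+# GET /v1/connect: print the connect command again without minting.
+# $1 is any key that carries memories:read.
+show_connect() {
+  curl -sS "$BASE_URL/v1/connect" -H "Authorization: Bearer $1"
+}
+```
+
+Because the raw key is returned only once, capture the `mint_read_key` output
+somewhere safe before running the `connectCommand` it contains.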
+
+
+**First-key bootstrap is the rough edge.** Minting a team's *very first* key still
+needs a session-gated path (a web dashboard), because `POST /v1/keys` itself
+requires a write-scoped key. better-auth's `apiKey()` plugin exists but writes to
+a different store than the Postgres `api_keys` these routes authenticate against —
+wiring the better-auth org → team mapping is the remaining piece, and the biggest
+part of the devex work still ahead.
+
+
+## Paid-readiness: rate limiting, quotas, metering
+
+These guards run **after** auth and are **opt-in via environment variables**. Unset
+(the default) means no rate limit, no quota, and no metering — behavior is
+identical to a server without them. Every guard **fails open**: a backing-store
+error never blocks a legitimate request.
+
+| Env var | Effect | Response when exceeded |
+|---------|--------|------------------------|
+| `CLAUDE_MEM_RATE_LIMIT_PER_MIN` | Max requests per **API key** per minute. | `429` with `Retry-After` and `X-RateLimit-*` headers. |
+| `CLAUDE_MEM_MONTHLY_REQUEST_CAP` | Max requests per **team** per calendar month (UTC). | `402 quota_exceeded`. |
+| `CLAUDE_MEM_MONTHLY_TOKEN_CAP` | Max provider **tokens** per team per month. Gates **writes only** — reads stay open so a team over budget can still recall. | `402` at the cap. |
+| `CLAUDE_MEM_USAGE_METERING=1` | Records one `request` usage event per authenticated call (fire-and-forget). | — |
+
+Token and observation metering is written to the same `usage_events` table from
+the generation worker, so usage reflects real provider spend, not just HTTP calls.
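+
+As a rough illustration of the table above, a deployment that turns on every
+guard might export something like the following before starting the server. The
+numeric values are placeholders chosen for this sketch, not recommended
+defaults; only the variable names come from this page.
+
+```bash
+# Illustrative values only; tune them to your own plan limits.
+export CLAUDE_MEM_AUTH_MODE=api-key            # API-key auth; the guards run after it
+export CLAUDE_MEM_RATE_LIMIT_PER_MIN=120       # per API key, per minute
+export CLAUDE_MEM_MONTHLY_REQUEST_CAP=100000   # per team, per UTC calendar month
+export CLAUDE_MEM_MONTHLY_TOKEN_CAP=5000000    # provider tokens; gates writes only
+export CLAUDE_MEM_USAGE_METERING=1             # one request usage event per call
+```
+
+Leaving any one of these unset keeps that guard disabled.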
+
+`GET /v1/usage` returns the caller team's per-kind totals for the current month:
+
+```json
+{ "since": "2026-06-01T00:00:00.000Z", "usage": { "request": 1280, "observation": 44 } }
+```
+
+
+"Gates writes only" is deliberate: ingestion is what drives generation, which is
+what costs tokens. A team that blows its token budget can still **read** its
+existing memory — you never lock someone out of their own data over billing.
+
+
+## Data deletion (forget)
+
+Right-to-erasure. Both routes require `memories:write` and are scoped to the
+caller's team. Both write an `audit_log` entry.
+
+- **`DELETE /v1/memories/:id`**: delete a single observation; its
+  `observation_sources` rows are removed by cascade. Returns `404` if no such
+  observation exists for the team. Audited as `observation.deleted`.
+
+- **`DELETE /v1/projects/:projectId/memory`**: purge **all** captured content for
+  a project in one transaction: observations, raw agent events, server sessions,
+  and generation jobs. The project shell (config/membership) is kept so the team
+  can keep using it. Returns per-table `counts`, or `404` if the project
+  doesn't belong to the team. Audited as `project.memory_purged`.
+
+ ```json
+ { "purged": true, "projectId": "...", "counts": { "observations": 42, "agentEvents": 17, "sessions": 3, "jobs": 17 } }
+ ```
+
+
+Deletion is team-scoped at the SQL layer, so a key can only ever erase its own
+team's data — a cross-team or nonexistent `projectId` returns `404` rather than a
+misleading success.
+
+
+## Event generation semantics
+
+Ingestion (`POST /v1/events`) accepts two query flags that control observation
+generation:
+
+- `generate=false` — write the event but do **not** enqueue a generation job.
+- `wait=true` — return the `generationJob` descriptor so callers can poll
+ `GET /v1/jobs/:id` for completion.
+
+Without `wait=true`, the response includes the new event row plus a best-effort
+`generationJob` field. With `wait=true`, that field is always populated; it is
+`null` only when generation was explicitly disabled. The actual provider call
+happens in the separate BullMQ worker (`claude-mem server worker start`), so the
+HTTP path **never blocks** on a provider response.
+
+## Endpoint reference
+
+All endpoints are mounted under `/v1`; legacy worker routes remain under `/api`.
+
+```
+GET /healthz
+GET /v1/info
+GET /v1/projects
+POST /v1/projects
+GET /v1/projects/:id
+POST /v1/sessions/start
+POST /v1/sessions/:id/end
+GET /v1/sessions/:id
+POST /v1/events # ?generate= ?wait=
+POST /v1/events/batch
+GET /v1/events/:id
+POST /v1/memories
+GET /v1/memories/:id
+PATCH /v1/memories/:id
+DELETE /v1/memories/:id # forget one observation
+POST /v1/search
+POST /v1/context
+ALL /v1/mcp # remote authenticated MCP recall
+POST /v1/keys # mint a read-only key (write scope)
+GET /v1/connect # connect command with key placeholder
+GET /v1/usage # current-month usage totals
+DELETE /v1/projects/:projectId/memory # purge a whole project
+GET /v1/audit?projectId=
+```
+
+## What's solid vs. what's coming
+
+
+**Solid today:** Postgres-backed multi-tenant storage, api-key auth with
+read/write scopes, the `/v1/mcp` recall link, opt-in rate limiting + quotas +
+metering, and audited data deletion. All covered by the Postgres-gated e2e suite.
+
+**Still being built (UX / devex):** a web dashboard for the first-key bootstrap and
+key management, self-serve onboarding, a billing/plan UI on top of the metering
+primitives, and a smoother "connect Claude Code to my cloud memory" flow than
+pasting a CLI command. These are the next focus — the primitives above are the
+foundation they'll sit on.
+