# SkillOpt-Sleep (Claude Code plugin) > Give your local Claude agent a **sleep cycle**. Every night it reviews your > past sessions offline, replays your recurring tasks on your own API budget, > and consolidates what it learns into **validated** memory (`CLAUDE.md`) and > skills (`SKILL.md`). Your agent gets better the more you use it — no > model-weight training. SkillOpt-Sleep is the **deployment-time** companion to [SkillOpt](https://github.com/microsoft/SkillOpt). SkillOpt trains a skill offline on a benchmark; SkillOpt-Sleep applies the same discipline to *your own daily usage*: bounded text edits, accepted only through a held-out validation gate, with rejected edits kept as negative feedback. It synthesizes three ideas: | Idea | Contribution | |---|---| | **SkillOpt** | skill/memory = trainable text; bounded add/delete/replace edits; **held-out gate** keeps only changes that help. | | **Claude Dreams** | offline consolidation over past sessions; input never mutated; output **reviewed then adopted**. | | **Agent sleep** | periodic offline replay turns short-term episodes into long-term skill. | ## What it does (one "night") ``` harvest ~/.claude transcripts → mine recurring tasks → replay offline → consolidate (reflect → bounded edit → GATE) → stage proposal → (you) adopt ``` Nothing live is modified until **you** run `/skillopt-sleep adopt` (the Dreams "review, then adopt or discard" contract). Every adopt backs up the prior file first. ## Install **Requirements:** Python ≥ 3.10, and the `claude` CLI (and/or `codex` CLI) on PATH. ```bash # 1) get the code (the plugin ships inside the SkillOpt repo) git clone https://github.com/microsoft/SkillOpt.git cd SkillOpt # 2) add the plugin to Claude Code as a local marketplace /plugin marketplace add ./skillopt-sleep-plugin /plugin install skillopt-sleep@skillopt-sleep # 3) verify /skillopt-sleep status ``` The plugin's bundled runner (`scripts/sleep.sh`) auto-selects a Python ≥ 3.10 interpreter and calls the `skillopt_sleep` engine in the repo. No `pip install` is required for the default `mock` backend or for `claude`/`codex` backends — they shell out to the CLIs you already have. ## Quick start ```bash # from inside any project you use with Claude Code: /skillopt-sleep dry-run # safe preview: what it would learn, no changes staged /skillopt-sleep run # full cycle: stages a reviewed proposal (still no live edits) /skillopt-sleep status # see history + the latest staged proposal /skillopt-sleep adopt # apply the staged proposal to CLAUDE.md / SKILL.md (with backup) ``` Or call the engine directly (Python ≥ 3.10): ```bash python -m skillopt_sleep run --project "$(pwd)" --scope invoked --backend mock python -m skillopt_sleep run --project "$(pwd)" --backend claude # real lift via Claude python -m skillopt_sleep run --project "$(pwd)" --backend codex # real lift via Codex ``` Default backend is **`mock`** — deterministic, no API spend — so you can try the plumbing for free. Switch to `--backend claude` or `--backend codex` for genuine improvement on your own budget. ## Does it actually improve? (real models, public benchmark) SkillOpt-Sleep is validated against [gbrain-evals](https://github.com/garrytan/gbrain-evals)' public `skillopt-v1` suite — the same benchmark gbrain scores its own skill optimizer against. We take a deliberately **deficient** skill and run one sleep night; held-out scoring is done by a local rule judge (no judge-API, no way to grade its own homework). | Backend | Seed | Held-out before → after | Nights | |---|---|---|---| | **Claude (Haiku 4.5)** | brief-writer | **0.00 → 1.00** | 1 | | **Codex** | brief-writer | **0.00 → 1.00** | 2 | Both took a brief-writer with no risks section / no confidence level and, within 1–2 nights, proposed gated edits that lifted the held-out score to perfect — into the protected `LEARNED` block, nothing else touched. The Codex 2-night trace even shows the optimizer **diagnosing its own residual failure** and adding a meta-rule to fix it. Full writeup + reproduction: [`docs/sleep/real_api_results.md`](../docs/sleep/real_api_results.md). Reproduce: ```bash git clone https://github.com/garrytan/gbrain-evals /tmp/gbrain-evals python -m skillopt_sleep.experiments.run_gbrain --backend claude --model haiku \ --seeds brief-writer --data-root /tmp/gbrain-evals/eval/data/skillopt-v1 \ --nights 1 --limit-replay 3 --limit-holdout 3 python -m skillopt_sleep.experiments.run_gbrain --backend codex \ --seeds brief-writer --data-root /tmp/gbrain-evals/eval/data/skillopt-v1 \ --nights 1 --limit-replay 3 --limit-holdout 3 ``` ## Deterministic proof (no API, no keys) ```bash python -m skillopt_sleep.experiments.run_experiment --persona researcher --assert-improves python -m skillopt_sleep.experiments.run_experiment --persona programmer --assert-improves ``` Each prints the held-out score rising from baseline toward 1.0 as the gate accepts the general rules your tasks need, and confirms the gate **rejects** an injected harmful edit. Recorded output: [`docs/sleep/experiment_results.md`](../docs/sleep/experiment_results.md). ## Schedule it nightly ```bash "${CLAUDE_PLUGIN_ROOT}/scripts/install-cron.sh" "$(pwd)" # prints a crontab line; installs nothing ``` ## Safety - **Read-only** harvest of `~/.claude`. `mock` replay has no side effects. - Proposals are **staged**, never auto-applied (unless you opt in with `--auto-adopt`). - Every adopt writes a backup under the staging dir's `backup/`. - Per-night **token/task budget caps**; secrets redacted from prompts. - `fresh` replay (Phase 3) runs only in throwaway git worktrees. ## Status Phase 1 (engine + deterministic experiment + plugin surface) is complete. Phase 3 adds the real-API miner/judge and `fresh` worktree replay. See [`docs/superpowers/specs/2026-06-07-skillopt-sleep-claude-code-plugin-design.md`](../docs/superpowers/specs/2026-06-07-skillopt-sleep-claude-code-plugin-design.md).