Restructure into plugins/{claude-code,codex,copilot}/ — one engine, three thin
shells, all calling the shared plugins/run-sleep.sh -> python -m skillopt_sleep.
- claude-code/: existing plugin moved here; runner delegates to the shared
launcher (fixes repo-root resolution after the move).
- codex/: ~/.codex/prompts/sleep.md custom prompt + ~/.agents/skills SKILL.md +
install.sh + AGENTS.md hint — Codex's documented, stable extension surfaces.
- copilot/: a stdlib-only MCP server (mcp_server.py) exposing sleep_* tools,
plus mcp-config.example.json and a copilot-instructions snippet. Verified end
to end (initialize -> tools/list -> tools/call returns real engine output).
- plugins/README.md overview table; main README News + a dedicated SkillOpt-Sleep
section; pyproject lists skillopt_sleep as a first-class package.
Decoupling emphasized throughout: open-source tool (skillopt_sleep/) with zero
dependency on the research package. 29 tests pass; all three shells resolve.
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
SkillOpt-Sleep — plugins for Claude Code, Codex, and Copilot
One engine, three thin shells. SkillOpt-Sleep gives a local coding agent a nightly sleep cycle: it reviews your past sessions offline, replays your recurring tasks on your own API budget, and consolidates what it learns into validated long-term memory and skills — behind a held-out gate, staged for your review. Your agent gets better the more you use it, with no model-weight training.
It synthesizes three ideas: SkillOpt (validation-gated bounded text optimization — the research in this repo), Claude Dreams (offline memory consolidation; input never mutated; review-then-adopt), and the agent sleep literature (short-term experience → long-term competence).
This is an open-source tool, decoupled from the research code. The engine lives in the top-level
skillopt_sleep/ package and has zero dependency on the paper's skillopt/ experiment package (the validation gate is vendored). You can ship/use it without the research stack.
The three integrations
| Platform | Folder | Mechanism | Status |
|---|---|---|---|
| Claude Code | claude-code/ | .claude-plugin + /sleep command + skill + hooks | full, installable |
| Codex | codex/ | ~/.codex/prompts/sleep.md + ~/.agents/skills + AGENTS.md | full |
| Copilot | copilot/ | MCP server (sleep_* tools) + copilot-instructions | full (MCP) |
All three call the same plugins/run-sleep.sh → python -m skillopt_sleep, so behaviour is identical everywhere. Per-platform setup is in
each folder's README.
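As a rough illustration of the "thin shell" idea, the sketch below builds the command line that a shell would hand to the shared engine. It is not the code in plugins/run-sleep.sh. The helper name `build_engine_command` is hypothetical, and passing arguments through unchanged is an assumption; only the `python -m skillopt_sleep` entry point comes from the text above.

```python
import sys
from typing import List, Sequence


def build_engine_command(args: Sequence[str], python: str = sys.executable) -> List[str]:
    """Return the argv a thin shell might run to reach the shared engine.

    Assumption: each shell forwards its arguments unchanged to
    `python -m skillopt_sleep`; the real launcher may add or rewrite flags.
    """
    return [python, "-m", "skillopt_sleep", *args]


if __name__ == "__main__":
    # Print the command instead of executing it, so the sketch also runs
    # on machines where the skillopt_sleep package is not installed.
    print(" ".join(build_engine_command(sys.argv[1:])))
```

Because every platform would funnel into one command builder like this, any difference between Claude Code, Codex, and Copilot stays in the wrapper and never reaches the engine.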
Quick start (Claude Code)
git clone <repo-url> && cd SkillOpt-Sleep
# Then, inside a Claude Code session:
/plugin marketplace add ./plugins/claude-code
/plugin install skillopt-sleep@skillopt-sleep
/sleep status
Codex: bash plugins/codex/install.sh.
Copilot: register plugins/copilot/mcp_server.py as an MCP server.
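To sanity-check an MCP registration by hand, it can help to see the JSON-RPC 2.0 messages a client sends during the usual initialize, tools/list, tools/call sequence. The sketch below only builds those messages. The tool name `sleep_status`, the client name, and the protocol version string are assumptions for illustration; the actual tool names are whatever mcp_server.py reports from tools/list.

```python
import json
from typing import Any, Dict, List, Optional


def jsonrpc_request(req_id: int, method: str,
                    params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line of JSON."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def handshake_messages(tool_name: str, arguments: Dict[str, Any]) -> List[str]:
    """Build the message sequence for initialize -> tools/list -> tools/call.

    Assumptions: the protocol version and client info below are placeholders,
    and `tool_name` is hypothetical until confirmed via tools/list.
    """
    return [
        jsonrpc_request(1, "initialize", {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.0"},
        }),
        # Notifications carry no "id" because no response is expected.
        json.dumps({"jsonrpc": "2.0", "method": "notifications/initialized"}),
        jsonrpc_request(2, "tools/list"),
        jsonrpc_request(3, "tools/call",
                        {"name": tool_name, "arguments": arguments}),
    ]


if __name__ == "__main__":
    for line in handshake_messages("sleep_status", {}):
        print(line)
```

With a stdio-based server, each printed line would be written to the server's standard input, one message per line, and the responses read back from its standard output.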
What one "night" does
harvest ~/.claude (or session) transcripts → mine recurring tasks → replay offline
→ consolidate (reflect → bounded edit → GATE on real held-out tasks)
→ stage proposal → (you) adopt
Nothing live changes until you adopt; every adopt backs up first.
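The last two stages, the held-out gate and the backup-before-adopt rule, can be sketched as follows. This is a minimal illustration, not the engine's implementation: `passes_gate`, `adopt`, the `.bak` suffix, and the "strictly better mean score" criterion are all assumptions, and the scoring function is supplied by the caller.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional, Sequence


def passes_gate(score: Callable[[str, str], float], current: str,
                candidate: str, held_out: Sequence[str]) -> bool:
    """Accept a candidate skill text only if it beats the current one.

    Assumption: the gate compares mean scores on held-out tasks and
    requires a strict improvement; the real criterion may differ.
    """
    if not held_out:
        return False  # no held-out evidence, so nothing is accepted
    current_mean = sum(score(current, task) for task in held_out) / len(held_out)
    candidate_mean = sum(score(candidate, task) for task in held_out) / len(held_out)
    return candidate_mean > current_mean


def adopt(staged: Path, live: Path) -> Optional[Path]:
    """Copy a staged proposal over the live file, backing up the live file first.

    Returns the backup path, or None if there was no live file to back up.
    The ".bak" naming is a hypothetical convention for this sketch.
    """
    backup: Optional[Path] = None
    if live.exists():
        backup = live.with_name(live.name + ".bak")
        shutil.copy2(live, backup)
    shutil.copy2(staged, live)
    return backup
```

Keeping the gate (an automatic check) separate from adoption (a manual step) mirrors the flow above: a proposal can pass the gate and still sit staged until you decide to apply it.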
Controls (work on all platforms)
--gate on|off · --rollouts-k K (multi-rollout contrastive reflection) ·
--budget-tokens/--budget-minutes · --preferences "..." · separate
optimizer/target models (--optimizer-model / --target-model) · slow-update
long-term memory. Full guide:
../docs/sleep/CONTROLLABLE_DREAMING.md.
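For readers who want to see the controls side by side, here is a small argparse sketch that mirrors the flags listed above. It is not the engine's real parser: the flag names come from this README, but the types, defaults, and the program name are assumptions, and the slow-update long-term memory control is omitted because no flag for it is named here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser for the documented controls.

    Assumption: all types and defaults are illustrative guesses; consult
    CONTROLLABLE_DREAMING.md for the actual behaviour.
    """
    parser = argparse.ArgumentParser(prog="sleep-controls-sketch")
    parser.add_argument("--gate", choices=["on", "off"], default="on",
                        help="enable or disable the held-out validation gate")
    parser.add_argument("--rollouts-k", type=int, default=1,
                        help="rollouts per task for contrastive reflection")
    parser.add_argument("--budget-tokens", type=int, default=None,
                        help="upper bound on tokens spent in one night")
    parser.add_argument("--budget-minutes", type=float, default=None,
                        help="upper bound on wall-clock minutes")
    parser.add_argument("--preferences", default="",
                        help="free-text guidance for consolidation")
    parser.add_argument("--optimizer-model", default=None,
                        help="model that proposes edits")
    parser.add_argument("--target-model", default=None,
                        help="model the skills are optimized for")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

For example, `build_parser().parse_args(["--gate", "off", "--rollouts-k", "4"])` yields a namespace with `gate == "off"` and `rollouts_k == 4`.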
Does it actually work?
Validated on the public
gbrain-evals skillopt-v1 benchmark
with real models on both Claude and Codex: deficient skills go 0.00 →
1.00 on held-out sets (all 4 seeds incl. a real tool-use loop), cross-model
transfer is positive, and the gate blocks regressions. Full results:
../docs/sleep/FINAL_REPORT.md.
Deterministic proof (no API key):
python -m skillopt_sleep.experiments.run_experiment --persona researcher --assert-improves