Yifan Yang f9db99853b feat(plugins): ship SkillOpt-Sleep for Claude Code, Codex, and Copilot
Restructure into plugins/{claude-code,codex,copilot}/ — one engine, three thin
shells, all calling the shared plugins/run-sleep.sh -> python -m skillopt_sleep.

  - claude-code/: existing plugin moved here; runner delegates to the shared
    launcher (fixes repo-root resolution after the move).
  - codex/: ~/.codex/prompts/sleep.md custom prompt + ~/.agents/skills SKILL.md +
    install.sh + AGENTS.md hint — Codex's documented, stable extension surfaces.
  - copilot/: a stdlib-only MCP server (mcp_server.py) exposing sleep_* tools,
    plus mcp-config.example.json and a copilot-instructions snippet. Verified end
    to end (initialize -> tools/list -> tools/call returns real engine output).
  - plugins/README.md overview table; main README News + a dedicated SkillOpt-Sleep
    section; pyproject lists skillopt_sleep as a first-class package.

Decoupling emphasized throughout: open-source tool (skillopt_sleep/) with zero
dependency on the research package. 29 tests pass; all three shells resolve.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
2026-06-08 14:31:52 +00:00


SkillOpt-Sleep — plugins for Claude Code, Codex, and Copilot

One engine, three thin shells. SkillOpt-Sleep gives a local coding agent a nightly sleep cycle: it reviews your past sessions offline, replays your recurring tasks on your own API budget, and consolidates what it learns into validated long-term memory and skills — behind a held-out gate, staged for your review. Your agent gets better the more you use it, with no model-weight training.

It synthesizes three ideas: SkillOpt (validation-gated bounded text optimization — the research in this repo), Claude Dreams (offline memory consolidation; input never mutated; review-then-adopt), and the agent sleep literature (short-term experience → long-term competence).

This is an open-source tool, decoupled from the research code. The engine lives in the top-level skillopt_sleep/ package and has zero dependency on the paper's skillopt/ experiment package (the validation gate is vendored). You can ship/use it without the research stack.

The three integrations

| Platform | Folder | Mechanism | Status |
|---|---|---|---|
| Claude Code | claude-code/ | .claude-plugin + /sleep command + skill + hooks | full, installable |
| Codex | codex/ | ~/.codex/prompts/sleep.md + ~/.agents/skills + AGENTS.md | full |
| Copilot | copilot/ | MCP server (sleep_* tools) + copilot-instructions | full (MCP) |
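The Copilot row relies on a stdlib-only MCP server that exposes sleep_* tools. MCP's stdio transport carries newline-delimited JSON-RPC 2.0, and the three calls that matter are initialize, tools/list and tools/call. The sketch below shows that shape only; it is not the shipped mcp_server.py. The tool name `sleep_status`, its canned reply, and the pinned protocol version are assumptions. A real server would call the engine and negotiate the version from the client's initialize request.

```python
import json
import sys
from typing import Any, Dict, Optional

PROTOCOL_VERSION = "2024-11-05"  # assumption: pinned here instead of negotiated


def sleep_status(arguments: Dict[str, Any]) -> str:
    # Hypothetical tool: a real one would invoke the engine (python -m skillopt_sleep).
    return "no staged proposals"


TOOLS: Dict[str, Dict[str, Any]] = {
    "sleep_status": {
        "description": "Report the state of the last sleep cycle (hypothetical tool).",
        "inputSchema": {"type": "object", "properties": {}},
        "handler": sleep_status,
    },
}


def handle(message: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return a JSON-RPC response, or None for notifications (messages without an id)."""
    if "id" not in message:
        return None  # e.g. notifications/initialized never gets a reply
    method = message.get("method")
    params = message.get("params") or {}
    if method == "initialize":
        result = {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "skillopt-sleep-sketch", "version": "0.0.0"},
        }
    elif method == "tools/list":
        result = {"tools": [
            {"name": name, "description": spec["description"], "inputSchema": spec["inputSchema"]}
            for name, spec in TOOLS.items()
        ]}
    elif method == "tools/call":
        spec = TOOLS.get(params.get("name"))
        if spec is None:  # unknown tool -> JSON-RPC "invalid params"
            return {"jsonrpc": "2.0", "id": message["id"],
                    "error": {"code": -32602, "message": "unknown tool"}}
        text = spec["handler"](params.get("arguments") or {})
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return {"jsonrpc": "2.0", "id": message["id"],
                "error": {"code": -32601, "message": f"method not found: {method}"}}
    return {"jsonrpc": "2.0", "id": message["id"], "result": result}


def serve(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read one JSON-RPC message per line and write one response per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        response = handle(json.loads(line))
        if response is not None:
            stdout.write(json.dumps(response) + "\n")
            stdout.flush()
```

Keeping `handle` separate from `serve` means the protocol logic can be exercised without a client. A script would call `serve()` from its entry point.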

All three call the same plugins/run-sleep.sh → python -m skillopt_sleep, so behaviour is identical everywhere. Per-platform setup is in each folder's README.
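The shared launcher is the shell script plugins/run-sleep.sh. The Python rendering below is only a sketch of the same idea: locate the repo root, then forward arguments unchanged to one engine entry point. The commit notes that moving the plugin required fixing repo-root resolution. Here the root is assumed to be the first ancestor containing a skillopt_sleep/ directory, which is a hypothetical rule, not necessarily the script's.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def find_repo_root(start: Path, marker: str = "skillopt_sleep") -> Optional[Path]:
    """Walk up from `start` until a directory containing `marker/` is found."""
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_dir():
            return candidate
    return None


def build_command(args: List[str]) -> List[str]:
    """Every shell forwards its arguments unchanged to the same engine module."""
    return [sys.executable, "-m", "skillopt_sleep", *args]


def launch(plugin_dir: Path, args: List[str]) -> int:
    root = find_repo_root(plugin_dir.resolve())
    if root is None:
        raise FileNotFoundError(f"no skillopt_sleep/ package above {plugin_dir}")
    env = dict(os.environ)
    # Put the repo root first on PYTHONPATH so `-m skillopt_sleep` resolves without installation.
    env["PYTHONPATH"] = os.pathsep.join(p for p in (str(root), env.get("PYTHONPATH")) if p)
    return subprocess.call(build_command(args), cwd=root, env=env)
```

Because the root is searched for rather than computed as a fixed number of `..` hops, the shells keep working if a plugin folder moves again. For example, `launch(Path("plugins/claude-code"), ["status"])` would run the engine from the repo root. The `["status"]` argument list is illustrative and mirrors `/sleep status`.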

Quick start (Claude Code)

git clone <repo-url> && cd SkillOpt-Sleep
# Claude Code:
/plugin marketplace add ./plugins/claude-code
/plugin install skillopt-sleep@skillopt-sleep
/sleep status

Codex: run bash plugins/codex/install.sh. Copilot: register plugins/copilot/mcp_server.py as an MCP server; copilot/ ships mcp-config.example.json as a starting point.
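For Codex, installation means placing a custom prompt in ~/.codex/prompts/sleep.md and a skill under ~/.agents/skills. This sketch of the copying step is hypothetical, not a translation of install.sh. The source file names inside codex/ and the `skillopt-sleep` skill folder name are assumptions, and the AGENTS.md hint is left out.

```python
import shutil
from pathlib import Path
from typing import List


def install_codex(plugin_dir: Path, home: Path) -> List[Path]:
    """Copy the prompt and skill into Codex's user-level extension directories."""
    targets = [
        # (source inside codex/ [assumed name], destination under the user's home)
        (plugin_dir / "sleep.md", home / ".codex" / "prompts" / "sleep.md"),
        (plugin_dir / "SKILL.md", home / ".agents" / "skills" / "skillopt-sleep" / "SKILL.md"),
    ]
    installed = []
    for src, dst in targets:
        dst.parent.mkdir(parents=True, exist_ok=True)  # create ~/.codex/prompts etc. if missing
        shutil.copy2(src, dst)                         # overwrite, so re-running upgrades in place
        installed.append(dst)
    return installed
```

`home` is a parameter rather than `Path.home()` so the sketch can be tested against a scratch directory. Real use would be `install_codex(Path("plugins/codex"), Path.home())`.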

What one "night" does

harvest ~/.claude (or session) transcripts → mine recurring tasks → replay offline
   → consolidate (reflect → bounded edit → GATE on real held-out tasks)
   → stage proposal → (you) adopt
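The mine and consolidate steps can be sketched in a few lines, under stated assumptions. The real engine reflects with a model and scores on real held-out tasks. Here `evaluate` and `propose_edit` are hypothetical callbacks. The difflib-based edit bound, the 400-character default, and the strict-improvement rule are illustrative choices, not the vendored gate.

```python
import difflib
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Optional, Sequence


def mine_recurring(task_signatures: Sequence[str], min_count: int = 2) -> List[str]:
    """Keep task signatures that recur at least min_count times across sessions."""
    counts = Counter(task_signatures)
    return sorted(sig for sig, n in counts.items() if n >= min_count)


def edit_size(old: str, new: str) -> int:
    """Number of characters touched by an edit, from difflib's opcodes."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal")


@dataclass
class Proposal:
    skill_text: str             # edited skill, to be staged for review (never written live here)
    baseline: float             # mean held-out score of the current skill
    candidate: Optional[float]  # mean held-out score of the edit; None if never scored
    accepted: bool              # bounded AND strictly better on held-out tasks


def gated_consolidate(
    skill: str,
    held_out: Sequence[str],
    evaluate: Callable[[str, str], float],  # hypothetical: (skill_text, task) -> score in [0, 1]
    propose_edit: Callable[[str], str],     # hypothetical: reflection step producing new text
    max_edit_chars: int = 400,              # assumed bound on the size of one edit
) -> Proposal:
    baseline = mean(evaluate(skill, task) for task in held_out)
    edited = propose_edit(skill)
    # Bounded edit: oversized rewrites are rejected before spending evaluation budget.
    if edit_size(skill, edited) > max_edit_chars:
        return Proposal(edited, baseline, None, False)
    candidate = mean(evaluate(edited, task) for task in held_out)
    # Gate: accept only a strict improvement on the held-out tasks.
    return Proposal(edited, baseline, candidate, candidate > baseline)
```

Requiring a strict improvement means a neutral edit is not staged. That choice favours leaving memory unchanged over accumulating edits that merely tie.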

Nothing live changes until you adopt; every adopt backs up first.
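The stage-then-adopt contract can be illustrated with plain files. Staging writes beside the live file, and adopting backs up the live file before swapping the proposal in. The `.proposed` suffix, the backup naming, and the directory layout are hypothetical; this is not the engine's on-disk format.

```python
import os
import shutil
import time
from pathlib import Path
from typing import Optional


def stage(proposal_text: str, staging_dir: Path, name: str) -> Path:
    """Write a proposal next to, never over, the live file."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    path = staging_dir / f"{name}.proposed"
    path.write_text(proposal_text, encoding="utf-8")
    return path


def adopt(staged: Path, live: Path, backup_dir: Path) -> Optional[Path]:
    """Back up the live file (if any), then swap in the staged proposal."""
    backup = None
    if live.exists():
        backup_dir.mkdir(parents=True, exist_ok=True)
        # Nanosecond timestamp keeps repeated adopts from overwriting older backups.
        backup = backup_dir / f"{live.name}.{time.time_ns()}.bak"
        shutil.copy2(live, backup)
    live.parent.mkdir(parents=True, exist_ok=True)
    tmp = live.with_name(live.name + ".tmp")
    shutil.copy2(staged, tmp)
    os.replace(tmp, live)  # atomic rename when tmp and live share a filesystem
    return backup
```

Copying to a temporary file and renaming it over the live file means an interrupted adopt leaves either the old or the new content, never a half-written file. The returned backup path is what a rollback would restore.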

Controls (work on all platforms)

--gate on|off · --rollouts-k K (multi-rollout contrastive reflection) · --budget-tokens/--budget-minutes · --preferences "..." · separate optimizer/target models (--optimizer-model / --target-model) · slow-update long-term memory. Full guide: ../docs/sleep/CONTROLLABLE_DREAMING.md.
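The --rollouts-k and --budget-tokens/--budget-minutes controls interact. Each task is replayed up to K times, and replay stops once either budget runs out. The sketch below makes that concrete, assuming a hypothetical `run_rollout` callback that reports a score and its token cost. It also picks out tasks whose K rollouts include both a pass and a fail, which are the ones that give contrastive reflection something to compare. The 0.5 pass threshold is an assumption, and the engine's actual enforcement is not described here.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Budget:
    """Hypothetical guard for --budget-tokens / --budget-minutes."""
    max_tokens: int
    max_minutes: float
    used_tokens: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, tokens: int) -> None:
        self.used_tokens += tokens

    def exhausted(self) -> bool:
        elapsed_minutes = (time.monotonic() - self.started) / 60.0
        return self.used_tokens >= self.max_tokens or elapsed_minutes >= self.max_minutes


def replay_with_rollouts(
    tasks: Sequence[str],
    run_rollout: Callable[[str], Tuple[float, int]],  # hypothetical: task -> (score, tokens)
    k: int,                                           # plays the role of --rollouts-k
    budget: Budget,
) -> Dict[str, List[float]]:
    """Run up to k rollouts per task; stop as soon as either budget is spent."""
    results: Dict[str, List[float]] = {}
    for task in tasks:
        scores: List[float] = []
        for _ in range(k):
            if budget.exhausted():
                break
            score, tokens = run_rollout(task)
            budget.charge(tokens)
            scores.append(score)
        if scores:
            results[task] = scores  # partial rollouts are kept, not discarded
        if budget.exhausted():
            break
    return results


def contrastive_tasks(results: Dict[str, List[float]], threshold: float = 0.5) -> List[str]:
    """Tasks with at least one passing and one failing rollout give a contrast to reflect on."""
    return [task for task, scores in results.items()
            if any(s >= threshold for s in scores) and any(s < threshold for s in scores)]
```

The budget is checked before each rollout, so the final rollout can overshoot the token cap by at most one rollout's cost.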

Does it actually work?

Validated on the public gbrain-evals skillopt-v1 benchmark with real models on both Claude and Codex: deficient skills go 0.00 → 1.00 on held-out sets (all 4 seeds incl. a real tool-use loop), cross-model transfer is positive, and the gate blocks regressions. Full results: ../docs/sleep/FINAL_REPORT.md.

Deterministic proof (no API key):

python -m skillopt_sleep.experiments.run_experiment --persona researcher --assert-improves