mirror of https://github.com/microsoft/SkillOpt.git synced 2026-07-03 14:02:58 +08:00

Files

elzlxx 553446575a feat(plugins): add OpenClaw shell for SkillOpt-Sleep

Adds a thin OpenClaw shell wrapping the SkillOpt-Sleep engine. Enables
nightly validation-gated skill improvement cycles for OpenClaw agents.

Components:
- skillopt_sleep_openclaw.py: DeepSeek V4 Pro + Ollama nomic-embed-text
  backend, mirroring the Claude/Codex/Copilot backend pattern.
- run_sleep.py: CLI entry point supporting dry-run and pre-built task files.
- run_sleep_cron.sh: bash wrapper for nightly cron invocation.
- slash_sleep.py: /sleep command (status / run / adopt / reject / cost).
- config.json: engine config tuned for our stack.
- SKILL.md: OpenClaw skill manifest.
- tests/: 14 held-out tasks across 3 categories (research-cron, devops, wiki).

OpenClaw is the 4th ecosystem in which SkillOpt-Sleep can be deployed,
joining Claude Code, Codex, and Copilot. The shell follows the same
single-engine / thin-shell pattern as the existing three plugins.

End-to-end tested: pipeline runs against real OpenClaw session transcripts,
gate correctly rejects non-improvements, staging artifacts land in
~/.skillopt-sleep/staging/<night>/. Cost: ~$0.02/night on DeepSeek V4 Pro.

2026-06-14 23:27:54 +08:00

Name	Last commit	Date
claude-code	chore(sleep): English-only across the engine, plugins, and docs	2026-06-08 14:31:52 +00:00
codex	feat(plugins): ship SkillOpt-Sleep for Claude Code, Codex, and Copilot	2026-06-08 14:31:52 +00:00
copilot	feat(plugins): ship SkillOpt-Sleep for Claude Code, Codex, and Copilot	2026-06-08 14:31:52 +00:00
openclaw	feat(plugins): add OpenClaw shell for SkillOpt-Sleep	2026-06-14 23:27:54 +08:00
README.md	feat(plugins): ship SkillOpt-Sleep for Claude Code, Codex, and Copilot	2026-06-08 14:31:52 +00:00
run-sleep.sh	feat(plugins): ship SkillOpt-Sleep for Claude Code, Codex, and Copilot	2026-06-08 14:31:52 +00:00

README.md

SkillOpt-Sleep — plugins for Claude Code, Codex, and Copilot

One engine, three thin shells. SkillOpt-Sleep gives a local coding agent a nightly sleep cycle: it reviews your past sessions offline, replays your recurring tasks on your own API budget, and consolidates what it learns into validated long-term memory and skills — behind a held-out gate, staged for your review. Your agent gets better the more you use it, with no model-weight training.

It synthesizes three ideas: SkillOpt (validation-gated bounded text optimization — the research in this repo), Claude Dreams (offline memory consolidation; input never mutated; review-then-adopt), and the agent sleep literature (short-term experience → long-term competence).

This is an open-source tool, decoupled from the research code. The engine lives in the top-level skillopt_sleep/ package and has zero dependency on the paper's skillopt/ experiment package (the validation gate is vendored), so you can ship or use it without the research stack.

The three integrations

Platform	Folder	Mechanism	Status
Claude Code	`claude-code/`	`.claude-plugin` + `/sleep` command + skill + hooks	full, installable
Codex	`codex/`	`~/.codex/prompts/sleep.md` + `~/.agents/skills` + `AGENTS.md`	full
Copilot	`copilot/`	MCP server (`sleep_*` tools) + `copilot-instructions`	full (MCP)

All three call the same plugins/run-sleep.sh → python -m skillopt_sleep, so behaviour is identical everywhere. Per-platform setup is in each folder's README.

Quick start (Claude Code)

git clone <repo-url> && cd SkillOpt-Sleep
# Then, inside a Claude Code session (these are slash commands, not shell commands):
/plugin marketplace add ./plugins/claude-code
/plugin install skillopt-sleep@skillopt-sleep
/sleep status

Codex: bash plugins/codex/install.sh. Copilot: register plugins/copilot/mcp_server.py as an MCP server.

What one "night" does

harvest ~/.claude (or session) transcripts → mine recurring tasks → replay offline
   → consolidate (reflect → bounded edit → GATE on real held-out tasks)
   → stage proposal → (you) adopt
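
The GATE step in the flow above can be illustrated with a minimal Python sketch. It is not the engine's vendored gate: the `Proposal` type, the `gate` and `mean_score` helpers, the toy `keyword_score` scorer, and the strict-improvement rule are all assumptions made for illustration. The idea shown is only that the current and candidate skill texts are scored on the same held-out tasks, and a candidate that does not improve is rejected.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical scorer signature: (skill_text, task) -> score in [0, 1].
ScoreFn = Callable[[str, str], float]


@dataclass(frozen=True)
class Proposal:
    """A candidate skill edit plus its held-out scores (illustrative type)."""
    candidate_text: str
    baseline_score: float
    candidate_score: float

    @property
    def passes_gate(self) -> bool:
        # Assumed rule: only a strict held-out improvement passes,
        # so ties and regressions are both rejected.
        return self.candidate_score > self.baseline_score


def mean_score(skill_text: str, held_out: Sequence[str], score: ScoreFn) -> float:
    """Average the per-task score over the held-out tasks."""
    if not held_out:
        raise ValueError("the gate needs at least one held-out task")
    return sum(score(skill_text, task) for task in held_out) / len(held_out)


def gate(current_text: str, candidate_text: str,
         held_out: Sequence[str], score: ScoreFn) -> Proposal:
    """Score both versions on the same held-out tasks; nothing is written."""
    return Proposal(
        candidate_text=candidate_text,
        baseline_score=mean_score(current_text, held_out, score),
        candidate_score=mean_score(candidate_text, held_out, score),
    )


def keyword_score(skill_text: str, task: str) -> float:
    """Toy scorer: a task counts as solved if the skill text mentions it."""
    return 1.0 if task in skill_text else 0.0


if __name__ == "__main__":
    tasks = ["lint", "test"]
    improved = gate("always lint", "always lint and test", tasks, keyword_score)
    regressed = gate("always lint", "do nothing", tasks, keyword_score)
    print(improved.passes_gate, regressed.passes_gate)
```

In a real run the scorer would replay each held-out task with the target model; the toy scorer here only keeps the example deterministic and free of API calls.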

Nothing live changes until you adopt; every adopt backs up first.
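
A backup-before-adopt step could look like the sketch below. The function name `adopt`, the backup file naming, and the directory layout are assumptions for illustration, not the engine's actual code or paths; the point is only that the live file is copied aside before the staged version replaces it.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def adopt(staged: Path, live: Path, backup_dir: Path) -> Optional[Path]:
    """Replace `live` with `staged`, backing up the old live file first.

    Returns the backup path, or None when there was no live file to back up.
    """
    if not staged.is_file():
        raise FileNotFoundError(f"no staged proposal at {staged}")

    backup: Optional[Path] = None
    if live.exists():
        backup_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical naming scheme: <file name>.<timestamp>.bak
        stamp = time.strftime("%Y%m%dT%H%M%S")
        backup = backup_dir / f"{live.name}.{stamp}.bak"
        shutil.copy2(live, backup)  # the backup is written before any change

    live.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(staged, live)  # only now does the live file change
    return backup
```

Copying rather than moving the staged file leaves the proposal in place, so a rejected or repeated adopt can still inspect it.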

Controls (work on all platforms)

--gate on|off · --rollouts-k K (multi-rollout contrastive reflection) · --budget-tokens/--budget-minutes · --preferences "..." · separate optimizer/target models (--optimizer-model / --target-model) · slow-update long-term memory. Full guide: ../docs/sleep/CONTROLLABLE_DREAMING.md.
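
To make the shape of these controls concrete, here is a small argparse sketch that accepts the flags listed above. It is not the engine's real parser: the value types, the defaults, and the `build_parser` name are assumptions, and the authoritative behaviour is described in the guide referenced above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented flag names only."""
    p = argparse.ArgumentParser(prog="sleep-controls-sketch")
    # Assumed default: the gate stays on unless explicitly disabled.
    p.add_argument("--gate", choices=["on", "off"], default="on")
    # Assumed type: K is an integer number of rollouts per task.
    p.add_argument("--rollouts-k", type=int, default=1)
    # Assumed types: token budget as int, time budget as float minutes.
    p.add_argument("--budget-tokens", type=int, default=None)
    p.add_argument("--budget-minutes", type=float, default=None)
    p.add_argument("--preferences", default="")
    p.add_argument("--optimizer-model", default=None)
    p.add_argument("--target-model", default=None)
    return p


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--gate", "off", "--rollouts-k", "4", "--budget-minutes", "30"]
    )
    print(args.gate, args.rollouts_k, args.budget_minutes)
```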

Does it actually work?

Validated on the public gbrain-evals skillopt-v1 benchmark with real models on both Claude and Codex: deficient skills go from 0.00 to 1.00 on held-out sets (all 4 seeds, including a real tool-use loop), cross-model transfer is positive, and the gate blocks regressions. Full results: ../docs/sleep/FINAL_REPORT.md.

Deterministic proof (no API key):

python -m skillopt_sleep.experiments.run_experiment --persona researcher --assert-improves