The advertised backend choices in scripts/train.py use 'azure_openai',
not 'openai'; align the inputSchema description hint accordingly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add CopilotCliBackend that drives the GitHub Copilot CLI in
non-interactive mode (copilot -p ... --output-format json) and parses the
JSONL event stream for assistant.message content. Registered as the
'copilot' backend (with aliases) and wired through the CLI, config,
experiment harness, and the Copilot MCP server's backend enum.
- Force UTF-8 decoding of CLI output (fixes cp1252 UnicodeDecodeError on
Windows when responses contain non-cp1252 bytes).
- Minimise per-call startup: isolated COPILOT_HOME with built-in MCPs and
custom instructions disabled, so user MCP servers are not spawned per
call (~5x faster: 36s -> 7.4s). Override via SKILLOPT_SLEEP_COPILOT_HOME
/ SKILLOPT_SLEEP_COPILOT_MODEL / SKILLOPT_SLEEP_COPILOT_FULL_ENV.
Validated end-to-end on real held-out tasks (researcher persona:
0.42 -> 1.00 lift; gate correctly rejects non-improving edits).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Exposes scripts/train.py and scripts/eval_only.py as Copilot MCP tools
(skillopt_list_configs, skillopt_train, skillopt_eval) via a stdlib-only
stdio server, mirroring the existing SkillOpt-Sleep plugin layout.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per maintainer request:
- Remove the internal/scratch docs/sleep/ tree (reports, raw logs, blog run
JSON, sweep.jsonl) — 23 files — and the root PUBLISHING.md. These were
working notes, not reference docs.
- Take the dedicated SkillOpt-Sleep content out of the main README (News bullet
+ section) and host it in the rendered guide instead: new section 9 in
docs/guideline.html (deployment companion, the three plugins, opt-in
experience replay / dream rollouts) with a sidebar entry.
- Fix the README's opening reference so "Documentation & Reproduction Guide"
links directly to the rendered GitHub Pages page, not the raw .html source.
- Repoint the now-removed docs/sleep links in the plugin READMEs to the guide
section.
The plugin code (plugins/, skillopt_sleep/) is unchanged; only docs move.
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Updates the SkillOpt-Sleep plugin on top of the current main. User-facing and
engine improvements since the initial drop:
* Command renamed /sleep -> /skillopt-sleep across Claude Code + Codex shells;
refreshed plugin READMEs and install scripts.
* Built-in scheduling (skillopt_sleep/scheduler.py + __main__): schedule /
unschedule the nightly cycle without external cron wiring.
* Backend robustness: bounded retry with backoff (no more silent empty-string
on transient 429/timeout), content-filter-safe rollout prompt, an
output-contract guardrail that rejects edits violating the task's required
format, and a per-sample cache key so repeated dream rollouts are independent
samples (fixes degenerate single-sample reflection).
* consolidate / rollout / replay: parallel multi-rollout dreaming, gate-mode
controls, TaskRecord.system framing field.
Scope: this commit ships only the plugin engine + shells. Research/benchmark
harnesses and their data are intentionally not included; the public package
has no dependency on them (the one research-evaluator import is now guarded).
Marked as an early preview in the README; we'll keep iterating.
99/99 unit tests pass.
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Adds a thin OpenClaw shell wrapping the SkillOpt-Sleep engine. Enables
nightly validation-gated skill improvement cycles for OpenClaw agents.
Components:
- skillopt_sleep_openclaw.py: DeepSeek V4 Pro + Ollama nomic-embed-text
backend, mirroring the Claude/Codex/Copilot backend pattern.
- run_sleep.py: CLI entry point supporting dry-run and pre-built task files.
- run_sleep_cron.sh: bash wrapper for nightly cron invocation.
- slash_sleep.py: /sleep command (status / run / adopt / reject / cost).
- config.json: engine config tuned for our stack.
- SKILL.md: OpenClaw skill manifest.
- tests/: 14 held-out tasks across 3 categories (research-cron, devops, wiki).
OpenClaw is the 4th ecosystem in which SkillOpt-Sleep can be deployed,
joining Claude Code, Codex, and Copilot. The shell follows the same
single-engine / thin-shell pattern as the existing three plugins.
End-to-end tested: pipeline runs against real OpenClaw session transcripts,
gate correctly rejects non-improvements, staging artifacts land in
~/.skillopt-sleep/staging/<night>/. Cost: ~$0.02/night on DeepSeek V4 Pro.
Remove every non-ASCII/CJK character for a professional open-source repo:
- harvest.py: drop hardcoded Chinese feedback phrases; add an env-based
extensibility hook (SKILLOPT_SLEEP_NEG_FEEDBACK / _POS_FEEDBACK) so any
locale can be added without baking one in. Verified with a German example.
- rollout.py / consolidate.py: English comments.
- README.md section heading + anchor, CONTROLLABLE_DREAMING.md, plugin.json,
marketplace.json (also fixed stale path skillopt-sleep-plugin ->
plugins/claude-code), SKILL.md: English only.
- Remove the internal WAKE_UP_SUMMARY.md note (not user-facing, not referenced).
Verified: zero CJK chars remain anywhere; 29 tests pass.
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Restructure into plugins/{claude-code,codex,copilot}/ — one engine, three thin
shells, all calling the shared plugins/run-sleep.sh -> python -m skillopt_sleep.
- claude-code/: existing plugin moved here; runner delegates to the shared
launcher (fixes repo-root resolution after the move).
- codex/: ~/.codex/prompts/sleep.md custom prompt + ~/.agents/skills SKILL.md +
install.sh + AGENTS.md hint — Codex's documented, stable extension surfaces.
- copilot/: a stdlib-only MCP server (mcp_server.py) exposing sleep_* tools,
plus mcp-config.example.json and a copilot-instructions snippet. Verified end
to end (initialize -> tools/list -> tools/call returns real engine output).
- plugins/README.md overview table; main README News + a dedicated SkillOpt-Sleep
section; pyproject lists skillopt_sleep as a first-class package.
Decoupling emphasized throughout: open-source tool (skillopt_sleep/) with zero
dependency on the research package. 29 tests pass; all three shells resolve.
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>