A harvested single-turn Devin session spanned only 1s (reply written 1000ms
after the prompt). The engine's harvest filter conservatively classifies it
as a <3s headless replay (skillopt_sleep Issue #62) and skips it, so a real
single-turn session mined 0 tasks. Widen the prompt->reply gap to 5s. With this,
an end-to-end dry-run mines the task: "night 1: 1 sessions -> 1 tasks".
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Mirror the Copilot MCP server: same rich _TOOL_SCHEMA (source, model,
tasks_file, target_skill_path, max_sessions, max_tasks, lookback_hours,
auto_adopt, json, edit_budget, hour, minute) and generic flag forwarding, plus
sleep_schedule / sleep_unschedule. Devin specifics retained: the ATIF-v1.7
harvest step (run before data-reading actions, engine pointed at it via
--claude-home, default --source claude) and post-adopt sync into .devin/skills/.
Tests + README + rules snippet updated for the 7-tool interface.
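A minimal sketch of what "generic flag forwarding" could look like. The helper name forward_flags and the snake_case -> --kebab-case mapping are assumptions made for illustration, not the server's confirmed behaviour.

```python
from typing import Any, Dict, List


def forward_flags(arguments: Dict[str, Any]) -> List[str]:
    """Turn a tool-call arguments dict into CLI flags.

    Hypothetical mapping: `max_sessions` -> `--max-sessions`.
    """
    argv: List[str] = []
    for key, value in arguments.items():
        flag = "--" + key.replace("_", "-")
        if isinstance(value, bool):
            # Booleans become bare switches; False means "omit the flag".
            if value:
                argv.append(flag)
        elif value is not None:
            # Everything else is passed as "--flag value".
            argv.extend([flag, str(value)])
    return argv
```

With this shape, a new schema field needs no per-flag plumbing in the server; unset (None) values are simply not forwarded.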
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Review fixes:
- Path bug: SKILLOPT_DEVIN_CLAUDE_HOME (and SKILLOPT_SLEEP_REPO) read from the
env are now wrapped in os.path.expanduser, so the documented "~/..." config
no longer passes a literal ~ to --claude-home (which yielded zero mined
sessions). expanduser on an absolute default is a no-op.
- tests/test_devin_plugin.py: tool-schema completeness, action→subcommand map,
backend enum, the CLAUDE_HOME expansion regression, and an ATIF-v1.7 harvest
shape test against a bundled fixture.
- plugins/devin/fixtures/devin_sample.json: sample ATIF-v1.7 transcript.
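The path fix above can be sketched as follows. The function name resolve_claude_home and its default parameter are hypothetical; only the environment variable name and the use of os.path.expanduser come from this change.

```python
import os


def resolve_claude_home(default: str) -> str:
    """Read SKILLOPT_DEVIN_CLAUDE_HOME and expand a leading "~".

    `default` stands in for the plugin's real default path, which is
    not shown here (assumption).
    """
    raw = os.environ.get("SKILLOPT_DEVIN_CLAUDE_HOME", default)
    # Without expanduser, "~/..." would reach --claude-home as a literal "~".
    return os.path.expanduser(raw)
```

Because expanduser leaves absolute paths untouched, wrapping the default as well is harmless.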
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wires the skillopt_sleep engine into Devin (Cognition) via an MCP server,
following the same thin-shell pattern as plugins/copilot.
- mcp_server.py: stdlib-only stdio MCP server exposing the standard sleep_*
tools (status, dry-run, run, adopt, harvest). REPO_ROOT defaults to ../.. so
it finds skillopt_sleep automatically when run from plugins/devin/.
- harvest_devin.py: converts Devin ATIF-v1.7 transcripts, agentmemory, and
.devin/skills/*/SKILL.md into the Claude Code-compatible JSONL the engine
consumes; enriches with taskKey + outcome envelopes (hard test/build signal
or judge rubric). Workspace auto-detection; cross-platform paths.
- judge.py, mcp-config.example.json, devin-rules.snippet.md, README.md.
- plugins/README.md: add Devin to the platform + install tables.
No changes to skillopt_sleep; shells out to `python -m skillopt_sleep` like the
other plugins. Pure stdlib; default backend mock (no API spend).
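For orientation, a stdlib-only sketch of the JSON-RPC dispatch a stdio MCP server performs. The tool entry, server name, and reply text are placeholders (assumptions); the real mcp_server.py shells out to the engine instead of returning a canned string.

```python
from typing import Any, Dict, Optional

# Illustrative tool list; the real schema is richer (assumption).
TOOLS = [
    {
        "name": "sleep_status",
        "description": "Report engine status (illustrative).",
        "inputSchema": {"type": "object", "properties": {}},
    }
]


def handle(request: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Map one JSON-RPC 2.0 request to a response; notifications get None."""
    if "id" not in request:
        return None  # notifications carry no id and expect no reply
    method = request.get("method")
    if method == "initialize":
        result: Dict[str, Any] = {
            "protocolVersion": "2024-11-05",
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "sketch-server", "version": "0.0.0"},
        }
    elif method == "tools/list":
        result = {"tools": TOOLS}
    elif method == "tools/call":
        name = request.get("params", {}).get("name")
        # A real server would run `python -m skillopt_sleep` here.
        result = {"content": [{"type": "text", "text": f"called {name}"}]}
    else:
        return {
            "jsonrpc": "2.0",
            "id": request["id"],
            "error": {"code": -32601, "message": "Method not found"},
        }
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}
```

A stdio loop would read one JSON object per line from stdin, pass it to handle, and write any non-None response back as a single line.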
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two fixes from issue #57 feedback:
1. run-sleep.sh: support SKILLOPT_SLEEP_PYTHON env var to explicitly set
the Python interpreter. Useful on macOS where system Python is 3.9 but
a newer Python is available elsewhere (e.g. Codex Desktop's bundled
Python 3.12). Applied to both the shared runner and the bundled
Claude Code plugin copy.
2. cycle.py: on first run (no prior harvest recorded), apply the
lookback_hours config (default 72h) as a time cutoff. Previously, the
first run scanned the entire transcript history, which could trigger
massive LLM mining for users with months of session data.
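The first-run cutoff in fix 2 might be computed roughly like this. The function name harvest_cutoff is hypothetical, and reusing the previous harvest time on later runs is an assumption; only the 72h default and the first-run behaviour come from this change.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def harvest_cutoff(
    last_harvest: Optional[datetime],
    now: datetime,
    lookback_hours: float = 72.0,
) -> datetime:
    """Return the oldest timestamp a harvest should consider."""
    if last_harvest is not None:
        # Assumed: later runs resume from the previous harvest.
        return last_harvest
    # First run: bound the scan instead of reading all history.
    return now - timedelta(hours=lookback_hours)


now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
first_run_cutoff = harvest_cutoff(None, now)
```

With the default of 72 hours, first_run_cutoff is 2025-01-07 12:00 UTC, so transcripts older than three days are ignored on the first night.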
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- harvest: tighten sub-3s filter to also require prompt < 200 chars,
avoiding false positives on fast real one-shot questions
- openclaw schedule_cmd: add docstring clarifying it schedules the
shared engine, not the OpenClaw-native runner
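As a sketch, the tightened harvest filter amounts to a two-part predicate. The function name and the way the gap and prompt are passed in are assumptions; the 3s and 200-character thresholds are the ones stated above.

```python
def looks_like_headless_replay(gap_seconds: float, prompt: str) -> bool:
    """True when a session should be skipped as a headless replay.

    Both conditions must hold: a sub-3s prompt->reply gap AND a prompt
    shorter than 200 characters.
    """
    return gap_seconds < 3.0 and len(prompt) < 200
```

A session with a 1s gap and a 200-character (or longer) prompt is now kept, whereas the gap-only filter would have dropped it.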
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The advertised backend choices in scripts/train.py use 'azure_openai',
not 'openai'; align the inputSchema description hint accordingly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add CopilotCliBackend that drives the GitHub Copilot CLI in
non-interactive mode (copilot -p ... --output-format json) and parses the
JSONL event stream for assistant.message content. Registered as the
'copilot' backend (with aliases) and wired through the CLI, config,
experiment harness, and the Copilot MCP server's backend enum.
- Force UTF-8 decoding of CLI output (fixes cp1252 UnicodeDecodeError on
Windows when responses contain non-cp1252 bytes).
- Minimise per-call startup: isolated COPILOT_HOME with built-in MCPs and
custom instructions disabled, so user MCP servers are not spawned per
call (~5x faster: 36s -> 7.4s). Override via SKILLOPT_SLEEP_COPILOT_HOME
/ SKILLOPT_SLEEP_COPILOT_MODEL / SKILLOPT_SLEEP_COPILOT_FULL_ENV.
Validated end-to-end on real held-out tasks (researcher persona:
0.42 -> 1.00 lift; gate correctly rejects non-improving edits).
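A rough sketch of the decode-and-parse step. The event shape used here (a top-level "type" of "assistant.message" with a top-level "content" string) is an assumption for illustration and may not match the CLI's actual JSONL schema; the function name is hypothetical.

```python
import json
from typing import List


def extract_assistant_messages(raw: bytes) -> List[str]:
    """Decode CLI output as UTF-8 and collect assistant message text."""
    # Decode explicitly rather than relying on the platform default
    # (cp1252 on many Windows setups).
    text = raw.decode("utf-8", errors="replace")
    messages: List[str] = []
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise in the stream
        if isinstance(event, dict) and event.get("type") == "assistant.message":
            content = event.get("content")  # assumed field name
            if isinstance(content, str):
                messages.append(content)
    return messages
```

When calling the CLI through subprocess.run, passing encoding="utf-8" achieves the same explicit decoding in text mode.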
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Exposes scripts/train.py and scripts/eval_only.py as Copilot MCP tools
(skillopt_list_configs, skillopt_train, skillopt_eval) via a stdlib-only
stdio server, mirroring the existing SkillOpt-Sleep plugin layout.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per maintainer request:
- Remove the internal/scratch docs/sleep/ tree (reports, raw logs, blog run
JSON, sweep.jsonl; 23 files) and the root PUBLISHING.md. These were
working notes, not reference docs.
- Take the dedicated SkillOpt-Sleep content out of the main README (News bullet
+ section) and host it in the rendered guide instead: new section 9 in
docs/guideline.html (deployment companion, the three plugins, opt-in
experience replay / dream rollouts) with a sidebar entry.
- Fix the README's opening reference so "Documentation & Reproduction Guide"
links directly to the rendered GitHub Pages page, not the raw .html source.
- Repoint the now-removed docs/sleep links in the plugin READMEs to the guide
section.
The plugin code (plugins/, skillopt_sleep/) is unchanged; only docs move.
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Updates the SkillOpt-Sleep plugin on top of the current main. User-facing and
engine improvements since the initial drop:
* Command renamed /sleep -> /skillopt-sleep across Claude Code + Codex shells;
refreshed plugin READMEs and install scripts.
* Built-in scheduling (skillopt_sleep/scheduler.py + __main__): schedule /
unschedule the nightly cycle without external cron wiring.
* Backend robustness: bounded retry with backoff (no more silently returning
an empty string on a transient 429 or timeout), content-filter-safe rollout
prompt, an output-contract guardrail that rejects edits violating the task's
required format, and a per-sample cache key so repeated dream rollouts are
independent samples (fixes degenerate single-sample reflection).
* consolidate / rollout / replay: parallel multi-rollout dreaming, gate-mode
controls, TaskRecord.system framing field.
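The bounded retry with backoff mentioned above can be sketched as below. TransientError, call_with_retry, and the exponential schedule are illustrative assumptions; the engine's actual exception types and delays are not shown in this message.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for a 429 or timeout from the backend (hypothetical)."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the error instead of
                # silently returning an empty string.
                raise
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("max_attempts must be at least 1")
```

Injecting sleep keeps the helper testable without real waiting; re-raising on the final attempt is what makes a failure visible to the caller.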
Scope: this commit ships only the plugin engine + shells. Research/benchmark
harnesses and their data are intentionally not included; the public package
has no dependency on them (the one research-evaluator import is now guarded).
Marked as an early preview in the README; we'll keep iterating.
99/99 unit tests pass.
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Adds a thin OpenClaw shell wrapping the SkillOpt-Sleep engine. Enables
nightly validation-gated skill improvement cycles for OpenClaw agents.
Components:
- skillopt_sleep_openclaw.py: DeepSeek V4 Pro + Ollama nomic-embed-text
backend, mirroring the Claude/Codex/Copilot backend pattern.
- run_sleep.py: CLI entry point supporting dry-run and pre-built task files.
- run_sleep_cron.sh: bash wrapper for nightly cron invocation.
- slash_sleep.py: /sleep command (status / run / adopt / reject / cost).
- config.json: engine config tuned for our stack.
- SKILL.md: OpenClaw skill manifest.
- tests/: 14 held-out tasks across 3 categories (research-cron, devops, wiki).
OpenClaw is the fourth ecosystem in which SkillOpt-Sleep can be deployed,
joining Claude Code, Codex, and Copilot. The shell follows the same
single-engine / thin-shell pattern as the existing three plugins.
End-to-end tested: pipeline runs against real OpenClaw session transcripts,
gate correctly rejects non-improvements, staging artifacts land in
~/.skillopt-sleep/staging/<night>/. Cost: ~$0.02/night on DeepSeek V4 Pro.
Remove all non-ASCII/CJK characters so the open-source repo is English-only:
- harvest.py: drop hardcoded Chinese feedback phrases; add an env-based
extensibility hook (SKILLOPT_SLEEP_NEG_FEEDBACK / _POS_FEEDBACK) so any
locale can be added without baking one in. Verified with a German example.
- rollout.py / consolidate.py: English comments.
- README.md section heading + anchor, CONTROLLABLE_DREAMING.md, plugin.json,
marketplace.json (also fixed stale path skillopt-sleep-plugin ->
plugins/claude-code), SKILL.md: English only.
- Remove the internal WAKE_UP_SUMMARY.md note (not user-facing, not referenced).
Verified: zero CJK chars remain anywhere; 29 tests pass.
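One way the env-based feedback hook could be read, as a sketch only. The comma separator, the lowercasing, and the function name extra_phrases are assumptions; the two variable names are the ones introduced above.

```python
import os
from typing import List


def extra_phrases(var: str) -> List[str]:
    """Read locale-specific feedback phrases from an environment variable.

    Assumed format: a comma-separated list, e.g. "das ist falsch, nein".
    """
    raw = os.environ.get(var, "")
    # Drop empty entries and normalise case for matching.
    return [p.strip().lower() for p in raw.split(",") if p.strip()]


os.environ["SKILLOPT_SLEEP_NEG_FEEDBACK"] = "Das ist falsch, nein"
negative = extra_phrases("SKILLOPT_SLEEP_NEG_FEEDBACK")
```

Here negative ends up as ["das ist falsch", "nein"], which a harvester could append to its built-in English phrases; an unset variable yields an empty list.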
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Restructure into plugins/{claude-code,codex,copilot}/ — one engine, three thin
shells, all calling the shared plugins/run-sleep.sh -> python -m skillopt_sleep.
- claude-code/: existing plugin moved here; runner delegates to the shared
launcher (fixes repo-root resolution after the move).
- codex/: ~/.codex/prompts/sleep.md custom prompt + ~/.agents/skills SKILL.md +
install.sh + AGENTS.md hint — Codex's documented, stable extension surfaces.
- copilot/: a stdlib-only MCP server (mcp_server.py) exposing sleep_* tools,
plus mcp-config.example.json and a copilot-instructions snippet. Verified end
to end (initialize -> tools/list -> tools/call returns real engine output).
- plugins/README.md overview table; main README News + a dedicated SkillOpt-Sleep
section; pyproject lists skillopt_sleep as a first-class package.
Decoupling emphasized throughout: open-source tool (skillopt_sleep/) with zero
dependency on the research package. 29 tests pass; all three shells resolve.
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>