Files
Yifan Yang 86bad36ffe feat(sleep): SkillOpt-Sleep plugin update (preview) — engine robustness + scheduling
Updates the SkillOpt-Sleep plugin on top of the current main. User-facing and
engine improvements since the initial drop:

* Command renamed /sleep -> /skillopt-sleep across Claude Code + Codex shells;
  refreshed plugin READMEs and install scripts.
* Built-in scheduling (skillopt_sleep/scheduler.py + __main__): schedule /
  unschedule the nightly cycle without external cron wiring.
* Backend robustness: bounded retry with backoff (no more silent empty-string
  on transient 429/timeout), content-filter-safe rollout prompt, an
  output-contract guardrail that rejects edits violating the task's required
  format, and a per-sample cache key so repeated dream rollouts are independent
  samples (fixes degenerate single-sample reflection).
* consolidate / rollout / replay: parallel multi-rollout dreaming, gate-mode
  controls, TaskRecord.system framing field.

Scope: this commit ships only the plugin engine + shells. Research/benchmark
harnesses and their data are intentionally not included; the public package
has no dependency on them (the one research-evaluator import is now guarded).
Marked as an early preview in the README; we'll keep iterating.

99/99 unit tests pass.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
2026-06-14 16:12:00 +00:00

3.2 KiB

description, argument-hint, allowed-tools
description argument-hint allowed-tools
Run or manage the SkillOpt-Sleep self-evolution cycle (review past sessions, replay tasks offline, consolidate validated memory + skills; can also schedule nightly runs) [run | dry-run | status | adopt | harvest | schedule | unschedule] (default: status) Bash, Read

/skillopt-sleep — SkillOpt-Sleep nightly self-evolution

You are driving SkillOpt-Sleep: a tool that lets this user's Claude agent improve offline by reviewing past sessions, replaying recurring tasks, and consolidating what it learns into validated memory (CLAUDE.md) and skills (SKILL.md). It is gated like SkillOpt: a change is kept only if it improves a held-out replay score, and nothing live is modified until the user adopts it.

Requested action: $ARGUMENTS

(If $ARGUMENTS is empty, treat it as status.)

How to run it

The engine is the skillopt_sleep Python package in this repo. Use the plugin's bundled runner so the right interpreter and repo are on the path:

"${CLAUDE_PLUGIN_ROOT}/scripts/sleep.sh" <action> --project "$(pwd)" --scope invoked

<action> is one of:

action what it does
status show how many nights have run + the latest staged proposal (READ-ONLY)
dry-run harvest → mine → replay → report, but stage nothing (safe preview)
run full cycle: also stage a reviewed proposal (still does NOT touch live files)
adopt apply the latest staged proposal to live CLAUDE.md / SKILL.md (backs up first)
harvest debug: print the recurring tasks mined from recent sessions
schedule install a nightly cron entry for this project (--hour --minute, off-:00 by default)
unschedule remove the nightly cron entry (--all to remove every managed entry)

Default backend is mock (deterministic, no API spend). To use real budget for genuine improvement, add --backend claude or --backend codex. To steer what the optimizer writes, add --preferences "<your house rules>".

Steps to follow

  1. Run the requested action via the bundled runner above. Capture stdout.
  2. For run / dry-run: after it completes, Read the generated report.md in the staging dir it prints, and show the user:
    • held-out score: baseline → candidate (the proof it helped)
    • the gate decision (accept/reject) and the exact edits it proposes
    • where the proposal is staged
  3. For run that produced an accepted proposal: tell the user the diff is staged and that nothing live changed yet. Offer to run /skillopt-sleep adopt.
  4. For adopt: confirm which live files were updated and that backups were written under the staging dir's backup/.
  5. Never edit CLAUDE.md or SKILL.md yourself — only the adopt action does that, with a backup. Respect the review gate.

Safety reminders

  • Harvest is read-only over ~/.claude. Replay in mock mode runs no shell side effects.
  • The cycle stages proposals; the user is in control of adoption.
  • If the user asks to schedule this nightly, point them at ${CLAUDE_PLUGIN_ROOT}/scripts/install-cron.sh (prints a crontab line; does not install anything without confirmation).