mirror of https://github.com/microsoft/SkillOpt.git synced 2026-07-05 23:30:35 +08:00

Files

Yifan Yang 86bad36ffe feat(sleep): SkillOpt-Sleep plugin update (preview) — engine robustness + scheduling

Updates the SkillOpt-Sleep plugin on top of the current main. User-facing and
engine improvements since the initial drop:

* Command renamed /sleep -> /skillopt-sleep across Claude Code + Codex shells;
  refreshed plugin READMEs and install scripts.
* Built-in scheduling (skillopt_sleep/scheduler.py + __main__): schedule /
  unschedule the nightly cycle without external cron wiring.
* Backend robustness: bounded retry with backoff (no more silent empty-string
  on transient 429/timeout), content-filter-safe rollout prompt, an
  output-contract guardrail that rejects edits violating the task's required
  format, and a per-sample cache key so repeated dream rollouts are independent
  samples (fixes degenerate single-sample reflection).
* consolidate / rollout / replay: parallel multi-rollout dreaming, gate-mode
  controls, TaskRecord.system framing field.

Scope: this commit ships only the plugin engine + shells. Research/benchmark
harnesses and their data are intentionally not included; the public package
has no dependency on them (the one research-evaluator import is now guarded).
Marked as an early preview in the README; we'll keep iterating.

99/99 unit tests pass.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

2026-06-14 16:12:00 +00:00

3.2 KiB

Raw Permalink Blame History

description, argument-hint, allowed-tools

description	argument-hint	allowed-tools
Run or manage the SkillOpt-Sleep self-evolution cycle (review past sessions, replay tasks offline, consolidate validated memory + skills; can also schedule nightly runs)	[run \| dry-run \| status \| adopt \| harvest \| schedule \| unschedule] (default: status)	Bash, Read

/skillopt-sleep — SkillOpt-Sleep nightly self-evolution

You are driving SkillOpt-Sleep: a tool that lets this user's Claude agent improve offline by reviewing past sessions, replaying recurring tasks, and consolidating what it learns into validated memory (CLAUDE.md) and skills (SKILL.md). It is gated like SkillOpt: a change is kept only if it improves a held-out replay score, and nothing live is modified until the user adopts it.

Requested action: $ARGUMENTS

(If $ARGUMENTS is empty, treat it as status.)

How to run it

The engine is the skillopt_sleep Python package in this repo. Use the plugin's bundled runner so the right interpreter and repo are on the path:

"${CLAUDE_PLUGIN_ROOT}/scripts/sleep.sh" <action> --project "$(pwd)" --scope invoked

<action> is one of:

action	what it does
`status`	show how many nights have run + the latest staged proposal (READ-ONLY)
`dry-run`	harvest → mine → replay → report, but stage nothing (safe preview)
`run`	full cycle: also stage a reviewed proposal (still does NOT touch live files)
`adopt`	apply the latest staged proposal to live `CLAUDE.md` / `SKILL.md` (backs up first)
`harvest`	debug: print the recurring tasks mined from recent sessions
`schedule`	install a nightly cron entry for this project (`--hour --minute`, off-:00 by default)
`unschedule`	remove the nightly cron entry (`--all` to remove every managed entry)

Default backend is mock (deterministic, no API spend). To use real budget for genuine improvement, add --backend claude or --backend codex. To steer what the optimizer writes, add --preferences "<your house rules>".

Steps to follow

Run the requested action via the bundled runner above. Capture stdout.
For run / dry-run: after it completes, Read the generated report.md in the staging dir it prints, and show the user:
- held-out score: baseline → candidate (the proof it helped)
- the gate decision (accept/reject) and the exact edits it proposes
- where the proposal is staged
For run that produced an accepted proposal: tell the user the diff is staged and that nothing live changed yet. Offer to run /skillopt-sleep adopt.
For adopt: confirm which live files were updated and that backups were written under the staging dir's backup/.
Never edit CLAUDE.md or SKILL.md yourself — only the adopt action does that, with a backup. Respect the review gate.

Safety reminders

Harvest is read-only over ~/.claude. Replay in mock mode runs no shell side effects.
The cycle stages proposals; the user is in control of adoption.
If the user asks to schedule this nightly, point them at ${CLAUDE_PLUGIN_ROOT}/scripts/install-cron.sh (prints a crontab line; does not install anything without confirmation).

3.2 KiB Raw Permalink Blame History