mirror of https://github.com/microsoft/SkillOpt.git synced 2026-07-03 14:02:58 +08:00

Yifan Yang 86bad36ffe feat(sleep): SkillOpt-Sleep plugin update (preview) — engine robustness + scheduling

Updates the SkillOpt-Sleep plugin on top of the current main. User-facing and
engine improvements since the initial drop:

* Command renamed /sleep -> /skillopt-sleep across Claude Code + Codex shells;
  refreshed plugin READMEs and install scripts.
* Built-in scheduling (skillopt_sleep/scheduler.py + __main__): schedule /
  unschedule the nightly cycle without external cron wiring.
* Backend robustness: bounded retry with backoff (no more silent empty-string
  on transient 429/timeout), content-filter-safe rollout prompt, an
  output-contract guardrail that rejects edits violating the task's required
  format, and a per-sample cache key so repeated dream rollouts are independent
  samples (fixes degenerate single-sample reflection).
* consolidate / rollout / replay: parallel multi-rollout dreaming, gate-mode
  controls, TaskRecord.system framing field.
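
The bounded-retry behaviour described in the backend bullet above might look roughly like the following sketch. It is not the code shipped in skillopt_sleep: the names call_with_retry, TransientBackendError, max_attempts and base_delay are hypothetical, chosen only to illustrate retrying a transient failure a fixed number of times with exponential backoff and then raising rather than returning an empty string.

```python
import time
from typing import Callable


class TransientBackendError(Exception):
    """Hypothetical stand-in for a transient failure such as HTTP 429 or a timeout."""


def call_with_retry(
    call: Callable[[], str],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run `call`, retrying transient failures with exponential backoff.

    The last failure is re-raised once the attempts are used up, so the
    caller never receives a silent empty string.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientBackendError:
            if attempt == max_attempts - 1:
                raise  # bounded: give up loudly after the final attempt
            sleep(base_delay * (2 ** attempt))  # base_delay, then 2x, 4x, ...
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter is a design choice for this sketch only; it lets a test record the delays instead of actually waiting.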

Scope: this commit ships only the plugin engine + shells. Research/benchmark
harnesses and their data are intentionally not included; the public package
has no dependency on them (the one research-evaluator import is now guarded).
Marked as an early preview in the README; we'll keep iterating.

99/99 unit tests pass.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

2026-06-14 16:12:00 +00:00

prompts

skills/skillopt-sleep

install.sh

README.md

SkillOpt-Sleep — Codex integration

Give your Codex agent a nightly sleep cycle: it reviews past sessions offline, replays your recurring tasks on your own Codex budget, and consolidates what it learns into validated memory + skills behind a held-out gate. Same engine as the Claude Code plugin (skillopt_sleep), wrapped for Codex.

Verified on Codex: on the public gbrain-evals skillopt-v1 benchmark, a deliberately deficient skill improves from 0.00 to 1.00 on a held-out set with the Codex backend, including the tool-use seed, which runs through a real tool loop. See ../../docs/sleep/FINAL_REPORT.md.
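
To make the idea of a held-out gate concrete, here is a minimal sketch under stated assumptions. The real gate in skillopt_sleep is not shown in this README, so everything here is hypothetical: the function names, the `score(skill, task)` callback, and the `min_gain` threshold. The sketch only illustrates the general pattern of accepting a candidate skill when it scores better than the current one on tasks that were not used to produce it.

```python
from typing import Callable, Sequence


def mean_score(
    score: Callable[[str, str], float], skill: str, tasks: Sequence[str]
) -> float:
    """Average a per-task score for one skill text over a set of tasks."""
    return sum(score(skill, task) for task in tasks) / len(tasks)


def passes_held_out_gate(
    score: Callable[[str, str], float],
    baseline_skill: str,
    candidate_skill: str,
    held_out_tasks: Sequence[str],
    min_gain: float = 0.0,
) -> bool:
    """Accept the candidate only if it beats the baseline on held-out tasks."""
    if not held_out_tasks:
        raise ValueError("the gate needs at least one held-out task")
    before = mean_score(score, baseline_skill, held_out_tasks)
    after = mean_score(score, candidate_skill, held_out_tasks)
    return after - before > min_gain
```

With a toy scorer that awards 1.0 when a task keyword appears in the skill text, a candidate that covers every held-out task passes the gate, while an unchanged skill does not.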

What Codex supports (and what we use)

Codex (@openai/codex) extends via AGENTS.md instructions, skills at ~/.agents/skills/<name>/SKILL.md, and custom prompts at ~/.codex/prompts/<name>.md (invoked as /<name>). This integration ships all three, plus a shared runner.
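
As an illustration of the two path conventions named above, the sketch below computes where a skill and a custom prompt would live for a given name. The helper codex_install_targets is hypothetical and is not part of install.sh; whether the installer writes exactly these files under the name skillopt-sleep is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def codex_install_targets(name: str, home: Path | None = None) -> dict[str, Path]:
    """Return the skill and custom-prompt locations for `name`.

    Follows the layouts quoted in the text:
    ~/.agents/skills/<name>/SKILL.md and ~/.codex/prompts/<name>.md.
    """
    home = home if home is not None else Path.home()
    return {
        "skill": home / ".agents" / "skills" / name / "SKILL.md",
        "prompt": home / ".codex" / "prompts" / f"{name}.md",
    }


if __name__ == "__main__":
    # Assumed name, matching the /skillopt-sleep command.
    for kind, path in codex_install_targets("skillopt-sleep").items():
        print(f"{kind}: {path} (exists: {path.exists()})")
```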

Install

git clone <repo-url> SkillOpt-Sleep
cd SkillOpt-Sleep
bash plugins/codex/install.sh          # installs the /skillopt-sleep prompt + skill
export SKILLOPT_SLEEP_REPO="$(pwd)"    # so the runner is found from anywhere

Requires Python ≥ 3.10 and the codex CLI on PATH.
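
A quick way to check both requirements before installing is sketched below. The function missing_prerequisites is a hypothetical helper, not something the repository ships; it relies only on the standard library (sys.version_info and shutil.which).

```python
import shutil
import sys


def missing_prerequisites(version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    version = tuple(sys.version_info[:2]) if version is None else version
    problems = []
    if version < (3, 10):
        problems.append(
            f"Python >= 3.10 required, found {version[0]}.{version[1]}"
        )
    if which("codex") is None:
        problems.append("codex CLI not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```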

Use

/skillopt-sleep status      # what's happened
/skillopt-sleep dry-run     # safe preview, stages nothing
/skillopt-sleep run         # full cycle, stages a reviewed proposal (no live edits)
/skillopt-sleep adopt       # apply the staged proposal (with backup)

Or call the engine directly:

python -m skillopt_sleep run --project "$(pwd)" --backend codex

The default backend is mock, which spends no API budget. Pass --backend codex to use your Codex budget and produce real improvements. All the controllable knobs (--gate on|off, --rollouts-k, --budget-tokens, --preferences, optimizer/target split) work identically; see ../../docs/sleep/CONTROLLABLE_DREAMING.md.
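
If you drive the engine from a script rather than a shell, the direct invocation can be assembled as an argument list. This is a sketch, not an API provided by skillopt_sleep: build_sleep_command is a hypothetical helper, and the assumption that --rollouts-k takes an integer value is ours. Only flags named in this README are used.

```python
from __future__ import annotations

import sys
from pathlib import Path


def build_sleep_command(
    project: Path,
    backend: str = "mock",
    gate: str | None = None,
    rollouts_k: int | None = None,
) -> list[str]:
    """Build the argv for `python -m skillopt_sleep run ...`."""
    cmd = [
        sys.executable, "-m", "skillopt_sleep", "run",
        "--project", str(project),
        "--backend", backend,
    ]
    if gate is not None:
        if gate not in ("on", "off"):
            raise ValueError("gate must be 'on' or 'off'")
        cmd += ["--gate", gate]
    if rollouts_k is not None:
        # Assumption: the flag accepts an integer count of rollouts.
        cmd += ["--rollouts-k", str(rollouts_k)]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True). Leaving backend at its default keeps the run on the mock backend, so nothing is spent.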

Notes / status

Codex's exec runs shell commands, so the real-tool-loop replay (e.g. the tool_called: search benchmark seed) works natively.
Codex's standalone plugin-package manifest format is not yet a stable public spec, so this integration uses the documented AGENTS.md, skills, and prompts mechanisms, which are stable. If and when a Codex plugin package format ships, we'll add a one-file manifest.