mirror of
https://github.com/microsoft/SkillOpt.git
synced 2026-07-05 23:30:35 +08:00
Updates the SkillOpt-Sleep plugin on top of the current main.

User-facing and engine improvements since the initial drop:

* Command renamed /sleep -> /skillopt-sleep across Claude Code + Codex shells; refreshed plugin READMEs and install scripts.
* Built-in scheduling (skillopt_sleep/scheduler.py + __main__): schedule / unschedule the nightly cycle without external cron wiring.
* Backend robustness: bounded retry with backoff (no more silent empty-string on transient 429/timeout), content-filter-safe rollout prompt, an output-contract guardrail that rejects edits violating the task's required format, and a per-sample cache key so repeated dream rollouts are independent samples (fixes degenerate single-sample reflection).
* consolidate / rollout / replay: parallel multi-rollout dreaming, gate-mode controls, TaskRecord.system framing field.

Scope: this commit ships only the plugin engine + shells. Research/benchmark harnesses and their data are intentionally not included; the public package has no dependency on them (the one research-evaluator import is now guarded).

Marked as an early preview in the README; we'll keep iterating. 99/99 unit tests pass.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
# SkillOpt-Sleep — Codex integration

Give your **Codex** agent a nightly **sleep cycle**: it reviews past sessions
offline, replays your recurring tasks on your own Codex budget, and consolidates
what it learns into validated memory + skills behind a held-out gate. Same engine
as the Claude Code plugin (`skillopt_sleep`), wrapped for Codex.

> **Verified on Codex:** on the public
> [gbrain-evals](https://github.com/garrytan/gbrain-evals) `skillopt-v1`
> benchmark, a deliberately deficient skill goes **0.00 → 1.00** on a held-out
> set with the Codex backend (including the tool-use seed, run through a real tool loop).
> See [`../../docs/sleep/FINAL_REPORT.md`](../../docs/sleep/FINAL_REPORT.md).

## What Codex supports (and what we use)

Codex (`@openai/codex`) extends via **`AGENTS.md`** instructions, **skills** at
`~/.agents/skills/<name>/SKILL.md`, and **custom prompts** at
`~/.codex/prompts/<name>.md` (invoked as `/<name>`). This integration ships all
three, plus a shared runner.

## Install

```bash
git clone <repo-url> SkillOpt-Sleep
cd SkillOpt-Sleep
bash plugins/codex/install.sh          # installs the /skillopt-sleep prompt + skill
export SKILLOPT_SLEEP_REPO="$(pwd)"    # so the runner is found from anywhere
```

Requires Python ≥ 3.10 and the `codex` CLI on PATH.

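The requirements above (Python ≥ 3.10, `codex` on PATH, and the `SKILLOPT_SLEEP_REPO` variable exported during install) can be checked before the first run. The following is a rough sketch, not something the install script provides; the function name `preflight` and its parameters are hypothetical.

```python
import os
import shutil
import sys


def preflight(min_version=(3, 10), cli="codex", env=None):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper that checks only the three prerequisites named in
    this README.
    """
    env = os.environ if env is None else env
    problems = []

    # Python version requirement stated in the README.
    if sys.version_info < min_version:
        wanted = ".".join(map(str, min_version))
        problems.append(f"Python >= {wanted} is required")

    # The Codex CLI must be resolvable on PATH.
    if shutil.which(cli) is None:
        problems.append(f"'{cli}' was not found on PATH")

    # The install step exports this so the runner can be located.
    repo = env.get("SKILLOPT_SLEEP_REPO")
    if not repo or not os.path.isdir(repo):
        problems.append("SKILLOPT_SLEEP_REPO is unset or not a directory")

    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print(f"preflight: {issue}")
    if not issues:
        print("preflight: ok")
```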
## Use

```text
/skillopt-sleep status     # what's happened
/skillopt-sleep dry-run    # safe preview, stages nothing
/skillopt-sleep run        # full cycle, stages a reviewed proposal (no live edits)
/skillopt-sleep adopt      # apply the staged proposal (with backup)
```

Or call the engine directly:

```bash
python -m skillopt_sleep run --project "$(pwd)" --backend codex
```

Default backend is `mock` (no API spend). `--backend codex` uses your Codex
budget for real improvement. All the controllable knobs (`--gate on|off`,
`--rollouts-k`, `--budget-tokens`, `--preferences`, optimizer/target split) work
identically; see [`../../docs/sleep/CONTROLLABLE_DREAMING.md`](../../docs/sleep/CONTROLLABLE_DREAMING.md).

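When the engine is driven from a script instead of a shell, the invocation above can be assembled as an argument list. This sketch is illustrative only: `build_sleep_command` is a hypothetical helper, and it assumes that `--rollouts-k` and `--budget-tokens` each take a single integer and that all three flags attach to the `run` subcommand. `CONTROLLABLE_DREAMING.md` remains the reference for the real option syntax.

```python
import shlex
import sys


def build_sleep_command(project, backend="mock", gate=None,
                        rollouts_k=None, budget_tokens=None):
    """Build an argv list for `python -m skillopt_sleep run`.

    Hypothetical helper. Flag names come from this README; the value
    formats for --rollouts-k and --budget-tokens are assumptions.
    """
    cmd = [sys.executable, "-m", "skillopt_sleep", "run",
           "--project", str(project), "--backend", backend]
    if gate is not None:
        if gate not in ("on", "off"):
            raise ValueError("gate must be 'on' or 'off'")
        cmd += ["--gate", gate]
    if rollouts_k is not None:
        cmd += ["--rollouts-k", str(int(rollouts_k))]      # assumed: integer
    if budget_tokens is not None:
        cmd += ["--budget-tokens", str(int(budget_tokens))]  # assumed: integer
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is spent by accident.
    cmd = build_sleep_command(".", backend="codex", gate="on", rollouts_k=4)
    print(shlex.join(cmd))
```

Building a list rather than a shell string avoids quoting problems with project paths that contain spaces; the list can be handed to `subprocess.run(cmd, check=True)` once the options are confirmed.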
## Notes / status

- Codex's `exec` runs shell commands, so the real-tool-loop replay (e.g. the
  `tool_called: search` benchmark seed) works natively.
- Codex's standalone *plugin-package manifest* format is not yet a stable public
  spec; this integration uses the documented `AGENTS.md` + skills + prompts
  mechanisms, which are stable. If/when a `codex plugin` package format ships,
  we'll add a one-file manifest.