Make Codex integration skill-first

Kirill Kostarev
2026-06-12 16:51:54 +03:00
committed by carpedkm
parent 1b2652c6f8
commit 1953484822
8 changed files with 100 additions and 71 deletions

@@ -20,6 +20,12 @@ sleep** idea (short-term experience → long-term competence).
---
| Platform | Folder | Mechanism | Status |
|---|---|---|---|
| **Claude Code** | [`claude-code/`](claude-code) | `.claude-plugin` + `/skillopt-sleep` command + skill + hooks | full, installable |
| **Codex** | [`codex/`](codex) | user-level `skillopt-sleep` skill + shared runner | full |
| **Copilot** | [`copilot/`](copilot) | MCP server (`sleep_*` tools) + `copilot-instructions` | full (MCP) |
## Install (pick your agent)
| Platform | Install | Then |

@@ -14,16 +14,17 @@ as the Claude Code plugin (`skillopt_sleep`), wrapped for Codex.
## What Codex supports (and what we use)
Codex (`@openai/codex`) extends via **`AGENTS.md`** instructions, **skills** at
`~/.agents/skills/<name>/SKILL.md`, and **custom prompts** at
`~/.codex/prompts/<name>.md` (invoked as `/<name>`). This integration ships all
three, plus a shared runner.
`~/.agents/skills/<name>/SKILL.md`, and plugins that can distribute skills.
Custom prompts are deprecated in Codex, so this integration is skill-first: the
installed `skillopt-sleep` skill contains the launch commands and operating
rules. The shared runner remains a plain shell entrypoint that the skill calls.
## Install
```bash
git clone <repo-url> SkillOpt-Sleep
cd SkillOpt-Sleep
bash plugins/codex/install.sh # installs the /skillopt-sleep prompt + skill
bash plugins/codex/install.sh # installs the skill
export SKILLOPT_SLEEP_REPO="$(pwd)" # so the runner is found from anywhere
```
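After installing, a quick check along these lines could confirm that the skill and the runner are where this commit's `install.sh` and README place them. `check_skillopt_install` is a hypothetical helper, not something the repository ships; it is only a sketch that assumes the default `~/.agents/skills` location and the `plugins/run-sleep.sh` runner path.

```bash
# Hypothetical post-install check (not part of install.sh).
# Usage: check_skillopt_install [home_dir]
check_skillopt_install() {
  local home_dir="${1:-$HOME}"
  local skill="$home_dir/.agents/skills/skillopt-sleep/SKILL.md"

  # install.sh copies SKILL.md to this user-level path.
  if [ ! -f "$skill" ]; then
    echo "missing skill: $skill"
    return 1
  fi

  # The README asks for SKILLOPT_SLEEP_REPO so the runner is found from anywhere.
  if [ -z "${SKILLOPT_SLEEP_REPO:-}" ]; then
    echo "skill installed, but SKILLOPT_SLEEP_REPO is not set"
    return 2
  fi

  if [ ! -f "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" ]; then
    echo "runner not found under: $SKILLOPT_SLEEP_REPO"
    return 3
  fi

  echo "ok"
}
```

Calling `check_skillopt_install` with no arguments checks the current user's home directory; the distinct return codes make it easy to tell which step of the install is incomplete.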
@@ -31,11 +32,14 @@ Requires Python ≥ 3.10 and the `codex` CLI on PATH.
## Use
Mention `$skillopt-sleep` where Codex supports explicit skill mentions, or ask
Codex in natural language:
```text
/skillopt-sleep status # what's happened
/skillopt-sleep dry-run # safe preview, stages nothing
/skillopt-sleep run # full cycle, stages a reviewed proposal (no live edits)
/skillopt-sleep adopt # apply the staged proposal (with backup)
Use the skillopt-sleep skill to run status for this project.
Use the skillopt-sleep skill to run a dry-run for this project.
Use the skillopt-sleep skill to run the full cycle for this project with the Codex backend.
Use the skillopt-sleep skill to adopt the latest staged proposal.
```
Or call the engine directly:
@@ -53,7 +57,7 @@ identically — see [`../../docs/sleep/CONTROLLABLE_DREAMING.md`](../../docs/sle
- Codex's `exec` runs shell, so the real-tool-loop replay (e.g. the
`tool_called: search` benchmark seed) works natively.
- Codex's standalone *plugin-package manifest* format is not yet a stable public
spec; this integration uses the documented `AGENTS.md` + skills + prompts
mechanisms, which are stable. If/when a `codex plugin` package format ships,
we'll add a one-file manifest.
- This integration no longer installs a `.codex/prompts` slash command. Skills
are the reusable Codex workflow surface; mention `skillopt-sleep` explicitly
or ask for a sleep/dream/offline self-improvement run and Codex can load the
skill.

@@ -1,36 +1,32 @@
#!/usr/bin/env bash
# Install the SkillOpt-Sleep Codex integration into the user's ~/.codex and
# ~/.agents directories. Idempotent; prints what it does.
# Install the SkillOpt-Sleep Codex integration as a user-level Codex skill.
# Idempotent; prints what it does.
set -euo pipefail
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
AGENTS_SKILLS="${HOME}/.agents/skills"
echo "[install] repo: $REPO_ROOT"
# 1) custom /skillopt-sleep prompt
mkdir -p "$CODEX_HOME/prompts"
cp "$REPO_ROOT/plugins/codex/prompts/skillopt-sleep.md" "$CODEX_HOME/prompts/skillopt-sleep.md"
echo "[install] /skillopt-sleep prompt -> $CODEX_HOME/prompts/skillopt-sleep.md"
# 2) user-level skill
# 1) user-level skill
mkdir -p "$AGENTS_SKILLS/skillopt-sleep"
cp "$REPO_ROOT/plugins/codex/skills/skillopt-sleep/SKILL.md" "$AGENTS_SKILLS/skillopt-sleep/SKILL.md"
echo "[install] skill -> $AGENTS_SKILLS/skillopt-sleep/SKILL.md"
# 3) record the repo location so the runner is found from anywhere
# 2) record the repo location so the runner is found from anywhere
echo "[install] add to your shell profile:"
echo " export SKILLOPT_SLEEP_REPO=\"$REPO_ROOT\""
# 4) optional: append an AGENTS.md hint (only if the user opts in)
# 3) optional: append an AGENTS.md hint (only if the user opts in)
cat <<EOF
[install] Optional — add this to ~/.codex/AGENTS.md so Codex always knows the tool:
## SkillOpt-Sleep
An offline self-improvement cycle is available. To run it:
\`bash "$REPO_ROOT/plugins/run-sleep.sh" status\`. Use \`/skillopt-sleep\` for the guided flow.
Use the skillopt-sleep skill when I ask to run a sleep/dream/offline
self-improvement cycle. The runner is:
\`bash "$REPO_ROOT/plugins/run-sleep.sh" status --project "\$(pwd)"\`.
Done. Try: /skillopt-sleep status
Done. Try asking Codex:
Use the skillopt-sleep skill to run status for this project.
EOF

@@ -1,21 +0,0 @@
# /skillopt-sleep — SkillOpt-Sleep for Codex
#
# Custom prompt: copy this file to ~/.codex/prompts/skillopt-sleep.md and invoke with
# `/skillopt-sleep` in the Codex CLI. ($ARGUMENTS is the text after /skillopt-sleep.)
Run the SkillOpt-Sleep offline self-evolution cycle. Action: $ARGUMENTS
(empty → "status").
Use the bundled runner via shell:
bash "${SKILLOPT_SLEEP_REPO:?set SKILLOPT_SLEEP_REPO to the repo root}/plugins/run-sleep.sh" $ARGUMENTS --project "$(pwd)"
Then:
- For `run`/`dry-run`: read the staged `report.md` and show the held-out
baseline → candidate score and the proposed edits. `run` only stages a
proposal; nothing live changes until `adopt`.
- For `adopt`: confirm which files were updated and that a backup was written.
- Never edit the user's AGENTS.md / skills yourself; only `adopt` does that.
Default backend is `mock` (no API spend). Add `--backend codex` for real
improvement on the user's Codex budget.

@@ -1,49 +1,93 @@
---
name: skillopt-sleep
description: Nightly offline self-evolution for a Codex agent. Reviews past sessions, replays recurring tasks, and consolidates validated memory + skills behind a held-out gate. Use when the user wants Codex to learn from past usage, run a "sleep"/"dream" cycle, or schedule offline self-optimization.
description: "Use when the user wants Codex to self-improve from past usage, asks about a nightly/offline 'sleep' or 'dream' cycle, wants Codex to review past sessions, learn preferences, consolidate memory/skills, run dry-run/run/adopt/status for SkillOpt-Sleep, or schedule offline self-optimization. Drives the skillopt_sleep engine: harvest past sessions -> mine recurring tasks -> replay offline -> consolidate validated memory + skills behind a held-out gate."
---
# SkillOpt-Sleep (Codex skill)
# SkillOpt-Sleep: offline self-evolution for a local Codex agent
This skill drives the `skillopt_sleep` engine — an offline "sleep cycle" that
makes a Codex agent better at the user's recurring work without retraining.
SkillOpt-Sleep gives the user's Codex agent a sleep cycle. While the user is
offline or on demand, it reviews past local sessions, re-runs recurring tasks
on the user's own budget, and consolidates what it learns into memory and
skills. It keeps only changes that pass a held-out validation gate, and live
files change only after the user explicitly adopts a staged proposal. There is
no model-weight training.
## When to use
Trigger when the user wants to: review past sessions, learn their preferences,
consolidate feedback into long-term memory/skills, run a nightly/offline
self-improvement cycle, or adopt a staged proposal.
Trigger when the user wants any of:
## How to run it
- Codex to learn from past sessions or get better the more they use it;
- a nightly/scheduled or on-demand sleep/dream/offline self-improvement run;
- to review past sessions and distill recurring tasks;
- to consolidate feedback into memory or managed skills;
- to run `status`, `harvest`, `dry-run`, `run`, or `adopt` for SkillOpt-Sleep.
## The cycle
1. **Harvest** - read local session transcripts according to the engine
configuration and normalize them into session digests.
2. **Mine** - turn digests into recurring `TaskRecord`s with outcomes and
checkable references where possible.
3. **Replay** - re-run mined tasks offline under the current skill and memory.
4. **Consolidate** - reflect on failures and propose bounded edits.
5. **Gate** - accept edits only when the held-out validation score improves.
6. **Stage** - write the proposal under
`<project>/.skillopt-sleep/staging/<date>/`; nothing live changes.
7. **Adopt** - only after explicit user approval, copy staged files over live
files with backups.
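The gate step in the cycle can be pictured as a strict comparison of two held-out scores. The sketch below is illustrative only: `gate_decision` is a hypothetical helper, and the real `skillopt_sleep` engine may apply thresholds or tie-breaking that this commit does not describe.

```bash
# Hypothetical illustration of the gate rule (not the engine's code):
# accept a candidate only when its held-out score strictly improves.
# Usage: gate_decision <baseline_score> <candidate_score>
gate_decision() {
  local baseline="$1" candidate="$2"
  # awk handles the floating-point comparison; bash arithmetic is integer-only.
  awk -v b="$baseline" -v c="$candidate" \
    'BEGIN { if (c + 0 > b + 0) print "accept"; else print "reject" }'
}

gate_decision 0.00 1.00   # prints: accept
gate_decision 0.50 0.50   # prints: reject (no improvement)
```

An equal score is rejected here because the cycle description says edits are accepted only when the score improves.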
## How to drive it
Invoke the bundled runner via shell (Codex `exec` has shell access). The runner
finds the engine and a Python 3.10 automatically:
finds the engine and a Python >= 3.10 automatically.
```bash
# point at the repo if it isn't auto-detected from CWD:
export SKILLOPT_SLEEP_REPO=/path/to/SkillOpt-Sleep
bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" <action> --project "$(pwd)"
bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" status --project "$(pwd)"
bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" harvest --project "$(pwd)"
bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" dry-run --project "$(pwd)" --backend mock
bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" run --project "$(pwd)" --backend codex
bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" adopt --project "$(pwd)"
```
`<action>` is one of `status | dry-run | run | adopt | harvest`. Use `--backend codex`
for real improvement on the user's own Codex budget (default `mock` = no spend).
Actions are `status`, `harvest`, `dry-run`, `run`, and `adopt`.
- Default backend is `mock`, which is deterministic and spends no API budget.
- `--backend codex` uses the user's Codex budget for real improvement.
- Keep `dry-run --backend mock` as the first smoke check unless the user
explicitly asked for a real optimization run.
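The "mock dry-run first, real run second" ordering could be wrapped as follows. This is a sketch under assumptions: `smoke_then_run` is a hypothetical helper, and it assumes `run-sleep.sh` exits non-zero on failure, which this commit does not state. The runner invocations themselves are the ones listed in the skill.

```bash
# Hypothetical wrapper (not part of the repository).
# Assumes run-sleep.sh signals failure with a non-zero exit status.
# Usage: smoke_then_run [project_dir]
smoke_then_run() {
  local project="${1:-$PWD}"
  local runner="${SKILLOPT_SLEEP_REPO:?set SKILLOPT_SLEEP_REPO to the repo root}/plugins/run-sleep.sh"

  # Step 1: deterministic mock dry-run; spends no API budget.
  bash "$runner" dry-run --project "$project" --backend mock || {
    echo "dry-run failed; skipping the real run" >&2
    return 1
  }

  # Step 2: real run on the user's Codex budget; this only stages a proposal.
  bash "$runner" run --project "$project" --backend codex
}
```

Adoption is deliberately left out of the wrapper, since the skill treats `adopt` as a separate step that follows user review.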
## Steps
1. Run the requested action; capture stdout.
2. For `run`/`dry-run`: read the staged `report.md` it prints and show the user
the held-out baseline → candidate score and the exact proposed edits.
3. `run` only **stages** a proposal under `<project>/.skillopt-sleep/staging/`;
nothing live changes until `adopt`. Offer `/skillopt-sleep adopt`.
4. Never hand-edit the user's `AGENTS.md` / skills yourself — only `adopt` does,
and it backs up first.
2. For `dry-run` and `run`, report the held-out baseline -> candidate score,
gate action, task count, session count, and exact proposed edits.
3. If a staging directory is printed, read `report.md` before summarizing.
4. `run` only stages a proposal; nothing live changes until `adopt`.
5. Offer adoption only after the user has reviewed the staged proposal.
6. Never hand-edit the user's `AGENTS.md`, memory, or skills as a substitute
for `adopt`; adoption is the safety boundary and writes backups first.
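When the runner output does not make the staging directory obvious, the newest staged `report.md` could be located with something like the sketch below. `latest_staged_report` is a hypothetical helper, and it assumes that the `<date>` directory names under `.skillopt-sleep/staging/` sort chronologically (for example ISO dates), which this commit does not specify.

```bash
# Hypothetical helper (not part of the repository): print the path of
# report.md in the newest staging directory, or return 1 if there is none.
# Assumes staging directory names sort chronologically, e.g. ISO dates.
# Usage: latest_staged_report [project_dir]
latest_staged_report() {
  local project="${1:-$PWD}"
  local staging="$project/.skillopt-sleep/staging"
  local latest

  [ -d "$staging" ] || return 1

  # Lexicographic sort; the last entry is the newest under the assumption above.
  latest="$(find "$staging" -mindepth 1 -maxdepth 1 -type d | LC_ALL=C sort | tail -n 1)"

  [ -n "$latest" ] && [ -f "$latest/report.md" ] || return 1
  printf '%s\n' "$latest/report.md"
}
```

The helper only reads the staging tree, so it stays on the safe side of the adoption boundary described in the steps.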
## Hard rules
- Harvest is read-only. Do not edit archived sessions or raw transcripts.
- Keep raw secrets, credentials, private user data, and unsanitized transcript
contents out of messages, logs, generated artifacts, and commits.
- Show validation evidence before recommending adoption.
- Treat generated edits as proposals, not as source of truth.
- Do not rely on deprecated custom prompts or `/sleep` slash commands for this
Codex integration. This skill is the entrypoint.
## Validate
```bash
python -m skillopt_sleep dry-run --project "$(pwd)" --backend mock --json
python -m skillopt_sleep.experiments.run_gbrain --backend codex \
--seeds brief-writer --data-root /path/to/gbrain-evals/eval/data/skillopt-v1 \
--nights 2 --limit-replay 3 --limit-holdout 3
```
A deficient skill goes 0.00 → 1.00 on a held-out set; the optimizer's edits are
gated on real-task performance.
A deficient skill goes 0.00 -> 1.00 on a held-out set; the optimizer's edits
are gated on real-task performance.