Mirror of https://github.com/microsoft/SkillOpt.git, synced 2026-07-03 14:02:58 +08:00

Make Codex integration skill-first

Committed by carpedkm
Parent 1b2652c6f8
Commit 1953484822
@@ -20,6 +20,12 @@ sleep** idea (short-term experience → long-term competence).
 
 ---
 
+| Platform | Folder | Mechanism | Status |
+|---|---|---|---|
+| **Claude Code** | [`claude-code/`](claude-code) | `.claude-plugin` + `/skillopt-sleep` command + skill + hooks | full, installable |
+| **Codex** | [`codex/`](codex) | user-level `skillopt-sleep` skill + shared runner | full |
+| **Copilot** | [`copilot/`](copilot) | MCP server (`sleep_*` tools) + `copilot-instructions` | full (MCP) |
+
 ## Install (pick your agent)
 
 | Platform | Install | Then |
@@ -14,16 +14,17 @@ as the Claude Code plugin (`skillopt_sleep`), wrapped for Codex.
 ## What Codex supports (and what we use)
 
 Codex (`@openai/codex`) extends via **`AGENTS.md`** instructions, **skills** at
-`~/.agents/skills/<name>/SKILL.md`, and **custom prompts** at
-`~/.codex/prompts/<name>.md` (invoked as `/<name>`). This integration ships all
-three, plus a shared runner.
+`~/.agents/skills/<name>/SKILL.md`, and plugins that can distribute skills.
+Custom prompts are deprecated in Codex, so this integration is skill-first: the
+installed `skillopt-sleep` skill contains the launch commands and operating
+rules. The shared runner remains a plain shell entrypoint that the skill calls.
 
 ## Install
 
 ```bash
 git clone <repo-url> SkillOpt-Sleep
 cd SkillOpt-Sleep
-bash plugins/codex/install.sh # installs the /skillopt-sleep prompt + skill
+bash plugins/codex/install.sh # installs the skill
 export SKILLOPT_SLEEP_REPO="$(pwd)" # so the runner is found from anywhere
 ```
 
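The install steps in the hunk above leave two things behind: the skill file under `~/.agents/skills/skillopt-sleep/` and a `SKILLOPT_SLEEP_REPO` variable pointing at the clone. A minimal sketch of a post-install check follows. The function name `check_skillopt_install` is hypothetical and not part of the repository; only the two paths come from the diff.

```bash
# Hypothetical helper (not shipped by the repo): report whether the skill file
# and the shared runner referenced in the install steps can be found.
check_skillopt_install() {
  local home_dir="${1:-$HOME}"
  local repo="${2:-${SKILLOPT_SLEEP_REPO:-}}"
  local skill="$home_dir/.agents/skills/skillopt-sleep/SKILL.md"
  local ok=0

  if [ -f "$skill" ]; then
    echo "ok: skill found at $skill"
  else
    echo "missing: $skill (run plugins/codex/install.sh)"
    ok=1
  fi

  if [ -n "$repo" ] && [ -f "$repo/plugins/run-sleep.sh" ]; then
    echo "ok: runner found at $repo/plugins/run-sleep.sh"
  else
    echo "missing: runner (set SKILLOPT_SLEEP_REPO to the repo root)"
    ok=1
  fi

  # Exit status 0 only when both pieces are present.
  return "$ok"
}
```

Calling `check_skillopt_install` with no arguments checks the current user's home directory and the exported `SKILLOPT_SLEEP_REPO`.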
@@ -31,11 +32,14 @@ Requires Python ≥ 3.10 and the `codex` CLI on PATH.
 
 ## Use
 
+Mention `$skillopt-sleep` where Codex supports explicit skill mentions, or ask
+Codex in natural language:
+
 ```text
-/skillopt-sleep status # what's happened
-/skillopt-sleep dry-run # safe preview, stages nothing
-/skillopt-sleep run # full cycle, stages a reviewed proposal (no live edits)
-/skillopt-sleep adopt # apply the staged proposal (with backup)
+Use the skillopt-sleep skill to run status for this project.
+Use the skillopt-sleep skill to run a dry-run for this project.
+Use the skillopt-sleep skill to run the full cycle for this project with the Codex backend.
+Use the skillopt-sleep skill to adopt the latest staged proposal.
 ```
 
 Or call the engine directly:
@@ -53,7 +57,7 @@ identically — see [`../../docs/sleep/CONTROLLABLE_DREAMING.md`](../../docs/sle
 
 - Codex's `exec` runs shell, so the real-tool-loop replay (e.g. the
 `tool_called: search` benchmark seed) works natively.
-- Codex's standalone *plugin-package manifest* format is not yet a stable public
-spec; this integration uses the documented `AGENTS.md` + skills + prompts
-mechanisms, which are stable. If/when a `codex plugin` package format ships,
-we'll add a one-file manifest.
+- This integration no longer installs a `.codex/prompts` slash command. Skills
+are the reusable Codex workflow surface; mention `skillopt-sleep` explicitly
+or ask for a sleep/dream/offline self-improvement run and Codex can load the
+skill.
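With the slash command gone, the README changes above leave the shared runner as the plain shell entrypoint. A minimal sketch of a guard around that call follows, assuming the five actions named elsewhere in this commit (`status`, `harvest`, `dry-run`, `run`, `adopt`). The names `is_sleep_action` and `sleep_cmd` are hypothetical; the runner path and the `--project` flag are taken from the diff.

```bash
# Hypothetical wrapper (not part of the repo): reject unknown actions before
# handing off to the shared runner.
is_sleep_action() {
  case "$1" in
    status|harvest|dry-run|run|adopt) return 0 ;;
    *) return 1 ;;
  esac
}

sleep_cmd() {
  local action="${1:-status}"   # an empty action falls back to "status"
  if ! is_sleep_action "$action"; then
    echo "unknown action: $action" >&2
    return 2
  fi
  if [ "$#" -gt 0 ]; then shift; fi
  # Remaining arguments (for example --backend mock) pass through unchanged.
  bash "${SKILLOPT_SLEEP_REPO:?set SKILLOPT_SLEEP_REPO to the repo root}/plugins/run-sleep.sh" \
    "$action" --project "$(pwd)" "$@"
}
```

For example, `sleep_cmd dry-run --backend mock` expands to the same runner call the skill documents, while a typo such as `sleep_cmd adpot` fails before anything is executed.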
@@ -1,36 +1,32 @@
 #!/usr/bin/env bash
-# Install the SkillOpt-Sleep Codex integration into the user's ~/.codex and
-# ~/.agents directories. Idempotent; prints what it does.
+# Install the SkillOpt-Sleep Codex integration as a user-level Codex skill.
+# Idempotent; prints what it does.
 set -euo pipefail
 
 REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
-CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
 AGENTS_SKILLS="${HOME}/.agents/skills"
 
 echo "[install] repo: $REPO_ROOT"
 
-# 1) custom /skillopt-sleep prompt
-mkdir -p "$CODEX_HOME/prompts"
-cp "$REPO_ROOT/plugins/codex/prompts/skillopt-sleep.md" "$CODEX_HOME/prompts/skillopt-sleep.md"
-echo "[install] /skillopt-sleep prompt -> $CODEX_HOME/prompts/skillopt-sleep.md"
-
-# 2) user-level skill
+# 1) user-level skill
 mkdir -p "$AGENTS_SKILLS/skillopt-sleep"
 cp "$REPO_ROOT/plugins/codex/skills/skillopt-sleep/SKILL.md" "$AGENTS_SKILLS/skillopt-sleep/SKILL.md"
 echo "[install] skill -> $AGENTS_SKILLS/skillopt-sleep/SKILL.md"
 
-# 3) record the repo location so the runner is found from anywhere
+# 2) record the repo location so the runner is found from anywhere
 echo "[install] add to your shell profile:"
 echo " export SKILLOPT_SLEEP_REPO=\"$REPO_ROOT\""
 
-# 4) optional: append an AGENTS.md hint (only if the user opts in)
+# 3) optional: append an AGENTS.md hint (only if the user opts in)
 cat <<EOF
 
 [install] Optional — add this to ~/.codex/AGENTS.md so Codex always knows the tool:
 
 ## SkillOpt-Sleep
-An offline self-improvement cycle is available. To run it:
-\`bash "$REPO_ROOT/plugins/run-sleep.sh" status\`. Use \`/skillopt-sleep\` for the guided flow.
+Use the skillopt-sleep skill when I ask to run a sleep/dream/offline
+self-improvement cycle. The runner is:
+\`bash "$REPO_ROOT/plugins/run-sleep.sh" status --project "\$(pwd)"\`.
 
-Done. Try: /skillopt-sleep status
+Done. Try asking Codex:
+Use the skillopt-sleep skill to run status for this project.
 EOF
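The installer above describes itself as idempotent and relies on plain `mkdir -p` and `cp`, which is already safe to re-run. As a variation rather than what the script does, the sketch below skips the copy when the destination is already identical and says so in the log. The helper name `install_file` is hypothetical.

```bash
# Hypothetical helper (a variation on the committed installer, not a copy of it):
# copy src to dst only when dst is missing or its contents differ.
install_file() {
  local src="$1" dst="$2"
  if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
    echo "[install] unchanged: $dst"
    return 0
  fi
  mkdir -p "$(dirname "$dst")"
  cp "$src" "$dst"
  echo "[install] wrote: $dst"
}
```

Used in place of the `mkdir -p` + `cp` pair, a second run would print `unchanged` instead of rewriting the skill file, which makes repeated installs easier to read in the output.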
@@ -1,21 +0,0 @@
-# /skillopt-sleep — SkillOpt-Sleep for Codex
-#
-# Custom prompt: copy this file to ~/.codex/prompts/skillopt-sleep.md and invoke with
-# `/skillopt-sleep` in the Codex CLI. ($ARGUMENTS is the text after /skillopt-sleep.)
-
-Run the SkillOpt-Sleep offline self-evolution cycle. Action: $ARGUMENTS
-(empty → "status").
-
-Use the bundled runner via shell:
-
-bash "${SKILLOPT_SLEEP_REPO:?set SKILLOPT_SLEEP_REPO to the repo root}/plugins/run-sleep.sh" $ARGUMENTS --project "$(pwd)"
-
-Then:
-- For `run`/`dry-run`: read the staged `report.md` and show the held-out
-baseline → candidate score and the proposed edits. `run` only stages a
-proposal; nothing live changes until `adopt`.
-- For `adopt`: confirm which files were updated and that a backup was written.
-- Never edit the user's AGENTS.md / skills yourself; only `adopt` does that.
-
-Default backend is `mock` (no API spend). Add `--backend codex` for real
-improvement on the user's Codex budget.
@@ -1,49 +1,93 @@
 ---
 name: skillopt-sleep
-description: Nightly offline self-evolution for a Codex agent. Reviews past sessions, replays recurring tasks, and consolidates validated memory + skills behind a held-out gate. Use when the user wants Codex to learn from past usage, run a "sleep"/"dream" cycle, or schedule offline self-optimization.
+description: "Use when the user wants Codex to self-improve from past usage, asks about a nightly/offline 'sleep' or 'dream' cycle, wants Codex to review past sessions, learn preferences, consolidate memory/skills, run dry-run/run/adopt/status for SkillOpt-Sleep, or schedule offline self-optimization. Drives the skillopt_sleep engine: harvest past sessions -> mine recurring tasks -> replay offline -> consolidate validated memory + skills behind a held-out gate."
 ---
 
-# SkillOpt-Sleep (Codex skill)
+# SkillOpt-Sleep: offline self-evolution for a local Codex agent
 
-This skill drives the `skillopt_sleep` engine — an offline "sleep cycle" that
-makes a Codex agent better at the user's recurring work without retraining.
+SkillOpt-Sleep gives the user's Codex agent a sleep cycle. While the user is
+offline or on demand, it reviews past local sessions, re-runs recurring tasks
+on the user's own budget, and consolidates what it learns into memory and
+skills. It keeps only changes that pass a held-out validation gate, and live
+files change only after the user explicitly adopts a staged proposal. There is
+no model-weight training.
 
 ## When to use
 
-Trigger when the user wants to: review past sessions, learn their preferences,
-consolidate feedback into long-term memory/skills, run a nightly/offline
-self-improvement cycle, or adopt a staged proposal.
+Trigger when the user wants any of:
 
-## How to run it
+- Codex to learn from past sessions or get better the more they use it;
+- a nightly/scheduled or on-demand sleep/dream/offline self-improvement run;
+- to review past sessions and distill recurring tasks;
+- to consolidate feedback into memory or managed skills;
+- to run `status`, `harvest`, `dry-run`, `run`, or `adopt` for SkillOpt-Sleep.
+
+## The cycle
+
+1. **Harvest** - read local session transcripts according to the engine
+configuration and normalize them into session digests.
+2. **Mine** - turn digests into recurring `TaskRecord`s with outcomes and
+checkable references where possible.
+3. **Replay** - re-run mined tasks offline under the current skill and memory.
+4. **Consolidate** - reflect on failures and propose bounded edits.
+5. **Gate** - accept edits only when the held-out validation score improves.
+6. **Stage** - write the proposal under
+`<project>/.skillopt-sleep/staging/<date>/`; nothing live changes.
+7. **Adopt** - only after explicit user approval, copy staged files over live
+files with backups.
+
+## How to drive it
 
 Invoke the bundled runner via shell (Codex `exec` has shell access). The runner
-finds the engine and a Python ≥ 3.10 automatically:
+finds the engine and a Python >= 3.10 automatically.
 
 ```bash
 # point at the repo if it isn't auto-detected from CWD:
 export SKILLOPT_SLEEP_REPO=/path/to/SkillOpt-Sleep
-bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" <action> --project "$(pwd)"
+
+bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" status --project "$(pwd)"
+bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" harvest --project "$(pwd)"
+bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" dry-run --project "$(pwd)" --backend mock
+bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" run --project "$(pwd)" --backend codex
+bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" adopt --project "$(pwd)"
 ```
 
-`<action>` ∈ `status | dry-run | run | adopt | harvest`. Use `--backend codex`
-for real improvement on the user's own Codex budget (default `mock` = no spend).
+Actions are `status`, `harvest`, `dry-run`, `run`, and `adopt`.
+
+- Default backend is `mock`, which is deterministic and spends no API budget.
+- `--backend codex` uses the user's Codex budget for real improvement.
+- Keep `dry-run --backend mock` as the first smoke check unless the user
+explicitly asked for a real optimization run.
 
 ## Steps
 
 1. Run the requested action; capture stdout.
-2. For `run`/`dry-run`: read the staged `report.md` it prints and show the user
-the held-out baseline → candidate score and the exact proposed edits.
-3. `run` only **stages** a proposal under `<project>/.skillopt-sleep/staging/`;
-nothing live changes until `adopt`. Offer `/skillopt-sleep adopt`.
-4. Never hand-edit the user's `AGENTS.md` / skills yourself — only `adopt` does,
-and it backs up first.
+2. For `dry-run` and `run`, report the held-out baseline -> candidate score,
+gate action, task count, session count, and exact proposed edits.
+3. If a staging directory is printed, read `report.md` before summarizing.
+4. `run` only stages a proposal; nothing live changes until `adopt`.
+5. Offer adoption only after the user has reviewed the staged proposal.
+6. Never hand-edit the user's `AGENTS.md`, memory, or skills as a substitute
+for `adopt`; adoption is the safety boundary and writes backups first.
+
+## Hard rules
+
+- Harvest is read-only. Do not edit archived sessions or raw transcripts.
+- Keep raw secrets, credentials, private user data, and unsanitized transcript
+contents out of messages, logs, generated artifacts, and commits.
+- Show validation evidence before recommending adoption.
+- Treat generated edits as proposals, not as source of truth.
+- Do not rely on deprecated custom prompts or `/sleep` slash commands for this
+Codex integration. This skill is the entrypoint.
 
 ## Validate
 
 ```bash
+python -m skillopt_sleep dry-run --project "$(pwd)" --backend mock --json
 python -m skillopt_sleep.experiments.run_gbrain --backend codex \
 --seeds brief-writer --data-root /path/to/gbrain-evals/eval/data/skillopt-v1 \
 --nights 2 --limit-replay 3 --limit-holdout 3
 ```
-A deficient skill goes 0.00 → 1.00 on a held-out set; the optimizer's edits are
-gated on real-task performance.
+
+A deficient skill goes 0.00 -> 1.00 on a held-out set; the optimizer's edits
+are gated on real-task performance.
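The Gate step in the skill above accepts edits only when the held-out validation score improves. A minimal sketch of that decision rule follows, assuming a strict greater-than comparison between two numeric scores. The function name `gate_decision` is hypothetical, and the engine's real gate (tie handling, thresholds, noise margins) is not shown in this diff.

```bash
# Hypothetical sketch of the gate rule: accept a candidate only when its
# held-out score is strictly higher than the baseline.
gate_decision() {
  local baseline="$1" candidate="$2"
  # awk does the floating-point comparison that the shell's [ ] cannot.
  if awk -v b="$baseline" -v c="$candidate" 'BEGIN { exit !(c > b) }'; then
    echo "accept"
  else
    echo "reject"
  fi
}
```

With the scores quoted in the Validate section, `gate_decision 0.00 1.00` prints `accept`; an unchanged or lower candidate score prints `reject`, so nothing is staged on a tie under this assumption.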