mirror of
https://github.com/microsoft/SkillOpt.git
synced 2026-07-03 14:02:58 +08:00
Merge pull request #49 from Kirchberg/codex/codex-skill-first-upstream
Make Codex integration skill-first
@@ -80,7 +80,7 @@ harvest session transcripts → mine recurring tasks → replay offline
 | Platform | Folder | Install |
 |---|---|---|
 | **Claude Code** | [`plugins/claude-code`](plugins/claude-code) | `/plugin marketplace add ./plugins/claude-code` → `/skillopt-sleep` |
-| **Codex** | [`plugins/codex`](plugins/codex) | `bash plugins/codex/install.sh` → `/skillopt-sleep` |
+| **Codex** | [`plugins/codex`](plugins/codex) | `bash plugins/codex/install.sh` → `skillopt-sleep` skill |
 | **Copilot** | [`plugins/copilot`](plugins/copilot) | register `plugins/copilot/mcp_server.py` as an MCP server |

 **Validated on real models.** On the public
@@ -15,7 +15,7 @@ Synthesizes SkillOpt (validation-gated bounded text edits), Claude Dreams
 Shipped as plugins for **three agents**, one engine + three thin shells:

 - **Claude Code** — `.claude-plugin` + `/sleep` command + skill + hooks
-- **Codex** — `~/.codex/prompts/sleep.md` + `~/.agents/skills` + `install.sh`
+- **Codex** — user-level `skillopt-sleep` skill + shared runner + `install.sh`
 - **Copilot** — a stdlib-only MCP server exposing `sleep_*` tools

 ## Design notes
@@ -23,7 +23,7 @@ from scratch for this test. Two forms were used:
 | Shell | What was run | Result |
 |---|---|---|
 | **Claude Code** (`scripts/sleep.sh`) | `harvest`, full `run`, `adopt` | harvest found 2 sessions → 2 tasks; `run` staged a proposal; `adopt` honored the safety contract (no live change when nothing was accepted) |
-| **Codex** (`install.sh` + shared runner) | `install.sh` into a temp HOME | placed `~/.codex/prompts/sleep.md` and `~/.agents/skills/skillopt-sleep/SKILL.md` correctly |
+| **Codex** (`install.sh` + shared runner) | `install.sh` into a temp HOME | placed the user-level `~/.agents/skills/skillopt-sleep/SKILL.md` skill correctly and moved any legacy custom prompt aside instead of installing one |
 | **Copilot** (`mcp_server.py`) | `initialize` → `tools/list` → `tools/call sleep_harvest` | 5 tools listed; `sleep_harvest` returned real engine output (2 sessions → 2 tasks) |

 ### Genuine improvement (real model, fresh persona)
@@ -71,6 +71,6 @@ Shell checks:
 # Copilot MCP server
 printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' \
   | SKILLOPT_SLEEP_REPO="$(pwd)" python3 plugins/copilot/mcp_server.py
-# Codex installer (into a throwaway HOME)
+# Codex skill installer (into a throwaway HOME)
 HOME=$(mktemp -d) bash plugins/codex/install.sh
 ```
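The Copilot shell check in the hunk above pipes one JSON-RPC 2.0 request line into `mcp_server.py`. As a rough illustration only, the same request line could be built in Python like this; `build_request` is a hypothetical helper name and is not part of the repository.

```python
import json


def build_request(method: str, request_id: int = 1) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line (hypothetical helper)."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    # Compact separators give the same string the shell check passes to printf.
    return json.dumps(payload, separators=(",", ":"))


if __name__ == "__main__":
    # Pipe this line into the server's stdin, as the shell check does.
    print(build_request("tools/list"))
```

The sketch covers only the request side; how the server frames its response is not shown in this diff.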
@@ -20,6 +20,12 @@ sleep** idea (short-term experience → long-term competence).

 ---

+| Platform | Folder | Mechanism | Status |
+|---|---|---|---|
+| **Claude Code** | [`claude-code/`](claude-code) | `.claude-plugin` + `/skillopt-sleep` command + skill + hooks | full, installable |
+| **Codex** | [`codex/`](codex) | user-level `skillopt-sleep` skill + shared runner | full |
+| **Copilot** | [`copilot/`](copilot) | MCP server (`sleep_*` tools) + `copilot-instructions` | full (MCP) |
+
 ## Install (pick your agent)

 | Platform | Install | Then |
@@ -14,28 +14,35 @@ as the Claude Code plugin (`skillopt_sleep`), wrapped for Codex.
 ## What Codex supports (and what we use)

 Codex (`@openai/codex`) extends via **`AGENTS.md`** instructions, **skills** at
-`~/.agents/skills/<name>/SKILL.md`, and **custom prompts** at
-`~/.codex/prompts/<name>.md` (invoked as `/<name>`). This integration ships all
-three, plus a shared runner.
+`~/.agents/skills/<name>/SKILL.md`, and plugins that can distribute skills.
+Custom prompts are deprecated in Codex, so this integration is skill-first: the
+installed `skillopt-sleep` skill contains the launch commands and operating
+rules. The shared runner remains a plain shell entrypoint that the skill calls.

 ## Install

 ```bash
 git clone <repo-url> SkillOpt-Sleep
 cd SkillOpt-Sleep
-bash plugins/codex/install.sh # installs the /skillopt-sleep prompt + skill
+bash plugins/codex/install.sh # installs the skill
 export SKILLOPT_SLEEP_REPO="$(pwd)" # so the runner is found from anywhere
 ```

+If a previous install created `~/.codex/prompts/sleep.md`, the installer moves
+that deprecated prompt aside with a `.skillopt-legacy*.bak` suffix.
+
 Requires Python ≥ 3.10 and the `codex` CLI on PATH.

 ## Use

+Mention `$skillopt-sleep` where Codex supports explicit skill mentions, or ask
+Codex in natural language:
+
 ```text
-/skillopt-sleep status # what's happened
-/skillopt-sleep dry-run # safe preview, stages nothing
-/skillopt-sleep run # full cycle, stages a reviewed proposal (no live edits)
-/skillopt-sleep adopt # apply the staged proposal (with backup)
+Use the skillopt-sleep skill to run status for this project.
+Use the skillopt-sleep skill to run a dry-run for this project.
+Use the skillopt-sleep skill to run the full cycle for this project with the Codex backend.
+Use the skillopt-sleep skill to adopt the latest staged proposal.
 ```

 Or call the engine directly:
@@ -53,7 +60,7 @@ identically — see [`../../docs/sleep/CONTROLLABLE_DREAMING.md`](../../docs/sle

 - Codex's `exec` runs shell, so the real-tool-loop replay (e.g. the
   `tool_called: search` benchmark seed) works natively.
-- Codex's standalone *plugin-package manifest* format is not yet a stable public
-  spec; this integration uses the documented `AGENTS.md` + skills + prompts
-  mechanisms, which are stable. If/when a `codex plugin` package format ships,
-  we'll add a one-file manifest.
+- This integration no longer installs a `.codex/prompts` slash command. Skills
+  are the reusable Codex workflow surface; mention `skillopt-sleep` explicitly
+  or ask for a sleep/dream/offline self-improvement run and Codex can load the
+  skill.
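The Codex README and skill in this commit invoke the shared runner as `bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" <action> --project "$(pwd)"`, optionally with `--backend`. The following is a small sketch, assuming only those documented arguments, of assembling that command line from Python; `runner_argv` and `ACTIONS` are hypothetical names introduced for illustration and do not exist in the repository.

```python
# Actions named in the skill file: status, harvest, dry-run, run, adopt.
ACTIONS = ("status", "harvest", "dry-run", "run", "adopt")


def runner_argv(repo, action, project, backend=None):
    """Build the argv list for the shared runner (hypothetical helper)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    argv = ["bash", f"{repo}/plugins/run-sleep.sh", action, "--project", project]
    if backend is not None:
        # The diff documents `mock` (default, no spend) and `codex` backends.
        argv += ["--backend", backend]
    return argv


if __name__ == "__main__":
    print(runner_argv("/path/to/SkillOpt-Sleep", "dry-run", ".", backend="mock"))
```

Passing the list to `subprocess.run` avoids shell quoting issues with project paths that contain spaces.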
@@ -1,24 +1,30 @@
 #!/usr/bin/env bash
-# Install the SkillOpt-Sleep Codex integration into the user's ~/.codex and
-# ~/.agents directories. Idempotent; prints what it does.
+# Install the SkillOpt-Sleep Codex integration as a user-level Codex skill.
+# Idempotent; prints what it does.
 set -euo pipefail

 REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
 CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
 AGENTS_SKILLS="${HOME}/.agents/skills"
+LEGACY_PROMPT="$CODEX_HOME/prompts/sleep.md"

 echo "[install] repo: $REPO_ROOT"

-# 1) custom /skillopt-sleep prompt
-mkdir -p "$CODEX_HOME/prompts"
-cp "$REPO_ROOT/plugins/codex/prompts/skillopt-sleep.md" "$CODEX_HOME/prompts/skillopt-sleep.md"
-echo "[install] /skillopt-sleep prompt -> $CODEX_HOME/prompts/skillopt-sleep.md"
-
-# 2) user-level skill
+# 1) user-level skill
 mkdir -p "$AGENTS_SKILLS/skillopt-sleep"
 cp "$REPO_ROOT/plugins/codex/skills/skillopt-sleep/SKILL.md" "$AGENTS_SKILLS/skillopt-sleep/SKILL.md"
 echo "[install] skill -> $AGENTS_SKILLS/skillopt-sleep/SKILL.md"

+# 2) retire the old custom prompt entrypoint from previous installs
+if [ -f "$LEGACY_PROMPT" ]; then
+  backup="${LEGACY_PROMPT}.skillopt-legacy.bak"
+  if [ -e "$backup" ]; then
+    backup="${LEGACY_PROMPT}.skillopt-legacy.$(date +%Y%m%d%H%M%S).bak"
+  fi
+  mv "$LEGACY_PROMPT" "$backup"
+  echo "[install] legacy prompt -> $backup"
+fi
+
 # 3) record the repo location so the runner is found from anywhere
 echo "[install] add to your shell profile:"
 echo " export SKILLOPT_SLEEP_REPO=\"$REPO_ROOT\""
@@ -29,8 +35,10 @@ cat <<EOF
 [install] Optional — add this to ~/.codex/AGENTS.md so Codex always knows the tool:

 ## SkillOpt-Sleep
-An offline self-improvement cycle is available. To run it:
-\`bash "$REPO_ROOT/plugins/run-sleep.sh" status\`. Use \`/skillopt-sleep\` for the guided flow.
+Use the skillopt-sleep skill when I ask to run a sleep/dream/offline
+self-improvement cycle. The runner is:
+\`bash "$REPO_ROOT/plugins/run-sleep.sh" status --project "\$(pwd)"\`.

-Done. Try: /skillopt-sleep status
+Done. Try asking Codex:
+Use the skillopt-sleep skill to run status for this project.
 EOF
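The installer's step 2 moves a legacy prompt aside to `<prompt>.skillopt-legacy.bak`, falling back to a timestamped name when that backup already exists. Below is a Python rendering of the same rule, offered as a sketch for reasoning about the behavior rather than as code from the repository; `retire_legacy_prompt` and its `now` parameter are hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def retire_legacy_prompt(legacy: Path, now: Optional[datetime] = None) -> Optional[Path]:
    """Move `legacy` aside and return the backup path, or None if it is absent."""
    if not legacy.is_file():
        return None  # nothing to retire, mirrors `[ -f "$LEGACY_PROMPT" ]`
    backup = legacy.with_name(legacy.name + ".skillopt-legacy.bak")
    if backup.exists():
        # An earlier backup is kept; use a timestamped name instead.
        stamp = (now or datetime.now()).strftime("%Y%m%d%H%M%S")
        backup = legacy.with_name(f"{legacy.name}.skillopt-legacy.{stamp}.bak")
    shutil.move(str(legacy), str(backup))
    return backup
```

Because the first backup is never overwritten, re-running the installer stays non-destructive. Two runs within the same second that both find an existing backup would still collide on the timestamped name.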
@@ -1,21 +0,0 @@
-# /skillopt-sleep — SkillOpt-Sleep for Codex
-#
-# Custom prompt: copy this file to ~/.codex/prompts/skillopt-sleep.md and invoke with
-# `/skillopt-sleep` in the Codex CLI. ($ARGUMENTS is the text after /skillopt-sleep.)
-
-Run the SkillOpt-Sleep offline self-evolution cycle. Action: $ARGUMENTS
-(empty → "status").
-
-Use the bundled runner via shell:
-
-bash "${SKILLOPT_SLEEP_REPO:?set SKILLOPT_SLEEP_REPO to the repo root}/plugins/run-sleep.sh" $ARGUMENTS --project "$(pwd)"
-
-Then:
-- For `run`/`dry-run`: read the staged `report.md` and show the held-out
-  baseline → candidate score and the proposed edits. `run` only stages a
-  proposal; nothing live changes until `adopt`.
-- For `adopt`: confirm which files were updated and that a backup was written.
-- Never edit the user's AGENTS.md / skills yourself; only `adopt` does that.
-
-Default backend is `mock` (no API spend). Add `--backend codex` for real
-improvement on the user's Codex budget.
@@ -1,49 +1,93 @@
 ---
 name: skillopt-sleep
-description: Nightly offline self-evolution for a Codex agent. Reviews past sessions, replays recurring tasks, and consolidates validated memory + skills behind a held-out gate. Use when the user wants Codex to learn from past usage, run a "sleep"/"dream" cycle, or schedule offline self-optimization.
+description: "Use when the user wants Codex to self-improve from past usage, asks about a nightly/offline 'sleep' or 'dream' cycle, wants Codex to review past sessions, learn preferences, consolidate memory/skills, run dry-run/run/adopt/status for SkillOpt-Sleep, or schedule offline self-optimization. Drives the skillopt_sleep engine: harvest past sessions -> mine recurring tasks -> replay offline -> consolidate validated memory + skills behind a held-out gate."
 ---

-# SkillOpt-Sleep (Codex skill)
+# SkillOpt-Sleep: offline self-evolution for a local Codex agent

-This skill drives the `skillopt_sleep` engine — an offline "sleep cycle" that
-makes a Codex agent better at the user's recurring work without retraining.
+SkillOpt-Sleep gives the user's Codex agent a sleep cycle. While the user is
+offline or on demand, it reviews past local sessions, re-runs recurring tasks
+on the user's own budget, and consolidates what it learns into memory and
+skills. It keeps only changes that pass a held-out validation gate, and live
+files change only after the user explicitly adopts a staged proposal. There is
+no model-weight training.

 ## When to use

-Trigger when the user wants to: review past sessions, learn their preferences,
-consolidate feedback into long-term memory/skills, run a nightly/offline
-self-improvement cycle, or adopt a staged proposal.
+Trigger when the user wants any of:

-## How to run it
+- Codex to learn from past sessions or get better the more they use it;
+- a nightly/scheduled or on-demand sleep/dream/offline self-improvement run;
+- to review past sessions and distill recurring tasks;
+- to consolidate feedback into memory or managed skills;
+- to run `status`, `harvest`, `dry-run`, `run`, or `adopt` for SkillOpt-Sleep.
+
+## The cycle
+
+1. **Harvest** - read local session transcripts according to the engine
+   configuration and normalize them into session digests.
+2. **Mine** - turn digests into recurring `TaskRecord`s with outcomes and
+   checkable references where possible.
+3. **Replay** - re-run mined tasks offline under the current skill and memory.
+4. **Consolidate** - reflect on failures and propose bounded edits.
+5. **Gate** - accept edits only when the held-out validation score improves.
+6. **Stage** - write the proposal under
+   `<project>/.skillopt-sleep/staging/<date>/`; nothing live changes.
+7. **Adopt** - only after explicit user approval, copy staged files over live
+   files with backups.
+
+## How to drive it

 Invoke the bundled runner via shell (Codex `exec` has shell access). The runner
-finds the engine and a Python ≥ 3.10 automatically:
+finds the engine and a Python >= 3.10 automatically.

 ```bash
 # point at the repo if it isn't auto-detected from CWD:
 export SKILLOPT_SLEEP_REPO=/path/to/SkillOpt-Sleep
-bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" <action> --project "$(pwd)"
+
+bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" status --project "$(pwd)"
+bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" harvest --project "$(pwd)"
+bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" dry-run --project "$(pwd)" --backend mock
+bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" run --project "$(pwd)" --backend codex
+bash "$SKILLOPT_SLEEP_REPO/plugins/run-sleep.sh" adopt --project "$(pwd)"
 ```

-`<action>` ∈ `status | dry-run | run | adopt | harvest`. Use `--backend codex`
-for real improvement on the user's own Codex budget (default `mock` = no spend).
+Actions are `status`, `harvest`, `dry-run`, `run`, and `adopt`.
+
+- Default backend is `mock`, which is deterministic and spends no API budget.
+- `--backend codex` uses the user's Codex budget for real improvement.
+- Keep `dry-run --backend mock` as the first smoke check unless the user
+  explicitly asked for a real optimization run.

 ## Steps

 1. Run the requested action; capture stdout.
-2. For `run`/`dry-run`: read the staged `report.md` it prints and show the user
-   the held-out baseline → candidate score and the exact proposed edits.
-3. `run` only **stages** a proposal under `<project>/.skillopt-sleep/staging/`;
-   nothing live changes until `adopt`. Offer `/skillopt-sleep adopt`.
-4. Never hand-edit the user's `AGENTS.md` / skills yourself — only `adopt` does,
-   and it backs up first.
+2. For `dry-run` and `run`, report the held-out baseline -> candidate score,
+   gate action, task count, session count, and exact proposed edits.
+3. If a staging directory is printed, read `report.md` before summarizing.
+4. `run` only stages a proposal; nothing live changes until `adopt`.
+5. Offer adoption only after the user has reviewed the staged proposal.
+6. Never hand-edit the user's `AGENTS.md`, memory, or skills as a substitute
+   for `adopt`; adoption is the safety boundary and writes backups first.
+
+## Hard rules
+
+- Harvest is read-only. Do not edit archived sessions or raw transcripts.
+- Keep raw secrets, credentials, private user data, and unsanitized transcript
+  contents out of messages, logs, generated artifacts, and commits.
+- Show validation evidence before recommending adoption.
+- Treat generated edits as proposals, not as source of truth.
+- Do not rely on deprecated custom prompts or `/sleep` slash commands for this
+  Codex integration. This skill is the entrypoint.

 ## Validate

 ```bash
+python -m skillopt_sleep dry-run --project "$(pwd)" --backend mock --json
 python -m skillopt_sleep.experiments.run_gbrain --backend codex \
   --seeds brief-writer --data-root /path/to/gbrain-evals/eval/data/skillopt-v1 \
   --nights 2 --limit-replay 3 --limit-holdout 3
 ```
-A deficient skill goes 0.00 → 1.00 on a held-out set; the optimizer's edits are
-gated on real-task performance.
+
+A deficient skill goes 0.00 -> 1.00 on a held-out set; the optimizer's edits
+are gated on real-task performance.
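The skill's "Gate" step accepts edits only when the held-out validation score improves. The sketch below shows one way to read that rule, assuming a strict comparison of two scores; the engine's real gate is not shown in this diff, and `gate_action` and its "accept"/"reject" return values are hypothetical.

```python
def gate_action(baseline: float, candidate: float) -> str:
    """Return "accept" only when the candidate strictly beats the baseline.

    Assumption: "improves" means strictly greater, so a tie is rejected and
    the live skill stays unchanged.
    """
    return "accept" if candidate > baseline else "reject"


if __name__ == "__main__":
    # The validation note above reports a deficient skill going 0.00 -> 1.00.
    print(gate_action(0.00, 1.00))
```

Under this reading an edit that merely matches the baseline is dropped, which is the conservative choice.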