mirror of
https://github.com/microsoft/SkillOpt.git
synced 2026-07-03 14:02:58 +08:00
Highlights since v0.1.0: - feat: SkillOpt-Sleep engine — nightly offline self-evolution (harvest -> mine -> replay -> consolidate behind a validation gate), with multi-objective reward, experience replay + dream rollouts, slow-update long-term memory, and secret redaction in cycle diagnostics. Shipped as the `skillopt-sleep` CLI. - feat: cross-tool backends & plugin shells — Claude, Codex (+Desktop harvest), Copilot, Devin, and OpenClaw. - feat: SearchQA split materialization + rollout fail-fast. - fix: Windows robustness for claude/codex backends, hardened JSON fallback, Qwen timeout/thinking gating, Codex failure surfacing. Packaging: - Bump pyproject / skillopt / skillopt_sleep to 0.2.0. - Restore skillopt_webui to the packaged wheel. See CHANGELOG.md for the full changelog and contributor acknowledgements. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
108 lines
7.0 KiB
Markdown
108 lines
7.0 KiB
Markdown
# SkillOpt: Executive Strategy for Self-Evolving Agent Skills
|
||
|
||
*Train agent skills like you train neural networks — with epochs, (mini-)batchsize, learning rates, and validation gates — but without touching model weights.*
|
||
|
||
[](https://microsoft.github.io/SkillOpt/) [](https://arxiv.org/abs/2605.23904) [](https://youtu.be/JUBMDTCiM0M) [](https://pypi.org/project/skillopt/) [](https://www.python.org/) [](LICENSE)
|
||
|
||
<p align="center">
|
||
<a href="https://trendshift.io/repositories/38498?utm_source=trendshift-badge&utm_medium=badge&utm_campaign=badge-trendshift-38498" target="_blank" rel="noopener noreferrer"><img src="https://trendshift.io/api/badge/trendshift/repositories/38498/daily?language=Python" alt="microsoft%2FSkillOpt | Trendshift" width="250" height="55"/></a>
|
||
<a href="https://trendshift.io/repositories/38498?utm_source=trendshift-badge&utm_medium=badge&utm_campaign=badge-trendshift-38498" target="_blank" rel="noopener noreferrer"><img src="https://trendshift.io/api/badge/trendshift/repositories/38498/weekly?language=Python" alt="microsoft%2FSkillOpt | Trendshift" width="250" height="55"/></a>
|
||
</p>
|
||
|
||
> 📖 **For installation, data preparation, training/eval commands, the full configuration reference, and framework internals, see the [Documentation & Reproduction Guide](https://microsoft.github.io/SkillOpt/docs/guideline.html)** (rendered on GitHub Pages).
|
||
|
||
---
|
||
|
||
## News 🔥🔥🔥
|
||
- **[2026-07-02]** 🚀 **SkillOpt [v0.2.0](https://github.com/microsoft/SkillOpt/releases/tag/v0.2.0) is out on [PyPI](https://pypi.org/project/skillopt/)!** Headline feature: **SkillOpt-Sleep**, a nightly offline self-evolution engine (harvest → mine → replay → consolidate, all behind a held-out validation gate) with multi-objective reward, experience replay + dream rollouts, and long-term memory — now shipped as the `skillopt-sleep` CLI. This release also adds cross-tool backends and plugin shells for **Claude, Codex, Copilot, Devin, and OpenClaw**, SearchQA split materialization, Windows robustness, and hardened JSON parsing. See the [release notes](https://github.com/microsoft/SkillOpt/releases/tag/v0.2.0) for the full changelog and contributor acknowledgements.
|
||
- **[2026-06-15]** 😴 **SkillOpt-Sleep (preview)** — a nightly offline self-evolution companion for local coding agents (Claude Code / Codex / Copilot): review past sessions, replay recurring tasks, and consolidate validated skills behind a held-out gate. See **[`docs/sleep/README.md`](docs/sleep/README.md)** for what it is, how to use it, and results.
|
||
- **[2026-06-03]** 🎉 **[gbrain](https://github.com/garrytan/gbrain), [gbrain-evals](https://github.com/garrytan/gbrain-evals/blob/main/docs/benchmarks/2026-06-03-skillopt.md), and [darwin-skill](https://github.com/alchaincyf/darwin-skill) have all integrated SkillOpt.**
|
||
- **[2026-06-02]** 🎉 **SkillOpt [v0.1.0](https://github.com/microsoft/SkillOpt/releases/tag/v0.1.0) is now available on [PyPI](https://pypi.org/project/skillopt/)!** Install with `pip install skillopt`. This initial release includes the full training loop (rollout → reflect → aggregate → select → update → evaluate), multi-backend support (OpenAI / Azure / Claude / Qwen / MiniMax), six built-in benchmarks, and WebUI dashboard.
|
||
|
||
---
|
||
|
||
## Overview
|
||
|
||
Modern agent skills are usually hand-crafted, generated one-shot by a strong
|
||
LLM, or evolved through loosely controlled self-revision — none of which
|
||
behaves like a deep-learning optimizer for the skill itself, and none of
|
||
which reliably improves over its starting point under feedback.
|
||
|
||
**SkillOpt treats the skill document as the trainable state of a frozen
|
||
agent**, and trains it with the discipline that makes weight-space
|
||
optimization reproducible. A separate optimizer model turns scored rollouts
|
||
into bounded add / delete / replace edits on a single skill document; a
|
||
candidate edit is accepted only when it strictly improves a held-out
|
||
validation score. A textual learning-rate budget, a rejected-edit buffer,
|
||
and an epoch-wise slow / meta update make skill training stable while
|
||
adding **zero inference-time model calls** at deployment.
|
||
|
||
The deployed artifact is a compact `best_skill.md` (typically 300–2,000
|
||
tokens) that runs against the unchanged target model. Across **six
|
||
benchmarks, seven target models, and three execution harnesses** (direct
|
||
chat, Codex CLI, Claude Code CLI), SkillOpt is best or tied-best on **all
|
||
52 evaluated (model, benchmark, harness) cells** and on GPT-5.5 lifts the
|
||
average no-skill accuracy by **+23.5 points in direct chat, +24.8 inside
|
||
the Codex agentic loop, and +19.1 inside Claude Code**. Optimized skill
|
||
artifacts transfer across model scales, between Codex and Claude Code
|
||
harnesses, and to nearby benchmarks without further optimization.
|
||
|
||
For the full method, ablations, and per-cell results see the [paper](https://arxiv.org/abs/2605.23904); for a visual walkthrough of the loop see the [project page](https://microsoft.github.io/SkillOpt/); for deeper API / backend / benchmark docs see [`docs/`](docs/).
|
||
|
||
## 🎬 Demo Video
|
||
|
||
https://github.com/user-attachments/assets/eb12d3bc-371c-467f-904d-91b61f339ed7
|
||
|
||
<p align="center">
|
||
<a href="https://youtu.be/JUBMDTCiM0M"><b>▶ Watch the full demo on YouTube</b></a>
|
||
</p>
|
||
|
||
---
|
||
|
||
## Extensibility & WebUI
|
||
|
||
### Adding a new backend
|
||
|
||
A backend = a chat / exec target (e.g. `openai_chat`, `claude_chat`,
|
||
`qwen_chat`, `minimax_chat`, `codex_exec`, `claude_code_exec`). See
|
||
[`docs/guide/new-backend.md`](docs/guide/new-backend.md) for the full
|
||
contract; in short you add a `skillopt/model/<name>_backend.py` module,
|
||
register it in `skillopt/model/common.py` + `backend_config.py`, and wire
|
||
it through the router in `skillopt/model/__init__.py`. `qwen_backend.py`
|
||
and `minimax_backend.py` are good templates.
|
||
|
||
### Adding a new benchmark
|
||
|
||
A benchmark = a `skillopt/envs/<name>/` package with a `dataloader.py`, a
|
||
`rollout.py`, and an `initial.md` seed skill. See
|
||
[`docs/guide/new-benchmark.md`](docs/guide/new-benchmark.md) for the full
|
||
contract; the simplest reference is `skillopt/envs/searchqa/`.
|
||
|
||
### WebUI
|
||
|
||
Launch the monitoring dashboard (optional):
|
||
|
||
```bash
|
||
pip install -e ".[webui]"
|
||
python -m skillopt_webui.app
|
||
```
|
||
|
||
| Flag | Default | Description |
|
||
|---|---|---|
|
||
| `--port` | 7860 | Server port |
|
||
| `--host` | `0.0.0.0` | Bind address |
|
||
| `--share` | off | Create a public Gradio share link |
|
||
|
||
---
|
||
|
||
## Citation
|
||
|
||
```bibtex
|
||
@article{yang2026skillopt,
|
||
title={Skillopt: Executive strategy for self-evolving agent skills},
|
||
author={Yang, Yifan and Gong, Ziyang and Huang, Weiquan and Yang, Qihao and Zhou, Ziwei and Huang, Zisu and Li, Yan and Gao, Xuemei and Dai, Qi and Liu, Bei and others},
|
||
journal={arXiv preprint arXiv:2605.23904},
|
||
year={2026}
|
||
}
|
||
```
|