# SkillOpt: Executive Strategy for Self-Evolving Agent Skills

*Train agent skills like you train neural networks, with epochs, (mini-)batch sizes, learning rates, and validation gates, but without touching model weights.*

[![Project Page](https://img.shields.io/badge/Project%20Page-SkillOpt-8dbb3c)](https://microsoft.github.io/SkillOpt/)
[![Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b)](https://arxiv.org/abs/2605.23904)
[![Project Video](https://img.shields.io/badge/Project%20Video-Watch%20Demo-ff0000)](https://youtu.be/JUBMDTCiM0M)
[![PyPI](https://img.shields.io/badge/PyPI-skillopt-green.svg)](https://pypi.org/project/skillopt/)
[![Python 3.10+](https://img.shields.io/badge/Python-3.10%2B-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

> 📖 **For installation, data preparation, training/eval commands, the full configuration reference, and framework internals, see the [Documentation & Reproduction Guide](https://microsoft.github.io/SkillOpt/docs/guideline.html)** (rendered on GitHub Pages).

---

## News 🔥🔥🔥

- **[2026-07-02]** 🚀 **SkillOpt [v0.2.0](https://github.com/microsoft/SkillOpt/releases/tag/v0.2.0) is out on [PyPI](https://pypi.org/project/skillopt/)!** Headline feature: **SkillOpt-Sleep**, a nightly offline self-evolution engine (harvest → mine → replay → consolidate, all behind a held-out validation gate) with multi-objective reward, experience replay + dream rollouts, and long-term memory, now shipped as the `skillopt-sleep` CLI. This release also adds cross-tool backends and plugin shells for **Claude, Codex, Copilot, Devin, and OpenClaw**, SearchQA split materialization, Windows robustness, and hardened JSON parsing. See the [release notes](https://github.com/microsoft/SkillOpt/releases/tag/v0.2.0) for the full changelog and contributor acknowledgements.
- **[2026-06-15]** 😴 **SkillOpt-Sleep (preview)**: a nightly offline self-evolution companion for local coding agents (Claude Code / Codex / Copilot). It reviews past sessions, replays recurring tasks, and consolidates validated skills behind a held-out gate. See **[`docs/sleep/README.md`](docs/sleep/README.md)** for what it is, how to use it, and results.
- **[2026-06-03]** 🎉 **[gbrain](https://github.com/garrytan/gbrain), [gbrain-evals](https://github.com/garrytan/gbrain-evals/blob/main/docs/benchmarks/2026-06-03-skillopt.md), and [darwin-skill](https://github.com/alchaincyf/darwin-skill) have all integrated SkillOpt.**
- **[2026-06-02]** 🎉 **SkillOpt [v0.1.0](https://github.com/microsoft/SkillOpt/releases/tag/v0.1.0) is now available on [PyPI](https://pypi.org/project/skillopt/)!** Install with `pip install skillopt`.
  This initial release includes the full training loop (rollout → reflect → aggregate → select → update → evaluate), multi-backend support (OpenAI / Azure / Claude / Qwen / MiniMax), six built-in benchmarks, and a WebUI dashboard.

---

## Overview

Modern agent skills are usually hand-crafted, generated one-shot by a strong LLM, or evolved through loosely controlled self-revision. None of these approaches behaves like a deep-learning optimizer for the skill itself, and none reliably improves over its starting point under feedback.

**SkillOpt treats the skill document as the trainable state of a frozen agent**, and trains it with the discipline that makes weight-space optimization reproducible. A separate optimizer model turns scored rollouts into bounded add / delete / replace edits on a single skill document; a candidate edit is accepted only when it strictly improves a held-out validation score. A textual learning-rate budget, a rejected-edit buffer, and an epoch-wise slow / meta update make skill training stable while adding **zero inference-time model calls** at deployment. The deployed artifact is a compact `best_skill.md` (typically 300–2,000 tokens) that runs against the unchanged target model.

Across **six benchmarks, seven target models, and three execution harnesses** (direct chat, Codex CLI, Claude Code CLI), SkillOpt is best or tied-best on **all 52 evaluated (model, benchmark, harness) cells**, and on GPT-5.5 it lifts the average no-skill accuracy by **+23.5 points in direct chat, +24.8 inside the Codex agentic loop, and +19.1 inside Claude Code**. Optimized skill artifacts transfer across model scales, between Codex and Claude Code harnesses, and to nearby benchmarks without further optimization.

For the full method, ablations, and per-cell results, see the [paper](https://arxiv.org/abs/2605.23904); for a visual walkthrough of the loop, see the [project page](https://microsoft.github.io/SkillOpt/); for deeper API / backend / benchmark docs, see [`docs/`](docs/).

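
The acceptance rule described in the Overview (bounded add / delete / replace edits, a strict held-out validation gate, an edit budget, and a rejected-edit buffer) can be illustrated with a small toy sketch. This is not SkillOpt's implementation or API: `Edit`, `apply_edit`, `train_step`, and `toy_validate` are hypothetical names, the skill is modeled as a plain list of lines, and the "textual learning rate" is assumed here to mean a cap on accepted edits per step.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)  # frozen makes edits hashable, so they fit in a set
class Edit:
    """Hypothetical bounded edit on a skill document (a list of lines)."""
    op: str           # "add", "delete", or "replace"
    target: str = ""  # existing line to delete/replace (unused for "add")
    text: str = ""    # new line for "add"/"replace"


def apply_edit(skill: list[str], edit: Edit) -> list[str]:
    """Return a new skill with the edit applied; the input is not mutated."""
    if edit.op == "add":
        return skill + [edit.text]
    if edit.op == "delete":
        return [line for line in skill if line != edit.target]
    if edit.op == "replace":
        return [edit.text if line == edit.target else line for line in skill]
    raise ValueError(f"unknown edit op: {edit.op!r}")


def train_step(
    skill: list[str],
    proposals: Iterable[Edit],
    validate: Callable[[list[str]], float],
    edit_budget: int,
    rejected: set[Edit],
) -> tuple[list[str], float]:
    """Accept an edit only if it strictly improves the validation score."""
    best_score = validate(skill)
    accepted = 0
    for edit in proposals:
        if accepted >= edit_budget:  # assumed meaning of the learning-rate budget
            break
        if edit in rejected:         # skip edits that already failed the gate
            continue
        candidate = apply_edit(skill, edit)
        score = validate(candidate)
        if score > best_score:       # strict improvement required
            skill, best_score = candidate, score
            accepted += 1
        else:
            rejected.add(edit)
    return skill, best_score


# Toy validator (assumption): fraction of two "required" rules present.
REQUIRED = {"cite sources", "answer concisely"}


def toy_validate(skill: list[str]) -> float:
    return len(REQUIRED & set(skill)) / len(REQUIRED)


rejected: set[Edit] = set()
proposals = [
    Edit("add", text="cite sources"),
    Edit("add", text="use emojis"),       # no score gain, so it is rejected
    Edit("add", text="answer concisely"),
    Edit("delete", target="be helpful"),  # never tried: budget is exhausted
]
new_skill, score = train_step(
    ["be helpful"], proposals, toy_validate, edit_budget=2, rejected=rejected
)
print(new_skill, score)
```

In a real run the validator would score rollouts of the frozen target model on a held-out split rather than checking for fixed strings, and proposals would come from the optimizer model.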
## 🎬 Demo Video

https://github.com/user-attachments/assets/eb12d3bc-371c-467f-904d-91b61f339ed7

---

## Extensibility & WebUI

### Adding a new backend

A backend = a chat / exec target (e.g. `openai_chat`, `claude_chat`, `qwen_chat`, `minimax_chat`, `codex_exec`, `claude_code_exec`). See [`docs/guide/new-backend.md`](docs/guide/new-backend.md) for the full contract; in short you add a `skillopt/model/_backend.py` module, register it in `skillopt/model/common.py` + `backend_config.py`, and wire it through the router in `skillopt/model/__init__.py`. `qwen_backend.py` and `minimax_backend.py` are good templates.

### Adding a new benchmark

A benchmark = a `skillopt/envs//` package with a `dataloader.py`, a `rollout.py`, and an `initial.md` seed skill. See [`docs/guide/new-benchmark.md`](docs/guide/new-benchmark.md) for the full contract; the simplest reference is `skillopt/envs/searchqa/`.

### WebUI

Launch the monitoring dashboard (optional):

```bash
pip install -e ".[webui]"
python -m skillopt_webui.app
```

| Flag | Default | Description |
|---|---|---|
| `--port` | 7860 | Server port |
| `--host` | `0.0.0.0` | Bind address |
| `--share` | off | Create a public Gradio share link |

---

## Citation

```bibtex
@article{yang2026skillopt,
  title={Skillopt: Executive strategy for self-evolving agent skills},
  author={Yang, Yifan and Gong, Ziyang and Huang, Weiquan and Yang, Qihao and Zhou, Ziwei and Huang, Zisu and Li, Yan and Gao, Xuemei and Dai, Qi and Liu, Bei and others},
  journal={arXiv preprint arXiv:2605.23904},
  year={2026}
}
```