chore(release): v0.2.0

Highlights since v0.1.0:
- feat: SkillOpt-Sleep engine, nightly offline self-evolution
  (harvest -> mine -> replay -> consolidate behind a validation gate),
  with multi-objective reward, experience replay + dream rollouts,
  slow-update long-term memory, and secret redaction in cycle
  diagnostics. Shipped as the `skillopt-sleep` CLI.
- feat: cross-tool backends & plugin shells for Claude, Codex
  (+Desktop harvest), Copilot, Devin, and OpenClaw.
- feat: SearchQA split materialization + rollout fail-fast.
- fix: Windows robustness for claude/codex backends, hardened JSON
  fallback, Qwen timeout/thinking gating, Codex failure surfacing.

Packaging:
- Bump pyproject / skillopt / skillopt_sleep to 0.2.0.
- Restore skillopt_webui to the packaged wheel.

See CHANGELOG.md for the full changelog and contributor acknowledgements.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-03 14:02:58 +08:00 · 2026-07-02 22:11:10 +08:00
parent 5487e2c426
commit e4ea6a6771
6 changed files with 122 additions and 6 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -0,0 +1,100 @@
+# Changelog
+
+All notable changes to SkillOpt are documented here. This project adheres to
+[Semantic Versioning](https://semver.org/) and the format is based on
+[Keep a Changelog](https://keepachangelog.com/).
+
+## [0.2.0] — 2026-07-02
+
+The headline of this release is **SkillOpt-Sleep**: a nightly offline
+self-evolution engine that harvests a coding agent's real session
+transcripts, mines recurring tasks, replays them offline, and consolidates
+short-term experience into long-term memory and skills — all behind the same
+held-out validation gate that keeps SkillOpt training honest. It ships as a
+decoupled top-level package (`skillopt_sleep/`, zero dependency on the
+research code) and as the new `skillopt-sleep` CLI.
+
+### Added
+- **SkillOpt-Sleep engine** — nightly offline self-evolution cycle
+  (harvest → mine → replay → consolidate) behind a validation gate, exposed
+  as the `skillopt-sleep` console script and `python -m skillopt_sleep`.
+  - Multi-objective reward (accuracy / tokens / latency) with user preferences.
+  - Multi-rollout contrastive reflection under a token/time budget.
+  - Experience replay + controllable dream rollouts (opt-in).
+  - Slow-update long-term memory field (runs even with the gate off).
+  - 3-way train/val/test split with `gate_mode on|off`.
+  - Verifier-discipline validation gate, with a stress-test suite
+    (thanks @Tanmay9223, #87).
+- **Cross-tool backends & plugin shells** for Claude Code, Codex, Copilot,
+  Devin, and OpenClaw:
+  - Codex Desktop transcript harvesting, skill-first Codex integration, and a
+    reviewed task-file flow (thanks @Kirchberg, #48, #49, #60).
+  - GitHub Copilot backend (`CopilotCliBackend`) + research-engine MCP plugin
+    (thanks @Dongbumlee, #50).
+  - Devin plugin: MCP server + ATIF-v1.7 harvest (thanks @xerxes-y, #88).
+  - OpenClaw shell for SkillOpt-Sleep (thanks @Elzlxx, #59).
+- **SearchQA** split materialization helper and fail-fast on systemic rollout
+  failures, with a `searchqa` install extra (thanks @summerview1997,
+  #63, #64, #65).
+- WebUI environment loading and backend preflight (thanks @summerview1997, #63).
+
+### Changed
+- Decoupled the Sleep engine into a standalone top-level `skillopt_sleep/`
+  package with zero dependency on the research code.
+- Made `EnvAdapter.reflect` a shared default so reflect kwargs are no longer
+  dropped (thanks @imshunsuke, #44).
+- English-only pass across the engine, plugins, and docs.
+
+### Fixed
+- Windows robustness for the Claude/Codex backends, plus a hardened JSON
+  fallback path (thanks @Yif-Yang, #79).
+- Reject prose pseudo-JSON wrapped in single quotes/backticks (#82).
+- Surface Codex auth/model/version failures instead of silently scoring 0
+  (thanks @dmmdea, #92).
+- Redact secrets before persisting cycle diagnostics.
+- Configure the `qwen_chat`/`minimax` backends so local LLM endpoints work
+  (thanks @imrehg, #85).
+- Forward the Qwen target timeout and gate `enable_thinking` for vLLM targets
+  (thanks @mvanhorn, #40).
+- Make `--bare` conditional on `ANTHROPIC_API_KEY` (#68), add a
+  `SKILLOPT_SLEEP_PYTHON` override with a lookback-hours first-run fallback
+  (#74), and fix ALFWorld gamefile paths relative to `ALFWORLD_DATA`.
+
+### Packaging
+- Bump the `skillopt` version in `pyproject.toml`, `skillopt.__version__`, and
+  `skillopt_sleep.__version__` to `0.2.0`.
+- Restore `skillopt_webui` to the built wheel (it was dropped when the
+  `packages.find` include list was made explicit).
+- Add the `searchqa` extra and include `json_repair` in the `claude`, `qwen`,
+  and `all` extras.
+
+### Acknowledgements 🙏
+v0.2.0 landed thanks to our community contributors — thank you!
+
+- @Kirchberg — Codex Desktop harvesting, skill-first Codex integration,
+  reviewed task-file flow (#48, #49, #60)
+- @Dongbumlee — GitHub Copilot backend + research-engine MCP plugin (#50)
+- @summerview1997 — SearchQA materialization, rollout fail-fast, WebUI
+  preflight (#63, #64, #65)
+- @xerxes-y — Devin plugin: MCP server + ATIF-v1.7 harvest (#88)
+- @Elzlxx — OpenClaw shell for SkillOpt-Sleep (#59)
+- @imshunsuke — shared `EnvAdapter.reflect` default + docs fixes (#43, #44)
+- @mvanhorn — Qwen timeout forwarding + `enable_thinking` gating (#40)
+- @dmmdea — surface Codex auth/model/version failures (#92)
+- @Tanmay9223 — verifier-discipline stress test (#87)
+- @imrehg — `configure_qwen_chat` for local LLM endpoints (#85)
+- @samuelgoofus-boop — community contributions
+
+Special thanks to @Yif-Yang for driving the SkillOpt-Sleep engine.
+
+**Full changelog:** https://github.com/microsoft/SkillOpt/compare/v0.1.0...v0.2.0
+
+## [0.1.0] — 2026-06-02
+
+Initial public release: the full training loop (rollout → reflect →
+aggregate → select → update → evaluate), multi-backend support
+(OpenAI / Azure / Claude / Qwen / MiniMax), six built-in benchmarks, and the
+WebUI dashboard.
+
+[0.2.0]: https://github.com/microsoft/SkillOpt/releases/tag/v0.2.0
+[0.1.0]: https://github.com/microsoft/SkillOpt/releases/tag/v0.1.0
--- a/README.md
+++ b/README.md
@@ -14,6 +14,7 @@
 ---

 ## News 🔥🔥🔥
+- **[2026-07-02]** 🚀 **SkillOpt [v0.2.0](https://github.com/microsoft/SkillOpt/releases/tag/v0.2.0) is out on [PyPI](https://pypi.org/project/skillopt/)!** Headline feature: **SkillOpt-Sleep**, a nightly offline self-evolution engine (harvest → mine → replay → consolidate, all behind a held-out validation gate) with multi-objective reward, experience replay + dream rollouts, and long-term memory — now shipped as the `skillopt-sleep` CLI. This release also adds cross-tool backends and plugin shells for **Claude, Codex, Copilot, Devin, and OpenClaw**, SearchQA split materialization, Windows robustness, and hardened JSON parsing. See the [release notes](https://github.com/microsoft/SkillOpt/releases/tag/v0.2.0) for the full changelog and contributor acknowledgements.
 - **[2026-06-15]** 😴 **SkillOpt-Sleep (preview)** — a nightly offline self-evolution companion for local coding agents (Claude Code / Codex / Copilot): review past sessions, replay recurring tasks, and consolidate validated skills behind a held-out gate. See **[`docs/sleep/README.md`](docs/sleep/README.md)** for what it is, how to use it, and results.
 - **[2026-06-03]** 🎉 **[gbrain](https://github.com/garrytan/gbrain), [gbrain-evals](https://github.com/garrytan/gbrain-evals/blob/main/docs/benchmarks/2026-06-03-skillopt.md), and [darwin-skill](https://github.com/alchaincyf/darwin-skill) have all integrated SkillOpt.**
 - **[2026-06-02]** 🎉 **SkillOpt [v0.1.0](https://github.com/microsoft/SkillOpt/releases/tag/v0.1.0) is now available on [PyPI](https://pypi.org/project/skillopt/)!** Install with `pip install skillopt`. This initial release includes the full training loop (rollout → reflect → aggregate → select → update → evaluate), multi-backend support (OpenAI / Azure / Claude / Qwen / MiniMax), six built-in benchmarks, and WebUI dashboard.
--- a/docs/sleep/README.md
+++ b/docs/sleep/README.md
@@ -28,6 +28,20 @@ experience → long-term competence).

 ## How to use it

+### Quickest path: the `skillopt-sleep` CLI (pip)
+
+```bash
+pip install skillopt        # installs the engine + the `skillopt-sleep` command
+skillopt-sleep dry-run      # harvest + mine + replay, report only (changes nothing)
+skillopt-sleep run          # a full nightly cycle; the proposal is staged for review
+skillopt-sleep status       # show state + the latest staged proposal
+skillopt-sleep adopt        # apply the latest staged proposal
+skillopt-sleep schedule     # install a nightly cron entry for this project
+```
+
+The per-agent plugin shells below (Claude Code / Codex / Copilot) still come from the
+repo; the CLI above is the standalone, pip-only way to run a cycle.
+
 One engine, thin per-agent shells (see [`plugins/`](../../plugins)):

 | Platform | Folder | Install |
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "skillopt"
-version = "0.1.0"
+version = "0.2.0"
 description = "SkillOpt: Agentic Skill Optimization via Reflective Training Loops"
 readme = "README.md"
 license = {text = "MIT"}
@@ -68,9 +68,10 @@ Repository = "https://github.com/microsoft/SkillOpt"
 Issues = "https://github.com/microsoft/SkillOpt/issues"

 [tool.setuptools.packages.find]
-# skillopt* = the research package; skillopt_sleep = the open-source Sleep tool
-# (decoupled, zero dependency on the research code).
-include = ["skillopt", "skillopt.*", "skillopt_sleep", "skillopt_sleep.*", "scripts*"]
+# skillopt*      = the research package
+# skillopt_sleep = the open-source Sleep tool (decoupled, zero research dep)
+# skillopt_webui = the Gradio dashboard (installed via the `webui` extra)
+include = ["skillopt", "skillopt.*", "skillopt_sleep", "skillopt_sleep.*", "skillopt_webui", "skillopt_webui.*", "scripts*"]

 [tool.ruff]
 line-length = 120
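
Reviewer note on the `packages.find` change above: setuptools matches each discovered package name against the `include` patterns with shell-style wildcards, which is why the old list silently dropped `skillopt_webui`. The sketch below approximates that matching with `fnmatch.fnmatchcase`; it is an illustration under that assumption, not setuptools' own code, and the helper name `is_included` and the submodule name `skillopt_webui.app` are hypothetical.

```python
from fnmatch import fnmatchcase

# Include lists copied from the old and new sides of the pyproject.toml hunk.
OLD_INCLUDE = ["skillopt", "skillopt.*", "skillopt_sleep", "skillopt_sleep.*", "scripts*"]
NEW_INCLUDE = [
    "skillopt", "skillopt.*",
    "skillopt_sleep", "skillopt_sleep.*",
    "skillopt_webui", "skillopt_webui.*",
    "scripts*",
]


def is_included(package: str, patterns: list[str]) -> bool:
    """Return True if the dotted package name matches any include pattern.

    Hypothetical helper: approximates setuptools' include filter, which
    compares package names to patterns with case-sensitive fnmatch rules.
    """
    return any(fnmatchcase(package, pattern) for pattern in patterns)


def dropped_packages(packages: list[str], patterns: list[str]) -> list[str]:
    """List the packages that no pattern would pick up."""
    return [name for name in packages if not is_included(name, patterns)]
```

With the old list, `dropped_packages(["skillopt", "skillopt_sleep", "skillopt_webui"], OLD_INCLUDE)` reports `skillopt_webui`, because neither `skillopt` (exact match only) nor `skillopt.*` (requires a dot after `skillopt`) covers it; with the new list nothing is dropped.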
--- a/skillopt/__init__.py
+++ b/skillopt/__init__.py
@@ -12,7 +12,7 @@ Pipeline stages:
  6. Evaluate  — validate candidate skill, accept/reject
 """

-__version__ = "0.1.0"
+__version__ = "0.2.0"

 from skillopt.types import (  # noqa: F401
    BatchSpec,
--- a/skillopt_sleep/__init__.py
+++ b/skillopt_sleep/__init__.py
@@ -17,4 +17,4 @@ Public entry points:
 from __future__ import annotations

 __all__ = ["__version__"]
-__version__ = "0.1.0"
+__version__ = "0.2.0"