From 7ae2d8766e8424bd7ccce82e21e577aea70bb0f7 Mon Sep 17 00:00:00 2001 From: Cuzyoung Date: Sun, 24 May 2026 19:19:12 +0000 Subject: [PATCH] docs: restore clean README with Install/Data/QuickStart/WebUI/Citation only Keep remote project page header (badges, video), replace body with our streamlined 5-section README focused on reproducibility. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- README.md | 407 +++++++++++++++--------------------------------------- 1 file changed, 109 insertions(+), 298 deletions(-) diff --git a/README.md b/README.md index 2d1edde..7266b05 100644 --- a/README.md +++ b/README.md @@ -13,105 +13,18 @@ --- -## What is SkillOpt? +## Install -SkillOpt is a framework for optimizing a natural-language **skill document** through iterative rollout, reflection, editing, and gated validation. - -It does **not** fine-tune model parameters. Instead, it treats the skill document as the optimization target: - -- The **target** model executes tasks with the current skill -- The **optimizer** model analyzes trajectories and proposes edits -- The framework merges, ranks, applies, and validates those edits -- Only validated skill updates are kept - -| Deep Learning | SkillOpt | -|---|---| -| Model weights | Skill document (Markdown) | -| Forward pass | Rollout (target executes tasks) | -| Loss computation | Reflect (optimizer analyzes trajectories) | -| Gradient | Edit patches (proposed skill improvements) | -| Gradient clipping | Edit ranking & selection (`learning_rate`) | -| Weight update | Patch application to skill document | -| Validation | Gated evaluation on held-out split | -| Learning rate schedule | `lr_scheduler`: cosine, linear decay | -| Epochs | Multi-epoch training with slow update & meta skill | - ---- - -## Method Overview - -### Optimization Target - -Each run maintains a mutable markdown skill document. The framework repeatedly improves that document instead of changing model parameters. 
- -This gives a training-style loop for prompt / policy optimization: - -1. Roll out the current skill on a batch of tasks. -2. Reflect on failures and successes. -3. Merge patch proposals into a coherent candidate update. -4. Rank and select a bounded number of edits. -5. Apply those edits to produce a candidate skill. -6. Validate the candidate skill on a held-out selection split. -7. Keep the update only if the gate accepts it. - -### Per-Step Pipeline - -Every training step executes the following pipeline in `skillopt/engine/trainer.py`: - -1. **Rollout** - The target model runs a batch of tasks using the current skill. - -2. **Reflect** - The optimizer analyzes minibatches of trajectories and emits raw patches. - Failure-driven and success-driven patches are tracked separately. - -3. **Aggregate** - Raw patches are merged hierarchically. Metadata such as `support_count` and `source_type` is carried into the merged patch so later ranking can use it. - -4. **Select** - The optimizer ranks the merged edit pool and keeps up to `edit_budget` edits. - -5. **Update** - The selected edits are applied to the skill document. The framework records an `edit_apply_report.json` so you can see which edits actually landed, which were skipped, and why. - -6. **Evaluate / Gate** - The candidate skill is evaluated on the selection split. A candidate update is accepted only if it improves over the current selection score; a new global best is tracked separately. - -### Within-Epoch Memory - -Inside an epoch, the trainer maintains a step buffer containing: - -- Compact failure-pattern summaries from previous steps -- Rejected edits and their score deltas - -That context is fed back into later reflection calls so the optimizer can avoid repeating ineffective edits and can focus on unsolved error patterns. 
- -### Epoch-Level Mechanisms - -#### Slow Update - -At the end of each epoch, `slow_update` compares the previous epoch's terminal skill and current epoch's terminal skill on a sampled train subset. It then writes longitudinal guidance into a protected slow-update region inside the skill document. - -This guidance is **not** blindly written through — it is converted into a candidate skill and sent through the same selection gate as step-level updates. - -#### Meta Skill - -`meta_skill` is optimizer-side cross-epoch memory. It does not directly edit the current skill. Instead, it writes a compact memory artifact describing longer-term patterns across adjacent epochs. That memory is loaded into later reflection / merge / ranking calls as extra context. - -#### Meta Reflect - -`meta_reflect` runs at epoch end over the step history of the current epoch. It looks at accepted and rejected directions from the whole epoch, proposes higher-level patch edits, applies them to a meta candidate, and then sends that candidate through the same selection gate. - ---- - -## Quick Start - -### Install +**Requirements:** Python 3.10+ ```bash -git clone https://github.com/AgenticOpt/SkillOpt.git +git clone https://github.com/microsoft/SkillOpt.git cd SkillOpt pip install -e . + +# For ALFWorld benchmark (optional): +pip install -e ".[alfworld]" +alfworld-download ``` ### Configure API Credentials @@ -122,13 +35,17 @@ cp .env.example .env source .env ``` -**Azure OpenAI** (API key or managed identity): +**Azure OpenAI** (recommended): ```bash export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/" +# Option 1: API key auth export AZURE_OPENAI_API_KEY="your-key" -# Or use managed identity: set azure_openai_auth_mode=managed_identity in config +# Option 2: Azure CLI auth (no API key needed) +export AZURE_OPENAI_AUTH_MODE="azure_cli" ``` +> **Note:** `AZURE_OPENAI_ENDPOINT` is always required. Without it, all LLM calls will fail. 
+ **OpenAI** directly: ```bash export OPENAI_API_KEY="sk-..." @@ -145,237 +62,139 @@ export QWEN_CHAT_BASE_URL="http://localhost:8000/v1" export QWEN_CHAT_MODEL="Qwen/Qwen3.5-4B" ``` -### Run Training - -```bash -python scripts/train.py --config configs/searchqa/default.yaml -``` - --- -## Configuration +## Data Preparation -SkillOpt uses a hierarchical YAML configuration system. Each benchmark config inherits from `configs/_base_/default.yaml`. +SkillOpt expects data in a **split directory** with `train/`, `val/`, `test/` subdirectories, each containing a JSON file (e.g., `items.json`). -### Configuration Structure - -```yaml -model: - optimizer_backend: openai_chat # openai_chat | claude_chat | qwen_chat - target_backend: openai_chat # openai_chat | claude_chat | codex_exec | qwen_chat - optimizer: gpt-5.5 # optimizer model deployment name - target: gpt-5.5 # target model deployment name - reasoning_effort: medium # low | medium | high - -train: - num_epochs: 4 - batch_size: 40 - seed: 42 - -gradient: - minibatch_size: 8 # trajectories per reflection call - analyst_workers: 16 # parallel reflection workers - use_deep_reflect: false # deep multi-turn probing - deep_reflect_failures: 4 - deep_reflect_successes: 2 - -optimizer: - learning_rate: 4 # max edits per step (edit_budget) - min_learning_rate: 2 # min edits for decay schedulers - lr_scheduler: cosine # constant | linear | cosine | autonomous - skill_update_mode: patch # patch | rewrite_from_suggestions | full_rewrite_minibatch - use_slow_update: true - use_meta_skill: true - use_meta_reflect: false - -evaluation: - use_gate: true # gated validation (always recommended) - -env: - name: "" # benchmark name - skill_init: "" # path to initial skill document - split_mode: ratio # ratio | split_dir - split_ratio: "2:1:7" # train:val:test +``` +data/my_split/ +├── train/items.json +├── val/items.json +└── test/items.json ``` -### CLI Overrides +Each JSON file is an array of task items. 
The required fields depend on the benchmark. For example, SearchQA items look like: -Override any config key from the command line: - -```bash -python scripts/train.py \ - --config configs/searchqa/default.yaml \ - --cfg-options model.optimizer_backend=openai_chat \ - model.target_backend=codex_exec \ - train.batch_size=40 \ - optimizer.learning_rate=4 - -# Legacy flat overrides also work for common keys: -python scripts/train.py \ - --config configs/searchqa/default.yaml \ - --backend azure_openai \ - --optimizer_model gpt-5.5 \ - --target_model gpt-5.5 \ - --reasoning_effort medium +```json +[ + { + "id": "unique_item_id", + "question": "Who wrote the novel ...", + "context": "[DOC] relevant passage text ...", + "answers": ["expected answer"] + } +] ``` ---- +See `skillopt/envs//dataloader.py` for the exact format each benchmark expects. -## Model Backends +> **Note:** Benchmark datasets are not included in this repository. Prepare your own data following the format above. -All model access goes through the unified backend router in `skillopt/model/`. 
- -| Backend | Use case | Config key | -|---|---|---| -| `openai_chat` | Azure OpenAI / OpenAI API | optimizer / target | -| `claude_chat` | Anthropic Claude | optimizer / target | -| `codex_exec` | Codex execution harness | target only | -| `qwen_chat` | Local Qwen via vLLM | optimizer / target | - -Separate optimizer/target endpoints are supported: - -```yaml -model: - optimizer_backend: openai_chat - target_backend: codex_exec - optimizer: gpt-5.5 - target: gpt-5.5-codex -``` - ---- - -## Data Splits - -SkillOpt supports two split modes: - -**Ratio split** — auto-generate from raw data: -```bash -python scripts/train.py \ - --config configs/searchqa/default.yaml \ - --split_mode ratio \ - --data_path /path/to/searchqa_data.json -``` - -**Pre-split directory** — consume prepared splits: -```bash -python scripts/train.py \ - --config configs/searchqa/default.yaml \ - --split_mode split_dir \ - --split_dir /path/to/searchqa_split -``` - ---- - -## Supported Benchmarks +### Supported Benchmarks | Benchmark | Type | Config | |---|---|---| | SearchQA | QA | `configs/searchqa/default.yaml` | -| SpreadsheetBench | Code generation | `configs/spreadsheetbench/default.yaml` | | ALFWorld | Embodied agent | `configs/alfworld/default.yaml` | | DocVQA | Document QA | `configs/docvqa/default.yaml` | -| OfficeQA | Tool-augmented QA | `configs/officeqa/default.yaml` | -| SealQA | Tool-augmented QA | `configs/sealqa/default.yaml` | -| BabyVision | Vision QA | `configs/babyvision/default.yaml` | | LiveMathematicianBench | Math | `configs/livemathematicianbench/default.yaml` | -| MathVerse | Multimodal math | `configs/mathverse/default.yaml` | -| MMRB | Multimodal reasoning | `configs/mmrb/default.yaml` | -| SWEBench | Software engineering | `configs/swebench/default.yaml` | +| SpreadsheetBench | Code generation | `configs/spreadsheetbench/default.yaml` | +| OfficeQA | Tool-augmented QA | `configs/officeqa/default.yaml` | --- -## Running Training +## Quick Start -Basic training: - 
-```bash -python scripts/train.py --config configs/searchqa/default.yaml -``` - -Exec harness (Codex target): +### Training ```bash +# Minimal example — train on SearchQA: python scripts/train.py \ - --config configs/searchqa/default.yaml \ - --optimizer_backend openai_chat \ - --target_backend codex_exec \ - --optimizer_model gpt-5.5 \ - --target_model gpt-5.5-codex \ - --use_deep_reflect true \ - --skill_update_mode rewrite_from_suggestions -``` + --config configs/searchqa/default.yaml \ + --split_dir /path/to/your/searchqa_split \ + --azure_openai_endpoint https://your-resource.openai.azure.com/ \ + --optimizer_model gpt-5.5 \ + --target_model gpt-5.5 -SWEBench: - -```bash +# Train on LiveMathematicianBench: python scripts/train.py \ - --config configs/swebench/default.yaml \ - --cfg-options env.dataset_name=lite env.split_ratio=2:1:7 + --config configs/livemathematicianbench/default.yaml \ + --split_dir /path/to/your/livemath_split \ + --azure_openai_endpoint https://your-resource.openai.azure.com/ \ + --optimizer_model gpt-5.5 \ + --target_model gpt-5.5 + +# Train on ALFWorld: +python scripts/train.py \ + --config configs/alfworld/default.yaml \ + --split_dir /path/to/your/alfworld_split \ + --azure_openai_endpoint https://your-resource.openai.azure.com/ \ + --optimizer_model gpt-5.5 \ + --target_model gpt-5.5 ``` +Key CLI arguments: + +| Argument | Description | Example | +|---|---|---| +| `--config` | Benchmark config YAML | `configs/searchqa/default.yaml` | +| `--split_dir` | Path to data split directory | `/path/to/split` | +| `--azure_openai_endpoint` | Azure OpenAI endpoint URL | `https://your-resource.openai.azure.com/` | +| `--optimizer_model` | Optimizer model deployment name | `gpt-5.5` | +| `--target_model` | Target model deployment name | `gpt-5.5` | +| `--num_epochs` | Number of training epochs | `4` | +| `--batch_size` | Batch size per step | `40` | +| `--workers` | Parallel rollout workers | `8` | +| `--out_root` | Output directory | 
`outputs/my_run` | + ### Eval Only -Evaluate a specific skill without training: +Evaluate a trained skill on specific data splits without training: ```bash +# Evaluate on test set only: python scripts/eval_only.py \ --config configs/searchqa/default.yaml \ - --skill skillopt/envs/searchqa/skills/initial.md + --skill outputs/my_run/best_skill.md \ + --split valid_unseen \ + --split_dir /path/to/searchqa_split \ + --azure_openai_endpoint https://your-resource.openai.azure.com/ + +# Evaluate on all splits (train + val + test): +python scripts/eval_only.py \ + --config configs/searchqa/default.yaml \ + --skill outputs/my_run/best_skill.md \ + --split all \ + --split_dir /path/to/searchqa_split \ + --azure_openai_endpoint https://your-resource.openai.azure.com/ ``` ---- +| Split | Description | +|---|---| +| `valid_unseen` | Test set | +| `valid_seen` | Validation set | +| `train` | Training set | +| `all` | All splits combined (default) | -## Output Structure +### Output Structure -Each run writes a structured output directory: +Each run writes to a structured output directory: ``` outputs// ├── config.json # Flattened runtime config -├── history.json # Per-step history records -├── runtime_state.json # Resume state (for auto-resume) -├── best_skill.md # Current best validated skill +├── history.json # Per-step training history +├── runtime_state.json # Resume checkpoint +├── best_skill.md # Best validated skill document ├── skills/skill_vXXXX.md # Skill snapshot per step -├── steps/step_XXXX/ # Per-step artifacts -│ ├── merged_patch.json -│ ├── ranked_edits.json -│ ├── candidate_skill.md -│ ├── edit_apply_report.json -│ ├── rewrite_result.json # when rewrite mode is enabled -│ └── selection_eval/ -├── slow_update/epoch_XX/ -├── meta_skill/epoch_XX/ -└── meta_reflect/epoch_XX/ +├── steps/step_XXXX/ # Per-step artifacts (patches, evals) +├── slow_update/epoch_XX/ # Slow update logs +└── meta_skill/epoch_XX/ # Meta skill logs ``` -### Resume Behavior - -The trainer 
resumes from `runtime_state.json` when present. That state tracks: - -- Last completed step -- Current skill path and score -- Best skill path and score -- Origin tags for current and best skill - ---- - -## Extending SkillOpt - -### Add a New Benchmark - -1. Create `skillopt/envs//` with: - - `adapter.py` — implements `EnvAdapter` - - `dataloader.py` — data loading logic - - `rollout.py` — target execution logic - - `skills/initial.md` — initial skill document -2. Add a config at `configs//default.yaml` -3. Register in `skillopt/envs/__init__.py` - -See `skillopt/envs/_template/` for a scaffold. - -### Add a New Model Backend - -Implement a backend in `skillopt/model/` following the interface in `skillopt/model/common.py`, then register it in `skillopt/model/router.py`. +Re-running the same command auto-resumes from the last completed step. --- @@ -388,34 +207,26 @@ pip install -e ".[webui]" python -m skillopt_webui.app ``` -Provides browser-based config selection, training launch, and real-time log monitoring. - ---- - -## Minimal Setup +| Flag | Default | Description | +|---|---|---| +| `--port` | 7860 | Server port | +| `--host` | `0.0.0.0` | Bind address | +| `--share` | off | Create a public Gradio share link | ```bash -conda create -n skillopt python=3.11 -conda activate skillopt -pip install -e . +# With public share link (useful for remote servers) +python -m skillopt_webui.app --share ``` -Depending on the benchmark, you may also need: - -```bash -pip install datasets gymnasium numpy -``` - -For SWEBench, you also need a working Docker environment plus the SWE-bench harness dependencies. - --- ## Citation ```bibtex @article{skillopt2026, - title={SkillOpt: Executive Strategy for Self-Evolving Agent Skills}, + title={SKILLOPT: Executive Strategy for Self-Evolving Agent Skills}, author={SkillOpt Team}, year={2026} } ``` +
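The split layout described under Data Preparation (`train/`, `val/`, `test/`, each holding a JSON array such as `items.json`) can be produced with a few lines of Python. The sketch below is illustrative only and is not part of the SkillOpt codebase. The helper name `write_split_dir` and the `REQUIRED_FIELDS` check are hypothetical, and the check assumes that all four fields shown in the SearchQA example are required. Other benchmarks may expect different fields, so the benchmark's `dataloader.py` remains the authority.

```python
import json
from pathlib import Path

# Assumption: the four fields from the README's SearchQA example are all
# required. Other benchmarks may need different fields.
REQUIRED_FIELDS = {"id", "question", "context", "answers"}


def write_split_dir(root, splits, filename="items.json"):
    """Write each split to <root>/<split>/<filename> as a JSON array.

    `splits` maps "train", "val" and "test" to lists of item dicts.
    Returns a dict mapping each split name to the file that was written.
    (Hypothetical helper, not a SkillOpt API.)
    """
    root = Path(root)
    written = {}
    for split in ("train", "val", "test"):
        items = splits[split]
        for item in items:
            missing = REQUIRED_FIELDS - set(item)
            if missing:
                raise ValueError(
                    f"{split} item {item.get('id')!r} is missing {sorted(missing)}"
                )
        split_dir = root / split
        split_dir.mkdir(parents=True, exist_ok=True)
        path = split_dir / filename
        path.write_text(
            json.dumps(items, indent=2, ensure_ascii=False), encoding="utf-8"
        )
        written[split] = path
    return written


if __name__ == "__main__":
    example = {
        "id": "unique_item_id",
        "question": "Who wrote the novel ...",
        "context": "[DOC] relevant passage text ...",
        "answers": ["expected answer"],
    }
    paths = write_split_dir(
        "data/my_split", {"train": [example], "val": [example], "test": [example]}
    )
    for name, path in paths.items():
        print(name, "->", path)
```

The resulting directory (here `data/my_split`) is what `--split_dir` would point at in the training and eval commands above.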