mirror of https://github.com/microsoft/SkillOpt.git synced 2026-07-03 14:02:58 +08:00


Cuzyoung 4a1b984d87 refactor: rename teacher/student to optimizer/target, remove best skills, fix slow update

- Rename teacher -> optimizer, student -> target across all code, configs, docs, prompts
- CLI: --teacher_model -> --optimizer_model, --student_model -> --target_model
- Remove best_skill files, keep only initial skills
- Fix slow update gate (force write into skill)
- Fix SLOW_UPDATE marker stripping
- Remove deep_reflect and meta_reflect mechanisms
- Update .env.example with export prefix and azure_cli docs
- Add endpoint empty validation in azure_openai.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-05-24 19:15:10 +00:00

4.5 KiB


# SkillOpt

Train Agent Skills Like Neural Networks

Optimize natural-language skill documents through iterative rollout, reflection, and gated validation, using the machinery of neural-network training (epochs, learning rates, and validation gates) without touching model weights.

Get Started · [View on GitHub](https://github.com/microsoft/SkillOpt)

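## How It Works

The training loop cycles through six stages:

1. 🎯 **Rollout**: the target executes tasks.
2. 🔍 **Reflect**: the optimizer analyzes the trajectories.
3. 🔗 **Aggregate**: edit patches are merged.
4. ✂️ **Select**: edits are ranked and clipped.
5. 📝 **Update**: the selected edits are applied to the skill document.
6. 🚦 **Gate**: the updated skill is validated and accepted or rejected.

At each epoch boundary, two further mechanisms run: 🔄 **Slow Update** and 🧠 **Meta Skill**.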
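The six stages can be read as a single training step. The following is a toy, runnable model of that control flow under stated assumptions, not SkillOpt's implementation: LLM calls are replaced by plain Python callables, an edit patch is reduced to one line appended to the skill, aggregation sums the scores of duplicate edits, selection keeps the top `learning_rate` edits, and the gate accepts the candidate only on a strict improvement. Every name here (`Edit`, `aggregate`, `select`, `update`, `train_step`, the `toy_*` stand-ins) is hypothetical.

```python
# Toy sketch of one SkillOpt-style training step. All names are hypothetical;
# LLM calls are replaced by plain Python callables so the control flow runs locally.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Edit:
    text: str     # hypothetical patch: one line to add to the skill document
    score: float  # how strongly the optimizer recommends this edit


def aggregate(patch_lists: List[List[Edit]]) -> List[Edit]:
    """Aggregate: merge patches from all trajectories, summing duplicate scores."""
    merged: Dict[str, float] = {}
    for patches in patch_lists:
        for edit in patches:
            merged[edit.text] = merged.get(edit.text, 0.0) + edit.score
    return [Edit(text, score) for text, score in merged.items()]


def select(edits: List[Edit], learning_rate: int) -> List[Edit]:
    """Select: rank by score and keep at most `learning_rate` edits (the clip)."""
    return sorted(edits, key=lambda e: e.score, reverse=True)[:learning_rate]


def update(skill: str, edits: List[Edit]) -> str:
    """Update: apply the selected edits by appending each as a bullet line."""
    lines = [skill.rstrip("\n")] + [f"- {e.text}" for e in edits]
    return "\n".join(lines) + "\n"


def train_step(
    skill: str,
    tasks: List[str],
    rollout: Callable[[str, str], str],
    reflect: Callable[[str], List[Edit]],
    validate: Callable[[str], float],
    learning_rate: int,
) -> str:
    """Run one step and return whichever skill passes the gate."""
    trajectories = [rollout(skill, task) for task in tasks]  # Rollout
    patch_lists = [reflect(t) for t in trajectories]         # Reflect
    edits = select(aggregate(patch_lists), learning_rate)    # Aggregate + Select
    candidate = update(skill, edits)                         # Update
    # Gate (assumed rule): accept only a strict improvement on the selection split.
    return candidate if validate(candidate) > validate(skill) else skill


# Toy stand-ins for the target, the optimizer, and the selection-split scorer.
def toy_rollout(skill: str, task: str) -> str:
    return f"{task}: {'ok' if 'check units' in skill else 'failed'}"


def toy_reflect(trajectory: str) -> List[Edit]:
    if trajectory.endswith("failed"):
        return [Edit("check units", 1.0), Edit("be concise", 0.2)]
    return []


def toy_validate(skill: str) -> float:
    return 1.0 if "check units" in skill else 0.0


new_skill = train_step("# Skill\n", ["t1", "t2"], toy_rollout, toy_reflect,
                       toy_validate, learning_rate=1)
# new_skill == "# Skill\n- check units\n"
```

With `learning_rate=1`, only the higher-scoring "check units" edit survives the clip; "be concise" is proposed but dropped. Slow update and meta-skill memory at epoch boundaries are deliberately left out of this sketch.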

## Deep Learning Analogy

SkillOpt brings the familiar deep-learning training paradigm to agentic prompt optimization:

| Deep Learning | SkillOpt |
|---|---|
| Model weights | Skill document (Markdown) |
| Forward pass | Rollout (target executes tasks) |
| Loss / gradient | Reflect (optimizer produces edit patches) |
| Gradient clipping | Edit selection (`learning_rate` = max edits) |
| SGD step | Patch application to the skill document |
| Validation set | Gated evaluation on the selection split |
| LR schedule | `lr_scheduler`: cosine, linear, constant |
| Epochs | Multi-epoch with slow update & meta skill memory |
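Since `learning_rate` caps the number of edits, an `lr_scheduler` can be read as a schedule on that edit budget. Below is one plausible way such a schedule could work, offered as an assumption rather than SkillOpt's code: it applies the standard constant, linear, and cosine decay shapes from deep learning to the edit cap. The function name `scheduled_max_edits`, the `min_edits` floor, and the rounding rule are all hypothetical.

```python
# Hypothetical sketch: decay a maximum-edits budget with a standard LR-schedule shape.
import math


def scheduled_max_edits(base_lr: int, step: int, total_steps: int,
                        scheduler: str = "constant", min_edits: int = 1) -> int:
    """Return the edit cap for `step`, never dropping below `min_edits`."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(step / total_steps, 1.0)  # fraction of training completed
    if scheduler == "constant":
        factor = 1.0
    elif scheduler == "linear":
        factor = 1.0 - progress
    elif scheduler == "cosine":
        factor = 0.5 * (1.0 + math.cos(math.pi * progress))
    else:
        raise ValueError(f"unknown scheduler: {scheduler!r}")
    return max(min_edits, round(base_lr * factor))


cosine_caps = [scheduled_max_edits(8, s, 10, "cosine") for s in (0, 5, 10)]
# cosine_caps == [8, 4, 1]
```

The floor keeps at least one edit available late in training; whether the real scheduler clamps, rounds, or restarts per epoch is not specified on this page.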

## Supported Benchmarks

| Benchmark | Type | Config |
|---|---|---|
| DocVQA | Document QA | `configs/docvqa/` |
| ALFWorld | Embodied AI | `configs/alfworld/` |
| OfficeQA | Enterprise QA | `configs/officeqa/` |
| SearchQA | Open-domain QA | `configs/searchqa/` |
| LiveMathBench | Math reasoning | `configs/livemathematicianbench/` |
| SWEBench | Software engineering | `configs/swebench/` |
| + 5 more | Various | See the docs |

## Quick Example

```bash
# Install
pip install -e .

# Configure credentials
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-key"

# Train on SearchQA
python scripts/train.py --config configs/searchqa/default.yaml

# Evaluate best skill
python scripts/eval_only.py \
  --config configs/searchqa/default.yaml \
  --skill outputs/best_skill.md
```
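An empty or mistyped endpoint typically surfaces only when the first API call fails, so it can help to check the exported variables before a long training run. The helper below is our own hypothetical preflight check (`check_azure_env` is not part of SkillOpt); it uses only the two variable names from the commands above, and the `require_key` switch is an assumption for setups that authenticate without an API key.

```python
# Hypothetical preflight check (not part of SkillOpt): fail fast on bad credentials.
import os
from typing import Mapping, Optional


def check_azure_env(env: Optional[Mapping[str, str]] = None,
                    require_key: bool = True) -> str:
    """Return the endpoint if the Azure OpenAI variables look usable, else raise."""
    env = os.environ if env is None else env
    endpoint = env.get("AZURE_OPENAI_ENDPOINT", "").strip()
    if not endpoint:
        raise RuntimeError("AZURE_OPENAI_ENDPOINT is not set or is empty")
    if not endpoint.startswith("https://"):
        raise RuntimeError(
            f"AZURE_OPENAI_ENDPOINT should start with https://, got {endpoint!r}")
    if require_key and not env.get("AZURE_OPENAI_API_KEY", "").strip():
        raise RuntimeError("AZURE_OPENAI_API_KEY is not set or is empty")
    return endpoint
```

Calling `check_azure_env()` with no arguments reads the live environment, so it can run right before `scripts/train.py` is launched.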

- **Getting Started**: Install SkillOpt, configure your API keys, and run your first experiment in 5 minutes. → Installation
- **Add a Benchmark**: Extend SkillOpt with your own benchmark in about 100 lines of code. → Extension Guide
- **Configuration**: Full reference for all hyperparameters, with deep learning analogies. → Config Reference
- **WebUI**: Configure, launch, and monitor training from your browser. → WebUI Guide
