microsoft-SkillOpt/docs/index.md
Cuzyoung 4a1b984d87 refactor: rename teacher/student to optimizer/target, remove best skills, fix slow update
- Rename teacher -> optimizer, student -> target across all code, configs, docs, prompts
- CLI: --teacher_model -> --optimizer_model, --student_model -> --target_model
- Remove best_skill files, keep only initial skills
- Fix slow update gate (force write into skill)
- Fix SLOW_UPDATE marker stripping
- Remove deep_reflect and meta_reflect mechanisms
- Update .env.example with export prefix and azure_cli docs
- Add endpoint empty validation in azure_openai.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-24 19:15:10 +00:00


SkillOpt

Train Agent Skills Like Neural Networks

Optimize natural-language skill documents through iterative rollout, reflection, and gated updates, using epochs, learning rates, and validation gates borrowed from neural-network training, all without touching model weights.

Get Started :material-rocket-launch:{ .md-button .md-button--primary } View on GitHub :material-github:{ .md-button }


How It Works

1. 🎯 Rollout: Target executes tasks
2. 🔍 Reflect: Optimizer analyzes trajectories
3. 🔗 Aggregate: Merge edit patches
4. ✂️ Select: Rank & clip edits
5. 📝 Update: Apply to skill doc
6. 🚦 Gate: Validate & accept

Epoch Boundary: 🔄 Slow Update, 🧠 Meta Skill
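The six stages above can be sketched as a single training step. This is a minimal illustration and not SkillOpt's actual code: the `Edit` type, `train_step`, the toy callables, and the rule of appending edits as bullet lines are all hypothetical names and choices made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Edit:
    text: str     # line to add to the skill document (hypothetical edit format)
    score: float  # optimizer's estimate of how useful the edit is


def train_step(
    skill: str,
    rollout: Callable[[str], List[str]],
    reflect: Callable[[str, str], List[Edit]],
    evaluate: Callable[[str], float],
    max_edits: int,
) -> str:
    """One illustrative step: rollout, reflect, aggregate, select, update, gate."""
    # Rollout: the target runs tasks with the current skill and returns trajectories.
    trajectories = rollout(skill)
    # Reflect: the optimizer proposes a patch (list of edits) per trajectory.
    patches = [reflect(skill, t) for t in trajectories]
    # Aggregate: merge patches, keeping the best-scored copy of duplicate edits.
    merged: Dict[str, Edit] = {}
    for patch in patches:
        for edit in patch:
            if edit.text not in merged or edit.score > merged[edit.text].score:
                merged[edit.text] = edit
    # Select: rank by score and clip to the edit budget.
    selected = sorted(merged.values(), key=lambda e: e.score, reverse=True)[:max_edits]
    # Update: apply the selected edits to produce a candidate skill.
    candidate = skill + "".join(f"\n- {e.text}" for e in selected)
    # Gate: accept the candidate only if it scores strictly better.
    return candidate if evaluate(candidate) > evaluate(skill) else skill


# Toy stand-ins for the target, the optimizer, and the validation metric.
def toy_rollout(skill: str) -> List[str]:
    return ["failed: forgot units", "failed: forgot units", "ok"]


def toy_reflect(skill: str, trajectory: str) -> List[Edit]:
    return [Edit("Always state units.", 0.9)] if trajectory.startswith("failed") else []


def toy_evaluate(skill: str) -> float:
    return float("units" in skill)


new_skill = train_step("# Skill", toy_rollout, toy_reflect, toy_evaluate, max_edits=2)
```

With these toy functions, the duplicate edit is merged once, applied, and accepted by the gate because the candidate scores higher than the original skill.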

Deep Learning Analogy

SkillOpt brings the familiar deep-learning training paradigm to agentic prompt optimization:

| Deep Learning | SkillOpt |
|---|---|
| Model weights | Skill document (Markdown) |
| Forward pass | Rollout (target executes tasks) |
| Loss / gradient | Reflect (optimizer produces edit patches) |
| Gradient clipping | Edit selection (`learning_rate` = max edits) |
| SGD step | Patch application to skill |
| Validation set | Gated evaluation on selection split |
| LR schedule | `lr_scheduler`: `cosine`, `linear`, `constant` |
| Epochs | Multi-epoch with slow update & meta skill memory |
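To make the learning-rate analogy concrete, the sketch below maps a schedule name to a per-step edit budget. Only the names `learning_rate`, `lr_scheduler`, `cosine`, `linear`, and `constant` come from the table; the function `edit_budget`, the `min_lr` floor, and the exact decay formulas are assumptions, so SkillOpt's real schedules may differ.

```python
import math


def edit_budget(step: int, total_steps: int, base_lr: int,
                scheduler: str = "cosine", min_lr: int = 1) -> int:
    """Return the maximum number of edits allowed at `step` (hypothetical helper)."""
    if scheduler == "constant" or total_steps <= 1:
        return base_lr
    progress = step / (total_steps - 1)  # 0.0 at the first step, 1.0 at the last
    if scheduler == "linear":
        lr = base_lr - (base_lr - min_lr) * progress
    elif scheduler == "cosine":
        lr = min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
    else:
        raise ValueError(f"unknown scheduler: {scheduler}")
    # Edits are discrete, so round and never drop below the floor.
    return max(min_lr, round(lr))


# Edit budgets over five steps, starting from at most 8 edits per step.
cosine_budgets = [edit_budget(s, 5, base_lr=8, scheduler="cosine") for s in range(5)]
linear_budgets = [edit_budget(s, 5, base_lr=8, scheduler="linear") for s in range(5)]
```

Under these assumptions, both decaying schedules start at the full budget and shrink to `min_lr` by the final step, so early steps can rewrite the skill aggressively while later steps make small refinements.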

Supported Benchmarks

| Benchmark | Type | Config |
|---|---|---|
| DocVQA | Document QA | `configs/docvqa/` |
| ALFWorld | Embodied AI | `configs/alfworld/` |
| OfficeQA | Enterprise QA | `configs/officeqa/` |
| SearchQA | Open-domain QA | `configs/searchqa/` |
| LiveMathBench | Math reasoning | `configs/livemathematicianbench/` |
| SWEBench | Software Engineering | `configs/swebench/` |
| + 5 more | Various | See docs |

Quick Example

# Install
pip install -e .

# Configure credentials
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-key"

# Train on SearchQA
python scripts/train.py --config configs/searchqa/default.yaml

# Evaluate best skill
python scripts/eval_only.py \
  --config configs/searchqa/default.yaml \
  --skill outputs/best_skill.md
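Before launching a run, it can help to confirm that both credentials exported above are non-empty. The helper below is a small standalone sketch; `missing_credentials` is a hypothetical name and is not part of SkillOpt's API.

```python
import os
from typing import List, Mapping

# The two variables exported in the quick example above.
REQUIRED_VARS = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY")


def missing_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        raise SystemExit(f"Missing credentials: {', '.join(missing)}")
    print("Credentials look set.")
```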