mirror of https://github.com/microsoft/SkillOpt.git synced 2026-07-03 14:02:58 +08:00

Files

Cuzyoung 4a1b984d87 refactor: rename teacher/student to optimizer/target, remove best skills, fix slow update

- Rename teacher -> optimizer, student -> target across all code, configs, docs, prompts
- CLI: --teacher_model -> --optimizer_model, --student_model -> --target_model
- Remove best_skill files, keep only initial skills
- Fix slow update gate (force write into skill)
- Fix SLOW_UPDATE marker stripping
- Remove deep_reflect and meta_reflect mechanisms
- Update .env.example with export prefix and azure_cli docs
- Add endpoint empty validation in azure_openai.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-05-24 19:15:10 +00:00

2.4 KiB

Raw Blame History

Your First Experiment

This guide walks through running a complete SkillOpt training on SearchQA.

1. Choose a Benchmark

SkillOpt includes ready-to-use configs for several benchmarks:

Benchmark	Difficulty	Typical Runtime
SearchQA	⭐ Easy	~30 min
DocVQA	⭐⭐ Medium	~2 hours
ALFWorld	⭐⭐⭐ Hard	~3 hours

We'll use SearchQA as it's the fastest to complete.

2. Configure

Review the config file:

cat configs/searchqa/default.yaml

Key parameters (deep learning analogy in parentheses):

train:
  num_epochs: 4           # (epochs)
  batch_size: 40          # (batch size)

optimizer:
  learning_rate: 4        # (max edits per step)
  lr_scheduler: cosine    # (learning rate schedule)
  use_slow_update: true   # (momentum at epoch boundary)
  use_meta_skill: true    # (cross-epoch optimizer memory)

gradient:
  analyst_workers: 16     # (parallel reflection workers)

evaluation:
  use_gate: true          # (validation gating)

3. Train

python scripts/train.py --config configs/searchqa/default.yaml

You'll see output like:

[Step 1/8] Rollout: 20 items, 4 workers...
[Step 1/8] Score: 0.65 → Reflect...
[Step 1/8] 6 edit patches generated
[Step 1/8] Selected 4 edits (lr=8, cosine → 7.7)
[Step 1/8] Gate: val score 0.68 > 0.65 ✓ ACCEPT
[Step 2/8] ...

4. Monitor

Training outputs are saved to outputs/<benchmark>/<run_id>/:

outputs/searchqa/2024-01-15_10-30-00/
├── steps/
│   ├── step_0001/
│   │   ├── candidate_skill.md
│   │   ├── step_record.json
│   │   └── trajectory_digest.json
│   └── step_0002/
├── slow_update/
│   └── epoch_02/
├── meta_skill/
│   └── epoch_02/
├── skills/
│   └── step_0001.md
├── best_skill.md
├── history.json
└── config.yaml

5. Evaluate

Evaluate the best skill on the test split:

python scripts/eval_only.py \
  --config configs/searchqa/default.yaml \
  --skill outputs/searchqa/<run_id>/skills/best_skill.md

WebUI

Prefer a graphical interface? Launch the WebUI:

pip install -e ".[webui]"
python -m skillopt_webui.app

Then open http://localhost:7860 in your browser to configure parameters and launch training.

2.4 KiB Raw Blame History