microsoft-SkillOpt

mirror of https://github.com/microsoft/SkillOpt.git synced 2026-07-03 14:02:58 +08:00

Author	SHA1	Message	Date
Cuzyoung	0dc84162dc	feat(optimizer): skill-aware reflection (EmbodiSkill S_app), config-controlled and env-independent Split failure reflections into SKILL_DEFECT (body edit) vs EXECUTION_LAPSE (protected appendix note that re-emphasizes an existing rule, never edited by step-level analysts). Toggle: optimizer.use_skill_aware_reflection (default false; baseline byte-identical when off). - optimizer/appendix.py: protected APPENDIX region (inject/extract/append with dedup), mirrors the slow_update protected-field pattern - optimizer/skill_aware.py: analyst prompt augmentation, appendix_notes parsing, threshold-gated LLM consolidation, and a process-wide runtime switch (configure_skill_aware_reflection) set once by the trainer - gradient/reflect.py: augment error/success analyst prompts at runtime; None-sentinel kwargs resolve from the global switch, so env adapters need no per-benchmark wiring (works for all envs, present and future) - optimizer/skill.py: generalize the protected-region check to (slow_update, appendix); edits inside any protected region are skipped - engine/trainer.py: inject appendix at init, flush per-step EXECUTION_LAPSE notes after the gate settles, optional consolidation - tests: regression suite incl. toggle-off byte-identical guarantee and env-independent global-switch resolution (6/6 passing + live smoke) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 13:10:08 +00:00
Yif Yang	643346c9f3	Merge pull request #26 from KovaForge/minimax-backend feat: add MiniMax as first-class chat backend Adds skillopt/model/minimax_backend.py (clean port of qwen_backend.py targeting MiniMax-M2.7 via https://api.minimax.io/v1) and registers it in the router, backend_config, and common defaults. Existing backends (openai_chat, claude_chat, qwen_chat, codex_exec, claude_code_exec) remain bit-for-bit unchanged. Verified via 10 import / routing / parity subtests; backward-compat sweep across the 8 shipped configs passes with no regression. A follow-up commit completes the YAML / CLI plumbing that this PR left half-wired (FLATTEN_MAP entries, trainer-level configure_minimax_chat call, and --minimax_* CLI args).	2026-05-31 08:20:39 +00:00
Cuzyoung	00602df9e9	feat(slow-update): add config-controlled gated / force-injected modes Add optimizer.slow_update_gate_with_selection to control how epoch-boundary slow-update guidance is applied: - false (default): force-injected - inject guidance into current & best unconditionally (unchanged behavior). - true: gated - evaluate the slow-update candidate on the selection set and accept/reject via the same validation gate as step-level updates (logic follows the SkillReflection ablation). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-31 02:02:23 +00:00
Declan Murphy	c6da31df44	fix: use correct MiniMax endpoint, model name, and add .venv to gitignore	2026-05-31 05:27:50 +08:00
Declan Murphy	e4201074aa	docs: add MiniMax config to default.yaml and .env.example default.yaml: - Add minimax_base_url, minimax_api_key, minimax_model, minimax_temperature, minimax_max_tokens, minimax_enable_thinking settings - Add optimizer_minimax_base_url, target_minimax_base_url per-role overrides - Add optimizer_minimax_api_key, target_minimax_api_key per-role overrides .env.example: - Add MINIMAX_BASE_URL, MINIMAX_API_KEY, MINIMAX_MODEL env var docs	2026-05-31 05:22:35 +08:00
Yif Yang	4f3a9bc055	docs: scope PR #25 gate_metric as opt-in example, not default Move the soft/mixed gate-metric configuration introduced in PR #25 out of the base default config and into a standalone example config so that default SkillOpt runs (and paper reproduction) remain bit-for-bit on the original hard gate. - configs/_base_/default.yaml: drop gate_metric / gate_mixed_weight keys. The trainer's cfg.get("gate_metric", "hard") fallback preserves the original behavior unchanged. - configs/examples/soft_gate.yaml: new standalone reference config with a header explaining when to consider it (small selection split with continuous rewards) and when not to (paper reproduction, large or binary-reward settings). - README.md: add a short "Community-contributed configs" section that clearly flags this as user-contributed and non-default.	2026-05-30 08:09:03 +00:00
Yif Yang	d190bf37c1	Merge pull request #25 from lvbaocheng/feature/gate-soft-metric Add configurable gate metric (hard / soft / mixed) for skill validation Default is `hard`, preserving exact pre-PR behavior — verified by 22 unit assertions on the gate module plus an end-to-end 8-step trainer-trajectory test that produces a bit-for-bit identical accept/reject sequence between the pre-PR and post-PR code paths under `gate_metric: hard`. Paper- reproduction results are unaffected. `soft` and `mixed` are opt-in via `evaluation.gate_metric` in the config and address small-selection-set runs where discrete hard accuracy is too coarse to distinguish candidate skills.	2026-05-30 08:01:39 +00:00
Huangzisu	dbc90bd755	fix(auth): let env vars override yaml for openai_compatible mode The yaml default `azure_openai_auth_mode: azure_cli` was silently overwriting `AZURE_OPENAI_AUTH_MODE` exported by the user, because `configure_clients()` treats any non-empty config value as an explicit override. Switching the three auth_mode defaults (shared / optimizer / target) to "" lets `_clean()` drop them and restores the intended fallback chain: yaml → env var → module default ("azure_cli"). Also update README and .env.example to document the openai_compatible mode introduced in `d5c5b61`, and remove the misleading `OPENAI_API_KEY` snippet — SkillOpt reuses the `AZURE_OPENAI_*` env vars in this mode.	2026-05-30 06:58:05 +00:00
lvbaocheng	5d7875cb2e	Add configurable gate metric (hard / soft / mixed) for skill validation The training gate currently always compares candidate vs. current/best using hard exact-match accuracy. On environments with a small held-out selection set (e.g. 3-6 items) or partial-credit scoring, hard accuracy is too coarse: candidate skills that meaningfully improve per-item soft scores get rejected because the discrete hard count does not move. Add three opt-in metrics so users can pick the one that matches their scoring function: - `gate_metric: hard`: original behavior (default, fully backward compatible). - `gate_metric: soft`: gate on the soft / F1 / partial-credit score. - `gate_metric: mixed`: `(1 - w) * hard + w * soft`, where `w` is set by `gate_mixed_weight` (default 0.5). Changes: - `skillopt/evaluation/gate.py`: extend `evaluate_gate` with `cand_soft`, `metric`, and `mixed_weight` keyword arguments; add a pure helper `select_gate_score(hard, soft, metric, mixed_weight)`. Defaults preserve the original `metric="hard"` behavior; existing callers that only pass `cand_hard` keep working unchanged. - `skillopt/evaluation/__init__.py`: export the new helper / type. - `skillopt/engine/trainer.py`: read `evaluation.gate_metric` and `evaluation.gate_mixed_weight` from the config (with safe defaults), pass both metrics into `evaluate_gate`, and project the baseline `current_score` / `best_score` into metric space so subsequent comparisons are consistent. Print the gate metric on the `[6/6 EVALUATE]` line so logs make the decision basis explicit. The selection cache still records both `(hard, soft)` so a metric change on resume is non-destructive. - `configs/_base_/default.yaml`: document and ship the new keys with backward-compatible defaults (`hard`, `0.5`). Backward compatibility: - Default config does not change behavior: `gate_metric` defaults to `hard`, exactly matching the previous gate. - `evaluate_gate(...)` keeps its existing positional signature; the new parameters are keyword-only with safe defaults. - `step_record.json` gains optional `gate_metric` and `candidate_gate_score` fields; old records still load. Tested: - Unit-tested all three metrics + boundary `mixed_weight` values (0.0 / 1.0) and rejection of unknown metric strings. All six cases pass. - Verified `skillopt.engine.trainer` imports cleanly after the refactor.	2026-05-30 14:45:27 +08:00
hwq	786d57b5cf	Make rollout completion tokens configurable	2026-05-28 09:45:47 +00:00
Cuzyoung	4a1b984d87	refactor: rename teacher/student to optimizer/target, remove best skills, fix slow update - Rename teacher -> optimizer, student -> target across all code, configs, docs, prompts - CLI: --teacher_model -> --optimizer_model, --student_model -> --target_model - Remove best_skill files, keep only initial skills - Fix slow update gate (force write into skill) - Fix SLOW_UPDATE marker stripping - Remove deep_reflect and meta_reflect mechanisms - Update .env.example with export prefix and azure_cli docs - Add endpoint empty validation in azure_openai.py Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-24 19:15:10 +00:00
CharlesYang030	244e346b83	SkillOpt v0.1.0: initial release - Skill optimization framework with training loop analogy - 11 benchmarks, 4 model backends (Azure OpenAI, Claude, Codex, Qwen) - WebUI for browser-based training control - Pluggable architecture for extending benchmarks and backends	2026-05-21 17:22:04 +00:00

12 Commits
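
The gate-metric commit (5d7875cb2e) describes a pure helper, `select_gate_score(hard, soft, metric, mixed_weight)`, that picks the score the validation gate compares. The following is a minimal sketch of what such a helper might look like, reconstructed only from the commit message: the function name, argument names, the three metric strings, the `(1 - w) * hard + w * soft` formula, and the rejection of unknown metrics come from the log, while the body, the type hints, the `ValueError` exception type, and the example numbers are assumptions and not the repository's actual code.

```python
def select_gate_score(
    hard: float,
    soft: float,
    metric: str = "hard",
    mixed_weight: float = 0.5,
) -> float:
    """Return the score used for the accept/reject comparison.

    Sketch only: the real implementation in skillopt/evaluation/gate.py
    may differ in validation and return types.
    """
    if metric == "hard":
        # Original behavior: gate on exact-match accuracy alone.
        return hard
    if metric == "soft":
        # Gate on the soft / F1 / partial-credit score.
        return soft
    if metric == "mixed":
        # Linear blend; w = 0.0 reduces to hard, w = 1.0 reduces to soft.
        return (1.0 - mixed_weight) * hard + mixed_weight * soft
    # The commit says unknown metric strings are rejected; the exception
    # type used here is an assumption.
    raise ValueError(f"unknown gate metric: {metric!r}")


# Hypothetical small selection set: both skills get the same hard accuracy,
# but the candidate has a higher soft score.
current = select_gate_score(hard=0.5, soft=0.55, metric="mixed")
candidate = select_gate_score(hard=0.5, soft=0.70, metric="mixed")
print(candidate > current)  # the mixed metric separates the two skills
```

With `metric="hard"` the two scores in the example tie at 0.5, which is the coarse-granularity case the commit message says motivated the `soft` and `mixed` options.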