microsoft-SkillOpt

mirror of https://github.com/microsoft/SkillOpt.git synced 2026-07-03 14:02:58 +08:00

Author	SHA1	Message	Date
Shunsuke	98d0430bee	refactor: make EnvAdapter.reflect a shared default (fixes dropped reflect kwargs) All six adapters duplicated an identical reflect() that delegates to run_minibatch_reflect. The copies had drifted: OfficeQA/DocVQA silently dropped meta_skill_context and ALFWorld dropped update_mode, so those analysts ran without inputs every other benchmark receives (active under the default use_meta_skill: true). Move the delegation into EnvAdapter.reflect as one default that forwards all kwargs uniformly, and delete the six overrides. reflect is no longer abstract — adapters inherit it and override only for custom logic. Net -225 lines. Behavior change: OfficeQA/DocVQA/ALFWorld reflect now receive the kwargs they previously dropped; the three already-correct benchmarks are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 09:06:00 +00:00
Cuzyoung	ffe581098b	feat(trainer): final-skill val + best promotion; keep best unpolluted by slow_update - slow_update force-inject now writes current_skill ONLY (best_skill stays a faithful val-best snapshot, never receives un-validated slow_update content) - after training, run one val on the final skill; if its gate score beats the incumbent best, promote final to best (updates best_skill/best_step/best_origin) - trainer now evaluates final skill on test itself (reuses best test result when final==best); records final_selection_* and final_test_* in summary.json - spreadsheetbench: head+tail truncate the post-execution verification report at source to fix multi-MB conversation bloat Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-10 13:03:17 +00:00
Cuzyoung	372fd56c1e	fix(spreadsheetbench)+optimizer: fix verify-feedback bloat, drop optimizer-side truncation, soft-disable gate. (A) SpreadsheetBench verification-feedback bloat - rollout.py _auto_verify_output: use official _compare_cell_value (was repr() equality, which falsely flagged 5 vs 5.0 / None vs ""); collapse correct-and-empty cells into a count so large sparse answer ranges no longer flood feedback with MBs of None=None noise. - codegen_agent.py _build_eval_feedback: only list WRONG cells, collapse correct ones into a count. Scoring is unaffected (evaluate() is independent); this only fixes the target model's multi-turn solving feedback. (B) Remove optimizer-side truncation (bloat source now fixed) - reflect.py: drop _MAX_TRAJ_CHARS cap and all per-field clips. - update_modes.py / clip.py / lr_autonomous.py: describe_item / short_item_summary no longer truncate; raise ranking/lr token budget. - trainer.py _format_step_buffer: full task_ids / target. - slow_update.py: full comparison samples. (C) Soft-disable gate - config.py / trainer.py: use_gate=false no longer raises; validation still runs but candidates are force-accepted (new force_accept branch + log). Misc: aggregate.py merge token budget 4096 -> 16384. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-10 13:03:17 +00:00
hwq	1f75d022a5	y	2026-05-30 15:01:34 +00:00
hwq	786d57b5cf	Make rollout completion tokens configurable	2026-05-28 09:45:47 +00:00
Cuzyoung	f55a26414e	cleanup: remove unused benchmarks, deep_probe, meta_reflect Remove sealqa, babyvision, mathverse, mmrb, swebench envs and configs. Remove deep_probe, deep_reflect, meta_reflect modules and prompts. Remove download_babyvision script. These are not part of the core released benchmarks. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-24 19:36:48 +00:00
Cuzyoung	cff7ff6846	fix: rename remaining teacher/student refs, remove .gradio from repo - Fix teacher/student in deep_reflect, meta_reflect, sealqa, babyvision, mathverse, mmrb, swebench envs and prompt templates - Remove .gradio/certificate.pem from tracked files - Add .gradio/ to .gitignore Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-24 19:22:20 +00:00
Cuzyoung	4a1b984d87	refactor: rename teacher/student to optimizer/target, remove best skills, fix slow update - Rename teacher -> optimizer, student -> target across all code, configs, docs, prompts - CLI: --teacher_model -> --optimizer_model, --student_model -> --target_model - Remove best_skill files, keep only initial skills - Fix slow update gate (force write into skill) - Fix SLOW_UPDATE marker stripping - Remove deep_reflect and meta_reflect mechanisms - Update .env.example with export prefix and azure_cli docs - Add endpoint empty validation in azure_openai.py Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-24 19:15:10 +00:00
CharlesYang030	244e346b83	SkillOpt v0.1.0: initial release - Skill optimization framework with training loop analogy - 11 benchmarks, 4 model backends (Azure OpenAI, Claude, Codex, Qwen) - WebUI for browser-based training control - Pluggable architecture for extending benchmarks and backends	2026-05-21 17:22:04 +00:00
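
Commit 98d0430bee moves the reflect delegation into EnvAdapter.reflect so that every adapter forwards the same kwargs. The pattern can be sketched roughly as below. This is a minimal illustration, not the repository's code: the signature of run_minibatch_reflect, the adapter class names, and the kwarg values are assumptions made for the example.

```python
from typing import Any


def run_minibatch_reflect(adapter_name: str, trajectories: list[str], **kwargs: Any) -> dict[str, Any]:
    """Hypothetical stand-in for the real helper: echoes what it received
    so that kwarg forwarding is observable."""
    return {"adapter": adapter_name, "n": len(trajectories), "kwargs": dict(kwargs)}


class EnvAdapter:
    """Base adapter with a concrete (non-abstract) reflect default."""

    name = "base"

    def reflect(self, trajectories: list[str], **kwargs: Any) -> dict[str, Any]:
        # Forward every keyword argument unchanged, so no subclass can
        # silently drop meta_skill_context or update_mode.
        return run_minibatch_reflect(self.name, trajectories, **kwargs)


class OfficeQAAdapter(EnvAdapter):
    """Inherits reflect; no override needed (class name is illustrative)."""

    name = "officeqa"


class CustomAdapter(EnvAdapter):
    """Overrides reflect only to add custom logic, then defers to the default."""

    name = "custom"

    def reflect(self, trajectories: list[str], **kwargs: Any) -> dict[str, Any]:
        kwargs.setdefault("update_mode", "patch")  # hypothetical value
        return super().reflect(trajectories, **kwargs)


result = OfficeQAAdapter().reflect(
    ["t1", "t2"], meta_skill_context="ctx", update_mode="rewrite"
)
custom = CustomAdapter().reflect(["t1"])
```

Because the default accepts **kwargs and passes them through as-is, adding a new reflect argument later needs no per-adapter change.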

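Commits ffe581098b and 372fd56c1e describe two selection rules: with use_gate=false, validation still runs but candidates are force-accepted, and after training the final skill is promoted to best only if its validation score beats the incumbent. A rough sketch of both rules follows. The BestState container, the function names, and the "final" origin label are hypothetical; only best_skill, best_step, best_origin, and use_gate come from the commit messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BestState:
    """Hypothetical snapshot of the validated best skill."""

    best_skill: str
    best_score: float
    best_step: int
    best_origin: str


def gate_accepts(candidate_score: float, incumbent_score: float, use_gate: bool = True) -> bool:
    """Soft-disabled gate: the score is still computed by the caller, but
    with use_gate=False the candidate is accepted regardless."""
    if not use_gate:
        return True  # force_accept branch
    return candidate_score > incumbent_score


def promote_final_if_better(
    state: BestState, final_skill: str, final_score: float, final_step: int
) -> BestState:
    """Promote the final skill only when it strictly beats the incumbent best."""
    if final_score > state.best_score:
        return BestState(final_skill, final_score, final_step, "final")
    return state


state = BestState("skill_v3", 0.60, 3, "step")
promoted = promote_final_if_better(state, "skill_final", 0.70, 10)
unchanged = promote_final_if_better(state, "skill_final", 0.60, 10)
```

Keeping promotion as a pure function of validated scores matches the stated intent that best_skill never receives un-validated slow_update content.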
9 Commits
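
Commit 372fd56c1e replaces repr() equality in the SpreadsheetBench verification feedback, since repr() treats 5 and 5.0, or None and "", as different cells, and it lists only wrong cells while collapsing correct ones into a count. The sketch below illustrates that idea under assumptions: cells_equal is a simplified stand-in, not the official _compare_cell_value, and build_feedback and its output format are hypothetical.

```python
from typing import Any


def cells_equal(expected: Any, actual: Any) -> bool:
    """Simplified value-aware comparison (stand-in for the official helper)."""

    def normalize(value: Any) -> Any:
        if value is None or value == "":
            return None  # treat empty cells uniformly
        if isinstance(value, (int, float)):
            return float(value)  # 5 and 5.0 compare equal
        return value

    return normalize(expected) == normalize(actual)


def build_feedback(expected: dict[str, Any], actual: dict[str, Any]) -> str:
    """List only the wrong cells; collapse the correct ones into a count."""
    wrong = [
        f"{cell}: expected {expected[cell]!r}, got {actual.get(cell)!r}"
        for cell in expected
        if not cells_equal(expected[cell], actual.get(cell))
    ]
    correct = len(expected) - len(wrong)
    return "\n".join([f"{correct} cells correct"] + wrong)


expected = {"A1": 5, "A2": None, "A3": "x"}
actual = {"A1": 5.0, "A2": "", "A3": "y"}
feedback = build_feedback(expected, actual)
```

With repr() equality all three cells above would be reported as mismatches; with the value-aware comparison only A3 is listed, so a large sparse range produces one count line instead of one line per empty cell.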