microsoft-SkillOpt

mirror of https://github.com/microsoft/SkillOpt.git synced 2026-07-03 22:24:36 +08:00

Files

Cuzyoung 0dc84162dc feat(optimizer): skill-aware reflection (EmbodiSkill S_app), config-controlled and env-independent

Split failure reflections into SKILL_DEFECT (body edit) vs EXECUTION_LAPSE
(protected appendix note that re-emphasizes an existing rule, never edited
by step-level analysts). Toggle: optimizer.use_skill_aware_reflection
(default false; baseline byte-identical when off).

- optimizer/appendix.py: protected APPENDIX region (inject/extract/append
  with dedup), mirrors the slow_update protected-field pattern
- optimizer/skill_aware.py: analyst prompt augmentation, appendix_notes
  parsing, threshold-gated LLM consolidation, and a process-wide runtime
  switch (configure_skill_aware_reflection) set once by the trainer
- gradient/reflect.py: augment error/success analyst prompts at runtime;
  None-sentinel kwargs resolve from the global switch, so env adapters
  need no per-benchmark wiring (works for all envs, present and future)
- optimizer/skill.py: generalize the protected-region check to
  (slow_update, appendix); edits inside any protected region are skipped
- engine/trainer.py: inject appendix at init, flush per-step
  EXECUTION_LAPSE notes after the gate settles, optional consolidation
- tests: regression suite incl. toggle-off byte-identical guarantee and
  env-independent global-switch resolution (6/6 passing + live smoke)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

2026-06-10 13:10:08 +00:00

__init__.py

SkillOpt v0.1.0: initial release

2026-05-21 17:22:04 +00:00

eval_only.py

refactor: rename teacher/student to optimizer/target, remove best skills, fix slow update

2026-05-24 19:15:10 +00:00

run_alfworld.sh

refactor: rename teacher/student to optimizer/target, remove best skills, fix slow update

2026-05-24 19:15:10 +00:00

run_searchqa.sh

refactor: rename teacher/student to optimizer/target, remove best skills, fix slow update

2026-05-24 19:15:10 +00:00

run_spreadsheetbench.sh

refactor: rename teacher/student to optimizer/target, remove best skills, fix slow update

2026-05-24 19:15:10 +00:00

train.py

feat(optimizer): skill-aware reflection (EmbodiSkill S_app), config-controlled and env-independent

2026-06-10 13:10:08 +00:00