Split failure reflections into SKILL_DEFECT (body edit) vs EXECUTION_LAPSE
(protected appendix note that re-emphasizes an existing rule, never edited
by step-level analysts). Toggle: optimizer.use_skill_aware_reflection
(default false; baseline byte-identical when off).
- optimizer/appendix.py: protected APPENDIX region (inject/extract/append
with dedup), mirrors the slow_update protected-field pattern
- optimizer/skill_aware.py: analyst prompt augmentation, appendix_notes
parsing, threshold-gated LLM consolidation, and a process-wide runtime
switch (configure_skill_aware_reflection) set once by the trainer
- gradient/reflect.py: augment error/success analyst prompts at runtime;
None-sentinel kwargs resolve from the global switch, so env adapters
need no per-benchmark wiring (works for all envs, present and future)
- optimizer/skill.py: generalize the protected-region check to
(slow_update, appendix); edits inside any protected region are skipped
- engine/trainer.py: inject appendix at init, flush per-step
EXECUTION_LAPSE notes after the gate settles, optional consolidation
- tests: regression suite incl. toggle-off byte-identical guarantee and
env-independent global-switch resolution (6/6 passing + live smoke)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

PR #26 added a MiniMax chat backend but left three loose ends, so any
YAML / CLI configuration of minimax_* keys was silently dropped and only
the environment-variable path worked.
- skillopt/config.py: add 6 model.minimax_* entries to _FLATTEN_MAP so
the keys declared in configs/_base_/default.yaml actually survive
flatten_config() (mirroring the existing model.qwen_chat_* block).
- skillopt/engine/trainer.py: import configure_minimax_chat and call
it alongside configure_qwen_chat, so cfg-supplied credentials,
temperature, max_tokens, and enable_thinking reach the backend. Also
apply cfg["minimax_model"] via set_target_deployment when the active
target backend is minimax_chat.
- scripts/train.py: add 6 --minimax_* CLI flags + the corresponding
_CLI_TO_YAML entries, add 'minimax' / 'minimax_chat' to the --backend
choices, auto-route to target_backend=minimax_chat, and pick the
right default target_model for the new backend.
Default behavior on existing backends (openai, claude, qwen, codex,
claude_code_exec) is unchanged; all 8 shipped configs continue to load
with gate_metric falling back to 'hard' for paper reproduction.
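
Sketch of why unmapped keys were dropped, assuming flatten_config()
copies only the dotted paths listed in _FLATTEN_MAP. The two map
entries and the `_lookup` helper are illustrative assumptions, not the
six real entries or the actual code in skillopt/config.py.

```python
from typing import Any, Dict, Tuple

# Dotted YAML path -> flat config key. Illustrative entries only.
_FLATTEN_MAP: Dict[str, str] = {
    "model.minimax_model": "minimax_model",
    "model.minimax_temperature": "minimax_temperature",
}


def _lookup(nested: Dict[str, Any], dotted: str) -> Tuple[bool, Any]:
    """Walk a dotted path; return (found, value)."""
    node: Any = nested
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return False, None
        node = node[part]
    return True, node


def flatten_config(nested: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only mapped keys; anything absent from the map is dropped."""
    flat: Dict[str, Any] = {}
    for dotted, flat_key in _FLATTEN_MAP.items():
        found, value = _lookup(nested, dotted)
        if found:
            flat[flat_key] = value
    return flat
```

Under this assumption a YAML key with no map entry never reaches the
flat config, which matches the symptom above: the value is discarded
without an error.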

- Skill optimization framework built around a training-loop analogy
- 11 benchmarks, 4 model backends (Azure OpenAI, Claude, Codex, Qwen)
- WebUI for browser-based training control
- Pluggable architecture for extending benchmarks and backends