microsoft-SkillOpt/skillopt at a0419bfdbbc1130f599055e10324ca4736d36f11 - microsoft-SkillOpt - 网新新思

mirror of https://github.com/microsoft/SkillOpt.git synced 2026-07-03 14:02:58 +08:00

Yifan Yang a0419bfdbb feat(sleep): benchmark sweep + report tooling; override-aware reflect prompt

- sweep.py: run many (backend, model, seed, transfer-pair) configs sequentially,
  append each result to JSONL incrementally (resumable, interrupt-safe).
- report.py: render the sweep JSONL into a presentable Markdown scorecard with
  direct-improvement and cross-model-transfer tables.
- reflect prompt now tells the optimizer its edits are APPENDED (can't delete the
  base skill text), so on a conflict it must write a forceful OVERRIDE rule.
  Diagnosed from a real failure: thorough-analyst (needs <=1200 chars) kept having
  its edits rejected because the base "be exhaustive" line won; a verified override
  ("HARD LIMIT ... supersedes") makes Haiku obey (1194/880 chars -> hard=1.0).

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

2026-06-08 14:31:51 +00:00
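
The resumable, interrupt-safe JSONL append described for sweep.py can be sketched roughly as below. This is a minimal illustration, not the repository's actual code: the names `sweep`, `run_one`, `load_done`, and `config_key`, and the `{"config": ..., "result": ...}` record layout, are all assumptions made for the example.

```python
import json
from pathlib import Path


def config_key(cfg: dict) -> str:
    # Hypothetical identity for a config: its JSON with sorted keys.
    return json.dumps(cfg, sort_keys=True)


def load_done(path: Path) -> set:
    # Collect the keys of configs that already have a complete record.
    done = set()
    if not path.exists():
        return done
    with path.open() as fh:
        for line in fh:
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip blank or half-written lines
            if isinstance(rec, dict) and "config" in rec:
                done.add(config_key(rec["config"]))
    return done


def _ends_mid_line(path: Path) -> bool:
    # True if the file is non-empty and its last byte is not a newline,
    # which is what an interrupted write would leave behind.
    if not path.exists() or path.stat().st_size == 0:
        return False
    with path.open("rb") as fh:
        fh.seek(-1, 2)
        return fh.read(1) != b"\n"


def sweep(configs, run_one, out_path) -> int:
    # Run configs one at a time, appending each result as soon as it exists.
    # Returns how many configs were actually run in this call.
    out_path = Path(out_path)
    done = load_done(out_path)
    ran = 0
    for cfg in configs:
        if config_key(cfg) in done:
            continue  # finished in an earlier (possibly interrupted) run
        result = run_one(cfg)
        prefix = "\n" if _ends_mid_line(out_path) else ""
        with out_path.open("a") as fh:
            fh.write(prefix + json.dumps({"config": cfg, "result": result}) + "\n")
        done.add(config_key(cfg))
        ran += 1
    return ran
```

Because each record is written immediately after its run, an interrupt loses at most the config in flight, and calling `sweep` again with the same list skips everything already recorded.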

SkillOpt v0.1.0: initial release

2026-05-21 17:22:04 +00:00

Support Qwen chat as optimizer backend

2026-06-01 16:44:49 +08:00

envs/_template: make template instantiable against real EnvAdapter ABC

2026-06-01 20:15:12 +00:00

Add configurable gate metric (hard / soft / mixed) for skill validation

2026-05-30 14:45:27 +08:00

fix(reflect): support continuous reward scores in failure filtering

2026-05-29 19:04:42 +08:00

fix(model): forward Qwen timeout and only set enable_thinking when true

2026-06-07 07:41:35 -07:00

feat(slow-update): add config-controlled gated / force-injected modes

2026-05-31 02:02:23 +00:00

cleanup: remove unused benchmarks, deep_probe, meta_reflect

2026-05-24 19:36:48 +00:00

SkillOpt v0.1.0: initial release

2026-05-21 17:22:04 +00:00

feat(sleep): benchmark sweep + report tooling; override-aware reflect prompt

2026-06-08 14:31:51 +00:00

fix(scoring): use float() instead of int() for continuous reward scores

2026-05-30 07:47:41 +08:00

__init__.py

refactor: rename teacher/student to optimizer/target, remove best skills, fix slow update

2026-05-24 19:15:10 +00:00

config.py

Support Qwen chat as optimizer backend

2026-06-01 16:44:49 +08:00

types.py

refactor: rename teacher/student to optimizer/target, remove best skills, fix slow update

2026-05-24 19:15:10 +00:00