microsoft-SkillOpt

mirror of https://github.com/microsoft/SkillOpt.git synced 2026-07-03 14:02:58 +08:00

Files

zq 41be2f1803 fix(scoring): use float() instead of int() for continuous reward scores

int() truncates any smoothed composite score below 1.0 to 0 (scores range over 0.0-1.0),
making nearly all continuous reward values appear as failures.
This broke SkillOpt training pipelines using SmoothedCompositeReward.
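
A minimal sketch of the truncation described in this commit message, not the actual SkillOpt code. The function names `parse_score_truncating` and `parse_score` are hypothetical and only illustrate how `int()` and `float()` treat scores in the 0.0-1.0 range.

```python
def parse_score_truncating(raw):
    """Hypothetical pre-fix behavior: int() drops the fractional part."""
    return int(raw)


def parse_score(raw):
    """Hypothetical post-fix behavior: float() keeps the continuous value."""
    return float(raw)


# Example continuous rewards in the 0.0-1.0 range (made-up values).
scores = [0.25, 0.8, 1.0]

old = [parse_score_truncating(s) for s in scores]
new = [parse_score(s) for s in scores]

print(old)  # [0, 0, 1]
print(new)  # [0.25, 0.8, 1.0]
```

With `int()`, every score below 1.0 collapses to 0, so any downstream check that treats 0 as a failure would reject partially successful rollouts. Only an exact 1.0 survives the conversion.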

2026-05-30 07:47:41 +08:00

datasets

SkillOpt v0.1.0: initial release

2026-05-21 17:22:04 +00:00

engine

fix(trainer): support continuous reward scores in bucket aggregation

2026-05-29 19:03:52 +08:00

envs

Make rollout completion tokens configurable

2026-05-28 09:45:47 +00:00

evaluation

SkillOpt v0.1.0: initial release

2026-05-21 17:22:04 +00:00

gradient

fix(reflect): support continuous reward scores in failure filtering

2026-05-29 19:04:42 +08:00

model

Add OpenAI-compatible backend support for Pioneer.ai and other providers

2026-05-28 05:54:43 -03:00

optimizer

cleanup: remove unused benchmarks, deep_probe, meta_reflect

2026-05-24 19:36:48 +00:00

prompts

cleanup: remove unused benchmarks, deep_probe, meta_reflect

2026-05-24 19:36:48 +00:00

scheduler

SkillOpt v0.1.0: initial release

2026-05-21 17:22:04 +00:00

utils

fix(scoring): use float() instead of int() for continuous reward scores

2026-05-30 07:47:41 +08:00

__init__.py

refactor: rename teacher/student to optimizer/target, remove best skills, fix slow update

2026-05-24 19:15:10 +00:00

config.py

refactor: rename teacher/student to optimizer/target, remove best skills, fix slow update

2026-05-24 19:15:10 +00:00

types.py

refactor: rename teacher/student to optimizer/target, remove best skills, fix slow update

2026-05-24 19:15:10 +00:00