microsoft-SkillOpt/skillopt
zq 41be2f1803 fix(scoring): use float() instead of int() for continuous reward scores
int() truncates any smoothed composite score below 1.0 (range 0.0-1.0) to 0,
so nearly every continuous reward value was recorded as a failure.
This broke SkillOpt training pipelines that use SmoothedCompositeReward.
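
The truncation described in this commit message can be reproduced with plain Python built-ins. The sketch below is illustrative only: the helper names `coerce_score_truncating` and `coerce_score` and the sample values are hypothetical and are not taken from the SkillOpt source.

```python
# Minimal sketch of the bug class described above.
# Helper names and sample scores are hypothetical, not SkillOpt code.

def coerce_score_truncating(raw):
    """Old behaviour: int() drops the fractional part (truncates toward zero)."""
    return int(raw)


def coerce_score(raw):
    """Fixed behaviour: float() keeps the continuous value."""
    return float(raw)


# Sample continuous rewards in the 0.0-1.0 range.
scores = [0.0, 0.35, 0.72, 0.99, 1.0]

truncated = [coerce_score_truncating(s) for s in scores]
preserved = [coerce_score(s) for s in scores]

print(truncated)  # only an exact 1.0 survives as non-zero
print(preserved)  # the original values are kept
```

With `int()`, only a score of exactly 1.0 stays non-zero, so any downstream check that treats 0 as a failure would reject almost every sample.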
2026-05-30 07:47:41 +08:00