mirror of
https://github.com/microsoft/SkillOpt.git
synced 2026-07-03 14:02:58 +08:00
int() truncates smoothed composite scores (0.0-1.0) to 0, making all continuous reward values appear as failures. This broke SkillOpt training pipelines using SmoothedCompositeReward.