fix(reflect): support continuous reward scores in failure filtering

The `not r.get("hard")` check treats every non-zero float as a success, so it only works for binary scores. Add an explicit float threshold check (score < 1e-9) when filtering failures. Backward compatible with binary hard=0/1.
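
A minimal sketch of the threshold check described above, not the actual code in reflect.py. The helper name `is_failure`, the constant `FAILURE_EPS`, and the sample records are hypothetical; treating a missing "hard" value as a failure is an assumption chosen to match what `not r.get("hard")` did.

```python
FAILURE_EPS = 1e-9  # threshold named in the commit message


def is_failure(r: dict) -> bool:
    """Hypothetical helper: True when the record's "hard" score counts as a failure."""
    score = r.get("hard")
    if score is None:
        # Assumption: a missing score is a failure, as with `not r.get("hard")`.
        return True
    # Works for binary 0/1 (int or bool) and for continuous float rewards.
    return float(score) < FAILURE_EPS


# Hypothetical sample records for illustration only.
records = [
    {"id": "a", "hard": 0},      # binary failure
    {"id": "b", "hard": 1},      # binary success
    {"id": "c", "hard": 1e-12},  # float noise around zero
    {"id": "d", "hard": 0.3},    # partial reward, above the threshold
    {"id": "e"},                 # no score recorded
]

failures = [r for r in records if is_failure(r)]
old_failures = [r for r in records if not r.get("hard")]
```

With these records the old truthiness check misses record "c", because 1e-12 is truthy, while the threshold check picks it up. Records with hard=0 and hard=1 are classified the same way by both checks.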
2026-07-03 14:02:58 +08:00 · 2026-05-29 19:04:42 +08:00
parent afb552008b
commit a62ec857f1
1 changed file with 588 additions and 588 deletions
--- a/skillopt/gradient/reflect.py
+++ b/skillopt/gradient/reflect.py