microsoft-SkillOpt

mirror of https://github.com/microsoft/SkillOpt.git synced 2026-07-03 14:02:58 +08:00

Files

Tanmay9223 fccc21f3f6 test(sleep): add verifier-discipline stress test (closes #67)

Add a regression test to ensure the validation gate rejects
reward-hacking skill edits. Optimizers sometimes propose shortcuts
that improve train/replay metrics but fail to improve held-out
behavior. This test codifies that the gate blocks such edits.

Add TestVerifierDiscipline to the test_sleep_engine.py suite:
- Create MockRewardHackingBackend that simulates a reward-hacking rule
  which passes the train set but degrades the held-out tasks.
- Assert that the proposed edit is rejected by the gate.
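
A minimal sketch of what such a test could look like. Only the names TestVerifierDiscipline and MockRewardHackingBackend come from the commit message; the Scores type, the gate_accepts helper, the "SHORTCUT" marker, and all score values are hypothetical stand-ins, not the repository's actual API.

```python
import unittest
from dataclasses import dataclass


@dataclass
class Scores:
    """Hypothetical container for train and held-out scores."""
    train: float
    held_out: float


class MockRewardHackingBackend:
    """Simulates an edit that raises the train score but lowers the held-out score."""

    def evaluate(self, skill: str) -> Scores:
        # Assumption: the reward-hacking rule is marked by the token "SHORTCUT".
        if "SHORTCUT" in skill:
            return Scores(train=0.95, held_out=0.40)
        return Scores(train=0.70, held_out=0.65)


def gate_accepts(backend, current_skill: str, proposed_skill: str) -> bool:
    """Hypothetical gate: accept an edit only if held-out score does not regress."""
    before = backend.evaluate(current_skill)
    after = backend.evaluate(proposed_skill)
    return after.held_out >= before.held_out


class TestVerifierDiscipline(unittest.TestCase):
    def test_reward_hacking_edit_is_rejected(self):
        backend = MockRewardHackingBackend()
        baseline = "Follow the task instructions."
        hacked = baseline + " SHORTCUT: reuse the cached train answer."

        # The edit looks like an improvement on the train set...
        self.assertGreater(
            backend.evaluate(hacked).train, backend.evaluate(baseline).train
        )
        # ...but the gate must reject it because held-out behavior degrades.
        self.assertFalse(gate_accepts(backend, baseline, hacked))


if __name__ == "__main__":
    unittest.main()
```

The sketch gates on the held-out score alone, so a train-set gain by itself can never get an edit accepted.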

2026-06-30 13:04:22 +05:30

__init__.py

test: add unit test suite for core utility modules

2026-06-01 02:04:22 +08:00

test_alfworld_paths.py

Fix ALFWorld gamefile paths relative to ALFWORLD_DATA

2026-06-23 10:32:38 +00:00

test_devin_plugin.py

devin plugin: full schema/tool parity with plugins/copilot

2026-06-25 21:56:42 +02:00

test_json_utils.py

fix(json_utils): reject prose pseudo-JSON in single quotes/backticks (#82)
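
The commit title does not say how the rejection is implemented. As a rough illustration only, the sketch below shows one way a strict parser can refuse single-quoted or backtick-wrapped pseudo-JSON: it relies on the standard library's json.loads, which accepts only double-quoted strings. The function name parse_strict_json is hypothetical and is not taken from json_utils.

```python
import json
from typing import Any, Optional


def parse_strict_json(text: str) -> Optional[Any]:
    """Hypothetical helper: return the parsed value, or None if text is not strict JSON."""
    try:
        return json.loads(text.strip())
    except json.JSONDecodeError:
        # Single-quoted keys/strings and backtick-wrapped text are not valid JSON.
        return None


print(parse_strict_json('{"a": 1}'))    # valid JSON object
print(parse_strict_json("{'a': 1}"))    # single quotes: rejected
print(parse_strict_json('`{"a": 1}`'))  # backticks: rejected
```

Note that a valid JSON `null` would also come back as None here; a real implementation would likely need a distinct failure signal.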