Commit Graph

8 Commits

Author SHA1 Message Date
zq
afb552008b fix(trainer): support continuous reward scores in bucket aggregation
int() truncates any float in [0,1) to 0. Replace with float().
Also fix falsy float check in failure detection.
Backward compatible with binary hard=0/1.
2026-05-29 19:03:52 +08:00
Yif Yang
75b5c7f31c Merge pull request #16 from guilhermeleste/feat/pioneer-ai-provider-integration
Add OpenAI-compatible backend support for Pioneer.ai and other providers
2026-05-29 10:14:32 +08:00
hwq
786d57b5cf Make rollout completion tokens configurable 2026-05-28 09:45:47 +00:00
guilhermeleste
d5c5b61830 Add OpenAI-compatible backend support for Pioneer.ai and other providers
- Add 'openai_compatible', 'compat', and 'openai' auth modes to azure_openai.py
- Modify _make_client() to use OpenAI client (not AzureOpenAI) for compatible endpoints
- Update type hints to support both AzureOpenAI and OpenAI clients
- Auto-configure API version sentinel when using compatible modes
- Add .env template for Pioneer.ai configuration

This allows users to use Pioneer.ai or any OpenAI-compatible API endpoint
as both optimizer and target backend without requiring Azure OpenAI.

Resolves: Support for non-Azure OpenAI-compatible providers
2026-05-28 05:54:43 -03:00
Cuzyoung
f55a26414e cleanup: remove unused benchmarks, deep_probe, meta_reflect
Remove sealqa, babyvision, mathverse, mmrb, swebench envs and configs.
Remove deep_probe, deep_reflect, meta_reflect modules and prompts.
Remove download_babyvision script.
These are not part of the core released benchmarks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-24 19:36:48 +00:00
Cuzyoung
cff7ff6846 fix: rename remaining teacher/student refs, remove .gradio from repo
- Fix teacher/student in deep_reflect, meta_reflect, sealqa, babyvision,
  mathverse, mmrb, swebench envs and prompt templates
- Remove .gradio/certificate.pem from tracked files
- Add .gradio/ to .gitignore

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-24 19:22:20 +00:00
Cuzyoung
4a1b984d87 refactor: rename teacher/student to optimizer/target, remove best skills, fix slow update
- Rename teacher -> optimizer, student -> target across all code, configs, docs, prompts
- CLI: --teacher_model -> --optimizer_model, --student_model -> --target_model
- Remove best_skill files, keep only initial skills
- Fix slow update gate (force write into skill)
- Fix SLOW_UPDATE marker stripping
- Remove deep_reflect and meta_reflect mechanisms
- Update .env.example with export prefix and azure_cli docs
- Add endpoint empty validation in azure_openai.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-24 19:15:10 +00:00
CharlesYang030
244e346b83 SkillOpt v0.1.0: initial release
- Skill optimization framework with training loop analogy
- 11 benchmarks, 4 model backends (Azure OpenAI, Claude, Codex, Qwen)
- WebUI for browser-based training control
- Pluggable architecture for extending benchmarks and backends
2026-05-21 17:22:04 +00:00