microsoft-SkillOpt

mirror of https://github.com/microsoft/SkillOpt.git synced 2026-07-03 14:02:58 +08:00

Author	SHA1	Message	Date
copilot-swe-agent[bot]	4f582d4f6e	test: add template contract checks and refine benchmark docs	2026-06-01 19:39:52 +00:00
copilot-swe-agent[bot]	b3c7d72364	docs: align benchmark guide and templates with real adapter API	2026-06-01 19:38:17 +00:00
copilot-swe-agent[bot]	36284e1bb0	Initial plan	2026-06-01 19:31:30 +00:00
Yifan Yang	fb1a76371d	Merge pull request #29 from LifeIsSoSolong/codex/qwen-chat-optimizer-backend Support qwen_chat as optimizer backend	2026-06-02 03:27:50 +08:00
Yifan Yang	47063e1ceb	Merge pull request #27 from Oxygen56/test/add-core-utility-tests test: add unit test suite for core utility modules	2026-06-02 03:27:26 +08:00
hwq	181d71b737	Release data split manifests	2026-06-01 16:02:14 +00:00
kaikai-macbook	41012e2d5e	Support Qwen chat as optimizer backend	2026-06-01 16:44:49 +08:00
Claude Code Agent	dd8cd993b5	test: add unit test suite for core utility modules Add initial test infrastructure covering: - skillopt/utils/scoring.py (compute_score, skill_hash) - skillopt/utils/json_utils.py (extract_json, extract_json_array) - skillopt/types.py (Edit, Patch dataclass serialization) All tested functions are pure/deterministic with no LLM dependencies. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 02:04:22 +08:00
Yif Yang	8ebede0efd	Refine README for clarity on optimization results Removed redundant wording about math benchmarks.	2026-05-31 18:20:00 +08:00
Yif Yang	266fca72ab	docs: clarify optional features and ckpt artifacts	2026-05-31 09:36:25 +00:00
Yif Yang	9265545c45	docs: clarify README and paper-aligned skill artifacts	2026-05-31 09:23:07 +00:00
Yif Yang	b4850ce418	fix(minimax): wire YAML / CLI config through to backend PR #26 added a MiniMax chat backend but left three loose ends that silently dropped any YAML / CLI configuration of minimax_* keys: only the environment-variable path worked. - skillopt/config.py: add 6 model.minimax_* entries to _FLATTEN_MAP so the keys declared in configs/_base_/default.yaml actually survive flatten_config() (mirroring the existing model.qwen_chat_* block). - skillopt/engine/trainer.py: import configure_minimax_chat and call it alongside configure_qwen_chat, so cfg-supplied credentials, temperature, max_tokens, and enable_thinking reach the backend. Also apply cfg["minimax_model"] via set_target_deployment when the active target backend is minimax_chat. - scripts/train.py: add 6 --minimax_* CLI flags + the corresponding _CLI_TO_YAML entries, add 'minimax' / 'minimax_chat' to the --backend choices, auto-route to target_backend=minimax_chat, and pick the right default target_model for the new backend. Default behavior on existing backends (openai, claude, qwen, codex, claude_code_exec) is unchanged; all 8 shipped configs continue to load with gate_metric falling back to 'hard' for paper reproduction.	2026-05-31 08:22:20 +00:00
Yif Yang	643346c9f3	Merge pull request #26 from KovaForge/minimax-backend feat: add MiniMax as first-class chat backend Adds skillopt/model/minimax_backend.py (clean port of qwen_backend.py targeting MiniMax-M2.7 via https://api.minimax.io/v1) and registers it in the router, backend_config, and common defaults. Existing backends (openai_chat, claude_chat, qwen_chat, codex_exec, claude_code_exec) remain bit-for-bit unchanged. Verified via 10 import / routing / parity subtests; backward-compat sweep across the 8 shipped configs passes with no regression. A follow-up commit completes the YAML / CLI plumbing that this PR left half-wired (FLATTEN_MAP entries, trainer-level configure_minimax_chat call, and --minimax_* CLI args).	2026-05-31 08:20:39 +00:00
Cuzyoung	00602df9e9	feat(slow-update): add config-controlled gated / force-injected modes Add optimizer.slow_update_gate_with_selection to control how epoch-boundary slow-update guidance is applied: - false (default): force-injected - inject guidance into current & best unconditionally (unchanged behavior). - true: gated - evaluate the slow-update candidate on the selection set and accept/reject via the same validation gate as step-level updates (logic follows the SkillReflection ablation). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-31 02:02:23 +00:00
Declan Murphy	c6da31df44	fix: use correct MiniMax endpoint, model name, and add .venv to gitignore	2026-05-31 05:27:50 +08:00
Declan Murphy	e4201074aa	docs: add MiniMax config to default.yaml and .env.example default.yaml: - Add minimax_base_url, minimax_api_key, minimax_model, minimax_temperature, minimax_max_tokens, minimax_enable_thinking settings - Add optimizer_minimax_base_url, target_minimax_base_url per-role overrides - Add optimizer_minimax_api_key, target_minimax_api_key per-role overrides .env.example: - Add MINIMAX_BASE_URL, MINIMAX_API_KEY, MINIMAX_MODEL env var docs	2026-05-31 05:22:35 +08:00
Declan Murphy	309ea64ff4	feat: integrate MiniMax into model router, backend config, and common common.py: - Add minimax_chat → MiniMax/MiniMax-Text-01 to _BACKEND_DEFAULT_MODELS - Add minimax/minimax_chat aliases to _BACKEND_ALIASES backend_config.py: - Add minimax_chat to set_optimizer_backend() valid set - Add minimax_chat to set_target_backend() valid set - Add minimax_chat to is_optimizer_chat_backend() - Add minimax_chat to is_target_chat_backend() __init__.py: - Import minimax_backend as _minimax - Add minimax_chat to set_backend() legacy handler - Add minimax_chat to get_backend_name() reporting - Route chat_target() and chat_target_messages() to _minimax - Update NotImplementedError messages to list minimax_chat - Aggregate _minimax into get_token_summary() - Add _minimax.reset_token_tracker() - Add configure_minimax_chat() delegator - Add _minimax to set_reasoning_effort() and set_target_deployment()	2026-05-31 05:22:33 +08:00
Declan Murphy	d224d425f9	feat: add MiniMax chat backend module Port qwen_backend.py pattern to minimax_backend.py as a new OpenAI-compatible urllib-based backend. Includes: - BASE_URL defaulting to https://api.minimax.chat/v1 - API_KEY, TIMEOUT_SECONDS, MAX_TOKENS, TEMPERATURE env vars - ENABLE_THINKING support (MiniMax thinking mode) - configure_minimax_chat() runtime configurator - chat_target() and chat_target_messages() functions - TokenTracker integration and get_token_summary() - set_target_deployment() support - Default model: MiniMax/MiniMax-Text-01	2026-05-31 05:22:29 +08:00
hwq	42e555d28e	Update eval-only README example	2026-05-30 15:28:17 +00:00
hwq	933c0a4ab5	Add GPT-5.5 benchmark skills	2026-05-30 15:15:15 +00:00
hwq	1f75d022a5	y	2026-05-30 15:01:34 +00:00
Yif Yang	4f3a9bc055	docs: scope PR #25 gate_metric as opt-in example, not default Move the soft/mixed gate-metric configuration introduced in PR #25 out of the base default config and into a standalone example config so that default SkillOpt runs (and paper reproduction) remain bit-for-bit on the original hard gate. - configs/_base_/default.yaml: drop gate_metric / gate_mixed_weight keys. The trainer's cfg.get("gate_metric", "hard") fallback preserves the original behavior unchanged. - configs/examples/soft_gate.yaml: new standalone reference config with a header explaining when to consider it (small selection split with continuous rewards) and when not to (paper reproduction, large or binary-reward settings). - README.md: add a short "Community-contributed configs" section that clearly flags this as user-contributed and non-default.	2026-05-30 08:09:03 +00:00
Yif Yang	d190bf37c1	Merge pull request #25 from lvbaocheng/feature/gate-soft-metric Add configurable gate metric (hard / soft / mixed) for skill validation Default is `hard`, preserving exact pre-PR behavior — verified by 22 unit assertions on the gate module plus an end-to-end 8-step trainer-trajectory test that produces a bit-for-bit identical accept/reject sequence between the pre-PR and post-PR code paths under `gate_metric: hard`. Paper- reproduction results are unaffected. `soft` and `mixed` are opt-in via `evaluation.gate_metric` in the config and address small-selection-set runs where discrete hard accuracy is too coarse to distinguish candidate skills.	2026-05-30 08:01:39 +00:00
Yif Yang	02695bd813	Merge pull request #24 from lvbaocheng/fix/claude-cli-effort-flag fix(claude): use --effort instead of deprecated --thinking flag	2026-05-30 15:31:00 +08:00
Yif Yang	cf287cb608	Merge pull request #20 from 1s1x/fix-continuous-reward-scores fix: support continuous reward scores (int truncation + falsy float)	2026-05-30 15:30:15 +08:00
Huangzisu	dbc90bd755	fix(auth): let env vars override yaml for openai_compatible mode The yaml default `azure_openai_auth_mode: azure_cli` was silently overwriting `AZURE_OPENAI_AUTH_MODE` exported by the user, because `configure_clients()` treats any non-empty config value as an explicit override. Switching the three auth_mode defaults (shared / optimizer / target) to "" lets `_clean()` drop them and restores the intended fallback chain: yaml → env var → module default ("azure_cli"). Also update README and .env.example to document the openai_compatible mode introduced in `d5c5b61`, and remove the misleading `OPENAI_API_KEY` snippet — SkillOpt reuses the `AZURE_OPENAI_*` env vars in this mode.	2026-05-30 06:58:05 +00:00
lvbaocheng	5d7875cb2e	Add configurable gate metric (hard / soft / mixed) for skill validation The training gate currently always compares candidate vs. current/best using hard exact-match accuracy. On environments with a small held-out selection set (e.g. 3-6 items) or partial-credit scoring, hard accuracy is too coarse: candidate skills that meaningfully improve per-item soft scores get rejected because the discrete hard count does not move. Add three opt-in metrics so users can pick the one that matches their scoring function: - `gate_metric: hard` — original behavior (default, fully backward compatible). - `gate_metric: soft` — gate on the soft / F1 / partial-credit score. - `gate_metric: mixed` — `(1 - w) * hard + w * soft`, where `w` is set by `gate_mixed_weight` (default 0.5). Changes ------- - `skillopt/evaluation/gate.py`: extend `evaluate_gate` with `cand_soft`, `metric`, and `mixed_weight` keyword arguments; add a pure helper `select_gate_score(hard, soft, metric, mixed_weight)`. Defaults preserve the original `metric="hard"` behavior — existing callers that only pass `cand_hard` keep working unchanged. - `skillopt/evaluation/__init__.py`: export the new helper / type. - `skillopt/engine/trainer.py`: read `evaluation.gate_metric` and `evaluation.gate_mixed_weight` from the config (with safe defaults), pass both metrics into `evaluate_gate`, and project the baseline `current_score` / `best_score` into metric space so subsequent comparisons are consistent. Print the gate metric on the `[6/6 EVALUATE]` line so logs make the decision basis explicit. The selection cache still records both `(hard, soft)` so a metric change on resume is non-destructive. - `configs/_base_/default.yaml`: document and ship the new keys with backward-compatible defaults (`hard`, `0.5`). Backward compatibility ---------------------- - Default config does not change behavior: `gate_metric` defaults to `hard`, exactly matching the previous gate. - `evaluate_gate(...)` keeps its existing positional signature; the new parameters are keyword-only with safe defaults. - `step_record.json` gains optional `gate_metric` and `candidate_gate_score` fields; old records still load. Tested ------ - Unit-tested all three metrics + boundary `mixed_weight` values (0.0 / 1.0) and rejection of unknown metric strings. All six cases pass. - Verified `skillopt.engine.trainer` imports cleanly after the refactor.	2026-05-30 14:45:27 +08:00
lvbaocheng	2532043d25	fix(claude): use --effort instead of deprecated --thinking flag Claude Code CLI v2.x renamed the flag; passing --thinking low causes all rollout calls to fail on CLI 2.1.87+. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-30 11:24:13 +08:00
zq	41be2f1803	fix(scoring): use float() instead of int() for continuous reward scores int() truncates smoothed composite scores (0.0-1.0) to 0, making all continuous reward values appear as failures. This broke SkillOpt training pipelines using SmoothedCompositeReward.	2026-05-30 07:47:41 +08:00
zq	a62ec857f1	fix(reflect): support continuous reward scores in failure filtering not r.get("hard") treats non-zero floats as success. Add explicit float threshold check (< 1e-9). Backward compatible with binary hard=0/1.	2026-05-29 19:04:42 +08:00
zq	afb552008b	fix(trainer): support continuous reward scores in bucket aggregation int() truncates any float in [0,1) to 0. Replace with float(). Also fix falsy float check in failure detection. Backward compatible with binary hard=0/1.	2026-05-29 19:03:52 +08:00
Yif Yang	75b5c7f31c	Merge pull request #16 from guilhermeleste/feat/pioneer-ai-provider-integration Add OpenAI-compatible backend support for Pioneer.ai and other providers	2026-05-29 10:14:32 +08:00
Yif Yang	74ea3a1a8f	Merge pull request #18 from yong2bba/docs/custom-env-smoke docs: add local environment smoke test guide	2026-05-29 10:12:55 +08:00
yongjin	657b987de6	docs: add local environment smoke test guide	2026-05-29 09:26:38 +09:00
hwq	2a40aa3c98	Add SearchQA id split	2026-05-28 11:29:59 +00:00
hwq	786d57b5cf	Make rollout completion tokens configurable	2026-05-28 09:45:47 +00:00
guilhermeleste	d5c5b61830	Add OpenAI-compatible backend support for Pioneer.ai and other providers - Add 'openai_compatible', 'compat', and 'openai' auth modes to azure_openai.py - Modify _make_client() to use OpenAI client (not AzureOpenAI) for compatible endpoints - Update type hints to support both AzureOpenAI and OpenAI clients - Auto-configure API version sentinel when using compatible modes - Add .env template for Pioneer.ai configuration This allows users to use Pioneer.ai or any OpenAI-compatible API endpoint as both optimizer and target backend without requiring Azure OpenAI. Resolves: Support for non-Azure OpenAI-compatible providers	2026-05-28 05:54:43 -03:00
Cuzyoung	99212e3956	docs: remove Star History section for now Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-26 08:12:51 +00:00
Cuzyoung	fc54c44e93	docs: add Star History chart to README Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-26 08:10:16 +00:00
Yif Yang	48adf5a69f	Update citation format in README.md	2026-05-26 02:56:58 +08:00
Yif Yang	b11e6dcfb9	Enhance training description in README Updated README to include '(mini-)batchsize' in the training description.	2026-05-26 02:35:10 +08:00
Yif Yang	4c1b74fce2	Update BibTeX entry in index.html	2026-05-25 14:30:01 +08:00
Yif Yang	db6443384a	Update BibTeX entry for SkillOpt publication	2026-05-25 14:28:13 +08:00
Huangzisu	2c7d9074fb	update webpage for arxiv link	2026-05-25 05:32:04 +00:00
Yif Yang	c98bcdd5b3	Update README.md	2026-05-25 13:27:40 +08:00
Yif Yang	0f6db9afc4	Update README.md	2026-05-25 13:26:55 +08:00
Yif Yang	5a36ac35ae	Merge pull request #7 from microsoft/users/GitHubPolicyService/a41a3ce1-e5a1-4e18-810b-cfb8d2d21c29 Adding Microsoft SECURITY.MD	2026-05-25 13:09:26 +08:00
Lliar-liar	5f4b228543	Soften average gain column styling	2026-05-24 19:45:10 +00:00
Lliar-liar	a9cad7a125	Use official arXiv logomark	2026-05-24 19:43:19 +00:00
Lliar-liar	5e968115f5	Align citation section with SkillLens	2026-05-24 19:39:16 +00:00
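
Commit 5d7875cb2e in the log above describes three gate metrics (`hard`, `soft`, `mixed`) and a pure helper `select_gate_score(hard, soft, metric, mixed_weight)`. The sketch below is reconstructed only from that commit message: the helper name and the mixed formula `(1 - w) * hard + w * soft` come from the message, while the function body, type hints, and error text are assumptions and may differ from the code in `skillopt/evaluation/gate.py`.

```python
def select_gate_score(hard: float, soft: float,
                      metric: str = "hard",
                      mixed_weight: float = 0.5) -> float:
    """Pick the score the validation gate compares on.

    Illustrative sketch only; the body is an assumption based on the
    commit message, not the repository's implementation.
    """
    if metric == "hard":
        # Original behavior: gate on exact-match accuracy.
        return hard
    if metric == "soft":
        # Gate on the soft / partial-credit score.
        return soft
    if metric == "mixed":
        # Linear blend: (1 - w) * hard + w * soft.
        return (1.0 - mixed_weight) * hard + mixed_weight * soft
    # The commit message says unknown metric strings are rejected.
    raise ValueError(f"unknown gate metric: {metric!r}")


# A candidate with hard accuracy 0.5 and soft score 1.0:
hard_score = select_gate_score(0.5, 1.0, "hard")    # 0.5
soft_score = select_gate_score(0.5, 1.0, "soft")    # 1.0
mixed_score = select_gate_score(0.5, 1.0, "mixed")  # 0.75 with w = 0.5
```

With `mixed_weight` at its boundary values the blend reduces to one of the other two metrics: 0.0 gives the hard score and 1.0 gives the soft score.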

87 Commits
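
Commits 41be2f1803, a62ec857f1, and afb552008b in the log above fix handling of continuous reward scores: `int()` truncates any float in [0, 1) to 0, and a bare truthiness check on the score is replaced by an explicit threshold (`< 1e-9`). The sketch below illustrates both points under assumptions: the function names and the result-dict layout with a `"hard"` key are hypothetical stand-ins, not the repository's actual code.

```python
def truncated_total(results: list[dict]) -> int:
    """Buggy aggregation: int() drops every fractional score to 0."""
    return sum(int(r.get("hard", 0)) for r in results)


def float_total(results: list[dict]) -> float:
    """Fixed aggregation: float() keeps continuous scores intact."""
    return sum(float(r.get("hard", 0.0)) for r in results)


def is_failure(result: dict, eps: float = 1e-9) -> bool:
    """Explicit threshold check instead of relying on truthiness.

    A score counts as a failure only when it is numerically zero;
    binary hard=0/1 records behave exactly as before.
    """
    return float(result.get("hard", 0.0)) < eps


# Hypothetical rollout results mixing continuous and binary scores.
results = [{"hard": 0.75}, {"hard": 0.25}, {"hard": 0}]

lost = truncated_total(results)            # 0: every partial score vanishes
kept = float_total(results)                # 1.0
flags = [is_failure(r) for r in results]   # [False, False, True]
```

Missing `"hard"` keys default to 0.0 here and are therefore treated as failures, which is a choice made for this sketch rather than something the commit messages state.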