microsoft-SkillOpt

mirror of https://github.com/microsoft/SkillOpt.git synced 2026-07-03 14:02:58 +08:00

Author	SHA1	Message	Date
Yifan Yang	14c045f04f	Windows robustness for claude/codex backends (+ hardened JSON fallback) (#79) * Robustness for the claude/codex backends on Windows: argv overflow, subprocess encoding, tolerant JSON, test-eval dirs Fixes surfaced running SkillOpt end-to-end on the bundled `claude` backend (local Claude CLI) on Windows. None changes the OpenAI/GPT happy path. 1. skillopt/engine/trainer.py — the final test-eval directory (test_eval_final/) is written to before being created; add os.makedirs(..., exist_ok=True), matching the two sibling test-eval dirs. Without it, summary.json raises FileNotFoundError when a rollout yields zero predictions. 2. skillopt/model/claude_backend.py a. Pass the prompt via stdin (not argv): on Windows the whole command line is capped at ~32 KB and a large optimizer prompt (the success-analyst minibatch carrying several report trajectories) overflows it with [WinError 206], killing the run after retries. b. Pass the system prompt via --append-system-prompt-file (a temp file), not argv. The system prompt here is the skill being optimized, which SkillOpt grows over training; since the ~32 KB cap applies to the SUM of all argv, a grown skill would re-hit [WinError 206] even with the prompt on stdin. c. Pin the subprocess encoding to utf-8 (errors="replace"). With text=True and no encoding=, stdin is encoded with the system codepage; on a zh-CN box (cp936/GBK) a prompt containing an emoji or some Latin-1 characters raises UnicodeEncodeError before the CLI even starts, failing every retry. 3. skillopt/model/codex_backend.py — the same utf-8 encoding pin on its subprocess.run(input=...) call (identical unpinned-encoding pattern). 4. skillopt/utils/json_utils.py — extract_json() returned None for valid-looking JSON that strict json.loads rejects (unescaped ASCII quotes inside CJK string values, trailing commas), silently dropping the analyst's edits on non-schema backends (Claude/Qwen): reflect produces N edits, 0 applied.
Add a json_repair fallback, but only on a single unambiguous object — a balanced-brace extractor plus a refuse-on-multiple-objects guard — so a chain-of-thought "scratch + final" response can't make repair silently return the wrong (discarded) object, which would be worse than None (None is detectable and retryable; a wrong-but-valid edit is applied blind). Declare json_repair in requirements.txt and the claude/qwen optional extras so the fallback is actually present (it otherwise no-ops, dropping edits silently). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> (cherry picked from commit `dca74a683e`) * fix(json_utils): harden tolerant JSON fallback from PR #77 Follow-up fixes on top of the cherry-picked Windows-robustness change: 1. Make _top_level_brace_objects() fully string-aware in its OUTER scan, not just inside an object. A '{' inside quoted prose (e.g. '"set it to {x}"') no longer starts a candidate object, so extract_json() returns None for prose pseudo-JSON instead of repairing it into a bogus dict — which would be strictly worse than dropping the edit, since extract_json feeds the optimizer's skill edits. 2. Pick the repair candidate BEFORE importing json_repair, so the missing-dependency RuntimeWarning only fires when there is genuinely a single malformed object that could have been repaired. Ordinary no-JSON / prose replies (the common case) now return None silently instead of warning on every call. 3. Resolve dependency-metadata inconsistency: json_repair is optional, so add it to the `all` extra (it was already in `claude`/`qwen`) and demote it from a hard requirement to an optional/commented entry in requirements.txt, matching the project's convention for backend-specific deps. Adds regression tests for prose-with-braces (-> None), no-warning-on-plain-text, single-object repair, and multi-object ambiguity. Existing 22 json tests still pass with and without json_repair installed.
Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: samuelgoofus-boop <260247789+samuelgoofus-boop@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-23 19:00:23 +08:00
Matt Van Horn	c31c50be51	fix(model): forward Qwen timeout and only set enable_thinking when true Two bugs made local vLLM targets score acc=0.000: the router did not forward 'timeout' to the Qwen backend (so runs used the 300s default), and qwen_backend always injected chat_template_kwargs.enable_thinking, which non-Qwen vLLM servers reject or answer with <think> output and no <answer> tag. Forward timeout and only set the field when enabled. Closes #28 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 07:41:35 -07:00
kaikai-macbook	41012e2d5e	Support Qwen chat as optimizer backend	2026-06-01 16:44:49 +08:00
Declan Murphy	c6da31df44	fix: use correct MiniMax endpoint, model name, and add .venv to gitignore	2026-05-31 05:27:50 +08:00
Declan Murphy	309ea64ff4	feat: integrate MiniMax into model router, backend config, and common common.py: - Add minimax_chat → MiniMax/MiniMax-Text-01 to _BACKEND_DEFAULT_MODELS - Add minimax/minimax_chat aliases to _BACKEND_ALIASES backend_config.py: - Add minimax_chat to set_optimizer_backend() valid set - Add minimax_chat to set_target_backend() valid set - Add minimax_chat to is_optimizer_chat_backend() - Add minimax_chat to is_target_chat_backend() __init__.py: - Import minimax_backend as _minimax - Add minimax_chat to set_backend() legacy handler - Add minimax_chat to get_backend_name() reporting - Route chat_target() and chat_target_messages() to _minimax - Update NotImplementedError messages to list minimax_chat - Aggregate _minimax into get_token_summary() - Add _minimax.reset_token_tracker() - Add configure_minimax_chat() delegator - Add _minimax to set_reasoning_effort() and set_target_deployment()	2026-05-31 05:22:33 +08:00
Declan Murphy	d224d425f9	feat: add MiniMax chat backend module Port qwen_backend.py pattern to minimax_backend.py as a new OpenAI-compatible urllib-based backend. Includes: - BASE_URL defaulting to https://api.minimax.chat/v1 - API_KEY, TIMEOUT_SECONDS, MAX_TOKENS, TEMPERATURE env vars - ENABLE_THINKING support (MiniMax thinking mode) - configure_minimax_chat() runtime configurator - chat_target() and chat_target_messages() functions - TokenTracker integration and get_token_summary() - set_target_deployment() support - Default model: MiniMax/MiniMax-Text-01	2026-05-31 05:22:29 +08:00
lvbaocheng	2532043d25	fix(claude): use --effort instead of deprecated --thinking flag Claude Code CLI v2.x renamed the flag; passing --thinking low causes all rollout calls to fail on CLI 2.1.87+. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-30 11:24:13 +08:00
guilhermeleste	d5c5b61830	Add OpenAI-compatible backend support for Pioneer.ai and other providers - Add 'openai_compatible', 'compat', and 'openai' auth modes to azure_openai.py - Modify _make_client() to use OpenAI client (not AzureOpenAI) for compatible endpoints - Update type hints to support both AzureOpenAI and OpenAI clients - Auto-configure API version sentinel when using compatible modes - Add .env template for Pioneer.ai configuration This allows users to use Pioneer.ai or any OpenAI-compatible API endpoint as both optimizer and target backend without requiring Azure OpenAI. Resolves: Support for non-Azure OpenAI-compatible providers	2026-05-28 05:54:43 -03:00
Cuzyoung	4a1b984d87	refactor: rename teacher/student to optimizer/target, remove best skills, fix slow update - Rename teacher -> optimizer, student -> target across all code, configs, docs, prompts - CLI: --teacher_model -> --optimizer_model, --student_model -> --target_model - Remove best_skill files, keep only initial skills - Fix slow update gate (force write into skill) - Fix SLOW_UPDATE marker stripping - Remove deep_reflect and meta_reflect mechanisms - Update .env.example with export prefix and azure_cli docs - Add endpoint empty validation in azure_openai.py Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-24 19:15:10 +00:00
CharlesYang030	244e346b83	SkillOpt v0.1.0: initial release - Skill optimization framework with training loop analogy - 11 benchmarks, 4 model backends (Azure OpenAI, Claude, Codex, Qwen) - WebUI for browser-based training control - Pluggable architecture for extending benchmarks and backends	2026-05-21 17:22:04 +00:00
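
The subprocess changes described in commit 14c045f04f (prompt on stdin, system prompt in a temp file, encoding pinned to utf-8) can be sketched roughly as follows. This is a minimal illustration, not SkillOpt's actual code: `run_cli`, its parameters, and the `file_flag` placeholder are hypothetical names introduced here, and the child command is whatever the caller supplies.

```python
import os
import subprocess
import sys
import tempfile


def run_cli(argv, prompt, system_prompt=None, file_flag=None, timeout=60):
    """Run a CLI with the prompt on stdin and an optional system prompt in a file.

    Hypothetical helper: argv and file_flag are placeholders chosen by the
    caller, not the real arguments used by SkillOpt's backends.
    """
    cmd = list(argv)
    tmp_path = None
    try:
        if system_prompt is not None and file_flag is not None:
            # Large text goes into a temp file so the command line stays short.
            with tempfile.NamedTemporaryFile(
                "w", encoding="utf-8", suffix=".txt", delete=False
            ) as tmp:
                tmp.write(system_prompt)
                tmp_path = tmp.name
            cmd += [file_flag, tmp_path]
        return subprocess.run(
            cmd,
            input=prompt,          # sent over stdin, so it never counts against argv
            capture_output=True,
            text=True,
            encoding="utf-8",      # pinned, instead of the system codepage
            errors="replace",      # undecodable output bytes become U+FFFD
            timeout=timeout,
        )
    finally:
        if tmp_path is not None:
            os.unlink(tmp_path)


if __name__ == "__main__":
    # Child process that echoes stdin bytes back unchanged.
    echo = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"
    big_prompt = "analyse \u2705 " + "x" * 40000  # larger than a ~32 KB argv cap
    result = run_cli([sys.executable, "-c", echo], big_prompt)
    print(result.returncode, len(result.stdout))
```

Deleting the temp file in `finally` keeps a failed or timed-out call from leaving the system prompt behind on disk.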

10 Commits
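
The "single unambiguous object" guard described in commit 14c045f04f (a string-aware balanced-brace scan that refuses to choose when more than one top-level object is present) might look roughly like the sketch below. The function names are hypothetical and the logic is an assumption based only on the commit message, not the project's `_top_level_brace_objects()` or `extract_json()`. Where the real code would hand a malformed candidate to a repair library, this sketch returns `None`.

```python
import json


def top_level_objects(text):
    """Return balanced top-level {...} spans, ignoring braces inside "..." strings."""
    spans = []
    depth = 0
    start = None
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            # Inside a double-quoted string: only look for its end.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append(text[start:i + 1])
    return spans


def extract_single_object(text):
    """Parse the reply only if it contains exactly one top-level object."""
    candidates = top_level_objects(text)
    if len(candidates) != 1:
        return None  # none, or ambiguous: a detectable failure beats a wrong edit
    try:
        return json.loads(candidates[0])
    except json.JSONDecodeError:
        return None  # a repair fallback could be tried here instead


if __name__ == "__main__":
    print(extract_single_object('Result: {"a": 1, "b": {"c": 2}}'))
    print(extract_single_object('note: "set it to {x}"'))
    print(extract_single_object('scratch {"a": 1} final {"a": 2}'))
```

The scanner tracks only double-quoted strings with backslash escapes, so input with unescaped quotes inside string values (one of the cases the commit mentions) can still confuse it. Treat it as an illustration of the guard, not a complete tolerant parser.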