microsoft-SkillOpt

mirror of https://github.com/microsoft/SkillOpt.git synced 2026-07-03 22:24:36 +08:00

Author	SHA1	Message	Date
Yifan Yang	2d7e37a395	fix(json_utils): reject prose pseudo-JSON in single quotes/backticks (#82) Follow-up to the string-aware brace scan: that change only skipped double-quoted prose, so brace-shaped text in single quotes, backticks, or bare prose (e.g. `{op: delete}`, '{x: 1}') still reached json_repair and was fabricated into a bogus dict, which is strictly worse than None, since extract_json feeds the optimizer's skill edits. Add a _looks_json_like() guard before repair: a genuine JSON object's first non-space char after `{` is `"` (a key) or `}` (empty). Prose pseudo-objects start with a bare word and are rejected, while legitimate repair targets (trailing commas, unescaped quotes inside string values) all begin with `"` and pass, including objects whose string VALUES contain single quotes or backticks, which must not be rejected. Found by an independent GPT-5.5 re-review of the merged #79 code. Adds regression tests for single-quoted / backticked / bare prose (-> None) and for legitimate objects with quote/backtick string values (still repaired). Tests: 30 pass (+3 skip) without json_repair, 33 pass with it, both clean under -W error::RuntimeWarning. Co-authored-by: Claude <noreply@anthropic.com>	2026-06-23 20:31:39 +08:00
Yifan Yang	14c045f04f	Windows robustness for claude/codex backends (+ hardened JSON fallback) (#79)
* Robustness for the claude/codex backends on Windows: argv overflow, subprocess encoding, tolerant JSON, test-eval dirs
Fixes surfaced running SkillOpt end-to-end on the bundled `claude` backend (local Claude CLI) on Windows. None changes the OpenAI/GPT happy path.
1. skillopt/engine/trainer.py: the final test-eval directory (test_eval_final/) is written to before being created; add os.makedirs(..., exist_ok=True), matching the two sibling test-eval dirs. Without it, summary.json raises FileNotFoundError when a rollout yields zero predictions.
2. skillopt/model/claude_backend.py
   a. Pass the prompt via stdin (not argv): on Windows the whole command line is capped at ~32 KB and a large optimizer prompt (the success-analyst minibatch carrying several report trajectories) overflows it with [WinError 206], killing the run after retries.
   b. Pass the system prompt via --append-system-prompt-file (a temp file), not argv. The system prompt here is the skill being optimized, which SkillOpt grows over training; since the ~32 KB cap applies to the SUM of all argv, a grown skill would re-hit [WinError 206] even with the prompt on stdin.
   c. Pin the subprocess encoding to utf-8 (errors="replace"). With text=True and no encoding=, stdin is encoded with the system codepage; on a zh-CN box (cp936/GBK) a prompt containing an emoji or some Latin-1 characters raises UnicodeEncodeError before the CLI even starts, failing every retry.
3. skillopt/model/codex_backend.py: the same utf-8 encoding pin on its subprocess.run(input=...) call (identical unpinned-encoding pattern).
4. skillopt/utils/json_utils.py: extract_json() returned None for valid-looking JSON that strict json.loads rejects (unescaped ASCII quotes inside CJK string values, trailing commas), silently dropping the analyst's edits on non-schema backends (Claude/Qwen): reflect produces N edits, 0 applied. Add a json_repair fallback, but only on a single unambiguous object (a balanced-brace extractor plus a refuse-on-multiple-objects guard), so a chain-of-thought "scratch + final" response can't make repair silently return the wrong (discarded) object, which would be worse than None (None is detectable and retryable; a wrong-but-valid edit is applied blind). Declare json_repair in requirements.txt and the claude/qwen optional extras so the fallback is actually present (it otherwise no-ops, dropping edits silently).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit `dca74a683e`)
* fix(json_utils): harden tolerant JSON fallback from PR #77
Follow-up fixes on top of the cherry-picked Windows-robustness change:
1. Make _top_level_brace_objects() fully string-aware in its OUTER scan, not just inside an object. A '{' inside quoted prose (e.g. '"set it to {x}"') no longer starts a candidate object, so extract_json() returns None for prose pseudo-JSON instead of repairing it into a bogus dict, which would be strictly worse than dropping the edit, since extract_json feeds the optimizer's skill edits.
2. Pick the repair candidate BEFORE importing json_repair, so the missing-dependency RuntimeWarning only fires when there is genuinely a single malformed object that could have been repaired. Ordinary no-JSON / prose replies (the common case) now return None silently instead of warning on every call.
3. Resolve dependency-metadata inconsistency: json_repair is optional, so add it to the `all` extra (it was already in `claude`/`qwen`) and demote it from a hard requirement to an optional/commented entry in requirements.txt, matching the project's convention for backend-specific deps.
Adds regression tests for prose-with-braces (-> None), no-warning-on-plain-text, single-object repair, and multi-object ambiguity. Existing 22 json tests still pass with and without json_repair installed.
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: samuelgoofus-boop <260247789+samuelgoofus-boop@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-23 19:00:23 +08:00
zq	41be2f1803	fix(scoring): use float() instead of int() for continuous reward scores int() truncates smoothed composite scores (0.0-1.0) to 0, making all continuous reward values appear as failures. This broke SkillOpt training pipelines using SmoothedCompositeReward.	2026-05-30 07:47:41 +08:00
CharlesYang030	244e346b83	SkillOpt v0.1.0: initial release - Skill optimization framework with training loop analogy - 11 benchmarks, 4 model backends (Azure OpenAI, Claude, Codex, Qwen) - WebUI for browser-based training control - Pluggable architecture for extending benchmarks and backends	2026-05-21 17:22:04 +00:00
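
The truncation described in commit 41be2f1803 can be reproduced in a few lines. This is only an illustrative sketch: the helper names `to_score_truncating` and `to_score_continuous` and the sample value are hypothetical and are not taken from the SkillOpt source.

```python
def to_score_truncating(raw):
    # int() truncates toward zero, so any reward in [0.0, 1.0) collapses to 0.
    return int(raw)


def to_score_continuous(raw):
    # float() preserves the fractional part of a smoothed composite reward.
    return float(raw)


smoothed_reward = 0.73  # hypothetical smoothed composite score in [0.0, 1.0]

truncated = to_score_truncating(smoothed_reward)
continuous = to_score_continuous(smoothed_reward)

print(truncated)   # 0
print(continuous)  # 0.73
```

With `int()`, every partial reward below 1.0 looks like a failure, which matches the symptom the commit message reports.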

4 Commits
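
The guard described in commit 2d7e37a395 (the first non-space character after `{` must be `"` or `}`) can be sketched as below. This is a minimal reconstruction from the commit message alone, not the repository's code: the name `looks_json_like` mirrors the `_looks_json_like()` mentioned there, but the signature and body are assumptions.

```python
def looks_json_like(candidate: str) -> bool:
    """Heuristic pre-check before attempting JSON repair (assumed behavior).

    A genuine JSON object starts with '{' followed, after optional
    whitespace, by '"' (a key) or '}' (an empty object).
    """
    text = candidate.lstrip()
    if not text.startswith("{"):
        return False
    rest = text[1:].lstrip()
    # rest[:1] is "" when nothing follows '{'; that is rejected as well.
    return rest[:1] in ('"', "}")


# Prose pseudo-objects begin with a bare word and are rejected.
print(looks_json_like("{op: delete}"))   # False
print(looks_json_like("{x: 1}"))         # False

# Repairable objects begin with a quoted key and pass,
# even when a string value contains single quotes or backticks.
print(looks_json_like('{"a": 1,}'))                  # True
print(looks_json_like('{"note": "it\'s `fine`"}'))   # True
print(looks_json_like("{ }"))                        # True
```

The check only inspects the opening of the candidate, so it is cheap to run before an optional repair library is imported; it does not by itself validate the rest of the object.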