Files
microsoft-SkillOpt/pyproject.toml
Yifan Yang 14c045f04f Windows robustness for claude/codex backends (+ hardened JSON fallback) (#79)
* Robustness for the claude/codex backends on Windows: argv overflow, subprocess encoding, tolerant JSON, test-eval dirs

Fixes surfaced running SkillOpt end-to-end on the bundled `claude` backend
(local Claude CLI) on Windows. None changes the OpenAI/GPT happy path.

1. skillopt/engine/trainer.py — the final test-eval directory
   (test_eval_final/) is written to before being created; add
   os.makedirs(..., exist_ok=True), matching the two sibling test-eval dirs.
   Without it, summary.json raises FileNotFoundError when a rollout yields
   zero predictions.

2. skillopt/model/claude_backend.py
   a. Pass the prompt via stdin (not argv): on Windows the whole command line
      is capped at ~32 KB and a large optimizer prompt (the success-analyst
      minibatch carrying several report trajectories) overflows it with
      [WinError 206], killing the run after retries.
   b. Pass the system prompt via --append-system-prompt-file (a temp file),
      not argv. The system prompt here is the skill being optimized, which
      SkillOpt grows over training; since the ~32 KB cap applies to the SUM of
      all argv, a grown skill would re-hit [WinError 206] even with the prompt
      on stdin.
   c. Pin the subprocess encoding to utf-8 (errors="replace"). With text=True
      and no encoding=, stdin is encoded with the system codepage; on a zh-CN
      box (cp936/GBK) a prompt containing an emoji or some Latin-1 characters
      raises UnicodeEncodeError before the CLI even starts, failing every retry.

3. skillopt/model/codex_backend.py — the same utf-8 encoding pin on its
   subprocess.run(input=...) call (identical unpinned-encoding pattern).

4. skillopt/utils/json_utils.py — extract_json() returned None for valid-
   looking JSON that strict json.loads rejects (unescaped ASCII quotes inside
   CJK string values, trailing commas), silently dropping the analyst's edits
   on non-schema backends (Claude/Qwen): reflect produces N edits, 0 applied.
   Add a json_repair fallback, but only on a single unambiguous object — a
   balanced-brace extractor plus a refuse-on-multiple-objects guard — so a
   chain-of-thought "scratch + final" response can't make repair silently
   return the wrong (discarded) object, which would be worse than None (None is
   detectable and retryable; a wrong-but-valid edit is applied blind). Declare
   json_repair in requirements.txt and the claude/qwen optional extras so the
   fallback is actually present (it otherwise no-ops, dropping edits silently).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit dca74a683e)

* fix(json_utils): harden tolerant JSON fallback from PR #77

Follow-up fixes on top of the cherry-picked Windows-robustness change:

1. Make _top_level_brace_objects() fully string-aware in its OUTER scan, not
   just inside an object. A '{' inside quoted prose (e.g. '"set it to {x}"')
   no longer starts a candidate object, so extract_json() returns None for
   prose pseudo-JSON instead of repairing it into a bogus dict — which would
   be strictly worse than dropping the edit, since extract_json feeds the
   optimizer's skill edits.

2. Pick the repair candidate BEFORE importing json_repair, so the missing-
   dependency RuntimeWarning only fires when there is genuinely a single
   malformed object that could have been repaired. Ordinary no-JSON / prose
   replies (the common case) now return None silently instead of warning on
   every call.

3. Resolve dependency-metadata inconsistency: json_repair is optional, so add
   it to the `all` extra (it was already in `claude`/`qwen`) and demote it
   from a hard requirement to an optional/commented entry in requirements.txt,
   matching the project's convention for backend-specific deps.

Adds regression tests for prose-with-braces (-> None), no-warning-on-plain-
text, single-object repair, and multi-object ambiguity. Existing 22 json
tests still pass with and without json_repair installed.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: samuelgoofus-boop <260247789+samuelgoofus-boop@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 19:00:23 +08:00

82 lines
2.4 KiB
TOML

[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "skillopt"
version = "0.1.0"
description = "SkillOpt: Agentic Skill Optimization via Reflective Training Loops"
readme = "README.md"
license = {text = "MIT"}
requires-python = ">=3.10"
authors = [
{name = "SkillOpt Team"},
]
keywords = ["agent", "prompt-optimization", "skill-learning", "LLM", "agentic"]
classifiers = [
"Development Status :: 3 - Alpha",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
]
dependencies = [
"openai>=1.30.0",
"pyyaml>=6.0",
"numpy>=1.24.0",
"openpyxl>=3.1.0",
"azure-identity>=1.15.0",
"azure-core>=1.30.0",
"httpx>=0.27.0",
]
[project.optional-dependencies]
# Benchmark-specific dependencies
alfworld = ["alfworld>=0.4.0", "gymnasium>=0.29.0"]
# Claude model backend
claude = ["claude-agent-sdk>=0.1.0", "json_repair>=0.61.0"]
# Qwen local model backend (via vLLM)
qwen = ["vllm>=0.4.0", "json_repair>=0.61.0"]
# SearchQA data materialization
searchqa = ["datasets>=2.18.0"]
# Documentation site
docs = ["mkdocs-material>=9.5.0", "mkdocstrings[python]>=0.24.0"]
# WebUI dashboard
webui = ["gradio>=4.0.0"]
# Development tools
dev = ["ruff>=0.4.0", "pytest>=8.0.0"]
# All optional dependencies (except docs/dev/webui)
all = [
"alfworld>=0.4.0",
"gymnasium>=0.29.0",
"claude-agent-sdk>=0.1.0",
"json_repair>=0.61.0",
]
[project.scripts]
skillopt-train = "scripts.train:main"
skillopt-eval = "scripts.eval_only:main"
skillopt-sleep = "skillopt_sleep.__main__:main"
[project.urls]
Homepage = "https://github.com/microsoft/SkillOpt"
Documentation = "https://microsoft.github.io/SkillOpt"
Repository = "https://github.com/microsoft/SkillOpt"
Issues = "https://github.com/microsoft/SkillOpt/issues"
[tool.setuptools.packages.find]
# skillopt* = the research package; skillopt_sleep = the open-source Sleep tool
# (decoupled, zero dependency on the research code).
include = ["skillopt", "skillopt.*", "skillopt_sleep", "skillopt_sleep.*", "scripts*"]
[tool.ruff]
line-length = 120
target-version = "py310"
[tool.ruff.lint]
select = ["E", "F", "I", "W"]
ignore = ["E501"]