microsoft-SkillOpt

Mirror of https://github.com/microsoft/SkillOpt.git, synced 2026-07-03 14:02:58 +08:00

Author	SHA1	Message	Date
Kirill Kostarev	05cdc26beb	Add reviewed task-file flow for Codex sleep runs	2026-06-20 08:58:48 +00:00
DB Lee	21f93c16c7	Add GitHub Copilot backend to SkillOpt-Sleep. Add CopilotCliBackend that drives the GitHub Copilot CLI in non-interactive mode (copilot -p ... --output-format json) and parses the JSONL event stream for assistant.message content. Registered as the 'copilot' backend (with aliases) and wired through the CLI, config, experiment harness, and the Copilot MCP server's backend enum. Changes: force UTF-8 decoding of CLI output (fixes cp1252 UnicodeDecodeError on Windows when responses contain non-cp1252 bytes); minimise per-call startup with an isolated COPILOT_HOME that has built-in MCPs and custom instructions disabled, so user MCP servers are not spawned per call (~5x faster: 36s -> 7.4s). Override via SKILLOPT_SLEEP_COPILOT_HOME / SKILLOPT_SLEEP_COPILOT_MODEL / SKILLOPT_SLEEP_COPILOT_FULL_ENV. Validated end-to-end on real held-out tasks (researcher persona: 0.42 -> 1.00 lift; gate correctly rejects non-improving edits). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-17 17:25:50 -07:00
Yifan Yang	722ce646d4	feat(sleep): experience replay + dream rollouts in the cycle (opt-in). Wires two consolidation mechanisms into the shipped nightly cycle, both default OFF so existing behavior is unchanged: dream_rollouts (>1): multi-rollout contrastive reflection per task; recall_k (>0): associative recall of the K most-similar past tasks (from a capped task_archive persisted in state.json) into tonight's dream; dream_factor (>0): synthetic task variants. New shared engine module skillopt_sleep/dream.py (recall_similar, dream_augment, dream_consolidate) is called by both the plugin cycle and the experiment harness, so reported numbers exercise the exact shipped code. Built on the existing rollouts_k/sample_id support already in consolidate.py/rollout.py. Validated (5 nights x 10 real tasks/night, full held-out test, GPT-5.5, gated): the gain scales with recall depth on a clean signal. SearchQA: recall_k=10 +3.1, recall_k=20 +4.5, full-history reference +5.6; SpreadsheetBench (nano, gate-free): +3.6. Flat within noise on saturated/noisy cells. See docs/sleep/EXPERIENCE_REPLAY.md (+ raw runs under blog_runs/v2_port/). Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>	2026-06-15 15:58:27 +00:00
Kirill Kostarev	31715a8b43	Add Codex Desktop transcript harvesting	2026-06-15 10:23:08 +00:00
Yifan Yang	b02ffc2c99	refactor(sleep): decouple engine to top-level skillopt_sleep/ (zero research dep). Open-source-tool / research-code separation: git mv skillopt/sleep/ -> skillopt_sleep/ (top-level, sibling to the research skillopt/ package), with history preserved as renames; all imports skillopt.sleep.* -> skillopt_sleep.*; vendor the validation gate into skillopt_sleep/gate.py (a self-contained copy of skillopt.evaluation.gate). The engine now has ZERO dependency on the research package (verified: grep finds no `from skillopt.` in skillopt_sleep/, and consolidate's gate resolves to skillopt_sleep.gate). Plugin scripts/commands/skill call `-m skillopt_sleep`. 29 tests pass; `python -m skillopt_sleep` runs standalone. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>	2026-06-08 14:31:52 +00:00
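
Commit 21f93c16c7 describes parsing a JSONL event stream for assistant.message content and forcing UTF-8 decoding of the CLI output. The following is a minimal sketch of that idea, not the repository's code. The function name `extract_assistant_text` and the event shape (a `type` field plus a `data.content` field) are assumptions made for illustration; the real Copilot CLI event schema may differ.

```python
import json


def extract_assistant_text(raw: bytes) -> str:
    """Collect assistant message text from a JSONL byte stream.

    Assumed (hypothetical) event shape, one JSON object per line:
        {"type": "assistant.message", "data": {"content": "..."}}
    """
    # Decode explicitly as UTF-8 rather than the platform default
    # (for example cp1252 on Windows). Undecodable bytes are replaced
    # instead of raising UnicodeDecodeError.
    text = raw.decode("utf-8", errors="replace")

    parts = []
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON (e.g. progress output)
        if not isinstance(event, dict):
            continue
        if event.get("type") != "assistant.message":
            continue  # ignore other event types
        data = event.get("data")
        content = data.get("content") if isinstance(data, dict) else None
        if isinstance(content, str):
            parts.append(content)
    return "".join(parts)
```

Decoding the raw bytes in one place keeps the result independent of the console code page, which is the failure mode the commit message reports on Windows.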

5 Commits
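
Commit 722ce646d4 describes associative recall of the K most-similar past tasks from a capped task archive. The sketch below illustrates that mechanism under stated assumptions and is not the shipped `recall_similar`: the commit does not say which similarity measure is used, so token-level Jaccard similarity is swapped in here, and the names `recall_similar_sketch` and `append_capped` are hypothetical.

```python
from __future__ import annotations


def _tokens(text: str) -> set[str]:
    # Lowercased whitespace tokens; a deliberately simple representation.
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two token sets (assumed measure, not the repo's)."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def recall_similar_sketch(query: str, archive: list[str], k: int) -> list[str]:
    """Return up to k archived tasks most similar to the query."""
    if k <= 0:
        return []  # recall disabled, matching the "default OFF" behavior
    # sorted() is stable, so ties keep their archive order.
    ranked = sorted(archive, key=lambda past: jaccard(query, past), reverse=True)
    return ranked[:k]


def append_capped(archive: list[str], task: str, cap: int) -> list[str]:
    """Append a task and keep only the most recent `cap` entries."""
    if cap <= 0:
        return []
    return (archive + [task])[-cap:]


archive = [
    "sum a column in a spreadsheet",
    "search the web for a capital city",
    "sum two columns in a spreadsheet",
]
recalled = recall_similar_sketch("sum a column in the spreadsheet", archive, k=2)
```

Here `recalled` holds the two spreadsheet tasks, since they share the most tokens with the query. A larger `k` pulls in more of the archive, which is the "recall depth" the commit message reports gains against.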