microsoft-SkillOpt

mirror of https://github.com/microsoft/SkillOpt.git synced 2026-07-03 14:02:58 +08:00

Author	SHA1	Message	Date
khashayar	1a70e4c9cd	devin harvest: space turns >=5s so single-turn sessions aren't dropped A harvested single-turn Devin session spanned only 1s (reply written 1000ms after the prompt), which the engine's harvest filter conservatively classifies as a <3s headless replay (skillopt_sleep Issue #62) and skips — so a real single-turn session mined 0 tasks. Widen the prompt->reply gap to 5s. With this, an end-to-end dry-run mines the task: "night 1: 1 sessions -> 1 tasks". Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-25 22:03:15 +02:00
khashayar	9799c41461	devin plugin: full schema/tool parity with plugins/copilot Mirror the copilot MCP server: same rich _TOOL_SCHEMA (source, model, tasks_file, target_skill_path, max_sessions, max_tasks, lookback_hours, auto_adopt, json, edit_budget, hour, minute) and generic flag forwarding, plus sleep_schedule / sleep_unschedule. Devin specifics retained: the ATIF-v1.7 harvest step (run before data-reading actions, engine pointed at it via --claude-home, default --source claude) and post-adopt sync into .devin/skills/. Tests + README + rules snippet updated for the 7-tool interface. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-25 21:56:42 +02:00
khashayar	e51eb7c4be	devin plugin: expand ~ in CLAUDE_HOME from env + add tests & ATIF fixture Review fixes: - Path bug: SKILLOPT_DEVIN_CLAUDE_HOME (and SKILLOPT_SLEEP_REPO) read from the env are now wrapped in os.path.expanduser, so the documented "~/..." config no longer passes a literal ~ to --claude-home (which yielded zero mined sessions). expanduser on an absolute default is a no-op. - tests/test_devin_plugin.py: tool-schema completeness, action→subcommand map, backend enum, the CLAUDE_HOME expansion regression, and an ATIF-v1.7 harvest shape test against a bundled fixture. - plugins/devin/fixtures/devin_sample.json: sample ATIF-v1.7 transcript. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-25 21:49:21 +02:00
khashayar	bec23ed020	Add Devin plugin (plugins/devin): MCP server + ATIF-v1.7 harvest Wires the skillopt_sleep engine into Devin (Cognition) via an MCP server, following the same thin-shell pattern as plugins/copilot. - mcp_server.py: stdlib-only stdio MCP server exposing the standard sleep_* tools (status, dry-run, run, adopt, harvest). REPO_ROOT defaults to ../.. so it finds skillopt_sleep automatically when run from plugins/devin/. - harvest_devin.py: converts Devin ATIF-v1.7 transcripts, agentmemory, and .devin/skills/*/SKILL.md into the Claude Code-compatible JSONL the engine consumes; enriches with taskKey + outcome envelopes (hard test/build signal or judge rubric). Workspace auto-detection; cross-platform paths. - judge.py, mcp-config.example.json, devin-rules.snippet.md, README.md. - plugins/README.md: add Devin to the platform + install tables. No changes to skillopt_sleep; shells out to `python -m skillopt_sleep` like the other plugins. Pure stdlib; default backend mock (no API spend). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-25 10:42:52 +02:00
carpedkm	889238b234	fix: add SKILLOPT_SLEEP_PYTHON override + lookback_hours first-run fallback Two fixes from issue #57 feedback: 1. run-sleep.sh: support SKILLOPT_SLEEP_PYTHON env var to explicitly set the Python interpreter. Useful on macOS where system Python is 3.9 but a newer Python is available elsewhere (e.g. Codex Desktop's bundled Python 3.12). Applied to both the shared runner and the bundled Claude Code plugin copy. 2. cycle.py: on first run (no prior harvest recorded), apply the lookback_hours config (default 72h) as a time cutoff. Previously, first run scanned the entire transcript history, which could trigger massive LLM mining on users with months of session data. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-20 14:07:50 +00:00
carpedkm	0d648b2580	fix: address codex+gpt-5.5 review findings - harvest: tighten sub-3s filter to also require prompt < 200 chars, avoiding false positives on fast real one-shot questions - openclaw schedule_cmd: add docstring clarifying it schedules the shared engine, not the OpenClaw-native runner Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-20 12:40:34 +00:00
carpedkm	7d36b1d592	fix: address review findings in plugin sync PR - OpenClaw schedule_cmd: pass project as required positional arg - OpenClaw schedule_cmd/unschedule_cmd: unpack Tuple[bool, str] return - OpenClaw schedule_cmd: propagate failure status (return 1 on not ok) - OpenClaw unschedule_cmd: pass project to avoid silent no-op - OpenClaw --minute default: 17 (consistent with engine and MCP) - harvest.py: move datetime import to module level Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-20 12:04:07 +00:00
carpedkm	0be780052a	feat: sync all 4 runtime plugins with full engine surface + fix #52 #58 #62 Bug fixes: - #52: bundle run-sleep.sh in Claude Code plugin + 4-level fallback - #58: add skillopt-sleep console script entry point in pyproject.toml - #62: filter headless claude -p replay sessions from harvest Plugin sync (Claude Code / Codex / Copilot / OpenClaw): - Document all 22 CLI flags, 7 actions, 4 backends across all SKILL.md files - Document config keys (preferences, gate_mode, dream_rollouts, etc.) - Document memory consolidation (evolve_memory / evolve_skill) - Add schedule/unschedule to all plugins - Copilot MCP: expand schema from 3 → 16 params + schedule tools - OpenClaw: add schedule/unschedule subcommands via shared scheduler Tests: - Cross-plugin parity test (prevents future feature drift) - MCP schema completeness test Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-20 11:31:09 +00:00
Kirill Kostarev	05cdc26beb	Add reviewed task-file flow for Codex sleep runs	2026-06-20 08:58:48 +00:00
DB Lee	d367ae1eea	docs(plugins): list copilot in the cross-tool backend overview Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-17 17:38:10 -07:00
DB Lee	2c0980bda3	docs(copilot): correct backend hint in research MCP plugin (openai -> azure_openai) The advertised backend choices in scripts/train.py use 'azure_openai', not 'openai'; align the inputSchema description hint accordingly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-17 17:25:50 -07:00
DB Lee	21f93c16c7	Add GitHub Copilot backend to SkillOpt-Sleep Add CopilotCliBackend that drives the GitHub Copilot CLI in non-interactive mode (copilot -p ... --output-format json) and parses the JSONL event stream for assistant.message content. Registered as the 'copilot' backend (with aliases) and wired through the CLI, config, experiment harness, and the Copilot MCP server's backend enum. - Force UTF-8 decoding of CLI output (fixes cp1252 UnicodeDecodeError on Windows when responses contain non-cp1252 bytes). - Minimise per-call startup: isolated COPILOT_HOME with built-in MCPs and custom instructions disabled, so user MCP servers are not spawned per call (~5x faster: 36s -> 7.4s). Override via SKILLOPT_SLEEP_COPILOT_HOME / SKILLOPT_SLEEP_COPILOT_MODEL / SKILLOPT_SLEEP_COPILOT_FULL_ENV. Validated end-to-end on real held-out tasks (researcher persona: 0.42 -> 1.00 lift; gate correctly rejects non-improving edits). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-17 17:25:50 -07:00
DB Lee	5dc894715f	Add SkillOpt research-engine MCP server plugin for Copilot Exposes scripts/train.py and scripts/eval_only.py as Copilot MCP tools (skillopt_list_configs, skillopt_train, skillopt_eval) via a stdlib-only stdio server, mirroring the existing SkillOpt-Sleep plugin layout. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-17 17:24:00 -07:00
Yifan Yang	b701d9b6d9	docs: move SkillOpt-Sleep into the guide; clean docs/sleep; fix guide link Per maintainer request: - Remove the internal/scratch docs/sleep/ tree (reports, raw logs, blog run JSON, sweep.jsonl) — 23 files — and the root PUBLISHING.md. These were working notes, not reference docs. - Take the dedicated SkillOpt-Sleep content out of the main README (News bullet + section) and host it in the rendered guide instead: new section 9 in docs/guideline.html (deployment companion, the three plugins, opt-in experience replay / dream rollouts) with a sidebar entry. - Fix the README's opening reference so "Documentation & Reproduction Guide" links directly to the rendered GitHub Pages page, not the raw .html source. - Repoint the now-removed docs/sleep links in the plugin READMEs to the guide section. The plugin code (plugins/, skillopt_sleep/) is unchanged; only docs move. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>	2026-06-15 16:20:50 +00:00
Yifan Yang	576f2f8bad	Merge pull request #59 from Elzlxx/feat/openclaw-skillopt-sleep feat(plugins): add OpenClaw shell for SkillOpt-Sleep	2026-06-15 18:26:12 +08:00
Kirill Kostarev	31715a8b43	Add Codex Desktop transcript harvesting	2026-06-15 10:23:08 +00:00
Kirill Kostarev	d31e9d9407	Back up legacy Codex prompt during install	2026-06-15 10:21:30 +00:00
Kirill Kostarev	1953484822	Make Codex integration skill-first	2026-06-15 10:21:30 +00:00
Yifan Yang	86bad36ffe	feat(sleep): SkillOpt-Sleep plugin update (preview) — engine robustness + scheduling Updates the SkillOpt-Sleep plugin on top of the current main. User-facing and engine improvements since the initial drop: * Command renamed /sleep -> /skillopt-sleep across Claude Code + Codex shells; refreshed plugin READMEs and install scripts. * Built-in scheduling (skillopt_sleep/scheduler.py + __main__): schedule / unschedule the nightly cycle without external cron wiring. * Backend robustness: bounded retry with backoff (no more silent empty-string on transient 429/timeout), content-filter-safe rollout prompt, an output-contract guardrail that rejects edits violating the task's required format, and a per-sample cache key so repeated dream rollouts are independent samples (fixes degenerate single-sample reflection). * consolidate / rollout / replay: parallel multi-rollout dreaming, gate-mode controls, TaskRecord.system framing field. Scope: this commit ships only the plugin engine + shells. Research/benchmark harnesses and their data are intentionally not included; the public package has no dependency on them (the one research-evaluator import is now guarded). Marked as an early preview in the README; we'll keep iterating. 99/99 unit tests pass. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>	2026-06-14 16:12:00 +00:00
elzlxx	553446575a	feat(plugins): add OpenClaw shell for SkillOpt-Sleep Adds a thin OpenClaw shell wrapping the SkillOpt-Sleep engine. Enables nightly validation-gated skill improvement cycles for OpenClaw agents. Components: - skillopt_sleep_openclaw.py: DeepSeek V4 Pro + Ollama nomic-embed-text backend, mirroring the Claude/Codex/Copilot backend pattern. - run_sleep.py: CLI entry point supporting dry-run and pre-built task files. - run_sleep_cron.sh: bash wrapper for nightly cron invocation. - slash_sleep.py: /sleep command (status / run / adopt / reject / cost). - config.json: engine config tuned for our stack. - SKILL.md: OpenClaw skill manifest. - tests/: 14 held-out tasks across 3 categories (research-cron, devops, wiki). OpenClaw is the 4th ecosystem in which SkillOpt-Sleep can be deployed, joining Claude Code, Codex, and Copilot. The shell follows the same single-engine / thin-shell pattern as the existing three plugins. End-to-end tested: pipeline runs against real OpenClaw session transcripts, gate correctly rejects non-improvements, staging artifacts land in ~/.skillopt-sleep/staging/<night>/. Cost: ~$0.02/night on DeepSeek V4 Pro.	2026-06-14 23:27:54 +08:00
Yifan Yang	dae974a5e3	chore(sleep): English-only across the engine, plugins, and docs Remove every non-ASCII/CJK character for a professional open-source repo: - harvest.py: drop hardcoded Chinese feedback phrases; add an env-based extensibility hook (SKILLOPT_SLEEP_NEG_FEEDBACK / _POS_FEEDBACK) so any locale can be added without baking one in. Verified with a German example. - rollout.py / consolidate.py: English comments. - README.md section heading + anchor, CONTROLLABLE_DREAMING.md, plugin.json, marketplace.json (also fixed stale path skillopt-sleep-plugin -> plugins/claude-code), SKILL.md: English only. - Remove the internal WAKE_UP_SUMMARY.md note (not user-facing, not referenced). Verified: zero CJK chars remain anywhere; 29 tests pass. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>	2026-06-08 14:31:52 +00:00
Yifan Yang	f9db99853b	feat(plugins): ship SkillOpt-Sleep for Claude Code, Codex, and Copilot Restructure into plugins/{claude-code,codex,copilot}/ — one engine, three thin shells, all calling the shared plugins/run-sleep.sh -> python -m skillopt_sleep. - claude-code/: existing plugin moved here; runner delegates to the shared launcher (fixes repo-root resolution after the move). - codex/: ~/.codex/prompts/sleep.md custom prompt + ~/.agents/skills SKILL.md + install.sh + AGENTS.md hint — Codex's documented, stable extension surfaces. - copilot/: a stdlib-only MCP server (mcp_server.py) exposing sleep_* tools, plus mcp-config.example.json and a copilot-instructions snippet. Verified end to end (initialize -> tools/list -> tools/call returns real engine output). - plugins/README.md overview table; main README News + a dedicated SkillOpt-Sleep section; pyproject lists skillopt_sleep as a first-class package. Decoupling emphasized throughout: open-source tool (skillopt_sleep/) with zero dependency on the research package. 29 tests pass; all three shells resolve. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>	2026-06-08 14:31:52 +00:00
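The headless-replay harvest filter appears in three commits: 0be780052a (#62), 0d648b2580 (the 200-char refinement), and 1a70e4c9cd (the 5s Devin turn spacing). Together they describe a simple rule, sketched below. This is a minimal illustration under stated assumptions, not the code in harvest.py. The Session fields and function names are hypothetical, and treating both thresholds as strict "<" comparisons is an assumption. The sketch shows why a real single-turn session with a short prompt and a reply 1s later was dropped, and why spacing the turns 5s apart keeps it.

```python
from dataclasses import dataclass
from typing import List

# Thresholds as described in the commit log; exact comparison semantics are assumed.
REPLAY_MAX_SECONDS = 3.0        # sessions shorter than this are replay candidates (#62)
REPLAY_MAX_PROMPT_CHARS = 200   # ...but only if the prompt is also short (0d648b2580)


@dataclass
class Session:
    """Hypothetical harvested session: first prompt plus first/last turn timestamps."""
    prompt: str
    started_ms: int  # timestamp of the user prompt, in milliseconds
    ended_ms: int    # timestamp of the final assistant reply, in milliseconds


def looks_like_headless_replay(s: Session) -> bool:
    """Flag very short sessions with short prompts as likely `claude -p` replays."""
    duration_s = (s.ended_ms - s.started_ms) / 1000.0
    return duration_s < REPLAY_MAX_SECONDS and len(s.prompt) < REPLAY_MAX_PROMPT_CHARS


def harvest(sessions: List[Session]) -> List[Session]:
    """Keep only sessions that do not look like headless replays."""
    return [s for s in sessions if not looks_like_headless_replay(s)]
```

Under these assumptions, a session whose reply arrives 1000 ms after a short prompt is classified as a replay and mined for 0 tasks. The same session with the reply 5000 ms later survives the filter. The prompt-length condition lets fast but long one-shot questions through.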

22 Commits
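The two configuration fixes in e51eb7c4be (expanding "~" in an env-supplied CLAUDE_HOME) and 889238b234 (a lookback_hours cutoff on the first run) reduce to a few lines of stdlib Python. The sketch below is hypothetical. Only the SKILLOPT_DEVIN_CLAUDE_HOME variable name and the 72h lookback_hours default come from the log. The function names are illustrative, as is the assumption that later runs resume from the last recorded harvest time.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_LOOKBACK_HOURS = 72  # default lookback_hours cited in 889238b234


def resolve_claude_home(default: str) -> str:
    """Read the harvest directory from the env, expanding a leading "~".

    Without expanduser, a documented value like "~/..." would reach --claude-home
    as a literal "~" and yield zero mined sessions. expanduser leaves absolute
    paths unchanged, so applying it to an absolute default is a no-op.
    """
    return os.path.expanduser(os.environ.get("SKILLOPT_DEVIN_CLAUDE_HOME", default))


def harvest_cutoff(
    last_harvest: Optional[datetime],
    lookback_hours: int = DEFAULT_LOOKBACK_HOURS,
    now: Optional[datetime] = None,
) -> datetime:
    """Return the earliest transcript time to scan (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    if last_harvest is None:
        # First run: bound the scan instead of mining the entire transcript history.
        return now - timedelta(hours=lookback_hours)
    # Assumed: subsequent runs continue from the previous harvest.
    return last_harvest
```

Bounding only the first run leaves incremental behaviour unchanged. It also stops a first-time user with months of session data from triggering a large LLM mining pass.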