22 Commits

Author SHA1 Message Date
khashayar
1a70e4c9cd devin harvest: space turns >=5s so single-turn sessions aren't dropped
A harvested single-turn Devin session spanned only 1s (reply written 1000ms
after the prompt), which the engine's harvest filter conservatively classifies
as a <3s headless replay (skillopt_sleep Issue #62) and skips — so a real
single-turn session mined 0 tasks. Widen the prompt->reply gap to 5s. With this,
an end-to-end dry-run mines the task: "night 1: 1 sessions -> 1 tasks".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 22:03:15 +02:00
khashayar
9799c41461 devin plugin: full schema/tool parity with plugins/copilot
Mirror the copilot MCP server: same rich _TOOL_SCHEMA (source, model,
tasks_file, target_skill_path, max_sessions, max_tasks, lookback_hours,
auto_adopt, json, edit_budget, hour, minute) and generic flag forwarding, plus
sleep_schedule / sleep_unschedule. Devin specifics retained: the ATIF-v1.7
harvest step (run before data-reading actions, engine pointed at it via
--claude-home, default --source claude) and post-adopt sync into .devin/skills/.
Tests + README + rules snippet updated for the 7-tool interface.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 21:56:42 +02:00
khashayar
e51eb7c4be devin plugin: expand ~ in CLAUDE_HOME from env + add tests & ATIF fixture
Review fixes:
- Path bug: SKILLOPT_DEVIN_CLAUDE_HOME (and SKILLOPT_SLEEP_REPO) read from the
  env are now wrapped in os.path.expanduser, so the documented "~/..." config
  no longer passes a literal ~ to --claude-home (which yielded zero mined
  sessions). expanduser on an absolute default is a no-op.
- tests/test_devin_plugin.py: tool-schema completeness, action→subcommand map,
  backend enum, the CLAUDE_HOME expansion regression, and an ATIF-v1.7 harvest
  shape test against a bundled fixture.
- plugins/devin/fixtures/devin_sample.json: sample ATIF-v1.7 transcript.
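
The path fix above can be illustrated with a minimal sketch. The helper name `resolve_home` and the `environ` parameter are hypothetical and not taken from the plugin; only `os.path.expanduser` and the environment variable name come from the message.

```python
import os


def resolve_home(env_name, default, environ=os.environ):
    """Return the configured directory with a leading ~ expanded.

    Hypothetical helper: the real plugin may structure this differently.
    """
    raw = environ.get(env_name) or default
    # expanduser turns "~/x" into "<home>/x"; an absolute path is returned unchanged.
    return os.path.expanduser(raw)


if __name__ == "__main__":
    print(resolve_home("SKILLOPT_DEVIN_CLAUDE_HOME", "/tmp/devin-claude-home"))
```

Passing the mapping as a parameter keeps the regression easy to test without mutating the real environment.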

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 21:49:21 +02:00
khashayar
bec23ed020 Add Devin plugin (plugins/devin): MCP server + ATIF-v1.7 harvest
Wires the skillopt_sleep engine into Devin (Cognition) via an MCP server,
following the same thin-shell pattern as plugins/copilot.

- mcp_server.py: stdlib-only stdio MCP server exposing the standard sleep_*
  tools (status, dry-run, run, adopt, harvest). REPO_ROOT defaults to ../.. so
  it finds skillopt_sleep automatically when run from plugins/devin/.
- harvest_devin.py: converts Devin ATIF-v1.7 transcripts, agentmemory, and
  .devin/skills/*/SKILL.md into the Claude Code-compatible JSONL the engine
  consumes; enriches with taskKey + outcome envelopes (hard test/build signal
  or judge rubric). Workspace auto-detection; cross-platform paths.
- judge.py, mcp-config.example.json, devin-rules.snippet.md, README.md.
- plugins/README.md: add Devin to the platform + install tables.

No changes to skillopt_sleep; shells out to `python -m skillopt_sleep` like the
other plugins. Pure stdlib; default backend mock (no API spend).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 10:42:52 +02:00
carpedkm
889238b234 fix: add SKILLOPT_SLEEP_PYTHON override + lookback_hours first-run fallback
Two fixes from issue #57 feedback:

1. run-sleep.sh: support SKILLOPT_SLEEP_PYTHON env var to explicitly set
   the Python interpreter. Useful on macOS where system Python is 3.9 but
   a newer Python is available elsewhere (e.g. Codex Desktop's bundled
   Python 3.12). Applied to both the shared runner and the bundled
   Claude Code plugin copy.

2. cycle.py: on first run (no prior harvest recorded), apply the
   lookback_hours config (default 72h) as a time cutoff. Previously,
   first run scanned the entire transcript history, which could trigger
   massive LLM mining on users with months of session data.
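
A rough sketch of the first-run cutoff described in item 2, under stated assumptions: the function name `harvest_cutoff` is hypothetical, and reusing the previous harvest time on later runs is an assumption, not something this message confirms about cycle.py.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def harvest_cutoff(last_harvest: Optional[datetime],
                   lookback_hours: float = 72,
                   now: Optional[datetime] = None) -> datetime:
    """Return the earliest timestamp a session may have and still be harvested."""
    if last_harvest is not None:
        # Assumption: later runs resume from the previous harvest.
        return last_harvest
    now = now or datetime.now(timezone.utc)
    # First run: bound the scan to the lookback window instead of all history.
    return now - timedelta(hours=lookback_hours)
```

With the default of 72 hours, a first run at noon on 20 June would ignore sessions older than noon on 17 June.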

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-20 14:07:50 +00:00
carpedkm
0d648b2580 fix: address codex+gpt-5.5 review findings
- harvest: tighten sub-3s filter to also require prompt < 200 chars,
  avoiding false positives on fast real one-shot questions
- openclaw schedule_cmd: add docstring clarifying it schedules the
  shared engine, not the OpenClaw-native runner
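
The tightened harvest filter might look roughly like the sketch below. The function name, parameter names, and the exact comparison operators are assumptions; only the two thresholds (3 seconds, 200 characters) come from the message.

```python
def looks_like_headless_replay(duration_s: float,
                               prompt: str,
                               max_duration_s: float = 3.0,
                               max_prompt_chars: int = 200) -> bool:
    """Flag a session as a probable headless replay.

    Both conditions must hold: the session is very short AND the prompt is
    short. A fast answer to a long, real question is therefore kept.
    """
    return duration_s < max_duration_s and len(prompt) < max_prompt_chars
```

Requiring both conditions trades a few missed replays for fewer dropped real sessions.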

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-20 12:40:34 +00:00
carpedkm
7d36b1d592 fix: address review findings in plugin sync PR
- OpenClaw schedule_cmd: pass project as required positional arg
- OpenClaw schedule_cmd/unschedule_cmd: unpack Tuple[bool, str] return
- OpenClaw schedule_cmd: propagate failure status (return 1 on not ok)
- OpenClaw unschedule_cmd: pass project to avoid silent no-op
- OpenClaw --minute default: 17 (consistent with engine and MCP)
- harvest.py: move datetime import to module level

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-20 12:04:07 +00:00
carpedkm
0be780052a feat: sync all 4 runtime plugins with full engine surface + fix #52 #58 #62
Bug fixes:
- #52: bundle run-sleep.sh in Claude Code plugin + 4-level fallback
- #58: add skillopt-sleep console script entry point in pyproject.toml
- #62: filter headless claude -p replay sessions from harvest

Plugin sync (Claude Code / Codex / Copilot / OpenClaw):
- Document all 22 CLI flags, 7 actions, 4 backends across all SKILL.md files
- Document config keys (preferences, gate_mode, dream_rollouts, etc.)
- Document memory consolidation (evolve_memory / evolve_skill)
- Add schedule/unschedule to all plugins
- Copilot MCP: expand schema from 3 → 16 params + schedule tools
- OpenClaw: add schedule/unschedule subcommands via shared scheduler

Tests:
- Cross-plugin parity test (prevents future feature drift)
- MCP schema completeness test

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-20 11:31:09 +00:00
Kirill Kostarev
05cdc26beb Add reviewed task-file flow for Codex sleep runs
2026-06-20 08:58:48 +00:00
DB Lee
d367ae1eea docs(plugins): list copilot in the cross-tool backend overview
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-17 17:38:10 -07:00
DB Lee
2c0980bda3 docs(copilot): correct backend hint in research MCP plugin (openai -> azure_openai)
The advertised backend choices in scripts/train.py use 'azure_openai',
not 'openai'; align the inputSchema description hint accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-17 17:25:50 -07:00
DB Lee
21f93c16c7 Add GitHub Copilot backend to SkillOpt-Sleep
Add CopilotCliBackend that drives the GitHub Copilot CLI in
non-interactive mode (copilot -p ... --output-format json) and parses the
JSONL event stream for assistant.message content. Registered as the
'copilot' backend (with aliases) and wired through the CLI, config,
experiment harness, and the Copilot MCP server's backend enum.

- Force UTF-8 decoding of CLI output (fixes cp1252 UnicodeDecodeError on
  Windows when responses contain non-cp1252 bytes).
- Minimise per-call startup: isolated COPILOT_HOME with built-in MCPs and
  custom instructions disabled, so user MCP servers are not spawned per
  call (~5x faster: 36s -> 7.4s). Override via SKILLOPT_SLEEP_COPILOT_HOME
  / SKILLOPT_SLEEP_COPILOT_MODEL / SKILLOPT_SLEEP_COPILOT_FULL_ENV.
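
As an illustration of the UTF-8 decoding and event-stream parsing described above, here is a minimal sketch. The event shape is an assumption: each line is taken to be a JSON object with a `type` key and a string `content` key, which may not match the real Copilot CLI output. The function name `extract_messages` is hypothetical.

```python
import json


def extract_messages(raw: bytes, event_type: str = "assistant.message") -> list:
    """Collect message text from a JSONL byte stream.

    Assumed (not confirmed) event shape: {"type": ..., "content": "..."}.
    """
    # Decode explicitly as UTF-8 so the platform default (e.g. cp1252) is never used.
    text = raw.decode("utf-8", errors="replace")
    messages = []
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON noise such as progress output
        if isinstance(event, dict) and event.get("type") == event_type:
            content = event.get("content")
            if isinstance(content, str):
                messages.append(content)
    return messages
```

Decoding the captured bytes by hand, rather than letting the subprocess layer pick an encoding, is what avoids the Windows decode error.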

Validated end-to-end on real held-out tasks (researcher persona:
0.42 -> 1.00 lift; gate correctly rejects non-improving edits).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-17 17:25:50 -07:00
DB Lee
5dc894715f Add SkillOpt research-engine MCP server plugin for Copilot
Exposes scripts/train.py and scripts/eval_only.py as Copilot MCP tools
(skillopt_list_configs, skillopt_train, skillopt_eval) via a stdlib-only
stdio server, mirroring the existing SkillOpt-Sleep plugin layout.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-17 17:24:00 -07:00
Yifan Yang
b701d9b6d9 docs: move SkillOpt-Sleep into the guide; clean docs/sleep; fix guide link
Per maintainer request:
- Remove the internal/scratch docs/sleep/ tree (reports, raw logs, blog run
  JSON, sweep.jsonl) — 23 files — and the root PUBLISHING.md. These were
  working notes, not reference docs.
- Take the dedicated SkillOpt-Sleep content out of the main README (News bullet
  + section) and host it in the rendered guide instead: new section 9 in
  docs/guideline.html (deployment companion, the three plugins, opt-in
  experience replay / dream rollouts) with a sidebar entry.
- Fix the README's opening reference so "Documentation & Reproduction Guide"
  links directly to the rendered GitHub Pages page, not the raw .html source.
- Repoint the now-removed docs/sleep links in the plugin READMEs to the guide
  section.

The plugin code (plugins/, skillopt_sleep/) is unchanged; only docs move.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
2026-06-15 16:20:50 +00:00
Yifan Yang
576f2f8bad Merge pull request #59 from Elzlxx/feat/openclaw-skillopt-sleep
feat(plugins): add OpenClaw shell for SkillOpt-Sleep
2026-06-15 18:26:12 +08:00
Kirill Kostarev
31715a8b43 Add Codex Desktop transcript harvesting
2026-06-15 10:23:08 +00:00
Kirill Kostarev
d31e9d9407 Back up legacy Codex prompt during install
2026-06-15 10:21:30 +00:00
Kirill Kostarev
1953484822 Make Codex integration skill-first
2026-06-15 10:21:30 +00:00
Yifan Yang
86bad36ffe feat(sleep): SkillOpt-Sleep plugin update (preview) — engine robustness + scheduling
Updates the SkillOpt-Sleep plugin on top of the current main. User-facing and
engine improvements since the initial drop:

* Command renamed /sleep -> /skillopt-sleep across Claude Code + Codex shells;
  refreshed plugin READMEs and install scripts.
* Built-in scheduling (skillopt_sleep/scheduler.py + __main__): schedule /
  unschedule the nightly cycle without external cron wiring.
* Backend robustness: bounded retry with backoff (no more silent empty-string
  on transient 429/timeout), content-filter-safe rollout prompt, an
  output-contract guardrail that rejects edits violating the task's required
  format, and a per-sample cache key so repeated dream rollouts are independent
  samples (fixes degenerate single-sample reflection).
* consolidate / rollout / replay: parallel multi-rollout dreaming, gate-mode
  controls, TaskRecord.system framing field.
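
The bounded retry with backoff mentioned above could be sketched as follows. `TransientError`, `call_with_retry`, and the delay schedule are hypothetical names and values chosen for illustration, not the engine's actual implementation.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. HTTP 429 or a timeout)."""


def call_with_retry(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn, retrying transient failures with exponential backoff.

    The final failure is re-raised rather than replaced by an empty string,
    so callers can tell "no answer" apart from "backend unavailable".
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Injecting `sleep` lets tests record the delays instead of actually waiting.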

Scope: this commit ships only the plugin engine + shells. Research/benchmark
harnesses and their data are intentionally not included; the public package
has no dependency on them (the one research-evaluator import is now guarded).
Marked as an early preview in the README; we'll keep iterating.

99/99 unit tests pass.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
2026-06-14 16:12:00 +00:00
elzlxx
553446575a feat(plugins): add OpenClaw shell for SkillOpt-Sleep
Adds a thin OpenClaw shell wrapping the SkillOpt-Sleep engine. Enables
nightly validation-gated skill improvement cycles for OpenClaw agents.

Components:
- skillopt_sleep_openclaw.py: DeepSeek V4 Pro + Ollama nomic-embed-text
  backend, mirroring the Claude/Codex/Copilot backend pattern.
- run_sleep.py: CLI entry point supporting dry-run and pre-built task files.
- run_sleep_cron.sh: bash wrapper for nightly cron invocation.
- slash_sleep.py: /sleep command (status / run / adopt / reject / cost).
- config.json: engine config tuned for our stack.
- SKILL.md: OpenClaw skill manifest.
- tests/: 14 held-out tasks across 3 categories (research-cron, devops, wiki).

OpenClaw is the 4th ecosystem in which SkillOpt-Sleep can be deployed,
joining Claude Code, Codex, and Copilot. The shell follows the same
single-engine / thin-shell pattern as the existing three plugins.

End-to-end tested: pipeline runs against real OpenClaw session transcripts,
gate correctly rejects non-improvements, staging artifacts land in
~/.skillopt-sleep/staging/<night>/. Cost: ~$0.02/night on DeepSeek V4 Pro.
2026-06-14 23:27:54 +08:00
Yifan Yang
dae974a5e3 chore(sleep): English-only across the engine, plugins, and docs
Remove every non-ASCII/CJK character for a professional open-source repo:
  - harvest.py: drop hardcoded Chinese feedback phrases; add an env-based
    extensibility hook (SKILLOPT_SLEEP_NEG_FEEDBACK / _POS_FEEDBACK) so any
    locale can be added without baking one in. Verified with a German example.
  - rollout.py / consolidate.py: English comments.
  - README.md section heading + anchor, CONTROLLABLE_DREAMING.md, plugin.json,
    marketplace.json (also fixed stale path skillopt-sleep-plugin ->
    plugins/claude-code), SKILL.md: English only.
  - Remove the internal WAKE_UP_SUMMARY.md note (not user-facing, not referenced).
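
One way the env-based feedback hook could work is sketched below. The comma-separated format, the lowercasing, and the helper name `feedback_phrases` are assumptions; the message names only the environment variables.

```python
import os


def feedback_phrases(env_name, defaults, environ=os.environ):
    """Merge built-in feedback phrases with extra ones from the environment.

    Assumption: the variable holds a comma-separated list of phrases.
    """
    raw = environ.get(env_name, "")
    extra = [p.strip().lower() for p in raw.split(",") if p.strip()]
    # Keep the defaults first and skip phrases that are already present.
    return list(defaults) + [p for p in extra if p not in defaults]
```

For example, setting SKILLOPT_SLEEP_NEG_FEEDBACK to a list of German phrases would add them to the negative set without any code change.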

Verified: zero CJK chars remain anywhere; 29 tests pass.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
2026-06-08 14:31:52 +00:00
Yifan Yang
f9db99853b feat(plugins): ship SkillOpt-Sleep for Claude Code, Codex, and Copilot
Restructure into plugins/{claude-code,codex,copilot}/ — one engine, three thin
shells, all calling the shared plugins/run-sleep.sh -> python -m skillopt_sleep.

  - claude-code/: existing plugin moved here; runner delegates to the shared
    launcher (fixes repo-root resolution after the move).
  - codex/: ~/.codex/prompts/sleep.md custom prompt + ~/.agents/skills SKILL.md +
    install.sh + AGENTS.md hint — Codex's documented, stable extension surfaces.
  - copilot/: a stdlib-only MCP server (mcp_server.py) exposing sleep_* tools,
    plus mcp-config.example.json and a copilot-instructions snippet. Verified end
    to end (initialize -> tools/list -> tools/call returns real engine output).
  - plugins/README.md overview table; main README News + a dedicated SkillOpt-Sleep
    section; pyproject lists skillopt_sleep as a first-class package.

Decoupling emphasized throughout: open-source tool (skillopt_sleep/) with zero
dependency on the research package. 29 tests pass; all three shells resolve.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
2026-06-08 14:31:52 +00:00