Add Devin plugin (plugins/devin): MCP server + ATIF-v1.7 harvest

Wires the skillopt_sleep engine into Devin (Cognition) via an MCP server,
following the same thin-shell pattern as plugins/copilot.

- mcp_server.py: stdlib-only stdio MCP server exposing the standard sleep_*
  tools (status, dry-run, run, adopt, harvest). REPO_ROOT defaults to ../.. so
  it finds skillopt_sleep automatically when run from plugins/devin/.
- harvest_devin.py: converts Devin ATIF-v1.7 transcripts, agentmemory, and
  .devin/skills/*/SKILL.md into the Claude Code-compatible JSONL the engine
  consumes; enriches with taskKey + outcome envelopes (hard test/build signal
  or judge rubric). Workspace auto-detection; cross-platform paths.
- judge.py, mcp-config.example.json, devin-rules.snippet.md, README.md.
- plugins/README.md: add Devin to the platform + install tables.

No changes to skillopt_sleep; shells out to `python -m skillopt_sleep` like the
other plugins. Pure stdlib; default backend mock (no API spend).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
khashayar
2026-06-25 10:42:52 +02:00
parent 2d7e37a395
commit bec23ed020
7 changed files with 983 additions and 2 deletions

View File

@@ -1,4 +1,4 @@
# SkillOpt-Sleep — plugins for Claude Code, Codex, and Copilot
# SkillOpt-Sleep — plugins for Claude Code, Codex, Copilot, and Devin
**Your coding agent forgets everything between sessions. SkillOpt-Sleep fixes
that.** While you sleep, it reviews what you did today, notices the rules you
@@ -8,7 +8,7 @@ only the rules that actually make it score better on *your own* past tasks. You
wake up to an agent that's better at *your* work, and you approve every change
before it sticks.
One engine, three thin shells. It synthesizes **SkillOpt** (validation-gated
One engine, four thin shells. It synthesizes **SkillOpt** (validation-gated
bounded text optimization — the research in this repo), **Claude Dreams**
(offline consolidation; input never mutated; review-then-adopt), and the **agent
sleep** idea (short-term experience → long-term competence).
@@ -25,6 +25,7 @@ sleep** idea (short-term experience → long-term competence).
| **Claude Code** | [`claude-code/`](claude-code) | `.claude-plugin` + `/skillopt-sleep` command + skill + hooks | full, installable |
| **Codex** | [`codex/`](codex) | user-level `skillopt-sleep` skill + shared runner | full |
| **Copilot** | [`copilot/`](copilot) | MCP server (`sleep_*` tools) + `copilot-instructions` | full (MCP) |
| **Devin** | [`devin/`](devin) | MCP server (`sleep_*` tools) + Devin ATIF-v1.7 harvest + `.devin/rules` | full (MCP) |
## Install (pick your agent)
@@ -33,6 +34,7 @@ sleep** idea (short-term experience → long-term competence).
| **Claude Code** | `/plugin marketplace add microsoft/SkillOpt``/plugin install skillopt-sleep` | `/skillopt-sleep status` |
| **Codex** | `git clone``bash plugins/codex/install.sh` | `/skillopt-sleep status` |
| **Copilot** | `git clone` → register `plugins/copilot/mcp_server.py` as an MCP server | ask "run the sleep cycle" |
| **Devin** | `git clone``devin mcp add skillopt-sleep -- python3 plugins/devin/mcp_server.py` | ask "run the sleep cycle" |
Requirements: Python ≥ 3.10 and the agent's CLI on PATH. All three call the same
[`run-sleep.sh`](run-sleep.sh) → `python -m skillopt_sleep`, so behaviour is

63
plugins/devin/README.md Normal file
View File

@@ -0,0 +1,63 @@
# SkillOpt-Sleep — Devin integration
Give **Devin** (Cognition) a nightly **sleep cycle** via a tiny **MCP server**
that exposes the `skillopt_sleep` engine as tools. MCP is Devin's supported way
to add custom tooling, so this works in Devin's CLI and IDE.
Devin doesn't write transcripts in the format the engine consumes, so this
plugin adds a **Devin-specific harvester** that converts every locally available
source into the Claude Code-compatible JSONL the engine reads.
## What's here
| File | Purpose |
|---|---|
| `mcp_server.py` | stdlib-only MCP (stdio) server exposing `sleep_*` tools |
| `harvest_devin.py` | converts Devin ATIF-v1.7 transcripts + agentmemory + `.devin/skills` into JSONL, with `taskKey` + outcome envelopes |
| `judge.py` | reference judge for the deferred/judge branch of the validation gate |
| `mcp-config.example.json` | drop-in MCP server config |
| `devin-rules.snippet.md` | paste into `.devin/rules/skillopt-sleep.md` |
## What it harvests
| Source | Where |
|---|---|
| Devin transcripts (ATIF-v1.7) | `~/.local/share/devin/cli/transcripts/*.json` |
| agentmemory | `~/.agentmemory/standalone.json` |
| Skill files | `.devin/skills/*/SKILL.md` |
Workspaces are auto-detected from `~/.config/Devin/User/workspaceStorage/*/workspace.json`.
After `sleep_adopt`, the evolved skill is synced to `.devin/skills/skillopt-sleep-learned/SKILL.md`.
## Install
Requires Python ≥ 3.10. No third-party packages — the server is pure stdlib.
1. **Register the MCP server.** Use `mcp-config.example.json` as a template; set
`args` to the absolute path of this `mcp_server.py`. The engine is found
automatically (this plugin lives inside the SkillOpt repo). Or via the Devin
CLI:
```bash
devin mcp add skillopt-sleep \
--env "SKILLOPT_DEVIN_CLAUDE_HOME=$HOME/.skillopt-sleep-devin" \
-- python3 /abs/path/to/SkillOpt/plugins/devin/mcp_server.py
```
2. **(Optional)** copy `devin-rules.snippet.md` to `.devin/rules/skillopt-sleep.md`
so Devin proactively offers the tools.
3. Ask Devin: *"run the sleep cycle"*, *"what did the last sleep propose?"*, *"adopt it"*.
## Tools
| Tool | What it does |
|---|---|
| `sleep_status` | nights run so far + latest staged proposal |
| `sleep_dry_run` | preview cycle — no staging, no changes |
| `sleep_run` | full cycle; stages a proposal for review |
| `sleep_adopt` | apply the staged proposal; syncs skill to the workspace |
| `sleep_harvest` | debug: list the recurring tasks mined |
Default backend is `mock` (no API spend); `--backend claude|codex` uses your own
budget. Same engine as the other plugins — all call `python -m skillopt_sleep`.

View File

@@ -0,0 +1,18 @@
# SkillOpt-Sleep (Devin)
You have access to a nightly self-evolution cycle via the `skillopt-sleep` MCP
server. Use these tools to improve your long-term skills over time:
- **`sleep_status`** — how many nights have run + the latest staged proposal
- **`sleep_dry_run`** — preview a cycle without changing anything
- **`sleep_run`** — run a full cycle; stages a proposal for review
- **`sleep_adopt`** — apply the staged proposal to `.devin/skills/skillopt-sleep-learned/SKILL.md`
- **`sleep_harvest`** — debug: list the recurring tasks mined from recent sessions
When a user asks about the sleep cycle, skill evolution, or improving your
long-term memory, prefer calling these tools over explaining the concept.
Default backend is `mock` (no API spend). Pass `backend: "claude"` or
`backend: "codex"` with your own API key for real LLM-driven optimization.
Place this file at `.devin/rules/skillopt-sleep.md` in your workspace.

View File

@@ -0,0 +1,530 @@
#!/usr/bin/env python3
"""Convert Devin IDE local data into Claude Code-format JSONL transcripts.
Devin (Cognition) does not persist agent conversation transcripts to disk in a
format the sleep engine understands. This script bridges that gap by synthesising
JSONL files from every locally available source:
1. **Devin transcripts** (~/.local/share/devin/cli/transcripts/*.json)
Native ATIF-v1.7 format — source:"user" / source:"agent" messages
converted directly to user/assistant JSONL turns.
2. **agentmemory** (~/.agentmemory/standalone.json)
Memories saved by the `agentmemory` MCP server — each memory's title
becomes a synthetic user prompt; its content becomes the assistant reply.
3. **Skill files** (.devin/skills/*/SKILL.md)
Each skill description is converted to a session where the user asked
"use the <skill> skill" and the assistant described how to apply it.
Output layout (mirrors ~/.claude/projects/<slug>/<sessionId>.jsonl):
<out_dir>/projects/<slug>/<session_id>.jsonl
Workspace auto-detection order:
1. ``SKILLOPT_DEVIN_WORKSPACES`` env var — colon-separated abs paths
2. Devin registry: ``~/.config/Devin/User/workspaceStorage/*/workspace.json``
4. Working directory fallback
Usage (standalone):
python harvest_devin.py [--out-dir PATH] [--workspaces PATH ...]
"""
from __future__ import annotations
import argparse
import hashlib
import json
import os
import re
import sys
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional
from urllib.parse import unquote, urlparse
# ── cross-platform path resolution (Linux + Windows + macOS) ──────────────────
#
# Devin is a VS Code-family app, so its user-data dir moves with the OS:
# Linux ~/.config/<App>, Windows %APPDATA%\<App>, macOS
# ~/Library/Application Support/<App>. Resolve all candidates and let callers
# keep whichever actually exists.
def _app_data_roots(app: str) -> List[str]:
"""User-data dir candidates for a VS Code-family app, current OS first."""
home = os.path.expanduser("~")
roots: List[str] = []
if os.name == "nt":
appdata = os.environ.get("APPDATA") or os.path.join(home, "AppData", "Roaming")
roots.append(os.path.join(appdata, app))
elif sys.platform == "darwin":
roots.append(os.path.join(home, "Library", "Application Support", app))
# XDG / Linux (also a sensible fallback everywhere)
xdg = os.environ.get("XDG_CONFIG_HOME") or os.path.join(home, ".config")
roots.append(os.path.join(xdg, app))
# de-dupe, preserve order
return list(dict.fromkeys(roots))
def _devin_transcript_candidates() -> List[str]:
"""Where the Devin CLI may store ATIF transcripts, per OS."""
home = os.path.expanduser("~")
cands: List[str] = []
if os.name == "nt":
for base in (os.environ.get("LOCALAPPDATA"), os.environ.get("APPDATA")):
if base:
cands.append(os.path.join(base, "devin", "cli", "transcripts"))
elif sys.platform == "darwin":
cands.append(os.path.join(home, "Library", "Application Support",
"devin", "cli", "transcripts"))
cands.append(os.path.join(home, ".local", "share", "devin", "cli", "transcripts"))
return list(dict.fromkeys(cands))
def _first_existing(paths: List[str]) -> str:
"""First path that exists, else the first candidate (for nice messaging)."""
for p in paths:
if os.path.exists(p):
return p
return paths[0] if paths else ""
def _uri_to_path(folder: str) -> str:
"""Convert a VS Code ``file://`` workspace URI to a local path, cross-platform.
Linux: file:///home/u/proj -> /home/u/proj
Windows: file:///c%3A/Users/u/p -> c:/Users/u/p
"""
if not folder.startswith("file://"):
return folder
path = unquote(urlparse(folder).path)
# Windows drive paths come through as '/C:/...' — strip the leading slash.
if os.name == "nt" and re.match(r"^/[A-Za-z]:", path):
path = path[1:]
return path
# ── workspace auto-detection ─────────────────────────────────────────────────
def _workspaces_from_registry(storage_root: str) -> List[tuple]:
"""Read VS Code-style workspaceStorage to get (mtime, path) pairs."""
results: List[tuple] = []
if not os.path.isdir(storage_root):
return results
for entry in os.scandir(storage_root):
ws_json = os.path.join(entry.path, "workspace.json")
if not os.path.isfile(ws_json):
continue
try:
with open(ws_json, encoding="utf-8") as f:
data = json.load(f)
folder = _uri_to_path(data.get("folder", ""))
if folder and os.path.isdir(folder):
results.append((os.path.getmtime(ws_json), folder))
except Exception:
continue
return results
def _detect_workspaces() -> List[str]:
"""Return known workspace paths (Devin registry), newest first."""
env_val = os.environ.get("SKILLOPT_DEVIN_WORKSPACES", "")
if env_val:
# os.pathsep so Windows 'C:\a;C:\b' splits correctly (not on the drive colon)
return [p for p in env_val.split(os.pathsep) if p and os.path.isdir(p)]
registries: List[str] = [
os.path.join(r, "User", "workspaceStorage")
for r in _app_data_roots("Devin")
]
seen: set = set()
results: List[tuple] = []
for registry in registries:
for mtime, folder in _workspaces_from_registry(registry):
if folder not in seen:
seen.add(folder)
results.append((mtime, folder))
results.sort(reverse=True)
paths = [p for _, p in results]
return paths if paths else [os.getcwd()]
# ── helpers ───────────────────────────────────────────────────────────────────
def _slug(path: str) -> str:
"""SHA-256 of abs-path, first 16 hex chars — matches Claude Code's scheme."""
return hashlib.sha256(os.path.abspath(path).encode()).hexdigest()[:16]
def _iso(epoch_ms: Optional[float] = None) -> str:
dt = (datetime.fromtimestamp(epoch_ms / 1000.0, tz=timezone.utc)
if epoch_ms is not None else datetime.now(tz=timezone.utc))
return dt.strftime("%Y-%m-%dT%H:%M:%S.000Z")
def _write_session(
out_dir: str, project: str, session_id: str,
user_prompts: List[str], assistant_replies: List[str],
timestamp_base_ms: float,
task_key: Optional[str] = None,
) -> None:
slug = _slug(project)
session_dir = os.path.join(out_dir, "projects", slug)
os.makedirs(session_dir, exist_ok=True)
out_path = os.path.join(session_dir, f"{session_id}.jsonl")
ts = timestamp_base_ms
with open(out_path, "w", encoding="utf-8") as f:
for user_text, asst_text in zip(user_prompts, assistant_replies):
user_rec = {
"type": "user",
"message": {"role": "user", "content": user_text},
"cwd": project,
"timestamp": _iso(ts),
"sessionId": session_id,
"version": "1.0",
}
if task_key:
# grouping key so the miner can collapse repeats into one recurring task
user_rec["taskKey"] = task_key
f.write(json.dumps(user_rec, ensure_ascii=False) + "\n")
ts += 1000
f.write(json.dumps({
"type": "assistant",
"message": {"role": "assistant", "content": asst_text},
"timestamp": _iso(ts),
"sessionId": session_id,
"version": "1.0",
}, ensure_ascii=False) + "\n")
ts += 2000
def _append_history(out_dir: str, display: str, project: str, timestamp_ms: float) -> None:
record = {"display": display, "timestamp": timestamp_ms, "project": project}
with open(os.path.join(out_dir, "history.jsonl"), "a", encoding="utf-8") as f:
f.write(json.dumps(record, ensure_ascii=False) + "\n")
def _infer_project(text: str, workspaces: List[str]) -> str:
for ws in workspaces:
if os.path.basename(ws.rstrip("/")).lower() in text.lower():
return ws
return workspaces[0] if workspaces else os.getcwd()
# ── task identity + outcome extraction (fuel for the validation gate) ─────────
#
# SkillOpt's gate only works "where tasks recur and have a checkable correctness
# signal." These helpers add the two things a raw transcript lacks:
# * a stable taskKey so repeats collapse into one recurring task, and
# * an outcome envelope (success + verifier + re-runnable reference) so the
# held-out replay has something to score against.
_LANG_HINTS = [
("java", r"(java|spring|maven|\bmvn\b|gradle|\.java\b|lombok)"),
("python", r"(python|pytest|\bpip\b|\.py\b|django|flask)"),
("ts", r"(typescript|\.tsx?\b|\bnpm\b|jest|node)"),
("js", r"(javascript|\.jsx?\b)"),
("sql", r"(\bsql\b|select\s|mariadb|mysql|postgres|\.sql\b)"),
("go", r"(golang|\bgo test\b|\.go\b)"),
("rust", r"(rust|cargo|\.rs\b)"),
]
_INTENT_HINTS = [
("fix", r"(fix|bug|error|fail|npe|exception|broken|crash)"),
("implement", r"(implement|add|create|build|introduce|support)"),
("refactor", r"(refactor|clean ?up|rename|extract|simplify)"),
("test", r"(test|coverage|assert)"),
("review", r"(review|audit|inspect)"),
("optimize", r"(optimi[sz]e|perf|speed up|slow)"),
("explain", r"(explain|understand|what does|how does)"),
]
_STOPWORDS = {"please", "this", "that", "with", "from", "into", "should",
"would", "code", "using", "the", "have"}
def _normalize_task_key(text: str, project: str) -> str:
"""Stable '<lang>:<intent>:<target>' grouping key for a task."""
low = text.lower()
lang = next((n for n, pat in _LANG_HINTS if re.search(pat, low)), "general")
intent = next((n for n, pat in _INTENT_HINTS if re.search(pat, low)), "task")
# target: prefer a CamelCase identifier, then a filename, then first real word
m = re.search(r"\b([A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+)\b", text) # CamelCase
if not m:
m = re.search(r"\b([\w-]+\.\w+)\b", text) # filename.ext
if m:
target = m.group(1)
else:
# first content word that isn't a stopword or an intent verb (e.g. "implement")
target = next((w for w in re.findall(r"[a-zA-Z]{4,}", low)
if w not in _STOPWORDS
and not any(re.search(pat, w) for _, pat in _INTENT_HINTS)),
"general")
target = re.sub(r"[^a-zA-Z0-9]+", "-", target).strip("-").lower()[:40] or "general"
return f"{lang}:{intent}:{target}"
_PASS_PAT = re.compile(
r"(build success|all tests? pass(?:ed)?|\b\d+ passed\b|\b0 failed\b|"
r"tests? pass(?:ed)?|✓|no errors)", re.IGNORECASE)
_FAIL_PAT = re.compile(
r"(build failure|tests? failed|\b[1-9]\d* failed\b|error:|traceback|"
r"assertion ?error)", re.IGNORECASE) # note: "0 failed" must NOT match
_CMD_PAT = re.compile(
r"((?:rtk\s+)?(?:mvn|gradle|pytest|npm(?:\s+run)?\s+test|yarn\s+test|"
r"go\s+test|cargo\s+test)[^\n`]*)", re.IGNORECASE)
def _detect_outcome(messages: List[str]) -> Optional[Dict[str, Any]]:
"""Best-effort checkable signal from agent messages. None ⇒ no hard signal."""
blob = "\n".join(m for m in messages if m)
pass_hit, fail_hit = _PASS_PAT.search(blob), _FAIL_PAT.search(blob)
if not pass_hit and not fail_hit:
return None
verifier = "tests" if re.search(r"test|pytest", blob, re.IGNORECASE) else "build"
out: Dict[str, Any] = {
"success": bool(pass_hit) and not fail_hit,
"verifier": verifier,
"evidence": (pass_hit or fail_hit).group(0).strip(),
}
cmd = _CMD_PAT.search(blob)
if cmd:
# keep only the command itself, dropping any "-> result" / ": output" tail
repro = re.split(r"\s*(?:->|→|:|,)\s*", cmd.group(1))[0].strip()
out["reference"] = {"repro": repro}
return out
def _build_rubric(user_prompt: str) -> List[str]:
"""Derive checkable criteria from the task so a judge has something to score."""
crit: List[str] = []
ids = re.findall(r"\b([A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+|[\w-]+\.\w+)\b", user_prompt)
for i in dict.fromkeys(ids): # dedupe, preserve order
crit.append(f"Addresses {i}")
intent = _normalize_task_key(user_prompt, "").split(":")[1]
crit.append({
"fix": "Resolves the reported defect without introducing new errors",
"implement": "Implements the requested behavior end to end",
"refactor": "Preserves behavior while improving structure",
"test": "Adds or fixes tests that actually exercise the change",
"optimize": "Improves performance without changing results",
}.get(intent, "Satisfies the user's stated request"))
crit.append("Response is concrete and actionable, not a restatement of the task")
return crit[:5]
def _judge_rubric_fallback(user_prompt: str) -> Dict[str, Any]:
"""When no hard signal exists, attach a rubric and mark the task for judge
scoring. success=None tells the gate to defer/judge rather than trust it.
The actual scoring is done by judge.py (or the engine) at replay time."""
return {
"success": None,
"verifier": "judge",
"rubric": _build_rubric(user_prompt or ""),
}
def _write_outcome(out_dir: str, session_id: str, task_key: str, project: str,
ts_ms: float, outcome: Dict[str, Any]) -> None:
rec = {"type": "outcome", "sessionId": session_id, "taskKey": task_key,
"project": project, "timestamp": _iso(ts_ms), **outcome}
with open(os.path.join(out_dir, "outcomes.jsonl"), "a", encoding="utf-8") as f:
f.write(json.dumps(rec, ensure_ascii=False) + "\n")
# ── source 1: Devin ATIF-v1.7 transcripts ────────────────────────────────────
def harvest_devin_transcripts(
transcripts_dir: str, out_dir: str, workspaces: List[str]
) -> int:
"""Convert Devin CLI ATIF-v1.7 transcripts to Claude Code JSONL."""
if not os.path.isdir(transcripts_dir):
return 0
written = 0
for entry in os.scandir(transcripts_dir):
if not entry.name.endswith(".json"):
continue
try:
with open(entry.path, encoding="utf-8") as f:
data = json.load(f)
except Exception:
continue
if data.get("schema_version", "").startswith("ATIF"):
pass # Devin native format
else:
continue
session_id = data.get("session_id") or entry.name[:-5]
steps = data.get("steps") or []
user_prompts: List[str] = []
agent_replies: List[str] = []
project = ""
ts_base: Optional[float] = None
for step in steps:
src = step.get("source", "")
msg = str(step.get("message") or "").strip()
if not msg or src == "system":
continue
if src == "user":
user_prompts.append(msg)
if not project:
project = _infer_project(msg, workspaces)
elif src == "agent":
agent_replies.append(msg)
if ts_base is None:
raw_ts = step.get("timestamp", "")
if raw_ts:
try:
from datetime import datetime as _dt
ts_base = _dt.fromisoformat(
raw_ts.replace("Z", "+00:00")
).timestamp() * 1000
except Exception:
pass
if not user_prompts:
continue
if not project:
project = workspaces[0] if workspaces else os.getcwd()
if ts_base is None:
ts_base = datetime.now(tz=timezone.utc).timestamp() * 1000
# Identity + outcome: what makes this trajectory replayable & gradeable.
task_key = _normalize_task_key(user_prompts[0], project)
outcome = _detect_outcome(agent_replies) or _judge_rubric_fallback(user_prompts[0])
# Pair turns; pad shorter list
n = max(len(user_prompts), len(agent_replies))
user_prompts += [""] * (n - len(user_prompts))
agent_replies += [""] * (n - len(agent_replies))
sid = f"devin_{session_id}"
_write_session(
out_dir, project, sid,
user_prompts=[p for p in user_prompts if p],
assistant_replies=[r if r else "[no reply recorded]" for r, p in
zip(agent_replies, user_prompts) if p],
timestamp_base_ms=ts_base,
task_key=task_key,
)
_write_outcome(out_dir, sid, task_key, project, ts_base, outcome)
_append_history(
out_dir,
display=(user_prompts[0] or session_id)[:120],
project=project,
timestamp_ms=ts_base,
)
written += 1
return written
# ── source 2: agentmemory ─────────────────────────────────────────────────────
def harvest_agentmemory(agentmemory_path: str, out_dir: str,
workspaces: List[str]) -> int:
if not os.path.isfile(agentmemory_path):
return 0
with open(agentmemory_path, encoding="utf-8") as f:
data = json.load(f)
memories: Dict[str, Any] = data.get("mem:memories", {})
written = 0
base_ts = datetime.now(tz=timezone.utc).timestamp() * 1000 - len(memories) * 60_000
for i, (mem_id, mem) in enumerate(memories.items()):
title = str(mem.get("title", "")).strip()
content = str(mem.get("content", "")).strip()
if not title or not content:
continue
project = _infer_project(title + " " + content, workspaces)
ts = base_ts + i * 60_000
_write_session(out_dir, project, mem_id,
user_prompts=[title],
assistant_replies=[content],
timestamp_base_ms=ts)
_append_history(out_dir, display=title[:120], project=project, timestamp_ms=ts)
written += 1
return written
# ── source 3: skill files (.devin/skills) ─────────────────────────────────────
def harvest_skills(workspaces: List[str], out_dir: str) -> int:
written = 0
seen_ids: set = set()
for ws in workspaces:
skills_root = os.path.join(ws, ".devin", "skills")
if not os.path.isdir(skills_root):
continue
for skill_dir in os.scandir(skills_root):
if not skill_dir.is_dir():
continue
skill_md = os.path.join(skill_dir.path, "SKILL.md")
if not os.path.isfile(skill_md):
continue
sid = f"skill_{skill_dir.name}"
if sid in seen_ids:
continue
seen_ids.add(sid)
with open(skill_md, encoding="utf-8") as f:
raw = f.read()
body = re.sub(r"^---.*?---\s*", "", raw, flags=re.DOTALL).strip()
if not body:
continue
first_line = body.split("\n")[0].lstrip("# ").strip()
user_ask = f"Please use the {skill_dir.name} skill: {first_line}"
ts = datetime.now(tz=timezone.utc).timestamp() * 1000 - 3_600_000
_write_session(out_dir, ws, sid,
user_prompts=[user_ask],
assistant_replies=[body[:1200]],
timestamp_base_ms=ts)
_append_history(out_dir, display=user_ask[:120], project=ws, timestamp_ms=ts)
written += 1
return written
# ── main ─────────────────────────────────────────────────────────────────────
def main(argv=None) -> int:
parser = argparse.ArgumentParser(
description="Generate SkillOpt-Sleep transcripts from Devin local data"
)
parser.add_argument(
"--out-dir",
default=os.path.expanduser("~/.skillopt-sleep-devin"),
help="Output claude_home dir (default: ~/.skillopt-sleep-devin)",
)
parser.add_argument(
"--agentmemory",
default=os.path.expanduser("~/.agentmemory/standalone.json"),
help="Path to agentmemory standalone.json",
)
parser.add_argument(
"--devin-transcripts",
default=_first_existing(_devin_transcript_candidates()),
help="Devin CLI ATIF transcripts directory (default: per-OS auto-detect)",
)
parser.add_argument(
"--workspaces", nargs="*",
help="Workspace paths (default: auto-detect from Devin registry)",
)
parser.add_argument("--quiet", action="store_true")
args = parser.parse_args(argv)
out_dir = os.path.expanduser(args.out_dir)
os.makedirs(out_dir, exist_ok=True)
os.makedirs(os.path.join(out_dir, "projects"), exist_ok=True)
workspaces = args.workspaces or _detect_workspaces()
workspaces = [ws for ws in workspaces if os.path.isdir(ws)]
if not workspaces:
workspaces = [os.getcwd()]
total = 0
devin_transcripts = os.path.expanduser(args.devin_transcripts)
n = harvest_devin_transcripts(devin_transcripts, out_dir, workspaces)
if not args.quiet:
print(f"[harvest_devin] devin : {n} sessions")
total += n
n = harvest_agentmemory(args.agentmemory, out_dir, workspaces)
if not args.quiet:
print(f"[harvest_devin] agentmemory : {n} sessions")
total += n
n = harvest_skills(workspaces, out_dir)
if not args.quiet:
print(f"[harvest_devin] skill files : {n} sessions")
total += n
if not args.quiet:
print(f"[harvest_devin] total : {total} synthetic sessions → {out_dir}")
return 0
if __name__ == "__main__":
raise SystemExit(main())

129
plugins/devin/judge.py Normal file
View File

@@ -0,0 +1,129 @@
#!/usr/bin/env python3
"""Reference judge for SkillOpt-Sleep — score a candidate reply against a rubric.
Tasks harvested without a hard test/build signal get ``verifier: "judge"`` and a
``rubric`` (see ``_build_rubric`` in harvest_devin.py). This module is the
scorer the validation gate calls for those tasks: given the rubric and a
candidate reply produced during replay, it returns a score in ``[0, 1]``. The
gate accepts a skill edit only if the *new* skill scores strictly higher on the
held-out tasks.
It is self-contained on purpose — in a full deployment the SkillOpt engine owns
replay+scoring, but having a runnable reference here lets you sanity-check the
judge path without the engine.
Backends (select via ``SKILLOPT_JUDGE``):
* ``heuristic`` (default) — keyword-coverage, offline, no API key, deterministic.
* ``claude`` — LLM judge via the Anthropic API (needs ANTHROPIC_API_KEY).
Usage:
python judge.py --rubric rubric.json --reply reply.txt
echo "<reply>" | python judge.py --rubric-inline '["Addresses OrderService", ...]'
"""
from __future__ import annotations
import argparse
import json
import os
import re
import sys
from typing import List
_STOPWORDS = {"addresses", "resolves", "implements", "without", "introducing",
"behavior", "request", "response", "concrete", "actionable", "not",
"the", "and", "that", "with", "stated", "reported", "actually",
"preserves", "improving", "structure", "requested", "satisfies"}
# Cheap, fast model is the right default for a judge.
_JUDGE_MODEL = os.environ.get("SKILLOPT_JUDGE_MODEL", "claude-haiku-4-5-20251001")
def _content_words(text: str) -> List[str]:
return [w for w in re.findall(r"[A-Za-z][A-Za-z0-9_.\-]{3,}", text.lower())
if w not in _STOPWORDS]
def heuristic_score(reply: str, rubric: List[str]) -> float:
"""Fraction of rubric criteria whose key content words appear in the reply.
Crude but deterministic: each criterion is 'met' if at least one of its
content words shows up in the candidate reply. Good enough to smoke-test the
gate wiring; swap in the claude backend for real judging.
"""
if not rubric:
return 0.0
low = reply.lower()
met = 0
for criterion in rubric:
words = _content_words(criterion)
if not words: # nothing to check → treat as met
met += 1
continue
if any(w in low for w in words):
met += 1
return round(met / len(rubric), 3)
def claude_score(reply: str, rubric: List[str]) -> float:
"""LLM judge via the Anthropic API. Returns a 0..1 score.
Stdlib-only (urllib) so this file stays dependency-free. Falls back to the
heuristic if the key is missing or the call fails, so the gate never hard-errors.
"""
api_key = os.environ.get("ANTHROPIC_API_KEY")
if not api_key:
print("[judge] ANTHROPIC_API_KEY unset — using heuristic", file=sys.stderr)
return heuristic_score(reply, rubric)
import urllib.request
rubric_block = "\n".join(f"- {c}" for c in rubric)
prompt = (
"You are scoring an AI agent's reply against a rubric. For each criterion, "
"decide if the reply satisfies it. Respond with ONLY a number between 0 and "
"1 — the fraction of criteria satisfied.\n\n"
f"Rubric:\n{rubric_block}\n\nReply:\n{reply}\n\nScore:"
)
body = json.dumps({
"model": _JUDGE_MODEL,
"max_tokens": 8,
"messages": [{"role": "user", "content": prompt}],
}).encode()
req = urllib.request.Request(
"https://api.anthropic.com/v1/messages", data=body,
headers={"content-type": "application/json", "x-api-key": api_key,
"anthropic-version": "2023-06-01"},
)
try:
with urllib.request.urlopen(req, timeout=30) as resp:
data = json.load(resp)
text = "".join(b.get("text", "") for b in data.get("content", []))
m = re.search(r"[01](?:\.\d+)?", text)
return max(0.0, min(1.0, float(m.group(0)))) if m else heuristic_score(reply, rubric)
except Exception as exc: # network/auth/parse — degrade gracefully
print(f"[judge] claude backend failed ({exc}) — using heuristic", file=sys.stderr)
return heuristic_score(reply, rubric)
def score(reply: str, rubric: List[str]) -> float:
backend = os.environ.get("SKILLOPT_JUDGE", "heuristic")
return claude_score(reply, rubric) if backend == "claude" else heuristic_score(reply, rubric)
def main(argv=None) -> int:
p = argparse.ArgumentParser(description="Score a reply against a rubric (0..1)")
g = p.add_mutually_exclusive_group(required=True)
g.add_argument("--rubric", help="Path to a JSON file containing a list of criteria")
g.add_argument("--rubric-inline", help="Inline JSON list of criteria")
p.add_argument("--reply", help="Path to the reply text (default: stdin)")
args = p.parse_args(argv)
rubric = (json.load(open(args.rubric, encoding="utf-8")) if args.rubric
else json.loads(args.rubric_inline))
reply = (open(args.reply, encoding="utf-8").read() if args.reply
else sys.stdin.read())
print(score(reply, rubric))
return 0
if __name__ == "__main__":
raise SystemExit(main())

View File

@@ -0,0 +1,11 @@
{
"mcpServers": {
"skillopt-sleep": {
"command": "python3",
"args": ["/abs/path/to/SkillOpt/plugins/devin/mcp_server.py"],
"env": {
"SKILLOPT_DEVIN_CLAUDE_HOME": "~/.skillopt-sleep-devin"
}
}
}
}

228
plugins/devin/mcp_server.py Normal file
View File

@@ -0,0 +1,228 @@
#!/usr/bin/env python3
"""SkillOpt-Sleep — Devin MCP server (stdio, stdlib-only).
Exposes the sleep engine as MCP tools so Devin can drive it.
Speaks JSON-RPC 2.0 over stdio with just the handful of MCP methods Devin
needs. No third-party deps beyond the SkillOpt repo itself.
Before each tool call this server runs ``harvest_devin.py`` to convert
locally available Devin data (ATIF-v1.7 transcripts, agentmemory memories,
and .devin skill files) into the Claude Code-compatible JSONL transcripts
that the sleep engine consumes.
After ``sleep_adopt`` the evolved SKILL.md is also synced back into the active
Devin workspace's ``.devin/skills/`` directory so Devin picks it up immediately.
Tools exposed (identical interface to the Copilot plugin):
sleep_status show how many nights have run + latest staged proposal
sleep_dry_run harvest+mine+replay, report only (no staging)
sleep_run full cycle; stages a reviewed proposal
sleep_adopt apply the latest staged proposal
sleep_harvest debug: list mined recurring tasks
Configure Devin to launch::
python plugins/devin/mcp_server.py
with ``SKILLOPT_SLEEP_REPO`` set to this repo's root.
"""
from __future__ import annotations
import json
import os
import shutil
import subprocess
import sys
# ── constants ─────────────────────────────────────────────────────────────────
REPO_ROOT = (
os.environ.get("SKILLOPT_SLEEP_REPO")
or os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
)
PLUGIN_DIR = os.path.dirname(os.path.abspath(__file__))
CLAUDE_HOME = os.environ.get(
"SKILLOPT_DEVIN_CLAUDE_HOME",
os.path.expanduser("~/.skillopt-sleep-devin"),
)
MANAGED_SKILL_NAME = os.environ.get("SKILLOPT_MANAGED_SKILL", "skillopt-sleep-learned")
PROTOCOL_VERSION = "2024-11-05"
TOOLS = [
{
"name": "sleep_status",
"action": "status",
"description": "Show how many SkillOpt-Sleep nights have run and the latest staged proposal.",
},
{
"name": "sleep_dry_run",
"action": "dry-run",
"description": "Preview a sleep cycle (harvest+mine+replay) without staging anything.",
},
{
"name": "sleep_run",
"action": "run",
"description": "Run a full sleep cycle; stages a reviewed proposal. Nothing live changes until adopt.",
},
{
"name": "sleep_adopt",
"action": "adopt",
"description": (
"Apply the latest staged proposal to the managed SKILL.md. "
"Also syncs the evolved skill into the Devin workspace so Devin picks it up immediately."
),
},
{
"name": "sleep_harvest",
"action": "harvest",
"description": "Debug: list the recurring tasks mined from recent Devin sessions.",
},
]
_BY_NAME = {t["name"]: t for t in TOOLS}
_TOOL_SCHEMA = {
"type": "object",
"properties": {
"project": {
"type": "string",
"description": "Project dir to evolve (default: cwd).",
},
"backend": {
"type": "string",
"enum": ["mock", "claude", "codex"],
"description": "mock = no API spend (default); claude/codex = real.",
},
"scope": {"type": "string", "enum": ["invoked", "all"]},
},
"additionalProperties": False,
}
# ── harvest step ──────────────────────────────────────────────────────────────
def _run_harvest() -> str:
harvester = os.path.join(PLUGIN_DIR, "harvest_devin.py")
env = dict(os.environ)
env["PYTHONPATH"] = REPO_ROOT + os.pathsep + env.get("PYTHONPATH", "")
try:
proc = subprocess.run(
[sys.executable, harvester, "--out-dir", CLAUDE_HOME],
capture_output=True, text=True, timeout=60, env=env,
)
out = (proc.stdout or "").strip()
err = (proc.stderr or "").strip()
return out + (("\n[harvest stderr]\n" + err) if err else "")
except Exception as exc:
return f"[harvest_devin] warning: {exc}"
# ── post-adopt: sync evolved skill into workspace (.devin) ────────────────────
def _sync_skill(project: str) -> str:
src = os.path.join(CLAUDE_HOME, "skills", MANAGED_SKILL_NAME, "SKILL.md")
if not os.path.isfile(src):
return ""
if not project or not os.path.isdir(project):
return ""
synced = []
dot_root = os.path.join(project, ".devin")
if os.path.isdir(dot_root):
dst_dir = os.path.join(dot_root, "skills", MANAGED_SKILL_NAME)
os.makedirs(dst_dir, exist_ok=True)
dst = os.path.join(dst_dir, "SKILL.md")
shutil.copy2(src, dst)
synced.append(dst)
return ("\n" + "\n".join(f"[sleep] synced evolved skill → {p}" for p in synced)
if synced else "")
# ── engine call ───────────────────────────────────────────────────────────────
def _run_engine(action: str, args: dict) -> str:
harvest_out = _run_harvest()
project = args.get("project") or os.getcwd()
backend = args.get("backend") or "mock"
scope = args.get("scope") or "invoked"
cmd = [
sys.executable, "-m", "skillopt_sleep", action,
"--claude-home", CLAUDE_HOME,
"--project", project,
"--scope", scope,
"--backend", backend,
"--source", "claude",
]
env = dict(os.environ)
env["PYTHONPATH"] = REPO_ROOT + os.pathsep + env.get("PYTHONPATH", "")
try:
proc = subprocess.run(
cmd, capture_output=True, text=True, timeout=3600, env=env,
)
except Exception as exc:
return f"[harvest]\n{harvest_out}\n[error] failed to run engine: {exc}"
out = (proc.stdout or "").strip()
err = (proc.stderr or "").strip()
result = f"[harvest]\n{harvest_out}\n\n[engine]\n{out}"
if err:
result += f"\n[stderr]\n{err}"
if action == "adopt":
result += _sync_skill(project)
return result
# ── JSON-RPC / MCP plumbing ───────────────────────────────────────────────────
def _result(id_, result):
return {"jsonrpc": "2.0", "id": id_, "result": result}
def _error(id_, code, message):
return {"jsonrpc": "2.0", "id": id_, "error": {"code": code, "message": message}}
def handle(req: dict):
method = req.get("method")
id_ = req.get("id")
if method == "initialize":
return _result(id_, {
"protocolVersion": PROTOCOL_VERSION,
"capabilities": {"tools": {}},
"serverInfo": {"name": "skillopt-sleep-devin", "version": "0.1.0"},
})
if method in ("notifications/initialized", "initialized"):
return None
if method == "tools/list":
return _result(id_, {"tools": [
{"name": t["name"], "description": t["description"],
"inputSchema": _TOOL_SCHEMA}
for t in TOOLS
]})
if method == "tools/call":
params = req.get("params") or {}
name = params.get("name")
tool = _BY_NAME.get(name)
if not tool:
return _error(id_, -32602, f"unknown tool: {name}")
text = _run_engine(tool["action"], params.get("arguments") or {})
return _result(id_, {"content": [{"type": "text", "text": text}]})
if method == "ping":
return _result(id_, {})
return _error(id_, -32601, f"method not found: {method}")
def main() -> int:
for line in sys.stdin:
line = line.strip()
if not line:
continue
try:
req = json.loads(line)
except Exception:
continue
resp = handle(req)
if resp is not None:
sys.stdout.write(json.dumps(resp) + "\n")
sys.stdout.flush()
return 0
if __name__ == "__main__":
raise SystemExit(main())