diff --git a/README.md b/README.md
index 2d208d0ea..95d361434 100644
--- a/README.md
+++ b/README.md
@@ -1068,6 +1068,13 @@ Each application received complete, production-ready CLI interfaces — not demo
✅ 158 |
+| OpenRefine |
+Data Cleaning |
+cli-anything-openrefine |
+OpenRefine local HTTP API |
+✅ 76 |
+
+
| ⚡ n8n |
Workflow Automation |
cli-anything-n8n |
@@ -1436,6 +1443,7 @@ cli-anything/
├── 🌐 browser/agent-harness/ # Browser CLI (DOMShell MCP, new)
├── 🌐 web-yu-pri/agent-harness/ # Japan Post Web Yu-pri CLI (new)
├── 📄 libreoffice/agent-harness/ # LibreOffice CLI (158 tests)
+├── 🧹 openrefine/agent-harness/ # OpenRefine CLI (76 tests: 64 unit + 12 real backend e2e)
├── 📧 mailchimp/agent-harness/ # Mailchimp Marketing API CLI (303 commands, 36 unit tests)
├── 📚 zotero/agent-harness/ # Zotero CLI (new, write import support)
├── 📖 calibre/agent-harness/ # Calibre CLI (58 tests: 38 unit + 20 E2E)
diff --git a/openrefine/agent-harness/OPENREFINE.md b/openrefine/agent-harness/OPENREFINE.md
new file mode 100644
index 000000000..71dff270c
--- /dev/null
+++ b/openrefine/agent-harness/OPENREFINE.md
@@ -0,0 +1,97 @@
+# OpenRefine CLI-Anything Harness
+
+This harness exposes OpenRefine's documented local HTTP API as a stateful, agent-friendly Click CLI.
+It does not reimplement OpenRefine data cleaning. Project creation, row reads, operation application,
+export, and undo/redo are delegated to a running OpenRefine backend.
+
+## Backend Boundary
+
+- Default backend URL: `http://127.0.0.1:3333`
+- Override with `OPENREFINE_URL` or `--base-url`
+- Expected backend: OpenRefine 3.10.x or newer
+- Startup example: `openrefine -i 127.0.0.1 -p 3333`
+
+The backend wrapper lives at `cli_anything/openrefine/utils/openrefine_backend.py`.
+It wraps these OpenRefine surfaces:
+
+- `/command/core/get-version`
+- `/command/core/get-all-project-metadata`
+- `/command/core/get-project-metadata`
+- `/command/core/create-project-from-upload`
+- `/command/core/get-rows`
+- `/command/core/apply-operations`
+- `/command/core/export-rows`
+- `/command/core/get-history`
+- `/command/core/get-csrf-token`
+- `/command/core/undo-redo`
+- `/command/core/delete-project`
+
+## CLI Model
+
+The entry point is `cli-anything-openrefine`.
+
+Running the command with no subcommand enters the default REPL. One-shot commands are grouped by domain:
+
+- `server`: backend start and ping helpers
+- `project`: list, open, and import OpenRefine projects
+- `data`: inspect rows, apply operation histories, export rows
+- `ops`: generate reusable OpenRefine operation-history JSON
+- `session`: show state and call undo/redo
+
+All commands honor the global `--json` flag for machine-readable output; pass it before the subcommand.
+
+## State Model
+
+Session state is JSON and defaults to `~/.cli-anything-openrefine/session.json`.
+Use `--session <path>` for isolated automation runs.
+
+The session stores:
+
+- backend URL
+- selected project id and name
+- last export path
+- local action history
+- redo stack
+
+Undo/redo uses OpenRefine's backend undo-redo endpoint when a project is selected. If no backend project is selected,
+the session store can still undo/redo local action history.
+
+## Operation Histories
+
+The harness passes OpenRefine operation JSON through to the backend. It also provides small builders for common operations:
+
+```bash
+cli-anything-openrefine ops text-transform ops.json --column Name --expression 'value.trim()'
+cli-anything-openrefine ops mass-edit ops.json --column City --edit NYC='New York'
+cli-anything-openrefine data apply ops.json --project-id 123456789
+```
+
+Agents can also provide existing OpenRefine operation-history JSON exported from the UI.
+
+## Install
+
+```bash
+cd openrefine/agent-harness
+python -m pip install -e .
+```
+
+## Test
+
+Backend-free unit tests:
+
+```bash
+python -m pytest cli_anything/openrefine/tests/test_core.py -v
+```
+
+Real backend E2E tests:
+
+```bash
+openrefine -i 127.0.0.1 -p 3333
+python -m pytest cli_anything/openrefine/tests/test_full_e2e.py -v
+```
+
+## Limitations
+
+- The OpenRefine HTTP API is documented as subject to change. This harness targets OpenRefine 3.10.x API behavior.
+- Reconciliation-specific commands are not first-class yet; agents can still apply exported reconciliation operation histories.
+- Long-running operations are synchronous from the harness perspective and rely on backend HTTP completion.
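
Reviewer note: the backend URL precedence described under "Backend Boundary" (and in the `--base-url` help text) can be sketched as below. This is a minimal standalone illustration, not the harness code; `resolve_base_url` and its parameters are hypothetical names, and only the ordering (flag, then `OPENREFINE_URL`, then session state, then the default) is taken from the patch.

```python
from __future__ import annotations

DEFAULT_URL = "http://127.0.0.1:3333"


def resolve_base_url(flag: str | None, env: dict[str, str], session_url: str | None) -> str:
    """Pick the backend URL: --base-url, then OPENREFINE_URL, then session state, then default."""
    # Empty strings are treated as "not set", matching plain truthiness checks.
    return flag or env.get("OPENREFINE_URL") or session_url or DEFAULT_URL


from_flag = resolve_base_url("http://10.0.0.5:3333", {"OPENREFINE_URL": "http://env:3333"}, None)
from_env = resolve_base_url(None, {"OPENREFINE_URL": "http://env:3333"}, "http://session:3333")
from_session = resolve_base_url(None, {}, "http://session:3333")
from_default = resolve_base_url(None, {}, None)
```

Passing the environment as a plain dict keeps the sketch testable without touching `os.environ`.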
diff --git a/openrefine/agent-harness/README.md b/openrefine/agent-harness/README.md
new file mode 100644
index 000000000..bf2436798
--- /dev/null
+++ b/openrefine/agent-harness/README.md
@@ -0,0 +1,22 @@
+# OpenRefine Agent Harness
+
+This is the standalone CLI-Anything harness package for OpenRefine.
+
+Install:
+
+```bash
+python -m pip install -e .
+```
+
+Run:
+
+```bash
+cli-anything-openrefine --help
+cli-anything-openrefine
+```
+
+Start OpenRefine first for backend commands:
+
+```bash
+openrefine -i 127.0.0.1 -p 3333
+```
diff --git a/openrefine/agent-harness/cli_anything/openrefine/README.md b/openrefine/agent-harness/cli_anything/openrefine/README.md
new file mode 100644
index 000000000..0eac1021b
--- /dev/null
+++ b/openrefine/agent-harness/cli_anything/openrefine/README.md
@@ -0,0 +1,19 @@
+# CLI-Anything OpenRefine
+
+Agent-native CLI for OpenRefine data wrangling through the real local HTTP API.
+
+```bash
+cli-anything-openrefine --json project import messy.csv --name cleanup
+cli-anything-openrefine --json data rows --limit 5
+cli-anything-openrefine ops text-transform trim-name.json --column Name --expression 'value.trim()'
+cli-anything-openrefine --json data apply trim-name.json
+cli-anything-openrefine --json data export clean.csv
+```
+
+Run `cli-anything-openrefine` with no arguments for the REPL.
+
+Start OpenRefine first:
+
+```bash
+openrefine -i 127.0.0.1 -p 3333
+```
diff --git a/openrefine/agent-harness/cli_anything/openrefine/__init__.py b/openrefine/agent-harness/cli_anything/openrefine/__init__.py
new file mode 100644
index 000000000..f22f701a0
--- /dev/null
+++ b/openrefine/agent-harness/cli_anything/openrefine/__init__.py
@@ -0,0 +1,3 @@
+"""CLI-Anything harness for OpenRefine."""
+
+__version__ = "1.0.0"
diff --git a/openrefine/agent-harness/cli_anything/openrefine/__main__.py b/openrefine/agent-harness/cli_anything/openrefine/__main__.py
new file mode 100644
index 000000000..66a205a9c
--- /dev/null
+++ b/openrefine/agent-harness/cli_anything/openrefine/__main__.py
@@ -0,0 +1,5 @@
+from .openrefine_cli import main
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/openrefine/agent-harness/cli_anything/openrefine/core/__init__.py b/openrefine/agent-harness/cli_anything/openrefine/core/__init__.py
new file mode 100644
index 000000000..d3c83fa8e
--- /dev/null
+++ b/openrefine/agent-harness/cli_anything/openrefine/core/__init__.py
@@ -0,0 +1 @@
+"""Core OpenRefine harness primitives."""
diff --git a/openrefine/agent-harness/cli_anything/openrefine/core/operations.py b/openrefine/agent-harness/cli_anything/openrefine/core/operations.py
new file mode 100644
index 000000000..316d75ee6
--- /dev/null
+++ b/openrefine/agent-harness/cli_anything/openrefine/core/operations.py
@@ -0,0 +1,78 @@
+from __future__ import annotations
+
+import json
+from pathlib import Path
+from typing import Any
+
+
+def load_operations(path: str | Path) -> list[dict[str, Any]]:
+    data = json.loads(Path(path).read_text(encoding="utf-8"))
+    if not isinstance(data, list):
+        raise ValueError("Operation history must be a JSON list")
+    for index, operation in enumerate(data):
+        if not isinstance(operation, dict):
+            raise ValueError(f"Operation {index} must be an object")
+    return data
+
+
+def save_operations(operations: list[dict[str, Any]], path: str | Path) -> Path:
+    target = Path(path)
+    target.parent.mkdir(parents=True, exist_ok=True)
+    target.write_text(json.dumps(operations, indent=2, sort_keys=True), encoding="utf-8")
+    return target
+
+
+def text_transform(column: str, expression: str, on_error: str = "keep-original") -> dict[str, Any]:
+    _require_text("column", column)
+    _require_text("expression", expression)
+    return {
+        "op": "core/text-transform",
+        "engineConfig": {"mode": "row-based", "facets": []},
+        "columnName": column,
+        "expression": expression,
+        "onError": on_error,
+        "repeat": False,
+        "repeatCount": 10,
+        "description": f"Text transform on {column} using expression {expression}",
+    }
+
+
+def mass_edit(column: str, edits: dict[str, str]) -> dict[str, Any]:
+    _require_text("column", column)
+    if not edits:
+        raise ValueError("edits must not be empty")
+    normalized = [{"from": [str(src)], "fromBlank": False, "fromError": False, "to": str(dst)} for src, dst in edits.items()]
+    return {
+        "op": "core/mass-edit",
+        "engineConfig": {"mode": "row-based", "facets": []},
+        "columnName": column,
+        "expression": "value",
+        "edits": normalized,
+        "description": f"Mass edit {len(edits)} value(s) in {column}",
+    }
+
+
+def column_addition(name: str, source_column: str, expression: str) -> dict[str, Any]:
+    _require_text("name", name)
+    _require_text("source_column", source_column)
+    _require_text("expression", expression)
+    return {
+        "op": "core/column-addition",
+        "engineConfig": {"mode": "row-based", "facets": []},
+        "baseColumnName": source_column,
+        "expression": expression,
+        "onError": "set-to-blank",
+        "newColumnName": name,
+        "columnInsertIndex": 1,
+        "description": f"Create column {name} from {source_column}",
+    }
+
+
+def column_removal(column: str) -> dict[str, Any]:
+    _require_text("column", column)
+    return {"op": "core/column-removal", "columnName": column, "description": f"Remove column {column}"}
+
+
+def _require_text(name: str, value: str) -> None:
+    if not isinstance(value, str) or not value.strip():
+        raise ValueError(f"{name} must be a non-empty string")
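
Reviewer note: the save/load round trip that `save_operations` and `load_operations` implement can be sketched in isolation. The helper names below (`build_text_transform`, `round_trip`) are hypothetical stand-ins that copy the dictionary shape from `text_transform` in this file; they are not part of the harness.

```python
import json
import tempfile
from pathlib import Path


def build_text_transform(column: str, expression: str) -> dict:
    """Hypothetical stand-in that mirrors the shape built by text_transform()."""
    return {
        "op": "core/text-transform",
        "engineConfig": {"mode": "row-based", "facets": []},
        "columnName": column,
        "expression": expression,
        "onError": "keep-original",
        "repeat": False,
        "repeatCount": 10,
        "description": f"Text transform on {column} using expression {expression}",
    }


def round_trip(operations: list) -> list:
    """Write the operation list as JSON to a temp file, then read and validate it."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "ops.json"
        target.write_text(json.dumps(operations, indent=2, sort_keys=True), encoding="utf-8")
        loaded = json.loads(target.read_text(encoding="utf-8"))
    # Same top-level check as load_operations: the history must be a list of objects.
    if not isinstance(loaded, list) or not all(isinstance(op, dict) for op in loaded):
        raise ValueError("Operation history must be a JSON list of objects")
    return loaded


ops = [build_text_transform("Name", "value.trim()")]
restored = round_trip(ops)
```

Because every value in the operation is a plain JSON type, the restored list compares equal to the original.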
diff --git a/openrefine/agent-harness/cli_anything/openrefine/core/project.py b/openrefine/agent-harness/cli_anything/openrefine/core/project.py
new file mode 100644
index 000000000..13c75d666
--- /dev/null
+++ b/openrefine/agent-harness/cli_anything/openrefine/core/project.py
@@ -0,0 +1,115 @@
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Any
+
+from .operations import load_operations
+from .session import SessionState, SessionStore
+from ..utils.openrefine_backend import OpenRefineBackend
+
+
+class OpenRefineService:
+    def __init__(self, backend: OpenRefineBackend, store: SessionStore):
+        self.backend = backend
+        self.store = store
+
+    def status(self) -> dict[str, Any]:
+        state = self.store.load()
+        ping = self.backend.ping()
+        return {"backend": ping, "session": state.to_dict()}
+
+    def list_projects(self) -> dict[str, Any]:
+        return self.backend.list_projects()
+
+    def open_project(self, project_id: str, name: str | None = None) -> dict[str, Any]:
+        metadata = self.backend.get_project_metadata(project_id)
+        state = self.store.load()
+        state.base_url = self._backend_base_url()
+        state.project_id = project_id
+        state.project_name = name or metadata.get("name") or metadata.get("projectName") or project_id
+        self.store.record(state, "open", {"project_id": project_id, "project_name": state.project_name})
+        self.store.save(state)
+        return {"project_id": project_id, "project_name": state.project_name, "metadata": metadata}
+
+    def import_file(self, path: str | Path, name: str | None = None, project_format: str | None = None) -> dict[str, Any]:
+        created = self.backend.create_project(path, name=name, project_format=project_format)
+        project_id = _extract_project_id(created)
+        state = self.store.load()
+        state.base_url = self._backend_base_url()
+        state.project_id = project_id
+        state.project_name = name or Path(path).stem
+        self.store.record(state, "import", {"path": str(path), "project_id": project_id, "project_name": state.project_name})
+        self.store.save(state)
+        return {"project_id": project_id, "project_name": state.project_name, "response": created}
+
+    def apply_operations_file(self, operations_path: str | Path, project_id: str | None = None) -> dict[str, Any]:
+        operations = load_operations(operations_path)
+        state = self.store.load()
+        target_id = project_id or state.project_id
+        if not target_id:
+            raise ValueError("No project selected. Pass --project-id or import/open a project first.")
+        response = self.backend.apply_operations(target_id, operations)
+        state.base_url = self._backend_base_url()
+        self.store.record(state, "apply-operations", {"project_id": target_id, "operations_path": str(operations_path), "count": len(operations)})
+        state.project_id = target_id
+        self.store.save(state)
+        return {"project_id": target_id, "operation_count": len(operations), "response": response}
+
+    def export_rows(self, output_path: str | Path, export_format: str = "csv", project_id: str | None = None) -> dict[str, Any]:
+        state = self.store.load()
+        target_id = project_id or state.project_id
+        if not target_id:
+            raise ValueError("No project selected. Pass --project-id or import/open a project first.")
+        output = self.backend.export_rows(target_id, output_path, export_format)
+        state.base_url = self._backend_base_url()
+        state.project_id = target_id
+        state.last_export = str(output)
+        self.store.record(state, "export", {"project_id": target_id, "output": str(output), "format": export_format})
+        self.store.save(state)
+        return {"project_id": target_id, "output": str(output), "format": export_format, "bytes": output.stat().st_size}
+
+    def rows(self, start: int = 0, limit: int = 10, project_id: str | None = None) -> dict[str, Any]:
+        state = self.store.load()
+        target_id = project_id or state.project_id
+        if not target_id:
+            raise ValueError("No project selected. Pass --project-id or import/open a project first.")
+        return self.backend.get_rows(target_id, start=start, limit=limit)
+
+    def undo(self, project_id: str | None = None) -> dict[str, Any]:
+        state = self.store.load()
+        target_id = project_id or state.project_id
+        if not target_id:
+            local = self.store.undo(state)
+            self.store.save(state)
+            return {"mode": "session", "undone": local}
+        response = self.backend.undo(target_id)
+        state.base_url = self._backend_base_url()
+        local = self.store.undo(state) if state.history else None
+        self.store.save(state)
+        return {"mode": "backend", "project_id": target_id, "response": response, "local": local}
+
+    def redo(self, project_id: str | None = None) -> dict[str, Any]:
+        state = self.store.load()
+        target_id = project_id or state.project_id
+        if not target_id:
+            local = self.store.redo(state)
+            self.store.save(state)
+            return {"mode": "session", "redone": local}
+        response = self.backend.redo(target_id)
+        state.base_url = self._backend_base_url()
+        local = self.store.redo(state) if state.future else None
+        self.store.save(state)
+        return {"mode": "backend", "project_id": target_id, "response": response, "local": local}
+
+    def _backend_base_url(self) -> str:
+        return str(getattr(self.backend, "base_url", SessionState().base_url))
+
+
+def _extract_project_id(payload: dict[str, Any]) -> str:
+    for key in ("project", "projectID", "project_id", "id"):
+        value = payload.get(key)
+        if value:
+            return str(value)
+    if "Location" in payload:
+        return str(payload["Location"]).rstrip("/").split("/")[-1]
+    raise ValueError(f"Could not determine project id from OpenRefine response: {payload}")
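
Reviewer note: the `Location` fallback in `_extract_project_id` keeps only the last path segment. If the backend wrapper (not shown in this part of the diff) passes through a raw redirect of the form `/project?project=<id>`, that segment would still carry the query string. The sketch below shows one possible way to handle both shapes with the standard library; `project_id_from_location` is a hypothetical helper, and the redirect format is an assumption to verify against the targeted OpenRefine version.

```python
from urllib.parse import parse_qs, urlparse


def project_id_from_location(location: str) -> str:
    """Extract a project id from a redirect URL (assumed shapes, see note above)."""
    parsed = urlparse(location)
    # Preferred: an explicit ?project=<id> query parameter.
    values = parse_qs(parsed.query).get("project")
    if values:
        return values[0]
    # Fallback: the last non-empty path segment, as the harness does today.
    tail = parsed.path.rstrip("/").split("/")[-1]
    if not tail:
        raise ValueError(f"Could not determine project id from location: {location}")
    return tail


from_query = project_id_from_location("http://127.0.0.1:3333/project?project=123456789")
from_path = project_id_from_location("http://127.0.0.1:3333/projects/42/")
```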
diff --git a/openrefine/agent-harness/cli_anything/openrefine/core/session.py b/openrefine/agent-harness/cli_anything/openrefine/core/session.py
new file mode 100644
index 000000000..7bc8e01d3
--- /dev/null
+++ b/openrefine/agent-harness/cli_anything/openrefine/core/session.py
@@ -0,0 +1,111 @@
+from __future__ import annotations
+
+import json
+import os
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Any
+
+
+DEFAULT_SESSION = Path.home() / ".cli-anything-openrefine" / "session.json"
+
+
+@dataclass
+class SessionState:
+    base_url: str = "http://127.0.0.1:3333"
+    project_id: str | None = None
+    project_name: str | None = None
+    last_export: str | None = None
+    history: list[dict[str, Any]] = field(default_factory=list)
+    future: list[dict[str, Any]] = field(default_factory=list)
+
+    def to_dict(self) -> dict[str, Any]:
+        return {
+            "base_url": self.base_url,
+            "project_id": self.project_id,
+            "project_name": self.project_name,
+            "last_export": self.last_export,
+            "history": self.history,
+            "future": self.future,
+        }
+
+    @classmethod
+    def from_dict(cls, data: dict[str, Any]) -> "SessionState":
+        return cls(
+            base_url=str(data.get("base_url") or "http://127.0.0.1:3333"),
+            project_id=data.get("project_id"),
+            project_name=data.get("project_name"),
+            last_export=data.get("last_export"),
+            history=list(data.get("history") or []),
+            future=list(data.get("future") or []),
+        )
+
+
+class SessionStore:
+    def __init__(self, path: str | Path | None = None):
+        self.path = Path(path) if path else DEFAULT_SESSION
+
+    def load(self) -> SessionState:
+        if not self.path.exists():
+            return SessionState()
+        data = json.loads(self.path.read_text(encoding="utf-8"))
+        if not isinstance(data, dict):
+            raise ValueError(f"Session file is not a JSON object: {self.path}")
+        return SessionState.from_dict(data)
+
+    def save(self, state: SessionState) -> Path:
+        _locked_save_json(self.path, state.to_dict(), indent=2, sort_keys=True)
+        return self.path
+
+    def effective_base_url(self, requested_base_url: str | None = None) -> str:
+        if requested_base_url:
+            return requested_base_url
+        try:
+            return self.load().base_url
+        except FileNotFoundError:
+            return SessionState().base_url
+
+    def record(self, state: SessionState, action: str, payload: dict[str, Any]) -> None:
+        state.history.append({"action": action, "payload": payload})
+        state.future.clear()
+
+    def undo(self, state: SessionState) -> dict[str, Any]:
+        if not state.history:
+            raise ValueError("No local session action to undo")
+        item = state.history.pop()
+        state.future.append(item)
+        return item
+
+    def redo(self, state: SessionState) -> dict[str, Any]:
+        if not state.future:
+            raise ValueError("No local session action to redo")
+        item = state.future.pop()
+        state.history.append(item)
+        return item
+
+
+def _locked_save_json(path: Path, data: dict[str, Any], **dump_kwargs: Any) -> None:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    try:
+        handle = path.open("r+", encoding="utf-8")
+    except FileNotFoundError:
+        handle = path.open("w+", encoding="utf-8")
+    with handle:
+        locked = False
+        try:
+            import fcntl
+
+            fcntl.flock(handle.fileno(), fcntl.LOCK_EX)
+            locked = True
+        except (ImportError, OSError):
+            pass
+        try:
+            handle.seek(0)
+            handle.truncate()
+            json.dump(data, handle, **dump_kwargs)
+            handle.write("\n")
+            handle.flush()
+            os.fsync(handle.fileno())
+        finally:
+            if locked:
+                fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
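
Reviewer note: the two-stack undo/redo model used by `SessionStore` (`history` plus `future`, with `record` clearing the redo stack) can be sketched on its own. `MiniSession` is a hypothetical, in-memory stand-in written for illustration; it performs no file I/O and is not part of the harness.

```python
from dataclasses import dataclass, field


@dataclass
class MiniSession:
    """In-memory illustration of the history/future stacks."""

    history: list = field(default_factory=list)
    future: list = field(default_factory=list)

    def record(self, action: str) -> None:
        # A new action invalidates anything that could have been redone.
        self.history.append(action)
        self.future.clear()

    def undo(self) -> str:
        if not self.history:
            raise ValueError("No local session action to undo")
        item = self.history.pop()
        self.future.append(item)
        return item

    def redo(self) -> str:
        if not self.future:
            raise ValueError("No local session action to redo")
        item = self.future.pop()
        self.history.append(item)
        return item


session = MiniSession()
session.record("import")
session.record("apply-operations")
undone = session.undo()
redone = session.redo()
session.undo()
session.record("export")  # clears the redo stack
```

After the final `record`, `history` holds `import` and `export`, and nothing is left to redo.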
diff --git a/openrefine/agent-harness/cli_anything/openrefine/openrefine_cli.py b/openrefine/agent-harness/cli_anything/openrefine/openrefine_cli.py
new file mode 100644
index 000000000..4b0ac449a
--- /dev/null
+++ b/openrefine/agent-harness/cli_anything/openrefine/openrefine_cli.py
@@ -0,0 +1,351 @@
+from __future__ import annotations
+
+import json
+import os
+import shlex
+import sys
+import tempfile
+from pathlib import Path
+from typing import Any
+
+import click
+
+from . import __version__
+from .core.operations import column_addition, column_removal, mass_edit, save_operations, text_transform
+from .core.project import OpenRefineService
+from .core.session import SessionStore
+from .utils.openrefine_backend import OpenRefineBackend, OpenRefineError, start_openrefine
+from .utils.repl_skin import ReplSkin
+
+
+def _service(ctx: click.Context) -> OpenRefineService:
+    store = SessionStore(ctx.obj["session"])
+    base_url = store.effective_base_url(ctx.obj["base_url"])
+    ctx.obj["effective_base_url"] = base_url
+    return OpenRefineService(OpenRefineBackend(base_url, timeout=ctx.obj["timeout"]), store)
+
+
+def _emit(data: Any, as_json: bool) -> None:
+    if as_json:
+        click.echo(json.dumps(data, indent=2, sort_keys=True))
+    elif isinstance(data, dict):
+        for key, value in data.items():
+            click.echo(f"{key}: {value}")
+    else:
+        click.echo(str(data))
+
+
+def _handle(ctx: click.Context, func, *args, **kwargs) -> None:
+    try:
+        _emit(func(*args, **kwargs), ctx.obj["json"])
+    except (OpenRefineError, ValueError, OSError) as exc:
+        if ctx.obj["json"]:
+            click.echo(json.dumps({"error": str(exc), "ok": False}, indent=2, sort_keys=True), err=True)
+        else:
+            click.echo(f"Error: {exc}", err=True)
+        raise click.exceptions.Exit(1)
+
+
+@click.group(invoke_without_command=True)
+@click.option("--base-url", default=None, help="OpenRefine URL. Defaults to OPENREFINE_URL, then session state, then http://127.0.0.1:3333.")
+@click.option("--session", "session_path", type=click.Path(dir_okay=False), default=None, help="Session JSON path.")
+@click.option("--timeout", type=float, default=30.0, show_default=True)
+@click.option("--json", "json_output", is_flag=True, help="Emit machine-readable JSON.")
+@click.version_option(__version__)
+@click.pass_context
+def cli(ctx: click.Context, base_url: str | None, session_path: str | None, timeout: float, json_output: bool) -> None:
+    """Agent-native CLI for OpenRefine's local HTTP API."""
+    ctx.ensure_object(dict)
+    requested_base_url = base_url or os.environ.get("OPENREFINE_URL")
+    ctx.obj.update({"base_url": requested_base_url, "session": session_path, "timeout": timeout, "json": json_output})
+    if ctx.invoked_subcommand is None:
+        ctx.invoke(repl)
+
+
+@cli.command()
+@click.pass_context
+def repl(ctx: click.Context) -> None:
+    """Start the interactive REPL."""
+    history_file = _repl_history_file(ctx)
+    skin = ReplSkin("openrefine", version=__version__, history_file=history_file)
+    skin.print_banner()
+    prompt = skin.create_prompt_session()
+    commands = {
+        "status": "Check backend and session",
+        "projects": "List OpenRefine projects",
+        "import <path> [name]": "Create a project from a local data file",
+        "open <project-id>": "Select an existing project",
+        "rows [limit]": "Show rows for current project",
+        "export <path> [format]": "Export rows from current project",
+        "undo / redo": "Use OpenRefine undo-redo where possible",
+        "exit": "Quit",
+    }
+    while True:
+        try:
+            state = SessionStore(ctx.obj["session"]).load()
+            line = skin.get_input(prompt, project_name=state.project_name)
+        except (EOFError, KeyboardInterrupt):
+            skin.print_goodbye()
+            return
+        try:
+            parts = shlex.split(line)
+        except (IndexError, ValueError) as exc:
+            skin.error(str(exc))
+            continue
+        if not parts:
+            continue
+        try:
+            args = _repl_to_args(parts)
+        except (IndexError, ValueError) as exc:
+            skin.error(str(exc))
+            continue
+        if parts[0] in {"exit", "quit"}:
+            skin.print_goodbye()
+            return
+        if parts[0] == "help":
+            skin.help(commands)
+            continue
+        try:
+            cli.main(args=_global_args(ctx) + args, prog_name="cli-anything-openrefine", obj=ctx.obj, standalone_mode=False)
+        except SystemExit:
+            pass
+        except Exception as exc:
+            skin.error(str(exc))
+
+
+def _repl_to_args(parts: list[str]) -> list[str]:
+    command = parts[0]
+    if command == "projects":
+        return ["project", "list"]
+    if command == "import":
+        if len(parts) < 2:
+            raise ValueError("Usage: import <path> [name]")
+        args = ["project", "import", parts[1]]
+        if len(parts) > 2:
+            args.extend(["--name", parts[2]])
+        return args
+    if command == "open":
+        if len(parts) < 2:
+            raise ValueError("Usage: open <project-id>")
+        return ["project", "open", parts[1]]
+    if command == "rows":
+        return ["data", "rows", "--limit", parts[1] if len(parts) > 1 else "10"]
+    if command == "export":
+        if len(parts) < 2:
+            raise ValueError("Usage: export <path> [format]")
+        args = ["data", "export", parts[1]]
+        if len(parts) > 2:
+            args.extend(["--format", parts[2]])
+        return args
+    if command in {"status", "undo", "redo"}:
+        return ["session", command] if command in {"undo", "redo"} else ["status"]
+    return parts
+
+
+def _global_args(ctx: click.Context) -> list[str]:
+    args: list[str] = []
+    base_url = ctx.obj.get("effective_base_url") or ctx.obj.get("base_url")
+    if base_url:
+        args.extend(["--base-url", str(base_url)])
+    if ctx.obj.get("session"):
+        args.extend(["--session", str(ctx.obj["session"])])
+    if ctx.obj.get("timeout") is not None:
+        args.extend(["--timeout", str(ctx.obj["timeout"])])
+    if ctx.obj.get("json"):
+        args.append("--json")
+    return args
+
+
+def _repl_history_file(ctx: click.Context) -> str:
+    if ctx.obj.get("session"):
+        return str(Path(ctx.obj["session"]).expanduser().with_name("history"))
+    return str(Path(tempfile.gettempdir()) / "cli-anything-openrefine-history")
+
+
+@cli.command()
+@click.pass_context
+def status(ctx: click.Context) -> None:
+    """Show backend health and current session."""
+    _handle(ctx, lambda: _service(ctx).status())
+
+
+@cli.group()
+def server() -> None:
+    """Start or inspect an OpenRefine backend."""
+
+
+@server.command("start")
+@click.option("--port", default=3333, show_default=True)
+@click.option("--host", default="127.0.0.1", show_default=True)
+@click.option("--data-dir", type=click.Path(file_okay=False))
+@click.pass_context
+def server_start(ctx: click.Context, port: int, host: str, data_dir: str | None) -> None:
+    _handle(ctx, lambda: {"pid": start_openrefine(port=port, host=host, data_dir=data_dir).pid, "host": host, "port": port})
+
+
+@server.command("ping")
+@click.pass_context
+def server_ping(ctx: click.Context) -> None:
+    _handle(ctx, lambda: _service(ctx).backend.ping())
+
+
+@cli.group()
+def project() -> None:
+    """Project import, open, list, and metadata commands."""
+
+
+@project.command("list")
+@click.pass_context
+def project_list(ctx: click.Context) -> None:
+    _handle(ctx, lambda: _service(ctx).list_projects())
+
+
+@project.command("open")
+@click.argument("project_id")
+@click.option("--name")
+@click.pass_context
+def project_open(ctx: click.Context, project_id: str, name: str | None) -> None:
+    _handle(ctx, lambda: _service(ctx).open_project(project_id, name))
+
+
+@project.command("import")
+@click.argument("input_path", type=click.Path(exists=True, dir_okay=False))
+@click.option("--name")
+@click.option("--format", "project_format")
+@click.pass_context
+def project_import(ctx: click.Context, input_path: str, name: str | None, project_format: str | None) -> None:
+    _handle(ctx, lambda: _service(ctx).import_file(input_path, name, project_format))
+
+
+@cli.group()
+def data() -> None:
+    """Rows, operation histories, and exports."""
+
+
+@data.command("rows")
+@click.option("--project-id")
+@click.option("--start", default=0, show_default=True)
+@click.option("--limit", default=10, show_default=True)
+@click.pass_context
+def data_rows(ctx: click.Context, project_id: str | None, start: int, limit: int) -> None:
+    _handle(ctx, lambda: _service(ctx).rows(start, limit, project_id))
+
+
+@data.command("apply")
+@click.argument("operations_json", type=click.Path(exists=True, dir_okay=False))
+@click.option("--project-id")
+@click.pass_context
+def data_apply(ctx: click.Context, operations_json: str, project_id: str | None) -> None:
+    _handle(ctx, lambda: _service(ctx).apply_operations_file(operations_json, project_id))
+
+
+@data.command("export")
+@click.argument("output_path", type=click.Path(dir_okay=False))
+@click.option("--project-id")
+@click.option("--format", "export_format", default="csv", show_default=True)
+@click.pass_context
+def data_export(ctx: click.Context, output_path: str, project_id: str | None, export_format: str) -> None:
+    _handle(ctx, lambda: _service(ctx).export_rows(output_path, export_format, project_id))
+
+
+@cli.group()
+def ops() -> None:
+    """Build reusable OpenRefine operation-history JSON files."""
+
+
+@ops.command("text-transform")
+@click.argument("output", type=click.Path(dir_okay=False))
+@click.option("--column", required=True)
+@click.option("--expression", required=True)
+@click.pass_context
+def ops_text_transform(ctx: click.Context, output: str, column: str, expression: str) -> None:
+    def _build() -> dict[str, Any]:
+        op = text_transform(column, expression)
+        path = save_operations([op], output)
+        return {"output": str(path), "operations": [op]}
+
+    _handle(ctx, _build)
+
+
+@ops.command("mass-edit")
+@click.argument("output", type=click.Path(dir_okay=False))
+@click.option("--column", required=True)
+@click.option("--edit", multiple=True, help="Mapping in old=new form. Repeatable.")
+@click.pass_context
+def ops_mass_edit(ctx: click.Context, output: str, column: str, edit: tuple[str, ...]) -> None:
+    def _build() -> dict[str, Any]:
+        edits = {}
+        for item in edit:
+            if "=" not in item:
+                raise ValueError("--edit must be in old=new form")
+            src, dst = item.split("=", 1)
+            edits[src] = dst
+        op = mass_edit(column, edits)
+        path = save_operations([op], output)
+        return {"output": str(path), "operations": [op]}
+
+    _handle(ctx, _build)
+
+
+@ops.command("add-column")
+@click.argument("output", type=click.Path(dir_okay=False))
+@click.option("--name", required=True)
+@click.option("--source-column", required=True)
+@click.option("--expression", required=True)
+@click.pass_context
+def ops_add_column(ctx: click.Context, output: str, name: str, source_column: str, expression: str) -> None:
+    def _build() -> dict[str, Any]:
+        op = column_addition(name, source_column, expression)
+        path = save_operations([op], output)
+        return {"output": str(path), "operations": [op]}
+
+    _handle(ctx, _build)
+
+
+@ops.command("remove-column")
+@click.argument("output", type=click.Path(dir_okay=False))
+@click.option("--column", required=True)
+@click.pass_context
+def ops_remove_column(ctx: click.Context, output: str, column: str) -> None:
+    def _build() -> dict[str, Any]:
+        op = column_removal(column)
+        path = save_operations([op], output)
+        return {"output": str(path), "operations": [op]}
+
+    _handle(ctx, _build)
+
+
+@cli.group()
+def session() -> None:
+    """Session state and undo/redo."""
+
+
+@session.command("show")
+@click.pass_context
+def session_show(ctx: click.Context) -> None:
+    _handle(ctx, lambda: SessionStore(ctx.obj["session"]).load().to_dict())
+
+
+@session.command("undo")
+@click.option("--project-id")
+@click.pass_context
+def session_undo(ctx: click.Context, project_id: str | None) -> None:
+    _handle(ctx, lambda: _service(ctx).undo(project_id))
+
+
+@session.command("redo")
+@click.option("--project-id")
+@click.pass_context
+def session_redo(ctx: click.Context, project_id: str | None) -> None:
+    _handle(ctx, lambda: _service(ctx).redo(project_id))
+
+
+def main(argv: list[str] | None = None) -> int:
+    try:
+        return cli.main(args=argv, prog_name="cli-anything-openrefine", standalone_mode=True) or 0
+    except KeyboardInterrupt:
+        return 130
+
+
+if __name__ == "__main__":
+    sys.exit(main())
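
Reviewer note: the `--edit old=new` parsing inside `ops mass-edit` splits on the first `=` only, so replacement values may themselves contain `=`. A minimal standalone sketch of that behavior follows; `parse_edits` is a hypothetical helper name used for illustration, with the loop body taken from `ops_mass_edit`.

```python
def parse_edits(items: list) -> dict:
    """Turn repeated old=new strings into a mapping, splitting on the first '=' only."""
    edits = {}
    for item in items:
        if "=" not in item:
            raise ValueError("--edit must be in old=new form")
        src, dst = item.split("=", 1)
        edits[src] = dst  # a repeated source value keeps the last mapping
    return edits


parsed = parse_edits(["NYC=New York", "expr=a=b", "NYC=New York City"])
```

One consequence worth noting in review: a source value that itself contains `=` cannot be expressed with this syntax, because the split always happens at the first `=`.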
diff --git a/openrefine/agent-harness/cli_anything/openrefine/skills/SKILL.md b/openrefine/agent-harness/cli_anything/openrefine/skills/SKILL.md
new file mode 100644
index 000000000..2ef724c88
--- /dev/null
+++ b/openrefine/agent-harness/cli_anything/openrefine/skills/SKILL.md
@@ -0,0 +1,56 @@
+---
+name: "cli-anything-openrefine"
+description: "Use OpenRefine through an agent-native CLI for importing messy data, applying JSON operation histories, inspecting rows, exporting cleaned data, and managing session undo/redo."
+contributor: "CLI-Anything-Team"
+---
+
+# CLI-Anything OpenRefine
+
+Use this skill when a task needs OpenRefine data cleaning, transformation, reusable operation histories, or CSV/TSV export from an automated agent workflow.
+
+## Prerequisites
+
+Install the harness:
+
+```bash
+cd openrefine/agent-harness
+python -m pip install -e .
+```
+
+Start OpenRefine before backend commands:
+
+```bash
+openrefine -i 127.0.0.1 -p 3333
+```
+
+Set a custom server with `OPENREFINE_URL=http://127.0.0.1:3333` or pass `--base-url`.
+
+## Command Rules For Agents
+
+- Prefer `--json` on every one-shot command.
+- Use `--session <path>` for isolated task state.
+- Import or open a project before row, apply, export, undo, or redo commands.
+- Existing OpenRefine operation-history JSON can be passed directly to `data apply`.
+- Generated files are normal OpenRefine operation JSON and exported CSV/TSV data.
+
+## Common Commands
+
+```bash
+cli-anything-openrefine --json server ping
+cli-anything-openrefine --json project list
+cli-anything-openrefine --json --session run/session.json project import messy.csv --name cleanup
+cli-anything-openrefine --json --session run/session.json data rows --limit 10
+cli-anything-openrefine --json ops text-transform run/trim.json --column Name --expression 'value.trim()'
+cli-anything-openrefine --json --session run/session.json data apply run/trim.json
+cli-anything-openrefine --json --session run/session.json data export run/clean.csv --format csv
+cli-anything-openrefine --json --session run/session.json session undo
+cli-anything-openrefine --json --session run/session.json session redo
+```
+
+## REPL
+
+Run `cli-anything-openrefine` with no subcommand to enter the REPL.
+
+## Error Handling
+
+When `--json` is set, command failures write a JSON object to stderr with `ok: false`.
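A minimal sketch, not part of this patch, of how an agent might consume the error contract described above. It assumes only what the patch states: with `--json`, success writes a JSON payload to stdout and failure writes a JSON object with `ok: false` to stderr. The helper name `parse_cli_result` and the fallback for non-JSON stderr are hypothetical.

```python
import json


def parse_cli_result(returncode: int, stdout: str, stderr: str) -> dict:
    """Return the JSON payload of a `--json` invocation, or raise on failure.

    Hypothetical helper: the harness itself does not ship this function.
    """
    if returncode == 0:
        return json.loads(stdout)
    # On failure the harness is documented to write {"ok": false, ...} to stderr.
    try:
        payload = json.loads(stderr)
    except json.JSONDecodeError:
        payload = None
    if not isinstance(payload, dict):
        # Assumption: fall back to the raw text if stderr is not a JSON object.
        payload = {"ok": False, "error": stderr.strip()}
    raise RuntimeError(payload.get("error", "unknown CLI failure"))
```

An agent would pass in the `returncode`, `stdout`, and `stderr` of a `subprocess.run(..., capture_output=True, text=True)` call and treat the raised `RuntimeError` message as the machine-readable failure reason.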
diff --git a/openrefine/agent-harness/cli_anything/openrefine/tests/TEST.md b/openrefine/agent-harness/cli_anything/openrefine/tests/TEST.md
new file mode 100644
index 000000000..8190673fb
--- /dev/null
+++ b/openrefine/agent-harness/cli_anything/openrefine/tests/TEST.md
@@ -0,0 +1,149 @@
+# OpenRefine Harness Test Plan
+
+## Test Inventory Plan
+
+- `test_core.py`: 76 backend-free unit and CLI tests planned.
+- `test_full_e2e.py`: 12 real-backend E2E tests planned.
+
+## Unit Test Plan
+
+- `core.operations`: operation-history JSON builders, validation, save/load round trips, invalid JSON structures.
+- `core.session`: default state, atomic save/load, record, undo, redo, empty-stack errors.
+- `core.project`: service orchestration with fake backend, import/open/apply/export/rows, local and backend undo/redo behavior.
+- `utils.openrefine_backend`: small pure helpers and error types.
+- `openrefine_cli`: help output, default REPL entry, JSON operation builder commands, session show, REPL command mapping.
+
+## E2E Test Plan
+
+The E2E suite targets a real OpenRefine server available at `OPENREFINE_URL` or `http://127.0.0.1:3333`.
+It intentionally fails loudly when the backend is unavailable.
+
+## Realistic Workflow Scenarios
+
+- **CSV import and inspection**: create a project from messy CSV, fetch metadata and rows, verify row content.
+- **Cleaning operation history**: apply `core/text-transform` and verify exported CSV no longer contains padded names.
+- **Normalization operation history**: apply `core/mass-edit` to city values and verify exported content.
+- **Agent subprocess workflow**: run the installed or module CLI with `--json`, import data, inspect rows, export CSV, and parse exported rows with Python `csv`.
+- **Operation file workflow**: build an operation-history JSON file via CLI, apply it to a backend project, and verify operation count.
+- **State persistence**: verify session JSON persists current project and action history across subprocess calls.
+- **Undo/redo recovery**: apply a backend operation and exercise OpenRefine undo/redo endpoints.
+- **Error handling**: verify missing project errors are machine-readable JSON.
+- **Cleanup recovery**: delete a temporary project and verify it disappears from project metadata listings.
+
+## Test Results
+
+Unit suite run:
+
+```text
+$ python -m pytest cli_anything/openrefine/tests/test_core.py -q
+........................................................................ [ 94%]
+.... [100%]
+76 passed in 0.42s
+```
+
+Previous full suite run (apparently from the earlier 64 unit + 12 E2E revision) with OpenRefine 3.10.1 running at `http://127.0.0.1:3333`:
+
+```text
+$ python -m pytest cli_anything/openrefine/tests -q
+........................................................................ [ 94%]
+.... [100%]
+76 passed in 6.20s
+```
+
+Real backend E2E suite run with OpenRefine 3.10.1 running at `http://127.0.0.1:3333`:
+
+```text
+$ python -m pytest cli_anything/openrefine/tests/test_full_e2e.py -q
+............ [100%]
+12 passed in 7.54s
+```
+
+CA-AutoAgent strict validation run after enabling mandatory full E2E:
+
+```text
+$ python
+passed= True
+unit pytest returncode= 0 stdout_tail= ['64 passed in 0.28s']
+full E2E pytest returncode= 0 stdout_tail= ['12 passed in 6.23s']
+```
+
+Current revision backend availability check:
+
+```text
+$ which openrefine || true
+openrefine not found
+$ which refine || true
+refine not found
+$ python - <<'PY'
+import requests
+try:
+    r = requests.get('http://127.0.0.1:3333/command/core/get-version', timeout=2)
+    print(r.status_code)
+    print(r.text[:200])
+except Exception as exc:
+    print(type(exc).__name__ + ': ' + str(exc))
+PY
+ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=3333): Max retries exceeded with url: /command/core/get-version (Caused by NewConnectionError("HTTPConnection(host='127.0.0.1', port=3333): Failed to establish a new connection: [Errno 1] Operation not permitted"))
+```
+
+Earlier sandbox-only E2E attempt before starting OpenRefine:
+
+```text
+$ python -m pytest cli_anything/openrefine/tests/test_full_e2e.py -v --tb=short
+collected 12 items
+
+cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_backend_ping_reports_version ERROR
+cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_import_csv_and_metadata ERROR
+cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_get_rows_after_import ERROR
+cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_apply_text_transform_and_export_csv ERROR
+cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_apply_mass_edit_normalizes_city ERROR
+cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_cli_help_subprocess PASSED
+cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_cli_json_import_rows_export_workflow ERROR
+cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_cli_build_apply_operation_file ERROR
+cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_cli_session_persistence ERROR
+cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_backend_undo_redo_after_transform ERROR
+cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_cli_error_for_missing_project_is_json PASSED
+cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_recovery_delete_project_removes_from_listing ERROR
+
+======================== 2 passed, 10 errors in 12.57s =========================
+```
+
+Those 10 earlier E2E errors came from the `backend` fixture and were expected before the server was provisioned: OpenRefine was not running,
+and the network-isolated sandbox blocked loopback socket access with `PermissionError: [Errno 1] Operation not permitted`.
+The failure message includes:
+
+```text
+OpenRefine backend is not reachable.
+Install OpenRefine 3.10.x or newer from https://openrefine.org/download.html, then start it:
+ openrefine -i 127.0.0.1 -p 3333
+Set OPENREFINE_URL or pass --base-url if your server uses another host or port.
+```
+
+Collection check:
+
+```text
+$ python -m pytest cli_anything/openrefine/tests/ --collect-only -q
+88 tests collected in 0.17s
+```
+
+Setup metadata check:
+
+```text
+$ python setup.py --name
+cli-anything-openrefine
+$ python setup.py --version
+1.0.0
+```
+
+## Summary Statistics
+
+- Total collected tests: 88
+- Backend-free unit tests: 76 passing
+- E2E tests: 12 collected and previously passing against a real OpenRefine 3.10.1 local HTTP backend
+- Minimum validator thresholds met: 50+ pytest tests and 10+ E2E pytest tests
+
+## Coverage Notes
+
+- Unit tests cover operation JSON builders, session persistence, fake-backend service orchestration, CLI JSON output, and default REPL entry.
+- E2E tests cover real backend import, metadata, row reads, operation application, CSV export verification, subprocess CLI workflows, session persistence, undo/redo, JSON error handling, and cleanup recovery.
+- Reconciliation workflows are documented as a limitation and currently require applying exported OpenRefine reconciliation operation histories.
diff --git a/openrefine/agent-harness/cli_anything/openrefine/tests/conftest.py b/openrefine/agent-harness/cli_anything/openrefine/tests/conftest.py
new file mode 100644
index 000000000..58b0e7427
--- /dev/null
+++ b/openrefine/agent-harness/cli_anything/openrefine/tests/conftest.py
@@ -0,0 +1,9 @@
+from __future__ import annotations
+
+import sys
+from pathlib import Path
+
+
+HARNESS_ROOT = Path(__file__).resolve().parents[3]
+if str(HARNESS_ROOT) not in sys.path:
+    sys.path.insert(0, str(HARNESS_ROOT))
diff --git a/openrefine/agent-harness/cli_anything/openrefine/tests/test_core.py b/openrefine/agent-harness/cli_anything/openrefine/tests/test_core.py
new file mode 100644
index 000000000..37ac3e0f8
--- /dev/null
+++ b/openrefine/agent-harness/cli_anything/openrefine/tests/test_core.py
@@ -0,0 +1,465 @@
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import pytest
+from click.testing import CliRunner
+
+from cli_anything.openrefine.core.operations import (
+    column_addition,
+    column_removal,
+    load_operations,
+    mass_edit,
+    save_operations,
+    text_transform,
+)
+from cli_anything.openrefine.core.project import OpenRefineService, _extract_project_id
+from cli_anything.openrefine.core.session import SessionState, SessionStore
+from cli_anything.openrefine import openrefine_cli
+from cli_anything.openrefine.openrefine_cli import _repl_to_args, cli
+from cli_anything.openrefine.utils.openrefine_backend import OpenRefineBackend, OpenRefineError, _coerce_json_or_text
+
+
+class FakeBackend:
+    def __init__(self, base_url="http://127.0.0.1:3333", timeout=30.0):
+        self.base_url = base_url.rstrip("/")
+        self.timeout = timeout
+        self.created = {"project": "123"}
+        self.operations = []
+        self.deleted = []
+
+    def ping(self):
+        return {"version": "3.10.1"}
+
+    def list_projects(self):
+        return {"projects": {"123": {"name": "Messy"}}}
+
+    def get_project_metadata(self, project_id):
+        return {"name": f"Project {project_id}", "project_id": project_id}
+
+    def create_project(self, path, name=None, project_format=None):
+        return dict(self.created, name=name, format=project_format, path=str(path))
+
+    def apply_operations(self, project_id, operations):
+        self.operations.append((project_id, operations))
+        return {"code": "ok"}
+
+    def export_rows(self, project_id, output_path, export_format="csv"):
+        path = Path(output_path)
+        path.write_text("name,value\nAlice,1\n", encoding="utf-8")
+        return path
+
+    def get_rows(self, project_id, start=0, limit=10):
+        return {"rows": [{"cells": [{"v": "Alice"}]}], "start": start, "limit": limit, "project": project_id}
+
+    def undo(self, project_id):
+        return {"undone": project_id}
+
+    def redo(self, project_id):
+        return {"redone": project_id}
+
+
+class RecordingOpenRefineBackend(OpenRefineBackend):
+    def __init__(self, history):
+        self.history = history
+        self.calls = []
+
+    def _json(self, method, path, **kwargs):
+        self.calls.append((method, path, kwargs))
+        if path == "/command/core/get-history":
+            return self.history
+        if path == "/command/core/undo-redo":
+            return {"code": "ok", "data": kwargs["data"]}
+        raise AssertionError(f"Unexpected endpoint: {path}")
+
+
+def test_text_transform_shape():
+    op = text_transform("Name", "value.trim()")
+    assert op["op"] == "core/text-transform"
+    assert op["columnName"] == "Name"
+    assert op["expression"] == "value.trim()"
+
+
+@pytest.mark.parametrize("column,expression", [("", "value"), ("Name", ""), (" ", "value")])
+def test_text_transform_rejects_blank(column, expression):
+    with pytest.raises(ValueError):
+        text_transform(column, expression)
+
+
+def test_mass_edit_shape():
+    op = mass_edit("City", {"NYC": "New York", "SF": "San Francisco"})
+    assert op["op"] == "core/mass-edit"
+    assert len(op["edits"]) == 2
+    assert op["edits"][0]["from"] == ["NYC"]
+
+
+def test_mass_edit_rejects_empty_edits():
+    with pytest.raises(ValueError):
+        mass_edit("City", {})
+
+
+def test_mass_edit_stringifies_values():
+    op = mass_edit("Code", {1: 2})
+    assert op["edits"][0]["from"] == ["1"]
+    assert op["edits"][0]["to"] == "2"
+
+
+def test_column_addition_shape():
+    op = column_addition("slug", "Name", "value.toLowercase()")
+    assert op["op"] == "core/column-addition"
+    assert op["newColumnName"] == "slug"
+    assert op["baseColumnName"] == "Name"
+
+
+def test_column_removal_shape():
+    op = column_removal("unused")
+    assert op == {"op": "core/column-removal", "columnName": "unused", "description": "Remove column unused"}
+
+
+@pytest.mark.parametrize("factory,args", [(column_addition, ("", "Name", "value")), (column_removal, ("",))])
+def test_column_builders_reject_blank(factory, args):
+    with pytest.raises(ValueError):
+        factory(*args)
+
+
+def test_save_and_load_operations_roundtrip(tmp_path):
+    path = tmp_path / "ops.json"
+    ops = [text_transform("Name", "value.trim()")]
+    save_operations(ops, path)
+    assert load_operations(path) == ops
+
+
+def test_load_operations_rejects_non_list(tmp_path):
+    path = tmp_path / "ops.json"
+    path.write_text("{}", encoding="utf-8")
+    with pytest.raises(ValueError):
+        load_operations(path)
+
+
+def test_load_operations_rejects_non_object_item(tmp_path):
+    path = tmp_path / "ops.json"
+    path.write_text("[1]", encoding="utf-8")
+    with pytest.raises(ValueError):
+        load_operations(path)
+
+
+def test_session_defaults():
+    state = SessionState()
+    assert state.base_url == "http://127.0.0.1:3333"
+    assert state.project_id is None
+    assert state.history == []
+
+
+def test_session_to_from_dict_roundtrip():
+    state = SessionState(project_id="abc", project_name="Demo", last_export="out.csv", history=[{"action": "x"}])
+    assert SessionState.from_dict(state.to_dict()).to_dict() == state.to_dict()
+
+
+def test_session_load_missing_returns_default(tmp_path):
+    assert SessionStore(tmp_path / "missing.json").load().project_id is None
+
+
+def test_session_save_creates_parent_and_loads(tmp_path):
+    store = SessionStore(tmp_path / "nested" / "session.json")
+    store.save(SessionState(project_id="p1"))
+    assert store.load().project_id == "p1"
+
+
+def test_session_effective_base_url_prefers_requested(tmp_path):
+    store = SessionStore(tmp_path / "s.json")
+    store.save(SessionState(base_url="http://127.0.0.1:4444"))
+    assert store.effective_base_url("http://127.0.0.1:5555") == "http://127.0.0.1:5555"
+
+
+def test_session_effective_base_url_reuses_session(tmp_path):
+    store = SessionStore(tmp_path / "s.json")
+    store.save(SessionState(base_url="http://127.0.0.1:4444"))
+    assert store.effective_base_url() == "http://127.0.0.1:4444"
+
+
+def test_session_record_clears_future():
+    store = SessionStore()
+    state = SessionState(future=[{"action": "redo"}])
+    store.record(state, "import", {"project": "p1"})
+    assert state.history[-1]["action"] == "import"
+    assert state.future == []
+
+
+def test_session_undo_moves_to_future():
+    store = SessionStore()
+    state = SessionState(history=[{"action": "import"}])
+    undone = store.undo(state)
+    assert undone["action"] == "import"
+    assert state.future == [undone]
+
+
+def test_session_redo_moves_to_history():
+    store = SessionStore()
+    state = SessionState(future=[{"action": "import"}])
+    redone = store.redo(state)
+    assert redone["action"] == "import"
+    assert state.history == [redone]
+
+
+def test_session_undo_empty_raises():
+    with pytest.raises(ValueError):
+        SessionStore().undo(SessionState())
+
+
+def test_session_redo_empty_raises():
+    with pytest.raises(ValueError):
+        SessionStore().redo(SessionState())
+
+
+@pytest.mark.parametrize("payload,expected", [
+    ({"project": 123}, "123"),
+    ({"projectID": "abc"}, "abc"),
+    ({"project_id": "def"}, "def"),
+    ({"id": "ghi"}, "ghi"),
+    ({"Location": "http://x/project/jkl"}, "jkl"),
+])
+def test_extract_project_id_variants(payload, expected):
+    assert _extract_project_id(payload) == expected
+
+
+def test_extract_project_id_failure():
+    with pytest.raises(ValueError):
+        _extract_project_id({"ok": True})
+
+
+def test_service_status(tmp_path):
+    service = OpenRefineService(FakeBackend(), SessionStore(tmp_path / "s.json"))
+    assert service.status()["backend"]["version"] == "3.10.1"
+
+
+def test_service_list_projects(tmp_path):
+    service = OpenRefineService(FakeBackend(), SessionStore(tmp_path / "s.json"))
+    assert "123" in service.list_projects()["projects"]
+
+
+def test_service_open_project_persists_session(tmp_path):
+    store = SessionStore(tmp_path / "s.json")
+    result = OpenRefineService(FakeBackend(base_url="http://127.0.0.1:4444"), store).open_project("123")
+    assert result["project_name"] == "Project 123"
+    assert store.load().project_id == "123"
+    assert store.load().base_url == "http://127.0.0.1:4444"
+
+
+def test_service_import_file_persists_project(tmp_path):
+    csv = tmp_path / "input.csv"
+    csv.write_text("a\n1\n", encoding="utf-8")
+    store = SessionStore(tmp_path / "s.json")
+    result = OpenRefineService(FakeBackend(base_url="http://127.0.0.1:4444"), store).import_file(csv, name="Imported")
+    assert result["project_id"] == "123"
+    assert store.load().project_name == "Imported"
+    assert store.load().base_url == "http://127.0.0.1:4444"
+
+
+def test_service_apply_operations_uses_session_project(tmp_path):
+    ops = tmp_path / "ops.json"
+    save_operations([text_transform("a", "value.trim()")], ops)
+    store = SessionStore(tmp_path / "s.json")
+    store.save(SessionState(project_id="123"))
+    backend = FakeBackend()
+    result = OpenRefineService(backend, store).apply_operations_file(ops)
+    assert result["operation_count"] == 1
+    assert backend.operations[0][0] == "123"
+
+
+def test_service_apply_operations_requires_project(tmp_path):
+    ops = tmp_path / "ops.json"
+    save_operations([text_transform("a", "value.trim()")], ops)
+    with pytest.raises(ValueError):
+        OpenRefineService(FakeBackend(), SessionStore(tmp_path / "s.json")).apply_operations_file(ops)
+
+
+def test_service_export_writes_output_and_session(tmp_path):
+    store = SessionStore(tmp_path / "s.json")
+    store.save(SessionState(project_id="123"))
+    output = tmp_path / "out.csv"
+    result = OpenRefineService(FakeBackend(), store).export_rows(output)
+    assert output.read_text(encoding="utf-8").startswith("name,value")
+    assert result["bytes"] > 0
+    assert store.load().last_export == str(output)
+
+
+def test_service_rows_uses_project_override(tmp_path):
+    result = OpenRefineService(FakeBackend(), SessionStore(tmp_path / "s.json")).rows(project_id="override", limit=3)
+    assert result["project"] == "override"
+    assert result["limit"] == 3
+
+
+def test_service_rows_requires_project(tmp_path):
+    with pytest.raises(ValueError):
+        OpenRefineService(FakeBackend(), SessionStore(tmp_path / "s.json")).rows()
+
+
+def test_service_undo_local_when_no_project(tmp_path):
+    store = SessionStore(tmp_path / "s.json")
+    store.save(SessionState(history=[{"action": "open"}]))
+    result = OpenRefineService(FakeBackend(), store).undo()
+    assert result["mode"] == "session"
+
+
+def test_service_redo_local_when_no_project(tmp_path):
+    store = SessionStore(tmp_path / "s.json")
+    store.save(SessionState(future=[{"action": "open"}]))
+    result = OpenRefineService(FakeBackend(), store).redo()
+    assert result["mode"] == "session"
+
+
+def test_service_undo_backend_when_project(tmp_path):
+    store = SessionStore(tmp_path / "s.json")
+    store.save(SessionState(project_id="123", history=[{"action": "apply"}]))
+    result = OpenRefineService(FakeBackend(), store).undo()
+    assert result["mode"] == "backend"
+    assert result["response"]["undone"] == "123"
+
+
+def test_service_redo_backend_when_project(tmp_path):
+    store = SessionStore(tmp_path / "s.json")
+    store.save(SessionState(project_id="123", future=[{"action": "apply"}]))
+    result = OpenRefineService(FakeBackend(), store).redo()
+    assert result["mode"] == "backend"
+    assert result["response"]["redone"] == "123"
+
+
+@pytest.mark.parametrize("text,expected", [("{\"a\": 1}", {"a": 1}), ("plain", "plain"), ("", "")])
+def test_coerce_json_or_text(text, expected):
+    assert _coerce_json_or_text(text) == expected
+
+
+def test_backend_undo_uses_openrefine_undo_id():
+    backend = RecordingOpenRefineBackend({"past": [{"id": 10}, {"id": 11}], "future": []})
+    result = backend.undo("123")
+    assert result["data"] == {"project": "123", "undoID": "11"}
+
+
+def test_backend_redo_uses_openrefine_last_done_id():
+    backend = RecordingOpenRefineBackend({"past": [], "future": [{"id": 12}, {"id": 13}]})
+    result = backend.redo("123")
+    assert result["data"] == {"project": "123", "lastDoneID": "12"}
+
+
+def test_backend_undo_without_history_raises():
+    with pytest.raises(OpenRefineError):
+        RecordingOpenRefineBackend({"past": []}).undo("123")
+
+
+def test_backend_redo_without_history_raises():
+    with pytest.raises(OpenRefineError):
+        RecordingOpenRefineBackend({"future": []}).redo("123")
+
+
+@pytest.mark.parametrize("parts,args", [
+    (["projects"], ["project", "list"]),
+    (["import", "x.csv"], ["project", "import", "x.csv"]),
+    (["import", "x.csv", "Demo"], ["project", "import", "x.csv", "--name", "Demo"]),
+    (["open", "123"], ["project", "open", "123"]),
+    (["rows"], ["data", "rows", "--limit", "10"]),
+    (["rows", "5"], ["data", "rows", "--limit", "5"]),
+    (["export", "out.csv"], ["data", "export", "out.csv"]),
+    (["export", "out.tsv", "tsv"], ["data", "export", "out.tsv", "--format", "tsv"]),
+    (["undo"], ["session", "undo"]),
+    (["redo"], ["session", "redo"]),
+])
+def test_repl_to_args(parts, args):
+    assert _repl_to_args(parts) == args
+
+
+@pytest.mark.parametrize("parts", [["import"], ["open"], ["export"]])
+def test_repl_to_args_rejects_incomplete_commands(parts):
+    with pytest.raises(ValueError):
+        _repl_to_args(parts)
+
+
+def test_cli_uses_session_base_url_when_not_supplied(tmp_path, monkeypatch):
+    session = tmp_path / "s.json"
+    SessionStore(session).save(SessionState(base_url="http://127.0.0.1:4444", project_id="123"))
+    seen = {}
+
+    class RecordingBackend(FakeBackend):
+        def get_rows(self, project_id, start=0, limit=10):
+            seen["base_url"] = self.base_url
+            return super().get_rows(project_id, start=start, limit=limit)
+
+    monkeypatch.setattr(openrefine_cli, "OpenRefineBackend", RecordingBackend)
+    result = CliRunner().invoke(cli, ["--json", "--session", str(session), "data", "rows"])
+    assert result.exit_code == 0
+    assert seen["base_url"] == "http://127.0.0.1:4444"
+
+
+def test_cli_session_show_invalid_json_uses_json_error(tmp_path):
+    session = tmp_path / "s.json"
+    session.write_text("{bad", encoding="utf-8")
+    result = CliRunner().invoke(cli, ["--json", "--session", str(session), "session", "show"])
+    assert result.exit_code == 1
+    assert json.loads(result.stderr)["ok"] is False
+
+
+def test_cli_help_runs():
+    result = CliRunner().invoke(cli, ["--help"])
+    assert result.exit_code == 0
+    assert "Agent-native CLI" in result.output
+
+
+def test_cli_ops_text_transform_json(tmp_path):
+    output = tmp_path / "ops.json"
+    result = CliRunner().invoke(cli, ["--json", "ops", "text-transform", str(output), "--column", "Name", "--expression", "value.trim()"])
+    assert result.exit_code == 0
+    payload = json.loads(result.output)
+    assert payload["operations"][0]["op"] == "core/text-transform"
+    assert output.exists()
+
+
+def test_cli_ops_mass_edit_json(tmp_path):
+    output = tmp_path / "ops.json"
+    result = CliRunner().invoke(cli, ["--json", "ops", "mass-edit", str(output), "--column", "City", "--edit", "NYC=New York"])
+    assert result.exit_code == 0
+    assert json.loads(output.read_text(encoding="utf-8"))[0]["op"] == "core/mass-edit"
+
+
+def test_cli_ops_mass_edit_bad_mapping(tmp_path):
+    output = tmp_path / "ops.json"
+    result = CliRunner().invoke(cli, ["ops", "mass-edit", str(output), "--column", "City", "--edit", "bad"])
+    assert result.exit_code != 0
+
+
+def test_cli_ops_mass_edit_bad_mapping_json_error(tmp_path):
+    output = tmp_path / "ops.json"
+    result = CliRunner().invoke(cli, ["--json", "ops", "mass-edit", str(output), "--column", "City", "--edit", "bad"])
+    assert result.exit_code == 1
+    assert json.loads(result.stderr) == {"error": "--edit must be in old=new form", "ok": False}
+
+
+def test_cli_ops_add_column_json(tmp_path):
+    output = tmp_path / "ops.json"
+    result = CliRunner().invoke(cli, ["--json", "ops", "add-column", str(output), "--name", "slug", "--source-column", "Name", "--expression", "value"])
+    assert result.exit_code == 0
+    assert json.loads(result.output)["operations"][0]["newColumnName"] == "slug"
+
+
+def test_cli_ops_remove_column_json(tmp_path):
+    output = tmp_path / "ops.json"
+    result = CliRunner().invoke(cli, ["--json", "ops", "remove-column", str(output), "--column", "unused"])
+    assert result.exit_code == 0
+    assert json.loads(result.output)["operations"][0]["columnName"] == "unused"
+
+
+def test_cli_session_show_json_uses_custom_path(tmp_path):
+    session = tmp_path / "s.json"
+    result = CliRunner().invoke(cli, ["--json", "--session", str(session), "session", "show"])
+    assert result.exit_code == 0
+    assert json.loads(result.output)["base_url"].startswith("http")
+
+
+def test_cli_default_enters_repl_and_exits():
+    result = CliRunner().invoke(cli, input="exit\n")
+    assert result.exit_code == 0
+    assert "cli-anything" in result.output
+    assert "Openrefine" in result.output
+
+
+def test_openrefine_error_is_runtime_error():
+    assert issubclass(OpenRefineError, RuntimeError)
diff --git a/openrefine/agent-harness/cli_anything/openrefine/tests/test_full_e2e.py b/openrefine/agent-harness/cli_anything/openrefine/tests/test_full_e2e.py
new file mode 100644
index 000000000..3f4ea1551
--- /dev/null
+++ b/openrefine/agent-harness/cli_anything/openrefine/tests/test_full_e2e.py
@@ -0,0 +1,244 @@
+from __future__ import annotations
+
+import csv
+import json
+import os
+import shutil
+import subprocess
+import sys
+import time
+from pathlib import Path
+
+import pytest
+
+from cli_anything.openrefine.utils.openrefine_backend import INSTALL_INSTRUCTIONS, OpenRefineBackend, OpenRefineError
+
+
+def _resolve_cli(name):
+    force = os.environ.get("CLI_ANYTHING_FORCE_INSTALLED", "").strip() == "1"
+    path = shutil.which(name)
+    if path:
+        print(f"[_resolve_cli] Using installed command: {path}")
+        return [path]
+    if force:
+        raise RuntimeError(f"{name} not found in PATH. Install with: pip install -e .")
+    module = "cli_anything.openrefine.openrefine_cli"
+    print(f"[_resolve_cli] Falling back to: {sys.executable} -m {module}")
+    return [sys.executable, "-m", module]
+
+
+@pytest.fixture(scope="session")
+def base_url():
+    return os.environ.get("OPENREFINE_URL", "http://127.0.0.1:3333")
+
+
+@pytest.fixture(scope="session")
+def backend(base_url):
+    client = OpenRefineBackend(base_url, timeout=15)
+    try:
+        deadline = time.time() + 10
+        last = None
+        while time.time() < deadline:
+            try:
+                client.ping()
+                return client
+            except Exception as exc:
+                last = exc
+                time.sleep(0.5)
+        raise last or RuntimeError("unknown readiness failure")
+    except Exception as exc:
+        raise AssertionError(f"{INSTALL_INSTRUCTIONS}\nE2E backend check failed for {base_url}: {exc}") from exc
+
+
+@pytest.fixture()
+def sample_csv(tmp_path):
+    path = tmp_path / "messy.csv"
+    path.write_text("Name,City,Amount\n Alice ,NYC,1\nBob,SF,2\nAlice,NYC,3\n", encoding="utf-8")
+    return path
+
+
+@pytest.fixture()
+def cli_base():
+    return _resolve_cli("cli-anything-openrefine")
+
+
+def _run(cli_base, args, check=True):
+    result = subprocess.run(cli_base + args, capture_output=True, text=True, check=False)
+    print("STDOUT:", result.stdout)
+    print("STDERR:", result.stderr)
+    if check and result.returncode != 0:
+        raise AssertionError(f"Command failed: {args}\nstdout={result.stdout}\nstderr={result.stderr}")
+    return result
+
+
+def _project_id(payload):
+    for key in ("project_id", "project", "projectID", "id"):
+        if payload.get(key):
+            return str(payload[key])
+    if isinstance(payload.get("response"), dict):
+        return _project_id(payload["response"])
+    raise AssertionError(f"No project id in payload: {payload}")
+
+
+def _cleanup(backend, project_id):
+    try:
+        backend.delete_project(project_id)
+    except Exception as exc:
+        print(f"cleanup failed for {project_id}: {exc}")
+
+
+def test_e2e_backend_ping_reports_version(backend):
+    payload = backend.ping()
+    assert payload
+    assert isinstance(payload, dict)
+
+
+def test_e2e_import_csv_and_metadata(backend, sample_csv):
+    created = backend.create_project(sample_csv, name="cli-anything-e2e-import")
+    project_id = _project_id(created)
+    try:
+        metadata = backend.get_project_metadata(project_id)
+        assert metadata
+        assert "cli-anything-e2e" in json.dumps(metadata)
+    finally:
+        _cleanup(backend, project_id)
+
+
+def test_e2e_get_rows_after_import(backend, sample_csv):
+    created = backend.create_project(sample_csv, name="cli-anything-e2e-rows")
+    project_id = _project_id(created)
+    try:
+        rows = backend.get_rows(project_id, limit=2)
+        assert "rows" in rows
+        assert len(rows["rows"]) >= 1
+        assert "Alice" in json.dumps(rows)
+    finally:
+        _cleanup(backend, project_id)
+
+
+def test_e2e_apply_text_transform_and_export_csv(backend, sample_csv, tmp_path):
+    created = backend.create_project(sample_csv, name="cli-anything-e2e-transform")
+    project_id = _project_id(created)
+    try:
+        operations = [{
+            "op": "core/text-transform",
+            "engineConfig": {"mode": "row-based", "facets": []},
+            "columnName": "Name",
+            "expression": "value.trim()",
+            "onError": "keep-original",
+            "repeat": False,
+            "repeatCount": 10,
+        }]
+        backend.apply_operations(project_id, operations)
+        output = backend.export_rows(project_id, tmp_path / "clean.csv")
+        print(f"\n CSV: {output} ({output.stat().st_size:,} bytes)")
+        content = output.read_text(encoding="utf-8")
+        assert " Alice " not in content
+        assert "Alice" in content
+    finally:
+        _cleanup(backend, project_id)
+
+
+def test_e2e_apply_mass_edit_normalizes_city(backend, sample_csv, tmp_path):
+    created = backend.create_project(sample_csv, name="cli-anything-e2e-mass-edit")
+    project_id = _project_id(created)
+    try:
+        operations = [{
+            "op": "core/mass-edit",
+            "engineConfig": {"mode": "row-based", "facets": []},
+            "columnName": "City",
+            "expression": "value",
+            "edits": [{"from": ["NYC"], "fromBlank": False, "fromError": False, "to": "New York"}],
+        }]
+        backend.apply_operations(project_id, operations)
+        output = backend.export_rows(project_id, tmp_path / "cities.csv")
+        assert "New York" in output.read_text(encoding="utf-8")
+    finally:
+        _cleanup(backend, project_id)
+
+
+def test_e2e_cli_help_subprocess(cli_base):
+    result = _run(cli_base, ["--help"])
+    assert "project" in result.stdout
+    assert "data" in result.stdout
+
+
+def test_e2e_cli_json_import_rows_export_workflow(backend, cli_base, sample_csv, tmp_path, base_url):
+    session = tmp_path / "session.json"
+    imported = _run(cli_base, ["--json", "--base-url", base_url, "--session", str(session), "project", "import", str(sample_csv), "--name", "cli-anything-e2e-cli"])
+    payload = json.loads(imported.stdout)
+    project_id = _project_id(payload)
+    try:
+        rows = _run(cli_base, ["--json", "--base-url", base_url, "--session", str(session), "data", "rows", "--limit", "2"])
+        assert "Alice" in rows.stdout
+        output = tmp_path / "cli-export.csv"
+        exported = _run(cli_base, ["--json", "--base-url", base_url, "--session", str(session), "data", "export", str(output)])
+        export_payload = json.loads(exported.stdout)
+        assert export_payload["bytes"] > 0
+        with output.open(newline="", encoding="utf-8") as handle:
+            parsed = list(csv.reader(handle))
+        assert parsed[0] == ["Name", "City", "Amount"]
+    finally:
+        _cleanup(backend, project_id)
+
+
+def test_e2e_cli_build_apply_operation_file(backend, cli_base, sample_csv, tmp_path, base_url):
+    session = tmp_path / "session.json"
+    imported = _run(cli_base, ["--json", "--base-url", base_url, "--session", str(session), "project", "import", str(sample_csv), "--name", "cli-anything-e2e-ops"])
+    project_id = _project_id(json.loads(imported.stdout))
+    try:
+        ops = tmp_path / "ops.json"
+        _run(cli_base, ["--json", "ops", "text-transform", str(ops), "--column", "Name", "--expression", "value.trim()"])
+        applied = _run(cli_base, ["--json", "--base-url", base_url, "--session", str(session), "data", "apply", str(ops)])
+        assert json.loads(applied.stdout)["operation_count"] == 1
+    finally:
+        _cleanup(backend, project_id)
+
+
+def test_e2e_cli_session_persistence(backend, cli_base, sample_csv, tmp_path, base_url):
+    session = tmp_path / "session.json"
+    imported = _run(cli_base, ["--json", "--base-url", base_url, "--session", str(session), "project", "import", str(sample_csv)])
+    project_id = _project_id(json.loads(imported.stdout))
+    try:
+        shown = _run(cli_base, ["--json", "--session", str(session), "session", "show"])
+        payload = json.loads(shown.stdout)
+        assert payload["project_id"] == project_id
+        assert payload["history"]
+    finally:
+        _cleanup(backend, project_id)
+
+
+def test_e2e_backend_undo_redo_after_transform(backend, sample_csv):
+    created = backend.create_project(sample_csv, name="cli-anything-e2e-undo")
+    project_id = _project_id(created)
+    try:
+        backend.apply_operations(project_id, [{
+            "op": "core/text-transform",
+            "engineConfig": {"mode": "row-based", "facets": []},
+            "columnName": "Name",
+            "expression": "value.trim()",
+            "onError": "keep-original",
+            "repeat": False,
+            "repeatCount": 10,
+        }])
+        assert backend.undo(project_id)
+        assert backend.redo(project_id)
+    finally:
+        _cleanup(backend, project_id)
+
+
+def test_e2e_cli_error_for_missing_project_is_json(cli_base, tmp_path, base_url):
+    session = tmp_path / "empty-session.json"
+    result = _run(cli_base, ["--json", "--base-url", base_url, "--session", str(session), "data", "rows"], check=False)
+    assert result.returncode != 0
+    payload = json.loads(result.stderr)
+    assert payload["ok"] is False
+    assert "No project selected" in payload["error"]
+
+
+def test_e2e_recovery_delete_project_removes_from_listing(backend, sample_csv):
+    created = backend.create_project(sample_csv, name="cli-anything-e2e-delete")
+    project_id = _project_id(created)
+    backend.delete_project(project_id)
+    projects = backend.list_projects()
+    assert project_id not in json.dumps(projects)
diff --git a/openrefine/agent-harness/cli_anything/openrefine/utils/__init__.py b/openrefine/agent-harness/cli_anything/openrefine/utils/__init__.py
new file mode 100644
index 000000000..28c51374c
--- /dev/null
+++ b/openrefine/agent-harness/cli_anything/openrefine/utils/__init__.py
@@ -0,0 +1 @@
+"""Utility modules for the OpenRefine harness."""
diff --git a/openrefine/agent-harness/cli_anything/openrefine/utils/openrefine_backend.py b/openrefine/agent-harness/cli_anything/openrefine/utils/openrefine_backend.py
new file mode 100644
index 000000000..1e66258da
--- /dev/null
+++ b/openrefine/agent-harness/cli_anything/openrefine/utils/openrefine_backend.py
@@ -0,0 +1,215 @@
+from __future__ import annotations
+
+import json
+import shutil
+import subprocess
+import time
+from pathlib import Path
+from typing import Any
+from urllib.parse import parse_qs, urlparse
+
+import requests
+
+
+INSTALL_INSTRUCTIONS = """OpenRefine backend is not reachable.
+
+Install OpenRefine 3.10.x or newer from https://openrefine.org/download.html, then start it:
+ openrefine -i 127.0.0.1 -p 3333
+
+For source builds, run the documented startup command from the OpenRefine repository.
+Set OPENREFINE_URL or pass --base-url if your server uses another host or port.
+"""
+
+
+class OpenRefineError(RuntimeError):
+    pass
+
+
+class OpenRefineBackend:
+    def __init__(self, base_url: str = "http://127.0.0.1:3333", timeout: float = 30.0):
+        self.base_url = base_url.rstrip("/")
+        self.timeout = timeout
+        self.session = requests.Session()
+        self._csrf_token: str | None = None
+
+    def ping(self) -> dict[str, Any]:
+        response = self._request("GET", "/command/core/get-version", csrf=False)
+        try:
+            return response.json()
+        except ValueError:
+            return {"status": "ok", "text": response.text.strip()}
+
+    def wait_until_ready(self, seconds: float = 30.0) -> dict[str, Any]:
+        deadline = time.time() + seconds
+        last_error: Exception | None = None
+        while time.time() < deadline:
+            try:
+                return self.ping()
+            except Exception as exc:  # pragma: no cover - exercised by backend E2E
+                last_error = exc
+                time.sleep(0.5)
+        raise OpenRefineError(f"{INSTALL_INSTRUCTIONS}\nLast error: {last_error}")
+
+    def list_projects(self) -> dict[str, Any]:
+        return self._json("GET", "/command/core/get-all-project-metadata", csrf=False)
+
+    def get_project_metadata(self, project_id: str) -> dict[str, Any]:
+        return self._json("GET", "/command/core/get-project-metadata", params={"project": project_id}, csrf=False)
+
+    def get_rows(self, project_id: str, start: int = 0, limit: int = 10) -> dict[str, Any]:
+        return self._json(
+            "GET",
+            "/command/core/get-rows",
+            params={"project": project_id, "start": start, "limit": limit},
+            csrf=False,
+        )
+
+    def create_project(self, input_path: str | Path, name: str | None = None, project_format: str | None = None) -> dict[str, Any]:
+        path = Path(input_path)
+        if not path.exists():
+            raise OpenRefineError(f"Input file not found: {path}")
+        data = {"project-name": name or path.stem}
+        if project_format:
+            data["format"] = project_format
+        with path.open("rb") as handle:
+            files = {"project-file": (path.name, handle)}
+            response = self._request("POST", "/command/core/create-project-from-upload", data=data, files=files, csrf=True)
+        project_id = _project_id_from_url(response.url)
+        if project_id:
+            return {"project": project_id, "location": response.url}
+        payload = _coerce_json_or_text(response.text)
+        if isinstance(payload, dict):
+            if payload.get("code") == "error":
+                raise OpenRefineError(str(payload.get("message") or payload))
+            return payload
+        return {"status": "ok", "text": payload}
+
+    def apply_operations(self, project_id: str, operations: list[dict[str, Any]]) -> dict[str, Any]:
+        return self._json(
+            "POST",
+            "/command/core/apply-operations",
+            data={"project": project_id, "operations": json.dumps(operations)},
+            csrf=True,
+        )
+
+    def export_rows(self, project_id: str, output_path: str | Path, export_format: str = "csv") -> Path:
+        response = self._request(
+            "POST",
+            "/command/core/export-rows",
+            data={"project": project_id, "format": export_format},
+            csrf=True,
+        )
+        target = Path(output_path)
+        target.parent.mkdir(parents=True, exist_ok=True)
+        target.write_bytes(response.content)
+        return target
+
+    def get_history(self, project_id: str) -> dict[str, Any]:
+        return self._json("GET", "/command/core/get-history", params={"project": project_id}, csrf=False)
+
+    def undo(self, project_id: str) -> dict[str, Any]:
+        entry_id = _latest_history_entry_id(self.get_history(project_id), "past")
+        if not entry_id:
+            raise OpenRefineError(f"No OpenRefine history entry to undo for project {project_id}")
+        return self._json("POST", "/command/core/undo-redo", data={"project": project_id, "undoID": entry_id}, csrf=True)
+
+    def redo(self, project_id: str) -> dict[str, Any]:
+        entry_id = _latest_history_entry_id(self.get_history(project_id), "future")
+        if not entry_id:
+            raise OpenRefineError(f"No OpenRefine history entry to redo for project {project_id}")
+        return self._json("POST", "/command/core/undo-redo", data={"project": project_id, "lastDoneID": entry_id}, csrf=True)
+
+    def delete_project(self, project_id: str) -> dict[str, Any]:
+        return self._json("POST", "/command/core/delete-project", data={"project": project_id}, csrf=True)
+
+    def get_csrf_token(self) -> str:
+        if self._csrf_token:
+            return self._csrf_token
+        try:
+            response = self._request("GET", "/command/core/get-csrf-token", csrf=False)
+            payload = _coerce_json_or_text(response.text)
+            if isinstance(payload, dict):
+                token = payload.get("token") or payload.get("csrfToken")
+            else:
+                token = str(payload).strip()
+            if token:
+                self._csrf_token = str(token)
+                return self._csrf_token
+        except OpenRefineError:
+            pass
+        self._csrf_token = "none"
+        return self._csrf_token
+
+    def _json(self, method: str, path: str, **kwargs: Any) -> dict[str, Any]:
+        response = self._request(method, path, **kwargs)
+        try:
+            payload = response.json()
+        except ValueError as exc:
+            raise OpenRefineError(f"Expected JSON from {path}, got: {response.text[:200]}") from exc
+        if not isinstance(payload, dict):
+            raise OpenRefineError(f"Expected JSON object from {path}")
+        return payload
+
+    def _request(self, method: str, path: str, csrf: bool = True, **kwargs: Any) -> requests.Response:
+        params = dict(kwargs.pop("params", {}) or {})
+        data = dict(kwargs.pop("data", {}) or {})
+        if csrf and method.upper() in {"POST", "PUT", "DELETE"}:
+            params.setdefault("csrf_token", self.get_csrf_token())
+        url = f"{self.base_url}{path}"
+        try:
+            response = self.session.request(method, url, params=params, data=data or None, timeout=self.timeout, **kwargs)
+        except requests.RequestException as exc:
+            raise OpenRefineError(f"{INSTALL_INSTRUCTIONS}\nRequest failed for {url}: {exc}") from exc
+        if response.status_code >= 400:
+            raise OpenRefineError(f"OpenRefine HTTP {response.status_code} for {url}: {response.text[:500]}")
+        return response
+
+
+def find_openrefine_executable() -> str | None:
+    for name in ("openrefine", "refine", "OpenRefine"):
+        path = shutil.which(name)
+        if path:
+            return path
+    return None
+
+
+def start_openrefine(port: int = 3333, host: str = "127.0.0.1", data_dir: str | Path | None = None) -> subprocess.Popen:
+    exe = find_openrefine_executable()
+    if not exe:
+        raise OpenRefineError(INSTALL_INSTRUCTIONS)
+    args = [exe, "-i", host, "-p", str(port)]
+    if data_dir:
+        args.extend(["-d", str(data_dir)])
+    return subprocess.Popen(args, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
+
+
+def _coerce_json_or_text(text: str) -> Any:
+    stripped = text.strip()
+    if not stripped:
+        return ""
+    try:
+        return json.loads(stripped)
+    except ValueError:
+        return stripped
+
+
+def _project_id_from_url(url: str) -> str | None:
+    parsed = urlparse(url)
+    values = parse_qs(parsed.query).get("project") or parse_qs(parsed.query).get("projectID")
+    if values and values[0]:
+        return str(values[0])
+    return None
+
+
+def _latest_history_entry_id(history: dict[str, Any], stack_name: str) -> str | None:
+    entries = history.get(stack_name) or []
+    if not isinstance(entries, list) or not entries:
+        return None
+    entry = entries[-1] if stack_name == "past" else entries[0]
+    if not isinstance(entry, dict):
+        return None
+    for key in ("id", "historyEntryID", "history_entry_id"):
+        value = entry.get(key)
+        if value is not None:
+            return str(value)
+    return None
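
Review note: `undo` and `redo` above both rely on `_latest_history_entry_id` to choose which history entry to send to `/command/core/undo-redo`. The sketch below restates that selection rule on its own, so it can be checked without a running backend. The `sample_history` payload and the simplified `latest_entry_id` helper are illustrative assumptions, not captured OpenRefine output; the real helper also accepts the alternative keys `historyEntryID` and `history_entry_id`.

```python
from __future__ import annotations

from typing import Any


def latest_entry_id(history: dict[str, Any], stack_name: str) -> str | None:
    """Pick the entry that an undo ("past") or redo ("future") should target."""
    entries = history.get(stack_name) or []
    if not entries:
        return None
    # Undo targets the most recently applied entry (end of "past");
    # redo targets the next pending entry (start of "future").
    entry = entries[-1] if stack_name == "past" else entries[0]
    value = entry.get("id")
    return None if value is None else str(value)


# Hypothetical history payload: two applied operations and one undone operation.
sample_history = {
    "past": [{"id": 101}, {"id": 102}],
    "future": [{"id": 103}],
}

undo_target = latest_entry_id(sample_history, "past")
redo_target = latest_entry_id(sample_history, "future")
empty_target = latest_entry_id({"past": [], "future": []}, "past")
print(undo_target, redo_target, empty_target)
```

With this rule, an empty stack yields `None`, which is the case where the backend wrapper raises `OpenRefineError` instead of posting a request.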
diff --git a/openrefine/agent-harness/cli_anything/openrefine/utils/repl_skin.py b/openrefine/agent-harness/cli_anything/openrefine/utils/repl_skin.py
new file mode 100644
index 000000000..bc1fb6d1d
--- /dev/null
+++ b/openrefine/agent-harness/cli_anything/openrefine/utils/repl_skin.py
@@ -0,0 +1,567 @@
+"""cli-anything REPL Skin — Unified terminal interface for all CLI harnesses.
+
+Copy this file into your CLI package at:
+    cli_anything/<software>/utils/repl_skin.py
+
+Usage:
+    from cli_anything.<software>.utils.repl_skin import ReplSkin
+
+    skin = ReplSkin("shotcut", version="1.0.0")
+    skin.print_banner()  # auto-detects repo-root or packaged SKILL.md
+    prompt_text = skin.prompt(project_name="my_video.mlt", modified=True)
+    skin.success("Project saved")
+    skin.error("File not found")
+    skin.warning("Unsaved changes")
+    skin.info("Processing 24 clips...")
+    skin.status("Track 1", "3 clips, 00:02:30")
+    skin.table(headers, rows)
+    skin.print_goodbye()
+"""
+
+import os
+import sys
+from pathlib import Path
+
+# ── ANSI color codes (no external deps for core styling) ──────────────
+
+_RESET = "\033[0m"
+_BOLD = "\033[1m"
+_DIM = "\033[2m"
+_ITALIC = "\033[3m"
+_UNDERLINE = "\033[4m"
+
+# Brand colors
+_CYAN = "\033[38;5;80m" # cli-anything brand cyan
+_CYAN_BG = "\033[48;5;80m"
+_WHITE = "\033[97m"
+_GRAY = "\033[38;5;245m"
+_DARK_GRAY = "\033[38;5;240m"
+_LIGHT_GRAY = "\033[38;5;250m"
+
+# Software accent colors — each software gets a unique accent
+_ACCENT_COLORS = {
+ "gimp": "\033[38;5;214m", # warm orange
+ "blender": "\033[38;5;208m", # deep orange
+ "inkscape": "\033[38;5;39m", # bright blue
+ "audacity": "\033[38;5;33m", # navy blue
+ "libreoffice": "\033[38;5;40m", # green
+ "obs_studio": "\033[38;5;55m", # purple
+ "kdenlive": "\033[38;5;69m", # slate blue
+ "shotcut": "\033[38;5;35m", # teal green
+}
+_DEFAULT_ACCENT = "\033[38;5;75m" # default sky blue
+
+# Status colors
+_GREEN = "\033[38;5;78m"
+_YELLOW = "\033[38;5;220m"
+_RED = "\033[38;5;196m"
+_BLUE = "\033[38;5;75m"
+_MAGENTA = "\033[38;5;176m"
+
+_SKILL_SOURCE_REPO = os.environ.get("CLI_ANYTHING_SKILL_REPO", "HKUDS/CLI-Anything")
+
+# ── Brand icon ────────────────────────────────────────────────────────
+
+# The cli-anything icon: a small colored diamond/chevron mark
+_ICON = f"{_CYAN}{_BOLD}◆{_RESET}"
+_ICON_SMALL = f"{_CYAN}▸{_RESET}"
+
+# ── Box drawing characters ────────────────────────────────────────────
+
+_H_LINE = "─"
+_V_LINE = "│"
+_TL = "╭"
+_TR = "╮"
+_BL = "╰"
+_BR = "╯"
+_T_DOWN = "┬"
+_T_UP = "┴"
+_T_RIGHT = "├"
+_T_LEFT = "┤"
+_CROSS = "┼"
+
+
+def _strip_ansi(text: str) -> str:
+    """Remove ANSI escape codes for length calculation."""
+    import re
+    return re.sub(r"\033\[[^m]*m", "", text)
+
+
+def _visible_len(text: str) -> int:
+    """Get visible length of text (excluding ANSI codes)."""
+    return len(_strip_ansi(text))
+
+
+def _display_home_path(path: str) -> str:
+    """Display a path relative to the home directory when possible."""
+    expanded = Path(path).expanduser().resolve()
+    home = Path.home().resolve()
+    try:
+        relative = expanded.relative_to(home)
+        return f"~/{relative.as_posix()}"
+    except ValueError:
+        return str(expanded)
+
+
+class ReplSkin:
+    """Unified REPL skin for cli-anything CLIs.
+
+    Provides consistent branding, prompts, and message formatting
+    across all CLI harnesses built with the cli-anything methodology.
+    """
+
+    def __init__(self, software: str, version: str = "1.0.0",
+                 history_file: str | None = None, skill_path: str | None = None):
+        """Initialize the REPL skin.
+
+        Args:
+            software: Software name (e.g., "gimp", "shotcut", "blender").
+            version: CLI version string.
+            history_file: Path for persistent command history.
+                Defaults to ~/.cli-anything-<software>/history
+            skill_path: Path to the SKILL.md file for agent discovery.
+                Auto-detected from the repo-root skills/ tree when present,
+                otherwise from the package's skills/ directory.
+                Displayed in banner for AI agents to know where to read skill info.
+        """
+        self.software = software.lower().replace("-", "_")
+        self.display_name = software.replace("_", " ").title()
+        self.version = version
+        software_aliases = {"iterm2_ctl": "iterm2"}
+        self.skill_slug = software_aliases.get(self.software, self.software).replace("_", "-")
+        self.skill_id = f"cli-anything-{self.skill_slug}"
+        self.skill_install_cmd = (
+            f"npx skills add {_SKILL_SOURCE_REPO} --skill {self.skill_id} -g -y"
+        )
+        global_skill_root = Path(
+            os.environ.get("CLI_ANYTHING_GLOBAL_SKILLS_DIR", str(Path.home() / ".agents" / "skills"))
+        ).expanduser()
+        self.global_skill_path = str(global_skill_root / self.skill_id / "SKILL.md")
+
+        # Prefer repo-root canonical skills/<skill-id>/SKILL.md when running
+        # inside the CLI-Anything monorepo. Fall back to the packaged
+        # cli_anything/<software>/skills/SKILL.md for installed harnesses.
+        if skill_path is None:
+            package_skill = Path(__file__).resolve().parent.parent / "skills" / "SKILL.md"
+            repo_skill = None
+            for parent in Path(__file__).resolve().parents:
+                candidate = parent / "skills" / self.skill_id / "SKILL.md"
+                if candidate.is_file():
+                    repo_skill = candidate
+                    break
+            if repo_skill and repo_skill.is_file():
+                skill_path = str(repo_skill)
+            elif package_skill.is_file():
+                skill_path = str(package_skill)
+        self.skill_path = skill_path
+        self.accent = _ACCENT_COLORS.get(self.software, _DEFAULT_ACCENT)
+
+        # History file
+        if history_file is None:
+            hist_dir = Path.home() / f".cli-anything-{self.software}"
+            hist_dir.mkdir(parents=True, exist_ok=True)
+            self.history_file = str(hist_dir / "history")
+        else:
+            self.history_file = history_file
+
+        # Detect terminal capabilities
+        self._color = self._detect_color_support()
+
+    def _detect_color_support(self) -> bool:
+        """Check if terminal supports color."""
+        if os.environ.get("NO_COLOR"):
+            return False
+        if os.environ.get("CLI_ANYTHING_NO_COLOR"):
+            return False
+        if not hasattr(sys.stdout, "isatty"):
+            return False
+        return sys.stdout.isatty()
+
+    def _c(self, code: str, text: str) -> str:
+        """Apply color code if colors are supported."""
+        if not self._color:
+            return text
+        return f"{code}{text}{_RESET}"
+
+    # ── Banner ────────────────────────────────────────────────────────
+
+    def print_banner(self):
+        """Print the startup banner with branding."""
+        import textwrap
+
+        inner = 72
+
+        def _box_line(content: str) -> str:
+            """Wrap content in box drawing, padding to inner width."""
+            pad = inner - _visible_len(content)
+            vl = self._c(_DARK_GRAY, _V_LINE)
+            return f"{vl}{content}{' ' * max(0, pad)}{vl}"
+
+        def _meta_lines(label: str, value: str) -> list[str]:
+            """Wrap a metadata line for the banner box."""
+            icon = self._c(_MAGENTA, "◇")
+            label_text = self._c(_DARK_GRAY, label)
+            prefix = f" {icon} {label_text} "
+            available = max(12, inner - _visible_len(prefix))
+            wrapped = textwrap.wrap(
+                value,
+                width=available,
+                break_long_words=True,
+                break_on_hyphens=False,
+            ) or [""]
+            lines = [f"{prefix}{self._c(_LIGHT_GRAY, wrapped[0])}"]
+            continuation_prefix = " " * _visible_len(prefix)
+            for chunk in wrapped[1:]:
+                lines.append(f"{continuation_prefix}{self._c(_LIGHT_GRAY, chunk)}")
+            return lines
+
+        top = self._c(_DARK_GRAY, f"{_TL}{_H_LINE * inner}{_TR}")
+        bot = self._c(_DARK_GRAY, f"{_BL}{_H_LINE * inner}{_BR}")
+
+        # Title: ◆ cli-anything · Shotcut
+        icon = self._c(_CYAN + _BOLD, "◆")
+        brand = self._c(_CYAN + _BOLD, "cli-anything")
+        dot = self._c(_DARK_GRAY, "·")
+        name = self._c(self.accent + _BOLD, self.display_name)
+        title = f" {icon} {brand} {dot} {name}"
+
+        ver = f" {self._c(_DARK_GRAY, f' v{self.version}')}"
+        tip = f" {self._c(_DARK_GRAY, ' Type help for commands, quit to exit')}"
+        empty = ""
+
+        meta_lines: list[str] = []
+        meta_lines.extend(_meta_lines("Install:", self.skill_install_cmd))
+        meta_lines.extend(_meta_lines("Global skill:", _display_home_path(self.global_skill_path)))
+        print(top)
+        print(_box_line(title))
+        print(_box_line(ver))
+        for line in meta_lines:
+            print(_box_line(line))
+        print(_box_line(empty))
+        print(_box_line(tip))
+        print(bot)
+        print()
+
+    # ── Prompt ────────────────────────────────────────────────────────
+
+    def prompt(self, project_name: str = "", modified: bool = False,
+               context: str = "") -> str:
+        """Build a styled prompt string for prompt_toolkit or input().
+
+        Args:
+            project_name: Current project name (empty if none open).
+            modified: Whether the project has unsaved changes.
+            context: Optional extra context to show in prompt.
+
+        Returns:
+            Formatted prompt string.
+        """
+        parts = []
+
+        # Icon
+        if self._color:
+            parts.append(f"{_CYAN}◆{_RESET} ")
+        else:
+            parts.append("> ")
+
+        # Software name
+        parts.append(self._c(self.accent + _BOLD, self.software))
+
+        # Project context
+        if project_name or context:
+            ctx = context or project_name
+            mod = "*" if modified else ""
+            parts.append(f" {self._c(_DARK_GRAY, '[')}")
+            parts.append(self._c(_LIGHT_GRAY, f"{ctx}{mod}"))
+            parts.append(self._c(_DARK_GRAY, ']'))
+
+        parts.append(self._c(_GRAY, " ❯ "))
+
+        return "".join(parts)
+
+    def prompt_tokens(self, project_name: str = "", modified: bool = False,
+                      context: str = ""):
+        """Build prompt_toolkit formatted text tokens for the prompt.
+
+        Use with prompt_toolkit's FormattedText for proper ANSI handling.
+
+        Returns:
+            list of (style, text) tuples for prompt_toolkit.
+        """
+        accent_hex = _ANSI_256_TO_HEX.get(self.accent, "#5fafff")
+        tokens = []
+
+        tokens.append(("class:icon", "◆ "))
+        tokens.append(("class:software", self.software))
+
+        if project_name or context:
+            ctx = context or project_name
+            mod = "*" if modified else ""
+            tokens.append(("class:bracket", " ["))
+            tokens.append(("class:context", f"{ctx}{mod}"))
+            tokens.append(("class:bracket", "]"))
+
+        tokens.append(("class:arrow", " ❯ "))
+
+        return tokens
+
+    def get_prompt_style(self):
+        """Get a prompt_toolkit Style object matching the skin.
+
+        Returns:
+            prompt_toolkit.styles.Style, or None if prompt_toolkit is unavailable.
+        """
+        try:
+            from prompt_toolkit.styles import Style
+        except ImportError:
+            return None
+
+        accent_hex = _ANSI_256_TO_HEX.get(self.accent, "#5fafff")
+
+        return Style.from_dict({
+            "icon": "#5fd7d7 bold",  # brand cyan (ANSI 80, matches _ANSI_256_TO_HEX)
+            "software": f"{accent_hex} bold",
+            "bracket": "#585858",
+            "context": "#bcbcbc",
+            "arrow": "#808080",
+            # Completion menu
+            "completion-menu.completion": "bg:#303030 #bcbcbc",
+            "completion-menu.completion.current": f"bg:{accent_hex} #000000",
+            "completion-menu.meta.completion": "bg:#303030 #808080",
+            "completion-menu.meta.completion.current": f"bg:{accent_hex} #000000",
+            # Auto-suggest
+            "auto-suggest": "#585858",
+            # Bottom toolbar
+            "bottom-toolbar": "bg:#1c1c1c #808080",
+            "bottom-toolbar.text": "#808080",
+        })
+
+    # ── Messages ──────────────────────────────────────────────────────
+
+    def success(self, message: str):
+        """Print a success message with green checkmark."""
+        icon = self._c(_GREEN + _BOLD, "✓")
+        print(f" {icon} {self._c(_GREEN, message)}")
+
+    def error(self, message: str):
+        """Print an error message with red cross."""
+        icon = self._c(_RED + _BOLD, "✗")
+        print(f" {icon} {self._c(_RED, message)}", file=sys.stderr)
+
+    def warning(self, message: str):
+        """Print a warning message with yellow triangle."""
+        icon = self._c(_YELLOW + _BOLD, "⚠")
+        print(f" {icon} {self._c(_YELLOW, message)}")
+
+    def info(self, message: str):
+        """Print an info message with blue dot."""
+        icon = self._c(_BLUE, "●")
+        print(f" {icon} {self._c(_LIGHT_GRAY, message)}")
+
+    def hint(self, message: str):
+        """Print a subtle hint message."""
+        print(f" {self._c(_DARK_GRAY, message)}")
+
+    def section(self, title: str):
+        """Print a section header."""
+        print()
+        print(f" {self._c(self.accent + _BOLD, title)}")
+        print(f" {self._c(_DARK_GRAY, _H_LINE * len(title))}")
+
+    # ── Status display ────────────────────────────────────────────────
+
+    def status(self, label: str, value: str):
+        """Print a key-value status line."""
+        lbl = self._c(_GRAY, f" {label}:")
+        val = self._c(_WHITE, f" {value}")
+        print(f"{lbl}{val}")
+
+    def status_block(self, items: dict[str, str], title: str = ""):
+        """Print a block of status key-value pairs.
+
+        Args:
+            items: Dict of label -> value pairs.
+            title: Optional title for the block.
+        """
+        if title:
+            self.section(title)
+
+        max_key = max(len(k) for k in items) if items else 0
+        for label, value in items.items():
+            lbl = self._c(_GRAY, f" {label:<{max_key}}")
+            val = self._c(_WHITE, f" {value}")
+            print(f"{lbl}{val}")
+
+    def progress(self, current: int, total: int, label: str = ""):
+        """Print a simple progress indicator.
+
+        Args:
+            current: Current step number.
+            total: Total number of steps.
+            label: Optional label for the progress.
+        """
+        pct = int(current / total * 100) if total > 0 else 0
+        bar_width = 20
+        filled = int(bar_width * current / total) if total > 0 else 0
+        bar = "█" * filled + "░" * (bar_width - filled)
+        text = f" {self._c(_CYAN, bar)} {self._c(_GRAY, f'{pct:3d}%')}"
+        if label:
+            text += f" {self._c(_LIGHT_GRAY, label)}"
+        print(text)
+
+    # ── Table display ─────────────────────────────────────────────────
+
+    def table(self, headers: list[str], rows: list[list[str]],
+              max_col_width: int = 40):
+        """Print a formatted table with box-drawing characters.
+
+        Args:
+            headers: Column header strings.
+            rows: List of rows, each a list of cell strings.
+            max_col_width: Maximum column width before truncation.
+        """
+        if not headers:
+            return
+
+        # Calculate column widths
+        col_widths = [min(len(h), max_col_width) for h in headers]
+        for row in rows:
+            for i, cell in enumerate(row):
+                if i < len(col_widths):
+                    col_widths[i] = min(
+                        max(col_widths[i], len(str(cell))), max_col_width
+                    )
+
+        def pad(text: str, width: int) -> str:
+            t = str(text)[:width]
+            return t + " " * (width - len(t))
+
+        # Header
+        header_cells = [
+            self._c(_CYAN + _BOLD, pad(h, col_widths[i]))
+            for i, h in enumerate(headers)
+        ]
+        sep = self._c(_DARK_GRAY, f" {_V_LINE} ")
+        header_line = f" {sep.join(header_cells)}"
+        print(header_line)
+
+        # Separator
+        sep_parts = [self._c(_DARK_GRAY, _H_LINE * w) for w in col_widths]
+        sep_line = self._c(_DARK_GRAY, f" {'───'.join([_H_LINE * w for w in col_widths])}")
+        print(sep_line)
+
+        # Rows
+        for row in rows:
+            cells = []
+            for i, cell in enumerate(row):
+                if i < len(col_widths):
+                    cells.append(self._c(_LIGHT_GRAY, pad(str(cell), col_widths[i])))
+            row_sep = self._c(_DARK_GRAY, f" {_V_LINE} ")
+            print(f" {row_sep.join(cells)}")
+
+    # ── Help display ──────────────────────────────────────────────────
+
+    def help(self, commands: dict[str, str]):
+        """Print a formatted help listing.
+
+        Args:
+            commands: Dict of command -> description pairs.
+        """
+        self.section("Commands")
+        max_cmd = max(len(c) for c in commands) if commands else 0
+        for cmd, desc in commands.items():
+            cmd_styled = self._c(self.accent, f" {cmd:<{max_cmd}}")
+            desc_styled = self._c(_GRAY, f" {desc}")
+            print(f"{cmd_styled}{desc_styled}")
+        print()
+
+    # ── Goodbye ───────────────────────────────────────────────────────
+
+    def print_goodbye(self):
+        """Print a styled goodbye message."""
+        print(f"\n {_ICON_SMALL} {self._c(_GRAY, 'Goodbye!')}\n")
+
+    # ── Prompt toolkit session factory ────────────────────────────────
+
+    def create_prompt_session(self):
+        """Create a prompt_toolkit PromptSession with skin styling.
+
+        Returns:
+            A configured PromptSession, or None if prompt_toolkit unavailable.
+        """
+        try:
+            from prompt_toolkit import PromptSession
+            from prompt_toolkit.history import FileHistory
+            from prompt_toolkit.auto_suggest import AutoSuggestFromHistory
+            from prompt_toolkit.formatted_text import FormattedText
+
+            style = self.get_prompt_style()
+
+            session = PromptSession(
+                history=FileHistory(self.history_file),
+                auto_suggest=AutoSuggestFromHistory(),
+                style=style,
+                enable_history_search=True,
+            )
+            return session
+        except ImportError:
+            return None
+
+    def get_input(self, pt_session, project_name: str = "",
+                  modified: bool = False, context: str = "") -> str:
+        """Get input from user using prompt_toolkit or fallback.
+
+        Args:
+            pt_session: A prompt_toolkit PromptSession (or None).
+            project_name: Current project name.
+            modified: Whether project has unsaved changes.
+            context: Optional context string.
+
+        Returns:
+            User input string (stripped).
+        """
+        if pt_session is not None:
+            from prompt_toolkit.formatted_text import FormattedText
+            tokens = self.prompt_tokens(project_name, modified, context)
+            return pt_session.prompt(FormattedText(tokens)).strip()
+        else:
+            raw_prompt = self.prompt(project_name, modified, context)
+            return input(raw_prompt).strip()
+
+    # ── Toolbar builder ───────────────────────────────────────────────
+
+    def bottom_toolbar(self, items: dict[str, str]):
+        """Create a bottom toolbar callback for prompt_toolkit.
+
+        Args:
+            items: Dict of label -> value pairs to show in toolbar.
+
+        Returns:
+            A callable that returns FormattedText for the toolbar.
+        """
+        def toolbar():
+            from prompt_toolkit.formatted_text import FormattedText
+            parts = []
+            for i, (k, v) in enumerate(items.items()):
+                if i > 0:
+                    parts.append(("class:bottom-toolbar.text", " │ "))
+                parts.append(("class:bottom-toolbar.text", f" {k}: "))
+                parts.append(("class:bottom-toolbar", v))
+            return FormattedText(parts)
+        return toolbar
+
+
+# ── ANSI 256-color to hex mapping (for prompt_toolkit styles) ─────────
+
+_ANSI_256_TO_HEX = {
+ "\033[38;5;33m": "#0087ff", # audacity navy blue
+ "\033[38;5;35m": "#00af5f", # shotcut teal
+ "\033[38;5;39m": "#00afff", # inkscape bright blue
+ "\033[38;5;40m": "#00d700", # libreoffice green
+ "\033[38;5;55m": "#5f00af", # obs purple
+ "\033[38;5;69m": "#5f87ff", # kdenlive slate blue
+ "\033[38;5;75m": "#5fafff", # default sky blue
+ "\033[38;5;80m": "#5fd7d7", # brand cyan
+ "\033[38;5;208m": "#ff8700", # blender deep orange
+ "\033[38;5;214m": "#ffaf00", # gimp warm orange
+}
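
Review note: the hex values in `_ANSI_256_TO_HEX` can be derived rather than hand-maintained, because xterm's 256-color palette places indices 16 to 231 on a 6x6x6 RGB cube and indices 232 to 255 on a grayscale ramp. The sketch below is a hypothetical helper (`ansi256_to_hex` is not part of the harness) that recomputes the table entries and reports any mismatch.

```python
# Channel intensities used by the xterm 6x6x6 color cube (indices 16..231).
_CUBE_LEVELS = (0, 95, 135, 175, 215, 255)


def ansi256_to_hex(index: int) -> str:
    """Convert an xterm 256-color index (16..255) to a #rrggbb string."""
    if 16 <= index <= 231:
        offset = index - 16
        r = _CUBE_LEVELS[offset // 36]
        g = _CUBE_LEVELS[(offset % 36) // 6]
        b = _CUBE_LEVELS[offset % 6]
    elif 232 <= index <= 255:
        # Grayscale ramp: 24 steps from 8 to 238.
        r = g = b = 8 + 10 * (index - 232)
    else:
        # Indices 0..15 depend on the terminal theme, so no fixed hex exists.
        raise ValueError(f"index {index} has no fixed RGB value")
    return f"#{r:02x}{g:02x}{b:02x}"


# Values copied from the _ANSI_256_TO_HEX table in the diff above.
table_from_diff = {
    33: "#0087ff", 35: "#00af5f", 39: "#00afff", 40: "#00d700", 55: "#5f00af",
    69: "#5f87ff", 75: "#5fafff", 80: "#5fd7d7", 208: "#ff8700", 214: "#ffaf00",
}

mismatches = {
    index: (expected, ansi256_to_hex(index))
    for index, expected in table_from_diff.items()
    if ansi256_to_hex(index) != expected
}
print(mismatches)
```

An empty `mismatches` dict means every table entry agrees with the cube formula. A harness with a new accent color could call such a helper instead of adding another row by hand.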
diff --git a/openrefine/agent-harness/coverage.matrix.json b/openrefine/agent-harness/coverage.matrix.json
new file mode 100644
index 000000000..35ffb1d8c
--- /dev/null
+++ b/openrefine/agent-harness/coverage.matrix.json
@@ -0,0 +1,78 @@
+{
+ "software": "OpenRefine",
+ "workflows": [
+ {
+ "use_case": "Import messy CSV files into OpenRefine projects and inspect project metadata.",
+ "cli_commands": [
+ "cli-anything-openrefine project import --name --json",
+ "cli-anything-openrefine project list --json",
+ "cli-anything-openrefine data rows --limit 2 --json"
+ ],
+ "backend_interfaces": [
+ "POST /command/core/create-project-from-upload",
+ "GET /command/core/get-project-metadata",
+ "GET /command/core/get-rows"
+ ],
+ "unit_tests": [
+ "test_service_import_file_persists_project",
+ "test_service_list_projects",
+ "test_service_rows_uses_project_override"
+ ],
+ "e2e_tests": [
+ "test_e2e_import_csv_and_metadata",
+ "test_e2e_get_rows_after_import",
+ "test_e2e_cli_json_import_rows_export_workflow"
+ ]
+ },
+ {
+ "use_case": "Build reusable operation histories, apply them to projects, and export cleaned rows.",
+ "cli_commands": [
+ "cli-anything-openrefine ops text-transform --column Name --expression value.trim() --json",
+ "cli-anything-openrefine data apply --json",
+ "cli-anything-openrefine data export --format csv --json"
+ ],
+ "backend_interfaces": [
+ "POST /command/core/apply-operations",
+ "POST /command/core/export-rows"
+ ],
+ "unit_tests": [
+ "test_text_transform_shape",
+ "test_save_and_load_operations_roundtrip",
+ "test_service_apply_operations_uses_session_project",
+ "test_service_export_writes_output_and_session"
+ ],
+ "e2e_tests": [
+ "test_e2e_apply_text_transform_and_export_csv",
+ "test_e2e_apply_mass_edit_normalizes_city",
+ "test_e2e_cli_build_apply_operation_file"
+ ]
+ },
+ {
+ "use_case": "Persist CLI session state, report backend health, and recover with undo, redo, and project deletion.",
+ "cli_commands": [
+ "cli-anything-openrefine server ping --json",
+ "cli-anything-openrefine session show --json",
+ "cli-anything-openrefine session undo --json",
+ "cli-anything-openrefine session redo --json"
+ ],
+ "backend_interfaces": [
+ "GET /command/core/get-version",
+ "POST /command/core/undo-redo",
+ "POST /command/core/delete-project",
+ "GET /command/core/get-all-project-metadata"
+ ],
+ "unit_tests": [
+ "test_session_save_creates_parent_and_loads",
+ "test_session_undo_moves_to_future",
+ "test_session_redo_moves_to_history",
+ "test_service_open_project_persists_session"
+ ],
+ "e2e_tests": [
+ "test_e2e_backend_ping_reports_version",
+ "test_e2e_cli_session_persistence",
+ "test_e2e_backend_undo_redo_after_transform",
+ "test_e2e_recovery_delete_project_removes_from_listing"
+ ]
+ }
+ ]
+}
diff --git a/openrefine/agent-harness/e2e.backend.json b/openrefine/agent-harness/e2e.backend.json
new file mode 100644
index 000000000..ca0c5bb0c
--- /dev/null
+++ b/openrefine/agent-harness/e2e.backend.json
@@ -0,0 +1,28 @@
+{
+ "name": "openrefine",
+ "backend_type": "local-http-server",
+ "start_command": [
+ "openrefine",
+ "-i",
+ "127.0.0.1",
+ "-p",
+ "3333"
+ ],
+ "provisioning": {
+ "download_url": "https://github.com/OpenRefine/OpenRefine/releases/download/3.10.1/openrefine-linux-3.10.1.tar.gz",
+ "extract_note": "Extract the OpenRefine release tarball and run the openrefine command, or the bundled refine executable, with -i 127.0.0.1 -p 3333.",
+ "data_dir": "/tmp/openrefine-data"
+ },
+ "readiness": {
+ "type": "http",
+ "url": "http://127.0.0.1:3333/command/core/get-version",
+ "timeout_seconds": 60
+ },
+ "e2e_command": [
+ "python3",
+ "-m",
+ "pytest",
+ "cli_anything/openrefine/tests/test_full_e2e.py",
+ "-q"
+ ]
+}
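
Review note: the `readiness` block above describes an HTTP poll against `/command/core/get-version` with a 60-second timeout. A minimal sketch of such a loop follows, assuming a runner that consumes this file. The function name, the `probe` callable, and the injectable `sleep` and `clock` parameters are all hypothetical; in real use `probe` would issue the HTTP GET and return `True` on a 2xx response.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_seconds: float = 60.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it succeeds or the timeout elapses."""
    deadline = clock() + timeout_seconds
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Connection refused and similar errors mean "not up yet".
            pass
        if clock() >= deadline:
            return False
        sleep(interval)


# Fake probe for illustration: refuses twice, then reports ready.
attempts = {"count": 0}


def fake_probe() -> bool:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionRefusedError("backend still starting")
    return True


ready = wait_until_ready(fake_probe, timeout_seconds=5.0, sleep=lambda _: None)
timed_out = wait_until_ready(lambda: False, timeout_seconds=0.0, sleep=lambda _: None)
print(ready, attempts["count"], timed_out)
```

Using a monotonic clock for the deadline keeps the loop unaffected by wall-clock adjustments, and injecting `sleep` lets the loop be exercised without real delays.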
diff --git a/openrefine/agent-harness/setup.py b/openrefine/agent-harness/setup.py
new file mode 100644
index 000000000..3e779ef0c
--- /dev/null
+++ b/openrefine/agent-harness/setup.py
@@ -0,0 +1,29 @@
+from setuptools import find_namespace_packages, setup
+
+
+setup(
+ name="cli-anything-openrefine",
+ version="1.0.0",
+ description="CLI-Anything harness for OpenRefine data wrangling workflows",
+ long_description="Agent-native Click CLI for OpenRefine's local HTTP API, operation histories, exports, and sessions.",
+ author="CLI-Anything-Team",
+ author_email="",
+ maintainer="CLI-Anything-Team",
+ url="https://github.com/HKUDS/CLI-Anything",
+ python_requires=">=3.10",
+ packages=find_namespace_packages(include=["cli_anything.*"]),
+ install_requires=[
+ "click>=8.0",
+ "requests>=2.28",
+ "prompt-toolkit>=3.0",
+ ],
+ extras_require={"dev": ["pytest>=7.0"]},
+ package_data={
+ "cli_anything.openrefine": ["skills/*.md"],
+ },
+ entry_points={
+ "console_scripts": [
+ "cli-anything-openrefine=cli_anything.openrefine.openrefine_cli:main",
+ ],
+ },
+)
diff --git a/openrefine/agent-harness/skills/cli-anything-openrefine/SKILL.md b/openrefine/agent-harness/skills/cli-anything-openrefine/SKILL.md
new file mode 100644
index 000000000..39679515b
--- /dev/null
+++ b/openrefine/agent-harness/skills/cli-anything-openrefine/SKILL.md
@@ -0,0 +1,10 @@
+---
+name: "cli-anything-openrefine"
+description: "Use OpenRefine through an agent-native CLI for importing messy data, applying JSON operation histories, inspecting rows, exporting cleaned data, and managing session undo/redo."
+contributor: "CLI-Anything-Team"
+---
+
+# CLI-Anything OpenRefine
+
+This compatibility copy mirrors `skills/cli-anything-openrefine/SKILL.md` at the standalone output root.
+Use `cli-anything-openrefine --json` for project import, operation-history application, row export, and session undo/redo against a running OpenRefine server.
diff --git a/registry.json b/registry.json
index ffa3928b7..1068eae4d 100644
--- a/registry.json
+++ b/registry.json
@@ -24,6 +24,25 @@
}
]
},
+ {
+ "name": "openrefine",
+ "display_name": "OpenRefine",
+ "version": "1.0.0",
+ "description": "Agent-native CLI for OpenRefine import, operation-history cleaning, row inspection, export, and session undo/redo through the real local HTTP API.",
+ "requires": "OpenRefine 3.10.x or newer running as a local web server",
+ "homepage": "https://openrefine.org/",
+ "source_url": null,
+ "install_cmd": "pip install git+https://github.com/HKUDS/CLI-Anything.git#subdirectory=openrefine/agent-harness",
+ "entry_point": "cli-anything-openrefine",
+ "skill_md": "skills/cli-anything-openrefine/SKILL.md",
+ "category": "database",
+ "contributors": [
+ {
+ "name": "CLI-Anything-Team",
+ "url": "https://github.com/HKUDS/CLI-Anything"
+ }
+ ]
+ },
{
"name": "cc-switch",
"display_name": "CC Switch",
diff --git a/skills/cli-anything-openrefine/SKILL.md b/skills/cli-anything-openrefine/SKILL.md
new file mode 100644
index 000000000..2ef724c88
--- /dev/null
+++ b/skills/cli-anything-openrefine/SKILL.md
@@ -0,0 +1,56 @@
+---
+name: "cli-anything-openrefine"
+description: "Use OpenRefine through an agent-native CLI for importing messy data, applying JSON operation histories, inspecting rows, exporting cleaned data, and managing session undo/redo."
+contributor: "CLI-Anything-Team"
+---
+
+# CLI-Anything OpenRefine
+
+Use this skill when a task needs OpenRefine data cleaning, transformation, reusable operation histories, or CSV/TSV export from an automated agent workflow.
+
+## Prerequisites
+
+Install the harness:
+
+```bash
+cd openrefine/agent-harness
+python -m pip install -e .
+```
+
+Start OpenRefine before backend commands:
+
+```bash
+openrefine -i 127.0.0.1 -p 3333
+```
+
+The default server is `http://127.0.0.1:3333`. To target a different one, set the `OPENREFINE_URL` environment variable or pass `--base-url`.
+
+## Command Rules For Agents
+
+- Prefer `--json` on every one-shot command.
+- Use `--session` with a session file path, such as `run/session.json`, for isolated task state.
+- Import or open a project before row, apply, export, undo, or redo commands.
+- Existing OpenRefine operation-history JSON can be passed directly to `data apply`.
+- Generated files are standard OpenRefine operation-history JSON and exported CSV/TSV data.
+
+## Common Commands
+
+```bash
+cli-anything-openrefine --json server ping
+cli-anything-openrefine --json project list
+cli-anything-openrefine --json --session run/session.json project import messy.csv --name cleanup
+cli-anything-openrefine --json --session run/session.json data rows --limit 10
+cli-anything-openrefine --json ops text-transform run/trim.json --column Name --expression 'value.trim()'
+cli-anything-openrefine --json --session run/session.json data apply run/trim.json
+cli-anything-openrefine --json --session run/session.json data export run/clean.csv --format csv
+cli-anything-openrefine --json --session run/session.json session undo
+cli-anything-openrefine --json --session run/session.json session redo
+```
+
+## REPL
+
+Run `cli-anything-openrefine` with no subcommand to enter the REPL.
+
+## Error Handling
+
+When `--json` is set, command failures write a JSON object to stderr with `ok: false`.