diff --git a/README.md b/README.md index 2d208d0ea..95d361434 100644 --- a/README.md +++ b/README.md @@ -1068,6 +1068,13 @@ Each application received complete, production-ready CLI interfaces — not demo ✅ 158 +OpenRefine +Data Cleaning +cli-anything-openrefine +OpenRefine local HTTP API +✅ 76 + + n8n Workflow Automation cli-anything-n8n @@ -1436,6 +1443,7 @@ cli-anything/ ├── 🌐 browser/agent-harness/ # Browser CLI (DOMShell MCP, new) ├── 🌐 web-yu-pri/agent-harness/ # Japan Post Web Yu-pri CLI (new) ├── 📄 libreoffice/agent-harness/ # LibreOffice CLI (158 tests) +├── 🧹 openrefine/agent-harness/ # OpenRefine CLI (76 tests: 64 unit + 12 real backend e2e) ├── 📧 mailchimp/agent-harness/ # Mailchimp Marketing API CLI (303 commands, 36 unit tests) ├── 📚 zotero/agent-harness/ # Zotero CLI (new, write import support) ├── 📖 calibre/agent-harness/ # Calibre CLI (58 tests: 38 unit + 20 E2E) diff --git a/openrefine/agent-harness/OPENREFINE.md b/openrefine/agent-harness/OPENREFINE.md new file mode 100644 index 000000000..71dff270c --- /dev/null +++ b/openrefine/agent-harness/OPENREFINE.md @@ -0,0 +1,97 @@ +# OpenRefine CLI-Anything Harness + +This harness exposes OpenRefine's documented local HTTP API as a stateful, agent-friendly Click CLI. +It does not reimplement OpenRefine data cleaning. Project creation, row reads, operation application, +export, and undo/redo are delegated to a running OpenRefine backend. + +## Backend Boundary + +- Default backend URL: `http://127.0.0.1:3333` +- Override with `OPENREFINE_URL` or `--base-url` +- Expected backend: OpenRefine 3.10.x or newer +- Startup example: `openrefine -i 127.0.0.1 -p 3333` + +The backend wrapper lives at `cli_anything/openrefine/utils/openrefine_backend.py`. +It wraps these OpenRefine surfaces: + +- `/command/core/get-version` +- `/command/core/get-all-project-metadata` +- `/command/core/get-project-metadata` +- `/command/core/create-project-from-upload` +- `/command/core/get-rows` +- `/command/core/apply-operations` +- `/command/core/export-rows` +- `/command/core/get-history` +- `/command/core/get-csrf-token` +- `/command/core/undo-redo` +- `/command/core/delete-project` + +## CLI Model + +The entry point is `cli-anything-openrefine`. + +Running the command with no subcommand enters the default REPL. One-shot commands are grouped by domain: + +- `server`: backend start and ping helpers +- `project`: list, open, and import OpenRefine projects +- `data`: inspect rows, apply operation histories, export rows +- `ops`: generate reusable OpenRefine operation-history JSON +- `session`: show state and call undo/redo + +All commands accept global `--json` for machine-readable output. + +## State Model + +Session state is JSON and defaults to `~/.cli-anything-openrefine/session.json`. +Use `--session ` for isolated automation runs. + +The session stores: + +- backend URL +- selected project id and name +- last export path +- local action history +- redo stack + +Undo/redo uses OpenRefine's backend undo-redo endpoint when a project is selected. If no backend project is selected, +the session store can still undo/redo local action history. + +## Operation Histories + +The harness passes OpenRefine operation JSON through to the backend. It also provides small builders for common operations: + +```bash +cli-anything-openrefine ops text-transform ops.json --column Name --expression 'value.trim()' +cli-anything-openrefine ops mass-edit ops.json --column City --edit NYC='New York' +cli-anything-openrefine data apply ops.json --project-id 123456789 +``` + +Agents can also provide existing OpenRefine operation-history JSON exported from the UI. + +## Install + +```bash +cd openrefine/agent-harness +python -m pip install -e . +``` + +## Test + +Backend-free unit tests: + +```bash +python -m pytest cli_anything/openrefine/tests/test_core.py -v +``` + +Real backend E2E tests: + +```bash +openrefine -i 127.0.0.1 -p 3333 +python -m pytest cli_anything/openrefine/tests/test_full_e2e.py -v +``` + +## Limitations + +- The OpenRefine HTTP API is documented as subject to change. This harness targets OpenRefine 3.10.x API behavior. +- Reconciliation-specific commands are not first-class yet; agents can still apply exported reconciliation operation histories. +- Long-running operations are synchronous from the harness perspective and rely on backend HTTP completion. diff --git a/openrefine/agent-harness/README.md b/openrefine/agent-harness/README.md new file mode 100644 index 000000000..bf2436798 --- /dev/null +++ b/openrefine/agent-harness/README.md @@ -0,0 +1,22 @@ +# OpenRefine Agent Harness + +This is the standalone CLI-Anything harness package for OpenRefine. + +Install: + +```bash +python -m pip install -e . +``` + +Run: + +```bash +cli-anything-openrefine --help +cli-anything-openrefine +``` + +Start OpenRefine first for backend commands: + +```bash +openrefine -i 127.0.0.1 -p 3333 +``` diff --git a/openrefine/agent-harness/cli_anything/openrefine/README.md b/openrefine/agent-harness/cli_anything/openrefine/README.md new file mode 100644 index 000000000..0eac1021b --- /dev/null +++ b/openrefine/agent-harness/cli_anything/openrefine/README.md @@ -0,0 +1,19 @@ +# CLI-Anything OpenRefine + +Agent-native CLI for OpenRefine data wrangling through the real local HTTP API. + +```bash +cli-anything-openrefine --json project import messy.csv --name cleanup +cli-anything-openrefine --json data rows --limit 5 +cli-anything-openrefine ops text-transform trim-name.json --column Name --expression 'value.trim()' +cli-anything-openrefine --json data apply trim-name.json +cli-anything-openrefine --json data export clean.csv +``` + +Run `cli-anything-openrefine` with no arguments for the REPL. + +Start OpenRefine first: + +```bash +openrefine -i 127.0.0.1 -p 3333 +``` diff --git a/openrefine/agent-harness/cli_anything/openrefine/__init__.py b/openrefine/agent-harness/cli_anything/openrefine/__init__.py new file mode 100644 index 000000000..f22f701a0 --- /dev/null +++ b/openrefine/agent-harness/cli_anything/openrefine/__init__.py @@ -0,0 +1,3 @@ +"""CLI-Anything harness for OpenRefine.""" + +__version__ = "1.0.0" diff --git a/openrefine/agent-harness/cli_anything/openrefine/__main__.py b/openrefine/agent-harness/cli_anything/openrefine/__main__.py new file mode 100644 index 000000000..66a205a9c --- /dev/null +++ b/openrefine/agent-harness/cli_anything/openrefine/__main__.py @@ -0,0 +1,5 @@ +from .openrefine_cli import main + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/openrefine/agent-harness/cli_anything/openrefine/core/__init__.py b/openrefine/agent-harness/cli_anything/openrefine/core/__init__.py new file mode 100644 index 000000000..d3c83fa8e --- /dev/null +++ b/openrefine/agent-harness/cli_anything/openrefine/core/__init__.py @@ -0,0 +1 @@ +"""Core OpenRefine harness primitives.""" diff --git a/openrefine/agent-harness/cli_anything/openrefine/core/operations.py b/openrefine/agent-harness/cli_anything/openrefine/core/operations.py new file mode 100644 index 000000000..316d75ee6 --- /dev/null +++ b/openrefine/agent-harness/cli_anything/openrefine/core/operations.py @@ -0,0 +1,78 @@ +from __future__ import annotations + +import json +from pathlib import Path +from typing import Any + + +def load_operations(path: str | Path) -> list[dict[str, Any]]: + data = json.loads(Path(path).read_text(encoding="utf-8")) + if not isinstance(data, list): + raise ValueError("Operation history must be a JSON list") + for index, operation in enumerate(data): + if not isinstance(operation, dict): + raise ValueError(f"Operation {index} must be an object") + return data + + +def save_operations(operations: list[dict[str, Any]], path: str | Path) -> Path: + target = Path(path) + target.parent.mkdir(parents=True, exist_ok=True) + target.write_text(json.dumps(operations, indent=2, sort_keys=True), encoding="utf-8") + return target + + +def text_transform(column: str, expression: str, on_error: str = "keep-original") -> dict[str, Any]: + _require_text("column", column) + _require_text("expression", expression) + return { + "op": "core/text-transform", + "engineConfig": {"mode": "row-based", "facets": []}, + "columnName": column, + "expression": expression, + "onError": on_error, + "repeat": False, + "repeatCount": 10, + "description": f"Text transform on {column} using expression {expression}", + } + + +def mass_edit(column: str, edits: dict[str, str]) -> dict[str, Any]: + _require_text("column", column) + if not edits: + raise ValueError("edits must not be empty") + normalized = [{"from": [str(src)], "fromBlank": False, "fromError": False, "to": str(dst)} for src, dst in edits.items()] + return { + "op": "core/mass-edit", + "engineConfig": {"mode": "row-based", "facets": []}, + "columnName": column, + "expression": "value", + "edits": normalized, + "description": f"Mass edit {len(edits)} value(s) in {column}", + } + + +def column_addition(name: str, source_column: str, expression: str) -> dict[str, Any]: + _require_text("name", name) + _require_text("source_column", source_column) + _require_text("expression", expression) + return { + "op": "core/column-addition", + "engineConfig": {"mode": "row-based", "facets": []}, + "baseColumnName": source_column, + "expression": expression, + "onError": "set-to-blank", + "newColumnName": name, + "columnInsertIndex": 1, + "description": f"Create column {name} from {source_column}", + } + + +def column_removal(column: str) -> dict[str, Any]: + _require_text("column", column) + return {"op": "core/column-removal", "columnName": column, "description": f"Remove column {column}"} + + +def _require_text(name: str, value: str) -> None: + if not isinstance(value, str) or not value.strip(): + raise ValueError(f"{name} must be a non-empty string") diff --git a/openrefine/agent-harness/cli_anything/openrefine/core/project.py b/openrefine/agent-harness/cli_anything/openrefine/core/project.py new file mode 100644 index 000000000..13c75d666 --- /dev/null +++ b/openrefine/agent-harness/cli_anything/openrefine/core/project.py @@ -0,0 +1,115 @@ +from __future__ import annotations + +from pathlib import Path +from typing import Any + +from .operations import load_operations +from .session import SessionState, SessionStore +from ..utils.openrefine_backend import OpenRefineBackend + + +class OpenRefineService: + def __init__(self, backend: OpenRefineBackend, store: SessionStore): + self.backend = backend + self.store = store + + def status(self) -> dict[str, Any]: + state = self.store.load() + ping = self.backend.ping() + return {"backend": ping, "session": state.to_dict()} + + def list_projects(self) -> dict[str, Any]: + return self.backend.list_projects() + + def open_project(self, project_id: str, name: str | None = None) -> dict[str, Any]: + metadata = self.backend.get_project_metadata(project_id) + state = self.store.load() + state.base_url = self._backend_base_url() + state.project_id = project_id + state.project_name = name or metadata.get("name") or metadata.get("projectName") or project_id + self.store.record(state, "open", {"project_id": project_id, "project_name": state.project_name}) + self.store.save(state) + return {"project_id": project_id, "project_name": state.project_name, "metadata": metadata} + + def import_file(self, path: str | Path, name: str | None = None, project_format: str | None = None) -> dict[str, Any]: + created = self.backend.create_project(path, name=name, project_format=project_format) + project_id = _extract_project_id(created) + state = self.store.load() + state.base_url = self._backend_base_url() + state.project_id = project_id + state.project_name = name or Path(path).stem + self.store.record(state, "import", {"path": str(path), "project_id": project_id, "project_name": state.project_name}) + self.store.save(state) + return {"project_id": project_id, "project_name": state.project_name, "response": created} + + def apply_operations_file(self, operations_path: str | Path, project_id: str | None = None) -> dict[str, Any]: + operations = load_operations(operations_path) + state = self.store.load() + target_id = project_id or state.project_id + if not target_id: + raise ValueError("No project selected. Pass --project-id or import/open a project first.") + response = self.backend.apply_operations(target_id, operations) + state.base_url = self._backend_base_url() + self.store.record(state, "apply-operations", {"project_id": target_id, "operations_path": str(operations_path), "count": len(operations)}) + state.project_id = target_id + self.store.save(state) + return {"project_id": target_id, "operation_count": len(operations), "response": response} + + def export_rows(self, output_path: str | Path, export_format: str = "csv", project_id: str | None = None) -> dict[str, Any]: + state = self.store.load() + target_id = project_id or state.project_id + if not target_id: + raise ValueError("No project selected. Pass --project-id or import/open a project first.") + output = self.backend.export_rows(target_id, output_path, export_format) + state.base_url = self._backend_base_url() + state.project_id = target_id + state.last_export = str(output) + self.store.record(state, "export", {"project_id": target_id, "output": str(output), "format": export_format}) + self.store.save(state) + return {"project_id": target_id, "output": str(output), "format": export_format, "bytes": output.stat().st_size} + + def rows(self, start: int = 0, limit: int = 10, project_id: str | None = None) -> dict[str, Any]: + state = self.store.load() + target_id = project_id or state.project_id + if not target_id: + raise ValueError("No project selected. Pass --project-id or import/open a project first.") + return self.backend.get_rows(target_id, start=start, limit=limit) + + def undo(self, project_id: str | None = None) -> dict[str, Any]: + state = self.store.load() + target_id = project_id or state.project_id + if not target_id: + local = self.store.undo(state) + self.store.save(state) + return {"mode": "session", "undone": local} + response = self.backend.undo(target_id) + state.base_url = self._backend_base_url() + local = self.store.undo(state) if state.history else None + self.store.save(state) + return {"mode": "backend", "project_id": target_id, "response": response, "local": local} + + def redo(self, project_id: str | None = None) -> dict[str, Any]: + state = self.store.load() + target_id = project_id or state.project_id + if not target_id: + local = self.store.redo(state) + self.store.save(state) + return {"mode": "session", "redone": local} + response = self.backend.redo(target_id) + state.base_url = self._backend_base_url() + local = self.store.redo(state) if state.future else None + self.store.save(state) + return {"mode": "backend", "project_id": target_id, "response": response, "local": local} + + def _backend_base_url(self) -> str: + return str(getattr(self.backend, "base_url", SessionState().base_url)) + + +def _extract_project_id(payload: dict[str, Any]) -> str: + for key in ("project", "projectID", "project_id", "id"): + value = payload.get(key) + if value: + return str(value) + if "Location" in payload: + return str(payload["Location"]).rstrip("/").split("/")[-1] + raise ValueError(f"Could not determine project id from OpenRefine response: {payload}") diff --git a/openrefine/agent-harness/cli_anything/openrefine/core/session.py b/openrefine/agent-harness/cli_anything/openrefine/core/session.py new file mode 100644 index 000000000..7bc8e01d3 --- /dev/null +++ b/openrefine/agent-harness/cli_anything/openrefine/core/session.py @@ -0,0 +1,111 @@ +from __future__ import annotations + +import json +import os +from dataclasses import dataclass, field +from pathlib import Path +from typing import Any + + +DEFAULT_SESSION = Path.home() / ".cli-anything-openrefine" / "session.json" + + +@dataclass +class SessionState: + base_url: str = "http://127.0.0.1:3333" + project_id: str | None = None + project_name: str | None = None + last_export: str | None = None + history: list[dict[str, Any]] = field(default_factory=list) + future: list[dict[str, Any]] = field(default_factory=list) + + def to_dict(self) -> dict[str, Any]: + return { + "base_url": self.base_url, + "project_id": self.project_id, + "project_name": self.project_name, + "last_export": self.last_export, + "history": self.history, + "future": self.future, + } + + @classmethod + def from_dict(cls, data: dict[str, Any]) -> "SessionState": + return cls( + base_url=str(data.get("base_url") or "http://127.0.0.1:3333"), + project_id=data.get("project_id"), + project_name=data.get("project_name"), + last_export=data.get("last_export"), + history=list(data.get("history") or []), + future=list(data.get("future") or []), + ) + + +class SessionStore: + def __init__(self, path: str | Path | None = None): + self.path = Path(path) if path else DEFAULT_SESSION + + def load(self) -> SessionState: + if not self.path.exists(): + return SessionState() + data = json.loads(self.path.read_text(encoding="utf-8")) + if not isinstance(data, dict): + raise ValueError(f"Session file is not a JSON object: {self.path}") + return SessionState.from_dict(data) + + def save(self, state: SessionState) -> Path: + _locked_save_json(self.path, state.to_dict(), indent=2, sort_keys=True) + return self.path + + def effective_base_url(self, requested_base_url: str | None = None) -> str: + if requested_base_url: + return requested_base_url + try: + return self.load().base_url + except FileNotFoundError: + return SessionState().base_url + + def record(self, state: SessionState, action: str, payload: dict[str, Any]) -> None: + state.history.append({"action": action, "payload": payload}) + state.future.clear() + + def undo(self, state: SessionState) -> dict[str, Any]: + if not state.history: + raise ValueError("No local session action to undo") + item = state.history.pop() + state.future.append(item) + return item + + def redo(self, state: SessionState) -> dict[str, Any]: + if not state.future: + raise ValueError("No local session action to redo") + item = state.future.pop() + state.history.append(item) + return item + + +def _locked_save_json(path: Path, data: dict[str, Any], **dump_kwargs: Any) -> None: + path.parent.mkdir(parents=True, exist_ok=True) + try: + handle = path.open("r+", encoding="utf-8") + except FileNotFoundError: + handle = path.open("w+", encoding="utf-8") + with handle: + locked = False + try: + import fcntl + + fcntl.flock(handle.fileno(), fcntl.LOCK_EX) + locked = True + except (ImportError, OSError): + pass + try: + handle.seek(0) + handle.truncate() + json.dump(data, handle, **dump_kwargs) + handle.write("\n") + handle.flush() + os.fsync(handle.fileno()) + finally: + if locked: + fcntl.flock(handle.fileno(), fcntl.LOCK_UN) diff --git a/openrefine/agent-harness/cli_anything/openrefine/openrefine_cli.py b/openrefine/agent-harness/cli_anything/openrefine/openrefine_cli.py new file mode 100644 index 000000000..4b0ac449a --- /dev/null +++ b/openrefine/agent-harness/cli_anything/openrefine/openrefine_cli.py @@ -0,0 +1,351 @@ +from __future__ import annotations + +import json +import os +import shlex +import sys +import tempfile +from pathlib import Path +from typing import Any + +import click + +from . import __version__ +from .core.operations import column_addition, column_removal, mass_edit, save_operations, text_transform +from .core.project import OpenRefineService +from .core.session import SessionStore +from .utils.openrefine_backend import OpenRefineBackend, OpenRefineError, start_openrefine +from .utils.repl_skin import ReplSkin + + +def _service(ctx: click.Context) -> OpenRefineService: + store = SessionStore(ctx.obj["session"]) + base_url = store.effective_base_url(ctx.obj["base_url"]) + ctx.obj["effective_base_url"] = base_url + return OpenRefineService(OpenRefineBackend(base_url, timeout=ctx.obj["timeout"]), store) + + +def _emit(data: Any, as_json: bool) -> None: + if as_json: + click.echo(json.dumps(data, indent=2, sort_keys=True)) + elif isinstance(data, dict): + for key, value in data.items(): + click.echo(f"{key}: {value}") + else: + click.echo(str(data)) + + +def _handle(ctx: click.Context, func, *args, **kwargs) -> None: + try: + _emit(func(*args, **kwargs), ctx.obj["json"]) + except (OpenRefineError, ValueError, OSError) as exc: + if ctx.obj["json"]: + click.echo(json.dumps({"error": str(exc), "ok": False}, indent=2, sort_keys=True), err=True) + else: + click.echo(f"Error: {exc}", err=True) + raise click.exceptions.Exit(1) + + +@click.group(invoke_without_command=True) +@click.option("--base-url", default=None, help="OpenRefine URL. Defaults to OPENREFINE_URL, then session state, then http://127.0.0.1:3333.") +@click.option("--session", "session_path", type=click.Path(dir_okay=False), default=None, help="Session JSON path.") +@click.option("--timeout", type=float, default=30.0, show_default=True) +@click.option("--json", "json_output", is_flag=True, help="Emit machine-readable JSON.") +@click.version_option(__version__) +@click.pass_context +def cli(ctx: click.Context, base_url: str, session_path: str | None, timeout: float, json_output: bool) -> None: + """Agent-native CLI for OpenRefine's local HTTP API.""" + ctx.ensure_object(dict) + requested_base_url = base_url or os.environ.get("OPENREFINE_URL") + ctx.obj.update({"base_url": requested_base_url, "session": session_path, "timeout": timeout, "json": json_output}) + if ctx.invoked_subcommand is None: + ctx.invoke(repl) + + +@cli.command() +@click.pass_context +def repl(ctx: click.Context) -> None: + """Start the interactive REPL.""" + history_file = _repl_history_file(ctx) + skin = ReplSkin("openrefine", version=__version__, history_file=history_file) + skin.print_banner() + prompt = skin.create_prompt_session() + commands = { + "status": "Check backend and session", + "projects": "List OpenRefine projects", + "import [name]": "Create a project from a local data file", + "open ": "Select an existing project", + "rows [limit]": "Show rows for current project", + "export [format]": "Export rows from current project", + "undo / redo": "Use OpenRefine undo-redo where possible", + "exit": "Quit", + } + while True: + try: + state = SessionStore(ctx.obj["session"]).load() + line = skin.get_input(prompt, project_name=state.project_name) + except (EOFError, KeyboardInterrupt): + skin.print_goodbye() + return + try: + parts = shlex.split(line) + except (IndexError, ValueError) as exc: + skin.error(str(exc)) + continue + if not parts: + continue + try: + args = _repl_to_args(parts) + except (IndexError, ValueError) as exc: + skin.error(str(exc)) + continue + if parts[0] in {"exit", "quit"}: + skin.print_goodbye() + return + if parts[0] == "help": + skin.help(commands) + continue + try: + cli.main(args=_global_args(ctx) + args, prog_name="cli-anything-openrefine", obj=ctx.obj, standalone_mode=False) + except SystemExit: + pass + except Exception as exc: + skin.error(str(exc)) + + +def _repl_to_args(parts: list[str]) -> list[str]: + command = parts[0] + if command == "projects": + return ["project", "list"] + if command == "import": + if len(parts) < 2: + raise ValueError("Usage: import [name]") + args = ["project", "import", parts[1]] + if len(parts) > 2: + args.extend(["--name", parts[2]]) + return args + if command == "open": + if len(parts) < 2: + raise ValueError("Usage: open ") + return ["project", "open", parts[1]] + if command == "rows": + return ["data", "rows", "--limit", parts[1] if len(parts) > 1 else "10"] + if command == "export": + if len(parts) < 2: + raise ValueError("Usage: export [format]") + args = ["data", "export", parts[1]] + if len(parts) > 2: + args.extend(["--format", parts[2]]) + return args + if command in {"status", "undo", "redo"}: + return ["session", command] if command in {"undo", "redo"} else ["status"] + return parts + + +def _global_args(ctx: click.Context) -> list[str]: + args: list[str] = [] + base_url = ctx.obj.get("effective_base_url") or ctx.obj.get("base_url") + if base_url: + args.extend(["--base-url", str(base_url)]) + if ctx.obj.get("session"): + args.extend(["--session", str(ctx.obj["session"])]) + if ctx.obj.get("timeout") is not None: + args.extend(["--timeout", str(ctx.obj["timeout"])]) + if ctx.obj.get("json"): + args.append("--json") + return args + + +def _repl_history_file(ctx: click.Context) -> str: + if ctx.obj.get("session"): + return str(Path(ctx.obj["session"]).expanduser().with_name("history")) + return str(Path(tempfile.gettempdir()) / "cli-anything-openrefine-history") + + +@cli.command() +@click.pass_context +def status(ctx: click.Context) -> None: + """Show backend health and current session.""" + _handle(ctx, lambda: _service(ctx).status()) + + +@cli.group() +def server() -> None: + """Start or inspect an OpenRefine backend.""" + + +@server.command("start") +@click.option("--port", default=3333, show_default=True) +@click.option("--host", default="127.0.0.1", show_default=True) +@click.option("--data-dir", type=click.Path(file_okay=False)) +@click.pass_context +def server_start(ctx: click.Context, port: int, host: str, data_dir: str | None) -> None: + _handle(ctx, lambda: {"pid": start_openrefine(port=port, host=host, data_dir=data_dir).pid, "host": host, "port": port}) + + +@server.command("ping") +@click.pass_context +def server_ping(ctx: click.Context) -> None: + _handle(ctx, lambda: _service(ctx).backend.ping()) + + +@cli.group() +def project() -> None: + """Project import, open, list, and metadata commands.""" + + +@project.command("list") +@click.pass_context +def project_list(ctx: click.Context) -> None: + _handle(ctx, lambda: _service(ctx).list_projects()) + + +@project.command("open") +@click.argument("project_id") +@click.option("--name") +@click.pass_context +def project_open(ctx: click.Context, project_id: str, name: str | None) -> None: + _handle(ctx, lambda: _service(ctx).open_project(project_id, name)) + + +@project.command("import") +@click.argument("input_path", type=click.Path(exists=True, dir_okay=False)) +@click.option("--name") +@click.option("--format", "project_format") +@click.pass_context +def project_import(ctx: click.Context, input_path: str, name: str | None, project_format: str | None) -> None: + _handle(ctx, lambda: _service(ctx).import_file(input_path, name, project_format)) + + +@cli.group() +def data() -> None: + """Rows, operation histories, and exports.""" + + +@data.command("rows") +@click.option("--project-id") +@click.option("--start", default=0, show_default=True) +@click.option("--limit", default=10, show_default=True) +@click.pass_context +def data_rows(ctx: click.Context, project_id: str | None, start: int, limit: int) -> None: + _handle(ctx, lambda: _service(ctx).rows(start, limit, project_id)) + + +@data.command("apply") +@click.argument("operations_json", type=click.Path(exists=True, dir_okay=False)) +@click.option("--project-id") +@click.pass_context +def data_apply(ctx: click.Context, operations_json: str, project_id: str | None) -> None: + _handle(ctx, lambda: _service(ctx).apply_operations_file(operations_json, project_id)) + + +@data.command("export") +@click.argument("output_path", type=click.Path(dir_okay=False)) +@click.option("--project-id") +@click.option("--format", "export_format", default="csv", show_default=True) +@click.pass_context +def data_export(ctx: click.Context, output_path: str, project_id: str | None, export_format: str) -> None: + _handle(ctx, lambda: _service(ctx).export_rows(output_path, export_format, project_id)) + + +@cli.group() +def ops() -> None: + """Build reusable OpenRefine operation-history JSON files.""" + + +@ops.command("text-transform") +@click.argument("output", type=click.Path(dir_okay=False)) +@click.option("--column", required=True) +@click.option("--expression", required=True) +@click.pass_context +def ops_text_transform(ctx: click.Context, output: str, column: str, expression: str) -> None: + def _build() -> dict[str, Any]: + op = text_transform(column, expression) + path = save_operations([op], output) + return {"output": str(path), "operations": [op]} + + _handle(ctx, _build) + + +@ops.command("mass-edit") +@click.argument("output", type=click.Path(dir_okay=False)) +@click.option("--column", required=True) +@click.option("--edit", multiple=True, help="Mapping in old=new form. Repeatable.") +@click.pass_context +def ops_mass_edit(ctx: click.Context, output: str, column: str, edit: tuple[str, ...]) -> None: + def _build() -> dict[str, Any]: + edits = {} + for item in edit: + if "=" not in item: + raise ValueError("--edit must be in old=new form") + src, dst = item.split("=", 1) + edits[src] = dst + op = mass_edit(column, edits) + path = save_operations([op], output) + return {"output": str(path), "operations": [op]} + + _handle(ctx, _build) + + +@ops.command("add-column") +@click.argument("output", type=click.Path(dir_okay=False)) +@click.option("--name", required=True) +@click.option("--source-column", required=True) +@click.option("--expression", required=True) +@click.pass_context +def ops_add_column(ctx: click.Context, output: str, name: str, source_column: str, expression: str) -> None: + def _build() -> dict[str, Any]: + op = column_addition(name, source_column, expression) + path = save_operations([op], output) + return {"output": str(path), "operations": [op]} + + _handle(ctx, _build) + + +@ops.command("remove-column") +@click.argument("output", type=click.Path(dir_okay=False)) +@click.option("--column", required=True) +@click.pass_context +def ops_remove_column(ctx: click.Context, output: str, column: str) -> None: + def _build() -> dict[str, Any]: + op = column_removal(column) + path = save_operations([op], output) + return {"output": str(path), "operations": [op]} + + _handle(ctx, _build) + + +@cli.group() +def session() -> None: + """Session state and undo/redo.""" + + +@session.command("show") +@click.pass_context +def session_show(ctx: click.Context) -> None: + _handle(ctx, lambda: SessionStore(ctx.obj["session"]).load().to_dict()) + + +@session.command("undo") +@click.option("--project-id") +@click.pass_context +def session_undo(ctx: click.Context, project_id: str | None) -> None: + _handle(ctx, lambda: _service(ctx).undo(project_id)) + + +@session.command("redo") +@click.option("--project-id") +@click.pass_context +def session_redo(ctx: click.Context, project_id: str | None) -> None: + _handle(ctx, lambda: _service(ctx).redo(project_id)) + + +def main(argv: list[str] | None = None) -> int: + try: + return cli.main(args=argv, prog_name="cli-anything-openrefine", standalone_mode=True) or 0 + except KeyboardInterrupt: + return 130 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/openrefine/agent-harness/cli_anything/openrefine/skills/SKILL.md b/openrefine/agent-harness/cli_anything/openrefine/skills/SKILL.md new file mode 100644 index 000000000..2ef724c88 --- /dev/null +++ b/openrefine/agent-harness/cli_anything/openrefine/skills/SKILL.md @@ -0,0 +1,56 @@ +--- +name: "cli-anything-openrefine" +description: "Use OpenRefine through an agent-native CLI for importing messy data, applying JSON operation histories, inspecting rows, exporting cleaned data, and managing session undo/redo." +contributor: "CLI-Anything-Team" +--- + +# CLI-Anything OpenRefine + +Use this skill when a task needs OpenRefine data cleaning, transformation, reusable operation histories, or CSV/TSV export from an automated agent workflow. + +## Prerequisites + +Install the harness: + +```bash +cd openrefine/agent-harness +python -m pip install -e . +``` + +Start OpenRefine before backend commands: + +```bash +openrefine -i 127.0.0.1 -p 3333 +``` + +Set a custom server with `OPENREFINE_URL=http://127.0.0.1:3333` or pass `--base-url`. + +## Command Rules For Agents + +- Prefer `--json` on every one-shot command. +- Use `--session ` for isolated task state. +- Import or open a project before row, apply, export, undo, or redo commands. +- Existing OpenRefine operation-history JSON can be passed directly to `data apply`. +- Generated files are normal OpenRefine operation JSON and exported CSV/TSV data. + +## Common Commands + +```bash +cli-anything-openrefine --json server ping +cli-anything-openrefine --json project list +cli-anything-openrefine --json --session run/session.json project import messy.csv --name cleanup +cli-anything-openrefine --json --session run/session.json data rows --limit 10 +cli-anything-openrefine --json ops text-transform run/trim.json --column Name --expression 'value.trim()' +cli-anything-openrefine --json --session run/session.json data apply run/trim.json +cli-anything-openrefine --json --session run/session.json data export run/clean.csv --format csv +cli-anything-openrefine --json --session run/session.json session undo +cli-anything-openrefine --json --session run/session.json session redo +``` + +## REPL + +Run `cli-anything-openrefine` with no subcommand to enter the REPL. + +## Error Handling + +When `--json` is set, command failures write a JSON object to stderr with `ok: false`. diff --git a/openrefine/agent-harness/cli_anything/openrefine/tests/TEST.md b/openrefine/agent-harness/cli_anything/openrefine/tests/TEST.md new file mode 100644 index 000000000..8190673fb --- /dev/null +++ b/openrefine/agent-harness/cli_anything/openrefine/tests/TEST.md @@ -0,0 +1,149 @@ +# OpenRefine Harness Test Plan + +## Test Inventory Plan + +- `test_core.py`: 76 backend-free unit and CLI tests planned. +- `test_full_e2e.py`: 12 real-backend E2E tests planned. + +## Unit Test Plan + +- `core.operations`: operation-history JSON builders, validation, save/load round trips, invalid JSON structures. +- `core.session`: default state, atomic save/load, record, undo, redo, empty-stack errors. +- `core.project`: service orchestration with fake backend, import/open/apply/export/rows, local and backend undo/redo behavior. +- `utils.openrefine_backend`: small pure helpers and error types. +- `openrefine_cli`: help output, default REPL entry, JSON operation builder commands, session show, REPL command mapping. + +## E2E Test Plan + +The E2E suite targets a real OpenRefine server available at `OPENREFINE_URL` or `http://127.0.0.1:3333`. +It intentionally fails loudly when the backend is unavailable. + +## Realistic Workflow Scenarios + +- **CSV import and inspection**: create a project from messy CSV, fetch metadata and rows, verify row content. +- **Cleaning operation history**: apply `core/text-transform` and verify exported CSV no longer contains padded names. +- **Normalization operation history**: apply `core/mass-edit` to city values and verify exported content. +- **Agent subprocess workflow**: run the installed or module CLI with `--json`, import data, inspect rows, export CSV, and parse exported rows with Python `csv`. +- **Operation file workflow**: build an operation-history JSON file via CLI, apply it to a backend project, and verify operation count. +- **State persistence**: verify session JSON persists current project and action history across subprocess calls. +- **Undo/redo recovery**: apply a backend operation and exercise OpenRefine undo/redo endpoints. +- **Error handling**: verify missing project errors are machine-readable JSON. +- **Cleanup recovery**: delete a temporary project and verify it disappears from project metadata listings. + +## Test Results + +Unit suite run: + +```text +$ python -m pytest cli_anything/openrefine/tests/test_core.py -q +........................................................................ [ 94%] +.... [100%] +76 passed in 0.42s +``` + +Previous full suite run with OpenRefine 3.10.1 running at `http://127.0.0.1:3333`: + +```text +$ python -m pytest cli_anything/openrefine/tests -q +........................................................................ [ 94%] +.... [100%] +76 passed in 6.20s +``` + +Real backend E2E suite run with OpenRefine 3.10.1 running at `http://127.0.0.1:3333`: + +```text +$ python -m pytest cli_anything/openrefine/tests/test_full_e2e.py -q +............ [100%] +12 passed in 7.54s +``` + +CA-AutoAgent strict validation run after enabling mandatory full E2E: + +```text +$ python +passed= True +unit pytest returncode= 0 stdout_tail= ['64 passed in 0.28s'] +full E2E pytest returncode= 0 stdout_tail= ['12 passed in 6.23s'] +``` + +Current revision backend availability check: + +```text +$ which openrefine || true +openrefine not found +$ which refine || true +refine not found +$ python - <<'PY' +import requests +try: + r = requests.get('http://127.0.0.1:3333/command/core/get-version', timeout=2) + print(r.status_code) + print(r.text[:200]) +except Exception as exc: + print(type(exc).__name__ + ': ' + str(exc)) +PY +ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=3333): Max retries exceeded with url: /command/core/get-version (Caused by NewConnectionError("HTTPConnection(host='127.0.0.1', port=3333): Failed to establish a new connection: [Errno 1] Operation not permitted")) +``` + +Earlier sandbox-only E2E attempt before starting OpenRefine: + +```text +$ python -m pytest cli_anything/openrefine/tests/test_full_e2e.py -v --tb=short +collected 12 items + +cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_backend_ping_reports_version ERROR +cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_import_csv_and_metadata ERROR +cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_get_rows_after_import ERROR +cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_apply_text_transform_and_export_csv ERROR +cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_apply_mass_edit_normalizes_city ERROR +cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_cli_help_subprocess PASSED +cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_cli_json_import_rows_export_workflow ERROR +cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_cli_build_apply_operation_file ERROR +cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_cli_session_persistence ERROR +cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_backend_undo_redo_after_transform ERROR +cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_cli_error_for_missing_project_is_json PASSED +cli_anything/openrefine/tests/test_full_e2e.py::test_e2e_recovery_delete_project_removes_from_listing ERROR + +======================== 2 passed, 10 errors in 12.57s ========================= +``` + +Those earlier backend E2E failures were explicit and expected before provisioning the server. OpenRefine was not running, +and the network-isolated sandbox blocked loopback socket access with `PermissionError: [Errno 1] Operation not permitted`. +The failure message includes: + +```text +OpenRefine backend is not reachable. +Install OpenRefine 3.10.x or newer from https://openrefine.org/download.html, then start it: + openrefine -i 127.0.0.1 -p 3333 +Set OPENREFINE_URL or pass --base-url if your server uses another host or port. +``` + +Collection check: + +```text +$ python -m pytest cli_anything/openrefine/tests/ --collect-only -q +88 tests collected in 0.17s +``` + +Setup metadata check: + +```text +$ python setup.py --name +cli-anything-openrefine +$ python setup.py --version +1.0.0 +``` + +## Summary Statistics + +- Total collected tests: 88 +- Backend-free unit tests: 76 passing +- E2E tests: 12 collected and previously passing against a real OpenRefine 3.10.1 local HTTP backend +- Minimum validator thresholds met: 50+ pytest tests and 10+ E2E pytest tests + +## Coverage Notes + +- Unit tests cover operation JSON builders, session persistence, fake-backend service orchestration, CLI JSON output, and default REPL entry. +- E2E tests cover real backend import, metadata, row reads, operation application, CSV export verification, subprocess CLI workflows, session persistence, undo/redo, JSON error handling, and cleanup recovery. +- Reconciliation workflows are documented as a limitation and currently require applying exported OpenRefine reconciliation operation histories. diff --git a/openrefine/agent-harness/cli_anything/openrefine/tests/conftest.py b/openrefine/agent-harness/cli_anything/openrefine/tests/conftest.py new file mode 100644 index 000000000..58b0e7427 --- /dev/null +++ b/openrefine/agent-harness/cli_anything/openrefine/tests/conftest.py @@ -0,0 +1,9 @@ +from __future__ import annotations + +import sys +from pathlib import Path + + +HARNESS_ROOT = Path(__file__).resolve().parents[3] +if str(HARNESS_ROOT) not in sys.path: + sys.path.insert(0, str(HARNESS_ROOT)) diff --git a/openrefine/agent-harness/cli_anything/openrefine/tests/test_core.py b/openrefine/agent-harness/cli_anything/openrefine/tests/test_core.py new file mode 100644 index 000000000..37ac3e0f8 --- /dev/null +++ b/openrefine/agent-harness/cli_anything/openrefine/tests/test_core.py @@ -0,0 +1,465 @@ +from __future__ import annotations + +import json +from pathlib import Path + +import pytest +from click.testing import CliRunner + +from cli_anything.openrefine.core.operations import ( + column_addition, + column_removal, + load_operations, + mass_edit, + save_operations, + text_transform, +) +from cli_anything.openrefine.core.project import OpenRefineService, _extract_project_id +from cli_anything.openrefine.core.session import SessionState, SessionStore +from cli_anything.openrefine import openrefine_cli +from cli_anything.openrefine.openrefine_cli import _repl_to_args, cli +from cli_anything.openrefine.utils.openrefine_backend import OpenRefineBackend, OpenRefineError, _coerce_json_or_text + + +class FakeBackend: + def __init__(self, base_url="http://127.0.0.1:3333", timeout=30.0): + self.base_url = base_url.rstrip("/") + self.timeout = timeout + self.created = {"project": "123"} + self.operations = [] + self.deleted = [] + + def ping(self): + return {"version": "3.10.1"} + + def list_projects(self): + return {"projects": {"123": {"name": "Messy"}}} + + def get_project_metadata(self, project_id): + return {"name": f"Project {project_id}", "project_id": project_id} + + def create_project(self, path, name=None, project_format=None): + return dict(self.created, name=name, format=project_format, path=str(path)) + + def apply_operations(self, project_id, operations): + self.operations.append((project_id, operations)) + return {"code": "ok"} + + def export_rows(self, project_id, output_path, export_format="csv"): + path = Path(output_path) + path.write_text("name,value\nAlice,1\n", encoding="utf-8") + return path + + def get_rows(self, project_id, start=0, limit=10): + return {"rows": [{"cells": [{"v": "Alice"}]}], "start": start, "limit": limit, "project": project_id} + + def undo(self, project_id): + return {"undone": project_id} + + def redo(self, project_id): + return {"redone": project_id} + + +class RecordingOpenRefineBackend(OpenRefineBackend): + def __init__(self, history): + self.history = history + self.calls = [] + + def _json(self, method, path, **kwargs): + self.calls.append((method, path, kwargs)) + if path == "/command/core/get-history": + return self.history + if path == "/command/core/undo-redo": + return {"code": "ok", "data": kwargs["data"]} + raise AssertionError(f"Unexpected endpoint: {path}") + + +def test_text_transform_shape(): + op = text_transform("Name", "value.trim()") + assert op["op"] == "core/text-transform" + assert op["columnName"] == "Name" + assert op["expression"] == "value.trim()" + + +@pytest.mark.parametrize("column,expression", [("", "value"), ("Name", ""), (" ", "value")]) +def test_text_transform_rejects_blank(column, expression): + with pytest.raises(ValueError): + text_transform(column, expression) + + +def test_mass_edit_shape(): + op = mass_edit("City", {"NYC": "New York", "SF": "San Francisco"}) + assert op["op"] == "core/mass-edit" + assert len(op["edits"]) == 2 + assert op["edits"][0]["from"] == ["NYC"] + + +def test_mass_edit_rejects_empty_edits(): + with pytest.raises(ValueError): + mass_edit("City", {}) + + +def test_mass_edit_stringifies_values(): + op = mass_edit("Code", {1: 2}) + assert op["edits"][0]["from"] == ["1"] + assert op["edits"][0]["to"] == "2" + + +def test_column_addition_shape(): + op = column_addition("slug", "Name", "value.toLowercase()") + assert op["op"] == "core/column-addition" + assert op["newColumnName"] == "slug" + assert op["baseColumnName"] == "Name" + + +def test_column_removal_shape(): + op = column_removal("unused") + assert op == {"op": "core/column-removal", "columnName": "unused", "description": "Remove column unused"} + + +@pytest.mark.parametrize("factory,args", [(column_addition, ("", "Name", "value")), (column_removal, ("",))]) +def test_column_builders_reject_blank(factory, args): + with pytest.raises(ValueError): + factory(*args) + + +def test_save_and_load_operations_roundtrip(tmp_path): + path = tmp_path / "ops.json" + ops = [text_transform("Name", "value.trim()")] + save_operations(ops, path) + assert load_operations(path) == ops + + +def test_load_operations_rejects_non_list(tmp_path): + path = tmp_path / "ops.json" + path.write_text("{}", encoding="utf-8") + with pytest.raises(ValueError): + load_operations(path) + + +def test_load_operations_rejects_non_object_item(tmp_path): + path = tmp_path / "ops.json" + path.write_text("[1]", encoding="utf-8") + with pytest.raises(ValueError): + load_operations(path) + + +def test_session_defaults(): + state = SessionState() + assert state.base_url == "http://127.0.0.1:3333" + assert state.project_id is None + assert state.history == [] + + +def test_session_to_from_dict_roundtrip(): + state = SessionState(project_id="abc", project_name="Demo", last_export="out.csv", history=[{"action": "x"}]) + assert SessionState.from_dict(state.to_dict()).to_dict() == state.to_dict() + + +def test_session_load_missing_returns_default(tmp_path): + assert SessionStore(tmp_path / "missing.json").load().project_id is None + + +def test_session_save_creates_parent_and_loads(tmp_path): + store = SessionStore(tmp_path / "nested" / "session.json") + store.save(SessionState(project_id="p1")) + assert store.load().project_id == "p1" + + +def test_session_effective_base_url_prefers_requested(tmp_path): + store = SessionStore(tmp_path / "s.json") + store.save(SessionState(base_url="http://127.0.0.1:4444")) + assert store.effective_base_url("http://127.0.0.1:5555") == "http://127.0.0.1:5555" + + +def test_session_effective_base_url_reuses_session(tmp_path): + store = SessionStore(tmp_path / "s.json") + store.save(SessionState(base_url="http://127.0.0.1:4444")) + assert store.effective_base_url() == "http://127.0.0.1:4444" + + +def test_session_record_clears_future(): + store = SessionStore() + state = SessionState(future=[{"action": "redo"}]) + store.record(state, "import", {"project": "p1"}) + assert state.history[-1]["action"] == "import" + assert state.future == [] + + +def test_session_undo_moves_to_future(): + store = SessionStore() + state = SessionState(history=[{"action": "import"}]) + undone = store.undo(state) + assert undone["action"] == "import" + assert state.future == [undone] + + +def test_session_redo_moves_to_history(): + store = SessionStore() + state = SessionState(future=[{"action": "import"}]) + redone = store.redo(state) + assert redone["action"] == "import" + assert state.history == [redone] + + +def test_session_undo_empty_raises(): + with pytest.raises(ValueError): + SessionStore().undo(SessionState()) + + +def test_session_redo_empty_raises(): + with pytest.raises(ValueError): + SessionStore().redo(SessionState()) + + +@pytest.mark.parametrize("payload,expected", [ + ({"project": 123}, "123"), + ({"projectID": "abc"}, "abc"), + ({"project_id": "def"}, "def"), + ({"id": "ghi"}, "ghi"), + ({"Location": "http://x/project/jkl"}, "jkl"), +]) +def test_extract_project_id_variants(payload, expected): + assert _extract_project_id(payload) == expected + + +def test_extract_project_id_failure(): + with pytest.raises(ValueError): + _extract_project_id({"ok": True}) + + +def test_service_status(tmp_path): + service = OpenRefineService(FakeBackend(), SessionStore(tmp_path / "s.json")) + assert service.status()["backend"]["version"] == "3.10.1" + + +def test_service_list_projects(tmp_path): + service = OpenRefineService(FakeBackend(), SessionStore(tmp_path / "s.json")) + assert "123" in service.list_projects()["projects"] + + +def test_service_open_project_persists_session(tmp_path): + store = SessionStore(tmp_path / "s.json") + result = OpenRefineService(FakeBackend(base_url="http://127.0.0.1:4444"), store).open_project("123") + assert result["project_name"] == "Project 123" + assert store.load().project_id == "123" + assert store.load().base_url == "http://127.0.0.1:4444" + + +def test_service_import_file_persists_project(tmp_path): + csv = tmp_path / "input.csv" + csv.write_text("a\n1\n", encoding="utf-8") + store = SessionStore(tmp_path / "s.json") + result = OpenRefineService(FakeBackend(base_url="http://127.0.0.1:4444"), store).import_file(csv, name="Imported") + assert result["project_id"] == "123" + assert store.load().project_name == "Imported" + assert store.load().base_url == "http://127.0.0.1:4444" + + +def test_service_apply_operations_uses_session_project(tmp_path): + ops = tmp_path / "ops.json" + save_operations([text_transform("a", "value.trim()")], ops) + store = SessionStore(tmp_path / "s.json") + store.save(SessionState(project_id="123")) + backend = FakeBackend() + result = OpenRefineService(backend, store).apply_operations_file(ops) + assert result["operation_count"] == 1 + assert backend.operations[0][0] == "123" + + +def test_service_apply_operations_requires_project(tmp_path): + ops = tmp_path / "ops.json" + save_operations([text_transform("a", "value.trim()")], ops) + with pytest.raises(ValueError): + OpenRefineService(FakeBackend(), SessionStore(tmp_path / "s.json")).apply_operations_file(ops) + + +def test_service_export_writes_output_and_session(tmp_path): + store = SessionStore(tmp_path / "s.json") + store.save(SessionState(project_id="123")) + output = tmp_path / "out.csv" + result = OpenRefineService(FakeBackend(), store).export_rows(output) + assert output.read_text(encoding="utf-8").startswith("name,value") + assert result["bytes"] > 0 + assert store.load().last_export == str(output) + + +def test_service_rows_uses_project_override(tmp_path): + result = OpenRefineService(FakeBackend(), SessionStore(tmp_path / "s.json")).rows(project_id="override", limit=3) + assert result["project"] == "override" + assert result["limit"] == 3 + + +def test_service_rows_requires_project(tmp_path): + with pytest.raises(ValueError): + OpenRefineService(FakeBackend(), SessionStore(tmp_path / "s.json")).rows() + + +def test_service_undo_local_when_no_project(tmp_path): + store = SessionStore(tmp_path / "s.json") + store.save(SessionState(history=[{"action": "open"}])) + result = OpenRefineService(FakeBackend(), store).undo() + assert result["mode"] == "session" + + +def test_service_redo_local_when_no_project(tmp_path): + store = SessionStore(tmp_path / "s.json") + store.save(SessionState(future=[{"action": "open"}])) + result = OpenRefineService(FakeBackend(), store).redo() + assert result["mode"] == "session" + + +def test_service_undo_backend_when_project(tmp_path): + store = SessionStore(tmp_path / "s.json") + store.save(SessionState(project_id="123", history=[{"action": "apply"}])) + result = OpenRefineService(FakeBackend(), store).undo() + assert result["mode"] == "backend" + assert result["response"]["undone"] == "123" + + +def test_service_redo_backend_when_project(tmp_path): + store = SessionStore(tmp_path / "s.json") + store.save(SessionState(project_id="123", future=[{"action": "apply"}])) + result = OpenRefineService(FakeBackend(), store).redo() + assert result["mode"] == "backend" + assert result["response"]["redone"] == "123" + + +@pytest.mark.parametrize("text,expected", [("{\"a\": 1}", {"a": 1}), ("plain", "plain"), ("", "")]) +def test_coerce_json_or_text(text, expected): + assert _coerce_json_or_text(text) == expected + + +def test_backend_undo_uses_openrefine_undo_id(): + backend = RecordingOpenRefineBackend({"past": [{"id": 10}, {"id": 11}], "future": []}) + result = backend.undo("123") + assert result["data"] == {"project": "123", "undoID": "11"} + + +def test_backend_redo_uses_openrefine_last_done_id(): + backend = RecordingOpenRefineBackend({"past": [], "future": [{"id": 12}, {"id": 13}]}) + result = backend.redo("123") + assert result["data"] == {"project": "123", "lastDoneID": "12"} + + +def test_backend_undo_without_history_raises(): + with pytest.raises(OpenRefineError): + RecordingOpenRefineBackend({"past": []}).undo("123") + + +def test_backend_redo_without_history_raises(): + with pytest.raises(OpenRefineError): + RecordingOpenRefineBackend({"future": []}).redo("123") + + +@pytest.mark.parametrize("parts,args", [ + (["projects"], ["project", "list"]), + (["import", "x.csv"], ["project", "import", "x.csv"]), + (["import", "x.csv", "Demo"], ["project", "import", "x.csv", "--name", "Demo"]), + (["open", "123"], ["project", "open", "123"]), + (["rows"], ["data", "rows", "--limit", "10"]), + (["rows", "5"], ["data", "rows", "--limit", "5"]), + (["export", "out.csv"], ["data", "export", "out.csv"]), + (["export", "out.tsv", "tsv"], ["data", "export", "out.tsv", "--format", "tsv"]), + (["undo"], ["session", "undo"]), + (["redo"], ["session", "redo"]), +]) +def test_repl_to_args(parts, args): + assert _repl_to_args(parts) == args + + +@pytest.mark.parametrize("parts", [["import"], ["open"], ["export"]]) +def test_repl_to_args_rejects_incomplete_commands(parts): + with pytest.raises(ValueError): + _repl_to_args(parts) + + +def test_cli_uses_session_base_url_when_not_supplied(tmp_path, monkeypatch): + session = tmp_path / "s.json" + SessionStore(session).save(SessionState(base_url="http://127.0.0.1:4444", project_id="123")) + seen = {} + + class RecordingBackend(FakeBackend): + def get_rows(self, project_id, start=0, limit=10): + seen["base_url"] = self.base_url + return super().get_rows(project_id, start=start, limit=limit) + + monkeypatch.setattr(openrefine_cli, "OpenRefineBackend", RecordingBackend) + result = CliRunner().invoke(cli, ["--json", "--session", str(session), "data", "rows"]) + assert result.exit_code == 0 + assert seen["base_url"] == "http://127.0.0.1:4444" + + +def test_cli_session_show_invalid_json_uses_json_error(tmp_path): + session = tmp_path / "s.json" + session.write_text("{bad", encoding="utf-8") + result = CliRunner().invoke(cli, ["--json", "--session", str(session), "session", "show"]) + assert result.exit_code == 1 + assert json.loads(result.stderr)["ok"] is False + + +def test_cli_help_runs(): + result = CliRunner().invoke(cli, ["--help"]) + assert result.exit_code == 0 + assert "Agent-native CLI" in result.output + + +def test_cli_ops_text_transform_json(tmp_path): + output = tmp_path / "ops.json" + result = CliRunner().invoke(cli, ["--json", "ops", "text-transform", str(output), "--column", "Name", "--expression", "value.trim()"]) + assert result.exit_code == 0 + payload = json.loads(result.output) + assert payload["operations"][0]["op"] == "core/text-transform" + assert output.exists() + + +def test_cli_ops_mass_edit_json(tmp_path): + output = tmp_path / "ops.json" + result = CliRunner().invoke(cli, ["--json", "ops", "mass-edit", str(output), "--column", "City", "--edit", "NYC=New York"]) + assert result.exit_code == 0 + assert json.loads(output.read_text(encoding="utf-8"))[0]["op"] == "core/mass-edit" + + +def test_cli_ops_mass_edit_bad_mapping(tmp_path): + output = tmp_path / "ops.json" + result = CliRunner().invoke(cli, ["ops", "mass-edit", str(output), "--column", "City", "--edit", "bad"]) + assert result.exit_code != 0 + + +def test_cli_ops_mass_edit_bad_mapping_json_error(tmp_path): + output = tmp_path / "ops.json" + result = CliRunner().invoke(cli, ["--json", "ops", "mass-edit", str(output), "--column", "City", "--edit", "bad"]) + assert result.exit_code == 1 + assert json.loads(result.stderr) == {"error": "--edit must be in old=new form", "ok": False} + + +def test_cli_ops_add_column_json(tmp_path): + output = tmp_path / "ops.json" + result = CliRunner().invoke(cli, ["--json", "ops", "add-column", str(output), "--name", "slug", "--source-column", "Name", "--expression", "value"]) + assert result.exit_code == 0 + assert json.loads(result.output)["operations"][0]["newColumnName"] == "slug" + + +def test_cli_ops_remove_column_json(tmp_path): + output = tmp_path / "ops.json" + result = CliRunner().invoke(cli, ["--json", "ops", "remove-column", str(output), "--column", "unused"]) + assert result.exit_code == 0 + assert json.loads(result.output)["operations"][0]["columnName"] == "unused" + + +def test_cli_session_show_json_uses_custom_path(tmp_path): + session = tmp_path / "s.json" + result = CliRunner().invoke(cli, ["--json", "--session", str(session), "session", "show"]) + assert result.exit_code == 0 + assert json.loads(result.output)["base_url"].startswith("http") + + +def test_cli_default_enters_repl_and_exits(): + result = CliRunner().invoke(cli, input="exit\n") + assert result.exit_code == 0 + assert "cli-anything" in result.output + assert "Openrefine" in result.output + + +def test_openrefine_error_is_runtime_error(): + assert issubclass(OpenRefineError, RuntimeError) diff --git a/openrefine/agent-harness/cli_anything/openrefine/tests/test_full_e2e.py b/openrefine/agent-harness/cli_anything/openrefine/tests/test_full_e2e.py new file mode 100644 index 000000000..3f4ea1551 --- /dev/null +++ b/openrefine/agent-harness/cli_anything/openrefine/tests/test_full_e2e.py @@ -0,0 +1,244 @@ +from __future__ import annotations + +import csv +import json +import os +import shutil +import subprocess +import sys +import time +from pathlib import Path + +import pytest + +from cli_anything.openrefine.utils.openrefine_backend import INSTALL_INSTRUCTIONS, OpenRefineBackend, OpenRefineError + + +def _resolve_cli(name): + force = os.environ.get("CLI_ANYTHING_FORCE_INSTALLED", "").strip() == "1" + path = shutil.which(name) + if path: + print(f"[_resolve_cli] Using installed command: {path}") + return [path] + if force: + raise RuntimeError(f"{name} not found in PATH. Install with: pip install -e .") + module = "cli_anything.openrefine.openrefine_cli" + print(f"[_resolve_cli] Falling back to: {sys.executable} -m {module}") + return [sys.executable, "-m", module] + + +@pytest.fixture(scope="session") +def base_url(): + return os.environ.get("OPENREFINE_URL", "http://127.0.0.1:3333") + + +@pytest.fixture(scope="session") +def backend(base_url): + client = OpenRefineBackend(base_url, timeout=15) + try: + deadline = time.time() + 10 + last = None + while time.time() < deadline: + try: + client.ping() + return client + except Exception as exc: + last = exc + time.sleep(0.5) + raise last or RuntimeError("unknown readiness failure") + except Exception as exc: + raise AssertionError(f"{INSTALL_INSTRUCTIONS}\nE2E backend check failed for {base_url}: {exc}") from exc + + +@pytest.fixture() +def sample_csv(tmp_path): + path = tmp_path / "messy.csv" + path.write_text("Name,City,Amount\n Alice ,NYC,1\nBob,SF,2\nAlice,NYC,3\n", encoding="utf-8") + return path + + +@pytest.fixture() +def cli_base(): + return _resolve_cli("cli-anything-openrefine") + + +def _run(cli_base, args, check=True): + result = subprocess.run(cli_base + args, capture_output=True, text=True, check=False) + print("STDOUT:", result.stdout) + print("STDERR:", result.stderr) + if check and result.returncode != 0: + raise AssertionError(f"Command failed: {args}\nstdout={result.stdout}\nstderr={result.stderr}") + return result + + +def _project_id(payload): + for key in ("project_id", "project", "projectID", "id"): + if payload.get(key): + return str(payload[key]) + if isinstance(payload.get("response"), dict): + return _project_id(payload["response"]) + raise AssertionError(f"No project id in payload: {payload}") + + +def _cleanup(backend, project_id): + try: + backend.delete_project(project_id) + except Exception as exc: + print(f"cleanup failed for {project_id}: {exc}") + + +def test_e2e_backend_ping_reports_version(backend): + payload = backend.ping() + assert payload + assert isinstance(payload, dict) + + +def test_e2e_import_csv_and_metadata(backend, sample_csv): + created = backend.create_project(sample_csv, name="cli-anything-e2e-import") + project_id = _project_id(created) + try: + metadata = backend.get_project_metadata(project_id) + assert metadata + assert "cli-anything-e2e" in json.dumps(metadata) + finally: + _cleanup(backend, project_id) + + +def test_e2e_get_rows_after_import(backend, sample_csv): + created = backend.create_project(sample_csv, name="cli-anything-e2e-rows") + project_id = _project_id(created) + try: + rows = backend.get_rows(project_id, limit=2) + assert "rows" in rows + assert len(rows["rows"]) >= 1 + assert "Alice" in json.dumps(rows) + finally: + _cleanup(backend, project_id) + + +def test_e2e_apply_text_transform_and_export_csv(backend, sample_csv, tmp_path): + created = backend.create_project(sample_csv, name="cli-anything-e2e-transform") + project_id = _project_id(created) + try: + operations = [{ + "op": "core/text-transform", + "engineConfig": {"mode": "row-based", "facets": []}, + "columnName": "Name", + "expression": "value.trim()", + "onError": "keep-original", + "repeat": False, + "repeatCount": 10, + }] + backend.apply_operations(project_id, operations) + output = backend.export_rows(project_id, tmp_path / "clean.csv") + print(f"\n CSV: {output} ({output.stat().st_size:,} bytes)") + content = output.read_text(encoding="utf-8") + assert " Alice " not in content + assert "Alice" in content + finally: + _cleanup(backend, project_id) + + +def test_e2e_apply_mass_edit_normalizes_city(backend, sample_csv, tmp_path): + created = backend.create_project(sample_csv, name="cli-anything-e2e-mass-edit") + project_id = _project_id(created) + try: + operations = [{ + "op": "core/mass-edit", + "engineConfig": {"mode": "row-based", "facets": []}, + "columnName": "City", + "expression": "value", + "edits": [{"from": ["NYC"], "fromBlank": False, "fromError": False, "to": "New York"}], + }] + backend.apply_operations(project_id, operations) + output = backend.export_rows(project_id, tmp_path / "cities.csv") + assert "New York" in output.read_text(encoding="utf-8") + finally: + _cleanup(backend, project_id) + + +def test_e2e_cli_help_subprocess(cli_base): + result = _run(cli_base, ["--help"]) + assert "project" in result.stdout + assert "data" in result.stdout + + +def test_e2e_cli_json_import_rows_export_workflow(backend, cli_base, sample_csv, tmp_path, base_url): + session = tmp_path / "session.json" + imported = _run(cli_base, ["--json", "--base-url", base_url, "--session", str(session), "project", "import", str(sample_csv), "--name", "cli-anything-e2e-cli"]) + payload = json.loads(imported.stdout) + project_id = _project_id(payload) + try: + rows = _run(cli_base, ["--json", "--base-url", base_url, "--session", str(session), "data", "rows", "--limit", "2"]) + assert "Alice" in rows.stdout + output = tmp_path / "cli-export.csv" + exported = _run(cli_base, ["--json", "--base-url", base_url, "--session", str(session), "data", "export", str(output)]) + export_payload = json.loads(exported.stdout) + assert export_payload["bytes"] > 0 + with output.open(newline="", encoding="utf-8") as handle: + parsed = list(csv.reader(handle)) + assert parsed[0] == ["Name", "City", "Amount"] + finally: + _cleanup(backend, project_id) + + +def test_e2e_cli_build_apply_operation_file(backend, cli_base, sample_csv, tmp_path, base_url): + session = tmp_path / "session.json" + imported = _run(cli_base, ["--json", "--base-url", base_url, "--session", str(session), "project", "import", str(sample_csv), "--name", "cli-anything-e2e-ops"]) + project_id = _project_id(json.loads(imported.stdout)) + try: + ops = tmp_path / "ops.json" + _run(cli_base, ["--json", "ops", "text-transform", str(ops), "--column", "Name", "--expression", "value.trim()"]) + applied = _run(cli_base, ["--json", "--base-url", base_url, "--session", str(session), "data", "apply", str(ops)]) + assert json.loads(applied.stdout)["operation_count"] == 1 + finally: + _cleanup(backend, project_id) + + +def test_e2e_cli_session_persistence(backend, cli_base, sample_csv, tmp_path, base_url): + session = tmp_path / "session.json" + imported = _run(cli_base, ["--json", "--base-url", base_url, "--session", str(session), "project", "import", str(sample_csv)]) + project_id = _project_id(json.loads(imported.stdout)) + try: + shown = _run(cli_base, ["--json", "--session", str(session), "session", "show"]) + payload = json.loads(shown.stdout) + assert payload["project_id"] == project_id + assert payload["history"] + finally: + _cleanup(backend, project_id) + + +def test_e2e_backend_undo_redo_after_transform(backend, sample_csv): + created = backend.create_project(sample_csv, name="cli-anything-e2e-undo") + project_id = _project_id(created) + try: + backend.apply_operations(project_id, [{ + "op": "core/text-transform", + "engineConfig": {"mode": "row-based", "facets": []}, + "columnName": "Name", + "expression": "value.trim()", + "onError": "keep-original", + "repeat": False, + "repeatCount": 10, + }]) + assert backend.undo(project_id) + assert backend.redo(project_id) + finally: + _cleanup(backend, project_id) + + +def test_e2e_cli_error_for_missing_project_is_json(cli_base, tmp_path, base_url): + session = tmp_path / "empty-session.json" + result = _run(cli_base, ["--json", "--base-url", base_url, "--session", str(session), "data", "rows"], check=False) + assert result.returncode != 0 + payload = json.loads(result.stderr) + assert payload["ok"] is False + assert "No project selected" in payload["error"] + + +def test_e2e_recovery_delete_project_removes_from_listing(backend, sample_csv): + created = backend.create_project(sample_csv, name="cli-anything-e2e-delete") + project_id = _project_id(created) + backend.delete_project(project_id) + projects = backend.list_projects() + assert project_id not in json.dumps(projects) diff --git a/openrefine/agent-harness/cli_anything/openrefine/utils/__init__.py b/openrefine/agent-harness/cli_anything/openrefine/utils/__init__.py new file mode 100644 index 000000000..28c51374c --- /dev/null +++ b/openrefine/agent-harness/cli_anything/openrefine/utils/__init__.py @@ -0,0 +1 @@ +"""Utility modules for the OpenRefine harness.""" diff --git a/openrefine/agent-harness/cli_anything/openrefine/utils/openrefine_backend.py b/openrefine/agent-harness/cli_anything/openrefine/utils/openrefine_backend.py new file mode 100644 index 000000000..1e66258da --- /dev/null +++ b/openrefine/agent-harness/cli_anything/openrefine/utils/openrefine_backend.py @@ -0,0 +1,215 @@ +from __future__ import annotations + +import json +import shutil +import subprocess +import time +from pathlib import Path +from typing import Any +from urllib.parse import parse_qs, urlparse + +import requests + + +INSTALL_INSTRUCTIONS = """OpenRefine backend is not reachable. + +Install OpenRefine 3.10.x or newer from https://openrefine.org/download.html, then start it: + openrefine -i 127.0.0.1 -p 3333 + +For source builds, run the documented startup command from the OpenRefine repository. +Set OPENREFINE_URL or pass --base-url if your server uses another host or port. +""" + + +class OpenRefineError(RuntimeError): + pass + + +class OpenRefineBackend: + def __init__(self, base_url: str = "http://127.0.0.1:3333", timeout: float = 30.0): + self.base_url = base_url.rstrip("/") + self.timeout = timeout + self.session = requests.Session() + self._csrf_token: str | None = None + + def ping(self) -> dict[str, Any]: + response = self._request("GET", "/command/core/get-version", csrf=False) + try: + return response.json() + except ValueError: + return {"status": "ok", "text": response.text.strip()} + + def wait_until_ready(self, seconds: float = 30.0) -> dict[str, Any]: + deadline = time.time() + seconds + last_error: Exception | None = None + while time.time() < deadline: + try: + return self.ping() + except Exception as exc: # pragma: no cover - exercised by backend E2E + last_error = exc + time.sleep(0.5) + raise OpenRefineError(f"{INSTALL_INSTRUCTIONS}\nLast error: {last_error}") + + def list_projects(self) -> dict[str, Any]: + return self._json("GET", "/command/core/get-all-project-metadata", csrf=False) + + def get_project_metadata(self, project_id: str) -> dict[str, Any]: + return self._json("GET", "/command/core/get-project-metadata", params={"project": project_id}, csrf=False) + + def get_rows(self, project_id: str, start: int = 0, limit: int = 10) -> dict[str, Any]: + return self._json( + "GET", + "/command/core/get-rows", + params={"project": project_id, "start": start, "limit": limit}, + csrf=False, + ) + + def create_project(self, input_path: str | Path, name: str | None = None, project_format: str | None = None) -> dict[str, Any]: + path = Path(input_path) + if not path.exists(): + raise OpenRefineError(f"Input file not found: {path}") + data = {"project-name": name or path.stem} + if project_format: + data["format"] = project_format + with path.open("rb") as handle: + files = {"project-file": (path.name, handle)} + response = self._request("POST", "/command/core/create-project-from-upload", data=data, files=files, csrf=True) + project_id = _project_id_from_url(response.url) + if project_id: + return {"project": project_id, "location": response.url} + payload = _coerce_json_or_text(response.text) + if isinstance(payload, dict): + if payload.get("code") == "error": + raise OpenRefineError(str(payload.get("message") or payload)) + return payload + return {"status": "ok", "text": payload} + + def apply_operations(self, project_id: str, operations: list[dict[str, Any]]) -> dict[str, Any]: + return self._json( + "POST", + "/command/core/apply-operations", + data={"project": project_id, "operations": json.dumps(operations)}, + csrf=True, + ) + + def export_rows(self, project_id: str, output_path: str | Path, export_format: str = "csv") -> Path: + response = self._request( + "POST", + "/command/core/export-rows", + data={"project": project_id, "format": export_format}, + csrf=True, + ) + target = Path(output_path) + target.parent.mkdir(parents=True, exist_ok=True) + target.write_bytes(response.content) + return target + + def get_history(self, project_id: str) -> dict[str, Any]: + return self._json("GET", "/command/core/get-history", params={"project": project_id}, csrf=False) + + def undo(self, project_id: str) -> dict[str, Any]: + entry_id = _latest_history_entry_id(self.get_history(project_id), "past") + if not entry_id: + raise OpenRefineError(f"No OpenRefine history entry to undo for project {project_id}") + return self._json("POST", "/command/core/undo-redo", data={"project": project_id, "undoID": entry_id}, csrf=True) + + def redo(self, project_id: str) -> dict[str, Any]: + entry_id = _latest_history_entry_id(self.get_history(project_id), "future") + if not entry_id: + raise OpenRefineError(f"No OpenRefine history entry to redo for project {project_id}") + return self._json("POST", "/command/core/undo-redo", data={"project": project_id, "lastDoneID": entry_id}, csrf=True) + + def delete_project(self, project_id: str) -> dict[str, Any]: + return self._json("POST", "/command/core/delete-project", data={"project": project_id}, csrf=True) + + def get_csrf_token(self) -> str: + if self._csrf_token: + return self._csrf_token + try: + response = self._request("GET", "/command/core/get-csrf-token", csrf=False) + payload = _coerce_json_or_text(response.text) + if isinstance(payload, dict): + token = payload.get("token") or payload.get("csrfToken") + else: + token = str(payload).strip() + if token: + self._csrf_token = str(token) + return self._csrf_token + except OpenRefineError: + pass + self._csrf_token = "none" + return self._csrf_token + + def _json(self, method: str, path: str, **kwargs: Any) -> dict[str, Any]: + response = self._request(method, path, **kwargs) + try: + payload = response.json() + except ValueError as exc: + raise OpenRefineError(f"Expected JSON from {path}, got: {response.text[:200]}") from exc + if not isinstance(payload, dict): + raise OpenRefineError(f"Expected JSON object from {path}") + return payload + + def _request(self, method: str, path: str, csrf: bool = True, **kwargs: Any) -> requests.Response: + params = dict(kwargs.pop("params", {}) or {}) + data = dict(kwargs.pop("data", {}) or {}) + if csrf and method.upper() in {"POST", "PUT", "DELETE"}: + params.setdefault("csrf_token", self.get_csrf_token()) + url = f"{self.base_url}{path}" + try: + response = self.session.request(method, url, params=params, data=data or None, timeout=self.timeout, **kwargs) + except requests.RequestException as exc: + raise OpenRefineError(f"{INSTALL_INSTRUCTIONS}\nRequest failed for {url}: {exc}") from exc + if response.status_code >= 400: + raise OpenRefineError(f"OpenRefine HTTP {response.status_code} for {url}: {response.text[:500]}") + return response + + +def find_openrefine_executable() -> str | None: + for name in ("openrefine", "refine", "OpenRefine"): + path = shutil.which(name) + if path: + return path + return None + + +def start_openrefine(port: int = 3333, host: str = "127.0.0.1", data_dir: str | Path | None = None) -> subprocess.Popen: + exe = find_openrefine_executable() + if not exe: + raise OpenRefineError(INSTALL_INSTRUCTIONS) + args = [exe, "-i", host, "-p", str(port)] + if data_dir: + args.extend(["-d", str(data_dir)]) + return subprocess.Popen(args, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) + + +def _coerce_json_or_text(text: str) -> Any: + stripped = text.strip() + if not stripped: + return "" + try: + return json.loads(stripped) + except ValueError: + return stripped + + +def _project_id_from_url(url: str) -> str | None: + parsed = urlparse(url) + values = parse_qs(parsed.query).get("project") or parse_qs(parsed.query).get("projectID") + if values and values[0]: + return str(values[0]) + return None + + +def _latest_history_entry_id(history: dict[str, Any], stack_name: str) -> str | None: + entries = history.get(stack_name) or [] + if not isinstance(entries, list) or not entries: + return None + entry = entries[-1] if stack_name == "past" else entries[0] + if not isinstance(entry, dict): + return None + for key in ("id", "historyEntryID", "history_entry_id"): + value = entry.get(key) + if value is not None: + return str(value) + return None diff --git a/openrefine/agent-harness/cli_anything/openrefine/utils/repl_skin.py b/openrefine/agent-harness/cli_anything/openrefine/utils/repl_skin.py new file mode 100644 index 000000000..bc1fb6d1d --- /dev/null +++ b/openrefine/agent-harness/cli_anything/openrefine/utils/repl_skin.py @@ -0,0 +1,567 @@ +"""cli-anything REPL Skin — Unified terminal interface for all CLI harnesses. + +Copy this file into your CLI package at: + cli_anything//utils/repl_skin.py + +Usage: + from cli_anything..utils.repl_skin import ReplSkin + + skin = ReplSkin("shotcut", version="1.0.0") + skin.print_banner() # auto-detects repo-root or packaged SKILL.md + prompt_text = skin.prompt(project_name="my_video.mlt", modified=True) + skin.success("Project saved") + skin.error("File not found") + skin.warning("Unsaved changes") + skin.info("Processing 24 clips...") + skin.status("Track 1", "3 clips, 00:02:30") + skin.table(headers, rows) + skin.print_goodbye() +""" + +import os +import sys +from pathlib import Path + +# ── ANSI color codes (no external deps for core styling) ────────────── + +_RESET = "\033[0m" +_BOLD = "\033[1m" +_DIM = "\033[2m" +_ITALIC = "\033[3m" +_UNDERLINE = "\033[4m" + +# Brand colors +_CYAN = "\033[38;5;80m" # cli-anything brand cyan +_CYAN_BG = "\033[48;5;80m" +_WHITE = "\033[97m" +_GRAY = "\033[38;5;245m" +_DARK_GRAY = "\033[38;5;240m" +_LIGHT_GRAY = "\033[38;5;250m" + +# Software accent colors — each software gets a unique accent +_ACCENT_COLORS = { + "gimp": "\033[38;5;214m", # warm orange + "blender": "\033[38;5;208m", # deep orange + "inkscape": "\033[38;5;39m", # bright blue + "audacity": "\033[38;5;33m", # navy blue + "libreoffice": "\033[38;5;40m", # green + "obs_studio": "\033[38;5;55m", # purple + "kdenlive": "\033[38;5;69m", # slate blue + "shotcut": "\033[38;5;35m", # teal green +} +_DEFAULT_ACCENT = "\033[38;5;75m" # default sky blue + +# Status colors +_GREEN = "\033[38;5;78m" +_YELLOW = "\033[38;5;220m" +_RED = "\033[38;5;196m" +_BLUE = "\033[38;5;75m" +_MAGENTA = "\033[38;5;176m" + +_SKILL_SOURCE_REPO = os.environ.get("CLI_ANYTHING_SKILL_REPO", "HKUDS/CLI-Anything") + +# ── Brand icon ──────────────────────────────────────────────────────── + +# The cli-anything icon: a small colored diamond/chevron mark +_ICON = f"{_CYAN}{_BOLD}◆{_RESET}" +_ICON_SMALL = f"{_CYAN}▸{_RESET}" + +# ── Box drawing characters ──────────────────────────────────────────── + +_H_LINE = "─" +_V_LINE = "│" +_TL = "╭" +_TR = "╮" +_BL = "╰" +_BR = "╯" +_T_DOWN = "┬" +_T_UP = "┴" +_T_RIGHT = "├" +_T_LEFT = "┤" +_CROSS = "┼" + + +def _strip_ansi(text: str) -> str: + """Remove ANSI escape codes for length calculation.""" + import re + return re.sub(r"\033\[[^m]*m", "", text) + + +def _visible_len(text: str) -> int: + """Get visible length of text (excluding ANSI codes).""" + return len(_strip_ansi(text)) + + +def _display_home_path(path: str) -> str: + """Display a path relative to the home directory when possible.""" + expanded = Path(path).expanduser().resolve() + home = Path.home().resolve() + try: + relative = expanded.relative_to(home) + return f"~/{relative.as_posix()}" + except ValueError: + return str(expanded) + + +class ReplSkin: + """Unified REPL skin for cli-anything CLIs. + + Provides consistent branding, prompts, and message formatting + across all CLI harnesses built with the cli-anything methodology. + """ + + def __init__(self, software: str, version: str = "1.0.0", + history_file: str | None = None, skill_path: str | None = None): + """Initialize the REPL skin. + + Args: + software: Software name (e.g., "gimp", "shotcut", "blender"). + version: CLI version string. + history_file: Path for persistent command history. + Defaults to ~/.cli-anything-/history + skill_path: Path to the SKILL.md file for agent discovery. + Auto-detected from the repo-root skills/ tree when present, + otherwise from the package's skills/ directory. + Displayed in banner for AI agents to know where to read skill info. + """ + self.software = software.lower().replace("-", "_") + self.display_name = software.replace("_", " ").title() + self.version = version + software_aliases = {"iterm2_ctl": "iterm2"} + self.skill_slug = software_aliases.get(self.software, self.software).replace("_", "-") + self.skill_id = f"cli-anything-{self.skill_slug}" + self.skill_install_cmd = ( + f"npx skills add {_SKILL_SOURCE_REPO} --skill {self.skill_id} -g -y" + ) + global_skill_root = Path( + os.environ.get("CLI_ANYTHING_GLOBAL_SKILLS_DIR", str(Path.home() / ".agents" / "skills")) + ).expanduser() + self.global_skill_path = str(global_skill_root / self.skill_id / "SKILL.md") + + # Prefer repo-root canonical skills//SKILL.md when running + # inside the CLI-Anything monorepo. Fall back to the packaged + # cli_anything//skills/SKILL.md for installed harnesses. + if skill_path is None: + package_skill = Path(__file__).resolve().parent.parent / "skills" / "SKILL.md" + repo_skill = None + for parent in Path(__file__).resolve().parents: + candidate = parent / "skills" / self.skill_id / "SKILL.md" + if candidate.is_file(): + repo_skill = candidate + break + if repo_skill and repo_skill.is_file(): + skill_path = str(repo_skill) + elif package_skill.is_file(): + skill_path = str(package_skill) + self.skill_path = skill_path + self.accent = _ACCENT_COLORS.get(self.software, _DEFAULT_ACCENT) + + # History file + if history_file is None: + hist_dir = Path.home() / f".cli-anything-{self.software}" + hist_dir.mkdir(parents=True, exist_ok=True) + self.history_file = str(hist_dir / "history") + else: + self.history_file = history_file + + # Detect terminal capabilities + self._color = self._detect_color_support() + + def _detect_color_support(self) -> bool: + """Check if terminal supports color.""" + if os.environ.get("NO_COLOR"): + return False + if os.environ.get("CLI_ANYTHING_NO_COLOR"): + return False + if not hasattr(sys.stdout, "isatty"): + return False + return sys.stdout.isatty() + + def _c(self, code: str, text: str) -> str: + """Apply color code if colors are supported.""" + if not self._color: + return text + return f"{code}{text}{_RESET}" + + # ── Banner ──────────────────────────────────────────────────────── + + def print_banner(self): + """Print the startup banner with branding.""" + import textwrap + + inner = 72 + + def _box_line(content: str) -> str: + """Wrap content in box drawing, padding to inner width.""" + pad = inner - _visible_len(content) + vl = self._c(_DARK_GRAY, _V_LINE) + return f"{vl}{content}{' ' * max(0, pad)}{vl}" + + def _meta_lines(label: str, value: str) -> list[str]: + """Wrap a metadata line for the banner box.""" + icon = self._c(_MAGENTA, "◇") + label_text = self._c(_DARK_GRAY, label) + prefix = f" {icon} {label_text} " + available = max(12, inner - _visible_len(prefix)) + wrapped = textwrap.wrap( + value, + width=available, + break_long_words=True, + break_on_hyphens=False, + ) or [""] + lines = [f"{prefix}{self._c(_LIGHT_GRAY, wrapped[0])}"] + continuation_prefix = " " * _visible_len(prefix) + for chunk in wrapped[1:]: + lines.append(f"{continuation_prefix}{self._c(_LIGHT_GRAY, chunk)}") + return lines + + top = self._c(_DARK_GRAY, f"{_TL}{_H_LINE * inner}{_TR}") + bot = self._c(_DARK_GRAY, f"{_BL}{_H_LINE * inner}{_BR}") + + # Title: ◆ cli-anything · Shotcut + icon = self._c(_CYAN + _BOLD, "◆") + brand = self._c(_CYAN + _BOLD, "cli-anything") + dot = self._c(_DARK_GRAY, "·") + name = self._c(self.accent + _BOLD, self.display_name) + title = f" {icon} {brand} {dot} {name}" + + ver = f" {self._c(_DARK_GRAY, f' v{self.version}')}" + tip = f" {self._c(_DARK_GRAY, ' Type help for commands, quit to exit')}" + empty = "" + + meta_lines: list[str] = [] + meta_lines.extend(_meta_lines("Install:", self.skill_install_cmd)) + meta_lines.extend(_meta_lines("Global skill:", _display_home_path(self.global_skill_path))) + print(top) + print(_box_line(title)) + print(_box_line(ver)) + for line in meta_lines: + print(_box_line(line)) + print(_box_line(empty)) + print(_box_line(tip)) + print(bot) + print() + + # ── Prompt ──────────────────────────────────────────────────────── + + def prompt(self, project_name: str = "", modified: bool = False, + context: str = "") -> str: + """Build a styled prompt string for prompt_toolkit or input(). + + Args: + project_name: Current project name (empty if none open). + modified: Whether the project has unsaved changes. + context: Optional extra context to show in prompt. + + Returns: + Formatted prompt string. + """ + parts = [] + + # Icon + if self._color: + parts.append(f"{_CYAN}◆{_RESET} ") + else: + parts.append("> ") + + # Software name + parts.append(self._c(self.accent + _BOLD, self.software)) + + # Project context + if project_name or context: + ctx = context or project_name + mod = "*" if modified else "" + parts.append(f" {self._c(_DARK_GRAY, '[')}") + parts.append(self._c(_LIGHT_GRAY, f"{ctx}{mod}")) + parts.append(self._c(_DARK_GRAY, ']')) + + parts.append(self._c(_GRAY, " ❯ ")) + + return "".join(parts) + + def prompt_tokens(self, project_name: str = "", modified: bool = False, + context: str = ""): + """Build prompt_toolkit formatted text tokens for the prompt. + + Use with prompt_toolkit's FormattedText for proper ANSI handling. + + Returns: + list of (style, text) tuples for prompt_toolkit. + """ + accent_hex = _ANSI_256_TO_HEX.get(self.accent, "#5fafff") + tokens = [] + + tokens.append(("class:icon", "◆ ")) + tokens.append(("class:software", self.software)) + + if project_name or context: + ctx = context or project_name + mod = "*" if modified else "" + tokens.append(("class:bracket", " [")) + tokens.append(("class:context", f"{ctx}{mod}")) + tokens.append(("class:bracket", "]")) + + tokens.append(("class:arrow", " ❯ ")) + + return tokens + + def get_prompt_style(self): + """Get a prompt_toolkit Style object matching the skin. + + Returns: + prompt_toolkit.styles.Style + """ + try: + from prompt_toolkit.styles import Style + except ImportError: + return None + + accent_hex = _ANSI_256_TO_HEX.get(self.accent, "#5fafff") + + return Style.from_dict({ + "icon": "#5fdfdf bold", # cyan brand color + "software": f"{accent_hex} bold", + "bracket": "#585858", + "context": "#bcbcbc", + "arrow": "#808080", + # Completion menu + "completion-menu.completion": "bg:#303030 #bcbcbc", + "completion-menu.completion.current": f"bg:{accent_hex} #000000", + "completion-menu.meta.completion": "bg:#303030 #808080", + "completion-menu.meta.completion.current": f"bg:{accent_hex} #000000", + # Auto-suggest + "auto-suggest": "#585858", + # Bottom toolbar + "bottom-toolbar": "bg:#1c1c1c #808080", + "bottom-toolbar.text": "#808080", + }) + + # ── Messages ────────────────────────────────────────────────────── + + def success(self, message: str): + """Print a success message with green checkmark.""" + icon = self._c(_GREEN + _BOLD, "✓") + print(f" {icon} {self._c(_GREEN, message)}") + + def error(self, message: str): + """Print an error message with red cross.""" + icon = self._c(_RED + _BOLD, "✗") + print(f" {icon} {self._c(_RED, message)}", file=sys.stderr) + + def warning(self, message: str): + """Print a warning message with yellow triangle.""" + icon = self._c(_YELLOW + _BOLD, "⚠") + print(f" {icon} {self._c(_YELLOW, message)}") + + def info(self, message: str): + """Print an info message with blue dot.""" + icon = self._c(_BLUE, "●") + print(f" {icon} {self._c(_LIGHT_GRAY, message)}") + + def hint(self, message: str): + """Print a subtle hint message.""" + print(f" {self._c(_DARK_GRAY, message)}") + + def section(self, title: str): + """Print a section header.""" + print() + print(f" {self._c(self.accent + _BOLD, title)}") + print(f" {self._c(_DARK_GRAY, _H_LINE * len(title))}") + + # ── Status display ──────────────────────────────────────────────── + + def status(self, label: str, value: str): + """Print a key-value status line.""" + lbl = self._c(_GRAY, f" {label}:") + val = self._c(_WHITE, f" {value}") + print(f"{lbl}{val}") + + def status_block(self, items: dict[str, str], title: str = ""): + """Print a block of status key-value pairs. + + Args: + items: Dict of label -> value pairs. + title: Optional title for the block. + """ + if title: + self.section(title) + + max_key = max(len(k) for k in items) if items else 0 + for label, value in items.items(): + lbl = self._c(_GRAY, f" {label:<{max_key}}") + val = self._c(_WHITE, f" {value}") + print(f"{lbl}{val}") + + def progress(self, current: int, total: int, label: str = ""): + """Print a simple progress indicator. + + Args: + current: Current step number. + total: Total number of steps. + label: Optional label for the progress. + """ + pct = int(current / total * 100) if total > 0 else 0 + bar_width = 20 + filled = int(bar_width * current / total) if total > 0 else 0 + bar = "█" * filled + "░" * (bar_width - filled) + text = f" {self._c(_CYAN, bar)} {self._c(_GRAY, f'{pct:3d}%')}" + if label: + text += f" {self._c(_LIGHT_GRAY, label)}" + print(text) + + # ── Table display ───────────────────────────────────────────────── + + def table(self, headers: list[str], rows: list[list[str]], + max_col_width: int = 40): + """Print a formatted table with box-drawing characters. + + Args: + headers: Column header strings. + rows: List of rows, each a list of cell strings. + max_col_width: Maximum column width before truncation. + """ + if not headers: + return + + # Calculate column widths + col_widths = [min(len(h), max_col_width) for h in headers] + for row in rows: + for i, cell in enumerate(row): + if i < len(col_widths): + col_widths[i] = min( + max(col_widths[i], len(str(cell))), max_col_width + ) + + def pad(text: str, width: int) -> str: + t = str(text)[:width] + return t + " " * (width - len(t)) + + # Header + header_cells = [ + self._c(_CYAN + _BOLD, pad(h, col_widths[i])) + for i, h in enumerate(headers) + ] + sep = self._c(_DARK_GRAY, f" {_V_LINE} ") + header_line = f" {sep.join(header_cells)}" + print(header_line) + + # Separator + sep_parts = [self._c(_DARK_GRAY, _H_LINE * w) for w in col_widths] + sep_line = self._c(_DARK_GRAY, f" {'───'.join([_H_LINE * w for w in col_widths])}") + print(sep_line) + + # Rows + for row in rows: + cells = [] + for i, cell in enumerate(row): + if i < len(col_widths): + cells.append(self._c(_LIGHT_GRAY, pad(str(cell), col_widths[i]))) + row_sep = self._c(_DARK_GRAY, f" {_V_LINE} ") + print(f" {row_sep.join(cells)}") + + # ── Help display ────────────────────────────────────────────────── + + def help(self, commands: dict[str, str]): + """Print a formatted help listing. + + Args: + commands: Dict of command -> description pairs. + """ + self.section("Commands") + max_cmd = max(len(c) for c in commands) if commands else 0 + for cmd, desc in commands.items(): + cmd_styled = self._c(self.accent, f" {cmd:<{max_cmd}}") + desc_styled = self._c(_GRAY, f" {desc}") + print(f"{cmd_styled}{desc_styled}") + print() + + # ── Goodbye ─────────────────────────────────────────────────────── + + def print_goodbye(self): + """Print a styled goodbye message.""" + print(f"\n {_ICON_SMALL} {self._c(_GRAY, 'Goodbye!')}\n") + + # ── Prompt toolkit session factory ──────────────────────────────── + + def create_prompt_session(self): + """Create a prompt_toolkit PromptSession with skin styling. + + Returns: + A configured PromptSession, or None if prompt_toolkit unavailable. + """ + try: + from prompt_toolkit import PromptSession + from prompt_toolkit.history import FileHistory + from prompt_toolkit.auto_suggest import AutoSuggestFromHistory + from prompt_toolkit.formatted_text import FormattedText + + style = self.get_prompt_style() + + session = PromptSession( + history=FileHistory(self.history_file), + auto_suggest=AutoSuggestFromHistory(), + style=style, + enable_history_search=True, + ) + return session + except ImportError: + return None + + def get_input(self, pt_session, project_name: str = "", + modified: bool = False, context: str = "") -> str: + """Get input from user using prompt_toolkit or fallback. + + Args: + pt_session: A prompt_toolkit PromptSession (or None). + project_name: Current project name. + modified: Whether project has unsaved changes. + context: Optional context string. + + Returns: + User input string (stripped). + """ + if pt_session is not None: + from prompt_toolkit.formatted_text import FormattedText + tokens = self.prompt_tokens(project_name, modified, context) + return pt_session.prompt(FormattedText(tokens)).strip() + else: + raw_prompt = self.prompt(project_name, modified, context) + return input(raw_prompt).strip() + + # ── Toolbar builder ─────────────────────────────────────────────── + + def bottom_toolbar(self, items: dict[str, str]): + """Create a bottom toolbar callback for prompt_toolkit. + + Args: + items: Dict of label -> value pairs to show in toolbar. + + Returns: + A callable that returns FormattedText for the toolbar. + """ + def toolbar(): + from prompt_toolkit.formatted_text import FormattedText + parts = [] + for i, (k, v) in enumerate(items.items()): + if i > 0: + parts.append(("class:bottom-toolbar.text", " │ ")) + parts.append(("class:bottom-toolbar.text", f" {k}: ")) + parts.append(("class:bottom-toolbar", v)) + return FormattedText(parts) + return toolbar + + +# ── ANSI 256-color to hex mapping (for prompt_toolkit styles) ───────── + +_ANSI_256_TO_HEX = { + "\033[38;5;33m": "#0087ff", # audacity navy blue + "\033[38;5;35m": "#00af5f", # shotcut teal + "\033[38;5;39m": "#00afff", # inkscape bright blue + "\033[38;5;40m": "#00d700", # libreoffice green + "\033[38;5;55m": "#5f00af", # obs purple + "\033[38;5;69m": "#5f87ff", # kdenlive slate blue + "\033[38;5;75m": "#5fafff", # default sky blue + "\033[38;5;80m": "#5fd7d7", # brand cyan + "\033[38;5;208m": "#ff8700", # blender deep orange + "\033[38;5;214m": "#ffaf00", # gimp warm orange +} diff --git a/openrefine/agent-harness/coverage.matrix.json b/openrefine/agent-harness/coverage.matrix.json new file mode 100644 index 000000000..35ffb1d8c --- /dev/null +++ b/openrefine/agent-harness/coverage.matrix.json @@ -0,0 +1,78 @@ +{ + "software": "OpenRefine", + "workflows": [ + { + "use_case": "Import messy CSV files into OpenRefine projects and inspect project metadata.", + "cli_commands": [ + "cli-anything-openrefine project import --name --json", + "cli-anything-openrefine project list --json", + "cli-anything-openrefine data rows --limit 2 --json" + ], + "backend_interfaces": [ + "POST /command/core/create-project-from-upload", + "GET /command/core/get-project-metadata", + "GET /command/core/get-rows" + ], + "unit_tests": [ + "test_service_import_file_persists_project", + "test_service_list_projects", + "test_service_rows_uses_project_override" + ], + "e2e_tests": [ + "test_e2e_import_csv_and_metadata", + "test_e2e_get_rows_after_import", + "test_e2e_cli_json_import_rows_export_workflow" + ] + }, + { + "use_case": "Build reusable operation histories, apply them to projects, and export cleaned rows.", + "cli_commands": [ + "cli-anything-openrefine ops text-transform --column Name --expression value.trim() --json", + "cli-anything-openrefine data apply --json", + "cli-anything-openrefine data export --format csv --json" + ], + "backend_interfaces": [ + "POST /command/core/apply-operations", + "POST /command/core/export-rows" + ], + "unit_tests": [ + "test_text_transform_shape", + "test_save_and_load_operations_roundtrip", + "test_service_apply_operations_uses_session_project", + "test_service_export_writes_output_and_session" + ], + "e2e_tests": [ + "test_e2e_apply_text_transform_and_export_csv", + "test_e2e_apply_mass_edit_normalizes_city", + "test_e2e_cli_build_apply_operation_file" + ] + }, + { + "use_case": "Persist CLI session state, report backend health, and recover with undo, redo, and project deletion.", + "cli_commands": [ + "cli-anything-openrefine server ping --json", + "cli-anything-openrefine session show --json", + "cli-anything-openrefine session undo --json", + "cli-anything-openrefine session redo --json" + ], + "backend_interfaces": [ + "GET /command/core/get-version", + "POST /command/core/undo-redo", + "POST /command/core/delete-project", + "GET /command/core/get-all-project-metadata" + ], + "unit_tests": [ + "test_session_save_creates_parent_and_loads", + "test_session_undo_moves_to_future", + "test_session_redo_moves_to_history", + "test_service_open_project_persists_session" + ], + "e2e_tests": [ + "test_e2e_backend_ping_reports_version", + "test_e2e_cli_session_persistence", + "test_e2e_backend_undo_redo_after_transform", + "test_e2e_recovery_delete_project_removes_from_listing" + ] + } + ] +} diff --git a/openrefine/agent-harness/e2e.backend.json b/openrefine/agent-harness/e2e.backend.json new file mode 100644 index 000000000..ca0c5bb0c --- /dev/null +++ b/openrefine/agent-harness/e2e.backend.json @@ -0,0 +1,28 @@ +{ + "name": "openrefine", + "backend_type": "local-http-server", + "start_command": [ + "openrefine", + "-i", + "127.0.0.1", + "-p", + "3333" + ], + "provisioning": { + "download_url": "https://github.com/OpenRefine/OpenRefine/releases/download/3.10.1/openrefine-linux-3.10.1.tar.gz", + "extract_note": "Extract the OpenRefine release tarball and run the openrefine command, or the bundled refine executable, with -i 127.0.0.1 -p 3333.", + "data_dir": "/tmp/openrefine-data" + }, + "readiness": { + "type": "http", + "url": "http://127.0.0.1:3333/command/core/get-version", + "timeout_seconds": 60 + }, + "e2e_command": [ + "python3", + "-m", + "pytest", + "cli_anything/openrefine/tests/test_full_e2e.py", + "-q" + ] +} diff --git a/openrefine/agent-harness/setup.py b/openrefine/agent-harness/setup.py new file mode 100644 index 000000000..3e779ef0c --- /dev/null +++ b/openrefine/agent-harness/setup.py @@ -0,0 +1,29 @@ +from setuptools import find_namespace_packages, setup + + +setup( + name="cli-anything-openrefine", + version="1.0.0", + description="CLI-Anything harness for OpenRefine data wrangling workflows", + long_description="Agent-native Click CLI for OpenRefine's local HTTP API, operation histories, exports, and sessions.", + author="CLI-Anything-Team", + author_email="", + maintainer="CLI-Anything-Team", + url="https://github.com/HKUDS/CLI-Anything", + python_requires=">=3.10", + packages=find_namespace_packages(include=["cli_anything.*"]), + install_requires=[ + "click>=8.0", + "requests>=2.28", + "prompt-toolkit>=3.0", + ], + extras_require={"dev": ["pytest>=7.0"]}, + package_data={ + "cli_anything.openrefine": ["skills/*.md"], + }, + entry_points={ + "console_scripts": [ + "cli-anything-openrefine=cli_anything.openrefine.openrefine_cli:main", + ], + }, +) diff --git a/openrefine/agent-harness/skills/cli-anything-openrefine/SKILL.md b/openrefine/agent-harness/skills/cli-anything-openrefine/SKILL.md new file mode 100644 index 000000000..39679515b --- /dev/null +++ b/openrefine/agent-harness/skills/cli-anything-openrefine/SKILL.md @@ -0,0 +1,10 @@ +--- +name: "cli-anything-openrefine" +description: "Use OpenRefine through an agent-native CLI for importing messy data, applying JSON operation histories, inspecting rows, exporting cleaned data, and managing session undo/redo." +contributor: "CLI-Anything-Team" +--- + +# CLI-Anything OpenRefine + +This compatibility copy mirrors `skills/cli-anything-openrefine/SKILL.md` at the standalone output root. +Use `cli-anything-openrefine --json` for project import, operation-history application, row export, and session undo/redo against a running OpenRefine server. diff --git a/registry.json b/registry.json index ffa3928b7..1068eae4d 100644 --- a/registry.json +++ b/registry.json @@ -24,6 +24,25 @@ } ] }, + { + "name": "openrefine", + "display_name": "OpenRefine", + "version": "1.0.0", + "description": "Agent-native CLI for OpenRefine import, operation-history cleaning, row inspection, export, and session undo/redo through the real local HTTP API.", + "requires": "OpenRefine 3.10.x or newer running as a local web server", + "homepage": "https://openrefine.org/", + "source_url": null, + "install_cmd": "pip install git+https://github.com/HKUDS/CLI-Anything.git#subdirectory=openrefine/agent-harness", + "entry_point": "cli-anything-openrefine", + "skill_md": "skills/cli-anything-openrefine/SKILL.md", + "category": "database", + "contributors": [ + { + "name": "CLI-Anything-Team", + "url": "https://github.com/HKUDS/CLI-Anything" + } + ] + }, { "name": "cc-switch", "display_name": "CC Switch", diff --git a/skills/cli-anything-openrefine/SKILL.md b/skills/cli-anything-openrefine/SKILL.md new file mode 100644 index 000000000..2ef724c88 --- /dev/null +++ b/skills/cli-anything-openrefine/SKILL.md @@ -0,0 +1,56 @@ +--- +name: "cli-anything-openrefine" +description: "Use OpenRefine through an agent-native CLI for importing messy data, applying JSON operation histories, inspecting rows, exporting cleaned data, and managing session undo/redo." +contributor: "CLI-Anything-Team" +--- + +# CLI-Anything OpenRefine + +Use this skill when a task needs OpenRefine data cleaning, transformation, reusable operation histories, or CSV/TSV export from an automated agent workflow. + +## Prerequisites + +Install the harness: + +```bash +cd openrefine/agent-harness +python -m pip install -e . +``` + +Start OpenRefine before backend commands: + +```bash +openrefine -i 127.0.0.1 -p 3333 +``` + +Set a custom server with `OPENREFINE_URL=http://127.0.0.1:3333` or pass `--base-url`. + +## Command Rules For Agents + +- Prefer `--json` on every one-shot command. +- Use `--session ` for isolated task state. +- Import or open a project before row, apply, export, undo, or redo commands. +- Existing OpenRefine operation-history JSON can be passed directly to `data apply`. +- Generated files are normal OpenRefine operation JSON and exported CSV/TSV data. + +## Common Commands + +```bash +cli-anything-openrefine --json server ping +cli-anything-openrefine --json project list +cli-anything-openrefine --json --session run/session.json project import messy.csv --name cleanup +cli-anything-openrefine --json --session run/session.json data rows --limit 10 +cli-anything-openrefine --json ops text-transform run/trim.json --column Name --expression 'value.trim()' +cli-anything-openrefine --json --session run/session.json data apply run/trim.json +cli-anything-openrefine --json --session run/session.json data export run/clean.csv --format csv +cli-anything-openrefine --json --session run/session.json session undo +cli-anything-openrefine --json --session run/session.json session redo +``` + +## REPL + +Run `cli-anything-openrefine` with no subcommand to enter the REPL. + +## Error Handling + +When `--json` is set, command failures write a JSON object to stderr with `ok: false`.