feat: surface gate detail in the workflow run/resume --json payload (#2965)

* feat: surface gate detail in the workflow run/resume --json payload

A paused run was indistinguishable from any other pause in the
machine-readable outcome, and the gate's prompt/options/choice never left
the human-facing stream. Record each step's type in the run state's step
results (one engine line) and, when the run sits at a gate, add a gate
block (step_id/message/options/choice) to the payload so orchestrators can
drive review gates without parsing stdout.

Reference implementation for the proposal in #2964.

Addresses #2964

* fix(workflow): only surface gate detail in --json when the run is paused

Address review (#2965): _gate_outcome() emitted a gate block whenever
current_step_id pointed at a gate step. Since RunState.current_step_id is
never cleared on completion, a completed/failed run whose last step was a
gate leaked stale gate detail in run/resume/status --json. Guard on
status == paused.

Also assert CLI success in the _run_json test helper before JSON-parsing,
and add direct coverage for the suppression guard.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(workflows): surface gate block on aborted runs; stabilize message

Address Copilot review:

- `_gate_outcome` now also surfaces the gate block when a run is `aborted`
  by a gate rejection (`on_reject: abort`), not only when `paused`. Abort is
  the only path that sets ABORTED and it leaves current_step_id on the gate,
  so an orchestrator can read the recorded `choice` for the stop.
- Coerce `message` to a string (it may be a non-string YAML literal that
  GateStep only coerces for interpolation) so the JSON schema stays stable.
- Tests: add a CLI-level aborted-path test, a message-coercion test, and
  extend the suppression test to allow `aborted`; share the run helper via
  `_invoke_json` to avoid duplicating the invoke boilerplate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(workflows): assert clean exit in gate-abort JSON test

Address Copilot review: the gate-abort test parsed stdout without first
asserting the CLI exited cleanly, so an invoke failure would surface as an
opaque JSON decode error. Route it through `_run_json` (which asserts
exit_code == 0 before parsing) and drop the now-redundant `_invoke_json`
helper; a gate abort emits the payload and returns, so the run exits 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: use result.output in run-helper assert; document step_data shape

Address Copilot review:

- `_run_json` asserted with `result.stdout` in the message, but under
  `--json` step output is redirected off stdout; the useful diagnostics live
  on `result.output`. Switch the assertion message to `result.output` (the
  JSON parse still reads stdout), matching the other CLI tests.
- `StepContext.steps` documented a 5-key entry shape; the engine now also
  persists `type` and `status`. Update the docstring to the canonical 7-key
  shape so step authors/debuggers see the real record.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(workflows): align gate-abort JSON test with aborted→exit-1

After rebasing onto main, a gate abort now emits the --json payload and
then exits non-zero (`_run_outcome_exit_code` maps aborted → 1, from the
merged exit-code work). Give `_run_json` an `expected_exit` parameter
(default 0) so the abort case asserts exit 1 while the paused/completed
cases stay at 0, keeping a single shared helper rather than duplicating the
invoke boilerplate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): backward-compat gate detection + normalize gate options

Address Copilot review:

- A run paused by an older version has no persisted step `type`, so
  `_gate_outcome` would never surface its gate block on resume. Add
  `_is_gate_step`: prefer the `type` field, but when it is absent fall back
  to the gate's unique output signature (`on_reject`, written only by
  GateStep). A record with a different known `type` is still not a gate.
- Normalize `options` to a list of strings (mirroring the `message`
  coercion) so an unvalidated workflow with non-string options can't
  destabilize the JSON schema.
- Tests: options coercion, type-less gate detection, and a type-less
  non-gate negative case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): normalize non-list gate options to a stable list[str]

Address Copilot review: the prior options normalization only mapped a
`list`, returning the raw value for any other shape (scalar/tuple), which
contradicted the "stable list[str]" intent. Extract
`_normalize_gate_options`: None stays None; list/tuple maps each element
through str; any other scalar becomes a single-element list (a bare string
is one option, never iterated character-by-character). The emitted schema
is now always list[str] | None. Extend the options test to cover list,
tuple, bare string, numeric scalar, and None.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): normalize gate choice to str; portable plain-gate test

Address Copilot review:

- `_gate_outcome` normalized `message` and `options` but passed `choice`
  through as-is; an unvalidated gate can record a non-string `choice`, which
  contradicts the stable-schema rationale. Coerce `choice` to `str | None`
  (None still means "no decision yet"), consistent with the other two
  fields. Adds a focused choice-coercion test.
- The plain (no-gate) test workflow used `run: "true"`, which fails under
  cmd.exe on Windows (ShellStep uses shell=True). Use the cross-platform
  `run: "exit 0"` (matching the exit-code suite's workflows).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 12:28:06 +08:00 · 2026-06-22 19:05:54 +07:00
parent 487af97864
commit f5f76160a3
4 changed files with 309 additions and 4 deletions
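
For orientation, here is a minimal sketch of how an orchestrator might consume the payload described in the commit message. It assumes only what the message states: a `gate` block with `step_id`/`message`/`options`/`choice` appears when the run is `paused` or `aborted` at a gate, and `choice` is `None` until a decision is recorded. The helper name `describe_outcome` and all sample values are hypothetical and not part of this change.

```python
import json
from typing import Any


def describe_outcome(payload_text: str) -> str:
    """Summarise a run/resume --json payload (hypothetical helper)."""
    payload: dict[str, Any] = json.loads(payload_text)
    status = payload["status"]
    # Per the commit message, `gate` is present only when the run rests on a
    # gate, i.e. its status is "paused" or "aborted".
    gate = payload.get("gate")
    if gate is None:
        return f"{status}: no gate detail"
    if status == "paused":
        # `options` is list[str] or None, so guard before joining.
        options = ", ".join(gate["options"] or [])
        return f"paused at gate {gate['step_id']}: {gate['message']} [{options}]"
    # Otherwise the gate was rejected with `on_reject: abort`.
    return f"{status} at gate {gate['step_id']} with choice {gate['choice']}"


# Made-up sample payload for illustration only.
sample = json.dumps({
    "run_id": "run-1",
    "workflow_id": "demo",
    "status": "paused",
    "current_step_id": "review",
    "current_step_index": 2,
    "gate": {
        "step_id": "review",
        "message": "Approve the plan?",
        "options": ["approve", "reject"],
        "choice": None,
    },
})
print(describe_outcome(sample))
# prints: paused at gate review: Approve the plan? [approve, reject]
```

Because the payload normalises `message`, `options`, and `choice` to strings (or `None`), a consumer like this can format them without type checks beyond the `None` guard.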
--- a/src/specify_cli/__init__.py
+++ b/src/specify_cli/__init__.py
@@ -2099,13 +2099,85 @@ def _parse_input_values(input_values: list[str] | None) -> dict[str, Any]:

 def _workflow_run_payload(state: Any) -> dict[str, Any]:
    """Machine-readable summary of a run/resume outcome."""
-    return {
+    payload = {
        "run_id": state.run_id,
        "workflow_id": state.workflow_id,
        "status": state.status.value,
        "current_step_id": state.current_step_id,
        "current_step_index": state.current_step_index,
    }
+    gate = _gate_outcome(state)
+    if gate is not None:
+        payload["gate"] = gate
+    return payload
+
+
+def _is_gate_step(step: dict[str, Any]) -> bool:
+    """Whether a recorded step result is a gate.
+
+    Prefers the persisted ``type`` field, but when it is absent — a run paused
+    by an older version, whose step record predates ``type`` being stored —
+    falls back to the gate's unique output signature: only ``GateStep`` writes
+    an ``on_reject`` key. A record carrying a *different* known ``type`` is not
+    a gate, so the fallback applies only when ``type`` is missing entirely.
+    """
+    step_type = step.get("type")
+    if step_type == "gate":
+        return True
+    if step_type:
+        return False
+    output = step.get("output")
+    return isinstance(output, dict) and "on_reject" in output
+
+
+def _gate_outcome(state: Any) -> dict[str, Any] | None:
+    """Gate detail for the structured outcome, when the run rests at a gate.
+
+    A paused or gate-aborted run is otherwise indistinguishable from any
+    other pause/abort in the machine-readable payload; surfacing the gate's
+    prompt, options, and (after an interactive choice) the decision lets
+    orchestrators drive review gates without parsing the human-facing stream.
+    """
+    # Two run states rest *on* a gate: `paused` (awaiting a decision) and
+    # `aborted` (a gate rejected with `on_reject: abort` — the only path that
+    # sets ABORTED, leaving current_step_id on that gate). Any other status —
+    # notably `completed`/`failed` — must be suppressed: current_step_id is
+    # not cleared when a run whose last executed step was a gate moves on, so
+    # without this guard it would surface stale detail (run/resume/status).
+    if getattr(state.status, "value", state.status) not in ("paused", "aborted"):
+        return None
+    step = (getattr(state, "step_results", None) or {}).get(state.current_step_id)
+    if not isinstance(step, dict) or not _is_gate_step(step):
+        return None
+    output = step.get("output") or {}
+    # `message`, `options`, and `choice` may be non-string YAML literals in an
+    # unvalidated workflow (GateStep coerces none of them for the payload), so
+    # normalise all three for a stable JSON schema: message → str, options →
+    # list[str] | None, choice → str | None (None means no decision yet).
+    message = output.get("message")
+    choice = output.get("choice")
+    return {
+        "step_id": state.current_step_id,
+        "message": None if message is None else str(message),
+        "options": _normalize_gate_options(output.get("options")),
+        "choice": None if choice is None else str(choice),
+    }
+
+
+def _normalize_gate_options(options: Any) -> list[str] | None:
+    """Normalise a gate's ``options`` to a stable ``list[str]`` (or ``None``).
+
+    A valid gate stores a list, but an unvalidated workflow could leave a
+    scalar or tuple. ``None`` stays ``None`` (no options); a list/tuple maps
+    each element through ``str``; any other scalar becomes a single-element
+    list — so the emitted JSON schema is always ``list[str] | None``. A bare
+    string is treated as one option, never iterated character-by-character.
+    """
+    if options is None:
+        return None
+    if isinstance(options, (list, tuple)):
+        return [str(o) for o in options]
+    return [str(options)]


 def _run_outcome_exit_code(status_value: str) -> int:
--- a/src/specify_cli/workflows/base.py
+++ b/src/specify_cli/workflows/base.py
@@ -47,9 +47,10 @@ class StepContext:
    #: Resolved workflow inputs (from user prompts / defaults).
    inputs: dict[str, Any] = field(default_factory=dict)

-    #: Accumulated step results keyed by step ID.
-    #: Each entry is ``{"integration": ..., "model": ..., "options": ...,
-    #:   "input": ..., "output": ...}``.
+    #: Accumulated step results keyed by step ID. Each entry is the dict the
+    #: engine persists per step:
+    #: ``{"type": ..., "integration": ..., "model": ..., "options": ...,
+    #:   "input": ..., "output": ..., "status": ...}``.
    steps: dict[str, dict[str, Any]] = field(default_factory=dict)

    #: Current fan-out item (set only inside fan-out iterations).
--- a/src/specify_cli/workflows/engine.py
+++ b/src/specify_cli/workflows/engine.py
@@ -676,6 +676,7 @@ class WorkflowEngine:

            # Record step results — prefer resolved values from step output
            step_data = {
+                "type": step_type,
                "integration": result.output.get("integration")
                or step_config.get("integration")
                or context.default_integration,