Commit Graph

19 Commits

Author SHA1 Message Date
lselvar
f512b8b0d1 fix: resolve GitHub release asset API URL for private repo preset and workflow downloads (#2855)
* fix: resolve GitHub release asset API URL for private repo preset and workflow downloads

- Add shared `resolve_github_release_asset_api_url` utility to `_github_http.py` for
  reuse across preset and workflow download paths
- Apply the same private-repo fix from PR #2792 (extensions) to:
  - `PresetCatalog.download_pack` — ZIP downloads via catalog `download_url`
  - `preset add --from <url>` — ZIP downloads from a direct URL
  - `workflow add <url>` — workflow YAML downloads from a direct URL
  - `workflow add <id>` (catalog) — workflow YAML downloads via catalog `url`
- For browser release URLs (`github.com/…/releases/download/…`), the asset is
  resolved via the GitHub REST API and downloaded with `Accept: application/octet-stream`
- Direct REST API asset URLs (`api.github.com/…/releases/assets/<id>`) are
  downloaded directly with `Accept: application/octet-stream`
- Auth is preserved end-to-end through the existing `open_url` infrastructure
- Update `test_download_pack_sends_auth_header` and add
  `test_download_pack_accepts_direct_github_rest_asset_url` to cover both paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: URL-encode tag in release API URL to handle special characters

Encode the tag as a path segment (using quote with safe='') when
building the releases/tags/<tag> API URL. This prevents malformed
URLs when tags contain reserved characters like '/' or '#'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: add CLI-level tests for preset add --from GitHub release URL resolution

Adds regression tests covering:
- resolve_github_release_asset_api_url unit tests (passthrough, resolution,
  network error, URL encoding of special chars in tags)
- CLI-level 'preset add --from <github-release-url>' end-to-end flow
- CLI-level 'preset add --from <api-asset-url>' direct passthrough

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: deduplicate release URL resolution; fix test issues

- ExtensionCatalog._resolve_github_release_asset_api_url now delegates
  to the shared helper in _github_http.py (also gains URL-encoding fix)
- Remove unused 'io' import from test_github_http.py
- Remove duplicate 'provides' dict keys accidentally added to test_presets.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: align resolver timeout with download timeout; add workflow CLI tests

- Pass timeout=30 to resolve_github_release_asset_api_url in both
  workflow add paths so worst-case latency matches the download timeout
- Add CLI-level regression tests for 'workflow add <url>' covering
  browser URL resolution and direct API asset URL passthrough

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: remove unused urllib.request import; add catalog workflow test

- Remove unused 'import urllib.request' in preset add --from path
- Add CLI test for catalog-based 'workflow add <id>' with GitHub
  release URL resolution

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* style: remove unused MagicMock imports from tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Manfred Riem <mnriem@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-05 10:41:40 -05:00
Huy Do
005c80a9c7 fix(workflows): render gate show_file contents in the interactive prompt (#2810)
* fix(workflows): render gate show_file contents in the interactive prompt

The gate step read and recorded `show_file` but never displayed its
contents at the interactive prompt, so the operator approved/rejected
without seeing the referenced file. Render the file inside the prompt
when stdin is a TTY, with a graceful notice for missing/unreadable
files. Non-interactive PAUSED behaviour, exit codes, resume semantics,
and no-`show_file` output are unchanged.

Closes #2809.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): keep gate _prompt signature stable and harden show_file reads

The gate prompt rendered show_file by passing it as a third positional
argument to _prompt. A test that stubs _prompt with a two-argument lambda
(test_gate_abort_still_halts_with_continue_on_error) then failed once the
branch caught up to main, because the call site passed three arguments to
the two-argument stub.

Compose the show_file material into the displayed message in execute() and
keep _prompt to its (message, options) contract. Display data no longer
widens the interactive seam, so stubbing _prompt stays stable and future
review material can be added without breaking callers. _prompt now renders
a multi-line message inside the gate box.

Also catch ValueError in _read_show_file so a path the OS rejects outright
(e.g. an embedded NUL byte) degrades to a notice instead of crashing the
prompt, matching the helper's stated contract.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): coerce gate prompt message to str before rendering

The multi-line render loop split the message on newlines, which assumes a
str. A non-string message (e.g. a YAML numeric literal) previously rendered
fine through the old f-string and would now raise on .split. Coerce with
str() to preserve that tolerance, and add a regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(workflows): make gate stdin handling robust; tidy compose_prompt typing

Address review feedback on the gate tests and helper:

- Swap the gate module's sys.stdin for a fixed-isatty stub (shared
  _StubStdin / _force_gate_stdin helpers) instead of setattr on
  sys.stdin.isatty, which is not assignable under some pytest capture
  modes. This also forces the non-interactive tests to a non-TTY so they
  cannot block on input() when run in a real terminal.
- The non-interactive show_file test now hard-fails if _read_show_file is
  called, proving the file is not read on the PAUSED path.
- _compose_prompt accepts a non-string message (e.g. a YAML numeric
  literal) and always returns str via str(message), keeping its annotation
  and docstring accurate; the redundant coercion in _prompt is removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): strip control chars from gate show_file; default tests non-TTY

Address review feedback:

- _read_show_file strips C0 control characters (except tab) from each line,
  so a show_file containing ANSI escape sequences (e.g. \x1b[2J) cannot
  clear the screen or spoof the prompt/options when rendered to a terminal.
- Add an autouse fixture on TestGateStep that defaults every gate test to a
  non-TTY stdin, so no test can drop into the interactive prompt and block
  on input() when the suite runs under a real TTY. Interactive tests opt
  back in via _force_gate_stdin(tty=True); the now-redundant explicit
  non-TTY calls were removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(workflows): localize gate stdin patch to the gate module's sys

_force_gate_stdin rebinds the gate module's `sys` name to a stand-in whose
stdin has a fixed isatty() and which delegates every other attribute to the
real sys, instead of mutating the process-wide sys.stdin. This keeps the
patch local to the gate module and leaves real stdin untouched. The gate
abort test, which used the same process-wide swap, now shares the helper, so
the pattern exists in exactly one place.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): sanitize the displayed gate show_file path, not just content

Control characters were stripped from show_file *contents* but the path was
still printed verbatim as the header (`f"{show_file}:"`) and echoed in the
read-error notice, so a show_file path containing ANSI escapes could still
inject terminal sequences. Centralize stripping in `_sanitize_for_display`
and apply it to every show_file-derived string that reaches the terminal —
the displayed path, each file line, and the error notice — while still
opening the file with the original path. Add a test for path sanitization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(workflows): inline control-char stripping, drop the helper

Reuse the existing _CONTROL_CHARS regex directly at the three display sites
instead of wrapping it in a one-line helper.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): also strip LF and C1 controls from gate show_file display

The control-char class skipped LF (so an embedded newline in a show_file
path could break the boxed layout) and the C1 range (so \x9b CSI and other
8-bit controls survived). Widen the class to [\x00-\x08\x0a-\x1f\x7f-\x9f]
(still keeping tab).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 08:04:52 -05:00
Samir Abed
34ce66139e feat: add support for rovodev (#2539)
* feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev
2026-06-04 11:34:05 -05:00
Huy Do
141119efea feat(workflows): add JSON output for workflow run resume and status (#2814)
* feat(workflows): add --json output to workflow run, resume, and status

Adds an opt-in `--json` flag to `workflow run`, `workflow resume`, and
`workflow status` that emits a single machine-readable object (run_id,
workflow_id, status, current step; status also reports per-step states
and a runs list) for automation and external orchestrators.

JSON is written via a small `_emit_workflow_json` helper using plain
stdout, so Rich markup, highlighting, and line-wrapping can never alter
the emitted object. Default human-readable output and exit codes are
unchanged when `--json` is omitted. Reference docs updated.

Closes #2811.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): keep --json stdout clean while steps write output

Suppressing the banner and the step-start callback was not enough to
guarantee a single parseable JSON object on stdout: individual steps still
write there while the engine runs. The gate step prints its prompt, and the
prompt step runs a CLI subprocess that inherits the process's stdout file
descriptor — either can corrupt the JSON stream for interactive runs or
integration-backed workflows.

Wrap engine.execute()/engine.resume() in a file-descriptor-level redirect
(dup2) when --json is set, so both Python-level writes and inherited-fd
subprocess output go to stderr while stdout carries only the emitted JSON.
Step progress stays visible on stderr. status does not run the engine, so
it is unaffected.

Tests cover both pollution channels (a Python print and a real subprocess)
via fd-level capture, and the inactive no-op path. Docs note the
stdout/stderr split.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(workflows): fix stray escape sequence in --json redirect comments

The redirect helper's docstring and its test comment wrote ``print``\s,
which renders as "print\s" rather than "prints". Replace with plain
"prints".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 11:11:39 -05:00
One-TheOnly
8e5643d4ff fix(cursor-agent): enable headless CLI dispatch end-to-end (-p --trust --approve-mcps --force + Windows .cmd shim resolution) (#2631)
* fix(cursor-agent): enable CLI dispatch via ``-p --trust`` headless mode

Restores the ability for ``specify workflow run`` to dispatch the
cursor-agent CLI, complementing the existing in-IDE skill flow.
Without this fix, ``specify workflow run speckit --input
integration=cursor-agent ...`` fails with a misleading
``CLI not found or not installed`` error even when the CLI is
installed (since cursor-agent had ``requires_cli=False`` and an
unset ``build_exec_args``).

The cursor-agent CLI (>= 2026.05.16) supports headless execution
via ``-p`` (print mode with full tool access including write/shell)
and ``--trust`` (bypass Workspace Trust prompt). Without ``--trust``
the CLI exits non-zero in non-TTY contexts (verified locally).

Changes to ``src/specify_cli/integrations/cursor_agent/__init__.py``:

* ``config.requires_cli``: ``False`` -> ``True``
* ``config.install_url``: ``None`` -> Cursor CLI docs URL
* Override ``build_exec_args()`` to emit
  ``[cursor-agent, -p, --trust, <prompt>, ...]``
  with optional ``--model`` and ``--output-format json`` flags,
  mirroring the shape used by ``claude``/``codex``/``gemini``.

Tests:

* 34 existing cursor-agent tests still pass.
* 6 new tests in ``TestCursorAgentCliDispatch`` pin
  ``requires_cli``, ``install_url``, and the exact argv shape
  (default, text-output, with-model, and the hyphenated skill
  invocation form ``/speckit-<name>``).
* Full repo: 1085 / 1085 passed, no regressions.

Fixes #2629

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(integrations): resolve ``.cmd``/``.bat`` shims before subprocess.run

On Windows, ``shutil.which`` honors ``PATHEXT`` and locates wrappers
like ``cursor-agent.cmd`` and ``codex.cmd``, but Python's
``subprocess.run`` calls ``CreateProcess`` which does **not** consult
``PATHEXT`` and therefore fails with ``WinError 2`` on a bare argv
like ``[cursor-agent, ...]``.

Resolve ``exec_args[0]`` via ``shutil.which`` in
``IntegrationBase.dispatch_command`` so ``.cmd``/``.bat`` shims work
transparently. On POSIX this is a no-op for absolute paths and a
harmless lookup otherwise.

Verified locally on Windows 10 + cursor-agent 2026.05.16:
without this fix, ``specify workflow run speckit --input
integration=cursor-agent`` fails with ``FileNotFoundError`` even
after the cursor-agent integration starts producing valid exec
args (per the prior commit on this branch).

Tests:

* New: 2 cursor-agent tests pin the shim-resolution + passthrough
  behavior (``test_dispatch_command_resolves_cmd_shim_for_subprocess``
  and ``test_dispatch_command_passthrough_when_shutil_which_finds_nothing``).
* Updated: ``tests/test_workflows.py::TestCommandStep::test_dispatch_with_mock_cli``
  was mocking ``shutil.which`` only at the ``command`` step level
  and not at the ``base`` level, which made it environment-sensitive
  (fails locally when the real ``claude`` CLI is on PATH).  Added the
  matching base-level patch and updated the argv-assertion to reflect
  the resolved path. ``test_dispatch_failure_returns_failed_status``
  gets the same patch for consistency.
* Full repo: 2867 passed, 0 regression from this PR. The 12 remaining
  pre-existing failures are unrelated Windows ``symlink`` privilege
  failures (``WinError 1314``) on a non-admin Windows runner.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(cursor-agent): inject --approve-mcps --force for headless MCP/tool access

The previous commit (1c55988) wired up ``-p --trust`` so the CLI launches
in headless mode without the Workspace Trust prompt, but that alone is
not enough to let ``specify workflow run`` drive a real speckit feature
end-to-end with cursor-agent on Windows. Two more flags are required:

* ``--approve-mcps``: without it, every MCP server configured in
  ``.cursor/mcp.json`` stays ``not loaded (needs approval)``, and any
  tool call against them is silently dropped. We hit this immediately
  trying to read a DingTalk PRD from a remote MCP server during the
  ``/speckit-specify`` step.
* ``--force``: without it, the agent halts on the first tool-call
  approval prompt (the tool call gets rejected and the workflow exits
  non-zero with a misleading message). With ``--force`` cursor-agent
  matches the implicit "trusted environment" semantics that ``claude -p``
  and ``codex --exec`` already have by default -- which is the right
  semantics for an unattended ``specify workflow run`` invocation.

Verified end-to-end on Windows 10 + cursor-agent 2026.05.16-0338208:

* ``cursor-agent -p --trust --approve-mcps --force --output-format text``
  + a ``/speckit-specify`` prompt that included a DingTalk URL produced
  a full spec.md (31.5 KB) plus checklists/requirements.md in ~10.7 min,
  reading the source PRD through the ``dingtalk-doc`` remote MCP server,
  deciding the ``specs/`` subpath itself, and updating
  ``.specify/feature.json`` and ``specs/menu-dictionary.md`` along the
  way -- no human-in-the-loop, no source PRD ever touched the filesystem.
* Without ``--approve-mcps`` the same prompt errors with the tool call
  rejected message; without ``--force`` the agent stops at the first
  non-MCP tool call.

Tests:

* ``test_build_exec_args_*`` updated to pin the new four-flag prefix.
* New ``test_build_exec_args_contains_mandatory_headless_flags`` asserts
  the four flags are always present together.
* ``test_dispatch_command_resolves_cmd_shim_for_subprocess`` updated to
  match the new argv layout.
* All 43 cursor-agent tests pass; no other tests touched.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(cursor-agent): express dispatch support via build_exec_args() instead of requires_cli

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(cursor-agent): use urlparse hostname check and cover dispatch without requires_cli

Co-authored-by: Cursor <cursoragent@cursor.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: 刘一 <liuyi@oureman.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-04 09:48:33 -05:00
Quratulain-bilal
bb2b49d0ae fix(workflows): validate run_id in RunState.load before touching the … (#2813)
* fix(workflows): validate run_id in RunState.load before touching the filesystem

``RunState.load(run_id, project_root)`` interpolates ``run_id`` directly
into ``project_root / ".specify" / "workflows" / "runs" / run_id`` and
then calls ``state_path.exists()`` and ``json.load`` on the result. The
run_id is reachable from user input via ``specify workflow resume
<run_id>`` (CLI argument) and via ``SPECKIT_WORKFLOW_RUN_ID`` (env var
override on the engine's run path), so a value like ``../escape``
turns ``runs_dir`` into ``.specify/workflows/escape/`` and:

  * ``state_path.exists()`` becomes a file-existence oracle for any
    path the process can read.
  * if a ``state.json`` exists at the traversed location (planted by
    a malicious dependency, a misconfigured shared workspace, or an
    older spec-kit version that happened to write there),
    ``json.load`` parses it and the workflow resumes under the
    attacker-chosen ``workflow_id`` / step state.
  * a subsequent ``state.save()`` then writes back to the traversed
    location, persisting the corruption.

``RunState.__init__`` already validates ``run_id`` against
``r'^[a-zA-Z0-9][a-zA-Z0-9_-]*$'`` — but that check runs on
``state_data["run_id"]`` *after* ``load`` has already done the file
lookup, which is too late to prevent the disclosure.

This change extracts the pattern into a class-level constant
``_RUN_ID_PATTERN`` and a single ``_validate_run_id`` classmethod so
``__init__`` and ``load`` cannot drift, then calls the validator at the
top of ``load`` before any path is built. Mirrors the precedent in
``src/specify_cli/agents.py::_ensure_within_directory`` (used at line
437 of that file) which guards extension-install paths against the
same threat model.

Regression tests parametrize 9 traversal vectors (``../escape``,
``..``, ``../../etc/passwd``, ``foo/bar``, ``foo\bar``, ``.hidden``,
``-flag``, ``foo\x00bar``, empty) and plant a malicious ``state.json``
outside ``runs/`` so a missing guard would surface as a successful
load rather than the ambiguous ``FileNotFoundError``. A second test
asserts ``__init__`` and ``load`` reject the same representative
malformed ID, so future changes to one path can't silently drift from
the other.

* test(workflows): exercise RunState.load in shared-validation test, fix __init__ empty-string asymmetry

Copilot's review on this PR pointed out that
test_init_and_load_share_validation claimed to verify both entry
points share the same validation rules but never actually called
RunState.load — only __init__ and the shared
_validate_run_id helper. A regression in load (e.g. someone
deleting the cls._validate_run_id(run_id) call before the path is
built) would slip through even though __init__ and the helper
stayed aligned, defeating the whole point of the test.

Tightening the test surfaced a real asymmetry the previous version was
silently masking:

    self.run_id = run_id or str(uuid.uuid4())[:8]

The truthiness fallback meant RunState(run_id="") silently
substituted a UUID and skipped validation, while
RunState.load("", project_root) correctly rejected the empty
string. The two entry points diverged on the empty-string vector.
That is exactly the drift the test name claimed to defend against —
and the original test missed it.

Changes
-------

* engine.py: __init__ now distinguishes run_id is None
  (caller omitted it → auto-generate UUID) from an empty string
  (caller provided it → must validate like any other value). Both
  paths still flow through _validate_run_id, but only the
  explicit-None case auto-generates.

* test_workflows.py: test_init_and_load_share_validation is
  now parametrized over one representative vector per category from
  test_load_rejects_path_traversal (parent traversal, embedded
  separator, leading non-alphanumeric, empty string) and asserts that
  *all three* entry points — __init__, _validate_run_id, and
  load — reject the same input. Adding load to the assertion
  is the substantive fix Copilot asked for; keeping __init__ and
  the helper alongside it makes any future drift between the three
  immediately observable instead of having to read three separate
  tests.

Verification
------------

pytest tests/test_workflows.py — 168 passed (was 165 before the
parametrize expansion; __init__ empty-string vector would have
failed the new test against the old engine code, confirming the
asymmetry was real).
2026-06-03 14:26:07 -05:00
Huy Do
1732b9b62e feat(workflows): allow resume to accept updated workflow inputs (#2815)
`workflow resume` now accepts `--input key=value` (the same flag and
parsing as `workflow run`, via a shared `_parse_input_values` helper).
Supplied values are merged over the run's persisted inputs and
re-resolved through the existing typed-validation path
(`_resolve_inputs`), so a resumed/re-run step sees the updated inputs
and ill-typed values fail fast. Keys not supplied keep their persisted
values; resuming without `--input` is unchanged. Reference docs updated.

Distinct from #2405 (file-reference inputs at run time): this is about
supplying inputs at resume time, reusing the existing input model.

Closes #2812.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 12:04:07 -05:00
Huy Do
7bab0568c5 feat(workflows): add continue_on_error step field for non-halting failures (#2663)
* feat(workflows): add continue_on_error step field

Adds an optional `continue_on_error: bool` field on every step.
When set to `true` and the step fails, the engine records the
result (`exit_code`, `stderr` on `steps.<id>.output` plus `status`
as a sibling key on `steps.<id>`) and continues to the next sibling
step instead of halting the run. Downstream `if`, `switch`, or
`gate` steps can then branch on
`{{ steps.<id>.output.exit_code }}` to route the recovery path.

Engine details
--------------
`WorkflowEngine._execute_steps` now consults the step config when a
step returns `StepStatus.FAILED`:

- Gate aborts (`output.aborted`) always halt the run — operator
  decisions take precedence over the flag.
- Otherwise, if `continue_on_error` is the literal `True`, log a
  `step_continue_on_error` event and proceed to the next sibling.
  The runtime check uses identity comparison (`is True`) rather
  than truthiness, so truthy non-bool values like the string
  `"true"` cannot silently change run semantics even if a caller
  bypasses `validate_workflow()`.
- Otherwise, behave as before: log `step_failed`, set
  `RunStatus.FAILED`, and return.

Validation
----------
`_validate_steps` rejects non-bool values for `continue_on_error`.
Coerced strings like `"true"` are not accepted so authoring
mistakes surface at validation time rather than silently changing
run semantics.

Tests
-----
`TestContinueOnError` in `tests/test_workflows.py` (8 tests):
- `test_undeclared_failure_halts_run` — default halt behaviour.
- `test_declared_and_fired_continues_run` — flag + fail → continue.
- `test_declared_but_step_succeeded_is_noop` — flag + success → no-op.
- `test_if_branch_routes_around_failure` — end-to-end recovery.
- `test_gate_abort_still_halts_with_continue_on_error` — abort
  always halts.
- `test_validation_rejects_non_bool_continue_on_error` — `"true"`
  rejected at validation.
- `test_validation_accepts_bool_continue_on_error` — `true`/`false`
  pass cleanly.
- `test_engine_ignores_truthy_non_bool_continue_on_error` —
  defense-in-depth: engine ignores string `"true"` even when
  validation is bypassed.

Rebased onto current upstream/main (post #2664 merge); the new
`TestContinueOnError` class sits immediately after upstream's
`TestContextRunId` so the two feature suites coexist cleanly.

Closes #2591.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(workflows): restore runtime context section, clarify gate prompt

Two Copilot findings on d0b9e00:

1. The `### Runtime Context` documentation for `{{ context.* }}` was
   lost during the rebase onto current main (the squash dropped the
   anchor where #2664 had added it). Restored under `## Expressions`
   so users can find `context.run_id` semantics and examples.

2. The continue_on_error example gate had message "Retry or skip?"
   but used the default `options: [approve, reject]` with `on_reject:
   skip`, which implied an automatic retry path that gates do not
   provide. Reworded the message to match the actual approve/reject
   semantics and added an explicit note that retry requires either
   custom gate options + downstream branching or a wrapper loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(workflows): clarify continue_on_error scope — returned FAILED only

Copilot finding on d0b9e00:

The README's "Error Handling" intro implied `continue_on_error` covers
"any other runtime error raised during step execution", but the engine
only consults the flag when a step returns `StepResult(status=FAILED, ...)`.
Exceptions raised out of `step_impl.execute()` propagate to
`WorkflowEngine.execute()`, where the catch-all logs `workflow_failed`
and re-raises — the step result is never recorded, and the flag is
never consulted.

Audited the whole PR diff for the same overclaim:

1. workflows/README.md — main fix. Reworded the Error Handling intro to
   "any step that returns StepResult(status=FAILED, ...)" and promoted
   the parenthetical structural-validation note into the Notes block.
   Added a new "Scope: returned failures only" note that names the
   exception path explicitly and tells step authors how to bring the
   flag into scope for exceptional code (catch internally and return
   FAILED with the failure encoded in `output`).

2. tests/test_workflows.py — section comment used "when an executable
   step fails", same ambiguity. Tightened to "when a step returns
   StepResult(status=FAILED, ...)" and added a sentence calling out
   that unhandled exceptions are out of scope.

3. src/specify_cli/workflows/engine.py — already correct ("any step
   that returns FAILED" in the validator comment; "lets the pipeline
   route around the failure" in the execute path). No change.

Engine semantics and test bodies are unchanged. Docs-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(workflows): clarify on_reject:skip semantics — engine returns COMPLETED, not auto-skip

Copilot finding on b8982a7:

The README example's gate message said "reject to skip the rest of this
branch", and the explanatory paragraph claimed [approve, reject] map
to "continue" vs "skip the rest of this branch". The engine does not
implement automatic branch-skipping. `on_reject: skip` returns
`StepStatus.COMPLETED` (gate/__init__.py:65-66); the next sibling step
runs unconditionally unless the author wires a downstream `if` reading
`{{ steps.<gate-id>.output.choice }}`.

Two fixes:

1. Restructured the YAML example so it actually demonstrates the
   manual-branching pattern: added a `recover` if-step after the gate
   that conditions on `steps.review.output.choice == 'approve'`. Now
   the example shows the real workflow author's responsibility instead
   of implying the engine does it.

2. Replaced the trailing paragraph with three precise notes:
   - both gate options return COMPLETED; `on_reject: skip` controls
     abort behaviour only, not sibling-skipping
   - all three `on_reject` values enumerated with their actual engine
     semantics (FAILED+aborted / COMPLETED / PAUSED)
   - the original retry-loop guidance retained as the third bullet

Updated the gate message in the example to match — "reject to leave the
failure recorded and move on" instead of "reject to skip the rest of
this branch".

Audited the whole PR diff for the same overclaim: no other instance.
Engine semantics, validation, and test bodies are unchanged. Docs-only.

161/161 tests/test_workflows.py pass locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(workflows): clarify gate's role — surfaces, doesn't programmatically branch

Audit follow-up to 393ac6b — three sites repeated the same minor
overclaim about gates being one of the "branch on it" step types
alongside `if` and `switch`:

1. workflows/README.md (the "downstream `if`, `switch`, or `gate`
   steps can branch on it" sentence introducing the example)
2. engine.py:236 (validator inline comment)
3. engine.py:657 (execute-path inline comment)

A `gate` step does not have a `condition` or `expression` field — it
only evaluates expressions for `message` and `show_file` (gate/__init__.py:29,36).
Programmatic branching happens in `if`/`switch`; a gate surfaces the
value to a human operator via message interpolation, and the operator's
choice is recorded in `output.choice` for a *subsequent* `if`/`switch`
to route on.

Reworded all three sites consistently: "a downstream `if` or `switch`
can branch on it (or a `gate` can surface it to the operator via
message interpolation)". The README example already demonstrates this
distinction — the gate carries `{{ }}` template variables in its
message and the `recover` if-step downstream is what actually branches
on the choice.

Engine semantics, validation, and test bodies are unchanged. Docs-only
on the README; comment-only on engine.py.

161/161 tests/test_workflows.py pass locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(workflows): use qualified StepStatus.* instead of bare FAILED/COMPLETED/PAUSED

Three Copilot inline comments on workflows/README.md lines 226, 282, 288
flagged that ``StepResult(status=FAILED, ...)`` is not valid Python —
``StepResult.status`` is a ``StepStatus`` enum value, so the
documented form should be ``StepStatus.FAILED``.

Audited the whole PR diff for the same shorthand. The bare unqualified
form appears in three files added/modified by this PR:

1. workflows/README.md (6 sites) — three ``StepResult(status=FAILED, ...)``
   parentheticals, plus the on_reject Notes bullet listing the three
   step statuses (``FAILED``, ``COMPLETED``, ``PAUSED``).

2. tests/test_workflows.py (4 sites) — section header for
   TestContinueOnError, two test-method docstrings, one inline comment
   about a gate's TTY-fallback behaviour.

3. src/specify_cli/workflows/engine.py (1 site) — the validator inline
   comment added in d0b9e00 said "returns FAILED" where the engine
   code itself uses ``StepStatus.FAILED``.

All 11 sites normalised to the qualified ``StepStatus.<name>`` form so
the docs / test docstrings / inline comments match what readers will
actually find in the engine code and the tests. Engine semantics,
validation, and test bodies are unchanged.

161/161 tests/test_workflows.py pass locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-02 10:10:07 -05:00
Pedro Barbosa
44aac9f6e4 feat: add native Cline integration (#2508)
* test: strip ansi to make asserts work

* feat: add native Cline integration
2026-06-01 11:20:48 -05:00
Copilot
cc3d828227 Add support for SPECKIT_WORKFLOW_RUN_ID override (#2742)
* Initial plan

* feat: support SPECKIT_WORKFLOW_RUN_ID override

* docs: clarify run_id env var precedence wording

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-05-29 10:50:00 -05:00
Huy Do
c6afe4cde1 feat(workflows): expose {{ context.run_id }} template variable (#2664)
* feat(workflows): expose `{{ context.run_id }}` template variable

Closes #2590.

Surfaces the engine-assigned run id (the same 8-character hex
string Spec Kit prints as `Run ID:` at the end of
`workflow run`) as a workflow template variable so YAML
authors can reference it from shell `run:`, command
`input.args:`, switch `expression:`, and any other field that
already evaluates `{{ ... }}` templates.

### Why

The run id is the natural join key between a Spec Kit workflow
run and downstream artifacts, telemetry, or per-run scratch
state. Today the operator sees it in stdout but workflows
themselves cannot reference it — there was no way to stamp a
log line, name a scratch directory, or tag an artifact with
the same id Spec Kit assigned.

The three motivating use cases from the issue:

1. Telemetry / observability — stamp logs and events with the
   run id so external systems can join workflow runs to
   downstream artifacts.
2. Per-run scratch / isolation — interactive operator commands
   that need their own state directory under
   `/tmp/run-<id>/`.
3. Run-id in artifact metadata — stable join key from artifact
   back to the producing run.

### Implementation

`StepContext.run_id` is already populated by `WorkflowEngine`
in both `execute()` and `resume()`. The only gap was the
template namespace builder.

`_build_namespace` (in `workflows/expressions.py`) now adds a
`context` key alongside the existing `inputs`, `steps`,
`item`, and `fan_in` namespaces:

```python
ns["context"] = {"run_id": run_id}
```

The value is always present (even outside a run) and falls
back to an empty string when no run is active. Workflows
referencing `{{ context.run_id }}` therefore never error — a
hard requirement from the issue's acceptance criteria for
dry-run, validation, and ad-hoc evaluator usage.

### Default behaviour preserved

Workflows that do not reference `{{ context.run_id }}` are
byte-equivalent to before this change. The `context`
namespace is added unconditionally to keep template
resolution branch-free, but its presence has no observable
effect when nothing references it.

### Tests

`TestExpressions` (unit-level) gains three tests:

- `test_context_run_id_resolves` — direct lookup against a
  `StepContext(run_id=...)`.
- `test_context_run_id_defaults_to_empty_when_unset` —
  graceful default outside a run context.
- `test_context_run_id_string_interpolation` — mixed
  template (e.g. `"RUN_ID={{ context.run_id }}"`).

`TestContextRunId` (end-to-end) covers the three step types
the acceptance criteria called out:

- `test_shell_run_resolves_run_id` — `run:` field
  substitution, verified via captured stdout.
- `test_command_input_args_resolves_run_id` — `input.args:`
  resolution, captured in step output even when CLI dispatch
  is unavailable (the artifact-metadata use case).
- `test_switch_expression_matches_on_run_id` — switch
  matches against the resolved value, proving the run id is a
  first-class value in the expression engine, not just an
  interpolation token.
- `test_workflow_without_context_reference_unchanged` —
  locks the byte-equivalent default required by the issue.

### Docs

`workflows/README.md` gains a "Runtime Context" subsection
under "Expressions" documenting the new namespace and the
three canonical use patterns (telemetry, per-run scratch,
artifact metadata).

* test(workflows): drop inline double-quotes in run_id shell tests

`test_shell_run_resolves_run_id` and
`test_switch_expression_matches_on_run_id` used
`run: 'echo "RUN_ID={{ context.run_id }}"'` with inner double-quotes
around the echo argument. Bash/sh strips those quotes before invoking
echo, but cmd.exe (used on Windows when `shell=True`) treats them
as literal characters and emits `"RUN_ID=abc12345"` — failing the
exact-match assertion. Linux passed; all three Windows-latest matrix
entries failed with `assert '"RUN_ID=abc12345"' == 'RUN_ID=abc12345'`.

Resolve by dropping the inner double-quotes (the value has no spaces
or shell metacharacters) and wrapping the YAML scalar in plain
double-quotes the same way other shell-step tests in this file do
(e.g. `run: "echo b-saw-..."`). Behaviour-equivalent on POSIX,
portable to cmd.exe.
2026-05-27 13:00:58 -05:00
LahkLeKey
0ae451f697 Add util for windows sub-process (#2598)
* Add util for windows sub-process

* Use platform-aware Copilot executable in subprocess calls

* Update test_workflows.py
2026-05-26 08:25:15 -05:00
Manfred Riem
616eba6a57 fix: while/do-while loop condition reads stale iteration-0 step output (#2662)
* fix: while/do-while loop condition reads stale iteration-0 step output

After executing namespaced loop body steps, copy each iteration's
results back to the original unprefixed step key so that
evaluate_condition() sees the latest values instead of stale
iteration-0 data.

Fixes #2592

* address review: cross-platform tests, preserve iteration-0 history

- Rewrite shell scripts in tests to use Python via script files
  instead of POSIX syntax, so they pass on Windows CI.
- Snapshot iteration-0 nested-step results under a namespaced key
  (parent:child:0) before the first copy-back overwrite, preserving
  complete per-iteration history for debugging.

* address review: skip copy-back on paused/failed iterations

Move the status check before the copy-back so that partial results
from paused or failed nested steps (e.g., a gate awaiting input)
do not overwrite the unprefixed key. This preserves correct resume
behavior.

* address review: quote paths in test shell commands

Quote both the Python executable and script file paths in the
run: commands to handle spaces in paths on Windows.

* address review: execute loop body with original IDs

Instead of namespacing step IDs for execution and copying results
back, execute the loop body with original (unprefixed) step IDs so
results naturally land at the right keys.  Snapshot previous
iteration results to namespaced keys (parent:child:N) for history
only.

This fixes multi-step loop bodies where step B references step A's
output within the same iteration — previously step B would see
stale data until the copy-back ran after the entire iteration.

* address review: namespaced execution with per-step copy-back

Revert to namespaced step IDs for execution (preserving unique
log entries and state keys per iteration) but copy each step's
result back to the unprefixed key immediately after it completes.

This preserves backward compatibility (same namespaced key format,
same log IDs) while fixing both the condition evaluation bug and
inter-step references within multi-step loop bodies.

* address review: alias after status check, add multi-step body test

- Move per-step aliasing below the PAUSED/FAILED/ABORTED status
  check so partial results from incomplete steps are not aliased
  back to the unprefixed key.
- Add test_while_loop_multi_step_body_inter_step_refs to exercise
  a multi-step loop body where step B reads step A's output within
  the same iteration, verifying per-step aliasing works correctly.

Addresses feedback from @doquanghuy (items 2 & 4) and Copilot
review on commit 9d0a222.

* address review: stable fallback IDs, expression-based inter-step test

- Use enumerate() for stable fallback IDs when loop body steps lack
  an explicit id (step-0, step-1, etc. instead of always step-0).
- Rewrite multi-step body test so step B uses expression
  substitution ({{ steps.step-a.output.stdout }}) instead of
  reading the counter file directly, making it a true regression
  test for per-step aliasing.
2026-05-21 12:25:03 -05:00
darion-yaphet
3d50f85875 fix(cli): clarify exception diagnostics (#2602)
Consolidate the CLI diagnostic plan, implementation, and test hardening into one reviewable change. The CLI now reports phase and target context for broad failure paths while preserving existing fail-fast behavior for real setup failures and warning-only behavior for optional best-effort work.

The workflow unit tests also avoid discovering real local agent CLIs, so developer machines with tools such as gemini installed do not hang pytest during metadata-only assertions.

Constraint: CLI setup failures must remain fail-fast, while optional preset and cleanup paths should continue with clear warnings.

Rejected: Replace broad handlers across the whole codebase in one pass | too broad for a targeted CLI diagnostic fix

Rejected: Add runtime timeouts to workflow agent dispatch | dispatch may legitimately be long-running and the observed hang was test isolation

Confidence: high

Scope-risk: moderate

Directive: Keep future best-effort CLI warnings tied to the failed phase and target so users can diagnose setup state.

Tested: uvx ruff check src/; uv run pytest tests/integrations/test_cli.py -v; uv run pytest tests/test_workflows.py::TestCommandStep::test_step_override_integration tests/test_workflows.py::TestPromptStep::test_execute_with_step_integration tests/test_workflows.py::TestPromptStep::test_execute_with_model -vv; uv run pytest

Not-tested: Real Nacos/PG/Redis-style external service failure injection; real interactive workflow dispatch against installed gemini CLI
2026-05-20 21:19:48 -05:00
Quratulain-bilal
409ec59704 fix(workflow): support integration: auto to follow project's initialized AI (#2421)
* fix(workflow): support integration: auto to follow project's initialized AI

Closes #2406

(squashed)

* fix(workflow): combine JSONDecodeError and UnicodeDecodeError handling

Address Copilot feedback: UnicodeDecodeError can be raised by both
read_text() and json.loads(), so combining the handlers ensures both
cases produce a consistent, clear error message.

* fix(workflows): honor integration_state schema guard and modern state in 'integration: auto'

Three Copilot follow-ups on PR #2421:

1. engine.py:799 — `_load_project_integration` was bypassing the same
   schema guard `_read_integration_json` enforces. It now reads the
   schema field directly, returns None on a future schema (so the
   workflow falls back to the literal 'auto' default rather than
   guessing), and routes through `normalize_integration_state` /
   `default_integration_key` so modern installs that record
   `default_integration` / `installed_integrations` (without the
   legacy top-level `integration` field) resolve correctly.

2. test_workflows.py — added two regression cases:
   - `integration: auto` resolves a modern normalized state file
   - `integration: auto` falls back when the state file declares a
     newer `integration_state_schema` than this CLI supports

3. test_cli.py — added a CLI-level regression for the `UnicodeDecodeError`
   branch in `_read_integration_json` to match the existing
   malformed-JSON coverage.

* refactor(integration): extract shared try_read_integration_json helper

Address Copilot review on PR #2421:

Both `_read_integration_json` (CLI) and `_load_project_integration`
(workflow engine) were parsing `.specify/integration.json` independently,
duplicating the schema guard and risking drift between the two readers.

Extract the parse + schema validation into a single low-level helper
`try_read_integration_json` in `integration_state.py` that returns either
the normalized state or a structured `IntegrationReadError`. Both callers
now delegate to this helper:

- CLI keeps its loud-fail UX: each error kind ("decode", "os",
  "not_object", "schema_too_new") is translated into the existing console
  message + typer.Exit(1).
- Engine keeps its silent fallback: any error simply returns None so
  `integration: auto` falls back to the workflow's literal default.

This eliminates the divergence Copilot flagged without changing observable
behavior for either caller.

* fix(integration): distinguish missing file from non-regular path

Address Copilot review on PR #2421:

`try_read_integration_json` was collapsing two distinct cases into a
single `(None, None)` return:

1. `.specify/integration.json` truly missing — silent fallback is correct.
2. Path exists but is a directory, socket, or other non-regular file —
   this is a misconfiguration the CLI should surface loudly.

Split the check: `exists()` falsey returns `(None, None)`; existing-but-
not-a-regular-file returns `(None, IntegrationReadError(kind="os", ...))`
so the CLI's loud-fail path produces an actionable error while the
engine still treats it as a fallback to the workflow's literal default.

* docs(workflow): clarify version pin, advisory integrations list, enum exemption

- workflow.yml: fix comment that said 0.8.3 was first release with auto
  resolution; the pin is >=0.8.5 so the comment now matches the pin.
- workflow.yml: clarify that requires.integrations.any is an advisory,
  non-exhaustive compatibility hint, not a closed set.
- engine.py: clarify that the auto-sentinel exemption only skips enum
  membership; declared type is still enforced through _coerce_input.

* fix(workflow): resolve auto sentinel for provided values; report stat errors

Two Copilot findings fixed:

1. _resolve_inputs only resolved the ``integration: auto`` sentinel when it
   came from the input default. A caller explicitly providing
   ``{"integration": "auto"}`` (which the workflow prompt advertises as a
   valid value) bypassed _resolve_default and the literal "auto" leaked
   to dispatch. Provided values now go through the same resolution path
   as defaults, and the enum-membership exemption applies in both cases.
   Regression test added.

2. try_read_integration_json used Path.exists() / Path.is_file() as a
   pre-check. Both return False on some OSErrors (e.g. permission errors
   during stat), which silently treated an unreadable-but-present file
   as missing — the engine fell back without warning and the CLI failed
   to surface the loud error. The pre-check is gone: read_text() is
   attempted directly, FileNotFoundError means missing (silent fallback),
   IsADirectoryError and other OSErrors become loud IntegrationReadError.

* fix(workflow): enforce declared type for string inputs, reject bool-as-number

Two Copilot findings fixed:

1. _coerce_input previously coerced/validated only ``number`` and
   ``boolean`` types, so ``type: string`` silently accepted any Python
   value (numbers, lists, dicts). A YAML authoring mistake like
   ``type: string`` + ``default: 5`` slipped through. Strings are now
   required to actually be strings; non-strings raise ValueError, which
   surfaces as an ``invalid default`` error from validate_workflow.

2. ``type: number`` accepted ``default: true`` because ``bool`` is a
   subclass of ``int`` (``float(True) == 1.0``). Bools are now rejected
   explicitly in the number path so the YAML mistake fails fast. The
   boolean path is also tightened to reject non-bool / non-string
   values for symmetry.

Comment on the auto-sentinel enum exemption updated to reflect the
stronger guarantee. Regression tests added for both rejections.

* fix(cli): drop unused normalize_integration_state import to satisfy ruff

CI's `uvx ruff check src/` flagged this as F401: the symbol was imported
under a private alias but never referenced. Tests stay green after
removal.
2026-05-15 16:03:33 -05:00
Manfred Riem
370b5b4890 fix(copilot): use --yolo to grant all permissions in non-interactive mode (#2298)
* fix(copilot): use --yolo to grant all permissions in non-interactive mode

The Copilot CLI's --allow-all-tools flag only covers tool execution
permissions but does not grant path or URL access. When the Copilot
agent autonomously runs shell commands (e.g. npm run build) during
workflow execution, the CLI blocks path access and cannot prompt for
approval in non-interactive mode, producing:

  Permission denied and could not request permission from user

Replace --allow-all-tools with --yolo (equivalent to --allow-all-tools
--allow-all-paths --allow-all-urls) to grant all three permission types.

Rename the opt-out env var from SPECKIT_ALLOW_ALL_TOOLS to
SPECKIT_COPILOT_ALLOW_ALL to match the formal --allow-all alias and
scope it to the Copilot integration.

Fixes #2294

* review: deprecate SPECKIT_ALLOW_ALL_TOOLS, rename to SPECKIT_COPILOT_ALLOW_ALL_TOOLS

Address Copilot review feedback:

- Honour the old SPECKIT_ALLOW_ALL_TOOLS env var as a fallback with a
  DeprecationWarning so existing opt-outs are not silently ignored.
- Rename the new canonical env var to SPECKIT_COPILOT_ALLOW_ALL_TOOLS.
- New var takes precedence when both are set.
- Use monkeypatch in tests to avoid flakiness from ambient env vars.
- Add tests for deprecation warning, precedence, and opt-out paths.

* review: use UserWarning instead of DeprecationWarning

DeprecationWarning is suppressed by default in Python, so users relying
on the old SPECKIT_ALLOW_ALL_TOOLS env var would never see the
deprecation notice during normal CLI runs. Switch to UserWarning which
is always shown. Update test to also assert the warning category.
2026-04-21 11:41:52 -05:00
Ayesha Khalid
b4c4e86cbc fix(integrations): strip UTF-8 BOM when reading agent context files (#2283)
* fix(integrations): strip UTF-8 BOM when reading agent context files

* test(integrations): add BOM regression tests for context file read/write

* test(workflows): mock shutil.which in tests that assume CLI is absent

* test(integrations): remove unused manifest variable in BOM test
2026-04-20 16:00:20 -05:00
Manfred Riem
02a1d610df docs: add workflows reference, reorganize into docs/reference/, and add --version flag (#2244)
* docs: add workflows reference, reorganize into docs/reference/, and add --version flag

- Move integrations.md, extensions.md, presets.md into docs/reference/
- New docs/reference/workflows.md: command reference for all workflow
  commands, built-in SDD Cycle workflow with Mermaid diagram, step types,
  expressions, input types, state/resume, and FAQ
- Rename workflow input feature_name to spec with prompt 'Describe what
  you want to build' to match speckit.specify command terminology
- Add --version / -V flag to root specify command with tests
- Update docs/toc.yml, README.md links, and docs/upgrade.md cross-reference
  to use reference/ paths
- Add workflow command to README CLI reference table

* docs: update speckit_version requirement to >=0.7.2 in workflow example
2026-04-16 13:34:08 -05:00
Copilot
a00e679918 Add workflow engine with catalog system (#2158)
* Initial plan

* Add workflow engine with step registry, expression engine, catalog system, and CLI commands

Agent-Logs-Url: https://github.com/github/spec-kit/sessions/72a7bb5d-071f-4d67-a507-7e1abae2384d

Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>

* Add comprehensive tests for workflow engine (94 tests)

Agent-Logs-Url: https://github.com/github/spec-kit/sessions/72a7bb5d-071f-4d67-a507-7e1abae2384d

Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>

* Address review feedback: do-while condition preservation and URL scheme validation

Agent-Logs-Url: https://github.com/github/spec-kit/sessions/72a7bb5d-071f-4d67-a507-7e1abae2384d

Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>

* Address review feedback, add CLI dispatch, interactive gates, and docs

Review comments (7/7):
- Add explanatory comment to empty except block
- Implement workflow catalog download with cleanup on failure
- Add input type coercion for number/boolean/enum
- Fix example workflow to remove non-existent output references
- Fix while_loop and if_then condition defaults (string 'false' → bool False)
- Fix resume step index tracking with step_offset parameter

CLI dispatch:
- Add build_exec_args() and dispatch_command() to IntegrationBase
- Override for Claude (skills: /speckit-specify), Gemini (-m flag),
  Codex (codex exec), Copilot (--agent speckit.specify)
- CommandStep invokes installed commands by name via integration CLI
- Add PromptStep for arbitrary inline prompts (10th step type)
- Stream CLI output live to terminal (no silent blocking)
- Remove timeout when streaming (user can Ctrl+C)
- Ctrl+C saves state as PAUSED for clean resume

Interactive gates:
- Gate steps prompt [1] approve [2] reject in TTY
- Fall back to PAUSED in non-interactive environments
- Resume re-executes the gate for interactive prompting

Documentation:
- workflows/README.md — user guide
- workflows/ARCHITECTURE.md — internals with Mermaid diagrams
- workflows/PUBLISHING.md — catalog submission guide

Tests: 94 → 122 workflow tests, 1362 total (all passing)

* Fix ruff lint errors: unused imports, f-string placeholders, undefined name

* Address second review: registry-backed validation, shell failures, loop/fan-out execution, URL validation

- VALID_STEP_TYPES now queries STEP_REGISTRY dynamically
- Shell step returns FAILED on non-zero exit code
- Persist workflow YAML in run directory for reliable resume
- Resume loads from run copy, falls back to installed workflow
- Engine iterates while/do-while loops up to max_iterations
- Engine expands fan-out per item with context.item
- HTTPS URL validation for catalog workflow installs (HTTP allowed for localhost)
- Fix catalog merge priority docstring (lower number wins)
- Fix dispatch_command docstring (no build_exec_args_for_command)
- Gate on_reject=retry pauses for re-prompt on resume
- Update docs to 10 step types, add prompt step to tables and README

* Potential fix for pull request finding 'Empty except'

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

* Address third review: fan-out IDs, catalog guards, shell coercion, docs

- Fan-out generates unique per-item step IDs and collects results
- Catalog merge skips non-dict workflow entries (malformed data guard)
- Shell step coerces run_cmd to str after expression evaluation
- urlopen timeout=30 for catalog workflow installs
- yaml.dump with sort_keys=False, allow_unicode=True for catalog configs
- Document streaming timeout as intentionally unbounded (user Ctrl+C)
- Document --allow-all-tools as required for non-interactive + future enhancement
- Update test docstring and PUBLISHING.md to 10 step types with prompt

* Validate final URL after redirects in catalog fetch

urlopen follows redirects, so validate the response URL against the
same HTTPS/localhost rules to prevent redirect-based downgrade attacks.

* Address fourth review: filter arg eval, tags normalization, install redirect check

- Filter arguments now evaluated via _evaluate_simple_expression() so
  default(42) returns int not string
- Tags normalized: non-list/non-string values handled gracefully
- Install URL redirect validation (same as catalog fetch)
- Remove unused 'skipped' variable in catalog config parsing
- Author 'github' → 'GitHub' in example workflow
- Document nested step resume limitation (re-runs parent step)

* Add explanatory comment to empty except ValueError block

* Address fifth review: expression parsing, fan-out output, URL install, gate options

- Move string literal parsing before operator detection in expressions
  so quoted strings with operators (e.g. 'a in b') are not mis-parsed
- Fan-out: remove max_concurrency from persisted output, fix docstring
  to reflect sequential execution
- workflow add: support URL sources with HTTPS/redirect validation,
  validate workflow ID is non-empty before writing files
- Deduplicate local install logic via _validate_and_install_local()
- Remove 'edit' gate option from speckit workflow (not implemented)

* Add comments to empty except ValueError blocks in URL install

* Address sixth review: operator precedence, fan_in cleanup, registry resilience, docs

- Fix or/and operator precedence (or parsed first = lower precedence)
- Restore context.fan_in after fan-in step completes
- Catch JSONDecodeError in registry load for corrupted files
- Replace print() with on_step_start callback (library-safe)
- Gate validation warns when on_reject set but no reject option
- Shell step: document shell=True security tradeoff
- README: sdd-pipeline → speckit, parallel → sequential for fan-out
- ARCHITECTURE.md: parallel → fan-out/fan-in in diagram

* Address seventh review: string literal before pipe, type annotations, validate on install

- Move string literal check above pipe filter parsing so 'a | b' works
- Fix type annotations: input_values list[str] | None, run_id str | None
- Run validate_workflow() before installing from local path/URL
- Remove duplicate string literal check from expression parser

* Address eighth review: fan-out namespaced IDs, early return, catalog validation

- Fan-out per-item step IDs use _fanout_{step_id}_{base}_{idx} namespace
  to avoid collisions with user-defined step IDs
- Early return after fan-out loop when state is paused/failed/aborted
- Catalog installs parse + validate downloaded YAML before registering,
  using definition metadata instead of catalog entry for registry

* Address ninth review: populate catalog, fix indentation, priority, README

- Add speckit workflow entry to catalog.json so it's discoverable
- Fix shell step output dict indentation
- Catalog add_catalog priority derived from max existing + 1
- README Quick Start clarified with install + local file examples

* Address tenth review: max_iterations validation, catalog config guard, version alignment

- Validate max_iterations is int >= 1 in while and do-while steps
- Guard add_catalog against corrupted config (non-dict/non-list)
- Align speckit_version requirement to >=0.6.1 (current package version)
- Fan-out template validation uses separate seen_ids set to avoid
  false duplication errors with user-defined step IDs

* Address eleventh review: command step fails without CLI, ID mismatch warning, state persistence

- Command step returns FAILED when CLI not installed (was silent COMPLETED)
- Catalog install warns on workflow ID vs catalog key mismatch
- Engine persists state.save() before returning on unknown step type
- Update tests to expect FAILED for command steps without CLI
- Integration tests use shell steps for CLI-independent execution

* Address twelfth review: type annotations, version examples, streaming docs, requires

- Fix workflow_search type annotations (str | None)
- PUBLISHING.md: speckit_version >=0.15.0 → >=0.6.1
- Document that exit_code is captured and referenceable by later steps
- Mark requires as declared-but-not-enforced (planned enhancement)
- Note full stdout/stderr capture as planned enhancement

* Enforce catalog key matches workflow ID (fail instead of warn)

* Bundle speckit workflow: auto-install during specify init

- Add workflows/speckit to pyproject.toml force-include for wheel builds
- Add _locate_bundled_workflow() helper (mirrors _locate_bundled_extension)
- Auto-install speckit workflow during specify init (after git extension)
- Update all integration file inventory tests to expect workflow files

* Address fourteenth review: prompt fails without CLI, resolved step data, fan-out normalization

- PromptStep returns FAILED when CLI not installed (was silent COMPLETED)
- Engine step_data prefers resolved values from step output
- Fan-out normalizes output.results=[] for empty item lists
- subprocess.run inherits stdout/stderr (no explicit sys.stdout)
- Registry tests use issubset for extensibility

* Address fifteenth review: fan_in docstring, gate defaults, validation guards, reserved prefix

- FanInStep docstring: aggregate-only, no blocking semantics
- FanInStep: guard output_config as dict, handle None
- Gate validate: use same default options as execute
- Validate inputs is dict and steps is list before iterating
- Reserve _fanout_ prefix in step ID validation
- PUBLISHING.md: remove unenforced checklist items, add _fanout_ note

* Address sixteenth review: docs regex, fan_in try/finally, hyphenated dot-path keys

- PUBLISHING.md: update ID regex docs to match implementation (single-char OK)
- FanInStep: wrap expression evaluation in try/finally for context.fan_in
- Expression dot-path: allow hyphens in keys before list index (e.g. run-tests[0])

* Make speckit workflow integration-agnostic, document Copilot CLI requirement

- Workflow integration selectable via input (default: claude)
- Each command step uses {{ inputs.integration }} instead of hardcoded copilot
- Copilot docstring documents CLI requirement for workflow dispatch
- Added install_url for Copilot CLI docs

* Address seventeenth review: project checks, catalog robustness

- Add .specify/ project check to workflow run/resume/status/search/info
- remove_catalog validates config shape (dict + list) before indexing
- _fetch_single_catalog validates response is a dict
- _get_merged_workflows raises when all catalogs fail to fetch
- add_catalog guards against non-dict catalog entries in config

* Address eighteenth review: condition coercion, gate abort result, while default, cache guard, resume log

- evaluate_condition treats plain 'false'/'true' strings as booleans
- Gate abort returns StepResult(FAILED) instead of raising exception
  so step output is persisted in state for inspection
- while_loop max_iterations optional (default 10), validation aligned
- Catalog cache fallback catches invalid JSON gracefully
- resume() appends workflow_finished log entry like execute()

* Address nineteenth review: allow-all-tools opt-in, empty catalogs, abort dead code, while docstring

- --allow-all-tools controlled by SPECKIT_ALLOW_ALL_TOOLS env var (default: 1)
  Set to 0 to disable automatic tool approval for Copilot CLI
- Empty catalogs list falls back to built-in defaults (not an error)
- Remove unreachable WorkflowAbortError catches from execute/resume
  (gate abort now returns StepResult(FAILED) instead of raising)
- while_loop docstring updated: max_iterations is optional (default 10)

* Address twentieth review: gate abort maps to ABORTED status, do-while max_iterations optional

- Engine detects output.aborted from gate step and sets RunStatus.ABORTED
  (was unreachable — gate abort returned FAILED but status was always FAILED)
- do-while max_iterations now optional (default 10), aligned with while_loop
- do-while docstring and validation updated accordingly

* Coerce default_options to dict, align bundled workflow ID regex with validator

* Gate validates string options, prompt uses resolved integration, loop normalizes max_iterations

* Use parentId:childId convention for nested step IDs

- Fan-out per-item IDs use parentId:templateId:index (e.g. parallel:impl:0)
- Reserve ':' in user step IDs (validation rejects)
- Replaces _fanout_ prefix with cleaner namespacing
- Expressions like {{ steps.parallel:impl:0.output.file }} work naturally

* Validate workflow version is semantic versioning (X.Y.Z)

* Schema version validation, strict semver, load_workflow docstring, preserve max_concurrency

- Validate schema_version is '1.0' (reject unknown future schemas)
- Strict semver regex: ^\d+\.\d+\.\d+$ (rejects 1.0.0beta etc.)
- load_workflow docstring: 'parsed' not 'validated'
- Keep max_concurrency in fan-out output (was dropped)
- do_while docstring: engine re-evaluates step_config condition
- ARCHITECTURE.md: document nested resume limitation

* Path traversal prevention, loop step ID namespacing

- RunState validates run_id is alphanumeric+hyphens (no path separators)
- workflow_add validates catalog source doesn't escape workflows_dir
- Loop iterations namespace nested step IDs as parentId:childId:iteration
  so multiple iterations don't overwrite each other in context/state

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
2026-04-14 10:11:56 -05:00