github-spec-kit

mirror of https://github.com/github/spec-kit.git synced 2026-07-03 12:28:06 +08:00

Author	SHA1	Message	Date
Ali jawwad	876dca8659	fix(workflows): gate validate() must not crash on non-string options (#3233 ) GateStep.validate() reports non-string options as an error, but then -- when on_reject is 'abort'/'retry' -- still runs the reject-choice check 'any(o.lower() in ... for o in options)'. For a non-string option (e.g. options: [123]) o.lower() raised AttributeError, which escaped validate() and broke validate_workflow's documented 'return a list of errors, never raise' contract. Guard the check so it only runs when every option is a string (the non-string case is already reported above). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 15:07:50 -05:00
Ali jawwad	9ece347a77	fix(workflows): make pipe-filter detection quote-aware in expressions (#3232 ) _evaluate_simple_expression used 'if "\|" in expr' / expr.split("\|", 1) to detect a filter pipe, so a literal '\|' inside a quoted operand (e.g. inputs.x == 'a\|b') was mistaken for a filter separator and raised a spurious ValueError ('unknown filter') instead of comparing the string. Use the existing quote/bracket-aware _find_top_level helper (added for the operator-splitting fix) so only a top-level pipe is treated as a filter separator. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 14:55:45 -05:00
Huy Do	3036fe6954	fix(workflows): reject a fan-in wait_for that names an unknown step at validation (#3225 ) * fix(workflows): reject a fan-in wait_for that names an unknown step at validation * fix(workflows): reject fan-in wait_for self-reference and non-string entries Address review feedback on the fan-in wait_for validator: - A fan-in's own id is added to seen_ids before the wait_for check, so `wait_for: [<self>]` passed validation while producing a silent empty join at runtime. Reject self-references explicitly. - Non-string entries (e.g. YAML `wait_for: [123]`) were skipped by the isinstance(str) guard and validated even though they can never match a real step id. Flag them as wiring errors. Add coverage for both cases.	2026-06-29 14:52:08 -05:00
Ali jawwad	d378485696	fix(workflows): reject infinite number-input default instead of raising OverflowError (#3199 ) WorkflowEngine._coerce_input normalizes a whole-valued number to int via int(value). For an infinite float (e.g. a 'type: number' input with YAML 'default: .inf') int(inf) raises OverflowError, which is not in the except (ValueError, TypeError) tuple. validate_workflow eager-coerces declared defaults and is documented to RETURN a list of errors, but it only catches ValueError -- so the OverflowError escaped and validate_workflow raised instead of reporting, breaking its contract. (NaN already surfaced cleanly because int(nan) raises ValueError.) Add OverflowError to the except tuple so an infinite default surfaces as the same clean 'expected a number' ValueError as NaN, consistent with the function's existing fail-fast-on-authoring-mistakes design. Finite values (5.0 -> 5, 3.5 -> 3.5) are unaffected. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 10:05:51 -05:00
Ali jawwad	2a9db1d350	fix(workflows): make expression operator/literal parsing quote-aware (#3197 ) _evaluate_simple_expression split on operator keywords using naive str.find/split, so a keyword INSIDE a quoted operand was treated as an operator: `inputs.mode == 'read and write'` split on the inner ' and ' and evaluated as `(inputs.mode == 'read) and (write')`. The literal short-circuit was also too greedy -- `'a' == 'b'` matched startswith("'")/endswith("'") and was stripped to the garbage truthy string `a' == 'b`, so `'done' == 'failed'` evaluated truthy and gated the wrong branch. Add a quote/bracket-aware _find_top_level helper (mirroring the existing _split_top_level_commas) and use it for the and/or/comparison/in/not-in splits; tighten the literal short-circuit to fire only when the opening quote's match is the final char. The docstring already lists comparisons + and/or/not + in/not-in + string literals as supported, so this restores the documented contract. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 09:53:35 -05:00
Si Zengyu	1add20341d	fix(extensions,presets,workflows): resolve private GHES release assets via /api/v3 (#3157 ) * feat(auth): add github_provider_hosts() to enumerate GHES hosts from auth.json Assisted-by: Claude Code (model: claude-sonnet-4-6, autonomous) * fix(extensions): resolve GHES release assets via /api/v3 Generalizes resolve_github_release_asset_api_url to GitHub Enterprise Server hosts (gated by auth.json github hosts), fixing private GHES extension/preset downloads. github/spec-kit#3147 Assisted-by: Claude Code (model: claude-sonnet-4-6, autonomous) * fix(extensions,presets): pass auth.json github hosts into release resolver Assisted-by: Claude Code (model: claude-sonnet-4-6, autonomous) * docs(auth): document GHES private catalog + release-asset auth Assisted-by: Claude Code (model: claude-sonnet-4-6, autonomous) * fix(presets,workflows): pass auth.json github hosts into remaining release resolvers Wires preset add --from and workflow add through github_provider_hosts() so private GHES release assets resolve via /api/v3 there too. github/spec-kit#3147 Assisted-by: Claude Code (model: claude-sonnet-4-6, autonomous) * test(presets): use module-level io.BytesIO in GHES preset test Addresses Copilot review on PR #3157: drop unnecessary __import__("io") in test_preset_add_from_ghes_release_url_resolves_via_api_v3 since io is already imported at module level. * fix(github-http): pass through GHES asset API URLs by path shape Addresses Copilot review on PR #3157. A direct GHES /api/v3 release asset URL was only returned as already-resolved when its host was in the allowlist; otherwise the resolver returned None and the caller downloaded the same URL without 'Accept: application/octet-stream', fetching JSON metadata instead of the binary. Gate the passthrough on path shape alone, mirroring the github.com case. This is safe: passthrough returns the input URL unchanged and the caller fetches it either way, so no new request to an arbitrary host is induced; the token stays independently gated by auth.json in open_url. The allowlist remains the anti-SSRF gate on the tag-lookup resolving path. Add test_passthrough_for_unlisted_ghes_api_asset_url.	2026-06-25 10:44:30 -05:00
Ali jawwad	fdaaf18371	fix(workflows): preserve commas inside quoted list-literal elements (#3134 ) * fix(workflows): preserve commas inside quoted list-literal elements The simple-expression evaluator parsed a list literal with a naive `inner.split(",")`, which splits on commas inside quoted strings (and nested brackets). So `{{ ["a, b", "c"] }}` evaluated to three items (`["a", "b", "c"]`) instead of two, silently corrupting `fan-out` `items:` and any list expression that contains a comma inside a quoted element. Split list-literal elements on top-level commas only, ignoring commas inside quotes or nested brackets, via a small `_split_top_level_commas` helper. Plain and empty lists are unchanged. Add tests covering quoted commas, nested lists, and the existing plain/empty cases. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(workflows): cover single-quoted and nested list literals Address review: extend the list-literal regression test to assert single-quoted elements with commas and nested lists parse correctly, alongside the existing double-quoted cases. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-24 15:10:02 -05:00
Zied Jlassi	f846d6526c	fix(workflows): validate requires keys and reject phantom permissions gate (#3079 ) * fix(workflows): validate requires keys and reject phantom permissions gate A workflow's `requires` block was parsed but its keys were never validated, so a typo or an unsupported key was silently ignored. Most importantly, authors could write `requires.permissions.shell: true` expecting a runtime capability gate — but no such gate exists: a `shell` step always runs with the user's privileges. The declaration gave a false sense of sandboxing. `validate_workflow` now accepts only the recognised keys (`speckit_version`, `integrations`, `tools`, `mcp`) and rejects anything else, with an explicit error for `requires.permissions` pointing authors to `gate` steps for approval. Docs and the model comment are updated to state that `requires` is advisory, not a security boundary. - Reject non-mapping `requires`, unknown keys, and `requires.permissions` - Clarify workflows reference + PUBLISHING.md shell-step guidance - Tests for valid keys, non-mapping, unknown key, and permissions Signed-off-by: Zied Jlassi <6190550+zied-jlassi@users.noreply.github.com> Assisted-by: AI * fix(workflows): address review feedback on requires validation Follow-up to the review on #3079: - Guard `requires` validation on `is not None` instead of truthiness so a falsy non-mapping value (e.g. `requires: []` or `requires: ''`) is reported as an error instead of being silently skipped; `requires:` (YAML null) is still treated as an omitted block. Add a regression test. - Reword the workflows security note so `requires.permissions` is shown as rejected/unsupported rather than as a valid example of `requires`. - Standardize on US spelling (`_RECOGNIZED_REQUIRES_KEYS`, "recognized") to match the surrounding code and ease searching. - Tighten the permissions-rejection test to assert on specific message markers (`requires.permissions` and the `gate` guidance) so it fails if the validation path or wording drifts. Assisted-by: AI Signed-off-by: Zied Jlassi (Architect AI) <6190550+zied-jlassi@users.noreply.github.com> * fix(workflows): scope requires validation to workflow keys (drop tools/mcp) tools and mcp belong to the bundle manifest requires schema (bundler/models/manifest.py, resolved in bundler/services/resolver.py), not the workflow requires validated here. Drop them from _RECOGNIZED_REQUIRES_KEYS and revert the PUBLISHING.md claim that this PR had introduced, so workflow requires only recognizes speckit_version and integrations. This keeps the existing docs accurate and resolves the inline doc-consistency review comments. Signed-off-by: Zied Jlassi <6190550+zied-jlassi@users.noreply.github.com> * refactor(workflows): type WorkflowDefinition.requires as Any pre-validation self.requires holds the raw parsed value, which before validate_workflow() runs may be a non-mapping (None for a bare 'requires:', a list for 'requires: []', etc.). Annotating it dict[str, Any] was misleading for editors/type-checkers; use Any and document that validate_workflow() enforces the mapping shape. Addresses Copilot review feedback on engine.py. Signed-off-by: Zied Jlassi <6190550+zied-jlassi@users.noreply.github.com> * fix(workflows): reject YAML-null requires: as a non-mapping Address Copilot review: validate requires the same way as inputs. A bare requires: parses as YAML null and was previously treated as an omitted block, which is inconsistent with inputs and lets a stray requires: line be silently ignored. Drop the is-not-None guard and check isinstance(..., dict) directly: an omitted block still defaults to {} (valid), but a present-but-non-mapping value -- YAML null, [] or '' -- is now an authoring error that surfaces. Tests: add YAML-null rejection + an omitted-is-still-valid guard test. Signed-off-by: Zied Jlassi <6190550+zied-jlassi@users.noreply.github.com> --------- Signed-off-by: Zied Jlassi <6190550+zied-jlassi@users.noreply.github.com> Signed-off-by: Zied Jlassi (Architect AI) <6190550+zied-jlassi@users.noreply.github.com>	2026-06-24 14:49:43 -05:00
Huy Do	1cb935997c	fix: fail loudly on an unknown workflow expression filter (#3074 ) * fix: fail loudly on an unknown workflow expression filter The expression evaluator's filter dispatch fell through to `return value` for any unregistered filter, so a typo'd or unsupported filter such as `{{ items \| length }}` rendered the value unchanged with no error and the run completed — a silent wrong result. Raise a clear ValueError instead, naming the offending filter and the valid ones, mirroring the strict handling already used for `from_json`. The five registered filters (default/join/map/contains/from_json) are unchanged; the `name(arg)` form of an unknown filter is now caught too. * fix: distinguish a misused registered filter from an unknown one; cover map Address the review feedback on the unknown-filter fail-loud path: - A registered filter used in an unsupported form (e.g. `\| join` or `\| map` with no argument) raised the misleading "unknown filter '<name>'" — the filter is registered, the syntax isn't. It now raises a message naming it as a known filter misused. A new `_REGISTERED_FILTERS` constant drives the distinction. - `test_registered_filters_unaffected` now also exercises `map('attr')`, which it previously claimed to cover but didn't. Add `test_registered_filter_unsupported_form_raises` to pin the new path. * fix: include the no-arg default form in the filter-error hint Copilot review: the hint listed default('x') but omitted the valid no-argument default form (\| default), which this module supports.	2026-06-22 10:44:23 -05:00
Huy Do	f5f76160a3	feat: surface gate detail in the workflow run/resume --json payload (#2965 ) * feat: surface gate detail in the workflow run/resume --json payload A paused run was indistinguishable from any other pause in the machine-readable outcome, and the gate's prompt/options/choice never left the human-facing stream. Record each step's type in the run state's step results (one engine line) and, when the run sits at a gate, add a gate block (step_id/message/options/choice) to the payload so orchestrators can drive review gates without parsing stdout. Reference implementation for the proposal in #2964. Addresses #2964 * fix(workflow): only surface gate detail in --json when the run is paused Address review (#2965): _gate_outcome() emitted a gate block whenever current_step_id pointed at a gate step. Since RunState.current_step_id is never cleared on completion, a completed/failed run whose last step was a gate leaked stale gate detail in run/resume/status --json. Guard on status == paused. Also assert CLI success in the _run_json test helper before JSON-parsing, and add direct coverage for the suppression guard. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(workflows): surface gate block on aborted runs; stabilize message Address Copilot review: - `_gate_outcome` now also surfaces the gate block when a run is `aborted` by a gate rejection (`on_reject: abort`), not only when `paused`. Abort is the only path that sets ABORTED and it leaves current_step_id on the gate, so an orchestrator can read the recorded `choice` for the stop. - Coerce `message` to a string (it may be a non-string YAML literal that GateStep only coerces for interpolation) so the JSON schema stays stable. - Tests: add a CLI-level aborted-path test, a message-coercion test, and extend the suppression test to allow `aborted`; share the run helper via `_invoke_json` to avoid duplicating the invoke boilerplate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(workflows): assert clean exit in gate-abort JSON test Address Copilot review: the gate-abort test parsed stdout without first asserting the CLI exited cleanly, so an invoke failure would surface as an opaque JSON decode error. Route it through `_run_json` (which asserts exit_code == 0 before parsing) and drop the now-redundant `_invoke_json` helper — a gate abort emits the payload and returns, so the run exits 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix: use result.output in run-helper assert; document step_data shape Address Copilot review: - `_run_json` asserted with `result.stdout` in the message, but under `--json` step output is redirected off stdout — the useful diagnostics live on `result.output`. Switch the assertion message to `result.output` (the JSON parse still reads stdout), matching the other CLI tests. - `StepContext.steps` documented a 5-key entry shape; the engine now also persists `type` and `status`. Update the docstring to the canonical 7-key shape so step authors/debuggers see the real record. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(workflows): align gate-abort JSON test with aborted→exit-1 After rebasing onto main, a gate abort now emits the --json payload and then exits non-zero (`_run_outcome_exit_code` maps aborted → 1, from the merged exit-code work). Give `_run_json` an `expected_exit` parameter (default 0) so the abort case asserts exit 1 while the paused/completed cases stay at 0 — keeping a single shared helper rather than duplicating the invoke boilerplate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(workflows): backward-compat gate detection + normalize gate options Address Copilot review: - A run paused by an older version has no persisted step `type`, so `_gate_outcome` would never surface its gate block on resume. Add `_is_gate_step`: prefer the `type` field, but when it is absent fall back to the gate's unique output signature (`on_reject`, written only by GateStep). A record with a different known `type` is still not a gate. - Normalize `options` to a list of strings (mirroring the `message` coercion) so an unvalidated workflow with non-string options can't destabilize the JSON schema. - Tests: options coercion, type-less gate detection, and a type-less non-gate negative case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(workflows): normalize non-list gate options to a stable list[str] Address Copilot review: the prior options normalization only mapped a `list`, returning the raw value for any other shape (scalar/tuple), which contradicted the "stable list[str]" intent. Extract `_normalize_gate_options`: None stays None; list/tuple maps each element through str; any other scalar becomes a single-element list (a bare string is one option, never iterated character-by-character). The emitted schema is now always list[str] \| None. Extend the options test to cover list, tuple, bare string, numeric scalar, and None. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(workflows): normalize gate choice to str; portable plain-gate test Address Copilot review: - `_gate_outcome` normalized `message` and `options` but passed `choice` through as-is; an unvalidated gate can record a non-string `choice`, which contradicts the stable-schema rationale. Coerce `choice` to `str \| None` (None still means "no decision yet"), consistent with the other two fields. Adds a focused choice-coercion test. - The plain (no-gate) test workflow used `run: "true"`, which fails under cmd.exe on Windows (ShellStep uses shell=True). Use the cross-platform `run: "exit 0"` (matching the exit-code suite's workflows). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>	2026-06-22 07:05:54 -05:00
Huy Do	affbf5ead5	feat(workflows): add from_json expression filter (#2961 ) * feat(workflows): add from_json expression filter Step outputs captured as strings could never become typed values in templates - the filter set was default/join/map/contains only, so e.g. a fan-out items: could never consume a step's JSON stdout. Add an arg-less from_json pipe filter with parse-or-raise semantics: invalid JSON or non-string input raises a clear ValueError rather than passing through silently. Fixes #2960 * fix(expressions): make from_json strict — reject any arguments Address review (#2961): from_json('x') and from_json() previously fell through to a silent passthrough of the unparsed value. Reject any parenthesized form with a clear error so mis-wired templates fail loudly. Rename test to ...parses_object (JSON under test is an object) and add coverage for the strict no-arguments behavior. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * docs(workflows): document the from_json expression filter Address Copilot review: the user-facing filter references omitted the newly added `from_json` filter. Add it to the ARCHITECTURE.md filter table (with the `{{ steps.emit.output.stdout \| from_json }}` example) and to the filter enumerations in workflows/README.md and docs/reference/workflows.md so the docs match the evaluator's capabilities. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(workflows): make from_json strictness reject trailing tokens; fix docstring Address Copilot review: - Strictness only rejected parenthesized forms, so typos like `\| from_json)` or `\| from_json extra` still fell through to the unknown-filter path and silently returned the unparsed value. Match on the leading filter token and require the whole filter to be exactly `from_json`, so every mis-wired form raises. Extend the rejection test to cover the trailing-token cases. - The module docstring claimed "no imports", which is misleading now that the module imports `json`. Reword to state the actual sandbox guarantee: templates cannot do file I/O, import modules, or run arbitrary code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>	2026-06-17 13:43:26 -05:00
Copilot	00bff788c9	Add `init` workflow step to bootstrap projects like `specify init` (#2838 ) * Initial plan * Add init workflow step to bootstrap projects like `specify init` * Address review: simplify stderr capture and extract VALID_SCRIPT_TYPES * Address review: fail fast on non-empty dir, stdout fallback, README force fix * Populate exit_code/stdout/stderr in non-empty-dir fast-fail * fix: address three unresolved review comments in InitStep - Use `with os.scandir(...)` context manager so the iterator is always closed even when `any()` short-circuits, preventing file-descriptor leaks in long-running workflow runs. - Guard `os.chdir(prev_cwd)` in the `finally` block with a try/except so an `OSError` (e.g. directory deleted) doesn't bypass returning the captured `StepResult`. - Reject non-string `script` values in `validate()` with a clear error message, rather than silently passing them through to become `--script True` at runtime. * Potential fix for pull request finding 'Empty except' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> * fix: remove no_git and branch_numbering options removed upstream The --no-git and --branch-numbering flags were removed from `specify init` on main. Update InitStep to drop these unsupported config fields and fix tests accordingly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: address review — integration defaults, integration_options, engine-owned dirs - Apply DEFAULT_INIT_INTEGRATION fallback when neither step config nor workflow context provides an integration, so output.integration always reflects the actual integration used. - Add integration_options config field to support --integration-options passthrough (required for generic integration and --skills mode). - Exclude .specify/ from the non-empty directory fast-fail check so that here: true works when the engine has already created its run-state directory before steps execute. - Note: mix_stderr=False is not needed — Click 8.2+ captures stderr separately by default and the existing try/except handles access. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: implicitly add --force when only engine-owned dirs exist When the workflow engine creates .specify/workflows/runs/ before steps execute, the directory is technically non-empty. Previously, specify init would prompt for confirmation (hanging in unattended mode) unless the user explicitly set force: true. Now the step detects that only engine-owned directories (.specify/) are present and implicitly adds --force so init proceeds without user interaction. Also fixes the test to exercise the implicit-force path rather than passing force: True explicitly (which bypassed the check entirely). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: derive VALID_SCRIPT_TYPES from shared constant, fail fast on OSError, include all resolved fields in output - Derive VALID_SCRIPT_TYPES from SCRIPT_TYPE_CHOICES in _agent_config so the valid set cannot drift from the specify init CLI. - Fail fast with a clear error when os.scandir() raises OSError (e.g. permission denied) instead of silently treating the directory as empty. - Include preset, force, and ignore_agent_tools in all output dicts (both fast-fail and normal paths) for consistent interpolation and debugging downstream. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: populate stderr from stdout on older Click, fix force comment wording - When Click does not expose result.stderr (older versions where stderr is mixed into stdout), use stdout as stderr on non-zero exit so workflows can consistently read steps.<id>.output.stderr for errors. - Update README inline comment for force: wording to say 'when target directory already exists' rather than 'non-empty directory', matching the actual specify init behavior for the project: form. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: build argv flags before early returns, use any() for dir scan - Move argv flag-building (--integration, --script, --preset, --ignore-agent-tools) before the non-empty-dir and OSError early returns so output['argv'] always reflects the complete command. - --force is appended after the check since it may be set implicitly. - Replace list comprehension with any() generator expression to short-circuit without allocating a full list of DirEntry objects. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: only treat .specify as engine-owned when it is a real directory A file or symlink named .specify should not be excluded from the non-empty check. Use entry.is_dir(follow_symlinks=False) to ensure only an actual directory is considered engine-owned content. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: guard implicit force for engine dirs only, fix integration fallback order - Only set implicit --force when engine-owned directories (.specify/) are actually present. A completely empty directory no longer gets --force added unnecessarily. - Fix integration resolution precedence: resolve step config expression first, then fall back to workflow default (also resolved), then to DEFAULT_INIT_INTEGRATION. Previously, a step expression resolving to falsy would bypass the workflow default entirely. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Manfred Riem <15701806+mnriem@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> Co-authored-by: Manfred Riem <mnriem@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-17 11:46:51 -05:00
Huy Do	071f784dfa	feat(workflows): opt-in output_format: json exposes parsed shell stdout as output.data (#2963 ) * feat(workflows): opt-in output_format: json exposes parsed shell stdout as output.data No step that runs external code could hand a typed value to a later step, so e.g. a fan-out could never consume a runtime-computed collection. With output_format: json declared, stdout is parsed and exposed under output.data (raw keys unchanged); a parse failure fails the step with a clear error. Without the key, behavior is unchanged. Reference implementation for the proposal in #2962. Addresses #2962 * test(shell): emit JSON via sys.executable for cross-platform output_format tests Address review (#2963): replace non-portable echo '{...}' (Windows cmd.exe keeps the single quotes, breaking JSON) with the established '"{py}" "{script}"' pattern using sys.executable + a temp script, so the output_format tests pass on the Windows CI matrix. Also make the validate test's run inert (exit 0). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>	2026-06-17 08:09:17 -05:00
Huy Do	1ee2b626a8	fix: non-zero exit code when a workflow run ends failed or aborted (#2959 ) * fix: non-zero exit code when a workflow run ends failed or aborted workflow run and workflow resume printed Status: failed (or emitted the --json payload) and exited 0, so scripts and CI could not rely on the process exit code. Map terminal outcomes: failed\|aborted -> 1, completed\|paused -> 0, on both the text and --json paths. The previous exit-0-on-failed behavior was pinned by test_workflow_run_failing_yaml_without_project; the pin is updated to the new contract. Fixes #2958 * test: portable exit-code step commands + cover resume failed->exit-1 Address review (#2959): replace non-portable run: 'true'/'false' with 'exit 0'/'exit 1' (Windows cmd.exe has no true/false builtins under shell=True), and add an end-to-end 'workflow resume' test asserting a resumed failed run exits non-zero. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>	2026-06-17 08:02:00 -05:00
Copilot	c52ccd7dc7	Add workflow step catalog — community-installable step types (#2394 ) * Initial plan * Add workflow step catalog: StepRegistry, StepCatalog, CLI commands, and tests Agent-Logs-Url: https://github.com/github/spec-kit/sessions/2885e646-477d-4df8-b9a3-06d8cb29e748 Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Potential fix for pull request finding 'An assert statement has a side-effect' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> * Address PR review: path traversal, cache robustness, collision check, failed-to-load display - Add resolve()+relative_to() path traversal guards in workflow_step_add and workflow_step_remove to prevent directory escape via step_id - Harden _is_url_cache_valid in both StepCatalog and WorkflowCatalog to coerce fetched_at to float and catch TypeError/ValueError - Check STEP_REGISTRY and StepRegistry before installing to prevent collisions with built-in step types or already-installed steps - Show 'Custom (installed, failed to load)' section in workflow step list for steps in the registry that failed to load into STEP_REGISTRY * Fix StepRegistry shape validation and StepCatalog empty-YAML handling Agent-Logs-Url: https://github.com/github/spec-kit/sessions/0dca6393-f5a9-40de-bb5c-77ba6af033d2 Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Polish: rename _default to default_registry, strengthen unreadable-file test Agent-Logs-Url: https://github.com/github/spec-kit/sessions/0dca6393-f5a9-40de-bb5c-77ba6af033d2 Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Address PR review: atomic install, hostname validation, cache resilience, no dynamic imports in list/info Agent-Logs-Url: https://github.com/github/spec-kit/sessions/3e18fef0-e2e6-4b3e-9e8d-9adb1e5e464e Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Fix shutil.move with existing step_dir: remove before move to avoid subdirectory nesting Agent-Logs-Url: https://github.com/github/spec-kit/sessions/3e18fef0-e2e6-4b3e-9e8d-9adb1e5e464e Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Call load_custom_steps at execution time; enforce hostname in _safe_fetch and _validate_url Agent-Logs-Url: https://github.com/github/spec-kit/sessions/73865880-fb25-4061-a43e-4e4b4d1c4de6 Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Wrap YAML parsing in try/except; atomic step install via os.rename() under same fs Agent-Logs-Url: https://github.com/github/spec-kit/sessions/ff915bc5-ec7e-4e6a-b505-35f5795250df Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Validate YAML root is a dict in _load_catalog_config and workflow_step_add; fix WorkflowCatalog hostname validation Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Fix load_custom_steps() package imports and add reserved step ID validation * Move _re/_sys imports out of loop and _RESERVED_STEP_IDS to module level * Address review: collision-resistant module names, extra_files support, remove orphan dir * Harden extra_files: warn on non-dict, resolve symlinks in path traversal check * Switch _safe_fetch and StepCatalog._fetch_single_catalog to use open_url for auth consistency * Harden step_id validation against path-segment tricks; raise on StepRegistry.save() OSError * Clean up sys.modules on broken step packages; handle StepValidationError in step add/remove * Address review thread: int-coerce priorities, sys.modules cleanup, _require_specify_project, registry-first remove * fix: normalize workflow step catalog metadata fallbacks * fix: address latest workflow step and catalog review findings * Handle non-string extra_files keys in workflow step add * Harden StepRegistry symlink reads and extra_files path/URL validation * Harden custom step loader and step remove against symlinks and OSError * Fix StepCatalog.search() to coerce non-string fields before joining * Fix WorkflowCatalog YAML parsing error handling and isinstance checks * Harden step registry save and custom step/catalog ID handling * Harden cache validation and staging OSError handling * Address review: reorder symlink guard and split mixed test - Move symlink-parent check before is_dir() in load_custom_steps() so we never stat an external target through a symlink - Split test_get_merged_steps_normalizes_list_ids_to_strings into two focused tests: one for list-id normalization, one for get_step_info return values Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address review: symlink-before-stat in loader, restore registry on rmtree failure - load_custom_steps(): check is_symlink() before is_dir() on step directories so symlinked entries are skipped without statting external targets - workflow_step_remove: restore the registry entry when shutil.rmtree() fails so filesystem and registry state stay consistent and a future 'step add' isn't blocked Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Harden step_id validation and file-write error handling - _validate_step_id_or_exit: reject whitespace-only/padded IDs, Windows-invalid characters (<>:"\|?), control characters, trailing dots/spaces, and Windows reserved device names (con, nul, etc.) - Wrap step.yml/__init__.py staging writes in OSError handler - Wrap extra_files disk writes (mkdir + write_bytes) in OSError handler that names the failing relative path - Registry rollback on rmtree failure: restore verbatim metadata and emit a warning if the restore itself fails Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Address review: cache symlink guard, verbatim registry rollback, Windows test fix - StepCatalog: add _is_cache_path_safe() guard that checks for symlinks in .specify/workflows/steps/.cache path; skip cache reads and writes when any component is symlinked to prevent writes outside project root - Registry rollback: write metadata directly to registry.data['steps'] and call save() instead of using add() which overwrites timestamps - temp_dir fixture: use ignore_errors=True on Windows to avoid flaky teardown from locked file handles (WinError 32) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Simplify exec_module call by removing redundant nested try/except Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix empty YAML tolerance in WorkflowCatalog.add_catalog, scope ignore_errors to Windows - WorkflowCatalog.add_catalog(): treat None from yaml.safe_load() (empty file) as an empty mapping instead of raising 'corrupted' - temp_dir fixture: limit ignore_errors to sys.platform == 'win32' so real cleanup issues surface on non-Windows platforms Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Chain exceptions in _load_catalog_config for both catalog classes Add 'from exc' to preserve root cause in tracebacks while keeping clean user-facing messages. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Make default catalog tests hermetic by isolating HOME Monkeypatch Path.home() to project_dir and clear catalog env vars so tests don't break on machines with a real ~/.specify/step-catalogs.yml or ~/.specify/workflow-catalogs.yml. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix falsy ID handling in _get_merged_steps for list-based catalogs Check for None explicitly instead of using 'or' which drops valid falsy IDs like 0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Compare reserved step IDs case-insensitively for filesystem safety On case-insensitive filesystems (Windows, common macOS), variants like STEP-REGISTRY.JSON would collide with the actual registry file. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add explanatory comments to intentional empty except blocks Document why cache-read failures are silently ignored in both WorkflowCatalog and StepCatalog _fetch_single_catalog methods. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> Co-authored-by: Manfred Riem <mnriem@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-16 18:03:45 -05:00
Huy Do	36fd5f6f49	fix: fail loudly when a fan-out 'items' expression does not resolve to a list (#2957 ) A non-list result from the items expression is a wiring error (the template did not resolve to a collection); silently fanning out over zero items hides it until a confusing downstream failure. Fail the step with an error naming the expression instead. An explicit empty list remains valid input. Fixes #2956	2026-06-16 15:33:11 -05:00
WOLIKIMCHENG	d8a81b23b5	test(workflows): cover executable override fallback preflight (#2843 )	2026-06-08 15:33:37 -05:00
lselvar	f512b8b0d1	fix: resolve GitHub release asset API URL for private repo preset and workflow downloads (#2855 ) * fix: resolve GitHub release asset API URL for private repo preset and workflow downloads - Add shared `resolve_github_release_asset_api_url` utility to `_github_http.py` for reuse across preset and workflow download paths - Apply the same private-repo fix from PR #2792 (extensions) to: - `PresetCatalog.download_pack` — ZIP downloads via catalog `download_url` - `preset add --from <url>` — ZIP downloads from a direct URL - `workflow add <url>` — workflow YAML downloads from a direct URL - `workflow add <id>` (catalog) — workflow YAML downloads via catalog `url` - For browser release URLs (`github.com/…/releases/download/…`), the asset is resolved via the GitHub REST API and downloaded with `Accept: application/octet-stream` - Direct REST API asset URLs (`api.github.com/…/releases/assets/<id>`) are downloaded directly with `Accept: application/octet-stream` - Auth is preserved end-to-end through the existing `open_url` infrastructure - Update `test_download_pack_sends_auth_header` and add `test_download_pack_accepts_direct_github_rest_asset_url` to cover both paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: URL-encode tag in release API URL to handle special characters Encode the tag as a path segment (using quote with safe='') when building the releases/tags/<tag> API URL. This prevents malformed URLs when tags contain reserved characters like '/' or '#'. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * test: add CLI-level tests for preset add --from GitHub release URL resolution Adds regression tests covering: - resolve_github_release_asset_api_url unit tests (passthrough, resolution, network error, URL encoding of special chars in tags) - CLI-level 'preset add --from <github-release-url>' end-to-end flow - CLI-level 'preset add --from <api-asset-url>' direct passthrough Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refactor: deduplicate release URL resolution; fix test issues - ExtensionCatalog._resolve_github_release_asset_api_url now delegates to the shared helper in _github_http.py (also gains URL-encoding fix) - Remove unused 'io' import from test_github_http.py - Remove duplicate 'provides' dict keys accidentally added to test_presets.py Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: align resolver timeout with download timeout; add workflow CLI tests - Pass timeout=30 to resolve_github_release_asset_api_url in both workflow add paths so worst-case latency matches the download timeout - Add CLI-level regression tests for 'workflow add <url>' covering browser URL resolution and direct API asset URL passthrough Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: remove unused urllib.request import; add catalog workflow test - Remove unused 'import urllib.request' in preset add --from path - Add CLI test for catalog-based 'workflow add <id>' with GitHub release URL resolution Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * style: remove unused MagicMock imports from tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Manfred Riem <mnriem@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-05 10:41:40 -05:00
Huy Do	005c80a9c7	fix(workflows): render gate show_file contents in the interactive prompt (#2810 ) * fix(workflows): render gate show_file contents in the interactive prompt The gate step read and recorded `show_file` but never displayed its contents at the interactive prompt, so the operator approved/rejected without seeing the referenced file. Render the file inside the prompt when stdin is a TTY, with a graceful notice for missing/unreadable files. Non-interactive PAUSED behaviour, exit codes, resume semantics, and no-`show_file` output are unchanged. Closes #2809. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(workflows): keep gate _prompt signature stable and harden show_file reads The gate prompt rendered show_file by passing it as a third positional argument to _prompt. A test that stubs _prompt with a two-argument lambda (test_gate_abort_still_halts_with_continue_on_error) then failed once the branch caught up to main, because the call site passed three arguments to the two-argument stub. Compose the show_file material into the displayed message in execute() and keep _prompt to its (message, options) contract. Display data no longer widens the interactive seam, so stubbing _prompt stays stable and future review material can be added without breaking callers. _prompt now renders a multi-line message inside the gate box. Also catch ValueError in _read_show_file so a path the OS rejects outright (e.g. an embedded NUL byte) degrades to a notice instead of crashing the prompt, matching the helper's stated contract. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(workflows): coerce gate prompt message to str before rendering The multi-line render loop split the message on newlines, which assumes a str. A non-string message (e.g. a YAML numeric literal) previously rendered fine through the old f-string and would now raise on .split. Coerce with str() to preserve that tolerance, and add a regression test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(workflows): make gate stdin handling robust; tidy compose_prompt typing Address review feedback on the gate tests and helper: - Swap the gate module's sys.stdin for a fixed-isatty stub (shared _StubStdin / _force_gate_stdin helpers) instead of setattr on sys.stdin.isatty, which is not assignable under some pytest capture modes. This also forces the non-interactive tests to a non-TTY so they cannot block on input() when run in a real terminal. - The non-interactive show_file test now hard-fails if _read_show_file is called, proving the file is not read on the PAUSED path. - _compose_prompt accepts a non-string message (e.g. a YAML numeric literal) and always returns str via str(message), keeping its annotation and docstring accurate; the redundant coercion in _prompt is removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(workflows): strip control chars from gate show_file; default tests non-TTY Address review feedback: - _read_show_file strips C0 control characters (except tab) from each line, so a show_file containing ANSI escape sequences (e.g. \x1b[2J) cannot clear the screen or spoof the prompt/options when rendered to a terminal. - Add an autouse fixture on TestGateStep that defaults every gate test to a non-TTY stdin, so no test can drop into the interactive prompt and block on input() when the suite runs under a real TTY. Interactive tests opt back in via _force_gate_stdin(tty=True); the now-redundant explicit non-TTY calls were removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(workflows): localize gate stdin patch to the gate module's sys _force_gate_stdin rebinds the gate module's `sys` name to a stand-in whose stdin has a fixed isatty() and which delegates every other attribute to the real sys, instead of mutating the process-wide sys.stdin. This keeps the patch local to the gate module and leaves real stdin untouched. The gate abort test, which used the same process-wide swap, now shares the helper, so the pattern exists in exactly one place. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(workflows): sanitize the displayed gate show_file path, not just content Control characters were stripped from show_file contents but the path was still printed verbatim as the header (`f"{show_file}:"`) and echoed in the read-error notice, so a show_file path containing ANSI escapes could still inject terminal sequences. Centralize stripping in `_sanitize_for_display` and apply it to every show_file-derived string that reaches the terminal — the displayed path, each file line, and the error notice — while still opening the file with the original path. Add a test for path sanitization. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(workflows): inline control-char stripping, drop the helper Reuse the existing _CONTROL_CHARS regex directly at the three display sites instead of wrapping it in a one-line helper. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(workflows): also strip LF and C1 controls from gate show_file display The control-char class skipped LF (so an embedded newline in a show_file path could break the boxed layout) and the C1 range (so \x9b CSI and other 8-bit controls survived). Widen the class to [\x00-\x08\x0a-\x1f\x7f-\x9f] (still keeping tab). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 08:04:52 -05:00
Samir Abed	34ce66139e	feat: add support for rovodev (#2539 ) * feat: add support for rovodev * fixup! feat: add support for rovodev * fixup! feat: add support for rovodev * fixup! feat: add support for rovodev * fixup! feat: add support for rovodev * fixup! feat: add support for rovodev * fixup! feat: add support for rovodev * fixup! feat: add support for rovodev * fixup! feat: add support for rovodev * fixup! feat: add support for rovodev * fixup! feat: add support for rovodev * fixup! feat: add support for rovodev	2026-06-04 11:34:05 -05:00
Huy Do	141119efea	feat(workflows): add JSON output for workflow run resume and status (#2814 ) * feat(workflows): add --json output to workflow run, resume, and status Adds an opt-in `--json` flag to `workflow run`, `workflow resume`, and `workflow status` that emits a single machine-readable object (run_id, workflow_id, status, current step; status also reports per-step states and a runs list) for automation and external orchestrators. JSON is written via a small `_emit_workflow_json` helper using plain stdout, so Rich markup, highlighting, and line-wrapping can never alter the emitted object. Default human-readable output and exit codes are unchanged when `--json` is omitted. Reference docs updated. Closes #2811. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(workflows): keep --json stdout clean while steps write output Suppressing the banner and the step-start callback was not enough to guarantee a single parseable JSON object on stdout: individual steps still write there while the engine runs. The gate step prints its prompt, and the prompt step runs a CLI subprocess that inherits the process's stdout file descriptor — either can corrupt the JSON stream for interactive runs or integration-backed workflows. Wrap engine.execute()/engine.resume() in a file-descriptor-level redirect (dup2) when --json is set, so both Python-level writes and inherited-fd subprocess output go to stderr while stdout carries only the emitted JSON. Step progress stays visible on stderr. status does not run the engine, so it is unaffected. Tests cover both pollution channels (a Python print and a real subprocess) via fd-level capture, and the inactive no-op path. Docs note the stdout/stderr split. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(workflows): fix stray escape sequence in --json redirect comments The redirect helper's docstring and its test comment wrote ``print``\s, which renders as "print\s" rather than "prints". Replace with plain "prints". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 11:11:39 -05:00
One-TheOnly	8e5643d4ff	fix(cursor-agent): enable headless CLI dispatch end-to-end (-p --trust --approve-mcps --force + Windows .cmd shim resolution) (#2631 ) * fix(cursor-agent): enable CLI dispatch via ``-p --trust`` headless mode Restores the ability for ``specify workflow run`` to dispatch the cursor-agent CLI, complementing the existing in-IDE skill flow. Without this fix, ``specify workflow run speckit --input integration=cursor-agent ...`` fails with a misleading ``CLI not found or not installed`` error even when the CLI is installed (since cursor-agent had ``requires_cli=False`` and an unset ``build_exec_args``). The cursor-agent CLI (>= 2026.05.16) supports headless execution via ``-p`` (print mode with full tool access including write/shell) and ``--trust`` (bypass Workspace Trust prompt). Without ``--trust`` the CLI exits non-zero in non-TTY contexts (verified locally). Changes to ``src/specify_cli/integrations/cursor_agent/__init__.py``: * ``config.requires_cli``: ``False`` -> ``True`` * ``config.install_url``: ``None`` -> Cursor CLI docs URL * Override ``build_exec_args()`` to emit ``[cursor-agent, -p, --trust, <prompt>, ...]`` with optional ``--model`` and ``--output-format json`` flags, mirroring the shape used by ``claude``/``codex``/``gemini``. Tests: * 34 existing cursor-agent tests still pass. * 6 new tests in ``TestCursorAgentCliDispatch`` pin ``requires_cli``, ``install_url``, and the exact argv shape (default, text-output, with-model, and the hyphenated skill invocation form ``/speckit-<name>``). * Full repo: 1085 / 1085 passed, no regressions. Fixes #2629 Co-authored-by: Cursor <cursoragent@cursor.com> * fix(integrations): resolve ``.cmd``/``.bat`` shims before subprocess.run On Windows, ``shutil.which`` honors ``PATHEXT`` and locates wrappers like ``cursor-agent.cmd`` and ``codex.cmd``, but Python's ``subprocess.run`` calls ``CreateProcess`` which does not consult ``PATHEXT`` and therefore fails with ``WinError 2`` on a bare argv like ``[cursor-agent, ...]``. Resolve ``exec_args[0]`` via ``shutil.which`` in ``IntegrationBase.dispatch_command`` so ``.cmd``/``.bat`` shims work transparently. On POSIX this is a no-op for absolute paths and a harmless lookup otherwise. Verified locally on Windows 10 + cursor-agent 2026.05.16: without this fix, ``specify workflow run speckit --input integration=cursor-agent`` fails with ``FileNotFoundError`` even after the cursor-agent integration starts producing valid exec args (per the prior commit on this branch). Tests: * New: 2 cursor-agent tests pin the shim-resolution + passthrough behavior (``test_dispatch_command_resolves_cmd_shim_for_subprocess`` and ``test_dispatch_command_passthrough_when_shutil_which_finds_nothing``). * Updated: ``tests/test_workflows.py::TestCommandStep::test_dispatch_with_mock_cli`` was mocking ``shutil.which`` only at the ``command`` step level and not at the ``base`` level, which made it environment-sensitive (fails locally when the real ``claude`` CLI is on PATH). Added the matching base-level patch and updated the argv-assertion to reflect the resolved path. ``test_dispatch_failure_returns_failed_status`` gets the same patch for consistency. * Full repo: 2867 passed, 0 regression from this PR. The 12 remaining pre-existing failures are unrelated Windows ``symlink`` privilege failures (``WinError 1314``) on a non-admin Windows runner. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(cursor-agent): inject --approve-mcps --force for headless MCP/tool access The previous commit (`1c55988`) wired up ``-p --trust`` so the CLI launches in headless mode without the Workspace Trust prompt, but that alone is not enough to let ``specify workflow run`` drive a real speckit feature end-to-end with cursor-agent on Windows. Two more flags are required: * ``--approve-mcps``: without it, every MCP server configured in ``.cursor/mcp.json`` stays ``not loaded (needs approval)``, and any tool call against them is silently dropped. We hit this immediately trying to read a DingTalk PRD from a remote MCP server during the ``/speckit-specify`` step. * ``--force``: without it, the agent halts on the first tool-call approval prompt (the tool call gets rejected and the workflow exits non-zero with a misleading message). With ``--force`` cursor-agent matches the implicit "trusted environment" semantics that ``claude -p`` and ``codex --exec`` already have by default -- which is the right semantics for an unattended ``specify workflow run`` invocation. Verified end-to-end on Windows 10 + cursor-agent 2026.05.16-0338208: * ``cursor-agent -p --trust --approve-mcps --force --output-format text`` + a ``/speckit-specify`` prompt that included a DingTalk URL produced a full spec.md (31.5 KB) plus checklists/requirements.md in ~10.7 min, reading the source PRD through the ``dingtalk-doc`` remote MCP server, deciding the ``specs/`` subpath itself, and updating ``.specify/feature.json`` and ``specs/menu-dictionary.md`` along the way -- no human-in-the-loop, no source PRD ever touched the filesystem. * Without ``--approve-mcps`` the same prompt errors with the tool call rejected message; without ``--force`` the agent stops at the first non-MCP tool call. Tests: * ``test_build_exec_args_`` updated to pin the new four-flag prefix. New ``test_build_exec_args_contains_mandatory_headless_flags`` asserts the four flags are always present together. * ``test_dispatch_command_resolves_cmd_shim_for_subprocess`` updated to match the new argv layout. * All 43 cursor-agent tests pass; no other tests touched. Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(cursor-agent): express dispatch support via build_exec_args() instead of requires_cli Co-authored-by: Cursor <cursoragent@cursor.com> * test(cursor-agent): use urlparse hostname check and cover dispatch without requires_cli Co-authored-by: Cursor <cursoragent@cursor.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: 刘一 <liuyi@oureman.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-06-04 09:48:33 -05:00
Quratulain-bilal	bb2b49d0ae	fix(workflows): validate run_id in RunState.load before touching the … (#2813 ) * fix(workflows): validate run_id in RunState.load before touching the filesystem ``RunState.load(run_id, project_root)`` interpolates ``run_id`` directly into ``project_root / ".specify" / "workflows" / "runs" / run_id`` and then calls ``state_path.exists()`` and ``json.load`` on the result. The run_id is reachable from user input via ``specify workflow resume <run_id>`` (CLI argument) and via ``SPECKIT_WORKFLOW_RUN_ID`` (env var override on the engine's run path), so a value like ``../escape`` turns ``runs_dir`` into ``.specify/workflows/escape/`` and: * ``state_path.exists()`` becomes a file-existence oracle for any path the process can read. * if a ``state.json`` exists at the traversed location (planted by a malicious dependency, a misconfigured shared workspace, or an older spec-kit version that happened to write there), ``json.load`` parses it and the workflow resumes under the attacker-chosen ``workflow_id`` / step state. * a subsequent ``state.save()`` then writes back to the traversed location, persisting the corruption. ``RunState.__init__`` already validates ``run_id`` against ``r'^[a-zA-Z0-9][a-zA-Z0-9_-]$'`` — but that check runs on ``state_data["run_id"]`` after* ``load`` has already done the file lookup, which is too late to prevent the disclosure. This change extracts the pattern into a class-level constant ``_RUN_ID_PATTERN`` and a single ``_validate_run_id`` classmethod so ``__init__`` and ``load`` cannot drift, then calls the validator at the top of ``load`` before any path is built. Mirrors the precedent in ``src/specify_cli/agents.py::_ensure_within_directory`` (used at line 437 of that file) which guards extension-install paths against the same threat model. Regression tests parametrize 9 traversal vectors (``../escape``, ``..``, ``../../etc/passwd``, ``foo/bar``, ``foo\bar``, ``.hidden``, ``-flag``, ``foo\x00bar``, empty) and plant a malicious ``state.json`` outside ``runs/`` so a missing guard would surface as a successful load rather than the ambiguous ``FileNotFoundError``. A second test asserts ``__init__`` and ``load`` reject the same representative malformed ID, so future changes to one path can't silently drift from the other. * test(workflows): exercise RunState.load in shared-validation test, fix __init__ empty-string asymmetry Copilot's review on this PR pointed out that test_init_and_load_share_validation claimed to verify both entry points share the same validation rules but never actually called RunState.load — only __init__ and the shared _validate_run_id helper. A regression in load (e.g. someone deleting the cls._validate_run_id(run_id) call before the path is built) would slip through even though __init__ and the helper stayed aligned, defeating the whole point of the test. Tightening the test surfaced a real asymmetry the previous version was silently masking: self.run_id = run_id or str(uuid.uuid4())[:8] The truthiness fallback meant RunState(run_id="") silently substituted a UUID and skipped validation, while RunState.load("", project_root) correctly rejected the empty string. The two entry points diverged on the empty-string vector. That is exactly the drift the test name claimed to defend against — and the original test missed it. Changes ------- * engine.py: __init__ now distinguishes run_id is None (caller omitted it → auto-generate UUID) from an empty string (caller provided it → must validate like any other value). Both paths still flow through _validate_run_id, but only the explicit-None case auto-generates. * test_workflows.py: test_init_and_load_share_validation is now parametrized over one representative vector per category from test_load_rejects_path_traversal (parent traversal, embedded separator, leading non-alphanumeric, empty string) and asserts that all three entry points — __init__, _validate_run_id, and load — reject the same input. Adding load to the assertion is the substantive fix Copilot asked for; keeping __init__ and the helper alongside it makes any future drift between the three immediately observable instead of having to read three separate tests. Verification ------------ pytest tests/test_workflows.py — 168 passed (was 165 before the parametrize expansion; __init__ empty-string vector would have failed the new test against the old engine code, confirming the asymmetry was real).	2026-06-03 14:26:07 -05:00
Huy Do	1732b9b62e	feat(workflows): allow resume to accept updated workflow inputs (#2815 ) `workflow resume` now accepts `--input key=value` (the same flag and parsing as `workflow run`, via a shared `_parse_input_values` helper). Supplied values are merged over the run's persisted inputs and re-resolved through the existing typed-validation path (`_resolve_inputs`), so a resumed/re-run step sees the updated inputs and ill-typed values fail fast. Keys not supplied keep their persisted values; resuming without `--input` is unchanged. Reference docs updated. Distinct from #2405 (file-reference inputs at run time): this is about supplying inputs at resume time, reusing the existing input model. Closes #2812. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 12:04:07 -05:00
Huy Do	7bab0568c5	feat(workflows): add continue_on_error step field for non-halting failures (#2663 ) * feat(workflows): add continue_on_error step field Adds an optional `continue_on_error: bool` field on every step. When set to `true` and the step fails, the engine records the result (`exit_code`, `stderr` on `steps.<id>.output` plus `status` as a sibling key on `steps.<id>`) and continues to the next sibling step instead of halting the run. Downstream `if`, `switch`, or `gate` steps can then branch on `{{ steps.<id>.output.exit_code }}` to route the recovery path. Engine details -------------- `WorkflowEngine._execute_steps` now consults the step config when a step returns `StepStatus.FAILED`: - Gate aborts (`output.aborted`) always halt the run — operator decisions take precedence over the flag. - Otherwise, if `continue_on_error` is the literal `True`, log a `step_continue_on_error` event and proceed to the next sibling. The runtime check uses identity comparison (`is True`) rather than truthiness, so truthy non-bool values like the string `"true"` cannot silently change run semantics even if a caller bypasses `validate_workflow()`. - Otherwise, behave as before: log `step_failed`, set `RunStatus.FAILED`, and return. Validation ---------- `_validate_steps` rejects non-bool values for `continue_on_error`. Coerced strings like `"true"` are not accepted so authoring mistakes surface at validation time rather than silently changing run semantics. Tests ----- `TestContinueOnError` in `tests/test_workflows.py` (8 tests): - `test_undeclared_failure_halts_run` — default halt behaviour. - `test_declared_and_fired_continues_run` — flag + fail → continue. - `test_declared_but_step_succeeded_is_noop` — flag + success → no-op. - `test_if_branch_routes_around_failure` — end-to-end recovery. - `test_gate_abort_still_halts_with_continue_on_error` — abort always halts. - `test_validation_rejects_non_bool_continue_on_error` — `"true"` rejected at validation. - `test_validation_accepts_bool_continue_on_error` — `true`/`false` pass cleanly. - `test_engine_ignores_truthy_non_bool_continue_on_error` — defense-in-depth: engine ignores string `"true"` even when validation is bypassed. Rebased onto current upstream/main (post #2664 merge); the new `TestContinueOnError` class sits immediately after upstream's `TestContextRunId` so the two feature suites coexist cleanly. Closes #2591. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(workflows): restore runtime context section, clarify gate prompt Two Copilot findings on `d0b9e00`: 1. The `### Runtime Context` documentation for `{{ context.* }}` was lost during the rebase onto current main (the squash dropped the anchor where #2664 had added it). Restored under `## Expressions` so users can find `context.run_id` semantics and examples. 2. The continue_on_error example gate had message "Retry or skip?" but used the default `options: [approve, reject]` with `on_reject: skip`, which implied an automatic retry path that gates do not provide. Reworded the message to match the actual approve/reject semantics and added an explicit note that retry requires either custom gate options + downstream branching or a wrapper loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(workflows): clarify continue_on_error scope — returned FAILED only Copilot finding on `d0b9e00`: The README's "Error Handling" intro implied `continue_on_error` covers "any other runtime error raised during step execution", but the engine only consults the flag when a step returns `StepResult(status=FAILED, ...)`. Exceptions raised out of `step_impl.execute()` propagate to `WorkflowEngine.execute()`, where the catch-all logs `workflow_failed` and re-raises — the step result is never recorded, and the flag is never consulted. Audited the whole PR diff for the same overclaim: 1. workflows/README.md — main fix. Reworded the Error Handling intro to "any step that returns StepResult(status=FAILED, ...)" and promoted the parenthetical structural-validation note into the Notes block. Added a new "Scope: returned failures only" note that names the exception path explicitly and tells step authors how to bring the flag into scope for exceptional code (catch internally and return FAILED with the failure encoded in `output`). 2. tests/test_workflows.py — section comment used "when an executable step fails", same ambiguity. Tightened to "when a step returns StepResult(status=FAILED, ...)" and added a sentence calling out that unhandled exceptions are out of scope. 3. src/specify_cli/workflows/engine.py — already correct ("any step that returns FAILED" in the validator comment; "lets the pipeline route around the failure" in the execute path). No change. Engine semantics and test bodies are unchanged. Docs-only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(workflows): clarify on_reject:skip semantics — engine returns COMPLETED, not auto-skip Copilot finding on `b8982a7`: The README example's gate message said "reject to skip the rest of this branch", and the explanatory paragraph claimed [approve, reject] map to "continue" vs "skip the rest of this branch". The engine does not implement automatic branch-skipping. `on_reject: skip` returns `StepStatus.COMPLETED` (gate/__init__.py:65-66); the next sibling step runs unconditionally unless the author wires a downstream `if` reading `{{ steps.<gate-id>.output.choice }}`. Two fixes: 1. Restructured the YAML example so it actually demonstrates the manual-branching pattern: added a `recover` if-step after the gate that conditions on `steps.review.output.choice == 'approve'`. Now the example shows the real workflow author's responsibility instead of implying the engine does it. 2. Replaced the trailing paragraph with three precise notes: - both gate options return COMPLETED; `on_reject: skip` controls abort behaviour only, not sibling-skipping - all three `on_reject` values enumerated with their actual engine semantics (FAILED+aborted / COMPLETED / PAUSED) - the original retry-loop guidance retained as the third bullet Updated the gate message in the example to match — "reject to leave the failure recorded and move on" instead of "reject to skip the rest of this branch". Audited the whole PR diff for the same overclaim: no other instance. Engine semantics, validation, and test bodies are unchanged. Docs-only. 161/161 tests/test_workflows.py pass locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(workflows): clarify gate's role — surfaces, doesn't programmatically branch Audit follow-up to `393ac6b` — three sites repeated the same minor overclaim about gates being one of the "branch on it" step types alongside `if` and `switch`: 1. workflows/README.md (the "downstream `if`, `switch`, or `gate` steps can branch on it" sentence introducing the example) 2. engine.py:236 (validator inline comment) 3. engine.py:657 (execute-path inline comment) A `gate` step does not have a `condition` or `expression` field — it only evaluates expressions for `message` and `show_file` (gate/__init__.py:29,36). Programmatic branching happens in `if`/`switch`; a gate surfaces the value to a human operator via message interpolation, and the operator's choice is recorded in `output.choice` for a subsequent `if`/`switch` to route on. Reworded all three sites consistently: "a downstream `if` or `switch` can branch on it (or a `gate` can surface it to the operator via message interpolation)". The README example already demonstrates this distinction — the gate carries `{{ }}` template variables in its message and the `recover` if-step downstream is what actually branches on the choice. Engine semantics, validation, and test bodies are unchanged. Docs-only on the README; comment-only on engine.py. 161/161 tests/test_workflows.py pass locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(workflows): use qualified StepStatus.* instead of bare FAILED/COMPLETED/PAUSED Three Copilot inline comments on workflows/README.md lines 226, 282, 288 flagged that ``StepResult(status=FAILED, ...)`` is not valid Python — ``StepResult.status`` is a ``StepStatus`` enum value, so the documented form should be ``StepStatus.FAILED``. Audited the whole PR diff for the same shorthand. The bare unqualified form appears in three files added/modified by this PR: 1. workflows/README.md (6 sites) — three ``StepResult(status=FAILED, ...)`` parentheticals, plus the on_reject Notes bullet listing the three step statuses (``FAILED``, ``COMPLETED``, ``PAUSED``). 2. tests/test_workflows.py (4 sites) — section header for TestContinueOnError, two test-method docstrings, one inline comment about a gate's TTY-fallback behaviour. 3. src/specify_cli/workflows/engine.py (1 site) — the validator inline comment added in `d0b9e00` said "returns FAILED" where the engine code itself uses ``StepStatus.FAILED``. All 11 sites normalised to the qualified ``StepStatus.<name>`` form so the docs / test docstrings / inline comments match what readers will actually find in the engine code and the tests. Engine semantics, validation, and test bodies are unchanged. 161/161 tests/test_workflows.py pass locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-02 10:10:07 -05:00
Pedro Barbosa	44aac9f6e4	feat: add native Cline integration (#2508 ) * test: strip ansi to make asserts work * feat: add native Cline integration	2026-06-01 11:20:48 -05:00
Copilot	cc3d828227	Add support for SPECKIT_WORKFLOW_RUN_ID override (#2742 ) * Initial plan * feat: support SPECKIT_WORKFLOW_RUN_ID override * docs: clarify run_id env var precedence wording --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-05-29 10:50:00 -05:00
Huy Do	c6afe4cde1	feat(workflows): expose {{ context.run_id }} template variable (#2664 ) * feat(workflows): expose `{{ context.run_id }}` template variable Closes #2590. Surfaces the engine-assigned run id (the same 8-character hex string Spec Kit prints as `Run ID:` at the end of `workflow run`) as a workflow template variable so YAML authors can reference it from shell `run:`, command `input.args:`, switch `expression:`, and any other field that already evaluates `{{ ... }}` templates. ### Why The run id is the natural join key between a Spec Kit workflow run and downstream artifacts, telemetry, or per-run scratch state. Today the operator sees it in stdout but workflows themselves cannot reference it — there was no way to stamp a log line, name a scratch directory, or tag an artifact with the same id Spec Kit assigned. The three motivating use cases from the issue: 1. Telemetry / observability — stamp logs and events with the run id so external systems can join workflow runs to downstream artifacts. 2. Per-run scratch / isolation — interactive operator commands that need their own state directory under `/tmp/run-<id>/`. 3. Run-id in artifact metadata — stable join key from artifact back to the producing run. ### Implementation `StepContext.run_id` is already populated by `WorkflowEngine` in both `execute()` and `resume()`. The only gap was the template namespace builder. `_build_namespace` (in `workflows/expressions.py`) now adds a `context` key alongside the existing `inputs`, `steps`, `item`, and `fan_in` namespaces: ```python ns["context"] = {"run_id": run_id} ``` The value is always present (even outside a run) and falls back to an empty string when no run is active. Workflows referencing `{{ context.run_id }}` therefore never error — a hard requirement from the issue's acceptance criteria for dry-run, validation, and ad-hoc evaluator usage. ### Default behaviour preserved Workflows that do not reference `{{ context.run_id }}` are byte-equivalent to before this change. The `context` namespace is added unconditionally to keep template resolution branch-free, but its presence has no observable effect when nothing references it. ### Tests `TestExpressions` (unit-level) gains three tests: - `test_context_run_id_resolves` — direct lookup against a `StepContext(run_id=...)`. - `test_context_run_id_defaults_to_empty_when_unset` — graceful default outside a run context. - `test_context_run_id_string_interpolation` — mixed template (e.g. `"RUN_ID={{ context.run_id }}"`). `TestContextRunId` (end-to-end) covers the three step types the acceptance criteria called out: - `test_shell_run_resolves_run_id` — `run:` field substitution, verified via captured stdout. - `test_command_input_args_resolves_run_id` — `input.args:` resolution, captured in step output even when CLI dispatch is unavailable (the artifact-metadata use case). - `test_switch_expression_matches_on_run_id` — switch matches against the resolved value, proving the run id is a first-class value in the expression engine, not just an interpolation token. - `test_workflow_without_context_reference_unchanged` — locks the byte-equivalent default required by the issue. ### Docs `workflows/README.md` gains a "Runtime Context" subsection under "Expressions" documenting the new namespace and the three canonical use patterns (telemetry, per-run scratch, artifact metadata). * test(workflows): drop inline double-quotes in run_id shell tests `test_shell_run_resolves_run_id` and `test_switch_expression_matches_on_run_id` used `run: 'echo "RUN_ID={{ context.run_id }}"'` with inner double-quotes around the echo argument. Bash/sh strips those quotes before invoking echo, but cmd.exe (used on Windows when `shell=True`) treats them as literal characters and emits `"RUN_ID=abc12345"` — failing the exact-match assertion. Linux passed; all three Windows-latest matrix entries failed with `assert '"RUN_ID=abc12345"' == 'RUN_ID=abc12345'`. Resolve by dropping the inner double-quotes (the value has no spaces or shell metacharacters) and wrapping the YAML scalar in plain double-quotes the same way other shell-step tests in this file do (e.g. `run: "echo b-saw-..."`). Behaviour-equivalent on POSIX, portable to cmd.exe.	2026-05-27 13:00:58 -05:00
LahkLeKey	0ae451f697	Add util for windows sub-process (#2598 ) * Add util for windows sub-process * Use platform-aware Copilot executable in subprocess calls * Update test_workflows.py	2026-05-26 08:25:15 -05:00
Manfred Riem	616eba6a57	fix: while/do-while loop condition reads stale iteration-0 step output (#2662 ) * fix: while/do-while loop condition reads stale iteration-0 step output After executing namespaced loop body steps, copy each iteration's results back to the original unprefixed step key so that evaluate_condition() sees the latest values instead of stale iteration-0 data. Fixes #2592 * address review: cross-platform tests, preserve iteration-0 history - Rewrite shell scripts in tests to use Python via script files instead of POSIX syntax, so they pass on Windows CI. - Snapshot iteration-0 nested-step results under a namespaced key (parent:child:0) before the first copy-back overwrite, preserving complete per-iteration history for debugging. * address review: skip copy-back on paused/failed iterations Move the status check before the copy-back so that partial results from paused or failed nested steps (e.g., a gate awaiting input) do not overwrite the unprefixed key. This preserves correct resume behavior. * address review: quote paths in test shell commands Quote both the Python executable and script file paths in the run: commands to handle spaces in paths on Windows. * address review: execute loop body with original IDs Instead of namespacing step IDs for execution and copying results back, execute the loop body with original (unprefixed) step IDs so results naturally land at the right keys. Snapshot previous iteration results to namespaced keys (parent:child:N) for history only. This fixes multi-step loop bodies where step B references step A's output within the same iteration — previously step B would see stale data until the copy-back ran after the entire iteration. * address review: namespaced execution with per-step copy-back Revert to namespaced step IDs for execution (preserving unique log entries and state keys per iteration) but copy each step's result back to the unprefixed key immediately after it completes. This preserves backward compatibility (same namespaced key format, same log IDs) while fixing both the condition evaluation bug and inter-step references within multi-step loop bodies. * address review: alias after status check, add multi-step body test - Move per-step aliasing below the PAUSED/FAILED/ABORTED status check so partial results from incomplete steps are not aliased back to the unprefixed key. - Add test_while_loop_multi_step_body_inter_step_refs to exercise a multi-step loop body where step B reads step A's output within the same iteration, verifying per-step aliasing works correctly. Addresses feedback from @doquanghuy (items 2 & 4) and Copilot review on commit `9d0a222`. * address review: stable fallback IDs, expression-based inter-step test - Use enumerate() for stable fallback IDs when loop body steps lack an explicit id (step-0, step-1, etc. instead of always step-0). - Rewrite multi-step body test so step B uses expression substitution ({{ steps.step-a.output.stdout }}) instead of reading the counter file directly, making it a true regression test for per-step aliasing.	2026-05-21 12:25:03 -05:00
darion-yaphet	3d50f85875	fix(cli): clarify exception diagnostics (#2602 ) Consolidate the CLI diagnostic plan, implementation, and test hardening into one reviewable change. The CLI now reports phase and target context for broad failure paths while preserving existing fail-fast behavior for real setup failures and warning-only behavior for optional best-effort work. The workflow unit tests also avoid discovering real local agent CLIs, so developer machines with tools such as gemini installed do not hang pytest during metadata-only assertions. Constraint: CLI setup failures must remain fail-fast, while optional preset and cleanup paths should continue with clear warnings. Rejected: Replace broad handlers across the whole codebase in one pass \| too broad for a targeted CLI diagnostic fix Rejected: Add runtime timeouts to workflow agent dispatch \| dispatch may legitimately be long-running and the observed hang was test isolation Confidence: high Scope-risk: moderate Directive: Keep future best-effort CLI warnings tied to the failed phase and target so users can diagnose setup state. Tested: uvx ruff check src/; uv run pytest tests/integrations/test_cli.py -v; uv run pytest tests/test_workflows.py::TestCommandStep::test_step_override_integration tests/test_workflows.py::TestPromptStep::test_execute_with_step_integration tests/test_workflows.py::TestPromptStep::test_execute_with_model -vv; uv run pytest Not-tested: Real Nacos/PG/Redis-style external service failure injection; real interactive workflow dispatch against installed gemini CLI	2026-05-20 21:19:48 -05:00
Quratulain-bilal	409ec59704	fix(workflow): support integration: auto to follow project's initialized AI (#2421 ) * fix(workflow): support integration: auto to follow project's initialized AI Closes #2406 (squashed) * fix(workflow): combine JSONDecodeError and UnicodeDecodeError handling Address Copilot feedback: UnicodeDecodeError can be raised by both read_text() and json.loads(), so combining the handlers ensures both cases produce a consistent, clear error message. * fix(workflows): honor integration_state schema guard and modern state in 'integration: auto' Three Copilot follow-ups on PR #2421: 1. engine.py:799 — `_load_project_integration` was bypassing the same schema guard `_read_integration_json` enforces. It now reads the schema field directly, returns None on a future schema (so the workflow falls back to the literal 'auto' default rather than guessing), and routes through `normalize_integration_state` / `default_integration_key` so modern installs that record `default_integration` / `installed_integrations` (without the legacy top-level `integration` field) resolve correctly. 2. test_workflows.py — added two regression cases: - `integration: auto` resolves a modern normalized state file - `integration: auto` falls back when the state file declares a newer `integration_state_schema` than this CLI supports 3. test_cli.py — added a CLI-level regression for the `UnicodeDecodeError` branch in `_read_integration_json` to match the existing malformed-JSON coverage. * refactor(integration): extract shared try_read_integration_json helper Address Copilot review on PR #2421: Both `_read_integration_json` (CLI) and `_load_project_integration` (workflow engine) were parsing `.specify/integration.json` independently, duplicating the schema guard and risking drift between the two readers. Extract the parse + schema validation into a single low-level helper `try_read_integration_json` in `integration_state.py` that returns either the normalized state or a structured `IntegrationReadError`. Both callers now delegate to this helper: - CLI keeps its loud-fail UX: each error kind ("decode", "os", "not_object", "schema_too_new") is translated into the existing console message + typer.Exit(1). - Engine keeps its silent fallback: any error simply returns None so `integration: auto` falls back to the workflow's literal default. This eliminates the divergence Copilot flagged without changing observable behavior for either caller. * fix(integration): distinguish missing file from non-regular path Address Copilot review on PR #2421: `try_read_integration_json` was collapsing two distinct cases into a single `(None, None)` return: 1. `.specify/integration.json` truly missing — silent fallback is correct. 2. Path exists but is a directory, socket, or other non-regular file — this is a misconfiguration the CLI should surface loudly. Split the check: `exists()` falsey returns `(None, None)`; existing-but- not-a-regular-file returns `(None, IntegrationReadError(kind="os", ...))` so the CLI's loud-fail path produces an actionable error while the engine still treats it as a fallback to the workflow's literal default. * docs(workflow): clarify version pin, advisory integrations list, enum exemption - workflow.yml: fix comment that said 0.8.3 was first release with auto resolution; the pin is >=0.8.5 so the comment now matches the pin. - workflow.yml: clarify that requires.integrations.any is an advisory, non-exhaustive compatibility hint, not a closed set. - engine.py: clarify that the auto-sentinel exemption only skips enum membership; declared type is still enforced through _coerce_input. * fix(workflow): resolve auto sentinel for provided values; report stat errors Two Copilot findings fixed: 1. _resolve_inputs only resolved the ``integration: auto`` sentinel when it came from the input default. A caller explicitly providing ``{"integration": "auto"}`` (which the workflow prompt advertises as a valid value) bypassed _resolve_default and the literal "auto" leaked to dispatch. Provided values now go through the same resolution path as defaults, and the enum-membership exemption applies in both cases. Regression test added. 2. try_read_integration_json used Path.exists() / Path.is_file() as a pre-check. Both return False on some OSErrors (e.g. permission errors during stat), which silently treated an unreadable-but-present file as missing — the engine fell back without warning and the CLI failed to surface the loud error. The pre-check is gone: read_text() is attempted directly, FileNotFoundError means missing (silent fallback), IsADirectoryError and other OSErrors become loud IntegrationReadError. * fix(workflow): enforce declared type for string inputs, reject bool-as-number Two Copilot findings fixed: 1. _coerce_input previously coerced/validated only ``number`` and ``boolean`` types, so ``type: string`` silently accepted any Python value (numbers, lists, dicts). A YAML authoring mistake like ``type: string`` + ``default: 5`` slipped through. Strings are now required to actually be strings; non-strings raise ValueError, which surfaces as an ``invalid default`` error from validate_workflow. 2. ``type: number`` accepted ``default: true`` because ``bool`` is a subclass of ``int`` (``float(True) == 1.0``). Bools are now rejected explicitly in the number path so the YAML mistake fails fast. The boolean path is also tightened to reject non-bool / non-string values for symmetry. Comment on the auto-sentinel enum exemption updated to reflect the stronger guarantee. Regression tests added for both rejections. * fix(cli): drop unused normalize_integration_state import to satisfy ruff CI's `uvx ruff check src/` flagged this as F401: the symbol was imported under a private alias but never referenced. Tests stay green after removal.	2026-05-15 16:03:33 -05:00
Manfred Riem	370b5b4890	fix(copilot): use --yolo to grant all permissions in non-interactive mode (#2298 ) * fix(copilot): use --yolo to grant all permissions in non-interactive mode The Copilot CLI's --allow-all-tools flag only covers tool execution permissions but does not grant path or URL access. When the Copilot agent autonomously runs shell commands (e.g. npm run build) during workflow execution, the CLI blocks path access and cannot prompt for approval in non-interactive mode, producing: Permission denied and could not request permission from user Replace --allow-all-tools with --yolo (equivalent to --allow-all-tools --allow-all-paths --allow-all-urls) to grant all three permission types. Rename the opt-out env var from SPECKIT_ALLOW_ALL_TOOLS to SPECKIT_COPILOT_ALLOW_ALL to match the formal --allow-all alias and scope it to the Copilot integration. Fixes #2294 * review: deprecate SPECKIT_ALLOW_ALL_TOOLS, rename to SPECKIT_COPILOT_ALLOW_ALL_TOOLS Address Copilot review feedback: - Honour the old SPECKIT_ALLOW_ALL_TOOLS env var as a fallback with a DeprecationWarning so existing opt-outs are not silently ignored. - Rename the new canonical env var to SPECKIT_COPILOT_ALLOW_ALL_TOOLS. - New var takes precedence when both are set. - Use monkeypatch in tests to avoid flakiness from ambient env vars. - Add tests for deprecation warning, precedence, and opt-out paths. * review: use UserWarning instead of DeprecationWarning DeprecationWarning is suppressed by default in Python, so users relying on the old SPECKIT_ALLOW_ALL_TOOLS env var would never see the deprecation notice during normal CLI runs. Switch to UserWarning which is always shown. Update test to also assert the warning category.	2026-04-21 11:41:52 -05:00
Ayesha Khalid	b4c4e86cbc	fix(integrations): strip UTF-8 BOM when reading agent context files (#2283 ) * fix(integrations): strip UTF-8 BOM when reading agent context files * test(integrations): add BOM regression tests for context file read/write * test(workflows): mock shutil.which in tests that assume CLI is absent * test(integrations): remove unused manifest variable in BOM test	2026-04-20 16:00:20 -05:00
Manfred Riem	02a1d610df	docs: add workflows reference, reorganize into docs/reference/, and add --version flag (#2244 ) * docs: add workflows reference, reorganize into docs/reference/, and add --version flag - Move integrations.md, extensions.md, presets.md into docs/reference/ - New docs/reference/workflows.md: command reference for all workflow commands, built-in SDD Cycle workflow with Mermaid diagram, step types, expressions, input types, state/resume, and FAQ - Rename workflow input feature_name to spec with prompt 'Describe what you want to build' to match speckit.specify command terminology - Add --version / -V flag to root specify command with tests - Update docs/toc.yml, README.md links, and docs/upgrade.md cross-reference to use reference/ paths - Add workflow command to README CLI reference table * docs: update speckit_version requirement to >=0.7.2 in workflow example	2026-04-16 13:34:08 -05:00
Copilot	a00e679918	Add workflow engine with catalog system (#2158 ) * Initial plan * Add workflow engine with step registry, expression engine, catalog system, and CLI commands Agent-Logs-Url: https://github.com/github/spec-kit/sessions/72a7bb5d-071f-4d67-a507-7e1abae2384d Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Add comprehensive tests for workflow engine (94 tests) Agent-Logs-Url: https://github.com/github/spec-kit/sessions/72a7bb5d-071f-4d67-a507-7e1abae2384d Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Address review feedback: do-while condition preservation and URL scheme validation Agent-Logs-Url: https://github.com/github/spec-kit/sessions/72a7bb5d-071f-4d67-a507-7e1abae2384d Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Address review feedback, add CLI dispatch, interactive gates, and docs Review comments (7/7): - Add explanatory comment to empty except block - Implement workflow catalog download with cleanup on failure - Add input type coercion for number/boolean/enum - Fix example workflow to remove non-existent output references - Fix while_loop and if_then condition defaults (string 'false' → bool False) - Fix resume step index tracking with step_offset parameter CLI dispatch: - Add build_exec_args() and dispatch_command() to IntegrationBase - Override for Claude (skills: /speckit-specify), Gemini (-m flag), Codex (codex exec), Copilot (--agent speckit.specify) - CommandStep invokes installed commands by name via integration CLI - Add PromptStep for arbitrary inline prompts (10th step type) - Stream CLI output live to terminal (no silent blocking) - Remove timeout when streaming (user can Ctrl+C) - Ctrl+C saves state as PAUSED for clean resume Interactive gates: - Gate steps prompt [1] approve [2] reject in TTY - Fall back to PAUSED in non-interactive environments - Resume re-executes the gate for interactive prompting Documentation: - workflows/README.md — user guide - workflows/ARCHITECTURE.md — internals with Mermaid diagrams - workflows/PUBLISHING.md — catalog submission guide Tests: 94 → 122 workflow tests, 1362 total (all passing) * Fix ruff lint errors: unused imports, f-string placeholders, undefined name * Address second review: registry-backed validation, shell failures, loop/fan-out execution, URL validation - VALID_STEP_TYPES now queries STEP_REGISTRY dynamically - Shell step returns FAILED on non-zero exit code - Persist workflow YAML in run directory for reliable resume - Resume loads from run copy, falls back to installed workflow - Engine iterates while/do-while loops up to max_iterations - Engine expands fan-out per item with context.item - HTTPS URL validation for catalog workflow installs (HTTP allowed for localhost) - Fix catalog merge priority docstring (lower number wins) - Fix dispatch_command docstring (no build_exec_args_for_command) - Gate on_reject=retry pauses for re-prompt on resume - Update docs to 10 step types, add prompt step to tables and README * Potential fix for pull request finding 'Empty except' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> * Address third review: fan-out IDs, catalog guards, shell coercion, docs - Fan-out generates unique per-item step IDs and collects results - Catalog merge skips non-dict workflow entries (malformed data guard) - Shell step coerces run_cmd to str after expression evaluation - urlopen timeout=30 for catalog workflow installs - yaml.dump with sort_keys=False, allow_unicode=True for catalog configs - Document streaming timeout as intentionally unbounded (user Ctrl+C) - Document --allow-all-tools as required for non-interactive + future enhancement - Update test docstring and PUBLISHING.md to 10 step types with prompt * Validate final URL after redirects in catalog fetch urlopen follows redirects, so validate the response URL against the same HTTPS/localhost rules to prevent redirect-based downgrade attacks. * Address fourth review: filter arg eval, tags normalization, install redirect check - Filter arguments now evaluated via _evaluate_simple_expression() so default(42) returns int not string - Tags normalized: non-list/non-string values handled gracefully - Install URL redirect validation (same as catalog fetch) - Remove unused 'skipped' variable in catalog config parsing - Author 'github' → 'GitHub' in example workflow - Document nested step resume limitation (re-runs parent step) * Add explanatory comment to empty except ValueError block * Address fifth review: expression parsing, fan-out output, URL install, gate options - Move string literal parsing before operator detection in expressions so quoted strings with operators (e.g. 'a in b') are not mis-parsed - Fan-out: remove max_concurrency from persisted output, fix docstring to reflect sequential execution - workflow add: support URL sources with HTTPS/redirect validation, validate workflow ID is non-empty before writing files - Deduplicate local install logic via _validate_and_install_local() - Remove 'edit' gate option from speckit workflow (not implemented) * Add comments to empty except ValueError blocks in URL install * Address sixth review: operator precedence, fan_in cleanup, registry resilience, docs - Fix or/and operator precedence (or parsed first = lower precedence) - Restore context.fan_in after fan-in step completes - Catch JSONDecodeError in registry load for corrupted files - Replace print() with on_step_start callback (library-safe) - Gate validation warns when on_reject set but no reject option - Shell step: document shell=True security tradeoff - README: sdd-pipeline → speckit, parallel → sequential for fan-out - ARCHITECTURE.md: parallel → fan-out/fan-in in diagram * Address seventh review: string literal before pipe, type annotations, validate on install - Move string literal check above pipe filter parsing so 'a \| b' works - Fix type annotations: input_values list[str] \| None, run_id str \| None - Run validate_workflow() before installing from local path/URL - Remove duplicate string literal check from expression parser * Address eighth review: fan-out namespaced IDs, early return, catalog validation - Fan-out per-item step IDs use _fanout_{step_id}_{base}_{idx} namespace to avoid collisions with user-defined step IDs - Early return after fan-out loop when state is paused/failed/aborted - Catalog installs parse + validate downloaded YAML before registering, using definition metadata instead of catalog entry for registry * Address ninth review: populate catalog, fix indentation, priority, README - Add speckit workflow entry to catalog.json so it's discoverable - Fix shell step output dict indentation - Catalog add_catalog priority derived from max existing + 1 - README Quick Start clarified with install + local file examples * Address tenth review: max_iterations validation, catalog config guard, version alignment - Validate max_iterations is int >= 1 in while and do-while steps - Guard add_catalog against corrupted config (non-dict/non-list) - Align speckit_version requirement to >=0.6.1 (current package version) - Fan-out template validation uses separate seen_ids set to avoid false duplication errors with user-defined step IDs * Address eleventh review: command step fails without CLI, ID mismatch warning, state persistence - Command step returns FAILED when CLI not installed (was silent COMPLETED) - Catalog install warns on workflow ID vs catalog key mismatch - Engine persists state.save() before returning on unknown step type - Update tests to expect FAILED for command steps without CLI - Integration tests use shell steps for CLI-independent execution * Address twelfth review: type annotations, version examples, streaming docs, requires - Fix workflow_search type annotations (str \| None) - PUBLISHING.md: speckit_version >=0.15.0 → >=0.6.1 - Document that exit_code is captured and referenceable by later steps - Mark requires as declared-but-not-enforced (planned enhancement) - Note full stdout/stderr capture as planned enhancement * Enforce catalog key matches workflow ID (fail instead of warn) * Bundle speckit workflow: auto-install during specify init - Add workflows/speckit to pyproject.toml force-include for wheel builds - Add _locate_bundled_workflow() helper (mirrors _locate_bundled_extension) - Auto-install speckit workflow during specify init (after git extension) - Update all integration file inventory tests to expect workflow files * Address fourteenth review: prompt fails without CLI, resolved step data, fan-out normalization - PromptStep returns FAILED when CLI not installed (was silent COMPLETED) - Engine step_data prefers resolved values from step output - Fan-out normalizes output.results=[] for empty item lists - subprocess.run inherits stdout/stderr (no explicit sys.stdout) - Registry tests use issubset for extensibility * Address fifteenth review: fan_in docstring, gate defaults, validation guards, reserved prefix - FanInStep docstring: aggregate-only, no blocking semantics - FanInStep: guard output_config as dict, handle None - Gate validate: use same default options as execute - Validate inputs is dict and steps is list before iterating - Reserve _fanout_ prefix in step ID validation - PUBLISHING.md: remove unenforced checklist items, add _fanout_ note * Address sixteenth review: docs regex, fan_in try/finally, hyphenated dot-path keys - PUBLISHING.md: update ID regex docs to match implementation (single-char OK) - FanInStep: wrap expression evaluation in try/finally for context.fan_in - Expression dot-path: allow hyphens in keys before list index (e.g. run-tests[0]) * Make speckit workflow integration-agnostic, document Copilot CLI requirement - Workflow integration selectable via input (default: claude) - Each command step uses {{ inputs.integration }} instead of hardcoded copilot - Copilot docstring documents CLI requirement for workflow dispatch - Added install_url for Copilot CLI docs * Address seventeenth review: project checks, catalog robustness - Add .specify/ project check to workflow run/resume/status/search/info - remove_catalog validates config shape (dict + list) before indexing - _fetch_single_catalog validates response is a dict - _get_merged_workflows raises when all catalogs fail to fetch - add_catalog guards against non-dict catalog entries in config * Address eighteenth review: condition coercion, gate abort result, while default, cache guard, resume log - evaluate_condition treats plain 'false'/'true' strings as booleans - Gate abort returns StepResult(FAILED) instead of raising exception so step output is persisted in state for inspection - while_loop max_iterations optional (default 10), validation aligned - Catalog cache fallback catches invalid JSON gracefully - resume() appends workflow_finished log entry like execute() * Address nineteenth review: allow-all-tools opt-in, empty catalogs, abort dead code, while docstring - --allow-all-tools controlled by SPECKIT_ALLOW_ALL_TOOLS env var (default: 1) Set to 0 to disable automatic tool approval for Copilot CLI - Empty catalogs list falls back to built-in defaults (not an error) - Remove unreachable WorkflowAbortError catches from execute/resume (gate abort now returns StepResult(FAILED) instead of raising) - while_loop docstring updated: max_iterations is optional (default 10) * Address twentieth review: gate abort maps to ABORTED status, do-while max_iterations optional - Engine detects output.aborted from gate step and sets RunStatus.ABORTED (was unreachable — gate abort returned FAILED but status was always FAILED) - do-while max_iterations now optional (default 10), aligned with while_loop - do-while docstring and validation updated accordingly * Coerce default_options to dict, align bundled workflow ID regex with validator * Gate validates string options, prompt uses resolved integration, loop normalizes max_iterations * Use parentId:childId convention for nested step IDs - Fan-out per-item IDs use parentId:templateId:index (e.g. parallel:impl:0) - Reserve ':' in user step IDs (validation rejects) - Replaces _fanout_ prefix with cleaner namespacing - Expressions like {{ steps.parallel:impl:0.output.file }} work naturally * Validate workflow version is semantic versioning (X.Y.Z) * Schema version validation, strict semver, load_workflow docstring, preserve max_concurrency - Validate schema_version is '1.0' (reject unknown future schemas) - Strict semver regex: ^\d+\.\d+\.\d+$ (rejects 1.0.0beta etc.) - load_workflow docstring: 'parsed' not 'validated' - Keep max_concurrency in fan-out output (was dropped) - do_while docstring: engine re-evaluates step_config condition - ARCHITECTURE.md: document nested resume limitation * Path traversal prevention, loop step ID namespacing - RunState validates run_id is alphanumeric+hyphens (no path separators) - workflow_add validates catalog source doesn't escape workflows_dir - Loop iterations namespace nested step IDs as parentId:childId:iteration so multiple iterations don't overwrite each other in context/state --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>	2026-04-14 10:11:56 -05:00

36 Commits