github-spec-kit

mirror of https://github.com/github/spec-kit.git synced 2026-07-03 20:36:23 +08:00

Author	SHA1	Message	Date
Ali jawwad	d378485696	fix(workflows): reject infinite number-input default instead of raising OverflowError (#3199 ) WorkflowEngine._coerce_input normalizes a whole-valued number to int via int(value). For an infinite float (e.g. a 'type: number' input with YAML 'default: .inf') int(inf) raises OverflowError, which is not in the except (ValueError, TypeError) tuple. validate_workflow eager-coerces declared defaults and is documented to RETURN a list of errors, but it only catches ValueError -- so the OverflowError escaped and validate_workflow raised instead of reporting, breaking its contract. (NaN already surfaced cleanly because int(nan) raises ValueError.) Add OverflowError to the except tuple so an infinite default surfaces as the same clean 'expected a number' ValueError as NaN, consistent with the function's existing fail-fast-on-authoring-mistakes design. Finite values (5.0 -> 5, 3.5 -> 3.5) are unaffected. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 10:05:51 -05:00
Zied Jlassi	f846d6526c	fix(workflows): validate requires keys and reject phantom permissions gate (#3079 ) * fix(workflows): validate requires keys and reject phantom permissions gate A workflow's `requires` block was parsed but its keys were never validated, so a typo or an unsupported key was silently ignored. Most importantly, authors could write `requires.permissions.shell: true` expecting a runtime capability gate — but no such gate exists: a `shell` step always runs with the user's privileges. The declaration gave a false sense of sandboxing. `validate_workflow` now accepts only the recognised keys (`speckit_version`, `integrations`, `tools`, `mcp`) and rejects anything else, with an explicit error for `requires.permissions` pointing authors to `gate` steps for approval. Docs and the model comment are updated to state that `requires` is advisory, not a security boundary. - Reject non-mapping `requires`, unknown keys, and `requires.permissions` - Clarify workflows reference + PUBLISHING.md shell-step guidance - Tests for valid keys, non-mapping, unknown key, and permissions Signed-off-by: Zied Jlassi <6190550+zied-jlassi@users.noreply.github.com> Assisted-by: AI * fix(workflows): address review feedback on requires validation Follow-up to the review on #3079: - Guard `requires` validation on `is not None` instead of truthiness so a falsy non-mapping value (e.g. `requires: []` or `requires: ''`) is reported as an error instead of being silently skipped; `requires:` (YAML null) is still treated as an omitted block. Add a regression test. - Reword the workflows security note so `requires.permissions` is shown as rejected/unsupported rather than as a valid example of `requires`. - Standardize on US spelling (`_RECOGNIZED_REQUIRES_KEYS`, "recognized") to match the surrounding code and ease searching. - Tighten the permissions-rejection test to assert on specific message markers (`requires.permissions` and the `gate` guidance) so it fails if the validation path or wording drifts. Assisted-by: AI Signed-off-by: Zied Jlassi (Architect AI) <6190550+zied-jlassi@users.noreply.github.com> * fix(workflows): scope requires validation to workflow keys (drop tools/mcp) tools and mcp belong to the bundle manifest requires schema (bundler/models/manifest.py, resolved in bundler/services/resolver.py), not the workflow requires validated here. Drop them from _RECOGNIZED_REQUIRES_KEYS and revert the PUBLISHING.md claim that this PR had introduced, so workflow requires only recognizes speckit_version and integrations. This keeps the existing docs accurate and resolves the inline doc-consistency review comments. Signed-off-by: Zied Jlassi <6190550+zied-jlassi@users.noreply.github.com> * refactor(workflows): type WorkflowDefinition.requires as Any pre-validation self.requires holds the raw parsed value, which before validate_workflow() runs may be a non-mapping (None for a bare 'requires:', a list for 'requires: []', etc.). Annotating it dict[str, Any] was misleading for editors/type-checkers; use Any and document that validate_workflow() enforces the mapping shape. Addresses Copilot review feedback on engine.py. Signed-off-by: Zied Jlassi <6190550+zied-jlassi@users.noreply.github.com> * fix(workflows): reject YAML-null requires: as a non-mapping Address Copilot review: validate requires the same way as inputs. A bare requires: parses as YAML null and was previously treated as an omitted block, which is inconsistent with inputs and lets a stray requires: line be silently ignored. Drop the is-not-None guard and check isinstance(..., dict) directly: an omitted block still defaults to {} (valid), but a present-but-non-mapping value -- YAML null, [] or '' -- is now an authoring error that surfaces. Tests: add YAML-null rejection + an omitted-is-still-valid guard test. Signed-off-by: Zied Jlassi <6190550+zied-jlassi@users.noreply.github.com> --------- Signed-off-by: Zied Jlassi <6190550+zied-jlassi@users.noreply.github.com> Signed-off-by: Zied Jlassi (Architect AI) <6190550+zied-jlassi@users.noreply.github.com>	2026-06-24 14:49:43 -05:00
Huy Do	f5f76160a3	feat: surface gate detail in the workflow run/resume --json payload (#2965 ) * feat: surface gate detail in the workflow run/resume --json payload A paused run was indistinguishable from any other pause in the machine-readable outcome, and the gate's prompt/options/choice never left the human-facing stream. Record each step's type in the run state's step results (one engine line) and, when the run sits at a gate, add a gate block (step_id/message/options/choice) to the payload so orchestrators can drive review gates without parsing stdout. Reference implementation for the proposal in #2964. Addresses #2964 * fix(workflow): only surface gate detail in --json when the run is paused Address review (#2965): _gate_outcome() emitted a gate block whenever current_step_id pointed at a gate step. Since RunState.current_step_id is never cleared on completion, a completed/failed run whose last step was a gate leaked stale gate detail in run/resume/status --json. Guard on status == paused. Also assert CLI success in the _run_json test helper before JSON-parsing, and add direct coverage for the suppression guard. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(workflows): surface gate block on aborted runs; stabilize message Address Copilot review: - `_gate_outcome` now also surfaces the gate block when a run is `aborted` by a gate rejection (`on_reject: abort`), not only when `paused`. Abort is the only path that sets ABORTED and it leaves current_step_id on the gate, so an orchestrator can read the recorded `choice` for the stop. - Coerce `message` to a string (it may be a non-string YAML literal that GateStep only coerces for interpolation) so the JSON schema stays stable. - Tests: add a CLI-level aborted-path test, a message-coercion test, and extend the suppression test to allow `aborted`; share the run helper via `_invoke_json` to avoid duplicating the invoke boilerplate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(workflows): assert clean exit in gate-abort JSON test Address Copilot review: the gate-abort test parsed stdout without first asserting the CLI exited cleanly, so an invoke failure would surface as an opaque JSON decode error. Route it through `_run_json` (which asserts exit_code == 0 before parsing) and drop the now-redundant `_invoke_json` helper — a gate abort emits the payload and returns, so the run exits 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix: use result.output in run-helper assert; document step_data shape Address Copilot review: - `_run_json` asserted with `result.stdout` in the message, but under `--json` step output is redirected off stdout — the useful diagnostics live on `result.output`. Switch the assertion message to `result.output` (the JSON parse still reads stdout), matching the other CLI tests. - `StepContext.steps` documented a 5-key entry shape; the engine now also persists `type` and `status`. Update the docstring to the canonical 7-key shape so step authors/debuggers see the real record. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(workflows): align gate-abort JSON test with aborted→exit-1 After rebasing onto main, a gate abort now emits the --json payload and then exits non-zero (`_run_outcome_exit_code` maps aborted → 1, from the merged exit-code work). Give `_run_json` an `expected_exit` parameter (default 0) so the abort case asserts exit 1 while the paused/completed cases stay at 0 — keeping a single shared helper rather than duplicating the invoke boilerplate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(workflows): backward-compat gate detection + normalize gate options Address Copilot review: - A run paused by an older version has no persisted step `type`, so `_gate_outcome` would never surface its gate block on resume. Add `_is_gate_step`: prefer the `type` field, but when it is absent fall back to the gate's unique output signature (`on_reject`, written only by GateStep). A record with a different known `type` is still not a gate. - Normalize `options` to a list of strings (mirroring the `message` coercion) so an unvalidated workflow with non-string options can't destabilize the JSON schema. - Tests: options coercion, type-less gate detection, and a type-less non-gate negative case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(workflows): normalize non-list gate options to a stable list[str] Address Copilot review: the prior options normalization only mapped a `list`, returning the raw value for any other shape (scalar/tuple), which contradicted the "stable list[str]" intent. Extract `_normalize_gate_options`: None stays None; list/tuple maps each element through str; any other scalar becomes a single-element list (a bare string is one option, never iterated character-by-character). The emitted schema is now always list[str] \| None. Extend the options test to cover list, tuple, bare string, numeric scalar, and None. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(workflows): normalize gate choice to str; portable plain-gate test Address Copilot review: - `_gate_outcome` normalized `message` and `options` but passed `choice` through as-is; an unvalidated gate can record a non-string `choice`, which contradicts the stable-schema rationale. Coerce `choice` to `str \| None` (None still means "no decision yet"), consistent with the other two fields. Adds a focused choice-coercion test. - The plain (no-gate) test workflow used `run: "true"`, which fails under cmd.exe on Windows (ShellStep uses shell=True). Use the cross-platform `run: "exit 0"` (matching the exit-code suite's workflows). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>	2026-06-22 07:05:54 -05:00
Copilot	00bff788c9	Add `init` workflow step to bootstrap projects like `specify init` (#2838 ) * Initial plan * Add init workflow step to bootstrap projects like `specify init` * Address review: simplify stderr capture and extract VALID_SCRIPT_TYPES * Address review: fail fast on non-empty dir, stdout fallback, README force fix * Populate exit_code/stdout/stderr in non-empty-dir fast-fail * fix: address three unresolved review comments in InitStep - Use `with os.scandir(...)` context manager so the iterator is always closed even when `any()` short-circuits, preventing file-descriptor leaks in long-running workflow runs. - Guard `os.chdir(prev_cwd)` in the `finally` block with a try/except so an `OSError` (e.g. directory deleted) doesn't bypass returning the captured `StepResult`. - Reject non-string `script` values in `validate()` with a clear error message, rather than silently passing them through to become `--script True` at runtime. * Potential fix for pull request finding 'Empty except' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> * fix: remove no_git and branch_numbering options removed upstream The --no-git and --branch-numbering flags were removed from `specify init` on main. Update InitStep to drop these unsupported config fields and fix tests accordingly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: address review — integration defaults, integration_options, engine-owned dirs - Apply DEFAULT_INIT_INTEGRATION fallback when neither step config nor workflow context provides an integration, so output.integration always reflects the actual integration used. - Add integration_options config field to support --integration-options passthrough (required for generic integration and --skills mode). - Exclude .specify/ from the non-empty directory fast-fail check so that here: true works when the engine has already created its run-state directory before steps execute. - Note: mix_stderr=False is not needed — Click 8.2+ captures stderr separately by default and the existing try/except handles access. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: implicitly add --force when only engine-owned dirs exist When the workflow engine creates .specify/workflows/runs/ before steps execute, the directory is technically non-empty. Previously, specify init would prompt for confirmation (hanging in unattended mode) unless the user explicitly set force: true. Now the step detects that only engine-owned directories (.specify/) are present and implicitly adds --force so init proceeds without user interaction. Also fixes the test to exercise the implicit-force path rather than passing force: True explicitly (which bypassed the check entirely). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: derive VALID_SCRIPT_TYPES from shared constant, fail fast on OSError, include all resolved fields in output - Derive VALID_SCRIPT_TYPES from SCRIPT_TYPE_CHOICES in _agent_config so the valid set cannot drift from the specify init CLI. - Fail fast with a clear error when os.scandir() raises OSError (e.g. permission denied) instead of silently treating the directory as empty. - Include preset, force, and ignore_agent_tools in all output dicts (both fast-fail and normal paths) for consistent interpolation and debugging downstream. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: populate stderr from stdout on older Click, fix force comment wording - When Click does not expose result.stderr (older versions where stderr is mixed into stdout), use stdout as stderr on non-zero exit so workflows can consistently read steps.<id>.output.stderr for errors. - Update README inline comment for force: wording to say 'when target directory already exists' rather than 'non-empty directory', matching the actual specify init behavior for the project: form. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: build argv flags before early returns, use any() for dir scan - Move argv flag-building (--integration, --script, --preset, --ignore-agent-tools) before the non-empty-dir and OSError early returns so output['argv'] always reflects the complete command. - --force is appended after the check since it may be set implicitly. - Replace list comprehension with any() generator expression to short-circuit without allocating a full list of DirEntry objects. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: only treat .specify as engine-owned when it is a real directory A file or symlink named .specify should not be excluded from the non-empty check. Use entry.is_dir(follow_symlinks=False) to ensure only an actual directory is considered engine-owned content. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: guard implicit force for engine dirs only, fix integration fallback order - Only set implicit --force when engine-owned directories (.specify/) are actually present. A completely empty directory no longer gets --force added unnecessarily. - Fix integration resolution precedence: resolve step config expression first, then fall back to workflow default (also resolved), then to DEFAULT_INIT_INTEGRATION. Previously, a step expression resolving to falsy would bypass the workflow default entirely. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Manfred Riem <15701806+mnriem@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> Co-authored-by: Manfred Riem <mnriem@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-17 11:46:51 -05:00
Copilot	40d832f90a	Allow `specify workflow run` to execute YAML files without a project (#2825 ) * Initial plan * feat: add --workflow option to init command for post-init workflow execution * chore: remove unused import in test file * refactor: allow workflow run without project when given a YAML file path Instead of adding --workflow to init, make `specify workflow run ./file.yml` work without requiring a .specify/ project directory. When the source is a YAML file that exists on disk, cwd is used as the project root. When it's a workflow ID, the .specify/ project requirement is preserved. * Handle standalone workflow path edge cases * Fix USERPROFILE env var portability and docs notation * Fix workflow YAML path detection to require regular files * Harden workflow run against unsafe .specify paths --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-06-04 07:09:54 -05:00
Quratulain-bilal	bb2b49d0ae	fix(workflows): validate run_id in RunState.load before touching the … (#2813 ) * fix(workflows): validate run_id in RunState.load before touching the filesystem ``RunState.load(run_id, project_root)`` interpolates ``run_id`` directly into ``project_root / ".specify" / "workflows" / "runs" / run_id`` and then calls ``state_path.exists()`` and ``json.load`` on the result. The run_id is reachable from user input via ``specify workflow resume <run_id>`` (CLI argument) and via ``SPECKIT_WORKFLOW_RUN_ID`` (env var override on the engine's run path), so a value like ``../escape`` turns ``runs_dir`` into ``.specify/workflows/escape/`` and: * ``state_path.exists()`` becomes a file-existence oracle for any path the process can read. * if a ``state.json`` exists at the traversed location (planted by a malicious dependency, a misconfigured shared workspace, or an older spec-kit version that happened to write there), ``json.load`` parses it and the workflow resumes under the attacker-chosen ``workflow_id`` / step state. * a subsequent ``state.save()`` then writes back to the traversed location, persisting the corruption. ``RunState.__init__`` already validates ``run_id`` against ``r'^[a-zA-Z0-9][a-zA-Z0-9_-]$'`` — but that check runs on ``state_data["run_id"]`` after* ``load`` has already done the file lookup, which is too late to prevent the disclosure. This change extracts the pattern into a class-level constant ``_RUN_ID_PATTERN`` and a single ``_validate_run_id`` classmethod so ``__init__`` and ``load`` cannot drift, then calls the validator at the top of ``load`` before any path is built. Mirrors the precedent in ``src/specify_cli/agents.py::_ensure_within_directory`` (used at line 437 of that file) which guards extension-install paths against the same threat model. Regression tests parametrize 9 traversal vectors (``../escape``, ``..``, ``../../etc/passwd``, ``foo/bar``, ``foo\bar``, ``.hidden``, ``-flag``, ``foo\x00bar``, empty) and plant a malicious ``state.json`` outside ``runs/`` so a missing guard would surface as a successful load rather than the ambiguous ``FileNotFoundError``. A second test asserts ``__init__`` and ``load`` reject the same representative malformed ID, so future changes to one path can't silently drift from the other. * test(workflows): exercise RunState.load in shared-validation test, fix __init__ empty-string asymmetry Copilot's review on this PR pointed out that test_init_and_load_share_validation claimed to verify both entry points share the same validation rules but never actually called RunState.load — only __init__ and the shared _validate_run_id helper. A regression in load (e.g. someone deleting the cls._validate_run_id(run_id) call before the path is built) would slip through even though __init__ and the helper stayed aligned, defeating the whole point of the test. Tightening the test surfaced a real asymmetry the previous version was silently masking: self.run_id = run_id or str(uuid.uuid4())[:8] The truthiness fallback meant RunState(run_id="") silently substituted a UUID and skipped validation, while RunState.load("", project_root) correctly rejected the empty string. The two entry points diverged on the empty-string vector. That is exactly the drift the test name claimed to defend against — and the original test missed it. Changes ------- * engine.py: __init__ now distinguishes run_id is None (caller omitted it → auto-generate UUID) from an empty string (caller provided it → must validate like any other value). Both paths still flow through _validate_run_id, but only the explicit-None case auto-generates. * test_workflows.py: test_init_and_load_share_validation is now parametrized over one representative vector per category from test_load_rejects_path_traversal (parent traversal, embedded separator, leading non-alphanumeric, empty string) and asserts that all three entry points — __init__, _validate_run_id, and load — reject the same input. Adding load to the assertion is the substantive fix Copilot asked for; keeping __init__ and the helper alongside it makes any future drift between the three immediately observable instead of having to read three separate tests. Verification ------------ pytest tests/test_workflows.py — 168 passed (was 165 before the parametrize expansion; __init__ empty-string vector would have failed the new test against the old engine code, confirming the asymmetry was real).	2026-06-03 14:26:07 -05:00
Huy Do	1732b9b62e	feat(workflows): allow resume to accept updated workflow inputs (#2815 ) `workflow resume` now accepts `--input key=value` (the same flag and parsing as `workflow run`, via a shared `_parse_input_values` helper). Supplied values are merged over the run's persisted inputs and re-resolved through the existing typed-validation path (`_resolve_inputs`), so a resumed/re-run step sees the updated inputs and ill-typed values fail fast. Keys not supplied keep their persisted values; resuming without `--input` is unchanged. Reference docs updated. Distinct from #2405 (file-reference inputs at run time): this is about supplying inputs at resume time, reusing the existing input model. Closes #2812. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 12:04:07 -05:00
Huy Do	7bab0568c5	feat(workflows): add continue_on_error step field for non-halting failures (#2663 ) * feat(workflows): add continue_on_error step field Adds an optional `continue_on_error: bool` field on every step. When set to `true` and the step fails, the engine records the result (`exit_code`, `stderr` on `steps.<id>.output` plus `status` as a sibling key on `steps.<id>`) and continues to the next sibling step instead of halting the run. Downstream `if`, `switch`, or `gate` steps can then branch on `{{ steps.<id>.output.exit_code }}` to route the recovery path. Engine details -------------- `WorkflowEngine._execute_steps` now consults the step config when a step returns `StepStatus.FAILED`: - Gate aborts (`output.aborted`) always halt the run — operator decisions take precedence over the flag. - Otherwise, if `continue_on_error` is the literal `True`, log a `step_continue_on_error` event and proceed to the next sibling. The runtime check uses identity comparison (`is True`) rather than truthiness, so truthy non-bool values like the string `"true"` cannot silently change run semantics even if a caller bypasses `validate_workflow()`. - Otherwise, behave as before: log `step_failed`, set `RunStatus.FAILED`, and return. Validation ---------- `_validate_steps` rejects non-bool values for `continue_on_error`. Coerced strings like `"true"` are not accepted so authoring mistakes surface at validation time rather than silently changing run semantics. Tests ----- `TestContinueOnError` in `tests/test_workflows.py` (8 tests): - `test_undeclared_failure_halts_run` — default halt behaviour. - `test_declared_and_fired_continues_run` — flag + fail → continue. - `test_declared_but_step_succeeded_is_noop` — flag + success → no-op. - `test_if_branch_routes_around_failure` — end-to-end recovery. - `test_gate_abort_still_halts_with_continue_on_error` — abort always halts. - `test_validation_rejects_non_bool_continue_on_error` — `"true"` rejected at validation. - `test_validation_accepts_bool_continue_on_error` — `true`/`false` pass cleanly. - `test_engine_ignores_truthy_non_bool_continue_on_error` — defense-in-depth: engine ignores string `"true"` even when validation is bypassed. Rebased onto current upstream/main (post #2664 merge); the new `TestContinueOnError` class sits immediately after upstream's `TestContextRunId` so the two feature suites coexist cleanly. Closes #2591. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(workflows): restore runtime context section, clarify gate prompt Two Copilot findings on `d0b9e00`: 1. The `### Runtime Context` documentation for `{{ context.* }}` was lost during the rebase onto current main (the squash dropped the anchor where #2664 had added it). Restored under `## Expressions` so users can find `context.run_id` semantics and examples. 2. The continue_on_error example gate had message "Retry or skip?" but used the default `options: [approve, reject]` with `on_reject: skip`, which implied an automatic retry path that gates do not provide. Reworded the message to match the actual approve/reject semantics and added an explicit note that retry requires either custom gate options + downstream branching or a wrapper loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(workflows): clarify continue_on_error scope — returned FAILED only Copilot finding on `d0b9e00`: The README's "Error Handling" intro implied `continue_on_error` covers "any other runtime error raised during step execution", but the engine only consults the flag when a step returns `StepResult(status=FAILED, ...)`. Exceptions raised out of `step_impl.execute()` propagate to `WorkflowEngine.execute()`, where the catch-all logs `workflow_failed` and re-raises — the step result is never recorded, and the flag is never consulted. Audited the whole PR diff for the same overclaim: 1. workflows/README.md — main fix. Reworded the Error Handling intro to "any step that returns StepResult(status=FAILED, ...)" and promoted the parenthetical structural-validation note into the Notes block. Added a new "Scope: returned failures only" note that names the exception path explicitly and tells step authors how to bring the flag into scope for exceptional code (catch internally and return FAILED with the failure encoded in `output`). 2. tests/test_workflows.py — section comment used "when an executable step fails", same ambiguity. Tightened to "when a step returns StepResult(status=FAILED, ...)" and added a sentence calling out that unhandled exceptions are out of scope. 3. src/specify_cli/workflows/engine.py — already correct ("any step that returns FAILED" in the validator comment; "lets the pipeline route around the failure" in the execute path). No change. Engine semantics and test bodies are unchanged. Docs-only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(workflows): clarify on_reject:skip semantics — engine returns COMPLETED, not auto-skip Copilot finding on `b8982a7`: The README example's gate message said "reject to skip the rest of this branch", and the explanatory paragraph claimed [approve, reject] map to "continue" vs "skip the rest of this branch". The engine does not implement automatic branch-skipping. `on_reject: skip` returns `StepStatus.COMPLETED` (gate/__init__.py:65-66); the next sibling step runs unconditionally unless the author wires a downstream `if` reading `{{ steps.<gate-id>.output.choice }}`. Two fixes: 1. Restructured the YAML example so it actually demonstrates the manual-branching pattern: added a `recover` if-step after the gate that conditions on `steps.review.output.choice == 'approve'`. Now the example shows the real workflow author's responsibility instead of implying the engine does it. 2. Replaced the trailing paragraph with three precise notes: - both gate options return COMPLETED; `on_reject: skip` controls abort behaviour only, not sibling-skipping - all three `on_reject` values enumerated with their actual engine semantics (FAILED+aborted / COMPLETED / PAUSED) - the original retry-loop guidance retained as the third bullet Updated the gate message in the example to match — "reject to leave the failure recorded and move on" instead of "reject to skip the rest of this branch". Audited the whole PR diff for the same overclaim: no other instance. Engine semantics, validation, and test bodies are unchanged. Docs-only. 161/161 tests/test_workflows.py pass locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(workflows): clarify gate's role — surfaces, doesn't programmatically branch Audit follow-up to `393ac6b` — three sites repeated the same minor overclaim about gates being one of the "branch on it" step types alongside `if` and `switch`: 1. workflows/README.md (the "downstream `if`, `switch`, or `gate` steps can branch on it" sentence introducing the example) 2. engine.py:236 (validator inline comment) 3. engine.py:657 (execute-path inline comment) A `gate` step does not have a `condition` or `expression` field — it only evaluates expressions for `message` and `show_file` (gate/__init__.py:29,36). Programmatic branching happens in `if`/`switch`; a gate surfaces the value to a human operator via message interpolation, and the operator's choice is recorded in `output.choice` for a subsequent `if`/`switch` to route on. Reworded all three sites consistently: "a downstream `if` or `switch` can branch on it (or a `gate` can surface it to the operator via message interpolation)". The README example already demonstrates this distinction — the gate carries `{{ }}` template variables in its message and the `recover` if-step downstream is what actually branches on the choice. Engine semantics, validation, and test bodies are unchanged. Docs-only on the README; comment-only on engine.py. 161/161 tests/test_workflows.py pass locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(workflows): use qualified StepStatus.* instead of bare FAILED/COMPLETED/PAUSED Three Copilot inline comments on workflows/README.md lines 226, 282, 288 flagged that ``StepResult(status=FAILED, ...)`` is not valid Python — ``StepResult.status`` is a ``StepStatus`` enum value, so the documented form should be ``StepStatus.FAILED``. Audited the whole PR diff for the same shorthand. The bare unqualified form appears in three files added/modified by this PR: 1. workflows/README.md (6 sites) — three ``StepResult(status=FAILED, ...)`` parentheticals, plus the on_reject Notes bullet listing the three step statuses (``FAILED``, ``COMPLETED``, ``PAUSED``). 2. tests/test_workflows.py (4 sites) — section header for TestContinueOnError, two test-method docstrings, one inline comment about a gate's TTY-fallback behaviour. 3. src/specify_cli/workflows/engine.py (1 site) — the validator inline comment added in `d0b9e00` said "returns FAILED" where the engine code itself uses ``StepStatus.FAILED``. All 11 sites normalised to the qualified ``StepStatus.<name>`` form so the docs / test docstrings / inline comments match what readers will actually find in the engine code and the tests. Engine semantics, validation, and test bodies are unchanged. 161/161 tests/test_workflows.py pass locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-02 10:10:07 -05:00
Copilot	cc3d828227	Add support for SPECKIT_WORKFLOW_RUN_ID override (#2742 ) * Initial plan * feat: support SPECKIT_WORKFLOW_RUN_ID override * docs: clarify run_id env var precedence wording --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-05-29 10:50:00 -05:00
Manfred Riem	616eba6a57	fix: while/do-while loop condition reads stale iteration-0 step output (#2662 ) * fix: while/do-while loop condition reads stale iteration-0 step output After executing namespaced loop body steps, copy each iteration's results back to the original unprefixed step key so that evaluate_condition() sees the latest values instead of stale iteration-0 data. Fixes #2592 * address review: cross-platform tests, preserve iteration-0 history - Rewrite shell scripts in tests to use Python via script files instead of POSIX syntax, so they pass on Windows CI. - Snapshot iteration-0 nested-step results under a namespaced key (parent:child:0) before the first copy-back overwrite, preserving complete per-iteration history for debugging. * address review: skip copy-back on paused/failed iterations Move the status check before the copy-back so that partial results from paused or failed nested steps (e.g., a gate awaiting input) do not overwrite the unprefixed key. This preserves correct resume behavior. * address review: quote paths in test shell commands Quote both the Python executable and script file paths in the run: commands to handle spaces in paths on Windows. * address review: execute loop body with original IDs Instead of namespacing step IDs for execution and copying results back, execute the loop body with original (unprefixed) step IDs so results naturally land at the right keys. Snapshot previous iteration results to namespaced keys (parent:child:N) for history only. This fixes multi-step loop bodies where step B references step A's output within the same iteration — previously step B would see stale data until the copy-back ran after the entire iteration. * address review: namespaced execution with per-step copy-back Revert to namespaced step IDs for execution (preserving unique log entries and state keys per iteration) but copy each step's result back to the unprefixed key immediately after it completes. This preserves backward compatibility (same namespaced key format, same log IDs) while fixing both the condition evaluation bug and inter-step references within multi-step loop bodies. * address review: alias after status check, add multi-step body test - Move per-step aliasing below the PAUSED/FAILED/ABORTED status check so partial results from incomplete steps are not aliased back to the unprefixed key. - Add test_while_loop_multi_step_body_inter_step_refs to exercise a multi-step loop body where step B reads step A's output within the same iteration, verifying per-step aliasing works correctly. Addresses feedback from @doquanghuy (items 2 & 4) and Copilot review on commit `9d0a222`. * address review: stable fallback IDs, expression-based inter-step test - Use enumerate() for stable fallback IDs when loop body steps lack an explicit id (step-0, step-1, etc. instead of always step-0). - Rewrite multi-step body test so step B uses expression substitution ({{ steps.step-a.output.stdout }}) instead of reading the counter file directly, making it a true regression test for per-step aliasing.	2026-05-21 12:25:03 -05:00
Quratulain-bilal	409ec59704	fix(workflow): support integration: auto to follow project's initialized AI (#2421 ) * fix(workflow): support integration: auto to follow project's initialized AI Closes #2406 (squashed) * fix(workflow): combine JSONDecodeError and UnicodeDecodeError handling Address Copilot feedback: UnicodeDecodeError can be raised by both read_text() and json.loads(), so combining the handlers ensures both cases produce a consistent, clear error message. * fix(workflows): honor integration_state schema guard and modern state in 'integration: auto' Three Copilot follow-ups on PR #2421: 1. engine.py:799 — `_load_project_integration` was bypassing the same schema guard `_read_integration_json` enforces. It now reads the schema field directly, returns None on a future schema (so the workflow falls back to the literal 'auto' default rather than guessing), and routes through `normalize_integration_state` / `default_integration_key` so modern installs that record `default_integration` / `installed_integrations` (without the legacy top-level `integration` field) resolve correctly. 2. test_workflows.py — added two regression cases: - `integration: auto` resolves a modern normalized state file - `integration: auto` falls back when the state file declares a newer `integration_state_schema` than this CLI supports 3. test_cli.py — added a CLI-level regression for the `UnicodeDecodeError` branch in `_read_integration_json` to match the existing malformed-JSON coverage. * refactor(integration): extract shared try_read_integration_json helper Address Copilot review on PR #2421: Both `_read_integration_json` (CLI) and `_load_project_integration` (workflow engine) were parsing `.specify/integration.json` independently, duplicating the schema guard and risking drift between the two readers. Extract the parse + schema validation into a single low-level helper `try_read_integration_json` in `integration_state.py` that returns either the normalized state or a structured `IntegrationReadError`. Both callers now delegate to this helper: - CLI keeps its loud-fail UX: each error kind ("decode", "os", "not_object", "schema_too_new") is translated into the existing console message + typer.Exit(1). - Engine keeps its silent fallback: any error simply returns None so `integration: auto` falls back to the workflow's literal default. This eliminates the divergence Copilot flagged without changing observable behavior for either caller. * fix(integration): distinguish missing file from non-regular path Address Copilot review on PR #2421: `try_read_integration_json` was collapsing two distinct cases into a single `(None, None)` return: 1. `.specify/integration.json` truly missing — silent fallback is correct. 2. Path exists but is a directory, socket, or other non-regular file — this is a misconfiguration the CLI should surface loudly. Split the check: `exists()` falsey returns `(None, None)`; existing-but- not-a-regular-file returns `(None, IntegrationReadError(kind="os", ...))` so the CLI's loud-fail path produces an actionable error while the engine still treats it as a fallback to the workflow's literal default. * docs(workflow): clarify version pin, advisory integrations list, enum exemption - workflow.yml: fix comment that said 0.8.3 was first release with auto resolution; the pin is >=0.8.5 so the comment now matches the pin. - workflow.yml: clarify that requires.integrations.any is an advisory, non-exhaustive compatibility hint, not a closed set. - engine.py: clarify that the auto-sentinel exemption only skips enum membership; declared type is still enforced through _coerce_input. * fix(workflow): resolve auto sentinel for provided values; report stat errors Two Copilot findings fixed: 1. _resolve_inputs only resolved the ``integration: auto`` sentinel when it came from the input default. A caller explicitly providing ``{"integration": "auto"}`` (which the workflow prompt advertises as a valid value) bypassed _resolve_default and the literal "auto" leaked to dispatch. Provided values now go through the same resolution path as defaults, and the enum-membership exemption applies in both cases. Regression test added. 2. try_read_integration_json used Path.exists() / Path.is_file() as a pre-check. Both return False on some OSErrors (e.g. permission errors during stat), which silently treated an unreadable-but-present file as missing — the engine fell back without warning and the CLI failed to surface the loud error. The pre-check is gone: read_text() is attempted directly, FileNotFoundError means missing (silent fallback), IsADirectoryError and other OSErrors become loud IntegrationReadError. * fix(workflow): enforce declared type for string inputs, reject bool-as-number Two Copilot findings fixed: 1. _coerce_input previously coerced/validated only ``number`` and ``boolean`` types, so ``type: string`` silently accepted any Python value (numbers, lists, dicts). A YAML authoring mistake like ``type: string`` + ``default: 5`` slipped through. Strings are now required to actually be strings; non-strings raise ValueError, which surfaces as an ``invalid default`` error from validate_workflow. 2. ``type: number`` accepted ``default: true`` because ``bool`` is a subclass of ``int`` (``float(True) == 1.0``). Bools are now rejected explicitly in the number path so the YAML mistake fails fast. The boolean path is also tightened to reject non-bool / non-string values for symmetry. Comment on the auto-sentinel enum exemption updated to reflect the stronger guarantee. Regression tests added for both rejections. * fix(cli): drop unused normalize_integration_state import to satisfy ruff CI's `uvx ruff check src/` flagged this as F401: the symbol was imported under a private alias but never referenced. Tests stay green after removal.	2026-05-15 16:03:33 -05:00
Copilot	a00e679918	Add workflow engine with catalog system (#2158 ) * Initial plan * Add workflow engine with step registry, expression engine, catalog system, and CLI commands Agent-Logs-Url: https://github.com/github/spec-kit/sessions/72a7bb5d-071f-4d67-a507-7e1abae2384d Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Add comprehensive tests for workflow engine (94 tests) Agent-Logs-Url: https://github.com/github/spec-kit/sessions/72a7bb5d-071f-4d67-a507-7e1abae2384d Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Address review feedback: do-while condition preservation and URL scheme validation Agent-Logs-Url: https://github.com/github/spec-kit/sessions/72a7bb5d-071f-4d67-a507-7e1abae2384d Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Address review feedback, add CLI dispatch, interactive gates, and docs Review comments (7/7): - Add explanatory comment to empty except block - Implement workflow catalog download with cleanup on failure - Add input type coercion for number/boolean/enum - Fix example workflow to remove non-existent output references - Fix while_loop and if_then condition defaults (string 'false' → bool False) - Fix resume step index tracking with step_offset parameter CLI dispatch: - Add build_exec_args() and dispatch_command() to IntegrationBase - Override for Claude (skills: /speckit-specify), Gemini (-m flag), Codex (codex exec), Copilot (--agent speckit.specify) - CommandStep invokes installed commands by name via integration CLI - Add PromptStep for arbitrary inline prompts (10th step type) - Stream CLI output live to terminal (no silent blocking) - Remove timeout when streaming (user can Ctrl+C) - Ctrl+C saves state as PAUSED for clean resume Interactive gates: - Gate steps prompt [1] approve [2] reject in TTY - Fall back to PAUSED in non-interactive environments - Resume re-executes the gate for interactive prompting Documentation: - workflows/README.md — user guide - workflows/ARCHITECTURE.md — internals with Mermaid diagrams - workflows/PUBLISHING.md — catalog submission guide Tests: 94 → 122 workflow tests, 1362 total (all passing) * Fix ruff lint errors: unused imports, f-string placeholders, undefined name * Address second review: registry-backed validation, shell failures, loop/fan-out execution, URL validation - VALID_STEP_TYPES now queries STEP_REGISTRY dynamically - Shell step returns FAILED on non-zero exit code - Persist workflow YAML in run directory for reliable resume - Resume loads from run copy, falls back to installed workflow - Engine iterates while/do-while loops up to max_iterations - Engine expands fan-out per item with context.item - HTTPS URL validation for catalog workflow installs (HTTP allowed for localhost) - Fix catalog merge priority docstring (lower number wins) - Fix dispatch_command docstring (no build_exec_args_for_command) - Gate on_reject=retry pauses for re-prompt on resume - Update docs to 10 step types, add prompt step to tables and README * Potential fix for pull request finding 'Empty except' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> * Address third review: fan-out IDs, catalog guards, shell coercion, docs - Fan-out generates unique per-item step IDs and collects results - Catalog merge skips non-dict workflow entries (malformed data guard) - Shell step coerces run_cmd to str after expression evaluation - urlopen timeout=30 for catalog workflow installs - yaml.dump with sort_keys=False, allow_unicode=True for catalog configs - Document streaming timeout as intentionally unbounded (user Ctrl+C) - Document --allow-all-tools as required for non-interactive + future enhancement - Update test docstring and PUBLISHING.md to 10 step types with prompt * Validate final URL after redirects in catalog fetch urlopen follows redirects, so validate the response URL against the same HTTPS/localhost rules to prevent redirect-based downgrade attacks. * Address fourth review: filter arg eval, tags normalization, install redirect check - Filter arguments now evaluated via _evaluate_simple_expression() so default(42) returns int not string - Tags normalized: non-list/non-string values handled gracefully - Install URL redirect validation (same as catalog fetch) - Remove unused 'skipped' variable in catalog config parsing - Author 'github' → 'GitHub' in example workflow - Document nested step resume limitation (re-runs parent step) * Add explanatory comment to empty except ValueError block * Address fifth review: expression parsing, fan-out output, URL install, gate options - Move string literal parsing before operator detection in expressions so quoted strings with operators (e.g. 'a in b') are not mis-parsed - Fan-out: remove max_concurrency from persisted output, fix docstring to reflect sequential execution - workflow add: support URL sources with HTTPS/redirect validation, validate workflow ID is non-empty before writing files - Deduplicate local install logic via _validate_and_install_local() - Remove 'edit' gate option from speckit workflow (not implemented) * Add comments to empty except ValueError blocks in URL install * Address sixth review: operator precedence, fan_in cleanup, registry resilience, docs - Fix or/and operator precedence (or parsed first = lower precedence) - Restore context.fan_in after fan-in step completes - Catch JSONDecodeError in registry load for corrupted files - Replace print() with on_step_start callback (library-safe) - Gate validation warns when on_reject set but no reject option - Shell step: document shell=True security tradeoff - README: sdd-pipeline → speckit, parallel → sequential for fan-out - ARCHITECTURE.md: parallel → fan-out/fan-in in diagram * Address seventh review: string literal before pipe, type annotations, validate on install - Move string literal check above pipe filter parsing so 'a \| b' works - Fix type annotations: input_values list[str] \| None, run_id str \| None - Run validate_workflow() before installing from local path/URL - Remove duplicate string literal check from expression parser * Address eighth review: fan-out namespaced IDs, early return, catalog validation - Fan-out per-item step IDs use _fanout_{step_id}_{base}_{idx} namespace to avoid collisions with user-defined step IDs - Early return after fan-out loop when state is paused/failed/aborted - Catalog installs parse + validate downloaded YAML before registering, using definition metadata instead of catalog entry for registry * Address ninth review: populate catalog, fix indentation, priority, README - Add speckit workflow entry to catalog.json so it's discoverable - Fix shell step output dict indentation - Catalog add_catalog priority derived from max existing + 1 - README Quick Start clarified with install + local file examples * Address tenth review: max_iterations validation, catalog config guard, version alignment - Validate max_iterations is int >= 1 in while and do-while steps - Guard add_catalog against corrupted config (non-dict/non-list) - Align speckit_version requirement to >=0.6.1 (current package version) - Fan-out template validation uses separate seen_ids set to avoid false duplication errors with user-defined step IDs * Address eleventh review: command step fails without CLI, ID mismatch warning, state persistence - Command step returns FAILED when CLI not installed (was silent COMPLETED) - Catalog install warns on workflow ID vs catalog key mismatch - Engine persists state.save() before returning on unknown step type - Update tests to expect FAILED for command steps without CLI - Integration tests use shell steps for CLI-independent execution * Address twelfth review: type annotations, version examples, streaming docs, requires - Fix workflow_search type annotations (str \| None) - PUBLISHING.md: speckit_version >=0.15.0 → >=0.6.1 - Document that exit_code is captured and referenceable by later steps - Mark requires as declared-but-not-enforced (planned enhancement) - Note full stdout/stderr capture as planned enhancement * Enforce catalog key matches workflow ID (fail instead of warn) * Bundle speckit workflow: auto-install during specify init - Add workflows/speckit to pyproject.toml force-include for wheel builds - Add _locate_bundled_workflow() helper (mirrors _locate_bundled_extension) - Auto-install speckit workflow during specify init (after git extension) - Update all integration file inventory tests to expect workflow files * Address fourteenth review: prompt fails without CLI, resolved step data, fan-out normalization - PromptStep returns FAILED when CLI not installed (was silent COMPLETED) - Engine step_data prefers resolved values from step output - Fan-out normalizes output.results=[] for empty item lists - subprocess.run inherits stdout/stderr (no explicit sys.stdout) - Registry tests use issubset for extensibility * Address fifteenth review: fan_in docstring, gate defaults, validation guards, reserved prefix - FanInStep docstring: aggregate-only, no blocking semantics - FanInStep: guard output_config as dict, handle None - Gate validate: use same default options as execute - Validate inputs is dict and steps is list before iterating - Reserve _fanout_ prefix in step ID validation - PUBLISHING.md: remove unenforced checklist items, add _fanout_ note * Address sixteenth review: docs regex, fan_in try/finally, hyphenated dot-path keys - PUBLISHING.md: update ID regex docs to match implementation (single-char OK) - FanInStep: wrap expression evaluation in try/finally for context.fan_in - Expression dot-path: allow hyphens in keys before list index (e.g. run-tests[0]) * Make speckit workflow integration-agnostic, document Copilot CLI requirement - Workflow integration selectable via input (default: claude) - Each command step uses {{ inputs.integration }} instead of hardcoded copilot - Copilot docstring documents CLI requirement for workflow dispatch - Added install_url for Copilot CLI docs * Address seventeenth review: project checks, catalog robustness - Add .specify/ project check to workflow run/resume/status/search/info - remove_catalog validates config shape (dict + list) before indexing - _fetch_single_catalog validates response is a dict - _get_merged_workflows raises when all catalogs fail to fetch - add_catalog guards against non-dict catalog entries in config * Address eighteenth review: condition coercion, gate abort result, while default, cache guard, resume log - evaluate_condition treats plain 'false'/'true' strings as booleans - Gate abort returns StepResult(FAILED) instead of raising exception so step output is persisted in state for inspection - while_loop max_iterations optional (default 10), validation aligned - Catalog cache fallback catches invalid JSON gracefully - resume() appends workflow_finished log entry like execute() * Address nineteenth review: allow-all-tools opt-in, empty catalogs, abort dead code, while docstring - --allow-all-tools controlled by SPECKIT_ALLOW_ALL_TOOLS env var (default: 1) Set to 0 to disable automatic tool approval for Copilot CLI - Empty catalogs list falls back to built-in defaults (not an error) - Remove unreachable WorkflowAbortError catches from execute/resume (gate abort now returns StepResult(FAILED) instead of raising) - while_loop docstring updated: max_iterations is optional (default 10) * Address twentieth review: gate abort maps to ABORTED status, do-while max_iterations optional - Engine detects output.aborted from gate step and sets RunStatus.ABORTED (was unreachable — gate abort returned FAILED but status was always FAILED) - do-while max_iterations now optional (default 10), aligned with while_loop - do-while docstring and validation updated accordingly * Coerce default_options to dict, align bundled workflow ID regex with validator * Gate validates string options, prompt uses resolved integration, loop normalizes max_iterations * Use parentId:childId convention for nested step IDs - Fan-out per-item IDs use parentId:templateId:index (e.g. parallel:impl:0) - Reserve ':' in user step IDs (validation rejects) - Replaces _fanout_ prefix with cleaner namespacing - Expressions like {{ steps.parallel:impl:0.output.file }} work naturally * Validate workflow version is semantic versioning (X.Y.Z) * Schema version validation, strict semver, load_workflow docstring, preserve max_concurrency - Validate schema_version is '1.0' (reject unknown future schemas) - Strict semver regex: ^\d+\.\d+\.\d+$ (rejects 1.0.0beta etc.) - load_workflow docstring: 'parsed' not 'validated' - Keep max_concurrency in fan-out output (was dropped) - do_while docstring: engine re-evaluates step_config condition - ARCHITECTURE.md: document nested resume limitation * Path traversal prevention, loop step ID namespacing - RunState validates run_id is alphanumeric+hyphens (no path separators) - workflow_add validates catalog source doesn't escape workflows_dir - Loop iterations namespace nested step IDs as parentId:childId:iteration so multiple iterations don't overwrite each other in context/state --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>	2026-04-14 10:11:56 -05:00

12 Commits