github-spec-kit

mirror of https://github.com/github/spec-kit.git synced 2026-07-03 12:28:06 +08:00

Author	SHA1	Message	Date
Huy Do	affbf5ead5	feat(workflows): add from_json expression filter (#2961 ) * feat(workflows): add from_json expression filter Step outputs captured as strings could never become typed values in templates - the filter set was default/join/map/contains only, so e.g. a fan-out items: could never consume a step's JSON stdout. Add an arg-less from_json pipe filter with parse-or-raise semantics: invalid JSON or non-string input raises a clear ValueError rather than passing through silently. Fixes #2960 * fix(expressions): make from_json strict — reject any arguments Address review (#2961): from_json('x') and from_json() previously fell through to a silent passthrough of the unparsed value. Reject any parenthesized form with a clear error so mis-wired templates fail loudly. Rename test to ...parses_object (JSON under test is an object) and add coverage for the strict no-arguments behavior. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * docs(workflows): document the from_json expression filter Address Copilot review: the user-facing filter references omitted the newly added `from_json` filter. Add it to the ARCHITECTURE.md filter table (with the `{{ steps.emit.output.stdout \| from_json }}` example) and to the filter enumerations in workflows/README.md and docs/reference/workflows.md so the docs match the evaluator's capabilities. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(workflows): make from_json strictness reject trailing tokens; fix docstring Address Copilot review: - Strictness only rejected parenthesized forms, so typos like `\| from_json)` or `\| from_json extra` still fell through to the unknown-filter path and silently returned the unparsed value. Match on the leading filter token and require the whole filter to be exactly `from_json`, so every mis-wired form raises. Extend the rejection test to cover the trailing-token cases. - The module docstring claimed "no imports", which is misleading now that the module imports `json`. Reword to state the actual sandbox guarantee: templates cannot do file I/O, import modules, or run arbitrary code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>	2026-06-17 13:43:26 -05:00
Copilot	00bff788c9	Add `init` workflow step to bootstrap projects like `specify init` (#2838 ) * Initial plan * Add init workflow step to bootstrap projects like `specify init` * Address review: simplify stderr capture and extract VALID_SCRIPT_TYPES * Address review: fail fast on non-empty dir, stdout fallback, README force fix * Populate exit_code/stdout/stderr in non-empty-dir fast-fail * fix: address three unresolved review comments in InitStep - Use `with os.scandir(...)` context manager so the iterator is always closed even when `any()` short-circuits, preventing file-descriptor leaks in long-running workflow runs. - Guard `os.chdir(prev_cwd)` in the `finally` block with a try/except so an `OSError` (e.g. directory deleted) doesn't bypass returning the captured `StepResult`. - Reject non-string `script` values in `validate()` with a clear error message, rather than silently passing them through to become `--script True` at runtime. * Potential fix for pull request finding 'Empty except' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> * fix: remove no_git and branch_numbering options removed upstream The --no-git and --branch-numbering flags were removed from `specify init` on main. Update InitStep to drop these unsupported config fields and fix tests accordingly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: address review — integration defaults, integration_options, engine-owned dirs - Apply DEFAULT_INIT_INTEGRATION fallback when neither step config nor workflow context provides an integration, so output.integration always reflects the actual integration used. - Add integration_options config field to support --integration-options passthrough (required for generic integration and --skills mode). - Exclude .specify/ from the non-empty directory fast-fail check so that here: true works when the engine has already created its run-state directory before steps execute. - Note: mix_stderr=False is not needed — Click 8.2+ captures stderr separately by default and the existing try/except handles access. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: implicitly add --force when only engine-owned dirs exist When the workflow engine creates .specify/workflows/runs/ before steps execute, the directory is technically non-empty. Previously, specify init would prompt for confirmation (hanging in unattended mode) unless the user explicitly set force: true. Now the step detects that only engine-owned directories (.specify/) are present and implicitly adds --force so init proceeds without user interaction. Also fixes the test to exercise the implicit-force path rather than passing force: True explicitly (which bypassed the check entirely). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: derive VALID_SCRIPT_TYPES from shared constant, fail fast on OSError, include all resolved fields in output - Derive VALID_SCRIPT_TYPES from SCRIPT_TYPE_CHOICES in _agent_config so the valid set cannot drift from the specify init CLI. - Fail fast with a clear error when os.scandir() raises OSError (e.g. permission denied) instead of silently treating the directory as empty. - Include preset, force, and ignore_agent_tools in all output dicts (both fast-fail and normal paths) for consistent interpolation and debugging downstream. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: populate stderr from stdout on older Click, fix force comment wording - When Click does not expose result.stderr (older versions where stderr is mixed into stdout), use stdout as stderr on non-zero exit so workflows can consistently read steps.<id>.output.stderr for errors. - Update README inline comment for force: wording to say 'when target directory already exists' rather than 'non-empty directory', matching the actual specify init behavior for the project: form. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: build argv flags before early returns, use any() for dir scan - Move argv flag-building (--integration, --script, --preset, --ignore-agent-tools) before the non-empty-dir and OSError early returns so output['argv'] always reflects the complete command. - --force is appended after the check since it may be set implicitly. - Replace list comprehension with any() generator expression to short-circuit without allocating a full list of DirEntry objects. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: only treat .specify as engine-owned when it is a real directory A file or symlink named .specify should not be excluded from the non-empty check. Use entry.is_dir(follow_symlinks=False) to ensure only an actual directory is considered engine-owned content. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: guard implicit force for engine dirs only, fix integration fallback order - Only set implicit --force when engine-owned directories (.specify/) are actually present. A completely empty directory no longer gets --force added unnecessarily. - Fix integration resolution precedence: resolve step config expression first, then fall back to workflow default (also resolved), then to DEFAULT_INIT_INTEGRATION. Previously, a step expression resolving to falsy would bypass the workflow default entirely. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Manfred Riem <15701806+mnriem@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> Co-authored-by: Manfred Riem <mnriem@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-17 11:46:51 -05:00
Copilot	c52ccd7dc7	Add workflow step catalog — community-installable step types (#2394 ) * Initial plan * Add workflow step catalog: StepRegistry, StepCatalog, CLI commands, and tests Agent-Logs-Url: https://github.com/github/spec-kit/sessions/2885e646-477d-4df8-b9a3-06d8cb29e748 Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Potential fix for pull request finding 'An assert statement has a side-effect' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> * Address PR review: path traversal, cache robustness, collision check, failed-to-load display - Add resolve()+relative_to() path traversal guards in workflow_step_add and workflow_step_remove to prevent directory escape via step_id - Harden _is_url_cache_valid in both StepCatalog and WorkflowCatalog to coerce fetched_at to float and catch TypeError/ValueError - Check STEP_REGISTRY and StepRegistry before installing to prevent collisions with built-in step types or already-installed steps - Show 'Custom (installed, failed to load)' section in workflow step list for steps in the registry that failed to load into STEP_REGISTRY * Fix StepRegistry shape validation and StepCatalog empty-YAML handling Agent-Logs-Url: https://github.com/github/spec-kit/sessions/0dca6393-f5a9-40de-bb5c-77ba6af033d2 Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Polish: rename _default to default_registry, strengthen unreadable-file test Agent-Logs-Url: https://github.com/github/spec-kit/sessions/0dca6393-f5a9-40de-bb5c-77ba6af033d2 Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Address PR review: atomic install, hostname validation, cache resilience, no dynamic imports in list/info Agent-Logs-Url: https://github.com/github/spec-kit/sessions/3e18fef0-e2e6-4b3e-9e8d-9adb1e5e464e Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Fix shutil.move with existing step_dir: remove before move to avoid subdirectory nesting Agent-Logs-Url: https://github.com/github/spec-kit/sessions/3e18fef0-e2e6-4b3e-9e8d-9adb1e5e464e Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Call load_custom_steps at execution time; enforce hostname in _safe_fetch and _validate_url Agent-Logs-Url: https://github.com/github/spec-kit/sessions/73865880-fb25-4061-a43e-4e4b4d1c4de6 Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Wrap YAML parsing in try/except; atomic step install via os.rename() under same fs Agent-Logs-Url: https://github.com/github/spec-kit/sessions/ff915bc5-ec7e-4e6a-b505-35f5795250df Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Validate YAML root is a dict in _load_catalog_config and workflow_step_add; fix WorkflowCatalog hostname validation Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Fix load_custom_steps() package imports and add reserved step ID validation * Move _re/_sys imports out of loop and _RESERVED_STEP_IDS to module level * Address review: collision-resistant module names, extra_files support, remove orphan dir * Harden extra_files: warn on non-dict, resolve symlinks in path traversal check * Switch _safe_fetch and StepCatalog._fetch_single_catalog to use open_url for auth consistency * Harden step_id validation against path-segment tricks; raise on StepRegistry.save() OSError * Clean up sys.modules on broken step packages; handle StepValidationError in step add/remove * Address review thread: int-coerce priorities, sys.modules cleanup, _require_specify_project, registry-first remove * fix: normalize workflow step catalog metadata fallbacks * fix: address latest workflow step and catalog review findings * Handle non-string extra_files keys in workflow step add * Harden StepRegistry symlink reads and extra_files path/URL validation * Harden custom step loader and step remove against symlinks and OSError * Fix StepCatalog.search() to coerce non-string fields before joining * Fix WorkflowCatalog YAML parsing error handling and isinstance checks * Harden step registry save and custom step/catalog ID handling * Harden cache validation and staging OSError handling * Address review: reorder symlink guard and split mixed test - Move symlink-parent check before is_dir() in load_custom_steps() so we never stat an external target through a symlink - Split test_get_merged_steps_normalizes_list_ids_to_strings into two focused tests: one for list-id normalization, one for get_step_info return values Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address review: symlink-before-stat in loader, restore registry on rmtree failure - load_custom_steps(): check is_symlink() before is_dir() on step directories so symlinked entries are skipped without statting external targets - workflow_step_remove: restore the registry entry when shutil.rmtree() fails so filesystem and registry state stay consistent and a future 'step add' isn't blocked Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Harden step_id validation and file-write error handling - _validate_step_id_or_exit: reject whitespace-only/padded IDs, Windows-invalid characters (<>:"\|?), control characters, trailing dots/spaces, and Windows reserved device names (con, nul, etc.) - Wrap step.yml/__init__.py staging writes in OSError handler - Wrap extra_files disk writes (mkdir + write_bytes) in OSError handler that names the failing relative path - Registry rollback on rmtree failure: restore verbatim metadata and emit a warning if the restore itself fails Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Address review: cache symlink guard, verbatim registry rollback, Windows test fix - StepCatalog: add _is_cache_path_safe() guard that checks for symlinks in .specify/workflows/steps/.cache path; skip cache reads and writes when any component is symlinked to prevent writes outside project root - Registry rollback: write metadata directly to registry.data['steps'] and call save() instead of using add() which overwrites timestamps - temp_dir fixture: use ignore_errors=True on Windows to avoid flaky teardown from locked file handles (WinError 32) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Simplify exec_module call by removing redundant nested try/except Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix empty YAML tolerance in WorkflowCatalog.add_catalog, scope ignore_errors to Windows - WorkflowCatalog.add_catalog(): treat None from yaml.safe_load() (empty file) as an empty mapping instead of raising 'corrupted' - temp_dir fixture: limit ignore_errors to sys.platform == 'win32' so real cleanup issues surface on non-Windows platforms Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Chain exceptions in _load_catalog_config for both catalog classes Add 'from exc' to preserve root cause in tracebacks while keeping clean user-facing messages. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Make default catalog tests hermetic by isolating HOME Monkeypatch Path.home() to project_dir and clear catalog env vars so tests don't break on machines with a real ~/.specify/step-catalogs.yml or ~/.specify/workflow-catalogs.yml. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix falsy ID handling in _get_merged_steps for list-based catalogs Check for None explicitly instead of using 'or' which drops valid falsy IDs like 0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Compare reserved step IDs case-insensitively for filesystem safety On case-insensitive filesystems (Windows, common macOS), variants like STEP-REGISTRY.JSON would collide with the actual registry file. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add explanatory comments to intentional empty except blocks Document why cache-read failures are silently ignored in both WorkflowCatalog and StepCatalog _fetch_single_catalog methods. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> Co-authored-by: Manfred Riem <mnriem@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-16 18:03:45 -05:00
Huy Do	7bab0568c5	feat(workflows): add continue_on_error step field for non-halting failures (#2663 ) * feat(workflows): add continue_on_error step field Adds an optional `continue_on_error: bool` field on every step. When set to `true` and the step fails, the engine records the result (`exit_code`, `stderr` on `steps.<id>.output` plus `status` as a sibling key on `steps.<id>`) and continues to the next sibling step instead of halting the run. Downstream `if`, `switch`, or `gate` steps can then branch on `{{ steps.<id>.output.exit_code }}` to route the recovery path. Engine details -------------- `WorkflowEngine._execute_steps` now consults the step config when a step returns `StepStatus.FAILED`: - Gate aborts (`output.aborted`) always halt the run — operator decisions take precedence over the flag. - Otherwise, if `continue_on_error` is the literal `True`, log a `step_continue_on_error` event and proceed to the next sibling. The runtime check uses identity comparison (`is True`) rather than truthiness, so truthy non-bool values like the string `"true"` cannot silently change run semantics even if a caller bypasses `validate_workflow()`. - Otherwise, behave as before: log `step_failed`, set `RunStatus.FAILED`, and return. Validation ---------- `_validate_steps` rejects non-bool values for `continue_on_error`. Coerced strings like `"true"` are not accepted so authoring mistakes surface at validation time rather than silently changing run semantics. Tests ----- `TestContinueOnError` in `tests/test_workflows.py` (8 tests): - `test_undeclared_failure_halts_run` — default halt behaviour. - `test_declared_and_fired_continues_run` — flag + fail → continue. - `test_declared_but_step_succeeded_is_noop` — flag + success → no-op. - `test_if_branch_routes_around_failure` — end-to-end recovery. - `test_gate_abort_still_halts_with_continue_on_error` — abort always halts. - `test_validation_rejects_non_bool_continue_on_error` — `"true"` rejected at validation. - `test_validation_accepts_bool_continue_on_error` — `true`/`false` pass cleanly. - `test_engine_ignores_truthy_non_bool_continue_on_error` — defense-in-depth: engine ignores string `"true"` even when validation is bypassed. Rebased onto current upstream/main (post #2664 merge); the new `TestContinueOnError` class sits immediately after upstream's `TestContextRunId` so the two feature suites coexist cleanly. Closes #2591. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(workflows): restore runtime context section, clarify gate prompt Two Copilot findings on `d0b9e00`: 1. The `### Runtime Context` documentation for `{{ context.* }}` was lost during the rebase onto current main (the squash dropped the anchor where #2664 had added it). Restored under `## Expressions` so users can find `context.run_id` semantics and examples. 2. The continue_on_error example gate had message "Retry or skip?" but used the default `options: [approve, reject]` with `on_reject: skip`, which implied an automatic retry path that gates do not provide. Reworded the message to match the actual approve/reject semantics and added an explicit note that retry requires either custom gate options + downstream branching or a wrapper loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(workflows): clarify continue_on_error scope — returned FAILED only Copilot finding on `d0b9e00`: The README's "Error Handling" intro implied `continue_on_error` covers "any other runtime error raised during step execution", but the engine only consults the flag when a step returns `StepResult(status=FAILED, ...)`. Exceptions raised out of `step_impl.execute()` propagate to `WorkflowEngine.execute()`, where the catch-all logs `workflow_failed` and re-raises — the step result is never recorded, and the flag is never consulted. Audited the whole PR diff for the same overclaim: 1. workflows/README.md — main fix. Reworded the Error Handling intro to "any step that returns StepResult(status=FAILED, ...)" and promoted the parenthetical structural-validation note into the Notes block. Added a new "Scope: returned failures only" note that names the exception path explicitly and tells step authors how to bring the flag into scope for exceptional code (catch internally and return FAILED with the failure encoded in `output`). 2. tests/test_workflows.py — section comment used "when an executable step fails", same ambiguity. Tightened to "when a step returns StepResult(status=FAILED, ...)" and added a sentence calling out that unhandled exceptions are out of scope. 3. src/specify_cli/workflows/engine.py — already correct ("any step that returns FAILED" in the validator comment; "lets the pipeline route around the failure" in the execute path). No change. Engine semantics and test bodies are unchanged. Docs-only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(workflows): clarify on_reject:skip semantics — engine returns COMPLETED, not auto-skip Copilot finding on `b8982a7`: The README example's gate message said "reject to skip the rest of this branch", and the explanatory paragraph claimed [approve, reject] map to "continue" vs "skip the rest of this branch". The engine does not implement automatic branch-skipping. `on_reject: skip` returns `StepStatus.COMPLETED` (gate/__init__.py:65-66); the next sibling step runs unconditionally unless the author wires a downstream `if` reading `{{ steps.<gate-id>.output.choice }}`. Two fixes: 1. Restructured the YAML example so it actually demonstrates the manual-branching pattern: added a `recover` if-step after the gate that conditions on `steps.review.output.choice == 'approve'`. Now the example shows the real workflow author's responsibility instead of implying the engine does it. 2. Replaced the trailing paragraph with three precise notes: - both gate options return COMPLETED; `on_reject: skip` controls abort behaviour only, not sibling-skipping - all three `on_reject` values enumerated with their actual engine semantics (FAILED+aborted / COMPLETED / PAUSED) - the original retry-loop guidance retained as the third bullet Updated the gate message in the example to match — "reject to leave the failure recorded and move on" instead of "reject to skip the rest of this branch". Audited the whole PR diff for the same overclaim: no other instance. Engine semantics, validation, and test bodies are unchanged. Docs-only. 161/161 tests/test_workflows.py pass locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(workflows): clarify gate's role — surfaces, doesn't programmatically branch Audit follow-up to `393ac6b` — three sites repeated the same minor overclaim about gates being one of the "branch on it" step types alongside `if` and `switch`: 1. workflows/README.md (the "downstream `if`, `switch`, or `gate` steps can branch on it" sentence introducing the example) 2. engine.py:236 (validator inline comment) 3. engine.py:657 (execute-path inline comment) A `gate` step does not have a `condition` or `expression` field — it only evaluates expressions for `message` and `show_file` (gate/__init__.py:29,36). Programmatic branching happens in `if`/`switch`; a gate surfaces the value to a human operator via message interpolation, and the operator's choice is recorded in `output.choice` for a subsequent `if`/`switch` to route on. Reworded all three sites consistently: "a downstream `if` or `switch` can branch on it (or a `gate` can surface it to the operator via message interpolation)". The README example already demonstrates this distinction — the gate carries `{{ }}` template variables in its message and the `recover` if-step downstream is what actually branches on the choice. Engine semantics, validation, and test bodies are unchanged. Docs-only on the README; comment-only on engine.py. 161/161 tests/test_workflows.py pass locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(workflows): use qualified StepStatus.* instead of bare FAILED/COMPLETED/PAUSED Three Copilot inline comments on workflows/README.md lines 226, 282, 288 flagged that ``StepResult(status=FAILED, ...)`` is not valid Python — ``StepResult.status`` is a ``StepStatus`` enum value, so the documented form should be ``StepStatus.FAILED``. Audited the whole PR diff for the same shorthand. The bare unqualified form appears in three files added/modified by this PR: 1. workflows/README.md (6 sites) — three ``StepResult(status=FAILED, ...)`` parentheticals, plus the on_reject Notes bullet listing the three step statuses (``FAILED``, ``COMPLETED``, ``PAUSED``). 2. tests/test_workflows.py (4 sites) — section header for TestContinueOnError, two test-method docstrings, one inline comment about a gate's TTY-fallback behaviour. 3. src/specify_cli/workflows/engine.py (1 site) — the validator inline comment added in `d0b9e00` said "returns FAILED" where the engine code itself uses ``StepStatus.FAILED``. All 11 sites normalised to the qualified ``StepStatus.<name>`` form so the docs / test docstrings / inline comments match what readers will actually find in the engine code and the tests. Engine semantics, validation, and test bodies are unchanged. 161/161 tests/test_workflows.py pass locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-02 10:10:07 -05:00
Huy Do	c6afe4cde1	feat(workflows): expose {{ context.run_id }} template variable (#2664 ) * feat(workflows): expose `{{ context.run_id }}` template variable Closes #2590. Surfaces the engine-assigned run id (the same 8-character hex string Spec Kit prints as `Run ID:` at the end of `workflow run`) as a workflow template variable so YAML authors can reference it from shell `run:`, command `input.args:`, switch `expression:`, and any other field that already evaluates `{{ ... }}` templates. ### Why The run id is the natural join key between a Spec Kit workflow run and downstream artifacts, telemetry, or per-run scratch state. Today the operator sees it in stdout but workflows themselves cannot reference it — there was no way to stamp a log line, name a scratch directory, or tag an artifact with the same id Spec Kit assigned. The three motivating use cases from the issue: 1. Telemetry / observability — stamp logs and events with the run id so external systems can join workflow runs to downstream artifacts. 2. Per-run scratch / isolation — interactive operator commands that need their own state directory under `/tmp/run-<id>/`. 3. Run-id in artifact metadata — stable join key from artifact back to the producing run. ### Implementation `StepContext.run_id` is already populated by `WorkflowEngine` in both `execute()` and `resume()`. The only gap was the template namespace builder. `_build_namespace` (in `workflows/expressions.py`) now adds a `context` key alongside the existing `inputs`, `steps`, `item`, and `fan_in` namespaces: ```python ns["context"] = {"run_id": run_id} ``` The value is always present (even outside a run) and falls back to an empty string when no run is active. Workflows referencing `{{ context.run_id }}` therefore never error — a hard requirement from the issue's acceptance criteria for dry-run, validation, and ad-hoc evaluator usage. ### Default behaviour preserved Workflows that do not reference `{{ context.run_id }}` are byte-equivalent to before this change. The `context` namespace is added unconditionally to keep template resolution branch-free, but its presence has no observable effect when nothing references it. ### Tests `TestExpressions` (unit-level) gains three tests: - `test_context_run_id_resolves` — direct lookup against a `StepContext(run_id=...)`. - `test_context_run_id_defaults_to_empty_when_unset` — graceful default outside a run context. - `test_context_run_id_string_interpolation` — mixed template (e.g. `"RUN_ID={{ context.run_id }}"`). `TestContextRunId` (end-to-end) covers the three step types the acceptance criteria called out: - `test_shell_run_resolves_run_id` — `run:` field substitution, verified via captured stdout. - `test_command_input_args_resolves_run_id` — `input.args:` resolution, captured in step output even when CLI dispatch is unavailable (the artifact-metadata use case). - `test_switch_expression_matches_on_run_id` — switch matches against the resolved value, proving the run id is a first-class value in the expression engine, not just an interpolation token. - `test_workflow_without_context_reference_unchanged` — locks the byte-equivalent default required by the issue. ### Docs `workflows/README.md` gains a "Runtime Context" subsection under "Expressions" documenting the new namespace and the three canonical use patterns (telemetry, per-run scratch, artifact metadata). * test(workflows): drop inline double-quotes in run_id shell tests `test_shell_run_resolves_run_id` and `test_switch_expression_matches_on_run_id` used `run: 'echo "RUN_ID={{ context.run_id }}"'` with inner double-quotes around the echo argument. Bash/sh strips those quotes before invoking echo, but cmd.exe (used on Windows when `shell=True`) treats them as literal characters and emits `"RUN_ID=abc12345"` — failing the exact-match assertion. Linux passed; all three Windows-latest matrix entries failed with `assert '"RUN_ID=abc12345"' == 'RUN_ID=abc12345'`. Resolve by dropping the inner double-quotes (the value has no spaces or shell metacharacters) and wrapping the YAML scalar in plain double-quotes the same way other shell-step tests in this file do (e.g. `run: "echo b-saw-..."`). Behaviour-equivalent on POSIX, portable to cmd.exe.	2026-05-27 13:00:58 -05:00
Quratulain-bilal	409ec59704	fix(workflow): support integration: auto to follow project's initialized AI (#2421 ) * fix(workflow): support integration: auto to follow project's initialized AI Closes #2406 (squashed) * fix(workflow): combine JSONDecodeError and UnicodeDecodeError handling Address Copilot feedback: UnicodeDecodeError can be raised by both read_text() and json.loads(), so combining the handlers ensures both cases produce a consistent, clear error message. * fix(workflows): honor integration_state schema guard and modern state in 'integration: auto' Three Copilot follow-ups on PR #2421: 1. engine.py:799 — `_load_project_integration` was bypassing the same schema guard `_read_integration_json` enforces. It now reads the schema field directly, returns None on a future schema (so the workflow falls back to the literal 'auto' default rather than guessing), and routes through `normalize_integration_state` / `default_integration_key` so modern installs that record `default_integration` / `installed_integrations` (without the legacy top-level `integration` field) resolve correctly. 2. test_workflows.py — added two regression cases: - `integration: auto` resolves a modern normalized state file - `integration: auto` falls back when the state file declares a newer `integration_state_schema` than this CLI supports 3. test_cli.py — added a CLI-level regression for the `UnicodeDecodeError` branch in `_read_integration_json` to match the existing malformed-JSON coverage. * refactor(integration): extract shared try_read_integration_json helper Address Copilot review on PR #2421: Both `_read_integration_json` (CLI) and `_load_project_integration` (workflow engine) were parsing `.specify/integration.json` independently, duplicating the schema guard and risking drift between the two readers. Extract the parse + schema validation into a single low-level helper `try_read_integration_json` in `integration_state.py` that returns either the normalized state or a structured `IntegrationReadError`. Both callers now delegate to this helper: - CLI keeps its loud-fail UX: each error kind ("decode", "os", "not_object", "schema_too_new") is translated into the existing console message + typer.Exit(1). - Engine keeps its silent fallback: any error simply returns None so `integration: auto` falls back to the workflow's literal default. This eliminates the divergence Copilot flagged without changing observable behavior for either caller. * fix(integration): distinguish missing file from non-regular path Address Copilot review on PR #2421: `try_read_integration_json` was collapsing two distinct cases into a single `(None, None)` return: 1. `.specify/integration.json` truly missing — silent fallback is correct. 2. Path exists but is a directory, socket, or other non-regular file — this is a misconfiguration the CLI should surface loudly. Split the check: `exists()` falsey returns `(None, None)`; existing-but- not-a-regular-file returns `(None, IntegrationReadError(kind="os", ...))` so the CLI's loud-fail path produces an actionable error while the engine still treats it as a fallback to the workflow's literal default. * docs(workflow): clarify version pin, advisory integrations list, enum exemption - workflow.yml: fix comment that said 0.8.3 was first release with auto resolution; the pin is >=0.8.5 so the comment now matches the pin. - workflow.yml: clarify that requires.integrations.any is an advisory, non-exhaustive compatibility hint, not a closed set. - engine.py: clarify that the auto-sentinel exemption only skips enum membership; declared type is still enforced through _coerce_input. * fix(workflow): resolve auto sentinel for provided values; report stat errors Two Copilot findings fixed: 1. _resolve_inputs only resolved the ``integration: auto`` sentinel when it came from the input default. A caller explicitly providing ``{"integration": "auto"}`` (which the workflow prompt advertises as a valid value) bypassed _resolve_default and the literal "auto" leaked to dispatch. Provided values now go through the same resolution path as defaults, and the enum-membership exemption applies in both cases. Regression test added. 2. try_read_integration_json used Path.exists() / Path.is_file() as a pre-check. Both return False on some OSErrors (e.g. permission errors during stat), which silently treated an unreadable-but-present file as missing — the engine fell back without warning and the CLI failed to surface the loud error. The pre-check is gone: read_text() is attempted directly, FileNotFoundError means missing (silent fallback), IsADirectoryError and other OSErrors become loud IntegrationReadError. * fix(workflow): enforce declared type for string inputs, reject bool-as-number Two Copilot findings fixed: 1. _coerce_input previously coerced/validated only ``number`` and ``boolean`` types, so ``type: string`` silently accepted any Python value (numbers, lists, dicts). A YAML authoring mistake like ``type: string`` + ``default: 5`` slipped through. Strings are now required to actually be strings; non-strings raise ValueError, which surfaces as an ``invalid default`` error from validate_workflow. 2. ``type: number`` accepted ``default: true`` because ``bool`` is a subclass of ``int`` (``float(True) == 1.0``). Bools are now rejected explicitly in the number path so the YAML mistake fails fast. The boolean path is also tightened to reject non-bool / non-string values for symmetry. Comment on the auto-sentinel enum exemption updated to reflect the stronger guarantee. Regression tests added for both rejections. * fix(cli): drop unused normalize_integration_state import to satisfy ruff CI's `uvx ruff check src/` flagged this as F401: the symbol was imported under a private alias but never referenced. Tests stay green after removal.	2026-05-15 16:03:33 -05:00
Manfred Riem	02a1d610df	docs: add workflows reference, reorganize into docs/reference/, and add --version flag (#2244 ) * docs: add workflows reference, reorganize into docs/reference/, and add --version flag - Move integrations.md, extensions.md, presets.md into docs/reference/ - New docs/reference/workflows.md: command reference for all workflow commands, built-in SDD Cycle workflow with Mermaid diagram, step types, expressions, input types, state/resume, and FAQ - Rename workflow input feature_name to spec with prompt 'Describe what you want to build' to match speckit.specify command terminology - Add --version / -V flag to root specify command with tests - Update docs/toc.yml, README.md links, and docs/upgrade.md cross-reference to use reference/ paths - Add workflow command to README CLI reference table * docs: update speckit_version requirement to >=0.7.2 in workflow example	2026-04-16 13:34:08 -05:00
Copilot	a00e679918	Add workflow engine with catalog system (#2158 ) * Initial plan * Add workflow engine with step registry, expression engine, catalog system, and CLI commands Agent-Logs-Url: https://github.com/github/spec-kit/sessions/72a7bb5d-071f-4d67-a507-7e1abae2384d Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Add comprehensive tests for workflow engine (94 tests) Agent-Logs-Url: https://github.com/github/spec-kit/sessions/72a7bb5d-071f-4d67-a507-7e1abae2384d Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Address review feedback: do-while condition preservation and URL scheme validation Agent-Logs-Url: https://github.com/github/spec-kit/sessions/72a7bb5d-071f-4d67-a507-7e1abae2384d Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> * Address review feedback, add CLI dispatch, interactive gates, and docs Review comments (7/7): - Add explanatory comment to empty except block - Implement workflow catalog download with cleanup on failure - Add input type coercion for number/boolean/enum - Fix example workflow to remove non-existent output references - Fix while_loop and if_then condition defaults (string 'false' → bool False) - Fix resume step index tracking with step_offset parameter CLI dispatch: - Add build_exec_args() and dispatch_command() to IntegrationBase - Override for Claude (skills: /speckit-specify), Gemini (-m flag), Codex (codex exec), Copilot (--agent speckit.specify) - CommandStep invokes installed commands by name via integration CLI - Add PromptStep for arbitrary inline prompts (10th step type) - Stream CLI output live to terminal (no silent blocking) - Remove timeout when streaming (user can Ctrl+C) - Ctrl+C saves state as PAUSED for clean resume Interactive gates: - Gate steps prompt [1] approve [2] reject in TTY - Fall back to PAUSED in non-interactive environments - Resume re-executes the gate for interactive prompting Documentation: - workflows/README.md — user guide - workflows/ARCHITECTURE.md — internals with Mermaid diagrams - workflows/PUBLISHING.md — catalog submission guide Tests: 94 → 122 workflow tests, 1362 total (all passing) * Fix ruff lint errors: unused imports, f-string placeholders, undefined name * Address second review: registry-backed validation, shell failures, loop/fan-out execution, URL validation - VALID_STEP_TYPES now queries STEP_REGISTRY dynamically - Shell step returns FAILED on non-zero exit code - Persist workflow YAML in run directory for reliable resume - Resume loads from run copy, falls back to installed workflow - Engine iterates while/do-while loops up to max_iterations - Engine expands fan-out per item with context.item - HTTPS URL validation for catalog workflow installs (HTTP allowed for localhost) - Fix catalog merge priority docstring (lower number wins) - Fix dispatch_command docstring (no build_exec_args_for_command) - Gate on_reject=retry pauses for re-prompt on resume - Update docs to 10 step types, add prompt step to tables and README * Potential fix for pull request finding 'Empty except' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> * Address third review: fan-out IDs, catalog guards, shell coercion, docs - Fan-out generates unique per-item step IDs and collects results - Catalog merge skips non-dict workflow entries (malformed data guard) - Shell step coerces run_cmd to str after expression evaluation - urlopen timeout=30 for catalog workflow installs - yaml.dump with sort_keys=False, allow_unicode=True for catalog configs - Document streaming timeout as intentionally unbounded (user Ctrl+C) - Document --allow-all-tools as required for non-interactive + future enhancement - Update test docstring and PUBLISHING.md to 10 step types with prompt * Validate final URL after redirects in catalog fetch urlopen follows redirects, so validate the response URL against the same HTTPS/localhost rules to prevent redirect-based downgrade attacks. * Address fourth review: filter arg eval, tags normalization, install redirect check - Filter arguments now evaluated via _evaluate_simple_expression() so default(42) returns int not string - Tags normalized: non-list/non-string values handled gracefully - Install URL redirect validation (same as catalog fetch) - Remove unused 'skipped' variable in catalog config parsing - Author 'github' → 'GitHub' in example workflow - Document nested step resume limitation (re-runs parent step) * Add explanatory comment to empty except ValueError block * Address fifth review: expression parsing, fan-out output, URL install, gate options - Move string literal parsing before operator detection in expressions so quoted strings with operators (e.g. 'a in b') are not mis-parsed - Fan-out: remove max_concurrency from persisted output, fix docstring to reflect sequential execution - workflow add: support URL sources with HTTPS/redirect validation, validate workflow ID is non-empty before writing files - Deduplicate local install logic via _validate_and_install_local() - Remove 'edit' gate option from speckit workflow (not implemented) * Add comments to empty except ValueError blocks in URL install * Address sixth review: operator precedence, fan_in cleanup, registry resilience, docs - Fix or/and operator precedence (or parsed first = lower precedence) - Restore context.fan_in after fan-in step completes - Catch JSONDecodeError in registry load for corrupted files - Replace print() with on_step_start callback (library-safe) - Gate validation warns when on_reject set but no reject option - Shell step: document shell=True security tradeoff - README: sdd-pipeline → speckit, parallel → sequential for fan-out - ARCHITECTURE.md: parallel → fan-out/fan-in in diagram * Address seventh review: string literal before pipe, type annotations, validate on install - Move string literal check above pipe filter parsing so 'a \| b' works - Fix type annotations: input_values list[str] \| None, run_id str \| None - Run validate_workflow() before installing from local path/URL - Remove duplicate string literal check from expression parser * Address eighth review: fan-out namespaced IDs, early return, catalog validation - Fan-out per-item step IDs use _fanout_{step_id}_{base}_{idx} namespace to avoid collisions with user-defined step IDs - Early return after fan-out loop when state is paused/failed/aborted - Catalog installs parse + validate downloaded YAML before registering, using definition metadata instead of catalog entry for registry * Address ninth review: populate catalog, fix indentation, priority, README - Add speckit workflow entry to catalog.json so it's discoverable - Fix shell step output dict indentation - Catalog add_catalog priority derived from max existing + 1 - README Quick Start clarified with install + local file examples * Address tenth review: max_iterations validation, catalog config guard, version alignment - Validate max_iterations is int >= 1 in while and do-while steps - Guard add_catalog against corrupted config (non-dict/non-list) - Align speckit_version requirement to >=0.6.1 (current package version) - Fan-out template validation uses separate seen_ids set to avoid false duplication errors with user-defined step IDs * Address eleventh review: command step fails without CLI, ID mismatch warning, state persistence - Command step returns FAILED when CLI not installed (was silent COMPLETED) - Catalog install warns on workflow ID vs catalog key mismatch - Engine persists state.save() before returning on unknown step type - Update tests to expect FAILED for command steps without CLI - Integration tests use shell steps for CLI-independent execution * Address twelfth review: type annotations, version examples, streaming docs, requires - Fix workflow_search type annotations (str \| None) - PUBLISHING.md: speckit_version >=0.15.0 → >=0.6.1 - Document that exit_code is captured and referenceable by later steps - Mark requires as declared-but-not-enforced (planned enhancement) - Note full stdout/stderr capture as planned enhancement * Enforce catalog key matches workflow ID (fail instead of warn) * Bundle speckit workflow: auto-install during specify init - Add workflows/speckit to pyproject.toml force-include for wheel builds - Add _locate_bundled_workflow() helper (mirrors _locate_bundled_extension) - Auto-install speckit workflow during specify init (after git extension) - Update all integration file inventory tests to expect workflow files * Address fourteenth review: prompt fails without CLI, resolved step data, fan-out normalization - PromptStep returns FAILED when CLI not installed (was silent COMPLETED) - Engine step_data prefers resolved values from step output - Fan-out normalizes output.results=[] for empty item lists - subprocess.run inherits stdout/stderr (no explicit sys.stdout) - Registry tests use issubset for extensibility * Address fifteenth review: fan_in docstring, gate defaults, validation guards, reserved prefix - FanInStep docstring: aggregate-only, no blocking semantics - FanInStep: guard output_config as dict, handle None - Gate validate: use same default options as execute - Validate inputs is dict and steps is list before iterating - Reserve _fanout_ prefix in step ID validation - PUBLISHING.md: remove unenforced checklist items, add _fanout_ note * Address sixteenth review: docs regex, fan_in try/finally, hyphenated dot-path keys - PUBLISHING.md: update ID regex docs to match implementation (single-char OK) - FanInStep: wrap expression evaluation in try/finally for context.fan_in - Expression dot-path: allow hyphens in keys before list index (e.g. run-tests[0]) * Make speckit workflow integration-agnostic, document Copilot CLI requirement - Workflow integration selectable via input (default: claude) - Each command step uses {{ inputs.integration }} instead of hardcoded copilot - Copilot docstring documents CLI requirement for workflow dispatch - Added install_url for Copilot CLI docs * Address seventeenth review: project checks, catalog robustness - Add .specify/ project check to workflow run/resume/status/search/info - remove_catalog validates config shape (dict + list) before indexing - _fetch_single_catalog validates response is a dict - _get_merged_workflows raises when all catalogs fail to fetch - add_catalog guards against non-dict catalog entries in config * Address eighteenth review: condition coercion, gate abort result, while default, cache guard, resume log - evaluate_condition treats plain 'false'/'true' strings as booleans - Gate abort returns StepResult(FAILED) instead of raising exception so step output is persisted in state for inspection - while_loop max_iterations optional (default 10), validation aligned - Catalog cache fallback catches invalid JSON gracefully - resume() appends workflow_finished log entry like execute() * Address nineteenth review: allow-all-tools opt-in, empty catalogs, abort dead code, while docstring - --allow-all-tools controlled by SPECKIT_ALLOW_ALL_TOOLS env var (default: 1) Set to 0 to disable automatic tool approval for Copilot CLI - Empty catalogs list falls back to built-in defaults (not an error) - Remove unreachable WorkflowAbortError catches from execute/resume (gate abort now returns StepResult(FAILED) instead of raising) - while_loop docstring updated: max_iterations is optional (default 10) * Address twentieth review: gate abort maps to ABORTED status, do-while max_iterations optional - Engine detects output.aborted from gate step and sets RunStatus.ABORTED (was unreachable — gate abort returned FAILED but status was always FAILED) - do-while max_iterations now optional (default 10), aligned with while_loop - do-while docstring and validation updated accordingly * Coerce default_options to dict, align bundled workflow ID regex with validator * Gate validates string options, prompt uses resolved integration, loop normalizes max_iterations * Use parentId:childId convention for nested step IDs - Fan-out per-item IDs use parentId:templateId:index (e.g. parallel:impl:0) - Reserve ':' in user step IDs (validation rejects) - Replaces _fanout_ prefix with cleaner namespacing - Expressions like {{ steps.parallel:impl:0.output.file }} work naturally * Validate workflow version is semantic versioning (X.Y.Z) * Schema version validation, strict semver, load_workflow docstring, preserve max_concurrency - Validate schema_version is '1.0' (reject unknown future schemas) - Strict semver regex: ^\d+\.\d+\.\d+$ (rejects 1.0.0beta etc.) - load_workflow docstring: 'parsed' not 'validated' - Keep max_concurrency in fan-out output (was dropped) - do_while docstring: engine re-evaluates step_config condition - ARCHITECTURE.md: document nested resume limitation * Path traversal prevention, loop step ID namespacing - RunState validates run_id is alphanumeric+hyphens (no path separators) - workflow_add validates catalog source doesn't escape workflows_dir - Loop iterations namespace nested step IDs as parentId:childId:iteration so multiple iterations don't overwrite each other in context/state --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>	2026-04-14 10:11:56 -05:00

8 Commits