mirror of https://github.com/github/spec-kit.git synced 2026-07-03 12:28:06 +08:00

Files

Copilot a00e679918 Add workflow engine with catalog system (#2158 )

* Initial plan

* Add workflow engine with step registry, expression engine, catalog system, and CLI commands

Agent-Logs-Url: https://github.com/github/spec-kit/sessions/72a7bb5d-071f-4d67-a507-7e1abae2384d

Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>

* Add comprehensive tests for workflow engine (94 tests)

Agent-Logs-Url: https://github.com/github/spec-kit/sessions/72a7bb5d-071f-4d67-a507-7e1abae2384d

Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>

* Address review feedback: do-while condition preservation and URL scheme validation

Agent-Logs-Url: https://github.com/github/spec-kit/sessions/72a7bb5d-071f-4d67-a507-7e1abae2384d

Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>

* Address review feedback, add CLI dispatch, interactive gates, and docs

Review comments (7/7):
- Add explanatory comment to empty except block
- Implement workflow catalog download with cleanup on failure
- Add input type coercion for number/boolean/enum
- Fix example workflow to remove non-existent output references
- Fix while_loop and if_then condition defaults (string 'false' → bool False)
- Fix resume step index tracking with step_offset parameter

CLI dispatch:
- Add build_exec_args() and dispatch_command() to IntegrationBase
- Override for Claude (skills: /speckit-specify), Gemini (-m flag),
  Codex (codex exec), Copilot (--agent speckit.specify)
- CommandStep invokes installed commands by name via integration CLI
- Add PromptStep for arbitrary inline prompts (10th step type)
- Stream CLI output live to terminal (no silent blocking)
- Remove timeout when streaming (user can Ctrl+C)
- Ctrl+C saves state as PAUSED for clean resume

Interactive gates:
- Gate steps prompt [1] approve [2] reject in TTY
- Fall back to PAUSED in non-interactive environments
- Resume re-executes the gate for interactive prompting

Documentation:
- workflows/README.md — user guide
- workflows/ARCHITECTURE.md — internals with Mermaid diagrams
- workflows/PUBLISHING.md — catalog submission guide

Tests: 94 → 122 workflow tests, 1362 total (all passing)

* Fix ruff lint errors: unused imports, f-string placeholders, undefined name

* Address second review: registry-backed validation, shell failures, loop/fan-out execution, URL validation

- VALID_STEP_TYPES now queries STEP_REGISTRY dynamically
- Shell step returns FAILED on non-zero exit code
- Persist workflow YAML in run directory for reliable resume
- Resume loads from run copy, falls back to installed workflow
- Engine iterates while/do-while loops up to max_iterations
- Engine expands fan-out per item with context.item
- HTTPS URL validation for catalog workflow installs (HTTP allowed for localhost)
- Fix catalog merge priority docstring (lower number wins)
- Fix dispatch_command docstring (no build_exec_args_for_command)
- Gate on_reject=retry pauses for re-prompt on resume
- Update docs to 10 step types, add prompt step to tables and README

* Potential fix for pull request finding 'Empty except'

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

* Address third review: fan-out IDs, catalog guards, shell coercion, docs

- Fan-out generates unique per-item step IDs and collects results
- Catalog merge skips non-dict workflow entries (malformed data guard)
- Shell step coerces run_cmd to str after expression evaluation
- urlopen timeout=30 for catalog workflow installs
- yaml.dump with sort_keys=False, allow_unicode=True for catalog configs
- Document streaming timeout as intentionally unbounded (user Ctrl+C)
- Document --allow-all-tools as required for non-interactive + future enhancement
- Update test docstring and PUBLISHING.md to 10 step types with prompt

* Validate final URL after redirects in catalog fetch

urlopen follows redirects, so validate the response URL against the
same HTTPS/localhost rules to prevent redirect-based downgrade attacks.

* Address fourth review: filter arg eval, tags normalization, install redirect check

- Filter arguments now evaluated via _evaluate_simple_expression() so
  default(42) returns int not string
- Tags normalized: non-list/non-string values handled gracefully
- Install URL redirect validation (same as catalog fetch)
- Remove unused 'skipped' variable in catalog config parsing
- Author 'github' → 'GitHub' in example workflow
- Document nested step resume limitation (re-runs parent step)

* Add explanatory comment to empty except ValueError block

* Address fifth review: expression parsing, fan-out output, URL install, gate options

- Move string literal parsing before operator detection in expressions
  so quoted strings with operators (e.g. 'a in b') are not mis-parsed
- Fan-out: remove max_concurrency from persisted output, fix docstring
  to reflect sequential execution
- workflow add: support URL sources with HTTPS/redirect validation,
  validate workflow ID is non-empty before writing files
- Deduplicate local install logic via _validate_and_install_local()
- Remove 'edit' gate option from speckit workflow (not implemented)

* Add comments to empty except ValueError blocks in URL install

* Address sixth review: operator precedence, fan_in cleanup, registry resilience, docs

- Fix or/and operator precedence (or parsed first = lower precedence)
- Restore context.fan_in after fan-in step completes
- Catch JSONDecodeError in registry load for corrupted files
- Replace print() with on_step_start callback (library-safe)
- Gate validation warns when on_reject set but no reject option
- Shell step: document shell=True security tradeoff
- README: sdd-pipeline → speckit, parallel → sequential for fan-out
- ARCHITECTURE.md: parallel → fan-out/fan-in in diagram

* Address seventh review: string literal before pipe, type annotations, validate on install

- Move string literal check above pipe filter parsing so 'a | b' works
- Fix type annotations: input_values list[str] | None, run_id str | None
- Run validate_workflow() before installing from local path/URL
- Remove duplicate string literal check from expression parser

* Address eighth review: fan-out namespaced IDs, early return, catalog validation

- Fan-out per-item step IDs use _fanout_{step_id}_{base}_{idx} namespace
  to avoid collisions with user-defined step IDs
- Early return after fan-out loop when state is paused/failed/aborted
- Catalog installs parse + validate downloaded YAML before registering,
  using definition metadata instead of catalog entry for registry

* Address ninth review: populate catalog, fix indentation, priority, README

- Add speckit workflow entry to catalog.json so it's discoverable
- Fix shell step output dict indentation
- Catalog add_catalog priority derived from max existing + 1
- README Quick Start clarified with install + local file examples

* Address tenth review: max_iterations validation, catalog config guard, version alignment

- Validate max_iterations is int >= 1 in while and do-while steps
- Guard add_catalog against corrupted config (non-dict/non-list)
- Align speckit_version requirement to >=0.6.1 (current package version)
- Fan-out template validation uses separate seen_ids set to avoid
  false duplication errors with user-defined step IDs

* Address eleventh review: command step fails without CLI, ID mismatch warning, state persistence

- Command step returns FAILED when CLI not installed (was silent COMPLETED)
- Catalog install warns on workflow ID vs catalog key mismatch
- Engine persists state.save() before returning on unknown step type
- Update tests to expect FAILED for command steps without CLI
- Integration tests use shell steps for CLI-independent execution

* Address twelfth review: type annotations, version examples, streaming docs, requires

- Fix workflow_search type annotations (str | None)
- PUBLISHING.md: speckit_version >=0.15.0 → >=0.6.1
- Document that exit_code is captured and referenceable by later steps
- Mark requires as declared-but-not-enforced (planned enhancement)
- Note full stdout/stderr capture as planned enhancement

* Enforce catalog key matches workflow ID (fail instead of warn)

* Bundle speckit workflow: auto-install during specify init

- Add workflows/speckit to pyproject.toml force-include for wheel builds
- Add _locate_bundled_workflow() helper (mirrors _locate_bundled_extension)
- Auto-install speckit workflow during specify init (after git extension)
- Update all integration file inventory tests to expect workflow files

* Address fourteenth review: prompt fails without CLI, resolved step data, fan-out normalization

- PromptStep returns FAILED when CLI not installed (was silent COMPLETED)
- Engine step_data prefers resolved values from step output
- Fan-out normalizes output.results=[] for empty item lists
- subprocess.run inherits stdout/stderr (no explicit sys.stdout)
- Registry tests use issubset for extensibility

* Address fifteenth review: fan_in docstring, gate defaults, validation guards, reserved prefix

- FanInStep docstring: aggregate-only, no blocking semantics
- FanInStep: guard output_config as dict, handle None
- Gate validate: use same default options as execute
- Validate inputs is dict and steps is list before iterating
- Reserve _fanout_ prefix in step ID validation
- PUBLISHING.md: remove unenforced checklist items, add _fanout_ note

* Address sixteenth review: docs regex, fan_in try/finally, hyphenated dot-path keys

- PUBLISHING.md: update ID regex docs to match implementation (single-char OK)
- FanInStep: wrap expression evaluation in try/finally for context.fan_in
- Expression dot-path: allow hyphens in keys before list index (e.g. run-tests[0])

* Make speckit workflow integration-agnostic, document Copilot CLI requirement

- Workflow integration selectable via input (default: claude)
- Each command step uses {{ inputs.integration }} instead of hardcoded copilot
- Copilot docstring documents CLI requirement for workflow dispatch
- Added install_url for Copilot CLI docs

* Address seventeenth review: project checks, catalog robustness

- Add .specify/ project check to workflow run/resume/status/search/info
- remove_catalog validates config shape (dict + list) before indexing
- _fetch_single_catalog validates response is a dict
- _get_merged_workflows raises when all catalogs fail to fetch
- add_catalog guards against non-dict catalog entries in config

* Address eighteenth review: condition coercion, gate abort result, while default, cache guard, resume log

- evaluate_condition treats plain 'false'/'true' strings as booleans
- Gate abort returns StepResult(FAILED) instead of raising exception
  so step output is persisted in state for inspection
- while_loop max_iterations optional (default 10), validation aligned
- Catalog cache fallback catches invalid JSON gracefully
- resume() appends workflow_finished log entry like execute()

* Address nineteenth review: allow-all-tools opt-in, empty catalogs, abort dead code, while docstring

- --allow-all-tools controlled by SPECKIT_ALLOW_ALL_TOOLS env var (default: 1)
  Set to 0 to disable automatic tool approval for Copilot CLI
- Empty catalogs list falls back to built-in defaults (not an error)
- Remove unreachable WorkflowAbortError catches from execute/resume
  (gate abort now returns StepResult(FAILED) instead of raising)
- while_loop docstring updated: max_iterations is optional (default 10)

* Address twentieth review: gate abort maps to ABORTED status, do-while max_iterations optional

- Engine detects output.aborted from gate step and sets RunStatus.ABORTED
  (was unreachable — gate abort returned FAILED but status was always FAILED)
- do-while max_iterations now optional (default 10), aligned with while_loop
- do-while docstring and validation updated accordingly

* Coerce default_options to dict, align bundled workflow ID regex with validator

* Gate validates string options, prompt uses resolved integration, loop normalizes max_iterations

* Use parentId:childId convention for nested step IDs

- Fan-out per-item IDs use parentId:templateId:index (e.g. parallel:impl:0)
- Reserve ':' in user step IDs (validation rejects)
- Replaces _fanout_ prefix with cleaner namespacing
- Expressions like {{ steps.parallel:impl:0.output.file }} work naturally

* Validate workflow version is semantic versioning (X.Y.Z)

* Schema version validation, strict semver, load_workflow docstring, preserve max_concurrency

- Validate schema_version is '1.0' (reject unknown future schemas)
- Strict semver regex: ^\d+\.\d+\.\d+$ (rejects 1.0.0beta etc.)
- load_workflow docstring: 'parsed' not 'validated'
- Keep max_concurrency in fan-out output (was dropped)
- do_while docstring: engine re-evaluates step_config condition
- ARCHITECTURE.md: document nested resume limitation

* Path traversal prevention, loop step ID namespacing

- RunState validates run_id is alphanumeric+hyphens (no path separators)
- workflow_add validates catalog source doesn't escape workflows_dir
- Loop iterations namespace nested step IDs as parentId:childId:iteration
  so multiple iterations don't overwrite each other in context/state

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

2026-04-14 10:11:56 -05:00

9.8 KiB

Raw Blame History

Workflow System Architecture

This document describes the internal architecture of the workflow engine — how definitions are parsed, steps are dispatched, state is persisted, and catalogs are resolved.

For usage instructions, see README.md.

Execution Model

When specify workflow run is invoked, the engine loads a YAML definition, resolves inputs, and dispatches steps sequentially through the step registry:

flowchart TD
    A["specify workflow run my-workflow"] --> B["WorkflowEngine.load_workflow()"]
    B --> C["WorkflowDefinition.from_yaml()"]
    C --> D["_resolve_inputs()"]
    D --> E["validate_workflow()"]
    E --> F["RunState.create()"]
    F --> G["_execute_steps()"]
    G --> H{Step type?}
    H -- command --> I["CommandStep.execute()"]
    H -- shell --> J["ShellStep.execute()"]
    H -- gate --> K["GateStep.execute()"]
    H -- "if" --> L["IfThenStep.execute()"]
    H -- switch --> M["SwitchStep.execute()"]
    H -- "while/do-while" --> N["Loop steps"]
    H -- "fan-out/fan-in" --> O["Fan-out/fan-in"]

    I --> P{Result status?}
    J --> P
    K --> P
    L --> P
    M --> P
    N --> P
    O --> P
    P -- COMPLETED --> Q{Has next_steps?}
    P -- PAUSED --> R["Save state → exit"]
    P -- FAILED --> S["Log error → exit"]
    Q -- Yes --> G
    Q -- No --> T{More steps?}
    T -- Yes --> G
    T -- No --> U["Status = COMPLETED"]

    style R fill:#ff9800,color:#fff
    style S fill:#f44336,color:#fff
    style U fill:#4caf50,color:#fff

Sequential Execution

Steps execute sequentially. Each step receives a StepContext containing resolved inputs, accumulated step results, and workflow-level defaults. After execution, the step's output is stored in context.steps[step_id] and made available to subsequent steps via expressions like {{ steps.specify.output.file }}.

Nested Steps (Control Flow)

Steps like if, switch, while, and do-while return next_steps — inline step definitions that the engine executes recursively via _execute_steps(). Nested steps share the same StepContext and RunState, so their outputs are visible to later top-level steps.

State Persistence and Resume

The engine saves RunState to disk after each step, enabling resume from the exact point of interruption:

flowchart LR
    A["CREATED"] --> B["RUNNING"]
    B --> C["COMPLETED"]
    B --> D["PAUSED"]
    B --> E["FAILED"]
    B --> F["ABORTED"]
    D -- "resume()" --> B
    E -- "resume()" --> B

When a gate step pauses execution, the engine persists current_step_index and all accumulated step_results. On specify workflow resume <run_id>, the engine restores the context and continues from the paused step.

Note: Resume tracking is at the top-level step index only. If a nested step (inside if/switch/while) pauses, resume re-runs the parent control-flow step and its nested body. A nested step-path stack for exact resume is a planned enhancement.

Step Types

The engine ships with 10 built-in step types, each in its own subpackage under src/specify_cli/workflows/steps/:

Type Key	Class	Purpose	Returns `next_steps`?
`command`	`CommandStep`	Invoke an installed Spec Kit command via integration CLI	No
`prompt`	`PromptStep`	Send an arbitrary inline prompt to integration CLI	No
`shell`	`ShellStep`	Run a shell command, capture output	No
`gate`	`GateStep`	Interactive human review/approval	No (pauses in CI)
`if`	`IfThenStep`	Conditional branching (then/else)	Yes
`switch`	`SwitchStep`	Multi-branch dispatch on expression	Yes
`while`	`WhileStep`	Loop while condition is truthy	Yes (if true)
`do-while`	`DoWhileStep`	Loop, always runs body at least once	Yes (always)
`fan-out`	`FanOutStep`	Dispatch per item over a collection	No (engine expands)
`fan-in`	`FanInStep`	Aggregate results from fan-out	No

Step Registry

All step types register into STEP_REGISTRY via _register_builtin_steps() in src/specify_cli/workflows/__init__.py. The registry maps type_key strings to step instances:

STEP_REGISTRY: dict[str, StepBase]  # e.g., {"command": CommandStep(), "gate": GateStep(), ...}

Registration is explicit — each step class is imported and instantiated. New step types follow the same pattern: subclass StepBase, set type_key, implement execute() and optionally validate().

Expression Engine

Workflow definitions use Jinja2-like {{ expression }} syntax for dynamic values. The expression engine in src/specify_cli/workflows/expressions.py supports:

Feature	Syntax	Example
Variable access	`{{ inputs.name }}`	Dot-path traversal into context
Step outputs	`{{ steps.plan.output.file }}`	Access previous step results
Comparisons	`==`, `!=`, `>`, `<`, `>=`, `<=`	`{{ count > 5 }}`
Boolean logic	`and`, `or`, `not`	`{{ items and status == 'ok' }}`
Membership	`in`, `not in`	`{{ 'error' not in status }}`
Literals	strings, numbers, booleans, lists	`{{ true }}`, `{{ [1, 2] }}`
Filter: `default`	`{{ val \| default('fallback') }}`	Fallback for None/empty
Filter: `join`	`{{ list \| join(', ') }}`	Join list elements
Filter: `contains`	`{{ text \| contains('sub') }}`	Substring/membership check
Filter: `map`	`{{ list \| map('attr') }}`	Extract attribute from each item

Single expressions ({{ expr }} only) return typed values. Mixed templates ("text {{ expr }} more") return interpolated strings.

Namespace

The expression evaluator builds a namespace from the StepContext:

Key	Source	Available when
`inputs`	Resolved workflow inputs	Always
`steps`	Accumulated step results	After first step
`item`	Current iteration item	Inside fan-out
`fan_in`	Aggregated results	Inside fan-in

Input Resolution

When a workflow is executed, _resolve_inputs() validates and coerces provided values against the inputs: schema:

Declared Type	Coercion	Example
`string`	None (pass-through)	`"my-feature"`
`number`	`float()` → `int()` if whole	`"42"` → `42`
`boolean`	`"true"/"1"/"yes"` → `True`	`"false"` → `False`
`enum`	Validates against allowed values	`["full", "backend-only"]`

Missing required inputs raise ValueError. Inputs with default values use the default when not provided.

Catalog System

flowchart TD
    A["specify workflow search"] --> B["WorkflowCatalog.get_active_catalogs()"]
    B --> C{SPECKIT_WORKFLOW_CATALOG_URL set?}
    C -- Yes --> D["Single custom catalog"]
    C -- No --> E{.specify/workflow-catalogs.yml exists?}
    E -- Yes --> F["Project-level catalog stack"]
    E -- No --> G{"~/.specify/workflow-catalogs.yml exists?"}
    G -- Yes --> H["User-level catalog stack"]
    G -- No --> I["Built-in defaults"]
    I --> J["default (install allowed)"]
    I --> K["community (discovery only)"]

    style D fill:#ff9800,color:#fff
    style F fill:#2196f3,color:#fff
    style H fill:#2196f3,color:#fff
    style J fill:#4caf50,color:#fff
    style K fill:#9e9e9e,color:#fff

Catalogs are fetched with a 1-hour cache (per-URL, SHA256-hashed cache files in .specify/workflows/.cache/). Each catalog entry has a priority (for merge ordering) and install_allowed flag.

When specify workflow add <id> installs from catalog, it downloads the workflow YAML from the catalog entry's url field into .specify/workflows/<id>/workflow.yml.

State and Configuration Locations

Component	Location	Format	Purpose
Workflow definitions	`.specify/workflows/{id}/workflow.yml`	YAML	Installed workflow definitions
Workflow registry	`.specify/workflows/workflow-registry.json`	JSON	Installed workflows metadata
Run state	`.specify/workflows/runs/{run_id}/state.json`	JSON	Persisted execution state
Run inputs	`.specify/workflows/runs/{run_id}/inputs.json`	JSON	Resolved input values
Run log	`.specify/workflows/runs/{run_id}/log.jsonl`	JSONL	Append-only event log
Catalog cache	`.specify/workflows/.cache/*.json`	JSON	Cached catalog entries (1hr TTL)
Project catalogs	`.specify/workflow-catalogs.yml`	YAML	Project-level catalog sources
User catalogs	`~/.specify/workflow-catalogs.yml`	YAML	User-level catalog sources

Module Structure

src/specify_cli/
├── workflows/
│   ├── __init__.py          # STEP_REGISTRY + _register_builtin_steps()
│   ├── base.py              # StepBase, StepContext, StepResult, StepStatus, RunStatus
│   ├── catalog.py           # WorkflowCatalog, WorkflowCatalogEntry, WorkflowRegistry
│   ├── engine.py            # WorkflowDefinition, WorkflowEngine, RunState, validate_workflow()
│   ├── expressions.py       # evaluate_expression(), evaluate_condition(), filters
│   └── steps/
│       ├── command/         # Dispatch command to AI integration
│       ├── shell/           # Run shell command
│       ├── gate/            # Human review checkpoint
│       ├── if_then/         # Conditional branching
│       ├── prompt/          # Arbitrary inline prompts
│       ├── switch/          # Multi-branch dispatch
│       ├── while_loop/      # While loop
│       ├── do_while/        # Do-while loop
│       ├── fan_out/         # Sequential per-item dispatch
│       └── fan_in/          # Result aggregation
└── __init__.py              # CLI commands: specify workflow run/resume/status/
                             #   list/add/remove/search/info,
                             #   specify workflow catalog list/add/remove

9.8 KiB Raw Blame History