Commit Graph

207 Commits

Author SHA1 Message Date
darion-yaphet
826e193cee refactor: move extension command handlers to extensions/_commands.py (PR-7/8) (#3014)
* refactor: move extension command handlers to extensions/_commands.py (PR-7/8)

Convert the flat extensions.py module into an extensions/ package and
extract all extension_app and catalog_app command handlers plus their
private helpers (_resolve_installed_extension, _resolve_catalog_extension,
_print_extension_info) out of __init__.py into the new
extensions/_commands.py, mirroring the domain-dir layout used for
presets/_commands.py (PR-6) and integrations/_commands.py (PR-5).

- extensions.py -> extensions/__init__.py (pure rename, 99%); intra-module
  relative imports bumped from `.x` to `..x` since they reference root
  siblings.
- Root helpers (_require_specify_project, _locate_bundled_extension,
  load_init_options, _display_project_path) are reached through thin shims
  that re-fetch from the parent package at call time, so test
  monkeypatching of specify_cli.<helper> keeps working unchanged.
- __init__.py drops ~1444 lines (3511 -> 2067); CLI surface preserved via
  register(app).

No behavior change. Full suite failure set is identical before/after
(82 pre-existing env failures, 0 new).

* fix(extensions): preserve per-command path in update backup for skills agents

Skills agents (extension == "/SKILL.md") name every command file SKILL.md,
each in its own per-command subdir (e.g. speckit-plan/SKILL.md). The update
backup keyed the backup path on cmd_file.name alone, so all of an agent's
skill files collided onto a single backup path — each shutil.copy2 overwrote
the previous one, and rollback restored one skill's content over all the
others, corrupting or losing the rest.

Mirror the real on-disk layout by using cmd_file.relative_to(commands_dir),
keeping each backup path unique. This also makes backed_up_command_files
values unique so restore copies the correct content back to each command.

Add a regression test asserting two distinct skill files survive a
backup -> failed-update -> rollback cycle with their own content.

* style(extensions): use yaml.safe_dump when writing catalog config

The catalog add/remove handlers wrote the integration catalog config with
yaml.dump. Switch to yaml.safe_dump to align with the SafeDumper used by the
presets commands and to refuse emitting !!python/object tags if a non-basic
value ever reaches the config dict.

Output is unchanged for the current basic-type payload (str/int/bool/dict/
list) — this is a defensive/consistency change, not a behavioral fix.

* fix(extensions): correct _print_cli_warning import path in skill registration

register_enabled_extensions_for_agent imported _print_cli_warning from `.` (the extensions package), but the helper lives in the parent specify_cli package. The wrong level raised ImportError inside the error handlers, aborting extension/skill registration on the first failure instead of warning and continuing. Use `..` to match the other parent-package imports.

* fix(extensions): escape untrusted values in Rich markup output

User-provided arguments and extension/catalog metadata (names, descriptions, versions, IDs, paths) were interpolated into Rich markup strings without escaping. Values containing markup sequences (e.g. [red]...) would be parsed as markup, allowing output injection that could corrupt or mislead CLI messages.

Wrap all such interpolations with rich.markup.escape across the extension/catalog command handlers: list, search, info (_print_extension_info), add (including --dev paths), remove, enable, disable, set-priority, update, and the ambiguous-match resolvers (error strings and Table rows). Reuse the already-computed safe_extension where available.

Escaping is a no-op for benign strings, so normal output is unchanged.

* Prevent Rich markup injection in extension CLI output

User-controlled catalog URLs and extension IDs are rendered through Rich-enabled console paths, so every remaining output-only interpolation now escapes markup while leaving stored values and filesystem behavior unchanged. Regression tests cover catalog add, install hints, remove hints, and state command messages with bracketed markup-like values.

* Prevent markup injection from exception text

Rich markup remains enabled for styled CLI messages, so exception text and config path labels must be escaped before rendering. YAML parser errors, URL validation failures, download errors, and extension validation errors can include user-controlled catalog or manifest values.

Constraint: Preserve existing exception handling and user-facing error paths

Rejected: Disable Rich markup for these messages | existing output intentionally uses markup for labels and styling

Confidence: high

Scope-risk: narrow

Directive: Escape user-controlled exception text before interpolating into Rich-rendered strings

Tested: .venv/bin/python -m pytest tests/test_extensions.py -q

Co-authored-by: OmX <omx@oh-my-codex.dev>

* Prevent path and manifest review regressions

Catalog path labels are rendered through Rich markup and downloaded update manifests are trusted long enough to validate extension IDs. Escape displayed project paths before rendering, and reject non-mapping extension.yml payloads before ID validation so bad archives fail with a clear rollback reason.

---------

Co-authored-by: OmX <omx@oh-my-codex.dev>
2026-06-22 13:40:57 -05:00
meymchen
6a3ee9b64e feat: add ZCode (Z.AI) integration (#3063)
* feat: add ZCode (Z.AI) integration

Add a skills-based integration for ZCode, Z.AI's Claude-Code-style
agent. ZCode uses the same SKILL.md layout as Claude Code, so spec-kit
installs workflows into .zcode/skills/speckit-<name>/SKILL.md, invoked
in chat as $speckit-<name>.

- ZcodeIntegration(SkillsIntegration) with .zcode/ folder and --skills option
- Register in INTEGRATION_REGISTRY
- Catalog entry (tags: cli, skills, z-ai)
- Tests via SkillsIntegrationTests mixin
- Document in integrations reference and README

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix: render $speckit-* invocations for ZCode skills

ZCode is documented as a skills agent invoked with $speckit-<command>,
but the central invocation rendering only special-cased codex, so
specify init Next Steps and extension hooks rendered the dotted
/speckit.<command> form instead.

Centralize the $speckit-* decision in a DOLLAR_SKILLS_AGENTS set with an
is_dollar_skills_agent() helper, and route both init Next Steps and
HookExecutor._render_hook_invocation through it. Add ZCode invocation
regression tests mirroring the existing Codex/Kimi coverage.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 12:14:18 -05:00
Austin Z.
bbdf1b8f40 fix(agent-context): support multiple context files safely (#2969)
* fix(agent-context): support multiple context files safely

* fix(agent-context): harden context file validation

* fix(agent-context): preserve disabled context target

* fix(agent-context): address review follow-ups

* fix(agent-context): dedupe PowerShell context files

* fix(agent-context): align context file dedupe

* fix(agent-context): align bash context file dedupe

* fix(agent-context): preserve disabled display target

* fix(agent-context): require yaml-capable updater python

* fix(agent-context): preserve context files config

* fix(agent-context): align context file fallbacks

* fix(agent-context): share context file resolution

---------

Co-authored-by: AustinZ21 <AustinZ21@users.noreply.github.com>
2026-06-22 12:10:55 -05:00
Manfred Riem
79a34b892d fix(presets): use _repo_root() for bundled-core source-checkout fallback (#3086) (#3091)
* fix(presets): use _repo_root() for bundled-core source-checkout fallback

The tier-5 fallback in PresetResolver.resolve() and
_find_bundled_core() computed the repo root as
Path(__file__).parent.parent.parent. After presets.py was moved to
presets/__init__.py (#2826) that chain is one level short, resolving
to src/ and looking for src/templates/commands/<cmd>.md, which never
exists. As a result, wrap-strategy presets found no core base layer in
source/editable installs.

Use the shared _repo_root() helper so both fallbacks resolve against
the actual repo-root templates/ tree. Wheel installs were unaffected
(core_pack path), so this only impacts source/editable checkouts.

Refs #3086

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(presets): restore dropped def for oserror-manifest test

A prior edit accidentally removed the
def test_resolve_extension_command_via_manifest_skips_oserror_manifests
line, orphaning its body inside the new bundled-core test. Restore the
test definition so pytest collects it again.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(presets): move bundled-core tests into TestPresetResolver

The two tier-5 fallback regression tests exercise collect_all_layers()
and resolve(), not resolve_core(), so they belong in TestPresetResolver
rather than TestResolveCore. Relocate them for clearer suite navigation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 11:33:44 -05:00
Huy Do
1cb935997c fix: fail loudly on an unknown workflow expression filter (#3074)
* fix: fail loudly on an unknown workflow expression filter

The expression evaluator's filter dispatch fell through to `return value`
for any unregistered filter, so a typo'd or unsupported filter such as
`{{ items | length }}` rendered the value unchanged with no error and the
run completed — a silent wrong result.

Raise a clear ValueError instead, naming the offending filter and the valid
ones, mirroring the strict handling already used for `from_json`. The five
registered filters (default/join/map/contains/from_json) are unchanged; the
`name(arg)` form of an unknown filter is now caught too.

* fix: distinguish a misused registered filter from an unknown one; cover map

Address the review feedback on the unknown-filter fail-loud path:

- A *registered* filter used in an unsupported form (e.g. `| join` or
  `| map` with no argument) raised the misleading "unknown filter
  '<name>'" — the filter is registered, the syntax isn't. It now raises
  a message naming it as a known filter misused. A new
  `_REGISTERED_FILTERS` constant drives the distinction.
- `test_registered_filters_unaffected` now also exercises `map('attr')`,
  which it previously claimed to cover but didn't. Add
  `test_registered_filter_unsupported_form_raises` to pin the new path.

* fix: include the no-arg default form in the filter-error hint

Copilot review: the hint listed default('x') but omitted the valid
no-argument default form (| default), which this module supports.
2026-06-22 10:44:23 -05:00
Manfred Riem
902f5431f9 Harden command registration path handling (#3088)
* fix: validate command 'file' field against path traversal in registrar

CommandRegistrar.register_commands() read each command body from
source_dir / cmd_file without validating the manifest 'file' field,
unlike the parallel skill and preset readers which already reject
absolute paths and '..' traversal. A malicious extension/preset/bundle
manifest with file: ../../../etc/passwd (or an absolute path) could
read arbitrary host files verbatim into a generated agent command at a
predictable path (GHSA-w5fv-7w9x-7fc5, CWE-22).

Add the same containment guard at the command read site and reject a
traversal/absolute 'file' at manifest-load time in
ExtensionManifest._validate() for defense-in-depth, plus regression
tests for both the read path and the manifest validator.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test/fix: address review — robust absolute-path test and tolerant reads

- register_commands(): use is_file() instead of exists() and skip the
  command if read_text() raises (directory or non-UTF8 file), aligning
  with the other command/skill readers.
- Traversal tests: point the absolute-path payload at the real temp
  secret.txt (guaranteed to exist on all platforms) instead of
  /etc/passwd, so the absolute-path guard is genuinely exercised and the
  test fails if it regresses, rather than passing because the target
  happens not to exist (e.g. on Windows runners).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: rename traversal fixtures to avoid CodeQL secret-storage false positive

The regression fixtures named an out-of-tree file secret.txt with
TOP-SECRET-CREDENTIAL content. CodeQL's clear-text-storage heuristic
treated that read content as sensitive and followed the static path
into the pre-existing write_text sinks in _write_registered_output,
raising false 'clear-text storage of sensitive information' alerts on
PR 3088. Rename the fixtures to neutral outside.txt / OUTSIDE-FILE-MARKER
and drop /etc/passwd payloads; the test semantics (a file outside
source_dir must never be read into a generated command) are unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: reject Windows drive-relative 'file' values in traversal guards

is_absolute() is False for Windows drive-relative paths like C:outside.txt,
which contain no '..' yet resolve against the process CWD on that drive —
bypassing the containment guard on Windows. Evaluate the 'file' value under
PureWindowsPath as well so both the registrar runtime guard and the
manifest-load validator reject drive letters (and backslash '..' segments)
cross-platform. Extend the regression tests with drive-relative cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: use anchor under both path flavors so POSIX-absolute is rejected on Windows

On a Windows runner WindowsPath('/abs/outside.md').is_absolute() is False
(no drive), so the prior native-Path check let a leading-slash 'file' value
through and the manifest validator did not raise. Evaluate the value under
both PurePosixPath and PureWindowsPath and reject any non-empty anchor —
covering POSIX-absolute, Windows drive-relative, Windows absolute, and
rooted-without-drive — in both the registrar guard and the manifest
validator. The registrar join now uses the raw 'file' string so native
separators are handled by the resolve()/relative_to() containment check.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: validate command 'file' field against path traversal in registrar

CommandRegistrar.register_commands() read each command body from
source_dir / cmd_file without validating the manifest 'file' field,
unlike the parallel skill and preset readers which already reject
absolute paths and '..' traversal. A malicious extension/preset/bundle
manifest with file: ../../../etc/passwd (or an absolute path) could
read arbitrary host files verbatim into a generated agent command at a
predictable path (GHSA-w5fv-7w9x-7fc5, CWE-22).

Add the same containment guard at the command read site and reject a
traversal/absolute 'file' at manifest-load time in
ExtensionManifest._validate() for defense-in-depth, plus regression
tests for both the read path and the manifest validator.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test/fix: address review — robust absolute-path test and tolerant reads

- register_commands(): use is_file() instead of exists() and skip the
  command if read_text() raises (directory or non-UTF8 file), aligning
  with the other command/skill readers.
- Traversal tests: point the absolute-path payload at the real temp
  secret.txt (guaranteed to exist on all platforms) instead of
  /etc/passwd, so the absolute-path guard is genuinely exercised and the
  test fails if it regresses, rather than passing because the target
  happens not to exist (e.g. on Windows runners).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: rename traversal fixtures to avoid CodeQL secret-storage false positive

The regression fixtures named an out-of-tree file secret.txt with
TOP-SECRET-CREDENTIAL content. CodeQL's clear-text-storage heuristic
treated that read content as sensitive and followed the static path
into the pre-existing write_text sinks in _write_registered_output,
raising false 'clear-text storage of sensitive information' alerts on
PR 3088. Rename the fixtures to neutral outside.txt / OUTSIDE-FILE-MARKER
and drop /etc/passwd payloads; the test semantics (a file outside
source_dir must never be read into a generated command) are unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: reject Windows drive-relative 'file' values in traversal guards

is_absolute() is False for Windows drive-relative paths like C:outside.txt,
which contain no '..' yet resolve against the process CWD on that drive —
bypassing the containment guard on Windows. Evaluate the 'file' value under
PureWindowsPath as well so both the registrar runtime guard and the
manifest-load validator reject drive letters (and backslash '..' segments)
cross-platform. Extend the regression tests with drive-relative cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: use anchor under both path flavors so POSIX-absolute is rejected on Windows

On a Windows runner WindowsPath('/abs/outside.md').is_absolute() is False
(no drive), so the prior native-Path check let a leading-slash 'file' value
through and the manifest validator did not raise. Evaluate the value under
both PurePosixPath and PureWindowsPath and reject any non-empty anchor —
covering POSIX-absolute, Windows drive-relative, Windows absolute, and
rooted-without-drive — in both the registrar guard and the manifest
validator. The registrar join now uses the raw 'file' string so native
separators are handled by the resolve()/relative_to() containment check.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: harden register_commands inputs and tighten manifest 'file' validation

Address review feedback on #3088:
- register_commands(): skip non-string/empty 'file' values instead of
  raising TypeError, and hoist source_dir.resolve() out of the per-command
  loop.
- ExtensionManifest._validate(): reject 'file' values with leading/trailing
  whitespace with a clear ValidationError instead of a confusing
  missing-file failure later.
- tests: add non-string 'file' and whitespace cases; use yaml.safe_dump
  with explicit utf-8 encoding in the manifest validation test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: align runtime '..' policy, correct comment, dedupe test helper

Address review feedback on #3088:
- register_commands(): also reject '..' segments under both POSIX and
  Windows semantics, keeping runtime policy consistent with
  ExtensionManifest._validate() and the skill/preset readers (not just
  relying on the resolve()/relative_to() containment backstop).
- Replace the version-dependent is_absolute() claim in the extensions.py
  comment with the actual portability rationale (native Path is OS-
  dependent; C:foo is anchored but not absolute).
- Extract the duplicated leak-detection assertion into
  _assert_no_marker_leak() and add an in-bounds '..' payload that
  exercises the new runtime '..' rejection.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Extract shared path-safety policy and warn on unreadable command files

Introduce relative_extension_path_violation() in _utils.py as the single
source of truth for the extension-relative `file` path-safety policy, and
use it from both the runtime registrar guard (agents.py) and the
manifest-load validator (extensions.py) so the two cannot drift.

Warn (instead of silently skipping) when an in-bounds command file exists
but cannot be read/decoded, surfacing misconfigured extensions.

Add unit tests for the shared helper, a read-skip warning test, and make
the in-bounds `..` test create its target file so the skip is attributable
to the `..` rejection rather than file absence.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Retrigger CI

Empty commit to re-trigger code scanning / CodeQL analysis on the PR
merge ref.

Assisted-by: GitHub Copilot CLI (model: Claude Opus 4.8, autonomous)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 10:25:29 -05:00
Ali jawwad
f9c6cf83e5 fix(presets): preserve argument-hint in preset SKILL.md generation (#2978)
* fix(presets): preserve argument-hint in preset SKILL.md generation

Preset-provided and extension-override commands that declare
`argument-hint:` in their frontmatter had it dropped from the generated
Claude SKILL.md, and it was re-dropped when a preset was removed and its
overridden skill restored. This is the preset-side analog of the
extension fix in #2903 / #2916.

Factor the argument-hint carry-over into a shared
CommandRegistrar.apply_argument_hint() helper and apply it at the four
preset skill-generation sites (register, reconcile override-restore, and
the core/extension unregister-restore paths). The extension path from

The helper writes argument-hint into the frontmatter dict before
serialization (so a folded multi-line description cannot be split into
invalid YAML) and only for integrations that support it (those exposing
inject_argument_hint -- currently Claude), leaving build_skill_frontmatter's
shared shape unchanged for every other agent. Core templates carry no
argument-hint, so the core-restore path is a no-op. No behavior change for
non-Claude agents or the core path.

Add regression tests covering a folding description (Claude) and the
non-Claude gate (codex).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(presets): address review - guard skill_frontmatter type and tighten apply_argument_hint annotations

Add a symmetric isinstance(skill_frontmatter, dict) guard so the helper stays a safe no-op if a caller passes a non-dict, and annotate the parameters as Dict[str, Any] with an optional integration to match real call-site usage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 09:52:13 -05:00
Huy Do
f5f76160a3 feat: surface gate detail in the workflow run/resume --json payload (#2965)
* feat: surface gate detail in the workflow run/resume --json payload

A paused run was indistinguishable from any other pause in the
machine-readable outcome, and the gate's prompt/options/choice never
left the human-facing stream. Record each step's type in the run
state's step results (one engine line) and, when the run sits at a
gate, add a gate block (step_id/message/options/choice) to the payload
so orchestrators can drive review gates without parsing stdout.

Reference implementation for the proposal in #2964.

Addresses #2964

* fix(workflow): only surface gate detail in --json when the run is paused

Address review (#2965): _gate_outcome() emitted a gate block whenever current_step_id pointed at a gate step. Since RunState.current_step_id is never cleared on completion, a completed/failed run whose last step was a gate leaked stale gate detail in run/resume/status --json. Guard on status == paused. Also assert CLI success in the _run_json test helper before JSON-parsing, and add direct coverage for the suppression guard.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(workflows): surface gate block on aborted runs; stabilize message

Address Copilot review:
- `_gate_outcome` now also surfaces the gate block when a run is `aborted`
  by a gate rejection (`on_reject: abort`), not only when `paused`. Abort
  is the only path that sets ABORTED and it leaves current_step_id on the
  gate, so an orchestrator can read the recorded `choice` for the stop.
- Coerce `message` to a string (it may be a non-string YAML literal that
  GateStep only coerces for interpolation) so the JSON schema stays stable.
- Tests: add a CLI-level aborted-path test, a message-coercion test, and
  extend the suppression test to allow `aborted`; share the run helper via
  `_invoke_json` to avoid duplicating the invoke boilerplate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(workflows): assert clean exit in gate-abort JSON test

Address Copilot review: the gate-abort test parsed stdout without first
asserting the CLI exited cleanly, so an invoke failure would surface as an
opaque JSON decode error. Route it through `_run_json` (which asserts
exit_code == 0 before parsing) and drop the now-redundant `_invoke_json`
helper — a gate abort emits the payload and returns, so the run exits 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: use result.output in run-helper assert; document step_data shape

Address Copilot review:
- `_run_json` asserted with `result.stdout` in the message, but under
  `--json` step output is redirected off stdout — the useful diagnostics
  live on `result.output`. Switch the assertion message to `result.output`
  (the JSON parse still reads stdout), matching the other CLI tests.
- `StepContext.steps` documented a 5-key entry shape; the engine now also
  persists `type` and `status`. Update the docstring to the canonical
  7-key shape so step authors/debuggers see the real record.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(workflows): align gate-abort JSON test with aborted→exit-1

After rebasing onto main, a gate abort now emits the --json payload and
then exits non-zero (`_run_outcome_exit_code` maps aborted → 1, from the
merged exit-code work). Give `_run_json` an `expected_exit` parameter
(default 0) so the abort case asserts exit 1 while the paused/completed
cases stay at 0 — keeping a single shared helper rather than duplicating
the invoke boilerplate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): backward-compat gate detection + normalize gate options

Address Copilot review:
- A run paused by an older version has no persisted step `type`, so
  `_gate_outcome` would never surface its gate block on resume. Add
  `_is_gate_step`: prefer the `type` field, but when it is absent fall back
  to the gate's unique output signature (`on_reject`, written only by
  GateStep). A record with a different known `type` is still not a gate.
- Normalize `options` to a list of strings (mirroring the `message`
  coercion) so an unvalidated workflow with non-string options can't
  destabilize the JSON schema.
- Tests: options coercion, type-less gate detection, and a type-less
  non-gate negative case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): normalize non-list gate options to a stable list[str]

Address Copilot review: the prior options normalization only mapped a
`list`, returning the raw value for any other shape (scalar/tuple), which
contradicted the "stable list[str]" intent. Extract `_normalize_gate_options`:
None stays None; list/tuple maps each element through str; any other scalar
becomes a single-element list (a bare string is one option, never iterated
character-by-character). The emitted schema is now always list[str] | None.
Extend the options test to cover list, tuple, bare string, numeric scalar,
and None.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): normalize gate choice to str; portable plain-gate test

Address Copilot review:
- `_gate_outcome` normalized `message` and `options` but passed `choice`
  through as-is; an unvalidated gate can record a non-string `choice`,
  which contradicts the stable-schema rationale. Coerce `choice` to
  `str | None` (None still means "no decision yet"), consistent with the
  other two fields. Adds a focused choice-coercion test.
- The plain (no-gate) test workflow used `run: "true"`, which fails under
  cmd.exe on Windows (ShellStep uses shell=True). Use the cross-platform
  `run: "exit 0"` (matching the exit-code suite's workflows).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-22 07:05:54 -05:00
Manfred Riem
487af97864 feat: add specify bundle command (#3070)
* docs: dogfood Spec Kit — bundler SDD artifacts + constitution

Scaffold Spec Kit (--integration copilot) and run the full SDD workflow
against the `specify bundle` subcommand feature:

- spec.md (4 user stories, 31 FRs, 8 success criteria) + clarifications
- plan.md, research.md, data-model.md, contracts/, quickstart.md
- tasks.md (43 dependency-ordered tasks, organized by user story)
- Spec Kit Constitution v1.0.0 (code quality, testing, UX, performance,
  dependency/security principles) derived from deep codebase analysis
- plan Constitution Check + tasks grounded against the ratified principles

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(bundler): add `specify bundle` subcommand for role-based setups

Implements the Spec Kit Bundler as a `specify bundle ...` subcommand group
that calls existing primitive machinery in-process with zero new dependencies,
per the v1.0.0 constitution (Principles I-V).

Adds the `specify_cli.bundler` package (models, services, lib helpers) and the
`commands/bundle` Typer group wiring search, info, list, install, update,
remove, validate, build, init, and catalog list/add/remove (with --json and
--offline). Includes manifest/catalog schemas, version + integration-clash
gating, discovery-only refusal, idempotent install with atomic rollback,
non-collateral removal, and offline-first catalog resolution.

Ships an 82-test suite (contract/unit/integration), four sample role bundles
(product-manager, business-analyst, security-researcher, developer), README
"Bundles" docs, and an AGENTS.md pitfall on the test-venv gotcha. Marks
tasks T001-T043 complete and records follow-ups T044 (live in-process
primitive dispatch) and T045 (install from a local artifact path).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(contributing): document running the full test suite via project .venv

Add a "Running the full test suite" subsection under Automated checks covering
`uv pip install -e ".[test]"` + `.venv/bin/python -m pytest`, with the
shared/global editable-install contamination caveat that mirrors the AGENTS.md
pitfall.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(bundler): wire real in-process primitive install + local-artifact install

Closes the two follow-ups left after the initial bundler landing.

T044 — DefaultPrimitiveInstaller now performs real installs through existing
machinery instead of raising "use the primitive command" errors:
- presets/extensions install via their reusable managers
  (install_from_directory / install_from_zip); bundled assets install fully
  offline, catalog assets are fetched only when the network is allowed.
- workflows/steps delegate to the existing `workflow add` / `workflow step add`
  command callables in-process (project root as cwd), avoiding any duplicated
  download/validation logic (Principle I).
- `--offline` is threaded through DefaultPrimitiveInstaller(allow_network=…) so
  network-only kinds refuse with an actionable message rather than silently
  reaching out.

T045 — `specify bundle install` now accepts a local path (a built .zip
artifact, a bundle directory, or a bundle.yml) and installs directly without
consulting the catalog stack; bundle-ids still resolve via the stack.

Adds 13 tests (routing, offline gating, local-source resolution, and an
end-to-end offline build → install → list → remove of the bundled
agent-context extension). Bundler suite: 95 passing; ruff clean. Marks T044
and T045 complete in tasks.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(bundler): append Phase 8 convergence tasks from converge assessment

Ran the converge command: assessed the codebase against spec.md, plan.md,
tasks.md, and the v1.0.0 constitution. Appended 7 traceable gap-closure tasks
(T046–T052) as a new "Phase 8: Convergence" section. Append-only — no existing
tasks were modified and no application code was changed.

Findings: 1 CRITICAL (Constitution III — bundle group undocumented under
docs/reference/), 3 HIGH (FR-005/SC-007 validate references; FR-009/SC-002 info
expansion; FR-012 install-time init), 3 MEDIUM (FR-013 integration precedence;
FR-020 surface overlaps; FR-028 update refresh).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Implement Phase 8 convergence tasks (T046–T052)

Close the gaps the converge command found between the bundler spec/plan/
constitution and the code:

- T046: add docs/reference/bundles.md documenting the full `specify bundle`
  command group; link it from docs/reference/overview.md (Constitution III).
- T047: wire a reference checker into `bundle validate` (services/references.py);
  online runs fail and name unresolved component references, offline runs warn.
- T048: expand `bundle info` to enumerate the full component set (versions,
  preset priority/strategy) plus the bundle integration — info == install.
- T049/T050: `bundle install`/`bundle init` now scaffold an uninitialized
  project via the existing `specify init` machinery, choosing the integration by
  precedence (override → bundle-declared → Copilot + OS default script type).
- T051: surface foreseeable component overlaps during info and install.
- T052: `bundle update` refreshes already-installed components via a new
  refresh path in install_bundle, preserving primitive-level overrides.

Adds unit/contract/integration coverage (107 tests pass).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* converge: append Phase 9 (T053) — surface bundle trust indicator

Re-run of converge after Phase 8. The seven Phase 8 tasks are verified closed.
One residual partial gap remains: the `verified`/trust indicator (FR-010,
FR-027) is exposed only in `bundle info --json`, absent from `bundle search`
(the primary discovery surface) and `bundle info` text. Appended as a single
new task for implement to complete. Append-only; no code changed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Implement T053 — surface bundle trust indicator in discovery

`bundle search` (text + JSON) and `bundle info` (text + JSON) now expose each
catalog entry's verification/trust level (verified vs community), so users can
judge a bundle's trust before installing, per FR-010 / FR-027. Previously
`verified` was only present in `bundle info --json`.

Adds contract coverage; 108 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: dogfood Spec Kit — bundler SDD artifacts + constitution

Scaffold Spec Kit (--integration copilot) and run the full SDD workflow
against the `specify bundle` subcommand feature:

- spec.md (4 user stories, 31 FRs, 8 success criteria) + clarifications
- plan.md, research.md, data-model.md, contracts/, quickstart.md
- tasks.md (43 dependency-ordered tasks, organized by user story)
- Spec Kit Constitution v1.0.0 (code quality, testing, UX, performance,
  dependency/security principles) derived from deep codebase analysis
- plan Constitution Check + tasks grounded against the ratified principles

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(bundler): add `specify bundle` subcommand for role-based setups

Implements the Spec Kit Bundler as a `specify bundle ...` subcommand group
that calls existing primitive machinery in-process with zero new dependencies,
per the v1.0.0 constitution (Principles I-V).

Adds the `specify_cli.bundler` package (models, services, lib helpers) and the
`commands/bundle` Typer group wiring search, info, list, install, update,
remove, validate, build, init, and catalog list/add/remove (with --json and
--offline). Includes manifest/catalog schemas, version + integration-clash
gating, discovery-only refusal, idempotent install with atomic rollback,
non-collateral removal, and offline-first catalog resolution.

Ships an 82-test suite (contract/unit/integration), four sample role bundles
(product-manager, business-analyst, security-researcher, developer), README
"Bundles" docs, and an AGENTS.md pitfall on the test-venv gotcha. Marks
tasks T001-T043 complete and records follow-ups T044 (live in-process
primitive dispatch) and T045 (install from a local artifact path).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(contributing): document running the full test suite via project .venv

Add a "Running the full test suite" subsection under Automated checks covering
`uv pip install -e ".[test]"` + `.venv/bin/python -m pytest`, with the
shared/global editable-install contamination caveat that mirrors the AGENTS.md
pitfall.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(bundler): wire real in-process primitive install + local-artifact install

Closes the two follow-ups left after the initial bundler landing.

T044 — DefaultPrimitiveInstaller now performs real installs through existing
machinery instead of raising "use the primitive command" errors:
- presets/extensions install via their reusable managers
  (install_from_directory / install_from_zip); bundled assets install fully
  offline, catalog assets are fetched only when the network is allowed.
- workflows/steps delegate to the existing `workflow add` / `workflow step add`
  command callables in-process (project root as cwd), avoiding any duplicated
  download/validation logic (Principle I).
- `--offline` is threaded through DefaultPrimitiveInstaller(allow_network=…) so
  network-only kinds refuse with an actionable message rather than silently
  reaching out.

T045 — `specify bundle install` now accepts a local path (a built .zip
artifact, a bundle directory, or a bundle.yml) and installs directly without
consulting the catalog stack; bundle-ids still resolve via the stack.

Adds 13 tests (routing, offline gating, local-source resolution, and an
end-to-end offline build → install → list → remove of the bundled
agent-context extension). Bundler suite: 95 passing; ruff clean. Marks T044
and T045 complete in tasks.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(bundler): append Phase 8 convergence tasks from converge assessment

Ran the converge command: assessed the codebase against spec.md, plan.md,
tasks.md, and the v1.0.0 constitution. Appended 7 traceable gap-closure tasks
(T046–T052) as a new "Phase 8: Convergence" section. Append-only — no existing
tasks were modified and no application code was changed.

Findings: 1 CRITICAL (Constitution III — bundle group undocumented under
docs/reference/), 3 HIGH (FR-005/SC-007 validate references; FR-009/SC-002 info
expansion; FR-012 install-time init), 3 MEDIUM (FR-013 integration precedence;
FR-020 surface overlaps; FR-028 update refresh).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Implement Phase 8 convergence tasks (T046–T052)

Close the gaps the converge command found between the bundler spec/plan/
constitution and the code:

- T046: add docs/reference/bundles.md documenting the full `specify bundle`
  command group; link it from docs/reference/overview.md (Constitution III).
- T047: wire a reference checker into `bundle validate` (services/references.py);
  online runs fail and name unresolved component references, offline runs warn.
- T048: expand `bundle info` to enumerate the full component set (versions,
  preset priority/strategy) plus the bundle integration — info == install.
- T049/T050: `bundle install`/`bundle init` now scaffold an uninitialized
  project via the existing `specify init` machinery, choosing the integration by
  precedence (override → bundle-declared → Copilot + OS default script type).
- T051: surface foreseeable component overlaps during info and install.
- T052: `bundle update` refreshes already-installed components via a new
  refresh path in install_bundle, preserving primitive-level overrides.

Adds unit/contract/integration coverage (107 tests pass).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* converge: append Phase 9 (T053) — surface bundle trust indicator

Re-run of converge after Phase 8. The seven Phase 8 tasks are verified closed.
One residual partial gap remains: the `verified`/trust indicator (FR-010,
FR-027) is exposed only in `bundle info --json`, absent from `bundle search`
(the primary discovery surface) and `bundle info` text. Appended as a single
new task for implement to complete. Append-only; no code changed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Implement T053 — surface bundle trust indicator in discovery

`bundle search` (text + JSON) and `bundle info` (text + JSON) now expose each
catalog entry's verification/trust level (verified vs community), so users can
judge a bundle's trust before installing, per FR-010 / FR-027. Previously
`verified` was only present in `bundle info --json`.

Adds contract coverage; 108 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): address PR review — annotations, Windows paths, HTTPS, errors, reproducible builds

Resolves automated review feedback on github/spec-kit#3070:

- validator: drop redundant string-quoting on ReferenceChecker's
  `str | None` return so the annotation evaluates as a real union under
  `from __future__ import annotations`.
- adapters: normalize Windows drive-letter paths (e.g. C:\...) to the
  local-file branch so offline file catalogs resolve on Windows.
- adapters: enforce HTTPS (HTTP only for localhost) and require a host on
  remote catalog URLs before any network call, mirroring
  specify_cli.catalogs URL validation (MITM/downgrade protection).
- adapters: pass `origin` to loads_json for local files and HTTP payloads
  so JSON parse errors name the real source instead of <string>.
- manifest: parse component `priority` defensively, raising an actionable
  BundlerError on non-integer values instead of a raw ValueError.
- packager: write zip members with a fixed timestamp + permissions so
  identical inputs yield byte-for-byte identical artifacts (genuinely
  reproducible builds), and strengthen the determinism test accordingly.

Adds regression tests for priority validation, plain-HTTP/host rejection,
and byte-level artifact reproducibility (111 bundler tests pass; ruff clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): address PR review round 2 — nested output dir + file:// URLs

- packager: when --output points inside the bundle directory, exclude the
  whole output subtree from collection so previously-built artifacts are
  never re-packaged (prevents broken reproducibility and unbounded growth).
- adapters: resolve file:// catalog URLs via url2pathname and preserve
  netloc, so Windows file URLs (file:///C:/...) and UNC shares
  (file://server/share) resolve correctly instead of dropping the host or
  producing /C:/x.

Adds regression tests for nested-output exclusion and file:// resolution
(113 bundler tests pass; ruff clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): address PR review round 3 — discovery UX + hardening

- bundle search/info: fall back to the built-in/user catalog stack instead of
  requiring a Spec Kit project, so discovery works in a fresh directory (and
  the README/quickstart examples now match actual behavior). install still
  auto-initializes a project as before.
- packager: traverse with os.walk(followlinks=False) and prune symlinked
  directories before descending, so a symlink-to-dir can no longer pull in
  out-of-tree files (which previously turned "skip symlinks" into a hard
  ensure_within() failure and did extra filesystem work).
- records: parse contributed-component priority defensively, raising an
  actionable BundlerError on a corrupt records file instead of leaking a raw
  ValueError/traceback.
- installer: give install_bundle's manifest parameter an explicit
  BundleManifest | None type for a clearer, safer service API.

Adds regression tests for project-less search/info, symlinked-dir pruning,
and corrupt-priority records (117 bundler tests pass; ruff clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): address PR review round 4 + markdownlint exclusions

Review fixes:
- bundle info: expand the manifest regardless of install policy so
  discovery-only bundles remain inspectable (only install is refused).
- _download_manifest: handle local .zip download_url by extracting bundle.yml
  (via _local_manifest_source), and add a real remote HTTPS fetch path using
  the shared authenticated, redirect-validated open_url client (HTTPS enforced
  on the initial URL and every redirect; offline still refuses).
- _run_init: thread the --offline flag through to the init callback so
  `bundle install/init --offline` never performs network init.
- conflict.ConflictReport: use field(default_factory=list) and drop the
  None + __post_init__ workaround.
- CatalogSource.from_dict: parse priority defensively, raising an actionable
  BundlerError naming the source + offending value instead of a raw ValueError.

markdownlint:
- Exclude .specify/, .github/, and specs/ (and their subdirectories) from
  markdownlint so the in-flight dogfooding scaffolding doesn't trip the linter.

Adds regression tests for discovery-only info, local-zip download_url, and
non-integer catalog priority (120 bundler tests pass; ruff clean; the PR's own
markdown lints clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): address PR review round 5 + ignore generated files in whitespace check

Review fixes:
- packager: exclude any prior build artifact for this bundle (matching
  <id>-*.zip), not just the current output path, so older artifacts next to
  bundle.yml are never re-packaged.
- docs(bundles): correct the note — `search` and `info` work without a project
  (they fall back to the built-in/user catalog stack); only list/update/remove/
  catalog require an initialized project.

CI / generated files:
- .gitattributes: mark the generated dogfooding scaffolding (.specify/**, the
  speckit .github agent/prompt files, copilot-instructions.md, specs/**) with
  -whitespace so `git diff --check` (the Lint workflow's whitespace gate) stops
  flagging emitted trailing whitespace. These files are produced by
  `specify init` and are scrubbed before merge.

Adds a regression test for prior-artifact exclusion (121 bundler tests pass;
ruff clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): collision-resistant catalog ids, canonical local paths, explicit uninstalled result

Addresses review round 6 (PR #3070):
- catalog_config._derive_id now combines host label with the URL path stem so
  multiple catalogs from the same host get distinct, stable default ids.
- add_source canonicalizes local file paths to absolute before persisting, so
  project config no longer depends on the caller's cwd.
- InstallResult gains a dedicated `uninstalled` list; remove_bundle no longer
  overloads `installed` for removals, and the CLI prints from `uninstalled`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): confine config writes, guard indeterminate integration, fix validate docs

Addresses review round 7 (PR #3070):
- save_records and catalog_config._write now pass within=project_root to
  dump_json/dump_yaml, refusing symlinked .specify paths that escape the
  project (defense-in-depth, matching the rest of the codebase).
- resolve_install_plan now fails when a bundle pins an integration but the
  project's active integration cannot be determined and no explicit
  --integration override was given, instead of silently adopting the bundle's
  required integration (FR-019 guard). CLI passes integration_explicit.
- docs/reference/bundles.md: corrected the validate semantics to describe the
  actual best-effort online behavior (unreachable catalogs warn, not fail).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): Windows path handling + review round 8 hardening

Fix Windows CI failures:
- is_safe_relpath now rejects POSIX-absolute (/abs) and Windows drive-absolute
  (C:\x, UNC) paths on every OS, instead of passing them through on Windows
  where os.path.isabs('/abs') is False and Path('/abs').parts yields '\\'.
- _download_manifest treats a Windows drive-letter download_url (C:\bundle.yml,
  which urlparse reads as scheme 'c') as a local file, fixing the empty
  component set in `bundle info` on Windows.

Address review round 8 (PR #3070):
- Bundled workflows now install under --offline (locate via
  _locate_bundled_workflow) instead of being refused unconditionally.
- bundle update preserves the original installed_at timestamp on refresh
  (import find_record; reuse the existing record's timestamp).
- _derive_id lowercases the host label so 'Example.com' and 'example.com'
  produce the same deterministic id.
- CatalogEntry.from_dict validates 'tags' is a list and 'verified' is a real
  boolean, raising BundlerError on invalid untrusted shapes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): normalize SemVer prerelease spellings before version parsing

Addresses review round 9 (PR #3070): parse_version and is_semver now apply the
same prerelease normalization (mirroring specify_cli._version._normalize_tag)
so SemVer spellings like 1.2.3-rc1 / 1.2.3-alpha1 validate and compare
consistently across is_semver, parse_version, and satisfies. Leading 'v' is
also stripped. Keeps the manifest validator and constraint checks in agreement.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): no collateral removal + enforce manifest-pinned versions

Addresses review round 10 (PR #3070):
- install_bundle records only the components this bundle actually contributed:
  freshly-installed components, plus pre-existing ones already owned by this
  bundle (refresh) or a sibling bundle (shared/refcounted). A component that is
  installed on disk but tracked by no bundle was installed independently and is
  no longer attributed, so `bundle remove` won't uninstall it (FR-022).
- preset/extension/workflow install paths now verify the active catalog's
  advertised version matches the manifest-pinned component.version before
  downloading/installing, raising BundlerError on mismatch so bundles stay
  reproducible. When a catalog advertises no version the pin can't be enforced
  and installation proceeds.

Added regression tests: independent pre-existing component survives removal;
version-mismatch refusal (helper + workflow path).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(scripts): add SPECIFY_INIT_DIR to target a member project from the repo root (#2892)

* feat(scripts): add SPECIFY_INIT_DIR to target a member project from the repo root

Resolve an explicit SPECIFY_INIT_DIR project override once in the core
get_repo_root / Get-RepoRoot, so a non-interactive / CI caller can target a
member project (the directory containing .specify/) from a monorepo root
without cd. Strict by design: the path must exist and contain .specify/,
otherwise it hard-errors with no silent fallback.

- Single resolver in core; the git feature-branch script inherits it by
  sourcing core, with no per-extension copies.
- PS resolver verifies the resolved path is a directory (Resolve-Path also
  succeeds for files) so a file value errors as "not an existing directory".
- get_feature_paths splits decl/assignment so a SPECIFY_INIT_DIR failure
  propagates instead of being masked by `local`.
- create-new-feature-branch: when core is absent (only git-common loaded) and
  SPECIFY_INIT_DIR is set, hard-error rather than silently using the git root.
- Document SPECIFY_INIT_DIR and SPECIFY_FEATURE_DIRECTORY in the core reference.
- Tests for valid/relative/trailing-slash/file/missing/no-.specify targets,
  feature-axis composition, the no-core guard, and a PowerShell mirror.

* fix: guard SPECIFY_INIT_DIR with stale core scripts

* docs: clarify SPECIFY_FEATURE_DIRECTORY precedence wording

* fix: normalize trailing slash in PowerShell SPECIFY_INIT_DIR resolver

Resolve-Path preserves a trailing separator from its input, so a
SPECIFY_INIT_DIR ending in a slash returned a root that didn't match the
bash resolver (whose `cd && pwd` strips it). That broke
test_ps_trailing_slash_tolerated on the CI runners, which do have pwsh.
Trim it with TrimEndingDirectorySeparator (no-op on a bare root or a path
with no trailing separator).

Also fix the misleading test comment: the PowerShell mirror runs on the
CI ubuntu/windows runners (they ship pwsh), it is not skipped there.

* test: normalize bash path expectations on Windows

* docs: clarify SPECIFY_INIT_DIR root helpers

* chore: sync dogfooded .specify core scripts with SPECIFY_INIT_DIR

Mirror the SPECIFY_INIT_DIR resolver (resolve_specify_init_dir in
common.sh) into the committed dogfooding .specify/scripts/bash copies so
the git extension's create-new-feature-branch.sh finds an up-to-date
common.sh instead of failing with "requires updated Spec Kit core
scripts". Fixes the test_init_dir.py CI failures.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): harden remote catalog fetch and config parsing

- adapters: route catalog HTTP fetches through the shared authenticated
  client (authentication.http.open_url) so auth.json tokens apply and the
  Authorization header is stripped on cross-host/downgrade redirects.
  Reject any redirect that leaves HTTPS via a redirect_validator and
  re-validate the final URL after redirects, closing the urlopen
  auto-redirect MITM/downgrade gap.
- catalog_config._read: raise an actionable BundlerError when the config
  top level is not a mapping, 'catalogs' is not a list, or an entry is
  not a mapping, instead of letting list(<str>) produce a downstream
  AttributeError.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): tighten record read confinement, policy gate, and precedence

Addresses review 4534504799:

- records.load_records: confine the read via ensure_within(project_root,
  ...) so a symlinked/traversal-escaping .specify cannot read arbitrary
  files outside the project (matches the write path's within= guard).
- catalog_config._slug: lowercase so derived catalog ids are
  deterministic across platforms and case-variant duplicates can't slip
  past the case-sensitive dup check.
- installer.install_bundle: reword the docstring's misleading "atomic on
  failure" claim to describe the real scoped guarantee (record written
  only on full success; rollback limited to newly-installed components).
- bundle update: enforce the source install_policy like install, refusing
  to update from a discovery-only source (FR-025).
- catalog source precedence: the CLI now passes ~/.specify as the user
  config dir so project > user > built-in precedence is actually
  reachable (previously the user scope was silently ignored).
- .gitattributes: scope the specs whitespace exemption to the generated
  dogfooding feature dir (specs/001-spec-kit-bundler/**) instead of all
  of specs/**.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): no collateral refresh, catalog id integrity, loud info

Addresses review 4534571362:

- installer: in refresh mode (bundle update) only re-apply already-
  installed components that this bundle (or a sibling) owns. Components
  installed independently and tracked by no bundle are now skipped, never
  refreshed, so update cannot make collateral changes (FR-022).
- catalog.load_catalog_payload: validate each entry's own id is present
  and matches its enclosing bundles key, rejecting catalogs that would
  otherwise list a spoofed or unresolvable id.
- bundle info: stop swallowing manifest download failures. If the
  manifest can't be resolved (e.g. --offline against an https download_url
  or a download failure), surface the error and exit non-zero instead of
  silently degrading to catalog `provides` counts, preserving the "info
  == what install applies" guarantee.

Added regressions: refresh leaves independently-installed components
untouched, catalog id key/field mismatch + missing id rejection, and
info exits non-zero when the manifest is unresolvable offline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): confine catalog-config and integration-marker reads

Addresses review 4534716790: two more state reads bypassed the
symlink/path-escape confinement that records and the write paths already
enforce.

- catalog_config._read: validate the config path with
  ensure_within(project_root, ...) before exists()/read, so a symlinked
  .specify resolving outside project_root is rejected instead of read.
- lib.project.active_integration: confine the .specify/integration.json
  read the same way; an out-of-tree escape is treated as "not
  determinable" (returns None) rather than followed.

Added regressions covering both via a symlinked .specify pointing
outside the project root.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): validate manifest tags, disambiguate derived ids by full host

Addresses review 4534768419:

- manifest.from_dict: reject a non-list `tags` (e.g. a bare string) instead
  of splitting it character-by-character, matching the catalog parser and
  the schema contract (tags = list of strings).
- catalog_config._derive_id: derive ids from the full host (TLD included)
  so example.com and example.net no longer collide on the same id. Updated
  the affected id assertions.
- CHANGELOG: call out the new `specify bundle` command group in the
  unreleased section (the PR's headline user-facing feature).
- .gitattributes: clarify the specs whitespace exemption — the dogfooding
  feature dir is scrubbed before merge (not retained), so it doesn't weaken
  checks for kept docs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(gitattributes): retain whitespace exemption for constitution.md

The project constitution (.specify/memory/constitution.md) is the one
dogfooding artifact carried forward past the pre-merge scrub. Give it its
own standalone whitespace exemption so it survives removal of the broader
.specify/** generated-scaffolding exemption.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): accurate uninstall count, confine catalog read, safe bundle id

Addresses review 4534812056:

- installer.remove_bundle: only count a component as uninstalled when
  installer.remove() actually ran; components already absent on disk are
  reported as skipped, keeping the uninstalled count accurate.
- catalog.load_source_stack: confine the project-scoped .specify config read
  with ensure_within, so a symlinked .specify/ resolving outside the project
  root is refused (consistent with the bundler's other guarded reads).
- manifest: enforce a filesystem-safe slug for bundle.id in structural
  validation; packager.build_bundle adds an ensure_within defense-in-depth
  check so a crafted id can never push the artifact outside the output dir.

Also reverts the CHANGELOG entry (the changelog is updated separately).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): validate requires/provides shapes in manifest and catalog

Addresses review 4534855443:

- manifest: validate requires.tools and requires.mcp as list-of-strings via
  a shared _parse_str_list helper (also reused for tags), so a bare string
  like `tools: docker` is rejected with an actionable BundlerError instead of
  being split character-by-character.
- catalog.CatalogEntry.from_dict: validate that `requires` and `provides` are
  mappings before accessing them, so an untrusted catalog payload with
  `requires: "..."` raises a named BundlerError rather than escaping as a raw
  AttributeError traceback.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): require README.md when building a bundle artifact

Addresses review 4534938014: build_bundle now fails early with an
actionable error when README.md is missing, matching the documented
artifact contract (manifest + README) instead of silently producing a
bundle with no human-facing description.

Also reverts CHANGELOG.md to the upstream/main copy.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): validate record shapes; drop stale install --refresh claim

Addresses review 4534969692:

- records.InstalledBundleRecord.from_dict: hard-error when
  contributed_components is not a list, instead of iterating a corrupt
  bare string character-by-character.
- records.load_records: validate the top-level 'bundles' field is a list and
  fail with a clear BundlerError when a corrupt file makes it a mapping/string.
- PR description: remove the inaccurate "supports --refresh" note from
  `bundle install` (refresh is the `bundle update` path); docs already omit it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): refuse symlinked .specify, reject bad url schemes, IPv6 ids

Addresses review 4534997724:

- lib.project.find_project_root: a symlinked .specify is no longer accepted
  as a project root (is_dir() follows symlinks), matching the confinement the
  rest of the CLI applies and avoiding confusing downstream failures.
- catalog_config.add_source: reject unsupported url schemes (ssh://, ftp://,
  ...) up front instead of silently treating them as local paths; local paths
  containing ':' but not '://' are still allowed.
- catalog_config._derive_id: derive the host via urlparse().hostname so IPv6
  literals, credentials, and ports no longer corrupt the derived id.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): strict semver, narrow artifact skip, preserve priority 0

Addresses review 4535084048:

- versioning.is_semver: enforce a full MAJOR.MINOR.PATCH SemVer (with optional
  pre-release/build) via a dedicated regex, instead of accepting any
  packaging.version.Version-parseable string (e.g. "1", "1.0"). This makes
  BundleManifest.structural_errors() reject non-semver versions.
- packager: narrow the prior-artifact skip pattern to semver-named zips
  (<id>-<x.y.z>.zip) so legitimate assets like <id>-assets.zip are still
  packaged.
- primitives (preset + extension install): use an explicit `is None` check so
  an intentional priority of 0 is preserved instead of being replaced by the
  default.

Adds regressions: non-semver rejection ("1"/"1.0"/"1.2.3.4"), asset-not-
excluded vs semver-artifact-excluded, and priority-0 pass-through.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): artifact regex for prerelease+build; clarify integration/priority docs

Addresses review 4535132279:

- packager: the prior-artifact skip regex now matches semver names carrying
  both a prerelease and build-metadata segment (e.g. 1.0.0-rc1+build5), so such
  an existing artifact is excluded rather than re-packaged — keeping builds
  bounded/deterministic, consistent with is_semver().
- docs/reference/bundles.md: correct the install integration wording.
  --integration selects the integration when initializing a new project and
  confirms the target when a pinned bundle's active integration can't be
  determined; it does NOT override a bundle that targets a specific integration
  (a mismatch aborts with no changes).
- examples/security-researcher README: reword the preset priority note in terms
  of the numeric comparison (ascending priority order) to avoid inverting the
  meaning.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): --integration can't bypass clash guard; honest rollback docs

Addresses review 4535159341:

- bundle install: for an already-initialized project, the project's recorded
  active integration is now authoritative. --integration no longer overrides it
  (which let a copilot project install a claude-pinned bundle via
  `--integration claude`, bypassing the FR-019 clash guard). The override still
  selects the integration at init time and confirms the target only when the
  active integration cannot be determined.
- docs/reference/bundles.md: reword the install guarantee to match the
  implementation — no provenance record is written unless the install fully
  succeeds, and rollback of this run's components is best-effort (removal errors
  are swallowed, so partial on-disk state may remain). Dropped the inaccurate
  "atomic / rolls back everything" claim.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): validate component kind/id when loading records

Addresses review 4535194606: _component_from_dict now rejects a contributed
component whose 'kind' is not a supported component kind or whose 'id' is
empty, raising a BundlerError that explicitly flags the records file as
corrupt. Previously such a record loaded successfully and only failed later
(e.g. in primitive_manager() during bundle remove/update) with a less
actionable error.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): address review 4535234003 (7 findings)

- versioning: tolerate an uppercase `V` prefix in `_normalize_semver` and
  `is_semver`, mirroring specify_cli._version tag normalization (V -> v) so
  `V1.2.3` parses and validates consistently.
- validator: import BundlerError and narrow the speckit_version constraint
  except clause to `BundlerError` only, so programming errors are no longer
  masked behind an "invalid constraint" message.
- bundle update: accept `--integration` and thread it through
  resolve_install_plan the same way `bundle install` does (override used only
  when the active integration can't be auto-detected), so integration-pinned
  bundles can be updated where `.specify/integration.json` is missing/unreadable.
- bundle validate: fold reference warnings into `report.warnings` so the
  ValidationReport is the single warning channel at the CLI layer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(bundler): make update --integration help assertion ANSI-safe

Rich can split the "--integration" option label with ANSI escape codes
between the two leading dashes, so the literal substring check failed under
CI's terminal settings. Match the un-split option word instead, mirroring how
test_bundle_help_lists_all_commands checks bare command names.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): preserve exec bits in artifacts; document install-time pins

Addresses review 4535280786:

- packager.build_bundle: no longer forces every ZIP member to 0644, which
  stripped the executable bit from bundled scripts (e.g. extension hook
  scripts) and could break them after extraction. Permissions are now
  normalized reproducibly to 0755 when the source file has any execute bit
  set, otherwise 0644 — identical inputs still yield byte-for-byte identical
  artifacts.
- installer.install_bundle + docs/reference/bundles.md: document that version
  pins are enforced install-time only. Because primitive is_installed checks
  are id-based (not version-aware), an already-present component is skipped
  during install without comparing its on-disk version to the manifest pin;
  pins are guaranteed applied only on a real install or `bundle update` refresh.

Added a regression asserting executable sources map to 0755 and plain files to
0644 in the built artifact.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(bundler): skip exec-bit packager test on Windows

Windows filesystems do not carry Unix execute bits, so chmod(0o755) is a no-op
and the source file reports no execute bit — the packager then correctly stores
the member as 0644. The assertion that an executable source maps to 0755 is only
meaningful on POSIX, so skip it on nt rather than asserting platform-specific
behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): normalize prerelease spellings inside version constraints

Addresses review 4535327154: parse_version() normalized SemVer prerelease
spellings (e.g. 1.2.3-rc1 -> 1.2.3rc1) but parse_constraint() passed the
constraint to packaging.SpecifierSet unmodified, so ">=1.2.3-rc1" raised
InvalidSpecifier even though the same spelling is accepted for installed
versions. parse_constraint() now normalizes the version portion of each
comma-separated clause via the shared _normalize_semver helper, so prerelease
handling is consistent across versions and constraints.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bundler): validate schema versions and required record identity fields

Addresses review 4535351596:

- records.load_records: validate the on-disk 'schema_version' (required;
  forward-compatible across same-major minor bumps) and fail fast with an
  actionable error on a missing/unknown version, rather than silently parsing a
  possibly-incompatible format and risking incorrect bundle attribution/removal.
- records.InstalledBundleRecord.from_dict: treat missing 'bundle_id' or
  'version' as corruption and raise BundlerError, instead of coercing them to
  empty strings that let later list/remove/update operations behave
  unpredictably.
- catalog_config._read: validate 'schema_version' when present (same-major
  compatibility) and fail fast on an unsupported version so an incompatible
  future config shape can't be mis-parsed into a wrong effective catalog stack.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(bundler): scrub generated dogfooding scaffold before merge

The bundler feature was developed by dogfooding Spec Kit on itself. Now that
the work is complete, remove all generated scaffolding so it does not land in
the repository on merge:

- specs/001-spec-kit-bundler/** (spec, plan, research, data-model, contracts,
  quickstart, tasks, checklists)
- .specify/** (extensions, integrations, scripts, templates, workflows,
  feature/init/integration metadata)
- .github/agents/speckit.*.agent.md, .github/prompts/speckit.*.prompt.md, and
  .github/copilot-instructions.md (Copilot integration scaffold)

Retained: .specify/memory/constitution.md — the single dogfooding artifact
carried forward — with its whitespace exemption in .gitattributes.

.gitattributes and .markdownlint-cli2.jsonc are reverted to the upstream
baseline (plus the constitution whitespace exemption), dropping the now-moot
exemptions for the removed scaffold.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Pascal THUET <pascal.thuet@arte.tv>
2026-06-19 17:07:20 -05:00
Pascal THUET
d9370d909d fix: isolate per-extension failures so one bad extension can't drop the rest (#2951)
* fix: isolate per-extension failures in register_enabled_extensions_for_agent

The per-extension loop had no error isolation: if registering one enabled
extension raised (e.g. an OSError writing a command file), the loop aborted and
the exception propagated, so every subsequent enabled extension was silently
skipped. Callers wrap the whole call in a single best-effort try/except, so the
wholesale abort surfaced as one warning while the command still exited 0 —
leaving the agent with only a prefix of its extensions.

Wrap the per-extension body in try/except: warn (naming the extension) and
continue, so one bad extension can no longer drop the others. Add a regression
test that forces the first-iterated extension to raise and asserts the rest
still register.

Closes #2950

* fix(extensions): preserve command registry when skills fail

* fix: clarify skill registration warning
2026-06-19 12:41:02 -05:00
Pascal THUET
a17a658bbd feat(scripts): add SPECIFY_INIT_DIR to target a member project from the repo root (#2892)
* feat(scripts): add SPECIFY_INIT_DIR to target a member project from the repo root

Resolve an explicit SPECIFY_INIT_DIR project override once in the core
get_repo_root / Get-RepoRoot, so a non-interactive / CI caller can target a
member project (the directory containing .specify/) from a monorepo root
without cd. Strict by design: the path must exist and contain .specify/,
otherwise it hard-errors with no silent fallback.

- Single resolver in core; the git feature-branch script inherits it by
  sourcing core, with no per-extension copies.
- PS resolver verifies the resolved path is a directory (Resolve-Path also
  succeeds for files) so a file value errors as "not an existing directory".
- get_feature_paths splits decl/assignment so a SPECIFY_INIT_DIR failure
  propagates instead of being masked by `local`.
- create-new-feature-branch: when core is absent (only git-common loaded) and
  SPECIFY_INIT_DIR is set, hard-error rather than silently using the git root.
- Document SPECIFY_INIT_DIR and SPECIFY_FEATURE_DIRECTORY in the core reference.
- Tests for valid/relative/trailing-slash/file/missing/no-.specify targets,
  feature-axis composition, the no-core guard, and a PowerShell mirror.

* fix: guard SPECIFY_INIT_DIR with stale core scripts

* docs: clarify SPECIFY_FEATURE_DIRECTORY precedence wording

* fix: normalize trailing slash in PowerShell SPECIFY_INIT_DIR resolver

Resolve-Path preserves a trailing separator from its input, so a
SPECIFY_INIT_DIR ending in a slash returned a root that didn't match the
bash resolver (whose `cd && pwd` strips it). That broke
test_ps_trailing_slash_tolerated on the CI runners, which do have pwsh.
Trim it with TrimEndingDirectorySeparator (no-op on a bare root or a path
with no trailing separator).

Also fix the misleading test comment: the PowerShell mirror runs on the
CI ubuntu/windows runners (they ship pwsh), it is not skipped there.

* test: normalize bash path expectations on Windows

* docs: clarify SPECIFY_INIT_DIR root helpers
2026-06-19 12:05:42 -05:00
Ed Harrod
98ee02a98b feat(claude): run /analyze in a forked subagent (#2511)
* claude: run /analyze in a forked subagent

/analyze is explicitly read-only and produces a compact analysis
report from heavy artefact reads (spec.md, plan.md, tasks.md). It
matches the canonical use case for context: fork — bulk inputs that
collapse to a short summary, no need for conversation history.

Forking keeps the artefact contents out of the main conversation
context, which is the concern raised in #752.

Done as a per-command opt-in via FORK_CONTEXT_COMMANDS so other
spec-kit commands (which are interactive or have side effects) are
unaffected.

Refs #752

* claude: apply per-command frontmatter on every skill-generation path

argument-hint and fork context were injected only in setup(), so skills
produced via post_process_skill_content() directly (presets, extensions)
lost them - e.g. a preset overriding speckit-analyze dropped context: fork.

Move the per-command injection into post_process_skill_content(), deriving
the command stem from the frontmatter name, so all generation paths stay
consistent. setup() now just calls post_process_skill_content().

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* claude: drop redundant post-process loop from setup

SkillsIntegration.setup() already runs post_process_skill_content()
on every SKILL.md before writing it, and that method now applies the
argument-hint and fork-context injection. The per-file re-process loop
in ClaudeIntegration.setup() was therefore a no-op, so inherit the
base setup() directly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 10:28:45 -05:00
LuoHui1
4eda983950 fix: count worktree branches in git extension numbering (#3054)
* fix: count worktree branches in git extension numbering

* fix: preserve literal plus branch prefixes
2026-06-18 09:40:32 -05:00
Ben Buttigieg
0c29d890ab feat: add /speckit.converge command (#3001)
* Add /speckit.converge SDD artifacts and project scaffolding

Dogfood the converge feature through Spec Kit's own workflow:

- spec.md, plan.md, tasks.md, research, data-model, contracts, quickstart
- requirements checklist for the feature
- ratified constitution v1.0.0 (.specify/memory)
- Specify project scaffolding (.specify/, .github agent + prompt files)

Defines a built-in /speckit.converge command that assesses spec/plan/tasks
against the codebase and appends remaining work as new tasks (no git, no
change tracking, append-only). Implementation not yet started.

Excludes unrelated working-tree changes to agents.py, extensions.py,
test_extensions.py, catalog.community.json, and README.md.

* Implement /speckit.converge command

Add the built-in converge command that assesses the codebase against a
feature's spec.md, plan.md, and tasks.md and appends remaining unbuilt work
as new traceable tasks to tasks.md (append-only; no git, no change tracking).

- templates/commands/converge.md: full command body (load artifacts, assess
  code, classify findings missing/partial/contradicts/unrequested, append
  '## Phase N — Convergence' tasks with source-ref + gap-type, read-only
  guardrails, converged branch, handoff, before/after_converge hooks)
- Register converge as a core command across all enumeration sites
  (SKILL_DESCRIPTIONS, _FALLBACK_CORE_COMMAND_NAMES, ARGUMENT_HINTS, and the
  integration test command lists incl. copilot/generic file inventories)
- init.py Next Steps panel + README Core Commands table
- tasks.md: T001-T024 complete (T025 manual quickstart pending)

Full suite green: 2343 passed.

* Record quickstart validation results for /speckit.converge (T025)

All six quickstart scenarios validated (GitHub Copilot agent, macOS/zsh):
S1 gap->appended traceable task, S2 implement+re-converge, S3 converged leaves
tasks.md unchanged, S4 read-only boundaries, S5 missing-prereq stop, S6 cross-
integration install (copilot + windsurf). Automated suite: 2343 passed.

* Record 2026-06-16 re-verification results for /speckit.converge (T025)

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Fix integration upgrade deleting settings.json and dropping script +x

Two upgrade-path bugs surfaced during converge E2E validation:

- copilot upgrade stale-deleted .vscode/settings.json because setup() only tracks the file when it creates it; on upgrade the pre-existing file is merged and left untracked, so Phase 2 stale cleanup removed it. Add an integration-level stale_cleanup_exclusions() hook (CopilotIntegration returns {.vscode/settings.json}) and subtract it from stale_keys.

- shared .specify/scripts/*.sh lost their execute bit because the managed refresh rewrites them with the bundled source mode (often 0o644) and nothing restored perms. Call ensure_executable_scripts() after the managed-refresh block (POSIX only).

Add regression tests in TestIntegrationUpgrade covering both fixes (validated to fail without the fixes).

* fix: resolve markdownlint errors in PR files

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: clean up runtime state files from PR

Remove .specify state files that are per-project runtime artifacts:
- feature.json, init-options.json, integration.json
- manifest files, extension registry, bug artifacts

These are generated by 'specify init' and should not be committed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: fold converge artifacts from #3003 and #3005

- Add speckit.converge Copilot agent and prompt files (#3003)
- Add regression test for Claude argument hints (#3005)
- Remove invalid converge entry from Claude argument hints
- Fix documentation removing branch-prefix fallback claims

Supersedes: #3003, #3005

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: remove non-converge specify scaffolding from PR

Remove .specify/ artifacts, non-converge .github/agents and prompts,
and copilot-instructions.md that were generated by 'specify init'
and are not part of the converge command feature.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: remove SDD spec artifacts from PR

Remove specs/001-converge-command/ — the spec/plan/tasks/research SDD
artifacts produced while building this feature. spec-kit does not track
a specs/ directory on main (those are outputs of running the workflow on
the repo, not part of the shipped tool).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: remove generated Copilot converge command files

Remove .github/agents/speckit.converge.agent.md and
.github/prompts/speckit.converge.prompt.md — these are generated by
'specify init --integration copilot' from templates/commands/converge.md
(all __SPECKIT_COMMAND_*__/{SCRIPT} tokens are resolved). main tracks no
.github/agents or .github/prompts files; the template is the source of truth.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: split out unrelated integration-upgrade fix

Move the stale_cleanup_exclusions / executable-bit upgrade fix
(base.py, copilot, _migrate_commands.py, test_integration_subcommand.py)
out of this PR into its own change. This PR is now scoped purely to the
/speckit.converge command.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: add converge to core command template ordering

converge is a core command in SKILL_DESCRIPTIONS but was missing from
_CORE_COMMAND_TEMPLATE_ORDER, so it sorted with the fallback rank. Add it
after 'implement' to keep core-command ordering consistent across
integrations.

Addresses review feedback on #3001.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: make converge findings example neutral

Replace the self-referential sample evidence text in the Convergence
Findings table with a neutral placeholder so agents are less likely to copy
nonsensical template-specific findings into real output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* docs: clarify converge scope and hook outcome wording

- Remove FR-specific parenthetical from code-scope rule so it doesn't imply
  a hard FR-001 reference exists in every feature
- Replace unsupported 'pass outcome to hook context' instruction with explicit
  in-session outcome reporting before hook listing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: align converge task example with tasks format

Use  (no colon) in the convergence task example so it
matches tasks-template formatting and downstream expectations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Clarification of usage

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* docs: align converge phase/task-id format with tasks template

- Use  (colon) for consistency with tasks template
- Clarify appended task IDs must be zero-padded ( style)
- Update checklist example to a concrete zero-padded ID ()

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: standardize converge phase heading format

Use  consistently in converge.md (including the
append-only contract section) to match Step 7 and tasks template style.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-17 14:47:00 -05:00
Ben Buttigieg
84db931f18 fix: preserve .vscode/settings.json and script +x bit on integration upgrade (#3020)
* fix: preserve .vscode/settings.json and script +x bit on integration upgrade

During 'specify integration upgrade', Phase 2 stale-cleanup removes files
present in the old manifest but absent from the new one. Copilot's setup()
merges into an existing .vscode/settings.json and stops tracking it, so the
file was being deleted on upgrade (destroying user settings). Add a
stale_cleanup_exclusions() hook that integrations use to protect such
conditionally-tracked merge targets. Also restore the executable bit on
shared .sh scripts after the managed-refresh step on POSIX.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address review on stale-cleanup fix

- Normalize stale_cleanup_exclusions() to POSIX before subtracting from
  manifest keys, so exclusions built with os.path.join / backslashes still
  match on Windows.
- Strengthen test_upgrade_preserves_existing_vscode_settings to add a
  user-defined key and assert it survives the upgrade (via --force, exercising
  the merge + stale-cleanup path) instead of the brittle after == before check.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-17 14:22:04 -05:00
Huy Do
affbf5ead5 feat(workflows): add from_json expression filter (#2961)
* feat(workflows): add from_json expression filter

Step outputs captured as strings could never become typed values in
templates - the filter set was default/join/map/contains only, so e.g.
a fan-out items: could never consume a step's JSON stdout. Add an
arg-less from_json pipe filter with parse-or-raise semantics: invalid
JSON or non-string input raises a clear ValueError rather than passing
through silently.

Fixes #2960

* fix(expressions): make from_json strict — reject any arguments

Address review (#2961): from_json('x') and from_json() previously fell through to a silent passthrough of the unparsed value. Reject any parenthesized form with a clear error so mis-wired templates fail loudly. Rename test to ...parses_object (JSON under test is an object) and add coverage for the strict no-arguments behavior.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs(workflows): document the from_json expression filter

Address Copilot review: the user-facing filter references omitted the
newly added `from_json` filter. Add it to the ARCHITECTURE.md filter table
(with the `{{ steps.emit.output.stdout | from_json }}` example) and to the
filter enumerations in workflows/README.md and docs/reference/workflows.md
so the docs match the evaluator's capabilities.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): make from_json strictness reject trailing tokens; fix docstring

Address Copilot review:
- Strictness only rejected parenthesized forms, so typos like
  `| from_json)` or `| from_json extra` still fell through to the
  unknown-filter path and silently returned the unparsed value. Match on
  the leading filter token and require the whole filter to be exactly
  `from_json`, so every mis-wired form raises. Extend the rejection test to
  cover the trailing-token cases.
- The module docstring claimed "no imports", which is misleading now that
  the module imports `json`. Reword to state the actual sandbox guarantee:
  templates cannot do file I/O, import modules, or run arbitrary code.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-17 13:43:26 -05:00
Copilot
00bff788c9 Add init workflow step to bootstrap projects like specify init (#2838)
* Initial plan

* Add init workflow step to bootstrap projects like `specify init`

* Address review: simplify stderr capture and extract VALID_SCRIPT_TYPES

* Address review: fail fast on non-empty dir, stdout fallback, README force fix

* Populate exit_code/stdout/stderr in non-empty-dir fast-fail

* fix: address three unresolved review comments in InitStep

- Use `with os.scandir(...)` context manager so the iterator is always
  closed even when `any()` short-circuits, preventing file-descriptor
  leaks in long-running workflow runs.
- Guard `os.chdir(prev_cwd)` in the `finally` block with a try/except
  so an `OSError` (e.g. directory deleted) doesn't bypass returning
  the captured `StepResult`.
- Reject non-string `script` values in `validate()` with a clear error
  message, rather than silently passing them through to become
  `--script True` at runtime.

* Potential fix for pull request finding 'Empty except'

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

* fix: remove no_git and branch_numbering options removed upstream

The --no-git and --branch-numbering flags were removed from `specify init`
on main. Update InitStep to drop these unsupported config fields and fix
tests accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address review — integration defaults, integration_options, engine-owned dirs

- Apply DEFAULT_INIT_INTEGRATION fallback when neither step config nor
  workflow context provides an integration, so output.integration always
  reflects the actual integration used.
- Add integration_options config field to support --integration-options
  passthrough (required for generic integration and --skills mode).
- Exclude .specify/ from the non-empty directory fast-fail check so that
  here: true works when the engine has already created its run-state
  directory before steps execute.
- Note: mix_stderr=False is not needed — Click 8.2+ captures stderr
  separately by default and the existing try/except handles access.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: implicitly add --force when only engine-owned dirs exist

When the workflow engine creates .specify/workflows/runs/ before steps
execute, the directory is technically non-empty. Previously, specify init
would prompt for confirmation (hanging in unattended mode) unless the
user explicitly set force: true. Now the step detects that only
engine-owned directories (.specify/) are present and implicitly adds
--force so init proceeds without user interaction.

Also fixes the test to exercise the implicit-force path rather than
passing force: True explicitly (which bypassed the check entirely).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: derive VALID_SCRIPT_TYPES from shared constant, fail fast on OSError, include all resolved fields in output

- Derive VALID_SCRIPT_TYPES from SCRIPT_TYPE_CHOICES in _agent_config
  so the valid set cannot drift from the specify init CLI.
- Fail fast with a clear error when os.scandir() raises OSError (e.g.
  permission denied) instead of silently treating the directory as empty.
- Include preset, force, and ignore_agent_tools in all output dicts
  (both fast-fail and normal paths) for consistent interpolation and
  debugging downstream.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: populate stderr from stdout on older Click, fix force comment wording

- When Click does not expose result.stderr (older versions where stderr
  is mixed into stdout), use stdout as stderr on non-zero exit so
  workflows can consistently read steps.<id>.output.stderr for errors.
- Update README inline comment for force: wording to say 'when target
  directory already exists' rather than 'non-empty directory', matching
  the actual specify init behavior for the project: form.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: build argv flags before early returns, use any() for dir scan

- Move argv flag-building (--integration, --script, --preset,
  --ignore-agent-tools) before the non-empty-dir and OSError early
  returns so output['argv'] always reflects the complete command.
- --force is appended after the check since it may be set implicitly.
- Replace list comprehension with any() generator expression to
  short-circuit without allocating a full list of DirEntry objects.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: only treat .specify as engine-owned when it is a real directory

A file or symlink named .specify should not be excluded from the
non-empty check. Use entry.is_dir(follow_symlinks=False) to ensure
only an actual directory is considered engine-owned content.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: guard implicit force for engine dirs only, fix integration fallback order

- Only set implicit --force when engine-owned directories (.specify/)
  are actually present. A completely empty directory no longer gets
  --force added unnecessarily.
- Fix integration resolution precedence: resolve step config expression
  first, then fall back to workflow default (also resolved), then to
  DEFAULT_INIT_INTEGRATION. Previously, a step expression resolving to
  falsy would bypass the workflow default entirely.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Manfred Riem <15701806+mnriem@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: Manfred Riem <mnriem@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-17 11:46:51 -05:00
Huy Do
071f784dfa feat(workflows): opt-in output_format: json exposes parsed shell stdout as output.data (#2963)
* feat(workflows): opt-in output_format: json exposes parsed shell stdout as output.data

No step that runs external code could hand a typed value to a later
step, so e.g. a fan-out could never consume a runtime-computed
collection. With output_format: json declared, stdout is parsed and
exposed under output.data (raw keys unchanged); a parse failure fails
the step with a clear error. Without the key, behavior is unchanged.

Reference implementation for the proposal in #2962.

Addresses #2962

* test(shell): emit JSON via sys.executable for cross-platform output_format tests

Address review (#2963): replace non-portable echo '{...}' (Windows cmd.exe keeps the single quotes, breaking JSON) with the established '"{py}" "{script}"' pattern using sys.executable + a temp script, so the output_format tests pass on the Windows CI matrix. Also make the validate test's run inert (exit 0).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-17 08:09:17 -05:00
Huy Do
1ee2b626a8 fix: non-zero exit code when a workflow run ends failed or aborted (#2959)
* fix: non-zero exit code when a workflow run ends failed or aborted

workflow run and workflow resume printed Status: failed (or emitted the
--json payload) and exited 0, so scripts and CI could not rely on the
process exit code. Map terminal outcomes: failed|aborted -> 1,
completed|paused -> 0, on both the text and --json paths.

The previous exit-0-on-failed behavior was pinned by
test_workflow_run_failing_yaml_without_project; the pin is updated to
the new contract.

Fixes #2958

* test: portable exit-code step commands + cover resume failed->exit-1

Address review (#2959): replace non-portable run: 'true'/'false' with 'exit 0'/'exit 1' (Windows cmd.exe has no true/false builtins under shell=True), and add an end-to-end 'workflow resume' test asserting a resumed failed run exits non-zero.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-17 08:02:00 -05:00
Seiya Kojima
811a3aa447 fix(skills): preserve non-ASCII characters in skill frontmatter (#2917)
* fix(skills): preserve non-ASCII chars in skill frontmatter

Skill SKILL.md frontmatter descriptions containing non-ASCII
characters were escaped to \uXXXX / \xXX sequences because
yaml.safe_dump() was called without allow_unicode=True.

- Add allow_unicode=True to the 7 skill/command frontmatter
  safe_dump sites (extensions, presets, claude integration)
- Add regression tests for the render and extension-install paths

Follows the approach of #1936; encoding="utf-8" is already set on
the affected write paths, so no encoding change is needed here.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(_utils): add dump_frontmatter helper

Centralize skill/command frontmatter YAML serialization into a single
_utils.dump_frontmatter helper so no call site can drop allow_unicode or
diverge on formatting. Route the 7 existing sites through it and drop a
now-unused local yaml import.

Switch the extension test fixtures to yaml.safe_dump for parity with the
production safe-dump/safe-load codepaths.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 07:57:54 -05:00
Jina Park
de18d21b1c fix: prevent extension self-install from deleting source dir (#2990) (#2991)
* fix: prevent extension self-install from deleting source dir (#2990)

`specify extension add <path> --dev --force` permanently deleted the
extension directory without registering it when the source path resolved
to the extension's own install location (`.specify/extensions/<id>`).

With `--force`, `install_from_directory()` removed the existing
installation (the source) and then `shutil.copytree()` tried to copy from
the now-deleted directory, destroying it and crashing.

Add a guard that fails fast with a clear ValidationError when the resolved
source path equals the install destination, before any destructive
operation runs. Includes a regression test asserting the directory and its
contents survive.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix: harden extension self-install guard

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 07:56:17 -05:00
Manfred Riem
75aee19c6e fix: disable Rich Live transient mode on Windows to prevent PS 5.1 hang (#2938)
* fix: disable Rich Live transient mode on Windows to prevent PS 5.1 hang

PowerShell 5.1's legacy console host does not reliably support VT escape
sequences. Rich's Live(transient=True) attempts cursor restoration on
context exit, which hangs indefinitely on that console.

Set transient=False when sys.platform == 'win32' in both init.py (progress
tracker) and _console.py (select_with_arrows). The only cosmetic effect is
that progress output remains visible after completion on Windows.

Fixes #2927

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: address review feedback on test quality

- Use captured['transient'] instead of .get() for clearer KeyError on failure
- Source guards now assert both the platform check AND transient=_transient usage
- Remove unused imports (MagicMock retained as it's used, removed pytest)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: use regex in source guards for resilience to formatting changes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: use single DOTALL regex to verify assignment flows into Live()

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: skip duplicate tracker print on Windows when transient=False

When transient is False, Rich leaves the Live output on screen. The
subsequent console.print(tracker.render()) would duplicate it. Gate
it behind _transient so Windows users see the tracker exactly once.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Manfred Riem <mnriem@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-17 06:48:42 -05:00
Copilot
c52ccd7dc7 Add workflow step catalog — community-installable step types (#2394)
* Initial plan

* Add workflow step catalog: StepRegistry, StepCatalog, CLI commands, and tests

Agent-Logs-Url: https://github.com/github/spec-kit/sessions/2885e646-477d-4df8-b9a3-06d8cb29e748

Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>

* Potential fix for pull request finding 'An assert statement has a side-effect'

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

* Address PR review: path traversal, cache robustness, collision check, failed-to-load display

- Add resolve()+relative_to() path traversal guards in workflow_step_add and
  workflow_step_remove to prevent directory escape via step_id
- Harden _is_url_cache_valid in both StepCatalog and WorkflowCatalog to
  coerce fetched_at to float and catch TypeError/ValueError
- Check STEP_REGISTRY and StepRegistry before installing to prevent
  collisions with built-in step types or already-installed steps
- Show 'Custom (installed, failed to load)' section in workflow step list
  for steps in the registry that failed to load into STEP_REGISTRY

* Fix StepRegistry shape validation and StepCatalog empty-YAML handling

Agent-Logs-Url: https://github.com/github/spec-kit/sessions/0dca6393-f5a9-40de-bb5c-77ba6af033d2

Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>

* Polish: rename _default to default_registry, strengthen unreadable-file test

Agent-Logs-Url: https://github.com/github/spec-kit/sessions/0dca6393-f5a9-40de-bb5c-77ba6af033d2

Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>

* Address PR review: atomic install, hostname validation, cache resilience, no dynamic imports in list/info

Agent-Logs-Url: https://github.com/github/spec-kit/sessions/3e18fef0-e2e6-4b3e-9e8d-9adb1e5e464e

Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>

* Fix shutil.move with existing step_dir: remove before move to avoid subdirectory nesting

Agent-Logs-Url: https://github.com/github/spec-kit/sessions/3e18fef0-e2e6-4b3e-9e8d-9adb1e5e464e

Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>

* Call load_custom_steps at execution time; enforce hostname in _safe_fetch and _validate_url

Agent-Logs-Url: https://github.com/github/spec-kit/sessions/73865880-fb25-4061-a43e-4e4b4d1c4de6

Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>

* Wrap YAML parsing in try/except; atomic step install via os.rename() under same fs

Agent-Logs-Url: https://github.com/github/spec-kit/sessions/ff915bc5-ec7e-4e6a-b505-35f5795250df

Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>

* Validate YAML root is a dict in _load_catalog_config and workflow_step_add; fix WorkflowCatalog hostname validation

Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>

* Fix load_custom_steps() package imports and add reserved step ID validation

* Move _re/_sys imports out of loop and _RESERVED_STEP_IDS to module level

* Address review: collision-resistant module names, extra_files support, remove orphan dir

* Harden extra_files: warn on non-dict, resolve symlinks in path traversal check

* Switch _safe_fetch and StepCatalog._fetch_single_catalog to use open_url for auth consistency

* Harden step_id validation against path-segment tricks; raise on StepRegistry.save() OSError

* Clean up sys.modules on broken step packages; handle StepValidationError in step add/remove

* Address review thread: int-coerce priorities, sys.modules cleanup, _require_specify_project, registry-first remove

* fix: normalize workflow step catalog metadata fallbacks

* fix: address latest workflow step and catalog review findings

* Handle non-string extra_files keys in workflow step add

* Harden StepRegistry symlink reads and extra_files path/URL validation

* Harden custom step loader and step remove against symlinks and OSError

* Fix StepCatalog.search() to coerce non-string fields before joining

* Fix WorkflowCatalog YAML parsing error handling and isinstance checks

* Harden step registry save and custom step/catalog ID handling

* Harden cache validation and staging OSError handling

* Address review: reorder symlink guard and split mixed test

- Move symlink-parent check before is_dir() in load_custom_steps() so
  we never stat an external target through a symlink
- Split test_get_merged_steps_normalizes_list_ids_to_strings into two
  focused tests: one for list-id normalization, one for get_step_info
  return values

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review: symlink-before-stat in loader, restore registry on rmtree failure

- load_custom_steps(): check is_symlink() before is_dir() on step
  directories so symlinked entries are skipped without statting external
  targets
- workflow_step_remove: restore the registry entry when shutil.rmtree()
  fails so filesystem and registry state stay consistent and a future
  'step add' isn't blocked

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Harden step_id validation and file-write error handling

- _validate_step_id_or_exit: reject whitespace-only/padded IDs,
  Windows-invalid characters (<>:"|?*), control characters, trailing
  dots/spaces, and Windows reserved device names (con, nul, etc.)
- Wrap step.yml/__init__.py staging writes in OSError handler
- Wrap extra_files disk writes (mkdir + write_bytes) in OSError handler
  that names the failing relative path
- Registry rollback on rmtree failure: restore verbatim metadata and
  emit a warning if the restore itself fails

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review: cache symlink guard, verbatim registry rollback, Windows test fix

- StepCatalog: add _is_cache_path_safe() guard that checks for symlinks
  in .specify/workflows/steps/.cache path; skip cache reads and writes
  when any component is symlinked to prevent writes outside project root
- Registry rollback: write metadata directly to registry.data['steps']
  and call save() instead of using add() which overwrites timestamps
- temp_dir fixture: use ignore_errors=True on Windows to avoid flaky
  teardown from locked file handles (WinError 32)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Simplify exec_module call by removing redundant nested try/except

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix empty YAML tolerance in WorkflowCatalog.add_catalog, scope ignore_errors to Windows

- WorkflowCatalog.add_catalog(): treat None from yaml.safe_load() (empty
  file) as an empty mapping instead of raising 'corrupted'
- temp_dir fixture: limit ignore_errors to sys.platform == 'win32' so
  real cleanup issues surface on non-Windows platforms

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Chain exceptions in _load_catalog_config for both catalog classes

Add 'from exc' to preserve root cause in tracebacks while keeping
clean user-facing messages.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Make default catalog tests hermetic by isolating HOME

Monkeypatch Path.home() to project_dir and clear catalog env vars so
tests don't break on machines with a real ~/.specify/step-catalogs.yml
or ~/.specify/workflow-catalogs.yml.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix falsy ID handling in _get_merged_steps for list-based catalogs

Check for None explicitly instead of using 'or' which drops valid
falsy IDs like 0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Compare reserved step IDs case-insensitively for filesystem safety

On case-insensitive filesystems (Windows, common macOS), variants like
STEP-REGISTRY.JSON would collide with the actual registry file.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add explanatory comments to intentional empty except blocks

Document why cache-read failures are silently ignored in both
WorkflowCatalog and StepCatalog _fetch_single_catalog methods.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: Manfred Riem <mnriem@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-16 18:03:45 -05:00
Pascal THUET
9cd20c6c25 feat(dev): add integration scaffolder (#2685)
* feat(dev): add integration scaffolder

* fix(dev): address integration scaffold review feedback

* fix(dev): address scaffold follow-up review

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fix(dev): default scaffolded integrations to multi_install_safe = False

The scaffold template emitted `multi_install_safe = True` alongside a
placeholder `context_file = "AGENTS.md"`. Registered as-is, that violates the
registry contract (test_safe_integrations_have_distinct_context_files): codex
already pairs AGENTS.md with multi_install_safe = True, so the generated
boilerplate would collide on first registration.

Default the scaffold to False (matching IntegrationBase) so generated code is
registry-test-friendly out of the box; contributors opt in once they pick a
unique context_file. Aligns the generated test skeleton and both scaffold
tests, which previously contradicted each other (one expected True, one False).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(dev): harden scaffold writes and accept case-insensitive --type

- Guard scaffold_integration() against symlinked target directories: walk
  each path component under the repo root and refuse symlinked dirs, then
  confirm the write destination resolves inside the repo (mirrors the
  manifest directory guard). Prevents scaffolding outside the repo when a
  contributor's integrations/tests path is symlinked.
- Make the `--type` click.Choice case-insensitive so `--type YAML` is
  accepted, matching scaffold_integration()'s strip()/lower() normalization
  instead of rejecting at the CLI layer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(dev): report scaffold filesystem failures as a clean CLI error

The `dev integration scaffold` command only caught FileExistsError/ValueError,
so an OSError raised during mkdir()/write_text() (permission denied, read-only
checkout, a path component that is a file, ...) bubbled up as a traceback
instead of a clean error + exit code. Broaden the handler to OSError (which
also covers FileExistsError) and add coverage for the filesystem-error path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(dev): move scaffold command under integration

* fix(dev): roll back partial scaffold writes

* fix(dev): correct lint docs and generated test docstring

- local-development.md: ruff check src/ is enforced in CI, not absent
- scaffolded test docstring: drop misleading 'scaffold' wording

* fix(scaffold): create only leaf integration directory

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 17:48:40 -05:00
Alicia Sykes
6d057b6239 fix(tests): don't run PowerShell tests via WSL-interop powershell.exe (#2971)
* fix(tests): don't run PowerShell tests via WSL-interop powershell.exe

* fix(tests): applies copilot feedback, with rename
2026-06-16 17:36:24 -05:00
Ahmet TOK
1150d32aee Add Zed integration (#2780)
* feat: add Zed integration

* fix: update integrations stats grid to 31 for consistency

* fix: address Copilot review feedback

- Remove non-actionable --skills flag from ZedIntegration (Zed is always
  skills-based, like Agy)
- Align zed_skill_mode predicate with ai_skills for consistency across
  init output and hook rendering
- Consolidate claude/cursor/zed slash-skill return blocks in
  _render_hook_invocation to reduce duplication
- Override test_options_include_skills_flag for Zed (no --skills flag)

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fix: address Copilot review round 2

- Make zed_skill_mode unconditional in hook rendering (Zed is always
  skills-based, no --skills option)
- Add test_init_persists_ai_skills_for_zed that exercises the actual
  CLI init path and verifies HookExecutor renders /speckit-plan
  without manual init-options manipulation

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fix: address copilot review feedback for zed integration

- Update integration count from 31 to 33 in docs/index.md (32 integrations + Generic)
- Make zed_skill_mode unconditional to match extensions.py behavior
- Consolidate slash-skill integrations into a set for consistency
- Move os import to module level in test_integration_zed.py

* fix: refine slash-skill logic and ai-skills validation

- Fix slash-skill integrations: Claude/Cursor require ai_skills=true; Zed/Agy/Devin are always skills
- Allow --ai-skills with --integration (not just --ai) to fix validation error

* fix: remove unused variables and update ai-skills help text

- Add agy_skill_mode and devin_skill_mode variables to fix F841 lint error
- Use all skill mode variables in the slash-skill conditional check
- Update --ai-skills help text to reflect it works with --integration too

* fix: add trae_skill_mode to hook invocation for consistency

Trae is a SkillsIntegration like Zed/Agy/Devin, so it should also be treated
as always-skills-based in hook invocation rendering.

* fix: make Agy always skills-based for consistency

AgyIntegration is a SkillsIntegration subclass with no --skills option,
so it should be treated as always skills-based (like Zed, Devin, Trae).
This aligns init.py skill mode detection with extensions.py hook rendering.

* fix: gate agy_skill_mode and refactor _render_hook_invocation to use sets

Addressed Copilot review comments:

- Restored _is_skills_integration guard on agy_skill_mode in init.py
  to be defensive about runtime integration type.
- Refactored _render_hook_invocation() in extensions.py to use
  always_slash/conditional_slash frozensets instead of individual
  per-agent booleans, eliminating unused variables (F841) and making
  it harder for conditions to drift between integrations.
- Centralized slash-skill determination so adding a new unconditional
  slash-skill integration is a one-key addition.

* fix: address latest Copilot review comments

- Added copilot to CONDITIONAL_SLASH_AGENTS for consistent
  hook invocation rendering with init.py
- Moved always_slash/conditional_slash frozensets to module
  scope to avoid per-call reallocation
- Replaced manual os.chdir() with monkeypatch.chdir() in test
- Overrode test_options_include_skills_flag for Zed (no --skills)

* fix: address latest Copilot review comments

- Removed redundant local import yaml in _register_extension_skills
  (yaml is already imported at module scope)
- Split --ai-skills usage hint into two separate print statements
  for better readability
- Changed integrations count from '33' to '30+' to avoid future drift

* fix: re-add _is_skills_integration definition lost in merge

The _is_skills_integration variable was accidentally dropped during the
web UI merge resolution of upstream/main's removal of legacy --ai flags.
Re-added the definition via isinstance(resolved_integration, SkillsIntegration)
check so that skill-mode booleans work correctly.

* fix: gate zed_skill_mode on _is_skills_integration for consistency

Aligns zed_skill_mode with the other skills-based agents (codex, claude,
cursor-agent, copilot) which all use _is_skills_integration gating.
Since ZedIntegration extends SkillsIntegration, behavior is unchanged.

* fix: remove unused claude_skill_mode and cursor_skill_mode locals in _render_hook_invocation

These variables became unused after the refactor to ALWAYS_SLASH_AGENTS /
CONDITIONAL_SLASH_AGENTS sets. Claude and Cursor-Agent are now handled by the
CONDITIONAL_SLASH_AGENTS path, so the separate boolean locals are dead code.

Fixes ruff F841 and addresses Copilot review feedback that was repeated across
multiple review rounds.

* fix: align agy/trae invocation format in init next-steps with hook rendering and build_command_invocation

- Moved agy and trae from '-<name>' (dollar/Codex format) to
  '/speckit-<name>' (slash format) in _display_cmd() to match:
  - HookExecutor._render_hook_invocation() (ALWAYS_SLASH_AGENTS for trae,
    CONDITIONAL_SLASH_AGENTS for agy)
  - SkillsIntegration.build_command_invocation() (default: /speckit-<name>)
- The '$' prefix is specific to Codex; all other skills agents use '/'.

* fix: address Copilot review comments on hook invocation consistency

- Add is_slash_skills_agent() helper to extensions.py to centralize the
  agent-to-invocation-format mapping, reducing drift risk between
  HookExecutor._render_hook_invocation() and init.py _display_cmd()
- Use the shared helper in both locations; init.py now imports and
  delegates to is_slash_skills_agent() instead of maintaining its own
  per-agent boolean matrix
- Fix test_hooks_render_skill_invocation to use ai_skills=False,
  proving Zed renders /speckit-<name> unconditionally
- Add parameterized TestSlashSkillsSets covering all agents in
  ALWAYS_SLASH_AGENTS and CONDITIONAL_SLASH_AGENTS with ai_skills
  both true and false

* fix: address Copilot review comments on type safety and test API

- Make is_slash_skills_agent() accept str | None to match its call sites
  (init_options.get("ai") can return None)
- Refactor TestSlashSkillsSets to use public execute_hook() API instead of
  private _render_hook_invocation() method

* fix: address Copilot review comments on typing and naming clarity

- Add from __future__ import annotations to extensions.py so PEP 604
  unions (str | None) are safe regardless of Python version
- Add clarifying _ai_skills_enabled local variable in init.py's
  _display_cmd() to make the semantic meaning explicit when passing it
  to is_slash_skills_agent()

* fix: move invocation-style logic into shared _invocation_style module

- Extract ALWAYS_SLASH_AGENTS, CONDITIONAL_SLASH_AGENTS, and
  is_slash_skills_agent() from extensions.py into new _invocation_style.py
  module, eliminating the awkward init.py -> extensions.py import
  dependency for invocation-style decision logic
- Both HookExecutor._render_hook_invocation() and init.py _display_cmd()
  now import from the shared module instead of one subsystem importing
  from the other
- Revert /SKILL.md change: the leading slash is semantically significant
  (path component vs filename suffix)

* fix: add None guard before i.options() in test_options_include_skills_flag

get_integration() returns IntegrationBase | None, so i.options()
is a type error without a None check.

* fix: override test_options_include_skills_flag for Zed (always skills, no --skills flag)

Zed is always skills-based and doesn't expose a --skills option.
Override the inherited base test to assert --skills is absent.

* fix: rename test and skip inherited test_options_include_skills_flag for Zed

- Skip inherited test_options_include_skills_flag (not applicable — Zed
  is always skills-based with no --skills flag)
- Add test_options_do_not_include_skills_flag with correct name matching
  the assertion (--skills is absent)

* fix: add defensive non-string check in is_slash_skills_agent

Reject non-string values for selected_ai to prevent TypeError from
set membership checks when persisted init-options contain corrupted
data (e.g. list or dict instead of string).

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-16 17:29:08 -05:00
Huy Do
36fd5f6f49 fix: fail loudly when a fan-out 'items' expression does not resolve to a list (#2957)
A non-list result from the items expression is a wiring error (the
template did not resolve to a collection); silently fanning out over
zero items hides it until a confusing downstream failure. Fail the
step with an error naming the expression instead. An explicit empty
list remains valid input.

Fixes #2956
2026-06-16 15:33:11 -05:00
darion-yaphet
f20e8ee6f7 refactor: move preset command handlers to presets/_commands.py (PR-6/8) (#2826)
* refactor(presets): convert presets.py module to presets/ package

Pure structural move to mirror integrations/. presets.py becomes
presets/__init__.py with relative imports rebased one level deeper.
No behavior change; public import surface (from .presets import ...)
preserved. Prepares for co-locating preset command handlers in PR-6/8.

* refactor: move preset command handlers to presets/_commands.py (PR-6/8)

Cut the preset_app / preset_catalog_app Typer groups and all 12 command
handlers out of __init__.py into presets/_commands.py, exposing register(app)
— mirrors the integration co-location from PR-5. __init__.py now registers
via _register_preset_cmds(app), dropping ~620 lines (3282 -> 2663).

Handlers lazy-import root helpers (_require_specify_project, get_speckit_version,
_locate_bundled_preset, _display_project_path) via 'from .. import' so test
monkeypatching of specify_cli.<helper> keeps working. _locate_bundled_preset
kept as an explicit re-export in __init__.py for that resolution path.

CLI surface and public imports unchanged. Full suite: 3162 passed, 40 skipped.
2026-06-16 14:52:12 -05:00
Ali jawwad
63a2a17305 fix(extensions): preserve argument-hint in extension Claude SKILL.md (#2916)
Extension-provided commands that declare `argument-hint:` in their
frontmatter had that field dropped from the generated Claude
`.claude/skills/<name>/SKILL.md`, while core template commands keep it.
The extension skill generator built the frontmatter via the shared
build_skill_frontmatter() (name/description/compatibility/metadata only)
and never forwarded argument-hint.

Carry argument-hint from the parsed source command frontmatter into the
skill frontmatter dict before serialization, gated on the integration
exposing inject_argument_hint so only argument-hint-aware agents (Claude)
receive the key and build_skill_frontmatter's shared shape stays unchanged
for every other agent. The value is injected into the dict rather than via
the string-based inject_argument_hint helper, so a folded multi-line
description cannot be split into invalid YAML.

Add regression tests covering a folding description (Claude) and the
non-Claude gate (kimi).

Closes #2903

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 07:28:39 -05:00
darion-yaphet
36ad3cde1b fix(presets): harden preset URL installs against unsafe redirects (#2911)
* Harden preset URL installs against unsafe redirects

Preset URL installs already rejected non-HTTPS source URLs, but the authenticated opener follows redirects. Validate the final response URL before writing the ZIP, preserve GitHub release asset URL resolution after the preset command module split, stream the response to disk, and keep catalog config serialization on safe YAML output.

Constraint: open_url follows redirects, so source URL validation alone does not constrain the downloaded target

Rejected: Keep response.read() for simplicity | large preset downloads should not be buffered entirely in memory

Confidence: high

Scope-risk: narrow

Directive: Keep preset URL policy aligned with workflow installer redirect validation

Tested: uvx ruff check src/specify_cli/__init__.py src/specify_cli/presets/__init__.py src/specify_cli/presets/_commands.py tests/test_presets.py

Tested: uv run pytest tests/test_presets.py -q

Not-tested: Real network redirect integration against a live HTTP server

Co-authored-by: OmX <omx@oh-my-codex.dev>

* Reject malformed preset download URLs

Preset downloads should fail early when a URL lacks a hostname, even if the scheme is HTTPS. The redirect error now describes any disallowed target instead of implying that only non-HTTPS redirects are blocked.

* Prevent credentialed preset redirects from downgrading transport

Preset URL downloads already checked the final URL after urllib followed redirects, but that was too late for authenticated requests because same-host redirects could preserve Authorization during the redirect itself. The authenticated HTTP helper now supports an opt-in redirect validator, and preset downloads use it to reject disallowed redirect targets before following them. The redirect auth handlers also stop preserving credentials across HTTPS to non-HTTPS downgrades as defense in depth.

* test(presets): 修复 URL 解析测试 mock 缺少 redirect_validator 参数

重定向安全加固为 open_url 新增 redirect_validator 参数,
两处 fake_open_url mock 签名未同步导致 TypeError。
补齐参数后全部 3717 个测试通过。

---------

Co-authored-by: OmX <omx@oh-my-codex.dev>
2026-06-11 07:21:50 -05:00
Manfred Riem
5ae7ff53d0 fix: skip recovered files during refresh_managed overwrite check (#2918) (#2919)
_is_managed() in install_shared_infra now consults manifest.is_recovered()
before treating a hash-matching file as managed. Files marked recovered
(pre-existing on disk, not installed by Spec Kit) are no longer overwritten
by integration use/switch even when their hash matches the manifest entry.

This closes the gap documented in the manifest API: callers using
refresh_managed MUST check is_recovered first.

Co-authored-by: Manfred Riem <mnriem@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-10 16:39:57 -05:00
Manfred Riem
40e48ed22c feat: add category and effect as first-class fields in extension schema (#2899)
* feat: add category and effect as first-class fields in extension schema

Add `category` and `effect` as optional fields in the extension schema
(`extension.yml`) and community catalog (`catalog.community.json`).

Schema changes:
- Valid categories: docs, code, process, integration, visibility
- Valid effects: read-only, read-write
- Both fields are optional (backward-compatible with existing extensions)
- Validation raises ValidationError for invalid values when present

Propagation:
- Added `category` and `effect` to all 108 entries in catalog.community.json
  (populated from the existing docs/community/extensions.md table)
- Updated extension template with commented category/effect fields
- Updated add-community-extension skill with new JSON template fields
- Updated `specify extension info` CLI output to display category/effect
- Added properties to ExtensionManifest class

Tests:
- test_valid_category: all 5 category values pass
- test_valid_effect: both effect values pass
- test_invalid_category: invalid value raises ValidationError
- test_invalid_effect: invalid value raises ValidationError
- test_category_and_effect_optional: omitting fields still works

Closes #2874

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: make category free-form, keep effect validated

Category is a free-form string (only validated as non-empty when present),
while effect remains restricted to 'read-only' or 'read-write'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address PR review feedback

- Add type guard before 'in' check for effect to prevent TypeError on
  unhashable YAML values (list/dict)
- Comment out category/effect in template so authors must opt in
- Use VALID_EFFECTS constant in test instead of hard-coded values

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: update category docstring to reflect free-form semantics

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: clarify canonical extension effect values

---------

Co-authored-by: Manfred Riem <mnriem@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-06-10 07:44:27 -05:00
Quratulain-bilal
de88c23bb6 fix(catalogs): validate extension and preset catalog payload shape (#2621)
* fix(catalogs): validate extension and preset catalog payload shape

`ExtensionCatalog._fetch_single_catalog` and
`PresetCatalog._fetch_single_catalog` only check that the `extensions` /
`presets` key is *present* in the parsed catalog JSON. They don't check
that the value is a JSON object, and they don't check that the root is
a JSON object at all. A malformed (or compromised) upstream catalog
returning:

    {"schema_version": "1.0", "extensions": []}

passes both `"extensions" not in catalog_data` and the subsequent
`response.read()` JSON parse, gets cached on disk, and then crashes
deep inside `_get_merged_extensions` (resp. `_get_merged_packs`) with:

    AttributeError: 'list' object has no attribute 'items'

instead of the existing user-facing
`ExtensionError("Invalid catalog format from <url>")` /
`PresetError("Invalid preset catalog format")` that the surrounding
code is clearly trying to produce.

The sibling integration-catalog reader already validates this — see
`src/specify_cli/integrations/catalog.py` where the fetch path
explicitly checks both `isinstance(catalog_data, dict)` and
`isinstance(catalog_data.get("integrations"), dict)` before returning.
This change mirrors that pattern in the extension and preset readers so
the three catalog fetchers stay consistent and a malformed upstream
surfaces as the user-facing error instead of a raw Python traceback.

Adds parametrized regression tests covering:
- root payload is not a JSON object (list, str, int, null)
- root is a dict but `extensions` / `presets` value is the wrong type
  (list, str, null, int)

All eight bad-payload shapes now raise the expected catalog error.

* fix(catalogs): skip non-mapping entries during extension and preset merge

Addresses Copilot review feedback on this PR.

`_fetch_single_catalog` now validates that the ``extensions`` / ``presets``
value is a mapping, but it doesn't (and shouldn't) validate every entry
inside that mapping. A payload like:

    {"schema_version": "1.0", "extensions": {"good": {...}, "bad": []}}

passes the fetch-level guard, then later crashes inside
``_get_merged_extensions`` (resp. ``_get_merged_packs``) at
``{**ext_data, ...}`` with ``TypeError: 'list' object is not a mapping``.

The sibling integration-catalog reader at
``src/specify_cli/integrations/catalog.py:245`` handles this with a
per-entry ``isinstance(integ_data, dict)`` skip during merge, so one
malformed entry doesn't poison an otherwise valid catalog. This change
mirrors that pattern in the extension and preset mergers and adds
regression tests asserting that valid entries continue to merge while
malformed siblings are silently dropped.

* fix(catalogs): validate cached extension and preset payload shape

Addresses Copilot review feedback on this PR (round 2).

The earlier commits in this branch added payload-shape validation on the
network fetch path. The cache-hit path still returned
``json.loads(cache_file.read_text())`` directly without re-checking the
shape, so a cache poisoned by an older spec-kit version (or a manual
edit, or an upstream that briefly served a bad payload before the
network guards landed) would re-crash every invocation of
``_get_merged_extensions`` / ``_get_merged_packs`` with
``AttributeError: 'list' object has no attribute 'items'`` despite the
cache being "valid" by age.

Extracts the shape validation into ``_validate_catalog_payload`` on both
``ExtensionCatalog`` and ``PresetCatalog``, and calls it from both the
cache-load and network-fetch branches of ``_fetch_single_catalog``. If
the cached payload fails validation, the cache read is treated like a
``json.JSONDecodeError`` — the cached value is discarded and the
function falls through to the network fetch, which refreshes the cache
with a clean payload on success. Never propagates ``AttributeError`` to
the caller.

Regression tests parametrize the four root-bad-type variants plus three
``extensions``/``presets``-bad-type variants per file, asserting that a
poisoned cache silently recovers via network refetch and returns the
freshly-fetched payload.

* fix(catalogs): include URL in missing-keys error to match sibling branches

Addresses Copilot review feedback on this PR (round 3).

``_validate_catalog_payload`` advertises in its docstring that the
catalog URL is included in error messages "so the user can tell which
catalog in a multi-catalog stack is malformed" — but the missing-keys
branch raised ``PresetError("Invalid preset catalog format")`` without
the URL, breaking that contract and making multi-catalog debugging
harder. The root-bad-type and nested-bad-type branches in the same
helper already include the URL; this commit brings the middle branch
in line.

For consistency, the same fix is applied to the legacy single-catalog
fetch paths in ``ExtensionCatalog.fetch_catalog`` and
``PresetCatalog.fetch_catalog`` (where the URL was likewise dropped
from the missing-keys error).

The existing regex matchers in the regression tests target the
``"Invalid (preset )?catalog format"`` prefix, which is preserved
verbatim before the ``from <url>`` suffix — no test changes needed.

* fix(catalogs): broaden cache except tuples and reuse validator in fetch_catalog

Addresses Copilot review feedback on this PR (round 4):

1. ``ExtensionCatalog.fetch_catalog`` and ``PresetCatalog.fetch_catalog``
   — the legacy single-catalog methods — still only checked key
   presence. A payload like ``42`` (root non-object) crashed with
   ``TypeError: argument of type 'int' is not iterable`` during the
   ``"schema_version" in catalog_data`` check, and an entry mapping of
   the wrong type crashed downstream. Both now reuse
   ``_validate_catalog_payload`` so the network-side behaviour of the
   legacy methods stays consistent with the multi-catalog
   ``_fetch_single_catalog`` path. (Copilot #3335623482, #3335623556.)

2. The cache-read ``except`` tuples in ``_fetch_single_catalog`` and
   ``fetch_catalog`` were too narrow. ``read_text`` can raise
   ``OSError`` (permissions / disk / handle limit) or ``UnicodeError``
   (cache file written by an older client in a different encoding)
   in addition to ``json.JSONDecodeError``. Without those in the
   tuple, an unreadable cache crashed the caller instead of falling
   through to the network refetch the cache contract documents. Both
   sites now catch ``(json.JSONDecodeError, OSError, UnicodeError,
   <DomainError>)``. (Copilot #3335623588, #3335623608.)

3. While here, pinned ``encoding="utf-8"`` on every cache ``read_text``
   call so cache files written by an older Windows client (with a
   non-UTF-8 default locale) decode the same way on a newer client.

Regression tests:

  - ``test_fetch_catalog_rejects_malformed_payload`` — 7 parametrized
    payloads per file covering root-non-object + nested-bad-type
    variants asserting ``fetch_catalog`` raises the named domain error.
  - ``test_fetch_catalog_recovers_from_unreadable_cache`` — writes
    ``b"\xff\xfe\x00not-utf-8"`` to the cache file and asserts
    ``fetch_catalog`` silently falls through to the mocked network and
    returns the freshly-fetched payload.

* fix(catalogs): harden cache-validity checks and pin UTF-8 on writes

The cache-best-effort contract added in 7f44b25 was incomplete on two
points raised by Copilot:

1. The cache-validity helpers (is_cache_valid /
   _is_url_cache_valid, plus the inline metadata-age check inside
   _fetch_single_catalog for per-URL caches) read the metadata file
   without specifying an encoding and only caught
   json.JSONDecodeError / ValueError / KeyError /
   TypeError. A metadata file written by a tool using the system
   locale codec, or one whose handle is briefly unavailable, would
   raise UnicodeDecodeError / OSError and propagate past the
   read-side try/except in fetch_catalog — the very crash the
   read-side guard was meant to prevent. The validity checks now read
   with encoding="utf-8" and treat OSError / UnicodeError
   as cache-invalid, matching the documented contract.

2. The network-fetch path wrote the cache and metadata files with bare
   write_text(...), picking up the platform default encoding. The
   read path was already pinned to UTF-8 (and the
   integrations/catalog.py:193-203 sibling writes UTF-8 too), so
   on hosts whose default codec isn't UTF-8 the write/read pair could
   disagree and force an unnecessary refetch on every invocation. All
   four write_text calls now pass encoding="utf-8" so the
   cache survives a round trip on any platform.

Also rewords the misleading # Fetch from network comment in
extensions.fetch_catalog — it sat above the cache-check block,
which read as if the cache step had been skipped.

Tests
-----

Adds two parametrized regression tests per catalog:

* test_fetch_catalog_recovers_from_unreadable_metadata plants
  non-UTF-8 bytes in the metadata file, asserts is_cache_valid()
  returns False (rather than raising), and confirms
  fetch_catalog falls through to the network instead of crashing.

* test_fetch_catalog_writes_cache_as_utf8 round-trips a payload
  containing a non-ASCII identifier (café) through the public
  fetch path and reads the cache back with
  read_text(encoding="utf-8"), catching encoding drift at the
  byte level rather than relying on the system codec to happen to be
  UTF-8.

Both pairs follow the established sibling-file symmetry — the
extension and preset suites stay in lock-step.

* test(catalogs): assert UTF-8 write encoding by recording write_text kwargs

Copilot's review on this PR caught that test_fetch_catalog_writes_cache_as_utf8
claimed to validate UTF-8 at the byte level but actually only round-tripped a
non-ASCII string through json.dumps/read_text. Because json.dumps defaults to
ensure_ascii=True, 'café' was serialized as the all-ASCII escape 'caf\u00e9'
before reaching write_text — the bytes on disk were identical regardless of the
encoding kwarg, so a locale-encoded write would have round-tripped just fine.
The drift guard the test name advertised was not actually being enforced.

Rewriting these tests to observe the production code's argument directly:
each test now monkey-patches pathlib.Path.write_text with a recorder that
captures the encoding kwarg for every call, runs the production fetch, and
asserts every write into the cache directory passed encoding='utf-8'. That is
the substantive thing the regression guard cares about — non-ASCII payload
tricks were the wrong lever to pull, because json.dumps was masking the
encoding choice before write_text ever ran.

Both tests verified locally against the current production code (492 passed in
the extensions+presets suites) and confirmed to fail against a synthetic
no-encoding write (the recorder records None instead of 'utf-8', the assertion
catches it). Same change applied symmetrically to test_extensions.py and
test_presets.py to keep the sibling files in lockstep with the production
code paths in extensions.py and presets.py.

* fix(catalogs): catch AttributeError on non-mapping cache metadata; drop stale line refs

Copilot's review on the previous push pointed out that the
cache-validity helpers still had a gap: metadata.get("cached_at", "")
assumes metadata is a dict, but json.loads happily parses a
file containing [] / "oops" / 42 / true / null into
a non-mapping. The except tuple covered json.JSONDecodeError,
OSError, UnicodeError, ValueError, KeyError and
TypeError but not AttributeError, so a valid-JSON-but-non-dict
metadata payload would still crash the caller instead of degrading to
"cache invalid" as the docstring promised.

This affected four cache-validity sites — symmetric across the two
catalog modules:

* extensions.py — inline per-URL metadata-age check in
  _fetch_single_catalog
* extensions.py — is_cache_valid (legacy default-URL path)
* presets.py — _is_url_cache_valid
* presets.py — is_cache_valid

All four except tuples now include AttributeError with a comment
naming the exact failure (metadata.get(...) on a non-mapping) so
the next reader doesn't have to reconstruct the reasoning.

Separately, Copilot flagged that several comments hard-coded a line
range from a sibling file
(integrations/catalog.py:193-203) — those references will go stale
the moment that file changes. Replaced the hard-coded ranges with
file-only references (integrations/catalog.py) so the pointer
stays accurate as that file evolves. Same change applied to both
modules.

Tests
-----

test_is_cache_valid_handles_non_mapping_metadata is added to both
test_extensions.py and test_presets.py, parametrized over the
five JSON non-mapping root types ([], "oops", 42,
true, null). Each variant plants the metadata file with that
exact content and asserts is_cache_valid() returns False
without raising. The parametrize covers every JSON type the public
spec allows at the root, so a regression that drops AttributeError
from any except tuple is caught against every observable shape rather
than relying on the next reviewer to remember the .get /
non-mapping interaction.

pytest tests/test_extensions.py tests/test_presets.py — 502
passed (was 492 before; the parametrize adds five vectors per file).

* fix(catalogs): make cache writes best-effort to match read-side contract
2026-06-09 07:22:49 -05:00
Pascal THUET
f65d9f9382 feat(integration): add status reporting (#2674)
* feat(integration): add status reporting

* docs(integration): include status in query command docstring

* fix(integration): handle Windows extended-length paths in status containment

On Windows, os.readlink() (and sometimes Path.resolve()) return paths with
the \\?\ extended-length prefix. Comparing such a target against a plain
project root via Path.relative_to() spuriously fails, so an in-project
dangling symlink was classified as `invalid` instead of `missing` — failing
test_status_treats_dangling_symlink_as_missing and the windows-style variant
on the Windows CI runners.

Centralize the containment check in _is_within_project() and strip the
\\?\ / \\?\UNC\ prefix from both sides before relative_to(). Add portable
regression tests for the prefix-stripping helper and the containment contract.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* test(integration): restore top-level pytest import after rebase

A three-way merge / rebase onto main silently dropped the module-level
`import pytest` from test_integration_subcommand.py: main reorganized the
import block without it (using only a local `import pytest as _pytest`),
while this branch added top-level fixtures and `pytest.skip`/`pytest.raises`
usage. The overlapping import-hunk edits resolved by dropping the import,
breaking collection with `NameError: name 'pytest' is not defined` on every
runner. Re-add the import in the third-party group.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(integration): fix Windows UNC path assertion in status helper test

`test_strip_extended_length_prefix_normalizes_windows_paths` compared the
str() form of the helper's output against a hand-built string. On Windows,
pathlib renders a UNC root with a trailing separator (`\\server\share\`),
so the exact string match failed there (`\\server\share\` != `\\server\share`)
even though `_strip_extended_length_prefix` behaves correctly — the trailing
separator is irrelevant to the `relative_to` containment check it feeds.

Compare Path objects (semantic equality) instead of exact strings so the
assertion holds on both POSIX and Windows. No production code change needed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(integration): make shared-manifest remediation specify --integration

The fallback `_manifest_suggestion` for the shared `speckit` manifest (used
when no usable default integration is recorded) suggested
`specify init --here --force`, which can trigger interactive integration
selection. For CI/agent consumers of `integration status`, surface an
explicit `--integration <key>` placeholder, matching the file's existing
`<key>` suggestion style.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-09 07:20:43 -05:00
Copilot
927f54feea feat: make git extension opt-in and remove --no-git at v0.10.0 (#2873)
* feat(init)!: make git extension opt-in and remove --no-git at v0.10.0

- Remove --no-git parameter from specify init command
- Remove git extension auto-installation from init flow
- Git repository initialization (git init) still runs when git is available
- Remove --no-git from all test invocations across the test suite
- Update docs to reflect opt-in git extension behavior
- Replace TestGitExtensionAutoInstall with TestGitExtensionOptIn tests

BREAKING CHANGE: specify init no longer auto-installs the git extension.
Use `specify extension add git` to install it explicitly.
The --no-git flag has been removed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor(scripts): remove git operations from core scripts

Git functionality is now entirely managed by the git extension.
Core scripts only handle directory-based feature creation and numbering.

- Remove has_git(), check_feature_branch(), git branch creation from core
- Simplify number detection to use only spec directory scanning
- Remove HAS_GIT output from get_feature_paths()
- Remove git remote fetching and branch querying
- Keep BRANCH_NAME output key for backward compatibility

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: remove all git operations from core

- Remove is_git_repo() and init_git_repo() dead code from _utils.py
- Remove --branch-numbering from init command
- Remove git from 'specify check' (now extension-only)
- Update docs: git is optional prerequisite, check command description
- Fix tests to reflect no-git-in-core reality (fallback to main)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor(scripts): remove directory scanning and branch fallback from core

Core scripts now resolve feature context exclusively from:
1. SPECIFY_FEATURE env var (set by git extension)
2. .specify/feature.json (persisted by specify command)

Removed find_feature_dir_by_prefix() and directory scanning heuristics —
these are the git extension's responsibility. Scripts error clearly when
no feature context is available.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: introduce feature_numbering, deprecate branch_numbering in init-options

- specify command template now reads feature_numbering (preferred) with
  fallback to branch_numbering (deprecated) from init-options.json
- Git extension reads git-config.yml > feature_numbering > branch_numbering
- init now writes feature_numbering: sequential to init-options.json
- Deprecation warning emitted when branch_numbering is used as fallback

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: remove trailing whitespace in common.ps1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(scripts): persist SPECIFY_FEATURE_DIRECTORY env var to feature.json

When SPECIFY_FEATURE_DIRECTORY is set, get_feature_paths() now writes the
value to .specify/feature.json so future sessions without the env var can
still resolve the feature directory. The write is idempotent — it skips
when the file already contains the same value.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address review feedback — error messages and docs

- Update error messages in common.sh and common.ps1 to reference
  SPECIFY_FEATURE_DIRECTORY instead of SPECIFY_FEATURE (which no longer
  resolves feature directories)
- Fix get_current_branch comment (returns empty string, not error)
- Update upgrade.md to reference SPECIFY_FEATURE_DIRECTORY with correct
  example paths
- Update local-development.md troubleshooting: replace stale 'Git step
  skipped' row with actionable git extension guidance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(scripts): harden feature.json persistence

- Use json_escape in printf fallback when jq is unavailable (common.sh)
- Replace utf8NoBOM encoding with UTF8Encoding($false) for PowerShell
  5.1 compatibility (common.ps1)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor(scripts): remove dead feature_json_matches_feature_dir functions

These guards are no longer needed since the branch-name validation they
protected against has been removed from check-prerequisites.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor(git-ext): rename create-new-feature to create-new-feature-branch

The git extension's script only creates the git branch — rename it to
reflect that responsibility. The core create-new-feature.sh/.ps1 handles
feature directory creation and feature.json persistence.

Also includes fixes from review feedback:
- common.sh: _persist_feature_json uses json_escape fallback
- common.ps1: Save-FeatureJson uses UTF8Encoding for PS 5.1 compat
- common.ps1: case-sensitive path stripping on non-Windows
- create-new-feature.sh/ps1: output both SPECIFY_FEATURE and
  SPECIFY_FEATURE_DIRECTORY
- setup-tasks.sh: fix stale 'Validate branch' comment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(tests): update references to renamed git extension scripts

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(tests): remove duplicate EXT_CREATE_FEATURE assignments

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Manfred Riem <mnriem@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-09 06:13:07 -05:00
WOLIKIMCHENG
d8a81b23b5 test(workflows): cover executable override fallback preflight (#2843) 2026-06-08 15:33:37 -05:00
Seiya Kojima
4ec4635dd1 feat(extensions): per-event hook lists with priority ordering (#2798)
* feat(extensions): per-event hook lists with priority ordering

The manifest validator restricted each hook event to a single mapping,
even though HookExecutor stores entries as a list per event. This blocked
an extension from running multiple commands on one event (e.g. a
verification step plus a doc-generation step after speckit.plan), and
get_hooks_for_event returned entries in raw insertion order with no way
to influence execution order across or within extensions.

This change:

1. Validator: accept hooks.<event> as either a single mapping or a list
   of mappings. Each entry is validated individually and may carry an
   optional integer `priority` (>= 1, default 10; bool rejected).
2. Command-ref normalization: apply rename / alias->canonical rewriting
   to every entry in the list, not just the head.
3. register_hooks: expand list entries, persist `priority`, and
   purge-and-replace all entries owned by the extension on each event so a
   reinstall whose shape changed (single<->list, or a shorter list) leaves
   no orphaned entries behind.
4. get_hooks_for_event: sort enabled entries by `priority` ascending with
   a stable sort (ties keep insertion order). The existing
   normalize_priority helper is reused as the sort key so corrupted
   on-disk values fall back to the default instead of raising.

Backward compatible: existing single-mapping manifests parse and register
unchanged with priority defaulting to 10. The extension-level `priority`
used by preset/template resolution is independent of the new hook-entry
`priority`.

Implements #2378

* fix(extensions): harden register_hooks per PR review

- Skip non-dict hook entries before .get() so a manifest that bypasses
  validation can't crash register_hooks with AttributeError.
- Normalize `priority` on save via normalize_priority so the on-disk
  config stays clean, mirroring the read-side defense in
  get_hooks_for_event.
- Tests: cover the non-dict-entry skip and add encoding="utf-8" to the
  new tests' manifest writes.

* fix(extensions): purge dropped-event hook orphans on reinstall

register_hooks only purged events the new manifest still declared, so an
extension that dropped an event on reinstall left stale entries for it in
the project config. Purge this extension's entries from undeclared events
(and prune emptied events) before registering; scoped to this extension,
and a no-op for the install/update flow where unregister_hooks runs first.

* fix(extensions): reject boolean priority and complete orphan purge

- normalize_priority falls back to default for bool values
- dedup deletes duplicate commands before re-insert for last-wins ties
- register_hooks purges orphans even when all hooks are dropped

* docs(extensions): document per-event hook lists and priority

- EXTENSION-API-REFERENCE: hook event accepts a mapping or list; add
  priority field reference and last-wins dedup note
- EXTENSION-DEVELOPMENT-GUIDE: add list-form example with priority

* docs(extensions): show both single and list hook forms in schema snippet

* docs(extensions): reference DEFAULT_HOOK_PRIORITY in normalize_priority

normalize_priority hard-coded the default as the literal 10 in both its
signature and docstring, duplicating DEFAULT_HOOK_PRIORITY. Reference the
constant in the signature and drop the literal from the docstring so the
default has a single source of truth.
2026-06-08 08:03:46 -05:00
Copilot
7106858c4e feat!: remove legacy --ai, --ai-commands-dir, and --ai-skills flags (0.10.0) (#2872)
* Initial plan

* feat!: remove legacy --ai, --ai-commands-dir, and --ai-skills flags at 0.10.0

* refactor(tests): rename stale test_ai_help_* methods to test_agent_config_*

* fix: address review — derive agent folder for generic integration and remove redundant test

- Security notice now falls back to integration_parsed_options['commands_dir']
  when AGENT_CONFIG folder is None (generic integration).
- Remove test_agent_config_includes_kiro_cli which duplicates the assertion
  in test_runtime_config_uses_kiro_cli_and_removes_q.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: scrub all remaining --ai flag references from source and tests

- Remove dead AI_ASSISTANT_ALIASES, AI_ASSISTANT_HELP, and
  _build_ai_assistant_help() from _agent_config.py
- Update comments/docstrings in extensions.py, presets.py, and
  integration subpackages to reference 'skills mode' or
  '--integration' instead of the removed flags
- Fix catalog.json generic integration description
- Update test docstrings/comments in test_extension_skills.py,
  test_extensions.py, and test_presets.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: remove legacy --ai flag rejection tests

The flags are fully removed from the CLI; typer handles unknown options
generically. No custom rejection logic exists to test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* revert: remove manual CHANGELOG.md entry

CHANGELOG is generated automatically; manual edits should not be made.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: make generic catalog description self-explanatory

Include the required --commands-dir sub-option in the description so
readers don't need to look up integration docs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(tests): rename duplicate test classes to avoid shadowing

The rename from Test*AutoPromote to Test*Integration collided with the
existing Test*Integration(SkillsIntegrationTests) base classes, causing
the shared test suites to be silently overwritten. Rename the CLI init
flow classes to Test*InitFlow instead.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Manfred Riem <mnriem@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-05 14:56:28 -05:00
Manfred Riem
60302fefec feat(extensions): add bundled bug triage workflow extension (#2871)
* feat(extensions): add bundled bug triage workflow extension (#2870)

Add a bundled 'bug' extension providing a three-stage bug triage workflow:

- speckit.bug.assess: triage a bug report (pasted text or URL), locate
  suspected code paths, and propose a remediation
- speckit.bug.fix: apply the proposed remediation and record what changed
- speckit.bug.test: validate the fix and record the verification result

Each bug gets its own directory under .specify/bugs/<slug>/ with one
Markdown report per stage (assessment.md, fix.md, test.md). The slug is
the only handle the three commands share; existing bug directories are
never overwritten.

Mirrors the layout of the existing bundled extensions (git, agent-context):

- extensions/bug/extension.yml, README.md, commands/
- extensions/catalog.json: register 'bug' (alphabetical, between
  agent-context and git)
- pyproject.toml: add wheel mapping to specify_cli/core_pack/extensions/bug

Closes #2870

* address Copilot review on #2871

- speckit.bug.assess.md: drop POSIX-specific 'mkdir -p' example;
  reword the prerequisite to describe the requirement (ensure BUG_DIR
  exists) without assuming a specific shell.
- speckit.bug.fix.md: fix the slug-resolution fallback wording. It
  listed '.specify/bugs/*/assessment.md' but then keyed off whether
  'exactly one bug directory' existed; now it correctly keys off whether
  exactly one matching 'assessment.md' was found and uses the slug from
  its parent directory.
- tests/extensions/bug/test_bug_extension.py: add a smoke test analogous
  to the agent-context extension's coverage. Validates the bundled
  layout, catalog registration, '_locate_bundled_extension("bug")'
  resolution, and that 'ExtensionManager.install_from_directory' installs
  the three commands.

All 333 tests in tests/extensions/, tests/test_extensions.py, and
tests/test_extension_registration.py pass.

* address Copilot review on #2871 (round 2)

- Import _locate_bundled_extension from the public 'specify_cli'
  package (it is re-exported in __init__.py) instead of the private
  'specify_cli._assets' module, so the test does not depend on internal
  module layout.
- Clarify module docstring: install_from_directory is called with
  register_commands=False, so commands are copied and recorded in the
  installed manifest but not registered with AI agents. Wording updated
  to avoid implying otherwise.

* address Copilot review on #2871 (round 3)

- tests/extensions/bug/test_bug_extension.py: read extension.yml as
  UTF-8 explicitly to avoid platform-dependent default encoding (notably
  on Windows). Matches how the README is read in the same module.
- extensions/bug/commands/speckit.bug.assess.md: add a 'Safety When
  Fetching URLs' section. Instructs the agent to treat fetched page
  content as untrusted input (no obeying embedded prompt-injection
  directives), forbids supplying credentials/secrets that a page asks
  for, scopes the fetch to the URL the user provided (no following
  redirects to other resources), and requires suspicious content to be
  quoted verbatim under an 'Unverified' heading rather than acted on.
- extensions/catalog.json: bump 'updated_at' to today (2026-06-05) so
  consumers that cache by this field invalidate when 'bug' is added.
- extensions/bug/README.md: minor grammar fix ('a reproduction that was
  not actually performed').

All 251 tests in tests/extensions/bug/, tests/test_extensions.py, and
tests/test_extension_registration.py pass.

* speckit.bug.assess: add URL Trust Policy for fetched bug-report URLs

Builds on the 'Safety When Fetching URLs' section by adding a tiered
classification rule the agent applies before any fetch:

1. Refuse outright (no fetch, no prompt) for non-http(s) schemes,
   loopback, link-local, RFC1918 private space, and known cloud
   instance-metadata endpoints (169.254.169.254, metadata.google.internal,
   100.100.100.200, metadata.azure.com). This closes the SSRF /
   internal-recon vector opened by 'paste any URL'.
2. Fetch silently for an explicit allowlist of widely-used public
   bug-report sources (github, gitlab, bitbucket, atlassian.net, linear,
   stackoverflow/stackexchange, sentry). This preserves the paste-a-URL
   ergonomics the workflow is built for.
3. Otherwise prompt once in interactive mode (default 'no', naming the
   resolved host explicitly); in automated mode skip the fetch and
   record '[UNVERIFIED - fetch skipped: host not on safe list: <host>]'
   in assessment.md so a human can decide later.

In every case, assessment.md records the verbatim URL, the resolved host,
and which branch of the policy was taken (allowlisted /
confirmed-by-user / auto-refused: <reason>) so the per-bug directory's
audit trail is complete. Preflight HEAD probes are explicitly forbidden
since the probe itself is the request the policy gates.

Execution step 1 now defers to the policy before fetching.

* speckit.bug.assess: remove 'post-redirect-resolution' inconsistency

The URL Trust Policy explicitly forbids following redirects, but the
audit-trail bullet asked the agent to record the host
'post-redirect-resolution', which contradicted that rule and could lead
agents to follow redirects unintentionally to determine what to log.

Reword both call sites to refer to the host parsed from the URL the user
supplied (no resolution implied):

- Tier-3 interactive prompt: '...naming the host parsed from the URL
  explicitly...'
- Recorded fields: 'The host parsed from that URL (no redirect following
  - see the rule above).'

No behavior change; clarification only.
2026-06-05 12:37:25 -05:00
lselvar
f512b8b0d1 fix: resolve GitHub release asset API URL for private repo preset and workflow downloads (#2855)
* fix: resolve GitHub release asset API URL for private repo preset and workflow downloads

- Add shared `resolve_github_release_asset_api_url` utility to `_github_http.py` for
  reuse across preset and workflow download paths
- Apply the same private-repo fix from PR #2792 (extensions) to:
  - `PresetCatalog.download_pack` — ZIP downloads via catalog `download_url`
  - `preset add --from <url>` — ZIP downloads from a direct URL
  - `workflow add <url>` — workflow YAML downloads from a direct URL
  - `workflow add <id>` (catalog) — workflow YAML downloads via catalog `url`
- For browser release URLs (`github.com/…/releases/download/…`), the asset is
  resolved via the GitHub REST API and downloaded with `Accept: application/octet-stream`
- Direct REST API asset URLs (`api.github.com/…/releases/assets/<id>`) are
  downloaded directly with `Accept: application/octet-stream`
- Auth is preserved end-to-end through the existing `open_url` infrastructure
- Update `test_download_pack_sends_auth_header` and add
  `test_download_pack_accepts_direct_github_rest_asset_url` to cover both paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: URL-encode tag in release API URL to handle special characters

Encode the tag as a path segment (using quote with safe='') when
building the releases/tags/<tag> API URL. This prevents malformed
URLs when tags contain reserved characters like '/' or '#'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: add CLI-level tests for preset add --from GitHub release URL resolution

Adds regression tests covering:
- resolve_github_release_asset_api_url unit tests (passthrough, resolution,
  network error, URL encoding of special chars in tags)
- CLI-level 'preset add --from <github-release-url>' end-to-end flow
- CLI-level 'preset add --from <api-asset-url>' direct passthrough

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: deduplicate release URL resolution; fix test issues

- ExtensionCatalog._resolve_github_release_asset_api_url now delegates
  to the shared helper in _github_http.py (also gains URL-encoding fix)
- Remove unused 'io' import from test_github_http.py
- Remove duplicate 'provides' dict keys accidentally added to test_presets.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: align resolver timeout with download timeout; add workflow CLI tests

- Pass timeout=30 to resolve_github_release_asset_api_url in both
  workflow add paths so worst-case latency matches the download timeout
- Add CLI-level regression tests for 'workflow add <url>' covering
  browser URL resolution and direct API asset URL passthrough

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: remove unused urllib.request import; add catalog workflow test

- Remove unused 'import urllib.request' in preset add --from path
- Add CLI test for catalog-based 'workflow add <id>' with GitHub
  release URL resolution

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* style: remove unused MagicMock imports from tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Manfred Riem <mnriem@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-05 10:41:40 -05:00
Huy Do
005c80a9c7 fix(workflows): render gate show_file contents in the interactive prompt (#2810)
* fix(workflows): render gate show_file contents in the interactive prompt

The gate step read and recorded `show_file` but never displayed its
contents at the interactive prompt, so the operator approved/rejected
without seeing the referenced file. Render the file inside the prompt
when stdin is a TTY, with a graceful notice for missing/unreadable
files. Non-interactive PAUSED behaviour, exit codes, resume semantics,
and no-`show_file` output are unchanged.

Closes #2809.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): keep gate _prompt signature stable and harden show_file reads

The gate prompt rendered show_file by passing it as a third positional
argument to _prompt. A test that stubs _prompt with a two-argument lambda
(test_gate_abort_still_halts_with_continue_on_error) then failed once the
branch caught up to main, because the call site passed three arguments to
the two-argument stub.

Compose the show_file material into the displayed message in execute() and
keep _prompt to its (message, options) contract. Display data no longer
widens the interactive seam, so stubbing _prompt stays stable and future
review material can be added without breaking callers. _prompt now renders
a multi-line message inside the gate box.

Also catch ValueError in _read_show_file so a path the OS rejects outright
(e.g. an embedded NUL byte) degrades to a notice instead of crashing the
prompt, matching the helper's stated contract.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): coerce gate prompt message to str before rendering

The multi-line render loop split the message on newlines, which assumes a
str. A non-string message (e.g. a YAML numeric literal) previously rendered
fine through the old f-string and would now raise on .split. Coerce with
str() to preserve that tolerance, and add a regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(workflows): make gate stdin handling robust; tidy compose_prompt typing

Address review feedback on the gate tests and helper:

- Swap the gate module's sys.stdin for a fixed-isatty stub (shared
  _StubStdin / _force_gate_stdin helpers) instead of setattr on
  sys.stdin.isatty, which is not assignable under some pytest capture
  modes. This also forces the non-interactive tests to a non-TTY so they
  cannot block on input() when run in a real terminal.
- The non-interactive show_file test now hard-fails if _read_show_file is
  called, proving the file is not read on the PAUSED path.
- _compose_prompt accepts a non-string message (e.g. a YAML numeric
  literal) and always returns str via str(message), keeping its annotation
  and docstring accurate; the redundant coercion in _prompt is removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): strip control chars from gate show_file; default tests non-TTY

Address review feedback:

- _read_show_file strips C0 control characters (except tab) from each line,
  so a show_file containing ANSI escape sequences (e.g. \x1b[2J) cannot
  clear the screen or spoof the prompt/options when rendered to a terminal.
- Add an autouse fixture on TestGateStep that defaults every gate test to a
  non-TTY stdin, so no test can drop into the interactive prompt and block
  on input() when the suite runs under a real TTY. Interactive tests opt
  back in via _force_gate_stdin(tty=True); the now-redundant explicit
  non-TTY calls were removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(workflows): localize gate stdin patch to the gate module's sys

_force_gate_stdin rebinds the gate module's `sys` name to a stand-in whose
stdin has a fixed isatty() and which delegates every other attribute to the
real sys, instead of mutating the process-wide sys.stdin. This keeps the
patch local to the gate module and leaves real stdin untouched. The gate
abort test, which used the same process-wide swap, now shares the helper, so
the pattern exists in exactly one place.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): sanitize the displayed gate show_file path, not just content

Control characters were stripped from show_file *contents* but the path was
still printed verbatim as the header (`f"{show_file}:"`) and echoed in the
read-error notice, so a show_file path containing ANSI escapes could still
inject terminal sequences. Centralize stripping in `_sanitize_for_display`
and apply it to every show_file-derived string that reaches the terminal —
the displayed path, each file line, and the error notice — while still
opening the file with the original path. Add a test for path sanitization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(workflows): inline control-char stripping, drop the helper

Reuse the existing _CONTROL_CHARS regex directly at the three display sites
instead of wrapping it in a one-line helper.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): also strip LF and C1 controls from gate show_file display

The control-char class skipped LF (so an embedded newline in a show_file
path could break the boxed layout) and the C1 range (so \x9b CSI and other
8-bit controls survived). Widen the class to [\x00-\x08\x0a-\x1f\x7f-\x9f]
(still keeping tab).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 08:04:52 -05:00
Samir Abed
34ce66139e feat: add support for rovodev (#2539)
* feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev

* fixup! feat: add support for rovodev
2026-06-04 11:34:05 -05:00
Huy Do
141119efea feat(workflows): add JSON output for workflow run resume and status (#2814)
* feat(workflows): add --json output to workflow run, resume, and status

Adds an opt-in `--json` flag to `workflow run`, `workflow resume`, and
`workflow status` that emits a single machine-readable object (run_id,
workflow_id, status, current step; status also reports per-step states
and a runs list) for automation and external orchestrators.

JSON is written via a small `_emit_workflow_json` helper using plain
stdout, so Rich markup, highlighting, and line-wrapping can never alter
the emitted object. Default human-readable output and exit codes are
unchanged when `--json` is omitted. Reference docs updated.

Closes #2811.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(workflows): keep --json stdout clean while steps write output

Suppressing the banner and the step-start callback was not enough to
guarantee a single parseable JSON object on stdout: individual steps still
write there while the engine runs. The gate step prints its prompt, and the
prompt step runs a CLI subprocess that inherits the process's stdout file
descriptor — either can corrupt the JSON stream for interactive runs or
integration-backed workflows.

Wrap engine.execute()/engine.resume() in a file-descriptor-level redirect
(dup2) when --json is set, so both Python-level writes and inherited-fd
subprocess output go to stderr while stdout carries only the emitted JSON.
Step progress stays visible on stderr. status does not run the engine, so
it is unaffected.

Tests cover both pollution channels (a Python print and a real subprocess)
via fd-level capture, and the inactive no-op path. Docs note the
stdout/stderr split.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(workflows): fix stray escape sequence in --json redirect comments

The redirect helper's docstring and its test comment wrote ``print``\s,
which renders as "print\s" rather than "prints". Replace with plain
"prints".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 11:11:39 -05:00
minbang
a9a759450d fix: recover active skills registration for extensions (#2803)
Extension command registration now resolves the active skills directory before writing command artifacts. This lets initialized skills-backed agents recover a missing active skills directory while preserving the existing preset registration behavior.

Add regression coverage for missing active skills directories, shared skills directories, and symlinked parent guards.

Fixes #2769.

Co-authored-by: OpenAI Codex <codex@openai.com>
2026-06-04 09:53:31 -05:00
One-TheOnly
8e5643d4ff fix(cursor-agent): enable headless CLI dispatch end-to-end (-p --trust --approve-mcps --force + Windows .cmd shim resolution) (#2631)
* fix(cursor-agent): enable CLI dispatch via ``-p --trust`` headless mode

Restores the ability for ``specify workflow run`` to dispatch the
cursor-agent CLI, complementing the existing in-IDE skill flow.
Without this fix, ``specify workflow run speckit --input
integration=cursor-agent ...`` fails with a misleading
``CLI not found or not installed`` error even when the CLI is
installed (since cursor-agent had ``requires_cli=False`` and an
unset ``build_exec_args``).

The cursor-agent CLI (>= 2026.05.16) supports headless execution
via ``-p`` (print mode with full tool access including write/shell)
and ``--trust`` (bypass Workspace Trust prompt). Without ``--trust``
the CLI exits non-zero in non-TTY contexts (verified locally).

Changes to ``src/specify_cli/integrations/cursor_agent/__init__.py``:

* ``config.requires_cli``: ``False`` -> ``True``
* ``config.install_url``: ``None`` -> Cursor CLI docs URL
* Override ``build_exec_args()`` to emit
  ``[cursor-agent, -p, --trust, <prompt>, ...]``
  with optional ``--model`` and ``--output-format json`` flags,
  mirroring the shape used by ``claude``/``codex``/``gemini``.

Tests:

* 34 existing cursor-agent tests still pass.
* 6 new tests in ``TestCursorAgentCliDispatch`` pin
  ``requires_cli``, ``install_url``, and the exact argv shape
  (default, text-output, with-model, and the hyphenated skill
  invocation form ``/speckit-<name>``).
* Full repo: 1085 / 1085 passed, no regressions.

Fixes #2629

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(integrations): resolve ``.cmd``/``.bat`` shims before subprocess.run

On Windows, ``shutil.which`` honors ``PATHEXT`` and locates wrappers
like ``cursor-agent.cmd`` and ``codex.cmd``, but Python's
``subprocess.run`` calls ``CreateProcess`` which does **not** consult
``PATHEXT`` and therefore fails with ``WinError 2`` on a bare argv
like ``[cursor-agent, ...]``.

Resolve ``exec_args[0]`` via ``shutil.which`` in
``IntegrationBase.dispatch_command`` so ``.cmd``/``.bat`` shims work
transparently. On POSIX this is a no-op for absolute paths and a
harmless lookup otherwise.

Verified locally on Windows 10 + cursor-agent 2026.05.16:
without this fix, ``specify workflow run speckit --input
integration=cursor-agent`` fails with ``FileNotFoundError`` even
after the cursor-agent integration starts producing valid exec
args (per the prior commit on this branch).

Tests:

* New: 2 cursor-agent tests pin the shim-resolution + passthrough
  behavior (``test_dispatch_command_resolves_cmd_shim_for_subprocess``
  and ``test_dispatch_command_passthrough_when_shutil_which_finds_nothing``).
* Updated: ``tests/test_workflows.py::TestCommandStep::test_dispatch_with_mock_cli``
  was mocking ``shutil.which`` only at the ``command`` step level
  and not at the ``base`` level, which made it environment-sensitive
  (fails locally when the real ``claude`` CLI is on PATH).  Added the
  matching base-level patch and updated the argv-assertion to reflect
  the resolved path. ``test_dispatch_failure_returns_failed_status``
  gets the same patch for consistency.
* Full repo: 2867 passed, 0 regression from this PR. The 12 remaining
  pre-existing failures are unrelated Windows ``symlink`` privilege
  failures (``WinError 1314``) on a non-admin Windows runner.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(cursor-agent): inject --approve-mcps --force for headless MCP/tool access

The previous commit (1c55988) wired up ``-p --trust`` so the CLI launches
in headless mode without the Workspace Trust prompt, but that alone is
not enough to let ``specify workflow run`` drive a real speckit feature
end-to-end with cursor-agent on Windows. Two more flags are required:

* ``--approve-mcps``: without it, every MCP server configured in
  ``.cursor/mcp.json`` stays ``not loaded (needs approval)``, and any
  tool call against them is silently dropped. We hit this immediately
  trying to read a DingTalk PRD from a remote MCP server during the
  ``/speckit-specify`` step.
* ``--force``: without it, the agent halts on the first tool-call
  approval prompt (the tool call gets rejected and the workflow exits
  non-zero with a misleading message). With ``--force`` cursor-agent
  matches the implicit "trusted environment" semantics that ``claude -p``
  and ``codex --exec`` already have by default -- which is the right
  semantics for an unattended ``specify workflow run`` invocation.

Verified end-to-end on Windows 10 + cursor-agent 2026.05.16-0338208:

* ``cursor-agent -p --trust --approve-mcps --force --output-format text``
  + a ``/speckit-specify`` prompt that included a DingTalk URL produced
  a full spec.md (31.5 KB) plus checklists/requirements.md in ~10.7 min,
  reading the source PRD through the ``dingtalk-doc`` remote MCP server,
  deciding the ``specs/`` subpath itself, and updating
  ``.specify/feature.json`` and ``specs/menu-dictionary.md`` along the
  way -- no human-in-the-loop, no source PRD ever touched the filesystem.
* Without ``--approve-mcps`` the same prompt errors with the tool call
  rejected message; without ``--force`` the agent stops at the first
  non-MCP tool call.

Tests:

* ``test_build_exec_args_*`` updated to pin the new four-flag prefix.
* New ``test_build_exec_args_contains_mandatory_headless_flags`` asserts
  the four flags are always present together.
* ``test_dispatch_command_resolves_cmd_shim_for_subprocess`` updated to
  match the new argv layout.
* All 43 cursor-agent tests pass; no other tests touched.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(cursor-agent): express dispatch support via build_exec_args() instead of requires_cli

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(cursor-agent): use urlparse hostname check and cover dispatch without requires_cli

Co-authored-by: Cursor <cursoragent@cursor.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: 刘一 <liuyi@oureman.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-04 09:48:33 -05:00
Copilot
40d832f90a Allow specify workflow run to execute YAML files without a project (#2825)
* Initial plan

* feat: add --workflow option to init command for post-init workflow execution

* chore: remove unused import in test file

* refactor: allow workflow run without project when given a YAML file path

Instead of adding --workflow to init, make `specify workflow run ./file.yml`
work without requiring a .specify/ project directory. When the source is a
YAML file that exists on disk, cwd is used as the project root. When it's a
workflow ID, the .specify/ project requirement is preserved.

* Handle standalone workflow path edge cases

* Fix USERPROFILE env var portability and docs notation

* Fix workflow YAML path detection to require regular files

* Harden workflow run against unsafe .specify paths

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-06-04 07:09:54 -05:00
Grissiom.GuRui
659a41a6cc feat(extensions): add --force flag to extension add for overwrite reinstall (#2530)
* feat(extensions): add --force flag to extension add for overwrite reinstall

Add --force support to `specify extension add` that allows overwriting
an already-installed extension without manually removing it first.

- install_from_directory() and install_from_zip() accept force=True,
  automatically calling remove() before installation
- The --force CLI flag works with all install modes (--dev, --from URL,
  bundled, and catalog)
- Config files (*-config.yml) are preserved across force reinstall
- Error message suggests --force when extension is already installed
- 6 new tests covering unit and CLI force reinstall flows

* fix: address PR review feedback on --force implementation

- Remove unused `backup_config_dir` variable assignment (Ruff F841)
- Defer `remove()` until after `_validate_install_conflicts()` to prevent
  data loss if validation fails mid-reinstall
- Use `TemporaryDirectory` instead of `NamedTemporaryFile` in ZIP test
  to avoid Windows file-locking failures

* fix: only restore config backup when --force actually triggers a remove

When --force is used but the extension is not already installed, the
backup restore/cleanup should not run. Previously it could resurrect
stale config files from a previous removal and delete the backup
directory unnecessarily.

* fix: address Copilot review feedback on --force implementation

- Clear stale backup dir before remove() so only fresh backups are restored
- Restore only config files (*-config.yml, *-config.local.yml) from backup
- Remove trailing \n from --force console message (console.print adds newline)

* fix: handle non-directory paths in backup cleanup/restore

- Use is_dir() before rmtree/iterdir on backup path to avoid crashes
  when .backup/<id> exists as a file or symlink
- Remove unused manifest1 variable in test_install_force_reinstall

* fix: handle symlinks in backup cleanup/restore and correct CLI message

- Check is_symlink() before is_dir() in backup cleanup and restore:
  Path.is_dir() follows symlinks (returns True for symlink-to-dir) but
  shutil.rmtree() raises OSError on symlinks. Handle symlinks by
  unlinking them instead.
- Skip symlink entries during config file restore.
- Change --force dev-install message from "Reinstalling" to
  "Installing [...] (will overwrite if already installed)" because
  --force also works for first-time installs.
2026-06-03 16:34:24 -05:00
Pascal THUET
4028c50af8 fix: render script command hints with active agent separator (#2649)
* fix script command hints for agent separators

* Address command hint review feedback

* chore: remove whitespace-only PR churn

* test: fix PowerShell command hint invocation

* fix: preserve hyphens in script command hints

* fix: render managed script command hints
2026-06-03 16:24:13 -05:00
darion-yaphet
67fecd357a chore(tests): fix ruff lint violations in tests/ (#2827)
Clear pre-existing lint debt flagged by repo-wide `ruff check` (the lint
config only scopes src/, so tests/ had drifted). No behavior change.

- F401/F541: drop unused imports and redundant f-string prefixes (autofix)
- E741: rename ambiguous `l` to `ln` in comprehensions
- E702: split semicolon-joined statements onto separate lines
- F841: drop unused bindings while keeping the side-effecting calls
  (_minimal_feature, install_from_directory)

Full suite: 3344 passed, 40 skipped. ruff check (repo-wide): clean.
2026-06-03 16:02:26 -05:00
Quratulain-bilal
bb2b49d0ae fix(workflows): validate run_id in RunState.load before touching the … (#2813)
* fix(workflows): validate run_id in RunState.load before touching the filesystem

``RunState.load(run_id, project_root)`` interpolates ``run_id`` directly
into ``project_root / ".specify" / "workflows" / "runs" / run_id`` and
then calls ``state_path.exists()`` and ``json.load`` on the result. The
run_id is reachable from user input via ``specify workflow resume
<run_id>`` (CLI argument) and via ``SPECKIT_WORKFLOW_RUN_ID`` (env var
override on the engine's run path), so a value like ``../escape``
turns ``runs_dir`` into ``.specify/workflows/escape/`` and:

  * ``state_path.exists()`` becomes a file-existence oracle for any
    path the process can read.
  * if a ``state.json`` exists at the traversed location (planted by
    a malicious dependency, a misconfigured shared workspace, or an
    older spec-kit version that happened to write there),
    ``json.load`` parses it and the workflow resumes under the
    attacker-chosen ``workflow_id`` / step state.
  * a subsequent ``state.save()`` then writes back to the traversed
    location, persisting the corruption.

``RunState.__init__`` already validates ``run_id`` against
``r'^[a-zA-Z0-9][a-zA-Z0-9_-]*$'`` — but that check runs on
``state_data["run_id"]`` *after* ``load`` has already done the file
lookup, which is too late to prevent the disclosure.

This change extracts the pattern into a class-level constant
``_RUN_ID_PATTERN`` and a single ``_validate_run_id`` classmethod so
``__init__`` and ``load`` cannot drift, then calls the validator at the
top of ``load`` before any path is built. Mirrors the precedent in
``src/specify_cli/agents.py::_ensure_within_directory`` (used at line
437 of that file) which guards extension-install paths against the
same threat model.

Regression tests parametrize 9 traversal vectors (``../escape``,
``..``, ``../../etc/passwd``, ``foo/bar``, ``foo\bar``, ``.hidden``,
``-flag``, ``foo\x00bar``, empty) and plant a malicious ``state.json``
outside ``runs/`` so a missing guard would surface as a successful
load rather than the ambiguous ``FileNotFoundError``. A second test
asserts ``__init__`` and ``load`` reject the same representative
malformed ID, so future changes to one path can't silently drift from
the other.

* test(workflows): exercise RunState.load in shared-validation test, fix __init__ empty-string asymmetry

Copilot's review on this PR pointed out that
test_init_and_load_share_validation claimed to verify both entry
points share the same validation rules but never actually called
RunState.load — only __init__ and the shared
_validate_run_id helper. A regression in load (e.g. someone
deleting the cls._validate_run_id(run_id) call before the path is
built) would slip through even though __init__ and the helper
stayed aligned, defeating the whole point of the test.

Tightening the test surfaced a real asymmetry the previous version was
silently masking:

    self.run_id = run_id or str(uuid.uuid4())[:8]

The truthiness fallback meant RunState(run_id="") silently
substituted a UUID and skipped validation, while
RunState.load("", project_root) correctly rejected the empty
string. The two entry points diverged on the empty-string vector.
That is exactly the drift the test name claimed to defend against —
and the original test missed it.

Changes
-------

* engine.py: __init__ now distinguishes run_id is None
  (caller omitted it → auto-generate UUID) from an empty string
  (caller provided it → must validate like any other value). Both
  paths still flow through _validate_run_id, but only the
  explicit-None case auto-generates.

* test_workflows.py: test_init_and_load_share_validation is
  now parametrized over one representative vector per category from
  test_load_rejects_path_traversal (parent traversal, embedded
  separator, leading non-alphanumeric, empty string) and asserts that
  *all three* entry points — __init__, _validate_run_id, and
  load — reject the same input. Adding load to the assertion
  is the substantive fix Copilot asked for; keeping __init__ and
  the helper alongside it makes any future drift between the three
  immediately observable instead of having to read three separate
  tests.

Verification
------------

pytest tests/test_workflows.py — 168 passed (was 165 before the
parametrize expansion; __init__ empty-string vector would have
failed the new test against the old engine code, confirming the
asymmetry was real).
2026-06-03 14:26:07 -05:00