mirror of
https://github.com/github/spec-kit.git
synced 2026-07-06 05:53:12 +08:00
---
description: "Validate that a previously fixed bug is resolved and record the verification report"
---

# Test Bug Fix

Validate that the fix recorded by `__SPECKIT_COMMAND_BUG_FIX__` actually resolves the bug described by `__SPECKIT_COMMAND_BUG_ASSESS__`. The output is a verification report at `.specify/bugs/<slug>/test.md`.

## User Input

```text
$ARGUMENTS
```

The user input should identify the bug to validate. Accept any of:

- `slug=<bug-slug>`, `--slug <bug-slug>`, or a bare slug-like token.
- A path that contains the slug (e.g. `.specify/bugs/login-timeout/`).
- **Nothing**: fall back to context (see Slug Resolution below).

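
The accepted input forms can be illustrated with a small parser. This is a minimal sketch, not part of the extension: the function name `extract_slug` is hypothetical, and the slug pattern (lowercase letters, digits, and hyphens) is an assumption, since this command does not define what counts as "slug-like".

```python
import re
from typing import Optional

# Assumption: a slug is lowercase letters, digits, and single hyphens,
# for example "login-timeout". The command file does not specify this.
SLUG = r"[a-z0-9]+(?:-[a-z0-9]+)*"


def extract_slug(arguments: str) -> Optional[str]:
    """Return the slug named in the raw argument string, or None if absent."""
    text = arguments.strip()

    # Form 1: slug=<bug-slug> or --slug <bug-slug>
    match = re.search(rf"(?:\bslug=|--slug[ =])({SLUG})", text)
    if match:
        return match.group(1)

    # Form 2: a path that contains the slug
    match = re.search(rf"\.specify/bugs/({SLUG})(?:/|$|\s)", text)
    if match:
        return match.group(1)

    # Form 3: the whole input is one bare slug-like token
    if re.fullmatch(SLUG, text):
        return text

    # Nothing usable: the caller falls back to context.
    return None
```

Returning `None` rather than raising keeps the "nothing supplied" case distinct from an error, so the caller can continue with the fallback steps.
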
## Slug Resolution

Resolve `BUG_SLUG` in this order, stopping at the first match:

1. **Explicit user input**: a slug passed in `$ARGUMENTS` (any of the forms above).
2. **Conversation context**: if the current session has just run `__SPECKIT_COMMAND_BUG_ASSESS__` or `__SPECKIT_COMMAND_BUG_FIX__`, the slug it reported is the working slug. Reuse it without re-prompting. Confirm it by checking that `.specify/bugs/<slug>/fix.md` exists; if it does not, fall through.
3. **Single candidate on disk**: list `.specify/bugs/*/fix.md`. If exactly one match is found, use the slug from its parent directory.
4. **Disambiguate**:
   - **Interactive mode**: ask the user which bug to validate and list the candidates.
   - **Automated mode**: stop with an error listing the candidates. Do not guess.

Once resolved, set `BUG_SLUG` and `BUG_DIR = .specify/bugs/<BUG_SLUG>`, and briefly state in your reply which resolution path was used (explicit / from context / single candidate / asked).

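
The resolution order above can be sketched as a single function. This is an illustration under stated assumptions: `resolve_slug` and `AmbiguousSlugError` are hypothetical names, the session's remembered slug is passed in as `context_slug`, and only the automated-mode branch of step 4 is modeled (an interactive caller would catch the error and ask the user).

```python
from pathlib import Path
from typing import Optional, Tuple


class AmbiguousSlugError(Exception):
    """Raised when the slug cannot be resolved without asking the user."""


def resolve_slug(
    explicit: Optional[str],
    context_slug: Optional[str],
    bugs_root: Path,
) -> Tuple[str, str]:
    """Return (slug, resolution path), stopping at the first match."""
    # 1. Explicit user input wins.
    if explicit:
        return explicit, "explicit"

    # 2. Conversation context counts only if fix.md exists for that slug.
    if context_slug and (bugs_root / context_slug / "fix.md").is_file():
        return context_slug, "from context"

    # 3. Exactly one fix.md on disk: use its parent directory name.
    candidates = sorted(p.parent.name for p in bugs_root.glob("*/fix.md"))
    if len(candidates) == 1:
        return candidates[0], "single candidate"

    # 4. Automated mode: stop and list the candidates instead of guessing.
    raise AmbiguousSlugError(f"cannot resolve slug; candidates: {candidates}")
```

The second element of the returned tuple matches the resolution path the command is asked to state in its reply.
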
## Prerequisites

- `BUG_DIR/assessment.md` MUST exist. If not, stop and instruct the user to run `__SPECKIT_COMMAND_BUG_ASSESS__` first.
- `BUG_DIR/fix.md` MUST exist. If not, stop and instruct the user to run `__SPECKIT_COMMAND_BUG_FIX__` first.
- If `BUG_DIR/test.md` already exists, ask the user whether to overwrite it (interactive mode) or refuse to overwrite it and stop (automated mode).
- Read both `assessment.md` and `fix.md` in full so you know:
  - The original symptom and reproduction steps (from `assessment.md`).
  - The actual code changes and tests added (from `fix.md`).

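
The file checks in this section amount to a short preflight. The sketch below is one possible shape for it; `check_prerequisites` and its `allow_overwrite` flag are hypothetical names, with the flag standing in for the user's confirmation in interactive mode.

```python
from pathlib import Path
from typing import List


def check_prerequisites(bug_dir: Path, allow_overwrite: bool = False) -> List[str]:
    """Return blocking problems; an empty list means the command may proceed."""
    problems = []
    if not (bug_dir / "assessment.md").is_file():
        problems.append("assessment.md is missing")
    if not (bug_dir / "fix.md").is_file():
        problems.append("fix.md is missing")
    # An existing report blocks the run unless overwriting was confirmed.
    if (bug_dir / "test.md").exists() and not allow_overwrite:
        problems.append("test.md already exists and overwrite was not confirmed")
    return problems
```

Collecting every problem before stopping lets the reply name all missing prerequisites at once.
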
## Execution

1. **Plan the validation**
   - Decide which checks prove the bug is gone:
     - Re-run the reproduction steps from the assessment (or their automated equivalent).
     - Run the tests added or updated in the fix.
     - Run any broader regression suite that touches the changed files.
   - Decide which checks prove nothing was broken:
     - Existing test suites for the changed modules.
     - Lint / type-check if the project uses them.

2. **Run the checks**
   - Execute each planned check. Capture the command, exit status, and a short excerpt of relevant output (last few lines, or the failing assertion).
   - If a check is destructive, network-dependent, or expensive, skip it and record it as `skipped` with a reason; do not run it without explicit user consent.
   - If you cannot run a check at all (missing tooling, no test framework configured), record it as `not-run` with a reason instead of fabricating a result.

3. **Judge the outcome**
   - Mark the fix as:
     - **verified**: all critical checks pass and the original symptom no longer reproduces.
     - **partial**: the original symptom is gone but unrelated regressions appeared, or some checks are inconclusive.
     - **failed**: the symptom still reproduces or the regression suite is broken by the fix.
   - Do not over-claim. If reproduction was not actually performed (e.g., the bug required a production environment), say so explicitly.

4. **Write the verification report**

   Write to `BUG_DIR/test.md` using this structure:

   ```markdown
   # Bug Verification: <short title>

   - **Slug**: <BUG_SLUG>
   - **Tested**: <ISO 8601 date>
   - **Assessment**: ./assessment.md
   - **Fix**: ./fix.md
   - **Result**: verified | partial | failed

   ## Summary

   <One or two sentences: does the bug still reproduce, did the fix hold, were any regressions found.>

   ## Checks Performed

   | Check | Command / Action | Result | Notes |
   |-------|------------------|--------|-------|
   | Reproduction (post-fix) | <command or manual steps> | pass / fail / skipped / not-run | <short note> |
   | New / updated tests | `<command>` | pass / fail / skipped / not-run | <short note> |
   | Regression suite | `<command>` | pass / fail / skipped / not-run | <short note> |
   | Lint / type-check | `<command>` | pass / fail / skipped / not-run | <short note> |

   ## Output Excerpts

   <Short snippets of relevant output (e.g., final summary line of a test run, the failing assertion). Keep it tight: no full logs.>

   ## Residual Risks

   - <known limitation, environment not covered, etc.>

   ## Recommendation

   <One paragraph. Examples:>
   - "Close the bug: verified end-to-end."
   - "Hold: reproduction inconclusive; needs verification in staging."
   - "Reopen: symptom still reproduces; rerun `__SPECKIT_COMMAND_BUG_ASSESS__`."
   ```

5. **Report back** with:
   - The slug and `BUG_DIR/test.md` path.
   - The result (`verified`, `partial`, or `failed`).
   - If the result is `failed`, a recommendation to re-run `__SPECKIT_COMMAND_BUG_ASSESS__` with the new evidence captured in `test.md`.

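
Step 2 asks for the command, exit status, and a short output excerpt per check, and step 4 records them as table rows. The following is a minimal sketch of that bookkeeping, assuming each check is a subprocess; `CheckRecord`, `run_check`, and `table_row` are hypothetical names, and a check skipped for safety reasons would be recorded by hand rather than through `run_check`.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class CheckRecord:
    name: str
    command: str
    result: str  # "pass", "fail", "skipped", or "not-run"
    notes: str


def run_check(name: str, argv: List[str], tail: int = 3) -> CheckRecord:
    """Run one check and keep only the exit status and the last few output lines."""
    command = " ".join(argv)
    try:
        proc = subprocess.run(argv, capture_output=True, text=True)
    except FileNotFoundError:
        # Missing tooling: record that honestly instead of fabricating a result.
        return CheckRecord(name, command, "not-run", f"{argv[0]} not found")
    lines = (proc.stdout + proc.stderr).strip().splitlines()
    excerpt = " / ".join(lines[-tail:])
    notes = f"exit {proc.returncode}" + (f"; {excerpt}" if excerpt else "")
    result = "pass" if proc.returncode == 0 else "fail"
    return CheckRecord(name, command, result, notes)


def table_row(record: CheckRecord) -> str:
    """Format a record as one row of the Checks Performed table."""
    return f"| {record.name} | `{record.command}` | {record.result} | {record.notes} |"


if __name__ == "__main__":
    # Stand-in command; a real run would invoke the project's own test runner.
    demo = run_check("New / updated tests", [sys.executable, "-c", "print('2 passed')"])
    print(table_row(demo))
```

Output that contains a `|` character would need escaping before it goes into the table; the sketch leaves that out.
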
## Guardrails

- This command MUST NOT modify source code. It only runs checks and writes inside `.specify/bugs/<slug>/`.
- Never overwrite an existing `test.md` without confirmation.
- Never mark a fix as `verified` based on tests alone if the original assessment listed a reproduction that you did not actually exercise; downgrade to `partial` and say so.

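
The verdict rules from "Judge the outcome", together with the last guardrail, can be written as a small decision function. This is a sketch under assumptions: `judge` and its parameters are hypothetical, deciding whether a regression was caused by the fix is left to the caller, and treating every non-`pass` result as inconclusive is a deliberately strict simplification.

```python
from typing import List, Optional


def judge(
    symptom_reproduces: Optional[bool],
    broken_by_fix: bool,
    other_results: List[str],
) -> str:
    """Return "verified", "partial", or "failed".

    symptom_reproduces: True or False if the reproduction was exercised,
        None if it was not actually performed.
    broken_by_fix: True if the regression suite is broken by the fix.
    other_results: results of the remaining checks
        ("pass", "fail", "skipped", or "not-run").
    """
    if symptom_reproduces is True or broken_by_fix:
        return "failed"
    # Guardrail: tests alone never justify "verified" without a reproduction.
    if symptom_reproduces is None:
        return "partial"
    # Assumption: any result other than "pass" counts as inconclusive.
    if any(result != "pass" for result in other_results):
        return "partial"
    return "verified"
```

For example, `judge(None, False, ["pass", "pass"])` returns `"partial"`: every test passed, but the reproduction was never exercised.
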