github-spec-kit

mirror of https://github.com/github/spec-kit.git synced 2026-07-03 12:28:06 +08:00

Files

Eldar Shlomi 39921ddd3b fix(shared-infra): record skipped files in speckit.manifest.json (#2483 )

* fix(shared-infra): record skipped files in speckit.manifest.json

`install_shared_infra` skipped files that already existed on disk
when `force=False`, but the skip branches in both the scripts loop
and the templates loop only appended to `skipped_files` without
calling `manifest.record_existing`. So when the function ran with a
fresh manifest against an already-populated `.specify/` tree (e.g.
after the manifest was deleted, corrupted, or extracted out of band),
every file went down the skip path, `planned_copies` /
`planned_templates` stayed empty, and `manifest.save()` wrote an
empty `files` field — leaving the integration believing nothing was
installed.

Record every skipped file in the manifest, but only when it is not
already tracked. This preserves the original hash for files that
were previously recorded so `check_modified()` (used by
`integration use` to decide whether a user has customized a
template) keeps working correctly.

Add `TestSpeckitManifestRecordsSkippedFiles` in
`tests/integrations/test_integration_claude.py` covering both the
fresh-skip path and the recover-after-lost-manifest path.

Fixes #2107

* fix(shared-infra): guard manifest.record_existing against non-file dst

Address Copilot review feedback on PR #2483. The previous fix called
``manifest.record_existing(rel_skip)`` from the skip branch of both
loops in ``install_shared_infra``, which would crash with
``IsADirectoryError`` (or another ``OSError``) if a directory or other
non-regular-file happened to exist at the expected destination path —
since ``record_existing`` opens the file to compute its SHA-256.

Three coordinated fixes:

1. ``IntegrationManifest.record_existing`` now validates its
   precondition: it raises ``ValueError`` if the path is a symlink or
   is not a regular file. The docstring already promised "an
   already-existing file"; this enforces it. The symlink check runs on
   the un-resolved path because ``_validate_rel_path`` calls
   ``resolve()``, which would silently follow the symlink. Mirrors the
   existing ``_ensure_safe_manifest_destination`` precedent in the
   same module.

2. In ``install_shared_infra``'s scripts and templates skip branches,
   guard the ``record_existing`` call with ``dst.is_file()`` and wrap
   it in ``try/except (OSError, ValueError)``. A directory collision,
   permission error, or TOCTOU race no longer aborts the whole
   install — the user gets a per-path warning, the path still
   surfaces in ``skipped_files``, and the rest of the install
   continues.

3. ``_read_manifest_files`` in the regression test no longer falls
   back to ``data.get("_files")`` (Copilot's low-confidence finding):
   the silent fallback could mask a schema regression where the
   public ``files`` key is renamed. It now asserts ``"files" in data``
   and that the value is a dict.

Add two regression tests in ``TestSpeckitManifestRecordsSkippedFiles``
covering the directory-at-destination edge case for both the scripts
loop and the templates loop. Both verify (a) install does not crash,
(b) the non-file path is not recorded in the manifest, and (c) the
path still surfaces in the user-visible warning.

The "shared infrastructure file(s)" warning text is changed to
"path(s)" so it remains accurate when non-file entries appear in the
list.

Refs #2107

* fix(manifest): lexical pre-check for record_existing + add error-case tests

Address Copilot review (2026-05-11, review id 4266902103):

1. `record_existing` was calling `(self.project_root / rel).is_symlink()`
   BEFORE validating containment. For absolute paths or paths containing
   `..`, this performed a filesystem stat outside the project root before
   `_validate_rel_path()` raised. Add a cheap lexical pre-check that
   delegates to `_validate_rel_path()` for the canonical error messages,
   so the symlink stat only ever runs on paths that are already lexically
   inside the project root.

2. Add focused unit tests in `tests/integrations/test_manifest.py` for
   the symlink and non-regular-file error paths, including:
     - symlink target rejection
     - dangling symlink rejection (caught by the symlink guard before
       the is_file check)
     - directory path rejection (is_file == False)
     - missing-path rejection (is_file == False)
     - absolute-path lexical pre-check
   The Copilot reviewer noted these guards had no focused coverage in
   `test_manifest.py`, only via the `test_integration_claude.py`
   regression test.

3. The third Copilot finding (repeated `dict(self._files)` copies via
   `manifest.files` in the skip branches) is already resolved on this
   branch by using `prior_hashes` — the function-scope snapshot taken at
   the top of `install_shared_infra` — for the membership check, instead
   of `manifest.files`.

AI disclosure: drafted with assistance from Claude (Opus 4.7).

* fix(manifest): track recovered files separately + symlink-ancestor + canonical-path guards

Address Copilot review id 4309888722 (2026-05-18) on PR #2483:

1. Recovery semantics (shared_infra.py:371, 412) — install_shared_infra
   now passes ``recovered=True`` when re-recording a skipped existing
   file. This flag funnels into a new ``recovered_files`` array in the
   manifest JSON, so a future ``refresh_managed`` run can distinguish
   "hash I produced" from "hash I observed on a file that may be a user
   customization" and avoid silent overwrite without ``--refresh-shared-infra``.
   Schema is purely additive: ``files: dict[str, str]`` is unchanged; the
   new ``recovered_files: list[str]`` is omitted when empty.

2. Symlinked ancestor (manifest.py:172) — ``record_existing`` now walks
   every component of the rel path and rejects any symlinked ancestor,
   not just a symlinked leaf. Catches ``linked_dir/file.txt`` where
   ``linked_dir`` is a symlink, which previously slipped past the leaf-only
   ``is_symlink()`` check and was resolved through by ``_validate_rel_path``.
   Mirrors the component-walk pattern in ``_ensure_safe_manifest_directory``.

3. Misleading "escapes project root" message (manifest.py:168) — paths
   like ``dir/../file.txt`` normalize inside the project, so the old
   message lied about what was wrong. New message: "Manifest paths must
   be canonical; '..' segments are not allowed". Still rejects (canonical
   keys are required so ``check_modified``/``uninstall`` cannot key the
   same file under two paths).

Tests: 7 new test methods across TestManifestRecoveredFiles and
TestRecordExistingNewGuards covering all 4 Copilot findings. Full suite
passes locally.

🤖 AI disclosure: drafted with assistance from Claude (Opus 4.7).

* fix(manifest): normalize is_recovered input through _validate_rel_path

Address Copilot review comment id 4309888722 round-5 (2026-05-21) on PR #2483:

``is_recovered()`` previously checked ``self._recovered_files`` membership
with bare ``Path(rel).as_posix()``, while ``record_existing()`` stores keys
via ``_validate_rel_path(rel, root).relative_to(root).as_posix()``. The two
normalizations disagreed on absolute paths and paths that escape the
project root — ``is_recovered`` would silently return False for inputs that
``record_existing`` would have refused entirely.

The fix routes ``is_recovered`` through the same ``_validate_rel_path``
pipeline; ``ValueError`` from the validator is caught and converted to
False so query semantics stay exception-free (Python ``__contains__``
convention).

Tests: 2 new methods in ``TestManifestRecoveredFiles``:
- ``test_is_recovered_absolute_path_returns_false``
- ``test_is_recovered_escaping_path_returns_false``

🤖 AI disclosure: drafted with assistance from Claude (Opus 4.7).

* fix(manifest): clear recovered marker on managed re-record + reject '..' in is_recovered

Address Copilot Round-7 review comments on PR #2483:

1. record_existing(recovered=False) and record_file now BOTH discard the
   path from _recovered_files. The marker is meant to flag "we observed
   this file but cannot vouch it's a managed baseline" — once the same
   path is re-recorded as managed (either explicitly or by writing fresh
   bytes), the marker is stale and must clear so refresh_managed and
   future is_recovered queries return the truthful answer.

2. is_recovered now applies the same canonical-key guard as record_existing
   (rejects absolute paths and '..' segments lexically before delegating
   to _validate_rel_path). Such paths can never be stored keys, so the
   query correctly returns False without depending on _validate_rel_path
   semantics that diverged from record_existing's stricter contract.

record_file docstring updated to mention the side-effect on recovered
markers.

Tests: 3 new methods in TestManifestRecoveredFiles covering
record_existing(false) clearing, record_file clearing, and is_recovered
dotdot rejection.

* test(manifest): update is_recovered comments to reflect Round-7 lexical guard

Round 8 — addresses Copilot review comment on tests/integrations/test_manifest.py:362.

After Round-7 (1dbf0c2), is_recovered() rejects absolute paths and '..' segments
up front via a lexical guard, returning False without calling _validate_rel_path
at all. The test comments still described the prior "_validate_rel_path raises;
we catch" code path, which is misleading for readers.

Updated comments in both:
  - test_is_recovered_absolute_path_returns_false (Copilot's exact target)
  - test_is_recovered_escaping_path_returns_false (same comment-class issue;
    fixed preemptively to avoid a Round-9 finding on the same drift)

Pure documentation change. Test assertions and behavior unchanged; all manifest
tests still green.

* fix(manifest): document OS errors on record_existing + filter orphan recovered_files on load

Round 9 — addresses Copilot review on PR #2483:

1. record_existing's docstring now documents OSError/PermissionError as
   possible raises (in addition to ValueError) — the implementation has
   always been able to raise them from is_symlink, is_file, or the
   file-read used to hash, but the contract did not reflect that.
   Callers should be prepared for both surfaces.

2. load() now filters recovered_files entries that don't correspond to
   keys in files. An externally-edited or partially-corrupted manifest
   can deserialize with orphan recovered paths; rather than reject the
   whole manifest (too strict on the upgrade path), we drop the orphans
   and let the inconsistency self-correct on the next save(). is_recovered
   then returns the truthful False for the orphan.

Tests: new test_load_filters_recovered_files_not_in_files asserting an
orphan recovered entry is dropped on load.