fix: resolve GitHub release asset API URL for private repo bundle downloads (#3136)

* fix: resolve GitHub release asset API URL for private repo bundle downloads

For private/SSO-protected GitHub repos, browser release download URLs
(https://github.com/<owner>/<repo>/releases/download/<tag>/<asset>)
redirect to an HTML/SSO page instead of delivering the asset, causing
bundle manifest downloads to fail.

Extends the pattern from #2855 (presets/workflows) to cover the bundle
manifest download path in _download_remote_manifest:

- Resolves browser release URLs to GitHub REST API asset URLs via
  resolve_github_release_asset_api_url before downloading
- Direct REST API asset URLs (api.github.com/repos/.../releases/assets/<id>)
  are passed through directly
- Both cases use Accept: application/octet-stream so the API returns the
  binary payload rather than JSON metadata
- The original catalog URL is used to determine artifact format (.zip vs
  YAML) since the resolved API URL does not carry the file extension
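
The resolution step in the list above can be sketched roughly as follows. This is an illustration only, not the project's `resolve_github_release_asset_api_url`: the helper names `parse_browser_release_url`, `tags_api_url`, and `pick_asset_api_url` are hypothetical, and the sketch handles github.com only. It assumes the GitHub REST endpoint `GET /repos/{owner}/{repo}/releases/tags/{tag}`, whose response lists each asset with a `name` and an API `url`.

```python
import re
from typing import Optional
from urllib.parse import quote, unquote, urlsplit

# Hypothetical pattern for /<owner>/<repo>/releases/download/<tag>/<asset>.
_BROWSER_RELEASE_RE = re.compile(
    r"^/(?P<owner>[^/]+)/(?P<repo>[^/]+)/releases/download/"
    r"(?P<tag>[^/]+)/(?P<asset>[^/]+)$"
)


def parse_browser_release_url(url: str) -> Optional[dict]:
    """Split a github.com browser release URL into its parts, or return None."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname != "github.com":
        return None
    match = _BROWSER_RELEASE_RE.match(parts.path)
    if match is None:
        return None
    return {key: unquote(value) for key, value in match.groupdict().items()}


def tags_api_url(info: dict) -> str:
    """Build the release-by-tag lookup URL for the parsed parts."""
    tag = quote(info["tag"], safe="")
    return (
        f"https://api.github.com/repos/{info['owner']}/{info['repo']}"
        f"/releases/tags/{tag}"
    )


def pick_asset_api_url(release_json: dict, asset_name: str) -> Optional[str]:
    """Return the API URL of the asset with the given name, if it is listed."""
    for asset in release_json.get("assets", []):
        if asset.get("name") == asset_name:
            return asset.get("url")
    return None
```

The returned asset URL would then be fetched with `Accept: application/octet-stream`; when no asset matches, the sketch returns `None` so the caller can fall back to the original URL.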

Adds two CLI-level contract tests:
- bundle info resolves browser release URL via GitHub tags API
- bundle info passes direct API asset URL through with octet-stream

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: detect ZIP payload by magic bytes; add zip and API-asset tests

Address Copilot review feedback on PR #3136:

1. Detect ZIP payloads by magic bytes (PK\x03\x04) in addition to the
   '.zip' URL suffix so that direct GitHub REST asset URLs — which carry
   no file extension — are correctly routed through the ZIP extraction
   path when the asset is a ZIP bundle artifact.

2. Add two new contract tests:
   - test_bundle_info_resolves_github_browser_release_url_zip: exercises
     the '.zip' browser release URL path end-to-end, verifying the tags
     API lookup fires, octet-stream header is used, and bundle.yml is
     successfully extracted from the ZIP payload.
   - test_bundle_info_api_asset_url_zip_detected_by_magic_bytes: verifies
     that a direct REST asset URL returning ZIP bytes is detected by magic
     and parsed correctly without a tags API call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: improve error message, broaden ZIP magic, drop unused tmp_path

Address second-round Copilot review feedback on PR #3136:

- Error message: when the download fails, report the original catalog
  download_url so the user knows which entry to fix; include the resolved
  REST API URL when it differs for easier debugging.
- ZIP detection: broaden the magic-bytes check from PK\x03\x04 to raw[:2]
  == b"PK", covering all valid ZIP variants (local-file header PK\x03\x04,
  empty-archive PK\x05\x06, spanned/split PK\x07\x08).
- Tests: remove the unused tmp_path parameter from
  test_bundle_info_resolves_github_browser_release_url_zip.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: use full 4-byte ZIP signatures instead of 2-byte PK prefix

Address Copilot feedback: raw[:2] == b"PK" is too broad and could
misclassify any payload starting with ASCII "PK" as a ZIP, producing
a confusing "not a valid bundle" error.

Use the three specific 4-byte ZIP magic signatures instead:
  PK\x03\x04 — local file header (standard ZIP)
  PK\x05\x06 — end-of-central-directory (empty archive)
  PK\x07\x08 — data descriptor / spanning marker
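
A minimal sketch of the signature check, assuming the payload is already in memory as bytes. The names `ZIP_SIGNATURES`, `looks_like_zip`, and `make_zip` are illustrative and need not match the module's `_ZIP_SIGNATURES` constant or its surrounding code.

```python
import io
import zipfile

# The three 4-byte signatures listed above.
ZIP_SIGNATURES = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")


def looks_like_zip(raw: bytes) -> bool:
    """Return True when the payload starts with one of the ZIP signatures."""
    # bytes.startswith accepts a tuple of candidate prefixes.
    return raw.startswith(ZIP_SIGNATURES)


def make_zip(files: dict) -> bytes:
    """Build an in-memory ZIP from a {name: text} mapping (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    return buf.getvalue()
```

An archive with no entries consists only of the end-of-central-directory record, which is why `PK\x05\x06` has to be in the set; a text payload such as `b"PKG-INFO"` no longer matches.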

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: harden _download_remote_manifest parsing and tighten tests

- Promote _ZIP_SIGNATURES to module-level constant (was redefined per call)
- Use PurePosixPath for URL path suffix extraction so query strings and
  fragments are ignored and URL paths are treated as POSIX on all OSes
- Move yaml/BundleManifest imports to function top to flatten the
  previously nested try/except into a single handler with explicit
  except _yaml.YAMLError and except Exception clauses
- Re-add None guard on _local_manifest_source return: the function is
  typed Optional[BundleManifest], and without the guard a None return
  propagates silently to callers that degrade gracefully instead of
  raising an actionable error; a comment explains the guard is defensive,
  not dead code
- Assert exact resolved asset URL in browser-URL download tests, not
  just the Accept header, so a regression where download uses the
  original URL instead of the resolved one would be caught
- Add resolution-failure test: when the tags API finds no matching asset,
  the code falls back to the original URL and exits non-zero with an
  "Error:" message
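
The suffix extraction mentioned in the list above might look like the following sketch. The commit only names `PurePosixPath`; pairing it with `urllib.parse.urlsplit` to drop the query string and fragment is an assumption, and `url_path_suffix` is a hypothetical helper name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def url_path_suffix(url: str) -> str:
    """Return the lowercased extension of the URL's path component."""
    # urlsplit separates the path from ?query and #fragment; PurePosixPath
    # treats the path as POSIX regardless of the host operating system.
    return PurePosixPath(urlsplit(url).path).suffix.lower()
```

A direct REST asset URL ends in a numeric id and yields an empty suffix, which is the case the magic-bytes check has to cover.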

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(bundle): pass github_provider_hosts() for GHES private release downloads

Extends the GHES support pattern from extensions and presets (#2855, #3157)
to the bundle manifest download path: resolve_github_release_asset_api_url
now receives github_hosts=github_provider_hosts() so browser release URLs
from GitHub Enterprise Server instances are resolved via /api/v3 rather
than falling back to the unauthenticated download path.

Also adds a contract test covering the GHES resolution path for
_download_remote_manifest (analogous to the existing github.com tests).
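
As a rough illustration of the host handling described above: github.com uses `https://api.github.com`, while a GitHub Enterprise Server instance serves its REST API under `/api/v3` on its own host. The helper `rest_api_base` is hypothetical, and its `github_hosts` argument stands in for whatever `github_provider_hosts()` returns.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def rest_api_base(url: str, github_hosts: Iterable[str] = ()) -> Optional[str]:
    """Return the REST API base for a recognized GitHub host, else None."""
    host = (urlsplit(url).hostname or "").lower()
    if host == "github.com":
        return "https://api.github.com"
    if host in {h.lower() for h in github_hosts}:
        # GitHub Enterprise Server exposes its REST API under /api/v3.
        return f"https://{host}/api/v3"
    # Unknown host: the caller would keep the original download URL.
    return None
```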

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(bundle): remove unused ghes_entry variable from GHES contract test

The dict was defined but never consumed — the test drives GHES host
recognition entirely through the github_provider_hosts() patch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(bundle): include source URL in remote manifest parse errors

Thread the catalog URL (and resolved API URL when it differs) into the
YAML parse, generic parse, and ZIP-extraction error paths of
_download_remote_manifest so failures point at the offending source
instead of an opaque temp path. Addresses PR review feedback.
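
One way to build such a source description is sketched below. The function name `describe_source` and the exact message wording are hypothetical; only the idea of reporting the catalog URL, plus the resolved API URL when it differs, comes from the change described above.

```python
from typing import Optional


def describe_source(catalog_url: str, resolved_url: Optional[str] = None) -> str:
    """Describe where a manifest came from for use in error messages."""
    if resolved_url and resolved_url != catalog_url:
        return f"{catalog_url} (resolved to {resolved_url})"
    return catalog_url


def parse_error_message(catalog_url: str, resolved_url: Optional[str], detail: str) -> str:
    """Format a parse failure that points at the offending catalog entry."""
    source = describe_source(catalog_url, resolved_url)
    return f"Error: could not parse bundle manifest from {source}: {detail}"
```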

Assisted-by: GitHub Copilot (model: Claude Opus 4.8, autonomous)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Manfred Riem <15701806+mnriem@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
lselvar
2026-07-01 17:30:20 -04:00
committed by GitHub
parent 6288dea6ae
commit 3b30e40aaa
2 changed files with 409 additions and 19 deletions


@@ -8,6 +8,7 @@ from __future__ import annotations
import json
from pathlib import Path
from unittest.mock import patch
import pytest
import yaml
@@ -404,3 +405,315 @@ def test_install_integration_override_cannot_bypass_clash_guard(project: Path):
)
assert result.exit_code == 1
assert "claude" in result.output and "copilot" in result.output
# ===== Private GitHub release asset URL resolution =====


class FakeBundleResponse:
    """Minimal context-manager response stub for open_url fakes."""

    def __init__(self, data: bytes, url: str = "https://api.github.com/repos/org/repo/releases/assets/99"):
        self._data = data
        self._url = url

    def read(self) -> bytes:
        return self._data

    def geturl(self) -> str:
        return self._url

    def __enter__(self):
        return self

    def __exit__(self, *_):
        return False
def _make_catalog_config(catalog_path: Path, project: Path) -> None:
    """Write a bundle-catalogs.yml pointing at *catalog_path* in *project*."""
    config = {
        "schema_version": "1.0",
        "catalogs": [
            {
                "id": "test",
                "url": str(catalog_path),
                "priority": 1,
                "install_policy": "install-allowed",
            }
        ],
    }
    (project / ".specify" / "bundle-catalogs.yml").write_text(
        yaml.safe_dump(config), encoding="utf-8"
    )
def test_bundle_info_resolves_github_browser_release_url(project: Path):
    """bundle info resolves a private-repo browser release URL via the GitHub API."""
    browser_url = "https://github.com/org/repo/releases/download/v1.0/bundle.yml"
    api_asset_url = "https://api.github.com/repos/org/repo/releases/assets/99"
    captured = []
    manifest_yaml = yaml.safe_dump(valid_manifest_dict()).encode()

    def fake_open_url(url, timeout=None, extra_headers=None, redirect_validator=None):
        captured.append((url, extra_headers))
        if "releases/tags/" in url:
            # GitHub API release-tags lookup: return asset list
            return FakeBundleResponse(
                json.dumps({
                    "assets": [{"name": "bundle.yml", "url": api_asset_url}]
                }).encode(),
                url=url,
            )
        # Actual asset download
        return FakeBundleResponse(manifest_yaml, url=api_asset_url)

    catalog = project / "catalog.json"
    write_catalog_file(
        catalog,
        {"demo-bundle": catalog_entry_dict("demo-bundle", download_url=browser_url)},
    )
    _make_catalog_config(catalog, project)

    with patch("specify_cli.authentication.http.open_url", side_effect=fake_open_url):
        result = runner.invoke(app, ["bundle", "info", "demo-bundle", "--json"])

    assert result.exit_code == 0, result.output

    # The browser release URL must have been resolved via the GitHub tags API
    tag_calls = [url for url, _ in captured if "releases/tags/" in url]
    assert len(tag_calls) == 1, f"Expected exactly one tags API call; got {captured}"
    assert "releases/tags/v1.0" in tag_calls[0]

    # The actual download must use the resolved API asset URL with octet-stream
    asset_calls = [(url, h) for url, h in captured if "releases/assets/" in url]
    assert len(asset_calls) == 1
    assert asset_calls[0][0] == api_asset_url
    assert asset_calls[0][1] == {"Accept": "application/octet-stream"}
def test_bundle_info_passes_through_api_asset_url(project: Path):
    """bundle info passes a direct GitHub API asset URL through with octet-stream."""
    api_asset_url = "https://api.github.com/repos/org/repo/releases/assets/77"
    captured = []
    manifest_yaml = yaml.safe_dump(valid_manifest_dict()).encode()

    def fake_open_url(url, timeout=None, extra_headers=None, redirect_validator=None):
        captured.append((url, extra_headers))
        return FakeBundleResponse(manifest_yaml, url=api_asset_url)

    catalog = project / "catalog.json"
    write_catalog_file(
        catalog,
        {"demo-bundle": catalog_entry_dict("demo-bundle", download_url=api_asset_url)},
    )
    _make_catalog_config(catalog, project)

    with patch("specify_cli.authentication.http.open_url", side_effect=fake_open_url):
        result = runner.invoke(app, ["bundle", "info", "demo-bundle", "--json"])

    assert result.exit_code == 0, result.output

    # No tags API call: URL was already a REST asset URL
    tag_calls = [url for url, _ in captured if "releases/tags/" in url]
    assert len(tag_calls) == 0

    # Exactly one download call to the asset URL with octet-stream
    asset_calls = [(url, h) for url, h in captured if "releases/assets/" in url]
    assert len(asset_calls) == 1
    assert asset_calls[0][0] == api_asset_url
    assert asset_calls[0][1] == {"Accept": "application/octet-stream"}
def test_bundle_info_resolves_github_browser_release_url_zip(project: Path):
    """bundle info resolves a browser release URL for a .zip artifact and extracts bundle.yml."""
    import io
    import zipfile

    browser_url = "https://github.com/org/repo/releases/download/v2.0/bundle.zip"
    api_asset_url = "https://api.github.com/repos/org/repo/releases/assets/88"

    # Build a minimal in-memory ZIP containing bundle.yml
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("bundle.yml", yaml.safe_dump(valid_manifest_dict()))
    zip_bytes = buf.getvalue()

    captured = []

    def fake_open_url(url, timeout=None, extra_headers=None, redirect_validator=None):
        captured.append((url, extra_headers))
        if "releases/tags/" in url:
            return FakeBundleResponse(
                json.dumps({
                    "assets": [{"name": "bundle.zip", "url": api_asset_url}]
                }).encode(),
                url=url,
            )
        return FakeBundleResponse(zip_bytes, url=api_asset_url)

    catalog = project / "catalog.json"
    write_catalog_file(
        catalog,
        {"demo-bundle": catalog_entry_dict("demo-bundle", download_url=browser_url)},
    )
    _make_catalog_config(catalog, project)

    with patch("specify_cli.authentication.http.open_url", side_effect=fake_open_url):
        result = runner.invoke(app, ["bundle", "info", "demo-bundle", "--json"])

    assert result.exit_code == 0, result.output

    # tags API lookup must have fired
    tag_calls = [url for url, _ in captured if "releases/tags/" in url]
    assert len(tag_calls) == 1
    assert "releases/tags/v2.0" in tag_calls[0]

    # Asset download uses the resolved API URL with octet-stream
    asset_calls = [(url, h) for url, h in captured if "releases/assets/" in url]
    assert len(asset_calls) == 1
    assert asset_calls[0][0] == api_asset_url
    assert asset_calls[0][1] == {"Accept": "application/octet-stream"}

    # Manifest was successfully parsed from the ZIP
    payload = json.loads(result.output)
    assert payload["id"] == "demo-bundle"
def test_bundle_info_api_asset_url_zip_detected_by_magic_bytes(project: Path):
    """bundle info correctly handles a direct API asset URL that serves ZIP bytes."""
    import io
    import zipfile

    api_asset_url = "https://api.github.com/repos/org/repo/releases/assets/55"

    # Build a minimal in-memory ZIP containing bundle.yml
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("bundle.yml", yaml.safe_dump(valid_manifest_dict()))
    zip_bytes = buf.getvalue()

    captured = []

    def fake_open_url(url, timeout=None, extra_headers=None, redirect_validator=None):
        captured.append((url, extra_headers))
        return FakeBundleResponse(zip_bytes, url=api_asset_url)

    catalog = project / "catalog.json"
    write_catalog_file(
        catalog,
        {"demo-bundle": catalog_entry_dict("demo-bundle", download_url=api_asset_url)},
    )
    _make_catalog_config(catalog, project)

    with patch("specify_cli.authentication.http.open_url", side_effect=fake_open_url):
        result = runner.invoke(app, ["bundle", "info", "demo-bundle", "--json"])

    assert result.exit_code == 0, result.output

    # No tags API call: URL was already a REST asset URL
    tag_calls = [url for url, _ in captured if "releases/tags/" in url]
    assert len(tag_calls) == 0

    # Download used octet-stream header
    asset_calls = [(url, h) for url, h in captured if "releases/assets/" in url]
    assert len(asset_calls) == 1
    assert asset_calls[0][1] == {"Accept": "application/octet-stream"}

    # ZIP bytes were detected by magic and bundle.yml extracted correctly
    payload = json.loads(result.output)
    assert payload["id"] == "demo-bundle"
def test_bundle_info_github_release_url_resolution_failure_falls_back_and_errors(project: Path):
    """When the GitHub tags API lookup finds no matching asset, fall back to the
    original browser URL and surface a meaningful error (not a raw traceback)."""
    browser_url = "https://github.com/org/repo/releases/download/v3.0/bundle.yml"
    captured = []

    def fake_open_url(url, timeout=None, extra_headers=None, redirect_validator=None):
        captured.append((url, extra_headers))
        if "releases/tags/" in url:
            # Tags API responds but the asset list doesn't include our file
            return FakeBundleResponse(
                json.dumps({"assets": []}).encode(),
                url=url,
            )
        # Fallback download: GitHub serves HTML (SSO redirect) instead of YAML
        return FakeBundleResponse(b"<html>SSO login required</html>", url=url)

    catalog = project / "catalog.json"
    write_catalog_file(
        catalog,
        {"demo-bundle": catalog_entry_dict("demo-bundle", download_url=browser_url)},
    )
    _make_catalog_config(catalog, project)

    with patch("specify_cli.authentication.http.open_url", side_effect=fake_open_url):
        result = runner.invoke(app, ["bundle", "info", "demo-bundle", "--json"])

    # Must exit non-zero: the HTML body is not a valid bundle manifest
    assert result.exit_code == 1

    # The tags API lookup must have fired
    tag_calls = [url for url, _ in captured if "releases/tags/" in url]
    assert len(tag_calls) == 1

    # The fallback download should use the original browser URL (no octet-stream)
    fallback_calls = [(url, h) for url, h in captured if url == browser_url]
    assert len(fallback_calls) == 1
    assert fallback_calls[0][1] is None  # no Accept header on the original URL

    # Error output must be actionable (not a raw traceback)
    assert "Error:" in result.output
def test_bundle_info_resolves_ghes_browser_release_url(project: Path):
    """bundle info resolves a GHES private-repo browser release URL via /api/v3."""
    ghes_host = "ghes.example"
    browser_url = f"https://{ghes_host}/org/repo/releases/download/v1.0/bundle.yml"
    api_asset_url = f"https://{ghes_host}/api/v3/repos/org/repo/releases/assets/42"
    captured = []
    manifest_yaml = yaml.safe_dump(valid_manifest_dict()).encode()

    def fake_open_url(url, timeout=None, extra_headers=None, redirect_validator=None):
        captured.append((url, extra_headers))
        if "/api/v3/repos/" in url and "releases/tags/" in url:
            return FakeBundleResponse(
                json.dumps({
                    "assets": [{"name": "bundle.yml", "url": api_asset_url}]
                }).encode(),
                url=url,
            )
        return FakeBundleResponse(manifest_yaml, url=api_asset_url)

    catalog = project / "catalog.json"
    write_catalog_file(
        catalog,
        {"demo-bundle": catalog_entry_dict("demo-bundle", download_url=browser_url)},
    )
    _make_catalog_config(catalog, project)

    with patch("specify_cli.authentication.http.open_url", side_effect=fake_open_url), \
         patch("specify_cli.authentication.http.github_provider_hosts", return_value=(ghes_host,)):
        result = runner.invoke(app, ["bundle", "info", "demo-bundle", "--json"])

    assert result.exit_code == 0, result.output

    # The GHES /api/v3 tags lookup must have fired
    tag_calls = [url for url, _ in captured if "releases/tags/" in url]
    assert len(tag_calls) == 1
    assert f"{ghes_host}/api/v3/repos/org/repo/releases/tags/v1.0" in tag_calls[0]

    # Asset download must use the resolved GHES API URL with octet-stream
    asset_calls = [(url, h) for url, h in captured if "releases/assets/" in url]
    assert len(asset_calls) == 1
    assert asset_calls[0][0] == api_asset_url
    assert asset_calls[0][1] == {"Accept": "application/octet-stream"}

    payload = json.loads(result.output)
    assert payload["id"] == "demo-bundle"