larksuite-cli/shortcuts/apps/sensitive_paths.go
raistlin042 e93e2a98e1 feat(apps): replace +html-publish cwd hard-reject with credential-file scan (#1072)
* feat(apps): replace +html-publish cwd hard-reject with credential-file scan

The previous --path == "." block was a coarse heuristic: it caught the
common foot-gun of publishing a repo root, but also rejected legitimate
clean cwds, and let a ./dist with a forgotten .env ship the secret
through anyway (the sensitive-paths scanner was advisory and never ran
on the Execute path).

Move the gate from path shape to path content:

- Validate now walks --path candidates and rejects publishes that
  include well-known credential files (.env / .env.* / .npmrc / .netrc
  / .git-credentials / .aws/credentials / .gcloud/credentials* /
  .docker/config.json / .kube/config). Living in Validate (not DryRun)
  means dry-run returns non-zero on hit too, so the dry-run preview
  matches Execute.
- Narrow the credential pattern set. .git/, SSH private keys, *.pem
  and *.key are out of scope -- they're not env-token files and the
  false-positive rate (public certs, docs about key formats) is high.
- Add --allow-sensitive as the escape hatch for legitimate cases
  (e.g. a docs site shipping .env.example on purpose). DryRun surfaces
  the waived list in sensitive_waived so the caller can relay it.
- Drop the cwd defense-in-depth in runHTMLPublish. A clean cwd is now
  a valid publish target.

The lark-apps skill and the html-publish reference are updated to
describe the new gate, the override flag, and the patterns now
explicitly out of scope.

* feat(apps): drop .gcloud/* from credential-file scan

The .gcloud/credentials pattern matched a non-existent path: gcloud's
actual config dir is ~/.config/gcloud/ (XDG-based), and the real
credential files there are credentials.db / access_tokens.db /
application_default_credentials.json -- none of which would land under
a .gcloud/ segment in a publish payload.

Drop the rule rather than fix it: the realistic gcloud foot-gun would
require recognizing the .config/gcloud/* tree by file basename, which
is a broader change than the targeted env/cred scan in this PR. The
remaining 7 patterns (.env / .env.* / .npmrc / .netrc /
.git-credentials / .aws/credentials / .docker/config.json /
.kube/config) cover the common Node/Python/CLI-tooling foot-guns.

* fix(apps): close credential-scan bypass when --path is the parent dir itself

isSensitiveRelPath anchors cloud-SDK matchers on adjacent parent/file
segments (.aws/credentials, .docker/config.json, .kube/config), but
walker strips that parent via filepath.Rel when --path is the conventional
parent dir (e.g. ./.aws), yielding a bare RelPath="credentials" that
slipped through silently. Same bypass for the single-file form
--path ./.aws/credentials (walker sets RelPath = Base(rootPath)).

Wrap the scan in isSensitiveCandidate: keep the fast RelPath scan, and
on miss fall back to filepath.Abs(AbsPath) so the parent segment is
visible again. isSensitiveRelPath itself is unchanged; existing tests
still pin its pure-function contract.

* fix(apps): drop filepath.Abs from sensitive scan to satisfy forbidigo lint

The previous fix called filepath.Abs(c.AbsPath) — banned by the repo's
forbidigo rule because shortcuts must not reach into the filesystem for
path resolution.

Reframe the same fix without fs access: re-prepend the root's basename
(or, for the single-file form, the parent dir's basename of rootPath)
to RelPath and re-scan only the parent-anchored credential pairs
(.aws/credentials, .docker/config.json, .kube/config). Leaf matchers
(.env / .npmrc / ...) stay scoped to RelPath — incidentally closing a
latent false-positive where --path /home/alice/.env/dist would have
flagged every file under it just because .env appeared in the
absolute path.
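
A minimal sketch of the string-only re-probe, assuming the behavior described above. `hasAnchoredPair` and `boundaryHit` are hypothetical names that condense the logic for illustration; they are not the functions in this file.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// hasAnchoredPair reports whether any adjacent parent/file pair in a
// "/"-delimited path is one of the three parent-anchored credential pairs.
func hasAnchoredPair(path string) bool {
	parts := strings.Split(path, "/")
	for i := 1; i < len(parts); i++ {
		switch parts[i-1] + "/" + parts[i] {
		case ".aws/credentials", ".docker/config.json", ".kube/config":
			return true
		}
	}
	return false
}

// boundaryHit re-prepends the root's basename, then the basename of the
// root's parent dir, to relPath and re-scans only the anchored pairs.
// It never touches the filesystem.
func boundaryHit(rootPath, relPath string) bool {
	for _, ctx := range []string{
		filepath.Base(rootPath),
		filepath.Base(filepath.Dir(rootPath)),
	} {
		switch ctx {
		case "", ".", "..", "/":
			continue // no usable context segment
		}
		if hasAnchoredPair(filepath.ToSlash(filepath.Join(ctx, relPath))) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(boundaryHit("./.aws", "credentials"))               // true
	fmt.Println(boundaryHit("./.aws/credentials", "credentials"))   // true
	fmt.Println(boundaryHit("/home/alice/.env/dist", "index.html")) // false
}
```

The third call shows why the leaf matchers are left out of this pass: the ".env" ancestor is re-prepended as context, but it pairs with nothing in the anchored set, so files beneath it are not flagged.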
2026-05-25 23:24:40 +08:00


// Copyright (c) 2026 Lark Technologies Pte. Ltd.
// SPDX-License-Identifier: MIT

package apps

import (
	"path/filepath"
	"strings"
)

// isSensitiveRelPath reports whether a relative path inside the candidate
// manifest is a well-known env / credential file that should not ship to a
// public-internet share URL. The check is path-element-wise (each
// "/"-delimited segment is inspected) so credential files nested under
// arbitrary subdirectories are still caught.
//
// Used by +html-publish: dry-run AND Execute both block by default when any
// candidate matches. Pass --allow-sensitive to override (legitimate cases:
// a documentation site shipping example credential files on purpose).
//
// Scope is intentionally narrow — only files that conventionally hold API
// tokens or service credentials, not the broader "anything cryptographic"
// surface. SSH private keys, generic *.pem / *.key, and SCM internals are
// out of scope here; if they leak it's a separate problem to address.
func isSensitiveRelPath(rel string) bool {
	if rel == "" {
		return false
	}
	parts := strings.Split(rel, "/")
	for i, p := range parts {
		switch {
		case p == ".env" || strings.HasPrefix(p, ".env."):
			return true
		case p == ".npmrc":
			return true
		case p == ".netrc":
			return true
		case p == ".git-credentials":
			return true
		}
		if i == 0 {
			continue
		}
		parent := parts[i-1]
		switch parent {
		case ".aws":
			if p == "credentials" {
				return true
			}
		case ".docker":
			if p == "config.json" {
				return true
			}
		case ".kube":
			if p == "config" {
				return true
			}
		}
	}
	return false
}

// hasParentAnchoredCredentialPair scans a "/"-delimited path for the
// cloud-SDK matchers that depend on a conventional parent dir:
// .aws/credentials, .docker/config.json, .kube/config. The leaf-name
// matchers (.env / .npmrc / ...) intentionally do NOT run here, so callers
// can probe a path that includes surrounding root context without risking
// a leaf-rule false-positive on the context segment itself (e.g. a literal
// ".env" directory somewhere in --path's ancestry).
func hasParentAnchoredCredentialPair(path string) bool {
	parts := strings.Split(path, "/")
	for i := 1; i < len(parts); i++ {
		switch parts[i-1] {
		case ".aws":
			if parts[i] == "credentials" {
				return true
			}
		case ".docker":
			if parts[i] == "config.json" {
				return true
			}
		case ".kube":
			if parts[i] == "config" {
				return true
			}
		}
	}
	return false
}

// isSensitiveCandidate is the call-site wrapper used by +html-publish.
//
// Two passes:
//
//  1. Scan RelPath with the full matcher (isSensitiveRelPath). Handles the
//     common in-tree case (e.g. ./site/.env, ./dist/.docker/config.json).
//  2. Re-probe at the boundary between rootPath and the candidate, using
//     ONLY hasParentAnchoredCredentialPair. walker strips the root segment
//     via filepath.Rel, so when --path is itself the conventional parent
//     dir (e.g. ./.aws) RelPath comes back as a bare "credentials" and
//     step 1 has no parent to anchor on. Re-prepending the root's basename
//     (or, for the single-file form, the parent dir's basename of
//     rootPath) exposes the missing segment. Leaf matchers are NOT re-run
//     in this pass, so an ancestor like /home/alice/.env/dist can't
//     false-positive every file beneath it just because ".env" appears in
//     the root context.
//
// Pure string-level reasoning over rootPath — no filesystem access, no
// reliance on cwd — so it composes with the project's fileio sandbox and
// stays inside the shortcuts-layer constraint against direct fs lookups.
func isSensitiveCandidate(rootPath string, c htmlPublishCandidate) bool {
	if isSensitiveRelPath(c.RelPath) {
		return true
	}
	for _, ctx := range []string{filepath.Base(rootPath), filepath.Base(filepath.Dir(rootPath))} {
		switch ctx {
		case "", ".", "..", "/":
			continue
		}
		if hasParentAnchoredCredentialPair(filepath.ToSlash(filepath.Join(ctx, c.RelPath))) {
			return true
		}
	}
	return false
}
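
The segment-wise behavior of the full matcher is easiest to see with a small table of inputs. The program below is a self-contained sketch: `isSensitiveRel` is a hypothetical, condensed restatement of isSensitiveRelPath written for illustration, and the sample paths are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// isSensitiveRel is a condensed, illustrative restatement of the
// segment-wise matcher: leaf names first, then parent/file pairs.
func isSensitiveRel(rel string) bool {
	if rel == "" {
		return false
	}
	parts := strings.Split(rel, "/")
	for i, p := range parts {
		switch {
		case p == ".env", strings.HasPrefix(p, ".env."),
			p == ".npmrc", p == ".netrc", p == ".git-credentials":
			return true
		}
		if i == 0 {
			continue // first segment has no parent to pair with
		}
		switch parts[i-1] + "/" + p {
		case ".aws/credentials", ".docker/config.json", ".kube/config":
			return true
		}
	}
	return false
}

func main() {
	cases := []struct {
		rel  string
		want bool
	}{
		{"site/.env", true},                // leaf match in a subdirectory
		{"docs/.env.example", true},        // ".env." prefix; needs --allow-sensitive
		{"a/b/.npmrc", true},               // nesting depth does not matter
		{"dist/.docker/config.json", true}, // parent-anchored pair
		{".env/index.html", true},          // a directory named ".env" also matches
		{".aws/config", false},             // right parent, wrong file
		{"credentials", false},             // bare leaf, no parent to anchor on
		{".envrc", false},                  // neither ".env" nor ".env.*"
		{"certs/server.key", false},        // *.key is out of scope
	}
	for _, c := range cases {
		fmt.Printf("%-26s got=%-5v want=%v\n", c.rel, isSensitiveRel(c.rel), c.want)
	}
}
```

The `.env/index.html` row is the reason the wrapper keeps leaf matchers scoped to RelPath: applied to root context, the same rule would flag every file under an ancestor directory named ".env".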