Files
williamfzc 2efadece34 feat: add scheduled issue labeler for type/domain triage (#251)
* ci: add issue labeler workflow

Add a manual GitHub Actions workflow and script to poll issues and apply type/domain labels.

* feat(issue-labels): refine heuristics and add docs

Improve domain detection and add safeguards to avoid overriding manual type triage by default. Refresh regression samples from real issues and document usage.

* ci(issue-labels): enable hourly scheduled labeling

Run hourly on schedule with write mode by default while keeping manual dispatch dry-run by default.

* ci(issue-labels): shorten lookback window to 6h

Reduce scheduled scan window while keeping overlap for missed runs.

* ci(issue-labels): opt into Node 24 actions runtime

Set FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 and use Node 24 for the script runtime to avoid upcoming Node 20 deprecation warnings.

* ci(issue-labels): restore lookback input for manual runs

Allow workflow_dispatch to override lookback_hours while keeping hourly schedule fixed.

* ci(issue-labels): upgrade checkout/setup-node to v6

Use actions/checkout@v6 and actions/setup-node@v6 to align with Node 24 runtime and avoid Node 20 deprecation warnings.

* fix(ci): label only unlabeled issues via search api

* fix(ci): refine issue labeling heuristics from live issues

* fix(ci): address remaining issue label review comments

* fix(ci): fix issue label arg parsing regression

* docs(issue-labels): clarify one-shot unlabeled triage scope
2026-04-07 10:35:40 +08:00

75 lines
3.2 KiB
Markdown

# Issue Labels
This script searches unlabeled GitHub issues in a repository and applies labels based on heuristics. It is intentionally a one-shot triage pass: once any label is added to an issue, that issue is out of scope for future scheduled runs.
It only covers two label dimensions:
- **Type**: `bug` / `enhancement` / `question` / `documentation` / `performance` / `security`
- **Domain**: `domain/<service>` (multi-select)
Related GitHub Actions workflow: `.github/workflows/issue-labels.yml`.
## Labeling Rules (Current)
### Type (single-select; write only when matched)
- Candidates: `bug`, `enhancement`, `question`, `documentation`, `performance`, `security`
- Type is written **only when keywords are matched** in title/body. If nothing matches, the script will not add or correct type labels.
- By default, the script **does not override existing type labels** to avoid reverting manual triage. Use `--override-type` if you really want the script to enforce the computed type.
### Domain (multi-select; add-only by default)
- Managed labels prerequisite: the standard type labels plus `domain/<service>` labels should exist in the repository. If a specific issue needs a managed label that is missing, the script prints a warning, skips that issue, and continues processing the rest.
- Label format: `domain/<service>` (e.g. `domain/base`, `domain/im`)
- Signals (strong → weak):
1) Explicit `domain/<service>` in text
2) Command mention: `lark-cli <service>` / `lark cli <service>` (maps `docs``doc`)
3) Loose title match (careful; excludes English `im` to reduce false positives)
4) A small set of conservative keyword heuristics as fallback
- By default, the script only adds missing domain labels and never removes existing ones.
- If you want stricter domain synchronization, use `--sync-domains`.
- Note: the current implementation only removes existing `domain/*` labels when it can positively match at least one domain for the issue, so this is not an exact-sync cleanup mode.
## Usage
### GitHub Actions (recommended)
The workflow supports both:
- `schedule` (hourly)
- `workflow_dispatch` (manual run)
Scheduled runs write labels by default. Manual runs default to dry-run unless `dry_run=false` is selected.
Only issues with no labels are scanned. This is intentional: the automation is meant to triage brand-new unlabeled issues once, not to continuously reconcile labels on previously triaged issues.
### Local dry-run
Provide a token to avoid anonymous rate limits:
```bash
GITHUB_TOKEN=$(gh auth token) \
node scripts/issue-labels/index.js \
--repo larksuite/cli \
--max-issues 100 \
--dry-run --json
```
### Common Flags
- `--dry-run`: Do not write labels, only print planned changes
- `--json`: JSON output (usually with `--dry-run`)
- `--max-issues <n>` / `--max-pages <n>`: Bound unlabeled-issue search size for each run
- `--sync-domains`: Stricter `domain/*` sync when at least one managed domain matches (may still leave stale labels if nothing matches)
- `--override-type`: Allow overriding existing type labels (use with caution)
## Regression Samples
`samples.json` is a regression dataset sampled from real issues in `larksuite/cli` (issue bodies are truncated).
Run tests:
```bash
node scripts/issue-labels/test.js
```