mirror of
https://github.com/CherryHQ/cherry-studio.git
synced 2026-07-03 20:59:22 +08:00
refactor/code-cli
10 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
4ef2889fd3 |
refactor(file-ref): split persistent ref ownership (#16532)
Co-authored-by: fullex <106392080+0xfullex@users.noreply.github.com> Signed-off-by: eurfelux <eurfelux@gmail.com> |
||
|
|
bf5910f424 |
refactor(shared): clean up @shared residue, single-process residents, barrels, naming
- Drop dead @deprecated LanFileChunk{,Ack}Message types; de-Redux a stale comment
- Relocate single-process modules out of @shared (Invariant 1.1):
  - types/plugin.ts -> main/utils (markdownParser-only)
  - utils/ocr.ts + types/ocr.ts -> renderer (renderer-only; types now standalone)
  - utils/searchSnippet.ts, utils/pdf.ts -> main/utils (main-only)
  - utils/externalApp.ts EXTERNAL_APPS -> inlined into ExternalAppsService
- Convert types/file, utils/file, utils/command, utils/api barrels from
  `export *` to explicit named exports (section 3.1)
- Naming: urlUtil.ts->url.ts; api/utils.ts->api/format.ts (+fold formatApiHost/
  formatOllamaApiHost); hasAPIVersion->hasApiVersion; URLString/FileURLString->
  UrlString/FileUrlString; enum codeCLI->CodeCli, terminalApps->TerminalApp with
  UPPER_SNAKE_CASE members
- Sync docs to the moves/renames; track data/types route-by-shape as an open question
|
||
|
|
e2f479d912 |
feat(files): wire file page to real data (#15338)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: fullex <106392080+0xfullex@users.noreply.github.com> Signed-off-by: icarus <eurfelux@gmail.com> |
||
|
|
d06302db48 |
refactor(shared): rename preset & codeLanguages files to camelCase per naming-conventions §3.2
`.ts` files under `src/shared` must use camelCase (naming-conventions §3.2);
kebab-case is only sanctioned under `packages/ui/` and `src/renderer/routes/`.
The `presets/` kebab naming came from best-practice-layered-preset-pattern.md,
which predated and conflicted with the authoritative spec.
- Rename presets/{code-cli,default-assistant,file-processing,mini-apps,
translate-languages,web-search-providers}.ts and utils/code-languages.ts
(plus the two matching __tests__ files) to camelCase, and update all importers
- Fix the upstream generator scripts/update-languages.ts to emit
codeLanguages.ts; otherwise `pnpm update:languages` would recreate the
kebab-named file
- Correct best-practice-layered-preset-pattern.md (kebab -> camelCase) and link
it to naming-conventions §3.2 so it cannot drift again
- Fix two stale `types/file` path references in file/architecture.md
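As a rough illustration of the rename rule above, the following sketch converts a kebab-case file name to camelCase. The helper `kebabToCamelFileName` is hypothetical and does not exist in the repository; of the outputs, only `codeLanguages.ts` is named in the commit message, and the others are simply what a mechanical conversion yields.

```typescript
// Hypothetical helper for illustration only; it is not part of the repository.
function kebabToCamelFileName(fileName: string): string {
  // Split at the first dot so multi-part suffixes such as ".test.ts" survive intact.
  const dot = fileName.indexOf(".");
  const base = dot === -1 ? fileName : fileName.slice(0, dot);
  const rest = dot === -1 ? "" : fileName.slice(dot);
  // Upper-case the character after each hyphen and drop the hyphen itself.
  const camel = base.replace(/-([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
  return camel + rest;
}

const renamedPresets: string[] = [
  "code-cli.ts",
  "default-assistant.ts",
  "web-search-providers.ts",
  "code-languages.ts",
].map((name) => kebabToCamelFileName(name));
```

Because the split happens at the first dot, a test file such as `code-languages.test.ts` keeps its `.test.ts` suffix.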
|
||
|
|
53a3577389 |
refactor(renderer): flatten src/renderer/src to src/renderer
Move all renderer source from src/renderer/src/* up one level to
src/renderer/*, removing the redundant nested src directory.
- Update path aliases (@renderer, @types, @logger, @data) and TanStack
Router paths in electron.vite.config.ts; update tsconfig.{json,web,node}
path mappings and include globs.
- Fix Vite root-relative script paths in the 8 renderer HTML entries.
- Update cross-process relative imports in main/preload (language,
apiServer models, preload index) to drop the /src segment.
- Switch renderer test imports of the logger mock to the @test-mocks alias.
- Update hardcoded renderer paths in scripts and their fixtures, lint
configs (eslint/oxlint/biome), CODEOWNERS, docs, and the data-classify tool.
- Convert deep (../../+) relative imports within the renderer to the
@renderer alias (69 files, 108 imports); keep single-level relatives.
- Fix doc links broken by the move and correct one pre-existing broken
link in naming-conventions.md.
|
||
|
|
c514dcc049 |
refactor(shared): move packages/shared to src/shared
packages/shared was never a real pnpm workspace package (no package.json); it was referenced only through the @shared TypeScript path alias. Relocate it under src/ via git mv (143 files, detected as pure renames).
Repoint the @shared alias and include globs to src/shared across electron.vite.config.ts, tsconfig.{json,node,web}.json and vitest.config.ts; update scripts/check-custom-exts.ts, scripts/update-languages.ts, the eslint.config.mjs generated-file globs, the data-classify generator output targets, .github/CODEOWNERS path rules, and CLAUDE.md/docs/source-comment references.
The @shared alias name is unchanged, so all 1403 @shared/* import sites resolve without modification. Verified with typecheck:node, typecheck:web and the full test suite (700 files, 9739 tests passing).
|
||
|
|
db8a1834c2 |
feat(file-manager): unified DirectoryTreeBuilder primitive for Notes (#15363)
### What this PR does
Before this PR:
- Directory loading is split between two pipelines: `FileStorage.getDirectoryStructure` (Notes) and `FileStorage.listDirectory` (everything else). Same workspace, two scans, two watchers.
- `FileStorage.listDirectory` defaults to `maxEntries: 20`, silently truncating list-mode callers (workspace trees rendered `tests/__mocks__` as a file because it sorted out of the first 20 results).
- Notes' manual chokidar plumbing (`startFileWatcher` / `onFileChange`) lives on legacy `file-change` IPC and crashes with `EMFILE: too many open files` on large repos because the watcher opens one FD per directory in `node_modules`.
After this PR:
- One primitive, `DirectoryTreeBuilder` (`src/main/file/tree/builder.ts`, RFC §12), owns the in-memory tree and the chokidar watcher. It is the only directory-walking code on the main side.
- Builder dedupe lives behind `TreeRegistry` (lifecycle `WhenReady`): identical `(rootPath, options)` requests share one builder, with a 500ms grace window so a remount inside one React commit reuses the warm scan + watcher instead of paying for a rescan.
- `.gitignore` parsing (via `ignore@7`) drives BOTH ripgrep's `--ignore-file` AND chokidar's `ignored` predicate, so EMFILE on `node_modules`-heavy workspaces is gone. `.git` is always excluded even when `.gitignore` doesn't list it.
- `search.listDirectory` is the now-real Phase 2 entry point (was a stub). `maxEntries` default raised to `Number.MAX_SAFE_INTEGER`; truncation is a search-mode concern and callers that want a cap pass it explicitly. The 20-entry default was the bug.
- Renderer-side `useDirectoryTree(rootPath, options)` hook applies mutation pushes to a JSON-mirrored `TreeDirRoot`. The tree DTO and `TreeNode` class live in `packages/shared/file/types/tree.ts` so main and renderer share one shape (no parent cycles in JSON; WeakMap parent ref recovered in `FromJSON`).
- Notes (`NotesPage`) migrated off `getDirectoryStructure` + manual watcher onto `useDirectoryTree` + `useNote` join. The Redux `starredPaths` / `expandedPaths` path stays untouched in this PR; `useNote` already covers that surface via `noteTable`.
Fixes #
### Why we need it and why it was done in this way
The following tradeoffs were made:
- **Builder dedupe on main, not renderer.** A renderer-side cache would still pay one IPC round-trip per remount because the expensive thing is the FS scan + watcher install, which only the main side owns. Sharing on the main side means one ripgrep + one chokidar regardless of how many panes mount the same root.
- **500ms grace window for builder teardown.** Long enough to cover React's "deletions before insertions" effect ordering (sub-millisecond in practice), short enough that closing a workspace doesn't keep watcher FDs alive noticeably.
- **`Tree_*` IPC owned by `TreeRegistry.onInit`.** The lifetime of the handlers must match the lifetime of the chokidar watchers they reference. Putting them in `FileManager` would leak handlers across re-init.
- **`TreeNode` class hierarchy (not plain DTOs)** lets `rename` mutate `path` once at the subtree root and have `adjustChildrenPaths` cascade; the alternative is rebuilding the subtree, which throws away every consumer's identity-based caching.
- **Hard ESLint isolation:** `src/main/file/tree/**` does not import `@main/data/**`. Tree is a runtime concern; persistence is an orthogonal one (`noteTable` is a sparse state overlay on top of FS paths, not a tree mirror). RFC §12.6.
The following alternatives were considered:
- Keep the renderer-side cache (rejected: covered above; main-side dedupe is strictly stronger).
- Add an `excludeGlobs` array option (rejected: the right source of truth is `.gitignore`, and users editing `.gitignore` get free updates).
- One DTO without classes (rejected: rename cascade is the load-bearing case).
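The builder dedupe and grace-window teardown described above can be sketched roughly as follows. This is a minimal illustration, not the PR's `TreeRegistry`: `GraceRegistry`, `BuilderLike`, `Scheduler`, and `registryKey` are hypothetical names, and the injectable scheduler is an assumption made so the timing can be exercised without real timers.

```typescript
// Hypothetical names throughout; the real TreeRegistry API is not shown in the PR text.
interface BuilderLike {
  dispose(): void;
}

type TreeOptions = Record<string, unknown>;

// A scheduler returns a cancel function, which lets tests swap out real timers.
type Scheduler = (fn: () => void, ms: number) => () => void;

const timerScheduler: Scheduler = (fn, ms) => {
  const handle = setTimeout(fn, ms);
  return () => clearTimeout(handle);
};

interface Slot<B> {
  builder: B;
  refs: number;
  cancelTeardown: (() => void) | null;
}

// Identical (rootPath, options) pairs map to the same key regardless of option order.
function registryKey(rootPath: string, options: TreeOptions): string {
  const sorted = Object.keys(options).sort().map((k) => [k, options[k]]);
  return JSON.stringify([rootPath, sorted]);
}

class GraceRegistry<B extends BuilderLike> {
  private readonly slots = new Map<string, Slot<B>>();
  private readonly create: (rootPath: string, options: TreeOptions) => B;
  private readonly graceMs: number;
  private readonly schedule: Scheduler;

  constructor(
    create: (rootPath: string, options: TreeOptions) => B,
    graceMs: number = 500,
    schedule: Scheduler = timerScheduler
  ) {
    this.create = create;
    this.graceMs = graceMs;
    this.schedule = schedule;
  }

  acquire(rootPath: string, options: TreeOptions = {}): B {
    const key = registryKey(rootPath, options);
    let slot = this.slots.get(key);
    if (slot) {
      // A pending teardown means we are inside the grace window: reuse the warm builder.
      if (slot.cancelTeardown) {
        slot.cancelTeardown();
        slot.cancelTeardown = null;
      }
    } else {
      slot = { builder: this.create(rootPath, options), refs: 0, cancelTeardown: null };
      this.slots.set(key, slot);
    }
    slot.refs += 1;
    return slot.builder;
  }

  release(rootPath: string, options: TreeOptions = {}): void {
    const key = registryKey(rootPath, options);
    const slot = this.slots.get(key);
    if (!slot || slot.refs === 0) return;
    slot.refs -= 1;
    if (slot.refs > 0) return;
    // Last consumer left: dispose only after the grace window elapses.
    slot.cancelTeardown = this.schedule(() => {
      this.slots.delete(key);
      slot.builder.dispose();
    }, this.graceMs);
  }
}
```

In this sketch a release followed by an acquire inside the window cancels the pending teardown, so the same builder instance is handed back and no rescan is needed.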
Links to places where the discussion took place: RFC §12 (`v2-refactor-temp/docs/file-manager/rfc-file-manager.md`)
### Breaking changes
- Removed legacy IPC channels: `File_GetDirectoryStructure`, `File_StartWatcher`, `File_StopWatcher`, `File_PauseWatcher`, `File_ResumeWatcher`. Only Notes used them; the migration is in this PR. No renderer outside Notes touched them.
- `search.listDirectory` (renamed from `FileStorage.listDirectory`) now defaults to unbounded `maxEntries`. Existing callers that relied on the implicit 20-cap (none I could find) would need to pass `maxEntries: 20` explicitly.
### Special notes for your reviewer
- `src/main/file/tree/__tests__/builder.test.ts` covers initial scan, `.gitignore` honoring, chokidar fan-out, `dispose` cleanup, and JSON round-trip (no parent cycles).
- `src/main/file/tree/__tests__/registry.test.ts` covers builder dedupe, grace-window reuse, multi-consumer mutation fan-out, and `webContents`-destroyed cascade cleanup.
- `src/main/utils/file/__tests__/search.test.ts` locks in the `maxEntries` default fix and the `--hidden` flag wiring.
- ArtifactPane / chat-page integration of this primitive lives on `feat/chat-page` and is not part of this PR (that branch has the Shell forceMount + chat-page Tree IPC integration on top).
### Checklist
- [x] PR: The PR description is expressive enough and will help future contributors
- [x] Code: Write code that humans can understand and Keep it simple
- [x] Refactor: You have left the code cleaner than you found it (Boy Scout Rule)
- [ ] Upgrade: Impact of this change on upgrade flows was considered and addressed if required
- [ ] Documentation: A user-guide update was considered and is present (link) or not required.
- [x] Self-review: I have reviewed my own code before requesting review from others
### Release note
```release-note
NONE
```
---------
Signed-off-by: suyao <sy20010504@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
4a1f39b7ae |
feat(file): Phase 2 Batch 0 — FileMigrator + cross-module coordination (#15067)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: SuYao <sy20010504@gmail.com> Co-authored-by: fullex <106392080+0xfullex@users.noreply.github.com> Signed-off-by: icarus <eurfelux@gmail.com> |
||
|
|
6ec914cf0f |
refactor(file-entry): rename trashedAt to deletedAt (#15246)
### What this PR does
Before this PR:
- `file_entry` table used `trashed_at` for the soft-delete timestamp, diverging from every other soft-deletable table in the schema (`agent`, `assistant`, `message`, `topic`), which all use `deleted_at`.
After this PR:
- `file_entry.deleted_at` (and BO field `deletedAt`): naming is consistent across the entire schema.
- Renamed identifiers:
  - Schema field: `trashedAt` → `deletedAt`
  - SQL column: `trashed_at` → `deleted_at`
  - Index: `fe_trashed_at_idx` → `fe_deleted_at_idx`
  - CHECK constraint: `fe_external_no_trash` → `fe_external_no_delete`
- Updated all source files, tests, and architecture docs (including `v2-refactor-temp/docs/file-manager/`).
- **Intentionally NOT renamed** (out of scope: these are API surface / concept names, not the column name): `moveToTrash`, `restoreFromTrash`, `inTrash` (query flag), `isTrashed`, `batchTrash`, `internalTrash`, and "Trash" as a concept in comments/docs.
Fixes #
### Why we need it and why it was done in this way
The following tradeoffs were made:
- **Scope discipline**: kept the rename strictly at the column-identifier layer (4 identifiers). Did not change API names or concept words; switching the "Trash" concept to "Delete" is a larger semantic change that deserves its own PR.
- **Migration 0026 contains a manual SQL patch.** drizzle-orm/drizzle-kit issue [#3653](https://github.com/drizzle-team/drizzle-orm/issues/3653) causes the SQLite rebuild-table path to drop the leading `ALTER TABLE … RENAME COLUMN` statement. The generated `INSERT … SELECT "deleted_at" FROM file_entry` would fail because the source table still has `trashed_at`. The migration manually prepends an explicit `ALTER TABLE file_entry RENAME COLUMN trashed_at TO deleted_at;` before the rebuild. Upstream fix landed in `drizzle-kit@1.0.0-beta`/`rc` but is not backported to the `0.31.x` stable line we depend on.
- **Why keeping the manual patch is acceptable**: per `CLAUDE.md` § v2 Refactoring, `migrations/sqlite-drizzle/` is throwaway during v2: it will be wiped and regenerated as a single clean initial migration from the final schemas before release. Mid-development DB drift is explicitly acceptable, and the manual SQL only needs to survive until that regeneration.
The following alternatives were considered:
- Selecting `create column` in `drizzle-kit generate` instead of `rename column`: also produces invalid SQL (same root cause: the rebuild path puts the new column name in the `SELECT` list regardless of the rename mapping). Rejected.
- Skipping the `0026` migration entirely and relying on `db:push` / DB reset during dev: pollutes `_journal.json` divergence and makes the next schema change confusing. Rejected.
- Upgrading to `drizzle-kit@1.0.0-beta`/`rc` to get the fix: v1 is a major rewrite with significant breaking changes (alternation engine rewrite, ORM type system rewrite, migration folder layout change). Out of scope for this PR. Rejected.
Links to places where the discussion took place: N/A
### Breaking changes
None. Dev-only DB column rename during v2 refactor. No user-visible behavior change. No public API surface change. v1 data never reaches this branch except through migrators in `src/main/data/migration/v2/`.
### Special notes for your reviewer
- The single manual edit to drizzle-generated SQL is in `migrations/sqlite-drizzle/0026_sturdy_aqueduct.sql`; look for the `MANUAL PATCH` comment block at the top. Without it the migration will fail to apply.
- "Trash" concept words still appear throughout the file-manager codebase by design (function names, comments, docs section headings). If we later want to migrate the whole concept to "Delete", that should be a follow-up PR.
### Checklist
This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR. Approvers are expected to review this list.
- [x] PR: The PR description is expressive enough and will help future contributors
- [x] Code: [Write code that humans can understand](https://en.wikiquote.org/wiki/Martin_Fowler#code-for-humans) and [Keep it simple](https://en.wikipedia.org/wiki/KISS_principle)
- [x] Refactor: You have [left the code cleaner than you found it (Boy Scout Rule)](https://learning.oreilly.com/library/view/97-things-every/9780596809515/ch08.html)
- [x] Upgrade: Impact of this change on upgrade flows was considered and addressed if required
- [ ] Documentation: A [user-guide update](https://docs.cherry-ai.com) was considered and is present (link) or not required. Check this only when the PR introduces or changes a user-facing feature or behavior.
- [x] Self-review: I have reviewed my own code (e.g., via [`/gh-pr-review`](/.claude/skills/gh-pr-review/SKILL.md), `gh pr diff`, or GitHub UI) before requesting review from others
### Release note
```release-note
NONE
```
---------
Signed-off-by: icarus <eurfelux@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
d2c568e349 |
feat(file): Add schema and foundation for new file module (#13451)
### What this PR does
Adds the **Phase 1a contract surface** for the file module — types, DB
schema, DataApi + File IPC contracts, FileManager skeleton, and
architecture docs.
**Phase 1b.1 (Read Path & Repository), 1b.2 (Write Path & Lifecycle),
1b.3 (Watcher & DanglingCache), and 1b.4 (OrphanSweep &
FileRefCheckerRegistry) are now all landed on top of 1a in this same
PR.** This is the complete Phase 1b runtime — reviewers see the full
read + write + watcher + orphan-sweep picture in one place.
Design, contracts, and decision rationale live in the architecture docs:
- [`docs/references/file/architecture.md`](docs/references/file/architecture.md):
module boundaries, type system, IPC/DataApi contracts, layered
architecture, service lifecycle, mutation propagation
- [`docs/references/file/file-manager-architecture.md`](docs/references/file/file-manager-architecture.md):
FileManager internals (storage, version detection, atomic writes,
reference cleanup, DirectoryWatcher, orphan sweep, DanglingCache state
machine, key design decisions)
#### Phase 1a deliverables
- Types (`FileEntry` / `FileInfo` / `FileHandle` /
`CanonicalExternalPath` brand)
- DB schema (`file_entry` + `file_ref`) with per-origin CHECK
constraints
- DataApi schemas + stub handlers
- File IPC contract (polymorphic `FileHandle` dispatch;
`batchGetMetadata` included)
- FileManager skeleton + `internal/*` + `ops/*` + `watcher/` +
`DanglingCache` + `versionCache`
- Mutation propagation design (three typed events + prefix-based
queryKey invalidation)
#### Phase 1b.1 deliverables (read-path runtime)
- Shared utilities: `getFileTypeByExt`, `sanitizeFilename`,
`validateFileName` extracted to `@shared/file/types`
- Path utilities: `canonicalizeExternalPath` (NFC + null-byte guard +
trailing-sep strip), `isPathInside`, `isUnderInternalStorage`,
`canWrite`
- FS read primitives: `stat`, `exists`, `read` (text/base64/binary
overloads), `hash` (initial MD5; swapped to xxhash-h64 in 1b.2)
- Metadata utilities: `getFileType(path)`, `isTextFile`, `mimeToExt`
- Repositories: `FileEntryService` + `FileRefService` read methods
(Drizzle-backed, Zod-branded outputs)
- Pure-function modules: `internal/content/{read,hash}`,
`internal/dispatch.ts` (FileHandle dispatcher)
- `toFileInfo(entry)` projection
- `FileManager` class as `BaseService` (`@Injectable('FileManager')`
`@ServicePhase(WhenReady)`); read methods only
- `DanglingCache` + `VersionCache` minimal viable singletons (full impls
in 1b.3 / 1b.2)
- DataApi `/files/*` read handlers fully implemented (entries / single /
ref-counts / refs-by-source)
- 60+ TDD tests (unit + boundary + setupTestDatabase integration)
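To make the path utilities listed above more concrete, here is a simplified sketch of the three canonicalization steps (NFC normalization, null-byte guard, trailing-separator strip) and a containment check. The function names carry a `Sketch` suffix because the bodies are assumptions, not the PR's code: the sketch handles POSIX-style `/` separators only, does not resolve `..` segments, and treats a path as inside itself.

```typescript
// Simplified POSIX-only sketch. The function names echo the PR text, but the
// bodies are assumptions and not the repository's implementation.
function canonicalizeExternalPathSketch(input: string): string {
  if (input.length === 0 || input.includes("\0")) {
    throw new Error("invalid path: empty or contains a null byte");
  }
  // NFC makes visually identical names compare equal ("e" + U+0301 becomes U+00E9).
  const normalized = input.normalize("NFC");
  // Strip trailing separators, but keep a bare root ("/") intact.
  const stripped = normalized.replace(/\/+$/, "");
  return stripped === "" ? "/" : stripped;
}

// Assumption: a path counts as inside itself, and ".." segments are not resolved.
function isPathInsideSketch(child: string, parent: string): boolean {
  const c = canonicalizeExternalPathSketch(child);
  const p = canonicalizeExternalPathSketch(parent);
  if (p === "/") return c.startsWith("/");
  // Comparing against parent + "/" prevents "/data/files-old" matching "/data/files".
  return c === p || c.startsWith(p + "/");
}
```

Canonicalizing both sides before comparing is what lets a trailing slash or a decomposed accent on either path leave the result unchanged.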
#### Phase 1b.2 deliverables (write-path runtime)
- **FS atomic primitives** (open to non-file-module consumers per
architecture §5.3): `atomicWriteFile` (tmp + fsync + rename +
fsync(dir)), `atomicWriteIfUnchanged` (re-stat OCC + content-hash
fallback for second-precision mtime), `createAtomicWriteStream`
(Writable wrapper, abort/destroy unlinks tmp)
- **FS general primitives**: `write` (delegates to atomicWriteFile),
`copy` (atomic dest), `move` (rename + EXDEV → copy+unlink fallback),
`remove` (idempotent ENOENT), `mkdir` / `ensureDir` / `removeDir`,
`download` (fetch → atomic stream)
- **Hash swap**: `hash()` migrated MD5 → `xxhash-wasm` h64 streaming;
legacy `md5` dep retained for KnowledgeService loaders
- **VersionCache LRU**: capacity-bounded (default 2000) with
re-insert-on-touch recency
- **Repository mutations**: `FileEntryService.create/update/delete`
(auto UUIDv7 default id, raw DB CHECK errors propagate);
`FileRefService.create/createMany/cleanupBySource/cleanupBySourceBatch`
(`onConflictDoNothing` for batch upsert)
- **internal/entry/**: `create.createInternal` (4 source variants: bytes
/ base64 / path / url) + `create.ensureExternal` (canonicalize + stat +
idempotent upsert + duplicate-suspect peer warn);
`lifecycle.trash/restore/permanentDelete` + batch variants (DB+FS
decoupled, internal best-effort unlink); `rename` (internal DB-only,
external fs.move + canonical externalPath); `copy` (pipes through
createInternal with rollback)
- **internal/content/write.ts**: `write` / `writeIfUnchanged`
(cache-not-trusted re-stat OCC, `StaleVersionError` rewrap from
`PathStaleVersionError`) / `createWriteStream` / `*ByPath` variants
- **internal/system/**: `shell.open` / `shell.showInFolder` (electron
`shell` wrappers); `tempCopy.withTempCopy` (isolated tmp dir; cleanup on
throw)
- **FileManager facade**: every IFileManager mutation method now
delegates to its `internal/*` counterpart (`createInternalEntry` /
`ensureExternalEntry` / `batchCreate*` / `batchEnsure*` / `write` /
`writeIfUnchanged` / `createWriteStream` / `createReadStream` / `trash`
/ `restore` / `permanentDelete` + batch / `rename` / `copy` /
`withTempCopy` / `open` / `showInFolder`); no method throws
notImplemented anymore
- ~60 new TDD tests (each behavior unit = one RED→GREEN→REFACTOR
commit); end-to-end integration scenarios via `setupTestDatabase` cover
atomic-rollback zero-residue, OCC second-precision-mtime
no-false-positive, trash-external CHECK enforcement, full
create→write→read→trash→restore→permanentDelete round-trip, and external
permanentDelete-leaves-user-file-untouched
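The "capacity-bounded with re-insert-on-touch recency" behaviour listed for the VersionCache can be illustrated with a small generic LRU. This is a sketch under assumptions: `LruCacheSketch` is a hypothetical class, and what the real cache stores per key is not stated in the PR text. It relies on the fact that a JavaScript `Map` iterates in insertion order, so deleting and re-inserting a key moves it to the most-recent end.

```typescript
// Hypothetical generic LRU; the real VersionCache key/value types are not shown in the PR.
class LruCacheSketch<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number = 2000) {
    if (capacity < 1) {
      throw new RangeError("capacity must be at least 1");
    }
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert on touch: Map keeps insertion order, so this marks the key as most recent.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

With a capacity of 2, reading `a` before inserting a third key means `b` is evicted rather than `a`.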
#### Phase 1b.3 deliverables (watcher + DanglingCache observability)
- **DanglingCache class** (replaces 1a const-literal skeleton):
`byEntryId: Map<entryId, CachedState>` + `pathToEntryIds:
Map<canonicalPath, Set<entryId>>` reverse index, lazy TTL expiration
(default 30 min per architecture §11.2), `forceRecheck` escape hatch,
`Emitter<DanglingStateChangedEvent>` firing only on genuine state
transitions (same-state observations are silent). Injectable `now` /
`statProbe` / `ttlMs` / `fileEntryService` seams for deterministic
tests.
- **createDirectoryWatcher** chokidar v4 wrapper: `add` / `unlink` /
`change` / `ready` / `error` events; built-in OS-junk basename ignores
(`.DS_Store` / `.localized` / `Thumbs.db` / `desktop.ini`); idempotent
`close()`. Factory auto-wires `add` →
`danglingCache.onFsEvent(path,'present')` and `unlink` → `'missing'`.
(Architecture §8.2's richer `onAddDir`/`onUnlinkDir`/`onRename` events
deferred — no consumer needs them in scope.)
- **Reverse-index maintenance from mutation flows**: `ensureExternal`
calls `addEntry` + `onFsEvent('present','ops')` on insert (no-op on
reuse); `permanentDelete(external)` calls `removeEntry`;
`rename(external)` swaps `removeEntry(oldPath) + addEntry(newPath) +
onFsEvent(newPath,'present','ops')`.
- **FileManager surface**: `getDanglingState({id})` (internal →
'present', external → cache check, unknown id → 'unknown');
`batchGetDanglingStates({ids})` (parallel fan-out, unknown ids mapped to
'unknown'); `subscribeDangling({id}, listener)` (in-process per-entry
filter; renderer fan-out via `file-manager-event` IPC channel deferred
to Phase 2).
- **FileManager.onInit**: awaits `danglingCache.initFromDb()` (populates
reverse index from non-trashed external entries; no startup stat probe
per architecture §10.6); registers `File_GetDanglingState` /
`File_BatchGetDanglingStates` IPC handlers via `this.ipcHandle`
(auto-disposed on stop).
- New `IpcChannel` constants: `File_GetDanglingState`,
`File_BatchGetDanglingStates`.
- ~30 new TDD tests across DanglingCache (18 unit) + watcher (6 real-FS)
+ FileManager integration (INT-7..INT-10).
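The DanglingCache behaviour described above (reverse index from path to entry ids, lazy TTL expiry, events only on genuine transitions) can be sketched as follows. `DanglingCacheSketch`, `subscribe`, and `getState` are hypothetical names, and returning `'unknown'` for an expired entry is an assumption; the PR text mentions a `statProbe` seam, which suggests the real class may re-probe instead.

```typescript
type DanglingState = "present" | "missing" | "unknown";
type ObservedState = "present" | "missing";

interface CachedState {
  state: ObservedState;
  observedAt: number;
}

interface DanglingStateChange {
  entryId: string;
  from: DanglingState;
  to: ObservedState;
}

// Hypothetical sketch; only the two map shapes and the TTL default come from the PR text.
class DanglingCacheSketch {
  private readonly byEntryId = new Map<string, CachedState>();
  private readonly pathToEntryIds = new Map<string, Set<string>>();
  private readonly listeners: Array<(e: DanglingStateChange) => void> = [];
  private readonly now: () => number;
  private readonly ttlMs: number;

  constructor(now: () => number = () => Date.now(), ttlMs: number = 30 * 60 * 1000) {
    this.now = now;
    this.ttlMs = ttlMs;
  }

  addEntry(entryId: string, canonicalPath: string): void {
    const ids = this.pathToEntryIds.get(canonicalPath) ?? new Set<string>();
    ids.add(entryId);
    this.pathToEntryIds.set(canonicalPath, ids);
  }

  subscribe(listener: (e: DanglingStateChange) => void): void {
    this.listeners.push(listener);
  }

  getState(entryId: string): DanglingState {
    const cached = this.byEntryId.get(entryId);
    if (!cached) return "unknown";
    // Lazy expiry: stale observations are dropped when read, not by a timer.
    if (this.now() - cached.observedAt > this.ttlMs) {
      this.byEntryId.delete(entryId);
      return "unknown";
    }
    return cached.state;
  }

  onFsEvent(canonicalPath: string, state: ObservedState): void {
    const ids = this.pathToEntryIds.get(canonicalPath);
    if (!ids) return;
    ids.forEach((entryId) => {
      const from = this.getState(entryId);
      this.byEntryId.set(entryId, { state, observedAt: this.now() });
      // Same-state observations refresh the timestamp but stay silent.
      if (from !== state) {
        this.listeners.forEach((listener) => listener({ entryId, from, to: state }));
      }
    });
  }
}
```

Injecting `now` keeps the TTL path deterministic in tests, which is the purpose the PR text gives for its seams.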
#### Phase 1b.4 deliverables (orphan sweep + FileRefCheckerRegistry)
- **FileRefCheckerRegistry**: `Record<FileRefSourceType,
SourceTypeChecker<...>>` typed registry forces exhaustive coverage at
compile time — adding a new variant to `FileRefSourceType` without a
checker triggers a TS build error. Phase 1 ships `FileRefSourceType =
'temp_session' | 'knowledge_item'`: real DB-backed checker for
`knowledge_item` (Drizzle `inArray` against `knowledge_item`);
`temp_session` checker treats every sourceId as gone (sessions are
in-memory only). `chat_message` / `painting` / `note` are **deliberately
not in the union yet** — each will be added in lockstep (tuple entry in
`allSourceTypes` + `createRefSchema` variant + `SourceTypeChecker`) by
the PR that migrates the owning domain's DB tables to v2. Stray writes
during the migration window fail fast at `FileRefSchema.parse` rather
than being silently persisted under a no-op stub.
- **OrphanRefScanner** (RFC §6.4): `scanOneType(sourceType)` enumerates
distinct `file_ref.sourceId` per type, asks the checker which are alive,
deletes the rest via `cleanupBySourceBatch`. `scanAll()` aggregates
across every registered sourceType. Backed by new
`FileRefService.listDistinctSourceIds` to keep all SQL inside the repo.
- **Report-only orphan-entry pass** (architecture §7.1 default policy is
"preserve"): `scanOrphanEntries` groups active entries with zero
`file_ref` rows by origin. **No deletion** — surfaced via
`getOrphanReport()` for the cleanup-UI consumer. Backed by new
`FileEntryService.findUnreferenced` LEFT JOIN-based query.
- **Startup file sweep** (architecture §10): `runStartupFileSweep`
snapshots `file_entry.id` (active + trashed) into a `Set` via new
`FileEntryService.listAllIds`, walks `{userData}/files/`, plans unlink
for (a) UUID-named files whose id is not in the snapshot and (b)
`*.tmp-<UUID>` atomic-write residue. Applies the `mtime > 5min`
freshness gate (§10.3) — files newer than that are presumed in-flight
and preserved. Plan-then-execute with the `50% / 20-count-floor /
10MB-floor` safety threshold (§10.4); aborts emit
`abortReason='count-fraction'|'byte-fraction'`. Single structured
`orphan-file-sweep` log per run (info / warn / error per outcome,
§10.5).
- **DB-sweep umbrella + observability** (`runDbSweep`): runs scanAll +
scanOrphanEntries, emits one `orphan-sweep` structured record
summarising both passes; failure path returns `outcome='failed'` +
`errorMessage` so callers don't throw on background fire-and-forget.
- **FileManager integration**: `onInit` schedules a fire-and-forget
`runStartupSweeps` that runs the FS-level + DB-level sweeps in parallel;
failures of either are logged but never block ready. `getOrphanReport()`
exposes the most recent `DbSweepReport` (orphan-ref counts already
cleaned + orphan-entry counts preserved) + `lastRunAt` for the cleanup
UI surface.
- ~30 new TDD tests across registry (14 unit) + orphan sweep (16 unit +
integration) + FileManager integration (INT-11/INT-12) + repo
(`findUnreferenced`, `listAllIds`, `listDistinctSourceIds`).
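The exhaustive registry and the per-type orphan scan described above can be sketched like this. It is a simplified model under assumptions: the checker signature, `FileRefRow`, and `createRegistry` are hypothetical, the checkers are synchronous where a DB-backed one would presumably be async, and an in-memory array stands in for the `file_ref` table.

```typescript
// Mirrors the two-variant union named in the PR text.
type FileRefSourceType = "temp_session" | "knowledge_item";

// Hypothetical shape: returns the subset of sourceIds that still exist.
type SourceTypeChecker = (sourceIds: string[]) => Set<string>;

// Record<FileRefSourceType, ...> makes a missing key a compile-time error,
// so adding a union member without a checker fails the build.
type FileRefCheckerRegistry = Record<FileRefSourceType, SourceTypeChecker>;

interface FileRefRow {
  sourceType: FileRefSourceType;
  sourceId: string;
  fileEntryId: string;
}

function createRegistry(liveKnowledgeItemIds: Set<string>): FileRefCheckerRegistry {
  return {
    // Sessions are in-memory only, so every persisted temp_session id is treated as gone.
    temp_session: () => new Set<string>(),
    knowledge_item: (ids) => new Set(ids.filter((id) => liveKnowledgeItemIds.has(id))),
  };
}

// Returns the rows that survive the scan for one source type.
function scanOneType(
  rows: FileRefRow[],
  registry: FileRefCheckerRegistry,
  sourceType: FileRefSourceType
): FileRefRow[] {
  const distinctIds = Array.from(
    new Set(rows.filter((r) => r.sourceType === sourceType).map((r) => r.sourceId))
  );
  const alive = registry[sourceType](distinctIds);
  return rows.filter((r) => r.sourceType !== sourceType || alive.has(r.sourceId));
}

// Runs the per-type scan across every registered source type.
function scanAll(rows: FileRefRow[], registry: FileRefCheckerRegistry): FileRefRow[] {
  const types = Object.keys(registry) as FileRefSourceType[];
  return types.reduce<FileRefRow[]>((remaining, t) => scanOneType(remaining, registry, t), rows);
}
```

If a third member were added to `FileRefSourceType`, the object literal in `createRegistry` would stop compiling until a matching checker was supplied, which is the compile-time guarantee the PR describes.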
**Out of scope (deferred to Phase 2)**:
- Architecture §7.2 dangling-external auto-cleanup (external + missing +
0-ref + >30d retention) — narrow extension shipping with the cleanup UI.
- Adding `chat_message` / `painting` / `note` as `FileRefSourceType`
variants (tuple entry + schema + checker added together) — gated on each
domain's v2 batch migration.
- Cleanup-UI surface that consumes `getOrphanReport()` — Phase 2
renderer work.
Renderer-side File IPC bridge for write/dangling methods stays deferred
to Phase 2 alongside the consumer-batch migrations. The Phase 1b runtime
is consumable from main-side business services through
`application.get('FileManager')`.
### Why we need it and why it was done in this way
Contract-first concentrates design review in one place; Phase 1b.x then
becomes pure "honor the contracts". Each 1b.x phase keeps strict TDD
(RED → GREEN → REFACTOR per behavior, ~one commit per cycle); each phase
ends with a verification gate (push → CI green) before the next phase
begins.
Core decisions (origin two-state, `FileEntry`/`FileInfo` split, DataApi
SQL-only, external `permanentDelete` DB-only, TTL `DanglingCache`, OCC
trust boundary, atomic write fsync default, etc.) and their rationale
are recorded in `file-manager-architecture.md §12 Key Design Decisions`
— not duplicated here.
### Breaking changes
None — purely additive (read+write paths are new, no existing callers
replaced yet).
### Special notes for your reviewer
- Review focus: contracts (1a) + read-path runtime (1b.1) + write-path
runtime (1b.2) + watcher/DanglingCache (1b.3) + orphan sweep / registry
(1b.4). Phase 1b is now complete on this branch.
- **Phase 1a contract stability policy** (architecture.md top) is
binding — any 1b.x PR that finds a contract mismatch PRs the doc
revision first.
- Deferred (Phase 2): renderer-side File IPC bridge for
write/dangling/orphan methods (alongside consumer migration); cleanup-UI
surface consuming `getOrphanReport()`; architecture §7.2
dangling-external auto-cleanup (>30d retention); adding `chat_message` /
`painting` / `note` as `FileRefSourceType` variants (each adds tuple
entry + schema + checker in lockstep, gated on the owning domain's v2
batch migration); DanglingCache periodic snapshot logger (architecture
§11.8); `listDirectory` ripgrep wrapper; `compressImage`
(KnowledgeService consumer); FileUploadService + `file_upload` table
(Vercel AI SDK Files API).
### Checklist
- [x] PR: The PR description is expressive enough and will help future
contributors
- [x] Code: Write code that humans can understand and Keep it simple
- [x] Refactor: You have left the code cleaner than you found it (Boy
Scout Rule)
- [ ] Upgrade: N/A — purely additive
- [ ] Documentation: Internal architecture docs included
(`docs/references/file/`); no user-facing docs change
- [x] Self-review: I have reviewed my own code before requesting review
from others
### Release note
```release-note
NONE
```
---------
Signed-off-by: icarus <eurfelux@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: fullex <106392080+0xfullex@users.noreply.github.com>
|