- add §5.1: file_ref belongs to FILE_STORAGE (a file_entry member, not
source-owned): the NOT NULL cascade FK fileEntryId identifies the owning
member; the job_schedule row-scope analogy does not hold (no cross-domain
NOT NULL FK)
- clarify finalize #6: source table owned by declarer, target may be cross-domain
- drop 'desensitization' from beforeArchive step (self-use keeps plaintext)
- mark backup:refs:generate/check as planned (not yet in package.json)
Signed-off-by: George·Dong <GeorgeDong32@qq.com>
- DbColumnName is the Drizzle property name (camelCase); the physical SQLite
column is snake_case via casing:'snake_case'. Backup uses the Drizzle builder
only
- add entity_tag to §3.5 TAGS_GROUPS overview (junction, tagId->tag cascade)
- split translate_history into two independent aggregates (translate_language
natural-key + translate_history uuid-entity); it has no languageCode column,
only source/targetLanguage optional FKs
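
A minimal sketch of the property-name versus column-name distinction described above. `toSnakeCase` is a hypothetical helper written for illustration; it only approximates the mapping and is not Drizzle's own implementation of `casing:'snake_case'`.

```typescript
// Simplified camelCase -> snake_case mapping (illustrative, not Drizzle's code).
function toSnakeCase(propertyName: string): string {
  return propertyName
    // Insert "_" between a lowercase letter or digit and an uppercase letter.
    .replace(/([a-z\d])([A-Z])/g, "$1_$2")
    // Split a run of capitals from a following capitalized word.
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1_$2")
    .toLowerCase();
}

// The Drizzle property name is what backup code refers to;
// the physical column name is what appears in the SQLite file.
const physicalColumn = toSnakeCase("fileEntryId"); // "file_entry_id"
```
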
Signed-off-by: George·Dong <GeorgeDong32@qq.com>
openspec/ is a local symlink not tracked by git; referencing it in
the git-tracked architecture doc is a dead link for other clones.
The restore-safety trio is now self-contained in §9.
Signed-off-by: George·Dong <GeorgeDong32@qq.com>
### What this PR does
Before this PR:
- Knowledge embeddings and reranking ran through the legacy embedjs-based knowledgeV1 stack with their own provider clients, independent of the app's AI service.
- File-processing intake accepted several heterogeneous input shapes, and knowledge file items were tracked by FileEntry ids, coupling file content to the file-manager entry/cache.
After this PR:
- Embeddings and reranking are routed through the unified `AiService` (with cherryin rerank support) and guarded by strict embedding-dimension validation that rejects stale/mismatched vectors.
- File-processing intake is collapsed to a single path-based model; knowledge file items are stored by base-relative path under the knowledge-base directory, and v1 uploads are copied into the v2 base dir during migration so migrated items stay reindexable/restorable.
- Legacy `knowledgeV1` is removed; the orchestration services were renamed to `KnowledgeService` / `FileProcessingService`.
- Chat -> knowledge attach is temporarily disconnected (tracked TODO) while the v2 file-manager bridge is rebuilt.
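
A minimal sketch of what a strict embedding-dimension check could look like. `EmbeddingDimensionError` and `assertEmbeddingDimensions` are hypothetical names chosen for illustration; the actual validation in this PR may be structured differently.

```typescript
// Hypothetical error type; not the project's actual class.
class EmbeddingDimensionError extends Error {
  readonly expected: number;
  readonly actual: number;

  constructor(expected: number, actual: number, index: number) {
    super(`Embedding ${index} has dimension ${actual}, expected ${expected}`);
    this.name = "EmbeddingDimensionError";
    this.expected = expected;
    this.actual = actual;
  }
}

/** Rejects the whole batch if any vector does not match the expected dimension. */
function assertEmbeddingDimensions(vectors: number[][], expected: number): void {
  if (!Number.isInteger(expected) || expected <= 0) {
    throw new RangeError(`Expected dimension must be a positive integer, got ${expected}`);
  }
  vectors.forEach((vector, index) => {
    if (vector.length !== expected) {
      throw new EmbeddingDimensionError(expected, vector.length, index);
    }
  });
}
```

Failing the whole batch, rather than skipping the bad vector, keeps a stale or mismatched embedding from being written next to valid ones.
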
Fixes #N/A (no linked issue)
### Why we need it and why it was done in this way
Routing embeddings/rerank through `AiService` unifies provider handling and credentials and removes the parallel embedjs client stack and its v1 coupling.
Storing knowledge files by base-relative path (instead of FileEntry ids) makes each knowledge base self-contained and portable.
The following tradeoffs were made:
- A large, coordinated refactor plus a migration step that physically copies v1 uploads into the v2 base dir, in exchange for removing the parallel client stack and making bases self-contained.
- Base-relative path storage required a fail-fast/dedup strategy for same-named files and a guard for blank legacy filenames.
The following alternatives were considered:
- Keeping the embedjs stack behind an adapter: rejected because it perpetuates the parallel client and v1 coupling.
- Keeping FileEntry-id storage: rejected because it couples knowledge files to the file-manager cache and blocks portability.
### Breaking changes
- `knowledgeV1` is removed. Legacy v1 knowledge data reaches v2 only through the v2 migrators; there is no v1 fallback.
- The v2 knowledge HTTP API (API gateway) now returns v2-native per-entry fields (`embeddingModelId`, `createdAt` on base entries; `chunkId`, `scoreKind`, `rank` on search results). The response envelope (`knowledge_bases`, `searched_bases`, `total`) is unchanged. See `v2-refactor-temp/docs/breaking-changes/2026-06-05-knowledge-api-v2.md`.
### Special notes for your reviewer
- This branch went through several rounds of multi-agent code review. The most recent 6 commits address review findings: directory-import path collisions, migrated-file source copying + blank `relativePath` guard, addItems rollback error preservation, eager `document_to_markdown` output-target validation, a `CompletedKnowledgeBase` type guard, and breaking-changes doc corrections.
- Chat -> knowledge attach is intentionally disconnected for now (tracked in `v2-refactor-temp/docs/knowledge/knowledge-todo.md`).
- Local full `pnpm lint`/`pnpm test` was not run per the project's review conventions; please rely on CI / `pnpm build:check`.
### Checklist
- [x] Branch: This PR targets the correct branch: `main` for active development, `v1` for v1 maintenance fixes
- [x] PR: The PR description is expressive enough and will help future contributors
- [x] Code: Write code that humans can understand and Keep it simple
- [x] Refactor: You have left the code cleaner than you found it (Boy Scout Rule)
- [x] Upgrade: Impact of this change on upgrade flows was considered and addressed if required
- [ ] Documentation: A user-guide update was considered and is present (link) or not required.
- [x] Self-review: I have reviewed my own code before requesting review from others
### Release note
```release-note
NONE
```
---------
Signed-off-by: eeee0717 <chentao020717Work@outlook.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
The features/ vs type-bucket split applies to both process roots (src/main/features/, src/renderer/features/); update the definition and the sibling-bucket note to cover the renderer (components/, hooks/, utils/).
Rewrite the src/ tree to the current v2 layout: add main/{ai,features,utils}, drop the to-be-removed knowledge/integration/apiServer and the v1 renderer store/aiCore, remove the redundant renderer src/ level, and move shared/ under src/.
Replace the flat subsystem table with a core/ subsystem table (Lifecycle, Window Manager, Scheduler & Jobs, Paths), link each system in Four Data Systems to its overview, and drop the non-core MCP/CherryClaw/Knowledge/Message-System/API-Server rows. Correct window ownership to WindowManager, and point to Naming Conventions 4.10 for the feature-vs-bucket placement rule.
Establish when a main-process domain belongs in src/main/features/ versus the services/ and utils/ type-buckets: a domain earns its own features/<domain>/ tree only when it is large, complex, and multi-file, while a single cohesive service (even domain-specific) stays as services/<Domain>Service.ts with its lone helper in utils/.
Decouple naming from placement in §4.5, and surface the rule in the §1 quick reference and the §7 decision tree.
Move builtinSkills.test.ts from src/main/__tests__ into src/main/utils/__tests__ next to src/main/utils/builtinSkills.ts, and update the under-test import and the barrel mock to the new relative paths.
src/main/__tests__/mcp.test.ts exercised src/shared/mcp.ts, sitting one layer away from the module it covers and duplicating the home of the existing src/shared/__tests__/mcp.test.ts. Merge its toCamelCase, buildMcpToolName, generateMcpToolFunctionName and buildFunctionCallToolName suites into the shared file and drop the misplaced copy.
DataApi handlers and their services may perform only SQLite reads/writes via Drizzle; fs/network/process/external-service side effects are prohibited regardless of nesting depth or an accompanying DB write. Add a Hard Rule section to api-design-guidelines, scope-limit service domain workflows to DB I/O, and echo the boundary in the overview and README.
Previously the CSLOGGER_* console overrides and the verbose (silly) file level were gated behind isDev, so a packaged build could not raise its log level for troubleshooting. Replace the isDev gates in both the main and renderer LoggerService with isDev || DIAGNOSTICS_ENABLED, so setting CS_DIAGNOSTICS makes the logger behave exactly as in dev: verbose file level, console output, and the CSLOGGER_* filters all turn on together. Idempotent in dev.
Reuses the existing DIAGNOSTICS_ENABLED flag as the single source of truth; the renderer reads CS_DIAGNOSTICS via the preload-exposed process.env. Document the behavior in the diagnostics and logging guides.
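
A minimal sketch of the `isDev || DIAGNOSTICS_ENABLED` gate, assuming a pure function that both LoggerService copies could share. `resolveLoggerGate`, its result shape, the `"info"` default level, and the way `CS_DIAGNOSTICS` is parsed are all assumptions made for illustration; only the gate expression itself comes from the commit.

```typescript
interface LoggerGateInput {
  isDev: boolean;
  env: Record<string, string | undefined>;
}

interface LoggerGateResult {
  fileLevel: "silly" | "info";
  consoleEnabled: boolean;
  envFiltersEnabled: boolean; // whether CSLOGGER_* overrides are honored
}

function resolveLoggerGate({ isDev, env }: LoggerGateInput): LoggerGateResult {
  // Assumption: any non-empty value other than "0" or "false" enables diagnostics.
  const raw = env.CS_DIAGNOSTICS;
  const diagnosticsEnabled = raw !== undefined && raw !== "" && raw !== "0" && raw !== "false";

  // One flag drives all three behaviors, so a packaged build with
  // CS_DIAGNOSTICS set behaves like a dev build.
  const verbose = isDev || diagnosticsEnabled;
  return {
    fileLevel: verbose ? "silly" : "info",
    consoleEnabled: verbose,
    envFiltersEnabled: verbose,
  };
}
```

Because the gate is an OR, setting the variable in a dev build changes nothing, which matches the idempotence noted above.
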
Promote the temporary boot profiler into a permanent, opt-in diagnostics facility gated by CS_DIAGNOSTICS (off by default, zero overhead when unset). Move it to src/main/core/diagnostics.ts so the db and lifecycle layers no longer cross-import a lifecycle-internal file.
Probes, all gated by the same flag: per-service init timing, phase service spans, event-loop lag, and a whenReady V8 CPU profile (carried over); slow DB queries, now covering interactive-transaction interiors and batches (not just client.execute); slow IPC handlers (BaseService.ipcHandle); window construction + ready-to-show latency (WindowManager); and slow DataApi requests (ApiServer).
DataApi request duration is consolidated to a single monotonic performance.now() measurement in handleRequest, computed only when enabled; the redundant Date.now() duration in MiddlewareEngine is removed and metadata.duration is now optional.
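
A minimal sketch of measuring request duration with a single monotonic `performance.now()` pair, computed only when diagnostics are enabled. `handleWithTiming` and its return shape are hypothetical; the real `handleRequest` signature is not shown in this commit.

```typescript
interface TimedResult<T> {
  result: T;
  duration?: number; // milliseconds; absent when diagnostics are disabled
}

async function handleWithTiming<T>(
  enabled: boolean,
  handler: () => Promise<T>,
): Promise<TimedResult<T>> {
  if (!enabled) {
    // No clock reads at all on the disabled path.
    return { result: await handler() };
  }
  const start = performance.now();
  const result = await handler();
  return { result, duration: performance.now() - start };
}
```

Unlike `Date.now()`, `performance.now()` is monotonic, so the difference cannot go negative if the system clock is adjusted during the request.
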
Packaged-build safe: the CPU profile is written to the app logs directory (not process.cwd()) and a failed write can never break boot. Thresholds live in SLOW_THRESHOLD_MS; usage in docs/guides/diagnostics.md.
Also demote per-service stop/destroy logs to debug to quiet shutdown output.
SelectionService loaded the native selection-hook addon and built the toolbar + action-pool windows synchronously in onReady, during the WhenReady boot phase, stalling the concurrent main-window creation on the single main thread.
Move activation to onAllReady + setImmediate so it warms up after the main window paints. The first text selection still feels instant — no text can be selected this early in cold start. A pre-fire stop stays safe: _doActivate() no-ops unless the service is Ready.
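
A minimal sketch of the deferred-activation pattern, assuming a simplified two-state lifecycle. `DeferredActivationSketch` and its members are hypothetical stand-ins; only `onAllReady`, `setImmediate`, and the "no-op unless Ready" guard come from the commit.

```typescript
type ServiceState = "Ready" | "Stopped";

class DeferredActivationSketch {
  state: ServiceState = "Ready";
  activated = false;

  /** Called once all services are ready; the heavy work is pushed to a later tick. */
  onAllReady(): void {
    setImmediate(() => this.doActivate());
  }

  stop(): void {
    this.state = "Stopped";
  }

  private doActivate(): void {
    // Guard: a stop() that lands before the deferred callback fires
    // turns activation into a no-op.
    if (this.state !== "Ready") return;
    this.activated = true; // stand-in for loading the addon and building windows
  }
}
```
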
The boot reconcile of crash-orphaned pending assistant turns (findPendingAssistantMessageIds) full-scanned the message and agent_session_message tables. Add a plain status index on each so the lookup is a SEARCH, not a SCAN.
Plain, not partial: Drizzle binds status = ?, which SQLite cannot match against a partial (literal-predicate) index. Also select only id, since the reconcile just flips matched rows to error.
Replace the v1 renderer-driven disk polling with a main-process service:
- Remove checkAppStorageQuota (browser storage-quota >=95% warning); v2
business data lives in main-process SQLite, so the browser-side quota no
longer reflects where data is stored.
- Add StorageMonitorService (WhenReady, @DependsOn WindowManager): fs.statfs
with capacity-adaptive polling (5-60 min by free space), pushing health
transitions (ok<->low) to the main window only.
- Rename utils/dataLimit.ts to hooks/useStorageMonitorNotification.ts, a thin
subscriber that drives the existing antd notification (UI migration deferred
to the broader v2 refactor).
- Drop the App_GetDiskInfo IPC handler, the getDiskInfo preload API, and the
check-disk-space dependency.
- Add a breaking-change entry for the removed storage-quota warning.
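
A minimal sketch of capacity-adaptive polling and ok<->low transition detection. The 5 and 60 minute bounds come from the commit; the byte thresholds, the linear interpolation, and all function names are assumptions made for illustration, not the actual StorageMonitorService logic.

```typescript
type StorageHealth = "ok" | "low";

const MIN_INTERVAL_MS = 5 * 60_000;
const MAX_INTERVAL_MS = 60 * 60_000;
// Assumed thresholds; the commit does not state the real values.
const LOW_FREE_BYTES = 1 * 1024 ** 3;
const AMPLE_FREE_BYTES = 50 * 1024 ** 3;

/** Polls more often as free space shrinks, clamped to the 5-60 minute range. */
function nextPollIntervalMs(freeBytes: number): number {
  if (freeBytes <= LOW_FREE_BYTES) return MIN_INTERVAL_MS;
  if (freeBytes >= AMPLE_FREE_BYTES) return MAX_INTERVAL_MS;
  const t = (freeBytes - LOW_FREE_BYTES) / (AMPLE_FREE_BYTES - LOW_FREE_BYTES);
  return Math.round(MIN_INTERVAL_MS + t * (MAX_INTERVAL_MS - MIN_INTERVAL_MS));
}

function classify(freeBytes: number): StorageHealth {
  return freeBytes <= LOW_FREE_BYTES ? "low" : "ok";
}

/** Returns the new health only when it differs from the previous one, else null. */
function detectTransition(previous: StorageHealth, freeBytes: number): StorageHealth | null {
  const current = classify(freeBytes);
  return current === previous ? null : current;
}
```

Returning `null` when the state is unchanged means the window is notified only on transitions, not on every poll.
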