CherryHQ-cherry-studio

mirror of https://github.com/CherryHQ/cherry-studio.git synced 2026-07-03 12:27:41 +08:00

Author	SHA1	Message	Date
fullex	9b9570116a	refactor(db): replace libsql with better-sqlite3 + sqlite-vec (#16626)	2026-07-02 13:21:13 +08:00
fullex	714c1a3dd3	docs(data): add database-construction reference and consolidate migration/custom-SQL/FTS docs

Add docs/references/data/database-construction.md as the single home for how the SQLite DB is built and evolved: boot init order, drizzle migrations (regenerate-never-rename, CI gates, additive-vs-rebuild), the CUSTOM_SQL_STATEMENTS every-boot replay (~0.1ms O(1)), and the FTS5 fts_rowid rowid-stability rule, plus a gotchas table.

Move the Migrations and Custom SQL sections out of database-patterns.md into it (left as pointers), and index it from data/README.md and src/main/data/db/README.md.

Fix stale references found while consolidating: wrong generate command, customSql.ts vs customSqls.ts, columnHelpers.ts vs _columnHelpers.ts, a nonexistent messageFts.ts, yarn vs pnpm, the v2-todo single-0000 claim, the generated-column wording, and v1 data.blocks vocabulary in the testing doc.	2026-06-17 05:52:01 -07:00
槑囿脑袋	1382a8dd7c	feat(knowledge): route embeddings and reranking through the AI service (#15796)

### What this PR does

Before this PR:
- Knowledge embeddings and reranking ran through the legacy embedjs-based knowledgeV1 stack with their own provider clients, independent of the app's AI service.
- File-processing intake accepted several heterogeneous input shapes, and knowledge file items were tracked by FileEntry ids, coupling file content to the file-manager entry/cache.

After this PR:
- Embeddings and reranking are routed through the unified `AiService` (with cherryin rerank support) and guarded by strict embedding-dimension validation that rejects stale/mismatched vectors.
- File-processing intake is collapsed to a single path-based model; knowledge file items are stored by base-relative path under the knowledge-base directory, and v1 uploads are copied into the v2 base dir during migration so migrated items stay reindexable/restorable.
- Legacy `knowledgeV1` is removed; the orchestration services were renamed to `KnowledgeService` / `FileProcessingService`.
- Chat -> knowledge attach is temporarily disconnected (tracked TODO) while the v2 file-manager bridge is rebuilt.

Fixes #N/A (no linked issue)

### Why we need it and why it was done in this way

Routing embeddings/rerank through `AiService` unifies provider handling and credentials and removes the parallel embedjs client stack and its v1 coupling. Storing knowledge files by base-relative path (instead of FileEntry ids) makes each knowledge base self-contained and portable.

The following tradeoffs were made:
- A large, coordinated refactor plus a migration step that physically copies v1 uploads into the v2 base dir, in exchange for removing the parallel client stack and making bases self-contained.
- Base-relative path storage required a fail-fast/dedup strategy for same-named files and a guard for blank legacy filenames.

The following alternatives were considered:
- Keeping the embedjs stack behind an adapter: rejected; perpetuates the parallel client and v1 coupling.
- Keeping FileEntry-id storage: rejected; couples knowledge files to the file-manager cache and blocks portability.

### Breaking changes

- `knowledgeV1` is removed. Legacy v1 knowledge data reaches v2 only through the v2 migrators; there is no v1 fallback.
- The v2 knowledge HTTP API (API gateway) now returns v2-native per-entry fields (`embeddingModelId`, `createdAt` on base entries; `chunkId`, `scoreKind`, `rank` on search results). The response envelope (`knowledge_bases`, `searched_bases`, `total`) is unchanged. See `v2-refactor-temp/docs/breaking-changes/2026-06-05-knowledge-api-v2.md`.

### Special notes for your reviewer

- This branch went through several rounds of multi-agent code review. The most recent 6 commits address review findings: directory-import path collisions, migrated-file source copying + blank `relativePath` guard, addItems rollback error preservation, eager `document_to_markdown` output-target validation, a `CompletedKnowledgeBase` type guard, and breaking-changes doc corrections.
- Chat -> knowledge attach is intentionally disconnected for now (tracked in `v2-refactor-temp/docs/knowledge/knowledge-todo.md`).
- Local full `pnpm lint`/`pnpm test` was not run per the project's review conventions; please rely on CI / `pnpm build:check`.

### Checklist

- [x] Branch: This PR targets the correct branch: `main` for active development, `v1` for v1 maintenance fixes
- [x] PR: The PR description is expressive enough and will help future contributors
- [x] Code: Write code that humans can understand and Keep it simple
- [x] Refactor: You have left the code cleaner than you found it (Boy Scout Rule)
- [x] Upgrade: Impact of this change on upgrade flows was considered and addressed if required
- [ ] Documentation: A user-guide update was considered and is present (link) or not required.
- [x] Self-review: I have reviewed my own code before requesting review from others

### Release note

```release-note
NONE
```

---------

Signed-off-by: eeee0717 <chentao020717Work@outlook.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 14:04:29 +08:00
fullex	ad922067d4	refactor(mcp): rename MCP* identifiers to Mcp* per naming conventions

Apply the naming-conventions §6.1 acronym-casing rule (MCP -> Mcp) to the MCP* PascalCase identifier family across the codebase (McpServer, McpTool, McpToolResponse, BuiltinMcpServerNames, McpService, etc.) plus the local identifiers boundMcp/enableMcp/disableMcp and the didiMcp registry key. Regenerate the OpenAPI spec from the renamed schemas.

Deliberately left unchanged (not naming-convention identifiers): persisted field keys read by migrators (enabledMCPs), v1 Redux selectors (selectMCP), string values (ExaMCP, logger labels), and UPPER_SNAKE constants (MCP_*).

Also fix naming issues in the data reference docs that prompted this:
- JSONStreamReader -> JsonStreamReader (match the real class name)
- rowToMCPServer -> rowToMcpServer (match the real function name)
- replace the TopicService getInstance() skeleton with a direct singleton
- sync stale MCPServer/MCPTool/McpService references in affected docs	2026-06-03 04:39:14 -07:00
槑囿脑袋	9a54295c02	refactor(knowledge): knowledge workflow jobs round2 (#15371) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: fullex <106392080+0xfullex@users.noreply.github.com> Signed-off-by: eeee0717 <chentao020717Work@outlook.com>	2026-05-28 20:33:18 +08:00
fullex	6aeb14d6d1	docs(testing): add database-testing guide and update CLAUDE.md

New guide at docs/references/testing/database-testing.md describing:
- When to use setupTestDatabase() (and when not to).
- Options, lifecycle behaviour, and PRAGMA handling.
- Migration recipe (before/after diffs) for converting legacy tests.
- Anti-patterns: vi.mock('@application') override, hand-written CREATE TABLE SQL, describe.concurrent within harness scope, nested setupTestDatabase() calls, re-adding the vi.mock('node:fs', importOriginal) escape hatch.
- Gotchas: LibSQL transaction connection recycle + setPragma replay, pathToFileURL for Windows, FTS5 with NULL content, truncate-vs-drop.

Linked from CLAUDE.md's Testing Guidelines section so future contributors find the convention.	2026-04-15 09:34:28 -07:00

6 Commits