github/CherryHQ-cherry-studio

Fork 0

mirror of https://github.com/CherryHQ/cherry-studio.git synced 2026-07-03 20:59:22 +08:00

Files

fullex 9b9570116a refactor(db): replace libsql with better-sqlite3 + sqlite-vec (#16626 )

2026-07-02 13:21:13 +08:00

18 KiB

Raw Permalink Blame History

Database Schema Guidelines

Schema File Organization

Principles

Scenario	Approach
Strongly related tables in same domain	Merge into one file
Core tables / Complex business logic	One file per table
Tables that may cross multiple domains	One file per table

Decision Criteria

Merge when:

Tables have strong foreign key relationships (e.g., many-to-many)
Tables belong to the same business domain
Tables are unlikely to evolve independently

Separate (one file per table) when:

Core table with many fields and complex logic
Has a dedicated Service layer counterpart
May expand independently in the future

File Naming

Single-table files: named after the table export name (message.ts for messageTable, topic.ts for topicTable)
Multi-table files: lowercase, named by domain (tagging.ts for tagTable + entityTagTable)
Helper utilities: underscore prefix (_columnHelpers.ts) to indicate non-table definitions

Naming Conventions

Table names: Use singular form with snake_case (e.g., topic, message, app_state)
Export names: Use xxxTable pattern (e.g., topicTable, messageTable)
Column names: Drizzle auto-infers from property names, no need to specify explicitly

Column Helpers

All helpers are exported from ./schemas/_columnHelpers.ts.

Primary Keys

Helper	UUID Version	Use Case
`uuidPrimaryKey()`	v4 (random)	General purpose tables
`uuidPrimaryKeyOrdered()`	v7 (time-ordered)	Large tables with time-based queries

Usage:

import { uuidPrimaryKey, uuidPrimaryKeyOrdered } from './_columnHelpers'

// General purpose table
export const topicTable = sqliteTable('topic', {
  id: uuidPrimaryKey(),
  name: text(),
  ...
})

// Large table with time-ordered data
export const messageTable = sqliteTable('message', {
  id: uuidPrimaryKeyOrdered(),
  content: text(),
  ...
})

Behavior:

ID is auto-generated if not provided during insert
Can be manually specified for migration scenarios
Use .returning() to get the generated ID after insert

Timestamps

Helper	Fields	Use Case
`createUpdateTimestamps`	`createdAt`, `updatedAt`	Tables without soft delete
`createUpdateDeleteTimestamps`	`createdAt`, `updatedAt`, `deletedAt`	Tables with soft delete

Usage:

import {
  createUpdateTimestamps,
  createUpdateDeleteTimestamps,
} from "./_columnHelpers";

// Without soft delete
export const tagTable = sqliteTable("tag", {
  id: uuidPrimaryKey(),
  name: text(),
  ...createUpdateTimestamps,
});

// With soft delete
export const topicTable = sqliteTable("topic", {
  id: uuidPrimaryKey(),
  name: text(),
  ...createUpdateDeleteTimestamps,
});

Behavior:

createdAt: Auto-set to Date.now() on insert
updatedAt: Auto-set on insert, auto-updated on update
deletedAt: null by default, set to timestamp for soft delete

JSON Fields

For JSON column support, use { mode: 'json' }:

data: text({ mode: "json" }).$type<MyDataType>();

Drizzle handles JSON serialization/deserialization automatically.

Column Nullability and Defaults

When `nullable` vs `NOT NULL`

A column may be nullable only when NULL carries a domain meaning distinct from any value in the column's domain:

Pattern	Example
Optional foreign key	`assistant.modelId` (no model selected yet)
Time of an event that may not have occurred	`deletedAt`, `cancelledAt`
Unassigned-tagged state	`pr.reviewerId` (unassigned vs assigned)

All other columns should be NOT NULL with an appropriate default. If a column "should" always have a value, switch it to NOT NULL — do not add a ?? someValue fallback in rowToEntity to mask NULL. See Default Values & Nullability § R3.

Common offender: boolean columns without `.notNull()`

// ❌ Wrong — inferred type is `boolean | null`
isEnabled: integer({ mode: 'boolean' }).default(true)

// ✅ Right
isEnabled: integer({ mode: 'boolean' }).notNull().default(true)

mode: 'boolean' implies two values to a reader, but Drizzle treats nullability and default as orthogonal. Without .notNull(), every reader writes row.isEnabled ?? true — exactly the fabricated-fallback pattern R3 forbids. .default(true) runs at INSERT only; it does not constrain existing NULLs.

Pair .notNull().default(...) on every boolean unless NULL carries a third meaning (almost never — "unknown enabled" usually maps to false).

Where the default value lives

Location	Use for	Note
DB `.default('X')`	Type-level "empty" values (`''`, `0`, `false`, `[]`) — won't change because they aren't product choices	Effectively a near-permanent choice in SQLite — every change requires a full-table rebuild that copies every row and never touches the existing ones; legacy NULL backfill must be hand-written into the rebuild's `INSERT ... SELECT`. For product-chosen values that could evolve (`'🌟'`, default model parameters), prefer service `??`. See Default Values & Nullability § DB defaults are near-permanent.
Drizzle `$defaultFn(() => …)`	Dynamic per-row values: UUIDs, `Date.now()`	Lives in the schema file but runs in JS at INSERT time
Service `dto.x ?? DEFAULT`	Tunable product values that may evolve (e.g., inference parameters)	No migration needed when defaults change; covers all callers (handler, seeder, internal-service)
Zod `.default()`	Avoid on entity / Create / Update schemas	Bypassed by non-handler callers; forces type asymmetry; see API Design Guidelines § E

For the full rationale and decision tree, see Default Values & Nullability.

Foreign Keys

Basic Usage

// SET NULL: preserve record when referenced record is deleted
groupId: text().references(() => groupTable.id, { onDelete: "set null" });

// CASCADE: delete record when referenced record is deleted
topicId: text().references(() => topicTable.id, { onDelete: "cascade" });

Self-Referencing Foreign Keys

For self-referencing foreign keys (e.g., tree structures with parentId), always use the foreignKey operator in the table's third parameter:

import { foreignKey, sqliteTable, text } from "drizzle-orm/sqlite-core";

export const messageTable = sqliteTable(
  "message",
  {
    id: uuidPrimaryKeyOrdered(),
    parentId: text(), // Do NOT use .references() here
    // ...other fields
  },
  (t) => [
    // Use foreignKey operator for self-referencing
    foreignKey({ columns: [t.parentId], foreignColumns: [t.id] }).onDelete(
      "set null"
    ),
  ]
);

Why this approach:

Avoids TypeScript circular reference issues (no need for AnySQLiteColumn type annotation)
More explicit and readable
Allows chaining .onDelete() / .onUpdate() actions

Circular Foreign Key References

Avoid circular foreign key references between tables. For example:

// ❌ BAD: Circular FK between tables
// tableA.currentItemId -> tableB.id
// tableB.ownerId -> tableA.id

If you encounter a scenario that seems to require circular references:

Identify which relationship is "weaker" - typically the one that can be null or is less critical for data integrity
Remove the FK constraint from the weaker side - let the application layer handle validation and consistency (this is known as "soft references" pattern)
Document the application-layer constraint in code comments

// ✅ GOOD: Break the cycle by handling one side at application layer
export const topicTable = sqliteTable("topic", {
  id: uuidPrimaryKey(),
  // Application-managed reference (no FK constraint)
  // Validated by TopicService.setCurrentMessage()
  currentMessageId: text(),
});

export const messageTable = sqliteTable("message", {
  id: uuidPrimaryKeyOrdered(),
  // Database-enforced FK
  topicId: text().references(() => topicTable.id, { onDelete: "cascade" }),
});

Why soft references for SQLite:

SQLite does not support DEFERRABLE constraints (unlike PostgreSQL/Oracle)
Application-layer validation provides equivalent data integrity
Simplifies insert/update operations without transaction ordering concerns

Migrations

The migration workflow — pnpm db:migrations:generate after schema changes, regenerate-never-rename, CI gates, and additive-vs-rebuild — is consolidated in Database Construction.

Field Generation Rules

The schema uses Drizzle's auto-generation features. Follow these rules:

Auto-generated fields (NEVER set manually)

id: Uses $defaultFn() with UUID v4/v7, auto-generated on insert
createdAt: Uses $defaultFn() with Date.now(), auto-generated on insert
updatedAt: Uses $defaultFn() and $onUpdateFn(), auto-updated on every update

Using `.returning()` pattern

Always use .returning() to get inserted/updated data instead of re-querying:

// Good: Use returning()
const [row] = await db.insert(table).values(data).returning();
return rowToEntity(row);

// Avoid: Re-query after insert (unnecessary database round-trip)
await db.insert(table).values({ id, ...data });
return this.getById(id);

Row → Entity Mapping

All rowToEntity functions follow a unified paradigm: a shallow nullsToUndefined(row) strips DB NULL → undefined, then date fields are converted manually. See the Row → Entity Mapping section of data-api-in-main.md for the paradigm, and services/utils/README.md for function signatures and rejected alternatives.

Key principles:

Shallow, not recursive: only column-level NULLs are handled; nested JSON payloads are not deep-cleaned
No third-party null-handling library: the in-house nullsToUndefined (~10 LOC) is sufficient — avoid dependency bloat
No fabricated fallbacks: row.x ?? '🌟' / row.x ?? [] is forbidden — see Default Values & Nullability § R3. If a value "should" always be present, fix the column constraint instead of masking NULL in the mapper.

Soft delete support

The schema supports soft delete via deletedAt field (see createUpdateDeleteTimestamps). Business logic can choose to use soft delete or hard delete based on requirements.

Raw SQL Queries & Recursive CTEs

Drizzle's casing: 'snake_case' only applies to the ORM channel (db.select(), db.insert(), db.update()). Raw SQL via db.all(sql\...`)returns SQLite's native snake_case columns with **no runtime mapping** — the TypeScript generic ondb.all()is a compile-time assertion only. Sodb.all<typeof messageTable.$inferSelect>(sql`SELECT * FROM message`)lies to the type system: at runtimerow.parentIdisundefined; the actual key is parent_id`.

Recursive CTEs (WITH RECURSIVE) are the main reason raw SQL is needed — Drizzle does not yet support them in the query builder.

Pattern: CTE for IDs, ORM for rows

Keep raw SQL minimal. Use the CTE to compute the set of IDs you need (single-word column, casing-safe), then fetch full rows through the ORM where camelCase mapping is automatic and fully type-safe.

// Step 1 — recursive CTE returns ID-only
const idRows = await db.all<{ id: string }>(sql`
  WITH RECURSIVE ancestors AS (
    SELECT id, parent_id FROM message WHERE id = ${nodeId} AND deleted_at IS NULL
    UNION ALL
    SELECT m.id, m.parent_id FROM message m
    INNER JOIN ancestors a ON m.id = a.parent_id
    WHERE m.deleted_at IS NULL
  )
  SELECT id FROM ancestors
`)
const ids = idRows.map((r) => r.id)

// Step 2 — fetch full rows via ORM (auto camelCase)
const rows = ids.length > 0
  ? await db.select().from(messageTable).where(inArray(messageTable.id, ids))
  : []

// Step 3 — restore CTE order (IN-list does not preserve order)
const order = new Map(ids.map((id, i) => [id, i]))
rows.sort((a, b) => order.get(a.id)! - order.get(b.id)!)

If the CTE computes a derived value (e.g. tree_depth), select it alongside id — single-word aliases are also casing-safe — and join it back via a Map.

Don't SELECT * with raw SQL or write a snake→camel helper to patch the output: both bypass Drizzle's type-safety and let future schema changes drift silently.

Reference implementations: MessageService.getTree / getBranchMessages / getPathToNode, KnowledgeItemService.getCascadeIdsInBase.

Custom SQL & FTS5

Triggers, FTS5 virtual tables, the CUSTOM_SQL_STATEMENTS every-boot replay (and why it's ~0.1 ms O(1)), the fts_rowid rowid-stability rule, and the idempotency rules (vtables IF NOT EXISTS, triggers DROP+CREATE) are consolidated in Database Construction (§3 Custom SQL, §4 FTS5).

Seeding

For initial data population (default preferences, builtin languages, preset providers), see Database Seeding Guide.

Write Serialization (`DbService.withWriteTx`)

application.get('DbService').withWriteTx(fn) runs fn as one synchronous BEGIN IMMEDIATE transaction on the single persistent connection.

When it earns its keep. With better-sqlite3 every statement is atomic on its own — a lone getDb().insert(...).run() is a complete implicit transaction and needs no wrapper. A transaction earns its keep only when a mutation must commit all-or-nothing across more than one statement:

Use it when composing multiple writes, or a read-then-write (validate/select then insert/update/delete), into one atomic unit — the majority of write paths here (create/update/delete that also touch join tables, purge pins/tags, reorder via neighbour reads, or cascade-delete). The premise is atomicity (rollback across statements), not serialization: the single synchronous connection already serializes every write by construction.
Don't use for a single autocommit write — call getDb() directly, or pass getDb() to the write's *Tx form (this.fooTx(getDb(), …)). Routing a lone write through withWriteTx buys nothing for atomicity and falsely implies a multi-statement invariant. The *Tx form stays composable, so the same primitive can still be pulled into a larger withWriteTx when a caller genuinely needs multi-write atomicity.

withWriteTx vs db.transaction(). withWriteTx(fn) is a thin wrapper over getDb().transaction(fn, { behavior: 'immediate' }) behind the isReady guard. BEGIN IMMEDIATE takes the write lock up front, which only matters when a second connection writes concurrently; the main DB uses one connection, so it behaves identically to a plain db.transaction(fn). Prefer withWriteTx as the conventional, greppable write seam with the correct write-intent default — but a direct db.transaction() is equivalent for atomicity and not an error. withWriteTx is not the readiness gate: getDb() already throws when the DB isn't ready, so writes made outside withWriteTx are still guarded. The single synchronous connection serializes all access, so there is no process-wide mutex and no SQLITE_BUSY retry — the libsql-era serialization this wrapper originally existed for (upstream #288) is gone.

Signature

withWriteTx<T>(fn: (tx: DbOrTx) => T): T

fn must be synchronous — better-sqlite3 rejects a Promise-returning transaction callback. Internals: one synchronous BEGIN IMMEDIATE transaction behind the isReady guard; the single connection serializes all access, so callers never contend.

Usage

const dbService = application.get('DbService')

// A single write does NOT use withWriteTx — go straight through getDb(),
// or pass it to the *Tx form:
jobService.setMetadataTx(dbService.getDb(), jobId, merged)

// withWriteTx is for composing multiple writes into one atomic transaction:
dbService.withWriteTx((tx) => {
  jobService.cancelByIdsTx(tx, ids, error)
  jobService.resetToPendingByIdsTx(tx, otherIds)
})

Two-form DAO pattern

Each write method has a composable *Tx form and a thin non-Tx wrapper. A single-write method's wrapper passes getDb() to the *Tx form; a multi-write / read-then-write method's wrapper composes one or more *Tx calls inside a single withWriteTx. Either way the *Tx form stays composable, so batch/recovery paths can pull it into a larger transaction. See JobService / JobScheduleService for canonical examples.

cancelByIdsTx(tx: DbOrTx, ids: string[], error: JobError): void { /* SQL via tx */ }

// Single write → call the *Tx form with getDb() directly (no withWriteTx):
cancelByIds(ids: string[], error: JobError): void {
  return this.cancelByIdsTx(application.get('DbService').getDb(), ids, error)
}

Rules

Rule	Rationale
`fn` must be synchronous and only do DB ops — no network / file IO / handler execution	better-sqlite3 rejects a Promise-returning callback; the transaction blocks the single connection until `fn` returns
Do not wrap reads	WAL mode gives readers snapshot isolation; wrapping adds needless serialization
Don't wrap a single autocommit write	one statement is already an implicit transaction — call `getDb()` (or the `*Tx` form) directly; `getDb()` still guards readiness
Wrap tight loops in one `withWriteTx`, not per-iteration	One `BEGIN IMMEDIATE` transaction vs N

When to migrate existing callsites

Path	Action
Multi-statement / read-then-write mutations	Wrap in a transaction — `withWriteTx` (preferred) or a direct `db.transaction()`
Single-statement writes	Don't wrap — call `getDb()` (or the `*Tx` form) directly
Boot-only writes (migrations, seeders)	Leave
Pure reads	Leave

Reference

Concurrency & Locks — Layer 0.

18 KiB Raw Permalink Blame History