mirror of
https://github.com/CherryHQ/cherry-studio.git
synced 2026-07-03 12:27:41 +08:00
418 lines
18 KiB
Markdown
418 lines
18 KiB
Markdown
# Database Schema Guidelines
|
|
|
|
## Schema File Organization
|
|
|
|
### Principles
|
|
|
|
| Scenario | Approach |
|
|
| -------------------------------------- | ------------------- |
|
|
| Strongly related tables in same domain | Merge into one file |
|
|
| Core tables / Complex business logic | One file per table |
|
|
| Tables that may cross multiple domains | One file per table |
|
|
|
|
### Decision Criteria
|
|
|
|
**Merge when:**
|
|
|
|
- Tables have strong foreign key relationships (e.g., many-to-many)
|
|
- Tables belong to the same business domain
|
|
- Tables are unlikely to evolve independently
|
|
|
|
**Separate (one file per table) when:**
|
|
|
|
- Core table with many fields and complex logic
|
|
- Has a dedicated Service layer counterpart
|
|
- May expand independently in the future
|
|
|
|
### File Naming
|
|
|
|
- **Single-table files**: named after the table export name (`message.ts` for `messageTable`, `topic.ts` for `topicTable`)
|
|
- **Multi-table files**: lowercase, named by domain (`tagging.ts` for `tagTable` + `entityTagTable`)
|
|
- **Helper utilities**: underscore prefix (`_columnHelpers.ts`) to indicate non-table definitions
|
|
|
|
## Naming Conventions
|
|
|
|
- **Table names**: Use **singular** form with snake_case (e.g., `topic`, `message`, `app_state`)
|
|
- **Export names**: Use `xxxTable` pattern (e.g., `topicTable`, `messageTable`)
|
|
- **Column names**: Drizzle auto-infers from property names, no need to specify explicitly
|
|
|
|
## Column Helpers
|
|
|
|
All helpers are exported from `./schemas/_columnHelpers.ts`.
|
|
|
|
### Primary Keys
|
|
|
|
| Helper | UUID Version | Use Case |
|
|
| ------------------------- | ----------------- | ------------------------------------ |
|
|
| `uuidPrimaryKey()` | v4 (random) | General purpose tables |
|
|
| `uuidPrimaryKeyOrdered()` | v7 (time-ordered) | Large tables with time-based queries |
|
|
|
|
**Usage:**
|
|
|
|
```typescript
|
|
import { uuidPrimaryKey, uuidPrimaryKeyOrdered } from './_columnHelpers'
|
|
|
|
// General purpose table
|
|
export const topicTable = sqliteTable('topic', {
|
|
id: uuidPrimaryKey(),
|
|
name: text(),
|
|
...
|
|
})
|
|
|
|
// Large table with time-ordered data
|
|
export const messageTable = sqliteTable('message', {
|
|
id: uuidPrimaryKeyOrdered(),
|
|
content: text(),
|
|
...
|
|
})
|
|
```
|
|
|
|
**Behavior:**
|
|
|
|
- ID is auto-generated if not provided during insert
|
|
- Can be manually specified for migration scenarios
|
|
- Use `.returning()` to get the generated ID after insert
|
|
|
|
### Timestamps
|
|
|
|
| Helper | Fields | Use Case |
|
|
| ------------------------------ | ------------------------------------- | -------------------------- |
|
|
| `createUpdateTimestamps` | `createdAt`, `updatedAt` | Tables without soft delete |
|
|
| `createUpdateDeleteTimestamps` | `createdAt`, `updatedAt`, `deletedAt` | Tables with soft delete |
|
|
|
|
**Usage:**
|
|
|
|
```typescript
|
|
import {
|
|
createUpdateTimestamps,
|
|
createUpdateDeleteTimestamps,
|
|
} from "./_columnHelpers";
|
|
|
|
// Without soft delete
|
|
export const tagTable = sqliteTable("tag", {
|
|
id: uuidPrimaryKey(),
|
|
name: text(),
|
|
...createUpdateTimestamps,
|
|
});
|
|
|
|
// With soft delete
|
|
export const topicTable = sqliteTable("topic", {
|
|
id: uuidPrimaryKey(),
|
|
name: text(),
|
|
...createUpdateDeleteTimestamps,
|
|
});
|
|
```
|
|
|
|
**Behavior:**
|
|
|
|
- `createdAt`: Auto-set to `Date.now()` on insert
|
|
- `updatedAt`: Auto-set on insert, auto-updated on update
|
|
- `deletedAt`: `null` by default, set to timestamp for soft delete
|
|
|
|
## JSON Fields
|
|
|
|
For JSON column support, use `{ mode: 'json' }`:
|
|
|
|
```typescript
|
|
data: text({ mode: "json" }).$type<MyDataType>();
|
|
```
|
|
|
|
Drizzle handles JSON serialization/deserialization automatically.
|
|
|
|
## Column Nullability and Defaults
|
|
|
|
### When `nullable` vs `NOT NULL`
|
|
|
|
A column may be `nullable` only when **NULL carries a domain meaning distinct from any value in the column's domain**:
|
|
|
|
| Pattern | Example |
|
|
|---|---|
|
|
| Optional foreign key | `assistant.modelId` (no model selected yet) |
|
|
| Time of an event that may not have occurred | `deletedAt`, `cancelledAt` |
|
|
| Unassigned-tagged state | `pr.reviewerId` (unassigned vs assigned) |
|
|
|
|
All other columns should be `NOT NULL` with an appropriate default. If a column "should" always have a value, switch it to `NOT NULL` — do **not** add a `?? someValue` fallback in `rowToEntity` to mask NULL. See [Default Values & Nullability § R3](./best-practice-default-values-and-nullability.md).
|
|
|
|
#### Common offender: boolean columns without `.notNull()`
|
|
|
|
```typescript
|
|
// ❌ Wrong — inferred type is `boolean | null`
|
|
isEnabled: integer({ mode: 'boolean' }).default(true)
|
|
|
|
// ✅ Right
|
|
isEnabled: integer({ mode: 'boolean' }).notNull().default(true)
|
|
```
|
|
|
|
`mode: 'boolean'` implies two values to a reader, but Drizzle treats
|
|
nullability and default as orthogonal. Without `.notNull()`, every reader
|
|
writes `row.isEnabled ?? true` — exactly the fabricated-fallback pattern
|
|
R3 forbids. `.default(true)` runs at INSERT only; it does not constrain
|
|
existing NULLs.
|
|
|
|
Pair `.notNull().default(...)` on every boolean unless NULL carries a
|
|
third meaning (almost never — "unknown enabled" usually maps to `false`).
|
|
|
|
### Where the default value lives
|
|
|
|
| Location | Use for | Note |
|
|
|---|---|---|
|
|
| **DB `.default('X')`** | Type-level "empty" values (`''`, `0`, `false`, `[]`) — won't change because they aren't product choices | **Effectively a near-permanent choice in SQLite** — every change requires a full-table rebuild that copies every row and never touches the existing ones; legacy NULL backfill must be hand-written into the rebuild's `INSERT ... SELECT`. For product-chosen values that could evolve (`'🌟'`, default model parameters), prefer service `??`. See [Default Values & Nullability § DB defaults are near-permanent](./best-practice-default-values-and-nullability.md#db-defaults-are-near-permanent). |
|
|
| **Drizzle `$defaultFn(() => …)`** | Dynamic per-row values: UUIDs, `Date.now()` | Lives in the schema file but runs in JS at INSERT time |
|
|
| **Service `dto.x ?? DEFAULT`** | Tunable product values that may evolve (e.g., inference parameters) | No migration needed when defaults change; covers all callers (handler, seeder, internal-service) |
|
|
| **Zod `.default()`** | Avoid on entity / Create / Update schemas | Bypassed by non-handler callers; forces type asymmetry; see [API Design Guidelines § E](./api-design-guidelines.md#e-default-values-do-not-live-in-zod-schemas) |
|
|
|
|
For the full rationale and decision tree, see [Default Values & Nullability](./best-practice-default-values-and-nullability.md).
|
|
|
|
## Foreign Keys
|
|
|
|
### Basic Usage
|
|
|
|
```typescript
|
|
// SET NULL: preserve record when referenced record is deleted
|
|
groupId: text().references(() => groupTable.id, { onDelete: "set null" });
|
|
|
|
// CASCADE: delete record when referenced record is deleted
|
|
topicId: text().references(() => topicTable.id, { onDelete: "cascade" });
|
|
```
|
|
|
|
### Self-Referencing Foreign Keys
|
|
|
|
For self-referencing foreign keys (e.g., tree structures with parentId), **always use the `foreignKey` operator** in the table's third parameter:
|
|
|
|
```typescript
|
|
import { foreignKey, sqliteTable, text } from "drizzle-orm/sqlite-core";
|
|
|
|
export const messageTable = sqliteTable(
|
|
"message",
|
|
{
|
|
id: uuidPrimaryKeyOrdered(),
|
|
parentId: text(), // Do NOT use .references() here
|
|
// ...other fields
|
|
},
|
|
(t) => [
|
|
// Use foreignKey operator for self-referencing
|
|
foreignKey({ columns: [t.parentId], foreignColumns: [t.id] }).onDelete(
|
|
"set null"
|
|
),
|
|
]
|
|
);
|
|
```
|
|
|
|
**Why this approach:**
|
|
|
|
- Avoids TypeScript circular reference issues (no need for `AnySQLiteColumn` type annotation)
|
|
- More explicit and readable
|
|
- Allows chaining `.onDelete()` / `.onUpdate()` actions
|
|
|
|
### Circular Foreign Key References
|
|
|
|
**Avoid circular foreign key references between tables.** For example:
|
|
|
|
```typescript
|
|
// ❌ BAD: Circular FK between tables
|
|
// tableA.currentItemId -> tableB.id
|
|
// tableB.ownerId -> tableA.id
|
|
```
|
|
|
|
If you encounter a scenario that seems to require circular references:
|
|
|
|
1. **Identify which relationship is "weaker"** - typically the one that can be null or is less critical for data integrity
|
|
2. **Remove the FK constraint from the weaker side** - let the application layer handle validation and consistency (this is known as "soft references" pattern)
|
|
3. **Document the application-layer constraint** in code comments
|
|
|
|
```typescript
|
|
// ✅ GOOD: Break the cycle by handling one side at application layer
|
|
export const topicTable = sqliteTable("topic", {
|
|
id: uuidPrimaryKey(),
|
|
// Application-managed reference (no FK constraint)
|
|
// Validated by TopicService.setCurrentMessage()
|
|
currentMessageId: text(),
|
|
});
|
|
|
|
export const messageTable = sqliteTable("message", {
|
|
id: uuidPrimaryKeyOrdered(),
|
|
// Database-enforced FK
|
|
topicId: text().references(() => topicTable.id, { onDelete: "cascade" }),
|
|
});
|
|
```
|
|
|
|
**Why soft references for SQLite:**
|
|
|
|
- SQLite does not support `DEFERRABLE` constraints (unlike PostgreSQL/Oracle)
|
|
- Application-layer validation provides equivalent data integrity
|
|
- Simplifies insert/update operations without transaction ordering concerns
|
|
|
|
## Migrations
|
|
|
|
The migration workflow — `pnpm db:migrations:generate` after schema changes, regenerate-never-rename, CI gates, and additive-vs-rebuild — is consolidated in **[Database Construction](./database-construction.md)**.
|
|
|
|
## Field Generation Rules
|
|
|
|
The schema uses Drizzle's auto-generation features. Follow these rules:
|
|
|
|
### Auto-generated fields (NEVER set manually)
|
|
|
|
- `id`: Uses `$defaultFn()` with UUID v4/v7, auto-generated on insert
|
|
- `createdAt`: Uses `$defaultFn()` with `Date.now()`, auto-generated on insert
|
|
- `updatedAt`: Uses `$defaultFn()` and `$onUpdateFn()`, auto-updated on every update
|
|
|
|
### Using `.returning()` pattern
|
|
|
|
Always use `.returning()` to get inserted/updated data instead of re-querying:
|
|
|
|
```typescript
|
|
// Good: Use returning()
|
|
const [row] = await db.insert(table).values(data).returning();
|
|
return rowToEntity(row);
|
|
|
|
// Avoid: Re-query after insert (unnecessary database round-trip)
|
|
await db.insert(table).values({ id, ...data });
|
|
return this.getById(id);
|
|
```
|
|
|
|
### Row → Entity Mapping
|
|
|
|
All `rowToEntity` functions follow a unified paradigm: a shallow `nullsToUndefined(row)` strips DB NULL → undefined, then date fields are converted manually. See the [Row → Entity Mapping](./data-api-in-main.md#row--entity-mapping) section of `data-api-in-main.md` for the paradigm, and [services/utils/README.md](../../../src/main/data/services/utils/README.md) for function signatures and rejected alternatives.
|
|
|
|
Key principles:
|
|
|
|
- **Shallow, not recursive**: only column-level NULLs are handled; nested JSON payloads are not deep-cleaned
|
|
- **No third-party null-handling library**: the in-house `nullsToUndefined` (~10 LOC) is sufficient — avoid dependency bloat
|
|
- **No fabricated fallbacks**: `row.x ?? '🌟'` / `row.x ?? []` is forbidden — see [Default Values & Nullability § R3](./best-practice-default-values-and-nullability.md). If a value "should" always be present, fix the column constraint instead of masking NULL in the mapper.
|
|
|
|
### Soft delete support
|
|
|
|
The schema supports soft delete via `deletedAt` field (see `createUpdateDeleteTimestamps`).
|
|
Business logic can choose to use soft delete or hard delete based on requirements.
|
|
|
|
## Raw SQL Queries & Recursive CTEs
|
|
|
|
Drizzle's `casing: 'snake_case'` only applies to the ORM channel
|
|
(`db.select()`, `db.insert()`, `db.update()`). Raw SQL via `db.all(sql\`...\`)`
|
|
returns SQLite's native snake_case columns with **no runtime mapping** — the
|
|
TypeScript generic on `db.all<T>()` is a compile-time assertion only. So
|
|
`db.all<typeof messageTable.$inferSelect>(sql\`SELECT * FROM message\`)` lies
|
|
to the type system: at runtime `row.parentId` is `undefined`; the actual key
|
|
is `parent_id`.
|
|
|
|
Recursive CTEs (`WITH RECURSIVE`) are the main reason raw SQL is needed —
|
|
Drizzle does not yet support them in the query builder.
|
|
|
|
### Pattern: CTE for IDs, ORM for rows
|
|
|
|
Keep raw SQL minimal. Use the CTE to compute the **set of IDs** you need
|
|
(single-word column, casing-safe), then fetch full rows through the ORM where
|
|
camelCase mapping is automatic and fully type-safe.
|
|
|
|
```typescript
|
|
// Step 1 — recursive CTE returns ID-only
|
|
const idRows = await db.all<{ id: string }>(sql`
|
|
WITH RECURSIVE ancestors AS (
|
|
SELECT id, parent_id FROM message WHERE id = ${nodeId} AND deleted_at IS NULL
|
|
UNION ALL
|
|
SELECT m.id, m.parent_id FROM message m
|
|
INNER JOIN ancestors a ON m.id = a.parent_id
|
|
WHERE m.deleted_at IS NULL
|
|
)
|
|
SELECT id FROM ancestors
|
|
`)
|
|
const ids = idRows.map((r) => r.id)
|
|
|
|
// Step 2 — fetch full rows via ORM (auto camelCase)
|
|
const rows = ids.length > 0
|
|
? await db.select().from(messageTable).where(inArray(messageTable.id, ids))
|
|
: []
|
|
|
|
// Step 3 — restore CTE order (IN-list does not preserve order)
|
|
const order = new Map(ids.map((id, i) => [id, i]))
|
|
rows.sort((a, b) => order.get(a.id)! - order.get(b.id)!)
|
|
```
|
|
|
|
If the CTE computes a derived value (e.g. `tree_depth`), select it alongside
|
|
`id` — single-word aliases are also casing-safe — and join it back via a `Map`.
|
|
|
|
**Don't** `SELECT *` with raw SQL or write a snake→camel helper to patch the
|
|
output: both bypass Drizzle's type-safety and let future schema changes drift
|
|
silently.
|
|
|
|
Reference implementations: `MessageService.getTree` / `getBranchMessages` /
|
|
`getPathToNode`, `KnowledgeItemService.getCascadeIdsInBase`.
|
|
|
|
## Custom SQL & FTS5
|
|
|
|
Triggers, FTS5 virtual tables, the `CUSTOM_SQL_STATEMENTS` every-boot replay (and why it's ~0.1 ms O(1)), the `fts_rowid` rowid-stability rule, and the idempotency rules (vtables `IF NOT EXISTS`, triggers `DROP+CREATE`) are consolidated in **[Database Construction](./database-construction.md)** (§3 Custom SQL, §4 FTS5).
|
|
|
|
## Seeding
|
|
|
|
For initial data population (default preferences, builtin languages, preset providers), see [Database Seeding Guide](./database-seeding-guide.md).
|
|
|
|
## Write Serialization (`DbService.withWriteTx`)
|
|
|
|
`application.get('DbService').withWriteTx(fn)` runs `fn` as one synchronous `BEGIN IMMEDIATE` transaction on the single persistent connection.
|
|
|
|
**When it earns its keep.** With better-sqlite3 every statement is atomic on its own — a lone `getDb().insert(...).run()` is a complete implicit transaction and needs no wrapper. A transaction earns its keep only when a mutation must commit **all-or-nothing across more than one statement**:
|
|
|
|
- **Use it** when composing multiple writes, or a read-then-write (validate/select then insert/update/delete), into one atomic unit — the majority of write paths here (create/update/delete that also touch join tables, purge pins/tags, reorder via neighbour reads, or cascade-delete). The premise is **atomicity** (rollback across statements), not serialization: the single synchronous connection already serializes every write by construction.
|
|
- **Don't use** for a single autocommit write — call `getDb()` directly, or pass `getDb()` to the write's `*Tx` form (`this.fooTx(getDb(), …)`). Routing a lone write through `withWriteTx` buys nothing for atomicity and falsely implies a multi-statement invariant. The `*Tx` form stays composable, so the same primitive can still be pulled into a larger `withWriteTx` when a caller genuinely needs multi-write atomicity.
|
|
|
|
**`withWriteTx` vs `db.transaction()`.** `withWriteTx(fn)` is a thin wrapper over `getDb().transaction(fn, { behavior: 'immediate' })` behind the `isReady` guard. `BEGIN IMMEDIATE` takes the write lock up front, which only matters when a second connection writes concurrently; the main DB uses one connection, so it behaves identically to a plain `db.transaction(fn)`. Prefer `withWriteTx` as the conventional, greppable write seam with the correct write-intent default — but a direct `db.transaction()` is **equivalent** for atomicity and not an error. `withWriteTx` is **not** the readiness gate: `getDb()` already throws when the DB isn't ready, so writes made outside `withWriteTx` are still guarded. The single synchronous connection serializes all access, so there is no process-wide mutex and no `SQLITE_BUSY` retry — the libsql-era serialization this wrapper originally existed for (upstream #288) is gone.
|
|
|
|
### Signature
|
|
|
|
```ts
|
|
withWriteTx<T>(fn: (tx: DbOrTx) => T): T
|
|
```
|
|
|
|
`fn` must be **synchronous** — better-sqlite3 rejects a Promise-returning transaction callback. Internals: one synchronous `BEGIN IMMEDIATE` transaction behind the `isReady` guard; the single connection serializes all access, so callers never contend.
|
|
|
|
### Usage
|
|
|
|
```ts
|
|
const dbService = application.get('DbService')
|
|
|
|
// A single write does NOT use withWriteTx — go straight through getDb(),
|
|
// or pass it to the *Tx form:
|
|
jobService.setMetadataTx(dbService.getDb(), jobId, merged)
|
|
|
|
// withWriteTx is for composing multiple writes into one atomic transaction:
|
|
dbService.withWriteTx((tx) => {
|
|
jobService.cancelByIdsTx(tx, ids, error)
|
|
jobService.resetToPendingByIdsTx(tx, otherIds)
|
|
})
|
|
```
|
|
|
|
### Two-form DAO pattern
|
|
|
|
Each write method has a composable `*Tx` form and a thin non-Tx wrapper. A single-write method's wrapper passes `getDb()` to the `*Tx` form; a multi-write / read-then-write method's wrapper composes one or more `*Tx` calls inside a single `withWriteTx`. Either way the `*Tx` form stays composable, so batch/recovery paths can pull it into a larger transaction. See `JobService` / `JobScheduleService` for canonical examples.
|
|
|
|
```ts
|
|
cancelByIdsTx(tx: DbOrTx, ids: string[], error: JobError): void { /* SQL via tx */ }
|
|
|
|
// Single write → call the *Tx form with getDb() directly (no withWriteTx):
|
|
cancelByIds(ids: string[], error: JobError): void {
|
|
return this.cancelByIdsTx(application.get('DbService').getDb(), ids, error)
|
|
}
|
|
```
|
|
|
|
### Rules
|
|
|
|
| Rule | Rationale |
|
|
| --- | --- |
|
|
| `fn` must be synchronous and only do DB ops — no network / file IO / handler execution | better-sqlite3 rejects a Promise-returning callback; the transaction blocks the single connection until `fn` returns |
|
|
| Do not wrap reads | WAL mode gives readers snapshot isolation; wrapping adds needless serialization |
|
|
| Don't wrap a single autocommit write | one statement is already an implicit transaction — call `getDb()` (or the `*Tx` form) directly; `getDb()` still guards readiness |
|
|
| Wrap tight loops in one `withWriteTx`, not per-iteration | One `BEGIN IMMEDIATE` transaction vs N |
|
|
|
|
### When to migrate existing callsites
|
|
|
|
| Path | Action |
|
|
| --- | --- |
|
|
| Multi-statement / read-then-write mutations | Wrap in a transaction — `withWriteTx` (preferred) or a direct `db.transaction()` |
|
|
| Single-statement writes | Don't wrap — call `getDb()` (or the `*Tx` form) directly |
|
|
| Boot-only writes (migrations, seeders) | Leave |
|
|
| Pure reads | Leave |
|
|
|
|
### Reference
|
|
|
|
[Concurrency & Locks — Layer 0](../job-and-scheduler/concurrency-and-locks.md).
|