Files
CherryHQ-cherry-studio/docs/references/data/database-patterns.md

418 lines
18 KiB
Markdown

# Database Schema Guidelines
## Schema File Organization
### Principles
| Scenario | Approach |
| -------------------------------------- | ------------------- |
| Strongly related tables in same domain | Merge into one file |
| Core tables / Complex business logic | One file per table |
| Tables that may cross multiple domains | One file per table |
### Decision Criteria
**Merge when:**
- Tables have strong foreign key relationships (e.g., many-to-many)
- Tables belong to the same business domain
- Tables are unlikely to evolve independently
**Separate (one file per table) when:**
- Core table with many fields and complex logic
- Has a dedicated Service layer counterpart
- May expand independently in the future
### File Naming
- **Single-table files**: named after the table export name (`message.ts` for `messageTable`, `topic.ts` for `topicTable`)
- **Multi-table files**: lowercase, named by domain (`tagging.ts` for `tagTable` + `entityTagTable`)
- **Helper utilities**: underscore prefix (`_columnHelpers.ts`) to indicate non-table definitions
## Naming Conventions
- **Table names**: Use **singular** form with snake_case (e.g., `topic`, `message`, `app_state`)
- **Export names**: Use `xxxTable` pattern (e.g., `topicTable`, `messageTable`)
- **Column names**: Drizzle auto-infers from property names, no need to specify explicitly
## Column Helpers
All helpers are exported from `./schemas/_columnHelpers.ts`.
### Primary Keys
| Helper | UUID Version | Use Case |
| ------------------------- | ----------------- | ------------------------------------ |
| `uuidPrimaryKey()` | v4 (random) | General purpose tables |
| `uuidPrimaryKeyOrdered()` | v7 (time-ordered) | Large tables with time-based queries |
**Usage:**
```typescript
import { uuidPrimaryKey, uuidPrimaryKeyOrdered } from './_columnHelpers'
// General purpose table
export const topicTable = sqliteTable('topic', {
id: uuidPrimaryKey(),
name: text(),
...
})
// Large table with time-ordered data
export const messageTable = sqliteTable('message', {
id: uuidPrimaryKeyOrdered(),
content: text(),
...
})
```
**Behavior:**
- ID is auto-generated if not provided during insert
- Can be manually specified for migration scenarios
- Use `.returning()` to get the generated ID after insert
### Timestamps
| Helper | Fields | Use Case |
| ------------------------------ | ------------------------------------- | -------------------------- |
| `createUpdateTimestamps` | `createdAt`, `updatedAt` | Tables without soft delete |
| `createUpdateDeleteTimestamps` | `createdAt`, `updatedAt`, `deletedAt` | Tables with soft delete |
**Usage:**
```typescript
import {
createUpdateTimestamps,
createUpdateDeleteTimestamps,
} from "./_columnHelpers";
// Without soft delete
export const tagTable = sqliteTable("tag", {
id: uuidPrimaryKey(),
name: text(),
...createUpdateTimestamps,
});
// With soft delete
export const topicTable = sqliteTable("topic", {
id: uuidPrimaryKey(),
name: text(),
...createUpdateDeleteTimestamps,
});
```
**Behavior:**
- `createdAt`: Auto-set to `Date.now()` on insert
- `updatedAt`: Auto-set on insert, auto-updated on update
- `deletedAt`: `null` by default, set to timestamp for soft delete
## JSON Fields
For JSON column support, use `{ mode: 'json' }`:
```typescript
data: text({ mode: "json" }).$type<MyDataType>();
```
Drizzle handles JSON serialization/deserialization automatically.
## Column Nullability and Defaults
### When `nullable` vs `NOT NULL`
A column may be `nullable` only when **NULL carries a domain meaning distinct from any value in the column's domain**:
| Pattern | Example |
|---|---|
| Optional foreign key | `assistant.modelId` (no model selected yet) |
| Time of an event that may not have occurred | `deletedAt`, `cancelledAt` |
| Unassigned-tagged state | `pr.reviewerId` (unassigned vs assigned) |
All other columns should be `NOT NULL` with an appropriate default. If a column "should" always have a value, switch it to `NOT NULL` — do **not** add a `?? someValue` fallback in `rowToEntity` to mask NULL. See [Default Values & Nullability § R3](./best-practice-default-values-and-nullability.md).
#### Common offender: boolean columns without `.notNull()`
```typescript
// ❌ Wrong — inferred type is `boolean | null`
isEnabled: integer({ mode: 'boolean' }).default(true)
// ✅ Right
isEnabled: integer({ mode: 'boolean' }).notNull().default(true)
```
`mode: 'boolean'` implies two values to a reader, but Drizzle treats
nullability and default as orthogonal. Without `.notNull()`, every reader
writes `row.isEnabled ?? true` — exactly the fabricated-fallback pattern
R3 forbids. `.default(true)` runs at INSERT only; it does not constrain
existing NULLs.
Pair `.notNull().default(...)` on every boolean unless NULL carries a
third meaning (almost never — "unknown enabled" usually maps to `false`).
### Where the default value lives
| Location | Use for | Note |
|---|---|---|
| **DB `.default('X')`** | Type-level "empty" values (`''`, `0`, `false`, `[]`) — won't change because they aren't product choices | **Effectively a near-permanent choice in SQLite** — every change requires a full-table rebuild that copies every row and never touches the existing ones; legacy NULL backfill must be hand-written into the rebuild's `INSERT ... SELECT`. For product-chosen values that could evolve (`'🌟'`, default model parameters), prefer service `??`. See [Default Values & Nullability § DB defaults are near-permanent](./best-practice-default-values-and-nullability.md#db-defaults-are-near-permanent). |
| **Drizzle `$defaultFn(() => …)`** | Dynamic per-row values: UUIDs, `Date.now()` | Lives in the schema file but runs in JS at INSERT time |
| **Service `dto.x ?? DEFAULT`** | Tunable product values that may evolve (e.g., inference parameters) | No migration needed when defaults change; covers all callers (handler, seeder, internal-service) |
| **Zod `.default()`** | Avoid on entity / Create / Update schemas | Bypassed by non-handler callers; forces type asymmetry; see [API Design Guidelines § E](./api-design-guidelines.md#e-default-values-do-not-live-in-zod-schemas) |
For the full rationale and decision tree, see [Default Values & Nullability](./best-practice-default-values-and-nullability.md).
## Foreign Keys
### Basic Usage
```typescript
// SET NULL: preserve record when referenced record is deleted
groupId: text().references(() => groupTable.id, { onDelete: "set null" });
// CASCADE: delete record when referenced record is deleted
topicId: text().references(() => topicTable.id, { onDelete: "cascade" });
```
### Self-Referencing Foreign Keys
For self-referencing foreign keys (e.g., tree structures with parentId), **always use the `foreignKey` operator** in the table's third parameter:
```typescript
import { foreignKey, sqliteTable, text } from "drizzle-orm/sqlite-core";
export const messageTable = sqliteTable(
"message",
{
id: uuidPrimaryKeyOrdered(),
parentId: text(), // Do NOT use .references() here
// ...other fields
},
(t) => [
// Use foreignKey operator for self-referencing
foreignKey({ columns: [t.parentId], foreignColumns: [t.id] }).onDelete(
"set null"
),
]
);
```
**Why this approach:**
- Avoids TypeScript circular reference issues (no need for `AnySQLiteColumn` type annotation)
- More explicit and readable
- Allows chaining `.onDelete()` / `.onUpdate()` actions
### Circular Foreign Key References
**Avoid circular foreign key references between tables.** For example:
```typescript
// ❌ BAD: Circular FK between tables
// tableA.currentItemId -> tableB.id
// tableB.ownerId -> tableA.id
```
If you encounter a scenario that seems to require circular references:
1. **Identify which relationship is "weaker"** - typically the one that can be null or is less critical for data integrity
2. **Remove the FK constraint from the weaker side** - let the application layer handle validation and consistency (this is known as "soft references" pattern)
3. **Document the application-layer constraint** in code comments
```typescript
// ✅ GOOD: Break the cycle by handling one side at application layer
export const topicTable = sqliteTable("topic", {
id: uuidPrimaryKey(),
// Application-managed reference (no FK constraint)
// Validated by TopicService.setCurrentMessage()
currentMessageId: text(),
});
export const messageTable = sqliteTable("message", {
id: uuidPrimaryKeyOrdered(),
// Database-enforced FK
topicId: text().references(() => topicTable.id, { onDelete: "cascade" }),
});
```
**Why soft references for SQLite:**
- SQLite does not support `DEFERRABLE` constraints (unlike PostgreSQL/Oracle)
- Application-layer validation provides equivalent data integrity
- Simplifies insert/update operations without transaction ordering concerns
## Migrations
The migration workflow — `pnpm db:migrations:generate` after schema changes, regenerate-never-rename, CI gates, and additive-vs-rebuild — is consolidated in **[Database Construction](./database-construction.md)**.
## Field Generation Rules
The schema uses Drizzle's auto-generation features. Follow these rules:
### Auto-generated fields (NEVER set manually)
- `id`: Uses `$defaultFn()` with UUID v4/v7, auto-generated on insert
- `createdAt`: Uses `$defaultFn()` with `Date.now()`, auto-generated on insert
- `updatedAt`: Uses `$defaultFn()` and `$onUpdateFn()`, auto-updated on every update
### Using `.returning()` pattern
Always use `.returning()` to get inserted/updated data instead of re-querying:
```typescript
// Good: Use returning()
const [row] = await db.insert(table).values(data).returning();
return rowToEntity(row);
// Avoid: Re-query after insert (unnecessary database round-trip)
await db.insert(table).values({ id, ...data });
return this.getById(id);
```
### Row → Entity Mapping
All `rowToEntity` functions follow a unified paradigm: a shallow `nullsToUndefined(row)` strips DB NULL → undefined, then date fields are converted manually. See the [Row → Entity Mapping](./data-api-in-main.md#row--entity-mapping) section of `data-api-in-main.md` for the paradigm, and [services/utils/README.md](../../../src/main/data/services/utils/README.md) for function signatures and rejected alternatives.
Key principles:
- **Shallow, not recursive**: only column-level NULLs are handled; nested JSON payloads are not deep-cleaned
- **No third-party null-handling library**: the in-house `nullsToUndefined` (~10 LOC) is sufficient — avoid dependency bloat
- **No fabricated fallbacks**: `row.x ?? '🌟'` / `row.x ?? []` is forbidden — see [Default Values & Nullability § R3](./best-practice-default-values-and-nullability.md). If a value "should" always be present, fix the column constraint instead of masking NULL in the mapper.
### Soft delete support
The schema supports soft delete via `deletedAt` field (see `createUpdateDeleteTimestamps`).
Business logic can choose to use soft delete or hard delete based on requirements.
## Raw SQL Queries & Recursive CTEs
Drizzle's `casing: 'snake_case'` only applies to the ORM channel
(`db.select()`, `db.insert()`, `db.update()`). Raw SQL via `db.all(sql\`...\`)`
returns SQLite's native snake_case columns with **no runtime mapping** — the
TypeScript generic on `db.all<T>()` is a compile-time assertion only. So
`db.all<typeof messageTable.$inferSelect>(sql\`SELECT * FROM message\`)` lies
to the type system: at runtime `row.parentId` is `undefined`; the actual key
is `parent_id`.
Recursive CTEs (`WITH RECURSIVE`) are the main reason raw SQL is needed —
Drizzle does not yet support them in the query builder.
### Pattern: CTE for IDs, ORM for rows
Keep raw SQL minimal. Use the CTE to compute the **set of IDs** you need
(single-word column, casing-safe), then fetch full rows through the ORM where
camelCase mapping is automatic and fully type-safe.
```typescript
// Step 1 — recursive CTE returns ID-only
const idRows = await db.all<{ id: string }>(sql`
WITH RECURSIVE ancestors AS (
SELECT id, parent_id FROM message WHERE id = ${nodeId} AND deleted_at IS NULL
UNION ALL
SELECT m.id, m.parent_id FROM message m
INNER JOIN ancestors a ON m.id = a.parent_id
WHERE m.deleted_at IS NULL
)
SELECT id FROM ancestors
`)
const ids = idRows.map((r) => r.id)
// Step 2 — fetch full rows via ORM (auto camelCase)
const rows = ids.length > 0
? await db.select().from(messageTable).where(inArray(messageTable.id, ids))
: []
// Step 3 — restore CTE order (IN-list does not preserve order)
const order = new Map(ids.map((id, i) => [id, i]))
rows.sort((a, b) => order.get(a.id)! - order.get(b.id)!)
```
If the CTE computes a derived value (e.g. `tree_depth`), select it alongside
`id` — single-word aliases are also casing-safe — and join it back via a `Map`.
**Don't** `SELECT *` with raw SQL or write a snake→camel helper to patch the
output: both bypass Drizzle's type-safety and let future schema changes drift
silently.
Reference implementations: `MessageService.getTree` / `getBranchMessages` /
`getPathToNode`, `KnowledgeItemService.getCascadeIdsInBase`.
## Custom SQL & FTS5
Triggers, FTS5 virtual tables, the `CUSTOM_SQL_STATEMENTS` every-boot replay (and why it's ~0.1 ms O(1)), the `fts_rowid` rowid-stability rule, and the idempotency rules (vtables `IF NOT EXISTS`, triggers `DROP+CREATE`) are consolidated in **[Database Construction](./database-construction.md)** (§3 Custom SQL, §4 FTS5).
## Seeding
For initial data population (default preferences, builtin languages, preset providers), see [Database Seeding Guide](./database-seeding-guide.md).
## Write Serialization (`DbService.withWriteTx`)
`application.get('DbService').withWriteTx(fn)` runs `fn` as one synchronous `BEGIN IMMEDIATE` transaction on the single persistent connection.
**When it earns its keep.** With better-sqlite3 every statement is atomic on its own — a lone `getDb().insert(...).run()` is a complete implicit transaction and needs no wrapper. A transaction earns its keep only when a mutation must commit **all-or-nothing across more than one statement**:
- **Use it** when composing multiple writes, or a read-then-write (validate/select then insert/update/delete), into one atomic unit — the majority of write paths here (create/update/delete that also touch join tables, purge pins/tags, reorder via neighbour reads, or cascade-delete). The premise is **atomicity** (rollback across statements), not serialization: the single synchronous connection already serializes every write by construction.
- **Don't use** for a single autocommit write — call `getDb()` directly, or pass `getDb()` to the write's `*Tx` form (`this.fooTx(getDb(), …)`). Routing a lone write through `withWriteTx` buys nothing for atomicity and falsely implies a multi-statement invariant. The `*Tx` form stays composable, so the same primitive can still be pulled into a larger `withWriteTx` when a caller genuinely needs multi-write atomicity.
**`withWriteTx` vs `db.transaction()`.** `withWriteTx(fn)` is a thin wrapper over `getDb().transaction(fn, { behavior: 'immediate' })` behind the `isReady` guard. `BEGIN IMMEDIATE` takes the write lock up front, which only matters when a second connection writes concurrently; the main DB uses one connection, so it behaves identically to a plain `db.transaction(fn)`. Prefer `withWriteTx` as the conventional, greppable write seam with the correct write-intent default — but a direct `db.transaction()` is **equivalent** for atomicity and not an error. `withWriteTx` is **not** the readiness gate: `getDb()` already throws when the DB isn't ready, so writes made outside `withWriteTx` are still guarded. The single synchronous connection serializes all access, so there is no process-wide mutex and no `SQLITE_BUSY` retry — the libsql-era serialization this wrapper originally existed for (upstream #288) is gone.
### Signature
```ts
withWriteTx<T>(fn: (tx: DbOrTx) => T): T
```
`fn` must be **synchronous** — better-sqlite3 rejects a Promise-returning transaction callback. Internals: one synchronous `BEGIN IMMEDIATE` transaction behind the `isReady` guard; the single connection serializes all access, so callers never contend.
### Usage
```ts
const dbService = application.get('DbService')
// A single write does NOT use withWriteTx — go straight through getDb(),
// or pass it to the *Tx form:
jobService.setMetadataTx(dbService.getDb(), jobId, merged)
// withWriteTx is for composing multiple writes into one atomic transaction:
dbService.withWriteTx((tx) => {
jobService.cancelByIdsTx(tx, ids, error)
jobService.resetToPendingByIdsTx(tx, otherIds)
})
```
### Two-form DAO pattern
Each write method has a composable `*Tx` form and a thin non-Tx wrapper. A single-write method's wrapper passes `getDb()` to the `*Tx` form; a multi-write / read-then-write method's wrapper composes one or more `*Tx` calls inside a single `withWriteTx`. Either way the `*Tx` form stays composable, so batch/recovery paths can pull it into a larger transaction. See `JobService` / `JobScheduleService` for canonical examples.
```ts
cancelByIdsTx(tx: DbOrTx, ids: string[], error: JobError): void { /* SQL via tx */ }
// Single write → call the *Tx form with getDb() directly (no withWriteTx):
cancelByIds(ids: string[], error: JobError): void {
return this.cancelByIdsTx(application.get('DbService').getDb(), ids, error)
}
```
### Rules
| Rule | Rationale |
| --- | --- |
| `fn` must be synchronous and only do DB ops — no network / file IO / handler execution | better-sqlite3 rejects a Promise-returning callback; the transaction blocks the single connection until `fn` returns |
| Do not wrap reads | WAL mode gives readers snapshot isolation; wrapping adds needless serialization |
| Don't wrap a single autocommit write | one statement is already an implicit transaction — call `getDb()` (or the `*Tx` form) directly; `getDb()` still guards readiness |
| Wrap tight loops in one `withWriteTx`, not per-iteration | One `BEGIN IMMEDIATE` transaction vs N |
### When to migrate existing callsites
| Path | Action |
| --- | --- |
| Multi-statement / read-then-write mutations | Wrap in a transaction — `withWriteTx` (preferred) or a direct `db.transaction()` |
| Single-statement writes | Don't wrap — call `getDb()` (or the `*Tx` form) directly |
| Boot-only writes (migrations, seeders) | Leave |
| Pure reads | Leave |
### Reference
[Concurrency & Locks — Layer 0](../job-and-scheduler/concurrency-and-locks.md).