18 KiB
Database Schema Guidelines
Schema File Organization
Principles
| Scenario | Approach |
|---|---|
| Strongly related tables in same domain | Merge into one file |
| Core tables / Complex business logic | One file per table |
| Tables that may cross multiple domains | One file per table |
Decision Criteria
Merge when:
- Tables have strong foreign key relationships (e.g., many-to-many)
- Tables belong to the same business domain
- Tables are unlikely to evolve independently
Separate (one file per table) when:
- Core table with many fields and complex logic
- Has a dedicated Service layer counterpart
- May expand independently in the future
File Naming
- Single-table files: named after the table export name (
message.tsformessageTable,topic.tsfortopicTable) - Multi-table files: lowercase, named by domain (
tagging.tsfortagTable+entityTagTable) - Helper utilities: underscore prefix (
_columnHelpers.ts) to indicate non-table definitions
Naming Conventions
- Table names: Use singular form with snake_case (e.g.,
topic,message,app_state) - Export names: Use
xxxTablepattern (e.g.,topicTable,messageTable) - Column names: Drizzle auto-infers from property names, no need to specify explicitly
Column Helpers
All helpers are exported from ./schemas/_columnHelpers.ts.
Primary Keys
| Helper | UUID Version | Use Case |
|---|---|---|
uuidPrimaryKey() |
v4 (random) | General purpose tables |
uuidPrimaryKeyOrdered() |
v7 (time-ordered) | Large tables with time-based queries |
Usage:
import { uuidPrimaryKey, uuidPrimaryKeyOrdered } from './_columnHelpers'
// General purpose table
export const topicTable = sqliteTable('topic', {
id: uuidPrimaryKey(),
name: text(),
...
})
// Large table with time-ordered data
export const messageTable = sqliteTable('message', {
id: uuidPrimaryKeyOrdered(),
content: text(),
...
})
Behavior:
- ID is auto-generated if not provided during insert
- Can be manually specified for migration scenarios
- Use
.returning()to get the generated ID after insert
Timestamps
| Helper | Fields | Use Case |
|---|---|---|
createUpdateTimestamps |
createdAt, updatedAt |
Tables without soft delete |
createUpdateDeleteTimestamps |
createdAt, updatedAt, deletedAt |
Tables with soft delete |
Usage:
import {
createUpdateTimestamps,
createUpdateDeleteTimestamps,
} from "./_columnHelpers";
// Without soft delete
export const tagTable = sqliteTable("tag", {
id: uuidPrimaryKey(),
name: text(),
...createUpdateTimestamps,
});
// With soft delete
export const topicTable = sqliteTable("topic", {
id: uuidPrimaryKey(),
name: text(),
...createUpdateDeleteTimestamps,
});
Behavior:
createdAt: Auto-set toDate.now()on insertupdatedAt: Auto-set on insert, auto-updated on updatedeletedAt:nullby default, set to timestamp for soft delete
JSON Fields
For JSON column support, use { mode: 'json' }:
data: text({ mode: "json" }).$type<MyDataType>();
Drizzle handles JSON serialization/deserialization automatically.
Column Nullability and Defaults
When nullable vs NOT NULL
A column may be nullable only when NULL carries a domain meaning distinct from any value in the column's domain:
| Pattern | Example |
|---|---|
| Optional foreign key | assistant.modelId (no model selected yet) |
| Time of an event that may not have occurred | deletedAt, cancelledAt |
| Unassigned-tagged state | pr.reviewerId (unassigned vs assigned) |
All other columns should be NOT NULL with an appropriate default. If a column "should" always have a value, switch it to NOT NULL — do not add a ?? someValue fallback in rowToEntity to mask NULL. See Default Values & Nullability § R3.
Common offender: boolean columns without .notNull()
// ❌ Wrong — inferred type is `boolean | null`
isEnabled: integer({ mode: 'boolean' }).default(true)
// ✅ Right
isEnabled: integer({ mode: 'boolean' }).notNull().default(true)
mode: 'boolean' implies two values to a reader, but Drizzle treats
nullability and default as orthogonal. Without .notNull(), every reader
writes row.isEnabled ?? true — exactly the fabricated-fallback pattern
R3 forbids. .default(true) runs at INSERT only; it does not constrain
existing NULLs.
Pair .notNull().default(...) on every boolean unless NULL carries a
third meaning (almost never — "unknown enabled" usually maps to false).
Where the default value lives
| Location | Use for | Note |
|---|---|---|
DB .default('X') |
Type-level "empty" values ('', 0, false, []) — won't change because they aren't product choices |
Effectively a near-permanent choice in SQLite — every change requires a full-table rebuild that copies every row and never touches the existing ones; legacy NULL backfill must be hand-written into the rebuild's INSERT ... SELECT. For product-chosen values that could evolve ('🌟', default model parameters), prefer service ??. See Default Values & Nullability § DB defaults are near-permanent. |
Drizzle $defaultFn(() => …) |
Dynamic per-row values: UUIDs, Date.now() |
Lives in the schema file but runs in JS at INSERT time |
Service dto.x ?? DEFAULT |
Tunable product values that may evolve (e.g., inference parameters) | No migration needed when defaults change; covers all callers (handler, seeder, internal-service) |
Zod .default() |
Avoid on entity / Create / Update schemas | Bypassed by non-handler callers; forces type asymmetry; see API Design Guidelines § E |
For the full rationale and decision tree, see Default Values & Nullability.
Foreign Keys
Basic Usage
// SET NULL: preserve record when referenced record is deleted
groupId: text().references(() => groupTable.id, { onDelete: "set null" });
// CASCADE: delete record when referenced record is deleted
topicId: text().references(() => topicTable.id, { onDelete: "cascade" });
Self-Referencing Foreign Keys
For self-referencing foreign keys (e.g., tree structures with parentId), always use the foreignKey operator in the table's third parameter:
import { foreignKey, sqliteTable, text } from "drizzle-orm/sqlite-core";
export const messageTable = sqliteTable(
"message",
{
id: uuidPrimaryKeyOrdered(),
parentId: text(), // Do NOT use .references() here
// ...other fields
},
(t) => [
// Use foreignKey operator for self-referencing
foreignKey({ columns: [t.parentId], foreignColumns: [t.id] }).onDelete(
"set null"
),
]
);
Why this approach:
- Avoids TypeScript circular reference issues (no need for
AnySQLiteColumntype annotation) - More explicit and readable
- Allows chaining
.onDelete()/.onUpdate()actions
Circular Foreign Key References
Avoid circular foreign key references between tables. For example:
// ❌ BAD: Circular FK between tables
// tableA.currentItemId -> tableB.id
// tableB.ownerId -> tableA.id
If you encounter a scenario that seems to require circular references:
- Identify which relationship is "weaker" - typically the one that can be null or is less critical for data integrity
- Remove the FK constraint from the weaker side - let the application layer handle validation and consistency (this is known as "soft references" pattern)
- Document the application-layer constraint in code comments
// ✅ GOOD: Break the cycle by handling one side at application layer
export const topicTable = sqliteTable("topic", {
id: uuidPrimaryKey(),
// Application-managed reference (no FK constraint)
// Validated by TopicService.setCurrentMessage()
currentMessageId: text(),
});
export const messageTable = sqliteTable("message", {
id: uuidPrimaryKeyOrdered(),
// Database-enforced FK
topicId: text().references(() => topicTable.id, { onDelete: "cascade" }),
});
Why soft references for SQLite:
- SQLite does not support
DEFERRABLEconstraints (unlike PostgreSQL/Oracle) - Application-layer validation provides equivalent data integrity
- Simplifies insert/update operations without transaction ordering concerns
Migrations
The migration workflow — pnpm db:migrations:generate after schema changes, regenerate-never-rename, CI gates, and additive-vs-rebuild — is consolidated in Database Construction.
Field Generation Rules
The schema uses Drizzle's auto-generation features. Follow these rules:
Auto-generated fields (NEVER set manually)
id: Uses$defaultFn()with UUID v4/v7, auto-generated on insertcreatedAt: Uses$defaultFn()withDate.now(), auto-generated on insertupdatedAt: Uses$defaultFn()and$onUpdateFn(), auto-updated on every update
Using .returning() pattern
Always use .returning() to get inserted/updated data instead of re-querying:
// Good: Use returning()
const [row] = await db.insert(table).values(data).returning();
return rowToEntity(row);
// Avoid: Re-query after insert (unnecessary database round-trip)
await db.insert(table).values({ id, ...data });
return this.getById(id);
Row → Entity Mapping
All rowToEntity functions follow a unified paradigm: a shallow nullsToUndefined(row) strips DB NULL → undefined, then date fields are converted manually. See the Row → Entity Mapping section of data-api-in-main.md for the paradigm, and services/utils/README.md for function signatures and rejected alternatives.
Key principles:
- Shallow, not recursive: only column-level NULLs are handled; nested JSON payloads are not deep-cleaned
- No third-party null-handling library: the in-house
nullsToUndefined(~10 LOC) is sufficient — avoid dependency bloat - No fabricated fallbacks:
row.x ?? '🌟'/row.x ?? []is forbidden — see Default Values & Nullability § R3. If a value "should" always be present, fix the column constraint instead of masking NULL in the mapper.
Soft delete support
The schema supports soft delete via deletedAt field (see createUpdateDeleteTimestamps).
Business logic can choose to use soft delete or hard delete based on requirements.
Raw SQL Queries & Recursive CTEs
Drizzle's casing: 'snake_case' only applies to the ORM channel
(db.select(), db.insert(), db.update()). Raw SQL via db.all(sql\...`)returns SQLite's native snake_case columns with **no runtime mapping** — the TypeScript generic ondb.all()is a compile-time assertion only. Sodb.all<typeof messageTable.$inferSelect>(sql`SELECT * FROM message`)lies to the type system: at runtimerow.parentIdisundefined; the actual key is parent_id`.
Recursive CTEs (WITH RECURSIVE) are the main reason raw SQL is needed —
Drizzle does not yet support them in the query builder.
Pattern: CTE for IDs, ORM for rows
Keep raw SQL minimal. Use the CTE to compute the set of IDs you need (single-word column, casing-safe), then fetch full rows through the ORM where camelCase mapping is automatic and fully type-safe.
// Step 1 — recursive CTE returns ID-only
const idRows = await db.all<{ id: string }>(sql`
WITH RECURSIVE ancestors AS (
SELECT id, parent_id FROM message WHERE id = ${nodeId} AND deleted_at IS NULL
UNION ALL
SELECT m.id, m.parent_id FROM message m
INNER JOIN ancestors a ON m.id = a.parent_id
WHERE m.deleted_at IS NULL
)
SELECT id FROM ancestors
`)
const ids = idRows.map((r) => r.id)
// Step 2 — fetch full rows via ORM (auto camelCase)
const rows = ids.length > 0
? await db.select().from(messageTable).where(inArray(messageTable.id, ids))
: []
// Step 3 — restore CTE order (IN-list does not preserve order)
const order = new Map(ids.map((id, i) => [id, i]))
rows.sort((a, b) => order.get(a.id)! - order.get(b.id)!)
If the CTE computes a derived value (e.g. tree_depth), select it alongside
id — single-word aliases are also casing-safe — and join it back via a Map.
Don't SELECT * with raw SQL or write a snake→camel helper to patch the
output: both bypass Drizzle's type-safety and let future schema changes drift
silently.
Reference implementations: MessageService.getTree / getBranchMessages /
getPathToNode, KnowledgeItemService.getCascadeIdsInBase.
Custom SQL & FTS5
Triggers, FTS5 virtual tables, the CUSTOM_SQL_STATEMENTS every-boot replay (and why it's ~0.1 ms O(1)), the fts_rowid rowid-stability rule, and the idempotency rules (vtables IF NOT EXISTS, triggers DROP+CREATE) are consolidated in Database Construction (§3 Custom SQL, §4 FTS5).
Seeding
For initial data population (default preferences, builtin languages, preset providers), see Database Seeding Guide.
Write Serialization (DbService.withWriteTx)
application.get('DbService').withWriteTx(fn) runs fn as one synchronous BEGIN IMMEDIATE transaction on the single persistent connection.
When it earns its keep. With better-sqlite3 every statement is atomic on its own — a lone getDb().insert(...).run() is a complete implicit transaction and needs no wrapper. A transaction earns its keep only when a mutation must commit all-or-nothing across more than one statement:
- Use it when composing multiple writes, or a read-then-write (validate/select then insert/update/delete), into one atomic unit — the majority of write paths here (create/update/delete that also touch join tables, purge pins/tags, reorder via neighbour reads, or cascade-delete). The premise is atomicity (rollback across statements), not serialization: the single synchronous connection already serializes every write by construction.
- Don't use for a single autocommit write — call
getDb()directly, or passgetDb()to the write's*Txform (this.fooTx(getDb(), …)). Routing a lone write throughwithWriteTxbuys nothing for atomicity and falsely implies a multi-statement invariant. The*Txform stays composable, so the same primitive can still be pulled into a largerwithWriteTxwhen a caller genuinely needs multi-write atomicity.
withWriteTx vs db.transaction(). withWriteTx(fn) is a thin wrapper over getDb().transaction(fn, { behavior: 'immediate' }) behind the isReady guard. BEGIN IMMEDIATE takes the write lock up front, which only matters when a second connection writes concurrently; the main DB uses one connection, so it behaves identically to a plain db.transaction(fn). Prefer withWriteTx as the conventional, greppable write seam with the correct write-intent default — but a direct db.transaction() is equivalent for atomicity and not an error. withWriteTx is not the readiness gate: getDb() already throws when the DB isn't ready, so writes made outside withWriteTx are still guarded. The single synchronous connection serializes all access, so there is no process-wide mutex and no SQLITE_BUSY retry — the libsql-era serialization this wrapper originally existed for (upstream #288) is gone.
Signature
withWriteTx<T>(fn: (tx: DbOrTx) => T): T
fn must be synchronous — better-sqlite3 rejects a Promise-returning transaction callback. Internals: one synchronous BEGIN IMMEDIATE transaction behind the isReady guard; the single connection serializes all access, so callers never contend.
Usage
const dbService = application.get('DbService')
// A single write does NOT use withWriteTx — go straight through getDb(),
// or pass it to the *Tx form:
jobService.setMetadataTx(dbService.getDb(), jobId, merged)
// withWriteTx is for composing multiple writes into one atomic transaction:
dbService.withWriteTx((tx) => {
jobService.cancelByIdsTx(tx, ids, error)
jobService.resetToPendingByIdsTx(tx, otherIds)
})
Two-form DAO pattern
Each write method has a composable *Tx form and a thin non-Tx wrapper. A single-write method's wrapper passes getDb() to the *Tx form; a multi-write / read-then-write method's wrapper composes one or more *Tx calls inside a single withWriteTx. Either way the *Tx form stays composable, so batch/recovery paths can pull it into a larger transaction. See JobService / JobScheduleService for canonical examples.
cancelByIdsTx(tx: DbOrTx, ids: string[], error: JobError): void { /* SQL via tx */ }
// Single write → call the *Tx form with getDb() directly (no withWriteTx):
cancelByIds(ids: string[], error: JobError): void {
return this.cancelByIdsTx(application.get('DbService').getDb(), ids, error)
}
Rules
| Rule | Rationale |
|---|---|
fn must be synchronous and only do DB ops — no network / file IO / handler execution |
better-sqlite3 rejects a Promise-returning callback; the transaction blocks the single connection until fn returns |
| Do not wrap reads | WAL mode gives readers snapshot isolation; wrapping adds needless serialization |
| Don't wrap a single autocommit write | one statement is already an implicit transaction — call getDb() (or the *Tx form) directly; getDb() still guards readiness |
Wrap tight loops in one withWriteTx, not per-iteration |
One BEGIN IMMEDIATE transaction vs N |
When to migrate existing callsites
| Path | Action |
|---|---|
| Multi-statement / read-then-write mutations | Wrap in a transaction — withWriteTx (preferred) or a direct db.transaction() |
| Single-statement writes | Don't wrap — call getDb() (or the *Tx form) directly |
| Boot-only writes (migrations, seeders) | Leave |
| Pure reads | Leave |