# Database Schema Guidelines

## Schema File Organization

### Principles

| Scenario                               | Approach            |
| -------------------------------------- | ------------------- |
| Strongly related tables in same domain | Merge into one file |
| Core tables / Complex business logic   | One file per table  |
| Tables that may cross multiple domains | One file per table  |

### Decision Criteria

**Merge when:**

- Tables have strong foreign key relationships (e.g., many-to-many)
- Tables belong to the same business domain
- Tables are unlikely to evolve independently

**Separate (one file per table) when:**

- Core table with many fields and complex logic
- Has a dedicated Service layer counterpart
- May expand independently in the future

### File Naming

- **Single-table files**: named after the table export name (`message.ts` for `messageTable`, `topic.ts` for `topicTable`)
- **Multi-table files**: lowercase, named by domain (`tagging.ts` for `tagTable` + `entityTagTable`)
- **Helper utilities**: underscore prefix (`_columnHelpers.ts`) to indicate non-table definitions

## Naming Conventions

- **Table names**: Use **singular** form with snake_case (e.g., `topic`, `message`, `app_state`)
- **Export names**: Use `xxxTable` pattern (e.g., `topicTable`, `messageTable`)
- **Column names**: Drizzle auto-infers from property names, no need to specify explicitly

## Column Helpers

All helpers are exported from `./schemas/_columnHelpers.ts`.

### Primary Keys

| Helper                    | UUID Version      | Use Case                             |
| ------------------------- | ----------------- | ------------------------------------ |
| `uuidPrimaryKey()`        | v4 (random)       | General purpose tables               |
| `uuidPrimaryKeyOrdered()` | v7 (time-ordered) | Large tables with time-based queries |

**Usage:**

```typescript
import { uuidPrimaryKey, uuidPrimaryKeyOrdered } from './_columnHelpers'

// General purpose table
export const topicTable = sqliteTable('topic', {
  id: uuidPrimaryKey(),
  name: text(),
  ...
})

// Large table with time-ordered data
export const messageTable = sqliteTable('message', {
  id: uuidPrimaryKeyOrdered(),
  content: text(),
  ...
})
```

**Behavior:**

- ID is auto-generated if not provided during insert
- Can be manually specified for migration scenarios
- Use `.returning()` to get the generated ID after insert

### Timestamps

| Helper                         | Fields                                | Use Case                   |
| ------------------------------ | ------------------------------------- | -------------------------- |
| `createUpdateTimestamps`       | `createdAt`, `updatedAt`              | Tables without soft delete |
| `createUpdateDeleteTimestamps` | `createdAt`, `updatedAt`, `deletedAt` | Tables with soft delete    |

**Usage:**

```typescript
import {
  createUpdateTimestamps,
  createUpdateDeleteTimestamps,
} from "./_columnHelpers";

// Without soft delete
export const tagTable = sqliteTable("tag", {
  id: uuidPrimaryKey(),
  name: text(),
  ...createUpdateTimestamps,
});

// With soft delete
export const topicTable = sqliteTable("topic", {
  id: uuidPrimaryKey(),
  name: text(),
  ...createUpdateDeleteTimestamps,
});
```

**Behavior:**

- `createdAt`: Auto-set to `Date.now()` on insert
- `updatedAt`: Auto-set on insert, auto-updated on update
- `deletedAt`: `null` by default, set to timestamp for soft delete

## JSON Fields

For JSON column support, use `{ mode: 'json' }`:

```typescript
data: text({ mode: "json" }).$type<MyDataType>();
```

Drizzle handles JSON serialization/deserialization automatically.

## Column Nullability and Defaults

### When `nullable` vs `NOT NULL`

A column may be `nullable` only when **NULL carries a domain meaning distinct from any value in the column's domain**:

| Pattern | Example |
|---|---|
| Optional foreign key | `assistant.modelId` (no model selected yet) |
| Time of an event that may not have occurred | `deletedAt`, `cancelledAt` |
| Unassigned-tagged state | `pr.reviewerId` (unassigned vs assigned) |

All other columns should be `NOT NULL` with an appropriate default. If a column "should" always have a value, switch it to `NOT NULL` — do **not** add a `?? someValue` fallback in `rowToEntity` to mask NULL. See [Default Values & Nullability § R3](./best-practice-default-values-and-nullability.md).

#### Common offender: boolean columns without `.notNull()`

```typescript
// ❌ Wrong — inferred type is `boolean | null`
isEnabled: integer({ mode: 'boolean' }).default(true)

// ✅ Right
isEnabled: integer({ mode: 'boolean' }).notNull().default(true)
```

`mode: 'boolean'` implies two values to a reader, but Drizzle treats
nullability and default as orthogonal. Without `.notNull()`, every reader
writes `row.isEnabled ?? true` — exactly the fabricated-fallback pattern
R3 forbids. `.default(true)` runs at INSERT only; it does not constrain
existing NULLs.

Pair `.notNull().default(...)` on every boolean unless NULL carries a
third meaning (almost never — "unknown enabled" usually maps to `false`).

### Where the default value lives

| Location | Use for | Note |
|---|---|---|
| **DB `.default('X')`** | Type-level "empty" values (`''`, `0`, `false`, `[]`) — won't change because they aren't product choices | **Effectively a near-permanent choice in SQLite** — every change requires a full-table rebuild that copies every row and never touches the existing ones; legacy NULL backfill must be hand-written into the rebuild's `INSERT ... SELECT`. For product-chosen values that could evolve (`'🌟'`, default model parameters), prefer service `??`. See [Default Values & Nullability § DB defaults are near-permanent](./best-practice-default-values-and-nullability.md#db-defaults-are-near-permanent). |
| **Drizzle `$defaultFn(() => …)`** | Dynamic per-row values: UUIDs, `Date.now()` | Lives in the schema file but runs in JS at INSERT time |
| **Service `dto.x ?? DEFAULT`** | Tunable product values that may evolve (e.g., inference parameters) | No migration needed when defaults change; covers all callers (handler, seeder, internal-service) |
| **Zod `.default()`** | Avoid on entity / Create / Update schemas | Bypassed by non-handler callers; forces type asymmetry; see [API Design Guidelines § E](./api-design-guidelines.md#e-default-values-do-not-live-in-zod-schemas) |

For the full rationale and decision tree, see [Default Values & Nullability](./best-practice-default-values-and-nullability.md).

## Foreign Keys

### Basic Usage

```typescript
// SET NULL: preserve record when referenced record is deleted
groupId: text().references(() => groupTable.id, { onDelete: "set null" });

// CASCADE: delete record when referenced record is deleted
topicId: text().references(() => topicTable.id, { onDelete: "cascade" });
```

### Self-Referencing Foreign Keys

For self-referencing foreign keys (e.g., tree structures with parentId), **always use the `foreignKey` operator** in the table's third parameter:

```typescript
import { foreignKey, sqliteTable, text } from "drizzle-orm/sqlite-core";

export const messageTable = sqliteTable(
  "message",
  {
    id: uuidPrimaryKeyOrdered(),
    parentId: text(), // Do NOT use .references() here
    // ...other fields
  },
  (t) => [
    // Use foreignKey operator for self-referencing
    foreignKey({ columns: [t.parentId], foreignColumns: [t.id] }).onDelete(
      "set null"
    ),
  ]
);
```

**Why this approach:**

- Avoids TypeScript circular reference issues (no need for `AnySQLiteColumn` type annotation)
- More explicit and readable
- Allows chaining `.onDelete()` / `.onUpdate()` actions

### Circular Foreign Key References

**Avoid circular foreign key references between tables.** For example:

```typescript
// ❌ BAD: Circular FK between tables
// tableA.currentItemId -> tableB.id
// tableB.ownerId -> tableA.id
```

If you encounter a scenario that seems to require circular references:

1. **Identify which relationship is "weaker"** - typically the one that can be null or is less critical for data integrity
2. **Remove the FK constraint from the weaker side** - let the application layer handle validation and consistency (this is known as "soft references" pattern)
3. **Document the application-layer constraint** in code comments

```typescript
// ✅ GOOD: Break the cycle by handling one side at application layer
export const topicTable = sqliteTable("topic", {
  id: uuidPrimaryKey(),
  // Application-managed reference (no FK constraint)
  // Validated by TopicService.setCurrentMessage()
  currentMessageId: text(),
});

export const messageTable = sqliteTable("message", {
  id: uuidPrimaryKeyOrdered(),
  // Database-enforced FK
  topicId: text().references(() => topicTable.id, { onDelete: "cascade" }),
});
```

**Why soft references for SQLite:**

- SQLite does not support `DEFERRABLE` constraints (unlike PostgreSQL/Oracle)
- Application-layer validation provides equivalent data integrity
- Simplifies insert/update operations without transaction ordering concerns

## Migrations

The migration workflow — `pnpm db:migrations:generate` after schema changes, regenerate-never-rename, CI gates, and additive-vs-rebuild — is consolidated in **[Database Construction](./database-construction.md)**.

## Field Generation Rules

The schema uses Drizzle's auto-generation features. Follow these rules:

### Auto-generated fields (NEVER set manually)

- `id`: Uses `$defaultFn()` with UUID v4/v7, auto-generated on insert
- `createdAt`: Uses `$defaultFn()` with `Date.now()`, auto-generated on insert
- `updatedAt`: Uses `$defaultFn()` and `$onUpdateFn()`, auto-updated on every update

### Using `.returning()` pattern

Always use `.returning()` to get inserted/updated data instead of re-querying:

```typescript
// Good: Use returning()
const [row] = await db.insert(table).values(data).returning();
return rowToEntity(row);

// Avoid: Re-query after insert (unnecessary database round-trip)
await db.insert(table).values({ id, ...data });
return this.getById(id);
```

### Row → Entity Mapping

All `rowToEntity` functions follow a unified paradigm: a shallow `nullsToUndefined(row)` strips DB NULL → undefined, then date fields are converted manually. See the [Row → Entity Mapping](./data-api-in-main.md#row--entity-mapping) section of `data-api-in-main.md` for the paradigm, and [services/utils/README.md](../../../src/main/data/services/utils/README.md) for function signatures and rejected alternatives.

Key principles:

- **Shallow, not recursive**: only column-level NULLs are handled; nested JSON payloads are not deep-cleaned
- **No third-party null-handling library**: the in-house `nullsToUndefined` (~10 LOC) is sufficient — avoid dependency bloat
- **No fabricated fallbacks**: `row.x ?? '🌟'` / `row.x ?? []` is forbidden — see [Default Values & Nullability § R3](./best-practice-default-values-and-nullability.md). If a value "should" always be present, fix the column constraint instead of masking NULL in the mapper.

### Soft delete support

The schema supports soft delete via `deletedAt` field (see `createUpdateDeleteTimestamps`).
Business logic can choose to use soft delete or hard delete based on requirements.

## Raw SQL Queries & Recursive CTEs

Drizzle's `casing: 'snake_case'` only applies to the ORM channel
(`db.select()`, `db.insert()`, `db.update()`). Raw SQL via `db.all(sql\`...\`)`
returns SQLite's native snake_case columns with **no runtime mapping** — the
TypeScript generic on `db.all<T>()` is a compile-time assertion only. So
`db.all<typeof messageTable.$inferSelect>(sql\`SELECT * FROM message\`)` lies
to the type system: at runtime `row.parentId` is `undefined`; the actual key
is `parent_id`.

Recursive CTEs (`WITH RECURSIVE`) are the main reason raw SQL is needed —
Drizzle does not yet support them in the query builder.

### Pattern: CTE for IDs, ORM for rows

Keep raw SQL minimal. Use the CTE to compute the **set of IDs** you need
(single-word column, casing-safe), then fetch full rows through the ORM where
camelCase mapping is automatic and fully type-safe.

```typescript
// Step 1 — recursive CTE returns ID-only
const idRows = await db.all<{ id: string }>(sql`
  WITH RECURSIVE ancestors AS (
    SELECT id, parent_id FROM message WHERE id = ${nodeId} AND deleted_at IS NULL
    UNION ALL
    SELECT m.id, m.parent_id FROM message m
    INNER JOIN ancestors a ON m.id = a.parent_id
    WHERE m.deleted_at IS NULL
  )
  SELECT id FROM ancestors
`)
const ids = idRows.map((r) => r.id)

// Step 2 — fetch full rows via ORM (auto camelCase)
const rows = ids.length > 0
  ? await db.select().from(messageTable).where(inArray(messageTable.id, ids))
  : []

// Step 3 — restore CTE order (IN-list does not preserve order)
const order = new Map(ids.map((id, i) => [id, i]))
rows.sort((a, b) => order.get(a.id)! - order.get(b.id)!)
```

If the CTE computes a derived value (e.g. `tree_depth`), select it alongside
`id` — single-word aliases are also casing-safe — and join it back via a `Map`.

**Don't** `SELECT *` with raw SQL or write a snake→camel helper to patch the
output: both bypass Drizzle's type-safety and let future schema changes drift
silently.

Reference implementations: `MessageService.getTree` / `getBranchMessages` /
`getPathToNode`, `KnowledgeItemService.getCascadeIdsInBase`.

## Custom SQL & FTS5

Triggers, FTS5 virtual tables, the `CUSTOM_SQL_STATEMENTS` every-boot replay (and why it's ~0.1 ms O(1)), the `fts_rowid` rowid-stability rule, and the idempotency rules (vtables `IF NOT EXISTS`, triggers `DROP+CREATE`) are consolidated in **[Database Construction](./database-construction.md)** (§3 Custom SQL, §4 FTS5).

## Seeding

For initial data population (default preferences, builtin languages, preset providers), see [Database Seeding Guide](./database-seeding-guide.md).

## Write Serialization (`DbService.withWriteTx`)

`application.get('DbService').withWriteTx(fn)` runs `fn` as one synchronous `BEGIN IMMEDIATE` transaction on the single persistent connection.

**When it earns its keep.** With better-sqlite3 every statement is atomic on its own — a lone `getDb().insert(...).run()` is a complete implicit transaction and needs no wrapper. A transaction earns its keep only when a mutation must commit **all-or-nothing across more than one statement**:

- **Use it** when composing multiple writes, or a read-then-write (validate/select then insert/update/delete), into one atomic unit — the majority of write paths here (create/update/delete that also touch join tables, purge pins/tags, reorder via neighbour reads, or cascade-delete). The premise is **atomicity** (rollback across statements), not serialization: the single synchronous connection already serializes every write by construction.
- **Don't use** for a single autocommit write — call `getDb()` directly, or pass `getDb()` to the write's `*Tx` form (`this.fooTx(getDb(), …)`). Routing a lone write through `withWriteTx` buys nothing for atomicity and falsely implies a multi-statement invariant. The `*Tx` form stays composable, so the same primitive can still be pulled into a larger `withWriteTx` when a caller genuinely needs multi-write atomicity.

**`withWriteTx` vs `db.transaction()`.** `withWriteTx(fn)` is a thin wrapper over `getDb().transaction(fn, { behavior: 'immediate' })` behind the `isReady` guard. `BEGIN IMMEDIATE` takes the write lock up front, which only matters when a second connection writes concurrently; the main DB uses one connection, so it behaves identically to a plain `db.transaction(fn)`. Prefer `withWriteTx` as the conventional, greppable write seam with the correct write-intent default — but a direct `db.transaction()` is **equivalent** for atomicity and not an error. `withWriteTx` is **not** the readiness gate: `getDb()` already throws when the DB isn't ready, so writes made outside `withWriteTx` are still guarded. The single synchronous connection serializes all access, so there is no process-wide mutex and no `SQLITE_BUSY` retry — the libsql-era serialization this wrapper originally existed for (upstream #288) is gone.

### Signature

```ts
withWriteTx<T>(fn: (tx: DbOrTx) => T): T
```

`fn` must be **synchronous** — better-sqlite3 rejects a Promise-returning transaction callback. Internals: one synchronous `BEGIN IMMEDIATE` transaction behind the `isReady` guard; the single connection serializes all access, so callers never contend.

### Usage

```ts
const dbService = application.get('DbService')

// A single write does NOT use withWriteTx — go straight through getDb(),
// or pass it to the *Tx form:
jobService.setMetadataTx(dbService.getDb(), jobId, merged)

// withWriteTx is for composing multiple writes into one atomic transaction:
dbService.withWriteTx((tx) => {
  jobService.cancelByIdsTx(tx, ids, error)
  jobService.resetToPendingByIdsTx(tx, otherIds)
})
```

### Two-form DAO pattern

Each write method has a composable `*Tx` form and a thin non-Tx wrapper. A single-write method's wrapper passes `getDb()` to the `*Tx` form; a multi-write / read-then-write method's wrapper composes one or more `*Tx` calls inside a single `withWriteTx`. Either way the `*Tx` form stays composable, so batch/recovery paths can pull it into a larger transaction. See `JobService` / `JobScheduleService` for canonical examples.

```ts
cancelByIdsTx(tx: DbOrTx, ids: string[], error: JobError): void { /* SQL via tx */ }

// Single write → call the *Tx form with getDb() directly (no withWriteTx):
cancelByIds(ids: string[], error: JobError): void {
  return this.cancelByIdsTx(application.get('DbService').getDb(), ids, error)
}
```

### Rules

| Rule | Rationale |
| --- | --- |
| `fn` must be synchronous and only do DB ops — no network / file IO / handler execution | better-sqlite3 rejects a Promise-returning callback; the transaction blocks the single connection until `fn` returns |
| Do not wrap reads | WAL mode gives readers snapshot isolation; wrapping adds needless serialization |
| Don't wrap a single autocommit write | one statement is already an implicit transaction — call `getDb()` (or the `*Tx` form) directly; `getDb()` still guards readiness |
| Wrap tight loops in one `withWriteTx`, not per-iteration | One `BEGIN IMMEDIATE` transaction vs N |

### When to migrate existing callsites

| Path | Action |
| --- | --- |
| Multi-statement / read-then-write mutations | Wrap in a transaction — `withWriteTx` (preferred) or a direct `db.transaction()` |
| Single-statement writes | Don't wrap — call `getDb()` (or the `*Tx` form) directly |
| Boot-only writes (migrations, seeders) | Leave |
| Pure reads | Leave |

### Reference

[Concurrency & Locks — Layer 0](../job-and-scheduler/concurrency-and-locks.md).