Add docs/references/data/database-construction.md as the single home for how the SQLite DB is built and evolved: boot init order, drizzle migrations (regenerate-never-rename, CI gates, additive-vs-rebuild), the CUSTOM_SQL_STATEMENTS every-boot replay (~0.1ms O(1)), and the FTS5 fts_rowid rowid-stability rule, plus a gotchas table. Move the Migrations and Custom SQL sections out of database-patterns.md into it (left as pointers), and index it from data/README.md and src/main/data/db/README.md.
Fix stale references found while consolidating: wrong generate command, customSql.ts vs customSqls.ts, columnHelpers.ts vs _columnHelpers.ts, a nonexistent messageFts.ts, yarn vs pnpm, the v2-todo single-0000 claim, the generated-column wording, and v1 data.blocks vocabulary in the testing doc.
libsql client-ts upstream issue #288 makes PRAGMA busy_timeout ineffective
for async transactions, so concurrent db.transaction() calls reliably surface
SQLITE_BUSY. Introduce DbService.withWriteTx as a serialized write helper:
- Process-wide FIFO mutex (async-mutex) serializes write transactions.
- libsql client's default BEGIN IMMEDIATE protects against read-then-write
tx upgrade failures (no override needed at the drizzle layer).
- Single 50ms BUSY retry guards against transient external locks.
Reads do NOT need this — WAL gives readers snapshot isolation that is never
blocked by writers.
Includes unit tests (FIFO ordering, finally release on throw, single BUSY
retry, persistent BUSY rethrow, non-BUSY passthrough) plus a real-libsql
integration test. Updates the DbService test mock with a passthrough
withWriteTx so dependent services do not throw "is not a function" in
tests. Documents the API in database-patterns.md and points
CLAUDE.md / data-api-overview.md at the new pattern.
- data-api-in-main: add Registry Sub-Resource Endpoints section -
GET-only for stateless reads, AIP-136 colon notation for derived
views, registry packages are main-only (bundle waste + merge
already in main)
- best-practice-layered-preset-pattern: preset-only static fields
must merge in rowToEntity rather than via parallel endpoint;
document acceptable exceptions for catalog and specialised
surfaces
- data-ordering-guide section 2: drop user_provider.isEnabled from
the Live partition list - the table is whole-table ordered
(already correct in section 7)
- database-patterns: flag boolean columns without .notNull() as a
common R3 offender, with concrete wrong/right example
The "DB defaults are near-permanent" guidance previously claimed
drizzle-kit cannot auto-generate the SQLite table rebuild for DEFAULT
changes. That's incorrect: drizzle-kit emits the full PRAGMA / CREATE
__new_xxx / INSERT SELECT / DROP / RENAME / re-create-indexes sequence
automatically.
Rewrite the supporting argument from "tooling can't do it" to:
- every change forces a full-table rebuild at runtime (schema lock,
~2x temporary disk, slow on multi-GB tables);
- DEFAULT changes never touch existing rows;
- legacy NULL backfill must be hand-written into the rebuild's
INSERT SELECT line via COALESCE - drizzle-kit will not synthesize
that.
Conclusion (near-permanent) and Safe bias remain unchanged - only the
underlying mechanics are corrected.
Drop the drizzle-orm#2489 reference and the "drizzle-kit generate
--custom" workaround it implied.
Codify what was implicit: putting a value into a DB column DEFAULT for the first time costs nothing, but changing it later in SQLite is expensive and asymmetric (no ALTER COLUMN SET DEFAULT; drizzle-kit emits only an explanatory comment without naming the affected column — drizzle-orm#2489). So the first write is effectively the final one, and the placement bias should flip from "DB by default" to "service unless certain".
Verified via drizzle-orm maintainer (Andrii Sherman) Medium article, drizzle-orm GitHub issues #2489 / #5360 / #1313, and Drizzle docs via context7. Empirical: this repo's 7 existing migrations are all ALTER TABLE ADD COLUMN — zero ALTER COLUMN — confirming the team has so far avoided the manual rebuild path organically.
Spec changes in best-practice-default-values-and-nullability.md:
- New section "DB defaults are near-permanent" between DM2 and Quick chooser
- DM2 → DB DEFAULT row note tightened, links to new section
- Quick chooser flipped: "unsure" now → Service ?? (was "try DB first")
- Standard Layered Design: emoji moved from DB DEFAULT to service ?? (product-chosen value)
- Anti-patterns: emoji-mask row's Correct column updated; new row for "speculative DB DEFAULT thinking I can tune later"
- Case Study A's Fix description aligned with new bias
- Related References: drizzle-orm#2489 added
Companion update in database-patterns.md: same precision for the DB .default('X') cell in "Where the default value lives".
Establish team standards for placing default values across the data stack (DB DEFAULT / Drizzle $defaultFn / Zod .default() / service ??) and judging column nullability.
Originating context: PR #14689 fixed a PATCH leakage bug rooted in defaults living in three places at once (DB, Zod Create schema, row mapper) for the assistant entity. The follow-up discussion recovered general principles that other entities (agent, message) also violate; this doc captures them as a reference for future schema/service work.
- New: best-practice-default-values-and-nullability.md — Five rules (R1-R5), decision matrices for nullability and default placement, standard layered design example, anti-patterns table, case studies (assistant, modelId, agent.accessiblePaths)
- api-design-guidelines.md: refine Rule C Update derivation guidance; add Rule E discouraging Zod .default() on entity / Create / Update schemas
- data-api-in-main.md: upgrade row-mapper ?? fallbacks from "needs hand-write" tolerance to anti-pattern; add Write-path defaults section codifying R4
- database-patterns.md: add Column Nullability and Defaults section; add R3 no-fabricated-fallbacks bullet to Row → Entity Mapping
- README.md: index entry under Reference Guides
No code changes. Implementation follow-up will land in separate PRs that apply these rules entity by entity.
Introduce shared `services/utils/rowMappers.ts` exporting three helpers:
`nullsToUndefined` (renamed from the previous `stripNulls`, with a
corrected type signature that preserves `notNull()` columns unchanged),
`timestampToISO` for guaranteed-present timestamps, and
`timestampToISOOrUndefined` for nullable ones.
Migrate 9 services off hand-rolled null/date handling:
- Eliminate the `stripNulls` duplicate in MiniAppService
- Replace 18 repetitive createdAt/updatedAt ternaries with helper calls
- Fix McpServerService misuse where an optional domain field was being
forced to a synthesized "now" value; restore honest undefined semantics
- Simplify KnowledgeBaseService.rowToKnowledgeBase via the spread idiom,
bypassing `clean` for the `T | null` embeddingModelId field
Document the paradigm in docs/references/data/data-api-in-main.md with
standard and advanced skeletons, plus an explicit "when NOT to spread"
list covering services with field renaming, custom fallbacks, computed
fields, or sensitive-data sanitization. Per-helper design decisions
(shallow vs. recursive, rejected alternatives) live in
services/utils/README.md.
Drizzle's `casing: 'snake_case'` config only applies to the ORM channel.
Raw SQL via `db.all(sql`...`)` returns SQLite's native snake_case columns
with no runtime mapping — the TypeScript generic on `db.all<T>()` is a
compile-time assertion only. The recursive CTEs in `getTree`,
`getBranchMessages`, and `getPathToNode` used `SELECT *` and asserted
results to `messageTable.$inferSelect` (camelCase), so downstream code
silently read `undefined` from `parentId` and `siblingsGroupId`. This
broke multi-model message grouping whenever the renderer rebuilt
branches from the database (page refresh, topic switch).
Switch to the "CTE for IDs, ORM for rows" pattern: the recursive CTE
collects IDs only (single-word columns are casing-safe), then full rows
are fetched via `db.select().from(messageTable).where(inArray(...))`
where Drizzle applies camelCase mapping automatically. CTE order is
restored via a Map since SQL IN-list does not preserve order. Rename
`tree_depth` to `treeDepth` for naming consistency.
Document the pattern as the project standard in
`docs/references/data/database-patterns.md` and add regression tests
covering all three methods plus the multi-model siblings scenario.
Replace the ad-hoc seeding system with a journal-based architecture that
tracks seed versions via app_state table and skips unchanged seeds on startup.
- Introduce ISeeder interface with name/version/description/run() contract
- Add SeedRunner orchestrator with journal-based version tracking
- Rename ISeed -> ISeeder, migrate() -> run() (align with industry conventions)
- Rename *Seed -> *Seeder classes, *Seeding.ts -> *Seeder.ts files
- Move seeders into seeding/seeders/ subdirectory for better organization
- Add hashObject utility for auto-computing version from static data sources
- PreferenceSeeder/TranslateLanguageSeeder: auto checksum via hashObject()
- PresetProviderSeeder: lazy getter using RegistryLoader.getProvidersVersion()
- Simplify DbService.onInit() to single SeedRunner.runAll() call
- Add SeedRunner tests and PreferenceSeeder tests
- Add database-seeding-guide.md with version strategy documentation
Signed-off-by: fullex <0xfullex@gmail.com>