mirror of https://github.com/CherryHQ/cherry-studio.git synced 2026-07-03 12:27:41 +08:00

DataApi in Main Process

This guide covers how to implement API handlers and services in the Main process.

Architecture Layers

Handlers → Services → Database

Handlers: Thin layer, extract params, call service, transform response
Services: Business logic, validation, transaction coordination, data access via Drizzle ORM
Database: Drizzle ORM + SQLite

Transport Adapters

ApiServer is transport-agnostic. Adapters in api/core/adapters/ bridge specific transports (IPC, HTTP) to ApiServer. Each adapter implements Disposable for automatic lifecycle cleanup. See IpcAdapter.ts JSDoc for design rationale and extension guide.

Implementing Handlers

Location

src/main/data/api/handlers/

Handler Responsibilities

Extract parameters from request
Delegate to business service
Transform response for IPC
NO business logic here

Handler Type Annotation

Every per-module handler record MUST be annotated with HandlersFor<XxxSchemas>. This is the canonical shape for all files in src/main/data/api/handlers/ — not a convention to choose among alternatives.

HandlersFor<XxxSchemas> enforces two invariants:

Paths are narrowed to the module's own schema. Path strings outside XxxSchemas (typos, cross-module leaks) produce a compile error.
Methods are exhaustive. Every path + method declared in XxxSchemas must have a handler; adding an endpoint to the schema without a matching handler is a compile error.
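
To make the two invariants concrete, here is a minimal sketch of how a mapped type can enforce them. SketchHandlersFor, BodyOf, ResponseOf, and NoteSchemas are hypothetical names invented for illustration; the real HandlersFor in @shared/data/api/apiTypes is likely richer (params, query, and so on), so treat this as an approximation of the idea rather than its definition.

```typescript
// Hypothetical stand-in for HandlersFor; it only illustrates the two invariants.
type BodyOf<E> = E extends { body: infer B } ? B : undefined
type ResponseOf<E> = E extends { response: infer R } ? R : never

type SketchHandlersFor<S> = {
  // One required key per schema path: unknown paths and missing paths both fail to compile.
  [P in keyof S]: {
    // One required key per method declared on that path.
    [M in keyof S[P]]: (req: { body: BodyOf<S[P][M]> }) => Promise<ResponseOf<S[P][M]>>
  }
}

// Hypothetical module schema, not part of the codebase.
type NoteSchemas = {
  '/notes': {
    GET: { response: string[] }
    POST: { body: { text: string }; response: string }
  }
}

const notes: string[] = []

const noteHandlers: SketchHandlersFor<NoteSchemas> = {
  '/notes': {
    GET: async () => [...notes],
    POST: async ({ body }) => {
      notes.push(body.text)
      return body.text
    }
  }
}
```

Under this sketch, deleting the POST entry or adding a misspelled path such as '/nots' to noteHandlers would be rejected by the compiler, which is the behavior the two invariants describe.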

Example Handler

// handlers/topics.ts
import { topicService } from '@data/services/TopicService'
import type { HandlersFor } from '@shared/data/api/apiTypes'
import type { TopicSchemas } from '@shared/data/api/schemas/topics'

export const topicHandlers: HandlersFor<TopicSchemas> = {
  '/topics': {
    GET: async ({ query }) => {
      const { page = 1, limit = 20 } = query ?? {}
      return await topicService.list({ page, limit })
    },
    POST: async ({ body }) => {
      return await topicService.create(body)
    }
  },
  '/topics/:id': {
    GET: async ({ params }) => {
      return await topicService.getById(params.id)
    },
    PUT: async ({ params, body }) => {
      return await topicService.replace(params.id, body)
    },
    PATCH: async ({ params, body }) => {
      return await topicService.update(params.id, body)
    },
    DELETE: async ({ params }) => {
      await topicService.delete(params.id)
    }
  }
}

Register Handlers

// handlers/index.ts
import { topicHandlers } from './topics'
import { messageHandlers } from './message'

export const allHandlers: ApiImplementation = {
  ...topicHandlers,
  ...messageHandlers
}

Implementing Services

Location

src/main/data/services/

Service Responsibilities

Business validation
Transaction coordination
Domain workflows
Data access via Drizzle ORM

Scope limit: A DataApi service is the data business-logic layer — its domain workflows orchestrate SQLite reads/writes only, never fs/network/process/external-service side effects, even alongside a legitimate DB write and no matter how deeply nested. See Hard Rule: No Non-Data Side Effects.

Cross-Service Table Access

Each table has exactly one owning service — the rule is split by access kind:

Writes (insert / update / delete) to a table you do not own: forbidden. Call the owner's method (pass tx for transactional writes — owners' mutation methods accept Pick<DbType, 'delete' | 'insert' | ...> as the first arg). If a needed shape is missing, add a method on the owner; bulk needs get a bulk method (e.g. purgeForEntitiesTx).
Reads from a table you do not own: allowed when inlining is the simpler path. A cross-table JOIN that combines the owner's table into your query in one round-trip is fine; reach for the owner's read API only when the read needs business logic the owner already encapsulates.

Why writes are strict: the owning service is the single source of truth for the table's invariants (unique indices, orderKey semantics, soft-delete, audit timestamps) and emits its mutation logs. Foreign writes split that knowledge across every caller and silence the log narrative.

✅ ProviderService.delete → pinService.purgeForEntitiesTx(tx, 'model', ids)
✅ AssistantService.list JOINs entity_tag + tag inline to load tags per assistant
❌ ProviderService.delete → tx.delete(pinTable).where(...) directly

If you're tempted to write "going through XxxService would be over-engineering" — stop. A 5-line method on the owner is not over-engineering; a foreign service writing to its table is.

Breaking a circular dependency (`dataServiceRegistry`)

When two services call each other (A→B and B→A), a top-level import { bService } from './BService' forms a value-level import cycle the bundler cannot order. Do not paper over it with await import('./BService') at the call site — that infects the caller with async, hides the edge from static tooling, and is easy to reintroduce.

Resolve the sibling lazily through dataServiceRegistry instead:

the sibling self-registers at the bottom of its module: registerDataService('BService', bService)
the caller resolves at call time: const bService = getDataService('BService')

The registry imports services only as import type, so it stays a sink in the import graph and no value cycle can form. Only the services that form a cycle are added to the registry and self-register; every other data service stays a plain direct-import singleton and never touches it. Acyclic cross-calls keep using a plain direct import (e.g. pinService above) — reach for the registry only when a real cycle exists.

Tests: the registry is populated by module load — in production each service is loaded by its DataApi handler before any call runs. A unit test that drives a cross-service path must load the sibling so it self-registers, via a side-effect import:

import '@data/services/BService' // self-registers; otherwise getDataService throws "not registered yet"

Contract and rationale: src/main/data/services/dataServiceRegistry.ts.
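
The register-then-resolve flow can be sketched in a single file as follows. This is not the real dataServiceRegistry.ts: the class names, the DataServiceMap type, and the Map-based storage are assumptions made for illustration; only the registerDataService / getDataService names and the "not registered yet" failure mode come from the text above.

```typescript
// Hypothetical sketch; the real contract lives in dataServiceRegistry.ts.
class AServiceSketch {
  label(): string {
    return 'A'
  }
  describe(): string {
    // Sibling resolved at call time, so no top-level value import of B is needed.
    return `A sees ${getDataService('BService').label()}`
  }
}

class BServiceSketch {
  label(): string {
    return 'B'
  }
  describe(): string {
    return `B sees ${getDataService('AService').label()}`
  }
}

// In a multi-file layout the registry would reference these via `import type` only.
type DataServiceMap = { AService: AServiceSketch; BService: BServiceSketch }

const registry = new Map<keyof DataServiceMap, DataServiceMap[keyof DataServiceMap]>()

function registerDataService<K extends keyof DataServiceMap>(name: K, service: DataServiceMap[K]): void {
  registry.set(name, service)
}

function getDataService<K extends keyof DataServiceMap>(name: K): DataServiceMap[K] {
  const service = registry.get(name)
  if (!service) {
    throw new Error(`Data service "${name}" is not registered yet`)
  }
  return service as DataServiceMap[K]
}

// Each service self-registers at the bottom of its own module.
const aService = new AServiceSketch()
registerDataService('AService', aService)
const bService = new BServiceSketch()
registerDataService('BService', bService)
```

Because each lookup happens inside a method body, neither class needs the other to exist at module-evaluation time; only the call must happen after both have registered.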

Example Service

The list() below shows the offset pagination shape for illustration. The real TopicService is cursor-paginated (listByCursor); a production offset list looks like AssistantService.list. See the Pagination Guide for both server patterns.

// services/TopicService.ts
import { eq, desc, sql } from 'drizzle-orm'
import { application } from '@application'
import { topicTable } from '@data/db/schemas/topic'
import { DataApiErrorFactory } from '@shared/data/api'

export class TopicService {
  private get db() {
    return application.get('DbService').getDb()
  }

  async list(options: { page: number; limit: number }) {
    const { page, limit } = options
    const offset = (page - 1) * limit

    const [items, countResult] = await Promise.all([
      this.db.select().from(topicTable)
        .orderBy(desc(topicTable.updatedAt))
        .limit(limit).offset(offset),
      this.db.select({ count: sql<number>`count(*)` }).from(topicTable)
    ])

    return { items, total: countResult[0].count, page, limit }
  }

  async getById(id: string) {
    const [topic] = await this.db.select().from(topicTable)
      .where(eq(topicTable.id, id)).limit(1)
    if (!topic) {
      throw DataApiErrorFactory.notFound('Topic', id)
    }
    return topic
  }

  async create(data: CreateTopicDto) {
    this.validateTopicData(data)
    const [topic] = await this.db.insert(topicTable).values(data).returning()
    return topic
  }

  async update(id: string, data: Partial<UpdateTopicDto>) {
    await this.getById(id) // Throws if not found
    const [topic] = await this.db.update(topicTable)
      .set(data).where(eq(topicTable.id, id)).returning()
    return topic
  }

  async delete(id: string) {
    await this.getById(id) // Throws if not found
    await this.db.delete(topicTable).where(eq(topicTable.id, id))
  }

  private validateTopicData(data: CreateTopicDto) {
    if (!data.name?.trim()) {
      throw DataApiErrorFactory.validation({ name: ['Name is required'] })
    }
  }
}

export const topicService = new TopicService()

Write-path defaults

service.create() passes a value into db.insert(...).values({...}) only for columns that are NOT NULL, have neither a DB DEFAULT nor a $defaultFn, and are not already supplied by the DTO:

async create(dto: CreateXxxDto) {
  // .returning() yields an array; unwrap the single inserted row
  const [row] = await this.db.insert(xxxTable).values({
    ...dto,
    settings: dto.settings ?? DEFAULT_XXX_SETTINGS  // service-owned default for a tunable product value
  }).returning()
  return row
}

For everything else — fields with DB DEFAULTs, $defaultFn columns, or genuinely nullable columns — omit the field from values({...}). Drizzle leaves it out of the SQL; the DB applies its own default (or NULL for nullable columns). Restating the DB's knowledge in app code creates drift risk when defaults later change.

For the cross-layer placement decision tree, see Default Values & Nullability.

Row → Entity Mapping

Each Entity Service provides a rowToEntity function that bridges a Drizzle row to its domain entity. Use nullsToUndefined (from services/utils/rowMappers.ts) for the SQLite NULL → TypeScript undefined translation.

Standard skeleton:

import { nullsToUndefined, timestampToISO } from './utils/rowMappers'

function rowToMcpServer(row: typeof mcpServerTable.$inferSelect): McpServer {
  const clean = nullsToUndefined(row)
  return {
    ...clean,
    type: clean.type as McpServer['type'], // narrow enum
    installSource: clean.installSource as McpServer['installSource'],
    createdAt: timestampToISO(row.createdAt),
    updatedAt: timestampToISO(row.updatedAt)
  }
}

Audit columns generated by createUpdateTimestamps are DB-level .notNull(), so row.createdAt / row.updatedAt narrow to number and timestampToISO is the default. timestampToISOOrUndefined is reserved for construction paths where the entire source row may be absent (e.g. MiniAppService.builtinToMiniApp merging a builtin definition with an optional dbRow).

Advanced skeleton — preserving T | null fields:

When the domain type declares a field as T | null (e.g. KnowledgeBaseSchema.embeddingModelId: z.string().nullable()), bypass clean for that field and reference row directly. nullsToUndefined narrows all top-level nulls to undefined and would break the T | null contract if the field came from clean.

function rowToKnowledgeBase(row: typeof knowledgeBaseTable.$inferSelect): KnowledgeBase {
  const clean = nullsToUndefined(row)
  return {
    ...clean,
    // Preserve `string | null` contract — bypass clean (which would narrow null → undefined)
    embeddingModelId: row.embeddingModelId,
    createdAt: timestampToISO(row.createdAt),
    updatedAt: timestampToISO(row.updatedAt)
  }
}

Rule of thumb: domain field typed T | null → use row.x; domain field typed T? or T → use clean.x (or ...clean).
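
A rough sketch of why the rule of thumb holds. nullsToUndefinedSketch below is a hypothetical re-implementation written only for this example (the real helper is nullsToUndefined in services/utils/rowMappers.ts and may differ in typing and details); the row and entity types are likewise invented.

```typescript
// Hypothetical sketch of a shallow NULL -> undefined mapper.
type NullsToUndefinedSketch<T> = {
  [K in keyof T]: null extends T[K] ? Exclude<T[K], null> | undefined : T[K]
}

function nullsToUndefinedSketch<T extends Record<string, unknown>>(row: T): NullsToUndefinedSketch<T> {
  const out: Record<string, unknown> = {}
  for (const key of Object.keys(row)) {
    const value = row[key]
    // Shallow: only top-level nulls are rewritten; nested objects pass through untouched.
    out[key] = value === null ? undefined : value
  }
  return out as unknown as NullsToUndefinedSketch<T>
}

// Invented row and entity shapes for illustration.
type KnowledgeBaseRowSketch = {
  id: string
  description: string | null
  embeddingModelId: string | null
  settings: { note: string | null }
}

type KnowledgeBaseSketch = {
  id: string
  description?: string // domain field typed T? -> take it from clean
  embeddingModelId: string | null // domain field typed T | null -> take it from row
  settings: { note: string | null }
}

function rowToKnowledgeBaseSketch(row: KnowledgeBaseRowSketch): KnowledgeBaseSketch {
  const clean = nullsToUndefinedSketch(row)
  return { ...clean, embeddingModelId: row.embeddingModelId }
}

const kbEntity = rowToKnowledgeBaseSketch({
  id: 'kb1',
  description: null,
  embeddingModelId: null,
  settings: { note: null }
})
```

In this sketch kbEntity.description comes back as undefined, kbEntity.embeddingModelId stays null because it was read from row, and the nested settings.note is left as null because the mapper is shallow.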

When nullsToUndefined + spread is NOT a fit:

Some rowToEntity functions do too much to benefit from spread. Keep them hand-written when any of the following apply:

Field renaming: row.parameters → domain parameterSupport (ModelService)
Computed / merged fields: authType derivation, apiFeatures merging from defaults (ProviderService)
Sensitive data sanitization: apiKeys stripping — ...clean would leak unsanitized values
Discriminator-driven field stripping with brand validation: branded discriminated union where each variant declares only its own fields — nullsToUndefined + spread would emit absent fields as undefined and break the BO shape. Dispatch on the discriminator and call schema.parse per variant. Example: FileEntryService.rowToFileEntry for FileEntry (variants on origin); see src/shared/data/types/file/fileEntry.ts header (§"DB row vs Business Object") for the full DB-CHECK / BO-narrow rationale.

Anti-pattern — ?? fallbacks for fabricated defaults:

row.x ?? '🌟' / row.x ?? [] inside rowToEntity is forbidden. Such a fallback is itself evidence that the column should be NOT NULL with a DB DEFAULT or $defaultFn (see Default Values & Nullability § R3). The one legitimate exception is an entity field that is genuinely T | null (e.g. assistant.modelId): bypass clean and reference row.x directly to preserve the NULL contract. That is the Advanced skeleton above, not a ?? fallback.

Conventions:

DB NULL ↔ domain undefined boundary. Domain types under @shared/data/types/* use optional fields (?:) rather than T | null, aligning with the Google TypeScript Style Guide and keeping null from leaking to the renderer via IPC. nullsToUndefined(row) is the only place this translation happens.
Batch vs single-field null handling. For processing an entire row, always use nullsToUndefined(row) + spread — do NOT hand-write per-field ?? undefined. For single values that are NOT from a row (DTO fields, computed values, function returns), inline value ?? undefined is enough — TypeScript narrows T | null to T | undefined automatically at the ?? expression. Do NOT wrap the single-field case in a helper.
Date fields: two helpers, clear boundary. timestampToISO(value: number | Date): string is the default for rowToEntity — audit columns from createUpdateTimestamps are .notNull(), so the DB row hands back a real number. timestampToISOOrUndefined(value: number | Date | null | undefined): string | undefined is reserved for merge paths where the source row itself may be absent (e.g. a builtin/preset definition without a preference row). Do NOT use timestampToISOOrUndefined as a "safer default" — if your input is a DB row, it always has these fields.

For function signature details and design-decision history (e.g. why shallow-not-recursive, why not dnull), see services/utils/README.md.

Cursor (keyset) pagination. List endpoints that page by a (sortKey, id) tuple use the shared codec + ordering helper in services/utils/keysetCursor.ts — decodeListCursor / encodeCursor for the <key>:<id> wire format, and keysetOrdering(keyCol, idCol, { major, tie }) which returns both the strict-tuple WHERE predicate (.where(cursor)) and its matching orderBy, derived from one direction spec. Do NOT hand-write cursor encode/decode, the keyset WHERE tuple, or the ORDER BY in a service. See services/utils/README.md for the list-vs-search decode policy split and boundaries, and the Pagination Guide for the end-to-end pagination model (offset + cursor).
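
For readers new to keyset paging, the idea behind the (sortKey, id) tuple can be sketched in memory. Everything here is hypothetical and exists only to show the strict-tuple comparison and the <key>:<id> wire format: encodeCursorSketch, decodeCursorSketch, comesAfter, and listPageSketch are not the helpers in services/utils/keysetCursor.ts, and a real service should call those helpers rather than copy this. The sketch assumes a numeric sort key, descending order on both columns, and a split on the first colon.

```typescript
type RowSketch = { id: string; updatedAt: number }
type CursorSketch = { key: number; id: string }

function encodeCursorSketch(row: RowSketch): string {
  return `${row.updatedAt}:${row.id}`
}

function decodeCursorSketch(raw: string): CursorSketch {
  // Assumption: the key is numeric, so the first colon separates key from id.
  const sep = raw.indexOf(':')
  const key = Number(raw.slice(0, sep))
  const id = raw.slice(sep + 1)
  if (sep <= 0 || !Number.isFinite(key) || id.length === 0) {
    throw new Error(`Malformed cursor: ${raw}`)
  }
  return { key, id }
}

// Strict tuple comparison under (updatedAt DESC, id DESC): the row sorts after the cursor.
function comesAfter(row: RowSketch, cursor: CursorSketch): boolean {
  return row.updatedAt < cursor.key || (row.updatedAt === cursor.key && row.id < cursor.id)
}

function listPageSketch(
  rows: RowSketch[],
  limit: number,
  cursor?: string
): { items: RowSketch[]; nextCursor: string | undefined } {
  const decoded = cursor === undefined ? undefined : decodeCursorSketch(cursor)
  // The ORDER BY must match the predicate: same columns, same directions.
  const ordered = [...rows].sort(
    (a, b) => b.updatedAt - a.updatedAt || (a.id < b.id ? 1 : a.id > b.id ? -1 : 0)
  )
  const remaining = decoded ? ordered.filter((r) => comesAfter(r, decoded)) : ordered
  const items = remaining.slice(0, limit)
  const last = items[items.length - 1]
  const nextCursor = remaining.length > limit && last ? encodeCursorSketch(last) : undefined
  return { items, nextCursor }
}

const sampleRows: RowSketch[] = [
  { id: 'a', updatedAt: 30 },
  { id: 'b', updatedAt: 20 },
  { id: 'c', updatedAt: 20 },
  { id: 'd', updatedAt: 10 }
]

const firstPage = listPageSketch(sampleRows, 2)
const secondPage = listPageSketch(sampleRows, 2, firstPage.nextCursor)
```

The id tie-breaker is what keeps rows b and c (same updatedAt) from being skipped or repeated across the page boundary, which is why the predicate and the ordering have to be derived from one direction spec.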

Service with Transaction

async createTopicWithMessage(data: CreateTopicWithMessageDto) {
  const db = application.get('DbService').getDb()

  return await db.transaction(async (tx) => {
    const [topic] = await tx.insert(topicTable).values(data.topic).returning()

    const [message] = await tx.insert(messageTable).values({
      ...data.message,
      topicId: topic.id
    }).returning()

    return { topic, message }
  })
}

Transaction Method Naming

Service methods accepting a Drizzle transaction:

Aspect	Rule
Parameter position	`tx` is the first parameter
Method name	ends with `Tx`
Parameter type	`Pick<DbType, '...'>` with the minimum operations needed
Non-Tx wrapper	optional; thin `db.transaction(...)` wrapper, only when a caller needs to own the transaction

// ✅
async purgeForEntityTx(tx: Pick<DbType, 'delete'>, entityType: EntityType, entityId: string): Promise<void>

// ❌ tx not first
async purgeForEntity(entityType: EntityType, entityId: string, tx: Pick<DbType, 'delete'>)
// ❌ missing Tx suffix
async purgeForEntity(tx: Pick<DbType, 'delete'>, entityType: EntityType, entityId: string)
// ❌ over-broad type
async purgeForEntityTx(tx: DbType, entityType: EntityType, entityId: string)

Optional non-Tx wrapper:

async purgeForEntity(entityType: EntityType, entityId: string): Promise<void> {
  await this.db.transaction((tx) => this.purgeForEntityTx(tx, entityType, entityId))
}

Repository Pattern (Strongly Discouraged)

⚠️ Do NOT create Repository files by default. Services handle both business logic and data access directly via Drizzle ORM. This is an intentional design decision.

Only create a separate Repository when you are 1000% certain it is absolutely necessary — e.g., extremely complex multi-table queries with joins/CTEs that would make the Service unreadable, AND the query logic is reused across multiple services.

If in doubt, keep it in the Service. The overhead of an extra architectural layer is not justified for this project's scale (Electron desktop app + SQLite).

Registry Services (Supplementary)

In rare cases where a handler needs to merge read-only preset data (shipped JSON/TS) with database data, a Registry Service may be introduced. This is uncommon — the vast majority of services are Entity Services.

Registry Services:

Do NOT own a database table and do NOT access the database directly
Obtain DB data by calling the owning Entity Service
Named {Domain}RegistryService (e.g., ProviderRegistryService)
Primary data source is static preset data (JSON files, TS constants)
All methods are read-only (no inserts, updates, or deletes)

See Layered Preset Pattern for the general architecture.
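
The constraints listed above can be sketched roughly as follows. All names (ProviderServiceSketch, ProviderRegistryServiceSketch, mergePresetsWithRows, the preset and row shapes) are hypothetical, and the merge policy shown (preset as the base, DB row values overriding where present) is an assumption for illustration; the actual policy is defined by the Layered Preset Pattern.

```typescript
type ProviderPresetSketch = { id: string; name: string; baseUrl: string }
type ProviderRowSketch = { id: string; baseUrl?: string; enabled: boolean }
type MergedProviderSketch = { id: string; name: string; baseUrl: string; enabled: boolean }

// Static preset data: the registry's primary source.
const PROVIDER_PRESETS: readonly ProviderPresetSketch[] = [
  { id: 'alpha', name: 'Alpha', baseUrl: 'https://alpha.example' },
  { id: 'beta', name: 'Beta', baseUrl: 'https://beta.example' }
]

// Pure, read-only merge. Assumed policy: DB values override preset values.
function mergePresetsWithRows(
  presets: readonly ProviderPresetSketch[],
  rows: readonly ProviderRowSketch[]
): MergedProviderSketch[] {
  const rowsById = new Map<string, ProviderRowSketch>()
  for (const row of rows) {
    rowsById.set(row.id, row)
  }
  return presets.map((preset) => {
    const row = rowsById.get(preset.id)
    return {
      id: preset.id,
      name: preset.name,
      baseUrl: row?.baseUrl ?? preset.baseUrl,
      enabled: row?.enabled ?? false
    }
  })
}

// Stand-in for the owning Entity Service (the only layer that touches the table).
class ProviderServiceSketch {
  private readonly rows: ProviderRowSketch[] = [
    { id: 'alpha', baseUrl: 'https://custom.example', enabled: true }
  ]
  async list(): Promise<ProviderRowSketch[]> {
    return [...this.rows]
  }
}

// Registry Service: no table, no direct DB access, read-only methods.
class ProviderRegistryServiceSketch {
  private readonly owner: ProviderServiceSketch
  constructor(owner: ProviderServiceSketch) {
    this.owner = owner
  }
  async listMerged(): Promise<MergedProviderSketch[]> {
    const rows = await this.owner.list()
    return mergePresetsWithRows(PROVIDER_PRESETS, rows)
  }
}
```

Keeping the merge in a pure function makes the read-only property easy to test without a database, while the registry class only wires the preset data to the owning service.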

Registry Sub-Resource Endpoints

Registry data reaches the renderer through sub-resource endpoints on the owning entity. Three rules govern their shape.

GET only. Registry endpoints are stateless reads — preset merged with DB rows. POST is reserved for state changes; using it for reads breaks SWR caching, request dedup, and retry safety. For composite IDs containing /, use the greedy path form :id* (see Greedy Path Parameters). For batched lookups exceeding URL limits, split into multiple GETs — DataApi dedup makes burst reads cheap.

Colon-notation for derived views. When the sub-resource name is ambiguous, disambiguate with AIP-136 colon notation:

Shape	Use for
`GET /:parent/:id/:sub`	List the merged collection
`GET /:parent/:id/:sub:action`	Compute a derived view
`GET /:parent/:id/:sub/:childId`	Look up one merged item

Registry packages are main-only. Packages like @cherrystudio/provider-registry ship the preset data Registry Services merge against. Renderer code must not import them. Two reasons:

Bundle waste. Registry packages are large (preset catalogs, vendor metadata, icons). Importing them in the renderer ships the same payload twice — once in the main bundle, once in every renderer entry that touches it — for data the renderer already gets via DataApi.
Merge already lives in main. Registry Services merge preset + DB rows on the main side. Re-doing the merge in the renderer duplicates logic and re-introduces preset-version drift this layer was designed to remove.

The merged result reaches the renderer exclusively through these endpoints.

Error Handling

Using DataApiErrorFactory

import { DataApiErrorFactory } from '@shared/data/api'

// Not found
throw DataApiErrorFactory.notFound('Topic', id)

// Validation error
throw DataApiErrorFactory.validation({
  name: ['Name is required', 'Name must be at least 3 characters'],
  email: ['Invalid email format']
})

// Database error
try {
  await db.insert(table).values(data)
} catch (error) {
  throw DataApiErrorFactory.database(error, 'insert topic')
}

// Invalid operation
throw DataApiErrorFactory.invalidOperation(
  'delete root message',
  'cascade=true required'
)

// Conflict
throw DataApiErrorFactory.conflict('Topic name already exists')

// Timeout
throw DataApiErrorFactory.timeout('fetch topics', 3000)

Adding New Endpoints

Step-by-Step

Define schema in src/shared/data/api/schemas/

// schemas/topics.ts
export type TopicSchemas = {
  '/topics': {
    GET: { response: PaginatedResponse<Topic> }
    POST: { body: CreateTopicDto; response: Topic }
  }
}

Register schema in schemas/index.ts

export type ApiSchemas = AssertValidSchemas<TopicSchemas & MessageSchemas>

Create service in services/
Implement handler in handlers/
Register handler in handlers/index.ts

Best Practices

Keep handlers thin: Only extract params and call services
Put logic in services: All business rules and data access belong in services
Do NOT create separate Repository files: Services own data access directly via Drizzle ORM
Always use .returning(): Get inserted/updated data without re-querying
Support transactions: Expose Tx-suffixed service methods that take tx as the first parameter (see Transaction Method Naming)
Validate in services: Business validation belongs in the service layer
Use error factory: Consistent error creation with DataApiErrorFactory
Use nullsToUndefined in rowToEntity: Canonical SQLite NULL → undefined translation; shallow, not recursive (see Row → Entity Mapping)
