mirror of https://github.com/CherryHQ/cherry-studio.git synced 2026-07-03 20:59:22 +08:00


SuYao c4bae482df feat(read-file): agentic read_file tool for chat attachments (#16257)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: fullex <106392080+0xfullex@users.noreply.github.com>
Signed-off-by: suyao <sy20010504@gmail.com>

2026-06-27 18:47:51 +08:00


AI Reference

This is the entry point for the AI pipeline in Cherry Studio v2 — the main-process service that owns every LLM call (chat streams, agent loops, translate, summarisation) and the renderer-side transport that connects to it.

Top-level architecture

Document	What it covers
Core Architecture	End-to-end call flow: `Ai_Stream_Open` IPC → context provider → AiStreamManager → Agent loop → `@ai-sdk/*` → broadcast / persist
Stream Manager	Active-stream registry, listeners, reconnect, abort, abort-and-restart steering, persistence backends
Agent Session Runtime	Agent-session host/driver split, `pendingTurns` follow-up queue, resume token persistence, Claude Code driver fallback
Adapter Family	How `provider.endpointConfigs[ep].adapterFamily` picks the right `@ai-sdk/*` package per request

Subsystems

Document	What it covers
Agent Loop	Main-process `Agent.stream()`: single-pass stream, hook composition, observer pattern, error/abort semantics
Params Pipeline	`buildAgentParams` + `RequestFeature` model: how capabilities, plugins, tools, and provider-specific quirks are composed
Tool Registry	Built-in tools (knowledge / web search), MCP tools, meta-tools (`tool_search` / `tool_inspect` / `tool_invoke` / `tool_exec`), deferred exposition
Chat Attachments	How attached files reach the model: native file parts when supported, capped extracted text otherwise, `read_file` for overflow paging
Provider Resolution	`Provider.endpointConfigs` schema, endpoint resolution chain, variant suffixes, custom provider extensions (aihubmix, newapi)
Observability (trace / telemetry)	`AiSdkSpanAdapter`, root span propagation, OTel attribute shape, local span projection, sinks

Renderer-side glue

Document	What it covers
IPC Transport	`useChat` + `IpcChatTransport`: `sendMessages` / `reconnectToStream`, dispatch coordinator, topic-status mirror
Execution Overlay	`TopicStreamSubscription` + `useExecutionOverlay`: ref-counted attach, execution + anchor demux, one-shot `readUIMessageStream` per turn (the renderer half of the same merge function Main uses)
Tool Approval	Approval registry, Main-as-writer model, persistent decisions, `useToolApproval` hook
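
The "ref-counted attach" mentioned for the execution overlay can be pictured with a small sketch. This is not the actual `TopicStreamSubscription` code: the class name `RefCountedAttach` and its callbacks are hypothetical, and the sketch only assumes that several renderer consumers of one topic should share a single underlying attach, which is released when the last consumer goes away.

```typescript
// Hypothetical helper, not taken from the Cherry Studio source.
type TopicCallback = (topicId: string) => void

class RefCountedAttach {
  private readonly counts = new Map<string, number>()
  private readonly onFirst: TopicCallback
  private readonly onLast: TopicCallback

  // onFirst stands in for "attach to the topic stream",
  // onLast for "detach from it"; both are assumptions.
  constructor(onFirst: TopicCallback, onLast: TopicCallback) {
    this.onFirst = onFirst
    this.onLast = onLast
  }

  // Returns a release function; calling it more than once is a no-op.
  acquire(topicId: string): () => void {
    const current = this.counts.get(topicId) ?? 0
    this.counts.set(topicId, current + 1)
    if (current === 0) this.onFirst(topicId) // first consumer attaches

    let released = false
    return () => {
      if (released) return
      released = true
      const remaining = (this.counts.get(topicId) ?? 1) - 1
      if (remaining > 0) {
        this.counts.set(topicId, remaining)
      } else {
        this.counts.delete(topicId)
        this.onLast(topicId) // last consumer detaches
      }
    }
  }
}
```

Returning an idempotent release function, rather than exposing a separate `release(topicId)` method, fits naturally with React effect cleanups, which is presumably how a hook such as `useExecutionOverlay` would consume it.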

Where the code lives

Scope of the focused docs. The reference documents in this folder map the chat / stream pipeline (dispatch → stream manager → runtime → tools → persistence → renderer transport). The agents/, channels/, skills/, and mcp/ subsystems are mapped in the tree below but do not yet have dedicated deep-dive docs.

src/main/ai/
├── AiService.ts                  ← lifecycle owner, IPC handlers (generate / translate / approval)
├── runtime/                      ← AI execution backends + runtime registry
│   ├── aiSdk/                    ← Agent class, loop, observers, params/features, prompts/
│   └── claudeCode/               ← Claude Code driver, warm query, SDK adapter
├── agentSession/                 ← agent-session topic host
│   └── AgentSessionRuntimeService.ts
├── agents/                       ← AgentJobsService, AgentTaskJobHandler, runAgentTask, builtin/, cherryclaw/
├── channels/                     ← ChannelManager + IM adapters (discord/feishu/qq/slack/telegram/wechat) + security/
├── streamManager/                ← AiStreamManager + listeners + persistence backends
│   ├── AiStreamManager.ts        ← registers the stream IPC (Open/Attach/Detach/Abort)
│   ├── context/                  ← ChatContextProvider implementations + dispatch
│   ├── lifecycle/                ← chat / prompt-only stream lifecycles
│   ├── listeners/                ← WebContents / Persistence / SSE / channel-adapter
│   ├── persistence/              ← MessageService / TemporaryChat / Translation backends
│   └── pipeStreamLoop.ts         ← shared chunk-pipe primitive
├── provider/                     ← provider config, endpoint resolution, custom providers
│   ├── custom/                   ← aihubmix, newapi
│   ├── config.ts                 ← providerToAiSdkConfig (builder table)
│   ├── endpoint.ts               ← resolveEffectiveEndpoint + adapterFamily routing
│   ├── extensions/               ← ProviderExtension registrations
│   └── listModels.ts             ← per-provider model listing
├── mcp/                          ← McpRuntimeService / McpCatalogService, oauth/, built-in servers
│   └── servers/                  ← in-memory MCP server implementations (browser, filesystem)
├── skills/                       ← SkillService, SkillInstaller
├── tools/                        ← unified tool registry
│   └── adapters/
│       ├── aiSdk/                ← registry.ts, repair.ts; builtin/ (web_search/web_fetch/kb_*),
│       │                            mcp/ (server → ToolEntry sync), meta/ (tool_search/inspect/invoke;
│       │                            tool_exec defined but not injected), exposition/ (shouldDefer + applyDefer)
│       └── claudeCode/           ← agentTools.ts (registry → Claude Code runtime)
├── observability/                ← AI trace adapters (aiSdk / claudeCode), local projection, sinks
├── messages/                     ← UI part → AI SDK part conversion
├── types/                        ← AppProviderId, merged extension types, request types
└── utils/                        ← reasoning / model parameters / options / websearch helpers

How a chat turn flows

1. The renderer's useChat({ transport: IpcChatTransport }) calls sendMessages, which sends IPC Ai_Stream_Open ({ topicId, trigger, userMessageParts, parentAnchorId?, mentionedModelIds? }).
2. AiStreamManager.onInit registers the Ai_Stream_Open handler; the handler wraps the sender in a WebContentsListener and calls dispatchStreamRequest(manager, subscriber, req). (The stream IPC, Open/Attach/Detach/Abort, lives on AiStreamManager, not AiService.)
3. dispatchStreamRequest picks the first ChatContextProvider whose canHandle(topicId) matches (persistent chat / temporary / agent session) and calls prepareDispatch, which resolves models, persists the user message, builds listeners, and returns a PreparedDispatch.
4. When the topic has no active stream, AiStreamManager.send(input) starts a turn: it creates an ActiveStream and launches one StreamExecution per model. (A chat resubmit on a live topic is persisted and queued as a steer, then takes the inject path: the running turn yields and onExecutionDone chains a continuation. An agent-session follow-up also injects, upserting listeners.)
5. Each execution's runExecutionLoop calls AiService.streamText(request, signal), which builds params (buildAgentParams) and constructs an Agent composing hooks from RequestFeature[] (anthropic cache, gateway usage normalisation, reasoning extraction, …), then calls agent.stream(messages, signal) to open the AI SDK stream and yield UIMessageChunks. Agent-session runtime requests are the exception: AiService.streamText routes them to AgentSessionRuntimeService.openTurnStream() so the registered driver can own the concrete agent runtime.
6. pipeStreamLoop tees the chunk stream: one branch broadcasts to listeners (WebContents / SSE / channel-adapter / persistence), the other runs readUIMessageStream to accumulate a CherryUIMessage snapshot.
7. On a terminal state (done / error / aborted / paused-for-approval), listeners get a typed terminal callback. PersistenceListener writes the final message via the appropriate PersistenceBackend.
8. The renderer reads the persisted row through useQuery('/topics/:id/messages') and disposes its overlay.
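
The provider selection in the dispatch step can be sketched as a first-match lookup. This is a minimal illustration, not the real `dispatchStreamRequest`: only the names `ChatContextProvider`, `canHandle`, `prepareDispatch`, and `PreparedDispatch` come from the flow above. The field shapes, the topic-id prefixes, the catch-all ordering, and the helper `pickProvider` are all assumptions, and `prepareDispatch` is synchronous here purely to keep the example short.

```typescript
// Hypothetical shapes; the real interfaces carry far more state.
interface PreparedDispatch {
  topicId: string
  modelIds: string[]
}

interface ChatContextProvider {
  name: string
  canHandle(topicId: string): boolean
  // Presumably async in the real code (it persists the user message).
  prepareDispatch(topicId: string): PreparedDispatch
}

// Assumed topic-id prefixes, for illustration only.
const providers: ChatContextProvider[] = [
  {
    name: 'agent-session',
    canHandle: (id) => id.startsWith('agent:'),
    prepareDispatch: (id) => ({ topicId: id, modelIds: ['agent-runtime'] })
  },
  {
    name: 'temporary',
    canHandle: (id) => id.startsWith('temp:'),
    prepareDispatch: (id) => ({ topicId: id, modelIds: ['model-a'] })
  },
  {
    name: 'persistent-chat',
    canHandle: () => true, // catch-all in this sketch, so it is listed last
    prepareDispatch: (id) => ({ topicId: id, modelIds: ['model-a', 'model-b'] })
  }
]

// First provider whose canHandle matches wins; order therefore matters.
function pickProvider(topicId: string, list: ChatContextProvider[]): ChatContextProvider {
  const match = list.find((p) => p.canHandle(topicId))
  if (!match) throw new Error(`No ChatContextProvider can handle topic ${topicId}`)
  return match
}

const prepared = pickProvider('temp:scratch', providers).prepareDispatch('temp:scratch')
```

Because selection is first-match, a broad provider placed early would shadow the more specific ones, which is why the sketch puts its catch-all at the end of the list.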

Key invariants

Topic-level addressing. Every IPC and broadcast is keyed by topicId. A topic has at most one active stream; subscribers are equal — there's no "owner" window.
Main owns persistence. Renderer closing or crashing does not abort the stream and does not lose data — PersistenceListener writes on terminal regardless of who is listening.
Tool approval is Main-authoritative. The renderer never writes approved/denied parts. It posts the decision over IPC and re-reads the authoritative row. See Tool Approval.
Adapter family per endpoint, not per provider. Multi-endpoint relays (MiniMax, Silicon, AiHubMix, …) carry one adapterFamily per endpoint. Picking the SDK package never relies on apiHost or provider-id heuristics at request time. See Adapter Family.
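
The adapter-family invariant amounts to a plain table lookup on the endpoint's own config. The sketch below is an illustration under stated assumptions, not the real resolver: `endpointConfigs` and `adapterFamily` are the only names taken from this document, while the family values, the `baseUrl` field, the endpoint keys, and the helper `resolveSdkPackage` are hypothetical. The three `@ai-sdk/*` package names are real packages, but their mapping to these family names is assumed.

```typescript
// Hypothetical family names; the real union may differ.
type AdapterFamily = 'openai-compatible' | 'anthropic' | 'google'

interface EndpointConfig {
  baseUrl: string
  adapterFamily: AdapterFamily
}

interface Provider {
  id: string
  endpointConfigs: Record<string, EndpointConfig>
}

const PACKAGE_BY_FAMILY: Record<AdapterFamily, string> = {
  'openai-compatible': '@ai-sdk/openai-compatible',
  anthropic: '@ai-sdk/anthropic',
  google: '@ai-sdk/google'
}

// The lookup reads only the endpoint's adapterFamily.
// It never inspects provider.id or baseUrl.
function resolveSdkPackage(provider: Provider, endpoint: string): string {
  const config = provider.endpointConfigs[endpoint]
  if (!config) throw new Error(`Provider ${provider.id} has no endpoint "${endpoint}"`)
  return PACKAGE_BY_FAMILY[config.adapterFamily]
}

// A made-up relay exposing two endpoints with different families.
const relay: Provider = {
  id: 'example-relay',
  endpointConfigs: {
    chat: { baseUrl: 'https://relay.example.invalid/v1', adapterFamily: 'openai-compatible' },
    messages: { baseUrl: 'https://relay.example.invalid/anthropic', adapterFamily: 'anthropic' }
  }
}

const chatPackage = resolveSdkPackage(relay, 'chat')
const messagesPackage = resolveSdkPackage(relay, 'messages')
```

In this sketch the same provider resolves to two different packages depending on the endpoint, which is the behaviour a per-provider heuristic could not express.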

Service Lifecycle — AiService extends BaseService
Data Layer — MessageService, ModelService, ProviderService (called from main-side AI code)
Messaging — CherryMessagePart, CherryUIMessage, parts model
Window Manager — WebContentsListener attaches to whatever windows are open

v2 refactor

The AI domain is the largest single area of the v2 refactor: the v1 renderer aiCore tree (formerly src/renderer/src/aiCore/, pre-v2 layout) is fully deleted, with logic ported into src/main/ai/.

These reference docs are self-contained: they do not depend on the throwaway v2-refactor-temp/ tree. (The reviewer-facing change-cluster narratives that live there are review logistics for the in-flight PR and will be removed when the v2 AI refactor merges.)
