fix(knowledge): support Ollama knowledge embeddings (#14172)

### What this PR does

Before this PR:
- Creating a knowledge base with an Ollama embedding model such as
`nomic-embed-text` could fail during initialization with `Cannot read
properties of undefined (reading '0')`.

After this PR:
- Knowledge base initialization keeps using
`@cherrystudio/embedjs-ollama` instead of a Cherry Studio-only Ollama
adapter.
- The PR applies the Ollama response-handling hotfix in a scoped `pnpm`
patch for `@cherrystudio/embedjs-ollama`.
- The patched library handles the current `/api/embed` response shape
and falls back to the legacy `/api/embeddings` response during dimension
detection.
- A focused regression test covers configured dimensions, current Ollama
responses, and legacy fallback behavior.
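
As an illustration of the response handling described above, here is a minimal sketch that pulls a vector out of either response shape. It is not the code in the patched library: the type names and `firstEmbedding` are hypothetical, and the field names (`embeddings` for `/api/embed`, `embedding` for `/api/embeddings`) are taken from the mocked responses in this PR's regression test. Indexing `[0]` on a missing `embeddings` field is one plausible way to hit the reported `Cannot read properties of undefined (reading '0')` error.

```typescript
// Hypothetical types; field names follow the mocked responses in the regression test.
type CurrentEmbedResponse = { embeddings?: number[][] } // shape assumed for /api/embed
type LegacyEmbedResponse = { embedding?: number[] } // shape assumed for /api/embeddings

// Return the first embedding vector from either response shape, or throw a
// descriptive error instead of failing on `undefined[0]`.
function firstEmbedding(body: CurrentEmbedResponse & LegacyEmbedResponse): number[] {
  if (Array.isArray(body.embeddings) && Array.isArray(body.embeddings[0])) {
    return body.embeddings[0]
  }
  if (Array.isArray(body.embedding)) {
    return body.embedding
  }
  throw new Error('Ollama response contained no embedding vector')
}
```

With a helper like this, the dimension is simply the length of the returned vector, regardless of which endpoint answered.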

Fixes #14168

### Why we need it and why it was done in this way

The following tradeoffs were made:
- Moved the fix from app-local code into a patched
`@cherrystudio/embedjs-ollama` dependency so Cherry Studio can keep
using the library abstraction on the frozen `main` branch.
- Kept support for both current and legacy Ollama embedding APIs because
the failure happens during dimension probing before any documents are
added.
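
The probing order this tradeoff implies can be sketched roughly as follows. This is an assumption-laden illustration, not the library's implementation: `probeDimensions` and `FetchLike` are hypothetical names, the fetch function is injected so the sketch can be run against a stub, and the request bodies (`input` for `/api/embed`, `prompt` for `/api/embeddings`) mirror the ones asserted in the regression test.

```typescript
// Minimal fetch-like signature (hypothetical) so the sketch works with a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; json(): Promise<any> }>

async function probeDimensions(
  baseUrl: string,
  model: string,
  fetchFn: FetchLike,
  configured?: number
): Promise<number> {
  // A configured dimension short-circuits probing entirely.
  if (configured !== undefined) return configured

  const headers = { 'Content-Type': 'application/json' }

  // Try the current endpoint first.
  const current = await fetchFn(`${baseUrl}/api/embed`, {
    method: 'POST',
    headers,
    body: JSON.stringify({ model, input: 'sample' })
  })
  if (current.ok) {
    const body = await current.json()
    const vector = body?.embeddings?.[0]
    if (Array.isArray(vector)) return vector.length
  }

  // Fall back to the legacy endpoint, which returns a single `embedding`.
  const legacy = await fetchFn(`${baseUrl}/api/embeddings`, {
    method: 'POST',
    headers,
    body: JSON.stringify({ model, prompt: 'sample' })
  })
  if (!legacy.ok) throw new Error('Both Ollama embedding endpoints failed')
  const legacyBody = await legacy.json()
  if (!Array.isArray(legacyBody?.embedding)) {
    throw new Error('Legacy Ollama response contained no embedding')
  }
  return legacyBody.embedding.length
}
```

Because probing runs before any document is added, a failure here blocks knowledge base creation outright, which is why both paths are covered.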

The following alternatives were considered:
- Keeping the Cherry Studio-only `OllamaEmbeddings` adapter introduced
in the first revision of this PR.
- Opening an upstream `CherryHQ/embed-js` PR first and waiting for a
released package version before fixing Cherry Studio.
- Patching `@langchain/ollama` or widening shared dependency changes on
`main`.

Links to places where the discussion took place:
https://github.com/CherryHQ/cherry-studio/pull/14172#issuecomment-4229167072

### Breaking changes

If this PR introduces breaking changes, please describe the changes and
the impact on users.

None.

### Special notes for your reviewer

- Per review, this PR no longer keeps a Cherry Studio-only Ollama
adapter; the fix now lives in the patched `@cherrystudio/embedjs-ollama`
dependency.
- `pnpm exec vitest run
src/main/knowledge/embedjs/embeddings/__tests__/OllamaEmbeddings.test.ts`
passed.
- `pnpm lint` completed successfully.
- `pnpm test` still fails in pre-existing CherryClaw prompt tests under
`src/main/services/agents/services/cherryclaw/__tests__/prompt.test.ts`;
these failures are unrelated to this hotfix.

### Checklist

This checklist is not enforced, but it is a reminder of items that could
be relevant to every PR.
Approvers are expected to review this list.

- [x] PR: The PR description is expressive enough and will help future
contributors
- [x] Code: [Write code that humans can
understand](https://en.wikiquote.org/wiki/Martin_Fowler#code-for-humans)
and [Keep it simple](https://en.wikipedia.org/wiki/KISS_principle)
- [x] Refactor: You have [left the code cleaner than you found it (Boy
Scout
Rule)](https://learning.oreilly.com/library/view/97-things-every/9780596809515/ch08.html)
- [x] Upgrade: Impact of this change on upgrade flows was considered and
addressed if required
- [x] Documentation: A [user-guide update](https://docs.cherry-ai.com)
was considered and is present (link) or not required. Check this only
when the PR introduces or changes a user-facing feature or behavior.
- [x] Self-review: I have reviewed my own code (e.g., via
[`/gh-pr-review`](/.claude/skills/gh-pr-review/SKILL.md), `gh pr diff`,
or GitHub UI) before requesting review from others

### Release note

```release-note
Fix knowledge base creation for Ollama embedding models such as `nomic-embed-text` by handling current and legacy Ollama embedding API responses through `@cherrystudio/embedjs-ollama` during dimension detection.
```

---------

Signed-off-by: 404-Page-Found <Lucas20220605@gmail.com>
Co-authored-by: SuYao <sy20010504@gmail.com>
Author: 404-Page-Found
Date: 2026-04-17 20:33:07 +10:00
Committed by: GitHub
Parent: 41554411d1
Commit: b85c28d50c
4 changed files with 138 additions and 16 deletions

@@ -153,7 +153,7 @@
     "@cherrystudio/embedjs-loader-sitemap": "0.1.31",
     "@cherrystudio/embedjs-loader-web": "0.1.31",
     "@cherrystudio/embedjs-loader-xml": "0.1.31",
-    "@cherrystudio/embedjs-ollama": "0.1.31",
+    "@cherrystudio/embedjs-ollama": "0.1.35",
     "@cherrystudio/embedjs-openai": "0.1.31",
     "@cherrystudio/embedjs-utils": "0.1.31",
     "@cherrystudio/extension-table-plus": "workspace:^",

pnpm-lock.yaml (generated; 45 lines changed)

@@ -289,8 +289,8 @@ importers:
         specifier: 0.1.31
         version: 0.1.31(@opentelemetry/api@1.9.0)(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(openai@6.15.0(ws@8.20.0)(zod@4.3.4))
       '@cherrystudio/embedjs-ollama':
-        specifier: 0.1.31
-        version: 0.1.31(@opentelemetry/api@1.9.0)(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(openai@6.15.0(ws@8.20.0)(zod@4.3.4))
+        specifier: 0.1.35
+        version: 0.1.35(@opentelemetry/api@1.9.0)(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(openai@6.15.0(ws@8.20.0)(zod@4.3.4))
       '@cherrystudio/embedjs-openai':
         specifier: 0.1.31
         version: 0.1.31(@opentelemetry/api@1.9.0)(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(openai@6.15.0(ws@8.20.0)(zod@4.3.4))(ws@8.20.0)
@@ -2043,6 +2043,9 @@ packages:
   '@cherrystudio/embedjs-interfaces@0.1.31':
     resolution: {integrity: sha512-JGlRoxU09ycemhcGvimE3oy2GkpTGBDFdfV3HA3Zwaaqw36QytP9oY+kmowS26YrB6xWuulBBuKwn9o6j4J+sw==}
+  '@cherrystudio/embedjs-interfaces@0.1.35':
+    resolution: {integrity: sha512-xia4Tl2a3sM7jqsZeakkVi8rFZh6XFHYTLHQyAYQWhPsKpqrbdbvyiLhtKTnR1kbbDJqp/zwKShngfHUEQyCdw==}
   '@cherrystudio/embedjs-libsql@0.1.31':
     resolution: {integrity: sha512-bQMm5cak8obRubOQndQm//vqHJKENBTOgjeNi83JcWadJtssY3jjgjcGLy1eLagoj5gZf1B3dMtqLNeiquZpGA==}
@@ -2070,8 +2073,8 @@ packages:
   '@cherrystudio/embedjs-loader-xml@0.1.31':
     resolution: {integrity: sha512-0x6akVc6qbw9oBK1JOfZNKIe9OlRSfe5OUVbEEXluYnpjlsgCIvHVIaxbccXi/ZefkGUUGNqSVExWM53VJHbeQ==}
-  '@cherrystudio/embedjs-ollama@0.1.31':
-    resolution: {integrity: sha512-W89MQfPkWAix/4si5GBTngmLQG0kVB3YKgkIfQtLoz65WUxthU71dKZgLHdMZb0YXVgQ76MPIQpzpHGIXW+hzw==}
+  '@cherrystudio/embedjs-ollama@0.1.35':
+    resolution: {integrity: sha512-FsqkjHrzHIp4DwJvW7ckUAhAmNf0W6SyaNYwJ81hNGTLVgMV5E+uPtOhta/XihocBSxESAfUVoI8otUT0Vz6bA==}
   '@cherrystudio/embedjs-openai@0.1.31':
     resolution: {integrity: sha512-j2S1jlQA7RTJhqF1vySx92MBC1r9/TvOPvlCRxOPv72TlHdZl48NQgKa6qjSFIXMJYjRKzezX+AyLgxL2GXJMA==}
@@ -14000,6 +14003,19 @@ snapshots:
       - openai
       - supports-color
+  '@cherrystudio/embedjs-interfaces@0.1.35(@opentelemetry/api@1.9.0)(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(openai@6.15.0(ws@8.20.0)(zod@4.3.4))':
+    dependencies:
+      '@langchain/core': 1.0.2(patch_hash=8dc787a82cebafe8b23c8826f25f29aca64fc8b43a0a1878e0010782e4da96ed)(@opentelemetry/api@1.9.0)(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(openai@6.15.0(ws@8.20.0)(zod@4.3.4))
+      debug: 4.4.3
+      md5: 2.3.0
+      uuid: 11.1.0
+    transitivePeerDependencies:
+      - '@opentelemetry/api'
+      - '@opentelemetry/exporter-trace-otlp-proto'
+      - '@opentelemetry/sdk-trace-base'
+      - openai
+      - supports-color
   '@cherrystudio/embedjs-libsql@0.1.31(@opentelemetry/api@1.9.0)(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(openai@6.15.0(ws@8.20.0)(zod@4.3.4))':
     dependencies:
       '@cherrystudio/embedjs-interfaces': 0.1.30(@opentelemetry/api@1.9.0)(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(openai@6.15.0(ws@8.20.0)(zod@4.3.4))
@@ -14137,9 +14153,9 @@ snapshots:
       - openai
       - supports-color
-  '@cherrystudio/embedjs-ollama@0.1.31(@opentelemetry/api@1.9.0)(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(openai@6.15.0(ws@8.20.0)(zod@4.3.4))':
+  '@cherrystudio/embedjs-ollama@0.1.35(@opentelemetry/api@1.9.0)(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(openai@6.15.0(ws@8.20.0)(zod@4.3.4))':
     dependencies:
-      '@cherrystudio/embedjs-interfaces': 0.1.31(@opentelemetry/api@1.9.0)(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(openai@6.15.0(ws@8.20.0)(zod@4.3.4))
+      '@cherrystudio/embedjs-interfaces': 0.1.35(@opentelemetry/api@1.9.0)(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(openai@6.15.0(ws@8.20.0)(zod@4.3.4))
       '@langchain/core': 1.0.2(patch_hash=8dc787a82cebafe8b23c8826f25f29aca64fc8b43a0a1878e0010782e4da96ed)(@opentelemetry/api@1.9.0)(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(openai@6.15.0(ws@8.20.0)(zod@4.3.4))
       '@langchain/ollama': 0.1.6(@langchain/core@1.0.2(patch_hash=8dc787a82cebafe8b23c8826f25f29aca64fc8b43a0a1878e0010782e4da96ed)(@opentelemetry/api@1.9.0)(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(openai@6.15.0(ws@8.20.0)(zod@4.3.4)))
       debug: 4.4.3
@@ -14230,6 +14246,11 @@ snapshots:
       ws: 8.20.0
       zod: 4.3.4
+  '@cherrystudio/openai@6.15.0(ws@8.20.0)(zod@4.3.6)':
+    optionalDependencies:
+      ws: 8.20.0
+      zod: 4.3.6
   '@chevrotain/cst-dts-gen@11.1.2':
     dependencies:
       '@chevrotain/gast': 11.1.2
@@ -15227,7 +15248,7 @@ snapshots:
       openapi-types: 12.1.3
       uuid: 10.0.0
       yaml: 2.8.2
-      zod: 4.3.4
+      zod: 4.3.6
     optionalDependencies:
       cheerio: 1.1.2
       langsmith: 0.4.4(@opentelemetry/api@1.9.0)(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(openai@6.15.0(ws@8.20.0)(zod@4.3.4))
@@ -15307,7 +15328,7 @@ snapshots:
       ollama: 0.5.18
       uuid: 10.0.0
       zod: 3.25.76
-      zod-to-json-schema: 3.25.1(zod@3.25.76)
+      zod-to-json-schema: 3.25.2(zod@3.25.76)
   '@langchain/openai@0.3.17(@langchain/core@1.0.2(patch_hash=8dc787a82cebafe8b23c8826f25f29aca64fc8b43a0a1878e0010782e4da96ed)(@opentelemetry/api@1.9.0)(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(openai@6.15.0(ws@8.20.0)(zod@4.3.4)))(ws@8.20.0)':
     dependencies:
@@ -15341,8 +15362,8 @@ snapshots:
     dependencies:
       '@langchain/core': 1.0.2(patch_hash=8dc787a82cebafe8b23c8826f25f29aca64fc8b43a0a1878e0010782e4da96ed)(@opentelemetry/api@1.9.0)(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(openai@6.15.0(ws@8.20.0)(zod@4.3.4))
       js-tiktoken: 1.0.21
-      openai: '@cherrystudio/openai@6.15.0(ws@8.20.0)(zod@4.3.4)'
-      zod: 4.3.4
+      openai: '@cherrystudio/openai@6.15.0(ws@8.20.0)(zod@4.3.6)'
+      zod: 4.3.6
     transitivePeerDependencies:
       - ws
@@ -26537,6 +26558,10 @@ snapshots:
     dependencies:
       zod: 4.3.4
+  zod-to-json-schema@3.25.2(zod@3.25.76):
+    dependencies:
+      zod: 3.25.76
   zod-to-json-schema@3.25.2(zod@4.3.4):
     dependencies:
       zod: 4.3.4

@@ -20,12 +20,9 @@ export default class EmbeddingsFactory {
     }
     if (provider === 'ollama') {
       return new OllamaEmbeddings({
-        model: model,
+        model,
         baseUrl: baseURL.replace(/\/api$/, ''),
-        requestOptions: {
-          // @ts-ignore expected
-          'encoding-format': 'float'
-        }
+        dimensions
       })
     }
     // NOTE: Azure OpenAI also uses OpenAIEmbeddings; its baseURL is https://xxxx.openai.azure.com/openai/v1
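
The `baseURL.replace(/\/api$/, '')` call in the hunk above strips a trailing `/api` segment before the URL is handed to `OllamaEmbeddings`. A small sketch of that normalization follows; `toOllamaBaseUrl` is a hypothetical helper name used only for illustration, and the reason given in the comment is an assumption based on the URLs asserted in the regression test.

```typescript
// Hypothetical helper wrapping the same regex used in the factory.
// Assumption: the library appends paths such as /api/embed itself, so a
// configured URL that already ends in /api would otherwise be doubled.
function toOllamaBaseUrl(baseURL: string): string {
  return baseURL.replace(/\/api$/, '')
}

const withSuffix = toOllamaBaseUrl('http://localhost:11434/api')
const withoutSuffix = toOllamaBaseUrl('http://localhost:11434')
console.log(withSuffix === withoutSuffix) // both are 'http://localhost:11434'
```

Note that the pattern is anchored to the end of the string, so a URL with a trailing slash such as `http://localhost:11434/api/` is left unchanged by this expression.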

@@ -0,0 +1,100 @@
+import { OllamaEmbeddings } from '@cherrystudio/embedjs-ollama'
+import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'
+
+const { fetchMock } = vi.hoisted(() => {
+  const fetchMock = vi.fn()
+  return { fetchMock }
+})
+
+describe('OllamaEmbeddings', () => {
+  beforeEach(() => {
+    fetchMock.mockReset()
+    vi.stubGlobal('fetch', fetchMock)
+  })
+
+  afterEach(() => {
+    vi.unstubAllGlobals()
+    vi.clearAllMocks()
+  })
+
+  it('returns configured dimensions without probing Ollama', async () => {
+    const embeddings = new OllamaEmbeddings({
+      model: 'nomic-embed-text',
+      baseUrl: 'http://localhost:11434',
+      dimensions: 768
+    })
+
+    await expect(embeddings.getDimensions()).resolves.toBe(768)
+    expect(fetchMock).not.toHaveBeenCalled()
+  })
+
+  it('uses the current /api/embed response shape for dimension probing', async () => {
+    fetchMock.mockResolvedValue(
+      new Response(
+        JSON.stringify({
+          embeddings: [[0.1, 0.2, 0.3, 0.4]]
+        })
+      )
+    )
+
+    const embeddings = new OllamaEmbeddings({
+      model: 'nomic-embed-text',
+      baseUrl: 'http://localhost:11434'
+    })
+
+    await expect(embeddings.getDimensions()).resolves.toBe(4)
+    expect(fetchMock).toHaveBeenCalledWith(
+      'http://localhost:11434/api/embed',
+      expect.objectContaining({
+        method: 'POST',
+        body: JSON.stringify({
+          model: 'nomic-embed-text',
+          input: 'sample'
+        })
+      })
+    )
+  })
+
+  it('falls back to legacy /api/embeddings when /api/embed fails', async () => {
+    fetchMock
+      .mockResolvedValueOnce(new Response(JSON.stringify({ error: 'not found' }), { status: 404 }))
+      .mockResolvedValueOnce(new Response(JSON.stringify({ embedding: [1, 2, 3] })))
+      .mockResolvedValueOnce(new Response(JSON.stringify({ embedding: [4, 5, 6] })))
+
+    const embeddings = new OllamaEmbeddings({
+      model: 'nomic-embed-text',
+      baseUrl: 'http://localhost:11434'
+    })
+
+    await expect(embeddings.embedDocuments(['first', 'second'])).resolves.toEqual([
+      [1, 2, 3],
+      [4, 5, 6]
+    ])
+
+    expect(fetchMock).toHaveBeenNthCalledWith(1, 'http://localhost:11434/api/embed', expect.anything())
+    expect(fetchMock).toHaveBeenNthCalledWith(
+      2,
+      'http://localhost:11434/api/embeddings',
+      expect.objectContaining({
+        body: JSON.stringify({
+          model: 'nomic-embed-text',
+          prompt: 'first',
+          keep_alive: undefined,
+          options: undefined
+        })
+      })
+    )
+    expect(fetchMock).toHaveBeenNthCalledWith(
+      3,
+      'http://localhost:11434/api/embeddings',
+      expect.objectContaining({
+        body: JSON.stringify({
+          model: 'nomic-embed-text',
+          prompt: 'second',
+          keep_alive: undefined,
+          options: undefined
+        })
+      })
+    )
+  })
+})