4.6 KiB
Fuzzy Search for File List
This document describes the fuzzy search implementation for file listing in Cherry Studio.
Overview
The fuzzy search feature allows users to find files by typing partial or approximate file names/paths. It uses a two-tier file filtering strategy (ripgrep glob pre-filtering with greedy substring fallback) combined with subsequence-based scoring for optimal performance and flexibility.
Features
- Ripgrep Glob Pre-filtering: Primary filtering using glob patterns for fast native-level filtering
- Greedy Substring Matching: Fallback file filtering strategy when ripgrep glob pre-filtering returns no results
- Subsequence-based Segment Scoring: During scoring, path segments gain additional weight when query characters appear in order
- Relevance Scoring: Results are sorted by a relevance score derived from multiple factors
Matching Strategies
1. Ripgrep Glob Pre-filtering (Primary)
The query is converted to a glob pattern for ripgrep to do initial filtering:
Query: "updater"
Glob: "*u*p*d*a*t*e*r*"
This leverages ripgrep's native performance for the initial file filtering.
2. Greedy Substring Matching (Fallback)
When the glob pre-filter returns no results, the system falls back to greedy substring matching. This allows more flexible matching:
Query: "updatercontroller"
File: "packages/update/src/node/updateController.ts"
Matching process:
1. Find "update" (longest match from start)
2. Remaining "rcontroller" → find "r" then "controller"
3. All parts matched → Success
Scoring Algorithm
Results are ranked by a relevance score based on named constants defined in FileStorage.ts:
| Constant | Value | Description |
|---|---|---|
SCORE_FILENAME_STARTS |
100 | Filename starts with query (highest priority) |
SCORE_FILENAME_CONTAINS |
80 | Filename contains exact query substring |
SCORE_SEGMENT_MATCH |
60 | Per path segment that matches query |
SCORE_WORD_BOUNDARY |
20 | Query matches start of a word |
SCORE_CONSECUTIVE_CHAR |
15 | Per consecutive character match |
PATH_LENGTH_PENALTY_FACTOR |
4 | Logarithmic penalty for longer paths |
Scoring Strategy
The scoring prioritizes:
- Filename matches (highest): Files where the query appears in the filename are most relevant
- Path segment matches: Multiple matching segments indicate stronger relevance
- Word boundaries: Matching at word starts (e.g., "upd" matching "update") is preferred
- Consecutive matches: Longer consecutive character sequences score higher
- Path length: Shorter paths are preferred (logarithmic penalty prevents long paths from dominating)
Example Scoring
For query updater:
| File | Score Factors |
|---|---|
RCUpdater.js |
Short path + filename contains "updater" |
updateController.ts |
Multiple segment matches |
UpdaterHelper.plist |
Long path penalty |
Configuration
DirectoryListOptions
interface DirectoryListOptions {
recursive?: boolean // Default: true
maxDepth?: number // Default: 10
includeHidden?: boolean // Default: false
includeFiles?: boolean // Default: true
includeDirectories?: boolean // Default: true
maxEntries?: number // Default: 20
searchPattern?: string // Default: '.'
fuzzy?: boolean // Default: true
}
Usage
// Basic fuzzy search
const files = await window.api.file.listDirectory(dirPath, {
searchPattern: 'updater',
fuzzy: true,
maxEntries: 20
})
// Disable fuzzy search (exact glob matching)
const files = await window.api.file.listDirectory(dirPath, {
searchPattern: 'update',
fuzzy: false
})
Performance Considerations
- Ripgrep Pre-filtering: Most queries are handled by ripgrep's native glob matching, which is extremely fast
- Fallback Only When Needed: Greedy substring matching (which loads all files) only runs when glob matching returns empty results
- Result Limiting: Only top 20 results are returned by default
- Excluded Directories: Common large directories are automatically excluded:
node_modules.gitdist,build.next,.nuxtcoverage,.cache
Implementation Details
The implementation is located in src/main/services/FileStorage.ts:
queryToGlobPattern(): Converts query to ripgrep glob patternisFuzzyMatch(): Subsequence matching algorithmisGreedySubstringMatch(): Greedy substring matching fallbackgetFuzzyMatchScore(): Calculates relevance scorelistDirectoryWithRipgrep(): Main search orchestration