Files
HKUDS-RAG-Anything/docs/multimodal_rag_failure_modes.md
LarFii 30da9b6fa4 review: warn on partial env config, document MinerU-only scope, fix lint
- attach_public_media_urls now logs a single warning when only one of
  RAGANYTHING_PUBLIC_ASSET_BASE_URL / RAGANYTHING_PUBLIC_ASSET_STRIP_PREFIX
  is set, instead of silently no-oping. State resets when the env returns
  to either fully unset or fully set, so legitimate transitions still work.
- Strip prefix resolution is hoisted into _resolve_strip_prefix and
  skipped entirely when the path cannot be resolved.
- README and the failure-modes doc now flag that *_public_url is
  produced only on the MinerU parser path today; other parsers are not
  yet covered.
- tests: added warn-once cases for each misconfiguration shape and a
  silent case when both env vars are absent. Reformatted the file so
  pre-commit's ruff-format hook passes (this was the previously failing
  CI step on this PR).
2026-05-11 17:16:32 +08:00

3.4 KiB
Raw Permalink Blame History

Multimodal RAG: common failure modes and quick checks

Real-world PDFs and Office files are noisy. RAG-Anything (MinerU / Docling / etc.) usually does the right thing, but when something feels “off”, this page is a short sanity checklist before opening an issue.

It is meant to stay small and easy to maintain (see discussion in #207 and #213).


1. OCR or layout silently corrupts text

Symptoms: answers quote nonsense, wrong numbers, or mixed columns.

Quick checks:

  • Open the parsers Markdown or *_content_list.json for the same page and eyeball a few paragraphs.
  • Compare with the source PDF zoomed in (sometimes OCR confuses l/1, ligatures, or watermarks).

2. Table structure lost during preprocessing

Symptoms: merged cells, tables shown as plain lines, or wrong row order.

Quick checks:

  • Find the type: table blocks in content_list and see if table_body / structure fields look plausible.
  • If you only need retrieval, sometimes a flattened table is acceptable; if you need exact structure, try another parse method or parser (PARSE_METHOD, PARSER in .env).

3. Image ↔ caption misalignment

Symptoms: captions match the wrong figure, or images appear in the wrong order.

Quick checks:

  • Confirm img_path / captions exist in content_list and that page indices look consistent.
  • For DOCX pipelines, remember that exporting to PDF for OCR can drop inline math; OMML helpers in recent versions exist specifically to recover equations from the original Word file.

4. Retrieval biased toward text (image/table ignored)

Symptoms: the answer clearly needs a figure or table, but only text chunks rank high.

Quick checks:

  • Run a probe question that can only be answered from an image or table (not from surrounding text).
  • Inspect whether multimodal stages ran (ENABLE_*_PROCESSING flags) and whether your embedding path actually sees the enriched text (not only raw OCR).

5. “Everythings slow” vs “stuck”

Symptoms: long runs, high CPU/GPU.

Quick checks:

  • #48 style: large PDFs and OCR are inherently heavy—batch size and hardware matter.
  • Distinguish slow but progressing from a hang (then check subprocess timeouts, MinerU/Docling logs, and disk space).

6. Images not loading in your UI after indexing

Symptoms: chunks reference paths that only exist on the machine that ingested the data.

Quick checks:

  • Absolute local paths are normal for processing on one host. For a browser or another server, set public URL mapping (environment variables RAGANYTHING_PUBLIC_ASSET_BASE_URL and RAGANYTHING_PUBLIC_ASSET_STRIP_PREFIX — see the main README) so each chunk can carry an *_public_url field alongside the local path. Today this mapping only runs in the MinerU parser path; other parsers will not yet produce *_public_url.

When you open an issue

Including the items below speeds up triage:

  • Which modality failed (text / image / table / equation).
  • Which parser and PARSE_METHOD you used.
  • A minimal file (or a redacted excerpt) and one query that misbehaves.

This document is intentionally short. If you expand it, prefer adding links to reproducible examples rather than long narratives.