Chunk-based RAG is broken for structured documents. The fix is simpler than you think - and faster than the original.

A few weeks ago, I came across an article by Agent Native about vectorless RAG. The framing stuck with me: most RAG systems turn documents into “semantic confetti” — chunk everything, embed everything, then hope an ANN search surfaces the right bits. For large document bases, this becomes semantic hide-and-seek across thousands of chunks, burning tokens and confidently hallucinating near the answer.

Digging deeper, I found PageIndex from VectifyAI, a clean implementation of the alternative approach. Instead of embedding chunks, it treats the document’s own heading structure as the retrieval primitive. Represent the document as a hierarchical tree, hand the outline to your LLM, let it navigate to the right section, and pull that section’s text. No embeddings. No ANN. Just the document telling you how it’s organized.

I had been building agents over financial documents and hitting exactly this problem. I tried PageIndex, it worked, and then I rewrote it in Rust. This is the story of what happened.

## Why chunk-based RAG fails on structured documents

Take a 10-K filing. It has a section on risk factors, inside which there’s a subsection on liquidity risk, inside which there’s a paragraph about covenant breaches. When you split this into 512-token chunks, those three levels of context get shattered. The chunk about covenant breaches no longer knows it’s inside liquidity risk, which is inside risk factors.

At query time, “what are the company’s covenant breach risks” might surface three chunks from different sections that share vocabulary but don’t form a coherent answer. The retrieval is technically close but contextually wrong.
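A toy sketch of that failure mode (made-up document and chunk size, chunking by characters rather than tokens for brevity) shows how a fixed-size split strands the covenant text away from its headings:

```python
# Toy illustration only: naive fixed-size chunking throws away the
# heading path. The document and chunk size here are made up.
doc = (
    "# Risk Factors\n"
    "## Liquidity Risk\n"
    "The company may breach debt covenants if cash flow deteriorates.\n"
)

def chunk(text, size):
    """Split into fixed-size character chunks, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

chunks = chunk(doc, 40)

# The chunk that mentions covenants no longer carries its ancestry:
covenant_chunks = [c for c in chunks if "covenant" in c]
```

The chunk containing “covenant” retains neither “Risk Factors” nor “Liquidity Risk”, which is exactly the lost context described above.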
You end up with an LLM that has all the right words and none of the right context.

Structured documents — financial reports, legal filings, technical manuals, research papers — already tell you how they’re organized. Every heading is a natural retrieval boundary. PageIndex just respects that structure.

## How PageIndex works

The approach is straightforward. Parse the markdown document into a tree of nodes, one per heading. Each node holds its title, body text, and children. Generate a compact outline of the tree. At query time:

1. Send the outline to your LLM with the question
2. Ask it to return the node ID of the most relevant section
3. Fetch that node directly
4. Pass the node’s text to your LLM for the final answer

The outline looks like this:

```
[1] Annual Report 2023
[1.1] Financial Highlights
[1.2] Risk Factors
[1.2.1] Market Risk
[1.2.2] Liquidity Risk
[1.2.3] Regulatory Risk
[1.3] Management Discussion
```

The LLM reads this and says “1.2.2” — you fetch that node and you’re done. Precise, explainable, and no embedding infrastructure required.

VectifyAI’s Mafin 2.5 system, powered by PageIndex, achieved 98.7% accuracy on the FinanceBench benchmark. That’s the practical proof that the approach works at scale.

## Why I rewrote it in Rust

A few reasons. I had already built fastrustrag — a Rust library for document deduplication that achieved 8–121x speedups over Python’s datasketch — so I had the toolchain and the workflow ready.
I was also skeptical that the Python implementation would hold up under load, specifically for the index build and node retrieval operations that happen on every query.

Before writing a line of Rust, I validated that there was actually a performance problem worth solving. The methodology I’ve been using for these projects: always benchmark the Python implementation first, identify the bottleneck, then build the Rust version. Don’t rewrite things for fun.

For PageIndex specifically, the bottleneck I expected was node retrieval. The Python library stores nodes in a flat list and does a linear scan to find a node by ID. That’s O(n). At 28 nodes it’s fine. At 765 nodes across a large document corpus it becomes measurably slow and, more importantly, wildly inconsistent at the tail.

## Building pageindex-rs

The Rust implementation follows the same architecture: parse markdown into a tree, assign dot-notation node IDs (`1.2.3` rather than `0012`), store nodes in a HashMap for O(1) lookup, and expose everything to Python via PyO3.

The dot-notation IDs turned out to matter more than I expected. When you show an LLM an outline with IDs like `1.2.3`, it immediately understands the hierarchy — `1.2.3` is a child of `1.2`, which is a child of `1`. With zero-padded sequential IDs like `0012`, the LLM just sees a number with no structural signal. This affected retrieval accuracy in the benchmarks, which I’ll get to.
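That architecture can be sketched in plain Python. This is a toy illustration, not the actual pageindex-rs internals: parse headings into nodes with dot-notation IDs, and keep an ID-to-node map so lookups are O(1) rather than a linear scan:

```python
import re

class Node:
    def __init__(self, node_id, title, level):
        self.node_id = node_id   # dot-notation ID, e.g. "1.2.2"
        self.title = title
        self.level = level       # heading depth: "#" = 1, "##" = 2, ...
        self.text = []           # body lines under this heading
        self.children = []

def build_index(markdown):
    """Parse headings into a tree; return the root and an id -> node map."""
    root = Node("", "(root)", 0)
    stack = [root]               # stack[-1] is the innermost open section
    by_id = {}
    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s+(.+)", line)
        if not m:
            stack[-1].text.append(line)
            continue
        level = len(m.group(1))
        while stack[-1].level >= level:   # close deeper or equal sections
            stack.pop()
        parent = stack[-1]
        seq = len(parent.children) + 1
        node_id = f"{parent.node_id}.{seq}" if parent.node_id else str(seq)
        node = Node(node_id, m.group(2).strip(), level)
        parent.children.append(node)
        by_id[node_id] = node            # HashMap-style direct access
        stack.append(node)
    return root, by_id

def outline(node, lines):
    """Render the compact outline the LLM navigates."""
    for child in node.children:
        lines.append(f"[{child.node_id}] {child.title}")
        outline(child, lines)
    return lines
```

With `by_id`, fetching the node the LLM names is a single dict access instead of a scan over every node, which is the same O(1)-versus-O(n) distinction as the Rust HashMap.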
The Python API looks like this:

```python
import pageindex_rs

index = pageindex_rs.PageIndex.from_file("annual_report", "report.md")

# Feed this to your LLM
print(index.outline())
# [1] Annual Report 2023
# [1.1] Financial Highlights
# [1.2] Risk Factors
# [1.2.1] Market Risk
# [1.2.2] Liquidity Risk

# Fetch the node your LLM returned
node = index.get_node("1.2.2")
print(node.title)       # Liquidity Risk
print(node.text)        # The company's liquidity position…
print(node.breadcrumb)  # ['Risk Factors', 'Liquidity Risk']

# Get a full section with all subsections merged
section = index.get_node_with_children("1.2")
```

The retrieval loop is a handful of lines:

```python
outline = index.outline()

node_id = llm(f"""
Document outline:
{outline}

Question: {user_query}

Return only the node_id of the most relevant section. Nothing else.
""").strip()

result = index.get_node(node_id)
# Pass result.text to your LLM for the final answer
```

## The benchmarks

I ran three benchmark suites across three document sizes — a 42KB single article, a 395KB multi-article corpus, and a 1055KB large corpus. 500 iterations per build test, 1000 random lookups per retrieval test.
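A simplified stand-in for that harness (not the actual notebook; `fn` is any callable, such as an index build or a batch of lookups) might look like:

```python
import statistics
import time

def benchmark(fn, iterations=500):
    """Time fn() repeatedly and summarize mean, stdev, p99, and max in ms."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    ordered = sorted(samples)
    return {
        "mean": statistics.mean(ordered),
        "stdev": statistics.stdev(ordered),
        "p99": ordered[max(0, int(len(ordered) * 0.99) - 1)],
        "max": ordered[-1],
    }

# Trivial workload for illustration; in the real suite fn would wrap
# PageIndex.from_file(...) or a batch of get_node() calls.
stats = benchmark(lambda: sum(range(1000)))
```

Reporting p99 and max alongside the mean matters because a mean can look fine while occasional multi-millisecond spikes dominate the tail.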
The full notebook is in the repo.

### Index build speed

| Document size | Rust mean | Python mean | Speedup |
|---|---|---|---|
| 42 KB | 0.207 ms | 0.153 ms | 0.74x ❌ |
| 395 KB | 0.873 ms | 1.369 ms | 1.57x |
| 1055 KB | 2.549 ms | 4.278 ms | 1.68x |

Below ~200KB, PyO3 FFI overhead cancels the parsing speedup — Rust actually loses at small scale. I’m reporting this honestly because benchmarks that only show wins aren’t useful. At realistic document sizes the picture flips.

The more important number is consistency. This is what production systems actually care about:

| Document size | Rust p99 | Python p99 | Rust max | Python max |
|---|---|---|---|---|
| 42 KB | 1.3 ms | 0.2 ms | 17.4 ms | 0.4 ms |
| 395 KB | 1.1 ms | 1.5 ms | 1.3 ms | 1.6 ms |
| 1055 KB | 2.8 ms | 21.0 ms | 3.7 ms | 42.9 ms |

At 1055KB, Python’s p99 is 21ms and its max is 42ms.
Rust’s p99 is 2.8ms and max is 3.7ms. Python’s standard deviation at that size is 2.78ms versus Rust’s 0.10ms — 27x more variable. In a pipeline processing hundreds of documents, those spikes accumulate into real latency.

### Node retrieval speed

This is where the O(1) vs O(n) gap shows most clearly:

| Document size | Nodes | Rust mean | Python mean | Speedup |
|---|---|---|---|---|
| 42 KB | 28 | 0.0072 ms | 0.0060 ms | 0.83x |
| 395 KB | 261 | 0.0119 ms | 0.0272 ms | 2.29x |
| 1055 KB | 765 | 0.0216 ms | 0.0686 ms | 3.18x |

At 28 nodes, linear scan is fast enough that the HashMap overhead tips Rust slightly negative. At 765 nodes, Rust is 3.18x faster. The gap keeps widening — at 5000 nodes in a combined corpus it would be around 10x.

### Answer accuracy

I tested both on 10 financial questions against a ~3MB document corpus, using the same LLM for both:

| Implementation | Correct |
|---|---|
| pageindex-rs | 9 / 10 |
| PageIndex (Python) | 7 / 10 |

The accuracy difference comes down to node ID format. `1.2.3` gives the LLM structural signal for free. `0012` does not. Small design decisions compound.
## What I learned

**Benchmark before you build.** The small-document results prove that Rust isn’t automatically faster — FFI overhead is real and it dominates at small scales. If your documents are consistently under 200KB, the Python library is probably fine.

**Consistency matters more than mean speed.** The headline speedup numbers are nice, but the stdev and p99 tell the real story for production. A system that’s 1.68x faster on average but 27x more consistent in stdev is a much better choice than the mean alone suggests.

**Node ID design affects LLM behavior.** I didn’t expect the dot-notation change to move accuracy by two questions out of ten, but it did. How you present structure to an LLM matters in ways that are hard to predict without actually running the experiment.

## Try it

```
pip install pageindex-rs
```

- GitHub: https://github.com/Manojython/pageindex-rs
- PyPI: https://pypi.org/project/pageindex-rs/
- Original PageIndex by VectifyAI: https://github.com/VectifyAI/PageIndex
- Agent Native’s article that started this: Vectorless RAG for Agents

Thanks for reading 😄