AI · Mar 16, 2026 · 9 min read

RAG is mostly retrieval, and retrieval is mostly boring

Everyone tunes the prompt and ignores the retrieval, which is exactly backwards for a system that lives on what it can find.

RAG systems fail in the field for reasons that have almost nothing to do with the model. The model gets blamed because it's the part that talks. But when an assistant confidently cites the wrong policy, or insists a feature doesn't exist when it shipped last quarter, the model usually did its job: it answered the question using the context it was handed. The context was wrong. The retriever put the wrong three paragraphs in front of it, and no amount of prompt engineering fixes a system that was looking in the wrong place.

This is the part nobody wants to own. Retrieval is unglamorous. It's chunking decisions and embedding choices and a reranker you forgot to turn on. There's no demo in it. So teams pour their attention into the prompt — the visible, tweakable, satisfying part — and leave the half of the system that actually determines the answer running on defaults.

The prompt is downstream of the retrieval

Think about what the model actually receives. A user asks a question, some retrieval step pulls a handful of passages, and those passages get pasted into the prompt above the question. The model then answers from what's in front of it. If the right passage never made it into that handful, the model has two options: refuse, or make something up. Most models, tuned to be helpful, make something up.

So the ceiling on answer quality is set before the model generates a single token. You can have the best system prompt ever written, and the model will still faithfully reason over garbage. I've watched a team spend two weeks rewriting instructions ("cite your sources," "say you don't know," "be concise") to fix answers that were wrong because the retriever was returning the FAQ page for every query that contained the word "refund," regardless of what the question was actually about. The prompt was never the problem.

The uncomfortable implication: most of your evaluation budget belongs upstream. Before you ask "did the model answer well," ask "was the answer even in the passages we retrieved." If it wasn't, the model never had a chance, and grading its output tells you nothing.
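A minimal sketch of that upstream check, assuming you have a small hand-labeled set where each question lists the IDs of the passages that actually contain its answer. The `retrieve` callable, the passage IDs, and the toy retriever are hypothetical stand-ins, not part of any particular library.

```python
from typing import Callable, Iterable


def hit_rate_at_k(
    labeled: Iterable[tuple[str, set[str]]],
    retrieve: Callable[[str, int], list[str]],
    k: int = 5,
) -> float:
    """Fraction of questions with at least one known-correct passage in the top k."""
    hits = 0
    total = 0
    for question, relevant_ids in labeled:
        total += 1
        top_k = retrieve(question, k)[:k]
        if relevant_ids & set(top_k):
            hits += 1
    return hits / total if total else 0.0


def toy_retrieve(question: str, k: int) -> list[str]:
    # Hypothetical broken retriever: anything mentioning "refund" gets the FAQ first.
    if "refund" in question.lower():
        return ["faq", "billing-policy", "shipping"][:k]
    return ["changelog", "faq"][:k]


# Hypothetical labeled set: (question, IDs of passages that contain the answer).
labeled = [
    ("How long do refunds take?", {"billing-policy"}),
    ("Is a refund possible after 90 days?", {"returns-policy"}),
    ("When did dark mode ship?", {"changelog"}),
]

for k in (1, 3):
    print(k, hit_rate_at_k(labeled, toy_retrieve, k=k))
```

On this toy data the hit rate is 1 of 3 at k=1 and 2 of 3 at k=3. The second question misses at every k, which is exactly the case where grading the model's output tells you nothing.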

Retrieval is boring on purpose

Here is the part that disappoints people. Fixing retrieval is rarely a clever idea. It's a sequence of dull, measurable improvements, each worth a few points of recall, that add up to a system that actually finds things.

→ Chunk on structure, not on character count. Splitting a document every 500 characters cuts sentences in half and orphans the heading from the paragraph it governs. Split on sections, keep the heading with its body.
→ Keep the metadata. A chunk that knows its document title, its date, and its product area can be filtered before it's ever scored. Half of "the retriever is bad" is actually "the retriever had no way to know this chunk was from a 2019 doc that's since been superseded."
→ Add a reranker. Embedding similarity is a coarse first pass; a cross-encoder that reads the query and the passage together will reorder the top fifty into a top five that's dramatically better. This is the single highest-leverage change most systems are missing.
→ Measure recall at k, not vibes. Build a set of real questions with known correct passages and check whether they show up in the top k. Without this number you are tuning blind.
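The first two items can be sketched together. This assumes Markdown-style `#` headings mark section boundaries, which will not hold for every corpus; the `Chunk` fields, function names, and sample documents are illustrative only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    heading: str
    body: str
    doc_title: str
    doc_date: date
    product_area: str

    @property
    def text(self) -> str:
        # The heading travels with its body so the chunk stays self-describing.
        return f"{self.heading}\n{self.body}".strip()


def chunk_by_section(
    doc: str, doc_title: str, doc_date: date, product_area: str
) -> list[Chunk]:
    """Split on heading lines (assumed to start with '#'), one chunk per section."""
    sections: list[tuple[str, list[str]]] = [("", [])]
    for line in doc.splitlines():
        if line.startswith("#"):
            sections.append((line.lstrip("#").strip(), []))
        else:
            sections[-1][1].append(line)

    chunks = []
    for heading, lines in sections:
        body = "\n".join(lines).strip()
        if body:  # sections with a heading but no body text are dropped
            chunks.append(Chunk(heading, body, doc_title, doc_date, product_area))
    return chunks


def filter_chunks(chunks: list[Chunk], product_area: str, not_before: date) -> list[Chunk]:
    """Metadata filter applied before any similarity scoring."""
    return [
        c for c in chunks
        if c.product_area == product_area and c.doc_date >= not_before
    ]


# Hypothetical documents: one current, one superseded.
current = chunk_by_section(
    "# Refunds\nRefunds go to the original payment method.\n\n"
    "# Cancellations\nCancel any time from settings.\n",
    "Billing policy", date(2025, 1, 10), "billing",
)
stale = chunk_by_section(
    "# Refunds\nOld refund rule.", "Billing policy (old)", date(2019, 3, 1), "billing"
)
candidates = filter_chunks(current + stale, "billing", not_before=date(2024, 1, 1))
print([c.heading for c in candidates])
```

The date cutoff is the crudest possible way to drop superseded documents; an explicit "superseded by" field would be more precise if your corpus has one.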

None of that is exciting. All of it works. The reason retrieval stays broken in so many products is not that the fixes are hard — it's that they're tedious, and tedious work doesn't get prioritized until something is on fire.

You cannot prompt your way out of a passage that was never retrieved.

Hybrid search, because words still matter

The field swung hard toward dense vector search and quietly forgot that keyword matching was solving real problems. Embeddings are wonderful at "this means roughly the same thing." They are unreliable at exact tokens — a part number, an error code, a person's surname, the literal string ECONNRESET. Ask a vector index for "error ECONNRESET" and it will happily return passages about network errors in general, none of which contain the string you needed.

The fix is not to choose. Run both — dense retrieval for meaning, sparse keyword retrieval for exact terms — and merge the results before reranking. Hybrid search is more moving parts and it is worth every one of them, because real queries are a mix of concept and literal, and a system that only does one will be confidently wrong on the other half.
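There are several ways to do the merge. One common option is reciprocal rank fusion (RRF), which uses only the rank positions from each retriever, so dense and sparse scores never have to be put on the same scale. The sketch below assumes each retriever returns an ordered list of passage IDs; the IDs are made up, and `k=60` is the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists; each appearance adds 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the query "error ECONNRESET".
dense = ["net-errors-overview", "timeouts-guide", "econnreset-troubleshooting"]
sparse = ["econnreset-troubleshooting", "socket-faq"]

fused = reciprocal_rank_fusion([dense, sparse])
print(fused)
```

Here the passage containing the literal string ranks last in the dense list but first in the fused list, because it is the only one both retrievers returned. The fused list is what you would hand to the reranker.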

Show your work, or don't ship it

Because retrieval is where the answer is won or lost, it is also where the product earns or loses trust. If the assistant can show the passages it used, two things happen: the user can verify, and you can debug. When an answer is wrong, the first question is never "what was the prompt." It's "what did we retrieve." A product that surfaces its sources turns every wrong answer into a diagnosable retrieval bug instead of a mysterious model failure.

This is also the cheapest honesty mechanism you have. A confidence signal next to an answer, with the sources it leaned on, lets a user calibrate at a glance.

[Figure: answer card mock-up]
Generated answer: "Refunds are issued within 5–7 business days to the original payment method."
Confidence: 71%
Sources: help.center/refunds, billing-policy-2025

Fig. 1 — The answer is only as trustworthy as the passages beside it.
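One way to make that possible is to have the sources travel with the answer as data instead of being reconstructed later. A rough sketch, with hypothetical type and field names; how the confidence value is estimated is deliberately left out, since that depends on your pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    label: str    # what the user sees, e.g. a URL or document ID
    passage: str  # the exact retrieved text, kept for debugging


@dataclass
class Answer:
    text: str
    confidence: float  # 0.0 to 1.0, supplied by the caller
    sources: list[Source] = field(default_factory=list)


def render(answer: Answer) -> str:
    """Plain-text rendering of an answer card with its sources."""
    lines = [answer.text, f"Confidence: {answer.confidence:.0%}"]
    if answer.sources:
        lines.append("Sources: " + ", ".join(s.label for s in answer.sources))
    else:
        # An answer with nothing behind it should say so.
        lines.append("Sources: none retrieved")
    return "\n".join(lines)


answer = Answer(
    text="Refunds are issued within 5-7 business days to the original payment method.",
    confidence=0.71,
    sources=[
        Source("help.center/refunds", "(retrieved passage text)"),
        Source("billing-policy-2025", "(retrieved passage text)"),
    ],
)
print(render(answer))
```

Keeping the raw `passage` alongside the label means that "what did we retrieve" can be answered from the stored response alone.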

The teams that get RAG right are not the ones with the cleverest prompts. They're the ones who treated retrieval as the actual product and accepted that the actual product is mostly plumbing. Spend your time where the answer is decided. That's upstream, in the boring part, where the right passage either makes it into the context or it doesn't.

#AI #RAG #Search

→ / AUTHOR

Ionut Dumitru

Full-stack engineer and product designer. Writes about building products where the engineering and the design are the same job.
