Anti-pattern

Single-retriever pipeline

Trusting one retriever (usually a dense vector index) to handle every query shape. It works on demos but fails on long-tail queries: keyword-heavy lookups, code, IDs, and exact phrases.

Taxonomy: anti_patterns.retrieval
Severity: common
Symptom: Demo queries return great results, but production logs show clusters of "no relevant chunks" failures on lookups for product names, error codes, or exact phrases.
Root cause: Dense embeddings smear out lexical signals. Short queries containing rare tokens (SKUs, function names, version strings) lose those tokens in the embedding manifold and retrieve loosely related text instead.
Fix: Add a sparse retriever (BM25 or SPLADE) and fuse with RRF, or route queries through a classifier that picks the right retriever per shape.
First documented: 2023
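
The routing option in the fix can be illustrated with a rule-based stand-in for the classifier. This is a minimal sketch, not a trained model: the regex patterns, the `pick_retriever` name, the "sparse" / "hybrid" / "dense" labels, and the two-word threshold are all assumptions chosen for illustration, and a real deployment would likely tune or replace them based on its own query logs.

```python
import re

# Hypothetical patterns for query shapes that carry strong lexical signals.
LEXICAL_PATTERNS = [
    re.compile(r'"[^"]+"'),                              # quoted exact phrase
    re.compile(r"\b[A-Z]{2,}[-_]?\d+\b"),                # SKU or error code, e.g. ERR-4012
    re.compile(r"\bv?\d+\.\d+(?:\.\d+)*\b"),             # version string, e.g. 2.4.1
    re.compile(r"\b[A-Za-z0-9]+(?:_[A-Za-z0-9]+)+\b"),   # snake_case identifier
    re.compile(r"\b[a-z]+[A-Z][A-Za-z]*\b"),             # camelCase identifier
]


def pick_retriever(query: str) -> str:
    """Return which retriever a query should be sent to.

    Assumed policy: lexical-looking queries go to the sparse retriever,
    very short queries go to both (hybrid), everything else goes to dense.
    """
    if any(pattern.search(query) for pattern in LEXICAL_PATTERNS):
        return "sparse"
    if len(query.split()) <= 2:
        # One or two words may be a single rare token; hedge by using both.
        return "hybrid"
    return "dense"


routes = {
    q: pick_retriever(q)
    for q in [
        "ERR-4012",
        "parse_config timeout",
        "upgrade to 2.4.1",
        "billing",
        "how do I reset my password",
    ]
}
```

With these assumed rules, the error code, the snake_case identifier, and the version string are routed to the sparse retriever, the one-word query goes to both retrievers, and the natural-language question goes to dense retrieval.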

Why it keeps happening

Most blog posts show only dense retrieval. Adding a second retriever feels like extra work for marginal gain, until you hit production traffic and see how many queries consist of a single rare token.

Fix sketch
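
A minimal sketch of the fusion half of the fix, assuming each retriever already returns a ranked list of document IDs. Reciprocal Rank Fusion (RRF) scores a document as the sum of 1 / (k + rank) over every list it appears in, so it needs no score normalization between dense and sparse retrievers. The function name `rrf_fuse`, the sample document IDs, and the two hit lists are hypothetical; k = 60 is the commonly used default, not a tuned value.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=10):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first.
    k: smoothing constant; larger values flatten the rank weighting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:top_n]]


# Hypothetical results for the same query from two retrievers.
dense_hits = ["doc_overview", "doc_faq", "doc_changelog"]
sparse_hits = ["doc_sku_4417", "doc_changelog", "doc_overview"]

fused = rrf_fuse([dense_hits, sparse_hits])
# fused == ["doc_overview", "doc_changelog", "doc_sku_4417", "doc_faq"]
```

Documents found by both retrievers rise to the top, while the exact-match hit that only the sparse retriever found (`doc_sku_4417`) still makes the fused list instead of being dropped.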