Method

HyDE — Hypothetical Document Embeddings

A query expansion technique: instead of embedding the query directly, ask an LLM to draft a hypothetical answer and embed that. It often improves recall on short, abstract queries.

Source: https://arxiv.org/abs/2212.10496
Taxonomy: methods.retrieval
Origin: Gao et al. (2022)
Primary source: https://arxiv.org/abs/2212.10496
Domain: retrieval
Maturity: established
Primary artifacts: hypothetical answer, expanded query embedding

Core idea

Short queries often embed poorly because they carry little of the vocabulary and phrasing found in the documents they should match. HyDE asks an LLM to generate a plausible answer to the query, then uses that answer’s embedding for retrieval. The hypothetical text doesn’t need to be factually correct; it only needs to resemble the real answer in wording and shape.
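The flow above can be sketched in a few lines. This is a minimal illustration, not the reference implementation: `embed` is a toy bag-of-words stand-in for a real dense encoder, `fake_llm` is a hypothetical stub that returns a canned draft instead of calling a model, and the corpus is invented for the example.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a dense encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(text: str, corpus: List[str], top_k: int = 1) -> List[str]:
    """Rank corpus documents by similarity to the embedding of `text`."""
    vec = embed(text)
    ranked = sorted(corpus, key=lambda doc: cosine(vec, embed(doc)), reverse=True)
    return ranked[:top_k]


def fake_llm(query: str) -> str:
    """Hypothetical stub for the LLM call; a real system would prompt a model here."""
    return ("Agents persist memory across sessions by writing summaries of past "
            "conversations to a vector store and retrieving them at the start "
            "of a new session.")


def hyde_search(query: str, corpus: List[str],
                generate: Callable[[str], str], top_k: int = 1) -> List[str]:
    """HyDE: draft a hypothetical answer, then retrieve with the answer's embedding."""
    hypothetical_answer = generate(query)
    return search(hypothetical_answer, corpus, top_k)


# Invented example corpus.
CORPUS = [
    "Long-term state is kept by storing conversation summaries in a vector "
    "store and retrieving them when a new session starts.",
    "How to handle HTTP sessions and cookies in a web framework.",
    "Agents can call tools such as search and code execution.",
]
QUERY = "how do agents handle memory across sessions?"

direct_hit = search(QUERY, CORPUS)[0]              # embeds the raw query
hyde_hit = hyde_search(QUERY, CORPUS, fake_llm)[0]  # embeds the drafted answer
```

With this toy encoder, the raw query lands on the HTTP sessions document because it shares surface words with it, while the drafted answer shares vocabulary with the document that actually describes agent memory. A real dense encoder would behave less crudely, but the mechanism is the same.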

When it helps

Open-ended questions (“how do agents handle memory across sessions?”).
Cross-lingual or jargon-heavy corpora.

When it doesn’t

Lookup-style queries where the query already contains the literal token you want to match.
Cost-sensitive paths — adds one LLM call per query.
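One way to act on the two lists above is a small router that decides per query whether the extra LLM call is worth making. The sketch below is an assumption-laden heuristic, not part of HyDE itself: the patterns in `LITERAL_PATTERNS`, the function name `should_use_hyde`, and the `llm_budget_ok` flag are all hypothetical and would need tuning for a real corpus.

```python
import re

# Hypothetical signals that a query already contains a literal token to match.
LITERAL_PATTERNS = [
    re.compile(r'"[^"]+"'),            # quoted phrase, e.g. "reset password"
    re.compile(r"\b\w+_\w+\b"),        # snake_case identifier
    re.compile(r"\b[A-Z]+-?\d+\b"),    # error or ticket code, e.g. ERR-404
]


def should_use_hyde(query: str, llm_budget_ok: bool = True) -> bool:
    """Return True when drafting a hypothetical answer is likely worth one LLM call."""
    if not llm_budget_ok:
        # Cost-sensitive path: skip the extra call and embed the query directly.
        return False
    # Lookup-style query: the literal token should match without expansion.
    return not any(p.search(query) for p in LITERAL_PATTERNS)
```

Queries routed away from HyDE would fall back to embedding the raw query, or to keyword search if the system has one.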

How HyDE — Hypothetical Document Embeddings compares

AI-generated editorial comparisons against nearest peers (glm-4.6). Cached at build time; regenerate via node scripts/build-comparisons.mjs.

vs GraphRAG

GraphRAG and HyDE — Hypothetical Document Embeddings pay their costs at different stages and solve different retrieval problems. GraphRAG requires expensive, upfront graph construction to connect entities across a dataset, whereas HyDE shifts the cost to query time, generating a hypothetical answer to bridge the gap between a short query and dense document vectors.

GraphRAG is the stronger choice for global questions about an entire dataset, such as "What are the systemic risks in this financial report?", because community summaries provide context no single chunk possesses. For a short, underspecified query like "reset password," HyDE is usually the better fit: graph traversal is overkill, and the hypothetical answer gives the vector search more text to match against. If the query already contains the exact token you want to match, though, HyDE adds an LLM call without much benefit. Use GraphRAG when you need structural reasoning over connected entities, and HyDE when you need better semantic recall for short, abstract queries without the heavy infrastructure of a knowledge graph.
