mem1.wiki

HyDE — Hypothetical Document Embeddings

Query expansion technique. Instead of embedding the query directly, ask an LLM to draft a hypothetical answer, then embed that answer and use it as the search vector. Often improves recall on short, abstract queries.

Taxonomy: methods.retrieval
Origin: Gao et al. (2022)
Primary source: https://arxiv.org/abs/2212.10496
Domain: retrieval
Maturity: established
Primary artifacts: hypothetical answer, expanded query embedding

Core idea

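Short queries embed poorly. A few keywords or a terse question carry little of the vocabulary and phrasing that relevant documents use, so the query vector can land far from the passages it should match. HyDE asks an LLM to generate a plausible answer to the query, then uses that answer’s embedding for retrieval. The hypothetical text does not need to be correct, only shaped like a real answer: the search still returns real documents from the corpus, and the draft merely moves the query vector into their neighborhood.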
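A minimal, dependency-free sketch of that flow follows. It assumes a toy bag-of-words `embed` in place of a real dense encoder and a canned `fake_llm` in place of a model call. These names, along with `hyde_search`, `dense_search`, and the prompt wording, are hypothetical and are not any library’s API. The original paper averages the embeddings of several sampled drafts together with the query’s own embedding. The `n_hypotheses` and `include_query` parameters are meant to mirror that.

```python
import math
import re
from collections import Counter
from typing import Callable

Vector = dict[str, float]


def normalize(v: Vector) -> Vector:
    """Scale a sparse vector to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in v.values())) or 1.0
    return {k: w / norm for k, w in v.items()}


def embed(text: str) -> Vector:
    """Toy stand-in (assumption) for a dense encoder: unit-length bag of words."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return normalize({tok: float(c) for tok, c in counts.items()})


def mean_vector(vectors: list[Vector]) -> Vector:
    """Average several embeddings, then renormalize to unit length."""
    total: Vector = {}
    for v in vectors:
        for k, w in v.items():
            total[k] = total.get(k, 0.0) + w / len(vectors)
    return normalize(total)


def cosine(a: Vector, b: Vector) -> float:
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(w * b.get(k, 0.0) for k, w in a.items())


def rank(q_vec: Vector, corpus: list[str], k: int) -> list[str]:
    """Return the k corpus documents closest to q_vec."""
    return sorted(corpus, key=lambda doc: cosine(q_vec, embed(doc)), reverse=True)[:k]


def dense_search(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Baseline: embed the raw query directly."""
    return rank(embed(query), corpus, k)


def hyde_search(
    query: str,
    corpus: list[str],
    generate: Callable[[str], str],
    n_hypotheses: int = 1,
    include_query: bool = True,
    k: int = 1,
) -> list[str]:
    """HyDE: search with the embedding of LLM-written hypothetical answers."""
    # Hypothetical prompt wording; any "write a passage that answers" prompt fits.
    prompt = f"Write a short passage that answers the question.\nQuestion: {query}\nPassage:"
    # A real LLM would sample with temperature > 0 so the drafts differ.
    vectors = [embed(generate(prompt)) for _ in range(n_hypotheses)]
    if include_query:
        vectors.append(embed(query))  # treat the query itself as one more hypothesis
    # The drafts only position the query vector; results are real corpus documents.
    return rank(mean_vector(vectors), corpus, k)


# Usage with a canned stand-in for the LLM (hypothetical).
CORPUS = [
    "To recover your account, open the login page, click forgot password "
    "and follow the emailed link.",
    "How do I change my account username? Usernames can be changed once per month.",
]


def fake_llm(prompt: str) -> str:
    # Ignores the prompt; a real model would write its own, possibly wrong, answer.
    return ("Open the login page, click forgot password, and follow the link "
            "we email you to recover your account.")


query = "how do I get back into my account"
print(dense_search(query, CORPUS))           # the username document
print(hyde_search(query, CORPUS, fake_llm))  # the account-recovery document
```

With the raw query, the toy encoder ranks the username document first because it shares the question words "how", "do", "I", "my", and "account". The hypothetical answer shares vocabulary with the recovery instructions instead, so HyDE surfaces the right passage even though nobody checked the draft for accuracy.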

When it helps

When it doesn’t

How HyDE — Hypothetical Document Embeddings compares

AI-generated editorial comparisons against nearest peers (glm-4.6).

vs GraphRAG

GraphRAG and HyDE trade ingestion effort against query-time cost in opposite directions, and they solve different retrieval problems. GraphRAG requires expensive upfront graph construction to connect entities across a dataset. HyDE instead shifts the cost to query time, generating a hypothetical answer to bridge the gap between a short query and dense document vectors.

GraphRAG wins hands-down on global questions about an entire dataset, such as "What are the systemic risks in this financial report?", because its community summaries provide context that no single chunk contains. For a simple lookup like "reset password", however, HyDE is the better fit: graph traversal is overkill, while HyDE’s hypothetical answer anchors the vector search effectively. Use GraphRAG when you need structural reasoning and complex queries. Stick with HyDE when you need better semantic recall on short, abstract queries without the heavy infrastructure of a knowledge graph.