mem1.wiki

GraphRAG

Retrieval method that builds an entity-relation graph from the corpus, summarises graph communities, and uses those summaries plus graph traversal alongside vector search.

Source: https://arxiv.org/abs/2404.16130
Taxonomy: methods.rag
Origin: Microsoft Research (2024)
Primary source: https://arxiv.org/abs/2404.16130
Domain: rag
Maturity: established
Primary artifacts: entity graph, community summaries, hierarchical clusters

Core idea

Naive RAG retrieves chunks by vector similarity. GraphRAG adds a structural layer:

  1. Extract entities and relations from each chunk with an LLM pass.
  2. Cluster the graph into communities (Leiden/Louvain).
  3. Summarise each community with an LLM — these summaries become first-class retrievable units alongside raw chunks.
  4. Retrieve by combining vector hits, graph neighbourhood traversal, and matching community summaries.
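The four steps above can be sketched end to end in a few dozen lines. This is a minimal illustration under stated assumptions, not the reference implementation: the triples are hard-coded where an LLM extraction pass would run, connected components stand in for Leiden/Louvain clustering, `summarise` is a placeholder for an LLM call, and the vector hits are passed in as a plain list of chunk ids. All function and variable names are hypothetical.

```python
from collections import defaultdict

# Step 1 (assumed output): (chunk_id, subject, relation, object) triples that an
# LLM extraction pass might return. Hard-coded so the sketch runs offline.
TRIPLES = [
    ("c1", "Acme", "acquired", "Beta Labs"),
    ("c2", "Beta Labs", "develops", "Widget"),
    ("c3", "Zeta Bank", "regulated_by", "FinReg"),
]

def build_graph(triples):
    """Undirected adjacency map plus an index from entity to source chunks."""
    adj, chunks_of = defaultdict(set), defaultdict(set)
    for chunk_id, subj, _rel, obj in triples:
        adj[subj].add(obj)
        adj[obj].add(subj)
        chunks_of[subj].add(chunk_id)
        chunks_of[obj].add(chunk_id)
    return adj, chunks_of

def communities(adj):
    """Step 2 stand-in: connected components, NOT Leiden/Louvain."""
    seen, result = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adj[node] - members)
        seen |= members
        result.append(members)
    return result

def summarise(members):
    """Step 3 stand-in: a real system would prompt an LLM here."""
    return "Community about: " + ", ".join(sorted(members))

def retrieve(query_entities, vector_hits, adj, chunks_of, summaries):
    """Step 4: union of vector hits, 1-hop graph neighbourhood, and summaries."""
    chunk_ids = set(vector_hits)
    for entity in query_entities:
        for node in {entity} | adj.get(entity, set()):
            chunk_ids |= chunks_of.get(node, set())
    matched = [s for members, s in summaries if members & set(query_entities)]
    return sorted(chunk_ids), matched

adj, chunks_of = build_graph(TRIPLES)
summaries = [(m, summarise(m)) for m in communities(adj)]
chunk_ids, matched = retrieve(["Acme"], ["c3"], adj, chunks_of, summaries)
print(chunk_ids)
print(matched)
```

In this toy run the query entity pulls in chunks from its graph neighbours that the vector hit alone would have missed, and the summary of its community is returned as an extra retrievable unit.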

Why it matters

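Vector search alone struggles with global questions (“what are the main themes across this corpus?”) because the answer is not in any single chunk. Community summaries give the retriever a coarse-to-fine ladder: summaries of large communities cover corpus-wide themes, while summaries of smaller communities and the raw chunks supply detail.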
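One way to picture the coarse-to-fine ladder is as a tree of community summaries that a retriever descends only as far as the question requires. The sketch below is an assumption-laden illustration: the `Community` class, the `descend` helper, and the word-overlap score are all hypothetical, and a real system would use embeddings or an LLM rating instead of shared words.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Community:
    summary: str
    children: list[Community] = field(default_factory=list)

def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def descend(root: Community, query: str) -> list[str]:
    """Start at the coarsest summary and follow the best-matching child down."""
    path, node = [root.summary], root
    while node.children:
        node = max(node.children, key=lambda c: overlap(query, c.summary))
        path.append(node.summary)
    return path

# Hypothetical two-level hierarchy of community summaries.
root = Community(
    "corpus covers finance and engineering themes",
    [
        Community(
            "finance themes: credit risk and liquidity risk",
            [
                Community("liquidity risk in short-term funding"),
                Community("credit risk in consumer loans"),
            ],
        ),
        Community("engineering themes: deployment and testing"),
    ],
)

path = descend(root, "credit risk exposure")
for rung in path:
    print(rung)
```

A corpus-wide question can be answered from the first rung or two of the returned path, while a narrow question uses the last rung as a pointer to the relevant chunks.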

Implementations

Trade-offs

How GraphRAG compares

AI-generated editorial comparisons against nearest peers (glm-4.6). Cached at build time; regenerate via node scripts/build-comparisons.mjs.

vs HyDE (Hypothetical Document Embeddings)

GraphRAG and HyDE (Hypothetical Document Embeddings) address different retrieval problems and pay their costs at different times. GraphRAG pays at ingestion: it runs LLM passes up front to build the entity graph and community summaries that connect entities across the dataset. HyDE pays at query time: for each query an LLM drafts a hypothetical answer document, that draft is embedded, and the search runs on the resulting vector, which narrows the gap between a short query and the longer documents in the index.

GraphRAG is the stronger choice for global questions about an entire dataset, such as "What are the systemic risks in this financial report?", because community summaries carry context that no single chunk contains. For a simple lookup like "reset password", HyDE is usually the better fit: graph traversal is overkill, while the hypothetical answer gives the vector search a richer anchor than the two-word query. Use GraphRAG when you need structural reasoning over complex queries; prefer HyDE when you need better semantic recall for short or abstract queries without the infrastructure of a knowledge graph.
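For contrast with the graph pipeline, the HyDE side of the comparison can be sketched as follows. This is a toy illustration, assuming a canned `fake_llm` in place of a real generator and a bag-of-words `embed` in place of a dense encoder; both names, the documents, and the canned answer are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a real system would call a dense encoder."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def fake_llm(query: str) -> str:
    """Hypothetical stand-in for the LLM that drafts a plausible answer."""
    canned = {
        "reset password": "to reset your password open account settings "
                          "and choose forgot password",
    }
    return canned.get(query, query)

def hyde_search(query: str, docs: list[str]) -> str:
    """Embed the hypothetical answer, not the raw query, then rank documents."""
    query_vec = embed(fake_llm(query))
    return max(docs, key=lambda d: cosine(query_vec, embed(d)))

DOCS = [
    "open account settings and choose forgot password to receive a recovery email",
    "quarterly report on liquidity risk and funding",
]

best = hyde_search("reset password", DOCS)
raw_score = cosine(embed("reset password"), embed(DOCS[0]))
hyde_score = cosine(embed(fake_llm("reset password")), embed(DOCS[0]))
print(best)
print(raw_score < hyde_score)
```

No index-time work beyond ordinary embedding is needed here, which is the ingestion-versus-query-time contrast drawn above; the cost is one extra generation call per query.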