Pattern

Cross-encoder reranker

Two-stage retrieval. First, approximate nearest neighbor (ANN) search returns the top-K candidates. Then a cross-encoder scores each (query, candidate) pair jointly and re-orders the candidates by relevance.

Taxonomy: patterns.ranking
Category: ranking
Complexity: medium
When to use: The relevant documents are usually somewhere in the ANN top-K but their ordering is noisy, and you can afford a few hundred ms of extra latency for a more accurate top 3.
When NOT to use: Single-result lookups, or strict latency budgets under 50 ms total.

What it is

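A bi-encoder embeds the query and each document independently, so document vectors can be precomputed and indexed: fast, but lossy, because the two texts never see each other inside the model. A cross-encoder takes (query, document) as one input and scores the pair jointly. Nothing can be precomputed, so it is slower, but the ranking is usually much sharper. The standard recipe: bi-encoder for retrieval (top-50 to top-200), cross-encoder for reranking down to the top-3 to top-10.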
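The two-stage recipe can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production implementation: `retrieve` uses brute-force cosine similarity as a stand-in for an ANN index, and `toy_pair_score` is a hypothetical placeholder for a real cross-encoder (it only counts query-token overlap). All function names here are invented for the example.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: Vector, doc_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Stage 1: rank documents by precomputed vectors (brute force, standing in for ANN)."""
    ranked = sorted(doc_vecs, key=lambda doc: cosine(query_vec, doc_vecs[doc]), reverse=True)
    return ranked[:k]


def rerank(
    query: str,
    candidates: List[str],
    pair_score: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Stage 2: score every (query, candidate) pair jointly and keep the best top_n."""
    scored = [(doc, pair_score(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def toy_pair_score(query: str, doc: str) -> float:
    """Placeholder for a cross-encoder: fraction of query tokens that appear in the document."""
    query_tokens = query.lower().split()
    doc_tokens = set(doc.lower().split())
    if not query_tokens:
        return 0.0
    return sum(token in doc_tokens for token in query_tokens) / len(query_tokens)


# Hypothetical corpus: document text mapped to a made-up 2-d embedding.
doc_vecs = {
    "reset your password from the login page": [0.9, 0.1],
    "password policy for administrators": [0.95, 0.05],
    "quarterly sales report": [0.0, 1.0],
}

candidates = retrieve([1.0, 0.0], doc_vecs, k=2)
top = rerank("reset password", candidates, toy_pair_score, top_n=1)
print(candidates)
print(top)
```

In this toy corpus, stage one places the policy document first because its vector is slightly closer to the query, and the pairwise scorer then promotes the document that actually answers the query. Swapping `toy_pair_score` for a real model changes only the callable passed to `rerank`.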

Common cross-encoders

BGE reranker series (open weights, multilingual).
Cohere Rerank (managed API, strong baseline).
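ColBERT-style late-interaction models (strictly a middle ground rather than true cross-encoders: query and document are encoded separately and compared token by token), for higher recall at the cost of more memory, since the index stores one vector per token.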
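To make the late-interaction idea concrete, here is a rough sketch of the MaxSim scoring rule used by ColBERT-style models: each query token takes its best-matching document token, and those maxima are summed. The token vectors below are made-up numbers and `maxsim_score` is a name chosen for this example. A real model would produce the per-token embeddings, and storing them is where the extra memory goes.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Sum, over query tokens, of the best dot product against any document token."""
    if not doc_tokens:
        return 0.0
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Made-up 2-d token embeddings: two query tokens, two document tokens.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[1.0, 0.0], [0.5, 0.5]]
print(maxsim_score(query_tokens, doc_tokens))
```

Because document token vectors do not depend on the query, they can be computed offline, which is what separates this approach from a cross-encoder that must run the model on every (query, document) pair.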

Trade-offs

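Latency is linear in K, because every candidate needs its own forward pass through the model. Reranking 100 candidates with a 6B cross-encoder means 100 such passes per query, which is not free.
Quality lift is largest when the bi-encoder recall is high but precision@1 is low, which is the typical RAG failure mode. A reranker can only reorder what stage one retrieved; it cannot recover a document that is missing from the top-K.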
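A back-of-the-envelope model of how rerank latency grows with K is sketched below. It assumes candidates are scored in fixed-size batches that run one after another, so latency is roughly the number of batches times the time per batch. The batch size and per-batch time are hypothetical placeholders, not measurements of any model; substitute numbers from your own profiling.

```python
import math


def estimated_rerank_latency_ms(k: int, batch_size: int, ms_per_batch: float) -> float:
    """Estimate rerank latency as (number of sequential batches) x (time per batch)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if k <= 0:
        return 0.0
    return math.ceil(k / batch_size) * ms_per_batch


# Hypothetical figures for illustration only: 40 ms per batch of 16 pairs.
for k in (10, 50, 100, 200):
    print(k, estimated_rerank_latency_ms(k, batch_size=16, ms_per_batch=40.0))
```

Under these assumed figures, K = 100 already consumes the "few hundred ms" budget, which is why the usual tuning knob is to shrink K until recall at K starts to drop.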
