Accuracy / Phase_05 / 20 min build

High-Precision RAG with Reranker

A precision-focused RAG architecture. The first-stage vector search deliberately over-retrieves (Top-K 50), and a dedicated reranker model then rescores those candidates and forwards only the most relevant few (Top-K 3), so the generation model receives less noise in its context.

Execution_Steps

  1. Broad Context Retrieval

    Connect an `input` node to an `embedding` node (Cohere embed-english-v3.0), and route the embedding to a `vector_store`. Configure the Vector Store to cast a wide net: set Top-K to 50.

  2. Apply the Reranker

    Attach a `reranker` node immediately after the Vector Store. Open its configuration sheet, select "Cohere" as the provider and "rerank-v3.5" as the model, and set Top-K to 3 so that only the three highest-scoring chunks pass forward.

  3. Generation & Output

    Connect the Reranker to an `llm_call` node (gpt-4o), and finish the chain with an `output` node. When simulating, expect higher P95 latency because the reranker call adds a network hop; weigh that cost against the precision gain.
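
The three steps above can be sketched outside the canvas as a two-stage retrieve-then-rerank function chain. This is a minimal illustration, not the canvas's implementation: the toy two-dimensional vectors stand in for embed-english-v3.0 embeddings, `overlap_score` is a hypothetical stand-in for the rerank-v3.5 model, and `build_prompt` only assembles the text that would be sent to gpt-4o. All function names are assumptions introduced for this example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def broad_retrieve(query_vec: Vector,
                   index: List[Tuple[str, Vector]],
                   top_k: int = 50) -> List[str]:
    """Stage 1: rank every chunk by embedding similarity and keep the top_k."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def rerank(query: str,
           candidates: List[str],
           score: Callable[[str, str], float],
           top_k: int = 3) -> List[str]:
    """Stage 2: rescore each (query, chunk) pair and keep the top_k."""
    ranked = sorted(candidates, key=lambda chunk: score(query, chunk), reverse=True)
    return ranked[:top_k]


def overlap_score(query: str, chunk: str) -> float:
    """Hypothetical stand-in for a reranker model: share of query words found in the chunk."""
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


def build_prompt(query: str, chunks: List[str]) -> str:
    """Assemble the context that would be passed to the generation model."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Toy index: (chunk text, made-up embedding).
index = [
    ("reranker models score query and chunk pairs", [0.9, 0.1]),
    ("vector stores return nearest neighbours", [0.8, 0.2]),
    ("bananas are yellow", [0.1, 0.9]),
    ("a reranker filters noisy chunks", [0.7, 0.3]),
]
query = "how does a reranker filter chunks"

# Small top_k values so the toy data shows the effect; the blueprint uses 50 and 3.
candidates = broad_retrieve([1.0, 0.0], index, top_k=3)
final = rerank(query, candidates, overlap_score, top_k=1)
prompt = build_prompt(query, final)
print(final)  # ['a reranker filters noisy chunks']
```

In this toy run the chunk that the vector stage ranked third ends up first after reranking, which is the behaviour the blueprint relies on: the first stage optimises for recall, the second for precision.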

Expected_Metrics

P50_LATENCY: < 3200 ms
COST_SAVING: 15.0%
SLA_LIMIT: 4500 ms
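
One way to compare simulated runs against these targets is to compute percentiles from the recorded latencies. The sketch below uses the nearest-rank percentile method; the sample values are invented for illustration, and checking the SLA limit against P95 is an assumption, since the blueprint does not say which percentile the SLA applies to.

```python
import math
from typing import Dict, List

P50_TARGET_MS = 3200  # from P50_LATENCY
SLA_LIMIT_MS = 4500   # from SLA_LIMIT


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_latency(samples_ms: List[float]) -> Dict[str, object]:
    """Summarise P50 and P95 and compare them with the blueprint targets."""
    p50 = percentile(samples_ms, 50)
    p95 = percentile(samples_ms, 95)
    return {
        "p50_ms": p50,
        "p95_ms": p95,
        "p50_ok": p50 < P50_TARGET_MS,
        # Assumption: the SLA limit is evaluated against P95.
        "within_sla": p95 <= SLA_LIMIT_MS,
    }


# Invented latencies (ms) from ten hypothetical simulation runs.
samples = [2100, 2400, 2600, 2800, 2900, 3000, 3100, 3300, 3900, 4300]
report = check_latency(samples)
print(report)
```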


Node_Architecture

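| Node | Role | Configuration |
|---|---|---|
| `input` | Query | Text Mode |
| `embedding` | Dense Embedder | embed-english-v3.0 |
| `vector_store` | Broad Retrieval | Top-K: 50 |
| `reranker` | Precision Filter | Cohere rerank-v3.5 / Top-K: 3 |
| `llm_call` | Generator | gpt-4o |
| `output` | Response | Standard |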
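
The same node chain can be written down as data and sanity-checked before it is built on the canvas. The sketch below is purely illustrative: the `Node` class, the settings keys, and the `validate` helper are hypothetical and do not reflect the canvas's actual export format. It checks two things implied by the blueprint: the nodes appear in the stated order, and the reranker does not ask for more chunks than the vector store returns.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Node:
    kind: str                      # node type as named in the blueprint
    label: str                     # role shown in the architecture listing
    settings: Dict[str, object] = field(default_factory=dict)  # hypothetical keys


PIPELINE = [
    Node("input", "Query", {"mode": "text"}),
    Node("embedding", "Dense Embedder", {"model": "embed-english-v3.0"}),
    Node("vector_store", "Broad Retrieval", {"top_k": 50}),
    Node("reranker", "Precision Filter",
         {"provider": "cohere", "model": "rerank-v3.5", "top_k": 3}),
    Node("llm_call", "Generator", {"model": "gpt-4o"}),
    Node("output", "Response", {"format": "standard"}),
]

EXPECTED_ORDER = ["input", "embedding", "vector_store", "reranker", "llm_call", "output"]


def validate(pipeline: List[Node]) -> List[str]:
    """Return a list of problems; an empty list means the chain matches the blueprint."""
    problems = []
    kinds = [node.kind for node in pipeline]
    if kinds != EXPECTED_ORDER:
        problems.append(f"unexpected node order: {kinds}")
    by_kind = {node.kind: node for node in pipeline}
    store = by_kind.get("vector_store")
    reranker = by_kind.get("reranker")
    if store and reranker:
        if reranker.settings.get("top_k", 0) > store.settings.get("top_k", 0):
            problems.append("reranker top_k exceeds the number of retrieved candidates")
    return problems


print(validate(PIPELINE))  # []
```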