Accuracy/Phase_05/20 min build

High-Precision RAG with Reranker

A precision-focused RAG architecture. It broadens initial vector retrieval and applies a dedicated reranker model to filter noise, maximizing the contextual relevance passed to the generation model.

Execution_Steps

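01 Broad Context Retrieval
Connect an `input` node to an `embedding` node (Cohere embed-english-v3.0), and route the embedding to a `vector_store`. Configure the Vector Store to cast a wide net by setting Top-K to 50, so that relevant chunks are unlikely to be missed even if they rank low on raw vector similarity.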
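The broad retrieval step can be sketched in plain Python as a cosine-similarity search that keeps the 50 best matches. This is a minimal illustration, not the canvas implementation: the in-memory `index` list, the `retrieve_top_k` helper, and the toy two-dimensional vectors are all assumptions, and a real setup would use vectors from embed-english-v3.0 and a proper vector store.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_top_k(query_vec, index, top_k=50):
    """index is a list of (chunk_id, vector); returns (chunk_id, score), best first."""
    scored = ((chunk_id, cosine(query_vec, vec)) for chunk_id, vec in index)
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])

# Toy index (hypothetical data) to show the call shape.
index = [("a", [1.0, 0.0]), ("b", [0.7, 0.7]), ("c", [0.0, 1.0])]
hits = retrieve_top_k([1.0, 0.1], index, top_k=2)
print([chunk_id for chunk_id, _ in hits])
```

With Top-K at 50 the result set is deliberately noisy; the point of this stage is recall, and precision is left to the reranker.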
02 Apply the Reranker
Attach a `reranker` node immediately after the Vector Store. Open its configuration sheet, select "Cohere" as the provider and "rerank-v3.5" as the model, and restrict the final output to Top-K: 3. This ensures that only the three highest-relevance chunks of the 50 retrieved pass forward.
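The rerank-then-truncate logic can be sketched as follows. The `overlap_score` function is a deliberately crude stand-in for the reranker model (it is not Cohere rerank-v3.5 and does not call any API); `rerank` and the sample chunks are likewise hypothetical. The sketch only shows the shape of the step: score every candidate against the query, sort, and keep the top 3.

```python
def overlap_score(query, text):
    """Stand-in scorer: fraction of query terms that appear in the text."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms) if query_terms else 0.0

def rerank(query, chunks, score_fn, top_n=3):
    """Score each chunk against the query and keep the top_n, best first.

    Ties keep the original retrieval order.
    """
    scored = [(score_fn(query, text), i) for i, text in enumerate(chunks)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(chunks[i], score) for score, i in scored[:top_n]]

candidates = [
    "how to reset your password",
    "billing and invoices",
    "password reset email not arriving",
    "email notification settings",
    "office locations",
]
top = rerank("reset password email", candidates, overlap_score, top_n=3)
for text, score in top:
    print(f"{score:.2f}  {text}")
```

Passing the scorer in as `score_fn` keeps the truncation logic separate from the model, so the stand-in can be swapped for a real reranker call without changing the rest of the step.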
03 Generation & Output
Connect the Reranker to an `llm_call` (gpt-4o), and cap it with an `output` node. When simulating, note the increased P95 latency from the extra network hop, balanced against the precision gain.
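As a rough sketch of the generation step, the reranked chunks can be folded into a single prompt and the model call timed, which makes the cost of each hop visible. Everything here is an assumption for illustration: the prompt wording, the `build_prompt` and `generate` helpers, and `stub_llm`, which stands in for the gpt-4o call and returns a fixed string.

```python
import time

def build_prompt(question, chunks):
    """Number each reranked chunk so the answer can point back to its source."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def generate(question, chunks, llm):
    """Call the LLM and report the wall-clock latency of this hop in milliseconds."""
    start = time.perf_counter()
    answer = llm(build_prompt(question, chunks))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return answer, elapsed_ms

def stub_llm(prompt):
    # Hypothetical stand-in for the gpt-4o call; returns a fixed string.
    return "stub answer"

question = "Why did my reset link stop working?"
top_chunks = [
    "Reset links expire after 24 hours.",
    "Check the spam folder.",
    "Resend from the login page.",
]
prompt = build_prompt(question, top_chunks)
answer, elapsed_ms = generate(question, top_chunks, stub_llm)
```

Because only three chunks reach the prompt, the context stays short; the added latency comes from the reranker hop before this call, not from the prompt itself.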

Expected_Metrics

P50_LATENCY: < 3200ms
COST_SAVING: 15.0%
SLA_LIMIT: 4500ms
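One way to check simulated runs against these targets is sketched below. The latency samples are invented for illustration, the nearest-rank percentile method is one common choice among several, and applying the SLA limit to P95 is an assumption, since the source does not say which percentile the limit governs.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def check_latency(samples_ms, p50_target_ms=3200, sla_limit_ms=4500):
    """Compare P50 against the target and P95 against the SLA limit (assumed pairing)."""
    p50 = percentile(samples_ms, 50)
    p95 = percentile(samples_ms, 95)
    return {
        "p50_ms": p50,
        "p95_ms": p95,
        "p50_ok": p50 < p50_target_ms,
        "sla_ok": p95 <= sla_limit_ms,
    }

# Hypothetical end-to-end latencies from ten simulated requests, in milliseconds.
samples = [2100, 2400, 2600, 2800, 2900, 3000, 3100, 3300, 3900, 4700]
report = check_latency(samples)
print(report)
```

In this made-up sample the median meets the target while the tail does not, which is the pattern the extra reranker hop tends to produce: typical requests stay within budget and slow ones drift toward the limit.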


Node_Architecture

`input` | Query | Text Mode
`embedding` | Dense Embedder | embed-english-v3.0
`vector_store` | Broad Retrieval | Top-K: 50
`reranker` | Precision Filter | Cohere rerank-v3.5 / Top-K: 3
`llm_call` | Generator | gpt-4o
`output` | Response | Standard
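The node chain can also be written down as data and sanity-checked before simulating. The dictionary layout, the key names, and the `validate` helper below are hypothetical and are not the canvas's export format; only the node types, labels, models, and Top-K values come from the listing above.

```python
NODES = [
    {"type": "input", "label": "Query", "config": {"mode": "text"}},
    {"type": "embedding", "label": "Dense Embedder",
     "config": {"model": "embed-english-v3.0"}},
    {"type": "vector_store", "label": "Broad Retrieval", "config": {"top_k": 50}},
    {"type": "reranker", "label": "Precision Filter",
     "config": {"provider": "cohere", "model": "rerank-v3.5", "top_k": 3}},
    {"type": "llm_call", "label": "Generator", "config": {"model": "gpt-4o"}},
    {"type": "output", "label": "Response", "config": {"format": "standard"}},
]

def validate(nodes):
    """Return a list of problems with the linear chain; an empty list means it looks sane."""
    problems = []
    types = [node["type"] for node in nodes]
    if not types or types[0] != "input" or types[-1] != "output":
        problems.append("chain must start with input and end with output")
    if "vector_store" in types and "reranker" in types:
        store_pos = types.index("vector_store")
        rerank_pos = types.index("reranker")
        if rerank_pos != store_pos + 1:
            problems.append("reranker must directly follow vector_store")
        retrieved = nodes[store_pos]["config"]["top_k"]
        kept = nodes[rerank_pos]["config"]["top_k"]
        if kept > retrieved:
            problems.append("reranker top_k exceeds vector_store top_k")
    return problems

print(validate(NODES))
```

The last check captures the core idea of the architecture: the reranker can only narrow what retrieval returned, so its Top-K should never exceed the Vector Store's.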
