High-Precision RAG with Reranker
A precision-focused RAG architecture. It widens the initial vector retrieval to gather more candidate chunks, then applies a dedicated reranker model to filter out the noise, so that only the most relevant context reaches the generation model.
Execution Steps
- 01
Broad Context Retrieval
Connect an `input` node to an `embedding` node (Cohere embed-english-v3.0), and route its output to a `vector_store` node. Configure the Vector Store to cast a wide net: set Top K to 50, which gives the reranker a large candidate pool to choose from.
- 02
Apply the Reranker
Attach a `reranker` node immediately after the Vector Store. Open its configuration sheet, select "Cohere" as the provider and "rerank-v3.5" as the model, and set Top K to 3 to restrict the final output. This ensures that only the three highest-scoring chunks of the 50 retrieved are passed forward.
- 03
Generation & Output
Connect the Reranker to an `llm_call` node (gpt-4o), and finish the chain with an `output` node. When simulating, note that the extra network hop to the reranker increases P95 latency; that cost is the trade-off for the precision gain.
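The three steps above amount to a two-stage selection: a broad similarity search (Top K 50) followed by a narrower rerank (Top K 3). The sketch below illustrates that flow in plain Python and is not the builder's actual implementation. The names `retrieve_then_rerank`, `toy_embed`, and `toy_rerank_score` are hypothetical, and the two toy functions are local stand-ins for the Cohere embedding and rerank models, so the example runs without any network calls.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query: str,
    chunks: Sequence[str],
    embed: Callable[[str], Vector],
    rerank_score: Callable[[str, str], float],
    retrieve_k: int = 50,  # Top K on the vector store (step 01)
    final_k: int = 3,      # Top K on the reranker (step 02)
) -> List[str]:
    """Return the final_k chunks that would be handed to the LLM."""
    query_vec = embed(query)
    # Stage 1: broad retrieval by embedding similarity.
    by_similarity = sorted(
        chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True
    )
    candidates = by_similarity[:retrieve_k]
    # Stage 2: rescore each (query, chunk) pair and keep only the best few.
    by_relevance = sorted(
        candidates, key=lambda c: rerank_score(query, c), reverse=True
    )
    return by_relevance[:final_k]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for an embedding model: letter frequencies."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def toy_rerank_score(query: str, chunk: str) -> float:
    """Hypothetical stand-in for a reranker: fraction of query words in chunk."""
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


if __name__ == "__main__":
    docs = [
        "reranker models score query and passage pairs",
        "vector stores return nearest neighbours",
        "bananas are yellow",
        "a reranker filters noisy passages",
        "latency grows with each network hop",
    ]
    for chunk in retrieve_then_rerank(
        "how does a reranker score passages", docs, toy_embed, toy_rerank_score
    ):
        print(chunk)
```

Passing the embedding and scoring functions in as parameters keeps the two stages independent, so either stand-in could be swapped for a real model call without changing the selection logic.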