High-Traffic Cache Gateway
An optimization pattern for high-traffic workloads. Incoming requests pass through a semantic cache layer first, and a fast LLM is called only on a cache miss. Once more than half of requests are served from the cache, P50 latency drops to roughly the cache lookup time.
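The routing idea can be sketched outside the canvas as a small cache-aside gateway. This is a toy illustration under stated assumptions, not the engine's implementation: `SemanticCacheGateway`, its `threshold` parameter, and the bag-of-words cosine similarity are all hypothetical stand-ins (a real semantic cache would typically compare embedding vectors stored in a backend such as Redis).

```python
import math
import time
from collections import Counter


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCacheGateway:
    """Hypothetical gateway: serve similar prompts from cache, else call the fallback."""

    def __init__(self, fallback, ttl_seconds=3600.0, threshold=0.8, clock=time.monotonic):
        self.fallback = fallback        # callable standing in for the LLM on the miss path
        self.ttl_seconds = ttl_seconds  # mirrors the TTL set on the cache node
        self.threshold = threshold      # assumed similarity cutoff for a "hit"
        self.clock = clock              # injectable clock, useful for testing expiry
        self._entries = []              # list of (vector, response, expires_at)

    def handle(self, prompt: str):
        """Return (response, was_cache_hit)."""
        now = self.clock()
        # Drop entries whose TTL has elapsed.
        self._entries = [e for e in self._entries if e[2] > now]
        vec = Counter(prompt.lower().split())
        for cached_vec, response, _ in self._entries:
            if _cosine(vec, cached_vec) >= self.threshold:
                return response, True   # hit path: skip the LLM entirely
        response = self.fallback(prompt)  # miss path: call the fallback model
        self._entries.append((vec, response, now + self.ttl_seconds))
        return response, False
```

Calling `handle` twice with near-identical prompts invokes the fallback only once; the second call returns the cached response along with `True`.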
Execution_Steps
- 01
Deploy the Cache Layer
Connect an `input` node to a `cache` node. Open the Cache config, set the Backend to Redis, Mode to Semantic, and TTL to 3600 seconds.
- 02
Route Based on Cache Hit
Connect the Cache node to a `router` node. In the router config, set the Strategy to "Cache Evaluation".
- 03
Design the Miss Path
Drag an `llm_call` node onto the canvas. Set its model to "gemini-1.5-flash" for a high-speed fallback. Connect the Router's "Miss" outgoing edge to this LLM node.
- 04
Converge on Output
Drag an `output` node onto the canvas. Connect the LLM node directly to the Output node. Then draw a second edge from the Router (the "Hit" path) to the same Output node. The engine calculates the blended latency from your configured cache hit rate.
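The source does not state the formula the engine uses for blended latency. A plausible reading is a hit-rate-weighted average of the two paths, sketched below. The function name `blended_latency_ms` and the example figures are assumptions for illustration only. Note that this yields a mean, which is not the same as P50: the median sits near the hit-path latency whenever the hit rate exceeds 50%.

```python
def blended_latency_ms(hit_rate: float, cache_lookup_ms: float, llm_ms: float) -> float:
    """Expected per-request latency for a cache-then-LLM graph (assumed model).

    Every request pays the cache lookup; only misses also pay the LLM call.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    hit_path_ms = cache_lookup_ms            # Router "Hit" edge goes straight to Output
    miss_path_ms = cache_lookup_ms + llm_ms  # Router "Miss" edge goes through the LLM
    return hit_rate * hit_path_ms + (1.0 - hit_rate) * miss_path_ms


# Illustrative numbers only: 70% hit rate, 5 ms lookup, 400 ms LLM call.
example_ms = blended_latency_ms(hit_rate=0.7, cache_lookup_ms=5.0, llm_ms=400.0)
```

With these made-up inputs the result is about 125 ms, compared with 405 ms if every request took the miss path.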