Performance / Phase_04 / 10 min build

High-Traffic Cache Gateway

An optimization pattern for high-traffic workloads. Incoming requests pass through a semantic cache layer first; only on a cache miss does a request fall through to a fast fallback LLM. This cuts P50 latency whenever most requests are served from the cache.
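The cache-first flow described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the engine's implementation: `SemanticCache`, `handle`, the bag-of-words `embed` function, and the 0.8 similarity threshold are all hypothetical stand-ins (a real semantic cache would use a learned embedding model and a Redis backend). Writing the LLM response back to the cache on a miss is also an assumption; the blueprint does not say how the cache is populated.

```python
import math
import time
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption; real caches use a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    """In-memory stand-in for the Redis-backed semantic cache."""

    def __init__(self, ttl_seconds=3600, threshold=0.8):
        self.ttl_seconds = ttl_seconds  # TTL from the blueprint
        self.threshold = threshold      # hypothetical similarity cutoff
        self.entries = []               # (embedding, response, stored_at)

    def get(self, prompt, now=None):
        now = time.time() if now is None else now
        query = embed(prompt)
        for vec, response, stored_at in self.entries:
            fresh = now - stored_at <= self.ttl_seconds
            if fresh and cosine(query, vec) >= self.threshold:
                return response
        return None  # miss: nothing fresh and similar enough

    def put(self, prompt, response, now=None):
        now = time.time() if now is None else now
        self.entries.append((embed(prompt), response, now))


def handle(prompt, cache, llm, now=None):
    """Return (response, path), where path is 'hit' or 'miss'."""
    cached = cache.get(prompt, now)
    if cached is not None:
        return cached, "hit"
    response = llm(prompt)             # fallback LLM runs only on a miss
    cache.put(prompt, response, now)   # assumed write-back on miss
    return response, "miss"
```

Passing any callable as `llm` (for example a function that wraps a model client) is enough to try it out. The sketch never evicts expired entries; it only skips them on lookup.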

Execution_Steps

  1. Deploy the Cache Layer

    Connect an `input` node to a `cache` node. Open the Cache config, set the Backend to Redis, Mode to Semantic, and TTL to 3600 seconds.

  2. Route Based on Cache Hit

    Connect the Cache node to a `router` node. In the router config, set the Strategy to "Cache Evaluation".

  3. Design the Miss Path

    Drag an `llm_call` node onto the canvas. Select "gemini-1.5-flash" as its model for a high-speed fallback. Connect one outgoing edge of the Router (the "Miss" path) to this LLM.

  4. Converge on Output

    Drag an `output` node onto the canvas and connect the LLM directly to it. Then draw a second edge from the Router (the "Hit" path) to the same Output node. The engine calculates the blended latency from your configured cache hit rate.
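One plausible reading of "blended latency" is a hit-rate-weighted average of the two paths. The sketch below assumes exactly that; the engine's actual formula is not documented here, and the function name and the sample latencies are hypothetical.

```python
def blended_latency_ms(hit_rate, hit_latency_ms, miss_latency_ms):
    """Hit-rate-weighted mean latency across the Hit and Miss paths.

    Assumption: the engine blends the two paths linearly by hit rate.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_latency_ms + (1.0 - hit_rate) * miss_latency_ms


# Illustrative numbers only: 95% hits at 20 ms, misses at 800 ms.
print(blended_latency_ms(0.95, 20.0, 800.0))
```

With those illustrative inputs the blend works out to roughly 59 ms. Note that this is a mean, not a median, so it is not directly comparable to a P50 target.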

Expected_Metrics

P50_LATENCY: < 200ms
COST_SAVING: 95.0%
SLA_LIMIT: 500ms
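To see how these targets relate to the cache hit rate, here is a rough sketch under two stated assumptions: a cache lookup costs far less than an LLM call, and every cache hit is faster than every miss. Both function names and the cost figures are hypothetical.

```python
def cost_saving(hit_rate, llm_cost_per_request, cache_cost_per_request=0.0):
    """Fraction of spend avoided versus sending every request to the LLM."""
    if llm_cost_per_request <= 0:
        raise ValueError("llm_cost_per_request must be positive")
    with_cache = cache_cost_per_request + (1.0 - hit_rate) * llm_cost_per_request
    return 1.0 - with_cache / llm_cost_per_request


def p50_path(hit_rate):
    """Path that the median request takes, assuming all hits beat all misses.

    Returns 'hit' only when strictly more than half of requests are hits.
    """
    return "hit" if hit_rate > 0.5 else "miss"


# With a free cache lookup, the saving equals the hit rate.
print(cost_saving(0.95, llm_cost_per_request=1.0))
print(p50_path(0.95))
```

Under these assumptions, a 95.0% cost saving corresponds to a hit rate of about 95%, and the P50 target is governed by cache lookup latency rather than LLM latency once the hit rate exceeds 50%.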

Node_Architecture

Node | Label | Config
`input` | Client Request | Text Mode
`cache` | Semantic Cache | Redis Backend / TTL 3600s
`router` | Cache Router | Hit / Miss Strategy
`llm_call` | Fallback LLM | gemini-1.5-flash
`output` | Delivery | Standard
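The same architecture can be written down as plain data, which makes it easy to check that both the Hit and Miss paths reach the output. The schema below (the `NODES` and `EDGES` shapes, the field names, and the `paths` helper) is hypothetical and is not the canvas's export format.

```python
# Nodes keyed by type, with the settings listed in the blueprint.
NODES = {
    "input": {"label": "Client Request", "mode": "text"},
    "cache": {"label": "Semantic Cache", "backend": "redis", "ttl_seconds": 3600},
    "router": {"label": "Cache Router", "strategy": "cache_evaluation"},
    "llm_call": {"label": "Fallback LLM", "model": "gemini-1.5-flash"},
    "output": {"label": "Delivery"},
}

# Edges as (source, target, branch); branch is set only on router outputs.
EDGES = [
    ("input", "cache", None),
    ("cache", "router", None),
    ("router", "llm_call", "miss"),
    ("router", "output", "hit"),
    ("llm_call", "output", None),
]


def paths(start, end, edges):
    """Enumerate every route from start to end (the graph is assumed acyclic)."""
    if start == end:
        return [[end]]
    found = []
    for src, dst, _branch in edges:
        if src == start:
            for rest in paths(dst, end, edges):
                found.append([start] + rest)
    return found


for route in paths("input", "output", EDGES):
    print(" -> ".join(route))
```

Running it lists two routes: the Miss path through `llm_call` and the shorter Hit path straight from `router` to `output`.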