Architecture / Phase_01 / 12 min build

The AI Essay Detector

The AI Essay Detector is a cascading routing pipeline designed to minimize token costs. It uses a fast, cheap LLM for initial triage and routes only the ambiguous cases to a heavier, more expensive model for deep analysis.
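The cascade idea can be sketched in a few lines of plain Python. This is an illustrative sketch only, not PRISM code: the `Classifier` type, the `cascade` function, the stub classifiers, and the 0.8 confidence threshold are all hypothetical names and values introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A classifier takes essay text and returns (label, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


@dataclass
class Verdict:
    label: str
    confidence: float
    route: str  # "fast" or "deep": which branch produced the verdict


def cascade(text: str, fast: Classifier, deep: Classifier,
            threshold: float = 0.8) -> Verdict:
    """Run the cheap classifier first; escalate only when it is unsure."""
    label, confidence = fast(text)
    if confidence >= threshold:
        return Verdict(label, confidence, "fast")
    # Ambiguous case: pay for the expensive model.
    label, confidence = deep(text)
    return Verdict(label, confidence, "deep")


# Stub classifiers standing in for real model calls (hypothetical behavior).
def fast_stub(text: str) -> Tuple[str, float]:
    # Pretend the cheap model is only confident about very short inputs.
    return ("human", 0.95) if len(text) < 40 else ("ai", 0.55)


def deep_stub(text: str) -> Tuple[str, float]:
    return ("ai", 0.90)


print(cascade("Short note.", fast_stub, deep_stub).route)  # fast
print(cascade("A much longer essay body that the cheap model cannot call.",
              fast_stub, deep_stub).route)  # deep
```

The expensive call happens only on the second path, which is where the token savings come from.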

Execution_Steps

  1. Canvas Initialization

    Press [Ctrl+B] to open the Node Palette. Drag an `input` node and an `llm_call` node onto the canvas. Click and drag from the Input's right handle to the LLM Call's left handle to establish a Smart Edge.

  2. Configure the Lean Classifier

    Right-click the `llm_call` node to open the Configuration Sheet. Rename the node to "Fast Triage". Select the "OpenAI" provider and the "gpt-4o-mini" model. Set Expected Input Tokens to P50: 1500 / P95: 2500, and Output Tokens to P50: 50 / P95: 100.

  3. Inject the Heuristic Router

    Drag a `router` node from the palette. Connect the output of your "Fast Triage" node to the input of the Router. This node will evaluate the confidence score of the fast classifier to determine the next step.

  4. Build the Deep Analysis Branch

    Drag two more `llm_call` nodes onto the canvas (rename them "Feature Extractor" and "Deep Classifier"). Configure both to use "gpt-4o". Connect the Router to the Feature Extractor, and the Extractor to the Deep Classifier.

  5. Establish the Fast Branch & Terminals

    Drag two `output` nodes onto the canvas. Connect the Deep Classifier to one Output. Then draw a second edge directly from the Router to the other Output node. The Router now has two outgoing paths: the fast branch and the deep branch.

  6. Define Router Probabilities

    Select the `router` node to open its config. Under "Route Weights", set the weight of the edge going directly to the Output to 0.85, meaning the fast classifier is expected to resolve 85% of requests confidently. Set the weight of the edge going to the Deep Branch to 0.15. Confirm that the Weight Sum reads 1.0000 and flashes emerald green.

  7. Execute Monte Carlo Simulation

    Your DAG is complete. Open the simulation panel, set traffic to 100,000 requests/day, and run the engine. PRISM distributes the traffic probabilistically across the two branches according to the route weights, and the results should show the 80%+ cost reduction of this cascading architecture.
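For intuition about what such a simulation computes, here is a minimal Monte Carlo sketch in plain Python. It is not PRISM's engine, whose internals are not described here. Several inputs are assumptions: the per-token prices are placeholders chosen for illustration, the deep-branch token counts reuse the Fast Triage P50 values because the steps above do not specify them, and the baseline is taken to be "every request goes through the deep branch".

```python
import random

# Placeholder prices in USD per 1M tokens (assumed, not from the blueprint).
PRICE = {
    "fast": {"in": 0.15, "out": 0.60},
    "deep": {"in": 2.50, "out": 10.00},
}


def call_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Cost in USD of a single LLM call at the placeholder prices."""
    p = PRICE[model]
    return (tokens_in * p["in"] + tokens_out * p["out"]) / 1_000_000


def simulate(requests: int = 100_000, fast_weight: float = 0.85,
             seed: int = 0) -> dict:
    rng = random.Random(seed)
    # P50 token counts from step 2; fixed here rather than sampled.
    fast_call = call_cost("fast", 1500, 50)
    # Deep branch = Feature Extractor + Deep Classifier (token counts assumed).
    deep_branch = 2 * call_cost("deep", 1500, 50)

    cascade_total = 0.0
    deep_routed = 0
    for _ in range(requests):
        cascade_total += fast_call          # every request pays for triage
        if rng.random() >= fast_weight:     # ~15% fall through to the deep branch
            cascade_total += deep_branch
            deep_routed += 1

    baseline_total = requests * deep_branch  # assumed baseline: all deep
    return {
        "deep_share": deep_routed / requests,
        "cascade_cost": cascade_total,
        "baseline_cost": baseline_total,
        "saving": 1 - cascade_total / baseline_total,
    }


result = simulate()
print(f"deep share: {result['deep_share']:.3f}")
print(f"saving:     {result['saving']:.1%}")
```

With these placeholder inputs the saving lands near 82%, but the exact figure depends entirely on the assumed prices, token counts, and baseline. A fuller simulation would also sample token counts between the P50 and P95 values instead of fixing them.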

Expected_Metrics

P50_LATENCY: < 2800 ms
COST_SAVING: 82.4%
SLA_LIMIT: 4000 ms
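The cost-saving figure can be related to the route weights with a back-of-envelope calculation. This is one plausible reading, not PRISM's documented formula: it assumes the baseline sends every request through the deep branch, that every request in the cascade pays for Fast Triage, and it introduces the symbols $C_{\text{fast}}$ (cost of one triage call), $C_{\text{deep}}$ (cost of one pass through the deep branch), and $w_{\text{deep}}$ (weight of the deep edge).

```latex
\begin{align*}
C_{\text{cascade}} &= C_{\text{fast}} + w_{\text{deep}}\, C_{\text{deep}},
  \qquad w_{\text{deep}} = 0.15 \\
\text{saving} &= 1 - \frac{C_{\text{cascade}}}{C_{\text{deep}}}
  = (1 - w_{\text{deep}}) - \frac{C_{\text{fast}}}{C_{\text{deep}}}
  = 0.85 - \frac{C_{\text{fast}}}{C_{\text{deep}}}
\end{align*}
```

Under this reading, a saving of 82.4% corresponds to $C_{\text{fast}} / C_{\text{deep}} \approx 0.026$: the triage call costs a few percent of a deep-branch pass, and the saving can never exceed the 0.85 fast-route weight.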

Node_Architecture

`input` | Ingestion | Text Mode
`llm_call` | Fast Triage | gpt-4o-mini
`router` | Confidence Threshold | Weighted (0.85 / 0.15)
`llm_call` | Feature Extractor | gpt-4o (Deep Analysis)
`llm_call` | Deep Classifier | gpt-4o (Final Verdict)
`output` | Terminals | Fast & Deep Results
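The same architecture can be written down as a small graph and sanity-checked in code. The sketch below is a hypothetical representation, not PRISM's file format: the node identifiers, the `(source, target, weight)` edge tuples, and the `validate` helper are made up for illustration. It checks that each router's outgoing weights sum to 1 and that the graph is acyclic, using the standard-library `graphlib` module (Python 3.9+).

```python
import math
from graphlib import TopologicalSorter

# Node id -> node type, mirroring the architecture listed above.
NODES = {
    "input": "input",
    "fast_triage": "llm_call",
    "router": "router",
    "feature_extractor": "llm_call",
    "deep_classifier": "llm_call",
    "output_fast": "output",
    "output_deep": "output",
}

# (source, target, weight); weights only matter on router edges.
EDGES = [
    ("input", "fast_triage", 1.0),
    ("fast_triage", "router", 1.0),
    ("router", "output_fast", 0.85),
    ("router", "feature_extractor", 0.15),
    ("feature_extractor", "deep_classifier", 1.0),
    ("deep_classifier", "output_deep", 1.0),
]


def validate(nodes: dict, edges: list) -> list:
    """Check router weights and return the nodes in a valid execution order."""
    predecessors = {name: set() for name in nodes}
    out_weight = {name: 0.0 for name in nodes}
    for src, dst, weight in edges:
        predecessors[dst].add(src)
        out_weight[src] += weight

    for name, kind in nodes.items():
        if kind == "router" and not math.isclose(out_weight[name], 1.0, abs_tol=1e-4):
            raise ValueError(f"{name}: route weights sum to {out_weight[name]:.4f}, expected 1.0000")

    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    return list(TopologicalSorter(predecessors).static_order())


print(validate(NODES, EDGES))
```

The returned order always starts at `input` and places each node after everything that feeds it, which is the property a simulation needs before it can push traffic through the graph.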