PRISM | Pipeline Realtime Intelligence & Simulation Monitor

§1

#Introduction

PRISM is a pipeline simulator for AI systems. You design a graph of models, retrievers, rerankers, and tools, and PRISM projects the cost, latency, and accuracy you'd see in production — without having to stand the whole thing up.

Everything runs on top of three engines: a token-level cost engine that joins against daily-scraped provider pricing, a queue-theory latency engine that respects node concurrency and cache hit rates, and an accuracy engine that estimates end-to-end quality from published model + reranker benchmarks.

Use it to pick the right model before you integrate it, to pre-budget a feature launch, or to compare two pipelines under identical traffic and pick the cheaper / faster / more accurate one with receipts.

§2

#Quickstart

A working simulation in under five minutes. No install required.

1. Create an account

Head to /login and sign up with email. You land on the Free plan — 50 simulation runs per month, unlimited pipelines, community benchmarks.

2. Build a pipeline

Open the canvas, drag in Input, Embedding, Vector Store, Re-ranker, LLM Call, and Output, then wire them left-to-right. Click a node to configure its provider, model, and token expectations in the right panel.

3. Run a simulation

Open the simulator panel (bottom-right), set your expected traffic, and hit Run. Results stream in live over WebSocket — progress % on top, stage transitions underneath, and the final cost/latency/accuracy envelope at the end.

http

POST /v1/simulations
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "pipeline_id": "pipe_01H7...",
  "traffic_params": {
    "requests_per_day": 25000,
    "avg_input_tokens": 1200,
    "avg_output_tokens": 400,
    "cache_hit_rate": 0.15,
    "retry_rate": 0.02
  }
}

You get back a 202 Accepted with the simulation id. Subscribe to live updates via WebSocket (see WebSocket events), or poll GET /v1/simulations/{id}.
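If you poll instead of subscribing, the loop is short. Below is a minimal sketch. It assumes you supply a `fetch_status(sim_id)` callable that wraps GET /v1/simulations/{id} and returns the parsed JSON. It also assumes `complete` and `failed` are the terminal statuses, since those are the values in the WebSocket frame examples. `poll_until_terminal` is a hypothetical helper, not an SDK function.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal statuses, taken from the WebSocket frame examples.
TERMINAL_STATUSES = {"complete", "failed"}


def poll_until_terminal(
    fetch_status: Callable[[str], Dict[str, Any]],
    sim_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> Dict[str, Any]:
    """Call fetch_status(sim_id) until the run reaches a terminal status.

    fetch_status is caller-supplied and should wrap GET /v1/simulations/{id}.
    Raises TimeoutError if the run is still in flight after timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch_status(sim_id)
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"simulation {sim_id} still running after {timeout_s}s")
        time.sleep(interval_s)
```

Keep `interval_s` generous: every poll counts against your per-minute request limit.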

4. Compare options

Duplicate the pipeline, swap one variable (say, gpt-4o-mini → claude-3-5-sonnet), then queue a comparison from the Comparisons page. PRISM runs both under identical traffic and picks a winner by your declared priority.

http

POST /v1/comparisons
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "pipeline_ids": ["pipe_a", "pipe_b"],
  "priority": "balanced",
  "traffic_params": {
    "requests_per_day": 25000,
    "avg_input_tokens": 1200,
    "avg_output_tokens": 400,
    "cache_hit_rate": 0.15,
    "retry_rate": 0.02
  }
}
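This guide states two request constraints: 2 to 5 pipeline ids, and a priority of balanced, cost, latency, or accuracy. You can check both client-side before spending a request. Below is a minimal sketch. `build_comparison_request` is a hypothetical helper, and the server may apply further checks this sketch doesn't know about.

```python
from typing import Any, Dict, List

# Priorities and the 2-5 pipeline bound come from the PRISM docs.
PRIORITIES = {"balanced", "cost", "latency", "accuracy"}


def build_comparison_request(
    pipeline_ids: List[str], priority: str, traffic_params: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a POST /v1/comparisons body, raising ValueError on bad input."""
    if not 2 <= len(pipeline_ids) <= 5:
        raise ValueError(f"need 2-5 pipeline ids, got {len(pipeline_ids)}")
    if priority not in PRIORITIES:
        raise ValueError(f"priority must be one of {sorted(PRIORITIES)}")
    return {
        "pipeline_ids": list(pipeline_ids),
        "priority": priority,
        "traffic_params": dict(traffic_params),
    }
```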

§3

#Core concepts

Pipelines

A pipeline is a directed acyclic graph of nodes with optional routing edges. It has a stable id, a version history, and autosaves as you edit. Pipelines are workspace-scoped — team members see the same set.

Nodes

Nine node types cover the usual shape of an AI system: I/O, core models (LLM, embedding), RAG primitives (vector store, reranker), and logic (cache, router, tool call). See the node reference for the exact config shape each one expects.

Simulations

A simulation evaluates a pipeline against traffic params. It runs three engines in parallel (cost, latency, accuracy) and returns a structured result envelope. Runs stream live over WebSocket; you can also poll a run for status and partial results, and terminal runs remain available that way.

Comparisons

A comparison fans out 2–5 simulations in parallel with identical traffic, then selects a winner using a priority — balanced, cost, latency, or accuracy. Pipelines that fail guardrails (e.g. p95 over threshold) are excluded from the recommendation even if they score highest.

Benchmarks

Benchmarks are the read-only feed of provider pricing, model accuracy, and reranker lift that the engines join against. Updated daily by the scraper for all major providers — OpenAI, Anthropic, Google, Cohere, Mistral, Voyage, and more.

§4

#Pipeline model

A pipeline is serialized as JSON with two arrays — nodes and edges — plus metadata.

json

{
  "id": "pipe_01H7XYZ",
  "name": "Support RAG — v3",
  "nodes": [
    { "id": "n_in",   "type": "input",        "label": "User question" },
    { "id": "n_emb",  "type": "embedding",    "config": { "provider": "OpenAI", "model": "text-embedding-3-small" } },
    { "id": "n_vec",  "type": "vector_store", "config": { "provider": "pinecone", "index_name": "support", "top_k": 8 } },
    { "id": "n_rk",   "type": "reranker",     "config": { "provider": "Cohere", "model": "rerank-v3.5", "top_n": 3 } },
    { "id": "n_llm",  "type": "llm_call",     "config": { "provider": "Anthropic", "model": "claude-3-5-sonnet" } },
    { "id": "n_out",  "type": "output",       "config": { "mode": "Response", "format": "JSON" } }
  ],
  "edges": [
    { "source": "n_in",  "target": "n_emb" },
    { "source": "n_emb", "target": "n_vec" },
    { "source": "n_vec", "target": "n_rk"  },
    { "source": "n_rk",  "target": "n_llm" },
    { "source": "n_llm", "target": "n_out" }
  ]
}
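A pipeline must be a directed acyclic graph, and a disconnected graph is rejected with 422 (see Errors). That means you can sanity-check the JSON before saving it. Below is a minimal sketch using Kahn's topological sort. It checks three things: edges reference known node ids, the graph has no cycle, and every node is reachable from an `input` node. That last rule is our reading of "connected"; the server's semantic checks may differ. `topo_order` is a hypothetical helper.

```python
from collections import deque
from typing import Any, Dict, List


def topo_order(pipeline: Dict[str, Any]) -> List[str]:
    """Return node ids in topological order, or raise ValueError.

    Checks (assumed, not the server's exact rules): edges reference known
    nodes, the graph has no cycle, and every node is reachable from an
    `input` node.
    """
    ids = [n["id"] for n in pipeline["nodes"]]
    known = set(ids)
    succ: Dict[str, List[str]] = {i: [] for i in ids}
    indeg = {i: 0 for i in ids}
    for e in pipeline["edges"]:
        s, t = e["source"], e["target"]
        if s not in known or t not in known:
            raise ValueError(f"edge {s}->{t} references an unknown node")
        succ[s].append(t)
        indeg[t] += 1

    # Kahn's algorithm: repeatedly emit nodes with no remaining inbound edges.
    queue = deque(i for i in ids if indeg[i] == 0)
    order: List[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in succ[node]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(ids):
        raise ValueError("pipeline contains a cycle")

    # Breadth-first reachability from every input node.
    seen = {n["id"] for n in pipeline["nodes"] if n["type"] == "input"}
    frontier = deque(seen)
    while frontier:
        for nxt in succ[frontier.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    unreachable = known - seen
    if unreachable:
        raise ValueError(f"nodes not reachable from an input: {sorted(unreachable)}")
    return order
```

On the "Support RAG — v3" example above, this returns n_in, n_emb, n_vec, n_rk, n_llm, n_out in that order.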

§5

#Node reference

The full set of node types. Defaults and the exact TypeScript types live in apps/web/lib/constants/node-presets.ts and apps/web/types/canvas.ts.

Type	Name	Category	Purpose	Config keys
input	Input	I/O	Entry point. Declares the request shape (Text / JSON / Multimodal).	mode, schema
output	Output	I/O	Terminal node. Declares the response shape.	mode, format
llm_call	LLM Call	Core	A call to a generative model. The cost engine multiplies expected tokens by the scraped model price.	provider, model, system_prompt_tokens, expected_input_tokens {p50,p95}, expected_output_tokens {p50,p95}, temperature, max_retries, timeout_ms
embedding	Embedding	Core	Vector embedding for retrieval or semantic caching.	provider, model, expected_input_tokens {p50,p95}, dimensions, timeout_ms
vector_store	Vector Store	Core	Retrieval from an index. Latency reflects provider + top_k; cost reflects operation pricing.	provider, index_name, top_k, chunk_size_tokens
reranker	Re-ranker	RAG	Cross-encoder reranking. The accuracy engine credits lift from published benchmark numbers.	provider, model, top_n, timeout_ms
tool_call	Tool Call	Logic	External function/tool invocation. Contributes fixed latency/cost based on declared SLAs.	name, expected_latency_ms {p50,p95}, fixed_cost_cents
cache	Cache	Logic	Skips downstream nodes for the fraction of requests that hit the cache. Usually the cheapest win.	hit_rate_pct, ttl_seconds, keyed_on
router	Router	Logic	Weighted branch. Each outgoing edge carries a weight; the simulator splits traffic accordingly.	routes[], default_route
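For a Router, the simulator splits traffic across outgoing edges by weight. Below is a sketch of the per-route volume for a given `requests_per_day`. It assumes weights are normalized by their sum; the docs don't say whether they must add up to 1. `split_traffic` and the flat `{route: weight}` mapping are illustrative only; they are not the `routes[]` config schema.

```python
from typing import Dict


def split_traffic(requests_per_day: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Divide daily traffic across routes in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative with a positive sum")
    return {route: requests_per_day * w / total for route, w in weights.items()}


# A 4:1 split between a small and a large model branch.
print(split_traffic(25000, {"n_llm_small": 4, "n_llm_large": 1}))
# → {'n_llm_small': 20000.0, 'n_llm_large': 5000.0}
```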

§6

#Simulation

Traffic parameters

Every simulation is parameterized by the same five numbers. One describes volume, two describe typical request size, one captures how cache-friendly your workload is, and one captures how often calls are retried.

Field	Type	Meaning
requests_per_day	number	Steady-state traffic volume.
avg_input_tokens	number	Per-request input size (prompts + retrieved docs).
avg_output_tokens	number	Per-request generated tokens (answer length).
cache_hit_rate	0.0 – 1.0	Fraction of requests served by a `cache` node before reaching the model.
retry_rate	0.0 – 1.0	Fraction of calls that fail and get re-attempted. Paid twice by the cost engine.
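The 0.0 to 1.0 ranges above are what the API enforces with a 400 `validation_error`; the Errors section shows the exact message for `cache_hit_rate`. Below is a client-side sketch that produces the same error shape. Two parts are our assumptions: treating the three count fields as required non-negative numbers, and the wording of any message other than the documented "must be between 0 and 1".

```python
from numbers import Real
from typing import Any, Dict, Optional

RATE_FIELDS = ("cache_hit_rate", "retry_rate")
COUNT_FIELDS = ("requests_per_day", "avg_input_tokens", "avg_output_tokens")


def validate_traffic(params: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return None if params look valid, else an error body shaped like the API's."""
    def err(field: str, message: str) -> Dict[str, Any]:
        return {"error": {"type": "validation_error",
                          "message": f"traffic_params.{field} {message}",
                          "field": f"traffic_params.{field}"}}

    for field in COUNT_FIELDS + RATE_FIELDS:
        value = params.get(field)
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(value, Real) or isinstance(value, bool):
            return err(field, "must be a number")
        if field in RATE_FIELDS and not 0.0 <= value <= 1.0:
            return err(field, "must be between 0 and 1")
        if field in COUNT_FIELDS and value < 0:
            return err(field, "must be non-negative")
    return None
```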

Engines

Cost engine

Joins every priced node against the daily pricing table.
Charges retries at retry_rate × provider rate.
Discounts cached requests.
Returns p50 + p95 $/query and daily/monthly totals.
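A back-of-envelope version of the cost arithmetic for a single LLM node, under stated assumptions. This is a simplification, not the engine itself:

- Prices are per 1K tokens, the normalized unit of the benchmarks feed.
- A cache hit skips the model call entirely.
- Retries add `retry_rate` × the base cost.
- A month is 30 days, which matches the 105.00 / 3150.00 pair in the example result envelope.

`estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal


def estimate_cost(
    requests_per_day: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    cache_hit_rate: Decimal,
    retry_rate: Decimal,
    input_price_per_1k: Decimal,
    output_price_per_1k: Decimal,
) -> dict:
    """Rough per-query and daily/monthly cost for a single priced LLM node."""
    base = (avg_input_tokens * input_price_per_1k
            + avg_output_tokens * output_price_per_1k) / 1000
    # Assumption: a cache hit costs nothing for this node.
    # Retries are charged at retry_rate x the base cost.
    per_query = base * (1 - cache_hit_rate) * (1 + retry_rate)
    daily = per_query * requests_per_day
    return {"cost_per_query": per_query, "daily": daily, "monthly": daily * 30}


# Placeholder prices, not real provider rates.
print(estimate_cost(25000, 1200, 400, Decimal("0.15"), Decimal("0.02"),
                    Decimal("0.001"), Decimal("0.002")))
```

With those placeholder prices and the Quickstart traffic, this works out to roughly $0.00173 per query and $43.35 per day.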

Latency engine

Models each node as a queue, using provider-typical latency mean (μ) and standard deviation (σ).
Respects concurrency limits and router weights.
Returns p50 / p90 / p95 / p99 end-to-end latency.
Flags stages that push worst-case over target.
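The latency engine is queue-theoretic and accounts for concurrency, which a short snippet can't reproduce. For a rough sanity check on its numbers, treat each sequential stage's latency as an independent normal variable with mean μ and standard deviation σ. The end-to-end latency is then normal with mean Σμ and standard deviation √(Σσ²), and each percentile comes from the inverse CDF. This ignores queueing, concurrency, caches, and router branches. The stage figures below are made up.

```python
from math import sqrt
from statistics import NormalDist
from typing import Dict, List, Tuple


def e2e_percentiles(
    stages: List[Tuple[float, float]],
    quantiles: Tuple[float, ...] = (0.50, 0.90, 0.95, 0.99),
) -> Dict[str, float]:
    """Approximate end-to-end latency percentiles for stages run in sequence.

    stages: (mean_ms, stddev_ms) per stage, assumed independent and normal.
    """
    mu = sum(m for m, _ in stages)
    sigma = sqrt(sum(s * s for _, s in stages))
    dist = NormalDist(mu, sigma)
    return {f"p{round(q * 100)}": dist.inv_cdf(q) for q in quantiles}


# Hypothetical embedding, vector store, reranker, and LLM stages.
stages = [(40, 10), (60, 20), (120, 30), (700, 250)]
print({k: round(v) for k, v in e2e_percentiles(stages).items()})
```

If the engine's p95 sits far above this estimate, queueing or a concurrency limit is probably the cause, and the flagged stages tell you where.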

Accuracy engine

Bootstraps from published model benchmarks + reranker lift.
Only runs when at least one graded node is present.
Returns 0–1 overall score and per-metric breakdown.
Skips the metric cleanly for pipelines with no retrieval.

Result envelope

Terminal simulations include a structured results object. The shape below is abbreviated; see apps/web/types/simulation.ts for the complete TypeScript types.

json

{
  "status": "complete",
  "cost": {
    "cost_per_query_p50": "0.0042",
    "cost_per_query_p95": "0.0071",
    "daily_cost_usd": "105.00",
    "monthly_cost_usd": "3150.00",
    "stage_breakdown": [ /* per-node cost contribution */ ]
  },
  "latency": {
    "latency_p50_ms": 820,
    "latency_p95_ms": 1740,
    "latency_p99_ms": 2390,
    "stage_breakdown": [ /* per-node latency contribution */ ]
  },
  "accuracy": {
    "applicable": true,
    "overall_score": 0.82,
    "metrics": {
      "retrieval_recall": 0.79,
      "rerank_ndcg": 0.86,
      "answer_relevance": 0.83
    }
  },
  "errors": []
}
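Cost figures in the envelope arrive as JSON strings, so parse them as decimals rather than floats to avoid rounding drift. Below is a minimal sketch that reads the cost fields shown above. `CostSummary` and `parse_cost` are hypothetical names, and the real types in apps/web/types/simulation.ts carry more fields. In this example, daily_cost_usd equals cost_per_query_p50 × 25,000, and monthly_cost_usd equals 30 × daily. The asserts check that relationship, but treat it as an observation about this example, not a documented formula.

```python
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Dict


@dataclass(frozen=True)
class CostSummary:
    per_query_p50: Decimal
    per_query_p95: Decimal
    daily_usd: Decimal
    monthly_usd: Decimal


def parse_cost(envelope: Dict[str, Any]) -> CostSummary:
    """Pull the cost block out of a terminal result envelope."""
    if envelope.get("status") != "complete":
        raise ValueError(f"simulation not complete: {envelope.get('status')!r}")
    cost = envelope["cost"]
    # str() first, so a numeric value converts via its text form, not binary float.
    return CostSummary(
        per_query_p50=Decimal(str(cost["cost_per_query_p50"])),
        per_query_p95=Decimal(str(cost["cost_per_query_p95"])),
        daily_usd=Decimal(str(cost["daily_cost_usd"])),
        monthly_usd=Decimal(str(cost["monthly_cost_usd"])),
    )


raw = ('{"status": "complete", "cost": {"cost_per_query_p50": "0.0042", '
       '"cost_per_query_p95": "0.0071", "daily_cost_usd": "105.00", '
       '"monthly_cost_usd": "3150.00"}}')
summary = parse_cost(json.loads(raw))
assert summary.per_query_p50 * 25000 == summary.daily_usd
assert summary.daily_usd * 30 == summary.monthly_usd
```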

§7

#Comparisons

Side-by-side simulations under identical traffic with a recommendation at the end.

Provide 2–5 pipeline ids and a priority. PRISM fans out simulations, tracks progress per pipeline, and selects a winner that (a) scores highest on your priority and (b) satisfies every active guardrail. If nothing is compliant, the response carries recommendation_status: "no_compliant_option" instead of quietly recommending a failing run.

json

// Response shape for GET /v1/comparisons/{id}
{
  "comparison_id": "cmp_01H7...",
  "priority": "balanced",
  "completed_count": 3,
  "total_count": 3,
  "items": [
    {
      "pipeline_id": "pipe_a",
      "pipeline_name": "RAG baseline",
      "simulation_id": "sim_...",
      "status": "complete",
      "score": 0.78,
      "rank": 1,
      "is_compliant": true,
      "metrics": { "cost_per_query_p50": 0.0038, "latency_p95_ms": 1420, "accuracy_overall_score": 0.81 },
      "threshold_violations": []
    }
    // ... more
  ],
  "recommendation": {
    "pipeline_id": "pipe_a",
    "pipeline_name": "RAG baseline",
    "simulation_id": "sim_...",
    "reason": "Lowest cost within latency + accuracy guardrails.",
    "priority": "balanced"
  }
}
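The selection rule above can be expressed over the `items` array: pick the highest score among compliant runs, or return no_compliant_option if there are none. Below is a minimal sketch. How PRISM computes `score` for each priority isn't documented here, so the sketch takes scores as given. It also assumes runs that haven't reached `complete` are ineligible. `pick_recommendation` is a hypothetical helper.

```python
from typing import Any, Dict, List


def pick_recommendation(items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Choose the highest-scoring item that finished and passed every guardrail."""
    eligible = [i for i in items
                if i.get("status") == "complete" and i.get("is_compliant")]
    if not eligible:
        # Never fall back to a failing run.
        return {"recommendation_status": "no_compliant_option"}
    best = max(eligible, key=lambda i: i["score"])
    return {"pipeline_id": best["pipeline_id"], "score": best["score"]}
```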

§8

#Benchmarks

The read-only feed the engines join against. Same data powers the public benchmarks page.

Scraped daily from provider pricing pages: OpenAI, Anthropic, Google, Cohere, Mistral, Voyage, DeepSeek, Groq, Together, xAI, and more.
Normalized to $/1K tokens regardless of whether a provider publishes $/1M or per-call pricing.
Historical price deltas exposed via /v1/benchmarks/history/{provider}/{model}.
Alerts — POST to /v1/benchmarks/alerts to subscribe when a specific model's price moves.
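Normalizing a per-1M-token price to $/1K is a division by 1,000. The sketch below uses Decimal so small prices stay exact. The input figure is a placeholder, not a current provider price. Converting per-call pricing requires a tokens-per-call figure, which this sketch takes as an explicit argument; how PRISM chooses that figure is not documented here.

```python
from decimal import Decimal


def per_1m_to_per_1k(price_per_1m: Decimal) -> Decimal:
    """Convert $/1M tokens to $/1K tokens."""
    return price_per_1m / 1000


def per_call_to_per_1k(price_per_call: Decimal, tokens_per_call: int) -> Decimal:
    """Convert $/call to $/1K tokens, given an assumed average tokens per call."""
    if tokens_per_call <= 0:
        raise ValueError("tokens_per_call must be positive")
    return price_per_call * 1000 / tokens_per_call


print(per_1m_to_per_1k(Decimal("2.50")))  # 0.0025
```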

§9

#Authentication

All REST endpoints require a Bearer token. Tokens are issued per-workspace from Settings → API keys. A token carries the role of the user who minted it — admin tokens can write, member tokens can read.

bash

curl https://api.getprism.dev/v1/pipelines \
  -H "Authorization: Bearer $PRISM_API_KEY"
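The same authenticated request, built with Python's standard library. This minimal sketch only constructs the request; nothing is sent until you pass it to `urllib.request.urlopen`. Reading the token from `PRISM_API_KEY` mirrors the curl example and is a convention, not a requirement. `prism_request` is a hypothetical helper, not part of prism-sdk.

```python
import json
import os
import urllib.request
from typing import Any, Optional

BASE_URL = "https://api.getprism.dev"


def prism_request(method: str, path: str, body: Optional[Any] = None,
                  token: Optional[str] = None) -> urllib.request.Request:
    """Build an authenticated PRISM API request (not yet sent)."""
    token = token or os.environ["PRISM_API_KEY"]
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


# To send: urllib.request.urlopen(prism_request("GET", "/v1/pipelines"))
```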

§10

#REST API reference

Base URL: https://api.getprism.dev. All endpoints speak JSON, return standard HTTP status codes, and include X-RateLimit-* headers.

Pipelines

Create, read, update, and delete pipeline graphs. A pipeline is a DAG of nodes with optional routing.

GET /v1/pipelines

List workspace pipelines.

POST /v1/pipelines

Create a new pipeline.

GET /v1/pipelines/{id}

Fetch a pipeline by id.

PUT /v1/pipelines/{id}

Replace a pipeline's graph + metadata.

DELETE /v1/pipelines/{id}

Soft-delete a pipeline and its history.

GET /v1/pipelines/{id}/simulations/latest

Most recent simulation status for a pipeline (used by the workbench on load).

GET /v1/pipelines/{id}/comments

List annotations/comments attached to a pipeline.

Simulations

Kick off and observe a simulation. POST is 202 Accepted — you stream progress via WebSocket or poll.

POST /v1/simulations

Create + enqueue a simulation run.

GET /v1/simulations/{id}

Poll status and partial results.

GET /v1/simulations/{id}/report-url

Short-lived signed URL for the PDF report.

GET /v1/simulations/{id}/report

302 redirect to the signed URL (convenience).

POST /v1/simulations/{id}/share

Create or rotate a public share link.

GET /v1/simulations/{id}/approvals

List approvals requested on this run.

POST /v1/simulations/{id}/approve

Approve a run (team plan and above).

Comparisons

Run 2–5 pipelines on identical traffic and let PRISM pick a winner along a declared priority (cost / latency / accuracy / balanced).

POST /v1/comparisons

Create a comparison (202 Accepted).

GET /v1/comparisons/{id}

Poll scorecards, per-pipeline progress, and the recommendation when it lands.

Benchmarks

Read-only view of the community benchmarks feed: pricing per provider/model, historical deltas, and accuracy scores the recommender relies on.

GET /v1/benchmarks

List benchmark rows (paginated, filterable).

GET /v1/benchmarks/bootstrap

Seed snapshot of benchmarks, useful for offline tools.

GET /v1/benchmarks/history/{provider}/{model}

Price history for a single model.

GET /v1/benchmarks/compare

Side-by-side comparison for a set of provider/model pairs.

POST /v1/benchmarks/alerts

Subscribe to price-change alerts for a model.

Pricing catalog

The up-to-date, scraped pricing table powering the cost engine. Same data that drives the public benchmarks page.

GET /v1/pricing

All priced models, grouped by provider.

GET /v1/pricing/catalog

Provider → family → model tree.

Templates

Starter pipeline templates (RAG baseline, hybrid search, agent loop, …). Fork-friendly — the POST endpoint clones a template into your workspace.

GET /v1/templates

List published templates.

GET /v1/templates/{slug}

Fetch a template by slug.

POST /v1/templates/{slug}/fork

Fork a template into the caller's workspace.

Workspace & members

Team plan and above. Invite teammates, manage roles, and transfer workspace ownership.

GET /v1/workspace/members

List seats + roles.

GET /v1/workspace/invites

Pending invites.

POST /v1/workspace/invites

Invite a teammate (email).

POST /v1/workspace/invites/{id}/revoke

Revoke a pending invite.

POST /v1/workspace/invites/accept

Accept an invite using the token from the email.

PATCH /v1/workspace/members/{id}/role

Change a member's role (owner / admin / member).

POST /v1/workspace/ownership/transfer

Transfer ownership to another admin.

GET /v1/workspace/audit

Workspace audit log.

Validation

Standalone accuracy-validation runs. Useful for regressions when you change a retriever or prompt.

POST /v1/validation/runs

Queue an accuracy validation run.

GET /v1/validation/runs/{id}

Fetch a validation run.

GET /v1/validation/datasets

List available labeled datasets.

Billing

Stripe-backed. Checkout, customer portal, and webhooks. All requests are workspace-scoped.

GET /v1/billing/status

Current subscription + seats.

POST /v1/billing/checkout-session

Start a Stripe checkout for plan upgrade or seat purchase.

POST /v1/billing/portal-session

Open the Stripe customer portal.

POST /v1/billing/webhook

Stripe → PRISM webhook receiver.

WebSocket

Real-time progress. Auth over subprotocol header; see the WebSocket events section for frame shape.

WSS /v1/ws/simulations/{id}

Stream progress + stage transitions + final result for a simulation.

WSS /v1/ws/validations/{id}

Stream progress for a validation run.

§11

#WebSocket events

Live progress for simulations and validation runs.

Connect to wss://api.getprism.dev/v1/ws/simulations/{id}. Auth is passed via the Sec-WebSocket-Protocol header — bearer.<token>. The server sends newline-delimited JSON frames:

json

// Frame: progress update
{ "type": "progress", "progress_pct": 42, "stage": "latency_engine" }

// Frame: stage transition
{ "type": "stage", "stage": "accuracy_engine", "started_at": "2026-04-21T10:30:02Z" }

// Frame: terminal (final)
{ "type": "complete", "status": "complete", "results": { /* envelope */ } }

// Frame: error (terminal)
{ "type": "error", "status": "failed", "error": { "message": "Upstream 429 on Anthropic.", "type": "provider_error" } }
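A client only needs to branch on `type`. Below is a minimal sketch of a frame dispatcher that works with any WebSocket library: feed it each text message you receive. It treats `complete` and `error` as terminal, per the examples above. It ignores unknown frame types rather than failing, which is a defensive choice on our part. Connecting with `bearer.<token>` as the subprotocol is left to whichever client library you use. `handle_message` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Optional


def handle_message(text: str, state: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Apply one WebSocket text message to `state`.

    Returns the final frame once a terminal frame arrives, else None.
    A message may hold several newline-delimited JSON frames.
    """
    for line in text.splitlines():
        if not line.strip():
            continue
        frame = json.loads(line)
        kind = frame.get("type")
        if kind == "progress":
            state["progress_pct"] = frame["progress_pct"]
            state["stage"] = frame.get("stage")
        elif kind == "stage":
            state["stage"] = frame["stage"]
        elif kind in ("complete", "error"):
            return frame  # terminal: stop reading after this
        # Unknown types are ignored so newer servers don't break older clients.
    return None
```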

§12

#Rate limits

Every response includes X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset (seconds until the window rolls). Limits are workspace-scoped.

Free: 60 req/min, 50 simulation runs/month.
Pro: 300 req/min, 1,000 simulation runs/month.
Team: 600 req/min, 10,000 simulation runs/month, seat-based.
Enterprise: negotiated. Contact sales.
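A well-behaved client follows two rules. When `X-RateLimit-Remaining` reaches zero, it waits out the window. On a 429, it honors `Retry-After`. Below is a minimal sketch over a plain header mapping. It assumes both values are integer seconds. The docs define X-RateLimit-Reset that way, but HTTP also allows Retry-After to be a date, which this sketch doesn't handle. `seconds_to_wait` is a hypothetical helper.

```python
from typing import Mapping


def seconds_to_wait(status: int, headers: Mapping[str, str]) -> float:
    """How long to pause before the next request, given the last response."""
    # HTTP header names are case-insensitive; normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    if status == 429 and "retry-after" in h:
        return max(0.0, float(h["retry-after"]))
    if h.get("x-ratelimit-remaining") == "0":
        return max(0.0, float(h.get("x-ratelimit-reset", "0")))
    return 0.0
```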

§13

#Errors

Error responses are JSON with a stable shape.

json

{
  "error": {
    "type": "validation_error",
    "message": "traffic_params.cache_hit_rate must be between 0 and 1",
    "field": "traffic_params.cache_hit_rate"
  }
}

Status	Type	When
400	validation_error	Request body failed schema validation.
401	unauthorized	Missing or invalid bearer token.
403	forbidden	Token lacks the role for this endpoint.
404	not_found	Resource doesn't exist or you don't have access.
409	conflict	Pipeline was edited concurrently; reload and retry.
422	unprocessable	Semantic check failed (e.g. disconnected graph).
429	rate_limited	Back off per Retry-After.
500	internal_error	Server bug. Safe to retry once, then file a report.
503	upstream_error	A provider API we depend on is degraded.
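The When column translates into a retry policy:

- 429 waits for Retry-After.
- 500 is retried once.
- 503 is retried with backoff.
- Every other status means the caller should change something first. For example, a 409 needs a reload before retrying.

The exponential backoff, the fallback for a 429 without Retry-After, and the three-attempt cap on 503 are our assumptions, not documented limits. `PrismAPIError` and `retry_delay` are hypothetical names.

```python
from typing import Optional


class PrismAPIError(Exception):
    """An error response from the PRISM API, parsed from its JSON body."""

    def __init__(self, status: int, body: dict):
        err = body.get("error", {})
        self.status = status
        self.type = err.get("type", "unknown")
        self.field = err.get("field")
        super().__init__(f"{status} {self.type}: {err.get('message', '')}")


def retry_delay(status: int, attempt: int,
                retry_after: Optional[float] = None) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (1-based), or None to give up."""
    if status == 429:
        return retry_after if retry_after is not None else 2.0 ** attempt
    if status == 500:
        return 1.0 if attempt == 1 else None  # retry once, then file a report
    if status == 503:
        return 2.0 ** attempt if attempt <= 3 else None  # assumed backoff cap
    return None  # fix the request (or reload on 409) before trying again
```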

§14

#SDKs & tooling

The REST API is the source of truth. First-party SDKs wrap it and add conveniences (retry, streaming, types).

Python

Beta

Async + sync clients. Full type coverage on pipelines, simulations, and comparisons.

pip install prism-sdk

TypeScript

Beta

Isomorphic. Ships with a streaming helper that uses WebSocket when available and polls when not.

npm install @prism/sdk

Go

Planned

Roadmapped for H2. Talk to us if you want to be an early user.

go get github.com/getprism/sdk-go

GitHub Action

getprism/prism-action runs a simulation on every PR and comments the cost / latency / accuracy delta vs. main. Source lives in apps/action/ in this repo.

yaml

# .github/workflows/prism.yml
name: PRISM Simulation
on: [pull_request]
jobs:
  simulate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: getprism/prism-action@v1
        with:
          api-key: ${{ secrets.PRISM_API_KEY }}
          pipeline-id: ${{ vars.PRISM_PIPELINE_ID }}
          # optional: override traffic params for the PR run
          traffic: |
            requests_per_day: 25000
            avg_input_tokens: 1200
            avg_output_tokens: 400
            cache_hit_rate: 0.15
            retry_rate: 0.02

§15

#Support

Security issues

Email security@getprism.dev. We respond within one business day and have a public disclosure policy.

API + billing

Email support@getprism.dev for account, billing, or API questions. Team plans get a shared Slack channel.

Status + incidents

Live status at status.getprism.dev — subscribe for incident emails and component-level uptime.
