§1
#Introduction
PRISM is a pipeline simulator for AI systems. You design a graph of models, retrievers, rerankers, and tools, and PRISM projects the cost, latency, and accuracy you'd see in production — without having to stand the whole thing up.
Everything runs on top of three engines: a token-level cost engine that joins against daily-scraped provider pricing, a queue-theory latency engine that respects node concurrency and cache hit rates, and an accuracy engine that estimates end-to-end quality from published model + reranker benchmarks.
Use it to pick the right model before you integrate it, to pre-budget a feature launch, or to compare two pipelines under identical traffic and pick the cheaper / faster / more accurate one with receipts.
§2
#Quickstart
A working simulation in under five minutes. No install required.
1. Create an account
Head to /login and sign up with email. You land on the Free plan — 50 simulation runs per month, unlimited pipelines, community benchmarks.
2. Build a pipeline
Open the canvas, drag in Input, Embedding, Vector Store, Re-ranker, LLM Call, and Output, then wire them left-to-right. Click a node to configure its provider/model/token expectations on the right panel.
3. Run a simulation
Open the simulator panel (bottom-right), set your expected traffic, and hit Run. Results stream in live over WebSocket — progress % on top, stage transitions underneath, and the final cost/latency/accuracy envelope at the end.
POST /v1/simulations
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "pipeline_id": "pipe_01H7...",
  "traffic_params": {
    "requests_per_day": 25000,
    "avg_input_tokens": 1200,
    "avg_output_tokens": 400,
    "cache_hit_rate": 0.15,
    "retry_rate": 0.02
  }
}
You get back a 202 Accepted with the simulation id. Subscribe to live updates via WebSocket (see WebSocket events), or poll GET /v1/simulations/{id}.
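The same request can be assembled from a script. The sketch below uses only Python's standard library and deliberately stops short of sending anything. The base URL is taken from the REST API reference; the helper name `build_simulation_request` and the placeholder token and pipeline id are illustrative assumptions, not part of any PRISM SDK.

```python
import json
import urllib.request

BASE_URL = "https://api.getprism.dev"  # base URL from the REST API reference


def build_simulation_request(token: str, pipeline_id: str, traffic: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/simulations request.

    Hypothetical helper for illustration only.
    """
    body = json.dumps({"pipeline_id": pipeline_id, "traffic_params": traffic}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/simulations",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


traffic = {
    "requests_per_day": 25000,
    "avg_input_tokens": 1200,
    "avg_output_tokens": 400,
    "cache_hit_rate": 0.15,
    "retry_rate": 0.02,
}
# "example-token" and "pipe_example" are placeholders, not real credentials or ids.
request = build_simulation_request("example-token", "pipe_example", traffic)
```

Passing `request` to `urllib.request.urlopen` would send it; a successful call is expected to answer with 202 Accepted, as described above.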
4. Compare options
Duplicate the pipeline, swap one variable (say, gpt-4o-mini → claude-3-5-sonnet), then queue a comparison from the Comparisons page. PRISM runs both under identical traffic and picks a winner by your declared priority.
POST /v1/comparisons
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "pipeline_ids": ["pipe_a", "pipe_b"],
  "priority": "balanced",
  "traffic_params": {
    "requests_per_day": 25000,
    "avg_input_tokens": 1200,
    "avg_output_tokens": 400,
    "cache_hit_rate": 0.15,
    "retry_rate": 0.02
  }
}
§3
#Core concepts
Pipelines
A pipeline is a directed acyclic graph of nodes with optional routing edges. It has a stable id, a version history, and autosaves as you edit. Pipelines are workspace-scoped — team members see the same set.
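Because a pipeline must be acyclic, it can be useful to check a graph locally before submitting it. The sketch below applies Kahn's algorithm to the `nodes` and `edges` arrays, using the `id`, `source`, and `target` field names from the serialized form in the Pipeline model section. The function name and its error messages are assumptions for illustration; this is not PRISM's own validator.

```python
from collections import deque


def topological_order(nodes: list[dict], edges: list[dict]) -> list[str]:
    """Return node ids in topological order, or raise ValueError.

    Kahn's algorithm: repeatedly emit nodes that have no unprocessed inputs.
    """
    ids = [node["id"] for node in nodes]
    indegree = {node_id: 0 for node_id in ids}
    downstream = {node_id: [] for node_id in ids}

    for edge in edges:
        src, dst = edge["source"], edge["target"]
        if src not in indegree or dst not in indegree:
            raise ValueError(f"edge references unknown node: {src} -> {dst}")
        downstream[src].append(dst)
        indegree[dst] += 1

    ready = deque(node_id for node_id in ids if indegree[node_id] == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in downstream[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any node left unvisited sits on a cycle.
    if len(order) != len(ids):
        raise ValueError("pipeline graph contains a cycle")
    return order


nodes = [{"id": "n_in"}, {"id": "n_emb"}, {"id": "n_llm"}, {"id": "n_out"}]
edges = [
    {"source": "n_in", "target": "n_emb"},
    {"source": "n_emb", "target": "n_llm"},
    {"source": "n_llm", "target": "n_out"},
]
order = topological_order(nodes, edges)  # ["n_in", "n_emb", "n_llm", "n_out"]
```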
Nodes
Nine node types cover the usual shape of an AI system: I/O, core models (LLM, embedding), RAG primitives (vector store, reranker), and logic (cache, router, tool call). See the node reference for the exact config shape each one expects.
Simulations
A simulation evaluates a pipeline against traffic params. It runs three engines in parallel — cost, latency, accuracy — and returns a structured result envelope. Runs stream live over WebSocket; terminal runs are also reachable via polling.
Comparisons
A comparison fans out 2–5 simulations in parallel with identical traffic, then selects a winner using a priority — balanced, cost, latency, or accuracy. Pipelines that fail guardrails (e.g. p95 over threshold) are excluded from the recommendation even if they score highest.
Benchmarks
Benchmarks are the read-only feed of provider pricing, model accuracy, and reranker lift that the engines join against. Updated daily by the scraper for all major providers — OpenAI, Anthropic, Google, Cohere, Mistral, Voyage, and more.
§4
#Pipeline model
A pipeline is serialized as JSON with two arrays — nodes and edges — plus metadata.
{
  "id": "pipe_01H7XYZ",
  "name": "Support RAG — v3",
  "nodes": [
    { "id": "n_in", "type": "input", "label": "User question" },
    { "id": "n_emb", "type": "embedding", "config": { "provider": "OpenAI", "model": "text-embedding-3-small" } },
    { "id": "n_vec", "type": "vector_store", "config": { "provider": "pinecone", "index_name": "support", "top_k": 8 } },
    { "id": "n_rk", "type": "reranker", "config": { "provider": "Cohere", "model": "rerank-v3.5", "top_n": 3 } },
    { "id": "n_llm", "type": "llm_call", "config": { "provider": "Anthropic", "model": "claude-3-5-sonnet" } },
    { "id": "n_out", "type": "output", "config": { "mode": "Response", "format": "JSON" } }
  ],
  "edges": [
    { "source": "n_in", "target": "n_emb" },
    { "source": "n_emb", "target": "n_vec" },
    { "source": "n_vec", "target": "n_rk" },
    { "source": "n_rk", "target": "n_llm" },
    { "source": "n_llm", "target": "n_out" }
  ]
}
§5
#Node reference
The full set of node types. Defaults and the exact TypeScript types live in apps/web/lib/constants/node-presets.ts and apps/web/types/canvas.ts.
| Type | Category | Purpose | Config keys |
|---|---|---|---|
| input (Input) | I/O | Entry point. Declares the request shape (Text / JSON / Multimodal). | mode, schema |
| output (Output) | I/O | Terminal node. Declares the response shape. | mode, format |
| llm_call (LLM Call) | Core | A call to a generative model. The cost engine multiplies expected tokens by the scraped model price. | provider, model, system_prompt_tokens, expected_input_tokens {p50,p95}, expected_output_tokens {p50,p95}, temperature, max_retries, timeout_ms |
| embedding (Embedding) | Core | Vector embedding for retrieval or semantic caching. | provider, model, expected_input_tokens {p50,p95}, dimensions, timeout_ms |
| vector_store (Vector Store) | Core | Retrieval from an index. Latency reflects provider + top_k; cost reflects operation pricing. | provider, index_name, top_k, chunk_size_tokens |
| reranker (Re-ranker) | RAG | Cross-encoder reranking. Accuracy engine credits lift from published benchmark numbers. | provider, model, top_n, timeout_ms |
| tool_call (Tool Call) | Logic | External function/tool invocation. Contributes fixed latency/cost based on declared SLAs. | name, expected_latency_ms {p50,p95}, fixed_cost_cents |
| cache (Cache) | Logic | Short-circuits downstream work when the traffic hit-rate fires. Cheapest win. | hit_rate_pct, ttl_seconds, keyed_on |
| router (Router) | Logic | Weighted branch. Each outgoing edge carries a weight; the simulator splits traffic accordingly. | routes[], default_route |
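To make the router's weighted split concrete, here is a minimal sketch of dividing daily traffic across outgoing routes in proportion to their weights. The table only names `routes[]` and `default_route`; the per-route keys `target` and `weight` used below are assumptions, and the function is illustrative rather than the simulator's actual code.

```python
def split_traffic(requests_per_day: float, routes: list[dict]) -> dict[str, float]:
    """Split traffic across routes in proportion to their weights.

    Assumes each route looks like {"target": <node id>, "weight": <number>}.
    """
    total = sum(route["weight"] for route in routes)
    if total <= 0:
        raise ValueError("route weights must sum to a positive number")
    return {
        route["target"]: requests_per_day * route["weight"] / total
        for route in routes
    }


# Hypothetical router sending three quarters of traffic to a cheaper model.
shares = split_traffic(
    25000,
    [{"target": "n_llm_small", "weight": 3}, {"target": "n_llm_large", "weight": 1}],
)
# shares == {"n_llm_small": 18750.0, "n_llm_large": 6250.0}
```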
§6
#Simulation
Traffic parameters
Every simulation is parameterized by the same five numbers. One describes volume, two describe typical request size, one captures how cacheable your workload is, and one captures how often calls are retried.
| Field | Type | Meaning |
|---|---|---|
| requests_per_day | number | Steady-state traffic volume. |
| avg_input_tokens | number | Per-request input size (prompts + retrieved docs). |
| avg_output_tokens | number | Per-request generated tokens (answer length). |
| cache_hit_rate | 0.0 – 1.0 | Fraction of requests served by a cache node before reaching the model. |
| retry_rate | 0.0 – 1.0 | Fraction of calls that fail and get re-attempted. Paid twice by the cost engine. |
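A client can catch out-of-range values before making a request. The sketch below models the five fields as a dataclass and checks the documented 0.0 to 1.0 ranges. The class name, the `validate` method, and the non-negativity checks on the first three fields are assumptions added for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrafficParams:
    requests_per_day: float
    avg_input_tokens: float
    avg_output_tokens: float
    cache_hit_rate: float
    retry_rate: float

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the values look sane."""
        problems = []
        # Assumption: volumes and token counts cannot be negative.
        for name in ("requests_per_day", "avg_input_tokens", "avg_output_tokens"):
            if getattr(self, name) < 0:
                problems.append(f"traffic_params.{name} must not be negative")
        # Documented range for both rates is 0.0 to 1.0.
        for name in ("cache_hit_rate", "retry_rate"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                problems.append(f"traffic_params.{name} must be between 0 and 1")
        return problems

    def to_payload(self) -> dict:
        """Dictionary suitable for the traffic_params field of a request body."""
        return asdict(self)


good = TrafficParams(25000, 1200, 400, 0.15, 0.02)
bad = TrafficParams(25000, 1200, 400, 1.5, 0.02)
```

Here `good.validate()` returns an empty list, while `bad.validate()` reports the out-of-range `cache_hit_rate`.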
Engines
Cost engine
- Joins every priced node against the daily pricing table.
- Charges retries at retry_rate × provider rate.
- Discounts cached requests.
- Returns p50 + p95 $/query and daily/monthly totals.
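One plausible reading of these rules, for a pipeline with a single LLM node, is sketched below. It assumes cached requests cost nothing and that each retried call is billed once more at the full rate; the real engine may discount or weight these differently. The prices are placeholders, not scraped values, and the 30-day month is an assumption that happens to match the sample totals in the result envelope (105.00 per day, 3150.00 per month).

```python
def cost_per_query(
    input_tokens: float,
    output_tokens: float,
    input_price_per_1k: float,
    output_price_per_1k: float,
    cache_hit_rate: float,
    retry_rate: float,
) -> float:
    """Expected USD cost of one request under the stated assumptions."""
    base = (input_tokens / 1000) * input_price_per_1k \
        + (output_tokens / 1000) * output_price_per_1k
    with_retries = base * (1 + retry_rate)      # retried calls are paid again
    return with_retries * (1 - cache_hit_rate)  # assumption: cache hits are free


def daily_and_monthly(per_query_usd: float, requests_per_day: float,
                      days_per_month: int = 30) -> tuple[float, float]:
    """Scale a per-query cost to daily and monthly totals."""
    daily = per_query_usd * requests_per_day
    return daily, daily * days_per_month


# Placeholder prices in USD per 1K tokens (hypothetical, not from the pricing feed).
per_query = cost_per_query(1200, 400, 0.003, 0.015, cache_hit_rate=0.15, retry_rate=0.02)
daily, monthly = daily_and_monthly(per_query, 25000)
```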
Latency engine
- Queue-theory per node with provider-typical μ + σ.
- Respects concurrency limits and router weights.
- Returns p50 / p90 / p95 / p99 end-to-end latency.
- Flags stages that push worst-case over target.
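The exact queueing model is not specified here. As a stand-in, the sketch below uses the textbook M/M/c queue (Poisson arrivals, exponential service times, `c` parallel servers) to show how a concurrency limit changes mean latency at a single node. It returns a mean rather than the percentiles PRISM reports, and every name in it is an assumption for illustration.

```python
import math


def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arriving request has to queue (Erlang C formula)."""
    rho = offered_load / servers  # utilisation per server
    if rho >= 1:
        raise ValueError("node is overloaded: utilisation must stay below 1")
    top = offered_load ** servers / math.factorial(servers) / (1 - rho)
    bottom = sum(offered_load ** k / math.factorial(k) for k in range(servers)) + top
    return top / bottom


def mean_latency_s(arrivals_per_s: float, mean_service_s: float, servers: int) -> float:
    """Mean time in system (queueing delay plus service) for an M/M/c node."""
    service_rate = 1 / mean_service_s
    offered_load = arrivals_per_s / service_rate
    wait = erlang_c(servers, offered_load) / (servers * service_rate - arrivals_per_s)
    return wait + mean_service_s


# Hypothetical node: 5 requests/s arriving, 100 ms mean service time.
single = mean_latency_s(5, 0.1, servers=1)  # reduces to 1 / (mu - lambda) = 0.2 s
pooled = mean_latency_s(5, 0.1, servers=4)  # more concurrency, less queueing
```

With one server the result collapses to the familiar M/M/1 value; raising the concurrency limit pulls the mean back toward the bare service time.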
Accuracy engine
- Bootstraps from published model benchmarks + reranker lift.
- Only runs when at least one graded node is present.
- Returns 0–1 overall score and per-metric breakdown.
- Skips the metric cleanly for pipelines with no retrieval.
Result envelope
Terminal simulations include a structured results object. The shape below is abbreviated; see apps/web/types/simulation.ts for the complete TypeScript types.
{
  "status": "complete",
  "cost": {
    "cost_per_query_p50": "0.0042",
    "cost_per_query_p95": "0.0071",
    "daily_cost_usd": "105.00",
    "monthly_cost_usd": "3150.00",
    "stage_breakdown": [ /* per-node cost contribution */ ]
  },
  "latency": {
    "latency_p50_ms": 820,
    "latency_p95_ms": 1740,
    "latency_p99_ms": 2390,
    "stage_breakdown": [ /* per-node latency contribution */ ]
  },
  "accuracy": {
    "applicable": true,
    "overall_score": 0.82,
    "metrics": {
      "retrieval_recall": 0.79,
      "rerank_ndcg": 0.86,
      "answer_relevance": 0.83
    }
  },
  "errors": []
}
§7
#Comparisons
Side-by-side simulations under identical traffic with a recommendation at the end.
Provide 2–5 pipeline ids and a priority. PRISM fans out simulations, tracks progress per pipeline, and selects a winner that (a) scores highest on your priority and (b) satisfies every active guardrail. If nothing is compliant, the response carries recommendation_status: "no_compliant_option" instead of quietly recommending a failing run.
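The selection rule can be sketched in a few lines. The function below works on items shaped like the entries in the comparison response (`status`, `score`, `is_compliant`, `pipeline_id`) and is only an illustration of the rule as described. The `"ok"` status value is an assumption, since the only documented value is `"no_compliant_option"`.

```python
def recommend(items: list[dict]) -> dict:
    """Pick the highest-scoring item that finished and passed every guardrail."""
    compliant = [
        item for item in items
        if item["status"] == "complete" and item["is_compliant"]
    ]
    if not compliant:
        # Never fall back to a failing run.
        return {"recommendation_status": "no_compliant_option", "recommendation": None}
    winner = max(compliant, key=lambda item: item["score"])
    return {
        "recommendation_status": "ok",  # assumed value, not documented
        "recommendation": {"pipeline_id": winner["pipeline_id"]},
    }


items = [
    {"pipeline_id": "pipe_a", "status": "complete", "score": 0.78, "is_compliant": True},
    {"pipeline_id": "pipe_b", "status": "complete", "score": 0.91, "is_compliant": False},
]
result = recommend(items)  # pipe_b scores higher but violates a guardrail
```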
// Response shape for GET /v1/comparisons/{id}
{
  "comparison_id": "cmp_01H7...",
  "priority": "balanced",
  "completed_count": 3,
  "total_count": 3,
  "items": [
    {
      "pipeline_id": "pipe_a",
      "pipeline_name": "RAG baseline",
      "simulation_id": "sim_...",
      "status": "complete",
      "score": 0.78,
      "rank": 1,
      "is_compliant": true,
      "metrics": { "cost_per_query_p50": 0.0038, "latency_p95_ms": 1420, "accuracy_overall_score": 0.81 },
      "threshold_violations": []
    }
    // ... more
  ],
  "recommendation": {
    "pipeline_id": "pipe_a",
    "pipeline_name": "RAG baseline",
    "simulation_id": "sim_...",
    "reason": "Lowest cost within latency + accuracy guardrails.",
    "priority": "balanced"
  }
}
§8
#Benchmarks
The read-only feed the engines join against. Same data powers the public benchmarks page.
- Scraped daily from provider pricing pages: OpenAI, Anthropic, Google, Cohere, Mistral, Voyage, DeepSeek, Groq, Together, xAI, and more.
- Normalized to $/1K tokens regardless of whether a provider publishes $/1M or per-call pricing.
- Historical price deltas exposed via /v1/benchmarks/history/{provider}/{model}.
- Alerts: POST to /v1/benchmarks/alerts to subscribe when a specific model's price moves.
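The normalization to $/1K tokens is simple arithmetic for token-based prices, as the sketch below shows. The unit labels and function name are assumptions. Per-call pricing is left out on purpose, because converting it would need a tokens-per-call figure that is not given here.

```python
# Tokens covered by one published price unit (labels are illustrative).
TOKENS_PER_UNIT = {"per_1k_tokens": 1_000, "per_1m_tokens": 1_000_000}


def price_per_1k_tokens(price_usd: float, unit: str) -> float:
    """Convert a published token price to USD per 1K tokens."""
    if unit not in TOKENS_PER_UNIT:
        raise ValueError(f"unsupported pricing unit: {unit}")
    return price_usd * 1000 / TOKENS_PER_UNIT[unit]


# A hypothetical $3.00 per 1M tokens becomes $0.003 per 1K tokens.
normalized = price_per_1k_tokens(3.00, "per_1m_tokens")
```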
§9
#Authentication
All REST endpoints require a Bearer token. Tokens are issued per-workspace from Settings → API keys. A token carries the role of the user who minted it — admin tokens can write, member tokens can read.
curl https://api.getprism.dev/v1/pipelines \
  -H "Authorization: Bearer $PRISM_API_KEY"
§10
#REST API reference
Base URL: https://api.getprism.dev. All endpoints speak JSON, return standard HTTP status codes, and include X-RateLimit-* headers.
Pipelines
Create, read, update, and delete pipeline graphs. A pipeline is a DAG of nodes with optional routing.
| Endpoint | Description |
|---|---|
| /v1/pipelines | List workspace pipelines. |
| /v1/pipelines | Create a new pipeline. |
| /v1/pipelines/{id} | Fetch a pipeline by id. |
| /v1/pipelines/{id} | Replace a pipeline's graph + metadata. |
| /v1/pipelines/{id} | Soft-delete a pipeline and its history. |
| /v1/pipelines/{id}/simulations/latest | Most recent simulation status for a pipeline (used by the workbench on load). |
| /v1/pipelines/{id}/comments | List annotations/comments attached to a pipeline. |
Simulations
Kick off and observe a simulation. POST returns 202 Accepted; stream progress via WebSocket or poll.
| Endpoint | Description |
|---|---|
| /v1/simulations | Create + enqueue a simulation run. |
| /v1/simulations/{id} | Poll status and partial results. |
| /v1/simulations/{id}/report-url | Short-lived signed URL for the PDF report. |
| /v1/simulations/{id}/report | 302 redirect to the signed URL (convenience). |
| /v1/simulations/{id}/share | Create or rotate a public share link. |
| /v1/simulations/{id}/approvals | List approvals requested on this run. |
| /v1/simulations/{id}/approve | Approve a run (team plan and above). |
Comparisons
Run 2–5 pipelines on identical traffic and let PRISM pick a winner along a declared priority (cost / latency / accuracy / balanced).
| Endpoint | Description |
|---|---|
| /v1/comparisons | Create a comparison (202 Accepted). |
| /v1/comparisons/{id} | Poll scorecards, per-pipeline progress, and the recommendation when it lands. |
Benchmarks
Read-only view of the community benchmarks feed: pricing per provider/model, historical deltas, and accuracy scores the recommender relies on.
| Endpoint | Description |
|---|---|
| /v1/benchmarks | List benchmark rows (paginated, filterable). |
| /v1/benchmarks/bootstrap | Seed snapshot of benchmarks, useful for offline tools. |
| /v1/benchmarks/history/{provider}/{model} | Price history for a single model. |
| /v1/benchmarks/compare | Side-by-side comparison for a set of provider/model pairs. |
| /v1/benchmarks/alerts | Subscribe to price-change alerts for a model. |
Pricing catalog
The up-to-date, scraped pricing table powering the cost engine. Same data that drives the public benchmarks page.
| Endpoint | Description |
|---|---|
| /v1/pricing | All priced models, grouped by provider. |
| /v1/pricing/catalog | Provider → family → model tree. |
Templates
Starter pipeline templates (RAG baseline, hybrid search, agent loop, …). Fork-friendly — the POST endpoint clones a template into your workspace.
| Endpoint | Description |
|---|---|
| /v1/templates | List published templates. |
| /v1/templates/{slug} | Fetch a template by slug. |
| /v1/templates/{slug}/fork | Fork a template into the caller's workspace. |
Workspace & members
Team plan and above. Invite teammates, manage roles, and transfer workspace ownership.
| Endpoint | Description |
|---|---|
| /v1/workspace/members | List seats + roles. |
| /v1/workspace/invites | Pending invites. |
| /v1/workspace/invites | Invite a teammate (email). |
| /v1/workspace/invites/{id}/revoke | Revoke a pending invite. |
| /v1/workspace/invites/accept | Accept an invite using the token from the email. |
| /v1/workspace/members/{id}/role | Change a member's role (owner / admin / member). |
| /v1/workspace/ownership/transfer | Transfer ownership to another admin. |
| /v1/workspace/audit | Workspace audit log. |
Validation
Standalone accuracy-validation runs. Useful for regressions when you change a retriever or prompt.
| Endpoint | Description |
|---|---|
| /v1/validation/runs | Queue an accuracy validation run. |
| /v1/validation/runs/{id} | Fetch a validation run. |
| /v1/validation/datasets | List available labeled datasets. |
Billing
Stripe-backed. Checkout, customer portal, and webhooks. All requests are workspace-scoped.
| Endpoint | Description |
|---|---|
| /v1/billing/status | Current subscription + seats. |
| /v1/billing/checkout-session | Start a Stripe checkout for plan upgrade or seat purchase. |
| /v1/billing/portal-session | Open the Stripe customer portal. |
| /v1/billing/webhook | Stripe → PRISM webhook receiver. |
WebSocket
Real-time progress. Auth over subprotocol header; see the WebSocket events section for frame shape.
| Endpoint | Description |
|---|---|
| /v1/ws/simulations/{id} | Stream progress + stage transitions + final result for a simulation. |
| /v1/ws/validations/{id} | Stream progress for a validation run. |
§11
#WebSocket events
Live progress for simulations and validation runs.
Connect to wss://api.getprism.dev/v1/ws/simulations/{id}. Auth is passed via the Sec-WebSocket-Protocol header — bearer.<token>. The server sends newline-delimited JSON frames:
// Frame: progress update
{ "type": "progress", "progress_pct": 42, "stage": "latency_engine" }
// Frame: stage transition
{ "type": "stage", "stage": "accuracy_engine", "started_at": "2026-04-21T10:30:02Z" }
// Frame: terminal (final)
{ "type": "complete", "status": "complete", "results": { /* envelope */ } }
// Frame: error (terminal)
{ "type": "error", "status": "failed", "error": { "message": "Upstream 429 on Anthropic.", "type": "provider_error" } }
§12
#Rate limits
Every response includes X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset (seconds until the window resets). Limits are workspace-scoped.
- Free: 60 req/min, 50 simulation runs/month.
- Pro: 300 req/min, 1,000 simulation runs/month.
- Team: 600 req/min, 10,000 simulation runs/month, seat-based.
- Enterprise: negotiated. Contact sales.
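A client can use these headers to pace itself instead of waiting to be rejected. The sketch below decides how long to pause after a response. It assumes `Retry-After` carries a number of seconds (HTTP also allows a date there) and that header values arrive as strings; the function name is illustrative.

```python
def seconds_to_wait(status: int, headers: dict[str, str]) -> float:
    """How long to pause before the next request, based on rate-limit headers."""
    lowered = {key.lower(): value for key, value in headers.items()}

    # A 429 tells us explicitly how long to back off.
    if status == 429 and "retry-after" in lowered:
        return float(lowered["retry-after"])  # assumption: value is in seconds

    # Budget exhausted: wait for the window to reset before sending more.
    if int(lowered.get("x-ratelimit-remaining", "1")) <= 0:
        return float(lowered.get("x-ratelimit-reset", "0"))

    return 0.0


pause = seconds_to_wait(200, {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "30"})
# pause == 30.0
```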
§13
#Errors
Error responses are JSON with a stable shape.
{
  "error": {
    "type": "validation_error",
    "message": "traffic_params.cache_hit_rate must be between 0 and 1",
    "field": "traffic_params.cache_hit_rate"
  }
}
| Status | Type | When |
|---|---|---|
| 400 | validation_error | Request body failed schema validation. |
| 401 | unauthorized | Missing or invalid bearer token. |
| 403 | forbidden | Token lacks the role for this endpoint. |
| 404 | not_found | Resource doesn't exist or you don't have access. |
| 409 | conflict | Pipeline was edited concurrently; reload and retry. |
| 422 | unprocessable | Semantic check failed (e.g. disconnected graph). |
| 429 | rate_limited | Back off per Retry-After. |
| 500 | internal_error | Server bug. Safe to retry once, then file a report. |
| 503 | upstream_error | A provider API we depend on is degraded. |
§14
#SDKs & tooling
The REST API is the source of truth. First-party SDKs wrap it and add conveniences (retry, streaming, types).
Python
Beta. Async + sync clients. Full type coverage on pipelines, simulations, and comparisons.
pip install prism-sdk
TypeScript
Beta. Isomorphic. Ships with a streaming helper that uses WebSocket when available and polls when not.
npm install @prism/sdk
Go
Planned. Roadmapped for H2. Talk to us if you want to be an early user.
go get github.com/getprism/sdk-go
GitHub Action
getprism/prism-action runs a simulation on every PR and comments the cost / latency / accuracy delta vs. main. Source lives in apps/action/ in this repo.
# .github/workflows/prism.yml
name: PRISM Simulation
on: [pull_request]
jobs:
  simulate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: getprism/prism-action@v1
        with:
          api-key: ${{ secrets.PRISM_API_KEY }}
          pipeline-id: ${{ vars.PRISM_PIPELINE_ID }}
          # optional: override traffic params for the PR run
          traffic: |
            requests_per_day: 25000
            avg_input_tokens: 1200
            avg_output_tokens: 400
            cache_hit_rate: 0.15
            retry_rate: 0.02
§15
#Support
Security issues
Email security@getprism.dev. We respond within one business day and have a public disclosure policy.
API + billing
Email support@getprism.dev for account, billing, or API questions. Team plans get a shared Slack channel.
Status + incidents
Live status at status.getprism.dev — subscribe for incident emails and component-level uptime.