Per-token rates are the easy part. Add infra, dev hours, vector DBs, vendor lock-in, monitoring, and human review — and the real TCO is usually 2.5× the API bill. Model it below in 60 seconds.
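For scale, a two-line version of that rule of thumb; both figures are illustrative, not your numbers:

```python
# Rough fully-loaded TCO from a raw API bill, using the ~2.5x default
# multiplier cited above. Both numbers are illustrative assumptions.
monthly_api_bill = 10_000   # USD, hypothetical
tco_multiplier = 2.5        # infra + dev + monitoring + review overhead
fully_loaded = monthly_api_bill * tco_multiplier
print(f"API bill ${monthly_api_bill:,} -> real TCO ~${fully_loaded:,.0f}/mo")
```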
Your inputs, cross-tabulated against every model we support. The cheapest option isn't always the right one — but "right" shouldn't be off by 50×.
| Model | Vendor | In / 1M | Out / 1M | Cost / query | Monthly |
|---|---|---|---|---|---|
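For reference, a minimal sketch of how the Cost / query and Monthly columns fall out of per-1M-token rates; the rates, token counts, and volume below are hypothetical placeholders, not live vendor prices:

```python
# Derive "Cost / query" and "Monthly" from per-1M-token rates.
# All numbers here are hypothetical placeholders, not live vendor prices.
def query_cost(in_rate: float, out_rate: float,
               in_tokens: int, out_tokens: int) -> float:
    """Cost of one query given USD-per-1M-token rates."""
    return in_tokens / 1e6 * in_rate + out_tokens / 1e6 * out_rate

MONTHLY_QUERIES = 100_000
IN_RATE, OUT_RATE = 3.00, 15.00  # hypothetical USD per 1M tokens

# Two workload shapes: input-heavy (RAG-style) vs generation-heavy (chat-style).
workloads = {
    "input-heavy":      dict(in_tokens=6_000, out_tokens=400),
    "generation-heavy": dict(in_tokens=800, out_tokens=2_000),
}

for name, shape in workloads.items():
    per_query = query_cost(IN_RATE, OUT_RATE, **shape)
    print(f"{name:17s} ${per_query:.4f}/query  ${per_query * MONTHLY_QUERIES:,.0f}/mo")
```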
Replacing a workflow isn't about the subscription cost — it's about the fully-loaded human alternative, including benefits, tooling, and management overhead.
Adjust the human baseline; we'll break down the cost-per-interaction both ways.
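As a rough sketch of that two-way breakdown; the salary, overhead factor, and handle-time figures below are placeholder assumptions you would replace with your own:

```python
# Fully-loaded human cost per interaction vs. AI cost per interaction.
# Every figure here is a placeholder assumption.
ANNUAL_SALARY   = 60_000   # USD base
OVERHEAD_FACTOR = 1.4      # benefits, tooling, management overhead
HOURS_PER_YEAR  = 1_800    # productive hours per year
MINUTES_PER_INTERACTION = 6

hourly = ANNUAL_SALARY * OVERHEAD_FACTOR / HOURS_PER_YEAR
human_per_interaction = hourly * MINUTES_PER_INTERACTION / 60

ai_per_query    = 0.012    # model cost per query, hypothetical
escalation_rate = 0.05     # share of queries a human still handles
ai_per_interaction = ai_per_query + escalation_rate * human_per_interaction

print(f"human: ${human_per_interaction:.2f}  ai: ${ai_per_interaction:.2f}")
```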
Evals, regression tests, A/B
Every model update re-rolls your prompts. Teams without eval pipelines ship regressions to prod on Tuesday and roll back on Thursday — twice a quarter, every quarter.
6–12% of AI TCO

RAG isn't "upload PDF, done". Chunking strategy, hybrid retrieval, reranker costs, re-embedding on updates — this stack is typically 25–40% of infra spend.
25–40% of infra spend

Model-specific fine-tuning, function-calling schemas, cached prompts — all non-portable. Switching vendors later costs 3–6 weeks of engineering per non-trivial integration.
3–6 weeks switch cost

GDPR, DORA, the EU AI Act. Logs, redaction, jailbreak-resistant system prompts, and a classifier pass on inputs and outputs. Not optional in regulated sectors.
8–15% of AI TCO

Even at 95% autonomous, the 5% you escalate demands an ops team, SLAs, and an escalation UI. Scales linearly with volume, not with compute.
~$0.40 per reviewed query

Self-hosting? Reserved GPU hours burn 24/7 even when traffic dips. Using APIs? Failed retries, dropped streams, and timed-out agent loops quietly rack up 8–18% token waste.
8–18% token overrun

We don't invent multipliers. Every assumption is sourced from a public price list or peer-reviewed benchmark — linked below.
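Taken together, the estimates above imply a simple aggregation. A sketch with illustrative midpoints and a hypothetical baseline bill, not our exact model:

```python
# Combine the hidden-cost estimates above into a fully-loaded monthly TCO.
# The baseline figures and midpoint shares are illustrative assumptions.
api_bill        = 10_000     # USD/mo raw model spend (hypothetical)
monthly_queries = 100_000

token_waste  = api_bill * 0.13                # midpoint of 8-18% overrun
infra_ops    = api_bill * 1.00                # a16z: infra+ops+dev ~doubles the bill
rag_share    = infra_ops * 0.33               # 25-40% of infra (already inside infra_ops)
human_review = monthly_queries * 0.05 * 0.40  # 5% escalated at ~$0.40 each

subtotal = api_bill + token_waste + infra_ops + human_review
# Evals (6-12%) and compliance (8-15%) are quoted as shares of *total* TCO,
# so solve TCO = subtotal / (1 - evals_share - compliance_share).
tco = subtotal / (1 - 0.09 - 0.115)

print(f"RAG portion of infra: ${rag_share:,.0f}/mo")
print(f"raw API bill ${api_bill:,} -> full TCO ~${tco:,.0f}/mo ({tco / api_bill:.1f}x)")
```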
Input / output per-1M-token rates are pulled from OpenAI, Anthropic, Google DeepMind and Mistral price pages, refreshed quarterly. We model input-heavy workloads separately from generation-heavy ones.
Refreshed: Q1 2026

Andreessen Horowitz's 2024 LLMOps survey across 40+ companies found infra + ops + dev roughly doubles the raw API bill. Our default multipliers sit at the median of the reported range.
Source: a16z LLMOps field notes, 2024

For RAG use cases, vector DB + embedding cost is modeled against Pinecone Serverless and self-hosted pgvector on RDS m5.xlarge. We assume 1M indexed chunks with nightly delta updates.
Source: Pinecone pricing, AWS RDS list
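To make that corpus assumption concrete, a back-of-the-envelope sketch; the embedding rate, chunk size, delta share, and storage line are placeholder assumptions:

```python
# Embedding + vector index cost for the modeled RAG corpus:
# 1M chunks, nightly delta re-embeds. All rates are placeholder assumptions.
CHUNKS           = 1_000_000
TOKENS_PER_CHUNK = 400
EMBED_RATE       = 0.02 / 1e6   # USD per token, hypothetical
DAILY_DELTA      = 0.01         # 1% of chunks change per night

initial_embed   = CHUNKS * TOKENS_PER_CHUNK * EMBED_RATE
monthly_reembed = CHUNKS * DAILY_DELTA * 30 * TOKENS_PER_CHUNK * EMBED_RATE
index_storage   = 25.0          # USD/mo flat placeholder for the vector index

print(f"one-off embedding  ${initial_embed:,.2f}")
print(f"nightly deltas/mo  ${monthly_reembed:,.2f}")
print(f"index storage/mo   ${index_storage:,.2f}")
```

Even at this corpus size the embedding calls are nearly free; the recurring index line, plus the engineering around chunking and reranking, is where the 25–40% share comes from.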
Complete TCO breakdown with year-by-year projections, hidden cost analysis, and budget template.
Includes CFO-ready executive summary with risk flags
1. Choose the AI services and models you plan to integrate.
2. Set expected request volumes, data sizes, and processing frequency.
3. Get the full TCO breakdown: compute, storage, API calls, team, and hidden costs (sketched below).
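Those three steps, condensed into one hypothetical function; the service names, rates, and hidden-cost multiplier are placeholder assumptions, not the calculator's actual internals:

```python
# The three calculator steps as one function: pick services, set volumes,
# get a breakdown. All rates and multipliers are hypothetical placeholders.
def tco_breakdown(services: dict, monthly_queries: int,
                  hidden_multiplier: float = 1.5) -> dict:
    api = sum(rate * monthly_queries for rate in services.values())
    compute_storage = 0.15 * api   # hosting, vector DB, logs (assumed share)
    team = 4_000                   # USD/mo eng + ops time (assumed)
    hidden = api * (hidden_multiplier - 1.0)
    return {
        "api_calls": api,
        "compute_storage": compute_storage,
        "team": team,
        "hidden": hidden,
        "total": api + compute_storage + team + hidden,
    }

# Step 1: services and per-query rates. Step 2: volume. Step 3: breakdown.
plan = {"chat_model": 0.012, "embeddings": 0.0004}
for line, usd in tco_breakdown(plan, monthly_queries=50_000).items():
    print(f"{line:16s} ${usd:,.0f}")
```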
Turn LCP improvements into monthly revenue. Three scenarios, one formula.
How often do ChatGPT, Claude and Perplexity cite your brand? Find out.
Ballpark a web, mobile or AI-powered build in 90 seconds.
We ask 8 questions, recommend the stack. Works for AI features too.
Fixed price, fixed scope. Model selection, RAG pipeline, evals, monitoring — production-ready, not a prototype.