// Blog/Cost/May 3, 2026/6 min read

The real cost of AI at scale.

An agent serving 100,000 users runs around $1,000 a month. The math is public. Here is where the cost actually shows up, and what to ask before you sign anything.

An AI agent serving 100,000 users runs around $1,000 a month. With caching. The model spend at scale is small. Anyone who tells you otherwise is selling you something else.

If a vendor is quoting you 10 times that, ask them what they are actually building. The honest answer is that the model is the cheapest part of the system, and the work that costs the money is everything around it. This piece breaks down where the real cost lives, why most quotes are wrong, and what you should actually be paying for.

// 01

The headline number

100,000 monthly active users. Each one sends 5 messages to your agent. Each message costs roughly 1,500 tokens of input and 400 tokens of output, after caching kicks in. Run that math against the public Anthropic or OpenAI rates and you land somewhere between $800 and $1,400 a month in raw API spend. Call it $1,000.

That number is real. The pricing pages are public. The math is not a secret. You can run it yourself in five minutes and you will land in the same range we did.
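The arithmetic above fits in a few lines. The per-token rates below are illustrative stand-ins for a mid-tier model's public list price, not a quote from any provider; swap in the rates from the pricing page you actually use.

```python
# Back-of-envelope monthly API spend for the scenario in the text.
USERS = 100_000
MSGS_PER_USER = 5
IN_TOKENS = 1_500   # effective input tokens per message, after caching
OUT_TOKENS = 400
IN_RATE = 0.80 / 1_000_000   # $ per input token (assumed, mid-tier model)
OUT_RATE = 4.00 / 1_000_000  # $ per output token (assumed, mid-tier model)

messages = USERS * MSGS_PER_USER
monthly = messages * (IN_TOKENS * IN_RATE + OUT_TOKENS * OUT_RATE)
print(f"${monthly:,.0f} / month")
```

With these assumed rates the result lands at the top of the $800 to $1,400 range; a cheaper model or a higher cache hit rate pushes it toward the bottom.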

// 02

Where the math comes from

Three things keep the model spend low at scale. Prompt caching, which most modern providers offer, drops the cost of the static parts of your prompt by 90%. Batching brings down the per-token cost when you can tolerate latency. Output truncation keeps responses tight by design instead of letting the model ramble.

If you skip those three optimizations, you can easily 5x the bill. If you do them, the model is a rounding error in your operating expenses. The real cost is somewhere else entirely.

// 03

What is actually expensive

Here is where the budget actually goes when an AI system goes into production:

  • Scoping the right problem. The single most expensive mistake is building the wrong thing well. Discovery, observation, and saying no to features the founder wants but the team will not use: that is real work, and it is worth paying for.
  • Cleaning the data. Your CRM has 40% missing fields. Your knowledge base contradicts itself. Your tickets are tagged inconsistently. Before any AI system ships, somebody has to live in your data for a week and fix the obvious problems. This is the most undersold line item in every proposal we have ever read.
  • Wiring it into the systems people already use. Slack, HubSpot, the data warehouse, the CRM, Notion, the helpdesk. None of these have one obvious integration pattern. Each one takes a few days to do well. Across five systems, this is a meaningful chunk of the build.
  • Evaluating the output. You need a test set. You need to know when the agent is regressing. You need a way to catch a 5% accuracy drop before your users do. Tools like DeepEval and Promptfoo make this faster but they do not make it free.
  • Keeping it alive. Models update. APIs deprecate. Upstream systems change their schemas. The agent that worked perfectly on Day 1 will quietly break on Day 90 if no one is watching it. That work has to be funded.
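
The eval point above can be made concrete with a minimal regression gate: run the agent over a fixed test set and refuse to ship if accuracy drops more than 5 points from baseline. Everything here is a stand-in sketch, not a real harness; `run_agent`, the baseline, and the test set are placeholders for your own.

```python
# Minimal accuracy regression gate. run_agent is a stand-in for a
# real call to your deployed agent.
BASELINE_ACCURACY = 0.92
MAX_DROP = 0.05  # fail the deploy on a drop bigger than 5 points

def run_agent(question: str) -> str:
    # Stand-in: a real harness would call the agent here.
    canned = {"What is 6 x 7?": "42", "Capital of France?": "Paris"}
    return canned[question]

test_set = [("What is 6 x 7?", "42"), ("Capital of France?", "Paris")]

correct = sum(run_agent(q).strip() == a for q, a in test_set)
accuracy = correct / len(test_set)
if accuracy < BASELINE_ACCURACY - MAX_DROP:
    raise SystemExit(f"regression: {accuracy:.0%} vs baseline {BASELINE_ACCURACY:.0%}")
print(f"accuracy {accuracy:.0%} ok")
```

Tools like DeepEval and Promptfoo wrap this loop in better tooling, but the core of the work is the same: a test set someone maintains and a threshold someone enforces.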

Add it up and the labor cost dwarfs the model cost by an order of magnitude or more. That is what you are paying an agency for. Not tokens.

// 04

The cost ladder

The way most AI projects actually scale in cost is not linear. It is a step function. There are three rungs:

  1. Pilot. One workflow, one team, one model. Total cost: low five figures, mostly labor. Model spend: under $100 a month. Lasts: 30 to 60 days.
  2. Production. Multiple workflows, integrations, real usage. Total cost: monthly retainer or fixed scope project, mid five to low six figures over a year. Model spend: $500 to $2,000 a month. Lasts: indefinitely.
  3. Scale. Multi-tenant, sovereign deployment, custom models, evals, observability, the works. Total cost: high six figures or more. Model spend: still under $5k a month for most workloads. Lasts: this is the operating layer of the business now.

Notice that the model spend never becomes the dominant cost, even at scale. The cost is in the work that surrounds the model. If a vendor is pricing you in a way that scales linearly with users or messages, they are charging you a markup on tokens and pretending it is a service fee.

// 05

What you should actually be paying for

When you read an AI vendor proposal, the line items that matter are the ones nobody likes to talk about:

  • Real discovery time. Not a 30 minute kickoff. A week of someone watching your team work and writing down what they see.
  • Data quality work. Specifically called out, with hours allocated to it.
  • Integration work. Each system named, with a realistic estimate.
  • Eval and observability setup. Not optional. Not phase two.
  • Maintenance. A monthly hour budget for keeping the thing alive.

If those five line items are not in the proposal, the vendor is either inexperienced or they are selling you a demo and calling it a system. The model itself is at most 5% of a real budget.

If you want to know what something will really cost, ask for the audit. Numbers on a page beat numbers in a deck.

// 06

The bottom line

AI at scale is not expensive because of the model. It is expensive because doing it well requires real work in places nobody likes to budget for. The companies that ship are the ones who fund those line items. The companies that fail are the ones who pay for tokens and skip the rest.

Before you sign anything, get the cost broken down. If the breakdown does not include data, integrations, evals, and maintenance, you are not getting a real number. You are getting a sticker price for a system that has not been built yet.

// Bring us a signal

Have a problem
worth shipping?

We get back to you within one business day. No deck, no pitch, no twenty minute discovery call before we say something useful.
