Own the inference stack, or rent it? The choice that shapes a chatbot SaaS
Every chatbot company makes the same choice in its first quarter: rent an LLM API and mark it up, or own the inference stack and absorb the cost. The shape of the cost curve is what matters.
Every chatbot SaaS company makes the same architectural choice in its first quarter, often without realizing it's a choice: rent a frontier LLM API (OpenAI, Anthropic) and mark up per-token usage, or operate the inference stack directly and absorb the marginal cost. The decision is invisible to customers but it shapes everything — pricing, margin curve, customer-data posture, even how you respond to outages.
What "rent" looks like
Most chatbot SaaS providers route inference through the OpenAI or Anthropic API. A visitor asks a question; the chatbot company sends the question plus relevant retrieved context to the API; the API returns a response; the chatbot company forwards it to the visitor and bills its customer per message or per resolution. The chatbot company's margin is the spread between what the API provider charges it and what it charges you.
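To make the spread concrete, here is a minimal sketch of the per-message arithmetic. Every number and name in it is an illustrative assumption: the token counts, the per-million-token prices, and the price charged per message are placeholders, not any provider's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RentedMessage:
    """One chatbot exchange routed through a rented API (hypothetical shape)."""
    input_tokens: int   # visitor question plus retrieved context
    output_tokens: int  # tokens in the model's reply


def provider_cost(msg: RentedMessage,
                  input_price_per_m: float,
                  output_price_per_m: float) -> float:
    """Cost the API provider bills for one message, given per-million-token prices."""
    total = (msg.input_tokens * input_price_per_m
             + msg.output_tokens * output_price_per_m)
    return total / 1_000_000


def margin_per_message(msg: RentedMessage,
                       price_charged: float,
                       input_price_per_m: float,
                       output_price_per_m: float) -> float:
    """Spread between what the customer pays and what the provider bills."""
    return price_charged - provider_cost(msg, input_price_per_m, output_price_per_m)


# Placeholder numbers: 2,000 input tokens, 300 output tokens,
# $3 / $15 per million tokens, $0.05 charged per message.
example = RentedMessage(input_tokens=2_000, output_tokens=300)
example_cost = provider_cost(example, 3.0, 15.0)
example_margin = margin_per_message(example, 0.05, 3.0, 15.0)
```

Under these assumed numbers the provider bills roughly $0.0105 per message and the vendor keeps roughly $0.0395. If the provider raises either token price, the margin shrinks by exactly that amount unless the vendor reprices.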
It's a fine business if you can grow fast enough to dilute the underlying token cost across many customers. It's a fragile business if your provider changes pricing (they will), changes terms (they will), experiences a multi-hour outage (they will), or starts competing with you directly (eventually they do).
What "own" looks like
You run the inference yourself — open-weight models on infrastructure you operate. The marginal cost of one more chatbot response is the electricity to run a GPU for a few hundred milliseconds. Your bill stays flat as customers send more messages, until you hit a throughput ceiling and add more hardware — a step function, not a curve.
The trade: higher operational overhead, less access to frontier reasoning capability, more responsibility for capacity planning. The win: predictable economics that let you offer flat-rate pricing without praying that nobody goes viral.
The cost-curve difference
This is the part most pricing pages bury. With per-token billing, your AI cost scales linearly with success. A 10× traffic month is a 10× bill. With owned infrastructure, your cost is mostly fixed — it scales step-wise with capacity planning, not with usage.
If you're a customer evaluating chatbot SaaS, look at how the vendor responds to "what if my traffic spikes 10×?" If the answer is "your bill spikes 10× too" — they're renting. If the answer is "you stay on your tier" — they're owning the stack (or absorbing the loss).
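The two curves can be sketched in a few lines. This is a toy model under stated assumptions, not a pricing calculator: the cost per message, the per-node monthly capacity, and the per-node monthly cost are all invented for illustration.

```python
import math


def rented_cost(messages: int, cost_per_message: float) -> float:
    """Per-token billing, simplified to a flat cost per message: linear in usage."""
    return messages * cost_per_message


def owned_cost(messages: int, capacity_per_node: int, cost_per_node: int) -> int:
    """Owned hardware: pay per node, add a node only when capacity is exceeded."""
    nodes = max(1, math.ceil(messages / capacity_per_node))
    return nodes * cost_per_node


# Assumed figures: $0.01 per rented message; one GPU node serves
# 2,000,000 messages a month and costs $1,500 a month to run.
COST_PER_MESSAGE = 0.01
NODE_CAPACITY = 2_000_000
NODE_COST = 1_500

baseline = 100_000
spike = baseline * 10

rented_baseline = rented_cost(baseline, COST_PER_MESSAGE)
rented_spike = rented_cost(spike, COST_PER_MESSAGE)
owned_baseline = owned_cost(baseline, NODE_CAPACITY, NODE_COST)
owned_spike = owned_cost(spike, NODE_CAPACITY, NODE_COST)
```

With these made-up numbers, a jump from 100,000 to 1,000,000 messages multiplies the rented bill by ten while the owned bill stays at one node. The owned bill only steps up once volume passes the 2,000,000-message ceiling.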
Where renting wins
Frontier capability. Today's GPT-4o / Claude Sonnet / Gemini 2.5 outperform any 14B-class open-weight model on subtle reasoning. If your bot's job is "answer customer-support questions from docs," that gap doesn't matter — modern open-weight models handle it well. If your bot's job is "interpret a 50-page legal contract and flag risk," the frontier models still genuinely outperform. Rent when the marginal answer quality matters more than the marginal cost.
The hybrid most platforms land on
Hosted as the default for the 90% of use cases that don't need frontier capability. BYOK ("bring your own key") for the customers who do. Ashh.ai is structured this way: bots run on our hosted open-weight models by default, and customers can point any individual bot at their own Claude / OpenAI / DeepSeek key when the use case warrants the cost. Best of both — without us inserting ourselves between the customer and their LLM provider relationship.
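As a rough illustration of the hybrid pattern, per-bot routing can be as simple as "use the customer's key if one is configured, otherwise use the hosted model." The sketch below is a generic example, not Ashh.ai's actual implementation; `BotConfig`, `Route`, and `resolve_route` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BotConfig:
    """Per-bot settings (hypothetical shape)."""
    bot_id: str
    byok_provider: Optional[str] = None  # e.g. "anthropic", "openai", "deepseek"
    byok_api_key: Optional[str] = None   # the customer's own key, if supplied


@dataclass(frozen=True)
class Route:
    """Where a bot's inference requests should go."""
    backend: str             # "hosted" or the customer's chosen provider
    api_key: Optional[str]   # None for the hosted open-weight default


def resolve_route(bot: BotConfig) -> Route:
    """Send a bot to the customer's provider only when both provider and key are set."""
    if bot.byok_provider and bot.byok_api_key:
        return Route(backend=bot.byok_provider, api_key=bot.byok_api_key)
    # Default: the hosted open-weight model, with no third-party key involved.
    return Route(backend="hosted", api_key=None)


support_route = resolve_route(BotConfig(bot_id="support-bot"))
legal_route = resolve_route(
    BotConfig(bot_id="legal-bot", byok_provider="anthropic", byok_api_key="sk-placeholder")
)
```

Keeping the decision per bot rather than per account means a customer can pay frontier prices only for the one bot that needs them.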
What this means for procurement teams
If you're evaluating chatbot vendors and your CFO is wary, ask three questions:
- What does my bill look like if traffic grows 10×? Per-token vendors who answer this honestly are rare.
- Where is my customer data going during inference? Rented stacks route through third-party APIs; owned stacks don't.
- What happens if your LLM provider has an outage? Rented stacks go down with their provider; owned stacks have local fallbacks.
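For the outage question, it can help to picture what a fallback looks like in code. The following is a minimal sketch of an ordered fallback chain, assuming each backend is a plain function that raises a hypothetical `BackendUnavailable` error when it cannot serve. The stub backends stand in for real model calls.

```python
from typing import Callable, Optional, Sequence


class BackendUnavailable(Exception):
    """Raised by a backend that cannot serve right now (hypothetical error type)."""


def answer_with_fallback(question: str,
                         backends: Sequence[Callable[[str], str]]) -> str:
    """Try each backend in order and return the first successful answer."""
    last_error: Optional[BackendUnavailable] = None
    for backend in backends:
        try:
            return backend(question)
        except BackendUnavailable as exc:
            last_error = exc  # remember the failure, then try the next backend
    raise BackendUnavailable("all backends failed") from last_error


def remote_api_down(question: str) -> str:
    """Stand-in for a third-party API during an outage."""
    raise BackendUnavailable("provider outage")


def local_model(question: str) -> str:
    """Stand-in for a model served on the vendor's own hardware."""
    return "local: " + question


reply = answer_with_fallback("Where is my order?", [remote_api_down, local_model])
```

A vendor with only one entry in that list has no fallback at all, which is the situation the outage question is meant to surface.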
None of these is a perfect-vendor question. Each is a "what shape is the underlying business model?" question. The answers tell you which side of the rent-vs-own choice your prospective vendor made, and what trade-offs you're inheriting.