May 22, 2026 · BYOK · pricing

BYOK vs. hosted models: when each one makes sense

Bring-your-own-key (Claude, GPT-4o) vs. running open-weight models on our GPUs. Cost, privacy, speed, quality trade-offs.

Most chatbot platforms now offer a choice: use the platform's default model, or bring your own API key. The marketing makes it sound like a feature toggle. In practice, the choice has real cost, privacy, and quality consequences that often aren't obvious until you're in production.

What "hosted" means

At Ashh.ai, hosted models run on dedicated infrastructure we operate. The models themselves are open-weight, drawn from the Qwen, Gemma, Phi, and GPT-OSS families. You don't pay per token because we don't pay per token. Within your tier's message cap, usage costs nothing beyond your flat rate.

What "BYOK" means

You paste your API key (Anthropic, OpenAI, DeepSeek, Kimi) into the dashboard, and we route the bot's requests through that provider with your key. The provider bills you directly for token usage — we never see the bill, never mark it up. Your bot becomes a thin layer that hands prompts to a frontier model.

Use hosted when…

Cost predictability matters. A viral page sending 10,000 questions a day doesn't blow up your bill; your tier cap protects you. With BYOK Claude Sonnet 4.6, the same traffic could cost $200 or more in a single day, depending on how long the prompts and answers are.
Privacy is a procurement gate. Hosted means your data never touches a third-party LLM provider. For finance, legal, and healthcare buyers who need to pass a vendor risk assessment, this can be the difference between "approved" and "no, sorry."
The questions are FAQ-class. "What are your shipping policies?" doesn't need GPT-4o. A modern open-weight model answers it just as well, as far as the end user can tell.
Latency matters. Our hosted models typically respond in well under a second, while frontier cloud APIs often take 2 to 4 seconds. For a chatbot, faster usually feels smarter.
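Per-token billing scales linearly with traffic, which is where a figure like $200+ a day comes from. The sketch below is a back-of-envelope estimate, not a quote: the token counts (about 4,000 input tokens for system prompt, retrieved context, and history, plus about 600 output tokens per answer) and the $3 / $15 per-million-token prices are placeholder assumptions, so check your provider's current pricing page before relying on the numbers.

```python
"""Back-of-envelope BYOK cost estimate. All prices and token counts are placeholder assumptions."""

from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    # Hypothetical USD prices per million tokens; substitute your provider's real rates.
    input_per_million: float
    output_per_million: float


def cost_per_question(pricing: TokenPricing, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one bot answer: prompt tokens in, completion tokens out."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


def daily_cost(questions_per_day: int, per_question: float) -> float:
    """Per-token billing has no cap: spend grows linearly with traffic."""
    return questions_per_day * per_question


if __name__ == "__main__":
    # Assumption: ~4,000 input and ~600 output tokens per answer at $3 / $15 per million.
    pricing = TokenPricing(input_per_million=3.0, output_per_million=15.0)
    per_q = cost_per_question(pricing, input_tokens=4_000, output_tokens=600)
    for volume in (50, 10_000):
        print(f"{volume:>6,} questions/day -> ${daily_cost(volume, per_q):,.2f}/day")
```

Under these assumptions one answer costs about $0.021, so 10,000 questions land around $210 a day, while 50 questions come to roughly $1. Longer retrieved context pushes the input-token term up quickly, which is why RAG-heavy bots tend to cost more per answer than the question length alone suggests.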

Use BYOK when…

The bot has to reason across complex context: legal briefs, medical case notes, multi-step financial planning. Frontier models genuinely outperform 14B-parameter open-weight models here.
You already pay for the API. If your team already has a corporate OpenAI or Anthropic API account, the bot's usage simply adds to a bill and contract you already manage. (Consumer subscriptions like ChatGPT Plus or Claude Pro don't include API access, so they don't count here.)
You want a specific model's flavor. Claude is good at thoughtful prose. GPT-4o is good at structured outputs. DeepSeek-R1 reasons in visible steps. Specific tasks benefit from specific personalities.
The volume is low. A bot answering 50 questions a day on Claude Sonnet 4.6 costs roughly $1 a day in API fees. At that volume, the cost predictability of a flat rate isn't worth trading quality for.
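What counts as "low volume" depends on what you compare it against. A minimal sketch, assuming a hypothetical $50/month flat fee, a 30-day month, and the roughly $0.02 per question implied by the post's own figure ($1 for 50 questions a day): it finds the daily volume at which per-token spend catches up with the flat fee. The $50 figure is illustrative, not an Ashh.ai price, and the comparison ignores any platform fee you would pay either way.

```python
"""Break-even volume between per-token billing and a flat monthly fee (hypothetical numbers)."""

import math


def breakeven_questions_per_day(monthly_flat_fee: float, cost_per_question: float,
                                days_per_month: int = 30) -> int:
    """Smallest daily question count at which per-token spend reaches the flat fee.

    Any daily volume below the returned value costs less per token than the flat fee.
    """
    if cost_per_question <= 0:
        raise ValueError("cost_per_question must be positive")
    daily_budget = monthly_flat_fee / days_per_month
    return math.ceil(daily_budget / cost_per_question)


if __name__ == "__main__":
    # Assumptions: ~$0.02 per question and a hypothetical $50/month flat fee.
    threshold = breakeven_questions_per_day(monthly_flat_fee=50.0, cost_per_question=0.02)
    print(f"Per-token billing is cheaper below {threshold} questions/day")
```

With those inputs the threshold works out to 84 questions a day, so a 50-a-day bot sits comfortably on the per-token side. Swap in your real tier price and measured cost per question to find your own line.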

The hybrid pattern most customers land on

One Ashh account, multiple bots. The high-volume customer-support bot runs on our hosted open-weight model: fast, cheap, predictable. The lower-volume "ask our policy library" bot runs on BYOK Claude, which handles the harder questions better. You get the best of both on the same dashboard, paying one flat rate plus your existing API spend for the smart bot.
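The hybrid pattern amounts to a per-bot choice of backend, where only the BYOK bots add provider spend to the flat rate. The sketch below is hypothetical throughout: the `Bot` and `Backend` types, model labels, traffic numbers, the $49 flat rate, and the $0.02-per-question BYOK cost are illustrative assumptions, not Ashh.ai's configuration schema or pricing.

```python
"""Hypothetical per-bot routing for the hybrid pattern (not Ashh.ai's real config schema)."""

from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    HOSTED = "hosted"  # open-weight model on the platform's GPUs, covered by the flat rate
    BYOK = "byok"      # provider called with the customer's own key, billed per token


@dataclass(frozen=True)
class Bot:
    name: str
    backend: Backend
    model: str              # illustrative label only, not a real model identifier
    questions_per_day: int


def monthly_bill(bots: list[Bot], flat_rate: float, byok_cost_per_question: float,
                 days_per_month: int = 30) -> float:
    """One flat rate for the account, plus provider spend for BYOK bots only."""
    byok_questions = sum(b.questions_per_day for b in bots if b.backend is Backend.BYOK)
    return flat_rate + byok_questions * byok_cost_per_question * days_per_month


if __name__ == "__main__":
    bots = [
        Bot("customer-support", Backend.HOSTED, "open-weight-14b", questions_per_day=5_000),
        Bot("policy-library", Backend.BYOK, "claude-sonnet", questions_per_day=50),
    ]
    # Hypothetical $49 flat rate and ~$0.02 per BYOK question.
    print(f"Estimated monthly spend: ${monthly_bill(bots, 49.0, 0.02):,.2f}")
```

Hosted traffic never enters the estimate, which is the point of the flat rate: as long as the support bot stays within the tier cap, its 5,000 daily questions add nothing, while the policy bot's 50 add about $30 a month.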

What we don't do

We don't mark up your provider costs. If you BYOK GPT-4o, your OpenAI bill is exactly what it would be if you sent the same traffic to OpenAI directly. No "platform fee" on top of API costs, no hidden percentage. The hosted models are how we make money; BYOK is a feature we offer because not having it would be paternalistic.

Build a private AI chatbot in 5 minutes.

Flat-rate. Your data is never used to train anyone else's models.
