Docs · TEAM tier

Bring your own GPU

Ashh.ai hosts the dashboard, RAG index, and chat plumbing. You host the GPU. Bots routed through your endpoint never send a single token to a third-party AI provider: inference runs on your hardware.

Two ways to connect

Pick whichever matches your network. Both work the same from the bot's perspective.

Connector binary (recommended)

Outbound-only Go binary on the GPU box. No inbound ports, no DNS, no certs. Best for corporate firewalls, airgapped LANs, and anyone who doesn't want to operate a reverse proxy.

Manual reverse proxy

You front your Ollama with TLS + a bearer token. Ashh.ai calls your URL with the saved auth header. Best when you already operate a reverse proxy and want one less moving piece.

Connector binary — quick start

In Clients → BYO GPU → Add, pick Connector binary and click Generate pairing token. The dashboard hands you a one-liner with the token pre-filled.
Paste it on your GPU box. It looks like:
```
curl -fsSL https://ashh.ai/install.sh | sudo sh -s -- --pair lc_live_…
```
The installer auto-detects your OS + CPU (Linux x86_64, Linux arm64, macOS Intel, macOS Apple Silicon), downloads the right binary, pairs, and on Linux installs an ashh-connector systemd service that auto-starts on boot.
Back in the dashboard, the endpoint flips to healthy within a few seconds. Models are auto-discovered from your local Ollama.
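
If the endpoint does not turn healthy, checking the service on the GPU box is a reasonable first step. The sketch below assumes a Linux host where the installer registered the ashh-connector systemd unit. The function name connector_health is illustrative and not part of the connector.

```shell
# Report whether the ashh-connector systemd unit is active.
# On failure, print its most recent log lines and return non-zero.
connector_health() {
  if systemctl is-active --quiet ashh-connector; then
    echo "ashh-connector is running"
  else
    echo "ashh-connector is not running; recent logs:"
    journalctl -u ashh-connector -n 20 --no-pager
    return 1
  fi
}
```

Run connector_health after pairing. Reading the journal may require sudo, depending on how the host is configured.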

Don't want to pipe to sh?

Download the binary directly from /downloads/ashh-connector-<os>-<arch>, chmod +x it, then run ./ashh-connector --pair lc_live_… followed by ./ashh-connector. Same outcome, more steps. See connector/README.md for the systemd unit.
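
To work out which file to download, a small helper can map uname output to an <os>-<arch> suffix. This is a sketch only: the suffix spellings (linux/darwin, amd64/arm64) and the function name connector_target are assumptions, so check the actual file names under /downloads before relying on them.

```shell
# Map `uname -s` / `uname -m` values to an <os>-<arch> suffix.
# The spellings below are assumed, not confirmed by Ashh.ai.
# $1 = kernel name (uname -s), $2 = machine type (uname -m)
connector_target() {
  case "$1" in
    Linux)  ct_os=linux ;;
    Darwin) ct_os=darwin ;;
    *) echo "unsupported OS: $1" >&2; return 1 ;;
  esac
  case "$2" in
    x86_64|amd64)  ct_arch=amd64 ;;
    aarch64|arm64) ct_arch=arm64 ;;
    *) echo "unsupported CPU: $2" >&2; return 1 ;;
  esac
  echo "${ct_os}-${ct_arch}"
}

# Print the file name this machine would need under those assumptions.
echo "ashh-connector-$(connector_target "$(uname -s)" "$(uname -m)")"
```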

Uninstall

curl -fsSL https://ashh.ai/install.sh | sudo sh -s -- --uninstall

Manual reverse proxy — three recipes

Pick whichever you're already comfortable operating. All three work identically — Ashh.ai calls https://your-url/api/chat with your auth header on every request.

A · Cloudflare Tunnel

Free, requires only a Cloudflare account + a domain on Cloudflare. Outbound-only from your GPU box; Cloudflare provides the public HTTPS endpoint.

Install cloudflared on your GPU box, then run cloudflared tunnel login.
Create a tunnel: cloudflared tunnel create ashh-gpu
Map a hostname: cloudflared tunnel route dns ashh-gpu gpu.yourcompany.com

Config ~/.cloudflared/config.yml:

tunnel: ashh-gpu
credentials-file: /home/you/.cloudflared/<id>.json
ingress:
  - hostname: gpu.yourcompany.com
    service: http://localhost:11434
    originRequest:
      httpHostHeader: localhost:11434
  - service: http_status:404

Add a Cloudflare Access policy (Service Auth) requiring a header token: in the dashboard go to Zero Trust → Access → Applications, gate gpu.yourcompany.com with a service-token policy. Cloudflare gives you a CF-Access-Client-Id and CF-Access-Client-Secret pair.
In Ashh.ai → Manual reverse proxy, set Base URL to https://gpu.yourcompany.com and Auth header to CF-Access-Client-Id: <id>\nCF-Access-Client-Secret: <secret> (the Auth header field accepts only one header today; use a small Caddy or Cloudflare Worker if you need both).

B · Tailscale Funnel

Simplest if you already use Tailscale. Funnel makes a tailnet service public over HTTPS without opening ports. No DNS to manage.

Install Tailscale on the GPU box and run tailscale up.

Front Ollama with a tiny Caddy that adds a bearer-token check (Tailscale Funnel is HTTPS-only but doesn't add auth):

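# /etc/caddy/Caddyfile
:9080 {
  # Only requests carrying the expected bearer token reach Ollama
  @authed header Authorization "Bearer your-secret-token"
  handle @authed {
    reverse_proxy localhost:11434 {
      # Ollama bound to localhost can reject unfamiliar Host headers
      header_up Host localhost:11434
    }
  }
  handle {
    respond "Unauthorized" 401
  }
}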

Expose Caddy via Funnel: tailscale funnel --bg --https=443 9080
In Ashh.ai → Manual reverse proxy, set Base URL to https://gpu-box.tail-XXXX.ts.net and Auth header to Authorization: Bearer your-secret-token.

C · Caddy + DNS (your own domain)

The option with the most control. You own a domain and point an A record at your box (ports 80 and 443 must be reachable from the internet); Caddy auto-provisions a Let's Encrypt certificate and proxies to Ollama.

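# Caddyfile
gpu.yourcompany.com {
  # Only requests carrying the expected bearer token reach Ollama
  @authed header Authorization "Bearer your-secret-token"
  handle @authed {
    reverse_proxy localhost:11434 {
      # Ollama bound to localhost can reject unfamiliar Host headers
      header_up Host localhost:11434
    }
  }
  handle {
    respond "Unauthorized" 401
  }
}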

Then in Ashh.ai: Base URL https://gpu.yourcompany.com, Auth header Authorization: Bearer your-secret-token.
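
Before saving the endpoint, it can help to test the proxy from a machine outside your network. The sketch below wraps two curl calls: one that reports only the HTTP status, and one that sends a single non-streaming request to /api/chat, roughly as Ashh.ai would. The URL, token, and model name are placeholders, and the function names are illustrative.

```shell
# Placeholders: substitute your own URL, auth header, and a model you have pulled.
BASE_URL="${BASE_URL:-https://gpu.yourcompany.com}"
AUTH_HEADER="${AUTH_HEADER:-Authorization: Bearer your-secret-token}"
MODEL="${MODEL:-llama3}"

# Print only the HTTP status code for a GET against the proxy.
# $1 = path; any further arguments are passed straight to curl.
proxy_status() {
  ps_path=$1
  shift
  curl -s -o /dev/null -w '%{http_code}\n' --max-time 10 "$@" "${BASE_URL}${ps_path}"
}

# Send one non-streaming chat request through the proxy.
proxy_chat() {
  curl -sS --max-time 60 \
    -H "$AUTH_HEADER" \
    -H 'Content-Type: application/json' \
    -d "{\"model\":\"$MODEL\",\"messages\":[{\"role\":\"user\",\"content\":\"ping\"}],\"stream\":false}" \
    "${BASE_URL}/api/chat"
}
```

With recipes B and C, proxy_status /api/tags should report 401, while proxy_status /api/tags -H "$AUTH_HEADER" should report 200 once requests reach Ollama. proxy_chat should then return a JSON reply from the model.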

Pointing a bot at your GPU

Open the bot's Edit page → Model & RAG tab.
The model picker now has a "Your GPU — <endpoint name>" optgroup. Pick a model from there.
Save the bot. Inference now routes through your endpoint instead of the platform pool.

If your endpoint goes offline (connector crash, network down, reverse proxy 5xx), the bot falls back to the platform Ollama rather than 500-ing the chat — so visitors see some reply instead of an error. You'll see a yellow "stale" or red "error" pill on the endpoint detail page.

Security model

Encryption in transit: TLS on every hop. The connector uses HTTPS only; manual mode requires you to front your Ollama with TLS.
Auth at rest: Manual auth headers and connector tokens are encrypted with AES-256-GCM in our database.
Token minting: Connector pairing tokens are shown ONCE at creation. We persist only the SHA-256 hash.
Conversation persistence: Bot conversations are stored in our database so the dashboard can render history. If you don't want this, periodically delete conversations from /bots/[id]. Per-bot retention policies are coming soon.
Outbound only: Connector mode is fully outbound and needs no inbound ports. In manual mode, Cloudflare Tunnel and Tailscale Funnel also connect outbound from the GPU box, while Caddy + DNS needs inbound ports 80 and 443. Every manual recipe publishes an HTTPS endpoint reachable from the internet, so the auth header (or Cloudflare Access policy) is what keeps Ollama private.
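
The token-minting rule above means a lost pairing token cannot be recovered, only replaced. As a rough illustration of hash-only storage (not Ashh.ai's actual implementation), a server can keep the SHA-256 digest and compare digests when a token is presented. The function names and the sample token are hypothetical.

```shell
# Print the hex SHA-256 digest of a token.
# Uses sha256sum where available, otherwise shasum (as shipped on macOS).
token_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
  else
    printf '%s' "$1" | shasum -a 256 | cut -d' ' -f1
  fi
}

# Succeed when the presented token ($1) hashes to the stored digest ($2).
token_matches() {
  [ "$(token_sha256 "$1")" = "$2" ]
}

stored=$(token_sha256 "example-token")   # the only value a database would hold
token_matches "example-token" "$stored" && echo "paired"
token_matches "wrong-token" "$stored" || echo "rejected"
```

Because only the digest is kept, the stored value cannot be used to reconstruct the token.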