GPU endpoints

Docs · TEAM tier

Bring your own GPU

Ashh.ai hosts the dashboard, RAG index, and chat plumbing. You host the GPU. Bots routed through your endpoint never send a single token to a third-party AI provider — your data stays on your hardware.

Two ways to connect

Pick whichever matches your network. Both work the same from the bot's perspective.

Connector binary (recommended)connector

Outbound-only Go binary on the GPU box. No inbound ports, no DNS, no certs. Best for corporate firewalls, airgapped LANs, and anyone who doesn't want to operate a reverse proxy.

Manual reverse proxymanual

You front your Ollama with TLS + a bearer token. Ashh.ai calls your URL with the saved auth header. Best when you already operate a reverse proxy and want one less moving piece.

Connector binary — quick start

  1. In Clients → BYO GPU → Add, pick Connector binary and click Generate pairing token. The dashboard hands you a one-liner with the token pre-filled.
  2. Paste it on your GPU box. It looks like:
    curl -fsSL https://ashh.ai/install.sh | sudo sh -s -- --pair lc_live_…
    The installer auto-detects your OS + CPU (Linux x86_64, Linux arm64, macOS Intel, macOS Apple Silicon), downloads the right binary, pairs, and on Linux installs a ashh-connector systemd service that auto-starts on boot.
  3. Back in the dashboard, the endpoint flips to healthy within a few seconds. Models are auto-discovered from your local Ollama.

Don't want to pipe to sh?

Download the binary directly from /downloads/ashh-connector-<os>-<arch>,chmod +x it, then run ./ashh-connector --pair lc_live_… followed by ./ashh-connector. Same outcome, more steps. See connector/README.md for the systemd unit.

Uninstall

curl -fsSL https://ashh.ai/install.sh | sudo sh -s -- --uninstall

Manual reverse proxy — three recipes

Pick whichever you're already comfortable operating. All three work identically — Ashh.ai calls https://your-url/api/chat with your auth header on every request.

A · Cloudflare Tunnel

Free, requires only a Cloudflare account + a domain on Cloudflare. Outbound-only from your GPU box; Cloudflare provides the public HTTPS endpoint.

  1. Install cloudflared on your GPU box, run cloudflared tunnel login.
  2. Create a tunnel: cloudflared tunnel create ashh-gpu
  3. Map a hostname: cloudflared tunnel route dns ashh-gpu gpu.yourcompany.com
  4. Config ~/.cloudflared/config.yml:
    tunnel: ashh-gpu
    credentials-file: /home/you/.cloudflared/<id>.json
    ingress:
      - hostname: gpu.yourcompany.com
        service: http://localhost:11434
        originRequest:
          httpHostHeader: localhost:11434
      - service: http_status:404
  5. Add a Cloudflare Access policy (Service Auth) requiring a header token: in the dashboard go to Zero Trust → Access → Applications, gate gpu.yourcompany.com with a service-token policy. Cloudflare gives you a CF-Access-Client-Id and CF-Access-Client-Secret pair.
  6. In Ashh.ai → Manual reverse proxy, set Base URL to https://gpu.yourcompany.com and Auth header to CF-Access-Client-Id: <id>\nCF-Access-Client-Secret: <secret>(only one header per Ashh.ai field today — use a small Caddy or Cloudflare Worker if you need both).

B · Tailscale Funnel

Simplest if you already use Tailscale. Funnel makes a tailnet service public over HTTPS without opening ports. No DNS to manage.

  1. Install Tailscale on the GPU box, tailscale up.
  2. Front Ollama with a tiny Caddy that adds a bearer-token check (Tailscale Funnel is HTTPS-only but doesn't add auth):
    # /etc/caddy/Caddyfile
    :9080 {
      @authed header Authorization "Bearer your-secret-token"
      handle @authed {
        reverse_proxy localhost:11434
      }
      handle {
        respond "Unauthorized" 401
      }
    }
  3. Expose Caddy via Funnel: tailscale funnel --bg --https=443 9080
  4. In Ashh.ai → Manual reverse proxy, set Base URL to https://gpu-box.tail-XXXX.ts.net and Auth header to Authorization: Bearer your-secret-token.

C · Caddy + DNS (your own domain)

Most full-control option. You own a domain, point an A record at your box, Caddy auto-provisions Let's Encrypt and proxies to Ollama.

# Caddyfile
gpu.yourcompany.com {
  @authed header Authorization "Bearer your-secret-token"
  handle @authed {
    reverse_proxy localhost:11434
  }
  handle {
    respond "Unauthorized" 401
  }
}

Then in Ashh.ai: Base URL https://gpu.yourcompany.com, Auth header Authorization: Bearer your-secret-token.

Pointing a bot at your GPU

  1. Open the bot's Edit page → Model & RAG tab.
  2. The model picker now has a "Your GPU — <endpoint name>" optgroup. Pick a model from there.
  3. Save the bot. Inference now routes through your endpoint instead of the platform pool.

If your endpoint goes offline (connector crash, network down, reverse proxy 5xx), the bot falls back to the platform Ollama rather than 500-ing the chat — so visitors see some reply instead of an error. You'll see a yellow "stale" or red "error" pill on the endpoint detail page.

Security model