Deploy AI models in minutes.

OpenAI-compatible endpoints on Indonesian GPU infrastructure. Deploy Qwen, DeepSeek, Llama, or Gemma — live in under 5 minutes, billed by the hour.

vLLM runtime · OpenAI API · per-hour billing

  • < 5 min: from catalog to live endpoint
  • OpenAI: drop-in compatible API
  • per-hour: billed only for what you run
  • vLLM: production inference runtime

Everything to ship inference

From model to production endpoint

A focused path from an open-source model to a live, billable API — without standing up GPUs, runtimes, or gateways yourself.

Includes a one-click model catalog, an OpenAI-compatible API, per-hour GPU rental, and bring-your-own-model support.
Drop-in compatible

Already speak OpenAI? Just change the base URL.

Drop-in compatible with the OpenAI SDK. Swap the base URL, keep your code.

curl https://api.nusapod.io/v1/chat/completions \
  -H "Authorization: Bearer $NUSAPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen2.5-7b-instruct","messages":[{"role":"user","content":"Hello"}]}'
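
The same request can be sketched with the OpenAI Python SDK. This is a minimal sketch, assuming the `openai` package (v1 or later) is installed and that the SDK base URL is the curl URL above without the `/chat/completions` suffix. The helper names `client_kwargs` and `ask` are illustrative and not part of any NusaPod library.

```python
import os

# Assumption: the base URL is the curl endpoint above minus "/chat/completions".
NUSAPOD_BASE_URL = "https://api.nusapod.io/v1"


def client_kwargs(env=None):
    """Return the constructor arguments that point the OpenAI SDK at NusaPod."""
    env = os.environ if env is None else env
    return {"base_url": NUSAPOD_BASE_URL, "api_key": env["NUSAPOD_API_KEY"]}


def ask(prompt, model="qwen2.5-7b-instruct"):
    """Send one user message and return the assistant's reply text."""
    # Imported here so the module loads even where the SDK is not installed.
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI(**client_kwargs())
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

With `NUSAPOD_API_KEY` exported, `print(ask("Hello"))` mirrors the curl call. Only `base_url` and `api_key` differ from code written against OpenAI itself.
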
Per-hour GPU rental

Rent the GPU you need

From flagship H100s to value-tier 4090s — pick the card that fits your model and your budget, and pay only for the hours you run.

H100
80 GB VRAM
  • HBM3 · 3.35 TB/s
  • Flagship throughput
from $2.79/hr
A100
80 GB VRAM
  • HBM2e · 2.0 TB/s
  • Proven workhorse
from $1.89/hr
L40S
48 GB VRAM
  • GDDR6 · 864 GB/s
  • Cost-efficient inference
from $0.99/hr
RTX 4090
24 GB VRAM
  • GDDR6X · 1.0 TB/s
  • Best price per token
from $0.59/hr

Indicative pricing — final rates vary by region.
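
Matching a card to a model and a budget comes down to simple arithmetic, sketched below. The VRAM sizes and hourly rates are the indicative figures from the cards above. Everything else is an assumption for illustration: weights are taken as 2 bytes per parameter (FP16/BF16), a 1.2x headroom factor stands in for KV cache and runtime overhead, only single-GPU deployments are considered, and hours are whole billed hours. The function names are hypothetical.

```python
from decimal import Decimal

# (name, VRAM in GB, indicative "from" rate in USD per hour), copied from the cards above.
CARDS = [
    ("H100", 80, Decimal("2.79")),
    ("A100", 80, Decimal("1.89")),
    ("L40S", 48, Decimal("0.99")),
    ("RTX 4090", 24, Decimal("0.59")),
]


def cheapest_fit(params_billion, bytes_per_param=2, headroom=1.2):
    """Return the cheapest single card whose VRAM covers the estimated need, or None.

    Assumption: memory in GB is roughly params (in billions) x bytes per
    parameter, scaled by a rule-of-thumb headroom factor.
    """
    needed_gb = params_billion * bytes_per_param * headroom
    fitting = [card for card in CARDS if card[1] >= needed_gb]
    return min(fitting, key=lambda card: card[2]) if fitting else None


def rental_cost(card_name, hours):
    """Return the indicative cost in USD for a whole number of billed hours."""
    rates = {name: rate for name, _, rate in CARDS}
    return rates[card_name] * hours
```

Under these assumptions a 7B model lands on the RTX 4090, a 32B model needs an 80 GB card, and a 72B model at 16-bit precision does not fit on any single card listed. Actual rates vary by region, as noted above.
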

How it works

From catalog to live endpoint in three steps

Model catalog

Curated, ready to deploy

Pick a model and it runs on vLLM behind an OpenAI-compatible endpoint in minutes.

deepseek-r1-671b · deepseek-v3-671b · llama-3.1-405b-instruct · llama-4-maverick-400b · qwen3-235b-a22b · mixtral-8x22b-instruct · dbrx-instruct-132b · command-r-plus-104b · qwen2.5-72b-instruct · llama-3.3-70b-instruct · qwen2.5-coder-32b · gemma-2-27b-it · mistral-7b-instruct · llama-3.1-8b-instruct · qwen2.5-7b-instruct · phi-4 · + bring your own

Ready to deploy?

Launch an open-source model on Indonesian GPU infrastructure and get a live, OpenAI-compatible endpoint in under 5 minutes.