Deploy AI models in minutes.

OpenAI-compatible endpoints on Indonesian GPU infrastructure. Deploy Qwen, DeepSeek, Llama, or Gemma — live in under 5 minutes, billed by the hour.

vLLM runtime · OpenAI API · per-hour billing

< 5 min

From catalog to live endpoint

OpenAI

Drop-in compatible API

per-hour

Billed only for what you run

vLLM

Production inference runtime

Everything to ship inference

From model to production endpoint

A focused path from an open-source model to a live, billable API — without standing up GPUs, runtimes, or gateways yourself.

One-click model catalog, OpenAI-compatible API, per-hour GPU rental, and bring-your-own-model support.

Drop-in compatible

Already speak OpenAI? Just change the base URL.

Works with the OpenAI SDK: swap the base URL and keep the rest of your code.

curl https://api.nusapod.io/v1/chat/completions \
  -H "Authorization: Bearer $NUSAPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen2.5-7b-instruct","messages":[{"role":"user","content":"Hello"}]}'
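The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the base URL, model name, and `NUSAPOD_API_KEY` variable come from the curl example above, while the helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the endpoint returns the usual OpenAI chat-completions shape, which this page does not show.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.nusapod.io/v1"  # base URL taken from the curl example


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /chat/completions."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(prompt: str, model: str = "qwen2.5-7b-instruct") -> str:
    """Send one user message and return the reply text.

    Assumption: the response follows the OpenAI chat-completions schema
    (choices[0].message.content).
    """
    request = build_chat_request(os.environ["NUSAPOD_API_KEY"], model, prompt)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With `NUSAPOD_API_KEY` exported, `print(chat("Hello"))` should print the model's reply. If you already use the official `openai` Python package, passing this base URL as `base_url` when constructing the client should be the only change needed, assuming the compatibility claim holds for the calls you make.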

Per-hour GPU rental

Rent the GPU you need

From flagship H100s to value-tier 4090s — pick the card that fits your model and your budget, and pay only for the hours you run.

See all 18 GPUs & pricing

H100

80 GB VRAM

HBM3 · 3.35 TB/s
Flagship throughput

from $2.79/hr

A100

80 GB VRAM

HBM2e · 2.0 TB/s
Proven workhorse

from $1.89/hr

L40S

48 GB VRAM

GDDR6 · 864 GB/s
Cost-efficient inference

from $0.99/hr

RTX 4090

24 GB VRAM

GDDR6X · 1.0 TB/s
Best price per token

from $0.59/hr

Indicative pricing — final rates vary by region.
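To get a feel for what per-hour billing adds up to, the rough estimate below multiplies the indicative "from" rates listed above by the hours a deployment runs. It is a sketch only: the function name `estimate_cost` is hypothetical, the rates are the indicative figures from this page rather than final regional prices, and how partial hours are billed is not stated here, so no rounding of hours is applied.

```python
# Indicative "from" rates in USD per GPU-hour, copied from the GPU cards.
HOURLY_RATE_USD = {
    "H100": 2.79,
    "A100": 1.89,
    "L40S": 0.99,
    "RTX 4090": 0.59,
}


def estimate_cost(gpu: str, hours: float, gpu_count: int = 1) -> float:
    """Return an estimated cost in USD, rounded to cents.

    Assumption: cost scales linearly with hours and GPU count; the page
    does not say how fractional hours are billed.
    """
    if hours < 0 or gpu_count < 1:
        raise ValueError("hours must be >= 0 and gpu_count must be >= 1")
    return round(HOURLY_RATE_USD[gpu] * hours * gpu_count, 2)
```

For example, `estimate_cost("H100", 10)` gives 27.9, and an RTX 4090 left running for 30 days (720 hours) comes to `estimate_cost("RTX 4090", 720)`, which is 424.8.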

How it works

From catalog to live endpoint in three steps

Model catalog

Curated, ready to deploy

Pick a model and it runs on vLLM behind an OpenAI-compatible endpoint in minutes.

deepseek-r1-671b, deepseek-v3-671b, llama-3.1-405b-instruct, llama-4-maverick-400b, qwen3-235b-a22b, mixtral-8x22b-instruct, dbrx-instruct-132b, command-r-plus-104b, qwen2.5-72b-instruct, llama-3.3-70b-instruct, qwen2.5-coder-32b, gemma-2-27b-it, mistral-7b-instruct, llama-3.1-8b-instruct, qwen2.5-7b-instruct, phi-4, + bring your own
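The catalog spans models from a few billion to several hundred billion parameters, so it helps to estimate which card a model fits on. The sketch below uses a common rule of thumb: weights take roughly parameters × bytes per parameter (about 2 GB per billion parameters at 16-bit precision). The function names and the 1.2 overhead factor for KV cache and runtime buffers are illustrative assumptions, not platform behavior; real headroom depends on context length and batch size, and this page does not describe multi-GPU or quantized deployments.

```python
from typing import Optional

# (VRAM in GB, indicative USD per hour), taken from the GPU cards on this page.
GPUS = {
    "H100": (80, 2.79),
    "A100": (80, 1.89),
    "L40S": (48, 0.99),
    "RTX 4090": (24, 0.59),
}


def weights_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Memory for the weights alone: 1e9 params x N bytes = N GB per billion."""
    return params_billion * bytes_per_param


def cheapest_single_gpu(
    params_billion: float,
    bytes_per_param: float = 2.0,
    overhead: float = 1.2,  # assumed headroom for KV cache and buffers
) -> Optional[str]:
    """Return the cheapest listed GPU whose VRAM covers the estimate, or None."""
    needed = weights_gb(params_billion, bytes_per_param) * overhead
    fitting = [(price, name) for name, (vram, price) in GPUS.items() if vram >= needed]
    return min(fitting)[1] if fitting else None
```

Under these assumptions, a 7B model at 16-bit precision needs about 14 GB for weights and lands on the RTX 4090, a 32B model needs an 80 GB card, and a 70B model returns `None` because it does not fit on any single listed GPU at that precision.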

Ready to deploy?

Launch an open-source model on Indonesian GPU infrastructure and get a live, OpenAI-compatible endpoint in under 5 minutes.