Serverless

OpenAI-compatible inference, served.

Point your existing OpenAI code at a Nusapod endpoint. It runs your chosen model on a GPU you rent by the hour — no server to manage.

Drop-in OpenAI API

Keep your OpenAI client and request shape. Swap the base URL and key, and your existing code talks to your model.

Your model, your GPU

Requests run on a dedicated GPU you control, so behaviour and latency are predictable instead of shared and noisy.

Per-hour, not per-token

You rent the GPU by the hour; the throughput it produces is yours. There is no per-token bill to forecast.

Features

Ship an endpoint, not infrastructure

/v1/chat/completions

The familiar chat completions endpoint, served from your deployment. Streaming and standard parameters work as you expect.

API keys you control

Create and revoke keys from the dashboard. Scope access to your endpoint without touching infrastructure.

Live logs over SSE

Stream your deployment logs in real time to watch boot, requests, and errors as they happen.

Per-hour GPU billing

Billing tracks the GPU, not the request. Stop the deployment and the meter stops — no per-call accounting.

Use cases

What you can build

Chatbots and assistants

Back a conversational product with a stable, OpenAI-compatible endpoint.

RAG pipelines

Use the chat completions API as the generation step in a retrieval-augmented stack.

AI agents

Drive tool-using agents against an endpoint you control, at predictable per-hour cost.

Internal tools

Add inference to internal apps without standing up your own serving infrastructure.

FAQ

Common questions

Is it really OpenAI-compatible?

Yes. Your deployment exposes an OpenAI-compatible API, including /v1/chat/completions. Existing OpenAI client code works by changing the base URL and key.

Do you charge per token?

No. To be plain: "serverless" here means you do not manage the server — but billing is per-hour GPU rental, not per request or per token. You rent the GPU; the throughput is yours.

How do API keys work?

Create and revoke API keys from the dashboard, then pass one as the bearer token your OpenAI client already sends. Revoking a key cuts off access immediately.

Can I see logs?

Yes. Deployment logs stream live over SSE, so you can watch boot, requests, and errors in real time from the dashboard.

What models can I run?

Open models from the catalog — Qwen, DeepSeek, Llama, Gemma and more — or your own weights or container. They run on vLLM behind the endpoint.

Ready to deploy?

Launch an open-source model on Indonesian GPU infrastructure and get a live, OpenAI-compatible endpoint in under 5 minutes.