GPU Cloud

Rent a cloud GPU. Any model. By the hour.

Dedicated GPUs — from H100 down to the RTX 4090 — that you deploy any model on, billed per hour, with an OpenAI-compatible endpoint included.

Deploy in minutes

We provision the GPU and boot the runtime for you, so your endpoint is live shortly after you click deploy.

18 GPUs across 5 tiers

From flagship H200 and H100 cards down to value RTX 4090 and A5000 — pick the tier that fits the job and the budget.

Per-hour billing

You pay only while the GPU is running. Stop the deployment and billing stops — no idle cost, no per-token math.

Features

Built for serious GPU work

A dedicated GPU, not a shared slice

The card is yours for the life of the deployment. No noisy neighbours, no contention — predictable throughput and latency.

vLLM runtime out of the box

Models run on vLLM with continuous batching and paged attention, so you get high throughput without tuning a serving stack yourself.

OpenAI-compatible endpoint and API keys

Every deployment exposes an OpenAI-compatible API. Create and revoke API keys from the dashboard and point your existing client at it.

Bring your own model or container

Deploy from the catalog, push your own weights, or run a custom container image — all on the same per-hour GPU.

Use cases

A GPU for every workload

LLM inference

Serve open chat and instruct models behind a stable endpoint for production traffic.

Fine-tuning and training

Take a high-memory card by the hour to fine-tune or train, then release it when you are done.

Batch and offline jobs

Spin up a GPU for a batch run, process the queue, and shut it down — paying only for the hours used.

Image and video generation

Run diffusion and generative media workloads on a dedicated GPU sized to your model.

FAQ

Common questions

How is it billed?

Per hour while a deployment is running. You rent the GPU; the OpenAI-compatible endpoint is included. Stop the deployment and billing stops — there are no per-token or per-request fees.

Which GPUs are available?

A lineup of 18 GPUs across 5 tiers, from flagship H200 and H100 cards to value RTX 4090 and A5000. Pricing is indicative and varies by region and availability.

Can I bring my own container?

Yes. Deploy from the catalog, push your own model weights, or run a custom container image. It all runs on the same per-hour GPU pricing.

What runtime do you use?

Models run on vLLM by default, which gives you continuous batching and high throughput without configuring a serving stack yourself.

Is the endpoint OpenAI-compatible?

Yes. Each deployment exposes an OpenAI-compatible API, so existing OpenAI client code works by swapping the base URL and key.

Ready to deploy?

Launch an open-source model on Indonesian GPU infrastructure and get a live, OpenAI-compatible endpoint in under 5 minutes.