Model Hub

Deploy open models in one click.

A curated catalog of open models — Qwen, DeepSeek, Llama, Gemma and more — running on vLLM behind an OpenAI-compatible endpoint.

Curated catalog

A hand-picked set of popular open models, configured and ready — no hunting for weights or wiring up a serving stack.

Frontier to efficient

From frontier-scale deepseek-r1-671b down to the lean qwen2.5-7b, pick the size that fits your task and your GPU budget.

Bring your own

Not in the catalog? Push your own model weights or a custom container and deploy it the same way.

Features

From catalog to endpoint

One-click deploy

Choose a model, pick a GPU, and deploy. We provision the card and boot the runtime so your endpoint comes up shortly after.

vLLM runtime

Catalog models run on vLLM with continuous batching, giving you strong throughput without tuning a serving stack.

OpenAI-compatible endpoint

Every deployment exposes an OpenAI-compatible API, so your existing client works by swapping the base URL and key.

Per-hour GPU pricing

Catalog or custom, the billing is the same: you rent the GPU by the hour, with no per-token fees.

Use cases

Models for every job

Chat and instruct

General-purpose conversation and instruction following with Qwen, Llama, and Gemma.

Reasoning

Step-by-step problem solving with reasoning models like DeepSeek-R1.

Coding

Code generation and completion with models such as Qwen Coder.

Your own fine-tunes

Deploy your own weights or container alongside the catalog, on the same runtime and pricing.

FAQ

Common questions

Which models are in the catalog?

Popular open models across families — Qwen (including Qwen Coder), DeepSeek (including DeepSeek-R1), Llama, Gemma and more, spanning sizes from a few billion parameters up to frontier-scale.

Can I bring my own model?

Yes. Push your own model weights or a custom container and deploy it the same way as a catalog model, on the same runtime and per-hour pricing.

What is the runtime?

Catalog models run on vLLM, which provides continuous batching and high throughput behind an OpenAI-compatible endpoint.

How is it billed?

Per hour while the deployment runs. You rent the GPU; there are no per-token fees, whether you deploy from the catalog or bring your own.

How fast is deploy?

Most deployments come up minutes after you click deploy — we provision the GPU and boot the runtime for you.

Ready to deploy?

Launch an open-source model on Indonesian GPU infrastructure and get a live, OpenAI-compatible endpoint in under 5 minutes.