Supported Models

Access the latest open-source models through a single API. Text generation, code, multimodal, speech, and embeddings — all served on vLLM v0.17.1.

Showing 29 models

DeepSeek V4 Pro

DeepSeek

Featured

Advanced reasoning, coding, and long-horizon agent workflows

text code
Parameters

MoE

Context

128K

Input

$1.75/1M

Output

$3.50/1M

GLM-5.2

ZAI

Featured

ZAI's latest flagship with strong bilingual reasoning, long-context understanding, and tool use

text code
Parameters

MoE

Context

128K

Input

$1.40/1M

Output

$4.40/1M

Kimi K2.6

Moonshot

Featured

Native multimodal agentic model with long-context capabilities

text code
Parameters

1T MoE

Context

256K

Input

$0.95/1M

Output

$4.00/1M

Qwen3.5 397B A17B

Qwen

Featured

Alibaba's largest Qwen3.5 MoE model for complex reasoning

text code
Parameters

397B MoE

Context

256K

Input

$0.60/1M

Output

$3.60/1M

gpt-oss 120B

OpenAI

Featured

OpenAI open-weight 120B model with transparent weights

text code
Parameters

120B MoE

Context

128K

Input

$0.15/1M

Output

$0.60/1M

Llama 3.3 70B

Meta

Featured

Meta's flagship 70B instruction-following model

text
Parameters

70B

Context

128K

Input

$0.13/1M

Output

$0.40/1M

DeepSeek V3.2

DeepSeek

Strong coding and reasoning performance at low cost

text code
Parameters

671B MoE

Context

128K

Input

$0.30/1M

Output

$0.45/1M

GLM-5.1

ZAI

Multimodal flagship with advanced reasoning and tool use

text code
Parameters

MoE

Context

128K

Input

$1.40/1M

Output

$4.40/1M

GLM-5

ZAI

text code
Parameters

MoE

Context

128K

Input

$1.00/1M

Output

$3.20/1M

Kimi K2.5

Moonshot

Strong long-context and reasoning capabilities

text code
Parameters

1T MoE

Context

256K

Input

$0.50/1M

Output

$2.50/1M

Qwen3 235B A22B Instruct

Qwen

High-quality reasoning and instruction following

text code
Parameters

235B MoE

Context

256K

Input

$0.20/1M

Output

$0.60/1M

Qwen3 235B Thinking

Qwen

Thinking/reasoning variant of Qwen3 235B

text
Parameters

235B MoE

Context

256K

Input

$0.50/1M

Output

$2.00/1M

Qwen3 Next 80B A3B Thinking

Qwen

text
Parameters

80B MoE

Context

256K

Input

$0.15/1M

Output

$1.20/1M

Qwen3 32B

Qwen

Compact model balancing quality and speed

text code
Parameters

32B

Context

128K

Input

$0.10/1M

Output

$0.30/1M

Qwen3 30B A3B

Qwen

Efficient MoE model for instruction following

text
Parameters

30B MoE

Context

256K

Input

$0.10/1M

Output

$0.30/1M

MiniMax M2.5

MiniMax

text code
Parameters

MoE

Context

256K

Input

$0.30/1M

Output

$1.20/1M

Gemma 3 27B

Google

Google's Gemma 3 instruction-tuned model

text
Parameters

27B

Context

128K

Input

$0.10/1M

Output

$0.30/1M

Hermes 4 405B

NousResearch

Powerful instruction-following model with long-context capabilities

text
Parameters

405B

Context

128K

Input

$1.00/1M

Output

$3.00/1M

Hermes 4 70B

NousResearch

Highly capable model fine-tuned for multi-turn conversations

text
Parameters

70B

Context

128K

Input

$0.13/1M

Output

$0.40/1M

INTELLECT-3

PrimeIntellect

Third-generation model trained via decentralized compute

text
Parameters

MoE

Context

128K

Input

$0.20/1M

Output

$1.10/1M

Nemotron 3 Ultra 550B

NVIDIA

Massive MoE model for demanding reasoning and agentic workloads

text
Parameters

550B MoE

Context

128K

Input

$1.00/1M

Output

$3.00/1M

Llama 3.1 Nemotron Ultra 253B

NVIDIA

text
Parameters

253B

Context

128K

Input

$0.60/1M

Output

$1.80/1M

Nemotron 3 Super 120B A12B

NVIDIA

Hybrid MoE model optimized for efficient multi-agent AI

text
Parameters

120B MoE

Context

128K

Input

$0.30/1M

Output

$0.90/1M

Nemotron 3 Nano 30B A3B

NVIDIA

text
Parameters

30B MoE

Context

128K

Input

$0.06/1M

Output

$0.24/1M

Nemotron 3 Nano Omni

NVIDIA

Open, efficient omni-modal reasoning model for agentic AI

text multimodal
Parameters

Nano

Context

128K

Input

$0.06/1M

Output

$0.24/1M

Cosmos 3 Super Reasoner

NVIDIA

Super reasoning model for complex multi-step tasks

text
Parameters

Reasoning

Context

128K

Input

$0.10/1M

Output

$0.30/1M

Qwen2.5 VL 72B

Qwen

Vision-language model supporting text and images

multimodal text
Parameters

72B

Context

128K

Input

$0.25/1M

Output

$0.75/1M

MiniCPM-V 4.5

OpenBMB

Efficient vision-language model with strong multimodal capabilities

multimodal
Parameters

8B

Context

128K

Input

$0.66/1M

Output

$1.11/1M

Qwen3 Embedding 8B

Qwen

High-precision dense retrieval with multilingual coverage (4,096 dims)

embedding
Parameters

8B

Context

32K

Input

$0.01/1M

Need a different model?

We're constantly adding new models based on customer demand. Let us know which models you'd like to see, and we'll prioritize adding them to the platform.

View documentation

Ready to get started?

Sign up and start using these models in minutes. No credit card required.