Supported Models
Access the latest open-source models through a single API. Text generation, code, multimodal, speech, and embeddings — all served on vLLM v0.17.1.
Showing 29 models
DeepSeek V4 Pro
DeepSeek
Advanced reasoning, coding, and long-horizon agent workflows
MoE
128K
$1.75/1M
$3.50/1M
GLM-5.2
ZAI
ZAI's latest flagship with strong bilingual reasoning, long-context understanding, and tool use
MoE
128K
$1.40/1M
$4.40/1M
Kimi K2.6
Moonshot
Native multimodal agentic model with long-context capabilities
1T MoE
256K
$0.95/1M
$4.00/1M
Qwen3.5 397B A17B
Qwen
Alibaba's largest Qwen3.5 MoE model for complex reasoning
397B MoE
256K
$0.60/1M
$3.60/1M
gpt-oss 120B
OpenAI
OpenAI open-weight 120B model with transparent weights
120B MoE
128K
$0.15/1M
$0.60/1M
Llama 3.3 70B
Meta
Meta's flagship 70B instruction-following model
70B
128K
$0.13/1M
$0.40/1M
DeepSeek V3.2
DeepSeek
Strong coding and reasoning performance at low cost
671B MoE
128K
$0.30/1M
$0.45/1M
GLM-5.1
ZAI
Multimodal flagship with advanced reasoning and tool use
MoE
128K
$1.40/1M
$4.40/1M
GLM-5
ZAI
MoE
128K
$1.00/1M
$3.20/1M
Kimi K2.5
Moonshot
Strong long-context and reasoning capabilities
1T MoE
256K
$0.50/1M
$2.50/1M
Qwen3 235B A22B Instruct
Qwen
High-quality reasoning and instruction following
235B MoE
256K
$0.20/1M
$0.60/1M
Qwen3 235B Thinking
Qwen
Thinking/reasoning variant of Qwen3 235B
235B MoE
256K
$0.50/1M
$2.00/1M
Qwen3 Next 80B A3B Thinking
Qwen
80B MoE
256K
$0.15/1M
$1.20/1M
Qwen3 32B
Qwen
Compact model balancing quality and speed
32B
128K
$0.10/1M
$0.30/1M
Qwen3 30B A3B
Qwen
Efficient MoE model for instruction following
30B MoE
256K
$0.10/1M
$0.30/1M
MiniMax M2.5
MiniMax
MoE
256K
$0.30/1M
$1.20/1M
Gemma 3 27B
Google's Gemma 3 instruction-tuned model
27B
128K
$0.10/1M
$0.30/1M
Hermes 4 405B
NousResearch
Powerful instruction-following model with long-context capabilities
405B
128K
$1.00/1M
$3.00/1M
Hermes 4 70B
NousResearch
Highly capable model fine-tuned for multi-turn conversations
70B
128K
$0.13/1M
$0.40/1M
INTELLECT-3
PrimeIntellect
Third-generation model trained via decentralized compute
MoE
128K
$0.20/1M
$1.10/1M
Nemotron 3 Ultra 550B
NVIDIA
Massive MoE model for demanding reasoning and agentic workloads
550B MoE
128K
$1.00/1M
$3.00/1M
Llama 3.1 Nemotron Ultra 253B
NVIDIA
253B
128K
$0.60/1M
$1.80/1M
Nemotron 3 Super 120B A12B
NVIDIA
Hybrid MoE model optimized for efficient multi-agent AI
120B MoE
128K
$0.30/1M
$0.90/1M
Nemotron 3 Nano 30B A3B
NVIDIA
30B MoE
128K
$0.06/1M
$0.24/1M
Nemotron 3 Nano Omni
NVIDIA
Open, efficient omni-modal reasoning model for agentic AI
Nano
128K
$0.06/1M
$0.24/1M
Cosmos 3 Super Reasoner
NVIDIA
Super reasoning model for complex multi-step tasks
Reasoning
128K
$0.10/1M
$0.30/1M
Qwen2.5 VL 72B
Qwen
Vision-language model supporting text and images
72B
128K
$0.25/1M
$0.75/1M
MiniCPM-V 4.5
OpenBMB
Efficient vision-language model with strong multimodal capabilities
8B
128K
$0.66/1M
$1.11/1M
Qwen3 Embedding 8B
Qwen
High-precision dense retrieval with multilingual coverage (4,096 dims)
8B
32K
$0.01/1M
No models found
Try adjusting your search or filters
Need a different model?
We're constantly adding new models based on customer demand. Let us know which models you'd like to see, and we'll prioritize adding them to the platform.
Ready to get started?
Sign up and start using these models in minutes. No credit card required.