Serverless Inference
Pay-per-token API access to open-source models.
OpenAI-compatible APIs. No infrastructure to manage. Scale instantly from zero to thousands of requests per second.
Example API request: Llama 3.3 70B, 142 ms time to first token (TTFT)
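Time to first token (TTFT) is the delay between sending a request and receiving the first generated text. The sketch below shows one way to measure it yourself. It assumes the endpoint supports OpenAI-style streaming (stream=True), which this page does not state explicitly. The helper name measure_ttft is hypothetical. It consumes any iterable of text deltas and records when the first non-empty piece arrives.

```python
import time
from typing import Iterable, Optional, Tuple


def measure_ttft(
    chunks: Iterable[Optional[str]], start: Optional[float] = None
) -> Tuple[Optional[float], str]:
    """Return (seconds to first non-empty chunk, concatenated text).

    `start` should be a time.perf_counter() value taken just before the
    request is sent; if omitted, timing starts when this function is called.
    TTFT is None if no non-empty chunk arrives.
    """
    if start is None:
        start = time.perf_counter()
    ttft: Optional[float] = None
    parts = []
    for piece in chunks:
        if not piece:
            continue  # skip None/empty deltas (e.g. role-only chunks)
        if ttft is None:
            ttft = time.perf_counter() - start  # first token arrived
        parts.append(piece)
    return ttft, "".join(parts)


# Usage with the OpenAI SDK (assumes the endpoint supports stream=True):
# start = time.perf_counter()
# stream = client.chat.completions.create(
#     model="meta-llama/Llama-3.3-70B-Instruct",
#     messages=[{"role": "user", "content": "Hello!"}],
#     stream=True,
# )
# ttft, text = measure_ttft(
#     (c.choices[0].delta.content for c in stream if c.choices), start
# )
```

Taking start before calling create() matters. The SDK sends the request at that point, so starting the clock later would hide part of the latency.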
No infrastructure management
Send requests, get responses. We handle everything else.
Instant scaling
From zero to thousands of requests per second, automatically.
$0.02
Pay per token
No idle costs. You pay only for the tokens you process.
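Under pay-per-token billing, a request's cost is its token count multiplied by the per-token rate. The page does not say which unit the $0.02 figure uses. It also does not say whether input and output tokens are priced differently. The sketch below therefore uses hypothetical per-million-token rates and separate input/output prices; pass the same value twice if a model has a single rate. TokenPrice and request_cost are illustrative names, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Per-million-token prices in USD. Values used below are hypothetical."""
    input_per_million: float
    output_per_million: float


def request_cost(prompt_tokens: int, completion_tokens: int, price: TokenPrice) -> float:
    """USD cost of one request: tokens x rate, summed over input and output."""
    return (
        prompt_tokens * price.input_per_million
        + completion_tokens * price.output_per_million
    ) / 1_000_000


# Hypothetical rates, not Lyceum's published prices.
example_price = TokenPrice(input_per_million=0.20, output_per_million=0.60)
# 1,000 prompt tokens + 500 completion tokens:
cost = request_cost(1_000, 500, example_price)
# With the OpenAI SDK, the counts come from response.usage.prompt_tokens and
# response.usage.completion_tokens.
```

A period with no requests processes zero tokens, so under this model it costs nothing. That is what "no idle costs" means.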
OpenAI-compatible
Drop-in replacement. Change one line of code:
Before: base_url="https://api.openai.com/v1"
After: base_url="https://api.lyceum.technology/v1"
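"OpenAI-compatible" means the HTTP request has the same shape regardless of provider; only the base URL (and your key) changes. The sketch below builds the underlying request with the Python standard library but does not send it, so you can see what the SDK transmits. The /chat/completions path, the Bearer authorization header, and the JSON body follow OpenAI's API conventions. That Lyceum's endpoint accepts exactly this format is an assumption based on its compatibility claim. build_chat_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, api_key: str, model: str, prompt: str
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request."""
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Only base_url (and the key) would differ from a request to OpenAI.
lyceum_req = build_chat_request(
    "https://api.lyceum.technology/v1",
    "your-api-key",
    "meta-llama/Llama-3.3-70B-Instruct",
    "Hello!",
)
# To actually send it: urllib.request.urlopen(lyceum_req)
```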
Get started in minutes
Use the OpenAI Python library: set base_url to the Lyceum endpoint and use your Lyceum API key. The rest of your existing code works unchanged.
Python
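from openai import OpenAI

# Point the standard OpenAI client at the Lyceum endpoint.
client = OpenAI(
    base_url="https://api.lyceum.technology/v1",
    api_key="your-api-key",  # replace with your Lyceum API key
)

# Send a chat completion request exactly as you would to OpenAI.
response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Print the model's reply.
print(response.choices[0].message.content)
Supported models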
Access the latest open-source models through a single API.
DeepSeek V4 Pro (DeepSeek): MoE, 128K context
GLM-5.2 (ZAI): MoE, 128K context
Kimi K2.6 (Moonshot): 1T-parameter MoE, 256K context
Qwen3.5 397B A17B (Qwen): 397B-parameter MoE, 256K context
gpt-oss 120B (OpenAI): 120B-parameter MoE, 128K context
Llama 3.3 70B (Meta): 70B parameters, 128K context
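When choosing a model, the context window limits how many tokens (prompt plus expected output) a single request can hold. The sketch below encodes the listed context windows and filters them by a token budget. The keys are the display names from this page; the exact model IDs the API expects may differ, since only "meta-llama/Llama-3.3-70B-Instruct" appears here. Whether "K" means 1,000 or 1,024 tokens is not stated; the sketch defaults to the more conservative 1,000. CONTEXT_K and models_fitting are hypothetical names.

```python
from typing import Dict, List

# Context windows from the supported-models list, in "K" tokens.
# Keys are display names, not necessarily API model IDs.
CONTEXT_K: Dict[str, int] = {
    "DeepSeek V4 Pro": 128,
    "GLM-5.2": 128,
    "Kimi K2.6": 256,
    "Qwen3.5 397B A17B": 256,
    "gpt-oss 120B": 128,
    "Llama 3.3 70B": 128,
}


def models_fitting(required_tokens: int, tokens_per_k: int = 1000) -> List[str]:
    """Return, alphabetically, the models whose context holds required_tokens.

    required_tokens should cover the prompt plus the expected output.
    tokens_per_k = 1000 is a conservative assumption (some providers use 1024).
    """
    return sorted(
        name
        for name, ctx_k in CONTEXT_K.items()
        if ctx_k * tokens_per_k >= required_tokens
    )


# A 200,000-token request only fits the 256K-context models.
long_context_models = models_fitting(200_000)
```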
Simple, transparent pricing
Pay only for the tokens you use. No minimum spend, no hidden fees. Volume discounts available.
View pricing
Ready to get started?
Sign up and start sending requests in minutes.