Serverless Inference

Pay-per-token API access to open-source models.

OpenAI-compatible APIs. No infrastructure to manage. Scale instantly from zero to thousands of requests per second.

API request flow: Your App → Lyceum → Model (example shown: Llama 3.3 70B, 142 ms time to first token)

No infrastructure management

Send requests, get responses. We handle everything else.

Instant scaling

From zero to thousands of requests per second, automatically.
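Scaling happens on the server side, so a client that wants high throughput mainly needs to send requests concurrently. The sketch below shows one way to do that with `asyncio`, capping how many requests are in flight with a semaphore. The helper names `map_bounded` and `make_ask` and the default limit of 16 are illustrative choices, not part of any Lyceum API. The sketch assumes the `AsyncOpenAI` client from the `openai` package, configured with the Lyceum `base_url` and API key as in the Python example on this page.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def map_bounded(
    fn: Callable[[T], Awaitable[R]], items: Iterable[T], limit: int = 16
) -> List[R]:
    """Apply an async function to every item, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        async with semaphore:  # wait for a free slot before starting the call
            return await fn(item)

    # gather() returns results in the same order as the inputs
    return list(await asyncio.gather(*(run_one(item) for item in items)))


def make_ask(client, model: str) -> Callable[[str], Awaitable[str]]:
    """Hypothetical helper: wrap an AsyncOpenAI-style client as prompt -> reply text."""

    async def ask(prompt: str) -> str:
        response = await client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    return ask
```

With `client = AsyncOpenAI(base_url="https://api.lyceum.technology/v1", api_key="your-api-key")`, calling `asyncio.run(map_bounded(make_ask(client, "meta-llama/Llama-3.3-70B-Instruct"), prompts))` returns one reply per prompt, in the same order as `prompts`. The bound prevents a large batch from opening every connection at once. Tune it to your workload.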


Pay per token

No idle costs. You pay only for the tokens you process.
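Because billing is per token, you can estimate the cost of a request from its token counts. The function below is a minimal sketch. The per-million-token prices are placeholders for the current rates on the pricing page. Separate input and output rates are an assumption; if a model has a single rate, pass the same value twice.

```python
def estimate_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Estimate the cost of one request in USD.

    Prices are per million tokens; the values passed in are placeholders
    for whatever the pricing page lists for your model.
    """
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = prompt_tokens * input_price_per_m / 1_000_000
    output_cost = completion_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Hypothetical rates, for illustration only
print(f"${estimate_cost_usd(1_200, 300, 0.50, 1.50):.5f}")  # prints $0.00105
```

In an OpenAI-compatible response, the token counts are normally available as `response.usage.prompt_tokens` and `response.usage.completion_tokens`, so you can price a response from the Python example on this page directly.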

Before: base_url="https://api.openai.com/v1"
After:  base_url="https://api.lyceum.technology/v1"

OpenAI-compatible

Drop-in replacement. Point your existing OpenAI client at the Lyceum base URL and use your Lyceum API key.
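Here, compatibility means the endpoint accepts the same HTTP requests the OpenAI SDK sends: a JSON `POST` to `{base_url}/chat/completions` with the API key in a `Bearer` authorization header. The sketch below builds such a request using only the Python standard library, which can help when the SDK is not available. It assumes Lyceum accepts exactly this request shape, as implied by the SDK working against it. `build_chat_request` and `reply_text` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, api_key: str, model: str, prompt: str
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def reply_text(raw_response: bytes) -> str:
    """Extract the assistant message from a chat completions JSON response."""
    return json.loads(raw_response)["choices"][0]["message"]["content"]
```

To send the request, use `with urllib.request.urlopen(req) as resp: print(reply_text(resp.read()))`.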

Get started in minutes

Use the official OpenAI Python library: set base_url to the Lyceum endpoint and pass your Lyceum API key. The rest of your existing code works unchanged.

Python
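from openai import OpenAI

# Point the standard OpenAI client at the Lyceum endpoint
client = OpenAI(
    base_url="https://api.lyceum.technology/v1",
    api_key="your-api-key",  # replace with your Lyceum API key
)

# Request a chat completion from an open-source model
response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Print the model's reply
print(response.choices[0].message.content)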

Supported models

Access the latest open-source models through a single API.

DeepSeek V4 Pro (DeepSeek): MoE, 128K context
GLM-5.2 (ZAI): MoE, 128K context
Kimi K2.6 (Moonshot): 1T-parameter MoE, 256K context
Qwen3.5 397B A17B (Qwen): 397B-parameter MoE, 256K context
gpt-oss 120B (OpenAI): 120B-parameter MoE, 128K context
Llama 3.3 70B (Meta): 70B parameters, 128K context
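Each model is listed with its provider and context window, so a small catalog makes it easy to shortlist models for long inputs. This sketch is built from the model list above. The entries use display names, which are not necessarily the `model` IDs the API expects; only the Llama ID, `meta-llama/Llama-3.3-70B-Instruct`, appears on this page. `ModelInfo` and `models_with_context` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ModelInfo:
    name: str        # display name as listed, not necessarily the API model ID
    provider: str
    context_k: int   # context window in thousands of tokens ("128K" -> 128)


CATALOG = (
    ModelInfo("DeepSeek V4 Pro", "DeepSeek", 128),
    ModelInfo("GLM-5.2", "ZAI", 128),
    ModelInfo("Kimi K2.6", "Moonshot", 256),
    ModelInfo("Qwen3.5 397B A17B", "Qwen", 256),
    ModelInfo("gpt-oss 120B", "OpenAI", 128),
    ModelInfo("Llama 3.3 70B", "Meta", 128),
)


def models_with_context(
    min_context_k: int, catalog: Sequence[ModelInfo] = CATALOG
) -> List[str]:
    """Return the names of models whose context window is at least min_context_k."""
    return [m.name for m in catalog if m.context_k >= min_context_k]


print(models_with_context(200))  # ['Kimi K2.6', 'Qwen3.5 397B A17B']
```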

Simple, transparent pricing

Pay only for the tokens you use. No minimum spend, no hidden fees. Volume discounts available.


Ready to get started?

Sign up and start sending requests in minutes.