Serverless Inference

Pay-per-token API access to open-source models.

OpenAI-compatible APIs. No infrastructure to manage. Scale instantly from zero to thousands of requests per second.


[Diagram: API request flow (live). Your App → Lyceum → Model. Example shown: Llama 3.3 70B, 142 ms time to first token (TTFT).]

No infrastructure management

Send requests, get responses. We handle everything else.

Instant scaling

From zero to thousands of requests per second, automatically.

$0.02

Pay per token

No idle costs. You pay only for the tokens you process.
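As a rough illustration of pay-per-token billing, the sketch below estimates the cost of one request from its token counts. The per-million-token prices are hypothetical placeholders, not Lyceum's actual rates, and estimate_cost is a name made up for this example. With the OpenAI Python library, the counts are available on a response as response.usage.prompt_tokens and response.usage.completion_tokens.

```python
from decimal import Decimal

# Hypothetical prices in USD per one million tokens. These are placeholders;
# substitute the real rates from the pricing page.
INPUT_PRICE_PER_M = Decimal("0.10")
OUTPUT_PRICE_PER_M = Decimal("0.30")


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Return the estimated USD cost of a single request."""
    million = Decimal(1_000_000)
    input_cost = Decimal(prompt_tokens) * INPUT_PRICE_PER_M / million
    output_cost = Decimal(completion_tokens) * OUTPUT_PRICE_PER_M / million
    return input_cost + output_cost


# 1,200 prompt tokens and 300 completion tokens at the placeholder rates.
print(estimate_cost(1200, 300))
```

Decimal is used instead of float so that very small per-request amounts add up without rounding drift when summed over many requests.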

OpenAI-compatible

Drop-in replacement. Change one line of code: set base_url to "https://api.lyceum.technology/v1" instead of "https://api.openai.com/v1".
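One way to keep that one-line change reversible is to select the endpoint from configuration. The sketch below is an illustration only: client_kwargs is a hypothetical helper and LYCEUM_API_KEY is an environment variable name chosen for this example (OPENAI_API_KEY is the name the OpenAI library reads by default). The Lyceum URL is the one from the quick-start on this page.

```python
import os

# Base URL per provider. The Lyceum URL comes from the quick-start on this
# page; the OpenAI URL is that library's default endpoint.
BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "lyceum": "https://api.lyceum.technology/v1",
}

# Environment variable holding each provider's key. LYCEUM_API_KEY is a
# hypothetical name chosen for this example.
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "lyceum": "LYCEUM_API_KEY",
}


def client_kwargs(provider: str) -> dict:
    """Build the keyword arguments for OpenAI(...) for the given provider."""
    if provider not in BASE_URLS:
        raise ValueError(f"unknown provider: {provider!r}")
    return {
        "base_url": BASE_URLS[provider],
        "api_key": os.environ.get(API_KEY_VARS[provider], ""),
    }


# Usage (requires the openai package):
#   from openai import OpenAI
#   client = OpenAI(**client_kwargs("lyceum"))
```

Everything after client construction stays the same for either provider, which is the sense in which the API is a drop-in replacement.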

Get started in minutes

Use the OpenAI Python library and point base_url at Lyceum, supplying your own API key. The rest of your existing code works unchanged.

Python

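from openai import OpenAI

# Point the standard OpenAI client at the Lyceum endpoint.
client = OpenAI(
    base_url="https://api.lyceum.technology/v1",
    api_key="your-api-key"
)

# Send a single-turn chat request to Llama 3.3 70B.
response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Print the text of the first completion.
print(response.choices[0].message.content)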
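For readers not using the OpenAI library, the same call can be sketched as a plain HTTP request. This assumes Lyceum follows the OpenAI convention of POST {base_url}/chat/completions with a bearer token; the page states OpenAI compatibility but does not document the routes itself. build_chat_request is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.lyceum.technology/v1"  # from the example above


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed OpenAI-style route
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("your-api-key", "meta-llama/Llama-3.3-70B-Instruct", "Hello!")
print(req.get_method(), req.full_url)
```

To actually send it, pass req to urllib.request.urlopen and parse the JSON body; in the OpenAI response format the reply text sits under choices[0].message.content.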

View full API documentation →

Supported models

Access the latest open-source models through a single API.

DeepSeek V4 Pro (DeepSeek): MoE, 128K context

GLM-5.2 (ZAI): MoE, 128K context

Kimi K2.6 (Moonshot): 1T MoE, 256K context

Qwen3.5 397B A17B (Qwen): 397B MoE, 256K context

gpt-oss 120B (OpenAI): 120B MoE, 128K context

Llama 3.3 70B
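When choosing among these models for long inputs, the listed context windows can be filtered programmatically. The sketch below uses only the figures shown on this page; the keys are display names rather than API model identifiers (the only identifier this page shows is meta-llama/Llama-3.3-70B-Instruct), and models_with_context is a hypothetical helper.

```python
# Context windows in thousands of tokens, as listed on this page.
# Llama 3.3 70B has no context size listed here, so it is recorded as None.
CONTEXT_K = {
    "DeepSeek V4 Pro": 128,
    "GLM-5.2": 128,
    "Kimi K2.6": 256,
    "Qwen3.5 397B A17B": 256,
    "gpt-oss 120B": 128,
    "Llama 3.3 70B": None,
}


def models_with_context(min_k: int) -> list[str]:
    """Return models whose listed context is at least min_k thousand tokens."""
    return [
        name
        for name, ctx in CONTEXT_K.items()
        if ctx is not None and ctx >= min_k
    ]


print(models_with_context(200))  # ['Kimi K2.6', 'Qwen3.5 397B A17B']
```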

Simple, transparent pricing

Pay only for the tokens you use. No minimum spend, no hidden fees. Volume discounts available.

View pricing
