LLM Speed Test — AI Tokens Per Second
Measure real AI inference speed. How fast does the model stream tokens to you?
Your key & prompt go straight to your endpoint — never to us.
📊 What Is TTFT?
TTFT (Time to First Token) measures the delay between sending your prompt and receiving the first token of the model's response. A lower TTFT makes the model feel more responsive. Fast models typically deliver a TTFT of 300–800 ms, depending on server load and network conditions.
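As a rough illustration of the definition above, TTFT can be measured by starting a timer just before the streamed response is consumed and stopping it at the first non-empty chunk. The function name `time_to_first_token`, the `stream` iterable, and the injectable `clock` are hypothetical names chosen for this sketch; they are not part of this tool or of any provider's SDK.

```python
import time
from typing import Callable, Iterable, Optional


def time_to_first_token(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Return seconds elapsed until the first non-empty chunk arrives.

    Assumes `stream` is lazy, i.e. the request is only sent once iteration
    begins, so starting the timer here captures the full wait.
    Returns None if the stream ends without producing any text.
    """
    start = clock()
    for chunk in stream:
        if chunk:  # skip empty keep-alive or metadata-only chunks
            return clock() - start
    return None
```

In practice `stream` would be the text chunks yielded by a streaming API response. If the client sends the request before iteration starts, the timer would need to start at the moment of sending instead.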
⚡ Tokens Per Second
Tokens per second (TPS) measures how fast the model streams its output. A higher TPS means text appears on screen faster. Typical figures: GPT-4 at ~30–60 TPS, Claude Sonnet at ~60–100 TPS. Measured TPS is affected by server load, network latency, and output length.
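One common way to compute TPS, sketched below, is to divide the number of token intervals by the time between the first and last token, so that TTFT is not counted against the streaming rate. This convention is an assumption made for illustration: other tools divide the total token count by the total request time instead. The function name `tokens_per_second` and its `arrival_times` parameter are hypothetical.

```python
from typing import Sequence


def tokens_per_second(arrival_times: Sequence[float]) -> float:
    """Estimate streaming speed from per-token arrival timestamps (seconds).

    Uses (N - 1) intervals over the span from first to last token, which
    excludes the wait before the first token (TTFT).
    """
    if len(arrival_times) < 2:
        raise ValueError("need at least two tokens to measure a rate")
    elapsed = arrival_times[-1] - arrival_times[0]
    if elapsed <= 0:
        raise ValueError("timestamps must be strictly increasing overall")
    return (len(arrival_times) - 1) / elapsed
```

For example, five tokens arriving at 1.0, 1.5, 2.0, 2.5, and 3.0 seconds span four intervals over 2.0 seconds, which gives 2.0 TPS. Short outputs yield noisy estimates, which is one reason output length affects the measured figure.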
🤖 LLM Speed Comparison
- Claude Sonnet — ~60–100 TPS
- Claude Haiku — ~120–180 TPS
- GPT-4o — ~50–80 TPS
- GPT-4o mini — ~100–140 TPS
- Gemini Flash — ~80–120 TPS
- Numbers vary by region and server load