AI Speed Test — Compare Models Live
Measure real AI inference speed, including time to first token (TTFT) and tokens per second, live in your browser.
Responses are streamed via server-sent events (SSE). Your API key and prompt are sent only to the endpoint you specify.
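For reference, here is a minimal sketch of how a client might split an SSE response into event payloads before timing them. It assumes the response body has already been decoded into text lines. `iter_sse_data` is a hypothetical helper, not this tool's actual code, and it handles only the `data:` field and comment lines.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete server-sent event.

    An event is a run of lines terminated by a blank line. Each "data:" line
    contributes one payload line; several data lines in one event are joined
    with "\n". Other fields (event, id, retry) are ignored in this sketch.
    """
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # drop a single optional leading space
        if field == "data":
            buffer.append(value)
    # An event cut off before its terminating blank line is not dispatched.
```

Many chat-completion APIs send one JSON chunk per `data:` event and end the stream with a sentinel such as `[DONE]`; the exact payload format depends on the provider, so check your endpoint's documentation.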
📊 Model Speed Comparison Reference
These reference figures come from published benchmarks. Actual speed varies with region, server load, and prompt length, so run the live test above to get your own results.
📊 What Is TTFT?
Time to first token: how long it takes, after you send your prompt, to receive the first streamed token. Lower TTFT makes a model feel more responsive. Fast models typically deliver a TTFT of 300–800 ms, depending on server load and network conditions.
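A minimal sketch of measuring TTFT on the client, assuming the streamed text arrives through a Python iterator (for example, the payloads produced by an SSE parser). The function name `measure_ttft` and its parameters are illustrative assumptions; the clock is injectable so the calculation can be checked without a network.

```python
import time
from typing import Callable, Iterable, Optional


def measure_ttft(
    chunks: Iterable[str],
    sent_at: float,
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Seconds from sending the request until the first non-empty chunk arrives.

    `chunks` yields streamed text as it is received; `sent_at` must come from
    the same clock, read immediately before the request is sent.
    """
    for chunk in chunks:
        if chunk:  # ignore empty keep-alive payloads
            return clock() - sent_at
    return None  # the stream ended without producing any content
```

Read `sent_at = time.perf_counter()` right before sending the request, then pass the chunk iterator. Because iteration blocks until each chunk arrives, the clock reading reflects arrival time on your machine, with network transit included.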
⚡ Tokens Per Second (TPS)
Tokens per second measures how fast output streams once generation has started. Higher TPS means text appears faster. TPS depends on model architecture, server load, network latency, and output length. This test reports wall-clock TPS measured in your browser, so network transit time is included.
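The arithmetic behind a wall-clock TPS figure can be sketched as follows. This assumes you have recorded client-side timestamps for the request, the first token, and the last token, plus an output token count (for example, from the API's usage report). `StreamTiming` and both helpers are hypothetical names, and this tool's exact measurement window may differ.

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    sent_at: float         # client clock when the request was sent (seconds)
    first_token_at: float  # client clock when the first token arrived
    last_token_at: float   # client clock when the final token arrived
    output_tokens: int     # number of generated tokens


def ttft_seconds(t: StreamTiming) -> float:
    """Time to first token, as seen by the client."""
    return t.first_token_at - t.sent_at


def tokens_per_second(t: StreamTiming) -> float:
    """Streaming rate between the first and last token.

    N tokens spread from first to last arrival span N - 1 intervals; the wait
    for the first token is already accounted for by TTFT.
    """
    window = t.last_token_at - t.first_token_at
    if t.output_tokens < 2 or window <= 0:
        raise ValueError("need at least two tokens arriving at different times")
    return (t.output_tokens - 1) / window
```

Dividing by the first-to-last window keeps TPS separate from TTFT. Dividing total tokens by the time since the request was sent would instead fold TTFT into the rate and report a lower number, especially for short outputs.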
🤖 What Affects AI Speed?
- Server-side GPU load and batching
- Your geographic distance to API servers
- Output token count (longer = more time)
- Model size (larger models are generally slower on the same hardware)
- Quantization level used for inference (lower-precision weights usually run faster)
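To see how TTFT, TPS, and output length combine, here is a rough end-to-end estimate. It assumes a steady streaming rate after the first token, which real servers only approximate under changing load. `estimated_response_seconds` is a hypothetical helper.

```python
def estimated_response_seconds(ttft_s: float, tps: float, output_tokens: int) -> float:
    """Rough total time: wait for the first token, then stream the rest at `tps`."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    # After the first token, the remaining N - 1 tokens arrive at a steady rate.
    return ttft_s + max(output_tokens - 1, 0) / tps
```

For example, with a 0.5 s TTFT and 50 tokens/sec, a 101-token reply takes about 2.5 s, while a 501-token reply takes about 10.5 s: once TTFT is paid, output length quickly dominates.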