AI Speed Test — Compare Models Live

Measure real AI inference speed: time to first token (TTFT) and tokens per second, live in your browser.


Responses stream over SSE (server-sent events) · your API key and prompt go only to your endpoint

📊 Model Speed Comparison Reference

Reference numbers from published benchmarks. Actual speed varies by region, server load, and prompt length. Run the live test above for your real results.

📊 What Is TTFT?

Time to first token (TTFT) is how long after sending your prompt you receive the first token of the response. Lower TTFT makes a model feel more responsive. Fast models typically deliver 300–800 ms TTFT, depending on server load and network conditions.
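As a rough illustration of the definition above, TTFT can be computed from the time the request was sent and the arrival times of the streamed chunks. This is a minimal sketch, not the test's actual implementation: the `ChunkEvent` type and the `ttftMs` function are hypothetical names, and it assumes the first non-empty chunk marks the first token.

```typescript
// Hypothetical record of one streamed chunk and when it arrived.
interface ChunkEvent {
  arrivedAtMs: number; // timestamp on the same clock as the request, e.g. performance.now()
  text: string;        // text carried by the chunk (may be empty for keep-alives)
}

// Returns TTFT in milliseconds, or null if no text ever arrived.
// Assumption: the first non-empty chunk is treated as the first token.
function ttftMs(requestSentAtMs: number, chunks: ChunkEvent[]): number | null {
  const first = chunks.find((c) => c.text.length > 0);
  return first ? first.arrivedAtMs - requestSentAtMs : null;
}

// Usage: request sent at t=1000 ms, an empty chunk at 1200 ms, first text at 1450 ms.
const sampleTtft = ttftMs(1000, [
  { arrivedAtMs: 1200, text: "" },
  { arrivedAtMs: 1450, text: "Hi" },
]);
console.log(sampleTtft);
```

Skipping empty chunks is a design choice: some endpoints may send empty or metadata-only events before any text, and counting those would understate TTFT.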

⚡ Tokens Per Second (TPS)

TPS measures streaming output speed. Higher TPS means text appears faster. It is affected by model architecture, server load, network latency, and output length. The test reports wall-clock TPS, which includes network transit time.
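A wall-clock TPS figure can be sketched as tokens received divided by elapsed time on the client. The exact measurement window is not stated here, so this example assumes it runs from a caller-supplied start time (for instance, first-token arrival) to the arrival of the last token. The function name `tokensPerSecond` and its parameters are hypothetical.

```typescript
// Wall-clock tokens per second over a client-side window.
// Assumption: windowStartMs and windowEndMs come from the same clock
// (e.g. performance.now()) and include any network transit delays.
function tokensPerSecond(
  tokenCount: number,
  windowStartMs: number,
  windowEndMs: number,
): number {
  const elapsedSec = (windowEndMs - windowStartMs) / 1000;
  // Guard against empty output or a zero-length window.
  if (tokenCount <= 0 || elapsedSec <= 0) {
    return 0;
  }
  return tokenCount / elapsedSec;
}

// Usage: 200 tokens arriving between t=500 ms and t=4500 ms.
console.log(tokensPerSecond(200, 500, 4500)); // 50
```

Starting the window at the request send time instead would fold TTFT into the figure and produce a lower number for the same stream.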

🤖 What Affects AI Speed?

Server-side GPU load and batching
Your geographic distance to API servers
Output token count (longer = more time)
Model size (larger models = slower)
Quantization level used for inference
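To see how output length interacts with the two metrics, total response time can be approximated as TTFT plus the time to stream the output at a steady rate. This is a deliberately simple model under stated assumptions (constant TPS, no stalls); `estimateTotalMs` is a hypothetical helper, and the numbers in the usage line are made up for illustration.

```typescript
// Rough estimate: total time = TTFT + (output tokens / tokens per second).
// Assumption: TPS stays constant for the whole response, which real
// streams only approximate.
function estimateTotalMs(
  ttftMs: number,
  outputTokens: number,
  tps: number,
): number {
  if (tps <= 0) {
    throw new Error("tps must be positive");
  }
  const streamingMs = (outputTokens / tps) * 1000;
  return ttftMs + streamingMs;
}

// Usage: 500 ms TTFT, 300 output tokens at 60 tokens/sec.
console.log(estimateTotalMs(500, 300, 60)); // 5500
```

Under this model, doubling the output length doubles the streaming portion but leaves TTFT unchanged, which is why long responses are dominated by TPS and short ones by TTFT.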