RAG Performance Calculator
Estimate the end-to-end latency of a retrieval-augmented generation (RAG) pipeline stage by stage, so you can see whether embedding, search, reranking, or generation is your real bottleneck.
Outputs: Total (ms), Time to first token (TTFT), Bottleneck
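The page does not show the calculator's formula, so the following is only a minimal sketch of how the three outputs could be derived. It assumes the stages run one after another, that time to first token is the retrieval time plus the model's first-token latency, and that the bottleneck is simply the slowest stage. All names (`StageLatencies`, `estimate`, the field names) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StageLatencies:
    """Per-stage latencies in milliseconds (field names are illustrative)."""
    embed_ms: float
    search_ms: float
    rerank_ms: float
    first_token_ms: float  # model latency until the first output token
    decode_ms: float       # time to generate the remaining tokens


def estimate(s: StageLatencies) -> dict:
    # Assumption: stages run sequentially, so their latencies add up.
    retrieval_ms = s.embed_ms + s.search_ms + s.rerank_ms
    ttft_ms = retrieval_ms + s.first_token_ms
    total_ms = ttft_ms + s.decode_ms

    # Assumption: the bottleneck is the single slowest stage.
    stages = {
        "embedding": s.embed_ms,
        "search": s.search_ms,
        "rerank": s.rerank_ms,
        "generation": s.first_token_ms + s.decode_ms,
    }
    bottleneck = max(stages, key=stages.get)
    return {"total_ms": total_ms, "ttft_ms": ttft_ms, "bottleneck": bottleneck}
```

For example, `estimate(StageLatencies(20, 35, 120, 300, 900))` gives a total of 1375 ms, a TTFT of 475 ms, and `"generation"` as the bottleneck under these assumptions.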
🔍 Cut the bottleneck
- Reranking heavy? Use a lighter reranker, or rerank only the top-k candidates
- Search slow? Add an approximate nearest-neighbor index such as HNSW or IVF
- TTFT high? Use a smaller or faster model, or serve from a closer region
⚡ Perceived speed
Users feel TTFT, not total time. Stream tokens and show retrieval status to mask the generation tail.
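To see the gap between TTFT and total time on a streamed response, a sketch like the one below can wrap any token iterator. `fake_token_stream` is a hypothetical stand-in for a real streaming client, and its delays are made-up values for illustration only.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_token_stream(n_tokens: int = 5, first_delay_s: float = 0.05,
                      per_token_s: float = 0.01) -> Iterator[str]:
    """Stand-in for a streaming LLM response (hypothetical, for illustration)."""
    time.sleep(first_delay_s)        # simulated wait before the first token
    for i in range(n_tokens):
        if i > 0:
            time.sleep(per_token_s)  # simulated gap between later tokens
        yield f"tok{i}"


def measure_stream(tokens: Iterable[str]) -> Tuple[float, float, int]:
    """Return (ttft_ms, total_ms, token_count) for any token iterator."""
    start = time.perf_counter()
    ttft_ms = None
    count = 0
    for _ in tokens:
        if ttft_ms is None:
            # First token arrived: this is what the user perceives as "started".
            ttft_ms = (time.perf_counter() - start) * 1000
        count += 1
    total_ms = (time.perf_counter() - start) * 1000
    # An empty stream has no first token; fall back to the total time.
    return (ttft_ms if ttft_ms is not None else total_ms), total_ms, count
```

Calling `measure_stream(fake_token_stream())` returns a TTFT that is noticeably smaller than the total, which is the window that streaming hides from the user.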
Frequently Asked Questions
Where do I get these numbers?
Measure each stage with our Embedding Speed Test and Prompt Latency Test, plus your vector DB's query metrics. Then plug them in here.
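If you would rather time the stages inside your own pipeline code, a small timer such as the sketch below can collect per-stage numbers to enter in the calculator. `StageTimer` is a hypothetical helper, and the `time.sleep` calls are placeholders for your real embedding, search, and generation calls.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage, in milliseconds."""

    def __init__(self) -> None:
        self.ms = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            # Add to any earlier measurement for the same stage name.
            self.ms[name] = self.ms.get(name, 0.0) + elapsed_ms


timer = StageTimer()
with timer.stage("embedding"):
    time.sleep(0.01)  # placeholder: call your embedding model here
with timer.stage("search"):
    time.sleep(0.02)  # placeholder: query your vector DB here
with timer.stage("generation"):
    time.sleep(0.03)  # placeholder: call your LLM here

for name, ms in timer.ms.items():
    print(f"{name}: {ms:.1f} ms")
```

A single run is noisy, so averaging over several requests usually gives more representative numbers.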
Should I always rerank?
Reranking improves answer quality but adds latency. If your first-stage retrieval is already accurate, you can skip it or apply it only to ambiguous queries.
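One way to rerank only ambiguous queries is to look at how close the top first-stage scores are. This is a sketch under assumptions: the score-margin heuristic, the `min_margin` threshold of 0.05, and the `rerank` callable are all hypothetical and would need tuning against your own data.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (document id, first-stage similarity score)


def is_ambiguous(hits: List[Hit], min_margin: float = 0.05) -> bool:
    """Treat a query as ambiguous when the top two scores are close."""
    if len(hits) < 2:
        return False
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    return (ranked[0][1] - ranked[1][1]) < min_margin


def maybe_rerank(query: str, hits: List[Hit],
                 rerank: Callable[[str, List[Hit]], List[Hit]],
                 min_margin: float = 0.05) -> List[Hit]:
    """Pay the reranking latency only when first-stage results are unclear."""
    if is_ambiguous(hits, min_margin):
        return rerank(query, hits)
    # Clear winner: keep the first-stage order and skip the reranker.
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

With this in place, queries that have a clear top hit skip the reranker entirely, so its latency is only added where it is most likely to change the answer.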