API latency measures the time elapsed between a client sending a request to an API and receiving the complete response. It is most meaningfully expressed as a percentile distribution (P50, P95, P99) rather than an average, because averages obscure the experience of users who encounter the slowest responses. P95 latency, the response time within which 95% of requests complete, is one of the most commonly tracked production reliability targets.
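To make the contrast between an average and a percentile view concrete, here is a minimal Python sketch. The sample latencies are invented for illustration, and `percentile` is a hypothetical helper using the nearest-rank method; monitoring tools may interpolate differently and report slightly different values.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number ranks stay exact.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical response times in milliseconds:
# 90 fast requests, 8 moderate ones, 2 very slow ones.
latencies_ms = [100] * 90 + [400] * 8 + [5000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.0f}ms P50={p50}ms P95={p95}ms P99={p99}ms")
# prints "mean=222ms P50=100ms P95=400ms P99=5000ms"
```

In this made-up sample the mean of 222 ms looks healthy, while P99 shows that some requests take 5 seconds.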
Tail latency (P99 and above) is especially important for user-facing APIs. Users who send more requests are more likely to encounter at least one slow response, so the users experiencing the slowest 1% of responses are often the most engaged and highest-value users, and degraded performance for them disproportionately affects business outcomes.
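One way to see why heavy users feel tail latency most is to estimate the chance of hitting at least one slowest-1% response over a number of requests. The sketch below assumes each request lands in the tail independently with the same probability, which is a simplification (real slowdowns tend to cluster), and `chance_of_tail_hit` is a hypothetical helper name.

```python
def chance_of_tail_hit(requests, tail_fraction=0.01):
    """Probability of at least one tail response in `requests` calls,
    assuming each call independently falls in the tail with
    probability `tail_fraction` (a simplifying assumption)."""
    if requests < 0:
        raise ValueError("requests must be non-negative")
    return 1 - (1 - tail_fraction) ** requests

# Hypothetical request counts per session for light, regular, and heavy users.
for n in (10, 100, 500):
    print(f"{n:>3} requests: {chance_of_tail_hit(n):.1%} chance of a P99+ response")
```

Under this assumption a user making 10 requests has roughly a 10% chance of seeing a tail response, while a user making 100 requests has a chance above 60%.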
User-facing APIs typically target a P95 below 300 ms and a P99 below 1,000 ms. A P95 above 3,000 ms is generally considered unacceptable for interactive applications.
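These thresholds can be encoded as a simple check. The following Python sketch uses the figures quoted above; the function name `assess_latency` and the three status labels are illustrative assumptions, not part of any standard or tool.

```python
# Thresholds taken from the guidance above, in milliseconds.
P95_TARGET_MS = 300
P99_TARGET_MS = 1000
P95_UNACCEPTABLE_MS = 3000

def assess_latency(p95_ms, p99_ms):
    """Classify measured percentiles against the targets.
    The labels returned here are hypothetical, chosen for illustration."""
    if p95_ms > P95_UNACCEPTABLE_MS:
        return "unacceptable"
    if p95_ms < P95_TARGET_MS and p99_ms < P99_TARGET_MS:
        return "on target"
    return "needs attention"

print(assess_latency(p95_ms=220, p99_ms=800))    # prints "on target"
print(assess_latency(p95_ms=450, p99_ms=1200))   # prints "needs attention"
print(assess_latency(p95_ms=3500, p99_ms=9000))  # prints "unacceptable"
```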
Each function reads P95 through a different lens and takes different actions when it changes.
Metrics that are commonly analyzed alongside P95.