Error Rate measures the percentage of API requests, user sessions, or transactions that result in an error (typically HTTP 5xx server errors or application-level exceptions). It is a real-time health signal for production systems and is used as an SLO indicator alongside latency and uptime. Sudden spikes in error rate are often the first observable signal of a production incident.
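The basic calculation can be sketched as the share of 5xx responses in a window of requests. This is a minimal illustration, assuming the request log is available as a plain list of HTTP status codes; the sample data is hypothetical:

```python
def error_rate(status_codes):
    """Fraction of requests that returned a 5xx server error."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes)

# Hypothetical sample: 2 server errors out of 10 requests.
codes = [200, 200, 503, 200, 500, 200, 201, 204, 200, 200]
print(f"{error_rate(codes):.1%}")  # → 20.0%
```

In production this calculation is usually done by a metrics system over a sliding time window rather than over an in-memory list, but the ratio is the same.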
Error rate should be tracked at multiple levels — total platform error rate, per-endpoint error rate, and per-user-segment error rate — so that localized failures do not blend into an acceptable aggregate number.
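The multi-level breakdown above can be sketched with a per-endpoint aggregation. The log entries and endpoint names below are hypothetical; the point is that one failing endpoint stands out per-endpoint while the platform-wide number hides it:

```python
from collections import defaultdict

# Hypothetical request log: (endpoint, HTTP status code) pairs.
requests = [
    ("/checkout", 500), ("/checkout", 200), ("/search", 200),
    ("/search", 200), ("/search", 200), ("/checkout", 503),
]

totals = defaultdict(int)   # requests per endpoint
errors = defaultdict(int)   # 5xx responses per endpoint
for endpoint, status in requests:
    totals[endpoint] += 1
    if 500 <= status <= 599:
        errors[endpoint] += 1

for endpoint in sorted(totals):
    print(f"{endpoint}: {errors[endpoint] / totals[endpoint]:.1%}")
```

Here `/checkout` is failing two thirds of the time while `/search` is healthy — a distinction the aggregate rate alone would not reveal.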
Production error rates below 0.1% are generally considered excellent, while rates above 1% typically warrant immediate investigation; SLO targets usually fall between 0.1% and 0.5%, depending on service criticality.
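Thresholds like these could be wired into a simple alerting check. The `triage` function below is an illustrative sketch, not a standard: the 0.5% default SLO target is an assumption drawn from the range above, and real systems would evaluate rates over time windows rather than single values:

```python
def triage(observed_rate, slo_target=0.005):
    """Classify an observed error rate against the illustrative
    thresholds above: <0.1% excellent, >1% immediate investigation,
    with an assumed SLO target in between (default 0.5%)."""
    if observed_rate > 0.01:
        return "investigate"      # above 1%: immediate attention
    if observed_rate > slo_target:
        return "slo-breach"       # within bounds but over the SLO target
    if observed_rate < 0.001:
        return "excellent"        # below 0.1%
    return "ok"

print(triage(0.02))    # → investigate
print(triage(0.0005))  # → excellent
```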
Each function reads Error Rate through a different lens and takes different actions when it changes.
askotter connects your data sources and applies causal analysis to tell you exactly why your metrics are changing, not just that they changed.