role guide

KPIs Every VP of Engineering Must Track

The VP of Engineering manages team velocity, system reliability, and infrastructure efficiency while building the engineering culture that sustains long-term product quality.

Vice President of Engineering

why it matters

Why these metrics matter

The VP of Engineering must manage two things at once: output quality (what the team ships) and platform quality (how reliably it runs). The DORA metrics (deployment frequency, MTTR, change failure rate, and lead time for changes) provide a research-validated framework for organizational performance that is comparable across companies. The VP of Engineering also owns the production reliability metrics (uptime, API latency, error rate) that directly affect customer experience and SLA compliance. Infrastructure cost is a COGS component that engineering directly controls, so holding or lowering cost per user as the platform scales is as much a financial responsibility as a technical one. Code coverage and technical debt ratio are the long-term health metrics: they indicate whether current engineering velocity is sustainable or is being borrowed from future capacity. Finally, the VP of Engineering must translate these technical metrics into business language for executive and board audiences, making the case for reliability investment and technical debt reduction in terms of revenue risk and velocity impact.

diagnostic

Questions you should be able to answer

If you cannot answer these, you are missing critical visibility into your function.

Are we in the DORA elite or high-performer tier for all four metrics across all teams?
What is our production error rate trend, and are there services that consistently miss their SLO target?
Is infrastructure cost per user stable or improving as the user base grows?
What is our change failure rate by team and change type, and what process improvements will reduce it?
Is technical debt increasing or decreasing in our highest-traffic services?
What is our mean lead time for changes, and which phase of the pipeline is the primary bottleneck?

metric library

Your core KPIs

Every metric includes definition, formula, platforms, causal drivers, and Q&A.

Deployment Frequency

Deployment Frequency measures how often an organization successfully deploys code to production.
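As a rough illustration of the definition above, deployment frequency can be computed as successful production deployments per day over a reporting window. This is a minimal sketch: the function name `deployment_frequency` and the sample dates are hypothetical, and it assumes the input already contains only successful production deployments.

```python
from datetime import date

def deployment_frequency(deploy_dates, period_start, period_end):
    """Average successful deployments per day over an inclusive date range."""
    days = (period_end - period_start).days + 1
    if days <= 0:
        raise ValueError("period_end must not precede period_start")
    # Count only deployments that fall inside the reporting window.
    in_period = [d for d in deploy_dates if period_start <= d <= period_end]
    return len(in_period) / days

# Hypothetical data: four deployments inside the week, one outside it.
deploys = [date(2024, 1, 1), date(2024, 1, 1), date(2024, 1, 3),
           date(2024, 1, 7), date(2024, 1, 9)]
per_day = deployment_frequency(deploys, date(2024, 1, 1), date(2024, 1, 7))
print(f"{per_day:.2f} deployments per day")
```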

Mean Time to Recovery

MTTR

Mean Time to Recovery (MTTR) measures the average time required to restore a service or system to normal operation after a failure or incident.
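The averaging described above might be sketched as follows. The function name `mean_time_to_recovery` and the incident timestamps are hypothetical, and the sketch assumes each incident is recorded as a (started, restored) pair; when the clock starts (failure onset versus detection) is a definition each team has to settle for itself.

```python
from datetime import datetime, timedelta

def mean_time_to_recovery(incidents):
    """Average of (restored - started) across incidents, as a timedelta."""
    if not incidents:
        raise ValueError("need at least one incident")
    durations = [restored - started for started, restored in incidents]
    return sum(durations, timedelta()) / len(durations)

# Hypothetical incidents: one lasting 30 minutes, one lasting 90 minutes.
incidents = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 9, 30)),
    (datetime(2024, 3, 8, 14, 0), datetime(2024, 3, 8, 15, 30)),
]
mttr = mean_time_to_recovery(incidents)
print(mttr)  # 1:00:00
```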

Change Failure Rate

CFR

Change Failure Rate (CFR) measures the percentage of deployments or changes to production that result in a service degradation, incident, or rollback.
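A minimal sketch of that percentage, assuming a deployment counts as failed if it led to a degradation, incident, or rollback. The function name `change_failure_rate` and the counts are hypothetical.

```python
def change_failure_rate(total_deployments, failed_deployments):
    """Percentage of production deployments that resulted in a failure."""
    if total_deployments <= 0:
        raise ValueError("total_deployments must be positive")
    if not 0 <= failed_deployments <= total_deployments:
        raise ValueError("failed_deployments must be between 0 and total")
    return 100.0 * failed_deployments / total_deployments

# Hypothetical month: 50 deployments, 4 of which caused an incident or rollback.
cfr = change_failure_rate(total_deployments=50, failed_deployments=4)
print(f"CFR: {cfr:.1f}%")  # CFR: 8.0%
```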

System Uptime / Availability

SLA%

System Uptime (or Availability) measures the percentage of time a service is operational and accessible to users, typically expressed as a percentage of total time in a given period.
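To make the percentage concrete, the sketch below computes availability for a period and the downtime a given target allows. Both function names and the figures are hypothetical, and the sketch assumes downtime is tracked in minutes.

```python
def availability_percent(total_minutes, downtime_minutes):
    """Share of the period during which the service was operational."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime must be between 0 and total_minutes")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes

def downtime_budget_minutes(total_minutes, target_percent):
    """Downtime allowed in the period while still meeting the target."""
    return total_minutes * (100.0 - target_percent) / 100.0

MINUTES_IN_30_DAYS = 30 * 24 * 60  # 43,200 minutes

# Hypothetical month with 90 minutes of downtime against a 99.9% target.
uptime = availability_percent(MINUTES_IN_30_DAYS, 90)
budget = downtime_budget_minutes(MINUTES_IN_30_DAYS, 99.9)
print(f"uptime {uptime:.3f}%, budget {budget:.1f} min at a 99.9% target")
```

In this example the 90 minutes of downtime exceeds the roughly 43 minutes that a 99.9% target allows over 30 days, so the target is missed.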

API Latency / Response Time

P95

API Latency measures the time elapsed between a client sending a request to an API and receiving a complete response.
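Latency is usually summarized by a high percentile such as P95 rather than by the mean. The sketch below uses the nearest-rank method, which is one of several percentile definitions; monitoring platforms may interpolate and report slightly different values. The function name `percentile_nearest_rank` and the sample latencies are hypothetical.

```python
import math

def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

# Hypothetical response times in milliseconds: 10, 20, ..., 200.
latencies_ms = list(range(10, 201, 10))
p95 = percentile_nearest_rank(latencies_ms, 95)
p50 = percentile_nearest_rank(latencies_ms, 50)
print(f"P50 = {p50} ms, P95 = {p95} ms")  # P50 = 100 ms, P95 = 190 ms
```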

Error Rate

Error Rate measures the percentage of API requests, user sessions, or transactions that result in an error (typically HTTP 5xx server errors or application-level exceptions).
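For the HTTP case, a minimal sketch might count 5xx responses against all requests. The function name `error_rate_percent` and the sample status codes are hypothetical; client errors (4xx) are deliberately excluded here, which is an assumption that some teams will want to change.

```python
def error_rate_percent(status_codes):
    """Percentage of requests that returned an HTTP 5xx server error."""
    if not status_codes:
        raise ValueError("status_codes must not be empty")
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return 100.0 * errors / len(status_codes)

# Hypothetical sample: 100 requests, two of them server errors.
codes = [200] * 97 + [500, 503, 404]
rate = error_rate_percent(codes)
print(f"error rate: {rate:.1f}%")  # error rate: 2.0%
```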

Lead Time for Changes

LTFC

Lead Time for Changes measures the elapsed time from when a code commit is made to when that change is running in production.
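The commit-to-production interval can be summarized per change as sketched below. The function name `lead_time_summary` and the timestamps are hypothetical. The sketch reports both mean and median because a few slow changes can pull the mean well above the typical case.

```python
from datetime import datetime
from statistics import mean, median

def lead_time_summary(changes):
    """changes: list of (committed_at, deployed_at). Returns lead times in hours."""
    if not changes:
        raise ValueError("need at least one change")
    hours = [(deployed - committed).total_seconds() / 3600
             for committed, deployed in changes]
    return {"mean_hours": mean(hours), "median_hours": median(hours)}

# Hypothetical changes with lead times of 2, 4, and 24 hours.
changes = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 11, 0)),
    (datetime(2024, 5, 2, 9, 0), datetime(2024, 5, 2, 13, 0)),
    (datetime(2024, 5, 3, 9, 0), datetime(2024, 5, 4, 9, 0)),
]
summary = lead_time_summary(changes)
print(summary)  # {'mean_hours': 10.0, 'median_hours': 4.0}
```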

Code Coverage

Code Coverage measures the percentage of application code executed by automated tests in the test suite.
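As a sketch, line coverage is the share of executable lines that the test suite runs. The function name `line_coverage_percent` and the counts are hypothetical, and line coverage is only one variant (branch coverage is another); in practice a coverage tool produces these counts.

```python
def line_coverage_percent(covered_lines, executable_lines):
    """Percentage of executable lines executed by the automated test suite."""
    if executable_lines <= 0:
        raise ValueError("executable_lines must be positive")
    if not 0 <= covered_lines <= executable_lines:
        raise ValueError("covered_lines must be between 0 and executable_lines")
    return 100.0 * covered_lines / executable_lines

# Hypothetical codebase: 8,200 of 10,000 executable lines are exercised by tests.
coverage = line_coverage_percent(8_200, 10_000)
print(f"coverage: {coverage:.1f}%")  # coverage: 82.0%
```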

Technical Debt Ratio

Technical Debt Ratio measures the estimated remediation cost of code quality issues relative to the total cost of developing the codebase, expressed as a percentage.
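The ratio described above can be sketched as remediation cost divided by development cost. The function name `technical_debt_ratio` and the figures are hypothetical, and the sketch assumes both costs are estimated in the same unit (engineering hours here).

```python
def technical_debt_ratio(remediation_cost, development_cost):
    """Estimated remediation cost as a percentage of total development cost."""
    if development_cost <= 0:
        raise ValueError("development_cost must be positive")
    if remediation_cost < 0:
        raise ValueError("remediation_cost must not be negative")
    return 100.0 * remediation_cost / development_cost

# Hypothetical service: 120 hours of estimated fixes against 4,000 hours of development.
tdr = technical_debt_ratio(remediation_cost=120, development_cost=4_000)
print(f"technical debt ratio: {tdr:.1f}%")  # technical debt ratio: 3.0%
```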

Infrastructure Cost Per User

Infrastructure Cost Per User measures the average monthly cloud infrastructure spend required to serve each active user, enabling teams to track whether infrastructure costs are scaling efficiently as the user base grows.
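A minimal sketch of tracking this month over month, assuming monthly cloud spend and monthly active users are already available. The function name `cost_per_user` and all figures are hypothetical.

```python
def cost_per_user(monthly_infra_cost, monthly_active_users):
    """Average monthly infrastructure spend per active user."""
    if monthly_active_users <= 0:
        raise ValueError("monthly_active_users must be positive")
    return monthly_infra_cost / monthly_active_users

# Hypothetical months: (label, cloud spend in dollars, active users).
months = [
    ("2024-01", 40_000.0, 100_000),
    ("2024-02", 45_000.0, 150_000),
]
trend = [(label, cost_per_user(cost, users)) for label, cost, users in months]
for label, value in trend:
    print(f"{label}: ${value:.2f} per user")

# Total spend rose, but cost per user fell, so costs are scaling efficiently.
improving = trend[-1][1] < trend[0][1]
```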

causal intelligence

How causal analysis changes the game

For VPs of Engineering: Distributed tracing and post-incident reviews provide causal attribution for reliability failures, while controlled experiments on CI/CD process changes measure the causal impact of those changes on DORA metrics.

get started

Know why every metric is moving

askotter gives VPs of Engineering causal visibility into every metric on this list, so you can act on root causes, not symptoms.