
9 Metrics Every Engineering Leader Needs to Track

by José Caldeira
July 20, 2022

We all know that engineering metrics are essential in guiding our decisions.

As engineering leaders, we're responsible for product development and the engineering team's experience. Athenian provides dozens of relevant data points for engineering leaders, but figuring out where to start can be challenging.

These are the 9 metrics we recommend looking at first. Pick 2 or 3 of them to start monitoring and discussing with your engineering teams.

Once you've picked your metrics, here's how you can use them to set your engineering org up for success.

But first, let's dive in!

1. Lead Time

Definition

The time from when a code change is made to when it runs in production.

How

Athenian calls this metric PR Cycle Time. The data is pulled from GitHub and from the customer's CI/CD system to understand how work is flowing, and is then enriched with ticketing information.

Athenian focuses on how work actually flows, not on how developers report it in the ticketing system. This gives customers visibility into the entire development pipeline, from the first commit to code deployed in production, without disrupting development teams.

Athenian reports each customer's PR Cycle Time and breaks it down into the stages the code goes through, so teams can identify bottlenecks.
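
To make the stage breakdown concrete, here is a minimal sketch of how lead time could be computed and split into stages for a set of PRs. It assumes a hypothetical `PullRequest` record with four timestamps and three illustrative stage names (`wip`, `review`, `release`); these are not Athenian's actual schema or stage definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class PullRequest:
    # Hypothetical record: field names are illustrative, not Athenian's schema.
    first_commit: datetime
    opened: datetime
    merged: datetime
    deployed: datetime


def stage_durations(pr: PullRequest) -> dict[str, timedelta]:
    """Split one PR's lead time into consecutive (hypothetical) stages."""
    return {
        "wip": pr.opened - pr.first_commit,   # coding before the PR is opened
        "review": pr.merged - pr.opened,      # review until merge
        "release": pr.deployed - pr.merged,   # merge until running in production
    }


def lead_time(pr: PullRequest) -> timedelta:
    """Total time from first commit to running in production."""
    return pr.deployed - pr.first_commit


def median_stage_hours(prs: list[PullRequest]) -> dict[str, float]:
    """Median hours per stage across PRs; the largest value is a bottleneck candidate."""
    stages = [stage_durations(pr) for pr in prs]
    return {
        name: median(s[name].total_seconds() / 3600 for s in stages)
        for name in stages[0]
    }


t0 = datetime(2022, 7, 1, 9, 0)
pr = PullRequest(t0, t0 + timedelta(hours=4), t0 + timedelta(hours=28), t0 + timedelta(hours=30))
print(lead_time(pr))             # 1 day, 6:00:00
print(median_stage_hours([pr]))  # {'wip': 4.0, 'review': 24.0, 'release': 2.0}
```

Using the median rather than the mean keeps a single stuck PR from dominating the picture; which statistic Athenian itself uses is not stated here.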

Outcome

Improve delivery speed by identifying bottlenecks.

[Chart: lead time engineering metrics]

2. Deployment Frequency

Definition

The number of deployments to production over a given period.

How

When the customer's deployment system calls the Athenian API on each deployment, Athenian can calculate the deployment frequency to production or to any other environment relevant to the engineering teams (e.g., a staging environment).
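
As a rough illustration, the sketch below counts deployments per ISO week for one environment. It assumes each deployment notification has been reduced to a hypothetical `(timestamp, environment)` pair; the real payload sent to the Athenian API is not described here.

```python
from collections import Counter
from datetime import datetime


def weekly_deployment_frequency(
    events: list[tuple[datetime, str]], environment: str = "production"
) -> dict[tuple[int, int], int]:
    """Count deployments per (ISO year, ISO week) for a single environment."""
    counts: Counter = Counter()
    for timestamp, env in events:
        if env == environment:
            year, week, _ = timestamp.isocalendar()
            counts[(year, week)] += 1
    return dict(sorted(counts.items()))


# Hypothetical deployment events.
events = [
    (datetime(2022, 7, 11, 10, 0), "production"),
    (datetime(2022, 7, 18, 9, 0), "production"),
    (datetime(2022, 7, 19, 15, 0), "staging"),
    (datetime(2022, 7, 20, 11, 0), "production"),
]
print(weekly_deployment_frequency(events))             # {(2022, 28): 1, (2022, 29): 2}
print(weekly_deployment_frequency(events, "staging"))  # {(2022, 29): 1}
```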

Outcome

Improve agility by identifying and increasing deployment throughput.

[Chart: deployment frequency]

3. Change Failure Rate

Definition

The percentage of failed deployments to production.

How

Customers can notify the Athenian API every time they perform a deployment, which allows Athenian to report the success ratio of deployments. Note that Athenian presents the success ratio rather than the failure rate; the two are complementary (success ratio = 1 − change failure rate).
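
The relationship between the two numbers can be sketched in a few lines. This assumes each deployment outcome has already been reduced to a boolean (`True` for success); how a deployment is classified as failed is an assumption left to the reader's own pipeline.

```python
def change_failure_rate(outcomes: list[bool]) -> float:
    """Fraction of deployments that failed; outcomes are True for success."""
    if not outcomes:
        raise ValueError("no deployments in the period")
    failures = sum(1 for succeeded in outcomes if not succeeded)
    return failures / len(outcomes)


def success_ratio(outcomes: list[bool]) -> float:
    """The complementary view: fraction of deployments that succeeded."""
    return 1 - change_failure_rate(outcomes)


outcomes = [True, True, False, True]  # hypothetical: 4 deployments, 1 failed
print(change_failure_rate(outcomes))  # 0.25
print(success_ratio(outcomes))        # 0.75
```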

Outcome

Increase deployment quality by monitoring the percentage of deployments that are not successful.

[Chart: change failure rate shown alongside deployment frequency]

4. Mean Time to Recover

Definition

The mean time to resolve an incident in production.

How

Athenian measures the time it takes teams to acknowledge customer bugs (MTTA, mean time to acknowledge) and adds it to the time it takes teams to fix them (MTTRepair, mean time to repair).

These two measurements give customers clear visibility of their SLOs, from the moment issues are reported in the ticketing tool until they are solved and deployed to production.

To do this accurately, Athenian combines information from the ticketing system, GitHub, and the customer's CI/CD system.
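
The arithmetic behind "MTTA plus MTTRepair" can be sketched as follows. Each incident is assumed to be a hypothetical `(reported, acknowledged, resolved)` tuple of timestamps, where `resolved` means the fix is deployed to production. Because both means are taken over the same incidents, their sum equals the mean end-to-end recovery time.

```python
from datetime import datetime, timedelta
from statistics import mean


def recovery_hours(
    incidents: list[tuple[datetime, datetime, datetime]],
) -> tuple[float, float, float]:
    """Return (MTTA, MTTRepair, MTTR) in hours for (reported, acknowledged, resolved) tuples."""
    mtta = mean((ack - reported).total_seconds() for reported, ack, _ in incidents) / 3600
    mtt_repair = mean((resolved - ack).total_seconds() for _, ack, resolved in incidents) / 3600
    return mtta, mtt_repair, mtta + mtt_repair


t0 = datetime(2022, 7, 1, 9, 0)
incidents = [  # hypothetical incidents
    (t0, t0 + timedelta(hours=2), t0 + timedelta(hours=10)),
    (t0, t0 + timedelta(hours=4), t0 + timedelta(hours=6)),
]
print(recovery_hours(incidents))  # (3.0, 5.0, 8.0)
```

Grouping incidents by bug priority before calling the function gives one MTTR per priority level.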

Outcome

Increase agility by decreasing response time to resolve issues.

[Chart: mean time to recover by bug priority]

5. CI Velocity

Definition

The average time to run the test suite.

How

All GitHub Actions runs are tracked so customers know the average run time of their tests and other CI activities. You can drill down into this information to find improvement points in the CI system; for example, investigate the build run time to optimize it.
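
A minimal sketch of the averaging step, assuming CI runs have been reduced to hypothetical `(workflow_name, started, completed)` tuples; the field layout is illustrative and not taken from the GitHub API or from Athenian.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def average_run_minutes(runs: list[tuple[str, datetime, datetime]]) -> dict[str, float]:
    """Average duration in minutes per workflow, e.g. 'build' or 'test'."""
    durations: dict[str, list[float]] = defaultdict(list)
    for name, started, completed in runs:
        durations[name].append((completed - started).total_seconds() / 60)
    return {name: mean(values) for name, values in durations.items()}


runs = [  # hypothetical CI runs
    ("build", datetime(2022, 7, 1, 10, 0), datetime(2022, 7, 1, 10, 12)),
    ("build", datetime(2022, 7, 1, 11, 0), datetime(2022, 7, 1, 11, 8)),
    ("test", datetime(2022, 7, 1, 10, 0), datetime(2022, 7, 1, 10, 30)),
]
averages = average_run_minutes(runs)
print(averages)                         # {'build': 10.0, 'test': 30.0}
print(max(averages, key=averages.get))  # test  (the slowest workflow to look at first)
```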

Outcome

Accelerate delivery by identifying CI bottlenecks.

[Chart: test suite run time over time]

6. CI Quality

Definition

The flakiness and success ratio of the test suite.

How

Observe the results of test runs in GitHub Actions to understand how effective your tests are. Knowing how reliable the tests are (flakiness) and how many problems they catch before a release (failing checks) increases release confidence.
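
The sketch below computes an overall success ratio and flags flaky checks. It uses one common heuristic, swapped in here as an assumption: a check is flaky if it both failed and passed on the same commit (for example, after a re-run). Results are hypothetical `(check_name, commit_sha, passed)` tuples; Athenian's own flakiness definition may differ.

```python
from collections import defaultdict


def ci_quality(results: list[tuple[str, str, bool]]) -> tuple[float, set[str]]:
    """Return (success ratio, flaky check names) for (check, commit, passed) results."""
    if not results:
        raise ValueError("no check results")
    passed = sum(1 for _, _, ok in results if ok)
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for check, sha, ok in results:
        outcomes[(check, sha)].add(ok)
    # Heuristic: both outcomes on the same commit means the code did not change,
    # so the difference came from the test itself.
    flaky = {check for (check, _), seen in outcomes.items() if seen == {True, False}}
    return passed / len(results), flaky


results = [  # hypothetical check results
    ("unit", "a1", False),
    ("unit", "a1", True),   # re-run on the same commit passed: flaky
    ("lint", "a1", True),
    ("e2e", "b2", False),   # consistent failure: a real problem caught
]
print(ci_quality(results))  # (0.5, {'unit'})
```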

Outcome

Speed up releases by monitoring code quality.

[Chart: test suite success ratio]

7. Bug Resolution Ratio

Definition

The percentage of bugs solved versus bugs identified.

How

Athenian tracks all bugs submitted into the customer issue tracking system over time. This way, teams can understand how the bug backlog has been evolving, allowing them to properly manage the quality of the product.
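
One way to read the ratio: bugs resolved in a period divided by bugs created in the same period, so a value above 1 means the backlog shrank. The sketch assumes hypothetical `(created, resolved_or_None)` date pairs exported from an issue tracker.

```python
from datetime import date
from typing import Optional


def bug_resolution_ratio(
    bugs: list[tuple[date, Optional[date]]], start: date, end: date
) -> float:
    """Bugs resolved in [start, end] divided by bugs created in [start, end]."""
    created = sum(1 for created_on, _ in bugs if start <= created_on <= end)
    resolved = sum(
        1 for _, resolved_on in bugs if resolved_on is not None and start <= resolved_on <= end
    )
    if created == 0:
        raise ValueError("no bugs created in the period")
    return resolved / created


bugs = [  # hypothetical bugs
    (date(2022, 7, 1), date(2022, 7, 3)),
    (date(2022, 7, 5), None),
    (date(2022, 7, 10), date(2022, 7, 12)),
    (date(2022, 7, 20), None),
    (date(2022, 6, 20), date(2022, 7, 2)),  # created in June, fixed in July
]
print(bug_resolution_ratio(bugs, date(2022, 7, 1), date(2022, 7, 31)))  # 0.75
```

Here 4 bugs were created and 3 resolved in July, so the backlog grew by one.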

Outcome

Ensure product quality by keeping the bug resolution ratio under control.

[Chart: bug resolution ratio over time]

8. Code Complexity

Definition

The average size of the changes made to code.

How

Athenian measures the average PR size to help customers identify patterns of complex code changes, which typically increase deployment risk. The analysis uses data from GitHub to show, per team, which were the biggest changes made to the customer's code.
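
A minimal sketch of a per-team PR size report. It assumes PR size is measured as lines added plus lines deleted, and that each PR has been reduced to a hypothetical `(team, pr_number, additions, deletions)` tuple. The `large_threshold` of 400 lines is an arbitrary illustrative value, not an Athenian default.

```python
from collections import defaultdict
from statistics import fmean


def pr_size_report(
    prs: list[tuple[str, int, int, int]], large_threshold: int = 400
) -> tuple[dict[str, float], list[tuple[str, int, int]]]:
    """Average PR size per team, plus PRs above the threshold sorted largest first."""
    sizes: dict[str, list[int]] = defaultdict(list)
    large: list[tuple[str, int, int]] = []
    for team, number, additions, deletions in prs:
        size = additions + deletions
        sizes[team].append(size)
        if size > large_threshold:
            large.append((team, number, size))
    large.sort(key=lambda item: item[2], reverse=True)
    return {team: fmean(values) for team, values in sizes.items()}, large


prs = [  # hypothetical PRs
    ("payments", 101, 120, 30),
    ("payments", 102, 500, 100),
    ("search", 201, 40, 10),
]
averages, large = pr_size_report(prs)
print(averages)  # {'payments': 375.0, 'search': 50.0}
print(large)     # [('payments', 102, 600)]
```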

Outcome

Reduce deployment risk by identifying large changes in the code base.

[Chart: size of code changes]

9. Team Investment

Definition

The distribution of time invested by teams.

How

Athenian tracks all work reported in the issue tracking system or through PRs to help customers understand where their teams are investing the most effort. Customers can define their own work categories to see how different activities are moving forward.

In addition, Athenian lets you categorize the same items across different views. For example, you can see the total investment in bugs while also seeing how it splits between customer-facing bugs and bugs discovered internally.
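
The idea of categorizing the same items in different views can be sketched as follows. Each work item is assumed to be a hypothetical `(labels, effort)` pair, where effort might be days or PR count, and each view is an ordered list of `(category, required_labels)` rules; an item goes to the first category whose labels it carries. None of these names come from Athenian.

```python
from collections import Counter


def investment_share(
    items: list[tuple[set[str], float]], view: list[tuple[str, set[str]]]
) -> dict[str, float]:
    """Share of total effort per category; the first matching rule wins, else 'other'."""
    totals: Counter = Counter()
    for labels, effort in items:
        category = next((name for name, required in view if required <= labels), "other")
        totals[category] += effort
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: amount / grand_total for name, amount in totals.items()}


items = [  # hypothetical work items with effort in days
    ({"bug", "customer"}, 3),
    ({"bug"}, 1),
    ({"feature"}, 4),
]
by_type = [("bugs", {"bug"}), ("features", {"feature"})]
bug_origin = [("customer-facing bugs", {"bug", "customer"}), ("internal bugs", {"bug"})]

print(investment_share(items, by_type))  # {'bugs': 0.5, 'features': 0.5}
bug_items = [item for item in items if "bug" in item[0]]
print(investment_share(bug_items, bug_origin))  # {'customer-facing bugs': 0.75, 'internal bugs': 0.25}
```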

Outcome

Improve decisions by having clear visibility of team investment levels.

Ready to see how your engineering org is doing on these metrics? Let's get started!

Oh, and we made a cheat sheet for you!

Save it, share it, print it, stick it on your office wall (or fridge, if you're WFH).