Which commonly-used metrics aren't working and why?
When I began researching engineering metrics several years ago, one of the first metrics I looked into was commits. I was working in GitHub, which provides graphs of commit counts, so they were easy to track. I knew it wasn’t going to be the perfect metric, but it felt useful. For starters, if an engineer had no commits, that would definitely be a red flag. If there are no commits, no code is getting shipped; on a software team, that’s a problem.
See how useful this metric is already? It gives you an awesome way to see how much work is getting done. And it gets even better. You see, the developers I spoke with agreed that making small, frequent commits leads to better-designed code. So if you push people to increase their number of commits, you’ll get smaller, more frequent commits – which is great!
I thought I was onto something, so I showed it to some other CTOs and they thought it sounded kind of interesting. But then I brought it up with my dad over a family dinner – he thought it was garbage and said that no developer would ever want to be measured like this. For context, my dad had been a developer for 30 years. And throughout his career, he’d seen many situations where managers would roll out terrible metrics and anger everyone on the team. He told me that tracking commits was a horrible idea because it said nothing about the actual value of the work delivered. And if someone wanted to, they could easily game the metric by creating extra commits.
Number of commits don’t tell anything about the value and quality of those commits. Please don’t measure yourself with this poor metric.— Jaana Dogan ヤナ ドガン (@rakyll) February 26, 2019
I’ve highlighted how commits are problematic, but they aren’t the only metric that spells trouble. There are four others that I see a lot of companies using to measure productivity. But these metrics are flawed – I call them ‘the flawed five’. They are:
- Lines of code
- Pull request count
- Velocity points
- Impact
Lines of code
The number of lines of code is a metric that has been around for decades – but it’s a really bad measure of productivity. For starters, languages and formatting conventions vary greatly in how many lines of code they produce. So, three lines of code in one programming language might do exactly the same thing as nine lines in another.
Additionally, any good developer knows that they can code the same stuff with huge variations in lines of code, and that refactoring code (which is good) results in less code.
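To make this concrete, here’s a toy illustration – the function names and example are my own invention, not from any real codebase. Both functions below do exactly the same work, yet one takes one line of logic and the other takes six. A line-count metric would rate the longer one as the bigger ‘contribution’, and would score a refactor from the long form to the short form as negative work:

```python
# Two implementations of the same task: sum the squares of the even numbers.
# Identical behavior, very different line counts.

def sum_even_squares_short(nums):
    return sum(n * n for n in nums if n % 2 == 0)

def sum_even_squares_long(nums):
    total = 0
    for n in nums:
        if n % 2 == 0:
            square = n * n
            total = total + square
    return total

print(sum_even_squares_short([1, 2, 3, 4]))  # 20
print(sum_even_squares_long([1, 2, 3, 4]))   # 20
```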
So not only is this metric inaccurate, but it incentivizes programming practices that run counter to building good software.
Unfortunately, lines of code is still a really common metric used in our industry. I come across companies that use it as a way of evaluating developers’ contributions to their team, even determining stack ranking and terminations based on it. I think we’d all agree that this isn’t good practice, but it’s surprisingly common. We need to move away from this.
Number of lines of code written is not a measure of your value to a project.— Scott Hanselman (@shanselman) April 13, 2019
Pull request count
Another metric that I see being used to measure productivity is pull request count. Counting pull requests seems to be a more recent trend – I was at a meetup last year and a manager said to me, ‘Pull request count is the new vanity metric’. And I completely agree with him. It’s not a good way of measuring how much work is getting done. Tracking the number of pull requests created or merged doesn’t factor in the size, effort, or impact of that work; it tells you almost nothing other than the number of pull requests created.
Like lines of code, this metric can encourage counterproductive behaviors. For example, this metric could encourage developers to unnecessarily split up their work into smaller pull requests, creating more work and noise for the team. I’ve seen this metric spreading like wildfire across our industry. A recent example I came across is GitLab’s engineering OKRs. These are published on their website.
In this OKR, their objective is to improve productivity by 60%. They intend to achieve and measure this by increasing the number of merge requests created per engineer by 20%.
I don’t think this is a good practice. Counting pull requests might seem less offensive than counting lines of code, but both metrics suffer from similar flaws.
Velocity points
Velocity points can be an unpleasant subject. I think a lot of developers see them as a necessary evil. Personally, I’m a big fan of velocity points and think that they can be an outstanding way of sizing and estimating work. However, when you try to turn velocity into a measurement of productivity, you will run into problems.
Jira tracks individual "productivity" using points per Sprint, thereby destroying the careers of highly productive people who happen to be working on very hard problems.— Allen Holub (@allenholub) August 5, 2019
When you reward people or teams based on the number of points they complete, they are incentivized to inflate their estimates in order to increase their number. When that happens, the estimates – and the number of points completed – become meaningless. In essence, once you start using points to measure productivity, points become useless for their designed purpose.
Impact
‘Impact’ is a new, proprietary metric offered by several prominent vendors in the engineering analytics space.
‘Impact’ is an evolved version of lines of code. It factors in things like how many different files were changed, and how much of the change was new code vs. modifications to existing code. These factors are combined to calculate an ‘Impact’ score for each developer or team.
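Since vendors don’t publish their formulas, here is a purely hypothetical sketch of how such a composite score might be assembled – every input name and weight below is invented for illustration. Notice that each input is still a proxy for code volume, so the metric inherits the same flaws as lines of code; for instance, spreading the same change across more files raises the score:

```python
# Hypothetical composite "Impact" score -- all names and weights are
# invented for illustration; real vendor formulas are proprietary.
def impact_score(lines_changed, files_touched, new_code_lines, churn_lines):
    # An arbitrary weighted sum of code-volume proxies.
    return (0.4 * lines_changed
            + 3.0 * files_touched
            + 0.2 * new_code_lines
            + 0.1 * churn_lines)

# The same 90-line change scores higher if it merely touches more files.
print(impact_score(90, 1, 60, 30))  # 54.0
print(impact_score(90, 3, 60, 30))  # 60.0
```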
I’ve observed many companies that have tried this metric, and developers almost always hate it. Not only does this metric suffer from the same flaws as lines of code, but it’s really difficult to understand because it’s calculated from an opaque combination of factors.
Then there’s the naming of it. Calling a metric ‘Impact’ sends a strong signal about how it should be used, particularly by managers. And this makes it very easy to misuse. Do you remember the story about my dad? This is exactly the kind of stuff he was terrified about.
Identifying metrics for developer productivity is difficult. There are many things we can measure in software, but few that we should. But why is it that these five metrics are still so prevalent? Why is it that we keep using them despite their flaws? What is the definition of developer productivity? And what metrics can we use? It’s time to work together to answer these questions.