Measuring developer productivity: a new path for data-driven decision making
Welcome to the internet's most comprehensive guide to measuring developer productivity in 2019. What makes this topic important enough to earn the "comprehensive guide" treatment? Allow me to open with a brief retelling of the events that sparked the author's obsession with this topic.
The year was 1999. I’d been coding for a while, but had just secured my first paid gig, writing software in a professional environment. The job posting promised a summer of experience, at $17.50/hour (!), to help a small Seattle company build software that controlled digital projectors. I still recall the rush of walking into the office on my first day, and discovering I had my own workstation, my own telephone number, even my own cube! Heady stuff for a teenager.
My domain of ∞ infinite bliss ∞ aka the cube farm ∞ circa 1999 ∞
The team was about 10 developers -- far more than I'd ever created software with before. With so many developers working together, it seemed there would be nothing we couldn’t achieve. But then, a few weeks after that heady onboarding, I noticed something that baffled me: the company's software was bug-ridden, slow, and, most perplexingly -- not improving.
I would try to use our software on the test projector we kept in the office, and it could take 20 seconds simply to turn the projector off (when it worked at all). No user was going to wait that long for a button to take effect. How could such a basic issue be overlooked by a team of experienced developers working 40+ hours per week? We had several meetings in which the non-technical CEO gave pep talks to get our team "fired up" to make progress against the growing bug backlog. Still, the issues persisted. Our CTO was a fire-breathing despot who once bellowed at us that "I’m not going to wipe your asses, you morons need to test your code before you commit it!" He berated us for not putting in 10-hour days as the team crept ever further behind schedule. A cat-and-mouse game ensued among employees trying to escape the office without his noticing; on one occasion I got caught in the stairwell and sent back to my desk. Still, the issues persisted.
In the three months I worked at this company, I witnessed the CEO work tirelessly to drum up meetings with clients who would have benefited from our product -- if it worked as advertised. He was a charismatic leader who worked harder than anyone I'd ever known. Yet, by summer's end, we had lost more clients than we had added. It didn’t make sense. How could such a large team, working such long hours, get so little done?
The answer was that management had zero visibility into the team's, ahem, work. The CEO was non-technical, so he didn't have a fighting chance. The CTO was disliked, so the developers ignored his directives while looking busy. They worked on pet projects, kept up with instant messenger, and, under sufficient duress, tackled an urgent Jira issue or two. Since all the salaried developers had offices, they were both literally and figuratively insulated from concern about having their process observed. When I began preparing to interview for my next job, I logged into our issue tracker and counted how many tickets each developer had resolved that summer. I discovered that the hourly intern (me) had closed more tickets than any one of the salaried developers. Naivete manifest.
While this company was an extreme example, it foreshadowed themes that remain ubiquitous across the software industry today. Even inside companies led by seasoned tech experts, managers struggle to gain visibility into their team's work. Instead of directly measuring developer output and using that data to improve team productivity, each manager devises their own set of intuitions to serve as their personal "North Star." Upper management then clamors to hire and retain technical managers whose past intuitions led to success.
If this arrangement sounds tenuous, it's not for lack of trying to find something better. In a top Google result for "measure developer productivity," Dustin Barnes neatly summarizes the industry's past efforts to outperform intuition, paraphrased here:
If there is a holy grail (or white whale) of the technology industry, especially from a management standpoint, it's the measurement of developer productivity. Measuring and managing developer productivity, however, has consistently eluded us ... In business school, it's still taught that "you can't plan if you can't measure," ... [but] as we've shown above, there still doesn't exist a reliable, objective metric of developer productivity. I posit that this problem is unsolved, and will likely remain unsolved.
Dustin's attitude, aka the prevailing outlook of 2015, hammers home how far developer measurement has come in just four years. As of early 2019, there are now three credible options for directly measuring team and developer performance. Each of the options approaches measurement in its own data-driven way. Each option has its adherents, and each provides documentation to help you evaluate its alignment with your own philosophy.
A disclaimer about this guide before moving forward: it's comprehensive, which makes it long. Feel free to jump forward if there's a topic of particular interest to you. Sections are designed to be self-contained:
- Is measuring developer productivity really necessary? Reviews the emerging data on how developer measurement impacts results.
- Better data, better policies. Examples of how measurement can improve retention, morale, and amount of work completed.
- Best tools for measuring developers in 2019. Three tools available to measure developer performance, and how they differ.
- What makes code measurement possible? What changed that made it possible to measure developer output, when it was long assumed impossible?
- On securing developer buy-in. Addresses the first question asked by many companies pursuing developer measurement: how do I get the team on board?
- Transparency always wins on a long enough time horizon. What can be learned from the past about those who embrace and resist increased transparency?
This guide will be updated continually to reflect new advancements in developer measurement. If you'd like to talk about this article, drop by our corresponding blog post.
Is measuring developer productivity really necessary?
Engineering Managers don't have minutes to waste on anything non-essential (related: why the best developers tend to utilize a small set of tools). But, depending on the week, they might not even have time to perform all of their job's essential activities. If managers this busy are going to embrace a big, new idea like measuring developer performance, there must be evidence that its benefits justify the cost -- both financial and temporal.
GitPrime, the earliest entrant to today's developer productivity space, has done an excellent job of following up with their customers and documenting the impact that performance metrics can have on results. Their case studies include a 137% increase in Impact at Storyblocks, a 40% decrease in bugs introduced at Cloudhealth, and a 25% increase in measured Impact at Adext.
Code Climate, the most recent company to join the space, has documented an 83% increase in productivity while dogfooding their Velocity product. They ascribe their success to ditching their flat hierarchy and reducing their time spent in meetings. This anecdote from their story was especially compelling:
[Our manager] first started noticing that disengagement during daily standups. Some people were zoning out; others were on their phones. Almost no discussion was happening. It was 20 minutes of engineers taking turns speaking, one by one. When she brought this up in her 1:1s that week, she found that the team was in unanimous agreement: meetings were inefficient and taking up too much time.
This is consistent with our findings at Static Object. The "standup meeting" culture that permeates most Agile development teams has been assumed (as opposed to measured) to provide benefits that outweigh costs. We recommend teams measure the productivity impact of any regularly scheduled meetings to ensure that they're comfortable with the trade-off being made.
When it comes to standup meetings, it turns out the cost can be roughly expressed in terms of time spent:
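$$\text{weekly cost per developer} = (15 + 10 + 10)\ \text{minutes} \times n_{\text{standups per week}}$$
Here 15 minutes is the standup meeting itself, the first 10 minutes is the time before the meeting when new tasks aren't started, and the second 10 minutes is the time after the meeting needed to restore flow state.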
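To make the formula concrete, here is a small Python sketch of the arithmetic. It assumes a daily standup (five meetings per week) and a 40-hour work week; the team size of 10 is a hypothetical example, not a figure from our data.

```python
MEETING_MINUTES = 15       # the standup itself
PRE_MEETING_MINUTES = 10   # before the meeting, when new tasks aren't started
POST_MEETING_MINUTES = 10  # after the meeting, restoring flow state


def weekly_standup_cost_minutes(meetings_per_week: int) -> int:
    """Minutes of productive time one developer loses to standups each week."""
    per_meeting = MEETING_MINUTES + PRE_MEETING_MINUTES + POST_MEETING_MINUTES
    return per_meeting * meetings_per_week


def share_of_work_week(minutes_lost: int, work_week_hours: int = 40) -> float:
    """Fraction of a work week (40 hours by default) that the lost minutes represent."""
    return minutes_lost / (work_week_hours * 60)


if __name__ == "__main__":
    lost = weekly_standup_cost_minutes(meetings_per_week=5)  # assumes daily standups
    print(f"{lost} minutes/week per developer")  # 175 minutes/week per developer
    print(f"{share_of_work_week(lost):.1%} of a 40-hour week")
    team_size = 10  # hypothetical team size
    print(f"{lost * team_size / 60:.1f} hours/week across a team of {team_size}")
```

With daily standups, each developer gives up 175 minutes per week under these assumptions, a little over 7% of a 40-hour week.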
The deeper one digs into the case studies emerging at companies that measure their development throughput, the more the evidence suggests that a 5-15% improvement to developer output is about the norm once measurement begins.
Even if your business case yielded only a 5% increase to developer throughput, how many other avenues do you have available to multiply the efficacy of your engineering budget by 1.05? What does that math look like?
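One way to run that math: treat the engineering budget as headcount times annual cost per developer, so multiplying output by 1.05 is worth 5% of the budget in extra output. The sketch below uses hypothetical inputs (50 developers at a $150,000 fully loaded cost each); substitute your own numbers.

```python
def throughput_gain_value(developers: int, annual_cost_per_dev: int, gain_percent: int) -> int:
    """Dollar value of the extra output from a gain_percent throughput increase.

    Treats the engineering budget as headcount * annual cost per developer,
    so multiplying output by (1 + gain_percent / 100) is worth that share of it.
    """
    budget = developers * annual_cost_per_dev
    return budget * gain_percent // 100  # integer math avoids float rounding


if __name__ == "__main__":
    # Hypothetical inputs: 50 developers at $150,000 fully loaded cost each.
    for pct in (5, 10, 15):
        value = throughput_gain_value(50, 150_000, pct)
        print(f"{pct}% gain is worth about ${value:,} per year")
```

Under those assumptions, even the low end of the 5-15% range is worth about $375,000 of engineering output per year.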
Better data, better policies
Data opens the door to improve the throughput of specific committers, but it gets even more interesting when we zoom out to apply it at the team level.
At the individual level, there are many paths by which measurement accelerates output. The most basic benefit is a "heads up" for the manager when a developer's velocity is more than two standard deviations below their average. The less formal way to describe such situations is that "the developer is stuck." Knowing when to intervene to help a stuck developer can save hours of time and frustration -- especially for junior developers, who are prone to suffer in silence. GitPrime, Static Object, and Velocity identify stuck developers (in varying ways) as part of their base package.
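The "two standard deviations below their average" rule can be sketched in a few lines of Python. This illustrates the statistical idea only; it is not how GitPrime, Static Object, or Velocity implement their alerts. The weekly output scores are hypothetical and could stand in for commits, Line Impact, or any other per-week metric.

```python
from __future__ import annotations

from statistics import mean, stdev


def is_stuck(weekly_history: list[float], this_week: float, threshold_sd: float = 2.0) -> bool:
    """True if this week's output is more than threshold_sd standard deviations
    below the developer's own historical average.

    weekly_history holds hypothetical past weekly scores for one developer.
    """
    if len(weekly_history) < 2:
        return False  # stdev() needs at least two data points
    cutoff = mean(weekly_history) - threshold_sd * stdev(weekly_history)
    return this_week < cutoff


if __name__ == "__main__":
    history = [100, 110, 95, 105, 90]  # mean 100, sample stdev ~7.9
    print(is_stuck(history, 60))  # True: well below the ~84 cutoff
    print(is_stuck(history, 90))  # False: within normal variation
```

Comparing each developer against their own history, rather than against peers, keeps naturally slower or faster working styles from triggering false alarms.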
A more advanced tactic to drive up individual efficiency is to match developers with tickets that play to their strengths. On Static Object, every developer's efficiency is shown relative to their peers, across code categories.
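Comparing developers relative to their peers matters because some code categories are intrinsically faster to work in than others. The hypothetical sketch below divides each developer's score in a category by the team average for that category, then picks the category where a developer stands furthest above their peers. The category names and scores are made up, and this is not Static Object's actual calculation.

```python
from __future__ import annotations

from typing import Optional


def relative_efficiency(scores: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Divide each developer's score in a category by the team average for that category.

    scores[dev][category] is a hypothetical efficiency number; a result of 1.5
    means 50% above the team average in that category.
    """
    categories = sorted({cat for per_dev in scores.values() for cat in per_dev})
    result: dict[str, dict[str, float]] = {dev: {} for dev in scores}
    for cat in categories:
        values = [per_dev[cat] for per_dev in scores.values() if cat in per_dev]
        avg = sum(values) / len(values)
        if avg == 0:
            continue  # no meaningful baseline for this category
        for dev, per_dev in scores.items():
            if cat in per_dev:
                result[dev][cat] = per_dev[cat] / avg
    return result


def strongest_category(scores: dict[str, dict[str, float]], dev: str) -> Optional[str]:
    """Category where `dev` stands furthest above their peers, or None if unknown."""
    rel = relative_efficiency(scores).get(dev, {})
    return max(rel, key=rel.__getitem__) if rel else None


if __name__ == "__main__":
    team = {
        "ana": {"frontend": 12.0, "backend": 4.0},
        "ben": {"frontend": 8.0, "backend": 6.0},
        "cal": {"frontend": 10.0, "backend": 2.0},
    }
    # Ben's raw frontend score is higher, but relative to peers he stands out in backend.
    print(strongest_category(team, "ben"))  # backend
    print(strongest_category(team, "ana"))  # frontend
```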
At the team level, measurement's greatest contribution lies in allowing relentless experimentation. Step one is to establish the baseline Line Impact during a typical week. This tends to be consistent to within about 20% for most Static Object customers. Step two is to try an experiment for a couple weeks and measure the change in Line Impact. It's like an A/B test for your business processes.
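The two-step process can be expressed as a simple before/after comparison. This sketch assumes you have weekly output totals for the baseline and experiment periods, and it treats the roughly 20% week-to-week consistency mentioned above as a noise band: changes smaller than that are hard to distinguish from ordinary variation. Unlike a true A/B test, there is no concurrent control group, so holidays and seasonal effects can still confound the result.

```python
from __future__ import annotations

from statistics import mean


def experiment_change(baseline_weeks: list[float], experiment_weeks: list[float]) -> float:
    """Percent change in mean weekly output during the experiment vs. the baseline."""
    base = mean(baseline_weeks)
    if base == 0:
        raise ValueError("baseline output must be non-zero")
    return (mean(experiment_weeks) - base) / base * 100


def exceeds_noise(change_percent: float, noise_band_percent: float = 20.0) -> bool:
    """True if the change is larger than ordinary week-to-week variation.

    The 20% default mirrors the typical consistency figure quoted in the text;
    replace it with your own team's measured variation.
    """
    return abs(change_percent) > noise_band_percent


if __name__ == "__main__":
    baseline = [1000, 1100, 900, 1000]  # hypothetical weekly Line Impact totals
    trial = [1300, 1200]                # two weeks under the new policy
    change = experiment_change(baseline, trial)
    print(f"{change:+.1f}%", exceeds_noise(change))  # +25.0% True
```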
One of the first experiments we ran upon gaining access to developer measurement was to analyze the impact of working from home. Here are the results of our analysis, along with the corresponding graph of our per-hour output over the past year.
At face value, it appears that working from home on Wednesday has zero impact on the team's output. But the real story is more nuanced. It's typical for our team to schedule all chores, errands, dental visits, etc., on Wednesdays to reduce the hassle of getting to their appointments. You can see this reflected in the Wednesday morning cells of the above graph, which are less active (= darker) than Tuesday or Thursday morning. How do they make up the lost time? Check out Wednesday evening -- the only day of the week with significant activity after 5pm. Our developers are rewarding our trust by making up time after hours. It's a win/win that gives us the confidence to continue a cycle of trust. It would also let us spot a developer bent on abusing that trust (since any stat can be scoped to the individual level).
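A day-by-hour activity graph like the one described here can be built by bucketing timestamps into (weekday, hour) cells. This hypothetical sketch counts commits per cell as a stand-in for per-hour output (a real tool might weight each commit by Line Impact instead) and computes what share of a given day's activity happened at or after 5pm.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def activity_heatmap(commit_times: list[datetime]) -> Counter[tuple[str, int]]:
    """Count commits in each (weekday, hour) cell of a week-by-hour grid."""
    return Counter((WEEKDAYS[ts.weekday()], ts.hour) for ts in commit_times)


def after_hours_share(cells: Counter[tuple[str, int]], day: str, start_hour: int = 17) -> float:
    """Fraction of a weekday's activity that happened at or after start_hour (5pm default)."""
    day_total = sum(n for (d, _), n in cells.items() if d == day)
    late = sum(n for (d, h), n in cells.items() if d == day and h >= start_hour)
    return late / day_total if day_total else 0.0


if __name__ == "__main__":
    commits = [  # hypothetical timestamps; Feb 5 and 6, 2019 were a Tuesday and a Wednesday
        datetime(2019, 2, 5, 10),
        datetime(2019, 2, 6, 9),
        datetime(2019, 2, 6, 20),
        datetime(2019, 2, 6, 21),
    ]
    grid = activity_heatmap(commits)
    print(after_hours_share(grid, "Wednesday"))  # 0.6666666666666666
    print(after_hours_share(grid, "Tuesday"))    # 0.0
```

Timestamps should be converted to the team's local time zone before bucketing; otherwise "after 5pm" will drift for distributed teams.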
While working from home yields a neutral result on productivity, it has positive implications for morale, so this experiment is now ensconced as company policy. Being able to run interesting experiments like this is a cornerstone of the well-measured company.
Best tools for measuring developers in 2019
We've reviewed specific scenarios where measuring developer performance boosts results and creates opportunities. Once you've determined how your team can benefit from data, the next step is to consider which flavor of measurement tool best matches your needs.
If this article had been written five years ago, your options would have ranged from "shabby" to "pitiful." GitHub Stats already existed by that point, but its focus on raw lines of code and commits made it exactly as useless as you would predict. In these past four years, the performance measurement space has blossomed to host three "high polish" options worthy of your consideration. Each brings a distinct set of strengths and its own measurement philosophy. Let's review them in chronological order of their launch date.
The first company to plant their flag in the performance measurement space was GitPrime (homepage, pricing), debuting in 2015. They quickly gathered acclaim for their "Engineering Impact" newsletter, created by co-founder Ben Thompson. It compiles the best articles from around the web on topics that matter to engineering leaders. The newsletter engendered goodwill among the developer community, and GitPrime backed up that goodwill by launching a quality product that supported every major git platform. Work on GitPrime is framed by four key metrics: "active days," "commits per day," "impact," and "efficiency." The company differentiates through the sheer number of reports it offers, and it is constantly developing new ways to slice your git data.
According to Crunchbase, GitPrime has raised about $12m to date. You can feel what that capital provides when visiting pages like their blog and homepage -- replete with swooping text and customized scroll logic. GitPrime's focus during 2018 was a push into the "collaboration" space, by working to illuminate all aspects of the Pull Request process. The emphasis on collaboration positions them to compete head-to-head with Code Climate. GitPrime is currently the only product among the three to offer an on-premises version (though Static Object projects to offer this feature by middle-to-late 2019).
A 50-developer team on the fully featured "Velocity" plan will pay $2,399 per month when billed annually.
In 2017, Static Object (homepage, pricing) debuted with a single-minded focus on being the best solution for technical managers and developers who review code. To best serve the needs of its technical users, Static Object is the only company to offer robust code review tools, including a diff viewer that groups similar commits together. The code review tools save time by hiding churned lines of code, labeling precise operations (e.g., move, update, find-and-replace), and collecting commit activity across repos/branches. Each commit or commit group is then labeled by the size of its Line Impact, and presented for consumption.
Static Object also differentiates by allowing technical managers to use their domain knowledge to dive inside the code crunching engine and modify the way Line Impact gets calculated. This focus on configurability allows Line Impact to correspond precisely to the customer's empirical intuition. This in turn helps to drive trust from the developers themselves (more on that here). In another nod to its developer-centric ethos, Static Object is the only product to allow users to sign up for a trial without a sales demo. When Static Object launched, it was the most expensive code metrics app on the market (priced above GitPrime at the time), but its competitors' prices have since left it in the dust.
A 50-developer team on the fully featured "Pro" plan will pay $1,395 per month when billed annually.
The most recent entrant in the performance metrics space is Code Climate, which launched Velocity (homepage, pricing) in 2018. While the product is still in its early stages (as of February 2019, a banner is affixed to their homepage promising "v2 coming soon"), it's now the focus of the Code Climate homepage, supplanting their long-popular code quality tools. This placement suggests that Code Climate expects Velocity to be the primary focus of their company moving forward.
Velocity recently published a robust comparison between their features and GitPrime's. If you're considering Velocity, I recommend checking out this page. It illustrates how Velocity shares with GitPrime a common thread of ideology, features, and even some design elements. For example, both offer a "commit activity log" that uses shapes of various colors and sizes to represent commit activity.
In its comparison with GitPrime, the article points to "Surfacing issues" as the area where the products diverge most. In their words,
This is the category in which the two analytics tools differ most. Velocity, with PR-related metrics at the core of the product, does a better job drawing attention (inside and outside of the app) to actual artifacts of work that could be stuck or problematic. GitPrime, with mostly people-focused metrics, draws attention to contributors who could be stuck or problematic.
The article concludes that, relative to GitPrime, "Velocity has put collaboration at the center of its product, so you’ll have access to more insights into the code review process and into PR-related details right out the gate."
A 50-developer team will pay $2,199 per month when billed annually (whereas a 51-developer team will pay $3,299/month 🤔)
The final product trying to make waves in the space is waydev.co (homepage, pricing). Under the heading of "Why we built Waydev," the company states, "as a technology company working with developers, it is almost impossible to understand what your team is doing, and your intuition doesn't help. But Waydev can." Details are light on how the product differs from GitPrime, and their product screenshots look like they could have been taken out of the GitPrime help docs. Should they go on to publish more pages describing how their code processing works, their space in this guide may grow. At present, their only point of differentiation seems to amount to price, which at $25/user becomes $1,250/month for a team of 50.
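Normalizing the prices quoted in this guide to a per-developer monthly cost makes them easier to compare, and shows the size of Velocity's jump at 51 developers. The figures are the annual-billing list prices cited above and may have changed since.

```python
# Monthly list prices (billed annually) quoted in this guide for a 50-developer team.
MONTHLY_PRICE_50_DEVS = {
    "GitPrime": 2399,
    "Static Object": 1395,
    "Velocity": 2199,
    "Waydev": 1250,  # $25/user x 50 users
}


def per_seat(monthly_price: int, developers: int) -> float:
    """Monthly cost per developer, rounded to the cent."""
    return round(monthly_price / developers, 2)


if __name__ == "__main__":
    for product, price in sorted(MONTHLY_PRICE_50_DEVS.items(), key=lambda kv: kv[1]):
        print(f"{product:<14} ${price:,}/month  ${per_seat(price, 50):.2f} per developer")
    # Velocity's quoted price for 51 developers shows its tier boundary.
    print(f"Velocity at 51 developers: ${per_seat(3299, 51):.2f} per developer")
```

At these list prices, Velocity's per-developer cost rises from $43.98 to $64.69 when a team grows from 50 to 51 developers.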
What makes code measurement possible?
Skeptics of code measurement have years of history, not to mention primitive tools like GitHub Stats, that they can cite to substantiate their distrust in "lines of code" as a measurement tool. Historically, their skepticism has been well-founded. This companion article touches on some strategies used by Static Object and GitPrime to bend lines of code into useful data. Those who would like to delve even deeper into how it's possible to separate "meaningful lines of code" from noise can check out Static Object's full-length feature on counting lines of code.
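As a rough illustration of separating signal from noise, the sketch below diffs two versions of a file with Python's standard-library difflib.SequenceMatcher, ignores blank lines, and treats a line that was deleted in one place and re-added elsewhere (or merely re-indented) as moved rather than new. This is a simplified, hypothetical heuristic, not the algorithm Static Object or GitPrime actually uses.

```python
from __future__ import annotations

from collections import Counter
from difflib import SequenceMatcher


def classify_changes(old: str, new: str) -> dict[str, int]:
    """Split a file diff into raw, moved/re-indented, and 'meaningful' line counts.

    Hypothetical heuristic: blank lines are ignored, and a line whose stripped
    text was both removed and added counts as moved rather than as new work.
    """
    old_lines, new_lines = old.splitlines(), new.splitlines()
    added: list[str] = []
    removed: list[str] = []
    matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old_lines[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new_lines[j1:j2])

    # Normalize whitespace so re-indented lines match, and drop blank lines.
    added_text = Counter(line.strip() for line in added if line.strip())
    removed_text = Counter(line.strip() for line in removed if line.strip())
    moved = added_text & removed_text  # multiset intersection: same text on both sides

    return {
        "raw_added": len(added),
        "raw_removed": len(removed),
        "moved_or_reindented": sum(moved.values()),
        "meaningful_added": sum((added_text - moved).values()),
        "meaningful_removed": sum((removed_text - moved).values()),
    }


if __name__ == "__main__":
    before = "a = 1\nb = 2\nc = 3\n"
    after = "c = 3\na = 1\nb = 20\n\n"  # one line moved, one edited, one blank added
    print(classify_changes(before, after))
```

A raw line counter would report three added and two removed lines for this example; the heuristic reduces that to one meaningful edit plus one moved line.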
On securing developer buy-in
One of the inevitable questions asked by new clients as they begin down the road toward measuring developer output: how do I explain this to my team? It is a fundamental question to address, especially if the manager intends to use code metrics as a component in performance reviews.
To earn developer buy-in, bring them along as you adopt measurement. Image credit: pexels.com
There's no single answer that is going to satisfy every developer personality. Programmers skew toward high intellect, which often goes hand-in-hand with skepticism toward authority and broad mandates issued by management. The general approach I recommend to get your developers to embrace measurement: be transparent and just. Put yourself in their shoes and consider how you'd feel if a newly introduced tool put you in the bottom half of performers. If it's fair, and fairly applied, you'd come to terms with it. But that probably wouldn't be your first instinct.
Below are three ideas we've seen used to help developers grow comfortable with performance measurement. These focus on strategies available to Static Object users, since those are the ones with which we have the most familiarity. Here's a blog post describing how a GitPrime manager gets buy-in from his small team.
Let developers see how specific commit activity corresponds to measured output. This is where code review tools can do double duty. They were created because looking through commits on GitHub one. at. a. time. is an incredibly inefficient use of a developer's time and we knew we could do better. But when it comes to securing developer buy-in, the code review tools serve a second purpose: allowing developers to get a tangible sense for how their work becomes Line Impact on a file-by-file, commit-by-commit basis. The longer a developer uses Static Object, the less mysterious it feels, as they develop an intrinsic sense for how measurement occurs.
Get empirical with your leaderboard. If you're a CTO or Engineering Manager using Static Object, you can configure Line Impact so that it matches your empirical sense of your team's top performers. Most managers begin with at least a vague sense for who's making the greatest contribution to their team. The better a new measurement aligns with the team's existing beliefs about its top performers, the more confidence the team will have in the measurement.
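One way to quantify how well a metric matches your empirical sense of the team is a rank correlation between your own ordering of developers and the metric's ordering. The sketch below computes Spearman's rank correlation (assuming no ties); it is a hypothetical check you could run after each configuration change, not a built-in Static Object feature.

```python
from __future__ import annotations


def rank_agreement(manager_ranking: list[str], metric_ranking: list[str]) -> float:
    """Spearman rank correlation between two orderings of the same developers.

    1.0 means the metric orders the team exactly as the manager does; -1.0 means
    it reverses the order. Assumes each name appears exactly once per list (no ties).
    """
    n = len(manager_ranking)
    if n < 2:
        raise ValueError("need at least two developers")
    if len(set(manager_ranking)) != n or sorted(manager_ranking) != sorted(metric_ranking):
        raise ValueError("rankings must list the same developers, each exactly once")
    metric_pos = {dev: pos for pos, dev in enumerate(metric_ranking)}
    d_squared = sum((pos - metric_pos[dev]) ** 2 for pos, dev in enumerate(manager_ranking))
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    manager = ["ana", "ben", "cal", "dee"]  # hypothetical: manager's sense of top performers
    metric = ["ana", "cal", "ben", "dee"]   # ordering produced by the configured metric
    print(rank_agreement(manager, metric))  # about 0.8: close, with one swapped pair
```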
Make clear that no single tool or approach tells the entire story. This is the last and most important factor in earning buy-in. One of the most common concerns among new users goes something like: "the developers that write the most lines of code aren't the most desirable -- it's closer to the opposite. The most valuable developers are those who don't create bugs, mentor others, pay down tech debt, trim crufty methods, are easy to get along with, etc." To this we respond: yes! Absolutely. And that's exactly why an algorithm will never replace the value of a great engineering manager. Looking at a metric like Line Impact provides one valuable piece of the puzzle when it comes to evaluating a developer's total package. But there are many pieces that software could never begin to approximate -- like whether a prolific developer's work is on the tickets they were assigned. It's essential that, when you introduce a measurement tool, you make clear that code measurement is just one aspect among many that inform your evaluation.
Transparency always wins on a long enough time horizon
The idea that we can measure development is new, so it's natural for it to face early skepticism. Consider how big a departure this is from tools a manager has used in the past. It carries the baggage of all the ineffective measurement systems of yesteryear. It begets winners and losers, which requires a strong manager to message effectively. Even with effective messaging, the possibility of early, instinctual pushback from developers is real. It seems expensive.
For some, it will take a leap of faith to believe that improved transparency justifies the cost. This final section glances back through similar inventions of the past to extrapolate whether trust is warranted. Over the last 20 years, inventions that improve transparency have followed a pattern. In the first phase, the invention is ignored: it lacks sufficient data to inform its projections. In the next phase, it begins to gain traction. Oftentimes in this phase, the invention dramatically improves life for a tiny number of users who love it. The third phase is pushback, which begins tepidly and grows fireball-hot as parties who benefited from past opaqueness begin to suffer from improved information. The final phase is broad public awareness. With this comes general acknowledgement that, while the system will never be perfect, its users could never go back to the way things were.
A small sampling of companies that have followed this arc, along with their detractors:
- Consumer Reports. Hated by: Companies with substandard products.
- Tripadvisor. Hated by: Motel 6.
- Zillow / Zestimate. Hated by: Realtors and their lobbyists.
- Metacritic. Hated by: Adam Sandler.
- Airbnb. Hated by: Marriott, Hyatt, and especially Holiday Inn.
- Yelp. Hated by: Every restaurant, especially Applebee's.
- Redfin. Hated by: Realtors and their lobbyists.
- Glassdoor. Hated by: Retail stores.
Think about the first time you encountered Zillow. If it was early enough in their history, you probably weren't impressed. It takes time and a lot of iterations to gather enough data to make an algorithm like Zestimate consistent and reliable. Estimating the price of a home requires factoring in more nebulous, real-world variables than any code measurement algorithm. But before long, the early adopters discover the tool, and if it affords them an advantage over total opaqueness, they use it. Once enough early adopters tell their friends, the parties that had benefited from lack of information begin to fight back. The Wikipedia page for Consumer Reports dedicates an entire section to the lawsuits that have been pursued against it. They're proud to have won all 13.
In the final phase -- that of broad public awareness -- the world becomes a little clearer and more predictable. If you live in a major city, the quality of the food you eat and the service you receive is better than it has been at any point in history. Ask a business owner about Yelp and you'll hear a litany of complaints about the unjust reviews they've received. Most business owners prioritize local effects over global effects, and that's fine. What matters is not perfection -- just that the biases of the measurement are minimized. In spite of their imperfect methods, most Americans would never consider going back to relying on intuition instead of Yelp.
As awareness around developer measurement grows, the early adopters gain an advantage over their competitors by running their teams 5-15% more efficiently. The difference sounds small-ish, until you add in the benefits to morale from the experiments you can run on more liberal, employee-friendly policies. Code transparency is still approaching phase two of its adoption pattern. But it won't take long until smarter companies recognize they can gain an information advantage in engineering. As the cycle picks up steam, the tools will keep growing more polished, until it's hard to remember a world when managers had to guess what their developers were working on. Multiplying the yield of your engineering budget by even a paltry 5% adds up in a hurry given current developer salaries.
Thanks for making it down to full scroll bar territory! I hope you better understand how developer measurement has changed over the past few years. If you manage more than 5 developers, you can get ahead of the curve.
- GitPrime. "Ship faster because you know more. Not because your team’s rushing." $2,399 per month when billed annually.
- Static Object. "Harness the full potential of your engineering team. Drive measurably higher output from your existing git data." $1,395 per month when billed annually.
- Velocity by Code Climate. "Actionable metrics for engineering leaders. Velocity turns data into insights you can lead with." $2,199 per month when billed annually (a 51-developer team will pay $3,299/month).