Can developer productivity be measured? (stackoverflow.blog)
Edit: if you're talking about the math, I think "changed" includes added and deleted. So, it's the ratio of added and deleted to the total change.
The real underperformers go negative.
Measuring closed tickets is an excellent metric if the tasks are written well and assigned based on business priority. When more tickets get closed, more good things are happening with the project, be that bugs getting closed off or features made.
Also ticket != value. Lots of tickets for things that involve almost no work and things that actually make a difference to the customer/product are not equal.
Everything I work on is new products/projects, and tickets come in all sizes and shapes. They often change daily as some exec crams in more new ideas or some designer or product person "clarifies" the ticket, even after the work is done. Tickets are often written and estimated long before decisions are actually made. Defects get written that require a lot of investigation, only to discover that it's some other team's problem and you can't do anything, or that it was a temporary service outage no one communicated, a misconfiguration in some CMS, or a plain and simple misunderstanding of what the product does.
Measuring productivity by tickets closed is a whole pile of dead snakes.
It pertains to perverse incentives.
It's not really a helpful observation, but I'm curious if there's a way for the relationship between "people asking for things" and "people building things" to be repeatably fruitful (IMO it's very possible that reliably producing customer value is either insanely hard and/or not doable consistently).
Counting closed tickets is indeed a measure for something, but by itself it's far from being a good indicator.
(Although in this specific case, your colleague who has been working on one ticket for two or three weeks is operating in a way that I find is usually pretty harmful for productivity overall. "Solving a critical bug" is almost universally something that can be broken down further.)
At best you’re measuring _activity_, not productivity. You just turned a group of smart people into headless chickens jumping on whatever ticket they can find so they can look busy. That cultivates an environment of fear, which in turn kills deep thought and creativity... two essential ingredients for good software.
I could even argue that ticketing systems are the bane of good software, making real priorities opaque... but that’s a rabbit hole I won’t go into here.
Instead I’d argue we shouldn’t be trying to measure developer productivity at all.
Productivity in software development is non-linear and difficult to assign individually.
How do you measure the productivity of that “lazy guy” that had an amazing shower thought one morning, implemented it by lunchtime, which in turn leads to the company making millions more by the end of the year?
Or what about the person on the team that spends most of their time supporting the rest of the team, unblocking them and helping them be productive?
Two examples of why we shouldn’t even be trying to measure developer productivity.
My own experience after 25 years in this industry is that the moment someone says “but how do we measure developer productivity?” is the moment that company's software products begin a long, slow death.
Ultimately what development teams and companies (not individuals) should be measured on is _results_ that positively impact customers and business.
When the product is a success, no one cares about individual productivity.
Don't forget the weight.
I've seen single tickets taking weeks for bug investigation.
Making good tickets across a dozen organizations and hundreds of people is hard to ever get right, which is why counting tickets is sort of pointless. You might have a ticket to add a single value to a database, and without it the whole product doesn't work, but you have no idea, since there are 10 layers between you and the real customer.
That seems kinda arbitrary to just manipulate the issue into tiny pieces to fit some sort of metrics system... but not reflective of the actual work.
That seems to just lead to the typical gamification that comes with counting tickets and other metrics systems that end up being arbitrary or even easily manipulated.
Ticket measuring just seems like asking for Goodhart’s Law.
There's only one way to call this: ticket system abuse.
Does it literally have to be individual JIRA tickets? No way, but going off for 2-3 weeks doesn't give the business the insight it needs in order to wisely invest time/effort into the work being executed.
[1] https://medium.com/machine-words/a-guy-in-a-room-bbbe058645e... (I thought Joel Spolsky said this but I can't actually find the original source, if anyone has it I'd appreciate it!)
Assuming that it is important to fix the bug and that the developer is competent and trusted - why not? What would communicating progress improve here?
Mind that you cannot communicate when it is done (otherwise it would not be a hard bug) you can only communicate what you have done so far and what you try next. But what kind of business value does that create?
Not that they couldn’t make the call if they had all the info, but the time required to gather and understand all the context would be a second full-time job.
Sorry, that English doesn't make sense to me. The value is that... the developer doesn't have context? How is not having context a value?
Maybe you meant "the reason"? But then that doesn't answer the question what the value is.
Are you saying that without communicating progress there is no person who understands the context in which the issue is being worked in?
That doesn't make sense to me and seems to be totally orthogonal to any communication of progress.
I saw this perhaps most acutely with a company I sold - for a couple of years I was more productive than on nearly any other large software project I've worked on, because I knew the ins and outs of everything. The developers who bought it and took over are probably better developers than I am, and they are unquestionably excellent coders, yet it took a couple of years for them to get productive at making even medium sized changes. It became incredibly obvious to me how handicapped you are diving into something someone else made, especially if the original designer isn't there anymore.
Meshing really well with managers & PMs is probably the next biggest factor in my own experience, but it doesn't come even close to the gap between being there from day 1 vs coming in much later.
> Productivity tracking tools and incentive programs will never have as great an impact as a positive culture in the workplace.
I'm a fan of choosing to use time management apps and productivity tools to manage my own budgets. But I admit that I hate it when I have to do it for someone else.
The natural trajectory for a project is to keep adding features until it collapses from its own weight. Only the long tenure developer can fight this and revitalize a project by removing the useless excess.
Feature work is such a smaller part of my individual contributions at this point - I do some here and there so I don't get too out of touch with the front end and user experience - but much of my coding work these days is reworking existing core functionality.
Thankfully we understand the necessity of deep maintenance for our system that we fully expect to still be running in 10 more years, but even with that it's damned hard to keep up. I can't imagine having developers come and go every couple/few years and little or no leadership support for code and systems improvement.
Building software is a knowledge business, and there are three types of knowledge involved:
1. Subject knowledge: understanding of the subject the software is about (e.g. accounting when building accountancy software).
2. Platform knowledge: understanding of the platform used to build the software (e.g. Python, SQL, React etc).
3. Architecture knowledge, which is what the parent is talking about: understanding of the specific choices made in the development, being aware of all the Chesterton's Fences [1] etc.
[0] https://en.wikipedia.org/wiki/Brooks%27s_law
[1] https://en.wikipedia.org/wiki/G._K._Chesterton#Chesterton's_...
I’ve also watched Brooks’ Law in action, watched people thrown onto a project to try to get it out the door cause it to slow down. I do not believe what I said is an example of Brooks’ Law. Brooks’ Law is “adding manpower to a late software project makes it later", it was not an observation about all software development in all stages. Brooks’ Law assumes that the “ramp up” time is finite, and not particularly long.
I’m also not actually saying that people aren’t productive when joining existing projects, I’m saying that there’s a much deeper component of productivity that depends on involvement from the beginning. I’m sure you’ve already experienced being very “productive” after a short time when joining a large project, I definitely have. It’s the kind of productivity that depends on being there early, and Brooks wasn’t making any qualitative statement about productivity, only quantitative.
Fred Brooks was a lovely man to speak to, BTW. I met him and had a long interview when I was considering going to North Carolina for graduate school. The lasting impression I got wasn’t anything he said in particular, it was more of the positivity and optimism he carried about software and life that made me want to hang around and hear what he had to say.
I think that complexity is hard to measure and therefore easy to game.
At GitLab we only measure tasks completed, the number of changes that shipped to production, with the requirement that every change has to add value. This measure has been used throughout R&D https://about.gitlab.com/handbook/engineering/performance-in... to assess productivity for multiple years now with good success https://about.gitlab.com/blog/2020/08/27/measuring-engineeri...
When you tell new engineers about this target, they see a great opportunity to game it: just ship smaller changes. It turns out that smaller changes are quicker to ship, lead to better code and tests, have a lower risk of cancellation and of problems in production, and lead to earlier and better feedback.
Inspired by Goodhart’s Law I'll propose the following: A measure that when it becomes a target improves productivity. ~Sijbrandij's Law
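For anyone who wants to picture the kind of count being described, here is a minimal sketch of a group-level "changes shipped per engineer" figure. It is not GitLab's actual tooling: the function name, the `(team, month)` record shape, and the sample data are all made up for illustration, and team size is assumed constant.

```python
from collections import defaultdict


def changes_per_engineer(shipped_changes, team_sizes):
    """Return {(team, month): shipped changes per engineer}.

    shipped_changes: iterable of (team, month) pairs, one per change that
        reached production (hypothetical record shape).
    team_sizes: {team: number of engineers}, assumed constant per team.
    """
    counts = defaultdict(int)
    for team, month in shipped_changes:
        counts[(team, month)] += 1
    # Divide each group's count by its head count so the figure is
    # reported per team, never per individual.
    return {key: count / team_sizes[key[0]] for key, count in counts.items()}


# Made-up data, for illustration only.
shipped = [
    ("platform", "2020-07"),
    ("platform", "2020-07"),
    ("platform", "2020-07"),
    ("platform", "2020-08"),
    ("growth", "2020-07"),
    ("growth", "2020-07"),
]
sizes = {"platform": 2, "growth": 4}
rates = changes_per_engineer(shipped, sizes)
```

Note that the sketch says nothing about whether each change "adds value"; that requirement has to be enforced by review, not by the counter.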
Sarah and Bob make clocks, but sometimes they make hats, and sometimes they make screws, or hammers, or lamps. And sometimes the things they make get sold to customers, but sometimes other employees take them home, sometimes they make parts for each other to use when making bigger projects, and often they help each other and other employees out on unrelated projects. And sometimes they do repairs too. Oh yeah they also paint portraits that hang up around the office.
Try coming up with a measurement for their individual productivity that is easy enough to be useful, hard to game, and cheap enough to make it worth the price.
The first step is to figure out how to measure the value of all the stuff they make...
A good manager can get a reasonable subjective sense of individual productivity but won't be able to quantitatively measure it.
> Productivity: the effectiveness of productive effort, especially in industry, as measured in terms of the rate of output per unit of input.
We can all agree "lines of code" is a shit metric, and we can't say "# of bugs closed," because each will have variable difficulty and value. Programmers employed by a business are in charge of automating repetitive tasks, not performing them (the classic measure of productivity).
I perform UX research on APIs. Here, we standardize the "output unit" and therefore can get a better idea of a developer's productivity. Every developer performs the same task, so we can simply measure time spent.
There will never be an ethical solution to measure developer productivity during the workday; this isn't Ford's assembly line.
the only reason I beat him on 'bugs fixed' is that one of my jobs was to dig through the old issues and remove them if they weren't clear or relevant.
Thank You. I had not seen that one before.
Bit of a cold scenario, but one way to game it out is hypothetically removing the current dev and then hiring someone better at double the pay.
Or, less unfair seeming, double the pay by hiring a second dev. That might not double productivity ... depending on the situation it might 1.5x it, or just as easily 4x it.
Will we then organize with other workers to create better working conditions for everyone or will there be fewer and fewer developers working with ever more powerful technology chasing richer than ever VCs?
If a company is willing to sacrifice engineering talent and institutional knowledge for short term gains... Good luck staying in business.
Reference: Every outsourcing project I've seen.
I suggest you look to database models/schemas standardization for an indication of how close this is coming to fruition. I personally can't measure developer productivity at a fine-grained level until requirements are stabilized, and I personally cannot stabilize requirements unless the domain is so well known the data store is standardized. I had hoped SAP would lead the charge through empirically iterating towards standards, but they left out the huge small and mid-size business markets with what they use today. And what they use today is still far from industries' standards.
We're no closer to standardization than when I started in software decades ago. We don't even have standard means of storing, transforming, displaying and tracking metadata upon calendars, addresses, phone numbers, names, and lots of other ephemera I can rattle off, within a single stakeholder industry, not to speak of within the software industry in general. There have certainly been efforts to standardize like Silverston's, but they haven't caught traction.
I'd sure like to see that happen, because it would short-circuit a lot of the discussions I have with stakeholders down to only the site-specific requirements, where I really add business value. Instead, I have to derive the data model from intricate discussion of their requirements, since they themselves have not agreed upon the parts that are common across their respective industries, so I end up at the start of discussions with all sorts of little twisty pieces of a data model, all alike.
The problem, however, is that management is always being pushed to make objective measurements. For example, to fire someone, you have to first put him on an improvement plan with objective measurements. Otherwise, you're wide open to a lawsuit over discrimination, etc. You have to prove to a judge someone isn't performing, or that you gave raises based on performance.
Management also gets pushed into these attempts at objective measurements by attempts to optimize the numbers like what works great for a manufacturing process.
Why must the productivity of developers be measured, but not the productivity of managers?
Which lines up with nradov's comment that the smallest useful unit for measuring productivity is the typical self-contained development team. Attempting to determine if Bob on TeamA is more productive than Cindy on TeamB doesn't generally result in any actionable information. What matters (from a senior leadership perspective) is whether TeamA or TeamB can build the things that need to be built in a timeframe that's acceptable to stakeholders.
As a manager, if I feel like Bob or Cindy are unproductive, then I need to figure out why. And LoC or number of commits isn't going to tell me that. Possibly the number of defects found in QA, but even that isn't perfect.
Which might be impossible to measure.
Also consider, frustrating though I'm sure that was, he probably still got paid for his effort in the 80%.
- How many hours did a musician spend on his instrument before selling his first record?
- How many drawings/paintings before Picasso could sell something? (etc.)
Seems like a No True Scotsman fallacy to say only good technical leaders can tell if a developer is being productive and in the same breath say it's unmeasurable.
hours worked, bugs fixed, tickets closed, costs saved, clients saved, KPIs/OKRs hit, time in queue, hours-to-close-ticket, uptime, SLAs hit... surely some collection of indicators, while not a pure signal, would let you highlight outliers either above or below the curve.
It's like saying, "your productivity as a carpenter is how many houses you build in a year" while refusing to take into account how big or complex those houses are to build.
Even "hours worked". What do you mean, hours spent in the office? How do you know the person was not drinking coffee, or staring at their code while thinking of something else?
That being said, I think your metrics are good. And even single developers should be measured by them (which means all developers get measured by the same metric and get the same value). Why? Because it helps the business if SLAs are hit, no matter why they are hit.
I did have the idea of directly tying value to a graph of code that enabled a certain user journey. Sorta like 'CUJ-coverage' instead of test coverage. So if a user spent $20 at checkout, every line of code that was touched to enable that user's journey would be credited with that $20. I think this would be an interesting metric I'd probably respect but there are still probably a lot of blindspots this methodology doesn't capture.
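A minimal sketch of how that crediting could work, assuming some tracing mechanism already reports which lines each user journey touched. The `file:line` identifiers, the `credit_lines` name, and the sample journeys are all hypothetical.

```python
from collections import defaultdict


def credit_lines(journeys):
    """Credit each touched line with the full revenue of every journey it served.

    journeys: iterable of (revenue, touched_lines) pairs, where touched_lines
        is a set of hypothetical "file:line" identifiers.
    """
    credit = defaultdict(float)
    for revenue, touched_lines in journeys:
        for line in touched_lines:
            credit[line] += revenue
    return dict(credit)


# Made-up journeys: one $20 checkout and one $5 purchase found via search.
journeys = [
    (20.0, {"cart.py:10", "checkout.py:42"}),
    (5.0, {"cart.py:10", "search.py:7"}),
]
credit = credit_lines(journeys)
```

One blind spot shows up immediately: code that every journey passes through (shared plumbing such as `cart.py:10` here) collects the most credit regardless of how hard it was to write.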
If no developer complained about anything, nothing would change and the project would be on a slippery slope to oblivion.
Usually any project needs some kind of feedback loop to correct any problems and most of the time an important link in the loop is developers complaining.
It just needs to be done in a productive way. For example, a retrospective is an attempt to turn complaining into a productive element of the process.
Also, it is a good starting point to be very cautious about any radical opinions like that.
have never been sure how summing together something that is supposed to have no relationship with time magically provides an estimate of anything
also not sure why teams are using the central source of truth for progress as the 'daily todo list making' tool
I live in the real world so I estimate in hours
It feels like as I progress my individual work stays the same, but helping others eats any efficiency gains I personally make.
As if when you are new to a module, you are slow because you don’t know anything, then once you have expertise, you are slow because you know everything and are helping others.
Would be interesting to measure this somehow.
This suggests that the proper way to keep team productivity high is to have all team members working on the product since the beginning, and treat them well so that they don't quit and don't have to be replaced by new ones. Maybe even start with slightly more people on the project than necessary, so if a few of them quit during the project for unrelated reasons, you can still finish the project with the remaining ones.
Probably not going to happen, because this goes against maximizing short-term productivity at the beginning of the project. The short-term productivity is maximized by having the team as small as possible, and only worrying about problems after they happen.
If the scope right now is to pull a bunch of values out of spreadsheets and generate reports on them, the highest quality code would be the most terse: it looks up the files, gets the information, then displays it. If tomorrow the scope changes to "do that, but in realtime, across multiple machines", the highest quality code is the one that implemented a database and REST API.
Since scope changes all the time, we can never evaluate which set of code is the highest quality.
In just about any system of productivity metrics, these two episodes would mark me as dismally productive:
In the bank I was working for, the incidence rate of online banking mainframe reIPLs went from every few days to zero.
At a telecommunication provider, data center reIPLs similarly reduced.
So we get young, naive engineers to focus on small changes. Cool, probably as it should be, you gotta start somewhere. And when these developers get hungry for bigger projects, when they get bored implementing the umpteenth small and by that point (for them) trivial change, how do you encourage them to tackle bigger technical problems? Those that lay the foundation for the new people to do their job more easily and on-board quicker? Or did you actually not tell us all, and you measure far more than just the number of changes?
Maybe a few things are happening:
1. Some large technical improvements can be shipped in multiple changes that add value.
2. Most companies do more large technical changes than is optimal.
3. Engineers are motivated to make the large technical changes since they are interesting and make their future work easier so they will prioritize them despite the measure.
4. GitLab is having fewer larger technical improvements than optimal.
5. Our dual career structure ensures that there are engineers who can do these larger technical improvements without being below average themselves because they are more productive than others.
6. We are not pushing very hard on this metric since we do it in a group setting instead of per individual.
Isn't gitlab known for disastrously poor infrastructure with all the long outages? I.e. the exact things where people need to take their time to tackle bigger technical problems, not complete short tasks. I guess this attitude explains it, at least partially.
This should illuminate why these conversations constantly go in circles.
Over 150 days, this developer's churn really fluctuates, and that's because they work on different things, which require different amounts of code. And if you look at the following:
you can see they still commit regularly, but as the Reviewability section shows, their changes are mainly small ones, which sort of aligns with what Sid (sytse) mentioned, which is mainly focusing on small changes.
If you look at the bigger picture:
The churn for the project microsoft/vscode fluctuates quite a bit as well.
Based on what I've learned so far, you really need a good baseline (that can vary greatly from one developer to another) to be able to determine if somebody is more/less productive.
Also I would suggest that ad-hoc analyses and other such hacks are unlikely to lead to PRs and other visible artifacts.
Impossible to have 'metric based reviews' at that point. But I think that's fine.
One cheap, but mostly invisible mistake that lingers isn't a problem... but they can pile up.
Sometimes firing someone ironically gives them a wake up call and they’ll do great for their next job. Sometimes they don’t learn, sometimes they’ll never recover. Sometimes promoting people helps them, sometimes they become overwhelmed and performance drops again (I’m not speaking Peter Principle either).
It hurt my career staying at a dead-end job but it gave me years of free time doing pretty much whatever I wanted.
If I pay above market rate I'll attract better devs for sure, and the caliber of folks in my hiring pipeline will get better. It's not obvious at all unless you know where to look. College is the prime example of that. If you pay better you'll have more new grads applying and they will prioritize you over other offers (unless you are an exceptionally prestigious employer). But even then, you'll never talk to the student who interned twice at FAANG and got a firm offer a year before graduation. You can get that guy only if you are willing to employ him at FAANG salary for two summers.
Employing these guys won't make my existing hires any better than they are in the immediate future.
However
Better hires lead to better teams. I find that certain developers have a multiplicative effect that applies to other devs. They mentor, document, review and help everyone grow. That might actually slow them down (taking half a day to explain high level architecture to a lowly junior coder) until you realize the junior coder is now capable of answering questions from his teammates.
Well... if you knew how to spot somebody twice as good as the one you have now, why didn't you hire that guy in the first place?
So what is a "higher quality of developer"?
> Velocity is an aggregate measure of tasks completed by a team over time, usually taking into account developers’ own estimates of the relative complexity of each task. It answers questions like, “how much work can this team do in the next two weeks?” The baseline answer is “about as much as they did in the last two weeks,”
If there's one thing the last 50 years of software development have conclusively proven, it is that estimating the number of man months (hours) a project will take doesn't work.
still: (1) sounds like complexity predicts weeks? so they are estimating hours. And (2) I think if jira clones were really a tool for estimation, they'd have uncertainty scores and some kind of prediction market built in
Most people dramatically underestimate the amount of time something requires. As long as you give them a clear conversion rate between story points and hours, they will estimate the task in hours -- incorrectly, despite having made the same mistake a hundred times in the past -- then convert the hours to story points and tell you the result.
Then someone notices that you have like 200 man-hours in the sprint, and you have only selected story points for 100 man-hours. Which in fact is perfectly okay, if you understand that the "100" is the underestimate, and the realistic estimate would actually be close to 200, so you should be happy about the plan! But most people will not get it, and they will insist on planning properly for 200 man-hours. If you don't have enough political power to stop them, they will make you plan for 200 man-hours.
Then at the end of the sprint, everyone is stressed out, and they only completed 50% of planned stories. Because they underestimated how much time the tasks would take... just like research shows humans always do, no matter how many times they got burned in the past, no matter how much you yell at them to make better estimates.
(By the way, the problem with making realistic estimates is not just that individuals suck at it, but also that social forces actively prevent it. Research shows that people who make more realistic estimates are considered less competent than their colleagues, precisely because everyone notices that their estimates are longer than they believe they should be. And no one later changes their opinion just because the estimate turned out to be correct. Like, really, people who estimated something to take 2 weeks and delivered it in 3 weeks were judged as more competent by managers than people who estimated it to take 3 weeks and delivered in 3 weeks. The former made a better impression at the beginning, and the latter didn't provide a better result at the end, so the former made a better overall impression. This is how human brains work.)
So the smart way out is to make a metric that is taboo to convert to hours. Give vague verbal descriptions, like 1 is "trivial", 2 is "fairly easy", 3 is "simple", 5 is "medium", 8 is "kinda difficult", 13 is "tricky", and 21 is "needs to be split to smaller stories". People will first feel weird about it, but then they get used to it, and they will start delivering consistent ratings... like, the kind of story that gets assigned 5 story points in January will probably also get assigned 5 story points in December.
Then all you need is calculate velocity, which is, well, the conversion rate between the story points and hours. But you can't say that, or it will ruin the magic! You just say "during the last sprint, we implemented 50 story points, so for this sprint we will also plan 50 story points", and hope that people will accept that, without making the conversion explicit. And it works...
...until someone says: "Hey wait, so if we have 200 man-hours and plan 50 story points, that actually means that 1 story point equals 4 hours, right? And why are we giving this specific story 3 story points? 12 hours sound too much to me, I am pretty sure we could do it in 8 hours, or even 4 hours if we work hard, right?" (The rest of the team is silent, either because they agree, or they don't want to be seen as less competent.) And then you get another sprint when people plan too much, complete 50% of it, and get another stern talk about being more careful about making estimates.
It is a psychological trick that only works if you stop estimating stories in hours. It always breaks when someone insists on connecting the dots, converting the estimate to hours, and "fixing" it because it is "too much". If we could reliably estimate stories in hours, we wouldn't need story points, but experience shows we can't!
(But if you tell this to people, they will insist that they absolutely can make proper estimates, or that professional developers should be able to make proper estimates. Well, they can't, and we don't live in the should-universe.)
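To make the arithmetic in that story concrete, here is a small sketch using the same numbers (200 man-hours, 50 story points, a 3-point story). The function names are made up for illustration; the baseline plan is simply "as many points as last sprint", as described above.

```python
def implied_hours_per_point(capacity_hours, completed_points):
    """Hours of sprint capacity consumed per story point actually delivered."""
    return capacity_hours / completed_points


def plan_next_sprint(completed_points_last_sprint):
    """Baseline plan: commit to as many points as were finished last sprint."""
    return completed_points_last_sprint


rate = implied_hours_per_point(200, 50)  # hours per point implied by history
story_hours = 3 * rate                   # what a 3-point story costs at that rate
next_plan = plan_next_sprint(50)         # points to plan for the next sprint

# Overriding the 3-point story to an 8-hour "gut feel" estimate is
# equivalent to re-rating it at fewer points than the team assigned.
overridden_points = 8 / rate
```

With these numbers the implied rate is 4 hours per point, so the 3-point story costs 12 hours, and "we could do it in 8" amounts to calling it a 2-point story. Nothing in the history supports that re-rating, which is why the conversion is better left unspoken.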
> I live in the real world so I estimate in hours
Do you make your estimates in front of other people who sometimes second-guess them? How often do you actually meet your estimates?
Most definitely, but that only happened when at least one of the following conditions were met:
1. There was good documentation in place and an onboarding process that made it easy to transfer the foundation of the architectural knowledge.
2. The work was clearly delineated and compartmentalised, so understanding the overall architecture wasn't critical for at least the entry-level tasks.
3. The architecture was a simple, standard pattern I was already very familiar with: for example a Django monolith.
I'd also like to add that a better name for what I called "architecture knowledge" would be "institutional knowledge", as it does not include just the architecture per se; it includes the auxiliary things like development process, testing mechanisms, deployment etc.
There is no right way to deal with this other than to listen to complaints and figure out what is behind them.
One way of thinking about complaining is that it is a form of feedback. As a manager, you don't want to silence people giving you feedback; frankly, this is about as stupid a thing as you can do.
A better way to deal with complaints is to educate people on which kinds of complaints are productive and which kinds are destructive.
For example, I try (and don't always succeed) to restrict myself to complaining only about things that I am ready to solve if somebody tells me "go ahead, fix it".
Imagine a situation where the company and/or the project have a few serious problems, but the company refuses to fix or even admit any of that. The developers who couldn't live with the problems have already quit. The developers who remained have stopped complaining, because they have given up.
A new developer comes, notices the problems, and starts complaining about them. People notice that the newbie makes a fuss, but nothing changes. Later the developer either quits, or gets used to it and stops complaining.
Here is how the management probably interprets the situation: "People with the least experience complain most. The correct approach is to ignore them, and wait for them to grow up. More experienced developers have realistic expectations and mature behavior."
1. The ones that “don’t make mistakes”, but actually they are just unaware.
2. The ones that expect themselves to make mistakes and prepare for that.
And what do you do in the end?
You probably already know quite a few laws, just not that they have names. "Power tends to corrupt; absolute power corrupts absolutely", and "Any sufficiently advanced technology is indistinguishable from magic" come to mind.
In all fairness though, this is the only major gitlab incident I can recall, and it's more than three years old at this point.