Improving code review time(engineering.fb.com) |
Improving code review time(engineering.fb.com) |
Solution: announce unprecedented layoffs of 10000 programmers.
Resolution: no work to be done. code review team on schedule.
Holy Fuck No.
90% of my PR review time goes into "Okay, how will this change impact parts of the system that this mid-level Dev doesn't understand yet?" and "Does this PR actually do what the ticket it is claiming to implement actually intended?"
Reviewing diffs in isolation completely removes one's ability to do that.
If you remove a person's ability to do that, what you've left them with is the easiest part of the PR, just checking that the logic seems logical. And honestly, most of that work can be automated by linting, style cops, and unit tests.
The fact that they got rid of the part of the PR review process that matters, and only saw a 1.5% improvement speaks to all sorts of problems in the process overall, not an improvement by this tool
This just seems like a feature to suggest another diff for you to review after you've finished accepting/rejecting the current diff. I'm already "in the zone" of code review so to speak, so this minimizes context switching. I see this is a good thing.
> The fact that they got rid of the part of the PR review process that matters
I don't understand this either. What "part of the PR review process" did they remove? The article does not claim to have eliminated any part of the review process.
The review tools on github already go too far toward this by removing far too much context.
Reviewing individual hunks would be crazy and even Facebook knows that.
Most changes don't need this. A prerequisite of fast code reviews is small changes. Rather than 3000-line features, make a series of changes with 10-100 lines of code plus tests. Reviews can quickly understand the change in logic and confirm that test cases for the new codepaths are being added. Wham, bam, done in two minutes.
Sure, some reviews take more time, but of them 10-30 code reviews I do a week, it's perhaps 10% of them.
The 1.5% improvement was from better suggestions on who would be a good person to review the diff.
The next reviewable diff feature "resulted in a 17 percent overall increase in review actions per day (such as accepting a diff, commenting, etc.) and that engineers that use this flow perform 44 percent more review actions than the average reviewer!"
Let me ask about your unspoken assumption: is this the PR reviewer's job? Maybe the PR seems to implement what the ticket asked for, then after merging it becomes clear that it didn't fully implement it, or the business stakeholders are unsatisfied, etc.?
But still, tasks should be small enough that it hopefully won’t hurt too much if you throw away all the code again.
Too often I experience that - even if you talk to low/mid level devs about a feature before, even if you make a task breakdown together with them and write all the software design decisions down, even if you tell them they should commit often to you can check once in a while, even then in the end it‘s too often garbage what has been produced. Still, companies want to keep these developers because it’s hard to find new ones. And I guess it’s our senior’s duty - even if there are many disappointments - to still assume the best and try to teach them to do better. Again and again and again.
The fact that this comment still remains at the top here despite being incredibly inaccurate (and many subcomments stating as such) shows how degraded the discussion has gotten here when Meta is mentioned.
I get into deep working mode for 3 hours a day total, on a good day. The rest is meetings, daily sync, coffee, lunch, "hey can you look at something", hallway conversations, emails, my own inability to concentrate when I'm not feeling it.
I've been in this industry for coming up on ten years. None of this is going to change, unless I become an academic or a hermit.
I don't get to do deep work, at least I'm going to unblock other people as fast as possible. That means out of an 8-hour work day, you're going to get a review from me within half an hour in most cases.
I've been on the team for years and I have write access. I WILL merge your change if you pass my review.
I find that this has immense benefits:
1) People just do things. They don't schedule design meetings, get approvals, get consensus. You know why? Because if someone has a good reason why the commit wasn't a good idea, we roll back. No harm, no foul. And guess what? It happens once in 100 commits. (If it's something truly complex, you do get a design doc approved first. But then the review is about making sure your code is correct, matches your design and our style/testing requirements, not whether it's the right thing to do.)
2) People write good commit messages. If your commit message isn't in the following format:
Push foos in bar order instead of baz order.
Following discussion with johnsmith, benchmark
(http://<shorturl>) shows 12% improvement in the hot
frobnication flow.
Ticket: http://tickets/<ticket_number>
I'm sending it back to you. Since I merge most of my team's code most commits look like that.3) People write small commits. Got a bigger change? I'll ask you to split it up (without breaking the build if we ship a version between commits). People don't push back on that, because they know it's not going to add a lot of overhead.
4) In the same spirit, people don't push back on changes I request - unless it's for a good reason. Discussions are on-point. When changes are made you get back approval half an hour later. No background psychological pressure of "I wanted to get this in today and I don't want to have to restore context tomorrow morning".
The velocity you reach is amazing. True serendipity. Unless you're consistently able to get full days of deep work in, I suggest you try it.
(edited for formatting)
One very simple approach would be better git integration with the IDE, helping build commit that make sense, where a set of changes could easily be commented by the author as they’re performing the edits, then keep improving from there.
The bigger picture: context is often missing for anything complicated, be it software or a new law. Yet many hands touch and retouch the underlying material over time. If you could capture _how_ something was built, and had enough insight into the larger process to sample some of the _why_, then you could both know what changed together and what potentially impacted the final decisions. This would result in (hypothetically) tremendous gains for anyone working on or joining a project that’s bigger than can fit in the mind of one person.
There are advantages of course, being able to jump around and get the context at will.
Reviewed-By: self
in the commit, and not wait. Much faster ;)As the commit author, it's in my (and everyone's) interest to size changes up so that it's easier for review, and also present in them in the logical order of thinking. Personally, I prefer the bottom-up approach.
I bring the non-functional and impertinent changes (like refactoring and tangential changes) ahead in the line-up so that the actual changes are kept separate and are concentrated at the tail end.
I make commit messages of the pattern: Present situation, the problem with that, what this patch does, and what the effect it has/how it solves the problem or sets up a path forward.
The initial PR might be sliced too thinly, and so will have more commits than ideal. But, as the review progresses, and once both the reviewer(s)' and the author's mental models are in sync, commits can be collapsed at their logical boundaries.
Regardless of the tooling and presentation, it's imperative that that the reviewers are intuitively aware of the ramifications of the change. Without that, the review ends up being nit picking, spell checking, and whatever that's obvious on the immediate vicinity, and the process degrades into a box-ticking exercise.
No AI needed. Be human.
If folks are interested, there's project called https://github.com/milin/gitown which does something similar in github leveraging code owners.
LGTM
I've often seen reviewers delivering essays to back up their arguments, which essentially boil down to "because I prefer it this way".
Focusing on pure word count doesn't mean that the feedback is valid, or even explains the reasoning well. If anything, it encourages nitpicky and long winded comments based on personal preference.
Often, less is more. If you can get a point across by a small code suggestion, do that instead. But definitely don't fall into the trap of suggesting huge chunks of code, or rewriting parts of it. Sometimes even asking a question to improve understanding is better than arguing a point.
And then other times, especially for trivial changes, "LGTM" or just a blank approval is perfectly fine as well. No need to waste time discussing trivial things if everyone is on the same boat.
But PRs and review cycles over burden dev teams and don’t seem to move the quality needle higher one bit.
A better way is to ensure multiple hands touch a given area of the code, so that multiple eyes ultimately are seeing and manipulating those bits. If they are given a task to do in that area they will be motivated to understand it (and potentially improve it). By contrast, with code reviews often the reviewer does not have time to really deep dive into the code and will only have a superficial understanding of it.
Oh, also use code quality scanners to keep an eye on tactical code debt.
- Prevents juniors from being blown out of the ocean into startalloverland by seniors at tail end
- Focus on the most dangerous aspects of the change that can't be fixed later
- Sets the stage for more informed programming reviews later on (lower priority to me though)
The good way I've seen architecture review used is nobody was going to tell you not to write or even deploy whatever the hell it was that you thought you wanted to write, but if you wanted to integrate with Grown Up Systems, there were ACLs that your system would not be added to unless and until your system had passed the review of the Grown Ups. I think this way is strictly better for two reasons: it can't strangle good ideas at birth, and it minimizes the amount of architecture reviewing that everyone needs to do, because half of the junk that gets brought to pre-implementation arch reviews never gets built anyway.
Looks a lot more fun than code review to boot:
>Driving down Time In Review would not only make people more satisfied with their code review process, it would also increase the productivity of every engineer at Meta.
This hasn't been tested. "The average Time In Review for all diffs dropped 7 percent" - they have verified that they changed the left side of the equation, the review time, but they haven't checked the outcome, the productivity. Overall it doesn't seem like they have checked if their changes have negative side effects.
Likewise
>The choice of reviewers that an author selects for a diff is very important. Diff authors want reviewers who are going to review their code well, quickly, and who are experts for the code their diff touches.
doesn't match
>A 1.5 percent increase in diffs reviewed within 24 hours and an increase in top three recommendation accuracy (how often the actual reviewer is one of the top three suggested) from below 60 percent to nearly 75 percent.
They have shown that the people they nudge are more likely to do a code review. But are they the experts who do the review well?
The 1.5 percent in reviewed diffs could also be jitter.
*edit: Meta could extend the review process. There doesn't seem to be a review process for the review team. If they don't like to review their changes, or if they cannot find suitable reviewers, how are they qualified to role out their changes to the software development team?
It always surprises me how much software companies want to rely on human verification. The whole point of programming is to automate and let the machine take care of it. Every few years the industry does add new tools to automate process like CI/CD pipelines, but at the ground level most companies seem to favor adding more humans whenever the technology is not good enough.
I made a talk about it, unfortunately in Hungarian, but you can see screenshots how it worked: https://youtu.be/7WiICWyP1sQ
Here is the code:
Everything else is a waste of time.
IMO a big part of what you ought to be reviewing for is readability, which does sometimes overlap with personal preference. But there's a spectrum from "I'd indent these columns a little differently" to "it's hard for me to follow what's happening, I think it'd be clearer if we organized things like...".
> At Meta we call an individual set of changes made to the codebase a “diff.”
GitHub:
> Pull request
Amazon:
> Change Request
GitLab:
> Merge Request
Google:
> Changelist
Nitpicking, but jesus christ, why can't we stick to a single term?
I have about 2 blocks of meetings per day on average. I usually check my assigned code reviews in the morning and when I come back from the meetings. Once I’m done with those then I move on to my own work.
If I have too much code review to the point that I can’t get my own work done, I take my name off the reviewer list and it’s reassigned.
On most days I’m still able to get a good 2-3 hours of sustained focused work on my own project.
Are you sure you are a developer?
How big is your team? We've got 4 developers on our team (and one lead), and require two signoffs for code changes. There's just not enough of us for something like that to work... right?
Picking up a 5 minute review after you've returned from some existing context switch (a meeting, coffee, conversation, lunch, etc.) is easy enough.
As a result, around 25% of my reviews are done within an hour, and 10% within 10 minutes. I'm a slightly fast reviewer, but approximately the same applies to reviews done of my code. And that includes reviews that require interaction with peers in Europe, which obviously require much longer because they can be mailed while I'm asleep or the reverse.
[*] it's an exaggeration, but close to reality.
That's not exactly a break...
Alternatively your code review culture doesn't give a lot of suggestions for changes before submission, and that leads to more technical debt which will produce bad results long term.
Often the suggestion featured does suggest a whole team as a review based on code ownership.
If it typically takes you hours to "get inside" some code, that code is way too complicated.
So my commit history is
"fump" "GAH" "I think this works?"
etc. but all changes have a description of the issue (and link back to the ticket) the solution described in some detail.
Doing that, you remove the possibility at the submitter of a final check before the merge. Especially if the merge happens on another day, rechecking with a clean mind helps to find missed things.
Then someone does that to my changes, they get a kind message to not do this anymore.
Try reviewing commits without reading any associated messaging, or having your team submit complex changes with no message. You have to engage your brain to understand what the code is doing and what is being changed, you'll have a better understanding of if the comments are useful or insufficient, and most importantly you don't already have a bias that the code does what it says. You may find that when you really look at it foos are getting pushed in qux order, or that the benchmark was calling a different function.
This is an especially bad metric because
1) We have good data that after about 60 minutes of reviewing we start to lose the ability to find more issues.
2) It incentives making bigger and harder to review changes so that people spend more time looking at the code.
Without code review, when will a junior developer ever learn anything from a senior developer?
98% of the review comments I receive hark back to some handwavy explanation about maintainability. But never in any way that could actually cause a bug.
when I run teams my rules with PRs are if it's not a bug, and it's not against the code review guidelines leave it. It's usually not worth the back and forth unless there is a clear mentor/mentee relationship.
One person writes the code. Many people read the code. If something is unclear or feels wrong to the reader, they get priority. Reviewers should leave all sorts of feedback; the response of the author should be to just implement the feedback unless they can clearly articulate why doing so is not a good idea. You're right that "it's not worth the back and forth", but completely wrong on who should be the one keeping their mouth shut.
I think there's assumption that people won't just rubberstamp significant diffs to code they don't own. When submitting a change to another team's project if the reviewers that are suggested aren't actually the right person they are more likely to know the right person who should review it and they can manually add that person.
>The 1.5 percent in reviewed diffs could also be jitter.
Facebook / Meta has tools for measuring the effects of changes and seeing if they are statistically significant. Yes, it could still be jitter but without them giving more data about the experiment we can't tell what the chance of it being due to chance is.
>There doesn't seem to be a review process for the review team
There isn't a review team. Anyone can review a change.
I wouldn't advise doing that but to play devil's advocate, why shouldn't they do that when number of reviews are probably a metric in their performance review? What's in place to discourage that?
But at its core, code review is putting two heads together to write some code instead of one. In that sense you can think of the code review problem as the "automating away all of programming" problem.
Pull Requests and Merge Requests are more complicated, and the unit of change is a branch. Many people reject this workflow because ultimately a branch of commits can be summed to one single diff and that’s the only thing that matters in the wider project, outside of your private branches on your personal machine.
GitLab doesn’t support anything other than merging hence the name. Most people who prefer linear history will never merge. Instead they’ll land their change on the tip of the branch, each developer taking it in turns to advance the linear history one step at a time.
I don’t know about changelists or change requests.
When you use Phab with Git you normally set it to always squash merge. Each diff becomes exactly one commit on the main branch, with its commit message reformatted by Phab. The SHA that you committed before submitting your changes to Phab never appears in the authoritative repo (if only because Phab adds Reviewed-By and other metadata).
Tools don't agree on what to call stuff, and everyone uses different tools (and a lot of these tools are pretty old). Perforce is from '95, phabricator is from '07. Both of those predate widespread usage of git (released '05), so it's pretty reasonable that everything is different.
Trying to standardize for some aesthetic reason wouldn’t add anything
Standardization is more about consistency than aesthetics.
It could be in a draft state, awaiting review, accepted for merge, or being merged.
'diff' is the shortest at most descriptive name.
Dennis: The implication that things might go wrong for her if she doesn't review my code in a timely manner. Now, not that things are gonna go wrong for her, but she’s thinking that they will.
Because different groups of people develop their own languages. This is pretty standard for humanity across all time periods, disciplines, cultures, etc.
At least that seems descriptive in the context of making a request in a system to get approval to change the code.
"Pull Requests" are just implementations that many web-based git tools (but not all!) have adopted to facilitate the code review process taking place on their platforms.
don't forget Incident = SEV = DEFCON = 911 ...
[1]: https://google.github.io/styleguide/ [2]: https://abseil.io/tips/ [3]: https://google.aip.dev/
Meanwhile in the same org, the group doing their GUI kept adding band-aids to a broken mess for years.
Although the best response to that sort of thing is just a quick DM - "hey, I'm not sure how to interpret this comment, wanna hop on a call for a minute and explain?"
Again - put up or shut up. Want a change? Cool, let's pair on it. I want the code to be good too. But I'm not going to just read your comment and based on the vibe I'm feeling shoot off some change, only to find out that it wasn't what you meant.
If you can't have those discussions in a code review then your code review process is too slow, and you ought to fix it so that it is easy to have these discussions there.
Being a good code reviewer (just like being a good editor) is absolutely its own skill though, and people can be bad at it.
Just like your can solve simple problems with spaghetti code.
Of course, writing simple, will factored code is often hard!
And I'd like to stress the distribution of the "complexity of a review", which you can crudely approximate with something like number of lines changed. Most of them will be small.
I've worked with squash and merge almost everywhere and it's great. Keeps history clean (1 commit = 1 ticket, usually), and links back to the PR with discussions and context.
I squash my commits into functionally independent units so they make sense and can be rolled back easily. Rebase FTW.
Just checked, it doesn't seem that my self-hosted Gitea instance allows for that, so neither does Gogs (or maybe I just can't find the option in the UI).
Organising the diffs in this way made it much, much easier to collaborate.
Given how much FB has grown since then, I suspect this would be even easier, and this is how the numbers noted above are what they are.
You put these little nuggets of impact on your performance review.
And if everyone followed everyone else's advice you end up with everyone writing code in their reviewers style instead of their own which doesn't seem like a gain.
Not to mention code changes have a cost, very few devs spend as much time testing and thinking about their code when making code changes. So many times you are trading more thought out, better tested code for less tested less thought out code. Because of this I've seen countless bugs introduced from code changes related to trivial PR requests.
Writing in "the reviewer's style" is preferred because they are the reader. For most things that are "style" there should be explicit style guides; for the things that aren't, the reader's opinion is better every time.
If your team isn't getting useful code review feedback, that's also a team culture problem and your senior eng need to step up and lead by example.
"I don't understand this" is important feedback. At a minimum, more comments are needed, but it probably also signals that variable or method names need improvement.
My experience (15+ years, companies of all shapes and sizes from college shops to FAANG) is that the kind of engineer who feels that code review feedback isn't worth listening or is a waste of time to is detrimental to the team in both the short and long term.
> the reader's opinion is better every time.
I don't see why the reader's preference is always better. If you like designing for flexibility and I like KISS why is KISS better when you write the code, and flexibility better when I write it?
You act like our profession isn't plagued by flamewars. You see it on here every day with differences of opinion about microservices, design patterns, designing for today vs the future, KISS vs SOLID, how much testing is appropriate, should you focus more on automation vs integration vs unit tests.
When the organizations views on all these matters are aligned and documented such that the code reviews are structured that's great and you get good feedback. I've worked for 10 companies almost none of them were that aligned and organized.
Code review feedback can be very valuable, I've was the champion for code reviews 8 years ago when most organizations were not doing them. But I think the vast majority of useful code review feedback falls into one of the following categories
1. Spotted a bug, deviation from spec, or a deficiency in the spec 2. A detailed and concrete comment about maintainability (i.e. this method doesn't handle x case, or blows up on y case and that's not documented) 3. This violates a code review guideline 4. Expressing what is confusing such as complex code that needs documentation or poorly named variables 5. Performance critique
Most of the code review feedback I've seen has not fallen into these categories and instead has to do with one persons preferences vs another.
- I'd break this out into a method vs I'd inline this method - Differences in code grouping by functional vs logical cohesion. - You should make this more performant vs this isn't going to be a hotspot so build this for readability over performance
It's something that could be addressed on an organizational level perhaps.
The other point to consider is the whole teach a man to fish vs give them a fish thing. Teaching is better in the long run.
Org level: absolutely. Solvable thru tooling.
You: "I'm going to send a pull request"
Other engineer: "OK. BTW, we call them change requests here"
You: "OK"
Somebody else: Where I used to work we called them pull requests. Let’s have a meeting to discuss changing the terms to match industry standards. We may also want to create a committee to agree on other terminology changes too, like main instead of master.
If the workflow is different, then it's _not_ just a question of terminology. In that case it's probably a good thing that there are different names for the various workflows.
You might need focus, i.e. non-meeting days, 3 days a week.
They mean locally as in code scope and not necessarily client sided or the size of a feature
If you look at only the PR, even with a few lines of context, you still don't see much.
I actually like the diff view window provided by JetBrains editors through the alreedy bundled "Github plugin". I get to see the whole file, before and after (left/right), with highlights for changes, lines added, lins removed. That way I see the entire context.
If you only see the usual change+3 lines of context you often don't even see what function is impacted, and it's also rare to have the context of the entire module being changed in ones head already.
For evaluating PRs, I use the PR review feature in IDEA editors and go through the list of files changed, opening a new window with the entirety of that file available to me, changes highlighted. F7 jumps to the next change, but mostly I just scroll.
You can add review comments right in the diff view.
https://youtu.be/MoXxF3aWW8k ("IntelliJ IDEA. GitHub Pull Requests") -- Diff viewer shown at 2:40; In that video the diff view is shown inside the editor, but I much prefer - and fortunately that is configurable - to open the Diff View in a new window, maximized.
https://www.jetbrains.com/help/idea/comparing-files-and-fold...
The example works only for PRs on GitHub and IDEA editors, but I wanted to present the concept, and show that IMO very nice diff view and how it makes it easy to review changes with the context of the entire file.
Thanks for the links.
I don't work at Meta and I still don't have any idea which of you is right.
A commit would satisfy the description above.
As would a pull/merge request, which the image in the article perhaps fits better: https://engineering.fb.com/wp-content/uploads/2022/11/Code-R...
Of course, it's possible to review each individual commit as well, but who does that?
I know that some folks love to mess around as much as they need in their local branches, commit often and then do an interactive rebase, to maybe end up with one or just a few commits that can then be the basis for a pull/merge request, then the difference matter a little bit less.
Imagine a feature that would heavily degrade performances as a whole, but as it’s being introduced in small bits and pieces no one sees the big picture and are left to wonder why perfs are slowly degrading, boiling frog style.
And yeah, I've been in those long discussions too about frameworks and linting rules. It can often turn into just egos trying to win and less about doing what's right or expedient for the team.
This statement is definitely a misleading one. A diff itself is always a commit in the history log, and a diff can't be a group of multiple smaller units unless their team submit change set to git first and sync the squashed commit back to fbcode. But even in that case, from fbcode's view the squashed commit is still a diff.
A set of changes is called diff stack.
https://cloud.google.com/architecture/devops/devops-tech-tru...
Do you know where the missing tips are? For example:
The root page (parent's [2]) mentions this one as famous by name:
>Often they are cited by number, and some have become known simply as “totw/110” or “totw/77”.
Is totw/110 some Google secret sauce that we are forbidden from knowing or did they skip some weeks?
110 in particular probably could be public, but it looks like they stopped externalizing them in late 2020, which is kind of sad, so I assume they never got around to it.