Improving code review time (engineering.fb.com)

216 points by pzrsa 3 years ago | 219 comments

ep103 3 years ago |

> Next reviewable diff

Holy Fuck No.

90% of my PR review time goes into "Okay, how will this change impact parts of the system that this mid-level Dev doesn't understand yet?" and "Does this PR actually do what the ticket it is claiming to implement actually intended?"

Reviewing diffs in isolation completely removes one's ability to do that.

If you remove a person's ability to do that, what you've left them with is the easiest part of the PR, just checking that the logic seems logical. And honestly, most of that work can be automated by linting, style cops, and unit tests.

The fact that they got rid of the part of the PR review process that matters, and only saw a 1.5% improvement speaks to all sorts of problems in the process overall, not an improvement by this tool

lopkeny12ko 3 years ago | |

I don't understand this criticism. How does a "next reviewable diff" pop-up suggest that you're being forced to review a diff "in isolation"? As I understand it, nothing proposed in this article prevents you from reviewing the diff in context of the larger piece of software, as you would have always done.

This just seems like a feature to suggest another diff for you to review after you've finished accepting/rejecting the current diff. I'm already "in the zone" of code review so to speak, so this minimizes context switching. I see this as a good thing.

> The fact that they got rid of the part of the PR review process that matters

I don't understand this either. What "part of the PR review process" did they remove? The article does not claim to have eliminated any part of the review process.

grey-area 3 years ago | | |

It encourages looking quickly at small bits of changes in isolation as opposed to taking a holistic view of the entire change, and viewing them in series as 5 min little tasks you can tick off like tiktok videos.

The review tools on github already go too far toward this by removing far too much context.

craig 3 years ago | |

I think you are misinterpreting the term "diff". Near the beginning of the article they define it: "At Meta we call an individual set of changes made to the codebase a 'diff.'" So diff == PR.

kqr 3 years ago | |

You seem to be thinking of "hunk" and not what Facebook calls "diff" -- what Facebook means by "diff" is closer to what many people call "commit" or even "branch", i.e. a set of interrelated changes that are sort of atomic.

Reviewing individual hunks would be crazy and even Facebook knows that.

Arainach 3 years ago | |

You don't remove their ability to do that. If you need more context, then there are UI elements to show the other lines. At Google, there are also links to the file in question in code search if you want to look at history or any other related context.

Most changes don't need this. A prerequisite of fast code reviews is small changes. Rather than 3000-line features, make a series of changes with 10-100 lines of code plus tests. Reviewers can quickly understand the change in logic and confirm that test cases for the new codepaths are being added. Wham, bam, done in two minutes.

Sure, some reviews take more time, but of the 10-30 code reviews I do a week, it's perhaps 10% of them.

charcircuit 3 years ago | |

>The fact that they got rid of the part of the PR review process that matters, and only saw a 1.5% improvement speaks to all sorts of problems in the process overall, not an improvement by this tool

The 1.5% improvement was from better suggestions on who would be a good person to review the diff.

The next reviewable diff feature "resulted in a 17 percent overall increase in review actions per day (such as accepting a diff, commenting, etc.) and that engineers that use this flow perform 44 percent more review actions than the average reviewer!"

solatic 3 years ago | |

> "Does this PR actually do what the ticket it is claiming to implement actually intended?"

Let me ask about your unspoken assumption: is this the PR reviewer's job? Maybe the PR seems to implement what the ticket asked for, then after merging it becomes clear that it didn't fully implement it, or the business stakeholders are unsatisfied, etc.?

ohgodplsno 3 years ago | | |

It's not the PR reviewer's job to go actively test it out (you can assume that your colleagues are somewhat competent at what they're doing), but if you review with the spec or the issue open on the side that says to add a blue button and you see it's red, it's your job to ensure it's not a mistake and point it out to the author.

thehm 3 years ago | |

Those architectural decisions should be reviewed during the design and planning phase so mid/low-level devs don't waste time building the wrong thing in the first place.

ckdot 3 years ago | | |

In theory, what you say is right.

But still, tasks should be small enough that it hopefully won’t hurt too much if you throw away all the code again.

Too often I experience that, even if you talk to low/mid-level devs about a feature beforehand, even if you do a task breakdown together with them and write all the software design decisions down, even if you tell them to commit often so you can check in once in a while, what gets produced in the end is still too often garbage. Still, companies want to keep these developers because it’s hard to find new ones. And I guess it’s our duty as seniors, even with the many disappointments, to still assume the best and try to teach them to do better. Again and again and again.

SirensOfTitan 3 years ago | |

Diff at Meta is short for "Differential revision," a Phabricator term (which was originally an internal FB project before Evan left).

The fact that this comment still remains at the top here despite being incredibly inaccurate (and many subcomments saying as much) shows how degraded the discussion has gotten here when Meta is mentioned.

comfypotato 3 years ago |

Am I reading this right? If the total median time in review is “a few hours”, that means that people are dropping what they’re doing to review the code. That can’t really be a good code review? A context switch alone would be half of that time for me.

underdeserver 3 years ago |

I'm a strong believer in fast reviews.

I get into deep working mode for 3 hours a day total, on a good day. The rest is meetings, daily sync, coffee, lunch, "hey can you look at something", hallway conversations, emails, my own inability to concentrate when I'm not feeling it.

I've been in this industry for coming up on ten years. None of this is going to change, unless I become an academic or a hermit.

If I don't get to do deep work, at least I'm going to unblock other people as fast as possible. That means out of an 8-hour work day, you're going to get a review from me within half an hour in most cases.

I've been on the team for years and I have write access. I WILL merge your change if you pass my review.

I find that this has immense benefits:

1) People just do things. They don't schedule design meetings, get approvals, get consensus. You know why? Because if someone has a good reason why the commit wasn't a good idea, we roll back. No harm, no foul. And guess what? It happens once in 100 commits. (If it's something truly complex, you do get a design doc approved first. But then the review is about making sure your code is correct, matches your design and our style/testing requirements, not whether it's the right thing to do.)

2) People write good commit messages. If your commit message isn't in the following format:

  Push foos in bar order instead of baz order.

  Following discussion with johnsmith, benchmark 
  (http://<shorturl>) shows 12% improvement in the hot 
  frobnication flow.

  Ticket: http://tickets/<ticket_number>

I'm sending it back to you. Since I merge most of my team's code most commits look like that.

3) People write small commits. Got a bigger change? I'll ask you to split it up (without breaking the build if we ship a version between commits). People don't push back on that, because they know it's not going to add a lot of overhead.

4) In the same spirit, people don't push back on changes I request - unless it's for a good reason. Discussions are on-point. When changes are made you get back approval half an hour later. No background psychological pressure of "I wanted to get this in today and I don't want to have to restore context tomorrow morning".
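The commit message format from point 2 could also be checked mechanically before a human looks at it. A minimal sketch, assuming a rule set inferred from the example message above (a one-sentence subject ending in a period, a blank line, a body, and a `Ticket:` line). The function name `check_commit_message` and the exact rules are hypothetical, not a tool described in this comment:

```python
import re

# Hypothetical rule: a line of the form "Ticket: <something without spaces>".
TICKET_RE = re.compile(r"^Ticket: \S+$")


def check_commit_message(message: str) -> list[str]:
    """Return a list of problems; an empty list means the message passes."""
    # Strip each line so messages quoted with indentation are still accepted.
    lines = [line.strip() for line in message.strip().splitlines()]
    if not lines or not lines[0]:
        return ["missing subject line"]

    problems = []
    if not lines[0].endswith("."):
        problems.append("subject should be a full sentence ending in a period")
    if len(lines) < 2 or lines[1] != "":
        problems.append("subject must be followed by a blank line")

    # Body = every non-empty line after the subject that is not the ticket line.
    body = [l for l in lines[2:] if l and not TICKET_RE.match(l)]
    if not body:
        problems.append("body explaining the change is missing")
    if not any(TICKET_RE.match(l) for l in lines):
        problems.append("missing 'Ticket: <url>' line")
    return problems
```

Something like this would only catch the shape of the message. Whether the body actually explains the change still needs a reviewer.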

The velocity you reach is amazing. True serendipity. Unless you're consistently able to get full days of deep work in, I suggest you try it.

(edited for formatting)

bsaul 3 years ago |

There's so much low-hanging fruit for improving the quality of diff viewing. The worst code reviews are often the ones where code gets refactored, leading to piles of delete / create lines that are just code being moved or slightly renamed.

One very simple approach would be better git integration with the IDE, helping build commits that make sense, where a set of changes could easily be commented by the author as they’re performing the edits, then keep improving from there.
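As a rough illustration of the "code being moved" problem, here is a minimal sketch that separates lines that merely changed position from lines that were really added or removed. It uses Python's standard `difflib`; the function name `split_moved_lines` and the rule "same text after stripping whitespace counts as a move" are assumptions made for this example, not how any particular review tool works:

```python
import difflib


def split_moved_lines(old: str, new: str) -> tuple[list[str], list[str], list[str]]:
    """Classify changed lines as (moved, truly_removed, truly_added).

    A line counts as "moved" if it appears both as a deletion and as an
    addition in the same diff, ignoring leading/trailing whitespace.
    Blank lines are ignored entirely.
    """
    removed, added = [], []
    for entry in difflib.ndiff(old.splitlines(), new.splitlines()):
        text = entry[2:].strip()
        if not text:
            continue
        if entry.startswith("- "):
            removed.append(text)
        elif entry.startswith("+ "):
            added.append(text)

    moved, truly_removed = [], []
    remaining_added = list(added)
    for line in removed:
        if line in remaining_added:
            # Pair one deletion with one identical addition.
            remaining_added.remove(line)
            moved.append(line)
        else:
            truly_removed.append(line)
    return moved, truly_removed, remaining_added
```

A reviewer could then skim the moved lines and spend attention on the other two lists. Renames would still show up as remove/add pairs, so this only covers the pure-move case.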

azinman2 3 years ago | |

This is a concept I almost pursued for my PhD over a decade ago. I’m still surprised no one has done this.

The bigger picture: context is often missing for anything complicated, be it software or a new law. Yet many hands touch and retouch the underlying material over time. If you could capture _how_ something was built, and had enough insight into the larger process to sample some of the _why_, then you could both know what changed together and what potentially impacted the final decisions. This would result in (hypothetically) tremendous gains for anyone working on or joining a project that’s bigger than can fit in the mind of one person.

lamontcg 3 years ago | | |

The PR history can answer the why, and if you can anticipate someone will ask why because you know it is edge-condition spaghetti then you can document it right in a comment.

theptip 3 years ago | |

JetBrains has a code review platform that lets you do diffs in your IDE. The idea is incredibly appealing but for some reason I found it not so great in practice. Something about viewing the diffs in a different UI gets me in a different frame of mind. I found it hard to get into review mode in my IDE.

There are advantages of course, being able to jump around and get the context at will.

toast0 3 years ago |

When I was there, you could always just put

    Reviewed-By: self

in the commit, and not wait. Much faster ;)

philipwhiuk 3 years ago | |

In the future "Simon Elf" will be asked why he approved so much broken code ;)

system2 3 years ago | |

It might be the reason why you aren't there anymore. (kidding)

toast0 3 years ago | | |

Lol, I let myself out, but appreciate the thought ;)

penguin_booze 3 years ago |

> Next reviewable diff

As the commit author, it's in my (and everyone's) interest to size changes so that they're easy to review, and to present them in the logical order of thinking. Personally, I prefer the bottom-up approach.

I bring the non-functional and impertinent changes (like refactoring and tangential changes) ahead in the line-up so that the actual changes are kept separate and are concentrated at the tail end.

I make commit messages of the pattern: Present situation, the problem with that, what this patch does, and what effect it has/how it solves the problem or sets up a path forward.

The initial PR might be sliced too thinly, and so will have more commits than ideal. But, as the review progresses, and once both the reviewer(s)' and the author's mental models are in sync, commits can be collapsed at their logical boundaries.

Regardless of the tooling and presentation, it's imperative that the reviewers are intuitively aware of the ramifications of the change. Without that, the review ends up being nitpicking, spell checking, and whatever is obvious in the immediate vicinity, and the process degrades into a box-ticking exercise.

No AI needed. Be human.

milin 3 years ago |

Somewhere in the post, it's mentioned fb uses code ownership logic in the next review engine.

If folks are interested, there's a project called https://github.com/milin/gitown which does something similar in github leveraging code owners.
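To make the ownership idea concrete, here is a minimal sketch of ownership-based reviewer suggestion. It is not the logic of gitown or of Meta's tooling: the `owners_by_prefix` mapping, the longest-prefix rule, and the function name `suggest_reviewers` are all assumptions chosen for illustration.

```python
from collections import Counter


def suggest_reviewers(
    changed_files: list[str],
    owners_by_prefix: dict[str, list[str]],
    author: str,
) -> list[str]:
    """Rank candidate reviewers by how many changed files they own.

    Assumed rule: a file belongs to the owners of the longest matching
    path prefix. The author is never suggested for their own change.
    """
    votes: Counter[str] = Counter()
    for path in changed_files:
        matches = [prefix for prefix in owners_by_prefix if path.startswith(prefix)]
        if not matches:
            continue  # no owner recorded for this file
        most_specific = max(matches, key=len)
        for owner in owners_by_prefix[most_specific]:
            if owner != author:
                votes[owner] += 1
    # Most-owned first; ties broken alphabetically for stable output.
    return [name for name, _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))]
```

Real CODEOWNERS files use glob-style patterns rather than plain prefixes, so a production version would need a proper pattern matcher.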

_boffin_ 3 years ago | |

Thanks

bagels 3 years ago |

Here's another factor at Meta that can reduce code review time: Your performance review is based in part (maybe not a large part, but in part) on how many reviews you perform, and how many words you put into those reviews. edit: In short, people are incentivized to review.

nsenifty 3 years ago | |

Ex-Facebooker here: Number of reviews does factor into performance reviews, but mostly as a tie-breaker in calibrations when you're borderline in between bands, not as a primary metric. People who do a disproportionately large number of reviews do get recognized and rewarded.

jiveturkey 3 years ago | |

I quite agree with the proposition you are making here. It goes to the very heart of the matter, does it not. The better you are at giving detailed, explicit and concrete feedback about each and every particular aspect of a diff (NOTE: both good and bad), the better and more competent you clearly are, and the more of a true champion for the cause you are proving to be. Time and time again, I wish my reviewers would just lay it all out on the table. As opposed to the limited and perfunctory

LGTM

imiric 3 years ago | | |

Meh.

I've often seen reviewers delivering essays to back up their arguments, which essentially boil down to "because I prefer it this way".

Focusing on pure word count doesn't mean that the feedback is valid, or even explains the reasoning well. If anything, it encourages nitpicky and long winded comments based on personal preference.

Often, less is more. If you can get a point across by a small code suggestion, do that instead. But definitely don't fall into the trap of suggesting huge chunks of code, or rewriting parts of it. Sometimes even asking a question to improve understanding is better than arguing a point.

And then other times, especially for trivial changes, "LGTM" or just a blank approval is perfectly fine as well. No need to waste time discussing trivial things if everyone is on the same page.

Scubabear68 3 years ago |

I came to the conclusion long ago that mandatory code reviews are a waste of time. For critical stuff, absolutely.

But PRs and review cycles overburden dev teams and don’t seem to move the quality needle higher one bit.

A better way is to ensure multiple hands touch a given area of the code, so that multiple eyes ultimately are seeing and manipulating those bits. If they are given a task to do in that area they will be motivated to understand it (and potentially improve it). By contrast, with code reviews often the reviewer does not have time to really deep dive into the code and will only have a superficial understanding of it.

Oh, also use code quality scanners to keep an eye on tactical code debt.

kerblang 3 years ago |

In the spirit of tangentialism I randomly suggest: Architecture Review!

- Prevents juniors from being blown out of the ocean into startalloverland by seniors at tail end

- Focus on the most dangerous aspects of the change that can't be fixed later

- Sets the stage for more informed programming reviews later on (lower priority to me though)

jeffbee 3 years ago | |

I've seen mixed things from architecture reviews. I've seen it used by people with titles that exceeded their actual abilities, to stop people with junior titles from doing things the senior person simply didn't understand. And I've seen architecture reviews used just to satisfy the whims of senior people, to gratify that urge to nitpick or dictate what language they wanted to use. Those are the bad ways.

The good way I've seen architecture review used is nobody was going to tell you not to write or even deploy whatever the hell it was that you thought you wanted to write, but if you wanted to integrate with Grown Up Systems, there were ACLs that your system would not be added to unless and until your system had passed the review of the Grown Ups. I think this way is strictly better for two reasons: it can't strangle good ideas at birth, and it minimizes the amount of architecture reviewing that everyone needs to do, because half of the junk that gets brought to pre-implementation arch reviews never gets built anyway.

pvg 3 years ago | |

Architecture Review!

Looks a lot more fun than code review to boot:

https://www.youtube.com/watch?v=QfArEGCm7yM&t=57s

s3000 3 years ago |

Have I missed the feedback from the users? There should be some quotes from team members who liked the change. Their mentioning that they start being data-driven for internal tools suggests that they start treating developers like cattle and not pets.

>Driving down Time In Review would not only make people more satisfied with their code review process, it would also increase the productivity of every engineer at Meta.

This hasn't been tested. "The average Time In Review for all diffs dropped 7 percent" - they have verified that they changed the left side of the equation, the review time, but they haven't checked the outcome, the productivity. Overall it doesn't seem like they have checked if their changes have negative side effects.

Likewise

>The choice of reviewers that an author selects for a diff is very important. Diff authors want reviewers who are going to review their code well, quickly, and who are experts for the code their diff touches.

doesn't match

>A 1.5 percent increase in diffs reviewed within 24 hours and an increase in top three recommendation accuracy (how often the actual reviewer is one of the top three suggested) from below 60 percent to nearly 75 percent.

They have shown that the people they nudge are more likely to do a code review. But are they the experts who do the review well?

The 1.5 percent in reviewed diffs could also be jitter.
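For reference, the "top three recommendation accuracy" quoted above (how often the actual reviewer is one of the top three suggested) can be computed along these lines. This is a minimal sketch: the function name `top_k_accuracy` and the sample reviewer names in the usage note are made up, and nothing here reflects Meta's actual data or code.

```python
def top_k_accuracy(
    suggestions: list[list[str]],
    actual: list[str],
    k: int = 3,
) -> float:
    """Fraction of diffs whose actual reviewer is among the top-k suggestions.

    suggestions[i] is the ranked list of suggested reviewers for diff i,
    and actual[i] is the person who ended up reviewing it.
    """
    if len(suggestions) != len(actual):
        raise ValueError("need exactly one actual reviewer per suggestion list")
    if not actual:
        return 0.0
    hits = sum(
        1 for ranked, reviewer in zip(suggestions, actual) if reviewer in ranked[:k]
    )
    return hits / len(actual)
```

With the invented input `[["ann", "bob", "cy"], ["bob", "ann", "cy"]]` as suggestions and `["cy", "dee"]` as actual reviewers, one of the two diffs counts as a hit. Note that the metric only says the suggested person did the review, which is the point made above: it does not say whether the review was any good.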

*edit: Meta could extend the review process. There doesn't seem to be a review process for the review team. If they don't like to review their changes, or if they cannot find suitable reviewers, how are they qualified to roll out their changes to the software development team?

proc0 3 years ago |

If you're going to add machines to the process why not add it with the purpose of eliminating the human from the process altogether? Reviews are necessary because compilers and linters can't catch everything. Runtime bugs that are not caught by the pipeline tend to be edge cases that don't happen until there is enough data to test (in the general sense) the feature. ML could be used for smart testing and if it passes the code diff merges automatically.

It always surprises me how much software companies want to rely on human verification. The whole point of programming is to automate and let the machine take care of it. Every few years the industry does add new tools to automate processes like CI/CD pipelines, but at the ground level most companies seem to favor adding more humans whenever the technology is not good enough.

kissgyorgy 3 years ago |

On a much smaller scale (a team of 8), I also noticed this problem and wrote a “nudge bot” for Slack and Gerrit. It takes the team-relevant changes and posts them to a Slack channel in a formatted message with the patch state (not reviewed, pending, needs change, etc.).

I gave a talk about it, unfortunately in Hungarian, but you can see screenshots of how it worked: https://youtu.be/7WiICWyP1sQ

Here is the code:

https://github.com/kissgyorgy/slack-review-bot
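For readers who only want the gist, the message-formatting step of such a bot might look roughly like this. This is a sketch and not the code from the repository above: the `Change` class, the state keys, and `format_nudge` are hypothetical, and it deliberately makes no Slack or Gerrit API calls.

```python
from dataclasses import dataclass

# Hypothetical state keys, mapped to the labels mentioned in the comment above.
# Dict order doubles as display order: least-reviewed changes come first.
STATE_LABELS = {
    "not_reviewed": "not reviewed",
    "pending": "pending",
    "needs_change": "needs change",
}


@dataclass
class Change:
    subject: str
    url: str
    state: str  # must be one of the keys in STATE_LABELS


def format_nudge(changes: list[Change]) -> str:
    """Build one plain-text message listing the changes awaiting review."""
    if not changes:
        return "No changes waiting for review."
    order = list(STATE_LABELS)
    lines = [f"{len(changes)} change(s) waiting for review:"]
    # order.index raises ValueError for an unknown state, which surfaces bad input.
    for change in sorted(changes, key=lambda c: order.index(c.state)):
        lines.append(f"- [{STATE_LABELS[change.state]}] {change.subject} ({change.url})")
    return "\n".join(lines)
```

The resulting string would then be handed to whatever chat client the team uses.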

osculum 3 years ago |

Is it just me, or in the last couple of weeks (since announcement of layoffs) there's been an increase in FB tooling/infra threads? Could be Baader–Meinhof effect, of course.

jensvdh 3 years ago |

Stacked diffs are awful compared to PRs. Not every commit should be clean.

TheTomBombadil 3 years ago | |

Why? I found them way easier to review than regular PRs during my time at fb. Since every commit is cleaner, you get fewer, higher quality commits than in a PR.

anikom15 3 years ago |

If code is important enough, it will get reviewed and tested one way or another.

Everything else is a waste of time.

davidmurdoch 3 years ago |

All these comments about how code review is a waste of time, or suggesting code review is only for bugs, really shine a light on why so much software is incredibly slow today.

itsdrewmiller 3 years ago | |

None of them engage with the content of the article either - having a "play next" button for code review is awesome assuming it works reasonably well. I'm curious about the quality of nudgebot reviews. In my experience the PRs that sit around forever are the 3000 line epic find/replace refactors all done in one commit that are impossible to really review. I skimmed the paper and didn't see any accounting for "diff time to diff length", so I'm not sure the result there is anything meaningful. Maybe people are getting faster feedback to not submit such shitty PRs.

riwsky 3 years ago | | |

what stops you from reviewing the sed one-liner that created the diff, instead?

logicchains 3 years ago | |

Maybe people'd have more time to optimise the performance of their software if they weren't spinning their wheels and context switching waiting hours for minor changes to be merged.

yrgulation 3 years ago |

Too many think code reviews are an opportunity for endless debates over personal preference. A code review should be fast and cover blatant good practice violations and architectural mistakes. Everything else should be taken care of by linters and tools. If a reviewer wants code done in a different way they can write the code themselves.

98codes 3 years ago | |

One thing I've enjoyed where I am now is that PR comments come in two flavors. The first, actual feedback. The second, borderline pedantic issues that are prefaced with "nit: " in the comment. Nit comments are safely ignored but are there so that if the author wants to put in that change while changing some other issue, then OK.
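If a team wanted tooling to respect that convention, separating the two flavors is straightforward. A minimal sketch, assuming comments arrive as plain strings and that the prefix is literally `nit:` (case-insensitive); the function name `split_review_comments` is made up for this example:

```python
def split_review_comments(comments: list[str]) -> tuple[list[str], list[str]]:
    """Split review comments into (actual_feedback, nits).

    A comment is treated as a nit if it starts with "nit:" in any casing;
    the prefix is removed from the returned nit text.
    """
    feedback, nits = [], []
    for comment in comments:
        text = comment.strip()
        if text.lower().startswith("nit:"):
            nits.append(text[len("nit:"):].strip())
        else:
            feedback.append(text)
    return feedback, nits
```

A merge check could then block only on unresolved items in the first list.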

gknoy 3 years ago | | |

I especially like Github's new option to make a _suggested diff_ of what you want changed. Typo fixes, comments re-worded, etc. It really reduces friction, both as the person making the suggestion, and as the person who authored the PR.

wahnfrieden 3 years ago | | |

the reviewer should make 'nit' changes directly rather than pass a mini-ticket for evaluation back to the author. why are reviewers so scared to modify prs at orgs?

alecbz 3 years ago | |

I agree that anything that _could_ be covered by an automated check ought to be. But I don't think I'm convinced that everything else is either an egregious mistake or isn't worth discussing.

IMO a big part of what you ought to be reviewing for is readability, which does sometimes overlap with personal preference. But there's a spectrum from "I'd indent these columns a little differently" to "it's hard for me to follow what's happening, I think it'd be clearer if we organized things like...".

yrgulation 3 years ago | | |

"I think it'd be clearer if we organized things like..." any such guidelines should be agreed upon beforehand, otherwise what's the expectation? That people rewrite their code to accommodate someone's needs? Readability, beyond what's in whatever coding style the team uses, is subjective. Best to agree upon what the whole team prefers beforehand.

andreygrehov 3 years ago |

Meta:

> At Meta we call an individual set of changes made to the codebase a “diff.”

GitHub:

> Pull request

Amazon:

> Change Request

GitLab:

> Merge Request

Google:

> Changelist

Nitpicking, but jesus christ, why can't we stick to a single term?

posharma 3 years ago |

Disclaimer: these are anecdotal reports. I've heard from a lot of my friends how abysmal the quality of code is at Meta. Obviously, this may not be true of all teams/products, but that's the general sentiment. Why make it faster when you're already dealing with a mess? This is abundantly evident from the constant firefighting, duct tape, and a metric-driven culture that incentivizes the number of diffs landed.

ummonk 3 years ago | |

I'd say local code quality is generally good but overall it's a big hodgepodge of small features duct-taped together, so the whole app becomes a tangled mess. The metric-driven culture tends to emphasize impact, not diffs landed.

bombolo 3 years ago | | |

> I'd say local code quality is generally good

Have you ever used facebook.com? At least the frontend is incredibly slow (on a machine costing thousands of €), so that's not synonymous with quality in my experience.

Dwolb 3 years ago | | |

How would you measure impact?

lizardactivist 3 years ago |

Situation: code review takes too much time.

Solution: announce unprecedented layoffs of 10000 programmers.

Resolution: no work to be done. code review team on schedule.

loeg 3 years ago | |

11,000 employees, about half technical.