Keeping master green at scale

Keeping master green at scale (eng.uber.com)

301 points by roshanj 7 years ago | 115 comments

underrun 7 years ago |

Adrian Colyer dug into this a little further on the morning paper:

https://blog.acolyer.org/2019/04/18/keeping-master-green-at-...

His analysis indicates that what Uber does as part of its build pipeline is to break up the monorepo into "targets", create something like a Merkle tree for each target (which is basically what git uses to represent commits), and use that information to detect potential conflicts (multiple commits that would change the same target).

What it sounds like to me is that they end up simulating multirepo so that their build system can run tests on a batch of most likely independent commits. For multirepo users this is explicit in that this comes for free :-)

Which is super interesting to me, as it seems to indicate that optimizing a CI/CD system requires dealing with all the same issues whether it's mono- or multi-repo, and that the problems solved by your layout are replaced by a different set of problems that need to be resolved in your build system.
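A rough sketch of the target-hashing idea described above, not Uber's actual implementation: each target's hash covers its own files plus the hashes of its dependencies, so a change "affects" every target whose hash moves. The target names, file layout, and function names here are all made up for illustration.

```python
import hashlib


def target_hash(name, files, deps, contents, cache):
    """Hash a target from its own files plus the hashes of its dependencies."""
    if name in cache:
        return cache[name]
    h = hashlib.sha256()
    for path in sorted(files[name]):
        h.update(path.encode())
        h.update(contents[path])
    for dep in sorted(deps[name]):
        h.update(target_hash(dep, files, deps, contents, cache).encode())
    cache[name] = h.hexdigest()
    return cache[name]


def affected_targets(files, deps, base, change):
    """Targets whose hash differs once `change` is applied on top of `base`."""
    patched = {**base, **change}
    before, after = {}, {}
    return {
        t for t in files
        if target_hash(t, files, deps, base, before)
        != target_hash(t, files, deps, patched, after)
    }


def may_conflict(files, deps, base, change_a, change_b):
    """Two changes are potential conflicts if they affect a common target."""
    return bool(
        affected_targets(files, deps, base, change_a)
        & affected_targets(files, deps, base, change_b)
    )


# Hypothetical repo: "app" depends on "lib"; "docs" stands alone.
files = {"lib": ["lib.py"], "app": ["app.py"], "docs": ["README"]}
deps = {"lib": [], "app": ["lib"], "docs": []}
base = {"lib.py": b"v1", "app.py": b"v1", "README": b"v1"}

print(may_conflict(files, deps, base, {"lib.py": b"v2"}, {"app.py": b"v2"}))
print(may_conflict(files, deps, base, {"lib.py": b"v2"}, {"README": b"v2"}))
```

The lib change and the app change overlap because app's hash includes lib's hash; the lib change and the docs change do not, so they could be tested and merged independently.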

ori_b 7 years ago | |

> For multirepo users this is explicit in that this comes for free :-)

Only if you spend the time to build tools to detect commits in your dependencies, as well as your dependent repositories, and figure out how to update and check them out on the appropriate builds.

So, no, it doesn't come for free.

underrun 7 years ago | | |

Sorry, "this" is rather ambiguous.

You are totally correct that to achieve the same performance, correctness, and overall master level "green"ness in a multirepo system you would have to either define or detect dependencies and dependent repos, build the entire affected chain, and test the result. That part is much easier in monorepo.

What I was referring to with "this" is Uber's method of detecting potential conflicts. In multirepo land it would be a "conflict" if two people commit to the same repo. In multirepo, therefore, detecting potential conflict is trivial.

If Bob commits to repo A and Sally commits to repo B, their commits can't result in a merge conflict. Well, unless the repos are circularly dependent - which would be bad :-) don't do that. Of course, monorepo makes that situation impossible so there's an advantage for monorepo.

It seems like whether you have mono- or multi- the problems solved by one choice will leave other problems the build system has to solve that it wouldn't have to solve if the other option were chosen.

Different work would be required in multirepo but it would be work to solve the problems that monorepo solves just by virtue of it being a monorepo.

msangi 7 years ago | | |

Package managers solve it quite well. Just depend on the latest version of your dependencies and tag a new version whenever they change.

zb 7 years ago | | |

Not exactly for free, but there are free tools that handle this job for you very nicely:

https://zuul-ci.org/

sundargates 7 years ago | |

You are right to say that the conflict analyzer tries to treat commits independently based on the service or app (which are usually in separate repositories in a multi-repo world). However, note that the problem of conflicting changes (or a red master) exists even in a multi-repo world, as you could have one repository getting a large number of commits.

In fact, at Uber we have seen that behaviour with one of our popular apps when we did not have a monorepo. The construct of probabilistic speculation explained in the paper applies even in this scenario to guarantee a green master.

underrun 7 years ago | | |

Do you mean the construct of probabilistic speculation applies in multirepo because you may end up with a hot spot repo that receives a high volume of commits at once?

Or do you mean that multirepo could also benefit from the construct of probabilistic speculation by ordering commits across multiple repos such that you are maximizing the number of repos that have changed before you build and minimising the number of commits applied to single repos?

Or both :-)

ryanmarsh 7 years ago | |

Funny that you would draw a comparison to a Merkle tree. At one client they had such coupling between systems that CI/CD was nearly impossible without either an explosion of test environments or grinding everything to a near halt.

We began working with the idea of consensus based CI/CD. If you pushed a change, you published that to the network. It gave other systems the opportunity to run their full suite of tests against the deployment of your code. Some number of confirmations from dependent systems was required to consider your code "stable". This progressed nearly sequentially assembling something like a block chain.

Ultimately the client was unable to pull this off for the same reason they were unable to decouple the systems: lack of software engineering capability.

huac 7 years ago |

"Based on all possible outcomes of pending changes, SubmitQueue constructs, and continuously updates a speculation graph that uses a probabilistic model, powered by logistic regression. The speculation graph allows SubmitQueue to select builds that are most likely to succeed, and speculatively execute them in parallel"
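To make the quoted description a bit more concrete, here is a heavily simplified sketch of speculation, not the paper's actual model. It assumes each pending change succeeds independently with a known probability (hard-coded here, where SubmitQueue would get estimates from its logistic regression), enumerates every "change i built on top of some subset of earlier changes" build, and scores each by the probability that exactly that subset ends up merged. All names are hypothetical.

```python
from itertools import combinations


def speculative_builds(probs):
    """List (needed_probability, change_index, earlier_changes_assumed_merged).

    probs[i] is the assumed success probability of the i-th queued change.
    Enumeration is exponential, so this only suits tiny queues.
    """
    builds = []
    for i in range(len(probs)):
        earlier = range(i)
        for r in range(i + 1):
            for merged in combinations(earlier, r):
                p = 1.0
                for j in earlier:
                    # Build is only useful if this exact outcome happens.
                    p *= probs[j] if j in merged else 1.0 - probs[j]
                builds.append((p, i, merged))
    return builds


def pick_builds(probs, workers):
    """Choose the `workers` builds most likely to be needed."""
    return sorted(speculative_builds(probs), key=lambda b: -b[0])[:workers]


# Two queued changes, two build workers available.
for build in pick_builds([0.9, 0.8], workers=2):
    print(build)
```

With only two workers, the sketch builds change 0 alone and change 1 on top of change 0, and skips "change 1 without change 0" because that build only matters in the unlikely case that change 0 fails.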

This is either brilliant or just something built for a promotion packet

jl-gitlab 7 years ago |

We're building some similar tech at GitLab, though without the dependency analysis yet.

Merge Requests now combine the source and target branches before building, as an optimization: https://docs.gitlab.com/ee/ci/merge_request_pipelines/#combi...

Next step is to add queueing (https://gitlab.com/gitlab-org/gitlab-ee/issues/9186), then we're going to optimistically (and in parallel) run the subsequent pipelines in the queue: https://gitlab.com/gitlab-org/gitlab-ee/issues/11222. At this point it may make sense to look at dependency analysis and more intelligent ordering, though we're seeing nice improvements based on tests so far, and there's something to be said for simplicity if it works.

Scaevolus 7 years ago |

There's a nice middle ground between this and a one-at-a-time submit queue: have a speculative batch running on the side. This gives nice speedups (approaching N times more commits, where N is the batch size) with minimal complexity.

One useful metric is the ratio between test time and the number of commits per day. If your tests run in a minute, you can test submissions one at a time and still have a thousand successful commits each day. If your tests take an hour, you can have at most 24 changes per day under a one-at-a-time scheme.
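The arithmetic above, as a small sketch. The batched figure is an upper bound that assumes every batch passes, and the function names are made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def max_serial_commits_per_day(test_minutes):
    """Upper bound when changes are tested strictly one at a time."""
    return MINUTES_PER_DAY // test_minutes


def max_batched_commits_per_day(test_minutes, batch_size):
    """Upper bound when each test run covers a batch that always passes."""
    return max_serial_commits_per_day(test_minutes) * batch_size


print(max_serial_commits_per_day(1))       # one-minute tests
print(max_serial_commits_per_day(60))      # one-hour tests
print(max_batched_commits_per_day(60, 5))  # one-hour tests, batches of 5
```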

I worked on Kubernetes, where test runs can take more than an hour (spinning up VMs to test things is expensive!). The submit queue tests both the top of the queue and a batch of a few (up to 5) changes that can be merged without a git merge conflict. If either one passes, the changes are merged. Batch tests aren't cancelled if the top of the queue passes, so sometimes you'll merge both the top of the queue AND the batch, since they're compatible.

Here's some recent batches: https://prow.k8s.io/?repo=kubernetes%2Fkubernetes&type=batch

And the code to pick batches: https://github.com/kubernetes/test-infra/blob/0d66b18ea7e8d3...

Merges to the main repo peak at about 45 per day, largely depending on the volume of changes. The important thing is that the queue size remains small: http://velodrome.k8s.io/dashboard/db/monitoring?orgId=1&pane...
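A minimal sketch of the "top of the queue plus a side batch" scheme described above. This is not the actual Kubernetes code linked earlier: `conflicts_with` and `tests_pass` are hypothetical callbacks standing in for a git merge check and a full test run, and the greedy batch selection is an assumption.

```python
def pick_batch(queue, conflicts_with, max_batch=5):
    """Greedily pick up to max_batch changes that merge cleanly together."""
    batch = []
    for change in queue:
        if len(batch) == max_batch:
            break
        if all(not conflicts_with(change, picked) for picked in batch):
            batch.append(change)
    return batch


def run_round(queue, conflicts_with, tests_pass, max_batch=5):
    """Test the head of the queue and a batch; return the changes to merge."""
    if not queue:
        return []
    merged = []
    head = queue[0]
    if tests_pass([head]):
        merged.append(head)
    batch = pick_batch(queue, conflicts_with, max_batch)
    # The batch run is not cancelled when the head passes, so both can land.
    if tests_pass(batch):
        merged.extend(c for c in batch if c not in merged)
    return merged


# Hypothetical changes: (name, files touched). Overlapping files = conflict.
queue = [("a", frozenset({"x"})), ("b", frozenset({"x"})), ("c", frozenset({"y"}))]
overlap = lambda p, q: bool(p[1] & q[1])

print(run_round(queue, overlap, tests_pass=lambda changes: True))
```

In this example "b" is left out of the batch because it touches the same file as "a", so the round merges "a" and "c" and "b" waits for the next round.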

antimora 7 years ago |

I am still trying to wrap my head around a giant monolithic repo model instead of breaking code into multiple repos.

At Amazon, for example, they have multi repos setup. A single repo represents one package, which has a major version. Amazon's build system builds packages and pulls dependencies from the artifact repository when needed. The build system is responsible for "what" to build vs "how" to build, which is left to the package setup (e.g. maven/ant).

I am currently trying to find a similar setup. I have looked at nix, bazel, buck and pants. Nix seems to offer something close. I am still trying to figure out how to vendor npm packages and which artifact store is appropriate, and also whether it is possible to have the nix builder pull artifacts from a remote store.

Any pointer from the HN community is appreciated.

Here is what I would like to achieve:

1. Vendor all dependencies (npm packages, pip packages, etc) with ease.
2. Be able to pull artifacts from a remote store (e.g. artifactory).
3. Be able to override a package locally for my build purposes. For example, if I am working on a package A which depends on B, I should be able to build A from source and, if needed, build B from source so that A can use it for its own build.
4. Support multiple languages (TypeScript, JavaScript, Java, C, rust, and go).
5. Have each package in its own repository.

PKop 7 years ago | |

> At Amazon, for example, they have multi repos setup.

And didn't you find that this created massive headaches trying to build many disparate and inconsistent dependencies across repos? I think the benefits touted from mono-repos are exactly illustrated by the pain points working with Amazon's multi repo setup, in my opinion.

https://danluu.com/monorepo/

"Refactoring an API that's used across tens of active internal projects will probably a good chunk of a day."

This was my experience.

awinder 7 years ago | | |

How often have you interacted with “hot” packages that both change rapidly and are high dependency? Haven’t worked at Amazon, but in my experience that’s been a low occurrence, or a reason to build an evolving API rather than break the API.

I’m just curious, but in fairness both of these schemes have obvious issues that will become headaches or positive design depending on your outlook. Clearly you can engineer effectively in either scheme.

chairleader 7 years ago |

Quite a premise: "Giant monolithic source-code repositories are one of the fundamental pillars of the back end infrastructure in large and fast-paced software companies."

huac 7 years ago | |

Facebook, Google, Airbnb, Quora, and many more all use a monorepo.

Obviously there are many others who do not use a monorepo (Amazon comes to mind), but it's reasonable to claim that monorepos are widely used, and fundamental where they are used.

jhenkens 7 years ago | | |

Microsoft uses it for Windows as well, which was so large they wrote their own git filesystem to power it.

vruiz 7 years ago | | |

Does anybody know what these companies' development environments look like? I know about Piper at Google, but how do the rest manage? Does every single engineer have the entire monorepo on their machine?

venantius 7 years ago | | |

Airbnb uses a monorepo for JVM-based projects, but most of Airbnb's code, at least as of mid-2017, was not run on the JVM and was hosted multi-repo.

richardwhiuk 7 years ago |

Anyone fancy comparing this to bors?

sundargates 7 years ago | |

Actually we have compared it in our paper.

Bors builds one change at a time. On the other hand, Submit Queue speculatively builds several changes at a time based on the outcomes of other pending changes in the system. Apart from that, Submit Queue uses a conflict analyzer to find independent changes in order to commit changes in parallel as well as trim the speculation graph.

We have also evaluated the performance of Single-Queue (the idea behind Bors) on our workloads. In fact, as described in the paper, the performance of this technique at scale was so poor (~132x slower) that we omitted its results. Submit Queue, on the other hand, operates in the 1-3x range compared to an optimal solution.

I recommend reading the paper here for further details. https://dl.acm.org/citation.cfm?id=3303970

richardwhiuk 7 years ago | | |

> Bors builds one change at a time.

Bors builds multiple changes at once (it creates a merge commit of all available changes and then runs the tests on all of them), and merges if all of them are good.

Possibly you are thinking of the older bors, as opposed to modern bors-ng?

drodgers 7 years ago | |

The main difference is in the conflict-detection system. Whereas bors only has a single queue, this new system can have one queue for each set of changes which doesn't interact with any other set. E.g. if you've got an iOS app, a webapp, and a bunch of documentation all in the same repo, then this system will automatically work out that changes to each of those independent projects can be tested and merged in parallel, because they can't possibly conflict.

It relies on understanding the inputs and outputs for all CI build steps to work out how changes to particular files might conflict.

Also, it has a much more sophisticated understanding of how likely a change is to be the source of failure, which it updates in response to repeated test runs. It can then prioritise the changes which are most likely to succeed.
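A small sketch of the "one queue per independent set of changes" idea, assuming you already know which build targets each pending change affects. It groups changes with a union-find so that any two changes sharing a target land in the same queue. The change and target names are hypothetical, and this is not how the paper's conflict analyzer is actually implemented.

```python
def independent_queues(changes):
    """Group changes so that changes touching a common target share a queue.

    `changes` maps a change name to the set of targets it affects.
    """
    parent = {name: name for name in changes}

    def find(x):
        # Follow parent links to the group representative, compressing paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    owner = {}  # target -> first change seen touching it
    for name, targets in changes.items():
        for target in targets:
            if target in owner:
                parent[find(name)] = find(owner[target])
            else:
                owner[target] = name

    queues = {}
    for name in changes:
        queues.setdefault(find(name), []).append(name)
    return sorted(queues.values())


pending = {
    "c1": {"ios"},
    "c2": {"web"},
    "c3": {"ios", "docs"},
    "c4": {"docs"},
}
print(independent_queues(pending))
```

Here c1, c3, and c4 end up in one queue (c3 links the iOS and docs targets), while c2 gets its own queue and can be tested and merged in parallel.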

richardwhiuk 7 years ago | | |

Is the logic for which files trigger which queue determined automatically or manually?

shimont 7 years ago |

I think that what works for companies like Uber/Google/Facebook is not applicable to the rest of the Fortune 500 or to most other companies.

Disclaimer: I am one of the Datree.io founders. We provide a visibility and governance solution to R&D organizations on top of GitHub.

Here are some rules and enforcement around Security and Compliance which most of our companies use for multi-repo GitHub orgs.

1. Prevent users from adding outside collaborators to GitHub repos.
2. Enforce branch protection on all current repos and future created ones - prevent master branch deletion and force push.
3. Enforce pull request flow on default branch for all repos (including future created) - prevent direct commits to master without pull-request and checks.
4. Enforce Jira ticket integration - mention ticket number in pull request name / commit message.
5. Enforce proper Git user configuration.
6. Detect and prevent merging of secrets.

jonthepirate 7 years ago |

Having been at both Lyft and DoorDash where I've been an engineer responsible for unit test health, I decided to do a side project called Flaptastic (https://www.flaptastic.com/), a flaky unit test resolution system.

Flaptastic will make your CI/CD pipelines reliable by identifying which tests fail due to flaps (aka flakes) and then give you a "Disable" button to instantly skip any test which is immediately effective across all feature branches, pull requests, and deploy pipelines.

An on-premise version is in the works to allow you to run it onsite for the enterprise.

roskilli 7 years ago | |

I don't want to come across as negative, but just an observation and to play devil's advocate - wouldn't it be better to fix the flaky test or delete it entirely instead of building a feature to disable it during a test run in an automated fashion?

Whenever our team has a significant number of flakey tests (more than 1-2) we usually schedule a bug squash session to fix them and amortize the cost over the whole team.

jonthepirate 7 years ago | | |

What you really want to do is first disable a test you know is unhealthy to unblock everybody. Then, you fix it. After you've reintroduced it healthy, you can turn it back on.

viklove 7 years ago | | |

Best practice is actually just to disable all tests that are failing. Can't hold up our sprint deadlines!

cjfd 7 years ago |

A possible complication would occur if there are tests that occasionally fail.

revskill 7 years ago |

What exactly is a monolithic? Is it only related to the codebase (monolithic vs monorepo)? Or is it about runtime, like microservices vs monolithic?

jade12 7 years ago | |

From the first sentence of the abstract:

> monolithic source-code repositories

A monorepo is a monolithic repository

ricardobeat 7 years ago | | |

To answer the parent, it doesn’t imply a monolith application, but deployment to multiple server roles and apps will happen using the same source repository.

techmortal 7 years ago |

How common is this in the industry? Do multirepos run on a batch?

7e 7 years ago |

Is this novel? Other companies have had this for ages.