Google Is 2B Lines of Code, All in One Place (wired.com)
I wonder how much time it takes to clone the repo, provided they use git.
Still, this is not a very forward-thinking solution. Building and combining microservices – effectively UNIX philosophy applied to the web – is the most effective way to make progress.
EDIT: Seems like I misunderstood the article – from the way I read it, it sounded like Google has a monolithic codebase, with heavily dependent products, deployed monolithically. As zaphar mentioned, it turns out this is just bad phrasing in the article and me misunderstanding that phrasing.
I take everything back I said and claim the opposite.
Or think about April 1st, when they set an Access-Control-Allow-Origin: * header on google.com because someone wrote the com.google easter egg.
Read the post from the SoundCloud dude from yesterday to find out how to do software management properly (hint: modularization is everything)
If that’s not the case, I apologize for misunderstanding it.
But if it was the case, I wanted to state that it might not be wise, for the same reasons as this thread mentioned https://news.ycombinator.com/item?id=10195423
EDIT: Thanks for telling me, though! Always nice to be proven wrong, as at least I learnt something today :D
You should think of Piper as a single filesystem which permits atomic multi-file edits. And that's about it; there's nothing in that which forces any particular release structure on you.
The issue was that they wanted to load the page – with the user logged in, etc – on com.google. For this they implemented an explicit URL parameter that would allow this.
The horror!
Neither HN search nor Google search shows anything for "modularization is everything".
http://philcalcado.com/2015/09/08/how_we_ended_up_with_micro...?
(That question mark is part of the URL)
It explains how modularization can help enormously even in a small company. Now look at Google, where some issues (like the google.com April 1st XSS issue) were only fixed after outsiders mentioned them.
Usually the team responsible for that part should have caught that internally.
What's the experience like for teams not running a Google service and instead interacting with external users and contributors, e.g. the Go compiler or Chrome?
Smaller stuff (like, say, tcmalloc or protocol buffers) is usually hosted in Piper and then mirrored (sometimes bidirectionally) to an external repository (usually GitHub these days).
So in these scenarios, what does Google's infrastructure buy you, if anything? And if it doesn't buy you anything, how does that influence Google culture? Are teams less willing to do real open development due to infrastructure blockage?
In general, you change the heavily depended-on library and all of its consumers (probably over time in multiple changes, but you can do it in one go if it really needs to be a single giant change).
There are also tools for making large scale changes safely and quickly.
How do you guys manage alerts and messages? Does every developer get a commit notification, or is there a way to filter messages by submodule?
How does branching and merging work?
I'm wondering what processes are used by non-Google/FB teams to help them be more productive in a monolithic repo world.
As for notifications, the CL has a list of reviewers and subscribers. If you want to see code changing, you watch those CLs. Most projects have a list where all submitted CLs go.
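To make the directory-based filtering asked about above concrete, here is a minimal Python sketch of how a CL's notification list might be assembled: its reviewers, its subscribers, plus anyone watching a directory the CL touches. The `watch_rules` map, the names, and the function itself are hypothetical illustrations, not Google's actual tooling.

```python
def recipients(changed_files: list[str], reviewers: list[str],
               subscribers: list[str], watch_rules: dict[str, list[str]]) -> set[str]:
    """Collect everyone who should hear about a CL.

    watch_rules is a hypothetical map of directory -> watchers.
    """
    people = set(reviewers) | set(subscribers)
    for path in changed_files:
        for directory, watchers in watch_rules.items():
            # Normalize to "dir/" so "maps" does not also match "mapsx/...".
            prefix = directory.rstrip("/") + "/"
            if path.startswith(prefix):
                people.update(watchers)
    return people
```

For example, a CL touching `search/ranking/score.cc` reviewed by `alice` with `bob` subscribed would also reach whoever is listed under `search/` in `watch_rules`.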
If you would like to see how things would work with submodules that behaved just like files behave (full distributed workflow) we've got a (unfortunately commercial) solution here:
Are libraries and large projects e.g. RDBMS generally vendored/forked into the monolithic repositories, regardless of whether the initial intent is to make significant changes?
So, for source deliveries:
third_party/apache/httpd/2.4/release.tgz
third_party/apache/httpd/2.4/patch.tgz
third_party/apache/httpd/2.4/Makefile (or other config)
third_party/apache/httpd/2.2/release.tgz
third_party/apache/httpd/2.2/patch.tgz
...
https://chromium.googlesource.com/chromium/src.git/+/master/...
Looks like there is a Quora question that mentions this too: https://www.quora.com/How-many-Google-employees-can-read-acc...
FTA:
> There are limitations this system. Potvin says certain highly sensitive code—stuff akin to the Google’s PageRank search algorithm—resides in separate repositories only available to specific employees.
The vast majority of code is visible to everyone, though.
"Potvin says certain highly sensitive code—stuff akin to the Google’s PageRank search algorithm—resides in separate repositories only available to specific employees."
Although I kind of doubt that "almost every" engineer has access to the entire repo, especially when it comes to the search ranking stuff.
If you start centralizing your development you’re killing any type of collaboration with the outside world and discouraging such collaboration between your own teams.
http://code.dblock.org/2014/04/28/why-one-giant-source-contr...
Piper grew out of a need to scale the source control system that the initial internal repositories were using.
code.google.com was a completely separate thing supporting completely different version control models, and a very different scale (very large number of small repositories, vs very small number of very large repositories)
When we're not talking about the sensitive stuff, there's not much magic to what many engineers write every day, it's the same "glue technology X to technology Y" stuff you see everywhere, so I don't think there's any value to hiding that in the name of secrecy.
> (...) all 2 billion lines sit in a single code repository available to all 25,000 Google engineers.
The most famous corporate trade secret, the Coke formula, was stolen by two employees who attempted to sell it to Pepsi. Pepsi alerted Coke, the companies worked together to bring in the FBI, and both employees went to prison: http://www.cnn.com/2007/LAW/05/23/coca.cola.sentencing/
"There are limitations this system. Potvin says certain highly sensitive code—stuff akin to the Google’s PageRank search algorithm—resides in separate repositories only available to specific employees."
There is a solution for project/directory-level CC / review requirements. I didn't see it discussed in the talk, though.
It's true that you don't know about all callers if you're working on open source software. There's no magic there; you need to think about backward compatibility. (On the other hand, if it's a library, your open source users can usually choose to delay upgrading until they're ready, so you can deprecate things.)
The main advantage for an open source project is that, though you don't know about all callers, you still have a pretty large (though biased) sample of them. If you want to know how people typically use your APIs, it's pretty useful. Running all the internal tests (not just your own, but other people's apps and libraries) will find bugs that you wouldn't find otherwise.
There were changes I wouldn't have been confident making to GWT without those tests, and bugs that open source users never saw in stable releases because of them. On the other hand, there were also changes I didn't make at all because I couldn't figure out how to safely upgrade Google, or it didn't seem worth it.
You can definitely see that Google uses a completely different build system than mainstream Go by the state of the mainstream Go build system though.
Their source claims that Windows XP has ~45 million lines of code. But that was 14 years ago. The last time Windows was even in the same order of magnitude as 50 million LOC was in the Windows Vista timeframe.
EDIT: And, remember: that's for _one_ product, not multiple products. So an all-flavors build of Windows churns through a _lot_ of data to get something working.
(Parenthetical: the Windows build system is correspondingly complex, too. I'll save the story for another day, but to give you an idea of how intense it is, in a typical _day_, the amount of data that gets sent over the network in the Windows build system is a single-digit _multiple_ of the entire Netflix movie catalog. Hats off to those engineers, Windows is really hard work.)
OTOH you can't blame them for being incorrect if you (as in, Microsoft, not you personally) are being so secretive about the figures. I'm pretty sure everyone would love to see how Microsoft works internally, especially now that you teased us with that Windows build system.
If Microsoft has close to the same amount of code in a single repository, then they must have also written their own version control service that runs on more than one machine.
The last rumor I heard is that Microsoft bought a license to the Perforce source code and created their own flavor to host internal code ("Source Depot"?), which presumably still runs on a single machine.
Strange unit of comparison, although I may start using it.
We should have a list of these things.
Part of my job, although it's not listed as a responsibility, is updating a few key scientific Python packages. When I do this, I get immediate feedback on which tests break, and I fix those problems for other teams alongside my upgrades. This sort of continuous integration has completely changed how I view modern software development and testing.
I'm wondering if microservices force you to coordinate via the codebase, just like using many codebases forces you to coordinate via the monolithic service. Does the coordination have to happen somewhere? I wonder if early adopters of microservices in many codebases (SoundCloud) are experiencing coordination problems when trying to change services.
The bad news was that it allowed people to say "I've just changed the API to <x> to support the <y> initiative; code released after this commit will need to be updated" and have that affect hundreds of projects. At the same time, though, the project teams could adapt very quickly, with the orb on their desk telling them that their integration and unit tests were passing.
I thought to myself, if there is ever a distributed world wide operating system / environment, it is going to look something like that.
I'd say the major downside was that this approach basically required a 'work only in HEAD' model, since the tooling around branches was pretty subpar (more like the Perforce model, where branches are second-class citizens). You could deploy from a branch but they were basically just cut from HEAD immediately prior to a release.
This approach works pretty well for backend services that can be pushed often, but it's a bit of a mismatch for mobile apps, where you want more carefully controlled, manually tested releases given the turnaround time if you screw something up (especially since it's really inefficient to write useful automated tests around UI). It's also hard to collaborate on long-term features within a shipping codebase, which hurts exploration and prototyping.
I have problems with Windows, but I think it's the fastest desktop OS, mostly because its graphics stack is by far the best of all. Running number-crunching C code is exactly the same on Windows or Linux. (See all the benchmarks on the Internet.)
Why Mercurial instead of Git?
For example, Android and Chrome are Git-based.
Note also that when codesearch used to crawl and index the world's code, it was not actually that large. It used to download and index tarballs, svn and cvs repositories, etc.
All told, the amount of code in the world that it could find on the internet a few years ago was < 10b lines, after deduplication/etc.
So while you may be right or wrong, I don't think it's as obvious that you're right as you think it is.
I remember seeing an internal page with dozens of links for humorous searches like "interger", "funciton", or "([A-Z][a-z]+){7,} lang:java"...
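For anyone who wants to play with that last query: the regex part flags identifiers glued together from seven or more capitalized words. A quick Python sketch of just that part (the `lang:java` filter is a Code Search operator, not regex, so it isn't reproduced; the group is written as non-capturing `(?:...)`, which doesn't change what matches). The test identifier is a made-up joke name, not a real class.

```python
import re

# Seven or more consecutive Capitalized words, i.e. a very long CamelCase name.
LONG_CAMEL = re.compile(r"(?:[A-Z][a-z]+){7,}")

def long_identifiers(line: str) -> list[str]:
    """Return every run of 7+ CamelCase words found in a line of source."""
    return [m.group(0) for m in LONG_CAMEL.finditer(line)]
```

`long_identifiers("class LoginPageBuilder {}")` finds nothing (only three words), while a seven-word name like `ProjectPotatoLoginPageBuilderFactoryObserver` is returned whole.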
Yeah this one was my favorite of the code search examples, there are some really good ones in there.
(But as great as Google Code Search was, my grudge is because of Reader.)
Maybe they are counting everything they use. Somewhere among those 2B lines is all the source code for Emacs, Bash, the Linux kernel, every single third-party lib used for any purpose, whether patched with Google modifications or not, every utility, and so on.
Maybe this is a "Google Search two billion" rather than a conventional, arithmetic two billion. You know, like when the Google engine tells you there are "about 10,500,000 results (0.135 seconds)", but when you go through the entire list, it turns out to be just a few hundred.
So employees modify about 120 lines a day. If we imagine linear growth to 25K coders over 17 years, with 250 workdays a year, they hired around 6 more coders each workday, for about 53M man-days in total, so around 6.4G LOC modified. But modified != added, so I wouldn't believe these are all their own lines.
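The same back-of-the-envelope estimate, written out as a quick Python check. The inputs are the thread's own figures (15M lines changed per week, 25,000 engineers, 17 years, 250 workdays a year); the 5-day week and headcount growing linearly from zero are assumptions.

```python
LOC_CHANGED_PER_WEEK = 15_000_000
ENGINEERS = 25_000
WORKDAYS_PER_WEEK = 5     # assumption
YEARS = 17
WORKDAYS_PER_YEAR = 250

# 15M / 25K = 600 lines per engineer per week, i.e. 120 per workday.
loc_per_engineer_day = LOC_CHANGED_PER_WEEK / ENGINEERS / WORKDAYS_PER_WEEK
workdays = YEARS * WORKDAYS_PER_YEAR                  # 4,250 workdays
hires_per_workday = ENGINEERS / workdays              # roughly 5.9
# Linear growth from zero: average headcount is half the final headcount.
engineer_days = ENGINEERS / 2 * workdays              # 53,125,000
loc_modified = engineer_days * loc_per_engineer_day   # 6,375,000,000
```

Halving the final headcount is what turns 25K engineers times 4,250 days into roughly 53M engineer-days rather than 106M.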
I also wonder if there's a circular relationship anywhere in there.
Having a massive code base isn't a badge of honor. Unfortunately in many organizations, people are so sidetracked on the next thing that they almost never receive license to trim some fat from the repository (and this applies to all things: code, tests, documentation and more).
It also means almost nothing as a measurement. Even if you believe for a moment that a "line" is reasonably accurate (and it's tricky to come up with other measures), we have no way of knowing if they're measuring lots of copy/pasted duplicate code, massive comments, poorly-designed algorithms or other bloat.
It seems to be in a reasonable order of magnitude for C++/Java-type languages compared to projects that I have seen, but it does imply a significant chunk of code that is not actively being worked on for a long time (which is not necessarily a bad thing - don't change a running system and all that).
With regard to copy/pasted duplicate code and massive comments, we do have ways of knowing that as both of those are easily computable. Duplicate code can be matched using hashes and comments are delimited, making their measurement easy.
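A rough sketch of what that could look like in Python, assuming C++/Java-style `//` line comments only (block comments are ignored). It hashes every window of k consecutive code lines rather than single lines, since single-line hashes would flag every lone `}` as duplicated code; k=3 is an arbitrary choice, and real clone detectors are considerably more careful.

```python
import hashlib
from collections import Counter

def measure(files: dict[str, str], k: int = 3) -> dict[str, int]:
    """Crude size metrics: non-blank lines, '//' comment lines,
    and duplicated windows of k consecutive code lines."""
    total = comments = 0
    windows: Counter = Counter()
    for text in files.values():
        code: list[str] = []
        for raw in text.splitlines():
            line = raw.strip()
            if not line:
                continue
            total += 1
            if line.startswith("//"):
                comments += 1
            else:
                code.append(line)
        # Hash every window of k consecutive code lines in this file.
        for i in range(len(code) - k + 1):
            window = "\n".join(code[i:i + k])
            windows[hashlib.sha256(window.encode()).hexdigest()] += 1
    # Every occurrence of a window beyond its first counts as a duplicate.
    duplicates = sum(n - 1 for n in windows.values() if n > 1)
    return {"lines": total, "comments": comments, "duplicate_windows": duplicates}
```

Two files that share the same three statements, one with an extra `// note` line, report 7 lines, 1 comment, and 1 duplicated window.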
It's like comparing the weight of a monster truck and the total weight of all the cars at a dealership...
(15 million lines of code changed a week) / (25,000 engineers) = 600 LOC per engineer per week
Is ~120 LOC per engineer per workday normal at other companies?
Would be interesting to know what percentage of the total LoC touched are typically from that kind of automated refactor. Depending on the codebase, you can touch a ton of lines of code in a very small amount of time with those tools.
- What is the disk size of a shallow clone of a repo (without history)?
- Can each developer actually clone the whole thing, or you do partial checkout?
- Does the VCS support checking out a subfolder (AFAIK Mercurial, like Git, does not support it)?
- How long does it take to clone the repo / update the repo in the morning?
Since people are talking about huge across-repo refactorings, I guess it must be possible to clone the whole thing.
Facebook faces similar scaling issues to Google's, so they wrote some Mercurial extensions, e.g. for cloning only metadata instead of the whole contents of each commit [1]. It would be interesting to know what exactly Google modified in hg.
[1] https://code.facebook.com/posts/218678814984400/scaling-merc...
Your last point is the only one that applies. If you want your view to advance from revision 123 to revision 125 it takes about a second to do so. If you have pending (not yet submitted) changes in your client, they might have to be merged with other changes, which can take a bit longer. If you have a really huge pending change, and your client is way behind HEAD, it might take a few tens of seconds to merge everything.
This model precludes offline work, of course. But that's not much of a problem in practice.
IMO this system would best be suited for large companies, but I could see the VCS that they are developing being used by anyone if it gets a github-esque website.
Steve Yegge, from Google, talks about the GROK Project - Large-Scale, Cross-Language source analysis. [2012] https://www.youtube.com/watch?v=KTJs-0EInW8
Working on the google codebase is pretty awesome. This week I've made changes/improvements to libraries owned by three different teams (not counting my own), in C++ and python, when my main project is all in Java. It's super fun. The code search tool is great - it's ridiculously fast, and makes navigating through the codebase very easy.
Also, is the code that you add to the repository always inspected by other people? Is that done systematically?
It varies a lot by project and by how widely used the code is.
There are few (if any) strictly enforced rules for documentation or invariant checking. You basically have to convince at least one other engineer that what you have is sufficient.
The documentation is _generally_ pretty decent for core libraries, but sometimes you just have to read the code.
* Google X - moonshot projects that aren't software centric.
* Google Fiber - mainly an infrastructure setup. May share some stuff in Piper.
* Google Ventures - investment arm, not code related.
* Google Capital - more investment stuff.
* Calico - R&D for biotech.
Anything that would be software centric will probably still live under Google inc. As well, you are forgetting so many products that Google has. There are various lists of them out there [0][1].
(Just one example of how it's useful even when things are mostly services.)
Whether the CODE BASE is monolithic or not is orthogonal to the repository's nature. I was at G for a couple of years, and I'd say they've done an OK job of breaking things up into libraries and services. Certainly some interfaces have done a better or worse job of setting the code up for open sourcing, but because of the nature of the repository, large-scale refactoring is more efficiently accomplished.
The biggest advantage seems to be that when you are an author of a dependency you can propose upgrades to all services that use your application. It is not clear to me but it seems that for small changes you can just force that change on the code owners. This ensures that the dependency author incurs the cost of a change (as is done for API changes in the Linux kernel) and that you do not need to version the API of the dependency.
Interestingly, Google recently started marking APIs private by default. So they are moving in the direction of explicit API management.
As soon as you work with people that are outside your control (as is common in open source) you would need to version the API as well in my opinion.
Here I am, ready to deploy some new feature to Gmail. In the meantime, I'm getting a steady stream of API changes. Can I build and release Gmail at a specific revision number and only incorporate the changes when I'm ready, or are all releases essentially cut from the tip of the tree?
I don't need specifics, just the general idea. Where we work, basically every project lives in its own branch, which makes it essentially impossible to synchronize changes. Things have to get merged to trunk, then pulled into the appropriate branch, and I don't like it at all.
Do you have version numbers for libraries/components (project X uses version 1.5.4 of Y, 32.4.18 of Z, and so on), do you pull by revision number, or are you all on tip?
It'd be very interesting to hear how you manage this.
When you write a feature, you follow these steps.
1. Write code and submit it to the main (and only) branch. Hide your feature behind a flag.
2. Releases happen from HEAD (with no regard to your feature).
3. When you want to enable your feature, you flip a flag to turn it on.
In practice: Things are messier.
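The flag-guarded flow in those steps can be sketched in Python roughly like this. `Flags`, `render_inbox`, and the `new_inbox_ui` flag name are hypothetical stand-ins for whatever runtime configuration system is actually used; the point is only that the new code path ships dark in every release and is enabled without a new build.

```python
from dataclasses import dataclass, field

@dataclass
class Flags:
    """Hypothetical runtime flag store, flipped without cutting a new release."""
    values: dict = field(default_factory=dict)

    def enabled(self, name: str) -> bool:
        # Unknown flags default to off, so unfinished features stay dark.
        return self.values.get(name, False)

def render_inbox(flags: Flags) -> str:
    # Both code paths ship in every release cut from HEAD.
    if flags.enabled("new_inbox_ui"):
        return "new inbox"
    return "old inbox"
```

Flipping the flag (`flags.values["new_inbox_ui"] = True`) switches behavior in an already-deployed binary, which is step 3 above.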
The Linux kernel generally uses this policy for internal APIs for example.
> The solution to the excessive API change problem is to force
> whoever changes the API to fix all the consumers himself
> before the change is accepted.
This doesn't seem scalable. Consider one API endpoint being changed by one developer to add a new parameter to a function call, and assume this impacts hundreds of projects. Does it really make sense to make one developer update those hundreds of projects? Not only will it take forever to finish (possibly never, if new consumers of this API come online frequently), but the developer of the core API may not have any experience with the impacted consumers. I think the end result of this policy would be that nothing, once written, would ever get updated, and new APIs would just be added all the time (API explosion).
I mean, it's presumably impossible to have a single computer running a single OS build all of the Google software and run the testing.
Having people unaware of a project's purpose making changes to its code sounds like a nightmare to me.
Anymore info on this? Is it in house hardware?
Edit: And for those that are just shocked that git isn't the answer.
Facebook: https://code.facebook.com/posts/218678814984400/scaling-merc...
Google: http://www.primordia.com/blog/2010/01/23/why-google-uses-mer...
Mercurial was pushed internally as being the "better" (for some dimension of better) between it and Git back in 2010, but I think even the most hardline Mercurial fans have realized that in order to meet developers in the middle in 2015, we need to use Git for our open-source releases. We have a large investment in Gerrit [1] and Github [2] now.
So the Mercurial comment is probably entirely based on scaling and replacement for the Piper Perforce API, rather than anything externally facing.
[1] https://www.gerritcodereview.com/ [2] https://github.com/google
One thing mentioned in the paper, but not here, is that there are teams that live in Git, such as Android and Chrome, but they are not monolithic.
I know there are various workarounds for dealing with large repos but having had some experience using git on big-ish projects I can certainly understand some possible reasons why it wasn't their first choice given the size of their codebase.
Git isn't so awesome that it's inconceivable that people would be willing to use something else.
It's a legitimate question, damn it. Especially if the chosen solution is not mainstream.
You don't pick non-mainstream solutions "just because". You do pick mainstream solutions "just because". As in, "just because" they are tried and true, "just because" it's familiar to devs, etc.
So let me ask the reverse question: Is that what the HN crowd [sic] has turned into? You can no longer ask the reasons behind your tech choices?
Inconceivable? No, I just asked why. Git to me is a great solution for source control, so I want to understand its deficiency in handling large amounts of code, especially considering it is used for managing the Linux source and has handled everything I've used on GitHub quite well.
E.g., ProjectPotatoLoginPageBuilderFactoryObserver.
(Disclaimer: I just made it up. Not an actual Google project name.)
"lang:java" is not part of the regexp; it's a Google Code Search extension that restricts results to Java.
It's a nice environment to work in. In addition to hastening Noogler onboarding, it also increases employee retention. If you are an expert in your project's codebase but get burned out, you can easily transfer to another project and be almost immediately productive.
Obviously, there's domain-specific knowledge that doesn't transfer easily or quickly from project to project. But that's quite different from self-inflicted code fragility; one's an asset and the other's a liability.
Git and HG are not. Not that they can't be learned, but the learning curve is much higher.
Piper and CitC (mentioned in Rachel Potvin's video) are even more advanced: I can work from my desktop machine, then go and continue from my laptop, or open and edit files directly from Critique in an internal web-based editor.
Key to all this: make it accessible through the web! Anyone can then work from almost any machine. But at the same time the sources are accessible in the file system, which is even more awesome, so I can use Emacs and whatever other tools are there.
There's also internal-only things to deal with, such as the CitC integration. Because of some of the design decisions behind CitC, storing a typical .git or a .hg directory in CitC is essentially impossible. Mercurial's .hg directory is intentionally a black-box - you interact with the repo using the hg tool. The .git directory can be seen as an API - there are at least three implementations that matter for our purposes, and if we change something in the .git directory, we lose the editor/IDE integration powered by the ones we didn't fix (or we did fix, but they haven't been released yet; some of those are linked into products that have a commercial release cycle).
If you need to cite numbers, at least give the context, instead of pulling some number out of the blue from 15 years ago and assuming it's still the same Windows.
If it was any other way you'd rapidly reach a useless equilibrium where random engineers were demanding that thousands of other engineers fulfill unfunded mandates for what might turn out to be negligible benefits.
Not that it's a silver bullet, but it can make a lot of these cases non-issues.
* I can see literally _every use_ of the old function.
* I can run the tests for everyone who uses that function.
  * This is automated; the build/test tooling can figure out the transitive set of build/test targets that are affected by such a change.
* I can (relatively) easily update _every use_ of the deprecated call with the new hotness
* I can do that all within the same commit (or set of commits, realistically)
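The "transitive set of affected targets" step can be sketched as a breadth-first walk over the reverse dependency graph. This is a minimal Python illustration, not Google's tooling; the Bazel-style target labels and the graph are made up (Bazel users can get similar answers from `bazel query` with `rdeps`).

```python
from collections import deque

def affected_targets(deps: dict[str, list[str]], changed: set[str]) -> set[str]:
    """Return the changed targets plus every target that transitively depends on them.

    deps maps each target to the targets it directly depends on.
    """
    # Invert the graph: for each target, which targets depend on it?
    rdeps: dict[str, set[str]] = {}
    for target, direct in deps.items():
        for d in direct:
            rdeps.setdefault(d, set()).add(target)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for user in rdeps.get(current, ()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen
```

With `//net:http` depending on `//base:strings`, and `//gmail:server` on `//net:http`, a change to `//base:strings` pulls in the Gmail server and its tests, while an unrelated `//maps:server` is left alone; filtering the result down to test targets gives the set to run before submitting.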
---
None of this is impossible with multiple repos, it's just a lot more difficult to coordinate
It's sort of like a partial clone, you can clone any subset you want, work on it, add in other parts, the tool takes care of making sure the stuff you add in is lined up (if you cloned a week ago and you add in another repo, it's rolled back so it matches time/commit wise).
If you want to search for something it's the same command in a collection of repos as it is in a monolithic repo:
bk -U grep refactor_me
but truth in advertising, that searches only what you have populated.
The google answer is "ship grep to the data" so they'll search everything.
Google wins if you have their datacenter. We win on UI; it's the same everywhere.
Our design is much more tightly coupled than Git's submodules. We manage the subrepos so that they are in sync just like files are in sync. What I mean by that is if you have two files modified in the same commit, you can't pull that commit and only get one of the files updated; they are both updated because that's what happened in that commit. We've provided the same semantics for collections of repositories. Git doesn't; getting those semantics is an exercise left to the user.
We get better performance because you can clone as little as one repository, what we call the product. It would be easy and fun to put a Unix distro in this system and have the top repo just have a makefile and you type make X11 and it populates the compiler, libc, the X11 sources, and builds them.
It's commercial so maybe that's uninteresting but if you want submodules that work and scale in performance, we might be worth a look (sorry for the marketing, if that's against the rules here then I'll learn how to delete a post).
Read more here, comments welcome, it's beta copy:
Bazel test[1] doesn't provide dependency testing as far as I know, but it creates the framework to support doing it.
The commands are still basically the same as Google's wrappers around Perforce and the learning curve was non-existent for someone who was used to using Perforce at Google.
How are conflicts managed?
Local branches for development, a single HEAD for merging code.
Such as: "Hacker News has an Ask dot com userbase of number of good posters" (This obviously does not include me ;))
Assume the code is sorted well (it is). Why does breaking them into "proper" repositories help anything?
What does "proper" even mean when tons of stuff is shared?
You act as if there is some obvious split that if they had "done it the right way" they would have done. That is 100% not obvious to me.
How do you demarcate these lines, maintain things across these boundaries, etc.
If I wanted to change something fundamental, like I found a 10% speedup in Protobuf wire decode by changing the message slightly, there are likely very many services that all need it.
Everyone at Google operates on HEAD. You're not allowed to break HEAD, and pre-submit/post-submit bots ensure you don't and will block your submit.
I admit that I'm not an expert at large software development, but this seems to nearly fully explain Google's declining code quality.
An astute observer may be able to come up with some plausible reasons (at least for GLS) as to why such an arrangement would be desirable.
As far as splitting up the infrastructure, we're still going to depend on Google infrastructure and use their services where it makes sense.
are you in CO?
Hence why his post is being downvoted.
nationwide wireless network
to obtain, e.g., the latest stock prices. Tests would run on all the clients and, since in my workspace the server was updated simultaneously, I could be more sure it would work.
Think python 2->3 or Angular 1->2. These types of changes do happen, and I bet they happen at Google. I don't think anyone is rewriting a downstream app when they make these changes. Most likely they are doing something like forking the library and renaming it, which is just another form of versioning.
https://isocpp.org/blog/2015/05/cppcon-2014-large-scale-refa...
It also encourages writing tools for automating this stuff. A big part of the motivation for polyup (https://github.com/PolymerLabs/polyup) was the fact that we'd be responsible for getting people to upgrade.
https://selenic.com/hg/log?rev=@google.com&revcount=200
I always see a lot of Google and Facebook at the Mercurial sprints.
But if I were to take a poll of the informed and opinionated engineers sitting around me, they would almost all likely take git over hg.
In any case, git is in actual widespread use _now_ at Google, for Android and Chrome, and other open source stuff (some of which was moved recently from code.google.com to github).
"is same order of magnitude as" is not a transitive relationship.
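A small worked example, assuming "same order of magnitude" means "within a factor of 10 of each other" (the line counts are made up): two steps of 8x and 7.5x each stay inside the bound, but together they compound to 60x. Under the other common definition, "same power of ten", the relation is transitive, but then 9.9M and 10M land in different orders.

```python
def same_order_of_magnitude(a: float, b: float) -> bool:
    """True if a and b (both positive) are within a factor of 10 of each other."""
    return max(a, b) / min(a, b) < 10

# Hypothetical line counts.
a, b, c = 5e6, 40e6, 300e6
# a~b (8x) and b~c (7.5x) both hold, yet a~c (60x) does not.
```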
But, it's not my job to decide whether or not that information should be shared, because it's not my job to speak for the company.
I'm having a hard time understanding why people think this is not a reasonable position.
Anyway, let me guess. Judging by how the size of all binaries shipped with Windows varied between releases, I'd be inclined to think Windows 10 does not have significantly more lines of code than Windows Vista.
So I'd guess at most 100 million lines of code?
For me, I'm comfortable saying that I don't speak for the company and leaving the numbers within an order of magnitude. When it becomes my job to decide which numbers are and aren't fit to talk about publicly, I'm happy to update you.
Note: windows programmer for 19 years now. Only because of the cash.
The disadvantage of NTFS which you point out, isn't because of a fuckup. It's not designed for your use case. You might even find Microsoft telling you that themselves here :- https://technet.microsoft.com/en-us/library/Cc938932.aspx
As to your point about productivity, I can't comment without knowing specifics. As a primarily C++ programmer, I haven't run into any Windows showstoppers that prevented me from shipping. I have run into showstoppers with their dev tools, but I see them as separate from the OS.
> Note: windows programmer for 19 years now. Only because of the cash.
You know why there's cash? Because Windows works for a lot of people. The mantra among the consultants I've met in the UK is: if you're charging by the hour, do it in .NET on Windows; if you're charging a fixed rate, use Linux and Python.
I'm not suggesting there is anything better for an end user but I'm pointing out that it doesn't work well enough.
I still use it however and have a fondness. The accumulated knowledge of fixes is incredibly valuable.
Haven't tested the ext4 and NTFS drivers directly against each other, but a useful trick if you ever need to copy millions of small files from NTFS is to mount it on Linux, because the Linux driver can work with it way, way faster than the Windows one.
I dual booted a laptop for a while with Vista. (I can't speak to anything later, because I use Linux now, and haven't looked back, so take the appropriate grain of salt.) So with Vista / Gentoo on exactly the same hardware (a Lenovo T61):
- boot time on Linux was orders of magnitude faster
- WiFi AP connect was significantly faster[1], esp. on resuming from suspend-to-RAM
- Windows had a tendency to swap things out if they weren't in use, and had to swap like crazy if you paged back to a program you hadn't used in a while; Linux, by comparison, will only swap if required to due to memory pressure.
[1] i.e., WiFi was reconnected before I could unlock the screen. No other OS I've had has been able to do this, and it's bliss.
> mostly because it's graphics stack is way the best of all.
Riiiight. The T61 had an nvidia in it, and it was fairly decent; drivers were decent between the two OSs, and performance on each was about on par with the other. (I used the proprietary drivers; nouveau performed unacceptably bad — bear in mind this was 7 years ago.)
> Running a number crunching C code is exactly the same on Windows or Linux. (See all the benchmarks on the Internet.)
This I will agree with; but what do you do after the number crunching? It's the scaffolding around the program that mattered to me: Linux has a real shell, with real tools. I can accomplish the odd task here or there. But yes, running a "number crunching C code" will perform about equally: you're really only testing the processor, maybe the memory — crucially, the hardware, not the OS.
To be fair Vista was the slowest NT 6+ OS, especially booting is way faster on Windows 8+.
It should handle general scenarios consistently. We've had a few minor versions of NTFS and now ReFS. ReFS should solve this but it doesn't as it's a copy and paste of the NTFS code initially rather than a complete reengineering effort.
In fact, of the Windows networks I've seen, both corporate and small business, I can safely say the majority barely work and are usually a mismanaged, unpatched mess or filled with crapware.
And yet, no vendor can hold a candle to Active Directory, which is the single best thing about running Windows in an enterprise. You literally cannot manage SSO, patching, and config management in a non-Windows environment for more than a few hundred machines without the right tools. Shell scripts and Chef aren't going to cut it when you have 20,000 laptops to take care of.
One of our enterprise clients has just bought 500 Chromebooks and we integrated OpenID in our application and no one has to deal with AD, SSO is sorted and zero management overhead. I really like this solution. If someone could build a standalone product with equal quality it would destroy Microsoft overnight. Their ops team is 4 people and two of them are network people to keep the pipes (and APs) working.
And of course there is FreeIPA and PolicyKit as a contender but that's not really there yet.
To be fair they are financial point of sale machines so it's all process driven and they're not general purpose computers.