Disabled at 22 million commits (programming.dev)
I don’t blame GH at all.
Source: https://web.archive.org/web/20230702215522/https://sh.itjust...
This is Hacker News. Hacking is about using, in particular, technology in surprising ways that were not intended by the creators.
The reason that hacking is even a thing: It's actually possible to break things in a responsible, non-destructive way (in contrast to most things in the physical world).
If we skip the responsible part, we are just... breaking things and incurring costs. Why should that be okay?
I don't really see anyone blaming GitHub, not even the original post. I'm not sure why all the responses here are insinuating that.
There is absolutely nothing wrong with GH stopping that and it's very wrong to insinuate otherwise like OP is doing.
I wouldn't be surprised if GH permabanned him.
I don’t want to quibble with the term “abuse”, because I think in this scenario it depends on whether intent is a factor and whether we should trust their stated intent. But depending on how you look at it, GitHub would be just as likely to benefit from hiring the author as they would from banning.
Calling that anything other than abuse is just wordplay.
For what purpose?
Creating an infinite loop that updates a file and commits it is hardly worthy of a job offer.
I'm reminded of a time in high school where my friend almost got himself banned from the school computers.
At home he had dial-up internet (it was 2003 and he lived in a very rural area). But at school he had megabits of bandwidth he could (ab)use. So he started pirating everything on the internet using a computer nobody ever used in a side-room of the library. It ran 24/7 downloading his long list of desires: games, movies, tv series, etc. He stored his spoils on his network drive, which had no limits on how much it could hold (until he got caught). He'd occasionally bring in a hard drive, copy everything that fit on it and bring it home with him on the school bus.
But all good things must end.
The network admin for the school board eventually came by and sat him down. He showed my friend a pie chart where, as he described it to me, "my name was on the portion that took up more than 2/3 of the pie". After a conversation, all the data got deleted, my friend got a stern warning, and somehow didn't get into any worse trouble than that.
I don't get this attitude. Shit happens, we talk about it, we don't do it again. Not everything needs to have dire consequences.
Note that the message only said “the potential to affect other users”. I would expect a professional service to catch such things before it actually affects other users.
The editor of the magazine almost stopped the contest because he worried that someone might actually win real money and the magazine would be on the hook. But the author reassured him: human nature being what it is, the winning number is going to be not only larger than 1 million, but much larger than you can imagine.
And so it was. The winning number was (IIRC) some tower of exponentials that would take most of the universe to write out as decimal digits. The SciAm budget was safe.
If readers had coordinated somehow, they could have won a million dollars from SciAm and divided it among themselves. They might have made a hundred dollars each. But the author knew that such coordination would be impossible. Human nature would not allow it. Someone, somewhere, was going to send in a ridiculously large number to win. Classic Prisoner's Dilemma.
The GitHub case is the same. Human nature being what it is, someone, somewhere is always going to try to push the limits. As the developer of a SaaS development platform, this is something I'm taking to heart.
They could have been in quite some trouble!
https://clintonwhitehouse3.archives.gov/WH/glimpse/president...
So even if people were to try, I don't see that being a big bother. Not that it isn't malicious to do this now.
Monorepos in particular could be impacted.
1. Some HN users might/could have been personally inconvenienced by OP's action, and they prefer resenting him rather than GitHub for whatever reason.
2. Many HN users get paid a lot to work on SaaS themselves, so seeing a peer (however big it is) get abused for (what appears to be) entertainment is terrifying to them.
I think this is exactly why it's great, and it's basically turned into a GitHub advertisement. Either GitHub is simply unable to handle weird abuse methods and/or the abuse prevention is improved.
As an enterprise, wouldn't it be a bit concerning if your git host was unable to function (or respond appropriately) when presented with a random script kiddie?
This person didn't have bad intentions, but other people out there most definitely do.
Don't fuck with shit I use.
This just doesn't seem right to me. Why? It's obvious that at some point you'll harm the service. If the goal was to test it, why not try locally with git?
Linux[0] itself has about 1.2 million commits, so apparently Linux is within an order of magnitude of bringing GitHub to its knees?
Entitled much? The author should be happy GitHub didn't just ban them for violating the ToS and intentionally trying to break things.
I think the sentiment here shows the current state that software engineering has devolved into. It’s a 9-5 where you put in minimal work and get mad when someone breaks your system because you might have to do an hour of work to fix it on your weekend.
This account basically subscribes to thousands of repositories and monitors all activity. I suspect this account is harvesting user activity. I am not sure why GitHub allows this type of data harvesting.
To be honest, this is why companies also should not discourage this. Imagine if a malicious group did it with multiple users at the same time. At least now they will have proactive alarms for it.
This is also how I feel about the Tor project getting their knickers twisted over people who do research on the live network. If the network can't handle it, then it's not resilient to attack. Asking people nicely not to do stuff that degrades your product will not make the product suddenly anti-fragile.
A service is offered for free, with no documented limits or restrictions, so you push the service to its breaking point... Just to see what happens?
Because you can't. GitHub is not open source; you'd need to steal the source code to try it locally. This comment is for educational purposes only, not trying to give OP ideas!!1
But you're right in spirit of course. Would be more interesting to install Forgejo/Gitea, GitLab, GitWeb, gitolite, TortoiseGit, etc., test them on various limits, and write that up in a nice blog post for magic internet points.
The "(and git)" portion can of course be tested locally. What OP will find out is that there is no more inherent limit on the number of commits in a repo than there is an inherent limit in the number of nodes in a linked list.
You can go on forever till you run out of disk space. Possibly repacking will eventually require more than available memory.
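The linked-list comparison can be sketched in Python. This is a simplified model, not git's real commit format: it hashes only a `tree` line, an optional `parent` line and a message (no author or committer lines), using git's object framing `commit <size>\0` + body, so the IDs will not match real git. The placeholder tree ID and the helper names are made up for illustration. The point is that each commit only stores its parent's ID, so nothing in the structure itself caps the chain length.

```python
import hashlib
from typing import Dict, Optional, Tuple

# Placeholder tree ID for illustration only (not a real tree object).
FAKE_TREE = "0" * 40


def commit_id(parent: Optional[str], message: str, tree: str = FAKE_TREE) -> str:
    """Simplified commit hash: git-style object framing, but no author/committer lines."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def build_chain(n: int) -> Tuple[Dict[str, Optional[str]], Optional[str]]:
    """Create n commits, each pointing at the previous one (a singly linked list).

    Returns (commit id -> parent id, tip id); the root commit's parent is None.
    """
    parents: Dict[str, Optional[str]] = {}
    tip: Optional[str] = None
    for i in range(n):
        cid = commit_id(tip, f"commit {i}")
        parents[cid] = tip
        tip = cid
    return parents, tip


def history_length(parents: Dict[str, Optional[str]], tip: Optional[str]) -> int:
    """Follow parent links from the tip to the root, like walking a linked list."""
    count = 0
    while tip is not None:
        count += 1
        tip = parents[tip]
    return count
```

Walking from the tip back to the root is linear in the number of commits, just like traversing a linked list.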
It's obvious that the author is lying about that part, he only wanted to push GitHub to its limit, but he did say git:
> I decided to see how many commits GitHub (and git) could take before acting kind of wonky. At ~19 million commits (and counting) to master: it’s wonky.
Test the behavior of git locally, without testing GitHub.
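One way to test git's behaviour locally without touching GitHub is to generate a `git fast-import` stream instead of running a commit loop. This is a sketch under assumptions: the branch name, file name, committer identity and timestamps are placeholders, and the function name is hypothetical. Each commit rewrites one file with a counter and points at the previous commit via a mark.

```python
def fast_import_stream(n: int, branch: str = "refs/heads/master",
                       path: str = "counter.txt",
                       start_time: int = 1_700_000_000) -> str:
    """Build a git fast-import stream that creates n commits on `branch`.

    Each commit replaces the contents of `path` with its own counter value.
    Identity and timestamps are placeholders (raw date format: epoch + offset).
    """
    out = []
    for i in range(1, n + 1):
        msg = f"update {i}\n"
        content = f"{i}\n"
        out.append(f"commit {branch}\n")
        out.append(f"mark :{i}\n")
        out.append(f"committer Test <test@example.com> {start_time + i} +0000\n")
        # `data <byte count>` is followed by exactly that many bytes.
        out.append(f"data {len(msg.encode())}\n{msg}")
        if i > 1:
            out.append(f"from :{i - 1}\n")  # parent is the previous commit's mark
        out.append(f"M 100644 inline {path}\n")
        out.append(f"data {len(content.encode())}\n{content}")
        out.append("\n")  # optional blank line ending the commit command
    return "".join(out)
```

Write the output to a file and feed it to `git fast-import < stream.txt` inside a scratch repository created with `git init`; fast-import is built for bulk imports and is typically far faster than invoking `git commit` millions of times.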
That’s not obvious at all. One would expect a professional service to have limits in place to prevent any negative impacts.
git clone --depth [depth] [remote-url]
Part of the GitHub response afaik included the info that they went as far as they could with dedicated and beefier servers but asked for a software fix.
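For context, `git clone --depth <n>` fetches only the most recent n commits rather than the whole history. The sketch below models that truncation on a parent map for a linear history. It is a simplification: real shallow clones count depth along all parents of merge commits and record the cut-off commits in `.git/shallow`. The function name and the sample history are hypothetical.

```python
from typing import Dict, List, Optional


def shallow_history(parents: Dict[str, Optional[str]], tip: str, depth: int) -> List[str]:
    """Return the commits a depth-limited clone would keep for a LINEAR history.

    Walks at most `depth` parent links from `tip`; older commits are cut off.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    kept: List[str] = []
    current: Optional[str] = tip
    while current is not None and len(kept) < depth:
        kept.append(current)
        current = parents[current]
    return kept


# Hypothetical three-commit linear history: c1 <- c2 <- c3 (tip).
history = {"c1": None, "c2": "c1", "c3": "c2"}
```

For example, `shallow_history(history, "c3", 2)` returns `["c3", "c2"]`; a depth larger than the history simply returns every commit.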
I would think that if GitHub anticipates a normal repo growing this large, they can give it special treatment.
The rule I know is: with good engineering, you can modify a system to handle a one-order-of-magnitude increase over what it was designed for. As soon as a two-order-of-magnitude increase can occur, you'd better redesign the system.
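The rule of thumb can be written down as a tiny classifier. The two thresholds (10x and 100x the design load) come from the comment; treating the band between them as "review" is an assumption added here, and the function name is hypothetical.

```python
import math


def scaling_advice(design_load: float, expected_load: float) -> str:
    """Classify expected growth relative to what a system was designed for.

    <= 1 order of magnitude (10x): modify the existing system.
    >= 2 orders of magnitude (100x): redesign.
    In between: "review" (an assumption; the rule of thumb leaves it open).
    """
    if design_load <= 0 or expected_load <= 0:
        raise ValueError("loads must be positive")
    orders = math.log10(expected_load / design_load)
    if orders <= 1:
        return "modify"
    if orders >= 2:
        return "redesign"
    return "review"
```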
> git clone --depth 1 https://github.com/MicrosoftDocs/azure-docs
Cloning into 'azure-docs'...
remote: Enumerating objects: 107158, done.
remote: Counting objects: 100% (107158/107158), done.
remote: Compressing objects: 100% (101843/101843), done.
Receiving objects: 17% (18217/107158), 780.25 MiB | 43.72 MiB/s
In the case of GitHub, it's owned by a nearly trillion-dollar corporation. Nobody is hurting some mom-and-pop business here.
Slower growth allows for forward planning and incremental architectural changes.
We wouldn’t know. GitHub is probably running something very different to normal local git including optimizations for performance and cost.
They must only ensure API/protocol compatibility and could have already replaced everything else many times over.
GitHub could trivially honor the request without changing the incentives or even taking any defensive implementation action, by specifically citing this experiment in the rules and maybe adding some more specific wording to the TOS.
(Understanding we’re waaaay off in edge case territory here and this is all basically academic.)
> The GitHub API has periodic issues merging/creating PRs. (I use PRs since that is more reliable than keeping a local master up to date via pulling at this point).
You are confidently wrong. Git, including pull requests, was developed years before GitHub ever existed. GitHub borrowed the term from git. Pull requests originally (before GitHub) were emails asking one developer to pull changes from another.
https://www.git-scm.com/docs/git-request-pull
The request-pull command has been part of git since 2005:
https://github.com/git/git/blob/master/git-request-pull.sh
GitHub launched in 2008.
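To make the pre-GitHub workflow concrete: `git request-pull [-p] <start> <URL> [<end>]` prints a summary of the changes between `<start>` and `<end>`, plus the URL to pull from, to standard output so it can be pasted into an email (`-p` adds the patch text). The helper below only assembles that argument list; its name is hypothetical, and actually running the command requires a local repository.

```python
from typing import List, Optional


def request_pull_argv(start: str, url: str, end: Optional[str] = None,
                      include_patch: bool = False) -> List[str]:
    """Build argv for `git request-pull [-p] <start> <URL> [<end>]`.

    The command only prints a summary; it does not send any email itself.
    """
    argv = ["git", "request-pull"]
    if include_patch:
        argv.append("-p")  # include the patch text in the summary
    argv += [start, url]
    if end is not None:
        argv.append(end)
    return argv
```

For example, `subprocess.run(request_pull_argv("v1.0", url, "main"), capture_output=True, text=True)` run inside a repository would return the summary in `stdout`.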
> and no HTTP API
Also wrong:
https://git-scm.com/book/en/v2/Git-on-the-Server-Smart-HTTP
There is nothing GitHub does with respect to git that you cannot do locally.
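As an illustration of the Smart HTTP protocol from the linked page: a client first requests `GET <repo-url>/info/refs?service=git-upload-pack`, and the server answers with data framed as pkt-lines, where each line is prefixed by its total length as 4 hex digits and `0000` is a flush packet; the response starts with `001e# service=git-upload-pack\n`. A minimal encoder/parser sketch, not a complete client (protocol v2 special packets are not handled, and the function names are hypothetical):

```python
from typing import List, Optional

FLUSH_PKT = b"0000"
MAX_PKT_LEN = 65520  # 4-byte length prefix + at most 65516 bytes of payload


def pkt_line(payload: bytes) -> bytes:
    """Frame one payload as a pkt-line: 4 hex digits of total length, then the data."""
    length = len(payload) + 4
    if length > MAX_PKT_LEN:
        raise ValueError("payload too large for one pkt-line")
    return f"{length:04x}".encode("ascii") + payload


def parse_pkt_lines(data: bytes) -> List[Optional[bytes]]:
    """Split a pkt-line stream into payloads; a flush-pkt ("0000") becomes None."""
    out: List[Optional[bytes]] = []
    i = 0
    while i < len(data):
        if i + 4 > len(data):
            raise ValueError("truncated length prefix")
        length = int(data[i:i + 4], 16)
        if length == 0:
            out.append(None)
            i += 4
            continue
        if length < 4:
            # 0001-0003 are protocol-v2 special packets or invalid; not handled here.
            raise ValueError(f"unsupported pkt-line length {length}")
        if i + length > len(data):
            raise ValueError("truncated pkt-line")
        out.append(data[i + 4:i + length])
        i += length
    return out
```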
I didn't know of the specific "request-pull" subcommand so thanks for that link. Still, both things you link are a bit different from how GitHub implements it, and I'd be very surprised if the HTTP API you link includes an endpoint for triggering the request-pull the way that GitHub has such APIs for their pull request mechanism.
If you meant to say that git can do anything GitHub can and we needn't use GitHub, I agree. I've used git in peer-to-peer fashion before, and especially now that it's Microsoft's, I think twice before opening repositories there. But if your main point was rather that git includes the same functionality as GitHub and that OP could have just tested the regular git instead of doing it on GitHub itself, I still think that's a rather different test target.
It's entirely possible that such a load test has been considered but deemed unrealistic, and so not prioritised for some time. If I were running the QA team, I'd be annoyed if time were spent on abusive, destructive testing rather than on realistic testing of what real-world users may experience, especially because load testing like this would have to be done on an environment identical to PROD, so it would be rather expensive.
It reminds me of that old QA joke:
A QA engineer walks into a bar and orders a beer. She orders 2 beers.
She orders 0 beers.
She orders -1 beers.
She orders a lizard.
She orders a NULLPTR.
She tries to leave without paying.
Satisfied, she declares the bar ready for business. The first customer comes in and orders a beer. They finish their drink and then ask where the bathroom is.
The bar explodes.
GitHub hasn't failed here: it continued to perform at normal levels for other users, so far as I can see, and they had an upstream process which caught the issue without the system failing. Maybe some exploratory testing had previously identified where that process should kick in, but it was never automated since the scenario was so unlikely.
Not really. GitHub has been around for over a decade. People bother with problems that have a realistic chance of happening. If GitHub didn't bother to rate-limit commits, it means this was a potential issue that didn't manifest itself for over a decade.
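For reference, a per-repository commit or push rate limit is often built as a token bucket. This is a generic sketch, not how GitHub actually limits anything; the class name, capacity and refill rate are hypothetical. Passing the clock in explicitly keeps it deterministic to test.

```python
class TokenBucket:
    """Allow bursts of up to `capacity` events, refilled at `rate` events per second."""

    def __init__(self, capacity: float, rate: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start full
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A service would keep one bucket per account or repository (e.g. in a dict keyed by repository name) and alert when `allow` keeps returning `False`.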
People tend to bother about problems that happen. Otherwise everyone would be freaking out because of killer asteroids.
More than 20 years later, I still remember the enormous benefit that little bit of malice has bestowed on me and my career. And every time I’ve been on the receiving end of such an exploratory exploit since has been exponentially more appreciated.
At a previous job I was aware of a potential vulnerability, voiced it rather loudly, but had a hard time getting the attention it deserved until I recognized it happened to coincide with a really high profile business-critical bug. I only recognized it because some jerks had previously fucked with much less important stuff under my purview, and I wanted very much to understand how they did it, and learned quite a bit by wanting to know.
I used those developed instincts to unfuck what would have otherwise resulted in at least contract terminations, if not lawsuits. And the recognition allowed me to correct almost every compromised datum, which also guarded every contractee from challenges to their license status and, ultimately, to whether they could be subject to a wholly different jurisdictional context.
I’m not going to disclose the nature of the vulnerability, but the bug presented as time deltas based on time zone configuration. Hardly a novel problem, but it nearly put a whole industry into peril and/or conflict. Definitely was worth the attention.
And when communicating the problem suffered, I did what any self respecting hacker would do: I exploited the damn thing myself and showed how it was done.
Arguably, it would be better if GitHub documented an explicit number of supported commits, so that one can know beforehand which usage scenarios the service is suitable for.
I don't agree. Clearly GitHub can easily handle this number of commits, and more. There was no real world limit being hit. There is no user impact or degraded performance.
This means that in practice there is absolutely no limit in GitHub.
Why document that? Are you planning on pushing more than 22 million commits into a project? And if you are, what stops you from sending an email to GitHub to ask whether it supports your extraordinary use case?
It seems some people around here are desperate to find any flaw in the way GitHub handled this case of vandalism, and at best are grasping at straws.
But to engage you about the GitHub part: I believe that under the covers, GitHub is still using something substantially similar to git as the repo storage format. Git has no inherent limit on the number of commits. Eventually you run out of disk space, and possibly memory and/or CPU during repacking. You could turn off GC and let the repo remain unpacked, but you might eventually run out of inodes. During cloning (and pulling), git implicitly creates pack files, so a clone/pull will also take a long time (CPU and/or memory again) on an unpacked repo. This is why git periodically repacks.
If I had to guess, GitHub also has no inherent limits. Creating commits was probably triggering periodic repacking on the git backend, consuming increasing amounts of resources.
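To put rough numbers on the unpacked scenario: loose objects in a SHA-1 repository are stored one file each under `.git/objects/<first 2 hex digits>/<remaining 38>`. Assuming each commit rewrites a single root-level file with new content (an assumption about this experiment, not something stated in the thread), every commit adds three objects: a blob, a root tree and the commit itself. The function name is hypothetical.

```python
from typing import Dict


def loose_object_estimate(commits: int, objects_per_commit: int = 3) -> Dict[str, float]:
    """Rough count of loose-object files if the repository is never packed.

    ASSUMPTION: each commit rewrites one root-level file with new content,
    adding one blob, one root tree and one commit object (3 objects).
    """
    total = commits * objects_per_commit
    # SHA-1 loose objects live in .git/objects/<2 hex>/<38 hex>: one file
    # (one inode) per object, spread across at most 256 fan-out directories.
    return {"objects": total, "avg_files_per_dir": total / 256}
```

Under that assumption, 22 million commits come to 66 million loose files, roughly 258,000 per fan-out directory, which is why periodic repacking into pack files matters.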
I would be surprised if the GitHub API (the Ruby on Rails code) takes much resources at all.
Creating endless PRs is something you can simulate locally with two copies of a repo. You can use "git ls-remote" against a GitHub-hosted repo with PRs in it to see how it exposes PRs as references that are not normally cloned.
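GitHub exposes pull requests as refs of the form `refs/pull/<number>/head` (and, for some PRs, `refs/pull/<number>/merge`), which show up in `git ls-remote` output as `<object-id><TAB><ref>` lines. A small parser sketch; the function name is hypothetical:

```python
import re
from typing import Dict

# GitHub-style PR refs: refs/pull/<number>/head or refs/pull/<number>/merge
PR_REF = re.compile(r"^refs/pull/(\d+)/(head|merge)$")


def pull_request_refs(ls_remote_output: str) -> Dict[int, Dict[str, str]]:
    """Map PR number -> {"head": oid, "merge": oid} from `git ls-remote` output."""
    prs: Dict[int, Dict[str, str]] = {}
    for line in ls_remote_output.splitlines():
        oid, sep, ref = line.partition("\t")
        if not sep:
            continue  # blank or malformed line
        match = PR_REF.match(ref.strip())
        if match:
            prs.setdefault(int(match.group(1)), {})[match.group(2)] = oid.strip()
    return prs
```

Feed it the output of `git ls-remote <remote>`, for example captured with `subprocess.run(["git", "ls-remote", url], capture_output=True, text=True).stdout`.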
Regardless, I think that OP could and should have satisfied their curiosity about how git works locally, especially with respect to whether it has any inherent limits. And they could have answered their question about GitHub resource limits with a support request.
Nothing technically novel. But evidently it was at least a somewhat novel stress test execution for GitHub’s live systems, otherwise surely it would have been dealt with sooner and messaged with less benefit of the doubt to the user.
Investigating the limitations of something doesn’t have to be novel to be interesting. It’s been a while (I think), but for example there’s been plenty of praise here for Netflix’s own stress tests of its live systems. The tests are often really mundane, e.g. just shutting some stuff off or triggering known error conditions. It’s interesting not because the nature of the fault is novel, but because systems are complex and it’s a way to learn about their failure modes.
I’m also not saying GitHub should hire them (and kinda I doubt they’d want a QA offer based on reading other blog posts on their site). Just that a hire would plausibly be similarly beneficial to a ban.
Not really. This is boring stuff, and odds are they never bothered with it because a) it has no impact on operations, b) the blast radius of this doesn't go beyond the attacker's own repo, c) no moron with time to kill bothered attempting this stunt until now.
Probably now some low-level employee at GitHub needs to add a metric and an alarm to react to rate limits to prevent moron copycats from pulling this stunt for attention-seeking.
Not smart, not clever. Just boring vandalism.
Everyone can drive a car. Did everyone invent Uber?
You're underrating the non-technical factors.
To start off, it's based on the fact that GitHub has been around for over a decade and this was the first time this sort of attention-seeking stunt was made public.
Do you have any indication this sort of stunt is relevant?