"The main problem with Git is that binary files are stored “as is” in the history of the project, so that every single revision of a new binary file (even if just a single byte has changed) is stored in full. [...] On the other hand, source files being mostly text, they are more intelligently handled and typically only differences between revisions are stored in the commits."
This is false. Git stores the full version of each file in "loose" format and uses compressed incremental diffs (originally based on xdiff) in packfiles (after "git gc") without distinguishing text vs binary in either case. The issue is that binary files are often compressed themselves (so a one-byte semantic change has nonlocal effect) or have positional references (like jump targets in an executable, causing small changes to cascade).
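A rough way to see the "nonlocal effect" point: flip one byte of some input, compress both versions, and count how many bytes differ before and after compression. This is only a sketch using Python's zlib as a stand-in for whatever compression a real binary format applies. The sample data and the helper name `count_differing_bytes` are made up for illustration.

```python
import zlib


def count_differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions where a and b differ, plus any difference in length."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))


# Hypothetical sample input: a few thousand short, similar records.
original = b"".join(b"record %d: value=%d\n" % (i, i * 7) for i in range(2000))

# Flip a single bit in one byte near the start of the data.
changed = bytearray(original)
changed[10] ^= 0x01
modified = bytes(changed)

raw_diff = count_differing_bytes(original, modified)
compressed_diff = count_differing_bytes(
    zlib.compress(original), zlib.compress(modified)
)

print("bytes differing before compression:", raw_diff)
print("bytes differing after compression: ", compressed_diff)
```

The uncompressed inputs differ in exactly one byte, while the compressed streams differ in more than one place (the trailing checksum changes at a minimum), which leaves a byte-oriented delta less to reuse.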
These factors explain the inefficient handling of binary files, but improving efficiency requires changing the semantics. LFS follows in the path of a few other tools (based on smudge/clean filters) that try to hide the semantic difference from the casual user, though that difference seems to bite people more frequently than we'd like.
The problem is that "binaries" are large amounts of data with high entropy.
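One way to put a number on "high entropy" is byte-level Shannon entropy, measured in bits per byte. The sketch below compares some made-up text with its zlib-compressed form. The function name `byte_entropy` and the sample data are assumptions for illustration, and byte entropy is only a crude proxy for how well a delta or compressor will do.

```python
import math
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical sample input: repetitive, text-like records.
text = b"".join(b"record %d: value=%d\n" % (i, i * 7) for i in range(2000))
packed = zlib.compress(text)

print(f"plain text: {byte_entropy(text):.2f} bits/byte")
print(f"compressed: {byte_entropy(packed):.2f} bits/byte")
```

Already-compressed data sits much closer to the 8 bits/byte ceiling than plain text does, so there is little redundancy left for Git to exploit.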
"My guess is that some high-level greedy marketing dickwad, completely unaware of the asinine implications of his brilliant idea, signed off on this dumb-as-a-bag-of-rocks pricing model."
"All the marketing material pimping GitHub’s LFS support [...]. I do not believe this is unintentional."
"This is completely batshit. The side effect of this pernicious, greedy pricing model is to [...]"
"I honestly couldn’t believe that GitHub would be willing to do something that shortsighted, visibly motivated by greed from the cash they thought they could extract from some of their users".
Charitable explanation for forks not working: they haven't yet written the code to make this work with forks, and it's better to ship something working early, than to make it work in all cases.
Charitable explanation for charging for bandwidth: bandwidth costs money. (I believe this is a real problem for Dropbox, which doesn't charge for bandwidth but must still pay for it). Also, all CDNs, and also AWS charge for bandwidth.
Overall, while GitHub may be able to support its OSS folks better by changing the pricing on some parts of its product, this post is incredibly uncharitable. I hope the OP will consider removing the unfounded narrative that he's projecting onto GitHub (esp the "marketing dickwad" thing - wtf) and focus on the facts.
[Disclaimer: my company partners with GitHub on lots of stuff]
Odd example. Linux doesn't use GitHub pull requests.
In fact, there are currently 10,532 forks according to GitHub.
I'm just saying it's a very odd choice for an example of GitHub screwing over workflows.
Does anyone know if their repos support forking in combination with LFS too?
> On the other hand, source files being mostly text, they are more intelligently handled and typically only differences between revisions are stored in the commits.
This is completely incorrect, git stores whole blobs from one commit to the other.
svn stored patches, but git does not. Every version of a file is stored in its entirety in your git tree since the beginning of the repository's existence. This is one of the reasons why git is so fast. You can go through your objects in your .git directory and verify this for yourself[0].
$ find .git/objects -type f
.git/objects/ff/a5d733354ae6f8bdc67764d58d87c9a3161f66
.git/objects/ff/deb08f4856bd6eb5b31d7f800b3e480ae3e2e0
$ git cat-file -p ffa5d733354ae6f8bdc67764d58d87c9a3161f66
...file contents appear...
[0] https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
https://git-scm.com/book/en/v2/Git-Internals-Packfiles
(edit: After I started responding to your comment, you edited your comment to link to the same book! I recommend you continue reading the later chapters: "you'll never believe how it works" ;P.)
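For anyone who wants to poke at the loose-object format that `git cat-file -p` decodes in the transcript above, here is a minimal sketch of how a blob's ID and on-disk bytes are derived. It assumes a default SHA-1 repository and covers only loose objects, not the delta-compressed packfiles from the second link. The function names are made up for illustration.

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Object ID of a blob: SHA-1 over the header 'blob <size>\\0' plus the content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object_bytes(content: bytes) -> bytes:
    """Bytes of a loose object file: zlib-compressed header plus full content."""
    header = b"blob %d\0" % len(content)
    return zlib.compress(header + content)


def loose_object_path(content: bytes) -> str:
    """Path under .git/objects: first two hex digits are the directory name."""
    oid = git_blob_id(content)
    return f".git/objects/{oid[:2]}/{oid[2:]}"


# The empty blob has a well-known ID in SHA-1 repositories.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(loose_object_path(b""))
```

Each loose object holds the complete file contents, which matches what the `find` output shows. Storing deltas between objects only happens later, when objects are packed.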
Is there a reason why binary blobs need to be stored directly next to code in order to be versioned?
Also you'd need the necessary LFS server piece on Amazon's side.
The idea isn't to shove everything into a storage bucket, but to assemble a toolchain using components that are fit for purpose. Git is fundamentally not fit for purpose as an artifact repository. There are tools that are.
-Eric
Has anyone tried the new Perforce/Git stuff? Is it any good? We're still on an older pre-Helix version.
If you were consciously choosing to take a hyperbolic tone, can I ask if you might reconsider that decision in future posts? Or at least concretely test your idea that calling people "dickwads" and "grunts" gets you more traction.
I appreciated you raising the bandwidth question, and comparing it with other services. You made a good argument. Thank you!
They're a business. C'mon here.
SourceForge.net was once an excellent and trustworthy steward of Open Source software projects. It was predicted by some folks in the free software community that it would not always be the case, and alternatives like Savannah were maintained in order to act as a hedge against that concern. I believe it is more than reasonable to assume that github will change, and it would be downright dangerous to assume that we can rely on a profit-motivated corporation (even one as cool as github currently is) to remain a trustworthy repository forever.
So, sure, say nice things about github; I also think github is a good product, and I appreciate their free hosting for OSS projects. And, sure, you should use github if it provides value for you and you're willing to accept the price. But, don't ask me to trust they'll never change, because history indicates they will. It's probably also unfair to suggest that someone raising valid concerns about github's current behavior, based on their own experience with Open Source projects hosted at github, is making "vicious accusations".
"My guess is that some high-level greedy marketing dickwad, completely unaware of the asinine implications of his brilliant idea, signed off on this dumb-as-a-bag-of-rocks pricing model."
"All the marketing material pimping GitHub’s LFS support [...]. I do not believe this is unintentional."
"This is completely batshit. The side effect of this pernicious, greedy pricing model is to [...]"
"I honestly couldn’t believe that GitHub would be willing to do something that shortsighted, visibly motivated by greed from the cash they thought they could extract from some of their users"
Like NodeJS or something.
That's exactly my point. Using a "request-pull" instead of a "pull-request" will probably mean you can do this git-lfs thing with it. My understanding was that it was not working with "forked" repositories, which you need to have to make a "pull request". To make a "request pull" (or to just send a patch file), you don't need to "fork". ;)
I'm not sure it's within your rights to ask him to change the way he writes within his own bubble because you don't like his word choice.
Clearly the post was written for an audience. Being needlessly inflammatory could certainly turn the audience off, and/or undercut the author's credibility. Ansiton's advice was both helpful and valid.
If they would just stop trying to do that, then we would have nothing to talk about.
It is not at all uncalled for to criticize the way they are trying to do business, especially when it affects you as an existing customer.
Perhaps that is a big deal (I don't use LFS and until recently neither did anyone else), but it's a far cry from what you said in your original comment, such as "I do believe it is foolish to assume that the github we know today will be the github of tomorrow."
They implement a feature in a restricted manner and all of a sudden they're evil?
SourceForge is the best past analog for github, and I think it's worth learning from history. SF.net didn't start out evil and untrustworthy; they started out good. Who's to say github won't do the same?
If you start using LFS, you are effectively losing features and the product becomes worse for a lot of their users, a lot of them paying customers.
The question is not whether this was just being inflammatory (it was) but rather whether it was unwarranted criticism.
Disclaimer: I am the author of the article.
I think what you mean is that it is not unjustifiably vicious.
But I didn't write the article. I was just kinda jumping on the "hey Github could potentially do shitty things if we aren't careful" bandwagon. I'm definitely not anti-Github. I think they've built a great product. I'm a paying customer with private repos.
My thought process went something like this: "This is a feature that is currently, probably accidentally, causing vendor lock-in for github users, as there is no easy way to take a project in its entirety back out of github, if it has enabled this feature. That kind of lock-in has been used in the past, by vendors across a wide spectrum, for evil purposes. Github, were it ever to become evil, would find this the kind of thing that would screw users and produce profit."
I don't believe github today has evil intent (though the author of the article seemingly does believe that), but I reserve the right to be skeptical of what the company's intentions will be in the future. Just as I should have been more skeptical of the future intentions of SourceForge in the past. I think my position on this is entirely fair to github (which is a company and product I like), but I'm also trying not to be a total sucker and make the same mistakes over and over again.
I thought the OP noted that MS also offers git+LFS (and for free)!