Introducing Git protocol version 2 (opensource.googleblog.com)
It's interesting how the git project itself was looking for a new maintainer almost as soon as it was created.
As an open-source advocate, my first thought was, "Why the hell is Google releasing a version of a protocol that Linus Torvalds wrote?"
Without that context, it would be like Google throwing up an announcement, "Introducing Google's Linux Kernel 5.0!"
But we aren't yet including projects where we are just heavy contributors but which are not "Google projects". That includes Linux, git, LLVM, and a host of others. We do want to recognize them in our project directory, but want to make sure that they are distinguished from Google projects so that we're not implying something that is inaccurate.
See the list of URLs at https://public-inbox.org/git/xmqqindt6g1r.fsf@gitster.mtv.co...
One of the more exciting things is that it can now be extended to arbitrary new over-the-wire commands. So e.g. "git grep" could be made to execute over the network if that's more efficient in some cases.
This will also allow for making things that now use side-transports part of the protocol itself if it made sense. E.g. the custom commands LFS and git-annex implement, and even more advanced things like shipping things like the new commit graph already generated from the server to the client.
The specification of the v2 protocol is here: https://git.kernel.org/pub/scm/git/git.git/tree/Documentatio...
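To make the "arbitrary new over-the-wire commands" point concrete, here is a rough sketch of how a v2 command request is framed, based on my reading of the protocol spec: a "command=<name>" pkt-line, optional capability lines, a delim-pkt (0001), command-specific arguments, then a flush-pkt (0000). The helper names (pkt_line, command_request) are made up for illustration and are not part of git.

```python
FLUSH_PKT = b"0000"  # ends a request
DELIM_PKT = b"0001"  # separates capabilities from command arguments


def pkt_line(text: str) -> bytes:
    """Frame one line as a pkt-line: 4 hex digits of total length, then data."""
    payload = text.encode() + b"\n"
    # The length prefix counts its own 4 bytes.
    return b"%04x" % (len(payload) + 4) + payload


def command_request(command, capabilities=(), args=()) -> bytes:
    """Build a protocol v2 request for a command (hypothetical helper)."""
    out = pkt_line(f"command={command}")
    out += b"".join(pkt_line(cap) for cap in capabilities)
    if args:
        out += DELIM_PKT + b"".join(pkt_line(arg) for arg in args)
    return out + FLUSH_PKT


if __name__ == "__main__":
    # ls-refs is one of the commands defined by v2; a new command would
    # only need a new name and its own argument lines.
    print(command_request("ls-refs", args=["peel", "ref-prefix refs/heads/"]))
```

A server that doesn't know a command can simply not advertise it, which is what makes adding commands later comparatively cheap.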
(There are a couple of repos listed as official mirrors, such as the googlesource.com one, but the one you linked to isn't one of them.)
What list are you referring to? If it doesn't list the one on GitHub it needs to be fixed.
If you are at all interested in hacking on Git, it's not that difficult. Knowing C and portable shell scripting for writing tests are the big things.
One sticking point: you need to submit patches to the mailing list; you can't just do a GitHub pull request.
See https://github.com/git/git/blob/master/Documentation/Submitt...
I still see github pull requests rather frequently, even though they have never been allowed. All discussion AND patches go through the mailing list, much like the linux kernel.
[1] https://public-inbox.org/git/CAJo=hJtZ_8H6+kXPpZcRCbJi3LPuuF...
[1] https://github.com/git/git/graphs/contributors?from=2012-03-...
https://public.gitsense.com/insight/github?r=git/git#b%3Dgit...
These are contributions by Linus:
https://public.gitsense.com/insight/github?r=git/git#b%3Dgit...
and as you can see, his contributions really tapered off after 2010, while contributions from Hamano remained steady from 2008 to the present, as shown below:
https://public.gitsense.com/insight/github?r=git/git#b%3Dgit....
For example: I have the Linux kernel already cloned in some directory. I clone a second repo which has the Linux kernel as a submodule. Can I clone the second repo straightforwardly without having to download Linux a second time? (Well yes, but only by manual intervention before doing the git submodule update. It'd be nice if objects could be shared in a cache across repos somehow.)
I just tried this and it seems to work:
git clone git://github.com/git/git
mkdir git2
cd git2
git init
cd .git/
rm -rf objects
ln -s ../../git/.git/objects
cd ../
git remote add origin git://github.com/git/git
git fetch # returned without downloading anything
git checkout master
ls # etc.
If you seriously want to use this, you'll probably want to hard link the contents instead. But iirc git clone from local disk already does that for you?
In short: clone your local copy and take it from there?
echo ../../../git/.git/objects >> git2/.git/objects/info/alternates
or use the original as a reference: git clone --reference git git://github.com/git/git git2
This sets up the alternates for you.
This isn't even theoretical: there was an environment-related bug not 5 years ago involving Git. At least Bitbucket was impacted; I think GitHub was patched before it was announced.
As you point out selectively allowing a new environment variable could open a can of worms for shared hosts like github if they mess up their implementation.
I think that this is because the SSH protocol isn't just encapsulating the Git protocol directly (the initial assumption of ssh "just" encapsulating the git protocol is not fully correct), and one of the parts that differs is this particular part. On the git protocol side, we need to select a "service":
> a single packet-line which includes the requested service (git-upload-pack for fetches and git-receive-pack for pushes)
which in SSH would be done not by transmitting that packet-line but by instructing SSH to run that particular executable.
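A small sketch of the difference, as I understand it from the protocol docs: over git:// the client sends the service name, path, and host in one pkt-line (with "version=2" tacked on as an extra NUL-separated parameter for v2), whereas over SSH the service name simply becomes the remote command. The function names are hypothetical, and the SSH argv is simplified (real git also passes options and environment settings).

```python
import shlex


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits of total length, then data."""
    length = len(payload) + 4  # the prefix counts its own 4 bytes
    if length > 65520:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % length + payload


def daemon_request(service, path, host, version=None) -> bytes:
    """Initial git:// request: 'service path NUL host=... NUL [NUL version=N NUL]'."""
    body = f"{service} {path}".encode() + b"\0" + f"host={host}".encode() + b"\0"
    if version is not None:
        # Extra parameters come after an additional NUL byte.
        body += b"\0" + f"version={version}".encode() + b"\0"
    return pkt_line(body)


def ssh_argv(service, path, user_host):
    """Over SSH no such packet is sent; the service is the remote command."""
    return ["ssh", user_host, f"{service} {shlex.quote(path)}"]


if __name__ == "__main__":
    print(daemon_request("git-upload-pack", "/project.git", "myserver.com"))
    print(daemon_request("git-upload-pack", "/project.git", "myserver.com", version=2))
    print(ssh_argv("git-upload-pack", "/project.git", "git@myserver.com"))
```

Since SSH has no equivalent of that first packet, the version request has to travel some other way, which is where the environment variable discussion above comes from.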
> This is clearly described in the article.
It really isn't, IMO; if you don't have precise knowledge of the protocols involved, I don't think anything in the article particularly spells this out.
I wonder if this will be somehow exposed by git daemon. It could be used for easy per ref access controls.
For example Git Switch [0] that uses Macaroons had to clone the repository to implement per ref ACL.
I think all their open-source stuff (Angular, GoLang, Android) uses git (and sometimes Gerrit).
Although given Google's scale, I'm sure there's some teams/projects that use Mercurial.
So for Google external projects, they use git.
> Although given Google's scale, I'm sure there's some teams/projects that use Mercurial.
I doubt it. Their tooling is probably pretty specific, and now that code.google.com has shut down, they probably don't have any review servers that support it.
That's great. Another subtle reminder that this ad company has way too much control.
LKML would presumably be the place for Linux to announce when they adopt this.
The Google open source blog is among the several credible options for this post, since Google employs much of the core Git team, and this post discusses their experience deploying Git protocol v2 at Google.
As noted in the blog text, it's not in a released version of Git yet, just Git master branch. So maybe it'll appear on a dedicated Git announcement list, if any, once that happens.
It seems that https://groups.google.com/forum/#!forum/git-packagers is the closest thing to a formal announcement list that there is.
Not yet, but presumably there will be a post like this: https://lkml.org/lkml/2018/4/2/425 when it is released. It is strange that the Google Blog is the first place to announce it through.
Also, somehow make sure no servers, clients, or third-party middleboxes break when the version field is incremented. The TLS protocol designers had to give up on the version field; it's now going to forever be stuck at "TLS 1.2", since too much would break otherwise.
> In previous versions of TLS, this field was used for version negotiation and represented the highest version number supported by the client. Experience has shown that many servers do not properly implement version negotiation, leading to "version intolerance" in which the server rejects an otherwise acceptable ClientHello with a version number higher than it supports. In TLS 1.3, the client indicates its version preferences in the "supported_versions" extension (Section 4.2.1) and the legacy_version field MUST be set to 0x0303, which is the version number for TLS 1.2. (See Appendix D for details about backward compatibility.)
It's really too bad that the version field can't be used as a version field anymore, but thankfully the "extension" format is pretty flexible in that regard.
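Roughly what that negotiation looks like from the server side, as a sketch rather than a real TLS stack: if the supported_versions extension is present, legacy_version is ignored; if it is absent, the server negotiates TLS 1.2 or lower. Picking the first client-listed version the server knows is my simplification, and select_version is a made-up name.

```python
TLS_1_0, TLS_1_1, TLS_1_2, TLS_1_3 = 0x0301, 0x0302, 0x0303, 0x0304


def select_version(server_versions, legacy_version, supported_versions=None):
    """Return the negotiated version code, or None if there is no overlap.

    server_versions: set of version codes the server implements.
    legacy_version: the ClientHello version field (0x0303 from a 1.3 client).
    supported_versions: the client's extension list, in preference order,
        or None if the extension is absent.
    """
    if supported_versions is not None:
        # Extension present: ignore legacy_version entirely. Unknown codes
        # (future versions) are skipped instead of causing a failure.
        for version in supported_versions:
            if version in server_versions:
                return version
        return None
    # Extension absent: old-style negotiation, capped at TLS 1.2.
    ceiling = min(legacy_version, TLS_1_2)
    candidates = [v for v in server_versions if v <= ceiling]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    server = {TLS_1_2, TLS_1_3}
    print(hex(select_version(server, TLS_1_2, [0x0305, TLS_1_3, TLS_1_2])))
```

The point of the design is that a version-intolerant server only ever sees 0x0303 in the field it chokes on, and new versions hide in an extension it is supposed to ignore.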
Unfortunately due to a bug introduced in 2006 we aren't
able to place any extra arguments (separated by NULs) other
than the host because otherwise the parsing of those
arguments would enter an infinite loop.
I'm not sure if entering an infinite loop means what I think it does in this context, but that's almost CVE-worthy. They should release a fix, mark that version as obsolete, and never have to make their clients cater to it any more.
You can read about their fix by clicking the next link in the article.
EDNS is the only way to extend the protocol now, which is basically just adding additional Records to the Message that are designated as Extended DNS records, and treated specially.
https://tools.ietf.org/html/draft-ietf-dnsop-no-response-iss...
[1] It was briefly used experimentally if I recall
[2] https://tools.ietf.org/html/draft-ietf-dnsop-session-signal
I find git-annex a much better solution, it's a shame everyone went with LFS.
It was very hard to use in asymmetric cases where different people have different credentials, such as where one person has access to a computer and others don't, or where a couple of core developers have authenticated R/W access to a file server or an S3 bucket and everyone else just has HTTP.
If standard git ever implements shallow blob fetching, it would preferably make git-lfs obsolete rather than help it.
I know, that's what I am suggesting should change in version 2.0. It is a widely supported popular extension that solves a major pain point for Git, most vendors have adopted it.
New things can absolutely be required as part of a new protocol version, in fact this blog post lists several new things that will be new in 2.0 and beyond.
The analogy I'd use is HTTP/2 and SPDY. SPDY started out as a Google produced extension to HTTP, gained popularity, and was then standardized/merged into the HTTP/2 standard. All I am suggesting is Git LFS receive the same treatment.
Just to confirm, but you meant "because it is not required", right?
[1] https://sourceforge.net/projects/sourcepuller/
[2] https://www.linuxfoundation.org/blog/10-years-of-git-an-inte...
(As Tridge tells the story[3], he telnet'ed to the bk port and typed "help" so it wasn't that much of a reverse engineering effort. :-)
I personally have stuck to kind of basic git usages (call it "Git: The Good Parts" if you will), and have never had the problems people claim to have with git. It just has always worked, and it has always been there for me.
https://git-scm.com/book/en/v2/Getting-Started-A-Short-Histo...
I presume Git 2.18 (the first release supporting protocol v2) will be announced via both channels once it's out.
Almost a year's work on TLS 1.3 was spent on working around problems with middleboxes. Because without that it would be impossible to deploy in practice. TLS 1.2 took years to deploy because so many middleboxes were incompatible and we had to wait for them to rust out.
I don’t read that as DNS stopping working, but more as reasons why DNS is flaky in different scenarios.
Some of the issues there are related to mitigations against reflection attacks etc. I haven’t read the entire doc, but does it go into concerns around DDoS and other such things, and how DNS servers mitigate those attacks?
Edit: right in the intro. So a server needs to “understand” when it is under “attack” and only then put in mitigations against the attack. In the worst case, the server doesn’t do this, fixes the issues in this RFC to always respond and then amplify the attack.
Git is a decentralized version control system. Its core networking protocol must remain useful for people who self-host.
BitKeeper itself is open source these days, available under the Apache 2.0 License, but it is too little, too late:
Just like the version field.
I'm sure middlebox software is being updated as we speak to terminate connections with unknown versions in the „supported_versions“ extension.
Often-referenced paper in that field: http://conferences.sigcomm.org/imc/2011/docs/p181.pdf
If GitHub et al. thought this was confusing, they could have made a "beginner's mode" that auto-selected the storage server based on the git server, like LFS does. That would still have been better, since it wouldn't have required a custom server API.
> It was very hard to use in asymmetric cases where different people have different credentials
Right, but LFS can't be used in asymmetric cases at all - it assumes anyone with access to the git repository has access to the LFS storage area.
Wait, really? I thought that Git LFS let people with push access push files to the LFS area, which can then be read by anyone. That's asymmetric in the way everyone expects from GitHub. But I didn't use Git LFS because it's too expensive.
Yes, I probably encountered extra weirdness from git-annex, from the fact that the codebase was on GitHub, which doesn't support git-annex, so _everything_ in git-annex had to be on a different remote.
If it was meant to be used with the upstream as the only remote, that makes things make a lot of sense, and explains why my attempt to use it felt a lot like early Git, where there was no good upstream service like GitHub.
Here's the money quote:
"The team is also pursuing an experimental effort with Mercurial, an open source DVCS similar to Git. The goal is to add scalability features to the Mercurial client so it can efficiently support a codebase the size of Google's. This would provide Google's developers with an alternative of using popular DVCS-style workflows in conjunction with the central repository. This effort is in collaboration with the open source Mercurial community, including contributors from other companies that value the monolithic source model."
Project that forward logically by two years.
Sincere apologies if you can't derive any information from my comment, but that doesn't mean there isn't any there.
Devs can use the mercurial/git clients mentioned in the paper linked by harveynick.
Yup! I use Gerrit at my company and share Administration duties with our Devops team.
I know Android uses Gerrit; I just wasn't sure if Angular and co. did, which is why I worded it a bit more vaguely.
It's not that you can just do "GET / HTTP/2.0" or something like that.
The TLS part is interesting, as wrapping a protocol into an encrypted channel solves a lot of these issues (but it can break again if you have stupid man in the middle boxes). It just doesn't solve the issue for TLS itself.
HTTP/2 doesn't have the same problems because it requires TLS+ALPN, but IIRC that "clean" solution was only arrived at after years of discussion and experimentation.
The result is the main RPM everyone uses will probably stay at version 4.x forever.
[0]: http://rpm5.org/
"Initially, nobody registered it, but on August 15, 1994, [...] filed for the trademark Linux, and then demanded royalties from Linux distributors. In 1996, Torvalds and some affected organizations sued him to have the trademark assigned to Torvalds, and, in 1997, the case was settled." https://en.wikipedia.org/wiki/Linux#Copyright,_trademark_and...
In TLS 1.3 the downgrade protection works like this:
If I'm a TLS 1.3 server, and a connection arrives that says it can only handle TLS 1.2 or lower, I scribble the letters "DOWNGRD" (in ASCII) near the end of a field labelled Random that is normally entirely full of random bytes.
If I'm a TLS 1.3 client, I try to ask for TLS 1.3 from the server when I connect. If I instead get a TLS 1.2 or earlier reply, I check the Random field and see if it spells out "DOWNGRD" near the end. If it does, somebody is trying to downgrade my connection: I am being attacked and can't continue.
This trick works because if bad guys tamper with the Random field then the connection mysteriously fails (client and server are relying on both knowing all these bytes to choose their encryption keys with ephemeral mode) while older clients won't see any meaning in the letters DOWNGRD near the end of these random bytes - so they won't freak out.
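For the curious, the check is tiny. A sketch assuming the RFC 8446 layout, where the marker is the last 8 bytes of the 32-byte Random: "DOWNGRD" followed by 0x01 when TLS 1.2 was negotiated, or 0x00 for TLS 1.1 and below. The function names are mine, not from any TLS library.

```python
import os

TLS_1_2, TLS_1_3 = 0x0303, 0x0304
DOWNGRADE_TO_1_2 = b"DOWNGRD\x01"       # server speaks 1.3, negotiated 1.2
DOWNGRADE_BELOW_1_2 = b"DOWNGRD\x00"    # server speaks 1.3, negotiated <= 1.1


def server_random(negotiated_version: int) -> bytes:
    """Random as generated by a TLS 1.3-capable server (illustrative)."""
    rnd = bytearray(os.urandom(32))
    if negotiated_version == TLS_1_2:
        rnd[-8:] = DOWNGRADE_TO_1_2
    elif negotiated_version < TLS_1_2:
        rnd[-8:] = DOWNGRADE_BELOW_1_2
    return bytes(rnd)


def client_detects_downgrade(random: bytes, negotiated_version: int) -> bool:
    """A 1.3 client that was answered with 1.2 or lower checks for the marker."""
    if negotiated_version >= TLS_1_3:
        return False  # got what we asked for, nothing to check
    return random[-8:] in (DOWNGRADE_TO_1_2, DOWNGRADE_BELOW_1_2)


if __name__ == "__main__":
    rnd = server_random(TLS_1_2)
    print(client_detects_downgrade(rnd, TLS_1_2))
```

An older client never looks at those bytes, so for it the marker is just more randomness.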
You might worry: what if somebody just randomly picked "DOWNGRD" by accident for a TLS 1.3 connection? If every single person in the world makes one connection per second, this is likely to happen to one person, somewhere, only once every few years. So we don't worry about this.