HTTP/2: The Long-Awaited Sequel (blogs.msdn.com)
Some of the coolest stuff I saw was streams and server push. Streams allow multiplexing multiple logical streams of data onto one TCP connection. So unlike the graphs you typically see in the Chrome network inspector, where one resource request ends and another begins, frames (the unit of data) from multiple streams are interleaved on the wire. This means only one connection (connections are persistent by default) is needed between server and client, and there are ways to prioritize streams and control flow, which gives devs more opportunities for performance gains.
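As a rough illustration of that interleaving, here is a minimal Python sketch. The `Frame` tuple, the 4-byte frame size and the strict round-robin order are all assumptions made up for the example; real HTTP/2 frames have a binary header, and the actual order depends on priorities and flow control.

```python
from collections import namedtuple
from itertools import zip_longest

# Hypothetical frame type for illustration; this is not the HTTP/2 wire format.
Frame = namedtuple("Frame", ["stream_id", "payload"])

def split_into_frames(stream_id, data, frame_size):
    """Cut one stream's bytes into fixed-size frames tagged with its stream id."""
    return [Frame(stream_id, data[i:i + frame_size])
            for i in range(0, len(data), frame_size)]

def interleave(streams, frame_size=4):
    """Round-robin frames from several streams onto one ordered 'connection'."""
    per_stream = [split_into_frames(sid, data, frame_size)
                  for sid, data in streams.items()]
    wire = []
    for round_of_frames in zip_longest(*per_stream):
        # Shorter streams run out of frames first; skip their empty slots.
        wire.extend(f for f in round_of_frames if f is not None)
    return wire

def reassemble(wire):
    """Receiver side: group payloads by stream id to rebuild each stream."""
    out = {}
    for frame in wire:
        out[frame.stream_id] = out.get(frame.stream_id, b"") + frame.payload
    return out

streams = {1: b"<html>...</html>", 3: b"body{color:red}", 5: b"alert(1)"}
wire = interleave(streams)
```

Because every frame carries its stream id, `reassemble(wire)` gives back the original three byte strings even though they shared one ordered channel.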
Also, headers are only sent as deltas now. Client and server maintain header tables with the previous values of headers (which persist for the connection), so only updates need to be sent after the first request. I think this will be a consistent 40-50 bytes saved per request for most connections where headers rarely change.
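To make the delta idea concrete, a minimal Python sketch that follows the description above rather than the actual header compression format in the draft. The function names and the use of `None` to mark a dropped header are assumptions for illustration only.

```python
def encode_delta(table, headers):
    """Return only the headers that differ from the sender's table, then update it.

    A value of None in the delta marks a header that is no longer sent.
    """
    delta = {k: v for k, v in headers.items() if table.get(k) != v}
    delta.update({k: None for k in table if k not in headers})
    table.clear()
    table.update(headers)
    return delta

def decode_delta(table, delta):
    """Apply a delta to the receiver's table and return the full header set."""
    for k, v in delta.items():
        if v is None:
            table.pop(k, None)
        else:
            table[k] = v
    return dict(table)

sender, receiver = {}, {}
first = {":method": "GET", ":path": "/", "user-agent": "demo/1.0", "accept": "*/*"}
second = {":method": "GET", ":path": "/style.css", "user-agent": "demo/1.0", "accept": "*/*"}

d1 = encode_delta(sender, first)    # first request: every header is new
d2 = encode_delta(sender, second)   # second request: only :path changed
```

Here `d2` contains just the `:path` entry. Both sides have to keep their tables in step for the lifetime of the connection, which is the per-connection state this scheme relies on.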
[1] http://tools.ietf.org/html/draft-ietf-httpbis-http2-14
[2] http://chimera.labs.oreilly.com/books/1230000000545/ch12.htm...
TCP has streams. TCP has connection mux. TCP has flow and congestion control. HTTP has keepalive. Why build another stack on OSI layer 7?
Also now we have to keep state to work out what the diffs are. State is evil.
Whilst I'm sure this will have some minor performance advancements, I'm not sure that it justifies the new protocol stack.
Not sending 2 MB of JavaScript and crappy HTML down the connection to display the front page probably has higher gains.
In an ideal world we could switch to using something like the SCTP networking protocol with HTTP that would solve a lot of issues. Unfortunately we are stuck with TCP, so the application protocol (HTTP) now must implement a networking protocol so we can multiplex over a single connection.
At least people won't have to inline resources, sprite images, or concatenate CSS and JavaScript anymore. And header compression is a small upgrade to the spec.
TCP != HTTP
TCP is a transport layer protocol (OSI Layer 4). HTTP is an application layer protocol (OSI Layer 7).
Connections are already processed in parallel whenever they can be. That is, when the browser knows what to request, and it fits in the execution model. If there's a huge number of assets on a single hostname, this has been a limiting factor because the browsers have limited the number of requests to a single hostname to avoid overloading the server. But that will remain an issue even if the requests are multiplexed over a single connection.
Most of the time when I see graphs in the network inspector that aren't massively parallel, it's because nobody has spent time optimizing where/how assets are requested, in ways that will make them just as bad with connection multiplexing.
There certainly can be benefits to reap from it, but the worst offenders are already ignoring best practices.
A TCP handshake has to take place for each connection and this isn't cost free, and there are the SSL negotiations on top (though techniques like OCSP stapling help).
Going massively parallel isn't free - Will Chan of Chrome did a good write up here: https://insouciant.org/tech/network-congestion-and-web-brows...
HTTP/1.x was neatly layered on TCP with an easy-to-parse text format. This in turn ran neatly on IPv4/6, which ran on top of Ethernet and myriad other things. This separation of concerns gave us the benefit of being very easy to understand and implement, while also allowing people to subvert the system, adding things like half-baked transparent proxies to networks that would munge streams and couldn't agree where HTTP headers started. We ended up having to design WebSockets to XOR packets just to fix other people's broken deployments.
HTTP/1.x also became so pervasive that it became overwhelmingly the most popular protocol on top of TCP, even to the point where a system administrator could block everything but ports 80 and 443 and probably not hear anything back from their userbase. This is the reason we ended up with earlier monstrosities like SOAP and XML-RPC: by that point HTTP had become so prevalent a transport that it was assumed, incorrectly in many cases, to be the only transport.
Perhaps the IETF should be pushing a parallel version of HTTP that pushes many of these concerns into SCTP. The problem here is that it'll take forever to get that rolled out and we need something to improve things now. Look at how long it's taking to roll out IPv6: something we actually need to fix now.
Leading? Firefox and Chrome already support HTTP/2 (and SPDY, the basis for HTTP/2, for a long time now), it's just not enabled by default.
Their real problem of course is IIS. We'll probably have to wait for IIS9 which I cannot see happening for another two years. IIS8.5 appeared 12 months ago in Windows Server 2012 R2.
It seems unusual for Microsoft to disable SPDY support entirely, at least until support for HTTP/2 is more widely deployed...
So if they leave SPDY in place along with HTTP 2.0, they could wind up with strange incompatibilities occurring or site operators feeling like they need to support both SPDY and the HTTP 2.0 standard (rather than just the HTTP 2.0 standard).
Looking at it, it actually seems more progressive to dump SPDY and move to the SPDY-based HTTP 2.0 at this stage. Then ten years down the road hopefully SPDY will be dead and there will just be HTTP/1.1 and HTTP/2.0.
This does not apply to ad code that's implemented as <script src="..."></script>, which will indeed block page loading.
"What does this mean for developers?
HTTP/2 was designed from the beginning to be backwards-compatible with HTTP/1.1. That means developers of HTTP libraries don't have to change APIs, and the developers who use those libraries won't have to change their application code. This is a huge advantage: developers can move forward without spending months bringing previous efforts up to date."
It's probably because IE is really just a UI wrapper around system libraries[0]. The changes for HTTP/2 would be made not in IExplorer.exe, but instead in WinInet.dll (and possibly URLMon.dll).
This is because IE isn't the only application that will use these new features.
EDIT: I should add that you don't just go changing system libraries in a patch Tuesday, you'd wait and throw them in a new version, hence the 10 preview.
I want that so bad. Coding is hard, DDoSing is so easy.
Thank you, architects, for making black hats' lives so easy. HTTPS by default? YEESS even more leverage.
I love progress.
Next great idea: implementing ICMP, UDP and routing on top of an OSI layer 7 protocol, because everybody knows religion forbids opening the firewall for protocols that do the job, or we could even create new protocols that are not HTTP. But HTTP for sure is the only true protocol, since devs don't know how to write 3 lines of networking code and sysadmins don't know how to do their jobs.
And HTTP is still stateless \o/ wonderful, we still have this wonderful hack living on: cookies, oauth and all this shitty stuff. Central certificates are now totally discredited, but let's advocate broken stuff even more.
Why not implement a database agnostic layer on top?
When are we gonna stop this cowardly headless rush of stacking poor solutions and begin solving the root problems?
We are stacking the old problems of GUI (asynch+maintainability+costs) with the new problem of doing it all over HTTP.
I have a good solution that now seems viable: let's all code in vanilla Tk/Tcl: it has GUI, it can do HTTP and all, and it works on all environments, and it is easy to deploy.
Seriously, Tk/Tcl now seems sexy.
Given that the web is becoming more and more real-time this seems pretty interesting.
Is there a risk that cellular data usage will increase from this?
Found this project but nothing live
I'm saddened. The days of good internet protocols are clearly behind us.
At the risk of sounding too blunt: Everything? All of it? Its mere existence?
It fucks up responsibilities by addressing network-layering issues at the application layer. It takes a simple & stateless text-mode protocol and converts it into a binary & stateful mess.
It has weird micro-optimizations designed to ensure that Google's front page and any Google request with its army of 20000 privacy-invading tracking cookies fit within one TCP packet at American ISPs' MTU, so that people are not inconvenienced while their privacy is being eaten away. Which I'm sure is useful to Google, but pretty much nobody else.
The list goes on.
It does a lot of things which are neither needed nor asked for by the majority of the internet, and yet the rest of the internet is asked to pay the cost through a mindboggling increase in complexity, which I'm sure will be the source of a million future vulnerabilities.
I'm not aware of a single thing in there which I want, and if I'm wrong and find one, I'm unwilling to accept that this is the cost I have to pay for that feature.
Any web-browser I will use in the future will be one where HTTP/2 can be disabled.
However, there were many other bad protocols that died through lack of use. You can still vote with your feet. A vendor will not maintain a protocol stack if people don't use it.
And yeah, I'm with you--I think that a lot of this tail-wags-dog stuff is going to come back and haunt us, but we as an industry fucking suck at being conservative when it makes sense.
I was unaware of this and became intrigued. If anyone else is curious, this is the explanation from the RFC: http://tools.ietf.org/html/rfc6455#section-10.3
Basically it's to prevent an attacker from cache poisoning an HTTP proxy (like one on a corporate network) that doesn't properly support WebSockets. WebSockets look a lot like HTTP over the wire, so without masking the wire data in some way a proxy could be tricked into believing a faked "HTTP"-looking request and response are real, and thus cache whatever an attacker supplies.
This would technically be a bug in the proxies, but it's nice to see IETF accounted for this and put in countermeasures before it inevitably became a DEFCON talk.
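For anyone who wants to see how small the countermeasure is: as I read RFC 6455 section 5.3, each client-to-server payload byte is XORed with one byte of a 4-byte masking key, cycling through the key. A minimal Python sketch; the function name and the sample request are mine, not from the RFC.

```python
import os

def mask_payload(payload: bytes, key: bytes) -> bytes:
    """XOR each payload byte with key[i % 4]; applying it twice restores the input."""
    if len(key) != 4:
        raise ValueError("masking key must be 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))

key = os.urandom(4)  # the client picks a fresh random key for every frame
fake_request = b"GET /poison HTTP/1.1\r\nHost: example.com\r\n\r\n"
masked = mask_payload(fake_request, key)    # what goes on the wire
unmasked = mask_payload(masked, key)        # what the server recovers
```

Since the key is random per frame, the script supplying the payload cannot choose the bytes that end up on the wire, so it cannot reliably make them look like an HTTP request to a confused proxy.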
I disagree. Nothing is needed now, HTTP/1 is not broken and it works well enough.
There should be time enough to come up with a clean design. Even if it requires designing a new transport protocol.
Rolling out a new transport protocol like SCTP takes a lot less time than rolling out a new network protocol like IPv6. Transport protocols only run on the endpoints, not on the routers in the network.
Except for firewalls and NAT'ing home routers, but if HTTP/1 over SCTP would result in a faster better browsing experience the problem would solve itself.
So the real world impact for users is likely to be small:
Lowering the cost of multiple streams will likely give you decent percentage wise improvements on page download times that are already so good that the absolute improvements are likely to be small, and likely minimal to no improvements on the pages that are actually slow.
With the advent of the browser pre-loader, as long as the resources are declared in the markup the browser should discover them and issue the requests.
Currently browsers often seem to be blocked waiting for a connection to come free.
A lot of the ad JS I see is in the form of inline script tags, which generally should not block anything (the JS usually asynchronously constructs another script tag, which shouldn't impact performance).
>Try it, the change can be dramatic.
I've been using AdBlock Plus, and now uBlock, for at least 8 years. So I'm definitely not arguing against it.
It's just that in theory, an ad tracker (like a 1 pixel image) does not necessarily have to impact performance. Also note that some ad blockers add performance overhead themselves.
Well, it does impact performance, even if it's async. The pipe is only a finite width (especially on mobile).
That seems better than the current situation, which often ends up doing the exact same thing, but in an ad-hoc way that gets reimplemented every time.
Could you elaborate on this, please?
SCTP is message oriented rather than stream oriented so this isn't really useful. The chunk length field is also two bytes, meaning that all your messages have to be less than 64k or you have to implement packet reassembly and stuff. Oh look, back at TCP again.
We must do nothing.
I suspect this entire SPDY/HTTP/2 reengineering effort is a 1000% complexity and risk increase for a 2-5% gain in performance. As an engineer, that is not a trade-off I could accept.
90% of the inefficiency of web applications is down to the application stack, not the protocols. Sending hundreds of KiB of uncompressed text down rather than compressed abstract or native virtual machine instructions for example is a bigger win.
Oh wait, SCTP can act in an ordered-with-congestion-control mode (aka stream-oriented), and the userland interface to it (the most basic form of which is just plain old Berkeley sockets) does in fact implement packet assembly (of course, no matter what, if you want packets bigger than the MTU something's gonna have to disassemble and reassemble them on some level of the stack anyways).
Not to say that SCTP is a practical solution given the glacial pace of acceptance of any new network protocol at its level, but let's not start spreading FUD about its capabilities.
But it isn't so for transport protocols like SCTP. Only the endpoints using it need to support it. So a transport protocol that provides a real benefit could be deployed relatively quickly.
80 and 443 are "well known ports"[1] which is fine.
What does this have to do with ports? TCP is connection based so a client can create as many connections as it likes to a port on a host.
If someone does indeed build a new "internet" built on top of HTTP which is tunnelled through well known ports with different services with the intention of circumventing the firewall then they will not be allowed through my firewall at all.
EDIT: why do you want to block HTTP/2 by the way? You know that HTTP/1.1 can be used to tunnel other protocols too, right?
In fact you're going to have to go to the same effort to establish a TCP connection that your HTTP/2 is going to run over, then still have to do a key exchange. That channel then has the same advantages of a persistent HTTP/1.1 channel plus the ability to provide multiple streams.
The multiple streams can be resolved simply by making more than one connection to the server defensively. Perhaps a mechanism to schedule that client-side would work. Oh wait, we already have one (connection limits and keep-alive).
Then again, all of this is moot as once you've loaded the static resources (images/css/js etc) via HTTP, you should only be seeing one request periodically when an operation takes place or at an interval if polling or kept alive for server-push so maximum two connections from a client to a server.
If you need to do anything more than that, you're probably using the wrong technology both on the server and client.
HTTP/1.1 tunnelling I understand. In fact I use it most of the day (RDP over terminal services gateway) which is RPC over HTTP.
The rationale I have is that effectively managing HTTP/2.0 at the firewall requires packet and protocol inspection rather than merely understanding what connections have been made and where from and where to. This has a significant complexity and tooling cost and effort. Plus there is a significant opportunity to mask illegitimate traffic as legitimate traffic. For those of us who deal with end user network security, this is a major problem.
As a practical matter, the percentage of customers who will put up a VPN to use your service is vanishingly small.
And then we'll be back at square one, ready to make this mistake all over again.
HTTP/2 is implementing TCP's responsibilities. Again. Badly.
But of course, though we rarely worry about Token Ring these days, we do run TCP over IEEE 802.11 all the time.
Likewise, HTTP is run over other Transport Layer protocols even if it is less common, e.g. UPnP uses HTTP over the UDP transport layer protocol. http://en.wikipedia.org/wiki/Universal_Plug_and_Play#Protoco...
OSI's higher levels are abstractions. As is the case with all useful abstractions, they serve to implement the functionality of lower levels without requiring attention to their actual implementation. Not having to manage TCP allows a lot of useful JavaScript to be easily written.
There are upwards guarantees at each layer that the stack makes. All implementations within the layer must be equal to the next layer even if one of the implementations provides capability of higher layers. Nothing is said however about adding further guarantees in layers 8, 9, 10, 11, 12...and so forth because they have already been made.
I suppose I shouldn't use parity bits on serial connections then?
"Not having to manage TCP allows a lot of useful JavaScript to be easily written"
That's absurd. It makes no difference.
As for UPnP, which I know well having written an entire UPnP stack, it's a broadcast messaging layer, not a connection based protocol. All the HTTP messages stay within the size of a UDP datagram and it is expected to be wholly unreliable. Even though it's ugly, it's hardly a comparison.
I get the feeling a lot of people here are web developers with little experience of protocol stacks and not system programmers!
Back when I had a 14.4k SLIP dialup and an RTT of 200ms+, connection overhead and TCP channel overhead were a major drag on throughput, but it's not like that now. I'd be surprised if there was a tangible difference to the end user.
It's destroying big business who push lots and lots of resources to the browser:
- From an admin POV, you have to shard your domain => more work, more maintenance.
- From a browser POV, you have to open multiple TCP connections => you take slow start and TLS handshake in your face for each connection + the connections have to fight each other because the OS wants to be fair among TCP connections
- From a web admin POV, you want to inline your content to reduce round trips => you have more work to do on your resources
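A back-of-the-envelope Python sketch of the per-connection cost in the list above. Everything numeric here is an assumption for illustration: 1 round trip for the TCP handshake, 2 for a full TLS handshake, an initial congestion window of 10 segments of 1460 bytes, and an idealized slow start that doubles the window every round trip with no loss.

```python
import math

def slow_start_round_trips(response_bytes, mss=1460, initial_cwnd=10):
    """Round trips to deliver a response under idealized slow start (no loss)."""
    segments_left = math.ceil(response_bytes / mss)
    cwnd, rounds = initial_cwnd, 0
    while segments_left > 0:
        segments_left -= cwnd   # send one full window this round trip
        cwnd *= 2               # window doubles after each round trip
        rounds += 1
    return rounds

def cold_connection_ms(response_bytes, rtt_ms, handshake_rtts=3):
    """Handshake round trips (assumed 1 TCP + 2 TLS) plus slow-start round trips."""
    return (handshake_rtts + slow_start_round_trips(response_bytes)) * rtt_ms

# Example: a 100 KiB response on a brand-new connection with a 100 ms RTT.
cost_ms = cold_connection_ms(100 * 1024, rtt_ms=100)
```

Every extra sharded connection pays the handshake part again and starts from the small window, whereas a single long-lived connection pays it once and keeps its window open.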
SPDY is certainly not necessary for everyone (it mostly benefits those who push lots of different resources), that's true. We're talking about businesses who lose a month worth of revenue if the latency to their site explodes from 50 ms to 500 ms.
But it still is interesting because the actual usage _on top of HTTP_ doesn't change: you still have your websockets or your Server-Sent events, you still have your keepalive, you can do a simple-stupid "one HTTP call per resource" and it will be handled efficiently, sometimes SPDY will work underneath to push content so that the next HTTP call will actually hit the cache without you knowing about it... all at the cost of changing (or updating) your library. Because you certainly don't write HTTP text directly to your TCP socket.
The interesting point will be for those library developers. The added complexity will certainly make it harder, but on the other hand the binary format and strict rules will make it easier to parse the messages... I'd like to see where it goes here.
If you're writing at the level of serial connections and parity, by all means pay attention to those details. If you're writing at higher level, consider abstracting away such details in an interface, library or module.
I miss Wildcat BBS as much as the next person: by which I mean, not very much. HN is full of really fucking smart people, not the idiots implied by your comment.
This is nothing to do with BBS's or code abstractions. On the former, there is no OSI stack; it's terminals down serial connections. On the latter, it's datagrams or sockets. It's about the guarantees that the link layer makes or doesn't. Parity doesn't pass up the layers because the guarantees are made further up (TCP).
You can still run token ring, serial, thick ether, thin ether, paper aeroplanes thrown between buildings. It doesn't matter above the data link layer.
Yes, there are really fucking smart people here, as you put it, but it appears there is a normal distribution of people as well.