Bandwidth needs halved by new compression written in Go (arstechnica.com)
In the process we've committed various things back to Go itself and at some point I'll write a blog on the whole experience, but one thing that made a big difference was to write a memory recycler so that for commonly created things (in our case []byte buffers) we don't force the garbage collector to keep reaping memory that we then go back and ask for.
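For anyone wondering what a memory recycler for []byte buffers might look like, here's a minimal sketch that uses a buffered channel as the free list. It only illustrates the general idea and is not CloudFlare's actual code; `Recycler`, `NewRecycler`, `Get` and `Put` are hypothetical names, and it assumes every buffer has the same fixed size.

```go
package main

// Recycler hands out fixed-size []byte buffers and takes them back for
// reuse, so a steady workload asks the allocator (and the GC) for less.
// Hypothetical sketch, not taken from any real codebase.
type Recycler struct {
	free chan []byte // idle buffers waiting to be reused
	size int         // length of every buffer handed out
}

// NewRecycler keeps at most maxIdle spare buffers of the given size.
func NewRecycler(maxIdle, size int) *Recycler {
	return &Recycler{free: make(chan []byte, maxIdle), size: size}
}

// Get returns an idle buffer if there is one, otherwise allocates.
// Recycled buffers are NOT zeroed; they still hold their old contents.
func (r *Recycler) Get() []byte {
	select {
	case b := <-r.free:
		return b[:r.size]
	default:
		return make([]byte, r.size)
	}
}

// Put gives a buffer back. Buffers that are too small, or that arrive
// when the free list is full, are simply dropped for the GC to collect.
func (r *Recycler) Put(b []byte) {
	if cap(b) < r.size {
		return
	}
	select {
	case r.free <- b:
	default:
	}
}
```

Callers would `Get` a buffer, use it, and `Put` it back when finished. The caller must not touch the buffer after `Put`, which is exactly the kind of use-after-free hazard manual pooling brings back.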
The concurrency through communication is trivial to work with once you get the hang of it and being able to write completely sequential code means that it's easy to grok what your own program is doing.
We've hit some deficiencies in the standard library (around HTTP handling) but it's been fairly smooth. And, as the article says, we swapped native crypto for OpenSSL for speed.
The Go tool chain is very nice. Stuff like go fmt, go tool pprof, go vet, go build make working with it smooth.
PS We're hiring.
sigh This is by far go's biggest wart IMO, and one that frequently sends me back to a pauseless (hah! at least less pausy:) systems language. I sure do like it in almost every other meaningful regard. But I wish latency wasn't something the designers punted on.
Not going to lie - I'm heavily considering taking this as an entry level position to get my foot in the door.
There's a good chunk of that article dedicated to discussing the language choice and how other languages could have been used instead but, in this specific instance, weren't chosen. The language choice is as much a part of the topic as the compression routines themselves. So it makes a lot of sense to include the term 'Go' in the title given that's a large focus of the article.
It's really no different to all these articles that spring up about fancy demos being built in Javascript or CSS tricks. Yet in those instances nobody says "the title is misleading. You could write that demo in C++ as well."
The thing is, many people use the title to determine whether the article is worth reading. As is, the title suggests that there is something unique about Go that reduces the bandwidth needed by the program, implying that this is something that other common languages fail to achieve. This is obviously impossible (any widely used language is capable of serializing an output byte stream in any way the programmer desires). As a result, the title sets off the alarm for "Language fanboyism", and "mathematically impossible claims", and goes swiftly into the "don't bother" pile together with "universal lossless compression algorithm invented!"[1], "perpetual motion machine" and "My favourite X language is faster than C/C++/Assembler!1!1"
From my personal point of view I'm happy they used it though, because it means more people tinkering with and improving the language.
I still feel that C++ is generally a better choice, but if I only had a short time to write something in, I would definitely go for Go.
So that brings up the question—is this just something CloudFlare is announcing for the PR, or is it actually innovative?
From the article:
“Go is very light,” he said, “and it has fundamental support for concurrent programming. And it’s surprisingly stable for a young language. The experience has been extremely good—there have been no problems with deadlocks or pointer exceptions.” But the code hit a bit of a performance bottleneck under CloudFlare’s heavy loads, particularly because of its cryptographic modules—all of Railgun’s traffic is encrypted from end to end. “We swapped some things out into C just from a performance perspective," Graham-Cumming said.
“We want Go to be as fast as C for these things,” he explained, and in the long term he believes Go’s cryptographic modules will mature and get better. But in the meantime, “we swapped out Go’s native crypto for OpenSSL,” he said, using assembly language versions of the C libraries.
I assume since you've got the preimages of compression, as well as control over the compression format, that the diff and patch operations are much more efficient in space and time than they would be with arbitrary binary data. But...by how much?
Also the assertion that "It has already cut the bandwidth used by 4Chan and Imgur by half" sounds disingenuous and possibly not backed up by moot's quote “We've seen a ~50% reduction in backend transfer for our HTML pages (transfer between our servers and CloudFlare's),”. Is backend transfer for HTML pages the only bandwidth they're using? Is the rest of it halved, and if so, how and why?
The title of the story also makes me gag.
I've always thought there were some potential attacks there around cache disclosure (which Google avoided by going with SDCH instead).
CloudFlare controls the server and the client, so they don't need to worry about the attacks or about persuading everyone to adopt their RFC.
Otherwise this is more compelling for content sites like the referenced 4chan. But still very cool.
The response is generated by the origin webserver as normal. But rather than sending that response using the normal HTTP encoding, instead the proxy first does a binary diff against any versions that the (CloudFlare) client says it has and that the (CloudFlare) proxy also has in its cache. They use e.g. ETags or MD5 to uniquely identify the entire response content.
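To make the "which version do we both have" step concrete, here's a rough sketch of a cache keyed by a content fingerprint. It assumes MD5 of the whole body is the identifier (the comment above mentions ETags or MD5 only as examples), and `Fingerprint`, `VersionCache`, `Add` and `PickBase` are hypothetical names, not anything from Railgun itself.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
)

// Fingerprint returns the hex MD5 of a response body. It is used here
// only as a version identifier, not for security.
func Fingerprint(body []byte) string {
	sum := md5.Sum(body)
	return hex.EncodeToString(sum[:])
}

// VersionCache maps a fingerprint to the cached response body.
type VersionCache map[string][]byte

// Add stores a body and returns the fingerprint it was filed under.
func (c VersionCache) Add(body []byte) string {
	id := Fingerprint(body)
	c[id] = body
	return id
}

// PickBase returns the first version the peer says it has that we also
// hold, so a delta can be computed against it. The bool is false when
// there is no common version and the full response must be sent.
func (c VersionCache) PickBase(peerHas []string) (string, []byte, bool) {
	for _, id := range peerHas {
		if body, found := c[id]; found {
			return id, body, true
		}
	}
	return "", nil, false
}
```

The sender would diff the fresh response against the body returned by `PickBase` and send only the delta plus the fingerprint of the base it used.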
You can still do cookie stripping etc to try to avoid the request to the webserver altogether, but that's a separate concern.
Currently, cookies are not part of the hash.
We have customers of all types using Railgun. As an example, there's a British luggage manufacturer who launched a US e-commerce site last month. They are using it to help alleviate the cross-Atlantic latency. At the same time they see high compression levels as the site boilerplate does not change from person to person viewing the site.
What sort of sites do you think it doesn't apply to?
Single page webapps. In those cases the html/js is normally static and already CDN'ed and the data is a JSON API which varies on a per user basis.
There would be some gain as the dictionary would learn the JSON keys but I doubt it would be very dramatic vs deflate compared to the content sites referenced in the article.
If it's HTTP, then how does CDN handle certificates? Does it use CDN's certificates?
If it's HTTPS, then 1) Isn't the hash gonna be a lot different even if the two versions are very alike? 2) Why does Railgun encrypt the encrypted data again?
For the connection from Railgun to the origin server it will depend on the protocol of the actual request being handled. If HTTPS Railgun makes an HTTPS connection to the origin.
I know lots of webservers zip their response data, but I was always curious about the things in html that show up very often and if there's a way to optimize around that.
For example, most web xml data contains a lot of common tags, like "div" and "span" and others that are specific to html. I think if you add them up, they might make up a considerable percent of traffic data. Is it possible for the web server to swap those out for a single character before it sends the data, and have the browser replace it when it arrives?
Or does zip compression already do that somehow?
Zip will replace the common tags (like "div") with a single "div" (in the compression dictionary), then a single character every time it appears (more or less - it might be less than a single byte if it's a really common tag). So there's the overhead of sending a dictionary of common tags, which is kind of wasted.
It would be more efficient if both the browsers and compression algorithms could agree (beforehand) on a dictionary of common terms which would be likely to appear in the document.
If you're compressing a lot of data which is likely to be similar, you can do this with a common dictionary. See - http://stackoverflow.com/questions/479218/how-to-compress-sm...
Of course, my answer on Stackoverflow is pretty crude. You could create a dictionary used to compress the compression dictionary. Google is probably going to do this any time soon (if they haven't already) since they control the client (Chrome), server (google web server) and protocol (SPDY).
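The shared-dictionary idea can be tried out with Go's standard `compress/flate` package, which accepts a preset dictionary on both ends. This is just a sketch of the technique under the assumption that both sides already hold the same `dict` bytes; `CompressWithDict` and `DecompressWithDict` are hypothetical helper names.

```go
package main

import (
	"bytes"
	"compress/flate"
	"io"
)

// CompressWithDict deflates data using a preset dictionary. The
// dictionary itself is never transmitted; the receiver must already
// have exactly the same bytes.
func CompressWithDict(data, dict []byte) ([]byte, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriterDict(&buf, flate.BestCompression, dict)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// DecompressWithDict inflates data that was produced by
// CompressWithDict with the same dictionary.
func DecompressWithDict(compressed, dict []byte) ([]byte, error) {
	r := flate.NewReaderDict(bytes.NewReader(compressed), dict)
	defer r.Close()
	return io.ReadAll(r)
}
```

With a dictionary made of the markup you expect to see (common tags, boilerplate attributes), even very short documents can refer back to it instead of spelling everything out the first time.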
This isn't an announcement about companies supporting Railgun...it's about companies supporting CloudFlare by installing the Railgun Listener.
Third time's a charm. Definitely going to have to investigate.
I can't find any such announcements; anybody have links? Based on comments further down, I wonder if the author is confused.
> CloudFlare will provide software images for Amazon and RackSpace customers to install
That is very different from the claim in the first paragraph.
Dreamhost: http://dreamhost.com/dreamscape/2013/02/26/cloudflare-railgu... Media Temple: http://weblog.mediatemple.net/2013/02/26/the-web-just-got-fa...
Same thing could be easily achieved using futures or any of the asynchronous libraries available to C++, Ada, JVM and .NET languages.
Edit: err, you are correct, I didn't realize WAN optimization included binary diffing and compression. Should google before I comment.
Hopefully http2.0 will attempt solving this.. erm...
Uh, why?
For Go that came for free because its Communicating Sequential Processes design does that for you.
http://calendar.perfplanet.com/2012/efficiently-compressing-...
I google, but all I find is people complaining their proxy/filter/appliance/diagnostic is breaking because it doesn't understand SDCH.
It seems like SDCH has been around for 4 years, I presume the lack of data means it hasn't worked out.
(I imagine that you could drastically reduce the CPU load of compression by making simple hard coded state machines for each dictionary. For content like XML or json you could easily make your field names and surrounding punctuation minimal. For many very short messages sharing a dictionary that would beat deflate on compression ratio, and for long messages of non-repeating field values it wouldn't be much worse. CPU use of expansion is probably comparable, though you might get better memory access behavior out of SDCH.)
Railgun only requires the website to deploy a client and cooperate with CloudFlare. The user's client doesn't have to be Chrome or whatever; the web server doesn't need to be aware of Railgun. It's transparent to both HTTP clients and HTTP servers. SDCH, however, requires a modification to the HTTP/1.1 protocol, which implies changes in both HTTP clients and HTTP servers.
Both are quite promising, though Railgun is easier to adopt.
AFAIK, Chrome is the only one supporting it from a browser standpoint.
I agree though — Go looks awesome.
There's been decades of work on real-time garbage collection yet all of those approaches still have tradeoffs. Consider that object recycling is a ubiquitous iOS memory management pattern. This reduces both memory allocation latencies and object recreation overhead. Ever flick-scroll a long list view on an iPhone? Those list elements that fly off the top are virtually immediately recycled back to the bottom -- it's like a carousel with only about as many items as you can see on screen. The view objects are continually reused, just with new backing data. This approach to performance is more holistic than simply pushing responsibility onto the memory allocator.
Memory recycling here also reminds me of frame-based memory allocator techniques written up in the old Graphics Gems books, a technique likewise covered in real-time systems resources. Allocating memory from the operating system can be relatively expensive and inefficient, even using good ol' malloc. A frame-based allocator grabs a baseline number of pages and provides allocation for one or more common memory object sizes (aka "frames"). Pools for a given frame size are kept separate, which prevents memory fragmentation. Allocation performance is much faster than straight malloc, while increasing memory efficiency for small object allocation and eliminating fragmentation. Again, this is a problem-specific approach that considers needs beyond latency.
Precisely. Which is why for performance-critical systems code it's important to give the programmer the choice of memory allocation techniques, but to add features to the language to make memory use safer.
Garbage collection is great, but occasionally it falls down and programmers have to resort to manual memory pooling. Then it becomes error-prone (use after free, leaks) without type system help such as regions and RAII.
I find Rust's approach much more interesting, because GC is entirely optional, but it provides abstractions that make it easier to write clear and correct manual memory management schemes.
The simplistic GC isn't part of the language design, it's a stopgap in the first version.
That same argument could be used for having the language in the title as people who are not interested in programming are going to be less interested in a thread about programming.
And language fanboyism is going to happen with or without this title (given the content of the article). What's happening here is more a case of lazy members wanting to commentate on articles they've not even read. It's basically the lowest form of blogging.
Initially, I wasn't actually planning to do deltas for the compression technique and it was in testing with a whole bunch of common sites that I stumbled upon the fact that they don't change very much. That led me to wonder about the algorithms that might be used.
I did test quite a lot of stuff (and at one point thought I'd come up with a truly cool new algorithm only to realize that I was mistaken :-) to decide what to do.
Railgun has to trade off three things: compression efficiency, space and time. Because we are trying to do this for performance time is the most important thing to optimize for, followed by efficiency, followed by space. bsdiff is very, very good at delta compressing binary things; Railgun isn't as good, but it's very, very fast.
A year or two ago I got quite interested in delta compression, read all the papers I could find on the topic, and eventually came up with an algorithm that seems pretty competitive, although I've mostly focussed on efficient compression rather than speed. Someday I'll get around to porting the code from Python to C and find out what the performance is really like.
For what it's worth, my code is here: https://gitorious.org/python-blip
Take a look at this Rob Pike video: http://blog.golang.org/2013/01/concurrency-is-not-parallelis...
Now that video might well be crap, I'll be the first to admit I'm not skilled enough to know one way or the other.
But based on that video, it does appear to me that Go does offer some form of multi-core magic and it does appear to come at a minimal cost.
A barrier to implementation of SDCH is deciding what dictionaries to create and when to update them.
(I'm aware that the post might not be as prestigious as, say, engineering - however, I feel that having someone with strong web development experience (who is a user of Cloudflare already) would more than offset the slight inconvenience on your part.)
EDIT: Grammar
And I couldn't agree more, being placed in the firing lines of customers is often more telling than building the software yourself - "normal" people tend to notice things which we as developers are prone to miss or gloss over unintentionally.
-----
The awkward moment when I notice I blanked the CEO
On the other hand, if I didn't know C++ and I was looking for a native compiled language to learn, I'd probably choose Go over C++.
Perhaps it's just my personal experience though.
In a neutral way, it's like: if you spent all that effort mastering this language to get such fine-grained control, why would you give that up again? And really, I understand: why would you give that up? Especially if you know how to use C++ in a fairly painless way.
Funny, I thought C++ did exactly the same thing. Where are the L1, L2 and L3 cache references, multiple opcode execution pipelines, processor instructions?
Wouldn't that be assembly? Last time I heard Stroustrup, he was all raving about abstractions, not raw unfiltered access.
Note of course, that relying on the data not being corrupt on the client (which you must if you assume the compression dictionaries are sane) is dangerous. I assume you guys must store some checksum that you compare once to make sure when someone says "i have version 5, delta against this", that they really have a good copy of version 5?
SVN used to do what you are suggesting, btw. We only send clients deltas against the versions they already have, and precompute them in some cases :)
(edit) That's an awesome name by the way. Railgun.
To what end? Rsync too works off both copies.
In any case, if you know the files are related, you
1. Don't need to do any of this. You can simply send the binary delta that is usually copy/add instructions (i.e. copy offset 16384, length 500 to offset 32768)
2. Can precompute the deltas.
You can actually precompute in any case, it just makes no sense unless you know you will be diffed against something else.
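For readers who haven't seen a copy/add delta, here's a minimal sketch of applying one. It is a simplification: output is built sequentially rather than written to explicit destination offsets as in the example above, and `Op`, `Apply` and `ErrBadDelta` are hypothetical names, not the format used by SVN, rsync or Railgun.

```go
package main

import "errors"

// Op is one delta instruction: either copy a byte range out of the base
// version, or add literal bytes that the base does not contain.
type Op struct {
	Copy   bool   // true for a copy instruction, false for an add
	Offset int    // start of the range in the base (copy only)
	Length int    // number of bytes to copy (copy only)
	Data   []byte // literal bytes to append (add only)
}

// ErrBadDelta is returned when a copy reaches outside the base version.
var ErrBadDelta = errors.New("delta references bytes outside the base version")

// Apply rebuilds the new version by running the instructions in order
// against the base version the receiver already holds.
func Apply(base []byte, delta []Op) ([]byte, error) {
	var out []byte
	for _, op := range delta {
		if !op.Copy {
			out = append(out, op.Data...)
			continue
		}
		if op.Offset < 0 || op.Length < 0 ||
			op.Offset > len(base) || op.Length > len(base)-op.Offset {
			return nil, ErrBadDelta
		}
		out = append(out, base[op.Offset:op.Offset+op.Length]...)
	}
	return out, nil
}
```

For a page where only a user name changes, the delta is two copies and one short add, which is why mostly-static pages shrink so much on the wire.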
We have users in Singapore who access various XML-heavy web services in our NY office. A dictionary-style over-multiple-requests compression technique would be brilliant for their case.
They've saved a huge amount for us (I think on the order of 90%) of AJP (http<->tomcat) traffic. Not particularly difficult to set up.
That would be really cool in fact.
Yes.
Most companies depreciate hardware over 3 years. How much WAN/Internet bandwidth will you NOT use over the next 3 years, and how does that translate into upgrades you won't need to make?
There are also arguments for these boxes along the lines of "right now we use really expensive WAN links, but these boxes do end-to-end encryption too, so we can put the traffic on the Internet instead" but that opens up a few obvious cans of worms (and can of course be done without an accelerator with VPNs and whatnot).
Then you get into the more nebulous arguments that big bosses tend to like, such as "The average user makes Y XML requests per day to process X widgets. Each request takes Q seconds now. If we lower that to Q*0.5 with WAN acceleration, each user can now process N more widgets per day". Fluffy argument, but can have a big impact on business decision makers, especially if you can tie it to a dollar amount.
Note that WAN Accelerator salespeople are really, really good at coming up with arguments like this for/with you during the sales process.