Bandwidth needs halved by new compression written in Go (arstechnica.com)

270 points by marmalade 13 years ago | 118 comments

rjknight 13 years ago |

The title suggests that there's something unique about Go, either the language or its standard library, that enables bandwidth savings. In fact, Cloudflare have written some software which they claim enables them to reduce their bandwidth, and this software happens to be written in Go. This might be an excellent choice (and I suspect it probably is), but it's not Go per se that is reducing the bandwidth usage.

jgrahamc 13 years ago | |

I agree. The benefit of using Go is that it's fast to write and has good concurrency features. To give you an idea of the size, there are 7,329 lines of Go code in Railgun (including comments) and a 6,602 line test suite.

In the process we've committed various things back to Go itself and at some point I'll write a blog on the whole experience, but one thing that made a big difference was to write a memory recycler so that for commonly created things (in our case []byte buffers) we don't force the garbage collector to keep reaping memory that we then go back and ask for.
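A minimal sketch of what such a recycler could look like, assuming fixed-size buffers and a buffered channel used as a free list. The names `bufferPool`, `get` and `put` are hypothetical and are not taken from Railgun:

```go
package main

import "fmt"

// bufferPool is a hypothetical recycler for fixed-size []byte buffers.
// Idle buffers are parked in a buffered channel instead of being left
// for the garbage collector to reap and the allocator to hand out again.
type bufferPool struct {
	free chan []byte
	size int
}

func newBufferPool(capacity, size int) *bufferPool {
	return &bufferPool{free: make(chan []byte, capacity), size: size}
}

// get returns a recycled buffer if one is parked, otherwise allocates.
// Recycled buffers are NOT zeroed; callers must overwrite them.
func (p *bufferPool) get() []byte {
	select {
	case b := <-p.free:
		return b[:p.size]
	default:
		return make([]byte, p.size)
	}
}

// put hands a buffer back. If the pool is already full the buffer is
// simply dropped and the garbage collector reclaims it as usual.
func (p *bufferPool) put(b []byte) {
	if cap(b) < p.size {
		return // too small to be reused by get
	}
	select {
	case p.free <- b:
	default:
	}
}

func main() {
	pool := newBufferPool(4, 1024)
	a := pool.get()
	pool.put(a)
	b := pool.get()
	// b reuses a's backing array rather than triggering a new allocation.
	fmt.Println(len(b), &a[0] == &b[0])
}
```

The channel bounds how much idle memory is retained; anything beyond `capacity` falls back to normal garbage collection.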

The concurrency through communication is trivial to work with once you get the hang of it and being able to write completely sequential code means that it's easy to grok what your own program is doing.

We've hit some deficiencies in the standard library (around HTTP handling) but it's been fairly smooth. And, as the article says, we swapped native crypto for OpenSSL for speed.

The Go tool chain is very nice. Stuff like go fmt, go tool pprof, go vet, go build make working with it smooth.

PS We're hiring.

gatherknwldg 13 years ago | | |

"write a memory recycler"

Sigh. This is by far Go's biggest wart IMO, and one that frequently sends me back to a pauseless (hah! at least less pausy :) systems language. I sure do like it in almost every other meaningful regard. But I wish latency wasn't something the designers punted on.

rjknight 13 years ago | | |

Go does look awesome. I've spent some time with Erlang, Clojure and Scala (roughly in the order that I liked them most), but Go passed the "get started writing useful code quickly" test better than any of them. Haven't gone beyond the basics yet, but I think it might occupy a sweet spot of ease of use combined with "power", loosely defined.

shanelja 13 years ago | | |

http://www.jobscore.com/jobs/cloudflare/technical-customer-s...

Not going to lie - I'm heavily considering taking this as an entry level position to get my foot in the door.

laumars 13 years ago | |

The title only suggests something unique about Go to those who didn't read the article.

There's a good chunk of that article dedicated to discussing the language choice and how other languages could have been used instead but -in this specific instance- weren't chosen. The language choice is as much a part of the topic as the compression routines themselves. So it makes a lot of sense to include the term 'Go' in the title given that's a large focus of the article.

It's really no different to all these articles that spring up about fancy demos being built in Javascript or CSS tricks. Yet in those instances nobody says "the title is misleading. You could write that demo in C++ as well."

nickolai 13 years ago | | |

>The title only suggests something unique about Go to those who didn't read the article.

The thing is, many people use the title to determine whether the article is worth reading. As is, the title suggests that there is something unique about Go that reduces the bandwidth needed by the program, implying that this is something that other common languages fail to achieve. This is obviously impossible (any widely used language is capable of serializing an output byte stream in any way the programmer desires). As a result, the title sets off the alarm for "Language fanboyism", and "mathematically impossible claims", and goes swiftly into the "don't bother" pile together with "universal lossless compression algorithm invented!"[1], "perpetual motion machine" and "My favourite X language is faster than C/C++/Assembler!1!1"

[1]http://en.wikipedia.org/wiki/Pigeonhole_principle
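For reference, the counting argument behind the pigeonhole link can be written out in one line. This is a sketch of the standard argument, not something stated in the comment itself:

```latex
\[
\#\{\text{bit strings of length } n\} = 2^n,
\qquad
\#\{\text{bit strings of length} < n\} = \sum_{k=0}^{n-1} 2^k = 2^n - 1 .
\]
```

Since $2^n > 2^n - 1$, any scheme that maps every length-$n$ input to a strictly shorter output must send two different inputs to the same output, so it cannot be decoded losslessly.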

vanderZwan 13 years ago | |

I like Go, but that bothered me too.

From my personal point of view I'm happy they used it though, because it means more people tinkering with and improving the language.

shanelja 13 years ago | | |

I've never used Go professionally and most of my spare time is split between C++ and Scheme at the moment, but when I did go spelunking with Go, I found it a breeze to write complicated functionality in it - it felt like C++, but easier and more initially powerful.

I still feel that C++ is generally a better choice, but if I only had a short time to write something in, I would definitely go for Go.

calinet6 13 years ago |

Go is used, sure, but the cool part about this is the binary Railgun protocol. Really smart. Send only file hashes and binary diffs back and forth, do a little extra computation to figure out the changes, but only send the absolute minimum data you need to the CDN. That's just smart, and frankly, I hope other CDNs have been doing this already, because at any high volume it seems to be an obvious solution.

So that brings up the question—is this just something CloudFlare is announcing for the PR, or is it actually innovative?

0x0 13 years ago | |

It sounds like they reinvented rsync to me?

jgrahamc 13 years ago | | |

No, because we have more information than rsync does. We own both ends of the connection and can keep versions synchronized.

dubya 13 years ago | | |

I think it's more like rsync + git. You have copies of previous versions, and just ask for their hash to figure out which previous version to diff against the current version, then send the diff.

judofyr 13 years ago | | |

Not sure why you're downvoted. This is basically what rsync is: a smart algorithm for doing rolling checksums and only sending diffs.
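For readers who haven't seen a rolling checksum, here is a minimal sketch in the style of rsync's weak checksum. The type and method names (`weakSum`, `roll`, `sum`) are made up for illustration, and real rsync pairs this with a strong hash to confirm matches:

```go
package main

import "fmt"

// weakSum is an rsync-style rolling checksum over a fixed-size window.
// a is the sum of the bytes; b weights each byte by its distance from
// the end of the window. Both wrap modulo 2^16 via uint16 arithmetic.
type weakSum struct {
	a, b uint16
	n    uint16 // window length
}

// newWeakSum computes the checksum of window from scratch.
func newWeakSum(window []byte) weakSum {
	w := weakSum{n: uint16(len(window))}
	for i, x := range window {
		w.a += uint16(x)
		w.b += uint16(len(window)-i) * uint16(x)
	}
	return w
}

// roll slides the window by one byte in constant time:
// out leaves at the front, in enters at the back.
func (w *weakSum) roll(out, in byte) {
	w.a = w.a - uint16(out) + uint16(in)
	w.b = w.b - w.n*uint16(out) + w.a
}

// sum packs both halves into a single 32-bit value.
func (w weakSum) sum() uint32 { return uint32(w.a) | uint32(w.b)<<16 }

func main() {
	data := []byte("the quick brown fox jumps over the lazy dog")
	const n = 8
	w := newWeakSum(data[:n])
	for i := 1; i+n <= len(data); i++ {
		w.roll(data[i-1], data[i+n-1])
		// The rolled value must equal a checksum computed from scratch.
		if w.sum() != newWeakSum(data[i:i+n]).sum() {
			panic("rolling checksum diverged")
		}
	}
	fmt.Printf("final window checksum: %08x\n", w.sum())
}
```

The constant-time `roll` is what makes it cheap to test every byte offset of a file for a block the other side already has.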

coldtea 13 years ago | | |

Because every diff algorithm "reinvents rsync", right?

bitcartel 13 years ago |

The bandwidth reduction is due to use of a binary protocol, not Go. It just so happens the server code is written in Go and C.

From the article:

“Go is very light,” he said, “and it has fundamental support for concurrent programming. And it’s surprisingly stable for a young language. The experience has been extremely good—there have been no problems with deadlocks or pointer exceptions.” But the code hit a bit of a performance bottleneck under CloudFlare’s heavy loads, particularly because of its cryptographic modules—all of Railgun’s traffic is encrypted from end to end. “We swapped some things out into C just from a performance perspective," Graham-Cumming said.

“We want Go to be as fast as C for these things,” he explained, and in the long term he believes Go’s cryptographic modules will mature and get better. But in the meantime, “we swapped out Go’s native crypto for OpenSSL,” he said, using assembly language versions of the C libraries.

shanelja 13 years ago | |

On another note, it's always nice to see such an influential part of the HN community giving quotes for sites like this - not only does it make me a little proud to be associated with any of you, it makes me more hopeful for the chances of my future that I can call myself one of us.

lclarkmichalek 13 years ago | | |

Success by association seems about as valid as guilt by association.

jgrahamc 13 years ago | |

The binary protocol means we don't add (much) overhead, the bandwidth reduction is because we are sending page diffs which themselves are encoded in a compact binary format.

sigil 13 years ago |

Question for jgrahamc: how much more efficient is your binary delta algorithm than cperciva's bsdiff [1]?

I assume since you've got the preimages of compression, as well as control over the compression format, that the diff and patch operations are much more efficient in space and time than they would be with arbitrary binary data. But...by how much?

[1] http://www.daemonology.net/bsdiff/

jakubw 13 years ago | |

bsdiff is not a general purpose binary delta algorithm, it's targeted at executables. When you change a single line in the source code of a program and recompile it, bsdiff produces a small diff, even though a normal binary diff between the old and new executable would be huge due to how even a single extra instruction can cause many more addresses to shift. bsdiff wouldn't be particularly useful here.

sigil 13 years ago | | |

This is true. Re-reading the bsdiff paper, it's pretty tailored to executable file formats.

http://www.daemonology.net/papers/bsdiff.pdf

j_s 13 years ago | |

I am particularly interested in this aspect of the discussion (explaining the process leading to deciding to develop a new tech in-house instead of re-using any existing approach). In an ideal world there would be plenty of experimentation with real-world data to justify things, but I don't read about that happening too often.

sigil 13 years ago | | |

Agree with you there on "profile first." But knowing jgrahamc, he did -- and I'd love to know the results.

silvertonia 13 years ago |

Could be very cool. I couldn't get through the article because it read like a press release. Maybe if someone who hasn't been spoon-fed the story reports on it, I'll take notice.

peterwwillis 13 years ago | |

I don't know why you're being downvoted, the article is written pretty shittily. The article is mostly just quotes from jgc and the CEO and some filler by the writer.

Also the assertion that "It has already cut the bandwidth used by 4Chan and Imgur by half" sounds disingenuous and possibly not backed up by moot's quote “We've seen a ~50% reduction in backend transfer for our HTML pages (transfer between our servers and CloudFlare's),”. Is backend transfer for HTML pages the only bandwidth they're using? Is the rest of it halved, and if so, how and why?

The title of the story also makes me gag.

justinsb 13 years ago |

I think this is just RFC 3229, with a binary protocol (?) http://www.ietf.org/rfc/rfc3229.txt

I've always thought there were some potential attacks there around cache disclosure (which Google avoided by going with SDCH instead).

CloudFlare controls the server and the client, so they don't need to worry about the attacks or about persuading everyone to adopt their RFC.

glymor 13 years ago |

How large is the per site cache? Are cookies part of the hash (and if so how do you strip meaningless cookies)?

Otherwise this is more compelling for content sites like the referenced 4chan. But still very cool.

justinsb 13 years ago | |

Presuming this is RFC 3229, this is transport compression, not webserver offload.

The response is generated by the origin webserver as normal. But rather than sending that response using the normal HTTP encoding, instead the proxy first does a binary diff against any versions that the (CloudFlare) client says it has and that the (CloudFlare) proxy also has in its cache. They use e.g. ETags or MD5 to uniquely identify the entire response content.

You can still do cookie stripping etc to try to avoid the request to the webserver altogether, but that's a separate concern.

jgrahamc 13 years ago | |

There isn't a per-site cache in Railgun because it's part of our large shared in-memory cache in our infrastructure.

Currently, cookies are not part of the hash.

We have customers of all types using Railgun. As an example, there's a British luggage manufacturer who launched a US e-commerce site last month. They are using it to help alleviate the cross-Atlantic latency. At the same time they see high compression levels as the site boilerplate does not change from person to person viewing the site.

What sort of sites do you think it doesn't apply to?

glymor 13 years ago | | |

> What sort of sites do you think it doesn't apply to?

Single page webapps. In those cases the html/js is normally static and already CDN'ed and the data is a JSON API which varies on a per user basis.

There would be some gain as the dictionary would learn the JSON keys but I doubt it would be very dramatic vs deflate compared to the content sites referenced in the article.

justinsb 13 years ago | | |

Surely there is a per-site cache on the origin server (in what you call the "Listener")?

songgao 13 years ago |

I'm curious about the crypto part. Could anybody explain to me, if it's an HTTPS link, where does SSL encryption happen? Does the Railgun listener talk with the origin server over HTTP or HTTPS?

If it's HTTP, then how does the CDN handle certificates? Does it use the CDN's certificates?

If it's HTTPS, then 1) Isn't the hash gonna be a lot different even if the two versions are very alike? 2) Why does Railgun encrypt the encrypted data again?

jgrahamc 13 years ago | |

The link between CloudFlare and the customer network (i.e. between the two bits of Railgun) is TLS. We have an automated way of provisioning and distributing the certificates necessary for that part.

For the connection from Railgun to the origin server it will depend on the protocol of the actual request being handled. If it is HTTPS, Railgun makes an HTTPS connection to the origin.

songgao 13 years ago | | |

Thanks! That makes sense now :-)

xanadohnt 13 years ago |

The change detection algorithm is clever. But this is a classic memory vs. processor problem. The real trick here is that the Railgun service instantly adds massive amounts of cache to your service; it just so happens - if their claims aren't inflated - adding these additional resources to your service is transparent. This has nothing to do with Railgun being developed on Go.

tuxidomasx 13 years ago |

Other than general traffic data compression, I've always been somewhat interested in html compression in particular.

I know lots of webservers zip their response data, but I was always curious about the things in html that show up very often and if there's a way to optimize around that.

For example, most web xml data contains a lot of common tags, like "div" and "span" and others that are specific to html. I think if you add them up, they might make up a considerable percent of traffic data. Is it possible for the web server to swap those out for a single character before it sends the data, and have the browser replace it when it arrives?

Or does zip compression already do that somehow?

wisty 13 years ago | |

Yeees, no.

Zip will replace the common tags (like "div") with a single "div" (in the compression dictionary), then a single character every time it appears (more or less - it might be less than a single byte if it's a really common tag). So there'll be a wasted overhead of a dictionary of common tags (which is kind of wasted).

It would be more efficient if both the browsers and compression algorithms could agree (beforehand) on a dictionary of common terms which would be likely to appear in the document.

If you're compressing a lot of data which is likely to be similar, you can do this with a common dictionary. See - http://stackoverflow.com/questions/479218/how-to-compress-sm...

Of course, my answer on Stackoverflow is pretty crude. You could create a dictionary used to compress the compression dictionary. Google is probably going to do this any time soon (if they haven't already) since they control the client (Chrome), server (google web server) and protocol (SPDY).

shotgun 13 years ago |

I see that the article is tagged "open source." Is CloudFlare going to open source Railgun? Publish any papers?

This isn't an announcement about companies supporting Railgun...it's about companies supporting CloudFlare by installing the Railgun Listener.

cobrabyte 13 years ago |

This is the third time this week that I've read or heard about Communicating Sequential Processes (CSP), the formal programming language devised by Sir Tony Hoare.

Third time's a charm. Definitely going to have to investigate.

coolj 13 years ago |

> Today, [cloud providers Amazon Web Services and Rackspace, and thirty of the world’s biggest Web hosting companies] announced that they will support Railgun...

I can't find any such announcements; anybody have links? Based on comments further down, I wonder if the author is confused.

> CloudFlare will provide software images for Amazon and RackSpace customers to install

That is very different from the claim in the first paragraph.

eastdakota 13 years ago | |

Amazon and Rackspace customers need to install the software themselves (for now). The other listed hosts have made it one-click simple without the customer having to install anything. A couple announcements from major hosts today:

Dreamhost: http://dreamhost.com/dreamscape/2013/02/26/cloudflare-railgu...

Media Temple: http://weblog.mediatemple.net/2013/02/26/the-web-just-got-fa...

pjmlp 13 years ago |

Another Go PR story.

Same thing could be easily achieved using futures or any of the asynchronous libraries available to C++, Ada, JVM and .NET languages.

DoubleCluster 13 years ago |

This is WAN optimization, right? This is already being done but usually for (VPN) connections to other branches of a company.

peterwwillis 13 years ago | |

No. This is basically binary diffing and compression.

Edit: err, you are correct, I didn't realize WAN optimization included binary diffing and compression. Should google before I comment.

zobzu 13 years ago |

Uho. Binary protocol. The problem being, it's actually bringing financial advantages over HTTP. HTTP has the advantage of being standard, simple, plain text and thus easy to work with.

Hopefully http2.0 will attempt solving this.. erm...

radd9er 13 years ago | |

is your concern that the proprietary protocols will take over the web?

zobzu 13 years ago | | |

not in particular. complex binary protocols while slightly more efficient are much harder to use, understand, and design properly.

philiac 13 years ago |

The article mentions how this compression technique is similar to image compression. Would anyone care to explain, in detail if necessary, how this is so? Thanks.

radd9er 13 years ago | |

I think it's because a whole bitmap isn't streamed for every new frame, just a diff telling the player about the parts of the map that need updating.

jamieb 13 years ago |

FTA: "If it was written in C++, it would be threaded code"

Uh, why?

wmf 13 years ago | |

Because many people find threads easier to understand than callbacks?

jussij 13 years ago | |

Because that’s one approach to getting the most out of all of those multiple core CPU servers.

For Go that came for free because its Communicating Sequential Processes design does that for you.

abraininavat 13 years ago | | |

Came for free? Go takes advantage of multiple cores by using threads. CSP doesn't magically multiplex your code onto your cores.
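To make the distinction concrete: the channel-based code expresses the concurrency, while the Go runtime decides how many OS threads actually run goroutines at once (bounded by `GOMAXPROCS`, which defaulted to 1 in early Go releases and defaults to the CPU count since Go 1.5). A small sketch, with `sumSquares` being a made-up example function:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sumSquares fans work out to `workers` goroutines over channels and
// collects the results. Each worker body is plain sequential code.
func sumSquares(nums []int, workers int) int {
	jobs := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				results <- n * n
			}
		}()
	}

	// Feed the jobs, then signal that no more are coming.
	go func() {
		for _, n := range nums {
			jobs <- n
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	// GOMAXPROCS(0) reports the current setting without changing it.
	workers := runtime.GOMAXPROCS(0)
	fmt.Println("CPUs:", runtime.NumCPU(), "GOMAXPROCS:", workers)
	fmt.Println(sumSquares([]int{1, 2, 3, 4}, workers))
}
```

Whether the goroutines run in parallel on several cores or are interleaved on one is up to the scheduler and `GOMAXPROCS`; the program's result is the same either way.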

corresation 13 years ago |

I was just looking into what SDCH is (an accept-encoding option from Chrome) and it sounds very, very similar: It generates a dictionary and then uses VCDIFF between requests. Is this related somehow?

jgrahamc 13 years ago | |

Vaguely. Both Railgun and SDCH work by compressing web pages against an external dictionary. In SDCH the dictionary must be generated (somehow), and it is intended for use between a web server and browser. Railgun is back-end for our network and automatically generates dictionaries.

http://calendar.perfplanet.com/2012/efficiently-compressing-...

jws 13 years ago | | |

Is anyone aware of a performance analysis between SDCH and one of the dynamic compressions like deflate?

I google, but all I find is people complaining their proxy/filter/appliance/diagnostic is breaking because it doesn't understand SDCH.

It seems like SDCH has been around for 4 years, I presume the lack of data means it hasn't worked out.

(I imagine that you could drastically reduce the CPU load of compression by making simple hard coded state machines for each dictionary. For content like XML or json you could easily make your field names and surrounding punctuation minimal. For many very short messages sharing a dictionary that would beat deflate on compression ratio, and for long messages of non-repeating field values it wouldn't be much worse. CPU use of expansion is probably comparable, though you might get better memory access behavior out of SDCH.)

corresation 13 years ago | | |

That is superbly illuminating. Thank you.

songgao 13 years ago | |

Railgun is used between CDN and http server, while this one seems to be between browser and http server.

Railgun only requires the website to deploy a client and cooperate with CloudFlare. The user's client doesn't have to be Chrome or whatever; the web server doesn't need to be aware of Railgun. It's transparent to both HTTP clients and HTTP servers. SDCH, however, requires a modification to the HTTP/1.1 protocol, which implies changes in both HTTP clients and HTTP servers.

Both are quite promising, though Railgun is easier to adopt.

newman314 13 years ago | | |

Until apache or nginx implements it, I wouldn't really see SDCH as gaining any real traction.

AFAIK, Chrome is the only one supporting it from a browser standpoint.