One Million Concurrent TCP connections (blog.whatsapp.com)
If it was possible in 2002-2004, I am not impressed that it is still possible in 2011.
One of the optimizations was to reduce the per-connection TCP buffers (net.ipv4.tcp_{mem,rmem,wmem}) to only allocate one physical memory page (4kB) per client, so that one million concurrent TCP connections would only need 4GB RAM. His machine had barely more than 4GB RAM (6 or 8? can't remember), which was a lot of RAM at the time.
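The arithmetic behind that figure, as a quick sketch. The function name is made up for illustration, and it counts only one 4 kB page per connection, ignoring every other kernel and application cost:

```python
PAGE_SIZE = 4096  # bytes; one 4 kB physical memory page, as described above


def buffer_memory_bytes(connections, pages_per_connection=1):
    """Memory needed if each connection holds a fixed number of pages."""
    return connections * pages_per_connection * PAGE_SIZE


total = buffer_memory_bytes(1_000_000)
# Show the result in both decimal gigabytes and binary gibibytes.
print(f"{total / 10**9:.2f} GB ({total / 2**30:.2f} GiB)")
# prints "4.10 GB (3.81 GiB)"
```

So "4GB" holds as a round number, but only if nothing else is charged per connection.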
I cannot find a link to my story though...
http://www.erlang-consulting.com/thesis/tcp_optimisation/tcp...
http://www.trapexit.org/Building_a_Non-blocking_TCP_server_u...
http://groups.google.com/group/erlang-programming/browse_thr...
(This Erlang mailing list thread is pretty typical: if you put up code and describe your app, hardware, network, database/external dependencies, etc., you'll get a ton of good advice about killing off bottlenecks.) Another example:
http://groups.google.com/group/erlang-programming/browse_frm...
Running netstat|grep like this on a high concurrency server takes a long time to run. I've never found a faster way to get real-time stats on our busy servers and would be interested if anyone else has.
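One possibly cheaper option on Linux, if totals are enough: /proc/net/sockstat holds summary counters, so reading it does not walk every socket. It will not give the per-port or per-state breakdown a grep can. A minimal sketch follows; the function names and the sample numbers are invented for illustration, and only the file layout comes from Linux:

```python
def parse_sockstat(text):
    """Parse /proc/net/sockstat-style text into {protocol: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        proto, _, rest = line.partition(":")
        fields = rest.split()
        # Fields alternate name, value: "inuse 5 orphan 0 tw 0 ..."
        stats[proto.strip()] = {
            name: int(value) for name, value in zip(fields[::2], fields[1::2])
        }
    return stats


def read_sockstat(path="/proc/net/sockstat"):
    """Read the live counters (Linux only)."""
    with open(path) as f:
        return parse_sockstat(f.read())


# Sample text in the same layout; the numbers are made up.
SAMPLE = """sockets: used 1000123
TCP: inuse 1000042 orphan 3 tw 17 alloc 1000050 mem 250000
UDP: inuse 4 mem 1
"""

tcp = parse_sockstat(SAMPLE)["TCP"]
print(tcp["inuse"], tcp["tw"])
```

On a real box, read_sockstat()["TCP"]["inuse"] would give the count of TCP sockets in use.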
I hope they publish how they did it - in fact let me drop them an email and see if I can convince them to do so.
FWIW, Yahoo still uses FreeBSD extensively.
http://www.metabrew.com/article/a-million-user-comet-applica...
Part 3 is my favorite. http://www.metabrew.com/article/a-million-user-comet-applica...
http://urbanairship.com/blog/2010/09/29/linux-kernel-tuning-...
The main advantage Erlang has over C/Python/Ruby/etc. is that asynchronous IO is the default throughout all its libraries, and it has a novel technique for handling errors. Its asynchronous design is ultimately about fault tolerance, not raw speed. Also, it can automatically and intelligently handle a lot of asynchronous control flow that node.js makes you manage by hand (which is so 70s!).
You can make event-driven asynchronous systems pretty smoothly in languages with first class coroutines/continuations (like Lua and Scheme), but most libraries aren't written with that use case in mind. Erlang's pervasive immutability also makes actual parallelism easier.
With that many connections, another big issue is space usage -- keeping buffers, object overhead, etc. low per connection. Some languages fare far, far better than others here.
It was several years ago, but I've done my share of high-concurrency stuff under Linux and the highest I got to was about 200K connections - at which point the single-threaded server bottlenecked at its disk I/O.
The main issue is not the actual connection count, it's what the per-socket OS overhead is (so as not to exhaust non-swappable kernel memory), how many sockets are concurrently active (have inbound or outbound data queued), and whether the application can handle all the events that epoll/kqueue report. This is not rocket science by any means, and the kernel is relatively easy to fine-tune even when the actual load is present.
Using conntrack on a 1MC system will waste even more kernel memory!
From the cited article on ports of the libev event loop library: "The whole thing is a bug if you ask me - basically any system interface you touch is broken, whether it is locales, poll, kqueue or even the OpenGL drivers." with no particular details on what is broken in Mac OS X.
That article also discusses issues with porting to AIX, Solaris and Windows, and reports errors on those platforms too, again with no specific details.
Without error details, there is no way to tell whether alternatives, workarounds or fixes might exist, or whether bug reports and reproducers were logged that would allow the vendors to address the (unspecified) errors.
The rightscale folks found the same thing:
http://blog.rightscale.com/2010/04/01/benchmarking-load-bala...
There is no /proc/net/tcp on FreeBSD. Hell, there is no /proc unless it is specifically mounted by the administrator, but tools certainly don't use it to get data.
Alternatively one can read RFC793 or, better, Stevens' "TCP/IP Illustrated".
But as awesome as kqueue is, OS X apparently broke it: http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#OS_X_AN...
That is more akin to the Linux AIO system, then? Otherwise, epoll/poll/select just notify the user when data is available, but the actual copy is done by the user. Surprisingly, this can make a huge difference when streaming large amounts of data.
http://blog.lighttpd.net/articles/2006/11/12/lighty-1-5-0-an...
We have argued here before and I have gotten downvoted into oblivion for being pedantic and distinguishing between asynchronous IO and non-blocking IO but it looks like that extra user-space memcpy can make a huge difference.
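To make the distinction concrete, here is a minimal sketch of the non-blocking, readiness-based model. A socketpair stands in for a real client and server, and the variable names are made up:

```python
import selectors
import socket

# A connected pair of sockets stands in for a client and a server.
a, b = socket.socketpair()
b.setblocking(False)

sel = selectors.DefaultSelector()  # picks epoll on Linux, kqueue on BSD/OS X
sel.register(b, selectors.EVENT_READ)

a.sendall(b"hello")

# Step 1: the kernel only reports that b is readable; no data has moved yet.
events = sel.select(timeout=1.0)
ready = [key.fileobj for key, _ in events]

# Step 2: the application itself copies the bytes out of the kernel buffer.
data = b.recv(4096) if b in ready else b""
print(data)

sel.unregister(b)
sel.close()
a.close()
b.close()
```

With true asynchronous IO (POSIX AIO, Windows completion ports), the application hands the kernel a buffer up front and is notified once it has been filled, so step 2 happens outside the event loop.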
The accept filter concept was pioneered by David Filo at Yahoo! and refined to be a loadable module system by Alfred Perlstein.
The closest you can get by using kqueue is to set a low water mark, so that a read event is only returned when there's enough data ready.
Here's a more interesting bug in OSX: kqueue will sometimes return the wrong number for the listen backlog for a socket under high load.
Anyway, the idea was to avoid context switches by waiting/parsing kernel-side until there was enough data for the client to do something other than just another gimme_more_data() call back to the kernel.
It could even be applied to methods other than kqueue, so perhaps I misremember that this was just for kqueues.