Reining in the thundering herd: Getting to 80% CPU utilization with Django (blog.clubhouse.com)

160 points by domino 4 years ago | 135 comments

Tangent, but I always had a different understanding of the “thundering herd” problem; that is, if a service is down for whatever reason, and it’s brought back online, it immediately grinds to a halt again because there are a bazillion requests waiting to be handled.

And the solution to this problem is to bring the service back online slowly, with rate limiting, rather than letting the whole thundering herd go through the door immediately.

toast0 4 years ago | |

That's really not the traditional meaning of thundering herd, which is about waking up all the processes when a connection comes in, then they all try to accept it and it's a lot of work for nothing. You get much better results if only a single process is woken up for each event.

Your problem is a real problem though. Where I worked, we would call that backlog, and we would manage it with 'floodgates' ... When the system is broken, close the gates, and you need to open them slowly.

In an ideal world, your system would self-regulate from dead to live, shedding load as necessary, but always making headway. But sometimes a little help is needed to avoid the feedback loop of timed out client requests that still get processed on the server keeping the server in overload.
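To make the "open them slowly" part concrete, here is a minimal sketch of a ramped admission limit. The function name, the linear ramp, and the numbers are assumptions for illustration, not how any particular system implements its floodgates.

```python
def allowed_rate(seconds_since_open, full_rate, ramp_seconds):
    """Requests/sec to admit while the gates are being opened.

    Hypothetical helper: ramps linearly from 0 up to full_rate over
    ramp_seconds, then stays at full_rate.
    """
    if seconds_since_open <= 0:
        return 0.0  # gates still closed
    fraction = min(1.0, seconds_since_open / ramp_seconds)
    return full_rate * fraction


if __name__ == "__main__":
    # Open the gates over one minute, up to 1000 requests/sec.
    for t in (0, 15, 30, 60, 120):
        print(t, allowed_rate(t, full_rate=1000, ramp_seconds=60))
```

Anything above the current allowed rate would be rejected or queued at the edge, so the recovering backend only ever sees a load it can make headway on.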

Ozzie_osman 4 years ago | |

Yea you are right. It could be a service being down and requests piling up, or a cache key expiring and many processes trying to regenerate the value at the same time, etc.

I think the article just used this phrase to describe something else. (Great article otherwise).
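For the cache-expiry flavor, the usual fix is to let only one caller regenerate a missing value while the others wait for it. A minimal in-process sketch, assuming threads and a hypothetical `SingleFlightCache` class (no TTL or eviction, which a real cache would need):

```python
import threading


class SingleFlightCache:
    """Hypothetical cache: at most one caller computes a missing key."""

    def __init__(self, compute):
        self._compute = compute          # function key -> value (may be slow)
        self._values = {}
        self._locks = {}
        self._guard = threading.Lock()   # protects the per-key lock table

    def get(self, key):
        if key in self._values:          # fast path: value already cached
            return self._values[key]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have filled it while we waited.
            if key not in self._values:
                self._values[key] = self._compute(key)
            return self._values[key]
```

Across several processes or machines the same idea needs a shared lock (for example in the cache server itself), which this sketch does not attempt.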

fanf2 4 years ago | | |

There is an explanation of this kind of thundering herd about 3/4 down this article https://httpd.apache.org/docs/trunk/misc/perf-scaling.html

The short version is that when you have multiple processes waiting on listening sockets and a connection arrives, they all get woken up and scheduled to run, but only one will pick up the connection, and the rest have to go back to sleep. These futile wakeups can be a huge waste of CPU, so on systems without accept() scalability fixes, or with more tricky server configurations, the web server puts a lock around accept() to ensure only one process is woken up at a time.

The term (and the fix) dates back to the performance improvement work on Apache 1.3 in the mid-1990s.
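A rough sketch of the lock-around-accept() idea. It uses Python threads as a stand-in for the worker processes (Apache used a cross-process mutex), and every name in it is made up for illustration:

```python
import socket
import threading


def worker(listener, accept_lock, stop, handled, worker_id):
    """Worker loop: only the current lock holder waits in accept()."""
    while not stop.is_set():
        with accept_lock:
            if stop.is_set():
                return
            try:
                conn, _ = listener.accept()
            except socket.timeout:
                continue  # nothing arrived; release the lock and retry
        # The lock is released here, so another worker can accept
        # while this one handles the connection.
        with conn:
            conn.sendall(b"ok")
        handled.append(worker_id)


def run_demo(n_workers=4, n_clients=5):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(16)
    listener.settimeout(0.2)          # lets workers notice the stop flag
    port = listener.getsockname()[1]

    accept_lock = threading.Lock()
    stop = threading.Event()
    handled = []
    threads = [
        threading.Thread(target=worker,
                         args=(listener, accept_lock, stop, handled, i))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for _ in range(n_clients):
        with socket.create_connection(("127.0.0.1", port), timeout=5) as c:
            c.recv(16)
    stop.set()
    for t in threads:
        t.join()
    listener.close()
    return handled
```

Each connection is picked up by exactly one worker, and the others stay parked on the lock instead of all being woken by the socket.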

taylorhughes 4 years ago | | |

Phrase borrowed from excellent uWSGI docs https://uwsgi-docs.readthedocs.io/en/latest/articles/Seriali...

lookACamel 4 years ago | |

That's not the thundering herd. If someone rings the door (request), only one person (agent, process) needs to answer the door. But what might happen is that everyone in the house rushes to answer the door. The people "thundering" to the door (and making a mess as they do so) are the "herd". This can quickly become a problem if there are a lot of people in the house and the doorbell keeps ringing.

thaumasiotes 4 years ago | |

> but I always had a different understanding of the “thundering herd” problem; that is, if a service is down for whatever reason, and it’s brought back online, it immediately grinds to a halt again because there are a bazillion requests waiting to be handled.

That... doesn't have much to do with the thundering herd problem. It also doesn't make much sense as a concept on its own merits -- say you come in to work and your inbox is full enough for three inboxes. Does that fact, in itself, mean that you decide you're done for the day? No, it just means you have a much longer queue to work through than usual.

The thundering herd problem refers to what happens when (1) a bunch of agents come to you for something while you're busy; (2) you tell them all "I'm busy, go away and come back later"; and (3) the come-back-later time you give to each of them is identical, so they all come back simultaneously.

And that's exactly what's happening here, except that instead of giving each worker thread a come-back-later time when it asks for work, you're receiving work, sending out individual messages to every worker saying "hey, I'm not busy anymore, come back RIGHT NOW and get some more work", and then rejecting all but one of the thundering herd that shows up. The reason the Gunicorn docs and the uWSGI docs both refer to this as a "thundering herd" problem is that it's a near-perfect match for the problem prototype. The only difference is that, instead of giving out identical come-back-later times to worker threads as they ask you for work, you tell them to wait for a notification that includes a come-back-later time, and then when you get one piece of work you fire off that notification separately to every sleeping thread, including identical come-back-later times in each one.
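The usual remedy for identical come-back-later times is to randomize them. A minimal sketch, assuming exponential backoff with "full jitter"; the function name and the defaults are hypothetical:

```python
import random


def retry_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling grows exponentially up to `cap`; the actual delay is
    drawn uniformly below it so that clients do not return in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


if __name__ == "__main__":
    # Ten clients told to retry after the same failure get different delays.
    print([round(retry_delay(3), 2) for _ in range(10)])
```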

toast0 4 years ago | | |

> That... doesn't have much to do with the thundering herd problem. It also doesn't make much sense as a concept on its own merits -- say you come in to work and your inbox is full enough for three inboxes. Does that fact, in itself, mean that you decide you're done for the day? No, it just means you have a much longer queue to work through than usual.

If my SLA is a 24-hour response time, the inbox is FIFO, and I can't drop old messages, I'm most likely not hitting the SLA. If they all came in overnight, I'll hit the SLA for day 1, but I will be busy all of day 2 and 3 and never respond on time. If, after day 1, I get a day's worth of messages every day, I'll never catch up.
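The arithmetic behind that can be sketched in a few lines. This is a toy model that measures everything in "days' worth of messages" and assumes a fixed daily capacity; the function name is made up:

```python
def backlog_by_day(initial_backlog, arrivals_per_day, capacity_per_day, days):
    """Backlog remaining at the end of each day for a FIFO inbox."""
    backlog = initial_backlog
    history = []
    for _ in range(days):
        # New messages arrive, then a day's capacity is worked off.
        backlog = max(0, backlog + arrivals_per_day - capacity_per_day)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Three days of mail arrives overnight, nothing new afterwards.
    print(backlog_by_day(3, arrivals_per_day=0, capacity_per_day=1, days=4))
    # Same start, but a normal day's worth keeps arriving every day.
    print(backlog_by_day(3, arrivals_per_day=1, capacity_per_day=1, days=4))
```

With arrivals equal to capacity the backlog never shrinks, so every reply stays late unless some load is shed.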

c_o_n_v_e_x 4 years ago | |

This reminds me of inrush current when starting large motors... You get a huge current spike when you initially turn on the motor, so large that it can trip the breaker.

One solution is to use a soft starter, which slowly brings the motor up to speed.

luhn 4 years ago |

Unfortunately HAProxy doesn't buffer requests*, which is necessary for a production deployment of gunicorn. And for anybody using AWS, ALB doesn't buffer requests either. Because of this I'm actually running both HAProxy and nginx in front of my gunicorn instances—nginx in front for request buffering and HAProxy behind that for queuing.

If anybody is interested, I've packaged both as Docker containers:

HAProxy queuing/load shedding: https://hub.docker.com/r/luhn/spillway

nginx request buffering: https://hub.docker.com/r/luhn/gunicorn-proxy

* It does have an http-buffer-request option, but this only buffers the first 8kB (?) of the request.

Twirrim 4 years ago | |

Couldn't Apache httpd just do all of that for you? mod_buffer provides request buffering, and mod_proxy_balancer provides load balancing capabilities.

luhn 4 years ago | | |

Can Apache do request queuing?

jhgg 4 years ago |

This is somewhat suspect. At my place of work, we operate a rather large Python API deployment (over an order of magnitude more QPS than the OP's post). However, our setup is... pretty simple. We only run nginx + gunicorn (gevent reactor), 1 master process + 1 worker per vCPU. In front of that we have an envoy load-balancing tier that does p2c backend selection to each node. I actually think the nginx is pointless now that we're using envoy, so that'll probably go away soon.

Works amazingly well! We run our python API tier at 80% target CPU utilization.
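For anyone unfamiliar with p2c ("power of two choices"): pick two backends at random and send the request to the less loaded one. A minimal sketch of the selection step, assuming load is tracked as in-flight request counts; the function and variable names are hypothetical and this is not Envoy's code:

```python
import random


def pick_backend_p2c(in_flight, rng=random):
    """Choose a backend using power-of-two-choices.

    `in_flight` maps backend name -> current number of in-flight requests.
    Two distinct backends are sampled and the less loaded one wins.
    """
    a, b = rng.sample(list(in_flight), 2)
    return a if in_flight[a] <= in_flight[b] else b


if __name__ == "__main__":
    load = {"node-a": 5, "node-b": 1, "node-c": 9}
    print(pick_backend_p2c(load))
```

The most loaded backend can never win a comparison, which avoids piling onto a hot node without needing a global view of every backend.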

kvalekseev 4 years ago |

HAProxy is a beautiful tool, but it doesn't buffer requests. That is why NGINX is recommended in front of gunicorn; otherwise it's susceptible to slowloris attacks. So either Clubhouse can be easily DDoS'd right now, or they have some tricky setup that prevents slow POST requests from reaching gunicorn. The blog post doesn't mention that problem, yet it recommends that others try replacing NGINX with HAProxy.

lddemi 4 years ago | |

1. HAProxy does support request buffering https://cbonte.github.io/haproxy-dconv/2.2/configuration.htm...

2. our load balancer buffers requests as well

kvalekseev 4 years ago | | |

From the HAProxy mailing list, about the http-buffer-request option https://www.mail-archive.com/haproxy@formilux.org/msg23074.h...

> In fact, with some app-servers (e.g. most Ruby/Rack servers, most Python servers, ...) the recommended setup is to put a fully buffering webserver in front. Due to it's design, HAProxy can not fill this role in all cases with arbitrarily large requests.

A year ago I was evaluating a recent version of HAProxy as a buffering web server and successfully ran a slowloris attack against it. Thus switching from NGINX is not a straightforward operation, and your blog post should mention the http-buffer-request option and the slow client problem.

TekMol 4 years ago |

Performance is the only thing holding me back from considering Python for bigger web applications.

Of the 3 main languages for web dev these days - Python, PHP and Javascript - I like Python the most. But it is scary how slow the default runtime, CPython, is. Compared to PHP and Javascript, it crawls like a snake.

PyPy could be a solution as it seems to be about 6x faster on average.

Is anybody here using PyPy for Django?

Did Clubhouse document somewhere if they are using CPython or PyPy?

petargyurov 4 years ago |

> Which exacerbated another problem: uWSGI is so confusing. It’s amazing software, no doubt, but it ships with dozens and dozens of options you can tweak.

I am glad I am not the only one. I've had so many issues with setting up sockets, both with gevent and uWSGI, only to be left even more confused after reading the documentation.

j4mie 4 years ago |

If you’re delegating your load balancing to something else further up the stack and would prefer a simpler WSGI server than Gunicorn, Waitress is worth a look: https://github.com/pylons/waitress

tbrock 4 years ago |

Aside: AWS only allows registering 1000 targets in a target group… I wonder if that's the limit they hit. If so, it's documented.

tarasglek 4 years ago |

Have to wonder how well HAProxy works vs balancing by making gunicorn listen via SO_REUSEPORT and letting the kernel balance instead (à la https://talawah.io/blog/extreme-http-performance-tuning-one-...)
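For reference, this is roughly what SO_REUSEPORT looks like at the socket level: several listeners bind the same address and port, and the kernel spreads incoming connections across them. A minimal sketch for Linux (the constant is not available on every platform); the helper name is made up. Gunicorn has a `reuse_port` setting that, as far as I know, sets this flag, but check the docs for your version.

```python
import socket


def reuseport_listener(host, port, backlog=128):
    """Create a listening TCP socket with SO_REUSEPORT enabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); every socket sharing the port needs it.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((host, port))
    s.listen(backlog)
    return s


if __name__ == "__main__":
    first = reuseport_listener("127.0.0.1", 0)     # OS picks a free port
    port = first.getsockname()[1]
    second = reuseport_listener("127.0.0.1", port)  # same port, no error
    print("both listening on", port)
    first.close()
    second.close()
```

In a real deployment each worker process would open its own listener this way instead of inheriting one shared socket.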

JanMa 4 years ago |

Interesting to read that they are using Unix sockets to send traffic to their backend processes. I know that it's easily done when using HAProxy, but I have never read about people using it. I guess the fact that they are not using Docker or another container runtime makes sockets rather simple to use.

kvalekseev 4 years ago | |

It's the standard way to connect things in UNIX and provides better performance. For example, PostgreSQL over TCP+SSL is 175% slower than over a socket https://momjian.us/main/blogs/pgblog/2012.html#June_6_2012
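In case it helps anyone who hasn't used them: a Unix domain socket is addressed by a filesystem path instead of host:port, and otherwise behaves like any stream socket. A tiny self-contained sketch (the path, function name, and the uppercase "protocol" are made up for the demo):

```python
import os
import socket
import tempfile
import threading


def unix_roundtrip(payload: bytes) -> bytes:
    """Send payload over a Unix domain socket and return the reply."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.sock")   # hypothetical socket path
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)

        def handle_one():
            # Toy backend: reply with the request in upper case.
            conn, _ = server.accept()
            with conn:
                conn.sendall(conn.recv(1024).upper())

        t = threading.Thread(target=handle_one)
        t.start()
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(path)
            client.sendall(payload)
            reply = client.recv(1024)
        t.join()
        server.close()
        return reply


if __name__ == "__main__":
    print(unix_roundtrip(b"ping"))
```

A proxy on the same host would be pointed at the socket path in the same way the client connects here.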

lttlrck 4 years ago | | |

But domain sockets only work between processes on the same machine, why would SSL be used in that case?

mst 4 years ago | |

I do that every chance I can get.

At a guess, it's probably most loved by people picking old school simple architectures that aren't the sort of thing that goes viral.

ram_rar 4 years ago |

> Python's model of running N separate processes for your app is not as unreasonable as people might have you believe! You can achieve reasonable results this way, with a little digging.

I have been through this journey; we eventually migrated to Golang, and it saved a ton of money and firefighting time. Unfortunately, the Python community hasn't been able to remove the GIL. It has its benefits (especially for single-threaded programs), but I believe the cost (a lack of concurrency abstractions; async/await doesn't cut it) far outweighs them.

Apart from what the article mentions, other low-hanging fruit worth exploring:

[1] Moving under PyPy (this should give some perf for free)

[2] Bifurcate metadata and streaming if not already. All the django CRUD stuff could be one service, but the actual streaming should be separated to another service altogether.

jstrong 4 years ago | |

I read the article and could not believe that was their takeaway. sometimes people are determined to vindicate their technology choices, no matter what.

stu2010 4 years ago |

Interesting to see this. It sounds like they're not on AWS, given that they mentioned that having 1000 instances for their production environment made them one of the bigger deployments on their hosting provider.

If not for the troubles they experienced with their hosting provider and managing deployments / cutting over traffic, it possibly could have been the cheaper option to just keep horizontally scaling vs putting in the time to investigate these issues. I'd also love to see some actual latency graphs, what's the P90 like at 25% CPU usage with a simple Gunicorn / gevent setup?

ksec 4 years ago | |

I was wondering that too, but there aren't that many common cloud providers that have a 96 vCPU offering.

I am also wondering about 144 workers on 96 vCPUs, which is not 96 CPU cores but 96 CPU threads. So effectively 144 workers on 48 CPU cores, possibly running at a sub-3 GHz clock speed. But it seems they got it to work out in the end (maybe at the expense of latency).

mst 4 years ago | | |

Assuming you're running a system where normal request/response handling blocks on database queries it's often optimal to have more workers than available cpu threads and 1.5x is a common rule of thumb to try first.
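As arithmetic, that rule of thumb is where a figure like 144 workers on 96 vCPUs comes from. A trivial sketch; the helper name is made up and the 1.5 factor is only a starting point to tune from:

```python
import math


def suggested_workers(cpu_threads, factor=1.5):
    """Starting worker count for I/O-blocking request handlers."""
    return math.ceil(cpu_threads * factor)


if __name__ == "__main__":
    print(suggested_workers(96))  # 96 * 1.5 = 144
```

The extra workers only help while requests spend real time blocked on the database; for CPU-bound handlers they mostly add context switching.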

dilyevsky 4 years ago |

Kinda funny they decided paying a ton of money to aws was ok but paying for nginx plus was not

Spivak 4 years ago | |

I kinda get that honestly. It’s why I’ll spend $20 without even thinking for take out but not spend $2 for an app. It’s because the cost of the software is way way more than the money. It’s a commitment to actually use it and integrate it, deal with their sales team, talk to purchasing, and handle licensing, and it introduces friction to replacing it or using tools that don’t integrate well because “well we already pay for it.” Licensing also complicates deployments substantially when you’re doing lots of autoscaling.

And on top of that Nginx Plus is also expensive as hell.

rowanG077 4 years ago | | |

The buy-in to AWS is much, much larger than using a piece of software though.

dilyevsky 4 years ago | | |

Don’t you have to integrate cloud? This whole post is about having to put in a bunch of workarounds bc the cloud can’t scale apparently

ClumsyPilot 4 years ago | | |

"It’s why I’ll spend $20 without even thinking for take out but not spend $2 for an app."

I pay for apps; it's not a healthy attitude.

spullara 4 years ago | |

The difference people see, as far as I can tell, is that AWS is charging you cost+ and pure software companies need to charge for value or die.

dilyevsky 4 years ago | | |

Maybe for barebones compute they’re cost+ but I don’t think that’s really true for other services. For example traffic should cost effectively zero to them but they charge a huge premium. Some other managed services also appear to use value based pricing

andrenotgiant 4 years ago | |

ClubHouse runs on AWS?

dilyevsky 4 years ago | | |

Hm, actually it might be Google, based on where their traffic is going (I only looked just now). OK, now it makes more sense why support wasn’t able to figure this out =)

vvatsa 4 years ago |

ya, I pretty much agree with 3 suggestions at the end:

* use uWSGI (read the docs, so many options...)

* use HAProxy, so very very good

* scale python apps by using processes.

latchkey 4 years ago |

If it is just a backend, why not port it over to one of the myriad of cloud autoscaling solutions that are out there?

The opportunity cost of spending time figuring out why only 29 workers are receiving requests over adding new features that generate more revenue, seems like a quick decision.

Personally, I just start off with that now in the first place, the development load isn't any greater and the solutions that are out there are quite good.

lddemi 4 years ago | |

Author here. We do and did use autoscaling heavily, but at a certain scale we just ran out of headroom on the smaller instance types we were using. Jumping to much larger instance types means we will likely never run into those headroom issues again. It also solves other problems: faster spin-up, better sidecar connection pooling, and a much higher hit rate on per-instance caching.

TekMol 4 years ago | | |

Did you consider switching from CPython to PyPy?

latchkey 4 years ago | | |

You were autoscaling a single threaded process. You had 1000 connections coming in and scaling 1000 workers for those connections. Everything was filtered through gunicorn and nginx, which just adds additional latencies and complexity, for no real benefit.

What I'm talking about is just pointing at something like AppEngine, Cloud Functions, etc... (or whatever solution AWS has that is similar) and being done with it. I'm talking about not running your own infrastructure, at all. Let AWS and Google be your devops so that you can focus on building features.

trinovantes 4 years ago |

I've always used nginx for my servers. Is HAProxy that much better to consider learning/switching?

lmilcin 4 years ago |

1M requests per minute on 1000 web instances is not an achievement, it is a disaster.

It is ridiculous people brag about it.

Guys, if you have budget maybe I can help you up this by a couple of orders of magnitude.
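Taking the figures in that claim at face value, the per-instance arithmetic works out as follows (a back-of-the-envelope sketch with a made-up function name; it says nothing about how heavy each request is):

```python
def requests_per_second_per_instance(requests_per_minute, instances):
    """Average load one instance sees, assuming an even spread."""
    return requests_per_minute / 60 / instances


if __name__ == "__main__":
    rate = requests_per_second_per_instance(1_000_000, 1000)
    print(round(rate, 1))  # about 16.7 requests/sec per instance
```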

sdze 4 years ago |

use PHP ;)

nsizx 4 years ago | |

So much this. Practically any other option is better than Python for web development if you're looking for performance.

void_mint 4 years ago | | |

By this logic, why not Java, C++, Rust, Go, C#?

They’re all web-capable and blow the doors off PHP, Python, etc.

waprin 4 years ago | | |

Yet YouTube, Instagram, Pinterest, Reddit, Robinhood, DoorDash, and Lyft backend were originally primarily written in Python. What’s funny is that nobody can really deny Python is slow yet somehow the biggest websites in the world were written in it. More proof that Worse Is Better?

sdze 4 years ago | | |

It blows my mind how quickly PHP7.4 processes even shitty code.

catillac 4 years ago |

Famous last words, but I get the sense that the need to handle this sort of load on Clubhouse is plateauing and will decline from here. The app seems to have shed all the people that drew other people initially and lost its small, intimate feel and has turned into either crowded rooms where no one can say anything, or hyper specific rooms where no one has anything to say.

Good article though! I’ve dealt with these exact issues and they can be very frustrating.

hnarayanan 4 years ago | |

This

polote 4 years ago |

I wouldn't be very proud of writing an article like that.

Usually engineering blogs exist to show that there is fun stuff to do at a company. But here it just seems they have no idea what they are doing. Which is fine; I'd put myself in the same category.

Reading the article I don't feel like they have solved their issue, they just created more future problems