Ruby on Rails load testing habits (rorvswild.com)
Also don't forget that you are load testing all the dependencies of your service. Database, caching tier, external services, etc. Make sure other teams are aware!
Also, nothing beats real-world traffic. Because of limited bandwidth, users' connections stay open longer than a synthetic tool would hold them, and users make very random, sporadic requests too. Your service will behave very differently under large amounts of real-world traffic than under synthetic traffic.
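One way to get a synthetic test a little closer to that sporadic pattern is an open workload model: new requests start at random, independent moments regardless of how quickly earlier ones finish, instead of a fixed pool of virtual users that each wait for their previous response. A minimal sketch, assuming Poisson arrivals (exponentially distributed gaps) purely as a modeling convenience; it does not simulate slow client bandwidth, and real traffic is usually burstier than this:

```python
import random


def poisson_arrivals(rate_per_s: float, duration_s: float, seed: int = 0) -> list[float]:
    """Start times (in seconds) for an open workload model.

    Gaps between arrivals are exponentially distributed, so requests begin at
    random, independent moments instead of in lockstep with earlier responses.
    """
    rng = random.Random(seed)  # seeded so a test run is reproducible
    t = 0.0
    starts = []
    while True:
        t += rng.expovariate(rate_per_s)  # mean gap = 1 / rate_per_s
        if t >= duration_s:
            return starts
        starts.append(t)


if __name__ == "__main__":
    starts = poisson_arrivals(rate_per_s=50, duration_s=60)
    print(f"{len(starts)} requests scheduled (about {50 * 60} expected)")
```

Each timestamp would then be handed to whatever actually fires the request. The key design choice is that a slow response never delays the next arrival, which is closer to how independent users behave.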
Another option, if you are running multiple web servers, is to shift traffic around so that one host receives more of it, and see where it fails. That is usually a very reliable signal for peak load.
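On a weighted load balancer, shifting traffic onto one host usually means raising that host's weight step by step. Assuming the balancer splits requests in proportion to weight and the other hosts stay at a weight of 1, the weight w needed for one host to get a share f of the traffic across n hosts follows from f = w / (w + (n - 1)). A small sketch of that arithmetic (the function name is illustrative):

```python
def weight_for_share(target_share: float, n_hosts: int, base_weight: float = 1.0) -> float:
    """Weight for one host so it receives `target_share` of all requests.

    Assumes the balancer splits traffic in proportion to weight and the other
    n_hosts - 1 hosts keep `base_weight` each:
        share = w / (w + others)  =>  w = share * others / (1 - share)
    """
    if n_hosts < 2:
        raise ValueError("need at least two hosts to shift traffic between")
    if not 0 < target_share < 1:
        raise ValueError("target_share must be strictly between 0 and 1")
    others = (n_hosts - 1) * base_weight
    return target_share * others / (1 - target_share)


if __name__ == "__main__":
    # Step one of four hosts up from its fair 25% share.
    for share in (0.25, 0.40, 0.50, 0.60):
        print(f"{share:.0%} of traffic -> weight {weight_for_share(share, 4):.2f}")
```

For four hosts, a 40% share needs a weight of 2.0 and a 60% share needs 4.5. Stepping through values like these while watching latency and error rates on that one host shows roughly where it tips over.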
And don't forget to do this on a schedule, as your codebase (and your dependencies' codebases) change!
Guilty. I've had one of our partners call me one time because I'd caused a huge load on their end. Lots of apologies and embarrassment followed.
Later I mocked out those external calls with stubs which behaved similarly, in that I could specify the min/max/average wait times and error rates.
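A stub like that can be quite small. The sketch below is a hypothetical Python class, not the commenter's code: latency is drawn from a triangular distribution bounded by the minimum and maximum, with its peak placed so the mean lands on the requested average (a triangular distribution's mean is (min + max + mode) / 3), and a configurable fraction of calls fail. A triangular shape can only reach averages away from the extremes; a long tail with a low average would call for a different distribution, such as log-normal.

```python
import random
import time


class FlakyStub:
    """Hypothetical stand-in for an external dependency during a load test.

    Each call sleeps for a latency drawn from a triangular distribution on
    [min_s, max_s], then fails with probability `error_rate`.
    """

    def __init__(self, min_s, max_s, mean_s, error_rate, seed=None):
        # The mean of a triangular distribution is (min + max + mode) / 3,
        # so pick the mode that makes the mean come out to mean_s.
        mode = 3 * mean_s - min_s - max_s
        if not min_s <= mode <= max_s:
            raise ValueError("mean_s is not reachable for this min_s/max_s")
        self.min_s, self.max_s, self.mode = min_s, max_s, mode
        self.error_rate = error_rate
        self.rng = random.Random(seed)

    def call(self, payload):
        time.sleep(self.rng.triangular(self.min_s, self.max_s, self.mode))
        if self.rng.random() < self.error_rate:
            raise ConnectionError("stubbed upstream failure")
        return {"ok": True, "echo": payload}


if __name__ == "__main__":
    # 20-400 ms, averaging 160 ms, with 2% of calls failing.
    stub = FlakyStub(0.020, 0.400, 0.160, error_rate=0.02, seed=42)
    for i in range(5):
        try:
            print(stub.call({"n": i}))
        except ConnectionError as exc:
            print("error:", exc)
```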
wrk does this with Lua: https://github.com/wg/wrk/blob/master/src/wrk.lua
Also, even things like the venerable JMeter supported pulling parameters from a CSV file.
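In JMeter this is the CSV Data Set Config element. The same idea takes only a few lines of Python, sketched here: read the rows once, then cycle through them so each request gets different parameters. The column names and the `/search` URL are made-up examples.

```python
import csv
import io
import itertools
from urllib.parse import urlencode


def request_params(csv_text):
    """Yield one dict of parameters per request, cycling through the CSV rows
    forever so a long test keeps varying its inputs."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV has no data rows")
    return itertools.cycle(rows)


SAMPLE = """user_id,query
17,red shoes
42,guitar strings
99,load testing
"""

if __name__ == "__main__":
    params = request_params(SAMPLE)
    for _ in range(4):
        row = next(params)
        # e.g. /search?q=red+shoes&user=17
        print("/search?" + urlencode({"q": row["query"], "user": row["user_id"]}))
```

With a real file you would read it via `open(path, newline="")` instead of an inline string.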
Once you've got that out of the way, don't forget that you'll want a distribution story. It does not matter how efficient your tool might be on a single machine - you'll want to distribute your tests across multiple clients for real-world testing.
"Sure it's easy," you might say, "I know UNIX. Give me pssh and a few VMs on EC2." Well, now you've got 2 problems: aggregating metrics from multiple hosts and merging them accurately (especially those pesky percentiles; your tool IS reporting percentiles rather than averages already, right?!), and a developer experience problem: no one wants to wrangle infra just to run a load test, so how are you going to make it easier?
And, this developer experience problem is much bigger than just sorting out infra... you'll probably want to send the metrics produced by your tool to external observability systems. So now you've got some plugins to write (along with a plugin API). The list goes on.
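On merging percentiles: averaging each host's p99 does not give the fleet-wide p99. One common approach, sketched here under the assumption that every load generator uses the same bucket boundaries, is to have each host ship a latency histogram, sum the bucket counts, and read percentiles off the merged histogram. The bucket bounds and sample data below are illustrative; libraries such as HdrHistogram apply the same idea with much finer resolution.

```python
from bisect import bisect_left

# Upper bounds (ms) shared by every load generator; identical boundaries are
# what make the bucket counts summable. The values are illustrative.
BOUNDS_MS = [5, 10, 25, 50, 100, 250, 500, 1000, 2500, float("inf")]


def to_histogram(samples_ms):
    """Count samples per bucket (a sample equal to a bound lands in that bucket)."""
    counts = [0] * len(BOUNDS_MS)
    for s in samples_ms:
        counts[bisect_left(BOUNDS_MS, s)] += 1
    return counts


def merge(histograms):
    """Combine per-host histograms by summing matching buckets."""
    return [sum(bucket) for bucket in zip(*histograms)]


def percentile(counts, q):
    """Upper bound of the bucket that contains the q-th percentile."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    rank = q * total / 100
    seen = 0
    for bound, c in zip(BOUNDS_MS, counts):
        seen += c
        if seen >= rank:
            return bound
    return BOUNDS_MS[-1]


if __name__ == "__main__":
    host_a = to_histogram([8] * 9850 + [900] * 50)  # mostly fast
    host_b = to_histogram([2000] * 100)             # overloaded
    avg_of_p99s = (percentile(host_a, 99) + percentile(host_b, 99)) / 2
    merged_p99 = percentile(merge([host_a, host_b]), 99)
    print(f"average of per-host p99s: {avg_of_p99s} ms")  # 1255.0 ms
    print(f"p99 of merged histogram:  {merged_p99} ms")   # 1000 ms
```

The averaged figure matches neither host nor the combined traffic. Reported values are only as precise as the bucket bounds, which is why real tools use many more buckets.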
I'm very biased, but it's 2024. Don't waste your time and just use https://www.artillery.io/
1. https://www.artillery.io/blog/load-testing-workload-models
Shameless self-promotion, but I wrote up a bunch of these issues in a post describing all the mistakes I have made, so you can learn from them: https://shane.ai/posts/load-testing-tips/
If your desired load for testing is small, it's not a big deal, of course.
Ruby with browser:
https://browserup.com/docs/en/load/ruby-load-test.html
Command-line installation:
We're currently trying to have each Rails model implement a `#new_example` method that builds a valid subgraph filled in by Faker, ready to save. I.e., `user = User.new_example` will come with a `Company.new_example` if every user needs a company relationship. It's still early; we'll see how it goes.
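For readers outside Rails, the same pattern sketched in Python: each model gets a `new_example` class method that builds a valid object, including any required associations. This is a hypothetical illustration with made-up `User`/`Company` classes, and it uses simple stdlib randomness in place of Faker.

```python
import itertools
import random
from dataclasses import dataclass

_serial = itertools.count(1)  # keeps generated emails unique within a run


@dataclass
class Company:
    name: str

    @classmethod
    def new_example(cls, rng=random, **overrides):
        attrs = {"name": f"Example Co {rng.randint(1000, 9999)}"}
        attrs.update(overrides)
        return cls(**attrs)


@dataclass
class User:
    email: str
    company: Company  # required association, so an example must bring one

    @classmethod
    def new_example(cls, rng=random, **overrides):
        attrs = {"email": f"user{next(_serial)}@example.test"}
        attrs.update(overrides)  # let a test pin the fields it cares about
        if "company" not in attrs:
            attrs["company"] = Company.new_example(rng)
        return cls(**attrs)


if __name__ == "__main__":
    user = User.new_example()
    print(user)
```

`User.new_example()` returns a user with a fresh company attached, while `User.new_example(company=acme)` lets a test reuse a company it already has.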
We generate data based off of your database schema and your production data (if you give us access).
Since you've kinda already built something like this I would be curious to hear what you think!
Talking about efficiency being a priority, but using RoR. I guess that is one way of saturating the CPU.
There are definitely times when it isn't true. But if you're not doing a load test because it's a PITA, do it locally. Most of the time I've wanted to do this, all the action is inside the app. Just be careful to acknowledge that there could be limitations and surprises.
Most bottlenecks are either database choices or poor code/design choices by developers. That is especially true today.
A coworker made similar claims to me about Laravel, but the framework really encourages you to do half a dozen database queries in even a pretty minimal request, and for example implemented bulk inserts as a for loop that did single inserts. If you didn't know better with an access pattern like that, you might think the database is the bottleneck long before it actually should be. Is Rails different? My sense was they are very similar.
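The difference between a loop of single inserts and a real bulk insert is the number of statements (and, against a networked database, round trips). A minimal sketch with Python's built-in `sqlite3` and a made-up `items` table; an in-memory SQLite database hides most of the round-trip cost, so treat this as showing the shape of the fix rather than the speedup.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT, qty INTEGER)")
    return conn


def insert_one_by_one(conn, rows):
    # The for-loop style: one INSERT statement per row.
    for name, qty in rows:
        conn.execute("INSERT INTO items (name, qty) VALUES (?, ?)", (name, qty))
    conn.commit()


def insert_bulk(conn, rows, chunk=400):
    # One multi-row INSERT per chunk. 400 rows x 2 parameters stays under
    # the 999-variable limit that older SQLite builds use by default.
    for start in range(0, len(rows), chunk):
        batch = rows[start:start + chunk]
        values = ", ".join(["(?, ?)"] * len(batch))
        params = [value for row in batch for value in row]
        conn.execute(f"INSERT INTO items (name, qty) VALUES {values}", params)
    conn.commit()


if __name__ == "__main__":
    rows = [(f"item{i}", i) for i in range(1000)]
    conn = make_db()
    insert_bulk(conn, rows)  # 3 INSERT statements instead of 1000
    print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 1000
```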
That said, in my experience, CPU is often the ultimate bottleneck with PHP, Ruby, Python, and... like everything. Over the years serializers have often been a pain point: XML in PHP and RoR, and Rails "serializers" currently. Any sort of mapping or hydration (which is a LOT of what happens in web apps) is comparatively slow, often by an order of magnitude or more, compared with something like Node.js, C#, Go, etc.
> Most bottlenecks are either that database choices or poor code/design choices by developers
Perhaps in sheer quantity, but with experience those are often low-hanging fruit. After those are addressed, you are left with the pain of the language and framework inefficiencies.
Rails has really poor startup time due to loading all codepaths. We switched to Django and it runs beautifully on AWS Lambda, to the point where our CI costs more than our actual servers. We're a B2B application, so traffic is quite low and we REALLY don't saturate the CPU in a normal Fargate setup.
Our ~500k-line app takes multiple seconds to start, which is why I'm not really investigating a Lambda-style setup... Do you have specific strategies to make startup fast?
> Finally for full scale high fidelity load tests there are relatively few tools out there for browser based load testing.
It has existed for a few months now, and it's fully open source: https://github.com/artilleryio/artillery (I'm the lead dev). You write a Playwright script, then run it in your own AWS account on serverless Fargate and scale it out horizontally as you see fit. Artillery takes care of spinning all of the infra up and down. It will also automatically grab and report Core Web Vitals for you from all those browser sessions, and we just released support for tracing, so you can dig into the details of each session if you want to (it's OpenTelemetry-based, so it works with most vendors: Datadog APM, New Relic, etc.).
Your app is significantly bigger than ours, so grain of salt.
We pay very close attention to what's loaded on startup. There are two key tricks.
1. Heavy libraries/packages are imported at call time, inside the function that needs them, and only on the "background job" codepath:

    def my_heavy_func():
        from heavy_library import sum_heavy_function
        sum_heavy_function()

vs. the import at the top of the file.

2. Limit which apps are loaded via `INSTALLED_APPS`, again with no heavy packages.
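A function-level import defers the cost until the function runs. Python's standard library also offers `importlib.util.LazyLoader`, which keeps a module-level binding but runs the module's code on first attribute access. The sketch below adapts the recipe from the `importlib` documentation; `colorsys` is only a stand-in for a heavy library, and this is an alternative technique rather than what the commenter described.

```python
import importlib.util
import sys


def lazy_import(name):
    """Bind a module now but run its code on first attribute access.

    Adapted from the LazyLoader recipe in the importlib documentation.
    """
    if name in sys.modules:  # already imported elsewhere: nothing to defer
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module


# colorsys stands in for a heavy library. Unless something imported it
# earlier, its module body has not run yet at this point.
colorsys = lazy_import("colorsys")


def my_heavy_func():
    # First attribute access triggers the real import.
    return colorsys.rgb_to_hsv(1.0, 0.0, 0.0)


if __name__ == "__main__":
    print(my_heavy_func())  # (0.0, 1.0, 1.0)
```

To find out which imports actually dominate startup, `python -X importtime` prints a per-module timing breakdown to stderr.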
Lambda is SUPER nice for us. The bottleneck becomes the DB. The web server can basically never go down on its own, since by default you can scale out to 1,000 concurrent executions.
Best of Luck!
But yes, the pattern is essentially the same, just with our example methods and Faker, without Factory Bot.
Not saying everyone has nefarious reasons for doing it, but, it's just... everywhere.
I also play guitar, and there is a popular store in Europe with a pretty dang popular YouTube channel that I sometimes watch when the topic seems interesting. There was a whole kerfuffle a few months ago because one of the brands getting a lot of airtime on the channel turned out to be financially backed by the owner of the store, who is also a host of the channel. It took a ton of research by another YouTuber to uncover this, and after it came out, the owner finally disclosed his relationship with the brand he was promoting.
I feel like that was my most eye-opening moment: tons of people out here, on all sorts of services, are recommending their products without clearly disclosing their relationship.
Now, you are saying so in your profile, but how many people are going to click into your profile?
I'm not saying you _have_ to do this, just suspecting that more and more people are giving every recommendation the side-eye these days because of a lack of disclosure. Disclosure isn't a bad thing; it just puts the bias in the open, and people can weigh the recommendation more easily with that bias in mind.
None of this is probably new to you, but I'm trying to add something to the conversation rather than just call someone out, which is the easy and far more violent thing to do.
Rails has some tooling to help with query bloat: https://guides.rubyonrails.org/active_record_querying.html#e...
That said, it specifically has tools to address this that started appearing a few years ago: https://github.com/rails/rails/pull/35077
The way my team handles it is to put Kafka between what's generating the records (for us, a bunch of web-scraping workers) and a consumer that pulls off the Kafka queue and runs an insert when its internal buffer reaches around 50k rows.
Rails is also looking to add some more direct background-type work with https://github.com/basecamp/solid_queue but this is still very new; most larger Rails shops are going to be running a second system and a gem called Sidekiq that pulls jobs out of Redis.
In terms of read queries, again I think that comes down to the individual team realizing (hopefully very early in their careers) that it's something that needs to be considered. Rails isn't going to stop you from making N+1-type mistakes or hammering your DB with 30 separate queries for each page load. But it has plenty of tools and documentation on how to do that better.
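To make the N+1 shape concrete, here is a small Python/`sqlite3` sketch with made-up `authors`/`posts` tables: the first version issues one extra query per post, while the second loads every needed author in a single `IN (...)` query, similar in spirit to what ActiveRecord's `preload` does. `CountingConn` is a hypothetical helper that just counts statements.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO posts VALUES (1, 1, 'A'), (2, 2, 'B'), (3, 1, 'C');
    """)
    return conn


class CountingConn:
    """Hypothetical wrapper that counts statements, like a crude query log."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def titles_n_plus_one(db):
    posts = db.execute("SELECT title, author_id FROM posts ORDER BY id").fetchall()
    result = []
    for title, author_id in posts:  # one extra query per post
        row = db.execute("SELECT name FROM authors WHERE id = ?", (author_id,)).fetchone()
        result.append((title, row[0]))
    return result


def titles_preloaded(db):
    posts = db.execute("SELECT title, author_id FROM posts ORDER BY id").fetchall()
    if not posts:
        return []
    ids = sorted({author_id for _, author_id in posts})
    marks = ", ".join(["?"] * len(ids))
    names = dict(db.execute(f"SELECT id, name FROM authors WHERE id IN ({marks})", ids).fetchall())
    return [(title, names[author_id]) for title, author_id in posts]


if __name__ == "__main__":
    for fn in (titles_n_plus_one, titles_preloaded):
        db = CountingConn(make_db())
        print(fn.__name__, fn(db), db.queries, "queries")
```

With three posts the first version runs 4 queries and the second runs 2; with 30 posts on a page it would be 31 versus 2.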
And, like, Mastodon apparently uses a queue to do things like send new user registration emails. Why not just send the email from the new user request handler? Then if there's an error, you can tell the user in the response instead of saying "okay you should get an email" and then having it go into the ether. I was under the impression this had something to do with not wanting to tie up the HTTP worker because you want it to quickly get back to doing HTTP requests, but if it can concurrently process requests, there's no issue.
Similarly they have an ingest queue for other federated servers sending them updates. But if things are fast, why wouldn't they just process the updates in the HTTP handler? You don't need a reliable queue because if e.g. you crash, the other side will not get their HTTP response, and they'll know to retry.
The new queue you linked is database-backed, but the whole point is that you want to just run a job without needing to serialize anything outside of your process. It should just schedule it onto the thread pool and give you a promise for when it's done.
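What this comment describes maps onto an ordinary in-process thread pool. A minimal Python sketch, where `send_welcome_email` and the pool size are hypothetical: `submit` returns a `concurrent.futures.Future` right away, and whoever needs the outcome can wait on it.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# One shared pool per process; four workers is an arbitrary choice here.
background = ThreadPoolExecutor(max_workers=4)


def send_welcome_email(address):
    # Hypothetical stand-in for a slow SMTP or HTTP API call.
    time.sleep(0.05)
    return f"sent to {address}"


def handle_signup(address):
    # Schedule the work in-process: no serialization, no queue table.
    # submit() returns a Future immediately; the handler can wait on it to
    # report errors in the response, or reply right away instead.
    return background.submit(send_welcome_email, address)


if __name__ == "__main__":
    future = handle_signup("new.user@example.test")
    print(future.result(timeout=5))  # sent to new.user@example.test
    background.shutdown()
```

The trade-off is durability: work still queued in the pool is lost if the process crashes or is redeployed, which is a large part of what a database- or Redis-backed queue buys you.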
The Kafka thing also seems to be an example of what I mean: in Scala I'd just make a `new Queue` with a thread-safe library, and have a worker pull off and do an insert every hundred rows or so, or after e.g. 5 ms have passed, whichever is first. No extra infrastructure needed, minimal RAM used, your queueing delay is in the single-digit ms, and you get the scaling benefits. Takes maybe 10-20 lines of code.
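A sketch of that idea in Python rather than Scala: a thread-safe `queue.Queue` plus one worker thread that flushes whenever it has 100 rows or 5 ms have passed since the first row of the current batch, whichever comes first. `Batcher` and its `flush` callback are hypothetical names; `flush` would be something like a multi-row INSERT.

```python
import queue
import threading
import time

_STOP = object()  # sentinel telling the worker to flush and exit


class Batcher:
    """Buffer rows from any thread; flush up to `max_rows` at a time, or once
    `max_wait_s` has passed since the first row of the batch arrived."""

    def __init__(self, flush, max_rows=100, max_wait_s=0.005):
        self.flush = flush  # e.g. a function doing one multi-row INSERT
        self.max_rows = max_rows
        self.max_wait_s = max_wait_s
        self.q = queue.Queue()  # thread-safe, unbounded
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def add(self, row):
        self.q.put(row)

    def close(self):
        """Flush whatever is buffered and stop the worker."""
        self.q.put(_STOP)
        self.worker.join()

    def _run(self):
        while True:
            first = self.q.get()  # sleep until there is work
            if first is _STOP:
                return
            batch = [first]
            deadline = time.monotonic() + self.max_wait_s
            stopping = False
            while len(batch) < self.max_rows:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    item = self.q.get(timeout=remaining)
                except queue.Empty:
                    break
                if item is _STOP:
                    stopping = True
                    break
                batch.append(item)
            self.flush(batch)
            if stopping:
                return


if __name__ == "__main__":
    flushed = []
    batcher = Batcher(flushed.append)
    for i in range(250):
        batcher.add(i)
    batcher.close()
    print([len(b) for b in flushed])  # batch sizes vary with timing
```

Callers only ever add one row at a time, so the batching stays hidden behind a single-item interface. A production version would also need error handling around `flush`, since an exception there stops the worker.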
You can then take that and abstract it into a repository pattern so that you could have an ORM that does batching for you with single item interfaces (for non-transactional workflows), but none of them seem to do this.
And again, maybe I'm just not understanding, but I really like having our background processes handled completely separately from our main web application. Maybe it's just the peace of mind of knowing that I can scale them independently of each other.