Fair: A Go library for serving resources fairly(github.com) |
Fair: A Go library for serving resources fairly(github.com) |
Constant memory, but those hashes will take up CPU cycles. If you're running a workload that completes sub 20 milliseconds, these cycles spent hashing may not be worth it over, say, a constant-time admission control like token bucket.
I tried writing another algorithm for network splitting but didn't get any better results. [1]: https://www.csg.uzh.ch/publications/details.php?id=1007
> have they claimed that?
The mention of "token bucket" in the project readme is why I wrote the comment I did.
... FAIR [only throttles] when there's a genuine shortage of resources as opposed to the approaches like token bucket or leaky bucket which may reject requests ...As usual, tradeoffs are everywhere.
I am very intrigued to find out how this would fit in, if at all.
See, e.g., https://docs.aws.amazon.com/wellarchitected/latest/framework...
This project, however, looks like a concurrency limiter, not a rate limiter. I'm also not sure how it works across a load-balanced cluster.
Unfortunately the title of the GitHub repo ("A Go library for serving resources fairly") is misleading. This is not a server; it's a library that a server can utilize to determine whether a request has exceeded fairness bounds and should be rejected with an HTTP 429 (too many requests) response.
[1]: https://en.m.wikipedia.org/wiki/BEAM_(Erlang_virtual_machine...
That said, it’s quite easy for a big job to exceed 50x the cost of the smallest job.
This whole problem gets way more complicated than our intuition generally can work with. The pathological distribution of the size of various workloads and the pathological distribution of the variety of resources that tasks can consume is not modeled well by our human brains, who really want to work with tasks that are essentially uniform. But they never are. A lot of systems end up punting, either to the OS which has to deal with this anyhow, or to letting programs do their own cooperative internal scheduling, which is what this library implements. In general "but what if I 'just'" solutions to this problem have undesirable pathological edge cases that seem like they "ought" to work, especially at the full generality of an operation system. See also the surprisingly difficult task of OOM-killing the "correct" process; the "well obviously you 'just'" algorithms don't work in the real world, for a very similar reason.
As computers have gotten larger, the pathological distributions have gotten worse. To be honest, if you're thinking of using "fair" it is likely you're better off working on the ability to scale resources instead. There's a niche for this sort of library, but it is constantly shrinking relative to the totality of computing tasks we want to perform (even though it is growing in absolute terms).
I do not think languages/runtimes typically implement that kind of prioritization/limiting mechanism.
My previous (very long) project was in such a state when I got there that it was only in the last year I was there that measuring things in microseconds was something I could get on other people's radars. I wish I had started sooner because I found a lot of microseconds in about a four month period. That was the most intensive period of user-visible latency reduction I saw the entire time it was there and second through fourth place took years of manpower to accomplish. Past me and future me are both still mad about that.
This isn't as accurate, but it's often adequate. I have never liked making a network request to see if I can make a network request; too slow. (I do use this architecture to rate-limit my own website, mostly because I wanted to play with writing a service to do that. But I'd hesitate to make someone else use that.)
In terms of preserving state, the answer for rate limiting is that it is almost always far, far less dangerous to fail open than it is to deny requests during a failure. If you really, really, wanted to preserve state (something I'd suggest avoiding for a rate limiter), several KVS's have optional persistence you can turn on, for example, Redis' AOF.
The end services themselves should be designed with some sort of pushback mechanism, so they shouldn't be in any danger of overloading, regardless of what's going on with the rate limiter.
If you wanted to throw another layer of load balancer in, there are consistent hashing-adjacent strategies in nginx+ that would allow you to go from 2 ingress routers to 3 shards with rate limiters to your services, using one KV store per box. But I highly suspect that the latency profile there will look remarkably similar to ingress routers doing rate limiting talking to a KV store cluster.