The 5-Hour CDN (fly.io)
Varnish has this built in - good to see it's easy to configure with NGINX too.
One of my favourite caching proxy tricks is to run a cache with a very short timeout, but with dog-pile prevention baked in.
This can be amazing for protecting against sudden, unexpected traffic spikes. Even a cache timeout of 5 seconds will provide robust protection against tens of thousands of hits per second, because request coalescing/dog-pile prevention ensures that your CDN host sends a request to the origin for any given URL at most once every five seconds.
I've used this on high-traffic sites and seen it robustly absorb any amount of unauthenticated traffic (unauthenticated, so responses don't vary per cookie).
Request coalescing can be incredibly beneficial for cacheable content, but for uncacheable content you need to turn it off! Otherwise you'll cause your cache server to serialize requests to your backend for it. Let's imagine a piece of uncacheable content takes one second for your backend to generate. What happens if your users request it at a rate of twice a second? Those requests are going to start piling up, breaking page loads for your users while your backend servers sit idle.
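A back-of-the-envelope sketch of that scenario, assuming the cache forces every request for the uncacheable URL through one backend fetch at a time (the one-second service time and two-per-second arrival rate are the illustrative numbers from the comment). Each request queues behind the previous one, so latency grows by half a second per request:

```python
def serialized_latencies(n_requests, interval_s, service_s):
    """Latency seen by each request when the cache lets only one backend fetch
    run at a time for a URL that never ends up cached."""
    latencies = []
    backend_free_at = 0.0
    for i in range(n_requests):
        arrival = i * interval_s
        start = max(arrival, backend_free_at)    # wait for the previous fetch to finish
        backend_free_at = start + service_s
        latencies.append(backend_free_at - arrival)
    return latencies


first_four = serialized_latencies(4, interval_s=0.5, service_s=1.0)
```

With those numbers the hundredth request waits 50.5 seconds, while the backend only ever works on one request at a time.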
If you are using Varnish, the hit-for-miss concept addresses this. However, it's easy to get wrong when you start writing your own VCL. Be sure to read https://info.varnish-software.com/blog/hit-for-miss-and-why-... and related posts. My general answer to getting VCL correct is writing tests, but this is a tricky behavior to validate.
I'm unsure how nginx's cache handles this, which would make me nervous about using the proxy_cache_lock directive for locations that mix cacheable and uncacheable content.
Know how to deal with cacheable data. Know how to deal with uncacheable data. But above all, know how to keep them apart.
Accidentally caching uncacheable data has led to some of the ugliest and most avoidable data leaks and compromises in recent times.
If you go down the "route everything through a CDN" path (which can be as easy as ticking a box in the Google Cloud Platform backend), make extra sure to flag authenticated responses as Cache-Control: private / no-cache.
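A minimal sketch of that header choice, assuming your app already knows whether a request is authenticated. `cache_control_for` is a hypothetical helper, and treating any response that sets a cookie as private is an extra precaution this sketch adds.

```python
def cache_control_for(authenticated: bool, sets_cookie: bool = False, max_age: int = 60) -> str:
    """Choose a Cache-Control value so a CDN never stores per-user responses."""
    if authenticated or sets_cookie:
        # private: only the user's own browser may store it.
        # no-cache: even that copy must be revalidated before reuse.
        return "private, no-cache"
    return f"public, max-age={max_age}"
```

The default is deliberately the shared-cache-friendly branch only for anonymous responses; anything tied to a session falls back to `private`.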
https://arstechnica.com/gaming/2015/12/valve-explains-ddos-i...
Caching is HARD.
Not quite the same layer, but in node.js I’m a fan of the memoize(fn)->promise pattern, where you wrap a promise-returning function so that it returns the _same_ promise to every caller passing the same arguments. It’s a fairly simple caching mechanism that coalesces requests: the single promise resolves or rejects for all callers at once.
The thundering herd problem specifically refers to what happens if you coordinate things so that all your incoming requests occur simultaneously. Imagine that over the course of a week, you tell everyone who needs something from you "I'm busy right now; please come back next Tuesday at 11:28 am". You'll be overwhelmed on Tuesday at 11:28 am regardless of whether your average weekly workload is high or low, because you concentrated your entire weekly workload into the same one minute. You solve the thundering herd problem by not giving out the same retry time to everyone who contacts you while you're busy.
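A small simulation of the point above, with hypothetical numbers: 10,000 clients are all turned away while you're busy. Giving everyone the same return time puts the whole herd into one minute; adding random jitter to each return time spreads them out.

```python
import random
from collections import Counter


def assign_retry_times(n_clients, base, jitter, seed=0):
    """Retry time handed to each client turned away while busy: base + U[0, jitter)."""
    rng = random.Random(seed)
    return [base + rng.uniform(0, jitter) for _ in range(n_clients)]


def peak_per_minute(times):
    """Largest number of retries that land in any one-minute bucket."""
    return max(Counter(int(t // 60) for t in times).values())


NEXT_WEEK = 7 * 24 * 3600   # hypothetical "come back next week" time, in seconds
herd = assign_retry_times(10_000, NEXT_WEEK, jitter=0)           # everyone gets the same minute
spread = assign_retry_times(10_000, NEXT_WEEK, jitter=8 * 3600)  # spread over a working day
```

`peak_per_minute(herd)` is the full 10,000, while the jittered schedule averages about 21 retries per minute.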
Thanks!
But this is much, much harder to do once you are already streaming the response: if the time to first byte (TTFB) is quick but the connection is low-throughput, you can’t do much at that point. That said, nearly all modern implementations stream the bytes to all waiting clients immediately; they don’t try to fill the cache first (they do both simultaneously).
Some implementations might avoid fanning in too much, maintaining a smaller pool of connections rather than trying to get down to "1", but that’s ultimately a trade-off at each layer of the onion, as those connections can still add up.
(I worked at both Cloudflare and Google, and it was a common topic: request coalescing is a big deal for large customers)
We have a distributed CDN-like feature in the hosted version of our open source search engine [1]; we call it our "Search Delivery Network". It works on the same principles, with the added nuance of also needing to replicate data over high-latency networks between data centers as far apart as Sao Paulo and Mumbai, for example. That brings another fun set of challenges to deal with! Hoping to write about it when bandwidth allows.
The briny deeps are filled with undersea cables, crying out constantly to nearby ships: "drive through me"! Land isn't much better, as the old networkers' shanty goes: "backhoe, backhoe, digging deep, make the backbone go to sleep".
Hopefully I'll get to keep working on projects that can make use of it, because it feels like a polished 2021 version of the Heroku-era dev experience to me. Also, full disclosure: Kurt tried to get me to use it in YC W20, but I didn't really listen until over a year later.
I moved my authoritative DNS name servers over to Fly a few months ago. After some initial teething issues with Fly's UDP support (which were quickly resolved) it's been smooth sailing.
The Fly UX via the flyctl command-line app is excellent, very Heroku-like. Only downside is it makes me mad when I have to fight the horrendous AWS tooling in my day job.
The folks who actually run the network for them are super clueful and basically the best in the industry.
I was under the impression that fly.io today (though they are working on it) doesn’t do anything unique to make hosting elixir/Phoenix app easier.
See this comment by the fly.io team.
huh yeah never thought about it
I blame how CDNs are advertised for the visual disconnect
CDN software might be simple in the basic happy case, but you still need a Network of nodes to Deliver the Content.
[1] http://www.squid-cache.org/
[2] https://en.wikipedia.org/wiki/Edge_Side_Includes
Out of the box, nginx doesn't support HTTP/2 prioritisation, so building a CDN with nginx doesn't mean you're going to be delivering as good a service as Cloudflare.
Another major challenge with CDNs is peering and private backhaul, if you're not pushing major traffic then your customers aren't going to get the best peering with other carriers / ISPs…
If a low priority response is served before a high priority one the page is likely to be slower to render etc.
"If you can run code on it, you can own it". Your front page could just be a tiny loader js that fires off a fetch() for a zero byte resource to all your mirrors, and then proceeds to load the content from the first responder.
Can anyone expand on how/why "the Internet is moving away from geolocatable DNS source addresses"?
To further reduce origin load and latency, you can use stale-while-revalidate: for a specified window after a cache entry expires, the CDN may keep serving the stale copy while it revalidates with your origin in the background, so no client has to wait on the trip to your origin.
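A sketch of how a cache might treat an entry sent with, for example, `Cache-Control: max-age=60, stale-while-revalidate=30` (the directive is defined in RFC 5861; the function and its return labels are made up for illustration). Inside the window after the entry goes stale, the user gets the cached copy immediately while the refresh happens in the background.

```python
def swr_action(age: float, max_age: float, swr: float) -> str:
    """What a cache may do with an entry sent as
    Cache-Control: max-age=<max_age>, stale-while-revalidate=<swr>."""
    if age < max_age:
        return "serve"                          # fresh: no origin contact at all
    if age - max_age <= swr:
        return "serve-stale-and-revalidate"     # answer now, refresh in the background
    return "revalidate-first"                   # too stale: the client waits on the origin
```

With `max-age=60, stale-while-revalidate=30`, an entry aged 75 seconds is still served instantly, but one aged 100 seconds makes the client wait for revalidation.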
It's also worth mentioning that even when revalidating on every request (or not caching at all), routing through a CDN can still improve overall latency, because TLS can be terminated at a nearby edge server, so each round trip of the TLS handshake is much shorter.
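Worked arithmetic for the TLS point, under stated assumptions: a new connection costs one round trip for TCP plus one for a TLS 1.3 handshake, then one more for the request; the edge keeps a warm connection to the origin; server think time is ignored. The RTT figures are hypothetical.

```python
def ttfb_direct(rtt_origin_ms, handshake_rtts=2):
    """New HTTPS connection straight to a distant origin: handshakes plus one request RTT."""
    return (handshake_rtts + 1) * rtt_origin_ms


def ttfb_via_edge(rtt_edge_ms, rtt_edge_origin_ms, handshake_rtts=2):
    """Handshakes terminate at a nearby edge; the edge forwards the request over an
    already-open connection, so the long haul is paid once instead of three times."""
    return (handshake_rtts + 1) * rtt_edge_ms + rtt_edge_origin_ms
```

With a 150 ms round trip to the origin, going direct costs about 450 ms before the first byte, versus about 170 ms when the handshakes happen with an edge 10 ms away that is 140 ms from the origin.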
Just hoping they come back around on CockroachDB; I feel like it's a match made in heaven for what they're providing.
Tinkering has been great, but the add-on-style pricing scares the jeebs out of me (well, my wallet), so I just assume I can't afford it for now and spin up a DO droplet. The droplet is probably more expensive for my use case, but call it ADHD tax haha; at least it's capped.
I would think you'd need to do it that way, you wouldn't want to reply "done" to the first increment if that operation is going to be batched up with other ops; you'd want to keep that connection hanging until all the increments you're going to aggregate have all been committed by the backend.
In the select coalescing case, except for bookkeeping overhead, none of the queries are slower (it's a big net win all around because not only do clients get their answers on average somewhat sooner, but the DB doesn't have to parse those queries, check for them in the query cache, or marshal N responses).
But in the increment/write case, it seems like in order to spare some DB resources, some clients will perceive increased write delays (or does it still net a win because the DB backend doesn't have to deal with the contention?).
If a round trip to New York is too long, then twenty of them is way worse. So I can either do 20 round trips to Nevada, which does <20 round trips to Chicago, which does <<20 round trips to New York. Or, I can do some more cleverness with transport and session bootstrapping and end up with 14 round trips to New York.
no-cache means that the response may be stored in any cache, but cached content MUST be revalidated before use.
public means that the response may be cached in any cache even if the response was not normally cacheable, while private restricts this to only the user agent's cache.
no-store specifies that this response must not be stored in any cache. Note that this does not prevent previously cached responses from being used.
max-age=0 can be added to no-store to also invalidate old cached responses, should one have accidentally sent a cacheable response for this resource. No other directives have any effect when using no-store.
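A rough sketch of what those directives mean to a shared cache such as a CDN. `shared_cache_may_store` is a hypothetical helper and deliberately simplified (see the docstring); the point is that `private` and `no-store` keep a response out of shared caches, while `no-cache` alone does not.

```python
def shared_cache_may_store(cache_control: str) -> bool:
    """Rough check: may a shared cache (e.g. a CDN) store a response with this header?

    Simplified: ignores s-maxage, the extra rules for requests carrying
    Authorization, and field-qualified private="..." (treated as fully private).
    """
    directives = {part.split("=", 1)[0].strip().lower() for part in cache_control.split(",")}
    if "no-store" in directives or "private" in directives:
        return False
    # no-cache does NOT forbid storage; it only forces revalidation before reuse.
    return True
```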
Edit: And now I see that you just copied bits from the Moz Dev page. I'll have to start using those more. I think the MS docs always come up first in Google.
Also note that I only mentioned the usual suspects - there are many more options, like must-revalidate.
But we've got full-time people working on Elixir now, too; we'll see where that goes. We've still got Elixir limerence here. :)
All your failing requests batch up when your retry strategy sucks; then you end up with really high traffic on every retry and very little in between.
Imagine you’re a service like Feedly, and one of your “direct customer” API clients (some feed-reader mobile client) has coded their app so that all of its connected clients re-request each user’s unique feed at exact, crontab-like 5-minute offsets from the start of the hour. So every five minutes you get a huge burst of traffic from all these clients, and it’s all different traffic, with nothing coalescable.
You don’t control the client in this case, but nor can you simply ban them—they’re your paying customers! (Yes, you can “fire your customer”, but this would be most of your customers…)
And certainly, you can try to teach the devs of your client how to write their own jitter logic—but that rarely works out, as often it’s junior frontend devs who wrote the client-side code, and it’s hard to have a non-intermediated conversation with them.
It depends on the request type. Is it cacheable? Do you require a per-client side effect? ...
E.g. all clients checking for an update at 10:00 UTC every day, all clients polling for new data at fixed times, etc.
The only way to solve thundering herd (a lot of requests all arriving within a short timespan) is to distribute those requests over a larger timespan.
Reducing your herd size by having fewer requests does not solve thundering herd, but may make it bearable.
The best you can do with clients that are out of your control is to publish a client library/SDK for your API that is convenient for your customers to use and implements best practices like exponential backoff, jitter, etc. If you have documentation with code snippets that junior devs are likely to copy and paste, include it in those.
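A minimal sketch of the retry helper such an SDK might ship, assuming only connection errors and timeouts are worth retrying. It uses exponential backoff with "full jitter" (each wait is a random fraction of the exponential ceiling); `rng` and `sleep` are injectable so the behaviour is testable. The function name and defaults are illustrative.

```python
import random
import time


def call_with_backoff(op, max_attempts=5, base=0.5, cap=30.0,
                      retryable=(ConnectionError, TimeoutError),
                      rng=random.random, sleep=time.sleep):
    """Call op(); on a retryable error, sleep a random time in
    [0, min(cap, base * 2**attempt)) ("full jitter") and try again."""
    for attempt in range(max_attempts):
        try:
            return op()
        except retryable:
            if attempt == max_attempts - 1:
                raise                                  # out of attempts: surface the error
            sleep(rng() * min(cap, base * 2 ** attempt))
```

Because each client draws its own random wait, clients that failed together do not retry together.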
If you've painted yourself into a corner like you describe and are seeing extremely regular traffic patterns, you might be able to pre-cache. I.e., it's 12:01 and you know that a barrage is coming at 12:05. Start going down the list of clients/feeds that you know are likely to be requested based on recent traffic patterns and generate the response, putting it in your cache/CDN with a five minute TTL. Then at least a good portion of the requests should be served straight from there and not add load to the origin. There are obviously drawbacks/risks to that approach, but it might be all you can really do.
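A sketch of that pre-warming step, assuming you keep a log of recently requested feed IDs. `render` and `cache_put` are hypothetical stand-ins for your origin's render path and your cache/CDN's write API; you would run this a few minutes before the expected spike (12:01 for a 12:05 barrage, in the example above).

```python
from collections import Counter


def prewarm(recent_requests, render, cache_put, top_n=1000, ttl=300):
    """Render the most-requested feeds ahead of a predictable spike and store them
    with a TTL covering the spike. Returns the feed IDs that were warmed."""
    hottest = [feed for feed, _ in Counter(recent_requests).most_common(top_n)]
    for feed in hottest:
        cache_put(feed, render(feed), ttl)   # hypothetical cache/CDN write
    return hottest
```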
In general, try to keep some control over the client. If you can't (for example, if you're a pure SaaS company selling a public API), you can apply jitter based on the API key in addition to the other metrics I mentioned above.
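A sketch of jitter keyed on the API key (or on a per-user ID, where you have one): hash the key to a stable offset within the polling period. A stable cryptographic hash is used rather than Python's built-in `hash()`, which is randomized per process for strings. Function names and the 300-second period are illustrative.

```python
import hashlib


def schedule_offset(api_key: str, period_s: int = 300) -> int:
    """Stable per-key offset in [0, period_s): the same key always gets the same
    slot, and different keys spread roughly evenly across the period."""
    digest = hashlib.sha256(api_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % period_s


def next_poll_at(api_key: str, now: float, period_s: int = 300) -> float:
    """Next time this key should poll: its slot in the current or the next period."""
    slot = (now // period_s) * period_s + schedule_offset(api_key, period_s)
    return slot if slot > now else slot + period_s
```

Instead of every client polling at the top of each five-minute mark, each key polls at its own fixed second within the window.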
As better engineers than I used to say at a previous engagement: "if it's not in the SLA, it's an opportunity for optimization"
It’s somewhat hard in our case, as our direct customers (like the mobile app I mentioned) have API keys with us, but they don’t tell us about which user of theirs is making the request. And often they’ll run an HTTP gateway (in part so that they don’t have to embed their API key for our service in their client app), so we don’t even get to see the originating user IPs for these requests, either. We just get these huge spikes of periodic traffic, all from the same IP, all with the same API key, all about different things, and all delivered over a bunch of independent, concurrent TCP connections.
I’ve been considering a few options:
- Require users that have such a “multiple users behind an API gateway” setup, to tag their proxied requests with per-user API sub-keys, so we can jitter/schedule based on those.
- Since these customers like API gateways so much, we could just build a better API gateway for them to run; one that benefits us. (E.g. by Nagle-ing requests together into fewer, larger batch requests.) Requests that come as a single large batch request, could be scheduled by our backend at an optimal concurrency level, rather than trying to deal with huge concurrency bursts as we are now.
- Force users to rewrite their software to “play nice”, by introducing heavy-handed rate-limiting. Try to tune it so that the only possible way to avoid 429s is to either do gateway-side request queuing, or to introduce per-client schedule offsets (i.e. placing users on a hash ring by their ID, so for a periodic-per-5-minutes request, equal numbers of client apps are set to make the request at T+0, vs. T+2.5.)
- Introduce a middleware / reverse-proxy that holds an unbounded-in-size no-expire request queue, with one queue per API key, where requests are popped fairly from each queue (or prioritized according to the plan the user is paying for). Ensure backends only select(1) requests out from the middleware’s downstream sockets as quickly as they’re able to handle them. Require API requests to have explicit TTLs — a time after which serving the request would no longer be useful. If a backend pops a request and finds that it’s past its TTL, it discards it, answering it with an immediate 504 error.
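A minimal sketch of the last option (per-API-key queues, fair popping, and explicit request TTLs), assuming each request arrives with a TTL in seconds. Round-robin across keys stands in for fair popping; plan-based priority and the select(1)-style backpressure are left out. Class and method names are hypothetical.

```python
import time
from collections import OrderedDict, deque


class FairQueue:
    """One FIFO per API key, served round-robin; requests whose TTL ran out while
    queued are returned as "expired" so the caller can answer 504 immediately."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.queues = OrderedDict()          # api_key -> deque of (deadline, request)

    def push(self, api_key, request, ttl):
        self.queues.setdefault(api_key, deque()).append((self.clock() + ttl, request))

    def pop(self):
        """Return ("serve" | "expired", request), or None when every queue is empty."""
        if not self.queues:
            return None
        api_key, queue = next(iter(self.queues.items()))
        deadline, request = queue.popleft()
        if queue:
            self.queues.move_to_end(api_key)  # this key goes to the back of the line
        else:
            del self.queues[api_key]
        if self.clock() > deadline:
            return "expired", request         # no longer useful: reply 504, skip the work
        return "serve", request
```

A burst from one key no longer starves everyone else: after each pop, that key goes to the back of the rotation.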