Say goodbye to resource-caching across sites and domains (stefanjudis.com)
We faced a similar dilemma in designing the caching for compiled Wasm modules in the V8 engine. In theory, it would be great to just have one cached copy of the compiled machine code of a wasm module from the wild web. But in addition to the information leak from cold/warm caches, there is the possibility that one site could exploit a bug in the engine to inject vulnerable code (or even potentially malicious miscompiled native code) into one or more other sites. Because of this, wasm module caching is tied to the HTTP cache in Chrome in the same way, so it suffers the double-key problem.
This has been really sad & a big loss for the web, in my opinion. And it's one that we were about to emerge from[1], it seems like.
Alas, if we do go back to a more old-school CDN-based style of web scripting/javascripting, powered by our new ES Modules (& hopefully Import Maps) this new sharding-by-origin change will mean that we will never ever see the CDN hit-rate benefits we once saw.
It seems like it is a necessary change, to protect the user from being tracked, but it still hurts my heart so much, that we are so near to getting back to sharing resources on the web, only to have all that sharing snatched away. Whatever metrics you are looking at today, know that they represent a very sad state of affairs, that brought great pain & suffering to the hearts of many webdevs who aspired for much much much higher hit rates.
[1] https://www.bryanbraun.com/2020/10/23/es-modules-in-producti...
I mean to have a cache hit you need:
- Same CDN
- Same library
- Same library uploader/name
- Exact Same library version down to every byte of js
- Exact same way to refer to the given version (e.g. if latest is 1.3.2 then foobar-1.3.2 and foobar-latest are not the same, except if foobar-latest is a temporary redirect to foobar-1.3.2. But that would induce a further round trip).
If we furthermore consider that most people, most of the time, visit a small number of domains, it's not too hard to reason that the value gained from cross-site caching doesn't outweigh the cost for the majority of users.
I can imagine a tonne of WP sites referencing jQuery from the Google CDN, e.g.: https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.mi...
Looking at the headers the JS asset would be cached for 1 year.
I suspect that websites that are conscious of loading times are already testing performance with nothing cached. And websites that aren't conscious of loading times are probably using bundling techniques that would already make cross-site caches useless. In both cases, I'm having a hard time believing that loading JQuery is the reason anyone's website is slow.
There are theoretical schemes that could allow us to share libraries between sites without having the same privacy impacts, but I'm not sure it's even worth the effort of proposing them.
An extension keeping widely used versions of libraries preloaded, as well as a small db of CDNs/URLs so that it can serve the preloaded libraries instead of the CDN ones when possible. This could also do things like collapse foobar-latest and foobar-X.Y.Z (X.Y.Z == latest) and could force-load a different version with security patches. I.e. it would act kinda like a Linux package manager for a limited set of common libraries.
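To make the idea a bit more concrete, here is a rough sketch of the lookup such an extension might do. Everything in it is an assumption for illustration: the `cdn.example` host, the URL patterns, the bundled file path, and the `BUNDLED`/`LATEST` tables are made up, and a real extension would need far more URL layouts than this.

```python
import re

# Hypothetical table of libraries shipped with the extension:
# (name, version) -> path of the preloaded copy.
BUNDLED = {
    ("jquery", "3.5.1"): "bundled/jquery-3.5.1.min.js",
}

# Hypothetical alias table: the concrete version that "latest" currently means.
LATEST = {"jquery": "3.5.1"}

# Hypothetical URL layouts for two CDNs; real CDNs use many more.
PATTERNS = [
    re.compile(
        r"^https://ajax\.googleapis\.com/ajax/libs/"
        r"(?P<name>[\w.-]+)/(?P<version>[\w.]+)/"
    ),
    re.compile(
        r"^https://cdn\.example/(?P<name>[a-z]+)-(?P<version>latest|[\d.]+)\.js$"
    ),
]


def resolve_local(url):
    """Return the bundled file for a known CDN URL, or None to use the network."""
    for pattern in PATTERNS:
        match = pattern.match(url)
        if match is None:
            continue
        name = match.group("name")
        version = match.group("version")
        # Collapse foobar-latest onto foobar-X.Y.Z when X.Y.Z is the latest.
        if version == "latest":
            version = LATEST.get(name, version)
        return BUNDLED.get((name, version))
    return None


print(resolve_local("https://cdn.example/jquery-latest.js"))
print(resolve_local("https://cdn.example/jquery-3.5.1.js"))
print(resolve_local("https://cdn.example/jquery-1.0.0.js"))
```

The first two calls resolve to the same bundled file, while the third falls through to `None` because that version is not preloaded.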
It doesn't get hits like it should. It would be nice to be able to add libraries to the cache manually because many are out of date.
1) The cache hit rate must be extremely low because of the different versions/variants/CDNs for each library. (Have you seen how many jQuery builds there are?)
2) It's irrelevant outside of the few most popular libraries on the planet, maybe jquery/bootstrap/googlefonts.
3) Content is cached once on page load and the savings happen over the next tens or hundreds of pages you visit on that site. That's where the gain of caching is (10x - 100x). Saving one load when changing sites is negligible in the grand scheme of things.
> The overall cache miss rate increases by about 3.6%, changes to the FCP (First Contentful Paint) are modest (~0.3%), and the overall fraction of bytes loaded from the network increases by around 4%. You can learn more about the impact on performance in the HTTP cache partitioning explainer. [0]
[0]: https://developers.google.com/web/updates/2020/10/http-cache...
Additional metrics: https://github.com/shivanigithub/http-cache-partitioning#imp...
I feel for those on low bandwidth and low data limit connections. Website developers should focus on bloat and address that. That doesn’t seem to be happening on a larger scale though.
> It's excellent that browsers consider privacy, but it looks like anything can be misused for tracking purposes these days.
Of course. Every bit of information you provide to a site will be misused to track and profile you. That’s what the advertising fueled web has gotten us to (I don’t blame it alone for the problems).
I wasn’t aware that Safari had handled the cache privacy issue in 2013. It seems like it has always been way ahead on user privacy (though it’s not perfect by any means). I’ve been a long time Firefox user who has always cleared the caches regularly, and I’m curious to know if any browser has consistently provided more privacy out-of-the-box than Safari.
It will not only help with performance, but will also stop the absurd tooling madness that front-end has become.
Edit: The most common example. Let's say you need jQuery. The browser downloads the repo once and then it will be ready and available for maybe millions of websites. Just think about the benefit of the saved bandwidth alone.
I cannot stop thinking about how stupid it is to download the same assets again and again and again for every website you visit.
Also, I feel like adding another package manager wouldn't go that was...
So they are then beholden to major platforms such as Google to host sites for them from a global cache? Similar to what AMP does, but for all kinds of content?
Hmm.
The example that they give, that you're logged into Facebook, doesn't seem very useful other than maybe fingerprinting? But even then 90 some percent are going to be logged in, so the only real fingerprinting there is on the people who aren't.
There is also the possibility of leveraging this type of information in social engineering scenarios. Imagine getting compromising information on a sysadmin at a major commercial port and blackmailing a root password out of them, then leveraging that to set up a persistent threat and deleting their database every hour for a few weeks until they finally manage to lock you out again. The damage would be in the hundreds of millions. You could potentially do all the usual interesting things to foundries and/or oil refineries too if you manage to compromise insiders. Really, the sky is the limit if you use your imagination a bit.
- a decentralized way to store these libraries
- by a source with established trust (so it can't be misused for tracking)
JS/CSS library blockchain?
I see no problem with using remotely hosted resources for prototyping. But you should never let a link to remotely hosted fonts or scripts make it into production code.
I use webpack and other more secure/performant strategies for all my JS needs when working in a serious environment.
But when I am building something personal/light, I still load up cdnjs.com and drop something in. It's just easier than thinking about how I will serve files etc
There is still caching, just not caching of the same resource used by different domains.
Most people, most of the time, visit a small set of domains, for all of which the resource is still cached from the previous visit.
Combine that with the small likelihood of getting a hit on cross-domain caching, and the change in traffic is likely negligible.
And just just just before ES Modules finally get good & modular & helpful, we destroy the shared cache that would have made them helpful. We have been on this voyage for 10 years, starting from the pre-modules but cached days, through the long dark & violent seas of modules-but-no-caching, to finally finally get to modules-that-cache-well. We finally have standards & tools in place that would allow us to begin to cache modules effectively, across sites. Except no, not any more.
Whatever numbers you see for this, they are lies. They don't represent any honest truth. They portray only a poor reflection of a bad place that we have been desperate to escape from. We have wanted to cache modules effectively with CDNs for years, but ES Modules had not been suitable to the task. To judge the impact based only on what we can see, without projecting to the web we were all trying to make happen: that's incredibly sad. We'll never know. All possibility is being chopped off and cut down. This is an incredibly sad, incredibly tragic culmination to a long long struggle to make the web better, and frankly, I am disappointed beyond words that the teams have taken such an aggressive change of defaults so casually for such minimal proven harm. If there is a real security issue here, it should have been dealt with via more Content Security Policy flags, not by unweaving the web & making each site have its own view of the world, unique from every other site. The security atmosphere is paranoid & delusional, & nothing tops their quest for absolute security.
[1] https://www.bryanbraun.com/2020/10/23/es-modules-in-producti...
We are protecting users from sites coordinating their actions via the presence of resources. If I visit store.example they might cache a /big-spender resource. Then if I visit other.example, they can check to see if I have /big-spender cached.
As a user, I ought to be protected against coordinated tracking mechanisms like this. Content Security Policy might be able to let store.example protect its asset, but in this case, the problem is that store.example might be deliberately exposing the cached/not-cached state of that resource to others; it is the user, not the site's content, that needs to be secured.
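A toy model of that probe, for anyone who wants to see the mechanics. This is only a sketch under loud assumptions: the `HttpCache` class and its `fetch` method are invented for illustration, and a real attacker would infer hit or miss from load timing rather than being told directly.

```python
class HttpCache:
    """Toy cache; `partitioned` picks single-keyed or double-keyed behavior."""

    def __init__(self, partitioned):
        self.partitioned = partitioned
        self.entries = set()

    def _key(self, top_level_site, url):
        # Double-keying adds the site the user is visiting to the cache key.
        return (top_level_site, url) if self.partitioned else url

    def fetch(self, top_level_site, url):
        """Return True on a cache hit; on a miss, store the entry."""
        key = self._key(top_level_site, url)
        hit = key in self.entries
        self.entries.add(key)
        return hit


def other_site_learns_big_spender(cache):
    marker = "https://store.example/big-spender"
    # store.example tags the user by making the browser cache the marker.
    cache.fetch("store.example", marker)
    # other.example later loads the same URL and observes hit or miss
    # (standing in for a timing measurement, which is an assumption here).
    return cache.fetch("other.example", marker)


print(other_site_learns_big_spender(HttpCache(partitioned=False)))
print(other_site_learns_big_spender(HttpCache(partitioned=True)))
```

With the shared cache the probe comes back as a hit, so other.example learns the flag; with the partitioned cache it is a miss no matter what store.example did.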
Thus far the only safe way we've found to do it is to have every site have its own naive, isolated, alone view of the world. This is, alas, in my perspective, extremely unfortunate. I picture the spider web of information being cut into pieces, broken apart. But I also recognize the necessity of this. I can't stand it, but I see no alternatives. And it makes me so sad that we will never ever see modules work on the web. That ~2011 was the last & will forever be the last good year for CDNs, before CommonJS & bundling took over, before we made CDNs no longer places of sharing.
No, not at all. This change gets rid of global caches.
A) Your site caches will still work, they just won't be shared across sites. A cold-cache load of Gmail will go through the exact same process as the cold-cache load of your site, and subsequent loads of both sites will be just as fast.
B) If your site's initial load time on a cold cache is unacceptable, you are already making an engineering mistake and you need to cut back on the Javascript.
C) Most large sites are already choosing to bundle their own libraries or serve them from dedicated CDNs instead of trying to coordinate with each other to make sure the same resource location is used across multiple websites.
D) Even in a theoretical world where these changes were going to make the web a lot slower (and reminder, that world doesn't exist), the risk of library domination in that world would be larger than the risk of website domination. Imagine a world where you write a competitor to JQuery that's just as good, maybe even smaller and more efficient, but nobody uses it because "we have to use the popular library that's likely to be already in the user's cache."
While we're on the subject of D, the fact that nobody says that -- that we don't see smaller JS libraries thrown aside in favor of libraries/versions that are more likely to be already cached -- is strong evidence that your site sharing a JQuery cache with someone like Google or the NYT is not an important performance concern.
No.
This is about disabling cross domain caching. Which rarely has a cache hit already by now (see many other posts on this site).
> "rely on a large service to actual host your content for you"
This is the case anyway even with cross domain caching as the source you cache still needs to have the exact same URL including domain. The cross domain only refers to the site loading the resource.
So e.g. `foo.example/jquery-3.2.1` and `bar.example/jquery-3.2.1` were never treated as the same at any point in time. The only thing that changes is that previously, if `foo.example` and `bar.example` both depended on `cdn.example/jquery-3.2.1` and you visited `foo.example` before `bar.example`, it might already have been cached when you visited `bar.example`. Though most times it wasn't, as e.g. `bar` used a different CDN, a different URL to the same resource, or a different version.
So this change doesn't really affect small sites more than any other site. And the effect is generally negligible.
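The same scenario as a small simulation, counting network downloads over a sequence of visits. It is a sketch that assumes the partition key is just the top-level site (a simplification of what browsers actually key on), and `count_downloads` is a made-up helper, not any browser's API.

```python
def count_downloads(visits, partitioned):
    """Count network fetches for an ordered list of (top_level_site, url)."""
    cached = set()
    downloads = 0
    for site, url in visits:
        # Shared cache: keyed by URL only. Partitioned: keyed by (site, URL).
        key = (site, url) if partitioned else url
        if key not in cached:
            downloads += 1
            cached.add(key)
    return downloads


visits = [
    ("foo.example", "https://cdn.example/jquery-3.2.1"),
    ("bar.example", "https://cdn.example/jquery-3.2.1"),  # same CDN URL
    ("bar.example", "https://cdn.example/jquery-3.2.1"),  # repeat visit
    ("baz.example", "https://baz.example/jquery-3.2.1"),  # self-hosted copy
]

print(count_downloads(visits, partitioned=False))  # 2
print(count_downloads(visits, partitioned=True))   # 3
```

The only extra download under partitioning is the first load on `bar.example`; the repeat visit still hits, and the self-hosted copy on `baz.example` was never shareable in either model.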
[1] 1.13.0, 1.12.3, 1.12.2, 1.12.1, 1.11.5, 1.11.4, 1.11.3, 1.11.2, 1.11.1, 1.10.9, 1.10.8, 1.10.7, 1.10.6, 1.10.5, 1.10.4, 1.10.3, 1.10.2, 1.10.1, 1.10.0, 1.9.11, 1.9.10, 1.9.9, 1.9.8, 1.9.7, 1.9.6, 1.9.5, 1.9.4, 1.9.3, 1.9.2, 1.9.1, 1.9.0, 1.8.14, 1.8.13, 1.8.12, 1.8.11, 1.8.10, 1.8.9, 1.8.8, 1.8.7, 1.8.6, 1.8.5, 1.8.4, 1.8.3, 1.8.2, 1.8.1, 1.8.0, 1.7.12, 1.7.11, 1.7.10, 1.7.9, 1.7.8, 1.7.7, 1.7.6, 1.7.5, 1.7.4, 1.7.3, 1.7.2, 1.7.1, 1.7.0, 1.6.5, 1.6.4, 1.6.3, 1.6.2, 1.6.1, 1.6.0, 1.5.6, 1.5.5, 1.5.4, 1.5.3, 1.5.2, 1.5.1, 1.5.0, 1.4.8, 1.4.7, 1.4.6, 1.4.5, 1.4.4, 1.4.3, 1.4.1, 1.4.0, 1.3.2, 1.3.1, 1.3.0, 1.2.3, 1.2.0, 1.1.1
If we could, we should make the following best practice:
- Only use react and similar if you write a webapp, do not use such tools for websites. If your website is so complex that you need it you are doing something wrong.
- Have a js standard library which provides all the common tooling for the remaining non-webapp js use case.
- Make it have one version each year (or half year), browsers will preload it when they ship updates and keep the last 10 or so versions around.
- Have a small standardized JS snippet which detects old browsers which are not evergreen (like IE) and loads a polyfill.
Sure, there are some requirements to get there. E.g. making it reasonably easy to have proper complex layouts in a reactive fashion without much JS or insanely complex CSS. (Which we can do by now thanks to CSS grid, yay.)
Can anyone comment on whether it's practical, and whether it could help here?
[0] https://developer.mozilla.org/en-US/docs/Web/Security/Subres...
AIUI it is less practical than you'd like, for several reasons that work together to greatly mitigate its impact:
1. The remote content most vulnerable to hostile change is also remote content already being loaded precisely because it changes; ad scripts, etc. Protecting your jQuery load is nice, and if someone did compromise a CDN's jQuery it would be a big issue, but it's also not what has been happening in the field. If the content has to change you can't SRI it.
2. If you have a piece of remote content that you rigidly want to never change, it's much safer to copy it to a location you control, and it's almost as easy as SRI.
3. If a subresource fails SRI there's no fallback other than simply not loading it, so it has a very graceless failure mode. This combines with #2 to make it even more important to put it in a locally-controlled area. Once local, SRI is more-or-less redundant to what SSL already gives you.
Basically, it's one of those things that looks kinda cool at first and makes you think maybe you should SRI everything, but the real use cases turn out to be much smaller than that.
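For reference, the check itself is simple: an SRI value is the hash algorithm name, a dash, and the base64 of the digest of the exact bytes served. A minimal sketch, assuming a single-hash `integrity` value (the attribute can also carry several space-separated hashes, which this ignores); `sri_value` and `passes_sri` are names made up here.

```python
import base64
import hashlib

# Hash algorithms that SRI allows.
ALLOWED = ("sha256", "sha384", "sha512")


def sri_value(content: bytes, algorithm: str = "sha384") -> str:
    """Build an integrity value such as 'sha384-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def passes_sri(content: bytes, integrity: str) -> bool:
    """Check content against a single-hash integrity value."""
    algorithm, _, _ = integrity.partition("-")
    if algorithm not in ALLOWED:
        return False
    return sri_value(content, algorithm) == integrity


pinned = sri_value(b"console.log('v1');")
print(passes_sri(b"console.log('v1');", pinned))  # True
print(passes_sri(b"console.log('v2');", pinned))  # False
```

The second call is the graceless failure mode from point 3: any byte-level change, hostile or not, fails the check and the resource is simply not loaded.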
Just content based addressing (e.g. ipfs) is good enough to be actually useful and allows local hosting and sharing caches at the same time.
Still, neither of these would fix the privacy problem. Though with something like ipfs, emulating network delays on access could work, but it would be VERY hard to get right and make immune to statistical timing analysis.
It's also one of the few ways to get some (imperfect) degree of censorship resistance without running in direct conflict with laws meant to e.g. hinder access to child pornography. Note that this is just imperfect protection, working for countries which do not officially have censorship in their law and have no reasonably effective national firewall. I.e. it wouldn't work for China, and it also wouldn't work if the political situation in some western countries gets worse. But it does work for "non-official" censorship enacted based on not-so-legal pressure and harassment. Or corporate censorship enacted by companies supposedly of their own will.
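A tiny sketch of what content-based addressing means for caching, with a plain SHA-256 hex digest standing in for a real content identifier (ipfs uses its own CID format, so this is an assumption for illustration, and `ContentStore` is a made-up class).

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the address is derived from the bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        # Identical bytes always map to the same address, whoever hosts them.
        address = hashlib.sha256(content).hexdigest()
        self._blobs[address] = content
        return address

    def get(self, address: str):
        return self._blobs.get(address)

    def __len__(self):
        return len(self._blobs)


store = ContentStore()
from_foo = store.put(b"/* library v3.2.1 */")  # served by foo.example
from_bar = store.put(b"/* library v3.2.1 */")  # self-hosted by bar.example
print(from_foo == from_bar, len(store))  # True 1
```

Both sites end up sharing one entry even though each hosts its own copy, which is the appeal. It is also why the privacy problem remains: whether an address is already present is still observable across sites.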
(Without the fancy / bs bingo technology)
(Obviously also not ideal)
But you also must consider that most people most times visit a small set of domains.
Which means that most times they will have jquery and similar cached even without cross domain caching.
Should not exist IMHO ;=)
But yes, SRI is just for the case the CDN gets compromised.
- Easier prototyping and experimental usage of pre-releases
- Backward compatibility with older browsers on the first few versions at least
Edit: Typo
That's certainly true of the server on which your APIs, if any, reside, but isn't it typical for your website itself to also leverage the CDN for distribution?
Basically, you should never have a production website which calls out to cdnjs.cloudflare.com or ajax.googleapis.com or fonts.googleapis.com, you should be hosting all of your site's dependencies in the same place or set of places.
As a side perk, your site also will stop looking like trash for users who use browser extensions to block such external calls. ;)
You are only as good as your local server. Having a cdn means you need a better local server.