Scaling React Server Side Rendering (arkwright.github.io)
One question for the author, if they're reading: how did you prep for writing this piece and gather the story details? It's quite a journey, and - if I was writing something like this - I would have a hard time keeping track of all the different twists, stats and lessons as they happen so they can be written up later. Do you keep a notebook, or did you rebuild the story from artifacts?
I didn't really do any preparation. I had recently been thinking that pretty much every website I had ever built was sadly no longer in existence, and so I wanted to start producing real, "tangible" artifacts from my work; something that might have a shelf life of more than a couple of years. I had those recent SSR adventures in mind, and wanted to write them down before they faded from memory.
I usually begin with a bulleted todo list of insights or topics I want to cover. Then I dive in, and write in a more or less stream of consciousness fashion, which causes me to think of more topics to add to that list. I comb over the article and the list iteratively, reordering stories, editing, and adding context as I go, until the result feels right. In this case I didn't have any notes; the content was rebuilt from memory.
So thanks for this great article.
If for some reason you're a bitter backend dev who doesn't use React, but are reading this comment... Do yourself a favor and read the article. Really good stuff about load balancing, keeping an eye on response times by percentile, etc.
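To illustrate why percentiles matter more than averages here, this is a minimal sketch using the nearest-rank method. The sample response times are made up for illustration, and `percentile` is a hypothetical helper, not something taken from the article.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Hypothetical response times in milliseconds, with one slow outlier.
const responseTimesMs = [42, 45, 47, 50, 51, 53, 55, 60, 62, 900];

const mean =
  responseTimesMs.reduce((sum, t) => sum + t, 0) / responseTimesMs.length;
const p50 = percentile(responseTimesMs, 50);
const p99 = percentile(responseTimesMs, 99);

console.log({ mean, p50, p99 }); // { mean: 136.5, p50: 51, p99: 900 }
```

The mean (136.5 ms) describes no real request: the typical one took about 51 ms, and the slowest took 900 ms. Tracking p50 and p99 separately makes both facts visible.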
> Because we were seeking to improve performance, we became interested in benchmarking upgrades to major dependencies. While your mileage may vary, upgrading from Node 4 to Node 6 decreased our response times by about 20%. Upgrading from Node 6 to Node 8 brought a 30% improvement. Finally, upgrading from React 15 to 16 yielded a 25% improvement. The cumulative effect of these upgrades is to more than double our performance, and therefore our service capacity.
Free optimization, ripe for the taking!
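As a quick check of the quoted arithmetic, here is a small sketch that compounds the three improvements. It assumes each percentage is a reduction in response time applied on top of the previous one, and that capacity scales inversely with response time (roughly true for CPU-bound rendering, but an assumption nonetheless).

```javascript
// Response-time reductions from the quote: Node 4 -> 6, Node 6 -> 8, React 15 -> 16.
const reductions = [0.2, 0.3, 0.25];

// Each upgrade multiplies the remaining response time by (1 - reduction).
const remainingFraction = reductions.reduce((acc, r) => acc * (1 - r), 1);

// Assumption: throughput is inversely proportional to response time.
const capacityMultiplier = 1 / remainingFraction;

console.log(remainingFraction.toFixed(2)); // "0.42"
console.log(capacityMultiplier.toFixed(2)); // "2.38"
```

Under those assumptions the three upgrades leave about 42% of the original response time, which is where "more than double" comes from.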
Also there is a more common scenario where updating one thing requires updating other packages, and through a long chain of dependencies one of the pieces being updated has something missing in the new version (that was available in the previous version), and anything that relies on that will stop working.
Anyway, even the best case scenario where everything is perfectly fine after the updates still requires detailed testing to ensure that really everything is as OK as it seems. So even then this is not totally free.
But then of course, it may still be the easiest path for improving the performance.
Upgrading should not be seen as an alternative to performance engineering though, even if upgrading _does_ bring in some performance improvements.
Upgrading should be because of reasons such as security updates, and bug-fixes, and to continue to reap the improvements/features in the next version.
* Upgrading NodeJS indeed gives us massive performance boosts, but apply with caution. Ideally have a set of visual regression tests just to be safe
* Profile your NodeJS code just like you do with browser code. Sometimes the bottleneck could be in an Express middleware or in reading a massive Webpack manifest file
* If a component doesn't need to be rendered on the server, don't render it. Don't waste CPU cycles (e.g., on out-of-view content). Just make sure you get your SEO meta tags right
* Don't load more data than you need. It takes time to parse, it takes time to loop through and it takes time to stringify for rehydration
* Enable BabelJS's debug mode and remove unnecessary plugins
* Don't import more stuff than you need. Tree-shaking is important on the server-side too
* If you're using CSS modules, use the Webpack loader `css-loader/locals` on the server so that it doesn't emit CSS files (useless). The client compiler should do so
* Monitor your server-to-server requests. They're usually what take the longest, so cache the most important ones
* As with the majority of websites, cache is king
* Properly serialize your JSON strings. That's what we use: https://gist.github.com/eliseumds/6192135660267e2c64180a8a9c...
* It can be worth it to return a dangerous HTML string from a component instead of a tree of React nodes. We do that when we render SVGs and microdata tags
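The caching bullets above can be sketched as a small in-memory cache with a time-to-live for server-to-server responses. This is only an illustration under stated assumptions: `createTtlCache`, `loadProduct`, and the key format are hypothetical names, and a real deployment might prefer a shared cache such as Redis or Memcached.

```javascript
// Minimal in-memory TTL cache for upstream responses.
// `now` is injectable so expiry can be exercised without waiting.
function createTtlCache({ ttlMs, now = Date.now }) {
  const entries = new Map(); // key -> { value: Promise, expiresAt: number }

  return {
    get(key, load) {
      const hit = entries.get(key);
      if (hit && hit.expiresAt > now()) return hit.value;

      // Store the promise itself so concurrent callers share one request.
      const value = Promise.resolve(load(key));
      entries.set(key, { value, expiresAt: now() + ttlMs });

      // Do not keep failures around: drop the entry if the request rejects.
      value.catch(() => {
        const current = entries.get(key);
        if (current && current.value === value) entries.delete(key);
      });
      return value;
    },
    clear() {
      entries.clear();
    },
  };
}

// Usage with a fake clock and a stand-in for the real upstream call.
let upstreamCalls = 0;
let fakeTime = 0;
const cache = createTtlCache({ ttlMs: 60000, now: () => fakeTime });
const loadProduct = async (key) => {
  upstreamCalls += 1;
  return { key };
};

cache.get('product:1', loadProduct);
cache.get('product:1', loadProduct); // served from the cache
fakeTime = 61000;
cache.get('product:1', loadProduct); // expired, so it loads again
console.log(upstreamCalls); // 2
```

Caching the promise rather than the resolved value means a burst of identical requests triggers a single upstream call, which is usually the point when the upstream is the slow part.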
Again, it's a pain in the butt. You'll have checksum errors, need to synchronize clocks, polyfill Intl APIs because they're inconsistent, and so on.
But here's an unpopular opinion: Server side rendering shouldn't even be a thing. Running a language as dynamic as Javascript on servers, is at best - a problem that can be dealt with, but not necessarily the solution.
I'm saying this as a full-time Javascript developer. We can do better than mandating JS on the servers.
#1. SPA, Components and functional programming is the best thing that happened to web development in the recent past. So, let's stick with it.
#2. But we are stuck with Javascript to embrace these otherwise abstract engineering methods, because browsers are stuck with JS.
#3. Webassembly is here. So why not a UI-framework, that embraces components, SPAs and functional programming but with a better language (something like Elm). A language that compiles to webassembly for browsers to run logic & build UI and runs natively on servers? This hypothetical system should compile to HTML on the servers and support smooth progressive hydration.
Running a bunch of JS on the servers, for a piece as critical as rendering HTML, will always be a suboptimal solution. Imagine saving all those server-scaling costs with a much more server-cost-friendly language like Rust or Swift.
> when the server is under high load, skip the server-side render, and force the browser to perform the initial render.
With this type of contingency plan, I don't see any reason to use server-side rendering at all. We build all of our sites and apps with React and don't do it at all, for any reason at this time.
Is there something we're missing?
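For readers unfamiliar with the fallback quoted above, here is a minimal sketch of what "skip the server-side render under load" could look like inside a single Node process. It is not the article's implementation: `createRenderGate`, `handlePage`, `renderOnServer`, and `clientShellHtml` are hypothetical names, and the concurrency limit is an arbitrary assumption.

```javascript
// A tiny gate that limits how many server renders run at once.
function createRenderGate(maxConcurrent) {
  let inFlight = 0;
  return {
    // Returns true if a server-render slot was acquired.
    tryAcquire() {
      if (inFlight >= maxConcurrent) return false;
      inFlight += 1;
      return true;
    },
    release() {
      inFlight = Math.max(0, inFlight - 1);
    },
  };
}

// `renderOnServer` stands in for the real SSR call; `clientShellHtml` is the
// minimal page that lets the browser do the initial render itself.
async function handlePage(gate, renderOnServer, clientShellHtml) {
  if (!gate.tryAcquire()) {
    return { mode: 'client', html: clientShellHtml };
  }
  try {
    return { mode: 'server', html: await renderOnServer() };
  } finally {
    gate.release();
  }
}

const gate = createRenderGate(1);
console.log(gate.tryAcquire()); // true: first render gets a slot
console.log(gate.tryAcquire()); // false: busy, fall back to client rendering
gate.release();
```

The point of the pattern is the failure mode: an overloaded server still answers quickly with a shell page, so users get a slower first paint instead of a timeout.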
Following your journey through troubleshooting load balancing and caching brought back memories for me. I don't know what you're using for caching, but JSR-107 has been around for nearly 20 years. You might want to check out https://commons.apache.org/proper/commons-jcs/. I know it's not Javascript, but it will solve your caching problem in an orderly way. You shouldn't have to start from scratch on caching. You might even consider telling your content creators something like "updates to the site will only take effect the next day" so you can just invalidate the entire cache once a day and be done with it. Keep it simple.
Really nicely done. Thanks for taking the time to write this — I enjoyed reading it!
I'm not sure if the second half is "... And squid or another caching web proxy" - but I'm open to the ssr pipeline being far enough from REST (the architectural pattern) that caching is broken, and something more application level, like redis/memcache or a custom cache is needed.
[1] https://www.haproxy.com/blog/four-examples-of-haproxy-rate-l...
Probably more of the confusion lies in the line between those things — like a CRUD app or a dashboard or management tools. You could do them server-side for better initial performance, but you could also get better interactivity client-side.
I think a lot of new projects go towards the interactivity and “slick UI” side of things, which is partially why we see more focus on client-side things these days. Speaking for myself (a full stack dev with a front end focus), we front end devs would really benefit from caring about performance and stability more.
Everything before a login => SSR, everything after => SPA.
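As a rough sketch of that rule, the server could pick a render mode from the presence of a session cookie. The cookie name `session_id` and both helper functions are hypothetical; note that a cookie being present does not mean the session is valid, so this only chooses how to render, not whether to authorize.

```javascript
// Hypothetical cookie name; substitute whatever the application really uses.
const SESSION_COOKIE = 'session_id';

// Checks the raw Cookie header for a non-empty session cookie.
function hasSessionCookie(cookieHeader = '') {
  return cookieHeader
    .split(';')
    .map((part) => part.trim())
    .some(
      (part) =>
        part.startsWith(`${SESSION_COOKIE}=`) &&
        part.length > SESSION_COOKIE.length + 1
    );
}

// Anonymous traffic (crawlers, first visits) gets server-rendered HTML;
// logged-in users get the client-rendered app shell.
function chooseRenderMode(cookieHeader) {
  return hasSessionCookie(cookieHeader) ? 'spa' : 'ssr';
}

console.log(chooseRenderMode('')); // "ssr"
console.log(chooseRenderMode('theme=dark; session_id=abc123')); // "spa"
```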
Why? SSR is proven to be much better at SEO. But SPAs offer best UIs. Nobody wants to click through stuttery SSR dashboards in 2019, wait for page loads, submits, etc. People prefer slick UIs, that was one of the reasons DigitalOcean got big (because of their then stunning dashboard or after-login-experience [1]) and hence every other hoster copied their interface.
[1] DigitalOcean's dashboard experience was for a long time the main teaser (as an animated gif/video) on their landing page.
Only thing that I thought was out of scope/unrealistic for most teams was having a 6 month time bomb on traffic and deciding to build a load balancer.
> The cumulative effect of these upgrades is to more than double our performance, and therefore our service capacity.
That's a risky conclusion, in that it's likely over-generalised.
The upgrades may have improved the average performance, but they might introduce some performance impact on less well trodden paths, things that may strike at the least convenient time. There are performance gotchas that show their faces when load increases (system cache inefficiencies, etc. etc.) Some of the times I've been most hurt, operationally, have come when what looks great in generalised circumstances turns out to have a nastier under-load behaviour.
That said, always watch out for upgrades and make patching/upgrading a priority task. If there is a CVE attached to an upgrade, you want to be deploying that as fast as humanly possible. That means making sure there are as few human-involved steps as possible in your build/test/deployment chain.
Web app production optimization with this kind of style would be a godsend.
One interesting consequence of this process is that every drawing needs to be perfect the first time. I didn't realize how much I lean on undo/redo until it wasn't there any more!
That doesn’t look like “properly”. The double escaping is overcomplicated and no safer compared to a direct
    window.__productreview_data = ${escapedReduxStateJsonString};
(and forgets about \v, maybe others), the transformation doesn’t preserve “</_escaped_script”, and it doesn’t address a vulnerability involving <!-- that’s contrivable.
Closer to correct:
    JSON.stringify(data)
      .replace(/\u2028/g, '\\u2028')
      .replace(/\u2029/g, '\\u2029')
      .replace(/</g, '\\x3c')
Better, if you put the JSON in an inert <script> (type="application/json"), it’s only necessary to escape < (or /<[/!]/g). This is a good idea so you can use restrictive CSPs.
We'll consider "application/json", makes sense.
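A minimal sketch of the inert-script variant discussed above. The helper names and the `initial-state` id are hypothetical. One detail worth noting: because the content is later read with `JSON.parse`, the `<` character has to be written as `\u003c`, since JSON has no `\x` escapes (the `\x3c` form only works when the text is evaluated as JavaScript).

```javascript
// Serialize data for embedding in <script type="application/json">.
// Escaping "<" prevents "</script>" or "<!--" inside the data from being
// interpreted by the HTML parser.
function escapeJsonForHtml(data) {
  return JSON.stringify(data).replace(/</g, '\\u003c');
}

// `id` is assumed to be a trusted, developer-chosen identifier.
function jsonScriptTag(id, data) {
  return `<script id="${id}" type="application/json">${escapeJsonForHtml(data)}</script>`;
}

const state = { review: '</script><script>alert(1)</script><!--' };
const tag = jsonScriptTag('initial-state', state);
console.log(tag);

// On the client (browser only), the state would be read back with:
//   JSON.parse(document.getElementById('initial-state').textContent)
```

Since the script element is never executed, this works under a Content Security Policy that forbids inline scripts.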
https://github.com/yahoo/serialize-javascript
You can use it in JSON mode like this: `serialize(obj, {isJSON: true});`
The issue you can run into here is possibly duplicating effort on the server and JS side. You can keep them separate enough, but it's tough if you're used to creating everything either through the server or through JS on the client.
The last web app I worked on like this is unfortunately not public but performed rather well and wasn't all that difficult to maintain either.
Until about 5-6 years ago, this was the way people tried to write webapps (they didn’t always get there, but at least they aimed for it). It’s really startling that it’s something that people have already forgotten.
It really isn’t terribly difficult to do this. Frameworks like Rails make it pretty easy to do, out of the box.
Citation needed.
That's how most of the world's population will experience your site.
I'm baffled that so many sites have invested in multiple megabytes of JavaScript to render their pages. It's like our entire industry has forgotten how to build sites that can be used by anyone who's not on an LTE iPhone.
Unless you're FAANG you probably don't care about most of the world's population, but the tiny slice that is most likely to see your site and generate revenue for you.
It's not baffling that most developers don't make things for most people. That's just a waste of time and money.
Most sites/apps don't work at all at this speed with the notable exception of Hacker News and Facebook Messenger.
With the react app there's a chance I'll have the js already cached and I'm only fetching the data. With the SSR I'll probably be fetching data and view continuously.
Just not true. We test all our React sites with $100 Android phones. They render them all fine and fast.
There's some good write up here: https://medium.com/@benjburkholder/javascript-seo-server-sid...
Essentially content that requires client side rendering gets stuck waiting for Google to have resources available to render the content, which is likely scaled to keep the queue at what _they_ determine to be a reasonable length. That can mean a lag of up to days or weeks after the page was first touched before they have the content indexed and in SEO. Worse, search engines don't cache javascript the same way that your end users do. An end user returning to your site likely has all your javascript pre-cached ready to go. A search engine will be starting from scratch each time. That's a lot more requests hitting your server (vs server-side rendering) and so a slower page response time (which impacts SEO rank)
My first gut instinct to this hybrid SSR-CSR approach in the article was that it's nasty, but I really like the trade-off they hit. That's a really nice failure mode. Something happening on the client side is way better than nothing, even if that something is slow.
On a side note, server-side rendering is almost certainly more efficient overall. When running under a JIT environment, methods will have already been compiled down to native code. The code is likely already in memory and ready to execute so there is no parsing / compilation time involved (and you have the advantage of OS level caching, cpu caches etc on your side). The server CPU is almost certainly faster than the client's CPU and will be more likely to be in an awake state (there is measurable latency in a CPU waking from any of the sleep or lower power/frequency states). You're trading off once-per-server parse/compilation phase vs one-per-client-per-access.
SEO: If you don't server-side render your website, your SEO will certainly suffer. Google says they execute your JS, and to some extent they do, but your ranking will be worse and fewer pages will be scraped. We experienced this first hand.
Perf: User will stare at a blank page while they wait for your JS to download and execute. On a slow connection this could be many seconds slower than SSR.
If you have 5 pages that take seconds to render you might be fine, if you have 100k pages that take 50ms to render you might be in trouble.
Some of the detail of this is exposed in the webmaster tools that Google provide.
For example, copying a route from your SPA and pasting it into Twitter, Facebook, Slack, Teams, etc. Preferably you want the copied page's title, description and icon and not just your SPA's generic info.
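To make the link-preview point concrete, here is a small sketch of server-side code that emits per-route head tags. The `page` record shape and function names are hypothetical; the `og:*` properties are the Open Graph tags that many link-preview scrapers read, though each service has its own rules.

```javascript
// Escape a value for safe use inside HTML text or a double-quoted attribute.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// `page` is a hypothetical per-route record: { title, description, imageUrl }.
function renderPreviewTags(page) {
  const title = escapeHtml(page.title);
  const description = escapeHtml(page.description);
  return [
    `<title>${title}</title>`,
    `<meta name="description" content="${description}">`,
    `<meta property="og:title" content="${title}">`,
    `<meta property="og:description" content="${description}">`,
    `<meta property="og:image" content="${escapeHtml(page.imageUrl)}">`,
  ].join('\n');
}

console.log(
  renderPreviewTags({
    title: 'Red "Classic" Sneakers',
    description: 'Canvas sneakers, sizes 36 to 47.',
    imageUrl: 'https://example.com/sneakers.png',
  })
);
```

Most preview scrapers do not run JavaScript, so these tags need to be in the HTML the server sends for that specific route, even if the rest of the page is rendered in the browser.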
Technologies used outside of their context and "developers" having to deal with complex problems they should not even have in the first place.
The benefit of that is getting all the effects of 'just plain HTML' while still being able to dynamically update parts of the page, and doing the entire thing in a single framework such that (if you do it right) your code doesn't have to explicitly do anything to specially handle the two different use cases.
A case of vital context missing. Ever tried something like Next.js? It's a react framework for client + server-side rendering. What's really cool about it is that it does SSR for the initial view and the client-side react picks it up from there. It doesn't sound very impressive like that but where it blew my mind was when I realized that I could save URLs for some deeply nested view in a react site, enter that URL later and I'd get an instantaneous SSR'd view seamlessly.
I guess what I'm trying to say is that some tech is cool like that, not really a contingency plan but a plan.
(Normally, with server-side rendering the first page loads faster because it is pre-rendered, and often cached.)
>"... SEO was a very important consideration — we had full-time SEO consultants on staff — and we wanted to provide the best possible page load speed, so server-side rendering quickly became a requirement. ..."
Yes: when the user's browser is under high load.
> not use javascript for everything
Using JS for frontend + backend has significant advantages. Your developers only need to know one language/codebase. You don't need to hire/maintain separate frontend/backend teams who need to figure out how to coordinate with each other.
> actually generate html
That's what server-side rendering does.
> expecting people to run your code
This is how the web works, for the vast majority of users and markets worth serving.
> having to do it yourself in a convoluted workaround
Running your frontend code on the backend to generate HTML is an elegant solution and extremely easy to implement. These days it works out of the box. Not sure where you got the idea it was a "convoluted workaround".
Oh boy, this must be the overstatement of the decade. I've done SSR with a few frameworks now, including Meteor, Nest, and Next. Saying that "it works out of the box" is so disingenuous, it borders on fake news. Even ignoring the trillion edge cases involving authentication, cookies, localstorage, dynamic components, promises/futures, async components, and so on, it will take dozens of man-hours to get properly-rendered server output that works with server-side routing, hydration and looks good on Google's SEO crawlers.
In my experience the language is very little of what makes a frontend / backend developer, and you still need separate teams.
> Your developers only need to know one language/codebase
As one wise man said: make it easier for the users, harder for a database.
The same applies to developers. We tend to forget who we make products for.
> Isomorphic rendering is a huge simplicity booster for developers, who for too long have been forced to maintain split templates and logic for both client- and server-side rendering contexts. It also enables a dramatic reduction in server resource consumption, by offloading re-renders onto the web browser. The first page of a user’s browsing session can be rendered server-side, providing a first-render performance boost along with basic SEO. All subsequent page views may then fetch their data from JSON endpoints, rendering exclusively within the browser
What is the measurement of "dramatic reduction"?
Addendum: That is, if your loads are large enough of course. Small projects will not have scaling problems either way.
People are allergic to actually profiling anything. I think, some days, that it is intentional, because if they had real data, the stories they tell themselves to justify what they are doing would fall apart.
Some add additional nodejs servers specifically and only for SSR, increasing complexity.
airbnb created https://github.com/airbnb/hypernova to solve exactly this problem.
Back then, the cloud was not a thing yet, there was no Docker, no Kubernetes, no nice APIs to start instances, so it made sense to offload some computation to clients. Today, no longer.
Most news sites are not apps, and recently I have started turning off JavaScript on these websites. If things get even worse, I am considering whitelisting pages instead of blacklisting them; we have ended up like in the Flash days, when you had an extension or a setting to allow Flash on a given page.
I don't consider this a settled conclusion. Mostly because it's a false dichotomy. There are alternatives to both SSR of component sites and SPAs, and every solution has its inherent advantages and disadvantages.
UI isn't even the primary feature of an SPA, that would be offline usage.
Having more interactive elements without page loads is a feature of "DHTML", and that can be had from anything between small embedded snippets of vanilla JS to full-fledged SPA frameworks. Intermediate solutions like StimulusJS or AlpineJS seem to be getting a bit more popular, too.
But in this Fallen World, it seems we're usually getting the worst of either extreme. Either a long rendering time for a whole page, then delivered in one "flash" (assuming you've got a fast connection), or multiple elements popping in and out while several JSON-RPC requests are made.
You can optimize both cases, of course. Proper caching/DB views etc., or things like HTTP2/GraphQl/React Suspense etc.
But it's definitely not an either/or answer. Few things in fiddling around with computers are.
To your points, I still think a proper SPA without any quirks such as DHTML, React Suspense, etc. gives the best UI for dashboard and logged-in kinds of use cases. However, having mixed environments is subpar from a production and dev perspective, and hence you end up with setups like Next (SSR) with some Next pages having a stronger SPA notion (SPA within SSR).
    const inlineJSON = data =>
      JSON.stringify(data)
        .replace(/\u2028/g, '\\u2028')
        .replace(/\u2029/g, '\\u2029')
        .replace(/</g, '\\x3c');
with: const escapedReduxStateJsonString = inlineJSON(JSON.stringify(data));
But yeah, the isolated <script> thing is usually even better (more compact in addition to the security benefit).
Especially when server-side rendering is so easy nowadays if you pick the correct stack.
Isn't that strategy "progressive enhancement"?
https://www.w3.org/wiki/Graceful_degradation_versus_progress...
That is simply not true for pretty much most companies in existence. Also those users will still be mostly using phones that are significantly slower than iPhone.
> It's not baffling that most developers don't make things for most people. That's just a waste of time and money.
Empirical proof shows that most web developers make things for other web developers (optimizing for $4000 MacBooks and $1000 iPhones while forgetting that areas with poor LTE signal exist) instead of targeting their company's actual user reach.
(Remember, people who don't return on your slow loading JS bloated web site do not appear on simple dumb Google Analytics.)
For my company, we serve images and video. Even multi-megabyte JavaScript only equals one minute of a 720p video's runtime.
For people on restricted connections, we have a lite version with less than 100k of JavaScript and videos transcoded to 260p.
On the one hand, Google promotes SPA frameworks. On the other hand, Google makes AMP because they are too slow. It's not stated often enough, but the modern web is schizophrenic.
With proper architecture, all you need is plain React (no frameworks) with `styled-components` for everything - i.e., server rendering, client rendering, and hydration.
The architecture I'm describing is a fully top down approach, with concerns separated into API, UI, and App components, where the route dictates the data to be fetched, which is passed as a `store` to the layout to be rendered, with each layout being a composition of the UI components, App components, and occasional API components. I've had a lot of success with this approach and it is, in my opinion, the simplest way, incredibly easy to follow and maintain, and extremely minimal for what is actually being achieved.
With this architecture, switching between server-only rendering and client-only rendering and/or some combination of both becomes a matter of minutes, or even just seconds if you have some env var switches in place.
    /**
     * We have to force the bundling of @elastic/eui and react-ace
     * as Gatsby, then all loaders below will match and avoid HTMLElement issues
     */
    config.externals = config.externals.map(fn => {
      return (context, request, callback) => {
        if (request.indexOf('@elastic/eui') > -1 || request.indexOf('react-ace') > -1) {
          return callback();
        }
        return fn(context, request, callback);
      };
    });
    config.module.rules.push(
      {
        test: /react-ace/,
        use: 'null-loader',
      },
    );
And this was just my first SSR issue. Definitely not "out of the box."
People keep asserting it’s impossible to do all of the stuff we were doing on a daily basis in 2012, without the magic of megabytes of client-side code.
That's right, I am loading the same sites again every day consuming different data. Both for work and fun. That was the whole point.
Why is React superior to Razor Templates (or for that matter, Razor pages) for server side templating?
I support the assertion that it’s a good paradigm for creating consumable SSR templates. I just don’t see anything that empirically puts it a cut above Razor, other than that if you know JSX/JavaScript it’s obviously more natural, but I’m assuming that’s taken into account here.
If you don't want to dig too deep though, compare how to add a datepicker to a view in both systems. With React it is just another component, with Razor it is a bit more finicky.
But about the smaller stacks other environments have: That's quite often because you either don't need additional parts (e.g. there's no desire for a huge Django template "community") and/or because other stacks are more full-fledged and thus the horizontal size is a lot smaller, with no need for umpteen state management solutions, state management solution helpers and state management solution application templates.
Dashboards could probably serve as a whole different topic. It was easy to beat the old school ones, where Perl CGIs roamed the prehistoric landscape. More modern CSS, and JS graphs alone beat the old rrdtool setups you often saw.
Besides, it took me a long time to leave pug/stylus, and I'm still not sure whether a pug-based SSR is the best way to get stuff out of the door. But again, opting for a 10-year-old stack makes you miss a lot of things (e.g. the maintainability of React code is top-notch).
When it comes to generating dynamic or static web page content, the pathological framework-aversion of the Go community strikes hard. Probably nothing that doesn't use the built-in Go templates with any sufficient user backing. This doesn't appear to be a language that can birth something like RoR.
As for the maintainability of React, I'm not so excited. It's a pretty decent templating system, and it seems easy enough to compose components, but beyond that it's each to their own, with some approaches being better than others. And Redux still doesn't grab me as that great; it's just that the sheer number of developers resulted in a nice toolset. Whether it's frontend or backend, the twin async and dependency hells of JS don't manage to make me sleep any better, either.
I don't give a flying frick for age myself. Sure, there's less tooling for partials than for components, but that might have a reason.
Frontend frameworks do a hell of a lot more than that, though. Much of the time, they're doing things that are either the result of shortcomings in browser standards and implementations or things that are just stupid.
I worked on 3 projects this year, using Rails, Django and Phoenix. Only one of them has a front end that requires a client side app. The other two are within the bounds of server side rendering. That saves the customer time and money.