Cloudflare Pages: Best server tech since CGI-bin? (taras.glek.net)
First I've heard of such news. What does that mean? Link please?
(Not affiliated)
https://www.cloudflare.com/press-releases/2022/1-billion-wor...
Recent thread on Learning Blender 3D has me thinking that a kind of "Blender Mentor" exchange with live zoom critiques & happy hour may be fun ;)
One thing that stuck out was the comment about "Twitter cards", since Vercel/Next now has this built in: https://twitter.com/vercel/status/1579561293069316096
Static hosting has historically been seen as exactly that, static and immutable.
Things like Cloudflare functions/workers can turn static hosting into dynamic content delivery while maintaining (most of) the benefits of static hosting.
Granted, this can be done whether the underlying content is hosted on Cloudflare, AWS, or a server in your garage so long as it’s proxied/CDN’d by a service like Cloudflare or Cloudfront+Lambda@edge, etc.
It's nothing that new though; Firebase (part of GCP) and Netlify have had that for years. Cloudflare just has the right combination of marketing, reputation, pricing and tech to make headlines with it again.
In the 90s I was using SSI to build dynamic-but-static sites:
-- header --
-- hierarchical JavaScript menuing --
-- content --
-- footer --
-- tail --
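For anyone who never used SSI: the server splices fragment files into the page when it is requested. Here is a rough sketch of that expansion in TypeScript. The fragment paths and the in-memory `fragments` object are made up for illustration; a real server such as Apache reads the fragments from disk, and the error marker only imitates what such servers typically emit.

```typescript
// Minimal sketch of SSI-style expansion: replace each
// <!--#include virtual="..." --> directive with the named fragment.
// Fragment paths are hypothetical; a real server reads them from disk.
const INCLUDE = /<!--#include\s+virtual="([^"]+)"\s*-->/g;

function expandIncludes(page: string, fragments: Record<string, string>): string {
  return page.replace(INCLUDE, (_directive: string, path: string) => {
    const found = Object.prototype.hasOwnProperty.call(fragments, path);
    // Leave a visible error marker when a fragment is missing.
    return found ? fragments[path] : "[an error occurred while processing this directive]";
  });
}

const fragments: Record<string, string> = {
  "/header.html": "<header>My site</header>",
  "/footer.html": "<footer>bye</footer>",
};

const page = [
  '<!--#include virtual="/header.html" -->',
  "<main>content</main>",
  '<!--#include virtual="/footer.html" -->',
].join("\n");

const rendered = expandIncludes(page, fragments);
```

The page itself stays a plain file on disk; only the shared pieces (header, menu, footer) are pulled in per request.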
Benefits are obvious: high reliability and speed to serve the static files. In addition, if the cms server is down for whatever reason, your website is still live and working fully (not just cached pages, all of them) just fine.
Actually, you can think of it as a pre-built cache that is deployed to the Cloudflare KV store.
Another cool feature we have is real-time collaboration in the editor.
Check more about our architecture decisions here: https://bluocms.com/cms-cloudflare/
Like are the workers just serving as your backend at that point? Or are they doing something more or different as well?
Can't think of anything next.js specific you miss out on, you can just configure anything in a vercel.json file.
GitHub Pages and likely others also support this format, so moving between those two services doesn't exhibit this problem. Moving to cgi-bin will.
I'd suggest Cloudflare shouldn't try to establish their scheme as canonical url and rather implement github pages behavior, but what do I know... I'm just hosting an old fashioned blog, not a JAMstack/SPA/whatever thing
Is there at least an nginx-cloudflare module that lets you self-host this stuff?
https://blog.cloudflare.com/workerd-open-source-workers-runt...
Oh, right, vendor lock-in.
There's a reason Larry Ellison owns an entire Hawaiian island, and it ain't nothin' nice...it ain't nothin' nice (#QTip)
> As a shortcut, I used GPT-3 to generate a basic typescript function for me. This let me look at TS type definitions and get a better idea of what’s available so I could get developing.
As a long time programmer, I keep trying to maintain my skepticism of AI/ML techniques for program authoring. Comments like this are dissolving my objections by the minute...
Otherwise static websites are only for devs.
Yes, there are a bunch of limitations - but in practice I quite appreciate the boundaries and the reasoning. There's also the portability question, but we're talking about PaaS. As someone who was using Firebase for many years, and static AWS + lambda prior to that I find this evolution entirely refreshing.
I've pushed 5 personal projects on pages since Jan and working on more. Most recently https://thesom.au/gh/cvms
> As a shortcut, I used GPT-3 to generate a basic typescript function for me. This let me look at TS type definitions and get a better idea of what’s available so I could get developing.
My assumption was that their prompt was something like "typescript cloudflare function" and they just used the resultant code to see types in action inside their IDE.
I just got access to OpenAI codex. I used edit function and asked it to modify the JS hello world, to add typescript annotations.
This is the git commit following that gpt conversation :)
-export async function onRequest(context) {
+export async function onRequest(context: {
+ request: Request;
+ env: { [key: string]: string };
+ params: { [key: string]: string };
+ waitUntil: (promise: Promise<any>) => void;
+ next: () => Promise<void>;
+ data: { [key: string]: any };
+}) {
Not as minimal as the code I posted, but it got me over the stumbling block.
As people smarter than me have said, if a video or in this case article asks a question, the answer is usually "no".
I'd love to see a more detailed post about that. I don't know any TS devs using that kind of workflow.
The biggest issue with Copilot is that, to use it well, you need to know:
- How to write functioning code.
- How to quickly review other people's code for correctness.
- How to write good comments.
- How to tweak the ordering in which you write code such that it's easier for Copilot to understand.
- How Copilot and other large language models work, in general. An awareness that Copilot is built on GPT-3, familiarity with what that implies, and so on.
And if you don't, that results in the AI seemingly not working as well, which makes it easy to make the assumption that it isn't an incredibly useful tool. It is, it just has a learning curve measured in weeks.
I believe this is why we constantly have arguments on whether or not it's helpful.
In practice, no one is running workerd on their VMs, so it seems weird to me to compare the two.
Cloudflare Pages is better than [insert your favorite hosting service] would make more sense. Still, very cool tech.
Support during this early preview includes the following functionality:
- Deploy Web apps comprised of static web content
- Deploy Web apps that use pre-rendering / Static Site Generation (SSG)
- Deploy Web apps that use server-side Rendering (SSR): full server rendering on demand
Cheers,
Anyway, the Pages import takes an existing git repository, supports a slew of CMS systems (mine being Hugo), and lets you publish to a custom domain in a few clicks. It was actually really simple.
I think I put in 30 minutes of work in total and saved ~$20/month because of it.
When somebody hits my webserver for the first time, how long are they gonna wait for the page to load?
Assuming a noop in the functions part.
https://developers.cloudflare.com/workers/learning/how-worke...
https://blog.cloudflare.com/introducing-workers-durable-obje...
> Durable Objects make this easy. Not only do they make it easy to assign a coordination point, but Cloudflare will automatically create the coordinator close to the users using it and migrate it as needed, minimizing latency.
Cloudflare services are awesome.
Their documentation tends to be OK but thin. Certainly not great.
Reference guides with every object/function/method documented.
Links to sample code on github.
With CloudFlare Pages you click a couple buttons in the UI and push to GitHub. When I tested it, I didn't get bogged down managing infrastructure which I've always struggled with (overthinking) in the past. I was almost instantly writing code that solves my problem(s).
The advantages aren't as huge with a purely static site, but they're still there IMO.
It's currently not very competitive with the likes of AWS API Gateway at $0.25/million minutes.
Also, old-school is still the dominant way with how small sites are hosted, in terms of number of hosting providers who offer cgi hosting.
I preferred to strip away the .html extensions anyway, so it was okay in my case. CF should trigger an HTTP 308 from the older .html URLs to the new ones automatically.
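To make the redirect idea concrete, here is a minimal sketch of the rule in TypeScript. It is not Cloudflare's actual behavior; the function name `redirectLegacyHtml` and the handling of `index.html` are my own assumptions.

```typescript
// Sketch: map legacy ".html" URLs to extensionless ones with a permanent
// redirect. Hypothetical helper, not Cloudflare's implementation.
interface Redirect {
  status: 308;
  location: string;
}

// Returns a redirect for paths ending in ".html", or null when none is needed.
// `search` is the query string including its leading "?", or "".
function redirectLegacyHtml(pathname: string, search: string = ""): Redirect | null {
  const match = /^(.*)\.html$/.exec(pathname);
  if (match === null) return null;
  // Assumption: "/index.html" and "/blog/index.html" collapse to the directory URL.
  const stripped = match[1].replace(/(^|\/)index$/, "$1");
  const target = stripped === "" ? "/" : stripped;
  return { status: 308, location: target + search };
}
```

A 308 (unlike a 301) tells clients to keep the request method, so a POST to an old URL is replayed as a POST against the new one.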
We did not create our own engine to create "lock-in". On the contrary, it would be a huge win for us if we could simply run FastCGI or Node or whatever applications unmodified. We'd be able to onboard a lot more customers more quickly that way! Our product is our physical network (with hundreds of locations around the world), not our specific development environment.
But none of that tech scales well enough that we could run every app in every one of our locations at an affordable price. Meeting that goal required new tech. The best we could do is base it on open standard APIs (browser APIs, mostly), so a lot of code designed for browsers ports over nicely.
(I'm the lead engineer for Workers. These were my design choices, originally.)
Enjoy your week! :)
I'm interested in Durable Objects but I don't know how to work within the 50-subrequest limit on the Free or Bundled plans.
It sounds like I can only read from 50 Durable Objects or KV queries in a single request. If my usage pattern is e.g., sharing docs with many users, how would I let more than 50 users access a doc?
Whether I consider fanning out on writes or fanning in on reads I'd need more than 50 subrequests. How should I approach this?
You can run it locally[2], and efforts are underway to standardize much of it so that the same code can run on Workers, Node.js, and Deno[3]. The Workers runtime is also open source[4], although in practice this is likely to be less useful for most use cases than just using one of the other JavaScript runtimes if you're self-hosting.
[1] https://developers.cloudflare.com/workers/learning/how-worke...
[2] https://developers.cloudflare.com/pages/platform/functions#d...
[3] https://wintercg.org/
[4] https://github.com/cloudflare/workerd
Plain old CGI doesn't scale beyond toy usage--it spawns a process per request.
So there really isn't a good single option for Cloudflare to use.
I'd be willing to bet the opposite: CGI is more than enough for 80% of workload, performance-wise.
There are a few good reasons why CGI isn't the best today: the tooling doesn't exist, the security model is really bad, and you can cram fewer websites on a single machine, so for the same number of websites you need more machines, and that's an ecological aberration. But there is no problem about CGI being too slow.
Cgi just isn't necessary anymore and would be more baggage than anything at this point.
The old web stacks are just time sinks. It's so much nicer to be able to git push... And that's it. Done.
Can you elaborate? My CGI-based website is deployed with `git push` as well, not sure what the difference is there.
Grab any 3 commonly used pieces of technology whose names can be used to form a fun initialism, append stack to the end, and soon there will be experts speaking at conferences dedicated to it. JamStack is an example you mentioned. I’ve heard others over the years.
I disagree that it’s a useful unit of thought. No one I know or work with ever talks about “web stacks” this way. Am I missing something here? Is this just some kind of niche? Who’s marketing these?
For example you have security. CGI with multitenants is hard if not impossible. At least for the tenants that will want to use this.
Then there is scaling, resource consumption, and predictability (for planning).
Also check - a WordPress plugin for static site generation and deployment https://github.com/WP2Static/wp2static
The same website without Cloudflare CDN and just using a simple caching plugin is much faster than the exported version by wp2static hosted on Cloudflare pages.
It has fallen from 99 to 78 in the mobile page speed test.
I tried creating a copy of the website using wget. It was faster, but I faced issues with missing files and with posts on the main page using temporary query links rather than the permanent links.
So I went through the hassle of compiling wp2static and just ate the performance loss instead.
If you can host it on Workers, for example, you can get an entirely free, managed website with just one click (obviously someone would make a template to let you set up WP on CF Pages if that were possible).
I keep seeing that these 'JAMstack' sites are the new hot thing, when a static WordPress site is enough and easier to update. In reality these sites are just glorified HTML under a CDN.
I don't see the benefit of JAMstack when you can just place a static WP site under a CDN like Cloudflare and have all the benefits of WP and a CMS to edit content.
At least with Cloudflare they are more than just a CDN, others like Netlify and Vercel seem to be both over engineered and overpriced for what they do.
Jamstacks really shine when you can decouple the frontend from the backend(s), because then you have an easily replicable single repo with nothing but code. You can deploy it to any Jamstack host with a push and they handle all of the scaling and https and caching concerns. And the backend teams, whether an in house endpoint or content writers working on a hosted headless CMS, never have to care about the stack.
A middle ground that's interesting to explore is using WordPress with Advanced Custom Fields as a backend plus a Jamstack frontend. ACF is still quite a bit more powerful than any of the headless CMS solutions I've tried, but it does mean you still have to maintain a LEMP stack, which is a headache in and of itself.
There are also managed services for doing this that take care of everything for you, from the WP hosting env, to the static hosting and CDN, along with replicating certain types of dynamic functionality on the static site like forms and search.
One example of this type of managed static WordPress hosting service is Strattic: https://www.strattic.com.
Besides WP, there are static CMSes that fit a modern workflow perfectly.
Interesting, simply-static never worked for me, only wp2static did and it had a lot of bugs.
> WP run on an arm64 instance that was only started when editing was needed.
How do you do this? Are you using some kind of trigger that starts the webserver container? Or running it on a managed platform?
> Besides WP, there are static CMSes that fit a modern workflow perfectly.
Yeah, but they aren't as customizable and flexible as WordPress; the goal is to be able to create a whole static site without needing to write anything manually.
My sense is it's still a ways out from being practical, though.
You can work around this with some Workers that capture the dynamic routes, request the nextjs html file and return that, but then you're using up your Functions quota as well.
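A rough sketch of the routing half of that workaround, in TypeScript. It assumes the static export writes one HTML file per dynamic route, named after the bracketed pattern (e.g. `/posts/[id].html`); the patterns and the helper name `resolveExportedFile` are illustrative, and the Worker would still have to fetch and return the resolved file.

```typescript
// Sketch: resolve a request path to the statically exported HTML file for a
// dynamic route. The route patterns and the "[param].html" file layout are
// assumptions for illustration.
function resolveExportedFile(pathname: string, patterns: string[]): string | null {
  const requestParts = pathname.split("/").filter((part) => part !== "");
  for (const pattern of patterns) {
    const patternParts = pattern.split("/").filter((part) => part !== "");
    if (patternParts.length !== requestParts.length) continue;
    const matches = patternParts.every((part, i) => {
      // A segment like "[id]" matches any single path segment.
      const isParam = /^\[[^\]]+\]$/.test(part);
      return isParam || part === requestParts[i];
    });
    if (matches) return "/" + patternParts.join("/") + ".html";
  }
  // Not a known dynamic route: let normal static serving handle it.
  return null;
}
```

For example, `resolveExportedFile("/posts/123", ["/posts/[id]"])` yields `"/posts/[id].html"`, which the Worker can then request from the static assets.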
Where Cloudflare's documentation is lacking, I have found that you can PR in changes, and when I've gone back to a saved page I've found it pleasantly updated numerous times.
Thankfully the cutting edge development is based off of open standards that you can easily build off of. I know personally that I've found most of the sticking points of developing against Cloudflare to be easily mitigated and just be an overall better experience as a customer than with another hosting provider.
It doesn't hurt that they aren't charging me all that much personally, but I have a day job that fills up the Cloudflare stock price just fine.
Their docs are subpar because Cloudflare is just not interested enough in documentation improvements and is hostile to the people who are interested and who try to make the improvements.
I tried fixing a 404 in their docs once. This would have taken, you know, approximately 2 seconds with a real wiki or CMS[1]. What resulted instead was an excruciating back-and-forth that took 9 messages over a span of exactly 5 hours, ended in no fix, and a Cloudflare dev trying to "upperhand" me (in the vein of George Costanza's pre-emptive breakup: "I'm afraid that I am gonna have to break up with you"[2]).
Sure, Cloudflare team, there's nothing wrong, I guess, with choosing to believe in the delusion that a repo full of Markdown files and the GitHub generation's busted-ass PR-based edit workflow is an acceptable substitute, just like there's nothing wrong with writing your name with a pencil that has a brick taped to it[3].
1. <https://groups.google.com/g/mozilla.dev.mdc/c/BU09C48bmGU>
I was very excited when both of these technologies were first announced (years ago).
I tried them, immediately, then (during the initial beta). They were clunky, half-baked, and don't even get me started on the DX of developing against them. Literally missing from cloudflare's own CLI - the official docs described how to manually deploy via CURLing endpoints. It felt like manually FTPing files onto a server, like in the 90s.
Ok fine, this is an early beta, let's let this bake longer.
So a few months ago, I started a new greenfield project and once again decided to give Functions + Durable Objects a shot... and was astounded to find the exact same half-baked experience, where different "beta" versions of the CLI had to be installed in order to support the different endpoints, and these were in fact mutually-exclusive.
Cloudflare has some cool ideas, but they seem to like rolling out 70% of a cool idea and then moving into the next shiny thing without ever making the first thing actually polished and reliable. After wasting many hours with Cloudflare's DX, I decided to never waste more time with them as a development platform (vs an optimization layer / CDN / etc).
https://www.amazon.com/Reynolds-Wrap-Aluminum-Foil-Square/dp...
https://blog.cloudflare.com/aws-egregious-egress/
Identity as service is another category that's egregiously priced, and it's usually priced based on monthly active users too, which is the worst. It would be great to see Cloudflare disrupting that one as well.
> Workers can also send emails for free, and soon you will be able to process them as well with Email Workers.
There's got to be some limitations to that and I don't like it when they're not well defined. Looking at the MailChannels pricing, it looks like it's roughly 40x more expensive than AWS SES on the low end and 4x on the biggest plan before negotiating a custom deal.
I can send 500k emails and have a dedicated IP on SES for the same cost as 40k emails from MailChannels. Since their big value add seems to be scanning emails to prevent spam from user generated content, I just don't see it if that's not functionality you need because it's all wasted resources / unnecessary cost in those cases.
I assume there's a point where it's no longer free and I'm guessing the cost once you get there will be high.
Plus, am I sharing IPs with people who are sending such low-quality mail it requires outbound spam filtering? That seems like a huge negative on the deliverability side of things.
They are very good at detecting abuse. The leading cause of email deliverability problems is not advertisement, it's servers being hijacked by malicious actors.
I think of it just as a different approach, where the focus is static (cache) with dynamic layered in as necessary. This forces you to be more explicit about dynamic behaviour and treating cache as a priority concern.
Also DX being better
Plus CDN if that matters.
Having data flung about the place in these objects sounds a bit scary (how do I back them up, and keep track of them, how do I join data from multiple objects, and what if those different objects are all around the world).
I wonder if that will change given enough time. I certainly hope so, but I would’ve guessed that it would’ve changed already, but here we are in 2022…
But if anyone is edgy enough to use AI as a shortcut, I guess it would be the guy who did telemetry ingestion on a t2.micro :D
Regards Jonas,
PHP, getting shit done since '95.
For players at Cloudflare, AWS, GCP, etc scale it's essentially free. The fact that many of them (other than Cloudflare) charge $0.08/GB or whatever for egress and that's just accepted because people think "that's what it costs" is wild to me.
My personal conspiracy theory of sorts is that many compute, storage, etc products from big cloud providers are essentially priced as loss leaders because while the average dev has some idea of what a GB of storage, GHz CPU, etc costs (because they priced a laptop) they have no idea how bandwidth and connectivity is priced at this level and cloud providers capitalize on that (somewhat understandable) ignorance.
Like the old 50c/m mobile phone calls, the 10c SMS messages, then when they were free, charging for international SMS and calls, now charging for data. It is the "whales" who spend a lot of money on mobile games. 1% fee for looking after your money, Etc.
I wonder what AWS will do if they have to make egress cheap! What is the next thing they can charge for that is actually cheap but people are happy to pay for.
More broadly, other acronym stacks are a thing because complementary technologies cluster together, especially in relatively immature ecosystems. E.g., PHP traditionally had better support for MySQL than for competing databases, so if you were building an app in PHP, you probably used MySQL for data storage. These effects may have weakened over time as the graph of ecosystem integrations has grown more complete (and indeed I feel like I hear less about acronym stacks these days than I did half a decade ago), but I don't think they've died out entirely.
Jamstack is just a particular configuration where vendors take care of all of that and you can push up frontend code that auto deploys and that's that. I used to have to manage the whole stack and now can work purely in HTML CSS and JS and it is a DREAM. That's the magic of the Jamstack. Less devops and network infrastructure, more time to focus on UX and DX.
As for vendors, there's Vercel, Netlify, Gatsby, maybe a few others. Cloudflare, Deno, and Fly, Render and a few others have similar variations.
It's a really nice way to work...
First, Durable Objects and Workers KV requests do not count against the 50-subrequest limit. It's HTTP requests to external servers that count.
Second, with the Unbound billing model, the limit is now 1000. The old "Bundled" billing model is really intended for "configure your CDN" use cases where you're mostly just rewriting a request or response and proxying to origin. For whole applications you should use Unbound. (Note that with Durable Objects, the objects themselves always use the Unbound pricing model.)
Third, to address your specific scenario:
> If my usage pattern is e.g., sharing docs with many users, how would I let more than 50 users access a doc?
If you mean you have 50+ users connected via WebSockets and you need to send a message to each one -- these messages (on already-open WebSockets) do not count against the subrequest limit. Only starting a whole new HTTP request or new WebSocket would count.
And for "fan in", these would be incoming requests from clients, but the subrequest limit applies to outgoing requests. There's no limit on the number of incoming requests.
Ahhh, somehow I got the idea that requests from a Worker to Durable Objects and KV counted against subrequests. This changes things!
> If you mean you have 50+ users connected via WebSockets and you need to send a message to each one
I was thinking of a client requesting "give me a list of my documents" or "share this document with Alice, Bob, et al.", where the Worker that receives that request has to handle it somehow. I thought that was tough to do in 50 subrequests, but it's easy if I can just make an unbounded number of requests to different Durable Objects or KV records.
Edit: I found what gave me the idea about subrequests. From the Workers docs:
> For subrequests to internal services like Workers KV and Durable Objects, the subrequest limit is 1000 per request, regardless of usage model.
https://developers.cloudflare.com/workers/platform/limits/#h...
You're probably right that it's more flexible, but then again I don't want to constantly switch out of IDEA.
Surely blindly pushing yet another corporation into a position where they can lobby for virtually anything is an awesome idea. (Nothing against the folks from Cloudflare. They seem all right, but they will get acquired and/or change leadership at some point.)
You might not feel the downsides of having a huge percentage of the internet going through a single provider until it's too late to change it.
I actually quite like Cloudflare, and I don't think they're purposely doing anything nefarious (just following incentives like everyone), but I think caution is warranted.
Also note that you'll often see a PDF generated on the fly with a long, difficult to parse URL.
Take up your second point with the W3 or whatever, to be honest if tlds weren't so important for phishing and whatnot it would probably be fine. I think some browsers have started doing that anyway. You overestimate how tech savvy the average user is, and by extension you overestimate how much the average user can keep track of all this complexity. Do you think most people have heard of .info or .xyz?
Do you have some examples? I always get excited by this serverless stuff, but often the use cases are quite limited if you think about it, or maybe I’m thinking about it wrong, especially if you take vendor lock-in into consideration.
- Contact Form
- Comments System
- Authentication System
- Image resizing (thumbnails)
- GraphQL Gateway (working on this now)
- Bypass CORS
- Generate a random number on the server
I have also built full apps on Cloudflare Workers (and am doing so now).
Only one caveat: They are heavily invested in marketing but their tooling is real cr*p. They are not investing in the Rust integration; or the more regular tools/integrations you are used to.
You drop a file in the functions dir of your repository, edit the firebase.js(on? it's been a while) config file to say that function X maps to path Y (so you can have api.site.com or site.com/api or whatever you want as a redirect to your function), and run firebase deploy.
Drop a file into a directory which is all or mostly static Markdown, HTML or CSS, but those files can contain markup to call other modules, as well as code files.
The existence of a file is enough for its index code to be run (if it has any) and any URL-set and page components it makes available are available to other pages to be used, or served as they are. A single file can offer many URLs, usually anchored at the name of the file but not necessarily. The reason it can affect URLs outside the name of the file itself is that it's efficient to index all files in a directory tree, even on every HTTP request, if that's done by file change detection and the results are efficiently cached.
Some of those files act as filtering middleware, processing the input and output of some or all URLs and passing along what they don't change.
Updating static Markdown, HTML and CSS, dynamic content, code, and JSON data, are done by 'git push' or 'rsync', or just by editing files in place if you prefer. The server automatically detects file changes and keeps track of all code, file and data dependencies when individual things are used to calculate a response. The full dependency structure for each response calculation is recorded with each cached response.
Cached responses include both full HTTP responses, and components and data available for use in the manner of a subrequest. If a previously cached response depends on a file that has since changed, or been removed, or even a file that was previously absent but now present, or another cache condition such as a JSON or SQLite file change, or logged in user etc, the cached response is invalid and must be regenerated. Regeneration is a combination of on-demand and ahead-of-time, to be able to respond quickly with the speed of a static site for small finite page collections, while behaving correctly with large or infinite collections. Some code updates trigger a process restart because some code can't be safely unloaded or replaced; some code is fine updated, though, and this entire process is well automated. In practice, pleasant and fast to use: edit a file and see the result immediately on reload, with no compile/build step or extra actions.
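To illustrate the invalidation rule described above, here is a minimal TypeScript sketch (the original system was Perl, so this is a re-imagining, not its code). Each cached response records every file it consulted, including files that were looked for but absent, and is only served while all of those still look the same. The `Version` strings stand in for whatever change signal a real server would use (mtime, hash); all names are hypothetical.

```typescript
// Sketch of dependency-tracked response caching. Names are illustrative.
// A version of null means "file was absent when the response was built".
type Version = string | null;

interface CachedResponse {
  body: string;
  // Every file consulted while building the response, with the version seen.
  deps: Record<string, Version>;
}

class DependencyCache {
  private entries: Record<string, CachedResponse> = {};
  private currentVersion: (file: string) => Version;

  constructor(currentVersion: (file: string) => Version) {
    this.currentVersion = currentVersion;
  }

  // Record a freshly built response together with the files it depended on.
  store(url: string, body: string, files: string[]): void {
    const deps: Record<string, Version> = {};
    for (const file of files) deps[file] = this.currentVersion(file);
    this.entries[url] = { body, deps };
  }

  // Return the cached body only if no dependency changed, vanished, or appeared.
  lookup(url: string): string | null {
    const entry = this.entries[url];
    if (entry === undefined) return null;
    for (const file of Object.keys(entry.deps)) {
      if (this.currentVersion(file) !== entry.deps[file]) {
        delete this.entries[url]; // stale: caller must regenerate
        return null;
      }
    }
    return entry.body;
  }
}

// Usage: "banner.md" is absent at build time, so creating it later invalidates "/".
const disk: Record<string, Version> = { "index.md": "v1", "footer.md": "v1" };
const cache = new DependencyCache((file) => (file in disk ? disk[file] : null));
cache.store("/", "<html>home</html>", ["index.md", "footer.md", "banner.md"]);
```

Recording absent files as dependencies is what makes the "previously absent but now present" case work; without it, adding a new file would never invalidate anything.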
The dependency structure partly reaches the browser, so that requests to cached responses can be served more efficiently, sometimes even zero latency. In some circumstances, changes in files cause events that ripple through to the browser causing real-time updates to components in-page without a refresh. The result of those is a little like Meteor or LiveView, except almost everything is made from static page files in Markdown or HTML, code files, and JSON data files, and the set of available pages (and "table of contents" pages) are built by indexing those files.
In practice it's mostly writing Markdown, and this is great for emphasis on content first. Or Markdown templates: content that varies a little according to data. But with extensions to be able to drop in useful rendered components (graphs, generated images, templated CSS, transclusions for headers/footers, etc) and dynamic components (live updating when the underlying data files are edited).
It even serves PDFs and thumbnails of those as in-page components, where the PDF content is HTML rendered by running Chrome on the fly from within an in-page component to generate the PDF or image, with Chrome told to fetch a subrequest which serves it the contents of that very same component. This makes for some pretty lists of downloadable PDFs, all generated from JSON data. This probably sounds complicated, but it was actually a fairly simple single file of code, dropped into a directory to make the component available by name inside the other Markdown quasi-static files.
One small VM served a few thousand requests per second last time I checked. Not as fast as a Rust server, but good enough for my uses. It made heavy use of Perl coroutines to serve and cache concurrently, and NginX to make routing and static-serve decisions. Perl coroutines are not commonly used (for ideological reasons I think), but they work very well.
I don't use the system any more, but it was the nicest I've used.
I'm not saying Cloudflare Workers are a better proposition now (their ecosystem is a complete shitshow), but the idea has lots of potential and will probably be the future of computing[!].
!: That is, if the decentralized web fails to take off in the next 5 years.
What do you think is the upper limit of requests per second, if your HTTP server does process-per-request? For simplicity, assume server class hardware, that each request does no work besides producing a response, and an upper bound of 10k unique remote addresses (i.e. no more than 10k concurrent connections in a different model).
How do you think those metrics compare to other designs? I have an affinity for the conceptual simplicity of CGI but I've never been able to get a process-per-request server within even an order of magnitude of the performance of the more common designs. But I could be missing something!
Also, how does this design adapt to HTTP/2?
In the hundreds, which is absolutely enough for most use cases. If CGI is enough for sqlite.org displaying dynamic content (such as in https://www.sqlite.org/cgi/src/timeline), it is enough for 80% of websites. You are not bigger than sqlite.
> How do you think those metrics compare to other designs
The important question is not "is it better or worse than alternatives" but "is it enough for me". Yes, it is.
> Also, how does this design adapt to HTTP/2?
HTTP/2 doesn't change anything. Requests share the same socket up to the webserver, which forks a process for each request and multiplexes the responses, and all is well.
All good! But if you're OK with O(100) RPS out of a server, then I guess basically every possible option is on the table. I bet `nc` spawning background `bash` scripts to handle requests would get to 1k RPS, even! ;)
> HTTP/2 doesn't change anything.
I guess that would work, as long as the fronting server managed all of the connection management details, stream demuxing, etc. But I wonder how you'd do that in a single thread?
It is indeed very very slow; mostly because it is not possible to pre-fork your CGI scripts (environment variables get set from the request, so each cgi program will have different values in the environment).
But, if you could pass HTTP data via some way other than environment variables, you could pre-fork the binaries and have acceptable speed[1].
[1] Pre-forking makes a large difference, and surprisingly is not too far off from other approaches to concurrent request handling. See https://unixism.net/2019/04/linux-applications-performance-i...
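To make the pre-fork idea concrete, here is a minimal sketch, assuming a Unix system (it uses `os.fork`) and a made-up wire format of one JSON object per line. The worker is forked before any request exists, and request data arrives over a socket pair instead of through the environment. `handle`, `serve`, `start_worker` and `send_request` are all hypothetical names, and this is not how any particular server does it.

```python
import json
import os
import socket


def handle(request):
    """Stand-in for the application logic a CGI program would run."""
    return f"{request['method']} {request['path']} handled by pid {os.getpid()}"


def serve(sock):
    """Worker loop: read one JSON request per line, write one JSON response per line."""
    rfile, wfile = sock.makefile("r"), sock.makefile("w")
    for line in rfile:  # ends when the parent closes its end
        wfile.write(json.dumps(handle(json.loads(line))) + "\n")
        wfile.flush()


def start_worker():
    """Fork a long-lived worker up front; return (pid, parent's end of the socket)."""
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:  # child process
        try:
            parent_sock.close()
            serve(child_sock)
        finally:
            os._exit(0)  # never fall back into the parent's code
    child_sock.close()
    return pid, parent_sock


def send_request(sock, request):
    """Send one request to the worker and block until its one-line response arrives."""
    sock.sendall((json.dumps(request) + "\n").encode())
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("worker exited")
        buf += chunk
    return json.loads(buf)


if __name__ == "__main__":
    pid, sock = start_worker()
    print(send_request(sock, {"method": "GET", "path": "/a"}))
    print(send_request(sock, {"method": "GET", "path": "/b"}))  # same pid: no fork per request
    sock.close()
    os.waitpid(pid, 0)
```

A real server would keep a pool of such workers and hand each incoming request to an idle one; the point here is only that nothing request-specific has to live in the environment.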
There's no way a CGI app can reach a million requests a second on the same hardware that a nodejs (single thread, worker loop) would take to do 1M RPS. Processes have a lot of overhead. Almost no one needs 1M RPS, but that overhead can't be waved away by claiming CGI is perfect for everything.
I also find the modern approach much easier to reason about and debug. I don't need Apache + fastCGI + php + php fastCGI + apache configuration to get started. I can just run `node server.js` and my web app works.
The fact that it also scales much better is a cherry on top.
You can pass environment variables to CGI scripts as well. In fact, that's exactly how CGI works. Shared resources can be cached in memory through redis, although a shared file (for example sqlite) is enough in many cases.
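For anyone who hasn't seen it, this is roughly what "environment variables are how CGI works" looks like. A sketch only: `cgi_environ` and `run_cgi` are made-up helpers, the variable names follow the CGI/1.1 spec (RFC 3875), and the "script" is an inline Python snippet standing in for a real CGI program.

```python
import os
import subprocess
import sys


def cgi_environ(method, path, query, headers):
    """Build the per-request variables a CGI server sets for the script."""
    env = {
        "GATEWAY_INTERFACE": "CGI/1.1",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "REQUEST_METHOD": method,
        "SCRIPT_NAME": path,
        "QUERY_STRING": query,
    }
    # Request headers become HTTP_* variables. (Content-Type and
    # Content-Length get their own variables in the spec; omitted here.)
    for name, value in headers.items():
        env["HTTP_" + name.upper().replace("-", "_")] = value
    return env


# Stand-in CGI script: reads the request from os.environ and
# writes a header block, a blank line, then the body to stdout.
SCRIPT = r'''
import os
print("Content-Type: text/plain")
print()
print(os.environ["REQUEST_METHOD"], os.environ["QUERY_STRING"], os.environ["HTTP_USER_AGENT"])
'''


def run_cgi(env):
    """Start one fresh process for one request and return its raw stdout."""
    # Merged with the parent's environment so the interpreter still starts
    # everywhere; an actual server would pass a much cleaner environment.
    full_env = {**os.environ, **env}
    result = subprocess.run(
        [sys.executable, "-c", SCRIPT],
        env=full_env, capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(run_cgi(cgi_environ("GET", "/cgi/demo", "a=1", {"User-Agent": "curl"})))
```

Since each request gets a brand-new process, nothing survives between requests unless it is written somewhere shared, which is where the file or Redis comes in.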
> I don't need Apache + fastCGI + php + php fastCGI + apache configuration to get started
I'm talking about CGI, not fastCGI
Don’t know if it’s as powerful, though — haven’t personally used it.
I think you're right, but I think the key part of the tech is economic: their ability to deploy hundreds or thousands of instances safely shared by large numbers of replicated sites and applications belonging to different site owners.
Individual site owners can't afford that level of infrastructure for each site separately. Actually deploying it isn't that difficult, it's just expensive and unjustifiable.
(I nearly started a business doing what Cloudflare Workers is doing now, about 10 years ago (and not with JS/Node), so I know a fair bit about building out this sort of distributed edge system, including the WebSockets and anycast part. But life took a turn and IPv4 blocks for anycast were starting to look a bit beyond what I could justify for the development, even back then. In retrospect it would have been a good thing to build.)
> That is, if the decentralized web fails to take off in the next 5 years.
You have given me an idea :-)
(For a side project I'm working on zero-knowledge proofs of execution, which allows nodes to execute code on behalf of other nodes without the latter having to trust the former. Performance is far too slow to run things like websites at the moment, but there are many tricks for speeding it up, including hardware acceleration.)
No, it wouldn't, that's a misconception. The Perl system I described ran fine on however many servers you wanted to run it on, like any horizontal scaling system. Pretty much anything that runs in a container and isn't required to communicate with a central database does. The design based around indexing files and caching calculations is effectively stateless; it scales without limit if you have the servers.
Any specific points of global data consistency require communication with a shared data store of some kind, but that's also true in Cloudflare Workers, where there's a separate Durable Objects system for that. Cloudflare's Durable Objects is pretty good as those things go, but that's not what your comment was about.
What's impossible for most is paying the cost of hundreds of locations. And if you really want to do it well, anycast IP blocks. But if you have the servers, and don't need every request to fetch from a central database (or can use an eventually consistent distributed one), there's no problem deploying as many instances as you want with something that's effectively stateless, even if it's Perl, Python or PHP.
> what usually happen is that the scripting language creep start getting bigger and bigger until it's unmanageable.
Probably right, but I'm curious why you think there won't be equivalent JavaScript creep in sites using Cloudflare Workers.
I think users are (rightfully) distrustful of URLs in the general case, but a URL having a file extension is actually a pretty good indication that it's a simple "does what it says on the tin". (Imgur changing their .jpg URLs to have more complex behaviour caused a pretty big backlash, for example)
In the latter case you could use redis, but that’s a poor, inconvenient, inefficient replacement for a global variable.