A tiny Docker image to serve static websites (lipanski.com)
Netlify, Amplify, Cloudflare Pages, etc.
# Install thttpd
RUN apk add thttpd
Wouldn't you want to use the --no-cache option with apk? For example:
RUN apk add --no-cache thttpd
It seems to help slightly with the image size:
REPOSITORY       TAG      IMAGE ID       CREATED          SIZE
thttpd-nocache   latest   4a5a1877de5d   7 seconds ago    5.79MB
thttpd-regular   latest   655febf218ff   41 seconds ago   7.78MB
It's a bit like cleaning up after yourself with apt-based container builds as well, for example (although this might not always be necessary):
# Apache web server
RUN apt-get update && apt-get install -y apache2 libapache2-mod-security2 && apt-get clean && rm -rf /var/lib/apt/lists /var/cache/apt/archives
But hey, that's an interesting goal to pursue! Personally, though, I gave up on Alpine and similar slim solutions and decided to base all my containers on Ubuntu instead: https://blog.kronis.dev/articles/using-ubuntu-as-the-base-fo...

People will remark that this is a waste of time, others will say it is absolutely necessary, and even more will laud it just for the fun of doing it. I'm in the middle camp. I wish software/systems engineers would spend more time optimising for size and performance.
Where can I submit a feature request ticket?
check this out
However, BusyBox also comes with an httpd... It may be 8.8x bigger, but you also get that entire assortment of apps to let you troubleshoot, run commands in an entrypoint, run commands from the httpd/CGI, etc. I wouldn't run it in production... but it does work :)
https://devopsdirective.com/posts/2021/04/tiny-container-ima...
I believe the best chance we have of [building binaries "to stand the test of time with minimal toil"], is by gluing together the binary interfaces that've already achieved a decades-long consensus, and ignoring the APIs. . . . Platforms can't break them without breaking themselves.
Either way, though, it's a great article, and I'm glad the author took the time to write it. His Docker practices are wonderful; I wish more engineers would use them.
Since GitHub Pages lets you attach a custom domain, it seems like the perfect choice.
I would expect their CDN to be pretty awesome. And updating the website with a simple git push seems convenient.
It's really only on Windows/Mac that Docker has significant memory overhead, and that's just because it has to run a little VM with a Linux kernel. You'd have the same issue if you tried to run thttpd there too and couldn't find a native Mac/Windows binary.
He also likes to upgrade that server close to edge, and if that goes south, he wants to rebuild it and bring his static site back up quickly, along with his other projects.
https://nypl.getarchive.net/media/salmo-fario-the-brown-trou...
Also, the other blog posts feature different big fish, so check them out as well.
Requirements:
- rsync style publishing
- not supported by tracking users.
- raw static file hosting (including html)
- redirect foo.com/bar to foo.com/bar/index.html (this is why I need lambda...)
- zero admin https certificate management
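A minimal sketch of the redirect rule in the list above, assuming the site is a plain directory of files. `route` is a hypothetical helper, not part of any hosting product, and it only decides which response a request path should get rather than actually serving anything.

```shell
#!/bin/sh
# Hypothetical helper: decide how to answer a request path for a static site.
#   $1 = site root directory (assumed: any folder of HTML files)
#   $2 = request path, e.g. /bar
# Prints "<status> <path>" instead of serving anything.
route() {
  docroot=$1
  path=$2
  if [ -d "$docroot$path" ] && [ -f "$docroot${path%/}/index.html" ]; then
    # Directory-style URL: permanent redirect to its index.html.
    echo "301 ${path%/}/index.html"
  elif [ -f "$docroot$path" ]; then
    # A real file: serve it as-is.
    echo "200 $path"
  else
    echo "404 $path"
  fi
}

# Usage with a throwaway site directory:
site=$(mktemp -d)
mkdir -p "$site/bar"
echo '<h1>bar</h1>' > "$site/bar/index.html"
route "$site" /bar   # prints: 301 /bar/index.html
```

Both /bar and /bar/ end up at /bar/index.html, while a request for the file itself is served directly, so the redirect never loops.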
I want things like DNS, X11 screen locking, SSH session management, syslog, etc. to just work. I can't figure out how to fix any of that stuff under systemd, and in my experience at least one of them is always broken by default.
The image is a small container with an http daemon. It gets deployed as a statefulset and I mount a volume into the pod to store the static pages (they don't get put into the image). Then I use cert-manager and an Istio ingress gateway to add TLS on top.
Updating the sites (yes, several at the same domain) is done via kubectl cp, which is not the most secure but good enough for our team. I could probably use an rsync or ssh daemon to lock it down further, but I have not tried that.
Sounds like brain surgery in order to make a jam sandwich to me.
When I think of static content, I think of buying a domain name plus shared hosting for EUR 2 a month.
And rather than handing over rights or control, having a legal claim on both the service and the name. Am I missing something?
This is really useful for tiny little services that each want a different database server.
There are free ways to host static pages, and extremely inexpensive ways to host static pages that are visited millions of times per month, simply by using services built for that.
Even then, saving a few MBs in image size is the devops version of premature optimisation.
So much happens in an autoscaling group before an instance is marked healthy to serve traffic that, in the grand scheme of things, an image pull of a few MBs is hardly ever the issue to focus on.
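To put "a few MBs" in numbers, here is a back-of-the-envelope estimate using the two image sizes from the `docker images` listing earlier in the thread. The 100 Mbit/s pull bandwidth is an assumption and `extra_pull_seconds` is a hypothetical helper. Note that `docker images` reports uncompressed sizes while registries transfer compressed layers, so the real difference is usually smaller still.

```shell
#!/bin/sh
# Hypothetical helper: extra seconds needed to pull the larger of two images.
#   $1 = larger image size in MB, $2 = smaller image size in MB
#   $3 = assumed pull bandwidth in Mbit/s
# Ignores registry latency, compression and layers already cached on the node.
extra_pull_seconds() {
  awk -v big="$1" -v small="$2" -v bw="$3" \
    'BEGIN { printf "%.2f\n", (big - small) * 8 / bw }'
}

# 7.78MB (thttpd-regular) vs 5.79MB (thttpd-nocache) at an assumed 100 Mbit/s:
extra_pull_seconds 7.78 5.79 100   # prints: 0.16
```

Under these assumptions the difference is roughly a sixth of a second per cold pull.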
We could be using container snapshots/checkpoints so we don't need to go through as much initialization code. This would imply, though, that we configure via the filesystem or something else we can attach late, instead of the 12-factor approach of configuring via env vars, which is the standard, accepted convention these days. Actually, I suppose environment variables are writable, but the webserver would need to be able to re-read its config, accept a SIGHUP or whatever.
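Another process can't normally change a running program's environment, so one common way to get the late reconfiguration described above is to keep settings in an env-style KEY=value file and re-read it on SIGHUP. This is a minimal sketch under that assumption: `CONFIG_FILE`, `PORT` and `reload_config` are hypothetical names, and a real server would re-apply the new settings (for example re-bind its socket) inside the handler.

```shell
#!/bin/sh
# Sketch: re-read configuration on SIGHUP instead of only at startup.
# CONFIG_FILE, PORT and reload_config are hypothetical names.
CONFIG_FILE=${CONFIG_FILE:-$(mktemp)}
PORT=8080   # default used until the file says otherwise

reload_config() {
  # Source KEY=value lines; values in the file override the defaults.
  . "$CONFIG_FILE"
  echo "config reloaded: PORT=$PORT"
}

# Run reload_config whenever this process receives SIGHUP.
trap reload_config HUP

# Simulate an operator editing the file and signalling the process
# (from another shell this would be: kill -HUP <pid>).
echo "PORT=9090" > "$CONFIG_FILE"
kill -HUP $$
```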
We could try to pin some specific snapshots into memory. Hopefully Linux will keep any frequently booted-from snapshot cached, but we could go further and try to make sure hosts have the snapshot image in memory at all times.
I want to think that common overlay systems like overlayfs or btrfs will do a good job of making sure that, if everyone is asking for the same container, they share caches effectively. Validating that would be great to see. To be honest, I'm actually worried that the need-for-speed attempt to snapshot/checkpoint a container and relaunch it might conflict with this somewhat: rather than creating a container filesystem from existing pieces and launching a process mapped to that filesystem, I'm afraid the process snapshot might re-encode the binary. Maybe? We'd keep reading from the snapshot, I guess, which is good, but there'd be some duplication of the executable code across the container image and again in the snapshotted process image.
If you're curious, Netlify is one popular alternative that is easy to get into even without much experience. I would say that even at the free tier Netlify is easily a cut above GitHub for static hosting, and it hooks into GitHub near-perfectly straight out of the box, if that is something you value.
Because some people may not want to depend even more on Big Tech (i.e. Microsoft) than they already do
One reason would be if your site violates the TOS or acceptable use policy. GitHub bans "excessive bandwidth" without defining what that is for example. For a small blog about technology you are probably fine.
For static websites, CDNs are largely unnecessary. My potato of a website hosted from a computer in my living room has been on the front page of HN several times without as much as increasing its fan speed.
It took Elon Musk tweeting a link to one of my blog posts before it started struggling to serve pages. I think it ran out of file descriptors, but I've increased that limit now.
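Running out of file descriptors under load usually means hitting the per-process open-files limit. Below is a minimal sketch of checking and raising it from the shell that launches the server; the server command at the end is a hypothetical placeholder. For containers, `docker run` accepts `--ulimit nofile=<soft>:<hard>` to set the same limit per container.

```shell
#!/bin/sh
# Current per-process limit on open file descriptors (soft and hard).
echo "before: soft=$(ulimit -S -n) hard=$(ulimit -H -n)"

# An unprivileged process may raise its soft limit up to, but not past,
# the hard limit; children started from this shell inherit the new value.
ulimit -S -n "$(ulimit -H -n)"

echo "after:  soft=$(ulimit -S -n) hard=$(ulimit -H -n)"

# exec my-static-server   # hypothetical: start the web server here so it inherits the limit
```

Raising the hard limit itself needs root (or a change in the service manager or container runtime configuration).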
I don't like GitHub Pages because it's quite slow to deploy. Sometimes it takes more than a couple of minutes just to update a small file after the git push.
$ curl https://nobodywasishere.github.io # moved to https://blog.eowyn.net
<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>
$ curl https://blog.eowyn.net/vhdlref-jtd # moved to https://blog.eowyn.net/vhdlref
<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>

Benefits: you can cleanly delete 100% of what was installed. If you install something as root, it can always infect the system, leave caches, logs…
I don’t want it to impact anything else running on my server. Nor do I want anything else to silently depend on it.
Docker is the best thing. I just can’t understand why people still don’t see the benefits.
It's amazing to pick up a project from 3 years ago, have it just work, and deploy it without reading any docs. Just spin up a Docker container. Easy, safe, and it just works.
Congratulations. I have millions of files on my static sites. So what? Would you recommend a container for each? To what purpose?
> different degrees of functionality
We're still talking static sites. There is no 'functionality', right?
...what? Where are you quoting that from? No, I'm not recommending Docker if all you do is host static pages.
> We're still talking static sites.
No, I said "if you've got a lot going on on a single machine" - I didn't just mean static sites. I did respond with "different sites/domains with different degrees of functionality in different languages/frameworks", which means a variety of services, e.g. one client may be a static page, another might use a backend/API in Node, and another in C#/.NET, etc. Heck, you might even use containerized DBs for some of them. Hence Docker.
A 301 (or 302) redirect means setting the status code to 301 and providing a Location header with the place to redirect to. Last I checked, GitHub doesn't allow any of this, or setting any other headers (like Cache-Control). To work around this, I've been putting Cloudflare in front of my site, which lets me use page rules to set redirects if necessary.
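Concretely, a permanent redirect needs only a status line and a `Location` header; the body is optional. The sketch below prints a minimal raw HTTP/1.1 redirect response, with `redirect_301` as a hypothetical helper (not part of nginx, GitHub Pages or Cloudflare). To see what a live site actually returns, `curl -sI <url>` prints just the status line and headers, much like the curl transcripts in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal HTTP/1.1 permanent redirect.
#   $1 = absolute URL the client should go to instead
redirect_301() {
  printf 'HTTP/1.1 301 Moved Permanently\r\n'   # status code
  printf 'Location: %s\r\n' "$1"               # where to go instead
  printf 'Content-Length: 0\r\n'               # no body needed
  printf '\r\n'                                # blank line ends the headers
}

redirect_301 "https://blog.eowyn.net/vhdlref"
```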
~ % curl -i 'https://kotokaze.github.io/'
HTTP/2 301
server: GitHub.com
content-type: text/html
permissions-policy: interest-cohort=()
location: http://github.kotokaze.net/

Or a constantly updating behemoth, running as root, installing packages from yet another unauditable repository chain?
And security updates, as you said, are needed regardless of whether you run Docker on top. I think Docker is a needless complexity and security risk.
The second option is standardized and usually the same 1 or 2 commands to run anywhere
Granted, you do need to learn how Docker works, and be ready to help others do likewise if you're onboarding folks with little or no prior experience of Docker to a team where Docker is used. That's certainly a tradeoff you face with Docker - just as with literally every other shared tool, platform, codebase, language, or technological application of any kind. The question that wants asking is whether, in exchange for that increased effort of pedagogy, you get something that makes the increased effort worthwhile.
I think in a lot of cases you do, and my experience has borne that out; software in containers isn't materially more difficult to maintain than software outside it if you know what you're doing, and in many cases it's much easier.
I get that not everyone is going to agree with me here, nor do I demand everyone should. But it would be nice if someone wanted to take the time to argue the other side of my claim, rather than merely insisting upon it with no more evident basis than arbitrarily selected first principles given no further consideration in the context of what I continue to hope may develop into a discussion.
Whatever setup your application needs is still a necessary step in the process. But now you've not only added more software in Docker, with its Docker registry and Docker's state on top of the application's state; you've also introduced multiple virtual filesystems, a layer of mapping between those and locations on the host, and mappings between the container's ports and the host's ports. There is no longer a single truth about the host system. The application may see one thing and you, the owner, another. If the application says "I wrote it to /foo/bar", you may look in "/foo/bar" and find that /foo doesn't even exist.
All of that is indirection, and new ways things can be that would not exist if you just ran your code natively. What is complexity if not additional layers of indirection and an increase in the ways things can be?
To host something as a Docker container I need 2 things: to know how to host Docker, and a Docker image. In fact, not even an image, just a Dockerfile/docker-compose.yaml in my source code. If I need to host 1000 apps as Docker containers, I need 1000 Dockerfiles and still only need to know (and remember) 1 thing: how to host Docker. That's 1 piece of knowledge I need to keep in my head, and 1000 I keep on a hard drive, most of the time without even caring what instructions are inside them.
If I need to host 1000 apps without dockerfiles, I need to keep 1000 pieces of knowledge in my head. thttpd here, nginx to java server there, very simple and obvious postgres+redis+elastic+elixir stack for another app… Yeah, sounds fun.
It's rare that I need to find out de novo where a given path in a container is mapped on the host. When I do need to do that, I can usually check a deployment script, or failing that inspect the container directly and see what volume mounts it has.
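The lookup described here can be scripted. `docker inspect --format '{{json .Mounts}}' <container>` lists a container's mounts; the sketch below then applies the mapping by hand to bind mounts written in the `host_dir:container_dir[:options]` form that `docker run -v` takes. `container_to_host` and the example mounts are hypothetical, and a path not covered by any mount lives in the container's own layered filesystem rather than at a plain host path.

```shell
#!/bin/sh
# Hypothetical helper: translate a path as the application sees it (inside
# the container) into the host path behind it, given bind-mount specs.
#   $1     = path inside the container
#   $2...  = mounts as host_dir:container_dir[:options]
container_to_host() {
  path=$1
  shift
  best_host=""
  best_ctr=""
  for mount in "$@"; do
    host_dir=${mount%%:*}     # text before the first colon
    ctr_dir=${mount#*:}       # text after the first colon...
    ctr_dir=${ctr_dir%%:*}    # ...minus any trailing :ro / :rw options
    case "$path" in
      "$ctr_dir" | "$ctr_dir"/*)
        # Prefer the longest, i.e. most specific, matching mount point.
        if [ ${#ctr_dir} -gt ${#best_ctr} ]; then
          best_host=$host_dir
          best_ctr=$ctr_dir
        fi
        ;;
    esac
  done
  if [ -z "$best_ctr" ]; then
    echo "$path is not bind-mounted; it only exists inside the container" >&2
    return 1
  fi
  echo "$best_host${path#"$best_ctr"}"
}

# "I wrote it to /foo/bar/app.log": where is that on the host?
container_to_host /foo/bar/app.log /srv/app-data:/foo/bar /srv/www:/var/www:ro
# prints: /srv/app-data/app.log
```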
I don't need to worry about finding paths very often - much less frequently than I need to think about deployments, which at absolute minimum is once per project.
So, sure, by using Docker I've introduced a little new complexity, that's true. But you overlook that this choice does not exist in a vacuum, and that that added complexity is more than offset by the reduction of complexity in tasks I face much more often than the one you describe.
And that's just me! These days I have a whole team of engineers on whose behalf, as a tech lead, I share responsibility for maintaining and improving developer experience. Do you think I'd do them more of a favor by demanding they all comprehend a hundred-line sui generis shell script for deployments, or by saying "here's a single command that works in exactly the same way everyone you'll work with in the next ten years does it, and if it breaks there's fifty people here who all know how to help you fix it"?
I've been meaning to ask you this for a decade: whatever happened to the blog you wrote with insanely irritating serifs that connected certain letters together? What was the rationale behind that? I've never seen it since.
I'm insanely impressed by APE and redbean by the way, blows OP out of the water!
Adding value comes at the cost of some increased security risk, and it's our job to mitigate that as much as possible by writing quality software.
You can limit file system access for the server to only a certain folder.
You can similarly limit port access and manage conflicts (e.g. multiple servers can think they are listening on a certain open port but those are mapped to something else on the host).
If you have multiple machines with different operating systems or even architecture you can deploy your server as a container more easily on them without needing to rebuild or test for each one.
You can have the same environment running locally while development or on CI servers without complicated setups.
The system can scale out a lot more easily to hundreds/thousands of machines if you decide to use something like Kubernetes.
The ability to pull the image on to any machine without needing to clone the source files and build it.
Smaller images mean faster pod starts when you auto scale.
What kind of work are you doing that requires really fast auto scaling? Is a few minutes to spin up a new instance really that cumbersome? Can you not signal for it to spin up a new instance a tiny bit earlier than when it's needed when you see traffic increases?
Which seems like sort of an edge case for value adding, and makes me feel like it really doesn't add any value to do this unless you already are doing it for everything, and thus you really wouldn't be throwing out any value by just serving the static site without the docker overhead.
Anything else, though? There’s got to be more to it than that, or it wouldn’t be as popular as it is.
docker has dead whale on the beach vibes. what value does it offer to those of us who have moved on from the mess linux is becoming?
Really, enlighten me
Hell if you just need to serve static HTTP it even has its own built in webserver now:
I see two main places where Docker provides a lot of value: in a large corp where you have massive numbers of developers running diverse services on shared infrastructure, and in a tiny org where you don't have anyone who is responsible for maintaining your infrastructure full time. The former benefits from a standardized deployment unit that works easily with any language/stack. The latter benefits from being able to piggy-back off a cloud provider that handles the physical and OS infrastructure for you.
Relying on hidden complexity makes for a hard path ahead. You become bound by Docker's decisions to change in the future.
For example, SSLPing's reliance on a lot of complex software (including NodeJS and Docker) led to its closure, which made the front page of HN recently.
https://news.ycombinator.com/item?id=30985514
Keeping dependencies to a minimum will extend the useful lifespan of your software.
(Has it occurred to you that it losing money for a while might have contributed to its eventual unmaintainability, as the dev opted sensibly to work on more sustainably remunerative projects? If so, why ignore it? If not, why not?)
Similarly for the Node aspect - that's very much a corner use case related to this specific application (normally SSLv3 support is something you actively don't want!), and not something that can fairly be generalized into an indictment of Node overall. Not that it's a surprise to see anyone unjustly indict Node on the basis of an unsupportable generalization from a corner case! But that it's unsurprising constitutes no excuse.
Other than that you seem here to rely on truisms, with no apparent effort to demonstrate how they apply in the context of the argument at which you gesture. And even the truisms are misapplied! Avoiding dependencies for the sake of avoiding dependencies produces unmaintainable software because you and your team are responsible for every aspect of everything, and that can work for a team of Adderall-chomping geniuses, but also only works for a team of Adderall-chomping geniuses. Good for you if you can arrange that, but it's quite absurd to imagine that generalizes or scales.
In isolation, yes. But if, for instance, you're already running a container orchestration tool with hundreds of containers, and have CI/CD pipelines already set up to do all of that, it's easier just to tack on another container.
If you can't see a reason why, then you probably don't need to. You probably have different needs than other people.
Many people use Docker not because of what they're doing inside of the container, but because it is convenient for tangential activities. Like lifecycle management, automation, portability, scheduling, etc.
I have several static sites in Docker containers in production. We also have dozens of other microservices in containers. We could do everything the same way, or we can one-off an entirely separate architecture for our static sites. The former makes more sense for us.
But once you start wanting to run a significant number of things, or a significant number of instances of a thing, it becomes more helpful to have an all-purpose tool designed to manage images & run instances of them. Having a common operational pattern for all your systems is a nice, overt, clean, common practice everyone can adopt & gain expertise in, rather than each project or company defining its own deployment/operationalization/management patterns & implementations.
The cost of containers is also essentially near zero (alas somewhat less true with regards to local FS performance, but basically equal for many volume mounts). They come with great features like snapshots & the ability to make images off images- CoW style capabilities, the ability to mix together different volumes- there's some really great operational tools in containers too.
Some people just don't have real needs. For everyone else...
Fact is, for 3 servers, it would be hard to convince me of any use of Docker compared to the aforementioned deployment shell script + Debian unattended-upgrades.
What problem does Kubernetes address here? So what if it is "easy to use"? I prefer "not needed at all".
> but also only works for a team of Adderall-chomping geniuses
Of course, not everything should be implemented by yourself. Maybe this project wouldn't have been possible at all without offloading some complexity (like the convenient NodeJS packages).
But in particular Docker and its ecosystem are only worth it when you have enough machines to make it worth it - when things become difficult to manage with a simple shell script everyone understands: when you have a lot of heterogeneous servers, or you want to deploy to the Cloud (aka Someone Else's Computers) and you have no SSH access.
> truisms
I don't have any experience with Kubernetes nor Docker Swarm. The reason is that the truisms have saved me from it. If you don't talk me into learning Kubernetes, I won't, unless a customer demands it explicitly.
> Has it occurred to you that it losing money for a while might have contributed to its eventual unmaintainability
It absolutely has. Maybe if the service hadn't used Docker Swarm, or Docker at all, it would have lasted longer, because updating Docker would not have broken everything, which was named as a factor in the closure. The time and money would therefore have gone further.
> I don't have experience with Kubernetes nor Docker Swarm. The reason is that the truisms have saved me from it.
Have they, though? It seems to me they may have "saved" you from an opportunity to significantly simplify your life as a sysadmin. Sure, your deployment shell scripts are "simple" - what, a hundred lines? A couple hundred? You have to deal with different repos for different distros, I expect, adding repositories for deps that aren't in the distro repo, any number of weird edge cases - I started writing scripts like that in 2004, I have a pretty good sense of what "simple" means in the context where you're using it.
Meanwhile, my "simple" deployment scripts average about one line. Sure, sometimes I also have to write a Dockerfile if there isn't an image in the registry that exactly suits my use case. That's a couple dozen lines a few times a year, and I only have to think about dependencies when it comes time to audit and maybe update them. And sure, it took me a couple months of intensive study to get up to speed on Docker - in exchange for which, the time I now spend thinking about deployments is a more or less infinitesimal part of the time I spend on the projects where I use Docker.
Kubernetes took a little longer, and manifests take a little more work, but the same pattern holds. And in both cases, it's not only my experience on which I have to rely - I've worked most of the last decade in organizations with dozens of engineers working on shared codebases, and the pattern holds for everyone.
I don't know, I suppose. Maybe there's another way for twenty or so people to support a billion or so in ARR, shipping new features all the while, without most months breaking a sweat. If so, I'd love to know about it. In the meantime, I'll keep using those same tools for my single-target, single-container or single-pod stuff, because they're really not that hard to learn, and quite easy to use once you know how. And too, maybe it's worth your while to learn just a little bit about these tools you so volubly dislike - if nothing else, in so doing you may find yourself better able to inform your objections.
All that said, and faint praise indeed at this point, but on this one point we're in accord:
> If you don't talk me into learning Kubernetes, I won't, unless a customer demands it explicitly.
I did initially learn Docker and k8s because a customer demanded it - more to the point, I went to work places that used them, and because the pay was much better there I considered the effort initially worth my while. That's paid off enormously for me, because the skills are much in demand; past a certain point, it's so much easier to scale with k8s especially that you're leaving money on the table if you don't use it - we'd have needed 200 people, not 20, to support that revenue in an older style, and even then we'd have struggled.
I still think it's likely worth your while to take the trouble, for the same reasons I find it to have been worth mine. But extrinsic motivation can be a powerful factor for sure. I suppose, if anything, I'd exhort you at least not to actively flee these technologies that you know next to nothing about.
Sure, you might investigate them and find you still dislike them - but, one engineer to another, I hope you'll consider the possibility that you might investigate them and find that you don't.
Well, here is where I must thank you. Thank you for replying to me, and giving me a real reason to look at this ecosystem.
My personal deployment script is really just one SCP command - it copies the new version of my statically-built blog to my server. The web server comes with the OS, and that's all I need.
But when I read "hundred lines? A couple hundred?" I realized my company has a script fitting that bill. There might be an opportunity to improve it. While I am still somewhat skeptical, because using Kubernetes instead of that script might still not be worth it long-term for a 7-person company (3 devs and one sysadmin among them), I will check out its capabilities.
> in so doing you may find yourself better able to inform your objections.
Thank you for the patience to follow up, in spite of my arrogance. I might just come up with an improvement somewhere. Certainly we're a long way from a billion in ARR - I am thankful for your valuable time, and wish you continued and further success!
One thing, I'd suggest looking to Docker before Kubernetes unless you already know you need multi-node (ie, multi-machine) deployments, and maybe as a pilot project even if you do. Kubernetes builds upon many of the same concepts (and some of the same infrastructure) as Docker, so if you get to grips with Docker alone at first, you'll likely have a much easier and less frustrating time later on than if you come to Kubernetes cold. (And when that time does come, definitely start with k3s if you're doing your own ops/admin work - it's explicitly designed to work on a smaller scale than other k8s distributions, but it also pretty much works out of the box with little admin overhead. As with starting on Docker alone vs k8s, it's all about managing your frustration budget so you can focus on learning at a near-optimal rate.)
But hey, thanks for the well-wishes, and likewise taking the time in this thread! It's been of real benefit to me as well. If we're to be wholly honest, as an IC in my own right I've never been above mid-second quartile at absolute best, and at my age I'll never be better than I am today - or was yesterday. But that also means I'm at a point in my career where I best spend my time helping other engineers develop; if I can master the skill of making my accumulated decades of experience and knowledge, and whatever little wisdom may be found in that, of use to engineers who still have the time and drive to make the most of it in ways I maybe failed to do - well, I'll take it, you know? It's not the kind of work I came into this field to do, I suppose, but I've done enough of it by now to know both that I can do it, and that it is very much worth doing.
So, in that spirit and quite seriously meant - I might be off work sick this afternoon, one peril I'm finding attends ever more frequently upon advancing age, but evidently that's no barrier to improving the core skill that I intend to build the rest of my career around. Thank you for taking the time and trouble to help make that possible today, and here's likewise hoping you find all the success you desire!