A tiny Docker image to serve static websites

KronisLV 4 years ago |

> My first attempt uses the small alpine image, which already packages thttpd:

  # Install thttpd
  RUN apk add thttpd

Wouldn't you want to use the --no-cache option with apk, e.g.:

  RUN apk add --no-cache thttpd

It seems to help with the image size, saving about 2MB here:

  REPOSITORY       TAG       IMAGE ID       CREATED          SIZE
  thttpd-nocache   latest    4a5a1877de5d   7 seconds ago    5.79MB
  thttpd-regular   latest    655febf218ff   41 seconds ago   7.78MB

It's a bit like cleaning up after yourself with apt based container builds as well, for example (although this might not always be necessary):

  # Apache web server
  RUN apt-get update && apt-get install -y apache2 libapache2-mod-security2 && apt-get clean && rm -rf /var/lib/apt/lists /var/cache/apt/archives

But hey, that's an interesting goal to pursue! Even though personally I just gave up on Alpine and similar slim solutions and decided to base all my containers on Ubuntu instead: https://blog.kronis.dev/articles/using-ubuntu-as-the-base-fo...

tadbit 4 years ago |

I love stuff like this.

People will remark about how this is a waste of time, others will say it is absolutely necessary, even more will laud it just for the fun of doing it. I'm in the middle camp. I wish software/systems engineers would spend more time optimising for size and performance.

memish 4 years ago | |

Wouldn't removing Docker entirely be a good optimization?

kube-system 4 years ago | | |

Docker adds other value to the lifecycle of your deployment. An "optimization" where you're removing value is just a compromise. Otherwise we'd all run our static sites on UEFI.

jart 4 years ago | | |

Redbean supports UEFI too, although we haven't added a bare metal implementation of Berkeley sockets yet. It's on the roadmap for the future.

jamal-kumar 4 years ago | | |

oh wow are you justine?

i've been meaning to ask you this for a decade. whatever happened to when you wrote a blog with insanely irritating serifs that connected certain letters together? what was the rationale behind that? never seen it since

I'm insanely impressed by APE and redbean by the way, blows OP out of the water!

jart 4 years ago | | |

Oh you mean the blog with the long s? I was reading a lot of books at the time that were written before 1800 and I found it so fascinating how different typography was back then. I found a font I could pay for called Quant that did a really good job reproducing archaic ligatures and the long s, so I used it on a blog for a short period of time. Sadly it got negative feedback. So lately I've been focusing on https://justine.lol/ which uses Roboto. I'm glad to hear you're enjoying it!

ar_lan 4 years ago | | |

This is a really good point, and something I think a lot of people forget. It's true, the most secure web app is one written with no code/no OS/does nothing.

Adding value is a compromise of some increased security risk - and it's our job to mitigate that as much as possible by writing quality software.

vanviegen 4 years ago | | |

What value is that, for running such a simple piece of software?

paxys 4 years ago | | |

You can have multiple instances of the server running on the machine without interfering with each other.

You can limit file system access for the server to only a certain folder.

You can similarly limit port access and manage conflicts (e.g. multiple servers can think they are listening on a certain open port but those are mapped to something else on the host).

If you have multiple machines with different operating systems or even architecture you can deploy your server as a container more easily on them without needing to rebuild or test for each one.

You can have the same environment running locally during development or on CI servers without complicated setups.

The system can scale out a lot more easily to hundreds/thousands of machines if you decide to use something like Kubernetes.

vorticalbox 4 years ago | | |

A few off the top of my head.

The ability to pull the image on to any machine without needing to clone the source files and build it.

Smaller images mean faster pod starts when you auto scale.

spicybright 4 years ago | | |

You have to log in to some docker repository anyway and know the series of commands to actually run it. Cloning a repo and running a shell script is probably a lot easier and faster than that.

What kind of work are you doing that requires really fast auto scaling? Is a few minutes to spin up a new instance really that cumbersome? Can you not signal for it to spin up a new instance a tiny bit earlier than when it's needed when you see traffic increases?

bryanrasmussen 4 years ago | | |

Ok when you say a few off the top of my head it implies that there are a bunch and these are like some super obvious ones, but it sounds like this is actually only useful if you have a bunch of infrastructure set up to serve sites for projects and customers that need containerization and then you just throw this simple little static site docker instance in there because when you're maintaining a lot of docker instances it is just simpler to do?

Which seems like sort of an edge case for value adding, and makes me feel like it really doesn't add any value to do this unless you already are doing it for everything, and thus you really wouldn't be throwing out any value by just serving the static site without the docker overhead.

marksomnian 4 years ago | | |

Adding to some of the other responses, one reason I chose to deploy a SPA I'm working on as a Docker image is atomicity - if I want to deploy a newer version I simply switch out the tag in my container orchestrator's config (Nomad in this case, but the same principles apply to k8s and friends) and it's guaranteed that the new deployment will be pristine, without the risk of leftover files from a rsync or similar - and if I need to roll back I do the exact same.

Filligree 4 years ago | | |

There’s value in that, but you don’t need Docker with its related debugging and maintenance overhead to get it. NixOS, among other tools, will do the same thing while constructing a “flat” operating system image.

Anything else, though? There’s got to be more to it than that, or it wouldn’t be as popular as it is.

jamal-kumar 4 years ago | | |

yeah see some of us still do this on OSes that haven't turned into a giant bloated hodgepodge of security theatre and false panacea software.

docker has dead whale on the beach vibes. what value does it offer to those of us who have moved on from the mess linux is becoming?

kube-system 4 years ago | | |

I’m not suggesting it has value to everyone. I’m suggesting it has value to the people who see value in it.

jamal-kumar 4 years ago | | |

I'm super curious to know what the value to people who see that happens to be. It's serving static websites, why do I need to wrap THAT of all things in a container?

Really, enlighten me

themacguffinman 4 years ago | | |

Out of curiosity, what OS have you moved on to?

jamal-kumar 4 years ago | | |

OpenBSD for the past 10 years or so has been really good to me and my clients, and it just keeps on getting better while linux keeps on getting worse. It's kind of a no-brainer these days.

Hell if you just need to serve static HTTP it even has its own built in webserver now:

https://man.openbsd.org/httpd

throwanem 4 years ago | | |

In terms of CPU cycles and disk space, maybe. In terms of engineer cycles, absolutely not. Which costs more?

danuker 4 years ago | | |

Hmm, a SCP shell script on my laptop, prompting my SSH key's password and deploying the site to the target machine?

Or a constantly-updating behemoth, running as root, installing packages from yet another unauditable repository chain?

somenewaccount1 4 years ago | | |

You forgot the step where you had to provision that server to run the software and maintain all the system's security updates on the live running server, and that server requires all the same maintenance, with or without docker. And if you fuck it up, better call the wife and cancel Sunday plans because you forgot how it all gets installed and ......yeah, just use docker :p

danuker 4 years ago | | |

Debian offers unattended upgrades: https://wiki.debian.org/UnattendedUpgrades

And security updates, as you said, are needed regardless of whether you run Docker on top. I think Docker is a needless complexity and security risk.

hyperhopper 4 years ago | | |

The first option is something custom that you had to write yourself and remember how to use years later, or explain how to use to others

The second option is standardized and usually the same 1 or 2 commands to run anywhere

marginalia_nu 4 years ago | | |

Building simpler systems allows you to save on all three.

throwanem 4 years ago | | |

That's true, but in my experience there is nothing mutually exclusive in systems being simple and systems running Docker.

Granted, you do need to learn how Docker works, and be ready to help others do likewise if you're onboarding folks with little or no prior experience of Docker to a team where Docker is used. That's certainly a tradeoff you face with Docker - just as with literally every other shared tool, platform, codebase, language, or technological application of any kind. The question that wants asking is whether, in exchange for that increased effort of pedagogy, you get something that makes the increased effort worthwhile.

I think in a lot of cases you do, and my experience has borne that out; software in containers isn't materially more difficult to maintain than software outside it if you know what you're doing, and in many cases it's much easier.

I get that not everyone is going to agree with me here, nor do I demand everyone should. But it would be nice if someone wanted to take the time to argue the other side of my claim, rather than merely insisting upon it with no more evident basis than arbitrarily selected first principles given no further consideration in the context of what I continue to hope may develop into a discussion.

marginalia_nu 4 years ago | | |

Docker absolutely ups the complexity.

Whatever set-up your application needs is still a necessary step in the process. But now you've not only added more software in Docker, with its Docker registry and Docker's state on top of the application's state, you've also introduced multiple virtual filesystems and a layer of mapping between those and locations on the host, plus mappings between the container's ports and the host's ports. There is no longer a single truth about the host system. The application may see one thing and you, the owner, another. If the application says "I wrote it to /foo/bar", you may look in "/foo/bar" and find that /foo doesn't even exist.

All of that is indirection and new ways things can be that did not exist if you just ran your code natively. What is complexity if not additional layers of indirection and the increase of ways things can be?
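
The /foo/bar mismatch described above can be made concrete with a small sketch. It models bind mounts as a table of container paths and host paths, then translates a path the application reports into the path the host owner would have to look at. The mount table, the example paths, and the `container_to_host` helper are all hypothetical; anything not covered by a mount lives only in the container's own filesystem layers, which this sketch simply reports as `None`.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

# Hypothetical bind mounts: container path -> host path.
MOUNTS: Dict[str, str] = {
    "/foo": "/srv/app-data",
    "/var/log/app": "/var/log/containers/app",
}


def container_to_host(path: str, mounts: Dict[str, str] = MOUNTS) -> Optional[str]:
    """Translate a path seen inside the container to the host path, or None."""
    target = PurePosixPath(path)
    # Check the longest mount point first so nested mounts win.
    for mount_point in sorted(mounts, key=len, reverse=True):
        try:
            relative = target.relative_to(mount_point)
        except ValueError:
            continue  # path is not under this mount point
        return str(PurePosixPath(mounts[mount_point]) / relative)
    return None  # exists only inside the container's filesystem layers


print(container_to_host("/foo/bar"))    # /srv/app-data/bar
print(container_to_host("/etc/hosts"))  # None
```

The application's "/foo/bar" and the owner's "/srv/app-data/bar" are the same file under two names, which is exactly the extra layer of mapping the comment is pointing at.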

krick 4 years ago | | |

Does it? Or, rather, is it even simpler?

To host something as a docker container I need 2 things: to know how to host docker, and a docker image. In fact, not even an image, just a dockerfile/docker-compose.yaml in my source code. If I need to host 1000 apps as docker containers, I need 1000 dockerfiles and still need to know (and remember) 1 thing: how to host docker. That's 1 piece of knowledge I need to keep in my head, and 1000 I keep on a hard drive, most of the time not even caring what instructions are inside them.

If I need to host 1000 apps without dockerfiles, I need to keep 1000 pieces of knowledge in my head. thttpd here, nginx to java server there, very simple and obvious postgres+redis+elastic+elixir stack for another app… Yeah, sounds fun.

qbasic_forever 4 years ago | |

I think the real value is just focusing on the absolute minimum necessary software in a production docker/container image. It's a good practice for security with less surface area for attackers to target.

encryptluks2 4 years ago | |

The difference between a systems engineer and a software engineer is that to a systems engineer a half functioning 5MB docker image is okay but to a software engineer a fully functional 5GB Node image is fine.

ttty 4 years ago | | |

Premature optimisation? 5 gb doesn’t matter. It’s not great, don’t get me wrong.

mr-karan 4 years ago |

While this is a remarkably good hack and I did learn quite a bit from reading the post, I'm curious about the motivation behind it. A docker image even if it's a few MBs with Caddy/NGINX should ideally be just pulled once on the host and sit there cached. Assuming this is OP's personal server and there's not much churn, this image could be in the cache forever until the new tag is pushed/pulled. So, from a "hack" perspective, I totally get it, but from a bit more pragmatic POV, I'm not quite sure.

kra34 4 years ago |

I love it! Can you add SSL though? Does it support gzip compression? What about Brotli? I like that it's small and fast so in addition to serving static files can it act as a reverse proxy? What about configuration? I'd like to be able to serve multiple folders instead of just one?

Where can I submit a feature request ticket?

stevefan1999 4 years ago | |

https://github.com/weihanglo/sfz

check this out

CameronNemo 4 years ago | | |

This seems to be intended for local host usage exclusively. Is anyone using this for public or even internal http hosting?

nilsandrey 4 years ago | |

I think it's not exactly an active project, but it's open on GitHub[1], so I guess we can try to open issues there.

- [1] https://github.com/lipanski/docker-static-website

0xbadcafebee 4 years ago |

If you use "-Os" instead of "-O2", you save 8kB!

However, Busybox also comes with an httpd... it may be 8.8x bigger, but you also get that entire assortment of apps to let you troubleshoot, run commands in an entrypoint, run commands from the httpd/cgi, etc. I wouldn't run it in production.... but it does work :)

kissgyorgy 4 years ago |

Redbean is just 155Kb without the need for alpine or any other dependency. You just copy the Redbean binary and your static assets, no complicated build steps and hundred MB download necessary. Check it out: https://github.com/kissgyorgy/redbean-docker

mrweasel 4 years ago | |

There's also the 6kB container, which uses asmttpd, a webserver written in assembler.

https://devopsdirective.com/posts/2021/04/tiny-container-ima...

danuker 4 years ago | |

Wow! This is the Redbean which is an "Actually Portable Executable", or a binary that can run on a range of OSes (Linux, Windows, MacOS, BSDs).

http://justine.lol/ape.html

adolph 4 years ago | | |

Well worth a read:

I believe the best chance we have of [building binaries "to stand the test of time with minimal toil"], is by gluing together the binary interfaces that've already achieved a decades-long consensus, and ignoring the APIs. . . . Platforms can't break them without breaking themselves.

tyingq 4 years ago | |

And it does https/tls, where thttpd does not.

somenewaccount1 4 years ago | | |

I'm confused how the author considers thttpd more 'battle tested' if it doesn't resolve https.

Either way though, it's a great article and I'm glad the author took the time to write it. His docker practices are wonderful; I wish more engineers would use them.

SahAssar 4 years ago | | |

The term 'battle tested' has nothing to do with the number of features; it's about how proven the stability and/or security of the included features are. The term also usually carries a heavy weight towards older systems that have been used in production for a long time since those have had more time to weather bugs that are only caught in real-world use.

bornfreddy 4 years ago | | |

Also, https is often dealt with on a different server (load balancer for example).

jart 4 years ago | | |

Yes but it's nice to have the SSL built-in for when you want it. Web servers like Varnish and thttpd take a really hard stance on the issue, where they don't want to touch the crypto at all. Honestly, I don't blame them because implementing SSL is prodigiously technical and emotional. One of the things I do is I offer a file called redbean-unsecure.com that has zero-security baked-in so that folks who love redbean but want to handle the security separately themselves can do so. But like I said when we don't have strong opinions on separation of concerns, having a fast snappy tiny zero config SSL is nice.

cassandratt 4 years ago | | |

"Battle tested" typically means that the code has been running for a long time, bugs found, bugs squashed, and a stability has been attained for a long time. Its usage predates the "information wars", back when we really didn't think about security that much because nothing was connected to anything else that went outside the companies, so there were no hackers or security battles back then. So I suspect this is the author's frame of reference.

mg 4 years ago |

For static websites, is there any reason not to host them on GitHub?

Since GitHub Pages lets you attach a custom domain, it seems like the perfect choice.

I would expect their CDN to be pretty awesome. And updating the website with a simple git push seems convenient.

jandeboevrie 4 years ago |

But why would you prefer Docker like this over, for example, running thttpd directly? Wouldn't that save you a lot of RAM and indirection?

qbasic_forever 4 years ago | |

Run this on a linux host and it isn't that much different from running thttpd directly. There's just some extra chroot, cgroups, etc. setup done before launching the process but none of that gets in the way once it's running. Docker adds a bit of networking complexity and isolation, but even that is easily disabled with a host network CLI flag.

It's really only on windows/mac where docker has significant memory overhead, and that's just because it has to run a little VM with a linux kernel. You'd have the same issue if you tried to run thttpd there too and couldn't find a native mac/windows binary.

somenewaccount1 4 years ago | |

For one, because his home server provides multiple utilities, not just this one project, and without docker he starts to have dependency conflicts.

He also likes to upgrade that server close to the edge, and if that goes south, he wants to rebuild and bring his static site up quickly, along with his other projects.

zahllos 4 years ago |

The only thing I would change: I would use Caddy and not thttpd. This way the actual binary doing the serving is memory-safe. It may well require more disk space, but it is a worthwhile tradeoff I think. You can also serve over TLS this way.

bachmitre 4 years ago |

How many requests can thttpd handle simultaneously, compared to, say, nginx? Being small is a moot point if you then have to instantiate multiple containers behind a load balancer to handle simultaneous requests.

0xb0565e487 4 years ago |

I don't know why there is a big fish at the top of your website, but I like it a lot.

adolph 4 years ago | |

Agreed. GIS says at least some are from the NYPL:

https://nypl.getarchive.net/media/salmo-fario-the-brown-trou...

ludwigvan 4 years ago | |

Me too!

Also the other blog posts have different big fishes, so check them out as well.

nitinagg 4 years ago |

For static websites, hosting them directly on S3 with cloudfront, or on cloudflare might be a better option?

flatiron 4 years ago | |

How’s the free tier on aws for s3 and cloudfront? I can think of free alternatives that are equally as good if not better.

hedora 4 years ago | | |

S3 + cloudfront + lambda is costing me pennies per month for a trivial site. What are the free alternatives that beat it?

Requirements:

- rsync style publishing

- not supported by tracking users.

- raw static file hosting (including html)

- redirect foo.com/bar to foo.com/bar/index.html (this is why I need lambda...)

- zero admin https certificate management
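
The redirect requirement in the list above boils down to a small URI rewrite. The sketch below shows only that logic as a pure function; the name `rewrite_uri` and the rule "a last path segment without a dot is treated as a directory" are assumptions for illustration, and wiring it into Lambda or any other edge function is left out.

```python
import posixpath


def rewrite_uri(uri: str, index: str = "index.html") -> str:
    """Map directory-style request paths to their index document."""
    if uri.endswith("/"):
        return uri + index
    last_segment = posixpath.basename(uri)
    # Assumption: a final segment without a dot names a directory, not a file.
    if "." not in last_segment:
        return uri + "/" + index
    return uri


print(rewrite_uri("/bar"))        # /bar/index.html
print(rewrite_uri("/style.css"))  # /style.css
```

A heuristic like this misfires on extensionless files, so a real deployment would probably need an explicit list of exceptions.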

flatiron 4 years ago | | |

GitHub pages gives you all this except the redirect and replace rclone with…git and is free (although evil Microsoft blah blah)

MuffinFlavored 4 years ago | |

or https://pages.github.com/ maybe?

souenzzo 4 years ago |

Is it smaller than darkhttpd?

https://unix4lyfe.org/darkhttpd/

kristianpaul 4 years ago |

Why do we need this when you can run a web server inside systemd?

hedora 4 years ago | |

This doesn't hijack a bunch of stuff on the host OS and replace it with garbage versions.

I want things like DNS, X11 screen locking, ssh session management, syslog, etc. to just work. I can't figure out how to fix any of that stuff under systemd, and at least one is always broken by default in my experience.

wereHamster 4 years ago |

I used this as a base image for a static site, but then needed to return a custom status code, and decided to build a simple static file server with go. It's less than 30 lines, and image size is <5MB. Not as small as thttpd but more flexible.
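
The commenter's server was written in Go and its source isn't shown, so the following is only a rough equivalent using Python's standard library, not their implementation. It serves files from a directory and answers selected paths with a fixed status code; the `status_overrides` table, the `/gone` path in the usage note, and the helper names are hypothetical.

```python
import functools
import http.server


class StaticHandler(http.server.SimpleHTTPRequestHandler):
    """Serve static files, but answer some paths with a fixed status code."""

    def do_GET(self):
        # status_overrides is a plain attribute set on the server in make_server().
        # Note: self.path includes any query string; this sketch matches it verbatim.
        status = self.server.status_overrides.get(self.path)
        if status is not None:
            self.send_error(status)  # sends the status line and a short error body
            return
        super().do_GET()  # normal static file handling


def make_server(directory, status_overrides, host="127.0.0.1", port=0):
    """Build a server for `directory`; port 0 lets the OS pick a free port."""
    handler = functools.partial(StaticHandler, directory=str(directory))
    server = http.server.ThreadingHTTPServer((host, port), handler)
    server.status_overrides = dict(status_overrides)
    return server
```

Something like `make_server("public", {"/gone": 410}, port=8080).serve_forever()` would then serve the `public` folder and return 410 for `/gone`. Only GET is customised here; HEAD requests still go through the default handler.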

rhim 4 years ago |

althttpd beats this: https://hub.docker.com/r/rouhim/althttpd/tags (~63 KB)

superkuh 4 years ago |

Well, this will definitely serve an unchanging static website. But unchanging static websites are just archives. Most static websites have new .html and other files added on a whim regularly.

EnigmaCurry 4 years ago | |

You can just mount an external volume on top of /home/static and change the files that way. But for a single-page-app I think it works great to be able to version the entire site in the docker image tag.

CameronNemo 4 years ago |

I do something similar at work for internal only static docs.

The image is a small container with an http daemon. It gets deployed as a statefulset and I mount a volume into the pod to store the static pages (they don't get put into the image). Then I use cert-manager and an Istio ingress gateway to add TLS on top.

Updating the sites (yes, several at the same domain) is done via kubectl cp, which is not the most secure but good enough for our team. I could probably use an rsync or ssh daemon to lock it down further, but I have not tried that.

sgtnoodle 4 years ago |

Seems pretty silly. That being said, I did the exact same thing a couple years ago for work. My first attempt was to use busybox's built-in httpd, but it didn't support restarts. I vaguely recall settling on the same alpine + thttpd solution. The files being served were large, so the alpine solution was good enough.

amanzi 4 years ago |

I assume the author would then publish this behind a reverse proxy that implements TLS? Seems like an unnecessary dependency, given that Docker is perfect for solving dependency issues.

EnigmaCurry 4 years ago | |

That's certainly what I would do. I think it's great that thttpd does not include a TLS dependency itself. Every once in a while I find a project that forces their own TLS and it's annoying to undo it.

mro_name 4 years ago |

Docker, really?

Sounds like brain surgery in order to make a jam sandwich to me.

nilsandrey 4 years ago | |

Locally, it could be easier to run a Docker image in the background instead of keeping another console open to serve the files: run it and forget it, then use it from whatever dependent project you're working on (one that depends on the static content). I agree that in the cloud it's better to use the plethora of services available for static content directly, like Cloudflare.

mro_name 4 years ago | | |

> plethora of services available for static content

when I think of static content, I think of buying a domain name + shared hosting for monthly EUR 2,-.

And not assigning rights nor control but having a legal claim on both service and name. Am I missing something?

Casteil 4 years ago | |

It's a good way to compartmentalize if you've got a lot going on on a single machine.

mro_name 4 years ago | | |

> compartmentalize

a static website, srsly?

Casteil 4 years ago | | |

Uh, yeah? Could host dozens (or even hundreds) of different sites/domains with different degrees of functionality in different languages/frameworks for different clients on one machine.

mro_name 4 years ago | | |

> have millions of files

Congratulations. I have millions of files on my static sites. So what? Would you recommend a container for each? To what purpose?

> different degrees of functionality

We're still talking static sites. There is no 'functionality', right?

Casteil 4 years ago | | |

> Congratulations. I have millions of files on my static sites. So what? Would you recommend a container for each? To what purpose?

...what? Where are you quoting that from? No, I'm not recommending Docker if all you do is host static pages.

> We're still talking static sites.

No, I said "if you've got a lot going on on a single machine" - I didn't just mean static sites. I did respond with "different sites/domains with different degrees of functionality in different languages/frameworks", which means a variety of services, e.g. one client may be a static page, another might use a backend/API in Node, and another in C#/.NET, etc. Heck, you might even use containerized DBs for some of them. Hence Docker.

krnlpnc 4 years ago |

Up next: how to serve a LAMP site from a single docker image

hedora 4 years ago | |

It's pretty easy. I put the data in a bind mount on btrfs on my synology NAS. It snapshots the FS and does an incremental backup with hyper backup each night. The backup is crash coherent, zero downtime, and the RDBMS doesn't need to know about it.

This is really useful for tiny little services that each want a different database server.

patrakov 4 years ago |

That's a good educational resource to show to people who need to learn about multi-stage Docker builds.

pojzon 4 years ago |

Tbh the moment the author thought about self-hosting anything to serve static pages -> it was already too much effort.

There are free ways to host static pages and extremely inexpensive ways to host static pages that are visited millions of times per month, simply by using services built for that.

krick 4 years ago | |

So, the best free or extremely inexpensive way to host static pages that are visited a lot would be...?

riffic 4 years ago | | |

netlify, amplify, cloudflare pages, vercel, et cetera

It's a crowded field now

cutler 4 years ago |

Is nothing sacred? The KuberDocker juggernaut leaves no stone unturned. Laughable given that Docker was originally designed for managing massive fleets of servers at FAANG-scale.