We are airgapped so can’t use Let’s Encrypt. We inject the certs into our containers via Ansible or Docker Compose.
I haven’t checked recently but at the time nginx on Windows used select() and envoy was either beta or needed a recent version of the Windows kernel that not all customers were running.
We still use it today.
From a casual glance it does nothing haproxy doesn't already do, at a fraction of the cpu cost.
Load their Photoprism image on a standard server with only IPv6 (as a v4 address costs extra) and certificates will not get generated; logs point to Traefik although the solution is modifying Dockerfiles; thanks Dockerphiles, for insisting your software is the answer to everything server...
Aren't caddy and traefik fairly comparable? I've only used them both lightly so I may be missing the core point of each, but I thought of them as very similar.
- https://caddyserver.com/docs/modules/http.reverse_proxy.upst...
https://www.haproxy.com/blog/consul-service-discovery-for-ha...
Totally not a security issue. Source: trust me bro.
into the trash it goes. anyone who supports https everywhere and even slightly tolerates SNI is a fool.
It's pretty much mandatory if you intend to serve multiple domains with different certificates from the same host/proxy, which seems like a very very common use case, and there is no alternative to this right now.
"we need MitM for performance". listen to yourself. if some optimization breaks security, you do not optimize.
Check it out in our docs: https://docs.kraft.cloud/guides/traefik/
It's also possible to start traefik and other services together using Compose files: https://docs.kraft.cloud/guides/features/compose/
No. Not even remotely true. If you can write a coherent argument that substantiates this claim then I will address it.
It doesn't. SNI doesn't leak the URL being accessed, or anything that isn't encoded in the hostname.
E.g., we can imagine writing or using a plugin to figure out some upcoming fancy sticky session routing logic based on routes/content vs just the user IP, but there are easier and more 'with the grain' solutions than with what Caddy exposes today, afaict
(Agreed tho: The reverse proxy module, for more typical cases, is awesome and we have been enjoying it for years!)
Also yes, Caddy does service discovery if you use https://github.com/lucaslorentz/caddy-docker-proxy, configuration via Docker labels. Or you can use dynamic upstreams (built-in) https://caddyserver.com/docs/caddyfile/directives/reverse_pr... to use A/AAAA or SRV DNS records to load your list of upstreams.
(this basically solves the sticky shared chatroom websocket problem for us when routing to resources with gravity, in our case, multiuser data science notebooks!)
This app is very capable and flexible, but is still in development. Please expect breaking changes.
And it does not seem to be really seeing many new commits recently, so it feels pretty much "beta" at this point.
I have used it since v1 and I routinely get lost in their docs, and get immensely frustrated. I have been using Caddy for smaller projects simply because its documentation is not as terrible (though not great by any stretch).
Technical writers: documentation by example is good only for newbies skimming through. People familiar with your product need a reference and exhaustive lists, not explanation for different fields spread over 10 tutorial pages. Focus on those that use the product day in and day out, not solely on the "onboarding" procedure.
This is my pet peeve and the reason why I hate using Ansible so damn much, and Traefik to a lesser extent.
I agree. We all would benefit by giving more exposure to documentation frameworks such as https://diataxis.fr
For folks who might recognize the author's name:
Daniele Procida: Director of Engineering at Canonical. Creator of Diátaxis and BrachioGraph. Django core developer. Fellow of the Python Software Foundation.
So a beginner has to jump a BUNCH of pages to get a primer, and an expert has to bookmark the couple actually-useful pages and later give up and just look at github for specific operators/processors when they already know the basic config inside out.
Further, I found a lot of little bugs that are hard to Google, or when Googling I found open issues that are either known and being worked on, or have no response at all.
I ended up just throwing it in the garbage and using direct connectors. I like what Otel is trying to achieve, but it feels extremely opaque and half baked at the moment.
It offers sample applications and a website that shows in a step-by-step manner what you have to do to get OpenTelemetry configured in your apps. My goal is to keep the sample apps to the minimum and focused on a single goal: E.g., I want to add tracing to my app; I want to record metrics; I want to correlate logs with traces etc.
I have lots of ideas and things in the backlog, such as collector recipes.
It's all OSS as well, so anyone can contribute with more samples :)
All in the name of selling products which abstract those parts, consulting or courses.
Then they transitioned to OpenTelemetry – for very good, justifiable, "good community member" reasons – and yikes, everything got so much more complicated. We ended up writing our own moral equivalent to the Beeline SDK. (And Honeycomb have followed up since with their own wrappers.)
There's so much I love about Open Source, but piles and piles of wildly generic, unopinionated code... ooft. :-)
It's sort of assumed that you are going to do extremely simple tasks on very flat data structures. That doesn't tend to be the reality that most of us live in. And to really get the most out of these languages you have to understand an entire unspoken set of rules on how to use yaml. That's never really pointed out in the docs.
Additionally, there are docs for the unique settings of each module, but for the standard settings it's rarely clear how to operate on the data that might be returned, or how to combine it with anything mildly complex. You are given a dozen one-stanza examples for each item, like a stack of ingredients, and then told to bake a cake.
I've had this experience with basically every one of the various yaml interpreter systems I've used.
After a few 100k lines of yaml I can get things done but the docs are useless other than a listing of settings.
I've found a pretty good workflow with using ansible-doc though, with two-three aliases that I use constantly:
alias adl='ansible-doc --list'
alias adls='ansible-doc --list | less -S'
alias ad='ansible-doc'
Then I'll:
1. Use adls to quickly search (in less with vim bindings) for relevant commands,
2. Check up the docs with `ad <command>`.
3. Almost always immediately jump to the end (`G`) to see the examples, which typically provides a direct answer to what I need.
Since authoring Ansible scripts is so dependent on the docs, I think they really should make this process work better out of the box though, providing some interface to do these lookups quicker without a lot of custom aliases.
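For anyone who wants the "jump straight to the examples" step without the aliases, here is a rough Python sketch. It assumes that the plain-text output of `ansible-doc <module>` contains an `EXAMPLES:` header at the start of a line, followed later by another all-caps header such as `RETURN VALUES:`. That layout is an assumption about the output format, not something the tool guarantees, and the function names are made up for illustration.

```python
import subprocess


def extract_examples(doc_text: str) -> str:
    """Return the text between an 'EXAMPLES:' header and the next all-caps header.

    Assumes headers sit at column 0 and look like 'SOME HEADER:'.
    Returns an empty string when no EXAMPLES section is found.
    """
    collected = []
    in_examples = False
    for line in doc_text.splitlines():
        stripped = line.strip()
        if stripped == "EXAMPLES:":
            in_examples = True
            continue
        if in_examples:
            looks_like_header = (
                not line.startswith(" ")
                and stripped.endswith(":")
                and stripped[:-1].replace(" ", "").isupper()
            )
            if looks_like_header:
                break
            collected.append(line)
    return "\n".join(collected).strip()


def show_examples(module: str) -> str:
    """Run ansible-doc for one module and return only its examples section."""
    result = subprocess.run(
        ["ansible-doc", module],
        capture_output=True,
        text=True,
        check=True,
    )
    return extract_examples(result.stdout)
```

Calling `print(show_examples("ansible.builtin.copy"))` would then print just the examples, provided `ansible-doc` is on the PATH.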
[0] https://docs.ansible.com/ansible/latest/collections/index_mo...
The docs are very straightforward and thorough.
Thanks for Caddy btw. Neat little tool.
I would rather have more examples, and kinda _advanced_ and complex ones, rather than the trivial ones we see in the docs.
Even though I had working v1 configs and know-how about the lingo / architecture like routes / services, I still struggled for a day or two to properly configure pretty simple workflows in v2 like:
* add TLS with LetsEncrypt
* configure multiple domains
* configure multiple services
* add Basic Auth for some domains
That said, more detailed and extensive docs would be much better.
I also remember finding things in github issue comments that worked as bugfix/workaround of something from the docs.
PS. For now I've moved to Caddy for simplicity and better Caddy DSL compared to yaml/label verbose config.
As far as "a lot of words written, can't find what you need", Fortinet is my poster child there (based on trying to use it a decade ago). Everything I looked up there had 10,20,30 pages of introductory material with the Fortinet stuff spread throughout it.
What you are given is https://docs.ansible.com/ansible/latest/playbook_guide/playb... and you need basically to read/scan each example until you find what you need [1]. Do you call that good, especially when these are basically the only way of doing anything a little complex? That's a sure way of killing my flow and productivity in its tracks. I have been through this page in anger a dozen times, and I still have no idea what Ansible filters can or cannot do.
Also, using Google to find stuff is "cheating". The goal of documentation is to be able to use it as reference; if you need an external tool to find anything in it, that defeats its purpose a bit. When people wrote documentation books, they had to make sure it's usable, legible and efficient. These days apparently that's become lost art.
1: these examples are not even exhaustive, because they don't list all the builtin Jinja filters; chances are that what you need isn't listed on that page, but you should instead refer to https://tedboy.github.io/jinja2/templ14.html
- Having documentation split between v1 and v2, that is similar yet different enough to yield half-baked configurations before you realize what you did wrong. The website itself provides the barest of subtle changes to distinguish the two. Edit: I learned all this prior to v3.
- Supporting multiple config formats (TOML and YAML) which makes it that much harder to hunt down relevant forum posts for support. That wouldn't be a huge problem if it weren't for things that you need that aren't in the documentation (above)
- Multiple configuration _modes_. You can set options on the CLI, or in config files, and they are not 100% reflected between the two; some things must be in config files no matter what. Config files themselves are split between "dynamic" and "static" configs, and you must place the correct options in the right file.
- The one thing that Traefik does well is routing traffic to/from containers. Container labels are used to drive this. How to map those label namespaces to various internal and user provided aspects of Traefik is unclear in the docs, at best.
- Traefik exposes a dashboard web GUI by default. Yet much of the useful diagnostic and troubleshooting information is only found in the container logs.
Retiring v1 completely, picking a single configuration language/mode, and providing a rich library of example docker-compose configs, would go a very long way to securing this project's future.
You really need at least three documentation targets:
- onboarding the newbies workflows/tutorials
- intermediate "focus on the important bits" workflows/tutorials
- exhaustive references
There might be other useful ones as well, but I never see those three hit at the same time adequately.
Eg: https://doc.traefik.io/traefik/routing/services/
If you visit that page in your desktop browser you'll get fewer words per column than when viewing it on the iPad (works even in the dev tools window). Mind blowing.
Well said, this extends far beyond Traefik. Far too much documentation these days is tailored for people who have never used software of its type. This was a workable strategy during the Great Developer Boom, but that's more or less over now.
As a developer who didn't come from this Boom, I have been constantly frustrated by this trope, and I hope the changes in the industry will tip the scales back toward solid reference documentation, so that I can feel confident in deploying more of these technologies.
Putting that more general note aside, I have been a Traefik user for years and I do recommend it. But a lot of what it does is difficult to cite using solid docs.
I'm happy with it though, it's a great piece of software. I wonder is there any other product out there with a similar feature set?
I do sometimes struggle to find the right doc when I'm searching for something about ansible core itself, but that doesn't happen too often.
https://registry.terraform.io/providers/hashicorp/aws/latest...
If I can't find the answer to what I need there I usually resort to LLMs, they are surprisingly good at fetching the info you need out of these massive documentations. The failure rate is quite high though, so a lot of trial and error is required, but the LLM at least gives you some hints about where to look for it.
I use duckduckgo and adding !ansible to my search usually gets me what I need pretty directly.
Hopefully as an aside (I know very little about traefik so maybe I am talking about them too and don’t know it), it seems like in the time since I abandoned Java they have weaponized that architectural strategy and I have no patience for it. I look at that sort of documentation and my eyes glaze over. Or if they don’t I feel disgust or anger and all three result in my stomping off.
Opentelemetry, particularly the stats code (vs the span code) triggered a lot of this in me. It has several sets of documentation that say different things. It took me a long time to figure out how to connect the amorphous dots, and then I didn’t entirely agree with their solution anyway.
However, I find it amusing that you wish there was a better reference. I think getting to the initial setup is quite hard. Once you have that, extending it is straightforward.
One pdf please. The book concept works!
You know whose docs blow too? Mellanox. I hate their stuff.
And to give credit where due: intel does a damn good job.
You are discounting massive amounts of unpaid labor done specifically for people like you. At this point, if you can't find what you're looking for, it's on you. Maybe do a little bit of your own homework instead of throwing your hands up after 2 minutes and crying to the maintainers.
And, to be clear, I have no idea if the person you're responding to's criticism is valid. But I also know that your response does not negate their criticism at all.
And when there isn't just straight documenting of what's been implemented then it is an unreasonable gate to usage which limits customers to only the flows imagined by the technical writer.
Which itself breeds this sort of refusal to participate. Either the end user is ungrateful and needs to express that gratitude through silence or there's a smug moderator who's read everything and knows which paragraph of which tutorial has the answer and harangues anyone asking with a link and a "why didnt you read sentence 5 of paragraph 2 of a tutorial written 2 years and 3 major versions ago?"
We're moving back to NGINX.
All the features I need but *much* simpler.
You basically just have to pray the guy who made the module you need knows what he is doing, because there's no standards for documentation there either. Maintainers really need to put their foot down on random ass modules with 0 documentation providing critical functionality (i.e. S3 storage backend).
I read the article but I'm still not convinced Traefik has anything over Caddy for me. Maybe someone else does and can chime in.
I use a shell script to spin up K3s clusters and test apps I specify as a positional parameter on demand (leveraging the ttl.sh ephemeral container registry). The same script tears down the cluster when finished.
- Reverse proxying, with Docker labels for configuration. New workloads are picked up automatically (but I do need to attach workloads to Caddy's network bridge).
- TLS certificates
- Automatic DNS configuration (using yet another plugin, caddy-dynamicdns), so I don't have to worry too much about losing access to my stuff if my ISP decides to hand me a different IP address (which hasn't happened yet)
There are a few things I'm currently not entirely happy about my setup:
- Any new/restarting workload makes Caddy restart entirely, resulting in loss of access to my stuff (temporarily). Caddy doesn't hand off existing connections to a new instance, unfortunately.
- Using wildcard certs isn't as simple as it could/should be. As I don't want every workload to be advertised to the world through certificate transparency logs, I use wildcard certs, and that means I currently can't use simple Caddy file syntax I otherwise would with a cert per hostname. This is something I know is being worked on in Caddy, but still.
Anyway, I've used Traefik in k8s environments before, and it's been fairly pleasant, so I think I'll give it a go for my personal stuff too!
PS: Don't let this comment discourage you trying Caddy, it's actually really good!
I guess I was never the audience for Traefik. I just need an https enabled rev proxy. Or a basic-auth layer. In Caddy both are just 1 line, very concise, no layers (which I still don’t understand…)
I abandoned the dynamic/discovery/docker labelling functionality though; it was just too finicky and annoying to debug.
Instead I generate a static config file using a template engine, pretty much all my things are just a combination of host/target/port so very easy to generate the relevant sections - I don't really have any complicated middlewares other than handling TLS. It sounds like the author of the linked post has taken the same route.
The config gets generated through an ansible script and then gets copied to the machine where traefik is running - traefik watches the directory where that file is and auto-reloads on changes.
It's been working great!
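A minimal sketch of that "generate a static file from host/target/port" approach, in Python rather than a real template engine. The `Route` type, the function name, and the resolver name `letsencrypt` are all made up for illustration; the resolver name would have to match one defined in Traefik's static config. The emitted keys (`http.routers`, `rule`, `service`, `tls.certResolver`, `http.services`, `loadBalancer.servers`) follow Traefik v2's file provider layout as I understand it, so treat the output as a starting point rather than a verified config.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Route:
    name: str    # used as both router and service name (hypothetical convention)
    host: str    # public hostname to match
    target: str  # backend address
    port: int    # backend port


def render_dynamic_config(routes: Iterable[Route],
                          cert_resolver: str = "letsencrypt") -> str:
    """Render a Traefik file-provider YAML document from simple route tuples."""
    routes = list(routes)
    lines = ["http:", "  routers:"]
    for r in routes:
        lines += [
            f"    {r.name}:",
            f'      rule: "Host(`{r.host}`)"',
            f"      service: {r.name}",
            "      tls:",
            f"        certResolver: {cert_resolver}",
        ]
    lines.append("  services:")
    for r in routes:
        lines += [
            f"    {r.name}:",
            "      loadBalancer:",
            "        servers:",
            f'          - url: "http://{r.target}:{r.port}"',
        ]
    return "\n".join(lines) + "\n"
```

Writing the result into the directory Traefik watches (for example with `pathlib.Path(...).write_text(...)`) is then the only deployment step, which is roughly what the Ansible copy task does in the setup described above.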
Where they seem to differ is the licensing, enterprise model, source language, and data plane model (sidecar vs no sidecar).
Would be worth checking out, if you're currently considering options.
Should they be public behind such a proxy? (seems odd) Or should they be totally internal and then setup a Wireguard VPN to reach them?
Getting true zero downtime deployments only worked with their file provider but that’s a bit archaic these days.
I have IPv6 everywhere, with each service getting its own IPv6 address. Each service is managed in inetd-style (via systemd-socket-proxyd ), and so essentially listens directly.
For services that need to serve IPv4, I have a reverse proxy on my network edge that demuxes on TLS SNI to the corresponding IPv6 address.
The advantage here is never having to deal with complex applications, with their complex and changing configuration.
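The SNI demux on the edge can be sketched in a few lines of Python. This is not the poster's actual proxy, just an illustration of the idea: read the first TLS record, pull the hostname out of the ClientHello's server_name extension, and look up the backend. It assumes the whole ClientHello arrives in a single record (real clients can fragment it), and the backend table is hypothetical.

```python
import struct
from typing import Dict, Optional, Tuple


def parse_sni(record: bytes) -> Optional[str]:
    """Extract the SNI hostname from a raw TLS ClientHello record, if present."""
    try:
        if record[0] != 0x16:          # content type: handshake
            return None
        pos = 5                        # skip 5-byte record header
        if record[pos] != 0x01:        # handshake type: ClientHello
            return None
        pos += 4                       # handshake type + 3-byte length
        pos += 2 + 32                  # client_version + random
        pos += 1 + record[pos]         # session_id
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len          # cipher suites
        pos += 1 + record[pos]         # compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:          # server_name extension
                # layout: 2-byte list length, 1-byte name type, 2-byte name length
                if record[pos + 2] != 0:   # only host_name entries are handled
                    return None
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                name = record[pos + 5:pos + 5 + name_len]
                if len(name) != name_len:
                    return None
                return name.decode("ascii")
            pos += ext_len
        return None
    except (IndexError, struct.error, UnicodeDecodeError):
        return None


def pick_backend(record: bytes,
                 table: Dict[str, Tuple[str, int]]) -> Optional[Tuple[str, int]]:
    """Map the ClientHello's SNI hostname to an (IPv6 address, port) backend."""
    host = parse_sni(record)
    return table.get(host.lower()) if host else None
```

The proxy never decrypts anything: it peeks at the hostname, opens a TCP connection to the matching IPv6 address, and then copies bytes in both directions.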
Very difficult to just look at a file and see which bits are labels for the sake of it, and which bits are direct instructions to builtin features.
Go devops HATE YAML-based DSLs. We just put them there because there are no alternatives. JSON? Don't wanna go there. Fortunately there's CUE lang, but moving all these projects to accept CUE isn't that easy either.
> Devops guys are mostly incapable of using any service that isn't a) written in Go
Lol we basically rewrite it in Go if we’re using it frequently. Most Go projects are just things the founder really wanted for himself
For Traefik, afaict, something about k8s
The one thing that bothers me with traefik is that their implementation of ACME does not work if you have some sort of DNS load balancing. I had one setup with three servers responding to the same domain. It seems the first request (to start the ACME dance) would go to one server, and if the second one (for the .well-known address) is sent to a different one, it will just return a 404 and fail the whole thing. Now I either have * to delegate the certificate management to the service itself or add Caddy as a secondary proxy just to get certificates from it.
* Of course, someone smarter than me will point me to a better solution and I will be forever grateful.
> how do i configure Traefik to proxy traffic to two backend services?
Configuring Traefik to proxy traffic to multiple backend services is a common use case. You can achieve this by defining multiple entry points and routers in Traefik's configuration, each pointing to a different backend service. Here's a basic example of how you can configure Traefik to proxy traffic to two backend services:
# traefik.yaml
# Enable API and Dashboard
api:
  dashboard: true
# Define entry points
entryPoints:
  web:
    address: ":80"
  app1:
    address: ":8081"
  app2:
    address: ":8082"
[...more stuff...]
This is the system I've been dreaming about but haven't had time to dig into yet. I've got ollama and openwebui set up now though, and with OpenAI getting bigger context windows it seems like it might be possible to inject the whole set of docs into the prompt, but I'm not sure how to go about that.
Perhaps having a focused LLM generate the steps could help catch some documentation deficiencies.
security.acme = {
  acceptTerms = true;
  defaults.email = "admin-email@provider.net";
  certs."mydomain.example.com" = {
    domain = "*.mydomain.example.com";
    dnsProvider = "cloudflare";
    environmentFile = "/path/to/cloudflare/password";
  };
};
services.caddy.enable = true;
services.caddy.virtualHosts."subdomain1.mydomain.example.com" = {
  extraConfig = ''
    reverse_proxy 127.0.0.1:1234
  '';
  useACMEHost = "mydomain.example.com";
};
Configuring with nginx is also fairly similar I think.
Overall that has been my biggest problem with traefik. It's awesome when it works, but when it does not I always seem to have problems troubleshooting and/or finding the information I need in the docs.
At work we will start using Traefik in prod towards the end of the year. I hope Traefik and I will become better friends before that :)
Certbot has plugins that directly support many DNS registrars, and can automate configuration of Nginx. Using, for example, the CloudFlare plugin for DNS validation combined with the Nginx plugin for local config would solve your problem readily.
My reverse proxy offered a service mesh, live config reloads, managed TLS certs, and automatically rerouted traffic around down services. The whole thing was a few hundred LOC anyone could understand in its entirety. It ran in production for years unchanged and never caused an outage.
You probably already know and maybe it didn't work for you, but there's quite a few Docker companion containers that automate let's encrypt certs for an nginx Docker container.
After moving to DNS-01 validation, which comes with the added benefit of letting me cut certs for services that aren't publicly exposed with way less orchestration required than with TLS-ALPN-01 style validation, my experience was suddenly much better. Assuming the DNS provider is working (and if it's not, you're hopefully getting an API error from them before LE attempts to validate the record), the failure state happens well before any check failure backoffs happen at LE. At this point, regardless of whether I'm using Traefik, Caddy, Nginx, or any other reverse proxy, I'm pretty committed to only using DNS-01 based validation from LetsEncrypt from now on, or if I have to do TLS-ALPN-01 based validation, to make darn sure things are right the first time with the Staging API first.
Which, speaking of, if you cut a Staging cert with LE via Traefik, there's no good way to invalidate the staging cert. You have to munge the ACME JSON to remove the cert and restart Traefik (could maybe do a SIGHUP? didn't try) to get it to pick up the changes.
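For readers who haven't looked under the hood of DNS-01: per RFC 8555, the client publishes a TXT record at `_acme-challenge.<domain>` whose value is the unpadded base64url SHA-256 digest of the key authorization. A small Python sketch of just that computation follows. The function names are made up, and in practice the ACME client (Traefik, Caddy, Certbot, lego) does this for you and talks to the DNS provider's API.

```python
import base64
import hashlib


def dns01_txt_value(key_authorization: str) -> str:
    """Return the TXT record value for a DNS-01 challenge.

    key_authorization is the challenge token joined with the account key
    thumbprint by a '.', as handed out by the ACME client library.
    """
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url without '=' padding, as the ACME spec requires
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def dns01_record_name(domain: str) -> str:
    """Return the fully qualified record name the CA will query."""
    if domain.startswith("*."):
        # wildcard names are validated at the base domain
        domain = domain[2:]
    return f"_acme-challenge.{domain.rstrip('.')}."
```

Because the check is a DNS lookup rather than an inbound connection, the service being certified never has to be reachable from the internet, which is why this works for internal-only hosts.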
All said, lots of weird silent failures and behaviors, but the biggest pains are making dependent service errors opaque.
I agree with your comments about Traefik being finicky, and would like to add that my very basic inhouse solution to do automatic Let's Encrypt integration (that also works with other ACME compatible CAs) is ~30 lines of bash, which is run by cron every day. It's rock solid simply by failing hard when standard return codes fail. Monitoring for failed certificate renewals is as easy as handshaking with the endpoint and parsing the NotAfter field in the OpenSSL output. I run this as part of my regular HTTP endpoint monitoring solution and it tells me if any certificate will expire within 14 days.
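The expiry check described above is easy to reproduce without shelling out to OpenSSL. Here is a hedged Python sketch using only the standard library; the function names and the 14-day default are illustrative, and this is not the poster's bash script.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Handshake with the endpoint and return the certificate's notAfter string.

    With the default context an already expired or untrusted certificate
    makes the handshake raise ssl.SSLCertVerificationError, which is itself
    a useful alert.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_remaining(not_after: str, now: Optional[float] = None) -> float:
    """Days until expiry, given a notAfter string like 'Jan  5 09:34:43 2018 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def expiring_soon(not_after: str, threshold_days: int = 14,
                  now: Optional[float] = None) -> bool:
    """True when the certificate expires within threshold_days."""
    return days_remaining(not_after, now) < threshold_days
```

A monitoring job would call `expiring_soon(fetch_not_after("example.com"))` per endpoint and alert on `True` or on any exception.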
The absolute worst failures I've experienced is having new domains start with a self-signed certificate until I reloaded nginx manually, and that I had 2 weeks to jump in and sort out some error because a certificate renewal failed.
So at least in my experience it turns out that LE-integration isn't a strong selling point. Logging and ease of configuration is. NGINX is not perfect in those aspects either, but it is a bit more robust and well-documented at least.
Ended up using nginx and adding a .well-known/certbot endpoint or so that used lua to call certbot. Some bash, rsync & nfs for config management, never had an issue with it. Not fully automated, but close enough. And very debuggable!
On top of it, the OpenResty community must be the rudest, most entitled people in the entire internet. Have a question, “YOURE DOING IT WRONG IDIOT” is the response. Of course every terrible decision they’ve made they justify with “BUT THE PERFORMANCE” as that’s the only thing worth considering.
We’re moving back to Traefik, or Caddy, both still in POC.
Truth is: I don't like nginx syntax and traefik is/was shiny :]. I went in for the LE renewal and containers, I stayed for the configuration style.
Built-in ACME support doesn’t work for me, so I still have some `systemctl restart traefik` hacks here and there.
The moment you need a feature which Traefik provides that isn't in Nginx is when I would consider the switch.
It's worth kicking the tires on. Both are great at different things.
In the past I deployed k3s but I realized that was too much and too complicated for my self hosted stuff. I just want to deploy things quickly and not have to handle the certs myself.
I just tested it locally to check and it works fine.
I think a combination of traefik and docker compose is in the sweet spot for small scale self hosters that haven't reached the point where k8s will pay off, i.e. if you have fewer servers than a k8s HA control plane would use.
We've called it "tray-feek" and it was half ok, then we actually had a call with the official support and they told us it's pronounced the same as regular "traffic". So any discussion about that proxy goes with: "so we're receiving traffic on our public load balancer which used traffic's native load balancing to send traffic to traffic's pods...". It sounds stupid because it is stupid.
I wondered how to pronounce Traefik myself, so I started googling, and came across this: https://traefik.io/blog/how-to-pronounce-traefik-d06696a3f02...
Tldr: just pronounce as you would "traffic".
[1] https://doc.traefik.io/traefik/middlewares/http/forwardauth/ [2] https://doc.traefik.io/traefik/master/https/tailscale/
I have set up quite a few as kubernetes pods that direct to private hostnames in different namespaces and pretty happy with it for internal apps.
They effectively are decrypting your entire traffic then, including login credentials. It’s a little scary…
Proxies or VPNs like you mentioned. You usually don't expose things if you don't have to.
For proof of concepts, I use cloudflare tunnels which allows you to add ACLs to particular routes.
I used to have all my docker compose files in elaborate structures but moved to portainer for simplicity. Together with caddy proxy it rocks (well, there are several things missing but I have hope)
Let's imagine we go this way: we implement our own self-made webserver in nodejs and start using it, and the next day we're required to add simple things like basic auth for a specific location, an ACL based on Maxmind geo data, or even straightforward round-robin balancing among several php-fpm upstreams, even without weights. What would be the flow here? Involving the dev team and trying to put those tasks into their backlog?
Why is that better than just adjusting the Nginx config in 5-10 minutes?
I was trying to do some tests to compare kubernetes resource usage of nginx+php-fpm, nginx unit with php module and frankenphp (based on caddy). For reasons that are not relevant I need the service to be exposed on both ports 80 and 443 and do both plain HTTP and HTTPS. The host is not fixed because there is a public DNS but also the cluster-internal service name.
With nginx+php-fpm and nginx unit it was dead easy: here's port 80, here's a self-signed cert for TLS so listen also on 443. That's it. it works.
With Caddy it was so frustrating to see so many assumptions and "automatic redirects, trust me, this is what you want" and no obvious manual override that I just gave up. What I got working before I gave up was either only port 80, which is not good enough, or only port 443, which is also not good enough, or both but with 80 redirecting to 443.
You need both a java doc style big 'ol list of every function and what they do, and a narrative/workflow/primer based section of your documentation.
You can, in fact, use any configuration format with the API as long as Caddy has its adapter compiled-in; you just have to use the correct value in the `Content-Type` header. For instance, you can use Caddyfile format using the `text/caddyfile` value in `Content-Type`. This is documented[0].
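As a concrete illustration, here is a small Python sketch that pushes a Caddyfile to the admin API's `/load` endpoint with the `text/caddyfile` content type. It assumes the admin endpoint is at its default address of `localhost:2019`; the function names are made up for the example.

```python
import urllib.request


def build_load_request(config_text: str,
                       content_type: str = "text/caddyfile",
                       admin: str = "http://localhost:2019") -> urllib.request.Request:
    """Build a POST /load request carrying config in the given format."""
    return urllib.request.Request(
        url=f"{admin}/load",
        data=config_text.encode("utf-8"),
        # The Content-Type value selects the config adapter Caddy applies.
        headers={"Content-Type": content_type},
        method="POST",
    )


def load_config(config_text: str, content_type: str = "text/caddyfile") -> int:
    """Send the config to a running Caddy instance and return the HTTP status."""
    req = build_load_request(config_text, content_type)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Swapping `content_type` for `application/json` sends native JSON config through the same endpoint.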
I also get that we're arguing TIMTOWTDI vs YAGNI. So, I want to take a beat to say that I think your take is absolutely valid, but I'm coming at this from a place where flexibility isn't my primary concern. I'm usually more interested in tools that will reliably save me time, and that's a quality typically in opposition to flexibility.
For me, any tool that requires or permits multiple configuration planes to get off the ground threatens to be a time sink and a tax for an already packed schedule. This complexity usually requires the operator to absorb a lot of nuance, quirks, and idioms, which is something that's difficult to do quickly. Troubleshooting also grows in scope, again due to the complex design and multiple ways to set things up. Meanwhile, the most inflexible tools in my stack cause me the least trouble due to their simplicity: there's one way to do it, and if it's a common mistake, the answer is easily searched online.
In practice, looking to the Traefik community for support required decoding everyone's different take with varying degrees of CLI and config file involvement, sometimes fragmented across entire discussion threads (not to mention the v1/v2 problem). It made for a very time consuming troubleshooting run - something that would have been avoided by a more simple design like other tools use.
_acme-challenge.customer.com IN CNAME _acme-challenge.your-automated-domain.org.
In concrete terms: to set up an instance for a customer, I've been asking them to set their domain with a CNAME to "gateway.communick.host", where I have traefik already configured with one TLS HTTP resolver and one DNS resolver, and I've been using the HTTP one. Are you saying that I can just configure it to use the DNS resolver and it will work, or do I need to tell my customers to add another record for the delegation to work?
What kind of documentation is this though? Is this how to bake a cake or tie a necktie, or is it how to setup a reverse proxy for the services in your k8s cluster?
If it's something a lay-person could do then I think this is a good strategy (though depending on the size/scope of the project/documentation it does seem like a pretty big effort to undertake without compensation), but if it's something highly technical like Traefik, I expect a lay-person to not even understand half the words/vocabulary in the documentation, let alone be able to perform X by reading it and following it.
> how to setup a reverse proxy for the services in your k8s cluster
Going off this specifically. I don't know how to do this. I actually have a k8s cluster on a home server waiting for me to do exactly this. Ideally there would be a doc that would start with ingress selection, and then guide a user through how to get it set up with common use-cases. Or something like that. Like others in this conversation, I've been leveraging LLMs with varying degrees of success to try and navigate this.
You're absolutely right that there are big omissions/holes in the Ansible docs, but I also think that using Google is not "cheating." My ideal of great documentation sounds like exactly what you would agree with: A complete and comprehensive "book" (could become a physical printed book, but needn't have to as it should be equally usable with good old-fashioned hyperlinks). It should have a logical flow, introductory sections to describe pre-requisite knowledge/concepts and things that are broadly applicable to the project as a whole. It should have a table of contents, and it should definitely have an index and comprehensive lists/tables of API details such as available field/properties, which options are valid (for enum fields), etc. Your example of Jinja filters supported by Ansible is a great one. I really miss the 90's era here where such manuals were common practice, even for things like PCs.
With that ideal described, though, I think it's important to recognize pragmatism and feasibility. Documentation takes time and money to produce. Search tools (including Google) already exist and can provide a valuable addition without spending time/effort on it, so I think they should be used. That said, I agree that it's not a good idea for doc writers to rely on that for things to be found! Table of contents, logical flow, and indexes should absolutely be thought through. If the documentation is just a bunch of random unorganized and uncatalogued pages that can only be found with a search engine, that is really bad and they should feel bad.
I think Ansible falls right in the middle there. It undoubtedly has some real glaring omissions/holes in it, but it is also not nearly the worst I've seen. I do dread having to go to the Ansible docs though, which is an indictment of their quality, and the more I think/write about this the more I agree with you lol.
Also going to disagree about "using Google is cheating". The purpose of documentation is to help me get stuff done. The Internet is not printed on dead trees. I don't want to read through a TOC or index looking for what I want when I'm searching, I want to use a search engine. I often don't want a reference, I want to quickly find how to do something. I rarely want to read about all the filters; instead I want to find the even/odd filter, or the default or omit filter. Yes, sometimes I want to brush up on all available filters, but that's rare.
That said, when I have a question about jinja2, I find that the jinja2 docs are better than the ansible docs.
Users are a bit like high school students just skimming books for an answer to fill in on a test. They need to be forced to learn.
Non-Ubuntu distros also often have standard packages in their repos with no reference to Snap, and EFF also distributes a Docker container with Certbot pre-configured, if Docker is your thing.
Perhaps I had other options the website didn't make me aware of, but it seemed like enough of a hassle that I just dropped it.
FWIW, Certbot is available in the standard repos for almost all other major distros.
Complaining about it reroutes people to better projects, and pushes the project to fix the problem.
You can't expect someone not core to a project to just propose to restructure the whole documentation. Which may also mean changing the website.
And in any case, such an overhaul coming from a "nobody" would very likely be rejected as too large, incomplete, or undesirable.
Re-structuring needs to be pushed for by at least one person from the core team.
So yeah "Just submit a PR" in that context is not an answer, it's an excuse to avoid trying to understand the problem and actually improve the situation.
Store: I spent a lot of time arranging things around in the store, if you can't find what you're looking for you can stop complaining and write signs for us.
Customer: Or I can just use your competitor who actually cares about their customers. ¯\_(ツ)_/¯
- Former Traefik user who now uses Caddy.
At this point, I'm pushing into a place where I'm just going to switch to Go because it's getting to be a mess.
It’s about as bad at being a programming language or data structure serialization format, though.
There are so many things that aren't expressible in TOML that any somewhat complex system will want... it's not even a contender.
So, one problem a lot of configurations are trying to solve: modularity. I.e. how to allow different actors to change the parts of the configuration they want. Everything under /etc nowadays is of the form /etc/*.d/*.*, that is, all configurations are directories of multiple files with some ridiculous rules (like "prefix file names with digits so that they sort and apply in the 'right' order", etc.). XML had a better approach with namespaces and schema, but maybe not perfect.
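A minimal sketch of the drop-in convention described above, assuming each fragment has already been parsed into a dict. The file names and keys are made up for illustration, and `merge_dropins` is a hypothetical helper, not any real tool's loader.

```python
def merge_dropins(fragments: dict) -> dict:
    """Merge config fragments in lexical filename order; later files win."""
    merged = {}
    for name in sorted(fragments):
        # Shallow merge: a later fragment replaces whole top-level keys.
        merged.update(fragments[name])
    return merged


fragments = {
    "50-site.conf": {"port": 8443, "tls": True},
    "10-defaults.conf": {"port": 80, "tls": False, "workers": 4},
}
print(merge_dropins(fragments))
```

The ordering lives entirely in the digit prefix of the file name, which is the "ridiculous rule" being complained about: nothing inside the fragments says who overrides whom.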
Polymorphism. Any non-trivial configuration system will have plenty of repeating parts. NetworkManager connection configurations? -- They are all derived from the same "template". Systemd device services -- same thing, they are all coming from the same "template". There are plenty more examples of this. But, languages like YAML or TOML don't have a concept of polymorphism in them. This is never encoded in the configuration itself. Instead, every tool that needs to be configured and needs some degree of polymorphism rolls its own version.
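A rough sketch of the "roll your own" templating that a tool ends up implementing when the configuration language has no notion of derivation. `derive` and the connection fields are hypothetical, loosely modeled on a network connection profile rather than taken from NetworkManager's actual schema.

```python
import copy


def derive(template: dict, overrides: dict) -> dict:
    """Return a deep copy of template with overrides applied recursively."""
    result = copy.deepcopy(template)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Nested sections are merged instead of replaced wholesale.
            result[key] = derive(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


base_connection = {"type": "ethernet", "ipv4": {"method": "auto", "dns": []}}
office = derive(base_connection, {"ipv4": {"method": "manual"}})
print(office)
```

The relationship between `office` and `base_connection` exists only in the code that calls `derive`; the resulting data carries no record of it.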
Constraints. It's often impossible to describe the desired configuration by specifying exact values. Often the goal can be described as "less than" or "given the value of X, Y should be a multiple of X" and so on. Such concepts are, again, not expressible in TOML or YAML and friends.
NB. Types are a kind of constraints.
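To make the point concrete, here is a small sketch of the validation that has to live outside the config file when the format cannot state constraints itself. The field names `block_size` and `buffer_size` and the limits are assumptions chosen for illustration.

```python
def check(config: dict) -> list:
    """Return a list of constraint violations (empty if the config is valid)."""
    errors = []
    block = config.get("block_size")
    buffer = config.get("buffer_size")

    # A type is just another constraint on the value.
    for name, value in (("block_size", block), ("buffer_size", buffer)):
        if type(value) is not int:
            errors.append(f"{name} must be an integer")
    if errors:
        return errors

    # A range constraint ("less than").
    if not 0 < block < 65536:
        errors.append("block_size must be between 1 and 65535")
    # A relational constraint ("Y should be a multiple of X").
    elif buffer % block != 0:
        errors.append("buffer_size must be a multiple of block_size")
    return errors


print(check({"block_size": 512, "buffer_size": 1000}))
```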
Identity. It's often necessary in configuration to distinguish between two sub-sections that look the same and two sub-sections that designate the same exact object. Like, when configuring VMs with e.g. disks: are they supposed to mount the same disk, or does each VM need a separate disk that just has the same physical characteristics?
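The same distinction exists in most programming languages as equality versus identity. A short sketch using Python's `==` and `is`, with a made-up disk description:

```python
disk_spec = {"size_gb": 100, "type": "ssd"}

# Both VMs reference one and the same disk object.
shared = {"vm1": {"disk": disk_spec}, "vm2": {"disk": disk_spec}}

# Each VM gets its own disk with identical characteristics.
separate = {"vm1": {"disk": dict(disk_spec)}, "vm2": {"disk": dict(disk_spec)}}

for label, vms in (("shared", shared), ("separate", separate)):
    a, b = vms["vm1"]["disk"], vms["vm2"]["disk"]
    print(label, "equal:", a == b, "same object:", a is b)
```

Both layouts look identical once written out as plain key/value data, which is why the two cases collapse into one unless the tool adds its own convention for telling them apart.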
If you want to get crazy, you can push a dict into a list and operate on it, but it gets tough at the second level. And don't get me started on if/else statements.
JSON’s a crazy-bad serialization format, too, for that matter. It doesn’t even know what a damn integer is.
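A quick illustration of the integer complaint. JSON's grammar has a single "number" type, so what happens to a large integer depends on the parser. Python's `json` module happens to keep integers exact; passing `parse_int=float` below is only a way to simulate a parser that stores every number as an IEEE 754 double, as JavaScript does.

```python
import json

big = "9007199254740993"  # 2**53 + 1, not representable as a double

exact = json.loads(big)                        # arbitrary-precision int
as_double = json.loads(big, parse_int=float)   # simulated double-only parser

print(exact)           # 9007199254740993
print(int(as_double))  # 9007199254740992
```

The same document yields two different values depending on who reads it.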
Here's a good summary: https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-fr...