There are 33 open source operators for managing databases on Kubernetes. Out of that list only 3 claim to be production ready.
Out of 126 Operators that I've looked into the vast majority are abandoned and unfinished. Most state the project status as Alpha in the readme.
Kubedb itself has a version number of 0.8.0 for the operator and very low version numbers for the databases. For example version 0.2.0 for Redis.
Version numbers can mean anything but they are usually a good indicator of what the project owner thinks the status is.
It would be cool to see a break-down of status and expected dates for milestones for Kubedb.
For anyone interested in browsing other Operators I keep a table updated half way down this blog post.
https://kubedex.com/operators/
The project statuses come directly from what the authors have stated. Many beta status projects are being used in production.
It would help if the community took a break from new features and worked on stability first so that Operators and other extensions can finally take off. Some of the things being developed now are so esoteric that it seems to be more about finding the next exciting thing to add than usability.
I used to think how you did about kubernetes because I saw just how long it took for features I really wanted to get in. Then I attended some of the SIGs, and realized that there are so many use cases out there unlike mine, and that doing what I want may break what others want. So instead of making a decision that screws over everyone but one cloud provider, what I've seen is very methodical and careful decision making from many companies working together. This usually means that you get something that may not do exactly what you want out of the box, but there are hooks to do it if you'd like. I'd much prefer this over nothing at all.
It would be worth sitting in on a SIG you're interested in, and see how @smarterclayton and @thockin handle these kinds of decisions. I see so much negativity on HN about k8s, and it really seems like people just don't appreciate the amount of attention that goes into each decision. I think if you spend the time to trace the history of a feature and understand why things are done, it may change your mind about how complex k8s is.
Or perhaps it's real ops people with particular arcane needs, each scratching their own itches?
K8s is a large FOSS project; and like most large FOSS projects, most PRs are from corporate contributors that wrote the code for their own purposes and then wanted to upstream it to avoid having to maintain a fork.
Could you elaborate a bit on what those "cracks" are?
I am mostly a bystander, but in the k8s issues I see, it is too easy to either destroy all the pods or their volumes. Maybe this should be fixed at the k8s level.
As someone who's started running services in Kubernetes (albeit mostly as a hobby thus far) I would recommend setting the ReclaimPolicy to Retain for any PersistentVolumes that are particularly important. The default behavior is to delete the underlying volume when the resource representing it is deleted, but if you're worried that might happen accidentally that may not be what you want; this behavior is configurable.
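For anyone who wants to see what that looks like, here is a minimal sketch that builds a PersistentVolume manifest with the reclaim policy set to Retain. The volume name, size and hostPath are made up for illustration (hostPath is only there to keep the example self-contained); the field that matters is `persistentVolumeReclaimPolicy`.

```python
import json


def retained_pv(name: str, size: str, host_path: str) -> dict:
    """Build a PersistentVolume manifest whose backing storage is kept on release."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            # Retain: the underlying volume is kept when the claim goes away.
            "persistentVolumeReclaimPolicy": "Retain",
            # Assumption: hostPath is used purely as a placeholder backend.
            "hostPath": {"path": host_path},
        },
    }


# Hypothetical names and sizes, chosen only for the example.
manifest = retained_pv("pg-data", "10Gi", "/mnt/pg-data")
print(json.dumps(manifest, indent=2))
```

kubectl accepts JSON as well as YAML, so the printed output can be saved to a file and applied as-is. For a volume that already exists, the same field can be changed with `kubectl patch pv <name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'`.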
FWIW, it has been: RBAC allows you to strip -- or I guess pragmatically speaking, not assign -- rights at whatever level of granularity you have the patience to maintain. It is also bright enough to do that per Namespace, so going light on the ClusterRoleBindings and keeping things out of the "production-db" Namespace would likely go a long way toward addressing the risk you are describing
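A rough sketch of what "not assigning" those rights could look like: a namespaced Role for "production-db" that only grants read verbs, plus a small check that no rule sneaks in delete rights. The role name and the resource list are assumptions picked for the example, not a recommended policy.

```python
import json

# Verbs that can destroy pods or claims.
DESTRUCTIVE_VERBS = {"delete", "deletecollection"}


def read_only_role(namespace: str, name: str = "db-read-only") -> dict:
    """Build a namespaced RBAC Role that can look at workloads but not remove them."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"namespace": namespace, "name": name},
        "rules": [
            {
                # "" is the core API group (pods, PVCs); "apps" holds StatefulSets.
                "apiGroups": ["", "apps"],
                "resources": ["pods", "persistentvolumeclaims", "statefulsets"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


def grants_destructive_verbs(role: dict) -> bool:
    """Return True if any rule allows deletion, either explicitly or via the '*' wildcard."""
    return any(
        bool(DESTRUCTIVE_VERBS & set(rule["verbs"])) or "*" in rule["verbs"]
        for rule in role["rules"]
    )


role = read_only_role("production-db")
print(json.dumps(role, indent=2))
print("can delete:", grants_destructive_verbs(role))
```

Note that PersistentVolumes themselves are cluster-scoped, so a namespaced Role like this only covers the claims, not the volumes behind them.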
This module takes that into account - and defines a “backup k8s object” that will trigger a db dump. But there is still no way to get point in time data recovery/backup that you get from current production-grade managed state providers. I'm going to say it's production grade if we are using the standards of 10 years ago. Production-grade today, I feel, is a bit more robust.
How does this compare to other community efforts, e.g. Zalando's Patroni project, aside from supporting more databases than just postgres?
The original swarm design had some of this but it was pulled just before release for more design work... which was never completed.
I wrote the only storage support currently in swarm, which is the "mounts" api in your service spec...
So, technically you could use swarm to do it, but it will be painful and I don't think any amount of tooling will help until docker includes some support for cluster-aware storage.
I would be happy to hear if people have successfully done this, though!
When looking at the hardware being provisioned I realized it wasn't even anything too crazy and could be had for 1/4 the price at Linode.
I will definitely be using this in the future.
Unfortunately, we aren’t there yet for most software. Let’s take Postgres as an example. Even though you have to manage your pg database manually (or use a service that manages it for you), that’s just because the right automation software hasn’t been built yet. Someday, a Kubernetes Operator (or equivalent implementation) will exist that can manage a large Postgres cluster better than a team of DBAs. It’s crazy that there are hundreds (thousands?) of configuration parameters in Postgres, and these are coupled to the operating system settings in weird and unexpected ways that most people don’t know. We should be building this knowledge into a K8s Operator and letting that control our pg.conf and os configuration, instead of giving that control over to a team of humans who might be able to put in some sane defaults, but will always be working to get the optimal performance out of Postgres as the usage share changes.
This exists in some places already. For example, Rook is a K8s operator that provisions and manages Ceph in a Kubernetes cluster. As a small startup, if I need this functionality, I don't want to hire a full time Ceph admin to figure it out, and I don’t have the expertise to take on operating Ceph myself. Rook productized operating Ceph for us, and “baked in” all of the needed knowledge to manage block and object store and even set up concurrent, shared file systems. I trust Rook to manage Ceph, and I don’t think that I could do a better job with human intervention.
We have a long way to go. Operators are a tool that might help get us there but Operators are just a pattern that exists that we can use. One thing for sure is that we shouldn’t assume that human control over complex software is required to achieve optimal performance.
It will take time for these to harden, but it eventually will since the primitives are all there.
That said, the public clouds avoid one very significant issue: in aws / gcp’s case the team creating the operator is also the one running it, and they fully understand exactly the configuration the operator will be deployed to.
With helm and operators designed for public consumption by other companies, the amount of generality needed will be higher, at least until the conformance tests for a kubernetes implementation get more detailed and there is ci/cd between the operators and charts running on clusters that better mirror various Kubernetes deployment choices.
When we designed stateful sets, it was to lower the minimum bar for getting a unique network identity (which is necessary, but not sufficient for most cluster software). And in practice I’ve seen people using sets directly with a thin layer of scripting or helm on top, but I’ve also seen people implementing their own stateful sets because the logic isn’t that hard once pods had the necessary shims.
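To make "unique network identity" concrete, here is a small sketch of the stable names a StatefulSet gives its pods: each pod is named after the set plus an ordinal, and is resolvable under the governing headless service. The set, service and namespace names below are invented for the example, and `cluster.local` is assumed as the cluster domain.

```python
def stateful_pod_dns_names(
    statefulset: str,
    service: str,
    namespace: str,
    replicas: int,
    cluster_domain: str = "cluster.local",
) -> list[str]:
    """Return the per-pod DNS names for a StatefulSet behind a headless service.

    Pods are named <statefulset>-<ordinal>, with ordinals starting at 0, and
    each one resolves at <pod>.<service>.<namespace>.svc.<cluster_domain>.
    """
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


# Hypothetical names, used only to show the pattern.
names = stateful_pod_dns_names("pg", "pg-headless", "production-db", 3)
for name in names:
    print(name)
```

Because the names survive rescheduling, cluster software can keep a fixed peer list, which is the "necessary, but not sufficient" part: the identity is stable, but replication and failover are still up to the application or an operator.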
I would probably say that kube is best thought of as an extensible compute pattern framework (you can leverage the lowest level atoms or build layers on top). Kube is probably only successful as long as we keep the 80% easy, reduce the cost to add new patterns (libraries and tools), and decompose the bits that should be replaceable.
Most of the “reinventing” problems with operators are problems with kube having weak libraries - each controller and operator is somewhat bespoke. That’s something I expect to see improved this year via the various tooling libraries. But it’s still a work in progress and I regret it took so long.
Performance issues should be the least of your concern. The docker daemon and containers simply hung because of filesystem issues on CentOS 6.
I worked at a company that was dockerizing their stateless services, then planning to dockerize their cassandra databases. Multiple contractors involved.
Stateless services failed periodically because of the above issue. Load balancers can failover automatically, broken nodes are rebooted from time to time, limited impact. No one cared, just a daily deployment routine.
I fear the day the cassandra dockerization would happen. They'd lose their entire customer data (hundreds of millions of customers) once two nodes would fail simultaneously, which happened a lot on the stateless services.
Thankfully the project never started and the company didn't go bankrupt. Pretty sure employees moved around and plans got canceled.
Expect a lot of instability in docker around filesystem, performance issues and race conditions. Low volume stateless web servers don't get to trigger issues much, but databases do.
Moreover - all/most Linux distros have switched to using Overlay2 as the default driver. If you are running the latest version of RHEL/CentOS/Fedora/Ubuntu, that is the driver you will most likely be using.
Volumes are quite stable and reliable when based on a stable file system.
So while you could lose the container due to the bug described, you would not lose the persistent data.
It's best practice not to write to the Docker image at all during runtime (no log files, no PID file, etc), but to write only to volumes or tmpfs mounts. I'm a little bit suspicious about the crashes you described: are you sure you followed that best practice?
EDIT: There's probably some overhead with the implementation of namespaces and cgroups but I have not found any reliable sources about the quantities. As a side note, if you are using memory limits you will have a performance penalty: "Memory and swap accounting incur an overhead of about 1% of the total available memory and a 10% overall performance degradation, even if Docker is not running." (from the Docker docs). This will probably affect databases running in containers as well.
There could not be an application worse-suited to running in Kubernetes et al than a traditional database. Anyone claiming something that rams this square peg into that round hole is "production ready" is showing that they're an empty husk and shouldn't be trusted near anything important.
Note the downvotes already rolling in less than two minutes after I posted this. This subject is a major third rail here. It goes against the agenda of very powerful people and my account has been censured in the past specifically for making this particular argument, that database workloads and Kubernetes don't mix. Keep that in mind when you're asking HN for their experience on this (or any other topic that YC considers critical to the interest of their investments -- they've shown that they're willing to taint the discussion if it gets too dicey).
That being said, I also disagree that Docker isn't suited to running a DBMS, assuming you actually have a large enterprise (or cloud) datacenter backing your Docker daemon. In such cases:
• You'll probably have a large enough pool of Docker machines (k8s or not) that you're going to be deploying your DBMS container in a way that reserves an entire instance just for it (or it + its accessory containers);
• You'll probably have a SAN, and you'll have many enterprise-y reasons (e.g. live VM migration) to prefer backing your DBMS with said SAN, rather than with local instance storage.
If both of those are true, then Docker has no disadvantages compared to deploying your DBMS as a raw VM.
But I fully agree that kubernetes and containers are not well suited to running production databases. In theory they could achieve parity with a dedicated machine or VM, but they're still a long way from that - and it makes it very easy to lose your data. I was recovering a database where the persistent volume wasn't set up right and the container got killed and restarted. It was just before the holidays and it was a nightmare because everyone was on vacation.
Yeah you could get into that kind of problem with a VM or dedicated machine, but the bar is a lot higher, you'd need some kind of hardware failure. Kubernetes makes it really easy to shoot yourself in the foot when running databases.
It's a similar thing with e.g. Ubuntu LTS releases. The core distro might be FOSS, but those branches are uniquely the result of a centralized, corporate devops maintainership ensuring that the silent, automatic security and kernel package upgrades go off without a hitch.
To be clear, I’m not saying you can’t join that maintainership; what I’m saying is that, unlike with a regular FOSS library or framework, or even a regular piece of FOSS daemon software like Apache, in the case of a DBMS, the software will only continue to run smoothly for as long as that maintainership is around to keep it running smoothly. There’s no such thing as a useful unmaintained DBMS, FOSS or not.
And, because of that, the “calculus of TCO” for DBMS projects changes a bit. Unlike regular software, where “proprietary” translates to “higher potential TCO” because of switching costs, in the DBMS case, the “proprietary” vs “open” distinction is nothing next to the “big, healthy maintainership” vs “small, ailing maintainership” distinction. Because, if the DBMS loses all its maintainers? Now you’re stuck maintaining it—at the core level—yourself (and learning how to do so in the process) until such time as you can migrate your data away from it.
Personally, for a production-grade DBMS, I’d trust a corporate-backed (or at least sponsored) product over one which is purely a volunteer effort any day.
https://github.com/cloudfoundry/cf-mysql-release
These releases are used in production by Pivotal, and are actively developed to that end, so they are genuinely production-grade. People have thought carefully about resilience, backups, security, etc. BOSH is a bit awkward, and these releases are tightly coupled to CF, but there's some great work in there.
You can take as much time as you like to migrate yourself away from k8s if you don't like it any more (physically migrate your system to your own site; pin the k8s version to prevent API changes; then start changing your code to be less coupled to k8s.)
Whereas, if AWS changes and deprecates a feature, you're on their schedule as to how long you have before your service will break.
If your application runs on K8s, then it's portable and it doesn't have to be aware of the environment or other system integration (e.g storage).
The thing to remember is mounts are implemented only at the node level, so there is no cluster-aware storage controller.
Cloud-ish infrastructure is often good for running distributed decentralized databases, but try running Oracle in a bunch of Docker containers on a crappy OpenStack cluster and soon you'll be crying into your scotch.
These efforts to make people think it's a good idea to run databases on K8s are misleading people, and god help those poor teams that waste years trying to stabilize something that a fancy web page and a youtube tutorial said was a great idea.
Volumes in docker are just host bind mounts.
Now depending on your driver/opts (similar to K8s PV backends) this storage can come from anywhere and performance of the volume is totally dependent on the type of storage being used.
Now, the container fs that docker sets up is (usually) using a CoW filesystem, and there is overhead there... but volumes are specifically designed to bypass the container fs.
However I did mean to clarify that - k8s does not use docker volumes. Whatever performance issues one notices could either be associated with the storage provider or the container writable layer.
In other words, your database application was using scratch storage instead of persistent volumes?
What this anecdote shows is that the developers or admins responsible for setting up the database didn't do it properly.
Also, testing failures and data recovery should be your priority before going to production.
I don't see how you could blame that on software.
The point is not to say it wasn't human error - clearly it was, but it's an error that wouldn't have been as easy to make without kubernetes. There's a cost to running a database on k8s that largely people ignore. That's before you start talking about backups and recovery which also get harder and require more manual work with more potential for error.
Troubleshooting acute outages caused by hardware or software failures requires a different skill than properly configuring the system to scale and minimize the chances of corruption or outages. Rook solves the latter, but we do understand the architecture and what Rook (and Ceph) are doing. We've just removed the expert level, craftsman, speciality knowledge required to operate Ceph because we decided, after a thorough evaluation, that the software in this case is the most capable solution.
It’s a bit like debugging software you didn’t write.
That being said, we also have an ElasticSearch operator that is used to deploy ElasticSearch on Kubernetes; there, the nodes run on i3 instances and the corresponding instance storage is used. Although used in production, that’s still very new and sadly not open source.
What is "spilo"? I am not familiar with this term. Thanks.
The postgres-operator I linked earlier but also our setup on AWS (with one image per EC2 instance) uses that to actually run Postgres.
[1]: https://github.com/zalando/spilo
[2]: https://github.com/zalando/patroni
Containers require volumes/filesystems to run and some implementations are buggy as fuck.
Docker abandoned CentOS 6 many years ago, whether they stated it officially or not; the last docker package and kernel/drivers are unstable. Similar story on some other distributions.
It wasn't production-ready at all back then and it's still not a good idea to containerize databases now. Besides bugs that come and go, there are other challenges around lifecycle, performance and permissions that are not trivial to deal with.
Can you outline what those stability issues are/were? Was the non-Docker part of the problem kernel related? Genuinely curious.
The filesystem drivers are buggy as fuck. You would experience kernel panics on Debian Jessie (overlayFS), or containers + docker daemon hanging on CentOS 6 (devicemapper). The fix in both cases is a reboot.
You might not notice it if you barely used docker, but it can be very noticeable at scale. I consulted briefly at a major web company that was deploying their web services to 5-20 nodes, daily. On every service deployment there would be up to 3 nodes dying.
That said -- no, that's not the same thing at all. Barring anomalous conditions, VMs run as long as you keep them running. They won't be reaped and rescheduled onto some other node in the cluster, whether by automated rebalancing processes or by manual `kubectl delete po...` or `kubectl drain`. You can easily set up a VM that will behave more-or-less like conventional hardware if we ignore the perf hit.
This is a pretty simple thing. The reason people say you need to make your apps "12 factor" when you go to k8s is because it doesn't work well if your app cares about state. Databases care deeply about state. You can't just kill a DB server and spin up a new one to pick up where it left off. You can't parallelize a DB workload by spinning up 8 little DB nodes. It's not a web server and it just doesn't work like that. Things like CockroachDB exist specifically because normal databases don't work like that.
This is where people usually bring up things like annotations, labels, StatefulSets, etc. First, note that the facilities that accommodate stateful workloads are not priorities for Kubernetes and are generally not well-tested or consistent. This wouldn't be a news story or an independent project if they were.
Second, please realize you're doing all of that work to try and make Kubernetes do something it's not really designed to do, with potential negative impact on the availability and scheduling processes for the applications that do work well on Kubernetes, when you could just spin a VM and avoid all of these issues entirely. There's no reason to put a production DB on k8s other than cargo culting.
Just like any other tool that makes some things easier, Kubernetes also makes it easier to shoot yourself in the foot. Just like any solution, you have to know the system well enough to reason about it. There is still a lot that can be done to improve how we explain, document, and describe the system. But people run stateful workloads on Kube all the time, and they do it because it makes their lives easier on the balance.
If you aren't managing your state, then yeah you will run into a nightmare when trying to containerize stateful apps... or running them at all. You will literally have the same problems with a VM or physical hardware.
It's important to separate state management from process management. A stateful application is absolutely not harder to containerize than a stateless one. Rather, it is simply harder to run stateful applications in any regard.
I would personally argue that it is easier to run a stateful app with a container manager. I know it sounds crazy but... keep in mind container tools are centered around what each individual application requires and the tooling tends to make it easier to express and assist in managing the state requirements of that application.
For that matter you can even prevent the scheduler from scheduling your stateful app on a new node, which seems to be the answer for the crux of the argument against containerizing a stateful app.
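One way to do that pinning, sketched under assumptions: a Pod manifest with a `nodeSelector` on the well-known `kubernetes.io/hostname` label, so the scheduler will only ever place it on that one node. The pod name, image tag and node hostname are placeholders.

```python
import json


def pinned_pod(name: str, image: str, node_hostname: str) -> dict:
    """Build a Pod manifest that can only be scheduled onto a single named node."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # The scheduler only considers nodes whose labels match every entry here.
            "nodeSelector": {"kubernetes.io/hostname": node_hostname},
            "containers": [{"name": name, "image": image}],
        },
    }


# Hypothetical pod, image and node names.
pod = pinned_pod("pg", "postgres:10", "db-node-1")
print(json.dumps(pod, indent=2))
```

The trade-off is the obvious one: if that node is down the pod stays Pending instead of moving, which is exactly the behavior you are asking for here.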
I agree, which is why I specifically avoided that language. Containers don't have to be implemented without regard for state -- but if you're talking about Docker or k8s, they are. Docker throws away anything not explicitly cemented in the image or designated as an external volume.
LXC, zones, and jails are containerization techniques that respect state. It's fine to run a database in these if desired. They behave just like real VMs; they have an init process, they get real IPs, they don't automatically destroy the data written to them, and they generally don't mysteriously shut down or get rescheduled. You can't be confident about any of that with Docker or k8s.
Statefulness is not a primary use case for Kubernetes. It took two years for StatefulSets to leave beta and there was a substantial false start in PetSets. As recently as April, which is the last time I seriously looked, there were still competing APIs for defining access to local volumes.
If you want to run a production database workload in a jail or a zone, that sounds fine to me. It's not about containerization in the abstract. It's about the way that Kubernetes and Docker do it.
(I mention Docker and k8s together because for most of k8s history Docker was the only supported runtime. It supposedly can use other runtimes now, but they're not widely used afaik, and behave similarly re: state anyway)
Once you’ve achieved that whether your database runs on a VM or in Kubernetes doesn’t make a difference really.
Granted, if you're not at that scale, running a database in Kubernetes is probably not the best of ideas. That has nothing to do with Kubernetes though, that’s because running a stateful service with decent working backup, recovery and automated failover is difficult in any case. If that’s not your job, you’re probably better off using RDS or something equivalent.
Honestly, it sounds like you're arguing that since the kubernetes API is easier and more accessible to use, it's more dangerous to run state on that layer. That, and a community attitude of being more willing to accept failure, which some would argue is a good thing, others not so much. I prefer to subscribe to the thought process discussed in the SRE book: that failure is inevitable, and that putting your databases inside their kube equivalent saves toil time and hardens your setup.
That said, I would argue most folks who are on cloud anyway should just use a managed postgres, but we're not always on cloud, and I don't think it's fair to claim that putting state in kube is inherently wrong.
I take it you've never managed a large VM hypervisor (e.g. vSphere) cluster. If your VMs aren't being pinned to particular hypervisor nodes by persistent claims on local instance storage or the like, they end up "floating around" on each restart in pretty much the same way k8s containers do. Especially so if you have live VM migration enabled, in which case you're probably doing the equivalent of `kubectl drain` all the time to deprovision and repair hardware.
I recall a few nasty issues in the GitHub with data loss or unmountable volumes for the early adopters, with the official answer along the lines of "implementation is in progress".
There are still bugs, I do not disagree. Data loss bugs are considered top priority and I am not aware of any open such bugs against EBS driver.
You'll excuse me, but I have no time to go through the history and dig up the tickets for reference.
Why would an app dev be pushing changes to the db deployment (outside of data manipulation itself)?
Just because the app dev wants to spin up a db in dev to shove their data into doesn't mean that's how it should be deployed in prod.
(It’s not impossible in specific cases, mind you. I’m still waiting on tenterhooks for the moment someone introduces an Erlang-node operator where you can apply hot-migration relups through k8s itself.)
The trick is to express your state requirements. And yeah, you will be burned badly if you don't do this... and maybe docs and such should call this out better to make sure people don't set themselves on fire just because they didn't dig in deeply enough.
But docker and k8s do provide a means to assist in managing this state for you (swarm... not so well just b/c the work hasn't been done).
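As a sketch of what "expressing your state requirements" can look like on the k8s side: a StatefulSet whose `volumeClaimTemplates` ask for a persistent claim per replica, with the container mounting that claim at its data directory. Anything written outside that mount is still scratch. The names, image, path and size are assumptions for the example only.

```python
import json


def stateful_db(name: str, image: str, data_path: str, storage: str) -> dict:
    """Build a single-replica StatefulSet that declares its data directory as persistent."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # The state requirement: this path must outlive the container.
                            "volumeMounts": [{"name": "data", "mountPath": data_path}],
                        }
                    ]
                },
            },
            # One PersistentVolumeClaim is created per replica from this template.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": storage}},
                    },
                }
            ],
        },
    }


# Hypothetical values for illustration.
sts = stateful_db("pg", "postgres:10", "/var/lib/postgresql/data", "10Gi")
print(json.dumps(sts, indent=2))
```

If the mount name and the claim template name don't match, or the mount is missing altogether, the database ends up on the container's writable layer, which is the "set yourself on fire" case.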