KubeDB – Run production-grade databases easily on Kubernetes (kubedb.com)

187 points by openmaze 7 years ago | 91 comments

It's good to see a project focussing on production-grade databases on Kubernetes. Particularly the production grade part.

There are 33 open source operators for managing databases on Kubernetes. Out of that list, only 3 claim to be production ready.

Out of the 126 Operators that I've looked into, the vast majority are abandoned and unfinished. Most state the project status as Alpha in the readme.

KubeDB itself has a version number of 0.8.0 for the operator and very low version numbers for the databases, for example version 0.2.0 for Redis.

Version numbers can mean anything but they are usually a good indicator of what the project owner thinks the status is.

It would be cool to see a breakdown of status and expected dates for milestones for KubeDB.

For anyone interested in browsing other Operators, I keep a table updated halfway down this blog post.

https://kubedex.com/operators/

The project statuses come directly from what the authors have stated. Many beta status projects are being used in production.

manigandham 7 years ago | |

Part of the problem is that Kubernetes itself is still changing rapidly and already has design-by-committee cracks in the API.

It would help if the community took a break from new features and worked on stability first so that Operators and other extensions can finally take off. Some of the things being developed now are so esoteric that it seems to be more about finding the next exciting thing to add than usability.

shaklee3 7 years ago | | |

You're using that term in a derogatory sense. Would you rather have Google decide how everything is designed, and everyone else has to deal with it? I think you'd see a ton of GCP-specific stuff if that were the case.

I used to think the way you do about Kubernetes because I saw just how long it took for features I really wanted to get in. Then I attended some of the SIGs, and realized that there are so many use cases out there unlike mine, and that doing what I want may break what others want. So instead of making a decision that screws over everyone but one cloud provider, what I've seen is very methodical and careful decision making from many companies working together. This usually means that you get something that may not do exactly what you want out of the box, but there are hooks to do it if you'd like. I'd much prefer this over nothing at all.

It would be worth sitting in on a SIG you're interested in, and see how @smarterclayton and @thockin handle these kinds of decisions. I see so much negativity on HN about k8s, and it really seems like people just don't appreciate the amount of attention that goes into each decision. I think if you spend the time to trace the history of a feature and understand why things are done, it may change your mind about how complex k8s is.

smarterclayton 7 years ago | | |

What are some of the design by committee cracks that you think should be addressed?

derefr 7 years ago | | |

> Some of the things being developed now are so esoteric that it seems to be more about finding the next exciting thing to add than usability.

Or perhaps it's real ops people with particular arcane needs, each scratching their own itches?

K8s is a large FOSS project; and like most large FOSS projects, most PRs are from corporate contributors that wrote the code for their own purposes and then wanted to upstream it to avoid having to maintain a fork.

bogomipz 7 years ago | | |

>"Part of the problem is that Kubernetes itself is still changing rapidly and already has design-by-committee cracks in the API."

Could you elaborate a bit on what those "cracks" are?

shaklee3 7 years ago | | |

Stability of what?

sitkack 7 years ago | |

What would a production-grade conformance test suite look like for K8s to get these operators to 1.0?

I am mostly a bystander, but in the k8s issues I see, it is too easy to either destroy all the pods or their volumes. Maybe this should be fixed at the k8s level.

ryukafalz 7 years ago | | |

>too easy to either destroy ... their volumes.

As someone who's started running services in Kubernetes (albeit mostly as a hobby thus far), I would recommend setting the reclaim policy to Retain for any PersistentVolumes that are particularly important. For dynamically provisioned volumes, the default behavior is to delete the underlying volume when the resource representing it is deleted. If you're worried that might happen accidentally, that may not be what you want, and the behavior is configurable.
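A minimal sketch of that suggestion: the policy lives in the `spec.persistentVolumeReclaimPolicy` field of a PersistentVolume, and it can be changed on an existing volume with `kubectl patch`. The volume name `pv-important-db` and the helper function are made up for illustration, and the script only prints the command rather than running it.

```python
import json
import shlex
from typing import List


def retain_patch_command(pv_name: str) -> List[str]:
    """Build a kubectl command that sets a PersistentVolume's reclaim policy to Retain."""
    patch = {"spec": {"persistentVolumeReclaimPolicy": "Retain"}}
    return ["kubectl", "patch", "pv", pv_name, "-p", json.dumps(patch)]


if __name__ == "__main__":
    # "pv-important-db" is a hypothetical volume name, not one from the thread.
    print(shlex.join(retain_patch_command("pv-important-db")))
```

With Retain set, deleting the claim leaves the volume and its data in place, and cleaning it up becomes a manual step.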

mdaniel 7 years ago | | |

> Maybe this should be fixed at the k8s level.

FWIW, it has been: RBAC allows you to strip -- or I guess pragmatically speaking, not assign -- rights at whatever level of granularity you have the patience to maintain. It is also bright enough to do that per Namespace, so going light on the ClusterRoleBindings and keeping things out of the "production-db" Namespace would likely go a long way toward addressing the risk you are describing.
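As a rough illustration of what that could look like, here is a sketch that builds a namespaced Role granting only read access to pods and volume claims, so nobody bound to it can delete either. The Role name `db-read-only` and the choice of resources are assumptions made for the example; only the "production-db" Namespace comes from the comment above.

```python
import json


def read_only_role(namespace: str, name: str = "db-read-only") -> dict:
    """Build a Role manifest with read-only verbs (no delete, update, or patch)."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group
                "resources": ["pods", "persistentvolumeclaims"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(read_only_role("production-db"), indent=2))
```

A Role does nothing on its own; it still has to be attached to users or service accounts with a RoleBinding in the same Namespace.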

markbnj 7 years ago |

I'm wary of the operator model in general, and we haven't had great success using operators to deploy complex stateful services in our clusters. But to be honest we also haven't had great success deploying them using OTS charts from Helm stable either. One of our k8s stateful services is a large Elasticsearch cluster indexing about 150m events per day, and the chart was forked and heavily modified by us to get it right. I feel that complex stateful services often have enough devils in the details that trying to implement them through an abstraction gets you into trouble. Operators aspire to be a "smart agent" that can translate a CRD resource declaration into a functioning thing, allowing you to implement your data store at an even higher level of abstraction than a Helm chart provides. Since in my experience charts are themselves too abstract for this purpose (you either end up forking/modifying or, if the chart actually provides full coverage of the configuration options, creating a whole new hard-to-comprehend API to the k8s resources you're trying to deploy), I'm not that excited about having a back-end Clippy that can do it for us. It's probably fine for simple use cases, and especially those where you often need to create and destroy simple dbs, but imo not yet for large production use cases.

keypusher 7 years ago |

While I have completely embraced running stateless services in Docker, I have been hesitant to migrate the database layer to containers. While I have not tested it personally, I have seen numerous reports of performance issues when using volumes. Is this no longer an issue, or was it limited to bind mounts? Do volumes not use the storage driver? Also, I have run into permission issues when using volumes with Docker, which I'm sure was just my own ignorance, but it does seem like a cause for confusion and potential error. I have read through the documentation on the linked page, and the quickstart guides for KubeDB seem great for getting up and running, but I do worry about situations such as an automated PG database failover that can't reconcile a timeline. There isn't much documentation on failover at all, and this could add significant complexity to something that is already a potential nightmare. Anyone care to share their experiences running production databases in k8s?

softwaredoug 7 years ago |

For those terrified of an AWS-dominated future, projects like this are crucial. The closer we can get to an OSS-based, push-button DB cluster in any cloud, the less we need to fear that AWS will host everything and lock us into a walled garden of closed source AWS systems.

lukeqsee 7 years ago |

Earlier discussion: https://news.ycombinator.com/item?id=18698759

an-allen 7 years ago |

I’ve always been troubled by production-grade handling of state in containers - specifically as it pertains to data backup.

This module takes that into account - and defines a “backup k8s object” that will trigger a db dump. But there is still no way to get the point-in-time data recovery/backup that you get from current production-grade managed state providers. I'm going to say it's production grade if we are using the standards of 10 years ago. Production-grade today, I feel, is a bit more robust.

DasIch 7 years ago | |

https://github.com/zalando-incubator/postgres-operator supports point in time data recovery just fine and is used in production for 100s of databases at Zalando.

pritambarhate 7 years ago | | |

It would be good to know the size and scale of these databases.

SoylentBob 7 years ago |

Interesting project! Thanks for sharing.

How does this compare to other community efforts, e.g. Zalando's Patroni project, aside from supporting more databases than just postgres?

mosselman 7 years ago |

Does anyone know of a Docker alternative like this? So something like KubeDB that lets me deploy a production-ready postgres db on Docker Swarm, for example?

cpuguy83 7 years ago | |

I would not run a database on swarm. It simply does not have the right APIs at the cluster level to properly express state requirements.

The original swarm design had some of this but it was pulled just before release for more design work... which was never completed.

I wrote the only storage support currently in swarm, which is the "mounts" API in your service spec...

So, technically you could use swarm to do it, but it will be painful and I don't think any amount of tooling will help until docker includes some support for cluster-aware storage.

I would be happy to hear if people have successfully done this, though!

mosselman 7 years ago | | |

Thank you for your reply. Do I understand correctly that the biggest issue is the fact that containers won't run on the same node and you'd thus have storage issues? Would these issues be (partially) mitigated if you'd run postgres on a single node?

bearjaws 7 years ago |

Funny, because I was just baffled by the pricing of HA MongoDB (from what was formerly mLab); it gets way too pricey way too fast.

When looking at the hardware being provisioned I realized it wasn't even anything too crazy and could be had for 1/4 the price at Linode.

I will definitely be using this in the future.

rmoriz 7 years ago |

How does it handle PG upgrades, for example from PG 9 to PG 10?

Volundr 7 years ago | |

Based on my reading of the documentation, it doesn't. So you'd be responsible for taking a backup via pg_dumpall, and restoring it post-upgrade.
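For reference, the manual route described above might look roughly like the sketch below. It is not something KubeDB provides. The host names, the `postgres` user, and the dump file path are placeholders invented for the example, and the script only prints the commands instead of running them.

```python
import shlex
from typing import List


def dump_command(host: str, user: str, dump_file: str) -> List[str]:
    """pg_dumpall writes every database plus roles as one plain SQL script."""
    return ["pg_dumpall", "-h", host, "-U", user, "-f", dump_file]


def restore_command(host: str, user: str, dump_file: str) -> List[str]:
    """The SQL script from pg_dumpall is replayed with psql on the new server."""
    return ["psql", "-h", host, "-U", user, "-d", "postgres", "-f", dump_file]


if __name__ == "__main__":
    # Hypothetical hosts: the old PG 9 server and the freshly created PG 10 server.
    print(shlex.join(dump_command("pg9.example.internal", "postgres", "all.sql")))
    print(shlex.join(restore_command("pg10.example.internal", "postgres", "all.sql")))
```

A dump and restore like this means downtime proportional to the size of the data, which is part of why the lack of built-in upgrade handling matters.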

rmoriz 7 years ago | | |

Thanks for the confirmation. I was not able to find it, either. Strange what use cases are called "production-grade" nowadays...

geggam 7 years ago |

Performance tests, please?