Docker Misconceptions (valdhaus.co)
If you're using Docker in conjunction with AWS's suite of tools (Elastic Beanstalk, CloudWatch, etc.), a lot of these concerns are taken care of.
So Docker doesn't solve everything, but it can be part of the solution.
Docker on ECS, with VPC for network and instance isolation, an ELB for load balancing, and Kinesis for log streaming, is working extremely well.
Docker is feeling really great here as common tooling between a development system and production system.
Disclaimer: I'm working on a project, Convox, that automates setting up this type of system. https://docs.convox.com/
My hope is that the orchestration/scheduling tools (mesos, kubernetes, etc.) mature in such a way that the switch from Docker to unikernels is largely transparent to most people.
But if you are Google or Amazon, who have to build massive data centers to host thousands and thousands of those apps, alongside much larger-scale applications, you could achieve much more significant density (and therefore reduced costs) if you were running unikernels as opposed to VMs, perhaps passing some of that cost difference on to the customer, both for competitive reasons and as an incentive for them to upgrade.
That said, even for a small-time app, consider the weight of trying to run a complicated micro-service-based system on a developer laptop. Having to orchestrate a bunch of VMs is an unmitigated disaster. Having to orchestrate a bunch of containers in one or more VMs is an improvement, but not much.
If you could instead run unikernels, there would be considerably less overhead, especially since unikernels can typically run hosted inside a standard host-OS process.
Don't get me wrong, the world isn't really there. But when you consider a kubernetes cluster of docker containers that you never SSH into ... why bother with all those added layers of OS and runtime cruft?
I'm not really comfortable with such widespread adoption of a tool that is primarily a VC baby--NPM is setting itself up to fail (I think) in a similar fashion.
I do hope I'm wrong.
As soon as you start treating Docker images as anything other than isolated, statically compiled executables, you're not going to get the best out of Docker.
If you are bundling inits, crons and companion apps into a single container, then you need to stop, go back and either refactor your code or go to full-on VMs.
Why?
Because the networking is terrible. There are three great advantages to using real VMs over containers:
o Networking
o Isolation
o Hot migration and resource allocation
Networking:
Every instance of a service can have its own IP and can be tied to DNS automatically. With containers, scoped service discovery is only sort of just possible now, and it relies on immature tools with limited professional experience to back them up. With VMs, DNS and DHCP with subdomains mean images can be dropped in without any hard work.
Isolation:
It's far harder to break out of a VM than out of a container, especially if you are dealing with persistent storage and need to allow a container to write outside of its own chroot.
Hot migration:
This is killer. Hardware fails. Having a cluster that automatically migrates around contention and hardware failure, without the app having to worry, is worth many thousands of man hours. Yes, making your own clustering system is fun, but it's really quite hard to do well. Why bother when the hypervisor can do it for you?
There are three things going for Docker:
Configuration library:
There is a rich library of prebuilt images.
Baked in fudges:
You can bake your dirty hack into the container; so long as you script it into your build job, it's repeatable.
Speed:
Yes, there is less overhead. But let's be honest, how often have you hit VM speed issues that were down to your machine using too much CPU/memory? (If you're on AWS, no, you've not. AWS is dogshit slow, and expensive.)
Everything else, like immutable builds and easy dev environments, can already be achieved without much work.
I do however agree that not everything is ready to be containerized, but we are starting to get close.
Ah, so you need to use proprietary SaaS in order to have decent orchestration? Not good news.
The big problem that Docker solves is the dependency problem. Specifically, it ties multiple levels of dependencies together with application code in a way that makes no assumptions about your environment and how well-maintained it is. It means that your CI system can test on the exact same versions of binaries -- and every userland dependency, right down to (though not including) the shared host kernel -- that you will run on your production systems.
Many bigger companies will have multiple Yum/Apt/Maven/Git repositories, and with Docker, it doesn't matter. Whatever is built into the container is what gets run. Most importantly it puts control of those things into the hands of the development team, not the system administration team. It allows you to more cleanly separate your infrastructure ops from your application engineering/devops, which is the prime benefit IMO because those two groups have never worked together well.
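One way to make "the exact same versions" concrete is to refer to images by content digest rather than by a mutable tag. A minimal Python sketch, assuming image references of the form name@sha256:<hex>; the registry host and image names are hypothetical, and the check only compares strings, it does not talk to a registry:

```python
import re

# Matches a digest-pinned reference such as "repo/app@sha256:<64 hex chars>".
_DIGEST_RE = re.compile(r"^(?P<name>[^@\s]+)@sha256:(?P<hex>[0-9a-f]{64})$")


def image_digest(ref):
    """Return the sha256 hex digest of a pinned image reference, or None
    if the reference uses a mutable tag (or no tag at all)."""
    match = _DIGEST_RE.match(ref)
    return match.group("hex") if match else None


def same_image(ci_ref, prod_ref):
    """True only when both references are digest-pinned to the same content."""
    ci, prod = image_digest(ci_ref), image_digest(prod_ref)
    return ci is not None and ci == prod


# Hypothetical references, for illustration only.
DIGEST = "ab" * 32
ci_image = "registry.example.com/team/app@sha256:" + DIGEST
prod_image = "registry.example.com/team/app@sha256:" + DIGEST
print(same_image(ci_image, prod_image))
print(same_image("registry.example.com/team/app:latest", prod_image))
```

A deploy step could refuse to ship anything where same_image() is false, so what CI tested and what production runs cannot drift apart through a re-pushed tag.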
- Creating a scratch VM (not even provisioning) is a speed bump when you want to re-run scripts on a pristine environment to validate them. Starting a Docker image takes ~1s.
- Provisioning is slow! Reddit suggests installing a plugin (vagrant-cachier) to keep you sane. I ended up downloading a plugin to take VirtualBox snapshots of my VM and even that was depressingly slow. Docker commit takes maybe 20s on a big layer.
- VirtualBox shared folders are pretty bad, so apparently I should install vagrant-bindfs (and NFS packages). Docker's -v flag is effectively zero effort.
- I had to keep on shutting down and starting up the VM to tweak memory settings - too big and my laptop died, too small and nothing would start. You get far more flexible controls in Docker.
I couldn't help but think all the way through that putting everything in a Docker container would just be a far nicer experience, even if it does violate the single process rule - for this use case, Docker solves some papercuts very effectively.
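The papercuts above (pristine environment, shared folder, memory tweaks) map onto a handful of docker run flags. A rough Python sketch that only builds the command line and does not execute it; the image, paths and script name are hypothetical, while --rm, -v and --memory are standard docker run options:

```python
import shlex


def docker_run_argv(image, command, host_dir=None, container_dir="/work",
                    memory=None):
    """Build an argv list for a throwaway container.

    Nothing is executed here; pass the result to subprocess.run() to run it.
    """
    argv = ["docker", "run", "--rm"]  # --rm: discard the container on exit
    if host_dir is not None:
        # -v host:container bind-mounts a host directory into the container
        argv += ["-v", f"{host_dir}:{container_dir}"]
    if memory is not None:
        # --memory caps the container's RAM, e.g. "512m" or "2g"
        argv += ["--memory", memory]
    argv.append(image)
    argv += list(command)
    return argv


# Hypothetical image, paths and script, for illustration only.
argv = docker_run_argv(
    "ubuntu:14.04",
    ["bash", "/work/provision.sh"],
    host_dir="/home/me/scripts",
    memory="512m",
)
print(shlex.join(argv))
```

Changing the memory cap means changing one argument and starting a fresh container, rather than shutting down and reconfiguring a VM.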
- Snapshots: provision your VM once, snapshot it, power it off.
- Clone a new machine from that snapshot (it's still powered on) and do what you will with it.
Bonus points for attaching an ephemeral drive.
- Provisioning is slow; that's why you have hot spares to clone from/use directly.
To minimise hassle, I use the very same infrastructure that I deploy on in prod. This means that there is no difference between prod and dev.
I understand the need to run stuff on laptops, but for me, it's really not worth it. Using the same systems and sized machines as in prod makes life so much simpler. Plus I can hand over a machine to another dev really simply.