How We Automate Our Infrastructure

How We Automate Our Infrastructure (segment.com)

124 points by lambtron 10 years ago | 28 comments

avitzurel 10 years ago |

This seems to be the "new standard" when it comes to startup infrastructure beyond Heroku.

However, what frustrates me the most about it is that every startup is left to figure out everything from scratch, and it seems impossible.

There are many tools you need to familiarize yourself with, too many to be comfortable with.

Companies that have already figured it out write blog posts like this one, which provide insight but are super high level. As a startup engineer, this gives you absolutely no value other than "yes, they are using it too".

I wonder if there's a solution for this generic enough to open source that will be a good start for startups.

You check out the project, read some docs and in 2-3 hours you have a cluster running. Kind of a "batteries included" devops solution.

hackercomplex 10 years ago | |

In my view what we're really talking about here is PaaS. Every shop is left to implement a private PaaS on their own for the most part. There are software companies out there who specialize in helping teams deploy this kind of architecture for example:

https://pivotal.io/platform

http://deis.io/

Both are open source technologies based on Docker that are gathering momentum, and you can hire consultants to help you deploy either one.

I personally don't use Docker. The startup I'm building has chosen to standardize on the JVM for all application code, so we leverage the JAR file as a kind of container. The Java ecosystem already solved the problem of zero-downtime deployments a long time ago, so for us deploying can be as simple as shipping new JAR files across the network.

Instead of using Docker to drive development we simply spin up development database/redis/etc instances in the cloud which automatically join a development VPN network. All of the non-VPN interfaces are automatically firewalled off. One nice advantage of this setup is that developers who have slow laptops are still able to work. I'm a big fan of this approach.

Check out Wildfly's "High Availability" features if you're interested in one way that the Java ecosystem can make headaches like zero-downtime deployment, HTTP health checks, monitoring, caching, and even load balancing disappear. It'll deploy non-Java code too, as long as it's on the JVM. If you're a Scala-only shop there are some great Scala-only alternatives available to boot.

dmichulke 10 years ago | | |

I use a similar setup from time to time and one of my main problems with this is XMHell - you set up everything in XML and whenever something doesn't work, it's hard to find help because you'll get a stack trace instead of "WARNING: <xmlpath> seems to be incorrect" or "Module X: When using feature Y with setting Z, you also need to define A, B and C"

A recent example is trying to make Hibernate work via postgis and postgresql as a datasource in Wildfly. We weren't able to solve it, we could only work around it.

Finally, if you need some behavior off the beaten path, you'll have to use lots of annotated Java. That's easy if you know all this, but a Java file with 10 annotations on its classes and methods is hard to read, simply because you don't know what happens when.

To summarize, it's an OK solution if you have a Java guy with lots of experience in all this (luckily we had one). Otherwise you're gonna have to learn a lot (as in by heart) because you can't really reason about XML and annotations (as you could, e.g., when composing services in Clojure).

brikis98 10 years ago | |

Trying to create battle-tested, pre-packaged, "batteries included" DevOps solutions is exactly what we're trying to do at Atomic Squirrel [1]. We think there needs to be a middle ground between Platform as a Service (PaaS), like Heroku, where everything is hidden and magical, and therefore harder to debug, customize, and scale, and Infrastructure as a Service (IaaS), like AWS, where you have full power and flexibility, but also way too many moving parts to learn and manage for a small company. If your company needs something like this, contact us at info@atomic-squirrel.net.

[1] http://www.atomic-squirrel.net/

avitzurel 10 years ago | | |

Actually working on an open source solution around this exact space.

I think the space between Heroku and AWS remains to be solved and lots of companies will jump on the train (if it's good and fast enough).

noir_lord 10 years ago | |

In my case I found Ansible was pretty much all I needed and it only took me a day or so to get my head around the basics (though I'm still learning all the other interesting stuff you can do) - in truth though I've been running and deploying servers for years with bash and python stuff so it just felt like a more generic better put together version of stuff I was already doing.

dano 10 years ago | | |

Agreed. I'm quite pleased with Ansible, having been around a while and seen the growth through cf_engine, custom scripts, Chef, Puppet, and Salt. Ansible is certainly quite easy to get going, very flexible, and precise.

joshmanders 10 years ago | |

I've been heavily researching and working with Docker. While I am building my new business, I decided to give back to the open source community and have been doing my best to open source every aspect of the business that I can without giving away our business. One of the things I am doing is abstracting a Docker deployment workflow out into a service of its own.

Basically what I have come up with is that a push or merge on master in GitHub triggers a build in the service, which will push your new image up to Docker Hub, then ping an agent that runs on your Docker host, notifying it of the new image and any metadata needed to determine how it should proceed.

So, for example: you git push to master on the app, the webhook fires on the service, and the service pulls the code, runs your tests if you want, builds the Docker image, etc. It pushes the new image to Docker Hub and pings the agent on the Docker host. The agent gets the data, pulls the new image, deploys a new container, does health checks, and then starts migrating new traffic to the new container before taking the old container offline.
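For anyone trying to picture the agent's side of that flow, here is a rough sketch of just the last step (start the new container, health check it, switch traffic, then stop the old one). It is only an in-memory model under my own assumptions: `Container`, `Host`, and `deploy` are made-up names, not part of Docker or of the service described above, and the health check is a plain callback rather than a real HTTP probe.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Container:
    image: str
    healthy: bool = True    # stand-in for the result of a real health check
    running: bool = False


@dataclass
class Host:
    """Hypothetical in-memory model of one Docker host behind a router."""
    live: Optional[Container] = None              # container currently receiving traffic
    log: List[str] = field(default_factory=list)  # record of what the agent did

    def deploy(self, image: str, health_check: Callable[[Container], bool]) -> bool:
        new = Container(image=image, running=True)
        self.log.append(f"started {image}")
        if not health_check(new):
            # Failed check: discard the new container and keep the old one serving.
            new.running = False
            self.log.append(f"rolled back {image}")
            return False
        old, self.live = self.live, new           # switch traffic to the new container
        self.log.append(f"routing to {image}")
        if old is not None:
            old.running = False                   # only now take the old one offline
            self.log.append(f"stopped {old.image}")
        return True
```

The point of the ordering is that the old container is stopped only after the new one has passed its check and is receiving traffic, so a bad image never replaces a working one.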

DanielDent 10 years ago | |

We've actually been considering if we should turn our internal environment into a product and/or service-product mix.

We've got a mostly automated cloud-agnostic process for spinning up a multi-datacenter Mesos cluster which integrates nicely with a docker CI workflow.

I'm pretty sure it's quite valuable, though I'm also unclear what people would be willing to pay.

helloiamaperson 10 years ago | | |

> I'm pretty sure it's quite valuable, though I'm also unclear what people would be willing to pay.

Your solution probably works great for your needs, but this stuff is expensive to productize. See https://www.openshift.org/

avitzurel 10 years ago | | |

IMHO, the problem is not only whether people will pay.

The problem is that this includes too many new tools that startups need to learn about, implement and maintain.

Most people, on just reading "Mesos", "Marathon" or other names in the space, tune out.

curun1r 10 years ago | |

Speaking as someone who rolled his own version of this, there were a lot of more complete solutions out there, but they all involved some technology that I felt would cause more pain down the road. Whether it's Chef/Ansible/Puppet which are popular, but seem targeted at mutable infrastructure (one of our explicit goals was immutable infrastructure) or Mesos/Kubernetes/ECS/CoreOS which seem targeted at a larger fleet of instances than we're running, there didn't seem to be any starting point beyond composing the right set of tools and writing the glue that made sense for us.

What we ended up with uses Terraform for provisioning instances, Docker (and a private registry) for distributing our application code, Consul for coordinating everything and HAProxy w/ consul-template for dynamic routing. There were only two pieces that we had to write. The first (which we may open source, if we're given the time to clean it up and generalize it) is a small Go agent that runs on provisioned hosts, figures out its role based on instance meta data, pulls its configuration from Consul and handles deployment, both initial and subsequent when a new version is registered with Consul. The second piece is ensuring that CI generates Docker images as artifacts, pushes them to our private registry and updates Consul to indicate that there's new code to deploy.
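To make the agent idea a bit more concrete, one pass of such a loop might look roughly like the sketch below. This is not the Go agent described above, just an illustration in Python under stated assumptions: a plain dict stands in for Consul's key/value store, and the `role` metadata tag, the `service/<role>/version` key layout, and the `reconcile` and `deploy` names are all hypothetical.

```python
from typing import Callable, Dict, Optional


def reconcile(
    metadata: Dict[str, str],
    kv: Dict[str, str],
    current_version: Optional[str],
    deploy: Callable[[str, str], None],
) -> Optional[str]:
    """Run one pass of the agent loop and return the version now deployed."""
    role = metadata["role"]                      # hypothetical instance metadata tag
    desired = kv.get(f"service/{role}/version")  # hypothetical key layout
    if desired is None or desired == current_version:
        return current_version                   # nothing registered, or already up to date
    deploy(role, desired)                        # e.g. pull the image and restart the container
    return desired
```

Calling this on a timer (or on a change notification from the store) covers both cases mentioned above: the first deploy, where `current_version` is `None`, and later deploys, where CI has written a newer version.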

It took us about a week to get this working and it's been mostly rock solid for almost a year now. Part of why it's been solid is that we understand exactly how every component of it works. The one problem we've had came from not understanding how HAProxy worked (never point HAProxy at an ELB... it will cache the DNS resolution and ELBs can change IPs over time). If we'd tried something off-the-shelf, we'd have a much shallower understanding and, since it wouldn't be optimized for our use case, we would have run into many more issues than we've had.

On the whole, I highly recommend rolling your own. The code that you will have to write is glue code that's really just replacing what would be configuration in something pre-built. I get that it seems imposing to people without devops experience, but between the tools that are available these days and articles like the one we're commenting about, it doesn't take a guru to get everything working seamlessly.

Also, the tools from HashiCorp are fabulous. Use them whenever possible. No disclaimer necessary since I have no affiliation with them beyond using their tools and watching their talks on the subject.
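The ELB pitfall comes down to resolving a hostname once and then trusting the answer forever. As a rough illustration of the alternative (this is not HAProxy configuration, just the general idea), here is a small resolver that re-checks after a TTL. `RefreshingResolver` is a made-up name, and the lookup function and clock are injected so the sketch doesn't depend on a real DNS server.

```python
import time
from typing import Callable, Dict, List, Tuple


class RefreshingResolver:
    """Caches hostname lookups, but only for `ttl` seconds each."""

    def __init__(
        self,
        lookup: Callable[[str], List[str]],
        ttl: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[float, List[str]]] = {}  # host -> (expiry, addresses)

    def resolve(self, host: str) -> List[str]:
        now = self._clock()
        entry = self._cache.get(host)
        if entry is None or now >= entry[0]:
            # Missing or expired: ask again instead of reusing a stale address.
            entry = (now + self._ttl, self._lookup(host))
            self._cache[host] = entry
        return entry[1]
```

In real use the lookup could be something like `lambda h: socket.gethostbyname_ex(h)[2]`. A resolve-once proxy behaves like this class with an infinite TTL, which is exactly what goes wrong when the addresses behind the name change.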

drakenot 10 years ago |

This past summer I spent some time learning Ansible. I've written scripts for the configuration and the deployment of my application's various services. The built-in idempotency of the commands was a big win for me and I feel fairly productive using the tool now.
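For anyone unfamiliar with what idempotency means here, a minimal sketch of the idea in plain Python (not Ansible's actual implementation; `ensure_line` is a made-up helper, loosely in the spirit of a "make sure this line is in that file" task):

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file; return True only if the file changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False                  # already in the desired state: do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True
```

Running it twice with the same arguments changes the file at most once, and the return value tells you whether anything happened, which is roughly what a "changed" versus "ok" task result conveys.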

My only complaint with Ansible really has been that it feels slow at times.

I'm interested in checking out Docker. What exactly does it buy me over my Ansible config/deployment scripts? Does it obsolete them?

dexterbt1 10 years ago | |

Ansible and Docker are orthogonal technologies. Docker buys you repeatable application packaging to solve dev/prod parity. Ansible can then become your orchestration tool, doing the heavy lifting to manage not just containers, but hosts, DNS, LBs, etc.

drakenot 10 years ago | | |

But by using Docker, it does change the way you use Ansible, right? I'm not going to be executing Playbooks against a set of hosts anymore to configure them.

Instead, I guess I'll be using Ansible to configure a container locally (in place of using Dockerfiles)? Then perhaps a different Playbook to deploy this container to my hosts?

crdoconnor 10 years ago | | |

Docker has a build file. They're not entirely orthogonal.

It's also not strictly necessary to ensure dev/prod parity.

thraxil 10 years ago | |

> My only complaint with Ansible really has been that it feels slow at times.

Highly recommend Salt then. A bit more of a learning curve, but so much faster than Ansible.

webo 10 years ago |

Really wish this article had included more details, or that segmentio would open-source a few of the tools.

calvinfo 10 years ago | |

Totally hear you.

We're planning on open-sourcing some pieces of our Terraform config and service toolkit in the next few months. We're definitely excited to share our internal tooling with the rest of the community.

webo 10 years ago | | |

Awesome, looking forward to it!

kreutz 10 years ago |

Check out Convox.com. Stellar team behind an awesome project.