“It's The Future”(circleci.com) |
It's not clever or funny. It's lazy. It's exaggerated to the point of questionable emotional stability on the part of the writer.
Actually it is funny, but only in an ironic way.
"-Well, Amazon has ECS, but you gotta write XML or some shit."
This joke will never get old to me
Even renders on console browsers.
That said, if you can get your system to work with a single Heroku box, you really truly can simplify your life. That is what we're trying to do with http://gun.js.org/ , be able to start with a single machine and no configuration/setup/complexity. Then grow out.
We just had a discussion on the WebPlatform Podcast about all of this P2P stuff (https://www.youtube.com/watch?v=NYiArgkAklE) although, like I said, I probably got too jargony.
But props to circleci for calling out the elephant in the room. Great marketing actually.
The idea is you can deploy any app to any infrastructure of your choice (inside Docker containers). This means that you are not locked into Heroku and it gives you much more flexibility.
It's basically a hosted Rancher http://rancher.com/ service with a focus on a specific stack.
I think in the future, there will be a lot of services like Baasil.io (specializing in various stacks/frameworks) and managed by various open source communities.
Docker and Kubernetes WILL become more accessible to developers - I would bet my life on it.
I'm currently building a CLI tool to allow deploying in a single command - So you can get the simplicity of Heroku while not losing any flexibility/control over your architecture.
Also Baasil.io is essentially just a control panel/dashboard (Rancher-as-a-service), you can quit Baasil.io at any time and switch to your own hosted Rancher instance and you don't have to change any of your application code or change infrastructure providers.
The main benefit of Baasil.io is that it was built by an open source community using open source software and so we can offer the best possible support for apps built on top of our own open source stacks.
"Also Baasil.io is essentially just a control panel/dashboard (Rancher-as-a-service), you can quit Baasil.io at any time and switch to your own hosted Rancher instance and you don't have to change any of your application code or change infrastructure providers."
But so you can with AWS and Azure - especially with their Docker offerings. So I'm not sure what problem baasil.io actually solves? If anything it just adds to the list of dependencies and points of failure.
Users don't have to use Baasil.io, but if they do, they will get the best possible support - For example, a customer can give us access to their Rancher control panel and this would allow us to SSH into their machines to help resolve any problems in a hands-on way.
It's probably more accurate to describe it as "DevOps as a service - With a focus on realtime apps/services". The value proposition is probably closest to Cloud 66 http://www.cloud66.com/ except more focused on realtime apps.
Another similar service is Zeit.co https://zeit.co/ - Except Zeit.co only runs Node.js. Baasil.io can be extended with components written in any language.
Having a monolithic app does not make it bad. What makes it bad is not having proper modules with proper interfaces.
SOA comes in handy when you want to distribute your workload: you already have proper modules, but those modules need more computing power, so you split them across boxes and pay the pain of managing that, because you have no other option.
When I wrote that article it was largely focused on the potential for Docker to create a bunch of Heroku competitors as well as a simplified development experience across multiple languages.
The businesses aren't there yet, although a ton are trying. The local dev experience has not materialized either outside of native Linux, due to performance issues with volumes that only a third-party rsync plugin has come close to fixing.
I still use and advocate for Heroku pretty heavily for just about any non-enterprise environment.
It's a constant balancing act. Too flexible, it becomes overwhelming. Too constrained and you sacrifice a bunch of the perks of using Docker.
The conclusion I've come to is the only way to do it is to be unashamedly opinionated about keeping things simple for the average user. Otherwise you end up having that exact conversation
Have a look at PCFDev.
Disclosure: I sit next to the PCFDev team and use it in my dayjob.
When it comes to microservices, it would be interesting to know simple things: what kinds of services were created, how large they are, how communication is handled, how large the team(s) behind each service are, etc.
For some companies these are of course trade secrets, but sometimes opening things up might be good marketing. An example is Backblaze with their very detailed descriptions of their storage pods.
https://www.infoq.com/presentations/microservices-comparison...
I find them much better than walls of text.
Getting Back To Coding http://www.drdobbs.com/architecture-and-design/getting-back-...
I think we need a better word for apps that are single tight self-contained systems than "monolith". You can design elegant interfaces, and avoid creating a sloppy mess, with function calls or objects too.
This rant sounds just like any rant from old dev mocking a new tech. "This is less efficient, this is too complicated, this can't be taken seriously, this won't last".
Creating a character obsessed with "this is dead" hardly conceals the obsession with "this won't work". Do whatever you please, we don't care. But don't mock others for doing what they please.
Passing through that, let's address the critics.
Microservices and docker are not necessarily tied. I write only monolithic apps, and use them with docker through dokku.
Etcd is a microservice problem, not a docker one.
You don't need coreos or kubernetes to use docker in production. You need them if you want massively scaled applications, just like you would have many servers running the same app with replication without docker. Most of us don't need that (and those who need it probably won't find it more complicated than what is needed to do that without docker).
If you don't want to manage servers, well, don't manage them. That's what cloud services are made for. But please tolerate that some people love devops and would rather not spend much money directly on infrastructure.
I know it's tongue-in-cheek, but few if any of these newfangled things are critically dependent on one another.
It turns out that the optimal size depends on the balance between the overhead costs associated with allocating resources within one firm and the transaction costs associated with two firms doing business with each other. The overhead costs are higher with large firms because there are more internal resources, including people, to allocate. On the other hand, transaction costs are higher with small firms because each firm does less itself, so it needs to transact more with others to accomplish its goals.
As the relative costs vary over time, the optimal size varies too, and firms in an industry will grow and shrink. If it increases, then you'll see mergers and acquisitions produce larger firms. If it decreases then you'll see firms start splitting or small startups disrupting their lumbering competition.
I suspect a similar thing happens in software, where there's an optimal service size. It could be infinite, where it makes sense to build large monoliths to reduce the cost of two systems communicating. Or it could be one, where it's optimal to break the system at as fine a granularity as possible (function level?).
The optimal size depends on the balance of costs. All else being equal, by drawing a service boundary between two bits of functionality you shrink the services on either side but you increase the number of services and add communication costs for them to exchange data and commands.
How these costs balance out depends on the technology, and there are competing forces at work. As languages, libraries and frameworks improve, we can manage larger systems at lower cost. That tends to increase the optimal service size. As platforms, protocols and infrastructure tools improve, the cost of running large numbers of services decreases. That tends to decrease the optimal service size.
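To make the trade-off concrete, here is a toy model, not something from the comment itself. It assumes a system of fixed size split into equal services, an internal complexity cost that grows with the square of each service's size, and a flat operational overhead per service. The function names, the quadratic form and all coefficients are illustrative assumptions.

```python
def total_cost(n_services, system_size, complexity_coeff, per_service_overhead):
    """Toy cost of splitting a system into n equally sized services."""
    service_size = system_size / n_services
    # Assumption: internal complexity grows with the square of service size.
    internal = n_services * complexity_coeff * service_size ** 2
    # Assumption: each extra service adds a flat running/communication cost.
    operational = n_services * per_service_overhead
    return internal + operational


def optimal_service_count(system_size, complexity_coeff, per_service_overhead):
    """Brute-force the service count with the lowest toy cost."""
    candidates = range(1, system_size + 1)
    return min(
        candidates,
        key=lambda n: total_cost(n, system_size, complexity_coeff, per_service_overhead),
    )


baseline = optimal_service_count(100, 1.0, 100.0)
better_languages = optimal_service_count(100, 0.25, 100.0)  # cheaper internal complexity
better_infra = optimal_service_count(100, 1.0, 25.0)        # cheaper per-service overhead
print(baseline, better_languages, better_infra)
```

Under these assumptions, cheaper internal complexity pushes toward fewer, larger services, while cheaper per-service overhead pushes toward more, smaller ones, which is the pair of competing forces described above.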
The microservices movement, and to an extent the serverless movement, assume that in the medium- and long-term the technological improvements are going to tip the scales sharply in favour of small services. I agree that's likely the case. But we're not there yet, except in some specialized cases such as large distributed organizations (Conway's law). But it's going to be at least a few years before it's worthwhile to build most software systems in a microservice architecture.
But new technology is necessary and early adopters are necessary. Iteration is necessary. Don't punish it.
Is there an advantage to using docker when it takes 3 hours to rebuild our relatively small database?
In any case, the author of the post actually agrees with you: https://circleci.com/blog/it-really-is-the-future/
Thanks for the article, a lot of interesting things in it.
It's funny how it's focused on the scaling problem. It may depend on which circles we're in, but it seems to me that what people were most interested in with Heroku was the ease of deployment rather than the scalability. It probably depends on the size of your usual projects.
Yet what is interesting in Docker is not just scalability. I find it way easier to code system dependencies with Docker than with Chef, and that's already a big win. Also, I've stayed away from Heroku for my own projects, mainly because of the cost. Docker, with dokku, gives me the same comfort one has on Heroku, but on an 80€/mo server (handling about 15 small apps, and still having one third of its memory available). And having several applications using several versions of Ruby or Postgres on the same server is not a problem anymore.
In that regard, docker is not only interesting for people who have massive infrastructure to manage, but also to people who are used to self hosting and want an easier way to deliver.
Probably because we have seen it all before, and there isn't much "new" most of the time.
What is mock worthy are the attempts to make everyone's pet technology a floor wax AND dessert topping. But that's mostly VC funding at work
People seem to underestimate just how powerful modern machines really are. And I don't get why people seem to think it's hard to deploy simple web applications. Just write a 4-line shell script that rsync's, runs whatever DB migrations you may have and restarts the thing.
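A minimal sketch of what such a deploy script might look like, written here in Python rather than shell. The host, paths, migration command and service name are hypothetical placeholders, and restarting through `systemctl` is an assumption about the target machine. The script defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def build_deploy_commands(host, src, dest, migrate_cmd, service):
    """Return the three deploy steps as argv lists: sync, migrate, restart."""
    return [
        # Copy the build to the server, deleting files that no longer exist locally.
        ["rsync", "-az", "--delete", src, f"{host}:{dest}"],
        # Run whatever migration command the app uses (hypothetical here).
        ["ssh", host, f"cd {shlex.quote(dest)} && {migrate_cmd}"],
        # Restart the service (assumes a systemd unit with this name).
        ["ssh", host, f"sudo systemctl restart {shlex.quote(service)}"],
    ]


def deploy(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for argv in commands:
        print("$", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


commands = build_deploy_commands(
    host="deploy@example.com",   # hypothetical
    src="./build/",              # hypothetical
    dest="/srv/myapp/",          # hypothetical
    migrate_cmd="./bin/migrate", # hypothetical
    service="myapp",             # hypothetical
)
deploy(commands)  # dry run: prints the commands without executing them
```

Passing `dry_run=False` would actually run the steps and stop at the first failure, since `check=True` raises on a non-zero exit code.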
622482 / 43800
= 14.2119178082 requests per minute
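The division above can be reproduced in a few lines. This sketch assumes the 43,800 figure is the number of minutes in an average month (a 365-day year divided by 12), which is how the number works out; the function name is illustrative.

```python
MINUTES_PER_MONTH = 365 * 24 * 60 // 12  # 43,800 minutes in an average month


def average_rate(total_requests, minutes=MINUTES_PER_MONTH):
    """Return (requests per minute, requests per second) averaged over the period."""
    per_minute = total_requests / minutes
    return per_minute, per_minute / 60


per_minute, per_second = average_rate(622_482)
print(f"{per_minute:.2f} requests/minute, {per_second:.3f} requests/second")
# prints "14.21 requests/minute, 0.237 requests/second"
```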
Here are my requests over the last month: http://jmp.sh/9EAUVrv I definitely understand that modern machines are powerful, but I used to STIG RHEL 4 and RHEL 5 boxes for 4 years as my primary day job. I've done everything from creating kickstart files to manually locking down whole Linux instances to creating RPM files. At this point in my career I just don't care about the extra cycles I get by using <insert your infrastructure tech>.
If my product needed the extra performance, trust me I would switch.
[1] - https://github.com/joeblau/gitignore.io/wiki/Advanced-Comman...
Since they upped the cost of their small tier, I moved to Digital Ocean and installed Dokku, which gives me that Heroku-like deployment experience so managing my (admittedly very small) website isn't that much of a hassle.
And you automatically get things like auto-scaling, database auto-provisioning, easy debugging and more.
Disclaimer: I'm Boxfuse's founder and CEO
http://discuss.joelonsoftware.com/?joel.3.219431.12
(Factory factory factory factory.)
With tools like Rancher http://rancher.com, you can already see things moving in that direction. Next step is rancher-as-a-service.
When it comes to developers, I think open systems will always prevail in the end (it's just more flexible).
I got really upset about this rancher tool because it doesn't design my database schema.
Shit, future was so close.
Right now, we think of frameworks as being components of larger software systems. But in the future, frameworks will provide the foundations for entire software systems. They will be responsible for declaring their own network topologies and resource requirements, and they will be capable of scaling automatically to any number of machines.
Developers will extend and customize the framework with their own logic but the framework itself will handle all the difficult stuff related to its own operations.
I write stuff in Scheme. I'm a hobbyist, there's no reason for me not to, and I love the language. The apps I write are sometimes single-threaded (or coroutine-based) monoliths. But I only have one machine available for me, and the things I'm writing are fairly simple. It's good ENOUGH. And Worse really is Better[1].
[1]: And I truly mean that in the Gabriel sense. As in the New Jersey model. Not any other way.
Serious question though: I would absolutely love to have an introduction on how to use Docker to deploy one or two web applications that use a typical amount of backend services, say some sort of database and a redis server. All of this would probably run on a single VM (whether Amazon, DigitalOcean, Linode, ...) and you mainly use Docker to isolate the applications from each other in terms of the environment/dependencies that they need.
How do I do this with Docker in a way that gets me an easy deploy process? (Or maybe the question is actually, should I even do this with Docker?)
reader implements and gets massive bill for personal blog hosting
"Am I doing this right?"
> So I just need to split my simple CRUD app into 12 microservices, each with their own APIs which call each others’ APIs but handle failure resiliently, put them into Docker containers, launch a fleet of 8 machines which are Docker hosts running CoreOS, “orchestrate” them using a small Kubernetes cluster running etcd, figure out the “open questions” of networking and storage, and then I continuously deliver multiple redundant copies of each microservice to my fleet. Is that it?
> -Yes! Isn’t it glorious?
> I’m going back to Heroku.
I would never have considered Docker containers if artifact preservation/isolation and deployment issues hadn't forced me to look for a solution.
But if you're the CTO of a startup with 10+ server-side developers and you plan to hire at least as many in the near future, suddenly all these dockers and microservices actually make sense.
So, unless you start conversations with _who_ you are and _what problem_ you are trying to solve, of course the other side will seem stupid.
As a consultant, I often get asked those kinds of questions: "Should we use X?"
Whether it's programming languages, databases, operating systems, whether it's Chef vs Puppet vs Ansible vs Docker vs Whatever, it's a question that comes up a lot.
I generally answer it with "What are your team good at? What have they used, what do they know well?"
There are always exceptions to the rule, but in general I encourage people to play to the strengths of their team, rather than recommending Technology X because it's shiny and bang on-trend.
Can someone explain to me the advantages of Docker compared to Jails?
Hitler uses Docker: https://www.youtube.com/watch?v=PivpCKEiQOQ
By the way, why all the downvotes to the parent?
This is quite frustrating for both people who are aware of those issues and trying to fix them as well as the people missing out on the real advantages of such technologies.
This reminds me of similar sentiment around virtualization and cloud computing later in my peer group:
Some sold VMs as a security feature, and people focused their criticism on that without understanding other advantages like quick/self-service provisioning of systems. Later on, cloud computing was trivialized as "it's now just somebody else's computer", which completely ignored advantages like no ramp-up costs and the ability to programmatically manage your systems' life cycle.
PS: Considering every new thing a fad probably also makes you consider 'hadoop' the latest shit in big data processing and assume today's tech companies hipster are fighting over wordpress plugins. (Like, really?)
1. I have a much better understanding of what's happening behind-the-scenes
2. For most small startups, you should seriously consider the time (and therefore, cost) of investing in your own infrastructure.
For point #1, I think understanding your options and how they benefit your company is essential as you transition from a small -> medium -> large company. The paradigms you learn by researching the new technologies might end up being applicable in other parts of your development process.
On point #2, I partially regret not deploying to Heroku, seeing where our system became stressed, and optimizing. Attempting to scale for things you don't know about yet is tough, and can lead you down a path of wasted time and money.
exactly. I mean look, if you have a lifestyle business that's only going to support 5-10 people, it's totally a waste of time. if you have some hope of scaling this is the way to go. I get it, just use Heroku. It's easy and convenient. If you're planning on a billion dollar exit, this way is way better.
> I need to decide if i believe my own hype?
yeah. sorry.
Microservices often hit the same database. You want to be able to split up the database. Not just into shards, but into distributed nodes.
And by doing this, you split up the whole stack.
I have 200k visitors per month generating 8m page views and about 50m hits on the servers (with CDNs taking another few hundred million hits).
These all peak during the UK weekdays and wind down at nights and weekends.
Divisions over time aren't going to work, but neither is translating visitors into requests, especially as it's only the page views that have a beyond trivial computation cost.
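A rough illustration of why a flat monthly average understates load when traffic is concentrated in working hours. The 50m hits figure is from the comment above; the share of traffic in the busy window (80%), the 22 weekdays and the 10-hour day are made-up assumptions for the sketch, as are the function names.

```python
SECONDS_PER_HOUR = 3600


def average_rate(total_hits, days=30):
    """Hits per second if traffic were spread evenly over the whole month."""
    return total_hits / (days * 24 * SECONDS_PER_HOUR)


def busy_window_rate(total_hits, busy_share, busy_days, busy_hours_per_day):
    """Hits per second inside the busy window only."""
    busy_seconds = busy_days * busy_hours_per_day * SECONDS_PER_HOUR
    return total_hits * busy_share / busy_seconds


hits = 50_000_000  # monthly server hits, from the comment
flat = average_rate(hits)
# Assumption: 80% of hits land in 22 weekdays x 10 working hours.
busy = busy_window_rate(hits, busy_share=0.8, busy_days=22, busy_hours_per_day=10)
print(f"flat average: {flat:.1f}/s, busy-window average: {busy:.1f}/s")
```

Even this busy-window number is still an average; real peaks within the window would be higher again, and it says nothing about which requests are expensive to compute.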
Total requests, last month: 622,482
> [...] before switching to Docker (even though I do love Docker).
This seems like a very drastic solution to problems he does not yet have. I've been responsible for similar thoughts before, but shortly after realized just how damn stupid I was.
As usual, if a joke needs to be explained (to a person) then it's not funny (to them). I found this amusing since it aligned with my experiences:
> -Since no-one understands Paxos, this guy Diego…
> Oh, you know him?
> -No, he works at CoreOS. Anyway, Diego built Raft for his PhD thesis cause Paxos was too hard. Wicked smart dude. And then he wrote etcd as an implementation, and Aphyr said it wasn’t shit.
> What’s Aphyr?
> -Aphyr is that guy who wrote, ‘Call Me Maybe.’ You know, the distributed systems and BDSM guy?
> What? Did you say BDSM?
There are several jokes here based on cultural references related to the aforementioned topics.
* First when you tell it,
* then when you explain it,
* then finally when they get it.
(x used to be English people when I was a kid.)
Anyway, long story short, most of these people do not really understand why they need all this rocket science to manage < 500 internal users. The new buzzwords I am hearing these days are mostly related to big data and machine learning. One of my managers came to me and asked why we don't integrate our product with Hadoop, since it would solve our performance problems because it can handle a lot of data.
I am frustrated by the industry as a whole. I feel the industry is simply following marketing trends. Imagine the number of man-hours put into investigating technologies, and the projects dropped midway on realizing the technology stack is still immature or not suitable at all.
People want their apps to be made with Visual Studio (BTW, FoxPro was part of the package).
So they ask: "In what is the app made"?
"In Visual, Sir."
Done. End of story (most of the time, anyway; obviously sometimes people are more dangerous and press the point ;) ).
----
The point is not to focus on the exact word, but on what people believe the word will give them.
So, for example, "Big Data". What it means to us matters zero. What it means to some customers is that they have a largeish Excel file that, with their current methods and tools, takes too long to produce results.
So. Do you use "Big Data Tools"?
"Yes Sir."
And what about using Hadoop?
"We use the parts of big data tech necessary to solve this. Whether we need Hadoop, or other similar tools that fit your industry better and use the same principles, will depend on our evaluation. Don't worry, we know this."
Or something like that ;). Knowing what worries the people behind the words has helped me a lot, even with people with WORSE tech skills (damn, I have built apps for almost illiterate people with big pockets whose only reference for tech was their cellphones!)
And the anecdote about the largeish Excel file that was too big and took too long? Yep, true. And it was for one of the largest companies in my country ;)
I am very skeptical of people who are "BizDev" or "Project Managers" or "Managers" or "Scrum Masters". They generally don't know what they're talking about and rely on buzzwords.
If they don't accept your answer and ask a follow-up, then they're probably a person actually worth having a conversation with about the pros and cons.
My work lands me in a number of different conferences in non-software industries. This is true for all industries. It's just that ours has a faster revolving door. That, in addition to a low barrier to entry (anyone can claim they're a web developer), leads to a higher degree of this madness. It's just part of human behavior to seek out, and parrot, social signals that let others know you, too, are an insider.
Personally, I have to avoid a great number of those gatherings, since the lot of them are just a circlejerk of low-density information. If I pay too much attention to those events, I catch myself looking down my nose, and since that isn't productive/healthy behavior, I avoid landing myself in a place where guys with buddy-holly glasses and <obscure-craft-beer> argue which Wordpress plugin is the best.
Unfortunately, as a developer I have to agree. My job is to make a fast, reliable, stable product, but at the same time I'm questioned about the tools I use by people who don't have any knowledge but have heard of the latest trend.
But sometimes it's also very easy to please people. Big data: just insert 10M records in a database and suddenly everyone is happy because they now have big data :|
Since when is 10M records considered big data?
My goto gauge for big data is that it can't fit in memory on a single machine. And since that means multiple TB[1] these days, most people don't really have big data.
[1]: Heck, you can even rent ~2TB for $14/hour! https://aws.amazon.com/ec2/instance-types/x1/
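The "does it fit in memory on one machine" gauge can be sketched as a quick check. The ~2 TB RAM figure comes from the footnote above; the 1 KiB average record size is purely an assumption for illustration, and the function names are hypothetical.

```python
TB = 10 ** 12  # decimal terabyte, in bytes


def dataset_bytes(record_count, avg_record_bytes):
    """Rough in-memory size of a dataset, ignoring indexes and overhead."""
    return record_count * avg_record_bytes


def is_big_data(size_bytes, machine_ram_bytes=2 * TB):
    """Gauge from the comment: 'big' means it cannot fit in one machine's RAM."""
    return size_bytes > machine_ram_bytes


ten_million_records = dataset_bytes(10_000_000, avg_record_bytes=1024)  # assumed 1 KiB each
print(ten_million_records / TB, is_big_data(ten_million_records))
```

With that assumed record size, 10M records come to roughly 0.01 TB, a couple of orders of magnitude below the threshold.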
Take your .war file, drop it onto JBoss. It deploys across the cluster in a zero downtime manner, isolates configuration, provides consistent log structure, cert management, deployment. You can deploy dozens of small war's to the same server and they can talk to each other. Load balance across the cluster automatically based on actual load. Run scheduled jobs and load balance the scheduled jobs themselves. Allow them to be isolated and unique within the cluster.
I may not like Java as a language, but from an infrastructure standpoint Java was basically Heroku long before Heroku was Heroku. The infrastructure is just...solid. The downside was that the XML config stuff was just messy.
I mean, it's great to have this new tech and all, but when you're trying to build something to last some years, sometimes it's hard to filter the crap between all the buzzwords. It just reinforces the thought that smart people should just leave this field entirely or search for other fields of knowledge (or business) where our knowledge of programming can be made of use.
I'm 35 now, but I'm starting to realize that I will not have the patience to keep up with all the crap just to stay employable. There are some areas where being old and experienced is valuable. Philosophy, science, psychology, teaching, etc., are maybe some of them, but this industry is definitely not one of those areas. It makes me think that what I'm building now will some day be completely wiped out of existence.
"All my work will be obsolete by 2005" -Steve Jobs
If you aren't willing to accept that obsolescence is part of life, then you are either building something you aren't passionate about or confused about the cruelty of time.
If you look at what technology was popular 10-15 years ago then that's what will be in use in Enterprises now. Java web services is currently the big thing at my company.
All the late 90's business apps which were in Visual Basic, Oracle Forms and Access are being rewritten as Java web services at the moment by an army of contractors. In another 10-15 years they will be rewritten again in the language Du Jour of today probably Go. It's an endless cycle.
"We could store gigabytes of data on the clients without having to pay for servers"
Yep, same experience here with both "Big Data" and the ML space. The decision makers need to see the sheer amount of Java, Scala and/or Python code you need to actually implement to do anything useful.
Nope...not magic.
Unlike the natives, however, who simply wasted some time building extraneous fake runways, in the Valley people are royally screwing up their own core architecture.
I'm old enough to find this more humorous than frustrating.
The Valley is ripe for disruption. ;)
So far I've seen micro services repeat this trend almost exactly.
Fashion signals, well, virtually everything about social interactions. A tremendously complex world. Including, for that matter, whether or not you care about fashion trends, and quite possibly, why you might or might not (you're not in the game, you've quit the game, you're so fabulously successful you don't need to play the game, you couldn't play the game if you wanted to, ...)
In IT, TLAs, ETLAs, buzzwords, slogans, brands, companies, tool names, etc., all speak to what you know, or very often, don't know. It's not possible to transmit deep understanding instantaneously, so we're left with other means of trying to impart significance.
Crucially, the fact that clothing and IT fashion are so superficial (of necessity) means they can be gamed, and that those who are good at following just the surface messages can dive in. Some quite effectively. But they're not communicating the originally intended meaning.
Have you looked at React Native at all?
Thanks.
Big data and machine learning are also hot words. But they are clearly modern engineering. Consultants exist to explain the best way to achieve modern best practices to people without the appropriate background. If someone asks "Why no Hadoopz plx?", either explain the other technology used instead (maybe Spark, Storm?) or explain that the scale is small enough for Access to handle. That's a consultant's job.
'twas ever thus.
Computer science is not a real field.
That seems excessive.
A 100 times yes. We tried to split our monolithic Rails app into micro-services built in Go. 2 years and many fires later, we decided to abandon the project. It was mostly because the monitoring and alerting were now split into many different pieces. Also, the team spent too much time debating standards etc. I think micro-services can be valuable, but we definitely didn't do it right, and I think a lot of companies get it wrong. Any positive experiences with micro-services here?
To answer some questions: yes this is obviously poking fun at Docker, but I also do really believe in Docker. See the follow-up for more on that: https://circleci.com/blog/it-really-is-the-future/
In a self-indulgent moment I made a "making of" podcast about this blog post, which is kinda interesting (more about business than tech): http://www.heavybit.com/library/podcasts/to-be-continuous/ep...
And if you like this post you'll probably like the rest of the podcast: http://www.heavybit.com/library/podcasts/to-be-continuous/
> -It means they’re shit. Like Mongo.
> I thought Mongo was web scale?
> -No one else did.
It's so incredibly true, and I laugh (and cry, b/c we use Mongo) at this section each time I read it. Also, this gets me every time:
> And he wrote that Katy Perry song?
- So shared webhosting is dead, apparently Heroku is the future?
- Why Ruby, why not just PHP?
- Wait, what's Rails? Is that different from Ruby?
- What's MVC, why do I need that for my simple website?
- Ok, so I need to install RubyGems? What's a Gemfile.lock? None of these commands work on Windows.
- I don't like this new text editor. Why can't I just use Dreamweaver?
- You keep talking about Git. Do I need that even if I'm working alone?
- I have to use command line to update my site? Why can't I just use FTP?
- So Github is separate from Git? And my code is stored on Github, not Heroku?
- Wait, I need to install both PGSql and SQLite? Why is this better than MySQL?
- Migrations? Huh?
Frameworks, orchestrations, even just new technologies -- these are great if they actually make your job easier or if they make your product better. Unfortunately, they often do exactly the opposite.
> using a VCS for personal code can be overkill
I've been burned before, have you? If you're using something like Google Drive, you should use DropBox instead, since it seems less likely to lose your work.
Nooooooooooooooooo. Everytime someone says "service discovery" a kitten dies (Except for consul, that's the biz).
I really don't have any idea why people are so excited about "docker" all the things.
I'm pretty Docker ignorant. I think I get it in concept. I manage >150 web sites (~15,000 pages total) that are PHP based, with eXist-db and Oracle (overkill, but I'm forced to use it) for database backends. My team develops on Mac OS X and pushes code to RHEL. We have never had a compatibility problem between OS X and RHEL except for some management scripts in bash that were easily coded around.
Big data to me is a 400 MB apache log file.
I go home grateful I don't have to be in the buzz word mix.
I do read a lot about technology, and over time that informs some changes, like using Apache Camel for middleware, Splunk for log file analysis, yada yada...
I have had bosses that brought me buzz word solutions that don't ever match the problems we have. I hate that but right now I am not in that position. My boss leaves technology decisions to us.
Lest you think we are not modern at all, we do use a CDN, git and more.
Some days I get anxiety from reading HN, feeling stupid. Some days I get a lift from HN from reading articles like this one and the comments.
I am so glad I'm not in the business of chasing technology.
I read both articles a year ago and it really helped me grasp the whole container movement.
"-You think that’s going to be around in 6 months?"
Isn't reputation a thing of beauty?
1) Small teams (~1-5 people) trying to seem "big" by working at Google's scale.
2) Heroku's prices. We are currently (successfully so far) migrating a small Django project from bare Amazon EC2 instances to ECS with Docker. Even using 3 EC2 micro instances (1 vCPU, 1 GB RAM) for the Docker cluster we would spend ~8 USD/month/instance. With Heroku the minimum would be 25 USD/month/dyno. That's a 3x increase in expenses.
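The price comparison in point 2 can be worked through explicitly. The per-unit prices are the ones quoted above; treating one Heroku dyno as the equivalent of one EC2 micro instance is an assumption made for the sketch, not a claim from the comment.

```python
def monthly_cost(units, price_per_unit_usd):
    """Total monthly cost for a number of identically priced units."""
    return units * price_per_unit_usd


ec2_total = monthly_cost(3, 8)      # 3 micro instances at ~8 USD/month each
heroku_total = monthly_cost(3, 25)  # assumption: 3 dynos to match, at 25 USD/month each
ratio = heroku_total / ec2_total

print(f"EC2: {ec2_total} USD, Heroku: {heroku_total} USD, ratio: {ratio:.2f}x")
# prints "EC2: 24 USD, Heroku: 75 USD, ratio: 3.12x"
```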
It's very possible to take advantage of technologies like containers without getting too caught in the hype.
You can still easily set things up so it's a git based deploy which is hands free after the initial push.
Now you have a single $5-10/month server that runs your app's stack without a big fuss. Of course it's not "web scale" with massive resiliency but when you're just starting out, 1 server instance is totally fine and exactly what you want.
I've run many projects for years on 1 server that did "business mission critical" tasks like accepting payments, etc.
When I see titles like that (despite the fact that it was intended as sarcasm), I think to myself, e.g., "I bet at least hundreds of people who scrolled past it thought it was sincere, and now they will have this subconscious 'Heroku is Dead... Docker...' thought at times when deploying projects. Maybe they'll even check out Docker. Maybe these hundreds of people will represent a tipping point of sorts for Heroku->Docker migrations, because one of them will write a really great blog post about it, and it will receive thousands of views..." (alternate endings of the same thought continue to be brute-forced for a few moments).
Along the same vein of thinking, back in 2008 I had this "realization" that Google could control the world by simply showing results based on headline titles (e.g., a search for "Obama" during the election could have resulted in articles / results whose titles have words/phrases whose presences are positively correlated to lower or higher stress levels, assumptions, other emotions, etc., resulting in a net positive or negative sentiment, respectively, about the subject of the search query, all while simply scanning the results to determine which one to click).
This would be true for an average BuzzFeed-consuming-crowd, which -to my knowledge- isn't the case here.
Any of the proposed problems that containerization was supposed to fix are already fixed by using proper configuration management. In almost all cases so far, when people yammer on about Docker and containers (and CoreOS), it ends up being their idea of configuration management, because they didn't have any in the first place.
Say you want to fix your 'problems' with setting up servers: how about doing it the right way? You will need deployment services, regardless of containers, VMs or bare metal. You will also need configuration management services, and monitoring. Containers and special distributions solve none of it. Knowledge to run systems is still required, and layering stuff on top without actually fixing your problems doesn't help.
Get something like SaltStack or Chef, and configure the heck out of everything. It doesn't care what you're running on, and actually solves the problems that need fixing.
Heroku is great, and free for small services. On the other hand, a highly-available kubernetes cluster is going to set you back at least $100 per month, which is just too much for small startups and side projects before they take off.
I think I'm going to forget everything and head towards http://serverless.com/. No Heroku, no Docker, no micro-services, no servers. Just everything running on AWS Lambda and DynamoDB. And everything static in S3 behind Cloudfront.
Or maybe just Firebase. But I really am tired of managing servers.
Maybe the problem is AWS.
It's a Google product offering, which means it could be EOLed tomorrow. Or this afternoon. Or maybe it already has — better go check their blog.
Disclaimer: I work at Convox.
Should I round up or down?
I have to use ECS for caching (I am not happy about it)
Builds might fail due to the custom docker version/compilation
You can mock docker, but people are using it in one way or another and you should support it properly.
Having read the article back then (and reread it now) it seems like it's still relevant. Maybe we'll have to add the year qualifier after a while when AWS lambda becomes "the way".
https://circleci.com/blog/it-really-is-the-future/
But there is, as the author notes, truth in the satire.
Read it, it's a lovely 5-minute piece of writing.
At least you understand the author's intention. I would be worried if some non-technical people took the title literally...
Is there ANY way I can spin up a server, add the ssh keys to some configuration file somewhere and just "docker-magic push" and have my Rails application running?
Or do "docker-magic bundle exec db:migrate" and have that command run on the server?
Or push a Procfile with worker definitions and have the PaaS automatically pick it up, add it to supervisord/systemd and run it?
You could basically substitute all these backend buzzwords with "Webpack", "Grunt", "Gulp", "Requirejs", "React", "Angular", "Ember", "Backbone", etc. and it would have same effect on the readers--they think you're an annoying hipster.
Thank you.
That's called practicing conservatively, minimizing chances of bad outcomes. It's a matter of astute clinical judgement to glean optimum risk/benefit ratio in a particular case. Since no two cases are ever exactly the same, good judgement is a constant necessity.
I see that the process of developing software has many parallels, and it is not surprising that everyone experiences so much brokenness. When people complain to me about some mysterious program misbehavior (stuff I had nothing to do with) I empathize with them, and try to help them think logically about the problem they're having.
Only rarely can I offer any real insight, but given the insane proliferation of the alphabet soup of identifiers attached to all the "new things" out there, no one I know in the industry feels they have a handle on what's happening.
Seems like the pace of "innovations" will lead to even greater levels of incomplete and dysfunctional systems and can only lead, sooner or later, to truly catastrophic failures.
A small team starting a new project should not waste a single second considering microservices unless there's something that is so completely obviously decoupled that not splitting it into a microservice will lead to extra work. It's also way easier to split into microservices after the fact than when you're developing a new app and you don't have a clue what it will look like or what the overall structure of the app will be in a year (the most common case for startups).
That's a good point. I think this thought extrapolates to other parts of software engineering as well. Sometimes writing very modular and decoupled software from the beginning is very hard for a small team, and we can't see well if this is the best approach since it's also hard to grasp the big picture.
I'm currently facing this issue. I'm trying to write very modular and reusable applications, but now I'm paralyzed trying to picture the best patterns to use, where should I use a facade, a decorator, etc. I think I'll adopt this strategy for myself--only focus on modularizing from the beginning if it'd lead to extra work otherwise.
The thing is, you need a massive investment in infrastructure to make it happen. But once you do, it's great. You can create and deploy a new service in a few seconds. You can rewrite any individual service to be the latest and greatest in an afternoon. Different teams don't have to agree on coding standards (so you don't argue about it).
But the infrastructure cost is really high, a big chunk of what you save in development you pay in devops, and it's harder to be "eventually consistent" (e.g., an upgrade of your stack across the board can take 10x longer, because there's no big push that HAS to happen for a tiny piece to get the benefits).
Monolithic apps have their advantages too, and many forget it: less devops cost, easier to refactor (especially in statically typed languages: a right click -> rename will propagate through the entire app), and while it's harder to upgrade the stack, once it's done, your entire stack is up to date, rather than just parts of it. Code reuse is significantly easier, too.
Unsure if sarcastic.
>The thing is, you need a massive investment in infrastructure to make it happen.
I thought that one of the selling points of microservice architectures was the minimal infrastructure. I am really struggling to see an advantage in this way of doing things. You are just pushing the complexity to a devops layer rather than the application layer, even further from the data.
Maybe Swift? Scala Native in a year or two? I've done a little Erlang before, so maybe Elixir?
Monoliths invariably tend to become spaghetti over time, and completely impossible to refactor in any non-trivial way. With microservices, interfaces between modules are stable and spaghetti is localized.
The most dramatic effect was on a particular set of endpoints with relatively high traffic (it peaks at 1000 req/s) that was killing the app, upsetting our relational database (with frequent deadlocks) and driving our Elasticsearch cluster crazy.
We did more than just split the endpoints into microservices. We also designed the new system to be more resilient. We changed our persistence strategy to better suit our traffic, using a distributed key-value database and designing documents accordingly.
The result was very dramatic, like walking into a loud club and suddenly everything goes silent. No more outages, very consistent response times, instances that scaled very smoothly as traffic increased, and overall a more robust system.
The moral of this experience (at least for me) is that breaking a monolithic app into pieces has to have a purpose, and implies more than just moving the code to several services while keeping the same strategy (that's actually slower, time-consuming and harder to monitor).
I can't get my head around how people introduce changes to their system if they have to update 12 different microservices at once? It must be horrible.
Often you hear stories how people are converting monolithic app to microservices - but this is easy. Rewriting code is easy and it's fair to say it always yields better code (with or without splitting into microservices - it doesn't matter).
What I'd like to hear is something about companies doing active development in a microservice world. How do they handle things like schema changes in Postgres where 7 microservices are backed by the same db? What are the benefits compared to a monolithic app in those cases?
It seems to me that microservices can easily violate DRY because they "materialise" communication interfaces and changes need to be propagated at every api "barrier", no?
In my experience, any monolith that can be broken up into a queue-based system will benefit enormously. This cleans up the pipelines, and adds monitoring and scaling points (the queues). Queues remove run-time dependencies on the other services. It requires that these services are _actually_ independent, of course.
I do, however, avoid RPC based micro-services like the plague. RPC adds run-time dependencies to services. If possible, I limit RPC to other (micro) services to launch/startup/initialization/bootstrap, not run-time. In many cases, though, the RPC can be avoided entirely.
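To make the queue point concrete, here is a minimal in-process sketch using Python's standard `queue` and `threading` modules. The stage names and the `upper()` "processing" are made up for illustration; a real system would put a message broker between separate services rather than a `queue.Queue` between threads.

```python
import queue
import threading

# The queue is the only thing the two stages share.
jobs = queue.Queue()
results = []


def producer(items):
    """Stage 1: enqueue work and return immediately; it never calls stage 2."""
    for item in items:
        jobs.put(item)


def worker():
    """Stage 2: drain the queue at its own pace until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:
            jobs.task_done()
            break
        results.append(item.upper())  # stand-in for real processing
        jobs.task_done()


def run_pipeline(items):
    """Enqueue first, start the consumer later, and report the backlog seen."""
    producer(items)          # succeeds even though no worker is running yet
    backlog = jobs.qsize()   # queue depth: a natural monitoring/scaling signal
    consumer = threading.Thread(target=worker)
    consumer.start()
    jobs.put(None)           # ask the worker to stop once the backlog is done
    consumer.join()
    return backlog
```

The producer finishes before the consumer even exists, which is the "no run-time dependency" property; with an RPC call the producer would have failed or blocked instead.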
Yep. We already had a feature flag system, a minimal monitoring system, and a robust alerting system in place. Microservices make our deployments much more granular. No longer do we have to roll back perfectly good changes because of bugs in unrelated parts of the codebase. Before, we had to have involved conversations about deployments, and there were many things we just didn't do because the change was too big.
We can now incrementally upgrade library versions, upgrade language versions, and even change languages now, which is a huge win from the cleaning up technical debt perspective.
It makes sense for some things. We run a webshop, but have a separate service that handles everything regarding payments. It has worked out really well, because it allows us to fiddle around with pretty much everything else and not worry about breaking the payment part.
It helps that it's a system where we can have just one test deployment and everyone just uses that during testing of other systems.
I've also worked at a company where we had to run 12 different systems in their own VMs to have a full development environment. That sucked beyond belief.
The idea of micro-services is enticing, but if you need to spin up and configure more than a couple to do your work, it starts hurting productivity.
Is the payments service a single service that manages the whole transaction, or have you gone for multiple services handling each part and, if so, how did you manage failure with a distributed transaction?
We had almost the same story with payments. Except that we jumped to a payment-processing SaaS, got dissatisfied (all those SaaSes I saw don't work with PayPal EC without so-called "reference transactions" enabled), decided that wasn't a good idea, and had to jump back to an in-house implementation.
I didn't want to re-integrate the payments code back to the monolith - thought it would take me more time and make code messier. So I wrote a service (it's small but to heck with "micro" prefix) that resembled that SaaS' API (the parts we've used). It had surely evolved and isn't compatible anymore, but it doesn't matter as we're not going back anyway.
Works nicely, and now I feel more relaxed: touching the monolith won't break payments.
On the other hand, I see how too many services may easily lead to fatigue. Automated management tooling (stuff like docker-compose) may remedy this, but also may bring their own headaches.
We have specific services that process different types of documents, or communicate and package data from different third parties, or process certain types of business rules, that multiple apps hook into, but it's literally like 20 services total for our department, some that are used in some apps and not others.
When I hear 'micro-services' I'm picturing something more akin to like node modules, where everything is broken up to the point where they do only one tiny thing and that's it. Like your payment service would be broken into 20 or 30 services.
But maybe I'm mistaken in my terms. I haven't done too much with containers professionally, so I'm not too hip with "the future".
The thing is though, the Elixir feed checker has its own database table that tracks whether it's seen an episode in a feed. And when there's a new episode it sends an API call to WP to insert the new post. The problem is that sometimes the API calls fail! Now what? I'll need to build logging, re-try etc. So I'm thinking of making the feed checker 'stateless' and only using WP with a lot of query caching as the holder of 'state' information about whether an episode has been seen before.
To sum up my experience so far, there's something nice about being able to use the right tech for each task, and separating resources for each service, but the complexity--keeping track of whether a task completed properly--definitely increases.
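For the "API calls sometimes fail, now what?" part, a retry wrapper with exponential backoff is one common starting point. This is only a sketch: `send` stands in for whatever function makes the API call, and the attempt count and delays are arbitrary defaults, not anything from the setup described above.

```python
import time


def call_with_retry(send, payload, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send(payload); on an exception, wait and retry up to `attempts` times.

    Returns the result of the first successful call, or re-raises the last
    error once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return send(payload)
        except Exception as exc:
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 0.5s, 1s, 2s, ...
            print(f"attempt {attempt} failed ({exc}); retrying in {delay}s")
            sleep(delay)
```

One caveat: if a call fails after the receiving side has already stored the post, a retry will insert it twice unless the receiver can recognise duplicates, so retries work best when the operation is idempotent.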
The advantage though is that APIs (system boundaries) are usually better defined.
Perhaps one should use the best of both worlds: run microservices on a common database, and somehow allow passing transactions between services (so multiple services can act within the same transaction).
A shared database is an anti-pattern in distributed systems.
Similarly, distributed transactions (à la DTC) are an anti-pattern.
Distributed systems aren't hard. They're just different.
Then again, sometimes it's advantageous to identify parts of your system where aspects of state can be safely decoupled. And in which having them reside in disparate systems (and yes, sometimes be inconsistent or differently available) might actually be a better overall fit.
You completely lose the concept of transactional integrity, so you will have to work around that from the start.
Then again, sometimes your state changes not only don't need to be transactional; it can be disadvantageous to think of them that way.
Depends, depends, depends.
The non-web world has been doing this with message queueing for about 15 years. Maybe more.
I mean, the infamous "UNIX way" of "do one thing, do it well" (something we nearly lost with popularity of "do everything in a manner incompatible with how others do it" approach in too many modern systems), when complex behavior was frequently achieved through the modularity of smaller programs communicating through well-defined interfaces.
Heck, microkernels are all about this, and their ideas didn't grow out of nowhere. And HURD (even though it was never finished) is a quarter of a century old already.
That said, in places where it doesn't make sense we didn't try to force it. Our main game API is somewhat monolithic, but behind it we have almost 10 other services. Here's a quick breakdown:
- Turn based API service (largest, "monolithic")
- Real-time API service (about 50% the size of turn-based)
- config service (serves configuration settings to clients for game balancing)
- ad waterfall service (dynamic waterfall, no actual ads)
- push notification service
- analytics collection service (mostly a fast collector that dumps into Big Query)
- Open graph service (for rich sharing)
- push maintenance service (executes token management based on GCM/APNS feedback)
- help desk form service (simple front-end to help desk)
- service update service (monitors CI for new binaries, updates services on the fly - made easy by Go binary deployment from CI to S3)
- service ping service (monitors all service health, responds to ELB pings)
- Facebook web front-end service (just serves WebGL version of our game binary for play on Facebook)
- NATS.io for all IPC between services
...and a few more in the works. Some of these might push the line of "micro" in that they almost all do more than a single function's worth of work, but that level of granularity isn't practical. But don't get too caught up on the "micro" part. Split services where domain lines naturally form, and don't constrain service size by arbitrary definitions. You know, right tool for the job and whatnot.
I wouldn't, however, just "do microservices" from day one on a young app. But usually that young app has no idea what the true business value is, i.e., you have no idea what down time of certain parts of your services really means to the business. That's the #1 pain point we're solving: having mission critical things up 100%, and then rapidly iterating on new, less stable feature designs in separate services.
You should, however, keep an eye on how "splittable" everything is, i.e., does everything need to be in the same DB schema? Most languages have package concepts, which typically align (somehow) with "service" concepts. Do you know their dependencies? That sort of thing. Then, the later process of "refactor -> split out service" is pretty straightforward and easy to plan.
I don't really like that model applied to everything, but eh now you are kind of forced in a hybrid approach - say, your macro vertical plus whatever payment gateway service, intercom or equivalent customer interaction services, metrics services, retargeting services, there are a lot of heterogeneous pieces going into your average startup.
But back on topic: what Docker really needs now is a whack on the head for whoever thought up swarms/overlays, and a proper, sane way to handle discovery and fail-over. Instead we got a key-value service deployment to handle, which cannot be in Docker and highly available unless you like infinite recursion.
I'm currently working on a large refactoring effort along these lines. The end goal is to create a modular, potentially distributed system that can be deployed in a variety of configurations, updated piecemeal for different customers, and integrated by our customers with the third-party or in-house code of their choice using defined APIs. We aren't typical of the other examples, though, in that we do literally ship our software to our customers and they run it on their own clusters.
a good example of this that I've used in production at my current $dayjob: dynamic PDF generation. user makes request from our website, request data is used to fill out a pdf template context which is then sent over to our PDFgen microservice which does its thing and streams a response back to the user.
All of that and much more needs to be replicated for each microservice, right?
Why not just have a module in your monolithic app that does it. The logic will still be separate. In most languages/frameworks you can spawn pdf generation task. Any changes are easier to introduce as well. There's no artificially materialised interface. Updates are naturally introduced. All auth logic is there already, you don't need to worry about deploying yet another service, same with logging etc.
Ah yes, the 'let's have decentralised microservices with centralised standards!' anti-pattern. It results in lots of full-fledged, heavyweight, slow-to-update services, which also have all the problems of a distributed system. It's the worst of both worlds.
Although I personally had to deal with some monolithic monsters that I wished were split into smaller services.
IMHO. You need a lead with a clear vision that drives the effort. Too many leads will create chaos.
Well, there's your problem - you need a monitoring microservice and an alerting microservice! Well, those may be too coarse by themselves, but once you break them down into 5 or 6 microservices each, you'll be ready for production.
I don't know if you understand what Docker really is when you say something like "Run only one process in one brand new kernel". The kernel is shared between containers; that's the whole idea. You package the things your application needs and are done with it.
The current problem with containerization is that there are no really good or well-understood best practices. People are still experimenting, and that's why it's a big moving target and, consequently, a pain in the ass if you need to support a more enterprise-y environment. You will need to be able to change and re-architect things if the state of the art changes tomorrow.
I agree with your sentiment about going overboard on "docker all the things", that's dumb and some people do it more because of the hype than by understanding their needs and using a good solution for it but I think you are criticising something you don't really grasp, these two statements:
> "Run only one process in one brand new kernel"
> you have a kernel in your hand, why the hell you will run only one process on it?
I'm not trying to be snarky; I really recommend doing a bit more research on Docker to understand how it works. Also, Docker doesn't make it a pain in the ass to upgrade apps. Quite the contrary, if you do it properly.
Except now Go and Rust make it very easy to compile static Linux binaries that don't depend on glibc, and even cross-compile them easily.
Hell I think it's actually not even that hard to do with C/C++: https://www.musl-libc.org/how.html
If I have a binary built by Go, what problems does Docker solve that just copying that binary to a normal machine doesn't?
Deployments, installations etc. are pretty easy, it's not something containers are actually good at solving. At best you containerize the configuration management itself, which simply makes it harder to work with.
One process per container is perfectly fine. In fact, that's the common use case. There is absolutely nothing wrong with it, and there is practically zero overhead in doing it.
What you gain is isolation. I can bring up a container and know that when it dies, it leaves no cruft behind. I can start a temporary Ubuntu container, install stuff in it, compile code in it, export the compilation outputs, terminate the container and know that everything is gone. We do this with Drone, a CI/build system that launches temporary containers to build code. This way, we avoid putting compilers in the final container images; only the compiled program ends up there.
Similarly, Drone allows us to start temporary "sidecar" containers while running tests. For example, if the app's test suite needs PostgreSQL and Memcached and Elasticsearch, our Drone config starts those three for the duration of the test run. When the test completes, they're gone.
This encapsulation concept changes how you think about deployment and about hardware. Apps become redundant, expendable, ephemeral things. Hardware, now, is just a substrate that an app lives on, temporarily. We shuffle things around, and apps are scheduled on the hardware that has enough space. No need to name your boxes (they're all interchangeable and differ only in specs and location), and there's no longer any fixed relationship between app and machine, or even between app and routing. For example, I can start another copy of my app from an experimental branch, that runs concurrently with the current version. All the visitors are routed to the current version, and I can privately test my experimental version without impacting the production setup. I can even route some of the public traffic to the new version, to see that it holds up. When I am ready to put my new version into production, I deploy it properly, and the system will start routing traffic to it.
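The "route some of the public traffic to the new version" step can be sketched in a few lines. This is just one way to do it, hashing a visitor ID so that each visitor sticks to one version; the function name, the version labels and the use of a visitor ID are my assumptions, and in practice the load balancer or scheduler does this for you.

```python
import hashlib


def pick_version(visitor_id, canary_percent):
    """Route a stable share of visitors (about `canary_percent`%) to the new version."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    # Map the visitor to a bucket 0..99; the same visitor always gets the same bucket.
    bucket = int.from_bytes(digest[:2], "big") % 100
    return "experimental" if bucket < canary_percent else "current"


if __name__ == "__main__":
    visitors = [f"visitor-{i}" for i in range(10_000)]
    share = sum(pick_version(v, 10) == "experimental" for v in visitors) / len(visitors)
    print(f"share routed to the experimental version: {share:.1%}")
```

Setting `canary_percent` to 0 keeps everyone on the current version, and raising it step by step to 100 completes the rollout.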
Yes, it very much is the future.
And 25 and 75 are bogus numbers, what if we start running 10 instances?
Once again keep in mind that for new projects the process is so streamlined it will take a fraction of the time to set them up.
Because I've seen my share of nasty "legacy" automation but, surprisingly, I still think a good set of well thought-out shell scripts written by someone who understands what's being automated still beats modern tools, even when the person doing the automation is the same.
I don't quite know why this is, but there's something timeless about shell scripts. I've also seen shell script automation survive for a long time unattended and with zero issues. Not so with some of the modern tools that are supposed to be all unicorns and rainbows.
It all has to be in the script building up strictly from well-understood and long stable basic bricks (and the few places where you don't it's even worse with devops tools.)
Any issue, any question can be answered by reading the damn shell script and you're never dependent on a cookbook/recipe/playbook/component that you got off of some github repo that you need 5% of to do X.
I don't have to rewrite my shell-scripts every 6 months when a new version comes out. New updates usually only happen when security issues arise.
Shell-scripts tend to be simple. There's not a lot of magic hand-holding going on, which means not a lot of complexity to break things.
It keeps you from getting too abstract. You're writing pretty close and specific to what you want it to do, not "how it should be".
They are typically standalone. It's really easy to have 1 script that solves one problem, and another script that solves another. You don't need a giant code-infrastructure to keep things going.
I think config-mgmt tools can be extremely useful if you're running a widely varied environment. But you probably shouldn't be running a widely varied environment. If you keep things simple, and run as homogeneous as possible, you probably don't need all the added complexity.
Like the way the shell would fork off an "expr" sub-process to parse a mathematical expression to add two numbers, then write the result to a pipe via stdout, then terminate the process, clean up all its resources, and switch context back to the shell, which then read the serialized sum back in from the other end of the pipe, and went about its business, regardless of the fact that the CPU running the shell already had its own built-in "add" instruction in hardware.
Shell scripts are guaranteed to be runnable on all machines.
Unfortunately the shell "language" sucks, but still...
There is, however, still a hump to get over in installation -- you need to learn what BOSH is, install BOSH, then install Cloud Foundry with BOSH. In the long run, for a production deployment, this is what you want. But it certainly doesn't feel that way when you just want to kick some damn tires.
If you just want to tinker, you can try PCFDev[0]. It's a fully-functional Cloud Foundry installation in a single VM.
Disclosure: I work for Pivotal, we donate the majority of engineering on Cloud Foundry.
[0] https://pivotal.io/platform/pcf-tutorials/getting-started-wi...
Edit: yes, I know we ask you to signup during the PCFDev install. I hate it too. We have to for export compliance, it can't be avoided.
In a very specific case, Heroku is the best solution for my problem. Sounds like it is for you too.
http://nickjanetakis.com/courses/scaling-docker-on-aws
It covers using RDS, ElastiCache and also handles load balancing your app + much more.
For example, if a DBA and a JS developer say "We need to use a scalable database", they probably don't have the same thing in mind about what exactly "scalable" or "database" means; however, both are concerned about serving data with good performance.
So, if a naive web developer wants "a scalable document store!" you can just give them Postgres and presto! ::troll:: ;)
Also, Project Manager and Scrum Master are just positions that describe roles and responsibilities in an organization / on a team. The people filling those roles needn't be clueless.
The positions I mentioned above are usually the positions that people who failed at picking up any valuable skill seem to resort to.
Yanking out the major chunks of independent functionality into separate deployable services makes sense at a large enough scale and for large enough, independent enough components. But you would only do so out of necessity, not as an initial architecture.
And yet here we are.
GCP isn't "Google Labs".
But I think Alan Kay has been "exposed" to computer science, and I follow his logic, based on my limited scope of knowledge.
My remark was to highlight that buzzwords are often used for "me-too" ankle-deep conversations/articles. Whether someone calls it Devops, or Systems Engineering, makes no difference to me. However, I favor pragmatic conversations about the topic, rather than buzzword bingo.
Examples include: "MongoDB sucks.", "Everyone should use Docker", and "What? You mean you're not using Kubernetes for your CRUD app?"
Basically, blanket statements that accomplish nothing more than to send social signals.
So it's healthy to embrace it as counter-balance to the constant hype.
(Besides, whether something is "real" or "solid" I think can mostly be answered in hindsight -- when it's mature enough and tested enough. In which case calling only things in the past solid is prudent).
One key factor was decoupling the high volume updates from the users requests so one didn't affect the other one.
As I said in another thread, the separation in different components was key for resiliency. That allowed independence between the higher volume update and the business critical user facing component.
>I can't get my head around how people introduce changes to their system if they have to update 12 different microservices at once? It must be horrible.
The thing is, if you design the microservices properly, it is very rare to introduce a change in so many deployments at once. Most of the time it is just 1 or 2 services at a time.
>What I'd like to hear is something about companies doing active development in microservice world. How do they handle things like schema changes in postgres where 7 microservices are backed by the same db? What are the benefits compared to monolithic app in those cases?
We don't introduce new features in our monolith service anymore. So, from that perspective we do all active development in microservices.
>How do they handle things like schema changes in postgres where 7 microservices are backed by the same db?
The trick is, you want to avoid sharing relational data between microservices. I don't know if it is just us, but we have been able to split our data model so far, and in most cases we don't even need a relational database anymore, so having a schemaless key/value store makes things easy too.
> What are the benefits compared to monolithic app in those cases?
There are several advantages, but the critical one for me is being able to have a resilient platform that can still operate even if a subsystem is down. With our monolithic app it is an all-or-nothing thing. Another advantage is splitting the risk of new releases.
>It seems to me that microservices can easily violate DRY because they "materialise" communication interfaces and changes need to be propagated at every api "barrier", no?
Not necessarily. YMMV but you can have separation of concerns and avoid sharing data models. When you do have shared dependencies (like logging strategy or data connections) you can always have modules/libraries.
To be honest, we still have a monolithic application at the heart of our system that we've been slow to decompose, though we're working on it. We deploy it on a regular cadence and use feature flags heavily to make it play nice with everything else.
Because individuals may be jumping through dozens of services a day, moving, refactoring, deploying, reverting (when something goes wrong), etc. It has to be friction-free, else you're just wasting your time.
eg: a CLI to create the initial boilerplate, a system that automatically builds a deployable on commit, and something to deploy said deployable nearly instantly (if tests passed). The services are small, so build/tests should be very quick (if you push above 1-5 minutes for an average service, it's too slow to be productive).
Anyone should be able to run your service locally by just cloning the repo and running a command standard across all services. Else having to learn something every time you need to change something will slow you down.
That infrastructure is expensive to build and to get working together.
Microservices also make it much harder to refactor the code which you often need to do in the early stage of a project.
the template has values that are related to database models. the main app (still mostly monolithic) fills out the template context. the context itself is what's passed to the microservice. the microservice does not connect to a database at all.
> Does it keep connection pool of let's say 5 connection always open (as libraries like to do)?
no. the service probably handles a few hundred requests per day, it is not in constant use. communication is over HTTPS. it opens a new connection on each request. this does impact throughput, but it's a low throughput use case, and pdf rendering itself is much slower and that time totally dominates the overhead of opening and closing connections anyway.
> Does it have authentication?
yes, it auths with a bearer token that is borne only by our own internal server. this is backend technology so we don't have to auth an arbitrary user. we know in advance which users are authorized.
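A minimal sketch of that kind of bearer-token check, assuming a single shared secret known only to the internal caller. The token value and the `is_authorized` helper are hypothetical; the actual service may well do this differently.

```python
import hmac

# Hypothetical shared secret; in practice it would come from the
# environment or a secrets store, never from source code.
EXPECTED_TOKEN = "example-internal-token"


def is_authorized(headers: dict) -> bool:
    """Return True only if the Authorization header carries the expected bearer token."""
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest compares in constant time, so timing does not
    # reveal how many leading characters matched.
    return hmac.compare_digest(token.encode(), EXPECTED_TOKEN.encode())
```

Since only one known backend holds the token, there is no user lookup at all, which is what keeps the service simple.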
> Is it public or private API?
private
> Who is managing security?
we are, with a lot of assistance from the built-in security model of AWS.
> Is it running behind its own nginx or other proxy?
the main app is behind nginx. the microservice is running in a docker container that exposes itself over a dedicated port. there's no proxy for the microservice, again, because of the low throughput/low load on the service. no need to have a load balancer for this so the most obvious benefit of a proxy wasn't applicable.
> Does it have DoS protection (PDF generation can be CPU intense)?
yes, it's an internal service and our entire infrastructure is deployed behind a gatekeeper server and firewall. the service is inaccessible by outside requests. the internal requests are queued up and processed 1 at a time.
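A rough sketch of "queued up and processed 1 at a time", assuming an in-process queue drained by a single worker thread. `render_pdf` is a hypothetical stand-in for the real renderer, and the real service may queue at a different layer entirely.

```python
import queue
import threading


def render_pdf(context: dict) -> str:
    # Stand-in for the real, CPU-heavy PDF rendering step (hypothetical).
    return f"pdf for {context['title']}"


def process_serially(contexts: list) -> list:
    """Push every request through one worker so renders never overlap."""
    jobs = queue.Queue()
    results = []

    def worker() -> None:
        while True:
            context = jobs.get()
            if context is None:  # sentinel: no more work
                break
            results.append(render_pdf(context))

    thread = threading.Thread(target=worker)
    thread.start()
    for context in contexts:
        jobs.put(context)
    jobs.put(None)
    thread.join()
    return results
```

With only one worker, a burst of requests makes the queue longer rather than the CPU load higher, which is the DoS property being described.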
> What about the schema for request?
request payload validation is handled on both ends. the user input is validated by the main app to form a valid template context. the pdf generator also validates the template context before attempting to generate a pdf. it's possible to have a valid schema with data that can't be handled correctly, though. those errors are just returned as a 500 response. happens infrequently.
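A small sketch of a template-context check that both the main app and the PDF service could run. The field names in `REQUIRED_FIELDS` are invented for illustration and are not the actual schema.

```python
# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"title": str, "customer_name": str, "line_items": list}


def validate_context(context: dict) -> list:
    """Return a list of problems; an empty list means the context is structurally valid."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in context:
            errors.append(f"missing field: {field}")
        elif not isinstance(context[field], expected_type):
            errors.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return errors
```

A check like this only covers structure. A context can pass it and still contain data the renderer cannot handle, which is the case that ends up as a 500.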
> They need to be deployed together with changes in other services, right?
nope. the microservice is fully stand alone.
> What about changes to database schema - you need to remember to update that service as well and redeploy it at the right time as well - just after successful db migrations - which live in another project.
the microservice doesn't interact with a database at all. schema changes in the main app database could potentially influence the pdf template context generation, but there are unit tests for that, so if it does happen we'll get visibility in a test failure and update the template context generation code as needed. none of this impacts the microservice itself though. it is fully stand alone. that's the point.
> All of that and much more needs to be replicated for each microservice, right?
in principle yes, and these are good guidelines for determining what is or is not suitable to be a microservice. if it would need to auth an arbitrary user, or have direct database access, or be exposed to public requests, it might not be a good candidate for a microservice. things that can stand alone and have limited functional dependencies are much better candidates.
> Why not just have a module in your monolithic app that does it.
because the monolithic app is Python/django and the PDF generation tool is Java. one of the main advantages of microservices architecture is much greater flexibility in technology selection. A previous solution used Python subprocesses to call out to PDF generation software. It's actually easier and cleaner for us to use a microservice instead.
Nowadays all that I do is set up a barebones CoreOS instance and fire away containers at it, be it with kubernetes (in which case my config management has to be a bit more robust in order to set up k8s on CoreOS) or just with CoreOS's own fleet if it suffices.
Then I get the goodies of containerization such as process isolation, resource-quotas, etc.
Like I said: it isn't painless, sometimes much the opposite, but it's worked much better for the lifecycle of most of the products and services I've been working on the past couple years.
Even before with automated deployments it wasn't so easy when configuration begins to get hairy. And yes, you can argue that this might be a smell of something else but that's what I've seen happening over and over.
Most of the times that I bring up the concept of virtue to my peers in age they seem either confused with the concept or contemptuous of it. They behave like virtue is a purely religious thing, yet caution in the face of possible danger is a very basic survival skill.
The quote has bounded context. And in that context, seems generally valid and applicable.
I'm basically saying that the high churn we have now does not give you enough time to build significant experiences that you can use later on in life, and, as such, it's the opposite of a good investment in the future. It is almost as if we are only living for the present status, forgetting that in the future we will have less patience and energy to "re-learn" almost the same things.
Oh, and be very careful of the commands your script invokes!
shell scripts have no guarantee of portability (often less than Python, which has a rich standard library available on all platforms).
Why do you say that? genuinely curious
I have come to the point where I only look at other languages once in a while and it serves me well.
A few years ago when I was still in farming we had the ostrich craze: ostriches were crazy profitable (or so the ostrich sellers said) and every farm needed to consider it.
Eggs were $300 a piece etc etc.
Of course the first to get one made great money by selling eggs, chicks and consulting hours to all the rest.
The rest were not so lucky and today I don't know a single ostrich farm.
Same goes for latest tech: if you want to you can try to be first and make a good living on the hype stream.
As is, believe it or not JavaEE.
Most all of that is overall poor architecture, and most companies don't hire particularly good developers or DBAs (and most web developers aren't actually very good at manipulating data, relational or not), but it's the state of the union. That's "enterprise IT". That's why consultancies make billions fighting fires and fixing things that shouldn't be problems in the first place.
A Lucene index can be much larger than your current RAM. It can be 100x that. The data will still be queryable. Lucene reads into memory the data it needs in order to produce a sane result. Lucene is pretty close to being industry standard for information retrieval.
My definition is instead "when your data is not queryable using standard measures".
I unsubscribed from that (non-tech) podcast.
I'm curious; in what kinds of situation would this apply?
> Depends, depends, depends.
Flexibility is usually an important requirement. Often you cannot freeze your architecture and be done with it. I think a transactional approach could better fit with this.
Any situation where the business value of having your state be 100% consistent does not outweigh the performance or implementation cost of making it so.
This applies to small environments. If the environment is large the situation almost reverses.
Deploying automation throughout a large homogenous environment is where config-management tools really shine. They make it easy to ensure homogeneity is maintained (even if that just means ensuring all machines have the same set of shell scripts) and allow grouping for staggered updates.
If the environment is widely-ranged and large, the utopia starts to break down. Their configuration explodes in complexity and (if you're not careful) you end up with mostly the same amount of work as if they were managed as small independent environments. With the added risk that there is now a single place from where you can break everything at once.
And this happens... Usually from wrong assumptions of what's common between all machines in the environment. In homogeneous environments almost everything is common, but in widely-ranged environments you sometimes add some configuration that wasn't there before and you think applies to the whole set and all hell breaks loose. If you're lucky this will happen suddenly, if you're not, breakage will spread slowly and you'll spend quite a lot of time scratching your head on why.
I don't think large/small is a good deciding factor. You can be large and homogeneous, or small and diverse. I think similar/dissimilar is a better decider for config-mgmt vs shell-scripts.
I'd argue that config-mgmt usually does a better job if your setup is large and complex. No need to write a script that checks if it needs to install a .deb, .rpm, or whatever, if your config-mgmt tools have already done that work.
Also, if you build your shell-scripts right, they can ensure that your system is kept the same.
Unless you are over fifty years old, you never experienced this.
Rob Pike said it best: "Those days are dead and gone and the eulogy was delivered by Perl."
Perl being a thing in 1995...
[1] https://github.com/baphomet-berlin/jQuery-basic-arithmetic-p...
Well, even someone who's young (for values of 'young' older than high-school age) is probably spending more than that every month on beer, food & entertainment each …
or you just host it on an old laptop hidden in your closet.
As a grumpy SA, I see way too many people try to push for new tools because they "seem cool", instead of "Do they solve a problem we have?"
Things we consider industry standard though, why should you need to fight for them? An example I can think of is dependency injection. Ideally you can test your software better and release more reliable builds. Believe it or not, I do come across companies that still are not aware of these concepts. Introducing it would be possible without breaking anything because you can continue instantiating services the old-fashioned way.
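A minimal sketch of introducing dependency injection without breaking existing callers, assuming plain constructor injection with a default. All class names here (`SmtpMailer`, `SignupService`, `FakeMailer`) are made up for illustration.

```python
class SmtpMailer:
    """The 'real' dependency; hypothetical, it would talk to a mail server."""

    def send(self, to: str, body: str) -> None:
        raise RuntimeError("would contact a real mail server")


class SignupService:
    def __init__(self, mailer=None):
        # Existing callers pass nothing and get the real mailer, exactly as before.
        # Tests (or new code) can inject a replacement instead.
        self.mailer = mailer if mailer is not None else SmtpMailer()

    def register(self, email: str) -> bool:
        self.mailer.send(email, "Welcome!")
        return True


class FakeMailer:
    """Test double that records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))
```

Old code that writes `SignupService()` keeps working unchanged, while a test can write `SignupService(FakeMailer())` and assert on what was sent.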
With newish stuff that's still changing, if it won't impact production (i.e., tooling) I'm up for adopting it earlier than usual.
One example I can think of is javascript bundling and packaging. This would not impact production, but will have a pretty big impact on feature integration between team members and rate of completion. In MVC you need to hand type up the path of all your JS files and stick them into bundles. Not bad, not great either. Instead you could take your flavor of package management and have that bundle and minify your js files for you automatically.
I've been around government contracting and when you see problems that come up a lot, that we have industry standard solutions to, it's hard not to feel frustrated. I get where you're coming from though, just sharing my experience :)
More programmers need to embrace the suck.
I'd argue the opposite. Instead of spending time reflecting on how cool and useful their code is, or hardening it up, devs spend too much time reinventing the wheel. All this work to learn the next new fad is killing productivity.
Sometimes it can make you more productive. Or though your site is still responding to current customer demands in a timely fashion, you know that the mobile experience could be significantly improved now that browsing via cell phone is on the rise.
Another thing to consider is employability both from a company and individual perspective. If you can keep up with moderately current (not the latest and greatest) trends, you'll attract people who want to grow in their careers. I wouldn't want to work on C# 2.0 using Visual Source Safe. It's hard to convince a company that you can learn git on the job.
In general I like to move without introducing breaking changes. I'm not a cowboy coder, it's really exhausting working with one. I do think there's merit in realizing when it's time to change though.
Multiple platforms is not a problem and generally a good thing as long as it's not excessive. You don't want to be in a case where you have the same number of different platforms as developers or anything like that. I'm guessing there is a rule of thumb here, but I'm not sure what it would be. Max 1 different platform per 5 developers? Something like that.
I do wish people would stop conflating "running in a different service" and "loose coupling". They are completely orthogonal.
I've worked on some horrendously tightly coupled microservices.
Unless you can coax dOSGi into working (which is tons of fun), then you can have services tightly coupled to other services running on entirely different machines causing frequent (and hilarious) cascades of bundle failures whenever the network hiccups.
OSGi is a trigger word for me now. I've worked on two large OSGi projects (previous job and current job) and it's always the same. Sh*t is always broken (and my lead still insists that OSGi is the one true way to modular bliss). And the OSGi fanboys always say "Your team is using it wrong!" Which very well might be true, but I no longer care. Apparently it's just too damn hard to get a team of code monkeys to respect service boundaries when OSGi makes it so damn easy to ignore them.
If I'm ever in a position of getting to design a new software architecture (hasn't happened in 10 years, but hey I can dream), I'll punch anyone who suggests "OSGi" to me right in the face.
As such, I maintain SOAP should be gone for the good of the running system.
Look at Android: it's a closed project that they occasionally release some source code for. If Google decided to drop Android then it would be a critical blow, because there isn't much of a community around it.
It's "open", but it would almost certainly collapse entirely if Google decided to drop support for it. (There's no sign that they'll do that, but it's not a risk you take with a language like C.)
Anything you're not running locally just hits the shared infra.
Git doesn't really help with that. More granular deployments do, and if microservices help with more granular deployments, go for it.
That's your problem right here
CF is already a relatively successful business with hundreds of millions of dollars in annual revenues across a pile of companies. Kubernetes and Docker are small in comparison as "businesses" but of course the momentum there is surging in terms of both pure open source adoption and contribution. It's likely going to be a big market with a lot of choice like Deis, or plain k8s, or RedHat OpenShift, or IBM bluemix, Pivotal, Docker Datacenter, or Mesos/Marathon, etc. It's a bit of a market war brewing and that competition will make for better solutions.
What is great about open source vs. past tech "gold rushes" is that these experiments and feedback loops exist across communities that are otherwise competing and overlapping. Mesos adopted the Docker image format independently of the Docker runtime; Kubernetes introduced pods independently but also reused parts of Docker. Docker container networking and volumes are being used compatibly in the latest incubated CF releases. RedHat submitted a way to get CF style buildpacks working on K8S. Someone found a way to make CF run on Mesos; I could see a similar attempt on Kubernetes some day. It's a confusing and busy time but also an explosion of activity. And even if there is competition for dollars in the end among all these players that will lead to tension, the work is mainly out in the open.
Most of Cloud Foundry is built the way I like software to be built. Pair programming, TDD, small balanced teams, prioritising for user value.
That style of development is actually baked into the Cloud Foundry Foundation rules. Companies who join the Foundation are expected to send engineers to ramp up on developing in this style. And voting rights are based on the number of full-time engineers you have assigned to the effort.
The reason I mention all this is that I trust the way we build Cloud Foundry. We still get production bugs and oversights and mistakes. It's around 4 million lines of code that turns into a distributed system of ~50 different interacting processes. We built a fully-featured, robust PaaS, using containers, in about 3 years, starting from scratch.
Nobody outside Google had built a container platform of this level before. Nobody but Heroku had built a fully-featured PaaS of this level before. We are, to my knowledge, the first system to do both of these things. Certainly the first opensource one.
The reason you never hear about Cloud Foundry is because we've already built all the components other folks are trying to roll up into full PaaSes. "It just works, already" is a boring story.
But again, quite seriously: I am obviously very biased.
Please excuse me while I take an hour to digest that.
Right now the companies who provide the most engineering -- Pivotal and IBM -- are laser-focused on capturing enterprise dollars.
Which from a business perspective makes perfect sense. Pivotal's commercial distribution of Cloud Foundry (PivotalCF) holds the record for fastest-growing sales of any opensource product. Ever.
But there's not much effort on promoting to devs and startups (IBM are starting to do this more with BlueMix, which is their CF distribution). But it's early days.
So basically you'll usually find that I and a sprinkling of my colleagues show up in threads like these out of the goodness of our hearts and fondness for our work.
Obviously I have a financial interest. Pivotal makes money from PivotalCF, I work for Pivotal, and I'm on an options plan as well. So YMMV.
But I think Cloud Foundry is just ... way ahead, in terms of actually getting work done.
Edit: and since I'm musing aloud about business-y things, I should emphasise again that nothing I say is in an official capacity, consult your lawyer, financial planner and astrologer, etc etc.
Yep, it costs $20. Basically the cost of Chinese food for 2, to ensure guaranteed victory in learning the essentials of AWS' platform while having a guided tour on how to deploy a fault tolerant web app with Amazon ECS from start to finish.
You can definitely learn everything for free, but the value in a course is that you're getting a cohesive learning path that was carefully planned and tested. You get a system that you can apply to your own projects and plenty of source code to reference.
You're paying the $20 so you can avoid spending 6 months trying to figure out everything on your own while stringing together a bunch of half-assed blog posts and tutorials.
You pay the small fee for certainty and it's well worth it because your time (and sanity) is not infinite.
I see that you recently left Amazon after making 200k/year there (a comment you made 5 days ago). I can see why you don't like people promoting Amazon products, I fully understand.
Rows is a bad measure of "big" when it comes to data. A better measure is bytes, and more specifically bytes per field and how many fields each record has, as this gives a better indication of how the data will be written and potentially searched.
10 million rows of 5 integer values is pittance for any relational database worth using in production. 10 million rows of 250 text columns would be horrendous for a relational database.
But many times this happens because of wasted or bloated indexes that aren't useful. Or it happens when data types are picked incorrectly.
For example, I once worked on a database where the original developer used Decimal(23, 0) as a primary key. This was on MySQL and that ended up taking up 11 bytes per row, versus a Long which would have just been 8. In one table, maybe not so bad, but when you start putting those primary keys into foreign key relationships... we ended up with a 1 billion row table in MySQL that had 4 of these columns in it. That might make it "big data" by that definition but it's also just bad design.
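The 11-byte figure can be checked against MySQL's documented DECIMAL storage rule (4 bytes per full group of 9 digits, plus 0 to 4 bytes for the leftover digits). The sketch below is only that arithmetic; the row and column counts are the ones quoted above, and the total ignores indexes and row overhead.

```python
# Bytes needed for 0..9 leftover digits, per the MySQL manual's DECIMAL storage table.
LEFTOVER_BYTES = [0, 1, 1, 2, 2, 3, 3, 4, 4, 4]


def decimal_storage_bytes(precision: int, scale: int) -> int:
    """Storage for DECIMAL(precision, scale): integer and fractional parts are packed separately."""

    def part(digits: int) -> int:
        full_groups, leftover = divmod(digits, 9)
        return full_groups * 4 + LEFTOVER_BYTES[leftover]

    return part(precision - scale) + part(scale)


BIGINT_BYTES = 8
per_value_overhead = decimal_storage_bytes(23, 0) - BIGINT_BYTES  # 11 - 8 = 3 bytes

rows, key_columns = 1_000_000_000, 4
extra_bytes = per_value_overhead * rows * key_columns  # 12_000_000_000 bytes, about 12 GB
```

So the key choice alone costs roughly 12 GB of extra column data in that one table, before counting any index built on those columns.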
Another example in that same database was using text fields in MySQL for storing JSON. Since large text values in MySQL can be stored off-page, separately from the rest of the row, this meant that every table that had one (and we had several tables that housed multiple) ran into large IO and disk access issues.
"Big" data is probably a bad term to use these days because of how easy it is to accidentally create a large volume of data without actually needing a big data solution. Often it's not the business that needs it, it's the poorly implemented system that does.
But the real reason we talk about fitting in memory comes from the core of the issue: IO. Even a super large memory set could end up being slow if it's Postgres with a single-threaded reader that's scanning a 500 GB index. AWS offers up to 60 GB/s memory bandwidth and we'd need it for this index, since at that rate it would still take almost 10 seconds (500 / 60 ≈ 8.3) to warm up the indexes in the first place.
"SQL doesn't scale". It needs to be in Mongo or whatever NoSQL database is in right now. I have heard all sorts of nonsense regarding "big data" in the last few years.
One hard tech limit is that with 50k podcasts, 4million+ episodes, search definitely doesn't work well. Not just WP, but SQL itself. Hence Elasticsearch. I also plan to work on recommendations, etc. so will need probably to be exporting SQL data into other systems anyway for making the "people who liked this also liked this" kinda things.
Also I kinda lied about using the WP API--that's how I built the system initially (and will switch to it moving forward), but to import the first few million posts from the content of the feeds, I just used wp_insert_post against the DB of new entries that Elixir fetched (I posted the code I used here: http://wordpress.stackexchange.com/a/233786/30906).
I also plan to write the whole front-end in React (including server side rendering) so will have to figure out how to get that done. Would probably use the WP-API with a Node.js app in front of it, will look into hypernova from AirBNB. So probably more usage of WP API accessed by another service...
It doesn't sound like microservices are needed, just adding in the appropriate tech for the job.
Once these are doing anything other than rotating log files, can the system really be considered monolithic?
It's literally billed as "A fast and modern Python SOAP client". Python 2 and 3 compatible. Last commit was two weeks ago.
And going by the bugtracker, it's running into quite a few problems with almost-but-not-quite compliant servers/WSDL files, which is a real issue when you're trying to interface ass-old legacy APIs (we're talking "not upgraded since 2006"-old) made by $BigEnterprise. Maybe this time the project won't die before they work out all the little kinks.
> devs spend too much time reinventing the wheel
I'd argue the opposite. They spend too much time not reinventing the wheel. They strap factory made bicycle wheels onto a car and are surprised when the wheels break. They could benefit from spending more time trying to make a better wheel.
Do you have any suggestions for which 'better wheels' people should be looking at?
A thousand times yes. Distributed systems are hard.
> Debugging is more difficult since you now can no longer step through your program in a debugger but rather have an opaque network request that you can't step into.
Yes. Folks underestimate how difficult this can be.
In theory it should be possible to have tooling to fix this, but I've not seen it in practice.
> You can no longer use editor/IDE features like go to definition.
Not a problem with a good editor.
> Version control becomes harder if the different services are in different repositories.
No organisation should have more than one regular-use repo (special-use repos, of course, are special). Multiple repos are a smell.
I would modify this slightly. Larger organizations with independent teams may want to run on per-team repos. Conway's law is an observation about code structure but it sometimes also makes good practice for code organization. And of course, sometimes the smell is "this company is organized pathologically".
Another problem is that large monolithic repositories can be difficult to manage with currently available software. Git is no panacea and Perforce isn't either.
Flat out wrong for any organization with multiple products. Which, let's be honest, is most of them.
My personal take on it, at this point, is that much of our knowledge of how to manage projects (things like individual project repos, semantic versioning, et cetera) is centered on the open-source world of a million mostly-independent programmers. Things change when you work in larger organizations with multiple projects. You even start to revisit basic ideas like semantic versioning in favor of other techniques like using CI across your entire codebase.
Mind elaborating on this?
What editor are you thinking of that can jump from HTTP client API calls to the corresponding handler on the server?
Totally agree with everything else, but gotta completely disagree on this last point. Monorepos are a huge smell. If there's multiple parts of a repo that are deployed independently, they should be isolated from each other.
Why? Because you're fighting human nature, otherwise. It's totally reasonable to think that once you excise some code from a repo that it's no longer there, but when you have multiple projects all in one repo, different services will be on different versions of that repo, and your change may have changed semantics enough that interaction bugs across systems may occur.
You may think that you caught all of the services using the code you refactored in that shared library, but perhaps an intermediate dependency switched from using that shared library to not using it, and the service using that intermediate library hasn't been upgraded, yet?
When separately-deployable components are in separate repositories, and libraries are actual versioned libraries in separate repositories these relationships are explicit instead of implicit. Explicit can be `grep`ed, implicit cannot, so with the multi-repo approach you can write tools to verify that all services currently in production are no longer using an older, insecure shared library, or find out exactly which services are talking to which services by the IDLs they list as dependencies.
While with the monorepo approach you can get "fun" things like service A inspecting the source code of service B to determine if cache should be rebuilt (because who would forget to deploy service A and service B at the same time, anyways...), as an example I have personally experienced.
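As a rough illustration of the multi-repo tooling described above, here is a sketch that scans per-service dependency manifests for an outdated shared library. It assumes each service pins dependencies as `name==version` lines; the manifest format, the library name and the function names are all hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Turn '1.4.0' into (1, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def services_using_old_library(manifests: dict, library: str, minimum: str) -> list:
    """Return the services whose manifest pins `library` below `minimum`.

    `manifests` maps a service name to the text of its dependency file.
    """
    offenders = []
    for service, manifest in sorted(manifests.items()):
        for line in manifest.splitlines():
            name, sep, version = line.strip().partition("==")
            if sep and name == library and parse_version(version) < parse_version(minimum):
                offenders.append(service)
    return offenders
```

For example, with `{"billing": "sharedlib==1.2.0", "search": "sharedlib==1.4.1"}` and a minimum of `"1.4.0"`, only `billing` is reported. The check works because each dependency is written down per service, which is the "explicit" property being argued for.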
My personal belief is that the monorepo approach was a solution back when DVCSs were all terrible and most people were still on centralized VCSs like Subversion that couldn't deal with branches and cross-repo dependencies well, and that's just what you had to do, while Git and Mercurial, along with the nice language-level package managers, make this a non-issue.
Finally, there's an institutional bias to not rock the boat (which I totally agree with) and change things that are already working fine, along with a "nobody got fired buying IBM" kind of thing with Google and Facebook being two prominent companies using monorepos (which they can get away with by having over a thousand engineers each to manage the infrastructure and build/rebuild their own VCSs to deal with the problems inherent to monorepos that most companies don't have the resources and/or skills to replicate).
EDIT: Oh, I forgot, I'm not advocating a service-oriented architecture as the only way to do things, I'm just advocating that whatever your architecture, you should isolate the deployables from each other and make all dependencies between them explicit, so you can more easily write tooling to automatically catch bad deploy states, and more easily train new hires on what talks to/uses what, since it's explicitly (and required to be) documented.
If that still means a monorepo for your company's single service and a couple of tiny repos for small libraries you open source, that's fine. If it means 1000 repos for each microservice you deploy multiple times a day, that's also fine (good luck!).
Most likely it means something like 3-10 repos for most companies, which seems like the right range for Miller's Law ( https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus... ) and therefore good for organizing code for human consumption.
But having multiple repos doesn't prevent the equivalent situation from happening (and, I think, actually makes it much likelier): no matter what, you have to have the right processes in place to catch that sort of issue.
> You may think that you caught all of the services using the code you refactored in that shared library, but perhaps an intermediate dependency switched from using that shared library to not using it, and the service using that intermediate library hasn't been upgraded, yet?
That's the sort of problem which happens with multiple repos, but not (as often) with a single repo.
> Explicit can be `grep`ed, implicit cannot, so with the multi-repo approach you can write tools to verify that all services currently in production are no longer using an older, insecure shared library, or find out exactly which services are talking to which services by the IDLs they list as dependencies.
A monorepo is explicit, too, even more explicit than multiple repos: WYSIWYG. And you can always see if your services are using the same API by compiling them (with a statically-typed language, anyway).
The beautiful thing about a monorepo is it forces one to confront incompatibilities when they happen, not at some unknown point down the road, when no-one knows what changed and why.
If you expect to need to step into a function call when debugging, then it's too tightly coupled to spin out. You should be able to look at the arguments to the call and the response and determine if it's correct (and if not, now you have isolated a test case to take to the other service and continue debugging there).
If the interface will change so often that you expect it will be a problem that it's in a separate repository, if you expect that you will always need to deploy in tandem, then it's too tightly coupled to spin out.
The advantage of micro services is the separation in fact of things that are separate in logic. The complexity of systems grows super-linearly, so it's easier to reason about and test several smaller systems with clear (narrow) interfaces between them than one big one. It's easier to isolate faults. It's harder to accidentally introduce bugs in a different part of the system when the system doesn't have a different part. If done right, scaling can be made easier. But these are hard architectural questions; there's no clear-cut rule for when you should spin off a new service and when you should keep things together.
Someone else mentioned separating the shopping app from the payment system for an ecommerce business, which even has security benefits. I think that's an excellent example.
Edit: Another clear benefit is that you can choose different languages, libraries, frameworks and paradigms for different parts of the code. You can write your boring CRUD backend admin app in Ruby on Rails, your high-performance calculation engine in Rust and your user-facing app in Node.js (so the front- and backend can share JavaScript validation code).
As for advantages, microservices tend to keep code relatively simple and free from complex inheritance schemes. There's rarely a massive tangled-up engine full of special cases in the mix, as there often is in monolithic apps. This substantially decreases technical debt and learning curve, and can make it simple to understand the function an isolated microservice performs.
There is the obvious advantage that if you have disparate applications executing nearly-identical logic to read or write data to the same location, and the application platforms can't execute the same library code, you can centralize that logic into an HTTP API, which reduces maintenance burden and prevents potentially major bugs.
My opinion is that adopting microservices as a paradigm leads to a slow, difficult-to-debug application, primarily because people take the "micro" in microservices too seriously. One shouldn't be afraid to split functionality out into an ordinary service after it's been shown to be reasonable to do so.
With microservices, the production version of their service would conceivably be stable. It moves the contract from the repo to the state of production services.
With a monolithic repo done right, the other teams broke their build of their branch, and it's up to them to resolve it. You, meanwhile, are perfectly happy working on your branch. When their changes are mergeable into trunk, then they may merge them, not before — and likewise for you.
With multiple repos, they break your build, but don't know it. You don't know it either, until you update your copies of their repos — and now you have to figure out what they did, and why, and how to update your logic to handle their new control flow, and then you update again and get to do it again, until finally you ragequit and go live in a log cabin with neither electricity nor running water.
I don't see how this is a problem if you are pushing frequently and have a CI system. You know within minutes if the build is broken. If it broke, don't pull the project with the breaking changes.
My point is, I don't think one approach is inherently better than the other. Both require effort on the part of the teams to manage changes (or a CM team), and both require defined processes.
Yes, that's the contract that you need to have with other teams. And it's the contract that is automatically enforced with microservices.
You don't debug distributed systems by tracing into remote calls and jumping into remote code. You debug it by comparing requests and responses (you use discrete operations, right) with the specified requests and responses, and then opening the code that has a problem¹.
It calls for completely different tooling, not for a "better debugger".
1 - Or the specs, because yes, now that your system is distributed you also have to debug the specs. Why would somebody decide to do that for no reason at all? Yet lots of people do.
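As a rough illustration of debugging by comparing a captured response with its specification rather than stepping into remote code, here is a minimal sketch. The spec format (field name mapped to expected Python type) and the field names are made up for the example; a real system would more likely use its IDL or a schema language.

```python
def check_against_spec(payload, spec):
    """Return a list of mismatches between a decoded message and its spec.

    `spec` maps field name -> expected Python type (hypothetical format).
    An empty list means the message matches the spec.
    """
    problems = []
    for field, expected_type in spec.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            actual = type(payload[field]).__name__
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {actual}")
    for field in payload:
        if field not in spec:
            problems.append(f"unexpected field: {field}")
    return problems


# Hypothetical response captured from a payment service.
ORDER_SPEC = {"order_id": int, "total_cents": int}
captured = {"order_id": 7, "total_cents": "1999"}
print(check_against_spec(captured, ORDER_SPEC))
# prints ['total_cents: expected int, got str']
```

A non-empty result is the isolated test case to hand to the owning service; an empty one points back at the caller, or at the spec itself.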
I generally like reading on anything Lisp-related, as this family of languages is still pretty much direct heritage of the golden ages.
The stuff done by Alan Kay, et al. over at PARC is also quite insightful.
Bwuh? Over in MS SQL you just go for an NVARCHAR and forget about it. What is the right way to store this data (if you really do need to store the JSON rather than just serializing it again when you get it out of the DB)?
It stores text fields as blobs.
I suppose now the right way would be the JSON data type. It didn't exist when I was working with these servers though (or they were on a much older version of MySQL): https://dev.mysql.com/doc/refman/5.7/en/json.html
I agree with the overall sentiment of your comment, but the quoted part is where I've seen trouble brew. The tendency is to be conservative about pulling updates to dependencies, which can easily get you into a very awkward state when a critical update eventually sits on top of a bunch of updates you didn't take because they broke you. It is usually better to be forced to handle the breakage immediately, one way or another.
from subprocess import Popen, PIPE
p1 = Popen(["foo"], stdout=PIPE)
p2 = Popen(["bar"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output, _ = p2.communicate()
https://docs.python.org/3.5/library/subprocess.html#replacin...
When you're not writing shell, just use the tools the language gives you.
For that matter, I think a shell script is cleaner than a Python script for devops; but I don't think the composability of Unix tools is that much of an advantage compared to the number of Python libraries out there.
I'm likely to be using Python programs and other programs in those shell scripts. The beauty of shell is that it makes it so easy to compose programs written in different languages.
I think the advantages of using a single language for everything outweigh the disadvantages - see e.g. http://www.teamten.com/lawrence/writings/java-for-everything... (though actually my single language is Scala)
I'm curious because these simple things that are delightfully easy in bash often turn out to be surprisingly tedious in other languages.
Of course, some things are tedious in bash too. But a basic principle of shell scripting is that you call other programs to do the stuff you don't want to do in shell.
I agree it is tedious, but to be honest, reading and writing to stdin/out isn't something that would commonly need to be done in a robust system. If the world were perfect you would use library functions.
I definitely think there is scope for a language that works well as an interactive shell, and as a general purpose language. They have somewhat conflicting constraints but I'm sure we can do better than Bash. Have you seen how [ is implemented?
Monorepos come with their own challenges. For example, if any of your code is open source (which means it must be hosted separately, e.g. on Github), you have to sync the open-source version with your private monorepo version.
Monorepos are large. Having to pull and rebase against unrelated changes on every sync puts an onerous burden on devs. When you're remote and on the road, bandwidth can block your ability to even pull.
And if you're going to do it like Google, you'll vendor everything -- absolutely everything (Go packages, Java libraries, NPM modules, C++ libraries) -- which requires a whole tool chain to be built to handle syncing with upstream, as well as a rigid workflow to prevent your private, vendored fork from drifting away from upstream.
There are benefits to both approaches. There is no "one right way".
I love Git, and I used submodules for years in personal projects. It started with a few support libraries shared between projects, or common scripts for deployment, but it quickly ballooned into a mess. I'm in the process of moving related personal projects to a monolithic repository, and in the process I'm giving up the ability to tag versions of individual projects or provide simple GitHub links to share my code.
Based on these experiences, I honestly think that the only major problem with monolithic repositories is that the software isn't good at handling it, and this problem could be solved with better software. If the problem is solved at some point in the future, I don't think the answer will look much like any of the existing VCSs.
Based on experiences in industry, my observation is that the choice of monolithic repository versus separate repository is highly specific to the organization.
You expose your app's network APIs (e.g. open ports), filesystem mounts, variables (12 factors), etc.
Your application becomes a block that you can assemble for a particular deployment; add some environment variables, connect a volume with a particular driver to a different storage backend, connect with an overlay to be able to talk to other containers privately across different servers or even DCs, etc.
It's really all about layers of abstraction for operating an application and deploying it to different environments.
With the latest container orchestration tools, you can have a catalog of application templates defined simply in YAML and it's very easy to make it run anywhere. Add some autoscaling and rolling upgrades and it becomes magic for ops (not perfect yet, but check out the latest Kubernetes to see new advancements in this space).
With the proper tools and processes, this removes a lot of complexity.
But environment variables already exist without docker. Volumes already exist, aka partitions. "Overlay network" already exists, aka unix sockets or plain TCP/UDP/etc over the loopback interface.
I'm not trying to be a dick here, it's just that the points you brought up don't really bring anything new to the table. How is this different from just having a couple bare-metal or virtual machines behind a proxy?
There are some aspects to containerization that are very feasible, but only at certain scales, and the points you brought up make me question whether you perhaps might be over-engineering things a bit.
For example, volumes: With Kubernetes (on Docker), the lifetime of the volume mount is handled for you. No other containers have access to the mount. Container dies, mount dies. Whereas on plain Linux, mounts stay. You need cleanup, or you need to statically bind apps to their machines, which will seriously limit your ability to launch new machines -- there will be a lot of state associated with the bootstrapping of each node. Statefulness is the enemy of deployment, so really what you want is some networked block storage (EBS on AWS, for example) plus an automatic mount/unmount controller, thereby decoupling the app from the machine and allowing the app to run anywhere.
Environment vars are inherited and follow the process tree, so those are solved by Linux itself.
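A small sketch of that inheritance, with no container involved: a variable set for a child process is visible to a grandchild that the child starts. `APP_MODE` is a made-up variable name used only for this illustration.

```python
import os
import subprocess
import sys

# Code run by the grandchild: report the variable it inherited.
GRANDCHILD = "import os; print(os.environ.get('APP_MODE', 'unset'))"

# Code run by the child: start the grandchild without touching the environment.
CHILD = (
    "import subprocess, sys; "
    "subprocess.run([sys.executable, '-c', %r], check=True)" % GRANDCHILD
)


def read_in_grandchild(mode):
    """Set APP_MODE for a child process and return what its own child sees."""
    env = dict(os.environ, APP_MODE=mode)  # copy of our environment plus one variable
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `read_in_grandchild("staging")` returns `"staging"`: the value travels two levels down the process tree without either process passing it along explicitly.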
Process trees also handle "nesting": Parent dies, children die. But you will end up in a situation where a child process might spawn a child process that detaches. This is particularly hard to fix when a parent terminates, because the child doesn't want to be killed. Now you have orphaned process trees. The Linux solution is called cgroups, which allows you to associate process trees with groups, which children cannot escape from. So you use cgroups, and write state management code to clean up an app's processes.
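To make the cleanup problem concrete, here is a minimal POSIX-only sketch that uses plain process groups, not cgroups. It is a weaker stand-in: signalling the group reaches background children that stayed in it, but a child that calls `setsid()` itself leaves the group and survives, which is exactly the escape that cgroups close off. The helper names are hypothetical.

```python
import os
import signal
import subprocess


def run_in_own_group(shell_command):
    """Start `shell_command` under sh in a new session and process group.

    With start_new_session=True the child calls setsid(), so its process
    group id equals its pid and every process it forks joins that group
    unless it deliberately detaches.
    """
    return subprocess.Popen(["sh", "-c", shell_command], start_new_session=True)


def kill_group(proc, sig=signal.SIGTERM):
    """Signal every process still in proc's group, then reap proc itself."""
    try:
        os.killpg(proc.pid, sig)
    except ProcessLookupError:
        pass  # the whole group has already exited
    proc.wait()
```

For example, `kill_group(run_in_own_group("sleep 30 & sleep 30"))` terminates the shell and both `sleep` processes, where `proc.terminate()` alone would signal only the shell.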
I could go on, but in short: You want the things that containerization gives you. It might not be Docker, although any attempt to fulfill the principles of containerization will eventually resemble Docker.
You now have generic interfaces (Dockerfile, docker-compose, Kubernetes/Rancher templates, etc.) to define your app and how to tie it together with the infrastructure.
Having these declarative definitions makes it easy to link your app with different SDN or SDS solutions.
For example, RexRay for the storage backend abstraction of your container:
http://rexray.readthedocs.io/en/stable/
You can have the same app connected to either ScaleIO in your enterprise or EBS as storage.
We are closer than ever to true hybrid cloud apps and it's now much easier to streamline the development process from your workstation to production.
I think it's pretty exciting :)
What networking problems does Docker solve?
Your program doesn't see what else is running on the system. That also removes possible conflicts over shared libraries and other system-wide dependencies.
This kind of isolation is not only good for app bundling as a developer, but even more important as an operator in a multi-tenant scenario. You throw in containers and they don't step on each other's toes. Plus, the system stays clean and it's easy to move things around.
Network namespace as in linux network namespace (http://man7.org/linux/man-pages/man8/ip-netns.8.html).
Each container has its own IP stack.
Containers provide proper abstractions so you can then assemble all of this, pretty much like you use pipes on a unix shell.
Something like kubernetes also lets you abstract away the lock-in of your cloud infrastructure, so whilst it adds another layer and a bit of complexity, it again is arguably worth the effort if you're worried about needing to migrate away from your current target for some reason in the future.
As a framework it abstracts apps from infrastructure quite well. It's super easy for me to replace my log shipping container in kubernetes and have most things continue to work, as all the apps have a uniform interface.
Nobody's saying you can't build these things without kubernetes, but it definitely gives me more of these things than configuration management systems currently do. Personally, I'd rather aim at the framework that handles more of what I need it to do.
Finally, bootstrapping a kubernetes cluster is actually quite trivial and you can get one off the shelf in GKE, so I'm not really sure why I'd personally want to go another route.
This is not revolutionary in itself, but I very much appreciate having the creation and deployment of a server be 100% replicable (and fast and easy!) across dev, preproduction, and production environments, and managed with my usual versioning tool.
Sure, there are other tools to do the same, but docker does the job just fine.
The problem of ensuring that upstream dependencies can be reproducibly installed and/or built is, of course, left as an exercise for the reader.
Yeah, a nicer shell-like language would be cool. I've been thinking about it for a while.
Bash is quirky but it gets a lot of stuff right and once you understand it it can be extremely ergonomic and productive.
And not depending on language run times other than shell can be really glorious in some situations, too...
This sounds exactly like the "It's the future!" guy in the original post...
The kinds of scripts I write would be awkward to have as compiled JVM programs, I think. Shell is just way more ergonomic for me for many tasks.
Data can be meaningfully separated from control and structure in many cases, and failure to do that is a major (perhaps the major) source of security bugs.
Shell doesn't enforce any particular interpretation of data. Pipelines simply connect one program's output to another's input. Interpretation is up to the programs.
If these were the only standard protocols that existed, and people were trying to tell me this was great because it's easy to compose different network applications, that criticism would be completely valid.
>Interpretation is up to the programs.
But because there are no standards beyond "stream of bytes", the chance that two independently written programs working with non-stream-like data can communicate directly is extremely low.
But the shell language itself is one of these legacy non-standardized formats. Arcane escaping rules, multiple incompatible implementations, surprising ways things get interpreted as code (e.g. the recent bash CGI bug),...