Ask HN: Best books on managing software complexity? What are the best books you've read on how to manage software complexity, from both an architectural and an organizational perspective?
- Mythical Man-Month (the classic, a little outdated but the principles still ring very true)
- Software Engineering at Google (thoughts from a company that has spent a lot of time thinking about this at scale)
1. Counter-intuitively, being deeply interested and reading about a topic in natural science. Software is complex because it is a system, and especially if you're using a programming language with plenty of OOP features, you're going to benefit a lot from understanding how nature separates properties and functions into organic matter, how they work in concert to achieve a particular effect, and how they compose each other to scale up from the atomic to the cosmic scale; or
2. If you can't be bothered to learn about topics beyond programming itself, then practice unit testing. As a matter of fact, attempt to write unit tests for your old code. If you can't make the unit test run, then that's life slapping you in the face to tell you that your code is way too tightly coupled.
By the end of the day, doing any of the above things (but more so, both!) will teach you the true essence of KISS.
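A minimal sketch of the unit-testing point above, with made-up names (`overdue_invoices_coupled`, `overdue_invoices`, `prod.db` are all hypothetical). The first function opens its own database and reads the system clock, so a test cannot run it in isolation. The second takes its inputs as arguments, and the test becomes trivial.

```python
import sqlite3
from datetime import date


def overdue_invoices_coupled():
    # Tightly coupled: opens its own database and reads the system clock,
    # so a unit test cannot control either input.
    conn = sqlite3.connect("prod.db")  # hypothetical file name
    rows = conn.execute("SELECT id, due FROM invoices").fetchall()
    return [i for i, due in rows if date.fromisoformat(due) < date.today()]


def overdue_invoices(invoices, today):
    # Decoupled: the caller supplies both the data and the date.
    return [i for i, due in invoices if due < today]


def test_overdue_invoices():
    invoices = [(1, date(2024, 1, 1)), (2, date(2024, 3, 1))]
    assert overdue_invoices(invoices, today=date(2024, 2, 1)) == [1]


test_overdue_invoices()
```

If writing the test for the first version feels impossible without a real database, that is the "slap in the face" the comment describes.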
I can imagine that for others it might be just straight-up neuroscience, or something like zoology if you like pets, or physics. The idea is to try to see these natural phenomena as running code, and to think of how nature designed and wrote the code to make it all work. I can see how the approach sounds absurd, but maybe video game developers would be able to relate to what I’m saying since (I assume) they do a lot of world-building and have to make simulations of real-world objects.
https://blog.cleancoder.com/uncle-bob/2012/08/13/the-clean-a...
It takes an empirical view of the software process. Does writing tests first help you develop better code faster? Can code metrics predict the number of bugs in a piece of software? Do design patterns actually make better software?
Scott Wlaschin - Domain Modeling Made Functional: Tackle Software Complexity with Domain-Driven Design and F#
I also bought the Scott Wlaschin book, but I must say I like Eric Normand's book a little bit better. Since the examples are also in JavaScript (but applicable to a lot of languages), I think it's easier to follow for everyone who's coming from an OOP background.
I'm also looking forward to Yehonathan Sharvit - Data-Oriented Programming: Unlearning objects.
These are quite different missions since it's rarely possible to recover from complexity if you've already gone there.
The essay at the beginning of the book gave me a profound epiphany back then.
More speculatively, I think that the following biology readings are inspiring:
- Gerald Jay Sussman, Building Robust Systems (an essay). In general, biology should be a source of inspiration for engineers.
- E. Goldratt, The Goal
David Marquet's books have been surprisingly helpful to me in shaping simpler organisations, starting, of course, with Turn the Ship Around.
It's very unorthodox, but cuts through a lot of the bullshit.
Hope this is useful to you.
Classics
But looking at something personal and concrete like exercise makes total sense, thanks!
Although its ideas are heavily couched in OO terminology, it's changed my ideas about interfaces and exposure to complexity utterly.
Not comprehensive, but every programmer should absolutely read it (it's a tiny book after all).
But I thought it was very uneven in value.
The parts about deep and shallow modules, what good interfaces are, _why_ keeping everything about their implementation hidden is so important -- those were very good.
He also advocated against having small routines, which are almost dogma nowadays (mostly because they're easier to unit test), and he's pretty convincing. That would mean that "making your code easier to test also improves its design" is wrong.
Other parts about documentation etc I thought were too long and not very interesting. Then the bit at the end about optimizing for execution speed was very interesting again, though perhaps because I had never read much on the subject, before, I don't know.
Yes, read it, it's an interesting contribution.
It's refreshing that they take a first principles approach in managing complexity.
Automation is great when it replaces a stable, well-working manual process.
The way to introduce automation is to first experiment with humans doing something manually until you have a great process. Then take the dumbest, most reliable part of that and turn it into a computer program that alerts a human if it encounters an edge case.
Try not to handle edge cases in the software, but try to remove them from the greater system in which they occur.
Automate by elimination, not by complexity.
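A rough sketch of the approach described above: automate only the dull, reliable slice and hand anything unusual to a person instead of coding around it. The `Order` type, the country code, and the quantity threshold are invented for illustration and do not come from the comment.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    quantity: int
    country: str


def is_routine(order):
    # The "dumbest, most reliable" slice: small domestic orders.
    # Both conditions are made-up thresholds for this sketch.
    return order.country == "SE" and 0 < order.quantity <= 100


def process_orders(orders, notify_human):
    """Handle routine orders; pass everything else to a person."""
    handled = []
    for order in orders:
        if is_routine(order):
            handled.append(order.order_id)
        else:
            notify_human(order)  # no attempt to handle the edge case in code
    return handled


escalated = []
done = process_orders(
    [Order("A1", 10, "SE"), Order("A2", 5000, "SE"), Order("A3", 3, "US")],
    notify_human=escalated.append,
)
print(done)                             # ['A1']
print([o.order_id for o in escalated])  # ['A2', 'A3']
```

The list of escalated orders doubles as a record of which edge cases occur often enough to be worth removing from the wider process.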
Good books in this vein would be anything with Taiichi Ohno in the title, or, perhaps better, as the author.
Standard work is a great source of things to automate, but automation is not a replacement for it.
He has clearly been reading up on The Toyota Way and one aspect of it is that they look to understand well a process before automating it.
I have found that a lot of complexity comes from mindlessly automating processes: you end up with systems integrations and data conversions that add a lot of complexity and are often brittle too.
Wait, what?
My automation works for years without an issue. Flaky automation is meaningless.
> that alerts a human if it encounters an edge case.
No. If it's good, it should almost never alert anybody. When you have 300 automation routines, even if each one alerts only once a year you will have some alert every day. So it needs to alert less often than that, when the world falls apart or something...
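The arithmetic behind the 300-routines remark can be checked in a few lines. The Poisson model (alerts arriving independently at a constant rate) is my assumption, not something the comment states.

```python
import math

routines = 300
alerts_per_routine_per_year = 1.0

# Expected number of alerts across all routines on an average day.
alerts_per_day = routines * alerts_per_routine_per_year / 365

# Assumption: alerts are independent, so the daily count is Poisson-distributed.
p_at_least_one_alert_today = 1 - math.exp(-alerts_per_day)

print(f"{alerts_per_day:.2f} expected alerts per day")        # 0.82
print(f"{p_at_least_one_alert_today:.0%} chance of an alert")  # 56%
```

Under these numbers it is closer to "an alert on most days" than literally every day, but the point stands: a per-routine rate that sounds negligible adds up quickly across hundreds of routines.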
A lot of the things we build into our software is complexity nobody really needs. One way of managing it is by never making it in the first place. This takes at least two things:
- Thorough discussion with the customer about what problem they really are trying to solve. They will come to you with what they think is the solution. That way leads to unnecessary complexity.
- Rough economic analysis of choices. Often we do things because that's what one ought to do. Make a quick napkin calculation. A lot of the things one ought to do are not actually necessary.
There are way too many books to recommend here, but perhaps some of the most relevant are
- Deming's The New Economics, and Out of the Crisis,
- Reinertsen's Principles of Product Development Flow,
- Ward's Lean Product and Process Development.
From my perspective, the top "must-reads" are:
-- John K Ousterhout, A Philosophy of Software Design
-- Titus Winters (et al), Software Engineering at Google
-- Hanson and Sussman, Software Design for Flexibility
Other interesting titles that caught my eye:
-- Peter Naur, Programming as Theory Building
-- Scott Wlaschin, Domain Modeling Made Functional
-- Yehonathan Sharvit, Data-Oriented Programming: Unlearning objects (upcoming)
-- Nick Tune, Patterns, Principles, and Practices of Domain-Driven Design
-- Robert L. Glass, Facts and Fallacies of Software Engineering
-- Donald Reinertsen, The Principles of Product Development Flow
-- Eric Normand, Grokking Simplicity
1. Figure out the difference between your inherent and accidental complexity. The Cynefin framework can be a useful way of thinking about what types of complexity you have.
2. Attempt to reduce (or remove) the complexity at root before assuming you must model it. "The biggest tragedy in business is doing well that which should not be done at all." For example, we had an important user still on a legacy browser because they couldn't afford to upgrade their old laptop. Instead of adding a complicated legacy layer, we bought them a new laptop.
3. Think of the whole system, rather than just shifting the burden somewhere else within the system. This is why microservice architectures so often fail - engineers think they've reduced complexity because each service is so simple. Yet all they've done is move the complexity into the operational layer, where it's harder to deal with. Gerald Weinberg is a fun writer in this space.
4. In many organisations, a lot of people's salaries depend on their ability to add complexity, especially to processes but also to code. If you are working within such an organisation, you will have a Sisyphean task until you solve the organisational problem. Whether you can solve this problem depends on how much influence you have. This is why "skunkworks" projects so often work until they are re-integrated into the mainstream.
5. Once you've sorted out the organisational complexity, separated the inherent and accidental complexity, you'll be most of the way there. Keep reviewing and challenging everything. Complexity is a weed that just seems to grow unless you constantly tend to the garden.
I'm curious about the specifics regarding the combination of "important user" and "couldn't afford to upgrade".
Was the user's importance social rather than financial? Were the hardware requirements particularly hefty?
The entire focus of the book is reducing complexity.
[1]: https://htdp.org/
https://books.google.de/books/about/The_Principles_of_Produc...
It motivates small batch sizes, decision making at ground level, short cycle times, all the mechanics behind kanban, scrum, XP, without ever using any of their jargon.
And it moves up into economics by explaining e.g. cost of delay as a metric guiding useful decision making.
The book establishes a wonderful shared understanding, between engineering and management, of the principles underlying lean and agile: why they work and how to make them work.
Clean Architecture: A Craftsman’s Guide to Software Structure and Design https://www.oreilly.com/library/view/clean-architecture-a/97...
Organizational perspective:
An Elegant Puzzle: Systems of Engineering Management by Will Larson https://press.stripe.com/an-elegant-puzzle
There were some things that I did find reason to take notes on, such as:
- Each area of responsibility should have a team dedicated to it, even if some teams end up having zero people in them, and sometimes members from other teams have to rotate into that function. I liked this because I know how easy it is to forget that some functions are understaffed if you squeeze them in with other functions.
- If you want to improve how the larger organisation works, run the improvement as an experiment in your team and then publish a very brief report on the results along with instructions for how other teams can try it. I liked this because I've often done the first part, but I've often forgotten the publishing part.
- A leader is asked to decide a million things a day. Your job is not to make those decisions, your job is to figure out a system in which the decision is not needed, or the right decision is clear for anyone to see, and then help others participate in that system. I knew this from before, but it bears repeating because it's so easy to forget in the heat of decision-making.
(The last point has a more general corollary: your job is never to do what you were hired for, it's to teach others how to do what you were hired for.)
Organization is just a name for a group of methodologies. Ultimately what we want is the right way to organize programs for maximum flexibility.
> So you really take the attitude that, boy if I’m writing something complex, I need to think about what I’m doing as if I’m writing a language. Or just say that a little less onerously. When I think about the operations I want to produce in any kind of program I’m writing, I don’t want to think about only that particular problem I’m doing right then. I want to carve out a space that’s broad enough that I could be doing that particular problem and lots of variations and lots of similar ones.
If you want to skip to the good part, Chapter 9 is where you want to be.
We were able to create a hybrid FRP system that uses SQLite & application-defined functions as the sandbox within which all of this hypothetical functional/relational business occurs. This is in production for several of our customers right now.
I would add Frederick Brooks' No Silver Bullet alongside this for better understanding.
For software-specific issues I highly recommend David Parnas' collected papers in the book Software Fundamentals. He and his colleagues defined much of what is mainstream in today's software architecture and organization practice.
Finally, the case studies given here are a great source of insights: https://aosabook.org/en/index.html
The Phoenix Project. The DevOps Handbook. The Google SRE Book. And “Kill It With Fire.”
You want to minimize side-effects, preferring computations over them, and data over computations.
I’m still early into it, however, this already has been a helpful lens to view my code through.
You can also google up a fair amount of related blog text, and video talks content.
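As a small illustration of the "data over computations, computations over side effects" ordering mentioned above, here is a sketch with invented names (`PRICES`, `order_total`, `print_receipt`). The only side effect is pushed to the edge, and everything else is either plain data or a pure function.

```python
# Data: plain facts, no behaviour.
PRICES = {"coffee": 3.0, "bagel": 2.5}


def order_total(items, prices):
    # Computation: same inputs always give the same output, no side effects.
    return sum(prices[item] for item in items)


def print_receipt(items, prices, out=print):
    # Side effect: the only part that touches the outside world.
    out(f"Total: {order_total(items, prices):.2f}")


print_receipt(["coffee", "bagel"], PRICES)  # Total: 5.50
```

Because `out` is a parameter, even the side-effecting part can be exercised in a test by passing `some_list.append` instead of `print`.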
https://www.amazon.co.uk/Pragmatic-Programmer-Andrew-Hunt/dp...
Stuff on lean manufacturing (yes. manufacturing!) https://www.amazon.de/dp/B000SEGIVS?ref_=cm_sw_r_kb_dp_ocvTD...
https://www.amazon.de/dp/B002LHRM2O?ref_=cm_sw_r_kb_dp_I9SSD...
We have no web stack, no authentication required. No complex UI or UX.
It's so simple, our production manager and I sat down and came up with the system. It works very well and took a few weeks to get up and running. It would have taken months to solve the same problems with a typical web based interactive system. It's also less work and upskilling for users.
Same with my search for a to-do app. Eventually I realised my file explorer was already great at filing, so I just add txt files and so on to organise my projects. Naming is key if you want it to work well though.
I use Joplin for some more complex things, like keeping track of certain project features and progress.
Most of the time we need to convince some customer that we can implement what they need, because even though everything is already there, "they are super special".
If you start by saying no, you won't get the sale. Our sales people of course try to explain that we are a quite successful SaaS solution, so the customer could probably work the way we propose and not have to pay for custom development.
Mind you, most of the time it is not purely technical. Often they are checking whether they can get what they want, "testing the waters" so to speak, to see if we are a responsible vendor.
Sometimes it is just an asshole manager at a big company asserting dominance, to see how far the vendor will bend over to win the contract.
The good part is that most of the time we simply get paid for "custom development". The downside is that when such a customer goes away a year or two later, the code stays and is not that useful for other customers.
So even if those customers pay quite a lot for custom dev, they externalize the cost of software maintenance to the SaaS vendor. I don't believe anyone factors that into the price: even if the customer pays monthly fees for 2 years, once they leave we are left with dead code, and even removing it costs money because you still have to do regression testing.
That takes said napkin calculation, and thinking a bit like an insurance business. You can't know for sure whether you'll make or lose money on a particular customer, but you can learn to work with expectations so that your bets in aggregate make you money.
Interesting recommendations, u/kqr. I don’t know the last two.
> keeping the business logic as far in user configuration land as possible
This sounds dreadfully difficult to build and test. The kind of business rules engine that you’re describing is more likely to be overpriced and buggy than a simpler well-modeled set of core domain classes imo.
Not every piece of software should be excel.
They had a project in planning for 18 months but hadn't been able to start it. I was hired as a principal and given this as one of my major impacts to deliver, though other things kept getting piled on in front of it. To actually get it done, one weekend I sat down and cranked out a functioning POC that could easily and rapidly be extended into the fully working system.
Come Monday they were blown away to see the major progress of an actual working system, and loved the system and were very happy about it. Then the bike-shedding began: an order of magnitude more person-hours was spent talking about it than it took me to build it, and they required rewriting it because they considered a data library I used 'old tech and uncool'.
Hilariously that data library was actually a DDD system that created bounded contexts and enforced a domain model in them, but due to their lack of understanding they didn't see that! The hilarious irony!
I wasn't a DDD guy, though having been a high level architect for 15 years I was using all the things I have found that worked best. Some time later I was reading about DDD and realised that system was actually DDD. Of course by that time I had already left, to deliver working systems instead of waste time bike-shedding with cargo culters!
The Evans book is sort of written backwards, in that it starts with a bunch of powerful tactical tools and people stop reading halfway through which means now all they have is a hammer.
The PPPoDDD book instead starts out with reasoning about when DDD is appropriate and when it's not, to dive into the tactical patterns last, once the other options are eliminated. I think that approach makes much more sense.
After all, most of the skill in a technique consists in knowing when to apply that technique, and, critically, when not to.
The PPPoDDD book also places a heavy emphasis on having productive conversations with businesspeople as a design tool. I think Evans accidentally underemphasises that too.
This means various parts of our manufacturing process happen over different plants and different teams. Everyone needs to be in sync and working on the right thing. Most of the manufacturing is automated. Machine instructions are generated as orders come in. The emails send supervisors lists of what has to be worked on in that shift and department. Also they get checklists for QC, and packing instructions.
All data entry is done by scanning stickers that are generated with machine instructions and printed at manufacturing time. There is a little bit of extra info typed in for dispatch and off cut management.
It's all fed from an off-the-shelf ERP. Most would heavily customize the ERP. We simply hooked a Jasper reports server up to the db and built some custom reports. We do have some code feeding data back to the ERP.
I hope that sheds some light?
In particular, I’ve gone through very similar experiences where functional-programming was the silver bullet that secretly no one understood.
Anything which is not code gets used as a pretext to not do actual work.
I assume that quote is talking about unit tests, but it's becoming more and more accepted that integration tests are more important than unit tests. https://kentcdodds.com/blog/write-tests
This shift away from unit tests is the corollary to Ousterhout's preference for deep modules.
However, they're a good tool if for example you're working with a Big Data system, where an end to end integration test will necessarily take an extraordinary amount of time.
For my first job out of school in Apache Spark they were absolutely brilliant. For my current codebase? Not so much.
However, TDD just reduces the number of mistakes I make by an umpteenth amount.
Just because it breaks my code up into extremely short subroutines so I screw up less. Plus writing the unit test itself is a brilliant way to "measure twice, cut once."
And integration tests are nice, but the idea of testing small units was that every branch in the code under test doubles the number of possible execution paths, so it becomes harder and harder to test every path.
So we should branch less, I guess. His parts about "define errors out of existence" and writing more general code with fewer obscure details would probably help.
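The doubling claim is easy to make concrete. The `shipping_cost` function below is a made-up example with three independent `if` statements; strictly, the doubling applies to independent, sequential branches like these.

```python
from itertools import product


def shipping_cost(express, international, fragile):
    # Hypothetical function with three independent branches.
    cost = 5
    if express:
        cost += 10
    if international:
        cost += 20
    if fragile:
        cost += 3
    return cost


# Every combination of the three flags is a distinct execution path.
paths = list(product([False, True], repeat=3))
print(len(paths))  # 8

# The same growth for larger numbers of independent branches.
for n in (5, 10, 20):
    print(n, 2 ** n)  # 32, 1024, 1048576
```

Eight paths can still be tested exhaustively, while twenty branches in one unit cannot, which is the argument for keeping the unit under test small or the branching low.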
If someone is trying to design a better plane, it doesn't make sense to tell them to just convince everyone asking for a better plane to stop traveling. It is nonsense and seems like you have unrelated ideas that you are trying to shoehorn into this question.
That's what I see a lot of in software right now.
I'm merely suggesting that maybe a regular two-story family house is enough. Maybe even one story, depending on needs.
Note that only the first sentence of my comment said "don't automate". The other eight described a way to automate with less complexity when you have to.
I think this is the problem - you are imagining a question that isn't being asked by telling someone to just not make that problem.
This person asked for book on managing complexity. You are saying 'just don't make software as complex and do it by not making software at all'. That is two steps removed from being any sort of answer to this question.
Automation routines MUST be robust, must handle all weird cases that happen frequently (at least once a year), and must notify when they fail to do so always. Then you should come back and see how to not make them fail even then.
It's easy to spend upward of $10000 on really robust automation, when the same manual process would cost only $3000 over its usable lifetime, and the economical-but-less-robust automation costs $1000 over the same period.
The robust automation, in that case, has over 10× worse ROI. What's wrong with that calculation?
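One way to reproduce the "10×" figure is a benefit-to-cost ratio. This is my reading of the napkin calculation, and it assumes (which the comment does not say) that either form of automation fully avoids the $3000 manual cost.

```python
def benefit_cost_ratio(automation_cost, manual_cost_avoided):
    # Dollars of manual work avoided per dollar spent on automation.
    return manual_cost_avoided / automation_cost


manual = 3_000  # cost of doing it by hand over the process lifetime

robust = benefit_cost_ratio(10_000, manual)  # 0.3: spends more than it saves
cheap = benefit_cost_ratio(1_000, manual)    # 3.0: saves three times its cost

print(f"robust: {robust:.1f}, cheap: {cheap:.1f}")
print(f"cheap is {cheap / robust:.1f}x better per dollar")  # 10.0x
```

Since the comment says "upward of $10000", the real gap would be at least this large under the same assumptions.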
The thing about really robust automation is that for it to pay off, the process has to be static over a large number of executions. For many business needs, the process, or its inputs, change every few executions, and you never get to reap the benefits of robust automation before it needs to be redone at great expense.
As for thinking that it's a dichotomy between "no automation" and "absolutely robust automation"... well, I think you're robbing yourself of a large chunk of the strategy space by refusing to see any middle ground but the two extremes.
Edit: also note that I'm not talking about "failed scripts" at any point. I'm talking about scripts that do exactly what they are supposed to, but they are performing a narrow, easily automated slice of the work. A human can chain such scripts together in the requisite sequence by spending very few minutes of their day.
> It's easy to spend upward of $10000 on really robust automation, when the same manual process would cost only $3000 over its usable lifetime.
A manual process is not comparable to automation, because the $3k human will make mistakes; humans are not good robots. Also, your miserable $3k human can now do normal things.
> The thing about really robust automation is that for it to pay off, the process has to be static over a large number of executions.
It doesn't have to be static, it just mustn't be random. Also, how often process changes is important and automation with scripts (that can be changed ad hoc) allows for quick flexibility when problems arise.
> As for thinking that it's a dichotomy between "no automation" and "absolutely robust automation"... well, I think you're robbing yourself of a large chunk of the strategy space by refusing to see any middle ground but the two extremes.
You are also robbing yourself of time to do other things which may lead to more progress, since you are fixing flaky automation all the time.
> A human can chain such scripts together in the requisite sequence by spending very few minutes of their day.
I LOLed. A minute for a single script. You must have missed that in enterprise there are hundreds of scripts. Heck, I usually have 20-30 on a single project.
Step one to solving a problem is often to find a technique by which time is no longer stacked against you. Then you can attack the other thing in peace.
You didn't do that, your solution was 'don't write software'.
If someone asks for tips on organizing their kitchen and without knowing anything more than that, you say 'make your kitchen smaller and don't use your kitchen' do you think that is reasonable advice?
Sure, if someone built tens of industrial kitchens every month for regular families and just in the middle of nowhere (which is what I see a lot of in software) and then asked "how can I reduce my kitchen expenses" I would suggest "how about going forward you make your kitchens smaller and don't build them at all where nobody uses them?"
It sounds like you think robust automation takes zero minutes to create, since you think of robust automation as always freeing up time. In my experience, robust automation is something that takes considerable time to create and maintain.
Maybe you know of some trick I don't. But since you keep writing about "failing scripts" and "flaky automation" despite my attempts to correct such misunderstandings, I'm starting to suspect you're interpreting my comments as what you want them to say for the sake of your argument, rather than what I'm trying to say.
I have never seen anyone want to keep a human where a machine could be put to use. No, humans do not prevent mistakes in highly unimaginative, repetitive work. Even if it happens, it's an outlier.
> It sounds like you think robust automation takes zero minutes to create, since you think of robust automation as always freeing up time. In my experience, robust automation is something that takes considerable time to create and maintain.
It takes, and it gets MUCH better with experience. However, the time is finite, unlike that with human corrections.
> Maybe you know of some trick I don't.
Probably - I know how to write robust and resilient automation scripts that over time converge to almost 0 failures.
Let’s say I have a crawler which automates some data gathering. Its sources keep changing frequently, robust automation here is probably a research project and simple automation is orders of magnitude more bang for buck.
Here is the exact case you mention, just way worse. This is something I did for the Chocolatey community:
https://gist.github.com/choco-bot/a14b1e5bfaf70839b338eb1ab7...
This page is a report from the PowerShell framework I developed, mostly in its first year of development (https://github.com/majkinetor/au), that checks ~250 web sites for updates on various software. Today it has 6 errors and usually never many more. On my own location I keep ~60 packages and I tackle errors maybe once a year. Stuff just works, and you rarely have to visit; otherwise I would be involved in this all day, and I am not, while those packages have many millions of users.
Now I spend almost 0 time maintaining packages and I am one of the top choco package owners.
Check out the options used, some of which make it so robust:
https://github.com/chocolatey-community/chocolatey-packages/...
> Its sources keep changing frequently, robust automation here is probably a research project and simple automation is orders of magnitude more bang for buck.
Even if the source changes frequently it's better to automate. It's not when it keeps changing daily or more often than that. By automating you learn something new, so it pays off more in experience. Manually doing the same thing every day (even if it moves around) doesn't involve complex thinking and is just a waste of time.
Now I think we can get somewhere! Is this an admission that automation is not worth it when the processes or inputs change too often?
If so, then this frequency (which you have given as daily) depends entirely on the business needs in question.
Often, there's no business case to run an automated process daily.
Weekly or even monthly are very common intervals for processes in business. For a process that needs to run monthly, you only get twelve executions in a year. If the inputs change every six months, do you still think spending 60+ commits (as in your settings example) is worth it every six months, when there are cheaper ways to do it with limited human intervention?
You are answering a question that no one would ever ask and you can't answer it because you don't have any other information other than the actual question that was asked.
Just because you don't see how it's a good suggestion doesn't mean it's not.
Almost 100% of the cases I have run daily, hourly and even less (5,10,20,30 minutes schedule are common). I even had one recently that executed millions of requests to some REST API daily, running every few seconds. I call those "app supporting scripts", and I offload specific features of the main app to those.
Must be an architectural thing I guess. I work as a principal architect and I design most of my services so that they rely heavily on automation support.
> Is this an admission that automation is not worth it when the processes or inputs change too often?
I don't work in a vacuum. For me there are no rules about anything, context is most important (patterns, best practices etc. are for newbies). That case does lean to the manual side on first thought, but it all depends on other factors.
@kqr, thank you for your insights. While it's not exactly relevant to my problem domain (self-driving cars - lots of essential complexity there!), it's still generally useful advice.
Software is expensive to write, but it's even more expensive to maintain. That's why my favorite projects are those that delete large amounts of now-useless code. Not writing it in the first place is an even better alternative.
If someone asks "how do I get to this place" with no other context and the advice is "stay home and do nothing" that doesn't seem patronizing to you?