Systems that defy detailed understanding (blog.nelhage.com)
On the other hand maybe this forgiveness allowed us to build complex systems.
Being able to still do something useful in the face of something not going according to plan is essential to being reliable enough to trust.
But systems should quickly and reliably surface bugs, which are controllable failures.
A layer of suffering on top of that simple story is that it's not always clear what is and what is not a controllable failure. Is a logic error in a dependency of some infrastructure tooling somewhere in your stack controllable or not? Somebody somewhere could have avoided making that mistake, but it's not clear that you could.
An additional layer of suffering is that we have a habit of allowing this complexity to creep or flood into our work and telling ourselves that it's inevitable. The author writes:
> Once your system is spread across multiple nodes, we face the possibility of one node failing but not another, or the network itself dropping, reordering, and delaying messages between nodes. The vast majority of complexity in distributed systems arises from this simple possibility.
But somehow, the conclusion isn't "so we shouldn't spread the system across multiple nodes". Yo Martin, can we get the First Law of Distributed Object Design a bit louder for the people at the back?
https://www.drdobbs.com/errant-architectures/184414966
And let us never forget to ask ourselves this question:
The key here is that the web wasn't designed for engineers but for amateurs to slap something together sloppily in the first place.
As an aside it's curious how ridiculously forgiving HTML and JS are while CSS craps itself on a single missing semicolon. As though it were okay for the thing to be semantically and functionally malformed and malfunctioning... as long as it looks good!
As if the systems we have today aren't fragile? Instead, they're fragile, but their fragility is hidden and obfuscated.
Being robust and reliable is different from just letting systems do whatever they think is best.
Once it is running in production, though, things are quite different. You need the right combination of errors being well reported and gracefully handled without aborting or unnecessarily breaking the rest of the functionality. At that point people are relying on it to get their jobs done, and they will usually find ways to work around the errors (and even the corrupt data this might produce) so they can keep meeting their deadlines while the programmers work on fixing the problem. This is much better than those same employees not being able to do their jobs, or getting paid to stand around and do nothing. I guess this attitude is largely driven by the practicalities of where I work. If the employees who rely on the code fall behind or can't complete their work on time, our company is hit with thousands of dollars in fines under the contract terms we have to accept to win the business in the first place, and then our customers can't bill their customers, so they are not happy.
A better analogy would be if your construction workers didn't have standard or prescribed bolts in their design, so they just took whatever was lying around and hammered and welded bits together until it seemed sturdy enough. Suffice it to say, this is not a recipe that would work for building today's skyscrapers. Considerable design and sanity checking goes into that work, and the web completely lacked it at every point.
XHTML was a promising start in the right direction, but they unfortunately bungled it.
This was strongly suggested by those who fought for strict XHTML, but then Sam Ruby, who was leading the HTML5 effort, asked the question, "I find an image that I know my daughter will like. I send it to her. It is SVG. She wants to upload it to her Myspace page. However, the image won't render, because SVG is a form of XML, and Myspace is non-compliant. And yet, if I send her a JPEG or GIF image, she can upload that to Myspace."
The point was we typically embed content from one page into another page, and no one believed there would ever come a day when every page on the Web would be strict compliant. So HTML5 went in the other direction, dropping most requirements and allowing pretty much anything.
As I've written elsewhere, the fundamental problem we face is that a markup language, such as HTML, is completely unsuitable for the apps we now like to build and run over the Web. We rely on HTML to function as the GUI of TCP/IP, but it was not actually designed for that: it descended from SGML, and it carries a publishing history with it. What would make more sense would be the use of a data format, such as JSON or EDN, which can then be given visual characteristics without ever having to participate in one hierarchy or any one understanding of a DOM. Developers understandably complain that Java/Swing had 9 different layout options, the product of much experimentation, but having a variety of layout options does allow more flexibility in styles of building a GUI, with some approaches being simpler than what we get with the React/JS translation into HTML.
As a programmer, I like that same confidence in output AND it requires me to address the failures in some way...
Poor/Lazy developers will find ways around more stringent checks.
Be conservative in what you do, be liberal in what you accept from others (often reworded as "Be conservative in what you send, be liberal in what you accept").
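To make the two halves of the principle concrete, here is a minimal Python sketch, assuming a hypothetical boolean config flag whose canonical spellings are `true` and `false`; the function names are illustrative and not from any real library. The sender only ever emits the canonical form, while the liberal reader also tolerates case, whitespace, and synonyms.

```python
def emit_flag(flag: bool) -> str:
    """Conservative sender: always produce the canonical spelling."""
    return "true" if flag else "false"


def parse_flag_strict(value: str) -> bool:
    """Strict reader: accept only the exact canonical spellings."""
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"invalid flag value: {value!r}")


def parse_flag_liberal(value: str) -> bool:
    """Liberal reader: normalize whitespace and case, accept common synonyms."""
    normalized = value.strip().lower()
    if normalized in {"true", "yes", "on", "1"}:
        return True
    if normalized in {"false", "no", "off", "0"}:
        return False
    raise ValueError(f"invalid flag value: {value!r}")
```

The catch is the asymmetry: once `parse_flag_liberal` has shipped, some sender will start relying on `" YES "` being accepted, and switching back to `parse_flag_strict` breaks them. That is the point where Postel's Law runs into Hyrum's law.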
the wisdom of that policy just accrues over time.
Postel's Law creates debt that we pay interest on forever
Since Hyrum's law is a naturalist observation and Postel's Law is a suggestion, Hyrum's law is a more definitive truth.
Should you have been unable to submit it, or should people not be able to view it, until you correct it?
>My answer usually is “are you sure? “.
^
Line 1:
Syntax error: "“" not allowed here.
JavaScript is quite forgiving, but that's usually okay. If something doesn't work it's usually not the end of the world. In this case everyone correctly read your second opening quotation mark as a closing quotation mark.
This allows us to focus on what you're saying (functionality.)
If we couldn't figure out why you included some typos, we would just ignore that part and focus on the rest of your comment.
When someone replies with the nitpicking style it doesn't help anyone. (In fact my first version of this comment was downvoted, before I wrote out the rest of my explanation.)
I think all the leniency in front end JS is pretty good for the same reason. It lets us communicate, and the sandboxed client environment (browser security is built assuming web sites could be malicious) means that the stakes are quite low.
Recalls Gall's Law[1]. "A complex system that works is invariably found to have evolved from a simple system that worked."
Also, TFA invites a question: if handed a big ball of mud, is it riskier to start from scratch and go for something more triumphant, or try to evolve the mud gradually?
I favor the former, but am quite often wrong.
Or are test suites just a nice fantasy for a real distributed system?
By 'tests' I am including live failures, which are also a feature of mudballs.
A distributed system is always much more difficult to test than a functionally-equivalent localized version. That's not, of course, a reason to give up on testing, but one must be realistic about how much faith one can put in it to make up for an inadequate use of abstraction and separation of concerns.
http://pespmc1.vub.ac.be/ASHBBOOK.html
> ... still the only real textbook on cybernetics (and, one might add, system theory). It explains the basic principles with concrete examples, elementary mathematics and exercises for the reader. It does not require any mathematics beyond the basic high school level. Although simple, the book formulates principles at a high level of abstraction.
> W. Ross Ashby is one of the founding fathers of both cybernetics and systems theory. He developed such fundamental ideas as the homeostat, the law of requisite variety, the principle of self-organization, and the principle of regulatory models. Many of these insights were already proposed in the 1940's and 1950's, long before the presently popular "complex adaptive systems" approach arrived at very similar conclusions. Whereas the concepts surrounding the complexity movement are often complicated and confused, Ashby's ideas are surprisingly clear and simple, yet deep and universal.
Good link
I find it sad too. PID controllers are great but from my POV they're barely the first step.
However, another way to look at it is, you can study and apply "Intro to Cyb" and leapfrog into the future.
His defining distinction is between permanently working around a bug (not knowing it, but knowing _of_ it) and finding it, knowing it, and fixing it.
Very interesting.
...
> These failures are, individually, mostly comprehensible! You can figure out which browser the report comes from, triage which extensions might be implicated, understand the interactions and identify the failure and a specific workaround. Much of the time.
> However, doing that work is, in most cases, just a colossal waste of effort; you’ll often see any individual error once or twice, and by the time you track it down and understand it, you’ll see three new ones from users in different weird predicaments. The ecosystem is just too heterogenous and fast-changing for deep understanding of individual issues to be worth it as a primary strategy.
Sadly far too accurate.
But I agree with him that it has become impractical to do so. I just don't like it personally: I got into computing because it was supposed to be the most explainable thing of all (until I worked with the cloud and it wasn't).
I highly doubt that the original engineers who designed the first microchips and wrote the first compilers, etc... relied on 'empirical' tests to understand their systems.
Yet he is absolutely correct that it can no longer be understood, and when I wonder why, I suspect the economic incentives of the industry might be one of the reasons.
For example, the fact that chasing crashes down the rabbit hole is "always a slow and inconsistent process" will make any managerial decision maker rather uneasy. That makes sense.
Imagine if the first microprocessors were made by incrementally and empirically throwing together different logic gates until it just sort of worked??
For example, in PHP I often find myself wondering whether a class I am looking at might have subclasses that inherit from it. Since this is PHP and we have a certain amount of technical debt in the code, I cannot 100% rely on a tool to give me the answer. Instead I have to manually search through the code for subclasses and the like. If after such a search I am reasonably sure nothing extends that class, I change it to a "final" class in the code itself. Then I rerun our tests and lints. If I am wrong, eventually an error or exception will be thrown, and this will be noticed. But if that doesn't happen, the next programmer who comes along and wonders whether anything extends that class (probably me) will immediately find the answer in the code: the class is final. This drastically narrows down what can happen, which makes it much easier to examine the code and refactor or make necessary changes.
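PHP refuses to load a class that extends a `final` one, so the language does the enforcing. For comparison, here is a rough Python analog, offered as a sketch: `typing.final` exists but is only checked by static type checkers, so this version uses `__init_subclass__` to make the check happen at runtime. `Sealed` and `InvoiceFormatter` are hypothetical names.

```python
class Sealed:
    """Mixin that makes its direct subclasses behave like PHP `final` classes.

    A class that lists Sealed as a base may be defined normally, but any
    attempt to subclass *that* class raises TypeError as soon as the class
    statement runs (typically at import time), so a wrong assumption
    surfaces quickly instead of lingering.
    """

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for base in cls.__bases__:
            # Direct children of Sealed are allowed; grandchildren are not.
            if base is not Sealed and issubclass(base, Sealed):
                raise TypeError(f"{base.__name__} is final and cannot be subclassed")


class InvoiceFormatter(Sealed):
    """Hypothetical class we have convinced ourselves nothing extends."""

    def format(self, amount_cents: int) -> str:
        return f"${amount_cents / 100:.2f}"


# If the manual search missed something, a definition like this now fails loudly:
# class CustomInvoiceFormatter(InvoiceFormatter): ...   # raises TypeError
```

As in the PHP case, the value is less the check itself than the fact that the answer to "does anything extend this?" now lives in the code.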
Another example: you often come across legacy code that seems like it can no longer run (dead code). But you are not sure, so you leave the code in there for now. In harmony with this article, you might log or otherwise monitor whether that path in the code ever gets executed. If, after trying out different scenarios to make it run down that path and leaving the monitoring in place on production for a healthy amount of time, you conclude the code really is dead, don't just add this to your mental model or some documentation: embed it in the code as an absolute fact by deleting the code. If this manifests as a bug, it will eventually be noticed and you can fix it then.
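One way to do the monitoring step, sketched in Python: a tiny tripwire that logs (and counts) whenever a suspected-dead branch runs. `tripwire`, `parse_record`, and the "v1" record format are all hypothetical; in a real system the warning would go to whatever log aggregation or metrics pipeline you already have.

```python
import logging
from collections import Counter

logger = logging.getLogger("tripwire")

# How many times each suspected-dead path has actually run in this process.
tripwire_hits = Counter()


def tripwire(label: str) -> None:
    """Record that a code path we believe is dead was reached after all."""
    tripwire_hits[label] += 1
    logger.warning("suspected dead code reached: %s", label)


def parse_record(record: dict) -> dict:
    """Hypothetical parser with a legacy branch we suspect no client still uses."""
    if record.get("version") == 1:
        # Suspected dead: leave the tripwire in production for a while.
        # If it never fires, delete this branch (and the tripwire) outright.
        tripwire("parse_record:v1")
        return {"name": record["fullname"]}
    return {"name": record["name"]}
```

The tripwire is meant to be temporary: its job is to justify the deletion, after which the smaller code is the documentation.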
By taking this approach you are slowly narrowing down what is possible and simplifying the code in a way that makes it an absolute fact, not just a theory or a model or a document. As you slowly remove this technical debt, you will naturally adopt rules like: all new classes must start out final, and only be changed to not be final when you actually need to extend them. Eventually you will be in a position to adopt new tools, frameworks, and languages that narrow down the possibilities even more, further embedding the mental model of what is possible directly into the code.
But we all know such systems already: for instance, people! No, this post is a supplement/subsidiary to the previous one ("Computers can be understood" — BTW here's another recent blog post making the same point: https://jvns.ca/blog/debugging-attitude-matters/), carving out exceptions to the general rule, and illustrating concretely why these are exceptions (and what works instead). It is useful to the practitioner as a rule-of-thumb for having a narrow set of criteria for when to avoid aiming to understand fully (and alternative strategies for such cases). Otherwise, it's very easy to throw up one's hands and say "computers are magic; I can't possibly understand this".
(The point of the article here is obvious from even just the first or last paragraphs of the article IMO.)
The fact that you didn't pick this up leads me to think you are more interested in being smart than helpful, but perhaps I am wrong about that.
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. — C.A.R. Hoare, The 1980 ACM Turing Award Lecture
Reminiscent of Chesterton’s fence. But then, we end up in such a “complex” situation only when one thing can have multiple causes & effects — which is difficult to model correctly in a clean slate formulation.
The simplest solution seems to be to avoid making software that complex in the first place (we can exert far more control than in the physical world).
But then if we think about Peter Naur’s perspective of programming as a mode of theory building (of the domain) (unsurprising, given basic cybernetics principles such as the law of requisite variety & the good regulator theorem), the answer seems to be: unless your domain is really complex, think hard before you implement, and keep refactoring as your understanding improves (and try to pick problem formulations / frameworks / languages that make that feasible. Of course, easier said than done.) The key point is to keep refactoring “continuously” to match our understanding of the domain, rather than just “adding features”.
Aside: In my experience, software built on a good understanding of the domain will function well, untouched, for a long time — so long as it is suitably decoupled from the less-well-understood parts. The latter kind, though, generates constant churn, while also being an annoying fit. Really brings home the adage “A month in the laboratory can save a day in the library.”
This is why you should keep paying your employees that worked for the company for years, having written all of the mediocre code when they still could not program well at all.
This is also what I wanted to say.
One important part of that is that refactoring is a pretty difficult skill, and many programmers do not have it.
So... for those people, some other advice is probably better.
I am working on my first game using Unity right now and I wholeheartedly agree. Almost all of my effective refactoring is turning interacting systems into standalone chunks that don’t care about the rest of the system.
It’s very hard to do. I imagine my 4th game will go far smoother after I figure out what works and what doesn’t.
In that analogy, XHTML is weather control satellites.
Of course, we can discuss whether that culture of busýness was ever actually constructive, but that’s a discussion for another day.
Amusing but unrelated: I looked to see if that law was on Wikipedia and clicked on this, thinking it might be another spelling:
Just look at the F31, recently they opted to "test in prod" for a lot of components. And not because of the engineers/developers.
Were it a nuclear reactor control software they wouldn't have decided that way. (Hopefully.)
I was thinking, if the error exists between keyboard and chair, I want the strictest failure mode to both catch it and force me to do things right the first time.
But once the thing is up and running, I want it to be as resilient as possible. Resource corrupted? Try again. Still can't load it? At this point, in "release mode" we want a graceful fallback -- also to prevent eventual bit rot. But during development it should be a red flag of the highest order.
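A minimal sketch of that split in Python, assuming a hypothetical `load_resource` helper with a `dev_mode` switch and a `ResourceError` type for expected load failures; none of this comes from a specific engine or framework.

```python
import logging
from typing import Callable

logger = logging.getLogger("resources")


class ResourceError(Exception):
    """Raised by a loader when a resource is missing or corrupted."""


def load_resource(load: Callable[[], bytes], fallback: bytes, *,
                  dev_mode: bool, retries: int = 2) -> bytes:
    """Load a resource: strict during development, resilient in release."""
    if dev_mode:
        # Development: no retries, no fallback. Any failure is a red flag.
        return load()

    last_error = None
    for attempt in range(1, retries + 2):
        try:
            return load()
        except ResourceError as exc:
            # Only expected resource failures are retried; any other exception
            # (e.g. a logic bug) still propagates, even in release.
            last_error = exc
            logger.warning("load attempt %d failed: %s", attempt, exc)

    # Release: report the failure loudly, then degrade instead of aborting.
    logger.error("using fallback after %d attempts: %s", retries + 1, last_error)
    return fallback
```

Catching only `ResourceError` is deliberate: it keeps environmental failures (a corrupted file) apart from bugs, which should still surface immediately, in line with the distinction between controllable and uncontrollable failures made earlier in the thread.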
If so, it makes sense not to be strict; if not, it’s you (and all of us) rolling two different modes of failure into a single classification.
And it offers good lessons about what's worth factoring and how. Forth words that are just static answers and aliases are OK! They're lightweight, and the type signatures are informal anyway. "Doing Forth" means writing it to exactly the spec and not generalizing, so there's a kind of match of expectations of the environment to its most devoted users.
On the other hand, in most modern environments the implied goal is to generalize and piling on function arguments to do so is the common weapon of choice, even when it's of questionable value.
Lately I've cottoned on to CUE as a configuration language and the beauty of it lies in how generalization is achieved while resorting to a minimum of explicit branches and checks, instead doing so through defining the data specification around pattern matching and relying on a solver to find logical incoherencies.
I believe that is really the way forward for a lot of domains: Get away from defining the implementation as your starting point, define things instead in a system with provable qualities, and a lot of possibilities open up.
This assumes that later code changes don't undo/blur the factoring, which while ideal is not at all consistently the case in the real world.
Refactoring is a slightly overly narrow name, because code hygiene is more than just isolating factors, but the “re” part is right because you are always aiming to remove infelicities that were actively added in previous coding.
In TPS, they found that a focus on reducing complexity leads to improvements in the metrics you'd want to measure: better quality, reduced costs, and customer satisfaction.