Normalization of Deviance (2015) (danluu.com)
p {
  line-height: 1.7;
  max-width: 60em;
  font-size: 1.2em;
  margin-left: 5em;
}
It pretty much fixes the default readability, which is otherwise essentially zero on this site.
javascript:(function(){ var bod = document.getElementsByTagName("body")[0]; bod.style.margin = "40px auto"; bod.style.maxWidth = "40vw"; bod.style.lineHeight = "1.6"; bod.style.fontSize = "18px"; bod.style.color = "#444"; bod.style.padding = "0 10px"; })();
And for a dark mode (applies the image filter to every image rather than only the first, and re-applies the hue rotation so images keep their original colors):
javascript:(function(){ var body = document.getElementsByTagName("body"); var html = document.getElementsByTagName("html"); var imgs = document.getElementsByTagName("img"); body[0].style.background = "#131313"; body[0].style.opacity = "1.0"; html[0].style.filter = "brightness(115%) contrast(95%) invert(1) hue-rotate(180deg)"; for (var i = 0; i < imgs.length; i++) { imgs[i].style.filter = "contrast(95%) invert(1) hue-rotate(180deg)"; } })();
https://support.mozilla.org/en-US/kb/firefox-reader-view-clu...
Once, someone (a foreigner) was surprised that I dipped my toast with Nutella in my latte. I was equally surprised by his surprise.
This is useful and fine. Someone wrote a test, and now it hits a race condition or something and occasionally fails. Let’s assume we are very confident it is a problem with the test, not the product.
Choices:
Spend a sprint trying to fix it right now regardless of priority.
Turn it off and lose that coverage.
Buy some time.
In this context it makes sense, as long as there is a procedure to address these in some sane timeframe.
Maybe that is an example of normalization of deviance. But I think if it is discussed and the trade-offs are thought through, it is an OK thing to do at times. Remember, most development is not greenfield: you inherit a system when you start a job.
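One way to make "buy some time" safe, with a procedure that deals with skipped tests in a sane timeframe, is to record each quarantined test with an owner and a date, and fail the build loudly once the window has passed. A minimal JavaScript sketch; the `quarantine` list, its field names, and the 14-day window are hypothetical, not features of any particular test runner.

```javascript
// Hypothetical quarantine list: each flaky test is skipped only for a bounded time.
const quarantine = [
  { test: "checkout handles concurrent carts", owner: "alice", addedOn: "2024-01-02" },
  { test: "search index warms up", owner: "bob", addedOn: "2024-02-20" },
];

const MAX_DAYS = 14; // assumed policy: fix or delete a quarantined test within 14 days
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the entries whose quarantine window has expired as of `now`.
function overdue(entries, now, maxDays = MAX_DAYS) {
  return entries.filter((e) => {
    const ageDays = (now - new Date(e.addedOn + "T00:00:00Z")) / MS_PER_DAY;
    return ageDays > maxDays;
  });
}

// A CI step could call this and fail when it returns false,
// so skipped coverage cannot silently become permanent.
function checkQuarantine(entries, now = new Date()) {
  const late = overdue(entries, now);
  for (const e of late) {
    console.error(`Quarantine expired: "${e.test}" (owner: ${e.owner})`);
  }
  return late.length === 0;
}
```

Keeping the test listed rather than deleting it keeps the lost coverage visible, and the deadline turns the trade-off into an explicit, time-boxed decision.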
On a more serious note: a maximum line length for text makes sense ergonomically, which is why large-format print such as newspapers and magazines uses columns. Columns, however, haven't really caught on on the web, because they do not combine very well with the whole scrolling thing.
My frustration stems from the fact that I find the argument "there are no papers with sufficient evidence" to be pedantic bullshit. Like yeah, sure, you aren't even wrong, but absence of evidence is not evidence of absence. I've never seen anyone claim to like 180 char lines, whereas I've seen hordes of people who say it is very difficult for them to read that line length, and prefer something book-sized (lengthed?).
Which is basically my understanding of "too long" lines: The problem doesn't have to do with the length of the line itself / reading one line (which is what most people seem to focus on), it has to do with reliably returning to the beginning of the next line instead of accidentally drifting up or down.
So it wouldn't be much of a problem if there were other visual indicators (code lines have unique shapes instead of being a big block of text, and paragraphs with a blank line between them let you see more easily "I'm going from line 2 of 4 to line 3 of 4"), so you don't actually have to track the line sideways. It's tracking the line back to its beginning in a big block of non-code text that's difficult.
Designer says: my opinion is right because A. Dan shows: A is not factual. Designer says: my opinion is right because of B.
Dan notices behaviours, and then he writes compellingly about those behaviours.
He has every right to make his choice. As do we, in deciding whether to read it.
I was mostly reacting to the assumed malice in the parent comment. Based on his blogging style, I think it's more likely that Dan's just a pedantic guy implementing his personal preferences on his personal blog :)
My first day at work at big-laser-company. I was the manufacturing engineer for a laser (then) so complex that it required a PhD to solve the problems that kept units from getting out the door. The product was a ring laser: the laser beam travels around a race-track pattern inside the laser before getting out, rather than bouncing back and forth between two mirrors. This laser could be tuned to any wavelength with suitable setups and machinations, and once there, would “scan” a small amount about that wavelength, enabling scientists to study tiny spectral features in atoms and molecules with great precision. I knew all this shit. I was a Berkeley-trained physicist who built precision lasers out of scrap metal for my thesis.
First day of work. I walk into the final test lab. The big laser was happily scanning away. The bright yellow, needle-like output beam was allowed to hit the lab wall. As the laser scanned, the beam was MOVING on the wall. Whereupon, on my first day of work, I exclaimed the four most obscene words in manufacturing, for all to hear: “You can’t ship that!” (“Beam pointing instability” is detrimental to almost any laser application. It turned out that during scanning, an optical element was rotating on a shaft inside this laser, and this mechanical motion caused beam motion.)
Well, I got an immediate reputation as a negative guy. (You can tell it’s deserved.) The solution was to retrofit 28 lasers in the field, mostly in Europe, with a component that cancelled the movement, on an expensive junket by a service guy. Who was hailed as a “hero.”
This one question tends to separate out a very large fraction of companies that take unacceptable risks, and allows the ones that don't to be justifiably proud of their attitude towards risk. These are not trivial things either: medical devices and software used in medical diagnosis, machine control and so on, where an error can quite literally cost someone their life or a good chunk of their healthy lifespan. Companies where people cannot or won't speak up tend to have a lot of stuff that's wrong swept under the carpet.
Kudos to you for speaking up, and irrespective of who got to be called a hero (that part isn't all that relevant to me) also kudos to your employer for acting on your input.
It's disappointing (and career limiting) to do the "right" engineering and lose because you didn't correctly gauge the risk tolerance of the market.
I don't think there's any one answer for this.
My observation is that people will pay a premium for demonstrable physical safety features but privacy & security in software do not win markets.
Meanwhile our major competitor cleverly placed their rotating element in such a position that the beam retraced itself through the rotating element, thereby substantially cancelling this effect.
Over a period of days, the error became increasingly, comically bad, until finally the system refused to boot.
A technician was called, and after hearing about the behaviour, the first request was that a photo of the laser light exit port be taken.
It was obvious why it wouldn’t boot: a mirror in the light path had fallen off.
The worst part was, the mirror had been held on by glue, and had been slowly slipping out of place. The hot climate was probably a factor.
They really should have had someone to say ‘you can’t ship that’ when the topic of glue to hold mirrors came up.
I wish I had pushed more strongly about it. We spent probably a full person-day of work every week on that.
I work in product at a hardware company and have a lot of domain experience, which came from spending years in the (literal) field. There have been many times when I've written a product spec and the engineers were incredulous. "Really? It gets THAT hot?" or "Do we really need to provide a bonding/grounding lug on the case?"
It's not uncommon to find engineering teams with deep domain experience in one area, but completely lacking in others. Ignoring domain experience, there should have been rigorous product testing during design that would have weeded out the glue issue.
I had a colleague who got called up when a Trident missile MIRV bus fell off a forklift, and he had to run simulations to tell the Navy whether it was still good or needed to be brought back in for rework/recalibration. My understanding is that either the MIRV bus itself or its container has integral devices that record peak 3-axis acceleration for just such a scenario. I imagine they're as simple as a few precise weights on a few wires with precise failure strains, so you can bracket the peak acceleration by which wires broke and which survived.
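The bracketing idea in that last sentence can be sketched in a few lines. This is purely illustrative and assumes hypothetical sacrificial wires with known breaking accelerations along one axis; it is not a description of how the actual recorders work.

```javascript
// Each hypothetical wire snaps at a known acceleration (in g) along one axis.
// After an event we only observe which wires broke. The peak acceleration is
// at least the highest threshold among broken wires, and below the lowest
// threshold among surviving wires.
function bracketPeak(wires) {
  let lower = 0;        // peak was at least this
  let upper = Infinity; // peak was below this
  for (const { breaksAtG, broken } of wires) {
    if (broken) lower = Math.max(lower, breaksAtG);
    else upper = Math.min(upper, breaksAtG);
  }
  if (lower > upper) {
    // A stronger wire broke while a weaker one survived: the readings disagree.
    throw new Error("Inconsistent readings");
  }
  return { lower, upper };
}

// Usage: wires rated at 5, 10, 20 and 40 g; the first two broke.
console.log(bracketPeak([
  { breaksAtG: 5, broken: true },
  { breaksAtG: 10, broken: true },
  { breaksAtG: 20, broken: false },
  { breaksAtG: 40, broken: false },
])); // { lower: 10, upper: 20 }
```

More wires with closely spaced thresholds give a tighter bracket; three axes would simply need three such sets.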
On the one hand, it's great to have more accurate nukes, which allow lower yields, smaller stockpiles, and presumably smaller craters if everything goes sideways. On the other hand, "surgical" nukes make it more likely that one side will use them and gamble that the other side won't massively retaliate.
If it was ever used, that work saves lives.
A more correct and polite version of your advice: "It will cost us a lot more to ship this as-is, and fix it later, than it will to delay shipment and fix it now. Is it too late to do that? Did we over-commit to shipping now?"
It wasn't your responsibility to come up with that version. It was your manager's responsibility. It was also their responsibility to find the necessary decision-makers and involve them directly. I would argue that this sort of work is the only real way that "management" can provide value in the first place.
Somehow, socially, it's incredibly common for people to value the inverse of that job. People assume it is "good work" for a manager to successfully ignore unpopular concerns, and push through to the end, no matter how inefficient that makes the journey.
That works out in the case where the shipping date was over-committed, such that a delay would cost more than fixing it later. Even so, that entire situation would be avoided by refusing to over-commit shipping dates in the first place. That's the same responsibility applied earlier in time, so a manager who behaves the way I have described could factor out the entire problem at its source.
This is what the average person should learn about management. Even if it's not their job personally, there is a lot of leverage behind the decisions a worker makes about which management behaviors to socially favor and which to socially reject. That leverage is multiplied at every level up the hierarchy, meaning that the opinion of someone in a management role is very significant, and the opinion of someone in an executive role is crucial.
It's really difficult to be explicit about opinions. You can't really put them in your resume, but at the same time, an opinion on management style may be an executive's primary value contribution!
Everyone wants to think of the cool ideas to make things work, but few people want to think of all the ways those ideas can break, fail, fail to be future-proof, be expensive, etc., etc., whereas I relish it; what's more satisfying than helping make a proposed or existing solution even better?
But the same applies to stuff outside of work, too. I find I'm quite negative about stuff in exactly the same way, and while it's fun to think "how could we fix this, how could we make it better", all people see is negativity, and social pressure has made me start to rethink this approach in life. It's better to keep your mouth shut and let the fire start than to open your mouth and be negative, as per your analogy.
Hell it even applies to traditionalism; "we should put out that fire" "but that fire's always been there, that's the way it's always been" "but it's a fire!!!" "yeah, well it was here before you and we like it. That fire walked uphill both ways through the snow to get to school".
You don't get points for preventing fires; you get them for putting them out. Unfortunately, some folks seem to conclude that lighting fires, just to put them out later, is a good and easy way to earn that "hero" reputation.
Normalization of Deviance (2015) - https://news.ycombinator.com/item?id=22144330 - Jan 2020 (43 comments)
Normalization of deviance in software: broken practices become standard (2015) - https://news.ycombinator.com/item?id=15835870 - Dec 2017 (27 comments)
How Completely Messed Up Practices Become Normal - https://news.ycombinator.com/item?id=10811822 - Dec 2015 (252 comments)
What We Can Learn From Aviation, Civil Engineering, Other Safety-critical Fields - https://news.ycombinator.com/item?id=10806063 - Dec 2015 (3 comments)
Let's look at how the first one of these, “pay attention to weak signals”, interacts with a single example, the “WTF WTF WTF” a new person gives off when they join the company.
and kinda wonder if a company that prioritized not getting this reaction from new hires might find it is the most impactful thing they can do in terms of culture.
In these systems it is found that they are almost always operating in (or transitioning between) failure modes. Often multiple operational failure modes are present simultaneously. It becomes very important to test the system in each of its failure modes and their combinations to maintain high uptime.
https://how.complexsystems.fail/ is an example, but there are many.
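Testing each failure mode and each combination of them can start as simply as enumerating every subset of the known failure modes and running a scenario for each. A minimal sketch; the failure-mode names, `runScenario`, and its invariant are hypothetical placeholders, and real systems usually need pairwise or sampled combinations because the number of subsets grows as 2^n.

```javascript
// Hypothetical failure modes a service might be in at any moment.
const failureModes = ["dbDown", "cacheDown", "vendorTimeout"];

// Enumerate every combination (the power set) of failure modes, including "none".
function combinations(modes) {
  const result = [];
  for (let mask = 0; mask < 1 << modes.length; mask++) {
    // Bit i of `mask` decides whether modes[i] is active in this combination.
    result.push(modes.filter((_, i) => mask & (1 << i)));
  }
  return result;
}

// Placeholder scenario runner: a real one would inject each fault and check invariants.
function runScenario(activeModes) {
  // Assumed invariant: the service degrades but still answers
  // unless both the database and the cache are down.
  const canServe = !(activeModes.includes("dbDown") && activeModes.includes("cacheDown"));
  return { activeModes, canServe };
}

const results = combinations(failureModes).map(runScenario);
for (const r of results) {
  console.log(`${r.activeModes.join("+") || "healthy"}: ${r.canServe ? "ok" : "FAIL"}`);
}
```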
Human work, development, and maintenance is itself a system that interacts with these critical systems. Frankly, failure to fail causes failure (thus Chaos Monkey). The Mythical Man-Month is almost a subcategory of these failures, as are HR hiring processes and other BS. Being too successful and not having competition (or having similarly sclerotic competition) can be as much of a hazard as "move fast, break things".
When is it "Normalization of Deviance", and when is it an "Efficiency Optimization"?
I mean, the difference is pretty clear after something has failed, but very murky before.
aka "Chesterton's Fence"
Otherwise, it's "Normalization of Deviance":
* The build is broken again? Force the submit.
* Test failing? That's a flaky test, push to prod.
* That alert always indicates that vendor X is having trouble, silence it.
Those are deviant behaviours: the system is warning you that something is broken. By accepting that the signal/alert is present but uninformative, we train people to ignore them.
vs...
* The build is always broken - Detect breakage cause and auto rollback, or loosely couple the build so breakages don't propagate.
* Low-value test always failing? Delete it/rewrite it.
* Alert always firing for vendor X? Slice vendor X out of that alert and give them their own threshold.
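The last fix, giving vendor X its own threshold instead of silencing the shared alert, might look something like this. A sketch under assumptions: the error-rate metric, the vendor names, and the threshold values are all hypothetical.

```javascript
// Default alert threshold for upstream error rate, plus per-vendor overrides.
// vendorX is known to be noisier, so it gets its own (higher) threshold instead
// of being silenced: it still alerts if it gets worse than its usual level.
const DEFAULT_THRESHOLD = 0.01; // 1% failed requests
const vendorThresholds = { vendorX: 0.05 }; // assumed: vendorX normally runs near 3%

// errorRates: { vendorName: fraction of failed requests over the window }
function alertsFor(errorRates) {
  const alerts = [];
  for (const [vendor, rate] of Object.entries(errorRates)) {
    const threshold = vendorThresholds[vendor] ?? DEFAULT_THRESHOLD;
    if (rate > threshold) {
      alerts.push(`${vendor} error rate ${(rate * 100).toFixed(1)}% > ${(threshold * 100).toFixed(1)}%`);
    }
  }
  return alerts;
}
```

The key design choice is that the override is explicit and reviewable, so the alert stays informative for every other vendor.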
https://www.fastjetperformance.com/blog/how-i-almost-destroy...
> Everything that can go wrong will go right.
Murphy's Law then manifests from escaping disaster through repeated iterations of taking risks where most things play out well anyway.
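The arithmetic behind this is short. Assuming, as an idealization, that a risky shortcut fails independently with a small probability p each time it is taken, the chance of getting away with it n times in a row is (1 - p)^n, so the chance of at least one failure is below; the numbers in the second line are hypothetical.

```latex
P(\text{at least one failure in } n \text{ tries}) = 1 - (1 - p)^{n}
\qquad
p = 0.01,\; n = 100:\quad 1 - 0.99^{100} \approx 1 - 0.366 = 0.634
```

Each individual success looks like evidence that the shortcut is safe, even though over the whole run a failure is more likely than not.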
I have to laugh at the "append z to the end" strat at Google, though. That's a good one.
- The Space Shuttle Challenger disaster in 1986 was caused by the normalization of deviance, where engineers became accustomed to problems with the O-ring seals and began to accept them as normal. This led to the eventual catastrophic failure of the shuttle's launch, killing all seven crew members.
- The 2008 financial crisis was caused in part by a normalization of deviance in the banking industry, where risky and complex financial instruments were routinely used without proper oversight or understanding of the potential risks. This led to a widespread collapse of the financial system and a global economic recession.
- The Volkswagen emissions scandal in 2015 was caused by a normalization of deviance in the automotive industry, where engineers and executives became accustomed to cheating emissions tests and misleading customers about the true environmental impact of their vehicles. This led to significant financial and reputational damage to the company.
- The Theranos scandal in 2018 was caused by a normalization of deviance in the healthcare industry, where the company's leaders became accustomed to misrepresenting the capabilities of their blood testing technology and misleading investors and customers about its accuracy. This led to significant legal and financial repercussions for the company and its executives. (ChatGPT)
You have to find the one that is broken in the way that is tolerable to you.
Arguably the closest we know to a panacea in terms of engineering culture and best practices is Google. And what are they now known for? An inability to ship anything meaningful anymore. Spinning around in circles launching and re-launching new chat apps.
These are not unrelated. High engineering standards are always in tension with product delivery. As a security engineer once told me, "the most secure system is the one that never gets launched into production."
So while Dan is right, and all the examples are right, and things like non-broken builds and a fast CI/CD pipeline are totally achievable, don't learn the WRONG lesson from this, which is that when you arrive at a company and notice a bunch of WTFs, the first thing you must do is start fixing them in spite of any old-timers who say "Actually, that's not as bad as it seems". Sometimes they're wrong. USUALLY, they're right.
The tech industry tends to revolve around "I'm a super-rational robotic genius" thinking that can't accept the existence of its own irrational tendencies, to the point that it becomes ridiculous.
Reading it felt like a personal attack in many places. However, reading it forever changed how I think about things. It's a much more useful framing for everyone involved if you start with the question of "why did they think this was the right thing to do?" as opposed to "this person made a bad choice / mistake". My (extraordinary) impatience naturally predisposes me towards the latter, but the core argument of the book is that that's lazy -- you can hand wave away anything and everything with "operator error".
One company I worked at had no unit tests, no infrastructure as code, and no build server. This held strong for a while until enough developers implemented some unit tests, infrastructure as code (e.g. terraform), and a build server as skunkworks projects. Eventually management tolerated them, but never endorsed them. Some teams at the company never embraced good practices because they weren't forced on them.
I guess I've never worked at a company that valued unit tests across the whole of the engineering team. I introduced them and implemented them on my own team, but others ignored it.
Personal experience is that a build server normalizes deviance. "But it works on the build server," we used to say as, over time, it became harder and harder to build locally. "Just fix your environment!" we used to say, when it was the build system that was actually at fault. "It's all so fragile, just copy what we've done before!" we then said, repeating the mistakes that made the build system so fragile.
Eventually, the build system moved into a Docker image, where the smells were contained. But I'm still trying to refactor the build system into a portable, modern alternative. If we hadn't had a build server, we'd have fixed these core issues earlier and wouldn't have built on such a bad foundation. Devs should be building systems that work locally: the heterogeneity forces better error handling, the limited resources force better scalability design, and most importantly, it prevents "but it works on the build server!".
This had me puzzled for a couple of minutes. Yeah, that “WTF, WTF” moment. Then I realized that our build “server” consisted of 12 different platforms (luckily reduced to just 6 in later years), so passing a build in production was a bit harder than building locally.
Many of the examples in the OP are probably closer to the former, but my general advice here is to keep lots of notes about what seems broken, and revisit in a month or two. Sometimes you gained context that explains why something is actually sensible. If it still seems crazy with context, you can now bubble up the feedback with confidence, and also having hopefully built some respect and trust from the team to make the message land better.
Say you have an integration test that relies on an unreliable system you do not control. Sure, you can mock it out for a unit test, but if you want to make sure you catch breaking API changes, you need to hit the actual system. And if it works after retrying a few times, then so be it; no need to throw shade.
Don't test it.
Only do unit tests with the connection mocked out.
Test against production.
Try it a few times with a delay, and if it works then you know your code is good and you can move on with your deployment. Which is what flaky and pytest-retry do.
Maybe I'm missing something, but out of those 4 options, retrying the test seems like the best one, with the big caveat that it is only viable if the test does indeed pass after a few tries. I really don't see any downside.
edit:
Maybe another option is to put the retry functionality directly in the client code, which would make your code more robust overall. But that is definitely more complex than using one of these libraries just for testing.
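The retry-with-delay option, and the variant of putting it in the client itself, can be sketched as a small wrapper. This is a hypothetical helper, not the API of flaky or pytest-retry (which are Python pytest plugins); the attempt count and exponential backoff are assumptions. Logging each failed attempt keeps the flakiness visible instead of quietly normalizing it.

```javascript
// Retry an async operation up to `attempts` times, waiting longer after each failure.
// `sleep` is injectable so tests can skip real waiting.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(fn, { attempts = 3, baseDelayMs = 500, sleep = defaultSleep } = {}) {
  let lastError;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Log every failed attempt so flakiness shows up in reports rather than disappearing.
      console.warn(`attempt ${i}/${attempts} failed: ${err.message}`);
      if (i < attempts) await sleep(baseDelayMs * 2 ** (i - 1)); // exponential backoff
    }
  }
  throw lastError; // every attempt failed: surface the last error
}
```

In a test you would wrap only the call to the external system, e.g. `await withRetry(() => client.fetchStatus())`, where `client.fetchStatus` is a hypothetical stand-in for the real API call.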
It's fascinating really... Complex systems are always in partial failure mode and that applies to collective optimization challenges. Organizations will always be stuck in local optima in most domains.
I have marginal control over who I manage. The product isn't saving the world, but it is allowing us to live reasonably and with a clear soul at the end of the sprint. The reason I say the "mercenary" bit is simple: weigh your dreams against blood and gold, and compromise.
https://www.aopa.org/news-and-media/all-news/2015/december/0...
As an obvious result, our society does an incredible amount of work maintaining that obfuscation.
---
I've heard estimates that 20% (1/5th) of all healthcare-related spending in the US is overhead from insurance determinations, paperwork, etc., and that that 25% (25%/125%=1/5th) extra spending (relative to 100% of the rest of healthcare expenditure) does not exist in single-payer healthcare systems, like those used in Canada, Germany, and every other developed nation in the world.
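The two percentages in that estimate describe the same overhead measured against different bases. Writing O for the overhead and R for the rest of healthcare spending (labels introduced here for illustration):

```latex
\frac{O}{O + R} = \frac{1}{5} \;\Longrightarrow\; R = 4O \;\Longrightarrow\; \frac{O}{R} = \frac{1}{4} = 25\%,
\qquad
\frac{25\%}{100\% + 25\%} = \frac{1}{5} = 20\%
```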
What do we get from that extra spending? What substantive difference does that obfuscation provide?
The main difference I see is "explicit opportunity cost". Instead of deciding ahead of time that we will pay for any arbitrary healthcare need (as a single-payer program), the opportunity for each individual healthcare act is given a price, and groups of priced opportunities are provided by subscription-based insurance plans.
Every person has to find, apply for, and pay for an insurance plan that will meet their current and future healthcare needs.
Because that is explicit, there is leverage available to manipulate each opportunity cost, and even the opportunity of each person to have that opportunity provided to them.
So what does that leverage even look like, and who is using it, and for what purpose?
Politics. Instead of care being determined by your doctor, access to each type of care is explicitly made available (or unavailable) by your insurance plan. That's a huge attack surface for political motivation.
There is currently a dextroamphetamine (Adderall) shortage in the US. The other day, I went to my pharmacy to pick up my prescription for 30 generic Concerta (methylphenidate extended release, another stimulant medication used for ADHD), and learned that all they had left were 16 brand-name Concerta. I was lucky enough to have that covered by my insurance. Many different insurance plans would not have provided me that opportunity.
Why is there a shortage? Despite a significant increase in ADHD diagnosis last year, the DEA refused to raise the limit of Adderall that can be legally manufactured. Why? Because there is a long-standing political conflict between stimulant addiction prevention and ADHD treatment, and the DEA is positioned at one side of it.
That same political conflict is why some insurance companies have outright refused to include coverage for stimulant medications. Even without a nationwide shortage, some people have found themselves stuck in a position where the opportunity for medication is held just out of reach by the political decision of their insurance company, or the lack of access to insurance at all.
The same pattern can be found with practically every type of medical care that is politically controversial: contraceptives, abortions, hormones, etc. Even if you can't get a legislative ban, there is still leverage available to obfuscate opportunity itself.
When conservative politicians argue that a single-payer program would be "too socialist for America", the substantive difference they intend to preserve is the political leverage that is baked into the system we have; the political leverage that allows politics to restrict our medical care without a single vote.
---
That's just one example. This pattern is everywhere. The only answer is social objectivity. It's a hard problem.
Once you get away from "should we use version control" and into actually difficult software engineering questions, it's not clear how to balance a fresh perspective vs. an experienced (normalized? tainted?) view. I wish the article went into this more.
Like, how does the new hire (or anyone else) know the difference between "learning the complexity of the new system" and "internalizing/normalizing the deviance of this culture"?
If a new hire can't checkout, build, and test the software on the first day, then there is likely something either wrong with the hire or the infrastructure. A sufficiently old and arcane software system might take weeks before a new hire can make even a simple change, but that shouldn't impact those three items.
That speaks to the caliber of programmers hired. If all they have seen is $TODAYS_HOT_JS_FRAMEWORK and they have written nothing but web apps using $TODAYS_HOT_JS_FRAMEWORK, they might not grasp the fundamentals that would make them realize that frameworks are just abstractions (and not that different from one another).
I don't think any software engineer would even ask that question, since the answer will almost always be "$TODAYS_HOT_JS_FRAMEWORK didn't exist when the project started, and it's not worth a re-write to port it over".
Now, that brings out a second important truth: a company can't attract and retain a wide range of different caliber employees. For instance, if a place still questions the usefulness of source control (perhaps because they consider git to be too complicated) there's no way they'll attract and retain top performers. So the culture will select people that agree that source control is a waste of time.
Having a healthy, balanced view comes from enough experiences. Ideally from working in a bunch of places and seeing enough things go sideways, and correctly understanding and identifying the causal chain that led to failures.
Funnily enough, it's sort of like training an AI: you essentially need a lot of correctly labelled data to learn. Junior engineers don't have enough data points, and unfortunately some "senior engineers" I've worked with took (in my opinion) the wrong lessons from their experiences (e.g. the CTO who thinks version control is too complex).
The interesting cases are when smart, experienced people disagree on what the best solution is. Should you keep your team small and smart or have a varied team with more mentorship and process? Is code review worth it in every case? What is the right amount of tests for your software? How often do we want to push to production?
When I was teaching programming my students would sometimes ask juicy questions. My favorites were the questions I could answer with "I'll tell you my opinion, but I've worked with people I look up to who think I'm wrong about this..."
> “The thing that's really insidious here is that [once a person buys] into the WTF idea… they can spread it elsewhere for the duration of their career… Once people get convinced that some deviation is normal, they often get really invested in the idea.”
> [H]ow does the new hire (or anyone else) know the difference between "learning the complexity of the new system" and "internalizing/normalizing the deviance of this culture"?
The article implies that the new hire should pay close attention to the things that are incentivized, and those that are not.
To change the culture, these people have to go. Firing them may not be feasible, but there are other options. Dethroning them in the form of a promotion or even just physically moving them can be effective. When people don't have to jump through their hoops anymore, they lose their organizational power.
I was briefly head of engineering at a company that had several "old culture bearers" that made change impossible. I was something like the 3rd or 4th engineering leader over the space of a year. Apparently the person after me was actually allowed to fire a few of these people and was able to turn things around.
People generally don't wake up in the morning and go into work motivated to make insane/shit things - context, tech debt and business realities all mount up and even the best of us can end up making choices that in isolation look crazy.
There are of course companies who are really bad and you may well be right, but so many times I have seen in my career a young new hire storm in and think everything is shit without paying heed to the context and historical pressures. The best thing you can do in many cases is spend the first ~six months at a new tech company trying to understand that context, and indeed I think more mature engineers generally do.
I'd say a company that has accomplished (2) has cut the workload in hiring employees by 30-50% in the sense that every employee who has reaction (1) either internally or externally is at risk for being disengaged or leaving soon. Not only that but you are probably wasting your dev's time and could get dramatically more productivity out of them if you aren't WTFing them to death.
There should be no controversy at all that complete instructions for installing everything a dev needs to build the project and work on it should exist, and that it should be possible to complete this task in hours, not the weeks it frequently takes. And no, "docker" is not an answer to this, any more than "The F5 Key is a Build Process".
https://blog.codinghorror.com/the-f5-key-is-not-a-build-proc...
It is not "Docker" that solves the problem, it is the discipline of scripting the image build process into a dockerfile. If you know how to write a dockerfile you can write a bash script that runs in 20 seconds as opposed to having Docker spend 20 minutes downloading images and then crash because of a typo.
You are right that a company might have good reasons for doing things in an unobvious way, but most of the time when nobody at a company claims to understand what the company is doing except for the CEO and people aren't too sure about the CEO, it is the fault of the company lacking alignment, not a natural property of freshers.
One of the more interesting things I've found is that a huge number (easily a majority) of instructors are recently trained flyers, because there is a pipeline to train them and they're cheaper than experienced pilots (especially for multi-engine and more complex airplanes). They also know all the ins and outs of the training and rule books (including recent changes), so they know how to pass all the tests and how to teach that. Sooooo you have a bunch of inexperienced pilots teaching all the new pilots... there's likely a failure there, but it hasn't reared its head. We still have a lot of ex-military folks around who didn't learn that way.
Who do you want flying when things go bad? People who have spent many hours with things about to go bad (military, emergency/fire, sail plane pilots) who have experience dealing with it. Those people can also be fun/terrifying to fly with, because they will take risks.
The A220, A350, and A380 are all less than 30 years old. (The A321 is barely younger than that and the A330 barely older.) Boeing has released the 777 and 787 in the last 30 years. The Cirrus SR20 and SR22 are newer than that, as is the SF50 jet. The Diamond DA40, DA42, and DA62 are newer. The Honda Jet is newer. Cessna has a handful of business jets newer than that. The Embraer Phenom 100 and 300 are newer than that. There are variants of the CRJ newer than that (-700, -900, -1000).
That's a lot of new civil aviation aircraft designs in the last 30 years.
Sounds like there are some similarities with everyone focusing on Leetcode interviews: one generation is filtered that way, then mentors and filters the next, and repeat.
The companies don't know what that's costing them, until there's a problem that can't be ignored.
In the case of software engineering (poorly studied, relative to aviation) the company will generally never learn whether a non-Leetcode&promo-focused team could've avoided the problems in the first place, nor whether non-Leetcode experience could've handled a problem that happened anyway.
Maybe. Or maybe you're better off with freshly trained people who still remember exactly what to do in all the failure scenarios. Certainly I've generally felt safer with drivers who'd just passed their test than with people who've been driving for years, for example.
Risk homeostasis in action!
https://soaringeconomist.com/2019/10/30/experience-can-kill-...
I've played around with some fairly high-powered (at the time) lasers and have extreme respect for them. The number of near accidents with those things was large enough that I learned to triple-check everything, to check for stray reflections at reduced power, and to check the cleanliness of all optics before going all in. That saved me more times than I care to remember and is a nice reminder of how finicky a powerful beam of light can be. It doesn't take a whole lot to get a sizeable fraction of your beam ending up in places where you really don't want it to be. But they're lots of fun, even if they are dangerous :)
Some industries have a really lax attitude and in quite a few cases upper management basically makes it impossible for people to speak up.
Once you change the system (document/rules/alerts/etc), then if it breaks, you change it again and learn the lesson. Both are conscious decisions by the org.
This experience as a tester is in fact a normalization of deviance from the ideal computation model of a developer. From their point of view, everything should work the first time, every time. The tester sees reality as it is. The Emperor won't fund my test systems sufficiently to service all my customers, so we make do as best we can. Bonus points in that we get to exercise the edge cases.
The stats seem to bear that out as well.
Commercial study: https://www.fleetowner.com/perspectives/ideaxchange/article/...
Teen study: https://pubmed.ncbi.nlm.nih.gov/12643948/
In aviation, there’s a “killing zone” from 50-350 flight hours (with 40 being the typical legal minimum hours for licensing and 60+ being more typical).
The problem is the bash script may end up depending on poorly understood aspects of the local setup (global config files, installed packages, etc.). It might work fine now, but then nobody runs it for 12 months, there's some churn in personnel, and suddenly people are trying to work out why it crashes. Dockerfiles can avoid some of that stuff, although not always (e.g. the common problem where, if you don't pin the versions of the packages to be installed, an updated package is released which then breaks the Dockerfile).
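The version-pinning point can be turned into a cheap pre-flight check that fails loudly when the environment has drifted, instead of letting a setup script crash mysteriously a year later. A minimal sketch, assuming a Python toolchain; the `pins` mapping is hypothetical (in practice it would be read from a lock file), and only the standard `importlib.metadata` module is used:

```python
from importlib.metadata import PackageNotFoundError, version


def find_version_drift(pins, get_version=version):
    """Return {package: (pinned, installed)} for every pin that doesn't match.

    `pins` maps a distribution name to the exact version the build was last
    known to work with. `installed` is None when the package is missing.
    `get_version` is injectable so the check can be tested without real packages.
    """
    drift = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            drift[name] = (pinned, installed)
    return drift
```

A setup script would call this first and refuse to continue if the result is non-empty, printing each pinned/installed pair so whoever runs it next year knows exactly what changed.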
I'm pretty sure that Philips right now has some thoughts on this as does Medtronic. Those two should have never happened and personally I'm all for liability of executives in such cases.
Do people just... change files and then email the whole file to the other developers and hope nobody else was working on that file? Do they at least have patches?
First place I worked with other people, we at least had hard drives. I don't think we had networking on the machines or version control. For sure there was only one or two machines in the office that could reach the Internet. Mostly only one person could work on a file at a time.
When we did get more employees, a LAN, and version control a few years on, the mid-1990s Microsoft version control software was such a piece of junk it mostly amounted to a formal digital system specifying who the one person who could work on a given file was...
No concept of a patch. They spent most of the afternoon and evening "performing the manual merge and stabilizing the release", meaning rebuilding and deleting lines until it compiled.
I wish I was kidding.
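For contrast with emailing whole files around: a patch records only the changed lines plus a little context, so edits to different parts of the same file can be reviewed and applied separately. A minimal sketch using Python's standard `difflib` module; the file name and contents are made up:

```python
import difflib


def make_patch(old_text, new_text, path):
    """Return a unified diff that turns old_text into new_text for `path`."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


old = "int x = 1;\nint y = 2;\n"
new = "int x = 1;\nint y = 3;\n"
# Prints a unified diff: the unchanged line as context, then one '-' and one '+' line.
print(make_patch(old, new, "main.c"))
```

Identical inputs produce an empty string, which is a handy "nothing to send" signal.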
Shared drives and folders with concurrent edits. Sometimes they'd separate them into "mine/2023.02.14" and "yours/2022.12.10", but that wasn't much better. Actually, because people don't seem to grok lexicographical order, or how to write dates at all, the dates are normally 10-12-2022 and 2-14-2023; guess whether the first one was from October or December.
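Why the year-first format matters: a plain string sort only matches chronological order when the most significant field comes first and every field is zero-padded, which is what ISO 8601 (YYYY-MM-DD) guarantees. It also removes the day/month ambiguity. A small sketch with hypothetical folder names:

```python
from datetime import datetime

# Hypothetical folder names, month-day-year without zero padding.
mdy = ["9-30-2022", "10-1-2022", "1-15-2023"]

# A plain string sort gets these exactly backwards.
print(sorted(mdy))  # ['1-15-2023', '10-1-2022', '9-30-2022']

# Re-encode as ISO 8601: zero-padded, most significant field first.
iso = [datetime.strptime(s, "%m-%d-%Y").strftime("%Y-%m-%d") for s in mdy]

# Now lexicographic order and chronological order agree.
print(sorted(iso))  # ['2022-09-30', '2022-10-01', '2023-01-15']
```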
People still griping about $TODAYS_HOT_JS_FRAMEWORK are clearly out of touch in 2023. It was funny commentary in 2013. It still rang a little true until around 2016. Now it's just an indicator you are the one not to be taken seriously.
It's React, Vue, or Angular and it's been that way for many years now.
Angular is still used in a surprising number of large companies my employer works for, so that, and React, and Vue are all solid investments IMO.
The other ones I think will only be used if one developer dares to take and sell the risk.
They're a good substitute for many of the use cases of testing.
> if a test works after retrying it then it is probably not something that a type system would be able to catch.
Type systems are pretty good at catching incorrect concurrency logic these days, and getting better all the time.
The history of nuclear brinksmanship is built on almost the exact opposite problem: people who are completely willing to sacrifice themselves for the cause and their government and who believe fully that the cost would be worth it and the decision would be correct.
At the expense of one's own life is one of the easiest sacrifices to make, and people who believe it are dangerous because they tend to volunteer a bunch of others to do so alongside them.
Nuclear command and control isn't about keeping any one person alive, it's very much about keeping the system functional so the deterrent is preserved. There's no way, within it, to actually ensure any level of personal survival - but the various advocates for first strikes at different points in history have never been concerned with that. They want their legacy, they want the problem solved "forever".
How would you do that? In the event of a nuclear war, my understanding is they'll mostly be flying around in special command-and-control planes. I don't think nuclear intercontinental SAMs are a thing. I'm not even sure they would be possible (wouldn't they need active guidance, which would be very hard on reentry?).
The tier of generals just not senior enough to have a seat on the doomsday planes isn't in the emergency line of command to the nuclear weapons. So, regardless of how powerful a small coalition of those generals is, they cannot reliably prevent a nuclear launch. (They'd need a pre-existing conspiracy to quickly and efficiently turn their own air defence batteries against their own doomsday planes... at which point it seems very likely they'd just launch a coup long before a nuclear strike was ordered.)
So, I guess our last hope is that a small conspiracy of generals just under the doomsday plane tier would stage a coup once the nuclear sabre rattling reached a sufficient magnitude, before the nuclear first strike order is given.
Even if that happened, it's just buying a little time. Some set of leaders/generals in the future will push the button (or build automated systems that do it for them).
Disarmament ain't gonna happen, and anything with a small chance of happening will happen, given a long enough period of time.
> If the user/client asks you to make a small but not trivial change, how long would it take to update and deploy the program?
I have had answers ranging from "A couple hours" to "A year" (yes, they were serious). Most were in the 1-3 month range, though, which is pretty bad for a small change. It also makes it apparent why a bunch of changes get batched together whether reasonable or not. If a single small change, single large change, or collection of variable sized changes all take a few months to happen, might as well batch them all up. It becomes the norm for the team. "Of course it takes 3 months to change the order of items in a menu. Why would it ever be faster than that?"
Update: and a request to “change menu item order, fast” is itself a sign of a problem. We once found a Mac Cube in a ski vacation rental home. It ran Mac OS X 10.2 or something. All the menu items were where we expected them to be! You think carefully first, then you implement the menu item order. Top-left Apple menu -> About This Mac. We managed to break their network config in about 5 minutes!
And I don't think I understand your update to your comment or you don't understand the point of that example from mine. It was illustrating the submission topic: normalization of deviance. Sure, you should think about where things should be but if a customer comes in and says, "Swap these two items" and you can't provide a working version with that single change for months then things have gone off the rails somewhere. I put it in quotes to reflect a statement like what I have heard from those teams I worked with. To them a long effort for a trivial change is normal, when it should be considered deviance.
EDIT: effect->affect. Always trips me up.
You can do a lot with good test automation, even in avionics. That cuts down a ton of the time and usually improves quality.
I'll also note, don't take my "deployed" too literally. I used that term because so many people here are working on server-based applications where that makes sense. Think "out the door". The exercise can only go as far as the team/org's reach. Deployed for avionics would mean more like, "At the flight test team". After that, it's up to someone else to schedule it and get it returned with issues or fielded.
Going beyond the team's reach without including those people (and thus making them part of the team, after a fashion) is guess work and opens up the blame-game. "It's all flight test's fault it takes a year to get out to the customer." Well, it takes you 9 months to get it to flight test and them 3 months to get done. So why does it take you 9 months? If you have a good reason (complex system, lots to test) then that's valid. If it's a simpler system, 9 months to get it to flight test is probably not justifiable.
- Install Docker.
- Set up GitHub SSH credentials.
- Pull the main repo.
- Run a script that will pull down related repos, install dependencies, start up a bunch of Docker containers, and then run health checks on the app.
- Setting up interactive debugging takes a bit longer, but not much more.
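The last scripted step, running health checks once the containers are up, usually comes down to polling an endpoint until it answers or a deadline passes. A minimal sketch using only the Python standard library; it assumes the app exposes an HTTP health endpoint, and the URL and timeouts are hypothetical:

```python
import http.client
import time
import urllib.request


def http_ok(url):
    """Hypothetical check: True if `url` answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError, refused connections and timeouts are all OSError.
        return False


def wait_until_healthy(check, timeout_s=60.0, interval_s=2.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Call `check()` until it returns True or `timeout_s` seconds pass.

    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A setup script might call `wait_until_healthy(lambda: http_ok("http://localhost:8080/health"))` and exit non-zero on `False`, so a new hire sees a clear failure instead of a half-started stack.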
Unfortunately, I’ve routinely dealt with our IT department being slow to give credentials to new employees, or shipping them underprovisioned or just incompatible systems. No, you can’t give our new senior developer the same cheap crap laptop running an ancient version of Windows that you send to the junior marketing person doing cold calls all day.
I worked for a company that swallowed the (so-called) Toyota schtick hook, line, and sinker. About 14 years ago I tolerated some Toyota UK fossil coming in and berating me, in front of my entire team, for being a crap project manager, in spite of the fact that I was the most reliable and accurate product manager that (very successful and healthily growing) company had at the time. Seriously, still, fuck that guy with a nail-festooned cricket bat. I fucking shipped everything within the constraints I'd described at the beginning of the project, and it did great in the market. Anyone who doesn't like it is welcome to kiss my ass. But whatever.
Toyota or, more accurately, consultants who like to hawk the Toyota Production System (TPS), talk a good game, but the reality isn't always aligned with the ideals. Jidoka is evidently not a reality at Toyota, and they aren't much more enlightened than other orgs when it comes to pointing out problems, despite their A3 reports and multicoloured boards.
The Reckoning, by David Halberstam, makes it clear that "Toyota-like" practices aren't unique to Toyota amongst Japanese auto manufacturers. It also makes clear that these practices primarily exist to keep workers engaged and morale high (because, for those of you who've never worked on a production line [I have], in case there's any doubt in your minds, yes, it's boring as fuck).
The reason Toyota was much more successful than other Japanese auto makers in the second half of the 20th century is bugger all to do with their production process, and is instead the result of them being more aggressive and decisive in the wake of WWII: they simply opened a bigger factory sooner than their competitors and were therefore able to meet demand better. This gave them a trading advantage that lasted decades. The TPS didn't hinder their advantage, but it's absolutely disingenuous to claim it as the root cause.
Do NOT drink this Kool-Aid about the TPS. I'm not saying there's nothing of value in it (I like genchi genbutsu, for example), but take it all with a pinch of salt. The value depends on who you are, who your team is, and how you best operate as a group. Fork-lifting business practices thoughtlessly from one organisation to another often doesn't work that well, and TPS is no exception. It's no better than Agile cargo-culting but, because TPS is less mainstream, it perhaps hasn't come under the same critical scrutiny.
Plus TPS's penchant for fault finding and negative culture overall just pisses people off and drags them down when they are (or should be) engaged in more creative problem solving. So something didn't work out: get over it, move on, and find another solution. Don't spend ages navel gazing about it. WTF? Seriously, if you think nitpicking everything and everybody makes you a good manager, you're an idiot and you should find another vocation. Fuck the fuck off. You're a tedious oxygen thief who's boring everyone.
Maybe it makes sense when you build the same thing over and over and over again, but we don't do that and we never did, so it was always ridiculous to expect this to work well (and I say this as someone who, in good faith, gave it a go, but the problem is that perhaps all the people pushing it at the time weren't acting in good faith).
To me the 'Toyota way' was more of an illustration than an exact guideline to follow, and I've found this to be true for most of these things that tend to become a religion. Scrum, TDD, etc. all have the potential to become fodder for consultants who essentially sell a dream they cannot deliver on. But that doesn't mean there isn't a kernel of truth in there.
Secondly, more to your point, I'm certainly not trying to defend the whole system or even imply that it's effective at accomplishing its stated goals. I'm merely saying that the concept of encouraging employees working with/creating/designing a product to point out flaws and making a point of digging into where defects are introduced is a good idea. I definitely can't speak to how well that philosophy is applied at Toyota but I think that's moot regardless.
That is, once something is normalized you don't notice it anymore. Like how people who saw that the 'rona epidemic was under control (ish) thought the measures were no longer needed.
It is always important to know what the underlying assumptions of your strategic advantages are. Going 'countercurrent' can work, but then if the tide turns you need to be aware that your previous advantage is now a risk.
I know this is reasonable advice, but it makes me deeply cynical. After 6 months I will have learned to live in the shit (to use your term), so it still seems like I have nothing to gain by speaking up or trying to fix things. A culture that accepts shitty code probably isn't super demanding for an experienced developer who is accustomed to the mess, so I'll just coast through my time and hop jobs after a few years.
If nobody wants to respectfully talk about my criticisms on day one, then they won't really want to at 6 months either. In the end, I'm led to believe I should have zero concern for code quality and only worry about my personal reputation.
Criticising the status quo is not a winning move for me, especially when it's led to the company's engineering team tripling in size. If I'm asked, I'll pick some low-hanging fruit: remove reliance on legacy/redundant JavaScript libraries such as jQuery, and spend time writing better unit tests. But so far I haven't been asked.
We know, but haven't had time to fix it, maybe we'll assign that to you when you're caught up.
We didn't think of it that way, good catch, let's go into detail later.
Yeah, but doing it this way makes this other thing easier, we'll show you that when you're ready.
or even: I don't know, my brain is fried with this project, can you ask again in a few months?
Dan's point is that sometimes the new person's judgement is correct, and there actually is a real problem that's invisible to people who have been with the project a long time. But the new person's judgment is basically always ignored, and that's a mistake - it ought to be weighted heavily because they legitimately have a perspective that insiders no longer have.
If instead you spend six months trying to understand the context:
"new person joins
new person: WTF WTF WTF WTF WTF
old hands: yeah we know we're concerned about it
new person: WTF WTF wTF wtf wtf w...
new person gets used to it
new person #2 joins
new person #2: WTF WTF WTF WTF
new person: yeah we know. we're concerned about it."
I'm sympathetic because my first company was a mid-stage startup with huge structural problems in the engineering org structure and processes. When I joined I had frequent "WTF" moments and had a similar experience where experienced people would explain to me why things are the way they are. So I trusted them, and put my head down, but eventually got frustrated and left. A few months later the company went bankrupt because they couldn't build product fast enough, investors lost patience, and they couldn't raise another round.
Remember, the new person has something that nobody else on the team can ever learn, no matter how much they study or how long they work. The new person has a fresh perspective.
So often, citing Chesterton's fence is significantly more naive than what it attempts to criticize.
It simply asserts that understanding why a thing is the way it is is valuable when making a decision to change it.
That understanding could be as simple as--to take a real world example that most readers here will remember--"They chose to install a hidden web server on the user's system, because they felt it was the best way to deliver user convenience given the resources and time the team had available."
We can still say it was a bone-headed choice to do that because it opened a massive back door to every user's system. And? What is the problem with looking into why they made that choice before arguing that the choice should be reversed with maximum prejudice?
Chesterton's Fence isn't a suggestion that no changes should be proposed, or that if you look into the original motivations you will change your proposal. Think of it as insurance against the possibility that every once in a while, you will discover a requirement that needs to be addressed with your suggested change.
I don't see where you're coming from that quoting Chesterton's Fence is even "criticism." It's a suggestion to take out a little insurance by doing a little homework.
In many cases in my career, I've seen code that doesn't make sense or seems like a bad idea. The person who could explain why it's there has long since left the company. Am I afraid, and do I leave the screwy stuff there while citing Chesterton's Fence? Hell no. I'll change it to do the right thing. This either exposes the reason why it's there or shows that it really was unnecessary/bad. If something breaks from the change, then it's good that I can finally document what wasn't documented before. So either way it's a win.
Ouch!
But that may not be true for everyone. If making changes and seeing what does or doesn't break is a successful strategy for you, go for it.
But for classic sports cars they're fairly normal, those engines were not made to last forever, tend to be fairly high power for their displacement and the book value of the cars is high enough that rebuilding an engine can make sense.
People are always looking for silver bullets and the industry is rife with examples of this kind of thing.
AKA compliance checkbox crap?
If infrastructure is immutable (which makes it work even better for autoscaling), nothing new will get installed unless you build a new image. Export whatever data you require to ensure things you want to be running are running. Monitor entry and exit points.
What is left for the "security daemons" to do?
Although if it was an IRC server then that would have been truly 1337.
Am I double-whooshing here?
How is a Hacker News comment thread not the right place to respectfully ask questions in response to interesting comments? I know I'm not entitled to an answer, nor do I intend to start a flame war. Sheesh.
> It simply asserts that understanding why a thing is the way it is is valuable when making a decision to change it.
The second assertion is implicitly an assertion that decisions made in the past are, if not always good, at least good enough often enough to be worth understanding. In my experience that's not true; most of the time it's just something someone did without really thinking about it.
1. The decision in the past was sensible at the time given what the people making that decision knew/believed/were incentivized to optimize for, versus;
2. It's worth knowing what was on their mind when they made the decision.
I think the two are independent. It could be that there is no good reason for a choice people made, but it's still helpful to look into whether they had a reason, and not just assume there was no good reason without looking into it. I personally think assuming there's no good reason for a decision without looking into it is "picking up nickels off of railroad tracks."
You save a little time if you don't try to find out whether there was a reason, and most of the time your hunch that there was no good reason will be correct. And some of the time, if there was a good reason, it no longer applies, so you are saving time not looking into that reason.
But once in a while, there was a good reason and it reflects some constraint or requirement that is still relevant. It doesn't mean you can't change the thing, but it does mean that you should address the constraint or requirement as part of your proposed change.
If you never look into the reason, once in a while you will miss something. Another comment suggested "move fast and break things," i.e. make the change and if something breaks, fix it then. That's a strategy too, but some things don't work that way. For example, some code might fix a bug that applies to one valuable customer, and if you change the code without knowing about the bug fix, you will find out about it via an irate customer.
In some cases, the cost of an irate customer once in a while is much bigger than all the time saved not looking into things. Or maybe it's a security thing, in which case one vulnerability might be extremely expensive to deal with.
I agree with you that not all decisions made in the past are worth taking into account when making changes, but in my n=1 experience looking into things is cheap insurance against the times when there is a hidden requirement or constraint that has material impact on your business. And when I frame it in my mind as insurance, I don't mind looking into 99 things that turn out to be immaterial: The 1 time it is material makes all 100 investigations worthwhile for me.
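The insurance framing can be written as a rough expected-value comparison. The symbols are illustrative assumptions, not figures from the comment: let c be the cost of looking into one odd-looking decision, p the chance that it hides a requirement that still matters, and L the cost of breaking that requirement unawares (the irate customer, the vulnerability). Checking pays for itself on average when the expected loss avoided exceeds the cost of checking, which with the 1-in-100 rate above reduces to the "99 immaterial, 1 material" trade:

```latex
c < p\,L
\qquad\xrightarrow{\;p \,=\, 1/100\;}\qquad
100\,c < L
```

In words: all 100 investigations together must cost less than the one incident they prevent, which is easy to satisfy when L is large.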
Yes, that is sometimes the case, but not often enough in my estimation to skip the step of trying to understand why that fence is there.
> We can still say it was a bone-headed choice to do that because it opened a massive back door to every user's system. And? What is the problem with looking into why they made that choice before arguing that the choice should be reversed with maximum prejudice?
Because I was suggesting a different method for the specific task at hand.
> I don't see where you're coming from that quoting Chesterton's Fence is even "criticism." It's a suggestion to take out a little insurance by doing a little homework.
Every time I've heard someone quote Chesterton's Fence, it's always been as a means to halt the conversation. Essentially, "shut up": an indirect critique of critique itself in dismissive form. There's possibly some meta point here about you not knowing the full circumstances of the situation to warrant bringing up Chesterton's Fence.
From here forward you can say that almost every time you've heard someone quote Chesterton's Fence, it's been as a means to halt the conversation.
Today, you've encountered a counter-example, and a very firm counter-example, at that. To my mind, Chesterton's Fence is explicitly NOT about shutting down a conversation. It's an invitation to continue the conversation with more information to validate your suggested course of action.
No different than if an engineer suggests, "We should rewrite this code to be faster." What team lead or product manager wouldn't ask, "Is this a bottleneck? Have you profiled it? Do we know there are users impacted by this code's performance?"
Or if someone suggests building a bespoke feature flag service. "Have you done a build vs. buy analysis? What alternatives have you considered before choosing this design? Are there any OSS solutions that are close enough to our requirements?"
These kinds of responses shouldn't be uttered as a way of shutting down a conversation. If that's someone's intent, they are abusing their privilege.
The right way to use any of these patterns is to say them in good faith, and then socialize amongst the team the standard of preparation the team expects of someone proposing a non-trivial change.
Over time, the need to say such things decreases because the team internalizes what preparation/rigor/justification is needed for proposing changes, and does the work ahead of suggesting changes.
Whereas, if the tone and intent is to block change, the team goes down a toxic path where people are discouraged from suggesting improvements of any kind. If that's what you've encountered, you have my sympathy, and I can completely understand why you might be wary of people quoting Chesterton's Fence.
> These kinds of responses shouldn't be uttered as a way of shutting down a conversation. If that's someone's intent, they are abusing their privilege.
To be perfectly honest, your quote came across in that spirit. Anywho, peace!
Additionally, I don’t have a lot of respect for anyone with the ego to assume they know what information was withheld “deliberate”ly or not in a discussion like this. How do you know that?! How do you not see that the OP can make this decision for themselves?!
If being respectful means anything it means reading their post closely and trying to understand what they were trying to convey. You can't talk about denying someone agency if you won't pay attention to what they're telling you.
> How do you know that?! How do you not see that the OP can make this decision for themselves?!
They did make that decision for themselves! It was clear from their post!
> I don't have a dog in this fight politically one way or the other
Neither do I.
> people don't need to identify themselves IRL here
I don't think they do either. Why are you assuming I "needed" this information?
Why won’t either of you respond to my core argument: GGP does not need to respond to my comment if they’re not comfortable.
Me asking the question is not me demanding a response.