Accidentally Load Bearing (jefftk.com)
> It is Hyrums law.
No, it's "Hyrum's" not "Hyrums": https://www.hyrumslaw.com/
Or, to put it another way, figure out what the consequences are before you decide whether you're willing to intend them or not.
https://www.reddit.com/r/ProgrammerHumor/comments/q9x1d2/ask...
I put in a new beam to replace the load bearing wall the previous-previous owners had removed, with posts down to the basement: https://www.jefftk.com/p/bathroom-construction-framing
https://www.youtube.com/watch?v=jfBOIbjbLv0
(or, jump to https://youtu.be/jfBOIbjbLv0?t=1560)
Both generalize to various forms of causal and failure analysis in systems design.
Really good, brief post.
They were agreeing with the top-level comment, not correcting their spelling.
I do a lot of support work for Control Systems. It isn't unheard of to find a chunk of PLC code that treats some sort of physical equipment in a unique way that unintentionally creates problems. I like to parrot a line I heard elsewhere: "Every time Software is used to fix a [Electrical/Mechanical] problem, a Gremlin is born".
But often enough when I find a root cause of a bug, or some sort of programmed limitation, the client wants it removed. I always refuse until I can find out why that code exists. Nobody puts code in there for no reason, so I need to know why we have a timer, or an override in the first place. Often the answer is the problem it was solving no longer exists, and that's excellent, but for all the times where that code was put there to prevent something from happening and the client had a bunch of staff turnover, the original purpose is lost. Without documentation telling me why it was done that way I'm very cautious about immediately undoing someone else's work.
I suppose the other aspect is knowing that I trust my coworkers. They don't (typically) do something for no good reason. If it is in there, it is there for a purpose and I must trust my coworkers to have done their due diligence in the first place. If that trust breaks down then everything becomes more difficult to make decisions on.
Sleep due to rate limiting from another service? COMMENT. Who's requiring it, the limits if I know exactly what they are at the time (and, if not, a note that I do not and this is just an educated guess that seems to work), what the system behavior might look like if it's exceeded (if I know). Using a database for something trivial the filesystem could plausibly do, but in fact cannot for very good reasons (say, your only ergonomic way to access the FS the way that you need to, in that environment, results in resource exhaustion via runaway syscalls under load)? Comment. Workaround for a bug in some widely-used library that Ubuntu inexplicably refuses to fix in their LTS release? Comment. That kind of thing.
I have written so very many "yes, I know this sucks, but here's why..." comments.
I also do it when I write code that I know won't do well at larger scale, but can't be bothered to make it more scalable just then, and it doesn't need to be under current expectations (which, 99% of the time, ends up being fine indefinitely). But that may be more about protecting my ego. :-) "Yes, I know this is reading the whole file into memory, but since this is just a batch-job program with an infrequent and predictable invocation and this file is expected to be smallish... whatever. If you're running out of memory, maybe start debugging here. If you're turning this into something invoked on-demand, maybe rewrite this." At least they know I knew it'd break, LOL.
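For what it's worth, here is a minimal sketch of what that kind of "here's why" comment might look like around a rate-limit sleep. Everything in it is made up for illustration: `fetch_all`, `fetch_one`, and the 5 requests/second figure are assumptions, not any real service's limits.

```python
import time

# ASSUMPTION (illustrative only): the upstream service tolerates roughly
# 5 requests per second. This is an educated guess, not a documented limit.
ASSUMED_MAX_REQUESTS_PER_SECOND = 5


def fetch_all(ids, fetch_one, sleep=time.sleep):
    """Fetch each id in turn, pausing between calls.

    WHY THE SLEEP: the (hypothetical) upstream service rate-limits callers.
    WHO REQUIRES IT: the upstream service, not this program.
    IF EXCEEDED: unknown; assume requests start failing or get throttled.
    If the limit is ever documented or lifted, revisit this delay.
    """
    results = []
    for index, item_id in enumerate(ids):
        if index > 0:
            # Pause between requests, never before the first one.
            sleep(1.0 / ASSUMED_MAX_REQUESTS_PER_SECOND)
        results.append(fetch_one(item_id))
    return results
```

Passing `sleep` in as a parameter is just a convenience so the delay can be observed or skipped in tests; the comment is the part that matters to the next person.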
I remember a former client tracking me down to ask about a bug that they had struggled to fix for months. There was a comment that I'd left 10 years earlier saying that while the logic was confusing, there was a good reason it was like that. Another developer had come along and commented out the line of code, leaving a comment saying that it was confusing!
It's absolutely imperative that the next guy knows what the fuck I'm doing by tampering with safety limits.
COMMENT WRITTEN: 2023-03-21
COMMENT LAST REVIEWED/VERIFIED AS STILL TRUE: 2023-05-04
WHY THIS CODE: This sucks but ...
It took several hours to figure it out, but the sleep was there in case a file had not finished downloading
Isn't that just the regular Chesterton's Fence argument though?
The one the article is specifically written to point out is not enough by itself, because you need to know what else has been built with the assumption that that code is there?
You're not wrong, but in the context of a PLC controlling a motor or gate it is far more segregated than the code you're probably thinking of. Having a timer override on a single gate's position limit sensor would have no effect on a separate sensor/gate/motor.
If the gate's function block had specific code built into it that affected all gates then what you're talking about would be more applicable.
> I don't know why this rung is needed but delete it and see what happens for yourself
Did not fuck around; did not find out.
Generally they're controlling industrial equipment of some sort, and making changes without a thorough understanding of what's happening now and how your change will affect the equipment and process is frowned upon.
At least some of this is cultural. EEs and MEs have historically viewed software less seriously than electrical and mechanical systems. As a result, engineering cultures dominated by EEs/MEs tend to produce shit code. Inexcusably incompetent software engineering remains common among ostensibly Professional Engineers.
I've basically found my niche in the industry as a Software Engineer though I can't say I see myself staying in the industry much longer. The number of times I've gotten my hands on code published by my EE coworkers only to rewrite it to work 10x faster at half the size with fewer bugs? Yikes. HMI/PLC work is almost like working in C++ at times, there are so many potential pitfalls for people that don't really understand the entire system, but the mentality by EE/ME types in the industry is to treat the software as a second class citizen.
Even the clients treat their IT/OT systems that way. A production mill has super strict service intervals with very defined procedures to make sure there is NO downtime to production. But get the very same management team to build a redundant SCADA server? Or even have them schedule regular reboots? God no.
Software, being less familiar, is not viewed as a fundamental architectural component because there often isn't sufficient understanding of the structure or nuance involved in building it. In my experience software or firmware engineers tend to be isolated from the people who designed the physical systems, and a lot of meaning is lost between the two teams because the software side does not understand the limitations and principles of the hardware and the hardware team does not understand the capabilities and limitations of the software.
I hate PLC work for other reasons. I'm starting to look at going back to a more traditional software role. I'm a bit tired of the road work and find the compensation for the amount asked of you to be drastically underwhelming. This meme is very much relevant:
This so much. Depending on the git blame, I'll either remove it blindly or actually think about it way more.
Early in my career, I was confused by seemingly-crazy questions in the Hacker Test (https://www-users.york.ac.uk/~ss44/joke/hacker.htm) like...
> 0133 Ever fix a hardware problem in software?
> 0134 ... Vice versa?
But after spending years developing embedded systems, I don't even blink at such questions. Yes, of course I have committed such necessary evils!
I would like to think that if I sent out an email about git hygiene that you would support me against the people who don’t understand why I get grumpy at them for commits that are fifty times as long as the commit message, and mix four concerns two of which aren’t mentioned at all.
Git history is useless until you need it, and then it’s priceless.
I can’t always tell what I meant by a block of code I wrote two years ago, let alone what you meant by one you wrote five years ago.
One of my proudest commits had a 1:30 commit:message length ratio. The change may have only been ~3 lines, but boy was there a lot of knowledge represented there!
What the article is describing reminds me of the XKCD comic workflow: https://xkcd.com/1172/
A system exists external to the creators' original purpose, and can take on purposes that were never intended but naturally evolve. It isn’t enough to say “well that is not in the spec”, because that doesn’t change reality.
After a year long foray into the world of PLC, I felt like I was programming in the dark ages.
I'm assuming it's a bit better at very big plants/operations, but still.
I'm definitely going to use this, and I think there's a more general statement: "Every time software is used to fix a problem in a lower layer (which may also be software), a gremlin is born."
I think I get the gist, but that sentence is missing some words.
Gizmo caca.
(...I just watched both Gremlins movies last weekend...)
Just did one of those this morning. Hmmm.
The zinc gutter had leaked for probably decades and it destroyed part of the roof structure. The roof was held up by the wooden paneling they used to cover it on the inside (70s). So the wooden paneling was actually load bearing.
Actually I've found way more stuff in this house. For example at the end of the roof the tiles weren't wide enough and instead of buying extra roofing tiles, they decided to fill it with cement and pieces of ceramic flower pots.
Software differs from all other means of production in that we can in fact test any change we make before realizing it in the world.
With good tests, I don't care what the intent was, or whether this feature has grown new uses or new users. I "fix" it and run the tests, and they indicate whether the fix is good.
With good tests, there's no need for software archeology, the grizzled old veteran who knows every crack, the new wunderkind who can model complex systems in her brain, the comprehensive requirements documentation, or the tentative deploy systems that force user sub-populations to act as lab rats.
Indeed, with good tests, I could randomly change the system and stop when I get improvements (exactly how Google reports AI "developed" improvements to sorting).
And yet, test developers are paid half or less, test departments are relatively small, QA is put on a fixed and limited schedule, and no tech hero ever rose up through QA. Because it's derivative and reactive?
This is to say, I agree with the article, but much nicer is to work at a place where you don’t expect to make this particular discovery very often, hah
The analogy for the load bearing stud might be a hackathon project that never expected to see production. In reality, a lot of what we do is hack on something until it barely works, and move on to the next thing.
Today's SE has wandered far, far afield from original goals, tragically enough, but that was the original conception. One of the reasons for today's relatively toothless SE departments is the rise of finance into maintenance planning. Inventory depreciation is a cruel mistress, and "what gets spared" is rarely a SE judgement these days, at least in my experience. This has predictable results, but is partially offset by the exceptionally high bar for aerospace maintenance staff, who are generally pretty damn badass compared to, say, a washing machine repairman. Finance, naturally, would like to knock that bar down a few pegs, too.
> Except that over time it had become accidentally load bearing
> through other (ill conceived) changes to the structure this stud was now helping hold up the second floor of the house
Evidently, you couldn't easily tell why it was there. Moreover, I'm not persuaded that it accidentally became load bearing. It seems quite plausible that it deliberately became load bearing, for reasons which are ill conceived to you but not to the people who had them.
No, they could tell why it was there. It's just that knowing why it was there in the first place doesn't tell you what it's doing now.
The lab was full of smart people who were used to looking at things and making their own well-reasoned conclusions about whether it was OK to change something. This was a warning not to be too hasty about doing that!
IME, having robust alerting and monitoring tools, good rollback plans and procedures/automation should eliminate this fear entirely. If I was afraid to touch anything for fear of breaking it, I'd likely never get anything done.
Sure, but that all sounds like stuff that happens after you deploy/release… you really need to catch things sooner than that. Don’t make the user into the one who has to find the breakage, please. No matter how fast you roll back. Test your software thoroughly!
When I come in on a consulting basis I often have to help developers unwind the unintended effects of years of patches before we can safely make seemingly-simple changes. It's analysis-intensive, and like an archaeologist any artifacts I can dig up that you've left behind providing clues to what was in your head at the time can be helpful. In some cases where a set of functions is super critical we've made it part of the culture every time altered code is checked in to perform a walkthrough analysis to uncover any fresh unintended side-effects, ensure adherence to design requirements and discover and spotlight non-obvious relationships. The key is to turn the unintended or non-obvious into explicit. Sounds painful but in practice due to the high risk attached to getting that portion of code wrong the developers actually love that time for this is worked into the project schedules - it helps them build confidence the alteration is correct and safe.
I wish it were easier to impress the importance of this on inexperienced developers without them having to go through the hell of maintaining code in an environment where it was lacking.
It's a skill and art to keep the appropriate level of conciseness to avoid the documentation (no matter which form it takes) becoming too verbose or unwieldy to be helpful.
this is also why tests are so important. if you want to remove something, you have to think twice... once for the original code and once to fix the broken tests.
This is why I'm against squashing commits in git, which our other development teams don't understand: you're going through extra effort to make it more difficult in the future.
We're probably at a stage where doing this in an automated fashion might be reasonable. The context for why something works a particular way (say in some github repo) is the commit message and change detail, the PR, related issues, related chats in <chat tool of choice>, and then things temporally related to those. Feeding an LLM with the context and generating ADRs for an entire project automatically could be a feasible approach.
> I tried to cut it and it started to bind the saw. I stopped and thought, and then did not cut anymore until I had shored up the upstairs.
https://www.jefftk.com/p/accidentally-load-bearing#fb-231219...
In your circumstance, the board should have had some wiggle but prolly wouldn't have.
This is exactly how I found out that a stud was now load bearing.
Dependencies flow in and onto new components like water. This is a fundamental law.
Even secret interfaces, etc that you think no one could find or touch will be found and fondled and depended upon.
This is why the correct view is that all software is a liability and we should write as little of it as possible, just the minimal that is needed to get the dollars and nothing more.
A good example of a famous failure of this:
https://en.m.wikipedia.org/wiki/Hyatt_Regency_walkway_collap...
That's why it is a good practice to have DiRT or similar exercises. Break things on purpose so that people don't rely on them when they shouldn't.
One of the biggest challenges I find in refactoring/rearchitecting is getting people to separate why something was done from whether it still needs to exist.
Too many times someone has insisted that a weird piece of code needs to exist because we never would have shipped without it. They treat every piece of the structure as intrinsic, and can't conceive of any part of it being scaffolding, which can now be removed to replace with something more elegant and just as functional.
When you put a doorway through a loadbearing wall, contractors and smart people place a temporary jack to brace the wall until a lintel can be substituted for the missing vertical members. Some developers behave as if the jack is now a permanent fixture, and I have many theories why but don't know which ones are the real reasons.
Think of how the SCRAM at Pripyat/Chernobyl caused the reactor to go supercritical because of the graphite tips. The control rods should have reduced reactivity in the reactor, but the first section of the rod increased reactivity.
Or how a hard hat dropped from a height may cause injury.
On the other hand, if you can't figure out what something is for, sometimes the easiest way to find out is to remove it and see what breaks. Maybe that's not such a great idea for that load-bearing stud, but in software it's easy to undo.
Guess I'm thinking of being very aware of how the structure reacts to failure, and not necessarily the strength of all individual parts.
If you ever see a house stripped down to the sticks for a rebuild, you will hopefully notice a few braces added. Not to keep the walls from falling down, but to keep them square and true until the walls are rebuilt.
(I've seen dozens of houses from ~1900-1920 in the process of full gut rebuilds, and none of them had diagonally installed sheathing.)
At first look it seems like someone backed into the garage door and mangled the hell out of it, but on more careful inspection the roof is being barely held up by the tracks that the door runs in and is pretty near to giving up the ghost. Was just going to splice the ends of the rafters (like someone did on the other side who knows how many years ago...if it works, it works) and replace the garage door but now it's looking like I'll need a whole new roof.
What really worries me is the dodgy wiring strung all across the basement which is a combination of newish wires, old cloth covered wires and liberal applications of electrical tape to splice it all together. Luckily none of the wires seem to be load bearing...
My "fix" held for about 11 years, but apparently it very slowly weakened, creating a small divot on the roof. Which got bigger and bigger with each rain, but since I never go on the garage roof, I didn't notice.
Until during one heavy rain I got a surprise skylight!
So yeah, you probably want to fix that before you get a total collapse like I did.
Sounds like your guy had a similar experience.
Most problems with houses come back to managing water, air, or some other infiltration. But mostly it's water.
Except when the tests verify what the code was designed to do, but other systems have grown dependencies on what the code actually does.
Or when you're removing unused code and its associated tests, but it turns out the code is still used.
Or when your change fails tests, but only because the tests are brittle so you fix the test to match the new situation. Except it turns out something had grown a dependency on the old behavior.
Tests are great, and for sufficiently self contained systems they can be all you need. In larger systems, though, sometimes you also need telemetry and/or staged rollouts.
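As a rough sketch of the telemetry idea for the "unused code that turns out to be used" case: count calls to the suspected-dead function for a while before deleting it. The names here (`suspected_dead`, `call_counts`, `legacy_discount`) are hypothetical, and the in-process `Counter` is only a stand-in for whatever metrics system you actually have.

```python
import functools
from collections import Counter

# Hypothetical stand-in for real telemetry: an in-process call counter.
call_counts = Counter()


def suspected_dead(func):
    """Record every call to a function we believe is unused."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper


@suspected_dead
def legacy_discount(price):
    # Illustrative "dead" code; its behavior is unchanged by the decorator.
    return price - 1
```

If the count stays at zero across a realistic period (month-end jobs included), removal is a safer bet; if not, you have found the hidden dependency before your users did.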
Assuming you mean systems in terms of actually separate systems communicating via some type of messaging, isn't that where strong enforcement of a contract comes into play so that downstream doesn't have to care about what the code actually does as long as it produces a correct message?
> Or when your change fails tests, but only because the tests are brittle so you fix the test to match the new situation. Except it turns out something had grown a dependency on the old behavior.
I think this supports GP's point about tests being second-class and not receiving the same level of upkeep or care that the application code receives, and one can argue that you should try to prevent being in a position where tests become so brittle you end up with transient dependencies on behavior that was not clear.
Your code was supposed to calculate the VAT for a list of purchases, but progressively becomes a way to also force update the VAT cache for each product category and will be called in originally unexpected contexts.
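A toy version of that VAT situation, as a sketch only. All names (`VAT_RATE_PERCENT`, `vat_cache`, `total_vat`, `cached_rate`) are invented for illustration, and amounts are integer cents to keep the arithmetic exact.

```python
# Hypothetical rates in whole percent, keyed by product category.
VAT_RATE_PERCENT = {"food": 10, "books": 5}

# Hypothetical cache that other parts of the system read from.
vat_cache = {}


def total_vat(purchases):
    """Original intent: total VAT in cents for (category, net_cents) pairs."""
    total = 0
    for category, net_cents in purchases:
        rate = VAT_RATE_PERCENT[category]
        vat_cache[category] = rate  # side effect callers came to rely on
        total += net_cents * rate // 100
    return total


def total_vat_refactored(purchases):
    """'Cleaner' rewrite: same return value, but no cache refresh."""
    return sum(
        net_cents * VAT_RATE_PERCENT[category] // 100
        for category, net_cents in purchases
    )


def cached_rate(category):
    """Downstream caller that silently depends on the side effect."""
    return vat_cache.get(category)
```

A test that only checks the returned total passes for both versions, so the refactor looks safe, while `cached_rate` quietly starts returning `None` for anything that relied on `total_vat` to warm the cache.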
BTW this is the same for comments: they'll cover the original intent and side effects, but not what the method/class is used for or actually does way down the line. In a perfect world these comments are updated as the world around them changes, but in practice that's seldom done except if the internal code changes as well.
Having worked on very long-running projects, testing or well-funded QA doesn’t tend to save you from organizational issues.
What typically seems to happen is tests rot, as tests often seem to have a shelf life. Eventually some subset of the tests start to die - usually because of some combination of dependency issues, API expectations changes, security updates, account and credential expirations, machine endpoint and state updates, and so on - and because the results from the test no longer indicate correctness of the program, and the marginal business value of fixing one individual broken test is typically very low, they often either get shut off entirely, or are “forced to pass” even if they ought to be an error.
Repeat for a decade or two and there quickly start being “the tests we actually trust”, and “the tests we’re too busy to actually fix or clean up.”
Which tests are the good ones and bad ones quickly become tribal knowledge that gets lost with job and role changes, and at some point, the mass of “tests that are lying that they’re working” and “tests we no longer care to figure out if they’re telling the truth that they’re failing” itself becomes accidentally load-bearing.
Huh? Your first part seemed to be repeating TDD optimism but then you switch to test departments. To make your claims consistent, I'd suggest you instead talk about tests being written by the programmers, kept with the code and automatically run with the build process.
However, I don't think even TDD done right can replace good design and good practices. Notably, even very simple specifications can't be replaced by tests; if f(S) is just specified to spit out a string concatenated with itself, there's no obvious test treating f as a black box that verifies that f is correct. Formal specifications matter, etc. You can spot check this, but if the situation is one where a single magic wrong value screws you in some way, then your test won't show this.
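To make the f(S) point concrete, here is a small sketch with a deliberately planted bug. The function names and the "magic" input are hypothetical, chosen only to show how black-box spot checks can pass for an incorrect implementation.

```python
def double(s):
    """Spec: return s concatenated with itself."""
    return s + s


def double_buggy(s):
    """Matches the spec everywhere except one hypothetical magic input."""
    if s == "magic":
        return s  # wrong: should be s + s
    return s + s


# A plausible-looking set of black-box spot checks.
SPOT_CHECKS = ["", "a", "ab", "hello"]


def passes_spot_checks(f):
    """True if f agrees with the spec on every spot-check input."""
    return all(f(s) == s + s for s in SPOT_CHECKS)
```

Both `double` and `double_buggy` pass `passes_spot_checks`; only reading the code, or stating and checking the specification for all inputs, exposes the difference.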
> there's no need for software archeology, the grizzled old veteran who knows every crack, the new wunderkind who can model complex systems in her brain, the comprehensive requirements documentation, or the tentative deploy systems that force user sub-populations to act as lab rats.
Wow, sure, all that stuff can be parodied but it's all a response to software being hard. And software is hard, sorry.
Software needs people with the suspicious minds of good testers, but security people make more money for the same skillset.
"You" might know it in "your" creations, but in my career I am much more often working and reworking in other people's creations.
I think the point of the article is not that you should avoid using decorative studs as load-bearing elements, but that you should be aware that others may have done so before you came along.
This is an even more conservative position than the default Chesterton's Fence reading, which is itself dismissed by a lot of people as pedantically restrictive.
For me, the parent article resonates. I have definitely had ceilings come crashing down on my head when I removed a piece of "ornamental" trim (programmatically speaking)
In a normal, real-life context, I can see why someone would feel that way.
In a software engineering context I think it's just a further emphasis that you ought to understand what something is doing before fiddling with it, and both the original intent and what it is currently doing are interesting information. Many times I've removed dead code, only to learn that not only was it alive (which wouldn't have been that surprising, it's easy to miss some little thing), but that it is very alive and actually a critical component of the system through some aspect I didn't even know existed, which means I badly screwed up my analysis.
The differences between the physical world and the software world greatly amplify the value of the Chesterton's Fence metric. In the physical world we can all use our eyes and clearly see what is going on, and while maybe there's still a hidden reason for the fence that doesn't leap to mind, it's still a 3-dimensional physical problem in the end. Software is so much more interconnected and has so much less regard for physical dimensions that it is much easier to miss relationships at a casual glance. Fortunately if we try, it's actually easier to create a complete understanding of what a given piece of code does, but it is something we have to try at... the "default view" we tend to end up with of software leaves a lot more places for things to hide on us. We must systematically examine the situation with intention. If you don't have time, desire, or ability to do that, the Chesterton's Fence heuristic is more important.
We're also more prone to drown under Chesterton's fences if we're not careful, though. I've been in code bases where everyone is terrified to touch everything because it seems like everything depends on everything and the slightest change breaks everything. We have to be careful not to overuse the heuristic too. Software engineering is hard.
Is it always lazy in the bad way, though? In software there's no sharp distinction between "built to carry weight" and "built to tack drywall onto." Whether a system is robust or dangerously unscalable depends on the context. You can always do the thought experiment, "What if our sales team doubled in size and then sold and onboarded customers as fast as they could until we had 100% of the market," and maybe using your database as a message queue is fine for that.
If it results in a sad dev team, then that's a case where it was a mistake. It's hard to maintain, or it's an operational nightmare. That isn't the inevitable result of using a (software) decorative stud as a (software) load-bearing element, though. There are a lot of systems happily and invisibly doing jobs they weren't designed for, saving months of work building a "proper" solution.
More often than not I've seen this happen because they, in fact, do not know.
You could conclude that the code is unnecessary and remove it, or you could conclude that some test cases need to be added to exercise it. How do you decide which is correct?
The problem is usually that well-thought-out and well-designed software was built for a moving target, and invariably things have changed over time. It's not necessarily a sign of lazy design, it's where the real world intersects with the nice neat pretend world we design for :)
there is no excuse for not owning and knowing the software you are supposed to be in control of.
Do you? Sometimes quick one-time fixes become the center of important software.
Whenever I find myself doing this, I at least leave a comment typically worded along the lines of "the dev is too lazy, not enough time to do it right, or just has no clue what to do, so here you go..."
Very wild stuff.
|---]/[---[ONS]---[MOV]---|
i mean you are talking about upgrading things that are going to be in service for decades perhaps. the requirements for the programs is generally not complicated. turn on a pump for a time, weigh something and then alarm if some sensor doesn't see something.
Structured text was an improvement over ladder logic as you could fit more of the particular program in the screen real estate you had and could edit it easier since it was just text. though, that had its own set of issues that needed to be worked through and it wasn't a panacea.
Can't expect your platform to be reliable, if it just breaks at random.
But yes, it's basically what the job comes down to - having a strategy for managing complexity in all its forms, and this is a fine example of the sort of problem that you don't learn in college.
I've (thankfully) never deprecated code and caused serious production issues, but I've seen it happen. The best places to work expect this sort of issue, and have processes in place to roll back and deal with it, like any other business continuity issue (e.g. power/network loss).
The moment you find yourself scared of changing code because you don't understand the consequences then you've basically lost the battle.
My old house (~1920) had diagonal shiplap under the floors instead of plywood (but parallel in the oldest walls). That's probably more for making hardwood floors easier to install than structural integrity.
Edit: The internet says 'start of the 20th century' phased out in 1950's (plywood), and 'sometimes diagonally'.
I don't recall what the software was like, but none of that is why it failed, it was simply a moronic business idea. An overpriced subscription for low quality fruit in a DRM-laden pouch. Nobody wanted it then or now.
It's an instance of the larger pattern in which technical degree programs lag industry requirements by decades, as older faculty ossify at the state of the art circa 2-3 years prior to when they received tenure.
IMO one way to help would be to get rid of the entire notion of a "Professor".
Instead, courses should be taught primarily by a combination of professional instructors on permanent contracts and teacher-practitioners supported by the instructors. The professional instructors should have occasional sabbaticals to embed in firms and ensure they're up to date on the industry.
The research side of the university can even more easily replace Professors and tenure with first-line lab managers on 3-5 year contracts whose job is simply to apply for grants and run labs, and who can teach if they want but are held to the same standards as any other applicant for an adjunct teaching position in any particular term.
I think one barrier to better engineering programs in universities is that there typically is an onerous set of "accreditation requirements" which prevents significant modification of the curriculum to adapt to modern needs.
The other barrier is that students culturally appear to not always want to do more coding than needed. Courses involving coding were widely regarded as the most difficult by the people around me, despite something like up to 80% of an EE class going into SW engineering after graduating.
I think in general, degree programs are designed to be something that they're often not used for anymore. The usual line is that they're designed to provide a track to academia, and aren't vocational training. But nowadays degrees seem very ritualistic and ornamental - it seems that people are doing their learning on the job mostly whatever they do, and the relevance of the degree itself is just a shibboleth of some sort.
This seems to be rapidly dissolving, at least in California. Several schools including Stanford, Cal Tech, and several of the UCs have dropped ABET accreditation for most of their programs in recent years, with more likely to follow as they come up for renewal.
No sane person would expect a programmer to just design a lithium battery charge circuit that goes in your user's pocket, that'd be reckless and dangerous. I likewise would never expect a programmer to break out an oscilloscope to debug a clock glitch, or characterize a power supply, or debug protocol communication one bit at a time. I wouldn't ask a programmer to do FCC pre-validation, or to even know what that means.
Why then do you want to rely on an EE to produce critical software?
As an EE, I know my limits and how to separate concerns. I keep my bad C++ restricted to firmware and I simply do not write code further up the stack. We have programmers for that. Where the app needs to talk to the device, I'll write an example implementation, document it, and pass it off to the programmer in charge. It's their job to write good code, my job is to keep the product from catching fire.
If you want good code, hire a programmer. If you want pretty firmware, hire a programmer to work with your EEs. If you expect an EE to write code, you get EE code because their specialization is in electronics.
Unless you really want an EE who is also a software engineer, but then you're paying someone very highly skilled to do two jobs at once.
Electronics and software are two different jobs requiring two differently specialized people. It just looks like they overlap.
Lots of EEs need to do some programming, and lots of people getting EE degrees end up in programming careers, so it would be a disservice not to teach them any programming at all. In particular, an engineer should be taught enough programming to put together a MATLAB or NumPy program, right?
Meanwhile, some of their classmates will go on to program microcontrollers as their job.
Writing programs as a product, and writing programs to help design a product, are two basically different types of jobs that happen to use similar tools, so it isn’t that surprising that people mix them up.
I think part of the thing is Software Engineers haven't been a thing for as long in the industry. I'm the only Software Engineer I've met doing controls. My supervisor has a CS degree and an Electrical Technician diploma, but never another SE.
Second is I think up until recently, the work done by Control Systems has been within the capabilities of an EE or ME, so having a SE hasn't been necessary. I've been with my company for 10 years now, and in that time I've watched the evolution of what my clients are seeking in terms of requirements to their systems.
I primarily work in Agriculture or Food Production. 10 years ago my projects were assembling plants and getting their motors to start, with the required protections then some rudimentary automation to align paths and ensure flow.
Today? I'm building traceability systems to report on exactly which truck load was the origin of contamination for a railcar shipped months later. Or integrating production data to ERP systems. Adding MES capabilities to track downtimes and optimize production usage. Generating tools to do root cause analysis on events... It's a different world and the skills of a Software Engineer haven't really been a super important role for quite a while.
I think the mindset is shifting, but it is slow.
Because of this, my personal belief is that the justification for any line of code belongs in the SCM commit message that introduced the code. `git blame` should ultimately take you to a clear, stand-alone explanation of “why” — one that, as a bonus, can be much longer than would be tolerated inline in the code itself.
Of course, I’m probably pretty unusual, in that I frequently write paragraphs of commit message for single-line changes if they’re sufficiently non-obvious; and I also believe that the best way to get to know a codebase isn’t to look at it in its current gestalt form, but rather to read its commit history forward from the beginning. (And that if other programmers also did that, maybe we’d all write better commit messages, for our collective future selves.)
...but no, to be serious for a moment: this isn't really a workable idea as-is, but it could be. It needs discoverability — mostly in not just throwing noise at you, so that you'll actually pay attention to the signal. If there was some way for your text editor or IDE to not show you all the `git blame` subject lines, but just the ones that were somehow "marked as non-obvious" at commit time, then we could really have something here.
Personally, I think commit messages aren't structured enough. Github had the great idea of enabling PR code-reviewers to select line ranges and annotate them to point out problems. But there's no equivalent mechanism (in Github, or in git, or in any IDE I know of) for annotating the code "at commit time" to explain what you did out-of-band of the code itself, in a way that ends up in the commit message.
In my imagination, text-editors and IDEs would work together with SCMs to establish a standard for "embeddable commit-message code annotations." Rather than the commit being one long block of text, `git commit -p` (or equivalent IDE porcelain) would take you through your staged hunks like `git reset -p` does; but for each hunk, it would ask you to populate a few form fields. You'd give the hunk a (log-scaled) rating of non-obviousness, an arbitrary set of /[A-Z]+/ tags (think "TODO" or "XXX"), an eye-catching one-liner start to the explanation, and then as much additional free-text explanation as you like. All the per-hunk annotations would then get baked into a markdown-like microformat that embeds into the commit message, that text-editors/IDEs could recognize and pull back out of the commit message for display.
And then, your text-editor or IDE would:
1. embed each hunk's annotation-block above the code it references (as long as the code still exists to be referenced — think of it as vertical, hunk-wise "show `git blame` beside code");
2. calculate a visibility score for each annotation-block based on a project-config-file-based, but user-overridable arbitrary formula involving the non-obviousness value, the tags, the file path, and the lexical identifier path from the syntax highlighter (the same lexical-identifier-path modern `git diff` gives as a context line for each diff-hunk);
3a. if the visibility score is > 1, then show the full annotation-block for the hunk by default;
3b. else, if the visibility score is > 0, then show the annotation-block folded to just its first line;
3c. else, hide the annotation-block (but you can still reveal the matching annotation with some hotkey when the annotated lines are focused.)
Of course, because this is just sourced from (let's-pretend-it's-immutable) git history, these annotation-block lines would be "virtual" — i.e. they'd be read-only, and wouldn't have line-numbers in the editor. If the text-editor wants to be fancy, they could even be rendered in a little sticky-note callout box, and could default to rendering in a proportional-width font with soft wrapping. Think of some hybrid of regular doc-comments, and the editing annotations in Office/Google Docs.
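To make steps 2 and 3a-3c concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Annotation` shape, the `default_score` formula, and the `vendor/` path rule are made up for illustration, since the whole point is that each project would configure its own formula.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


@dataclass
class Annotation:
    """One per-hunk annotation pulled back out of a commit message (hypothetical shape)."""
    non_obviousness: int      # log-scaled rating chosen at commit time
    tags: FrozenSet[str]      # e.g. {"TODO", "XXX"}
    headline: str             # the eye-catching one-liner
    body: str = ""            # free-text explanation
    file_path: str = ""
    identifier_path: str = "" # e.g. "Parser.parse_header"


def default_score(a: Annotation) -> float:
    """Illustrative formula only; a real project would supply its own."""
    score = a.non_obviousness - 1.0
    if "XXX" in a.tags:
        score += 1.0
    if a.file_path.startswith("vendor/"):
        score -= 2.0
    return score


def render(a: Annotation,
           score_fn: Callable[[Annotation], float] = default_score) -> Optional[str]:
    """Apply rules 3a-3c: full block, folded to the first line, or hidden."""
    score = score_fn(a)
    if score > 1:
        return f"{a.headline}\n{a.body}".rstrip()  # 3a: show everything
    if score > 0:
        return a.headline                          # 3b: folded to the one-liner
    return None                                    # 3c: hidden until revealed by hotkey
```

A user override would just be a different `score_fn` passed to `render`, which is why the formula is kept separate from the display rule.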
---
...though, that's still not going as far as I'd really like. My real wish (that I don't expect to ever really happen) is for us to all be writing code as a series of literate codebase migrations — where your editor shows you the migration you're editing on the left, and the gestalt codebase that's generated as a result of running all migrations up to that one on the right, with the changes from the migration highlighted. You never directly edit the gestalt codebase; you only edit the migrations. And the migrations are what get committed to source-control — meaning that any code comments are there to be the literate documentation for the changes themselves; while the commit messages exist only to document the meta-level "editing process" that justifies the inclusion of the change.
Why? Because the goal is to structure the codebase for reading. Such a codebase would have one definitive way to learn it: just read the migrations like a book, front to back; watching the corresponding generated code evolve with each migration. If you're only interested in one component, then filter for just the migrations that make changes to it (`git log -S` style) and then read those. And if you, during development, realize you've just made a simple typo, or that you wrote some confusing code and later came up with a simpler way to express the same semantics — i.e. if you come up with code that "you wish you could have written from the start" — then you should go back and modify the earlier migration so that it's introduced there, so that new dev reading the code never has to see you introduce the bad version and then correct it, but instead just sees the good version from the start.
In other words: don't think of it as "meticulously grooming a commit history"; instead, think of it as your actual job not being "developer", but rather, as you (and all your coworkers) being the writers and editors of a programming textbook about the process of building program X... which happens to compile to program X.
WHY: Using _______ sort because at time of writing code, the anticipated sets are ... and given this programming language and environment ... this tends to be more performant (or this code was easiest and quickest to write and understand).
This way when someone later comes along and says WTF?! They know why, or at least some of the original developer's reasoning for choosing that code implementation.
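As a made-up illustration of that template filled in (the function name, the batch sizes, and the reasoning in the comment are all assumptions invented for the example, not taken from any real codebase):

```python
def sort_readings(readings):
    # WHY: Using insertion sort because, at time of writing, the anticipated
    # batches are short (assumed: a few dozen items) and arrive almost in
    # order, where insertion sort does very little work. It was also the
    # easiest version to write and to read. If batches grow large or arrive
    # shuffled, replacing this with the built-in sorted() is the obvious move.
    result = list(readings)  # copy so the caller's list is left untouched
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger items one slot right until current's position is found.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result
```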
I don't think that's true anymore, at least in a number of areas.
In another life, I've worked on concurrent data structures in Java and/or lock-free Java code I'd at this point call too arcane to ever write. The code ended up looking deceptively simple and it was the correct, minimal set of code to write. I don't see any way to express the correctness reasoning for these parts of code in code.
And now I'm dealing with configuration management. Code managing applications and systems. Some of these applications just resist any obvious or intuitive approach, and some others exhibit destructive or catastrophic behavior if approached with an intuitive approach. Again, how to do this in code? The working code is the smallest, most robust set of code I can set up to work around the madness. But why this is the least horrible way to approach this, I cannot express that in code. I can express this in comments.
> you ought to understand what something is doing before fiddling with it
I think understanding before fiddling is one option. But I think a better option is often fiddling and seeing what happens. The trick is to make it so that fiddling is safe. E.g., on a project where I have good test coverage and find mystery code, I can just delete it and see what fails. Or I set up a big smoke test that runs a bunch of input and compares outputs to see what changes.
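A rough sketch of the "run a bunch of input and compare outputs" kind of smoke test, in Python. The two `price_*` functions are invented stand-ins for code with and without the mystery line; only `diff_behaviour` is the actual technique.

```python
def _outcome(fn, value):
    """Capture either the return value or the exception type, so crashes compare too."""
    try:
        return ("ok", fn(value))
    except Exception as exc:
        return ("error", type(exc).__name__)


def diff_behaviour(old, new, inputs):
    """Return (input, old_outcome, new_outcome) for every input where the two differ."""
    differences = []
    for value in inputs:
        old_outcome = _outcome(old, value)
        new_outcome = _outcome(new, value)
        if old_outcome != new_outcome:
            differences.append((value, old_outcome, new_outcome))
    return differences


# Hypothetical example: is the clamp doing anything for the inputs we care about?
def price_with_mystery_clamp(quantity):
    if quantity > 100:  # the mystery line under suspicion
        quantity = 100
    return quantity * 2


def price_without_clamp(quantity):
    return quantity * 2


report = diff_behaviour(price_with_mystery_clamp, price_without_clamp, [1, 50, 100, 150])
# report == [(150, ("ok", 200), ("ok", 300))]
```

An empty report only says the mystery code did nothing for the inputs you tried, so how much to trust it depends on how representative those inputs are.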
A lot of bad software is effectively incoherent, so it can't be understood as machinery. Instead it has to be understood in psychological, historical, or archaeological terms. "Well back then they were trying to achieve X, and the programmer doing it was not very experienced and was really interested in trendy approach Y so he tried using library Z, but it wasn't really suited for the problem at hand, so he misused it pretty severely."
That can be interesting, but it's often much more efficient to say, "Who cares how or why this got to be such a tangled mess. Let's solve for the actual current needs."
There was a good discussion of this in the comments, starting at https://www.jefftk.com/p/accidentally-load-bearing#fb-109168...
In addition to tests, you can also add logging to your running system, or make the change behind a flag that you A/B test in production.
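Roughly what the flag-plus-logging version could look like, as a Python sketch. The names here (`legacy_round`, `process`, the percentage bucketing) are hypothetical and there is no real feature-flag library behind this, just a hash of the user id.

```python
import hashlib
import logging

logger = logging.getLogger("mystery_step")


def in_experiment(user_id: str, percent: int) -> bool:
    """Deterministically place roughly `percent`% of users in the experiment group."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = digest[0] * 100 // 256  # stable bucket in the range 0..99
    return bucket < percent


def legacy_round(amount: float) -> float:
    """Stand-in for the mystery code whose purpose nobody remembers."""
    return round(amount, 2)


def process(amount: float, user_id: str, percent: int = 0) -> float:
    if in_experiment(user_id, percent):
        return amount  # experiment group: the mystery step is skipped
    adjusted = legacy_round(amount)
    if adjusted != amount:
        # Log only when the mystery step actually changed something, so the
        # logs show how often (and for which inputs) it is load bearing.
        logger.info("legacy_round changed %r to %r", amount, adjusted)
    return adjusted
```

Starting at `percent=0` gives you the logging alone. Raising it gradually turns it into the A/B test.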
And by the time it breaks, will anyone remember the probable cause?
If you're safely fiddling with it, I would consider that part of the process of understanding. I'm particularly prioritizing understanding what it is actually doing. Historical context is primarily useful because it will point you in the direction of what else you may need to look at; when I fail to realize that removing X broke Y it's because I didn't realize that there's a reason those are together.
As a practical example, last year I was supposed to figure out why a data collection system was not getting all the data we wanted. The person who worked on it was long gone, so I looked at the code. It was a bunch of stuff using Apache Beam. No tests, no docs. The original author was not a very experienced programmer. It jumbled together a variety of concerns. And after a day of trying to understand, it became obvious to me that some of the uses of Beam were unconventional. Plus Beam itself is its own little world.
The next day, I said "fuck that", and wrote a very simple Python requests-based collector. I pretty quickly had something that was both much simpler and much more effective at actually getting the data. And from there I went into the usual sort of feedback-driven discovery process to figure out what the current users actually needed, with zero regard for what the original intent of the system was in solving problems for people no longer present.
What was the thing doing? Why did it do it? How did it end up that way? All excellent mysteries that will remain mysteries, as I eventually deleted all the Beam-related code and removed its scheduled jobs. For me, this ignorance was truly bliss. And for others too, in that the users got their needs solved with less work, and the developers had something much cleaner and clearer to work with in the future.
Yeah, as always, YMMV.
But I do agree that online discourse puts too much emphasis on statically analyzing systems, and too little on adding instrumentation or just breaking it and seeing what happens.
At the same time, my experience is that in practice people put too much emphasis on instrumentation or just breaking it and seeing what happens, and way too little on statically analyzing the system.
Well, no, because 1. it's not useful, because 2. most people never write anything useful there (which is a two-part vicious cycle), and 3. editors don't usefully surface it.
If we fix #3; and then individual projects fix #2 for themselves with contribution policies that enforce writing good commit messages; then #1 will no longer be true.
> You can't assume anyone reading the code has access to the commit history or the time to read it.
You can if you're the project's core maintainer/architect/whoever decides how people contribute to your software (in a private IT company, the CTO, I suppose.) You get to decide how to onboard people onto your project. And you get to decide what balance there will be between the amount of time new devs waste learning an impenetrable codebase, vs. the amount of time existing devs "waste" making the codebase more lucid by explaining their changes.
> The codebase itself should contain any important documentation.
My entire point is that commit messages are part of "the codebase" — that "the codebase" is the SCM repo, not some particular denuded snapshot of an SCM checkout. And that both humans and software should take more advantage of — even rely upon! — that fact.
> amount of time new devs waste learning an impenetrable codebase, vs. the amount of time existing devs "waste" making the codebase more lucid by explaining their changes.
That's a false dichotomy. The codebase won't be impenetrable if there are appropriate comments in it. In my experience time would be better spent making the codebase more lucid in the source code than an external commit history. The commit messages should be good too but I only rely on them when something is impossible to understand without digging through and finding the associated ticket/motivation, which is a bad state to be in, so at that point a comment is added. Of course good commit messages are fine too, none of this precludes them.
Imho, you are missing out on a great source of insight. When I want to understand some piece of code, I usually start navigating the git log from git blame. Even just one-line commit messages that refer to a ticket can help understanding tremendously. Even the output of git blame itself is helpful. You can see which lines changed together and in which order. You see which colleague to ask questions.
As far as I know, changes to the Linux kernel are usually submitted as a series of patches, i.e. a sequence of commits. In effect that is a branch, although it is usually not represented as a git branch while submitting.
That is one of the downsides of trunk-based development. If one keeps a history of all failed experiments, the usefulness of the commit history deteriorates, both for reading commit messages and for bisecting bugs.
> In other words: don't think of it as "meticulously grooming a commit history"; instead, think of it as your actual job not being "developer", but rather, as you (and all your coworkers) being the writers and editors of a programming textbook about the process of building program X... which happens to compile to program X.
If you have to "wade through experiments" to read the commit history, that means that the commit history hasn't had a structural editing pass applied to it.
Again: think of your job as writing and editing a textbook on the process of writing your program. As such, the commit history is an entirely mutable object — and, in fact, the product.
Your job as editor of the commit-history is, like the job of an editor of a book, to rearrange the work (through rebasing) into a "narrative" that presents each new feature or aspect of the codebase as a single, well-documented, cohesive commit or sequence of commits.
(If you've ever read a programming book that presents itself as a Socratic dialogue — e.g. Uncle Bob's The Clean Coder — each feature should get its own chapter, and each commit its own discussion and reflected code change.)
Experiments? If they don't contribute to the "narrative" of the evolution of the codebase — helping you to understand what comes later — then get rid of them. If they do contribute, then keep them: you'll want to have read about them.
Features gradually introduced over hundreds of commits? Move the commits together so that the feature happens all at once; squash commits that can't be understood without one-another into single commits; break commits that can be understood as separate "steps" into separate commits.
After factoring hunks that should have been independent out into their own commits, squashing commits with their revert-commits, etc., your commit history, concatenated into a file, should literally be a readable literate-programming metaprogram that you read as a textbook, that when executed, generates your codebase. While also still serving as a commit history!
(Keeping in mind that you still have all the other things living immutably in your SCM — dead experiments in feature branches; a develop branch that immutably reflects the order things were merged in; etc. It's only the main branch that is groomed in this fashion. But this groomed main branch is also the base for new development branches. Which works because nobody is `git merge`ing to main. Like LKML, the output-artifact of a development branch should be a hand-groomed patchset.)
And, like I said, this is all strictly inferior to an approach that actually involves literate programming of a metaprogram of codebase migrations — because, by using git commit-history in this way, you're gaining a narrative view of your codebase, but you're losing the ability to use git commits to track the "process of editing the history of the process of developing the program." Whereas, if you are actually committing the narrative as the content of the commits, then the "process of editing the history" is tracked in the regular git commits of the repo — which themselves require no grooming for presentation.
But "literate programming of a metaprogram that generates the final codebase" can only work if you have editor support for live-generating+viewing the final codebase side-by-side with your edits to the metaprogram. Otherwise it's an impenetrably-thick layer of indirection — the same reason Aspect-Oriented Programming never took off as a paradigm. Whereas "grooming your commit history into a textbook" doesn't require any tooling that doesn't already exist, and can be done today, by any project willing to adopt contribution policies to make it tenable.
---
Or, to put this all another way:
Imagine there is an existing codebase in an SCM, and you're a technical writer trying to tell the story of the development of that codebase in textbook form.
Being technically-minded, you'd create a new git repo for the source code of your textbook — and then begin wading through the messy, un-groomed commit history of the original codebase, to refactor that "narrative" into one that can be clearly presented in textbook form. Your work on each chapter would become commits into your book's repo. When you find a new part of the story you want to tell, across several chapters, you'd make a feature branch in your book's repo to experiment with modifying the chapters to weave in mentions of this side-story. Etc.
Presuming you finish writing this textbook, and publish it, anyone being onboarded to the codebase itself would then be well-advised to first read your textbook, rather than trying to first read the codebase itself. (They wouldn't need to ever see the git history of your textbook, though; that's inside-baseball to them, only relevant to you and any co-editors.)
Now imagine that "writing the textbook that should be read to understand the code in place of reading the code itself" is part of the job of developing the program; that the same SCM repo is used to store both the codebase and this textbook; and that, in fact, the same textual content has to be developed by the same people under the constraints of solving both problems in a DRY fashion. How would you do it?
Regarding your first point:
> I've been in enough projects that changed version control systems
I have the impression that with the introduction of git, it became suddenly en-vogue to have tools to migrate history from one SCM to another. Therefore, I wouldn't settle on restarting from a snapshot anymore.
With git you can cut off history that is too old but weighs down the tools. You can simplify older history for example, while keeping newer history as it is. That is of course not easy but it can be done with git and some scripting.
When was that? I've never seen that in 15-20 years of software development; I've seen plenty of projects change VCS but they always had the option of preserving the history.