Tips for stable and portable software (begriffs.com)
While I write a lot of C, I immediately disagree with the idea that C has a "simple (yet expressive) abstract machine model". Every so often we find a bug which has been present for over a decade, because some new compiler has added a (perfectly legal by the standard) optimisation which breaks some old code.
Picking one example: in the "old days", it was very common (and important for efficiency) to freely cast memory between char, int, double, etc. For many years this was fine, then all compilers started keeping better track of aliasing and lots of old code broke.
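A minimal sketch of the kind of cast that broke, assuming IEEE 754 32-bit floats. Reading a float through a `uint32_t *` violates C's effective-type ("strict aliasing") rules, so an optimiser may assume the two pointers never refer to the same object. Reading bytes through a `char` pointer is still allowed; the usual well-defined replacement for other types is `memcpy`. `float_bits` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* The "old days" idiom, shown only in this comment because it is
   undefined behaviour: a float object is read through a uint32_t
   lvalue, so a compiler that tracks aliasing may reorder or drop it.

       uint32_t bits = *(uint32_t *)&f;
*/

/* Well-defined alternative: copy the object representation.
   A fixed-size memcpy like this is usually compiled to a single move. */
uint32_t float_bits(float f)
{
    _Static_assert(sizeof(uint32_t) == sizeof(float), "expects a 32-bit float");
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On an IEEE 754 platform, float_bits(1.0f) yields 0x3F800000.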
Also, while POSIX is a nice base, it stops you from using Windows, and almost every non-trivial program ends up with a bunch of autoconf (which has to be updated every so often) to handle differences between Linux, BSD, and Mac.
Also, definitely don't distribute code where you use flags like '-pedantic', as it can lead to your code breaking on future compilers which tighten up the rules.
Ugh, just venting, but it helps to know that there are others out there suffering through this :)
Still, in programs that I expect to be used on wildly different systems, I tend to enable all the common warning flags in development builds and be more conservative in the production/deployed version.
I agree the lack of compliant POSIX on Windows is annoying. However, unless you rely on 3rd-party libraries, you can use #ifdef to write OS-specific code without autoconf.
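To make the #ifdef approach concrete, here's a minimal sketch that picks an implementation using the compiler's predefined macros instead of an autoconf test. It assumes the only split that matters is Windows versus POSIX. `current_pid` is a hypothetical helper name; `GetCurrentProcessId()` and `getpid()` are the real Win32 and POSIX calls.

```c
#if defined(_WIN32)
#  include <windows.h>
/* On Windows, use the Win32 API call. */
static unsigned long current_pid(void)
{
    return (unsigned long)GetCurrentProcessId();
}
#else
#  include <sys/types.h>
#  include <unistd.h>
/* Any POSIX system (Linux, the BSDs, macOS) provides getpid(). */
static unsigned long current_pid(void)
{
    return (unsigned long)getpid();
}
#endif
```

Callers only ever see current_pid(), so the platform split stays in one place.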
You compile your code with -pedantic, it works, and then you distribute the source. A user gets that code, compiles it, it works, and they integrate it into their product. Later, that user upgrades or changes their compiler and your code doesn't build anymore because there's a new warning. Now they have to patch your build.
I've seen projects with -pedantic -Werror, which are particularly annoying (so is -Werror in general, to be honest, though I understand why people might want it for CI).
I'm not sure it makes sense professionally, though, as most codebases won't survive a decade: after three years, the dev team will turn over, and the new team will want to rewrite everything from scratch. Or they'll start rewriting parts of the existing system in a new language until it ultimately eats the old one up. It may be related to the kind of companies I work with, though (very early-stage startups).
Regarding interfaces, I think the author could have gone a step further. There is actually a standard and portable interface system: HTML/JS/CSS. If you write a dependency-free web app using things like web components and other standard technologies, you know it will stand the test of time, and it matches all the reasons the author wants to use C: a standard with multiple implementations.
If you're in a web startup, software won't last 3 years, the next team will systematically rewrite.
If you're in the banking, logistics, or defense sector, it's very likely the software will last a decade, as long as it isn't killed in the first or second year for being a pet project (the initial manager left) with no customers.
I have an old man rant about that actually... that rewrite is typically unnecessary if you actually use discipline when developing and learn how to read code.
I once took on a CakePHP 2 app and another developer asked me how in the world I got into, and understood, the framework so quickly. My secret? I read the CakePHP 2 source code. So many developers never learn how to do that well.
That is highly context sensitive. For example CAD packages are generally decades old.
They can't be rewritten from scratch. There is too much code. Too much of it is domain-specific. The features can't change, or else customer projects worth billions might suddenly go tits up when they migrate to a newer version (customers don't migrate to newer versions very often, though).
So, if there is some domain-specific use case worth millions to the software vendor and potentially billions to clients, then stability is far more critical than keeping the codebase "modern".
I disagree strongly with one recommendation. This is just an example, but it holds for larger API design in general:
> we could add a fallback to reading /dev/random. [...] However, in this case, the increased portability would require a change in interface. Since fopen() or fread() on /dev/random could fail, our function would need to return bool.
No, definitely not. It is dangerous to expect the application to sanely handle the case of randomness being unavailable when it is never going to occur in practice. On all POSIX platforms, /dev/random exists and will block until sufficient entropy is available. Something would have to go seriously wrong for this to fail. This is so rare that any error handling code for it will never be tested. The most likely outcome of forcing the caller to handle it is that the return value is ignored or improperly handled and the buffer is used uninitialized, leading to a security vulnerability.
My recommendation instead would be to error check your fopen() and fread() calls within get_random_bytes(), and print an error and abort() if they fail. This way if someone's system is improperly configured and /dev/random doesn't work the program will just crash. Same goes for macOS's SecRandomCopyBytes() and Windows' half a dozen calls to use an HCRYPTPROV. This way you still return void and there is no danger of callers improperly handling errors.
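A minimal sketch of that suggestion for the /dev/random path only. The `get_random_bytes(void *buf, size_t n)` signature is an assumption, and the macOS and Windows branches are left out.

```c
#include <stdio.h>
#include <stdlib.h>

/* Fill buf with n random bytes or abort(); never returns with buf
   left uninitialised, so callers have no error to mishandle. */
void get_random_bytes(void *buf, size_t n)
{
    FILE *f = fopen("/dev/random", "rb");
    if (f == NULL) {
        perror("get_random_bytes: fopen(\"/dev/random\")");
        abort();
    }
    /* Best effort: unbuffered, so stdio doesn't read more than asked for.
       If setvbuf() fails, the stream simply stays buffered. */
    (void)setvbuf(f, NULL, _IONBF, 0);

    if (fread(buf, 1, n, f) != n) {
        fprintf(stderr, "get_random_bytes: short read from /dev/random\n");
        abort();
    }
    fclose(f);
}
```

Turning off buffering is a design choice. Otherwise fread() may pull a whole stdio buffer's worth of bytes from /dev/random to satisfy a small request.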
In general, unless you're writing safety-critical software, it's fine for your code (or even library code) to abort() in these sorts of exceptional situations when there is no reasonable or safe way to handle the error. If someone truly wants to handle the error, they can just not use your API and do it manually.
I think a more accurate title would be "Tips for stable and portable C programs"
Once Java is excluded, the article goes into depth on a range of things that one could argue Java specifically addresses, and in a better, more portable way.
One could argue about GUIs, but the portability of GUIs is not just a Java/Swing problem.
By that logic C isn't portable because it relies on libc.
And what about Windows? Isn't it still used on 80% of all computers? So why is POSIX essential?
It is about people: documentation, a paper trail of why some decisions were made, archived build tools, VMs, dependency source code.
Also, C, POSIX, and Motif are terrible choices because of their fragmentation. Java is very boring, but compiling and debugging 20-year-old code is very common.
Generics is going to be fun.
> a subset of C without undefined behavior
There are various projects out there that let you produce C code guaranteed to be free of undefined behaviour, but they're not 'quick fix' solutions, so they're not widely used.
https://www.eschertech.com/products/
https://blog.regehr.org/archives/1069 (ctrl-f for actually)
I would propose doing a web app if you really care so much about compatibility. The web also allows for more custom widgets.
I don't know if Motif is better at that, but I wouldn't bet on web apps personally.
I find it difficult to believe that Motif is actually that portable.
Web apps are only as portable as the browser features they use, and the browsers available for the platform. A primarily backend-rendered app with minimal JavaScript is much more portable than the average SPA.
https://www.archlinux.org/packages/community/x86_64/openmoti... lists it as updated on 2020-01-05, and https://sourceforge.net/p/cdesktopenv/wiki/SupportedPlatform... claims that CDE supports quite a lot of platforms (which implies Motif), although I'll grant that most of those probably haven't been tested in a while.
Still better watch your step. Features can be removed from the platform. https://stackoverflow.com/a/46689336/
That's just a programming language tailored for transpilation, no?
Theoretical computer science shows us there is no 'one true representation' for algorithms.
When I have a function that takes a struct, and I need it to take a different struct for the same argument due to whatever, that to me screams "make an interface", but I hadn't been programming Java very long before my team started using Go - I've now been writing Go professionally longer than I have Java.
This is actually why I'm pretty bullish on things like RoR, Laravel, et al.
The sheer speed at which they move to new versions that break backwards compatibility (BC) is actively making the web less secure. I've lost count of how many times I've found a new client with this software that's been working for years but suddenly broke, only to realize it's on an OS that's EOL, using a version of the framework that's EOL and a version of the language that's EOL. And now it's my job to bring it up to speed.
And typically the hardest part of that? The 3rd party dependencies that are either abandoned and don't support the newer versions of anything, or have moved onto Python 3 and no longer support Python 2.
It's why I vastly prefer something like asp.net core. I know in 5-10 years the code will probably just work with the latest version, and if there's an incompatibility, it's going to tend to be relatively small.
Do you mean bearish? I think you do, as I was confused for about half of your comment before I realised.
Almost any OS running on a server is going to be POSIX, probably Linux or BSD.
That's what I get for posting pre-coffee :(
Alternatively, NASA host an uglier scan of the document at https://ntrs.nasa.gov/citations/19950022400
Some projects are still using 8 and will be until their EOL date. New projects start on the latest LTS release (11).
"But developers that can exercise discipline and know how to read (and modify) code instead of rewriting cost so much money..." is what you'll typically hear in response to this.
It's cheaper (and often faster) to have cheaper, less disciplined, less experienced developers rewrite something multiple times than it is to have more expensive, more disciplined, more experienced developers write something and maintain it. It's also harder to keep the more experienced developers, because most developers I work with start looking for another job when their project goes into maintenance.
The typical "we never have enough time/money to do it right the first time but we always have to make the time/money to do it twice" situation.
I can't believe this. I've seen the sheer difference in speed and maintainability a single solid web developer can deliver in a framework they are familiar with versus teams of more junior developers who spin their wheels for weeks. Rewriting when you don't even understand the starting point is always a waste of money.
> It's also harder to keep the more experience developers because most developers I work with start looking for another job when their project goes into maintenance.
This certainly resonates though. I've been that developer more than once.
I agree completely.
It can be quite shocking just how much damage a poor developer can do to small and medium-sized companies. I know of one company that's holding on for dear life right now because they lost their biggest client due to a very poor developer they had employed. I told them 6 months before this all happened to get rid of him, but they didn't. And Corona is just making it that much harder for them to find new work.
Anyway, thank you for these, I'm definitely going to look further here.
Anyway, any complicated thing that can be easily ignored, inevitably will be.
Keeping the threat of undefined behaviour in mind, and taking steps accordingly, rather than complacently ignoring it. C is a highly unsafe language, and the programmer shouldn't forget this.
> any complicated thing that can be easily ignored, inevitably will be.
The demonstrable inability of C programmers to write correct code is a strong argument against the widespread use of C. Even old languages like Ada show that you can use a language much safer than C and still achieve solid performance. Languages like Rust are making further progress on having safety, performance, and programmer-convenience, all at once.
If you use an ultra-safe language like verified SPARK Ada, the language doesn't even allow you to, say, forget to check whether a denominator is zero, or to forget to protect against out-of-bounds array access.
> Must I pepper the code with bounds checks (which are prone to UB too if not done carefully)?
Not necessarily; a tool can help check for undefined behaviour. Static analysers, GCC flags, and tools like Valgrind can automatically check for out-of-bounds array access, divide-by-zero, or attempts to dereference NULL. [0] Adding your own runtime assertions isn't a crazy idea, though, especially for dev builds. If this were the norm in C programming, we'd have fewer security vulnerabilities.
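As a rough illustration of hand-written dev-build assertions, here is a sketch with hypothetical helpers. Each assert() guards a condition that would otherwise be undefined behaviour, and compiling with -DNDEBUG strips the checks from a production build. For comparison, GCC and Clang can insert many similar checks automatically with -fsanitize=address,undefined.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Array read that traps in dev builds instead of reading out of bounds. */
static int checked_get(const int *arr, size_t len, size_t i)
{
    assert(arr != NULL);
    assert(i < len);                          /* out-of-bounds read */
    return arr[i];
}

/* Integer division that traps in dev builds on both UB cases. */
static int checked_div(int num, int den)
{
    assert(den != 0);                         /* division by zero */
    assert(!(num == INT_MIN && den == -1));   /* quotient not representable */
    return num / den;
}
```

C99 and later define integer division as truncating toward zero, so checked_div(-7, 2) is -3.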
C lacks the kind of runtime checks that are 'always on' in languages like Java and C# (out-of-bounds, divide-by-zero, etc). That's not because such checks don't apply to C code, it's because of the minimalist C design philosophy. You have the option to add your own checks, or use tools to do so automatically, but if you develop without any checks anywhere you should expect to have more bugs. Java added them for a reason.
The C++ language has a somewhat different design philosophy, but it's the same reason its std::array class template has both a runtime-checked at() member function and an unchecked operator[]. It would be against the design philosophy to force you to pay the runtime overhead for checks, but it gives you the option.
> which are prone to UB too if not done carefully
What kind of error do you have in mind here?
https://stackoverflow.com/questions/3944505/detecting-signed...
"Design philosophy"...oh please! C was designed for transistor- and memory- scarce microcomputers. Nowadays there is defacto supercomputer in every phone and runtime bounds checks are cheap. Moreover, allowing CPU to know the size of memory chunk pointed to could enable optimization which would make the code actually faster (not even talking about security benefits). But you C programmers insist tooth an nail against that...
Right, but we're talking about a simple bounds check. There should be no need for any arithmetic, just comparison.
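Even a pure comparison can go wrong if it is written as an addition, which is the trap the linked question is about. A minimal sketch with a hypothetical helper name: the tempting test `offset + len <= size` can wrap around for huge values (and with signed types the overflow would be undefined behaviour), so the check below compares against a subtraction that cannot underflow.

```c
#include <stdbool.h>
#include <stddef.h>

/* Does [offset, offset + len) fit inside a buffer of `size` bytes?
   The naive `offset + len <= size` wraps for huge len and wrongly
   passes. Checking offset first guarantees size - offset cannot
   underflow, so no intermediate value can overflow. */
static bool range_fits(size_t offset, size_t len, size_t size)
{
    return offset <= size && len <= size - offset;
}
```

For example, range_fits(1, (size_t)-1, 10) is false, whereas the naive sum wraps to 0 and would have passed.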
> "Design philosophy"...oh please! C was designed for transistor- and memory- scarce microcomputers.
Right. Hence its design philosophy.
> Nowadays there is a de facto supercomputer in every phone and runtime bounds checks are cheap.
Cheap, but perhaps not cheap enough to dismiss entirely. Bounds checking costs a few percent of performance [0], enough to put some people off in some domains, such as the kernel.
It's a pity C makes it difficult to automate just about any kind of check. Checking whether a pointer overruns a buffer that was returned by malloc, for instance, requires quite a bit of cleverness, as the system has to track the size of the allocated block.
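To give a feel for that bookkeeping, here is a sketch of hypothetical wrappers that store each block's size in a header just before the pointer handed to the caller. The header is a union with max_align_t (C11) so the returned pointer keeps malloc's alignment. This only works when you still hold the exact pointer the allocator returned; mapping interior pointers back to their block is where real checkers need the cleverness.

```c
#include <stddef.h>
#include <stdlib.h>

/* Size header placed in front of each allocation. */
typedef union {
    size_t size;
    max_align_t align;   /* keeps the user pointer suitably aligned */
} alloc_header;

static void *tracked_malloc(size_t size)
{
    if (size > (size_t)-1 - sizeof(alloc_header))
        return NULL;                         /* header + size would overflow */
    alloc_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;                            /* user memory follows the header */
}

/* Only valid for a pointer returned by tracked_malloc(). */
static size_t tracked_size(const void *p)
{
    return ((const alloc_header *)p - 1)->size;
}

static void tracked_free(void *p)
{
    if (p != NULL)
        free((alloc_header *)p - 1);
}
```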
You have to rely on optional compiler features, elaborate static analysis tools (often proprietary and expensive), and dynamic analysis tools like Valgrind. Ada on the other hand enables all sorts of runtime checks by default, but it's easy to switch them all off if you're sure.
> CPU to know the size of memory chunk pointed to could enable optimization which would make the code actually faster (not even talking about security benefits)
What kind of optimisation do you have in mind? Pre-caching?
> But you C programmers fight tooth and nail against that...
'Fat pointers' of this sort have been tried with the C language [1] but I can't see the committee adding them to the standard. Part of C's virtue is that it's extremely slow moving.
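For readers who haven't seen the idea, here is a minimal library-level sketch of a fat pointer: the address travels together with its element count, so every access can be checked. It illustrates the concept only and is not the design of any particular proposal; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Pointer plus the number of elements it may access. */
typedef struct {
    int   *ptr;
    size_t len;
} int_slice;

/* Bounds-checked read: reports failure instead of reading past the end. */
static bool slice_get(int_slice s, size_t i, int *out)
{
    if (i >= s.len)
        return false;
    *out = s.ptr[i];
    return true;
}

/* Narrow a slice; the result can never extend past the original. */
static bool slice_sub(int_slice s, size_t start, size_t n, int_slice *out)
{
    if (start > s.len || n > s.len - start)
        return false;
    out->ptr = s.ptr + start;
    out->len = n;
    return true;
}
```

Because the length is carried in the struct rather than in the compiler, nothing stops code from reaching into s.ptr directly. That is the gap a language-level fat pointer would close.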
I'm not advocating continued widespread use of C though. I hope safe-but-fast languages like Rust do well. We all pay a price for the problems associated with C and, perhaps to a lesser extent, C++. For what it's worth I haven't written serious C or C++ code for a long time.
[0] https://doi.org/10.1145/1294325.1294343 (An old source admittedly)