PLOS2021: ISO-C became unusable for operating systems(yodaiken.com) |
PLOS2021: ISO-C became unusable for operating systems(yodaiken.com) |
As the paper notes, there are plenty of alternative C compilers available to choose from. The reason why GCC and LLVM ended up attaining overwhelming market share is simply that they produce the fastest possible code, because, at the end of the day, that is what users want.
If you want to blame someone, blame the designers of the C language for doing things like making int the natural idiom to iterate over arrays even when size_t would be better. The fact that C programmers continue to write "for (int i = 0; i < n; i++)" to iterate over an array is why signed overflow is undefined, and it is absolutely a critical optimization in practice.
No, I think it's more because they are free.
In my experience, ICC can be much better at instruction selection while also not being so crazy with exploiting UB.
Besides, it's irrelevant: there are lots of free C compilers that don't exploit UB, and also rarely get used.
That might mean the differences have mostly disappeared, but that may depend on what the front end (icx vs clang) does.
Well, size_t is unsigned and has defined overflow, so you'd lose the optimization if you switched to it. (Specifically, there's cases where defining overflow means a loop is possibly infinite, which blocks all kinds of optimizations.)
Many languages try to fix this by defaulting to wrap on overflow, but that was a mistake because you rarely actually want that. A better solution is to have a loop iteration statement that doesn't have an explicit "int i" or "i++" written out.
size_t obviates the need for this optimization.
As for the optimization, it is based on misunderstanding of C semantics. The only place where the sign extend makes a difference is where pointers are longer than "ints" AND where the the iterator can overflow, and in that case, sign extend only makes a difference if the loop is incorrectly coded so that the end condition is never true. The code should just provoke a warning and then omit the sign extend (and it almost certainly doesn't make much of a difference since sign extend is highly optimized and has zero cost in a pipelined processor).
That said, I believe the set of undefined behaviours in our current standards is much, much too large - most of these should rightly be filed in the implementation-defined category instead. It is no longer the 70s and the very same modern compilers that perform more and more extreme optimisations year on year really do not need to account for a giant zoo of quirky experimental architectures; over the decades we've basically settled on a consensus on how pointers, integers, floating-point numbers etc ought to work.
There is a large population of C "real programmers" who, when they write a C program that unsurprisingly doesn't work, conclude this must be somebody else's fault. After all, as a real programmer they certainly meant for their program to work, and so the fact it doesn't can't very well be their fault.
Such programmers tend to favour very terse styles, because if you don't write much then it can't be said to be your fault when, invariably, it doesn't do what you intended. It must instead be somebody else's fault for misunderstanding. The compiler is wrong, the thing you wanted was the obviously and indeed only correct interpretation and the compiler is willfully misinterpreting your program as written.
Such programmers of course don't want an error diagnostic. Their program doesn't have an error, it's correct. The compilers are wrong. Likewise newer better languages are unsuitable because the terse, meaningless programs won't compile in such languages. New languages often demand specificity, the real programmer is obliged to spell out what they meant, which introduces the possibility that they're capable of mistakes because what they meant was plain wrong.
At least the warnings are getting a bit better for some of these.
The reason they claim program optimizations aren't important is because you can do it by hand for a specific architecture pretty easily, but you'll still want them when porting to a new one, eg if it wants loop counters to go in the opposite direction.
Does it? Honest question; my impression was that template-heavy code can tend to produce deep call trees, but not necessarily outright unreachable code unless you count instantiations ruled out by SFINAE/std::enable_if/tag dispatching, for which UB-based analyses are not necessary.
In addition, I thought template (meta)programming relied very heavily on compile-time knowledge, which seems to obviate the need for UB-based analyses in many cases.
I'm not particularly experienced, though, so maybe there's a gaping hole I'm missing.
But C code is often heavily inlined, too, and similarly pruned.
The submitter may have left out "development" to fit the title in the character limit.
These optimizations leave for C++ folks. C doesn't need all of that, just leave it as the "high level assembler" that it was in the old days, where if I write an instruction I can picture the assembler output in my mind.
Optimizers should not change the code semantics to me. Unfortunately with gcc it's impossible to rely on optimizations, so the only safe option is to turn them off entirely (-O0).
Or even define the behavior and get your compiler writer to implement it.
ps: If I index past the end of the array... what behaviour are you going to define?
Yes, it does. In fact C needs it more, because of the "for (int i = 0; i < n; i++)" idiom. At least idiomatic C++ code uses iterators.
In Fortran the aliasing rules are even stricter: given two arrays passed in as arguments the compiler can assume that they do not overlap, for example. I remember messing that up as a student long ago and getting strange results. The Fortran rule was to enable vectorization, which has been done for many decades.
In any case, for people who want to write OS or cryptography or embedded systems or arithmetic libraries or ... in C, this is not a relevant point.
How much slower would removing the UB footgun make scientific code? Do you have benchmarks? Because we have endless firsthand accounts that it makes standard C useless.
To this it's impossible to write C without using a ton of non standard attributes and compiler options to just make it do the correct thing.
That is what it does. "Undefined behavior" is a lack of semantics, so it is preserving semantics when it leaves those paths out. You can make a "defined C" with `-fsanitize-trap=undefined`, but C was never a high level assembler, and performance is critical for C users too.
This gets more complicated when an inlined function calls another inlined function. The whole inner function may be in the middle of dead code, and thus everything it calls, too, and so all be elided. This sort of thing happens all the time. These elisions don't typically make code obviously much faster, cycle-wise, but they do make it smaller, with a smaller cache footprint. Cache footprint has a huge effect on overall performance.
In principle, the compiler could arrange to put the dead code in some other page, with a conditional jump there that never happens, consuming just another branch prediction slot. But branch prediction slots are cache, and the branch, though never taken, nonetheless consumes a slot. A processor ISA could, in principle, have a conditional branch instruction that is assumed by branch prediction always to go one way, and so not need to consume any branch prediction cache. But I don't know of any. RISC-V does not seem to be entertaining plans to specify such a branch.
Several extant chips do allow "hint" prefixes for branches, but I think they are ignored in current chips, according to experience where they were determined generally to be wrong. This is unfortunate, as sometimes the most frequently taken direction is not the one you want to be fastest. E.g., when spinning while watching an atomic flag, you want the looping branch to be assumed not taken, to minimize latency once the flag is clear, even though it most frequently is taken in recent history. (High-frequency trading code often has to resort to trickery to get the desired behavior.)
(There is a famous story about engineers at Digital Equipment Corporation, whence we got PDP-11s and Vaxen, and thus, indirectly, Unix. A comprehensive cycle count showed that a huge proportion of the instructions executed in their OS kernel were just one instruction. They moved heaven and earth to optimize this one instruction, but the kernel with the new instruction was exactly 0% faster. It turned out the instruction was used only in the idle loop that ran while waiting until something useful could be done. This really happened, and the same mistake is made again and again, to this day: e.g., people will sincerely swear that "rotl" and "popcnt" instructions are not important, on about the same basis.)
Intel recently implemented an atomic test-and-loop instruction that just stalls execution until an identified cache line is touched, thus avoiding the branch prediction stall getting out of the loop. I have not heard of anybody using it.
I know people recommend rust for this kind of thing, but Rust really isn't appropriate in a lot of cases, especially when dealing with microcontrollers not supported by llvm (ie PIC, 8051 off the top of my head).
This may be changing, but I was also under the impression that Rust can't easily produce as small binaries as C can.
We'll wait...
The problem is that C is sufficiently primitive that optimizing it is effectively trying infer structure backwards because the language doesn't specify it.
For-loops are a great example. Why are we even discussing about a "loop index"? Because we need to iterate over an array, string, etc. and "length" isn't something stored with the data structure so the compiler can't do the loop for you.
The real question is probably more along the lines of "Should the C standard start pinning things down more than it does?"
The answer is probably yes, but it's not straightforward to do. Look at just how much verbiage it took to create a decent memory model. And, still, using atomics in C (as opposed to C++) is brutal and the compiler can't help you very much.
My favourite is how "x + 1 < x" is optimized away for signed ...
None of pgcc, sdcc, tcc, lcc and friends work with current code bases, plus they have severe bugs.
I certainly do. C and descendants makes you over-specify variables by declaring them all int/size_t/etc individually. It should've had C++11-style "auto" from the start and there should be a statement like "for i=0..n" that declares "i" the same type as "n".
"int count; … for (int i = 0; i < count; ++i) {}" can't overflow so the optimization is always valid.
"int count; … for (int i = 0; i < count; i += 2) {}" is more of a problem.
I don't think "have a 64-bit int type" is the right approach for a new language either… we should be aiming for safety. If a variable's valid values are 0-50 then its type should be "integer between 0 and 50", not "integer between 0 and UINT64_MAX". Storage size should be an implementation detail.
Look for the "range" keyword
Can you expand on this? I've never heard of someone seriously running actual C on a GPU.
In any case, thanks for correcting me!
You're probably thinking of implementation-defined behaviour, which is the term for behaviour that, although not fully specified by the standard, is a valid behaviour for the program.
Undefined behaviour is different. From the article: "Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose."
Many people don't notice the distinction, and their intuition about what compilers should do with their code matches implementation-defined behaviour; and therefore they complain when, for UB, it does not behave this way.
There is a very strong argument for reclassifying many specific UBs as implementation-defined behaviours, but that's a rather different conversation from "the compiler should always do something sensible when encountering UB".
It's not preserving semantics, it's inferring a valid program from an invalid one, but since the programmer entered an invalid program, unilaterally deciding that the valid program was the intended program seems dubious at best. These should really be errors, or warnings at the very least.
Or are you saying that UB is not from the C++ community? That could certainly be true, I am just describing the situation as I see it where C++ disallows more types of casting or reinterpreting than C does (even if compilers allow it).
One salient example I heard was that the only way to legally read the bits of a float in C++ is to memcpy the float onto an integer type. whereas in C its legal (I believe by using a union).
You can argue that there is no semantic of the code if it invokes undefined behavior. What is the correct "semantic" of x+1 if x is the maximum representable value of a signed integer type? The language doesn't specify it, in fact it explicitly calls this undefined behavior; it's absolute nonsense. Should the compiler apply AI to determine the semantic intent?
> I don't get how someone thought that it is a good idea to silently remove apparently unreachable code.
This reminds me of the common misconception of C as a "portable assembly language". The standard is largely machine independent in that it's concerned with the effects of execution rather than how those effects are implemented by the compiler and eventually carried out by a real machine. As a result, you can write several pages of code and have the compiler fold it down into "mov eax, 123 / ret" and nothing that concerns the C language will have been lost.
If you have some expression that correctly folds into "false", none of the effects of the program as far as the C standard is concerned have changed. Yes, some undefined behavior may manifest differently depending on the optimizations made, but the language is not concerned with that.
The problem is very much with the language itself. Overflow in signed integers is undefined behavior. The optimizer correctly won't consider it as a constraint to optimization. The language could as easily have defined signed integer arithmetic as modular. The standard does not have a "compiler should not change the semantic of the code" clause that applies to undefined behavior. As a general rule, the C standard is basically the antithesis to it. The compiler gets free reign over the effects of code where the behavior is undefined by the C standard, which leaves a lot of UB holes.
x + 1 < x for signed x is always false for values of x where the operation is at all defined in the C language. The good optimizer correctly solves for the constraints of the language and folds the operation into a constant false accordingly, in the same manner it would fold other expressions.
That you believe that there is any x for which this would be true is based on assumptions that the C language doesn't make. You likely assume that signed integer arithmetic should be modular as a consequence of its 2's complement bit representation. These are not assumptions that the C language makes. Signed integers don't have to be modular. They don't have to be 2's complement. It might be harder than you think to maintain a set of constraints and assumptions in addition to those specified by the C language. It would be these additions that would be special cases, not the optimizations.
If there's anything you should have a beef with, it's the language itself. Don't use C if you aren't ready to either be caught off guard with unexpected consequences undefined behavior at run time or have the patience to very carefully avoid it by learning what invokes it.
I think much of the "undefinedness" of the language comes from the fact that multiple implementations of the "C language" existed long before standardization. The subtle differences in these implementations and their stake in the standardization process meant that a lot of things simply couldn't be defined because it would invalidate existing implementations.
All this wouldn't be an issue if C were just some application language, but it's what all of computing is built on. It really should be simple by default and without adding more footguns than what directly programming in assembly gives you.
I wouldn't mind if all the assumptions-based optimisations were made opt-in. They are obviously useful in some way, but they're impossible to allow globally in a large legacy codebase. Which is pretty much every C codebase.
At the root is taking a “assume the programmers know what they are doing” language and turning it into a “assume the programmers are drooling morons” language.
For example: the compiler is assuming someone isn't ignorant and knows, for example, if they want to write a variable as one type and read it as another, they need to consider two things: they are now in implementation-dependent territory (big endian vs little endian effects, for example), and they need to use unions, not just casts, if the same data can be interpreted as two different types and neither is array of char. Failure to use unions properly triggers aliasing problems.
Specifically: if you have a union with members a and b, and you assign into member a, then reading from member b afterward is, by the C and C++ Standards, usually undefined. You are presumed to have some way to know that member a is live, and use that. You can then write into b, and later read that back out.
C does have a thing where a void pointer can be copied into any other pointer, without a cast, and operations with the typed pointer are OK. Technically, the pointer value is supposed to have been copied to the void pointer from the same type, but realloc has to work. Note, realloc does give you alignment guarantees you don't get from pointers to random things.
Nothing else is portable. Unions, other pointer casting, all is UB unless your compiler extends the definition. So, Gcc has an extension to make unions, used in a certain way, well-defined. Probably Clang copies that. You would have to look up exactly what. But the memcpy hack works in all compilers. You won't use it often enough for the extra verbosity to be a problem.