Weekend projects: getting silly with C(lcamtuf.substack.com) |
Weekend projects: getting silly with C(lcamtuf.substack.com) |
It certainly could do though. In C, using an uninitialised variable does not mean "whatever that memory happened to have in it before" (although that is a potential result). Instead, it's undefined behaviour, so the compiler can do what it likes.
For example, it could well unconditionally initialise that memory to 123. Alternatively, it could notice that the whole snippet has undefined behaviour so simply replace it with no instructions, so it doesn't print anything at all. It could even optimise away the return that presumably follows that code in a function, so it ends up crashing or doing something random. It could even optimise away the instructions before that snippet, if it can prove that they would only be executed if followed by undefined behaviour – essentially the undefined behaviour can travel back in time!
I'm not enough of a specification lawyer to say that this is definitely true, but the reasoning and example given there seems sound to me.
[1] https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...
Without undefined behavior, the compiler emits code that has the behavior defined by the code —- the ordering may be altered, but not the behavior.
A compiler's analysis can go backward in time. That is to say, the compiler can build a model of what happens in some section of code over time, and analyze it whichever way it wants.
You cannot go back in time from execution time to translation time, but the translator can follow the code as if it were executing it at translation time.
Static objects are always initialized, so the situation cannot arise.
That leaves dynamic ones, like uninitialized struct members in a malloced structure.
Accessing uninitialized dynamic memory means isn't undefined behavior in C. It results in whatever value is implied by the uninitialized bits. If the type in question has no trap representations, then it cannot fail.
switch(k) {
if (0) case 0: x = 1;
if (0) case 1: x = 2;
if (0) default: x = 3;
}
which is a switch where you don't have to write break at the end of every clause. #define brkcase if (0) case
That might be worth using. Compilers won't love the control flow but they'll probably delete it effectively. #define brkcase break;case
kinda defeats the purpose of the macro even.Incidentally, what happens if you use your brkcase as the first case?
I don't find either particularly exciting - a macro that would append break to the current case feels better
I'll confess, I've used this construct to mean "omit the first line of the next case label but otherwise fall through".
If you think of the case label as merely a label and not a delimiter between statements all of this makes sense.
case 1 ... 10:
Is valid C? I have been programming in C for years, what standard is this from? #include <stdio.h> // compile & run: gcc -Wall countdown.c -o countdown && ./countdown
int n = 10; int main(int argc, char *argv[]) { printf("%d\n", n) && --n && main(n, NULL); }
Python version: import sys # run: python3 countdown.py 10
def main(n:int): sys.stdout.write(f"{n}\n") and n-1 and main(n-1)
main(int(sys.argv[1]))
Shell version: # run ./countdown.sh 10
echo $1 && (($1-1)) && $0 $(($1-1)) import sys # run: python3 countdown.py 10
def main(n:int): print(n) or n-1 and main(n-1)
main(int(sys.argv[1]))
This also works and is definitely more Pythonic: _ = [print(n) for n in range(10,0,-1)] 4[arr] // same as arr[4] arr[i][j]
j[i[arr]]
These are the simplifications you'd do. You only need to know that a[x][y] is equivalent to (a[x])[y], and that a[x] is the same as x[a]. arr[i][j]
(arr[i])[j]
(i[arr])[j]
j[i[arr]]https://stackoverflow.com/questions/34559705/ternary-conditi...
Duff is trying to optimise MMIO, you wouldn't do anything close to this today even in C, not least because your MMIO is no longer similarly fast to your CPU instruction pace and for non-trivial amounts of data you have DMA (which Duff's hardware did not). In a modern language you also wouldn't treat "MMIO" as just pointer indirection, to make this stay working in C they have kept adding hacks to the type system rather than say OK, apparently this is an intrinsic, we should bake it into the freestanding mode of the stdlib.
Edited to add:
For my money the successor to Tom Duff's "Device" is WUFFS' "iterate loops" mechanism where you may specify how to partially unroll N steps of the loop, promising that this has equivalent results to running the main loop body N times but potentially faster. This makes it really easy for vectorisation to see what you're trying to do, while still handling those annoying corner cases where M % N != 0 correctly because that's the job of the tool, not the human.
That's just a special case of being able to intermingle switch with arbitrary syntax, which is what TFA does, before it jumps to computed gotos.
While you can't use SIMD you can still benefit from instruction-level parallelism.
It's potentially better in some scenarios where you want to minimize instruction cache usage and there are few iterations of the loop.
#include <stdio.h>
#include <stddef.h>
int main()
{
{
int *p = NULL;
if (p)
{
what:
printf("a = %d\n", *p);
return 0;
}
int a = 123;
p = &a;
goto what;
}
}I've seen this in the wild, particularly with macros.
#define assert(c) if (!c) ...
if (foo) assert(...);
else bar(); // oops!Let's stop getting silly with C, too many CVEs!
---
Serious comment:
It's a rather cool article actually. Not something I'd do daily but it's kind of sort of useful to know these techniques.
Metaprogramming custom control structures in C by Simon Tatham
I will never love anything as much as I love C, but C development jobs lie in really weird fields I’m not interested in, and I’m fairly certain I am not talented enough. I have seen C wizardry up close that I know I simply cannot do. However, one of the more useful exercises I ever did was implement basic things like a file system, command line utilities like ls/mkdir etc. Sometimes they are surprisingly complex, sometimes no.
After you program in C for a while certain conventions meant to be extra careful kind of bubble up in languages in a way that seems weird to other people. for example I knew a guy that’d auto reject C PR’s if they didn’t use the syntax if (1==x) rather than if (x==1). The former will not compile if you accidentally use variable assignment instead of equality operator (which everyone has done at some point).
This tendency bites me a lot in some programming cultures, people (ime) tend to find this style of programming as overly defensive.
No need for Yoda notation. clang will warn of this by default and gcc will do so if you compile with -Wall, which should also be your default.
I've seen that one and personally dislike that mindset: Making the code less readable to compensate for a disinterest in using actual static analysis tooling.
if (987654321987654321==x)
if (find_optimal(frobnicator(x)*8)==1)
while these are subjectively more readable:if (x==987654321987654321)
if (1==find_optimal(frobnicator(x)*8))C evolved a lot and many foot guns are not a problem anymore. For example for
if (x = 1)
you nowaday get a warning. https://godbolt.org/z/79acPPro6
Implicit int, calling functions without prototypes, etc. are hard errors. And so on.
https://docs.gtk.org/glib/data-structures.html
It could be fun to do a lab summary after the lists and hashes introduction.
Have a wonderful day, =)
I have written C at least a few times per year for over 30 years. About ten years of that was OS development on Solaris and its derivatives.
Articles like this show crazy things you can do in C. I’ve never found the need to do things like this and have never seen them in the wild.
The places that wizardry is required are places like integer and buffer overflow, locking, overall structure of large codebases, build infrastructure, algorithms, etc. Many of these are concerns in most languages.
> auto reject C PR’s if they didn’t use the syntax if (1==x) rather than if (x==1)
When I was a student in the 90s advice like this would have been helpful. Compiler warnings and static analyzers are so much better now that tricks like this are not needed.
That's not so much of a footgun anymore - the common C compilers will warn you about that so there's not much point in defending against it.
Same with literal format string parameters to printf functions: the compiler is very good at warning about mismatched types.
switch (a) {
brkcase 0: foo();
case 1: bar();
}
so in a sense the `if (0) case` trick also affects the previous case, not the current one. But that one also falls apart when there are multiple statements under the brkcase.I think you really have to squint your eyes to see the similarities, beyond the general theme of exploiting the counterintuitive properties of switch statements.
The warning also says that it's an assignment. It's a pretty clear warning meant to force the programmer to do extra work to get the error.
In general, digging deep enough down a stack, and it drops back into the gsl:
https://www.gnu.org/software/gsl/
Indeed, first month attrition rates for interns at some companies is over 46%. =3
printf("hello\n"); // this doesn't have to print
x = x / 0; // because this is effectively a notreached() assertionhttps://godbolt.org/z/8nbbd3jPW
I've often said that I've never noticed any surprising consequences from UB personally. I know I'm on thin ice here and running risk of looking very ignorant. There are a lot of blogposts and comments that spread what seems like FUD from my tiny personal lookout. It just seems hard to come across measureable evidence of actual miscompilations happening in the wild that show crazy unpredictable behaviour -- I would really like to have some of it to even be able to start tallying the practical impact.
And disregarding whatever formulations there are in the standard -- I think we can all agree that insofar compilers don't already do this, they should be fixed to reject programs with an error message should they be able to prove UB statically -- instead of silently producing something else or acting like the code wouldn't exist.
Is there an error in my logic -- is there a reason why this shouldn't be practically possible for compilers to do, just based on how UB is defined? With all the flaws that C has, UB seems like a relatively minor one to me in practice.
Another example: https://godbolt.org/z/b5j99enTn
This is an adaption from the Raymond Chen post, and it seems to actually compile to a "return 1" when compiling with C++ (not with C), at least with the settings I tried. And even the "return 1" for me is understandable given that we actually hit a bug and there are no observeable side-effects before the UB happens. (But again, the compiler should instead be so friendly and emit a diagnostic about what it's doing here, or better return an error).
Un-comment the printf statement and you'll see that the code totally changes. The printf actually happens now. So again, what uecker says about observable effects seems to apply.
Even things like “printf” can’t be implemented purely in standard C. Even making a syscall is outside of the scope of the C standard.
This should be a suite of intrinsics. It's the same mistake as "register" storage, a layer violation, the actual mechanics bleeding through into the abstract machine and making an unholy mess.
If you had intrinsics it's obvious where the platform specific behaviour lives. Can we "just" do unaligned 32-bit stores to MMIO? Can we "just" write one bit of a hardware register? It depends on your platform and so as an intrinsic it's obvious how to reflect this, whereas for a type qualifier we have no idea what the compiler did and the ISO document of course has to be vague to be inclusive of everybody.
But this is all opinions and terms such as "unholy mess" etc do not impress me. In my opinion "volatile" is just fine as is "register. Neither are layer violations nor a type system problem. That the exact semantics of a volatile access are implementation defined seem natural. How is this better with an intrinsic? What I would call a mess are the atomics intrinsics, which - despite being intrinsics - are entirely unsafe and dangerous and indeed mess (just saw a couple of new bugs in our bug tracker).
[1]. https://developercommunity.visualstudio.com/t/Invalid-optimi...
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf "Any other behavior during execution of a program is only affected as a direct consequence of the concrete behavior that occurs when encountering the erroneous or non portable program construct or data. In particular, all observable behavior (5.1.2.4) appears as specified in this document when it happens before an operation with undefined behavior in the execution of the program."
I should point out that compilers also generally do not do true time-travel: Consider this example: https://godbolt.org/z/rPG14rrbj
The issue you linked to is not a counter example because, as the poster said, g may terminate the program in which case that snippet does not have undefined behaviour even if b is zero. The fact that they bothered to mention that g may terminate the program seems like an acknowledgement that it would be valid to do that time travelling if it didn't.
> Note that blog post is correct about C++ but incorrectly assumes this is true for C as well.
Presumably you're referring to this line of the C++ standard, which does not appear in the C standard:
> However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).
I looked at every instance of the word "undefined" in the C standard and, granted, it definitely didn't have anything quite so clear about time travel as that. But it also didn't make any counter claims that operations before are valid. It pretty much just said that undefined behaviour causes behaviour that is undefined! So, without strong evidence, it seem presumptuous to assume that operations provably before undefined behaviour are well defined.
Note that I am a member of WG14. We added more clarification to C23 to make clear that this is not a valid interpretation of UB, see here: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf
> The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object.
A static array is "an object" already. A pointer to the middle of it is not a new object.
The lack of clarity in earlier standards made it impossible to deal with code incrementally, since all the unknown execution paths could potentially breach back in time and smash your semantics.
> In particular, all observable behavior (5.1.2.4) appears as specified in this document when it happens before an operation with undefined behavior in the execution of the program.
I consider that a change in the standard but, of course, that's allowed, especially as it's backwards compatible for well defined programs.
The wording is a little odd: it makes it sound a like you need some undefined behaviour in order to make the operations beforehand work, and, taken very literally, that operations between two undefined behaviours will work (because they're still "before an operation with undefined behavior"). But I suppose the intention is clear.
Note that the "for which" IMHO already makes this clear that this can not travel in time. When everything could be affected these words ("for which") would be meaningless.
- if a compiler finds that condition A would lead to UB, it can assume that A is never true - that fact can "backpropagate" to, for example, eliminate comparisons long before the UB.
Here is an older discussion: https://softwareengineering.stackexchange.com/q/291548
Is that / will that no longer be true for C23? Or does "time-travel" mean something else in this context?
int foo(int x)
{
printf("%d\n", x);
fflush(stdout);
return 1 / x;
}In the following example
int foo(int x)
{
if (x) bar(x);
return 1 / x;
}the compiler could indeed remove the "if" but not because it were allowed to assume that x can never be zero, but because 1 / 0 can have arbitrary behavior, so could also call "bar()" and then it is called for zero and non-zero x and the if condition could be removed (not that compilers would do this)
Unfortunately I don't think there is a good fix for that.
There is unconditional use of a pointer b, which is UB if b is null. However, there is an earlier branch that checks if b is null. If we expected the UB to "backpropagate", the compiler would eliminate that branch, but both gcc and clang at O3 keep the branch.
However, both gcc and clang have rearranged the side effects of that branch to become visible at the end of the function. I.e. if b is null, it's as if that initial branch never ran. You could observe the difference if you trapped SIGSEGV. So even though the compiler didn't attempt to "time-travel" the UB, in combination with other allowed optimizations (reordering memory accesses), it ended up with the same effect.
Hoisting the loop invariant div is an important optimization, but in this case I think the compiler could preserve both the optimization and the ordering of the side effects by loop-peeling.
That, plus a modicum of reasoning like "if this were to be evaluated, it would be UB" (therefore, let's assume that is not evaluated).
If you want an example of GCC removing a side effect that happens-before provable subsequent UB: https://godbolt.org/z/PfoT8E8PP but I don't find it terribly interesting as the compiler warns here.
The only way the program can avoid perpetrating undefined behavior in the statement "x = x / 0" is if it does not execute that statement.
Thus, to assume that the program does not invoke undefined behavior is tantamount to assuming that the program does not execute "x = x / 0".
But "x = x / 0" follows printf("hello\n") unconditionally. If the printf is executed, then x = x / 0 will be executed. Therefore if the program does not invoke undefined behavior, it does not execute printf("hello\n") either.
If the program can be assumed not to execute printf("hello\n"), there is no need to generate code for it.
Look at the documentation for GCC's __builtin_unreachable:
> Built-in Function: void __builtin_unreachable (void)
> If control flow reaches the point of the __builtin_unreachable, the program is undefined. It is useful in situations where the compiler cannot deduce the unreachability of the code.
The unreachable code assertion works by invoking undefined behavior!
edit: this is in addition to the guarantees with regard to side effects that uecker says the C standard provides.
This is far from obvious -- especially once SIGPIPE comes into play, it's quite possible that printf will terminate the program and prevent the undefined behavior from occurring. Which means the compiler is not allowed to optimize it out.
The only issue is that writing to a stream is visible behavior. I believe that it would still be okay to eliminate visible behavior if the program asserts that it's unreachable. The only reason you might not be able to coax the elimination out of compilers is that they are being careful around visible behavior. (Or, more weakly, around external function calls).
#include <stdio.h>
int f(int y, int a) {
int x, z;
printf("hello ");
x = y / a;
printf("world!");
z = y / a;
return x+y;
}
In godbolt, it seems the compiler tends to combine the two printfs together. So if a=0, it leads to UB between the printfs, but that wont happen until after the two printfs. Here the UB is delayed. But will the compiler actually make sure that in some other case, the x/a won't be moved earlier somehow? Does the compiler take any potentially undefined behavior and force ordering constraints around them? ...The whole point of UB is to be able to optimize the code as if it doesn't have undefined behavior, so that we all get the maximum optimization and correct behavior as long as there's no UB in the code.Fact is that in this instance, the compiler did not remove a basic black of code (including or excluding "observeable side-effects" leading up to the point of UB happening). It would not be valid for the compiler to assume that the path is never taken in this case, even assuming that UB never happens, because depending on the the value of the variables, there are possible paths through the code that do not exhibit UB. In other words, "the compiler wasn't able to prove UB".
So this is not an instance of the situation that we are discussing. The emitted code is just fine, unless a division by zero occurs. Handling division by zero is responsibility of the programmer.
Nobody is arguing that UB can lead to weird runtime effects -- just dereference an invalid pointer or whatever.
The issue discussed is that based on assumptions about UB, the compiler emits code that does not correspond to the source in an intuitive way, for example a branch of code is entirely removed, including any observeable side-effects that logically happened before the UB.
Now the point of the GGP poster is probably that the observeable side-effect (the volatile access) does not happen at all because the UB happens first. But I would classify this case differently -- the volatile access is not elided from the branch.
Further more, it might well be that (and let me assume so) the order of the volatile access and the division operation that causes the UB are probably not defined as happening in a strict sequence (because, I'm assuming again as any reasonable standards layman would, UB is not considered a side-effect (that would kinda defeat the point, disallowing optimizations)). So it's entirely valid for the compiler to order the operation that causes the (potential) UB before the volatile access.
That's literally what happens in my example: the div is hoisted above the volatile read which is an observable side effect. The practical effect is that the expected side effect is not executed even if it should have happened-before the SIGFPE.
uecker claims that the UB should still respect happens-before, and I'm inclined to agree that's an useful property to preserve.
And I don't see any significant difference between my example and what you are arguing.
extern volatile int x;
int ub() {
int r = x;
r += 1/0;
return r;
}
and the output is ub:
mov eax, DWORD PTR x[rip]
ud2
I don't see what is the side effect that you say is removed here?As for the earlier example (hoisting the division out of the loop), I was going to write a wall of text explaining why I find the behaviour totally intuitive and in line with what I'd expect.
But we can make it simpler: The code doesn't even have any observeable side effect (at least I think so), because it only reads the volatile, never writes it! The observeable behaviour is exactly the same as if the hoist hadn't happened. I believe it's a totally valid transformation, at least I don't have any concerns with it.
Here I've inserted an increment of the volatile (i.e. a write access) at the start of the loop. If the divisor is 0, in the optimized version with the division hoisted out of the loop, the increment will never actually happen, not even once. Whereas it should in fact happen 1x at the beginning of the first loop iteration with "unoptimized" code.
I don't find this offputting: First, the incrementing code is still in the output binary. I think what is understood by "time travel", and what would be offputting to most programmers, is if the compiler was making static inferences and was removing entire code branches based on that -- without telling the user. If that was the case, I would consider it a compiler usability bug. But that's not what's happening here.
Second, I think everybody can agree that the compiler should be able to reorder a division operation before a write access, especially when hoisting the division out of a loop. So while maybe an interesting study, I think the behaviour here is entirely reasonable -- irrespective of standards. (But again, I don't think uecker, nor anyone else, said that the compiler may never reorder divisions around side-effecting operations just because the division "could" be UB).
I think that discussing about omitting branches is a red herring, there is no expectation that the compiler should emit branches or basic blocks that match the source code even in the boring, non-ub case.
The only constraint to compiler optimizations is the as-if rule and the as-if rule only requires that side effects and their order be preserved for conforming programs. Uecker says that in addition to conforming programs, side effects and their ordering also need to be preserved up to the UB.
I do of course also find it unsurprising that the idiv is hoisted, but, as the only way that the standard can constraint compilers is through observable behaviour, I don't see how you can standardize rules where that form of hoisting is allowed while the other are not.
In fact the compiler could easily optimize that loop while preserving the ordering by transforming it to this:
extern volatile int x;
int ub(int d, int c) {
int r;
x += 3;
r += x;
int _div = d / c;
r += _div;
for (int 2 = 0; i < 100; ++i) {
x += 3;
r += x;
r += _div;
}
return r;
}
This version preserves ordering while still optimizing away the div. In fact this would also work if you replaced the volatile with a function call, which currently GCC doesn't optimize at all.Now I'm not able to reproduce the issue with a guaranteed UB. I still think the loop variant shows the same problem though.
In any case, yes, according to the C standard a volatile read counts as an observable side effect.
And I think I can agree that under a strict interpretation of the rule that UB doesn't get reordered with observable behaviour the GCC output in the godbolt is wrong.
Maybe it has something to do with the fact that it's volatiles? I've hardly used volatiles, but as far as I know their semantics have traditionally been somewhat wacky -- poorly understood by programmers and having inconsistent implementations in compilers. I think I've once read that a sequence of volatile accesses can't be reordered, but other memory accesses can very well be reordered around memory accesses. Something like that -- maybe the rules in the compiler are too complicated leading to an optimization like that, which seems erroneous.
But look at this, where I've replaced the volatile access with a printf() call as you describe: https://godbolt.org/z/Ec8aYnc3d . It _does_ get optimized if the division comes before the printf. The compiler seems to be able to do the hoisting (or maybe that can be called "peeling" too?). But not if you swap the two lines such that the printf comes before the division. Maybe the compiler does in fact see that to keep ordering of observable effects, it would have to duplicate both lines, effectively duplicating the entire loop body for a single loop iteration. In any case, it's keeping both the printf() and the div in the loop body.
If the first div happens before the first printf, then it can be CSE out of the loop as any trap would have happened before the printf anyway, so no reordering, and if it didn't trap the first time it wouldn't have trapped later either. In this case CSE is fine and there are no reordering.
If the div happens after printf, then reordering is prohibited not only to preserve side effects before UB (which we have seen GCC doesn't necessarily respects), but because for the most part printf is treated as an opaque function: it could legitimately exit, or longjump out of the function or never return, so on the abstract machine the UB might not happen at all. So it is not safe to hoist trapping instruction like div above opaque functions (but it is safe to sink them).
Still the modification I showed for volatile can be applied as well: peel the first iteration out of the loop so that the first printf can be done before computing the div to be CSEd out of the loop. But GCC doesn't do it although it seems desirable.