I Do Not Know C: Short quiz on undefined behavior (2015) (kukuruku.co)

214 points by waynecolvin 9 years ago | 178 comments

aidanhs 9 years ago |

My 'favourite' bit of surprising (not undefined) behaviour I've seen recently in the C11 spec is around infinite loops, where

void foo() { while (1) {} }

will loop forever, but

void foo(int i) { while (i) {} }

is permitted to terminate...even if i is 1:

> An iteration statement whose controlling expression is not a constant expression, that performs no input/output operations, does not access volatile objects, and performs no synchronization or atomic operations in its body, controlling expression, or (in the case of a for statement) its expression-3, may be assumed by the implementation to terminate

To make things a bit worse, LLVM can incorrectly make both of the above terminate - https://bugs.llvm.org//show_bug.cgi?id=965.

pcvarmint 9 years ago | |

It means that empty loops (loops with empty bodies) can be completely removed if the controlling expression has no side effects.

> This is intended to allow compiler transformations such as removal of empty loops even when termination cannot be proven.

It means while(i) {} can be eliminated as if i were 0, because there are no side effects in the loop expression or the loop body, and what would be the point of the loop if it never terminated on a non-constant expression?

As an optimization, the optimizer is allowed to eliminate it as a useless loop with no side effects. If you really want an infinite loop, you can use while (1) {}.

There are cases where automatically generated C code might have empty loops which are useless.

If you really want to go to sleep, use pause() or similar. An infinite loop eats up CPU cycles.

TorKlingberg 9 years ago | | |

It's quite common in embedded systems to have the fault handler end with an infinite loop, to give the programmer a chance to attach a debugger and inspect the call stack. Sometimes this behavior is turned on or off with a debug flag, which can trigger this unexpected optimization if the flag is not a compile-time constant.
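A minimal sketch of that pattern, assuming a hypothetical run-time flag `halt_on_fault` and a hypothetical handler `fault_handler` (neither name comes from the thread). Declaring the flag `volatile` means the controlling expression accesses a volatile object, so the clause quoted above should no longer let the implementation assume the loop terminates:

```c
#include <stdbool.h>

/* Hypothetical run-time debug flag; volatile so every loop test re-reads it. */
static volatile bool halt_on_fault = true;

/* Hypothetical counter, only here so the sketch has something observable. */
static int fault_count = 0;

/* Hypothetical fault handler: spins so a debugger can be attached. */
void fault_handler(void)
{
    fault_count++;
    while (halt_on_fault) {
        /* Empty body; the volatile read in the condition keeps the loop. */
    }
}
```

With the flag cleared (for example from a debugger), the handler returns; with it set, the loop stays in the generated code.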

rkv 9 years ago | | |

> If you really want to go to sleep, use pause() or similar. An infinite loop eats up CPU cycles

Yes but an infinite loop + sleep is okay, right?

fmap 9 years ago | |

This definition is actually required for the correctness of many standard compiler optimizations such as partial redundancy elimination and code motion.

adamnemecek 9 years ago | |

What's the point of this?

mnarayan01 9 years ago | | |

If the optimizer can determine that "nothing happens" in the loop, it can optimize the loop away without attempting to determine whether or not the loop terminates.

bcoates 9 years ago | | |

It allows the optimizer to assume away the halting problem; all nontrivial loops are obligated to halt.

DSMan195276 9 years ago |

I'll be honest, I didn't find any of these to be particularly surprising. If you've been using C and are familiar with strict-aliasing and common UB issues I wouldn't expect any of these questions to seriously trip you up. Number 2 is probably the one most people are unlikely to guess, but that example has also been beaten to death so much since it started happening that I think lots of people (or at least, the people likely to read this) have already seen it before.

I'd also add that there are ways to 'get around' some of these issues if necessary - for example, gcc has a flag for disabling strict-aliasing, and a flag for 2's complement signed-integer wrapping.
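For reference, the gcc flags in question are, as far as I know, `-fno-strict-aliasing` and `-fwrapv`. If you would rather not depend on compiler flags, a rough sketch of a portable overflow test follows; the helper name `add_would_overflow` is made up for illustration. It checks the operands before adding, so the overflowing addition is never evaluated:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if a + b would not fit in an int. */
bool add_would_overflow(int a, int b)
{
    if (b > 0) {
        return a > INT_MAX - b;   /* INT_MAX - b cannot overflow when b > 0 */
    }
    if (b < 0) {
        return a < INT_MIN - b;   /* INT_MIN - b cannot overflow when b < 0 */
    }
    return false;                 /* adding zero never overflows */
}
```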

junk_disposal 9 years ago |

Honestly, optimizing compilers will kill C.

It killed the one thing C was good at - simplicity (you know exactly what happens where, note I'm not saying speed, as C++ can be quite a bit faster than C).

Now, due to language lawyering, you can't just know C and your CPU, you have to know your compiler (and every iteration of it!). And if you slip somewhere, your security checks blow up (http://blog.regehr.org/archives/970 https://bugs.chromium.org/p/nativeclient/issues/detail?id=24...).

Tharre 9 years ago |

I don't think this Q&A format makes for a good case of not knowing C.

I mean I got all answers right without thinking about them too much, but would I still get them right if I had to review hundreds of lines of someone else's code? What about if I'm tired?

It's easy to spot mistakes in isolated code pieces, especially if the question already tells you more or less what's wrong with it. But that doesn't mean you'll spot those mistakes in a real codebase (or even when you write such code yourself).

moosingin3space 9 years ago | |

This is further compounded by how difficult it is to build useful abstractions in C, meaning that much real-world C consists of common patterns, and reviewers focus on recognizing common patterns, which increases the chances that small things slip through code review.

Agreed that these little examples aren't too difficult, especially if you have experience, but I certainly do not envy Linus Torvalds' job.

hermitdev 9 years ago |

It's worth noting that for example #12, the assert will only fire for debug builds (i.e. the macro NDEBUG is not defined). So, depending on how the source is compiled, it may be able to invoke the div function with b == 0.
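A hedged sketch of a division helper that does not rely on `assert` and so behaves the same with or without `NDEBUG`. The name `checked_div` and its out-parameter convention are assumptions for illustration, not the quiz's code. It also rejects `INT_MIN / -1`, whose mathematical result does not fit in an `int`:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: writes a / b to *quotient and returns true,
 * or returns false (leaving *quotient untouched) if the division is unsafe. */
bool checked_div(int a, int b, int *quotient)
{
    if (b == 0) {
        return false;             /* division by zero */
    }
    if (a == INT_MIN && b == -1) {
        return false;             /* result would overflow int */
    }
    *quotient = a / b;
    return true;
}
```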

eon1 9 years ago |

C also: https://news.ycombinator.com/item?id=12902304

userbinator 9 years ago |

IMHO the problem is with compilers (and their developers) who think UB really means they can do anything, when what programmers usually expect is what the standard itself notes as one of the possible interpretations of UB: "behaving during translation or program execution in a documented manner characteristic of the environment".

http://blog.regehr.org/archives/1180 and https://news.ycombinator.com/item?id=8233484

sparky_ 9 years ago |

I suppose this sort of ambiguity is what drives the passion of Rust and Go programmers.

barsonme 9 years ago | |

Sorta. I write mostly Go (some JS, PHP) and I got 6/10, forgetting mostly stupid stuff like passing (INT_MIN, -1) to #12.

But some of those are prevalent in Go. For example, 1.0 / 1e-309 is +Inf in Go, just as it is in C—it's IEEE 754 rules. int might not always be able to hold the size of an object in Go, just like C. In Go #6 wraps around and is an infinite loop, just like C.
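The floating-point case can be checked in C with a small sketch like the one below, assuming `double` is an IEEE 754 binary64 type. The helper name `reciprocal_is_infinite` is made up for illustration:

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true if 1.0 / x overflows to an infinity.
 * The volatile store forces the quotient to be rounded to double. */
bool reciprocal_is_infinite(double x)
{
    volatile double r = 1.0 / x;
    return isinf(r) != 0;
}
```

Under that assumption, `1e-309` is a subnormal value whose reciprocal exceeds `DBL_MAX`, so the quotient is +Inf, while `1e-300` gives a finite result.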

The questions that don't, in some way, translate to Go are #2, #7, #8, and #10.

But, to your credit, I do like how Go has very limited UB (basically race conditions + some uses of the unsafe package) and works pretty much how you'd expect it to work.

federicoponzi 9 years ago |

Before: What? I know C. After 3 questions: Ok, I don't know C. Well played sir.

E6300 9 years ago |

1. Unless C's variable definition rules are completely different from C++'s, int i; is a full definition, not a declaration. If both definitions appear at the same scope (e.g. global), this will cause either a compiler error or a linker error. A variable declaration would be extern int i;

khedoros1 9 years ago | |

C's variable definition rules are different from C++'s. gcc happily compiles those two lines, g++ exits with the "redefinition" error.

caf 9 years ago | | |

Yes, in C a plain

  int i;

at file scope is a tentative definition - if, by the end of the compilation unit, no definition has been seen, one of them will become a definition, otherwise it is just a declaration.

On the other hand, this:

  int i = 0;

is a definition, and you can't have two of those.
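A small sketch that a C compiler should accept, with made-up names `counter` and `read_counter`. Everything here is at file scope in a single translation unit, and the file must be compiled as C, not C++:

```c
/* Two tentative definitions of the same object: allowed in C. */
int counter;
int counter;

/* The single definition with an initializer; a second one would be an error. */
int counter = 10;

/* Hypothetical accessor, only so the value can be observed. */
int read_counter(void)
{
    return counter;
}
```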

E6300 9 years ago | | |

That was unexpected.

brianmurphy 9 years ago |

As a former C programmer, you know not to fool around at the max bounds of a type. That avoids all of the integer overflow/underflow conditions. When in doubt, you just throw a long or unsigned on there for insurance. :)

nightcracker 9 years ago |

I got every single one right. Does that mean I know C through and through? Perhaps. But all of these are the 'default' FAQ pitfalls of C, not the really tricky stuff.

AndyKelley 9 years ago |

I made this post as a response. Disclaimer: yet another programming language trying to dethrone C. People seem to be less enthusiastic about the subject these days.

http://andrewkelley.me/post/zig-already-more-knowable-than-c...

kvakkefly 9 years ago |

Anyone who enjoys this will also enjoy http://cppquiz.org

Hydraulix989 9 years ago |

I feel bad because I'm smart enough to answer these questions correctly in a quiz format, but if I saw any of them in production code, I would not even think twice about it.

(the quiz questions themselves lead you on, plus I read the MIT paper on undefined behavior that was posted on here back in 2013)

rdc12 9 years ago |

Isn't this line from #3 undefined behavior that is not mentioned in the article (sequence point violation)?

  *zp++ = *xp + *yp;

msbarnett 9 years ago | |

That's not a sequence point violation. The C standard makes it clear that *zp gets *xp + *yp prior to the increment. Quoting 6.5.2.4:

> The result of the postfix ++ operator is the value of the operand. After the result is obtained, the value of the operand is incremented. (That is, the value 1 of the appropriate type is added to it.) See the discussions of additive operators and compound assignment for information on constraints, types, and conversions and the effects of operations on pointers. The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point.

The last sentence is key.
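A minimal sketch of how the statement `*zp++ = *xp + *yp;` behaves in a loop. The function name `add_scalar_pair` and its signature are assumptions for illustration, not the quiz's code, and the sketch assumes the output buffer does not overlap the two inputs. Each iteration stores into the slot `zp` pointed at and only then is `zp` advanced; `zp` is modified once per statement:

```c
#include <stddef.h>

/* Hypothetical helper: fills n slots starting at zp with *xp + *yp. */
void add_scalar_pair(int *zp, const int *xp, const int *yp, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Store through the old value of zp; the increment is a side
         * effect that completes before the next sequence point. */
        *zp++ = *xp + *yp;
    }
}
```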

wmu 9 years ago |

#4 is not really a language issue, but rather a feature of floating-point numbers.

raarts 9 years ago |

(2015)

Kenji 9 years ago |

I'm sorry, but the answer this website gives to 1. is wrong. See for yourself:

  int i;
  int i = 10;
  
  int main(int argc, char* argv[]){
  	return 0;
  }

Try to compile it. It doesn't work (gcc.exe (GCC) 5.3.0), the error is:

  a.cc:2:5: error: redefinition of 'int i'
   int i = 10;
       ^
  a.cc:1:5: note: 'int i' previously declared here
   int i;
       ^

Either I misunderstood the author and this example, or I do know C.

mauricioc 9 years ago | |

Judging by the .cc extension, you are compiling this with a C++ compiler. Quoting from Annex C (which documents the incompatibilities between C++ and ISO C) of the C++ standard:

   Change: C++ does not have “tentative definitions” as in C. E.g., at
   file scope,

   int i;
   int i;

   is valid in C, invalid in C++. This makes it impossible to define
   mutually referential file-local static objects, if initializers are
   restricted to the syntactic forms of C. For example,

   struct X { int i; struct X *next; };
   static struct X a;
   static struct X b = { 0, &a };
   static struct X a = { 1, &b };

   Rationale: This avoids having different initialization rules for
   fundamental types and user-defined types.
   
   Effect on original feature: Deletion of semantically well-defined
   feature.

   Difficulty of converting: Semantic transformation.

   Rationale: In C++, the initializer for one of a set of
   mutually-referential file-local static objects must invoke a
   function call to achieve the initialization.

   How widely used: Seldom.

Kenji 9 years ago | | |

*facepalm* Of course, even if I use gcc, compiling a.cc switches it to the C++ compiler. Thanks.

  void baz(struct foo *foo, struct bar *bar) {
    union {
      struct foo foo;
      struct bar bar;
    } *foo_u = (void *)foo, *bar_u = (void *)bar;
    foo_u->foo.i = 0;
    bar_u->bar.i++;
  }

> The <sys/socket.h> header shall define the sockaddr_storage structure, which shall be:
>
> Large enough to accommodate all supported protocol-specific address structures
>
> Aligned at an appropriate boundary so that pointers to it can be cast as pointers to protocol-specific address structures and used to access the fields of those structures without alignment problems