How to Think About Variables in C

How to Think About Variables in C (denniskubes.com)

38 points by denniskubes 13 years ago | 59 comments

voidlogic 13 years ago |

Extremely uninteresting: it is like a page of "CS 1XX: Intro to C" fell out of its bindings and landed on Hacker News.

This might have been mildly interesting if it had included the assembly for a few different architectures (x86, MIPS, ARM, PowerPC, etc.), showing how the C code was translated to assembly for each. It could have been very interesting with an additional discussion of memory barriers and atomic operations in C and their relation to assignments and pointers.

holyjaw 13 years ago | |

Amendment: 'Extremely uninteresting' -TO YOU-.

As someone who has had difficulty picking up real programming languages, and has only found some marginal success due to Obj-C's ARC feature, I can tell you this puts everything I've read into much better perspective.

Try not to be so negative, man, I think it's clear you weren't even the intended target anyways.

minimax 13 years ago | |

HN has a pretty broad audience and a pretty big chunk of it doesn't know $language. These types of beginner posts for $language pop up from time to time. It's nothing to worry about.

voidlogic 13 years ago | | |

$language in this case is C, the lingua franca of computing.

It is almost always the first language ported to any system, almost every computer science program at least covers the basics, it has been in 1st/2nd place on the TIOBE index for over a decade, it's the 5th most popular language on GitHub by commits, and it is over 40 years old.

But I'm willing to accept there might be people on Hacker News who don't know C; that's why I gave the author suggestions for expanding the content and making it interesting to a wider audience. That was the point of my post.

mturmon 13 years ago | | |

Posts on elementary topics (should be) noteworthy only if mastery is exhibited. Hence, griping.

ultimoo 13 years ago | |

I agree. I liked the opening line though: "C is memory with syntactic sugar." It is a good introductory article for someone who has never used C -- CS-1xx Intro as you said.

greenyoda 13 years ago | | |

"Syntactic sugar" generally means a syntax that's just a nicer-looking version of something that can be equivalently expressed in a more fundamental syntax. But C is more than that: it provides a way of abstracting away the details of the machine so that you don't have to explicitly deal with the fact that your machine has 64-bit pointers and 2's complement integer arithmetic and IEEE floating point and an instruction set that handles shift operations in a particular way.

So a better formulation might be: "C provides an abstraction layer on top of a computer's memory model and instruction set that will allow your code to be portable between different machine architectures, but only if you play strictly by the rules."

By the way, the classic K&R book explains the fundamentals of C pretty well. If you really want to understand C, I'd recommend reading it cover to cover (it's pretty short).

denniskubes 13 years ago | |

I was trying to describe a simple mental model that has been helpful to me. While I agree assembly details would have been interesting, putting that in would have lost more than half the audience.

nemetroid 13 years ago | | |

> putting that in would have lost more than half the audience.

I surely hope not.

blt 13 years ago | |

The least they could have done is explain how structs work.

haberman 13 years ago |

There are some subtle problems with the model as explained in this article. If you use this as your mental model, you will probably run afoul of undefined behavior without realizing it.

If you read the C standard, you'll notice it doesn't talk much about "memory" (the word only appears 13 times in C99); it mostly talks about "objects" (mentioned 735 times in C99). These objects aren't OO-objects -- obviously C doesn't have OOP built in -- but rather regions of storage holding values of the basic types like int, float, struct, etc. When you declare a variable like "int x", you are creating an object.

C's aliasing rules dictate that you can only access an object via a pointer of that object's actual type (the main exceptions are qualified and signed/unsigned variants of that type, and character types such as unsigned char). This is why it is dangerous to think of the assignment operator as a simple memory-copying operation. If assignment were a simple memcpy, you could do something like this:

  int x = 5;
  // BAD: undefined behavior, violates aliasing.
  short y = *(short*)&x;

If a variable were just a memory address and assignment were just a memory copy, this would be a valid operation. But the right way to think of it is that a variable is a storage object whose address can be taken, and a dereference is an operation that reads a storage object.

A pointer isn't a generic memory-reading facility, it must actually point to a valid storage object of the pointer's type (or to NULL).

If you do want to read and write arbitrary objects in memory, you can always use memcpy():

  #include <string.h>

  int x = 5;
  short y;
  // This is fine, and smart C compilers optimize away the
  // function call. Which bytes of x land in y depends on
  // the platform's byte order.
  memcpy(&y, &x, sizeof(y));

_kst_ 13 years ago |

"A data type is a number of bytes to the compiler."

The size of a type is just one of its many attributes. Even if, for example, "long", "float", and "void* " happen to have the same size, they're still very distinct types.

"Integer data types are defined in the limits.h file. Float data types are defined via macros in the floats.h file."

Integer and floating-point types are defined by the compiler, guided by the hardware and the ABI for the platform. <limits.h> and <float.h> document the characteristics of the predefined numeric types.

"A pointer doesn’t hold a memory address, it holds a number that represents a memory address."

Sure, and a floating-point object is ultimately just a collection of bits -- but that's hardly the best way to think about either of them. Integers and pointers (addresses) are logically very distinct things, even if they happen to have similar representations. For example, the addresses of two distinct variables have no defined relationship to each other (other than being unequal); just evaluating (&x < &y) has undefined behavior.

C lets you get away with a lot of type-unsafe stuff, particularly if you resort to pointer casts, but it's fundamentally much more strongly typed than the author seems to think it is.

revelation 13 years ago | |

See also: strict aliasing

dllthomas 13 years ago |

  1  int x = 10;
  2  &x = 20;    // this doesn't work
  3  *(&x) = 20; // this does work

Why does line 2 &x not work but line 3 does? Because &x returns a pointer, a number representing a memory address. This is an important distinction. A pointer doesn’t hold a memory address, it holds a number that represents a memory address.

=======

No, that is not why. Note that the following does work:

  int * x = 0;

and the following works, though typically yields a warning:

  int * x = 20;

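Line 2 fails because & doesn't give back an l-value: &x is a pointer value, not an object you can assign to, whereas unary * applied to a pointer does yield an l-value.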

asveikau 13 years ago |

> Every variable is a starting memory address to the compiler.

Definitely not true. More like, "it will have an address, if you take the address with the & operator". Otherwise, the compiler is quite free to store locals in registers.

denniskubes 13 years ago | |

> Yes I am being simplistic and yes certain data types have certain syntactic sugar but I have found this to be a good mental model

As stated in the post.

mturmon 13 years ago | | |

I think you're going to keep getting comments on these ill-considered asides, but here is another problem:

"In most assembly languages, data types don’t exist. You operate on bytes and offsets."

This is just not true.

Most assembly languages (I learned on PDP-11 assembler, which I remember best, but what I say is true of 68000 and x86 too) have a notion of a byte, but also integers of various word lengths, and floating point numbers.

In fact, some registers are in effect designated as "pointers" for various kinds of conventional indirect addressing (the instruction pointer, the register holding the stack pointer, and others).

In this sense, C is even closer to assembly than you indicate, because the data types are so analogous.

asveikau 13 years ago | | |

This reminds me of another comment I had: I personally find the phrase "syntactic sugar" irritating. As used, I don't feel like it adds anything to the blog post. IMO you could write nothing there and it'd make the exact same point.

What exactly is the "syntactic sugar" that hides the idea that names can have addresses? Structs? Some specific kind of expression? Array index syntax? The names themselves?

halayli 13 years ago | | |

Simplicity here doesn't help. Variables aren't about how they are stored and where but more about what gets applied to them and how.

snorkel 13 years ago |

Integers are the simple case, but you really haven't grasped the C memory model until you're comfortable handling text strings at any length, calling functions by pointers, working with structure pointers, and knowing when you need a pointer to a pointer. Part of it is understanding variable scope, local vs global vs stack frame memory. It's not rocket science, just takes practice, and the courage to segfault your way through it.

denniskubes 13 years ago |

What other mental models do people use to think about variables and memory? I would like to hear about them.

16s 13 years ago |

It sounds simple, but you'd be surprised how many programmers don't grok the fact that types/data have sizes (especially numeric types). For many tasks, this doesn't matter, but when it does matter, you need people who understand.

As an example, an IPv4 address is 32 bits. Don't convert it to a string and put it in a varchar(64) in your database when you are optimizing for space (I actually saw this once). And yes, the DB had an inet type, but no one knew how to use it, what it was or why it mattered.

__david__ 13 years ago |

My favorite bit of pointer code is one I had to write in the bootstrap code of an embedded processor:

    int r = ((int (*)())startAddress)(); // Wheeee!

derleth 13 years ago |

> C is memory with syntactic sugar and as such it is helpful to think of things in C as starting from memory.

http://en.wikipedia.org/wiki/Lie-to-children

> A lie-to-children, sometimes referred to as a Wittgenstein's ladder (see below), is an expression that describes the simplification of technical or difficult-to-understand material for consumption by children. The word "children" should not be taken literally, but as encompassing anyone in the process of learning about a given topic, regardless of age. [snip] Because life and its aspects can be extremely difficult to understand without experience, to present a full level of complexity to a student or child all at once can be overwhelming. Hence elementary explanations tend to be simple, concise, or simply "wrong" — but in a way that attempts to make the lesson more understandable.

OK, the very first sentence of this piece falls flat on its face when you begin to think about how a computer actually handles getting data into and out of the parts of the CPU that actually do the work of modifying data according to the opcodes in flight.

Specifically, C is meant to be a pleasant syntax for slinging data around a large, flat address space, where the assumption is that every part of the address space can be treated like any other, with no special consideration given to some locations being faster than others. (The 'register' keyword mucked with this a bit, but approximately nobody uses it anymore in new code. Just as well, because good compilers ignore it anyway; more below.)

This is horribly, hilariously wrong when you learn about the cache hierarchy, and becomes even more wrong when you throw an OS implementing virtual memory and a disk cache into the picture. C doesn't have any way to refer to the cache; you can't tell the compiler 'store this in cache' because that would break the abstraction C enforces.

So we loop back around: C enforces the abstraction for a good reason; namely, compilers are better than humans at scheduling memory use in practically every case, and in the few cases they aren't, you're doing something hardware-specific enough that you'll need to drop into assembly anyway. This is also why the 'register' keyword is effectively a no-op for optimization and has been for decades (in C its main remaining effect is that you can't take the address of a register variable). Compilers allocate registers better than humans because they know more about all of the optimizations in play, and when they can't, you'll have to drop into assembly anyway.

TL;DR: This is a basic introductory post. Nitpicking it for things that compilers take care of for you anyway is pointless.

denniskubes 13 years ago | |

Thank you.

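  #include <stdio.h>
  #include <stdint.h>

  void f(volatile uint32_t *x, volatile uint16_t *y) {
    *x = 5;
    // Reads a uint32_t object through a uint16_t lvalue: this
    // violates the aliasing rules, so the behavior is undefined.
    printf("%d\n", *y);
  }

  int main() {
    volatile uint32_t x = 10;
    f(&x, (volatile uint16_t*)&x);
  }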