How to find size of an array in C without sizeof (arjunsreedharan.org)

420 points by ashishb4u 9 years ago | 203 comments

The result you get with this trick is signed, while the result you get with sizeof is unsigned.

Edit: Just to clarify, what you get is ptrdiff_t instead of size_t. So if array size is greater than PTRDIFF_MAX, you get undefined behavior [1].

[1] http://en.cppreference.com/w/c/types/ptrdiff_t
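A minimal sketch of the signedness difference, assuming a hosted C11 compiler; the function names are hypothetical. Comparing each count against -1 makes the difference visible: in the size_t comparison, the usual arithmetic conversions turn -1 into SIZE_MAX.

```c
#include <stddef.h>

static int arr[5];

/* sizeof-based count: the quotient has the unsigned type size_t. */
size_t count_sizeof(void) {
    return sizeof arr / sizeof arr[0];
}

/* Pointer-difference count: subtracting two int * values yields the
   signed type ptrdiff_t. */
ptrdiff_t count_ptrdiff(void) {
    return (&arr)[1] - arr;
}

/* In count_sizeof() > -1, the -1 is converted to size_t (SIZE_MAX), so the
   comparison is false; the ptrdiff_t comparison is an ordinary signed
   comparison and is true. */
int sizeof_greater_than_minus_one(void)  { return count_sizeof() > -1; }
int ptrdiff_greater_than_minus_one(void) { return count_ptrdiff() > -1; }
```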

tedunangst 9 years ago | |

As far as I know, every compiler is badly broken with arrays greater than SIZE_MAX / 2, so this would be the least of your troubles.

pascal_cuoq 9 years ago | | |

Since I have it at hand, here is a list of examples of how compilers are broken if an array is larger than SIZE_MAX / 2 (which is called PTRDIFF_MAX in the post):

http://trust-in-soft.com/objects-larger-than-ptrdiff_max-byt...

mannykannot 9 years ago | |

Is there a circle in hell reserved for C standards committee members who add to the number of cases where 'undefined behavior' occurs in the standards?

microtherion 9 years ago | | |

Just how would you make this case defined? The alternatives to undefined behavior tend to be (1) being silent about the issue and letting users and implementers find out themselves (2) defining the behavior in a way that makes it difficult to implement on some architectures or (3) defining the behavior in a way that imposes costs on all architectures. None of these is particularly attractive.

kbart 9 years ago | | |

Read "undefined behavior" as "depending on architecture and compiler". It's not that anything can happen; it's simply not feasible to describe every architecture and every compiler in a standard. Sure, somebody is free to write an implementation where a nuke is launched every time "undefined behavior" is encountered, and they would be right according to the C standard, but in the real world, you pretty much know what to expect on a given system.

michaelmior 9 years ago | | |

I hope not. One of the reasons for allowing undefined behaviour is to allow room for compilers to perform certain optimizations that may not be possible if a specific result were required.

flamedoge 9 years ago | |

How likely are you to run into an array bigger than 2 GB?

quotemstr 9 years ago | | |

The "how likely is it, really?" response to questions of technical correctness has always bothered me. It takes a mindset completely alien to mine to say "Here's a race condition. Sure, it's undefined behavior, but the race is narrow, so it's rare" or to say "Sure, memory allocation can theoretically fail, but in practice almost never does" or to say "fsync is too slow and most computers have batteries these days".

Software is unreliable enough as it is due to problems beneath our notice. It seems reckless to avoid fixing problems that we do notice. Sure, you could argue that rare problems are rare and that users probably won't notice them --- this attitude is penny-wise and pound-foolish, because you can't meaningfully reason about a system that's only probably correct.

cperciva 9 years ago | | |

A few months ago I was doing FFTs on arrays larger than 4GB. Amusingly, this uncovered a bug in the LLVM optimizer: It was looking at stride lengths to figure out if accesses were independent, and truncated a 4GB stride down to 0.
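The arithmetic behind that kind of truncation, as a hypothetical C sketch (this is not LLVM's code, only the narrowing the comment describes): 4 GiB is 2^32 bytes, and keeping only the low 32 bits of 2^32 leaves 0.

```c
#include <stdint.h>

/* A 4 GiB stride held in a 64-bit integer: 4 * 2^30 == 2^32 bytes. */
uint64_t stride_64(void) {
    return UINT64_C(4) << 30;
}

/* Narrowing 2^32 to a 32-bit unsigned type keeps only the low 32 bits,
   all of which are zero, so the stride silently becomes 0. */
uint32_t stride_truncated_to_32(void) {
    return (uint32_t)stride_64();
}
```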

vram22 9 years ago | | |

Not likely, but possible. This reminds me of the bug that was found in the binary search algorithm a few years ago, IIRC, in Java. The interesting thing is that binary search is probably one of the earliest-invented algorithms. Yet, in the book Writing Efficient Programs by Jon Bentley (which I mentioned in a recent HN comment), he says that in a class he taught to several industrial programmers with many years of experience, some had bugs in the binary search implementations he set them as an exercise. I'm not sure, but I think I remember reading in the article about the Java binary search issue that even his implementation had the same bug. It was probably not found earlier because it only occurred with extremely large arrays, IIRC. I don't have a link right now, but it can probably be found by searching for the right phrase.
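The widely reported cause of the Java bug was the midpoint computation (low + high) / 2, where the sum overflows once both indices exceed about 2^30. A minimal C sketch of the usual fix, with hypothetical function names; note that in C this signed overflow is undefined behavior rather than a wraparound.

```c
/* Overflow-free midpoint for 0 <= lo <= hi: hi - lo always fits in an int,
   unlike lo + hi, which can exceed INT_MAX. */
int mid_safe(int lo, int hi) {
    return lo + (hi - lo) / 2;
}

/* Binary search over a sorted array using the overflow-free midpoint.
   Returns the index of key, or -1 if it is absent. */
int binary_search(const int *a, int n, int key) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = mid_safe(lo, hi);
        if (a[mid] < key)
            lo = mid + 1;
        else if (a[mid] > key)
            hi = mid - 1;
        else
            return mid;
    }
    return -1;
}
```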

loeg 9 years ago | | |

It's basically bogus to have a single object bigger than or equal to half the address space (as represented by size_t) in C. 32-bit platforms should detect and abort in such conditions (the compiler/linker for static objects, the malloc() implementation for dynamic allocations).

dragandj 9 years ago | | |

Today, with ML, big data and similar applications, that might happen often.

minipci1321 9 years ago | | |

Probably not very likely, but keep in mind that this method could also be used without actually allocating the array, akin to the classic hand-rolled 'offsetof()' macro (which is undefined behavior).

icedchai 9 years ago | | |

On a 64-bit platform (anything modern), ptrdiff_t is going to be 64-bit so this will not be an issue (ok, 63-bit... but you get my point.)

kabdib 9 years ago | | |

Often enough; "pack" files in video games are often many GB. Memory-map one of those and there you are . . .

sfifs 9 years ago | | |

Oh surprisingly easily. Say you're handling a few billion cookies in RAM or manipulating DNA data.

hsivonen 9 years ago | | |

It's quite easy to serve over 2 GB of spaces over the network as a small compressed response (gzip, brotli).

millstone 9 years ago |

I'm surprised at all of the comments calling this stupid or pointless. The point is not that you should use this trick in lieu of sizeof; the point is to shed light on a subtlety of C arrays.

btrask 9 years ago | |

I suspect this article made a lot of people feel stupid, or in other words, it taught us something. Sometimes the ego goes unchecked.

I think the article is well-presented and educational.

gragas 9 years ago | | |

>I think this article made a lot of people feel stupid

I don't think so. Anyone with a solid understanding of C understands pointer arithmetic. I think the article isn't obvious only to those who have a weak understanding of the language.

anigbrowl 9 years ago | |

Quite. This is exactly the sort of thing that makes C such a fun language.

shmerl 9 years ago | | |

I'm not sure it's praise for C, though. Arcane design and lack of clarity might be fun to decipher, but it's not something that you'd want to see in a programming language.

jheriko 9 years ago | |

personally the issue i take with this article is that it displays an opinion that is counterproductive to learning (imo).

rather than calling out that pointer arithmetic implicitly relies on 'sizeof' in order to be useful, it's treated like some kind of magic. i.e. i don't think it points out the connection, which is not subtle but rather obvious, and instead distracts from it...

JBiserkov 9 years ago | | |

Your comment:

>rather than calling out that pointer arithmetic implicitly relies on 'sizeof'

Article:

>arr has the type int *, where as &arr has the type int (*)[size].

For me this is calling out the implicit use of sizeof by pointing out the type.
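A small sketch of the point about types, assuming a 5-element int array (function names are hypothetical): the pointee of arr is one int, the pointee of &arr is the whole int[5], and + 1 moves each pointer by its pointee's size, which is where the implicit sizeof lives.

```c
#include <stddef.h>

static int arr[5];

/* arr decays to int *, so it points at a single int. */
size_t pointee_size_of_arr(void) {
    return sizeof *arr;                 /* sizeof(int) */
}

/* &arr has type int (*)[5], so it points at the whole array. */
size_t pointee_size_of_addr(void) {
    int (*p)[5] = &arr;
    return sizeof *p;                   /* sizeof(int[5]) */
}

/* Adding 1 advances each pointer by the size of its pointee.
   Measured in bytes via char pointers: */
ptrdiff_t byte_step_of_arr(void) {
    return (char *)(arr + 1) - (char *)arr;
}

ptrdiff_t byte_step_of_addr(void) {
    return (char *)(&arr + 1) - (char *)&arr;
}
```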

pwython 9 years ago | |

You've been on here 4 years and are surprised at the top comments criticizing the content of a post? :)

There's a reason this meme exists: http://i.imgur.com/Z6pFTjj.jpg

arjun024 9 years ago |

Author of the article here. There's no intention here to encourage people to use this in code (in fact, the opposite). This article is more of a "Did you know cool shit like this exists?".

mynameisbahaa 9 years ago | |

Please fix your site's header :)

mynameisbahaa 9 years ago | | |

I dug deeper and the problem was at my end. The computer I am using has parental control software installed and configured to block certain websites, including Twitter, which caused the author's site not to load all the needed assets and broke the page header. Sorry for the inconvenience, but I would have figured it out sooner if the people who downvoted my comment had taken the time to say that the site was working fine for them!

arjun024 9 years ago | | |

I'd appreciate it if you could provide a screenshot :)

Stratoscope 9 years ago |

Whether you use this method of getting the number of elements in an array or the more traditional sizeof method, please encapsulate the logic in a macro.

Instead of writing either of these:

  size_t length = sizeof array / sizeof array[0];

  size_t length = (&array)[1] - array;

Define this macro instead:

  #define countof( array )  ( sizeof(array) / sizeof((array)[0]) )

Or if you must:

  #define countof( array )  ( (&(array))[1] - (array) )

And then you can just say:

  size_t length = countof(array);

Edit: I used to call this macro 'elementsof', but it seems that 'countof' is a more common name for it and is a bit more clear too - so I'm going to run with that name in the future.
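A sketch with both variants side by side; COUNTOF_SIZEOF and COUNTOF_PTRDIFF are hypothetical names chosen so the two definitions don't clash. The result types differ (size_t vs ptrdiff_t), and neither macro gives a meaningful answer when handed a pointer instead of an array.

```c
#include <stddef.h>

/* Hypothetical names for the two countof variants above. */
#define COUNTOF_SIZEOF(array)  ( sizeof(array) / sizeof((array)[0]) )  /* size_t */
#define COUNTOF_PTRDIFF(array) ( (&(array))[1] - (array) )             /* ptrdiff_t */

static double samples[7];
static char name[] = "countof";   /* 7 characters plus the terminating '\0' */

size_t samples_count_sizeof(void)     { return COUNTOF_SIZEOF(samples); }
ptrdiff_t samples_count_ptrdiff(void) { return COUNTOF_PTRDIFF(samples); }
size_t name_count(void)               { return COUNTOF_SIZEOF(name); }
```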

icedchai 9 years ago |

Interesting. I've been working with C for almost 30 years (first taught it to myself when I was 14) and never thought about the actual type of array.

psyc 9 years ago | |

You're not alone. I've been programming in either C or C++ for 25 years, and it wouldn't have occurred to me that you can have a "pointer to array of size N" that includes the size. Though I probably could have been led there with a little Socratic questioning.

int_19h 9 years ago | | |

The reason why people don't usually run into this is because C tries really hard to decay your arrays to pointers to the first element, so there are very few cases where it actually comes up; sizeof(array) and &array are among the few. On top of that, writing down the type of a pointer to such an array is not exactly obvious, and requires parentheses:

    int (*p)[10];

This all is much more interesting in C++, because there, in conjunction with references, this lets you write functions that take arrays as arguments and know their length. Like so:

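    #include <cstddef>
    #include <iostream>

    template<std::size_t N>
    void foo(const int (&a)[N]) {
        for (std::size_t i = 0; i < N; ++i)  // N is deduced from the argument's type
            std::cout << a[i];
    }

    int main() {
        int a[10] = {};
        foo(a);  // N is deduced as 10
    }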

jjnoakes 9 years ago | | |

If you start thinking about two dimensional arrays, you'll probably get close quickly.

cbsmith 9 years ago | |

Which kind of explains so much about the problem with C. ;-)

jjnoakes 9 years ago | | |

I think it more explains that you can do a lot without fully understanding what it is you are working with.

Which can be good or bad.

minipci1321 9 years ago |

For completeness' sake, the size of an array can also be computed via linker symbols; see for example: http://stackoverflow.com/questions/29901788/finding-the-last....

The same constraints apply (pointer arithmetic).

I am not sure why this method, applied to ordinary arrays, would be preferred to sizeof (), but since we're shedding light here...

EDIT: pointer arithmetic constraints only apply if we compute the difference (end - beg) in the C code. We could also do that in the linker script itself, and I don't recall whether or not the C semantics of ptrdiff_t would be preserved in that case. Such preservation doesn't seem very probable to me, so potentially this method might avoid overflows (or move them much higher); to be checked in the 'ld' docs!
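A hedged sketch of the linker-symbol approach, assuming a GNU-compatible toolchain targeting ELF (GCC or Clang with GNU ld or lld), where the linker provides __start_SECNAME and __stop_SECNAME for an output section whose name is a valid C identifier. The section name my_table is hypothetical, and none of this is portable C.

```c
/* Assumption: GNU-style linker on ELF; "my_table" is a hypothetical
   section name.  `used` keeps the otherwise unreferenced array. */
static const int table[] __attribute__((section("my_table"), used)) =
    { 10, 20, 30, 40 };

/* Bounds of the section, defined by the linker, not by any C source. */
extern const int __start_my_table[];
extern const int __stop_my_table[];

/* Element count computed from the linker-provided bounds.  Strictly
   speaking the two symbols are not the same C object, so this pointer
   subtraction is outside what the C standard guarantees. */
long table_count_from_linker(void) {
    return (long)(__stop_my_table - __start_my_table);
}
```

As the reply below this comment asks, the result also depends on the linker not padding the section.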

tlb 9 years ago | |

Do all linkers guarantee not to round this up to a word size?

pmiller2 9 years ago |

Was anyone else's first thought "Hmm... cool," followed by "I hope nobody asks me this in an interview?"

exabrial 9 years ago | |

If you are asked this in an interview, it's no longer an interview... I would simply reply "what circumstances would dictate the necessity of such a trick rather than producing clean code for my coworkers?"

pmiller2 9 years ago | | |

Hence why I hope no one asks me it. :)

p1esk 9 years ago | |

I actually thought: "cool... I hope someone asks me this in an interview!"

hnfairy 9 years ago |

Despite the argument at the end, this is undefined behavior in the latest C specification. The code dereferences a pointer one past the last element.

C11 6.5.6/8:

If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated

feelix 9 years ago | |

"it shall not be used as the operand of a unary * operator that is evaluated"

He doesn't use the * operator on it; he just calculates its position. If he were to access it (i.e., use it with *), that would break the rule.

dom0 9 years ago | |

The snippet only calculates the pointer and does not dereference it. Should be fine.

Buge 9 years ago | | |

It's a complicated situation. There's a pointer to an array, and that pointer is dereferenced, resulting in an array (that then decays to a pointer). But that second array/pointer is not dereferenced. I'm not sure if it's legal.
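Since E1[E2] is defined as (*((E1)+(E2))), the expression really does contain a unary * applied to &arr + 1. A sketch making that explicit, plus a cast-based variant (my own, not from the article) that avoids the * at the cost of relying on a pointer conversion instead; whether either is strictly conforming is exactly what is being debated here. Function names are hypothetical.

```c
#include <stddef.h>

static int arr[5];

/* The article's form.  (&arr)[1] means *((&arr) + 1). */
ptrdiff_t count_subscript(void) {
    return (&arr)[1] - arr;
}

/* The same expression with the hidden unary * written out: &arr + 1 is a
   valid one-past-the-end pointer, and * is applied to it, yielding an
   int[5] lvalue that decays to int * without reading any element. */
ptrdiff_t count_star(void) {
    return *(&arr + 1) - arr;
}

/* A variant that never applies * to the one-past-the-end pointer: it
   converts the int (*)[5] directly to int * instead. */
ptrdiff_t count_cast(void) {
    return (int *)(&arr + 1) - arr;
}
```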

amelius 9 years ago | |

I think the authors of the spec really meant something else: reading/writing a memory location past the end of the array is illegal. But here "*" is used only in an address computation, not to actually access memory.

Shows how difficult it is to get a spec right.

So, IMO, you are right, the code in the article is illegal (strictly speaking).

But I think it is likely that most compilers would still allow it, because that clause in the spec essentially exempts the compiler from adding an explicit bounds check.

bonzini 9 years ago | | |

I don't think this is illegal. What is the clause in the spec that allows &arr[1]? I would try and see if it also applies to (&arr)[1].

angry_octet 9 years ago |

While this is as interesting as any c arcana, I truly hope that people are not passing around pointers to arrays and then using sizeof(array)/sizeof(elem) to figure out how big they are, like they are stuck in a first year programming assignment that denies them the use of malloc, so they use C99 VLAs everywhere.

gruez 9 years ago |

How is this better than the sizeof method? This looks like a clever way to access sizeof information without explicitly using the sizeof operator.

jjnoakes 9 years ago | |

It isn't better. I don't think it was claimed to be.

But if you really understand C, it should also not be a surprise that it works this way.

cbsmith 9 years ago | |

I think it is better in exactly zero ways.

It is, nonetheless, different.

Nimitz14 9 years ago |

Why do we dereference the array pointer? Wouldn't that give us the value at the address when we just want the address? Also, wouldn't the subtraction just give us a number of bytes, so we'd still need to divide by sizeof(int)?

clarry 9 years ago | |

Pointer arithmetic works element-wise, not byte-wise.

So if p is a pointer, then p+1 refers to the next element after p, regardless of the size of the pointee. And so (p+1) - p is 1, again regardless of the size of the pointee.

In this case, &arr is a pointer to array, and &arr + 1 would point to the next array following the first one. But we wanted to calculate the number of elements in the array, not the fact that we have one array. So we dereference the pointer, thus getting an array type, which in turn "decays" to a pointer to the first element of the array, which has the right type for counting the elements using pointer arithmetic.
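A sketch of the element-wise behavior described above, with hypothetical names: both single steps have a difference of 1, and only after dereferencing the pointer-to-array (so that it decays to int *) does the subtraction count ints.

```c
#include <stddef.h>

static int arr[5];

/* p + 1 advances by one element of p's pointee type, so the difference
   is 1 whether the pointee is an int or an int[5]. */
ptrdiff_t int_ptr_step(void) {
    int *p = arr;
    return (p + 1) - p;
}

ptrdiff_t array_ptr_step(void) {
    int (*q)[5] = &arr;
    return (q + 1) - q;          /* one whole array, not 5 ints */
}

/* Dereferencing gives int[5] lvalues that decay to int *, so the
   subtraction counts ints: no division by sizeof(int) is needed. */
ptrdiff_t element_count(void) {
    int (*q)[5] = &arr;
    return *(q + 1) - *q;
}
```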

Nimitz14 9 years ago | | |

Thank you.

jheriko 9 years ago |

there is a classic mistake here... the idea that pointer arithmetic does not rely on sizeof.

that's the entire mystery opened and closed afaik. sure you can use some obscure notation if you like, but why not just use sizeof?

russkrayer 9 years ago |

Thanks for posting this question. The responses are very interesting.

halayli 9 years ago |

this is undefined behavior. &arr + 1 can overflow. There's no guarantee &arr isn't near memory end boundary. &arr + 1 is converted at compile time to rbp - X where X is an integer determined by the compiler similarly to how sizeof works.

Basically ptr + integer requires the compiler to determine the sizeof ptr's type.

cperciva 9 years ago | |

this is undefined behavior. &arr + 1 can overflow

No. From 6.5.6 Additive operators:

7 For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

8 [...] if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

So &arr + 2 can overflow, and &arr + 1 cannot be dereferenced, but &arr + 1 shall not overflow and is not undefined behaviour.
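A sketch of the two rules quoted above, assuming C99 or later: a single object counts as an array of length one (6.5.6/7), so a one-past-the-end pointer may be formed for it, and one-past-the-end pointers are also what the ordinary half-open loop relies on. Function names are hypothetical.

```c
#include <stddef.h>

/* For a single int x, &x + 1 is a valid one-past-the-end pointer: it may
   be computed and compared, but not dereferenced. */
ptrdiff_t single_object_span(void) {
    int x = 42;
    int *end = &x + 1;
    return end - &x;
}

/* The same rule applied to the whole array: &arr points to a single
   int[5] object, so &arr + 1 is one past it. */
ptrdiff_t whole_array_span(void) {
    static int arr[5];
    int (*end)[5] = &arr + 1;    /* valid to form, not to access */
    return end - &arr;
}

/* One-past-the-end pointers make the usual half-open loop legal. */
int sum(const int *a, size_t n) {
    int total = 0;
    for (const int *p = a; p != a + n; ++p)
        total += *p;
    return total;
}
```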

halayli 9 years ago | | |

But arr != &arr even though they have the same value. #8 applies to arr (P), but in the post OP is using &arr which is a ptr to array[x] and doesn't apply to it.

mnarayan01 9 years ago | | |

So then I guess malloc can't return an allocation which actually goes to the end of the address space, but has to leave at least one extra byte to avoid overflow? That's pretty interesting, though I guess it certainly makes sense.

Edit: Also now that I think about it, I've written code that relied on that behavior...not sure if I'd heard it before and internalized and forgot it, or just was being foolish.

cbsmith 9 years ago | |

Nope, you have guarantees about checking the address of one element past the end of an array. Think of all the bugs you'd otherwise enjoy...

Etheryte 9 years ago |

Given how many bugs & errors stem from simple failures in range checks, etc., I would much rather go with the tried and true way than use something "clever".

Quoting http://stackoverflow.com/a/16019052/1470607

  Note that this trick will only work in places where `sizeof` would have worked anyway.
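A sketch of why, with hypothetical function names: an array parameter is adjusted to a pointer, so inside the callee neither sizeof nor the (&arr)[1] trick sees the array's length (many compilers warn about sizeof on such a parameter). Passing a pointer to the array keeps the length in the type.

```c
#include <stddef.h>

/* The parameter declared as int a[5] is adjusted to int *a, so sizeof a
   is the size of a pointer, not of five ints. */
size_t bytes_seen_by_callee(int a[5]) {
    return sizeof a;
}

/* A pointer-to-array parameter keeps the length in the type, so both the
   sizeof and the pointer-difference counts work. */
size_t count_via_array_pointer(int (*a)[5]) {
    return sizeof *a / sizeof((*a)[0]);
}

ptrdiff_t count_via_array_pointer_ptrdiff(int (*a)[5]) {
    return a[1] - a[0];          /* same as (&arr)[1] - arr in the caller */
}
```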

Animats 9 years ago | |

Yes. This only works for arrays on the stack, at best. It assumes that arrays are placed on the stack in the order of declaration, which is not a requirement of the C standard and may differ between compilers.

Unless you're writing a buffer overflow exploit, in which case you need to know exactly what's on the stack and where, this isn't a good way to program.

Update: misread the article; thought he was differencing with the beginning of the next array.

dllthomas 9 years ago | | |

I don't see how the code assumes anything about the placement of the array. Indeed, it works just fine for static arrays:

    $ cat test.c
    #include <stdio.h>
    
    int arr[5];
    
    int main(int argc, char *argv[]) {
        printf("%zu, %td\n", sizeof(arr) / sizeof(*arr), (&arr)[1] - arr);
    }
    
    $ gcc test.c && ./a.out
    5, 5

Not saying it's "a good way to program" - it's needlessly obfuscated compared to the standard sizeof alternative. But it doesn't rely on anything tricky.

ycmbntrthrwaway 9 years ago | | |

> It assumes that arrays are placed on the stack in the order of declaration

I am not sure it is the case here. The code uses only one array, how can it assume the order of arrays?

utopcell 9 years ago |

nice exposition of c array types.

in c++, a compile-time equivalent to sizeof would be:

  template<typename T, std::size_t N> constexpr std::size_t sz(T(&)[N]) noexcept { return N; }

angeladur 9 years ago |

I would do this only when I am obfuscating code.

disposablezero 9 years ago |

Many implementations historically also allocated enough memory to include one extra element at the end of the array.

tedunangst 9 years ago | |

I find this improbable.

angry_octet 9 years ago | | |

Agreed, compiler implementors rarely decide to use more memory than is required. There may be a stack canary, but this is between stack allocated variables and control flow structures, not for every array.

slobdell 9 years ago |

That pun in the first sentence alone made the article worth it.

stirner 9 years ago |

The printf commands say "the address of..." but proceed to print out the value, not address.

hmottestad 9 years ago | |

Looks fine to me. An address is just a number, this one being hex encoded.

stirner 9 years ago | | |

Okay. In my experience "the address of x" is taken to be synonymous with "&x", but I suppose that's a pedantic difference.

Chinjut 9 years ago |

C is such a boondoggle of a language... We're condemned to forever explore its every weird nook and cranny for historical reasons, rather than because it is the cleanest, best approach to things possible.

minipci1321 9 years ago | |

C for sure has its weird sides, but it does appear much more logical and consistent when observed "from below", from a how-the-hardware-runs perspective.

For example, the shift operators have higher precedence than bitwise masking (and/or/xor) since this way the expressions setting/clearing ranges of bits won't require parentheses (so increased readability) and the masking constants in them will be the narrowest. Loading a wide immediate value into a register sometimes takes several instructions, so such precedence also brings in the least cost as well (nowadays compilers take care of that to some extent).
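A sketch of the precedence rule described above (the reasons given for it are the commenter's; the parse itself is standard C): shifts bind tighter than &, ^ and |, so bit-range expressions need no parentheses around the shift. Function names are hypothetical.

```c
/* Parses as x & (0xFu << 4): keeps bits 4..7. */
unsigned extract_bits_4_to_7(unsigned x) {
    return x & 0xFu << 4;
}

/* Parses as x | (1u << 7): sets bit 7. */
unsigned set_bit7(unsigned x) {
    return x | 1u << 7;
}

/* Unary ~ binds tighter than <<, so here the parentheses are required. */
unsigned clear_bits_4_to_7(unsigned x) {
    return x & ~(0xFu << 4);
}
```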

But people frequently get this aspect wrong and use lots of parentheses (ending up with wide masks), saying the rule is not intuitive. It is.

armitron 9 years ago | | |

You could attempt to rationalize some of its (terrible) design decisions after-the-fact by finding convenient examples, but compared to the clarity and surety of straight-up assembly, C is a dystopian nightmare of enormous unseen complexity and undefined behavior.

Maro 9 years ago |

I haven't written C in a while, but I think this is pretty stupid. sizeof() is a compile-time thing in C (VLAs aside), so it's substituted with a number by the time you get an executable. See:

http://stackoverflow.com/questions/671790/how-does-sizeofarr...

I think this is effectively doing the same thing, but in a non-standard way; i.e., I think `int n = (&arr)[1] - arr;` is substituted with the actual number by the compiler, the same way sizeof() would be, only no one will know wtf is going on.

Disclaimer: I didn't look at the generated code to confirm; I guess it could even be compiler/runtime dependent.
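A sketch of where the two forms actually differ, assuming C11 for _Static_assert: for a non-VLA array, the sizeof quotient is an integer constant expression, while (&arr)[1] - arr is not one in standard C (pointer subtraction isn't permitted in integer constant expressions), even if an optimizer folds it to a constant in ordinary code. Names are hypothetical.

```c
#include <stddef.h>

static int arr[5];

/* The sizeof quotient is an integer constant expression, so it may appear
   wherever C requires a compile-time constant. */
_Static_assert(sizeof arr / sizeof arr[0] == 5, "arr should hold 5 ints");

static char copy_buffer[sizeof arr / sizeof arr[0]];   /* as an array bound */

/* The pointer-difference form can only be used as an ordinary expression;
   the compiler may still evaluate it at compile time as an optimization. */
ptrdiff_t runtime_count(void) {
    return (&arr)[1] - arr;
}

size_t buffer_len(void) {
    return sizeof copy_buffer;
}
```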

dllthomas 9 years ago | |

I don't think anyone is proposing that people use this. I read it as an exercise to stretch our understanding of other bits of the language.