How to find size of an array in C without sizeof (arjunsreedharan.org)
Edit: Just to clarify, what you get is ptrdiff_t instead of size_t. So if array size is greater than PTRDIFF_MAX, you get undefined behavior [1].
[1] http://trust-in-soft.com/objects-larger-than-ptrdiff_max-byt...
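To make the type point concrete, here is a minimal sketch, assuming a 5-element int array (the names arr, element_count_signed, and element_count are made up for illustration and are not from the article). Pointer subtraction yields a signed ptrdiff_t, while the sizeof idiom yields an unsigned size_t:

```c
#include <stddef.h>

static int arr[5];

/* Pointer subtraction produces a signed ptrdiff_t. &arr[5] is the
   one-past-the-end address, which may be formed but not dereferenced. */
ptrdiff_t element_count_signed(void)
{
    return &arr[5] - &arr[0];
}

/* The sizeof idiom produces an unsigned size_t instead. */
size_t element_count(void)
{
    return sizeof arr / sizeof arr[0];
}
```

Both return 5 here; the difference only matters once an object is big enough that its element count no longer fits in ptrdiff_t.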
Software is unreliable enough as it is due to problems beneath our notice. It seems reckless to avoid fixing problems that we do notice. Sure, you could argue that rare problems are rare and that users probably won't notice them --- this attitude is penny-wise and pound-foolish, because you can't meaningfully reason about a system that's only probably correct.
I think the article is well-presented and educational.
I don't think so. Anyone with a solid understanding of C understands pointer arithmetic. I think the article isn't obvious only to those who have a weak understanding of the language.
Rather than calling out that pointer arithmetic implicitly relies on 'sizeof' in order to be useful, it's treated like some kind of magic, i.e. I don't think it points out the not subtle but rather obvious connection, and instead distracts from it...
>rather than calling out that pointer arithmetic implicitly relies on 'sizeof'
Article:
>arr has the type int *, where as &arr has the type int (*)[size].
For me this is calling out the implicit use of sizeof by pointing out the type.
There's a reason this meme exists: http://i.imgur.com/Z6pFTjj.jpg
Instead of writing either of these:
size_t length = sizeof array / sizeof array[0];
size_t length = (&array)[1] - array;
Define this macro instead:
#define countof( array ) ( sizeof(array) / sizeof((array)[0]) )
Or if you must:
#define countof( array ) ( (&(array))[1] - (array) )
And then you can just say:
size_t length = countof(array);
Edit: I used to call this macro 'elementsof', but it seems that 'countof' is a more common name for it and is a bit more clear too - so I'm going to run with that name in the future.
int (*p)[10];
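A small sketch of the macro suggestion in use, assuming the sizeof-based definition (the primes table and sum_primes are hypothetical names picked for the example):

```c
#include <stddef.h>

/* Element count of a true array; parenthesized so it composes safely. */
#define countof(array) (sizeof(array) / sizeof((array)[0]))

static const int primes[] = {2, 3, 5, 7, 11, 13};

/* Sum every element without hard-coding the length. */
int sum_primes(void)
{
    int total = 0;
    for (size_t i = 0; i < countof(primes); ++i)
        total += primes[i];
    return total;
}
```

If an element is added to or removed from the table, the loop bound follows automatically.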
This all is much more interesting in C++, because there, in conjunction with references, this lets you write functions that take arrays as arguments and know their length. Like so:
template<size_t N>
void foo(const int (&a)[N]) {
    for (size_t i = 0; i < N; ++i)
        cout << a[i];
}
int a[10];
foo(a);
Same constraints apply (pointer arith).
I am not sure why this method, applied to ordinary arrays, would be preferred to sizeof (), but since we're shedding light here...
EDIT: pointer arith constraints only apply if we compute the difference (end - beg) in the C code. We could also do that in the linker script itself, and I don't recall whether or not C semantics of ptrdiff_t would be preserved in that case. Such preservation doesn't seem very probable to me, so potentially this method might make it possible to avoid overflows (or to move them much higher) -- to be checked in the 'ld' doc!
C11 6.5.6/8:
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated
He doesn't use the * operator on it; he just calculates its position. If he were to access it (i.e., use it with *), then that would be breaking the rule.
Shows how difficult it is to get a spec right.
So, IMO, you are right, the code in the article is illegal (strictly speaking).
But I think it is likely that most compilers would still allow it, because that clause in the spec essentially exempts the compiler from adding an explicit bounds check.
So if p is a pointer, then p+1 refers to the next element after p, regardless of the size of the pointee. And so (p+1) - p is 1, again regardless of the size of the pointee.
In this case, &arr is a pointer to array, and &arr + 1 would point to the next array following the first one. But we wanted to calculate the number of elements in the array, not the fact that we have one array. So we dereference the pointer, thus getting an array type, which in turns "decays" to a pointer to the first element of the array, which has the right type for counting the elements using pointer arithmetic.
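The step sizes described above can be checked directly. This is only a sketch, assuming a 5-element int array; the function names are invented for the example. It measures, in bytes, how far each kind of pointer moves when 1 is added to it:

```c
#include <stddef.h>

static int arr[5];

/* Stepping a pointer-to-array by one advances by the whole array. */
size_t bytes_stepped_by_array_pointer(void)
{
    int (*whole)[5] = &arr;   /* pointer to the array itself */
    return (size_t)((char *)(whole + 1) - (char *)whole);
}

/* Stepping a pointer-to-element by one advances by a single int. */
size_t bytes_stepped_by_element_pointer(void)
{
    int *first = arr;         /* arr decays to &arr[0] */
    return (size_t)((char *)(first + 1) - (char *)first);
}
```

The first function returns sizeof(int[5]) and the second returns sizeof(int), which is the whole difference between &arr + 1 and arr + 1.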
That's the entire mystery opened and closed, afaik. Sure, you can use some obscure notation if you like, but why not just use sizeof?
Basically ptr + integer requires the compiler to determine the sizeof ptr's type.
No. From 6.5.6 Additive operators:
7 For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
8 [...] if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
So &arr + 2 can overflow, and &arr + 1 cannot be dereferenced, but &arr + 1 shall not overflow and is not undefined behaviour.
Edit: Also now that I think about it, I've written code that relied on that behavior...not sure if I'd heard it before and internalized and forgot it, or just was being foolish.
Quoting http://stackoverflow.com/a/16019052/1470607
Note that this trick will only work in places where `sizeof` would have worked anyway.
Unless you're writing a buffer overflow exploit, in which case you need to know exactly what's on the stack and where, this isn't a good way to program.
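On the quoted remark that the trick only works where `sizeof` would have worked anyway: once an array is passed to a function it decays to a pointer, and neither approach can recover the length. A minimal sketch, with hypothetical names sum and sum_of_fixed_table:

```c
#include <stddef.h>

/* Inside this function `values` is an int pointer, despite the array
   syntax, so sizeof(values) would be the size of a pointer. The
   length therefore has to be passed in by the caller. */
long sum(const int values[], size_t length)
{
    long total = 0;
    for (size_t i = 0; i < length; ++i)
        total += values[i];
    return total;
}

/* At the call site the array type is still visible, so sizeof works. */
long sum_of_fixed_table(void)
{
    static const int table[] = {10, 20, 30, 40};
    return sum(table, sizeof table / sizeof table[0]);
}
```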
Update: misread the article; thought he was differencing with the beginning of the next array.
$ cat test.c
#include <stdio.h>
int arr[5];
int main(int argc, char *argv[]) {
printf("%zu, %td\n", sizeof(arr) / sizeof(*arr), (&arr)[1] - arr); /* %zu for size_t, %td for ptrdiff_t */
}
$ gcc test.c && ./a.out
5, 5
Not saying it's "a good way to program" - it's needlessly obfuscated compared to the standard sizeof alternative. But it doesn't rely on anything tricky.
I am not sure it is the case here. The code uses only one array, how can it assume the order of arrays?
in c++, a compile-time equivalent to sizeof would be:
template<typename T, size_t N> constexpr size_t sz(T(&)[N]) { return N; }
For example, the shift operators have higher precedence than bitwise masking (and/or/xor) since this way the expressions setting/clearing ranges of bits won't require parentheses (so increased readability) and the masking constants in them will be the narrowest. Loading a wide immediate value into a register sometimes takes several instructions, so such precedence also brings in the least cost as well (nowadays compilers take care of that to some extent).
But people frequently mess up this aspect, using lots of parens (and ending up with wide masks) and saying this rule is not intuitive. It is.
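A short sketch of the precedence point about shifts and masks, in C (get_nibble and set_bit are made-up helper names, and both assume the shift count is less than 32):

```c
#include <stdint.h>

/* Shift binds tighter than &, so this parses as (word >> shift) & 0xFu
   and the mask stays a narrow constant. */
uint32_t get_nibble(uint32_t word, unsigned shift)
{
    return word >> shift & 0xFu;
}

/* Likewise this parses as word | ((uint32_t)1 << bit). */
uint32_t set_bit(uint32_t word, unsigned bit)
{
    return word | (uint32_t)1 << bit;
}
```

Some compilers may still suggest extra parentheses in a warning, but the parse is as described in the comments.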
http://stackoverflow.com/questions/671790/how-does-sizeofarr...
I think this is effectively doing the same thing, but in a non-standard way; i.e. I think `int n = (&arr)[1] - arr;` is substituted with the actual number by the compiler the same way sizeof() would be, only no one will know wtf is going on.
Disclaimer: I didn't look at the generated code to confirm; I guess it could even be compiler/runtime dependent.
with a cleaner way to do _countof using a template in C++ 11.
You can also use the template technique to pass a fixed size array to a function, and have the function determine the array size (without needing a 2nd length param, or null terminator element). Similar to strcpy_s(): http://stackoverflow.com/questions/23307268/how-does-strcpy-...
MSVC has a built in _countof: http://stackoverflow.com/questions/4415530/equivalents-to-ms...
While we're talking macros, anyone who reads the g-truc.net article should feel itchy after seeing the countof macro in their example:
#define countof(arr) sizeof(arr) / sizeof(arr[0])
Two problems here:
1. The last use of 'arr' doesn't have 'arr' wrapped in parentheses.
2. The entire expression is not wrapped in parentheses either.
If you write a macro that does any calculation like this, play it safe and put parens around every macro argument and parens around the entire expression too. Otherwise you never know what operator precedence will do to you.
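To see what operator precedence can do to the unparenthesized macro quoted above, here is a sketch. The macro names COUNTOF_UNSAFE and COUNTOF_SAFE and the rows array are invented for the example; a char[5][4] array is used so the sizes are the same on every platform:

```c
#include <stddef.h>

/* As quoted above: no parentheses around the expansion. */
#define COUNTOF_UNSAFE(arr) sizeof(arr) / sizeof(arr[0])

/* Every argument use and the whole expression are parenthesized. */
#define COUNTOF_SAFE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* 5 rows of 4 chars: sizeof rows == 20 and sizeof rows[0] == 4. */
static char rows[5][4];

/* Expands to 100 / sizeof(rows) / sizeof(rows[0]) == (100 / 20) / 4. */
size_t per_row_unsafe(void) { return 100 / COUNTOF_UNSAFE(rows); }

/* Expands to 100 / (sizeof(rows) / sizeof((rows)[0])) == 100 / 5. */
size_t per_row_safe(void) { return 100 / COUNTOF_SAFE(rows); }
```

The unsafe version quietly yields 1 where 20 was intended, and nothing in the calling code looks wrong.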
Why?
When reading such code, it means I would have to go and lookup a macro definition. So, there's a clear drawback. What's the benefit that makes it worthwhile?
I mean, you don't go look up the definitions of every function that gets called, every time they are called, right?
But my point with suggesting the macro applies equally to the more traditional sizeof division. I have seen code that divides the two sizeofs every time an array length is needed. I think it's better to put that calculation in a macro so you only do it in one place.
The standard also has the idea of "implementation defined" behaviour, which is close to the definition above. "Undefined behaviour" is a trickier beast, since compilers can rightly assume undefined behaviour never occurs, and optimise accordingly.
These days, compilers quite often speculate on undefined behavior, generating code as if the undefined part cannot happen - the result is that your code is going to do stuff you pretty much can not know or expect.
That's the same thing I'm saying. :-)
N1256 6.5.6p8
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i-n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
The last sentence right there forbids what we're doing here.
6.5.3.2p3 allows dereference with address-of (&a[1]):
The unary & operator yields the address of its operand. If the operand has type ''type'', the result has type ''pointer to type''. If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. Otherwise, the result is a pointer to the object or function designated by its operand.
The exception in this clause clearly does not apply to (&arr)[1] because the operand of & is not a result of the * (or []) operator.
And use do/while wrappers (without a trailing semicolon) where needed: https://kernelnewbies.org/FAQ/DoWhile0
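A minimal sketch of the do/while(0) wrapper, assuming a hypothetical SWAP_INT macro. Leaving the trailing semicolon off the definition lets the caller's own semicolon finish the statement, so the macro can sit in an unbraced if/else:

```c
/* Multi-statement macro wrapped in do { ... } while (0), with no
   trailing semicolon, so that SWAP_INT(a, b); is a single statement. */
#define SWAP_INT(a, b)  \
    do {                \
        int tmp_ = (a); \
        (a) = (b);      \
        (b) = tmp_;     \
    } while (0)

/* Orders two values; returns 1 if a swap happened, 0 otherwise. */
int sort_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);
    else
        return 0;   /* already ordered */
    return 1;       /* swapped */
}
```

With bare braces instead of do/while(0), the semicolon after the macro call would end the if statement and the else would fail to compile.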
Assertion failed: (Distance > 0 && "The distance must be non-zero"),
function areStridedAccessesIndependent, file /wrkdirs/usr/ports/devel/
llvm38/work/llvm-3.8.0.src/lib/Analysis/LoopAccessAnalysis.cpp, line 1004.
Looking at the file it was easy to see what was being asserted, and to see that the type was a 32-bit integer; since I knew I was dealing with huge FFTs, the problem was obvious.
Let this be a lesson: Asserting that impossible things don't happen makes debugging much easier when they do happen!
My suggestion to use a macro is not because of any difference in the compiled code, but to improve the readability of the source code.
bug in java binary search
and showed a related search in the drop-down, 'programming pearls ...', a book by Jon Bentley, which seems to confirm what I said above (though I saw it in his other book, "Efficient Programs", IIRC - he might have mentioned the same issue in the Programming Pearls book too).
Edit: and the Wikipedia article confirms it too:
https://en.wikipedia.org/wiki/Binary_search_algorithm#Implem...
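Assuming the bug being referred to is the well-known midpoint overflow, where (lo + hi) / 2 overflows for large indices, a sketch of the usual fix in C could look like this (binary_search here is an illustrative function, not code from either book):

```c
#include <stddef.h>

/* Returns the index of key in sorted[0..n-1], or n if absent. */
size_t binary_search(const int *sorted, size_t n, int key)
{
    size_t lo = 0, hi = n;               /* half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2; /* stays within [lo, hi) */
        if (sorted[mid] < key)
            lo = mid + 1;
        else if (sorted[mid] > key)
            hi = mid;
        else
            return mid;
    }
    return n;
}
```

Computing the midpoint as an offset from lo keeps every intermediate value within the range being searched.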
All running on 32bit windows 8.1
So an implementation that could stick an array at the very end of the address space, and do wraparound for one-past-the-end so that it's represented by all bits zero, would then need to special-case that zero value when performing any pointer comparisons.
But yes, you'd need logic for pointer comparisons as well.
*(&arr + 1) - arr
That translates to taking the address one point past the array and subtracting the address of the array from it. It doesn't actually dereference the location past the end of the array.
While:
(&arr)[1] - arr
might appear to be doing something different, it actually isn't.
&arr + 1 is a pointer to an array that begins just after the existing array.
* is the dereference operator, so it seems to me that *(&arr + 1) dereferences the pointer to the array, resulting in an array (or a reference to an array), which then decays to a pointer.
It doesn't. Because an array is already a pointer, in (&arr + 1) &arr is a pointer to a pointer (ie, a handle) so *(&arr) is dereferencing the handle to the pointer. So it's still one pointer level deep - it doesn't dereference it completely.
I guess it could be useful for teams working together on bigger codebases.
2. Most IDEs let you hover over a macro to see its definition without breaking your flow much.
Otherwise I totally agree with your point.
Also, &arr is not a pointer to a pointer. It's a pointer to an array. Specifically, its type is int (*)[5] in this example, and so when you dereference it, the result is of type int[5]. So if you do e.g. sizeof(*(&arr + 1)), you'll get 5 * sizeof(int).
So &arr behaves like it's a pointer to the start of int[1][5].
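The types involved can be inspected with sizeof, which does not evaluate its operand here, so nothing is actually dereferenced. A sketch assuming a 5-element int array (function names are illustrative):

```c
#include <stddef.h>

static int arr[5];

/* &arr has type int (*)[5], so *(&arr + 1) has type int[5]. */
size_t size_via_array_pointer(void)
{
    return sizeof(*(&arr + 1));
}

/* arr decays to int *, so *(arr + 1) has type int. */
size_t size_via_element_pointer(void)
{
    return sizeof(*(arr + 1));
}
```

The first returns 5 * sizeof(int) and the second returns sizeof(int).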
> If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow;
this point doesn't apply to &arr + 1.
Likewise, the compiler ensures that you can build &arr[5] and that is the same address as &arr+1. &arr+1 cannot overflow.
Dereferencing the pointer is UB, but you can create the pointer, assign it to a variable, etc.
But, so far as I can see, this case (allocating at the very end of address space) is the only one where wraparound would matter for pointers.
And what would be the other option? If it's saturation, then your one-past-the-end pointer for the array at the end of address space would compare equal to pointer to last element...
Now, runtimes are increasingly adopting address randomization, which can change the rules about this, depending on what you are doing.
I got asked some years back why I defaulted to C in some interview questions -- I grew up with the language, understand the nuances and many of the implementations.
It's now possible to make your way through a university education in CS without ever touching or understanding C. This is a problem.
I did not study CS, but I had a number of CS modules/classes. LaTeX was the only programming language I recall using. Students with better handwriting could probably get away with not doing any programming at all.
It's not clear to me that this is a problem, but I imagine that the systems requirement of most CS programs will involve C.
For what it's worth, I default to python in interviews even though it's my least favorite out of the languages that I use frequently.
Why is that? I would imagine that you'd use the language you are most comfortable with and trust the most during an interview? What makes Python a good 'interview' language but a less good bread and butter language for you?
Having done even a semester of any type of assembly (not just part of a class) is probably enough. Other /low level/ languages like Forth (Imagine you /only/ had assembly, and wanted to build something a /little/ less painful) could probably work too.
I suppose that this is the case. But really, to me this article does not reveal anything beyond what I already knew from basic pointer arithmetic.
Hi! Could you take a guess at what percentage of C programmers who write C professionally fit your definition of that (I realize you were being hasty in your phrasing, but still)?
Obviously your answer should be between 0% (no programmer who writes C professionally) and 100% (every programmer who writes C professionally).
I'm genuinely curious what you think! Thanks :)
I very much doubt that 85% of C programmers know these things. It would be interesting to find out!
Correctness is great in theory, but in practice it's a matter of what's important.
If it were really as likely as, say, the sun exploding that X happened then it would be of no use to expend time on X.
BUT very often people speak about the probability of events under suspicious constraints. While a memory allocation might not fail in most situations, it will fail often in some situations. And a one-in-a-million chance is almost guaranteed when there are millions of uses.
But the question is important in another context: language design. Why is this undefined behavior something that exists in the first place? Objects larger than PTRDIFF_MAX could just not be allowed! This avoids the problem and makes code easier to reason about, with pretty much no downside.
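As a rough illustration of what "just not allowed" could look like at the library level, here is a sketch of an allocation wrapper that refuses such sizes. malloc_ptrdiff_safe is a hypothetical name, not an existing API, and this only approximates what a language-level rule would do:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: refuses any request whose size could not be
   represented as a ptrdiff_t, so pointer differences inside the
   returned object cannot overflow. */
void *malloc_ptrdiff_safe(size_t size)
{
    if ((uintmax_t)size > (uintmax_t)PTRDIFF_MAX)
        return NULL;
    return malloc(size);
}
```

Callers already have to handle a NULL return from malloc, so the extra check adds no new failure path for them.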
You just need to get it, and really it's no harder than, say, context managers in Python or promises in JS. It's not relevant at what 'level' those constructs are. They are novel in the way in which they model and solve real problems in context.
So 'lack of clarity' is really due to misunderstanding the context and problem space the language was made to operate in.
I find C somewhat logical, but it has an easy-to-learn simplified version of itself that can be learned before re-reading the spec to complete your understanding of the language.
if you're on a 16bit system and you define char x[36], the compiler guarantees that x's address is not more than 65500. if you do &x + 1 then you'll overflow, x + 1 won't.
You can pass whatever you want to the functions and apply the operands you want and the compiler will happily comply with you. But when you pass it 65500 and add 72 to it, it's going to overflow.
char *p = x;
p += 36; // overflow?
Since arr == &arr as addresses, the pointers P and Q, which point just past the last array item (1+&x[35]) and just past the entire array (1+&x), are equal too. As 6.5.6p8 above says, P is okay, and so must be Q. The standard talks about the last element, not the second. Can you please explain why x+1 is even an argument?
>if you're on a 16bit system and you define char x[36], the compiler guarantees that x's address is not more than 65500
65499?
This is 6.3.6 "Additive operators" in ISO C90 standard, for anyone curious.
I really like the modern incarnations of C++ and statically typed ML-influenced, but not necessarily ML-derived, languages.
There are many criticisms that one could make about R, but I like some of its lispier features.
? I'd reword that to say humans are more entertained by a small set of complicated rules. Simple rules are easy for humans, but they can be boring.
However, in reality here, we're comparing apples to oranges. The instruction set for a computer is its language. Asking a computer to speak English - that's our (humans') language, and with the tables turned, one could ask whether the computer thought better in English or Chinese, and the answer may be different but still meaningless.
C evolved in a different age and is fraught with undefined behavior.
One in a million literally means that at ~100k requests a second it will happen once every 10 seconds.
> They also have a provision where a pointer to a nonarray value is treated as if it were a 1-element array
That is not what it says. Let me quote the exact words (emphasis mine):
For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
You're right that what we have is obviously (a pointer to) an array object, not an element of another array. This is precisely the case where this special provision kicks in, and thus it is legal to do &arr + 1.
I used to prefer C89 but learned the hard way that there are quite a few "bugs" in the standard, where it is ambiguous or otherwise fails to give a clear answer. So C99 is my go-to standard these days, even though I only care for a subset of its features.
"nonarray object" definitely sounds like such a bug. I think the intent of this clause is clear: it is meant to make sure you can always pass a pointer to a single object to a function that treats its argument as a pointer to an array element and does pointer arithmetic on it. One of the most common constructs is simply looping over an array by incrementing the pointer, and this must work without producing an overflow when the pointer points past the array, the way it's conventionally written. If it weren't for this clause, passing a pointer to a single object to be treated as an array of size 1 would break a lot of code. Going further, is the object allocated by malloc an array or a nonarray? That would then be a critical question to ascertaining the correctness of most code out there.
And I cannot think of any reason why only pointers to nonarray objects should be usable in this manner.
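The looping construct mentioned above, and the single-object case it depends on, can be sketched like this (sum_range, sum_single, and sum_array are illustrative names):

```c
#include <stddef.h>

/* Sums the half-open range [first, last) by stepping the pointer.
   The loop ends when `first` equals the one-past-the-end address,
   which is compared but never dereferenced. */
long sum_range(const int *first, const int *last)
{
    long total = 0;
    for (; first != last; ++first)
        total += *first;
    return total;
}

/* A lone int is treated as an array of length one, so &single + 1
   is a valid one-past-the-end pointer for it. */
long sum_single(void)
{
    int single = 42;
    return sum_range(&single, &single + 1);
}

/* The same function works unchanged on a real array. */
long sum_array(void)
{
    int values[3] = {1, 2, 3};
    return sum_range(values, values + 3);
}
```

Without the length-one provision, the call in sum_single would not be guaranteed to work even though sum_range itself never changes.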
And what's wrong with learning something from an article? This is really not about pointer arithmetic at all. Rather it's about a particular use of C's near-infinitely composable type system.
I must nitpick. sizeof may or may not be evaluated at compile time. It is not possible to always evaluate it at compile time (see VLAs). The standard even includes an example of this:
#include <stddef.h>
size_t fsize3(int n)
{
char b[n+3]; // variable length array
return sizeof b; // execution time sizeof
}
int main()
{
size_t size;
size = fsize3(10); // fsize3 returns 13
return 0;
}