Indices Point Between Elements (blog.nelhage.com)
It used to be popular, and still is in some circles, to debate whether programming languages ought to start array indexing at 0 or 1.
When talking about this with other programmers, I've discovered that a lot of the issues/confusion could be avoided by consistent use of terminology: Offsets/offsetting always being zero-based and indexes/indexing always being one-based.
Using rulers and birthdays also helps to explain differences. You're in the first year before your first birthday, being zero (whole) years old.
To make matters potentially more confusing, culturally, I remember something about the ground floor in UK buildings not being "Floor 1" like it is in the United States.
Of course, it's the offset from the first element, so it's kind of a circular definition.
Another example is screen space versus screen displacement. This is the difference between affine space and a vector space. Whether the upper corner is (0,0) or (222,22) shouldn't matter as long as you are doing everything relative to some point.
In C, I argue we always use offsets. Each element of an array A is at a particular memory location, the name of the array being the first memory location, and then A[i] means take the thing at A+i. Notice that the difference p-q between two memory locations p and q is exactly the offset you put into an index expression: q[p-q] == *p.
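A minimal sketch of the identity above, assuming both pointers refer to elements of the same int array. The function names offset_from and read_via_offset are made up for illustration and are not from the original comment.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of p from q, counted in elements; both must point into the same array. */
ptrdiff_t offset_from(const int *q, const int *p)
{
    return p - q;
}

/* q[off] is defined as *(q + off), so indexing q by the offset recovers *p. */
int read_via_offset(const int *q, const int *p)
{
    ptrdiff_t off = offset_from(q, p);
    assert(off >= 0);  /* this sketch only handles p at or after q */
    return q[off];
}
```

With int a[4] = {10, 20, 30, 40}, calling read_via_offset(a, &a[2]) reads a[2], because &a[2] - a is the offset 2.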
That said, it is convenient to confuse the offset with the memory location, since (1) the memory location is likely not known when writing the program, and (2) if it were known, it would be almost impossible to use.
Now, an anecdote: I was helping implement a QR factoring algorithm from a textbook which uses 1-based indexing in a language which uses 0-based offsets. We tried changing the bounds of the nested loops to account for the difference, but it was basically impossible to avoid off-by-one errors. So, we left the loop bounds as the textbook had them and instead indexed like A[i-1], since this i-1 is the offset from A[0], the array element labeled 1.
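Roughly what that looked like, as a sketch rather than the actual QR code: the loop bounds stay in the textbook's 1-based form and only the access subtracts 1. The helper names at1 and upper_sum and the row-major n-by-n layout are assumptions made for this example.

```c
#include <assert.h>

/* Element (i, j) of an n-by-n row-major matrix, with i and j 1-based as in the textbook. */
double at1(const double *A, int n, int i, int j)
{
    assert(1 <= i && i <= n && 1 <= j && j <= n);
    return A[(i - 1) * n + (j - 1)];  /* i-1 and j-1 are the 0-based offsets */
}

/* Sum of the entries strictly above the diagonal, with textbook-style loop bounds. */
double upper_sum(const double *A, int n)
{
    double s = 0.0;
    for (int i = 1; i <= n; i++)
        for (int j = i + 1; j <= n; j++)
            s += at1(A, n, i, j);
    return s;
}
```

The point of the design is that the i-1 translation lives in exactly one place, so the nested loops can be checked against the book line by line.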
Actually, that's perfectly explained with your offset vs. index terminology. In some countries, the floor number is an index within the array of floors. In others, it's an offset from the ground.
Is there an Esolang that numbers its arrays with 0,M,1,2,3...?
As to the point about floor numbering, the situation can be a little confusing in Canada. Some buildings label in the US style, labeling stories [1st, 2nd, 3rd] while others label in the UK style as [Ground, 1st, 2nd]. We also often mix them and you'll see [Ground, 2nd, 3rd], with ground sometimes replaced by main, or lobby.
In British English, the first floor is the first above the ground.
In American English, the first floor is the ground.
I would argue that your "first birthday" is, in fact, the day you are born—your birth day. The thing that happens for the first time a year later, is the first anniversary of your birth day.
this is an exemplary case of citation needed if I ever saw one. maybe it's a valid debate for programming languages that don't allow people to do pointer arithmetic, which already restricts the field a lot, but even then that sounds like part of the 4GL bullshit that never really took off, and for good reasons
Visual Basic had the "OPTION BASE" statement to select.[0] (Many other versions of basic did too)
APL also has the ⎕IO Index Origin setting [1]
If you want to see a lively debate, there's c2[2], and there's also Dijkstra[3]
[0] https://msdn.microsoft.com/en-us/library/aa266179%28v=vs.60%...
[1] https://en.wikipedia.org/wiki/APL_syntax_and_symbols
Actually it's primarily early languages plus Lua.
[0] https://en.m.wikipedia.org/wiki/Comparison_of_programming_la...
edit: Probably the most popular genome browser, based at UC Santa Cruz, uses this zero-based, half-open numbering internally. But, at some point in the past, biologists developed an expectation that numbering would be 1-based. So the Santa Cruz genome browser actually adds 1 to the start position of everything for display purposes.
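A small sketch of that conversion, assuming the display form is a 1-based closed interval, which is what "add 1 to the start" produces. The type and function names here are hypothetical and are not taken from the genome browser's code.

```c
#include <assert.h>

typedef struct { long start; long end; } interval0;   /* 0-based, half-open: [start, end) */
typedef struct { long first; long last; } interval1;  /* 1-based, closed: [first, last] */

/* Internal to display: only the start moves, the end number stays the same. */
interval1 to_display(interval0 iv)
{
    assert(iv.start <= iv.end);
    interval1 out = { iv.start + 1, iv.end };
    return out;
}

/* Length falls straight out of the half-open form, with no +1 correction. */
long length0(interval0 iv)
{
    return iv.end - iv.start;
}
```

The first ten bases are [0, 10) internally and 1 to 10 on screen; the length is 10 - 0 in one form and 10 - 1 + 1 in the other.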
But it is contextual. When it comes to languages like C, where arrays are more directly mapped to pointers and memory layout, I've found it better to talk about pointers, and allow people to derive the behavior that way.
Either way, I'd be careful of trying to claim that 'this is what it is', rather than 'here is a way to remember it'.
I find it very useful, however, for imagining what the returned insertion point index of a binary search would mean, when the item you are looking for can not be found.
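For instance, here is a lower-bound style binary search, sketched under the assumption of a sorted int array. The return value is best read as a gap: everything to the left of it is less than the key, and everything to the right is not.

```c
#include <assert.h>
#include <stddef.h>

/* Returns a gap index in 0..n: the position where key could be inserted
   while keeping a[0..n) sorted. Elements before the gap are < key. */
size_t lower_bound(const int *a, size_t n, int key)
{
    assert(a != NULL || n == 0);
    size_t lo = 0, hi = n;  /* the answer is one of the gaps lo..hi */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;   /* the gap is to the right of element mid */
        else
            hi = mid;       /* the gap is at or to the left of element mid */
    }
    return lo;
}
```

Searching {10, 20, 30, 40} for 25 returns 2, the gap between 20 and 30. Searching for 45 returns 4, which is not a valid element index but is a perfectly good gap.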
https://docs.python.org/2/tutorial/introduction.html#tut-str...
One way or the other, someone needs to know the same number of rules in order to understand how the indices work.
[1] livegrep.com
[0] https://www.gnu.org/software/emacs/manual/html_node/emacs/Po...
edit: clarify Emacs reference in OP
That way the compiler would be able to tell if I accidentally mixed the two. Every conversion would have to be explicit: for example, there might be two functions, "before" and "after", that take a gap index, and return an element index.
I think I might actually enjoy programming this way, but perhaps others would find it needlessly bureaucratic.
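A rough sketch of how that could look in C, using single-member structs so the compiler rejects accidental mixing. The names before and after come from the comment above; gap_index, elem_index and elem_at are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t v; } gap_index;   /* a position between elements: 0..n */
typedef struct { size_t v; } elem_index;  /* a position of an element: 0..n-1 */

/* The element immediately before a gap; gap 0 has nothing before it. */
elem_index before(gap_index g)
{
    assert(g.v > 0);
    elem_index e = { g.v - 1 };
    return e;
}

/* The element immediately after a gap (the caller must ensure g.v < n). */
elem_index after(gap_index g)
{
    elem_index e = { g.v };
    return e;
}

/* Only element indices can be used to read an element. */
int elem_at(const int *a, elem_index e)
{
    return a[e.v];
}
```

Passing a gap_index to elem_at is a compile error, since the two struct types are incompatible, so every crossing between the two numberings has to go through before or after.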
Another nice link: http://betterexplained.com/articles/learning-how-to-count-av...
This only adds to the confusion, as now there are 2 ways in which indices can be interpreted.
>"Indexing between elements, instead of indexing elements, helps avoid a large class of off-by-one errors."
It only replaces them with indexing-method errors. Instead of remembering if my ranges are open or closed, I have to remember if they are using between-element indices or on-element indices. It's still going to cause the same kinds of problems.
an index is an offset to a pointer in memory, scaled by the size of the structure it points to. there is no other way around it, no magic tricks about indices being between elements.
of course people never exposed to C miss out on all of this, and are then left to make up bullshit about how stuff actually works
array indices are offsets to a memory location, plain and simple; if that requires more explaining than this, then there is a need to go back and teach the basic underpinnings of how computer memory and addressing work
We do have a convention that dereferencing a memory address returns the 8 bits to the right of that address. We've even optimized our hardware for that convention. But that's just a convention of the dereference operation; it's not fundamental to the addresses themselves.
I agree that a C pointer isn't analogous to an array index; that's because a pointer is a range, determined by a pair of memory addresses. One, stored at runtime, refers to the location before the first byte of the range. The other, implicitly derived from the runtime value and the size information in the pointer type, refers to the location after the last byte of the range. When we think of memory addresses as the article's indexes, and pointers as the article's ranges, everything falls into place.
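To make that concrete, here is a sketch that spells out the range a pointer covers. The byte_range type and range_of function are hypothetical, and the size is passed explicitly here, whereas in real C it comes from the pointer's type.

```c
#include <assert.h>
#include <stddef.h>

/* A half-open byte range [begin, end): two gap positions bracketing an object. */
typedef struct {
    const unsigned char *begin;  /* the gap before the first byte */
    const unsigned char *end;    /* the gap after the last byte */
} byte_range;

/* The range covered by an object of `size` bytes located at address p. */
byte_range range_of(const void *p, size_t size)
{
    assert(p != NULL);
    byte_range r;
    r.begin = (const unsigned char *)p;
    r.end = r.begin + size;
    return r;
}
```

For adjacent array elements, the end of one element's range is the begin of the next, which is the same picture as the article's ranges sharing a boundary index.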
(Incidentally, please be careful calling out people for not understanding computers. C isn't actually the lowest level of computing, and pointers aren't as primitive as your post implies. When you call someone out, you need to be 100% clear and 100% right.)
[...] before pointers, structs, C and Unix existed, at a time when other languages with a lot of resources and (by the standard of the day) user populations behind them were one- or arbitrarily-indexed, somebody decided that the right thing was for arrays to start at zero.
[...] the technical reason we started counting arrays at zero is that in the mid-1960’s, you could shave a few cycles off of a program’s compilation time on an IBM 7094. The social reason is that we had to save every cycle we could, because if the job didn’t finish fast it might not finish at all and you never know when you’re getting bumped off the hardware because the President of IBM just called and fuck your thesis, it’s yacht-racing time.
[0.5] http://exple.tive.org/blarg/2013/10/22/citation-needed/
If anything, mathematics provides as much of a reason as pointer arithmetic to start indexing at 0. Indexing from 1 occurs more if you're creating a one-to-one correspondence with some real-world object, and you want to number those objects starting from 1, perhaps because that's a convenient user-visible numbering.
In that case, wouldn't the floors actually be a good real-world example to students? It's a case where the index vs. offset convention seems to be split roughly 50%/50% around the world.
If people can't agree about which way is better for numbering floors, it's no surprise that number-crazy programmers can't agree about numbering a whole lot of other things :)
I suppose it would be a good real world example of the contrast, but I was saying I don't think it's a good, universal example to explain indexing or offsetting specifically. "You start your fourth year alive on your third birthday" has nearly universal understanding, "You exit the building on the floor numbered 1" is highly idiomatic.
^ there, semantic issue resolved
specific languages might reuse the word array for abstracting underlying optimizations, but calling an indexed object an array doesn't really change what an array is, no more than calling a dolphin a fish changes it from being a mammal
also, a pointer is a range only when paired with a type. otherwise a pointer is the index of a cell within the address space, and you want the address space to start at zero not because it's convenient, but because otherwise you wouldn't be able to reference the last cell (since it would overflow your word size) unless you do additional stuff to normalize the one-based address back to zero again
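A toy illustration of the overflow argument, assuming a made-up machine whose address word is 8 bits wide (uint8_t stands in for it) and whose memory has 256 cells. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the last cell when cells are numbered 0 .. cells-1. */
uint8_t last_cell_zero_based(unsigned cells)
{
    assert(cells > 0);
    return (uint8_t)(cells - 1u);
}

/* Address of the last cell when cells are numbered 1 .. cells.
   Conversion to uint8_t is modulo 256, so 256 cells wrap around. */
uint8_t last_cell_one_based(unsigned cells)
{
    assert(cells > 0);
    return (uint8_t)cells;
}
```

With 256 cells, the zero-based scheme names the last one 255, which fits in the word. The one-based scheme would need the value 256, which does not fit and wraps around to 0.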
using cell deliberately because memory can be accessed by word, byte, page etc
anyway. what do you call a contiguous memory area that has a type and can be navigated by offset? that's an array. well then, are you going to use the pointer convention for it, or just have the +1 removed at every access operation?
and we're back again to what an array is. arbitrary memory constructs that are called arrays shouldn't be taken into account, for they are the ones causing the whole confusion we're in and shouldn't be in, because an array is an array and an indexed object is not
In any case, I think I agree that dereferencing an address should return the byte to the right for the reason that you mention. That's a solid point, and I totally didn't think of that :) That's a really important property of the dereferencing operation.
I still feel like that doesn't make the mental model of memory-addresses-are-gaps-between-bytes any less valuable, though, nor does it mean that abstractions built on top of this memory model need to use the same conventions as the underlying system - that's the point of abstractions, after all :)
for example, if you have the following structs
typedef struct { void *key; } base;
typedef struct { base b; int misc; int data[2]; } derived;
then derived is laid out as follows
-----+------+---------+---------+---------+-----
 ... | base | derived | data[0] | data[1] | ...
-----+------+---------+---------+---------+-----
     ^                ^                   ^
     |                |                   |
   base         derived.data      &derived.data[2]
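The layout can be checked with offsetof and pointer arithmetic. This is a sketch using the two structs from the comment above; data_span is a name invented for the example, and the exact padding between members is left to the compiler.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { void *key; } base;
typedef struct { base b; int misc; int data[2]; } derived;

/* Bytes between the gap before data[0] and the gap after data[1]. */
ptrdiff_t data_span(const derived *d)
{
    assert(d != NULL);
    const char *first = (const char *)&d->data[0];
    /* &d->data[2] is a one-past-the-end pointer: valid to form, not to dereference. */
    const char *past = (const char *)&d->data[2];
    return past - first;
}
```

Because b is the first member, offsetof(derived, b) is 0, so a pointer to a derived and a pointer to its base name the same gap. &d.data[2] names the gap after the last element, even though no element lives there.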