Nearly all binary searches and mergesorts are broken (2006)(ai.googleblog.com) |
Nearly all binary searches and mergesorts are broken (2006)(ai.googleblog.com) |
In particular, (x + y) / 2 is the wrong implementation of midpoint in general, because it would fail to even compile on objects you can't add together. But midpoint is well-defined on anything you can subtract (i.e. anything you can define a consistent distance function for)—and it doesn't require addition to be well-defined between those objects!
One obvious (in C/C++, and not-so-obvious in Java) counterexample here is pointers/iterators. You can subtract them, but not add them. And, in fact, if you implement midpoint in a manner that generalizes to those and respects the intrinsic constraints of the problem, you end up with the same x + (y - x) / 2 implementation, which doesn't have this bug.
I guess in maths this is called a generating Lie algebra (maybe someone can comment on this?)
Basically,
1. You have a 0 time delta, and you can add and subtract them satisfying some natural equations. (time deltas form a group)
2. You can add time deltas to a datetime to get a new datetime, and this satisfies some natural equations relating to adding time deltas to each other (time deltas act on datetimes).
3. You can subtract two datetimes to get a time delta satisfying some more natural equations (the action is free and transitive).
This is often called an affine structure.
But note that that sentence was just about calculating midpoints, not about the larger binary search algorithm. And in any case, I was just trying to convey layman intuition, not write a mathematically precise theorem.
To vary your point here, the axioms for twos complement and IEEE floating-point aren't well known or observed.
avg = (x + y) / 2
which fails both for signed ints (when adding positive x and y overflows maxint) and for unsigned ints (when x + y wraps around 0). Note that this can only be considered a bug for array indices x,y when these are 32 bit variables and the array can conceivably grow to more than 2 billion elements.I wonder what is the simplest fix if the ordering between x and y is not known (e.g. in applications when x and y are not range bounds) and the language has no right-shift operation...
Google Research Blog: Nearly All Binary Searches and Mergesorts Are Broken - https://news.ycombinator.com/item?id=16890739 - April 2018 (1 comment)
Nearly All Binary Searches and Mergesorts Are Broken (2006) - https://news.ycombinator.com/item?id=14906429 - Aug 2017 (86 comments)
Nearly All Binary Searches and Mergesorts Are Broken (2006) - https://news.ycombinator.com/item?id=12147703 - July 2016 (35 comments)
Nearly All Binary Searches and Mergesorts are Broken (2006) - https://news.ycombinator.com/item?id=9857392 - July 2015 (43 comments)
Read All About It: Nearly All Binary Searches and Mergesorts Are Broken - https://news.ycombinator.com/item?id=9113001 - Feb 2015 (2 comments)
Nearly All Binary Searches and Mergesorts are Broken (2006) - https://news.ycombinator.com/item?id=7594625 - April 2014 (2 comments)
Nearly All Binary Searches and Mergesorts are Broken (2006) - https://news.ycombinator.com/item?id=6799336 - Nov 2013 (46 comments)
Nearly All Binary Searches and Mergesorts are Broken (2006) - https://news.ycombinator.com/item?id=1130463 - Feb 2010 (49 comments)
Google Research Blog: Nearly All Binary Searches and Mergesorts are Broken [2006] - https://news.ycombinator.com/item?id=621557 - May 2009 (9 comments)
This type of issue is pretty common to encounter and I make at least a few fixes a year specifically addressing integer overflow across many companies
you can also remove that unpredictable branch in the loop if you want.
whatever_t *bisect(whatever_t *offset, size_t length, whatever_t x) {
while(size_t midpoint = length / 2) {
bool side = x < offset[midpoint];
midpoint &= side - 1;
length >>= side;
offset += midpoint;
length -= midpoint;
}
return offset;
}Do the sum using signed int.
Then cast to unsigned int before the division (i.e., use a non-arithmetic shift low).
Then cast back to signed int.
func Search(n int, f func(int) bool) int {
// Define f(-1) == false and f(n) == true.
// Invariant: f(i-1) == false, f(j) == true.
i, j := 0, n
for i < j {
h := int(uint(i+j) >> 1) // avoid overflow when computing h
// i ≤ h < j
if !f(h) {
i = h + 1 // preserves f(i-1) == false
} else {
j = h // preserves f(j) == true
}
}
// i == j, f(i-1) == false, and f(j) (= f(i)) == true => answer is i.
return i
}
If you care about stuff like this you may enjoy the puzzle "Upside-Down Arithmetic Shift":https://bugfix-66.com/76b563beb6f4e61801fce4e835be862fb3dbbe...
It doesn't help that almost nobody knows this, though.
Go was designed by (among others) the father of Unix Ken Thompson, with an understanding of the mistakes of C and C++.
Another example is that Go requires explicit integer casts (disallowing implicit integer casts) to avoid what is now understood to be an enormous source of confusion and bugs in C.
You can understand Go as an improved C, designed for a world where parallel computing (e.g., dozens of CPU cores) is commonplace.
The proper method is to type promote first - not just to unsigned but to a wider variable type - 32 to 64 bits or from 64 to 128 bits. Unsigned simply gives a single extra bit, while erasing negative semantics. Promoting to twice the size works for either addition or multiplication. The benefits are correctness and the ability to be understood at a glance.
Are you sure? What's an example of an array.length that would trigger a remaining edge case here? (Keep in mind array.length is 32-bit in Java.)
The implementation shown works perfectly for arrays on the order 2^30. Calling them broken is like saying strlen is broken for strings that aren't null terminated.
Well in fact it is exactly the contrary.
https://staff.fnwi.uva.nl/p.vanemdeboas/knuthnote.pdf [PDF] page 7 of the PDF, 5 of the classroom note.
If the system has a well written formal specification then your model can be built from that without error if done diligently. One real world example is the first Algol 60 compiler, which was built to a formal specification. On the other hand if there is no useful spec or no spec at all then you end up needing to experiment, ie test, and get your model as close as you can.
It is not sufficient merely to test a program; you have to prove it correct too.
In addition, it is not sufficient merely to prove a program correct; you have to test it too.
In summary, you have to both prove a program correct, and test it; skipping either will result in buggy garbage.
1. If you only have single byte elements, you'd better use counting sort.
2. There always tend to be parts of the virtual address space that are reserved. On x86-64, most userspace processes can only access 2^47 bytes of space.
Not only that, but in practice most general purpose operating systems are designed with higher-half kernels[0].
Though even then I'm not sure you could reliably allocate two gigs of contiguous virtual space without running into some immovable OS-provided thing.
Such a function, even if it seems trivial, has some educative value as it opens an opportunity to explain the problem in the documentation.
If you look at GLSL, it has many function that do obvious things, like exp2(x) that does the same thing as pow(2,x), and I don't think anyone has any issue with that. It even has a specific "fma" operation (fma(a,b,c) = a*b+c, precisely), that solves a similar kind of problem as the overflowing average.
I briefly tried using binary search as a weeder problem and quickly abandoned it when no one got it right.
Suppose your high, low and mid indexes are as wide as a pointer on your machine: 32 or 64 bits. Unsigned.
Suppose you're binary searching or merge sorting a structure that fits entirely into memory.
The only way (low + high)/2 will overflow is if the object being subdivided fills the entire address space, and is an array of individual bytes. Or else is a sparsely populated, virtual structure.
If the space contains distinct objects from [0] to [high-1], and they are more than a byte wide, this is a non-issue. If the objects are more than two bytes wide, you can use signed integers.
Also, you're never going to manipulate objects that fill the whole address space. On 32 bits, some applications came close. On 64 bits, people are using the top 16 bits of a pointer for a tag.
Yeah, if you suppose that, you can correctly conclude that you only run into overflow if the object is a byte array that fills more than half the address space (though not the entire address space as you say). And that's why this problem remained unnoticed from 01958 or whenever someone first published a correct-on-my-machine binary search until 02006.
But suppose they aren't. Suppose, for example, that you're in Java, where there's no such thing as an unsigned type, and where ints are 32 bits even on a 64-bit machine. Suddenly the move to 64-bit machines around 02006 demonstrates that you have this problem on any array with more than 2³⁰ elements. It's easy to have 2³⁰ elements on a 64-bit machine! Even if they aren't bytes.
Is it that low and high are both floating point, so you're not constrained by int precision and so you don't get an overflow error. The article makes it sound like sign switching is the issue, but this is just a general overflow problem, right?
Example with 8-bit integers (from wikipedia):
Bits, Unsigned value, Signed value
0000 0000, 0, 0
0000 0001, 1, 1
0000 0010, 2, 2
0111 1110, 126, 126
0111 1111, 127, 127
1000 0000, 128, −128
When the logical bit shift is conducted on -128, -128 is treated as an unsigned integer. Its sign bit gets shifted such that the integer becomes 0100 0000, aka 64.
https://en.wikipedia.org/wiki/Binary_search_algorithm#Proced...
I’m not sure what this could mean. Could you please share some examples?
M = L + (R - L)/2I'm taking a pragmatic perspective: like it or not, people are going to skim the article and copy & paste the pseudocode.
Given that the pseudocode is buggy in the vast majority of programming languages and the user isn't informed about this in the pseudocode, it's going to lead to unnecessary bugs.
https://en.wikipedia.org/wiki/Binary_search_algorithm#Implem...
And as other mentionned, this is pseudo code and not implementation. But if you think it's incorrect, feel free to correct it.
Yes, we are a long way from flipping switches to input machine code, but there are still hardware considerations for correctness and performance, e.g. the entire industry of deep learning running somewhat weird implementations of linear algebra to be fast on GPUs.
mid = low/2 + high/2https://www.khanacademy.org/math/arithmetic-home/multiply-di...
What is relevent here is that integer division is not distributive over addition.
for unsigned numbers. not sure if it works for signed numbers.
(x>>1) + (y>>1) + (0x01 & x & y)
I've coded binary searches and sorts tons of times in C++, and yet none was succeptible to this bug. Why? Because, whenever you're talking indices, you should ALWAYS use unsigned int. Since an array can't have negative indices, if you use unsigned ints the problem is solved by design. And, if the element is not found, you throw an exception.
Instead, in C you don't have exceptions, and you have to figure out creative ways for returning errors. errno-like statics work badly with concurrency. And doing something like int search(..., int* err), and setting err inside of your functions, feels cumbersome.
So what does everyone do? Return a positive int if the index is found, or -1 otherwise.
In other words, we artificially extend the domain of the solution just to include the error. We force into the signed integer domain something that was always supposed to be unsigned.
This is the most common cause for most of the integer overflows problems out there.
looks fairly simple to me. note this works of any ordering of R and L if the data type is signed.
x / 2 + y / 2 + ((x & 1) + (y & 1)) / 2
x / 2 + y / 2 + (x & y & 1)
Edit: This is the same you wrote, but it gets the wrong number for negative values, for negative ints the rounding will go up and not down.The difference of two elements of the torsor of some group G is an honest-to-God group element of G, though, and so you have an honest-to-God identity element. You may or may not have an honest-to-God division or halving operator (which computes e given (e + e)) but in cases where G is the additive group of some field you do.
However, in this case our array indices are drawn from something like ℤ/2³²ℤ, and we might be trying to halve odd numbers, so none of this is justifiable! We want something different from our halving operator.
https://math.ucr.edu/home/baez/torsors.html
I see dataflow and maxiepoo were already talking about this: https://news.ycombinator.com/item?id=33493149
… some field of characteristic ≠ 2, of course.
(x^y)/2 + (x&y)- Which inputs are valid.
- For a valid input, what the constraints on the return value are.
You'd have a point if the implementations had as an input constraint: "array must be less than 2^30". But they didn't.
Otherwise, nothing is broken unless it never returns the right answer. Take:
fn add(x: u32, y:u32) -> u32 { return 1; }
This implementation works perfectly for numbers that add to 1. It just loses generality outside of that.
fn add(x: u32, y: u32) -> u32 { return (x + y) - (x >> 10) - (y >> 10); }
This implementation works for x < 2^10 and y < 2^10. Arguably this implementation is much worse than the previous one because it fails unexpectedly. At least the previous implementation would be much more obviously broken.
But these are both broken because they don't fulfill the (implicit) contract for add. You can't just say "well, it's implied that my add function only takes inputs that add to 1" unless you actually write that somewhere and make it clear.
More generally, I feel like the thought process of "it's not broken if it works fine for inputs that occur 99% of the time" is an artifact of how little attention we pay to correctness, not something that is intrinsically true. If your function breaks for inputs that are clearly within its domain without any kind of warning... it's broken, as much as we might not want to admit it. We're just so used to this happening near edge cases that we don't think about it that way, but it's true.
They do implicitly. It's just common sense. When you read a recipe in a cookbook, it usually doesn't mention that you're expected to be standing on your legs, not on your arms. Reader is expected to derive these things themselves.
A lot of generic algorithm implementations will start acting weird if your input size has the order of INT_MAX. Instances this big will take days or weeks or process on commodity CPUs, so if you're doing something like that you would normally use a specialized library that takes these specifics into account.
Very much on brand for the FAANG r/iamverysmart crowd though.
C has size_t. Use it.
(And because C doesn't mandate correct handling of benign undefined behavior, you still have a problem if you `return ptr-orig_ptr` as a size_t offset (rather than returning the final ptr directly), because pointer subtraction is specified as producing ptrdiff_t (rather than size_t), which can 'overflow' for large arrays, despite that it's immediatedly converted back to a correct value of size_t.)
['a','b','c'].indexof('b') == 1 // found - return index
['a','b','c'].indexof('w') == 3 // not found - return size of arrayCasts from int to pointer should be explicit.
We are enamoured with programmer convenience at the expense of the safety of our systems. It’s unprofessional and we should all aim to fix it.
The problems here are about using integers that are too narrow and not properly doing arithmetic to prevent overflow from impacting the result.
Isn't this article a counterexample to that? Where using signed instead of unsigned actually does result in a loss of functionality?
sounds like a lot of your code is in fact broken...
Heh. But then again, these kind of people will create way worse problems than last-bit overflows.
Well, that's arguable. This "mistake" could be fixed in C tomorrow without breaking the semantics of any existing C code, but notice that this hasn't been "fixed" in any of the latest C standards, so perhaps it's still there for a reason.
And the reason it hasn't been "fixed" is that compilers can optimize code better if they can assume that signed addition won't overflow.
So it's more of a trade-off rather than strictly being a disadvantage.
It's also something you can "fix" in your own code if you want to, by passing a compiler flag (-fwrapv in gcc), although arguably, at that point your code wouldn't be strictly C-standard compliant anymore. Or by using some library that handles arithmetic overflow by wrapping around, which could be implemented in standard C.
> Another example is that Go requires explicit integer casts (disallowing implicit integer casts) to avoid what is now understood to be an enormous source of confusion and bugs in C.
I agree with you on this, although forcing explicit casts also makes the code more verbose and can make it harder to understand what's going on.
I think a balanced approach is requiring explicit casts only for the "tricky" cases, i.e. when the values might become truncated, and possibly also when sign-extension might be necessary and therefore might result in something the programmer didn't expect.
But if I were to design a language I'm not sure that I would require explicit casts for e.g. promoting a uint16_t to a uint32_t...
> You can understand Go as an improved C, designed for a world where parallel computing (e.g., dozens of CPU cores) is commonplace.
That's a bit of a hot take :) Let me know when the Linux kernel starts to get rewritten in Go ;)
Everyone says that and they have no proof.
Here's an entire blog post with proof:
https://kristerw.blogspot.com/2016/02/how-undefined-signed-o...
> They do implicitly. It's just common sense.
That's neither how language specifications work, nor true in this case even if it's true in other cases. Providing one more of the same kind of input that already works is in no way the same thing as changing something totally unrelated.
> When you read a recipe in a cookbook, it usually doesn't mention that you're expected to be standing on your legs, not on your arms.
I don't think this binary search was breaking because of people standing on their arms either.
> A lot of generic algorithm implementations will start acting weird if your input size has the order of INT_MAX. Instances this big will take days or weeks or process on commodity CPUs,
It's incredibly strange to read this from someone in 2022. I don't know of any standard library algorithm that would take "days or weeks" for inputs of size 2^31 now, let alone the majority of them being like this. In fact I don't think this was the case back when the article was written either.
I don't see my other arguments being convincingly refuted, so they still hold.
How ancient is your machine? Quicksorting (2^31 - 16) elements (because that's Java's hardcoded limit) takes < 11 seconds on my machine, and a big chunk of that time is taken in the random number generation to create the input...
# Temp.java
import java.util.*;
public class Temp { public static void main(String[] args) { byte[] arr = new byte[0x7FFFFFF0]; new Random().nextBytes(arr); Arrays.sort(arr); } }
$ javac Temp.java && time -p java Temp
real 10.30 char array[60000]; // 5KB left for code and stack if not segmented
size_t i = 40000;
size_t j = 50000;
size_t mid = (i+j)/2; // should be 45000
// i+j = (size_t)90000 = 24464
// mid = 24464/2 = 12232 != 45000
Larger integers make the necessary array size bigger, but don't change the overall issue.Well, they're both bitwise right shifts, the ">>>" is specifically a logical or unsigned right shift.
Moreover I don't see how you deny that using signed would lose functionality in this case—it's pretty undeniable that it gives the wrong answer in cases where unsigned would give the correct answer; the code is right in the article and you can test it out. This is true irrespective of any other cases that neither might handle correctly (assuming you believe any exist, but see my previous paragraph).
Moreover, it is trivial to convert from a 32-bit signed or unsigned type to a 64-bit int, so you are not constrained by the size type of Java.
Obviously maths with the smaller representations will be quicker than with this array representation, so the interpreter does some work to try and use smaller representations where possible. But if you tried to, say, add two 64-bit signed ints together, and the result would overflow, then the interpreter will transparently convert the integers into the array representation for you, so that the overflow doesn't happen.
So the first poster said that the default merge sort implementation on Wikipedia was buggy, because it doesn't protect against overflows (assuming that the implementation used fixed-sized integers). The second poster pointed out that if the implementation used these arbitrary precision integers, then there is no chance of overflow, and the code will always work as expected.
You can look up "bigint" which seems to be the term of art for implementations of arbitrary precision integers in most languages. You can also read a bit about how they're implement in Python here: https://tenthousandmeters.com/blog/python-behind-the-scenes-...
In the spirit on nitpicking on edge cases: It does, but quiet often you pass a number to some C library (other than stdlib) and C does not honour this arrangement.
For example.
Until you run out of memory, but yeah.
What article are you referring to? This article is about binary search and mergesort, not quicksort?
And which quicksort has O(n^2) typical behavior? That's the worst-case behavior you normally only get on adversarial inputs, not the typical one. (And standard libraries have workaround for that anyway.)
I still think that it's normal for programs to behave weird when the input parameters are getting close to INT_MAX. Sometimes it's unavoidable. And if it's not specified in the function docs, you should go and check the implementation as a programmer. For binary search it is avoidable, so the criticism of the linked implementation is fair.
However, the linked-to binary search starts from 0:
1: public static int binarySearch(int[] a, int key) {
2: int low = 0;
3: int high = a.length - 1;
and the promoted fix is: 6: int mid = low + ((high - low) / 2);
The aforementioned Wikipedia binary sort also takes a length, rather than start/end: function binary_search(A, n, T) is
L := 0
R := n − 1An affine space is a torsor under a vector space, and you can have instead a torsor under any group. This loses a bit of structure, in the sense that you can take convex combinations in an affine space but not in an arbitrary torsor; but otherwise it is a proper generalisation. But the convex combination $(a + b)/2$ used to obtain a midpoint is exactly what we want here!
Though it pains me to say so as an algebraist, I think that it probably just isn't a problem most usefully modelled with a more abstract algebraic structure. Although it would be easy to cook up a structure permitting "division with rounding" … maybe a Euclidean domain (https://en.wikipedia.org/wiki/Euclidean_domain) is something like the right structure?
And technically speaking, I think C doesn't guarantee that 'size_t' is at least as large as a 'signed int' (even though this is true on all platforms that I know of), so your approach would fail if that weren't the case. Although, you could use 'ssize_t' instead of 'int', or 'unsigned int' instead of 'size_t' to fix that.
> The real reason it doesn't work in C is because a memory region can be large enough to still overflow in that case.
The Go code we are discussing has nothing to do with memory regions, it's a generic binary search function, so it can be used for e.g. bisecting git commits. It doesn't require the calling function to use arrays.
Although yes, if the calling code were trying to do a binary search on an array, conceptually it could fail, but in that case you could argue the bug would be in the calling function, because it would be trying to pass the array length into a binary search function which only accepts an `int` or `ssize_t` function parameter, which could result in the array length being truncated. But strictly speaking, this would not be an arithmetic overflow issue.
That said, I would just fix the code so that it works for the full 'size_t' range, since the most common use case of a binary search function is indeed to do searches on arrays. In that case, the Go approach wouldn't work indeed.
That doesn't matter, because size_t is large enough to hold any array index (that's kind of[0] the defining property of size_t), so any array index in a signed int can be safely converted to size_t. The real problem is that (using 16-bit size_t for illustrative purposes) if you have, say, x = (size_t)40000 and y = (size_t)50000 into a 60000-element array, x+y = (size_t)90000 = (size_t)24464, which means (x+y)/2 = 12232, which is the completely wrong array element.
0: Technically, size_t is large enough to hold any object size, but array elements can't be smaller than char (sizeof can't be less than 1), so a array can't have more elements than it's sizeof.
> That doesn't matter, because size_t is large enough to hold any array index (that's kind of[0] the defining property of size_t), so any array index in a signed int can be safely converted to size_t.
Well, the Go code we're discussing has nothing to do with arrays or array indices, so `size_t` doesn't help here.
Go look at the code :) It's a generic function for doing binary search, which accepts an `int` as a function argument, specifying the search size.
The code is then doing:
h := int(uint(i+j) >> 1) // avoid overflow when computing h
Replacing the Go expression `uint(i+j)` with `(size_t)i+(size_t)j` in C like morelisp proposed would not work correctly if `size_t` is smaller than `int`.That's the point I was making.
For 32 bit systems using 64bit instead of size_t would similarly solve the problem.
Conceptually, a 64-bit kernel today could allow your program to allocate (almost) the entire 64-bit address space, assuming it does memory overcommit (like Linux) and/or uses some kind of memory compression (like Linux supports as well).
There might be some MMU limitations on today's mainstream systems, but this doesn't mean that all 64-bit systems have those limitations or that those limitations will remain there in the future.
So your code would break as soon as a new system comes along without those limitations.
Also, this would be even more true if the code and stack would be stored in different address spaces, as theoretically that would even allow you to allocate the entire address space, I think.
A 48-bit index to an array can represent >240TBytes of RAM minimum - if your records are > 1 byte, you have significantly higher storage requirements. The largest system I could find that’s ever been built was a prototype that has ~160TiB of RAM [1]. Also remember. To make the algorithm incorrect, the sum of two numbers has to exceed 64bits - that means you’d need >63-bits of byte-addressable space. That just simply isn’t happening.
Now of course you might be searching through offline storage. 2^63 bits is ~9 exabytes of an array where each element is 1 byte. Note that now we’re talking scales of about about the aggregate total storage capacity of a public hyperscaled cloud. Your binary search simply won’t even finish.
So sure. You’re technically right except you’d never find the bug on any system that your algorithm would ever run on for the foreseeable future, so does it even matter?
As an aside, at the point where you’re talking about 48-bits worth of addressable bytes you’re searching, you’re choosing a different algorithm because a single lookup is going to take on the order of hours to complete. 63-bits is going to take ~27 years iff you can sustain 20gib/s for comparing the keys (sure binary search is logarithmic but then you’re not going to be hitting 20gib/s). Remember - data doesn’t come presorted either so simply getting all that data into a linearly sorted data structure is similarly impractical.
~~C does not guarantee a 64 bit type exists.~~ This is not really correct these days.
Isn't (signed/unsigned) 'long long int' mandatory since C99?
It says: 'There are five standard signed integer types, designated as signed char, short int, int, long int, and long long int'.
My cursory search for 'long long' in the standard didn't find anything about it being optional...
> Signed integer expression simplification [...]
This is easily addressed by allowing (but not requiring) implementations to keep wider intermediate results on signed overflow, to a unspecified width at least as wide as the operands.
> the index calculations are typically done using 32-bit int
No, they're done using size_t; that's what size_t is for.
> undefined overflow ensures that a[i] and a[i+1] are adjacent.
No, the fact that i (of type size_t) < length of a <= size of a in bytes <= SIZE_MAX ensures that a[i] and a[i+1] are adjacent.
(Also, note that this does not mean the optimizer is allowed to access memory at &a[i+1], since that might be across a page boundary (or even the wraparound from address ...FFF to 0), which makes adjacency less helpful than one might hope.)
> Value range calculations [...] Loop analysis and optimization [...]
Allowed but not required to retain excess bits on signed overflow.
> This is easily addressed by allowing (but not requiring) implementations to keep wider intermediate results on signed overflow, to a unspecified width at least as wide as the operands.
> > Value range calculations [...] Loop analysis and optimization [...]
> Allowed but not required to retain excess bits on signed overflow.
What does that even mean? That the code would have different behavior with a different compiler, different optimizations or when you slightly change it (depending on whether the compiler chooses to keep wider intermediate results or not)?
If understand you correctly, then depending on the values of x and y, `(x+1)<(y+3)` would have a different result than `(x+a)<(y+b)` when a=1 and b=3, because in the first case you would be simplifying the expression but in the second case you couldn't.
That would be quite surprising, to say the least.
> No, they're done using size_t; that's what size_t is for.
for (int i = 0; i < 256; i++)
a[i] = b[i] + c[i];
So you've never seen code like this? Not all arrays are huge and for small indices people usually go for `int`.Also, when doing arithmetic with indices, you might need to represent a negative index, so `size_t` wouldn't work. You'd need to use `ssize_t`, which is a signed integer which would benefit from these optimizations.
But even then you might use an `int` if you know the arithmetic result will fit.
> No, the fact that i (of type size_t) < length of a <= size of a in bytes <= SIZE_MAX ensures that a[i] and a[i+1] are adjacent.
Not if `i` is a signed integer, say, `int` or `int8_t`. Which is the point of the optimization.
If x > TYPEOFX_MAX or y > TYPEOFY_MAX-2, then this can already happen with the more (maximally) vague "signed overflow is completely undefined" policy; wider intermediates just mean that code is not allowed to do other things, like crash or make demons fly out of your nose.
If x+[a or 1] and y+[3 or b] don't overflow, then computing them in a wider signed integer type has no effect, since the same values are produced as in the original type.
More generally, retained overflow bits / wider intermediates (instead of undefined behaviour) mean that when you would have gotten undefined behaviour due to integer overflow, you instead get a partially-undefined value with a smaller blast radius (and hence less opprotunity for a technically-standards-conformant compiler to insert security vulnerabilities or other insidious bugs). In cases where you would not have gotten undefined behaviour, there is no signed integer overflow, so the values you get are not partially-undefined, and work the same way as in the signed-overflow-is-undefined-behaviour model. ... you know what; table:
signed overflow is \ no overflow unsigned overflow signed overflow
undefined behaviour: correct result truncated mod 2^n your sinus cavity is a demon roost
wide intermediates: correct result truncated mod 2^n one of a predictable few probably-incorrect resultsAnd the blog post lists dozens of optimizations which can indeed be performed due to that decision.
The benefits of these optimizations will vary depending on the actual code and on the architecture/CPU, of course.
It will be greater for hot and tight loops that benefit from these optimizations and less important for cold parts of the code.
It will also be greater for more limited CPUs and less important for more sophisticated ones.
The lower_bound you pointed to takes start/end iterators defining the partially-ordered range to examine.
The linked-to essay and the Wikipedia take array + size.
These are different APIs.
The latter - which is what this thread is about, IMO - is easier to implement because the sizes are always non-negative.
https://en.cppreference.com/w/cpp/named_req/RandomAccessIter...
they were conceived for c++'98 and transmogrified to a trait for c++20
the run time in GPs lower_bound link is log(n) for random access iterators and liner for non-random access.
But how does that change my observation about the differences in the two APIs?
As I understand it, the correctness of any correct algorithm is conditioned on some requirements about the data types it's operating on and the operations applicable to it. Once you formalize that set of requirements, you can apply the algorithm (correctly) to any data type whose operations fulfill them.
But some sets of data and some operations on them that fulfill some formally stated requirements are just an abstract algebra, aren't they? Like, if you have two data types {F, V} and operations on them {+, ×, ·, +⃗} that fulfill the vector-space axioms, they're a vector space, so any vector-space algorithm will work on them. So every algorithm (or, more accurately, every theorem about an algorithm) defines some algebraic structure (or several of them), which may or may not be a well-known one.
For binary search you have two sets: the set of indices I, which is what we've been talking about, and the set E of elements that might occur in the array a you're searching through. You need a total order on E, and I think you need a total order on I, and the array a needs to be sorted such that aᵢ ≤ aⱼ if i < j (though not conversely, since it might contain duplicates). You need a midpoint operation mid(i, j) to compute a new index from any two existing indices such that i ≤ mid(i, j) < j if i < j. And "mid" needs a sort of well-foundedness property on I to guarantee that the recursion eventually terminates; for any subset of ℤ you can define a "mid" that fulfills this, but for example in ℚ or ℝ you cannot, at least not with the usual ordering.
I don't think a Euclidean domain is quite the right thing for I because there's no obvious total order, and I think you need a total order to state the sortedness precondition on the array contents. Also, lots of Euclidean domains (like ℚ or ℝ) are dense and therefore could lead you to nonterminating recursion. And it isn't obvious to me that "mid" requires so much structure from I.
If you care about complexity you also have to define the cost of operations like ≤, mid, and array indexing, so your algorithm is no longer defined on just an abstract algebra, but an abstract algebra augmented with a cost model. But for correctness and termination that isn't required.
Not quite. A variety of algebras (which is usually what people have in mind when they talk about "algebraic structures" in general) is a collection of operations with equational laws, meaning that they're of the form "for all x_0, x_1, ... (however many variables you need), expression_1(x_0, x_1, ...) = expression_2(x_0, x_1, ...)", where the expressions are built up out of your operators.
Fields are the classic example of a structure studied by algebraists which is nonetheless not a variety of algebras: division isn't described by equational laws, because it's defined for everything except zero. This makes fields much harder to work with than e.g. groups or rings, in both math and programming.
I think you may be confusing integral domains (https://en.wikipedia.org/wiki/Integral_domain) with Euclidean domains (https://en.wikipedia.org/wiki/Euclidean_domain). Although any field is a Euclidean domain, the structure is not very useful (what I am used to calling the norm, but Wikipedia calls the Euclidean function, may be taken to be identically 0, and the remainder from every division may also be taken to be 0). But anyway, I was just free associating (no pun (https://en.wikipedia.org/wiki/Associative_property) intended).
I certainly agree that one can create a structure that encodes the algebra of binary search, and, at a casual glance, your definition looks good to me. I meant only that such a structure seems unlikely to do much more than to encode binary search (unlike, say, affine spaces, which are useful in a very broad variety of settings for which they have not been purpose-built) … although of course any mathematical structure, howsoever abstract or specialised, will find other uses if enough people get interested in it.
"Today" being the important part. That could change tomorrow. I could implement a 64-bit CPU right now that would support it (on an FPGA). It's not an inherent limitation, it's just an optimization that current CPUs do because we don't need to use the full 64-bit address space, usually.
Also, address space doesn't necessarily correspond 1-to-1 with how much memory there is.
For example, according to the AddressSanitizer whitepaper, it dedicates 1/8th of the virtual address space to its shadow memory. It doesn't mean that you need to have 2 exabytes of addressable storage to use AddressSanitizer, or that it reads or writes to all that space.
As I said, memory overcommit and memory compression (and also page mapping in general, as well as memory mapping storage and storage compression and storage virtualization, etc) allow you to address significantly more memory (almost infinitely more) than what you actually have.
There are other tricks with memory, page mapping and pointers which could break your code if it's not standards-compliant. This could happen for security reasons or because of new compiler or kernel optimizations or new features.
So I agree that this isn't a problem right now, unless you're doing something very esoteric, but if you want to have standards-compliant code and be more future-proof then you cannot rely on that.
There is also the point that the Go code that we're discussing has nothing to do with arrays, memory or address spaces, because it's a generic binary search function that works for any function "f" passed as an argument.
For example, it can be used to do a binary search for finding the zero of a mathematical function (i.e. for finding which value of `x` results in `y` becoming zero in the equation `y=f(x)`) and this has nothing to do with address spaces.
You’re hand waving away way too much complexity. Please do build this system. Keep in mind that addressing 63bits of memory with huge tables on will use up > 2 Tera worth of PTEs which translate to what, 16 Terabit worth of memory? This is simply an order of magnitude more than dedicated machines ship with. You’re certainly not getting an FPGA with that.
> For example, according to the AddressSanitizer whitepaper, it dedicates 1/8th of the virtual address space to its shadow memory. It doesn't mean that you need to have 2 exabytes of addressable storage to use AddressSanitizer, or that it reads or writes to all that space.
I think you’re failing to appreciate how large 2^63 bytes is.
> As I said, memory overcommit and memory compression (and also page mapping in general, as well as memory mapping storage and storage compression and storage virtualization, etc) allow you to address significantly more memory (almost infinitely more) than what you actually have.
See point above. Such a system is just not likely to exist in your lifetime.
> but if you want to have standards-compliant code and be more future-proof then you cannot rely on that.
All code has a shelf life. What’s the date you’re working on here? I’m willing to bet it’s not an issue by the end of this century.
The page table is itself stored in virtual memory, is a tree structure and it can be fully sparse, i.e. you only need to populate the PTEs that you use, basically.
Keep in mind, as long as you enable memory overcommit or use MAP_NORESERVE in mmap, you can allocate 127 TiB (~2^47 bytes) worth of virtual address space on Linux x86-64, today, at zero cost. With 4K pages!
In fact, I just did that on my laptop and memory usage has remained exactly as it was before.
And on POWER10 you can map 2^51 bytes of virtual address space today, also at zero cost.
> I think you’re failing to appreciate how large 2^63 bytes is.
No, I do appreciate it. It's 65536 times larger than the maximum address space you can allocate today on Linux x86-64 at zero cost. With 4K pages. Or a factor of 2048 larger than POWER10 can do today, also at zero cost.
In fact, with 1G HugePages, the maximum theoretical number of PTEs needed for 2^64 bytes of address space would be LESS than the number of PTEs needed for the 2^47 bytes you can allocate today, on Linux x86-64, with 4K pages, at zero cost (which I just did, on my laptop).
The maximum amount of virtual address space you can allocate is only limited by how many bits the CPU and MMU are designed to address.
If you don't believe me, you can try it yourself:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main() {
// 127 TiB
size_t size = 127ULL * 1024 * 1024 * 1024 * 1024;
printf("allocating %zu bytes of virtual address space...\n", size);
void *p = malloc(size);
if (p == NULL) {
perror("malloc");
exit(1);
}
printf("success: %p\n", p);
sleep(3600);
}
Be sure to do 'echo 1 > /proc/sys/vm/overcommit_memory' as root and then run the program: $ gcc -o alloc alloc.c -Wall
$ ./alloc
allocating 139637976727552 bytes of virtual address space...
success: 0xf29bb2a010
Then observe how memory usage on your system hasn't changed.Yes, but saying "signed overflow is completely undefined" simply means that you are not allowed to do that, so this is a very well-defined policy and as an experienced programmer you know what to expect and what code patterns to avoid (hopefully).
If you say "signed overflow is allowed" but then your code behaves nondeterministically (i.e. giving different results when signed overflow happens, depending on which compiler and optimization level you're using or exact code you've written or slightly changed), I would argue that would actually be more surprising for an experienced programmer, not less!
It would make such signed overflow bugs even harder to detect and fix! As it would work just fine for some cases (or when certain optimizations are applied or not, or when you use a certain compiler version or Linux distro) but then it would completely break in a slightly different configuration or if you slightly changed the code.
And it would prevent tools like UBSan from working to detect such bugs because some code would actually be correct and rely on the signed overflow behavior that you've defined, so you couldn't just warn the programmer that a signed overflow happened, as that would generate a bunch of false alarms (especially when such signed-overflow-relying code was part of widely used libraries).
> More generally, retained overflow bits / wider intermediates (instead of undefined behaviour) mean that when you would have gotten undefined behaviour due to integer overflow, you instead get a partially-undefined value with a smaller blast radius (and hence less opprotunity for a technically-standards-conformant compiler to insert security vulnerabilities or other insidious bugs).
C compilers are already allowed to do what you say, currently. But I'm not sure that relying on that behavior would be a good idea :)
I think it's preferable that the C standard says that you are not allowed to overflow signed integers, because otherwise the subtlety of what happens on signed overflow would be lost on most programmers and it would be very hard to catch such bugs, especially due to code behaving differently on slightly different configurations (hello, heisenbugs!) and also because bug-detection tools couldn't flag signed overflows as invalid anymore.
No, saying "signed overflow is a compile-time error" means you're not allowed to do that. Saying "signed overflow is completely undefined" means you are allowed to do that, but it will blow up in your face (or, more likely, your users' faces) with no warning, potentially long after the original erroneous code change that introduced it.
> As it would work just fine for some cases (or when certain optimizations are applied or not, or when you use a certain compiler version or Linux distro) but then it would completely break in a slightly different configuration or if you slightly changed the code.
That sentence is literally indistinguishable from a verbatim quote about the problems with undefined behaviour. Narrowing the scope of possible consequences of signed overflow from "anything whatsoever" to "the kind of problems that are kind of reasonable to have as a result of optimization" is a strict improvement.
> And it would prevent tools like UBSan from working
> bug-detection tools couldn't flag signed overflows as invalid
The standard says `if(x = 5)` is well-defined, but that doesn't stop every competently-designed (non-minimal, correctly configured[0]) compiler from spitting out something to the effect of "error: assignment used as truth value".
0: Arguably a fully competently designed compiler would require you to actually ask for -Wno-error rather than having it be the default, but backward compatibility prevents changing that after the fact, and it would require configuring -Wall-v1.2.3 (so build scripts didn't break) anyway.
> No, saying "signed overflow is a compile-time error" means you're not allowed to do that.
In the vast majority of cases it's not possible to statically determine if signed overflow will occur, so compilers can't do that. I'm sure they would do it, if it were possible.
> Saying "signed overflow is completely undefined" means you are allowed to do that, but it will blow up in your face
No, it does not mean that. You're not allowed to do signed overflow in standard-compliant C, period.
You're allowed to do signed arithmetic, but the arithmetic is not allowed to overflow. You can write code that overflows, but it will not have defined semantics (because that's what the standard says).
And the compiler cannot enforce or emit a warning when overflow occurs because in the general case it's not possible to statically determine if it will occur.
But if the compiler can determine it, then it will emit a warning (at least with -Wall, I think).
And if you pass the `-ftrapv` flag to GCC (and clang, probably), then your code will deterministically fail at runtime if you do signed overflow, but for performance reasons this is not required by the standard.
> > As it would work just fine for some cases (or when certain optimizations are applied or not, or when you use a certain compiler version or Linux distro) but then it would completely break in a slightly different configuration or if you slightly changed the code.
> That sentence is literally indistinguishable from a verbatim quote about the problems with undefined behaviour. Narrowing the scope of possible consequences of signed overflow from "anything whatsoever" to "the kind of problems that are kind of reasonable to have as a result of optimization" is a strict improvement.
No, because experienced programmers don't expect signed overflow to work, because it's not allowed. Such bad code would be caught if you enable UBsan. But if the C standard would require what you propose, then UBsan could not fail when a signed overflow occurs, as that could be a false positive (and therefore make such signed-overflow detection useless).
If you allow signed overflows then you have to define the semantics. And nondeterministic semantics in arithmetic is prone to result in well-defined but buggy code, while in this case also preventing bug-detection tools from being reliable.
That said, you could conceivably implement such semantics in a compiler, which you could enable with some flag, like for example, -fwrapv causes signed overflow to wraparound and -ftrapv causes signed overflow to fail with an exception.
So you could implement a compiler flag which does what you want, today, and get all the benefits that you're proposing.
You could even enable it by default, because the C standard allows the behavior that you're proposing, so you would not be breaking any existing code.
And this would also mean that existing and future C code would still be C-standard compliant (as long as it doesn't rely on that behavior).
But making the C standard require that behavior means that there will be well-defined, standard-compliant code that will rely on those weird semantics when signed overflow occurs, and that's a really bad idea.
> > bug-detection tools couldn't flag signed overflows as invalid
> The standard says `if(x = 5)` is well-defined, but that doesn't stop every competently-designed (non-minimal, correctly configured[0]) compiler from spitting out something to the effect of "error: assignment used as truth value".
The big difference in this case is that such errors are easy to determine at compile-time. No compiler would cause such code to fail at run-time, because it would lead to unexpected and unusual program crashes, which would result in both users and programmers to be mad (especially if the code is correct). But for signed overflows, it's not possible to implement a similar compile-time error.
As another example, AddressSanitizer is competently-designed but as soon as you enable `strict_string_checks` you will run into false positives if your code stores strings by keeping a pointer and a length, rather than forcing them to be NULL-terminated, so that flag can be completely useless (and for my current project, it is).
Which is why I'm guessing it's disabled by default. Which means almost nobody uses that flag.
This happens because strings are not actually required to be NULL-terminated in C, even though most people use them that way. So there is code out there (including mine) that relies on strings not always being NULL terminated, and this has well-defined semantics in C.
But of course, as soon as that happens, then you can't rely on the tool to be useful anymore, because there is perfectly fine code which relies on the well-defined semantics.
Note that in this case, "strict_string_checks" is a run-time check, like detection of signed overflows would have to be.
Just to be clear, even if a PTE entry was just 1 pointer long (it's not), covering 63 bits of address space with 1 GiB PTEs would require >73 GiB just for the page tables. And those page tables ARE getting materialized if you're doing a binary search over that much data.
I'm not as imaginative in you to see a world in which you can sparsely map in 2^63 elements (9 exabytes if 1 byte per element) on one CPU and then the problem you're solving is a binary search through that data which is going to cause about log(n) to be mapped in to satisfy the search. 1 exabyte is probably the amount of RAM that Google has collectively worldwide. Now sure, maybe you're talking about mapping files on disk but again. 1 exabyte is a shit ton. It's probably several clusters worth of machines for storage. And even with 1 GiB pages, you're talking about 1 billion PTEs total and each lookup is going to need to materialize ~9 PTEs to search. And all of that is again a moot point because no CPU like that exists or will exist any time soon.
One has to be quite careful with the terminology when crossing disciplines, as here. Fields are, rather boringly, algebras over themselves, in what I dare to call the ‘usual’ sense of an algebra over a field (https://en.wikipedia.org/wiki/Algebra_over_a_field). Rather what hither_shores is saying is that the collection of fields does not form a variety of algebras, in the sense of universal algebra https://en.wikipedia.org/wiki/Variety_(universal_algebra) (which is itself quite different from the algebraic-geometry sense https://en.wikipedia.org/wiki/Algebraic_variety). See, for example, https://math.stackexchange.com/questions/3756136/why-is-the-... .