Too much discussion of the XOR swap trick (heather.cafe)
We didn't get into the deeper question of benchmarking it vs. a three-register swap, because I suspect the latter would be handled entirely by register renaming and end up faster, since it doesn't need an ALU at all. That's difficult to benchmark, because for it to make a difference you'd need to surround it with other arithmetic instructions.
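For reference, the two candidates being compared look roughly like this in C. This is a minimal sketch, not a benchmark; the function names are made up for illustration, and the comments describe typical optimizer behavior rather than a guarantee.

```c
/* Conventional swap through a temporary. With optimization on, compilers
   typically reduce this to plain register moves, or to nothing at all. */
static void temp_swap(unsigned *a, unsigned *b)
{
    unsigned t = *a;
    *a = *b;
    *b = t;
}

/* XOR swap: three XORs, each depending on the result of the previous one.
   The guard matters: if a and b point to the same object, the first XOR
   zeroes it and the value is lost. */
static void xor_swap(unsigned *a, unsigned *b)
{
    if (a == b)
        return;
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
```

Both leave the two values exchanged; the difference is that the XOR version forms a serial dependency chain that has to go through the ALU, while the temporary version is just data movement.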
A meta question is why this persists. It has the right qualities for a "party trick": slightly esoteric piece of knowledge, not actually that hard to understand when you do know about it, but unintuitive enough that most people don't spontaneously reinvent it.
See also: https://en.wikipedia.org/wiki/Fast_inverse_square_root , which requires a bit more maths.
The other classic use of XOR - cursor overdrawing - has also long since gone away. It used to be possible to easily draw a cursor on a monochrome display by XORing it in, then to move it you simply XOR it again, restoring the original image.
About a month later, an engineer decided to turn on all warnings for gcc... and behold, a message stating something to the effect of "WARNING: execution will halt upon reaching this statement." The compiled code basically just halted and never returned from the function call (I can't remember the specifics).
And that is how we learned not to hide compiler messages.
This is discussed in a section near the end. But I feel like the discussion of why people care about using XOR to swap two values is missing an underlying discussion of why people would care about swapping values. As shown earlier in the division example, at the point where you would do the swap, typically you can just write the rest of the code with the roles of the values swapped.
if max < min:
    min, max = max, min
[... algorithm that requires min < max]
So your suggestion would be to have two versions of the algorithm in the two branches of the 'if'. This is significantly more complicated and may even be slower, depending on lots of factors.

Today, yes; most modern instruction sets are pretty orthogonal, and you can use a value in any register symmetrically -- although even today, division instructions (if they exist!) are among the most likely to violate that expectation, along with instructions that work with the stack pointer. But in the XOR heyday this was less true: instruction sets were less orthogonal, and registers were scarcer. For example, it's not unreasonable for an OS scheduler tick to compute the newly scheduled task's stack pointer in one register and then need to swap it into an SPR (special-purpose register) or similar, so that the return from interrupt resumes at the new location; this is exactly the type of place where the XOR trick occasionally has value.
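void swap(unsigned* a, unsigned* b)
{
    /* Addition swap: unsigned arithmetic wraps, so overflow is well-defined.
       Like the XOR swap, it zeroes the value if a and b point to the same object. */
    *a = *a + *b;
    *b = *a - *b;
    *a = *a - *b;
}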
GCC still sees through it with restrict: https://godbolt.org/z/8n3bGha3e

But that's a specific example of a general use that is still handy occasionally: A = B XOR C. Come back later with C and A and retrieve B.
Requires fixed-length keys. Generate a random table for each byte position. Then, to hash a key, look up each byte in its position's table and XOR the results. This is used in chess/go/shogi/... engines because the board/position representation is fixed-length and you can undo moves easily: XOR out the byte from the previous state and XOR in the byte from the new state.
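A minimal sketch of the scheme just described (usually called Zobrist hashing), assuming 8-byte keys. KEY_LEN, the helper names, and the use of splitmix64 to fill the tables are illustrative choices, not details from the comment.

```c
#include <stddef.h>
#include <stdint.h>

#define KEY_LEN 8 /* hypothetical fixed key length */

static uint64_t table[KEY_LEN][256];

/* splitmix64: a small PRNG, used here only to keep the sketch self-contained. */
static uint64_t splitmix64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* One table of 256 random values per byte position. */
static void init_table(uint64_t seed)
{
    for (size_t pos = 0; pos < KEY_LEN; pos++)
        for (size_t b = 0; b < 256; b++)
            table[pos][b] = splitmix64(&seed);
}

/* Full hash: XOR together one table entry per byte position. */
static uint64_t hash_key(const uint8_t key[KEY_LEN])
{
    uint64_t h = 0;
    for (size_t pos = 0; pos < KEY_LEN; pos++)
        h ^= table[pos][key[pos]];
    return h;
}

/* Incremental update when the byte at pos changes from old_b to new_b:
   XOR out the old entry, XOR in the new one. */
static uint64_t update_hash(uint64_t h, size_t pos, uint8_t old_b, uint8_t new_b)
{
    return h ^ table[pos][old_b] ^ table[pos][new_b];
}
```

Because XOR is its own inverse, applying the same update with old and new swapped restores the previous hash, which is what makes undoing a move cheap.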
Is register renaming something done in hardware, or by the compiler?
Yes, error correction.
You have some packets of data a, b, c. Add one additional packet z that is computed as z = a ^ b ^ c. Now whenever one of a, b or c is lost (or known to be corrupted), it can be reconstructed by computing the XOR of all the others.
So if b is lost: b = a ^ c ^ z. This works for any single packet, but only one: if multiple packets are lost, it fails.
There are way better error correction algorithms, but I like the simplicity of this one.
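A byte-level sketch of that single-parity scheme, assuming every packet has the same length PKT_LEN; the size and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PKT_LEN 4 /* hypothetical packet size in bytes */

/* Parity packet: z[i] = pkts[0][i] ^ pkts[1][i] ^ ... for every byte i. */
static void make_parity(const uint8_t *pkts[], size_t n, uint8_t parity[PKT_LEN])
{
    for (size_t i = 0; i < PKT_LEN; i++) {
        uint8_t acc = 0;
        for (size_t p = 0; p < n; p++)
            acc ^= pkts[p][i];
        parity[i] = acc;
    }
}

/* Rebuild packet `lost` by XORing the parity packet with all survivors.
   The lost packet's own bytes are never read. */
static void recover(const uint8_t *pkts[], size_t n, size_t lost,
                    const uint8_t parity[PKT_LEN], uint8_t out[PKT_LEN])
{
    for (size_t i = 0; i < PKT_LEN; i++) {
        uint8_t acc = parity[i];
        for (size_t p = 0; p < n; p++)
            if (p != lost)
                acc ^= pkts[p][i];
        out[i] = acc;
    }
}
```

Note that the caller has to know which packet is missing; plain XOR parity can rebuild a known-bad packet but cannot locate one.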
a^=b^=a^=b;
Which allegedly saves you 0.5 seconds of typing in competitive programming competitions from 20 years ago and is known to work reliably (on MinGW under Windows XP).
Bonus authenticity: use `a^=a` to zero a register in a single x86 instruction (this makes a real difference with compiler toolchains 30+ years old).
For real now, a very useful application of XOR is its relation to the Nim game [0], which comes in very handy if you need to save your village from an ancient disgruntled Chinese emperor.
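For anyone who hasn't seen the connection: in standard Nim (assuming normal play, where taking the last object wins), the player to move loses against perfect play exactly when the XOR of the pile sizes is zero, and a winning move reduces some pile p to p XOR s, where s is that XOR. A sketch with hypothetical function names:

```c
#include <stddef.h>

/* XOR of all pile sizes (the "nim-sum"). */
static unsigned nim_sum(const unsigned piles[], size_t n)
{
    unsigned s = 0;
    for (size_t i = 0; i < n; i++)
        s ^= piles[i];
    return s;
}

/* Normal-play Nim: returns 1 and fills *pile / *new_size with a move that
   leaves a zero nim-sum, or returns 0 if the nim-sum is already zero
   (every move loses against perfect play). */
static int winning_move(const unsigned piles[], size_t n,
                        size_t *pile, unsigned *new_size)
{
    unsigned s = nim_sum(piles, n);
    if (s == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        unsigned target = piles[i] ^ s;
        if (target < piles[i]) { /* only shrinking a pile is a legal move */
            *pile = i;
            *new_size = target;
            return 1;
        }
    }
    return 0; /* not reached when s != 0 */
}
```

For piles {3, 4, 5} the nim-sum is 2, and the move found is to reduce the first pile from 3 to 1, leaving {1, 4, 5} with a nim-sum of 0.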
Modern compilers will still use xor for zeroing registers on their own.
For instance: https://godbolt.org/z/n5n35xjqx
Variable a (register esi) is first initialized to 42 with mov and then cleared to zero using xor.
> a^=b^=a^=b;
I believe this is defined in C++ since C++17, but still undefined in C.
tmp = (a ^ b) & mask
a ^= tmp
b ^= tmp
If mask = 0xfff...fff then a/b will be swapped; otherwise, if mask = 0, they'll remain the same.

One extension that I ran into, and which I think forms a nice problem, is the following:
Just like the XOR swap trick can be used to swap two variables (and let's just say that they're bools), it can be extended to implement any permutation of the variables: suppose that the permutation is written as a composition of n transpositions (i.e., swaps of pairs), and that this is the minimal number of transpositions that lets you do that. Each transposition can be implemented by 3 XORs, by the XOR swap trick for pairs, and so the full permutation can be implemented by 3n XORs. Now here's the question: is it possible to come up with a way of doing it with fewer than 3n XORs? That is, can we find a permutation that has a shortcut through XOR-land (not allowing any other kinds of instructions)? In other words, is XOR-swapping XOR-optimal?
I'm not going to spoil it, but only last year a paper was published in the quantum information literature that contains an answer [0]. I ended up making a little game where you get to play around with XOR-optimizing not only permutations, but general linear reversible circuits. [1]
[0] https://link.springer.com/article/10.1007/s11128-025-04831-5
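To make the 3n baseline in that puzzle concrete: a 3-cycle is not a single transposition, so the naive route composes two XOR swaps for 6 XORs. This sketch (hypothetical names) only shows that baseline and says nothing about whether a shorter XOR sequence exists.

```c
#include <stdbool.h>

/* Swap two distinct bools with three XORs. */
static void xor_swap_bool(bool *x, bool *y)
{
    *x ^= *y;
    *y ^= *x;
    *x ^= *y;
}

/* Rotate (a, b, c) -> (b, c, a): a gets old b, b gets old c, c gets old a.
   Composed from two transpositions, so 2 * 3 = 6 XORs. */
static void rotate3(bool *a, bool *b, bool *c)
{
    xor_swap_bool(a, b); /* now (b, a, c) */
    xor_swap_bool(b, c); /* now (b, c, a) */
}
```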
For me it falls in the obfuscated-C quadrant for code. Performance implications aside, it's just not the kind of "self-documenting" code I like lying around in my sources. (And I'll take clarity of purpose over performance every day.)
It can only be used when walking the list, of course, which is quite likely why it isn't more widely used: it doesn't provide the benefit that a regular doubly linked list is mostly used for.
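A minimal sketch of an XOR linked list, where each node stores prev XOR next in a single field. The struct and function names are hypothetical, and it assumes a null pointer converts to integer 0 (true on mainstream platforms). Decoding the next node needs the address of the node you came from, which is the "only when walking" restriction.

```c
#include <stddef.h>
#include <stdint.h>

/* Each node stores (uintptr_t)prev ^ (uintptr_t)next instead of two pointers. */
struct xnode {
    int value;
    uintptr_t link;
};

/* Link nodes[0..n-1] into an XOR list; the missing neighbour at each end is 0. */
static void xlist_link(struct xnode nodes[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uintptr_t prev = (i > 0) ? (uintptr_t)&nodes[i - 1] : 0;
        uintptr_t next = (i + 1 < n) ? (uintptr_t)&nodes[i + 1] : 0;
        nodes[i].link = prev ^ next;
    }
}

/* Walk from either end, copying up to max values into out[].
   Starting at the tail walks backwards with the same code. */
static size_t xlist_walk(struct xnode *start, int out[], size_t max)
{
    struct xnode *prev = NULL;
    struct xnode *cur = start;
    size_t count = 0;
    while (cur != NULL && count < max) {
        out[count++] = cur->value;
        struct xnode *next = (struct xnode *)(cur->link ^ (uintptr_t)prev);
        prev = cur;
        cur = next;
    }
    return count;
}
```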
I'm not sure I'd call it a "trick", but since A ^ 0 = A, B ^ B = 0, and XOR is associative, ((A ^ B) ^ B) = A ^ (B ^ B) = A, i.e. XOR-ing any number with the same number twice gets you back the original number.
This used to be used back in the day for cheap and nasty computer graphics, since it means that if you draw to the screen by XOR-ing with the pixels already on the screen then you can undo it, restoring the background, by doing it a second time. The "nasty" part is that XORing with what's already on the screen isn't going to look great, but for something like a rotating wire-frame figure it might be OK.
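A sketch of XOR drawing on a tiny monochrome framebuffer, one byte per row and one bit per pixel. The framebuffer size and the function name are hypothetical; the point is only that drawing the same sprite twice at the same spot restores the background.

```c
#include <stddef.h>
#include <stdint.h>

#define FB_ROWS 8

/* Hypothetical 8x8 monochrome framebuffer: one byte per row, one bit per pixel. */
static uint8_t framebuffer[FB_ROWS];

/* XOR a sprite onto the framebuffer, shifted right by x pixels (x < 8)
   and starting at row y. Calling it again with the same arguments undoes it. */
static void xor_blit(const uint8_t sprite[], size_t rows, size_t x, size_t y)
{
    for (size_t r = 0; r < rows && y + r < FB_ROWS; r++)
        framebuffer[y + r] ^= (uint8_t)(sprite[r] >> x);
}
```

Moving a cursor is then two calls: XOR it at the old position (erasing it) and XOR it at the new position, with no saved copy of the background needed.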
When you released the menu button on the mouse, it did a similar swap to restore the screen contents. In version 2.0 I optimized it to do a simple copy of the offscreen rectangle back onto the screen, because before it was wasting time preserving the menu pixels that were going to be thrown away immediately.
(See also the wacky way in which ARM "load immediate" works)
So we all know addition swap. One generalization that comes to mind is doing some other in-place transform on the two input variables. Let's keep it simple and suppose that it's a linear transform. Thus the problem is to apply some matrix [[a, b], [c, d]] to two input variables [x, y] using entirely in-place operations.
We can do this by realizing that our basic operations can be expressed as matrices. x += k*y is the same as the upper triangular matrix [[1, k], [0, 1]].
Likewise, y += k*x is equivalent to the lower triangular matrix [[1, 0], [k, 1]].
And lastly, the *= operator is equivalent to a matrix with an element on the diagonal: x *= k is [[k, 0], [0, 1]], and
y *= k is [[1, 0], [0, k]].
From this point on, it becomes a challenge of whether you can decompose any desired matrix into some combination of these available ones (spoiler: yes, you can).
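One way to see the "yes" for the common case is an LU-style factorization, [[a, b], [c, d]] = [[1, 0], [c/a, 1]] * [[a, 0], [0, det/a]] * [[1, b/a], [0, 1]] with det = ad - bc, applied right to left. A sketch using floating point; it assumes a != 0, the function name is hypothetical, and this is one possible decomposition rather than necessarily the one the commenter had in mind.

```c
/* Apply [[a, b], [c, d]] to (x, y) in place using only the elementary
   operations above: x += k*y, x *= k, y *= k, y += k*x. Assumes a != 0. */
static void apply2x2_inplace(double a, double b, double c, double d,
                             double *x, double *y)
{
    double det = a * d - b * c;
    *x += (b / a) * *y; /* [[1, b/a], [0, 1]]: x = x + (b/a)*y        */
    *x *= a;            /* [[a, 0], [0, 1]]:   x = a*x + b*y          */
    *y *= det / a;      /* [[1, 0], [0, det/a]]                       */
    *y += (c / a) * *x; /* [[1, 0], [c/a, 1]]: y = c*x + d*y (originals) */
}
```

For example, applying [[2, 3], [4, 5]] to (1, 1) this way gives (5, 9), the same as the ordinary matrix-vector product.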
The next generalization one could contemplate is doing operations in place on more than 2 variables. Well, if one has already solved arbitrary 2x2 matrix operations, then that can be rigged to implement larger matrices one submatrix at a time.
The final generalization that comes to mind is what can we do with non-arithmetic operators? We've already seen an example of this with using xor-swap rather than addition-swap. But is there anything out there vaguely like xor-2x2-matrix-multiply?
I legit don't know. I have some thoughts, but I won't meander out loud if it's not going to lead anywhere.
For some reason this reminds me of the Fourier transform. I wonder if it can be performed with XOR tricks and no complicated arithmetic?
Solution: XOR is just addition mod 2. Write the numbers in base n and do digit-wise addition mod n (i.e., without carry). It's a very intuitive way to see the XOR trick.
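A sketch of that generalization for small unsigned values, with hypothetical helper names: carry-less base-n addition and subtraction, plus a swap that mirrors the XOR swap. With n = 2, both helpers reduce to XOR.

```c
/* Digit-wise addition mod n, with no carries between digits. */
static unsigned digit_add(unsigned a, unsigned b, unsigned n)
{
    unsigned result = 0, place = 1;
    while (a > 0 || b > 0) {
        result += ((a % n + b % n) % n) * place;
        a /= n;
        b /= n;
        place *= n;
    }
    return result;
}

/* Digit-wise subtraction mod n, with no borrows between digits. */
static unsigned digit_sub(unsigned a, unsigned b, unsigned n)
{
    unsigned result = 0, place = 1;
    while (a > 0 || b > 0) {
        result += ((a % n + n - b % n) % n) * place;
        a /= n;
        b /= n;
        place *= n;
    }
    return result;
}

/* Swap using carry-less base-n arithmetic, the same shape as the XOR swap. */
static void digit_swap(unsigned *a, unsigned *b, unsigned n)
{
    *a = digit_add(*a, *b, n); /* a' = a (+) b        */
    *b = digit_sub(*a, *b, n); /* b' = a' (-) b = a   */
    *a = digit_sub(*a, *b, n); /* a'' = a' (-) a = b  */
}
```

In base 10, swapping 12 and 40 goes through 52 (2+0 and 1+4 per digit), then 52 (-) 40 = 12 and 52 (-) 12 = 40.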
The article makes the same point as well at the end:
It is the kind of technique which might have been occasionally useful in the 1980s, but now is only useful for cute interview questions and as a curiosity.
int a, b;
bool cond;
int swap = cond ? a ^ b : 0;
a ^= swap;
b ^= swap;
If cond is highly unpredictable, this can work rather nicely.

Yes, there are false positives, and the false negative of all-zero/all-equal, but the test can be useful in a "bloom filter" type case.
Have used it in dynamic firewalling rules ... one can do something pretty close to a JA3/JA4 TLS fingerprint in eBPF with that (to match the cipher lists).
https://lemire.me/blog/2022/01/21/swar-explained-parsing-eig...
Things like "add with saturation" and the special AES instructions.
a = the bits of some song or movie
b = pure noise
Store c = a^b.
Give b to a friend. Throw away a.
Now both you and your friend have a bit vector of pure noise. Together you can produce the copyrighted work. But nobody is liable.
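Mechanically, the scheme above is 2-of-2 XOR splitting (a one-time pad where the pad itself is one of the shares). A sketch with hypothetical names; rand() is not a cryptographic source and stands in for real randomness only for illustration. It shows the mechanics only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Split data into two shares: noise, and data ^ noise.
   Each share on its own looks like noise; XORing both recovers data. */
static void xor_split(const uint8_t *data, size_t len,
                      uint8_t *noise, uint8_t *masked)
{
    for (size_t i = 0; i < len; i++) {
        noise[i] = (uint8_t)(rand() & 0xFF); /* illustration only, not a CSPRNG */
        masked[i] = data[i] ^ noise[i];
    }
}

/* Recombine two shares byte by byte. */
static void xor_combine(const uint8_t *s1, const uint8_t *s2, size_t len,
                        uint8_t *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = s1[i] ^ s2[i];
}
```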
> https://ansuz.sooke.bc.ca/entry/23
> https://ansuz.sooke.bc.ca/entry/24
As these articles outline, the legal situation is much more complicated (and unintuitive to people who are used to "computer science thinking").
(unless you're an AI company, in which case you can copy the whole internet just fine)
Law is more akin to philosophy than computer science.
A few months ago, I had a rare occasion of trying to explain them to a relative who had just bought a fancy NAS and wanted help setting it up.
* wrapping is well-defined behavior for unsigned integers; signed integer wrapping is UB, but is not used here.
* the equations (that a & b cancel each other out, resulting in the swap) hold even when done in a mod N ring.
It'll break eventually. If it matters, write the SIMD yourself. It'll probably be 2-50x better than the compiler anyways.
How many ways could they do this? Could they note in court that they found you getting your copy from a "super secure no liability legal loophole" piracy service? Could they just get B's side, whether through subpoena or whatever mechanism you have to communicate with B? (You must, since your file is "just noise" and useless to you as it is)
[1] Achieving Multi-Port Memory Performance on Single-Port Memory with Coding Techniques - https://arxiv.org/abs/2001.09599
[2] https://people.csail.mit.edu/ml/pubs/fpga12_xor.pdf

I ended up not taking it because the pay wasn’t great (and at the time it wasn’t really what I wanted to do), but part of me is still curious about what that would have been like.
What I remember implementing this on projects was the messiness of:
- Incrementally getting the Makefiles to turn on -Wall file by file as they were scrubbed. I think it was something similar to "<list-of-files>: CFLAGS+=-Wall" and then adding to the list.
- Suppressing warnings that were "OK" on a case-by-case basis. Different languages had different ways of saying "ignore error 123 here", if at all.
- I remember lint had things like this too, like /*NOTREACHED*/.
Maybe things have gotten better/cleaner.
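For the case-by-case suppression in C today, GCC and Clang both accept diagnostic pragmas that scope a suppression to a region of a file. A sketch; legacy_handler is a hypothetical function, and -Wunused-parameter is used as the example because it is one of the warnings that -Wall -Wextra turns on.

```c
/* Silence one specific warning for this function only; everything
   outside the push/pop pair keeps the normal warning settings. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-parameter"
static int legacy_handler(int event, void *context) /* context intentionally unused */
{
    return event + 1;
}
#pragma GCC diagnostic pop
```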
There is no trace back from pure noise to the original work.
Colour of bits is just magical thinking.
There may be no trace from pure noise to original work, but you didn't get that particular noise randomly, you in fact got it from the original work.
Once you understand that law cares less about the thing itself, and more about the causal chain that led to it, it stops seeming magical and becomes perfectly reasonable.
(Also, FWIW, it's not that far conceptually from code = data, but there are still tons of technical people who can't comprehend the fact that there is no code/data distinction in reality. "Code" vs "data" isn't a property of bits either; it's only a matter of perspective.)
They do this by means such as "questioning people" and "finding evidence". For example, if you have a file on your computer describing your plan to use XOR to infringe copyright, that would be considered "evidence".
No law exists "physically".
Otherwise: even in computer science the situation is more complicated, as is explained in the linked articles. Relevant excerpt from the first linked article:
"Child pornography is an interesting case because I find myself, and I think many people in the computing community will find themselves, on the opposite side of the Colourful/Colour-blind gap from where I would normally be. In copyright I spend a lot of time explaining why Colour doesn't exist and it doesn't matter where the bits came from. But when it comes to child pornography, I think maybe Colour should make a difference - if we're going to ban it at all, it should matter where it came from. Whether any children were actually involved, who did or didn't give consent, in short: what Colour the bits are. The other side takes the opposite tack: child pornography is dangerous by its very existence, and it doesn't matter where it came from. They're claiming that whether some bits are child pornography or not, and if so, whether they're illegal or not, should be entirely determined by (strictly a function of) the bits themselves. Legality, at least under the obscenity law, should not involve Colour distinctions.
[...]
The computer science applications of Colour seem to be mostly specific to security. Suppose your computer is infected with a worm or virus. You want to disinfect it. What do you do? You boot it up from original write-protected install media. Sure, you have a copy of the operating system on the drive already, but you can't use that copy - it's the wrong Colour. Then you go through a process of replacing files, maybe examining files, swapping disks around and carefully write-protecting them; throughout, you're maintaining information on the Colour of each part of the system and each disk until you've isolated the questionable files and everything else is known to be the "not infected with virus" Colour. Note that developers of Web applications in Perl use a similar scorekeeping system to keep track of which bits are "tainted" by influence from user input.
When we use Colour like that to protect ourselves against viruses or malicious input, we're using the Colour to conservatively approximate a difficult or impossible to compute function of the bits. Either our operating system is infected, or it is not. A given sequence of bits either is an infected file or isn't, and the same sequence of bits will always be either infected or not. Disinfecting a file changes the bits. Infected or not is a function, not a Colour. The trouble is that because any of our files might be infected including the tools we would use to test for infection, we can't reliably compute the "is infected" function, so we use Colour to approximate "is infected" with something that we can compute and manage - namely "might be infected". Note that "might be infected" is not a function; the same file can be "might be infected" or "not (might be infected)" depending on where it came from. That is a Colour.
[...]
Random numbers have a Colour different from that of non-random numbers. [...]
Note my terminology - I spoke of "randomly generated" numbers. Conscientious cryptographers refuse to use the term "random numbers". They'll persistently and annoyingly correct you to say "randomly generated numbers" instead, because it's not the numbers that are or are not random, it's the source of the numbers that is or is not random. If you have numbers that are supposed to come from a random source and you start testing them to make sure they're really "random", and you throw out the ones that seem not to be, then you end up reducing the Shannon entropy of the source, violating the constraints of the one-time pad if that's relevant to your application, and generally harming security. I just threw a bunch of math terms at you in that sentence and I don't plan to explain them here, but all cryptographers understand that it's not the numbers that matter when you're talking about randomness. What matters is where the numbers came from - that is, exactly, their Colour.
So if we think we understand cryptography, we ought to be able to understand that Colour is something real even though it is also true that bits by themselves do not have Colour. I think it's time for computer people to take Colour more seriously - if only so that we can better explain to the lawyers why they must give up their dream of enforcing Colour inside Friend Computer, where Colour does not and cannot exist."
But, meh, could just be he meant "won't get caught" and not "liability"; I make mistakes in comms all the time, after all.
That's the whole point of encryption.
Both people will say they are innocent and that the other person used the other's noise vector and the copyrighted work to produce their noise vector.
Simple: because you can't find out who tells the truth, simply jail both. :-)
See also: https://xkcd.com/1494/
Are you going to jail 100 people because one of them is lying?
Judges and juries don't need guilt to be mathematically proven; they just have to be pretty sure.
If the prosecuting side has a reason to care that much, it doesn't matter whether it's 10 or 100 people - in fact, if it's 100 people, the original source is in deeper shit because this is now obviously not just personal use, but distribution.