FizzleFade

FizzleFade (fabiensanglard.net)

604 points by pietrofmaggi 8 years ago | 176 comments

antirez 8 years ago |

An alternative approach that works for every resolution: http://antirez.com/news/113

dsacco 8 years ago | |

This is a good post, kudos on writing it up so quickly! In fact, the first thing I thought of when I read this post on HN was, "Well why not use a pseudorandom permutation instead of a pseudorandom function, this way we efficiently fill all pixels without first checking if they're red?"

Since a Feistel network is a pseudorandom permutation, this fulfills the need (and in my opinion, even more elegantly than an LFSR). For even better performance you could use AES as the PRF, especially if users have AES-NI instructions available for acceleration. Then use a basic Feistel network for the PRP.

antirez 8 years ago | | |

Thanks! Exactly my thought indeed. Probably now that many people are aware of crypto primitives, permutation boxes and other related tools it is an immediate thought to have, but potentially back then when the game was written it was not so obvious.

0x4a42 8 years ago | | |

>This is a good post, kudos on writing it up so quickly!

Wait... what? It took him 25 years! ;)

wyldfire 8 years ago | |

> A good tool to have in a programmer mental box.

Relates well to Shannon's problem solving process' [1] "Step 2: Fill your 'mental matrix' with solutions to similar problems."

[1] http://www.businessinsider.com/engineer-claude-shannon-probl...

first_amendment 8 years ago | |

Always love seeing applications of Feistel cipher. Used it with AES as the PRF for implementing FPE in legacy systems.

Just want to note that this approach (regardless of PRF) probably wouldn't have worked in 1991. Recomputing the cipher state at every pixel is probably ~10x slower than the single shift + xor in the iterative LFSR approach.

antirez 8 years ago | | |

Hello, yes the approach is slower in my implementation, but I've the feeling that a suitable F (much simpler) and a low number of rounds could do the trick. However the highlight in the original post was that the ports were not able to reproduce the effect. Given that the ports are AFAIK successive and use higher resolutions, I bet that the CPU was not an issue in that case.

EDIT: I just randomly checked that 4 rounds of F = ((r * 31) ^ (r >> 3)) & 0xff provide more or less the same result. Multiplying by 31 is the same as shifting left by 5 bits and subtracting the original number, so it's just 4 rounds of bit shifting and xor.
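A minimal sketch of such a 4-round network over 16 bits, using the F quoted above. The split into two 8-bit halves and the way they are swapped and combined are assumptions made for illustration (the linked post may arrange them differently), and `feistel16` and `feistel16_is_permutation` are hypothetical names.

```c
#include <stdint.h>

/* Round function quoted in the comment above; it does not need to be invertible. */
uint8_t round_f(uint8_t r) {
    return (uint8_t)(((r * 31u) ^ (r >> 3)) & 0xffu);
}

/* 4-round balanced Feistel network on a 16-bit value (two 8-bit halves).
   Assumption: each round maps (l, r) to (r, l ^ F(r)). */
uint16_t feistel16(uint16_t input) {
    uint8_t l = (uint8_t)(input >> 8);
    uint8_t r = (uint8_t)(input & 0xffu);
    for (int i = 0; i < 4; i++) {
        uint8_t next_r = (uint8_t)(l ^ round_f(r));
        l = r;
        r = next_r;
    }
    return (uint16_t)((l << 8) | r);
}

/* Returns 1 if feistel16 maps 0..65535 onto 0..65535 without repeats. */
int feistel16_is_permutation(void) {
    static uint8_t seen[65536];
    for (uint32_t i = 0; i < 65536u; i++) seen[i] = 0;
    for (uint32_t i = 0; i < 65536u; i++) {
        uint16_t out = feistel16((uint16_t)i);
        if (seen[out]) return 0;
        seen[out] = 1;
    }
    return 1;
}
```

Feeding the stage counter 0..65535 through `feistel16` and dropping outputs that fall off the screen would then give one pixel per stage.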

Orangeair 8 years ago | |

You state that a Feistel network has the property that each input value is mapped to a different output value, but how do you ensure that there isn't some cycle whose length is shorter than the size of the set of possible inputs? That is to say, what guarantees that every pixel is reached at least once?

first_amendment 8 years ago | | |

Feistel is a permutation. That means it's a 1:1 mapping from one 16-bit number to another 16-bit number.

You run Feistel for each number 0->65535 (corresponding to the "stage" of the FizzleFade) and out comes the pixel to redden at that stage. Since it's a 16-bit number, some values will fall outside of the 320x200 resolution, and you ignore those.

antirez 8 years ago | | |

Hello, please check the chillingeffect comment replies, it is basically the same question. There are no cycles since it's not a generator where the previous number is the seed for the next. It's a transformation which is invertible and guaranteed to be unique by the Feistel network structure itself.

chillingeffect 8 years ago | |

How do you pick and test the parameters in the F() transformation to guarantee that "Every input 16 bit input will generate a different 16 bit output"? Your comment says they were picked at random... was that after some iterations? How did you know when you had a proper transformation?

Thank you.

dsacco 8 years ago | | |

It might help if you look at a diagram such as this one to understand exactly what the code is doing: https://en.m.wikipedia.org/wiki/Feistel_cipher#/media/File%3...

So you're basically using what's known as the Luby-Rackoff construction on a provable pseudorandom function to create a pseudorandom permutation. A pseudorandom function generates output that appears random but which can repeat, which is why it cannot be invertible, and is thus unsuitable as a block cipher (you need to be able to decrypt a ciphertext to a specific plaintext).

A pseudorandom function is used as the round function in the Feistel network (in the diagram, that's denoted by the F in the middle). You seed the pseudorandom function with a key K. Because the Feistel network successively transforms L and R in each round (L0, R0, L1, R1 and so on), it can be proven that even when the PRF F generates an output that has already been used, the Feistel structure will transform that output differently than the last time it was used.

In other words, the function F is not itself invertible. Invertibility is provided by the surrounding Feistel structure, because if F was already a permutation you wouldn't need anything else. F is only required to generate pseudorandom output, and the Feistel structure's additional logic is what grants invertibility to it. This is the "magic" of the Luby-Rackoff construction, which allows you to take any PRF and transform it into a PRP.

antirez 8 years ago | | |

Hello, you don't have to pick an invertible F(): the way the L and R sides are combined automatically makes the network invertible. This is the magic of Feistel networks, that F() can be as complex as you want and need not be invertible at all.
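A small sketch of that property, assuming a deliberately lossy round function (`lossy_f` is made up for this example and is not the F from the post). The inverse only ever re-evaluates F in the forward direction, so F itself never has to be undone.

```c
#include <stdint.h>

#define ROUNDS 4

/* Hypothetical round function that is deliberately many-to-one:
   it ignores the low 4 bits of r, so it has no inverse. */
uint8_t lossy_f(uint8_t r, int round) {
    return (uint8_t)(((r & 0xf0u) * 7u + (unsigned)round) & 0xffu);
}

/* Forward direction: each round maps (l, r) to (r, l ^ F(r)). */
uint16_t feistel_forward(uint16_t v) {
    uint8_t l = (uint8_t)(v >> 8);
    uint8_t r = (uint8_t)(v & 0xffu);
    for (int i = 0; i < ROUNDS; i++) {
        uint8_t t = (uint8_t)(l ^ lossy_f(r, i));
        l = r;
        r = t;
    }
    return (uint16_t)((l << 8) | r);
}

/* Inverse direction: undo the rounds in reverse order.
   Given (l', r') = (r, l ^ F(r)), recover r = l' and l = r' ^ F(l'). */
uint16_t feistel_inverse(uint16_t v) {
    uint8_t l = (uint8_t)(v >> 8);
    uint8_t r = (uint8_t)(v & 0xffu);
    for (int i = ROUNDS - 1; i >= 0; i--) {
        uint8_t prev_r = l;
        uint8_t prev_l = (uint8_t)(r ^ lossy_f(l, i));
        l = prev_l;
        r = prev_r;
    }
    return (uint16_t)((l << 8) | r);
}

/* Returns 1 if inverse(forward(v)) == v for every 16-bit v. */
int feistel_round_trip_ok(void) {
    for (uint32_t v = 0; v < 65536u; v++) {
        if (feistel_inverse(feistel_forward((uint16_t)v)) != (uint16_t)v) return 0;
    }
    return 1;
}
```

Because every output has exactly one preimage, the forward map has to be a permutation of the 65536 inputs, even though `lossy_f` collides on many inputs.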

jones1618 8 years ago | |

If you'd like to see it in action, I made a CodePen to demonstrate antirez's pixel dissolve over a Castle Wolfenstein 3D scene: https://codepen.io/jones1618/pen/YxRVpo

gfody 8 years ago | |

"just scramble the address" was the solution that immediately came to mind. feistelNet looks a little heavy though, is it really necessary? would you not get the same effect from straight xor on the x and y coords?

Natanael_L 8 years ago | | |

A permutation like that reduces biases in the output

jhoechtl 8 years ago | |

Couldn't that algorithm equally be used to circumvent the problem of a file system slowing down as the available space fills up? I imagine this would be beneficial when storing but detrimental when reading data.

buzzybee 8 years ago |

FizzleFade is also found in Microprose games from the era (e.g. Railroad Tycoon, Civilization), sometimes in full-screen transitions and other times to fade in single sprites. But more relevantly to "id software history", you can find it in Origin's Space Rogue, which John Romero contributed to. A likely possibility is that he picked up the trick on this or a previous project while at Origin.

It's also possible to use a slower "arbitrary PRNG and bump" scheme that tests the VRAM for the desired values (e.g. if it were a sprite, by running the blit at that pixel address and testing) and walks forwards or backwards until an unset value is found. If the walk can be done fast enough, it'll execute at the same framerate as an LFSR cycle-length fade. It can be further supplemented with an estimation heuristic or a low-resolution map to generate unique patterns. It's just less speedy and mathematically interesting to do that.

ticklemyelmo 8 years ago | |

I remember seeing it in a variety of mid-80's Commodore 64 games, where it ran at full speed and looked fantastic. I always wondered how it worked.

zimpenfish 8 years ago | |

I'm 95% sure I've seen this effect on a Spectrum which would likely predate even Space Rogue - I'd guess that would be an LFSR fade because "arbitrary PRNG and bump" would be hella clunky given the Spectrum's screen layout.

hyperion2010 8 years ago |

If you want to know more about cool things you can do with shift registers and you've never heard of Solomon W. Golomb, check out Shift Register Sequences (intro at [0]). Most of our fundamental telecommunications is possible because he solved the mathematics involved.

0. http://jm.planetarydefenses.net/sense/refs/ref14_golomb.pdf

simias 8 years ago |

The original GameBoy had a hardware LFSR that could be used to generate white-noise-like sounds. It was often used for "whooshing" effects and also cymbal sounds, such as in the famous Super Mario Land theme: https://www.youtube.com/watch?v=Gb33Qnbw520

WillKirkby 8 years ago | |

For more technical details on the exact LFSR used, there's this: http://belogic.com/gba/channel4.shtml

khedoros1 8 years ago | |

Ditto with the NES. It actually had two possible tap configurations with different sequence lengths, so that they have slightly different sounds to them.

dmichulke 8 years ago |

A pseudo-RNG that cycles through all elements of a modulo-ring.

Example for a cycle of length 2^32:

X(n+1) = (a * X(n) + c) mod m

a = 134775813

c = 1

m = 2^32
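A sketch of this generator in C. The constants are the ones listed above; the `bits` parameter, which shrinks the modulus to 2^bits so the full cycle can be checked quickly, is an addition for illustration. The full-period claim rests on the Hull-Dobell conditions for a power-of-two modulus (c odd, a-1 divisible by 4), which these constants satisfy.

```c
#include <stdint.h>

/* One LCG step modulo 2^bits (1 <= bits <= 32): X(n+1) = (a * X(n) + c) mod m. */
uint32_t lcg_next(uint32_t x, unsigned bits) {
    uint32_t mask = (bits >= 32u) ? 0xffffffffu : ((1u << bits) - 1u);
    return (134775813u * x + 1u) & mask;   /* unsigned overflow wraps mod 2^32 */
}

/* Counts steps until the sequence started at 0 comes back to 0. */
uint64_t lcg_cycle_length(unsigned bits) {
    uint64_t steps = 0;
    uint32_t x = 0;
    do {
        x = lcg_next(x, bits);
        steps++;
    } while (x != 0);
    return steps;
}
```

With `bits` set to 16 the cycle covers all 65536 values, which is already enough to index a 320x200 screen.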

leni536 8 years ago | |

My approach would be something like this, but with a very "poor" generator with the parameters a=81007, c=0 and m=2^17. This approximates a low discrepancy sequence (additive recurrence with alpha=1/golden ratio). Then I would calculate x and y values using the hilbert curve and the calculated pseudorandom number as the index (more precisely two Hilbert curves next to each other, so it covers a 512x256 rectangle). On today's CPUs it can be calculated quite fast (shameless selfplug: https://github.com/leni536/fast_hilbert_curve). I suspect that the resulting pattern on the screen would be less random looking, but more uniform without any obvious pattern.

schindlabua 8 years ago | | |

Cool! I was just looking up hilbert implementations yesterday, so that's super useful. Thanks!

(quick note: in your source the function is called hilebert instead of hilbert)

barrkel 8 years ago | |

Indeed; and careful selection of parameters for the LCG can truncate the ring to most arbitrary powers of two. And if you're willing to live with slight inefficiency (no more than twice as much work), an arbitrary modulo ring (shuffled sequence) can be produced by creating a slightly larger range and skipping values that are outside the range.

This is a question I asked on SO some years ago relating to this problem - producing a shuffled range of numbers without allocating an array:

https://stackoverflow.com/questions/464476/generating-shuffl...
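The skipping idea can be sketched as follows. This is an illustration, not the code from the linked question: the `shuffled_range` type and its functions are hypothetical names, and the LCG constants are borrowed from the parent comment.

```c
#include <stdint.h>

typedef struct {
    uint32_t state;  /* current LCG state, always <= mask */
    uint32_t mask;   /* (smallest power of two >= n) - 1 */
    uint32_t n;      /* size of the target range 0..n-1 */
} shuffled_range;

/* Prepare a walk over 0..n-1 (n between 1 and 2^31). */
void shuffled_range_init(shuffled_range *s, uint32_t n, uint32_t seed) {
    uint32_t size = 1;
    while (size < n) size <<= 1;
    s->mask = size - 1u;
    s->n = n;
    s->state = seed & s->mask;
}

/* Advance the full-period LCG and skip values that fall outside the range.
   Since size < 2n, fewer than half of the steps are wasted. */
uint32_t shuffled_range_next(shuffled_range *s) {
    do {
        s->state = (134775813u * s->state + 1u) & s->mask;
    } while (s->state >= s->n);
    return s->state;
}

/* Returns 1 if n consecutive outputs are all distinct (n <= 65536). */
int shuffled_range_visits_all(uint32_t n, uint32_t seed) {
    static uint8_t seen[65536];
    shuffled_range s;
    if (n == 0 || n > 65536u) return 0;
    for (uint32_t i = 0; i < n; i++) seen[i] = 0;
    shuffled_range_init(&s, n, seed);
    for (uint32_t i = 0; i < n; i++) {
        uint32_t v = shuffled_range_next(&s);
        if (seen[v]) return 0;
        seen[v] = 1;
    }
    return 1;
}
```

No array of size n is needed during normal use; the `seen` buffer exists only for the self-check.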

xigency 8 years ago | |

I thought of this too, but it might be a bit more taxing on an old CPU to do excess multiplication than a simple XOR and test.

wott 8 years ago |

> asm mov ax ,[ WORD PTR rndval ]

> asm mov dx ,[ WORD PTR rndval +2]

> asm mov bx , ax

> asm dec bl

> asm mov [ BYTE PTR y ], bl // low 8 bits - 1 = y

> asm mov bx , ax

> asm mov cx , dx

> asm mov [ BYTE PTR x ], ah // next 9 bits = x

> asm mov [ BYTE PTR x +1] , dl

I don't understand the need for the second asm mov bx , ax : BX is not used afterwards. Same for CX, it is never used.

> uint32_t rndval = 1;

> uint16_t x,y;

> do

> {

> y = rndval & 0x00F; // Y = low 8 bits

> x = rndval & 0x1F0; // X = High 9 bits

Er... no, if you do that, you only get the lowest 4 bits in y, and then you only get 5 bits in x (and not the right ones, of course).

It should be:

       y = rndval & 0x000000FF;  // Y = low 8 bits

And then you have a 'problem' for x, because you must shift it right, otherwise it doesn't fit in a 16-bit variable:

       x = rndval & 0x0001FF00;  // X = bits 8 to... 16 > 15, irk

So you should just do :

       x = rndval >> 8;  // X = bits 8 to 16, in their right place

nik_0_0 8 years ago | |

Looks like the author fixed the C translation following your comments:

  y =  rndval & 0x000FF;  /* Y = low 8 bits */
  x = (rndval & 0x1FF00) >> 8;  /* X = High 9 bits */

However, given that the assembly is verbatim from id-software's git, I guess those extra instructions are part of history now.

DecoPerson 8 years ago |

I used LFSR to render the red grading in this little experiment: https://youtu.be/fUpUrpHLUxo .

I have a feeling the Octane rendering engine uses it too.

It's a good fit for most cases where you want random sampling of a set without replacement.

leni536 8 years ago |

> Since 320x200=64000, it could have been implemented with a 16 bits Maximum-length register.

But then you have to calculate modulus for 200 or 320.

tripzilch 8 years ago | |

except the pixels are in a linear framebuffer, the fizzlePixel(x, y) function has a multiplication by 320 in it to calculate the address in the buffer.

so you could skip both the modulo and the mul if you go straight for obtaining a random shuffled framebuffer index.

unless maybe fizzlePixel does additional bookkeeping for some purpose.
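A sketch of that variant, assuming a plain linear 8-bit framebuffer of 64000 bytes and a 16-bit Galois LFSR. The tap mask 0xB400 is a commonly cited maximal-length mask and is not taken from the game; `fizzle_linear` is a hypothetical name.

```c
#include <stdint.h>

#define FB_SIZE 64000u   /* 320 * 200 bytes */

/* One step of a 16-bit Galois LFSR (assumed mask 0xB400, period 65535). */
uint16_t lfsr16_next(uint16_t s) {
    uint16_t lsb = (uint16_t)(s & 1u);
    s >>= 1;
    if (lsb) s ^= 0xB400u;
    return s;
}

/* Writes color to every framebuffer byte in shuffled order.
   No multiply and no modulo: the LFSR value is used as the index itself.
   Returns the number of bytes written. */
uint32_t fizzle_linear(uint8_t *fb, uint8_t color) {
    uint32_t written = 0;
    uint16_t s = 1;
    do {
        uint32_t index = (uint32_t)s - 1u;   /* map 1..65535 to 0..65534 */
        if (index < FB_SIZE) {               /* skip the 1535 unused values */
            fb[index] = color;
            written++;
        }
        s = lfsr16_next(s);
    } while (s != 1);
    return written;
}

/* Returns 1 if every byte of a zeroed buffer ends up set exactly once. */
int fizzle_linear_covers_all(void) {
    static uint8_t fb[FB_SIZE];
    for (uint32_t i = 0; i < FB_SIZE; i++) fb[i] = 0;
    if (fizzle_linear(fb, 4) != FB_SIZE) return 0;
    for (uint32_t i = 0; i < FB_SIZE; i++) {
        if (fb[i] != 4) return 0;
    }
    return 1;
}
```

The subtraction by one plays the same role as `dec bl` in the original: it lets index 0 be reached even though the LFSR never produces 0.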

leni536 8 years ago | | |

> has a multiplication by 320

Huh the original implementation seems to have a lookup table instead of that, interesting.

https://github.com/id-Software/wolf3d/blob/05167784ef009d0d0...

steventhedev 8 years ago |

It never ceases to amaze me how many of the older games were implemented as circuits first, and then translated to code. Makes you really appreciate how far we've come, and what sort of background that generation of developers had.

dre85 8 years ago |

I didn't quite understand why this is guaranteed to reach every pixel coordinate? Is there something inherent about LFSR that generates complete sequences within the cycle? So elements are never repeated or omitted?

fpgaminer 8 years ago | |

Yeah this wasn't covered well in the article.

Go back to the article and look at the section just below the first mention of Maximum-Length LFSR.

Take a look at that list of numbers and notice something; every number from 1-15 is output once and only once.

That's a property of Maximum-Length LFSRs; they output each number in their range once and only once.

So, for example, a 17-bit Maximum-Length LFSR will output every number from 1-131071, just in random order.

The Wolfenstein code separates the output of the LFSR into X and Y coordinates. Since the LFSR will visit every possible number exactly once, it will visit every possible combination of X and Y coordinates exactly once.

You can look at the 4-bit Maximum-Length LFSR again. Split each number it outputs into 2-bit X and Y:

You'll see that it hits every point on a 4x4 screen exactly once, in random order.
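The split can be reproduced with a short sketch. It assumes a 4-bit Galois LFSR with tap mask 0xC, which may not be the exact register shown in the article; `lfsr4_cells_visited` is a hypothetical helper.

```c
#include <stdint.h>

/* One step of a 4-bit Galois LFSR (assumed mask 0xC, period 15). */
uint8_t lfsr4_next(uint8_t s) {
    uint8_t lsb = (uint8_t)(s & 1u);
    s >>= 1;
    if (lsb) s ^= 0xCu;
    return s;
}

/* Walks one full cycle, splitting each state into 2-bit X and Y,
   and returns how many distinct cells of the 4x4 grid were visited. */
int lfsr4_cells_visited(void) {
    int visited[4][4] = {{0}};
    int distinct = 0;
    uint8_t s = 1;
    do {
        unsigned x = s & 3u;          /* low 2 bits */
        unsigned y = (s >> 2) & 3u;   /* high 2 bits */
        if (!visited[y][x]) {
            visited[y][x] = 1;
            distinct++;
        }
        s = lfsr4_next(s);
    } while (s != 1);
    return distinct;
}
```

Starting from 1, this register steps through 1, 12, 6, 3, 13, 10, 5, 14, 7, 15, 11, 9, 8, 4, 2 and then returns to 1, so 15 of the 16 cells are visited.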

The caveat is that it doesn't seem to hit 0,0. This is because an LFSR can't go to 0, otherwise it gets stuck there. However, I believe the ASM code was incorrectly translated by the article author. For example the author seemed to forget to translate the "dec bl" instruction into the C equivalent, which would subtract 1 from the y coordinate and allow visiting 0,0.
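The effect of `dec bl` can be checked with a sketch like the one below. It assumes the 17-bit Galois tap mask 0x12000 (recalled from the article, so treat it as an assumption) and the coordinate split from the quoted assembly: low 8 bits for y, the next 9 bits for x. `fizzle_coverage` is a hypothetical helper.

```c
#include <stdint.h>

#define WIDTH  320u
#define HEIGHT 200u

/* One step of the 17-bit Galois LFSR (assumed tap mask 0x12000). */
uint32_t lfsr17_next(uint32_t s) {
    uint32_t lsb = s & 1u;
    s >>= 1;
    if (lsb) s ^= 0x12000u;
    return s;
}

/* Runs one full LFSR cycle and returns the number of distinct on-screen
   pixels visited. With apply_dec != 0, y is decremented as a byte, like
   `dec bl`. If steps is non-null it receives the cycle length. */
uint32_t fizzle_coverage(int apply_dec, uint32_t *steps) {
    static uint8_t hit[HEIGHT][WIDTH];
    uint32_t s = 1, count = 0, n = 0;
    for (uint32_t r = 0; r < HEIGHT; r++)
        for (uint32_t c = 0; c < WIDTH; c++) hit[r][c] = 0;
    do {
        uint8_t y = (uint8_t)(s & 0xffu);             /* low 8 bits */
        uint16_t x = (uint16_t)((s >> 8) & 0x1ffu);   /* next 9 bits */
        if (apply_dec) y = (uint8_t)(y - 1u);         /* 0 wraps to 255 */
        if (x < WIDTH && y < HEIGHT && !hit[y][x]) {
            hit[y][x] = 1;
            count++;
        }
        s = lfsr17_next(s);
        n++;
    } while (s != 1);
    if (steps) *steps = n;
    return count;
}
```

Without the decrement, pixel (0,0) would need the all-zero state and is never drawn; with it, every pixel (x, y) corresponds to the nonzero state (x << 8) | (y + 1).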

dre85 8 years ago | | |

Thanks for that explanation! I just wanted to confirm that it's an inherent property of the LFSR.

andars 8 years ago | |

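The cycle length is 2^17-1 = 131071, and a maximal-length LFSR visits every nonzero 17-bit value once per cycle. The number of pixels is 320*200 = 64000, and each pixel corresponds to one of those values. Therefore each pixel is hit at least once.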

The authors likely used 17 bits instead of 16 because then the x and y coordinates can be obtained via masking rather than modulo (8 bits -> 256 values < 320 pixels).

jstapels 8 years ago | |

I was wondering the same thing myself. I'm guessing they tested it and found that it hit every coordinate at least once and said "good enough". Meaning, I don't think it's guaranteed, they just picked one that did.

The interesting thing is that you're guaranteed to always have a different number because otherwise the cycle would restart. So you just need to find a sequence that's long enough.

pasta 8 years ago |

I have the feeling that knowledge about bits is lacking among a lot of younger coders. And I also think this is what causes bloatware.

CPUs are powerful enough to use a naive fade transition. But coders who are aware of the internal workings can make it even faster on today's hardware.

Great article and imho still relevant on today's much more powerful computers.

ishi 8 years ago |

Cool, I knew that LFSRs were used in ciphers. I was not aware that they were also useful for implementing old-school graphical effects.

https://en.wikipedia.org/wiki/Linear-feedback_shift_register...

ktta 8 years ago | |

CSS[1] was the de facto 'DRM' back in the day, which used two LFSRs for 40-bit encryption. Apart from being able to brute force it today, there are many other ways to break it[2].

Now the DRM on DVDs/BluRays is AACS[3], which uses AES. You might also recognise it from the 'copyrighted numbers' fiasco[4]

[1]: https://en.wikipedia.org/wiki/Content_Scramble_System

[2]: https://en.wikipedia.org/wiki/Content_Scramble_System#Crypta...

[3]: https://en.wikipedia.org/wiki/Advanced_Access_Content_System

[4]: https://en.wikipedia.org/wiki/AACS_encryption_key_controvers...

jimktrains2 8 years ago | |

Similar concepts are used for RAID 6 parity (LFSR + Galois fields). A PDF I found describing RAID 6 parity covers the field and LFSR use and some background. https://www.kernel.org/pub/linux/kernel/people/hpa/raid6.pdf

(I took a stab at implementing them a while back as an experiment. https://github.com/jimktrains/r6parity)

zimpenfish 8 years ago | |

I'm not sure I'd call Wolfenstein 3D "old-school".

But then I did start with computers in 1980.

smcl 8 years ago | | |

It's funny where we draw our lines. Some younger guys will call Half-Life 2 old-skool, which for me felt like it was just a couple of years ago.

kbart 8 years ago | | |

Most recent CS graduates weren't even born when this game was released, so it's not even "old-school" -- it's "ancient" for them.

emmanueloga_ 8 years ago |

As interesting as this article is, I would also love to know where and how the author of the code learned about LFSRs!

I wonder if they rediscovered the algorithm, knew they were implementing an LFSR or even just solved a particular instance of the problem without ever realizing they were writing an LFSR.

I learned about LFSRs a while ago and wrote a small implementation for Ruby as an exercise [1] using a Wikipedia page as reference [2]. But Wolfenstein 3D was released in 1992, I'm sure back then information was a lot harder to find online!

1: https://github.com/EmmanuelOga/lfsr

2: https://en.wikipedia.org/wiki/Linear-feedback_shift_register

thomasahle 8 years ago |

Fundamentally you want to make a random permutation, but not spend linear memory on it, as you would if you did it by shuffling.

nemo1618 8 years ago | |

I did something like this in Go: https://godoc.org/github.com/lukechampine/randmap/perm

jstimpfle 8 years ago |

What properties are required in an LFSR that it covers the whole range (2^n-1 numbers) before returning? Or are such configurations found experimentally?

pfedak 8 years ago | |

Galois generators of the sort in the article can be described naturally using the finite field Z_2[x]/p(x), that is, polynomials with coefficients taken mod two, where we consider the polynomial p(x) to be equivalent to zero. If p has degree n, this field has 2^n elements - the 2^n polynomials with degree less than n are distinct, but x^n is equivalent to x^n-p(x), which has degree less than n, so everything with higher degree is already accounted for. This is, however, only actually a field if p is irreducible.

Now we can describe the generator as multiplication by x, where the leftmost bit is the lowest power, since every bit moves right except for the rightmost, which corresponds to turning x^n into x^n-p(x) as above. The cycle length is the smallest k such that x^k is equivalent to 1. Now, an interesting property of finite fields is that there is always a "primitive" element, the powers of which generate every other nonzero element (you can prove this inductively by counting elements z such that z^d = 1 for different d, and noting that z^d-1, as a degree d polynomial, has at most d roots in the field). If x is primitive, it will cycle through all other nonzero values before reaching 1. In the specific case of n=17, however, every element of the field other than 0 and 1 is primitive (this is easy group theory: the nonzero elements form a group under multiplication, and that group has a prime number of elements, 2^17-1 = 131071).

This means any degree-17 polynomial which is irreducible in Z_2[x] will give rise to a full-cycle-length generator. Luckily, finding an irreducible polynomial isn't too hard - of the 2^16 options (ignoring the ones without a constant term, which are obviously divisible by x), 7710 are irreducible by my quick Mathematica computation.
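The count can be cross-checked without Mathematica. The sketch below is not the commenter's computation: it assumes Rabin's irreducibility test, which for the prime degree 17 reduces to "p has no root at 0 or 1, and x^(2^17) is congruent to x mod p". Polynomials are stored as bit masks (bit i is the coefficient of x^i), and all function names are hypothetical.

```c
#include <stdint.h>

#define DEGREE 17

/* Multiply a (degree < 17) by x in GF(2)[x], reducing modulo p (degree 17). */
static uint32_t mul_by_x(uint32_t a, uint32_t p) {
    a <<= 1;
    if (a & (1u << DEGREE)) a ^= p;
    return a;
}

/* Product a*b in GF(2)[x] modulo p, by shift-and-add. */
static uint32_t mul_mod(uint32_t a, uint32_t b, uint32_t p) {
    uint32_t result = 0;
    while (b) {
        if (b & 1u) result ^= a;
        a = mul_by_x(a, p);
        b >>= 1;
    }
    return result;
}

/* Rabin's test for a degree-17 polynomial p over GF(2). */
int is_irreducible17(uint32_t p) {
    uint32_t terms = 0;
    uint32_t t = 2u;   /* the polynomial x */
    for (uint32_t q = p; q; q &= q - 1u) terms++;
    /* p(0) = 0 or p(1) = 0 means x or x+1 divides p. */
    if ((p & 1u) == 0 || (terms & 1u) == 0) return 0;
    /* Square 17 times: t becomes x^(2^17) mod p, which must equal x. */
    for (int i = 0; i < DEGREE; i++) t = mul_mod(t, t, p);
    return t == 2u;
}

/* Counts irreducible polynomials x^17 + ... + 1 over the 2^16 candidates. */
int count_irreducible17(void) {
    int count = 0;
    for (uint32_t mid = 0; mid < (1u << 16); mid++) {
        uint32_t p = (1u << DEGREE) | (mid << 1) | 1u;
        count += is_irreducible17(p);
    }
    return count;
}
```

The result agrees with the closed form (2^17 - 2) / 17 = 7710 for the number of irreducible polynomials of prime degree 17 over Z_2.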

jimktrains2 8 years ago | |

A PDF I found describing RAID 6 parity describes it. https://www.kernel.org/pub/linux/kernel/people/hpa/raid6.pdf That PDF + the sibling comment should help in understanding.

(I took a stab at implementing it in C a few years back. https://github.com/jimktrains/r6parity)

vigna 8 years ago | |

There's a complete theory based on primitive polynomials over Z/2Z. Check any article on linear PRNGs for references.

"Experimentally" will work only with ridiculously small state spaces.

Jyaif 8 years ago |

Note that you can (quite obviously) use this to fade from one image to another by writing the color of the next image instead of just "red".

transitive_bs 8 years ago |

For a Javascript implementation, check out https://github.com/fisch0920/dissolve-generator

This version is based on the article "A Digital Dissolve Effect" by Mike Morton, in "Graphics Gems", Academic Press, 1990 (http://dl.acm.org/citation.cfm?id=90821)

D_Guidi 8 years ago |

as a "senior" business programmer with non-engineering studies (I have a degree in biology), I feel like an impostor reading this and admitting that I'm unable to understand basically anything... even https://bigmachine.io/products/the-imposters-handbook/ did not help much

cousin_it 8 years ago | |

It's very simple. Basically you need to cycle through all n-bit integers (except zero) in a random-looking way, and it turns out there's an arcane-ish bit twiddling operation that will do it when applied repeatedly. The reason it works can be explained with some undergrad math. You don't need to know these things for business programming but they are fun to read about.

Another idea in a similar vein is https://en.wikipedia.org/wiki/Floyd%E2%80%93Steinberg_dither... It converts an image with many shades of gray to an image with only black and white pixels. The algorithm is dead simple, but the results are surprisingly convincing.

kbart 8 years ago | |

Don't worry, it's rare to see manual bit-level optimization these days as compilers are quite smart at optimization. It's kind of a lost art. I work with embedded systems and even there bit manipulation is mostly used for controlling MCU registers, not writing optimized code. If you still want to understand such manipulations, it's mostly boolean algebra, on which there's plenty of literature to choose from.

sp332 8 years ago | |

The article skipped most of the description of what LFSRs actually do, and the linked Wikipedia article is surprisingly unhelpful. After you read the value of the registers, they all get shifted one bit to the right. The last value simply falls off and is discarded. To generate the value for the new leftmost bit, which is now "empty", you XOR some of the other bits together. Then you read the new value and start over. Eventually the values will repeat, and the goal is to find a configuration that will give you the longest cycle before repeating.
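A sketch of that description for a 4-bit register. The choice of which bits to XOR (bits 0 and 1 here) is an assumption picked because it gives the longest possible cycle for 4 bits; note that the new bit is computed from the register contents before the shift. The game's own code uses the Galois form instead, which applies the XOR differently.

```c
#include <stdint.h>

/* One step of a 4-bit Fibonacci LFSR: compute the new leftmost bit as the
   XOR of bits 0 and 1, shift right (bit 0 falls off), then insert it. */
uint8_t fib_lfsr4_next(uint8_t s) {
    uint8_t new_bit = (uint8_t)((s ^ (s >> 1)) & 1u);
    return (uint8_t)((s >> 1) | (new_bit << 3));
}

/* Number of steps before the register returns to its starting value. */
unsigned fib_lfsr4_cycle_length(uint8_t seed) {
    unsigned steps = 0;
    uint8_t s = seed;
    do {
        s = fib_lfsr4_next(s);
        steps++;
    } while (s != seed);
    return steps;
}
```

Starting from 1 the register goes 1, 8, 4, 2, 9, 12, 6, 11, 5, 10, 13, 14, 15, 7, 3 and back to 1: all 15 nonzero values, which is the maximum for 4 bits because the all-zero state can only lead to itself.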

dopeboy 8 years ago |

Fascinating. Who wrote the original ASM? Carmack?

userbinator 8 years ago |

> My assumption is that LFSR literature was hard to come across in 1991/1992 and finding the correct tap for a 16 bit maximum length register was not worth the effort.

I guess it might be more due to the lack of overlap between the problem domains --- LFSRs have been known since the late 60s in relation to CRCs and other error-correcting codes. https://en.wikipedia.org/wiki/Gold_code

abecedarius 8 years ago |

Still a good trick today. https://github.com/silentbicycle/greatest just added a feature of shuffling test executions into random order, after I suggested the general idea in a Twitter convo. (I think it uses LCGs instead of an LFSR.)

Kiro 8 years ago |

Stupid question: Why can't you just take an array of all the pixels, randomize it and then iterate it to draw?

gall_anonim 8 years ago | |

Because with ideal packing it would take about 125 kB (320·200·log2(320·200)/8). Wolf3d needed 640 kB of RAM, and using about a fifth of it just to display the death animation wouldn't be a good idea.

bombela 8 years ago | |

because this would take a huge amount of memory, something like x*y*(sizeof x + sizeof y). not only was memory not cheaply available at the time, it would also be slow to generate, write to memory, then fetch back to draw.

bkanber 8 years ago | |

This was written in assembly and to do that you'd have to make a new register of all the references while randomizing. Your approach works great in modern, high level languages where there's no issue copying an array in memory and calling array_shuffle or something, and doubling the memory requirement for the effect is no problem. This approach however seems to only require 17 bits of additional memory.

et1337 8 years ago | |

1) That's a lot of memory

2) Accessing memory is the slowest thing you can possibly do on a computer, other than IO

drudru11 8 years ago |

The Atari 2600 video hardware used these all over the place instead of traditional counters. It saved them a lot of gates that counters would require.

trollopTheJope 8 years ago |

i am interested to know the particulars of any routines people have for reading and reviewing a codebase, as the author talks about doing in his spare time. do you take notes? add comments? step through with a debugger?

kbart 8 years ago | |

Given you are a seasoned programmer, most code written in a familiar language should be obvious just by skimming it. But when it comes to short and "smart" algorithms, especially ones involving bit manipulation, I still find pen & paper the best tool to find out what's really happening.

thejynxed 8 years ago | | |

Pencil/pen, paper, flow charts, and in some cases one of those TI scientific calculators can come in real handy.

This being said, I've never strayed far from C/C++/Pascal/Assembly/COBOL/FORTRAN and various forms of BASIC, where what I mentioned helps immensely (I work on legacy systems in my spare time, CS isn't my primary field), although Python, Haskell, Rust and Go have piqued my curiosity.