FizzleFade(fabiensanglard.net) |
FizzleFade(fabiensanglard.net) |
Being that a feistel network is a pseudorandom permutation, this fulfills the need (and in my opinion, even more elegantly than a LFSR). For even better performance you could use AES as the PRF, especially if users have AES-NI instructions available for acceleration. Then use a basic Feistel network for the PRP.
Wait... what? It took him 25 years! ;)
Relates well to Shannon's problem solving process' [1] "Step 2: Fill your 'mental matrix' with solutions to similar problems."
[1] http://www.businessinsider.com/engineer-claude-shannon-probl...
Just want to note that this approach (regardless of PRF) probably wouldn't have worked in 1991. Recomputing the cipher state at every pixel is probably ~10x slower than the single shift + xor in the iterative LFSR approach.
EDIT: I just randomly checked that 4 rounds of F = ((r * 31) ^ (r >> 3)) & 0xff provide more or less the same result. Multiplying for 31 is the same as shifting 5 bits on the left and subtracting the number again, so it's just 4 rounds of bit shifting and xor.
You run Feistel for each number 0->65535 (corresponding to the "stage" of the FizzleFade) and out comes the pixel to redden at that stage. Since it's a 16-bit number, some values will fall outside of 320x240 resolution, and you ignore those.
Thank you.
So you're basically using what's known as the Luby-Rackoff construction on a provable pseudorandom function to create a pseudorandom permutation. A pseudorandom function generates output that appears random but which can repeat, which is why it cannot be invertible, and is thus unsuitable as a block cipher (you need to be able to decrypt a ciphertext to a specific plaintext).
A pseudorandom function is used as the round function in the feistel network (in the diagram, that's denoted by the F in the middle). You seed the pseudorandom function with a key K. Because the Feistel network successively transforms L and R in each round (L0, R0, L1, R1 and so on), it can be proven that even when the PRF F generates an output that has already been used, the Feistel structure will transform that output differently than the last time it was used.
In other words, the function F is not itself invertible. Invertibility is provided by the surrounding Feistel structure, because if F was already a permutation you wouldn't need anything else. F is only required to generate pseudorandom output, and the Feistel structure's additional logic is what grants invertibility to it. This is the "magic" of the Luby-Rackoff construction, which allows you to take any PRF and transform it into a PRP.
It's also possible to use a slower "arbitrary PRNG and bump" scheme that tests the VRAM for the desired values(e.g. if it were a sprite, by running the blit at that pixel address and testing) and walks forwards or backwards until an unset value is found. If the walk can be done fast enough, it'll execute at the same framerate as an LFSR cycle-length fade. It can be further supplemented with an estimation heuristic or a low-resolution map to generate unique patterns. It's just less speedy and mathematically interesting to do that.
0. http://jm.planetarydefenses.net/sense/refs/ref14_golomb.pdf
A pseudo-RNG that cycles through a all elements of a modulo-ring.
Example for a 2^32 bit cycle:
X(n+1) = (a * X(n) + c) mod m
a = 134775813
c = 1
m = 2^32
(quick note: in your source the function is called hilebert instead of hilbert)
This is a question I asked on SO some years ago relating to this problem - producing a shuffled range of numbers without allocating an array:
https://stackoverflow.com/questions/464476/generating-shuffl...
> asm mov dx ,[ WORD PTR rndval +2]
> asm mov bx , ax
> asm dec bl
> asm mov [ BYTE PTR y ], bl // low 8 bits - 1 = y
> asm mov bx , ax
> asm mov cx , dx
> asm mov [ BYTE PTR x ], ah // next 9 bits = x
> asm mov [ BYTE PTR x +1] , dl
I don't understand the need for the second asm mov bx , ax : BX is not used afterwards. Same for CX, it is never used.
> uint32_t rndval = 1;
> uint16_t x,y;
> do
> {
> y = rndval & 0x00F; // Y = low 8 bits
> x = rndval & 0x1F0; // X = High 9 bits
Er... no, if you do that, you only get the lowest 4 bits in y, and then you only get 5 bits in x (and not the right ones, of course).
It should be:
y = rndval & 0x000000FF; // Y = low 8 bits
And then you have a 'problem' for x, because you must shift it right, otherwise it doesn't fit in a 16-bit variable: x = rndval & 0x0001FF00; // X = bits 8 to... 16 > 15, irk
So you should just do : x = rndval >> 8; // X = bits 8 to 17, in their right place y = rndval & 0x000FF; /* Y = low 8 bits */
x = (rndval & 0x1FF00) >> 8; /* X = High 9 bits */
However, given that the assembly is verbatim from id-software's git, I guess those extra instructions are part of history now.I have a feeling the Octane rendering engine uses it too.
It's a good fit for most cases where you want random sampling of a set without replacement.
But then you have to calculate modulus for 200 or 320.
so you could skip both the modulo and the mul if you go straight for obtaining a random shuffled framebuffer index.
unless maybe fizzlePixel does additional bookkeeping for some purpose.
Huh the original implementation seems to have a lookup table instead of that, interesting.
https://github.com/id-Software/wolf3d/blob/05167784ef009d0d0...
Go back to the article and look at the section just below the first mention of Maximum-Length LFSR.
Take a look at that list of numbers and notice something; every number from 1-15 is output once and only once.
That's a property of Maximum-Length LFSRs; they output each number in their range once and only once.
So, for example, a 17-bit Maximum-Length LFSR will output every number from 1-131071, just in random order.
The Wolfenstein code separates the output of the LFSR into X and Y coordinates. Since the LFSR will visit every possible number exactly once, it will visit every possible combination of X and Y coordinates exactly once.
You can look at the 4-bit Maximum-Length LFSR again. Split each number it outputs into 2-bit X and Y:
0001 | 0,1
1000 | 2,0
0100 | 1,0
0010 | 0,2
1001 | 2,1
1100 | 2,0
0110 | 1,2
1011 | 2,3
0101 | 1,1
1010 | 2,2
1101 | 3,1
1110 | 3,2
1111 | 3,3
0111 | 1,3
0011 | 0,3
You'll see that it hits every point on a 4x4 screen exactly once, in random order.The caveat is that it doesn't seem to hit 0,0. This is because an LFSR can't go to 0, otherwise it gets stuck there. However, I believe the ASM code was incorrectly translated by the article author. For example the author seemed to forget to translate the "dec bl" instruction into the C equivalent, which would subtract 1 from the y coordinate and allow visiting 0,0.
The authors likely used 17 bits instead of 16 because then the x and y coordinates can be obtained via masking rather than modulo (8 bits -> 256 values < 320 pixels).
The interesting thing is that you're guaranteed to always have a different number because otherwise the cycle would restart. So you just need to find a sequence that's long enough.
CPUs are powerful enough to use a naive fade transition. But coders who are aware of the internal workings can make it even faster on todays hardware.
Great article and imho still relevant on todays much more powerful computers.
https://en.wikipedia.org/wiki/Linear-feedback_shift_register...
Now the DRM on DVDs/BluRays is AACS[3], which uses AES. You might also recognise it from the 'copyrighted numbers' fiasco[4]
[1]: https://en.wikipedia.org/wiki/Content_Scramble_System
[2]: https://en.wikipedia.org/wiki/Content_Scramble_System#Crypta...
[3]: https://en.wikipedia.org/wiki/Advanced_Access_Content_System
[4]: https://en.wikipedia.org/wiki/AACS_encryption_key_controvers...
(I took at stab at implementing them awhile back as an experiment. https://github.com/jimktrains/r6parity)
But then I did start with computers in 1980.
I wonder if they rediscovered the algorithm, knew they were implementing a LFSR or even just solved a particular instance of the problem without ever realizing they were writing a LFSR.
I learned about LFSRs a while ago and wrote a small implementation for ruby as an exercise [1] using a Wikipedia page as reference [2]. But Wolfenstein 3D was released in 1992, I'm sure back then information was a lot harder to find online!
1: https://github.com/EmmanuelOga/lfsr 2: https://en.wikipedia.org/wiki/Linear-feedback_shift_register
Fundamentally you want to make a random permutation, but not spend linear memory on it, as you would if you did it by shuffling.
Now we can describe the generator as multiplication by x, where the leftmost bit is the lowest power, since every bit moves right except for the rightmost, which corresponds to turning x^n into x^n-p(x) as above. The cycle length is the smallest k such that x^k is equivalent to 1. Now, an interesting property of finite fields is that there is always a "primitive" element, the powers of which generate every other element (you can prove this inductively by counting elements z such that z^d = 1 for different d, and noting that z^d-1, as a degree d polynomial, has at most d roots in the field). If x is primitive, it will cycle through all other values before reaching 1. In the specific case of n=17, however every element of the field is primitive (this is easy group theory, the nonzero elements form a group under multiplication, and that group has a prime number of elements).
This means any degree-17 polynomial which is irreducible in Z_2[x] will give rise to a full-cycle-length generator. Luckily, finding an irreducible polynomial isn't too hard - of the 2^16 options (ignoring the ones without a constant term, which are obviously divisible by x), 7710 are irreducible by my quick Mathematica computation.
(I took a stab at implementing it in C a few years back. https://github.com/jimktrains/r6parity)
"Experimentally" will work only with ridiculously small state spaces.
This version is based off of the article A Digital Dissolve Effect by Mike Morton "Graphics Gems", Academic Press, 1990 (http://dl.acm.org/citation.cfm?id=90821)
Another idea in a similar vein is https://en.wikipedia.org/wiki/Floyd%E2%80%93Steinberg_dither... It converts an image with many shades of gray to an image with only black and white pixels. The algorithm is dead simple, but the results are surprisingly convincing.
I guess it might be more due to the lack of overlap between the problem domains --- LFSRs were known since the late 60s in relation to CRCs and other error-correcting codes. https://en.wikipedia.org/wiki/Gold_code
2) Accessing memory is the slowest thing you can possibly do on a computer, other than IO
This being said, I've never gone much away from C/C++/Pascal/Assembly/COBOL/FORTRAN and various forms of BASIC where what I mentioned helps immensely (I work on legacy systems in my spare time, CS isn't my primary field), although Python, Haskell, Rust and GO have piqued my curiosity.
In fact, if I see someone using binary operators in languages such as Java,JS,Ruby etc... I'll immediately consider it bad code, regardless of context - it's just not the right tool for the level of abstraction in these languages.
The fact is that in these products (frontend/backend web development and apps) the performance profile is dominated by bad algorithms, wrong data structures, slow libraries, missing db indexes etc... which knowledge about bits help absolutely zero with.
Large binaries are also not dominated by code you write but rather by using too broad libraries or simply from huge assets.
The only thing I'd consider knowledge of bits to be of any help to a run of the mill developer these days is the knowledge that floating point numbers don't multiply/divide well - but that kind of knowledge can be imparted without really diving into how bits work.
Clojure is written in java (and clojure). It uses bit operations to implement Software Transactional Memory and persistent collections. Is it a bad code?
You seem to think a programmer should always code on the same level of abstraction. I'd argue in many applications often you find no suitable abstraction, and have to implement it yourself.
Also linear feedback shift registers can be used without any knowledge about bits other than the fact that integers loop over when they reach maximum value (and every programmer should know that anyways). And these registers are useful for some very nice algorithms (like in-place pseudorandom permutations' generators - for example for shuffling songs in a media player, or for some ai algorithms).
I get what you are writing about here, that higher level concepts dominate programming many common apps today, but it seems like a weird place to allow yourself to have a knowledge gap.
Therein lies your bias, and the rest of your comment just follows.
int x = somefunction();
int x_dividedby16 = x >> 4;
My coworker corrected the second line to something like: int x_dividedby16 = (int)Math.ceil(x / 16.0);Then again, more readable is subjective and some devs might see "<<" and get thrown off.
> the performance profile is dominated by bad algorithms, wrong data structures
Do you know how you often implement good algorithms and data structures? You wouldn't implement a bloom filter as a high level array of 1/0 integers, would you?
I once spent an hour "optimizing" a piece of code that ran in 15 minutes, which tested an embedded system. The resulting code ran in 4 seconds. The root cause was a complete misunderstanding of how microprocessor interrupts work.
I spent a week re-writing a 4000 line C-function, which ended up to be around 50 lines of code, doing the same thing. The root cause was a complete misunderstanding of how state machines should be constructed.
These types of rookie mistakes by itself don't worry me. But the "trend" seems to be that young coders graduate with less knowledge about the underlying hardware on which their software will run.
I feel like just like Mathematics and Physics, low level programming / microprocessors / computer architecture classes should be part of many university curriculum. (Oh and I think they should teach about version control systems in college as well, but that's the topic of another story!)
There's a lot of software being written in Excel macros and Wordpress web development. These tools actually get work done - they automate things that were not automated before, they publish things that were not published before, and they generate revenue and improve productivity.
Young coders, and the authors of these programs, should not be forced or expected to go through low-level programming classes to get work done. The tools are good enough and the hardware is fast enough that for 99.99% of all projects, they don't need this expertise.
I say this as someone with an EE degree (from less than a decade ago) consisting of classes that spanned the entire pyramid of complexity: from analysis and simulation of individual transistors, through construction of a CPU on an FPGA, through writing an RTOS for a microcontroller, up to programming a web server. That understanding benefits me greatly in what I'm doing today.
But I work with plenty of people who don't have that background. For all the business logic that seems to be found in any software project, that's just fine. Most of the work is in translating people's desire for the software to "do what I want" into actual requirements and then codifying those requirements into a database or UI. For construction of state machines, they should use a library or have someone more competent construct the framework and let them implement the actual states, and when the work requires an understanding of how microprocessor interrupts, they should defer to someone who understands that technology.
And yes, they should teach some version control in college as well. It would be enough to give students one major project that ends up with the filename laydn-Project-complete-final-actuallyfinal-3-done-8-29.zip, which just worked a minute ago after which you barely changed anything and now it's completely broken, and that thing that worked once stopped working at some unknown time in the past. Then just mention that git and svn are things that exist.
> CPUs are powerful enough to use a naive fade transition. But coders who are aware of the internal workings can make it even faster on todays hardware.
Not really. No one would do the fade transition---or more accurately, the regional alpha blending with a fixed color---in CPU nowadays exactly because GPUs are much more capable in that job, and such a job is actually fairly easy in most abstractions used because it is an integral part of UX and whatever. The ability of leveraging all resources and determining their efficient utilization is a key.
> I have the feeling that knowledge
> about bits is lacking by a lot of
> younger coders. And I also think
> this is what causes bloatware
From a management perspective, I wonder if people find older developers miss obvious solutions that involve throwing small amounts of money and/or hardware at business problems, and instead turn to "clever" solutions that are costly in terms of extra developer time needed for developer and maintenance.Certainly when running technology in an SME I found my gut feelings about many cost/benefit questions were often invalidated by the ever-decreasing cost of computing power, both in terms of physical hardware and cloud resources.
Anecdote: One of my managers once spent €70000¹ worth of hardware to speed up an application because he believed it would be cheaper than to optimize the code. Naturally, no-one had done any kind of performance analysis, so while the upgrade made things about 20% faster, it still wasn't enough. It took me all of 20 minutes to figure out that caching was disabled on the database and it had no indexes whatsoever². 30 minutes of work yielded a few thousand percent speed increase.
¹) This was a long time ago before SSDs existed and buying a hardware RAID of fast disks and fast CPUs / memory was extremely expensive.
²) DBAer consultants of a certain large database vendor are not worth the horrendous hourly fee they ask.
I regularly see the exact opposite: People throwing money and hardware at problems instead of doing the simple and obvious thing. I guess that makes me officially old.
It hurts (both emotionally and financially) to be rejected by younger teams because they assume things about me that aren't true. I now get the dreaded "has too much experience and would be bored" rejection line all too often.
Why do people feel the need to generalise about things developers of different ages do...?
Knowing when to save on developer time by spending money in some form definitely seems to be a learned skill though.
Of course you'll also find a lot of kids who overestimate the maintenance cost of 3 lines of weird code sitting in a corner doing their thing for 20 years ;)
That was a long time ago in tech terms (~15 years). In the intervening gap I spent most of my time in scripted languages (as3, c#, etc.).
Result? I don't know how to wrangle bits anymore. The added speed increase I'd add almost definitely isn't worth it (for example, flash and unity games, web services, etc.). That doesn't mean it isn't worth it for the end product - I rely on Unity, Adobe, Google, etc. to worry about those optimizations so I can stand on their shoulders and benefit.
But in terms of cost/benefit, getting into that stuff myself (again) just didn't make sense for most of my professional needs.
It seems to me like they might be doing the same thing! For all the supposedly smart people they hire, TBH I can't think of a single product that feels fast/performant to me.
Man, was I shocked. Very smart, highly educated, mid-level and even senior software engineers seem to know very little about bits these days. When they'd run into a memory issue, their natural response was to just spin up a few more servers and throw another terabyte of memory at the problem. Makes sense, I guess - until their CFO saw their pretty exponential curve in infrastructure costs.
While the information is nice to know and helps every once in a while (like understanding these articles better), I just don't think it's nearly as necessary to know as it used to be.
I also acknowledge that with the invention of "code bootcamps" a lot more coders have never been to college than before. Effectively, I think, creating two types of programer that has nothing to do with age. Both types have their applications they are good at.
I don't think this is true. I used to code in assembly and knowledge of bits is largely irrelevant the vast majority of the time now outside of graphics algorithms and low-level number crunching. Algorithmic growth and memory usage are always going to be important though.
Now, when FPGAs become more and more popular, people will be forced to learn VHDL languages and then different lesser-known techniques will surface. (It's a naive prediction, I know.)
Isn't that already part of a normal CS degree? We needed to learn VHDL to program a Atmel ATTiny for class.
It's not even necessarily about squeezing cpu or memory performance, but finding a solution that is "native" to how computers work: binary bits.
I optimize for what they pay me for now. I also optimize for what I think they will pay me in future.
The super efficient C solution might be an salary dead-end when management also accepts a huge Hadoop solution :)
But knowledge about inner workings helps to understand what is going on and what can be improved.
So I don't say you should use a non GC language for example. But knowledge about memory management can help to understand memory leaks.
Whether I use array access or a bitmask will mean nothing performance-wise, since the bloom filter itself is likely already speeds up a different algorithm by an order of magnitude.
But that's not to say the premise is bad - I don't think it's wrong to assume the majority of work is in that area, definitely for young developers. There aren't that many real-time and performance-demanding programming jobs compared to web development.
I don't think even game-programming uses bit operations heavily these days- most of the low-level work is done by third party engines.
The complete disregard for good design I have seen troubles me as a software engineer. When I've had to rewrite code, usually its not because I disagree with its design, but because there is no design to disagree with! IMO code design should be taught better (or at least taught at all...). Only then can someone make the judgement call of what parts can be poorly written and which parts require more care.
This next claim is anecdotal: the people which don't bother to learn even the basics of how actual machines work tend to be the same people which don't bother to properly design their code. Good design necessitates some understanding of how the machine works, it can not be accomplished through brute force pattern matching. I understand that code is becoming increasingly common and that some people have no choice but to interact with it. But if a significant portion of your income is generated through work in software, then I don't really think a valid excuse exists; you should do your job correctly. I'm not saying everyone needs to be an expert. The culture of apathy for the most basic computing principles is what troubles me, and it should trouble anyone who has to read code written by others.
> Why do people feel the need to
> generalise about things developers
> of different ages do
Mostly it was a tongue-in-cheek reply to a comment doing the same... > People throwing money and hardware at
> problems instead of doing the simple
> and obvious thing
There are (almost) no technical problems, just business decisions. Programmer time for development and maintenance is expensive. Hardware is decreasingly so (also it's CapEx, so nobody cares if you literally light it on fire).So it can be useful to think a bit to be able to stay on one machine.
But making those trade-offs is exactly what engineering is about, and what's hard. So talking in the very abstract is seldom useful.
> Managers often think throwing more
> hardware at the problem will fix things
It often will... > no-one had done any kind of performance
> analysis
... is the moral of this storyLacking low level knowledge can only create blind spots in the analysis.
The performance modelling teams working for the large CPU makers get several code traces from major database vendors to optimize the execution of critical pieces of database code. They do a cycle-by-cycle analysis.
And then people run the databases with caching and indexing disabled. Go figure
The three lines of code were:
for ( ... ) { x = x << y }
The developer was essential using an array of booleans to simulate a bitwise operator.
It is very sad that devs these days need a comment to know what that does. Would we expect 7 lines of comments if it was:
for ( ... ) { x = x + y; }
Yes but they won't. Pragmatically speaking adding a comment to explain a concept the next developer won't use very often means you won't need to answer their questions about it when they next need to modify that piece of code. That's a big win for you, and it means the other developer can carry on being productive.
I don't know anything about clojure, but it sounds like you're clobbering in the suggestion that because bitwise operations are bad in high level business code, they must also be bad in low level language implemention code. At best you're not paying attention, at worst that's an intellectually dishonest counter argument.
But if any of your Java code for applications or libraries used bit operations, I'll call it bad code.
There's nothing wrong with writing your own library implemented using bit operations.
Maybe quadtree division of space etc. Only touch nodes that had no short period objects, like honeycombs, boxes or blinkers.
You wouldn't even need storage for completely empty nodes.
I'm sure best/fastest GoL engines have even more interesting strategies.
Learning some form of assembly language at university is (probably) reasonably common.
Learning VHDL (for programming FPGAs) is less common - I did for my Computer Science degree, but this was about 15 years ago, so I don't know if it's still common.
That's the p90 use case for development people are doing.
I don't poke at memory, I use a profiler that tells me what every piece of code and variables I use take in memory.
I don't design cache friendly data structures, I use redis.
I would wager that my experience is closer to the development situation and needs of the majority of programmers.
I don't see why knowledge of bit operations are necessary to do a good programming job.
Sure, Conway's Game of Life is a toy problem, but the optimizations used to make it run fast (which go way beyond mere bit-twiddling, btw) are useful tools and practice in a programmer's mental toolbox, for solving real problems.
Certainly, a programmer who "merely" plugs frontends into backends with databases or whatever it is that you consider a "real job" in programming[0], doesn't need to know about bit-level optimizations to do that job. But I would argue that even then, it makes a better programmer, knowing about that sort of thing, broadening your bases, versatility and well-roundedness.
Doesn't mean you have to use it on problems where it's not supposed to, or premature optimization.
Elsewhere in this thread you say you don't need low-level optimizations because Redis handles this for you. But you wouldn't say that the people researching, designing and building Redis weren't doing "real jobs", would you?
I think broadening your bases is actually a pretty good reason for learning about these (or other) things. Because you seem to have a pretty narrow view of what a "real job" is in programming. In particular that you can't seem to come up with any good reasons for knowing about bits, except IEEE floats.
What about graphics programming, usually pixels are packed into a 32-bit RGBA word, you need some basic bit operations for this. Most situations abstracting that away kills performance.
If you need to read out sensors from a device, you're probably going to come across bits (and many other ancient programming lore) somewhere along the line.
If you write code for small / low powered hardware (like Arduinos or equivalent) you are going to need to know your way around bits and bit-manipulation code is very welcome there.
Knowing about these techniques is also important because sometimes they suddenly become useful in a whole new or modern context. Take antirez's Feistel network approach to this fizzling-problem, for instance. That's a cryptographic primitive now used in a graphics context. You can make exactly the same point about it, that a programmer who is supposed to do frontend/backend/db work, should never write code like that, and indeed this is true, you most definitely should NOT be coding your own versions of crypto primitives in a context like that (I'd argue that's even worse than bit-twiddling).
[0] I'm kidding, I respect the work that goes into it. I think it's boring as hell, but it's a lot of work and not all trivial either.
> (which go way beyond mere bit-twiddling, btw)
I was fascinated by optimizations people come up for Game of Life by reading Michael Abrash's Black Book.
> a programmer who "merely" plugs frontends into backends with databases"
I am such a programmer: never have used bit manipulation in my job, and while it is kinda boring, it is a job, and someone has to do it.
YmFzZTY0IGFsd2F5cyB0ZW5kcyB0byBsb29rIGtpbmQgb2YgbGlrZSB0aGlz
Some people will immediately know what to do with that string in the blink of an eye, and the reason why they can spot that with a single glance has very much to do with basic knowledge about bit manipulations.
> I don't see why knowledge of bit operations are necessary to do a good programming job.
The programmer that can't do the above will be stuck. Probably they can still do a good programming job regardless, but the programmer that can is a better programmer.
Knowledge that is related to your practical work but not technically part of it, yet it is part of the field of your work. That is very useful because it makes you better and more well-rounded, meaning you have more mental tools to tackle unforeseen problems. It makes you more well-rounded and better at your job.
If you only know the things you strictly need to know for your job, you're going to get stuck as soon as you encounter a problem that requires novel thinking. And I dunno, I also consider that ability to be part of a "real job".
Then again there are programming disciplines where you have to know all of this and more. It ultimately comes back to those capable of implementing the tools/frameworks/compilers and those that just use them. Safe crypto libraries are impossible without bit wrangling.
I will still argue that having these skills has at times saved order of magnitudes of time for me. Maybe it wasn't worth all the effort I spent if it only saves time once in a while. Then again I haven't been a full time developer in like 11 years, now I break other peoples software for a living and bang bits together on a weekly basis :)
Using redis or memcache or whatever semi-persistent , network available data store for key value pairs only has middling value (to me) and relies on adopting a data model that is severely limited in scope. If you can't understand the structure of a 'packet' on the network then you can't really troubleshoot network dependent services that provide your L7 function.
https://en.wikipedia.org/wiki/CPU_cache
https://www.akkadia.org/drepper/cpumemory.pdf
These are great places to start.
Especially Ulrich Drepper's PDF on memory.
edit: even simple things like looping can require knowledge of "bits" and "memory layout" and "CPU cache" to write the fastest code. Most people don't need to write the fastest code though.