Bias Compounds, Variance Washes Out (convergentthinking.sh)
* bias compounds
* variance diffuses
* configs store parameters
* BF16 + RNE (6 bytes) plateaus
* errors repeat
* six bytes match ten
This sort of thing reads really well and conveys the idea in very few words. It's good writing! But in my experience humans don't generally "let nouns verb" as much as LLMs do. Maybe we're just not as clever with words.
What's wrong with "configs store parameters"? I guess "parameters are stored in configs" could be more correct, but IMO it means exactly the same thing and sounds just as natural. "Six bytes match ten" is shorthand for "the performance of the algorithm that uses six bytes of storage matches the performance of the algorithm that uses ten bytes of storage". But here we have "performance matches", which is an inanimate concept doing something, so is this an LLM smell too?
Yes, everyone says the sun shines and the wind blows, but those are specific idioms. No one says bias compounds or variance diffuses or six bytes beat ten.
I'm not saying they shouldn't! They probably should! It's just that LLMs say it much more than humans do.
> "Six bytes match ten" is shorthand for "the performance of the algorithm that uses six bytes of storage matches the performance of the algorithm that uses ten bytes of storage".
Yes, I understand this and support it. I am emphatically not saying it is bad writing. It's an unbelievably brilliant piece of terse writing that most human writers would not stumble upon in the course of writing the post.
From a technical writing perspective, this is a terrible blog post.
Here is a better blog: https://cloud.google.com/blog/topics/developers-practitioner...
This makes me wonder whether you could apply different dithering approaches to numeric computations. You cannot use error diffusion or similar methods, because you don't have information about neighboring pixels/computations. Using low-discrepancy sequences might work to reduce stochastic noise, but it could also reintroduce bias for some computations.
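To make the dithering question concrete, here is a minimal sketch, not anything from the post itself. It assumes an integer grid as a stand-in for a low-precision format, and the names `accumulate`, `nearest`, `uniform`, and `low_discrepancy` are hypothetical. The golden-ratio sequence is just one possible choice of low-discrepancy sequence. Each step adds 0.3 and rounds back to the grid with `floor(x + offset)`: a constant offset of 0.5 is deterministic rounding, a uniform random offset is stochastic rounding, and the low-discrepancy offset is the dithering variant being wondered about.

```python
import math
import random

# Assumption: an integer grid stands in for a coarse format such as BF16.
GOLDEN = (math.sqrt(5) - 1) / 2  # fractional part of the golden ratio


def accumulate(increment, n_steps, dither):
    """Add `increment` n_steps times, rounding to the integer grid each step.

    `dither(k)` returns an offset in [0, 1) for step k, and the rounding
    rule is floor(x + offset).
    """
    acc = 0.0
    for k in range(n_steps):
        acc = float(math.floor(acc + increment + dither(k)))
    return acc


def nearest(k):
    # Constant offset 0.5: deterministic round-half-up (not exactly RNE,
    # but identical for this increment). The same error repeats every step.
    return 0.5


_rng = random.Random(0)


def uniform(k):
    # Stochastic rounding: unbiased in expectation, noisy per run.
    return _rng.random()


def low_discrepancy(k):
    # Additive golden-ratio sequence: offsets cover [0, 1) very evenly.
    return (k * GOLDEN) % 1.0


if __name__ == "__main__":
    n = 10_000
    print("exact sum      :", 0.3 * n)
    print("nearest        :", accumulate(0.3, n, nearest))
    print("uniform dither :", accumulate(0.3, n, uniform))
    print("low-discrepancy:", accumulate(0.3, n, low_discrepancy))
```

With a constant increment the deterministic version never leaves zero, while both dithered versions track the exact sum, and the low-discrepancy one should do so with less scatter. The caveat in the comment still applies: if the values being rounded are correlated with the offset sequence, the even coverage is lost and bias can come back.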
This was quite interesting though. Surprised to see it work so well on a real example.
But... it's not unusual in the slightest.