The Secret Life of NaN (2018) (anniecherkaev.com)

52 points by prakashqwerty 67 days ago | 40 comments

NaNs are a very underappreciated feature of IEEE-754 floating point. In the D programming language, floats get default initialized to NaN, not to 0.0.

    double y = 0.0; // initialized to 0.0
    double x; // initialized to NaN

The question routinely comes up: "why not default initialize to 0.0?" The reason is that forgetting to initialize a variable is a routine programming mistake. With a default of 0.0, one may never realize that the floating point results are wrong. With NaN, any computation that uses the variable also yields NaN, which is unlikely to go unnoticed.
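A minimal sketch of that propagation argument, written in Python rather than D. Python has no default initialization of floats, so the forgotten variable is simulated by passing the default in by hand; `scaled_total`, `readings`, and `scale` are hypothetical names used only for illustration.

```python
import math

def scaled_total(readings, scale):
    """Multiply each reading by `scale` and add up the results."""
    return sum(r * scale for r in readings)

readings = [1.5, 2.5, 4.0]

# `scale` stands in for a variable the programmer forgot to set.
zero_result = scaled_total(readings, 0.0)           # plausible-looking 0.0
nan_result = scaled_total(readings, float("nan"))   # NaN taints the whole sum

print(zero_result)              # 0.0
print(math.isnan(nan_result))   # True
```

The zero default produces a number that could pass for a real answer, while the NaN default makes every downstream value NaN.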

I don't know of any other programming language with this safety feature.

Also, the D `char` type is initialized to 0xFF, not 0, because Unicode says that 0xFF is an invalid character.

p1necone 66 days ago

Just requiring explicit assignment before first use feels like the superior approach to automatic initialization, regardless of whether the automatic initialization is with 0 or with NaN.

WalterBright 66 days ago

That suggestion is often made.

The trouble with it is a bug I've seen often. People will get an error message about an "uninitialized variable". Then they go into "just get the compiler to shut up" mode, and pick "0" as the initializer. Then, the program compiles and runs, and silently produces the wrong answer. Code reviews will simply pass over the "0" initializer, as it looks right.

With default NaN initialization, the programmer is more likely to stop and think about it, not just insert 0.

Another issue with it is:

    float x = 0.0;
    setFloat(&x);

    void setFloat(float* px) { *px = 3.0; }

For the purposes of code clarity I don't want to see a variable initialized to a value that is never used, just to shut the compiler up.

billforsternz 66 days ago

How long did you think about this before making this declaration? How long did Walter Bright think about this before making his decision when designing his language? Not saying you're wrong, just something to think about perhaps.

lmm 66 days ago

Yep. This is NaN as a billion dollar mistake all over again.

WalterBright 66 days ago

Another crucial use of NaNs is if you have a sensor. If the sensor has failed, the sensed value should be transmitted as NaN, not 0, so the receiver knows the data is bad.
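As a rough illustration in Python (not D), a receiver can treat NaN as "no data" and leave it out of an average instead of averaging in a bogus 0. The driver function `read_sensor` and its convention of passing `None` for a sensor that did not respond are assumptions made up for this sketch.

```python
import math

def read_sensor(raw):
    """Hypothetical driver: `raw` is None when the sensor did not respond."""
    return float("nan") if raw is None else float(raw)

def mean_of_valid(values):
    """Average only the readings that are not NaN; NaN if none are valid."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else float("nan")

samples = [read_sensor(r) for r in (20.0, None, 22.0)]
print(mean_of_valid(samples))   # 21.0
```

Had the failed reading been sent as 0, the same average would have come out as 14.0 with no sign that anything was wrong.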

AlotOfReading 66 days ago

My experience is that if you write an interface that (rarely) returns NaNs, someone will use it assuming it's never NaN no matter how good the docs are. Then their code does bad things and you have to patiently explain why they're wrong and yes, they are holding isnan() wrong (in C/C++).
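The usual way callers get this wrong is by comparing against NaN directly. A short Python sketch of the pitfall (the same IEEE-754 comparison rules apply in C and C++); the 0 to 100 range is an arbitrary example, not taken from any real interface.

```python
import math

nan = float("nan")

naive_check = (nan == nan)        # False: NaN is unequal to everything, itself included
proper_check = math.isnan(nan)    # True

# Ordered comparisons with NaN are False too, so both halves of a
# range check fail and the bad value slips through either branch.
in_range = 0.0 <= nan <= 100.0             # False
out_of_range = nan < 0.0 or nan > 100.0    # False as well
```

Code that only rejects values when `out_of_range` is true will accept NaN, which is why an explicit `isnan` test is needed.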

bumby 65 days ago

Doesn’t this completely depend on the sensor failure mode? Eg if a voltage sensor internally shorts to ground, the failure will read 0V, not NaN. Or are you using “failed sensor” to only mean “not reporting” here?

I think your initialization is smart in many use cases, but the sensor application probably isn’t one of them except for that single failure mode. It can still lead to masked failures and false assumptions (“the sensor is getting a value so it must be working”). That’s the same issue as what you’re supposedly fixing by that design choice. It still requires engineering knowledge to assess correctly.

anitil 66 days ago

That's a very thoughtful decision, I always enjoy your updates on D

wpollock 66 days ago

> ... Unicode says that 0xFF is an invalid character.

Not so. You may be thinking of UTF-8 encoding. 0xff is DEL in Unicode.

LittleLily 66 days ago

DEL is Unicode codepoint U+007F, which is the byte 0x7F in UTF-8, not 0xFF. Perhaps you were thinking of ÿ, which is codepoint U+00FF and encodes to the bytes 0xC3 0xBF in UTF-8.

WalterBright 66 days ago

The "char" type in D represents a UTF-8 code unit. The byte 0xFF never occurs in well-formed UTF-8, so it is not a valid code unit and is strictly forbidden.
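The byte values discussed in this subthread can be checked with Python's built-in UTF-8 codec. This is only a quick verification sketch; `is_valid_utf8` is a hypothetical helper, not part of any library.

```python
del_bytes = "\u007f".encode("utf-8")          # DEL, U+007F
y_diaeresis_bytes = "\u00ff".encode("utf-8")  # ÿ, U+00FF

def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(del_bytes)               # b'\x7f'
print(y_diaeresis_bytes)       # b'\xc3\xbf'
print(is_valid_utf8(b"\xff"))  # False
```

A lone 0xFF byte is rejected by the decoder, which is what makes it usable as an "obviously uninitialized" value for a UTF-8 code unit.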

addaon 65 days ago

What's the cost of this in terms of not being able to bzero() simple data structures, or use OS-cleared pages directly without dirtying them? This seems like it would turn some sparse memory usage patterns dense…
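The premise behind the question can be seen at the bit level: all-zero memory reads back as 0.0, while any NaN needs nonzero bits, so a NaN default cannot come from zeroed pages. A small Python sketch using `struct` to reinterpret the bytes:

```python
import struct

zeroed = bytes(8)                            # what bzero() or a fresh OS page holds
(value,) = struct.unpack("<d", zeroed)       # reinterpret the 8 bytes as a double

nan_bytes = struct.pack("<d", float("nan"))  # byte pattern of a NaN double

print(value)                # 0.0
print(nan_bytes != zeroed)  # True
```

Filling a buffer with NaN therefore means writing to every element, which is the cost being asked about.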

WalterBright 65 days ago

You can always statically initialize them with 0:

    static float[10] array = 0.0;

GMoromisato 66 days ago

I use NaN boxing in GridWhale. It feels like the Infinite Hotel[1]: you can always add another type. Note that these techniques also rely on the fact that we don't use all 64 bits for memory addressing. If we ever do, lots of VMs will break.

For me, the major advantage of NaN boxing is that you don't have to allocate a whole class of types (like floats). That saves so much at garbage collection time.

------------

[1] https://en.wikipedia.org/wiki/Hilbert%27s_paradox_of_the_Gra...
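For readers who have not seen NaN boxing, here is a minimal Python sketch of one possible layout. It is not GridWhale's scheme: the three tag bits, the 48-bit payload, and every name below are assumptions chosen for illustration. The 48-bit payload is where the reliance on pointers narrower than 64 bits comes in.

```python
import math
import struct

QNAN = 0x7FF8_0000_0000_0000    # exponent all ones plus the quiet bit
TAG_SHIFT = 48                  # three tag bits sit at bits 48..50
TAG_MASK = 0x7
PAYLOAD_MASK = (1 << 48) - 1    # low 48 bits hold a pointer or small int

def bits_to_float(bits):
    """Reinterpret a 64-bit integer as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def box(tag, payload):
    """Pack a nonzero tag and a 48-bit payload into a quiet-NaN bit pattern."""
    if not (1 <= tag <= TAG_MASK and 0 <= payload <= PAYLOAD_MASK):
        raise ValueError("tag must be 1..7 and payload must fit in 48 bits")
    return QNAN | (tag << TAG_SHIFT) | payload

def is_boxed(bits):
    """True for quiet-NaN patterns carrying a nonzero tag."""
    return (bits & QNAN) == QNAN and ((bits >> TAG_SHIFT) & TAG_MASK) != 0

def unbox(bits):
    """Return (tag, payload) from a boxed bit pattern."""
    return (bits >> TAG_SHIFT) & TAG_MASK, bits & PAYLOAD_MASK

boxed = box(3, 0xDEADBEEF)
print(math.isnan(bits_to_float(boxed)))  # True: to the FPU it is just a NaN
print(unbox(boxed) == (3, 0xDEADBEEF))   # True
```

Tag 0 is reserved in this sketch so that an ordinary NaN produced by arithmetic is not mistaken for a boxed value.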

jasperry 66 days ago

This is super useful, thanks. So if I were implementing a programming language, and wanted to have symbols to specify NaN in source code, I'd really only need quiet NaN, right? Because signaling NaN is supposed to always raise an exception anyway?

WalterBright 66 days ago

I originally implemented Signalling and Quiet NaNs in the compiler. It was an abject failure. With all the transformations a compiler does, the point where a signalling NaN turns into a quiet one gets lost. So just quiet NaNs are used.
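For reference, the difference between the two kinds is a single bit. The Python sketch below classifies a 64-bit pattern, assuming the convention recommended by IEEE 754-2008 and used on x86 and ARM, where a set top mantissa bit means quiet; some older MIPS and PA-RISC hardware used the opposite meaning. It works on raw integers so that no floating point operation gets a chance to quiet the value. `classify_nan` is a hypothetical helper name.

```python
EXPONENT_MASK = 0x7FF
MANTISSA_MASK = (1 << 52) - 1
QUIET_BIT = 1 << 51   # most significant mantissa bit

def classify_nan(bits):
    """Classify a 64-bit IEEE-754 double pattern by its NaN kind."""
    exponent = (bits >> 52) & EXPONENT_MASK
    mantissa = bits & MANTISSA_MASK
    if exponent != EXPONENT_MASK or mantissa == 0:
        return "not a NaN"   # finite number or infinity
    return "quiet" if bits & QUIET_BIT else "signalling"

print(classify_nan(0x7FF8000000000000))  # quiet
print(classify_nan(0x7FF0000000000001))  # signalling
print(classify_nan(0x7FF0000000000000))  # not a NaN (this is +infinity)
```

Any operation that sets that one bit turns a signalling NaN into a quiet one, which is how easily the distinction disappears.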