IEEE 754 16-bit Floating Point Format(github.com) |
The arithmetic itself is still done in 32-bit precision; the conversions happen only when loading from or storing to memory. Some GPUs do have native 16-bit arithmetic, but most use 32-bit ALUs and just convert on load/store.
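To illustrate the convert-on-load/store pattern, here is a rough sketch using Python's struct module, whose 'e' format code is IEEE 754 binary16 (Python 3.6+). The values are made up, and Python floats are 64-bit, so they only stand in for the 32-bit ALU here:

```python
from struct import pack, unpack

# Storage: three values packed as 16-bit halves (2 bytes each).
stored = pack('<3e', 0.5, 1.5, 2.25)

# Load: widen to full-size floats and do the arithmetic there.
# (Python floats are 64-bit; they stand in for a 32-bit ALU.)
a, b, c = unpack('<3e', stored)
total = a * b + c

# Store: narrow back to half precision only when writing the result out.
result = pack('<e', total)
```

The halves only exist in the packed bytes; everything between the unpack and the final pack runs at the wider precision.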
Alternatively, if you don't care about NaNs, infinities, denormals, or correct rounding (the 3D model data use cases mentioned in the README don't really involve NaNs or denormals), some simple bit shifting can do the conversion.
Here's a snippet of Python code that I've used:
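    from struct import pack, unpack

    def float2half(float_val):
        # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
        f = unpack('<I', pack('<f', float_val))[0]
        if f == 0: return 0                  # +0.0
        if f == 0x80000000: return 0x8000    # -0.0
        # sign | exponent rebiased from 127 to 15 | top 10 mantissa bits (truncated)
        return ((f>>16)&0x8000) | ((((f&0x7f800000)-0x38000000)>>13)&0x7c00) | ((f>>13)&0x03ff)

    def half2float(h):
        if h == 0: return 0.0                # +0.0
        if h == 0x8000: return -0.0          # -0.0
        # sign | exponent rebiased from 15 to 127 | mantissa moved to the top 10 bits
        f = ((h&0x8000)<<16) | (((h&0x7c00)+0x1C000)<<13) | ((h&0x03FF)<<13)
        return unpack('<f', pack('<I', f))[0]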
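A quick way to see where the shortcut holds and where it breaks is to compare it against struct's 'e' format (IEEE 754 binary16, Python 3.6+). This is just a sanity-check sketch: half_bits_reference is a helper name I made up, the test values are arbitrary, and float2half is repeated so the block runs on its own.

```python
from struct import pack, unpack

def float2half(float_val):
    # Same bit-shift conversion as above, repeated so this block is standalone.
    f = unpack('<I', pack('<f', float_val))[0]
    if f == 0: return 0
    if f == 0x80000000: return 0x8000
    return ((f >> 16) & 0x8000) | ((((f & 0x7f800000) - 0x38000000) >> 13) & 0x7c00) | ((f >> 13) & 0x03ff)

def half_bits_reference(x):
    # Hypothetical helper: let struct do a proper binary16 conversion
    # and return the raw 16-bit pattern.
    return unpack('<H', pack('<e', x))[0]

# Values that are exactly representable as normal halves agree bit for bit.
for x in (1.0, -2.5, 0.5, 1024.0, 65504.0):
    assert float2half(x) == half_bits_reference(x)

# Inexact values can differ by one unit in the last place: the shift
# truncates the mantissa, while struct rounds to nearest.
assert float2half(0.3) == 0x34cc
assert half_bits_reference(0.3) == 0x34cd

# Outside the normal half range the shortcut produces garbage:
# 1e-6 needs a denormal, and infinity needs the all-ones exponent.
assert float2half(1e-6) != half_bits_reference(1e-6)
assert float2half(float('inf')) != half_bits_reference(float('inf'))
```

So the bit shifting is fine as long as the inputs are known to sit inside the normal half range (roughly 6.1e-5 to 65504 in magnitude) and an off-by-one in the last mantissa bit doesn't matter.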
[0] https://en.wikipedia.org/wiki/F16C