Table 1 is the closest thing. It lists specs for six devices: 120-989 TFLOPS and 64-96 GB of RAM.
For comparison, an RTX 5090 is about 105 TFLOPS (FP32).
https://www.techpowerup.com/gpu-specs/geforce-rtx-5090.c4216
- Llama 3.1 with 405B parameters: 2 TB of memory (FP32), 500 GB (FP8)
- DeepSeek R1 with 671B parameters: 1.3 TB (scaling linearly, that is around 600 GB for 300B parameters)
Ling claims no more than 96 GB of memory, most likely for inference. That's far more than a 20% reduction. Am I missing something?
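
To make the arithmetic behind this comparison explicit, here is a rough sketch in Python. It counts weights only (parameters times bytes per parameter) and ignores KV cache, activations and any runtime overhead, so it gives a lower bound: about 1.62 TB and 405 GB for the 405B model, somewhat below the rounded 2 TB and 500 GB quoted above. The function names are hypothetical, and treating the 1.3 TB DeepSeek R1 figure as 16-bit weights is my assumption, not something stated in the source.

```python
# Bytes per parameter for a few common numeric formats.
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "FP8": 1}


def weight_memory_gb(params_billion: float, fmt: str) -> float:
    """Weights-only memory in GB (1 GB = 1e9 bytes).

    params_billion * 1e9 params * bytes / 1e9 simplifies to params_billion * bytes.
    """
    return params_billion * BYTES_PER_PARAM[fmt]


def linear_scale_gb(ref_gb: float, ref_params_billion: float,
                    target_params_billion: float) -> float:
    """Scale a reference memory figure linearly with parameter count."""
    return ref_gb * target_params_billion / ref_params_billion


def reduction_pct(baseline_gb: float, claimed_gb: float) -> float:
    """Percentage reduction of claimed_gb relative to baseline_gb."""
    return 100.0 * (1.0 - claimed_gb / baseline_gb)


if __name__ == "__main__":
    # Llama 3.1 405B, weights only.
    print("405B FP32:", weight_memory_gb(405, "FP32"), "GB")
    print("405B FP8: ", weight_memory_gb(405, "FP8"), "GB")

    # DeepSeek R1 671B; FP16 is an assumption that matches the 1.3 TB figure.
    print("671B FP16:", weight_memory_gb(671, "FP16"), "GB")

    # Scale the quoted 1.3 TB down to 300B parameters, then compare with 96 GB.
    scaled = linear_scale_gb(1300, 671, 300)
    print("300B scaled:", round(scaled), "GB")
    print("96 GB vs scaled:", round(reduction_pct(scaled, 96)), "% reduction")
```

Under these assumptions, fitting into 96 GB would be a reduction of more than 80% relative to the linearly scaled figure, which is the gap the question is about.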