ZeRO and DeepSpeed: system optimizations allow training models with 100B parameters | Dark Hacker News