Every Flop Counts: Scaling 300B MoE LLMs Without Premium GPUs [pdf] (github.com)
2 points by mountainview 1 year ago | 0 comments