GShard: Scaling giant models with conditional computation and automatic sharding | Dark Hacker News