MoE inference optimizations: 15% lower expert load by request reordering (blog.doubleword.ai)
3 points by mezark 45 days ago | 0 comments