Next Grok model training with 10T parameter model (twitter.com)

3 points by ramshanker 85 days ago | 4 comments

lifecodes 85 days ago

I guess we are reaching the point where “10T parameters” sounds more like a marketing number than a meaningful metric.

Between MoE, aggressive quantization, and synthetic data pipelines, it’s getting harder to tell whether bigger models are actually better, or just more expensive to train.

It would be more interesting to see capability per dollar or per watt, not parameter count.
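To make the point about MoE and quantization concrete, here is a rough back-of-the-envelope sketch in Python. Only the 10T total comes from the headline; the MoE settings (`shared_fraction`, `num_experts`, `experts_per_token`) are hypothetical values picked for illustration, since nothing about the actual architecture has been posted, and the model assumes all experts are the same size.

```python
def weight_memory_tb(total_params: float, bits_per_param: int) -> float:
    """Storage for the weights alone, in terabytes (1 TB = 1e12 bytes)."""
    return total_params * bits_per_param / 8 / 1e12


def active_params(total_params: float, shared_fraction: float,
                  num_experts: int, experts_per_token: int) -> float:
    """Parameters touched per token in a simplified MoE with equal-size experts.

    shared_fraction is the share of weights every token uses (attention,
    embeddings, and so on); the rest is split evenly across the experts.
    """
    shared = total_params * shared_fraction
    expert_pool = total_params - shared
    return shared + expert_pool * experts_per_token / num_experts


TOTAL = 10e12  # the headline "10T parameters"

# Same parameter count, very different memory footprint depending on precision.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_tb(TOTAL, bits):.1f} TB")

# Hypothetical MoE configuration (assumed, not from the announcement).
active = active_params(TOTAL, shared_fraction=0.1, num_experts=64, experts_per_token=2)
print(f"active parameters per token: {active / 1e12:.2f}T of {TOTAL / 1e12:.0f}T")
```

Under these assumed settings only a small slice of the 10T is used per token, and the weight storage varies by 4x across precisions, which is why the raw count says little on its own.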

bfeynman 85 days ago

Aren't the leading labs currently chasing not pretraining and massive parameter counts, but enriched, deep fine-tuning and post-training for agentic tasks and coding? MoE with new post-training paradigms lets smaller models perform quite well, and they are much more pragmatic to scale inference with. Given that, this choice seems super odd, as the frontier labs seem to stay neck and neck, and I don't even see Grok being used in any benchmarks because of how poorly it performs.

ramshanker 85 days ago

This is the best publicly posted model size since top AI labs started treating model size as a trade secret. It should also guide the next generation of inference ASICs.

carolien 84 days ago

Sounds more like a marketing number. Carolien eutrucking