Also, there is likely overfitting to current benchmarks and use cases, which makes objectively dumber models perform better. Within the year, people will create reasoning models (allegedly) distilled from 4.5 that match or beat it on most of the use cases and benchmarks humans care about.
A separate problem is that post-training limits current models to only "expert level". It's likely that superhuman abilities of the base model are lost in the process.
IMO scale beats all. It's just that scaling is hard, and there has been comparatively little of it since GPT-3. The entire human race, companies, and countries need to come together and work on a distributed solution.