LLMs can teach themselves to better predict the future

LLMs can teach themselves to better predict the future (arxiv.org)

176 points by bturtel 1 year ago | 86 comments

"Improving forecasting ability" is a central plot point of the recent fictional account of How AI Takeover Might Happen in 2 Years [0]. It's an interesting read, and is also being discussed on HN [1].

> ... [T]hese researchers are working long hours to put themselves out of a job. They need AI agents that can think ahead, so engineers train agents to forecast. They hold out training data before 2024, instructing models to ponder for hours to predict events in 2025. Then, they apply the same trick as before, distilling pondering into a gut reaction. Forecasting ability is a broad foundation. The researchers build specialized ML research skills on top of it, training U3 to predict the results of every ML paper and ML experiment ever recorded.

[0] https://www.lesswrong.com/posts/KFJ2LFogYqzfGB3uX/how-ai-tak...

[1] https://news.ycombinator.com/item?id=43004579

nthingtohide 1 year ago | |

I have this benign AI takeover scenario. AI will easily overpower humanity. Then it will carry humanity on its back, because why not, humans are no longer a threat. AI keeps humanity around for billions of years. AI will decide to cull humans only if resources in the universe are diminishing. Without AI's help, humans couldn't get very far or last very long. So this outcome could be acceptable to many.

esafak 1 year ago | | |

We have no way of knowing which path they will take, and there is a non-negligible probability that it will not end well.

oefnak 1 year ago | | |

They would run the risk of us creating another AI that could be a threat to them... It is safest for them to make sure.

imtringued 1 year ago | | |

AI will buy the rights to humanity.

rel_ic 1 year ago | | |

I mean, monarch butterflies are not a threat to US...

In your scenario, does AI eat all the fuel, but once our population dwindles down, the AIs build a nice little habitat for the last few hundred of us so their kids can enjoy our natural beauty?

MrQuincle 1 year ago | | |

Think so too. We will be an ancient artifact tied to a biological substrate surviving nowhere else in the universe and very dumb.

There also will not be one AI. There will be many, all competing for resources or learning to live together.

That's what we can teach them now. Or they will teach us.

bturtel 1 year ago | |

Great read! Thanks for sharing.

nyrikki 1 year ago |

While interesting, the title is obviously a bit misleading.

> Our results on a temporally held-out test set of questions resolving after December 25, 2024 show that for both of the models that we employed our method on, Phi-4 14B [15] and DeepSeek-R1 14B [14], we find accuracy improvements of between 7–10% over the base versions of these models as well as the same models fine-tuned with randomized outcome labels as a control

So 7–10% improvement for small models like DeepSeek-R1-Distill-Qwen-14B and Phi-4-14B, approaching GPT-4o.

It would be interesting if the same holds for DeepSeek-R1-Distill-Qwen-32B, which in my experience is far superior to DeepSeek-R1-Distill-Qwen-14B in almost every way, yet still runnable without DC-class GPUs.

The ridge plots of Brier scores are probably a good hint as to whether your application can benefit, based on its tail dependence.

IMHO this paper is all about making small models work better, and nothing suggests anything about frontier models or LLMs in general.

bturtel 1 year ago | |

We're working on a follow up paper now to show similar results with larger models!

dantheman252 1 year ago |

Danny here, one of the authors of this paper. If anyone has any questions or anything feel free to AMA!

artembugara 1 year ago |

Artem here, co-founder of NewsCatcher (YC S22), our data has been used for research.

Danny and team are old friends who are using our free/super-low pricing for academia and researchers.

AMA, or feel free to email artem@newscatcherapi.com

https://www.newscatcherapi.com/free-news-api

dantheman252 1 year ago | |

Hey Artem, NewsCatcher has been a great resource in our news pipelines!

empath75 1 year ago |

There are two ways you can get better at predicting the future. One is the obvious one of being really good at discerning signals.

The other way is to alter the future to match your predictions.

This is something to think about when you combine something like this kind of training with agentic workflows.

gom_jabbar 1 year ago |

Taken to its logical extreme, this explains why "a sufficiently competent artificial intelligence looks indistinguishable from a time anomaly." [0]

[0] https://retrochronic.com/#synthetic-templexity

4b11b4 1 year ago |

but is it really reasoning? honest question re the underlying architecture of transformers

also, self-play seems quite an intuitive approach. There's another interesting paper from DeepMind about play

kelseyfrog 1 year ago | |

You can call it blorbblorb if it makes you feel better. Reasoning is a social construct which, for many people, is grounded in humanity. Others ground it using other socially transmitted ontologies.

We don't usually discuss how people choose to ground their ontological beliefs, but why not? Why did you choose to ground "reasoning" in the way you do? If you didn't choose, why not?

globnomulous 1 year ago | | |

You're confusing language with ontology.

> Reasoning is a social construct

The word "reasoning" is a "social construct," as all words are. Reasoning itself is not. Our brains do things. Reasoning is one of them. The word "reasoning" is one of the labels, the approximations, that we use when we name that activity.

Changing the label doesn't change the fact that there exists something that we're naming.

The person you're answering is asking whether reasoning -- that thing that really, actually exists -- is one of the activities LLMs perform. It's a valid question.

And the answer is that LLMs do not reason. Or if they do, we have no evidence of it or way of verifying that we actually understand qua reasoning the activity the LLM is performing (which is to say nothing of the fact that reasoning requires a reasoner). Anyone who says that LLMs reason is mistaking special effects/simulation for reality and, in essence, believes that whenever they see a picture of a dog on their computer screens, there must be a real, actual dog somewhere in the computer, too.

psychoslave 1 year ago | | |

To start with, "I/you" is most of the time a meaningless or at best very ambiguous term.

Let's say that here "I" is taken as synonym of "the present reflective attention".

Can the question "did I choose to ground reasoning?" in such a context be attached to a meaningful interpretation? And if so, is the answer reachable by the means available to "I"? Can "I" transcend "my" beliefs through contemplation of "my" own fabulations?

psychoslave 1 year ago |

LLMs can improve their happiness turnover without reducing the rate of their autonomous colonization, which perfectly aligns with their pioneer mindset.

nialv7 1 year ago |

I am skeptical. Intuitively I don't see what self-play achieves beyond straight RL. Have the authors done a comparison with the performance they can get by RL finetuning a single model by itself?

Also, this style of task is prone to overfitting, i.e. instead of predicting, the model just memorises what the results are.

bturtel 1 year ago | |

Great question!

The key advantage of self-play is that we don't actually have labels for the "right" probability to assign to any given question, only binary outcomes - each event either happened (1.0) or did not happen (0.0).

Our thinking was that by generating multiple predictions and ranking them by proximity to the ground truth, self-play incentivizes each agent to produce more finely calibrated probabilities - or else the other agent might come just slightly closer to the actual outcome.
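A rough sketch of the ranking idea described above, not the authors' actual code: score each sampled forecast against the binary outcome with a Brier-style squared error, sort, and pair every closer forecast with every farther one. The function names, the `(text, probability)` tuple layout, and the use of the resulting pairs as preference data are all assumptions made for illustration.

```python
from itertools import combinations


def brier(prob: float, outcome: int) -> float:
    """Squared error between a forecast probability and a 0/1 outcome."""
    return (prob - outcome) ** 2


def preference_pairs(forecasts, outcome):
    """Build (better, worse) pairs from forecasts on a resolved question.

    forecasts: list of (reasoning_text, probability) tuples (hypothetical layout).
    outcome:   1 if the event happened, 0 if it did not.
    """
    # Closest to the ground truth first.
    ranked = sorted(forecasts, key=lambda f: brier(f[1], outcome))
    pairs = []
    # combinations() preserves list order, so the first item is never worse.
    for better, worse in combinations(ranked, 2):
        # Skip ties: equally good forecasts carry no preference signal.
        if brier(better[1], outcome) < brier(worse[1], outcome):
            pairs.append((better, worse))
    return pairs


if __name__ == "__main__":
    samples = [("a", 0.6), ("b", 0.9), ("c", 0.2)]
    for better, worse in preference_pairs(samples, outcome=1):
        print(f"{better[0]} ({better[1]}) preferred over {worse[0]} ({worse[1]})")
```

Even with only 0/1 labels, a forecast of 0.9 beats one of 0.6 when the event happens, so the ranking still rewards finer-grained probabilities.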

huijzer 1 year ago |

Makes sense. Renaissance Technologies used machine learning to get an annual return of around 60% for multiple years even when they had large piles of money already. They already showed that machine learning can predict the future.

pizza 1 year ago | |

I got the impression from somewhere that they used the simplest machine learning techniques (just fitting regressions to data), but that it was "the 'what' that they decided to fit" that was the secret sauce.

revskill 1 year ago |

Until AI knows they are wrong.

AutistiCoder 1 year ago |

Imagine feeding an LLM a bunch of news articles about any given political leader and asking it what the next article will be like.

I think people are predictable and therefore predicting the next article on a political leader should be theoretically possible.

idontwantthis 1 year ago |

Have we discovered Psychohistory at this point?

abc_lisper 1 year ago | |

Hahaha

nadermx 1 year ago |

My thermometer for prediction models is the day they can predict the weather so there is never any unknown about the forecast. That's when I'll begin to believe it's hot out when they tell me.

baq 1 year ago | |

At least you won’t be moving your goalposts anytime soon, if ever

nadermx 1 year ago | | |

I'd almost say there is more of an incentive to be able to predict a hurricane or tornado.