Arrows of Time for Large Language Models

Arrows of Time for Large Language Models (arxiv.org)

6 points by tianlong 2 years ago | 3 comments

nyoncore 2 years ago |

Isn't it obvious that, since LLMs are trained to predict the next word, they do better at that than at predicting the previous one?

frotaur 2 years ago |

The paper mentions that the LLMs predicting the previous token are in fact pre-trained to do exactly that, so the difference is not obvious.
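
To make the point concrete, here is a minimal sketch of one way a "previous token" training set can be built: reverse the token sequence and keep the usual next-token objective. This is an illustration only, not code from the paper; the helper name `make_examples` and the toy word-level tokens are hypothetical.

```python
def make_examples(tokens, direction="forward"):
    """Return (context, target) pairs for a next-token objective.

    Assumption for illustration: a backward model is trained on the
    reversed sequence, so "next token" in the reversed text is the
    previous token of the original text.
    """
    if direction not in ("forward", "backward"):
        raise ValueError("direction must be 'forward' or 'backward'")
    seq = list(tokens) if direction == "forward" else list(reversed(tokens))
    # Each prefix of the (possibly reversed) sequence predicts the token after it.
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]


tokens = ["the", "cat", "sat"]
forward = make_examples(tokens, "forward")
backward = make_examples(tokens, "backward")

print(forward)   # contexts grow left to right
print(backward)  # contexts grow right to left
```

Both directions see the same text and the same objective, and by the chain rule either ordering factorizes the same joint probability of the sequence. So any gap in loss between the two models has to come from how learnable each direction is, not from one of them being trained on the wrong task.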

tianlong 2 years ago |

Is there a link with entropy creation?
