The Core Flaws of Modern AI Based on Large Language Models

The Core Flaws of Modern AI Based on Large Language Models (bykozy.me)

2 points by byko3y 150 days ago | 2 comments

PaulHoule 150 days ago

From the viewpoint of language modeling (as opposed to reasoning), transformers are absolute genius compared to the CNN and RNN solutions we were trying before. Ultimately they are sensitive to the graph structure, which is the truth about language (two parts of a text can be talking about the same thing), and not the tree structure, which is an almost-truth.

byko3y 150 days ago

LLMs of the RWKV and Mamba families have recurrent blocks instead of attention, and they get really close to the performance of Transformers at small scale. What you've probably tried was some outdated tech. The biggest advantage of Transformers is that they are easy to scale, and the scale itself brings some quality to the table.

For example, some time ago Mamba-3B matched the performance of Pythia-7B: https://www.reddit.com/r/singularity/comments/18asto2/announ...

The main drawback of the legacy models was that they were hard to scale, both due to slow training (sequential processing) and poor stability as depth increases (i.e. poor gradient flow). Modern implementations no longer have these problems and are able to scale decently.
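To illustrate the "sequential processing" point, here is a minimal sketch in plain Python. It assumes a toy scalar linear recurrence h_t = a_t * h_{t-1} + b_t. This is not the actual RWKV or Mamba code, and the function names are made up for the example. Because composing such steps is associative, all states can be computed with a scan that needs only about log2(n) dependent rounds instead of n dependent steps.

```python
from typing import List, Tuple

# One step of the toy recurrence h_t = a_t * h_{t-1} + b_t is the affine
# map h -> a*h + b, stored as the pair (a, b). Illustrative only.
Affine = Tuple[float, float]


def sequential_scan(steps: List[Affine], h0: float = 0.0) -> List[float]:
    """Classic RNN-style loop: every state has to wait for the previous one."""
    states = []
    h = h0
    for a, b in steps:
        h = a * h + b
        states.append(h)
    return states


def combine(first: Affine, second: Affine) -> Affine:
    """Compose two affine maps (apply `first`, then `second`). Associative."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def scan_by_doubling(steps: List[Affine], h0: float = 0.0) -> List[float]:
    """Hillis-Steele inclusive scan over the affine maps.

    The while loop runs about log2(n) times. Inside one round every
    combine() is independent of the others, so on parallel hardware a
    round could run all at once; here it is simulated with a list
    comprehension.
    """
    prefix = list(steps)
    offset = 1
    while offset < len(prefix):
        prefix = [
            combine(prefix[i - offset], prefix[i]) if i >= offset else prefix[i]
            for i in range(len(prefix))
        ]
        offset *= 2
    # prefix[i] is now the composition of steps 0..i; apply it to h0.
    return [a * h0 + b for a, b in prefix]
```

Both functions return the same list of states for the same input (up to floating-point rounding); only the dependency structure differs. A recurrence with a nonlinearity between steps, as in a classic RNN or LSTM, does not compose this way, which is one reason those models had to be trained step by step.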