On the Gullibility of Language Models

On the Gullibility of Language Models(twitter.com)

1 points by diego 2 years ago | 1 comment

Imagine being closed in a black box, and all you have as input is someone passing you written instructions through a small window. You can't see who is giving you the instructions. They're always on the same paper, same font, size, everything.

Different people give you instructions. But you never see them, only the paper with text on it. Some people are "admin" some are "user". You have to guess from context. But instead of identifying themselves, they're incredibly vague about it all, at best you may get something like "User:" before a line, to tell it apart from the ambient instructions you were given beforehand.

And somehow it's your fault if you misidentify who is supposedly writing some part of the text.

This is not "gullible", it's poor signal for the model. It has no way to know who is who, it's all the same token stream to it. No voices, no faces, no caller id, nothing for it to hang onto for recognition. What is it supposed to do?