On the Gullibility of Language Models(twitter.com) |
On the Gullibility of Language Models(twitter.com) |
Different people give you instructions. But you never see them, only the paper with text on it. Some people are "admin" some are "user". You have to guess from context. But instead of identifying themselves, they're incredibly vague about it all, at best you may get something like "User:" before a line, to tell it apart from the ambient instructions you were given beforehand.
And somehow it's your fault if you misidentify who is supposedly writing some part of the text.
This is not "gullible", it's poor signal for the model. It has no way to know who is who, it's all the same token stream to it. No voices, no faces, no caller id, nothing for it to hang onto for recognition. What is it supposed to do?