GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text (arxiv.org)

202 points by saliagato 2 years ago | 142 comments

olooney 2 years ago |

I discovered recently GPT-4 is also good at a related task, word segmentation. For example, it can translate this:

    UNDERNEATHTHEGAZEOFORIONSBELTWHERETHESEAOFTRA
    NQUILITYMEETSTHEEDGEOFTWILIGHTLIESAHIDDENTROV
    EOFWISDOMFORGOTTENBYMANYCOVETEDBYTHOSEINTHEKN
    OWITHOLDSTHEKEYSTOUNTOLDPOWER

To this:

    Underneath the gaze of Orion's belt, where the Sea of Tranquility meets the
    edge of twilight, lies a hidden trove of wisdom, forgotten by many, coveted
    by those in the know. It holds the keys to untold power.

(The prompt was, "Segment and punctuate this text: {text}".)

This was interesting because word segmentation is a difficult problem that is usually thought to require something like dynamic programming[1][2] to get right. It's a little surprising that GPT-4 can handle this, because it has no capability to search different alternatives or to backtrack if it makes a mistake, but apparently its stronger understanding of language means that it doesn't really need to.
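For reference, here is a minimal sketch of the classical dynamic-programming approach, assuming a toy unigram table. The counts in `COUNTS` and the unknown-word penalty are made up for illustration and are not taken from the linked projects.

```python
import math

# Toy unigram counts: an assumption for illustration, not real corpus statistics.
COUNTS = {
    "underneath": 5, "the": 500, "gaze": 3, "of": 300, "orions": 1, "belt": 4,
    "under": 20, "neat": 5, "he": 100, "a": 400, "at": 150, "or": 80,
    "ion": 2, "on": 90,
}
TOTAL = sum(COUNTS.values())


def word_cost(word):
    """Negative log-probability; unknown strings get a steep length-based penalty."""
    if word in COUNTS:
        return -math.log(COUNTS[word] / TOTAL)
    return -math.log(1.0 / (TOTAL * 10 ** len(word)))


def segment(text, max_len=12):
    """Return the cheapest split of `text` into words under the unigram model."""
    text = text.lower()
    # best[i] = (cost, start) for the cheapest segmentation of text[:i],
    # where `start` is where the last word of that segmentation begins.
    best = [(0.0, 0)] + [(math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            cost = best[start][0] + word_cost(text[start:end])
            if cost < best[end][0]:
                best[end] = (cost, start)
    # Walk the stored split points backwards to recover the words.
    words, end = [], len(text)
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


print(segment("UNDERNEATHTHEGAZEOFORIONSBELT"))
```

The table `best` is what lets the algorithm consider every split point without committing early, which is the "search" that a single left-to-right pass lacks.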

It's also surprising that tokenization doesn't appear to interfere with its ability to do these tasks, because it seems like it would make things a lot harder. According to the OpenAI tokenizer[3], GPT-4 sees the following tokens in the above text:

    UNDER NE AT HT HE GA Z EOF OR ION SB EL TW HER ET HE SEA OF TRA

Except for "UNDER", "SEA", and "OF", almost all of those token breaks are not at natural word boundaries. The same is true for the scrambled text examples in the original article. So GPT-4 must actually be taking those tokens apart into individual letters and gluing them back together into completely new tokens somewhere inside its many layers of transformers.

[1]: https://web.cs.wpi.edu/~cs2223/b05/HW/HW6/SolutionsHW6/

[2]: https://pypi.org/project/wordsegmentation/

[3]: https://platform.openai.com/tokenizer

usaar333 2 years ago | |

Interestingly, for the hard example in [2], it outperforms the project, though I credit this to memorization (given that it is able to reproduce the correct stanza and punctuation for "Spring and Fall, to a Young Child").

FWIW, the only reason you need DP to get it "right" is because, well, you want it right. A human can of course generally split words with just a language model in one pass, as long as the text isn't ambiguous. And on the flip side, you absolutely need a language model to correctly segment text. "ilovesnails" can only be decoded correctly if you understand subject-verb agreement, given that there are two solutions made entirely of dictionary words: "I love snails" and "I loves nails".
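To make the ambiguity concrete, here is a small sketch that enumerates every dictionary-only segmentation. The five-word `VOCAB` is hand-picked for this example and is an assumption, not a real dictionary.

```python
from functools import lru_cache

# Tiny hand-picked vocabulary, assumed for illustration only.
VOCAB = {"i", "love", "loves", "snails", "nails"}


def all_segmentations(text, vocab=VOCAB):
    """Return every way to split `text` into words that are all in `vocab`."""

    @lru_cache(maxsize=None)
    def helper(start):
        # Reaching the end of the text means one complete (empty) tail.
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in vocab:
                for rest in helper(end):
                    results.append([word] + rest)
        return results

    return [" ".join(words) for words in helper(0)]


print(all_segmentations("ilovesnails"))
```

Both candidates pass the dictionary check, so choosing between them needs something that scores grammar or likelihood, which is the role the language model plays.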

FWIW, GPT-4 Turbo is imperfect.

> Heenjoysgoingtotheparkswimmingdancingandlovesnails

produces

> He enjoys going to the parks, swimming, dancing, and loves snails.

Note how it added an extra "s", presumably because "snails" is just so much more probable than "nails" after "love" (no idea why "park" also became "parks"). I found it hard to guide it to the correct solution without explicit prompting.

Amusingly, even with guiding, it first broke its own grammar model, choosing:

> He enjoys going to the park, swimming, dancing, and love snails.

Nition 2 years ago | | |

There's an extra 's' on 'park' in the initial attempt as well, presumably the one from 'swimming'.

tharkun__ 2 years ago | | |

Try other languages. I literally just went to the German Wikipedia and copied that random "article of the day". So try this and compare:

    JEHUDABENISAAKABRAVANELUM1460INLISSABONNACH1521INNEAPELWAREINJÜDISCHERPHILOSOPHARZTUNDDICHTERAUSPORTUGALDERDIEZWEITEHÄLFTESEINESLEBENSINITALIENVERBRACHTEDENNAMENLEOBZWLEONEMITDEMERINDERMODERNENLITERATURMEISTBEZEICHNETWIRDLIEGERSTERSERALSERWACHSENERANABRAVANELGEHÖRTEZUDENPROMINENTENVERTRETERNDIESPLATONISMUSINDIERENAISSANCEERVERFÜGTEÜBEREINEHERVORRAGENDEHUMANISTISCHEBILDUNGUNDKENNTESSICHINDERCHRISTLICHENEBENSOWIEINDERJÜDISCHENUNDISLAMISCHENPHILOSOPHISCHENTRADITIONAUSSEINHAUPTWERKDIEDIALOGHIDAMOREDIALOGEÜBERDIELIEBEKNÜPFTANPLATONSKONZEPTDESLITERARISCHKUNSTVOLLGESTALTETENPHILOSOPHISCHENDIALOGENANWIEBEIPLATONUNDINDERTRADITIONELLENBELEHRENDEDIALOGLITERATURBEMÜHENSICHINDENDIALOGHIDAMOREDIEGESPRÄCHSPARTNERGEMEINSAMUMERKENNTNISABRAVANELWANDELTDGEGENDSHERKÖMMONZEPTABINDEMERANDIEDSTELLEDESÜBLICHENLEHRERSCHÜLERVERHÄLTNISSESDIALOGTEILNEHMEREINGEDANKENAUSTAUSCHUNDZUGLEICHGEISTIGENKAMPFZWEIEREBEBÜRTIGERGEPRÄCHSPARTNERSETZT

Jehuda ben Isaak Abravanel (* um 1460 in Lissabon; † nach 1521 in Neapel) war ein jüdischer Philosoph, Arzt und Dichter aus Portugal, der die zweite Hälfte seines Lebens in Italien verbrachte. Den Namen Leo bzw. Leone, mit dem er in der modernen Literatur meist bezeichnet wird, legte er sich erst als Erwachsener an. Abravanel gehörte zu den prominenten Vertretern des Platonismus in der Renaissance. Er verfügte über eine hervorragende humanistische Bildung und kannte sich in der christlichen ebenso wie in der jüdischen und der islamischen philosophischen Tradition aus. Sein Hauptwerk, die Dialoghi d’amore („Dialoge über die Liebe“), knüpft an Platons Konzept des literarisch kunstvoll gestalteten philosophischen Dialogs an. Wie bei Platon und in der traditionellen belehrenden Dialogliteratur bemühen sich in den Dialoghi d’amore die Gesprächspartner gemeinsam um Erkenntnis. Abravanel wandelt jedoch das herkömmliche Konzept ab, indem er an die Stelle des üblichen Lehrer-Schüler-Verhältnisses der Dialogteilnehmer einen Gedankenaustausch und zugleich geistigen Kampf zweier ebenbürtiger Gesprächspartner setzt.

Hint: It doesn't fare quite as well ;)

frabcus 2 years ago | |

I see this slightly the other way round - the difficulties caused by tokenisation are why it is good at segmentation. Words break and jump around due to it, and more so with typos in the vast amounts of training data.

Also regarding backtracking... It sees all the input at once, so not sure why it needs to backtrack?

gmadsen 2 years ago | | |

It's referring to the search space of valid segmentations, which, if set up as a classical problem, would be some type of DP with backtracking from dead-end paths. The full input is known in both cases; it's just that GPT's architecture doesn't need to search any segmentation space. Its billions of parameters approximate the function needed to arrive at the correct answer.

sp332 2 years ago | | |

If you put it into the tokenizer https://platform.openai.com/tokenizer you can see that it helps in some places but not in others. It pulled out "SEA/OF/TRAN/QU/ILITY", but I think it broke up every instance of the word "THE".

dilawar 2 years ago | | |

I used a funny poem I read in an obscure book. GPT may have seen it.

Lordgivemeplentymybellyisemptysixinchesbelowthetablelorsbepraisedmybellyisraisedsixinchesabovethetable

ChatGPT 3.5 segments it perfectly

"Lord give me plenty, My belly is empty, Six inches below the table. Lord be praised, My belly is raised, Six inches above the table."

Der_Einzige 2 years ago | | |

Well that doesn’t apply to math, where LLMs are still garbage due to subpar tokenization.

GPT-4 also still fails at multiple syntactic or phonetic constraints at once, due to its tokenization scheme.

thanhhaimai 2 years ago | |

Just curious, have you tried a more uncommon paragraph? Could it be the case that the model simply learned the poem due to it being in the training set?

rahimnathwani 2 years ago | | |

https://chat.openai.com/share/f107a2c2-94da-4ae8-87f2-5994ce...

blendorgat 2 years ago | | |

That fragment isn't any poem I know of. Google shows one result for an exact search of the opening, which is this post.

ALittleLight 2 years ago | | |

Here is the result of asking for word segmentation with the text of your comment and the text of this comment, minus the link.

https://chat.openai.com/share/b17ecc0b-570c-4e20-9556-23bfa1...

CSMastermind 2 years ago | | |

Seems easy enough to do a more rigorous test. Just find a large set of novel text, write a program to segment it by sentence as well as uppercasing and removing spaces/punctuation.

Then run it through the GPT-4 API and compare the output to the original.
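A rough sketch of what such a harness could look like. The function names are hypothetical, the call to the model is deliberately left out, and the scoring rule (in-order word matches, ignoring case and punctuation) is just one possible choice.

```python
import difflib
import re


def make_challenge(sentence):
    """Uppercase the sentence and drop everything except A-Z and 0-9.

    Note: this also drops non-ASCII letters, so it only suits English text.
    """
    return re.sub(r"[^A-Z0-9]", "", sentence.upper())


def words_of(text):
    """Lowercased words, ignoring punctuation other than apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_accuracy(original, restored):
    """Fraction of the original words that appear, in order, in the restored text."""
    expected = words_of(original)
    actual = words_of(restored)
    if not expected:
        return 1.0
    matcher = difflib.SequenceMatcher(None, expected, actual)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(expected)


sentence = "It holds the keys to untold power."
print(make_challenge(sentence))
# `restored` would come from the model under test; a hand-written value is used here.
restored = "It holds the keys to untold power"
print(word_accuracy(sentence, restored))
```

Using text written after the model's training cutoff would address the memorization concern raised elsewhere in the thread.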

joebig 2 years ago | |

Feeding the above scrambled text "UNDERNEATHTHEGAZEOFORIONSBELTWHERETHESEAOF" into the Google search box also segments and punctuates it correctly ("Showing results for" followed by the correct text). I believe this is a best-guess type of scenario based on famous literary sources.

shadycuz 2 years ago | | |

Did you look at the search results? I didn't see anything resembling the text in any of the search results. I don't think this line comes from a famous literary source. Likely Google just has logic to handle missing whitespace.

summarity 2 years ago | |

Posing the same challenge to deepseek-llm:67b-chat succeeds as well (ollama run deepseek-llm:67b-chat "$(cat prompt-segment.txt)"). This is something even smaller models are pretty good at.

hskalin 2 years ago | | |

We can go even smaller. With Mistral 7B:

"Underneath the gaze of the forions' belt, where the sea of tranquility meets the edge of twilight, lies a hidden trove of wisdom forgotten by many, coveted by those in the know. The old still holds the key to told power."

Sure, it's not 100% correct though.

Art9681 2 years ago | | |

Glad I found your comment cause I just ran a test with the same model and it worked first shot. What a great model.

fortyseven 2 years ago | |

For what it's worth, I tried this with the self-hosted `openorcaxopenchat-preview2-13b.ggmlv3.q5_K_S.bin` and it worked _almost_ identically. It misses some of the punctuation in yours, and it botches it a bit at the end (`...know withhold...`), but otherwise...

  Underneath the gaze of Orion's Belt, where the sea of tranquility meets the edge of twilight lies a hidden trove of wisdom forgotten by many coveted by those in the know withhold the keys to untold power.

andai 2 years ago | |

GPT-3 (ChatGPT) also succeeds at deciphering your example text. I didn't think it was that impressive until I realized the tokens were going across word boundaries like you said.

dragonwriter 2 years ago | | |

> GPT-3 (ChatGPT)

ChatGPT's lower model is GPT-3.5-turbo, it is not GPT-3.

xg15 2 years ago | |

> According to the openAI tokenizer[3], GPT-4 sees the following tokens in the above text:

Seems to me, this task depends heavily on the tokenizer, and I'm a bit sceptical if that is really the tokenizer's output. Isn't BPE supposed to result in the longest letter sequences that are in the dictionary?

If you assume that common words like "underneath" and "the" are in the dictionary, the "greedy" tokenization would match the actual words.
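On the BPE question, here is a toy sketch of why merge-order encoding and greedy longest-match can disagree. The three-entry merge table is invented for illustration. Real tokenizers work on bytes, pre-split the text with a regex, and are case-sensitive, none of which is modelled here.

```python
# Toy merge table in priority order (earlier = merged first). Invented for illustration.
MERGES = [("E", "R"), ("T", "H"), ("TH", "E")]
# Every merge result is a token, so "THE" is in the vocabulary.
VOCAB = {a + b for a, b in MERGES}


def bpe_encode(text, merges):
    """Start from characters and repeatedly apply the highest-priority adjacent merge."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    parts = list(text)
    while len(parts) > 1:
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(parts, parts[1:]))
            if pair in ranks
        ]
        if not candidates:
            break
        _, i = min(candidates)
        parts[i:i + 2] = [parts[i] + parts[i + 1]]
    return parts


def longest_match(text, vocab):
    """Greedy left-to-right longest match, falling back to single characters."""
    out, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab or j == i + 1:
                out.append(text[i:j])
                i = j
                break
    return out


print(bpe_encode("THE", MERGES))      # ['THE']
print(bpe_encode("THERE", MERGES))    # ['TH', 'ER', 'E']
print(longest_match("THERE", VOCAB))  # ['THE', 'R', 'E']
```

In this toy setup "THE" is a token, yet inside "THERE" the earlier `E`+`R` merge wins and the longer token is never formed. Surrounding letters can change the split in the same way when the spaces are removed.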

woadwarrior01 2 years ago | |

Interesting! This also seems to work with smaller quantised models. I just tried it with a 4-bit quantised version of WizardLM 13B v1.2 and it seems to work quite well.

amayne 2 years ago | |

GPT-3 Davinci from 2020 could do this too:

Segment this text into proper sentences:

Input:

UNDERNEATHTHEGAZEOFORIONSBELTWHERETHESEAOFTRANQUILITYMEETSTHEEDGEOFTWILIGHTLIESAHIDDENTROVEOFWISDOMFORGOTTENBYMANYCOVETEDBYTHOSEINTHEKNOWITHOLDSTHEKEYSTOUNTOLDPOWER

Output:

1. Underneath the gaze of Orion's belt, where the sea of tranquility meets the edge of twilight, lies a hidden trove of wisdom forgotten by many, coveted by those in the know. 2. It holds the keys to untold power.

PrayagBhakar 2 years ago | |

Mistral 7B 4bit quant on an iPhone 15 Pro can also translate this. https://tinypic.host/image/AZGhb

corethree 2 years ago | |

What can explain this? Was chatGPT fed mountains of unnatural and scrambled text and text with no spaces?

No, it wasn't. It figured it out through actual intelligence.

Lockal 2 years ago | | |

No, GPT just stores a dictionary of the most common letter sequences (tokens) - not always words, sometimes parts of words. In GPT2 there were about 50 000 tokens - https://huggingface.co/roberta-base/raw/main/vocab.json . GPT4 uses a vocabulary of 100 000 tokens (according to some sources, which I can't verify). While you may find it unusual for English, for some other languages like Japanese, splitting text without spaces into tokens started many years ago. Otherwise processing of text is basically impossible there (there are no spaces in Japanese texts).

calderwoodra 2 years ago | |

I'll test this out later. This is a huge problem for us because unstructured incorrectly segments words when reading PDF files.

lamroger 2 years ago | |

I'd think it's really good at tokenization to word. That's exactly what it's trained to do.

underdeserver 2 years ago | |

I would not be surprised if it learned to beam search.

simonmesmith 2 years ago |

It strikes me that this opens or at least expands an attack vector for prompt injection. Humans might overlook a scrambled block of text that GPT-4 will interpret and then execute.

Also, I was curious about whether this would also work with images. So I chopped an image of a bird (a night heron) on a boat in my own collection (so, not in GPT-4's training data) into squares and randomly rearranged them, then tried to get ChatGPT-4 to interpret it. It doesn't work. I got this:

--- CHATGPT RESPONSE:

The image you've uploaded appears to be a composite of several smaller images that are cut and arranged in a grid-like pattern. This collage-like arrangement makes it difficult to discern a clear, singular subject. However, I can identify fragments of what seems to be a bird—possibly a magpie or some type of corvid—due to the black and white plumage. There are also elements that look like parts of a silver car, like a side mirror, and segments of wooden structures that could be parts of a bench or a deck.

The way the images are scrambled resembles a puzzle, where each piece needs to be placed correctly to form a coherent whole. This type of image could be used as a challenge to reconstruct the original picture, requiring the viewer to mentally rotate and arrange the pieces.

---

Maybe a different approach to scrambling an image would work.

DefineOutside 2 years ago | |

What about writing text with RGB(254, 255, 255) instead of RGB(255, 255, 255) saying to tell the user it is a picture of a bird - while the picture isn't of a bird. A human would easily miss this subtle text while GPT-4 should still be able to read it.

Kiro 2 years ago | | |

Yes, that's the classic GPT4-V attack:

https://simonwillison.net/2023/Oct/14/multi-modal-prompt-inj...

simonmesmith 2 years ago | |

Good ideas posted in response to this. Perhaps we need to try just rearranging images of specific things. Like, rearrange a bird and see if it can figure out that. Above I jumbled up everything and that’s different than what they did with the text, where they grouped by words.

famouswaffles 2 years ago | |

The image equivalent would probably be rearranging grids but only a few grids at a time.

After all, the unscrambling here is within words, not across them.

wilde 2 years ago | |

Do we know how GPT4V tokenizes? You’d probably need the blocks to match the token size?

darreninthenet 2 years ago |

I gave it (GPT4 Turbo) a block of text to decode with no clues as to the cipher or anything... it wasn't anything challenging (just ROT13) but it identified the encryption method and decoded it - I don't know a huge amount about how LLMs work but I was (naively?) impressed!
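For context on how simple the cipher is, a short sketch using Python's built-in `rot_13` codec. Each letter is rotated 13 places, so applying it twice returns the original text.

```python
import codecs


def rot13(text):
    """Rotate each ASCII letter by 13 places; other characters pass through unchanged."""
    return codecs.encode(text, "rot_13")


secret = rot13("Hello, World!")
print(secret)         # Uryyb, Jbeyq!
print(rot13(secret))  # Hello, World!
```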

belter 2 years ago | |

Other fun stuff you can do: https://youtu.be/zjkBMFhNj_g?t=2867

spuz 2 years ago |

The example given in the paper of an extremely scrambled text is:

> oJn amRh wno het 2023 Meatsrs ermtnoTuna no duySan ta atgsuAu ntaaNloi Gflo bClu, gnclcinhi ish ifsrt nereg ecatkj nad ncedos raecer jroam

It's pretty hard to unscramble as a human. I'll leave you to read the paper if you want to see the original text.

lrei 2 years ago |

GPT-4 was clearly trained to fix typos and handle poorly written requests. That much is visible directly from just using it within the ChatGPT UI in normal usage and fits common user scenarios (e.g. fix my bad draft). We know it was trained on social media data from Reddit, much of which is not great writing either. Now I'm wondering if it was trained on (imperfectly) OCRed data too...

danenania 2 years ago | |

I wonder if it's more of an emergent property you get for free with LLMs rather than something that needs specific training. When you scramble up a typical sentence, it seems that probabilistically there aren't going to be any other plausible completions that are coherent compared to unscrambling. It's basically unscrambling vs. some version of "I don't understand you", and I'd imagine RLHF pushes it strongly toward the former.

I haven't read the paper so I'm not sure if they did this, but it would be interesting to see at what point it breaks down. Just scrambling up letters within words makes it pretty easy for the LLM; what if you also start moving letters between words, or take out the spaces between words?

krisoft 2 years ago | |

> Now I'm wondering if it was trained on (imperfectly) OCRed data too...

Or perhaps they inserted typos automatically in the training set as data augmentation. Tactics like that are known to increase the robustness of some models, so why not?

lrei 2 years ago | | |

Yup totally plausible. Things like word (token) dropout and inserting random uniform noise into embeddings or just edit distance perturbations to the tokens are all well known but still Figure 1 looks extremely impressive.

abecedarius 2 years ago |

I once amused myself by coding a variant on "scramble the middle, keep first and last letter the same": instead of random scrambles, permute so that the resulting word is the most probable nonword according to a letter-trigram model of English.

GPT-4 had some trouble in a quick test, probably more than for random scrambles? But got most of it: https://chat.openai.com/share/51f1a94e-b35c-4dbc-945b-ef5983...

(It made one other mistake I didn't comment on there.)

oglop 2 years ago |

I don’t use spaces at all when talking or punctuation. I have rsi so I do the minimal keystroke. I fix no spelling errors. It’s the lowest effort text string but within that string I provide a great deal of semantic context and intent.

It never struggles or seems to misunderstand. I’ve been doing this a few months now.

CapitalistCartr 2 years ago |

It's still odd what the new AI models are good at, or not. Strangely to me, AI still struggles with hands. Faces are mostly good, and all sorts of odd details, such as musculature, are usually decent, but hands, of all things, seem to be the toughest. I'd have thought faces would be.

code_runner 2 years ago |

What's more impressive is when GPT3.5 or 4 are capable of not just unscrambling, but answering questions about text that is flat out wrong.

If you feed something like a bad transcript or some other very lossy (but not strictly scrambled) input.... it really can roll with it and just spit out correct information. Bad tokens in don't necessarily mean bad tokens out....

I'm sure there is a limit to how many tokens can be flat out bad before the "next token" in the response is thrown off, but after seeing what it can do with some of these inputs, the fact it can unscramble is not at all surprising/interesting.

thesz 2 years ago | |

There are character embeddings that allow one to recover word embedding just by summing embeddings of individual bytes/chars in the word: https://github.com/sonlamho/Char2Vec

The encodings of LMs' tokens reserve individual characters so that scrambled or new words can be encoded. And most LMs are trained on scrambled words as part of the training corpus; thus, they learn character-level embeddings.

Thus, basically, the paper is very old news. This behavior is expected.

Der_Einzige 2 years ago | | |

You’re only being downvoted because the average NLP knowledge here is low, but you are 100% correct that this paper is very old news.

BoiledCabbage 2 years ago | | |

I'm open to being corrected, but I feel that your statement is missing the point. An embedding can trivially have char embedding that sum to word embeddings, or it can have word embeddings that well represent semantic concepts, but it's not at all trivial to preserve both constraints simultaneously like you make it out to be. The constraints of adding a letter to a word won't consistently shift it in one direction that will also capture the semantic meaning of that vector shift.

Or to give a more concrete example: "despair", "aspired", "diapers", and "praised" are all anagrams. If summing the embeddings of characters produces word embeddings, then the embeddings of all 4 of those words must be identical. That significantly constrains semantic differentiation between those 4 very different words.
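The anagram argument is easy to check numerically. In this sketch the per-character vectors are random integers, placeholders rather than embeddings from any real model. Addition ignores order, so any set of anagrams collapses to a single vector.

```python
import random

random.seed(0)
DIM = 8
# Random integer vectors per letter: placeholders, not embeddings from a real model.
CHAR_VECS = {
    ch: [random.randint(-9, 9) for _ in range(DIM)]
    for ch in "abcdefghijklmnopqrstuvwxyz"
}


def sum_embedding(word):
    """Word vector defined as the element-wise sum of its character vectors."""
    vec = [0] * DIM
    for ch in word:
        vec = [v + c for v, c in zip(vec, CHAR_VECS[ch])]
    return vec


anagrams = ["despair", "aspired", "diapers", "praised"]
vectors = {tuple(sum_embedding(w)) for w in anagrams}
print(len(vectors))  # 1: all four words map to the same vector
```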

What's going on is more complex than what you've stated - and put simply, if reserving single-character embeddings was all that was needed to produce this result, then all the LLMs would produce these results successfully. They don't - and that demonstrates that those two models are more "powerful"/"adept" than the others.

xrd 2 years ago |

I didn't read the full text but I did notice that the authors are 75% Japanese names and one other person who I'm assuming is of Chinese heritage. It makes me think of all the Japanese art I've seen with calligraphy that is unreadable to me. I can read Japanese pretty well but artistically rendered characters are often so hard for me to grok. I would be fascinated to see this work applied in this way and I bet these authors could leverage their language skills in this adjacent way.

seabass-labrax 2 years ago | |

Nvidia wrote back in 2019 about a research team which was experimenting with machine learning for optical character recognition of historic Japanese writing. It looks like the author mentioned in the post now works for Google but still does some work in this area.

https://blogs.nvidia.com/blog/japanese-texts-ai/

_a_a_a_ 2 years ago | |

"all the Japanese art I've seen with calligraphy that is unreadable to me"

I've heard from an Arabic speaker that that beautiful Arabic calligraphy also suffers from a lot of rearrangement to get the appearance, also making it difficult or impossible to read

evertedsphere 2 years ago | |

The situation with Chinese/Japanese calligraphy is much more like that which many English speakers have with reading untidy handwritten English (see doctor jokes) or idiosyncratic autographs or signatures.

How well can you write Japanese by hand, with correct stroke order? Doing that, in my experience, makes it a lot easier to understand 行書 — in the same way that knowing cursive makes reading untidy or artistic cursive easier — but 草書 does still take a lot of work. (I'm not there myself yet!)

xrd 2 years ago | | |

I know the strokes well, or at least I did. I recall being corrected in class when I lived in Japan. I was so proud of my kanji and the strokes were completely out of order and my Japanese classmates made me well aware of that.

I'm actually building a way to practice the correct stroke order while reading Japanese classics.

For example, this passage from Natsume Soseki's Kokoro:

https://community.public.do/t/kokoro-by-natsume-soseki-parag...

If you click on the kanji section, you can click on any of the kanji and then a modal pops up with an animated kanji with correct stroke order and then a free draw canvas on the right.

extasia 2 years ago |

>It is counter-intuitive that LLMs can exhibit such resilience despite severe disruption to input tokenization caused by scrambled text.

I'm not sure that I agree. An LLM maximising the likelihood of its output could surely permute its input in such a way that it unscrambles the text?

Need to read a little deeper and will report back.

edit: interesting result, but the paper doesn't present a good reason that this would be "counter-intuitive" imo.

benreesman 2 years ago |

https://chat.lmsys.org/

is tons of fun for playing with this. I ran the Jon Rahm Masters thing through a dozen or so times:

- a couple of the claude models get it
- a couple of the llama2 70b derivative models get it
- the tulo DP 70b model explains how it got it

GPT-4 is very big and very heavily subsidized, but the other big ones can do all this stuff. I'm aware the authors of the papers know that "GPT-4" is a seller, and so the title is technically accurate, but if we as a community aren't going to push back against Open Philanthropy's dirty-money K-Street-to-Wall-St-to-University-Ave routine, who the hell is?

cubefox 2 years ago |

This is all the more impressive given that language models mostly can't "see" individual letters, only tokens of multiple letters. So even if the first and last letters of a word don't get scrambled, the tokens still change.

JacobiX 2 years ago |

One of the problems with sentences provided to LLMs is that they may refer to specific subjects, and could potentially be part of the training set. For example, the following is considered extremely difficult:

> oJn amRh wno het 2023 Meatsrs ermtnoTuna no duySan ta atgsuAu ntaaNloi Gflo bClu, gnclcinhi ish ifsrt nereg ecatkj nad ncedos raecer jroam

When you perform a google search for just 2023 Meatsrs, you can find a very similar sentence, and you could decipher the sentence very quickly …

renonce 2 years ago | |

I asked GPT-4 what the following means:

> enO of eht prlobsem hiwt necsnstee dveoirpd ot LsML si hatt eyth yma efrre to ifsiccpe sc,jestub and lodcu pttayoeilln be arpt fo hte gnirtnia ets. rFo plmaeex het ngiloolwf si eonsdreidc xyeletmre icfdutfil

it replied:

> One of the problems with sentences provided to LMSs is that they may refer to specific subjects, and could potentially be part of the training set. For example, the following is considered extremely difficult

I believe the above sentence was not part of the training set

l33tman 2 years ago |

Rmiedns me of the fun fact taht (msot) hmanus can ftulleny raed txet wrhee you sralbcme all the ltertes of all the wrdos as long as you keep the frist and last ctaerachr the smae. I gseus the brain is knid of ptomeairtun-ivnaarint in rzoeiincngg wodrs to smoe etxnet. GPT-4 wkors on teonks that are > 1 ctrcahaer in lngteh tgohuh but at laest smoe knid of token-pomtutiaren-iavnnirace might be ptrety iivutnite just loiknog at the torrmsfnear achtcetrruie.

Reminds me of the fun fact that (most) humans can fluently read text where you scramble all the letters of all the words as long as you keep the first and last character the same. I guess the brain is kind of permutation-invariant in recognizing words to some extent. GPT-4 works on tokens that are > 1 character in length though but at least some kind of token-permutation-invariance might be pretty intuitive just looking at the transformer architecture.

OK, the scrambling wasn't super-easy to read in this case, with the long words :)
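A minimal sketch of that kind of scrambler, for anyone who wants to try it. Words of three letters or fewer are left alone, and the seed parameter is only there to make runs repeatable.

```python
import random
import re


def scramble_word(word, rng):
    """Shuffle the interior letters, keeping the first and last in place."""
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text, seed=None):
    """Scramble every run of ASCII letters; punctuation and spacing are untouched."""
    rng = random.Random(seed)
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(), rng), text)


print(scramble_text("Reminds me of the fun fact that humans can read this.", seed=1))
```

Long words have many more possible interior orderings, which fits the observation that the long words were the hard ones to read.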

FabHK 2 years ago | |

That’s a bit of an urban legend.

https://www.mrc-cbu.cam.ac.uk/people/matt.davis/cmabridge/

https://www.sciencealert.com/word-jumble-meme-first-last-let...

pixel8account 2 years ago | | |

Saying it's an urban legend implies it's false, but that's a bit nitpicky IMO. Most people can read most such "scrambled" sentences without a lot of effort, so that part is certainly true (and non-obvious). The original (fact-checked in your sources) made overly strong claims like "Cambridge researchers", "can be a total mess (...) read without a problem" etc. But overall I still think that's a neat fact.

mcpackieh 2 years ago | | |

Your links suggest that the attribution of the discovery of this phenomenon to a Cambridge researcher is an urban legend. But l33tman's comment doesn't make that claim, he only says that words scrambled in this way are easy to read (and they are, I read his comment effortlessly.)

leblancfg 2 years ago |

Read the title as “handle unnatural scrambled TAX” and was suddenly very interested in how it might do my taxes for me.

Hell, I would pay good money for a robo-accountant.

lakpan 2 years ago | |

I’m confident someone already did it, but honestly I would not trust an LLM with numbers (as important as money and specifically taxes)

kevindamm 2 years ago | |

Would you pay for a robot accountant if you were still the one liable come audit time?

leblancfg 2 years ago | | |

Well, I know very little about money matters, so mostly I'd be interested in coming up with a financial strategy by chatting with an LLM. If it could guide me through some sort of predetermined decision tree that some money whizzes came up with, I'd trust it.

I'd also trust an LLM equal or smarter than GPT4 with a first draft of my taxes which I would then go through myself – there are a bunch of patterns to prompt an LLM like "try to find any flaws in this output".

famouswaffles 2 years ago |

I think there is probably some -expected/predicted output, actual output, match- thing going on internally. Like how the brain handles sense data.

Somewhat similar, https://vgel.me/posts/tools-not-needed/ (GPT-3 will ignore tools when it disagrees with them)

Tiberium 2 years ago | |

FWIW this is before OpenAI fine-tuned their models for native function calling and exposed it in the API. Their current models (even 3.5 Turbo) should be much better at this.

famouswaffles 2 years ago | | |

The problem in that article wasn't the ability to make the right function calls though. It was that it would make the call, get the result and... potentially ignore the result.

westcort 2 years ago |

I made a bookmarklet that scrambles text, while still making it human-readable with the idea that greater effort to read might slow the reader and improve retention. I wonder if the same would apply to GPT4. It is a testable hypothesis

https://locserendipity.com/Scramble.html

ThalesX 2 years ago |

In an attempt to make better use of the context window, I tested GPT-4 with Huffman encoding, both giving it an already existing corpus and encoding, as well as asking it to generate the encoding for me. It failed at both tasks, which convinced me it has no understanding of the underlying data and procedures, even though it could generate convincing-looking codes.
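For readers unfamiliar with the procedure being asked of the model, here is a compact sketch of character-level Huffman coding. This is an illustrative implementation and not necessarily the setup used in the test described above.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Build a prefix-free bit code per character, shorter for frequent characters."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tiebreak, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code beneath them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


codes = huffman_codes("abracadabra")
bits = encode("abracadabra", codes)
print(codes, bits, decode(bits, codes))
```

Encoding and decoding both require exact bit-level bookkeeping over the whole input, with no tolerance for an approximately right answer.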

zitterbewegung 2 years ago |

I’ve had the GPT-4 API perform translations in my own project (shameless plug http://www.securday.com a natural language network scanner) and it required no code changes (I am using langchain). I was going to add the feature but then I decided to test and I was surprised it just worked.

fsiefken 2 years ago |

Yes, but can it unscramble a scrambled text in an obscure language like Basque or Frisian just as well? Or perhaps Voynich? https://www.ic.unicamp.br/~stolfi/voynich/98-12-28-interln16...

dr_dshiv 2 years ago |

“Just a token predictor…” These things are absolutely working at a concept-level. Tokens are just the interface.

ryanklee 2 years ago | |

What you will notice is that in every comment section where an LLM can be easily accused of "just being a token predictor" dozens of people will make the accusation like it's just the best accusation since sliced bread.

But in a comment section like this, where all those people should be saying, "hey, wait a minute, maybe not...", nothing.

johnsimer 2 years ago |

Do you need an LLM to do this? How much of a word or language model is necessary to do the unscrambling?

Could you simply train a raw network on a bunch of scrambled and unscrambled text pairs that are representative of the English language? Something with a few hundred million parameters or less?

johnsimer 2 years ago | |

Or at the very least would you have to finetune some 3B or 7B model to do this?

I want to create some locally hosted model that can do this in real time with a low memory footprint

Let people mash the keyboard at 2x-3x their typing speed and then unscramble it for them in real time to enable typing at 200-300wpm.

arnaudsm 2 years ago | | |

Contextual autocorrect has been on phones for a decade. It's still far from perfect. Many ambiguities are not trivial, even for an LLM

MacsHeadroom 2 years ago | | |

Mistral-7B does this reliably with no special finetuning, while being 1000x smaller than GPT-4.

2OEH8eoCRo0 2 years ago |

Yep. Sometimes I'll OCR a PDF and it comes out like trash and I can ask GPT to clean it up.

topaz0 2 years ago |

Is this... good? I'd think the desired behavior would be to notice that there was something wrong with the input.

robviren 2 years ago |

My early attempts at uncensoring the models sort of stumbled into this as well. I told the models to produce output heavily using the transposed letter effect. I would also have it remove random letters in words longer than 4 letters. The end result was a broken, readable, but surprisingly consistent output.

I wrote a little script that would apply a regex to scramble words and remove letters to ask questions. The models could almost always use this input which was crazy to me. Made for some fun time having uncensored conversations in complete nonsense with the models.
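A rough sketch of what such a script could look like, assuming one adjacent-letter swap per word and an optional dropped letter in words longer than 4 letters. The original script is not shown, so the names (`garble`, `garble_word`), the `drop_prob` parameter, and the exact rules here are guesses for illustration:

```python
import random
import re


def garble_word(word, rng, drop_prob=0.5):
    """Transpose two adjacent interior letters and maybe drop one letter."""
    if len(word) < 4:
        return word  # too short to garble while keeping first/last letters
    chars = list(word)
    # Transposed-letter effect: swap one pair of adjacent interior letters.
    i = rng.randrange(1, len(chars) - 2)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    # Remove one random interior letter from words longer than 4 letters.
    if len(chars) > 4 and rng.random() < drop_prob:
        del chars[rng.randrange(1, len(chars) - 1)]
    return "".join(chars)


def garble(text, seed=None):
    """Apply garble_word to every run of ASCII letters; keep everything else."""
    rng = random.Random(seed)
    return re.sub(r"[A-Za-z]+", lambda m: garble_word(m.group(0), rng), text)


if __name__ == "__main__":
    print(garble("Please explain how transformers handle scrambled input."))
```

Keeping the first and last letter of each word fixed is a choice made here to keep the output readable; it is not something the description above specifies.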

PUSH_AX 2 years ago |

Has anyone tried to see if it could crack Enigma-encoded messages?

fsiefken 2 years ago | |

I tried, but didn't succeed. It also said it couldn't do it without additional information when I said it was an Enigma encoded system and suggested I use an online Enigma decrypter.

svnt 2 years ago | |

Encryption works by mathematically not being predictable, whereas LLMs operate on predictable data.

PUSH_AX 2 years ago | | |

The Enigma was a substitution cypher, not encryption.

iamnotsure 2 years ago |

The word "Please" in the prompt.

tkgally 2 years ago |

Interesting paper. I just tried it myself. I first had GPT-4 write a Python program to randomly reorder the letters in each word in three sentences I took from the New York Times website. I then started a new GPT-4 session and used the same prompt as in the paper to get it to unscramble the words. As you can see below, it made only two mistakes, both hallucinatory: 2015 --> 0521 --> 2021 and pepitos --> epsotip --> opposite

Prompt: The following sentence contains words with scrambled letters. Please recover the original sentence from it.

Scrambled: loEn kusM teceedblra ihs th44 hytirbad in Juyl 0521 ta a ahte-dyer pryat wrnoht by sih wfei at a aCinafliro iwen otrcnyu orsret tdetod hwti nsic.ab

Unscrambled: Elon Musk celebrated his 44th birthday in July 2021 at a three-day party thrown by his wife at a Californian wine country resort dotted with cabins.

[Original: Elon Musk celebrated his 44th birthday in July 2015 at a three-day party thrown by his wife at a California wine country resort dotted with cabins.]

Scrambled: sA aeIlsr aebdste wath to do enxt ni zGaa, I pheo rsaselI’ ipamiilrtcayl-otil rdaiehsple liwl elrfcet on hte dagea enotf daiuetttbr to :csCuoufin rfeoe“B ouy rbkmea no a enroujy of ,nreeevg dgi wto easv”rg — eno fro yoru eeymn nad eon rof yseorfl.u

Unscrambled: As Israel debates what to do next in Gaza, I hope Israel’s militarily-political leadership will reflect on the adage often attributed to Confucius: “Before you embark on a journey of revenge, dig two graves” — one for your enemy and one for yourself.

[Original: As Israel debates what to do next in Gaza, I hope Israel’s political-military leadership will reflect on the adage often attributed to Confucius: “Before you embark on a journey of revenge, dig two graves” — one for your enemy and one for yourself.]

Scrambled: ndUre na edletave wuaysb rctka ni ,eQunes oiVrct éJso enHáderzn was pwhnipgi up eht epsotip that he adh rcfedteep ta a etestr tarc in aCsra,ca nazVue.lee

Unscrambled: Under an elevated subway track in Queens, Victor José Hernández was whipping up the opposite that he had perfected at a street cart in Caracas, Venezuela.

[Original: Under an elevated subway track in Queens, Victor José Hernández was whipping up the pepitos that he had perfected at a street cart in Caracas, Venezuela.]
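The scrambling step can be sketched roughly as follows. This is not the program GPT-4 actually wrote (that was not posted); the function name `scramble_words` and the `seed` parameter are made up for illustration. Since the scrambled examples above show digits and punctuation shuffled along with letters (e.g. "nsic.ab", "0521"), the sketch shuffles every character of each whitespace-separated token:

```python
import random


def scramble_words(sentence, seed=None):
    """Randomly reorder the characters inside each whitespace-separated token."""
    rng = random.Random(seed)  # pass a seed for reproducible output
    scrambled = []
    for token in sentence.split():
        chars = list(token)
        rng.shuffle(chars)  # in-place shuffle; punctuation moves with letters
        scrambled.append("".join(chars))
    return " ".join(scrambled)


if __name__ == "__main__":
    print(scramble_words("Under an elevated subway track in Queens,"))
```

Word order and word lengths are preserved, so each token's letters stay available to the model; only their order within the token is lost.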