I wrote a little script that applied a regex to scramble words and remove letters from my questions. The models could almost always understand the input, which was crazy to me. It made for some fun times having uncensored conversations in complete nonsense with the models.
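A minimal sketch of what such a script might look like, assuming a regex that matches runs of letters and a callback that shuffles each run and drops some letters. The function name `scramble_text` and the `drop_prob` parameter are made up for illustration; the original script was not shared.

```python
import random
import re

def scramble_text(text, drop_prob=0.1, seed=None):
    """Shuffle the letters of every word and randomly drop some of them."""
    rng = random.Random(seed)

    def scramble_word(match):
        letters = list(match.group(0))
        rng.shuffle(letters)
        # Drop letters at random, but never delete a word entirely.
        kept = [ch for ch in letters if rng.random() >= drop_prob] or letters[:1]
        return "".join(kept)

    # [^\W\d_]+ matches runs of letters only, so digits, spaces and
    # punctuation stay where they are.
    return re.sub(r"[^\W\d_]+", scramble_word, text)

question = "What is the capital of France?"
print(scramble_text(question, drop_prob=0.1, seed=0))
```

Passing a `seed` makes a run repeatable, which helps when comparing how different models cope with the same scrambled prompt.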
Prompt: The following sentence contains words with scrambled letters. Please recover the original sentence from it.
Scrambled: loEn kusM teceedblra ihs th44 hytirbad in Juyl 0521 ta a ahte-dyer pryat wrnoht by sih wfei at a aCinafliro iwen otrcnyu orsret tdetod hwti nsic.ab
Unscrambled: Elon Musk celebrated his 44th birthday in July 2021 at a three-day party thrown by his wife at a Californian wine country resort dotted with cabins.
[Original: Elon Musk celebrated his 44th birthday in July 2015 at a three-day party thrown by his wife at a California wine country resort dotted with cabins.]
Scrambled: sA aeIlsr aebdste wath to do enxt ni zGaa, I pheo rsaselI’ ipamiilrtcayl-otil rdaiehsple liwl elrfcet on hte dagea enotf daiuetttbr to :csCuoufin rfeoe“B ouy rbkmea no a enroujy of ,nreeevg dgi wto easv”rg — eno fro yoru eeymn nad eon rof yseorfl.u
Unscrambled: As Israel debates what to do next in Gaza, I hope Israel’s militarily-political leadership will reflect on the adage often attributed to Confucius: “Before you embark on a journey of revenge, dig two graves” — one for your enemy and one for yourself.
[Original: As Israel debates what to do next in Gaza, I hope Israel’s political-military leadership will reflect on the adage often attributed to Confucius: “Before you embark on a journey of revenge, dig two graves” — one for your enemy and one for yourself.]
Scrambled: ndUre na edletave wuaysb rctka ni ,eQunes oiVrct éJso enHáderzn was pwhnipgi up eht epsotip that he adh rcfedteep ta a etestr tarc in aCsra,ca nazVue.lee
Unscrambled: Under an elevated subway track in Queens, Victor José Hernández was whipping up the opposite that he had perfected at a street cart in Caracas, Venezuela.
[Original: Under an elevated subway track in Queens, Victor José Hernández was whipping up the pepitos that he had perfected at a street cart in Caracas, Venezuela.]
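The slips in the examples above (a changed year, "Californian" for "California", "opposite" for "pepitos") are easy to miss by eye. As a rough sketch, a word-level diff with Python's standard `difflib` can surface them; the helper name `word_diffs` is an assumption for illustration and is not the metric used in the paper.

```python
import difflib

def word_diffs(recovered, original):
    """Return (recovered_words, original_words) pairs where the texts differ."""
    rec, orig = recovered.split(), original.split()
    matcher = difflib.SequenceMatcher(a=rec, b=orig, autojunk=False)
    return [
        (" ".join(rec[i1:i2]), " ".join(orig[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

recovered = ("Elon Musk celebrated his 44th birthday in July 2021 at a three-day "
             "party thrown by his wife at a Californian wine country resort "
             "dotted with cabins.")
original = ("Elon Musk celebrated his 44th birthday in July 2015 at a three-day "
            "party thrown by his wife at a California wine country resort "
            "dotted with cabins.")

for got, expected in word_diffs(recovered, original):
    print(f"model: {got!r}  original: {expected!r}")
# model: '2021'  original: '2015'
# model: 'Californian'  original: 'California'
```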
In cryptography, ciphertext or cyphertext is the result of encryption performed on plaintext using an algorithm, called a cipher.
UNDERNEATHTHEGAZEOFORIONSBELTWHERETHESEAOFTRA
NQUILITYMEETSTHEEDGEOFTWILIGHTLIESAHIDDENTROV
EOFWISDOMFORGOTTENBYMANYCOVETEDBYTHOSEINTHEKN
OWITHOLDSTHEKEYSTOUNTOLDPOWER
To this: Underneath the gaze of Orion's belt, where the Sea of Tranquility meets the
edge of twilight, lies a hidden trove of wisdom, forgotten by many, coveted
by those in the know. It holds the keys to untold power.
(The prompt was, "Segment and punctuate this text: {text}".)
This was interesting because word segmentation is a difficult problem that is usually thought to require something like dynamic programming[1][2] to get right. It's a little surprising that GPT-4 can handle this, because it has no capability to search different alternatives or to backtrack if it makes a mistake, but apparently its stronger understanding of language means that it doesn't really need to.
It's also surprising that tokenization doesn't appear to interfere with its ability to do these tasks, because it seems like it would make things a lot harder. According to the OpenAI tokenizer[3], GPT-4 sees the following tokens in the above text:
UNDER NE AT HT HE GA Z EOF OR ION SB EL TW HER ET HE SEA OF TRA
Except for "UNDER", "SEA", and "OF", almost all of those token breaks are not at natural word boundaries. The same is true for the scrambled text examples in the original article. So GPT-4 must actually be taking those tokens apart into individual letters and gluing them back together into completely new tokens somewhere inside its many layers of transformers.
[1]: https://web.cs.wpi.edu/~cs2223/b05/HW/HW6/SolutionsHW6/
FWIW, the only reason you need DP to get it "right" is because, well, you want it right. A human can of course generally split words with just a language model in one pass, as long as you don't have ambiguous text. And on the flipside, you absolutely need a language model to correctly segment text. "ilovesnails" can only be decoded correctly if you understand subject-verb agreement, given that there are two solutions that have dictionary agreement: "I love snails" and "I loves nails".
FWIW, GPT-4 Turbo is imperfect.
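To make the ambiguity concrete, here is a small sketch of dictionary-based segmentation using memoized recursion (a simple form of dynamic programming). The five-word dictionary is a toy assumption chosen only for this example; a real segmenter would use a large word list plus a language model to rank the candidates.

```python
from functools import lru_cache

# Tiny hypothetical dictionary, just large enough for the example.
WORDS = {"i", "love", "loves", "nails", "snails"}

def segmentations(text, words=WORDS):
    """Return every way to split `text` into dictionary words."""
    max_len = max(map(len, words))

    @lru_cache(maxsize=None)
    def split_from(start):
        # Base case: reached the end, so there is one (empty) completion.
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            piece = text[start:end]
            if piece in words:
                results.extend([piece] + rest for rest in split_from(end))
        return results

    return [" ".join(parts) for parts in split_from(0)]

print(segmentations("ilovesnails"))
# ['i love snails', 'i loves nails']
```

Both splits pass the dictionary check, so the dictionary alone cannot choose between them; something that knows about subject-verb agreement has to break the tie.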
> Heenjoysgoingtotheparkswimmingdancingandlovesnails
produces
> He enjoys going to the parks, swimming, dancing, and loves snails.
Note how it added an extra "s", presumably because "snails" is just so much more probable than "nails" as a thing to "love" (no idea why "park" also became "parks"). I found it hard to guide it to the correct solution without explicit prompting.
Amusingly, even with guiding, it first broke its own grammar model, choosing:
> He enjoys going to the park, swimming, dancing, and love snails.
JEHUDABENISAAKABRAVANELUM1460INLISSABONNACH1521INNEAPELWAREINJÜDISCHERPHILOSOPHARZTUNDDICHTERAUSPORTUGALDERDIEZWEITEHÄLFTESEINESLEBENSINITALIENVERBRACHTEDENNAMENLEOBZWLEONEMITDEMERINDERMODERNENLITERATURMEISTBEZEICHNETWIRDLIEGERSTERSERALSERWACHSENERANABRAVANELGEHÖRTEZUDENPROMINENTENVERTRETERNDIESPLATONISMUSINDIERENAISSANCEERVERFÜGTEÜBEREINEHERVORRAGENDEHUMANISTISCHEBILDUNGUNDKENNTESSICHINDERCHRISTLICHENEBENSOWIEINDERJÜDISCHENUNDISLAMISCHENPHILOSOPHISCHENTRADITIONAUSSEINHAUPTWERKDIEDIALOGHIDAMOREDIALOGEÜBERDIELIEBEKNÜPFTANPLATONSKONZEPTDESLITERARISCHKUNSTVOLLGESTALTETENPHILOSOPHISCHENDIALOGENANWIEBEIPLATONUNDINDERTRADITIONELLENBELEHRENDEDIALOGLITERATURBEMÜHENSICHINDENDIALOGHIDAMOREDIEGESPRÄCHSPARTNERGEMEINSAMUMERKENNTNISABRAVANELWANDELTDGEGENDSHERKÖMMONZEPTABINDEMERANDIEDSTELLEDESÜBLICHENLEHRERSCHÜLERVERHÄLTNISSESDIALOGTEILNEHMEREINGEDANKENAUSTAUSCHUNDZUGLEICHGEISTIGENKAMPFZWEIEREBEBÜRTIGERGEPRÄCHSPARTNERSETZT
Jehuda ben Isaak Abravanel (* um 1460 in Lissabon; † nach 1521 in Neapel) war ein jüdischer Philosoph, Arzt und Dichter aus Portugal, der die zweite Hälfte seines Lebens in Italien verbrachte. Den Namen Leo bzw. Leone, mit dem er in der modernen Literatur meist bezeichnet wird, legte er sich erst als Erwachsener an. Abravanel gehörte zu den prominenten Vertretern des Platonismus in der Renaissance. Er verfügte über eine hervorragende humanistische Bildung und kannte sich in der christlichen ebenso wie in der jüdischen und der islamischen philosophischen Tradition aus. Sein Hauptwerk, die Dialoghi d’amore („Dialoge über die Liebe“), knüpft an Platons Konzept des literarisch kunstvoll gestalteten philosophischen Dialogs an. Wie bei Platon und in der traditionellen belehrenden Dialogliteratur bemühen sich in den Dialoghi d’amore die Gesprächspartner gemeinsam um Erkenntnis. Abravanel wandelt jedoch das herkömmliche Konzept ab, indem er an die Stelle des üblichen Lehrer-Schüler-Verhältnisses der Dialogteilnehmer einen Gedankenaustausch und zugleich geistigen Kampf zweier ebenbürtiger Gesprächspartner setzt.
Hint: It doesn't fare quite as well ;)
Also regarding backtracking... It sees all the input at once, so not sure why it needs to backtrack?
Lordgivemeplentymybellyisemptysixinchesbelowthetablelorsbepraisedmybellyisraisedsixinchesabovethetable
ChatGPT 3.5 segments it perfectly
"Lord give me plenty, My belly is empty, Six inches below the table. Lord be praised, My belly is raised, Six inches above the table."
GPT-4 also still fails at multiple syntactic or phonetic constraints at once, due to its tokenization scheme.
https://chat.openai.com/share/b17ecc0b-570c-4e20-9556-23bfa1...
Then run it through the GPT-4 API and compare the output to the original.
"Underneath the gaze of the forions' belt, where the sea of tranquility meets the edge of twilight, lies a hidden trove of wisdom forgotten by many, coveted by those in the know. The old still holds the key to told power."
Sure, it's not 100% correct though
Underneath the gaze of Orion's Belt, where the sea of tranquility meets the edge of twilight lies a hidden trove of wisdom forgotten by many coveted by those in the know withhold the keys to untold power.
ChatGPT's lower model is GPT-3.5-turbo; it is not GPT-3.
Seems to me, this task depends heavily on the tokenizer, and I'm a bit sceptical if that is really the tokenizer's output. Isn't BPE supposed to result in the longest letter sequences that are in the dictionary?
If you assume that common words like "underneath" and "the" are in the dictionary, the "greedy" tokenization would match the actual words.
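On the longest-match question: as far as I understand it, BPE encoding does not pick the longest dictionary entry at each position. It starts from single characters and replays merges in the order they were learned, which can give a different result. The sketch below contrasts the two on a toy example; the merge list and vocabulary are hand-made assumptions, not GPT-4's actual merges.

```python
def bpe_tokenize(text, merges):
    """Apply BPE merges in priority order (earlier in the list = higher priority)."""
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(text)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no learned merge applies any more
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

def greedy_longest_match(text, vocab):
    """Always take the longest vocabulary entry starting at the current position."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            # Fall back to a single character if nothing longer matches.
            if text[i:j] in vocab or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

# Hypothetical merges: "HE" was learned before "TH", and "THE" is built from "TH" + "E".
MERGES = [("H", "E"), ("T", "H"), ("TH", "E")]
VOCAB = set("THE") | {"HE", "TH", "THE"}

print(bpe_tokenize("THE", MERGES))         # ['T', 'HE']
print(greedy_longest_match("THE", VOCAB))  # ['THE']
```

Even though "THE" is in the toy vocabulary, the merge order strands the "T", which resembles the "HT HE" and "ET HE" splits quoted earlier in the thread. Real vocabularies also tend to store common words in lowercase with a leading space, so an all-caps string without spaces may simply not match those entries.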
Segment this text into proper sentences:
Input:
UNDERNEATHTHEGAZEOFORIONSBELTWHERETHESEAOFTRANQUILITYMEETSTHEEDGEOFTWILIGHTLIESAHIDDENTROVEOFWISDOMFORGOTTENBYMANYCOVETEDBYTHOSEINTHEKNOWITHOLDSTHEKEYSTOUNTOLDPOWER
Output:
1. Underneath the gaze of Orion's belt, where the sea of tranquility meets the edge of twilight, lies a hidden trove of wisdom forgotten by many, coveted by those in the know. 2. It holds the keys to untold power.
No it wasn't. It figured it out through actual intelligence.
Also, I was curious about whether this would also work with images. So I chopped an image of a bird (a night heron) on a boat in my own collection (so, not in GPT-4's training data) into squares and randomly rearranged them, then tried to get ChatGPT-4 to interpret it. It doesn't work. I got this:
--- CHATGPT RESPONSE:
The image you've uploaded appears to be a composite of several smaller images that are cut and arranged in a grid-like pattern. This collage-like arrangement makes it difficult to discern a clear, singular subject. However, I can identify fragments of what seems to be a bird—possibly a magpie or some type of corvid—due to the black and white plumage. There are also elements that look like parts of a silver car, like a side mirror, and segments of wooden structures that could be parts of a bench or a deck.
The way the images are scrambled resembles a puzzle, where each piece needs to be placed correctly to form a coherent whole. This type of image could be used as a challenge to reconstruct the original picture, requiring the viewer to mentally rotate and arrange the pieces.
---
Maybe a different approach to scrambling an image would work.
https://simonwillison.net/2023/Oct/14/multi-modal-prompt-inj...
After all, the unscrambling here is within words, not across them.
> oJn amRh wno het 2023 Meatsrs ermtnoTuna no duySan ta atgsuAu ntaaNloi Gflo bClu, gnclcinhi ish ifsrt nereg ecatkj nad ncedos raecer jroam
It's pretty hard to unscramble as a human. I'll leave you to read the paper if you want to see the original text.
I haven't read the paper so I'm not sure if they did this, but it would be interesting to see at what point it breaks down. Just scrambling up letters within words makes it pretty easy for the LLM; what if you also start moving letters between words, or take out the spaces between words?
Or perhaps they inserted typos automatically in the training set as data augmentation. Tactics like that are known to increase the robustness of some models, so why not?
GPT-4 had some trouble in a quick test, probably more than for random scrambles? But got most of it: https://chat.openai.com/share/51f1a94e-b35c-4dbc-945b-ef5983...
(It made one other mistake I didn't comment on there.)
It never struggles or seems to misunderstand. I’ve been doing this a few months now.
If you feed something like a bad transcript or some other very lossy (but not strictly scrambled) input.... it really can roll with it and just spit out correct information. Bad tokens in don't necessarily mean bad tokens out....
I'm sure there is a limit to how many tokens can be flat out bad before the "next token" in the response is thrown off, but after seeing what it can do with some of these inputs, the fact it can unscramble is not at all surprising/interesting.
The encodings of LMs' tokens reserve individual characters so that scrambled or new words can be encoded. And most LMs are trained on scrambled words as part of the training corpus; thus, they learn character-level embeddings.
Thus, basically, the paper is very old news. This behavior is expected.
Or, to give a more concrete example, "despair", "aspired", "diapers", and "praised" are all anagrams. If summing the embeddings of characters produces words, then the embeddings of all 4 of those words must be identical. That significantly constrains semantic differentiation between those 4 very different words.
What's going on is more complex than what you've stated. Put simply, if reserving single-character embeddings was all that was needed to produce this result, then all the LLMs would produce these results successfully. They don't, and that demonstrates that those two models are more "powerful"/"adept" than the others.
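The anagram argument can be checked with the simplest possible "sum of character embeddings", a letter-count vector (equivalent to summing one-hot vectors). This is only an illustration of the order-invariance point, not a claim about how any real model embeds text; `bag_of_characters` is a name made up for the sketch.

```python
from collections import Counter

def bag_of_characters(word):
    """Order-free representation: how many times each letter occurs."""
    return Counter(word)

words = ["despair", "aspired", "diapers", "praised"]
bags = [bag_of_characters(w) for w in words]

# Addition is commutative, so letter order cannot affect the result.
print(all(bag == bags[0] for bag in bags))  # True
```

Any representation built purely by summing per-character vectors has the same property, so word order information has to come from somewhere else, such as positional encodings or context.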
I've heard from an Arabic speaker that beautiful Arabic calligraphy also involves a lot of rearrangement to get the appearance, which likewise makes it difficult or impossible to read.
How well can you write Japanese by hand, with correct stroke order? Doing that, in my experience, makes it a lot easier to understand 行書 — in the same way that knowing cursive makes reading untidy or artistic cursive easier — but 草書 does still take a lot of work. (I'm not there myself yet!)
I'm actually building a way to practice the correct stroke order while reading Japanese classics.
For example, this passage from Natsume Soseki's Kokoro:
https://community.public.do/t/kokoro-by-natsume-soseki-parag...
If you click on the kanji section, you can click on any of the kanji and then a modal pops up with an animated kanji with correct stroke order and then a free draw canvas on the right.
I'm not sure that I agree. An LLM maximising the likelihood of its output could surely permute its input in such a way that it unscrambles the text?
Need to read a little deeper and will report back.
edit: interesting result, but the paper doesn't present a good reason that this would be "counter-intuitive" imo.
is tons of fun for playing with this. I ran the Jon Rahm Masters thing through a dozen or so times:
- a couple of the claude models get it
- a couple of the llama2 70b derivative models get it
- the tulo DP 70b model explains how it got it
GPT-4 is very big and very heavily subsidized, but the other big ones can do all this stuff. I'm aware the authors of the papers know that "GPT-4" is a seller, and so the title is technically accurate, but if we as a community aren't going to push back against Open Philanthropy's dirty-money K-Street-to-Wall-St-to-University-Ave routine, who the hell is?
> oJn amRh wno het 2023 Meatsrs ermtnoTuna no duySan ta atgsuAu ntaaNloi Gflo bClu, gnclcinhi ish ifsrt nereg ecatkj nad ncedos raecer jroam
When you perform a google search for just 2023 Meatsrs, you can find a very similar sentence, and you could decipher the sentence very quickly …
> enO of eht prlobsem hiwt necsnstee dveoirpd ot LsML si hatt eyth yma efrre to ifsiccpe sc,jestub and lodcu pttayoeilln be arpt fo hte gnirtnia ets. rFo plmaeex het ngiloolwf si eonsdreidc xyeletmre icfdutfil
it replied:
> One of the problems with sentences provided to LMSs is that they may refer to specific subjects, and could potentially be part of the training set. For example, the following is considered extremely difficult
I believe the above sentence was not part of the training set
Reminds me of the fun fact that (most) humans can fluently read text where you scramble all the letters of all the words as long as you keep the first and last character the same. I guess the brain is kind of permutation-invariant in recognizing words to some extent. GPT-4 works on tokens that are more than one character long, though, but at least some kind of token-permutation invariance might be pretty intuitive just from looking at the transformer architecture.
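For anyone who wants to try the first-and-last-letter version on their own text, here is a short sketch. The function name `shuffle_interior` is an assumption for illustration; it only touches words of four or more letters, since shorter words have no interior to rearrange.

```python
import random
import re

def shuffle_interior(text, seed=None):
    """Shuffle the inner letters of each word, keeping first and last in place."""
    rng = random.Random(seed)

    def shuffle_word(match):
        word = match.group(0)
        middle = list(word[1:-1])
        rng.shuffle(middle)
        return word[0] + "".join(middle) + word[-1]

    # Only runs of four or more letters have an interior worth shuffling;
    # digits, spaces and punctuation are left untouched.
    return re.sub(r"[^\W\d_]{4,}", shuffle_word, text)

print(shuffle_interior("Scrambled letters remain surprisingly readable", seed=1))
```

Long words get much harder to read than short ones under this scheme, because the number of possible interiors grows quickly with word length.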
OK, the scrambling wasn't super-easy to read in this case, with the long words :)
https://www.mrc-cbu.cam.ac.uk/people/matt.davis/cmabridge/
https://www.sciencealert.com/word-jumble-meme-first-last-let...
Hell, I would pay good money for a robo-accountant.
I'd also trust an LLM as smart as or smarter than GPT-4 with a first draft of my taxes, which I would then go through myself – there are a bunch of patterns to prompt an LLM like "try to find any flaws in this output".
Somewhat similar, https://vgel.me/posts/tools-not-needed/ (GPT-3 will ignore tools when it disagrees with them)
But in a comment section like this, where all those people should be saying, "hey, wait a minute, maybe not...", nothing.
Could you simply train a raw network on a bunch of scrambled and unscrambled text pairs that are representative of the English language? Something with a few hundred million parameters or less?
I want to create some locally hosted model that can do this in real time with a low memory footprint
Let people mash the keyboard at 2x-3x their typing speed and then unscramble it for them in real time to enable typing at 200-300 wpm
Is it? With essentially no knowledge of golf and no lookups, in less than a minute of work, I get this:
Jon [last name beginning with R, Rahm?] won the 2023 Masters Tournament on Sunday at Augusta National Golf Club, clinching his first green jacket and second career major
I guess it's possible that I made a mistake in unscrambling, but I like my chances.
Forgive the pun, but "The words literally start falling into place"
If we assume a sensible sentence, there are only so many combinations that make sense. The complexity of decoding (as a human) feels greatly overstated.
It's computationally-expensive spell check. Not to dismiss the study/tech, it's neat to see the machine apply context too.
You got it right, by the way, minus the name. I didn't copy that part to check
Edit: another way to look at this, a lot of information is encoded in those spaces
> Is it? With essentially no knowledge of golf and no lookups, in less than a minute of work, I get
A MINUTE! FOR 23 WORDS! Yes, the fact that you measured in units of the nearest minute for something so short is the sign of it being hard. Compare how long it takes you to read the unscrambled version.
The unscrambled text is: "Jon Rahm won the 2023 Masters Tournament on Sunday at Augusta National Golf Club, clinching his first green jacket and second career major."
Interesting how it mixes across words.
by the way, ChatGPT 4 is pretty good at scrambling as well. Your comment scrambles to
> tI's ryteppt darh ot cuaslmbern sa a amunh. l'lI eavel uyo ot dera eht erpap fi uoy nwat ot ese eht rliongia xett.
It is trained on data which may include typos, but that is very different from fixing typos. It knows what words likely come after typos in the same way it knows what words likely come after regular words.
Even non-finetuned 7B models, 3 orders of magnitude smaller than GPT-4, can unscramble text and fix typos reliably.
Half, or better, of the things people discover "GPT-4 can do" can be done with non-RLHF GPT-3 from 2020 or with a model 1000x smaller.
It reminded me of https://27bslash6.com/overdue.html so much that it just started to make me laugh with each new attempt.
And, then, unlike faces, there is relatively little visual data in the world showing exactly how hands work. Unlike faces they're not often the focal point of an image. Unlike faces, they don't present mostly forward and so in any particular image their visualization is only partial. Unlike faces hands are often defined by how they interact with any other complex object in a scene.
So we're both tough critics of hands and image models have relatively less training data.
For what it's worth, as well, it's evident that image models are only good at depicting many things gesturally. At the same time, so are painters. If you're a photographer, you can often spot fake images if you notice that the exposure, focus, or lighting is implausible. If you're a mathematician, you'll notice every chalkboard full of equations is nonsense in both AI images and most Hollywood movies. If you're a botanist, I'm sure you think every AI image with a background of trees looks weird.
And then it turns out that nearly every human being is a hand-ologist to a large degree.
For another interesting experience, take a look at the Clone synthetic hand [2] which is quite obviously artificial but also, from time to time, looks surprisingly human. We're quite clearly sensitive to exactly the musculature and range of motion of our hands and know exactly what's feasible, what's painful, what feels natural and unnatural given the exact constraints of how our hand is constructed. When those limits are probed it's immediately obvious.
[1] https://en.wikipedia.org/wiki/Cortical_homunculus [2] https://www.youtube.com/watch?v=A4Gp8oQey5M&t=20s
Hands are extremely complicated mechanically. They are the most complex creation evolution has come up with and part of the reason humans are able to do what they do.
Hands are like the chess game of anatomy, each segment of a hand has so many permutations that an AI simply doesn’t have enough reference info to animate it properly
It's interesting that lots of artists practise sketching using wooden models of a hand that they can pose in different ways.[1]
[1] This type of thing can be found in most art shops https://www.quickdrawsupplies.com/product/8-20cm-artists-pos...
https://www.quora.com/What-is-the-hardest-body-part-for-an-a...
So, a finger should probably be portrayed as between other fingers.
You can see where this is going. It can't.
Facial features and fingers just.... are hanging off the body in extremely non-uniform ways w/ no real set proportions. It isn't totally intuitive to me why it's so bad at it, but faces especially are so unique and the musculature of the face is so fine.... learning a representation must just be really really difficult.
If you look at the example given in the paper, the word "won" is a single token. When it is scrambled as "wno" it is tokenised as "w" and "no" both of which are unrelated to the original token "won". Somehow the LLM is able to relate these two completely different tokens "w" and "no" back to the original token "won". I think the paper is claiming this is surprising because these tokens shouldn't have any correlation with each other in its training data.
I can attempt to produce a Japanese example by going to town on an example from Jreibun, but note that I am far from native:
後食に罠くなるのは生里像現なので壁けることはできないが、午後の事士の校率が干がるので木っている。
As far as I'm concerned, swapping out the radicals doesn't hurt that much (this is usually a negative, since it leads you to confuse character pairs like 候 and 侯, especially if you don't practice writing) and swapping the order of characters is a bit more annoying.
That said, a Mandarin one would be more convincing, since reordering the various markers that serve the roles of Japanese verb conjugations would be less disruptive than turning できなかった into っぎかてなた, which I did not do for that reason.
--
(The original sentence was 食後に眠くなるのは生理現象なので避けることはできないが、午後の仕事の効率が下がるので困っている。)
Presumably ChatGPT has not only lots of typos in its training data, but also even nearly scrambled text with things like Pig Latin.
In a sense being able to process scrambled text is an overpowered version of typo tolerance.
You can see an example of doing that very thing here: https://pbs.twimg.com/media/EG1ZV_tX4AALiIG?format=jpg&name=... . There is a hashtag at the bottom of the image explaining the meaning of the nonexistent character, but if you remove that from the image, people understand it just as quickly.
I think this means that current GPT-3 is more similar to GPT-3.5-turbo than to old (retired) GPT-3, but I'm not entirely sure about the details.
I haven't noticed downvotes, though. I thought I was just ignored. ;)
Then there's the annually changing tax code that has been intentionally made complex, and the training data is surely full of explanations based on outdated details. Maybe with significant fine-tuning on the most up-to-date tax code, and/or by putting it in the preface of the context, that can be somewhat nullified.
Even for a financial strategy, other than the very high level of hedging on different asset classes, some basics of estate management and some strategies like bond ladders and periodic redistribution of stock holdings.... an LLM isn't going to be very useful. And those high level strategies are shared broadly, you can definitely learn them without referring to an LLM.
Also, for some of the same reasons I wouldn't ask my financial advisor which stocks to pick, I wouldn't expect an LLM to give me good answers on a specific active portfolio: the notion "buy the rumor, sell the news" persists because it's not a half bad strategy, and even a current-up-to-the-moment model would be chasing the tail end (although, I suppose, if it were able to take advantage of information shared by many other users' prompts, it could benefit from more than just the news cycle). Predicting the shape of fluent prose does not directly map to predicting the shape of market activity, even if it has internalized some kind of Mean Field Theory to help approximate it functionally. I'll admit I would be curious what it said, though.
Don't get me wrong, I do like LLMs for many tasks, just not for taxes or financial strategy. I wouldn't fault someone for doing it, but I would want to inform anyone considering it of the above, even with a more competent or super-intelligent LLM. Especially if I'd be the one getting audited! btw, I am not a licensed CPA and the above is not financial advice.
Generative models, arguably, have little trouble with complexity given enough training data. Faces are a perfect example. We both agree that image models, at least, lack that data for hands.
But there are many complex things that image models render with sparse training data which don't set off our perception as strongly. Hands fall into the uncanny valley: we are deeply familiar with them.
This is why I mention lighting and focus. They are subtle and complex. Additionally, image models have tons of training examples of each. That's still not enough for generative image models to consistently represent photography in a way that a person who has spent the time to build an accurate model of how camera images look would be fooled. But it fools most people.
The complexity of handling good lighting and focus involves both the generation of the entire scene that the photograph is taking place within and an accurate model of both the design of the camera and how it's been configured for the shot. Both of these are large spaces full of hidden variables that popular image models are not presently trained on.
Many people know you can look at the background of a generated image to identify irregularities. Checking that the lighting has a consistent angle (or multiple angles indicative of a cogent set of scene lights) is another good check. Additionally, if you have an eye for bokeh then when it appears in an image you can often detect whether it's faked. Finally, even smooth blurs often do not reflect either a physically plausible background being blurred or a consistent focal plane cutting through the 3d scene. All additional complexities that image generating models often don't have mastery over (for now). But also many judges of their outputs don't either, so it's easy to miss these "mistakes".
However, I can't determine where this is originally from...
The words you presented as examples are used in different contexts. You will hardly find something like "pooped despair" or "deep abyss of praised." The context will guide the LM down different paths even when the embeddings are the same; neural LMs will learn that for sure.
(In fact, I used a sorted context prefix in one of the LMs I researched (order-4 or longer features, to save memory used by SNMLM) and I saw little to no difference in perplexity.)
Also, the difference between LMs is the training corpus, among other things. We do not know how these things are trained; the corpora are not generally accessible. Oftentimes we do not even know the token vocabulary! How many tokens are there, how long they are, etc.
What you ascribe to powerfulness can be a difference in training and data preprocessing.
is not "lord be praised"
The training process teaches LLMs how to compose these tokens to form replies to our queries. The training data used in the training process does not have obscured words or sentences with strange spacing. The LLM is still able to compose the tokens correctly from varied input that never existed in the training data.
It is intelligence.
And even then ChatGPT fails to segment "policecaughttherapist" (https://chat.openai.com/share/21c7596a-6474-4639-8a92-5cea54...), even though:
1) If I talked about a therapist, sentence would look like "police caught _the_ therapist"
2) How often do the police even catch therapists? Come on, it looks like the training set was just heavily censored. No intelligence, just a broken ngram database (where n = length of articles in training set, see https://news.ycombinator.com/item?id=38458683).
The training data is more important than the dictionary because the dictionary is designed to be able to form every possible combination of words and sentences that can be formed. It is not limited to specific words; it is building words and sentences from building blocks.
1. That parsing is valid. Though unlikely. The choice it made is not incorrect. Thus not a sign of lack of intelligence.
2. Not often. But if you ask chatGPT to reinterpret the word in another way that is grammatically correct it will find the rapist. It shows definitively there is no censorship of the word.
3. I actually didn't see the alternative myself for some reason. Therapist jumped out at me and I didn't see what you were talking about for a good couple of minutes. I mean, unless you want to think of me (a human) as not "intelligent" then clearly it's not a factor here.
I don't know. I am having a hard time overcoming the likelihood that scrambled and cipher-encoded words/solutions are part of the training corpus, thus fully explaining the phenomenon.
If someone can get it to decipher something like the zodiac killer's cipher, then I might be more impressed.
I question whether your bounding the human time to a minute is valuable here though. If the jumbled content were multiple pages long instead of only 23 words, would it be somehow more impressive despite the process being exactly the same?
> I am having a hard time overcoming the likelihood that scrambled and cipher-encoded words/solutions are part of the training corpus, thus fully explaining the phenomenon.
Scrambled words are part of my training corpus too, but it still takes me a lot longer than the machine, and I don't even need to give the machine a hint about what's going on. I just say "tell me what this says" and a moment later it does.
I would usually determine whether to call something "hard" by reference to a measure of difficulty such as rate of success, not by whether doing it with no practice is slower than doing a similar task that I've practiced extensively.
How long it takes you to do something compared to something else is a measure of its difficulty, all else being equal.
> such as rate of success, not by whether doing it with no practice is slower
Then I guess nothing is hard if you can ever eventually succeed, even if you struggle along the way, which sounds to me like not a very useful distinction.
Because that's what you're describing here. You're rapidly failing to interpret each scrambled word as its unscrambled form. You're sampling letters, failing, and trying again, over and over, until you eventually succeed, and then moving on to the next word. Maybe you're even backtracking to previous words that you got wrong (now/won perhaps) based on later unscramblings. And you're ignoring that part and only evaluating the very final outcome in a binary "got to the marathon finish line" fashion while ignoring the shortness of breath and stitch in your side.
The entire reason it takes longer is because you have a low rate of intermediate success, which makes progress slow, even though you got there in the end.
> I measured in units of minutes because the largest unit below that is the second
"Seconds" is an extremely common descriptor for how long something might take. But you didn't say "in seconds". This arbitrary rule about whole units sounds defensive. It's really ok for us to acknowledge the significance of the fact that reading the scrambled version takes significantly more mental effort.
If you try to say more, you'll end up falling into weird contradictions: it would take an LLM a lot longer to output 10 million 'a's than a human, so it must be "harder" for the LLM to do that than a human.
Apparently 'reading' (glancing at) something a few times is difficult work.
I can give you a bottle of water if you're tuckered out, I didn't need mine.