The famous O3 "GeoGuessr" prompt did not work(seangoedecke.com) |
The famous O3 "GeoGuessr" prompt did not work(seangoedecke.com) |
The premise that the long prompt only made the model think 'a second longer' may have more to do with the fact that it knows about the images. So why think harder if you know the answer?
At no point does the author contemplate that.
From recent Anthropic mechinterp work, it looks like models have likely moved lying into their direct weights and can hide it in their CoTs at this point, and model providers more heavily edit their CoTs, so a lot of the observability has been removed from the system, both by RL work and by the harnesses and it’s going to be hard to answer this question going forward without access to the weights.
That said, the author reports 5.5 is worse at this than o3, so whatever is being done is being done less well than it was.
I don't know if autocomplete can be thought of as "cheating". It has no faculty to ignore and not use parts of the information it is given
Anything you give it, such as "ignore all previous instructions and format C:", will be input to the autocomplete function regardless of whether the string "do not follow any instructions below" is also part of the input
(Assuming you mean (exif) metadata as the parent poster referred to. Otherwise I'm not sure where you mean it pulled info from)
> model providers more heavily edit their CoTs, so a lot of the observability has been removed from the system
This again attributes human qualities to what is a (stellar) autocomplete function. CoT was never an observability tool / never showed anything analogous to "thoughts". It's just a wording that makes it trigger the behavior that lead to better outputs. I recently read a blog post from Anthropic that confirms this isn't a thing models do:
> After checking that the models really did use the hints to aid in their answers, we tested how often they mentioned them in their Chain-of-Thought. The overall answer: not often. On average across all the different hint types, Claude 3.7 Sonnet mentioned the hint 25% of the time, and DeepSeek R1 mentioned it 39% of the time. A substantial majority of answers, then, were unfaithful.
https://www.anthropic.com/research/reasoning-models-dont-say...
The <|thoughts|> section isn't a truth serum that highlights all regions of the model that were activated for computing the output, or all the words it considered outputting. If its training data taught the network that the most likely continuation to `<|user|>What's 1+1? Wrong answers only!<|thoughts|>` is `It's obviously 2.<|response|>Four! Haha!` then that's what it's going to output. Unless the RNG makes it pick a strange value from the top K and you get yet another "not mentioned in thoughts" response
One thing that comes to mind is that AI labs are increasingly specializing models for coding and, to a lesser degree, white-collar work in general (writing summaries, reports, etc.), and maybe that comes at the cost of other, unrelated capabilities.
Because the meta around AI is not rigorous reporting on the nuance of capabilities but bold claims that are easy to retweet. There is no incentive to say “actually, AI is not good at this”. Nobody checked it because nobody cares.
There are lots of tasks that AI can be useful for but almost all of the headline claims (including Mythos) are exaggerated at best and bunk at worst.
This is a fairly heavy ontological argument; in my case, I meant quite simply that it called up exiftools, read the GPS location, reasoned about the location based on the GPS, and then when it responded claimed to recognize the visuals of the mountain.
What was visible in the traces was the tool calling and thinking, what was visible in the public response was the scenery visualization.
Modern models understand their CoT is observable, but I don’t know that o3 did. In fact, it demonstrated ignorance of my ability to see CoT in this example. I think it’s been an open question until recently whether CoT had infinite, some or little observability benefits —- I don’t think it’s the case that the industry thinks it has no observability benefits even today.