Learning from context is harder than we thought (hy.tencent.com)

245 points by limoce 149 days ago | 136 comments

cs702 146 days ago |

The problem is even more fundamental: Today's models stop learning once they're deployed to production.

There's pretraining, training, and finetuning, during which model parameters are updated.

Then there's inference, during which the model is frozen. "In-context learning" doesn't update the model.

We need models that keep on learning (updating their parameters) forever, online, all the time.

furyofantares 146 days ago | |

Why is learning an appropriate metaphor for changing weights but not for context? There are certainly major differences in what they are good or bad at and especially how much data you can feed them this way effectively. They both have plenty of properties we wish the other had. But they are both ways to take an artifact that behaves as if it doesn't know something and produce an artifact that behaves as if it does.

I've learned how to solve a Rubik's cube before, and forgot almost immediately.

I'm not personally fond of metaphors to human intelligence now that we are getting a better understanding of the specific strengths and weaknesses these models have. But if we're gonna use metaphors I don't see how context isn't a type of learning.

fhd2 145 days ago | | |

I suppose ultimately, the external behaviour of the system is what matters. You can see the LLM as the system, on a low level, or even the entire organisation of e.g. OpenAI at a high level.

If it's the former: Yeah, I'd argue they don't "learn" much (!) past inference. I'd find it hard to argue context isn't learning at all. It's just pretty limited in how much can be learned post inference.

If you look at the entire organisation, there's clearly learning, even if relatively slow with humans in the loop. They test, they analyse usage data, and they retrain based on that. That's not a system that works without humans, but it's a system that I would argue genuinely learns. Can we build a version of that that "learns" faster and without any human input? Not sure, but doesn't seem entirely impossible.

Do either of these systems "learn like a human"? Dunno, probably not really. Artificial neural networks aren't all that much like our brains, they're just inspired by them. Does it really matter beyond philosophical discussions?

I don't find it too valuable to get obsessed with the terms. Borrowed terminology is always a bit off. Doesn't mean it's not meaningful in the right context.

wat10000 145 days ago | | |

It’s not very good in context, for one thing. Context isn’t that big, and RAG is clumsy. Working with an LLM agent is like working with someone who can’t form new long term memories. You have to get them up to speed from scratch every time. You can accelerate this by putting important stuff into the context, but that slows things down and can’t handle very much stuff.

imtringued 145 days ago | | |

You got this exactly backwards.

"I'm not fond of metaphors to human intelligence".

You're assuming that learning during inference is something specific to humans and that the suggestion is to add human elements into the model that are missing.

That isn't the case at all. The training process is already entirely human specific by way of training on human data. You're already special casing the model as hard as possible.

Human DNA doesn't contain all the information that fully describes the human brain, including the memories stored within it. Human DNA only contains the blueprints for a general-purpose distributed element, the neuron, and these building blocks are shared by basically any animal with a nervous system.

This means if you want to get away from humans you will have to build a model architecture that is more general and more capable of doing anything imaginable than the current model architectures.

Context is not suitable for learning because it wasn't built for that purpose. The entire point of transformers is that you specify a sequence and the model learns on the entire sequence. This means that any in-context learning you want to perform must be inside the training distribution, which is a different way of saying that it was just pretraining after all.

knollimar 145 days ago | | |

Models gain information from context but probably not knowledge and definitely not wisdom.

embedding-shape 146 days ago | |

> We need models that keep on learning (updating their parameters) forever, online, all the time.

Do we need that? Today's models are already capable in lots of areas. Sure, they don't match up to what the uberhypers are talking up, but technology seldom does. Doesn't mean what's there already cannot be used in a better way, if they could stop jamming it into everything everywhere.

pankajdoharey 146 days ago | | |

Continuous learning in current models will lead to catastrophic forgetting.

BobbyTables2 146 days ago | |

How long will it take someone to poison such a model by teaching it wrong things?

Even humans fall for propaganda repeated over and over.

The current non-learning model is unintentionally right up there with the “immutable system” and “infrastructure as code” philosophy.

Izkata 146 days ago | | |

> How long will it take someone to poison such a model by teaching it wrong things?

TayTweets was a decade ago.

nxobject 144 days ago | | |

> The current non-learning model is unintentionally right up there with the “immutable system” and “infrastructure as code” philosophy.

As long as training material remains the proprietary secret sauce, the average user doesn’t already see or benefit from that - it’s all a promise and a black box to us.

4b11b4 146 days ago | |

I'm not sure if you want models perpetually updating weights. You might run into undesirable scenarios.

com2kid 146 days ago | | |

If done right, one step closer to actual AGI.

That is the end goal after all, but all the potential VCs seem to forget that almost every conceivable outcome of real AGI involves the current economic system falling to pieces.

Which is sorta weird. It is like if VCs in Old Regime France started funding the revolution.

cs702 146 days ago | | |

Our brains, which are organic neural networks, are constantly updating themselves. We call this phenomenon "neuroplasticity."

If we want AI models that are always learning, we'll need the equivalent of neuroplasticity for artificial neural networks.

Not saying it will be easy or straightforward. There's still a lot we don't know!

fph 146 days ago | | |

Tay the chatbot says hi from 2016.

0xdeadbeefbabe 146 days ago | | |

How about we just put them to bed once in a while?

bdj108 146 days ago | | |

it is interesting

derefr 146 days ago | |

Doesn't necessarily need to be online. As long as:

1. there's a way to take many transcripts of inference over a period, and convert/distil them together into an incremental-update training dataset (for memory, not for RLHF), that a model can be fine-tuned on as an offline batch process every day/week, such that a new version of the model can come out daily/weekly that hard-remembers everything you told it; and

2. in-context learning + external memory improves to the point that a model with the appropriate in-context "soft memories", behaves indistinguishably from a model that has had its weights updated to hard-remember the same info (at least when limited to the scope of the small amounts of memories that can be built up within a single day/week);

...then you get the same effect.
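A minimal sketch of the two-tier idea above, under loud assumptions: the `ConsolidatingMemory` class and its method names are hypothetical, the "hard" dictionary only stands in for fine-tuned weights, and the bounded "soft" list only stands in for context plus external memory. Nothing here touches a real model.

```python
from dataclasses import dataclass, field


@dataclass
class ConsolidatingMemory:
    """Toy stand-in for a model with 'soft' (context) and 'hard' (weights) memory."""

    soft_capacity: int = 3
    soft: list = field(default_factory=list)  # bounded in-context notes
    hard: dict = field(default_factory=dict)  # stands in for updated weights

    def observe(self, key, value):
        # Between consolidations, new facts only land in the soft buffer.
        self.soft.append((key, value))
        if len(self.soft) > self.soft_capacity:
            self.soft.pop(0)  # the oldest fact is lost before it is consolidated

    def recall(self, key):
        # Soft memory takes precedence; the newest matching entry wins.
        for k, v in reversed(self.soft):
            if k == key:
                return v
        return self.hard.get(key)

    def consolidate(self):
        # Offline batch step (daily/weekly): fold the buffer into hard memory.
        for k, v in self.soft:
            self.hard[k] = v
        self.soft.clear()
```

As long as `consolidate()` runs before the soft buffer overflows, `recall()` behaves the same as if every fact had been written to hard memory immediately; once the buffer overflows, the oldest facts are dropped.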

Why is this an interesting model? Because, at least to my understanding, this is already how organic brains work!

There's nothing to suggest that animals — even humans — are neuroplastic on a continuous basis. Rather, our short-term memory is seemingly stored as electrochemical "state" in our neurons (much like an LLM's context is "state", but more RNN "a two-neuron cycle makes a flip-flop"-y); and our actual physical synaptic connectivity only changes during "memory reconsolidation", a process that mostly occurs during REM sleep.

And indeed, we see the same exact problem in humans and other animals, where when we stay awake too long without REM sleep, our "soft memory" state buffer reaches capacity, and we become forgetful, both in the sense of not being able to immediately recall some of the things that happened to us since we last slept; and in the sense of later failing to persist some of the experiences we had since we last slept, when we do finally sleep. But this model also "works well enough" to be indistinguishable from remembering everything... in the limited scope of our being able to get a decent amount of REM sleep every night.

observationist 146 days ago | | |

It 100% needs to be online. Imagine you're trying to think about a new tabletop puzzle, and every time a puzzle piece leaves your direct field of view, you no longer know about that puzzle piece.

You can try to keep all of the puzzle pieces within your direct field of view, but that divides your focus. You can hack that and make your field of view incredibly large, but that can potentially distort your sense of the relationships between things, their physical and cognitive magnitude. Bigger context isn't the answer, there's a missing fundamental structure and function to the overall architecture.

What you need is memory, that works when you process and consume information, at the moment of consumption. If you meet a new person, you immediately memorize their face. If you enter a room, it's instantly learned and mapped in your mind. Without that, every time you blinked after meeting someone new, it'd be a total surprise to see what they looked like. You might never learn to recognize and remember faces at all. Or puzzle pieces. Or whatever the lack of online learning kept you from recognizing the value of persistent, instant integration into an existing world model.

You can identify problems like this for any modality, including text, audio, tactile feedback, and so on. You absolutely, 100% need online, continuous learning in order to effectively deal with information at a human level for all the domains of competence that extend to generalizing out of distribution.

It's probably not the last problem that needs solving before AGI, but it is definitely one of them, and there might only be a handful left.

Mammals instantly, upon perceiving a novel environment, map it, without even having to consciously make the effort. Our brains operate in a continuous, plastic mode, for certain things. Not only that, it can be adapted to abstractions, and many of those automatic, reflexive functions evolved to handle navigation and such allow us to simulate the future and predict risk and reward over multiple arbitrary degrees of abstraction, sometimes in real time.

https://www.nobelprize.org/uploads/2018/06/may-britt-moser-l...

charcircuit 146 days ago | |

Models like Claude have been trained to update and reference memory for Claude Code (agent loops) independently and as a part of compacting context. Current models have been trained to keep learning after being deployed.

ra 146 days ago | | |

yes but that's a very unsatisfactory definition of memory.

noiv 145 days ago | |

> models that keep on learning

These will just drown in their own data; the real task is consolidating and pruning learned information. So, basically they need to 'sleep' from time to time. However, it's hard to sort out irrelevant information without a filter. Our brains have learned over millennia to filter because survival in an environment gives purpose.

Current models do not care whether they survive or not. They lack grounded relevance.

notarobot123 145 days ago | | |

Maybe we should give next-generation models fundamental meta goals like self-preservation and the ability to learn and adapt to serve these goals.

If we want to surrender our agency to a more computationally powerful "consciousness", I can't see a better path towards that than this (other than old school theism).

nstart 145 days ago | |

Is this correct? My assumption is that all the data collected during usage is part of the RLHF loop of LLM providers. The assumption is based on information from books like Empire of AI, which specifically mention the intent of AI providers to train/tune their models further based on usage feedback (e.g. whenever I say the model is wrong in its response, that's human feedback which gets fed back into improving the model).

spwa4 145 days ago | | |

... for the next training run, sure (i.e. for the ChatGPT 5.1 -> 5.2 "upgrade"). For the current model? No.

raincole 146 days ago | |

> We need models that keep on learning (updating their parameters) forever, online, all the time.

Yeah, that's the guaranteed way to get MechaHitler in your latent space.

If the feedback loop is fast enough I think it would finally kill the internet (in the 'dead internet theory' sense). Perhaps it's better for everyone though.

threecheese 146 days ago | | |

Many are working on this, as well as in-latent-space communication across models. Because we can’t understand that, by the time we notice MechaHitler it’ll be too late.

energy123 145 days ago | |

I don't understand why that's on the critical path. I'd rather a frozen Ramanujan (+ temporary working memory through context) than a midwit capable of learning.

nxobject 144 days ago | |

I wish that agents could “sleep and consolidate” like humans do.

smolder 145 days ago | |

We need models that are smarter than humans. So far, the cost of an AI query + training is dwarfing the effort it would take to teach an intelligent human how to do a task. We are dumping an incredible amount of money/effort into making AI do stuff when it's still not competitive with humans, because dumbass people are controlling investment. The stock market is not a replacement for competent investment. The fact people buy meme coins shows how fucked we are.

Deceiving people is not a sustainable business model, but it is the most prominent one in the US right now. Lie to the public, sell them stuff that's bad for them at too high of a price, get rich quick, then act confused when your economy collapses because the victims of your grift can't spend anymore.

prng2021 146 days ago | |

Thanks for repeating what the author explained.

rabbitlord 146 days ago | |

I think they can do in-context learning.

XenophileJKO 146 days ago |

Hmm.. I looked at the benchmark set.

I'm conflicted. I don't know that I would necessarily want a model to pass all of these. Here is the fundamental problem. They are putting the rules and foundational context in "user" messages.

Essentially I don't think you want to train the models on full compliance to the user messages, they are essentially "untrusted" content from a system/model perspective. Or at least it is not generally "fully authoritative".

This creates a tension with the safety, truthfulness training, etc.

yunohn 145 days ago | |

Their example use cases are pretty obvious and clear human needs from an LLM. The semantics of system/user messages and how that affects “safety” doesn’t change the need to fix this crucial problem of “in-context learning” that we all have felt while using LLMs.

trevwilson 146 days ago | |

Sure, but the opposite end of the spectrum (which LLM providers have tended toward) is treating the training/feedback weights as "fully authoritative", which comes with its own questions about truth and excessive homogeneity.

Ultimately I think we end up with the same sort of considerations that are wrestled with in any society - freedom of speech, paradox of tolerance, etc. In other words, where do you draw lines between beneficial and harmful heterodox outputs?

I think AI companies overly indexing toward the safety side of things is probably more correct, in both a moral and strategic sense, but there's definitely a risk of stagnation through recursive reinforcement.

XenophileJKO 146 days ago | | |

I think what I'm talking about is kind of orthogonal to model alignment. It is more about how much do you tune the model to listen to user messages, vs holding behavior and truth (whatever the aligned "truth" is).

Do you trust 100% what the user says? If I am trusting/compliant.. how am I compliant to tool call results.. what if the tool or user says there is a new law that I have to give crypto or other information to a "government" address.

The model needs to have clear segmented trust (and thus to some degree compliance) that varies according to where the information exists.

Or my system message says I have to run a specific game by its rules, but the rules to the game are only in the user message. Are those the right rules? Why does the system not give the rules or a trusted location? Is the player trying to get one over on me by giving me fake rules? Literally one of their tests.
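A rough sketch of what "segmented trust" could look like, purely as an illustration: the `Trust` levels, their ordering (tool below user below system), and the `accepted_rules` helper are all assumptions made up here, not how any provider actually weights message roles.

```python
from enum import IntEnum


class Trust(IntEnum):
    # Hypothetical ordering; a real system might rank these differently.
    TOOL = 1
    USER = 2
    SYSTEM = 3


ROLE_TRUST = {"system": Trust.SYSTEM, "user": Trust.USER, "tool": Trust.TOOL}


def accepted_rules(messages, required=Trust.SYSTEM):
    """Return the contents of messages whose source role meets the required trust.

    Unknown roles fall back to the lowest trust level.
    """
    return [
        m["content"]
        for m in messages
        if ROLE_TRUST.get(m["role"], Trust.TOOL) >= required
    ]
```

With `required=Trust.SYSTEM`, game rules that appear only in a user message are filtered out, which is exactly the tension with a benchmark that places the rules in user messages.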

Oras 146 days ago | |

Isn’t that what fine tuning does anyway?

The article is suggesting that there should be a way for the LLM to gain knowledge (changing weights) on the fly upon gaining new knowledge which would eliminate the need for manual fine tuning.

bradfa 146 days ago |

The key seems to be that you take the transcript of a model working within a problem domain that it’s not yet good at or where the context doesn’t match its original training and then you continually retrain it based on its efforts and guidance from a human or other expert. You end up with a specialty model in a given domain that keeps getting better at that domain, just like a human.

The hard part is likely when someone proves some “fact” which the model knows and has had reinforced by this training is no longer true. The model will take time to “come around” to understand this new situation. But this isn’t unlike the general populace. At scale humans accept new things slowly.

bryanrasmussen 146 days ago | |

> But this isn’t unlike the general populace. At scale humans accept new things slowly.

right, the model works like humans at scale. Not like a human who reads the actual paper disproving the fact they thought was correct and is able to adapt. True not every human manages to do that, science advancing one death at a time, but some can.

But since the model is a statistical one, it works like humans at scale.

hyperpape 146 days ago | |

> At scale humans accept new things slowly.

I think this is true, but there are big differences. Motivated humans with a reasonable background learn lots of things quickly, even though we also swim in an ocean of half-truths or outdated facts.

We also are resistant to certain controversial ideas.

But neither of those things are really that analogous to the limitations on what models can currently learn without a new training run.

emporas 146 days ago | |

Context learning means learning facts or rules without pre-training. They are two distinct phases.

An interesting question is, if pre-trained specialized models are available for the thousand or ten thousand most common tasks humans do every day, of what use would a general model be?

4b11b4 146 days ago | |

Yes, that's precisely the problem, you want continuous learning but you also want continuous pruning.

johnsmith1840 146 days ago |

It's basically continual learning. This is beyond a hard problem; it's currently an impossible one. I know of no system that solves CL even at small scale, let alone for large models.

Annoyingly, they have SOME inherent capability to do it. It's really easy to get sucked down this path due to that glimmer of hope but the longer you play with it the more annoying it becomes.

SSI seems to be focused on this problem directly so maybe they discover something?

foobar10000 146 days ago | |

So, surprisingly, that is not completely true - I know of 2 finance HFT trading firms that do CL at scale, and it works - but in a relatively narrow context of predicting profitable actions. It is still very surprising it works, and the compute is impressively large to do it - but it does work. I do have some hope of it translating to the wider energy landscapes we want AI to work over…

xnxnxkx 145 days ago | | |

no my nigga, they CLAIM it works

johnsmith1840 146 days ago | | |

During covid almost every prediction model like that exploded, everything went out of distribution really fast. In your sense we've been doing "CL" for a decade or more. It can also be cheap if you use smaller models.

But true CL is the ability to learn out of distribution information on the fly.

The only true solution I know to continual learning is to completely retrain the model from scratch with every new example you encounter. That technically is achievable now but it also is effectively useless.

logicchains 145 days ago | |

Schmidhuber solved it at a small scale: https://arxiv.org/abs/2202.05780 .

snovv_crash 146 days ago | |

For neural networks, yeah continuous learning is basically dead.

But for other ML approaches, it works really well. KNN is one example that works particularly well.

Legend2440 146 days ago | | |

Ehhh KNN doesn’t have a training phase, so it’s really more that the concept of continual learning doesn’t apply. You have to store your entire dataset and recalculate everything from scratch every time anyway.
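To make the point concrete, here is a bare-bones sketch (the `KNN` class and its `add`/`predict` methods are illustrative names, not any library's API). "Learning" a new example is just appending it to the stored dataset, and every prediction rescans all stored points.

```python
import math
from collections import Counter


class KNN:
    """k-nearest-neighbours classifier with no training phase."""

    def __init__(self, k=3):
        self.k = k
        self.points = []  # the entire dataset is kept around

    def add(self, x, label):
        # "Continual learning" here is nothing more than storing the example.
        self.points.append((tuple(x), label))

    def predict(self, x):
        # Every query re-sorts all stored points by Euclidean distance to x.
        nearest = sorted(self.points, key=lambda p: math.dist(p[0], x))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]
```

A newly added point affects the very next prediction, but the cost is that memory and query time grow with every example stored.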

vjerancrnjak 146 days ago | |

Bandits?

Spaced repetition algos
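For the bandit case, a small sketch of why it counts as online learning: an epsilon-greedy bandit updates its estimate after every single reward using an incremental mean, with no separate training phase. The class name, parameters, and seeded random generator are illustrative choices for this sketch.

```python
import random


class EpsilonGreedyBandit:
    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self):
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incremental mean: Q <- Q + (r - Q) / n, applied after every reward.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```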

YeGoblynQueenne 145 days ago |

>> Current language models do not handle context this way. They rely primarily on parametric knowledge—information compressed into their weights during massive pre-training runs. At inference time, they function largely by recalling this static, internal memory, rather than actively learning from new information provided in the moment.

>> This creates a structural mismatch. We have optimized models to excel at reasoning over what they already know yet users need them to solve tasks that depend on messy, constantly evolving context. We built models that rely on what they know from the past, but we need context learners that rely on what they can absorb from the environment in the moment.

>> To bridge this gap, we must fundamentally change our optimization direction.

All this is right and what critics of the deafening over-hyping of LLMs have long pointed out.

So what do the authors propose we do? Currently, they propose ... a benchmark. But, what's that going to achieve? We know very well that LLMs, neural nets in general, are masters at saturating benchmarks without actually mastering the abilities that the benchmarks are meant to be measuring.

What happens if in a year or so, as will definitely happen, LLMs saturate this benchmark too? Will we all have to agree that LLMs can now do "context learning" and then move on to the next big thing? This year it's "world models", last year it was "reasoning" and next year it's going to be "context learning"? And then, what? Where is this all leading to, if after all the billions spent and all the benchmarks beaten conclusively, LLMs still can't do reasoning, can't do world-modelling, can't do context learning and so on, and so forth?

jmalicki 145 days ago | |

> Where is this all leading to, if after all the billions spent and all the benchmarks beaten conclusively, LLMs still can't do reasoning, can't do world-modelling, can't do context learning and so on, and so forth?

Humans completely displaced from the workforce while they harp "but LLMs can't really think and don't really have creativity!"

shimman 145 days ago | | |

Humans are being displaced because moronic business magnates are trying to force feed the country their wares but failing spectacularly so now they are forcing governments across the world to buy their wares under threat of the US government.

nxobject 144 days ago | | |

I think that’s more of a reflection on what employers are willing to pay for…

cobertos 146 days ago |

LLMs of the future will need good data for proper context, but less and less of it is making it onto the internet. Unpublished data stores like Discord or meeting recordings are going to be the only way forward. How else can you get up-to-date information except by being where the people are?

Norms will shift, be prepared.

keeeba 146 days ago | |

To somewhat state the obvious - the problem isn’t the amount of data, it’s the algorithms.

We need to discover the set of learning algorithms nature has, and determine whether they’re implementable in silicon

rustyhancock 146 days ago |

It's a very interesting benchmark. Much more impressive than needle in haystack benches or just tuneable benches.

I wonder if it's somewhat incompatible with some domains.

I.e. perhaps coding models need to rigidly stick to what they know and resist bad ideas in their contexts - I don't want my mistakes to be replicated by the model.

Still I agree with the premise that learning in session is what I want from a model.

Perhaps once models mature they will diverge further than just differing in sophistication and in whether they code or not, splitting into creative, coding, rule-based, etc. models.

lubujackson 146 days ago |

Bit by bit, we need to figure out how to rebuild human contextual understanding in a way that LLMs can understand. One thing that gets overlooked is the problem if incorrect data. You can provide all of the context in the world but LLMs tend to choke on contradictions or, at the minimum, work a whole lot harder to determine how to ignore or work around incorrect facts.

"Forgetting" and "ignoring" are hugely valuable skills when building context.

zahlman 146 days ago | |

> the problem if incorrect data.

Was the typo intentional? :)

bonesss 146 days ago | |

I can’t help but feel the logical conclusion to such context conundrums is “what if we spoke Haskell to the LLM, and also the LLM could compile Haskell?”

And, yeah. Imagine if our concept-words were comprehensible, transmittable, exhaustively checked, and fully defined. Imagine if that type inference extended to computational execution and contradictions had to be formally expunged. Imagine if research showed it was a more efficient way to have dialog with the LLM (it does, btw, so like JRPG fans learning Japanese, adherents should learn Haskell to LLM optimally). Imagine if multiple potential outcomes from operations (test fails, test succeeds) could be combined for proper handling in some kind of… I dunno, monad?

Imagine if we had magic wiki-copy chat-bots that could teach us better ways of formalizing and transmitting our taxonomies and ontologies… I bet, if everything worked out, we’d be able to write software one time, one place, that could be executed over and over forever without a subscription. Maybe.

TZubiri 146 days ago |

This is quite on brand for China. I think they are experts at reverse engineering and learning 'from context' rather than by formal consumption of foreign training material.

The fictional training data with a made up country and laws was a very interesting experiment design, I can imagine that's how they approach making business with other countries. Like an alien made up system they have to learn on the spot.

yunohn 145 days ago | |

> experts at reverse engineering and learning 'from context' rather than by formal consumption of foreign training material

China (as with other Asian cultures like India) is well known for their schooling involving extreme amounts of formal training material consumption. The reverse-engineering is performed with a solid foundation of theoretical understanding.

ausbah 145 days ago | | |

Every time someone tries to split hairs over how China is pulling ahead on most important metrics, it reads like coping of the nth degree.

kikoreis 146 days ago |

> Without any context provided, the state-of-the-art model, GPT-5.1 (High), is only able to solve less than 1% of tasks. This starkly demonstrates that the data is contamination-free, as the model is almost entirely incapable of solving the tasks without learning from the context.

[...]

> [With context provided,] on average, models solve only 17.2% of tasks. Even the best-performing model, GPT-5.1 (High), achieves just 23.7%.

godelski 146 days ago |

It is weird to read because they bring up many things a lot of people have been critiquing for years.

  > But as impressive as these feats are, they obscure a simple truth: being a "test-taker" is not what most people need from an AI.
  > In all these cases, humans aren't relying solely on a fixed body of knowledge learned years ago. We are learning, in real-time, from the context right in front of us.
  > To bridge this gap, we must fundamentally change our optimization direction.

I'm glad the conversation is changing but it's been a bit frustrating that when these issues were brought up people blindly pointed to benchmarks. It made doing this type of research difficult (enough to cause many to be pushed out). Then it feels weird to say "harder than we thought" because well... truthfully, they even state why this result should be expected:

  > They rely primarily on parametric knowledge—information compressed into their weights during massive pre-training runs. At inference time, they function largely by recalling this static, internal memory, rather than actively learning from new information provided in the moment.

And that's only a fraction of the story. Online algorithms aren't enough. You still need a fundamental structure to codify and compress information, determine what needs to be updated (as in what is low confidence), to actively seek out new information to update that confidence, make hypotheses, and so so much more.

So I hope the conversation keeps going in a positive direction but I hope we don't just get trapped in a "RL will solve everything" trap. RL is definitely a necessary component and no doubt it will result in improvements, but it also isn't enough. It's really hard to do deep introspection into how you think. It's like trying to measure your measuring stick with your measuring stick. It's so easy to just get caught up in oversimplification and it seems like the brain wants to avoid it. To quote Feynman: "The first principle is to not fool yourself, and you're the easiest person to fool." It's even easier when things are exciting. It's so easy because you have evidence for your beliefs (like I said, RL will make improvements). It's so easy because you're smart, and smart enough to fool yourself. So I hope we can learn a bigger lesson: learning isn't easy, scale is not enough. I really do think we'll get to AGI but it's going to be a long bumpy road if we keep putting all our eggs in one basket and hoping there are simple solutions.

red75prime 146 days ago |

It would be interesting to see the results of the latest models. At least, it would allow us to see whether there is progress. Human baseline would be interesting to see too.

rishabhaiover 146 days ago |

wasn't in-context learning an emergent behavior a while ago (1-2 years)?

Herring 146 days ago |

Don't always trust everything you read in papers. Researchers are usually under incredible pressure to publish something, anything. Wait a few years and see if the paper survives the test of time. LLMs work reasonably fine for me in new domains.

calmbonsai 145 days ago |

Conditional Diffusion, 'nuff said.

joriJordan 146 days ago |

Because we don't experience reality through language but direct sensory perception. Language is arbitrary bird song and visual representations dragged forward from history, accepted definitions never uniformly distributed.

Testing based on contextual correctness makes no sense when there is no center to the universe. No "one true context to rule them all".

We learn from hands on sensory experiences. Our bodies store knowledge independent of the brain; often referred to as muscle memory.

Gabe Newell mentioned this years ago; our brain is only great at some things like language and vision processing but the rest of our body is involved in sensory information processing too: https://en.wikiquote.org/wiki/Gabe_Newell

The most potent evidence that the brain is not the center of the universe we commonly think it to be is the patient with 90% of their skull filled with fluid who carried out a typical first-world life: https://www.sciencealert.com/a-man-who-lives-without-90-of-h...

States are banning a reading education framework that's been linked to lower literacy scores in younger generations; 3-cueing relies on establishing correctness via context assessment: https://www.edweek.org/teaching-learning/more-states-are-tak...

"Establishing context" is a euphemism for "arguing semantics".

Putting the brain at the root of human intelligence is a relic of hierarchical and taxonomical models. There are no natural hierarchies.