Robust autonomy emerges from self-play (arxiv.org)
- All simulated agents use the same neural net with the same weights, albeit with randomized rewards and conditioning vectors that let them behave as different types of vehicles with different levels of aggressiveness. This is like driving in a world where everyone is a copy of you, but some of your copies are in a rush while others are patient. This allows backprop to optimize for a sort of global utility across the entire population.
- There is no modeling of occlusion effects. Instead, agents are given the state of nearby agents, but corrupted by random noise. In the real world, occluded nearby agents can be extremely close (think about a child running out from behind a parked car). The paper comments on this.
> Both Waymax and nuPlan construct observations, maps, and other actors with auto-labeling tools from realworld perception data. This brings occlusion, incorrect or missing traffic-light states, and obstacles revealed at the last moment. Despite the minimalistic noise modeling in GIGAFLOW, the GIGAFLOW policy generalizes zero-shot to these conditions.
- The resulting policy simulates agents that are human-like, even though the system has never seen humans drive. This is a great result when one considers that other reinforcement learning projects produce extremely high-performing agents that humans would consider abusive or pathological.
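To make the first two points concrete, here is a rough sketch of what "one set of weights, many conditioned agents, noisy views of neighbors" could look like. This is not the paper's code: the linear policy, the two-dimensional conditioning vector, and every function name below are assumptions made up for illustration.

```python
import random

def make_policy(obs_dim, cond_dim, seed=0):
    """One shared weight vector used by every agent (hypothetical linear policy)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(obs_dim + cond_dim)]

def act(weights, obs, cond):
    """Same weights for every agent; only the inputs (observation + conditioning) differ."""
    features = list(obs) + list(cond)
    return sum(w * x for w, x in zip(weights, features))

def noisy_view(true_state, rng, sigma=0.1):
    """Stand-in for occlusion modeling: the neighbor's state plus Gaussian noise."""
    return [x + rng.gauss(0.0, sigma) for x in true_state]

def sample_conditioning(rng):
    """Hypothetical conditioning: [aggressiveness, vehicle size], each in [0, 1]."""
    return [rng.random(), rng.random()]

def step_population(weights, states, conds, rng, sigma=0.1):
    """Each agent observes a noisy copy of the next agent's state, then acts."""
    actions = []
    n = len(states)
    for i in range(n):
        neighbor = states[(i + 1) % n]
        obs = noisy_view(neighbor, rng, sigma)
        actions.append(act(weights, obs, conds[i]))
    return actions
```

Two agents with the same conditioning and the same observation take the same action, while changing only the conditioning vector changes the behavior. That is the "copies of you, some in a rush" idea in miniature.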
The curriculum is usually modeled as some form of reward function to steer learning, or sometimes by environment configuration (e.g. learn to walk on a normal surface before a slippery surface).
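As a toy illustration of the environment-configuration flavor of curriculum, one could anneal a surface-friction setting over training. The function name, the linear schedule, and the friction values are all assumptions for the sake of the example, not something taken from the paper.

```python
def curriculum_friction(step, total_steps, start=1.0, end=0.2):
    """Linearly anneal friction from a normal surface (start) to a slippery one (end)."""
    # Clamp progress to [0, 1] so steps past the end stay at the hardest setting.
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)
```

Early in training the agent sees the easy surface, and the environment gets steadily more slippery as `step` approaches `total_steps`.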
With DeepSeek R1 and these autonomous driving research results, it feels like we've entered an era where human data is no longer necessary. The ability to infinitely expand learning through simulation while maintaining safety in the real world feels like science fiction coming to life—it's truly exciting.
This isn't fake surprise. Sometimes I'll wake up and think, "who on earth were those guys and what were they trying to do? And yet their actions make sense..." or, "who came up with that punchline? It's legitimately funny and I never saw it coming, so it can't have been me..."
And yet I know it's all being generated by my own brain somehow. Through some kind of privileged access level.
And then I think about the bicameral brain structure. Does our brain have two halves so that it can function in a self-play training mode during sleep? Is each half of my brain experiencing the same dream from an opposite point of view?
Apologies for the tangent; this is almost totally unrelated to the article and probably something well known to neuroscience for decades. But still, it fascinates me, and the more we learn about the effectiveness of self-play in AI, the more I wonder.
I don't have any definite knowledge of what's going on with that, but I suspect some part of it is my brain retroactively manufacturing the memory of lots of time passing, and some part of it is my brain confabulating episodic memory about the dream or trance as I wake up and write it down.
Human memory is well known to be generally unreliable and full of confabulated details, so I think the most parsimonious explanation for differences between the time experienced in dreams and the objectively-measurable time that passes is that our brains are just making shit up.
Of course, the idea that your brain just lies to you about the past might be just as creepy as any other explanation.
If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.
Sometimes, when I dream, I am envious of people being witty in ways I think I can't be, and when I wake up I'm like "..."
Your brain didn’t really dream the joke. It dreamt the emotional response to it. The joke itself was just a prop.
Not sure where I’m going with this.
We get used to our usual normal-functioning consciousness, but there’s a whole universe of its potential modes and what-causes-what and in which order. For example, dream scenarios may easily happen in reverse, from a random emotion to drawing the scene around that, to the sense of reality, when you just remember a concept of it. Everything can be backwards, orthogonal, not in order. All “it” has to do is to make it feel normal. All these “backwards”, “normal”, “time”, “because” are not something just granted, you have to actively experience these all the time.
Also I think split brain experiments basically support that you can split a consciousness into two tbh...
Anyway, the above is obviously speculation.
My theory is that we are always in a "dream" state. Stimuli that manage to reach our conscious attention will alter this dream.
When asleep, only very strong stimuli reach us, so for the most part our "dream" is in a feedback loop, mostly doing its own thing. When awake, though, we have a much weaker filter for stimuli, and the direction our "dream" takes is fully controlled by them.
Genuinely curious, but then why bother?
It is similar to solving problems. You want most of it to happen in unconsciousness, otherwise it’s too slow.
Things are learned when they are natural, without thought.
I believe it only requires that your sensory and post-sensory systems be unpredictably generative when feeding to your subjective sense-making/observer. This could be provided for within a coherent whole brain.
What you describe would be like learning chess by exploring random boards, but I'm talking about dreams as self play: learning chess by playing as white against black, without any window into black's strategy. To do that well seems to require running two brain instances in relative isolation. Dreams would be the only safe time to do that, and a bicameral brain hardware would be the most straightforward implementation. I doubt my optic nerve can play chess against my cerebellum.
My head canon on this phenomenon is that we are not quite as integrated (or isolated) as our conscious 'self' would have us believe.
Neuroscience and Buddhism alike seem to back this up... The concept of 'anatta' [0] has held up strongly for thousands of years (how could it not?). As far as I can tell, neuroscience is also increasingly clear that consciousness is an emergent phenomenon - and strange things happen when this is disrupted.
Most who have experienced psychedelics will agree that our brains are capable of providing utterly novel, previously unimaginable experiences; stranger even than most dreams.
For example, DMT users across a wide range of cultural backgrounds describe seeing 'machine elves' [1]; "People describe these entities as distinct, autonomous beings that typically present with some kind of message."
Then, look at people who 'hear voices'; a phenomenon which is quite scary in our culture, and quite normal or respected in other, older cultures. The voices can say things which surprise us even in our waking lives, sometimes even offering powerful insight, without any ingestion of substances.
So... There are these ways to experience surprising characters made by our own brains: dreams, drugs, and other oddities. My personal view is that these are always active, mostly doing their thing in the background; something like flora and fauna in the sea of our sub/consciousness. Sleep can reveal them, as can 'altered' states of mind. It's as if the light of consciousness dazzles us, and we can't see the gloriously intricate machinery in the dark.
0 - https://buddhism.stackexchange.com/a/25302
1 - https://health.howstuffworks.com/wellness/drugs-alcohol/dmt-...
I had a dream once where I had 3 really good ideas for a particular product but when I woke up I couldn't remember the 3rd and best idea. Then I started wondering if there actually was a 3rd idea or did my dream just skip to the "That's a great idea" stage with no actual idea, leaving me with just the feeling.
Brains are weird, dreams doubly so.
From there, difficulty should scale up so you don’t always win, giving you that “intermittent reinforcement” that makes games addictive.
The other reasons would be overcoming other humans (esports/pvp multiplayer), discovery (story driven and exploratory games), and just passing the time (casual games).
I've occasionally been able to do the same with architecture: design a massive sprawling palace with ease as I fly through it. And much like the music, on the one and only (I think?) time I woke up and sketched as much as I could remember, it still all made sense.
But in my normal waking life I am creatively constipated. My mind aggressively criticises and crushes ideas before they get a chance to grow organically. I have one side of the creative process in my waking life (filtering) but very little of the other (synthesis).
This makes me think a couple of things:
1. I totally get why artists use drugs so much. Any way to tap into that other state must be incredibly tempting.
2. It would be so amazing if we could figure out how to record in high fidelity and interpret what's going on in these altered states of consciousness. Maybe you've composed a whole symphony while you slept once, and you just don't know it!
There's an exercise I've had a bit of success with over the years: Identify a problem, then write down the first ten solutions to it that come to mind, no matter what.
It's important to write down each response you think of, no matter how impractical, ineffective, implausible, or ridiculous. You'll start to notice when you hold back from writing something down, no matter if it's because you think it's unoriginal, it seems like it doesn't fit the problem, or if you think it's too taboo. It's that _noticing_ that you're training, that's the goal. All forms of preemptive filtering keep us from letting the beginning of an idea take root, so the other ideas that might have followed from it never even occur to us.
Children are much better at this game than adults. When asked to invent a better paper clip, adults tend to focus on how to improve a standard paper clip's paper-clipping, on how economically it could be made, or on changing one feature at a time. Children can come up with ideas like building-sized paper clips made of cotton candy, suggest that a pony could be a good paper clip because you could tell it when to open or close its mouth and because it makes them happy, or tell you how faeries would make paper clips so they could fly around on them.
It can help to have a good way to avoid feeling too embarrassed to write down every idea. Maybe you feel less inhibited when you pre-commit to deleting everything you typed, or to tearing up and burning the paper you wrote on after you're done. I know that saving the ideas makes me more likely to judge them by how other people might react, but saving them might help others.
The dream may give you the feeling that someone is telling you a joke, and the feeling that makes you laugh, without the actual joke existing as a real structured text joke.
I'm aware of the possibility you're hinting at; it's my default expectation for a dream to feel sensible but not be sensible. Like dreaming that I can speak Spanish fluently, but when I wake up I realize that none of those were real words. It's when this doesn't happen that I feel so amazed: when I wake up and reflect and it still feels like there must have been someone else in there with me, whose moves I was unable to introspect, but they were nonetheless coherent.
You have multiple visual cortices that are made of roughly the same stuff as the rest of your neocortex. There is more than enough idle network/processing capacity there, given it is not being fed visual stimulus by the optic nerve, to “play chess.”