Robust autonomy emerges from self-play (arxiv.org)

140 points by reqo 1 year ago | 62 comments

markisus 1 year ago |

Some interesting points from this paper:

- All simulated agents use the same neural net with the same weights, albeit with randomized rewards and conditioning vectors that let them behave as different types of vehicles with different levels of aggressiveness. This is like driving in a world where everyone is a different copy of you, but some of your copies are in a rush while others are patient. This allows backprop to optimize for a sort of global utility across the entire population.

- There is no modeling of occlusion effects. Instead, agents are given the state of nearby agents, but corrupted by random noise. In the real world, occluded nearby agents can be extremely close (think about a child running out from behind a parked car). The paper comments on this.

> Both Waymax and nuPlan construct observations, maps, and other actors with auto-labeling tools from realworld perception data. This brings occlusion, incorrect or missing traffic-light states, and obstacles revealed at the last moment. Despite the minimalistic noise modeling in GIGAFLOW, the GIGAFLOW policy generalizes zero-shot to these conditions.

- The resulting policy simulates agents that are human-like, even though the system has never seen humans drive. This is a great result when one considers that other reinforcement learning projects produce extremely high-performance agents that humans would consider abusive or pathological.
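
To make the first two points concrete, here is a minimal sketch of one shared policy driving several agents, each with its own randomized conditioning vector and a noise-corrupted view of the same scene. Everything in it is an assumption for illustration: the `SharedPolicy` class, the two-element conditioning vector (aggressiveness, vehicle type), the toy linear policy, and the Gaussian noise model are hypothetical and are not taken from the paper.

```python
import math
import random


class SharedPolicy:
    """Toy linear policy; a single set of weights is reused by every agent."""

    def __init__(self, obs_dim, cond_dim, seed=0):
        rng = random.Random(seed)
        self.weights = [rng.uniform(-1, 1) for _ in range(obs_dim + cond_dim)]

    def act(self, observation, conditioning):
        # The conditioning vector is appended to the observation, so the same
        # weights can produce different behaviour for different agents.
        x = list(observation) + list(conditioning)
        return math.tanh(sum(w * v for w, v in zip(self.weights, x)))


def sample_conditioning(rng):
    # Hypothetical layout: [aggressiveness in 0..1, vehicle type (0 or 1)].
    return [rng.random(), float(rng.choice([0, 1]))]


def noisy_observation(true_state, rng, sigma=0.1):
    # Assumed noise model: independent Gaussian noise on each state entry.
    return [s + rng.gauss(0.0, sigma) for s in true_state]


rng = random.Random(42)
policy = SharedPolicy(obs_dim=3, cond_dim=2)   # one policy for all agents
true_state = [0.5, -0.2, 1.0]                  # state of nearby traffic
agents = [sample_conditioning(rng) for _ in range(4)]
actions = [policy.act(noisy_observation(true_state, rng), c) for c in agents]
```

Because every agent calls the same `policy`, a gradient step computed from any one agent's experience would update the behaviour of all of them, which is the sense in which training optimizes a population-wide objective.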

nine_k 1 year ago |

Can there be "smart toys" for models that help them self-improve in a particularly efficient way?

cainxinth 1 year ago | |

A Young Lady's Illustrated Primer

Rebuff5007 1 year ago | |

In RL literature this is generally called "curriculum learning".

The curriculum is usually modeled as some form of reward function to steer learning, or sometimes by environment configuration (e.g. learn to walk on a normal surface before a slippery surface).
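
As a rough sketch of the environment-configuration variant, the snippet below advances to a harder stage once the recent success rate clears a threshold. The `Curriculum` class, the `friction` setting, the window size, and the threshold are all hypothetical choices for illustration, not a standard API.

```python
from collections import deque


class Curriculum:
    """Moves to the next environment config once recent success is high enough."""

    def __init__(self, stages, window=10, threshold=0.8):
        self.stages = list(stages)
        self.index = 0
        self.threshold = threshold
        self.results = deque(maxlen=window)  # rolling record of episode outcomes

    @property
    def current(self):
        return self.stages[self.index]

    def record(self, success):
        self.results.append(bool(success))
        window_full = len(self.results) == self.results.maxlen
        rate = sum(self.results) / len(self.results)
        has_next = self.index < len(self.stages) - 1
        if window_full and rate >= self.threshold and has_next:
            self.index += 1
            self.results.clear()  # start measuring afresh on the new stage


# Normal surface first, slippery surface second (hypothetical settings).
curriculum = Curriculum(stages=[{"friction": 1.0}, {"friction": 0.3}])
for _ in range(10):
    curriculum.record(True)  # ten successful episodes on the easy stage
```

In a real training loop, `curriculum.current` would be passed to the environment at the start of each episode and `record` called with the episode outcome.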

visarga 1 year ago | |

Yes, the smart toys are search, code execution, simulations and games.

jazzyjackson 1 year ago | |

Video games are basically like this: progressive levels require more skill, learned from the easier levels.

djmips 1 year ago | | |

And this is a reason we play video games? That they appeal to some ancient instinct to improve?

hirokio123 1 year ago | |

I'm creating "smart toys" like that for humans. I recently launched a mobile app. I'd love to see these research breakthroughs feed back into human learning because if humans remain foolish, the world could fall apart.

With DeepSeek R1 and these autonomous driving research results, it feels like we've entered an era where human data is no longer necessary. The ability to infinitely expand learning through simulation while maintaining safety in the real world feels like science fiction coming to life—it's truly exciting.

grandma_tea 1 year ago | |

Can you expand on that? Efficient in what way?

nine_k 1 year ago | | |

Efficient in the way of bringing the model to meet the criteria of autonomy faster. On one hand it may be something specifically efficient at reaching some autonomy qualities. OTOH it could be just something that efficiently uses the improvement in the model during training to make the subsequent training faster.

seaucre 1 year ago |

This is interesting, and I have always thought this approach worth exploring given the "bitter lesson" in other ML domains, but I think we should be skeptical until we see such models deployed and operating effectively on real-world vehicles.

dhbradshaw 1 year ago |

Interesting to see this coming out of Apple

mitthrowaway2 1 year ago |

Something about dreams that fascinates me is that I usually am genuinely surprised by events that occur in dreams. I interact with other characters whose motivation I cannot understand and whose actions I cannot fully anticipate. It feels like there's a foreign entity acting as DM.

This isn't fake surprise. Sometimes I'll wake up and think, "who on earth were those guys and what were they trying to do? And yet their actions make sense..." or, "who came up with that punchline? It's legitimately funny and I never saw it coming, so it can't have been me..."

And yet I know it's all being generated by my own brain somehow. Through some kind of privileged access level.

And then I think about the bicameral brain structure. Does our brain have two halves so that it can function in a self-play training mode during sleep? Is each half of my brain experiencing the same dream from an opposite point of view?

Apologies for the tangent; this is almost totally unrelated to the article and probably something well known to neuroscience for decades. But still, it fascinates me, and the more we learn about the effectiveness of self-play in AI, the more I wonder.

linux_devil 1 year ago |

Maybe not directly related, but I find that genetic algorithms and other optimisation algorithms, such as Ant Colony Optimisation, intersect with this approach of self-play leading to robust autonomy.

The28thDuck 1 year ago |

The concept of being able to simulate 42 years of “experience” in one hour seems so foreign to me. Something about it creeps me out.

ThrowawayTestr 1 year ago | |

Don't watch the White Christmas Black Mirror episode.

baq 1 year ago | | |

Maybe at this point don’t watch any Black Mirror episodes…?

RGamma 1 year ago | | |

Don't read Junji Ito's Nagai Yume either.

mikelevins 1 year ago | |

I had a couple of hobbies (lucid dreaming and shamanic trance drumming) that enabled me to experience big disconnects between the subjective experience of time passing and objective measurable wall-clock time. Some dreams and trances subjectively appeared to be much longer than the wall-clock time recorded by clocks and human helpers.

I don't have any definite knowledge of what's going on with that, but I suspect some part of it is my brain retroactively manufacturing the memory of lots of time passing, and some part of it is my brain confabulating episodic memory about the dream or trance as I wake up and write it down.

Human memory is well known to be generally unreliable and full of confabulated details, so I think the most parsimonious explanation for differences between the time experienced in dreams and the objectively-measurable time that passes is that our brains are just making shit up.

Of course, the idea that your brain just lies to you about the past might be just as creepy as any other explanation.

geon 1 year ago | |

Humanity experiences almost a million years per hour.
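
A back-of-the-envelope check of that figure, assuming a world population of roughly 8 billion and 365-day years (both are assumptions, not numbers from the thread):

```python
# Each person lives through one hour per hour, so humanity as a whole
# accumulates `population` person-hours of experience every hour.
population = 8_000_000_000       # assumed world population
hours_per_year = 365 * 24        # assumes 365-day years

person_years_per_hour = population / hours_per_year
print(f"{person_years_per_hour:,.0f} person-years per hour")
```

Under those assumptions the result lands a little over 900,000, consistent with "almost a million".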

p-a_58213 1 year ago | |

If the gym is sufficiently simple and well-coded, achieving a simulation speed of 367,920x real-time (simulating 42 years in one hour) is plausible. The question is whether these simulated scenarios genuinely reflect 42 years of real-world driving experience and truly represent the information that a single agent has at its disposal when making driving decisions.
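
The 367,920x figure follows from simple unit conversion, assuming 365-day years:

```python
# Speed-up needed to cover 42 simulated years in one wall-clock hour.
simulated_years = 42
wall_clock_hours = 1
hours_per_year = 365 * 24        # 8,760; assumes 365-day years

speedup = simulated_years * hours_per_year // wall_clock_hours
print(speedup)  # 367920
```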

dang 1 year ago |

[stub for offtopicness]

TZubiri 1 year ago | |

[flagged]

awinter-py 1 year ago | |

[flagged]

esafak 1 year ago | | |

"Guys, do you think we'll get away it?"

surume 1 year ago |

[flagged]

dang 1 year ago | |

Could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.