Ask HN: What were the papers on the list Ilya Sutskever gave John Carmack?

396 points by alan-stark 3 years ago | 131 comments

John Carmack's new interview on AI/AGI [1] carries a puzzle:

“So I asked Ilya Sutskever, OpenAI’s chief scientist, for a reading list. He gave me a list of like 40 research papers and said, ‘If you really learn all of these, you’ll know 90% of what matters today.’ And I did. I plowed through all those things and it all started sorting out in my head.”

What papers do you think were on this list?

[1] https://dallasinnovates.com/exclusive-qa-john-carmacks-different-path-to-artificial-general-intelligence/

dang 3 years ago |

Recent and related:

John Carmack’s ‘Different Path’ to Artificial General Intelligence - https://news.ycombinator.com/item?id=34637650 - Feb 2023 (402 comments)

sillysaurusx 3 years ago |

"The email including them got lost to Meta's two-year auto-delete policy by the time I went back to look for it last year. I have a binder with a lot of them printed out, but not all of them."

RIP. If it's any consolation, it sounds like the list is at least three years old by now. Which is a long time considering that 2016 is generally regarded as the date of the deep learning revolution.

pengaru 3 years ago | |

> If it's any consolation, it sounds like the list is at least three years old by now.

In my experience when it comes to learning technical subjects from a position of relative total ignorance, it's the older resources that are the easiest to bootstrap knowledge from. Then you basically work your way forward through the newer texts, like an accelerated replay of a domain's progress.

I think it's kind of obvious that this would be the case when you think about it. Just as history textbooks can't keep growing in size to give all past events equal treatment, neither can technical references as a domain matures.

You're forced to toss out stuff deemed least relevant to today, and in technical domains that's often stuff you've just started assuming as understood by the reader... where early editions of a new space would have prioritized getting the reader up to speed in something totally novel to the world.

moglito 3 years ago | |

"considering that 2016 is generally regarded as the date of the deep learning revolution" --

I thought it was 2012, when AlexNet took the imagenet crown?

sillysaurusx 3 years ago | | |

That's probably fair. But you'd be hard-pressed to find a DL stack to try out your ideas with prior to 2016, since that's when Tensorflow launched. :)

(Gosh, it's been less than a decade. Time sometimes doesn't fly, considering how much it's changed the world since then...)

vtantia 3 years ago | |

Whoops, Carmack referenced the thread and tagged Ilya in it, a veiled request to publish the list: https://twitter.com/ID_AA_Carmack/status/1622673143469858816

mellosouls 3 years ago | |

Sorry - where is that sourced from? Or do you mean it was a personal communication to you? Or is it a joke?

sillysaurusx 3 years ago | | |

He told me.

querez 3 years ago |

A lot of other posts here are biased toward recent papers, and papers that had "a big impact", but miss a lot of foundations. I think this reddit post on the most foundational ML papers gives a much more balanced overview: https://www.reddit.com/r/MachineLearning/comments/zetvmd/d_i...

sho_hn 3 years ago |

> "You’ll find people who can wax rhapsodic about the singularity and how everything is going to change with AGI. But if I just look at it and say, if 10 years from now, we have ‘universal remote employees’ that are artificial general intelligences, run on clouds, and people can just dial up and say, ‘I want five Franks today and 10 Amys, and we’re going to deploy them on these jobs,’ and you could just spin up like you can cloud-access computing resources, if you could cloud-access essentially artificial human resources for things like that—that’s the most prosaic, mundane, most banal use of something like this."

So, slavery?

optimalsolver 3 years ago |

Carmack says he's pursuing a different path to AGI, then goes straight to the guy at the center of the most saturated area of machine learning (deep learning)?

I would've hoped he'd be exploring weirder alternatives off the beaten path. I mean, neural networks might not even be necessary for AGI, but no one at OpenAI is going to tell Carmack that.

chrgy 3 years ago |

From ChatGPT, although personally I think this list is a bit old, but it should be at the 60% mark at the very least:

Deep Learning:

AlexNet (2012)
VGGNet (2014)
ResNet (2015)
GoogleNet (2015)
Transformer (2017)

Reinforcement Learning:

Q-Learning (Watkins & Dayan, 1992)
SARSA (R. S. Sutton & Barto, 1998)
DQN (Mnih et al., 2013)
A3C (Mnih et al., 2016)
PPO (Schulman et al., 2017)

Natural Language Processing:

Word2Vec (Mikolov et al., 2013)
GLUE (Wang et al., 2018)
ELMo (Peters et al., 2018)
GPT (Radford et al., 2018)
BERT (Devlin et al., 2019)

loveparade 3 years ago | |

You are getting downvoted because this list is from ChatGPT, but as a researcher in the field, this list is actually really good, except for perhaps the SARSA and GLUE papers, which are less generally relevant. I would add WaveNet, the Seq2Seq paper, GANs, some optimizer papers (e.g. Adam), diffusion models, and some of the newer Transformer variants.

I'm very confident that this is pretty much what any researcher, including Ilya, would recommend. It really isn't hard to find those resources, they are simply the most cited papers. Of course you can go deeper into any of the subfields if you desire.

machiaweliczny 3 years ago | | |

As a hobbyist I would add Tsetlin Machines, DreamerV3, DiffusER and most RL from DeepMind.

ilaksh 3 years ago |

My guess is that multimodal transformers will probably eventually get us most of the way there for general purpose AI.

But AGI is one of those very ambiguous terms. For many people it's either an exact digital replica of human behavior that is alive, or something like a God. I think it should also apply to general purpose AI that can do most human tasks in a strictly guided way, although not have other characteristics of humans or animals. For that I think it can be built on advanced multimodal transformer-based architectures.

For the other stuff, it's worth giving a passing glance to the fairly extensive amount of research that has been labeled AGI over the last decade or so. It's not really mainstream except maybe the last couple of years because really forward looking people tend to be marginalized including in academia.

https://agi-conf.org

Looking forward, my expectation is that things like memristors or other compute-in-memory will become very popular within say 2-5 years (obviously total speculation since there are no products yet that I know of) and they will be vastly more efficient and powerful especially for AI. And there will be algorithms for general purpose AI possibly inspired by transformers or AGI research but tailored to the new particular compute-in-memory systems.

jimmySixDOF 3 years ago |

>90% of what matters today

Strikes me as the kind of thing where that last 10% will need 400 papers

mindcrime 3 years ago | |

"The first 90% is easy. It's the second 90% that kills ya."

kabdib 3 years ago | | |

"All projects are divided into three phases, each consisting of 90% of the work."

-- just about everything I've shipped :-)

michpoch 3 years ago | |

For the last 10% you'll need to write a paper yourself.

tikhonj 3 years ago | |

Along with the kind of details and tacit knowledge that never makes it into papers...

swyx 3 years ago | |

Maybe that's the part he intends to deviate from. He just doesn't need to reinvent the settled science.

codeviking 3 years ago |

This inspired us to do a little exploration. We used the top cited papers of a few authors to produce a list that might be interesting, and to do some additional analysis. Take a look: https://github.com/allenai/author-explorer

hexhowells 3 years ago |

While not all papers, this list contains a lot of important papers, writings, and conversations currently in AI: https://docs.google.com/document/d/1bEQM1W-1fzSVWNbS4ne5PopB...

albertzeyer 3 years ago |

(Partly copied from https://news.ycombinator.com/item?id=34640251.)

On models: Obviously, almost everything is Transformer nowadays (the "Attention Is All You Need" paper). However, I think to get into the field, to get a good overview, you should also look a bit beyond the Transformer. E.g. RNNs/LSTMs are still a must-learn, even though Transformers might be better at many tasks. And then all those memory-augmented models, e.g. Neural Turing Machine and follow-ups, are important too.

It also helps to know different architectures, such as just language models (GPT), attention-based encoder-decoder (e.g. original Transformer), but then also CTC, hybrid HMM-NN, transducers (RNN-T).

Some self-promotion: I think my PhD thesis does a good job of giving an overview of this: https://www-i6.informatik.rwth-aachen.de/publications/downlo...

Diffusion models are another recent, different kind of model.

Then, a separate topic is the training aspect. Most papers do supervised training, using cross entropy loss to the ground-truth target. However, there are many others:

There is CLIP to combine text and image modalities.

There is the whole field of unsupervised or self-supervised training methods. Language model training (next label prediction) is one example, but there are others.

And then there is the big field of reinforcement learning, which is probably also quite relevant for AGI.
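
To make the cross entropy objective mentioned above concrete, here is a minimal sketch in plain Python. It is not taken from any of the papers discussed; the function names (`softmax`, `cross_entropy`, `next_label_loss`) and the toy logits are assumptions chosen for illustration, and real training code would use a framework with batched tensors.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability assigned to the ground-truth label index."""
    return -math.log(softmax(logits)[target])

def next_label_loss(step_logits, labels):
    """Average cross entropy for next-label prediction over one sequence.

    step_logits[t] holds hypothetical model scores for position t + 1,
    so the target at step t is labels[t + 1].
    """
    targets = labels[1:]
    if len(step_logits) != len(targets):
        raise ValueError("need one score vector per predicted position")
    losses = [cross_entropy(scores, y) for scores, y in zip(step_logits, targets)]
    return sum(losses) / len(losses)

if __name__ == "__main__":
    labels = [0, 2, 1]                                # toy label sequence
    step_logits = [[0.1, 0.2, 3.0], [0.5, 2.5, 0.1]]  # made-up scores
    print(next_label_loss(step_logits, labels))
```

The same loss covers both cases: in supervised training the target comes from a labeled dataset, while in language model training it is simply the next label in the sequence.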

klaussilveira 3 years ago |

Following: https://twitter.com/u3dcommunity/status/1621524851898089478?...

sebkomianos 3 years ago | |

Following: https://news.ycombinator.com/item?id=34643510

polskibus 3 years ago |

What about just asking Carmack on twitter?

jranieri 3 years ago | |

I did, without success.

belter 3 years ago | | |

I asked him too.

He said:

- Who are you, and how did you get into my house?

TigeriusKirk 3 years ago | | |

Is anyone asking Ilya Sutskever?

arbuge 3 years ago | |

Or, more directly, ask Sutskever...

KRAKRISMOTT 3 years ago |

Start tweeting at him until he shares

fnordpiglet 3 years ago | |

Clearly do this by tweet storming him via LLM

steveBK123 3 years ago | | |

As an AI LLM, I cannot decide which academic papers are "best" as the idea of "best" is subjective and there are many different factors that need to be considered.

EvgeniyZh 3 years ago |

Attention, scaling laws, diffusion, vision transformers, BERT/RoBERTa, CLIP, Chinchilla, ChatGPT-related papers, NeRF, Flamingo, RETRO/some retrieval SOTA

seydor 3 years ago | |

What do you mean by 'scaling laws'?

EvgeniyZh 3 years ago | | |

J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.

and multiple follow-ups

username3 3 years ago |

They asked on Twitter and he didn’t reply. We need someone with a blue check mark to ask. https://twitter.com/ifree0/status/1620855608839897094

mirekrusin 3 years ago | |

Ask Elon to ask him.

winwhiz 3 years ago |

I had read that somewhere else and this is as far as I got

https://twitter.com/id_aa_carmack/status/1241219019681792010

throwaway4837 3 years ago |

Wow, crazy coincidence that you all read this article yesterday too. I was thinking of emailing one of them for the list, then I fell asleep. Cold emails to scientists generally have a higher success-rate than average in my experience.

cloudking 3 years ago |

Ilya's publications may be on the list https://scholar.google.com/citations?user=x04W_mMAAAAJ&hl=en

daviziko 3 years ago |

I wonder what Ilya Sutskever would recommend as an updated list nowadays. I don't have a twitter account, otherwise I'd ask him myself :)

Phil_Latio 3 years ago |

Not in the list: https://arxiv.org/pdf/1805.09001.pdf

evc123 3 years ago |

https://arxiv.org/abs/2210.14891

adt 3 years ago |

https://lifearchitect.ai/papers/

vikashrungta 3 years ago |

I posted a list of papers on twitter, and will be posting a summary for each of them as well. Here is the list: https://twitter.com/vrungta/status/1623343807227105280

Unlocking the Secrets of AI: A Journey through the Foundational Papers by @vrungta (2023)

1. "Attention is All You Need" (2017) - https://arxiv.org/abs/1706.03762 (Google Brain)
2. "Generative Adversarial Networks" (2014) - https://arxiv.org/abs/1406.2661 (University of Montreal)
3. "Dynamic Routing Between Capsules" (2017) - https://arxiv.org/abs/1710.09829 (Google Brain)
4. "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks" (2016) - https://arxiv.org/abs/1511.06434 (indico Research, Facebook AI Research)
5. "ImageNet Classification with Deep Convolutional Neural Networks" (2012) - https://papers.nips.cc/paper/4824-imagenet-classification-wi... (University of Toronto)
6. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (2018) - https://arxiv.org/abs/1810.04805 (Google)
7. "RoBERTa: A Robustly Optimized BERT Pretraining Approach" (2019) - https://arxiv.org/abs/1907.11692 (Facebook AI)
8. "ELMo: Deep contextualized word representations" (2018) - https://arxiv.org/abs/1802.05365 (Allen Institute for Artificial Intelligence)
9. "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" (2019) - https://arxiv.org/abs/1901.02860 (Google AI Language)
10. "XLNet: Generalized Autoregressive Pretraining for Language Understanding" (2019) - https://arxiv.org/abs/1906.08237 (Google AI Language)
11. "T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" (2020) - https://arxiv.org/abs/1910.10683 (Google Research)
12. "Language Models are Few-Shot Learners" (2020) - https://arxiv.org/abs/2005.14165 (OpenAI)

theusus 3 years ago |

Like papers are that comprehensible.

mgaunard 3 years ago |

In my experience, all deep learning is overhyped, and most needs that linear regression cannot already address can be met with simple supervised learning.