Google's Pathways Language Model and Chain-of-Thought

Google's Pathways Language Model and Chain-of-Thought (vaclavkosar.com)

77 points by vackosar 4 years ago | 30 comments

vackosar 4 years ago

Correction!! The model cost around $10M, not $10B! Thanks for raising that. It was a mistake made while copying from the second slide :(

phoe18 4 years ago

The article quotes the cost as roughly $10B in the first paragraph. Likely a typo? It quotes $10M in a later paragraph.

azinman2 4 years ago

Yeah, I was like, there’s no way Google spent $10B on this.

vackosar 4 years ago

Yes, of course, thanks!

simulate-me 4 years ago

The amount of capital needed to train these high-quality models is eye-watering (not to mention the costs needed to acquire the data). Does anyone know of any well-capitalized startups exploring this space?

visarga 4 years ago

> The amount of capital needed to train these high-quality models is eye-watering

It's relative. It would cost more to open a 40-room hotel (about $320k/room), and hotels can't be copied like software.
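For reference, the arithmetic behind that comparison can be sketched as follows. The figures are the rough ones quoted in this thread (about $320k per room, around $10M for training), not audited numbers, and the variable names are made up for illustration.

```python
# Rough figures quoted in the thread; treat both as estimates.
rooms = 40
cost_per_room_usd = 320_000        # "about 320k/room"
training_cost_usd = 10_000_000     # "around $10M" per the correction

hotel_cost_usd = rooms * cost_per_room_usd

print(f"40-room hotel: ${hotel_cost_usd:,}")
print(f"Model training: ${training_cost_usd:,}")
print(f"Hotel costs more: {hotel_cost_usd > training_cost_usd}")
```

Under those assumptions the hotel comes to $12.8M, somewhat above the quoted training cost.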

Vetch 4 years ago

It's not like that many people are opening 40-room hotels either. Such amounts are atypical within programming and CS communities.

A more relevant example is video games: imagine if the only viable ones were top-end AAA games whose completed versions could only be accessed through cloud gaming.

gwern 4 years ago

The data here is effectively free. I don't think they would exhaust The Pile, which you can download for free. This is also true for text2image models like DALL-E 2: while OpenAI may have invested in its own datasets, everyone else can just download LAION-400M (or, if they are really ambitious, LAION-5B https://laion.ai/laion-5b-a-new-era-of-open-large-scale-mult... ).

lumost 4 years ago

OpenAI would be the best example. However, these large language models also have limited business value today, making a startup a speculative bet that the team will beat Google/FB/AI/Academics at making a language model and find a viable business model for the resulting model.

I'd take one of those bets or the other; pulling off both is tough. Considering that the first task of such a startup would be to hand ~$100-500MM to a hardware or cloud vendor, I'd be hesitant to invest.

visarga 4 years ago

It costs less than $10M to train. Why hand so much to hardware or cloud? Soon enough there will be open source GPT-3s; at least two are in training as we speak (BigScience and EleutherAI).

> these large language models also have limited business value today

The Instruct version of GPT-3 has become very easy to steer with just a task description. It can do so many tasks so well it's crazy. Try some interactions with the beta API.

I believe GPT-3 is already above average human level at cognitive tasks that fit in a 4000-token window. In 2-3 years I think all developers will have to adapt to the new status quo.

simulate-me 4 years ago

I agree 100%, but I think viable businesses will begin to emerge, especially as these large models move from text to images (and eventually to video and 3D models). If the examples shown of DALL-E 2 are indicative of its quality, then a large number of creative jobs could be replaced with a single "creative director" using the model. But the high entry cost just to attempt to train such a model will likely remain a hurdle until more business value is proven.

sjg007 4 years ago

I'd just solve some existing problem with the most basic language model you can get your hands on and then move up from there. Sell it first.

rafaelero 4 years ago

That's literally nothing for the benefits it could provide if applied in the real world.

vackosar 4 years ago

Correction! Cost is around $10M, not $10B.

PaulHoule 4 years ago

I've talked about structural deficiencies in earlier language models; this one seems to be doing something about them.

vackosar 4 years ago

Sounds interesting! Would you link to that or describe them here? Thanks!

PaulHoule 4 years ago

A very simple one is "can you write a program that might never terminate?"

If a neural network does a fixed amount of computation and that is that, it is never going to be able to do things that require a program that may not terminate.

There are numerous results of theoretical computer science that apply just as well to neural networks as to other algorithms, even though people seem to forget it.
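The fixed-computation point can be sketched with a toy example. The Collatz iteration is my own choice of illustration here, not something from the article or the model; it is a loop whose step count depends on the input and has no known general bound. The function names are hypothetical.

```python
def collatz_next(n):
    # One step of the Collatz map.
    return n // 2 if n % 2 == 0 else 3 * n + 1


def collatz_steps(n):
    # Unbounded loop: runs for as many steps as this input needs.
    # Whether it halts for every positive n is an open problem.
    steps = 0
    while n != 1:
        n = collatz_next(n)
        steps += 1
    return steps


def fixed_budget_collatz(n, budget):
    # Stand-in for a system that performs a fixed amount of computation:
    # at most `budget` steps, then it must stop regardless of progress.
    for step in range(budget + 1):
        if n == 1:
            return step
        n = collatz_next(n)
    return None  # ran out of budget before reaching 1


print(collatz_steps(6))
print(fixed_budget_collatz(6, 10))
print(fixed_budget_collatz(27, 10))
```

Whatever budget is fixed in advance, some input needs more steps than that, so the fixed-budget version gives up on it while the open-ended loop keeps going.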

Another is "can an error discovered in late-stage processing be fed back to an early stage and be repaired?" That's important if you are parsing a sentence like:

   Squad helps dog bite victim.

It was funny because I saw Geoff Hinton give a talk in 2005, before he got super-famous. He was talking about the idea that led to deep networks, and he had a criticism of "blackboard" systems and other architectures that produce layered representations (say, the radar of an anti-aircraft system that starts with raw signals, turns those into a set of 'blips', coalesces the 'blips' into tracks, interprets the tracks as aircraft, etc.).

Hinton said that you should build the whole system in an integrated manner and train the whole thing working end-to-end. I thought "what a neat idea" but also "there is no way this would work for the systems I'm building, because it doesn't have an answer for correcting itself."

imranq 4 years ago

$10M for a bag of numbers (i.e., the learned weights of the model matrices).