Vladimir Vapnik Joins Facebook Research (facebook.com)
(Make no mistake, I can fully understand them, professors paid 80k per year, lacking resources, fighting bureaucrats, it is a great thing that they are recognised and at last paid what they deserve for devoting their lives to science.)
- Vapnik is joining a number of people he previously worked with
- Getting huge computational resources and seeing your ideas applied to real data is rewarding
Edit: "No way" is inaccurate. I should have said it is much easier to do at these companies. Also it is inaccurate to imply this is the only reason these great minds have joined these companies.
I don't see many details here, are you sure that's the case?
There are other reasons a giant of the field might decide to work at Facebook. They might give him more freedom than his previous employer. Perhaps friends of his already work at Facebook. The location and compensation may also play into it.
I don't want to be skeptical for no reason, but you're championing a popular narrative which I don't see direct support for in this instance.
Vapnik is a big theory guy. Though I am not sure he has done anything of big practical importance recently, his immense contribution to ML (the SVM) was made at a time when machines were many orders of magnitude weaker than they are now.
Complex theories do not work, simple algorithms do.
"One of the goals of this book is to show that, at least in the problems of statistical inference, this is not true. I would like to demonstrate that in this area of science a good old principle is valid: Nothing is more practical than a good theory."
-- From Vapnik's preface to The Nature of Statistical Learning Theory
Vapnik is not well-described as a "theory guy". That implies that he's not interested in connections between theory and practice, and this is most profoundly not the case. He has arguably been the most successful ML researcher ever as far as connecting abstract theory to real-world outcomes.
Besides the SVM: the VC dimension started out as a lemma regarding set counting, and he pushed it to the surprising (even shocking) conclusion of universal consistency for very general classes of estimators.
I haven't seen this paper before (thanks!!). How different is it to Word2Vec?
Clearly the pre-trained vectors at that scale (and much bigger than the ones released with Word2Vec) are new and very exciting.
There's a massive dearth of data in academia. This is also why you see people like Kleinberg working directly with Facebook on network research.
http://blog.bitops.com/blog/2014/06/26/first-steps-for-vr-on...
I mean it in a foundational sense, rather than an applications sense. He has done great work with a whiteboard and pure thought, without the need for terabytes of data and thousands of machines.
They don't actually use the 840 billion token model in the paper as it was made with some parameters that didn't allow for direct comparison, but the code and the models are all released for anyone to use from their site.
This is one of many great examples of open datasets like Common Crawl allowing talented people from academia and start-ups to compete with the large proprietary datasets of Google or Bing.
(disclaimer: data scientist at Common Crawl who does the crawling)
"Update by Richard Socher (Nov 2014): This document is outdated and its concerns have been addressed in the final version of the GloVe paper. Glove gets better performance on the same training data when actually run to convergence. See last section of Glove paper for details."
This is a good example of peer review in academia beyond just the paper review committee -- other researchers point out concerns or issues with methodology and they're addressed by the authors or other contributors. It's also great that the initial concerns could be properly tested thanks to the open source nature of both projects.
I will admit I didn't discuss the intricacies of the evaluation in my few paragraphs above, I was primarily speaking to the broader point that open data is helping academia compete with the goliaths of industrial research! =]
As I said in my other comment, one of the strengths of Word2Vec is how robust it is against various metrics.
While it looks like GloVe's advantages over Word2Vec may not be as large as initially claimed, it is mostly as robust (which is good). However, the jump in Word+Context over just Word vectors when evaluated on semantic relations is interesting.
(To be clear: I'm very interested in being able to use the same system over diverse datasets, without having to tune it differently for each system - hence my interest in the robustness of the methodologies)
Edit: Were you and Smerity at Sydney Uni at the same time?
Damn!!
Background for those who don't follow this field: Word2Vec is an apparently miraculous demonstration and poster-child of the unreasonable effectiveness of big data. Beating it at all is impressive, assuming the performance is as robust as Word2Vec is against different metrics.
Beating it with only 42% of the tokens is wondrous.