Announcing SyntaxNet: The World’s Most Accurate Natural Language Parser

(googleresearch.blogspot.com)

1083 points by cjdulberger 10 years ago | 235 comments

xigency 10 years ago |

Evidence that this is the most accurate parser is here; the previous approach mentioned is a March 2016 paper, "Globally Normalized Transition-Based Neural Networks," http://arxiv.org/abs/1603.06042

"On a standard benchmark consisting of randomly drawn English newswire sentences (the 20 year old Penn Treebank), Parsey McParseface recovers individual dependencies between words with over 94% accuracy, beating our own previous state-of-the-art results, which were already better than any previous approach."

From the original paper, "Our model achieves state-of-the-art accuracy on all of these tasks, matching or outperforming LSTMs while being significantly faster. In particular for dependency parsing on the Wall Street Journal we achieve the best-ever published unlabeled attachment score of 94.41%."

This seems like a narrower standard than described, specifically being better at parsing the Penn Treebank than the best natural language parser for English on the Wall Street Journal.

The statistics listed on the project GitHub actually contradict these claims by showing the original March 2016 implementation has higher accuracy than Parsey McParseface.

weinzierl 10 years ago | |

spaCy is another active open source (MIT) POS-tagger. In a previous discussion on HN[1] it was well received.

There is a simplified, educational 200-line Python version [2] of it. It claims 96.8% for the WSJ corpus.

What am I missing here?

[1] https://news.ycombinator.com/item?id=8942783

[2] https://spacy.io/blog/part-of-speech-pos-tagger-in-python

syllogism 10 years ago | | |

Those are the part-of-speech tag accuracies. spaCy's accuracy on the PTB evaluation is 92.2% --- so it makes 20% more errors than P. McP. On the other hand, spaCy is about 200x faster.

I've been watching the line of research in SyntaxNet closely, and have been steadily working on replacing spaCy's averaged perceptron model with a neural network model. This is one of the main differences between spaCy and Parsey McParseface.

The key advantage of the neural network is that it lets you take advantage of training on lots and lots more text, in a semi-supervised way. In a linear model, you grow extra parameters when you do this. The neural network stays the same size --- it just gets better. So, you can benefit from reading the whole web into the neural network. This only works a little bit in the linear model, and it makes the resulting model enormous.

Another difference is that spaCy is trained on whole documents, while P. McP. is trained in the standard set-up, using gold pre-processing. I speculate this will reduce the gap between the systems in a more realistic evaluation. Of course, P. McP. can do the joint training too if they choose to. I've reached out to see whether they're interested in running the experiment: https://github.com/tensorflow/models/issues/65

hendler 10 years ago | | |

Also have been using spaCy with good results.

Just installed syntaxnet - tests passed in the following setup.

https://gist.github.com/Hendler/61831e411069815ee4ed490f553f...

INFO: Elapsed time: 908.048s, Critical Path: 640.26s

//syntaxnet:arc_standard_transitions_test PASSED in 0.0s

//syntaxnet:beam_reader_ops_test PASSED in 20.9s

//syntaxnet:graph_builder_test PASSED in 16.3s

//syntaxnet:lexicon_builder_test PASSED in 1.8s

//syntaxnet:parser_features_test PASSED in 0.0s

//syntaxnet:parser_trainer_test PASSED in 46.1s

//syntaxnet:reader_ops_test PASSED in 5.7s

//syntaxnet:sentence_features_test PASSED in 0.0s

//syntaxnet:shared_store_test PASSED in 0.5s

//syntaxnet:tagger_transitions_test PASSED in 0.0s

//syntaxnet:text_formats_test PASSED in 1.7s

//util/utf8:unicodetext_unittest PASSED in 0.0s

Some other notes:

Also using Keras with Theano. Before spaCy, StanfordNLP, Freeling, and/or NLTK.

kamkha 10 years ago | | |

spaCy's 96.8% accuracy is for the task of POS tagging while Google's reported 94% accuracy is for dependency parsing, a significantly harder problem.
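
To make the distinction concrete, here is a minimal sketch of the two metrics on one sentence. The gold heads follow the "Bob brought the pizza to Alice." parse posted elsewhere in this thread; the predicted heads contain one made-up attachment error, so the numbers are purely illustrative and are not results from spaCy or SyntaxNet.

```python
def accuracy(gold, predicted):
    """Fraction of positions where the prediction matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not gold:
        raise ValueError("sequences must not be empty")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Tokens: Bob brought the pizza to Alice .
gold_tags = ["NNP", "VBD", "DT", "NN", "IN", "NNP", "."]
pred_tags = ["NNP", "VBD", "DT", "NN", "IN", "NNP", "."]  # hypothetical: all tags right

# Head of each token as a 1-based token index, 0 meaning ROOT.
gold_heads = [2, 0, 4, 2, 2, 5, 2]
pred_heads = [2, 0, 4, 2, 4, 5, 2]  # hypothetical error: "to" attached to "pizza"

tag_accuracy = accuracy(gold_tags, pred_tags)
uas = accuracy(gold_heads, pred_heads)  # unlabeled attachment score

print(f"POS tag accuracy: {tag_accuracy:.3f}")
print(f"UAS:              {uas:.3f}")
```

Every tag can be right while an attachment is still wrong, which is one reason the two percentages are not comparable.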

charlieegan3 10 years ago | | |

spaCy also has a dependency parser - looks like this blog post is just talking about the POS-tagger.

cbr 10 years ago | |

    better at parsing the Penn Treebank than the best
    natural language parser for English on the Wall
    Street Journal

I'm pretty sure "the 20 year old Penn Treebank" and "the Wall Street Journal" are referring to the same dataset here. In the early 1990s the first large treebanking efforts were on a corpus from the WSJ, and they were released as the Penn Treebank: https://catalog.ldc.upenn.edu/LDC95T7 People report results on this dataset because that's what the field has been testing on (and overfitting to) for decades.

(I worked on a successor project, OntoNotes, that involved additional treebank annotation on broader corpora: https://catalog.ldc.upenn.edu/LDC2013T19)

xigency 10 years ago | | |

Yes, the press release is (actually) pretty difficult to parse and really opaque in how the comparison is measured, which is why I wanted to throw into question the blog's headline, "The World's Most Accurate Parser." It seems more clear now but obviously Google doesn't feel the need to overtly prove that they are the best in the world at tasks, which is a bit questionable considering their number of followers. In all, it seems they have tested against several other dependency parsers, but clearly not all of them, and it's fair to say that it is "highly accurate," but this parser still falls victim to some of the same issues that most statistical parsers do, and while faster than some dependency parsers, it is not faster than all of them.

The point about overfitting is valid, too, which is another reason why this "most accurate such model in the world" claim is obnoxious.

It's also fair to note that their advance is in fractions of percentage points on this specific dataset over models that are 5-10 years older.

musesum 10 years ago | |

> The statistics listed on the project GitHub actually contradict these claims by showing the original March 2016 implementation has higher accuracy than Parsey McParseface.

So you're referring to this LSTM?

"Andor et al. (2016)* is simply a SyntaxNet model with a larger beam and network. For further information on the datasets, see that paper under the section "Treebank Union"."

After spending a few months hand coding an NLP parser, I am rather intrigued by LSTM. I like the idea of finding coefficients, as opposed to juggling artificial labels.

xigency 10 years ago | | |

Yes, my mistake. Their claim is that SyntaxNet (originally described in the paper and improved over one month) is the best in field, whereas Parsey McParseface is just one trained instance.

eitally 10 years ago | |

Coincidentally, I had a parent/teacher conference with my 1st grader's teacher yesterday afternoon. Regarding reading level & comprehension, she remarked that current research indicates anything below about 98% comprehension isn't sufficient for reading "fluency". Before the past few years, the standard was 95% comprehension = fluency, but that extra few percentage points apparently make an enormous difference (probably because of colloquial & jargon edge case usages that carry specific meanings in specific contexts, but which aren't easy to programmatically detect, but that's just my supposition).

mentat 10 years ago | | |

Sorry, but that just doesn't make any sense to me. Practically 70% seems like enough to understand most narrative. I've read some really difficult texts (translated German theology) and for anything of meaningful complexity 98% is unreachable without a huge vocabulary and understanding of both oddities of grammar and the construction of narrative or argument.

wodenokoto 10 years ago | |

The paper you mention reports the world's best results; its model is Parsey McParseface with a broader beam search and more hidden layers.

This is an open-sourcing of the March 2016 method (SyntaxNet; note that the paper has results from several trained models) as well as a trained model that is comparable in performance but faster (Parsey McParseface).

It is very hard to separate those two things from the way they write.

teraflop 10 years ago |

This is really cool, and props to Google for making it publicly available.

The blog post says this can be used as a building block for natural language understanding applications. Does anyone have examples of how that might work? Parse trees are cool to look at, but what can I do with them?

For instance, let's say I'm interested in doing text classification. I can imagine that the parse tree would convey more semantic information than just a bag of words. Should I be turning the edges and vertices of the tree into feature vectors somehow? I can think of a few half-baked ideas off the top of my head, but I'm sure other people have already spent a lot of time thinking about this, and I'm wondering if there are any "best practices".
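
One half-baked idea of that kind, sketched under assumptions: treat each dependency edge as a sparse feature next to the usual bag of words. The edge list is copied by hand from the "Bob brought the pizza to Alice." demo parse posted elsewhere in this thread; the feature naming scheme and the `edge_features` helper are hypothetical, not anything SyntaxNet provides.

```python
from collections import Counter

# (head, dependent, label) triples, hand-copied from the demo parse.
EDGES = [
    ("ROOT", "brought", "ROOT"),
    ("brought", "Bob", "nsubj"),
    ("brought", "pizza", "dobj"),
    ("pizza", "the", "det"),
    ("brought", "to", "prep"),
    ("to", "Alice", "pobj"),
]


def edge_features(edges):
    """Count bag-of-words, label, and head-label-dependent features."""
    feats = Counter()
    for head, dep, label in edges:
        feats[f"word={dep.lower()}"] += 1                      # plain bag of words
        feats[f"label={label}"] += 1                           # which relations occur
        feats[f"{head.lower()}-{label}->{dep.lower()}"] += 1   # the edge itself
    return feats


feats = edge_features(EDGES)
for name, count in sorted(feats.items()):
    print(name, count)
```

The resulting counter can be fed to any classifier that accepts sparse features; whether the edge features actually help is an empirical question for your data.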

fpgaminer 10 years ago |

One of the projects I'd love to develop is an automated peer editor for student essays. My wife is an English teacher and a large percentage of her time is taken up by grading papers. A large percentage of that time is then spent marking up grammar and spelling. What I envision is a website that handles that grammar/spelling bit. More importantly, I'd like it as a tool that the students use freely prior to submitting their essays to the teacher. I want them to have immediate feedback on how to improve the grammar in their essays, so they can iterate and learn. By the time the essays reach the teacher, the teacher should only have to grade for content, composition, style, plagiarism, citations, etc. Hopefully this also helps to reduce the amount of grammar that needs to be taught in-class, freeing time for more meaningful discussions.

The problem is that while I have knowledge and experience in the computer vision side of machine learning, I lack experience in NLP. And to the best of my knowledge NLP as a field has not come as far as vision, to the extent that such an automated editor would have too many mistakes. To be student facing it would need to be really accurate. On top of that it wouldn't be dealing with well formed input. The input by definition is adversarial. So unlike SyntaxNet which is built to deal with comprehensible sentences, this tool would need to deal with incomprehensible sentences. According to the link, SyntaxNet only gets 90% accuracy on random sentences from the web.

That said, I might give SyntaxNet a try. The idea would be to use SyntaxNet to extract meaning from a broken sentence, and then work backwards from the meaning to identify how the sentence can be modified to better match that meaning.

Thank you Google for contributing this tool to the community at large.

jrgoj 10 years ago |

Now for the buffalo test[1]

    echo 'Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo' | syntaxnet/demo.sh

    buffalo NN ROOT
    +-- buffalo NN nn
    |   +-- Buffalo NNP nn
    |   |   +-- Buffalo NNP nn
    |   |   +-- buffalo NNP nn
    |   +-- buffalo NN nn
    +-- Buffalo NNP nn
        +-- buffalo NNP nn

[1]: https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffal...

nihonde 10 years ago | |

How does it handle the "translation" from that wiki page?

Bison from Buffalo, which bison from Buffalo bully, themselves bully bison from Buffalo.

jrgoj 10 years ago | | |

    Input: Bison from Buffalo , which bison from Buffalo bully , themselves bully bison from Buffalo .
    Parse:
    Bison NNP ROOT
    +-- from IN prep
    |   +-- Buffalo NNP pobj
    +-- , , punct
    +-- bison NN ccomp
    |   +-- themselves PRP nsubj
    |   +-- bully RB advmod
    |   +-- from IN prep
    |       +-- Buffalo NNP pobj
    +-- . . punct

deanclatworthy 10 years ago |

It's really nice to have access to these kinds of tools. I am sure some folks from Google are checking this, so thank you.

Analysis of the structure of a piece of text is the first step to understanding its meaning. IBM are doing some good work in this area. http://www.alchemyapi.com/products/demo/alchemylanguage

Anything in the pipeline for this project to help with classifying sentiment, emotion etc. from text?

zappo2938 10 years ago | |

Yes, we derive syntactic meaning from grammatical structure. It's one thing getting a machine to understand grammar and another to get a human to understand. If anyone is interested, Doing Grammar by Max Morenberg is an excellent source of knowledge about grammar. [0] He approaches grammar very systematically, which is helpful if people want to train machines.

[0] http://www.amazon.com/Doing-Grammar-Max-Morenberg/dp/0199947...

feral 10 years ago |

I'd love to hear Chomsky's reaction to this stuff (or someone in his camp on the Chomsky vs. Norvig debate [0]).

My understanding is that Chomsky was against statistical approaches to AI, as being scientifically un-useful - eventual dead ends, which would reach a certain accuracy, and plateau - as opposed to the purer logic/grammar approaches, which reductionistically/generatively decompose things into constituent parts, in some interpretable way, which is hence more scientifically valuable, and composable - easier to build on.

But now we're seeing these very successful blended approaches, where you've got a grammatical search, which is reductionist, and produces an interpretable factoring of the sentence - but it's guided by a massive (comparatively uninterpretable) neural net.

It's like AlphaGo - which is still doing search, in a very structured, rule based, reductionist way - but leveraging the more black-box statistical neural network to make the search actually efficient, and qualitatively more useful. Is this an emerging paradigm?

I used to have a lot of sympathy for the Chomsky argument, and thought Norvig et al. [the machine learning community] could be accused of talking up a more prosaic 'applied ML' agenda into being more scientifically worthwhile than it actually was.

But I think systems like this are evidence that gradual, incremental, improvement of working statistical systems, can eventually yield more powerful reductionist/logical systems overall. I'd love to hear an opposing perspective from someone in the Chomsky camp, in the context of systems like this. (Which I am hopefully not strawmanning here.)

[0] Norvig's article: http://norvig.com/chomsky.html

mdip 10 years ago |

This looks fantastic. I've been fascinated with parsers ever since I got into programming in my teens (almost always centered around programming language parsing).

Curious - The parsing work I've done with programming languages was never done via machine learning, just the usual strict classification rules (which are used to parse ... code written to a strict specification). I'm guessing source code could be fed as data to an engine like this as a training model but I'm not sure what the value would be. Does anyone more experienced/smarter than me have any insights on something like that?

As a side-point:

Parsey McParseface - Well done. They managed to lob a gag over at NERC (Boaty McBoatface) and let them know that the world won't end because a product has a goofy name. Every time Google does things like this they send an unconscious reminder that they're a company that's 'still just a bunch of people like our users'. They've always been good at marketing in a way that keeps that "touchy-feely" sense about them and they've taken a free opportunity to get attention for this product beyond just the small circle of programmers.

As NERC found out, a lot of people paid attention when the winning name was Boaty McBoatface (among other, more obnoxious/less tasteful choices). A story about a new ship isn't going to hit the front page of any general news site normally and I always felt that NERC missed a prime opportunity to continue with that publicity and attention. It became a topic talked about by friends of mine who would otherwise have never paid attention to anything science related. It would have been comical, should Boaty's mission turn up a major discovery, to hear 'serious newscasters' say the name of the ship in reference to the breakthrough. And it would have been refreshing to see that organization stick to the original name with a "Well, we tried, you spoke, it was a mistake to trust the pranksters on the web but we're not going to invoke the 'we get the final say' clause because that wasn't the spirit of the campaign. Our bad."

Someone 10 years ago |

For those wondering: the license appears to be Apache 2.0 (https://github.com/tensorflow/models)

syncro 10 years ago |

Dockerized version so you can try it without installing:

https://hub.docker.com/r/brianlow/syntaxnet-docker/

TeMPOraL 10 years ago |

> Humans do a remarkable job of dealing with ambiguity, almost to the point where the problem is unnoticeable; the challenge is for computers to do the same. Multiple ambiguities such as these in longer sentences conspire to give a combinatorial explosion in the number of possible structures for a sentence.

Isn't the core observation about natural language that humans don't parse it at all? Grammar is a secondary, derived construct that we use to give language some stability; I doubt anyone reading "Alice drove down the street in her car" actually parsed the grammatical structure of that sentence, either explicitly or implicitly.

Anyway, some impressive results here.

ohitsdom 10 years ago |

I'm sure it's only a matter of time before someone puts this online in a format easily played with. Looking forward to that

xigency 10 years ago | |

It's already available here - https://github.com/tensorflow/models/tree/master/syntaxnet

    echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh

    Input: Bob brought the pizza to Alice .
    Parse:
    brought VBD ROOT
     +-- Bob NNP nsubj
     +-- pizza NN dobj
     |   +-- the DT det
     +-- to IN prep
     |   +-- Alice NNP pobj
     +-- . . punct
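
If you want to do something with that output programmatically, here is a minimal sketch that turns the ASCII tree into (head, dependent, label) edges. It assumes only what is visible above: each line holds `word TAG label`, and children are marked with `+--` further to the right than their parent. The `tree_to_edges` helper is hypothetical and not part of SyntaxNet.

```python
DEMO_OUTPUT = """\
brought VBD ROOT
 +-- Bob NNP nsubj
 +-- pizza NN dobj
 |   +-- the DT det
 +-- to IN prep
 |   +-- Alice NNP pobj
 +-- . . punct
"""


def tree_to_edges(text):
    """Return (head_word, dependent_word, label) triples from an ASCII tree."""
    edges = []
    stack = []  # (column of the "+--" marker, word); the root has column -1
    for line in text.splitlines():
        if not line.strip():
            continue
        col = line.find("+--")
        token = line[col + 3:] if col >= 0 else line
        word, _tag, label = token.split()
        # Anything at the same or deeper column cannot be this node's parent.
        while stack and stack[-1][0] >= col:
            stack.pop()
        if stack:
            edges.append((stack[-1][1], word, label))
        stack.append((col, word))
    return edges


edges = tree_to_edges(DEMO_OUTPUT)
for head, dep, label in edges:
    print(f"{head} --{label}--> {dep}")
```

Using the marker column rather than a fixed indent width keeps the sketch tolerant of slightly uneven indentation in pasted output.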

ohitsdom 10 years ago | | |

I mean fully online, where I don't have to download and setup tensorflow.

bpodgursky 10 years ago | |

I have a visualizer for CoreNLP that's OSS, would be easy to adapt: http://nlpviz.bpodgursky.com/

whitten 10 years ago | | |

Thank you for sharing this. Do you think it can handle multiple parses ?

rspeer 10 years ago |

I'm glad they point out that we need to move on from Penn Treebank when measuring the performance of NLP tools. Most communication doesn't sound like the Penn Treebank, and the decisions that annotators made when labeling Penn Treebank shouldn't constrain us forever.

Too many people mistake "we can't make taggers that are better at tagging Penn Treebank" for "we can't make taggers better", when there are so many ways that taggers could be improved in the real world. I look forward to experimenting with Parsey McParseface.

weinzierl 10 years ago |

Say I wanted to use this for English text with a large amount of jargon. Do I have to train my own model from scratch, or is it possible to retrain Parsey McParseface?

How expensive is it to train a model like Parsey McParseface?

wodenokoto 10 years ago | |

Training the model is basically free. Obtaining annotated text for your domain can be quite expensive if you hire a trained linguist.

escap 10 years ago | | |

>Training the model is basically free if you have a few GPUs and don't pay electricity

scarface74 10 years ago |

I started working on a parser as a side project that could parse simple sentences, create a knowledge graph, and then you could ask questions based on the graph. I used http://m.newsinlevels.com at level 1 to feed it news articles and then you could ask questions.

It worked pretty well but I lost interest once I realized I would have to feed it tons of words. So could I use this to do something similar?

What programming language would I need to use?

zodiac 10 years ago | |

Is your work available somewhere? I would love to play with something like it.

scarface74 10 years ago | | |

It was a hack in C# with no unit tests; it never got past the conceptual stage.

jventura 10 years ago |

As someone who has published work in the NLP area, I always take claimed results with a grain of salt. That said, I still have to read the paper to learn the implementation details, although my problem with generic linguistic approaches, as this one seems to be, is that they are usually hard to "port" to other languages.

For instance, the way they parse sequences of words may or may not be too specific to the English language. It is somewhat similar to what we call "overfitting" in the data-mining area, and it may invalidate this technique for other languages.

When I worked in this area (up to 2014), I worked mainly on language-independent statistical approaches. As with everything, this has its trade-offs: you can extract information from more languages but, in general, with less certainty.

But in general, it is good to see that the NLP area is still alive somewhere, as I can't seem to find any NLP jobs where I live! :)

Edit: I've skimmed it, and it is based on a neural network, so in theory, if it were trained on other languages, it could return good enough results as well. It is normal for English/American authors to include only English datasets, but I would like to see an application to another language. This is a very specialized domain of knowledge, so I'm quite limited in my analysis.

wodenokoto 10 years ago | |

It's not particularly hard to port NLP to other languages when you use these methods. You are mostly limited by tagged corpora.

NLP is very much alive and well.

wodenokoto 10 years ago | |

They trained an expanded version of Parsey McParseface on CoNLL 09, which includes a bunch of languages, and it performs very well too.

Look at the March 2016 paper they cite.

the_decider 10 years ago |

According to their paper (http://arxiv.org/pdf/1603.06042v1.pdf), the technique can also be applied to sentence compression. It would be cool if Google publishes that example code/training-data as well.

neves 10 years ago |

Shouldn't the title be renamed to "The World's Most Accurate Natural Language Parser For English"?

It's striking how Google's natural language features, starting with the simple spell check, degrade when they work with languages other than English.

zodiac 10 years ago |

> It is not uncommon for moderate length sentences - say 20 or 30 words in length - to have hundreds, thousands, or even tens of thousands of possible syntactic structures.

Does "possible" mean "syntactically valid" here? If so I'd be interested in a citation for it.

Also, I wonder what kind of errors it makes with respect to the classification in http://nlp.cs.berkeley.edu/pubs/Kummerfeld-Hall-Curran-Klein...
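
For a rough feel of the growth, one simple count is the number of distinct binary bracketings of an n-word sentence, which is the Catalan number C(n-1). This is only a sketch of the combinatorics under that assumption: it counts unlabeled binary trees, not grammatically valid parses, and it is not claimed to be how the blog post arrived at its figures.

```python
from math import comb


def binary_bracketings(n_words):
    """Number of distinct binary trees over n_words leaves: Catalan(n_words - 1)."""
    if n_words < 1:
        raise ValueError("need at least one word")
    n = n_words - 1
    return comb(2 * n, n) // (n + 1)


for n_words in (5, 10, 20, 30):
    print(f"{n_words:2d} words: {binary_bracketings(n_words)} bracketings")
```

A grammar rules out most of these bracketings, so the number of syntactically valid structures is far smaller; how much smaller is exactly the question asked above.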

joosters 10 years ago |

I don't see how a linguistic parser can cope with all the ambiguities in human speech or writing. It's more than a problem of semantics, you also have to know things about the world in which we live in order to make sense of which syntactic structure is correct.

e.g. take a sentence like "The cat sat on the rug. It meowed." Did the cat meow, or did the rug meow? You can't determine that by semantics, you have to know that cats meow and rugs don't. So to parse language well, you need to know an awful lot about the real world. Simply training your parser on lots of text and throwing neural nets at the code isn't going to fix this problem.

aaron-santos 10 years ago |

I'd love to see the failure modes especially relating to garden path sentences. [1]

[1] - https://en.wikipedia.org/wiki/Garden_path_sentence

xigency 10 years ago | |

Statistical models for parsing are almost guaranteed to fail at parsing garden path sentences.

mindcrash 10 years ago |

Anyone planning (or already busy) training Parsey with one of the alternative Treebanks available from Universal Dependencies [1]? Would love to know your results when you have any :)

I am personally looking for a somewhat reliable NLP parser which can handle Dutch at the moment. Preferably one which can handle POS tagging without hacking it in myself.

[1] http://universaldependencies.org/

hartator 10 years ago |

> At Google, we spend a lot of time thinking about how computer systems can read and understand human language in order to process it in intelligent ways.

There are 6 links in this sentence in the original text. I get that it can help to give more context, but I think it's actually making the text harder to "human" parse. It also feels like they have hired a cheap SEO consultant to do some backlink integrations.

RichieAHB 10 years ago | |

Is this a joke? SEO consultant at Google? Backlinks? It's just a simple way to give it context, and I barely noticed the links.

Can't work out if this is a joke.

jdp23 10 years ago |

Parsey McParseface is a great name.

infogulch 10 years ago | |

A reference to the new polar ship that was publicly voted to be named 'Boaty McBoatface', but the science minister is too stuck up to roll with it[0].

[0]: https://news.ycombinator.com/item?id=11642618

djsumdog 10 years ago | | |

They did name one of them Boaty..the tiny one / unmanned sensor drone. :-/

Vote for boaty! He's keepin hope afloat!

http://shirt.woot.com/offers/vote-for-boaty

dharbin 10 years ago | |

Xey McXface is a canary for detecting naming contests within an organization.

jdright 10 years ago | | |

And also a good template for naming things! Thanks.

TeMPOraL 10 years ago | |

I wonder what's with that X McXface thing. Yes, I've heard something about Boaty McBoatface (whatever that is), but is the Boatface thing original, or is it some kind of obscure American meme?

ChrisClark 10 years ago | | |

It wasn't the original, and yes there is a sort of meme like it. Not always -face though. I've heard Beardy McBeardson; the pattern is usually X-y Mc-X-something.

solipsism 10 years ago | | |

Since Boaty McBoatface is a British ship, why would you assume it's an obscure American meme?

Just kidding! As an American, I can say it's a decent assumption :D

chubot 10 years ago | |

Yeah I noticed that :) Is this a Simpsons reference? Can't quite place it.

chipperyman573 10 years ago | | |

Nope, Boaty McBoatface[0]

[0] http://www.theatlantic.com/international/archive/2016/05/boa...

UK-AL 10 years ago | | |

Haven't heard about Boaty McBoatface?

koide 10 years ago | | |

Google for Boaty McBoatFace

sourcd 10 years ago |

What would it take to build something like "wit.ai" using SyntaxNet ? i.e. to extract "intent" & related attributes from a sentence e.g.

Input : "How's the weather today"

Output : {"intent":"weather", "day":"Use wit-ai/duckling", "location":"..."}

amelius 10 years ago |

How would you feed a sentence to a neural net? As I understand, the inputs are usually just floating point numbers in a small range, so how is the mapping performed? And what if the sentence is longer than the number of input neurons? Can that even happen, and pose a problem?

vhold 10 years ago | |

What wodenokoto said, and also look up "Word Embeddings", word2vec is a popular method.

https://www.tensorflow.org/versions/r0.7/tutorials/word2vec/...

There's a bunch of blogs, tutorials, etc, around word2vec and other methods of generating vectors from a training set of words.

Also, in the TensorFlow models codebase where this SyntaxNet code lives, there is another TensorFlow-based method of generating word embeddings, with demonstration code, called Swivel:

https://github.com/tensorflow/models/tree/master/swivel
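
To show what an embedding lookup does mechanically, here is a minimal sketch. The vectors are random stand-ins, since real ones would be learned by word2vec, Swivel, or similar; the helper names and the `<UNK>` convention are assumptions made for illustration.

```python
import random


def build_embeddings(vocab, dim, seed=0):
    """Map each word to a small dense vector (random here, learned in practice)."""
    rng = random.Random(seed)
    table = {w: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for w in vocab}
    table["<UNK>"] = [0.0] * dim  # shared vector for out-of-vocabulary words
    return table


def embed(sentence, table):
    """Turn a sentence of any length into a list of fixed-size vectors."""
    return [table.get(w, table["<UNK>"]) for w in sentence.lower().split()]


vocab = ["how's", "the", "weather", "today"]
table = build_embeddings(vocab, dim=4)
vectors = embed("How's the weather today", table)

print(len(vectors), "vectors of size", len(vectors[0]))
```

Each word becomes a short list of floats regardless of vocabulary size, and a longer sentence simply yields more vectors rather than a wider input.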

wodenokoto 10 years ago | |

One hot vectors. You build a dictionary of all words + one catch all for unknown words. Each word then has a position in a sparse vector.

So for example :

    Yes = (0,0,1,0,0,...)
    No  = (0,0,0,1,0,...)

Convolutional and recurrent nets can handle inputs of arbitrary lengths.
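
A minimal sketch of that dictionary-plus-catch-all scheme, assuming a tiny made-up vocabulary; the `<UNK>` name and both helper functions are hypothetical.

```python
def build_vocab(words):
    """Assign each distinct word an index; index 0 is the catch-all for unknowns."""
    vocab = {"<UNK>": 0}
    for w in words:
        vocab.setdefault(w, len(vocab))
    return vocab


def one_hot(word, vocab):
    """Return a vector with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab.get(word, vocab["<UNK>"])] = 1
    return vec


vocab = build_vocab(["maybe", "yes", "no"])

print("yes     ->", one_hot("yes", vocab))
print("no      ->", one_hot("no", vocab))
print("unknown ->", one_hot("perhaps", vocab))
```

With a realistic vocabulary these vectors have tens of thousands of positions, which is why they are usually stored as a single index rather than a full list.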

zodiac 10 years ago | |

I think Syntaxnet is just using the NN to guide search instead of doing end-to-end parsing. That said, you can feed "sequences of stuff" into recurrent neural nets! See

https://papers.nips.cc/paper/5346-sequence-to-sequence-learn...
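
To illustrate what "guiding search" could look like, here is a minimal sketch of an arc-standard transition system (the SyntaxNet test list elsewhere in this thread mentions arc-standard transitions). The parser only knows three moves; something else has to pick the move at each step. In a real system that chooser would be a trained network, possibly with beam search; here a scripted list of actions stands in for it, and all function names are hypothetical.

```python
def parse(words, choose_action):
    """Run arc-standard transitions; return (root_index, [(head, dependent), ...])."""
    stack, buffer, arcs = [], list(range(len(words))), []
    while buffer or len(stack) > 1:
        legal = []
        if buffer:
            legal.append("SHIFT")
        if len(stack) >= 2:
            legal += ["LEFT_ARC", "RIGHT_ARC"]
        action = choose_action(stack, buffer, legal)
        if action not in legal:
            raise ValueError(f"illegal action {action!r}")
        if action == "SHIFT":
            stack.append(buffer.pop(0))   # move next word onto the stack
        elif action == "LEFT_ARC":
            dep = stack.pop(-2)           # second-from-top depends on top
            arcs.append((stack[-1], dep))
        else:                             # RIGHT_ARC
            dep = stack.pop()             # top depends on second-from-top
            arcs.append((stack[-1], dep))
    return stack[0], arcs


def scripted(actions):
    """Stand-in for a learned model: replay a fixed action sequence."""
    it = iter(actions)
    return lambda stack, buffer, legal: next(it)


words = ["Bob", "brought", "the", "pizza"]
script = ["SHIFT", "SHIFT", "LEFT_ARC", "SHIFT", "SHIFT", "LEFT_ARC", "RIGHT_ARC"]
root, arcs = parse(words, scripted(script))

print("root:", words[root])
for head, dep in arcs:
    print(f"{words[head]} -> {words[dep]}")
```

Swapping `scripted` for a function that scores the legal actions from the current stack and buffer gives the greedy version of the idea.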

w_t_payne 10 years ago |

Cool - I reckon I'm going to try to use it to build a "linter" for natural language requirements specifications. (I'm a bit sad like that).

WWKong 10 years ago |

Anyone know a tool that does Natural Language to SQL?

escap 10 years ago | |

http://quepy.machinalis.com/about.html

http://kueri.me/product/

Animats 10 years ago |

This could lead to a fun WordPress plug-in. All postings must be parsable by this parser.

Surprisingly, this thing is written in C++.

zem 10 years ago |

one interesting use i can think of is new improved readability scores that can take into account words that are common or uncommon depending on part of speech. (e.g. a text that used "effect" as a noun would be lower-level than one that used "effect" as a verb)

vicaya 10 years ago |

    1. WordNet
    2. ImageNet
    3. SyntaxNet

    ...

    n. SkyNet

instakill 10 years ago |

What are some use cases for this for hobbyists?

degenerate 10 years ago |

I'd love to let this loose on the comments section of worldstarhiphop or liveleak and see what it comes up with...

bertan 10 years ago |

Parsey McParseface <3

jweir 10 years ago |

Parsey McParseface? Nice touch Google.

https://github.com/tensorflow/models/tree/master/syntaxnet/s...

scriptle 10 years ago |

Did I just read it as Skynet ?

PaulHoule 10 years ago |

Meh.

This kind of parser isn't all that useful anyway. Parts of speech are one of those things people use to talk about language with, but you don't actually use them to understand language.

corin_ 10 years ago | |

You do subconsciously. When you read "Dave punched John" you don't need to think "hmmm, is 'John' the object or the subject?" but if your brain hasn't figured out which is the object and which the subject you won't know who is hitting who.

You don't need to be able to name or define parts of speech but you need to be able to parse them, or you won't understand anything.

TeMPOraL 10 years ago | | |

Yeah, except are you sure that the way you parse a sentence maps 1:1 to what you learned as "parts of speech"? I think an equivalent understanding, one that seems to be more intuitive to the way my own brain works (if I can believe introspection), is that "punched" is to be read by default as "--punched-->", and "was/is punched by" pattern-matches to "<--punched--". The arrow denotes who's punching whom.
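
A toy sketch of that arrow idea, assuming only three-word active sentences and "X was/is VERB by Y" passives; the regular expressions and the `who_does_what` helper are made up for illustration and would not survive real text.

```python
import re

# "John was punched by Dave" reads as John <--punched-- Dave
PASSIVE = re.compile(r"^(\w+) (?:was|is) (\w+) by (\w+)$")
# "Dave punched John" reads as Dave --punched--> John
ACTIVE = re.compile(r"^(\w+) (\w+) (\w+)$")


def who_does_what(sentence):
    """Return (agent, verb, patient), or None if neither pattern matches."""
    s = sentence.strip().rstrip(".")
    m = PASSIVE.match(s)
    if m:
        patient, verb, agent = m.groups()
        return (agent, verb, patient)
    m = ACTIVE.match(s)
    if m:
        agent, verb, patient = m.groups()
        return (agent, verb, patient)
    return None


for text in ("Dave punched John.", "John was punched by Dave."):
    agent, verb, patient = who_does_what(text)
    print(f"{agent} --{verb}--> {patient}")
```

Both sentences come out as the same arrow, without ever naming a subject or an object.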