Announcing SyntaxNet: The World’s Most Accurate Natural Language Parser (googleresearch.blogspot.com)
Input : "How's the weather today"
Output : {"intent":"weather", "day":"Use wit-ai/duckling", "location":"..."}
https://www.tensorflow.org/versions/r0.7/tutorials/word2vec/...
There's a bunch of blogs, tutorials, etc, around word2vec and other methods of generating vectors from a training set of words.
Also, in the tensorflow models codebase where this syntaxnet code lives, there is another TensorFlow-based method of generating word embeddings, with demonstration code, called Swivel.
So for example :
Yes = (0,0,1,0,0,...) No = (0,0,0,1,0,....)
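A minimal sketch of the one-hot encoding in that example, in Python. The five-word vocabulary is made up and ordered so that "yes" and "no" land in the same positions as the vectors above; a real vocabulary would be far larger.

```python
# Hypothetical toy vocabulary (an assumption for illustration only).
VOCAB = ["maybe", "hello", "yes", "no", "goodbye"]
INDEX = {word: i for i, word in enumerate(VOCAB)}


def one_hot(word):
    """Return a vector with a single 1 at the word's vocabulary index."""
    vec = [0] * len(VOCAB)
    vec[INDEX[word]] = 1
    return vec


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


yes, no = one_hot("yes"), one_hot("no")
print(yes)           # [0, 0, 1, 0, 0]
print(no)            # [0, 0, 0, 1, 0]
print(dot(yes, no))  # 0
```

The dot product of any two different one-hot vectors is 0, so this encoding says nothing about how similar two words are. That is the gap that learned embeddings such as word2vec are meant to fill.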
Convolutional and recurrent nets can handle inputs of arbitrary lengths.
https://papers.nips.cc/paper/5346-sequence-to-sequence-learn...
Surprisingly, this thing is written in C++.
1. WordNet
2. ImageNet
3. SyntaxNet
...
n. SkyNet
https://github.com/tensorflow/models/tree/master/syntaxnet/s...
This kind of parser isn't all that useful anyway. Parts of speech are one of those things people use to talk about language with, but you don't actually use them to understand language.
You don't need to be able to name or define parts of speech but you need to be able to parse them, or you won't understand anything.
What kind of parser is it?
An English parser.
... :)
You know, it's a brilliant name. It's a person's name, like Siri or Cortana. But if they'd named it Jane or John, the mass media would have a fit. So they side-stepped that by making it sound computer-y, while remaining a proper name. Brilliant.
#otherlanguagesmatters
According to the section titled "Why is Parsing So Hard For Computers to Get Right?" in the blog post, determining the difference between "An English parser" and "An English parser" .. is Parsey's great strength.
Five stars, would pun again.
Why would they have a fit if Google named it Jane or John?
I am so glad to be rid of her vapid nonsense.
It is a silly name and makes me think it is not a serious effort. But names can be deceiving.
Doubt it had any meaning other than the team just having fun making a quip at the Boaty McBoatface poll for a ship's name.
Disclaimer: I'm not a linguist, but by layman's standards I'm pretty confident.
Seriously though, this is just a joke on an Internet meme. Quit being a buzzkill.
(INB4 yes, that's the reference to an old joke about Lufthansa pilot in Frankfurt complaining he's forced to speak in English to the control tower)
Honestly though, English is a quite good starting language due to its simplicity, but I do hope more attention will be expended on making the technology like this trainable on other languages as well.
I also wish middle management were a bit less afraid of such things. You can usually get cool ideas approved in the higher echelons (because a CEO doesn't have to fear too much, or possibly because these people actually are more imaginative).
Names that have meaning or tell a story are incredibly useful for marketing, even if they sometimes sound unprofessional. E.g.: 'Plan B' (morning after pill), 'CockroachDB', 'Virgin'.
It's beyond me how anyone could have chosen the predictable public outcry instead of naming that boat Boaty McBoatface. That's probably the least offending name that ever resulted from an internet poll.
If Helpy McHelpdesk got into widespread use, it'd probably soon become more referred to as the HMH.
Indeed. e.g. http://knowyourmeme.com/memes/events/dub-the-dew
Remotey Remoteface?
"On a standard benchmark consisting of randomly drawn English newswire sentences (the 20 year old Penn Treebank), Parsey McParseface recovers individual dependencies between words with over 94% accuracy, beating our own previous state-of-the-art results, which were already better than any previous approach."
From the original paper, "Our model achieves state-of-the-art accuracy on all of these tasks, matching or outperforming LSTMs while being significantly faster. In particular for dependency parsing on the Wall Street Journal we achieve the best-ever published unlabeled attachment score of 94.41%."
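For readers unfamiliar with the metric: the unlabeled attachment score is the fraction of tokens whose predicted head word matches the gold-standard head. A minimal sketch in Python, assuming a toy seven-token sentence and a made-up prediction with one wrong attachment. Evaluation conventions such as whether punctuation is counted may differ between papers and are not modelled here.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Fraction of tokens whose predicted head index equals the gold head index."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted must cover the same tokens")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Tokens: 1 Bob, 2 brought, 3 the, 4 pizza, 5 to, 6 Alice, 7 "."
# Each entry is the index of that token's head; 0 means ROOT.
gold = [2, 0, 4, 2, 2, 5, 2]
# Hypothetical parser output that attaches "to" to "pizza" instead of "brought".
pred = [2, 0, 4, 2, 4, 5, 2]

uas = unlabeled_attachment_score(gold, pred)
print(f"UAS = {uas:.2%}")  # UAS = 85.71%
```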
This seems like a narrower standard than described, specifically being better at parsing the Penn Treebank than the best natural language parser for English on the Wall Street Journal.
The statistics listed on the project GitHub actually contradict these claims by showing the original March 2016 implementation has higher accuracy than Parsey McParseface.
There is a simplified educational 200 lines python version [2] of it. It claims 96.8% for the WSJ corpus.
What am I missing here?
[1] https://news.ycombinator.com/item?id=8942783
[2] https://spacy.io/blog/part-of-speech-pos-tagger-in-python
I've been watching the line of research in SyntaxNet closely, and have been steadily working on replacing spaCy's averaged perceptron model with a neural network model. This is one of the main differences between spaCy and Parsey McParseface.
The key advantage of the neural network is that it lets you take advantage of training on lots and lots more text, in a semi-supervised way. In a linear model, you grow extra parameters when you do this. The neural network stays the same size --- it just gets better. So, you can benefit from reading the whole web into the neural network. This only works a little bit in the linear model, and it makes the resulting model enormous.
Another difference is that spaCy is trained on whole documents, while P. McP. is trained in the standard set-up, using gold pre-processing. I speculate this will reduce the gap between the systems in a more realistic evaluation. Of course, P. McP. can do the joint training too if they choose to. I've reached out to see whether they're interested in running the experiment: https://github.com/tensorflow/models/issues/65
Just installed syntaxnet - tests passed in the following setup.
https://gist.github.com/Hendler/61831e411069815ee4ed490f553f...
INFO: Elapsed time: 908.048s, Critical Path: 640.26s
//syntaxnet:arc_standard_transitions_test PASSED in 0.0s
//syntaxnet:beam_reader_ops_test PASSED in 20.9s
//syntaxnet:graph_builder_test PASSED in 16.3s
//syntaxnet:lexicon_builder_test PASSED in 1.8s
//syntaxnet:parser_features_test PASSED in 0.0s
//syntaxnet:parser_trainer_test PASSED in 46.1s
//syntaxnet:reader_ops_test PASSED in 5.7s
//syntaxnet:sentence_features_test PASSED in 0.0s
//syntaxnet:shared_store_test PASSED in 0.5s
//syntaxnet:tagger_transitions_test PASSED in 0.0s
//syntaxnet:text_formats_test PASSED in 1.7s
//util/utf8:unicodetext_unittest PASSED in 0.0s
Some other notes:
Also using Keras with Theano. Before spaCy, StanfordNLP, Freeling, and/or NLTK.
> better at parsing the Penn Treebank than the best natural language parser for English on the Wall Street Journal
I'm pretty sure "the 20 year old Penn Treebank" and "the Wall Street Journal" are referring to the same dataset here. In the early 1990s the first large treebanking efforts were on a corpus from the WSJ, and they were released as the Penn Treebank: https://catalog.ldc.upenn.edu/LDC95T7 People report results on this dataset because that's what the field has been testing on (and overfitting to) for decades. (I worked on a successor project, OntoNotes, that involved additional treebank annotation on broader corpora: https://catalog.ldc.upenn.edu/LDC2013T19)
The point about overfitting is valid, too, which is another reason why this "most accurate such model in the world" claim is obnoxious.
It's also fair to note that their advance is in fractions of percentage points on this specific dataset over models that are 5-10 years older.
So you're referring to this LSTM?
"Andor et al. (2016)* is simply a SyntaxNet model with a larger beam and network. For futher information on the datasets, see that paper under the section "Treebank Union"."
After spending a few months hand coding a NLP parser, am rather intrigued by LSTM. I like the idea of finding coefficients, as opposed to juggling artificial labels.
This is an opensourcing of the March 2016 method (syntaxnet, note that in the paper there are results from several trained models) as well as a trained model that is comparable in performance but faster (macparseface).
It is very hard to separate those two things from the way they write.
The blog post says this can be used as a building block for natural language understanding applications. Does anyone have examples of how that might work? Parse trees are cool to look at, but what can I do with them?
For instance, let's say I'm interested in doing text classification. I can imagine that the parse tree would convey more semantic information than just a bag of words. Should I be turning the edges and vertices of the tree into a feature vectors somehow? I can think of a few half-baked ideas off the top of my head, but I'm sure other people have already spent a lot of time thinking about this, and I'm wondering if there are any "best practices".
The problem is that while I have knowledge and experience in the computer vision side of machine learning, I lack experience in NLP. And to the best of my knowledge NLP as a field has not come as far as vision, to the extent that such an automated editor would have too many mistakes. To be student facing it would need to be really accurate. On top of that it wouldn't be dealing with well formed input. The input by definition is adversarial. So unlike SyntaxNet which is built to deal with comprehensible sentences, this tool would need to deal with incomprehensible sentences. According to the link, SyntaxNet only gets 90% accuracy on random sentences from the web.
That said, I might give SyntaxNet a try. The idea would be to use SyntaxNet to extract meaning from a broken sentence, and then work backwards from the meaning to identify how the sentence can be modified to better match that meaning.
Thank you Google for contributing this tool to the community at large.
`echo 'Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo' | syntaxnet/demo.sh`
buffalo NN ROOT
+-- buffalo NN nn
| +-- Buffalo NNP nn
| | +-- Buffalo NNP nn
| | +-- buffalo NNP nn
| +-- buffalo NN nn
+-- Buffalo NNP nn
+-- buffalo NNP nn
[1]: https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffal...
Bison from Buffalo, which bison from Buffalo bully, themselves bully bison from Buffalo.
Bison NNP ROOT
+-- from IN prep
| +-- Buffalo NNP pobj
+-- , , punct
+-- bison NN ccomp
| +-- themselves PRP nsubj
| +-- bully RB advmod
| +-- from IN prep
| +-- Buffalo NNP pobj
+-- . . punct
Analysis of the structure of a piece of text is the first step to understanding its meaning. IBM are doing some good work in this area. http://www.alchemyapi.com/products/demo/alchemylanguage
Anything in the pipeline for this project to help with classifying sentiment, emotion etc. from text?
[0] http://www.amazon.com/Doing-Grammar-Max-Morenberg/dp/0199947...
My understanding is that Chomsky was against statistical approaches to AI, as being scientifically un-useful - eventual dead ends, which would reach a certain accuracy, and plateau - as opposed to the purer logic/grammar approaches, which reductionistically/generatively decompose things into constituent parts, in some interpretable way, which is hence more scientifically valuable, and composable - easier to build on.
But now we're seeing these very successful blended approaches, where you've got a grammatical search, which is reductionist, and produces an interpretable factoring of the sentence - but its guided by a massive (comparatively uninterpretable) neural net.
It's like AlphaGo - which is still doing search, in a very structured, rule based, reductionist way - but leveraging the more black-box statistical neural network to make the search actually efficient, and qualitatively more useful. Is this an emerging paradigm?
I used to have a lot of sympathy for the Chomsky argument, and thought Norvig et al. [the machine learning community] could be accused of talking up a more prosaic 'applied ML' agenda into being more scientifically worthwhile than it actually was.
But I think systems like this are evidence that gradual, incremental, improvement of working statistical systems, can eventually yield more powerful reductionist/logical systems overall. I'd love to hear an opposing perspective from someone in the Chomsky camp, in the context of systems like this. (Which I am hopefully not strawmanning here.)
[0] Norvig's article: http://norvig.com/chomsky.html
Curious - The parsing work I've done with programming languages was never done via machine learning, just the usual strict classification rules (which are used to parse ... code written to a strict specification). I'm guessing source code could be fed as data to an engine like this as a training model but I'm not sure what the value would be. Does anyone more experienced/smarter than me have any insights on something like that?
As a side-point:
Parsey McParseface - Well done. They managed to lob a gag over at NERC (Boaty McBoatface) and let them know that the world won't end because a product has a goofy name. Every time Google does things like this they send an unconscious reminder that they're a company that's 'still just a bunch of people like our users'. They've always been good at marketing in a way that keeps that "touchy-feely" sense about them and they've taken a free opportunity to get attention for this product beyond just the small circle of programmers.
As NERC found out, a lot of people paid attention when the winning name was Boaty McBoatface (among other, more obnoxious/less tasteful choices). A story about a new ship isn't going to hit the front page of any general news site normally and I always felt that NERC missed a prime opportunity to continue with that publicity and attention. It became a topic talked about by friends of mine who would otherwise have never paid attention to anything science related. It would have been comical, should Boaty's mission turn up a major discovery, to hear 'serious newscasters' say the name of the ship in reference to the breakthrough. And it would have been refreshing to see that organization stick to the original name with a "Well, we tried, you spoke, it was a mistake to trust the pranksters on the web but we're not going to invoke the 'we get the final say' clause because that wasn't the spirit of the campaign. Our bad."
Isn't the core observation about natural language that humans don't parse it at all? Grammar is a secondary, derived construct that we use to give language some stability; I doubt anyone reading "Alice drove down the street in her car" actually parsed the grammatical structure of that sentence, either explicitly or implicitly.
Anyway, some impressive results here.
echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh
Input: Bob brought the pizza to Alice .
Parse:
brought VBD ROOT
+-- Bob NNP nsubj
+-- pizza NN dobj
| +-- the DT det
+-- to IN prep
| +-- Alice NNP pobj
+-- . . punct
Too many people mistake "we can't make taggers that are better at tagging Penn Treebank" for "we can't make taggers better", when there are so many ways that taggers could be improved in the real world. I look forward to experimenting with Parsey McParseface.
How expensive is it to train a model like Parsey McParseface?
It worked pretty well but I lost interest once I realized I would have to feed it tons of words. So could I use this to do something similar?
What programming language would I need to use?
For instance, the way they parse sequences of words may or may not be too specific to the English language. It is somewhat similar to what we call "overfitting" in the data-mining area, and it may invalidate this technique for other languages.
When I worked in this area (up to 2014), I worked mainly on language-independent statistical approaches. As with everything, there is a trade-off: you can extract information from more languages but, in general, with less certainty.
But in general, it is good to see that the NLP area is still alive somewhere, as I can't seem to find any NLP jobs where I live! :)
Edit: I've skimmed it, and it is based on a neural network, so in theory, if it were trained on other languages, it could return good enough results as well. It is normal for English/American authors to include only English datasets, but I would like to see an application to another language. This is a very specialized domain of knowledge, so I'm quite limited in my analysis.
NLP is very much alive and well.
Look at the March 2016 paper they cite.
It's striking how much Google's natural language features, even the simpler ones like spell check, degrade when they work with languages other than English.
Does "possible" mean "syntactically valid" here? If so I'd be interested in a citation for it.
Also, I wonder what kind of errors it makes with respect to the classification in http://nlp.cs.berkeley.edu/pubs/Kummerfeld-Hall-Curran-Klein...
e.g. take a sentence like "The cat sat on the rug. It meowed." Did the cat meow, or did the rug meow? You can't determine that by semantics, you have to know that cats meow and rugs don't. So to parse language well, you need to know an awful lot about the real world. Simply training your parser on lots of text and throwing neural nets at the code isn't going to fix this problem.
I am personally looking for a somewhat reliable NLP parser which can handle Dutch at the moment. Preferably one which can handle POS tagging without hacking it in myself.
There are 6 links in this sentence in the original text. I get that they can help provide more context, but I think they actually make the text harder to "human" parse. It also feels like they hired a cheap SEO consultant to do some backlink integrations.
Can't work out if this is a joke.
Vote for boaty! He's keepin hope afloat!
Just kidding! As an American, I can say it's a decent assumption :D
[0] http://www.theatlantic.com/international/archive/2016/05/boa...
I could see Parsey McParseface helping to identify patterns in literature contemporaneous to the biblical texts. Certain idiomatic uses of syntax, which would have been obvious to the original readers, could be identified much more quickly.
There are a ton of Jewish Idioms in the Bible that many don't understand at all, including "No man knows the day or the hour" which is a traditional Jewish Wedding Idiom. Lots and lots of things could be explored with enough data and resources.
I once interned for a company that's been doing this for years. They have all kinds of features tracing individual words through various different languages, etc.
Here's an example of using information from a syntactic parser to decorate words, and create an enhanced bag-of-words model: https://spacy.io/demos/sense2vec
Here's a very terse explanation of using them in a rule-based way: https://spacy.io/docs/tutorials/syntax-search
http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf
They are useful as a preprocessing step for a lot of downstream NLP tasks. It shouldn't be hard to find more papers that take advantage of the tree structure of language.
One really simple and obvious thing is word sense disambiguation. Plenty of homonyms are different parts of speech (e.g. the verb "lead" and the noun "lead"). I'm sure there's lots of more sophisticated stuff you can do as well, but this might be the lowest-hanging fruit.
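A minimal sketch of that idea in Python: decorating each token with its part-of-speech tag before counting, so the verb "lead" and the noun "lead" become different features. The two tagged sentences are written by hand as a stand-in for real tagger output (an assumption), using Penn Treebank style tags.

```python
from collections import Counter

# Hand-tagged sentences standing in for the output of a real tagger.
tagged = [
    [("They", "PRP"), ("lead", "VBP"), ("the", "DT"), ("team", "NN")],
    [("The", "DT"), ("pipe", "NN"), ("contains", "VBZ"), ("lead", "NN")],
]


def plain_bag(sentences):
    """Ordinary bag of words: counts of lowercased word forms."""
    return Counter(word.lower() for sent in sentences for word, _ in sent)


def decorated_bag(sentences):
    """Bag of words where every feature is 'word|TAG' instead of just 'word'."""
    return Counter(
        f"{word.lower()}|{tag}" for sent in sentences for word, tag in sent
    )


plain = plain_bag(tagged)
decorated = decorated_bag(tagged)
print(plain["lead"])                                # 2
print(decorated["lead|VBP"], decorated["lead|NN"])  # 1 1
```

In the plain bag the two senses collapse into one count; in the decorated bag a downstream classifier can treat them separately.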
I use parse trees as a kind of "advanced language model" for when I need to replace a word in a sentence (see for example: http://www.aclweb.org/anthology/P13-1142 ), it's so much better than using just simple n-grams.
If a teacher gave students a grammar-checking tool to check their writing, they might assume that the tool knew better than they did, which is only sometimes true.
"Learn the rules like a pro, so you can break them like an artist."
-Picasso
And come to think of it, isn't there a saying: Did stupidity require smart computers or did smart computers allow for stupidity?
Intelligent software (or software that pretends to be intelligent) might allow any unqualified bloke to be in a position where they can teach.
Visiting noredink.com now ... I can't actually figure out what it is the site offers from a cursory glance, so I can't even begin to figure out whether it matches my idea and what issues it may have had that excluded it from her classroom.
I'll bounce grammarly off her later though, in case that's new.
As an aside, I don't think this is the optimal way to teach people how to write. What were the ideas in those papers? How were they organized? Do the student's arguments make sense? I think that's what most students spend most of their time thinking about when writing an essay, and it can be a bit demoralizing to see the teacher care just as much about whether the grammar was right. Most students can fix grammar mistakes relatively easily once they notice them anyway.
You would be surprised! Especially with people whose first languages aren't English. Having something to provide feedback on grammar problems early on would be really useful. Of course, once they get grammar down, the next step is rhythm and flow, as well as reducing redundancy, and the biggest problem, as you say, is always the story, but you have to peel through lots of grammar problems before you get to that point. I edit a lot of research papers for my Chinese peers (most have PhDs, I work in a China-based research lab, so that isn't weird), so I'm pretty clear on the problems.
[0]: http://grammarly.com
EDIT: but sure, this is only the first step and semantic parsing is far from solved
> This suggests that we are approaching human performance—but only on well-formed text.
It may fall down on exactly the bad writing you want to process. GIGO?
See e.g. http://visl.sdu.dk/~eckhard/pdf/TIL2006.pdf which gets 99% on POS and 96% on syntax function assignment – Constraint Grammar parsers are the state of the art of rule-based systems, and the well-developed ones beat statistical systems. CG's are also multitaggers – they don't assume a word has to have only one reading, it might actually be ambiguous, and in that case it shouldn't be further disambiguated (that's why they use F-scores instead of plain "accuracy").
CG's also require manual work, so it's not like you can download a corpus and unsupervisedly learn everything; but on the other hand, for what languages in the world do you have a large enough data set to unsupervisedly learn a good model? And for what training methods can you even get good models from unlabeled data? The set of languages for which there are large annotated corpora (especially treebanks) is even smaller … So CG's are also heavily used for lesser-resourced languages (typically in combination with finite state transducers for morphological analysis), where the lack of training data means it's a lot more cost-effective to write rules (and turn existing dictionaries into machine-readable FST's) than it is to create annotated training data (which would often involve OCR-ing texts, introducing yet another error source). CG writers still tend to have a very empirical mindset – no toy sentences like "put the cone on the block", but continual testing on any real-world text they can get their hands on.
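To make the F-score point above concrete: when a tagger is allowed to leave more than one reading on a token, plain accuracy is not well defined, so precision and recall over readings are used instead. A rough sketch in Python under that assumption. The tags and the tagger output are invented for illustration, and this is one common way to score such output, not necessarily the exact procedure in the linked paper.

```python
def precision_recall_f1(gold, predicted):
    """Score sets of readings per token; returns (precision, recall, F1)."""
    true_pos = sum(len(g & p) for g, p in zip(gold, predicted))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# One set of correct readings per token (hypothetical gold standard).
gold = [{"NN"}, {"VBZ"}, {"IN"}, {"NN"}]
# Hypothetical multitagger output: the second token is left ambiguous.
pred = [{"NN"}, {"VBZ", "NNS"}, {"IN"}, {"NN"}]

p, r, f1 = precision_recall_f1(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# precision=0.80 recall=1.00 f1=0.89
```

Leaving a token ambiguous keeps recall at 1.0 but costs precision, which is the trade-off an F-score captures and a single accuracy number hides.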
Both rule-based and statistical approaches are fundamentally flawed by not incorporating any real world information. Humans do language going from real world info, mapping to grammar or rules. Computers are trying to go the other way and are not going to succeed other than as mere toys.
Even tech progression-wise, both rule and model/nn approaches are really bad since there is no meaningful sense of iterative progress or getting better step by step, unlike cpu chips or memory speeds. They are more of a random search in a vast space, hit or miss, getting lucky or not, which is very bad no good, as a technology or as a career.
Now to get to 99.4%+ how about we combine techniques such as spaCy or Parsey McParseFace (love the name Google) with very simple real-world cognitive models. So for the example given "Alice drove down the street in her car." a simple cognitive model would _know_ that streets cannot be in cars and so be able to disambiguate. A cognitive model wouldn't know all the facts about the world, it would know certain things about streets, certain things about cars and be able to infer on the fly whether the relationship between streets and cars matches either the first parse possibility or the second. To me this seems like the obvious next step. If it's obvious to me it must have been obvious to someone else so presumably somebody is working on it.
I think both approaches are needed for general AI: neural networks, or something like them, for low level perception and recognition; and symbolic AI for higher level reasoning. Without the symbolic layer, you can't be sure what's going on.
Symbolic AI has been very closely guided by cognitive psychology. Artificial neural networks ignore neurophysiology, so even when they work, they tell us very little about how the brain works.
I keep hearing claims that symbolic AI is the wrong approach for anything, and that it failed. Yet there were quite a few successes (expert systems, discovery learning, common sense reasoning, for example) before sources of funding dried up.
That is completely wrong. People like Geoff Hinton spend most of their time thinking about how the brain works (indeed, his background is cognitive psychology). The "convolution" part of convolutional neural networks is designed to mimic how the optic nerve interfaces with the brain.
> I keep hearing claims that symbolic AI is the wrong approach for anything, and that it failed. Yet there were quite a few successes (expert systems, discovery learning, common sense reasoning, for example) before sources of funding dried up.
The funding dried up because they ran into the limits of what is possible.
Artificial languages (such as programming languages) are usually designed to be unambiguous. In other words, there is a 1:1 mapping from a sentence or fragment to its abstract representation.
Natural language is ambiguous, so there is usually 1:N mapping from a sentence to abstract representations. So, at some point you need to decide which of the N readings is the most likely one.
Older rule-based approaches typically constructed all readings of a sentence and used a model to estimate which reading is the most plausible. In newer deterministic, linear-time (transition-based) parsers, such ambiguities (if any) are resolved immediately during each parsing step.
In the end it's a trade-off between having access to global information during disambiguation and having a higher complexity. So, naturally, the rule-based systems have been applying tricks to aggressively prune the search space, while transition-based parsers are gaining more and more tricks to incorporate more global information.
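As a rough illustration of the transition-based approach described above, here is a small arc-standard style parser in Python. In a real system a trained model chooses each action from the current stack and buffer; in this sketch the action sequence is written by hand (an assumption), so it shows only the mechanics of building a tree in a linear number of steps.

```python
SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT_ARC", "RIGHT_ARC"


def parse(num_tokens, actions):
    """Apply arc-standard actions; return {dependent index: head index}.

    Tokens are numbered from 1; index 0 is an artificial ROOT node.
    """
    stack = [0]
    buffer = list(range(1, num_tokens + 1))
    heads = {}
    for action in actions:
        if action == SHIFT:
            # Move the next input token onto the stack.
            stack.append(buffer.pop(0))
        elif action == LEFT_ARC:
            # Second-from-top becomes a dependent of the top item.
            dependent = stack.pop(-2)
            heads[dependent] = stack[-1]
        elif action == RIGHT_ARC:
            # Top item becomes a dependent of the item below it.
            dependent = stack.pop()
            heads[dependent] = stack[-1]
        else:
            raise ValueError(f"unknown action: {action}")
    return heads


words = ["Bob", "brought", "the", "pizza", "to", "Alice", "."]
# Hand-written action sequence standing in for the model's decisions.
actions = [
    SHIFT, SHIFT, LEFT_ARC,          # Bob <- brought
    SHIFT, SHIFT, LEFT_ARC,          # the <- pizza
    RIGHT_ARC,                       # brought -> pizza
    SHIFT, SHIFT, RIGHT_ARC,         # to -> Alice
    RIGHT_ARC,                       # brought -> to
    SHIFT, RIGHT_ARC,                # brought -> .
    RIGHT_ARC,                       # ROOT -> brought
]

heads = parse(len(words), actions)
for dependent in sorted(heads):
    head = heads[dependent]
    head_word = "ROOT" if head == 0 else words[head - 1]
    print(f"{words[dependent - 1]} <- {head_word}")
```

Each token is shifted once and attached once, so a sentence of n words takes 2n actions. Every action is a local, immediate decision, which is where the trade-off against global information comes from.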
That said, there is some interesting work on "good-enough" language processing, which suggests that people maintain some fuzziness and don't fully resolve the structure when they don't need to. [1]
[1] http://csjarchive.cogsci.rpi.edu/proceedings/2009/papers/75/...
They reflect a particular language in its well-written form. However, humans are extremely robust against syntax errors. I am not a linguist, but I think this speaks in favor of lexicalist approaches: we can be very free in word order, as long as our brain can match up e.g. verbs with their expected arguments.
(Example sentences taken from https://he.palgrave.com/page/detail/syntactic-theory-geoffre..., although any introductory linguistics/syntax textbooks will spend a few pages making the case that humans understand language by first parsing it into some kind of tree structure).
And yet the following is also correct - in terms of real-world usage, not some prescriptive definitions:
"The man who has written the book will be followed, right mate?"
> you will find that the rules must involve imposing some sort of tree structure on the original sentence.
The rules are, and the brain may be, but I feel those are different tree structures. Moreover, I wonder if the "tree structures" of our brains aren't just artifacts of recursive pattern matching - we also know that when reading, humans process whole groups of words at a time, and only if there's some mismatch they process pieces in more detail. Any recursive process like this will generate a tree structure as its side effect.
Anyway, thanks for the examples. I might pick up a linguistics book at some point. Right now the idea of understanding natural language by parsing it into "NOUN PHRASES" and "VERB PHRASES" and stuff seems completely backwards, given how humans have no trouble parsing "invalid" sentences, or using them - especially in spoken language.
(Not to mention our ability to evolve the language, and how the grammatically invalid constructs tend to be introduced, used, understood with no trouble and at some point they become grammatically accepted - see e.g. recent acceptance of "because <noun>").
I'm not a linguist, but seeing how people a) can understand sentences that are grammatically malformed perfectly well, b) can easily derive meaning out of "sentences" stripped of verbs ("I her dinner cinema Washington"), it seems to me that most of the work is being done by pattern-matching to known words and phrases. E.g. "drove down the street" is a kind of semantic unit on its own.
Again, I'm not a linguist, but a lot of introspection as well as observing other people strongly suggest to me that humans do anything but parsing grammatical structures.
You do need to analyze a sentence to understand it. Think of a classical attachment ambiguity such as "the boy saw the girl with the telescope". There are two readings of the sentence, and just like a Gestalt, you're typically perceiving it as one or the other. This involves a process of disambiguation, which is evidence that you have parsed the sentence.
In terms of a basic probabilistic model, P(meow | rug) would be far lower than P(meow | cat), and that alone would be enough to influence the parser to make the correct decision. Now, if the sentence were "The cat sat on the rug. It was furry", that would be more ambiguous, just like it is for an actual human to decode. But models trained on real-world data do learn about the world.
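A toy version of that probabilistic argument in Python. The counts are invented for illustration (a real model would estimate them from a large corpus), and add-one smoothing is an assumption used so that unseen pairs do not get probability zero.

```python
# Made-up counts of (subject noun, verb) pairs, standing in for corpus statistics.
counts = {
    ("cat", "meowed"): 40,
    ("cat", "sat"): 60,
    ("rug", "meowed"): 0,
    ("rug", "sat"): 5,
    ("rug", "lay"): 95,
}


def conditional_prob(verb, noun, smoothing=1.0):
    """Estimate P(verb | noun) with additive smoothing over the known verbs."""
    verbs = {v for _, v in counts}
    noun_total = sum(c for (n, _), c in counts.items() if n == noun)
    numerator = counts.get((noun, verb), 0) + smoothing
    return numerator / (noun_total + smoothing * len(verbs))


def resolve(verb, candidates):
    """Pick the candidate referent under which the verb is most probable."""
    return max(candidates, key=lambda noun: conditional_prob(verb, noun))


print(resolve("meowed", ["cat", "rug"]))  # cat
```

With a predicate like "was furry", where both candidates are plausible, the two probabilities would be much closer, which matches the ambiguity noted above.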
An interesting example of this you can easily try for yourself is playing with Google's voice to text features--if you say silly things like "the rug meowed" you will have terrible results because no matter how clearly it can hear you its training data tells it that makes no sense.
Actually, "animacy" is a fundamental feature in semantics. It's part of your mental lexicon that a cat is animate and a rug isn't, and you would simply infer from that which is the referent for "it". As semantic challenges go, this is a very trivial one. In general the border between linguistic and world knowledge can become blurred. There may be limits to what can be learned purely from text, but seeing that this model achieved 94%, a lot can be learned purely from (annotated) text.
Cool. I wonder how much human effort (vs machine time) went into tweaking SyntaxNet versus tweaking Parsey M.?
"bully RB advmod"
I assume this is wrong, and probably triggered by the "ly" ending? Also, odd that it dropped the second clause and didn't see any trouble with the parsed results lacking a single verb. I guess it has to be able to handle fragments, etc.
Actually I found an educational 500 lines python parser from the same author as well [1]. It claims an accuracy of 92.7 for the WSJ corpus.
I would think the technology could be helpful in a fairly narrow way: identifying syntactical constructions outside the bible to help explain ambiguous syntactical constructions within it (For example, Ugaritic texts, another ancient Semitic language similar to Hebrew, are often studied to aid in understanding portions of the Old Testament). Scholars have been doing this without computers for some time and have begun to do this type of analysis with software. I would imagine more sophisticated software would yield at least some new insights.
That success only lasts in the limited context of the corpora used for training. Step outside that and success goes down to 60% or much worse. And that's just tagging and things, shallow parsing. Meaning? Discourse? Don't even think about it.
That is, if you took a well-trained NLP model, then you could in principle extract out of it facts like "streets are not found inside cars".
http://www.duetsblog.com/2016/03/articles/trademarks/namey-m...
For other non-Parsey McParseface dependency parsers and POS taggers that are web accessible, see http://corenlp.run/ and http://nlp.stanford.edu:8080/parser/.
Time flies like an arrow. Fruit flies like a banana.
So, really all of the above parsers are weak in that they only output a single best parsing, when in reality sentences can have more than one valid structure, the principal example being the second sentence you've provided. I don't think Google's model has a better sense of humor than the others, no. I anticipate that they all have used relatively similar training data.
However, there is probably a trivial way to get the second sentence to parse as
 Subject  --- Verb --- Object
  Noun        Verb   Article  Noun
  |   \        |        |      |
Fruit flies   like      a    banana .
and that is to provide training data with more occurrences of ...
> N{Fruit flies} V{like} honey.
> N{Fruit flies} V{like} sugar water.
than occurrences of
> A plane V{flies} PREP{like} a bird.
The more sentences using simile that the parser finds, the less likely the neural net is to consider 'like' as a verb. It's also impacted by all of the uses of [flies like]. That's the nature of statistical language tools.
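A rough sketch of that effect, using nothing but counts. This is not how SyntaxNet or CoreNLP work internally; the tag names, the two training sentences, and the functions `train` and `best_tag` are made up for the example. It only shows that the preferred tag for "like" after "flies" follows whichever usage dominates the training data.

```python
from collections import Counter


def train(tagged_sentences):
    """Count how often each (previous word, word) context receives each tag."""
    counts = Counter()
    for sentence in tagged_sentences:
        previous = "<s>"
        for word, tag in sentence:
            counts[(previous, word.lower(), tag)] += 1
            previous = word.lower()
    return counts


def best_tag(counts, previous, word, tags=("VERB", "PREP")):
    """Return the tag seen most often for `word` right after `previous`."""
    return max(tags, key=lambda tag: counts[(previous, word, tag)])


# Two hand-tagged toy sentences (hypothetical training data).
simile = [("A", "DET"), ("plane", "NOUN"), ("flies", "VERB"),
          ("like", "PREP"), ("a", "DET"), ("bird", "NOUN")]
insects = [("Fruit", "NOUN"), ("flies", "NOUN"),
           ("like", "VERB"), ("honey", "NOUN")]

mostly_similes = train([simile] * 3 + [insects])
mostly_insects = train([simile] + [insects] * 3)

print(best_tag(mostly_similes, "flies", "like"))  # simile reading dominates
print(best_tag(mostly_insects, "flies", "like"))  # insect reading dominates
```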
The stock parser debuted here gives the same answer as CoreNLP, by the way.
flies VBZ ROOT
+-- Fruit NNP nsubj
+-- like IN prep
| +-- banana NN pobj
| +-- a DT det
+-- . . punct
So much for Parsey McParseface's sense of humor.
If you have a small(ish), well-known text, I don't think you will get much insight from machine translation. Certainly there are plenty of uses for computer text analysis/mining in biblical studies, but I doubt translation is one of them. And for obscure idioms or hapax legomena, machine translation definitely can't help you because by definition there are no other sources to rely on.
E.g. I remember from school getting into an argument with a teacher over the interpretation of a poem. "His" interpretation, which was really the interpretation of some authority who'd written a book, was blatantly contradicted by the text if you assumed that the author hadn't suddenly forgotten all his basic grammar, despite all the evidence everywhere else that he was always very precise in this respect.
Of course, in some of these kinds of instances, it will be incredibly hard to overcome the retort that any "revelation" is just a bug.
In a more general sense, people are typically exceedingly bad at parsing text, judging by how often online debates devolve into bickering caused largely by misunderstanding the other party's argument. Often to the extent of even ending up arguing against people who you agree with. Having tools that help clarify the parsing for people might be interesting in that respect too.
[1] http://www.amazon.com/Oral-Formulaic-Study-Quran-Andrew-Bann...
[2] http://www.academia.edu/9490706/Retelling_the_Tale_A_Compute...
"And that quote is almost certainly made up."
- Willem Shakespere
With then-innovative art like Cubism, which in some sense broke earlier rules, the point is that those artists, like Picasso, were able to do representative art in correct proportions -- they were just going beyond that.
This is a nontrivial issue, because there are always students who think they can skip learning boring, mundane, old-fashioned art and go straight to something like Cubism, but in 99% of cases that doesn't work well at all compared with learning "the rules" first.
With writing, many great writers have broken "the rules" with punctuation, spelling, grammar, etc. But the important thing is that they do so on purpose.
Whereas if one doesn't know the rules in the first place, one doesn't have the choice of whether to follow them or break them.
Such a person will always break those rules they don't know (subconscious knowledge counts btw) -- but not for aesthetic reasons, only out of ignorance.
If you listen to it without paying attention to the text (based on a poem of the same name by Goethe; both the German text and an English translation are found at [1]), parts of it sound like horrible wailing and poor harmonies, and it's easy to write it off as not sounding very nice.
Here's [2] a much clearer rendition (two singers, with much stronger delineation of the three different characters) than the one linked from Britannica.
If you do pay attention to the text, it is very clear that the unpleasant parts are very deliberate:
The singer(s) switch between the roles of a father, his sick, dying child, and the Erl-king who appears in the child's hallucinations while the father is riding to bring the child to a doctor.
The big difference between the unpleasant-sounding parts of this song and a bad composer is the clear intent and delineation - Schubert made things sound bad intentionally, at exactly the points where he wanted to illustrate pain and fear, rather than because he didn't know how to make things sound pleasant when he wanted to.
The song clearly proves this by setting the child's wailing and the father's fearful attempts to soothe him against much more pleasant segments where the Erl-king speaks and tries to seduce the child to come with him.
You only get that clear separation if you know how to evoke each effect precisely. Arguably a particularly bad composer wouldn't even know how to make things sound bad the "right way" - there's a big difference between random bad sounds and making things evoke a child in pain.
I've heard great painters say that the only thing that matters is that you paint. Plenty of wonderful painters did not study their predecessors in depth.
I'm assuming that your 99% number is fabricated? Incidentally 99% of statistics are made up.
> given how humans have no trouble parsing "invalid" sentences
I think you misunderstand slightly - the claim linguists make is not "humans are unable to understand invalid sentences because they can't parse them", the claim is that when you see an invalid (cannot be parsed into a proper tree) sentence, you have a gut feeling that it "sounds off", and if you're a native speaker you would never accidentally produce such ill-formed sentences. You can still understand the meaning of a sentence like "I this morning fish eat" but you also immediately notice that it's "off" - and that's the phenomenon that syntax tries to explain.
Furthermore, the way you understand sentences like "I this morning fish eat" is different from the way you understand "I ate fish this morning"; in the former it feels like you're guessing. It could work for communicating simple thoughts, but I doubt a non-English speaker with an English dictionary could convey a complicated thought requiring many words by that same guessing process. In fact, the reason language evolved tree syntax is probably that it is needed to convey long, complicated thoughts.
> because <noun>
I'm glad you mentioned that! First, modern linguistics is very far from prescriptive. In fact the first thing they teach you (at around the same time they make the claim that "humans parse sentences into tree structure") is that linguistics is a descriptive field - language changes over time, the study of the rules of language and how these rules change is interesting and important, but it's pointless to "enforce" the rules. Even new constructions like "because X" have rules that govern them, e.g. see http://allthingslinguistic.com/post/72252671648/why-the-new-... - constructions like "because want" and "because need" exist, but no one says "because adore", and something interesting explains why. (To be fair, I haven't really internalized the "because X" construction so I can't claim that I find "because adore" unnatural, but the article says it's the same reason why "omg want" and "omg need" are currently grammatical but "omg adore" is not, and even if you're not familiar with the "omg X" construction, it gives independent evidence in that "omg adore" has no Tumblr tags; of course, it may become grammatical in the future, but that would be because the rules have changed over time, not because there are no rules.) To that point,
> or using [ill-formed sentences] - especially in spoken language.
actually, if a sentence is used in spoken language routinely and non-accidentally, linguists take it as evidence that it's grammatical and then work backwards to find the rules that explain why it is so. How else could they do it?
> the claim is that when you see an invalid (cannot be parsed into a proper tree) sentence, you have a gut feeling that it "sounds off", and if you're a native speaker you would never accidentally produce such ill-formed sentences. You can still understand the meaning of a sentence like "I this morning fish eat" but you also immediately notice that it's "off" - and that's the phenomenon that syntax tries to explain.
I see. Yeah, most of the way I think about how the mind processes language comes from focusing on that "gut feeling", which on one hand tells you that this perfectly understandable sentence is somehow "off", and on the other hand lets you form perfect sentences without ever explicitly thinking about grammar.
> First, modern linguistics is very far from prescriptive. In fact the first thing they teach you (at around the same time they make the claim that "humans parse sentences into tree structure") is that linguistics is a descriptive field
It seems to me that I've been operating under the invalid assumption that linguistics is mostly prescriptive. Thanks for that. Any recommendation for an intro book I could grab to read in my spare time?
Unfortunately I think the field suffers from a lack of such books.
1. You could try Steven Pinker's "The Language Instinct", although it's a general-audience book that doesn't really try to teach you linguistics proper
2. The first textbook I used was https://linguistics.osu.edu/research/pubs/lang-files and it's pretty good. However, it's quite hard to obtain.
Edit:
3. If you just want to look at syntax http://web.mit.edu/norvin/www/24.902/24902.html is advanced but good
Here is the undeniable proof that syntactic structure exists. Consider the sentence `The magician pointed at the man with the hat.' This is a perfectly natural sentence, of which there are two likely interpretations. One is that the magician used a hat to point at the man. The other is that the man who was pointed at wore a hat.
What distinguishes these sentences? Only the underlying syntactic structure, of whether to parse it as `the magician pointed at (the man) with the hat' or as `the magician pointed at (the man with the hat)'. This `hierarchical structure' of our sentences is syntactic structure at its essence.
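The two readings can be enumerated mechanically. Below is a minimal sketch with a toy grammar and lexicon invented for this one sentence (the rule set and the `parses` function are assumptions for illustration, not a real English grammar); it tries every split point and returns each bracketing the grammar allows.

```python
from functools import lru_cache

# Toy grammar, invented for this example: (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # "the man [with the hat]"
    ("VP", "V", "PP"),
    ("VP", "VP", "PP"),   # "[pointed at the man] [with the hat]"
    ("PP", "P", "NP"),
]
LEXICON = {
    "the": "Det", "magician": "N", "man": "N", "hat": "N",
    "pointed": "V", "at": "P", "with": "P",
}


def parses(words, symbol="S"):
    """Return every bracketing of `words` that the toy grammar derives."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def build(label, start, end):
        if end - start == 1:
            word = words[start]
            return (word,) if LEXICON.get(word) == label else ()
        trees = []
        for parent, left, right in RULES:
            if parent != label:
                continue
            for split in range(start + 1, end):
                for left_tree in build(left, start, split):
                    for right_tree in build(right, split, end):
                        trees.append(f"({left_tree} {right_tree})")
        return tuple(trees)

    return list(build(symbol, 0, len(words)))


sentence = "the magician pointed at the man with the hat".split()
for tree in parses(sentence):
    print(tree)
```

Under this toy grammar the sentence gets exactly two trees: one where `with the hat` attaches inside the noun phrase `the man ...`, and one where it attaches to the verb phrase. The words are identical in both; only the bracketing differs.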
You argue that humans can understand sentences with whatever grammar, and parsing is pretty much pattern-matching of words. But what about the sentence pair : `Benny chased Jenny' versus `Jenny chased Benny'? These have the same words, and mean different things. It is only our syntactic understanding of how words are ordered in English that allows us to understand these sentences.
There are multiple hypotheses about what a sequence of words can mean, which is not the same thing as 'we form explicit syntax trees in our heads when reading a sentence'.
I could also give you the bag of words
magician point man hat
You would derive meaning from this bag of words, probably the same interpretations as in your example. However, the sentence is utterly ungrammatical. Note that I am not contending that we don't use some form of syntax at all. E.g., I think that someone whose native language has a freer word order than English will assign more hypotheses to the bag of words above (e.g., my brain also considers the less likely option that the magician is the object).
Another problematic aspect of this hypothesis is that a longer sentence will have so many possible parses that it would take a long time to construct and consider all parses. Moreover, I find it unlikely that we have thousands of exact syntax trees in our head that we compare.
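To put a rough number on "so many possible parses": if you ignore the grammar entirely and just count the ways to bracket n words into a binary tree, you get the Catalan numbers. That is an upper-bound style illustration, not the parse count of any real grammar (which rules out most bracketings), and `binary_bracketings` is just a name chosen here. It needs Python 3.8+ for `math.comb`.

```python
from math import comb


def binary_bracketings(num_words):
    """Number of distinct binary trees over `num_words` leaves (Catalan number)."""
    n = num_words - 1
    return comb(2 * n, n) // (n + 1)


for length in (5, 10, 20, 30):
    print(length, binary_bracketings(length))
```

Five words already allow 14 bracketings and ten words allow 4862, and the count keeps growing exponentially with sentence length, which is why exhaustively building and comparing every tree looks implausible as a model of what people do.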
Yeah. I'm playing with a different idea now - maybe that "tree structure" that "undeniably exists" in our brains isn't an explicit syntax tree, but an artifact of recursive, adaptive pattern-matching? I.e. if you look at things like reading speed or "understanding" speed, you'll notice that people tend to process stuff in large blocks until something "does not click", and they have to focus and process the block in detail. That sort of feels like a recursive refinement, and any process that recurses in more than one place generates a tree structure as a side effect.
For example, we know for sure that the brain applies sophisticated mathematical algorithms to signals coming from the ears to locate sound in 3D space, yet we are certainly not consciously aware of it - we just "know" where the source is located.
Regarding grammar, there is a theory called Universal Grammar from Chomsky that we are born with grammar structures in the brain.
Some recent news on it - http://www.medicaldaily.com/noam-chomskys-theory-universal-g...
It's a very good study, but does NOT prove `UG' once and for all.
There is just a ton of information and context the computer probability models do not have. They can use all the big data they want, but are capturing only a very thin slice of real world information.
Mhm.
When humans see a Jeopardy answer looking for the name of an ancient king, they might give the wrong name, because quick, did Hadrian rule before or after Caesar?
If Watson gets it wrong, its answer is something like "What are trousers?".
It seems quite obvious that different things are going on there.
If you think about it, there is an iteration happening within machine learning that is essentially building that prior knowledge about the world by reusing previous models as inputs to new ones. For example, spaCy uses word2vec vectors to do parsing and NER, and then sense2vec uses spaCy's POS tags to create word vectors.
sense2vec.spacy.io
Prior knowledge _might_ solve that problem. It's not really solved yet so who knows. Yeah, work is ongoing and word vectors sound cool and all, but in the past people said the same thing about bag-of-words models and look where we are now.
Humans solve sparsity, sure, we learn language from ridiculously few data points, but who knows what it is that we do, exactly? If we knew, we wouldn't be discussing this.
Let's restate the problem to make sure we're talking about the same thing: the problem is that the number of possible utterances in a given language that are grammatically correct according to some grammar of that language is infinite (or so big as for it to take longer than our current universe has to live before an utterance is repeated).
And it's a problem because it's impossible to count infinity given only finite time. I don't see how prior knowledge, or anything else, can solve this.
Which must mean humans do something else entirely, and all our efforts that are based on the assumption that you can do some clever search and avoid having to face infinity, are misguided and doomed to fail.
For the date parser you want http://nlp.stanford.edu/software/sutime.html
The code and rules aren't fun to customize though.
Another option is Natty [1], but it also seems to fail on the same examples. Natty at least has an ANTLR grammar that's reasonably easy to understand, though.
I'm working on a project that analyzes sentiment from speech, and I've been meaning to start on text sentiment analysis, but I'm not sure where to start.
The code is pretty readable but relies heavily on a ruleset which might need to be tweaked for one's needs.
The visual system (retina, lateral geniculate nucleus, visual cortex) was fairly well understood well before ANNs were developed. A few uncontroversial ideas (e.g. that cells take their inputs from neighbouring cells in the previous layer) were adopted for use in ANNs.
I was around at the time of, and affected by, the AI winter. There was certainly no consensus among those working in AI that they had got as far as they could. Work stopped when funding was cut, often for political reasons.
The most mature area at the time, apparently ripe for commercialization, was expert systems. However, it was very hard to commercialize them: customers couldn't think of any suitable applications, and when they could, they couldn't spare the time of their experts.
Finally, the main reason for the AI winter was probably that AI was unable to live up to the grossly inflated expectations, simply because the expectations were grossly inflated. This seems to be happening again, with neural networks.
I wasn't around, but I got curious about symbolic systems after listening to MIT's AI course[1]. Did some reading about the subject. The impression I got matches what you describe.
It's ridiculous how many people here dogmatically recite statements about failures of symbolic systems without (apparently) knowing anything about how those systems were used and what they achieved. If you listen to the comments, it sounds as if research on symbolic systems only ever produced crude, useless toys. That was certainly my impression before I took some time to actually look into it. A bit of straightforward Googling can show that it's a gross misrepresentation of history. For example, MIT's lecture on knowledge engineering [2] has some really interesting info on this subject.
[1] http://ocw.mit.edu/courses/electrical-engineering-and-comput...
[2] http://ocw.mit.edu/courses/electrical-engineering-and-comput...
I'd invite you to read "The Master Algorithm" to understand exactly how they failed the first time and how they aren't the route forward: https://en.m.wikipedia.org/wiki/The_Master_Algorithm
If that was as significant a factor as you make it sound, the progress in artificial neural networks would be closely tied to the progress of neurology. So where are all the citations of neurology and cognitive psychology papers in recent AI/ANN research?
There is some work in this though, but often going the other way: see for example http://news.discovery.com/tech/robotics/brain-dish-flies-pla... and even more extreme http://www.nature.com/articles/srep11869
One day someone will build a neural net model to do this rather than hand-written rules.
[0] - https://github.com/chaitanyagupta/chronicity
[1] - it's surprising how easy it is to get 80% there with hacks like these: https://github.com/TeMPOraL/alice/blob/master/language.lisp#...
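For a flavor of what such hand-written rules look like, here is a small sketch. It is not taken from chronicity, Natty, SUTime, or the linked Lisp file; the function name `parse_relative_date` and the handful of phrases it covers are assumptions made up for this example.

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def parse_relative_date(text, today):
    """Resolve a few English date phrases relative to `today`.

    Returns a date, or None when no hand-written rule matches.
    """
    text = text.lower().strip()
    if text == "today":
        return today
    if text == "tomorrow":
        return today + timedelta(days=1)
    if text == "yesterday":
        return today - timedelta(days=1)
    match = re.fullmatch(r"in (\d+) (day|week)s?", text)
    if match:
        amount = int(match.group(1))
        unit_days = 7 if match.group(2) == "week" else 1
        return today + timedelta(days=amount * unit_days)
    match = re.fullmatch(r"next (\w+)", text)
    if match and match.group(1) in WEEKDAYS:
        target = WEEKDAYS.index(match.group(1))
        # Always move forward 1..7 days, so "next thursday" on a Thursday
        # means a week from today rather than today.
        days_ahead = (target - today.weekday() - 1) % 7 + 1
        return today + timedelta(days=days_ahead)
    return None


today = date(2016, 5, 12)  # a Thursday
print(parse_relative_date("next monday", today))
print(parse_relative_date("in 2 weeks", today))
print(parse_relative_date("the day after tomorrow", today))
```

A few regexes cover the common phrases quickly; the last call returns None, which is the remaining 20% where every new phrasing needs yet another rule.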
1) You have a working system. You know only bits and pieces of how it works.
2) You build a crude model of the system. It kind of sucks at doing the stuff the System is doing well.
3) People over several decades apply tons and tons of task-specific optimizations and modifications to your model. Those modifications have nothing to do with the original system, but because of them the model finally achieves good performance at some tasks.
4) You use the hype generated by #3 to claim that you were right all along and that your model captures the essential aspects of the original system.
5) When people point out that your model works in ways that clearly don't match the original system, you make a claim that it's the original system that approximates your model, not the other way around. Without any observations of the original system supporting your claim.