it doesn't use natural language processing, it *computes* the answer.
Gibberish.
Further, one of the promising approaches in this area involves using generative grammars (or other generative, non-parametric approaches) to approximate natural language representations.
Both these approaches 'compute' answers without a notion of natural language grammars that are usually associated with natural language processing.
> usually associated
I've definitely read papers about using the methods you describe to process text. They are all just algorithms attacking a problem, so the distinction between NLP and computing is gibberish.
The demo was very impressive, but in practice they just couldn't match the scope of Wikipedia or the wider Web. It may be the same way with Wolfram|Alpha.
A cite to some Google claims when they came out would be nice. (Academic papers don't count.)
I was in the SF Bay Area at the time and the "publicity" that I remember was friends saying "check this out". I don't recall Google saying anything beyond "here's how many pages we indexed".
I think that's the point shimonamit is making.
Was Twitter hyped up like this before it launched? Facebook? Google? Microsoft? Apple? TechCrunch? Hacker News? Wikipedia? Heck, pretty much ANYTHING that's successful now? (Even small stuff like Balsamiq that's currently very successful in a small way wasn't hyped before it launched).
Now think of stuff that was hyped massively before launch. Cuil. Powerset. Yeah.
Stuff that ultimately becomes super successful becomes successful over quite a long period of time and due to the excitement of users after launch - not the bleatings of gurus before launch.
http://www.trueknowledge.com/technology/
It's an interesting concept, and has much broader applications through their API.
Products that start out modest and then rapidly improve and prove themselves tend to do better than products that are hyped up beyond all proportion.
The thesis of A New Kind of Science is something like "systems composed of a small number of simple rules can perform arbitrarily complex computations."
The book proceeds to support the thesis. The content consists of descriptions of such systems, corresponding Mathematica execution trace diagrams, and analysis. These analyses are related to an ambitiously large scope of natural phenomena and scientific knowledge.
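For a concrete picture of that thesis, here is a minimal Python sketch of an elementary cellular automaton, the kind of simple-rule system the book spends much of its time on. The function names (step, run) and the wraparound edges are my own choices for illustration, not anything taken from the book or from Mathematica. Rule 30 is used only because it is a standard example of irregular output from a trivial rule.

```python
def step(cells, rule):
    """Advance an elementary cellular automaton by one generation.

    cells is a list of 0/1 values; rule is the rule number (0..255).
    Edges wrap around, which is an assumption made for simplicity.
    """
    n = len(cells)
    out = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        neighborhood = (left << 2) | (center << 1) | right  # value in 0..7
        out.append((rule >> neighborhood) & 1)  # look up that bit of the rule
    return out


def run(rule, width=31, steps=15):
    """Start from a single live cell in the middle and return all rows."""
    cells = [0] * width
    cells[width // 2] = 1
    rows = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        rows.append(cells)
    return rows


if __name__ == "__main__":
    for row in run(30):
        print("".join("#" if c else "." for c in row))
```

The whole "physics" of the system is the eight bits of the rule number; everything else is bookkeeping.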
I think a couple of things are clear.
(1) We are at the point where something impressive can likely be produced, and Wolfram may very well have the resources to do it.
(2) We are not at the point where the be-all-end-all version of this can be produced.
Compare this with the symbolic computation packages (Mathematica, Maple, etc.). Around 1990, we were at the point where we could produce a very good one. Several were written. They have been improved since, but only marginally. We're still pretty much using 1990 technology.
And that's fine. We knew how to make a really good symbolic computation package. We did. End of story.
But consider the proposed packages (Alpha, etc.). We might produce something impressive. But we are not ready to produce something really good and useful. Our initial efforts will require lots of improving.
And Wolfram is definitely not the one to do that improving. He runs an aggressively closed shop. Always has. I predict, therefore, that the cathedral-bazaar effect is going to mean his product will be difficult to improve, and so will never become truly useful.
If he provides not just technology and data but also the means to extend that technology and data by following his example, he might be contributing something truly revolutionary.
Seems to me that this technology should have been released for some other scientific usage first (if it is indeed that powerful). It could be valuable as an engine for other applications as well in this manner.
I would also argue that one of Google's advantages is that it enables discovery of new information instead of just giving you the one page for which you are looking.
> Seems to me that this technology should have been released for some other scientific usage first
Why?
Technologists need to get over the idea that technology is for science/technology.
> I would also argue that one of Google's advantages is that it enables discovery of new information instead of just giving you the one page for which you are looking.
Huh? If there isn't a page that states which city is the fourth largest in eastern Montana, how will Google help you answer that question? (No fair going to the "populations of cities in eastern montana" page.)
Google doesn't (yet) do join queries.
I think it's still way out of reach for non-trivial data sets. Something like this doesn't just show up out of the blue; it's not a problem amenable to some single new algorithm or breakthrough.
I'd love to be proven wrong...
A domain that doesn't have a mathematical model would not be describable by a formal system.
And likewise, any domain describable by a formal system would have a mathematical model.
Which brings us back to the original point: human stuff is not very math-friendly. I want to deal with emotions, politics, etc.
Think Bush and torture: can this system give me any definitive answers?
Too bad, because right away I was thinking "Wow! It's a sentient version of Google, only a bazillion times better!" But then I realized it's just a parser that turns natural language questions into queries against a large dataset, and I became all sad and disappointed.
If I'm asking something like, "What is the capital of Nebraska?" why not just get directed to the Wikipedia entry where I can learn a lot more than the one fact that answers what I just asked?
If Alpha is actually going to do computation, I'd rather be able to use it for something more complex than a single natural language query.
I would love to gather good questions and discuss the results when they are available. I think it is important to find questions to which Google, Yahoo, Powerset or Wikipedia don't provide a straight answer.
How about:
1) What is the smallest unknown prime number? ;)
2) Where on earth is the rainy days to sunny days ratio the lowest?
3) How many languages does the average person from the Benelux countries speak?
But heck, I wouldn't even be surprised if they push the scale, 7 cuils anyone?
This software should be able to look up rainfall data from Stephen Wolfram's Bumper Book of Trivia, work out average rainfall for each country, work out which country has the 15th-highest rainfall from that result, then look up the capital city for that country.
All determinable facts with a straight answer; you should simply get the name of the city as a result.
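To make those steps concrete, here is a rough Python sketch of the same chain: average the readings per country, rank the countries, pick the nth, then look up its capital. Everything in it is made up for illustration: the country names, the rainfall figures, the capitals, and the function name capital_of_nth_rainiest are all placeholders, and it says nothing about how Alpha actually works.

```python
from statistics import mean

# Hypothetical placeholder data: rainfall readings (mm/year) per country.
RAINFALL_MM = {
    "Country A": [1200, 1400],
    "Country B": [300, 500, 400],
    "Country C": [2500],
    "Country D": [800, 1000],
}

# Hypothetical placeholder lookup table: country -> capital city.
CAPITALS = {
    "Country A": "Capital A",
    "Country B": "Capital B",
    "Country C": "Capital C",
    "Country D": "Capital D",
}


def capital_of_nth_rainiest(n, rainfall=RAINFALL_MM, capitals=CAPITALS):
    """Return the capital of the country with the n-th highest average rainfall."""
    # Step 1: aggregate the raw readings into one average per country.
    averages = {country: mean(readings) for country, readings in rainfall.items()}
    # Step 2: rank countries from wettest to driest.
    ranked = sorted(averages, key=averages.get, reverse=True)
    if not 1 <= n <= len(ranked):
        raise ValueError(f"n must be between 1 and {len(ranked)}")
    # Step 3: join the n-th country against the capitals table.
    return capitals[ranked[n - 1]]


if __name__ == "__main__":
    print(capital_of_nth_rainiest(3))
```

None of the individual steps is hard. The hard part is having clean data for every country and mapping a free-form question onto this chain, which is exactly what a page-retrieval engine doesn't attempt.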
The parent posting is correct. Google is particularly poor at complex questions, especially if you aren't sure about the exact phrasing. I have had several queries in the past couple of weeks where I have had to spend 30 to 90 minutes trying to get the right set of terms to return the right set of documents to look at. This is particularly true if the name of the product is a common English word.
Practically speaking, static content that doesn't change often is better served by models like Wikipedia.
If Wolfram knows all the answers, write them all in static HTML for the world to use, search, browse, replicate and extend, instead of storing them in semantic databases or ethereal brains.
I am not pissing on their parade, I know the scientific work is commendable, but practically speaking it can't compete with more efficient models.
Wikipedia isn't all that useful for storing all of the sums of integers.
In other words, you can't enumerate all of the questions that have one answer.
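A toy Python sketch of the difference, assuming questions of the form "a + b" (the table contents and the function names lookup and compute are invented for the example):

```python
import re

# A static "page" can only hold answers that someone wrote down in advance.
PRECOMPUTED = {"1 + 1": 2, "2 + 2": 4}


def lookup(question):
    """Return the stored answer, or None if nobody wrote it down."""
    return PRECOMPUTED.get(question)


def compute(question):
    """Parse 'a + b' and work out the answer on demand."""
    match = re.fullmatch(r"\s*(-?\d+)\s*\+\s*(-?\d+)\s*", question)
    if match is None:
        return None  # not a question this toy understands
    return int(match.group(1)) + int(match.group(2))


if __name__ == "__main__":
    question = "123456789 + 987654321"
    print(lookup(question))   # the static table has no entry for this
    print(compute(question))  # the computed answer
```

The lookup table would need an entry for every pair of integers, while the computing version covers all of them in a few lines.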
Not to be a dickhead, I know what you mean.
Questions that involve some kind of processing power can be a good target for Wolfram, but then, how marketable is that outside academia?
The answer to the population of X country/city/town = Wikipedia, plus more facts you may be interested in while doing your research paper.
Maybe I just need 10 different questions/implementations of such a service to get it.
Like TrueKnowledge and the Freebase answers in Powerset, this system will likely be good at answering a small subset of very direct questions. Having access to Mathematica's symbolic solver algorithms would definitely help in building this system.
If it's successful it will either be faster than current inference engines, or capable of solving more complex queries. Or perhaps both. We'll see.
askwolfram "What are all of the questions that will ever be asked?" | askwolfram "How do I format this answer for wikipedia?" > wikipedia.html
This is the point the author was making. Had you not paraphrased the article to skew the intended meaning of this observation, it would have taken on a different meaning.
The actual text: "It doesn't simply parse natural language and then use that to retrieve documents, like Powerset... Instead, Wolfram Alpha actually computes the answers to a wide range of questions"
Computing the answer for 'what does a string of natural language mean' and 'what is the intended answer of the question being asked' are 2 different things.
If it is even close to real, the results will certainly speak for themselves. And the results might even be really awesome! I just would advise people to not pay attention until they see them.
But the more interesting question is "why is that?" And the answer is that you've used Google for so long, and internalized its capabilities so thoroughly, that you think in Google and can't easily see outside of those capabilities.
In the programming language context, we call this the Blub paradox, but Blub happens everywhere, and is hardest to see when there is nothing to step up to.
If this works (a big if, IMHO, but I'm willing to give it a try), what might happen is that it might entirely recalibrate what queries you can conceive of. Until we see the system, though, it's hard to even begin to imagine what those queries might be.
How many queries did people in 1980 wish they could send to Google? Nearly none, not because people didn't have questions Google could answer, but because only a bare handful of visionaries could even conceive of Google.
Because if it looks like Google, then users will interface with it like they're used to interfacing with Google, and then you really have to win by beating Google at its own game. Much better to pick your own place & time.
So if you don't normally issue queries of this sort on Google even though you would probably get back an answer in the web page snippets, then perhaps you would not find this new system all that useful.
If you have not tried their SDK, I highly recommend it. It is open source (not free, though). You can play around with it for free, but if you want to use it commercially you need to pay.
How does it compare to other architectures? What sort of problems does it excel on? Where are its weaknesses? What is the training like compared to other methods?
I wouldn't really take them seriously until a paper is linked to from their front page explaining the above. At the moment it just looks like hype.
You would be correct if you said:
"Isn't it true that any domain that can be described by a formal system is also describable by mathematics?"
If I have a friend Joe who is largely predictable then in certain situations, he is describable by a mathematical system (a logical system).
His full set of actions goes beyond mathematics, and if Joe realized how predictable he was, he might stop being so predictable.
Describable by mathematics does not mean "essentially equivalent" to mathematics.
Just like we extended our formal number system into negatives, imaginaries, quaternions, etc.
Math is a terrifically abstract, self-consistent model of reality. But that's all it is: a model. Sometimes the model tells us things we didn't know before, and sometimes we have to change the model to make it work with what we're observing.
People, in general, operate the opposite way from computers. We don't think about what we should do; we think about what we shouldn't do. So, it is very hard, if not impossible, to represent human behavior by a mathematical system.
This argument is explored in great detail in the book Gödel, Escher, Bach.
http://numenta.com/for-developers/education/htm-related-pape...
Unfortunately the three (why aren't there more?) independent papers didn't use HTM, and certainly don't answer all my questions.