If you consider language modeling the essential task of NLP, with deep learning, we can produce some interesting pattern, however still garbage anyway.
There exist tasks that can be solved at the level of human inter-annotator agreement, yes. That's how you know that the task is too simple or too factoid-driven, and we need a new one.
Can it formulate a human-level response to sarcasm? Could it respond in kind?
Does it know when a question is rhetorical? :)