The true power of regular expressions (nikic.github.io)
That's why we call those derivatives "Franken-xpressions". They can take exponential time in the worst case. See "Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)" (http://swtch.com/~rsc/regexp/regexp1.html).
The additional power of Franken-xpressions comes at a cost.
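A rough way to see that cost, assuming CPython's built-in `re` module (a backtracking engine). The pattern family follows the one used in the linked article: `a?` repeated n times, then `a` repeated n times, matched against n copies of `a`. The helper name `time_pathological` is made up for this sketch, and timings depend on the machine, so no numbers are shown.

```python
import re
import time


def time_pathological(n: int) -> float:
    """Time matching (a?){n}a{n} against 'a' * n with a backtracking engine."""
    pattern = re.compile("a?" * n + "a" * n)
    text = "a" * n

    start = time.perf_counter()
    matched = pattern.fullmatch(text)
    elapsed = time.perf_counter() - start

    # The match always exists: every optional 'a?' can match the empty string.
    assert matched is not None
    return elapsed


# Keep n small: each extra step roughly doubles the work on a backtracking engine.
for n in (10, 14, 18):
    print(f"n={n:2d}  {time_pathological(n):.4f}s")
```

An automaton-based engine handles the same inputs in time that grows polynomially with n, which is the point of the linked article.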
For general-purpose parsing, I suggest using a parser combinator library.
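To give a feel for the idea, here is a minimal hand-rolled sketch, not any particular library's API. The names `char`, `seq`, `many`, `lazy` and `balanced` are all invented for this example. It recognizes balanced parentheses, a language that is context-free but not regular.

```python
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def char(c: str) -> Parser:
    """Match exactly one character c."""
    def parse(text: str, pos: int):
        if pos < len(text) and text[pos] == c:
            return c, pos + 1
        return None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails."""
    def parse(text: str, pos: int):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def many(parser: Parser) -> Parser:
    """Apply parser zero or more times; never fails.

    Assumes parser consumes input on success, otherwise this would loop forever.
    """
    def parse(text: str, pos: int):
        values = []
        while True:
            result = parser(text, pos)
            if result is None:
                return values, pos
            value, pos = result
            values.append(value)
    return parse


def lazy(thunk: Callable[[], Parser]) -> Parser:
    """Defer looking up a parser so a rule can refer to itself."""
    def parse(text: str, pos: int):
        return thunk()(text, pos)
    return parse


# Grammar: group := "(" group* ")"
group: Parser = seq(char("("), many(lazy(lambda: group)), char(")"))


def balanced(text: str) -> bool:
    """True if text consists only of correctly nested parentheses."""
    _, end = many(group)(text, 0)
    return end == len(text)


print(balanced("(()())"), balanced("(()"))
```

The recursion lives in ordinary function calls, so the grammar can be stepped through, named, and tested piece by piece.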
The examples won't work in Python or Java, for example.
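A quick check of that for Python, assuming the standard library `re` module and a PCRE-style recursive pattern of the kind the article relies on. The pattern string below is an illustrative example, not copied from the article.

```python
import re

# PCRE-style recursion: "(", then the whole pattern again any number of times, then ")".
recursive_pattern = r"\((?R)*\)"

try:
    re.compile(recursive_pattern)
    supported = True
except re.error as exc:
    # The stdlib engine rejects the (?R) construct at compile time.
    supported = False
    print("re.compile failed:", exc)

print("recursion supported by stdlib re:", supported)
```

Third-party engines may differ. The `regex` package on PyPI, for instance, is documented as accepting recursive patterns.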
> Context-sensitive languages are something that you will rarely encounter during “normal” programming.
In my experience, context-sensitive languages, as a class, are never encountered in programming. Certainly, one might need to deal with a formal language that happens to be context-sensitive but not context-free; however, the fact that it lies in the context-sensitive class does not ever seem to be a useful fact.
The concepts of regular, context-free, and recursively-enumerable languages are obviously useful ones for us. But I think the only reason programmers ever mention the class of context-sensitive languages is that they feel obligated to, having mentioned the other three classes in Chomsky's hierarchy.
If anyone knows of an instance where the concept of context-sensitive languages/grammars was useful in a computational context[1], I'd like to hear about it.
[1] Heck, any context. Are context-sensitive grammars really useful in the study of natural languages? I wouldn't be surprised if the answer turns out to be "no".
From the manual:
The theoretical computer scientists out there will correctly point out that a self-referential regular expression is not "regular", so in the strict sense, xpressive isn't really a regular expression engine at all. But as Larry Wall once said, "the term [regular expression] has grown with the capabilities of our pattern matching engines, so I'm not going to try to fight linguistic necessity here."
I once read an article where the author recommended using "regular expression" for the formal theory and "regex" for the (more powerful) implementations. I think it's a good way to provide a common vocabulary.
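A small illustration of that gap, assuming Python's `re` module. A backreference lets a "regex" accept the language a^n b a^n, which no regular expression in the formal sense can describe. The variable name is made up for this example.

```python
import re

# \1 must repeat exactly what group 1 matched, so both runs of 'a' have equal length.
same_on_both_sides = re.compile(r"(a+)b\1")

print(bool(same_on_both_sides.fullmatch("aaabaaa")))  # equal runs of 'a'
print(bool(same_on_both_sides.fullmatch("aaabaa")))   # unequal runs of 'a'
```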
As for the lack of comments or not being able to debug: you can place each regex in a clearly named variable. For debugging, there is a wealth of external tools out there, and in Java you certainly can step through them in a debugger (if you like pain). As with code, there are ways to write regexes that are more readable and maintainable than others. We should make our regexes clear and use them in conjunction with well-thought-out code. Then the software is a joy to work with, after spending 2 days studying regular-expressions.info.
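A sketch of both ideas in Python, assuming the standard `re` module. The date format and all variable names are invented for illustration. The first version builds the regex from named pieces; the second uses `re.VERBOSE` so comments can live inside the pattern.

```python
import re

# Each piece gets a descriptive variable name and a named group.
YEAR = r"(?P<year>\d{4})"
MONTH = r"(?P<month>0[1-9]|1[0-2])"
DAY = r"(?P<day>0[1-9]|[12]\d|3[01])"
ISO_DATE = re.compile(rf"{YEAR}-{MONTH}-{DAY}")

# Same pattern; in VERBOSE mode whitespace is ignored and '#' starts a comment.
ISO_DATE_VERBOSE = re.compile(
    r"""
    (?P<year>\d{4})                 # four-digit year
    -
    (?P<month>0[1-9]|1[0-2])        # month, 01 to 12
    -
    (?P<day>0[1-9]|[12]\d|3[01])    # day, 01 to 31 (no per-month check)
    """,
    re.VERBOSE,
)

match = ISO_DATE.fullmatch("2013-06-15")
if match:
    print(match.group("year"), match.group("month"), match.group("day"))
```

Neither version validates that the day exists in the given month. That kind of check usually reads better in the surrounding code than in the pattern.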