Commit Graph

15 Commits

Author SHA1 Message Date
Johannes Schindelin
c975e25864 Regex: implement counted quantifiers: {<n>,<m>}
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-04 12:52:02 -06:00
Johannes Schindelin
fb6486e276 Regex: implement ^,$,\b and \B
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
8ab10a6953 Regex: support special character classes
This adds support for character classes such as \d or \W, leaving \p{...}
style character classes as an exercise for later.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
098f688cd8 Regex: implement negative look-arounds
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
8b611c8075 Regex: support look-behind patterns
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
85af36ef90 Regex: support lookaheads
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
d4a2f58eb5 Regex: implement alternatives
Now we support regular expressions like 'A|B|C'.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
c3a06a600a Regex: implement non-capturing groups
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
53563c4f8e Regex: add support for character classes
Now we support regular expression patterns a la '[0-9]'.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
ca428c406c Regex: implement find()
Now that we have non-greedy repeats, we can implement find(), which
essentially prefixes the regular expression pattern with '.*?'.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
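
The find() commit above describes an unanchored search as an anchored match behind a reluctant '.*?' prefix. The following is a minimal sketch of that idea using Python's re module rather than the project's own engine; the function name and the "hit" group name are assumptions made for illustration.

```python
import re

def find(pattern, text):
    """Emulate an unanchored search with an anchored match.

    The reluctant prefix '.*?' skips as few characters as possible, so the
    leftmost occurrence of `pattern` is the one that is reported. The scoped
    (?s:...) flag lets the prefix skip over newlines as well.
    """
    # "hit" is a hypothetical group name used to separate the real match
    # from the skipped prefix.
    anchored = re.compile(r"(?s:.*?)(?P<hit>" + pattern + r")")
    m = anchored.match(text)  # match() is anchored at position 0
    return m.span("hit") if m else None
```

For example, find(r"[0-9]+", "ab123cd") reports the same span as re.search() would, because the prefix gives up as soon as the digits can match.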
Johannes Schindelin
7da03b0f19 Regex: Implement reluctant '?', '*' and '+'
Now that we have reluctant quantifiers, we can get rid of the hardcoded
program for the challenging regular expression pattern.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
f979505b3d Regex: implement * and + operators
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
d753edafcd Regex: support the dot
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
e2105670a0 Regex compiler: fall back to TrivialPattern when possible
While at it, let's get rid of the unescaping in TrivialPattern which was
buggy anyway: special operators such as \b were misinterpreted as trivial
patterns.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
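
The fallback described in the commit above can be sketched roughly as follows. This is not the project's TrivialPattern; it only illustrates the idea, and it assumes that any backslash disqualifies a pattern from the fast path, so that operators such as \b are never treated as literal text. All names here are hypothetical.

```python
import re

# Characters that give a pattern regex meaning. The backslash is included so
# that escapes such as \b always take the full regex path.
SPECIAL = set(".^$*+?()[]{}|\\")

def is_trivial(pattern):
    """True if the pattern is plain literal text with no operators."""
    return not any(ch in SPECIAL for ch in pattern)

def matches(pattern, text):
    """Match the whole text, using string comparison when possible."""
    if is_trivial(pattern):
        return pattern == text  # fast path: no regex machinery needed
    return re.fullmatch(pattern, text) is not None
```

Because no unescaping happens on the fast path, a pattern such as r"\bfoo" is simply handed to the full engine.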
Johannes Schindelin
04d8955f98 Regex: Implement compiler for regular expression patterns
Originally, this developer wanted to (ab)use the PikeVM with a
hand-crafted program and an added "callback" opcode to parse the regular
expressions.

However, this turned out to be completely unnecessary: there are no
ambiguities in regular expression patterns, so there is no need to do
anything other than parse the pattern, one character at a time, into a
nested expression that then knows how to write itself into a program for
the PikeVM.

For the moment, we still hardcode the program for the regular expression
pattern demonstrating the challenge with the prioritized threads because
the compiler cannot yet parse reluctant operators.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
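
The approach described in the compiler commit above (parse the pattern one character at a time into a nested expression that writes itself into a PikeVM program) might look like the sketch below. It is not the project's code: it handles only literals, parentheses and a greedy '*', does no error checking, and the opcode names (char, split, jmp, match) follow common descriptions of Pike-style VMs rather than anything stated in the log.

```python
from dataclasses import dataclass, field

@dataclass
class Char:
    c: str
    def emit(self, prog):
        prog.append(("char", self.c))

@dataclass
class Seq:
    items: list = field(default_factory=list)
    def emit(self, prog):
        for item in self.items:
            item.emit(prog)

@dataclass
class Star:
    inner: object
    def emit(self, prog):
        # Layout: split(body, exit); body...; jmp(split); exit:
        split_at = len(prog)
        prog.append(None)  # placeholder, patched once the exit is known
        self.inner.emit(prog)
        prog.append(("jmp", split_at))
        prog[split_at] = ("split", split_at + 1, len(prog))

def parse(pattern):
    """Build the nested expression, one character at a time."""
    stack = [Seq()]
    for ch in pattern:
        if ch == "(":
            stack.append(Seq())
        elif ch == ")":
            group = stack.pop()
            stack[-1].items.append(group)
        elif ch == "*":
            stack[-1].items[-1] = Star(stack[-1].items[-1])
        else:
            stack[-1].items.append(Char(ch))
    return stack[0]

def compile_pattern(pattern):
    prog = []
    parse(pattern).emit(prog)
    prog.append(("match",))
    return prog

def run(prog, text):
    """Step all threads in lock step; True if the whole text matches."""
    def add(threads, pc):
        if pc in threads:  # each pc is added at most once per step
            return
        threads.append(pc)
        op = prog[pc]
        if op[0] == "jmp":
            add(threads, op[1])
        elif op[0] == "split":
            add(threads, op[1])
            add(threads, op[2])

    current = []
    add(current, 0)
    for ch in text:
        following = []
        for pc in current:
            op = prog[pc]
            if op[0] == "char" and op[1] == ch:
                add(following, pc + 1)
        current = following
    return any(prog[pc][0] == "match" for pc in current)
```

In this layout a reluctant operator would only need the two split targets swapped, provided the VM honours thread priority; the simplified run() above checks for a full match only, so it ignores priority.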