5 Commits

Author SHA1 Message Date
Johannes Schindelin
63b06ebde8 Regex: optimize matching characters
Instead of having an opcode 'CHAR', let's have the opcodes that fall
within the range of a char *be* the opcode 'match this character'.

While at it, split the remaining opcodes into separate ranges by type, so
that related operations are clustered together.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
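The opcode layout described in 63b06ebde8 can be sketched as follows. This is a minimal sketch, assuming an int-valued program in which every value from 0x0000 to 0xFFFF is itself the instruction "match this char". The class `Opcodes`, the opcode names SPLIT, JMP, SAVE and MATCH, and their numeric ranges are hypothetical; the commit message does not show the actual values.

```java
// Hypothetical opcode layout (values are illustrative, not the commit's):
// any int in 0x0000..0xFFFF *is* the instruction "match this char", and the
// remaining opcodes are clustered in 0x10000-sized ranges by kind.
final class Opcodes {
    static final int CATEGORY_CHAR    = 0; // 0x00000..0x0FFFF
    static final int CATEGORY_CONTROL = 1; // 0x10000..0x1FFFF
    static final int CATEGORY_GROUP   = 2; // 0x20000..0x2FFFF
    static final int CATEGORY_END     = 3; // 0x30000..0x3FFFF

    static final int SPLIT = 0x10000; // continue at two targets
    static final int JMP   = 0x10001; // continue at one target
    static final int SAVE  = 0x20000; // record the current offset in a slot
    static final int MATCH = 0x30000; // report success

    // The high bits select the category, so related opcodes can be
    // recognized with a single shift instead of a chain of comparisons.
    static int category(int opcode) {
        return opcode >>> 16;
    }

    // No explicit CHAR opcode: the opcode value is the character itself.
    static boolean isChar(int opcode) {
        return category(opcode) == CATEGORY_CHAR;
    }

    static String describe(int opcode) {
        if (isChar(opcode)) {
            return "CHAR '" + (char) opcode + "'";
        }
        switch (opcode) {
            case SPLIT: return "SPLIT";
            case JMP:   return "JMP";
            case SAVE:  return "SAVE";
            case MATCH: return "MATCH";
            default: throw new IllegalArgumentException("unknown opcode: " + opcode);
        }
    }
}
```

With this layout, dispatching on the common case ("is this a literal character?") needs no table lookup, and adding a new control-flow opcode only means picking an unused value inside the control range.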
Johannes Schindelin
2073d4bffb Prepare the Matcher class for multiple groups
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
e6ad10de04 Implement Pattern / Matcher classes based on the PikeVM
To test the just-implemented PikeVM, let's run it on one specific regular
expression. No parsing is implemented yet; instead, a program matching
a(bb)?a is hardcoded.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
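A hardcoded program for a(bb)?a, as described in e6ad10de04, might look like the sketch below, together with a compact lock-step interpreter to run it. The names `Inst` and `HardcodedABBA`, the CHAR/SPLIT/MATCH instruction set and the anchored whole-input semantics are assumptions made for illustration; they are not taken from the commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical instruction format for this sketch.
final class Inst {
    static final int CHAR = 0, SPLIT = 1, MATCH = 2;
    final int op, x, y; // CHAR: x = char; SPLIT: x, y = jump targets
    Inst(int op, int x, int y) { this.op = op; this.x = x; this.y = y; }
}

final class HardcodedABBA {
    // a(bb)?a, written out by hand because there is no parser yet:
    //   0: CHAR 'a'
    //   1: SPLIT 2, 4   (try the optional "bb" first, else skip it)
    //   2: CHAR 'b'
    //   3: CHAR 'b'
    //   4: CHAR 'a'
    //   5: MATCH
    static final Inst[] PROGRAM = {
        new Inst(Inst.CHAR, 'a', 0),
        new Inst(Inst.SPLIT, 2, 4),
        new Inst(Inst.CHAR, 'b', 0),
        new Inst(Inst.CHAR, 'b', 0),
        new Inst(Inst.CHAR, 'a', 0),
        new Inst(Inst.MATCH, 0, 0),
    };

    // Follows SPLITs immediately, so the list only holds CHAR/MATCH threads;
    // 'seen' keeps at most one thread per instruction.
    private static void addThread(List<Integer> list, boolean[] seen, int pc) {
        if (seen[pc]) return;
        seen[pc] = true;
        Inst inst = PROGRAM[pc];
        if (inst.op == Inst.SPLIT) {
            addThread(list, seen, inst.x);
            addThread(list, seen, inst.y);
        } else {
            list.add(pc);
        }
    }

    // Returns true if the whole input matches a(bb)?a.
    static boolean matches(String input) {
        List<Integer> current = new ArrayList<>();
        addThread(current, new boolean[PROGRAM.length], 0);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            List<Integer> next = new ArrayList<>();
            boolean[] seen = new boolean[PROGRAM.length];
            for (int pc : current) {
                Inst inst = PROGRAM[pc];
                if (inst.op == Inst.CHAR && inst.x == c) {
                    addThread(next, seen, pc + 1);
                }
            }
            current = next;
        }
        for (int pc : current) {
            if (PROGRAM[pc].op == Inst.MATCH) return true;
        }
        return false;
    }
}
```

For example, `HardcodedABBA.matches("abba")` and `HardcodedABBA.matches("aa")` are true, while `"aba"` is rejected because no thread survives the final 'a'.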
Johannes Schindelin
944f5f3567 Start implementing a regular expression engine
So far, these are humble beginnings indeed. Based on the description at

	http://swtch.com/%7Ersc/regexp/regexp2.html

I started implementing a Thompson NFA / Pike VM.

The idea is that regular expressions will eventually be compiled into
special-purpose bytecode for the Pike VM, which executes a varying number
of threads in lock-step over each character of the text to match.

The thread count is bounded by the length of the program: two different
threads with the same instruction pointer at the same character-to-match
would yield exactly the same outcome, so we can execute just one such
thread instead of possibly many.

To allow for matching groups, each thread carries a state with it, saving
the group offsets acquired so far.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
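The design described in 944f5f3567 (lock-step threads, at most one thread per instruction pointer, and per-thread group offsets) can be sketched like this. It is a minimal sketch under stated assumptions: the class `PikeVm`, its CHAR/SPLIT/SAVE/MATCH instructions, the `abbaWithGroup()` factory and the anchored whole-input semantics are hypothetical, not the repository's code. The `seen` array is what enforces the bound on the thread count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal Pike VM sketch (hypothetical names and instruction set): threads
// advance in lock-step over the text, and each one carries its own copy of
// the group offsets captured so far.
final class PikeVm {
    static final int CHAR = 0, SPLIT = 1, SAVE = 2, MATCH = 3;

    static final class Inst {
        final int op, x, y; // CHAR: x = char; SPLIT: x, y = targets; SAVE: x = slot
        Inst(int op, int x, int y) { this.op = op; this.x = x; this.y = y; }
    }

    static final class VmThread {
        final int pc;
        final int[] saved; // saved[2k], saved[2k + 1]: start/end of group k + 1
        VmThread(int pc, int[] saved) { this.pc = pc; this.saved = saved; }
    }

    private final Inst[] program;
    private final int slots;

    PikeVm(Inst[] program, int slots) {
        this.program = program;
        this.slots = slots;
    }

    // Hypothetical example program: a(bb)?a with group 1 around (bb).
    static PikeVm abbaWithGroup() {
        return new PikeVm(new Inst[] {
            new Inst(CHAR, 'a', 0),  // 0
            new Inst(SPLIT, 2, 6),   // 1: prefer entering the group
            new Inst(SAVE, 0, 0),    // 2: group 1 starts here
            new Inst(CHAR, 'b', 0),  // 3
            new Inst(CHAR, 'b', 0),  // 4
            new Inst(SAVE, 1, 0),    // 5: group 1 ends here
            new Inst(CHAR, 'a', 0),  // 6
            new Inst(MATCH, 0, 0),   // 7
        }, 2);
    }

    // Adds a thread at 'pc' unless one is already queued there for this
    // position: it would behave identically, so each list holds at most
    // program.length threads. SPLIT and SAVE are followed immediately.
    private void add(List<VmThread> list, boolean[] seen, int pc, int[] saved, int pos) {
        if (seen[pc]) return;
        seen[pc] = true;
        Inst inst = program[pc];
        if (inst.op == SPLIT) {
            add(list, seen, inst.x, saved, pos); // higher priority
            add(list, seen, inst.y, saved, pos);
        } else if (inst.op == SAVE) {
            int[] copy = Arrays.copyOf(saved, saved.length); // copy-on-write
            copy[inst.x] = pos;
            add(list, seen, pc + 1, copy, pos);
        } else {
            list.add(new VmThread(pc, saved)); // CHAR or MATCH
        }
    }

    // Anchored whole-input match; returns the saved offsets, or null.
    int[] match(String input) {
        int[] initial = new int[slots];
        Arrays.fill(initial, -1);
        List<VmThread> current = new ArrayList<>();
        add(current, new boolean[program.length], 0, initial, 0);
        for (int i = 0; i < input.length(); i++) {
            List<VmThread> next = new ArrayList<>();
            boolean[] seen = new boolean[program.length];
            for (VmThread t : current) {
                Inst inst = program[t.pc];
                if (inst.op == CHAR && inst.x == input.charAt(i)) {
                    add(next, seen, t.pc + 1, t.saved, i + 1);
                }
            }
            current = next;
        }
        for (VmThread t : current) {
            if (program[t.pc].op == MATCH) return t.saved;
        }
        return null;
    }
}
```

Running `PikeVm.abbaWithGroup().match("abba")` yields the offsets `[1, 3]` for group 1, `"aa"` yields `[-1, -1]` (the optional group did not participate), and `"aba"` yields `null`. Copying the offsets only at SAVE lets threads that have not captured anything new share the same array.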
Johannes Schindelin
84829dc390 Refactor Pattern / Matcher classes
This makes both the Pattern and the Matcher classes abstract, so that
patterns more specialized than the trivial ones we support so far can be
implemented as convenient subclasses of the respective abstract base
classes.

To ease development, we work on copies in the 'regex' package, located in
test/regex/. That way, the code can be developed in Eclipse without
interfering with the Oracle JRE's java.util.regex.* classes.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
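The abstract base classes from 84829dc390 could be arranged roughly as below. This is a sketch that assumes the "trivial patterns" are plain literal strings; `LiteralPattern`, its matcher and the metacharacter check in `compile()` are hypothetical. The method names `compile`, `matcher`, `matches` and `find` mirror java.util.regex, but this is not that API, and the sketch omits the `regex` package declaration.

```java
// Sketch of the abstract base classes plus one hypothetical "trivial"
// subclass that only understands literal strings.
abstract class Pattern {
    // Hypothetical factory: choose a specialized subclass for the expression.
    static Pattern compile(String regex) {
        for (char c : regex.toCharArray()) {
            if (".[]()*+?{}|^$\\".indexOf(c) >= 0) {
                throw new UnsupportedOperationException("not supported yet: " + regex);
            }
        }
        return new LiteralPattern(regex);
    }

    abstract Matcher matcher(CharSequence input);
}

abstract class Matcher {
    protected final CharSequence input;

    protected Matcher(CharSequence input) {
        this.input = input;
    }

    // Does the entire input match the pattern?
    abstract boolean matches();

    // Does the pattern occur anywhere in the input? (Stateless here, unlike
    // java.util.regex.Matcher.find(), which advances through the input.)
    abstract boolean find();
}

final class LiteralPattern extends Pattern {
    private final String literal;

    LiteralPattern(String literal) {
        this.literal = literal;
    }

    @Override
    Matcher matcher(CharSequence text) {
        return new LiteralMatcher(text);
    }

    // Inner (non-static) class so that it can read 'literal'.
    private final class LiteralMatcher extends Matcher {
        LiteralMatcher(CharSequence text) {
            super(text);
        }

        @Override
        boolean matches() {
            return literal.contentEquals(input);
        }

        @Override
        boolean find() {
            return input.toString().contains(literal);
        }
    }
}
```

A more capable subclass, for example one backed by a Pike VM, would only need to override `matcher()` and provide its own `Matcher` implementation; callers keep using `Pattern.compile(...)` unchanged.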