This will be used to match character classes (such as '[0-9a-f]'),
but it will also be used by the regular expression pattern compiler
to determine whether a character has special meaning in regular
expressions.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Among other challenges, this regular expression is designed to demonstrate
that thread prioritization is finicky: Given the string 'aaaaaa' to match,
the first four threads will try to grab the second 'a', the third thread
(the one that matched the '(a??)' group) having scheduled the same
instruction pointer to the '(a+)' group that the second -- higher-priority
-- thread will try to advance to only after processing the '(a??)' group's
SPLIT. The second thread must override the third thread in that case,
essentially stopping the latter.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
If we want to match greedy or reluctant regular expressions, we have
to make sure that certain threads are split off with a higher priority
than others. We will use the ThreadQueues' natural order as priority
order: high to low.
To support splitting into different-priority threads, let's introduce
a second SPLIT opcode: SPLIT_JMP. The latter prefers to jump while the
former prefers to execute the opcode directly after the SPLIT opcode.
There is a subtle challenge here, though: let's assume that there are
two current threads and the higher-priority one wants to jump where
the lower-priority one is already. In the PikeVM implementation
before this change, queueImmediately() would see that there is
already a thread queued for that program counter and *not* queue the
higher-priority one.
Example: when matching the pattern '(a?)(a??)(a?)' against the string
'aa', after the first character, the first (high priority) thread
will have matched the first group while the second thread matched the
second group. In the following step, therefore, the first thread will
want to SPLIT_JMP to match the final 'a' to the third group but the
second thread already queued that program counter.
The proposed solution is to introduce a third thread queue: 'queued'.
When queuing threads to be executed after reading the next character
from the string to match, they are not directly queued into 'next' but
into 'queued'. Every thread requiring immediate execution (i.e. before
reading the next character) will be queued into 'current'. Whenever
'current' is drained, the next thread from 'queued' that has not been
queued to 'current' yet will be executed.
That way, we can guarantee that 1) no lower-priority thread can override
a higher-priority thread and 2) infinite loop are prevented.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Instead of having an opcode 'CHAR', let's have the opcodes that fall
within the range of a char *be* the opcode 'match this character'.
While at it, break the ranges of the different types of opcodes apart
into ranges so that related operations are clustered.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We still do not parse the regular expression patterns, but we can at
least test that the hardcoded 'a(bb)+a' works as expected.
This class will be extended as we support more and more features.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Based on the just-implemented PikeVM, let's test it with a specific
regular expression. At this point, no parsing is implemented but instead
an explicit program executing a(bb)?a is hardcoded.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
So far, these are humble beginnings indeed. Based on the descriptions of
http://swtch.com/%7Ersc/regexp/regexp2.html
I started implementing a Thompson NFA / Pike VM.
The idea being that eventually, regular expressions are to be compiled
into special-purpose bytecode for the Pike VM that executes a varying
number of threads in lock-step over each character of the text to match.
The thread count is bounded by the length of the program: two different
threads with identical instruction pointer at the same character-to-match
would yield exactly the same outcome (and therefore, we can execute just
one such thread instead of possibly many).
To allow for matching groups, each thread carries a state with it, saving
the group offsets acquired so far.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This makes both the Pattern and the Matcher class abstract so that more
specialized patterns than the trivial patterns we support so far can be
implemented as convenient subclasses of the respective abstract base
classes.
To ease development, we work on copies in test/regex/ in the 'regex'
package. That way, it can be developed in Eclipse (because it does not
interfere with Oracle JRE's java.util.regex.* classes).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
For quick access, the sezpoz library stores lists in
META-INF/annotations/ of classes that have been annotated in a
special way.
To support the use case where the annotations actually changed since
sezpoz stored said lists, sezpoz then creates proxy instances for the
annotations to provide some backwards compatibility: as long as there
are default values for any newly-introduced annotation values,
everything is groovy.
Therefore, let's make sure that proxy instances inherit the
annotations' default values.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When the class whose field is to be inspected has no annotations at all,
at least my javac here (1.6.0_51 on MacOSX) does not produce any class
addendum.
Therefore, let's verify that the addendum is not null before proceeding.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This adds an extra class path element to the VM running the unit tests,
writes files with identical file names into both directories and then
verifies that SystemClassLoader#getResources can find them.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This change reuses the existing insertion sort (which was previously what
Arrays.sort() executed) in a full intro sort pipeline.
The implementation is based on the Musser paper on intro sort (Musser,
David R. "Introspective sorting and selection algorithms." Softw., Pract.
Exper. 27.8 (1997): 983-993.) and Wikipedia's current description of the
heap sort: http://en.wikipedia.org/wiki/Heapsort.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We do not really support regular expressions yet, but we do support
trivial patterns including ones with escaped characters. Let's make sure
that that works as advertised.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The OpenJDK library wants to track and run the shutdown hooks itself
rather than let the VM do it, so we need to tell it when we're
exiting.
Also, in machine.cpp we need to use only the modifiers specified in
the InnerClasses attribute for inner classes rather than OR them with
the flags given at the top level of the class file.
This fixes a couple of tests in the Scala test suite
(run/reflection-modulemirror-toplevel-badpath.scala and
run/reflection-constructormirror-nested-good.scala).
To execute tests on a remote host (for instance, because you're cross-compiling),
simply do:
make remote-test=true remote-test-host=<host_to_test_on> test
You can set several variables to control the functionality of remote-test.
See them below, along with their default values:
remote-test-host = localhost # host to ssh to
remote-test-port = 22
remote-test-user = ${USER} # user to execute tests as
remote-test-dir = /tmp/avian-test-${USER} # dir to rsync build output to
In order to calculate the initial stack map of GC roots for an
exception handler, we do a logical "and" of maps across all the
instructions contained in the try block for that handler. This is
complicated by the presence of jsr/ret instructions, though, because
instructions in a subroutine may have multiple maps associated with
them corresponding to all the paths from which execution might flow to
them.
The bug in this case was that we were using an uninitialized map in
our calculation, resulting in a map with no GC roots at all. By the
time the map was initialized, the damage had already been done. The
solution is to treat an uninitialized map as if it has roots at all
positions so that it has no effect on the calculation until it has
been initialized with real data.
My earlier attempt (fa5d76b) missed an important detail, and somehow I
forgot to test the 32-bit OpenJDK build which made that omission
obvious. Here's the fix.
Java requires that NaNs be converted to zero and that numbers at or
beyond the limits of integer representation be clamped to the largest
or smallest value that can be represented, respectively.
I get this error when compiling with "make openjdk=...." on both x86_64 and
arm:
compiling test classes
test/Arrays.java:90: error: reference to equals is ambiguous, both method
equals(float[],float[]) in Arrays and method equals(Object[],Object[]) in
Arrays match
expect(java.util.Arrays.equals(null, null));
test/Arrays.java:95: error: reference to hashCode is ambiguous, both method
hashCode(double[]) in Arrays and method hashCode(Object[]) in Arrays match
java.util.Arrays.hashCode(null);
The attached patch fixes this.
When we skip a single-precision register to ensure a double-precision
load is aligned, we need to remember that in case we see another
single-precision argument later on, which we must backfill into that
register we skipped according to the ABI.
We were assuming the array element size was always the native word
size, which is not correct in general for primitive arrays, and this
led to wasted space at best and memory corruption at worst.
The ArrayList(Collection) constructor was allocating two arrays
instead of one due to an off-by-one error in ArrayList.grow. This
commit fixes that and makes grow and shrink more robust.
We were not properly converting dots to slashes internally for package names
and we did not properly handle Method.getAnnotations and
Method.getAnnotation(Class<T>) on methods without any annotations.
Added some tests to cover these cases.