Commit Graph

483 Commits

Author SHA1 Message Date
Joel Dice
84d97fb34c Merge pull request #256 from joshuawarner32/docker
Add i386 and openjdk dockerfiles
2014-05-10 18:55:11 -06:00
Joel Dice
c35435e450 fix portability problem in Strings test
There was a test in Strings.java that assumed the default character
encoding was UTF-8, which is an invalid assumption on some platforms
(e.g. Windows).  This modifies the test to specify the encoding
explicitly.
2014-05-09 16:38:33 -06:00
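The portability fix described in c35435e450 amounts to naming the encoding instead of relying on the platform default. A minimal sketch of that idea follows; the class and method names are hypothetical and are not taken from Strings.java.

```java
import java.nio.charset.StandardCharsets;

final class ExplicitEncodingDemo {
  // Encodes text with an explicitly named charset, so the result is the same
  // on platforms whose default encoding is not UTF-8 (e.g. Windows).
  static byte[] utf8(String text) {
    return text.getBytes(StandardCharsets.UTF_8);
  }

  // Decodes with the same explicit charset; never consults the default.
  static String fromUtf8(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }
}
```

With an explicit charset, "\u00e9" always encodes to two bytes, whereas `text.getBytes()` with no argument may produce one byte under a Latin-1 style default.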
Joshua Warner
27ee3114ae add build command logging to ci.sh 2014-05-09 15:16:55 -06:00
Joshua Warner
94bd876f35 ci.sh: control which target is run for each of the configurations
There are two important things here:
* We only want to run "jdk-test" if we were running "test" for everything else.
  This gets around a bug where jdk-test fails for cross-compile builds (where JNI is involved)
* We can specify a different test target by setting the "test" environment variable.
  This is useful for cross-compiling the tests in a docker image
  (setting the test_target to "build-test")
2014-04-29 14:19:42 -07:00
Joshua Warner
9cb1f1bb26 Fix ci.sh tests on arm qemu systems
There are two problems:
* The x86 JIT compiler requires detectFeatures, defined in the x86 assembly.
  Thus it can't (currently) be built on non-x86 platforms.
  For the purposes of fixing test/ci.sh, it suffices to pretend
  codegen-targets=all means codegen-targets=native when on arm.
* Qemu can introduce some extra latency which was regularly screwing up the LinkedBlockingQueueTest.
  Solution: increase the timeout to 1/10th of a second.
2014-04-29 14:14:44 -06:00
Joel Dice
172ef9a7e6 Merge pull request #246 from joshuawarner32/master
Stop using *Critical functions in throwIOException
2014-04-24 19:40:10 -06:00
Joshua Warner
34962ff334 Merge pull request #245 from dicej/jdk8
add support for using the OpenJDK 8 class library
2014-04-24 18:50:38 -06:00
Joshua Warner
690ba9cdc7 Stop using *Critical functions in throwIOException
This was a bug, wherein upon throwing an exception, we would try to
allocate memory for the message - all while holding a critical
reference to the jbyteArray representing the exception string.  This
caused an expect to fail in allocate3.
2014-04-24 15:23:05 -06:00
Joel Dice
a41efb76c5 avoid NPE in URL.set when file is null 2014-04-23 15:51:57 -06:00
Joel Dice
7de555c797 add support for using the OpenJDK 8 class library
This ensures that all tests pass when Avian is built with an
openjdk=$path option such that $path points to either OpenJDK 7 or 8.

Note that I have not yet tried using the openjdk-src option with
OpenJDK 8.  I'll work on that next.
2014-04-23 15:36:56 -06:00
Joel Dice
1ed3de08fa fix Misc test failures
The Misc test was failing when run as "make input=Misc run" since
test-flags did not include $(build)/extra-dir in the class library,
leading the ClassLoader.getResources test to fail.

Also, the UnknownHostException test was not reliable -- some ISPs
(mine included) return DNS matches for bogus hostnames, defaulting to
the IP address of a webserver intended to help users with name
resolution problems.  That's dumb, I know, but I'm guessing I'm not
the only person with a dumb ISP, and it seems better to just remove
the test than make people think Avian is broken when it's really just
their DNS server that's broken.
2014-04-20 19:11:15 -06:00
Joel Dice
b74f9e32e9 fix NPE in Field.getAnnotations 2014-04-17 13:16:21 -06:00
Joel Dice
8f4c0e78ce clean up System.getProperties and related methods
The behavior of Avian's versions of these methods was egregiously
non-standard, and there were problems with the Android implementations
as well.
2014-04-04 13:43:59 -06:00
Joshua Warner
573367e7a1 Merge pull request #212 from dicej/net
various refinements to network implementation
2014-04-02 19:41:21 -06:00
Joel Dice
a7e86e6cd4 implement Unsafe.{get|put}*Volatile 2014-03-31 17:31:28 -06:00
Joel Dice
6e7149061c various refinements to network implementation
The main idea is to make DatagramChannel and *SocketChannel behave in
a way that more closely matches the standard, e.g. allow binding
sockets to addresses without necessarily listening on those addresses
and accept null addresses where appropriate.  It also avoids multiple
redundant DNS lookups.

This commit also implements CharBuffer and BindException, and adds the
Readable interface.
2014-03-31 15:22:14 -06:00
Joel Dice
c2bfba92f0 consolidate duplicate Cell classes 2014-03-24 10:47:37 -06:00
Joel Dice
fd778c2c76 remove redundant interfaces and generalize shift/reset generics
Turns out Function can do the jobs of both CallbackReceiver and
FunctionReceiver, so I've removed the latter two.

Also, shift and reset should work with a combination of types, not
just a single type, so I've expanded their generic signatures.
2014-03-21 07:38:29 -06:00
Joel Dice
570b5447bf fix openjdk and android builds when continuations=true
Also, update the whitespace padding for printing test results to
accommodate long names like extra.ComposableContinuations.
2014-03-21 07:38:29 -06:00
Joel Dice
ff57447507 fix handling of multiple shifts delimited by a single reset 2014-03-21 07:38:28 -06:00
Joel Dice
aa3fa1aff4 simplify shift/reset API and add test (currently failing) 2014-03-21 07:38:28 -06:00
Joel Dice
91e4d2b4a1 quick sketch of composable continuation implementation
I've been told by knowledgeable people that it is impossible to
implement composable continuations (AKA delimited continuations AKA
shift/reset) in terms of call-with-current-continuation.  Since I
don't yet understand why that is, I figured it would help my
understanding to attempt it and see how it fails.
2014-03-21 07:38:28 -06:00
Joshua Warner
c5012cda72 Merge pull request #205 from dicej/getPackage
ensure ClassLoader.getPackage works with all class libraries
2014-03-19 17:59:44 -06:00
Joel Dice
8740d76154 ensure ClassLoader.getPackage works with all class libraries
There's more work to do to derive all the properties of a given class
from its code source (e.g. JAR file), but this at least ensures that
ClassLoader.getPackage will actually return something non-null when
appropriate.
2014-03-19 11:21:26 -06:00
Mike Jensen
354d522cd5 Renamed these two files to indicate they are not actual tests, but rather just to help other tests 2014-03-19 10:54:06 -06:00
Mike Jensen
54a1fbac4c Removing unit test where avian implementation is more readily willing to throw a ConcurrentModificationException. 2014-03-19 09:05:19 -06:00
Mike Jensen
b5d388a718 Added an implementation of ArrayDeque, as well as unit tests
I also used this opportunity to reduce code duplication around other queue/deque implementations.
2014-03-18 19:45:00 -06:00
Joel Dice
58079887a9 fix broken Class.getDeclar{ed|ing}Classes implementations
classpath-common.h's getDeclaringClass was trying to look up
non-existing classes, which led to an abort, and I don't even know
what Class.getDeclaredClasses was trying to do, but it was ugly and
wrong.
2014-03-14 11:10:54 -06:00
Joel Dice
918b7828f1 fix StackOverflowError stack walking in tails=true builds
The various Architecture::nextFrame implementations were not walking
the stack correctly when a StackOverflowError was thrown.  The
throwStackOverflow thunk is called before the frame of the most
recently called method has been fully created, and because tails=true
builds use a different calling convention, we need to treat this
situation carefully when building a stack trace or unwinding.
Otherwise, we will skip past all the java frames to the next native
frame, which is what was happening.
2014-03-14 09:59:04 -06:00
Joel Dice
dd359ef937 rename Concurrent to ConcurrentHashMapTest 2014-03-12 13:04:20 -06:00
Joel Dice
4d05bfd540 fix Completeion/Completion misspelling 2014-03-12 10:44:24 -06:00
Joel Dice
c0d178d5f1 implement ConcurrentHashMap and AtomicReferenceArray
This is the simplest possible ConcurrentHashMap I could come up with
that works and is actually concurrent in the way one would expect.
It's pretty unconventional, being based on a persistent red-black
tree, and not particularly memory-efficient or cache-friendly.  I
think this is a good place to start, though, and it should perform
reasonably well for most workloads.  Patches for a more efficient
implementation are welcome!

I also implemented AtomicReferenceArray, since I was using it in my
first, naive attempt to implement ConcurrentHashMap.

I had to do a bit of refactoring, including moving some non-standard
stuff from java.util.Collections to avian.Data so I could make it
available to code outside the java.util package, which is why I had to
modify several unrelated files.
2014-03-12 10:44:24 -06:00
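The idea in c0d178d5f1 of building a concurrent map on a persistent tree can be sketched as follows. This is a simplified illustration, not Avian's code: it uses an unbalanced binary search tree instead of a red-black tree, supports only get and put, and every name in it is hypothetical. Writers copy the path from the root to the changed node and publish the new root with a compare-and-set; readers always see a complete, immutable snapshot.

```java
import java.util.concurrent.atomic.AtomicReference;

final class PersistentMapSketch<K extends Comparable<K>, V> {
  // Immutable node: an update never mutates a node, it copies the path to it.
  private static final class Node<K, V> {
    final K key;
    final V value;
    final Node<K, V> left, right;

    Node(K key, V value, Node<K, V> left, Node<K, V> right) {
      this.key = key;
      this.value = value;
      this.left = left;
      this.right = right;
    }
  }

  private final AtomicReference<Node<K, V>> root =
      new AtomicReference<Node<K, V>>();

  // Lock-free read against whatever snapshot is current.
  public V get(K key) {
    Node<K, V> n = root.get();
    while (n != null) {
      int c = key.compareTo(n.key);
      if (c == 0) return n.value;
      n = c < 0 ? n.left : n.right;
    }
    return null;
  }

  // Build a new tree from the current one and retry if another writer won.
  public void put(K key, V value) {
    while (true) {
      Node<K, V> old = root.get();
      if (root.compareAndSet(old, insert(old, key, value))) return;
    }
  }

  // Returns a new tree sharing all untouched subtrees with the old one.
  private static <K extends Comparable<K>, V> Node<K, V> insert(
      Node<K, V> n, K key, V value) {
    if (n == null) return new Node<K, V>(key, value, null, null);
    int c = key.compareTo(n.key);
    if (c == 0) return new Node<K, V>(key, value, n.left, n.right);
    if (c < 0) {
      return new Node<K, V>(n.key, n.value, insert(n.left, key, value), n.right);
    }
    return new Node<K, V>(n.key, n.value, n.left, insert(n.right, key, value));
  }
}
```

The path copying is what makes this approach allocation-heavy and less cache-friendly, which matches the trade-off the commit message mentions.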
Mike Jensen
efb31dd09a Added verify function to avoid throwing so many runtime exceptions in the tests 2014-03-11 09:20:34 -06:00
Mike Jensen
68fca60d21 Added interface BlockingDeque, and implementation for ExecutorCompletionService and LinkedBlockingQueue.
I had to implement a blocking queue for ExecutorCompletionService.  LinkedBlockingQueue could be very easily extended right now to implement the Java 7 LinkedBlockingDeque.  Right now LinkedBlockingQueue just synchronizes and depends on the LinkedList implementation.  But I wrote a very complete unit test suite, so if we want to put a more concurrent design here, we have a complete test suite to verify against.
2014-03-10 19:06:37 -06:00
Joshua Warner
ed89e0c67d Merge pull request #194 from jentfoo/FutureTask
Added implementation and tests for FutureTask.
2014-03-10 16:51:36 -06:00
Mike Jensen
83a31314e0 Added implementation and tests for FutureTask.
I also was missing the set operation for AtomicReference, and cleaned a couple things up from LockSupport.
2014-03-10 10:53:49 -06:00
Joel Dice
866c057f0d fix Class.getDeclaredMethods
getDeclaredMethods was returning methods which were inherited from
interfaces but not (re)declared in the class itself, due to the VM's
internal use of VMClass.methodTable differing from its role in
reflection.  For reflection, we must only include the declared
methods, not the inherited but un-redeclared ones.

Previously, we saved the original method table in
ClassAddendum.methodTable before creating a new one which contains
both declared and inherited methods.  That wasted space, so this patch
replaces ClassAddendum.methodTable with
ClassAddendum.declaredMethodCount, which specifies how many of the
methods in VMClass.methodTable were declared in that class.

Alternatively, we could ensure that undeclared methods always have
their VMMethod.class_ field set to the declaring class instead of the
inheriting class.  I tried this, but it led to subtle crashes in
interface method lookup.  The rest of the VM relies not only on
VMClass.methodTable containing all inherited interface methods but
also that those methods point to the inheriting class, not the
declaring class.  Changing those assumptions would be a much bigger
(and more dangerous in terms of regression potential) effort than I
care to take on right now.  The solution I chose is a bit ugly, but
it's safe.
2014-03-10 08:51:00 -06:00
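The behaviour that 866c057f0d restores can be checked with a small reflection example. The types below are hypothetical and exist only for illustration: an abstract class inherits an interface method without redeclaring it, so that method should be absent from getDeclaredMethods.

```java
import java.lang.reflect.Method;

final class DeclaredMethodsDemo {
  interface Greeter {
    void greet();
  }

  // Inherits greet() from Greeter but does not redeclare it.
  static abstract class AbstractGreeter implements Greeter {
    void helper() {}
  }

  // True if the class itself declares a method with the given name.
  static boolean declares(Class<?> type, String name) {
    for (Method m : type.getDeclaredMethods()) {
      if (m.getName().equals(name)) return true;
    }
    return false;
  }
}
```

Here `declares(AbstractGreeter.class, "helper")` is true, while `declares(AbstractGreeter.class, "greet")` is false, even though greet() is still visible through getMethods.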
Joel Dice
25d69f38ee match Java's schizophrenic concept of inner class access modifiers
An inner class has two sets of modifier flags: one is declared in the
usual place in the class file and the other is part of the
InnerClasses attribute.  Not only is that redundant, but they can
contradict, and the VM can't just pick one and roll with it.  Instead,
Class.getModifiers must return the InnerClasses version, whereas
reflection must check the top-level version.  So even if
Class.getModifiers says the class is protected, it might still be
public for the purpose of reflection depending on what the
InnerClasses attribute says.  Crazy?  Yes.
2014-03-06 16:17:43 -07:00
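The Class.getModifiers half of 25d69f38ee can be observed from plain Java. In the sketch below (hypothetical names), a nested class is declared protected; a class file cannot mark a class as protected in its top-level flags, so the protected bit can only come from the InnerClasses attribute.

```java
import java.lang.reflect.Modifier;

final class InnerModifiersDemo {
  // The compiler records "protected" in the InnerClasses attribute, since
  // top-level class access flags have no protected bit.
  protected static class Guarded {}

  // Reports what Class.getModifiers says about the nested class.
  static boolean reportsProtected() {
    return Modifier.isProtected(Guarded.class.getModifiers());
  }

  static boolean reportsPublic() {
    return Modifier.isPublic(Guarded.class.getModifiers());
  }
}
```

On a standard JDK, reportsProtected() returns true and reportsPublic() returns false.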
Joshua Warner
deca71da52 build arm and powerpc targets in the ci build 2014-02-25 21:38:29 -07:00
Joel Dice
1445835c4f fix Thread.join when using Android class library
Android's Thread.join expects the VM to null-out Thread.vmThread when
the thread exits.  Otherwise, it will block forever.
2014-02-25 14:58:32 -07:00
Joel Dice
1735a7976a do not omit calls to empty methods which may trigger class initialization
There's a small optimization in compileDirectInvoke which tries to
avoid generating calls to empty methods.  However, this causes
problems for code which uses such a call to ensure a class is
initialized -- if we omit that call, the class may not be
initialized and any side effects of that initialization may not
happen when the program expects them to.

This commit ensures that the compiler only omits empty method calls
when the target class does not need initialization.  It also removes
commented-out code in classpath-openjdk.cpp which was responsible for
loading libmawt proactively; that was a hack to get JogAmp to work
before we understood what the real problem was.
2014-02-10 08:40:14 -07:00
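The pattern that 1735a7976a protects looks roughly like the following sketch (hypothetical names, not code from Avian or JogAmp). The empty method has no body worth compiling, yet the call must not be dropped, because it is what forces the static initializer to run.

```java
final class InitTriggerDemo {
  static final StringBuilder LOG = new StringBuilder();

  static final class Registry {
    // Side effect that callers rely on happening before they proceed.
    static {
      LOG.append("Registry initialized");
    }

    // Intentionally empty: invoked only to trigger class initialization.
    static void ensureInitialized() {}
  }

  static String run() {
    Registry.ensureInitialized();
    return LOG.toString();
  }
}
```

If a compiler omitted the call to ensureInitialized(), run() would return an empty string instead of the initializer's message.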
Joshua Warner
02becdb5bf implement Arrays.deepEquals and Objects.deepEquals 2014-01-30 17:12:34 -07:00
Joshua Warner
65ca5752da Implement single quotes in MessageFormat 2014-01-28 09:56:25 -07:00
Joshua Warner
d2cc630736 implement java/util/Observ* 2014-01-20 10:17:22 -07:00
Joel Dice
1f6051bcbc Merge pull request #149 from jentfoo/concurrency_classpath_extension
Concurrency classpath extension (part of the atomic package implementation)
2014-01-03 16:06:04 -08:00
Mike Jensen
ac27ebd995 Reduced code duplication by combining these three very similar tests into a single file. 2014-01-03 16:24:11 -07:00
Mike Jensen
2760252a13 Avoid doing a Thread.sleep() and instead do a wait and notify. 2014-01-03 15:39:40 -07:00
Mike Jensen
9809898470 Moved the waitTillReady to before the doOperation call in order to have the threads synchronized. 2014-01-03 15:27:11 -07:00
Joel Dice
4ce545c4fd add test for d1bdf2f (Class.getMethod bug)
I meant to include this in the original commit, but forgot.
2014-01-03 14:40:47 -07:00
Mike Jensen
735921cd6f Renamed these tests to a shorter name so the test output still looks clean 2014-01-03 11:22:33 -07:00
Mike Jensen
f4f4b8a26b Fix for unit test failure when threads are created too slowly. 2014-01-03 10:08:36 -07:00
Mike Jensen
f7341732fc Added some tests which would fail with a simple volatile, but should work for the atomic implementations. 2014-01-03 09:36:27 -07:00
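The kind of test mentioned in f7341732fc can be sketched as below. This is an illustration with hypothetical names, not the test from the repository: several threads increment a shared counter, which only yields the exact total when the read-modify-write is atomic. A plain volatile int with `count++` could lose updates under the same load.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class AtomicCounterDemo {
  // Runs the given number of threads, each incrementing a shared counter.
  static int countWithAtomic(int threads, final int incrementsPerThread) {
    final AtomicInteger counter = new AtomicInteger();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(new Runnable() {
        public void run() {
          for (int j = 0; j < incrementsPerThread; j++) {
            counter.incrementAndGet(); // atomic read-modify-write
          }
        }
      });
      workers[i].start();
    }
    for (Thread t : workers) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return counter.get();
  }
}
```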
Mike Jensen
996e52170f Fix for spelling error Joel pointed out, as well as a simple unit test around the TimeUnit conversions 2013-12-24 11:30:50 -07:00
Joel Dice
59d5bbbb1a throw UnknownHostException if host is not found in InetAddress.getByName 2013-12-18 10:43:11 -07:00
Joel Dice
5f40c1642e don't throw UnknownHostException from InetAddress.getByName("0.0.0.0")
0.0.0.0 means any local interface, which is commonly used by servers
which wish to listen on all interfaces.
2013-12-18 10:12:10 -07:00
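The expected behaviour after 5f40c1642e can be checked with the standard API. The helper below is a hypothetical sketch: resolving the literal "0.0.0.0" should succeed and yield the wildcard address rather than throw UnknownHostException.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class AnyLocalAddressDemo {
  // True if the literal resolves to the wildcard ("any local") address.
  static boolean isWildcard(String literal) {
    try {
      return InetAddress.getByName(literal).isAnyLocalAddress();
    } catch (UnknownHostException e) {
      // Not expected for literal IP addresses, which need no DNS lookup.
      throw new IllegalArgumentException(e);
    }
  }
}
```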
Joshua Warner
0340be23ce make sure a busy-waiting loop can't block the GC (and hence the whole VM) 2013-12-13 10:39:36 -07:00
Joshua Warner
ef82c4a03a Fix extra java 1.6 failures... that only happen with -source 1.6. Go figure. 2013-12-11 08:51:00 -07:00
Joshua Warner
d2c3d771d7 fix java 1.6 compatibility, and make sure it's maintained in the future 2013-12-10 20:26:29 -07:00
Joel Dice
e50ee5152a use portable conditional expression in test.sh
[[ expression ]] is bash-specific, so we use [ expression ] instead.
2013-12-06 20:57:26 -07:00
Joel Dice
afc3c64e37 Merge pull request #121 from joshuawarner32/master
add jdk-test target, and fix failures
2013-12-06 18:50:45 -08:00
Joshua Warner
db2a701cf5 Merge pull request #122 from dicej/master
fix various Android test suite regressions and add more reflection tests
2013-12-06 18:40:39 -08:00
Joshua Warner
0a4eff33b2 fix jdk-test failures 2013-12-06 19:30:04 -07:00
Joel Dice
7056315c18 fix various Android test suite regressions and add more reflection tests
Most of these regressions were simply due to testing a lot more stuff,
esp. annotations and reflection, revealing holes in the Android
compatibility code.  There are still some holes, but at least the
suite is passing (except for a fragile test in Serialize.java which I
will open an issue for).

Sorry this is such a big commit; there was more to address than I
initially expected.
2013-12-06 18:48:47 -07:00
Johannes Schindelin
ddd057c53a Do not test java.util.TreeMap's serialization in the Serialize test
In the Android class path, TreeMap is implemented differently and as a
consequence its serialization is incompatible with OpenJDK's. So let's
test a private static class' serialization instead, to make sure that
the wire protocol defined by the Java Language Specification is
implemented.

This addresses issue #123 reported by Joel Dice.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-06 19:24:41 -06:00
Joshua Warner
47a7732a81 add jdk-test target, and fix failures
The intent of this target is to run our test suite against the installed jre.
This should help prevent our VM from diverging in implementation from the jdk.

The remainder of this commit fixes the problems that this exposes.
2013-12-06 15:00:02 -07:00
Johannes Schindelin
d8d980be9a Fix the look-behind test for OpenJDK
OpenJDK's regex engine can only handle look-behinds of limited sizes.
So let's just test for that, not the unbounded one we had before (that
our own regex engine handles quite fine, though).

This fixes issue #115.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-06 10:50:34 -06:00
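A bounded look-behind of the kind d8d980be9a switches to might look like this. The pattern is an assumption chosen for illustration, not the one used in the test; its look-behind has an explicit upper bound of three characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BoundedLookBehindDemo {
  // Finds a 'b' preceded by one to three 'a's; the look-behind has a fixed
  // maximum length, which is what OpenJDK's engine requires.
  static int findAfterPrefix(String input) {
    Matcher m = Pattern.compile("(?<=a{1,3})b").matcher(input);
    return m.find() ? m.start() : -1;
  }
}
```

For "aab" this returns 2; for "cb" it returns -1.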
Joel Dice
abe8bc6fda fix exception wrapping for Method.invoke and static initializers
Method.invoke should initialize its class before invoking the method,
throwing an ExceptionInInitializerError if it fails, without wrapping
said error in an InvocationTargetException.

Also, we must initialize ExceptionInInitializerError.exception when
throwing instances from the VM, since OpenJDK's
ExceptionInInitializerError.getCause uses the exception field, not the
cause field.
2013-12-05 22:28:13 -07:00
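The contract described in abe8bc6fda can be exercised with a class whose static initializer always fails. The sketch below uses hypothetical names; it reports which exception Method.invoke surfaces. Note that it is meaningful only the first time it runs in a JVM, since a class that failed to initialize cannot be initialized again.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class InitializerFailureDemo {
  static final class Broken {
    static {
      // "if (true)" keeps the compiler from rejecting an initializer that
      // cannot complete normally.
      if (true) throw new IllegalStateException("init failed");
    }

    public static void touch() {}
  }

  // Invokes Broken.touch() reflectively and describes what was thrown.
  static String outcome() {
    try {
      // A class literal does not initialize the class; invoke() does.
      Method touch = Broken.class.getDeclaredMethod("touch");
      touch.invoke(null);
      return "no error";
    } catch (ExceptionInInitializerError e) {
      return "ExceptionInInitializerError caused by "
          + e.getCause().getClass().getSimpleName();
    } catch (InvocationTargetException e) {
      return "wrapped in InvocationTargetException";
    } catch (ReflectiveOperationException e) {
      return "reflection failed: " + e;
    }
  }
}
```

On a standard JDK the first call reports an unwrapped ExceptionInInitializerError whose cause is the IllegalStateException.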
Joshua Warner
8cda2446d5 implement sun.misc.Unsafe.throwException 2013-12-05 20:28:08 -07:00
Joel Dice
2000c139ea modify TreeSet.MyIterator to support both ascending and descending iteration
This also fixes a bug such that the remove() method left the iterator
in an inconsistent state.
2013-12-04 17:52:27 -07:00
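The two things 2000c139ea touches, descending iteration and remove() during iteration, can be exercised together through the public TreeSet API. The helper below is a hypothetical sketch rather than the repository's test.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

final class TreeSetIterationDemo {
  // Walks the set from largest to smallest, removing even values as it goes,
  // and returns the order in which elements were visited.
  static List<Integer> removeEvensDescending(TreeSet<Integer> set) {
    List<Integer> visited = new ArrayList<Integer>();
    Iterator<Integer> it = set.descendingIterator();
    while (it.hasNext()) {
      int value = it.next();
      visited.add(value);
      if (value % 2 == 0) {
        it.remove(); // must leave the iterator able to continue
      }
    }
    return visited;
  }
}
```

Starting from {1, 2, 3, 4, 5}, the visit order is 5, 4, 3, 2, 1 and the set ends up as {1, 3, 5}.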
Joshua Warner
fe9ac94629 Merge pull request #105 from dscho/regex
Support (the most common subset of) regular expressions
2013-12-04 11:57:26 -08:00
Joshua Warner
a90100ee32 Merge pull request #112 from dscho/get-generic-type
Support Field#getGenericType()
2013-12-04 11:22:49 -08:00
Johannes Schindelin
6626b477ad Replace java.util.regex.* with the new regular expression engine
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-04 12:52:03 -06:00
Johannes Schindelin
e96379ee19 Regex: document the strengths and limitations
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-04 12:52:02 -06:00
Johannes Schindelin
9e7169fe34 Regex: let toString() in the Compiler reconstruct the regex
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-04 12:52:02 -06:00
Johannes Schindelin
c975e25864 Regex: implement counted quantifiers: {<n>,<m>}
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-04 12:52:02 -06:00
Johannes Schindelin
2d83622975 Implement Field#getGenericType()
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 16:48:40 -06:00
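What Field#getGenericType() provides beyond getType() can be shown with a parameterized field. The class and field below are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

final class GenericFieldDemo {
  static List<String> names;

  // Returns the type argument of the "names" field, which getType() alone
  // cannot provide because it only reports the erased type List.
  static Type elementType() {
    try {
      Field field = GenericFieldDemo.class.getDeclaredField("names");
      ParameterizedType type = (ParameterizedType) field.getGenericType();
      return type.getActualTypeArguments()[0];
    } catch (NoSuchFieldException e) {
      throw new AssertionError(e);
    }
  }
}
```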
Johannes Schindelin
0eb2d55da2 Class#getDeclaredClasses(): exclude inner classes of inner classes
Inner classes can have inner classes, but getDeclaredClasses() is
supposed to list *only* the immediate inner classes.

Example: if class Reflection contains a class Hello that contains
a class World, Reflection.class.getDeclaredClasses() must not
include World in its result.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 16:48:40 -06:00
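The example from 0eb2d55da2 translates directly into code. The outer class is named ReflectionDemo here (an assumption, to avoid implying this is the repository's Reflection test); Hello and World follow the commit message.

```java
final class ReflectionDemo {
  static class Hello {
    static class World {}
  }

  // True if candidate is listed among outer's immediately declared classes.
  static boolean declares(Class<?> outer, Class<?> candidate) {
    for (Class<?> c : outer.getDeclaredClasses()) {
      if (c == candidate) return true;
    }
    return false;
  }
}
```

ReflectionDemo declares Hello, and Hello declares World, but ReflectionDemo does not declare World.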
Johannes Schindelin
fb6486e276 Regex: implement ^,$,\b and \B
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
fe32cce2ad Regex: support intersection/union of character classes
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
b4c768b101 Regex: Test Pattern#split(String)
The particular pattern we use to test it is used in ImgLib2, based on
this answer on stackoverflow:

	http://stackoverflow.com/a/279337

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
8ab10a6953 Regex: support special character classes
This adds support for character classes such as \d or \W, leaving \p{...}
style character classes as an exercise for later.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
098f688cd8 Regex: implement negative look-arounds
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
8b611c8075 Regex: support look-behind patterns
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
62d1964779 Regex: add a method to reverse the PikeVM program
A program for the PikeVM corresponds to a regular expression pattern. The
program matches the character sequence in left-to-right order. However,
for look-behind expressions, we will want to match the character sequence
backwards.

To this end, it is nice that regular expression patterns can be reversed
in a straight-forward manner. However, it would be nice if we could avoid
multiple parsing passes and simply parse even look-behind expressions as
if they were look-ahead ones, and then simply reverse the program for that
part.

Happily, it is not difficult to reverse the program so it is equivalent to
matching the pattern backwards.

There is one catch, though. Imagine matching the sequence "a" against the
regular expression "(a?)a?". If we match forward, the group will match the
letter "a"; when matching backwards, it will match the empty string. So,
the reverse pattern is equivalent to the forward pattern in terms of
"does the pattern match that sequence", but not in terms of its sub-matches. For that
reason, Java simply ignores capturing groups in look-behind patterns (and
for consistency, the same holds for look-ahead patterns).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
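The sub-match catch in 62d1964779 can be approximated with java.util.regex, which is an assumption of this sketch since the commit concerns Avian's PikeVM. Matching "(a?)a?" backwards against the one-character input "a" behaves like matching the mirrored pattern "a?(a?)" forwards.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SubMatchDirectionDemo {
  // Returns what the first capturing group matched for a full match.
  static String firstGroup(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    if (!m.matches()) {
      throw new IllegalArgumentException("no match");
    }
    return m.group(1);
  }
}
```

`firstGroup("(a?)a?", "a")` yields "a", while `firstGroup("a?(a?)", "a")` yields the empty string: both patterns accept the input, but the group captures differ.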
Johannes Schindelin
85af36ef90 Regex: support lookaheads
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
d4a2f58eb5 Regex: implement alternatives
Now we support regular expressions like 'A|B|C'.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
c3a06a600a Regex: implement non-capturing groups
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
53563c4f8e Regex: add support for character classes
Now we support regular expression patterns a la '[0-9]'.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
ca428c406c Regex: implement find()
Now that we have non-greedy repeats, we can implement find(), which
essentially prefixes the regular expression pattern with '.*?'.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
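The equivalence behind ca428c406c can be demonstrated with java.util.regex (an assumption of this sketch; Avian's engine does the prefixing internally). The prefix here uses `(?s:.*?)` so that the dot also crosses line terminators, and the original pattern is wrapped in a group to recover its start offset.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FindAsPrefixDemo {
  // Start of the first match according to Matcher.find().
  static int startViaFind(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? m.start() : -1;
  }

  // Same result, obtained by anchoring at the start and letting a reluctant
  // ".*?" prefix skip as few characters as possible.
  static int startViaPrefix(String regex, String input) {
    Matcher m = Pattern.compile("(?s:.*?)(" + regex + ")").matcher(input);
    return m.lookingAt() ? m.start(1) : -1;
  }
}
```

For the pattern "b+" and the input "aabb", both methods return 2.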
Johannes Schindelin
7da03b0f19 Regex: Implement reluctant '?', '*' and '+'
Now that we have reluctant quantifiers, we can get rid of the hardcoded
program for the challenging regular expression pattern.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:11 -06:00
Johannes Schindelin
f979505b3d Regex: implement * and + operators
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
d753edafcd Regex: support the dot
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
e2105670a0 Regex compiler: fall back to TrivialPattern when possible
While at it, let's get rid of the unescaping in TrivialPattern which was
buggy anyway: special operators such as \b were misinterpreted as trivial
patterns.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
04d8955f98 Regex: Implement compiler for regular expression patterns
Originally, this developer wanted to (ab)use the PikeVM with a
hand-crafted program and an added "callback" opcode to parse the regular
expressions.

However, this turned out to be completely unnecessary: there are no
ambiguities in regular expression patterns, so there is no need to do
anything else than parse the pattern, one character at a time, into a
nested expression that then knows how to write itself into a program for
the PikeVM.

For the moment, we still hardcode the program for the regular expression
pattern demonstrating the challenge with the prioritized threads because
the compiler cannot yet parse reluctant operators.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
26c4bf8d8b Regex: add a class for matching character classes
This will be used to match character classes (such as '[0-9a-f]'),
but it will also be used by the regular expression pattern compiler
to determine whether a character has special meaning in regular
expressions.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
d00f799d2e Regex: special-case a(a*?)(a?)(a??)(a+)(a*)a
Among other challenges, this regular expression is designed to demonstrate
that thread prioritization is finicky: Given the string 'aaaaaa' to match,
the first four threads will try to grab the second 'a', the third thread
(the one that matched the '(a??)' group) having scheduled the same
instruction pointer to the '(a+)' group that the second -- higher-priority
-- thread will try to advance to only after processing the '(a??)' group's
SPLIT. The second thread must override the third thread in that case,
essentially stopping the latter.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
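For reference, the group assignment that a backtracking engine such as java.util.regex produces for this pattern can be computed as below. Treating that as the result Avian's engine aims for is an assumption of this sketch; the helper name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChallengePatternDemo {
  // Returns the five captured groups, or null if the input does not match.
  static String[] groups(String input) {
    Matcher m = Pattern.compile("a(a*?)(a?)(a??)(a+)(a*)a").matcher(input);
    if (!m.matches()) {
      return null;
    }
    String[] result = new String[m.groupCount()];
    for (int i = 0; i < result.length; i++) {
      result[i] = m.group(i + 1);
    }
    return result;
  }
}
```

For 'aaaaaa' the groups are "", "a", "", "aaa" and "": the reluctant groups stay empty, the greedy '(a?)' takes one character, and '(a+)' gives back just enough for the final 'a'.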
Johannes Schindelin
edb48ffec2 Regex: support prioritized threads
If we want to match greedy or reluctant regular expressions, we have
to make sure that certain threads are split off with a higher priority
than others. We will use the ThreadQueues' natural order as priority
order: high to low.

To support splitting into different-priority threads, let's introduce
a second SPLIT opcode: SPLIT_JMP. The latter prefers to jump while the
former prefers to execute the opcode directly after the SPLIT opcode.

There is a subtle challenge here, though: let's assume that there are
two current threads and the higher-priority one wants to jump where
the lower-priority one is already. In the PikeVM implementation
before this change, queueImmediately() would see that there is
already a thread queued for that program counter and *not* queue the
higher-priority one.

Example: when matching the pattern '(a?)(a??)(a?)' against the string
'aa', after the first character, the first (high priority) thread
will have matched the first group while the second thread matched the
second group. In the following step, therefore, the first thread will
want to SPLIT_JMP to match the final 'a' to the third group but the
second thread already queued that program counter.

The proposed solution is to introduce a third thread queue: 'queued'.
When queuing threads to be executed after reading the next character
from the string to match, they are not directly queued into 'next' but
into 'queued'. Every thread requiring immediate execution (i.e. before
reading the next character) will be queued into 'current'. Whenever
'current' is drained, the next thread from 'queued' that has not been
queued to 'current' yet will be executed.

That way, we can guarantee that 1) no lower-priority thread can override
a higher-priority thread and 2) infinite loops are prevented.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
63b06ebde8 Regex: optimize matching characters
Instead of having an opcode 'CHAR', let's have the opcodes that fall
within the range of a char *be* the opcode 'match this character'.

While at it, break the ranges of the different types of opcodes apart
into ranges so that related operations are clustered.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
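The encoding idea in 63b06ebde8 can be sketched with a tiny straight-line interpreter. Everything here is hypothetical: the opcode values, the names, and the absence of branching are simplifications and do not reflect the actual PikeVM. The point is only that an opcode inside the char range needs no separate operand.

```java
final class CharOpcodeSketch {
  // Hypothetical special opcodes, placed just above the range of a char.
  static final int DOT = Character.MAX_VALUE + 1;    // match any character
  static final int ACCEPT = Character.MAX_VALUE + 2; // succeed at end of input

  // Runs a program without branches against the whole input.
  static boolean matches(int[] program, String input) {
    int pos = 0;
    for (int pc = 0; pc < program.length; pc++) {
      int opcode = program[pc];
      if (opcode == ACCEPT) {
        return pos == input.length();
      }
      if (pos >= input.length()) {
        return false;
      }
      if (opcode >= 0 && opcode <= Character.MAX_VALUE) {
        // An opcode within the char range *is* the character to match.
        if (input.charAt(pos) != (char) opcode) {
          return false;
        }
      } else if (opcode != DOT) {
        throw new IllegalArgumentException("unknown opcode " + opcode);
      }
      pos++;
    }
    return false; // ran off the end without reaching ACCEPT
  }
}
```

A program for the pattern 'a.c' is then just `{'a', DOT, 'c', ACCEPT}`, with no dedicated CHAR opcode.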
Johannes Schindelin
b03283033e Add a unit test for the regular expression engine
We still do not parse the regular expression patterns, but we can at
least test that the hardcoded 'a(bb)+a' works as expected.

This class will be extended as we support more and more features.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
2073d4bffb Prepare the Matcher class for multiple groups
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00