Commit Graph

3622 Commits

Author SHA1 Message Date
Johannes Schindelin
edb48ffec2 Regex: support prioritized threads
If we want to match greedy or reluctant regular expressions, we have
to make sure that certain threads are split off with a higher priority
than others. We will use the ThreadQueues' natural order as priority
order: high to low.

To support splitting into different-priority threads, let's introduce
a second SPLIT opcode: SPLIT_JMP. The latter prefers to jump while the
former prefers to execute the opcode directly after the SPLIT opcode.

There is a subtle challenge here, though: let's assume that there are
two current threads and the higher-priority one wants to jump where
the lower-priority one is already. In the PikeVM implementation
before this change, queueImmediately() would see that there is
already a thread queued for that program counter and *not* queue the
higher-priority one.

Example: when matching the pattern '(a?)(a??)(a?)' against the string
'aa', after the first character, the first (high priority) thread
will have matched the first group while the second thread matched the
second group. In the following step, therefore, the first thread will
want to SPLIT_JMP to match the final 'a' to the third group but the
second thread already queued that program counter.

The proposed solution is to introduce a third thread queue: 'queued'.
When queuing threads to be executed after reading the next character
from the string to match, they are not directly queued into 'next' but
into 'queued'. Every thread requiring immediate execution (i.e. before
reading the next character) will be queued into 'current'. Whenever
'current' is drained, the next thread from 'queued' that has not been
queued to 'current' yet will be executed.

That way, we can guarantee that 1) no lower-priority thread can override
a higher-priority thread and 2) infinite loop are prevented.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
63b06ebde8 Regex: optimize matching characters
Instead of having an opcode 'CHAR', let's have the opcodes that fall
within the range of a char *be* the opcode 'match this character'.

While at it, break the ranges of the different types of opcodes apart
into ranges so that related operations are clustered.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
b03283033e Add a unit test for the regular expression engine
We still do not parse the regular expression patterns, but we can at
least test that the hardcoded 'a(bb)+a' works as expected.

This class will be extended as we support more and more features.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
2073d4bffb Prepare the Matcher class for multiple groups
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
e6ad10de04 Implement Pattern / Matcher classes based on the PikeVM
Based on the just-implemented PikeVM, let's test it with a specific
regular expression. At this point, no parsing is implemented but instead
an explicit program executing a(bb)?a is hardcoded.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
944f5f3567 Start implementing a regular expression engine
So far, these are humble beginnings indeed. Based on the descriptions of

	http://swtch.com/%7Ersc/regexp/regexp2.html

I started implementing a Thompson NFA / Pike VM.

The idea being that eventually, regular expressions are to be compiled
into special-purpose bytecode for the Pike VM that executes a varying
number of threads in lock-step over each character of the text to match.

The thread count is bounded by the length of the program: two different
threads with identical instruction pointer at the same character-to-match
would yield exactly the same outcome (and therefore, we can execute just
one such thread instead of possibly many).

To allow for matching groups, each thread carries a state with it, saving
the group offsets acquired so far.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
84829dc390 Refactor Pattern / Matcher classes
This makes both the Pattern and the Matcher class abstract so that more
specialized patterns than the trivial patterns we support so far can be
implemented as convenient subclasses of the respective abstract base
classes.

To ease development, we work on copies in test/regex/ in the 'regex'
package. That way, it can be developed in Eclipse (because it does not
interfere with Oracle JRE's java.util.regex.* classes).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
b4e1ee97eb Avoid committing temporary vi files
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-08 15:20:05 -06:00
Joshua Warner
c3bbe555be make Sockets test Java6-compilable, make it more generic, and move it to 'extra' 2013-11-08 10:05:53 -07:00
Ilya Mizus
45ee25f68c Implement socket API 2013-11-08 09:55:43 -07:00
Joshua Warner
2800ffe826 rename JNIEXPORT to AVIAN_EXPORT in common.h, to avoid conflicting with jni.h 2013-11-08 08:35:17 -07:00
Joshua Warner
fd81e126ef fix Dates test for openjdk and stub out java.util.TimeZone 2013-11-07 20:44:02 -07:00
Joshua Warner
3c1afdd272 make jni.h and avian/machine.h non-interfering 2013-11-07 19:15:31 -07:00
Joshua Warner
76b0bb4872 remove non-conforming ZipEntry.getJavaTime API and associated tests (which failed the openjdk build) 2013-11-07 19:13:13 -07:00
Joshua Warner
a54be0a381 fix openjdk build (add InnerClassReference to classpath-sources) 2013-11-07 18:52:11 -07:00
Joshua Warner
341929f92e Attribute <anonymous@example.com> to Anonymous, instead of Joel Dice 2013-11-06 19:39:05 -07:00
Joshua Warner
04ef32fa9c Merge pull request #100 from dscho/mailmap
Add a mailmap
2013-11-06 16:41:42 -08:00
Johannes Schindelin
24134e7004 Add a mailmap
... for proper statistics (I thought I was #5 contributor at the
time I started the mailmap, but I was only #6).

Unfortunately, I could not find the full name of Stan
<goo.in.my.shoes@gmail.com> for proper credit.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 15:03:15 -06:00
Joshua Warner
dd460ab55e Merge pull request #99 from dscho/fix-get-annotation
Fix NPE in Field#getAnnotation
2013-11-06 09:03:45 -08:00
Joshua Warner
42651da0b2 Merge pull request #96 from dscho/filter-input-stream
Filter input stream
2013-11-06 09:02:57 -08:00
Joshua Warner
4cf3d9de88 Merge pull request #95 from dscho/compatible-serialization
Java-compatible (de)serialization of TreeMap, ArrayList and Number
2013-11-06 09:02:12 -08:00
Joshua Warner
d0d4f600dc Merge pull request #94 from dscho/serialization
Implement Java-compatible serialization
2013-11-06 08:49:14 -08:00
Johannes Schindelin
ff50034206 Fix NPE in Field#getAnnotation
When the class whose field is to be inspected has no annotations at all,
at least my javac here (1.6.0_51 on MacOSX) does not produce any class
addendum.

Therefore, let's verify that the addendum is not null before proceeding.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:46:56 -06:00
Joshua Warner
513f02ebd3 Merge pull request #93 from dscho/misc
Miscellaneous enhancements and a fix
2013-11-06 07:35:06 -08:00
Johannes Schindelin
dddd9e5016 Serialize test: augment the hexdump with address and ASCII dump
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:12:18 -06:00
Johannes Schindelin
6ea017eb86 Mark java.lang.Number as serializable
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:12:18 -06:00
Johannes Schindelin
2a9ab48137 Make ArrayList's serialization compatible with OpenJDK's
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:12:18 -06:00
Johannes Schindelin
7e72f4362b Add a test to ensure TreeMap's (de)serialization compatibility with OpenJDK
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:12:18 -06:00
Johannes Schindelin
a90b3ae574 Make TreeMap (de)serialization compatible with Java
This is done by implementing the readObject()/writeObject() method
pair as demanded by the serialization specification. The specifics
were reverse-engineered from serializing trivial TreeMap instances
with OpenJDK.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:12:18 -06:00
Johannes Schindelin
afe09e32de Add a 'comparator' field to TreeMap
This will be needed for Java-compatible serialization of tree maps.

Note that the field should be null when the TreeMap uses the default
comparator.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:12:17 -06:00
Johannes Schindelin
884d0979a9 ObjectInputStream: handle super class descriptors correctly
We punted previously on any serializable super class' descriptor and
simply expected the super class not to be serializable (and consequently,
we expected the respective descriptor to be null). However, for quite
common classes, e.g. OpenJDK's Double class, this is not true.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:10:51 -06:00
Johannes Schindelin
ff45f452da ObjectInputStream: handle TC_REFERENCE
There are serialized objects out in the wild which make heavy use of
TC_REFERENCE: for example when an object has a reference to itself.

Therefore we need to support that, too.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:10:51 -06:00
Johannes Schindelin
b6d3caf458 ObjectInputStream: refactor class desc parsing
We punted previously on any serializable super class' descriptor and
simply expected the super class not to be serializable (and consequently,
we expected the respective descriptor to be null).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:10:51 -06:00
Johannes Schindelin
25ed2965e7 ObjectInputStream: use private readObject() methods
The specification of the Java deserialization demands that a private
readObject(ObjectOutputStream) method is used -- if it exists. In
that case, ObjectInputStream must not initialize the contents of the
fields (called 'classdata[]' in the documentation) but offer that
functionality via the defaultReadObject() method.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:10:51 -06:00
Johannes Schindelin
931617a787 ObjectOutputStream: use private writeObject() methods
The specification of the Java serialization demands that a private
writeObject(ObjectOutputStream) method is used -- if it exists. In that
case, ObjectOutputStream must not write the contents of the fields
(called 'classdata[]' in the documentation) but offer that via the
defaultWriteObject() method.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:10:51 -06:00
Johannes Schindelin
f3189bc79d ObjectOutputStream: optimize String serialization
The serialization protocol specifies a quick method to serialize
a String (because that is so common an operation): TC_STRING +
(short)length + bytes. Let's use that, also to make it easier to test
the upcoming changes to TreeMap harmonizing that Avian's serialization
of said class with OpenJDK's.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:10:51 -06:00
Johannes Schindelin
bba0d25ba5 ObjectInputStream: handle fields of type String
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:10:51 -06:00
Johannes Schindelin
48e0912ad4 Test the new, Java-compatible (de)serialization
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:10:51 -06:00
Johannes Schindelin
c78923d717 ObjectInputStream: add rudimentary support for objects
This is by no means a complete support for the deserialization compliant
to the Java Language Specification, but it is better to add the support
incrementally, for better readability of the commits.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:10:51 -06:00
Johannes Schindelin
4b8285e597 Implement a rudimentary Java-compatible ObjectInputStream
The Java Language Specification documents the serialization protocol
implemented by this change set:

http://docs.oracle.com/javase/7/docs/platform/serialization/spec/protocol.html#10258

This change is intended to make it easier to use Avian VM as a drop-in
replacement for the Oracle JVM when serializing objects.

The previous serialization code is still available as
avian.LegacyObjectInputStream.

This commit only implements the non-object parts of the deserialization
specification.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:10:51 -06:00
Johannes Schindelin
35ecf5025c Make ObjectOutputStream's constants available to java.io
We will reuse the constants in the upcoming deserialization counterpart.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:10:50 -06:00
Johannes Schindelin
3dccd68fe7 Implement the Field#set<PrimitiveType> method family
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:10:50 -06:00
Johannes Schindelin
c2a6f4a726 Implement a Java-compatible ObjectOutputStream
The Java Language Specification documents the serialization protocol
implemented by this change set:

http://docs.oracle.com/javase/7/docs/platform/serialization/spec/protocol.html#10258

This change is intended to make it easier to use Avian VM as a drop-in
replacement for the Oracle JVM when serializing objects.

The previous serialization code is still available as
avian.LegacyObjectOutputStream.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:10:50 -06:00
Johannes Schindelin
f2dd4add26 Implement FilterReader
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:10:50 -06:00
Johannes Schindelin
2904dd738e Fix java.lang.reflect.Field.getLong()
The bug was that the long was cast to an int, cutting off the most
significant bytes.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:10:50 -06:00
Johannes Schindelin
6a7c03aef9 Implement the Math#signum method
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:07:58 -06:00
Johannes Schindelin
efb3ef9b51 Initialize the context class loader to the app class loader
Previously, we initialized it to the boot class loader, but that is
inconsistent with Java; if compiling against OpenJDK's class library,
the context class loader is therefore initialized to the app class
loader, too.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:07:58 -06:00
Johannes Schindelin
f8028c9864 Add a dummy implementation of EmptyStackException
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:07:58 -06:00
Johannes Schindelin
6159f5cd3c Support Logger#log(Level,String,Object)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:07:58 -06:00
Johannes Schindelin
874bf4ef4c Initialize Logger instances to the 'INFO' level by default
... otherwise, logging would throw an exception when trying to
determine whether the current level allows the message to be logged.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 09:07:58 -06:00