Commit Graph

3950 Commits

Author SHA1 Message Date
Johannes Schindelin
d753edafcd Regex: support the dot
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
e2105670a0 Regex compiler: fall back to TrivialPattern when possible
While at it, let's get rid of the unescaping in TrivialPattern which was
buggy anyway: special operators such as \b were misinterpreted as trivial
patterns.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
04d8955f98 Regex: Implement compiler for regular expression patterns
Originally, this developer wanted to (ab)use the PikeVM with a
hand-crafted program and an added "callback" opcode to parse the regular
expressions.

However, this turned out to be completely unnecessary: there are no
ambiguities in regular expression patterns, so there is no need to do
anything else than parse the pattern, one character at a time, into a
nested expression that then knows how to write itself into a program for
the PikeVM.

For the moment, we still hardcode the program for the regular expression
pattern demonstrating the challenge with the prioritized threads because
the compiler cannot yet parse reluctant operators.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
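The approach described in 04d8955f98 can be sketched roughly as follows: the pattern is read one character at a time into nested expressions, and each expression then writes itself into a program. This is only an illustration, not the compiler from the commit. The names RegexCompilerSketch and Expression and the textual opcodes (CHAR, SPLIT, MATCH) are hypothetical, and only literals, groups and a greedy '?' are handled, without group offsets or error handling.

```java
import java.util.ArrayList;
import java.util.List;

final class RegexCompilerSketch {
    /** A parsed piece of the pattern that knows how to emit its own instructions. */
    interface Expression {
        void writeTo(List<String> program);
    }

    private final String pattern;
    private int pos;

    RegexCompilerSketch(String pattern) {
        this.pattern = pattern;
    }

    /** Parses items until the end of the pattern or a closing parenthesis. */
    Expression parseSequence() {
        List<Expression> items = new ArrayList<>();
        while (pos < pattern.length() && pattern.charAt(pos) != ')') {
            char c = pattern.charAt(pos++);
            Expression item;
            if (c == '(') {
                item = parseSequence(); // nested group
                pos++;                  // skip the ')' (assumed to be present)
            } else {
                item = program -> program.add("CHAR " + c);
            }
            if (pos < pattern.length() && pattern.charAt(pos) == '?') {
                pos++;
                item = optional(item);
            }
            items.add(item);
        }
        return program -> {
            for (Expression e : items) {
                e.writeTo(program);
            }
        };
    }

    /** Wraps an expression so that the program may either run it or skip it. */
    static Expression optional(Expression inner) {
        return program -> {
            int split = program.size();
            program.add(null); // placeholder, patched once the skip target is known
            inner.writeTo(program);
            program.set(split, "SPLIT " + (split + 1) + " " + program.size());
        };
    }

    static List<String> compile(String pattern) {
        List<String> program = new ArrayList<>();
        new RegexCompilerSketch(pattern).parseSequence().writeTo(program);
        program.add("MATCH");
        return program;
    }

    public static void main(String[] args) {
        // Prints one instruction per line for the pattern a(bb)?a.
        compile("a(bb)?a").forEach(System.out::println);
    }
}
```

The placeholder-and-patch step in optional() is one simple way to deal with a jump target that is only known after the nested expression has been written.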
Johannes Schindelin
26c4bf8d8b Regex: add a class for matching character classes
This will be used to match character classes (such as '[0-9a-f]'),
but it will also be used by the regular expression pattern compiler
to determine whether a character has special meaning in regular
expressions.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
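As a rough illustration of the two uses mentioned in 26c4bf8d8b, a character class can be modeled as a set of characters. The class name CharacterClassSketch and its methods are hypothetical and do not reflect the class added in the commit; java.util.BitSet is merely one convenient backing structure.

```java
import java.util.BitSet;

final class CharacterClassSketch {
    private final BitSet members = new BitSet();

    /** Adds a single character to the class. */
    CharacterClassSketch add(char c) {
        members.set(c);
        return this;
    }

    /** Adds an inclusive range such as '0'..'9'. */
    CharacterClassSketch addRange(char from, char to) {
        members.set(from, to + 1); // BitSet ranges exclude the upper bound
        return this;
    }

    boolean matches(char c) {
        return members.get(c);
    }

    public static void main(String[] args) {
        // Use 1: match a class such as [0-9a-f].
        CharacterClassSketch hex = new CharacterClassSketch()
            .addRange('0', '9')
            .addRange('a', 'f');
        System.out.println(hex.matches('c'));

        // Use 2: let a pattern parser ask whether a character is special.
        // The selection of characters here is an assumption for this sketch.
        CharacterClassSketch special = new CharacterClassSketch();
        for (char c : "()[]?*+.\\".toCharArray()) {
            special.add(c);
        }
        System.out.println(special.matches('?'));
    }
}
```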
Johannes Schindelin
d00f799d2e Regex: special-case a(a*?)(a?)(a??)(a+)(a*)a
Among other challenges, this regular expression is designed to demonstrate
that thread prioritization is finicky: Given the string 'aaaaaa' to match,
the first four threads will try to grab the second 'a', the third thread
(the one that matched the '(a??)' group) having scheduled the same
instruction pointer to the '(a+)' group that the second -- higher-priority
-- thread will try to advance to only after processing the '(a??)' group's
SPLIT. The second thread must override the third thread in that case,
essentially stopping the latter.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
edb48ffec2 Regex: support prioritized threads
If we want to match greedy or reluctant regular expressions, we have
to make sure that certain threads are split off with a higher priority
than others. We will use the ThreadQueues' natural order as priority
order: high to low.

To support splitting into different-priority threads, let's introduce
a second SPLIT opcode: SPLIT_JMP. The latter prefers to jump while the
former prefers to execute the opcode directly after the SPLIT opcode.

There is a subtle challenge here, though: let's assume that there are
two current threads and the higher-priority one wants to jump where
the lower-priority one is already. In the PikeVM implementation
before this change, queueImmediately() would see that there is
already a thread queued for that program counter and *not* queue the
higher-priority one.

Example: when matching the pattern '(a?)(a??)(a?)' against the string
'aa', after the first character, the first (high priority) thread
will have matched the first group while the second thread matched the
second group. In the following step, therefore, the first thread will
want to SPLIT_JMP to match the final 'a' to the third group but the
second thread already queued that program counter.

The proposed solution is to introduce a third thread queue: 'queued'.
When queuing threads to be executed after reading the next character
from the string to match, they are not directly queued into 'next' but
into 'queued'. Every thread requiring immediate execution (i.e. before
reading the next character) will be queued into 'current'. Whenever
'current' is drained, the next thread from 'queued' that has not been
queued to 'current' yet will be executed.

That way, we can guarantee that 1) no lower-priority thread can override
a higher-priority thread and 2) infinite loops are prevented.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
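One way to see the outcome that the thread priorities in edb48ffec2 have to reproduce is to run the example pattern through the reference java.util.regex implementation. The sketch below uses only the standard library, not the PikeVM from the commit, and the class name GroupPrioritySketch is made up for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupPrioritySketch {
    /** Returns the captured groups for a full match, or null if there is none. */
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] result = new String[m.groupCount()];
        for (int i = 0; i < result.length; i++) {
            result[i] = m.group(i + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // The greedy first group takes one 'a', the reluctant second group
        // stays empty, and the third group takes the remaining 'a'.
        System.out.println(Arrays.toString(groups("(a?)(a??)(a?)", "aa"))); // prints [a, , a]
    }
}
```

A lower-priority thread that had given the first 'a' to the reluctant second group must therefore not be allowed to win.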
Johannes Schindelin
63b06ebde8 Regex: optimize matching characters
Instead of having an opcode 'CHAR', let's have the opcodes that fall
within the range of a char *be* the opcode 'match this character'.

While at it, split the opcode space into separate ranges, one per type of
opcode, so that related operations are clustered.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
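The idea in 63b06ebde8 can be sketched as follows, assuming the program is an array of ints: any value within the range of a char is itself the instruction "match this character", while other opcodes live outside that range. The negative values chosen below are hypothetical and are not the constants used in the commit.

```java
final class OpcodeRangesSketch {
    // Hypothetical control opcodes, placed outside the range of a char.
    static final int SPLIT = -1;
    static final int SPLIT_JMP = -2;
    static final int JMP = -3;

    /** An opcode within the char range means "match exactly this character". */
    static boolean isLiteral(int opcode) {
        return opcode >= Character.MIN_VALUE && opcode <= Character.MAX_VALUE;
    }

    static String describe(int opcode) {
        return isLiteral(opcode)
            ? "match '" + (char) opcode + "'"
            : "control opcode " + opcode;
    }

    public static void main(String[] args) {
        int[] program = { 'a', SPLIT, 'b', JMP };
        for (int opcode : program) {
            System.out.println(describe(opcode));
        }
    }
}
```

With this layout, matching a literal needs no separate operand: the opcode is compared directly with the current character.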
Johannes Schindelin
b03283033e Add a unit test for the regular expression engine
We still do not parse the regular expression patterns, but we can at
least test that the hardcoded 'a(bb)+a' works as expected.

This class will be extended as we support more and more features.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
2073d4bffb Prepare the Matcher class for multiple groups
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
e6ad10de04 Implement Pattern / Matcher classes based on the PikeVM
Based on the just-implemented PikeVM, let's test it with a specific
regular expression. At this point, no parsing is implemented but instead
an explicit program executing a(bb)?a is hardcoded.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Johannes Schindelin
944f5f3567 Start implementing a regular expression engine
So far, these are humble beginnings indeed. Based on the descriptions of

	http://swtch.com/%7Ersc/regexp/regexp2.html

I started implementing a Thompson NFA / Pike VM.

The idea being that eventually, regular expressions are to be compiled
into special-purpose bytecode for the Pike VM that executes a varying
number of threads in lock-step over each character of the text to match.

The thread count is bounded by the length of the program: two different
threads with identical instruction pointer at the same character-to-match
would yield exactly the same outcome (and therefore, we can execute just
one such thread instead of possibly many).

To allow for matching groups, each thread carries a state with it, saving
the group offsets acquired so far.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
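A minimal sketch of the lock-step execution described in 944f5f3567, assuming a tiny hypothetical instruction set (CHAR, SPLIT, JMP, MATCH) and a hardcoded program for a(bb)?a. It keeps at most one thread per instruction pointer, which is why the thread count is bounded by the program length. Group offsets and thread priorities are left out, so this is not the PikeVM from the commit.

```java
import java.util.Arrays;

final class PikeVmSketch {
    static final int CHAR = 0, SPLIT = 1, JMP = 2, MATCH = 3;

    // Each instruction is {opcode, operand1, operand2}; this encodes a(bb)?a.
    static final int[][] PROGRAM = {
        { CHAR, 'a', 0 },  // 0
        { SPLIT, 2, 4 },   // 1: either try "bb" or skip it
        { CHAR, 'b', 0 },  // 2
        { CHAR, 'b', 0 },  // 3
        { CHAR, 'a', 0 },  // 4
        { MATCH, 0, 0 },   // 5
    };

    /** Returns true if the whole text matches; assumes the program ends in MATCH. */
    static boolean matches(int[][] program, String text) {
        boolean[] current = new boolean[program.length];
        boolean[] next = new boolean[program.length];
        add(program, current, 0);
        for (int i = 0; i <= text.length(); i++) {
            for (int pc = 0; pc < program.length; pc++) {
                if (!current[pc]) {
                    continue;
                }
                int[] insn = program[pc];
                if (insn[0] == MATCH) {
                    if (i == text.length()) {
                        return true;
                    }
                } else if (insn[0] == CHAR && i < text.length()
                        && text.charAt(i) == insn[1]) {
                    add(program, next, pc + 1);
                }
            }
            // All threads have consumed character i; advance in lock-step.
            boolean[] tmp = current;
            current = next;
            next = tmp;
            Arrays.fill(next, false);
        }
        return false;
    }

    /** Schedules a thread at pc, following SPLIT and JMP without consuming input. */
    static void add(int[][] program, boolean[] threads, int pc) {
        if (threads[pc]) {
            return; // a thread with this instruction pointer already exists
        }
        threads[pc] = true;
        switch (program[pc][0]) {
            case JMP:
                add(program, threads, program[pc][1]);
                break;
            case SPLIT:
                add(program, threads, program[pc][1]);
                add(program, threads, program[pc][2]);
                break;
            default:
                break;
        }
    }

    public static void main(String[] args) {
        System.out.println(matches(PROGRAM, "abba")); // prints true
        System.out.println(matches(PROGRAM, "aba"));  // prints false
    }
}
```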
Johannes Schindelin
84829dc390 Refactor Pattern / Matcher classes
This makes both the Pattern and the Matcher class abstract so that more
specialized patterns than the trivial patterns we support so far can be
implemented as convenient subclasses of the respective abstract base
classes.

To ease development, we work on copies in test/regex/ in the 'regex'
package. That way, it can be developed in Eclipse (because it does not
interfere with Oracle JRE's java.util.regex.* classes).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-12-03 12:28:10 -06:00
Joshua Warner
c3e3447c62 Merge pull request #107 from dscho/temp-file
Delete temporary test file afterwards
2013-12-02 19:54:38 -08:00
Mike Jensen
a2e1e1eec9 Merge pull request #102 from dscho/proxy-annotations
This looks good to me, good work.
2013-12-02 09:13:01 -08:00
Johannes Schindelin
0681531dc0 Test complicated annotation constructs
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-27 10:39:28 -06:00
Johannes Schindelin
e1d91f153b Fix parsing of recursive annotations
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-27 10:39:28 -06:00
Johannes Schindelin
6c57bd9174 Verify that Proxy instances have access to annotations' default values
For quick access, the sezpoz library stores lists in
META-INF/annotations/ of classes that have been annotated in a
special way.

To support the use case where the annotations actually changed since
sezpoz stored said lists, sezpoz then creates proxy instances for the
annotations to provide some backwards compatibility: as long as there
are default values for any newly-introduced annotation values,
everything is groovy.

Therefore, let's make sure that proxy instances inherit the
annotations' default values.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-27 10:39:28 -06:00
Johannes Schindelin
0a179355f4 Pass the correct Method instance to the InvocationHandlers
We should pass the method of the original interface to the
InvocationHandler, not the method of the generated proxy class.

That way, proxy instances of annotations will have easy access to
the default values.

This happens to be compatible with the way Oracle Java does it, too.

To accomplish our goal, we keep a global map between proxy classes and
Method references and assign the appropriate list to a field of the
Proxy subclass. This means that we now have to call the super-class
constructor in the generated constructor (which is the correct thing to
do anyway... ;-)).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-27 10:35:48 -06:00
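The behavior that 0a179355f4 aims for can be illustrated with the standard java.lang.reflect API: when the handler receives the Method of the annotation interface itself, Method#getDefaultValue() is directly available. The annotation Greeting and the class AnnotationProxySketch are hypothetical names for this sketch, and the handler is deliberately simplistic (it answers every call with the default value).

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

final class AnnotationProxySketch {
    /** Hypothetical annotation with a default value. */
    @interface Greeting {
        String value() default "hello";
    }

    /** Creates a proxy whose methods answer with the annotation's defaults. */
    static Greeting proxyWithDefaults() {
        // 'method' is the method declared by the Greeting interface, so its
        // default value can be looked up without any extra bookkeeping.
        InvocationHandler handler = (proxy, method, args) -> method.getDefaultValue();
        return (Greeting) Proxy.newProxyInstance(
            Greeting.class.getClassLoader(),
            new Class<?>[] { Greeting.class },
            handler);
    }

    public static void main(String[] args) {
        System.out.println(proxyWithDefaults().value()); // prints hello
    }
}
```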
Johannes Schindelin
7d7aaa003e Delete temporary test file afterwards
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-26 16:05:14 -06:00
Johannes Schindelin
6f5bcb00b9 Use the default value in annotations' proxies
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-26 15:28:20 -06:00
Johannes Schindelin
58ec623d7a Implement Method#getDefaultValue()
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-26 15:28:13 -06:00
Johannes Schindelin
ff323d46a3 AnnotationInvocationHandler: avoid calling Method#getName too often
This is only a cosmetic change, but we should not call getName()
over and over again ;-)

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-26 15:27:48 -06:00
Johannes Schindelin
db0422dcde Proxy: make all methods public
Proxies implement interfaces whose methods *must* be public, as per the
specification of the Java language.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-26 15:27:31 -06:00
Johannes Schindelin
1960081d1a Compile the annotation tests with the annotations in the class path
Earlier, if the annotations were already up-to-date (but
Annotations.class not), the compilation would fail.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-26 15:27:07 -06:00
Joshua Warner
b246804793 Merge pull request #106 from dscho/fix-travis-javadoc
Fix travis javadoc
2013-11-25 14:03:03 -08:00
Johannes Schindelin
d61f9ec1eb Fix Travis' automatic javadoc update
The script used the wrong directory to copy the JavaDocs from.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-24 11:15:11 -06:00
Johannes Schindelin
fbf169bacf Use subdirectory in build/ for temporary gh-pages checkout
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-24 11:14:26 -06:00
Johannes Schindelin
e5caed555f Do not use 'echo -e' unnecessarily
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-24 11:13:33 -06:00
Joshua Warner
04112ba7e4 Merge pull request #104 from l1m5/master
Automatically generate and commit Javadoc.
2013-11-22 17:03:41 -08:00
Ben Limmer
39214d860f Automatically generate and commit Javadoc.
This will only generate on pushes to master, and not pull requests.
2013-11-20 17:02:33 -07:00
Joshua Warner
2b68815636 Merge pull request #103 from dscho/data-input-stream
Data input stream
2013-11-12 10:46:10 -08:00
Joshua Warner
80d49dada7 Merge pull request #101 from getlantern/lantern
Fixed problem picking up policy jars in OpenJDK embedded build
2013-11-11 08:34:46 -08:00
Johannes Schindelin
209f2a3aff Fix DataOutputStream#writeUTF
This developer did not read the specs closely enough and missed that
the length of the byte array needs to be written out first, so that
DataInputStream#readUTF has a chance of reading the string back.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-08 17:42:14 -06:00
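The length prefix mentioned in 209f2a3aff can be observed with the standard java.io classes: writeUTF first writes the number of encoded bytes as a two-byte value, and readUTF relies on it to read the string back. The class name WriteUtfSketch is made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class WriteUtfSketch {
    static byte[] encode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static String decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "h\u00e9llo" encodes to 6 bytes because the accented letter takes two.
        byte[] data = encode("h\u00e9llo");
        int length = ((data[0] & 0xff) << 8) | (data[1] & 0xff);
        System.out.println(length);       // prints 6
        System.out.println(data.length);  // prints 8 (2-byte prefix + 6 bytes)
        System.out.println(decode(data).equals("h\u00e9llo")); // prints true
    }
}
```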
Johannes Schindelin
759a08bb54 Implement DataInputStream
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-08 17:42:14 -06:00
Johannes Schindelin
b4e1ee97eb Avoid committing temporary vi files
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-08 15:20:05 -06:00
Joshua Warner
c3bbe555be make Sockets test Java6-compilable, make it more generic, and move it to 'extra'
2013-11-08 10:05:53 -07:00
Ilya Mizus
45ee25f68c Implement socket API
2013-11-08 09:55:43 -07:00
Joshua Warner
2800ffe826 rename JNIEXPORT to AVIAN_EXPORT in common.h, to avoid conflicting with jni.h
2013-11-08 08:35:17 -07:00
Ox To A Cart
cfe041c7ac Fixed problem picking up policy jars in OpenJDK embedded build
2013-11-08 09:11:46 -06:00
Joshua Warner
fd81e126ef fix Dates test for openjdk and stub out java.util.TimeZone
2013-11-07 20:44:02 -07:00
Joshua Warner
3c1afdd272 make jni.h and avian/machine.h non-interfering
2013-11-07 19:15:31 -07:00
Joshua Warner
76b0bb4872 remove non-conforming ZipEntry.getJavaTime API and associated tests (which failed the openjdk build)
2013-11-07 19:13:13 -07:00
Joshua Warner
a54be0a381 fix openjdk build (add InnerClassReference to classpath-sources)
2013-11-07 18:52:11 -07:00
Joshua Warner
341929f92e Attribute <anonymous@example.com> to Anonymous, instead of Joel Dice
2013-11-06 19:39:05 -07:00
Joshua Warner
04ef32fa9c Merge pull request #100 from dscho/mailmap
Add a mailmap
2013-11-06 16:41:42 -08:00
Johannes Schindelin
24134e7004 Add a mailmap
... for proper statistics (I thought I was #5 contributor at the
time I started the mailmap, but I was only #6).

Unfortunately, I could not find the full name of Stan
<goo.in.my.shoes@gmail.com> for proper credit.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-11-06 15:03:15 -06:00
Joshua Warner
dd460ab55e Merge pull request #99 from dscho/fix-get-annotation
Fix NPE in Field#getAnnotation
2013-11-06 09:03:45 -08:00
Joshua Warner
42651da0b2 Merge pull request #96 from dscho/filter-input-stream
Filter input stream
2013-11-06 09:02:57 -08:00
Joshua Warner
4cf3d9de88 Merge pull request #95 from dscho/compatible-serialization
Java-compatible (de)serialization of TreeMap, ArrayList and Number
2013-11-06 09:02:12 -08:00
Joshua Warner
d0d4f600dc Merge pull request #94 from dscho/serialization
Implement Java-compatible serialization
2013-11-06 08:49:14 -08:00