I recently encountered a Batik JAR with a method containing a
redundant goto which confused the JIT compiler because it was refered
to in the exception handler and line number tables despite being
unreachable. I don't know how such code was generated, but this
commit ensures the compiler can handle it.
We must not allocate heap objects from doCollect, since it might
trigger a GC while one is already in progress, which can cause trouble
when we're still queuing up objects to finalize, among other things.
To avoid this, I've added extra fields to the finalizer and cleaner
types which we can use to link instances up during GC without
allocating new memory.
We can't blindly try release the monitors for all synchronized methods
when unwinding the stack since we may not have finished acquiring the
most recent one when the exception was thrown.
If we don't preallocate the memory we need to reacquire the lock after
we finish waiting, we risk an OOME which may unwind the stack into
code which assumes we still have acquire the lock successfully.
Instead of giving up when the backing allocator's tryAllocate method
returns null, we switch to the allocate method to show we mean
business. This makes use of zones more robust under low memory
situations since it allows us to exceed the soft memory ceiling when
the only alternative is to abort.
OpenJDK uses an alternative to Object.finalize for resource cleanup in
the form of sun.misc.Cleaner. Normally, OpenJDK's
java.lang.ref.Reference.ReferenceHandler thread handles this, calling
Cleaner.clean on any instances it finds in its "pending" queue.
However, Avian handles reference queuing internally, so it never
actually adds anything to that queue, so the VM must call
Cleaner.clean itself.
The main changes here are:
* fixes for runtime annotation support
* proper support for runtime generic type introspection
* throw NoClassDefFoundErrors instead of ClassNotFoundExceptions
where appropriate
It isn't necessarily safe or desireable to call the previous handler
even if it's non-null, so we ignore it entirely except to reinstate it
when unregistering our own handler.
Big applications can exceed the 16MB limit we previously used.
Increasing this above 30MB (if/when desired) will require changes to
the ARM and PowerPC JIT code to work around immediate branch encoding
limits on those platforms,
This commit ensures that we use the proper memory barriers or locking
necessary to preserve volatile semantics for such fields when accessed
or updated via JNI.
Unlike the interpreter, the JIT compiler tries to resolve all the
symbols referenced by a method when compiling that method. However,
this can backfire if a symbol cannot be resolved: we end up throwing
an e.g. NoClassDefFoundError for code which may never be executed.
This is particularly troublesome for code which supports multiple
APIs, choosing one at runtime.
The solution is to defer to stub code for symbols which can't be
resolved at JIT compile time. Such a stub will try again at runtime
to resolve the needed symbol and throw an appropriate error if it
still can't be found.
We were not always placing parameters in the correct stack positions
in the PowerPC implementations of dynamicCall and vmNativeCall. In
particular, the first stack slot used to hold a parameter depends on
the sizes and types of the preceding parameters which are passed in
registers.
This primarily required additions to classpath-openjdk.cpp to
intercept ZipFile, ZipEntry, and JarFile native methods to consult
embedded encryption policy jars when required.
It is possible to create an Exception with no stack trace by
overriding Throwable.fillInStackTrace, so we can't assume any given
instance will have one.
There was a race between these two functions such that one thread A
would run dispose on thread B just before thread B finishes exit, with
the result that Thread::lock and/or Thread::systemThread would be
disposed twice, resulting in a crash.
Due to encoding limitations, the immediate operand of conditional
branches can be no more than 32KB forward or backward. Since the
JIT-compiled form of some methods can be larger than 32KB, and we also
do conditional jumps to code outside the current method in some cases,
we must work around this limitation.
The strategy of this commit is to provide inline, intermediate jump
tables where necessary. A given conditional branch whose target is
too far for a direct jump will instead point to an unconditional
branch in the nearest jump table which points to the actual target.
Unconditional immediate branches are also limited on PowerPC, but this
limit is 32MB, which is not an impediment in practice. If it does
become a problem, we'll need to encode such branches using multiple
instructions.
The VM uses Integer and Long instances internally to wrap the results
of dynamic method invocations, but Method.invoke should use the
correct, specific type for the primitive (e.g. Character for char).
On PowerPC and ARM, we can't rely on the return address having already
been saved on the stack on entry to a thunk, so we must look for it in
the link register instead.
My previous attempt at this was incomplete; it did not address
Java->native->Java->native call sequences, nor did it address
continuations. This commit takes care of both.
This requires reducing HeapCapacity and CodeCapacity back to 128MB and
30MB respectively. I had set them to larger values to test
non-ProGuard'ed OpenJDK bootimage builds, which naturally needed a lot
more space. However, such builds aren't really useful in the real
world, and the compiler currently can't handle jumps or calls spanning
more than the maximum size of an immediate branch offset on ARM or
PowerPC, so I'm lowering them back down to more realistic values.
We can't rely on the C++ compiler to save the return address in a
known location on entry to each function we might call from Java
(although GCC 4.5 seems to do so consistently, which is why I hadn't
realized the unwinding code was relying on that assumption), so we
must store it explicitly in MyThread::ip in each thunk. For PowerPC
and x86, we continue saving it on the stack as always, since the
calling convention guarantees its location relative to the stack
pointer.
On Mac OS, signals sent using pthread_kill are never delivered if the
target thread is blocked (e.g. acquiring a lock or waiting on a
condition), so we can't rely on it and must use the Mach-specific
thread execution API instead to implement Thread.getStackTrace.
For Thread.interrupt, we must not only use pthread_kill but also
pthread_cond_signal to ensure the thread is woken up.
The stack mapping code was broken for cases of stack slots being
reused to hold primitives or addresses within subroutines after
previously being used to hold object references. We now bitwise "and"
the stack map upon return from the subroutine with the map as it
existed prior to calling the subroutine, which has the effect of
clearing map locations previously marked as GC roots where
appropriate.
It seems that older versions of GCC (4.0 and older, at least) generate
assembly files with duplicate symbols for function templates which
differ only by the attributes of the templated types. Newer versions
have no such problem, but we need to support both, hence the
workaround in this commit of using a dedicated, non-template "alias"
function where we previously used "cast<alias_t>".
We use a template function called "cast" to get raw access to fields
in in the VM. In particular, we use this function in util.cpp to
treat reference fields as intptr_t fields so we can use the least
significant bit as the red/black flag in red/black tree nodes.
Unfortunately, this runs afoul of the type aliasing rules in C/C++,
and the compiler is permitted to optimize in a way that assumes such
aliasing cannot occur. Such optimization caused all the nodes in the
tree to be black, leading to extremely unbalanced trees and thus slow
performance.
The fix in this case is to use the __may_alias__ attribute to tell the
compiler we're doing something devious. I've also used this technique
to avoid other potential aliasing problems. There may be others
lurking, so a complete audit of the VM might be a good idea.
The last two commits were meant to work around a supposed bug in
mingw-w64's dbghelp.h, but closer inspection reminded me that we're
not using dbghelp.h at all; legacy mingw doesn't have it, so we had to
declare the structures we needed ourselves based on the MSDN
documentation. What that documentation doesn't mention is that
MINIDUMP_EXCEPTION_INFORMATION is subject to a special, packed layout,
which we must represent using the __packed__ attribute.
dbghelp.dll expects that MINIDUMP_EXCEPTION_INFORMATION has a packed
layout and will crash if it doesn't (at least on 64-bit systems), but
as of this writing mingw-w64's version is not declared to be so.
Hence this workaround.
Each platform and architecture defines the va_list type differently;
on some we can treat it as a pointer and on others we must treat it as
a non-pointer. Turns out Windows is one of the latter.
If a native method using the fast calling convention throws an
exception, we need to make sure the frame for that method is popped
before handling the exception.
We risked deadlock when waiting for other non-daemon threads to exit
since they could not exit without entering exclusive state, which
required waiting for all other threads to go idle.
If AVIAN_USE_FRAME_POINTER is not defined, the caller of vmInvoke will
calculate a frame size which assumes vmInvoke does not push rbp on the
stack before allocating the frame. However, vmInvoke pushes rbp
reguardless, so we need to adjust the frame size to ensure the stack
remains aligned.
That code was unused and will be unecessary until we add proper
support for unwinding through tail calls in nextFrame, at which point
it may be reinstated in some form.
It is dangerous to initiate a GC from a thunk like divideLong (which
was possible when allocating a new ArithmeticException to signal
divide-by-zero) since we don't currently generate a GC root frame map
for the return address of the thunk call. Instead, we use the backup
heap area if there is room, or else throw a pre-allocated exception
instead.
Like, PowerPC, ARM has an instruction cache which must be manually
flushed if/when we compile a new method. This commit updates
syncInstructionCache to use GCC's builtin __clear_cache routine.
This fixes the tails=true build (at least for x86_64) and eliminates
the need for a frame table in the tails=false build. In the
tails=true build, we still need a frame table on x86(_64) to help
determine whether we've caught a thread executing code to do a tail
call or pop arguments off the stack. However, I've not yet written
the code to actually use this table, and it is only needed to handle
asynchronous unwinds via Thread.getStackTrace.
This is necessary to accomodate classes loaded at runtime which refer
to primitive array types. Otherwise, they won't be included unless
classes in the bootimage refer to them.
When loading a class which extends another class that contained a
field of primitive array type using defineClass in a bootimage=true
build, the VM was unable to find the primitive array class, and
makeArrayClass refused to create one since it should already have
existed.
The problem was that the bootimage=true build uses an empty
Machine::BootstrapClassMap, and resolveArrayClass expected to find the
primitive array classes there. The fix is to check the
Machine::BootLoader map if we can't find it in
Machine::BootstrapClassMap.
Previously, we unwound the stack by following the chain of frame
pointers for normal returns, stack trace creation, and exception
unwinding. On x86, this required reserving EBP/RBP for frame pointer
duties, making it unavailable for general computation and requiring
that it be explicitly saved and restored on entry and exit,
respectively.
On PowerPC, we use an ABI that makes the stack pointer double as a
frame pointer, so it doesn't cost us anything. We've been using the
same convention on ARM, but it doesn't match the native calling
convention, which makes it unusable when we want to call native code
from Java and pass arguments on the stack.
So far, the ARM calling convention mismatch hasn't been an issue
because we've never passed more arguments from Java to native code
than would fit in registers. However, we must now pass an extra
argument (the thread pointer) to e.g. divideLong so it can throw an
exception on divide by zero, which means the last argument must be
passed on the stack. This will clobber the linkage area we've been
using to hold the frame pointer, so we need to stop using it.
One solution would be to use the same convention on ARM as we do on
x86, but this would introduce the same overhead of making a register
unavailable for general use and extra code at method entry and exit.
Instead, this commit removes the need for a frame pointer. Unwinding
involves consulting a map of instruction offsets to frame sizes which
is generated at compile time. This is necessary because stack trace
creation can happen at any time due to Thread.getStackTrace being
called by another thread, and the frame size varies during the
execution of a method.
So far, only x86(_64) is working, and continuations and tail call
optimization are probably broken. More to come.
This rather large commit modifies the VM to use non-local returns to
throw exceptions instead of simply setting Thread::exception and
returning frame-by-frame as it used to. This has several benefits:
* Functions no longer need to check Thread::exception after each call
which might throw an exception (which would be especially tedious
and error-prone now that any function which allocates objects
directly or indirectly might throw an OutOfMemoryError)
* There's no need to audit the code for calls to functions which
previously did not throw exceptions but later do
* Performance should be improved slightly due to both the reduced
need for conditionals and because undwinding now occurs in a single
jump instead of a series of returns
The main disadvantages are:
* Slightly higher overhead for entering and leaving the VM via the
JNI and JDK methods
* Non-local returns can make the code harder to read
* We must be careful to register destructors for stack-allocated
resources with the Thread so they can be called prior to a
non-local return
The non-local return implementation is similar to setjmp/longjmp,
except it uses continuation-passing style to avoid the need for
cooperation from the C/C++ compiler. Native C++ exceptions would have
also been an option, but that would introduce a dependence on
libstdc++, which we're trying to avoid for portability reasons.
Finally, this commit ensures that the VM throws an OutOfMemoryError
instead of aborting when it reaches its memory ceiling. Currently, we
treat the ceiling as a soft limit and temporarily exceed it as
necessary to allow garbage collection and certain internal allocations
to succeed, but refuse to allocate any Java objects until the heap
size drops back below the ceiling.
There is a delay between when we tell the OS to start a thread and
when it actually starts, and during that time a thread might
mistakenly think it was the last to exit, try to shut down the VM, and
then block in joinAll when it finds it wasn't the last one after all.
The solution is to increment Machine::liveCount and add the new thread
to the process tree before starting it -- all while holding
Machine::stateLock for atomicity. This helps guarantee that when
liveCount is one, we can be sure there's really only one thread
running or staged to run.
If we don't do this, the VM will crash when it tries to create a stack
trace for the error because makeObjectArray will return null
immediately when it sees there is a pending exception.
GCC 4.5.1 and later use a naming convention where functions are not
prefixed with an underscore, whereas previous versions added the
underscore. This change was made to ensure compatibility with
Microsoft's compiler. Since GCC 4.5.0 has a serious code generation
bug, we now only support later versions, so it makes sense to assume
the newer convention.
We now check for stack overflow in the JIT build as well as the
interpreted build, throwing a StackOverflowError if the limit
(currently hard-coded to 64KB, but should be easy to make
configurable) is exceeded.
There was an unlikely but dangerous race condition in monitorRelease
such that when a thread released a monitor and then tried to notify
the next thread in line, the latter thread might exit before it can be
notified. This potentially led to a crash as the former thread tried
to acquire and notify the latter thread's private lock after it had
been disposed.
The solution is to do as we do in the interrupt and join cases: call
acquireSystem first and thereby either block the target thread from
exiting until we're done or find that it has already exited, in which
case nothing needs to be done.
I also looked at monitorNotify to see if we have a similar bug there,
but in that case the target thread can't exit without first acquiring
and releasing the monitor, and since we ensure that no thread can
execute monitorNotify without holding the monitor, there's no
potential for a race.
In makeCodeImage, we were passing zero to Promise::Listener::resolve,
which would lead to an assertion error if the address of the code
image was further from the base of the address space (i.e. zero) than
could be spanned by a jump on the target architecture. Since, in this
context, we immediately overwrite the value stored, we may pass
whatever we want to this function (we're only calling it so we can
retrieve the location of the value in the image), and the code image
pointer is a better choice for the above reason.
When trying to create an array class, we try to resolve
java.lang.Object so we can use its vtable in the array class.
However, if Object is missing, we'll try to create and throw a
ClassNotFoundException, which requires creating an array to store the
stack trace, which requires creating an array class, which requires
resolving Object, etc.. This commit short-circuits this process by
telling resolveClass not to create and throw an exception if it can't
find Object.
While doing the above work, I noticed that the implementations of
Classpath::makeThrowable in classpath-avian.cpp and
classpath-openjdk.cpp were identical, so I made makeThrowable a
top-level function.
Finally, I discovered that Thread.setDaemon can only be called before
the target thread has been started, which allowed me to simplify the
code to track daemon threads in the VM.
We have to be careful about how we calculate return addresses on ARM
due to padding introduced by constant pools interspersed with code.
When calculating the offset of code where we're inserting a constant
pool, we want the offset of the end of the pool for jump targets, but
we want the offset just prior to the beginning of the pool (i.e. the
offset of the instruction responsible for jumping past the pool) when
calculating a return address.
The code added to runJavaThread was unecessary and harmful since it
allowed the global daemon thread count to become permanently
out-of-sync with the actual number of daemon threads.
This mainly involves some makefile ugliness to work around bugs in the
native Windows OpenJDK code involving conflicting static and
not-static declarations which GCC 4.0 and later justifiably reject but
MSVC tolerates.