Since we use Thread::code to store a reference to either the method to
be invoked or the current bytecode being executed depending on the
context, we must be careful to switch it back to the bytecode of the
exception handler if an exception is thrown while invoking a method
(e.g. an UnsatisfiedLinkError).

There was a subtle bug in the bootimage generator's field offset
calculation: when the generator visited the superclass(es) before a
derived class, we did not account for the alignment padding of fields
defined in those superclasses when computing the derived class's
field offsets.
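
To make the layout rule concrete, here is a minimal sketch, not the bootimage generator's code. `FieldSpec`, `alignUp`, and `layoutFields` are hypothetical names, and the sketch assumes each field's alignment equals its size. The point is that a derived class must continue from the superclass's padded end, not from the sum of the superclass's field sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified field model: alignment == size (1, 2, 4 or 8).
struct FieldSpec {
  std::size_t size;
};

// Round `offset` up to the next multiple of `alignment` (a power of two).
std::size_t alignUp(std::size_t offset, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Lay out `fields` starting at `start`, which must be the superclass's
// fixed size *including* any padding its own layout introduced. Returns the
// offset of each field and stores the resulting fixed size in *endOut.
std::vector<std::size_t> layoutFields(const std::vector<FieldSpec>& fields,
                                      std::size_t start, std::size_t* endOut) {
  std::vector<std::size_t> offsets;
  std::size_t offset = start;
  for (const FieldSpec& f : fields) {
    offset = alignUp(offset, f.size);  // insert padding before the field
    offsets.push_back(offset);
    offset += f.size;
  }
  *endOut = offset;
  return offsets;
}
```

For example, with an assumed 8-byte object header, a superclass with an `int` and a `long` places them at offsets 8 and 16 and ends at 24; summing the field sizes instead would give 20 and place a derived class's first field on top of the `long`.
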
Floats are implicitly promoted to doubles when passed as part of a
variable-length argument list, so we can't treat them the same way as
32-bit integers.
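
The rule in question is C's default argument promotion. This hypothetical `sumFloats` (not Avian code) shows the consequence for any code that reads a variadic list: a `float` passed by the caller must be fetched as a `double`.

```cpp
#include <cassert>
#include <cstdarg>

// Sum `count` variadic arguments that the caller passed as floats.
// Default argument promotion turns each float into a double, so each one
// must be read with va_arg(ap, double). va_arg(ap, float) is undefined
// behavior, and reading the value as a 32-bit integer would fetch the wrong
// bits (and, on many ABIs, the wrong register or stack slot).
double sumFloats(int count, ...) {
  va_list ap;
  va_start(ap, count);
  double total = 0;
  for (int i = 0; i < count; ++i) {
    total += va_arg(ap, double);  // not float, not a 32-bit int
  }
  va_end(ap);
  return total;
}
```
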
Apple's linker tends to remove functions which are never called, which
is not what we want for e.g. vmPrintTrace, since that function is only
intended to be called interactively from within GDB.
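
One common way to keep such a function around is the GCC/Clang `used` attribute, sketched below with a hypothetical `debugHook` standing in for vmPrintTrace. Whether `used` alone stops Apple's linker from dead-stripping a symbol is an assumption here (Clang is understood to mark `used` symbols as no-dead-strip on Darwin); check it against your toolchain rather than reading this as the change this commit made.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical debugging hook that nothing in the program calls but that we
// want to invoke from GDB. `used` tells the compiler to emit it even though
// it sees no references; `noinline` keeps a real, callable function body.
extern "C" __attribute__((used, noinline)) int debugHook(int depth) {
  std::fprintf(stderr, "debugHook called with depth %d\n", depth);
  return depth;
}
```

From GDB, the function can then be invoked interactively with something like `call debugHook(0)`.
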

My previous attempt wasn't quite sufficient: because the code was
written to dispose of system handles aggressively as soon as a thread
exited, it was already too late to call join on a thread that had
exited. The solution is to delay disposing of these handles until
after we've joined the thread.

The bug here is that when a thread exits and becomes a "zombie", the
OS resources associated with it are not necessarily released until we
actually join and dispose of that thread. Since that only happens
during garbage collection, and collection normally only happens in
response to heap memory pressure, there's no guarantee that we'll GC
frequently enough to clean up zombies promptly and avoid running out
of resources.
The solution is to force a GC whenever we start a new thread and there
are at least N zombies waiting to be disposed, where N=16 for now.
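
The policy can be sketched as a small counter. `ZombiePolicy` and its methods are hypothetical. In particular, this sketch assumes the collector reports back once it has joined and disposed of every zombie, so that the count can be reset.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Counts exited-but-unjoined ("zombie") threads and tells the thread-start
// path when to force a GC, so zombie OS resources get reclaimed even when
// heap pressure alone would not trigger a collection.
class ZombiePolicy {
 public:
  static constexpr std::size_t kZombieCollectThreshold = 16;  // N=16 for now

  // A thread has exited but has not yet been joined and disposed.
  void onThreadExit() {
    std::lock_guard<std::mutex> lock(mutex_);
    ++zombies_;
  }

  // Called before starting a new thread; true means "force a GC first".
  bool shouldCollectBeforeStart() {
    std::lock_guard<std::mutex> lock(mutex_);
    return zombies_ >= kZombieCollectThreshold;
  }

  // Called after the collector has joined and disposed of the zombies.
  void onZombiesDisposed() {
    std::lock_guard<std::mutex> lock(mutex_);
    zombies_ = 0;
  }

 private:
  std::mutex mutex_;
  std::size_t zombies_ = 0;
};
```
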

We don't define atomicCompareAndSwap64 for ARM or PowerPC, and
apparently only very recent ARM chips support a 64-bit
compare-and-swap, so we must fall back to synchronization-based
emulation on those platforms.
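
A minimal sketch of such an emulation, assuming a single global lock is acceptable. `emulatedCompareAndSwap64` is a hypothetical name, not the VM's actual function or signature.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

namespace {
// One process-wide lock guarding every emulated 64-bit CAS.
std::mutex casMutex;
}  // namespace

// Atomically (with respect to other callers of this function) replace *p
// with newValue if it currently equals oldValue. Returns true on success.
bool emulatedCompareAndSwap64(std::uint64_t* p, std::uint64_t oldValue,
                              std::uint64_t newValue) {
  std::lock_guard<std::mutex> lock(casMutex);
  if (*p == oldValue) {
    *p = newValue;
    return true;
  }
  return false;
}
```

The design trades scalability for simplicity: every emulated CAS in the process serializes on one lock, and the scheme is only sound if every other access that can race with it on the same 64-bit words also takes that lock.
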

There were a couple of problems with the Avian_sun_misc_Unsafe_park
implementation in classpath-openjdk.cpp. First, the wait time should
be interpreted as a deadline in milliseconds since the epoch if
absolute, but as a duration in nanoseconds otherwise, whereas we were
treating it as milliseconds in both cases. Second, there was no
mechanism to exit the while loop once the specified time had elapsed;
the only way to exit was via an unpark or an interrupt.
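
The intended semantics follow `sun.misc.Unsafe.park(boolean isAbsolute, long time)`: an absolute `time` is a deadline in milliseconds since the epoch, a relative `time` is a duration in nanoseconds, and a relative `time` of zero means wait with no timeout. Below is a minimal sketch of both fixes using the C++ standard library. `ParkState`, `park`, and `unpark` are hypothetical, and interrupts are not modeled.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical per-thread parking state.
struct ParkState {
  std::mutex mutex;
  std::condition_variable cond;
  bool permit = false;  // set by unpark()
};

// Blocks until a permit is available or the timeout expires. Returns true if
// a permit was consumed, false on timeout.
//   isAbsolute == true:  `time` is a deadline in milliseconds since the epoch.
//   isAbsolute == false: `time` is a duration in nanoseconds; 0 means no
//                        timeout, and a negative value does not wait.
bool park(ParkState& s, bool isAbsolute, std::int64_t time) {
  using namespace std::chrono;
  std::unique_lock<std::mutex> lock(s.mutex);
  auto hasPermit = [&s] { return s.permit; };

  bool woken;
  if (!isAbsolute && time == 0) {
    s.cond.wait(lock, hasPermit);
    woken = true;
  } else if (isAbsolute) {
    system_clock::time_point deadline{milliseconds(time)};
    // The predicate overload loops over spurious wakeups and gives up once
    // the deadline has passed.
    woken = s.cond.wait_until(lock, deadline, hasPermit);
  } else {
    woken = s.cond.wait_for(lock, nanoseconds(time), hasPermit);
  }
  if (woken) {
    s.permit = false;  // consume the permit
  }
  return woken;
}

void unpark(ParkState& s) {
  {
    std::lock_guard<std::mutex> lock(s.mutex);
    s.permit = true;
  }
  s.cond.notify_one();
}
```

Using the predicate overloads of `wait_until` and `wait_for` both absorbs spurious wakeups and returns `false` once the time is up, which supplies the missing exit from the loop.
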

There was a subtle race condition in the VM shutdown process such that
a System::Thread could be disposed after the System instance it was
created under had already been disposed, in which case making a
virtual call to System::free on that instance could cause a crash.
The solution is to call the C library's free directly, since that's
all System::free does.
On Ubuntu 11.10, the optimized build was breaking, apparently because
it was eliminating most of the symbols defined in assembly code
(e.g. vmJump) as unreachable when linking libjvm.so, which left
avian-dynamic unlinkable due to an unresolved symbol.
The solution in this commit is to export makeSystem and makeFinder
from libjvm.so rather than build redundant versions of finder.cpp and
posix.cpp/windows.cpp into avian-dynamic like we've been doing. This
avoids the whole problem of vmJump reachability and reduces the size
of avian-dynamic at the same time.
This commit also turns off LTO for the avian-dynamic link since we get
odd undefined symbol errors about libc-defined symbols otherwise.
This may merit future investigation, but avian-dynamic is so small and
simple that there's no need to optimize it anyway.

Until now, the bootimage build hasn't supported using the Java
invocation API to create a VM, destroy it, and create another in the
same process. Ideally, we would be able to create multiple VMs
simultaneously without any interference between them. In fact, Avian
is designed to support this for the most part, but there are a few
places where we use global, mutable state that prevents this from
working.
Most notably, the bootimage is modified in-place at runtime, so the
best we can do without extensive changes is to clean up the bootimage
when the VM is destroyed so it's ready for later instances. Hence
this commit.
Ultimately, we can move towards a fully reentrant VM by making the
bootimage immutable, but this will require some care to avoid
performance regressions. Another challenge is our Posix signal
handlers, which currently rely on a global handle to the VM, since you
can't, to my knowledge, pass a context pointer when registering a
signal handler. Thread-local variables won't necessarily help, since
a thread might attach to more than one VM at a time.
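
To illustrate the signal handler constraint: a POSIX `SA_SIGINFO` handler receives only the signal number, a `siginfo_t`, and the interrupted context, so `sigaction` offers no slot for a caller-supplied pointer and the handler has to reach the VM through a global. `Vm`, `globalVm`, `handleSignal`, and `installHandler` are hypothetical stand-ins, not Avian's handlers.

```cpp
#include <cassert>
#include <csignal>
#include <signal.h>

// Hypothetical stand-in for a VM instance.
struct Vm {
  volatile sig_atomic_t signalsSeen = 0;
};

// The process-wide handle the handler must rely on.
static Vm* volatile globalVm = nullptr;

// The handler's signature has no user-data parameter.
static void handleSignal(int, siginfo_t*, void*) {
  if (Vm* vm = globalVm) {
    vm->signalsSeen = vm->signalsSeen + 1;
  }
}

bool installHandler(Vm* vm, int signo) {
  globalVm = vm;  // only one VM can be the target at a time
  struct sigaction sa = {};
  sa.sa_sigaction = handleSignal;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  return sigaction(signo, &sa, nullptr) == 0;
}
```

Since `globalVm` names exactly one VM, a second VM in the same process would need some other way to route signals to the right instance.
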

When the fourth argument is a 64-bit value on the Apple ARM ABI, it is
passed half in a register and half on the stack, unlike on Linux,
where it is passed entirely on the stack. The logic to handle this in
arm.h was flawed, and this commit fixes it.
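
The difference can be modeled with a little register bookkeeping. This is a simplified sketch, not the logic in arm.h: it handles only 32-bit and 64-bit integer arguments, follows the AAPCS rules for the Linux case (a 64-bit value starts at an even register, and once an argument goes to the stack no later argument uses a register), and, for the Apple case, assumes there is no even-register rule, which is what allows the split described above. `Placement`, `Abi`, and `placeArguments` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Where one argument ends up: regWords of its 32-bit words in core registers
// starting at firstReg, and stackWords on the stack starting at stackWord.
// -1 means "no part of the argument goes there".
struct Placement {
  int firstReg;
  int regWords;
  int stackWord;
  int stackWords;
};

enum class Abi { LinuxEabi, AppleIos };

// Each argument is 1 (32-bit) or 2 (64-bit) words, passed in r0-r3 first and
// then on the stack. Floating point and composite types are left out.
std::vector<Placement> placeArguments(const std::vector<int>& sizeInWords,
                                      Abi abi) {
  const int kArgRegs = 4;
  int nextReg = 0;    // next free core register (NCRN in AAPCS terms)
  int nextStack = 0;  // next free stack word
  std::vector<Placement> result;
  for (int words : sizeInWords) {
    assert(words == 1 || words == 2);
    Placement p{-1, 0, -1, 0};
    bool evenAligned = (words == 2 && abi == Abi::LinuxEabi);
    if (evenAligned && nextReg % 2 != 0) {
      ++nextReg;  // AAPCS: a 64-bit value starts at r0 or r2
    }
    int regsLeft = nextReg < kArgRegs ? kArgRegs - nextReg : 0;
    if (words <= regsLeft) {
      // Fits entirely in registers.
      p.firstReg = nextReg;
      p.regWords = words;
      nextReg += words;
    } else if (regsLeft > 0 && nextStack == 0) {
      // Split between the last register and the stack. Given the
      // even-register rule above, only the AppleIos model reaches this.
      p.firstReg = nextReg;
      p.regWords = regsLeft;
      p.stackWord = 0;
      p.stackWords = words - regsLeft;
      nextReg = kArgRegs;
      nextStack = p.stackWords;
    } else {
      // Entirely on the stack; no later argument goes in a register.
      nextReg = kArgRegs;
      if (evenAligned && nextStack % 2 != 0) {
        ++nextStack;  // AAPCS: 64-bit stack slots are 8-byte aligned
      }
      p.stackWord = nextStack;
      p.stackWords = words;
      nextStack += words;
    }
    result.push_back(p);
  }
  return result;
}
```

For three `int`s followed by a `long long`, the Apple model puts half of the 64-bit value in r3 and the other half in the first stack slot, while the Linux model leaves r3 unused and puts the whole value on the stack.
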
This reverts commit 88d614eb25.
It turns out we still need separate sets of thunks for AOT-compiled
and JIT-compiled code to ensure we can always generate efficient jumps
and calls to thunks on architectures such as ARM and PowerPC, whose
relative jumps and calls have limited ranges.
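
For a sense of the limits involved, here is a hedged sketch of a reachability check for a single direct branch. It assumes ARM (A32, not Thumb) B/BL, whose signed 24-bit word offset is relative to the branch address plus 8, and PowerPC b/bl, whose signed 24-bit word offset is relative to the branch itself; both give a range of roughly ±32 MiB. `Arch` and `branchInRange` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class Arch { Arm, PowerPC };

// Can a single relative branch at address `from` reach address `to`?
bool branchInRange(Arch arch, std::uint64_t from, std::uint64_t to) {
  // ARM offsets are measured from PC+8; PowerPC offsets from the branch.
  std::int64_t pcBias = (arch == Arch::Arm) ? 8 : 0;
  std::int64_t offset = static_cast<std::int64_t>(to) -
                        (static_cast<std::int64_t>(from) + pcBias);
  if (offset % 4 != 0) return false;  // targets must be word-aligned
  // Signed 24-bit word offset == signed 26-bit byte offset.
  const std::int64_t kMin = -(std::int64_t{1} << 25);
  const std::int64_t kMax = (std::int64_t{1} << 25) - 4;
  return offset >= kMin && offset <= kMax;
}
```

A target outside this range generally has to be reached indirectly, for example by loading its address into a register first, which is presumably why keeping each set of thunks close to the code that calls it matters.
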
Now that the AOT-compiled code image is position-independent, there is
no further need for this distinction. In fact, it was harmful,
because we were still using runtime-generated thunks when we should
have been using the ones in the code image. This resulted in
EXC_BAD_ACCESS errors on non-jailbroken iOS devices.
It seems that the Apple iOS Simulator's stat implementation writes
beyond the end of the struct stat we pass it, which can clobber
unrelated parts of the stack. Perhaps this is due to some kind of
header/library mismatch, but I've been unable to track it down so far.
The workaround is to give it 8 words more than it should need, where 8
is an arbitrary number that seems to work.
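
A minimal sketch of that workaround, with hypothetical names `PaddedStat` and `paddedStat`: the `struct stat` is followed by eight spare words, so an overrun lands in padding we own, and only the real structure is copied out. The padding size is the same arbitrary choice as above, not a documented requirement.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/types.h>
#include <sys/stat.h>

// Members are laid out in declaration order, so the padding directly follows
// the struct stat that we hand to stat().
struct PaddedStat {
  struct stat s;
  std::uintptr_t padding[8];  // slack for an implementation that overruns
};

// Like stat(path, out), but gives the implementation extra room to write.
int paddedStat(const char* path, struct stat* out) {
  PaddedStat buffer;
  int result = stat(path, &buffer.s);
  if (result == 0) {
    *out = buffer.s;  // copy out only the real structure
  }
  return result;
}
```
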
This avoids the requirement of putting the code image in a
section/segment which is both writable and executable, which is good
for security and avoids trouble with systems like iOS which disallow
such things.
The implementation relies on relative addressing such that the offset
of the desired address is fixed as a compile-time constant relative to
the start of the memory area of interest (e.g. the code image, heap
image, or thunk table). At runtime, the base pointer to the memory
area is retrieved from the thread structure and added to the offset to
compute the final address. Using the thread pointer allows us to
generate read-only, position-independent code while avoiding the use
of IP-relative addressing, which is not available on all
architectures.
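
The addressing scheme can be sketched in C++ terms. `ThreadBases`, `Area`, and `resolve` are hypothetical; in generated machine code the equivalent would presumably be a load of the base pointer from a fixed offset in the thread structure, followed by an add of the compile-time offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-thread record of where each memory area lives at runtime.
struct ThreadBases {
  const std::uint8_t* codeImage;
  const std::uint8_t* heapImage;
  const std::uint8_t* thunkTable;
};

enum class Area { CodeImage, HeapImage, ThunkTable };

// The (area, offset) pair is fixed when the image is built; only the base
// comes from the thread at runtime. No absolute address is embedded in the
// code, so it can stay read-only and position-independent without relying
// on IP-relative addressing.
const std::uint8_t* resolve(const ThreadBases* t, Area area,
                            std::size_t offset) {
  switch (area) {
    case Area::CodeImage:  return t->codeImage + offset;
    case Area::HeapImage:  return t->heapImage + offset;
    case Area::ThunkTable: return t->thunkTable + offset;
  }
  return nullptr;  // unreachable for valid Area values
}
```

Because the bases live in the writable thread structure rather than in the code, the same image can be mapped at a different address and every (area, offset) pair still resolves correctly.
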

This fixes a number of bugs concerning cross-architecture bootimage
builds involving different endiannesses. There is more work to do
before such builds will work.