Previously, we unwound the stack by following the chain of frame
pointers for normal returns, stack trace creation, and exception
unwinding. On x86, this required reserving EBP/RBP for frame pointer
duties, making it unavailable for general computation and requiring
that it be explicitly saved and restored on entry and exit,
respectively.
On PowerPC, we use an ABI that makes the stack pointer double as a
frame pointer, so it doesn't cost us anything. We've been using the
same convention on ARM, but it doesn't match the native calling
convention, which makes it unusable when we want to call native code
from Java and pass arguments on the stack.
So far, the ARM calling convention mismatch hasn't been an issue
because we've never passed more arguments from Java to native code
than would fit in registers. However, we must now pass an extra
argument (the thread pointer) to e.g. divideLong so it can throw an
exception on divide by zero, which means the last argument must be
passed on the stack. This will clobber the linkage area we've been
using to hold the frame pointer, so we need to stop using it.
One solution would be to use the same convention on ARM as we do on
x86, but this would introduce the same overhead of making a register
unavailable for general use and extra code at method entry and exit.
Instead, this commit removes the need for a frame pointer. Unwinding
involves consulting a map of instruction offsets to frame sizes which
is generated at compile time. This is necessary because stack trace
creation can happen at any time due to Thread.getStackTrace being
called by another thread, and the frame size varies during the
execution of a method.
So far, only x86(_64) is working, and continuations and tail call
optimization are probably broken. More to come.
This rather large commit modifies the VM to use non-local returns to
throw exceptions instead of simply setting Thread::exception and
returning frame-by-frame as it used to. This has several benefits:
* Functions no longer need to check Thread::exception after each call
which might throw an exception (which would be especially tedious
and error-prone now that any function which allocates objects
directly or indirectly might throw an OutOfMemoryError)
* There's no need to audit the code for calls to functions which
previously did not throw exceptions but later do
* Performance should be improved slightly due to both the reduced
need for conditionals and because undwinding now occurs in a single
jump instead of a series of returns
The main disadvantages are:
* Slightly higher overhead for entering and leaving the VM via the
JNI and JDK methods
* Non-local returns can make the code harder to read
* We must be careful to register destructors for stack-allocated
resources with the Thread so they can be called prior to a
non-local return
The non-local return implementation is similar to setjmp/longjmp,
except it uses continuation-passing style to avoid the need for
cooperation from the C/C++ compiler. Native C++ exceptions would have
also been an option, but that would introduce a dependence on
libstdc++, which we're trying to avoid for portability reasons.
Finally, this commit ensures that the VM throws an OutOfMemoryError
instead of aborting when it reaches its memory ceiling. Currently, we
treat the ceiling as a soft limit and temporarily exceed it as
necessary to allow garbage collection and certain internal allocations
to succeed, but refuse to allocate any Java objects until the heap
size drops back below the ceiling.
There is a delay between when we tell the OS to start a thread and
when it actually starts, and during that time a thread might
mistakenly think it was the last to exit, try to shut down the VM, and
then block in joinAll when it finds it wasn't the last one after all.
The solution is to increment Machine::liveCount and add the new thread
to the process tree before starting it -- all while holding
Machine::stateLock for atomicity. This helps guarantee that when
liveCount is one, we can be sure there's really only one thread
running or staged to run.
We now check for stack overflow in the JIT build as well as the
interpreted build, throwing a StackOverflowError if the limit
(currently hard-coded to 64KB, but should be easy to make
configurable) is exceeded.
There was an unlikely but dangerous race condition in monitorRelease
such that when a thread released a monitor and then tried to notify
the next thread in line, the latter thread might exit before it can be
notified. This potentially led to a crash as the former thread tried
to acquire and notify the latter thread's private lock after it had
been disposed.
The solution is to do as we do in the interrupt and join cases: call
acquireSystem first and thereby either block the target thread from
exiting until we're done or find that it has already exited, in which
case nothing needs to be done.
I also looked at monitorNotify to see if we have a similar bug there,
but in that case the target thread can't exit without first acquiring
and releasing the monitor, and since we ensure that no thread can
execute monitorNotify without holding the monitor, there's no
potential for a race.
When trying to create an array class, we try to resolve
java.lang.Object so we can use its vtable in the array class.
However, if Object is missing, we'll try to create and throw a
ClassNotFoundException, which requires creating an array to store the
stack trace, which requires creating an array class, which requires
resolving Object, etc.. This commit short-circuits this process by
telling resolveClass not to create and throw an exception if it can't
find Object.
While doing the above work, I noticed that the implementations of
Classpath::makeThrowable in classpath-avian.cpp and
classpath-openjdk.cpp were identical, so I made makeThrowable a
top-level function.
Finally, I discovered that Thread.setDaemon can only be called before
the target thread has been started, which allowed me to simplify the
code to track daemon threads in the VM.
The code added to runJavaThread was unecessary and harmful since it
allowed the global daemon thread count to become permanently
out-of-sync with the actual number of daemon threads.
* add libnet.so and libnio.so to built-in libraries for openjdk-src build
* implement sun.misc.Unsafe.park/unpark
* implement JVM_SetClassSigners/JVM_GetClassSigners
* etc.
The main change here is to use a lazily-populated vector to associate
runtime data with classes instead of referencing them directly from
the class which requires updating immutable references in the heap
image. The other changes employ other strategies to avoid trying to
update immutable references.
My recent commit to ensure that OS resources are released immediately
upon thread exit introduced a race condition where interrupting or
joining a thread as it exited could lead to attempts to use
already-released resources. This commit adds locking to avoid the
race.
Previously, we waited until the next GC to do this, but that can be
too long for workloads which create a lot of short-lived threads but
don't do much allocation.
This allows OpenJDK to access time zone data which is normally found
under java.home, but which we must embed in the executable itself to
create a self-contained build. The VM intercepts various file
operations, looking for paths which start with a prefix specified by
the avian.embed.prefix property and redirecting those operations to an
embedded JAR.
For example, if avian.embed.prefix is "/avian-embedded", and code
calls File.exists() with a path of
"/avian-embedded/javahomeJar/foo.txt", the VM looks for a function
named javahomeJar via dlsym, calls the function to find the memory
region containing the embeded JAR, and finally consults the JAR to see
if the file "foo.txt" exists.
As described in readme.txt, a standalone OpenJDK build embeds all
libraries, classes, and other files needed at runtime in the resulting
binary, eliminating dependencies on external resources.
We now consult the JAVA_HOME environment variable to determine where
to find the system library JARs and SOs. Ultimately, we'll want to
support self-contained build, but this allows Avian to behave like a
conventional libjvm.so.
The main changes in this commit ensure that we don't hold the global
class lock when doing class resolution using application-defined
classloaders. Such classloaders may do their own locking (in fact,
it's almost certain), making deadlock likely when mixed with VM-level
locking in various orders.
Other changes include a fix to avoid overflow when waiting for
extremely long intervals and a GC root stack mapping bug.
The biggest change in this commit is to split the system classloader
into two: one for boot classes (e.g. java.lang.*) and another for
application classes. This is necessary to make OpenJDK's security
checks happy.
The rest of the changes include bugfixes and additional JVM method
implementations in classpath-openjdk.cpp.
Whereas the GNU Classpath port used the strategy of patching Classpath
with core classes from Avian so as to minimize changes to the VM, this
port uses the opposite strategy: abstract and isolate
classpath-specific features in the VM similar to how we abstract away
platform-specific features in system.h. This allows us to use an
unmodified copy of OpenJDK's class library, including its core classes
and augmented by a few VM-specific classes in the "avian" package.
In order to facilitate making the VM compatible with multiple class
libraries, it's useful to separate the VM-specific representation of
these classes from the library implementations. This commit
introduces VMClass, VMField, and VMMethod for that purpose.
Previously, we risked segfaults by passing negative numbers to memcpy.
This commit also makes arraycopy throw an IndexOutOfBounds exception
instead of an ArrayStoreException if the specified offsets and lengths
would take us outside the bounds of one or both of the arrays, per the
Sun documentation.
It's not safe to use malloc from a signal handler, so we can't
allocate new memory when handling segfaults or Thread.getStackTrace
signals. Instead, we allocate a fixed-size backup heap for each
thread ahead of time and use it if there's no space left in the normal
heap pool. In the rare case that the backup heap isn't large enough,
we fall back to using a preallocated exception without a stack trace
as a last resort.
See commit 8120bee4dc for the original
problem description and solution. That commit and a couple of related
ones had to be reverted when we found they had introduced GC-safety
regressions leading to crashes.
This commit restores the reverted code and fixes the regressions.
We're seeing race conditions which occasionally lead to assertion
failures and thus crashes, so I'm reverting these changes for now:
29309fb414e92674cb738120bee4dc
We don't want to check Thread::waiting until we have re-acquired the
monitor, since another thread might notify us between releasing
Thread::lock and acquiring the monitor.
Due to SWT's nasty habit of creating a new object monitor for every
task added to Display.asyncExec, we've found that, on Windows at
least, we tend to run out of OS handles due to the large number of
mutexes we create between garbage collections.
One way to address this might be to trigger a GC when either the
number of monitors created since the last GC exceeds a certain number
or when the total number of monitors in the VM reaches a certain
number. Both of these risk hurting performance, especially if they
force major collections which would otherwise be infrequent. Also,
it's hard to know what the values of such thresholds should be on a
given system.
Instead, we reimplement Java monitors using atomic compare-and-swap
(CAS) and thread-specific native locks for blocking in the case of
contention. This way, we can create an arbitrary number of monitors
without creating any new native locks. The total number of native
locks needed by the VM is bounded instead by the number of live
threads plus a small constant.
Note that if we ever add support for an architecture which does not
support CAS, we'll need to provide a fallback monitor implementation.
Before allocating a new reference in NewGlobalReference or when
creating a local reference, we look for a previously-allocated
reference pointing to the same object. This is a linear search, but
usually the number of elements in the reference list is small, whereas
the memory, locking, and allocation overhead of creating duplicate
references can be large.