This fixes the tails=true build (at least for x86_64) and eliminates
the need for a frame table in the tails=false build. In the
tails=true build, we still need a frame table on x86(_64) to help
determine whether we've caught a thread executing code to do a tail
call or pop arguments off the stack. However, I've not yet written
the code to actually use this table, and it is only needed to handle
asynchronous unwinds via Thread.getStackTrace.
This is necessary to accomodate classes loaded at runtime which refer
to primitive array types. Otherwise, they won't be included unless
classes in the bootimage refer to them.
When loading a class which extends another class that contained a
field of primitive array type using defineClass in a bootimage=true
build, the VM was unable to find the primitive array class, and
makeArrayClass refused to create one since it should already have
existed.
The problem was that the bootimage=true build uses an empty
Machine::BootstrapClassMap, and resolveArrayClass expected to find the
primitive array classes there. The fix is to check the
Machine::BootLoader map if we can't find it in
Machine::BootstrapClassMap.
Previously, we unwound the stack by following the chain of frame
pointers for normal returns, stack trace creation, and exception
unwinding. On x86, this required reserving EBP/RBP for frame pointer
duties, making it unavailable for general computation and requiring
that it be explicitly saved and restored on entry and exit,
respectively.
On PowerPC, we use an ABI that makes the stack pointer double as a
frame pointer, so it doesn't cost us anything. We've been using the
same convention on ARM, but it doesn't match the native calling
convention, which makes it unusable when we want to call native code
from Java and pass arguments on the stack.
So far, the ARM calling convention mismatch hasn't been an issue
because we've never passed more arguments from Java to native code
than would fit in registers. However, we must now pass an extra
argument (the thread pointer) to e.g. divideLong so it can throw an
exception on divide by zero, which means the last argument must be
passed on the stack. This will clobber the linkage area we've been
using to hold the frame pointer, so we need to stop using it.
One solution would be to use the same convention on ARM as we do on
x86, but this would introduce the same overhead of making a register
unavailable for general use and extra code at method entry and exit.
Instead, this commit removes the need for a frame pointer. Unwinding
involves consulting a map of instruction offsets to frame sizes which
is generated at compile time. This is necessary because stack trace
creation can happen at any time due to Thread.getStackTrace being
called by another thread, and the frame size varies during the
execution of a method.
So far, only x86(_64) is working, and continuations and tail call
optimization are probably broken. More to come.
This is an attempt to reproduce an issue reported on the discussion
group. However, the current form of the test is passing, so further
work will be necessary to trigger the bug.
As reported on the discussion group, there is a problem with the
ClassLoader.defineClass implementation sunch that this test is not
currently passing, at least for the mode=debug and bootimage=true
builds. I plan to address these failures soon, but I wanted to add a
test first to make sure I could reproduce them.
This rather large commit modifies the VM to use non-local returns to
throw exceptions instead of simply setting Thread::exception and
returning frame-by-frame as it used to. This has several benefits:
* Functions no longer need to check Thread::exception after each call
which might throw an exception (which would be especially tedious
and error-prone now that any function which allocates objects
directly or indirectly might throw an OutOfMemoryError)
* There's no need to audit the code for calls to functions which
previously did not throw exceptions but later do
* Performance should be improved slightly due to both the reduced
need for conditionals and because undwinding now occurs in a single
jump instead of a series of returns
The main disadvantages are:
* Slightly higher overhead for entering and leaving the VM via the
JNI and JDK methods
* Non-local returns can make the code harder to read
* We must be careful to register destructors for stack-allocated
resources with the Thread so they can be called prior to a
non-local return
The non-local return implementation is similar to setjmp/longjmp,
except it uses continuation-passing style to avoid the need for
cooperation from the C/C++ compiler. Native C++ exceptions would have
also been an option, but that would introduce a dependence on
libstdc++, which we're trying to avoid for portability reasons.
Finally, this commit ensures that the VM throws an OutOfMemoryError
instead of aborting when it reaches its memory ceiling. Currently, we
treat the ceiling as a soft limit and temporarily exceed it as
necessary to allow garbage collection and certain internal allocations
to succeed, but refuse to allocate any Java objects until the heap
size drops back below the ceiling.
There is a delay between when we tell the OS to start a thread and
when it actually starts, and during that time a thread might
mistakenly think it was the last to exit, try to shut down the VM, and
then block in joinAll when it finds it wasn't the last one after all.
The solution is to increment Machine::liveCount and add the new thread
to the process tree before starting it -- all while holding
Machine::stateLock for atomicity. This helps guarantee that when
liveCount is one, we can be sure there's really only one thread
running or staged to run.
If we don't do this, the VM will crash when it tries to create a stack
trace for the error because makeObjectArray will return null
immediately when it sees there is a pending exception.
GCC 4.5.1 and later use a naming convention where functions are not
prefixed with an underscore, whereas previous versions added the
underscore. This change was made to ensure compatibility with
Microsoft's compiler. Since GCC 4.5.0 has a serious code generation
bug, we now only support later versions, so it makes sense to assume
the newer convention.
We now check for stack overflow in the JIT build as well as the
interpreted build, throwing a StackOverflowError if the limit
(currently hard-coded to 64KB, but should be easy to make
configurable) is exceeded.
There was an unlikely but dangerous race condition in monitorRelease
such that when a thread released a monitor and then tried to notify
the next thread in line, the latter thread might exit before it can be
notified. This potentially led to a crash as the former thread tried
to acquire and notify the latter thread's private lock after it had
been disposed.
The solution is to do as we do in the interrupt and join cases: call
acquireSystem first and thereby either block the target thread from
exiting until we're done or find that it has already exited, in which
case nothing needs to be done.
I also looked at monitorNotify to see if we have a similar bug there,
but in that case the target thread can't exit without first acquiring
and releasing the monitor, and since we ensure that no thread can
execute monitorNotify without holding the monitor, there's no
potential for a race.