Implementing Thread.getStackTrace is tricky. A thread may interrupt
another thread at any time to grab a stack trace, including while the
latter is executing Java code, JNI code, helper thunks, VM code, or
while transitioning between any of these.
To create a stack trace we use several context fields associated with
the target thread, including snapshots of the instruction pointer,
stack pointer, and frame pointer. These fields must be current,
accurate, and consistent with each other in order to get a reliable
trace. Otherwise, we risk crashing the VM by trying to walk garbage
stack frames or by misinterpreting the size and/or content of
legitimate frames.
This commit addresses sensitive transition points such as entering the
helper thunks which bridge the transitions from Java to native code
(where we must save the stack and frame registers for use from native
code) and stack unwinding (where we must atomically update the thread
context fields to indicate which frame we are unwinding to). When
grabbing a trace for another thread, we determine what kind of code we
caught the thread executing in and use that information to choose the
thread context values with which to begin the trace. See
MyProcessor::getStackTrace::Visitor::visit for details.
In order to atomically update the thread context fields, we do the
following:
1. Create a temporary "transition" object to serve as a staging area
and populate it with the new field values.
2. Update a transition pointer in the thread object to point to the
object created above. As long as this pointer is non-null,
interrupting threads will use the context values in the staging
object instead of those in the thread object.
3. Update the fields in the thread object.
4. Clear the transition pointer in the thread object.
We use a memory barrier between each of these steps to ensure they are
made visible to other threads in program order. See
MyThread::doTransition for details.
We were generating code which clobbered the data we were putting into
64-bit volatile fields (and potentially also clobbering the target or
source object in the case of non-static fields) due to misplaced
synchronization code. Reordering this code ensures that both the data
and the target or source survive across calls to synchronization
helper functions.
Previously, the stack frame mapping code (responsible for statically
calculating the map of GC roots for a method's stack frame during JIT
compilation) would assume that the map of GC roots on entry to an
exception handler is the same as on entry to the "try" block which the
handler is attached to. Technically, this is true, but the algorithm
we use does not consider whether a local variable is still "live"
(i.e. will be read later) when calculating the map - only whether we
can expect to find a reference there via normal (non-exceptional)
control flow. This can backfire if, within a "try" block, the stack
location which held an object reference on entry to the block gets
overwritten with a non-reference (i.e. a primitive). If an exception
is later thrown from such a block, we might end up trying to treat
that non-reference as a reference during GC, which will crash the VM.
The ideal way to fix this is to calculate the true interval for which
each value is live and use that to produce the stack frame maps. This
would provide the added benefit of ensuring that the garbage collector
does not visit references which, although still present on the stack,
will not be used again.
However, this commit uses the less invasive strategy of ANDing
together the root maps at each GC point within a "try" block and using
the result as the map on entry to the corresponding exception
handler(s). This should give us safe, if not optimal, results. Later
on, we can refine it as described above.
We were miscompiling methods which contained getfield, getstatic,
putfield, or putstatic instructions for volatile 64-bit primitives on
32-bit PowerPC due to not noticing that values in registers are clobbered
across function calls.
The solution is to create a separate Compiler::Operand instance for each
object monitor reference before and after the function call to avoid
confusing the compiler. To avoid duplicate entries in the constant pool,
we add code look for and, if found, reuse any existing entry for the same
constant.
Before allocating a new reference in NewGlobalReference or when
creating a local reference, we look for a previously-allocated
reference pointing to the same object. This is a linear search, but
usually the number of elements in the reference list is small, whereas
the memory, locking, and allocation overhead of creating duplicate
references can be large.
We need to check to see if we caught the thread somewhere in the thunk
code (i.e. about to call a helper function), in which case the stack
and base pointers are valid and may be used to create an accurate
trace.
Previously, we only visited frame locations containing values, but
this invited the possibility of reusing the same site for two
locations in some cases.
This allows the assembler to see the operand types of the comparison
and the condition for jumping in the same operation, which is
essential for generating efficient code in cases such as
multiple-precision compare-and-branch.