We were generating code which clobbered the data we were putting into
64-bit volatile fields (and potentially also clobbering the target or
source object in the case of non-static fields) due to misplaced
synchronization code. Reordering this code ensures that both the data
and the target or source survive across calls to synchronization
helper functions.
Previously, the stack frame mapping code (responsible for statically
calculating the map of GC roots for a method's stack frame during JIT
compilation) would assume that the map of GC roots on entry to an
exception handler is the same as on entry to the "try" block which the
handler is attached to. Technically, this is true, but the algorithm
we use does not consider whether a local variable is still "live"
(i.e. will be read later) when calculating the map - only whether we
can expect to find a reference there via normal (non-exceptional)
control flow. This can backfire if, within a "try" block, the stack
location which held an object reference on entry to the block gets
overwritten with a non-reference (i.e. a primitive). If an exception
is later thrown from such a block, we might end up trying to treat
that non-reference as a reference during GC, which will crash the VM.
The ideal way to fix this is to calculate the true interval for which
each value is live and use that to produce the stack frame maps. This
would provide the added benefit of ensuring that the garbage collector
does not visit references which, although still present on the stack,
will not be used again.
However, this commit uses the less invasive strategy of ANDing
together the root maps at each GC point within a "try" block and using
the result as the map on entry to the corresponding exception
handler(s). This should give us safe, if not optimal, results. Later
on, we can refine it as described above.
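As a minimal sketch of this strategy (plain bitmaps and illustrative
names, not Avian's internal data structures), the intersection can be
computed like this:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // A conservative root map for an exception handler: a stack slot
    // counts as a reference only if it held a reference at *every* GC
    // point inside the "try" block, so scanning it during a collection
    // triggered while unwinding is always safe.
    std::vector<uint32_t> handlerRootMap(
        const std::vector<std::vector<uint32_t>>& gcPointMaps,
        std::size_t wordCount)
    {
      std::vector<uint32_t> result(wordCount, ~static_cast<uint32_t>(0));
      for (const std::vector<uint32_t>& map : gcPointMaps) {
        for (std::size_t i = 0; i < wordCount; ++i) {
          result[i] &= map[i];  // keep only always-reference slots
        }
      }
      return result;
    }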
See commit 8120bee4dc for the original
problem description and solution. That commit and a couple of related
ones had to be reverted when we found they had introduced GC-safety
regressions leading to crashes.
This commit restores the reverted code and fixes the regressions.
We're seeing race conditions which occasionally lead to assertion
failures and thus crashes, so I'm reverting these changes for now:
29309fb414
e92674cb73
8120bee4dc
We don't want to check Thread::waiting until we have re-acquired the
monitor, since another thread might notify us between releasing
Thread::lock and acquiring the monitor.
We need to prefix instructions of the form "mov R,M" with a REX byte
when R is %spl, %bpl, %sil, or %dil. Such moves are unencodable on
32-bit x86, and, because of the order in which we pick registers,
pretty rare on 64-bit systems, which is why this took so long to
notice.
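For illustration, here is a sketch of the encoding rule (not the
project's actual assembler code): byte-register stores from %spl,
%bpl, %sil, or %dil need a REX prefix, since without one the same
register numbers select %ah, %ch, %dh, and %bh.

    #include <cstdint>
    #include <vector>

    // Emit the prefix and opcode for a byte-sized "mov R,M" (register
    // to memory). Register numbers follow the usual x86 encoding:
    // rsp/spl = 4, rbp/bpl = 5, rsi/sil = 6, rdi/dil = 7.
    void emitMoveByteRegisterToMemory(std::vector<uint8_t>& code, int reg)
    {
      if (reg >= 4 && reg <= 7) {
        code.push_back(0x40);  // bare REX prefix: selects spl/bpl/sil/dil
                               // instead of the legacy ah/ch/dh/bh
      }
      code.push_back(0x88);    // MOV r/m8, r8
      // ModRM, SIB, and displacement bytes for the memory operand
      // follow here.
    }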
Due to SWT's nasty habit of creating a new object monitor for every
task added to Display.asyncExec, we've found that, on Windows at
least, we tend to run out of OS handles because of the large number
of mutexes we create between garbage collections.
One way to address this might be to trigger a GC when either the
number of monitors created since the last GC exceeds a certain number
or when the total number of monitors in the VM reaches a certain
number. Both of these risk hurting performance, especially if they
force major collections which would otherwise be infrequent. Also,
it's hard to know what the values of such thresholds should be on a
given system.
Instead, we reimplement Java monitors using atomic compare-and-swap
(CAS) and thread-specific native locks for blocking in the case of
contention. This way, we can create an arbitrary number of monitors
without creating any new native locks. The total number of native
locks needed by the VM is bounded instead by the number of live
threads plus a small constant.
Note that if we ever add support for an architecture which does not
support CAS, we'll need to provide a fallback monitor implementation.
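A minimal sketch of the idea, with illustrative types rather than
Avian's implementation: a monitor becomes a single word acquired via
compare-and-swap, and the only native lock a contended thread ever
touches is its own.

    #include <atomic>

    struct Thread;  // assumed to own one native lock it can block on

    struct Monitor {
      std::atomic<Thread*> owner{nullptr};  // a plain word, no OS handle
    };

    // Uncontended acquire and release cost one CAS each and allocate
    // nothing.
    bool tryAcquire(Monitor& m, Thread* t)
    {
      Thread* expected = nullptr;
      return m.owner.compare_exchange_strong(expected, t);
    }

    void release(Monitor& m, Thread* t)
    {
      Thread* expected = t;
      m.owner.compare_exchange_strong(expected, nullptr);
      // On contention, the releasing thread would now wake the first
      // waiter by signalling that thread's own native lock; a thread
      // that fails the CAS above would enqueue itself and block on its
      // native lock (details omitted here).
    }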
We were miscompiling methods which contained getfield, getstatic,
putfield, or putstatic instructions for volatile 64-bit primitives on
32-bit PowerPC due to not noticing that values in registers are clobbered
across function calls.
The solution is to create a separate Compiler::Operand instance for each
object monitor reference before and after the function call to avoid
confusing the compiler. To avoid duplicate entries in the constant pool,
we add code to look for and, if found, reuse any existing entry for the same
constant.
Currently, we just set this to /tmp (or the equivalent) since Avian
doesn't really have a home. This avoids a NullPointerException from
javax/xml/parsers/SAXParserFactory.
The latter is cheaper (avoids a state transition and possible memory
allocation) when we just want to know if an exception is thrown
without needing a handle to that exception.
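Assuming the two functions being contrasted are JNI's
ExceptionOccurred and ExceptionCheck (the message above does not name
them), the cheaper pattern looks roughly like this:

    #include <jni.h>

    // Returns true if the call completed without throwing.
    // ExceptionCheck reports a pending exception as a jboolean, so no
    // local reference to the throwable is created when we only need a
    // yes/no answer.
    bool callSucceeded(JNIEnv* env, jobject target, jmethodID method)
    {
      env->CallVoidMethod(target, method);
      return env->ExceptionCheck() == JNI_FALSE;
    }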
Before allocating a new reference in NewGlobalReference or when
creating a local reference, we look for a previously-allocated
reference pointing to the same object. This is a linear search, but
usually the number of elements in the reference list is small, whereas
the memory, locking, and allocation overhead of creating duplicate
references can be large.
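A minimal sketch of the search-before-allocate pattern, using
illustrative structures rather than Avian's own:

    // A singly linked list of references with a count per entry, so a
    // duplicate acquire just bumps the count instead of allocating.
    struct Reference {
      void* target;
      unsigned count;
      Reference* next;
    };

    Reference* acquireReference(Reference*& list, void* object)
    {
      for (Reference* r = list; r != nullptr; r = r->next) {
        if (r->target == object) {
          ++r->count;  // reuse the existing reference to this object
          return r;
        }
      }
      list = new Reference{object, 1, list};  // no match: allocate
      return list;
    }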
We need to check whether we caught the thread somewhere in the thunk
code (i.e. about to call a helper function), in which case the stack
and base pointers are valid and may be used to create an accurate
trace.
If another thread succeeds in entering the "exclusive" state while we
use the fast path to transition the current thread to "active", we
must switch back to "idle" temporarily to allow the exclusive thread a
chance to continue, and then retry the transition to "active" via the
slow path.
These paths reduce contention among threads by using atomic operations
and memory barriers instead of mutexes where possible. This is
especially important for JNI calls, since each such call involves two
state transitions: from "active" to "idle" and back.
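A minimal sketch of such a fast path, with illustrative names and
fewer states and barriers than the real implementation:

    #include <atomic>

    enum State { Idle, Active };

    // Nonzero while some thread is (or is becoming) "exclusive".
    std::atomic<int> exclusiveRequests{0};

    // Fast path for the idle -> active transition. Returns false when
    // the caller must fall back to the mutex-protected slow path.
    bool tryFastActivate(std::atomic<int>& state)
    {
      int expected = Idle;
      if (!state.compare_exchange_strong(expected, Active)) {
        return false;  // we were not idle; take the slow path
      }
      if (exclusiveRequests.load() != 0) {
        state.store(Idle);  // step aside for the exclusive thread
        return false;       // then retry via the slow path
      }
      return true;
    }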
Previously, we assumed that the "context" parameter to
GetThreadContext was only an output parameter, but it actually uses
the value of CONTEXT::ContextFlags on entry to decide which parts of
the structure to fill in. We were getting lucky most of the time,
because whatever garbage was on the stack at that location had the
necessary bits set. When we weren't so lucky, we got all zeros for
the register values, which sometimes led to a crash depending on the
state of the thread being examined.
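A sketch of the corrected usage, simplified from whatever the real
code does around it:

    #include <windows.h>
    #include <cstring>

    // ContextFlags is an input as well as an output: it tells
    // GetThreadContext which parts of the CONTEXT to fill in, so it
    // must be set deliberately rather than left to whatever happens to
    // be on the stack.
    BOOL captureRegisters(HANDLE thread, CONTEXT* context)
    {
      std::memset(context, 0, sizeof(CONTEXT));
      context->ContextFlags = CONTEXT_CONTROL | CONTEXT_INTEGER;
      return GetThreadContext(thread, context);
    }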
Previously, we only visited frame locations containing values, but
this invited the possibility of reusing the same site for two
locations in some cases.
These warnings are due to GCC being smart enough to do interprocedural
constant propagation but not smart enough to avoid false positives in
all cases when looking for array bounds errors.
We use this utility instead of objcopy to embed data into object files
because it offers more control over e.g. section alignment, which is
important for bootimage builds.
This is necessary because objcopy does not currently allow us to
specify the alignment requirement for the .boot section used to store
the boot image for AOT builds. This may be a problem for Windows as
well, in which case we'll need to add a binaryToPE utility.
uniqueSite also checks, if applicable, to see if the second word of a
value shares the specified site with the first value as its sole site.
Also renamed a couple of fields in Value for clarity.
This allows the assembler to see the operand types of the comparison
and the condition for jumping in the same operation, which is
essential for generating efficient code in cases such as
multiple-precision compare-and-branch.
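To illustrate why the condition and the operand types belong
together, here is a sketch, in plain C++, of the control flow a
32-bit backend must produce for a signed 64-bit compare-and-branch:

    #include <cstdint>

    // Emulates, in source form, the branch sequence for a signed
    // 64-bit "a < b" on a 32-bit target: the high words are compared
    // first (signed), and only when they are equal do the unsigned low
    // words decide. Which branches are emitted, and whether they are
    // signed or unsigned, depends on both the operand types and the
    // condition, so the assembler benefits from seeing them together.
    bool lessThan64(int32_t aHigh, uint32_t aLow,
                    int32_t bHigh, uint32_t bLow)
    {
      if (aHigh < bHigh) {
        return true;
      }
      if (aHigh > bHigh) {
        return false;
      }
      return aLow < bLow;
    }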