The first problem was that, on x86, we failed to properly keep track
of whether to expect the return address to be on the stack or not when
unwinding through a frame. We were relying on a "stackLimit" pointer
to tell us whether we were looking at the most recently-called frame
by comparing it with the stack pointer for that frame. That was
inaccurate when a thread was executing at the beginning of a method,
before a new frame had been allocated; in that case the most recent
two frames share a stack pointer, confusing the unwinder.  The
solution involves keeping track of how many frames we've looked at
while walking the stack.
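A minimal sketch of that approach, with hypothetical names (Frame,
inMethodPrologue, previousFrame); it is not Avian's actual unwinder.
The special case applies only to the first frame visited, so a simple
counter replaces the old stackLimit comparison:

    struct Frame { void** sp; void* ip; };

    // Hypothetical helpers: whether ip is still in a method prologue
    // before its frame has been allocated, and the normal step from a
    // frame to its caller's frame.
    bool inMethodPrologue(void* ip);
    Frame previousFrame(Frame frame);

    void walkStack(Frame frame, void (*visit)(void* ip)) {
      for (unsigned framesWalked = 0; frame.ip; ++framesWalked) {
        visit(frame.ip);
        if (framesWalked == 0 and inMethodPrologue(frame.ip)) {
          // Most recent frame, caught before its frame was allocated:
          // the return address is still at the top of the stack, and
          // this frame shares its stack pointer with the next one.
          frame.ip = frame.sp[0];
        } else {
          frame = previousFrame(frame);
        }
      }
    }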
The other problem was that compareIpToMethodBounds assumed every
method was followed by at least one byte of padding before the next
method started.  That assumption was usually valid because we were
storing each method's code size in a header immediately prior to the
code itself, so each method was normally followed by the next
method's header.
However, the last method of an AOT-compiled code image is not followed
by any such method header and may instead be followed directly by
native code with no intervening padding. In that case, we risk
interpreting that native code as part of the preceding method, with
potentially bizarre results.
The reason for the compareIpToMethodBounds assumption was that methods
which throw exceptions as their last instruction generate a
non-returning call, which nonetheless pushes a return address on the
stack that points past the end of the method, and the unwinder needs
to know that this return address belongs to that method.  A better solution
is to add an extra trap instruction to the end of such methods, which
is what this patch does.
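A minimal sketch of how the bounds comparison has to behave once that
trap padding is in place; this is illustration only, not Avian's
actual compareIpToMethodBounds:

    #include <stdint.h>

    // A return address produced by a non-returning call at the very end
    // of a method points one past the method's code, so the upper bound
    // is inclusive.  The trailing trap instruction guarantees that no
    // other method's code can begin at exactly that address.
    int compareIpToBounds(uintptr_t ip, uintptr_t start, uintptr_t size) {
      if (ip < start) {
        return -1;               // ip belongs to an earlier method
      } else if (ip <= start + size) {
        return 0;                // ip, possibly past-the-end, is ours
      } else {
        return 1;                // ip belongs to a later method
      }
    }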
This fixes a number of bugs concerning cross-architecture bootimage
builds involving different endiannesses.  More work remains before
such builds work correctly.
The previous code did not take into account any padding embedded in a
basic block due to inline jump tables, which led to invalid code
generation in large methods.
Due to encoding limitations, the immediate operand of conditional
branches can be no more than 32KB forward or backward. Since the
JIT-compiled form of some methods can be larger than 32KB, and we also
do conditional jumps to code outside the current method in some cases,
we must work around this limitation.
The strategy of this commit is to provide inline, intermediate jump
tables where necessary. A given conditional branch whose target is
too far for a direct jump will instead point to an unconditional
branch in the nearest jump table which points to the actual target.
Unconditional immediate branches are also limited on PowerPC, but this
limit is 32MB, which is not an impediment in practice. If it does
become a problem, we'll need to encode such branches using multiple
instructions.
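A minimal sketch of the decision, with hypothetical names; it is not
Avian's PowerPC assembler.  The conditional branch either encodes the
real target directly or points at a jump table slot whose
unconditional branch reaches the real target:

    #include <stdint.h>

    const intptr_t ConditionalReach = 32 * 1024;   // +/-32KB for bc

    uintptr_t encodedBranchTarget(uintptr_t branchSite,
                                  uintptr_t realTarget,
                                  uintptr_t jumpTableSlot) {
      intptr_t offset = static_cast<intptr_t>(realTarget)
        - static_cast<intptr_t>(branchSite);
      if (offset >= -ConditionalReach and offset < ConditionalReach) {
        return realTarget;      // near enough for a direct conditional branch
      } else {
        return jumpTableSlot;   // slot holds an unconditional branch
                                // (reach +/-32MB) to the real target
      }
    }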
On PowerPC and ARM, we can't rely on the return address having already
been saved on the stack on entry to a thunk, so we must look for it in
the link register instead.
We can't rely on the C++ compiler to save the return address in a
known location on entry to each function we might call from Java
(although GCC 4.5 seems to do so consistently, which is why I hadn't
realized the unwinding code was relying on that assumption), so we
must store it explicitly in MyThread::ip in each thunk. For PowerPC
and x86, we continue saving it on the stack as always, since the
calling convention guarantees its location relative to the stack
pointer.
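A minimal sketch of the requirement, with a heavily simplified
MyThread; the real thunks are emitted machine code rather than C++.
On ARM the return address arrives in the link register and may be
clobbered before anything spills it to the stack, so the thunk records
it where the unwinder knows to look:

    struct MyThread {
      void* ip;      // return address into the calling Java method (ARM)
      void* stack;   // Java stack pointer at the point of the call
    };

    void enterThunk(MyThread* t, void* linkRegisterValue) {
      // Explicit save; on x86 and PowerPC the calling convention already
      // fixes the return address's location relative to the stack pointer.
      t->ip = linkRegisterValue;
    }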
We have to be careful about how we calculate return addresses on ARM
due to padding introduced by constant pools interspersed with code.
When calculating the offset of code where we're inserting a constant
pool, we want the offset of the end of the pool for jump targets, but
we want the offset just prior to the beginning of the pool (i.e. the
offset of the instruction responsible for jumping past the pool) when
calculating a return address.
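A minimal sketch of that distinction, with hypothetical structures; it
is not Avian's ARM assembler.  A jump target recorded at the point
where a pool is inserted must skip past the pool, while a return
address recorded there must not:

    #include <stdint.h>
    #include <vector>

    struct Pool { uintptr_t offset; uintptr_t size; };

    uintptr_t adjustOffset(const std::vector<Pool>& pools,
                           uintptr_t offset, bool forJumpTarget) {
      uintptr_t padding = 0;
      for (const Pool& p: pools) {
        if (p.offset < offset or (forJumpTarget and p.offset == offset)) {
          padding += p.size;   // pools earlier in the code always shift us;
                               // a pool at this exact offset only shifts
                               // jump targets, not return addresses
        }
      }
      return offset + padding;
    }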
Previously, loading an arbitrary 32-bit constant required up to four
instructions (16 bytes), since we did so one byte at a time via
immediate-mode operations.
The preferred way to load constants on ARM is via PC-relative
addressing, but this is challenging because immediate memory offsets
are limited to 4096 bytes in either direction. We frequently need to
compile methods which are larger than 4096, or even 8192, bytes, so we
must intersperse code and data if we want to use PC-relative loads
everywhere.
This commit enables pervasive PC-relative loads by handling the
following cases:
1. Method is shorter than 4096 bytes: append data table to end
2. Method is longer than 4096 bytes, but no basic block is longer
than 4096 bytes: insert data tables as necessary after blocks, taking
care to minimize the total number of tables
3. Method is longer than 4096 bytes, and some blocks are longer than
4096 bytes: split large basic blocks and insert data tables as above
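A minimal sketch of the flushing decision that drives all three cases,
with hypothetical names; it is not Avian's ARM assembler.  Each
PC-relative load records the offset that will need a constant; once
the oldest pending reference approaches the 4096-byte reach of a
PC-relative ldr, a data table must be emitted (after the current block
when possible, otherwise by splitting the block and jumping over the
table):

    #include <stdint.h>
    #include <vector>

    const uintptr_t LoadReach = 4096;   // max PC-relative ldr displacement

    struct PendingConstant {
      uintptr_t site;    // offset of the ldr that will read this constant
      uint32_t value;    // constant to place in the next data table
    };

    bool mustFlushDataTable(const std::vector<PendingConstant>& pending,
                            uintptr_t currentOffset,
                            uintptr_t safetyMargin) {
      return not pending.empty()
        and currentOffset + safetyMargin - pending.front().site >= LoadReach;
    }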
We've been getting away with not doing this so far since our Java
calling convention matches the native calling convention concerning
where the return address is saved, so when our thunk calls native code
it gets saved for us automatically. However, there was still the
danger that a thread would interrupt another thread after the stack
pointer was saved to the thread field but before the native code was
called and try to get a stack trace, at which point it would try to
find the return address relative to that stack pointer and find
garbage instead. This commit ensures that we save the return address
before saving the stack pointer to avoid such a situation.
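A minimal sketch of the two orderings, with hypothetical field names.
Another thread may sample these fields between any two instructions,
so the return address has to be in place before the stack pointer
becomes visible:

    struct MyThread {
      void* volatile ip;      // or a stack slot; wherever the unwinder looks
      void* volatile stack;   // non-null while the thread runs native code
    };

    // Before: a sampler running between the two stores sees a stack
    // pointer but no saved return address, and walks garbage.
    void enterNativeRacy(MyThread* t, void* sp, void* returnAddress) {
      t->stack = sp;
      t->ip = returnAddress;
    }

    // After: by the time 'stack' is non-null, the return address is valid.
    void enterNativeFixed(MyThread* t, void* sp, void* returnAddress) {
      t->ip = returnAddress;
      t->stack = sp;
    }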
The shiftLeftC function in powerpc.cpp was miscompiling left shifts
of long values by constant amounts, leading to crashes due to illegal
instructions and other misbehavior caused by instructions that meant
something entirely different from what was intended.  This
commit fixes that and adds a test to Longs.java to make sure it stays
fixed.
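For reference, what a correct left shift of a 64-bit value held in two
32-bit registers must compute for a constant shift amount; this is
illustration only, not the instruction sequence shiftLeftC emits:

    #include <stdint.h>

    void shiftLeft64(uint32_t* high, uint32_t* low, unsigned amount) {
      if (amount == 0) {
        return;
      } else if (amount < 32) {
        *high = (*high << amount) | (*low >> (32 - amount));
        *low <<= amount;
      } else {                           // amount in [32, 64)
        *high = *low << (amount - 32);
        *low = 0;
      }
    }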
The SingleRead::successor field is used (when non-null) to further
constrain the SiteMask in SingleRead::intersect based on reads of
successor values (as in the cases of moves and condensed-addressing
combine and translate instructions).
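A heavily simplified sketch of that relationship; the real SingleRead,
SiteMask, and intersect in Avian's compiler carry more state and have
different signatures:

    #include <stdint.h>

    struct SiteMask {
      unsigned typeMask;       // acceptable kinds of site (register, memory, ...)
      uint64_t registerMask;   // acceptable specific registers
    };

    SiteMask intersectMasks(const SiteMask& a, const SiteMask& b) {
      return SiteMask{a.typeMask & b.typeMask,
                      a.registerMask & b.registerMask};
    }

    struct SingleRead {
      SiteMask mask;
      SingleRead* successor;   // read of the successor value, if any

      // When a successor read exists (e.g. the destination of a move, or
      // the result of a condensed-addressing combine or translate), its
      // constraints further narrow where this value may live.
      SiteMask effectiveMask() const {
        return successor ? intersectMasks(mask, successor->mask) : mask;
      }
    };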