Previously, we unwound the stack by following the chain of frame
pointers for normal returns, stack trace creation, and exception
unwinding. On x86, this required reserving EBP/RBP for frame pointer
duties, making it unavailable for general computation and requiring
that it be explicitly saved and restored on entry and exit,
respectively.
On PowerPC, we use an ABI that makes the stack pointer double as a
frame pointer, so it doesn't cost us anything. We've been using the
same convention on ARM, but it doesn't match the native calling
convention, which makes it unusable when we want to call native code
from Java and pass arguments on the stack.
So far, the ARM calling convention mismatch hasn't been an issue
because we've never passed more arguments from Java to native code
than would fit in registers. However, we must now pass an extra
argument (the thread pointer) to e.g. divideLong so it can throw an
exception on divide by zero, which means the last argument must be
passed on the stack. This will clobber the linkage area we've been
using to hold the frame pointer, so we need to stop using it.
One solution would be to use the same convention on ARM as we do on
x86, but this would introduce the same overhead of making a register
unavailable for general use and extra code at method entry and exit.
Instead, this commit removes the need for a frame pointer. Unwinding
involves consulting a map of instruction offsets to frame sizes which
is generated at compile time. This is necessary because stack trace
creation can happen at any time due to Thread.getStackTrace being
called by another thread, and the frame size varies during the
execution of a method.
So far, only x86(_64) is working, and continuations and tail call
optimization are probably broken. More to come.
We now check for stack overflow in the JIT build as well as the
interpreted build, throwing a StackOverflowError if the limit
(currently hard-coded to 64KB, but should be easy to make
configurable) is exceeded.
We have to be careful about how we calculate return addresses on ARM
due to padding introduced by constant pools interspersed with code.
When calculating the offset of code where we're inserting a constant
pool, we want the offset of the end of the pool for jump targets, but
we want the offset just prior to the beginning of the pool (i.e. the
offset of the instruction responsible for jumping past the pool) when
calculating a return address.
Compiling the entire OpenJDK class library into a bootimage revealed
some corner cases which broke the compiler, including synchronization
in a finally block and gotos targeting the first instruction of an
unsynchronized method.
Previously, loading an arbitrary 32-bit constant required up to four
instructions (128 bytes), since we did so one byte at a time via
immediate-mode operations.
The preferred way to load constants on ARM is via PC-relative
addressing, but this is challenging because immediate memory offsets
are limited to 4096 bytes in either direction. We frequently need to
compile methods which are larger than 4096, or even 8192, bytes, so we
must intersperse code and data if we want to use PC-relative loads
everywhere.
This commit enables pervasive PC-relative loads by handling the
following cases:
1. Method is shorter than 4096 bytes: append data table to end
2. Method is longer than 4096 bytes, but no basic block is longer
than 4096 bytes: insert data tables as necessary after blocks, taking
care to minimize the total number of tables
3. Method is longer than 4096 bytes, and some blocks are longer than
4096 bytes: split large basic blocks and insert data tables as above
Previously, we only visited frame locations containing values, but
this invited the possibility of reusing the same site for two
locations in some cases.
uniqueSite also checks, if applicable, to see if the second word of a
value shares the specified site with the first value as its sole site.
Also renamed a couple of fields in Value for clarity.
This allows the assembler to see the operand types of the comparison
and the condition for jumping in the same operation, which is
essential for generating efficient code in cases such as
multiple-precision compare-and-branch.