Previously, loading an arbitrary 32-bit constant required up to four
instructions (128 bytes), since we did so one byte at a time via
immediate-mode operations.
The preferred way to load constants on ARM is via PC-relative
addressing, but this is challenging because immediate memory offsets
are limited to 4096 bytes in either direction. We frequently need to
compile methods which are larger than 4096, or even 8192, bytes, so we
must intersperse code and data if we want to use PC-relative loads
everywhere.
This commit enables pervasive PC-relative loads by handling the
following cases:
1. Method is shorter than 4096 bytes: append data table to end
2. Method is longer than 4096 bytes, but no basic block is longer
than 4096 bytes: insert data tables as necessary after blocks, taking
care to minimize the total number of tables
3. Method is longer than 4096 bytes, and some blocks are longer than
4096 bytes: split large basic blocks and insert data tables as above
This allows the assembler to see the operand types of the comparison
and the condition for jumping in the same operation, which is
essential for generating efficient code in cases such as
multiple-precision compare-and-branch.
We now use SingleRead::successor in pickTarget, where we use it to
determine the prefered target site for the successor without requiring
the target to conform to that preference. The previous code made the
preference a hard requirement, which is not desirable or even possible
in general.
We now create a unique thunk for each vtable position so as to avoid
relying on using the return address to determine what method is to be
compiled and invoked, since we will not have the correct return address
in the case of a tail call. This required refactoring how executable
memory is allocated in order to keep AOT compilation working. Also, we
must always use the same register to hold the class pointer when
compiling virtual calls, and ensure that the pointer stays there until
the call instruction is executed so we know where to find it in the
thunk.
This is important on the 32-bit OS X PowerPC ABI, since the location
of the low 32-bits of a return value change depending on whether the
entire value is 64-bits or not.
The trick is to make all destructors non-virtual. This is safe because
we never use the delete operator, which is the only case where virtual
destructors are relevant. This is a better solution than implementing
our own delete operator, because we want libraries loaded at runtime to
use the libstdc++ version, not ours.