This is necessary to avoid name conflicts on various platforms. For
example, iOS has its own util.h, and Windows has a process.h. By
including our version as e.g. "avian/util.h", we avoid confusion with
the system version.
The eventual intent with the lir namespace is to formalize some of
the important parts of the Assembler interface so they can be tested,
debug-printed, and potentially serialized.
Also, group arguments to apply(...) in OperandInfos
The primary motivation behind this is to allow all the different Assemblers
to be built at once, on a single machine. This should dramatically reduce
the time required to make sure that a particular change doesn't break
the build for one of the not-so-common architectures (arm, powerpc)
Simply pass "codegen-targets=all" to make in order to compile every
src/codegen/<arch>/assembler.cpp.
Note that while these architectures are built, they will not be fully
functional. Certain values are assumed to be the same across the entire
build (such as TargetBytesPerWord), but this is no longer the case.
In order to calculate the initial stack map of GC roots for an
exception handler, we do a logical "and" of maps across all the
instructions contained in the try block for that handler. This is
complicated by the presence of jsr/ret instructions, though, because
instructions in a subroutine may have multiple maps associated with
them corresponding to all the paths from which execution might flow to
them.
The bug in this case was that we were using an uninitialized map in
our calculation, resulting in a map with no GC roots at all. By the
time the map was initialized, the damage had already been done. The
solution is to treat an uninitialized map as if it has roots at all
positions so that it has no effect on the calculation until it has
been initialized with real data.
Some OSes (notably, Windows CE) restrict the size of the call stack
such that recursive compilation of branch instructions can lead to
stack overflow in methods with large numbers of such instructions. In
fact, a worst-case method could even lead to overflow when the stack
size limit is relatively generous.
The solution is to convert this recursion into iteration with an
explicit stack to maintain state about alternate paths through each
branch.
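A minimal sketch of that transformation, assuming nothing about the VM's
actual frame or context types (the names below are illustrative only):
each branch pushes its alternate path onto an explicit work stack instead
of recursing into it, so native stack usage stays bounded.

    #include <stack>
    #include <utility>
    #include <vector>

    // Explicit-stack traversal of basic blocks; successors[b] holds the
    // branch-taken and fall-through block indexes for block b.
    void compileIteratively(int entryBlock, int blockCount,
                            const std::vector<std::pair<int, int> >& successors) {
      std::vector<bool> visited(blockCount, false);
      std::stack<int> work;                  // replaces the native call stack
      work.push(entryBlock);

      while (!work.empty()) {
        int block = work.top();
        work.pop();
        if (block < 0 || block >= blockCount || visited[block]) continue;
        visited[block] = true;
        // ... emit code for this basic block here ...
        work.push(successors[block].second); // fall-through path
        work.push(successors[block].first);  // branch-taken path
      }
    }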
We weren't adding entries to the frame map for calls to the instanceof
thunk when compiling methods. However, that thunk may trigger a GC,
in which case we'll need to unwind the stack, which will lead to a
crash if we don't have a frame map entry for that instruction.
Java requires that NaNs be converted to zero and that numbers at or
beyond the limits of integer representation be clamped to the largest
or smallest value that can be represented, respectively.
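A short sketch of those conversion rules for the float-to-int case; the
function name is illustrative, not the VM's actual thunk, but the
semantics follow the requirements described above.

    #include <cmath>
    #include <cstdint>
    #include <limits>

    int32_t floatToInt(float f) {
      if (std::isnan(f)) return 0;                          // NaN -> 0
      if (f >= static_cast<float>(std::numeric_limits<int32_t>::max()))
        return std::numeric_limits<int32_t>::max();         // clamp to 2^31 - 1
      if (f <= static_cast<float>(std::numeric_limits<int32_t>::min()))
        return std::numeric_limits<int32_t>::min();         // clamp to -2^31
      return static_cast<int32_t>(f);                       // ordinary truncation
    }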
The existing code handled such odd switch statements correctly in the
JIT case, but did the wrong thing for the AOT case, leading to an
assertion failure later on.
4512a9a introduced a new ArgumentList constructor which was handling
some types incorrectly (e.g. implicitly converting floats to
integers). This commit fixes it.
Our Thread.getStackTrace implementation is tricky because it might be
invoked on a thread executing arbitrary native or Java code, and there
are numerous edge cases to consider. Unsurprisingly, there were a few
lingering, non-fatal bugs revealed by Valgrind recently, one involving
the brief interval just before and after returning from invokeNative,
and the other involving an off-by-one error in x86.cpp's nextFrame
implementation. This commit fixes both.
If a class references a field or method as static and we find it's
actually non-static -- or vice-versa -- we ought to throw an error
rather than abort.
The first problem was that, on x86, we failed to properly keep track
of whether to expect the return address to be on the stack or not when
unwinding through a frame. We were relying on a "stackLimit" pointer
to tell us whether we were looking at the most recently-called frame
by comparing it with the stack pointer for that frame. That was
inaccurate in the case of a thread executing at the beginning of a
method before a new frame is allocated, in which case the most recent
two frames share a stack pointer, confusing the unwinder. The
solution involves keeping track of how many frames we've looked at
while walking the stack.
The other problem was that compareIpToMethodBounds assumed every
method was followed by at least one byte of padding before the next
method started. That assumption was usually valid because we were
storing the size following method code prior to the code itself.
However, the last method of an AOT-compiled code image is not followed
by any such method header and may instead be followed directly by
native code with no intervening padding. In that case, we risk
interpreting that native code as part of the preceding method, with
potentially bizarre results.
The reason for the compareIpToMethodBounds assumption was that methods
which throw exceptions as their last instruction generate a
non-returning call, which nonetheless pushes a return address onto the
stack pointing past the end of the method, and the unwinder needs to
know that this return address belongs to that method. A better solution
is to add an extra trap instruction to the end of such methods, which
is what this patch does.
Scala occasionally generates exception handler tables with interval
bounds which fall outside the range of valid bytecode indexes, so we
must clamp them or risk out-of-bounds array accesses.
Floats are implicitly promoted to doubles when passed as part of a
variable-length argument list, so we can't treat them the same way as
32-bit integers.
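A small illustration of the promotion rule (not the VM's code): in a
variadic call, float arguments are promoted to double, so the callee must
read them back as double rather than as a 32-bit value.

    #include <cstdarg>

    double sumFloats(int count, ...) {
      va_list args;
      va_start(args, count);
      double total = 0;
      for (int i = 0; i < count; ++i) {
        total += va_arg(args, double);  // reading these as 32-bit ints would be wrong
      }
      va_end(args);
      return total;
    }

    // sumFloats(2, 1.5f, 2.5f) receives both arguments as doubles and returns 4.0.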
Until now, the bootimage build hasn't supported using the Java
invocation API to create a VM, destroy it, and create another in the
same process. Ideally, we would be able to create multiple VMs
simultaneously without any interference between them. In fact, Avian
is designed to support this for the most part, but there are a few
places we use global, mutable state which prevent this from working.
Most notably, the bootimage is modified in-place at runtime, so the
best we can do without extensive changes is to clean up the bootimage
when the VM is destroyed so it's ready for later instances. Hence
this commit.
Ultimately, we can move towards a fully reentrant VM by making the
bootimage immutable, but this will require some care to avoid
performance regressions. Another challenge is our POSIX signal
handlers, which currently rely on a global handle to the VM, since you
can't, to my knowledge, pass a context pointer when registering a
signal handler. Thread local variables won't necessarily help, since
a thread might attach to more than one VM at a time.
This reverts commit 88d614eb25.
It turns out we still need separate sets of thunks for AOT-compiled
and JIT-compiled code to ensure we can always generate efficient jumps
and calls to thunks on architectures such as ARM and PowerPC, whose
relative jumps and calls have limited ranges.
Now that the AOT-compiled code image is position-independent, there is
no further need for this distinction. In fact, it was harmful,
because we were still using runtime-generated thunks when we should
have been using the ones in the code image. This resulted in
EXC_BAD_ACCESS errors on non-jailbroken iOS devices.
This avoids the requirement of putting the code image in a
section/segment which is both writable and executable, which is good
for security and avoids trouble with systems like iOS which disallow
such things.
The implementation relies on relative addressing such that the offset
of the desired address is fixed as a compile-time constant relative to
the start of the memory area of interest (e.g. the code image, heap
image, or thunk table). At runtime, the base pointer to the memory
area is retrieved from the thread structure and added to the offset to
compute the final address. Using the thread pointer allows us to
generate read-only, position-independent code while avoiding the use
of IP-relative addressing, which is not available on all
architectures.
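A conceptual sketch of that addressing scheme, assuming illustrative field
names rather than the VM's actual thread layout: the emitted code contains
no absolute addresses, only constant offsets added to a base pointer
fetched from the thread structure at runtime.

    #include <cstdint>

    struct ThreadBases {
      uint8_t* codeImageStart;  // set when the code image is mapped at startup
      uint8_t* heapImageStart;
      void** thunkTable;
    };

    inline void* resolveCodeAddress(ThreadBases* t, uintptr_t offset) {
      return t->codeImageStart + offset;  // offset is a compile-time constant
    }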
This fixes a number of bugs concerning cross-architecture bootimage
builds involving different endiannesses. There will be more work to do
before it works.
This monster commit is the first step towards supporting
cross-architecture bootimage builds. The challenge is to build a heap
and code image for the target platform where the word size and
endianness may differ from those of the build architecture. That means
the memory layout of objects may differ due to alignment and size
differences, so we can't just copy objects into the heap image
unchanged; we must copy field by field, resizing values, reversing
endianness and shifting offsets as necessary.
This commit also removes POD (plain old data) type support from the
type generator because it added a lot of complication and little
value.
We must throw an AbstractMethodError when such a call is executed (not
when the call is compiled), so we compile this case as a call to a
thunk which throws such an error.
sun.font.FontManager.initIDs is a native method defined in
libfontmanager.so, yet there seems to be no mechanism in OpenJDK's
class library to actually load that library, so we lazily load it
before trying to resolve the method.
As described in commit 36aa0d6, apps such as jython which generate
bytecode dynamically can produce patterns of bytecode which the VM's
compiler did not handle properly. However, that commit
introduced a regression and had to be partially reverted.
It turns out the real problem was the call to Compiler::restoreState
which we made before checking whether we were actually ready to
compile the exception handler (we delay compiling an exception handler
until and unless the try/catch block it serves has been compiled so we
can calculate the stack maps properly). That confused the compiler in
rare cases, so we now only call restoreState once we're actually ready
to compile the handler.
My last commit introduced a regression in JIT compilation of
subroutines. This reverts the specific change which caused the
regression. Further work will be needed to address the case which
that change was intended to fix (namely, exception handlers which
apply to multiple try/catch blocks).
Bytecode generated by compilers other than javac or ecj (such as
jython's dynamically generated classes) can contain unreachable code
and exception handlers which apply to more than one try/catch scope.
Previously, the VM's JIT compiler did not handle either of these cases
well, hence this commit.
Also, assume that any class with an ancestor that has a static
initializer needs initialization even if it doesn't have one itself,
per the Java Language Spec.
OpenJDK's sun.reflect.MethodAccessorGenerator can generate
invokevirtual calls to private methods (which we normally consider
non-virtual); we must compile them as non-virtual calls since they
aren't in the vtable.
It turns out commit 31eb047 was too aggressive and led to incorrect
calculation of line numbers for machine addresses, as well as
potentially incorrect exception handler scope calculation. This fixes
the regression.
I recently encountered a Batik JAR with a method containing a
redundant goto which confused the JIT compiler because it was referred
to in the exception handler and line number tables despite being
unreachable. I don't know how such code was generated, but this
commit ensures the compiler can handle it.
We can't blindly try to release the monitors for all synchronized methods
when unwinding the stack since we may not have finished acquiring the
most recent one when the exception was thrown.
Big applications can exceed the 16MB limit we previously used.
Increasing this above 30MB (if/when desired) will require changes to
the ARM and PowerPC JIT code to work around immediate branch encoding
limits on those platforms.
Unlike the interpreter, the JIT compiler tries to resolve all the
symbols referenced by a method when compiling that method. However,
this can backfire if a symbol cannot be resolved: we end up throwing
an e.g. NoClassDefFoundError for code which may never be executed.
This is particularly troublesome for code which supports multiple
APIs, choosing one at runtime.
The solution is to defer to stub code for symbols which can't be
resolved at JIT compile time. Such a stub will try again at runtime
to resolve the needed symbol and throw an appropriate error if it
still can't be found.
Due to encoding limitations, the immediate operand of conditional
branches can be no more than 32KB forward or backward. Since the
JIT-compiled form of some methods can be larger than 32KB, and we also
do conditional jumps to code outside the current method in some cases,
we must work around this limitation.
The strategy of this commit is to provide inline, intermediate jump
tables where necessary. A given conditional branch whose target is
too far for a direct jump will instead point to an unconditional
branch in the nearest jump table which points to the actual target.
Unconditional immediate branches are also limited on PowerPC, but this
limit is 32MB, which is not an impediment in practice. If it does
become a problem, we'll need to encode such branches using multiple
instructions.
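A conceptual sketch of the decision, using the ranges quoted above (the
names are illustrative, not the assembler's actual interface):

    #include <cstdlib>

    const int MaxConditionalOffset = 32 * 1024;  // +/-32KB conditional range

    bool needsJumpTableEntry(int branchAddress, int targetAddress) {
      return std::abs(targetAddress - branchAddress) > MaxConditionalOffset;
    }

When this returns true, the conditional branch targets an unconditional
branch in the nearest jump table, and that branch, with its +/-32MB range,
reaches the real destination.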
On PowerPC and ARM, we can't rely on the return address having already
been saved on the stack on entry to a thunk, so we must look for it in
the link register instead.
My previous attempt at this was incomplete; it did not address
Java->native->Java->native call sequences, nor did it address
continuations. This commit takes care of both.
We can't rely on the C++ compiler to save the return address in a
known location on entry to each function we might call from Java
(although GCC 4.5 seems to do so consistently, which is why I hadn't
realized the unwinding code was relying on that assumption), so we
must store it explicitly in MyThread::ip in each thunk. For PowerPC
and x86, we continue saving it on the stack as always, since the
calling convention guarantees its location relative to the stack
pointer.
The stack mapping code was broken for cases of stack slots being
reused to hold primitives or addresses within subroutines after
previously being used to hold object references. We now bitwise "and"
the stack map upon return from the subroutine with the map as it
existed prior to calling the subroutine, which has the effect of
clearing map locations previously marked as GC roots where
appropriate.
It is dangerous to initiate a GC from a thunk like divideLong (which
was possible when allocating a new ArithmeticException to signal
divide-by-zero) since we don't currently generate a GC root frame map
for the return address of the thunk call. Instead, we use the backup
heap area if there is room, or else throw a pre-allocated exception.
This fixes the tails=true build (at least for x86_64) and eliminates
the need for a frame table in the tails=false build. In the
tails=true build, we still need a frame table on x86(_64) to help
determine whether we've caught a thread executing code to do a tail
call or pop arguments off the stack. However, I've not yet written
the code to actually use this table, and it is only needed to handle
asynchronous unwinds via Thread.getStackTrace.
Previously, we unwound the stack by following the chain of frame
pointers for normal returns, stack trace creation, and exception
unwinding. On x86, this required reserving EBP/RBP for frame pointer
duties, making it unavailable for general computation and requiring
that it be explicitly saved and restored on entry and exit,
respectively.
On PowerPC, we use an ABI that makes the stack pointer double as a
frame pointer, so it doesn't cost us anything. We've been using the
same convention on ARM, but it doesn't match the native calling
convention, which makes it unusable when we want to call native code
from Java and pass arguments on the stack.
So far, the ARM calling convention mismatch hasn't been an issue
because we've never passed more arguments from Java to native code
than would fit in registers. However, we must now pass an extra
argument (the thread pointer) to e.g. divideLong so it can throw an
exception on divide by zero, which means the last argument must be
passed on the stack. This will clobber the linkage area we've been
using to hold the frame pointer, so we need to stop using it.
One solution would be to use the same convention on ARM as we do on
x86, but this would introduce the same overhead of making a register
unavailable for general use and extra code at method entry and exit.
Instead, this commit removes the need for a frame pointer. Unwinding
involves consulting a map of instruction offsets to frame sizes which
is generated at compile time. This is necessary because stack trace
creation can happen at any time due to Thread.getStackTrace being
called by another thread, and the frame size varies during the
execution of a method.
So far, only x86(_64) is working, and continuations and tail call
optimization are probably broken. More to come.
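A minimal sketch of the offset-to-frame-size lookup, assuming an
illustrative structure rather than the compiler's actual representation:
the unwinder finds the entry at or before the instruction offset it caught
the thread at and uses that frame size to step to the caller's frame.

    #include <cstdint>
    #include <map>

    struct FrameSizeMap {
      std::map<uint32_t, uint32_t> sizes;  // instruction offset -> frame size in words

      uint32_t frameSizeAt(uint32_t instructionOffset) const {
        std::map<uint32_t, uint32_t>::const_iterator it
            = sizes.upper_bound(instructionOffset);
        if (it == sizes.begin()) return 0;  // before the first recorded point
        --it;                               // entry at or before the offset governs
        return it->second;
      }
    };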
This rather large commit modifies the VM to use non-local returns to
throw exceptions instead of simply setting Thread::exception and
returning frame-by-frame as it used to. This has several benefits:
* Functions no longer need to check Thread::exception after each call
which might throw an exception (which would be especially tedious
and error-prone now that any function which allocates objects
directly or indirectly might throw an OutOfMemoryError)
* There's no need to audit the code for calls to functions which
previously did not throw exceptions but later do
* Performance should be improved slightly due to both the reduced
need for conditionals and because unwinding now occurs in a single
jump instead of a series of returns
The main disadvantages are:
* Slightly higher overhead for entering and leaving the VM via the
JNI and JDK methods
* Non-local returns can make the code harder to read
* We must be careful to register destructors for stack-allocated
resources with the Thread so they can be called prior to a
non-local return
The non-local return implementation is similar to setjmp/longjmp,
except it uses continuation-passing style to avoid the need for
cooperation from the C/C++ compiler. Native C++ exceptions would have
also been an option, but that would introduce a dependence on
libstdc++, which we're trying to avoid for portability reasons.
Finally, this commit ensures that the VM throws an OutOfMemoryError
instead of aborting when it reaches its memory ceiling. Currently, we
treat the ceiling as a soft limit and temporarily exceed it as
necessary to allow garbage collection and certain internal allocations
to succeed, but refuse to allocate any Java objects until the heap
size drops back below the ceiling.
We now check for stack overflow in the JIT build as well as the
interpreted build, throwing a StackOverflowError if the limit
(currently hard-coded to 64KB, but should be easy to make
configurable) is exceeded.
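A conceptual version of the emitted check, assuming illustrative names
(the 64KB figure is the hard-coded limit mentioned above): the stack grows
downward, so overflow means the stack pointer has dropped below the base
minus the limit.

    #include <cstdint>

    const uintptr_t StackLimitInBytes = 64 * 1024;

    bool stackWouldOverflow(uintptr_t stackPointer, uintptr_t stackBase) {
      return stackPointer < stackBase - StackLimitInBytes;
    }

    // When this is true, the generated code jumps to a thunk which throws
    // a StackOverflowError.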
When trying to create an array class, we try to resolve
java.lang.Object so we can use its vtable in the array class.
However, if Object is missing, we'll try to create and throw a
ClassNotFoundException, which requires creating an array to store the
stack trace, which requires creating an array class, which requires
resolving Object, etc. This commit short-circuits this process by
telling resolveClass not to create and throw an exception if it can't
find Object.
While doing the above work, I noticed that the implementations of
Classpath::makeThrowable in classpath-avian.cpp and
classpath-openjdk.cpp were identical, so I made makeThrowable a
top-level function.
Finally, I discovered that Thread.setDaemon can only be called before
the target thread has been started, which allowed me to simplify the
code to track daemon threads in the VM.
* add libnet.so and libnio.so to built-in libraries for openjdk-src build
* implement sun.misc.Unsafe.park/unpark
* implement JVM_SetClassSigners/JVM_GetClassSigners
* etc.
The main change here is to use a lazily-populated vector to associate
runtime data with classes instead of referencing it directly from each
class, which would require updating immutable references in the heap
image. The other changes employ other strategies to avoid trying to
update immutable references.
Previously, loading an arbitrary 32-bit constant required up to four
instructions (16 bytes), since we did so one byte at a time via
immediate-mode operations.
The preferred way to load constants on ARM is via PC-relative
addressing, but this is challenging because immediate memory offsets
are limited to 4096 bytes in either direction. We frequently need to
compile methods which are larger than 4096, or even 8192, bytes, so we
must intersperse code and data if we want to use PC-relative loads
everywhere.
This commit enables pervasive PC-relative loads by handling the
following cases:
1. Method is shorter than 4096 bytes: append data table to end
2. Method is longer than 4096 bytes, but no basic block is longer
than 4096 bytes: insert data tables as necessary after blocks, taking
care to minimize the total number of tables
3. Method is longer than 4096 bytes, and some blocks are longer than
4096 bytes: split large basic blocks and insert data tables as above
This allows OpenJDK to access time zone data which is normally found
under java.home, but which we must embed in the executable itself to
create a self-contained build. The VM intercepts various file
operations, looking for paths which start with a prefix specified by
the avian.embed.prefix property and redirecting those operations to an
embedded JAR.
For example, if avian.embed.prefix is "/avian-embedded", and code
calls File.exists() with a path of
"/avian-embedded/javahomeJar/foo.txt", the VM looks for a function
named javahomeJar via dlsym, calls the function to find the memory
region containing the embedded JAR, and finally consults the JAR to see
if the file "foo.txt" exists.
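A hedged sketch of that lookup; the symbol's signature here is an
assumption for illustration, not a documented interface.

    #include <dlfcn.h>
    #include <cstddef>
    #include <cstdint>

    typedef const uint8_t* (*EmbeddedJarFunction)(size_t* sizeOut);

    const uint8_t* findEmbeddedJar(const char* symbolName, size_t* sizeOut) {
      void* self = dlopen(0, RTLD_LAZY);        // search the executable itself
      if (self == 0) return 0;
      void* sym = dlsym(self, symbolName);      // e.g. "javahomeJar"
      if (sym == 0) return 0;
      return reinterpret_cast<EmbeddedJarFunction>(sym)(sizeOut);
    }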
As described in readme.txt, a standalone OpenJDK build embeds all
libraries, classes, and other files needed at runtime in the resulting
binary, eliminating dependencies on external resources.
The main changes in this commit ensure that we don't hold the global
class lock when doing class resolution using application-defined
classloaders. Such classloaders may do their own locking (in fact,
it's almost certain), making deadlock likely when mixed with VM-level
locking in various orders.
Other changes include a fix to avoid overflow when waiting for
extremely long intervals and a GC root stack mapping bug.
The biggest change in this commit is to split the system classloader
into two: one for boot classes (e.g. java.lang.*) and another for
application classes. This is necessary to make OpenJDK's security
checks happy.
The rest of the changes include bugfixes and additional JVM method
implementations in classpath-openjdk.cpp.
Whereas the GNU Classpath port used the strategy of patching Classpath
with core classes from Avian so as to minimize changes to the VM, this
port uses the opposite strategy: abstract and isolate
classpath-specific features in the VM similar to how we abstract away
platform-specific features in system.h. This allows us to use an
unmodified copy of OpenJDK's class library, including its core classes
and augmented by a few VM-specific classes in the "avian" package.
In order to facilitate making the VM compatible with multiple class
libraries, it's useful to separate the VM-specific representation of
these classes from the library implementations. This commit
introduces VMClass, VMField, and VMMethod for that purpose.
A long time ago, I refactored the class initialization code in the VM,
but did not notice until today that it had caused the
process=interpret build to break on certain recursive initializations.
In particular, we were not always detecting when a thread recursively
tried to initialize a class it was already in the process of
initializing, leading to the mistaken assumption that another thread
was initializing it and that we should wait until it was done, in
which case we would wait forever.
This commit ensures that we always detect recursive initialization and
short-circuit it.
If we catch the target thread in a virtual thunk when getting its
stack trace, we must assume its Thread::stack field is garbage and use
the register values instead. Previously, we treated these thunks as
any other native code, leading to crashes when we tried to use the
garbage pointer.
compileDirectInvoke does some magic to optimize tail calls to native
methods which involves storing the return address (which we'll never
actually return to, since it's a tail call) in a thread-local field so
the thunk function can figure out which native method to look up at
runtime. Since this address will change when the boot image is
loaded, the boot image creation code needs to know about it.
callContinuation failed to call the correct continuation when feeding
it an exception due to a regression introduced with the
Thread.getStackTrace changes.
It's not safe to use malloc from a signal handler, so we can't
allocate new memory when handling segfaults or Thread.getStackTrace
signals. Instead, we allocate a fixed-size backup heap for each
thread ahead of time and use it if there's no space left in the normal
heap pool. In the rare case that the backup heap isn't large enough,
we fall back to using a preallocated exception without a stack trace
as a last resort.
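A minimal sketch of the backup-heap path, assuming illustrative sizes and
field names: the buffer is reserved ahead of time, so allocating from it
inside a signal handler involves no calls to malloc.

    #include <cstddef>

    const size_t BackupHeapSize = 1024;

    struct ThreadBackupHeap {
      char backupHeap[BackupHeapSize];
      size_t backupHeapIndex;
    };

    void* allocateFromBackup(ThreadBackupHeap* t, size_t size) {
      if (t->backupHeapIndex + size <= BackupHeapSize) {
        void* p = t->backupHeap + t->backupHeapIndex;  // bump-pointer allocation
        t->backupHeapIndex += size;
        return p;
      }
      return 0;  // caller falls back to the preallocated, trace-less exception
    }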
This function was broken in two different ways:
1. It only checked MyProcessor::thunks, not MyProcessor::bootThunks.
It needs to check both.
2. When checking MyProcessor::thunks, it used fields from
MyProcessor::bootThunks instead of from the same thunk collection.
This fixes both problems.
Implementing Thread.getStackTrace is tricky. A thread may interrupt
another thread at any time to grab a stack trace, including while the
latter is executing Java code, JNI code, helper thunks, VM code, or
while transitioning between any of these.
To create a stack trace we use several context fields associated with
the target thread, including snapshots of the instruction pointer,
stack pointer, and frame pointer. These fields must be current,
accurate, and consistent with each other in order to get a reliable
trace. Otherwise, we risk crashing the VM by trying to walk garbage
stack frames or by misinterpreting the size and/or content of
legitimate frames.
This commit addresses sensitive transition points such as entering the
helper thunks which bridge the transitions from Java to native code
(where we must save the stack and frame registers for use from native
code) and stack unwinding (where we must atomically update the thread
context fields to indicate which frame we are unwinding to). When
grabbing a trace for another thread, we determine what kind of code we
caught the thread executing in and use that information to choose the
thread context values with which to begin the trace. See
MyProcessor::getStackTrace::Visitor::visit for details.
In order to atomically update the thread context fields, we do the
following:
1. Create a temporary "transition" object to serve as a staging area
and populate it with the new field values.
2. Update a transition pointer in the thread object to point to the
object created above. As long as this pointer is non-null,
interrupting threads will use the context values in the staging
object instead of those in the thread object.
3. Update the fields in the thread object.
4. Clear the transition pointer in the thread object.
We use a memory barrier between each of these steps to ensure they are
made visible to other threads in program order. See
MyThread::doTransition for details.
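A simplified sketch of that protocol with illustrative type and field
names (the real code is MyThread::doTransition): an interrupting thread
reads the transition pointer first and, if it is non-null, takes its
values, so it never observes a half-updated context.

    #include <atomic>

    struct Context { void* ip; void* stack; void* frame; };

    struct ThreadRecord {
      Context context;
      std::atomic<Context*> transition;
    };

    void doTransition(ThreadRecord* t, Context newContext) {
      static thread_local Context staging;                // 1. staging object
      staging = newContext;
      std::atomic_thread_fence(std::memory_order_seq_cst);

      t->transition.store(&staging);                      // 2. publish it
      std::atomic_thread_fence(std::memory_order_seq_cst);

      t->context = newContext;                            // 3. update the real fields
      std::atomic_thread_fence(std::memory_order_seq_cst);

      t->transition.store(nullptr);                       // 4. clear the pointer
    }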
We were generating code which clobbered the data we were putting into
64-bit volatile fields (and potentially also clobbering the target or
source object in the case of non-static fields) due to misplaced
synchronization code. Reordering this code ensures that both the data
and the target or source survive across calls to synchronization
helper functions.
Previously, the stack frame mapping code (responsible for statically
calculating the map of GC roots for a method's stack frame during JIT
compilation) would assume that the map of GC roots on entry to an
exception handler is the same as on entry to the "try" block which the
handler is attached to. Technically, this is true, but the algorithm
we use does not consider whether a local variable is still "live"
(i.e. will be read later) when calculating the map - only whether we
can expect to find a reference there via normal (non-exceptional)
control flow. This can backfire if, within a "try" block, the stack
location which held an object reference on entry to the block gets
overwritten with a non-reference (i.e. a primitive). If an exception
is later thrown from such a block, we might end up trying to treat
that non-reference as a reference during GC, which will crash the VM.
The ideal way to fix this is to calculate the true interval for which
each value is live and use that to produce the stack frame maps. This
would provide the added benefit of ensuring that the garbage collector
does not visit references which, although still present on the stack,
will not be used again.
However, this commit uses the less invasive strategy of ANDing
together the root maps at each GC point within a "try" block and using
the result as the map on entry to the corresponding exception
handler(s). This should give us safe, if not optimal, results. Later
on, we can refine it as described above.
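A minimal sketch of that strategy (not the compiler's actual data
structures): a slot counts as a GC root on entry to the handler only if it
holds a reference at every GC point inside the "try" block.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    std::vector<uint32_t> handlerEntryMap(
        const std::vector<std::vector<uint32_t> >& mapsInTryBlock,
        size_t wordsPerMap) {
      // Start from all ones, the identity for AND (this also matches the
      // treatment of uninitialized maps described earlier).
      std::vector<uint32_t> result(wordsPerMap, ~static_cast<uint32_t>(0));
      for (size_t m = 0; m < mapsInTryBlock.size(); ++m) {
        for (size_t i = 0; i < wordsPerMap; ++i) {
          result[i] &= mapsInTryBlock[m][i];
        }
      }
      return result;
    }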
We were miscompiling methods which contained getfield, getstatic,
putfield, or putstatic instructions for volatile 64-bit primitives on
32-bit PowerPC due to not noticing that values in registers are clobbered
across function calls.
The solution is to create a separate Compiler::Operand instance for each
object monitor reference before and after the function call to avoid
confusing the compiler. To avoid duplicate entries in the constant pool,
we add code to look for and, if found, reuse any existing entry for the same
constant.
Before allocating a new reference in NewGlobalReference or when
creating a local reference, we look for a previously-allocated
reference pointing to the same object. This is a linear search, but
usually the number of elements in the reference list is small, whereas
the memory, locking, and allocation overhead of creating duplicate
references can be large.
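A simple sketch of the reuse check with illustrative names: scan the
existing reference list before allocating a new entry; the list is usually
short, so the linear search is cheap compared with allocation and locking.

    struct Reference {
      void* target;
      Reference* next;
      unsigned count;
    };

    Reference* acquireExisting(Reference* list, void* target) {
      for (Reference* r = list; r; r = r->next) {
        if (r->target == target) {  // reuse the existing reference
          ++r->count;
          return r;
        }
      }
      return 0;  // not found: the caller allocates and links a new Reference
    }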
We need to check to see if we caught the thread somewhere in the thunk
code (i.e. about to call a helper function), in which case the stack
and base pointers are valid and may be used to create an accurate
trace.