It seems that GCC 4.6.1 gets confused at LTO time when we take the
address of inline functions, so I'm switching them to non-inline
linkage to make it happy.
It seems that GCC 4.6.1 gets confused at LTO time when we take the
address of inline functions, so I'm switching them to non-inline
linkage to make it happy.
The JRE lib dir for OpenJDK 7 on OS X seems to be just "lib", not
e.g. "lib/amd64" by default, so we use that now. Also, the default
library compatibility version for libjvm.dylib is 0.0.0, but OpenJDK
wants 1.0.0, so we set it explicitly.
If we clear Thread::flags before releasing the thread mutex and
re-acquiring the monitor mutex, it's possible that we will be notified
between the release and re-acquire, which will confuse us later if we
try to wait on the same monitor again such that we well not remove
ourselves from the wait list because we think we've been removed by
the notifier.
The solution is to wait until we've acquired both mutexes before we
clear Thread::flags.
We've already been handling this case in arm.cpp and powerpc.cpp, but
apparently we've never hit this code path in x86.cpp before. Indeed,
I've been unable to come up with a Java source code test that hits it;
it's only come up in Scala-generated bytecode.
Scala occasionally generates exception handler tables with interval
bounds which fall outside the range of valid bytecode indexes, so we
must clamp them or risk out-of-bounds array accesses.
Since we use Thread::code to store a reference to either the method to
be invoked or the current bytecode being executed depending on the
context, we must be careful to switch it back to the bytecode of the
exception handler if an exception is thrown while invoking a method
(e.g. an UnsatisfiedLinkError).
There was a subtle bug in that we were not considering alignment
padding for fields defined in superclasses when calculating field
offsets for a derived class when the superclass(es) were visited by
the bootimage generator before the derived class.
Floats are implicitly promoted to doubles when passed as part of a
variable-length argument list, so we can't treat them the same way as
32-bit integers.
Apple's linker tends to remove functions which are never called, which
is not what we want for e.g. vmPrintTrace, since that function is only
intended to be called interactively from within GDB.
My previous attempt wasn't quite sufficient, since it was too late to
call join on a thread which had already exited given the code was
written to aggressively dispose of system handles as soon as the
thread exited. The solution is to delay disposing these handles until
after we're able to join the thread.
The bug here is that when a thread exits and becomes a "zombie", the
OS resources associated with it are not necessarily released until we
actually join and dispose of that thread. Since that only happens
during garbage collection, and collection normally only happens in
response to heap memory pressure, there's no guarantee that we'll GC
frequently enough to clean up zombies promptly and avoid running out
of resources.
The solution is to force a GC whenever we start a new thread and there
are at least N zombies waiting to be disposed, where N=16 for now.
We never define atomicCompareAndSwap64 for ARM or PowerPC, and
apparently only very recent ARM chips support it, so we must fall back
to synchronization-based emulation.
There were a couple of problems with the Avian_sun_misc_Unsafe_park
implementation in classpath-openjdk.cpp. First, the wait time should
be interpreted as milliseconds if absolute, but as nanoseconds
otherwise, whereas we were treating it as milliseconds in both cases.
Second, there was no mechanism to exit the while loop after the
specified time; the only way we could exit was via an unpark or
interrupt.
There was a subtle race condition in the VM shutdown process such that
a System::Thread would be disposed after the System instance it was
created under has been disposed, in which case doing a virtual call to
System::free with that instance would potentially cause a crash. The
solution is to just call the C library version of free directly, since
that's all System::free does.
On Ubuntu 11.10, the optimized build was breaking, apparently because
it was eliminating most of the symbols defined in assembly code
(e.g. vmJump) as unreachable when linking libjvm.so, which left
avian-dynamic unlinkable due to an unresolved symbol.
The solution in this commit is to export makeSystem and makeFinder
from libjvm.so rather than build redundant versions of finder.cpp and
posix.cpp/windows.cpp into avian-dynamic like we've been doing. This
avoids the whole problem of vmJump reachability and reduces the size
of avian-dynamic at the same time.
This commit also turns off LTO for the avian-dynamic link since we get
odd undefined symbol errors about libc-defined symbols otherwise.
This may merit future investigation, but avian-dynamic is so small and
simple that there's no need to optimize it anyway.
Until now, the bootimage build hasn't supported using the Java
invocation API to create a VM, destroy it, and create another in the
same process. Ideally, we would be able to create multiple VMs
simultaneously without any interference between them. In fact, Avian
is designed to support this for the most part, but there are a few
places we use global, mutable state which prevent this from working.
Most notably, the bootimage is modified in-place at runtime, so the
best we can do without extensive changes is to clean up the bootimage
when the VM is destroyed so it's ready for later instances. Hence
this commit.
Ultimately, we can move towards a fully reentrant VM by making the
bootimage immutable, but this will require some care to avoid
performance regressions. Another challenge is our Posix signal
handlers, which currently rely on a global handle to the VM, since you
can't, to my knowledge, pass a context pointer when registering a
signal handler. Thread local variables won't necessarily help, since
a thread might attatch to more than one VM at a time.
When the fourth argument is a 64-bit value on the Apple ARM ABI, it is
passed half by register and half on the stack, unlike on Linux where
it is passed entirely on the stack. The logic to handle this in arm.h
was flawed, and this commit fixes it.
This reverts commit 88d614eb25.
It turns out we still need separate sets of thunks for AOT-compiled
and JIT-compiled code to ensure we can always generate efficient jumps
and calls to thunks on architectures such as ARM and PowerPC, whose
relative jumps and calls have limited ranges.
Now that the AOT-compiled code image is position-independent, there is
no further need for this distinction. In fact, it was harmful,
because we were still using runtime-generated thunks when we should
have been using the ones in the code image. This resulted in
EXC_BAD_ACCESS errors on non-jailbroken iOS devices.
It seems that the Apple iOS Simulator's stat implementation writes
beyond the end of the struct stat we pass it, which can clobber
unrelated parts of the stack. Perhaps this is due to some kind of
header/library mismatch, but I've been unable to track it down so far.
The workaround is to give it 8 words more than it should need, where 8
is a number I just made up and seems to work.
This avoids the requirement of putting the code image in a
section/segment which is both writable and executable, which is good
for security and avoids trouble with systems like iOS which disallow
such things.
The implementation relies on relative addressing such that the offset
of the desired address is fixed as a compile-time constant relative to
the start of the memory area of interest (e.g. the code image, heap
image, or thunk table). At runtime, the base pointer to the memory
area is retrieved from the thread structure and added to the offset to
compute the final address. Using the thread pointer allows us to
generate read-only, position-independent code while avoiding the use
of IP-relative addressing, which is not available on all
architectures.
This fixes a number of bugs concerning cross-architecture bootimage
builds involving diffent endianesses. There will be more work to do
before it works.
Some apps and libraries may generate recoverable SEH exceptions on
Windows, in which cases we don't want to waste time and disk space
generating memory dumps.
This monster commit is the first step towards supporting
cross-architecture bootimage builds. The challenge is to build a heap
and code image for the target platform where the word size and
endianess may differ from those of the build architecture. That means
the memory layout of objects may differ due to alignment and size
differences, so we can't just copy objects into the heap image
unchanged; we must copy field by field, resizing values, reversing
endianess and shifting offsets as necessary.
This commit also removes POD (plain old data) type support from the
type generator because it added a lot of complication and little
value.
On Windows, some versions of Git automatically translate Unix line
endings to Windows line endings, which can break the build unless we
treat carriage returns as whitespace when parsing.
Previously, we returned immediately from Monitor.wait if we found we
had been interrupted, but this caused deadlock when waiting to enter
the exclusive state, since we never released Machine::stateLock to
allow active threads to transition into the idle state. This commit
ensures that we at least briefly release the lock without actually
waiting in that case.
We must throw an AbstractMethodError when such a call is executed (not
when the call is compiled), so we compile this case as a call to a
thunk which throws such an error.
Singletons may have embedded object references, and if they are
allocated at fixed memory locations (e.g. if they are larger than
64KB), they must have object masks so the garbage collector knows were
to find said references.
We can't reduce a conditional branch to an unconditional jump unless
both arguments to the comparison are constants *and* those constants
have been resolved. The latter may not be true in the case of a
bootimage build.
We had be using System::Monitor::wait to block threads internally in
the VM as well as to implement Object.wait. However, the interrupted
flag should only be cleared in the latter case, not the former. This
commit adds a new method and changes the semantics of the old one in
order to acheive the desired behavior.
This was causing 8-byte SSE-to-SSE moves involving registers
xmm8-xmm15 to be misencoded on x86_64, leading to incorrect code
generation in methods with lots of local variables of type double.
We can only omit the jump past a constant pool if it's placed at the
end of a method, which is only true if the pool belongs to the last
block of that method and that block is not so large that the pool must
be placed inside the block instead of after it.
The previous code did not take into account any padding embedded in a
basic block due to inline jump tables, which led to invalid code
generation in large methods
If we fail to resolve a given class (e.g. due to ProGuard obfuscating
or eliminating it), just move on to the next one rather than return
immediately. Otherwise, we may miss intercepting methods of classes
we can resolve.
sun.font.FontManager.initIDs is a native method defined in
libfontmanager.so, yet there seems to be no mechanism in OpenJDK's
class library to actually load that library, so we lazily load it
before trying to resolve the method.
Internally, the VM augments the method tables for abstract classes
with any inherited abstract methods to make code simpler elsewhere,
but that means we can't use that table to construct the result of
Class.getDeclaredMethods since it would include methods not actually
declared in the class. This commit ensures that we preserve and use
the original, un-augmented table for that purpose.
Under certain circumstances, the implementations of these functions
may throw errors, so we need to wrap them using vm::run so we don't
try to unwind past the JNI boundary.
As described in commit 36aa0d6, apps such as jython which generate
bytecode dynamically can produce patterns of bytecode for which the
VM's compiler could not handle properly. However, that commit
introduced a regression and had to be partially reverted.
It turns out the real problem was the call to Compiler::restoreState
which we made before checking whether we were actually ready to
compile the exception handler (we delay compiling an exception handler
until and unless the try/catch block it serves has been compiled so we
can calculate the stack maps properly). That confused the compiler in
rare cases, so we now only call restoreState once we're actually ready
to compile the handler.
My last commit introduced a regression in JIT compilation of
subroutines. This reverts the specific change which caused the
regression. Further work will be needed to address the case which
that change was intended to fix (namely, exception handlers which
apply to multiple try/catch blocks).
Bytecode generated by compilers other than javac or ecj (such as
jython's dynamically generated classes) can contain unreachable code
and exception handlers which apply to more than one try/catch scope.
Previously, the VM's JIT compiler did not handle either of these cases
well, hence this commit.
Previously, we would abort the process if we encountered a truncated
multibyte character in parseUtf8NonAscii (called by the JNI method
NewStringUTF). Now we simply terminate the string at that point.
Previously, we would abort the process if we encountered a truncated
multibyte character in parseUtf8NonAscii (called by the JNI method
NewStringUTF). Now we simply terminate the string at that point.
Also, assume any class which has an ancestor class which has a static
initializer needs initialization even if it doesn't have one itself,
per the Java Language Spec.
The result of Class.getInterfaces should not include interfaces
declared to be implemented/extended by superclasses/superinterfaces,
only those declared by the class itself. This is important because it
influences how java.io.ObjectStreamClass calculates serial version
IDs.
Some broken code implicitly relies on System.identityHashCode always
returning a non-negative number (e.g. old versions of
com/sun/xml/bind/v2/util/CollisionCheckStack.hash).
Code including subroutines and conditionals can result in frame and
register resources being held by values which aren't in scope when
resetFrame is called, so we need to clean them up after cleaning the
in-scope values.
OpenJDK's sun.reflect.MethodAccessorGenerator can generate
invokevirtual calls to private methods (which we normally consider
non-virtual); we must compile them as non-virtual calls since they
aren't in the vtable.
It turns out commit 31eb047 was too aggressive and led to incorrect
calculation of line numbers for machine addresses, as well as
potentially incorrect exception handler scope calculation. This fixes
the regression.
This includes a proper implementation of JVM_ActiveProcessorCount, as
well as JVM_SetLength and JVM_NewMultiArray. Also, we now accept up
to JNI_VERSION_1_6 in JVM_IsSupportedJNIVersion.
I recently encountered a Batik JAR with a method containing a
redundant goto which confused the JIT compiler because it was refered
to in the exception handler and line number tables despite being
unreachable. I don't know how such code was generated, but this
commit ensures the compiler can handle it.
We must not allocate heap objects from doCollect, since it might
trigger a GC while one is already in progress, which can cause trouble
when we're still queuing up objects to finalize, among other things.
To avoid this, I've added extra fields to the finalizer and cleaner
types which we can use to link instances up during GC without
allocating new memory.
We can't blindly try release the monitors for all synchronized methods
when unwinding the stack since we may not have finished acquiring the
most recent one when the exception was thrown.
If we don't preallocate the memory we need to reacquire the lock after
we finish waiting, we risk an OOME which may unwind the stack into
code which assumes we still have acquire the lock successfully.
Instead of giving up when the backing allocator's tryAllocate method
returns null, we switch to the allocate method to show we mean
business. This makes use of zones more robust under low memory
situations since it allows us to exceed the soft memory ceiling when
the only alternative is to abort.
OpenJDK uses an alternative to Object.finalize for resource cleanup in
the form of sun.misc.Cleaner. Normally, OpenJDK's
java.lang.ref.Reference.ReferenceHandler thread handles this, calling
Cleaner.clean on any instances it finds in its "pending" queue.
However, Avian handles reference queuing internally, so it never
actually adds anything to that queue, so the VM must call
Cleaner.clean itself.
The main changes here are:
* fixes for runtime annotation support
* proper support for runtime generic type introspection
* throw NoClassDefFoundErrors instead of ClassNotFoundExceptions
where appropriate
It isn't necessarily safe or desireable to call the previous handler
even if it's non-null, so we ignore it entirely except to reinstate it
when unregistering our own handler.
Big applications can exceed the 16MB limit we previously used.
Increasing this above 30MB (if/when desired) will require changes to
the ARM and PowerPC JIT code to work around immediate branch encoding
limits on those platforms,
This commit ensures that we use the proper memory barriers or locking
necessary to preserve volatile semantics for such fields when accessed
or updated via JNI.
Unlike the interpreter, the JIT compiler tries to resolve all the
symbols referenced by a method when compiling that method. However,
this can backfire if a symbol cannot be resolved: we end up throwing
an e.g. NoClassDefFoundError for code which may never be executed.
This is particularly troublesome for code which supports multiple
APIs, choosing one at runtime.
The solution is to defer to stub code for symbols which can't be
resolved at JIT compile time. Such a stub will try again at runtime
to resolve the needed symbol and throw an appropriate error if it
still can't be found.
We were not always placing parameters in the correct stack positions
in the PowerPC implementations of dynamicCall and vmNativeCall. In
particular, the first stack slot used to hold a parameter depends on
the sizes and types of the preceding parameters which are passed in
registers.
This primarily required additions to classpath-openjdk.cpp to
intercept ZipFile, ZipEntry, and JarFile native methods to consult
embedded encryption policy jars when required.
It is possible to create an Exception with no stack trace by
overriding Throwable.fillInStackTrace, so we can't assume any given
instance will have one.
There was a race between these two functions such that one thread A
would run dispose on thread B just before thread B finishes exit, with
the result that Thread::lock and/or Thread::systemThread would be
disposed twice, resulting in a crash.
Due to encoding limitations, the immediate operand of conditional
branches can be no more than 32KB forward or backward. Since the
JIT-compiled form of some methods can be larger than 32KB, and we also
do conditional jumps to code outside the current method in some cases,
we must work around this limitation.
The strategy of this commit is to provide inline, intermediate jump
tables where necessary. A given conditional branch whose target is
too far for a direct jump will instead point to an unconditional
branch in the nearest jump table which points to the actual target.
Unconditional immediate branches are also limited on PowerPC, but this
limit is 32MB, which is not an impediment in practice. If it does
become a problem, we'll need to encode such branches using multiple
instructions.
The VM uses Integer and Long instances internally to wrap the results
of dynamic method invocations, but Method.invoke should use the
correct, specific type for the primitive (e.g. Character for char).
On PowerPC and ARM, we can't rely on the return address having already
been saved on the stack on entry to a thunk, so we must look for it in
the link register instead.
My previous attempt at this was incomplete; it did not address
Java->native->Java->native call sequences, nor did it address
continuations. This commit takes care of both.
This requires reducing HeapCapacity and CodeCapacity back to 128MB and
30MB respectively. I had set them to larger values to test
non-ProGuard'ed OpenJDK bootimage builds, which naturally needed a lot
more space. However, such builds aren't really useful in the real
world, and the compiler currently can't handle jumps or calls spanning
more than the maximum size of an immediate branch offset on ARM or
PowerPC, so I'm lowering them back down to more realistic values.
We can't rely on the C++ compiler to save the return address in a
known location on entry to each function we might call from Java
(although GCC 4.5 seems to do so consistently, which is why I hadn't
realized the unwinding code was relying on that assumption), so we
must store it explicitly in MyThread::ip in each thunk. For PowerPC
and x86, we continue saving it on the stack as always, since the
calling convention guarantees its location relative to the stack
pointer.
On Mac OS, signals sent using pthread_kill are never delivered if the
target thread is blocked (e.g. acquiring a lock or waiting on a
condition), so we can't rely on it and must use the Mach-specific
thread execution API instead to implement Thread.getStackTrace.
For Thread.interrupt, we must not only use pthread_kill but also
pthread_cond_signal to ensure the thread is woken up.