The usage statement for the bootimage-generator now looks like this:
build/linux-x86_64-bootimage/bootimage-generator \
-cp <classpath> \
-bootimage <bootimage file> \
-codeimage <codeimage file> \
[-entry <class name>[.<method name>[<method spec>]]] \
[-bootimage-symbols <start symbol name>:<end symbol name>] \
[-codeimage-symbols <start symbol name>:<end symbol name>]
The first problem was that, on x86, we failed to properly keep track
of whether to expect the return address to be on the stack or not when
unwinding through a frame. We were relying on a "stackLimit" pointer
to tell us whether we were looking at the most recently-called frame
by comparing it with the stack pointer for that frame. That was
inaccurate in the case of a thread executing at the beginning of a
method before a new frame is allocated, in which case the most recent
two frames share a stack pointer, confusing the unwinder. The
solution involves keeping track of how many frames we've looked at
while walking the stack.
The other problem was that compareIpToMethodBounds assumed every
method was followed by at least one byte of padding before the next
method started. That assumption was usually valid because we were
storing the size following method code prior to the code itself.
However, the last method of an AOT-compiled code image is not followed
by any such method header and may instead be followed directly by
native code with no intervening padding. In that case, we risk
interpreting that native code as part of the preceding method, with
potentially bizarre results.
The reason for the compareIpToMethodBounds assumption was that methods
which throw exceptions as their last instruction generate a
non-returning call, which nonetheless push a return address on the
stack which points past the end of the method, and the unwinder needs
to know that return address belongs to that method. A better solution
is to add an extra trap instruction to the end of such methods, which
is what this patch does.
OpenJDK is huge, so building a bootimage out of the whole thing (as
opposed to an app shrunk using ProGuard) requires a lot of space.
Note that we still can't handle this on ARM or PowerPC due to a
limitation in the compiler, but we don't expect people to ship
binaries with the entire OpenJDK class library anyway, so it shouldn't
be a problem in practice.
If we don't initialize that at our first opportunity, it's possible
we'll run out of memory later and exit silently instead of printing
the error and returning a nonzero exit code.
There was a subtle bug in that we were not considering alignment
padding for fields defined in superclasses when calculating field
offsets for a derived class when the superclass(es) were visited by
the bootimage generator before the derived class.
This avoids the requirement of putting the code image in a
section/segment which is both writable and executable, which is good
for security and avoids trouble with systems like iOS which disallow
such things.
The implementation relies on relative addressing such that the offset
of the desired address is fixed as a compile-time constant relative to
the start of the memory area of interest (e.g. the code image, heap
image, or thunk table). At runtime, the base pointer to the memory
area is retrieved from the thread structure and added to the offset to
compute the final address. Using the thread pointer allows us to
generate read-only, position-independent code while avoiding the use
of IP-relative addressing, which is not available on all
architectures.
This fixes a number of bugs concerning cross-architecture bootimage
builds involving diffent endianesses. There will be more work to do
before it works.
This monster commit is the first step towards supporting
cross-architecture bootimage builds. The challenge is to build a heap
and code image for the target platform where the word size and
endianess may differ from those of the build architecture. That means
the memory layout of objects may differ due to alignment and size
differences, so we can't just copy objects into the heap image
unchanged; we must copy field by field, resizing values, reversing
endianess and shifting offsets as necessary.
This commit also removes POD (plain old data) type support from the
type generator because it added a lot of complication and little
value.
This requires reducing HeapCapacity and CodeCapacity back to 128MB and
30MB respectively. I had set them to larger values to test
non-ProGuard'ed OpenJDK bootimage builds, which naturally needed a lot
more space. However, such builds aren't really useful in the real
world, and the compiler currently can't handle jumps or calls spanning
more than the maximum size of an immediate branch offset on ARM or
PowerPC, so I'm lowering them back down to more realistic values.
This is necessary to accomodate classes loaded at runtime which refer
to primitive array types. Otherwise, they won't be included unless
classes in the bootimage refer to them.
This rather large commit modifies the VM to use non-local returns to
throw exceptions instead of simply setting Thread::exception and
returning frame-by-frame as it used to. This has several benefits:
* Functions no longer need to check Thread::exception after each call
which might throw an exception (which would be especially tedious
and error-prone now that any function which allocates objects
directly or indirectly might throw an OutOfMemoryError)
* There's no need to audit the code for calls to functions which
previously did not throw exceptions but later do
* Performance should be improved slightly due to both the reduced
need for conditionals and because undwinding now occurs in a single
jump instead of a series of returns
The main disadvantages are:
* Slightly higher overhead for entering and leaving the VM via the
JNI and JDK methods
* Non-local returns can make the code harder to read
* We must be careful to register destructors for stack-allocated
resources with the Thread so they can be called prior to a
non-local return
The non-local return implementation is similar to setjmp/longjmp,
except it uses continuation-passing style to avoid the need for
cooperation from the C/C++ compiler. Native C++ exceptions would have
also been an option, but that would introduce a dependence on
libstdc++, which we're trying to avoid for portability reasons.
Finally, this commit ensures that the VM throws an OutOfMemoryError
instead of aborting when it reaches its memory ceiling. Currently, we
treat the ceiling as a soft limit and temporarily exceed it as
necessary to allow garbage collection and certain internal allocations
to succeed, but refuse to allocate any Java objects until the heap
size drops back below the ceiling.
We now check for stack overflow in the JIT build as well as the
interpreted build, throwing a StackOverflowError if the limit
(currently hard-coded to 64KB, but should be easy to make
configurable) is exceeded.
In makeCodeImage, we were passing zero to Promise::Listener::resolve,
which would lead to an assertion error if the address of the code
image was further from the base of the address space (i.e. zero) than
could be spanned by a jump on the target architecture. Since, in this
context, we immediately overwrite the value stored, we may pass
whatever we want to this function (we're only calling it so we can
retrieve the location of the value in the image), and the code image
pointer is a better choice for the above reason.
The main change here is to use a lazily-populated vector to associate
runtime data with classes instead of referencing them directly from
the class which requires updating immutable references in the heap
image. The other changes employ other strategies to avoid trying to
update immutable references.
32MB was just slightly too large for PowerPC immediate call instructions
to span, and 16MB matches the JIT executable memory area we use in
compile.cpp.