Due to SWT's nasty habit of creating a new object monitor for every
task added to Display.asyncExec, we've found that, on Windows at
least, we tend to run out of OS handles due to the large number of
mutexes we create between garbage collections.
One way to address this might be to trigger a GC when either the
number of monitors created since the last GC exceeds a certain number
or when the total number of monitors in the VM reaches a certain
number. Both of these risk hurting performance, especially if they
force major collections which would otherwise be infrequent. Also,
it's hard to know what the values of such thresholds should be on a
given system.
Instead, we reimplement Java monitors using atomic compare-and-swap
(CAS) and thread-specific native locks for blocking in the case of
contention. This way, we can create an arbitrary number of monitors
without creating any new native locks. The total number of native
locks needed by the VM is bounded instead by the number of live
threads plus a small constant.
Note that if we ever add support for an architecture which does not
support CAS, we'll need to provide a fallback monitor implementation.
If another thread succeeds in entering the "exclusive" state while we
use the fast path to transition the current thread to "active", we
must switch back to "idle" temporarily to allow the exclusive thread a
chance to continue, and then retry the transition to "active" via the
slow path.
These paths reduce contention among threads by using atomic operations
and memory barriers instead of mutexes where possible. This is
especially important for JNI calls, since each such call involves two
state transitions: from "active" to "idle" and back.
This implementation does not conform to the Java standard in that
finalize methods are called from whichever thread happens to be garbage
collecting, and that thread may hold locks, whereas the standard
guarantees that finalize will be run from a thread which holds no locks.
Also, an object will never be finalized more than once, even if its
finalize method "rescues" (i.e. makes reachable) the object such that it
might become unreachable a second time and thus a candidate for
finalization once more. It's not clear to me from the standard if this
is OK or not.
Nonwithstanding the above, this implementation is useful for "normal"
finalize methods which simply release resources associated with an
object.
The previous code relied on the invalid assumption that the thread-local
heaps for all threads would have been cleared immediately following a
garbage collection. However, the last thing the garbage collection
function does is run finalizers which may allocate new objects. This
can lead allocate3 to call allocateSmall with a size which is too large
to accomodate, overflowing the heap.
The solution is to iterate until there really is enough room for the
original allocation request.