Note that only the main parts of the instrumentation are shown above, the actual transformation is more complex and involves handling
corner cases and optimizations.
Fibers
^^^^^^
The above instrumentation allows the implementation of *co-operative* scheduling. That is, ``@Suspendable`` code can yield its execution by
throwing a ``SuspendExecution`` exception. This exception throw takes care of handing the control flow to a top-level try-catch, which then
has access to the thread-locally constructed execution stack, as well as a way to return to the suspension point using the "method entry"
list.
A ``Fiber`` thus is nothing more than a data structure holding the execution stack, the method entry list, as well as various bookkeeping
data related to the management of the ``Fiber``, e.g. its state enum or identifier.
The main try-catch that handles the yielding may be found `here <https://github.com/puniverse/quasar/blob/db0ac29f55bc0515023d67ab86a2178c5e6eeb94/quasar-core/src/main/java/co/paralleluniverse/fibers/Fiber.java#L790>`_.
..note:: For those adventurous enough to explore the implementation, the execution stack and method entry list are merged into two growing
arrays in ``Stack``, one holding ``Object`` s (``dataObject``, for structured objects), the other holding ``long`` s (``dataLong``, for
primitive values). The arrays always have the same length, and they both contain values for each stack frame. The primitive stack
additionally has a "metadata" slot for each stack frame, this is where the "method entry" value is put, as well as frame size data.
The main idea behind checkpoints is to utilize the ``Fiber`` data structure and treat it as a serializable object capturing the state of a
running computation. Whenever a Corda-suspendable API is hit, we capture the execution stack and corresponding entry list, and serialize
it using `Kryo <https://github.com/EsotericSoftware/kryo>`_, a reflection-based serialization library capable of serializing unstructured
data. We thus get a handle to an arbitrary suspended computation.
In the flow state machine there is a strict separation of the user-code's state, and the flow framework's internal state. The former is the
serialized ``Fiber``, and the latter consists of structured objects.
The definition of a ``Checkpoint`` can be found `here <https://github.com/corda/corda/blob/dc4644643247d86b14165944f6925c2d2561eabc/node/src/main/kotlin/net/corda/node/services/statemachine/StateMachineState.kt#L55>`_.
The "user state" can be found in ``FlowState``. It is either
#.``Unstarted``: in this case there's no ``Fiber`` to serialize yet, we serialize the ``FlowLogic`` instead.
#.``Started``: in this case the flow has been started already, and has been suspended on some IO. We store the ``FlowIORequest`` and the
serialized ``Fiber``.
The rest of the ``Checkpoint`` deals with internal bookkeeping. Sessions, the subflow-stack, errors. Note how all data structures are
read-only. This is deliberate, to enable easier reasoning. Any "modification" of the checkpoint therefore implies making a shallow copy.
The state machine
-----------------
The internals of the flow framework were designed as a state machine. A flow is a strange event loop that has a state, and goes through
state transitions triggered by events. The transitions may be either
#. User transitions, when we hand control to user-defined code in the cordapp. This may transition to a suspension point, the end of the
flow, or may abort exceptionally.
#. Internal transitions, where we keep strict track of side-effects and failure conditions.
The core data structures of the state machine are:
#.``StateMachineState``: this is the full state of the state machine. It includes the ``Checkpoint`` (the persisted part of the state), and
other non-persisted state, most importantly the list of pending ``DeduplicationHandler`` s, to be described later.
#.``Event``: Every state transition is triggered by one of these. These may be external events, notifying the state machine of something,
or internal events, for example suspensions.
#.``Action``: These are created by internal state transitions. These transitions do not inherently execute any side-effects, instead, they
create a list of ``Action`` s, which are later executed.
#.``FlowContinuation``: indicates how the state machine should proceed after a transition. It can resume to user code, throw an exception,
keep processing events or abort the flow completely.
The state machine is a **pure** function that when given an ``Event`` and an initial ``StateMachineState`` returns the next state, a list of
``Action`` s to execute, and a ``FlowContinuation`` to indicate how to proceed:
fun transition(event: Event, state: StateMachineState): TransitionResult
The top-level entry point for the state machine transitions is in ``TopLevelTransition``.
As an example let's examine message delivery. This transition will be triggered by a ``DeliverSessionMessage`` event, defined like this:
..code-block:: kotlin
data class DeliverSessionMessage(
val sessionMessage: ExistingSessionMessage,
override val deduplicationHandler: DeduplicationHandler,
val sender: Party
) : Event(), GeneratedByExternalEvent
The event then goes through ``TopLevelTransition``, which then passes it to ``DeliverSessionMessageTransition``. This transition inspects
the event, then does the relevant bookkeeping, updating sessions, buffering messages etc. Note that we don't do any checkpoint persistence,
and we don't return control to the user code afterwards, we simply schedule a ``DoRemainingWork`` and return a ``ProcessEvents``
continuation. This means that it's going to be the next transition that decides whether the received message is "relevant" to the current
suspension, and whether control should thus be returned to user code with the message.
FlowStateMachineImpl
--------------------
The state machine is a pure function, so what is the "driver" of it, that actually executes the transitions and side-effects? This is what
``FlowStateMachineImpl`` is doing, which is a ``Fiber``. This class requires great care when it's modified, as the programmer must be aware
of what's on the stack, what fields get persisted as part of the ``Checkpoint``, and how the control flow is wired.
The usual way to implement state machines is to create a simple event loop that keeps popping events from a queue, and executes the
resulting transitions. With flows however this isn't so simple, because control must be returned to suspending operations. Therefore the
eventloop is split up into several smaller eventloops, executed when "we get the chance", i.e. when users call API functions. Whenever the
flow calls a Flow API function, control is handed to the flow framework, that's when we can process events, until a ``FlowContinuation``
indicates that control should be returned to user code.
There are two functions that aid the above:
#.``FlowStateMachineImpl.processEventsUntilFlowIsResumed``: as the name suggests this is a loop that keeps popping and processing events
from the flow's event queue, until a ``FlowContinuation.Resume`` or some continuation other than ``ProcessEvents`` is returned.
#.``FlowStateMachineImpl.processEventImmediately``: this function skips the event queue and processes an event immediately. There are
certain transitions (e.g. subflow enter/exit) that must be done this way, otherwise the event ordering can cause problems.
The two main functions that call the above are the top-level ``run``, which is the entry point of the flow, and ``suspend``, which every
blocking API call eventually calls.
Suspensions
-----------
Let's take a look at ``suspend``, which is the most delicate/brittle function in this class, and most probably the whole flow framework.
Examining it will reveal a lot about how flows and fibers work.
..code-block:: kotlin
@Suspendable
override fun <R : Any> suspend(ioRequest: FlowIORequest<R>, maySkipCheckpoint: Boolean): R {
First off, the type signature. We pass in a ``FlowIORequest<R>``, which is an encapsulation of the IO action we're about to suspend on. It
is a sealed class with members like ``Send``/``Receive``/``ExecuteAsyncOperation``. It is serializable, and will be part of the
``Checkpoint``. In fact, it is doubly-serialized, as it is in ``FlowState`` in a typed form, but is also present in the Fiber's stack, as a
part of ``suspend``'s stack frame.
We also pass a ``maySkipCheckpoint`` boolean which if true will prevent the checkpoint from being persisted.
The function returns ``R``, but the runtime control flow achieving this "return" is quite tricky. When the Fiber suspends a
``SuspendExecution`` exception will be thrown, and when the fiber is resumed this ``suspend`` function will be re-entered, however this time
in a "different capacity", indicated by Quasar's implicitly stored method entry, which will jump to the end of the suspension. This is
repeated several times as this function has two suspension points, one of them possibly executing multiple times, as we will see later.
..code-block:: kotlin
val serializationContext = TransientReference(getTransientField(TransientValues::checkpointSerializationContext))
val transaction = extractThreadLocalTransaction()
These lines extract some data required for the suspension. Note that both local variables are ``TransientReference`` s, which means the
referred-to object will not be serialized as part of the stack frame. During resumption from a deserialized checkpoint these local variables
will thus be null, however at that point these objects will not be required anymore.
The first line gets the serialization context from a ``TransientValues`` datastructure, which is where all objects live that are required
for the flow's functioning but which we don't want to persist. This means all of these values must be re-initialized each time we are
restoring a flow from a persisted checkpoint.
..code-block:: kotlin
parkAndSerialize { _, _ ->
This is the Quasar API that does the actual suspension. The passed in lambda will not be executed in the current ``suspend`` frame, but
rather is stored temporarily in the internal ``Fiber`` structure, and will be run in the outer Quasar try-catch as a "post park" action
after the catch of the ``SuspendExecution`` exception. See `Fiber.java <https://github.com/puniverse/quasar/blob/db0ac29f55bc0515023d67ab86a2178c5e6eeb94/quasar-core/src/main/java/co/paralleluniverse/fibers/Fiber.java#L804>`_ for details.
This means that within this lambda the Fiber will have already technically parked, but it hasn't yet properly yielded to the enclosing
scheduler.
..code-block:: kotlin
setLoggingContext()
Thread-locals are treated in a special way when Quasar suspends/resumes. Through use of `reflection and JDK-internal unsafe operations <https://github.com/puniverse/quasar/blob/db0ac29f55bc0515023d67ab86a2178c5e6eeb94/quasar-core/src/main/java/co/paralleluniverse/concurrent/util/ThreadAccess.java>`_
it accesses all ThreadLocals in the current thread and swaps them with ones stored in the Fiber data structure. In essence for each thread
that executes as a Fiber we have two sets of thread locals, one set belongs to the original "non-Quasar" thread, and the other belongs to
the Fiber. During Fiber execution the latter is active, this is swapped with the former during suspension, and swapped back during resume.
Note that during resume these thread-locals may actually be restored to a *different* thread than the original.
In the ``parkAndSerialize`` closure the Fiber is partially parked, and at this point the thread locals are already swapped out. This means
that data stored in ``ThreadLocal`` s that we still need must be re-initialized somehow. In the above case this is the logging MDC.
..code-block:: kotlin
// Will skip checkpoint if there are any idempotent flows in the subflow stack.
val skipPersistingCheckpoint = containsIdempotentFlows() || maySkipCheckpoint
As the last step in the park closure we unpark the Fiber we are currently parking. This effectively causes an "immediate" re-enter of the
fiber, and therefore the ``suspend`` function, but this time jumping over the park and executing the next statement. Of course this re-enter
may happen much later, perhaps even on a different thread.
We then enter a mini event-loop, which also does Quasar yields, processing the flow's event queue until a transition continuation indicates
that control can be returned to user code . Practically this means that when a flow is waiting on an IO action it won't actually be blocked
in the ``parkAndSerialize`` call, but rather in this event loop, popping from the event queue.
..note::
The ``processEvent*`` calls do explicit checks of database transaction state on entry and exit. This is because Quasar yields make
reasoning about resource usage difficult, as they detach resource lifetime from lexical scoping, or in fact any other scoping that
programmers are used to. These internal checks ensure that we are aware of which code blocks have a transaction open and which ones
don't. Incidentally these checks also seem to catch instrumentation/missing ``@Suspendable``-annotation problems.
Event processing
----------------
The processing of an event consists of two steps:
#. Calculating a transition. This is the pure ``StateMachineState`` + ``Event`` -> ``TransitionResult`` function.
#. Executing the transition. This is done by a ``TransitionExecutor``, which in turn uses an ``ActionExecutor`` for individual ``Action``s.
This structuring allows the introspection and interception of state machine transitions through the registering of ``TransitionExecutor``
interceptors. These interceptors are ``TransitionExecutor`` s that have access to a delegate. When they receive a new transition they can
inspect it, pass it to the delegate, and do something specific to the interceptor.
For example checkpoint deserializability is checked by such an `interceptor <https://github.com/corda/corda/blob/76d738c4529fd7bdfabcfd1b61d500f9259978f7/node/src/main/kotlin/net/corda/node/services/statemachine/interceptors/FiberDeserializationCheckingInterceptor.kt#L18>`_.
It inspects a transition, and if it contains a Fiber checkpoint then it checks whether it's deserializable in a separate thread.
The transition calculation is done in the ``net.corda.node.services.statemachine.transitions`` package, the top-level entry point being
``TopLevelTransition``. There is a ``TransitionBuilder`` helper that makes the transition definitions a bit more readable. It contains a
``currentState`` field that may be updated with new ``StateMachineState`` instances as the event is being processed, and has some helper
functions for common functionality, for example for erroring the state machine with some error condition.
Here are a couple of highlighted transitions:
Suspend
^^^^^^^
Handling of ``Event.Suspend`` is quite straightforward and is done `here <https://github.com/corda/corda/blob/26855967989557e4c078bb08dd528231d30fad8b/node/src/main/kotlin/net/corda/node/services/statemachine/transitions/TopLevelTransition.kt#L143>`_.
We take the serialized ``Fiber`` and the IO request and create a new checkpoint, then depending on whether we should persist or not we
either simply commit the database transaction and schedule a ``DoRemainingWork`` (to be explained later), or we persist the checkpoint, run
the ``DeduplicationHandler`` inside-tx hooks, commit, then run the after-tx hooks, and schedule a ``DoRemainingWork``.
Every checkpoint persistence implies the above steps, in this specific order.
DoRemainingWork
^^^^^^^^^^^^^^^
This is a generic event that simply tells the state machine: inspect your current state, and decide what to do, if anything. Using this
event we can break down transitions into a <modify state> and <inspect and do stuff> transition, which compose well with other transitions,
as we don't need to add special cases everywhere in the state machine.
As an example take error propagation. When a flow errors it's put into an "errored" state, and it's waiting for further instructions. One
possibility is the triggering of error propagation through the scheduling of ``Event.StartErrorPropagation``. Note how the handling of this