corda/node
Dan Newton c79ad972d0 ENT-6142 Flows become dead due to uncaught exceptions (#4158)
If a flow fails outside of its normal error processing code path it will
end up in `FlowDefaultUncaughtExceptionHandler`.

This handler will put the flow into overnight observation if possible.
This is done in-memory and the database.

Even with this being done, the fiber itself has blown up and therefore
does not manage to get to `SMM.removeFlow` which is where
`SMM.decrementLiveFibers` is called. For example, a flow that errored
will hit this code eventually. This code is also hit when a flow is
suspended and a shutdown event is sent to it.

The `liveFibers` latch blocks the SMM from shutting down until all flows
have finished or processed shutdown events.

The changes described below resolve this problem.

Any flow that goes to the `FlowDefaultUncaughtExceptionHandler` will be
put marked as dead (`StateMachineState.isDead`). Highlighting that the
flow cannot continue to process events normally as it has broken out
of its event loop

Retrying and shutdown are done manually rather than injecting events
into the flow fiber's queue, because it can't execute its event loop.

Killing a dead flow executes an altered version of
`retryFlowFromSafePoint`. It does this so it can delete the checkpoint
and then continue using the checkpoint it just deleted to run the
kill flow transition on a new fiber.

If a killed flow reaches the `FlowDefaultUncaughtExceptionHandler` it
will be forcibly killed via `killFlowForcibly` which deletes the
checkpoint/or updates it to KILLED and then calls `removeFlow` to bypass
any event processing. This means that a flow that was dead and was killed
will be terminated manually if it reaches the handler again. The same is
true for flows that were not dead before but reached the handler after
being killed.

Also, `FlowCreator.createFlowFromCheckpoint` now retains the `isKilled`
state of the previous fiber's state.
2021-01-29 11:15:28 +00:00
..
capsule Refactored the previous bug fix to minimize duplication by reusing an existing function. 2020-08-06 11:15:19 +01:00
djvm [DRAFT] feat/CORDA-3823-hash-agility-qa-ready (#6789) 2020-11-05 22:05:29 +00:00
src ENT-6142 Flows become dead due to uncaught exceptions (#4158) 2021-01-29 11:15:28 +00:00
build.gradle ENT-5759 check for app schema migration (#6701) 2020-09-14 11:25:18 +01:00