mirror of
https://github.com/corda/corda.git
synced 2025-03-15 16:46:12 +00:00

Wrap exceptions that occur in state machine transitions with a custom exception type, which is then handled inside the flow hospital. As part of this change, a number of negative side effects have been addressed.

General summary:

- `StateTransitionException` wraps exceptions caught in `TransitionExecutorImpl`
- `StateTransitionException`s are handled in the flow hospital: they are retried 3 times and then kept in for observation if errors persist (assuming the conditions below are false)
- Exceptions that occur in `FlowAsyncOperation` events are wrapped in `AsyncOperationTransitionException` and ignored by the flow hospital's transition staff member
- `InterruptedException`s are given a `TERMINAL` diagnosis by the flow hospital's transition staff member (these can occur due to `killFlow`)
- Allow flows which have not persisted their original checkpoint to still retry by replaying their start flow messages
- Swallow exceptions in `AcknowledgeMessages` actions

Detailed summary:

* CORDA-3194 Add state machine transition error handling to flow hospital

Wrap exceptions that are caught in `TransitionExecutorImpl` (originating from new errors) with `StateTransitionException`. This exception is then handled explicitly by the flow hospital.

Add `TransitionErrorGeneralPractitioner` to `StaffedFlowHospital`. This staff member handles errors that mention `StateTransitionException`. Errors are retried and then kept in the hospital if they persist.

* CORDA-3194 Remove a fiber from the `hospitalisedFlows` if its previous state was clean

If the fiber's previous state was clean, remove it from `HospitalisingInterceptor.hospitalisedFlows`. This allows flows that are being retried to remove themselves from the map, so that they can re-enter the flow hospital after executing the fiber's transition (if an error occurs). This is important for retrying a flow that has errored during a transition.
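
As a rough illustration of the general summary, the Kotlin sketch below wraps transition failures and lets a transition staff member diagnose them. Everything in it is hypothetical and simplified: `executeTransition`, `TransitionErrorStaffMember` and the `previousAdmissions` counter are invented for illustration, the exception classes extend `Exception` rather than `CordaException` so the snippet compiles on its own, and the `Diagnosis` values only loosely mirror the names used by `StaffedFlowHospital`.

```kotlin
// Hypothetical stand-ins for the types named above (the real ones extend CordaException).
class StateTransitionException(cause: Throwable) : Exception(cause)
class AsyncOperationTransitionException(cause: Throwable) : Exception(cause)

// Simplified diagnoses; DISCHARGE is assumed here to mean "retry from the last safe point".
enum class Diagnosis { DISCHARGE, OVERNIGHT_OBSERVATION, TERMINAL, NOT_MY_SPECIALTY }

// Wrap any exception thrown while running a transition, roughly as TransitionExecutorImpl
// is described doing. Already-wrapped exceptions pass through unchanged (sketch assumption).
fun <T> executeTransition(block: () -> T): T =
    try {
        block()
    } catch (e: StateTransitionException) {
        throw e
    } catch (e: Exception) {
        throw StateTransitionException(e)
    }

// Hypothetical transition staff member: retries a few times, then keeps the flow for observation.
class TransitionErrorStaffMember(private val maxRetries: Int = 3) {
    fun consult(error: Throwable, previousAdmissions: Int): Diagnosis {
        // Only errors that went through the transition wrapper are this member's concern.
        if (error !is StateTransitionException) return Diagnosis.NOT_MY_SPECIALTY
        val cause = error.cause
        return when {
            // Failures from async operations are left alone, as the summary describes.
            cause is AsyncOperationTransitionException -> Diagnosis.NOT_MY_SPECIALTY
            // Interrupts (for example from killFlow) are not worth retrying.
            cause is InterruptedException -> Diagnosis.TERMINAL
            // Retry up to maxRetries times...
            previousAdmissions < maxRetries -> Diagnosis.DISCHARGE
            // ...then keep the flow in for observation.
            else -> Diagnosis.OVERNIGHT_OBSERVATION
        }
    }
}
```

Checking the interrupt and async causes before the retry counter mirrors the summary's ordering: those cases bypass the retry-then-observe path entirely.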

* CORDA-3194 Set `isAnyCheckpointPersisted` to true when retrying a flow

Added to prevent a single flow from creating multiple checkpoints when a failure occurs during `Action.AcknowledgeMessages`. More specifically, `isAnyCheckpointPersisted` is false when retrying the flow, even though a checkpoint has actually been saved. Due to this, a brand new flow is started with a new flow id (causing duplication). Setting `isAnyCheckpointPersisted` to true specifically when retrying a flow resolves this issue.

* CORDA-3194 Add Byteman test to verify transition error handling

Add `StatemachineErrorHandlingTest` to verify transition error handling. Byteman allows exceptions to be injected at certain points in the code's execution, so exceptions can be thrown when needed inside the state machine.

The current tests check errors in the following events:

- `InitiateFlow`
- `AcknowledgeMessages`

* CORDA-3194 Swallow all exceptions in `ActionExecutorImpl.executeAcknowledgeMessages`

Swallow the exceptions that occur in the `DeduplicationHandler`s inside `ActionExecutorImpl.executeAcknowledgeMessages`. The side effects of failures in the handlers are not serious enough to put the transition into a failure state, so they are now caught. This allows the transition to continue as normal, even if an error occurs in any of the handlers.

* CORDA-3194 Wrap unexpected exceptions thrown in async operation transitions

Exceptions thrown inside `FlowAsyncOperation.execute` implementations that are not returned as part of the future are caught, wrapped and rethrown. This prevents unexpected exceptions thrown by (most likely) user code from being handled by the flow hospital's transition staff member. This handling might change moving forward, but it allows async operations to continue working as they did before transition error handling was added.
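
The acknowledgement and async-operation changes can be sketched in Kotlin as follows. The `AckHandler` and `AsyncOperation` interfaces and both functions are hypothetical stand-ins for Corda's `DeduplicationHandler`, `FlowAsyncOperation` and `ActionExecutorImpl`, and `AsyncOperationTransitionException` is again a simplified class extending `Exception`. Acknowledgement failures are logged and swallowed per handler, while a synchronous throw from an async operation's `execute` is wrapped rather than propagated as-is.

```kotlin
import java.util.concurrent.CompletableFuture

// Simplified stand-in for the real exception type.
class AsyncOperationTransitionException(cause: Throwable) : Exception(cause)

// Hypothetical handler interface; the real DeduplicationHandler looks different.
fun interface AckHandler { fun acknowledge() }

// Acknowledge every handler, logging and swallowing individual failures so that
// the transition is never put into a failure state by an acknowledgement error.
// Returns the number of handlers that failed.
fun executeAcknowledgeMessages(handlers: List<AckHandler>, log: (String) -> Unit = { println(it) }): Int {
    var failures = 0
    for (handler in handlers) {
        try {
            handler.acknowledge()
        } catch (e: Exception) {
            failures++
            log("Failed to acknowledge message: ${e.message}")
        }
    }
    return failures
}

// Hypothetical async operation: errors delivered through the future are the normal path.
fun interface AsyncOperation<R> { fun execute(): CompletableFuture<R> }

// Anything thrown synchronously by execute() (most likely user code) is wrapped,
// so a transition staff member can recognise it and leave it alone.
fun <R> executeAsync(op: AsyncOperation<R>): CompletableFuture<R> =
    try {
        op.execute()
    } catch (e: Exception) {
        throw AsyncOperationTransitionException(e)
    }
```

In this sketch the `try`/`catch` sits inside the loop rather than around it, so one failing handler does not prevent the remaining handlers from being acknowledged.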

* CORDA-3194 Verify that errors inside of `AcknowledgeMessages` work as expected

Update `StatemachineErrorHandlingTest` to correctly test errors that occur when executing the `AcknowledgeMessages` action.

* CORDA-3194 Retry flows that failed to persist their original checkpoint

Allow a flow that failed when creating its original checkpoint (for example, failing to commit the database transaction) to retry. The flow will create a brand new checkpoint (as the original was not saved).

This required adding `flowId` to `ExternalStartFlowEvent` so that the event keeps a record of the flow's id. When the flow is retried, the events are replayed, which triggers a flow to be started with the id stored in the event.

To allow this change, code was removed from `retryFlowFromSafePoint` so that the function can continue even if no checkpoint matches the passed-in flow id.

* CORDA-3194 Correct `FlowFrameworkTests` test due to error handling

The test assumed that errors in transitions are not retried. It has now been updated so that it passes with the flow succeeding after an exception is thrown.

* CORDA-3194 Remove unneeded import

* CORDA-3194 Make the state transition exceptions extend `CordaException`

`StateTransitionException` and `AsyncOperationTransitionException` now extend `CordaException` instead of `Exception`.

* CORDA-3194 Improve log messages

* CORDA-3194 Remove unneeded code in `HospitalisingInterceptor`

Due to a previous change, a section of code that removes a flow id from the `hospitalisedFlows` map is no longer required. This code has been removed.

* CORDA-3194 Constraint violations are given `TERMINAL` diagnosis

Add `Diagnosis.TERMINAL` to `StaffedFlowHospital` to allow an error to be ignored and left to die a quick and painful death.

`StateTransitionException` has been changed so that it does not cause serialisation errors when propagated from a flow.

* CORDA-3194 `InterruptedExceptions` are given `TERMINAL` diagnosis
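
The retry of flows that failed to persist their original checkpoint can be modelled with a small in-memory toy, sketched below under the assumption that the start event is recorded before the first checkpoint commit is attempted. `StartFlowEvent`, `MiniStateMachine` and its methods are hypothetical and are not Corda's `ExternalStartFlowEvent` or state machine manager API; only the idea (keep the flow id on the start event and replay it when no checkpoint exists) comes from the commit message.

```kotlin
import java.util.UUID

// Hypothetical start event that, like the changed ExternalStartFlowEvent, carries the flow id.
data class StartFlowEvent(val flowId: UUID, val flowName: String)

// Toy state machine whose first checkpoint commit can be made to fail.
class MiniStateMachine(private val failFirstCommit: Boolean) {
    val checkpoints = mutableMapOf<UUID, String>() // flowId -> checkpoint placeholder
    private val startEvents = mutableMapOf<UUID, StartFlowEvent>()
    private var commitAttempts = 0

    fun onExternalStart(event: StartFlowEvent) {
        startEvents[event.flowId] = event // remember the event so it can be replayed on retry
        startFlow(event)
    }

    private fun startFlow(event: StartFlowEvent) {
        commitAttempts++
        if (failFirstCommit && commitAttempts == 1) {
            // Simulates failing to commit the database transaction for the original checkpoint.
            throw IllegalStateException("could not commit initial checkpoint")
        }
        checkpoints[event.flowId] = "checkpoint:${event.flowName}"
    }

    // Continues even when no checkpoint matches the flow id: the start event is replayed instead.
    fun retryFlowFromSafePoint(flowId: UUID) {
        if (flowId in checkpoints) return // restoring from an existing checkpoint is out of scope here
        val event = startEvents[flowId] ?: error("No start event recorded for $flowId")
        startFlow(event) // same flowId, so no second flow with a new id is created
    }
}
```

Because the replayed event carries the original `flowId`, the retry produces exactly one checkpoint under that id rather than a duplicate flow with a fresh id.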