- Port ledger integrity work to `SingleThreadedStateMachineManager`
- Fix `StatemachineErrorHandlingTest`
- Fix compile errors in `RetryFlowMockTest` + `VaultObserverExceptionTest`
- Add method to `StaffedFlowHospital` that was missed during original merge
* CORDA-3194 Do not allow killed flows back into the hospital
This change has been made to prevent killed flows from being added back
to the hospital after being forcibly removed by `killFlow`. Not doing so,
could leave references to a flow inside of the hospital, which is not
the correct behaviour.
`killFlow` now sets a flow's `StatemachineState.isRemoved` to true.
This check is then used in `StaffedFlowHospital` and the
`DumpHistoryOnErrorInterceptor`.
* CORDA-3194 Log different message for transition error due to killed flow
When a flow is killed, its checkpoint is deleted. Currently, the
statemachine will still try a process the next event even if it has
been killed. This can lead to an error when trying to update the
deleted checkpoint. The exception thrown from this is logged out.
An if statement has been added to log a different message at debug level
if it is due to an update error for a killed flow. This is done to not
alarm node operators of the exception.
* CORDA-3194 Relax duplicate insert flow hospital handling
Revert a previous change to now make the duplicate insert staff member to
give a diagnosis of discharge or not my speciality (previously gave
terminal).
This is to prevent duplicate insert handling from overriding finality
flow error handling.
* added the warning as a TimerTask at StaffedFlowHospital#delayedDischargeTimer
* moved the scheduling of the warning task at StaffedFlowHospital#init block. That way we ensure that the task will be scheduled only once at StaffedFlowHospital initialization.
* Corrected overnight observation warning task's logging message. Changed StaffedFlowHospital#delayedDischargeTimer to the more generic StaffedFlowHospital#hospitalJobTimer since it now schedules delayed discharges tasks as well the overnight observation warning task. Removed this from property reference
* switching to fun timerTask for the instantiation of anonymous TimerTask classes
* Correct condition to log patients who are currently in the hospital, whose last record in their medical records is Outcome.OVERNIGHT_OBSERVATION. Extended logging to include treatableSessionInits staying in the hospital
* Add not empty check for patientsUnderOvernightObservation. Correct strings.
When a flow fails to retry, it should be kept in for overnight observation and aborted.
In the future, it might be possible to retry flows again that failed during their retry, but for now keeping for observation and aborting is satisfactory.
* CORDA-3194 Remove hospitalised flows from `HospitalisingInterceptor`
Small refactor to remove some of the hospital logic out of the
`HospitalisingInterceptor` and into the `StaffedFlowHospital`.
Add some comments to help clarify the purpose of the two maps inside
of the hospital.
* CORDA-3194 When a flow fails to retry force it into observation
When a flow fails to retry, it should be kept in for overnight
observation and aborted.
In the future, it might be possible to retry flows again that failed
during their retry, but for now keeping for observation and aborting is
satisfactory.
* CORDA-3194 Test for database commit failure when retrying a flow
Failing during the database commit failure that occurs after the retry
flow action does not stop the flow from actually retrying. This test
confirms this functionality.
The retried flow gets scheduled as part of the retry action. The failure
in the commit action does not prevent this since it has already been
scheduled.
* CORDA-3194 Replay start flow events when responding flow fails initial checkpoint commit
Logic has already been added to recover from initial checkpoint commit
failures on the initiating flow side but this did not suffice for
the same failure occurring on the responding flow's side.
The same idea has been added to resolve the responding flow's issue.
`ExternalMessageEvent` now has a `flowId` that is maintained on the
event. Messages can then be replayed to start/restart the flow, while
the event provides the flow id to each flow start.
Each `ExternalMessageEvent` implementation generates a random `flowId`
when constructed.
Events are stored in Artemis. This allows the solution to recover across
node restarts as the events will be pulled from artemis again when
restarting.
In the future `flowId`s will probably moved off of the events and
generated purely on the responding flow's node.
* CORDA-3194 Add test to verify that errors removing a responding flow are recoverable
* Unwrap rx.OnErrorNotImplementedException so the hospital can handle the cause appropriately
* Add db failure cordapp
* Renamed folders to avoid ambiguity in gradle
* Add integration test for exception hospitalisation when thrown from an RX observable.
* Make the test slightly cleaner
* Fix the schema to actually match the requirements for my custom state. Thanks a bunch, H2.
* Switch test to use SqlException base class.
* Schedule error event if we detect that a commit or db flush has thrown (forcing the flow to error even if customer code then goes ahead to swallow the exception)
* Revert change to schedule extra error
* Add more tests for edge case with DB exceptions, changed CorDapp to suppor this an hook in the flow hospital
* Warning about unsubscribe
Check state transitioned from clean to error for hospital admission.
* Match the test to our actual expectations
* Revert "Revert change to schedule extra error"
This reverts commit 43d47937
* Prevent suppression of errors arising in `transaction()` and `jdbcConnection()`
* Test for SqlException caught trying to escape from recordTransaction and suppressed outside being intercepted.
* More tests for various error/catch combinations
* Clean up and comments
* Code reformat
* Fix test compilation
Wrap exceptions that occur in state machine transitions with a custom exception type which is
then handled inside of the flow hospital. As part of this change, a number of side negative side
effects have been addressed.
General summary:
- `StateTransitionException` wraps exceptions caught in `TransitionExecutorImpl`
- `StateTransitionExceptions` are handled in the flow hospital, retried 3 times and then kept in
for observation if errors persist (assuming conditions below are false)
- Exceptions that occur in `FlowAsyncOperation` events are wrapped in
`AsyncOperationTransitionException` and ignored by the flow hospital transition staff member
- `InterruptException`s are given a `TERMINAL` diagnosis by the flow hospital transition staff
member (can occur due to `killFlow`)
- Allow flows which have not persisted their original checkpoint to still retry by replaying their
start flow messages
- Swallow exceptions in `AcknowledgeMessages` actions
Detailed summary:
* CORDA-3194 Add state machine transition error handling to flow hospital
Wrap exceptions that are caught in `TransitionExecutorImpl` (coming from
new errors) with `StateTransitionException`. This exception is then
handled explicitly by the flow hospital.
Add `TransitionErrorGeneralPractitioner` to `StaffedFlowHospital`. This
staff member handles errors that mention `StateTransitionException`.
Errors are retried and then kept in the hospital if the errors persist.
* CORDA-3194 Remove a fiber from the `hospitalisedFlows` if its previous state was clean
If the fiber's previous state was clean then remove it from
`HospitalisingInterceptor.hospitalisedFlows`. This allows flows that are
being retried to clean themselves. Doing this allows them to re-enter
the flow hospital after executing the fiber's transition (if an error
occurs).
This is important for retrying a flow that has errored during a
transition.
* CORDA-3194 Set `isAnyCheckpointPersisted` to true when retrying a flow
Added to prevent a single flow from creating multiple checkpoints when
a failure occurs during `Action.AcknowledgeMessages`.
More specifically, to `isAnyCheckpointPersisted` is false when retrying
the flow, even though a checkpoint has actually been saved. Due to this
a brand new flow is started with a new flow id (causing duplication).
Setting `isAnyCheckpointPersisted` to true specifically when retrying a
flow resolves this issue.
* CORDA-3194 Add Byteman test to verify transition error handling
Add `StatemachineErrorHandlingTest` to verify transition error handling.
Byteman allows exceptions to be injected at certain points in the code's
execution. Therefore exceptions can be thrown when needed inside of the
state machine.
The current tests check errors in events:
- `InitiateFlow`
- `AcknowledgeMessages`
* CORDA-3194 Swallow all exceptions in `ActionExecutorImpl.executeAcknowledgeMessages`
Swallow the exceptions that occur in the `DeduplicationHandler`s when
inside of `ActionExecutorImpl.executeAcknowledgeMessages`.
The side effects of the failures that can happen in the handlers are
not serious enough to put the transition into a failure state.
Therefore they are now caught. This allows the transition to continue
as normal, even if an error occurs in one any of the handlers.
* CORDA-3194 Wrap unexpected exceptions thrown in async operation transitions
Exceptions thrown inside of `FlowAsyncOperation.execute` implementations
that are not returned as part of the future, are caught, wrapped and
rethrown. This prevents unexpected exceptions thrown by (most likely)
user code from being handled by the hospital by the transition
staff member.
This handling might change moving forward, but it allows the async
operation to continue working as it was before transition error handling
was added.
* CORDA-3194 Verify that errors inside of `AcknowledgeMessages` work as expected
Update `StatemachineErrorHandlingTest` to correctly test errors that
occur when executing the `AcknowledgeMessages` action.
* CORDA-3194 Retry flows that failed to persist their original checkpoint
Allow a flow that failed when creating their original checkpoint (for
example - failing to commit the db transaction) to retry.
The flow will create a brand new checkpoint (as the original did not
saved).
This required adding `flowId` to `ExternalStartFlowEvent` to allow the
event to keep a record of the flow's id. When the flow is retried, the
events are replayed which trigger a flow to be started that has the
id stored in the event.
To allow this change, code was removed from `retryFlowFromSafePoint` to
allow the function to continue, even if no checkpoint matches the passed
in flow id.
* CORDA-3194 Correct `FlowFrameworkTests` test due to error handling
Test assumed that errors in transitions are not retried, this has now
been updated so the test passes with the flow succeeding after an
exception is thrown.
* CORDA-3194 Remove unneeded import
* CORDA-3194 Make the state transition exceptions extend `CordaException`
`StateTransitionException` and `AsyncOperationTransitionException` now
extend `CordaException` instead of `Exception`.
* CORDA-3194 Improve log messages
* CORDA-3194 Remove unneeded code in `HospitalisingInterceptor`
Due to a previous change, a section of code that removes a flow id
from the `hospitalisedFlows` map is no longer required. This code has
been removed.
* CORDA-3194 Constraint violations are given `TERMINAL` diagnosis
Add `Diagnosis.TERMINAL` to `StaffedFlowHospital` to allow an error
to be ignored and left to die a quick and painful death.
`StateTransitionException` changed so it does not cause serialisation
errors when propagated from a flow.
* CORDA-3194 `InterruptedExceptions` are given `TERMINAL` diagnosis
* Add GP to flow hospital, and start working on a list of things the GP knows to be incurable.
* Only hospitalise SQL and Persistence Exceptions (let's see if that is enough?), also rename to DatabaseDentist.
* Disabled hospitalisation of SQL exceptions in flow retry tests
* Fix RPC exception handling test by not using PersistenceException
* Ignore flaky integration test
* Code review: Rename staff member and add testing annotation
* Revert compiler.xml
* Added a new way for environment variables to be loaded, which allows for underscore based separation.
* Moved test to its own kotlin file.
* Added case insensitivity support.
* The corda. prefix is now case insensitive too.
* Removed unused variable.
* Added env variables support for driverDSL. Shadowing corda. properties raises an exception.
* Driver api stability fix.
* Changed type of cordapps param to reflect the real one, rather than what IntelliJ auto completed.
* Some detekt issue fixes. Spread operator removed, baselined api stability constructors and buggy line.
* Fixed misspelled variable.
* Reverted unintentional changes.
* Added suppress instead of changing baseline.
* Reworked logic to handle previously defined CORDA_ starting properties and handle accordingly. Fixed a bug where wrong class was used for reflection walking.
* Fix for detekt issues.
* Changed message to a more understandable one.
* Changelog + doc note, console error grammar.
* Changes according to PR review.
* Fixed wrong command line. Added security policy how to.
* Elaborated on security policy to mention when it applies and when not.
* Changes according to PR review.
* Magic to get jolokia version from a single place and forwarded to the docs.
* Fix for CORDA-3315. Removed default implementation of partyFromKey and replaced with implementations in IdentityService sub-types.
* Added test.
* Added missing DB transaction to append only persistent map lookup.
* Fixed not utilising the observables being returned by stateMachines added response with a notUsed(). Opening a ticket for implementation investigation.
* stateMachinesFeed will unsubscribe on interrupt rather than remaining infinitely subscribed.
* Fixed reported detekt issues on the InteractiveShell.
* Changes according to PR review.
* Capture and log "nodeInfo" persistence failures, whilst maintaining an optimistic retry mechanism.
* Additional test cases (update and insert)
* Handle both updates and inserts consistently (single transaction for happy path)
* Fix detekt violations and update baseline with false detection.
* Streamline the code a little.
* Update baseline reporting false violation.
* O/S version of fix for slow running in 4.3
* Removal of IdentityServiceInternal from test classes
* Code review comments
* O/S version of fix for slow running in 4.3
* Removal of IdentityServiceInternal from test classes
* Code review comments
* Re-baselined Detekt
* Fixed warning
* Prevent fat packaging of platform-specific JDK tools.jar (required at compile time for Java 8).
* Catch Throwable.
* Re-instate "tools.jar" in corda.jar as JRE's do not ship with this platform dependency.
* Baseline "TooGenericExceptionCaught".
* Added general exception handler for capturing and exiting on all types of Virtual Machine errors.
* Minimize additional processing (use constant string for error message).
* Remove redundant DB specific thread termination handler.
* Incorporating PR review feedback.
* Add println before halt.
* Add println before halt ... to stderr
* Fix dta migration for PostgreSQL following changes for CORDA-3009 Invalid hash function used for PersistentIdentity in PersistentIdentityService
and ENT-4192 Identity service refactor for confidential-identities and accounts.
* Different table definition for PostgreSQL and other dbs in one changeset instead of running generic DDL and the specifically fix table in Postgres (in relation to CORDA-3009 Invalid hash function used for PersistentIdentity in PersistentIdentityService and ENT-4192 Identity service refactor for confidential-identities and accounts).
* CORDA-3232: Make backward compatible RPC client changes
Such that it will be able to talk to new and old server versions.
* CORDA-3232: Make backward compatible RPC server changes
Such that it will be able to talk to new and old client versions.
* CORDA-3232: Trick Detekt
* CORDA-3232: Integration test for multi-interface communication.
* CORDA-3232: Add legacy mode test.
* CORDA-3232: Making Detekt happier
* CORDA-3232: Fix Detekt baseline after merge with `4.3` branch
* CORDA-3232: Incrementing Platform version
As discussed with @lockathan
* CORDA-3232: Fix legacy test post platform version increment
* CORDA-3232: Use recursive logic to establish complete population of method names
* Revert "CORDA-3232: Incrementing Platform version"
This reverts commit d75f48aa
* CORDA-3232: Remove logic that conditions on PLATFORM_VERSION
* CORDA-3232: Making Detekt happier
* CORDA-3232: Few more changes after conversation with @mnesbit
* CORDA-3232: Make a strict match to `CordaRPCOps` on client side
Or else will fail:
net.corda.tools.shell.InteractiveShellIntegrationTest.dumpCheckpoints creates zip with json file for suspended flow
Flagging that `InternalCordaRPCOps.dumpCheckpoints` cannot be called.
* CORDA-3232: Address PR comments by @rick-r3
* CORDA-3232: Address further review input from @rick-r3
* Change the way how methods stored in the map;
* Extend test to make sure that `CordaRPCOps` can indeed be mixed with other RPC interfaces.
* Generalise participant parsing code & additional test cases.
* Use a common predicate to expand the participants query (when specified more than once - eg. in fungible and linear query criteria).
* Introduce some re-usable functions.
* Additional code clean-up and improvements.
* Fix detekt MaxLineLength errors.