Observers registered on NodeVaultService#rawUpdates, if they throw an exception when called from serviceHub#recordTransactions and if this exception is not handled by the flow hospital, then this leads to the transaction not being recorded in the local vault. This could get the ledger in an out of sync state.
In the specific case this happens within FinalityFlow#notariseAndRecord this leads to the transaction being notarized but not recorded in the local vault nor broadcasted in any counter party. The -failed to be recorded locally- transaction and its output states are not visible to any vault, and its input states not able to consumed by a new transaction, since they are recorded as consumed within the Notary. In this specific case we need not loose, by any means, the current transaction.
We will handle all cases by catching all exceptions thrown from serviceHub#recordTransactions, wrapping them with a HospitalizeFlowException and throwing it instead. The flow will get to the hospital for observation to be retried from previous checkpoint on next node restart.
Use `flowId` from `ExternalMessageEvent` when failing to init sessions instead of generating
a new random UUID. The a `flowId` is generated and stored inside the event after the
state machine work that done previously.
Cap the default size of the external operation thread pool to 10 or
the maximum number of available processors, whichever is smaller.
Set the minimum size of the thread pool to 1. Meaning that only a
single thread is used unless the node actually starts to use
`FlowExternalOperation` which consumes threads from this pool.
* CORDA-2942: Allow exception from `CordaService` creation to propagate
It will ultimately be thrown from Node's `start()` method terminating the node start-up sequence.
* CORDA-2942: Be lenient when retrievign the name of the Notary
Some tests setup such that they do nto have Notary running.
* CORDA-3549: Improve stability of `CordaServiceLifecycleFatalTests`
* CORDA-3549: Bump-up reps count to ensure that test is definitely not flaky when executed by CI
(once proved the number of reps will be reduced)
* CORDA-3549: Making Detekt happier
* CORDA-2942: Ensure `NodeLifecycleEventsDistributor` cleans-up smoothly when node shuts down
Deprecate FlowAsyncOperation and reimplement public versions FlowExternalOperation and FlowExternalAsyncOperation.
await added to FlowLogic to allow easy calling from both Java and Kotlin. There are two overrides of await (one for FlowExternalOperation and FlowExternalAsyncOperation).
Implementations of FlowExternalOperation return a result (written as blocking code) from their execute function. This operation will then be executed using a thread provided by the externalOperationExecutor.
Implementations of FlowExternalAsyncOperation return a future from their execute function. This operation must be executed on a newly spawned thread or one provided by a thread pool. It is up to developers to handle threading in this scenario.
The default thread pool (externalOperationExecutor) can be configured through the flowExternalOperationThreadPoolSize node config.
The current implementation leaves FlowAsyncOperation alone, meaning that any developers that have used it (even though it is internal) won't need to change their apps. If this was not concern I would delete it completely and replumb the state machine code. Instead, it has been marked with @DoNotImplement and executeAsync is annotated with @Deprecated
* CORDA-2942: Port minimal set of changes to make lifecycle events work
... and make codebase compile.
* CORDA-2942: Undo some changes which are not strictly speaking necessary
* CORDA-2942: Make `NodeServicesContext` leaner and delete `extensions-api` module
* CORDA-2942: Reduce even more number of files affected
* CORDA-2942: Integration test fix
* CORDA-2942: Make events `AfterStart` and `BeforeStop` generic w.r.t. `NodeServicesContext`
* CORDA-2942: `NodeLifecycleObserverService` and a set of integration tests.
Public API violations are expected as well as integration tests failing.
* CORDA-2942: Re-work to introduce `ServiceLifecycleObserver`
* CORDA-2942: Explicitly mention a type of exception that may be thrown for some events.
* CORDA-2942: Register `ServiceLifecycleObserver` through `AppServiceHub`
* CORDA-2942: Fix integration test + KDocs update
* CORDA-2942: Detekt and `api-current` update
* CORDA-2942: Improvement to `CordaServiceLifecycleFatalTests`
... or else it has side effects on other tests.
* CORDA-2942: Add an integration test for new API use in Java
Driver test is written in Kotlin, but services definition is written in Java.
Also KDocs improvements.
* CORDA-2942: Documentation and release notes update
* CORDA-2942: First set of changes following review by @mnesbit
* CORDA-2942: Second set of changes following review by @mnesbit
* CORDA-2942: Added multi-threaded test
* CORDA-2942: Fixes
* CORDA-2942: Undo changes to `api-current.txt`
* CORDA-2942: Bare mimimum change to `api-current.txt` for CI gate to pass.
* CORDA-2942: Address review feedback from @rick-r3
* CORDA-2942: Detekt update
* CORDA-2942: Delete `ServiceLifecycleObserverPriority` and replace it with `Int` after discussion with @mnesbit
* CORDA-2942: Introduce more `NodeLifecycleEvent` and switch services to listen for those events
* CORDA-2942: Few more changes after input from @rick-r3
* First stub on integration test
Unfinished - hang on issue and pay
* CORDA-2942: Switch to use out-of-process nodes for the inetgration test
Currently Alice and Notary stuck waiting to hear from each other.
* CORDA-2942: Extra log lines during event distribution
* CORDA-2942: Asynchronously distribute lifecycle events
* CORDA-2942: Await for complete P2P client start-up
Next step: Add vault query to integration test
* CORDA-2942: Asynchronously distribute lifecycle events
Next step: Improve integration test
* CORDA-2942: Fix test broken by recent changes and improve logging
* CORDA-2942: Improvement of the test to be able to monitor actions performed by @CordaService in the remote process
* CORDA-2942: Add node re-start step to the integration test
* CORDA-2942: Remove `CORDAPP_STOPPED` event for now
* CORDA-2942: s/CORDAPP_STARTED/STATE_MACHINE_STARTED/
* CORDA-2942: Inverse the meaning of `priority` as requested by @rick-r3
* CORDA-2942: Register `AppServiceHubImpl` for lifecycle events and put a warning when SMM is not ready.
* Do not register cordapp custom serialisers when using attachment classloader.
* Record the URLs of CorDapp JARs that contain custom serialisers. Include these JARs as extra attachments if we discover that we're missing a custom serialiser during transaction verification.
* Check for disabled serializer when explicitly requesting a custom serializer.
Refactor test case to force use of a custom serializer.
* Tidy up basic custom serializer test.
* Also test that TransactionBuilder rejects missing custom serializers.
* Remove test whitelists, which should not be needed with custom serialisers.
* Add changelog entry. Also align TestCordappImpl.findRoots() with OS backports.
* Second approach based around CorDapps inside AttachmentStorage - report missing type descriptor or any non-composable types.
* Initial implementation of Corda-Fixup rules inside a CorDapp jar.
* Replace original "automatic attachment fixing" mechanism completely.
* First review comments: restore "missing class" logic to TransactionBuilder.
* Restore "missing class" mechanism as fallback for SignedTransaction too.
* CORDA-3507: Use the config value for connectionRetryInterval rather than a hardcoded value
* CORDA-3507: Use the config value for connectionRetryInterval rather than a hardcoded value
* CORDA-3452: Node: Configure the input of custom string in CSR to be used by Identity Service
* CORDA-3452: Remove unused import
* CORDA-3452: Add test for networkServices configuration
* [CORDA-3436] Allow CorDapps access to node diagnostic information
* [CORDA-3436] Fix API breakages
* [CORDA-3436] Improve documentation around diagnostics service
* [CORDA-3436] Remove CorDapps from the diagnostics information
* [CORDA-3436] Silence detekt warning
* CORDA-3513: Don't try to reconnect for PermissionExceptions
* CORDA-3513: Don't try to reconnect for PermissionExceptions
* CORDA-3513: Add test for not reconnecting for PermissionExceptions
* CORDA-3513: Update exception message and test
* CORDA-2942: Switch to use predictable timestamp
* CORDA-2942: Validate content of dumped checkpoint
* CORDA-2942: First stub on the integration test
(no checkpoints dumped for some reason using RPC)
* CORDA-2942: Reduce checkpointing code to bare minimum
* CORDA-2942: Minor refactoring
* CORDA-2942: Verify dump checkpoint content
* ENT-4382: Move `InvocationHandlerTemplate` into `core`
This is an internal helper which is general enough and does not have any Node specific code.
* ENT-4382: Make @CordaInternal applicable to classes
And apply it on `AttachmentTrustCalculator` which is `core/internal` interface anyway.
* ENT-4237: Added timestamp to the node_transactions table.
* ENT-4237: Clock for timestamp now retrieved from ServiceHub. And now record verification time as well.
* ENT-4237: Fixed tests. Also enabled stream output in allParallelIntegrationTest.
* ENT-4237: Changed timestamp to a val.
* ENT-4237: Changed streamOutput to false for allParallelIntegrationTest
* ENT-4237: Unit tests added for new timestamp column. Also now passing a clock into DBTransactionStorage.
* ENT-4237: Added more unit tests to check timestamp
* ENT-4237: Fix test to actually change clock time when testing transaction time does not change.
* Introducing a new type of exception and a new hospital staff member to pause flows by immediately hospitalising them.
* Renaming exception to "HospitalizeFlowException".
* Making HospitalizeFlowException an open class.
* Overloading constructors of HospitalizeFlowException to be available in Java.
* Using Throwable#mentionsThrowable.
* Moving HospitalizeFlowException in its own file.
* Update kdocs for HospitalizeFlowException and StaffedFlowHospital#SedationNurse.
* Added tests, testing various HospitalizeFlowException types thrown.
* Fix Detekt issues.
* Imports optimizing.
* Add safe casting.
* Update api-flows and node-flow-hospital docs.
* Minor code comment change.
* Add DOCSTART-DOCEND signs in HospitalizeFlowException for makeDocs. It is referenced by api-flows.rst.
* Minor change in note.
* Code formatting.
* Remove comment.
* Remove if statement that makes example worse.
* Remove redundant comment.
* Moving 'Internal Corda errors' at the bottom.
* Changing node-flow-hospital.rst as per review.
* Change HospitalizeFlowException description as per review.
* Adding an example for FlowException.
* Minor indentation fix.
* Update FlowException example label as per review.
* Correcting handling of custom exception.
* Harmonize serialization/core and deterministic counterparts
* Fix test for changed private alias key behaviour
* Detekt errors
* roll back project.xml
* CORDA-3471: Create `CordaTransactionSupport` and use wherever possible instead of `CordaPersistence`
* CORDA-3471: Address comments by @mnesbit
- Relocate `CordaTransactionSupport` to `core`
- Create a lighter version of transaction - `VaultTransaction` that gives access to `session` object only.
* CORDA-3471: More changes after discussion with @mnesbit
- Rename `VaultTransaction` into `SessionScope`.
* CORDA-3471: Revert changes to most of the files after conversation with @mnesbit and @rick-r3
* CORDA-3471: Introduce `CordaTransactionSupportImpl` and make it accessible via `AppServiceHub`.
* CORDA-3471: Minor change (comment).
* CORDA-3471: Address input from @mnesbit
* CORDA-3471: Address input from @rick-r3
* CORDA-3471: Make Detekt happier
* CORDA-3471: Add a new test that proves transactions can be started from client threads
As requested by @mnesbit
* CORDA-3471: Change log and documentation update.
As requested by @mnesbit
* CORDA-3464: Also scan attachment:// URLs for custom serializers.
* Only scan the given classloader - ignore this classloader's parents.
* Upgrade to ClassGraph 4.8.58 - for "robustness fixes".
* Register the attachment:// URL scheme using AttachmentsClassLoader.
* Add integration test for custom serializer in contract state.
* Rename Currancy -> Currantsy, just to make the point.
* CORDA-3356 Subflow ledger consistency tests + move statemachine tests to slow integration tests
Add tests for subflows that fail during transitions.
Split out `StatemachineErrorHandlingTest` into a series of smaller tests.
Move these tests into the `integration-test-slow` category so they are
not run against every PR.
* CORDA-3356 Fix detekt issue
* CORDA-3356 Tidy test names
* Added a timestamp property to Checkpoint getting a new Instant.now() value at every Checkpoint instantiation/ copy instantiation. FlowMonitor is now using this new property (Checkpoint#timestamp) and StateMachineState#isFlowResumed to determine which flows are actually suspended. It leaves out flows that are doing work in their FlowLogic#call method.
* Cleaner comment
* Broke FlowMonitor#logFlowsWaitingForParty into logFlowsWaitingForParty and waitingFlowsToDurations. This way waitingFlowsToDurations is modular and can be tested.
Made FlowMonitor constructor get StateMachineManager instead of the retrieveFlows lamda. This way FlowMonitor is more consistent as a service, and entire flow filtering process is now being done in FlowMonitor#waitingFlowsToDurations.
Removed "smm as? StateMachineManagerInternal" in AbstractNode#start as it made no sense.
Updated CheckpointDumper to mention the Checkpoint#timestamp when writing the checkpoint as json.
* Added tests for FlowMonitor service.
* Remove old comment
* 1. FLowMonitor#waitingFlowDurations now returns a Sequence to have an iteration less.
It used to be, one iteration from returning a Set from FLowMonitor#waitingFlowDurations plus one iteration from FlowMonitor#logFlowsWaitingForParty.
2. Code reformattings
* 1. Remove constructor keyword from FlowMonitor
2. Code reformattings
3. Update detekt baseline
* Resolve conflict in Detekt baseline
* Revert "Revert "CORDA-3307 - add support for environment variables in linux (#5523)" (#5643)"
This reverts commit 03ab258fc2.
* Env variables with underscore are now validated using schema validation and checking for unknown key errors.
* Resolving comments from PR review.
* Fix for deprecated import.
* Reworked logic according to PR review.
* Resolved bad string parsing problems where the json structure could be broken if some symbols were included in the key or value.
* Quick and dirty change to stop "Unable to start notaries." error message (#5686)
"Unable to start notaries. A required port might be bound already" is
returned whenever a startup error occurs while starting the notary nodes
in driver tests. This hides the real error.
This change prints the actual error to std_err and read from file
at a later point. This means the real error is not lost and will be
shown in failed builds.
* Suppress detekt warnings
This is to potentially help with debugging in the future as the
`flowId` could become confusing for received messages where the `flowId`
has nothing to do with the current flow.
* * CORDA-2876: Migrate DJVM serialization modules into Corda.
* Pre-generate Corda classes for DJVM sandbox when node boots in production mode.
* Ensure that all DJVM test contract CorDapps are signed.
* Test examining attachments within DJVM sandbox.
* Test Contract.verify() using cryptographic verify function.
* Add test cases for more non-determinism in Contract.verify().
* Update node-driver to support testing nodes with DJVM support.
* Modify Node to allow alternative DJVM configurations for testing.
* Refactor DeterministicVerifierFactoryService for default use-case.
* Small whitespace and code-style refactors.
* Create and activate a DJVM execution profile for the Node.
* Revert making Verifier implement AutoCloseable.
* Allow the node to cache sandboxed Corda byte-code for reuse.
* Use updated Quasar agent that knows not to touch DJVM classloaders.
* Fix Quasar's package exclusions globs for DJVM.
* Deserialise LedgerTransaction into the sandbox for Contract.verify().
* Add the DJVM's serialisation modules to the Corda node.
* Update the node for the latest DJVM API, and preserve the ConstructorForDeserialization annotation on user contract classes.
* Add corda-dev to repositories while DJVM is SNAPSHOT.
* Migrate DJVM specialisation into AbstractNode's ServiceHubInternalImpl.
* Exclude sandbox.** and shaded djvm.** classes from Quasar agent.
* Add the corda-dev repository to :node for the deterministic runtime.
* Turn Verifier into an abstract base class that is specialised by BasicVerifier and DeterministicVerifier.
* Add the Corda deterministic libraries to the Node, and split the DJVM sandbox across two SandboxClassLoader instances.
* Add DJVM to contract verification path inside Corda Node.
* Minor lambda simplifications and removing unused import.
* CORDA-2871: Remove @CordaSerializable from LedgerTransaction.
* CORDA-2871: Add a callback to ServicesForResolution to allow the Node to modify a LedgerTransaction object.
* CORDA-2871: Refactor the contract verification code into a separate class,
and allow LedgerTransaction to choose different Verifier objects.
* Update DJVM to use Corda 4.4-SNAPSHOT. (#95)
* CORDA-3330: Allow DJVM to preload / pregenerate classes from selected jars. (#92)
* Add support for SourceClassLoader.getResources() to DJVM.
* Allow a SandboxConfiguration to preload sandbox byte-code for all classes inside jars containing META-INF/DJVM-preload.
* CORDA-3309: Remove explicit try-catch in favour of UncaughtExceptionHandler. (#91)
* CORDA-3309: Install UncaughtExceptionHandler for DJVM tasks. (#88)
* Fix tests broken by Windows line endings. (#82)
* CORDA-3292: Reimplement ExecutionProfile as a data class. (#80)
* CORDA-2877: Refactor how we create child SandboxConfiguration objects. (#76)
* CORDA-2877: Load bytecode from a persistent cache to prevent repeated rewriting. (#75)
* Refactor byte-code cache to SandboxConfiguration instead of AnalysisConfiguration. We cannot "mix and match" byte-code generated by different sets of rules.
* CORDA-3137: Enhance annotation handling so that we can allow some annotations to be mapped into the sandbox without also needing to be stitched. (#72)
* CORDA-2871: Minor cosmetic fixes. (#69)
* CORDA-3218: Align DJVM with internal Corda Serialisation API. (#68)
* Ensure we get the latest SNAPSHOT of the serialisation code.
* CORDA-2871: Refactor SourceClassLoader to define source classes. (#66)
* Rewrite SourceClassLoader to support parent/child relationships.
* Revert catching TypNotPresebtException - it was a symptom of a bigger problem.
* Remove AutoCloseable from AnalysisConfiguration and SourceClassLoader.
* SourceClassLoader.getResource() must delegate to its parent first.
* CORDA-2871: Ensure ClassLoader.loadClass() throws ClassNotFoundException for all cases where the class cannot be found. (#64)
* CORDA-2871: Modify sandbox tasks to implement both java.Function and sandbox.Function (#62)
* Make TaskExecutors implement BiFunction to make them composable.
* Create ImportTask to wrap a java.Function inside a sandbox.Function.
* Add createExecutor() and createRawExecutor() APIs to SandboxClassLoader.
* Update serialization to use SandboxClassLoader.toSandboxClass().
* Remove a layer of lambdas from the serialisation code.
* Update SandboxExecutor and SandboxRawExecutor.
* Rename Executor to TaskFactory.
* Rename dangling executor -> taskFactory.
* CORDA-2871: Sanity fixes! (#63)
* Improve message for SandboxClassLoadingException.
* Fix serialisation API for using sandboxed environment.
* CORDA-3174: Extend serialisation to include InputStream and OpaqueBytesSubSequence. (#60)
* Update DJVM Example project for serialisation.
* Add serializers for InputStream and OpaqueBytesSubSequence.
* Support ZIP Inflater and CRC32 inside the sandbox.
* Allow the DJVM to wrap java.io.InputStream as sandbox.java.io.InputStream.
* Configure tests also to preserve @DeprecatedConstructorForDeserialization.
* CORDA-3174: Implement Corda serialization modules. (#59)
* Create DJVM serialization modules.
* Create test cases for Array<T>, List<T> and List<Array<T>>.
* Refactor SandboxPrimiveSerializer for all primitive types.
* Implement SandboxCollectionSerializer to support Collection types.
* Implement SandboxMapSerializer to support Map types.
* Attempt to fix infinite loop when computing Collection and Map fingerprints.
* Apply special handling when deserialising sandbox.java.lang.Character.
* Remap Java primitive types to sandbox Java object types to deter evolution.
* Use Class.getPackage().getName() to determine sandbox package name.
* Implement SandboxEnumSerializer to support Enum types.
* Implement SandboxPublicKeySerializer to support Java security keys.
* Add serialization projects to the composite example project.
* Implement serializers for BigInteger, BigDecimal, Currency and StringBuffer.
* Test that deserialising does not instantiate the untrusted user classes.
* Implement serializers for java.time.* types.
* Add serialiser for BitSet - currently disabled until BitSet itself is supported.
* Add serialisers for EnumSet and Class.
* Include support for EnumMap in the SandboxMapSerializer.
* Ensure the DJVM Example project's tests preserve @CordaSerializable.
* Add support for UUID as a primitive type.
* Use common abortReadOnly() method for declaring serialization as unsupported.
* Streamline the API for deserialising into the sandbox.
* Add preliminary support for deserialising X.509 certificates.
* Implement serializer for java.util.Optional.
* Refactor configuration of the sandbox serialization scheme.
* Add tests for deserialising arrays of basic types.
* Include method annotations in annotation stitching. This ensures that `@ConstructorForDeserialization` is not dropped.
* Enable test for SandboxBitSetSerializer.
* Enable tests for X.509 serializers.
* Implement serializers for ProtonJ primitive types.
* Serialize java.util.Date as a primitive type.
* Add the bintray Gradle plugin to the serialisation modules.
* Do not publish serialisation modules - they will become part of Corda itself.
* CORDA-2876: Only apply DJVM sources to Node Driver when devMode=true.
* Resolve DeteKT warnings.
* Require Node's JVM to set -Dnet.corda.djvm=true in order to enable DJVM.
* Enable DJVM for DemoBench nodes.
* Disable Quasar instrumentation verification for DemoBench nodes.
* Upgrade to DJVM 1.0-RC01.
* Try to modify DriverParameters in a more "ABI friendly" way.
* Refactor and simplify sandbox deserialisation of primitive objects.
* Review fixes.
* Update EvolutionSerializerFactory to handle sandboxed primitive boxed types.
* CORDA-3350: Increase size of constraints column (#5639)
* Detekt
* Update api file with new threshold
* Add check in transaction builder
* Revert "Add check in transaction builder"
This reverts commit ca3128f44c.
* Add check for max number of keys
* Update api file
* Address Tudor's comments
* Remove check for pre-5 and add test for EC keys
* fix typo and rename liquibase script
* updated docs with measurement numbers for composite keys
* Make detekt happy again
- Port ledger integrity work to `SingleThreadedStateMachineManager`
- Fix `StatemachineErrorHandlingTest`
- Fix compile errors in `RetryFlowMockTest` + `VaultObserverExceptionTest`
- Add method to `StaffedFlowHospital` that was missed during original merge
* CORDA-3194 Do not allow killed flows back into the hospital
This change has been made to prevent killed flows from being added back
to the hospital after being forcibly removed by `killFlow`. Not doing so,
could leave references to a flow inside of the hospital, which is not
the correct behaviour.
`killFlow` now sets a flow's `StatemachineState.isRemoved` to true.
This check is then used in `StaffedFlowHospital` and the
`DumpHistoryOnErrorInterceptor`.
* CORDA-3194 Log different message for transition error due to killed flow
When a flow is killed, its checkpoint is deleted. Currently, the
statemachine will still try a process the next event even if it has
been killed. This can lead to an error when trying to update the
deleted checkpoint. The exception thrown from this is logged out.
An if statement has been added to log a different message at debug level
if it is due to an update error for a killed flow. This is done to not
alarm node operators of the exception.
* CORDA-3194 Relax duplicate insert flow hospital handling
Revert a previous change to now make the duplicate insert staff member to
give a diagnosis of discharge or not my speciality (previously gave
terminal).
This is to prevent duplicate insert handling from overriding finality
flow error handling.
* added the warning as a TimerTask at StaffedFlowHospital#delayedDischargeTimer
* moved the scheduling of the warning task at StaffedFlowHospital#init block. That way we ensure that the task will be scheduled only once at StaffedFlowHospital initialization.
* Corrected overnight observation warning task's logging message. Changed StaffedFlowHospital#delayedDischargeTimer to the more generic StaffedFlowHospital#hospitalJobTimer since it now schedules delayed discharges tasks as well the overnight observation warning task. Removed this from property reference
* switching to fun timerTask for the instantiation of anonymous TimerTask classes
* Correct condition to log patients who are currently in the hospital, whose last record in their medical records is Outcome.OVERNIGHT_OBSERVATION. Extended logging to include treatableSessionInits staying in the hospital
* Add not empty check for patientsUnderOvernightObservation. Correct strings.
When a flow fails to retry, it should be kept in for overnight observation and aborted.
In the future, it might be possible to retry flows again that failed during their retry, but for now keeping for observation and aborting is satisfactory.
* CORDA-3194 Remove hospitalised flows from `HospitalisingInterceptor`
Small refactor to remove some of the hospital logic out of the
`HospitalisingInterceptor` and into the `StaffedFlowHospital`.
Add some comments to help clarify the purpose of the two maps inside
of the hospital.
* CORDA-3194 When a flow fails to retry force it into observation
When a flow fails to retry, it should be kept in for overnight
observation and aborted.
In the future, it might be possible to retry flows again that failed
during their retry, but for now keeping for observation and aborting is
satisfactory.
* CORDA-3194 Test for database commit failure when retrying a flow
Failing during the database commit failure that occurs after the retry
flow action does not stop the flow from actually retrying. This test
confirms this functionality.
The retried flow gets scheduled as part of the retry action. The failure
in the commit action does not prevent this since it has already been
scheduled.
* CORDA-3194 Replay start flow events when responding flow fails initial checkpoint commit
Logic has already been added to recover from initial checkpoint commit
failures on the initiating flow side but this did not suffice for
the same failure occurring on the responding flow's side.
The same idea has been added to resolve the responding flow's issue.
`ExternalMessageEvent` now has a `flowId` that is maintained on the
event. Messages can then be replayed to start/restart the flow, while
the event provides the flow id to each flow start.
Each `ExternalMessageEvent` implementation generates a random `flowId`
when constructed.
Events are stored in Artemis. This allows the solution to recover across
node restarts as the events will be pulled from artemis again when
restarting.
In the future `flowId`s will probably moved off of the events and
generated purely on the responding flow's node.
* CORDA-3194 Add test to verify that errors removing a responding flow are recoverable
* Unwrap rx.OnErrorNotImplementedException so the hospital can handle the cause appropriately
* Add db failure cordapp
* Renamed folders to avoid ambiguity in gradle
* Add integration test for exception hospitalisation when thrown from an RX observable.
* Make the test slightly cleaner
* Fix the schema to actually match the requirements for my custom state. Thanks a bunch, H2.
* Switch test to use SqlException base class.
* Schedule error event if we detect that a commit or db flush has thrown (forcing the flow to error even if customer code then goes ahead to swallow the exception)
* Revert change to schedule extra error
* Add more tests for edge case with DB exceptions, changed CorDapp to suppor this an hook in the flow hospital
* Warning about unsubscribe
Check state transitioned from clean to error for hospital admission.
* Match the test to our actual expectations
* Revert "Revert change to schedule extra error"
This reverts commit 43d47937
* Prevent suppression of errors arising in `transaction()` and `jdbcConnection()`
* Test for SqlException caught trying to escape from recordTransaction and suppressed outside being intercepted.
* More tests for various error/catch combinations
* Clean up and comments
* Code reformat
* Fix test compilation
Wrap exceptions that occur in state machine transitions with a custom exception type which is
then handled inside of the flow hospital. As part of this change, a number of side negative side
effects have been addressed.
General summary:
- `StateTransitionException` wraps exceptions caught in `TransitionExecutorImpl`
- `StateTransitionExceptions` are handled in the flow hospital, retried 3 times and then kept in
for observation if errors persist (assuming conditions below are false)
- Exceptions that occur in `FlowAsyncOperation` events are wrapped in
`AsyncOperationTransitionException` and ignored by the flow hospital transition staff member
- `InterruptException`s are given a `TERMINAL` diagnosis by the flow hospital transition staff
member (can occur due to `killFlow`)
- Allow flows which have not persisted their original checkpoint to still retry by replaying their
start flow messages
- Swallow exceptions in `AcknowledgeMessages` actions
Detailed summary:
* CORDA-3194 Add state machine transition error handling to flow hospital
Wrap exceptions that are caught in `TransitionExecutorImpl` (coming from
new errors) with `StateTransitionException`. This exception is then
handled explicitly by the flow hospital.
Add `TransitionErrorGeneralPractitioner` to `StaffedFlowHospital`. This
staff member handles errors that mention `StateTransitionException`.
Errors are retried and then kept in the hospital if the errors persist.
* CORDA-3194 Remove a fiber from the `hospitalisedFlows` if its previous state was clean
If the fiber's previous state was clean then remove it from
`HospitalisingInterceptor.hospitalisedFlows`. This allows flows that are
being retried to clean themselves. Doing this allows them to re-enter
the flow hospital after executing the fiber's transition (if an error
occurs).
This is important for retrying a flow that has errored during a
transition.
* CORDA-3194 Set `isAnyCheckpointPersisted` to true when retrying a flow
Added to prevent a single flow from creating multiple checkpoints when
a failure occurs during `Action.AcknowledgeMessages`.
More specifically, to `isAnyCheckpointPersisted` is false when retrying
the flow, even though a checkpoint has actually been saved. Due to this
a brand new flow is started with a new flow id (causing duplication).
Setting `isAnyCheckpointPersisted` to true specifically when retrying a
flow resolves this issue.
* CORDA-3194 Add Byteman test to verify transition error handling
Add `StatemachineErrorHandlingTest` to verify transition error handling.
Byteman allows exceptions to be injected at certain points in the code's
execution. Therefore exceptions can be thrown when needed inside of the
state machine.
The current tests check errors in events:
- `InitiateFlow`
- `AcknowledgeMessages`
* CORDA-3194 Swallow all exceptions in `ActionExecutorImpl.executeAcknowledgeMessages`
Swallow the exceptions that occur in the `DeduplicationHandler`s when
inside of `ActionExecutorImpl.executeAcknowledgeMessages`.
The side effects of the failures that can happen in the handlers are
not serious enough to put the transition into a failure state.
Therefore they are now caught. This allows the transition to continue
as normal, even if an error occurs in one any of the handlers.
* CORDA-3194 Wrap unexpected exceptions thrown in async operation transitions
Exceptions thrown inside of `FlowAsyncOperation.execute` implementations
that are not returned as part of the future, are caught, wrapped and
rethrown. This prevents unexpected exceptions thrown by (most likely)
user code from being handled by the hospital by the transition
staff member.
This handling might change moving forward, but it allows the async
operation to continue working as it was before transition error handling
was added.
* CORDA-3194 Verify that errors inside of `AcknowledgeMessages` work as expected
Update `StatemachineErrorHandlingTest` to correctly test errors that
occur when executing the `AcknowledgeMessages` action.
* CORDA-3194 Retry flows that failed to persist their original checkpoint
Allow a flow that failed when creating their original checkpoint (for
example - failing to commit the db transaction) to retry.
The flow will create a brand new checkpoint (as the original did not
saved).
This required adding `flowId` to `ExternalStartFlowEvent` to allow the
event to keep a record of the flow's id. When the flow is retried, the
events are replayed which trigger a flow to be started that has the
id stored in the event.
To allow this change, code was removed from `retryFlowFromSafePoint` to
allow the function to continue, even if no checkpoint matches the passed
in flow id.
* CORDA-3194 Correct `FlowFrameworkTests` test due to error handling
Test assumed that errors in transitions are not retried, this has now
been updated so the test passes with the flow succeeding after an
exception is thrown.
* CORDA-3194 Remove unneeded import
* CORDA-3194 Make the state transition exceptions extend `CordaException`
`StateTransitionException` and `AsyncOperationTransitionException` now
extend `CordaException` instead of `Exception`.
* CORDA-3194 Improve log messages
* CORDA-3194 Remove unneeded code in `HospitalisingInterceptor`
Due to a previous change, a section of code that removes a flow id
from the `hospitalisedFlows` map is no longer required. This code has
been removed.
* CORDA-3194 Constraint violations are given `TERMINAL` diagnosis
Add `Diagnosis.TERMINAL` to `StaffedFlowHospital` to allow an error
to be ignored and left to die a quick and painful death.
`StateTransitionException` changed so it does not cause serialisation
errors when propagated from a flow.
* CORDA-3194 `InterruptedExceptions` are given `TERMINAL` diagnosis
* Add GP to flow hospital, and start working on a list of things the GP knows to be incurable.
* Only hospitalise SQL and Persistence Exceptions (let's see if that is enough?), also rename to DatabaseDentist.
* Disabled hospitalisation of SQL exceptions in flow retry tests
* Fix RPC exception handling test by not using PersistenceException
* Ignore flaky integration test
* Code review: Rename staff member and add testing annotation
* Revert compiler.xml
* Added a new way for environment variables to be loaded, which allows for underscore based separation.
* Moved test to its own kotlin file.
* Added case insensitivity support.
* The corda. prefix is now case insensitive too.
* Removed unused variable.
* Added env variables support for driverDSL. Shadowing corda. properties raises an exception.
* Driver api stability fix.
* Changed type of cordapps param to reflect the real one, rather than what IntelliJ auto completed.
* Some detekt issue fixes. Spread operator removed, baselined api stability constructors and buggy line.
* Fixed misspelled variable.
* Reverted unintentional changes.
* Added suppress instead of changing baseline.
* Reworked logic to handle previously defined CORDA_ starting properties and handle accordingly. Fixed a bug where wrong class was used for reflection walking.
* Fix for detekt issues.
* Changed message to a more understandable one.
* Changelog + doc note, console error grammar.
* Changes according to PR review.
* Fixed wrong command line. Added security policy how to.
* Elaborated on security policy to mention when it applies and when not.
* Changes according to PR review.
* Magic to get jolokia version from a single place and forwarded to the docs.
* Fix for CORDA-3315. Removed default implementation of partyFromKey and replaced with implementations in IdentityService sub-types.
* Added test.
* Added missing DB transaction to append only persistent map lookup.
* Fixed not utilising the observables being returned by stateMachines added response with a notUsed(). Opening a ticket for implementation investigation.
* stateMachinesFeed will unsubscribe on interrupt rather than remaining infinitely subscribed.
* Fixed reported detekt issues on the InteractiveShell.
* Changes according to PR review.
* Capture and log "nodeInfo" persistence failures, whilst maintaining an optimistic retry mechanism.
* Additional test cases (update and insert)
* Handle both updates and inserts consistently (single transaction for happy path)
* Fix detekt violations and update baseline with false detection.
* Streamline the code a little.
* Update baseline reporting false violation.