Added a command-line option, `--pause-all-flows`, to the node to control this.
This mode causes all checkpoints to be set to status PAUSED when the
state machine starts up (in StartMode.Safe mode).
Changed the state machine so that PAUSED checkpoints are loaded into
memory (the checkpoint is deserialised but the flow state is left serialised)
but not started.
Messages from peers are queued whilst the flow is paused and processed
once the flow is resumed.
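The queueing behaviour described above can be pictured with a small sketch. This is only an illustration under stated assumptions: `PausedFlowSketch`, its fields and the use of plain strings as messages are hypothetical stand-ins, not the node's actual classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a paused flow; not part of the Corda code base.
final class PausedFlowSketch {
    private final Deque<String> queued = new ArrayDeque<>();   // messages held while paused
    private final List<String> processed = new ArrayList<>();  // messages the flow has handled
    private boolean paused = true;                              // the flow is loaded but not started

    /** Peer messages are queued while paused and handled directly once running. */
    void onMessage(String message) {
        if (paused) {
            queued.add(message);
        } else {
            processed.add(message);
        }
    }

    /** Resuming drains the queue in arrival order before anything new is handled. */
    void resume() {
        paused = false;
        while (!queued.isEmpty()) {
            processed.add(queued.poll());
        }
    }

    List<String> processed() {
        return processed;
    }
}
```

Nothing is handled until `resume()` is called, and messages keep their arrival order.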
When a non-database exception is thrown out of a `withEntityManager`
block, always check if the session needs to be rolled back.
This means if a database error is caught and a new non-database error is
thrown out of the `withEntityManager` block, the transaction is still
rolled back. The flow can then continue progressing as normal.
* CORDA-3715: When loading cordapps now check that contract classes have class version between 49 and 52
* CORDA-3715: Now check class version when contract verification takes place.
* CORDA-3715: Making detekt happy with number of levels in func
* CORDA-3715: Make use of new ClassGraph release which provides class file major version number.
* CORDA-3715: Changed package name in test jars
* CORDA-3715: Use ClassGraph when loading attachments.
* CORDA-3715: Reverted file to 4.5 version
* CORDA-3715: Updating method to match non deterministic version.
* CORDA-3715: Added in default param.
* CORDA-3715: Adjusted min JDK version to 1.1
* CORDA-3715: Switching check to JDK 1.2
* CORDA-3715: Now version check SerializationWhitelist classes.
* CORDA-3715: Switched default to null for range.
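As a rough illustration of the version check, the major version sits in bytes 6 and 7 of a class file, after the `0xCAFEBABE` magic number and the minor version. The sketch below reads it by hand rather than through ClassGraph, and uses the 49 to 52 range (Java 5 to Java 8) from the first commit; later commits in this list adjust the lower bound. `ClassVersionCheck` and its method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical helper; the real change obtains the version via ClassGraph.
final class ClassVersionCheck {
    static final int MIN_MAJOR = 49; // Java 5
    static final int MAX_MAJOR = 52; // Java 8

    /** Reads the major version from the 8-byte class file header. */
    static int majorVersion(byte[] classBytes) {
        ByteBuffer header = ByteBuffer.wrap(classBytes); // big-endian by default
        if (header.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a class file");
        }
        header.getShort();                 // minor version, ignored here
        return header.getShort() & 0xFFFF; // major version as an unsigned 16-bit value
    }

    /** True when the class was compiled for a version inside the accepted range. */
    static boolean isSupported(byte[] classBytes) {
        int major = majorVersion(classBytes);
        return major >= MIN_MAJOR && major <= MAX_MAJOR;
    }
}
```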
* [EG-438] First commit of error code interface
* [EG-438] Implement error reporter and a few error codes
* [EG-438] Add unit tests and default properties files
* [EG-438] Add the error table builder
* [EG-438] Update initial properties files
* [EG-438] Add some Irish tests and the build.gradle
* [EG-438] Fall back for aliases and use different resource strategy
* [EG-438] Define the URL using a project-specific context
* [EG-438] Tidy up initialization code
* [EG-438] Add testing to generator and tidy up
* [EG-438] Remove direct dependency on core and add own logging config
* [EG-438] Fix compiler warnings and tidy up logging
* [EG-438] Fix detekt warnings
* [EG-438] Improve error messages
* [EG-438] Address first set of review comments
* [EG-438] Use enums and a builder for the reporter
* [EG-438] Add kdocs for error resource static methods
* [EG-440] Add error code for duplicate CorDapp loading
* [EG-438] Handle enums defined with underscores
* [EG-440] Add errors for some CorDapp loading scenarios
* [EG-440] Finish adding errors for CorDapp loading
* [EG-440] Fix up errors in properties files
* [EG-440] Start change to error code definition
* [EG-440] Update error code definition and add resource generation tool
* [EG-440] Tidy up error resource generation tool frontend
* [EG-440] Small refactorings and add kdocs
* [EG-440] Generate all missing resources
* [EG-440] Some refactoring and start writing a test
* [EG-440] Update unit test for resource generator
* [EG-440] Renaming of various parts of the error tool
* [EG-440] Add testing for errors and fix an issue in resource generation
* [EG-440] Add a kdoc for context provider API
* [EG-440] Remove old code from repository
* [EG-440] Address some review comments
* CORDA-3291 `isKilled` flag and session errors for killed flows
## Summary
Two major improvements have been worked on:
- A new flag named `isKilled` has been added to `FlowLogic` to allow
developers to break out of loops without suspension points.
- Killed flows now send session errors to their counterparties allowing
their flows to also terminate without further coordination.
Achieving these changes required a __fundamental__ change to how flows are
killed as well as how they sleep.
## `isKilled` flag
The addition of `FlowLogic.isKilled` allows flows to check if the
current flow has been killed. They can then throw an exception to lead
to the flow's termination (following the standard error pathway). They
can also perform some extra logic or not throw an exception if they
really wanted to.
No matter what, once the flag is set, the flow will terminate. Due to
timing, a killed flow might successfully process its next suspension
event, but it will then process a killed transition and terminate.
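A loop without suspension points might use the flag roughly as follows. This is a self-contained sketch, not flow code: `KillableLoop` is a hypothetical stand-in, the `AtomicBoolean` plays the role of `FlowLogic.isKilled`, and `IllegalStateException` stands in for whatever exception a real flow would choose to throw.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a flow with a long-running loop.
final class KillableLoop {
    private final AtomicBoolean killed = new AtomicBoolean(false);

    void kill() {
        killed.set(true);
    }

    boolean isKilled() {
        return killed.get();
    }

    /**
     * Processes totalItems items, checking the flag before each one.
     * killAfter simulates an external kill arriving after that many items
     * (pass a negative value for "never killed").
     */
    int run(int totalItems, int killAfter) {
        int processed = 0;
        for (int i = 0; i < totalItems; i++) {
            if (isKilled()) {
                // Follow the normal error pathway by throwing out of the loop.
                throw new IllegalStateException("Flow killed after " + processed + " items");
            }
            processed++;
            if (processed == killAfter) {
                kill();
            }
        }
        return processed;
    }
}
```

The check costs one flag read per iteration, which is what lets a loop with no suspension points notice the kill at all.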
## Send session errors when killing a flow
A killed flow will now send session errors to all of its counterparties. They
are transferred as `UnexpectedFlowEndException`s. This allows initiated
flows to handle these errors as they see fit, although they should
probably just terminate.
## How flows are killed
### Before
Originally we relied on Quasar to interrupt a flow's fiber so that we
could handle the resulting `InterruptedException`. The problem with
this solution is that it only worked when a flow was already suspended
or when a flow moved into suspension. Flows stuck in loops did not work.
### After
We now *do not* use Quasar to interrupt a flow's fiber. Instead, we
switch `FlowStateMachine.isKilled` to true and schedule a new event.
Any event that is processed after switching this flag will now cause a
`KilledFlowTransition`. This transition follows similar logic to how
error propagation works. Note, the extra event allows a suspended flow
to be killed without waiting for the event that it was _really_ waiting
for.
This allows a lot of the tidy up code in `StateMachineManager.killFlow`
to be removed as tidy up is executed as part of removing a flow.
Deleting a flow's checkpoint and releasing related soft locks is still
handled manually in case of infinite loops but also triggered as part
of the actions executed in a transition.
This required flow sleeping to be changed as we no longer rely on
Quasar.
## How flows now sleep
The reliance on Quasar to make a flow sleep has been removed.
Instead, when a flow sleeps we create a `ScheduledFuture` that is
delayed for the requested sleep duration. When the future executes it
schedules a `WakeUpFromSleep` event that wakes up the flow... Duh.
`FlowSleepScheduler` handles the future logic. It also uses the same
scheduled thread pool that timed flows use.
A future field was added to `StateMachineState`. This removes the
need for concurrency control around flow sleeps as the code path does
not need to touch any concurrent data structures.
To achieve this:
- `StateMachineState.future` added as a `var`
- When the `ScheduledFuture` is created to wake up the flow the passed
in `StateMachineState` has its `future` value changed
- When resumed `future` and `isWaitingForFuture` are set to `null` and
`false` respectively
- When cancelling a sleeping flow, the `future` is cancelled and nulled
out. `isWaitingForFuture` is not changed since the flow is ending anyway
so really the value of the field is not important.
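The steps above can be sketched with a plain `ScheduledExecutorService`. This is a simplified illustration, not the `FlowSleepScheduler` implementation: `SleepSketch`, the nested `State` class, the string event and the daemon-thread scheduler are all assumptions made for the example, and setting `isWaitingForFuture` to `true` when the sleep starts is inferred rather than stated above.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of scheduling a wake-up event instead of blocking the fiber.
final class SleepSketch {
    /** Stand-in for the two relevant fields of StateMachineState. */
    static final class State {
        ScheduledFuture<?> future;
        boolean isWaitingForFuture;
    }

    private final ScheduledExecutorService scheduler;
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();

    private SleepSketch(ScheduledExecutorService scheduler) {
        this.scheduler = scheduler;
    }

    /** Uses a daemon thread so the example cannot keep the JVM alive. */
    static SleepSketch create() {
        return new SleepSketch(Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "flow-sleep-sketch");
            thread.setDaemon(true);
            return thread;
        }));
    }

    /** Schedules the wake-up event and records the future on the state. */
    void sleep(State state, long millis) {
        state.future = scheduler.schedule(
                () -> { events.add("WakeUpFromSleep"); }, millis, TimeUnit.MILLISECONDS);
        state.isWaitingForFuture = true; // assumption: set when the sleep starts
    }

    /** On resume both fields are reset. */
    void resume(State state) {
        state.future = null;
        state.isWaitingForFuture = false;
    }

    /** On cancel the future is cancelled and nulled; the flag is left alone. */
    void cancelSleep(State state) {
        if (state.future != null) {
            state.future.cancel(false);
            state.future = null;
        }
    }

    /** Waits for the next event; returns null if none arrives in time. */
    String awaitEvent(long timeoutMillis) {
        try {
            return events.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Because the future lives on the state object that is already passed through the code path, no shared map of sleeping flows is needed in this sketch.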
* [EG-438] First commit of error code interface
* [EG-438] Implement error reporter and a few error codes
* [EG-438] Add unit tests and default properties files
* [EG-438] Add the error table builder
* [EG-438] Update initial properties files
* [EG-438] Add some Irish tests and the build.gradle
* [EG-438] Fall back for aliases and use different resource strategy
* [EG-438] Define the URL using a project-specific context
* [EG-438] Tidy up initialization code
* [EG-438] Add testing to generator and tidy up
* [EG-438] Remove direct dependency on core and add own logging config
* [EG-438] Fix compiler warnings and tidy up logging
* [EG-438] Fix detekt warnings
* [EG-438] Improve error messages
* [EG-438] Address first set of review comments
* [EG-438] Use enums and a builder for the reporter
* [EG-438] Add kdocs for error resource static methods
* [EG-438] Handle enums defined with underscores
* [EG-438] Slight refactoring of startup code
* [EG-438] Port changes to error reporting code from future branch
* [EG-438] Also port test changes
* [EG-438] Suppress a deliberately unused parameter
* CORDA-3722 withEntityManager can rollback its session
## Summary
Improve the handling of database transactions when using
`withEntityManager` inside a flow.
Extra changes have been included to improve the safety and
correctness of Corda around handling database transactions.
This focuses on allowing flows to catch errors that occur inside an
entity manager and handle them accordingly.
Errors can be caught in two places:
- Inside `withEntityManager`
- Outside `withEntityManager`
Further changes have been included to ensure that transactions are
rolled back correctly.
## Catching errors inside `withEntityManager`
Errors caught inside `withEntityManager` require the flow to manually
`flush` the current session (the entity manager's individual session).
By manually flushing the session, a `try-catch` block can be placed
around the `flush` call, allowing possible exceptions to be caught.
Once an error is thrown from a call to `flush`, it is no longer possible
to use the same entity manager to trigger any database operations. The
only possible option is to roll back the changes from that session.
The flow can continue executing updates within the same session but they
will never be committed. What happens in this situation should be handled
by the flow. Explicitly restricting the scenario requires a lot of effort
and code. Instead, we should rely on the developer to control complex
workflows.
To continue updating the database after an error like this occurs, a new
`withEntityManager` block should be used (after catching the previous
error).
## Catching errors outside `withEntityManager`
Exceptions can be caught around `withEntityManager` blocks. This allows
errors to be handled in the same way as stated above, except the need to
manually `flush` the session is removed. `withEntityManager` will
automatically `flush` a session if it has not been marked for rollback
due to an earlier error.
A `try-catch` can then be placed around the whole of the
`withEntityManager` block, allowing the error to be caught while not
committing any changes to the underlying database transaction.
## Savepoints / Transactionality
To make `withEntityManager` blocks work like mini database transactions,
savepoints have been utilised. A new savepoint is created when opening
a `withEntityManager` block (along with a new session). It is then used
as a reference point to roll back to if the session errors and needs to
roll back. The savepoint is then released (regardless of whether the
block completed successfully or failed).
Using savepoints means that either all the statements inside the
entity manager are executed, or none of them are.
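The all-or-nothing behaviour can be illustrated without a database. The sketch below is an in-memory stand-in only: `SavepointSketch` is hypothetical, the list of strings plays the role of the outer transaction's pending statements, and an index into that list plays the role of the savepoint. A real implementation would go through JDBC's `Connection.setSavepoint`, `rollback(Savepoint)` and `releaseSavepoint`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory model of savepoint semantics; not Corda or JDBC code.
final class SavepointSketch {
    // Uncommitted statements belonging to the outer (flow) transaction.
    private final List<String> pending = new ArrayList<>();

    /** Runs the block as a mini transaction: all of its statements are kept, or none. */
    void withSavepoint(Consumer<List<String>> block) {
        int savepoint = pending.size(); // "create savepoint"
        try {
            block.accept(pending);
        } catch (RuntimeException e) {
            // "roll back to savepoint": discard only what this block added
            pending.subList(savepoint, pending.size()).clear();
            throw e;
        }
        // The savepoint is "released" here simply by going out of scope.
    }

    List<String> pending() {
        return pending;
    }
}
```

A caller that catches the exception around the block keeps everything recorded before it, which mirrors catching errors outside `withEntityManager`.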
## Some implementation details
- A new session is created every time an entity manager is requested,
but this does not replace the flow's main underlying database session.
- `CordaPersistence.transaction` can now determine whether it needs
to execute its extra error handling code. This is needed to allow errors
to escape `withEntityManager` blocks while allowing some of our exception
handling around subscribers (in `NodeVaultService`) to continue to work.
On node start, load CordaServices before starting the NotaryService,
so that the NotaryService can check that the services it requires are
available when starting.
Resolves #6172.
* CORDA-3762: Integration test exposing the problem reported
* CORDA-3726: Additional logging
* CORDA-3726: Prevent thread leaks
* CORDA-3726: New `journalBufferTimeout` parameter
* CORDA-3726: Override `journalBufferTimeout` parameter
* CORDA-3726: Making Detekt happier
* CORDA-3276: Account for extra thread user in MockNetwork
For a real node this does not matter as `shutdown` can safely be called multiple times, which is not true for the server thread provided by MockNetwork
* CORDA-3276: Do not make SMM shutdown "executor" as it belongs to AbstractNode
* CORDA-3276: Address input from @rick-r3
* CORDA-3276: Fix test after rebase
* adding blocked functions to RestrictedEntityManager and creating RestrictedConnection class
* adding flow tests and fixing issues regarding the review
* adding quasar util to gradle
* updating flow tests
* adding space before } at .isThrownBy()
* adding spaces
* [EG-503] Spent state audit tool
Fixes
* Refinements to notary query interfaces. Feature complete.
* EG-503: Introduce optional `notaryService` in `ServiceHubCoreInternal`
* Remove redundant logic following change to use extensions API
Co-authored-by: Viktor Kolomeyko <viktor.kolomeyko@r3.com>
* CORDA-3696: Temporary update to enable JDK11 build and test. Will eventually be switchable.
* CORDA-3696: Filter out the Nashorn warning.
* CORDA-3696: Add JDK11 classifier.
* CORDA-3696: Updated match string to cope with JDK11.
* CORDA-3696: Filtering out SPHINCS256_SHA256 where failing due to JDK11.
* CORDA-3696: Now remove SPHINCS256_SHA256 only if JDK11.
* CORDA-3696: Fix test failure - switch to regex matching.
* CORDA-3696: Hide the illegal access warnings.
* CORDA-3696: Check for Java11 when disabling Java11 warnings.
* CORDA-3696: Fix unnecessary non-null check.
* CORDA-3696: Reverting build env to JDK8
* CORDA-3696: Revert hiding of illegal access warnings via Unsafe class.
* CORDA-3696: Remove internal access warnings and new JDK11 version checker.
* CORDA-3696: Updated build file for OS
* CORDA-3696: Removed typo
* CORDA-3696: Fixed space typo.
* CORDA-3696: Open modules to remove the illegal access warnings.
Co-authored-by: Adel El-Beik <adelel-beik@19LDN-MAC108.local>
* CORDA-3691 Delete checkpoint when flow finishes
The checkpoint and its related records in joined tables should be deleted
when a flow finishes.
Keeping these flows around will be completed in the future.
* CORDA-3691 Ignore some flow metadata tests
Ignore tests around recording the finish time of flow metadata records
since we are not currently keeping COMPLETED flows in the database.
Flows that are kept for overnight observation:
- Save their Checkpoint.status as 'HOSPITALIZED' in the database
- Save the error that caused the hospitalization in the database
A new Event was added for this reason. Whenever the hospital decides to
hospitalize a flow, it adds this Event to the flow's fiber queue.
When processed it creates a new DB transaction, stores the checkpoint status along with
the error, and it adds a 'FlowContinuation.ProcessEvents' continuation so that the fiber keeps
processing events (effectively since there are no more events in the fiber's channel, the fiber will suspend).
Flows that error:
- Their checkpoints are kept in the database
- Save their Checkpoint.status as 'FAILED'
- Save the error that caused the failure in the database
Upon erroring, the flow's Checkpoint.status is updated to 'FAILED' and the checkpoint is stored
in the database instead of getting removed. The flow then propagates the error to counterparties,
sets its future with the error and gets removed from memory.
* ENT-4967: Require no classifier for corda-node-djvm, corda-deserializers-djvm.
* Also remove classifiers from core, serialization and finance-contracts.
* Compile corda-serialization-djvm for Java 8 and remove its classifier.
Added a new field Completed to the in-memory object FlowState.
FlowState.Completed corresponds to flow_state=Null in the DB.
This change will save disk space.
* Run serialisation tests with both in-process and out-of-process nodes.
* Add custom serialisers and whitelists to Driver's AMQPServerSerializationScheme.
* CORDA-3601 Record a flow's finish time
Record a flow's finish time by updating its metadata record. It is set
in `updateCheckpoint` by checking the status of the checkpoint. If it is
`COMPLETED` it will set the `finishInstant` on the metadata object and
update it.
* CORDA-3601 Record flow finish time for all finished statuses
Update the flow finish time for the following statuses:
- COMPLETED
- KILLED
- FAILED
* CORDA-3601 Use platform clock in `DBCheckpointStorage`
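The finish-time rule can be sketched as follows. This is an illustration under assumptions, not the `DBCheckpointStorage` code: `FinishTimeSketch` and `withFixedClock` are hypothetical, and the status names are limited to those mentioned in these commit messages. Taking the time from an injected `Clock` is what makes the behaviour testable.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for a flow metadata record.
final class FinishTimeSketch {
    enum Status { RUNNABLE, PAUSED, HOSPITALIZED, COMPLETED, KILLED, FAILED }

    // Statuses that count as "finished" for the purpose of recording a finish time.
    private static final Set<Status> FINISHED =
            EnumSet.of(Status.COMPLETED, Status.KILLED, Status.FAILED);

    private final Clock clock;
    Instant finishInstant; // stays null until the flow reaches a finished status

    FinishTimeSketch(Clock clock) {
        this.clock = clock;
    }

    /** Convenience factory with a clock frozen at the given epoch second. */
    static FinishTimeSketch withFixedClock(long epochSecond) {
        return new FinishTimeSketch(Clock.fixed(Instant.ofEpochSecond(epochSecond), ZoneOffset.UTC));
    }

    /** Sets the finish time only when the new status is a finished one. */
    void updateCheckpoint(Status status) {
        if (FINISHED.contains(status)) {
            finishInstant = clock.instant();
        }
    }
}
```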
* CORDA-3669 Do not execute `ExecuteAsyncOperation` multiple times
When a `FlowExternalOperation` or `FlowExternalAsyncOperation` executes
and completes, a flag (`isFlowResumed`) is switched to true.
This flag was used inside of `DoRemainingWorkTransition` to decide
whether to skip over the execution of an event.
Since this flag was being switched to true when the external operation's
future completed, it was possible for _unexpected_ events to be placed
in the fiber's queue that would retrigger the
`FlowIORequest.ExecuteAsyncOperation`, that is held as the checkpoint's
next `FlowIORequest` to process.
By using the existing `StateMachineState.isTransactionTracked` (and
renaming it to `isWaitingForFuture`) we can decide to not process the
`FlowIORequest.ExecuteAsyncOperation` if it has already been called
before. This moves this code path in line with
`FlowIORequest.WaitForLedgerCommit`.
Random `DoRemainingWork` events can now be pushed to the fiber's queue
without causing the `FlowIORequest.ExecuteAsyncOperation` to execute
again.
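The guard can be pictured with a minimal sketch. `AsyncOperationGuard` and its method names are hypothetical; only the `isWaitingForFuture` flag is taken from the description above, and the counter exists purely to make the effect observable.

```java
// Hypothetical model of skipping a request that has already been started.
final class AsyncOperationGuard {
    private boolean isWaitingForFuture = false;
    private int executions = 0;

    /**
     * Handles a DoRemainingWork-style event. The async operation is started
     * only if it is not already in flight; returns whether it was started.
     */
    boolean onDoRemainingWork() {
        if (isWaitingForFuture) {
            return false; // already started, so the extra event is ignored
        }
        isWaitingForFuture = true;
        executions++;
        return true;
    }

    /** Called when the operation's future completes and the flow resumes. */
    void onOperationCompleted() {
        isWaitingForFuture = false;
    }

    int executions() {
        return executions;
    }
}
```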
* CORDA-3596 Record flow metadata
Record flow metadata during the zeroth checkpoint that occurs before
calling the flow's `call` function.
This required adding an RPC call's arguments to the `InvocationContext`
that gets created. These arguments are then accessible within the
statemachine and from the `Checkpoint` class. The arguments are then
extracted when recording a flow's metadata inside of
`DBCheckpointStorage`.
Updated the size of the started by column to 128 since it was not long
enough to hold the fully qualified class of a service that started a
flow.
* CORDA-3596 Remove arguments from in-memory checkpoint
When executing a flow's first real suspend (from flow code) the arguments
contained in the `InvocationContext` are removed. This saves holding
these arguments for the whole lifecycle of a flow.
* CORDA-3596 Increase `cordapp_name` column to 128
* CORDA-3596 Join metadata by `flow_id`
Due to changes in where metadata is recorded, there is no need for
having `invocation_id` as the metadata table's primary key. The
`flow_id` is now the primary key of the table and is used to join to the
main checkpoints table.
The `invocation_id` has been removed from the checkpoints table since it
is not needed for the join anymore.
* CORDA-3596 Remove `received_time` from metadata table
* CORDA-3596 Remove unused `StartReason` enum
* CORDA-3596 Simple `DBCheckpointStorageTests` for metadata
* CORDA-3596 Truncate really long flow names
* CheckpointStorage.getAllCheckpoints will not fetch COMPLETED, FAILED and KILLED flows by default
* Rename getAllCheckpoints to getAllRunnableCheckpoints for clarity
* Fix Detekt issue
* Rename getAllRunnableCheckpoints to getRunnableCheckpoints
* Minor kdoc update
* Bring back in CheckpointStorage.getAllCheckpoints to co-exist with getRunnableCheckpoints
* Add progress tracker information to checkpoint
The checkpoint database is updated when the statemachine suspends
with the progress tracker's current step name. This is truncated if
it is longer than the database column.
* Minor rename in statemachine for clarity
* Set/reset Checkpoint.status to RUNNABLE when suspending
* Remove/move comment as it no longer makes sense there, since we now always create a new Checkpoint object in SingleThreadedStateMachineManager.createFlowFromCheckpoint through tryDeserializeCheckpoint
* Set -in memory- Checkpoint.status to RUNNABLE when a flow is retrying from Checkpoint
Due to a change in how messaging works, `ActionExecutorImpl
.executeSendInitial` was no longer being called. Changing the byteman
script to throw exception on hits to `ActionExecutorImpl
.executeSendMultiple` allowed the tests to pass.
When a flow is finished do not delete the checkpoint from the DB.
Instead, the FlowStatus is marked as Completed in the DB.
Updated numerous tests which relied on the flow being removed
when finished.
* Update Checkpoint DB to update flow io request
* Modify flow monitor to update Checkpoint DB with waiting flows
This happens periodically.
* Refactored code to avoid looping twice and updated tests
* Fix tests after rebasing
* Fix MR comments (non-functional refactor of tests + FlowMonitor).
* Made the visible-for-testing method private in DBCheckpointStorage
This is not needed anymore.
* Explicitly check if ioRequestType has changed in update method
* Fix shadowing warning
* Import non deprecated Assert into test
* Use AssertEquals not assert in test
* Address more comments (minor refactor) of DBCheckpointStorage
* Minor fix: use `it` instead of referencing the object explicitly
* Add null check to DBCheckpointStorage
* Revert changes to Flow Monitor.
We will instead store the information in the main thread of the
state machine.
* Remove now unneeded API and make statemachine update ioRequest
* Add Integration Test to check statemachine updates DB on Receive
* Use simpleName in checkpoint storage instead of class.
Hibernate was previously resetting the class field; this is now
set to null (when getting the checkpoint from the DB) and a new method
returns the simple name as a string.
* Update StateMachineState to store simple name.
* Fix after rebase broke stuff + renamed test
* Fix Detekt issue
* Remove unneeded null assertion
* [CORDA-3628] - Implement sendAll API
* detekt
* Some minor refactorings and docs
* Eliminate warnings
* Address Rick's comments
* Switch sendAll to use a set
Do not cascade updates to checkpoint error and result tables to hopefully
improve database performance moving forward. Because the joined tables
are no longer being updated by updating the main `DBFlowCheckpoint` entity,
they must be created/updated/deleted manually.
The checkpoint blobs still cascade as they pretty much always evolve in
tandem with the main checkpoint table.