In enterprise, `AuthDBTests` picked up a schema from a unit test and
included it in the cordapp it builds. This schema does not have a
migration and therefore fails the integration tests.
`NodeBasedTest` now lets cordapps to be defined and passed in to avoid
this issue. It defaults to making a cordapp from the tests base
directory if none are provided.
Performance tuning of the new checkpoint schema;
- Checkpoint tables are now using `flowId` as join keys.
- Indexes consist of a PK's index on `node_checkpoints(flow_id)` and then unique indexes on `node_checkpoint_blobs(flow_id)` and `node_flow_metadata(flow_id)`.
- Serialization of `checkpointState` is being done with `CHECKPOINT_CONTEXT` so that we can have compression. This is needed when messages get passed into `checkpointState.sessions` therefore `checkpointState` grows in size upon serialized and
saved into the database.
* Deserialize checkpointState with CHECKPOINT_CONTEXT
* Align tests with schema update; We cannot add and update a checkpoint in the same session now, ends up with hibernate complaining: two different objects with same identifier
* Fix indentation and format
* Ignore tests that assert DBFlowResult or DBFlowException
* Set DBFlowCheckpoint.blob to null whenever the flow errors or hospitalizes; this way we save an extra SELECT in such cases;
* Fix test; cleared Hibernate session, it would fail at checkpoint with 'org.hibernate.NonUniqueObjectException'
* Changing VARCHAR to NVARCHAR
* Rename v17 liquibase scripts to v19 to resolve collision with ENT v17 scripts
* CORDA-3750: Use hand-written sandbox Crypto object that delegates to the node.
* CORDA-3750: Add integration test for deterministic CashIssueAndPayment flow.
* Tidy up generics for Array instances.
* Upgrade to DJVM 1.1-RC04.
* Fix erroneous sql statement for oracle; It was failing tests with 'ORA-00933: SQL command not properly ended'
* Fixed flaky test; it didn't wait for counter party flow to get hospitalized as the test implied
Added command-line option: `--pause-all-flows` to the Node to control this.
This mode causes all checkpoints to be set to status PAUSED when the
state machine starts up (in StartMode.Safe mode).
Changed the state machine so that PAUSED checkpoints are loaded into
memory (the checkpoint is deserialised but the flow state is left serialised)
but not started.
Messages from peers are queued whilst the flow is paused and processed
once the flow is resumed.
When a non-database exception is thrown out of a `withEntityManager`
block, always check if the session needs to be rolled back.
This means if a database error is caught and a new non-database error is
thrown out of the `withEntityManager` block, the transaction is still
rolled back. The flow can then continue progressing as normal.
* CORDA-3291 `isKilled` flag and session errors for killed flows
## Summary
Two major improvements have been worked on:
- A new flag named `isKilled` has been added to `FlowLogic` to allow
developers to break out of loops without suspension points.
- Killed flows now send session errors to their counter parties allowing
their flows to also terminate without further coordination.
Achieving these changes required a __fundamental__ change to how flows are
killed as well as how they sleep.
## `isKilled` flag
The addition of `FlowLogic.isKilled` allows flows to check if the
current flow has been killed. They can then throw an exception to lead
to the flow's termination (following the standard error pathway). They
can also perform some extra logic or not throw an exception if they
really wanted to.
No matter what, once the flag is set, the flow will terminate. Due to
timing, a killed flow might successfully process its next suspension
event, but it will then process a killed transition and terminate.
## Send session errors when killing a flow
A flow will now send session errors to all of its counter parties. They
are transferred as `UnexpectedFlowEndException`s. This allows initiated
flows to handle these errors as they see fit, although they should
probably just terminate.
## How flows are killed
### Before
Originally we were relying on Quasar to interrupt a flow's fiber, we
could then handle the resulting `InterruptedException`. The problem with
this solution is that it only worked when a flow was already suspended
or when a flow moved into suspension. Flows stuck in loops did not work.
### After
We now *do not* use Quasar to interrupt a flow's fiber. Instead, we
switch `FlowStateMachine.isKilled` to true and schedule a new event.
Any event that is processed after switching this flag will now cause a
`KilledFlowTransition`. This transition follows similar logic to how
error propagation works. Note, the extra event allows a suspended flow
to be killed without waiting for the event that it was _really_ waiting
for.
This allows a lot of the tidy up code in `StateMachineManager.killFlow`
to be removed as tidy up is executed as part of removing a flow.
Deleting a flow's checkpoint and releasing related soft locks is still
handled manually in case of infinite loops but also triggered as part
of the actions executed in a transition.
This required flow sleeping to be changed as we no longer rely on
quasar.
## How flows now sleep
The reliance on Quasar to make a flow sleep has been removed.
Instead, when a flow sleeps we create a `ScheduledFuture` that is
delayed for the requested sleep duration. When the future executes it
schedules a `WakeUpFromSleep` event that wakes up the flow... Duh.
`FlowSleepScheduler` handles the future logic. It also uses the same
scheduled thread pool that timed flows uses.
A future field was added to `StateMachineState`. This removes the
need for concurrency control around flow sleeps as the code path does
not need to touch any concurrent data structures.
To achieve this:
- `StateMachineState.future` added as a `var`
- When the `ScheduledFuture` is created to wake up the flow the passed
in `StateMachineState` has its `future` value changed
- When resumed `future` and `isWaitingForFuture` are set to `null` and
`false` respectively
- When cancelling a sleeping flow, the `future` is cancelled and nulled
out. `isWaitingForFuture` is not changed since the flow is ending anyway
so really the value of the field is not important.
* CORDA-3722 withEntityManager can rollback its session
## Summary
Improve the handling of database transactions when using
`withEntityManager` inside a flow.
Extra changes have been included to improve the safety and
correctness of Corda around handling database transactions.
This focuses on allowing flows to catch errors that occur inside an
entity manager and handle them accordingly.
Errors can be caught in two places:
- Inside `withEntityManager`
- Outside `withEntityManager`
Further changes have been included to ensure that transactions are
rolled back correctly.
## Catching errors inside `withEntityManager`
Errors caught inside `withEntityManager` require the flow to manually
`flush` the current session (the entity manager's individual session).
By manually flushing the session, a `try-catch` block can be placed
around the `flush` call, allowing possible exceptions to be caught.
Once an error is thrown from a call to `flush`, it is no longer possible
to use the same entity manager to trigger any database operations. The
only possible option is to rollback the changes from that session.
The flow can continue executing updates within the same session but they
will never be committed. What happens in this situation should be handled
by the flow. Explicitly restricting the scenario requires a lot of effort
and code. Instead, we should rely on the developer to control complex
workflows.
To continue updating the database after an error like this occurs, a new
`withEntityManager` block should be used (after catching the previous
error).
## Catching errors outside `withEntityManager`
Exceptions can be caught around `withEntityManager` blocks. This allows
errors to be handled in the same way as stated above, except the need to
manually `flush` the session is removed. `withEntityManager` will
automatically `flush` a session if it has not been marked for rollback
due to an earlier error.
A `try-catch` can then be placed around the whole of the
`withEntityManager` block, allowing the error to be caught while not
committing any changes to the underlying database transaction.
## Savepoints / Transactionality
To make `withEntityManager` blocks work like mini database transactions,
save points have been utilised. A new savepoint is created when opening
a `withEntityManager` block (along with a new session). It is then used
as a reference point to rollback to if the session errors and needs to
roll back. The savepoint is then released (independently from
completing successfully or failing).
Using save points means, that either all the statements inside the
entity manager are executed, or none of them are.
## Some implementation details
- A new session is created every time an entity manager is requested,
but this does not replace the flow's main underlying database session.
- `CordaPersistence.transaction` can now determine whether it needs
to execute its extra error handling code. This is needed to allow errors
escape `withEntityManager` blocks while allowing some of our exception
handling around subscribers (in `NodeVaultService`) to continue to work.
* CORDA-3762: Integration test exposing the problem reported
* CORDA-3726: Additional logging
* CORDA-3726: Prevent thread leaks
* CORDA-3726: New `journalBufferTimeout` parameter
* CORDA-3726: Override `journalBufferTimeout` parameter
* CORDA-3726: Making Detekt happier
* CORDA-3276: Account for extra thread user in MockNetwork
For real node this does not matter as `shutdown` can safely be called multiple times, which is not true for server thread provided by MockNetwork
* CORDA-3276: Do not make SMM shutdown "executor" as it belongs to AbstractNode
* CORDA-3276: Address input from @rick-r3
* CORDA-3276: Fix test after rebase
* CORDA-3696: Temporary update to enable JDK11 build and test. Will eventually be switchable.
* CORDA-3696: Filter out the Nashorn warning.
* CORDA-3696: Add JDK11 classifier.
* CORDA-3696: Updated match string to cope with JDK11.
* CORDA-3696: Filtering out SPHINCS256_SHA256 where failing due to JDK11.
* CORDA-3696: Now remove SPHINCS256_SHA256 only if JDK11.
* CORDA-3696: Fix test failure - switch to regex matching.
* CORDA-3696: Hide the illegal access warnings.
* CORDA-3696: Check for Java11 when disabling Java11 warnings.
* CORDA-3696: Fix unneccessary non null check.
* CORDA-3696: Reverting build env to JDK8
* CORDA-3696: Revert hiding of illegal access warnings via Unsafe class.
* CORDA-3696: Remove internal access warnings and new JDK11 version checker.
* CORDA-3696: Updated build file for OS
* CORDA-3696: Removed typo
* CORDA-3696: Fixed space typo.
* CORDA-3696: Open modules to remove the illegal access warnings.
Co-authored-by: Adel El-Beik <adelel-beik@19LDN-MAC108.local>
Flows that are kept for overnight observation:
- Save their Checkpoint.status as 'HOSPITALIZED' in the database
- Save the error that caused the hospitalization in the database
A new Event was added for this reason. Whenever the hospital determines
a flow for hospitalization, it adds this Event in the flow's fiber queue.
When processed it creates a new DB transaction, stores the checkpoint status along with
the error, and it adds a 'FlowContinuation.ProcessEvents' continuation so that the fiber keeps
processing events (effectively since there are no more events in the fiber's channel, the fiber will suspend).
Flows that error:
- Their checkpoints are kept in the database
- Save their Checkpoint.status as 'FAILED'
- Save the error that caused the error in the database
Upon erroring, the flow's Checkpoint.status gets updated('FAILED') and the checkpoint is stored
in the database instead of getting removed. The flow then propagates the error to counterparties,
sets its future with the error and gets removed from memory.
* Run serialisation tests with both in-process and out-of-process nodes.
* Add custom serialisers and whitelists to Driver's AMQPServerSerializationScheme.
* Run serialisation tests with both in-process and out-of-process nodes.
* Add custom serialisers and whitelists to Driver's AMQPServerSerializationScheme.
* CORDA-3596 Record flow metadata
Record flow metadata during the zero'th checkpoint that occurs before
calling the flow's `call` function.
This required adding an RPC call's arguments to the `InvocationContext`
that gets created. These arguments are then accessible within the
statemachine and from the `Checkpoint` class. The arguments are then
extracted when recording a flow's metadata inside of
`DBCheckpointStorage`.
Updated the size of the started by column to 128 since it was not long
enough to hold the fully qualified class of a service that started a
flow.
* CORDA-3596 Remove arguments from in-memory checkpoint
When executing a flows first real suspend (from flow code) the arguments
contained in the `InvocationContext` are removed. This saves holding
these arguments for the whole lifecyle of a flow.
* CORDA-3596 Increase `cordapp_name` column to 128
* CORDA-3596 Join metadata by `flow_id`
Due to changes in where metadata is recorded, there is no need for
having `invocation_id` as the metadata table's primary key. The
`flow_id` is now the primary key of the table and is used to join to the
main checkpoints table.
The `invocation_id` has been removed from the checkpoints table since it
is not needed for the join anymore.
* CORDA-3596 Remove `received_time` from metadata table
* CORDA-3596 Remove unused `StartReason` enum
* CORDA-3596 Simple `DBCheckpointStorageTests` for metadata
* CORDA-3596 Truncate really long flow names
When a flow is finished do not delete the checkpoint from the DB.
Instead, the FlowStatus is marked as Completed in the DB.
Updated numerous tests which relied on the flow being removed
when finished.
* Update Checkpoint DB to update flow io request
* Modify flow monitor to update Checkpoint DB with waiting flows
This happens periodically.
* Refactored code to avoid looping twice and updated tests
* Fix tests after rebasing
* Fix MR comments (non-functional refactor of tests + FlowMonitor).
* Made visible for testing method private in DBCheckpointStorage
This is not needed anymore.
* Explicity check if ioRequestType has changed in update method
* Fix shadowing warning
* Import non deprecated Assert into test
* Use AssertEquals not assert in test
* Address more comments (minor refactor) of DBCheckpointStorage
* Minor fix use it instead of referencing object explicitly
* Add null check to DBCheckpointStorage
* Revert changes to Flow Monitor.
We will instead store the information in the main thread of the
state machine.
* Remove now uneeded API and make statemachine update ioRequest
* Add Integration Test to check statemachine updates DB on Recieve
* Use simpleName in checkpoint storage instead of class.
Hibernate was previously resetting the class field this is now
set to null (when getting checkpoint form DB) and a new method
for getting back the simple name as a string.
* Update StateMachineState to store simple name.
* Fix after rebase broke stuff + renamed test
* Fix Detekt issue
* Remove uneeded null assertion
Do not cascade updates to checkpoint error and result tables to hopefully
improve database performance moving forward. Because the joined tables
are no longer being updated by updating the main `DBFlowCheckpoint` entity,
they must be created/updated/deleted manually.
The checkpoint blobs still cascade as they pretty much always evolve in
tandem with the main checkpoint table.