* CORDA-3871: Import external code
Compiles, but does not work for various reasons
* CORDA-3871: More improvements to imported code
Currently fails due to keystores not being found
* CORDA-3871: Initialise keystores for the server
Currently fails due to keystores for client not being found
* CORDA-3871: Configure certificates to client
The program started to run
* CORDA-3871: Improve debug output
* CORDA-3871: Few more minor changes
* CORDA-3871: Add AMQClient test
Currently fails due to `localCert` not being set
* CORDA-3871: Configure server to demand client to present its certificate
* CORDA-3871: Changes to the test to make it pass
ACK status is not delivered as server is not talking AMQP
* CORDA-3871: Add delayed handshake scenario
* CORDA-3871: Tidy-up imported classes
* CORDA-3871: Hide thread creation inside `ServerThread`
* CORDA-3871: Test description
* CORDA-3871: Detekt baseline update
* CORDA-3871: Trigger repeated execution of new tests
To make sure they are not flaky
* CORDA-3871: Improve robustness of the newly introduced tests
* CORDA-3871: Improve robustness of the newly introduced tests
* CORDA-3871: New tests proven to be stable - reduce number of iterations to 1
* CORDA-3871: Adding Alex Karnezis to the list of contributors
Correct race condition in FlowVersioningTest where the last message is read (and the session close can be triggered)
before one side has finished reading metadata from the session.
Enable reloading of a flow after every checkpoint is saved. This
includes reloading the checkpoint from the database and recreating the
fiber.
When a flow and its `StateMachineState` is created it checks the node's
config to see if the `reloadCheckpointAfterSuspend` is set to true. If it is
it initialises `StateMachineState.reloadCheckpointAfterSuspendCount`
with the value 0. Otherwise, it remains `null`.
This count represents how many times the flow has reloaded from its
checkpoint (not the same as retrying). It is incremented every time the
flow is reloaded.
When a flow suspends, it processes the suspend event like usual, but
it will now also check if `reloadCheckpointAfterSuspendCount` is not
`null` (that it is activated) and process a
`ReloadFlowFromCheckpointAfterSuspend`event, if and only if
`reloadCheckpointAfterSuspendCount` is greater than
`CheckpointState.numberOfSuspends`.
This means idempotent flows can reload from the start and not reload
again until reaching a new suspension point.
Flows that skip checkpoints can reload from a previously saved
checkpoint (or from the initial checkpoint) and will continue reloading
on reaching the next new suspension point (not the suspension point that
it skipped saving).
If the flow fails to deserialize the checkpoint from the database upon
reloading a `ReloadFlowFromCheckpointException` is throw. This causes
the flow to be kept for observation.
* CORDA-3845: Update BC to 1.64
* CORDA-3845: Upgraded log4j to 2.13.3
* We can remove the use of Manifests from the logging package so that when _it_ logs it doesn't error on the fact the stream was already closed by the default Java logger.
* Some more tidy up
* Remove the logging package as a plugin
* latest BC version
* Remove old test
* fix up
* Fix some rebased changes to log file handling
* Fix some rebased changes to log file handling
* Update slf4j too
Co-authored-by: Adel El-Beik <adel.el-beik@r3.com>
* CORDA-3717: Apply custom serializers to checkpoints
* Remove try/catch to fix TooGenericExceptionCaught detekt rule
* Rename exception
* Extract method
* Put calls to the userSerializer on their own lines to improve readability
* Remove unused constructors from exception
* Remove unused proxyType field
* Give field a descriptive name
* Explain why we are looking for two type parameters when we only use one
* Tidy up the fetching of types
* Use 0 seconds when forcing a flow checkpoint inside test
* Add test to check references are restored correctly
* Add CheckpointCustomSerializer interface
* Wire up the new CheckpointCustomSerializer interface
* Use kryo default for abstract classes
* Remove unused imports
* Remove need for external library in tests
* Make file match original to remove from diff
* Remove maySkipCheckpoint from calls to sleep
* Add newline to end of file
* Test custom serializers mapped to interfaces
* Test serializer configured with abstract class
* Move test into its own package
* Rename test
* Move flows and serializers into their own source file
* Move broken map into its own source file
* Delete comment now source file is simpler
* Rename class to have a shorter name
* Add tests that run the checkpoint serializer directly
* Check serialization of final classes
* Register as default unless the target class is final
* Test PublicKey serializer has not been overridden
* Add a broken serializer for EdDSAPublicKey to make test more robust
* Split serializer registration into default and non-default registrations. Run registrations at the right time to preserve Cordas own custom serializers.
* Check for duplicate custom checkpoint serializers
* Add doc comments
* Add doc comments to CustomSerializerCheckpointAdaptor
* Add test to check duplicate serializers are logged
* Do not log the duplicate serializer warning when the duplicate is the same class
* Update doc comment for CheckpointCustomSerializer
* Sort serializers by classname so we are not registering in an unknown or random order
* Add test to serialize a class that references itself
* Store custom serializer type in the Kryo stream so we can spot when a different serializer is being used to deserialize
* Testing has shown that registering custom serializers as default is more robust when adding new cordapps
* Remove new line character
* Remove unused imports
* Add interface net.corda.core.serialization.CheckpointCustomSerializer to api-current.txt
* Remove comment
* Update comment on exception
* Make CustomSerializerCheckpointAdaptor internal
* Revert "Add interface net.corda.core.serialization.CheckpointCustomSerializer to api-current.txt"
This reverts commit b835de79bd.
* Restore "Add interface net.corda.core.serialization.CheckpointCustomSerializer to api-current.txt""
This reverts commit 718873a4e9.
* Pass the class loader instead of the context
* Do less work in test setup
* Make the serialization context unique for CustomCheckpointSerializerTest so we get a new Kryo pool for the test
* Rebuild the Kryo pool for the given context when we change custom serializers
* Rebuild all Kryo pools on serializer change to keep serializer list consistent
* Move the custom serializer list into CheckpointSerializationContext to reduce scope from global to a serialization context
* Remove unused imports
* Make the new checkpointCustomSerializers property default to the empty list
* Delegate implementation using kotlin language feature
In enterprise, `AuthDBTests` picked up a schema from a unit test and
included it in the cordapp it builds. This schema does not have a
migration and therefore fails the integration tests.
`NodeBasedTest` now lets cordapps to be defined and passed in to avoid
this issue. It defaults to making a cordapp from the tests base
directory if none are provided.
Performance tuning of the new checkpoint schema;
- Checkpoint tables are now using `flowId` as join keys.
- Indexes consist of a PK's index on `node_checkpoints(flow_id)` and then unique indexes on `node_checkpoint_blobs(flow_id)` and `node_flow_metadata(flow_id)`.
- Serialization of `checkpointState` is being done with `CHECKPOINT_CONTEXT` so that we can have compression. This is needed when messages get passed into `checkpointState.sessions` therefore `checkpointState` grows in size upon serialized and
saved into the database.
* Deserialize checkpointState with CHECKPOINT_CONTEXT
* Align tests with schema update; We cannot add and update a checkpoint in the same session now, ends up with hibernate complaining: two different objects with same identifier
* Fix indentation and format
* Ignore tests that assert DBFlowResult or DBFlowException
* Set DBFlowCheckpoint.blob to null whenever the flow errors or hospitalizes; this way we save an extra SELECT in such cases;
* Fix test; cleared Hibernate session, it would fail at checkpoint with 'org.hibernate.NonUniqueObjectException'
* Changing VARCHAR to NVARCHAR
* Rename v17 liquibase scripts to v19 to resolve collision with ENT v17 scripts
* CORDA-3750: Use hand-written sandbox Crypto object that delegates to the node.
* CORDA-3750: Add integration test for deterministic CashIssueAndPayment flow.
* Tidy up generics for Array instances.
* Upgrade to DJVM 1.1-RC04.
* Fix erroneous sql statement for oracle; It was failing tests with 'ORA-00933: SQL command not properly ended'
* Fixed flaky test; it didn't wait for counter party flow to get hospitalized as the test implied
Added command-line option: `--pause-all-flows` to the Node to control this.
This mode causes all checkpoints to be set to status PAUSED when the
state machine starts up (in StartMode.Safe mode).
Changed the state machine so that PAUSED checkpoints are loaded into
memory (the checkpoint is deserialised but the flow state is left serialised)
but not started.
Messages from peers are queued whilst the flow is paused and processed
once the flow is resumed.
When a non-database exception is thrown out of a `withEntityManager`
block, always check if the session needs to be rolled back.
This means if a database error is caught and a new non-database error is
thrown out of the `withEntityManager` block, the transaction is still
rolled back. The flow can then continue progressing as normal.
* CORDA-3291 `isKilled` flag and session errors for killed flows
## Summary
Two major improvements have been worked on:
- A new flag named `isKilled` has been added to `FlowLogic` to allow
developers to break out of loops without suspension points.
- Killed flows now send session errors to their counter parties allowing
their flows to also terminate without further coordination.
Achieving these changes required a __fundamental__ change to how flows are
killed as well as how they sleep.
## `isKilled` flag
The addition of `FlowLogic.isKilled` allows flows to check if the
current flow has been killed. They can then throw an exception to lead
to the flow's termination (following the standard error pathway). They
can also perform some extra logic or not throw an exception if they
really wanted to.
No matter what, once the flag is set, the flow will terminate. Due to
timing, a killed flow might successfully process its next suspension
event, but it will then process a killed transition and terminate.
## Send session errors when killing a flow
A flow will now send session errors to all of its counter parties. They
are transferred as `UnexpectedFlowEndException`s. This allows initiated
flows to handle these errors as they see fit, although they should
probably just terminate.
## How flows are killed
### Before
Originally we were relying on Quasar to interrupt a flow's fiber, we
could then handle the resulting `InterruptedException`. The problem with
this solution is that it only worked when a flow was already suspended
or when a flow moved into suspension. Flows stuck in loops did not work.
### After
We now *do not* use Quasar to interrupt a flow's fiber. Instead, we
switch `FlowStateMachine.isKilled` to true and schedule a new event.
Any event that is processed after switching this flag will now cause a
`KilledFlowTransition`. This transition follows similar logic to how
error propagation works. Note, the extra event allows a suspended flow
to be killed without waiting for the event that it was _really_ waiting
for.
This allows a lot of the tidy up code in `StateMachineManager.killFlow`
to be removed as tidy up is executed as part of removing a flow.
Deleting a flow's checkpoint and releasing related soft locks is still
handled manually in case of infinite loops but also triggered as part
of the actions executed in a transition.
This required flow sleeping to be changed as we no longer rely on
quasar.
## How flows now sleep
The reliance on Quasar to make a flow sleep has been removed.
Instead, when a flow sleeps we create a `ScheduledFuture` that is
delayed for the requested sleep duration. When the future executes it
schedules a `WakeUpFromSleep` event that wakes up the flow... Duh.
`FlowSleepScheduler` handles the future logic. It also uses the same
scheduled thread pool that timed flows uses.
A future field was added to `StateMachineState`. This removes the
need for concurrency control around flow sleeps as the code path does
not need to touch any concurrent data structures.
To achieve this:
- `StateMachineState.future` added as a `var`
- When the `ScheduledFuture` is created to wake up the flow the passed
in `StateMachineState` has its `future` value changed
- When resumed `future` and `isWaitingForFuture` are set to `null` and
`false` respectively
- When cancelling a sleeping flow, the `future` is cancelled and nulled
out. `isWaitingForFuture` is not changed since the flow is ending anyway
so really the value of the field is not important.
* CORDA-3722 withEntityManager can rollback its session
## Summary
Improve the handling of database transactions when using
`withEntityManager` inside a flow.
Extra changes have been included to improve the safety and
correctness of Corda around handling database transactions.
This focuses on allowing flows to catch errors that occur inside an
entity manager and handle them accordingly.
Errors can be caught in two places:
- Inside `withEntityManager`
- Outside `withEntityManager`
Further changes have been included to ensure that transactions are
rolled back correctly.
## Catching errors inside `withEntityManager`
Errors caught inside `withEntityManager` require the flow to manually
`flush` the current session (the entity manager's individual session).
By manually flushing the session, a `try-catch` block can be placed
around the `flush` call, allowing possible exceptions to be caught.
Once an error is thrown from a call to `flush`, it is no longer possible
to use the same entity manager to trigger any database operations. The
only possible option is to rollback the changes from that session.
The flow can continue executing updates within the same session but they
will never be committed. What happens in this situation should be handled
by the flow. Explicitly restricting the scenario requires a lot of effort
and code. Instead, we should rely on the developer to control complex
workflows.
To continue updating the database after an error like this occurs, a new
`withEntityManager` block should be used (after catching the previous
error).
## Catching errors outside `withEntityManager`
Exceptions can be caught around `withEntityManager` blocks. This allows
errors to be handled in the same way as stated above, except the need to
manually `flush` the session is removed. `withEntityManager` will
automatically `flush` a session if it has not been marked for rollback
due to an earlier error.
A `try-catch` can then be placed around the whole of the
`withEntityManager` block, allowing the error to be caught while not
committing any changes to the underlying database transaction.
## Savepoints / Transactionality
To make `withEntityManager` blocks work like mini database transactions,
save points have been utilised. A new savepoint is created when opening
a `withEntityManager` block (along with a new session). It is then used
as a reference point to rollback to if the session errors and needs to
roll back. The savepoint is then released (independently from
completing successfully or failing).
Using save points means, that either all the statements inside the
entity manager are executed, or none of them are.
## Some implementation details
- A new session is created every time an entity manager is requested,
but this does not replace the flow's main underlying database session.
- `CordaPersistence.transaction` can now determine whether it needs
to execute its extra error handling code. This is needed to allow errors
escape `withEntityManager` blocks while allowing some of our exception
handling around subscribers (in `NodeVaultService`) to continue to work.
* CORDA-3762: Integration test exposing the problem reported
* CORDA-3726: Additional logging
* CORDA-3726: Prevent thread leaks
* CORDA-3726: New `journalBufferTimeout` parameter
* CORDA-3726: Override `journalBufferTimeout` parameter
* CORDA-3726: Making Detekt happier
* CORDA-3276: Account for extra thread user in MockNetwork
For real node this does not matter as `shutdown` can safely be called multiple times, which is not true for server thread provided by MockNetwork
* CORDA-3276: Do not make SMM shutdown "executor" as it belongs to AbstractNode
* CORDA-3276: Address input from @rick-r3
* CORDA-3276: Fix test after rebase
* CORDA-3696: Temporary update to enable JDK11 build and test. Will eventually be switchable.
* CORDA-3696: Filter out the Nashorn warning.
* CORDA-3696: Add JDK11 classifier.
* CORDA-3696: Updated match string to cope with JDK11.
* CORDA-3696: Filtering out SPHINCS256_SHA256 where failing due to JDK11.
* CORDA-3696: Now remove SPHINCS256_SHA256 only if JDK11.
* CORDA-3696: Fix test failure - switch to regex matching.
* CORDA-3696: Hide the illegal access warnings.
* CORDA-3696: Check for Java11 when disabling Java11 warnings.
* CORDA-3696: Fix unneccessary non null check.
* CORDA-3696: Reverting build env to JDK8
* CORDA-3696: Revert hiding of illegal access warnings via Unsafe class.
* CORDA-3696: Remove internal access warnings and new JDK11 version checker.
* CORDA-3696: Updated build file for OS
* CORDA-3696: Removed typo
* CORDA-3696: Fixed space typo.
* CORDA-3696: Open modules to remove the illegal access warnings.
Co-authored-by: Adel El-Beik <adelel-beik@19LDN-MAC108.local>
Flows that are kept for overnight observation:
- Save their Checkpoint.status as 'HOSPITALIZED' in the database
- Save the error that caused the hospitalization in the database
A new Event was added for this reason. Whenever the hospital determines
a flow for hospitalization, it adds this Event in the flow's fiber queue.
When processed it creates a new DB transaction, stores the checkpoint status along with
the error, and it adds a 'FlowContinuation.ProcessEvents' continuation so that the fiber keeps
processing events (effectively since there are no more events in the fiber's channel, the fiber will suspend).
Flows that error:
- Their checkpoints are kept in the database
- Save their Checkpoint.status as 'FAILED'
- Save the error that caused the error in the database
Upon erroring, the flow's Checkpoint.status gets updated('FAILED') and the checkpoint is stored
in the database instead of getting removed. The flow then propagates the error to counterparties,
sets its future with the error and gets removed from memory.