Add `CordaRPCOps.reattachFlowWithClientId` to allow clients to reattach
to an existing flow by only providing a client id. This behaviour is the
same as calling `startFlowDynamicWithClientId` for an existing
`clientId`. Where it differs is `reattachFlowWithClientId` will return
`null` if there is no flow running or finished on the node with the same
client id.
Return `null` if record deleted from race-condition
Update the compatible flag in the DB if the flowstate cannot be deserialised.
The most common cause of this problem is if a CorDapp has been upgraded
without draining flows from the node.
`RUNNABLE` and `HOSPITALISED` flows are restored on node startup so
the flag is set for these then. The flag can also be set when a flow
retries for some reason (see retryFlowFromSafePoint) in this case the
problem has been caused by another reason.
Added a newpause event to the statemachine which returns an Abort
continuation and causes the flow to be moved into the Paused flow Map.
Flows can receive session messages whilst paused.
Add a lock to `StateMachineState`, allowing every flow to lock
themselves when performing a transition or when an external thread (such
as `killFlow`) tries to interact with a flow from occurring at the same
time.
Doing this prevents race-conditions where the external threads mutate
the database or the flow's state causing an in-flight transition to
fail.
A `Semaphore` is used to acquire and release the lock. A `ReentrantLock`
is not used as it is possible for a flow to suspend while locked, and
resume on a different thread. This causes a `ReentrantLock` to fail when
releasing the lock because the thread doing so is not the thread holding
the lock. `Semaphore`s can be used across threads, therefore bypassing
this issue.
The lock is copied across when a flow is retried. This is to prevent
another thread from interacting with a flow just after it has been
retried. Without copying the lock, the external thread would acquire the
old lock and execute, while the fiber thread acquires the new lock and
also executes.
* Remove use of Thread.sleep() FROM FlowReloadAfterCheckpointTest, instead relying on CountdownLatch to wait until the target number has been hit or a timeout occurs, so the thread can continue as soon as the target is hit.
* Replace use of hashmaps to a concurrent queue, to mitigate risk of complex threading issues.
Enhance rpc acknowledgement method (`removeClientId`) to remove checkpoint
from all checkpoint database tables.
Optimize `CheckpointStorage.removeCheckpoint` to not delete from all checkpoint
tables if not needed. This includes excluding the results (`DBFlowResult`) and
exceptions (`DBFlowException`) tables.
Integrate `DBFlowException` with the rest of the checkpoint schema, so now
we are saving the flow's exception result in the database.
Making statemachine not remove `FAILED` flows' checkpoints from the
database if they are started with a clientId.
Retrieve the DBFlowException from the database to construct a
`FlowStateMachineHandle` future and complete exceptionally the flow's result
future for requests (`startFlowDynamicWithClientId`) that pick FAILED flows ,
started with client id, of status Removed.
On killing a flow the client id mapping of the flow gets removed.
The storage serialiser is used for serialising exceptions. Note, that if an
exception cannot be serialised, it will not fail and will instead be stored
as a `CordaRuntimeException`. This could be improved in future
changes.
Dummy package names cause build failure as they are not found on the classpath when trying to import them. Now that empty package name list is allowed, the dummy names are removed.
* CORDA-3663 MockServices crashes when two of the provided packages to scan are deemed empty in 4.4 RC05
this happends when a given package is not found on the classpath. Now it is handled and an exception is thrown
* replace dummy package names in tests with valid ones
* allow empty package list for CustomCordapps and exclude those from the created jars
* detekt fix
* always true logic fix
* fix to check for empty packages instead of empty classes
* fix for classes and fixups
* logic refactor because of detekt stupidity
* PR related minor refactors
When an incorrect message is received, the flow should resume to allow
it to throw the error back to user code and possibly cause the flow to
fail.
For now, if an `EndSessionMessage` is received instead of a
`DataSessionMessage`, then an `UnexpectedFlowEndException` is thrown
back to user code. Allowing it to correctly re-enter normal flow error
handling.
Without this change, the flow will hang due to it failing while creating
a transition which exists outside of the general state machine error
handling code path.
Sessions are now terminated after performing the original
`FlowIORequest` passed into `StartedFlowTransition`, instead of before.
This is done by scheduling an `Event.TerminateSessions` if there are
sessions to terminate when performing a suspending event.
Originally this was done by hijacking a transition that is trying to
perform a `StartedFlowTransition`, terminating the sessions and then
scheduling another `Event.DoRemainingWork` to perform the original
transition. This introduced a bug where, another event (from a external
message) could be placed onto the queue before the
`Event.DoRemainingWork` could be added. In most scenarios, that should
be ok. But, if a flow is retrying (while in an uninitiated state) and
this occurs the flow could fail due to being in an unexpected state.
Terminating the sessions after performing the original transition
removes this possibility. Meaning that a restarting flow will always
perform the transition they supposed to do (based on the called
suspending event).
* CORDA-3871: Import external code
Compiles, but does not work for various reasons
* CORDA-3871: More improvements to imported code
Currently fails due to keystores not being found
* CORDA-3871: Initialise keystores for the server
Currently fails due to keystores for client not being found
* CORDA-3871: Configure certificates to client
The program started to run
* CORDA-3871: Improve debug output
* CORDA-3871: Few more minor changes
* CORDA-3871: Add AMQClient test
Currently fails due to `localCert` not being set
* CORDA-3871: Configure server to demand client to present its certificate
* CORDA-3871: Changes to the test to make it pass
ACK status is not delivered as server is not talking AMQP
* CORDA-3871: Add delayed handshake scenario
* CORDA-3871: Tidy-up imported classes
* CORDA-3871: Hide thread creation inside `ServerThread`
* CORDA-3871: Test description
* CORDA-3871: Detekt baseline update
* CORDA-3871: Trigger repeated execution of new tests
To make sure they are not flaky
* CORDA-3871: Improve robustness of the newly introduced tests
* CORDA-3871: Improve robustness of the newly introduced tests
* CORDA-3871: New tests proven to be stable - reduce number of iterations to 1
* CORDA-3871: Adding Alex Karnezis to the list of contributors
Correct race condition in FlowVersioningTest where the last message is read (and the session close can be triggered)
before one side has finished reading metadata from the session.
Remove memory leak endurance test as it spends 8 minutes testing a single failure case that's not end user visible,
and ultimately manifests elsewhere in test failures (which is where this came from in the beginning). It was a good
idea to confirm the change fixed the issue, but this isn't critical enough to retain.
Making statemachine not remove COMPLETED flows' checkpoints from the database
if they are started with a clientId, instead they are getting persisted and retained within
the database along with their result (`DBFlowResult`).
On flow start with a client id (`startFlowDynamicWithClientId`), if the client id maps to
a flow that was previously started with the same client id and the flow is now finished,
then fetch the `DBFlowResult` from the database to construct a
`FlowStateMachineHandle` done future and return it back to the client.
Object stored as results must abide by the storage serializer rules. If they fail to do so
the result will not be stored and an exception is thrown to the client to indicate this.
Enable reloading of a flow after every checkpoint is saved. This
includes reloading the checkpoint from the database and recreating the
fiber.
When a flow and its `StateMachineState` is created it checks the node's
config to see if the `reloadCheckpointAfterSuspend` is set to true. If it is
it initialises `StateMachineState.reloadCheckpointAfterSuspendCount`
with the value 0. Otherwise, it remains `null`.
This count represents how many times the flow has reloaded from its
checkpoint (not the same as retrying). It is incremented every time the
flow is reloaded.
When a flow suspends, it processes the suspend event like usual, but
it will now also check if `reloadCheckpointAfterSuspendCount` is not
`null` (that it is activated) and process a
`ReloadFlowFromCheckpointAfterSuspend`event, if and only if
`reloadCheckpointAfterSuspendCount` is greater than
`CheckpointState.numberOfSuspends`.
This means idempotent flows can reload from the start and not reload
again until reaching a new suspension point.
Flows that skip checkpoints can reload from a previously saved
checkpoint (or from the initial checkpoint) and will continue reloading
on reaching the next new suspension point (not the suspension point that
it skipped saving).
If the flow fails to deserialize the checkpoint from the database upon
reloading a `ReloadFlowFromCheckpointException` is throw. This causes
the flow to be kept for observation.
* CORDA-3844: Add new functions to network map client
* CORDA-3844: Apply new fetch logic to nm updater
* CORDA-3844: Fix base url and warnings
* CORDA-3844: Change response object and response validation
In order to make sure that the returned node infos are not maliciously modified, either a signed list response
or a signed reference object would need to be provided. As providing a signed list requires a lot of effort from NM and Signer services,
the signed network map is provided instead, allowing nodes to validate that the list provided conforms to the entries of the signed network map.
* CORDA-3844: Add clarifications and comments
* CORDA-3844: Add error handling for bulk request
* CORDA-3844: Enhance testing
* CORDA-3844: Fix detekt issues
* EG-3844: Apply pr suggestions
* CORDA-3845: Update BC to 1.64
* CORDA-3845: Upgraded log4j to 2.13.3
* We can remove the use of Manifests from the logging package so that when _it_ logs it doesn't error on the fact the stream was already closed by the default Java logger.
* Some more tidy up
* Remove the logging package as a plugin
* latest BC version
* Remove old test
* fix up
* Fix some rebased changes to log file handling
* Fix some rebased changes to log file handling
* Update slf4j too
Co-authored-by: Adel El-Beik <adel.el-beik@r3.com>
* CORDA-3717: Apply custom serializers to checkpoints
* Remove try/catch to fix TooGenericExceptionCaught detekt rule
* Rename exception
* Extract method
* Put calls to the userSerializer on their own lines to improve readability
* Remove unused constructors from exception
* Remove unused proxyType field
* Give field a descriptive name
* Explain why we are looking for two type parameters when we only use one
* Tidy up the fetching of types
* Use 0 seconds when forcing a flow checkpoint inside test
* Add test to check references are restored correctly
* Add CheckpointCustomSerializer interface
* Wire up the new CheckpointCustomSerializer interface
* Use kryo default for abstract classes
* Remove unused imports
* Remove need for external library in tests
* Make file match original to remove from diff
* Remove maySkipCheckpoint from calls to sleep
* Add newline to end of file
* Test custom serializers mapped to interfaces
* Test serializer configured with abstract class
* Move test into its own package
* Rename test
* Move flows and serializers into their own source file
* Move broken map into its own source file
* Delete comment now source file is simpler
* Rename class to have a shorter name
* Add tests that run the checkpoint serializer directly
* Check serialization of final classes
* Register as default unless the target class is final
* Test PublicKey serializer has not been overridden
* Add a broken serializer for EdDSAPublicKey to make test more robust
* Split serializer registration into default and non-default registrations. Run registrations at the right time to preserve Cordas own custom serializers.
* Check for duplicate custom checkpoint serializers
* Add doc comments
* Add doc comments to CustomSerializerCheckpointAdaptor
* Add test to check duplicate serializers are logged
* Do not log the duplicate serializer warning when the duplicate is the same class
* Update doc comment for CheckpointCustomSerializer
* Sort serializers by classname so we are not registering in an unknown or random order
* Add test to serialize a class that references itself
* Store custom serializer type in the Kryo stream so we can spot when a different serializer is being used to deserialize
* Testing has shown that registering custom serializers as default is more robust when adding new cordapps
* Remove new line character
* Remove unused imports
* Add interface net.corda.core.serialization.CheckpointCustomSerializer to api-current.txt
* Remove comment
* Update comment on exception
* Make CustomSerializerCheckpointAdaptor internal
* Revert "Add interface net.corda.core.serialization.CheckpointCustomSerializer to api-current.txt"
This reverts commit b835de79bd.
* Restore "Add interface net.corda.core.serialization.CheckpointCustomSerializer to api-current.txt""
This reverts commit 718873a4e9.
* Pass the class loader instead of the context
* Do less work in test setup
* Make the serialization context unique for CustomCheckpointSerializerTest so we get a new Kryo pool for the test
* Rebuild the Kryo pool for the given context when we change custom serializers
* Rebuild all Kryo pools on serializer change to keep serializer list consistent
* Move the custom serializer list into CheckpointSerializationContext to reduce scope from global to a serialization context
* Remove unused imports
* Make the new checkpointCustomSerializers property default to the empty list
* Delegate implementation using kotlin language feature
Refactor `FlowStateMachineImpl.transientValues` and
`FlowStateMachineImpl.transientState` to stop the fields from exposing
the fact that they are nullable.
This is done by having private backing fields `transientValuesReference`
and `transientStateReference` that can be null. The nullability is still
needed due to serialisation and deserialisation of flow fibers. The
fields are transient and therefore will be null when reloaded from the
database.
Getters and setters hide the private field, allowing a non-null field to
returned.
There is no point other than in `FlowCreator` where the transient fields
can be null. Therefore the non null checks that are being made are
valid.
Add custom kryo serialisation and deserialisation to `TransientValues`
and `StateMachineState` to ensure that neither of the objects are ever
touched by kryo.