The issue with the test was that the environment variable are kept as a static member so it passed if it was the first one to run, but failed if another test runs the config beforehand.
Terminate sessions that need to be removed instantly in whatever transition is currently executing, rather than scheduling another event and doing so at a later time.
To do this, update the transition being created in `TopLevelTransition` to remove the sessions and append the `RemoveSessionBindings` action to it.
This achieves the same outcome as the original code but does so with 1 less transition. Doing this also removes the race condition that can occur where another external event is added to the flow's event queue before the terminate event could be added.
* CORDA-3986 Increase sleep in `FlowSessionCloseTest`
A sleep duration needed to be increased to ensure that an end session
message has time to be processed by the other node.
Locks do not fully fix this because some internal processing needs to be
completed that can't be waited for using a lock. Therefore the sleep
time was increased generously.
* removed dependency from tools.jar
I removed the log line in /node/src/main/kotlin/net/corda/node/internal/NodeStartup.kt because I felt it was not so important
and I modified the checkpoint agent detection simply using a static field (I tested both with and without the checkpoint agent running and detection works correctly)
* move method to node-api to address review comments
Co-authored-by: Walter Oggioni <walter.oggioni@r3.com>
* CORDA-3954: Added step to run database migration scripts during initial node registration with -s / --skip-schema-creation option (default to false) to prevent migration.
* CORDA-3954: Applied code convention to if statement
* CORDA-3954: Marked NodeCmdLineOptions' -s/--skip-schema-creation as deprecated and hidden in line with --initial-registration
Disable `SignatureConstraintMigrationFromHashConstraintsTests.HashConstraint cannot be migrated to SignatureConstraint if a HashConstraint is specified for one state and another uses an AutomaticPlaceholderConstraint()` as it frequently appears in reports about Gradle process failures, to try isolating the actual cause.
Fix memory leak due to DB not shutting down in FlowFrameworkPersistenceTests.flow restarted just after receiving payload.
Also reduces number of class-wide variables to reduce scope for references being accidentally held between runs.
`KillFlowTest` is failing quite often. This is probably due to issues in ordering when taking and releasing locks. By using `CountDownLatch` in places instead of `Semaphore`s should reduce the likelihood of tests failing.
If an error occurs when creating a transition (a.k.a anything inside of
`TopLevelTransition`) then resume the flow with the error that occurred.
This is needed, because the current code is swallowing all errors thrown
at this point and causing the flow to hang.
This change will allow better debugging of errors since the real error
will be thrown back to the flow and will get handled and logged by the
normal error code path.
Extra logging has been added to `processEventsUntilFlowIsResumed`, just
in case an exception gets thrown out of the normal code path. We do not
want this exception to be swallowed as it can make it impossible to
debug the original error.
Save the exception for flows that fail during session init when they are
kept for observation.
Change the exception tidy up logic to only update the flow's status if
the exception was removed.
Add `CordaRPCOps.reattachFlowWithClientId` to allow clients to reattach
to an existing flow by only providing a client id. This behaviour is the
same as calling `startFlowDynamicWithClientId` for an existing
`clientId`. Where it differs is `reattachFlowWithClientId` will return
`null` if there is no flow running or finished on the node with the same
client id.
Return `null` if record deleted from race-condition
Update the compatible flag in the DB if the flowstate cannot be deserialised.
The most common cause of this problem is if a CorDapp has been upgraded
without draining flows from the node.
`RUNNABLE` and `HOSPITALISED` flows are restored on node startup so
the flag is set for these then. The flag can also be set when a flow
retries for some reason (see retryFlowFromSafePoint) in this case the
problem has been caused by another reason.
Added a newpause event to the statemachine which returns an Abort
continuation and causes the flow to be moved into the Paused flow Map.
Flows can receive session messages whilst paused.
Add a lock to `StateMachineState`, allowing every flow to lock
themselves when performing a transition or when an external thread (such
as `killFlow`) tries to interact with a flow from occurring at the same
time.
Doing this prevents race-conditions where the external threads mutate
the database or the flow's state causing an in-flight transition to
fail.
A `Semaphore` is used to acquire and release the lock. A `ReentrantLock`
is not used as it is possible for a flow to suspend while locked, and
resume on a different thread. This causes a `ReentrantLock` to fail when
releasing the lock because the thread doing so is not the thread holding
the lock. `Semaphore`s can be used across threads, therefore bypassing
this issue.
The lock is copied across when a flow is retried. This is to prevent
another thread from interacting with a flow just after it has been
retried. Without copying the lock, the external thread would acquire the
old lock and execute, while the fiber thread acquires the new lock and
also executes.
* Remove use of Thread.sleep() FROM FlowReloadAfterCheckpointTest, instead relying on CountdownLatch to wait until the target number has been hit or a timeout occurs, so the thread can continue as soon as the target is hit.
* Replace use of hashmaps to a concurrent queue, to mitigate risk of complex threading issues.
Enhance rpc acknowledgement method (`removeClientId`) to remove checkpoint
from all checkpoint database tables.
Optimize `CheckpointStorage.removeCheckpoint` to not delete from all checkpoint
tables if not needed. This includes excluding the results (`DBFlowResult`) and
exceptions (`DBFlowException`) tables.
Integrate `DBFlowException` with the rest of the checkpoint schema, so now
we are saving the flow's exception result in the database.
Making statemachine not remove `FAILED` flows' checkpoints from the
database if they are started with a clientId.
Retrieve the DBFlowException from the database to construct a
`FlowStateMachineHandle` future and complete exceptionally the flow's result
future for requests (`startFlowDynamicWithClientId`) that pick FAILED flows ,
started with client id, of status Removed.
On killing a flow the client id mapping of the flow gets removed.
The storage serialiser is used for serialising exceptions. Note, that if an
exception cannot be serialised, it will not fail and will instead be stored
as a `CordaRuntimeException`. This could be improved in future
changes.
Dummy package names cause build failure as they are not found on the classpath when trying to import them. Now that empty package name list is allowed, the dummy names are removed.
* CORDA-3663 MockServices crashes when two of the provided packages to scan are deemed empty in 4.4 RC05
this happends when a given package is not found on the classpath. Now it is handled and an exception is thrown
* replace dummy package names in tests with valid ones
* allow empty package list for CustomCordapps and exclude those from the created jars
* detekt fix
* always true logic fix
* fix to check for empty packages instead of empty classes
* fix for classes and fixups
* logic refactor because of detekt stupidity
* PR related minor refactors
When an incorrect message is received, the flow should resume to allow
it to throw the error back to user code and possibly cause the flow to
fail.
For now, if an `EndSessionMessage` is received instead of a
`DataSessionMessage`, then an `UnexpectedFlowEndException` is thrown
back to user code. Allowing it to correctly re-enter normal flow error
handling.
Without this change, the flow will hang due to it failing while creating
a transition which exists outside of the general state machine error
handling code path.
Sessions are now terminated after performing the original
`FlowIORequest` passed into `StartedFlowTransition`, instead of before.
This is done by scheduling an `Event.TerminateSessions` if there are
sessions to terminate when performing a suspending event.
Originally this was done by hijacking a transition that is trying to
perform a `StartedFlowTransition`, terminating the sessions and then
scheduling another `Event.DoRemainingWork` to perform the original
transition. This introduced a bug where, another event (from a external
message) could be placed onto the queue before the
`Event.DoRemainingWork` could be added. In most scenarios, that should
be ok. But, if a flow is retrying (while in an uninitiated state) and
this occurs the flow could fail due to being in an unexpected state.
Terminating the sessions after performing the original transition
removes this possibility. Meaning that a restarting flow will always
perform the transition they supposed to do (based on the called
suspending event).