Commit Graph

2968 Commits

Author SHA1 Message Date
d08b62da39 Infra-656 - NoSuchFileException in NodeInfoWatcher fix (#6672)
* Filter out tmp files

* Ignore .tmp files

* Ignore .tmp files

* Remove unused import
2020-08-27 16:41:25 +01:00
9a018e7bee Updated CLI usage error (#6661) 2020-08-27 12:15:21 +01:00
0cfe6b3084 Filter out tmp files (#6668) 2020-08-27 08:48:08 +01:00
99f835bb4a CORDA-3995 Redeliver external events if number of suspends differs (#6646)
* CORDA-3995 Redeliver external events in number of suspends differs

When retrying a flow, only redeliver external events held in a flow's
pending deduplication handlers if there is a difference in the
`numberOfSuspends` on the `currentState`'s checkpoint or the checkpoint
in the database.

If the checkpoint committed, but the flow retried, then the external
events would have been persisted to the database as part of the same
transaction. Therefore there is no need to replay them, as they have
already been processed as saved as part of the checkpoint.

This change is only relevant when the checkpoint persists, but the flow
still needs to retry after this occurs (within the same
transition/event).

* CORDA-3995 Redeliver external events in number of commits differs

When retrying a flow, only redeliver external events held in a flow's
pending deduplication handlers if there is a difference in the
`numberOfCommits` on the `currentState`'s checkpoint or the checkpoint
in the database.

If the checkpoint committed, but the flow retried, then the external
events would have been persisted to the database as part of the same
transaction. Therefore there is no need to replay them, as they have
already been processed as saved as part of the checkpoint.

This change is only relevant when the checkpoint persists, but the flow
still needs to retry after this occurs (within the same
transition/event).

* CORDA-3995 Redeliver external events if number of commits differs

When retrying a flow, only redeliver external events held in a flow's
pending deduplication handlers if there is a difference in the
`currentState`'s `numberOfCommits` or the `numberOfCommits`
the checkpoint has recorded in the database.

If the checkpoint committed, but the flow retried, then the external
events would have been persisted to the database as part of the same
transaction. Therefore there is no need to replay them, as they have
already been processed as saved as part of the checkpoint.

This change is only relevant when the checkpoint persists, but the flow
still needs to retry after this occurs (within the same
transition/event).

* Add @Suspendable to a test flow.

I am surprised this worked at all.

* Fix a few minor things based on review.

Co-authored-by: Will Vigor <william.vigor@r3.com>
2020-08-25 11:54:55 +01:00
133e6fe39a CORDA-4001 Verify Paused Checkpoints on Node Startup (#6655)
We should check that PAUSED Checkpoints can be deserialised on node
startup as we do for RUNNABLE checkpoints. Otherwise a user might
get into trouble if they update the CorDapp.
2020-08-24 12:59:51 +01:00
39491e8835 NOTICK - Fix flow framework tests (#6650)
Clear hooks between tests.
2020-08-20 11:20:35 +01:00
f428836f33 Clear hospital hooks after tests; should be fixing tests hanging 2020-08-20 09:22:56 +01:00
5d24b70227 CORDA-3998 - Commit db transaction before starting flows at node start (#6647)
Delay the firing of future/callback chain in 'AbstractNode.start' to after db transaction commit
2020-08-19 15:56:12 +01:00
771fade972 Clear flow hospital hooks - commented out for now 2020-08-19 14:24:54 +01:00
742312b85a NOTICK Do not replace stacktrace for local errors (#6635)
We should not overwrite the stack trace of local errors thrown by
`FlowContinuation.Throw` as it hides the real cause of the error.

Exceptions received from peer nodes are still overwritten.
2020-08-18 12:05:05 +01:00
abfe83626f CORDA-3809 - Remove tests applicable only in ENT (#6645) 2020-08-17 16:41:45 +01:00
949489a117 CORDA-3994 Retry errors in flow init started with client ids (#6643)
Flows that were started with a client id would hang because it would
retrieve the existing flow's future and wait for it to finish. But,
because the flow has failed its flow init and not saved its initial
checkpoint, it is relying on `startFlow` to start the flow again (by
redelivering the start flow external event).

`FlowWithClientIdStatus` now holds the flow id that it is related to.
This is then checked in `startFlow`. If a matching client id is found
for a flow start, it then checks the flow id as well. If the flow id
matches, then it lets the `startFlow` call continue, allowing it to
actually start the flow again (how a flow without a client id would
retry in this situation).
2020-08-17 10:35:22 +01:00
854e6638ff CORDA-3881 Get all finished flows with client ids (#6580)
Return map of `clientId` -> success/fail
2020-08-17 10:27:32 +01:00
f1b7bc9dcb CORDA-3993 Correct mock network handling (#6642)
Correct mock network handling in VaultQueryJoinTest so it does not block other mock networks.
2020-08-15 12:28:03 +01:00
8534aad3b1 ENT-5672 Know if paused flow is hospitalized (#6641)
Missing change from the original commit.
2020-08-15 10:08:37 +01:00
be6b76ff89 ENT-5684 Reconnect flow's progress tracker when unpausing (#6640)
Previously we were just throwing this away when pausing, meaning
updates would not be passed back to the user.

The progress tracker is now maintained in the `NonResidentFlow`
allowing it to be reused in the flow when it is retried.
2020-08-14 21:11:48 +01:00
32cb085a53 ENT-5672 Know if a paused flow is hospitalized (#6639)
* ENT-5672 Update database query to get paused flows which have previously been hospitalised

* NOTICK Remove unneeded check if a database exception was removed when switching a flow to RUNNABLE since we were to remove it anyway
2020-08-14 20:07:00 +01:00
c4027e23bf ENT-5649 Always load from db when flow retries (#6637)
Always attempt to load a checkpoint from the database when a flow
retries.

This is to prevent transient errors where the checkpoint is committed to
the database but throws an error back to the node. When the node tries
to retry in this scenario, `isAnyCheckpointPersisted` is false, meaning
that it will try to insert when it tries to save its initial checkpoint
again.

By loading from the existing checkpoint, even though it doesn't
really use it because it is `Unstarted`, the flag gets put into the
right state and will update rather than insert later on.
2020-08-14 17:42:19 +01:00
55133b02b9 Merge pull request #6626 from corda/bugfix/ENT-5654-run-migration-scripts-completes-with-unexpected-error
ENT-5654: Fixed migration error message, improved success message
2020-08-14 17:17:16 +01:00
2fb21373a4 Merge pull request #6632 from corda/nnagy-os-4.5-os-4.6-20200813
NOTICK - OS 4.5 to OS 4.6 merge 20200813
2020-08-14 16:16:40 +01:00
da065a6215 CORDA-3981: Fix bad keys are ignored and warned for (#6636)
The issue with the test was that the environment variable are kept as a static member so it passed if it was the first one to run, but failed if another test runs the config beforehand.
2020-08-14 15:36:46 +01:00
845ef8d3d1 CORDA-3989 Terminate sessions instantly (#6634)
Terminate sessions that need to be removed instantly in whatever transition is currently executing, rather than scheduling another event and doing so at a later time.

To do this, update the transition being created in `TopLevelTransition` to remove the sessions and append the `RemoveSessionBindings` action to it.

This achieves the same outcome as the original code but does so with 1 less transition. Doing this also removes the race condition that can occur where another external event is added to the flow's event queue before the terminate event could be added.
2020-08-14 11:13:42 +01:00
1cbfb74022 CORDA-3986 Increase sleep in FlowSessionCloseTest (#6629)
* CORDA-3986 Increase sleep in `FlowSessionCloseTest`

A sleep duration needed to be increased to ensure that an end session
message has time to be processed by the other node.

Locks do not fully fix this because some internal processing needs to be
completed that can't be waited for using a lock. Therefore the sleep
time was increased generously.
2020-08-14 10:57:24 +01:00
205ce84033 [EG-3461] removed dependency from tools.jar (#6631)
* removed dependency from tools.jar

I removed the log line in /node/src/main/kotlin/net/corda/node/internal/NodeStartup.kt because I felt it was not so important
and I modified the checkpoint agent detection simply using a static field (I tested both with and without the checkpoint agent running and detection works correctly)

* move method to node-api to address review comments

Co-authored-by: Walter Oggioni <walter.oggioni@r3.com>
2020-08-14 10:56:37 +01:00
bf53e47f0d ENT-5669 Improve robustness of FlowReloadAfterCheckpointTest (#6627)
Improve robustness of `FlowReloadAfterCheckpointTest` by adding a countdown latch to observe for when the reloads should have finished.
2020-08-13 15:04:52 +01:00
1a0e445a49 Merge branch 'release/os/4.5' into nnagy-os-4.5-os-4.6-20200813
# Conflicts:
#	node/src/main/kotlin/net/corda/node/services/vault/HibernateQueryCriteriaParser.kt
2020-08-13 14:25:58 +01:00
a7ea8df9a7 CORDA-3954: run database migration scripts during initial node registration (#6624)
* CORDA-3954: Added step to run database migration scripts during initial node registration with -s / --skip-schema-creation option (default to false) to prevent migration.

* CORDA-3954: Applied code convention to if statement

* CORDA-3954: Marked NodeCmdLineOptions' -s/--skip-schema-creation as deprecated and hidden in line with --initial-registration
2020-08-13 14:19:16 +01:00
294c2aa514 Merge pull request #6584 from filipesoliveira/filipeoliveira/corda-3931
CORDA-3931 - Fixed a bug which was preventing the custom JVM arguments from  being picked up when the command line "-f" flag was used
2020-08-13 14:13:09 +01:00
4aea7b876a ENT-5666 Disable test due to Gradle process death (#6617)
Disable `SignatureConstraintMigrationFromHashConstraintsTests.HashConstraint cannot be migrated to SignatureConstraint if a HashConstraint is specified for one state and another uses an AutomaticPlaceholderConstraint()` as it frequently appears in reports about Gradle process failures, to try isolating the actual cause.
2020-08-13 13:17:49 +01:00
07fe1b0960 ENT-5654: Fixed migration error message, improved success message 2020-08-13 11:32:56 +02:00
a6b2a3159d CORDA-3879 - query with OR combinator returns too many results (#6456)
* fix suggestion and tests

* detekt suppress

* making sure the forced join works with IndirectStatePersistable and removing unnecessary joinPredicates from parse with sorting

* remove joinPredicates and add tests

* rename sorting

* revert deleting joinPredicates and modify the force join to use `OR` instead of `AND`

* add system property switch
2020-08-13 10:04:53 +01:00
748480c33e CORDA-3981 Disable unstable test (#6625) 2020-08-12 18:39:36 +01:00
28b440d1f7 CORDA-3981: Change test to avoid timeout (#6612) 2020-08-12 10:31:58 +01:00
9965af180e CORDA-5985 Simplify network map test (#6618)
Remove parameterization from NetworkMapTest as it doesn't actually significantly improve coverage, and costs 6 and a half minutes on every build.
2020-08-12 09:01:58 +01:00
12e7fa1d93 CORDA-3981 Disable unstable config helper test (#6610) 2020-08-10 20:35:15 +01:00
29e87a586a CORDA-3973 Fix memory leak due to DB not shutting down (#6605)
Fix memory leak due to DB not shutting down in FlowFrameworkPersistenceTests.flow restarted just after receiving payload.

Also reduces number of class-wide variables to reduce scope for references being accidentally held between runs.
2020-08-10 20:04:51 +01:00
c191960cb8 CORDA-3948 Make KillFlowTest less flakey (#6606)
`KillFlowTest` is failing quite often. This is probably due to issues in ordering when taking and releasing locks. By using `CountDownLatch` in places instead of `Semaphore`s should reduce the likelihood of tests failing.
2020-08-10 16:59:28 +01:00
8aafb1db4a INFRA-545: Convert a few tests to unit tests; rename another test (#6562) 2020-08-10 15:25:28 +01:00
66406ba0fb ENT-5450 Resume flow when transition creation errors (#6604)
If an error occurs when creating a transition (a.k.a anything inside of
`TopLevelTransition`) then resume the flow with the error that occurred.

This is needed, because the current code is swallowing all errors thrown
at this point and causing the flow to hang.

This change will allow better debugging of errors since the real error
will be thrown back to the flow and will get handled and logged by the
normal error code path.

Extra logging has been added to `processEventsUntilFlowIsResumed`, just
in case an exception gets thrown out of the normal code path. We do not
want this exception to be swallowed as it can make it impossible to
debug the original error.
2020-08-10 13:09:43 +01:00
0d5ee8b0fa NOTICK Save exception for hospitalized session init errors (#6587)
Save the exception for flows that fail during session init when they are
kept for observation.

Change the exception tidy up logic to only update the flow's status if
the exception was removed.
2020-08-06 22:35:05 +01:00
849d51c8cd INFRA-505: Move integration tests to unit tests (#6530) 2020-08-06 15:16:27 +01:00
0005317fec Align code with merging changes 2020-08-06 13:35:46 +01:00
2afedeabb4 Merge branch 'release/os/4.6' into os_4.6-feature_pass_in_client_id_when_starting_a_flow 2020-08-06 13:21:45 +01:00
3f31aeaa5f CORDA-3822 Add CordaRPCOps.reattachFlowWithClientId (#6579)
Add `CordaRPCOps.reattachFlowWithClientId` to allow clients to reattach
to an existing flow by only providing a client id. This behaviour is the
same as calling `startFlowDynamicWithClientId` for an existing
`clientId`. Where it differs is `reattachFlowWithClientId` will return
`null` if there is no flow running or finished on the node with the same
client id.

Return `null` if record deleted from race-condition
2020-08-06 11:42:02 +01:00
8eaf237a27 Refactored the previous bug fix to minimize duplication by reusing an existing function.
Bear in mind that the condition in the previous code was (config == null) and now is (config == null || config.equals("")).
2020-08-06 11:15:19 +01:00
7acc510534 CORDA-3602 Set a Checkpoint as incompatible if it can't be deserialised (#3653)
Update the compatible flag in the DB if the flowstate cannot be deserialised.

The most common cause of this problem is if a CorDapp has been upgraded
without draining flows from the node.

`RUNNABLE` and `HOSPITALISED` flows are restored on node startup so
the flag is set for these then. The flag can also be set when a flow
retries for some reason (see retryFlowFromSafePoint) in this case the
problem has been caused by another reason.
2020-08-06 11:00:02 +01:00
80d279a70e Fixed a bug which would make the "-f" flag to be ignored. 2020-08-06 10:51:23 +01:00
4a828fcb99 ENT-5397 Pause individual running and hospitalised flows (#3564)
Added a newpause event to the statemachine which returns an Abort
continuation and causes the flow to be moved into the Paused flow Map.

Flows can receive session messages whilst paused.
2020-08-06 10:40:09 +01:00
bbf5a93761 ENT-5396 Allow Retrying a Hospitalised Flow from the Statemachine (#3499)
Added functionality to the statemachine to enable retrying a Hospitalised flow without a node restart.
2020-08-06 10:11:15 +01:00
a73dad00e2 CORDA-3850 Add a per flow lock (#6437)
Add a lock to `StateMachineState`, allowing every flow to lock
themselves when performing a transition or when an external thread (such
as `killFlow`) tries to interact with a flow from occurring at the same
time.

Doing this prevents race-conditions where the external threads mutate
the database or the flow's state causing an in-flight transition to
fail.

A `Semaphore` is used to acquire and release the lock. A `ReentrantLock`
is not used as it is possible for a flow to suspend while locked, and
resume on a different thread. This causes a `ReentrantLock` to fail when
releasing the lock because the thread doing so is not the thread holding
the lock. `Semaphore`s can be used across threads, therefore bypassing
this issue.

The lock is copied across when a flow is retried. This is to prevent
another thread from interacting with a flow just after it has been
retried. Without copying the lock, the external thread would acquire the
old lock and execute, while the fiber thread acquires the new lock and
also executes.
2020-08-06 09:51:42 +01:00