Do not keep a flow in for observation if it receives an unexpected
session end message while in `ReceiveFinalityFlow` and
`ReceiveTransactionFlow` (due to being called by the former).
This is done by checking the message of the `UnexpectedFlowEndException`
that is thrown when a session end message instead of a data message and
if the stacktrace has `ReceiveTransactionFlow` at the top, after
removing statemachine stack frames.
Checking the stacktrace for `ReceiveTransactionFlow` is important
because the unexpected session end session message is only ok if a
transaction has not already been received. For example, if
`ResolveTransactionsFlow` is in the stack, then this indicates failure
when receiving transaction dependencies on a transaction that should be
recorded.
Also added a test that highlights that the `UnexpectedFlowEndException`
caused by the session end message can be caught, therefore users can
determine their own behaviour if desired.
Remove the shell code from the OS code base, this includes the modules:
- `:tools:shell`
- `:tools:shell-cli`
The shell will be run within a node if it exists within the node's `drivers` directory.
This is done by using a `URLClassloader` to load the `InteractiveShell` class into Corda's JVM process and running `startShell` and `runLocalShell`.
Running the shell within the `:samples` will require adding:
```
cordaDriver "net.corda:corda-shell:<corda_shell_version>"
```
To the module's `build.gradle` containing `deployNodes`. The script will then include the shell in the created nodes.
Checkpoint dumping of paused flows was not working because the dumper
expects a flow to have a `FlowState` of `Unstarted` or `Started`,
however due to a memory optimisation paused flows have their `FlowState`
set to `Paused`. This was causing causing an exception as well as a loss
of potentially useful information.
A flag `alwaysDeserializeCheckpoint` has been added to
`Checkpoint.Serialized.deserialize` which skips the memory optimisation
and forces the deserialization of the flow's `FlowState`.
Paused flows are now included in the dumped output along with their real
`FlowState` which is useful to users even if the flow is paused rather
than waiting for something to complete.
The status of the flow has also been added to the JSON output to assist
users in debugging their flows.
A public version of `FlowManagerRPCOps` which does not live in an
internal package has been added. This new interface shares the same name
as the internal one.
Because of the name sharing, the internal version has been
`@Deprecated`.
`FlowManagerRPCOpsImpl` implements both the new and old interfaces. This
allows for backwards compatibility, allowing old shells or clients to
call the old interface on newer nodes without breaking.
A user passed in a `FlowLogic` as an argument into another `FlowLogic`
called `subFlow` on it and had it throw an exception.
This all occurred before the first checkpoint, causing the state machine
to try and persist a FAILED checkpoint containing the flow's arguments.
Because the arguments contained a `FlowLogic` that had been started via
`subFlow` it held a reference to `FlowLogic._stateMachine` which cannot
be serialized.
This caused the flow to fail when trying to persist the fact that it
failed.
The flow arguments are now emptied during `ErrorFlowTransition` to
resolve this issue which mimics the behaviour of the first suspend.
Note, this only takes the arguments out of the serialized checkpoint, it
does not affect the flow metadata and therefore a flow's arguments can
still be viewed.
Co-authored-by: Dan Newton <dan.newton@r3.com>
* ENT-6357: Deserialize LedgerTransaction elements for each Contract.verify().
* Lock the LedgerTransaction and NetworkParameters objects down for contract verification.
* Refactor BasicVerifier to be package private instead of public.
* Simplify verifyConstraints() operation.
* Review fixes: replace HashSet with LinkedHashSet, and add signing parties to commands via mapIndexed.
* Ensure tests also run notary nodes "out of process".
* Streamline SerializationContext switching.
* Cache deserialised cryptographic instances during contract verification.
* Invoke Class.forName() instead of ClassLoader.loadClass() to reduce contention on the system classloader's lock.
* Deserialization cache key now pre-computes its hash code.
* Allow AttachmentsClassLoader to be used concurrently.
* Cache all Envelope objects for reuse during contract verification.
* Generate CertPathProxy hash code using conventional algorithm.
* Adjust CustomSerializer.Proxy to allow better access to SerializationContext.
* ENT-6330 Fixed reading jar entries in memory
This is a trivial fix that is however enough to allow to send zip bombs as attachments without the node crashing, a size limit could be added for increased reliability
* added attachment cumulative size check
* added compression ratio check
* added unit test and moved the code to a standalone verifier object
* removed attachment check from AttachmentClassLoader to minimize performance impact
* CORDA-3755: Switched attachments map to a WeakHashMap (#6214)
* Bump OS release version 4.6
* CORDA-3755: Switched attachments map to a WeakHashMap
* CORDA-3755: Added explicit strong references to map key.
* CORDA-3755: Keeping detekt happy.
* CORDA-3755: Test a gc in verify.
* CORDA-3755: Making detekt happy.
* CORDA-3755: Suppress warnings for weak reference test.
* CORDA-3755: Fixing build failure with attachments.
* CORDA-3755: Rewrite based on Ricks input - now handles attachment already existing in map!
* CORDA-3755: Refactor WeakReference behaviour into AttachmentsHolderImpl and provide alternate version of this class for core-deterministic.
* CORDA-3755: Added more tests for WeakHashMap.
* CORDA-3755: Ignore the tests using System.gc keep for local testing only
* CORDA-3755: Adding comment to explain the ignored tests.
* Make AttachmentsHolderImpl package-private inside core-deterministic, just like it is inside core.
* CORDA-3755: Update assertions following review comments.
* CORDA-3755: Removing import
* CORDA-3755: Removed unused var.
* CORDA-3755: Reverting files that somehow got changed in rebase.
Co-authored-by: nargas-ritu <ritu.gupta@r3.com>
Co-authored-by: Chris Rankin <chris.rankin@r3.com>
* CORDA-3769: Switched attachments class loader cache to use caffeine (#6326)
* CORDA-3769: Switched attachments class loader cache to use caffeine with original implementation used by determinstic core.
* CORDA-3769: Removed default ctor arguments.
* CORDA-3769: Switched mapping function to Function type to avoid synthetic method being generated.
* CORDA-3769: Now using a cache created from NamedCacheFactory for the attachments class loader cache.
* CORDA-3769: Making detekt happy.
* CORDA-3769: The finality tests now check for UntrustedAttachmentsException which will actually happen in reality.
* CORDA-3769: Refactored after review comments.
* CORDA-3769: Removed the AttachmentsClassLoaderSimpleCacheImpl as DJVM does not need it. Also updated due to review comments.
* CORDA-3769: Removed the generic parameters from AttachmentsClassLoader.
* CORDA-3769: Removed unused imports.
* CORDA-3769: Updates from review comments.
* CORDA-3769: Updated following review comments. MigrationServicesForResolution now uses cache factory. Ctor updated for AttachmentsClassLoaderSimpleCacheImpl.
* CORDA-3769: Reduced max class loader cache size
* CORDA-3769: Fixed the attachments class loader cache size to a fixed default
* CORDA-3769: Switched attachments class loader size to be reduced by fixed value.
* CORDA-4125: Parameter has been added to a private ctor.
Co-authored-by: nargas-ritu <ritu.gupta@r3.com>
Co-authored-by: Chris Rankin <chris.rankin@r3.com>
* CORDA-4105 Add public API to allow custom serialization schemes (#6848)
* CORDA-4105 Add public API to allow custom serialization schemes
* Fix Detekt
* Suppress warning
* Fix usused import
* Improve API to use generics
This does not break Java support (only Intelij gets confused).
* Add more detailed documentation to public interfaces
* Change internal variable name after rename public API
* Update Public API to use ByteSquence instead of SerializedBytes
* Remove unused import
* Fix whitespace.
* Add added public API to .ci/api-current.txt
* Improve public interfaces
Rename CustomSchemeContext to SerializationSchemeContext to improve
clarity and move to it's own file. Improve kdoc to make things less
confusing.
* Update API current with changed API
* CORDA-4104 Implement custom serialization scheme discovery mechanism (#6854)
* CORDA-4104 Implement CustomSerializationScheme Discovery Mechanism
Discovers a single CustomSerializationScheme implementation inside
the drivers dir using a system property.
* Started MockNetwork test
* Add driver test of Custom Serialization Scheme
* Fix detekt and minor style error
* Respond to review comments
Allow non-single arg constructors (there must be one no args
constructor), move code from SerializationEnviroment into its
own file, improve exceptions to be more user friendly.
* Fix minor bug in Scheme finding code + improve error messages
* CORDA-4104 Improve test coverage of custom serialization scheme discovery (#6855)
* CORDA-4104 Add test of classloader scanning for CustomSerializationSchemes
* Fix Detekt
* NOTICK Clarify KDOC on SerializationSchemeContext (#6865)
* CORDA-4111 Change Component Group Serialization to use contex when the lazy map is constructed (#6856)
Currently the component group will recheck the thread local (global)
serialization context when component groups are serialized lazily.
Instead store the serialization context when the lazy map is constructed
and use that latter when doing serialization lazily.
* CORDA-4106 Test wire transaction can still be written to the ledger (#6860)
* Add test that writes transaction to the Database
* Improve test check serialization scheme in test body
* CORDA-4119 Minor changes to serialisation injection for transaction building (#6868)
* CORDA-4119 Minor changes to serialisation injection for transaction building
Scan the CorDapp classloader instead of the drivers classloader.
Add properties map to CustomSerialiaztionContext (copied from SerializationContext).
Change API to let a user pass in the serialization context in TransactionBuilder.toLedgerTransaction
* Improve KDOC + fix shawdowing issue in CordaUtils
* Pass only the properties map into theTransactionBuilder.toWireTransaction
Not the entire serializationContext
* Revert change to CordaUtils
* Improve KDOC explain pitfalls of setting properties
Prevent some serialization errors that occur due to serialization
and deserialization of `ArrayList$SubList` found inside the
`SessionState` data structures.
To prevent this, an explicit `ArrayList` is used rather than a `List`.
Overload the `List` operator functions so that `+` returns an
`ArrayList` instead of a `List`.
Create `toArrayList` for a few conversions.
If a flow fails outside of its normal error processing code path it will
end up in `FlowDefaultUncaughtExceptionHandler`.
This handler will put the flow into overnight observation if possible.
This is done in-memory and the database.
Even with this being done, the fiber itself has blown up and therefore
does not manage to get to `SMM.removeFlow` which is where
`SMM.decrementLiveFibers` is called. For example, a flow that errored
will hit this code eventually. This code is also hit when a flow is
suspended and a shutdown event is sent to it.
The `liveFibers` latch blocks the SMM from shutting down until all flows
have finished or processed shutdown events.
The changes described below resolve this problem.
Any flow that goes to the `FlowDefaultUncaughtExceptionHandler` will be
put marked as dead (`StateMachineState.isDead`). Highlighting that the
flow cannot continue to process events normally as it has broken out
of its event loop
Retrying and shutdown are done manually rather than injecting events
into the flow fiber's queue, because it can't execute its event loop.
Killing a dead flow executes an altered version of
`retryFlowFromSafePoint`. It does this so it can delete the checkpoint
and then continue using the checkpoint it just deleted to run the
kill flow transition on a new fiber.
If a killed flow reaches the `FlowDefaultUncaughtExceptionHandler` it
will be forcibly killed via `killFlowForcibly` which deletes the
checkpoint/or updates it to KILLED and then calls `removeFlow` to bypass
any event processing. This means that a flow that was dead and was killed
will be terminated manually if it reaches the handler again. The same is
true for flows that were not dead before but reached the handler after
being killed.
Also, `FlowCreator.createFlowFromCheckpoint` now retains the `isKilled`
state of the previous fiber's state.
* CORDA-4045 Disable flaky test (#6792)
Disable `Restart does not set senderUUID` as it is unstable.
* INFRA-965: forward merge from 4.5 (2020-11-15) (#6806)
* EG-4168 Updating contributors.md list for OS 4.5 release branch (#6784)
* INFRA-965: forward merge from 4.4 (2020-11-14) (#6805)
* EG-4168 Updating contributors.md list for OS 4.4 release branch (#6785)
* INFRA-965: forward merge from 4.3 (2020-11-13) (#6803)
* INFRA-965: Jenkins/NexusIQ integration should target patches (#6802)
as well as major/minor releases
* Updated nightly builds to match changes for release branch builds
* Updated JDK11 builds to match changes for release branch builds
Co-authored-by: ivanterziev <61829352+ivanterziev-r3@users.noreply.github.com>
* Backporting updates for check-pr-title workflow
Co-authored-by: ivanterziev <61829352+ivanterziev-r3@users.noreply.github.com>
Co-authored-by: Ross Nicoll <ross.nicoll@r3.com>
Co-authored-by: ivanterziev <61829352+ivanterziev-r3@users.noreply.github.com>
* wip
* wip
* wip (need to review IEE comments)
* wip
* wip
* Small refactoring, fixed network-verifier's TestNotaryFlow
* Added command line option to explicitly enable hash agility support
* wip-do-not-push
* wip
* wip
* wip
* aligned merkletree/transaction hash algorithms
* wip
* Added mixed algorithm support for nodes vs leaves and corrected mixed algorithm tests
* moved global computeNonce and componentHash to DigestService
* added comment for failing test to fix
* wip
* Minor cleanups, added deprecated componentHash/computeNonce
* restored exploratory changes to failing SignedTransaction test
* cleaned up and minor rafactoring
* Fixed some tests with hardcoded hash algorithm
* some changes and cleanups following code review
* WIP commit before large change
* WIP Fixed 3 tests
* WIP removed direct references to randomSHA256() and sha256()
* Updated/added liquibase migrations to support larger hash algorithms
* Reviewed, cleanups, comments, fixes
* removing direct references to sha256()
* WIP verifying obligations test errors
* reviewing obligation/attachment issues with sha3_256
* Full review before PR - intermediate commits
* Reviewed and cleaned up
* Futher cleanup
* Fixed partial tree backward compatible json and cleanups
* all tests passing
* Removed couple of unused imports
* Reworked global componentHash function to avoid deprecated warnings
* replaced SHA3s with some alternate SHA2s
* Removed SHA3-256 and SHA3-512 references
* fixed some tests using non ubiquitous hash algorithms
* Fixed ABI compatibility (not for TransactionBuilder)
* Fixed ABI compatibility to TransactionBuilder
* couple of fixes
* fixed DigestService's randomHash
* Removed constructor with loosely typed args for private constructor of LedgerTransaction class (API removal)
* re-introduced LedgerTransaction deprecated ctor for deserialization
* Add possibility to load CustomMessageDigest bypassing JCA (#6798)
* Change api-current for DigestAlgorithm
* disable flaky tests
* addressed liquibase migration script versions
* Removed TODOs and cleanups
* relaxed privacy salt validation
* Fixed privacy salt test to comply with relaxed validation
* detekt and privacySalt validation
* diff cleanup
* diff cleanup
* removed unused import
* removed PrivacySalt's validateFor method and references
* removed invalid character
Co-authored-by: Denis Rekalov <denis.rekalov@r3.com>
* wip
* wip
* wip (need to review IEE comments)
* wip
* wip
* Small refactoring, fixed network-verifier's TestNotaryFlow
* Added command line option to explicitly enable hash agility support
* wip-do-not-push
* wip
* wip
* wip
* aligned merkletree/transaction hash algorithms
* wip
* Added mixed algorithm support for nodes vs leaves and corrected mixed algorithm tests
* moved global computeNonce and componentHash to DigestService
* added comment for failing test to fix
* wip
* Minor cleanups, added deprecated componentHash/computeNonce
* restored exploratory changes to failing SignedTransaction test
* cleaned up and minor rafactoring
* Fixed some tests with hardcoded hash algorithm
* some changes and cleanups following code review
* WIP commit before large change
* WIP Fixed 3 tests
* WIP removed direct references to randomSHA256() and sha256()
* Updated/added liquibase migrations to support larger hash algorithms
* Reviewed, cleanups, comments, fixes
* removing direct references to sha256()
* WIP verifying obligations test errors
* reviewing obligation/attachment issues with sha3_256
* Full review before PR - intermediate commits
* Reviewed and cleaned up
* Futher cleanup
* Fixed partial tree backward compatible json and cleanups
* all tests passing
* Removed couple of unused imports
* Reworked global componentHash function to avoid deprecated warnings
* replaced SHA3s with some alternate SHA2s
* Removed SHA3-256 and SHA3-512 references
* fixed some tests using non ubiquitous hash algorithms
* Fixed ABI compatibility (not for TransactionBuilder)
* Fixed ABI compatibility to TransactionBuilder
* couple of fixes
* fixed DigestService's randomHash
* Removed constructor with loosely typed args for private constructor of LedgerTransaction class (API removal)
* re-introduced LedgerTransaction deprecated ctor for deserialization
* Add possibility to load CustomMessageDigest bypassing JCA (#6798)
* Change api-current for DigestAlgorithm
* disable flaky tests
Co-authored-by: Denis Rekalov <denis.rekalov@r3.com>
Direct SQL was causing issues with the saving of instants not
maintaining their timezones properly.
Changing to parameterised queries fixed this issue.
Also changed other queries to do the same, since it is good practice to
query in this way.
Change assertion in `Restart does not set senderUUID` to verify a single message has a sender UUID set, rather than the last to be recorded as sent has no sender UUID.
* CORDA-3845: Update BC, log4j, slf4j (#6699)
* CORDA-3845: Update BC to 1.64
* CORDA-3845: Upgraded log4j to 2.12.1
* We can remove the use of Manifests from the logging package so that when _it_ logs it doesn't error on the fact the stream was already closed by the default Java logger.
* Remove the logging package as a plugin
* latest BC version
* Remove old test
* Fix some rebased changes to log file handling
* Update slf4j too
Co-authored-by: Ryan Fowler <fowlerrr@users.noreply.github.com>
Co-authored-by: Adel El-Beik <adel.el-beik@r3.com>
* CORDA-4034 Reduce forkEvery to 15 to attempt to mitigate memory leak.
* ENT-5679 Disable test which triggers OOM
* Run tests on two Jenkins agents
* Fixed processing JUnit test results by Allure
* Add timeouts to VaultObserverExceptionTest
* Revert "CORDA-3845: Update BC, log4j, slf4j (#6699)" to eliminate introduced memory leaks
Co-authored-by: Waldemar Zurowski <waldemar.zurowski@r3.com>
Do not let a user reattach to a flow started by another user.
Reattaching to a flow using startFlowWithClientId for a flow not
started by the current user throws a PermissionException
Reattaching to a flow using reattachFlowWithClientId for a flow not
started by the current user returns null.
finishedFlowsWithClientIds does not return flows started by other
users.
Normal rpc permissions around startFlowWithClientId and
startFlowDynamicWithClientId has also been added.
To allow admins to remove client ids as well as be able to see all the
client ids on the node, admin versions have been added that bypass the
user restrictions. These can be permitted via rpc to only provide
their usage to admins.
* Do not use Security.addProvider(BouncyCastleProvider()) in tests, to avoid disruptions of other tests.
* Forcibly register security providers before starting Jimfs, to resolve a sequencing problem where Jimfs triggers loading of the SFTP filesystem provider, which in turn registers the standard BouncyCastle provider rather than the patched version Corda needs.
* INFRA-683 Move Corda OS release branch builds to serial (#6703)
Co-authored-by: Waldemar Zurowski <waldemar.zurowski@r3.com>
Co-authored-by: Denis Rekalov <denis.rekalov@r3.com>
Co-authored-by: Waldemar Zurowski <waldemar.zurowski@r3.com>
* CORDA-3845: Update BC to 1.64
* CORDA-3845: Upgraded log4j to 2.12.1
* We can remove the use of Manifests from the logging package so that when _it_ logs it doesn't error on the fact the stream was already closed by the default Java logger.
* Remove the logging package as a plugin
* latest BC version
* Remove old test
* Fix some rebased changes to log file handling
* Update slf4j too
Co-authored-by: Ryan Fowler <fowlerrr@users.noreply.github.com>
Co-authored-by: Adel El-Beik <adel.el-beik@r3.com>
* Make existing client id flows re-attachable via rpc 'startFlow' when flows draining mode is enabled
* Fix detekt issue
* Remove unneeded/ unreached waiting on flow's return future
Forcibly register security providers before starting Jimfs, to resolve a sequencing problem where Jimfs triggers loading of the SFTP filesystem provider, which in turn registers the standard BouncyCastle provider rather than the patched version Corda needs.
Do not use Security.addProvider(BouncyCastleProvider()) in tests, to avoid disruptions of other tests.
Co-authored-by: Denis Rekalov <denis.rekalov@r3.com>
If a flow is started with a client id, do not delete it from the
database. Instead, set the status to `KILLED` and store a
`KilledFlowException` in the database.
Keeping it around allows it to be reattached to using the same
mechanisms as active, completed or failed flows. Furthermore, without
this change it could be possible for a flow to be triggered again if
killed while the reconnecting rpc client is being used.
If there is no client id related to the flow, then kill flow keeps its
original behaviour of deleting all traces of the flow.
Flows cannot be killed if they are `COMPLETED`, `FAILED` or `KILLED`
already.
Logs have been added if requests to kill flows in these statuses are
made.
Do not update the status + persist the exception if the flow was killed
during flow initialisation (before persisting its first checkpoint).
Remove the client id mapping if the flow was killed and did not persist
its original checkpoint.
* ENT-5666 Extract shutdown events when retrying a flow
When a flow is retrying, only a select set of events are transferred
over to the new fiber. Shutdown events were not included in this set.
This meant that if a flow retries when an `Event.SoftShutdown` is in its
queue, it will never process it. This causes the node to hang, as the
node awaits `liveFibers`, which never reaches 0 because the shutdown
event is never processed.
To resolve this, `Event.SoftShutdown` is added to the set of events to
extract and reschedule.
* ENT-5666 Don't schedule extra shutdown event
When a flow is stopped by an `Event.SoftShutdown` it will eventually
reschedule another shutdown even when it reaches `SMM.removeFlow`. It
won't actually be processed because the flow returns an abort
continuation. But, it does look odd.
Therefore, it now does nothing instead since that is what it was
implicitly doing.
Created a database snapshot of a clean Corda OS 4.5.1 database, which can now be used for testing by both the node driver and mock network.
The MockNetwork was changed from using an in memory database to using an on disk database, and makes use of the snapshot to speed up setup times.
The Node Driver was changed from defaulting to an in-memory database to defaulting to an on-disk database. Tests that do not specify the type of database to use will thus use an on-disk database. Tests that opt in for an in-memory database will continue to use an in-memory database as before.
The database snapshots are copied to the node directory inside the build folder, therefore, they should be cleaned up after a build.
Co-authored-by: Ross Nicoll <ross.nicoll@r3.com>
* EG-3458 - Missing onError implementation message logged in the node log file with ERROR level - the changes made on top of 4.6 branch
* EG-3458 - Reducing the number of logs by only logging on first consecutive error. Retry without completing the observable
* EG-3458 - Refactor the overly complex method to smaller functions
* EG-3458 - Reducing the number of functions in the class
* INFRA-424 linux1 jenkinsfile
* INFRA-424 full run
* INFRA-424 bigger heap size
* Upgraded DJVM to handle BC - latest version of BC is a multirelease JAR.
When reading JKS keystore if a BC EdDSAPrivateKey is returned then swap for a net.i2p EdDSA private key.
* Temporary downgrade of BC
* Removed the BC EdDSA conversion
* INFRA-424 bigger heap size
* Upgrading Quasar to handle openJ9 different fields.
* INFRA-424: Handle lack of SUPPRESSED_SENTINEL in openj9.
* INFRA-424: If BCEdDSA public or private key is generated convert to net.i2p EdDSA form.
* INFRA-424 bigger heap size
* INFRA-424: On openJ9 only getting upto milli resolution.
* INFRA-424: Handle keystore returning a BCEdDSAPrivateKey.
* INFRA-424: Disable test on JDK11, as it requires the custom cordapp to generate JDK8 contract code, which we now check for.
* INFRA-424: Truncated time test to resolution of millis for openj9.
* INFRA-424 disabling log intensive tests until a fix is developed
* INFRA-424 one more test disabled
* INFRA-424: Disabled a couple of tests failing on openj9.
* INFRA-424: Disabling failing openj9 tests.
* INFRA-424: Disabling test failing on openj9.
* INFRA-424: Ignoring another flaky sleep test on openj9.
* INFRA-424 run integrationTests
* INFRA-424 set timeout to 4 hours
* INFRA-424: Cope with exception message from openj9.
* INFRA-424: Handle the coloured text characters openj9 adds.
* INFRA-424: Disabling test as it is generating JDK11 contract code under JDK11. Currently on JDK8 contract code allowed.
* INFRA-424: Commenting test out for openj9. Output of the processs thats read by the test is sometimes garbled.
* INFRA-424 switching to smoke tests
* INFRA-424 switching to slow integration tests
* INFRA-424 full run
* INFRA-424 moving jenkinsfile
* INFRA-424 removing references
* INFRA-424: Created common IS_OPENJ9 func for ignoring tests.
Co-authored-by: Schife <razvan.codreanu@r3.com>
* CORDA-4003: Now cope with file: prefix not being in class path element.
* CORDA-4003: Switched to new URL type filter.
* CORDA-4003: Switched to a URL comparison. In the string comparison the scheme was removed in latest version of classgraph.
* CORDA-4003: Moved to latest version of classgraph that has support for + in filenames.
* CORDA-4003: Switched to accept version of the deprecated classgraph methods.
* CORDA-3995 Redeliver external events in number of suspends differs
When retrying a flow, only redeliver external events held in a flow's
pending deduplication handlers if there is a difference in the
`numberOfSuspends` on the `currentState`'s checkpoint or the checkpoint
in the database.
If the checkpoint committed, but the flow retried, then the external
events would have been persisted to the database as part of the same
transaction. Therefore there is no need to replay them, as they have
already been processed as saved as part of the checkpoint.
This change is only relevant when the checkpoint persists, but the flow
still needs to retry after this occurs (within the same
transition/event).
* CORDA-3995 Redeliver external events in number of commits differs
When retrying a flow, only redeliver external events held in a flow's
pending deduplication handlers if there is a difference in the
`numberOfCommits` on the `currentState`'s checkpoint or the checkpoint
in the database.
If the checkpoint committed, but the flow retried, then the external
events would have been persisted to the database as part of the same
transaction. Therefore there is no need to replay them, as they have
already been processed as saved as part of the checkpoint.
This change is only relevant when the checkpoint persists, but the flow
still needs to retry after this occurs (within the same
transition/event).
* CORDA-3995 Redeliver external events if number of commits differs
When retrying a flow, only redeliver external events held in a flow's
pending deduplication handlers if there is a difference in the
`currentState`'s `numberOfCommits` or the `numberOfCommits`
the checkpoint has recorded in the database.
If the checkpoint committed, but the flow retried, then the external
events would have been persisted to the database as part of the same
transaction. Therefore there is no need to replay them, as they have
already been processed as saved as part of the checkpoint.
This change is only relevant when the checkpoint persists, but the flow
still needs to retry after this occurs (within the same
transition/event).
* Add @Suspendable to a test flow.
I am surprised this worked at all.
* Fix a few minor things based on review.
Co-authored-by: Will Vigor <william.vigor@r3.com>
We should check that PAUSED Checkpoints can be deserialised on node
startup as we do for RUNNABLE checkpoints. Otherwise a user might
get into trouble if they update the CorDapp.
We should not overwrite the stack trace of local errors thrown by
`FlowContinuation.Throw` as it hides the real cause of the error.
Exceptions received from peer nodes are still overwritten.
* CORDA-3960: Port MultiRPCClient to OS
* CORDA-3960: Carefully restore serialisation environment in `MultiRpcClientTest` to eliminate side effects on other tests
* CORDA-3960: Move ThreadDumpUtils.kt to `core-utils`
Flows that were started with a client id would hang because it would
retrieve the existing flow's future and wait for it to finish. But,
because the flow has failed its flow init and not saved its initial
checkpoint, it is relying on `startFlow` to start the flow again (by
redelivering the start flow external event).
`FlowWithClientIdStatus` now holds the flow id that it is related to.
This is then checked in `startFlow`. If a matching client id is found
for a flow start, it then checks the flow id as well. If the flow id
matches, then it lets the `startFlow` call continue, allowing it to
actually start the flow again (how a flow without a client id would
retry in this situation).
Previously we were just throwing this away when pausing, meaning
updates would not be passed back to the user.
The progress tracker is now maintained in the `NonResidentFlow`
allowing it to be reused in the flow when it is retried.
* ENT-5672 Update database query to get paused flows which have previously been hospitalised
* NOTICK Remove unneeded check if a database exception was removed when switching a flow to RUNNABLE since we were to remove it anyway
Always attempt to load a checkpoint from the database when a flow
retries.
This is to prevent transient errors where the checkpoint is committed to
the database but throws an error back to the node. When the node tries
to retry in this scenario, `isAnyCheckpointPersisted` is false, meaning
that it will try to insert when it tries to save its initial checkpoint
again.
By loading from the existing checkpoint, even though it doesn't
really use it because it is `Unstarted`, the flag gets put into the
right state and will update rather than insert later on.
The issue with the test was that the environment variable are kept as a static member so it passed if it was the first one to run, but failed if another test runs the config beforehand.
Terminate sessions that need to be removed instantly in whatever transition is currently executing, rather than scheduling another event and doing so at a later time.
To do this, update the transition being created in `TopLevelTransition` to remove the sessions and append the `RemoveSessionBindings` action to it.
This achieves the same outcome as the original code but does so with 1 less transition. Doing this also removes the race condition that can occur where another external event is added to the flow's event queue before the terminate event could be added.
* CORDA-3986 Increase sleep in `FlowSessionCloseTest`
A sleep duration needed to be increased to ensure that an end session
message has time to be processed by the other node.
Locks do not fully fix this because some internal processing needs to be
completed that can't be waited for using a lock. Therefore the sleep
time was increased generously.
* removed dependency from tools.jar
I removed the log line in /node/src/main/kotlin/net/corda/node/internal/NodeStartup.kt because I felt it was not so important
and I modified the checkpoint agent detection simply using a static field (I tested both with and without the checkpoint agent running and detection works correctly)
* move method to node-api to address review comments
Co-authored-by: Walter Oggioni <walter.oggioni@r3.com>
* CORDA-3954: Added step to run database migration scripts during initial node registration with -s / --skip-schema-creation option (default to false) to prevent migration.
* CORDA-3954: Applied code convention to if statement
* CORDA-3954: Marked NodeCmdLineOptions' -s/--skip-schema-creation as deprecated and hidden in line with --initial-registration
Disable `SignatureConstraintMigrationFromHashConstraintsTests.HashConstraint cannot be migrated to SignatureConstraint if a HashConstraint is specified for one state and another uses an AutomaticPlaceholderConstraint()` as it frequently appears in reports about Gradle process failures, to try isolating the actual cause.
* fix suggestion and tests
* detekt suppress
* making sure the forced join works with IndirectStatePersistable and removing unnecessary joinPredicates from parse with sorting
* remove joinPredicates and add tests
* rename sorting
* revert deleting joinPredicates and modify the force join to use `OR` instead of `AND`
* add system property switch
* CORDA-3657 Extract information from state machine
`FlowReadOperations` interface provides functions that extract
information about flows from the state machine manager.
`FlowOperator` implements this interface (along with another currenly
empty interface).
* CORDA-3657 Rename function and use set
* initial test is passing
* wip
* done tests
* additional tests to cover more FlowIORequest variations
* completed tests
* The quasar.jar should nat have been changed
* Fixed issues reported by detekt
* got rid of sync objects, instead relying on nodes being offline
* Added extra grouping test and minor simplification
* Hospital test must use online node which fails on otherside
* Added additional information required for the ENT
* Added tests to cover SEND FlowIORequests
* using node name constants from the core testing module
* Changed flow operator to the query pattern
* made query fields mutable to simply building query
* fixed detekt issue
* Fixed test which had dependency on the order int the result (failed for windows)
* Fixed recommendations in PR
* Moved WrappedFlowExternalOperation and WrappedFlowExternalAsyncOperation to FlowExternalOperation.kt as per PR comment
* Moved extension to FlowAsyncOperation
* removed unnecessarily brackets
Co-authored-by: LankyDan <danknewton@hotmail.com>
Fix memory leak due to DB not shutting down in FlowFrameworkPersistenceTests.flow restarted just after receiving payload.
Also reduces number of class-wide variables to reduce scope for references being accidentally held between runs.
`KillFlowTest` is failing quite often. This is probably due to issues in ordering when taking and releasing locks. By using `CountDownLatch` in places instead of `Semaphore`s should reduce the likelihood of tests failing.
If an error occurs when creating a transition (a.k.a anything inside of
`TopLevelTransition`) then resume the flow with the error that occurred.
This is needed, because the current code is swallowing all errors thrown
at this point and causing the flow to hang.
This change will allow better debugging of errors since the real error
will be thrown back to the flow and will get handled and logged by the
normal error code path.
Extra logging has been added to `processEventsUntilFlowIsResumed`, just
in case an exception gets thrown out of the normal code path. We do not
want this exception to be swallowed as it can make it impossible to
debug the original error.
Save the exception for flows that fail during session init when they are
kept for observation.
Change the exception tidy up logic to only update the flow's status if
the exception was removed.
Add `CordaRPCOps.reattachFlowWithClientId` to allow clients to reattach
to an existing flow by only providing a client id. This behaviour is the
same as calling `startFlowDynamicWithClientId` for an existing
`clientId`. Where it differs is `reattachFlowWithClientId` will return
`null` if there is no flow running or finished on the node with the same
client id.
Return `null` if record deleted from race-condition
Update the compatible flag in the DB if the flowstate cannot be deserialised.
The most common cause of this problem is if a CorDapp has been upgraded
without draining flows from the node.
`RUNNABLE` and `HOSPITALISED` flows are restored on node startup so
the flag is set for these then. The flag can also be set when a flow
retries for some reason (see retryFlowFromSafePoint) in this case the
problem has been caused by another reason.
Added a newpause event to the statemachine which returns an Abort
continuation and causes the flow to be moved into the Paused flow Map.
Flows can receive session messages whilst paused.
Add a lock to `StateMachineState`, allowing every flow to lock
themselves when performing a transition or when an external thread (such
as `killFlow`) tries to interact with a flow from occurring at the same
time.
Doing this prevents race-conditions where the external threads mutate
the database or the flow's state causing an in-flight transition to
fail.
A `Semaphore` is used to acquire and release the lock. A `ReentrantLock`
is not used as it is possible for a flow to suspend while locked, and
resume on a different thread. This causes a `ReentrantLock` to fail when
releasing the lock because the thread doing so is not the thread holding
the lock. `Semaphore`s can be used across threads, therefore bypassing
this issue.
The lock is copied across when a flow is retried. This is to prevent
another thread from interacting with a flow just after it has been
retried. Without copying the lock, the external thread would acquire the
old lock and execute, while the fiber thread acquires the new lock and
also executes.
* Remove use of Thread.sleep() FROM FlowReloadAfterCheckpointTest, instead relying on CountdownLatch to wait until the target number has been hit or a timeout occurs, so the thread can continue as soon as the target is hit.
* Replace use of hashmaps to a concurrent queue, to mitigate risk of complex threading issues.
Enhance rpc acknowledgement method (`removeClientId`) to remove checkpoint
from all checkpoint database tables.
Optimize `CheckpointStorage.removeCheckpoint` to not delete from all checkpoint
tables if not needed. This includes excluding the results (`DBFlowResult`) and
exceptions (`DBFlowException`) tables.
Integrate `DBFlowException` with the rest of the checkpoint schema, so now
we are saving the flow's exception result in the database.
Making statemachine not remove `FAILED` flows' checkpoints from the
database if they are started with a clientId.
Retrieve the DBFlowException from the database to construct a
`FlowStateMachineHandle` future and complete exceptionally the flow's result
future for requests (`startFlowDynamicWithClientId`) that pick FAILED flows ,
started with client id, of status Removed.
On killing a flow the client id mapping of the flow gets removed.
The storage serialiser is used for serialising exceptions. Note, that if an
exception cannot be serialised, it will not fail and will instead be stored
as a `CordaRuntimeException`. This could be improved in future
changes.
Dummy package names cause build failure as they are not found on the classpath when trying to import them. Now that empty package name list is allowed, the dummy names are removed.
* CORDA-3663 MockServices crashes when two of the provided packages to scan are deemed empty in 4.4 RC05
this happends when a given package is not found on the classpath. Now it is handled and an exception is thrown
* replace dummy package names in tests with valid ones
* allow empty package list for CustomCordapps and exclude those from the created jars
* detekt fix
* always true logic fix
* fix to check for empty packages instead of empty classes
* fix for classes and fixups
* logic refactor because of detekt stupidity
* PR related minor refactors
When an incorrect message is received, the flow should resume to allow
it to throw the error back to user code and possibly cause the flow to
fail.
For now, if an `EndSessionMessage` is received instead of a
`DataSessionMessage`, then an `UnexpectedFlowEndException` is thrown
back to user code. Allowing it to correctly re-enter normal flow error
handling.
Without this change, the flow will hang due to it failing while creating
a transition which exists outside of the general state machine error
handling code path.
Sessions are now terminated after performing the original
`FlowIORequest` passed into `StartedFlowTransition`, instead of before.
This is done by scheduling an `Event.TerminateSessions` if there are
sessions to terminate when performing a suspending event.
Originally this was done by hijacking a transition that is trying to
perform a `StartedFlowTransition`, terminating the sessions and then
scheduling another `Event.DoRemainingWork` to perform the original
transition. This introduced a bug where, another event (from a external
message) could be placed onto the queue before the
`Event.DoRemainingWork` could be added. In most scenarios, that should
be ok. But, if a flow is retrying (while in an uninitiated state) and
this occurs the flow could fail due to being in an unexpected state.
Terminating the sessions after performing the original transition
removes this possibility. Meaning that a restarting flow will always
perform the transition they supposed to do (based on the called
suspending event).
* CORDA-3871: Import external code
Compiles, but does not work for various reasons
* CORDA-3871: More improvements to imported code
Currently fails due to keystores not being found
* CORDA-3871: Initialise keystores for the server
Currently fails due to keystores for client not being found
* CORDA-3871: Configure certificates to client
The program started to run
* CORDA-3871: Improve debug output
* CORDA-3871: Few more minor changes
* CORDA-3871: Add AMQClient test
Currently fails due to `localCert` not being set
* CORDA-3871: Configure server to demand client to present its certificate
* CORDA-3871: Changes to the test to make it pass
ACK status is not delivered as server is not talking AMQP
* CORDA-3871: Add delayed handshake scenario
* CORDA-3871: Tidy-up imported classes
* CORDA-3871: Hide thread creation inside `ServerThread`
* CORDA-3871: Test description
* CORDA-3871: Detekt baseline update
* CORDA-3871: Trigger repeated execution of new tests
To make sure they are not flaky
* CORDA-3871: Improve robustness of the newly introduced tests
* CORDA-3871: Improve robustness of the newly introduced tests
* CORDA-3871: New tests proven to be stable - reduce number of iterations to 1
* CORDA-3871: Adding Alex Karnezis to the list of contributors
Correct race condition in FlowVersioningTest where the last message is read (and the session close can be triggered)
before one side has finished reading metadata from the session.
Remove memory leak endurance test as it spends 8 minutes testing a single failure case that's not end user visible,
and ultimately manifests elsewhere in test failures (which is where this came from in the beginning). It was a good
idea to confirm the change fixed the issue, but this isn't critical enough to retain.