The state machines state is held within `InnerState` which lived inside
the SMM. `InnerState` has been extracted out of the SMM to allow the SMM
to be refactored in the future. Smaller classes can now be made that
focus on a single goal as the locking of the state can be accessed from
external classes. To achieve this, pass the `InnerState` into the class
and request a lock if needed.
The locking of `InnerState` has been made a property of the `InnerState`
itself. It has a `lock` field that allows locks to be taken out when
needed.
An inline `withLock` function has been added to tidy up the code and not
harm performance.
Some classes have been made internal to prevent invalid usage of purely
node internal classes.
As part of this change, flow timeouts have been extracted out into
`FlowTimeoutScheduler`.
Only hit the database if `StateMachineState.isAnyCheckpointPersisted`
returns true. Otherwise, there will be no checkpoint to retrieve from the
database anyway. This can prevent errors due to a transient loss of
connection to the database.
Update tests after merging to 4.6
Only hit the database if `StateMachineState.isAnyCheckpointPersisted`
returns true. Otherwise, there will be no checkpoint to retrieve from the
database anyway. This can prevent errors due to a transient loss of
connection to the database.
* CORDA-3578 add corda prefix to conf file names as original issue was having non-corda reference.conf files on classpath causes DriverDSLImp failure
it is better to have this naming convention and avoid further conflicts of conf files.
* fixed weird duplicates
* revert renaming changes for web-reference.conf and loadtest-reference.conf
* Ensure logs directory is written to correct location
* Remove a superfluous set of log path property
* Add a unit test to catch bad log paths
* Address detekt issues
* checking for unsigned cordapps in prod mode and shutting down if it the contracts are not signed
* inheriting from CordaRuntimeException instead of Exception
* had the same tests twice, removed the shutdown for minimumplatformversion, modified some of the tests
* shut down when we get invalid platformversion and modified the test according to this
* making the message more accurate
* CORDA-3176 Inefficient query generated on vault queries with custom paging
a check added to avoid self joins
local postgres db tests were executed on large volume test data (50k states) to ensure that the DB optimizes the suspicious query and so it does not cuase performance issues.
A test added to check that a pagination query on large volume of sorted data is completed in a reasonable time
* foreach detekt fix
Adding to the query -explicitly- the flow's locked states resolved the SQL Deadlocks in SQL server. The SQL Deadlocks would come up from `softLockRelease` when only the `lockId` was passed in as argument. In that case the query optimizer would use `lock_id_idx(lock_id, state_status)` to search and update entries in `VAULT_STATES` table.
However, all rest of the queries would follow the opposite direction meaning they would use PK's `index(output_index, transaction_id)` but they would also update the `lock_id` column and therefore the `lock_id_idx` as well, because `lock_id` is a part of it. That was causing a circular locking among the different transactions (SQL processes) within the database.
To resolve this, whenever a flow attempts to reserve soft locks using their flow id (remember the flow id is always the flow id for the very first flow in the flow stack), we then save these states to the fiber. Then, upon releasing soft locks the fiber passes that set to the release soft locks query. That way the database query optimizer will use the primary key index of VAULT_STATES table, instead of lock_id_idx in order to search rows to update. That way the query will be aligned with the rest of the queries that are following that route as well (i.e. making use of the primary key), and therefore its locking order of resources within the database will be aligned with the rest queries' locking orders (solving SQL deadlocks).
* Fixed SQL deadlocks caused from softLockRelease, by saving locked states per fiber; NodeVaultService.softLockRelease query then uses VAULT_STATES PK index instead of lock_id_idx
Speed up SQL server by breaking down queries with > 16 elements in their IN clause, into sub queries with 16 elements max in their IN clauses
* Allow softLockedStates to remove states
* Add to softLockedStates only states soft locked under our flow id
* Fix softLockRelease not to take into account flowStateMachineImpl.softLockedStates when using lockId != ourFlowId
* Moved CriteriaBuilder.executeUpdate at the bottom of the file
Performance tuning of the new checkpoint schema;
- Checkpoint tables are now using `flowId` as join keys.
- Indexes consist of a PK's index on `node_checkpoints(flow_id)` and then unique indexes on `node_checkpoint_blobs(flow_id)` and `node_flow_metadata(flow_id)`.
- Serialization of `checkpointState` is being done with `CHECKPOINT_CONTEXT` so that we can have compression. This is needed when messages get passed into `checkpointState.sessions` therefore `checkpointState` grows in size upon serialized and
saved into the database.
* Deserialize checkpointState with CHECKPOINT_CONTEXT
* Align tests with schema update; We cannot add and update a checkpoint in the same session now, ends up with hibernate complaining: two different objects with same identifier
* Fix indentation and format
* Ignore tests that assert DBFlowResult or DBFlowException
* Set DBFlowCheckpoint.blob to null whenever the flow errors or hospitalizes; this way we save an extra SELECT in such cases;
* Fix test; cleared Hibernate session, it would fail at checkpoint with 'org.hibernate.NonUniqueObjectException'
* Changing VARCHAR to NVARCHAR
* Rename v17 liquibase scripts to v19 to resolve collision with ENT v17 scripts
* CORDA-3750: Use hand-written sandbox Crypto object that delegates to the node.
* CORDA-3750: Add integration test for deterministic CashIssueAndPayment flow.
* Tidy up generics for Array instances.
* Upgrade to DJVM 1.1-RC04.
When a flow is missing its database transaction the _real_ stacktrace
of the exception is missing in the logs. This is due to the normal error
handling path way does not work due the transaction being missing. When
it goes to process the next error event (due to the original transaction
context missing error) it will fail for the same error as it needs the
context to be able to process the error event.
The extra logging should help diagnose future errors.
* Fix erroneous sql statement for oracle; It was failing tests with 'ORA-00933: SQL command not properly ended'
* Fixed flaky test; it didn't wait for counter party flow to get hospitalized as the test implied