Commit Graph

8099 Commits

Author SHA1 Message Date
Jonathan Locke
c193aa46f0
CID-1154: reliable finality merge to OS (#5658)
2019-11-05 10:48:00 +00:00
Jonathan Locke
7a78b93124
CORDA-3369: fix samples readme (#5651)
2019-11-05 10:12:06 +00:00
Barry
4a7a9a56be TM-80 Do not publish the junit zip file to Artifactory automatically. (#5667)
This functionality was only in place for debugging purposes. To switch
it back on, set -Dpublish.junit=true in the Jenkinsfile.
2019-11-04 17:37:43 +00:00
Barry
6bf49bc4b7 TM-81 Do not write out the callstack when we cannot find tests.csv (#5669)
The first run of any new branch will not find a corresponding tests.csv
and will return 404 Not Found, which is fine. We do not need to display
the callstack at warning level.
2019-11-04 17:36:33 +00:00
stefano
718b7abb2f add pod index to container output message to make it easier to find in artifacts 2019-11-04 17:12:50 +00:00
stefano
2dab35c362 tidy up environment within pipeline 2019-11-04 17:00:38 +00:00
stefano
62002e0b3c surround post steps with check if comment build 2019-11-04 16:28:52 +00:00
stefano
f858827757 trigger is not null when comment caused build 2019-11-04 16:24:14 +00:00
stefano
6ae082f67f fix pipeline syntax errors 2019-11-04 16:19:55 +00:00
stefano
bf792c63d5 redesign Jenkinsfile for smoketests to determine if current build is a comment triggered build 2019-11-04 16:18:31 +00:00
stefano
35def14b1f add ability to trigger smoke tests on arbitrary PR 2019-11-04 16:03:12 +00:00
Stefano Franz
f7e0ce6f0b
convert to nanos from seconds rather than milliseconds when parsing JUnit xml (#5666)
* convert to nanos from seconds rather than milliseconds

* fix tests
2019-11-04 15:17:57 +00:00
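JUnit XML reports record each test's `time` attribute in seconds (for example `time="1.234"`), which is why this commit scales by 10^9 to reach nanoseconds rather than by 10^6. A minimal Kotlin sketch of that conversion follows; the function name `junitTimeToNanos` is hypothetical, and treating a missing or unparseable value as zero is an assumption that mirrors the TM-51 notes on zero-duration tests.

```kotlin
import java.math.BigDecimal

// Nanoseconds in one second.
private val NANOS_PER_SECOND = BigDecimal(1_000_000_000L)

/**
 * Hypothetical helper: converts a JUnit XML `time` attribute, expressed in seconds,
 * to nanoseconds. Missing, blank or unparseable values are treated as zero duration
 * (an assumption, not necessarily what the real parser does).
 */
fun junitTimeToNanos(time: String?): Long {
    if (time.isNullOrBlank()) return 0L
    return try {
        // BigDecimal keeps the decimal string exact before truncating to Long.
        BigDecimal(time.trim()).multiply(NANOS_PER_SECOND).toLong()
    } catch (e: NumberFormatException) {
        0L
    }
}
```

Scaling with `BigDecimal` rather than `Double` sidesteps binary floating-point rounding before the value is truncated to a `Long`.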
Stefano Franz
5a0b8c7992
add extra whitespace in jenkins regression build command (#5665) 2019-11-04 13:26:43 +00:00
LankyDan
54394f6747 Release state soft locks when a flow is killed via killFlow 2019-11-04 13:21:06 +00:00
Stefano Franz
e09cd84339
add test time publishing to regression test build (#5664) 2019-11-04 13:05:13 +00:00
Stefano Franz
e4e920eee9
multiprocess port allocator is no longer used, so we can remove the tests as they add a significant amount of time to run (2-3 min) (#5663) 2019-11-04 11:56:38 +00:00
LankyDan
3c0631a26a Fix VaultObserverExceptionTest due to differences between ENT and OS 2019-11-04 09:42:41 +00:00
Barry
91e6c9783f TM-51 Read and write test results to artifactory. (#5597)
* TM-51  Prep for reading and writing test results to artifactory.

* TM-51  Tests from target branch if no tests for current branch

* TM-51  Placeholder for test averaging over runs.

* TM-51  Replace slashes in branch names used as tags.

* TM-51  More placeholder work for the mean duration work.

* TM-51  Write out average tests results as a csv.

The csv file should grow and be updated on each run.  This includes whether or not we are running unit tests, integration tests and so on.

* TM-51  Comment out old junit test archiving, add more comments.

* TM-51  Zip task needs to depend on a csv creation task.

If there isn't a csv file present, then the zip task doesn't run due to 'NO-SOURCE'

* TM-51  Zip task should ignore empty dirs

* TM-51  Fix up loading of test results.

We were looking for the wrong artifact name.
Add a bit more logging.

* TM-51  Fix up possible problem with allocating by class distribution.

If we encounter a class we haven't seen before, there won't be any tests.
This means we should give it some weight.  '1' is far too small.

* TM-51  Test that we are definitely incrementing the run count.

Tracking down whether the zipped csv file should have incremented.

* TM-51  Better default value for missing test/class names.

Begin by using mean unit test duration, but we have the option to bump
that to the mean class unit tests duration.

* TM-51  More debug information around csv writing.

We should be incrementing the tests.

* TM-51  Reload the csv before updating it.

* TM-51  Reduce verbosity of logging.

* TM-51  Reinstate unit tests.  Remove logging verbosity.

* TM-51  Load tests from artifactory in memory and avoid interim file.

* TM-51  Better handling of zero duration tests.

Ensure we return zero times from junit artifacts which may either be zero or have no recorded time.  Before writing the tests duration csv file, store those with a known time, and then store those with zero using the average time.

* TM-51  Log whether we have recorded a test.

Tracking down the curious case where we seem to not be rerunning the
same set of tests on the second run.

* TM-51  Capture junit files as well.

Trying to track down whether some tests are intermittently run.

* TM-51  Change task dependencies to ensure ziptask is triggered.

* TM-51  Remove test assertion, and trigger build

* TM-51  Add corda/enterprise to artifactory tag name.

Moved properties to own file.

* TM-51  Remove unnecessary mean class-based duration.

* TM-51  Add more BucketingAllocator tests.

We need these to nail down its behaviour some more.

* TM-51  Further log information.

We don't seem to be finding the tests in the 'production' runs which is odd.

* TM-51  corda type double set?

* TM-51  do not set the project type in the properties.

SRP and all that.

* TM-51  better plan reporting

* TM-51  duration may be zero

Another runtime problem that doesn't show in tests.

* TM-51  better plan reporting

* fix missing space after image id

* fix merge issue in DistributedTesting

* TM-51  remove unused code when GET/PUT-ting to Artifactory.

* TM-51  put tasks in gradle group and tidy up zip task creation

* TM-51 Fix the junit XML path.

* TM-51 Fix the task graph

* TM-51 Less logging
2019-11-02 09:07:53 +00:00
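The TM-51 notes describe giving tests with no usable duration (tests not seen before, or junit results that are zero or have no recorded time) the mean duration of the tests that do have one. A minimal Kotlin sketch of that defaulting rule, assuming durations are held in a map keyed by test name; `fillMissingDurations` and the `noDataDefault` fallback are hypothetical, not the actual TM-51 code.

```kotlin
/**
 * Hypothetical sketch: returns a duration (in nanoseconds) for every test in [tests].
 * Tests with a positive recorded duration keep it; new or zero-duration tests get the
 * mean of the positive recorded durations. [noDataDefault] is an assumed fallback used
 * only when nothing at all has been recorded yet.
 */
fun fillMissingDurations(
    tests: List<String>,
    recorded: Map<String, Long>,
    noDataDefault: Long = 1_000_000_000L
): Map<String, Long> {
    val known = recorded.values.filter { it > 0L }
    // Integer mean of the known durations; fall back when there is no history.
    val mean = if (known.isEmpty()) noDataDefault else known.sum() / known.size
    return tests.associateWith { name ->
        val duration = recorded[name] ?: 0L
        if (duration > 0L) duration else mean
    }
}
```

Using the mean rather than a tiny constant avoids the problem noted above, where a weight of `1` made unseen classes look almost free to schedule.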
Stefano Franz
cf849fbdbd
Merge pull request #5662 from corda/setup_regression_builds
Setup regression builds in jenkins
2019-11-01 16:51:07 +00:00
stefano
f8b4b334e3 truncate pod name from start rather than end 2019-11-01 16:10:35 +00:00
LankyDan
be2e34b33a Fix error from warning from detekt 2019-11-01 16:00:09 +00:00
stefano
d54f2ddd87 fix Library import 2019-11-01 15:57:08 +00:00
stefano
e7e7de1d05 add regression test jenkins file 2019-11-01 15:53:05 +00:00
LankyDan
3aaddb47ea Attempt to fix detekt issues in DriverDSLImpl again 2019-11-01 14:36:47 +00:00
LankyDan
dfb86f5d9c Attempt to fix detekt issues in DriverDSLImpl 2019-11-01 14:21:17 +00:00
LankyDan
b05bd76a77 Fix error in VaultObserverExceptionTest due to bad merge 2019-11-01 13:38:42 +00:00
Stefano Franz
359bb64d69
Merge pull request #5655 from corda/my_merge_branch
Use Kubernetes Jobs rather than Pods to preallocate nodes
2019-11-01 13:11:46 +00:00
LankyDan
4aa9add8c8 CORDA-2050 Upgrade Corda to Java 11 (compatibility mode) (#5356)
Upgrade Corda to run with Java 11 (compatibility mode) - see https://github.com/corda/corda/pull/5356 - 3fafbe55
Reapply change that was lost during merge - Adjust resolution of byteman jar to use java 11 compatible mechanism. - a1077092

Manual cherry pick of these changes (a1077092 + 3fafbe55)
2019-11-01 11:50:16 +00:00
Viktor Kolomeyko
c53ea9dde5 Eliminate extensive printout when Byteman not found on the classpath. (#1277)
Byteman is absent for most of the integration tests, and the long stacktrace currently seen in the log
unnecessarily attracts attention and consumes logging space.

(cherry picked from commit 2b6e59e7bd)
2019-11-01 11:50:16 +00:00
LankyDan
e737b01184 Port ledger integrity work to SingleThreadedStateMachineManager
- Port ledger integrity work to `SingleThreadedStateMachineManager`
- Fix `StatemachineErrorHandlingTest`
- Fix compile errors in `RetryFlowMockTest` + `VaultObserverExceptionTest`
- Add method to `StaffedFlowHospital` that was missed during original merge
2019-11-01 11:50:16 +00:00
LankyDan
bedfba8c3d ENT-1967: Illustration of how the Byteman library can be used in Node integration tests. (#1204) - c396b80a
(Only took `DriverDSLImpl` changes)

Simplifying internal startNode with bytemanPort parameter - d88b02f7

Manual cherry pick of these changes (c396b80a + d88b02f7)
2019-11-01 11:50:16 +00:00
Christian Sailer
8a2f4478d2 Undo changes to detekt config that were not required. 2019-11-01 11:48:08 +00:00
Christian Sailer
e9808c5e67 Update compiler.xml for new projects (and code style settings)
# Conflicts:
#	.idea/codeStyles/Project.xml
#	.idea/compiler.xml
2019-11-01 11:48:07 +00:00
Christian Sailer
f670cf774b Suppress/fix more warnings 2019-11-01 11:48:07 +00:00
Christian Sailer
9f15457045 add else branch to avoid warning that fails the warning check 2019-11-01 11:48:07 +00:00
Christian Sailer
1f82213827 Suppress/Baseline detekt issues that were highlighted freshly because of the merge. 2019-11-01 11:48:07 +00:00
Christian Sailer
5b9c5a6b83 Fix and/or suppress detekt warnings 2019-11-01 11:48:07 +00:00
Christian Sailer
119f939ee1 Fix and/or suppress detekt warnings 2019-11-01 11:48:07 +00:00
Christian Sailer
48aa2f2faa Update compiler.xml for new projects (and code style settings) 2019-11-01 11:48:07 +00:00
LankyDan
c5124689ca CORDA-3194 Update TODOs with jira tickets 2019-11-01 11:48:07 +00:00
Christian Sailer
8f2923c7a3 CID-843 code review fixes (#2681)
* Layout fix

* Fix test to use proper hook in flow hospital

* Fix comment
2019-11-01 11:48:07 +00:00
LankyDan
44206a0bea CORDA-3194 Update flow hospital docs, tidy comment and make exception public 2019-11-01 11:48:07 +00:00
Dan Newton
f4e9b9d5d2 CORDA-3194 Stop killed flows from re-entering hospital (#2664)
* CORDA-3194 Do not allow killed flows back into the hospital

This change has been made to prevent killed flows from being added back
to the hospital after being forcibly removed by `killFlow`. Not doing so
could leave references to a flow inside of the hospital, which is not
the correct behaviour.

`killFlow` now sets a flow's `StatemachineState.isRemoved` to true.

This check is then used in `StaffedFlowHospital` and the
`DumpHistoryOnErrorInterceptor`.

* CORDA-3194 Log different message for transition error due to killed flow

When a flow is killed, its checkpoint is deleted. Currently, the
statemachine will still try to process the next event even if it has
been killed. This can lead to an error when trying to update the
deleted checkpoint. The exception thrown from this is logged out.

An if statement has been added to log a different message at debug level
if it is due to an update error for a killed flow. This is done to not
alarm node operators of the exception.
2019-11-01 11:48:07 +00:00
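The guard described in this commit amounts to checking a removal flag before admitting a flow. A minimal Kotlin sketch, assuming a heavily simplified flow state and hospital; `FlowStateSketch`, `HospitalAdmissionsSketch` and `killFlowSketch` are hypothetical stand-ins for `StatemachineState`, `StaffedFlowHospital` and `killFlow`, not their real signatures.

```kotlin
import java.util.concurrent.ConcurrentHashMap

/** Hypothetical stand-in for StatemachineState, reduced to the removal flag. */
class FlowStateSketch(val flowId: String) {
    @Volatile var isRemoved: Boolean = false
}

/** Hypothetical stand-in for the flow hospital's set of current patients. */
class HospitalAdmissionsSketch {
    private val patients: MutableSet<String> = ConcurrentHashMap.newKeySet<String>()

    /** Refuses admission to flows that have already been killed. */
    fun admit(state: FlowStateSketch): Boolean =
        if (state.isRemoved) false else patients.add(state.flowId)

    fun remove(flowId: String) {
        patients.remove(flowId)
    }

    fun isPatient(flowId: String): Boolean = flowId in patients
}

/** Marks the flow as removed first, then drops any existing hospital record. */
fun killFlowSketch(state: FlowStateSketch, hospital: HospitalAdmissionsSketch) {
    state.isRemoved = true
    hospital.remove(state.flowId)
}
```

Setting the flag before removing the record means a late error from the killed flow's next event cannot re-admit it.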
Dan Newton
3b3dbd7352 CORDA-3194 Add extra test cases to StatemachineErrorHandlingTest (#2651)
* CORDA-3194 - Add `ReceiveFinalityFlow` byteman tests

* CORDA-3194 Tidy up `StatemachineErrorHandlingTest`
2019-11-01 11:48:07 +00:00
Dan Newton
a591c8e25b CORDA-3194 Relax duplicate insert flow hospital handling (#2643)
* CORDA-3194 Relax duplicate insert flow hospital handling

Revert a previous change so that the duplicate insert staff member now
gives a diagnosis of discharge or not my speciality (previously it gave
terminal).

This is to prevent duplicate insert handling from overriding finality
flow error handling.
2019-11-01 11:48:07 +00:00
kyriathar
877ce5587f CORDA-3196 warning at intervals when flows waiting in flow hospital (#2636)
* added the warning as a TimerTask at StaffedFlowHospital#delayedDischargeTimer

* moved the scheduling of the warning task at StaffedFlowHospital#init block. That way we ensure that the task will be scheduled only once at StaffedFlowHospital initialization.

* Corrected overnight observation warning task's logging message. Changed StaffedFlowHospital#delayedDischargeTimer to the more generic StaffedFlowHospital#hospitalJobTimer since it now schedules delayed discharge tasks as well as the overnight observation warning task. Removed this from property reference

* switching to fun timerTask for the instantiation of anonymous TimerTask classes

* Correct condition to log patients who are currently in the hospital, whose last record in their medical records is Outcome.OVERNIGHT_OBSERVATION. Extended logging to include treatableSessionInits staying in the hospital

* Add not empty check for patientsUnderOvernightObservation. Correct strings.
2019-11-01 11:48:07 +00:00
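The periodic warning can be sketched with `java.util.Timer` and Kotlin's `kotlin.concurrent.timerTask`, scheduling the task once from an `init` block as this commit describes. In this minimal sketch the class name, the in-memory outcome map and the message format are all hypothetical; only the overall shape (one shared job timer, and a warning that is emitted only when some patient's last outcome is `OVERNIGHT_OBSERVATION`) follows the commit.

```kotlin
import java.util.Timer
import java.util.concurrent.ConcurrentHashMap
import kotlin.concurrent.timerTask

/** Hypothetical subset of the hospital's outcomes. */
enum class OutcomeSketch { DISCHARGE, OVERNIGHT_OBSERVATION }

class HospitalWarningSketch(warningPeriodMillis: Long) {
    // Last recorded outcome per flow id (a stand-in for the real medical records).
    private val lastOutcome = ConcurrentHashMap<String, OutcomeSketch>()

    // One shared daemon timer, analogous to hospitalJobTimer.
    private val hospitalJobTimer = Timer("HospitalJobTimer", true)

    init {
        // Scheduled exactly once, when the hospital is constructed.
        hospitalJobTimer.schedule(timerTask {
            overnightObservationWarning()?.let { System.err.println(it) }
        }, warningPeriodMillis, warningPeriodMillis)
    }

    fun record(flowId: String, outcome: OutcomeSketch) {
        lastOutcome[flowId] = outcome
    }

    /** Returns a warning only if some patient's last outcome is OVERNIGHT_OBSERVATION. */
    fun overnightObservationWarning(): String? {
        val patients = lastOutcome.filterValues { it == OutcomeSketch.OVERNIGHT_OBSERVATION }
            .keys.sorted()
        return if (patients.isEmpty()) null
        else "Flows kept for overnight observation (${patients.size}): ${patients.joinToString()}"
    }

    fun close() = hospitalJobTimer.cancel()
}
```

Keeping the message construction in a separate function makes the "not empty" check testable without waiting for the timer to fire.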
Dan Newton
ef01a99737 CORDA-3194 Failure during flow retry forces the flow into overnight observation (#2640)
When a flow fails to retry, it should be kept in for overnight observation and aborted.

In the future, it might be possible to retry flows again that failed during their retry, but for now keeping for observation and aborting is satisfactory.

* CORDA-3194 Remove hospitalised flows from `HospitalisingInterceptor`

Small refactor to remove some of the hospital logic out of the
`HospitalisingInterceptor` and into the `StaffedFlowHospital`.

Add some comments to help clarify the purpose of the two maps inside
of the hospital.

* CORDA-3194 When a flow fails to retry force it into observation

When a flow fails to retry, it should be kept in for overnight
observation and aborted.

In the future, it might be possible to retry flows again that failed
during their retry, but for now keeping for observation and aborting is
satisfactory.

* CORDA-3194 Test for database commit failure when retrying a flow

A failure during the database commit that occurs after the retry
flow action does not stop the flow from actually retrying. This test
confirms this functionality.

The retried flow gets scheduled as part of the retry action. The failure
in the commit action does not prevent this since it has already been
scheduled.
2019-11-01 11:48:07 +00:00
Dan Newton
268d129838 CORDA-3194 Replay start flow events when responding flow fails initial checkpoint commit (#2601)
* CORDA-3194 Replay start flow events when responding flow fails initial checkpoint commit

Logic has already been added to recover from initial checkpoint commit
failures on the initiating flow side but this did not suffice for
the same failure occurring on the responding flow's side.

The same idea has been added to resolve the responding flow's issue.

`ExternalMessageEvent` now has a `flowId` that is maintained on the
event. Messages can then be replayed to start/restart the flow, while
the event provides the flow id to each flow start.

Each `ExternalMessageEvent` implementation generates a random `flowId`
when constructed.

Events are stored in Artemis. This allows the solution to recover across
node restarts as the events will be pulled from artemis again when
restarting.

In the future, `flowId`s will probably be moved off of the events and
generated purely on the responding flow's node.

* CORDA-3194 Add test to verify that errors removing a responding flow are recoverable
2019-11-01 11:48:07 +00:00
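Carrying the flow id on the event is what makes replay safe: the id is chosen once, when the event is constructed, so replaying the same stored event starts or restarts the same flow instead of a new one. A minimal Kotlin sketch, assuming an in-memory registry; `StartFlowEventSketch` and `FlowRegistrySketch` are hypothetical and are not the real `ExternalMessageEvent` API.

```kotlin
import java.util.UUID

/** Hypothetical event: the flow id is generated once, at construction, and travels with it. */
data class StartFlowEventSketch(val payload: String, val flowId: UUID = UUID.randomUUID())

/** Hypothetical registry of started flows, keyed by the id taken from the event. */
class FlowRegistrySketch {
    private val flows = LinkedHashMap<UUID, String>()

    /**
     * Starts (or restarts) the flow named by the event. Replaying the same event reuses its
     * flowId, so a retry after a failed initial checkpoint commit does not create a duplicate.
     * Returns true only when a new entry was created.
     */
    fun startOrRestart(event: StartFlowEventSketch): Boolean =
        flows.put(event.flowId, event.payload) == null

    val size: Int get() = flows.size
}
```

A fresh event with the same payload still gets its own random id, so genuinely new session initiations are not merged.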
Christian Sailer
1f71b071aa CORDA-3217 and CORDA-3195 Various bits around SQL exceptions and flow hospital (#2605)
* Unwrap rx.OnErrorNotImplementedException so the hospital can handle the cause appropriately

* Add db failure cordapp

* Renamed folders to avoid ambiguity in gradle

* Add integration test for exception hospitalisation when thrown from an RX observable.

* Make the test slightly cleaner

* Fix the schema to actually match the requirements for my custom state. Thanks a bunch, H2.

* Switch test to use SqlException base class.

* Schedule error event if we detect that a commit or db flush has thrown (forcing the flow to error even if customer code then goes ahead to swallow the exception)

* Revert change to schedule extra error

* Add more tests for edge case with DB exceptions, changed CorDapp to support this and hook into the flow hospital

* Warning about unsubscribe
Check state transitioned from clean to error for hospital admission.

* Match the test to our actual expectations

* Revert "Revert change to schedule extra error"

This reverts commit 43d47937

* Prevent suppression of errors arising in `transaction()` and `jdbcConnection()`

* Test for SqlException caught trying to escape from recordTransaction and suppressed outside being intercepted.

* More tests for various error/catch combinations

* Clean up and comments

* Code reformat

* Fix test compilation
2019-11-01 11:48:07 +00:00
Dan Newton
9b169df2b8 CORDA-3194 Wrap state transition exceptions and add flow hospital error handling for them (#2542)
Wrap exceptions that occur in state machine transitions with a custom exception type which is
then handled inside of the flow hospital. As part of this change, a number of negative side
effects have been addressed.

General summary:

- `StateTransitionException` wraps exceptions caught in `TransitionExecutorImpl`
- `StateTransitionExceptions` are handled in the flow hospital, retried 3 times and then kept in
    for observation if errors persist (assuming conditions below are false)
- Exceptions that occur in `FlowAsyncOperation` events are wrapped in
  `AsyncOperationTransitionException` and ignored by the flow hospital transition staff member
- `InterruptException`s are given a `TERMINAL` diagnosis by the flow hospital transition staff
   member (can occur due to `killFlow`)
- Allow flows which have not persisted their original checkpoint to still retry by replaying their
   start flow messages
- Swallow exceptions in `AcknowledgeMessages` actions

Detailed summary:

* CORDA-3194 Add state machine transition error handling to flow hospital

Wrap exceptions that are caught in `TransitionExecutorImpl` (coming from
new errors) with `StateTransitionException`. This exception is then
handled explicitly by the flow hospital.

Add `TransitionErrorGeneralPractitioner` to `StaffedFlowHospital`. This
staff member handles errors that mention `StateTransitionException`.
Errors are retried and then kept in the hospital if the errors persist.

* CORDA-3194 Remove a fiber from the `hospitalisedFlows` if its previous state was clean

If the fiber's previous state was clean then remove it from
`HospitalisingInterceptor.hospitalisedFlows`. This allows flows that are
being retried to clean themselves. Doing this allows them to re-enter
the flow hospital after executing the fiber's transition (if an error
occurs).

This is important for retrying a flow that has errored during a
transition.

* CORDA-3194 Set `isAnyCheckpointPersisted` to true when retrying a flow

Added to prevent a single flow from creating multiple checkpoints when
a failure occurs during `Action.AcknowledgeMessages`.

More specifically, `isAnyCheckpointPersisted` is false when retrying
the flow, even though a checkpoint has actually been saved. Due to this
a brand new flow is started with a new flow id (causing duplication).

Setting `isAnyCheckpointPersisted` to true specifically when retrying a
flow resolves this issue.

* CORDA-3194 Add Byteman test to verify transition error handling

Add `StatemachineErrorHandlingTest` to verify transition error handling.

Byteman allows exceptions to be injected at certain points in the code's
execution. Therefore exceptions can be thrown when needed inside of the
state machine.

The current tests check errors in events:
- `InitiateFlow`
- `AcknowledgeMessages`

* CORDA-3194 Swallow all exceptions in `ActionExecutorImpl.executeAcknowledgeMessages`

Swallow the exceptions that occur in the `DeduplicationHandler`s when
inside of `ActionExecutorImpl.executeAcknowledgeMessages`.

The side effects of the failures that can happen in the handlers are
not serious enough to put the transition into a failure state.
Therefore they are now caught. This allows the transition to continue
as normal, even if an error occurs in any of the handlers.

* CORDA-3194 Wrap unexpected exceptions thrown in async operation transitions

Exceptions thrown inside of `FlowAsyncOperation.execute` implementations
that are not returned as part of the future, are caught, wrapped and
rethrown. This prevents unexpected exceptions thrown by (most likely)
user code from being handled by the hospital's transition
staff member.

This handling might change moving forward, but it allows the async
operation to continue working as it was before transition error handling
was added.

* CORDA-3194 Verify that errors inside of `AcknowledgeMessages` work as expected

Update `StatemachineErrorHandlingTest` to correctly test errors that
occur when executing the `AcknowledgeMessages` action.

* CORDA-3194 Retry flows that failed to persist their original checkpoint

Allow a flow that failed when creating its original checkpoint (for
example - failing to commit the db transaction) to retry.

The flow will create a brand new checkpoint (as the original was not
saved).

This required adding `flowId` to `ExternalStartFlowEvent` to allow the
event to keep a record of the flow's id. When the flow is retried, the
events are replayed which trigger a flow to be started that has the
id stored in the event.

To allow this change, code was removed from `retryFlowFromSafePoint` to
allow the function to continue, even if no checkpoint matches the passed
in flow id.

* CORDA-3194 Correct `FlowFrameworkTests` test due to error handling

Test assumed that errors in transitions are not retried, this has now
been updated so the test passes with the flow succeeding after an
exception is thrown.

* CORDA-3194 Remove unneeded import

* CORDA-3194 Make the state transition exceptions extend `CordaException`

`StateTransitionException` and `AsyncOperationTransitionException` now
extend `CordaException` instead of `Exception`.

* CORDA-3194 Improve log messages

* CORDA-3194 Remove unneeded code in `HospitalisingInterceptor`

Due to a previous change, a section of code that removes a flow id
from the `hospitalisedFlows` map is no longer required. This code has
been removed.

* CORDA-3194 Constraint violations are given `TERMINAL` diagnosis

Add `Diagnosis.TERMINAL` to `StaffedFlowHospital` to allow an error
to be ignored and left to die a quick and painful death.

`StateTransitionException` changed so it does not cause serialisation
errors when propagated from a flow.

* CORDA-3194 `InterruptedExceptions` are given `TERMINAL` diagnosis
2019-11-01 11:48:07 +00:00
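The decision rules in the general summary of this commit can be expressed as a single diagnosis function. A minimal Kotlin sketch, assuming the caller tracks how many times the flow has already been retried; the exception classes and the `DiagnosisSketch` enum here are simplified stand-ins (the real exceptions extend `CordaException`), and `diagnoseTransitionError` is a hypothetical name, not the actual `TransitionErrorGeneralPractitioner` code. Constraint-violation handling is left out.

```kotlin
/** Hypothetical subset of the hospital's diagnoses. */
enum class DiagnosisSketch { DISCHARGE, OVERNIGHT_OBSERVATION, TERMINAL, NOT_MY_SPECIALITY }

// Simplified stand-ins for the exception types named in the commit.
class StateTransitionException(cause: Throwable) : Exception(cause)
class AsyncOperationTransitionException(cause: Throwable) : Exception(cause)

/**
 * Hypothetical transition staff member: retries (DISCHARGE) up to [maxRetries] times,
 * then keeps the flow for OVERNIGHT_OBSERVATION. Interrupts are TERMINAL; errors from
 * async operations, or errors that do not mention a transition failure, are ignored.
 */
fun diagnoseTransitionError(error: Throwable, previousRetries: Int, maxRetries: Int = 3): DiagnosisSketch {
    // Walk the cause chain, since the transition exception may be wrapped.
    val chain = generateSequence(error) { it.cause }.toList()
    return when {
        chain.none { it is StateTransitionException } -> DiagnosisSketch.NOT_MY_SPECIALITY
        chain.any { it is AsyncOperationTransitionException } -> DiagnosisSketch.NOT_MY_SPECIALITY
        chain.any { it is InterruptedException } -> DiagnosisSketch.TERMINAL
        previousRetries < maxRetries -> DiagnosisSketch.DISCHARGE
        else -> DiagnosisSketch.OVERNIGHT_OBSERVATION
    }
}
```

Checking the whole cause chain mirrors the idea that the staff member reacts to errors that *mention* `StateTransitionException`, not only to that exact type at the top level.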