8100 Commits

Author SHA1 Message Date
Dan Newton
9b169df2b8 CORDA-3194 Wrap state transition exceptions and add flow hospital error handling for them (#2542)
Wrap exceptions that occur in state machine transitions with a custom exception type which is
then handled inside of the flow hospital. As part of this change, a number of negative side
effects have been addressed.

General summary:

- `StateTransitionException` wraps exceptions caught in `TransitionExecutorImpl`
- `StateTransitionExceptions` are handled in the flow hospital, retried 3 times and then kept in
    for observation if errors persist (assuming conditions below are false)
- Exceptions that occur in `FlowAsyncOperation` events are wrapped in
  `AsyncOperationTransitionException` and ignored by the flow hospital transition staff member
- `InterruptedException`s are given a `TERMINAL` diagnosis by the flow hospital transition staff
   member (can occur due to `killFlow`)
- Allow flows which have not persisted their original checkpoint to still retry by replaying their
   start flow messages
- Swallow exceptions in `AcknowledgeMessages` actions
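
The diagnosis rules listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the class `TransitionHospitalSketch`, the stand-in exception classes, the method shapes, and every `Diagnosis` value other than `TERMINAL` are assumptions made for the example.

```java
// Illustrative sketch only. Names mirror the commit message, but the types,
// method signatures and enum values (except TERMINAL) are assumptions.
final class TransitionHospitalSketch {

    /** Hypothetical stand-in for the wrapper thrown from the transition executor. */
    static class StateTransitionException extends RuntimeException {
        StateTransitionException(Throwable cause) { super(cause); }
    }

    /** Hypothetical stand-in for the wrapper used for async operation failures. */
    static class AsyncOperationTransitionException extends RuntimeException {
        AsyncOperationTransitionException(Throwable cause) { super(cause); }
    }

    /** Hypothetical outcomes; only TERMINAL is named in the commit message. */
    enum Diagnosis { RETRY, KEEP_FOR_OBSERVATION, TERMINAL, NOT_HANDLED }

    static final int MAX_RETRIES = 3; // "retried 3 times" per the summary

    /** Wraps failures of a transition; async operation failures pass through (assumption). */
    static void runTransition(Runnable transition) {
        try {
            transition.run();
        } catch (AsyncOperationTransitionException e) {
            throw e;
        } catch (RuntimeException e) {
            throw new StateTransitionException(e);
        }
    }

    /** Decides what the transition staff member does with an error. */
    static Diagnosis diagnose(Throwable error, int retriesSoFar) {
        if (!(error instanceof StateTransitionException)) {
            return Diagnosis.NOT_HANDLED; // includes AsyncOperationTransitionException
        }
        if (error.getCause() instanceof InterruptedException) {
            return Diagnosis.TERMINAL; // e.g. the flow was killed
        }
        return retriesSoFar < MAX_RETRIES ? Diagnosis.RETRY : Diagnosis.KEEP_FOR_OBSERVATION;
    }

    public static void main(String[] args) {
        Throwable failure = new StateTransitionException(new IllegalStateException("db commit failed"));
        for (int retries = 0; retries <= MAX_RETRIES; retries++) {
            System.out.println(retries + " previous retries -> " + diagnose(failure, retries));
        }
    }
}
```

Keeping the decision in one pure function makes the retry limit and the special cases easy to test without a running state machine.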

Detailed summary:

* CORDA-3194 Add state machine transition error handling to flow hospital

Wrap exceptions that are caught in `TransitionExecutorImpl` (coming from
new errors) with `StateTransitionException`. This exception is then
handled explicitly by the flow hospital.

Add `TransitionErrorGeneralPractitioner` to `StaffedFlowHospital`. This
staff member handles errors that mention `StateTransitionException`.
Errors are retried and then kept in the hospital if the errors persist.

* CORDA-3194 Remove a fiber from the `hospitalisedFlows` if its previous state was clean

If the fiber's previous state was clean then remove it from
`HospitalisingInterceptor.hospitalisedFlows`. This allows flows that are
being retried to clean themselves. Doing this allows them to re-enter
the flow hospital after executing the fiber's transition (if an error
occurs).

This is important for retrying a flow that has errored during a
transition.

* CORDA-3194 Set `isAnyCheckpointPersisted` to true when retrying a flow

Added to prevent a single flow from creating multiple checkpoints when
a failure occurs during `Action.AcknowledgeMessages`.

More specifically, `isAnyCheckpointPersisted` is false when retrying
the flow, even though a checkpoint has actually been saved. Due to this
a brand new flow is started with a new flow id (causing duplication).

Setting `isAnyCheckpointPersisted` to true specifically when retrying a
flow resolves this issue.

* CORDA-3194 Add Byteman test to verify transition error handling

Add `StatemachineErrorHandlingTest` to verify transition error handling.

Byteman allows exceptions to be injected at certain points in the code's
execution. Therefore exceptions can be thrown when needed inside of the
state machine.

The current tests check errors in events:
- `InitiateFlow`
- `AcknowledgeMessages`

* CORDA-3194 Swallow all exceptions in `ActionExecutorImpl.executeAcknowledgeMessages`

Swallow the exceptions that occur in the `DeduplicationHandler`s when
inside of `ActionExecutorImpl.executeAcknowledgeMessages`.

The side effects of the failures that can happen in the handlers are
not serious enough to put the transition into a failure state.
Therefore they are now caught. This allows the transition to continue
as normal, even if an error occurs in any of the handlers.
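
The swallow-and-continue behaviour described above could look roughly like the sketch below. It is not the code from `ActionExecutorImpl`: the class `AcknowledgeSketch`, the use of `Runnable` in place of a `DeduplicationHandler`, and the returned failure count are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: Runnable stands in for a deduplication handler,
// and the failure count is returned purely so the behaviour can be observed.
final class AcknowledgeSketch {

    /** Runs every handler; a failure in one is logged and does not stop the others. */
    static int acknowledgeAll(List<Runnable> handlers) {
        int failures = 0;
        for (Runnable handler : handlers) {
            try {
                handler.run();
            } catch (Exception e) {
                // Swallowed on purpose: the transition carries on as normal.
                failures++;
                System.err.println("Ignoring failure while acknowledging message: " + e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<Runnable> handlers = Arrays.<Runnable>asList(
                () -> System.out.println("first handler acknowledged"),
                () -> { throw new IllegalStateException("handler failed"); },
                () -> System.out.println("third handler acknowledged"));
        System.out.println("failures: " + acknowledgeAll(handlers));
    }
}
```

The `try` sits inside the loop, so one failing handler cannot prevent the remaining handlers from running.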

* CORDA-3194 Wrap unexpected exceptions thrown in async operation transitions

Exceptions thrown inside of `FlowAsyncOperation.execute` implementations
that are not returned as part of the future are caught, wrapped and
rethrown. This prevents unexpected exceptions thrown by (most likely)
user code from being handled in the hospital by the transition
staff member.

This handling might change moving forward, but it allows the async
operation to continue working as it was before transition error handling
was added.

* CORDA-3194 Verify that errors inside of `AcknowledgeMessages` work as expected

Update `StatemachineErrorHandlingTest` to correctly test errors that
occur when executing the `AcknowledgeMessages` action.

* CORDA-3194 Retry flows that failed to persist their original checkpoint

Allow a flow that failed when creating its original checkpoint (for
example - failing to commit the db transaction) to retry.

The flow will create a brand new checkpoint (as the original was not
saved).

This required adding `flowId` to `ExternalStartFlowEvent` to allow the
event to keep a record of the flow's id. When the flow is retried, the
events are replayed which trigger a flow to be started that has the
id stored in the event.

To allow this change, code was removed from `retryFlowFromSafePoint` to
allow the function to continue, even if no checkpoint matches the passed
in flow id.

* CORDA-3194 Correct `FlowFrameworkTests` test due to error handling

The test assumed that errors in transitions are not retried. It has now
been updated so that it passes with the flow succeeding after an
exception is thrown.

* CORDA-3194 Remove unneeded import

* CORDA-3194 Make the state transition exceptions extend `CordaException`

`StateTransitionException` and `AsyncOperationTransitionException` now
extend `CordaException` instead of `Exception`.

* CORDA-3194 Improve log messages

* CORDA-3194 Remove unneeded code in `HospitalisingInterceptor`

Due to a previous change, a section of code that removes a flow id
from the `hospitalisedFlows` map is no longer required. This code has
been removed.

* CORDA-3194 Constraint violations are given `TERMINAL` diagnosis

Add `Diagnosis.TERMINAL` to `StaffedFlowHospital` to allow an error
to be ignored and left to die a quick and painful death.

`StateTransitionException` changed so it does not cause serialisation
errors when propagated from a flow.

* CORDA-3194 `InterruptedExceptions` are given `TERMINAL` diagnosis
2019-11-01 11:48:07 +00:00
Christian Sailer
938828b52f CORDA-3195 Default behaviour of FlowHospital (WIP) (#2520)
* Add GP to flow hospital, and start working on a list of things the GP knows to be incurable.

* Only hospitalise SQL and Persistence Exceptions (let's see if that is enough?), also rename to DatabaseDentist.

* Disabled hospitalisation of SQL exceptions in flow retry tests

* Fix RPC exception handling test by not using PersistenceException

* Ignore flaky integration test

* Code review: Rename staff member and add testing annotation

* Revert compiler.xml
2019-11-01 11:48:07 +00:00
snedamle
93ff072812 adding one point to whitelist contract constraints migration - signed… (#5568)
* adding one point to whitelist contract constraints migration - signed CorDapp JAR must be registered with the CZ network operator

* 1. Removing later releases section
2. Changing 4.0 to 4.3

* Changing 4.3 to |corda_version|
2019-11-01 10:49:04 +00:00
Andrius Dagys
d033fceeef CORDA-3365: Reintroduce dependency to fix BFT-Smart notary (#5640)
The commons-codec:1.10 library was removed due to a security vulnerability,
but in commons-codec:1.13 it appears to have been fixed.
2019-11-01 09:05:12 +00:00
James Higgs
e0701231ac [DOCS] Add rows for new database tables in node-database.rst (#5654) 2019-10-31 17:57:52 +00:00
stefano
7a9ee89ded modify watcher to watch jobs rather than pods 2019-10-31 16:51:33 +00:00
stefano
bd9d8dbdbd promote failure to delete to error log level 2019-10-31 15:58:05 +00:00
stefano
547e6d9edd use jobs to preallocate nodes instead of Pods as they support auto delete 2019-10-31 15:57:03 +00:00
Chris Rankin
2895283500 CORDA-3388: Restore mapping of 'java.lang.Void -> void' (#5650) 2019-10-30 17:47:13 +00:00
Tudor Malene
5cdf7f2b2f CORDA-3370 Remove Network visualiser reference 2019-10-30 16:45:33 +00:00
Tudor Malene
e32f1ca4f2 CORDA-3369 fix samples readme 2019-10-30 16:30:18 +00:00
Stefan Iliev
03ab258fc2 Revert "CORDA-3307 - add support for environment variables in linux (#5523)" (#5643)
This reverts commit c882b221a5.
2019-10-29 17:55:58 +00:00
Razvan Codreanu
d5462a2afe Re enabling persistent volume claims (#5628)
* TM-68 reenabling persistent volume claims using azure files

* TM-68 jenkins stacktrace

* TM-68 removing duplicate volume

* TM-68 pushing storage class yaml file

* TM-68 writing all results to the new persistent volume

* TM-68 fix wrong directory

* TM-68 fix wrong directory

* reapply lost merge commit

* investigate missing POD from test results

* more investigations around pods not executing their tests

* make Pod command line more strict with regards to sub command failure

* make logs an artifact within jenkins

* tidy up command line
2019-10-29 16:23:22 +00:00
Ed Prosser
b4d16399a8 streamlining getting started docs
Signed-off-by: Ed Prosser <edward.prosser@r3.com>
2019-10-28 11:49:19 +00:00
Stefano Franz
f9890a5359 PreAllocate pod resources during image build phase (#5587)
* use zulu for jdk
add some parallel groups

* port kubesTest to Java
remove asterisk from tests listed by ListTests, instead add after allocation

* attempt to setup unit test builds with correct github integrations

# Conflicts:
#	.ci/dev/unit/Jenkinsfile

* fix issue with github context

* add credentials block

* start pre-allocating pods for builds

* test

* add blocks for reporting build stages

* add logic to preallocate pods during image building

* tidy up Jenkinsfile for unit tests

* add magic command line flag to enable preallocation of pods

* make docker tag deterministic

* fix issue concatenating docker tag inputs

* add build type specific Jenkinsfile

* try new preallocation approach

* make pre-allocation prefix group specific

* force deAllocator to wait for pods to be actually deleted

* revert jenkinsfiles in .ci

* use smarter waiting logic to address review comments

* add --stacktrace to builds to help debugging

* fix issue with closed stream

* add some logging around preallocation

* tidy up by refactoring (de)allocate task generation into method

* change default from 20 pods to 5 pods

* fix issue where docker tag was unstable between building and running tests

* more documentation

* add some infrastructure around setting the log level for a given build

* change preallocation pod duration to 5min

* see if fast enough if using combined unit and integration tests

* disable unit tests

* print out test summaries

* try and make the kubes client a per-use object, rather than a long lived object. This is step one of making GKE use possible

* add log line about what command is executed in the pod
2019-10-28 11:48:04 +00:00
kyriathar
e2836b1106 CORDA-3279 Change single quotes to double quotes fixes node's shutdown (#5611)
A ConfigException$Parse would be thrown at CordaCaplet#parseConfigFile.

com.typesafe.config.ConfigFactory needs ':' to appear inside a double-quoted string, not a single-quoted one.
2019-10-23 10:01:24 +01:00
Christian Sailer
df8cc7282f CORDA-3360 Add nodeInfo permissions to web user 2019-10-23 09:58:59 +01:00
Ryan Fowler
8e541cb732 CORDA-3358: Add capsule-friendly argument to docs 2019-10-22 16:14:35 +01:00
szymonsztuka
b524c6368b CORDA-3335 Corda Shell flow kill - better warning for misformatted flow ID (#5601)
* CORDA-3081 warn that the flow ID passed to flow kill is malformed, as JDK8 doesn't fully validate it (JDK8 bug https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8159339)

* CORDA-3335 Corda Shell flow kill - better warning for misformatted flow ID - exit earlier and don't RPC to node, refactoring for detekt
2019-10-22 11:32:17 +01:00
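
The malformed flow ID check mentioned in the CORDA-3335 entry above can be approximated with a round-trip comparison. This is a hedged sketch, not the shell's actual code: the class `FlowIdCheckSketch` and the method `isWellFormedUuid` are hypothetical names. It relies on `java.util.UUID.fromString` accepting some non-canonical inputs (such as shortened groups), in which case `toString()` no longer matches what was typed.

```java
import java.util.UUID;

// Illustrative sketch only: class and method names are hypothetical.
final class FlowIdCheckSketch {

    /** Returns true only if the text is a UUID written in canonical form. */
    static boolean isWellFormedUuid(String text) {
        if (text == null) {
            return false;
        }
        try {
            // A lenient parser may accept shortened groups such as "1-1-1-1-1";
            // the canonical form printed back then differs from the input.
            return UUID.fromString(text).toString().equalsIgnoreCase(text);
        } catch (IllegalArgumentException e) {
            return false; // not parseable as a UUID at all
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"123e4567-e89b-12d3-a456-426614174000", "1-1-1-1-1", "not-a-uuid"};
        for (String candidate : candidates) {
            System.out.println(candidate + " well formed: " + isWellFormedUuid(candidate));
        }
    }
}
```

Running the check before any RPC call matches the "exit earlier" intent in the commit message, since a malformed ID never reaches the node.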
Ryan Fowler
5da114caa3 CORDA-3281: Drop some errors to warnings and clean up logic around (#5605)
shell "gracefulShutdown" command.
2019-10-22 11:02:04 +01:00
Dimos Raptis
61cdfa5b26 [CORDA-3341] - Add missing liquibase script for hibernate test (#5623) 2019-10-22 09:52:37 +01:00
Stefano Franz
d693a9c1ce TM-65 re-add local port availability check (#5618)
* re-add local port availability check

* attempt to fix issue with port allocator

* ensure the serversocket is closed
2019-10-21 16:17:17 +01:00
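
A local port availability check of the kind named in the TM-65 entry above is commonly written by attempting to bind a `ServerSocket` and closing it straight away. The sketch below is an assumption about the general technique, not the project's port allocator: `PortCheckSketch` and its methods are hypothetical names, and binding to the loopback address is a choice made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Illustrative sketch only: class and method names are hypothetical.
final class PortCheckSketch {

    /** Returns true if a server socket can currently be bound to the loopback port. */
    static boolean isLocalPortAvailable(int port) {
        // try-with-resources guarantees the probe socket is closed again.
        try (ServerSocket probe = new ServerSocket(port, 1, InetAddress.getLoopbackAddress())) {
            return true;
        } catch (IOException e) {
            return false; // bind failed, so something else holds the port
        }
    }

    /** Self-check: a port held open by another socket must be reported as unavailable. */
    static boolean heldPortIsReportedBusy() {
        try (ServerSocket holder = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            return !isLocalPortAvailable(holder.getLocalPort());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("held port reported busy: " + heldPortIsReportedBusy());
    }
}
```

The check is inherently racy: another process can take the port between the probe closing and the real bind, so callers still need to handle a bind failure.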
Jonathan Locke
2ea7e6ccae CORDA-3346: Remove the JitPack repository from Corda (#5620)
2019-10-21 15:13:42 +01:00
Anthony Keenan
b9a3b3a871 CORDA-3336: Remove log4j plugins to stop errors in config generator (#5617) 2019-10-21 13:31:09 +01:00
Dimos Raptis
bb7c06fa45 [CORDA-3342] - Show proper error message and adjust indentation in shell (#5612) 2019-10-21 13:21:12 +01:00
Chris Rankin
971eb56a98 CORDA-3346: Remove the JitPack repository from Corda. 2019-10-21 13:15:03 +01:00
Stefan Iliev
c882b221a5 CORDA-3307 - add support for environment variables in linux (#5523)
* Added a new way for environment variables to be loaded, which allows for underscore based separation.

* Moved test to its own kotlin file.

* Added case insensitivity support.

* The corda. prefix is now case insensitive too.

* Removed unused variable.

* Added env variables support for driverDSL. Shadowing corda. properties raises an exception.

* Driver api stability fix.

* Changed type of cordapps param to reflect the real one, rather than what IntelliJ auto completed.

* Some detekt issue fixes. Spread operator removed, baselined api stability constructors and buggy line.

* Fixed misspelled variable.

* Reverted unintentional changes.

* Added suppress instead of changing baseline.

* Reworked logic to handle previously defined CORDA_ starting properties and handle accordingly. Fixed a bug where wrong class was used for reflection walking.

* Fix for detekt issues.

* Changed message to a more understandable one.

* Changelog + doc note, console error grammar.

* Changes according to PR review.
2019-10-21 12:01:14 +01:00
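
The underscore-separated, case-insensitive environment variable handling described in the CORDA-3307 entry above might be sketched as follows. Everything here is an assumption for illustration: the class `EnvOverrideSketch`, the method `toConfigOverrides`, the lower-casing of keys (the real change presumably matches against actual configuration keys), and the exception type raised on shadowing.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: names, key normalisation and the exception type are assumptions.
final class EnvOverrideSketch {

    /** Maps e.g. CORDA_SSHD_PORT or corda.sshd.port to the dotted key "sshd.port". */
    static Map<String, String> toConfigOverrides(Map<String, String> env) {
        Map<String, String> overrides = new TreeMap<>();
        for (Map.Entry<String, String> entry : env.entrySet()) {
            String lower = entry.getKey().toLowerCase(Locale.ROOT);
            if (!lower.startsWith("corda.") && !lower.startsWith("corda_")) {
                continue; // unrelated environment variable
            }
            // Both prefixes are six characters long; underscores become path separators.
            String key = lower.substring("corda.".length()).replace('_', '.');
            if (overrides.containsKey(key)) {
                throw new IllegalStateException("Property '" + key + "' is defined more than once");
            }
            overrides.put(key, entry.getValue());
        }
        return overrides;
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("CORDA_SSHD_PORT", "2222");
        env.put("PATH", "/usr/bin");
        System.out.println(toConfigOverrides(env));
    }
}
```

Treating a clash between the two spellings as an error, rather than letting one silently win, follows the commit's note that shadowing raises an exception.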
Jonathan Locke
292f83ba1d CORDA-3304: Add test for max attempts on reconnecting rpc (#5608)
2019-10-21 09:57:44 +01:00
Jonathan Locke
fc9343628b CORDA-3332: Add rpc reconnection to node explorer (#5613)
2019-10-18 19:24:31 +01:00
Razvan Codreanu
c5c80033d3 TM-67 converting ImageBuilding from groovy to java (#5609)
* TM-67 converting ImageBuilding from groovy to java

* TM-67 wrong way to set properties

* TM-67 do first does not seem to work

* TM-67 setting credentials first

* TM-67 small logic fix

* TM-67 adding stacktrace

* TM-67 addressing PR comments

* TM-67 fixed normal builds that don't require docker

* TM-67 adding guard rails to the code

* TM-67 removing stacktrace

* TM-67 addressing pr comments
2019-10-18 16:54:09 +01:00
Ryan Fowler
b0cc5f5ca3 CORDA-3332: Add rpc reconnection to node explorer. 2019-10-18 16:25:49 +01:00
Anthony Keenan
49e904b3a2 Fix corda docker image names in docs (#5606) 2019-10-18 10:46:59 +01:00
Jonathan Locke
7d90e305ea ENT-4119: Make welcome message more accurate (#5596)
2019-10-18 08:52:16 +01:00
Dimos Raptis
a3adb4816a [CORDA-3304] - Add test for max attempts on reconnecting rpc 2019-10-17 16:27:50 +01:00
Roger Willis
8978512784 Merge pull request #5599 from corda/willh-db-docs
CORDA-3313 Update docs in line with DB changes.
2019-10-17 15:35:36 +01:00
Stefano Franz
22490ecb51 disable pvc creation during k8s build (#5604)
* disable pvc creation

* make testruns available without pvc
2019-10-17 14:40:55 +01:00
Stefano Franz
5bfdf4ce20 CORDA-3257 Docker image: do not post json to filter generated zip for testnet generation (#5598) 2019-10-17 14:31:31 +01:00
Jonathan Locke
18fbd93268 Merge pull request #5585 from corda/CORDA-3304-rpc-max-retries
[CORDA-3304] Introduce max number of retries per invocation for recon…
2019-10-17 11:54:41 +01:00
Jonathan Locke
e9b85a35c6 CORDA-3317 correct docs typo
2019-10-17 08:35:57 +01:00
Will Hester
f85448072a CORDA-3313 formatting 2019-10-16 16:23:32 +01:00
davidrapacchiale
bcb1eb2fe1 Corda-3317 correct docs typo
Removed ??? from "Note: this information is not currently supposed to be
used in production."
2019-10-16 16:15:42 +01:00
Will Hester
312c72d3fb CORDA-3313 Update docs in line with DB changes. 2019-10-16 16:04:57 +01:00
Dimos Raptis
608fdb82f7 [ENT-4119] Make welcome message generic 2019-10-16 15:08:34 +01:00
Jonathan Locke
1dec07f4d1 CORDA-3152: Register custom serializers for jackson as well as amqp
2019-10-16 13:08:01 +01:00
Dimos Raptis
f37638c93d [CORDA-3122] - Cleanup non-finalised, errored flows (#5594)
* [CORDA-3122] - Cleanup non-finalised, errored flows

* detekt
2019-10-16 09:37:28 +01:00
Ryan Fowler
bfa460bc07 CORDA-3152: Register custom serializers for jackson as well as amqp 2019-10-15 15:52:31 +01:00
Razvan Codreanu
ee09cd8762 TM-45 Make detektBaseline pass (#5561)
* TM-45 make the baseline generating task show a successful build regardless of the existing detekt violations

* TM-45 address PR feedback
2019-10-15 15:49:31 +01:00
Razvan Codreanu
45172515ac TM-41 Ability to resume test runs (#5573)
* TM-41 writing test completions to file to keep track of what has finished, for use when a pod terminates abruptly

* TM-41 addressing PR comments

* TM-41 addressing PR comments

* TM-41 adding exclusion list to guard against tests being passed as a group

* TM-41 trying to find the jenkins breakpoint

* TM-41 debugging jenkins

* TM-41 revert debugging change

* TM-41 revert debugging changes

* TM-41 revert debugging changes

* TM-41 fixing merge conflicts

* TM-41 now that TM-40 is merged static needs to be updated

* TM-41 refactor constant

* TM-41 fixing jenkins failure

* TM-41 trying new path

* TM-41 moving the file reading to the task that will be executed by the workers as the master does not have a persistent volume

* TM-41 moving the after test as well
2019-10-15 15:14:41 +01:00
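
The resume mechanism described in the TM-41 entry above (record each finished test in a file, then skip recorded tests after an abrupt restart) can be sketched as below. This is an assumption about the general approach, not the build plugin's code: `ResumableTestRunSketch`, its methods, and the one-test-name-per-line file format are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: names and the file format are hypothetical.
final class ResumableTestRunSketch {

    /** Appends one completed test name to the log, creating the file if needed. */
    static void recordCompleted(Path log, String testName) {
        try {
            Files.write(log, Collections.singletonList(testName), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the tests not yet recorded as completed, keeping their original order. */
    static List<String> remaining(Path log, List<String> allTests) {
        Set<String> done = new HashSet<>();
        try {
            if (Files.exists(log)) {
                done.addAll(Files.readAllLines(log, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> todo = new ArrayList<>();
        for (String test : allTests) {
            if (!done.contains(test)) {
                todo.add(test);
            }
        }
        return todo;
    }

    public static void main(String[] args) {
        Path log = Paths.get(System.getProperty("java.io.tmpdir"), "completed-tests-demo.log");
        List<String> all = Arrays.asList("a.FooTest", "a.BarTest");
        recordCompleted(log, "a.FooTest");
        System.out.println("still to run: " + remaining(log, all));
        log.toFile().delete();
    }
}
```

For this to survive a pod restart the log has to live on storage that outlasts the pod, which fits the commit's note about moving the file handling to the workers because the master has no persistent volume.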
Dimos Raptis
42e364386d Merge branch 'release/os/4.3' into CORDA-3304-rpc-max-retries 2019-10-15 14:04:40 +01:00
Zoltan Kiss
a1dd6abe17 TM-40 Ephemeral workspace for k8s workers that survives restarts (#5567)
* Simplify

* Mount shared dir to worker

* format

* podnames with separators

* refactor parameters

* Use PVC for storage

* pvc in namespace

* KubesTest simplify

* no tolowercase

* no private

* lowercase

* RetryStrategy

* minor changes

* wait forever

* undo .idea

* elvis

* add comment

* regcred

* use correct ConfigBuilder

* delete java, will migrate later

* Revert "delete java, will migrate later"

This reverts commit e3bab1f3

* Merging changes in groovy to new java file

* format

* rename variable

* fix log

* private

* remove bak

* move java files

* Revert "move java files"

This reverts commit 89aa4c35
2019-10-15 10:52:44 +01:00