diff --git a/docs/source/corda-nodes-index.rst b/docs/source/corda-nodes-index.rst
index c1dfa0b508..7d904694c2 100644
--- a/docs/source/corda-nodes-index.rst
+++ b/docs/source/corda-nodes-index.rst
@@ -12,4 +12,5 @@ Corda nodes
    shell
    node-database
    node-administration
-   out-of-process-verification
\ No newline at end of file
+   out-of-process-verification
+   high-availability
diff --git a/docs/source/high-availablility.rst b/docs/source/high-availablility.rst
new file mode 100644
index 0000000000..beb3c7b67c
--- /dev/null
+++ b/docs/source/high-availablility.rst
@@ -0,0 +1,48 @@
+High Availability
+=================
+
+This section describes how to make a Corda node highly available.
+
+Hot Cold
+~~~~~~~~
+
+In the hot-cold configuration, failover is handled manually by promoting the cold node after the former hot node
+has failed or has been taken offline for maintenance.
+
+RPC clients have to handle ``RPCException`` and implement application-specific recovery and retry.
+
+Prerequisites
+-------------
+
+* A load-balancer for P2P, RPC and web traffic
+* A shared file system for the artemis and certificates directories
+* A shared database, e.g. Azure SQL
+
+The hot-cold deployment consists of two Corda nodes: a hot node that is currently handling requests and running flows,
+and a cold backup node that can take over if the hot node fails or is taken offline for an upgrade. Both nodes should
+be able to connect to a shared database and a replicated file-system hosting the artemis and certificates directories.
+The hot-cold ensemble should be fronted by a load-balancer for P2P, web and RPC traffic. The load-balancer should
+perform health monitoring and route the traffic to the node that is currently active. To prevent data corruption in case of
+accidental simultaneous start of both nodes, the current hot node takes a leader lease in the form of a mutual exclusion
+lock implemented by a row in the shared database.
+
+Configuration
+-------------
+
+The configuration snippet below shows the relevant settings.
+
+.. sourcecode:: none
+
+    enterpriseConfiguration = {
+        mutualExclusionConfiguration = {
+            on = true
+            machineName = ${HOSTNAME}
+            updateInterval = 20000
+            waitInterval = 40000
+        }
+    }
+
+Hot Warm
+~~~~~~~~
+
+In the future we are going to support automatic failover.
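
Note on the leader lease: the new page says the hot node holds a mutual exclusion lock implemented by a row in the shared database, but does not spell out how ``updateInterval`` and ``waitInterval`` interact. The sketch below is one plausible reading, not the node's actual implementation. It assumes that the holder refreshes the row every ``updateInterval`` milliseconds and that another machine may take over only once the row has gone unrefreshed for more than ``waitInterval`` milliseconds. ``LeaseSketch``, ``LeaseRow`` and ``tryAcquire`` are hypothetical names, and an in-memory object stands in for the database row.

```java
public class LeaseSketch {
    /** Hypothetical stand-in for the single lease row in the shared database. */
    public static final class LeaseRow {
        public String holder;      // machineName of the current holder, null if never taken
        public long lastUpdateMs;  // timestamp of the holder's most recent refresh
    }

    // Mirrors waitInterval = 40000 from the configuration snippet (assumed meaning:
    // how long the row must be stale before another machine may take the lease).
    public static final long WAIT_INTERVAL_MS = 40_000;

    /**
     * Take or refresh the lease. A real node would do this inside one database
     * transaction; 'synchronized' stands in for that atomicity here.
     */
    public static synchronized boolean tryAcquire(LeaseRow row, String machineName, long nowMs) {
        boolean free = row.holder == null;
        boolean mine = machineName.equals(row.holder);
        boolean stale = nowMs - row.lastUpdateMs > WAIT_INTERVAL_MS;
        if (free || mine || stale) {
            row.holder = machineName;
            row.lastUpdateMs = nowMs;
            return true;
        }
        return false; // another machine holds a fresh lease
    }

    public static void main(String[] args) {
        LeaseRow row = new LeaseRow();
        System.out.println(tryAcquire(row, "node-a", 0));       // hot node starts first
        System.out.println(tryAcquire(row, "node-b", 20_000));  // accidental second start
        System.out.println(tryAcquire(row, "node-a", 20_000));  // holder refresh (updateInterval)
        System.out.println(tryAcquire(row, "node-b", 70_000));  // holder silent for 50 s
    }
}
```

Under these assumptions, an accidentally started second node is refused while the first node keeps refreshing, and it can be promoted only after the first node has been silent for longer than ``waitInterval``. This is also a reason to keep ``waitInterval`` larger than ``updateInterval``, as the snippet in the diff does.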