Corda Bridge Component Overview
Introduction
The Corda bridge/float component is designed for enterprise deployments and acts as an application level firewall and protocol break on all internet facing endpoints. The corda-bridgeserver.jar encapsulates the peer network functionality of the basic Corda node, so that this can be operated separately from the security sensitive JVM runtime of the node. This gives separation of functionality and ensures that the legal identity keys are not used in the same process as the internet TLS connections. Also, it adds support for enterprise deployment requirements, such as High Availability (HA) and SOCKS proxy support.
This document is intended to provide an overview of the architecture and options available.
Terminology
The component referred to here as the bridge is the library of code responsible for managing outgoing links to peer nodes. It implements the AMQP 1.0 protocol over TLS 1.2 between peers to provide reliable flow message delivery. This component can be run as a simple integrated feature of the node. However, for enhanced security and features on R3 Enterprise Corda the in-node version should be turned off and a standalone, HA-capable version run from the corda-bridgeserver.jar, possibly integrating with a SOCKS proxy too.
The float component refers to the inbound socket listener, packet filtering and DMZ compatible component. In the simple all-in-one node all inbound peer connections terminate directly onto an embedded Artemis broker component hosted within the node. The connection authentication and packet filtering are managed via Artemis permission controls inside the node JVM. For R3 Enterprise Corda deployments we provide a more secure and configurable isolation component that is available using code inside corda-bridgeserver.jar. This component is designed to provide a clear protocol break and thus prevents the node and Artemis server ever being directly exposed to peers. For simpler deployments with no DMZ the float and bridge logic can also be run as a single application behind the firewall, but still protecting the node and hosted Artemis. In future we may also host the Artemis server out of process and shared across nodes, but this will be transparent to peers as the interchange protocol will continue to be AMQP 1.0 over TLS.
Note
All deployment modes of the bridge, float, or all-in-one node are transparently interoperable, if correctly configured.
Message Path Between Peer Nodes
When a flow within a node needs to send a message to a peer there is a carefully orchestrated sequence of steps to ensure correct secure routing based upon the network map information and to ensure safe, restartable delivery to the remote flow. Adding the bridge and float to this process adds some extra steps and security checking of the messages. The complete sequence is therefore:
1. The flow calls send, or sendAndReceive to propagate a message to a peer. This leads to checkpointing of the flow fiber within the StateMachine and posting the message to the internal MessagingService. This ensures that the send activity will be retried if there are any errors before further durable transmission of the message.

2. The MessagingService checks if this is a new destination node and whether an out queue and bridge already exist in Artemis. If the durable out queue does not exist then this will need to be created in Artemis:

   - First the durable queue needs to be created in the peer-to-peer Artemis. Each queue is uniquely named based upon the hash of the legal identity PublicKey of the target node.
   - Once the queue creation is complete a bridge creation request is also published onto the Artemis bus via the bridge control protocol. This message uses information from the network map to link the out queue to the target host and port and TLS credentials. The flow does not need to wait for any response at this point and can carry on to send messages to the Artemis out queue.
   - When the bridge process receives this message it opens a TLS connection to the remote peer (optionally, this connection can be made via a SOCKS4/5 proxy). On connect the two ends of the TLS link exchange certificate details and confirm that the certificate path is anchored at the network root certificate and that the X500 subject matches the expected target as specified in the create bridge message using details contained in the network map. The links are long lived so as to reduce the setup cost of the P2P messaging. In future, there may also be DoS protection measures applied.
   - If the outgoing TLS 1.2 link is created successfully then the bridge opens a consumer on the Artemis out queue. The pending messages will then be transferred to the remote destination using AMQP 1.0, with final removal from the out queue only occurring when the remote end fully acknowledges safe message receipt. This ensures at least once delivery semantics.
   - Note that at startup of either the node, or the bridge the bridge control protocol resynchronises the bridging state, so that all out queues have an active bridge.

3. Assuming an out queue exists the message can be posted to Artemis and the bridge should eventually deliver this message to the remote system.

4. On receipt of a message acknowledgement from Artemis the StateMachine can continue the flow if it is not awaiting a response, i.e. a send operation. Otherwise it remains suspended waiting for the reply.

5. The receiving end of the bridge TLS/AMQP 1.0 link might be the Artemis broker of a remote node, but for now we assume it is an enterprise deployment that is using a float process running behind a firewall. The receiver will already have confirmed the validity of the TLS originator when it accepted the TLS handshake. However, the float does some further basic checking of received messages and their associated headers. For instance the message must be targeted at an inbox address and must be below the maxMessageSize defined in the network parameters.

6. Having passed initial checks on the message the float bundles up the message and originator as a payload to be sent across the DMZ internal firewall. This inbound message path uses a separate AMQP 1.0/TLS control tunnel. (N.B. This link is initiated from the local master bridge in the trusted zone to the float in the DMZ. This allows a simple firewall rule to be configured which blocks any attempts to probe the internal network from the DMZ.) Once the message is forwarded the float keeps track of the delivery acknowledgements, so that the original sender will consume the message in the source queue only on final delivery to the peer inbox. Any disconnections, or problems will send a reject status leading to redelivery from source.

7. The bridge process having now received custody of the message does further checks that the message is good. At the minute the checks are essentially of well formedness of the message and that the source and destination are valid. However, future enhancements may include deep inspection of the message payload for CorDapp blacklisting, and other purposes. If there are any problems the message is acknowledged to prevent further redelivery, logged to audit and dropped.

8. Assuming this is a normal message it is passed onto the Artemis inbox and on acknowledgement of delivery this is cascaded back. Thus, Artemis acknowledgement leads to acknowledgement of the tunnel AMQP packet, which acknowledges the AMQP back to the sending bridge and that finally marks the Artemis out queue item as consumed. To prevent this leading to very slow one after the other message delivery the AMQP channels use sliding window flow control. (Currently, a practical default is set internally and the window size is not user configurable.)

9. The MessagingService on the peer node will pick up the message from the inbox on Artemis and carry out any necessary deduplication. This deduplication is needed as the distributed restartable logic of the Corda wire protocol only offers 'at least once' delivery guarantees. The resulting unique messages are then passed to the StateMachine so that the remote flow can be woken up.

10. The reply messages use the authenticated originator flag attached by the float to route the replies back to the correct originator.

Note
Note that the message reply path is not via the inbound path, but instead is via a separately validated route from the local bridge to the original node's float and then on to the original node via Artemis.
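The interaction between 'at least once' delivery and deduplication described in the sequence above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: the Receiver class, the drain function and the message identifiers are hypothetical names invented for illustration and are not part of Corda, Artemis or the bridge. The out queue is modelled as an in-memory deque, and one acknowledgement is deliberately lost to force a redelivery.

```python
from collections import deque


class Receiver:
    """Hypothetical stand-in for the peer inbox plus its deduplication step."""

    def __init__(self, drop_first_ack=False):
        self.seen_ids = set()
        self.delivered = []                  # unique payloads handed to the flow
        self._drop_next_ack = drop_first_ack

    def receive(self, msg_id, payload):
        """Store the message if new; return True if the ack reaches the sender."""
        if msg_id not in self.seen_ids:      # deduplicate redelivered messages
            self.seen_ids.add(msg_id)
            self.delivered.append(payload)
        if self._drop_next_ack:              # simulate a lost acknowledgement
            self._drop_next_ack = False
            return False
        return True


def drain(out_queue, receiver, max_attempts=5):
    """Send each queued message, removing it only after it is acknowledged."""
    sends = 0
    while out_queue:
        msg_id, payload = out_queue[0]       # peek; the item stays queued
        for _ in range(max_attempts):
            sends += 1
            if receiver.receive(msg_id, payload):
                out_queue.popleft()          # acknowledged: safe to consume
                break
        else:
            raise RuntimeError(f"no acknowledgement for {msg_id}")
    return sends


out_queue = deque([("m1", "hello"), ("m2", "world")])
receiver = Receiver(drop_first_ack=True)
total_sends = drain(out_queue, receiver)
print(total_sends, receiver.delivered)
```

In this run the first message is transmitted twice because its acknowledgement is lost, yet the receiver hands each payload to the flow only once. That is the reason the receiving side needs a deduplication step when the sender only removes items from its queue on acknowledgement.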
Operating modes of the Bridge and Float
Embedded Developer Node (node + artemis + internal bridge, no float, no DMZ)
The simplest development deployment of the bridge is to just use the embedded Peer-to-Peer Artemis with the node as TLS endpoint and to have the outgoing packets use the internal bridge functionality. Typically this should only be used for easy development, or for organisations evaluating on Open Source Corda, where this is the only available option:
Node + Bridge (no float, no DMZ)
The next simplest deployment is to turn off the built in bridge using the externalBridge enterprise config property and to run a single combined bridge/float process. This might be suitable for a test environment, to conserve VMs.
Note
Note that to run the bridge and the node on the same machine there could be a port conflict with a naive setup, but by using the messagingServerAddress property to specify the bind address and port plus setting messagingServerExternal = false the embedded Artemis P2P broker can be set to listen on a different port rather than the advertised p2paddress port. Then configure an all-in-one bridge to point at this node:
DMZ ready (node + bridge + float)
To become familiar with a more complete deployment including a DMZ and separated inbound and outbound paths the bridgeMode property in the bridge.conf should be set to BridgeInner for the bridge and FloatOuter for the DMZ float. The diagram below shows such a non-HA deployment. This would not be recommended for production, unless used as part of a cold DR type standby.
Note
Note that whilst the bridge needs access to the official TLS private key, the tunnel link should use a private set of link specific keys and certificates. The float will be provisioned dynamically with the official TLS key when activated via the tunnel and this key will never be stored in the DMZ:
DMZ ready with outbound SOCKS
Some organisations require dynamic outgoing connections to operate via a SOCKS proxy. The code supports this option by adding extra information to the outboundConfig section of the bridge process. A simplified example deployment is shown here to highlight the option:
Full production HA DMZ ready mode (hot/cold node, hot/warm bridge)
Finally, we show a full HA solution as recommended for production. This does require adding an external Zookeeper cluster to provide bridge master selection and extra instances of the bridge and float. This allows hot-warm operation of all the bridge and float instances. The R3 enterprise Corda node should be run as hot-cold HA too. Highlighted in the diagram is the addition of the haConfig section to point at zookeeper and also the use of secondary addresses in the alternateArtemisAddresses to allow node failover and in the floatAddresses to point at a pool of DMZ float processes.
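As a rough illustration of how a list of secondary addresses can support failover, the sketch below tries each candidate address in order and settles on the first one that accepts a connection. It is not taken from the bridge code and makes no claim about how the bridge actually selects an address. The function connect_first_available, the fake_connect helper, and the host names and port are all hypothetical, and the connection attempt is injected as a callable so the example runs without a network.

```python
def connect_first_available(addresses, try_connect):
    """Return the first address, in list order, that try_connect accepts."""
    for address in addresses:
        try:
            try_connect(address)
        except OSError:
            continue                  # unreachable: move on to the next candidate
        return address
    raise ConnectionError("no candidate address is reachable")


# Hypothetical pool of DMZ float addresses; only the second one is "up".
float_pool = ["float1.example.com:12005", "float2.example.com:12005"]
reachable = {"float2.example.com:12005"}


def fake_connect(address):
    """Stand-in for a real socket connect; raises if the address is down."""
    if address not in reachable:
        raise ConnectionRefusedError(address)


chosen = connect_first_available(float_pool, fake_connect)
print(chosen)
```

The same ordered-list idea applies whether the candidates are alternate Artemis broker addresses or a pool of float processes: the caller only needs one reachable entry, and the order of the list expresses the preference.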