
Corda Bridge Component Overview
===============================

.. contents::

Introduction
------------
The Corda ``bridge``/``float`` component is designed for enterprise deployments and acts as an application level
firewall and protocol break on all internet facing endpoints. The ``corda-bridgeserver.jar`` encapsulates the peer
network functionality of the basic Corda ``node``, so that this can be operated separately from the security sensitive
JVM runtime of the ``node``. This gives separation of functionality and ensures that the legal identity keys are not
used in the same process as the internet TLS connections. Also, it adds support for enterprise deployment requirements,
such as High Availability (HA) and SOCKS proxy support.

This document is intended to provide an overview of the architecture and options available.

Terminology
-----------
The component referred to here as the ``bridge`` is the library of code responsible for managing outgoing links to peer
nodes. It implements the AMQP 1.0 protocol over TLS between peers to provide reliable flow message delivery. This
component can be run as a simple integrated feature of the ``node``. However, for enhanced security and features on R3
Enterprise Corda the in-node version should be turned off and a standalone, HA-capable version run from the
``corda-bridgeserver.jar``, possibly integrating with a SOCKS proxy too.

The ``float`` component refers to the inbound socket listener, packet filtering and DMZ compatible component. In the
simple all-in-one ``node`` all inbound peer connections terminate directly onto an embedded Artemis broker component
hosted within the ``node``. The connection authentication and packet filtering are managed via Artemis
permission controls inside the ``node`` JVM. For R3 Enterprise Corda deployments we provide a more
secure and configurable isolation component that is available using code inside ``corda-bridgeserver.jar``. This
component is designed to provide a clear protocol break and thus prevents the ``node`` and Artemis server ever being
directly exposed to peers. For simpler deployments with no DMZ the ``float`` and ``bridge`` logic can also be run as a
single application behind the firewall, but still protecting the ``node`` and hosted Artemis. In future we may also host
the Artemis server out of process and shared across nodes, but this will be transparent to peers as the interchange
protocol will continue to be AMQP 1.0 over TLS.

.. Note:: All deployment modes of the ``bridge``, ``float``, or all-in-one ``node`` are transparently interoperable, if correctly configured.

Message Path Between Peer Nodes
-------------------------------
When a flow within a ``node`` needs to send a message to a peer there is a carefully orchestrated sequence of steps to ensure
correct secure routing based upon the network map information and to ensure safe, restartable delivery to the remote flow.
Adding the ``bridge`` and ``float`` to this process adds some extra steps and security checking of the messages.
The complete sequence is therefore:

1.   The flow calls ``send``, or ``sendAndReceive`` to propagate a message to a peer. This leads to checkpointing
     of the flow fiber within the ``StateMachine`` and posting the message to the internal ``MessagingService``. This ensures that
     the send activity will be retried if there are any errors before further durable transmission of the message.

2.   The ``MessagingService`` checks if this is a new destination ``node`` and if an out queue and bridge already exist in Artemis.
     If the durable out queue does not exist then it will need to be created in Artemis:

     a.   First the durable queue needs to be created in the peer-to-peer Artemis. Each queue is uniquely named based upon the hash of the
          legal identity ``PublicKey`` of the target ``node``.

     b.   Once the queue creation is complete a bridge creation request is also published onto the Artemis bus via the bridge control protocol.
          This message uses information from the network map to link the out queue to the target host and port and TLS credentials.
          The flow does not need to wait for any response at this point and can carry on to send messages to the Artemis out queue.

     c.   On receiving this message the ``bridge`` process opens a TLS connection to the remote peer (optionally, this
          connection can be made via a SOCKS4/5 proxy). On connect the two ends of the TLS link exchange certificate details
          and confirm that the certificate path is anchored at the network root certificate and that the X500 subject matches
          the expected target as specified in the create bridge message using details contained in the network map.
          The links are long lived so as to reduce the setup cost of the P2P messaging.
          In future, there may also be DoS protection measures applied.

     d.   If the outgoing TLS 1.2 link is created successfully then the ``bridge`` opens a consumer on the Artemis out queue.
          The pending messages will then be transferred to the remote destination using AMQP 1.0, with final removal from the
          out queue only occurring when the remote end fully acknowledges safe message receipt. This ensures at least once
          delivery semantics.

     e.   Note that at startup of either the ``node``, or the ``bridge`` the bridge control protocol resynchronises the bridging state,
          so that all out queues have an active bridge.

3.   Assuming an out queue exists the message can be posted to Artemis and the ``bridge`` should eventually deliver this
     message to the remote system.

4.   On receipt of a message acknowledgement from Artemis the ``StateMachine`` can continue the flow if it is not awaiting
     a response, i.e. after a ``send`` operation. Otherwise it remains suspended waiting for the reply.

5.   The receiving end of the ``bridge`` TLS/AMQP 1.0 link might be the Artemis broker of a remote ``node``,
     but for now we assume it is an enterprise deployment that is using a ``float`` process running behind a firewall.
     The receiver will already have confirmed the validity of the TLS originator when it accepted the TLS handshake.
     However, the ``float`` does some further basic checking of received messages and their associated headers.
     For instance, the message must be targeted at an inbox address and must be below the ``maxMessageSize`` defined in the network parameters.

6.   Having passed initial checks on the message the ``float`` bundles up the message and originator as a payload to be
     sent across the DMZ internal firewall. This inbound message path uses a separate AMQP 1.0/TLS control tunnel.
     (N.B. This link is initiated from the local master ``bridge`` in the trusted zone to the ``float`` in the DMZ. This allows a
     simple firewall rule to be configured which blocks any attempts to probe the internal network from the DMZ.)
     Once the message is forwarded the ``float`` keeps track of the delivery acknowledgements,
     so that the original sender will consume the message in the source queue only on final delivery to the peer inbox.
     Any disconnection or problem results in a reject status being sent, leading to redelivery from the source.

7.   The ``bridge`` process, having now received custody of the message, does further checks that the message is good. At
     present the checks are essentially that the message is well formed and that the source and destination are valid.
     However, future enhancements may include deep inspection of the message payload for CorDapp blacklisting, and other purposes.
     If there are any problems the message is acknowledged to prevent further redelivery, logged to audit and dropped.

8.   Assuming this is a normal message it is passed onto the Artemis inbox and the acknowledgement of delivery
     is cascaded back. Thus, the Artemis acknowledgement leads to acknowledgement of the tunnel AMQP packet,
     which leads to acknowledgement of the AMQP message back to the sending ``bridge``, which finally marks the Artemis out queue item as consumed.
     To prevent this leading to very slow one-after-the-other message delivery the AMQP channels use sliding window flow control.
     (Currently, a practical default is set internally and the window size is not user configurable.)

9.   The ``MessagingService`` on the peer ``node`` picks up the message from the inbox on Artemis and carries out any necessary
     deduplication. This deduplication is needed as the distributed restartable logic of the Corda wire protocol only
     offers 'at least once' delivery guarantees.
     The resulting unique messages are then passed to the ``StateMachine`` so that the remote flow can be woken up.

10.  The reply messages use the authenticated originator flag attached by the ``float`` to route the replies back to the
     correct originator.

     .. Note::   The message reply path is not via the inbound path, but instead is via a separately validated route
        from the local ``bridge`` to the original ``node``'s ``float`` and then on to the original ``node`` via Artemis.
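
The delivery guarantees described in steps 2d and 5 to 9 can be illustrated with a small, self-contained Python model.
This is only a sketch under stated assumptions, not the ``bridge`` implementation: the names ``Message``, ``Receiver``,
``float_accepts`` and ``drain``, the ``p2p.inbound.`` address prefix and the 10 MiB size limit are all hypothetical
choices made for illustration. The model also assumes that a message failing the edge checks is acknowledged and dropped,
in the way step 7 describes for the ``bridge`` checks.

.. code-block:: python

    from dataclasses import dataclass, field

    # Hypothetical values: the real limit comes from the network parameters.
    MAX_MESSAGE_SIZE = 10 * 1024 * 1024
    INBOX_PREFIX = "p2p.inbound."  # assumed inbox address prefix


    @dataclass
    class Message:
        msg_id: str     # unique id used for deduplication at the receiver
        address: str    # target queue address
        payload: bytes


    def float_accepts(msg: Message) -> bool:
        """Step 5: the message must target an inbox and respect the size limit."""
        return (msg.address.startswith(INBOX_PREFIX)
                and len(msg.payload) <= MAX_MESSAGE_SIZE)


    @dataclass
    class Receiver:
        """Steps 8 and 9: inbox delivery plus deduplication."""
        seen: set = field(default_factory=set)
        delivered: list = field(default_factory=list)

        def deliver(self, msg: Message) -> bool:
            if msg.msg_id not in self.seen:   # ignore redelivered duplicates
                self.seen.add(msg.msg_id)
                self.delivered.append(msg)
            return True                       # acknowledge in both cases


    def drain(out_queue: list, receiver: Receiver, lose_ack_once: set) -> int:
        """Send until the out queue is empty; return the transmission count.

        A message leaves the out queue only after its acknowledgement has
        arrived, which gives at-least-once delivery (step 2d).
        """
        transmissions = 0
        while out_queue:
            msg = out_queue[0]                # peek, do not remove yet
            transmissions += 1
            if not float_accepts(msg):
                out_queue.pop(0)              # toy model: acknowledge and drop
                continue
            acked = receiver.deliver(msg)
            if msg.msg_id in lose_ack_once:
                lose_ack_once.discard(msg.msg_id)
                acked = False                 # simulate a lost acknowledgement
            if acked:
                out_queue.pop(0)              # consumed only once acknowledged
        return transmissions

With one acknowledgement lost, the affected message is transmitted twice but is handed to the receiving flow only once,
which is why the receiver-side deduplication in step 9 is needed.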

Operating modes of the Bridge and Float
---------------------------------------

Embedded Developer Node (node + artemis + internal bridge, no float, no DMZ)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The simplest development deployment of the bridge is to just use the embedded Peer-to-Peer Artemis with the node as TLS endpoint
and to have the outgoing packets use the internal bridge functionality. Typically this should only be used for easy development,
or for organisations evaluating on Open Source Corda, where this is the only available option:

.. image:: resources/bridge/node_embedded_bridge.png
     :scale: 100%
     :align: center

Node + Bridge (no float, no DMZ)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The next simplest deployment is to turn off the built in bridge using the ``externalBridge`` enterprise config property
and to run a single combined ``bridge``/``float`` process. This might be suitable for a test environment, to conserve VMs.

.. note::  Running the ``bridge`` and the ``node`` on the same machine can cause a port conflict in a naive setup.
            To avoid this, use the ``messagingServerAddress`` property to specify the bind address and port and set
            ``messagingServerExternal = false``, so that the embedded Artemis P2P broker listens on a different port
            from the advertised ``p2pAddress`` port.
            Then configure an all-in-one ``bridge`` to point at this ``node``:

.. image:: resources/bridge/simple_bridge.png
     :scale: 100%
     :align: center

DMZ ready (node + bridge + float)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For a more complete deployment including a DMZ and separate inbound and outbound paths,
the ``bridgeMode`` property in ``bridge.conf`` should be set to ``BridgeInner`` for the ``bridge`` and
``FloatOuter`` for the DMZ ``float``. The diagram below shows such a non-HA deployment. This would not be recommended
for production, unless used as part of a cold DR type standby.

.. note::  Note that whilst the ``bridge`` needs access to the official TLS private
        key, the tunnel link should use a private set of link specific keys and certificates. The ``float`` will be provisioned
        dynamically with the official TLS key when activated via the tunnel and this key will never be stored in the DMZ:

.. image:: resources/bridge/bridge_and_float.png
     :scale: 100%
     :align: center

DMZ ready with outbound SOCKS
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some organisations require dynamic outgoing connections to operate via a SOCKS proxy. The code supports this option
by adding extra information to the ``outboundConfig`` section of the ``bridge`` process. A simplified example deployment is shown here
to highlight the option:

.. image:: resources/bridge/bridge_with_socks.png
     :scale: 100%
     :align: center

Full production HA DMZ ready mode (hot/cold node, hot/warm bridge)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Finally, we show a full HA solution as recommended for production. This requires adding an external ``Zookeeper``
cluster to provide ``bridge`` master selection, plus extra instances of the ``bridge`` and ``float``. This allows
hot-warm operation of all the ``bridge`` and ``float`` instances. The R3 Enterprise Corda ``node`` should be run as hot-cold HA too.
Highlighted in the diagram is the addition of the ``haConfig`` section to point at ``Zookeeper`` and also the use of secondary
addresses in the ``alternateArtemisAddresses`` to allow ``node`` failover and in the ``floatAddresses`` to point at a
pool of DMZ ``float`` processes:

.. image:: resources/bridge/ha_bridge_float.png
     :scale: 100%
     :align: center
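
The role of the secondary addresses can be pictured with a short Python sketch. It is a toy model under stated
assumptions, not the ``bridge`` connection logic: the function ``first_reachable``, the host names and ports, and the
idea that addresses are tried in their listed order are all hypothetical.

.. code-block:: python

    from typing import Callable, Optional, Sequence


    def first_reachable(addresses: Sequence[str],
                        can_connect: Callable[[str], bool]) -> Optional[str]:
        """Return the first address that accepts a connection, or None.

        Addresses are tried in listed order (an assumption of this sketch),
        so a later entry is used only when the earlier ones are unavailable.
        """
        for address in addresses:
            if can_connect(address):
                return address
        return None


    # Hypothetical pool of DMZ float processes; only the second one is up.
    float_pool = ["float1.example:12005", "float2.example:12005"]
    currently_up = {"float2.example:12005"}
    chosen = first_reachable(float_pool, lambda a: a in currently_up)

Here ``chosen`` is the second address, modelling failover to a standby ``float`` when the first one is unreachable.
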