Merge pull request #106 from corda/dl-float-design-doc

Current design doc baselined - please raise future updates via a separate PR
David Lee 2017-11-24 14:02:05 +00:00 committed by GitHub
commit 71681e0e0a
10 changed files with 644 additions and 0 deletions

@ -0,0 +1,155 @@
![Corda](https://www.corda.net/wp-content/uploads/2016/11/fg005_corda_b.png)
--------------------------------------------
Design Review Board Meeting Minutes
============================================
**Date / Time:** 16/11/2017, 14:00
## Attendees
- Mark Oldfield (MO)
- Matthew Nesbit (MN)
- Richard Gendal Brown (RGB)
- James Carlyle (JC)
- Mike Hearn (MH)
- Jose Coll (JoC)
- Rick Parker (RP)
- Andrey Bozhko (AB)
- Dave Hudson (DH)
- Nick Arini (NA)
- Ben Abineri (BA)
- Jonathan Sartin (JS)
- David Lee (DL)
## **Minutes**
MO opened the meeting, outlining the agenda and meeting review process, and clarifying that consensus on each design decision would be sought from RGB, JC and MH.
MO set out ground rules for the meeting. RGB asked everyone to confirm they had read both documents; all present confirmed.
MN outlined the motivation for a Float as responding to organisations' expectation of a 'fire break' protocol termination in the DMZ, where manipulation and operation can be checked and monitored.
The meeting was briefly interrupted by technical difficulties with the GoToMeeting conferencing system.
MN continued to outline how the design was constrained by expected DMZ rules and influenced by currently perceived client expectations e.g. making the float unidirectional. He gave a prelude to certain design decisions e.g. the use of AMQP from the outset.
MN went on to describe the target solution in detail, covering the handling of both inbound and outbound connections. He highlighted implicit overlaps with the HA design (clustering support, queue names etc.), and clarified that the local broker was not required to use AMQP.
### [TLS termination](./ssl-termination.md)
JC questioned where the TLS connection would terminate. MN outlined the pros and cons of termination on firewall vs. float, highlighting the consequence of float termination that access by the float to the private key was required, and that mechanisms may be needed to store that key securely.
MH contended that the need to propagate TLS headers etc. through to the node (for reinforcing identity checks etc.) implied a need to terminate on the float. MN agreed but noted that in practice the current node design did not make much use of that feature.
JC questioned how users would provision a TLS cert on a firewall. MN confirmed users would be able to do this themselves and were typically familiar with doing so.
RGB highlighted the distinction between the signing key for the TLS vs. identity certificates, and that this needed to be made clear to users. MN agreed that TLS private keys could be argued to be less critical from a security perspective, particularly when revocation was enabled.
MH noted potential to issue sub-certs with key usage flags as an additional mitigating feature.
RGB queried at what point in the flow a message would be regarded as trusted. MN set an expectation that the float would apply basic checks (e.g. stopping a connection talking on other topics etc.) but that subsequent sanitisation should happen in the internal trusted portion.
RGB questioned whether the TLS key on the float could be re-used on the bridge to enable wrapped messages to be forwarded in an encrypted form (session migration). MH and MN maintained TLS forwarding could not work in that way, and this would not allow the fire break requirement to inspect packets.
RGB concluded the bridge must effectively trust the firewall or bridge on the origin of incoming messages. MN raised the possibility of SASL verification, but noted objections by MH (clumsy because of multiple handshakes etc.).
JC queried whether SASL would allow passing of identity and hence termination at the firewall; MN confirmed this.
MH contended that the TLS implementation was specific to Corda in several ways which may challenge implementation using firewalls, and that typical firewalls (using old OpenSSL etc.) were probably not more secure than R3's own solutions. RGB pointed out that the design was ultimately driven by client perception of security (MN: “security theatre”) rather than objective assessment. MH added that implementations would be firewall-specific and not all devices would support forwarding, support for AMQP etc.
RGB proposed messaging to clients that the option existed to terminate on the firewall if it supported the relevant requirements.
MN re-raised the question of key management. RGB asked about the risk implied from the threat of a compromised float. MN said an attacker who compromised a float could establish TLS connections in the name of the compromised party, and could inspect and alter packets including readable business data (assuming AMQP serialisation). MH gave an example of a MITM attack where an attacker could swap in their own single-use key allowing them to gain control of (e.g.) a cash asset; the TLS layer is the only current protection against that.
RGB queried whether messages could be signed by senders. MN raised potential threat of traffic analysis, and stated E2E encryption was definitely possible but not for March-April.
MH viewed the use-case for extra encryption as the consumer/SME market, where users would want to upload/download messages from a mailbox without needing to trust it; this was not the target market yet. MH maintained that TLS was really strong and that assuming compromise of the float was not conceptually different from compromise of another device e.g. the firewall. MN confirmed that use of an HSM would generally require signing on the HSM device for every session; MH observed this could be a bottleneck in the scenario of a restored node seeking to re-establish a large number of connections. It was observed that the float would still need access to a key provisioning access to the HSM, so this did not materially improve the security in a compromised float scenario.
MH advised against offering clients support for their own firewall since it would likely require R3 effort to test support and help with customisations.
MN described option 2b to tunnel through to the internal trusted portion of the float over a connection initiated from inside the internal network in order for the key to be loaded into memory at run-time; this would require a bit more code.
MH advocated option 2c - just to accept risk and store on file system on the basis of time constraints, maintaining that TLS handshakes are complicated to code and hard to proxy. MH suggested upgrading to 2b or 2a later if needed. MH described how keys were managed at Google.
**DECISION CONFIRMED**: Accept option 2b - Terminate on float, inject key from internal portion of the float (RGB, JC, MH agreed)
### [E2E encryption](./e2e-encryption.md)
DH proposed that E2E encryption would be much better but conceded the time limitations and agreed that the threat scenario of a compromised DMZ device was the same under the proposed options. MN agreed.
MN argued for a placeholder vs. ignoring or scheduling work to build e2e encryption now. MH agreed, seeking more detailed proposals on what the placeholder was and how it would be used.
MH queried whether e2e encryption would be done at the app level rather than the AMQP level, raising questions about what would happen on non-supporting nodes etc.
MN highlighted the link to AMQP serialisation work being done.
**DECISION CONFIRMED:** Add placeholder, subject to more detailed design proposal (RGB, JC, MH agreed)
### **[AMQP vs. custom protocol](./p2p-protocol.md)**
MN described alternative options involving onion-routing etc.
JoC questioned whether this would also allow support for load balancing; MN advised this would be too much change in direction in practice.
MH outlined his original reasoning for AMQP (lots of e.g. manageability features, not all of which would be needed at the outset but possibly in future) vs. other options e.g. MQTT.
MO questioned whether the broker would imply performance limitations.
RGB argued there were two separate concerns: carrying messages from float to bridge and then bridge to node, with separate design options.
JC proposed the decision could be deferred until later. MN pointed out changing the protocol would compromise wire stability.
MH advocated sticking with AMQP for now and implementing a custom protocol later with suitable backwards-compatibility features when needed.
RGB queried whether full AMQP implementation should be done in this phase. MN provided explanation.
**DECISION CONFIRMED:** Continue to use AMQP (RGB, JC, MH agreed)
### [Pluggable broker prioritisation](./pluggable-broker.md)
MN outlined arguments for deferring pluggable brokers, whilst describing how he'd go about implementing the functionality. MH agreed with prioritisation for later.
JC queried whether broker providers could be asked to deliver the feature. AB mentioned that Solace seemed keen on working with R3 and could possibly be utilised. MH was sceptical, arguing that R3 resource would still be needed to support.
JoC noted a distinction in scope for P2P and/or RPC.
There was discussion of replacing the core protocol with JMS + plugins. RGB drew focus to the question of when to do so, rather than how.
AB noted Solace have functionality with conceptual similarities to the float, and questioned to what degree the float could be considered non-core technology. MH argued the nature of Corda as a P2P network made the float pretty core to avoiding dedicated network infrastructure.
**DECISION CONFIRMED:** Defer support for pluggable brokers until later, except in the event that a requirement to do so emerges from higher priority float / HA work. (RGB, JC, MH agreed)
### **Inbound only vs. inbound & outbound connections**
DL sought confirmation that the group was happy with the float to act as a Listener only. MN repeated the explanation of how outbound connections would be initiated through a SOCKS 4/5 proxy. No objections were raised.
### Overall design and implementation plan
MH requested more detailed proposals going forward on:
1) To what degree logs from different components need to be integrated (consensus was no requirement at this stage)
2) Bridge control protocols.
3) Scalability of hashing network map entries to queue names
4) Node admins' user experience (MH argued for documenting this in advance to validate the design)
5) Behaviour following termination of a remote node (retry frequency, back-off etc.)?
6) Impact on standalone nodes (no float)?
JC noted an R3 obligation with Microsoft to support AMQP-compliant Azure messaging. MN confirmed support for pluggable brokers should cover that.
JC argued for documentation of procedures to be the next step as it is needed for the Project Agent Pilot phase. MH proposed sharing the advance documentation.
JoC questioned whether the Bridge Manager locked the design to Artemis? MO highlighted the transitional elements of the design.
RGB questioned the rationale for moving the broker out of the node. MN provided clarification.
**DECISION CONFIRMED**: Design to proceed as discussed (RGB, JC, MH agreed)

@ -0,0 +1,56 @@
![Corda](https://www.corda.net/wp-content/uploads/2016/11/fg005_corda_b.png)
--------------------------------------------
Design Decision: End-to-end encryption
============================================
## Background / Context
End-to-end encryption is a desirable potential design feature for the [float](../design.md).
## Options Analysis
### 1. No end-to-end encryption
#### Advantages
1. Least effort
2. Easier to fault find and manage
#### Disadvantages
1. With no placeholder, it is very hard to add support later and maintain wire stability.
2. May not get past security reviews of Float.
### 2. Placeholder only
#### Advantages
1. Allows wire stability when we have agreed an encrypted approach
2. Shows that we are serious about security, even if this isn't available yet.
3. Allows later encrypted version to be an enterprise feature that can interoperate with OS versions.
#### Disadvantages
1. Doesn't actually provide E2E, or define what an encrypted payload looks like.
2. Doesn't address any crypto features that target protecting the AMQP headers.
### 3. Implement end-to-end encryption
#### Advantages
1. Will protect the sensitive data fully.
#### Disadvantages
1. Lots of work.
2. Difficult to get right.
3. Re-inventing TLS.
## Recommendation and justification
Proceed with Option 2: Placeholder
## Decision taken
[DRB Meeting, 16/11/2017](./drb-meeting-20171116.md): Proceed with Option 2 - Add placeholder, subject to more detailed design proposal (RGB, JC, MH agreed)

@ -0,0 +1,69 @@
![Corda](https://www.corda.net/wp-content/uploads/2016/11/fg005_corda_b.png)
--------------------------------------------
Design Decision: P2P Messaging Protocol
============================================
## Background / Context
Corda requires messages to be exchanged between nodes via a well-defined protocol.
Determining this protocol is a critical upstream dependency for the design of key messaging components including the [float](../design.md).
## Options Analysis
### 1. Use AMQP
Under this option, P2P messaging will follow the [Advanced Message Queuing Protocol](https://www.amqp.org/).
#### Advantages
1. As we have described in our marketing materials.
2. Well-defined standard.
3. Support for packet level flow control and explicit delivery acknowledgement.
4. Will allow eventual swap out of Artemis for other brokers.
#### Disadvantages
1. AMQP is a complex protocol with many layered state machines, for which it may prove hard to verify security properties.
2. No support for secure MAC in packet frames.
3. No defined encryption mode beyond creating custom payload encryption and custom headers.
4. No standardised support for queue creation/enumeration, or deletion.
5. Use of broker durable queues and autonomous bridge transfers does not align with checkpoint timing, so that independent replication of the DB and Artemis data risks causing problems. (Writing to the DB doesn't work currently and is probably also slow).
### 2. Develop a custom protocol
This option would discard existing Artemis server/AMQP support for peer-to-peer communications in favour of a custom implementation of the Corda MessagingService, which takes direct responsibility for message retries and stores the pending messages into the node's database. The wire level of this service would be built on top of a fully encrypted MIX network which would not require a fully connected graph, but rather send messages on randomly selected paths over the dynamically managed network graph topology.
Packet format would likely use the [SPHINX packet format](http://www0.cs.ucl.ac.uk/staff/G.Danezis/papers/sphinx-eprint.pdf), although with the body encryption updated to a modern AEAD scheme as in <https://www.cs.ru.nl/~bmennink/pubs/16cans.pdf>. In this scheme, nodes would be identified in the overlay network solely by Curve25519 public key addresses and floats would be dumb nodes that only run the MIX network code and don't act as message sources or sinks. Intermediate traffic would not be readable except by the intended waypoint and only the final node can read the payload.
Point to point links would be standard TLS and the network certificates would be whatever is acceptable to the host institutions e.g. standard Verisign certs. It is assumed institutions would select partners to connect to that they trust and permission them individually in their firewalls. Inside the MIX network the nodes would be connected mostly in a static way and use standard HELLO packets to determine the liveness of neighbour routes, then use tunnelled gossip to distribute the signed/versioned Link topology messages. Nodes will also be allowed to advertise a public IP, so some dynamic links and publicly visible nodes would exist. Network map addresses would then be mappings from Legal Identity to these overlay network addresses, not to physical network locations.
#### Advantages
1. Can be defined with very small message surface area that is amenable to security analysis.
2. Packet formats can follow best practice cryptography from the start and be matched to Corda's needs.
3. Doesn't require Complete Graph structure for network if we have intermediate routing.
4. More closely aligns checkpointing and message delivery handling at the application level.
#### Disadvantages
1. Inconsistent with previous design statements published to external stakeholders.
2. Effort implications - starting from scratch
3. Technical complexity in developing a P2P protocol which is attack tolerant.
## Recommendation and justification
Proceed with Option 1
## Decision taken
[DRB Meeting, 16/11/2017](./drb-meeting-20171116.md): Proceed with Option 1 - Continue to use AMQP (RGB, JC, MH agreed)

@ -0,0 +1,65 @@
![Corda](https://www.corda.net/wp-content/uploads/2016/11/fg005_corda_b.png)
--------------------------------------------
Design Decision: Pluggable Broker prioritisation
============================================
## Background / Context
A decision on when to prioritise implementation of a pluggable broker has implications for delivery of key messaging components including the [float](../design.md).
## Options Analysis
### 1. Deliver pluggable brokers now
#### Advantages
1. Meshes with business opportunities from HPE and Solace Systems.
2. Would allow us to interface to existing Bank middleware.
3. Would allow us to switch away from Artemis if we need higher performance.
4. Makes our AMQP story stronger.
#### Disadvantages
1. More up-front work.
2. Might slow us down on other priorities.
### 2. Defer development of pluggable brokers until later
#### Advantages
1. Still gets us where we want to go, just later.
2. Work can be progressed as resource is available, rather than right now.
#### Disadvantages
1. Have to take care that we have sufficient abstractions that things like CORE connections can be replaced later.
2. Leaves HPE and Solace hanging even longer.
### 3. Never enable pluggable brokers
#### Advantages
1. What we already have.
#### Disadvantages
1. Ties us to ArtemisMQ development speed.
2. Not good for our relationship with HPE and Solace.
3. Probably limits our maximum messaging performance longer term.
## Recommendation and justification
Proceed with Option 2 (defer development of pluggable brokers until later)
## Decision taken
[DRB Meeting, 16/11/2017](./drb-meeting-20171116.md): Proceed with Option 2 - Defer support for pluggable brokers until later, except in the event that a requirement to do so emerges from higher priority float / HA work. (RGB, JC, MH agreed)

@ -0,0 +1,103 @@
![Corda](https://www.corda.net/wp-content/uploads/2016/11/fg005_corda_b.png)
--------------------------------------------
Design Decision: TLS termination point
============================================
## Background / Context
Design of the [float](../design.md) is critically influenced by the decision of where TLS connections to the node should be terminated.
## Options Analysis
### 1. Terminate TLS on Firewall
#### Advantages
1. Common practice for DMZ web solutions, often with an HSM associated with the Firewall, and should be familiar for banks to set up.
2. Doesn't expose our private key in the less trusted DMZ context.
3. Bugs in the firewall TLS engine will be patched frequently.
4. The DMZ float server would only require a self-signed certificate/private key to enable secure communications, so theft of this key has no impact beyond the compromised machine.
#### Disadvantages
1. May limit cryptography options to RSA, and prevent checking of X500 names (only the root certificate checked) - Corda certificates are not totally standard.
2. Doesn't allow identification of the message source.
3. May require additional work and SASL support code to validate the ultimate origin of connections in the float.
#### Variant option 1a: Include SASL connection checking
##### Advantages
1. Maintain authentication support
2. Can authenticate against keys held internally e.g. Legal Identity not just TLS.
##### Disadvantages
1. More work than the do-nothing approach
2. More protocol to design for sending across the inner firewall.
### 2. Direct TLS Termination onto Float
#### Advantages
1. Validate our PKI certificates directly ourselves.
2. Allow messages to be reliably tagged with source.
#### Disadvantages
1. We don't currently use the identity to check incoming packets, only for connection authentication anyway.
2. Management of Private Key a challenge requiring extra work and security implications. Options for this are presented below.
#### Variant Option 2a: Float TLS certificate via direct HSM
##### Advantages
1. Key can't be stolen (only access to signing operations)
2. Audit trail of signings.
##### Disadvantages
1. Accessing HSM from DMZ probably not allowed.
2. Breaks the inbound-connection-only rule of modern DMZ.
#### Variant Option 2b: Tunnel signing requests to bridge manager
##### Advantages
1. No new connections involved from Float box.
2. No access to actual private key from DMZ.
##### Disadvantages
1. Requires implementation of a message protocol, in addition to a key provider that can be passed to the standard SSLEngine, but proxies signing requests.
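
The shape of option 2b can be sketched as follows. This is a minimal illustration in Python rather than a Corda implementation: the class names, the request/reply dictionaries and the in-process "tunnel" callable are all hypothetical, and HMAC-SHA256 stands in for the real asymmetric TLS signature purely so the sketch runs with the standard library. The point it shows is that the float-side key provider only forwards signing requests and never holds the key.

```python
import hashlib
import hmac


class InternalSigner:
    """Runs on the internal (trusted) side and is the only holder of the key."""

    def __init__(self, private_key: bytes):
        self._key = private_key

    def handle(self, request: dict) -> dict:
        # HMAC is a stand-in for the real signature operation (assumption).
        signature = hmac.new(self._key, request["data"], hashlib.sha256).digest()
        return {"id": request["id"], "signature": signature}


class TunnelledKeyProvider:
    """Runs on the float; proxies signing requests over an existing tunnel."""

    def __init__(self, tunnel):
        # 'tunnel' is any callable that carries a request to the internal
        # side over a connection that was initiated from inside.
        self._tunnel = tunnel
        self._next_id = 0

    def sign(self, data: bytes) -> bytes:
        self._next_id += 1
        reply = self._tunnel({"id": self._next_id, "data": data})
        if reply["id"] != self._next_id:
            raise RuntimeError("signing reply does not match request")
        return reply["signature"]
```

In a real deployment the tunnel would be the message protocol mentioned above, and the provider would be adapted to whatever key interface the TLS engine expects.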
#### Variant Option 2c: Store key on local file system
##### Advantages
1. Simple with minimal extra code required.
2. Delegates access control to bank's own systems.
3. Risks losing only the TLS private key, which can easily be revoked. This isn't the legal identity key at all.
##### Disadvantages
1. Risks losing the TLS private key.
2. Probably not allowed.
## Recommendation and justification
Proceed with Variant option 1a: Terminate on firewall; include SASL connection checking.
## Decision taken
[DRB Meeting, 16/11/2017](./drb-meeting-20171116.md): Proceed with option 2b - Terminate on float, inject key from internal portion of the float (RGB, JC, MH agreed)

@ -0,0 +1,196 @@
![Corda](https://www.corda.net/wp-content/uploads/2016/11/fg005_corda_b.png)
# Float Design
--------------------------------------------
DOCUMENT MANAGEMENT
============================================
## Document Control
* Title: Float Design
* Date: 13th November 2017
* Author: Matthew Nesbit
* Distribution: Design Review Board, Product Management, Services - Technical (Consulting), Platform Delivery
* Corda target version: Enterprise
## Document Sign-off
* Author: David Lee
* Reviewers(s): TBD
* Final approver(s): TBD
## Document History
# HIGH LEVEL DESIGN
## Overview
The role of the 'float' is to meet the requirements of organisations that will not allow direct incoming connections to their node, but would rather host a proxy component in a DMZ to achieve this. As such it needs to meet the requirements of modern DMZ security rules, which essentially assume that the entire machine in the DMZ may become compromised. At the same time, we expect that the Float can interoperate with directly connected nodes, possibly even those using open source Corda.
### Background
#### Current state of peer-to-peer messaging in Corda
The diagram below illustrates the current mechanism for peer-to-peer messaging between Corda nodes.
![Current P2P State](./current-p2p-state.png)
When a flow running on a Corda node triggers a requirement to send a message to a peer node, it first checks for pre-existence of an applicable message queue for that peer.
**If the relevant queue exists:**
1. The node submits the message to the queue and continues after receiving acknowledgement.
2. The Core Bridge picks up the message and transfers it via a TLS socket to the inbox of the destination node.
3. A flow on the recipient node receives the message from the peer and acknowledges consumption on the bus once the flow has checkpointed this progress.
**If the queue does not exist (messaging a new peer):**
1. The flow triggers creation of a new queue with a name encoding the identity of the intended recipient.
2. When the queue creation has completed the node sends the message to the queue.
3. The hosted Artemis server within the node has a queue creation hook which is called.
4. The queue name is used to look up the remote connection details and a new bridge is registered.
5. The client certificate of the peer is compared to the expected legal identity X500 Name. If this is OK, message flow proceeds as for a pre-existing queue (above).
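
The two cases above can be sketched with a toy in-memory model. This is an illustration only: the `Broker` and `Node` classes, the `peer.` queue prefix and the shape of the network map are assumptions made for the example, not Corda or Artemis APIs, and the certificate check in step 5 is omitted.

```python
class Broker:
    """Toy in-memory stand-in for the Artemis server hosted in the node."""

    def __init__(self, on_queue_created):
        self.queues = {}
        self.on_queue_created = on_queue_created  # queue creation hook

    def create_queue(self, name):
        self.queues[name] = []
        self.on_queue_created(name)


class Node:
    QUEUE_PREFIX = "peer."  # hypothetical prefix; the real name encodes the identity

    def __init__(self, network_map):
        self.network_map = network_map  # peer identity -> connection details
        self.bridges = {}               # queue name -> connection details
        self.broker = Broker(self._register_bridge)

    def _register_bridge(self, queue_name):
        # The hook uses the queue name to look up where the peer lives.
        peer = queue_name[len(self.QUEUE_PREFIX):]
        self.bridges[queue_name] = self.network_map[peer]

    def send_to_peer(self, peer, message):
        queue_name = self.QUEUE_PREFIX + peer
        if queue_name not in self.broker.queues:
            # Messaging a new peer: create the queue, which fires the hook.
            self.broker.create_queue(queue_name)
        self.broker.queues[queue_name].append(message)
        return queue_name
```

Sending twice to the same peer creates the queue and registers the bridge only once; the second message takes the "queue exists" path.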
## Scope
* Goals:
    * Allow connection to a Corda node without requiring direct incoming connections from external participants.
    * Allow connections to a Corda node without requiring the node itself to have a public IP address.
    * Separate TLS connection handling from the MQ broker.
* Non-goals (out of scope):
    * Support for MQ brokers other than Apache Artemis
## Timeline
For delivery by end Q1 2018.
## Requirements
Allow connectivity in compliance with DMZ constraints commonly imposed by modern financial institutions; namely:
1. Firewalls required between the internet and any device in the DMZ, and between the DMZ and the internal network
2. Data passing between the internet and the internal network via the DMZ should pass through a clear protocol break in the DMZ.
3. Only identified IPs and ports are permitted to access devices in the DMZ; this includes communications between devices colocated in the DMZ.
4. Only a limited number of ports are opened in the firewall (<5) to make firewall operation manageable. These ports must change slowly.
5. Any DMZ machine is typically multi-homed, with separate network cards handling traffic through the institutional firewall vs. to the Internet. (There is usually a further hidden management interface card accessed via a jump box for managing the box and shipping audit trail information). This requires that our software can bind listening ports to the correct network card not just to 0.0.0.0.
6. No connections to be initiated by DMZ devices towards the internal network. Communications should be initiated from the internal network to form a bidirectional channel with the proxy process.
7. No business data should be persisted on the DMZ box.
8. An audit log of all connection events is required to track breaches. Latency information should also be tracked to facilitate management of connectivity issues.
9. Processes on DMZ devices run as local accounts with no relationship to internal permission systems, or ability to enumerate devices on the internal network.
10. Communications in the DMZ should use modern TLS, often with local-only certificates/keys that hold no value outside of use in predefined links.
11. Where TLS is required to terminate on the firewall, provide a suitably secure key management mechanism (e.g. an HSM).
12. Any proxy in the DMZ should be subject to the same HA requirements as the devices it is servicing
13. Any business data passing through the proxy should be separately encrypted, so that no data is in the clear of the program memory if the DMZ box is compromised.
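
Requirement 5 (binding to a specific network card rather than 0.0.0.0) can be illustrated with a short sketch. It uses only the Python standard library; the function name and the decision to reject wildcard addresses outright are assumptions for the example, not part of the design.

```python
import socket


def open_listener(bind_address, port=0, backlog=16):
    """Open a TCP listener on one specific local interface address."""
    if bind_address in ("", "0.0.0.0"):
        # A wildcard bind would listen on every network card, which is
        # what a multi-homed DMZ box must avoid.
        raise ValueError("refusing to bind to a wildcard address")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((bind_address, port))  # port 0 lets the OS pick a free port
    sock.listen(backlog)
    return sock
```

On a multi-homed machine the caller would pass the address of the internet-facing card for the external listener and the address of the internal-facing card for the link from the internal network.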
## Design Decisions
The following design decisions are assumed by this design:
1. [AMQP vs. custom P2P](./decisions/p2p-protocol.md): Use AMQP
2. [SSL termination (firewall vs. float)](./decisions/ssl-termination.md): Terminate on firewall; include SASL connection checking
3. [End-to-end encryption](./decisions/e2e-encryption.md): Include placeholder only
4. [Prioritisation of pluggable broker support](./decisions/pluggable-broker.md): Defer pluggable brokers until later
## Target Solution
The proposed solution introduces a reverse proxy component ("**float**") which may be sited in the DMZ, as illustrated in the diagram below.
![Full Float Implementation](./full-float.png)
The main role of the float is to forward incoming AMQP link packets from authenticated TLS links to the AMQP Bridge Manager, then echo back final delivery acknowledgements once the Bridge Manager has successfully inserted the messages. The Bridge Manager is responsible for rejecting inbound packets on queues that are not local inboxes to prevent e.g. 'cheating' messages onto management topics, faking outgoing messages etc.
The float is linked to the internal AMQP Bridge Manager via a single AMQP/TLS connection, which can contain multiple logical AMQP links. This link is initiated at the socket level by the Bridge Manager towards the float.
The float is a **listener only** and does not enable outgoing bridges (see Design Decisions, above). Outgoing bridge formation and message sending come directly from the internal Bridge Manager (possibly via a SOCKS 4/5 proxy, which is easy enough to enable in Netty, or directly through the corporate firewall). Initiating from the float gives rise to security concerns.
The float is **not mandatory**; interoperability with older nodes, even those using direct AMQP from bridges in the node, is supported.
**No state will be serialized on the float**, although suitably protected logs will be recorded of all float activities.
**End-to-end encryption** of the payload is not delivered through this design (see Design Decisions, above). For current purposes, a header field indicating plaintext/encrypted payload is employed as a placeholder.
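
As a rough illustration of such a placeholder, the sketch below tags each message with a header saying whether the body is plaintext or encrypted. The header name, its values and the rule that a missing header means plaintext are all assumptions made for the example; the design does not define them.

```python
PAYLOAD_ENCODING_HEADER = "payload-encoding"  # hypothetical header name
PLAINTEXT = "plaintext"
ENCRYPTED = "encrypted"


def wrap(payload: bytes, encrypted: bool = False) -> dict:
    """Attach the placeholder header to an outgoing payload."""
    encoding = ENCRYPTED if encrypted else PLAINTEXT
    return {"headers": {PAYLOAD_ENCODING_HEADER: encoding}, "body": payload}


def unwrap(message: dict) -> bytes:
    """Return the payload, refusing encodings this version cannot handle."""
    # Assumption: messages without the header are treated as plaintext.
    encoding = message["headers"].get(PAYLOAD_ENCODING_HEADER, PLAINTEXT)
    if encoding == PLAINTEXT:
        return message["body"]
    if encoding == ENCRYPTED:
        raise NotImplementedError("encrypted payloads are not supported yet")
    raise ValueError("unknown payload encoding: " + encoding)
```

Reserving the field now means a later encrypted mode can be introduced without changing the wire format for nodes that only understand plaintext.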
**HA** is enabled (this should be easy as the bridge manager can choose which float to make active). Only fully connected DMZ floats should activate their listening port.
Implementation of the float is expected to be based on existing AMQP Bridge Manager code - see Implementation Plan, below, for expected work stages.
### Bridge control protocol
The bridge control is designed to be as stateless as possible. Thus, nodes and bridges restarting must re-request/broadcast information to each other. Messages are sent to a 'bridge.control' address in Artemis as non-persistent messages with a non-durable queue. Each message should contain a duplicate message ID, which is also re-used as the correlation id in replies. Relevant scenarios are described below:
#### On bridge start-up, or reconnection to Artemis
1. The bridge process should subscribe to 'bridge.control'.
2. The bridge should start sending QueueQuery messages which will contain a unique message id and an identifier for the bridge sending the message.
3. The bridge should continue to send these until at least one node replies with a matched QueueSnapshot message.
4. The QueueSnapshot replies from the nodes contain a correlationId field that is either set to the unique id of the originating QueueQuery or null. The message payload is a list of inbox queue info items and a list of outbound queue info items. Each queue info item is a tuple of the Legal X500 Name (as expected on the destination TLS certificate) and the queue name, which should have the form "internal.peers." + hash key of legal identity (using the same algorithm as we use in the DB to make the string). Note that this queue name is a change from the current logic, but will be more portable to length-constrained topics and allow multiple inboxes on the same broker.
5. The bridge should process the QueueSnapshot, initiating links to the outgoing targets. It should also add expected inboxes to its in-bound permission list.
6. When an outgoing link is successfully formed the remote client certificate should be checked against the expected X500 name. Assuming the link is valid the bridge should subscribe to the related queue and start trying to forward the messages.
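
The bridge side of this exchange might look like the sketch below. It is illustrative only: the dataclass fields, the `Bridge` class and the use of SHA-256 for the queue name hash are assumptions (the document only says the hash matches the one used in the DB), and ignoring snapshots correlated to someone else's query is a choice made for the example. Link formation and the certificate check in step 6 are left out.

```python
import hashlib
import uuid
from dataclasses import dataclass
from typing import List, Optional, Tuple


def queue_name_for(legal_name: str) -> str:
    # SHA-256 is a stand-in for the unspecified DB hashing algorithm.
    digest = hashlib.sha256(legal_name.encode("utf-8")).hexdigest()
    return "internal.peers." + digest


@dataclass
class QueueQuery:
    message_id: str
    bridge_id: str


@dataclass
class QueueSnapshot:
    correlation_id: Optional[str]
    inboxes: List[Tuple[str, str]]   # (legal X500 name, queue name)
    outbound: List[Tuple[str, str]]  # (legal X500 name, queue name)


class Bridge:
    def __init__(self, bridge_id: str):
        self.bridge_id = bridge_id
        self.pending_query_ids = set()
        self.synced = False            # True once a matched reply has arrived
        self.inbound_permitted = set()
        self.outbound_targets = {}     # queue name -> expected X500 name

    def make_query(self) -> QueueQuery:
        query = QueueQuery(str(uuid.uuid4()), self.bridge_id)
        self.pending_query_ids.add(query.message_id)
        return query

    def on_snapshot(self, snapshot: QueueSnapshot) -> bool:
        matched = snapshot.correlation_id in self.pending_query_ids
        if snapshot.correlation_id is not None and not matched:
            return False  # reply to a query this bridge did not send
        if matched:
            self.synced = True  # the bridge can stop re-sending QueueQuery
        for _, queue in snapshot.inboxes:
            self.inbound_permitted.add(queue)
        for legal_name, queue in snapshot.outbound:
            self.outbound_targets[queue] = legal_name
        return True
```

A caller would keep publishing `make_query()` results to 'bridge.control' until `synced` becomes true.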
#### On node start-up, or reconnection to Artemis
1. The node should subscribe to 'bridge.control'.
2. The node should enumerate the queues and identify which have well-known identities in the network map cache. The appropriate information about its own inboxes and any known outgoing queues should be compiled into an unsolicited QueueSnapshot message with a null correlation id. This should be broadcast to update any bridges that are running.
3. If any QueueQuery messages arrive these should be responded to with specific QueueSnapshot messages with the correlation id set.
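
As a rough sketch of the node side, the snippet below derives a queue name and builds both kinds of QueueSnapshot. The dictionary layout and key names (`correlationId`, `messageId`) are hypothetical, and the hash encoding is an assumption: the text only says the name is "internal.peers." plus a hash key of the legal identity, so a hex-encoded SHA-256 of the encoded key bytes is used here purely for illustration.

```python
import hashlib
from typing import Dict, Iterable, Optional, Tuple

PEER_QUEUE_PREFIX = "internal.peers."
QueueInfo = Tuple[str, str]  # (legal X500 name, queue name)


def peer_queue_name(identity_key: bytes) -> str:
    # Assumption: hex SHA-256 of the encoded key; the real scheme should match
    # whatever algorithm the node's database uses for the same purpose.
    return PEER_QUEUE_PREFIX + hashlib.sha256(identity_key).hexdigest()


def build_snapshot(
    inboxes: Iterable[QueueInfo],
    outbound: Iterable[QueueInfo],
    correlation_id: Optional[str] = None,
) -> Dict[str, object]:
    # A null correlation id marks the unsolicited broadcast sent at node start-up.
    return {
        "type": "QueueSnapshot",
        "correlationId": correlation_id,
        "inboxes": list(inboxes),
        "outbound": list(outbound),
    }


def on_queue_query(
    query: Dict[str, str],
    inboxes: Iterable[QueueInfo],
    outbound: Iterable[QueueInfo],
) -> Dict[str, object]:
    # Step 3: a specific reply echoes the query's message id as the correlation id.
    return build_snapshot(inboxes, outbound, correlation_id=query["messageId"])
```

A fixed-length hash keeps the queue name short regardless of key size, which is the portability benefit the protocol description is after.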
#### On network map updates
1. On receipt of any network map cache updates, the information should be evaluated to see if any additional queues can now be mapped to a bridge. At this point a BridgeRequest packet should be sent, containing the legal X500Name and queue name of the new update.
#### On flow message to Peer
1. If a message is to be sent to a peer the code should (as it does now) check for queue existence in its cache and then on the broker. If it does exist it simply sends the message.
2. If the queue is not listed in its cache it should block until the queue is created (this should be safe versus race conditions with other nodes).
3. Once the queue is created the original message and subsequent messages can now be sent.
4. In parallel, a BridgeRequest packet should be sent to activate a new connection outwards. This will contain the legal X500Name and queue name of the new queue.
5. Future QueueSnapshot requests should be responded to with the new queue included in the list.
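
The send path above might look roughly like this. `InMemoryBroker` is a hypothetical stand-in for the real broker client, queue creation is modelled as a synchronous call (so "block until the queue is created" is trivially satisfied), and the BridgeRequest is a plain dictionary with made-up key names.

```python
from typing import Dict, List, Set


class InMemoryBroker:
    """Hypothetical stand-in for the broker; not an Artemis API."""

    def __init__(self) -> None:
        self.queues: Dict[str, List[bytes]] = {}
        self.control: List[dict] = []   # messages published to 'bridge.control'

    def queue_exists(self, name: str) -> bool:
        return name in self.queues

    def create_queue(self, name: str) -> None:
        # Idempotent, so a race with another creator is harmless.
        self.queues.setdefault(name, [])

    def send(self, name: str, payload: bytes) -> None:
        self.queues[name].append(payload)

    def publish_control(self, message: dict) -> None:
        self.control.append(message)


class PeerSender:
    def __init__(self, broker: InMemoryBroker) -> None:
        self._broker = broker
        self._known_queues: Set[str] = set()   # the node's local cache

    def send(self, legal_name: str, queue_name: str, payload: bytes) -> None:
        if queue_name not in self._known_queues:
            if not self._broker.queue_exists(queue_name):
                # Steps 2 and 4: create the queue, then request a bridge for it.
                self._broker.create_queue(queue_name)
                self._broker.publish_control({
                    "type": "BridgeRequest",
                    "legalName": legal_name,
                    "queueName": queue_name,
                })
            self._known_queues.add(queue_name)
        # Steps 1 and 3: the message is queued whether or not a bridge is up yet.
        self._broker.send(queue_name, payload)
```

Because the message is queued before any bridge exists, a flow can make progress even while the bridge is still being provisioned.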
### Behaviour with a Float portion in the DMZ
1. On initial connection of an inbound bridge, AMQP is configured to run a SASL challenge-response to (re-)validate the origin and confirm the client identity. (The most likely SASL mechanism is the one defined in https://tools.ietf.org/html/rfc3163, as this allows reuse of our PKI certificates in the challenge-response. Potentially we could forward some bridge control messages covering the SASL exchange to the internal Bridge Controller. This would allow us to keep the private keys internal to the organisation, so we may also require a SASLAuth message type as part of the bridge control protocol.)
2. The float restricts acceptable AMQP topics to the name space appropriate for inbound messages only. Hence, there should be no way to tunnel messages to bridge control, or RPC topics on the bus.
3. On receipt of a message from the external network, the Float should append a header to link the source channel's X500 name, then create a Delivery for forwarding the message inwards.
4. The internal Bridge Control Manager process validates the message further to ensure that it is targeted at a legitimate inbox (i.e. not an outbound queue) and then forwards it to the bus. Once delivered to the broker, the Delivery acknowledgements are cascaded back.
5. On receiving Delivery notification from the internal side, the Float acknowledges back the correlated original Delivery.
6. The Float should protect against excessive inbound messages by using AMQP flow control and by refusing to accept excessive unacknowledged deliveries.
7. The Float only exposes its inbound server socket when activated by a valid AMQP link from the Bridge Control Manager to allow for a simple HA pool of DMZ Float processes. (Floats cannot run hot-hot as this would invalidate Corda's message ordering guarantees.)
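
Steps 2, 3, 5 and 6 can be illustrated with the sketch below. It is deliberately transport-free: `FloatInbound`, the `x-source-x500` header name and the default namespace prefix are all assumptions made for the example, and real back-pressure would be applied through AMQP link credit rather than a boolean return value.

```python
from typing import Callable, Dict, Optional


class FloatInbound:
    """Hypothetical model of the Float's inbound filtering and ack correlation."""

    def __init__(
        self,
        forward: Callable[[int, str, Dict[str, str], bytes], None],
        max_unacked: int,
        inbound_prefix: str = "internal.peers.",   # assumed inbound name space
    ) -> None:
        self._forward = forward
        self._max_unacked = max_unacked
        self._prefix = inbound_prefix
        self._unacked: Dict[int, object] = {}   # internal id -> external delivery
        self._next_id = 0

    def on_external(self, source_x500: str, address: str,
                    headers: Dict[str, str], body: bytes,
                    external_delivery: object) -> bool:
        # Step 2: nothing outside the inbound name space is tunnelled inwards.
        if not address.startswith(self._prefix):
            return False
        # Step 6: refuse new work while too many deliveries are unacknowledged.
        if len(self._unacked) >= self._max_unacked:
            return False
        # Step 3: stamp the authenticated source, overwriting any sender-supplied value.
        stamped = dict(headers)
        stamped["x-source-x500"] = source_x500
        delivery_id = self._next_id
        self._next_id += 1
        self._unacked[delivery_id] = external_delivery
        self._forward(delivery_id, address, stamped, body)
        return True

    def on_internal_ack(self, delivery_id: int) -> Optional[object]:
        # Step 5: hand back the original external delivery so it can be acknowledged.
        return self._unacked.pop(delivery_id, None)
```

The Float keeps only the map of in-flight deliveries, which is what lets a standby Float take over without any state transfer.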
# IMPLEMENTATION PLAN
## Proposed Incremental Steps Towards a Float
1. First, I would like to more explicitly split the RPC and P2P MessagingService instances inside the Node. They can keep the same interface, but this would let us develop P2P and RPC at different rates if required.
2. The current in-node design with Artemis Core bridges should first be replaced with an equivalent piece of code that initiates send-only bridges using an in-house wrapper over the proton-j library. Thus, the current Artemis message objects will be picked up from existing queues using the CORE protocol via an abstraction interface to allow later pluggable replacement. The specific subscribed queues are controlled as before and bridges started by the existing code path. The only difference is that the bridges will be the new AMQP client code. The remote Artemis broker should accept transferred packets directly onto its own inbox queue and acknowledge receipt via standard AMQP Delivery notifications. This in turn will be acknowledged back to the Artemis Subscriber to permanently remove the message from the source Artemis queue. The headers for deduplication, address names, etc. will need to be mapped to the AMQP messages and we will have to take care with the message payload. This should be an envelope that is capable in the future of being end-to-end encrypted. Where possible we should stay close to the current Artemis mappings.
3. We need to define a bridge control protocol, so that we can have an out of process float/bridge. The current process is that on message send the node checks the target address to see if the target queue already exists. If the queue doesn't exist it creates a new queue which includes an encoding of the PublicKey in its name. This is picked up by a wrapper around the Artemis Server which is also hosted inside the node and can ask the network map cache for a translation to a target host and port. This in turn allows a new bridge to be provisioned. At node restart, the re-population of the network map cache is used to re-create the bridges for any queues that still hold unsent messages.
4. My proposal for a bridge control protocol is partly influenced by the fact that AMQP does not have a built-in mechanism for queue creation/deletion/enumeration. Also, the flows cannot progress until they are sure that there is an accepting queue. Finally, if one runs a local broker it should be fine to run multiple nodes without any bridge processes. Therefore, I will leave the queue creation as the node's responsibility. Initially we can continue to use the existing CORE protocol for this. The requirement to initiate a bridge will change from being implicit signalling via server queue detection to being an explicit pub-sub message that requests bridge formation. This doesn't need durability, or acknowledgements, because when a bridge process starts it should request a refresh of the required bridge list. The typical create bridge messages should contain:
    1. The queue name (ideally with the sha256 of the PublicKey, not the whole PublicKey as that may not work on brokers with queue name length constraints).
    2. The expected X500Name for the remote TLS certificate.
    3. The list of host and ports to attempt connection to. See separate section for more info.
5. Once we have the bridge protocol in place and a bridge out of process the broker can move out of process too, which is a requirement for clustering anyway. We can then start work on floating the bridge and making our broker pluggable.
    1. At this point the bridge connection to the local queues should be upgraded to also be an AMQP client, rather than using the CORE protocol, which will give the P2P bridges the ability to work with other broker products.
    2. An independent task is to look at making the Bridge process HA, probably using a hot-warm mastering solution similar to the node's, or atomix.io. The inactive node should track the control messages, but obviously doesn't initiate any bridges.
    3. Another potentially parallel piece of development is to start to build a float, which is essentially just splitting the bridge in two and putting in an intermediate hop AMQP/TLS link. The thin proxy in the DMZ zone should be as stateless as possible in this.
    4. Finally, the node should use AMQP to talk to its local broker cluster. This will have to remain partly tied to Artemis, as queue creation will require sending management messages to the Artemis core, but we should be able to abstract this.
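
To make the three fields of a create bridge message concrete, here is a minimal sketch of such a message with a round-trip encoding. The `BridgeRequest` class shape, the JSON encoding and the key names are all assumptions chosen for readability; the real protocol would presumably carry the same fields in an AMQP message body.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BridgeRequest:
    queue_name: str                        # local queue to drain, named by key hash
    expected_x500_name: str                # must match the remote TLS certificate
    targets: Tuple[Tuple[str, int], ...]   # (host, port) pairs to try in order

    def to_json(self) -> str:
        return json.dumps({
            "queueName": self.queue_name,
            "expectedX500Name": self.expected_x500_name,
            "targets": [{"host": h, "port": p} for h, p in self.targets],
        })

    @staticmethod
    def from_json(text: str) -> "BridgeRequest":
        raw = json.loads(text)
        return BridgeRequest(
            queue_name=raw["queueName"],
            expected_x500_name=raw["expectedX500Name"],
            targets=tuple((t["host"], int(t["port"])) for t in raw["targets"]),
        )
```

Since the message carries everything needed to form the link, a bridge that missed it can recover the same information from a later refresh, which is why durability is not required.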
## Float evolution
### In-Process AMQP Bridging
![In-Process AMQP Bridging](./in-process-amqp-bridging.png)
1. In this phase of evolution we hook the same bridge creation code as before and use the same in-process data access to network map cache.
2. However, we now implement AMQP sender clients using proton-j, with netty for the TLS layer and connection retry.
3. This will also involve formalising the AMQP packet format of the Corda P2P protocol.
4. Once a bridge makes a successful link to a remote node's Artemis broker it will subscribe to the associated local queue.
5. The messages will be picked up from the local broker via an Artemis CORE consumer for simplicity of initial implementation.
6. The queue consumer should be implemented with a simple generic interface as façade, to allow future replacement.
7. The message will be sent across the AMQP protocol directly to the remote Artemis broker.
8. Once acknowledgement of receipt is given with an AMQP Delivery notification the queue consumption will be acknowledged.
9. This will remove the original item from the source queue.
10. If delivery fails due to link loss the subscriber should be closed until a new link is established to ensure messages are not consumed.
11. If delivery fails for other reasons there should be some form of periodic retry over the AMQP link.
12. For authentication checks the client cert returned from the remote server will be checked and the link dropped if it doesn't match expectations.
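
The acknowledgement ordering in steps 7 to 11 is the important part: a message leaves the source queue only after the remote side has confirmed receipt. A minimal sketch, assuming a hypothetical `send` callable that returns `True` on a remote acknowledgement, returns `False` on any other failure, and raises `LinkLost` when the link drops:

```python
from collections import deque
from typing import Callable, Deque


class LinkLost(Exception):
    """Raised by the (hypothetical) sender when the AMQP link has dropped."""


def pump(local_queue: Deque[bytes], send: Callable[[bytes], bool]) -> int:
    """Forward queued messages in order; return how many were acknowledged."""
    forwarded = 0
    while local_queue:
        message = local_queue[0]       # peek only: nothing is consumed yet
        try:
            accepted = send(message)
        except LinkLost:
            break                      # step 10: stop consuming until relinked
        if not accepted:
            break                      # step 11: leave in place for a later retry
        local_queue.popleft()          # steps 8-9: ack, removing it from the source
        forwarded += 1
    return forwarded
```

Stopping at the first failure, rather than skipping ahead, preserves message ordering on the queue.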
### Out of process Artemis Broker and Bridges
![Out of process Artemis Broker and Bridges](./out-of-proc-artemis-broker-bridges.png)
1. Move the Artemis broker and bridge formation logic out of the node. This requires formalising the bridge creation requests, but allows clustered brokers, standardised AMQP usage and ultimately pluggable brokers.
2. We should implement a netty socket server on the bridge and forward authenticated packets to the local Artemis broker inbound queues. An AMQP server socket is required for the float, although it should be transparent whether a NodeInfo refers to a bridge socket address, or an Artemis broker.
3. The queue names should use the sha-256 of the PublicKey not the full key. Also, the name should be used for in and out queues, so that multiple distinct nodes can coexist on the same broker. This will simplify development as developers just run a background broker and shouldn't need to restart it.
4. To export the network map information and to initiate bridges a non-durable bridge control protocol will be needed (in blue). Essentially the messages declare the local queue names and target TLS link information. For in-bound messages only messages for known inbox targets will be acknowledged.
5. It should not be hard to make the bridges active-passive HA as they contain no persisted message state and simple RPC can resync the state of the bridge.
6. Queue creation will remain with the node as this must use non-AMQP mechanisms and because flows should be able to queue sent messages even if the bridge is temporarily down.
7. In parallel work can start to upgrade the local links to Artemis (i.e. the node-Artemis link and the Bridge Manager-Artemis link) to be AMQP clients as much as possible.
### Full float implementation
As described in the 'Target Solution' section, above.
