diff --git a/docs/source/design/certificate-hierarchies/design.md b/docs/source/design/certificate-hierarchies/design.md
index 3a02026ab0..b4a99e395e 100644
--- a/docs/source/design/certificate-hierarchies/design.md
+++ b/docs/source/design/certificate-hierarchies/design.md
@@ -53,8 +53,13 @@ context.
 
 ## Design Decisions
 
-* [Hierarchy levels](./decisions/levels.html). Option 1 - 2-level hierarchy.
-* [TLS trust root](./decisions/tls-trust-root.html). Option 1 - Single trust root.
+```eval_rst
+.. toctree::
+   :maxdepth: 2
+
+   decisions/levels.md
+   decisions/tls-trust-root.md
+```
 
 ## **Target** Solution
 
diff --git a/docs/source/design/failure-detection-master-election/design.md b/docs/source/design/failure-detection-master-election/design.md
index f8e3f711ca..c9f2da8fae 100644
--- a/docs/source/design/failure-detection-master-election/design.md
+++ b/docs/source/design/failure-detection-master-election/design.md
@@ -36,6 +36,16 @@ as possible.
 
 It would also be helpful for the chosen solution to not add deployment complexity.
 
+## Design decisions
+
+```eval_rst
+.. toctree::
+   :maxdepth: 2
+
+   drb-meeting-20180131.md
+
+```
+
 ## Proposed solutions
 
 Based on what is needed for Hot-Warm, 1 active node and at least one passive node (started but in stand-by mode), and
@@ -110,4 +120,4 @@ that it doesn't suit our needs because:
 Our preference would be Zookeeper despite not being as lightweight and deployment-friendly as Atomix. The wide spread
 use, proper documentation and flexibility to use it not only for automatic failover and master election but also
-configuration management(something we might consider moving forward) makes it a better fit for our needs.
+configuration management (something we might consider moving forward) makes it a better fit for our needs.
\ No newline at end of file
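The preference above rests on Zookeeper's leader-election support. As a minimal sketch of the hot-warm pattern, assuming Apache Curator as the client wrapper (the minutes below note that the wrapper library choice still requires analysis), a leader latch could be wired up as follows; the connect string, latch path and instance id are hypothetical:

```kotlin
import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.framework.recipes.leader.LeaderLatch
import org.apache.curator.framework.recipes.leader.LeaderLatchListener
import org.apache.curator.retry.ExponentialBackoffRetry

fun main() {
    // One Zookeeper ensemble can serve many nodes; the connect string lists its members.
    val client = CuratorFrameworkFactory.newClient(
            "zk1:2181,zk2:2181,zk3:2181", ExponentialBackoffRetry(1000, 3))
    client.start()

    // The hot and warm instances of one logical node contend for the same latch
    // path; Zookeeper guarantees at most one of them holds leadership at a time.
    val latch = LeaderLatch(client, "/corda/leaders/bank-a-node", "instance-1")
    latch.addListener(object : LeaderLatchListener {
        override fun isLeader() { /* become master: activate flows and messaging */ }
        override fun notLeader() { /* passivate: return to stand-by mode */ }
    })
    latch.start()
}
```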
diff --git a/docs/source/design/failure-detection-master-election/drb-meeting-20180131.md b/docs/source/design/failure-detection-master-election/drb-meeting-20180131.md
index 47c0e0707c..fbe57f2b6d 100644
--- a/docs/source/design/failure-detection-master-election/drb-meeting-20180131.md
+++ b/docs/source/design/failure-detection-master-election/drb-meeting-20180131.md
@@ -1,8 +1,4 @@
-![Corda](https://www.corda.net/wp-content/uploads/2016/11/fg005_corda_b.png)
-
---------------------------------------------
-Design Review Board Meeting Minutes
-============================================
+# Design Review Board Meeting Minutes
 
 **Date / Time:** Jan 31 2018, 11.00
 
@@ -45,33 +41,54 @@ MN presented a high level summary of the options:
 
 Wrapper library choice for Zookeeper requires some analysis
 
-MH: predictable source of API for RAFT implementations and Zookeeper compared to Atomix. Be better to have master selector implemented as an abstraction
+MH: predictable source of API for RAFT implementations and Zookeeper compared to Atomix. Be better to have master
+selector implemented as an abstraction
 
-MH: hybrid approach possible - 3rd node for oversight, i.e. 2 embedded in the node, 3rd is an observer. Zookeeper can have one node in primary data centre, one in secondary data centre and 3rd as tie-breaker
+MH: hybrid approach possible - 3rd node for oversight, i.e. 2 embedded in the node, 3rd is an observer. Zookeeper can
+have one node in primary data centre, one in secondary data centre and 3rd as tie-breaker
 
-WN: why are we concerned about cost of 3 machines? MN: we're seeing / hearing clients wanting to run many nodes on one VM. Zookepper is good for this since 1 Zookepper cluster can serve 100+ nodes
+WN: why are we concerned about cost of 3 machines? MN: we're seeing / hearing clients wanting to run many nodes on one
+VM. Zookeeper is good for this since 1 Zookeeper cluster can serve 100+ nodes
 
-MH: terminology clarification required: what holds the master lock? Ideally would be good to see design thinking around split node and which bits need HA. MB: as a long term vision, ideally have 1 database for many IDs and the flows for those IDs are load balanced. Regarding services internally to node being suspended, this is being investigated.
+MH: terminology clarification required: what holds the master lock? Ideally would be good to see design thinking around
+split node and which bits need HA. MB: as a long term vision, ideally have 1 database for many IDs and the flows for
+those IDs are load balanced. Regarding services internally to node being suspended, this is being investigated.
 
-MH: regarding auto failover, in the event a database has its own perception of master and slave, how is this handled? Failure detector will need to grow or have local only schedule to confirm it is processing everything including connectivity between database and bus, i.e. implement a 'healthiness' concept
+MH: regarding auto failover, in the event a database has its own perception of master and slave, how is this handled?
+Failure detector will need to grow or have local only schedule to confirm it is processing everything including
+connectivity between database and bus, i.e. implement a 'healthiness' concept
 
-MH: can you get into a situation where the node fails over but the database does not, but database traffic continues to be sent to down node? MB: database will go offline leading to an all-stop event.
+MH: can you get into a situation where the node fails over but the database does not, but database traffic continues to
+be sent to down node? MB: database will go offline leading to an all-stop event.
 
-MH: can you have master affinity between node and database? MH: need watchdog / heartbeat solutions to confirm state of all components
+MH: can you have master affinity between node and database? MH: need watchdog / heartbeat solutions to confirm state of
+all components
 
-JC: how long will this solution live? MB: will work for hot / hot flow sharding, multiple flow workers and soft locks, then this is long term solution. Service abstraction will be used so we are not wedded to Zookeeper however the abstraction work can be done later
+JC: how long will this solution live? MB: will work for hot / hot flow sharding, multiple flow workers and soft locks,
+then this is long term solution. Service abstraction will be used so we are not wedded to Zookeeper however the
+abstraction work can be done later
 
-JC: does the implementation with Zookeeper have an impact on whether cloud or physical deployments are used? MB: its an internal component, not part of the larger Corda network therefore can be either. For the customer they will have to deploy a separate Zookeeper solution, but this is the same for Atomix.
+JC: does the implementation with Zookeeper have an impact on whether cloud or physical deployments are used? MB: its an
+internal component, not part of the larger Corda network therefore can be either. For the customer they will have to
+deploy a separate Zookeeper solution, but this is the same for Atomix.
 
-WN: where Corda as a service is being deployed with many nodes in the cloud. Zookeeper will be better suited to big providers.
+WN: where Corda as a service is being deployed with many nodes in the cloud. Zookeeper will be better suited to big
+providers.
 
-WN: concern is the customer expects to get everything on a plate, therefore will need to be educated on how to implement Zookeeper, but this is the same for other master selection solutions.
+WN: concern is the customer expects to get everything on a plate, therefore will need to be educated on how to implement
+Zookeeper, but this is the same for other master selection solutions.
 
-JC: is it possible to launch R3 Corda with a button on Azure marketplace to commission a Zookeeper? Yes, if we can resource it. But expectation is Zookeeper will be used by well-informed clients / implementers so one-click option is less relevant.
+JC: is it possible to launch R3 Corda with a button on Azure marketplace to commission a Zookeeper? Yes, if we can
+resource it. But expectation is Zookeeper will be used by well-informed clients / implementers so one-click option is
+less relevant.
 
-MH: how does failover work with HSMs? MB: can replicate realm so failover is trivial
+MH: how does failover work with HSMs?
+
+MN: can replicate realm so failover is trivial
 
-JC: how do we document Enterprise features? Publish design docs? Enterprise fact sheets? R3 Corda marketing material? Clear seperation of documentation is required. GT: this is already achieved by havind docs.corda.net for open source Corda and docs.corda.r3.com for enterprise R3 Corda
+JC: how do we document Enterprise features? Publish design docs? Enterprise fact sheets? R3 Corda marketing material?
+Clear separation of documentation is required. GT: this is already achieved by having docs.corda.net for open source
+Corda and docs.corda.r3.com for enterprise R3 Corda
 
 ### Next Steps
 
diff --git a/docs/source/design/float/decisions/drb-meeting-20171116.md b/docs/source/design/float/decisions/drb-meeting-20171116.md
index 0aa6742adf..556da216d2 100644
--- a/docs/source/design/float/decisions/drb-meeting-20171116.md
+++ b/docs/source/design/float/decisions/drb-meeting-20171116.md
@@ -1,13 +1,7 @@
-![Corda](https://www.corda.net/wp-content/uploads/2016/11/fg005_corda_b.png)
-
---------------------------------------------
-Design Review Board Meeting Minutes
-============================================
+# Design Review Board Meeting Minutes
 
 **Date / Time:** 16/11/2017, 14:00
 
-
-
 ## Attendees
 
 - Mark Oldfield (MO)
@@ -24,9 +18,7 @@ Design Review Board Meeting Minutes
 - Jonathan Sartin (JS)
 - David Lee (DL)
 
-
-
-## **Minutes**
+## Minutes
 
 MO opened the meeting, outlining the agenda and meeting review process, and clarifying that consensus on each design
 decision would be sought from RGB, JC and MH.
@@ -90,7 +82,7 @@ MN highlighted the link to AMQP serialisation work being done.
 
 **DECISION CONFIRMED:** Add placeholder, subject to more detailed design proposal (RGB, JC, MH agreed)
 
-### **[AMQP vs. custom protocol](./p2p-protocol.md) **
+### [AMQP vs. custom protocol](./p2p-protocol.md)
 
 MN described alternative options involving onion-routing etc.
 
@@ -110,7 +102,7 @@ RGB queried whether full AMQP implementation should be done in this phase. MN pr
 
 **DECISION CONFIRMED:** Continue to use AMQP (RGB, JC, MH agreed)
 
-### [Pluggable broker prioritisation](./pluggable-broker.md) 
+### [Pluggable broker prioritisation](./pluggable-broker.md)
 
 MN outlined arguments for deferring pluggable brokers, whilst describing how he’d go about implementing the functionality. MH agreed with prioritisation for later.
 
@@ -124,7 +116,7 @@ AB noted Solace have functionality with conceptual similarities to the float, an
 
 **DECISION CONFIRMED:** Defer support for pluggable brokers until later, except in the event that a requirement to do so emerges from higher priority float / HA work. (RGB, JC, MH agreed)
 
-### **Inbound only vs. inbound & outbound connections**
+### Inbound only vs. inbound & outbound connections
 
 DL sought confirmation that the group was happy with the float to act as a Listener only. MN repeated the explanation of how outbound connections would be initiated through a SOCKS 4/5 proxy. No objections were raised.
 
diff --git a/docs/source/design/float/decisions/e2e-encryption.md b/docs/source/design/float/decisions/e2e-encryption.md
index 9677c57fcd..09c217630d 100644
--- a/docs/source/design/float/decisions/e2e-encryption.md
+++ b/docs/source/design/float/decisions/e2e-encryption.md
@@ -1,15 +1,9 @@
-![Corda](https://www.corda.net/wp-content/uploads/2016/11/fg005_corda_b.png)
-
---------------------------------------------
-Design Decision: End-to-end encryption
-============================================
+# Design Decision: End-to-end encryption
 
 ## Background / Context
 
 End-to-end encryption is a desirable potential design feature for the [float](../design.md).
 
-
-
 ## Options Analysis
 
 ### 1. No end-to-end encryption
 
diff --git a/docs/source/design/float/decisions/p2p-protocol.md b/docs/source/design/float/decisions/p2p-protocol.md
index fb0f2bc6e9..edc5f35f18 100644
--- a/docs/source/design/float/decisions/p2p-protocol.md
+++ b/docs/source/design/float/decisions/p2p-protocol.md
@@ -1,8 +1,4 @@
-![Corda](https://www.corda.net/wp-content/uploads/2016/11/fg005_corda_b.png)
-
---------------------------------------------
-Design Decision: P2P Messaging Protocol
-============================================
+# Design Decision: P2P Messaging Protocol
 
 ## Background / Context
 
@@ -10,8 +6,6 @@ Corda requires messages to be exchanged between nodes via a well-defined protoco
 Determining this protocol is a critical upstream dependency for the design of key messaging components including the
 [float](../design.md).
 
-
-
 ## Options Analysis
 
 ### 1. Use AMQP
@@ -54,16 +48,10 @@ Point to point links would be standard TLS and the network certificates would be
 2. Effort implications - starting from scratch
 3. Technical complexity in developing a P2P protocols which is attack tolerant.
 
-
-
-
-
 ## Recommendation and justification
 
 Proceed with Option 1
 
-
-
 ## Decision taken
 
 [DNB Meeting, 16/11/2017](./drb-meeting-20171116.md): Proceed with Option 1 - Continue to use AMQP (RGB, JC, MH agreed)
 
diff --git a/docs/source/design/float/decisions/pluggable-broker.md b/docs/source/design/float/decisions/pluggable-broker.md
index 9ecd8039be..d4f0a8edcd 100644
--- a/docs/source/design/float/decisions/pluggable-broker.md
+++ b/docs/source/design/float/decisions/pluggable-broker.md
@@ -1,14 +1,9 @@
-![Corda](https://www.corda.net/wp-content/uploads/2016/11/fg005_corda_b.png)
-
---------------------------------------------
-Design Decision: Pluggable Broker prioritisation
-============================================
+# Design Decision: Pluggable Broker prioritisation
 
 ## Background / Context
 
-A decision on when to prioritise implementation of a pluggable broker has implications for delivery of key messaging components including the [float](../design.md).
-
-
+A decision on when to prioritise implementation of a pluggable broker has implications for delivery of key messaging
+components including the [float](../design.md).
 
 ## Options Analysis
 
@@ -58,8 +53,12 @@ A decision on when to prioritise implementation of a pluggable broker has implic
 
 Proceed with Option 2 (defer development of pluggable brokers until later)
 
-
-
 ## Decision taken
 
-[DNB Meeting, 16/11/2017](./drb-meeting-20171116.md): Proceed with Option 2- Defer support for pluggable brokers until later, except in the event that a requirement to do so emerges from higher priority float / HA work. (RGB, JC, MH agreed)
+```eval_rst
+.. toctree::
+
+   drb-meeting-20171116.md
+```
+
+Proceed with Option 2 - Defer support for pluggable brokers until later, except in the event that a requirement to do so emerges from higher priority float / HA work. (RGB, JC, MH agreed)
 
diff --git a/docs/source/design/float/decisions/ssl-termination.md b/docs/source/design/float/decisions/ssl-termination.md
index 59b66fcb99..b42dd82111 100644
--- a/docs/source/design/float/decisions/ssl-termination.md
+++ b/docs/source/design/float/decisions/ssl-termination.md
@@ -1,21 +1,14 @@
-![Corda](https://www.corda.net/wp-content/uploads/2016/11/fg005_corda_b.png)
-
---------------------------------------------
-Design Decision: TLS termination point
-============================================
+# Design Decision: TLS termination point
 
 ## Background / Context
 
-Design of the [float](../design.md) is critically influenced by the decision of where TLS connections to the node should be terminated.
-
-
+Design of the [float](../design.md) is critically influenced by the decision of where TLS connections to the node should
+be terminated.
 
 ## Options Analysis
 
 ### 1. Terminate TLS on Firewall
 
-
-
 #### Advantages
 
 1. Common practice for DMZ web solutions, often with an HSM associated with the Firewall and should be familiar for banks to setup.
@@ -39,11 +32,8 @@ Design of the [float](../design.md) is critically influenced by the decision of
 
 ##### Disadvantages
 
 1. More work than the do-nothing approach
-
 2. More protocol to design for sending across the inner firewall.
-
- ​
-
 ### 2. Direct TLS Termination onto Float
 
 #### Advantages
@@ -96,8 +86,6 @@ Design of the [float](../design.md) is critically influenced by the decision of
 
 Proceed with Variant option 1a: Terminate on firewall; include SASL connection checking.
 
-
-
 ## Decision taken
 
 [DNB Meeting, 16/11/2017](./drb-meeting-20171116.md): Proceed with option 2b - Terminate on float, inject key from internal portion of the float (RGB, JC, MH agreed)
 
diff --git a/docs/source/design/float/design.md b/docs/source/design/float/design.md
index 52255b521c..2be1d52b1e 100644
--- a/docs/source/design/float/design.md
+++ b/docs/source/design/float/design.md
@@ -1,32 +1,16 @@
-![Corda](https://www.corda.net/wp-content/uploads/2016/11/fg005_corda_b.png)
-
 # Float Design
 
---------------------------------------------
-DOCUMENT MANAGEMENT
-============================================
-
-## Document Control
-
-* Title: Float Design
-* Date: 13th November 2017
-* Author: Matthew Nesbit
-* Distribution: Design Review Board, Product Management, Services - Technical (Consulting), Platform Delivery
-* Corda target version: Enterprise
-
-## Document Sign-off
-
-* Author: David Lee
-* Reviewers(s): TBD
-* Final approver(s): TBD
-
-## Document History
-
-# HIGH LEVEL DESIGN
+```eval_rst
+.. important:: This design document describes a feature of Corda Enterprise.
+```
 
 ## Overview
 
-The role of the 'float' is to meet the requirements of organisations that will not allow direct incoming connections to their node, but would rather host a proxy component in a DMZ to achieve this. As such it needs to meet the requirements of modern DMZ security rules, which essentially assume that the entire machine in the DMZ may become compromised. At the same time, we expect that the Float can interoperate with directly connected nodes, possibly even those using open source Corda.
+The role of the 'float' is to meet the requirements of organisations that will not allow direct incoming connections to
+their node, but would rather host a proxy component in a DMZ to achieve this. As such it needs to meet the requirements
+of modern DMZ security rules, which essentially assume that the entire machine in the DMZ may become compromised. At
+the same time, we expect that the Float can interoperate with directly connected nodes, possibly even those using open
+source Corda.
 
 ### Background
 
@@ -36,7 +20,8 @@ The diagram below illustrates the current mechanism for peer-to-peer messaging b
 
 ![Current P2P State](./current-p2p-state.png)
 
-When a flow running on a Corda node triggers a requirement to send a message to a peer node, it first checks for pre-existence of an applicable message queue for that peer.
+When a flow running on a Corda node triggers a requirement to send a message to a peer node, it first checks for
+pre-existence of an applicable message queue for that peer.
 
 **If the relevant queue exists:**
 
@@ -69,49 +54,78 @@ Allow connectivity in compliance with DMZ constraints commonly imposed by modern
 
 2. Data passing from the internet and the internal network via the DMZ should pass through a clear protocol break in the DMZ.
 3. Only identified IPs and ports are permitted to access devices in the DMZ; this includes communications between devices colocated in the DMZ.
 4. Only a limited number of ports are opened in the firewall (<5) to make firewall operation manageable. These ports must change slowly.
-5. Any DMZ machine is typically multi-homed, with separate network cards handling traffic through the institutional firewall vs. to the Internet. (There is usually a further hidden management interface card accessed via a jump box for managing the box and shipping audit trail information). This requires that our software can bind listening ports to the correct network card not just to 0.0.0.0.
-6. No connections to be initiated by DMZ devices towards the internal network. Communications should be initiated from the internal network to form a bidirectional channel with the proxy process.
+5. Any DMZ machine is typically multi-homed, with separate network cards handling traffic through the institutional
+   firewall vs. to the Internet. (There is usually a further hidden management interface card accessed via a jump box for
+   managing the box and shipping audit trail information). This requires that our software can bind listening ports to the
+   correct network card not just to 0.0.0.0 (see the sketch after this list).
+6. No connections to be initiated by DMZ devices towards the internal network. Communications should be initiated from
+   the internal network to form a bidirectional channel with the proxy process.
 7. No business data should be persisted on the DMZ box.
-8. An audit log of all connection events is required to track breaches. Latency information should also be tracked to facilitate management of connectivity issues.
-9. Processes on DMZ devices run as local accounts with no relationship to internal permission systems, or ability to enumerate devices on the internal network.
+8. An audit log of all connection events is required to track breaches. Latency information should also be tracked to
+   facilitate management of connectivity issues.
+9. Processes on DMZ devices run as local accounts with no relationship to internal permission systems, or ability to
+   enumerate devices on the internal network.
 10. Communications in the DMZ should use modern TLS, often with local-only certificates/keys that hold no value outside of use in predefined links.
 11. Where TLS is required to terminate on the firewall, provide a suitably secure key management mechanism (e.g. an HSM).
 12. Any proxy in the DMZ should be subject to the same HA requirements as the devices it is servicing
-13. Any business data passing through the proxy should be separately encrypted, so that no data is in the clear of the program memory if the DMZ box is compromised.
+13. Any business data passing through the proxy should be separately encrypted, so that no data is in the clear of the
+    program memory if the DMZ box is compromised.
 
 ## Design Decisions
 
-The following design decisions are assumed by this design:
+The following design decisions fed into this design:
 
-1. [AMQP vs. custom P2P](./decisions/p2p-protocol.md): Use AMQP
-2. [SSL termination (firewall vs. float)](./decisions/ssl-termination.md): Terminate on firewall; include SASL connection checking
-3. [End-to-end encryption](./decisions/e2e-encryption.md): Include placeholder only
-4. [Prioritisation of pluggable broker support](./decisions/pluggable-broker.md): Defer pluggable brokers until later
+```eval_rst
+.. toctree::
+   :maxdepth: 2
+
+   decisions/p2p-protocol.md
+   decisions/ssl-termination.md
+   decisions/e2e-encryption.md
+   decisions/pluggable-broker.md
+
+```
 
 ## Target Solution
 
-The proposed solution introduces a reverse proxy component ("**float**") which may be sited in the DMZ, as illustrated in the diagram below.
+The proposed solution introduces a reverse proxy component ("**float**") which may be sited in the DMZ, as illustrated
+in the diagram below.
 
 ![Full Float Implementation](./full-float.png)
 
-The main role of the float is to forward incoming AMQP link packets from authenticated TLS links to the AMQP Bridge Manager, then echo back final delivery acknowledgements once the Bridge Manager has successfully inserted the messages. The Bridge Manager is responsible for rejecting inbound packets on queues that are not local inboxes to prevent e.g. 'cheating' messages onto management topics, faking outgoing messages etc.
+The main role of the float is to forward incoming AMQP link packets from authenticated TLS links to the AMQP Bridge
+Manager, then echo back final delivery acknowledgements once the Bridge Manager has successfully inserted the messages.
+The Bridge Manager is responsible for rejecting inbound packets on queues that are not local inboxes to prevent e.g.
+'cheating' messages onto management topics, faking outgoing messages etc.
 
-The float is linked to the internal AMQP Bridge Manager via a single AMQP/TLS connection, which can contain multiple logical AMQP links. This link is initiated at the socket level by the Bridge Manager towards the float.
+The float is linked to the internal AMQP Bridge Manager via a single AMQP/TLS connection, which can contain multiple
+logical AMQP links. This link is initiated at the socket level by the Bridge Manager towards the float.
 
-The float is a **listener only** and does not enable outgoing bridges (see Design Decisions, above). Outgoing bridge formation and message sending come directly from the internal Bridge Manager (possibly via a SOCKS 4/5 proxy, which is easy enough to enable in netty, or directly through the corporate firewall. Initiating from the float gives rise to security concerns.)
+The float is a **listener only** and does not enable outgoing bridges (see Design Decisions, above). Outgoing bridge
+formation and message sending come directly from the internal Bridge Manager (possibly via a SOCKS 4/5 proxy, which is
+easy enough to enable in netty, or directly through the corporate firewall. Initiating from the float gives rise to
+security concerns.)
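The paragraph above notes that outgoing connections could leave via a SOCKS 4/5 proxy, which is "easy enough to enable in netty". A hedged sketch of that wiring, with the proxy handler placed first in the channel pipeline; the proxy host, port and the later handlers are hypothetical:

```kotlin
import io.netty.channel.ChannelInitializer
import io.netty.channel.socket.SocketChannel
import io.netty.handler.proxy.Socks5ProxyHandler
import java.net.InetSocketAddress

// Outbound bridge channels route through the SOCKS5 proxy by putting the proxy
// handler at the front of the pipeline; TLS and AMQP handlers would follow it.
class OutboundBridgeInitializer : ChannelInitializer<SocketChannel>() {
    override fun initChannel(ch: SocketChannel) {
        ch.pipeline().addFirst(Socks5ProxyHandler(InetSocketAddress("socks.internal.example", 1080)))
        // ch.pipeline().addLast(sslHandler, amqpHandler) - omitted in this sketch
    }
}
```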
 
-The float is **not mandatory**; interoperability with older nodes, even those using direct AMQP from bridges in the node, is supported.
+The float is **not mandatory**; interoperability with older nodes, even those using direct AMQP from bridges in the
+node, is supported.
 
 **No state will be serialized on the float**, although suitably protected logs will be recorded of all float activities.
 
-**End-to-end encryption** of the payload is not delivered through this design (see Design Decisions, above). For current purposes, a header field indicating plaintext/encrypted payload is employed as a placeholder.
+**End-to-end encryption** of the payload is not delivered through this design (see Design Decisions, above). For current
+purposes, a header field indicating plaintext/encrypted payload is employed as a placeholder.
 
-**HA** is enabled (this should be easy as the bridge manager can choose which float to make active). Only fully connected DMZ floats should activate their listening port.
+**HA** is enabled (this should be easy as the bridge manager can choose which float to make active). Only fully
+connected DMZ floats should activate their listening port.
 
-Implementation of the float is expected to be based on existing AMQP Bridge Manager code - see Implementation Plan, below, for expected work stages.
+Implementation of the float is expected to be based on existing AMQP Bridge Manager code - see Implementation Plan,
+below, for expected work stages.
 
 ### Bridge control protocol
-The bridge control is designed to be as stateless as possible. Thus, nodes and bridges restarting must re-request/broadcast information to each other. Messages are sent to a 'bridge.control' address in Artemis as non-persistent messages with a non-durable queue. Each message should contain a duplicate message ID, which is also re-used as the correlation id in replies. Relevant scenarios are described below:
+
+The bridge control is designed to be as stateless as possible. Thus, nodes and bridges restarting must
+re-request/broadcast information to each other. Messages are sent to a 'bridge.control' address in Artemis as
+non-persistent messages with a non-durable queue. Each message should contain a duplicate message ID, which is also
+re-used as the correlation id in replies. Relevant scenarios are described below:
 
 #### On bridge start-up, or reconnection to Artemis
 1. The bridge process should subscribe to the 'bridge.control'.
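To make the convention above concrete, this is a sketch of publishing one control message with the Artemis core client: non-durable, addressed to 'bridge.control', with a fresh duplicate-detection id that a reply would echo as its correlation id. The broker URL and payload are hypothetical, and error handling is omitted:

```kotlin
import org.apache.activemq.artemis.api.core.Message
import org.apache.activemq.artemis.api.core.SimpleString
import org.apache.activemq.artemis.api.core.client.ActiveMQClient
import java.util.UUID

fun publishBridgeControl(brokerUrl: String, payload: String) {
    val locator = ActiveMQClient.createServerLocator(brokerUrl)
    val factory = locator.createSessionFactory()
    val session = factory.createSession()
    val producer = session.createProducer("bridge.control")

    // Non-durable message, matching the stateless protocol described above.
    val message = session.createMessage(false)
    val messageId = UUID.randomUUID().toString()
    // The duplicate-detection id doubles as the correlation id on any reply.
    message.putStringProperty(Message.HDR_DUPLICATE_DETECTION_ID, SimpleString(messageId))
    message.bodyBuffer.writeString(payload)

    producer.send(message)
    session.close()
    factory.close()
    locator.close()
}
```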
@@ -137,60 +151,111 @@ The bridge control is designed to be as stateless as possible. Thus, nodes and b
 5. Future QueueSnapshot requests should be responded to with the new queue included in the list.
 
 ### Behaviour with a Float portion in the DMZ
-1. On initial connection of an inbound bridge, AMQP is configured to run a SASL challenge response to (re-)validate the origin and confirm the client identity. (The most likely SASL mechanism for this is using https://tools.ietf.org/html/rfc3163 as this allows reuse of our PKI certificates in the challenge response. Potentially we could forward some bridge control messages to cover the SASL exchange to the internal Bridge Controller. This would allow us to keep the private keys internal to the organisation, so we may also require a SASLAuth message type as part of the bridge control protocol.)
-2. The float restricts acceptable AMQP topics to the name space appropriate for inbound messages only. Hence, there should be no way to tunnel messages to bridge control, or RPC topics on the bus.
-3. On receipt of a message from the external network, the Float should append a header to link the source channel's X500 name, then create a Delivery for forwarding the message inwards.
-4. The internal Bridge Control Manager process validates the message further to ensure that it is targeted at a legitimate inbox (i.e. not an outbound queue) and then forwards it to the bus. Once delivered to the broker, the Delivery acknowledgements are cascaded back.
+
+1. On initial connection of an inbound bridge, AMQP is configured to run a SASL challenge response to (re-)validate the
+   origin and confirm the client identity. (The most likely SASL mechanism for this is using https://tools.ietf.org/html/rfc3163
+   as this allows reuse of our PKI certificates in the challenge response. Potentially we could forward some bridge control
+   messages to cover the SASL exchange to the internal Bridge Controller. This would allow us to keep the private keys
+   internal to the organisation, so we may also require a SASLAuth message type as part of the bridge control protocol.)
+2. The float restricts acceptable AMQP topics to the name space appropriate for inbound messages only. Hence, there
+   should be no way to tunnel messages to bridge control, or RPC topics on the bus.
+3. On receipt of a message from the external network, the Float should append a header to link the source channel's X500
+   name, then create a Delivery for forwarding the message inwards.
+4. The internal Bridge Control Manager process validates the message further to ensure that it is targeted at a legitimate
+   inbox (i.e. not an outbound queue) and then forwards it to the bus. Once delivered to the broker, the Delivery
+   acknowledgements are cascaded back.
 5. On receiving Delivery notification from the internal side, the Float acknowledges back the correlated original Delivery.
 6. The Float should protect against excessive inbound messages by AMQP flow control and refusing to accept excessive unacknowledged deliveries.
-7. The Float only exposes its inbound server socket when activated by a valid AMQP link from the Bridge Control Manager to allow for a simple HA pool of DMZ Float processes. (Floats cannot run hot-hot as this would invalidate Corda's message ordering guarantees.)
+7. The Float only exposes its inbound server socket when activated by a valid AMQP link from the Bridge Control Manager
+   to allow for a simple HA pool of DMZ Float processes. (Floats cannot run hot-hot as this would invalidate Corda's
+   message ordering guarantees.)
 
-
-# IMPLEMENTATION PLAN
-
-## Proposed Incremental Steps Towards a Float
-1. First, I would like to more explicitly split the RPC and P2P MessagingService instances inside the Node. They can keep the same interface, but this would let us develop P2P and RPC at different rates if required.
-2. The current in-node design with Artemis Core bridges should first be replaced with an equivalent piece of code that initiates send only bridges using an in-house wrapper over the proton-j library. Thus, the current Artemis message objects will be picked up from existing queues using the CORE protocol via an abstraction interface to allow later pluggable replacement. The specific subscribed queues are controlled as before and bridges started by the existing code path. The only difference is the bridges will be the new AMQP client code. The remote Artemis broker should accept transferred packets directly onto its own inbox queue and acknowledge receipt via standard AMQP Delivery notifications. This in turn will be acknowledged back to the Artemis Subscriber to permanently remove the message from the source Artemis queue. The headers for deduplication, address names, etc will need to be mapped to the AMQP messages and we will have to take care about the message payload. This should be an envelope that is capable in the future of being end-to-end encrypted. Where possible we should stay close to the current Artemis mappings.
-3. We need to define a bridge control protocol, so that we can have an out of process float/bridge. The current process is that on message send the node checks the target address to see if the target queue already exists. If the queue doesn't exist it creates a new queue which includes an encoding of the PublicKey in its name. This is picked up by a wrapper around the Artemis Server which is also hosted inside the node and can ask the network map cache for a translation to a target host and port. This in turn allows a new bridge to be provisioned. At node restart the re-population of the network map cache is followed to re-create the bridges to any unsent queues/messages.
-4. My proposal for a bridge control protocol is partly influenced by the fact that AMQP does not have a built-in mechanism for queue creation/deletion/enumeration. Also, the flows cannot progress until they are sure that there is an accepting queue. Finally, if one runs a local broker it should be fine to run multiple nodes without any bridge processes. Therefore, I will leave the queue creation as the node's responsibility. Initially we can continue to use the existing CORE protocol for this. The requirement to initiate a bridge will change from being implicit signalling via server queue detection to being an explicit pub-sub message that requests bridge formation. This doesn't need durability, or acknowledgements, because when a bridge process starts it should request a refresh of the required bridge list. The typical create bridge messages should contain:
- 1. The queue name (ideally with the sha256 of the PublicKey, not the whole PublicKey as that may not work on brokers with queue name length constraints).
- 2. The expected X500Name for the remote TLS certificate.
- 3. The list of host and ports to attempt connection to. See separate section for more info.
-5. Once we have the bridge protocol in place and a bridge out of process the broker can move out of process too, which is a requirement for clustering anyway. We can then start work on floating the bridge and making our broker pluggable.
- 1. At this point the bridge connection to the local queues should be upgraded to also be AMQP client, rather than CORE protocol, which will give the ability for the P2P bridges to work with other broker products.
- 2. An independent task is to look at making the Bridge process HA, probably using a similar hot-warm mastering solution as the node, or atomix.io. The inactive node should track the control messages, but obviously doesn't initiate any bridges.
- 3. Another potentially parallel piece of development is to start to build a float, which is essentially just splitting the bridge in two and putting in an intermediate hop AMQP/TLS link. The thin proxy in the DMZ zone should be as stateless as possible in this.
- 4. Finally, the node should use AMQP to talk to its local broker cluster, but this will have to remain partly tied to Artemis, as queue creation will require sending management messages to the Artemis core, but we should be able to abstract this. Bridge Management Protocol.
+## Implementation plan
+
+### Proposed incremental steps towards a float
+
+1. First, I would like to more explicitly split the RPC and P2P MessagingService instances inside the Node. They can
+   keep the same interface, but this would let us develop P2P and RPC at different rates if required.
+
+2. The current in-node design with Artemis Core bridges should first be replaced with an equivalent piece of code that
+   initiates send only bridges using an in-house wrapper over the proton-j library. Thus, the current Artemis message
+   objects will be picked up from existing queues using the CORE protocol via an abstraction interface to allow later
+   pluggable replacement. The specific subscribed queues are controlled as before and bridges started by the existing code
+   path. The only difference is the bridges will be the new AMQP client code. The remote Artemis broker should accept
+   transferred packets directly onto its own inbox queue and acknowledge receipt via standard AMQP Delivery notifications.
+   This in turn will be acknowledged back to the Artemis Subscriber to permanently remove the message from the source
+   Artemis queue. The headers for deduplication, address names, etc will need to be mapped to the AMQP messages and we will
+   have to take care about the message payload. This should be an envelope that is capable in the future of being
+   end-to-end encrypted. Where possible we should stay close to the current Artemis mappings.
+
+3. We need to define a bridge control protocol, so that we can have an out of process float/bridge. The current process
+   is that on message send the node checks the target address to see if the target queue already exists. If the queue
+   doesn't exist it creates a new queue which includes an encoding of the PublicKey in its name. This is picked up by a
+   wrapper around the Artemis Server which is also hosted inside the node and can ask the network map cache for a
+   translation to a target host and port. This in turn allows a new bridge to be provisioned. At node restart the
+   re-population of the network map cache is followed to re-create the bridges to any unsent queues/messages.
+
+4. My proposal for a bridge control protocol is partly influenced by the fact that AMQP does not have a built-in
+   mechanism for queue creation/deletion/enumeration. Also, the flows cannot progress until they are sure that there is an
+   accepting queue. Finally, if one runs a local broker it should be fine to run multiple nodes without any bridge
+   processes. Therefore, I will leave the queue creation as the node's responsibility. Initially we can continue to use the
+   existing CORE protocol for this. The requirement to initiate a bridge will change from being implicit signalling via
+   server queue detection to being an explicit pub-sub message that requests bridge formation. This doesn't need
+   durability, or acknowledgements, because when a bridge process starts it should request a refresh of the required bridge
+   list. The typical create bridge messages should contain (see the sketch after this list):
+
+   1. The queue name (ideally with the sha256 of the PublicKey, not the whole PublicKey as that may not work on brokers with queue name length constraints).
+   2. The expected X500Name for the remote TLS certificate.
+   3. The list of host and ports to attempt connection to. See separate section for more info.
+
+5. Once we have the bridge protocol in place and a bridge out of process the broker can move out of process too, which
+   is a requirement for clustering anyway. We can then start work on floating the bridge and making our broker pluggable.
+
+   1. At this point the bridge connection to the local queues should be upgraded to also be AMQP client, rather than CORE
+      protocol, which will give the ability for the P2P bridges to work with other broker products.
+   2. An independent task is to look at making the Bridge process HA, probably using a similar hot-warm mastering solution
+      as the node, or atomix.io. The inactive node should track the control messages, but obviously doesn't initiate any
+      bridges.
+   3. Another potentially parallel piece of development is to start to build a float, which is essentially just splitting
+      the bridge in two and putting in an intermediate hop AMQP/TLS link. The thin proxy in the DMZ zone should be as
+      stateless as possible in this.
+   4. Finally, the node should use AMQP to talk to its local broker cluster, but this will have to remain partly tied
+      to Artemis, as queue creation will require sending management messages to the Artemis core, but we should be
+      able to abstract this.
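A sketch of the shape such a create-bridge message might take, together with a helper deriving the queue name from the SHA-256 of the inbox PublicKey as item 4 suggests; the class name, fields and queue-name prefix are hypothetical rather than a settled wire format:

```kotlin
import java.security.MessageDigest
import java.security.PublicKey
import java.util.Base64

// Hypothetical shape for the 'create bridge' control message from item 4.
data class CreateBridgeRequest(
    val queueName: String,             // embeds a SHA-256 of the inbox PublicKey, not the key itself
    val expectedX500Name: String,      // subject expected on the remote TLS certificate
    val targetEndpoints: List<String>  // host:port candidates to try, in order
)

fun queueNameFor(peerKey: PublicKey): String {
    val digest = MessageDigest.getInstance("SHA-256").digest(peerKey.encoded)
    // Base64-url keeps the name short enough for brokers with queue-name length limits.
    return "peers.inbox." + Base64.getUrlEncoder().withoutPadding().encodeToString(digest)
}
```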
 
-## Float evolution
+### Float evolution
 
-### In-Process AMQP Bridging
+#### In-Process AMQP Bridging
+
 ![In-Process AMQP Bridging](./in-process-amqp-bridging.png)
 
-1. In this phase of evolution we hook the same bridge creation code as before and use the same in-process data access to network map cache.
-2. However, we now implement AMQP sender clients using proton-j and netty for TLS layer and connection retry.
-3. This will also involve formalising the AMQP packet format of the Corda P2P protocol.
-4. Once a bridge makes a successful link to a remote node's Artemis broker it will subscribe to the associated local queue.
-5. The messages will be picked up from the local broker via an Artemis CORE consumer for simplicity of initial implementation.
-6. The queue consumer should be implemented with a simple generic interface as façade, to allow future replacement.
-7. The message will be sent across the AMQP protocol directly to the remote Artemis broker.
-8. Once acknowledgement of receipt is given with an AMQP Delivery notification the queue consumption will be acknowledged.
-9. This will remove the original item from the source queue.
-10. If delivery fails due to link loss the subscriber should be closed until a new link is established to ensure messages are not consumed.
-11. If delivery fails for other reasons there should be some for of periodic retry over the AMQP link.
-12. For authentication checks the client cert returned from the remote server will be checked and the link dropped if it doesn't match expectations.
+In this phase of evolution we hook the same bridge creation code as before and use the same in-process data access to
+network map cache. However, we now implement AMQP sender clients using proton-j and netty for TLS layer and connection
+retry. This will also involve formalising the AMQP packet format of the Corda P2P protocol. Once a bridge makes a
+successful link to a remote node's Artemis broker it will subscribe to the associated local queue. The messages will be
+picked up from the local broker via an Artemis CORE consumer for simplicity of initial implementation. The queue
+consumer should be implemented with a simple generic interface as façade, to allow future replacement. The message will
+be sent across the AMQP protocol directly to the remote Artemis broker. Once acknowledgement of receipt is given with an
+AMQP Delivery notification the queue consumption will be acknowledged. This will remove the original item from the
+source queue. If delivery fails due to link loss the subscriber should be closed until a new link is established to
+ensure messages are not consumed. If delivery fails for other reasons there should be some form of periodic retry over
+the AMQP link. For authentication checks the client cert returned from the remote server will be checked and the link
+dropped if it doesn't match expectations.
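As a sketch of the netty side of such a sender client, the TLS bootstrap with a close-and-retry policy on failed connects might look like the following; host, port and retry interval are placeholders, and the proton-j AMQP handler is omitted:

```kotlin
import io.netty.bootstrap.Bootstrap
import io.netty.channel.ChannelInitializer
import io.netty.channel.nio.NioEventLoopGroup
import io.netty.channel.socket.SocketChannel
import io.netty.channel.socket.nio.NioSocketChannel
import io.netty.handler.ssl.SslContextBuilder
import java.util.concurrent.TimeUnit

fun connectBridge(host: String, port: Int, group: NioEventLoopGroup) {
    val sslContext = SslContextBuilder.forClient().build()
    val bootstrap = Bootstrap()
            .group(group)
            .channel(NioSocketChannel::class.java)
            .handler(object : ChannelInitializer<SocketChannel>() {
                override fun initChannel(ch: SocketChannel) {
                    ch.pipeline().addLast(sslContext.newHandler(ch.alloc(), host, port))
                    // The proton-j AMQP transport handler would be added here.
                }
            })
    bootstrap.connect(host, port).addListener { future ->
        if (!future.isSuccess) {
            // While no link exists the local subscriber stays closed so queued
            // messages are not consumed; retry the connection later.
            group.schedule({ connectBridge(host, port, group) }, 10, TimeUnit.SECONDS)
        }
    }
}
```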
 
-### Out of process Artemis Broker and Bridges
+#### Out of process Artemis Broker and Bridges
+
 ![Out of process Artemis Broker and Bridges](./out-of-proc-artemis-broker-bridges.png)
 
-1. Move the Artemis broker and bridge formation logic out of the node. This requires formalising the bridge creation requests, but allows clustered brokers, standardised AMQP usage and ultimately pluggable brokers.
-2. We should implement a netty socket server on the bridge and forward authenticated packets to the local Artemis broker inbound queues. An AMQP server socket is required for the float, although it should be transparent whether a NodeInfo refers to a bridge socket address, or an Artemis broker.
-3. The queue names should use the sha-256 of the PublicKey not the full key. Also, the name should be used for in and out queues, so that multiple distinct nodes can coexist on the same broker. This will simplify development as developers just run a background broker and shouldn't need to restart it.
-4. To export the network map information and to initiate bridges a non-durable bridge control protocol will be needed (in blue). Essentially the messages declare the local queue names and target TLS link information. For in-bound messages only messages for known inbox targets will be acknowledged.
-5. It should not be hard to make the bridges active-passive HA as they contain no persisted message state and simple RPC can resync the state of the bridge.
-6. Queue creation will remain with the node as this must use non-AMQP mechanisms and because flows should be able to queue sent messages even if the bridge is temporarily down.
-7. In parallel work can start to upgrade the local links to Artemis (i.e. the node-Artemis link and the Bridge Manager-Artemis link) to be AMQP clients as much as possible.
-
-### Full float implementation
-As described in the 'Target Solution' section, above.
+Move the Artemis broker and bridge formation logic out of the node. This requires formalising the bridge creation
+requests, but allows clustered brokers, standardised AMQP usage and ultimately pluggable brokers. We should implement a
+netty socket server on the bridge and forward authenticated packets to the local Artemis broker inbound queues. An AMQP
+server socket is required for the float, although it should be transparent whether a NodeInfo refers to a bridge socket
+address, or an Artemis broker. The queue names should use the sha-256 of the PublicKey not the full key. Also, the name
+should be used for in and out queues, so that multiple distinct nodes can coexist on the same broker. This will simplify
+development as developers just run a background broker and shouldn't need to restart it. To export the network map
+information and to initiate bridges a non-durable bridge control protocol will be needed (in blue). Essentially the
+messages declare the local queue names and target TLS link information. For in-bound messages only messages for known
+inbox targets will be acknowledged. It should not be hard to make the bridges active-passive HA as they contain no
+persisted message state and simple RPC can resync the state of the bridge. Queue creation will remain with the node as
+this must use non-AMQP mechanisms and because flows should be able to queue sent messages even if the bridge is
+temporarily down. In parallel work can start to upgrade the local links to Artemis (i.e. the node-Artemis link and the
+Bridge Manager-Artemis link) to be AMQP clients as much as possible.
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 5df612f050..4a810feb63 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -59,6 +59,7 @@ We look forward to seeing what you can do with Corda!
    design/design-review-process.md
    design/certificate-hierarchies/design.md
    design/failure-detection-master-election/design.md
+   design/float/design.md
 
 .. toctree::
    :caption: Participate