1. The node submits the message to the queue and continues after receiving acknowledgement.
2. The Core Bridge picks up the message and transfers it via a TLS socket to the inbox of the destination node.
3. A flow on the recipient node receives the message from the peer and acknowledges consumption on the bus only once the flow has checkpointed this progress.
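The acknowledge-after-checkpoint rule in step 3 can be sketched as follows. This is a minimal illustrative model, not the node's actual messaging code: `InMemoryQueue` and `consumeWithCheckpoint` are hypothetical names, and the real broker interaction goes via Artemis.

```kotlin
// Hypothetical sketch: the consumer only acknowledges the broker once the
// flow has durably recorded (checkpointed) that it processed the message.
class InMemoryQueue<T> {
    private val pending = ArrayDeque<T>()
    fun send(msg: T) { pending.addLast(msg) }
    fun peek(): T? = pending.firstOrNull()
    fun ack() { pending.removeFirst() }   // permanently consume the message
}

fun <T> consumeWithCheckpoint(queue: InMemoryQueue<T>, checkpoint: (T) -> Boolean): T? {
    val msg = queue.peek() ?: return null
    // Acknowledge on the bus only after the flow has checkpointed progress;
    // if checkpointing fails, the message stays queued for redelivery.
    if (checkpoint(msg)) queue.ack()
    return msg
}
```

The key property is that a crash between receipt and checkpoint leaves the message on the queue, so it is redelivered rather than lost.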
**If the queue does not exist (messaging a new peer):**
1. The flow triggers creation of a new queue with a name encoding the identity of the intended recipient.
2. Once queue creation has completed, the node sends the message to the queue.
3. The hosted Artemis server within the node has a queue creation hook which is called.
4. The queue name is used to lookup the remote connection details and a new bridge is registered.
5. The client certificate of the peer is compared to the expected legal identity X500 Name. If it matches, message flow proceeds as for a pre-existing queue (above).
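Steps 1 and 5 above can be sketched in Kotlin. This is an illustrative assumption about the scheme, not the node's actual code: `peerQueueName` hashes the public key bytes with SHA-256 and hex-encodes the result, and `validatePeer` reduces the certificate check to a simplified string comparison of the X500 subject.

```kotlin
import java.security.MessageDigest

// Hypothetical helper for step 1: derive a queue name that encodes the
// intended recipient's identity (SHA-256 of the public key bytes, hex-encoded;
// the exact scheme in the real node may differ).
fun peerQueueName(publicKeyBytes: ByteArray): String {
    val digest = MessageDigest.getInstance("SHA-256").digest(publicKeyBytes)
    return "internal.peers." + digest.joinToString("") { "%02x".format(it) }
}

// Step 5: the bridge only proceeds if the peer certificate's subject matches
// the expected legal identity X500 name (simplified comparison here).
fun validatePeer(expectedX500: String, certificateSubject: String): Boolean =
    expectedX500.equals(certificateSubject, ignoreCase = true)
```

Hashing keeps the queue name at a fixed length, which matters later for brokers with queue-name length constraints.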
* Allow connections to a Corda node without requiring the node itself to have a public IP address. Separate TLS connection handling from the MQ broker.
* Non-goals (out of scope):
  * Support for MQ brokers other than Apache Artemis
## Timeline
For delivery by end Q1 2018.
## Requirements
Allow connectivity in compliance with DMZ constraints commonly imposed by modern financial institutions; namely:
1. Firewalls required between the internet and any device in the DMZ, and between the DMZ and the internal network
2. Data passing between the internet and the internal network via the DMZ should traverse a clear protocol break in the DMZ.
#### On bridge start-up, or reconnection to Artemis
1. The bridge process should subscribe to the 'bridge.control' topic.
2. The bridge should start sending QueueQuery messages, each containing a unique message id and an identifier for the sending bridge.
3. The bridge should continue to send these until at least one node replies with a matched QueueSnapshot message.
4. The QueueSnapshot replies from the nodes contain a correlationId field set either to the unique id of the QueueQuery, or to null for unsolicited updates. The message payload is a list of inbox queue info items and a list of outbound queue info items. Each queue info item is a tuple of the legal X500 Name (as expected on the destination TLS certificates) and the queue name, which should have the form of "internal.peers." + hash key of the legal identity (using the same algorithm as we use in the db to make the string). Note that this queue name is a change from the current logic, but it will be more portable to length-constrained topics and will allow multiple inboxes on the same broker.
5. The bridge should process the QueueSnapshot, initiating links to the outgoing targets. It should also add expected inboxes to its in-bound permission list.
6. When an outgoing link is successfully formed the remote client certificate should be checked against the expected X500 name. Assuming the link is valid the bridge should subscribe to the related queue and start trying to forward the messages.
#### On node start-up, or reconnection to Artemis
1. The node should subscribe to 'bridge.control'.
2. The node should enumerate the queues and identify which have well known identities in the network map cache. The appropriate information about its own inboxes and any known outgoing queues should be compiled into an unsolicited QueueSnapshot message with a null correlation id. This should be broadcast to update any bridges that are running.
3. If any QueueQuery messages arrive these should be responded to with specific QueueSnapshot messages with the correlation id set.
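The control messages exchanged in the two start-up sequences above can be sketched as Kotlin data classes. The field names here are illustrative assumptions, not the final wire format; `answer` and `processSnapshot` are hypothetical helpers showing the node side and bridge side respectively.

```kotlin
import java.util.UUID

// Hypothetical message shapes for the bridge control protocol.
data class QueueInfo(val legalX500Name: String, val queueName: String)
data class QueueQuery(val messageId: UUID, val bridgeId: String)
data class QueueSnapshot(
    val correlationId: UUID?,          // null when unsolicited (node start-up)
    val inboxes: List<QueueInfo>,
    val outboundQueues: List<QueueInfo>
)

// Node side: answer a QueueQuery with a snapshot correlated to the query id.
fun answer(query: QueueQuery, inboxes: List<QueueInfo>, outbound: List<QueueInfo>) =
    QueueSnapshot(query.messageId, inboxes, outbound)

// Bridge side: from a snapshot, derive the outgoing links to initiate and
// the inbound permission list to install.
fun processSnapshot(s: QueueSnapshot): Pair<List<QueueInfo>, Set<String>> =
    s.outboundQueues to s.inboxes.map { it.queueName }.toSet()
```

Because the bridge re-queries on every (re)start, these messages need neither durability nor broker-level acknowledgement.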
#### On network map updates
1. On receipt of any network map cache update, the information should be evaluated to see if any additional queues can now be mapped to a bridge. At this point a BridgeRequest packet should be sent, containing the legal X500Name and queue name of the new update.
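The evaluation step above amounts to re-checking previously unmapped queues against the refreshed cache. A minimal sketch, assuming simple map-based lookups (`newlyMappableQueues` and both parameter shapes are hypothetical):

```kotlin
// On a network map update, find which previously unmapped queues can now be
// matched to a legal identity and therefore need a BridgeRequest
// (legal X500 name, queue name) to be sent.
fun newlyMappableQueues(
    unmappedQueues: Map<String, String>,   // queue name -> identity hash key
    networkMapCache: Map<String, String>   // identity hash key -> legal X500 name
): List<Pair<String, String>> =
    unmappedQueues.mapNotNull { (queue, hash) ->
        networkMapCache[hash]?.let { x500 -> x500 to queue }
    }
```

Queues whose identity hash is still unknown simply stay pending until a later network map update resolves them.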
#### On flow message to Peer
1. If a message is to be sent to a peer the code should (as it does now) check for queue existence in its cache and then on the broker. If it does exist it simply sends the message.
2. If the queue is not listed in its cache it should block until the queue is created (this should be safe versus race conditions with other nodes).
3. Once the queue is created the original message and subsequent messages can now be sent.
4. In parallel, a BridgeRequest packet should be sent to activate a new outward connection. This will contain the legal X500Name and queue name of the new queue.
5. Future QueueSnapshot requests should be responded to with the new queue included in the list.
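Steps 1 to 4 can be condensed into a small sketch. `PeerSender` is a hypothetical stand-in for the node's messaging layer: the real code checks the broker as well as its cache and blocks on queue creation, which is collapsed here into an idempotent set insert.

```kotlin
// Simplified model of the send path: send immediately if the queue is known,
// otherwise create it first (creation is assumed idempotent, so races with
// other nodes are safe) and record that a BridgeRequest must go out in parallel.
class PeerSender {
    private val knownQueues = mutableSetOf<String>()
    val bridgeRequests = mutableListOf<String>()

    fun send(queueName: String, message: String): String {
        if (queueName !in knownQueues) {
            knownQueues += queueName        // stands in for blocking queue creation
            bridgeRequests += queueName     // activate a new outward connection
        }
        return "sent:$queueName:$message"
    }
}
```

Note that only the first message to a new peer triggers a BridgeRequest; subsequent messages reuse the established queue and bridge.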
7. The Float only exposes its inbound server socket when activated by a valid AMQP link from the Bridge Control Manager
to allow for a simple HA pool of DMZ Float processes. (Floats cannot run hot-hot as this would invalidate Corda's
message ordering guarantees.)
## Implementation plan
### Proposed incremental steps towards a float
1. First, I would like to more explicitly split the RPC and P2P MessagingService instances inside the Node. They can
keep the same interface, but this would let us develop P2P and RPC at different rates if required.
2. The current in-node design with Artemis Core bridges should first be replaced with an equivalent piece of code that
initiates send only bridges using an in-house wrapper over the proton-j library. Thus, the current Artemis message
objects will be picked up from existing queues using the CORE protocol via an abstraction interface to allow later
pluggable replacement. The specific subscribed queues are controlled as before and bridges started by the existing code
path. The only difference is the bridges will be the new AMQP client code. The remote Artemis broker should accept
transferred packets directly onto its own inbox queue and acknowledge receipt via standard AMQP Delivery notifications.
This in turn will be acknowledged back to the Artemis Subscriber to permanently remove the message from the source
Artemis queue. The headers for deduplication, address names, etc. will need to be mapped onto the AMQP messages, and we will have to take care with the message payload. This should be an envelope that is capable of being end-to-end encrypted in the future. Where possible we should stay close to the current Artemis mappings.
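The acknowledgement chain in step 2 (remote AMQP Delivery notification first, then removal from the source queue) can be sketched abstractly. The function below is a hypothetical illustration; the real bridge would use proton-j delivery dispositions and Artemis acknowledgements rather than plain callbacks.

```kotlin
// Sketch of the ack chain: the message is only permanently removed from the
// source Artemis queue once the remote broker has confirmed delivery.
fun forward(
    message: String,
    deliverRemote: (String) -> Boolean,   // true once the AMQP Delivery ack arrives
    ackSource: () -> Unit                 // removes the message from the source queue
): Boolean {
    val delivered = deliverRemote(message)
    if (delivered) ackSource()            // never ack the source before the remote ack
    return delivered
}
```

Chaining the acknowledgements this way ensures at-least-once delivery across the bridge: a failed transfer leaves the message on the source queue for retry.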
3. We need to define a bridge control protocol, so that we can have an out of process float/bridge. The current process
is that on message send the node checks the target address to see if the target queue already exists. If the queue
doesn't exist it creates a new queue which includes an encoding of the PublicKey in its name. This is picked up by a
wrapper around the Artemis Server which is also hosted inside the node and can ask the network map cache for a
translation to a target host and port. This in turn allows a new bridge to be provisioned. At node restart, re-population of the network map cache is used to re-create the bridges for any unsent queues/messages.
4. My proposal for a bridge control protocol is partly influenced by the fact that AMQP does not have a built-in
mechanism for queue creation/deletion/enumeration. Also, the flows cannot progress until they are sure that there is an
accepting queue. Finally, if one runs a local broker it should be fine to run multiple nodes without any bridge
processes. Therefore, I will leave the queue creation as the node's responsibility. Initially we can continue to use the
existing CORE protocol for this. The requirement to initiate a bridge will change from being implicit signalling via
server queue detection to being an explicit pub-sub message that requests bridge formation. This doesn't need
durability, or acknowledgements, because when a bridge process starts it should request a refresh of the required bridge
list. The typical create bridge messages should contain:
1. The queue name (ideally with the sha256 of the PublicKey, not the whole PublicKey as that may not work on brokers with queue name length constraints).
2. The expected X500Name for the remote TLS certificate.
3. The list of host and ports to attempt connection to. See separate section for more info.
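The three fields listed above can be captured in a small data class. This is a sketch of a possible shape, not the final protocol: `CreateBridgeMessage` and `isValidPeerQueueName` are hypothetical names, and the fixed 64-character hex SHA-256 suffix is an assumption about the naming scheme.

```kotlin
// Hypothetical shape of the create-bridge control message (points 1-3 above).
data class CreateBridgeMessage(
    val queueName: String,                 // e.g. "internal.peers.<sha256-of-key>"
    val expectedX500Name: String,          // expected remote TLS certificate identity
    val targets: List<Pair<String, Int>>   // candidate (host, port) pairs to try
)

// A hex-encoded SHA-256 keeps queue names at a fixed, broker-friendly length
// regardless of the size of the underlying PublicKey.
fun isValidPeerQueueName(name: String): Boolean =
    name.startsWith("internal.peers.") &&
        name.removePrefix("internal.peers.").length == 64
```

Carrying a list of candidate host/port targets lets the bridge fail over between the advertised addresses of the remote node.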
5. Once we have the bridge protocol in place and a bridge out of process the broker can move out of process too, which
is a requirement for clustering anyway. We can then start work on floating the bridge and making our broker pluggable.
1. At this point the bridge connection to the local queues should be upgraded to also be an AMQP client, rather than using the CORE protocol, which will give the P2P bridges the ability to work with other broker products.
2. An independent task is to look at making the Bridge process HA, probably using a similar hot-warm mastering solution
as the node, or atomix.io. The inactive node should track the control messages, but obviously doesn't initiate any
bridges.
3. Another potentially parallel piece of development is to start to build a float, which is essentially just splitting
the bridge in two and putting in an intermediate hop AMQP/TLS link. The thin proxy in the DMZ zone should be as
stateless as possible in this.
4. Finally, the node should use AMQP to talk to its local broker cluster, but this will have to remain partly tied
to Artemis, as queue creation will require sending management messages to the Artemis core, but we should be