Merge pull request #1252 from corda/anthony/remove-unlinked-docs
ENT-1955 Remove some unlinked docs
@@ -1,58 +0,0 @@
API stability check
===================

We have committed not to alter Corda's API so that developers will not have to keep rewriting their CorDapps with each
new Corda release. The stable Corda modules are listed :ref:`here <internal-apis-and-stability-guarantees>`. Our CI process runs an "API Stability"
check for each GitHub pull request in order to check that we don't accidentally introduce an API-breaking change.

Build Process
-------------

As part of the build process the following commands are run for each PR:

.. code-block:: shell

    $ gradlew generateApi
    $ .ci/check-api-changes.sh

This ``bash`` script has been tested on both MacOS and various Linux distributions. It can also be run on Windows with the
use of a suitable bash emulator such as Git Bash. The script's return value is the number of API-breaking changes that it
has detected, and this should be zero for the check to pass. The return value is capped at 255, although the script's report
will still correctly list higher numbers of breaking changes.

There are three kinds of breaking change:

* Removal or modification of existing API, i.e. an existing class, method or field has been either deleted or renamed, or
  its signature somehow altered.
* Addition of a new method to an interface or abstract class. Types that have been annotated as ``@DoNotImplement`` are
  excluded from this check. (This annotation is also inherited across subclasses and sub-interfaces.)
* Exposure of an internal type via a public API. Internal types are considered to be anything in a ``*.internal.`` package
  or anything in a module that isn't in the stable modules list :ref:`here <internal-apis-and-stability-guarantees>`.
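
The following hypothetical Kotlin snippet illustrates the second kind of breaking change. The interface and method
names are invented for this example; only the ``@DoNotImplement`` annotation (assumed here to be importable from
``net.corda.core``) is real Corda API:

.. code-block:: kotlin

    import net.corda.core.DoNotImplement

    // Hypothetical published interface in a stable module. Adding refund() here would be flagged
    // by the API Stability check, because existing CorDapp implementations would no longer compile.
    interface PaymentService {
        fun pay(amount: Long)
        fun refund(amount: Long)  // Newly added method: reported as a breaking change.
    }

    // This type is annotated @DoNotImplement, so CorDapps may call it but must not implement it.
    // Adding a new method to it is therefore not treated as a breaking change.
    @DoNotImplement
    interface PaymentMetrics {
        fun totalPaid(): Long
    }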

Developers can execute these commands themselves before submitting their PR, to ensure that they haven't inadvertently
broken Corda's API.

How it works
------------

The ``generateApi`` Gradle task writes a summary of Corda's public API into the file ``build/api/api-corda-<version>.txt``.
The ``.ci/check-api-changes.sh`` script then compares this file with the contents of ``.ci/api-current.txt``, which is a
managed file within the Corda repository.

The Gradle task itself is implemented by the API Scanner plugin. More information on the API Scanner plugin is available `here <https://github.com/corda/corda-gradle-plugins/tree/master/api-scanner>`_.

Updating the API
----------------

As a rule, ``api-current.txt`` should only be updated by the release manager for each Corda release.

We do not expect modifications to ``api-current.txt`` as part of normal development. However, we may sometimes need to adjust
the public API in ways that would not break developers' CorDapps but which would be blocked by the API Stability check,
for example migrating a method from an interface into a superinterface. Any changes to the API summary file should be
included in the PR, which would then need explicit approval from either `Mike Hearn <https://github.com/mikehearn>`_, `Rick Parker <https://github.com/rick-r3>`_ or `Matthew Nesbit <https://github.com/mnesbit>`_.

.. note:: If you need to modify ``api-current.txt``, do not re-generate the file on the master branch. This will include new API that
   hasn't been released or committed to, and may be subject to change. Instead, manually change the specific line or lines of the
   existing committed API that have changed.

@@ -1,72 +0,0 @@
AWS Marketplace
===============

To help you design, build and test applications on Corda, called CorDapps, a Corda network AMI can be deployed from the `AWS Marketplace <https://aws.amazon.com/marketplace/pp/B077PG9SP5>`__. Instructions on running Corda nodes can be found `here <https://docs.corda.net/deploying-a-node.html>`_.

This Corda network offering builds a pre-configured network of Corda nodes as Ubuntu virtual machines (VMs). The network consists of a Notary node and three Corda nodes using version 1 of Corda. The following guide will also show you how to load one of four `Corda sample apps <https://www.corda.net/samples>`_, which demonstrate the basic principles of Corda. When you are ready to go further with developing on Corda and start making contributions to the project, head over to `Corda.net <https://www.corda.net/>`_.

Pre-requisites
--------------
* Ensure you have a registered AWS account which can create virtual machines under your subscription(s) and you are logged on to the `AWS portal <https://console.aws.amazon.com>`_
* It is recommended you generate a private-public SSH key pair (see `here <https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys--2/>`__)

Deploying a Corda Network
-------------------------

Browse to the `AWS Marketplace <https://aws.amazon.com/marketplace>`__ and search for Corda.

Follow the instructions to deploy the AMI to an EC2 instance in a region near to your location.

Build and Run a Sample CorDapp
------------------------------
Once the instance is running, SSH into it using your key pair:

.. sourcecode:: shell

    cd ~/dev

There are four sample apps available by default:

.. sourcecode:: shell

    ubuntu@ip-xxx-xxx-xxx-xxx:~/dev$ ls -la
    total 24
    drwxrwxr-x  6 ubuntu ubuntu 4096 Nov 13 21:48 .
    drwxr-xr-x  8 ubuntu ubuntu 4096 Nov 21 16:34 ..
    drwxrwxr-x 11 ubuntu ubuntu 4096 Oct 31 19:02 cordapp-example
    drwxrwxr-x  9 ubuntu ubuntu 4096 Nov 13 21:48 obligation-cordapp
    drwxrwxr-x 11 ubuntu ubuntu 4096 Nov 13 21:48 oracle-example
    drwxrwxr-x  8 ubuntu ubuntu 4096 Nov 13 21:48 yo-cordapp

``cd`` into the Corda sample you would like to run. For example:

.. sourcecode:: shell

    cd cordapp-example/

Follow the instructions for the specific sample at https://www.corda.net/samples to build and run it.
For example, with cordapp-example (the IOU app) the following commands would be run:

.. sourcecode:: shell

    ./gradlew deployNodes
    ./kotlin-source/build/nodes/runnodes

Then start the Corda webserver:

.. sourcecode:: shell

    find ~/dev/cordapp-example/kotlin-source/ -name corda-webserver.jar -execdir sh -c 'java -jar {} &' \;

You can now interact with your running CorDapp. See the instructions `here <https://docs.corda.net/tutorial-cordapp.html#via-http>`__.

Next Steps
----------
Now that you have built a Corda network and used a basic CorDapp, do go and visit the `dedicated Corda website <https://www.corda.net>`_.

Additional support is available on `Stack Overflow <https://stackoverflow.com/questions/tagged/corda>`_ and the `Corda Slack channel <https://slack.corda.net/>`_.

You can build and run any other `Corda samples <https://www.corda.net/samples>`_ or your own custom CorDapp here.

Or, to join the growing Corda community and get straight into the Corda open source codebase, head over to the `Github Corda repo <https://www.github.com/corda>`_.

@@ -1,214 +0,0 @@
Azure Marketplace
=================

To help you design, build and test applications on Corda, called CorDapps, a Corda network can be deployed on the `Microsoft Azure Marketplace <https://azure.microsoft.com/en-gb/overview/what-is-azure>`_.

This Corda network offering builds a pre-configured network of Corda nodes as Ubuntu virtual machines (VMs). The network comprises a Notary node and up to nine Corda nodes using a version of Corda of your choosing. The following guide will also show you how to load a simple Yo! CorDapp which demonstrates the basic principles of Corda. When you are ready to go further with developing on Corda and start making contributions to the project, head over to `Corda.net <https://www.corda.net/>`_.

Pre-requisites
--------------
* Ensure you have a registered Microsoft Azure account which can create virtual machines under your subscription(s) and you are logged on to the Azure portal (portal.azure.com)
* It is recommended you generate a private-public SSH key pair (see `here <https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys--2/>`__)

Deploying the Corda Network
---------------------------

Browse to portal.azure.com, log in, search the Azure Marketplace for Corda and select 'Corda Single Ledger Network'.

Click the 'Create' button.

STEP 1: Basics

Define the basic parameters which will be used to pre-configure your Corda nodes.

* **Resource prefix**: Choose an appropriate descriptive name for your Corda nodes. This name will prefix the node hostnames
* **VM user name**: This is the user login name on the Ubuntu VMs. Leave it as azureuser or define your own
* **Authentication type**: Select 'SSH public key', then paste the contents of your SSH public key file (see pre-requisites, above) into the box. Alternatively select 'Password' to use a password of your choice to administer the VM
* **Restrict access by IP address**: Leave this as 'No' to allow access from any internet host, or provide an IP address or a range of IP addresses to limit access
* **Subscription**: Select which of your Azure subscriptions you want to use
* **Resource group**: Choose to 'Create new' and provide a useful name of your choice
* **Location**: Select the geographical location physically closest to you

.. image:: resources/azure_multi_node_step1.png
    :width: 300px

Click 'OK'.

STEP 2: Network Size and Performance

Define the number of Corda nodes in your network and the size of the VMs.

* **Number of Network Map nodes**: There can only be one Network Map node in this network. Leave as '1'
* **Number of Notary nodes**: There can only be one Notary node in this network. Leave as '1'
* **Number of participant nodes**: This is the number of Corda nodes in your network. At least two nodes in your network are recommended (so you can send transactions between them). You can specify one participant node and use the Notary node as a second node. There is an upper limit of nine
* **Storage performance**: Leave as 'Standard'
* **Virtual machine size**: The size of the VM is automatically adjusted to suit the number of participant nodes selected. It is recommended to use the suggested values

.. image:: resources/azure_multi_node_step2.png
    :width: 300px

Click 'OK'.

STEP 3: Corda Specific Options

Define the version of Corda you want on your nodes and the type of notary.

* **Corda version (as seen in Maven Central)**: Select the version of Corda you want your nodes to use from the drop-down list. The version numbers can be seen in `Maven Central <http://repo1.maven.org/maven2/net/corda/corda/>`_, for example 0.11.0
* **Notary type**: Select either 'Non Validating' (the notary only checks whether a state has been previously used and marked as historic) or 'Validating' (the notary performs transaction verification by seeing input and output states, attachments and other transaction information). More information on notaries can be found `here <https://vimeo.com/album/4555732/video/214138458>`_

.. image:: resources/azure_multi_node_step3.png
    :width: 300px

Click 'OK'.

STEP 4: Summary

A summary of your selections is shown.

.. image:: resources/azure_multi_node_step4.png
    :width: 300px

Click 'OK' for your selection to be validated. If everything is ok you will see the message 'Validation passed'.

Click 'OK'.

STEP 5: Buy

Review the Azure Terms of Use and Privacy Policy and click 'Purchase' to buy the Azure VMs which will host your Corda nodes.

The deployment process will start and typically takes 8-10 minutes to complete.

Once deployed, click 'Resource Groups', select the resource group you defined in Step 1 above and click 'Overview' to see the virtual machine details. The names of your VMs will be prefixed with the resource prefix value you defined in Step 1 above.

The Network Map Service node is suffixed nm0. The Notary node is suffixed not0. Your Corda participant nodes are suffixed node0, node1, node2, etc. Note down the **Public IP address** for your Corda nodes. You will need these to connect to UI screens via your web browser:

.. image:: resources/azure_ip.png
    :width: 300px

Using the Yo! CorDapp
---------------------
Loading the Yo! CorDapp on your Corda nodes lets you send simple Yo! messages to other Corda nodes on the network. A Yo! message is a very simple transaction. The Yo! CorDapp demonstrates:

- how transactions are sent only between the Corda nodes for which they are intended, and are not shared across the entire network, by using the network map
- how a pre-defined flow is used to orchestrate the ledger update automatically
- how the contract imposes rules on the ledger updates

* **Loading the Yo! CorDapp onto your nodes**

The nodes you will use to send and receive Yo messages require the Yo! CorDapp jar file to be saved to their cordapps directory.

Connect to one of your Corda nodes (make sure this is not the Notary node) using an SSH client of your choice (e.g. PuTTY) and log into the virtual machine using the public IP address and the SSH key or username / password combination you defined in Step 1 of the Azure build process. Type the following commands:

For Corda nodes running release M10:

.. sourcecode:: shell

    cd /opt/corda/cordapps
    wget http://downloads.corda.net/cordapps/net/corda/yo/0.10.1/yo-0.10.1.jar

For Corda nodes running release M11:

.. sourcecode:: shell

    cd /opt/corda/cordapps
    wget http://downloads.corda.net/cordapps/net/corda/yo/0.11.0/yo-0.11.0.jar

For Corda nodes running version 2:

.. sourcecode:: shell

    cd /opt/corda/plugins
    wget http://ci-artifactory.corda.r3cev.com/artifactory/cordapp-showcase/yo-4.jar

Now restart Corda and the Corda webserver using the following commands, or restart your Corda VM from the Azure portal:

.. sourcecode:: shell

    sudo systemctl restart corda
    sudo systemctl restart corda-webserver

Repeat these steps on the other Corda nodes on your network which you want to use to send or receive Yo messages.

* **Verify the Yo! CorDapp is running**

Open a browser tab and browse to the following URL:

.. sourcecode:: shell

    http://(public IP address):(port)/web/yo

where (public IP address) is the public IP address of one of your Corda nodes on the Azure Corda network and (port) is the web server port number for your Corda node, 10004 by default.

You will now see the Yo! CorDapp web interface:

.. image:: resources/Yo_web_ui.png
    :width: 300px

* **Sending a Yo message via the web interface**

In the browser window, type the following URL to send a Yo message to a target node on your Corda network:

.. sourcecode:: shell

    http://(public IP address):(port)/api/yo/yo?target=(legal name of target node)

where (public IP address) is the public IP address of one of your Corda nodes on the Azure Corda network, (port) is the web server port number for your Corda node (10004 by default) and (legal name of target node) is the Legal Name of the target node as defined in the node.conf file, for example:

.. sourcecode:: shell

    http://40.69.40.42:10004/api/yo/yo?target=Corda 0.10.1 Node 1 in tstyo2

An easy way to see the Legal Names of Corda nodes on the network is to use the peers screen:

.. sourcecode:: shell

    http://(public IP address):(port)/api/yo/peers

.. image:: resources/yo_peers2.png
    :width: 300px

* **Viewing Yo messages**

To see the Yo! messages sent to a particular node, open a browser window and browse to the following URL:

.. sourcecode:: shell

    http://(public IP address):(port)/api/yo/yos

.. image:: resources/azure_yos.png
    :width: 300px

Viewing logs
------------
Users may wish to view the raw logs generated by each node, which contain more information about the operations performed by each node.

You can access these using an SSH client of your choice (e.g. PuTTY) and logging into the virtual machine using the public IP address.
Once logged in, navigate to the following directory for Corda logs (node-xxxxxx):

.. sourcecode:: shell

    /opt/corda/logs

And navigate to the following directory for system logs (syslog):

.. sourcecode:: shell

    /var/log

You can open log files with any text editor.

.. image:: resources/azure_vm_10_49.png
    :width: 300px

.. image:: resources/azure_syslog.png
    :width: 300px

Next Steps
----------
Now that you have built a Corda network and used a basic CorDapp, do go and visit the `dedicated Corda website <https://www.corda.net>`_.

Or, to join the growing Corda community and get straight into the Corda open source codebase, head over to the `Github Corda repo <https://www.github.com/corda>`_.

@@ -1,51 +0,0 @@
Building the documentation
==========================

The documentation is under the ``docs`` folder, and is written in reStructuredText format. Documentation in HTML format
is pre-generated, as is the code documentation, and both can be regenerated automatically via a provided script.

Requirements
------------

To build the documentation, you will need:

* GNU Make
* Python and pip (tested with Python 2.7.10)
* Sphinx: http://www.sphinx-doc.org/
* sphinx_rtd_theme: https://github.com/snide/sphinx_rtd_theme

Note that to install under OS X El Capitan, you will need to tell pip to install under ``/usr/local``, which can be
done by specifying the installation target on the command line:

.. sourcecode:: shell

    sudo -H pip install --install-option '--install-data=/usr/local' Sphinx
    sudo -H pip install --install-option '--install-data=/usr/local' sphinx_rtd_theme

.. warning:: When installing Sphinx, you may see the following error message: "Found existing installation: six 1.4.1
   Cannot uninstall 'six'. It is a distutils installed project and thus we cannot accurately determine which files
   belong to it which would lead to only a partial uninstall.". If so, run the install with the
   ``--ignore-installed six`` flag.

Build
-----

Once the requirements are installed, you can automatically build the HTML format user documentation and the API
documentation by running the following script:

.. sourcecode:: shell

    ./gradlew buildDocs

Alternatively you can build non-HTML formats from the ``docs`` folder. Change directory to the folder and then run the
following to see a list of all available formats:

.. sourcecode:: shell

    make

For example, to produce the documentation in HTML format:

.. sourcecode:: shell

    make html

@@ -1,222 +0,0 @@
Code style guide
================

This document explains the coding style used in the Corda repository. You will be expected to follow these
recommendations when submitting patches for review. Please take the time to read them and internalise them, to save
time during code review.

What follows are *recommendations* and not *rules*. They are in places intentionally vague, so use your good judgement
when interpreting them.

.. note:: Parts of the codebase may not follow this style guide yet. If you see a place that doesn't, please fix it!

1. General style
################

We use the standard Java coding style from Sun, adapted for Kotlin in ways that should be fairly intuitive.

Files no longer have copyright notices at the top, and the license is now specified in the global README.md file.
We do not mark classes with @author Javadoc annotations.

In Kotlin code, KDoc is used rather than JavaDoc. It's very similar except it uses Markdown for formatting instead
of HTML tags.

We target Java 8 and use the latest Java APIs whenever convenient. We use ``java.time.Instant`` to represent timestamps
and ``java.nio.file.Path`` to represent file paths.

Never apply any design pattern religiously. There are no silver bullets in programming and if something is fashionable,
that doesn't mean it's always better. In particular:

* Use functional programming patterns like map, filter, fold only where it's genuinely more convenient. Never be afraid
  to use a simple imperative construct like a for loop or a mutable counter if that results in more direct, English-like
  code.
* Use immutability when you don't anticipate very rapid or complex changes to the content. Immutability can help avoid
  bugs, but over-used it can make code that has to adjust fields of an immutable object (in a clone) hard to read and
  stress the garbage collector. When such code becomes a widespread pattern it can lead to code that is just generically
  slow but without hotspots.
* The trade-offs between various thread safety techniques are complex, subtle, and no technique is always superior to
  the others. Our code uses a mix of locks, worker threads and messaging depending on the situation.

1.1 Line Length and Spacing
---------------------------

We aim for line widths of no more than 120 characters. That is wide enough to avoid lots of pointless wrapping but
narrow enough that with a widescreen monitor and a 12 point fixed width font (like Menlo) you can fit two files
next to each other. This is not a rigidly enforced rule and if wrapping a line would be excessively awkward, let it
overflow. Overflow of a few characters here and there isn't a big deal: the goal is general convenience.

Where the number of parameters in a function, class, etc. causes an overflow past the end of the first line, they should
be structured one parameter per line.
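
For example, a constructor whose parameters would overflow the first line might be laid out like this (the class and
parameter names are invented purely for illustration):

.. sourcecode:: kotlin

    // Illustrative only: when the parameter list overflows the line, put one parameter per line.
    class ExampleFlowConfiguration(
            private val counterpartyName: String,
            private val retryCount: Int,
            private val timeoutSeconds: Long,
            private val logProgress: Boolean = false
    )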

Code is vertically dense; blank lines in methods are used sparingly. This is so more code can fit on screen at once.

We use spaces and not tabs, with indents being 4 spaces wide.

1.2 Naming
----------

Naming generally follows Java standard style (pascal case for class names, camel case for methods, properties and
variables). Where a class name describes a tuple, "And" should be included in order to clearly indicate the elements are
individual parts, for example ``PartyAndReference``, not ``PartyReference`` (which sounds like a reference to a
``Party``).

2. Comments
###########

We like them as long as they add detail that is missing from the code. Comments that simply repeat the story already
told by the code are best deleted. Comments should:

* Explain what the code is doing at a higher level than is obtainable from just examining the statement and
  surrounding code.
* Explain why certain choices were made and the trade-offs considered.
* Explain how things can go wrong, which is a detail often not easily seen just by reading the code.
* Use good grammar with capital letters and full stops. This gets us in the right frame of mind for writing real
  explanations of things.

When writing code, imagine that you have an intelligent colleague looking over your shoulder asking you questions
as you go. Think about what they might ask, and then put your answers in the code.

Don't be afraid of redundancy; many people will start reading your code in the middle with little or no idea of what
it's about (e.g. due to a bug or a need to introduce a new feature). It's OK to repeat basic facts or descriptions in
different places if that increases the chance developers will see something important.

API docs: all public methods, constants and classes should have doc comments in either JavaDoc or KDoc. API docs should:

* Explain what the method does in words different to how the code describes it.
* Always have some text; annotation-only JavaDocs don't render well. Write "Returns a blah blah blah" rather
  than "@returns blah blah blah" if that's the only content (or leave it out if you have nothing more to say than the
  code already says).
* Illustrate with examples when you might want to use the method or class. Point the user at alternatives if this code
  is not always right.
* Make good use of {@link} annotations.

Bad JavaDocs look like this:

.. sourcecode:: java

    /** @return the size of the Bloom filter. */
    public int getBloomFilterSize() {
        return block;
    }

Good JavaDocs look like this:

.. sourcecode:: java

    /**
     * Returns the size of the current {@link BloomFilter} in bytes. Larger filters have
     * lower false positive rates for the same number of inserted keys and thus lower privacy,
     * but bandwidth usage is also correspondingly reduced.
     */
    public int getBloomFilterSize() { ... }

We use C-style (``/** */``) comments for API docs and we use C++ style comments (``//``) for explanations that are
only intended to be viewed by people who read the code.

When writing multi-line TODO comments, indent the body text past the TODO line, for example:

.. sourcecode:: java

    // TODO: Something something
    //       More stuff to do
    //       Etc. etc.

3. Threading
############

Classes that are thread safe should be annotated with the ``@ThreadSafe`` annotation. The class or method comments
should describe how threads are expected to interact with your code, unless it's obvious because the class is
(for example) a simple immutable data holder.

Code that supports callbacks or event listeners should always accept an ``Executor`` argument that defaults to
``MoreExecutors.directThreadExecutor()`` (i.e. the calling thread) when registering the callback. This makes it easy
to integrate the callbacks with whatever threading environment the calling code expects, e.g. serialised onto a single
worker thread if necessary, or run directly on the background threads used by the class if the callback is thread safe
and doesn't care in what context it's invoked.
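
A minimal sketch of that convention, using Guava's direct (calling-thread) executor as the default; the class and
method names are invented for illustration:

.. sourcecode:: kotlin

    import com.google.common.util.concurrent.MoreExecutors
    import java.util.concurrent.CopyOnWriteArrayList
    import java.util.concurrent.Executor

    class ExampleProgressTracker {
        private val listeners = CopyOnWriteArrayList<Pair<Executor, () -> Unit>>()

        // The caller chooses the threading context for its callback; by default the callback runs
        // directly on whichever thread fires the event.
        fun addUpdateListener(executor: Executor = MoreExecutors.directExecutor(), listener: () -> Unit) {
            listeners += Pair(executor, listener)
        }

        private fun fireUpdate() {
            for ((executor, listener) in listeners) executor.execute { listener() }
        }
    }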

In the prototyping code it's OK to use synchronised methods, i.e. with an exposed lock, when the use of locking is quite
trivial. If the synchronisation in your code is getting more complex, consider the following:

1. Is the complexity necessary? At this early stage, don't worry too much about performance or scalability, as we're
   exploring the design space rather than making an optimal implementation of a design that's already nailed down.
2. Could you simplify it by making the data be owned by a dedicated, encapsulated worker thread? If so, remember to
   think about flow control and what happens if a work queue fills up: the actor model can often be useful but be aware
   of the downsides and try to avoid explicitly defining messages; prefer to send closures onto the worker thread
   instead.
3. If you use an explicit lock and the locking gets complex, and *always* if the class supports callbacks, use the
   cycle detecting locks from the Guava library.
4. Can you simplify some things by using thread-safe collections like ``CopyOnWriteArrayList`` or ``ConcurrentHashMap``?
   These data structures are more expensive than their non-thread-safe equivalents but can be worth it if it lets us
   simplify the code.

Immutable data structures can be very useful for making it easier to reason about multi-threaded code. Kotlin makes it
easy to define these via the "data" attribute, which auto-generates a copy() method. That lets you create clones of
an immutable object with arbitrary fields adjusted in the clone. But if you can't use the data attribute for some
reason, for instance because you are working in Java or because you need an inheritance hierarchy, then consider that making
a class fully immutable may result in very awkward code if there's ever a need to make complex changes to it. If in
doubt, ask. Remember, never apply any design pattern religiously.
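
A brief illustration of the generated ``copy()`` method (the data class is invented for this example):

.. sourcecode:: kotlin

    data class TradeTerms(val quantity: Int, val price: Long, val counterparty: String)

    val original = TradeTerms(quantity = 100, price = 25, counterparty = "Bank A")
    // copy() produces a clone with only the named fields adjusted; the original is untouched.
    val amended = original.copy(price = 26)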

We have an extension to the ``Executor`` interface called ``AffinityExecutor``. It is useful when the thread safety
of a piece of code is based on expecting to be called from a single thread only (or potentially, a single thread pool).
``AffinityExecutor`` has additional methods that allow for thread assertions. These can be useful to ensure code is not
accidentally being used in a multi-threaded way when it didn't expect that.

4. Assertions and errors
########################

We use them liberally and we use them at runtime, in production. That means we avoid the "assert" keyword in Java,
and instead prefer to use the ``check()`` or ``require()`` functions in Kotlin (for an ``IllegalStateException`` or
``IllegalArgumentException`` respectively), or the Guava ``Preconditions.check`` method from Java.
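
For example, in Kotlin (the function and its parameters are invented for illustration):

.. sourcecode:: kotlin

    fun recordPayment(amount: Long, ledgerOpen: Boolean) {
        // Throws IllegalArgumentException if the caller passed a bad value.
        require(amount > 0) { "amount must be positive, was $amount" }
        // Throws IllegalStateException if the object is in the wrong state for this call.
        check(ledgerOpen) { "cannot record a payment after the ledger has been closed" }
        // ... proceed with the update ...
    }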

We define new exception types liberally. We prefer not to provide English language error messages in exceptions at
the throw site; instead we define new types with any useful information as fields, with a toString() method if
really necessary. In other words, don't do this:

.. sourcecode:: java

    throw new Exception("The foo broke")

instead do this:

.. sourcecode:: java

    class FooBrokenException extends Exception {}
    throw new FooBrokenException()

The latter is easier to catch and handle if later necessary, and the type name should explain what went wrong.

Note that Kotlin does not require exception types to be declared in method prototypes like Java does.

5. Properties
#############

Where we want a public property to have one super-type in public and another sub-type in private (or internal), perhaps
to expose additional methods with a greater level of access to the code within the enclosing class, the style should be:

.. sourcecode:: kotlin

    class PrivateFoo : PublicFoo

    private val _foo = PrivateFoo()
    val foo: PublicFoo get() = _foo

Notably:

* The public property should have an explicit and more restrictive type, most likely a super class or interface.
* The private, backing property should begin with underscore but otherwise have the same name as the public property.
  The underscore resolves a potential property name clash, and avoids naming such as "privateFoo". If the type or use
  of the private property is different enough that there is no naming collision, prefer the distinct names without
  an underscore.
* The underscore prefix is not a general pattern for private properties.
* The public property should not have an additional backing field but use "get()" to return an appropriate copy of the
  private field.
* The public property should optionally wrap the returned value in an immutable wrapper, such as Guava's immutable
  collection wrappers, if that is appropriate.
* If the code following "get()" is succinct, prefer a one-liner formatting of the public property as above, otherwise
  put the "get()" on the line below, indented.

6. Compiler warnings
####################

We do not allow compiler warnings, except in the experimental module where the usual standards do not apply and warnings
are suppressed. If a warning exists it should be either fixed or suppressed using @SuppressWarnings, and if suppressed
there must be an accompanying explanation in the code for why the warning is a false positive.
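
A sketch of what a justified suppression might look like in Kotlin (the function and the cast are invented for this
example):

.. sourcecode:: kotlin

    // The cache only ever stores a List<String> under this key, so the unchecked cast below cannot
    // fail at runtime; the compiler simply cannot prove that from the Map<String, Any> type.
    @Suppress("UNCHECKED_CAST")
    fun cachedNames(cache: Map<String, Any>): List<String> = cache["names"] as List<String>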

@@ -1,154 +0,0 @@
Contributing
============

Corda is an open-source project and contributions are welcome. Our contributing philosophy is described in
`CONTRIBUTING.md <https://github.com/corda/corda/blob/master/CONTRIBUTING.md>`_. This guide explains the mechanics
of contributing to Corda.

.. contents::

Identifying an area to contribute
---------------------------------
There are several ways to identify an area where you can contribute to Corda:

* Browse issues labelled as ``good first issue`` in the
  `Corda GitHub Issues <https://github.com/corda/corda/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22>`_

  * Any issue with a ``good first issue`` label is considered ideal for open-source contributions
  * If there is a feature you would like to add and there isn't a corresponding issue labelled as ``good first issue``,
    that doesn't mean your contribution isn't welcome. Please reach out on the ``#design`` channel to clarify (see
    below)

* Ask in the ``#design`` channel of the `Corda Slack <http://slack.corda.net/>`_

Making the required changes
---------------------------

1. Create a fork of the master branch of the `Corda repo <https://github.com/corda/corda>`_
2. Clone the fork to your local machine
3. Make the changes, in accordance with the :doc:`code style guide </codestyle>`

Things to check
^^^^^^^^^^^^^^^

Is your error handling up to scratch?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Errors should not leak to the UI. When writing tools intended for end users, like the node or command line tools,
remember to add ``try``/``catch`` blocks. Throw meaningful errors. For example, instead of throwing an
``OutOfMemoryError``, use the error message to indicate that a file is missing, a network socket was unreachable, etc.
Tools should not dump stack traces to the end user.
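
As a sketch of the kind of handling meant here (the tool and exception names are invented):

.. sourcecode:: kotlin

    class ExampleConfigMissingException(val path: String) : Exception()

    fun runExampleTool(args: Array<String>) {
        // ... the tool's real work would go here ...
        if (args.isEmpty()) throw ExampleConfigMissingException("example-tool.conf")
    }

    fun main(args: Array<String>) {
        try {
            runExampleTool(args)
        } catch (e: ExampleConfigMissingException) {
            // Report what went wrong in plain language instead of dumping a stack trace on the user.
            System.err.println("Could not find the configuration file at ${e.path}. " +
                    "Pass the path to a valid configuration file as the first argument.")
        }
    }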

Look for API breaks
~~~~~~~~~~~~~~~~~~~

We have an automated checker tool that runs as part of our continuous integration pipeline and helps a lot, but it
can't catch semantic changes where the behaviour of an API changes in ways that might violate app developer expectations.

Suppress inevitable compiler warnings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Compiler warnings should have a ``@Suppress`` annotation on them if they're expected and can't be avoided.

Remove deprecated functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When deprecating functionality, make sure you remove the deprecated uses in the codebase.

Avoid making formatting changes as you work
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In Kotlin 1.2.20, new style guide rules were implemented. The new Kotlin style guide is significantly more detailed
than before and IntelliJ knows how to implement those rules. Re-formatting the codebase creates a lot of diffs that
make merging more complicated.

Things to consider when writing CLI apps
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Set exit codes using ``exitProcess``. Zero means success. Other numbers mean errors. Setting a unique error code
  (starting from 1) for each thing that can conceivably break makes your tool shell-scripting friendly (see the
  sketch after this list)

* Do a bit of work to figure out reasonable defaults. Nobody likes having to set a dozen flags before the tool will
  cooperate

* Your ``--help`` text or other docs should ideally include examples. Writing examples is also a good way to find out
  that your program requires a dozen flags to do anything

* Flags should have sensible defaults

* Don't print logging output to the console unless the user requested it via a ``--verbose`` flag (conventionally
  shortened to ``-v``) or a ``--log-to-console`` flag. Logs should be either suppressed or saved to a text file during
  normal usage, except for errors, which are always OK to print
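
A minimal sketch of the exit-code convention described in the first point above (the error cases and messages are
invented for illustration):

.. sourcecode:: kotlin

    import java.io.File
    import kotlin.system.exitProcess

    fun main(args: Array<String>) {
        if (args.isEmpty()) {
            System.err.println("Usage: example-tool <input-file>")
            exitProcess(1)  // 1 = no input file was given.
        }
        val input = File(args[0])
        if (!input.exists()) {
            System.err.println("Input file not found: ${input.absolutePath}")
            exitProcess(2)  // 2 = the input file does not exist.
        }
        // ... do the real work here ...
        exitProcess(0)      // 0 = success.
    }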

Testing the changes
-------------------

Adding tests
^^^^^^^^^^^^
Unit tests and integration tests for external API changes must cover Java and Kotlin. For internal API changes these
tests can be scaled back to Kotlin only.

Running the tests
^^^^^^^^^^^^^^^^^
Your changes must pass the tests described :doc:`here </testing>`.

Manual testing
^^^^^^^^^^^^^^
Before sending that code for review, spend time poking and prodding the tool and thinking, "Would the experience of
using this feature make my mum proud of me?". Automated tests are not a substitute for dogfooding.

Building against the master branch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can test your changes against CorDapps defined in other repos by following the instructions
:doc:`here </building-against-master>`.

Running the API scanner
^^^^^^^^^^^^^^^^^^^^^^^
Your changes must also not break compatibility with the existing public API. We have an API scanning tool which runs as part of the build
process and can be used to flag up any accidental changes; it is detailed :doc:`here </api-scanner>`.

Updating the docs
-----------------

Any changes to Corda's public API must be documented as follows:

1. Add comments and javadocs/kdocs. API functions must have javadoc/kdoc comments and sentences must be terminated
   with a full stop. We also start comments with capital letters, even for inline comments. Where Java APIs have
   synonyms (e.g. ``%d`` and ``%date``), we prefer the longer form for legibility reasons. You can configure your IDE
   to highlight these in bright yellow (see the sketch after this list)
2. Update the relevant `.rst file(s) <https://github.com/corda/corda/tree/master/docs/source>`_
3. Include the change in the :doc:`changelog </changelog>` if the change is external and therefore visible to CorDapp
   developers and/or node operators
4. :doc:`Build the docs locally </building-the-docs>`
5. Check the built .html files (under ``docs/build/html``) for the modified pages to ensure they render correctly
6. If relevant, add a sample. Samples are one of the key ways in which users learn about what the platform can do.
   If you add a new API or feature and don't update the samples, your work will be much less impactful
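
As a sketch of the expected comment style (the function is invented for illustration):

.. sourcecode:: kotlin

    /**
     * Returns the number of transactions recorded for the given party name.
     *
     * Sentences are complete, start with a capital letter and end with a full stop.
     */
    fun transactionCountFor(partyName: String): Int {
        // Inline comments also start with a capital letter.
        return 0  // Placeholder body for the example.
    }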

Merging the changes back into Corda
-----------------------------------

1. Create a pull request from your fork to the ``master`` branch of the Corda repo

2. In the PR comments box:

   * Complete the pull-request checklist:

     * [ ] Have you run the unit, integration and smoke tests as described here? https://docs.corda.net/head/testing.html
     * [ ] If you added/changed public APIs, did you write/update the JavaDocs?
     * [ ] If the changes are of interest to application developers, have you added them to the changelog, and potentially
       release notes?
     * [ ] If you are contributing for the first time, please read the agreement in CONTRIBUTING.md now and add to this
       Pull Request that you agree to it.

   * Add a clear description of the purpose of the PR

   * Add the following statement to confirm that your contribution is your own original work: "I hereby certify that my contribution is in accordance with the Developer Certificate of Origin (https://github.com/corda/corda/blob/master/CONTRIBUTING.md#developer-certificate-of-origin)."

3. Request a review from a member of the Corda platform team via the `#design channel <http://slack.corda.net/>`_

4. The reviewer will either:

   * Accept and merge your PR
   * Request that you make further changes. Do this by committing and pushing the changes onto the branch you are PRing
     into Corda. The PR will be updated automatically
@@ -16,4 +16,3 @@ Nodes
    node-administration
    node-operations-upgrading
    node-operations-upgrade-cordapps
    out-of-process-verification
@@ -1,26 +0,0 @@
Corda repo layout
=================

The Corda repository comprises the following folders:

* **buildSrc** contains necessary gradle plugins to build Corda
* **client** contains libraries for connecting to a node, working with it remotely and binding server-side data to
  JavaFX UI
* **confidential-identities** contains experimental support for confidential identities on the ledger
* **config** contains logging configurations and the default node configuration file
* **core** contains the core Corda libraries such as crypto functions, types for Corda's building blocks: states,
  contracts, transactions, attachments, etc. and some interfaces for nodes and protocols
* **docs** contains the Corda docsite in restructured text format
* **experimental** contains platform improvements that are still in the experimental stage
* **finance** defines a range of elementary contracts (and associated schemas) and protocols, such as abstract fungible
  assets, cash, obligation and commercial paper
* **gradle** contains the gradle wrapper which you'll use to execute gradle commands
* **lib** contains some dependencies
* **node** contains the core code of the Corda node (eg: node driver, node services, messaging, persistence)
* **node-api** contains data structures shared between the node and the client module, e.g. types sent via RPC
* **samples** contains all our Corda demos and code samples
* **testing** contains some utilities for unit testing contracts (the contracts testing DSL) and flows (the
  mock network) implementation
* **tools** contains the explorer, which is a GUI front-end for Corda, and also the DemoBench, which is a GUI tool that
  allows you to run Corda nodes locally for demonstrations
* **webserver** is a servlet container for CorDapps that export HTTP endpoints. This server is an RPC client of the node

@@ -1,40 +0,0 @@
![Corda]()

<a href="https://ci-master.corda.r3cev.com/viewType.html?buildTypeId=CordaEnterprise_Build&tab=buildTypeStatusDiv"><img src="https://ci.corda.r3cev.com/app/rest/builds/buildType:Corda_CordaBuild/statusIcon"/></a>

# Design Documentation

This directory should be used to version control Corda design documents.

These should be written in [Markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) (a design template is provided for general guidance) and follow the design review process outlined below. It is recommended you use a Markdown editor such as [Typora](https://typora.io/), or an appropriate plugin for your favourite editor (eg. [Sublime Markdown editing theme](http://plaintext-productivity.net/2-04-how-to-set-up-sublime-text-for-markdown-editing.html)).

## Design Review Process

Please see the [design review process](design-review-process.md).

* Feature request submission
* High level design
* Review / approve gate
* Technical design
* Review / approve gate
* Plan, prototype, implement, QA

## Design Template

Please copy this [directory](template) to a new location under `/docs/source/design` (use a meaningful short descriptive directory name) and use the [Design Template](template/design.md) contained within to guide writing your Design Proposal. Whilst the section headings may be treated as placeholders for guidance, you are expected to be able to answer any questions related to pertinent section headings (where relevant to your design) at the design review stage. Use the [Design Decision Template](template/decisions/decision.md) (as many times as needed) to record the pros and cons, and justification of any design decision recommendations where multiple options are available. These should be directly referenced from the *Design Decisions* section of the main design document.

The design document may be completed in one or two iterations, by completing the following two main sections either individually or in a single pass:

* High level design
  Where a feature requirement is specified at a high level, and multiple design solutions are possible, this section should be completed and circulated for review prior to completing the detailed technical design.
  High level designs will often benefit from a formal meeting and discussion review amongst stakeholders to reach consensus on the preferred way to proceed. The design author will then incorporate all meeting outcome decisions back into a revision for final GitHub PR approval.
* Technical design
  The technical design will consist of implementation-specific details which require a deeper understanding of the Corda software stack, such as public APIs and services, libraries, and associated middleware infrastructure (messaging, security, database persistence, serialization) used to realize these.
  Technical designs should lead directly to a GitHub PR review process.

Once a design is approved using the GitHub PR process, please commit the PR to the GitHub repository with a meaningful version identifier (eg. my super design document - **V1.0**)

## Design Repository

All design documents will be version controlled in GitHub under the directory `/docs/source/design`.
For designs that relate to Enterprise-only features (and that may contain proprietary IP), these should be stored under the [Enterprise Github repository](https://github.com/corda/enterprise). All other public designs should be stored under the [Open Source Github repository](https://github.com/corda/corda).

@@ -1,93 +0,0 @@
![Corda]()

--------------------------------------------
Design Decision: Business Network Membership control: Node level or CorDapp level?
============================================

## Background / Context

During discussion of the [Business Networks](../design.md) document, multiple people voiced concerns
about Business Network membership for different CorDapps deployed on the Corda Node.

## Options Analysis

### 1. One set of Business Networks for the whole Node

The idea is that a Node has knowledge of what Business Networks it is a member of. E.g. a Node will be notified by one or
many BN operator node(s) of its membership. Configurability and management of membership is the responsibility of the
BN Operator node, with updates pushed to member Nodes.
In other words, Business Network membership is enforced at the Node level,
and **all** the CorDapps installed on a node can communicate with **all** the Business Networks the node has been included in.

#### Advantages

1. The Business Network remote communication API will become a node-level API and it will be a single API to use;
2. A change in Business Network composition will be quickly propagated to all the peer nodes via the push mechanism.

#### Disadvantages

1. A set of CorDapps may need to be split and hosted by multiple Corda Nodes. A member will need to run a separate
   Corda Node for every Business Network it wants to participate in;

   Deployment of a node may be a big deal as it requires a new X.500 name, Node registration through
   Doorman, a separate production process to monitor, etc.

2. The BNO node will have to know about Corda member nodes to push Business Network information to them. Not only does this
   require a uniform remote API that every node will have to support, but also member nodes' IP addresses
   (which may change from time to time) need to be known to the BNO node. This might be problematic as, with maximum privacy
   enforced, a member node may not be listed on the NetworkMap.

### 2. Allow CorDapps to specify Business Network name

The idea is that every CorDapp will be able to specify which Business Network(s) it can work with.
Upon Corda Node start-up (or CorDapp re-load) CorDapps will be inspected to establish the super-set of Business Networks
for the current Node run.
After that, a call will be made to each of the BNO nodes to inform them of the Node's IP address such that the node can be
communicated with.

#### Advantages
1. Flexibility for different CorDapps to work with multiple Business Network(s) which do not have to be aligned;
2. No need for multiple Nodes - a single Node may host many CorDapps which belong to many Business Networks.

#### Disadvantages
1. Difficult to know upfront which set of Business Networks a Corda Node is going to connect to.
   It is entirely dependent on which CorDapps are installed on the Node.

   This can be mitigated by explicitly white-listing Business Networks in the Node configuration, such that only the intersection
   of the Business Network name set obtained from CorDapps and the Node configuration will be the resulting set of Business Networks
   a Node can connect to.

## Recommendation and justification

As per the meeting held on Fri, 26-Jan-2018 by @vkolomeyko, @josecoll, @mikehearn and @davejh69,
we agreed that it would make sense for all the new CorDapps written post BN implementation
to know which BN they operate on.
This way they will be able to make BN-specific membership checks and work with "Additional Information" that may be provided by the BNO.
"Additional Information" may incorporate BN-specific information like Roles (e.g. "Agent" and "Lender")
or transaction limits, reference data, etc.
We will provide flexibility such that BN designers will be able to hold/distribute information that they feel is relevant for their BN.

All the pre-BN CorDapps will work as before, with BN membership enforced on a Node level. Configuration details TBC.
So it is not a case of Option #1 vs. Option #2, but a form of hybrid approach.

### In terms of addressing the BN Privacy requirement:

Unfortunately, we will have to supply information about **every** node to the Global Network Map (GNM), regardless of whether it is part of a BN or not.
This is necessary to protect against a non-member attacking a member: the content of the GNM is used during the TLS handshake phase to prevent
connections from IP addresses that are not part of the Compatibility Zone (CZ).
The GNM will also be a key point where certificate revocation will be enforced.

In order to prevent non-members of a BN from discovering the content of a BN within the CZ, when the membership check fails the node is meant to reply as if the CorDapp is not installed at all on this node.
This will make the use cases:
- I am not talking to you because you are not part of my BN;

and

- I do not know which flow you are talking about;

indistinguishable from the attacker's point of view.

BN composition will be represented by a set which will be distributed to all the members.
Fetching of "Additional Information" will be a separate operation which new-style CorDapps may, but do not have to, use.

@@ -1,71 +0,0 @@
![Corda]()

--------------------------------------------
Design Decision: Using TLS signing vs. Membership lists for Business Network composition
============================================

## Background / Context

As per the High Level Design document for [Business Networks](../design.md), a mechanism has to be established
for the composition of a Business Network.

## Options Analysis

### 1. Use Transport Layer Security (TLS) signing

The idea is to employ a Public/Private key mechanism and certification path to be able to prove that a certain
member belongs to a Business Network.
A simplified approach can be as follows:
1. NodeA wants to perform communication with NodeB on the assumption that they both belong to the same
   Business Network (BN1);
2. During the initial handshake each node presents a certificate signed by the BNO node confirming that the given
   node indeed belongs to the said BN1;
3. The nodes cross-check the certificates to ensure that each signature is indeed valid.

#### Advantages

1. Complete de-centralization.
   Even if the BNO node is down, as long as its public key is known, the signature can be verified.
2. The approach can scale to a great majority of the nodes.

#### Disadvantages

1. Revocation of membership becomes problematic;
   This could be mitigated by introducing some form of a "blacklist" or by issuing certificates with expiration. But this will
   add pressure on the BNO node to be more available in order to be able to renew certificates.
2. The privacy requirement will not be achieved.
   Both NodeA and NodeB will have to advertise themselves on the global Network Map, which might be undesired.
3. Cannot produce a list of BN participants;
   Since BN participation is established after a certificate is checked, it is not quite possible to establish the
   composition of the Business Network without talking to **each** node in the whole universe of the Compatibility Zone (CZ).

This has been discussed with Mike Hearn in great detail on [this PR](https://github.com/corda/enterprise/pull/101#pullrequestreview-77476717).

### 2. Make BNO node maintain membership list for Business Network

The idea is that the BNO node will hold a "golden" copy of the Business Network membership list and will vend its
content to the parties who are entitled to know the BN composition.
That said, if an outsider makes an enquiry about the composition of a Business Network, such a request is likely
to be rejected.
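
As an illustrative sketch only (not part of this design, and with all class names invented), a BNO could vend the
membership list to entitled members via a flow pair along these lines:

```kotlin
import co.paralleluniverse.fibers.Suspendable
import net.corda.core.flows.*
import net.corda.core.identity.Party
import net.corda.core.utilities.unwrap

// Hypothetical sketch: the names and behaviour are illustrative, not a specified API.
@InitiatingFlow
@StartableByRPC
class RequestMembershipList(private val bno: Party) : FlowLogic<List<Party>>() {
    @Suspendable
    override fun call(): List<Party> {
        val session = initiateFlow(bno)
        return session.receive<List<Party>>().unwrap { it }
    }
}

@InitiatedBy(RequestMembershipList::class)
class VendMembershipList(private val session: FlowSession) : FlowLogic<Unit>() {
    @Suspendable
    override fun call() {
        // The BNO only vends the list to counterparties that are themselves members; anyone else
        // is rejected, preserving the privacy of the Business Network.
        val members = loadMembershipList()  // e.g. from a database or CSV file held by the BNO.
        if (session.counterparty !in members) throw FlowException("Not a member of this Business Network")
        session.send(members)
    }

    private fun loadMembershipList(): List<Party> = TODO("BNO-specific storage")
}
```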
|
||||
|
||||
#### Advantages
|
||||
1. Satisfies all the requirements known so far for Business Network functionality, including:
|
||||
* Joining/leaving Business Network;
|
||||
* Privacy;
|
||||
Privacy is enforced by the BNO node, which will only vend membership information to the parties that need to know.
Also, a member node no longer has to register with the global NetworkMap and may register with the BNO instead.
|
||||
* Ability to discover Business Network peers;
|
||||
* The BNO owner has ultimate control over how membership information is stored (e.g. a DB or a CSV file).
|
||||
|
||||
#### Disadvantages
|
||||
1. The BNO node gains a critical role and must be highly available for flows to work within the Business Network.
|
||||
2. As the Business Network expands, the BNO node needs to be reasonably performant to cope with the load.
This can be mitigated by holding local caches of Business Network membership on the node side to make requests
to the BNO node less frequent (a sketch of such a cache follows this list).
|
||||
3. There is no pub-sub facility which would allow a member node to learn about new nodes joining the Business Network.
However, at all times it is possible to approach the BNO node and download the complete list of current Business Network members.
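The caching mitigation mentioned in point 2 could be as simple as the following sketch; the class, its time-to-live and the `fetchFromBno` callback are illustrative assumptions rather than part of the design:

```kotlin
import net.corda.core.identity.Party
import java.time.Duration
import java.time.Instant

// Illustrative only: a node-side cache that re-fetches the membership list from
// the BNO when the cached copy is older than the configured time-to-live.
class CachedMembership(
    private val fetchFromBno: () -> Set<Party>,   // e.g. a flow that queries the BNO node
    private val ttl: Duration = Duration.ofMinutes(10)
) {
    private var members: Set<Party> = emptySet()
    private var fetchedAt: Instant = Instant.MIN

    @Synchronized
    fun isMember(party: Party): Boolean {
        if (Duration.between(fetchedAt, Instant.now()) > ttl) {
            members = fetchFromBno()
            fetchedAt = Instant.now()
        }
        return party in members
    }
}
```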
|
||||
|
||||
## Recommendation and justification
|
||||
|
||||
Proceed with Option 2, following discussion with Mike Hearn and Richard Brown on [this PR](https://github.com/corda/enterprise/pull/101).
The PR was about a Proof of Concept implementation for Business Networks to demonstrate how it might work.
|
@ -1,227 +0,0 @@
|
||||

|
||||
|
||||
# Business Network design
|
||||
|
||||
DOCUMENT MANAGEMENT
|
||||
---
|
||||
|
||||
## Document Control
|
||||
|
||||
| Title | |
|
||||
| -------------------- | ---------------------------------------- |
|
||||
| Date | 14-Dec-2017 |
|
||||
| Authors | David Lee, Viktor Kolomeyko, Jose Coll, Mike Hearn |
|
||||
| Distribution | Design Review Board, Product Management, Services - Technical (Consulting), Platform Delivery |
|
||||
| Corda target version | Enterprise |
|
||||
| JIRA reference | [Sample Business Network implementation](https://r3-cev.atlassian.net/browse/R3NET-546) |
|
||||
|
||||
## Approvals
|
||||
|
||||
#### Document Sign-off
|
||||
|
||||
| Author | |
|
||||
| ----------------- | ---------------------------------------- |
|
||||
| Reviewer(s) | David Lee, Viktor Kolomeyko, Mike Hearn, Jose Coll, Dave Hudson and James Carlyle |
|
||||
| Final approver(s) | Richard G. Brown |
|
||||
|
||||
|
||||
HIGH LEVEL DESIGN
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Business Networks are introduced in order to segregate Corda nodes that do not need to transact with each other, or indeed
even know of each other's existence.
|
||||
|
||||
The key concept of a Business Network is the **Business Network Operator ('BNO') node**, as a means for BNOs to serve reference data to members
of their business network(s), as required by the CorDapp(s) specific to that business network. This includes allowing BNO
nodes to vend additional information associated with a Business Network participant, such as roles, transaction limits, etc.
|
||||
|
||||
## Background
|
||||
|
||||
Multiple prospective clients of Corda Connect expressed concerns about the privacy of the nodes and CorDapps they are going to run.
The ability to discover every node in the Compatibility Zone through a global network map was not seen as a good design, as it
would allow competing firms to gain insight into the on-boarding of new clients, businesses and flows.
|
||||
|
||||
In order to address those privacy concerns, Business Networks were introduced as a way to partition nodes into groups
|
||||
based on a need-to-know principle.
|
||||
|
||||
This design document reflects on what was previously discussed on this Confluence page
|
||||
[Business Network Membership](https://r3-cev.atlassian.net/wiki/spaces/CCD/pages/131972363/Business+Network+Membership).
|
||||
|
||||
## Scope
|
||||
|
||||
### Goals
|
||||
|
||||
* Allow Corda Connect participants to create private business networks and allow the owner of the network to decide which
parties will be included in it;
|
||||
|
||||
* Prevent parties outside of the Business Network from discovering the content of the Business Network;
|
||||
|
||||
* Provide a reference implementation for a BNO node which enables the node operator to perform actions stated above.
|
||||
|
||||
### Non-goals
|
||||
|
||||
* To mandate Business Networks. Business Networks are offered as an optional extra which some CorDapps may choose to use;
|
||||
|
||||
* To constrain all BNOs to adopt a single consistent CorDapp, for which the design (flows, persistence etc.) is controlled by R3;
|
||||
|
||||
* To define inclusion/exclusion or authorisation criteria for admitting a node into a Business Network.
|
||||
|
||||
## Timeline
|
||||
|
||||
This is a long-term solution, initially aimed at supporting the Project Agent go-live implementation in Q2 2018.
|
||||
|
||||
## Requirements
|
||||
|
||||
See [Identity high-level requirements](https://r3-cev.atlassian.net/wiki/spaces/CCD/pages/131746442/Identity+high-level+requirements)
|
||||
for the full set of requirements discussed to date.
|
||||
|
||||
Given the following roles:
|
||||
|
||||
| Role name | Definition |
|
||||
| --------------------------- | ---------- |
|
||||
| Compatibility Zone Operator | Responsible for administering the overall Compatibility Zone. Usually R3. |
|
||||
| Business Network Operator | Responsible for a specific business network within the Corda Connect Compatibility Zone. |
|
||||
| Node User | Uses a Corda node to carry out transactions on their own behalf. In the case of a self-owned deployment, the Node User can also be the Node Operator. |
|
||||
| Node Operator | Responsible for management of a particular Corda node (deployment, configuration etc.) on behalf of the Node User. |
|
||||
|
||||
The following requirements are addressed as follows:
|
||||
|
||||
| Role | Requirement | How design satisfies requirement |
|
||||
| --------------------------- | -------------------------------------------------------------------------------------------------- | -------------------------------- |
|
||||
| Compatibility Zone Operator | To support multiple business networks operating across the CZ. | Current design supports any number of membership lists, each of which defines a business network. |
|
||||
| Compatibility Zone Operator | To permission a Business Network Operator to provision my services to its members. | Individual CorDapps will be responsible for checking Business Network membership constraints. |
|
||||
| Compatibility Zone Operator | To revoke the ability for a specific Business Network Operator to operate within my CZ. | The membership lists are configurable and business networks can have their membership lists removed when needed. |
|
||||
| Business Network Operator | To be able to permission nodes to access my Business Network. | The BNO will ensure that the CorDapp is designed to keep each node's copy of the membership list up-to-date on a timely basis as new members are added to the business network. |
|
||||
| Business Network Operator | To revoke access to my business network by any member. | The BNO will ensure that the CorDapp is designed to keep each node's copy of the membership list up-to-date on a timely basis as members are removed from the business network. The BNO will be able to decide, through the timing of the membership list checks in the CorDapp flow design, whether in-flight flows should be exited upon revocation, or simply to avoid starting new flows. |
|
||||
| Business Network Operator | To protect knowledge of the membership of my Business Network from parties who don't need to know. | The BNO node upon receiving an enquiry to deliver membership list will first check identity of the caller against membership list. If the identity does not belong to the membership list the content of the membership will not be served. Optionally, Business Network membership list may be made available for discovery by the nodes outside of membership list. |
|
||||
| Business Network Operator | To help members of my Business Network to discover each other. | The BNO node will be able to serve the full list of membership participants at a minimum and may also have an API to perform the fuzzy matches. |
|
||||
| Node User | To request the termination of my identity within the CZ / business network. | A member may ask a BNO to exclude them from a business network, and it will be in the BNO's commercial interest to do so. They can be obliged to do so reliably under the terms of the R3Net agreement. |
|
||||
| Node Operator | To control the IP address on which my node receives messages in relation to a specific CorDapp. | Nodes may choose to publish different `NodeInfos` to different business networks. Business network-specific CorDapps will ensure that the `NodeInfo` served by the business network is used for addressing that node. |
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| Description | Recommendation | Approval |
|
||||
| ---------------------------------------- | --------------- | ----------------------- |
|
||||
| [TLS vs. Membership](decisions/tlsVsMembership.md) | Proceed with Membership | Mike Hearn |
|
||||
| [BN membership: Node vs. CorDapp](decisions/nodeVsCorDapp.md) | Hybrid approach for options #1 and #2 | Mike Hearn |
|
||||
|
||||
* Per [New Network Map](https://r3-cev.atlassian.net/wiki/spaces/AWG/pages/127710793/New+Network+Map):
|
||||
* R3 will serve a file containing NetworkParameters to all members of the Corda Connect Compatibility Zone, made publicly available via CDN.
|
||||
*NetworkParameters are only served by the network map and cannot be overwritten by BNOs*.
|
||||
* The network map file will continue to serve references to downloadable entries
|
||||
(Signed `NodeInfo` file with the node's certificate chain, IP addresses etc. for a given node) for any nodes which wish to be publicly discoverable by all CZ members.
|
||||
|
||||
* BNO nodes expose a range of services to their membership over normal Corda flows (via a custom BNO CorDapp deployed on their BNO node).
|
||||
The BNO is free to define whatever services apply in the context of their business network; these will typically include:
|
||||
* Managing requests to join/leave the business network;
|
||||
* Vending a membership list of distinguished names (DNs) that a given party within the business network is allowed
|
||||
to see / transact with, for use in 'address book' / 'drop-down' type UI functionality.
|
||||
*The structure of the membership list may be tailored according to the needs of the BNO and, according to the needs
of the CorDapp, it may be expressed as a standalone data structure in its own right.*
|
||||
* Vending `AdditionalInformation` for a Business Network Participant which may include roles associated with it, trading limits, etc.
|
||||
|
||||
* For each **Business Network-specific CorDapp**, the CorDapp developer will include features to restrict usage such that the
CorDapp cannot be used to transact with non-members. Namely:
|
||||
* any 'address-book' features in that CorDapp are filtered according to the membership list;
|
||||
* any `InitiatedBy` flow will first check that the initiating party is on the membership list, and throw a `FlowException` if it is not.
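As a minimal sketch of the membership check described in the last bullet (the flow names and the `MembershipService` stand-in are assumptions, not the reference implementation):

```kotlin
import co.paralleluniverse.fibers.Suspendable
import net.corda.core.flows.*
import net.corda.core.identity.Party

// Hypothetical stand-in: in a real CorDapp the membership set would come from the
// BNO, for example via a locally cached copy as sketched earlier in this document.
object MembershipService {
    val members: Set<Party> = emptySet()
}

@InitiatingFlow
class SomeBusinessNetworkFlow(private val counterparty: Party) : FlowLogic<Unit>() {
    @Suspendable
    override fun call() {
        initiateFlow(counterparty).send("hello")
    }
}

// The responder checks the membership list before doing any work and throws a
// FlowException for non-members, as described in the bullet above.
@InitiatedBy(SomeBusinessNetworkFlow::class)
class SomeBusinessNetworkResponder(private val otherSide: FlowSession) : FlowLogic<Unit>() {
    @Suspendable
    override fun call() {
        if (otherSide.counterparty !in MembershipService.members) {
            throw FlowException("${otherSide.counterparty} is not a member of this business network")
        }
        otherSide.receive<String>()
        // ... normal responder logic would go here ...
    }
}
```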
|
||||
|
||||
## Target Solution
|
||||
|
||||

|
||||
|
||||
## Complementary solutions
|
||||
|
||||
* No requirement to change the Network Map design currently proposed in
|
||||
[New Network Map](https://r3-cev.atlassian.net/wiki/spaces/AWG/pages/127710793/New+Network+Map).
|
||||
|
||||
* [Cash business network requirements](https://r3-cev.atlassian.net/wiki/spaces/CCD/pages/198443424/Cash+business+network+requirements)
|
||||
|
||||
## Final recommendation
|
||||
|
||||
* Proceed with reference implementation as detailed by [this Jira](https://r3-cev.atlassian.net/browse/R3NET-546).
|
||||
* Provision of mechanisms to bind accessibility of CorDapp flows to named parties on a membership list.
|
||||
|
||||
TECHNICAL DESIGN (this section is WIP)
|
||||
---
|
||||
|
||||
## Interfaces
|
||||
|
||||
* No impact on Public APIs
|
||||
* Internal APIs impacted
|
||||
* Modules impacted
|
||||
|
||||
* Illustrate with Software Component diagrams
|
||||
|
||||
## Functional
|
||||
|
||||
* UI requirements
|
||||
|
||||
* Illustrate with UI Mockups and/or Wireframes
|
||||
|
||||
* (Subsystem) Component descriptions and interactions
|
||||
|
||||
Consider and list existing impacted components and services within Corda:
|
||||
|
||||
* Doorman
|
||||
* Network Map
|
||||
* Public APIs (ServiceHub, RPCOps)
|
||||
* Vault
|
||||
* Notaries
|
||||
* Identity services
|
||||
* Flow framework
|
||||
* Attachments
|
||||
* Core data structures, libraries or utilities
|
||||
* Testing frameworks
|
||||
* Pluggable infrastructure: DBs, Message Brokers, LDAP
|
||||
|
||||
* Data model & serialization impact and changes required
|
||||
|
||||
* Illustrate with ERD diagrams
|
||||
|
||||
* Infrastructure services: persistence (schemas), messaging
|
||||
|
||||
## Non-Functional
|
||||
|
||||
* Performance
|
||||
* Scalability
|
||||
* High Availability
|
||||
|
||||
## Operational
|
||||
|
||||
* Deployment
|
||||
|
||||
* Versioning
|
||||
|
||||
* Maintenance
|
||||
|
||||
* Upgradability, migration
|
||||
|
||||
* Management
|
||||
|
||||
* Audit, alerting, monitoring, backup/recovery, archiving
|
||||
|
||||
## Security
|
||||
|
||||
* Data privacy
|
||||
* Authentication
|
||||
* Access control
|
||||
|
||||
## Software Development Tools and Programming Standards to be adopted.
|
||||
|
||||
* languages
|
||||
* frameworks
|
||||
* 3rd party libraries
|
||||
* architectural / design patterns
|
||||
* supporting tools
|
||||
|
||||
## Testability
|
||||
|
||||
* Unit
|
||||
* Integration
|
||||
* Smoke
|
||||
* Non-functional (performance)
|
||||
|
||||
APPENDICES
|
||||
---
|
@ -1,50 +0,0 @@
|
||||
Design Decision: Certificate hierarchy levels
|
||||
============================================
|
||||
|
||||
## Background / Context
|
||||
|
||||
The decision of how many levels to include is a key feature of the [proposed certificate hierarchy](../design.md).
|
||||
|
||||
## Options Analysis
|
||||
|
||||
### Option 1: 2-level hierarchy
|
||||
|
||||
Under this option, intermediate CA certificates for key signing services (Doorman, Network Map, CRL) are generated as
|
||||
direct children of the root certificate.
|
||||
|
||||

|
||||
|
||||
#### Advantages
|
||||
|
||||
- Simplest option
|
||||
- Minimal change to existing structure
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
- The Root CA certificate is used to sign both intermediate certificates and the CRL. This may be considered a drawback,
as the Root CA should be used only to issue other certificates.
|
||||
|
||||
### Option 2: 3-level hierarchy
|
||||
|
||||
Under this option, an additional 'Company CA' cert is generated from the root CA cert, which is then used to generate
|
||||
intermediate certificates.
|
||||
|
||||

|
||||
|
||||
#### Advantages
|
||||
|
||||
- Allows for the option to remove the root CA from the network altogether and store it in an offline medium - may be preferred by some stakeholders
|
||||
- Allows (theoretical) revocation and replacement of the company CA cert without needing to replace the trust root.
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
- Greater complexity
|
||||
|
||||
## Recommendation and justification
|
||||
|
||||
Proceed with option 1: 2-level hierarchy.
|
||||
|
||||
No authoritative argument from a security standpoint has been made which would justify the added complexity of option 2.
|
||||
Given the business impact of revoking the Company CA certificate, this must be considered an extremely unlikely event
|
||||
with comparable implications to the revocation of the root certificate itself; hence no practical justification for the
|
||||
addition of the third level is observed.
|
@ -1,42 +0,0 @@
|
||||
Design Decision: Certificate Hierarchy
|
||||
======================================
|
||||
|
||||
## Background / Context
|
||||
|
||||
The purpose of this document is to make a decision on the certificate hierarchy. It is necessary to make this decision as it
affects the development of features (e.g. the Certificate Revocation List).
|
||||
|
||||
## Options Analysis
|
||||
|
||||
There are various options in how we structure the hierarchy above the node CA.
|
||||
|
||||
### Option 1: Single trust root
|
||||
|
||||
Under this option, TLS certificates are issued by the node CA certificate.
|
||||
|
||||
#### Advantages
|
||||
|
||||
- Existing design
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
- The Root CA certificate is used to sign both intermediate certificates and the CRL. This may be considered a drawback, as the Root CA should be used only to issue other certificates.
|
||||
|
||||
### Option 2: Separate TLS vs. identity trust roots
|
||||
|
||||
This option splits the hierarchy by introducing a separate trust root for TLS certificates.
|
||||
|
||||
#### Advantages
|
||||
|
||||
- Simplifies issuance of TLS certificates, which have implementation constraints beyond those of other certificates used by Corda (specifically, EdDSA keys are not yet widely supported for TLS certificates)
|
||||
- Avoids the requirement to specify accurate usage restrictions on node CA certificates so that they can issue their own TLS certificates
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
- Additional complexity
|
||||
|
||||
## Recommendation and justification
|
||||
|
||||
Proceed with option 1 (Single Trust Root) for current purposes.
|
||||
|
||||
Feasibility of option 2 in the code should be further explored in due course.
|
@ -1,84 +0,0 @@
|
||||
# Certificate hierarchies
|
||||
|
||||
.. important:: This design doc applies to the main Corda network. Other networks may use different certificate hierarchies.
|
||||
|
||||
## Overview
|
||||
|
||||
A certificate hierarchy is proposed to enable effective key management in the context of managing Corda networks.
|
||||
This includes certificate usage for the data signing process and the certificate revocation process
in case of a key compromise. At the same time, the result should remain compliant with
[OCSP](https://en.wikipedia.org/wiki/Online_Certificate_Status_Protocol) and [RFC 5280](https://www.ietf.org/rfc/rfc5280.txt).
|
||||
|
||||
## Background
|
||||
|
||||
Corda utilises public key cryptography for signing and authentication purposes, and securing communication
|
||||
via TLS. As a result, every entity participating in a Corda network owns one or more cryptographic key pairs {*private,
|
||||
public*}. Integrity and authenticity of an entity's public key is assured using digital certificates following the
|
||||
[X.509 standard](https://tools.ietf.org/html/rfc5280), whereby the receiver’s identity is cryptographically bound to
its public key.
|
||||
|
||||
Certificate Revocation List (CRL) functionality interacts with the hierarchy of the certificates, as the revocation list
|
||||
for any given certificate must be signed by the certificate's issuer. Therefore if we have a single doorman CA, the sole
|
||||
CRL for node CA certificates would be maintained by that doorman CA, creating a bottleneck. Further, if that doorman CA
|
||||
is compromised and its certificate revoked by the root certificate, the entire network is invalidated as a consequence.
|
||||
|
||||
The current solution of a single intermediate CA is therefore too simplistic.
|
||||
|
||||
Further, the split and location of intermediate CAs have an impact on where long-term infrastructure is hosted, as the CRLs
|
||||
for certificates issued by these CAs must be hosted at the same URI for the lifecycle of the issued certificates.
|
||||
|
||||
## Scope
|
||||
|
||||
Goals:
|
||||
|
||||
* Define effective certificate relationships between participants and Corda network services (i.e. nodes, notaries, network map, doorman).
|
||||
* Enable compliance with both [OCSP](https://en.wikipedia.org/wiki/Online_Certificate_Status_Protocol) and [RFC 5280](https://www.ietf.org/rfc/rfc5280.txt) (CRL)-based revocation mechanisms
|
||||
* Mitigate relevant security risks (keys being compromised, data privacy loss etc.)
|
||||
|
||||
Non-goals:
|
||||
|
||||
* Define an end-state mechanism for certificate revocation.
|
||||
|
||||
## Requirements
|
||||
|
||||
In case of a private key being compromised, or a certificate incorrectly issued, it must be possible for the issuer to
|
||||
revoke the appropriate certificate(s).
|
||||
|
||||
The solution needs to scale, keeping in mind that the list of revoked certificates from any given certificate authority
|
||||
is likely to grow indefinitely. However, for an initial deployment a temporary certificate authority may be used, and
given that it will not be required to issue certificates in the long term, scaling issues are less of a concern in this
|
||||
context.
|
||||
|
||||
## Design Decisions
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
decisions/levels.md
|
||||
decisions/tls-trust-root.md
|
||||
|
||||
## Target Solution
|
||||
|
||||

|
||||
|
||||
The design introduces discrete intermediate CAs below the network trust root for each logical service exposed by the doorman - specifically:
|
||||
|
||||
1. Node CA certificate issuance
|
||||
2. Network map signing
|
||||
3. Certificate Revocation List (CRL) signing
|
||||
4. OCSP revocation signing
|
||||
|
||||
The use of discrete certificates in this way facilitates subsequent changes to the model, including retiring and replacing certificates as needed.
|
||||
|
||||
Each of the above certificates will specify a CRL allowing the certificate to be revoked. The root CA operator
|
||||
(primarily R3) will be required to maintain this CRL for the lifetime of the process.
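For illustration, validating a certificate chain against the trust root with CRL-based revocation enabled can be done with the standard `java.security` APIs; this is a sketch of the relying-party side, not Corda's actual validation code:

```kotlin
import java.security.cert.*

// A minimal sketch using the standard java.security APIs: validate a certificate
// chain (leaf first) against the network trust root with CRL-based revocation
// checking switched on.
fun validateAgainstTrustRoot(chain: List<X509Certificate>, trustRoot: X509Certificate, crls: Collection<X509CRL>) {
    val certPath = CertificateFactory.getInstance("X.509").generateCertPath(chain)
    val params = PKIXParameters(setOf(TrustAnchor(trustRoot, null))).apply {
        isRevocationEnabled = true
        addCertStore(CertStore.getInstance("Collection", CollectionCertStoreParameters(crls)))
    }
    // Throws CertPathValidatorException if any certificate is invalid or revoked.
    CertPathValidator.getInstance("PKIX").validate(certPath, params)
}
```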
|
||||
|
||||
TLS certificates will remain issued under Node CA certificates (see [decision: TLS trust
|
||||
root](./decisions/tls-trust-root.md)).
|
||||
|
||||
Nodes will be able to specify CRL(s) for TLS certificates they issue; in general, they will be required to maintain such CRLs for
the lifecycle of the TLS certificates.
|
||||
|
||||
In the initial state, a single doorman intermediate CA will be used for issuing all node certificates. Further
|
||||
intermediate CAs for issuance of node CA certificates may subsequently be added to the network, where appropriate,
|
||||
potentially split by geographic region or otherwise.
|
@ -1,35 +0,0 @@
|
||||
# Design review process
|
||||
|
||||
The Corda design review process defines a means of collaborating on and approving Corda design thinking in a consistent,
structured, easily accessible and open manner.
|
||||
|
||||
The process has several steps:
|
||||
|
||||
1. High level discussion with the community and developers on corda-dev.
|
||||
2. Writing a design doc and submitting it for review via a PR to this directory. See other design docs and the
|
||||
design doc template (below).
|
||||
3. Responding to feedback on the GitHub discussion.
|
||||
4. You may be invited to a design review board meeting. This is a video conference in which design may be debated in
|
||||
real time. Notes will be sent afterwards to corda-dev.
|
||||
5. When the design is settled it will be approved and can be merged as normal.
|
||||
|
||||
The following diagram illustrates the process flow:
|
||||
|
||||

|
||||
|
||||
At least some of the following people will take part in a DRB meeting:
|
||||
|
||||
* Richard G Brown (CTO)
|
||||
* James Carlyle (Chief Engineer)
|
||||
* Mike Hearn (Lead Platform Engineer)
|
||||
* Mark Oldfield (Lead Platform Architect)
|
||||
* Jonathan Sartin (Information Security manager)
|
||||
* Select external key contributors (directly involved in design process)
|
||||
|
||||
The Corda Technical Advisory Committee may also be asked to review a design.
|
||||
|
||||
Here's the outline of the design doc template:
|
||||
|
||||
.. toctree::
|
||||
|
||||
template/design.md
|
@ -1,28 +0,0 @@
|
||||
**Design review functionality checklist**
|
||||
|
||||
Does the design impact performance?
|
||||
|
||||
Does the design impact availability/disaster recovery?
|
||||
|
||||
Does the design impact operability (monitoring or management)?
|
||||
|
||||
Does the design impact security?
|
||||
|
||||
Does the design impact privacy of data on the ledger?
|
||||
|
||||
Does the design break API stability?
|
||||
|
||||
Does the design break wire stability?
|
||||
|
||||
Does the design break binary compatibility for OS CorDapps with Enterprise?
|
||||
|
||||
Does the design introduce any new dependencies on 3rd party libraries?
|
||||
|
||||
Does the design work with a mixed network of OS and Enterprise Corda nodes?
|
||||
|
||||
Does the design imply a change in deployment architecture/configuration?
|
||||
|
||||
Does the design introduce any potentially patentable IP?
|
||||
|
||||
|
||||
|
@ -1,118 +0,0 @@
|
||||
# Failure detection and master election
|
||||
|
||||
.. important:: This design document describes a feature of Corda Enterprise.
|
||||
|
||||
## Background
|
||||
|
||||
Two key issues need to be resolved before Hot-Warm can be implemented:
|
||||
|
||||
* Automatic failure detection (currently our Hot-Cold set-up requires a human observer to detect a failed node)
|
||||
* Master election and node activation (currently done manually)
|
||||
|
||||
This document proposes two solutions to the above-mentioned issues. The strengths and drawbacks of each solution are explored.
|
||||
|
||||
## Constraints/Requirements
|
||||
|
||||
Typical modern HA environments rely on a majority quorum of the cluster to be alive and operating normally in order to
|
||||
service requests. This means:
|
||||
|
||||
* A cluster of 1 replica can tolerate 0 failures
|
||||
* A cluster of 2 replicas can tolerate 0 failures
|
||||
* A cluster of 3 replicas can tolerate 1 failure
|
||||
* A cluster of 4 replicas can tolerate 1 failure
|
||||
* A cluster of 5 replicas can tolerate 2 failures
|
||||
|
||||
This already poses a challenge to us as clients will most likely want to deploy the minimum possible number of R3 Corda
|
||||
nodes. Ideally that minimum would be 3 but a solution for only 2 nodes should be available (even if it provides a lesser
|
||||
degree of HA than 3, 5 or more nodes). The problem with having only two nodes in the cluster is that there is no distinction
between a node failure and a network partition.
|
||||
|
||||
Users should be allowed to set a preference for which node should be active in a hot-warm environment. This would probably
be done with the help of a property (persisted in the DB so that it can be changed on the fly). This is an important piece of
functionality, as users might want to have the active node on better hardware, fail over to the back-ups when needed, and switch
back as soon as possible.
|
||||
|
||||
It would also be helpful for the chosen solution to not add deployment complexity.
|
||||
|
||||
## Design decisions
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
drb-meeting-20180131.md
|
||||
|
||||
## Proposed solutions
|
||||
|
||||
Based on what is needed for Hot-Warm, 1 active node and at least one passive node (started but in stand-by mode), and
|
||||
the constraints identified above (automatic failover with at least 2 nodes and master preference), two frameworks have
|
||||
been explored: Zookeeper and Atomix. Neither applies to our use cases perfectly, and both require some tinkering to solve our
issues, especially the preferred master election.
|
||||
|
||||
### Zookeeper
|
||||
|
||||

|
||||
|
||||
Preferred leader election - while the default algorithm does not take into account a leader preference, a custom
|
||||
algorithm can be implemented to suit our needs.
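For illustration, a node could join an election using a standard Zookeeper recipe and then layer the preference logic on top. The sketch below uses Apache Curator's `LeaderLatch`; the wrapper library choice is deliberately left open by this document, so Curator here is an assumption rather than the proposed implementation:

```kotlin
import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.framework.recipes.leader.LeaderLatch
import org.apache.curator.retry.ExponentialBackoffRetry

// Illustrative only: join a Zookeeper-backed election and let the caller poll for
// leadership. Preference logic (e.g. favouring the "beefy" host) would have to be
// layered on top of, or replace, the default recipe.
fun joinElection(zkConnectString: String, nodeId: String): LeaderLatch {
    val client = CuratorFrameworkFactory.newClient(zkConnectString, ExponentialBackoffRetry(1000, 3))
    client.start()
    val latch = LeaderLatch(client, "/corda/ha/leader", nodeId)
    latch.start()
    return latch  // callers poll latch.hasLeadership() to decide whether to go active
}
```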
|
||||
|
||||
Environment with 2 nodes - while this type of set-up can't distinguish between a node failure and network partition, a
|
||||
workaround can be implemented by having 2 nodes and 3 Zookeeper instances (the 3rd being needed to form a majority).
|
||||
|
||||
Pros:
|
||||
- Very well documented
|
||||
- Widely used, hence a lot of cookbooks, recipes and solutions to all sorts of problems
|
||||
- Supports custom leader election
|
||||
|
||||
Cons:
|
||||
- Added deployment complexity
|
||||
- Bootstrapping a cluster is not very straightforward
|
||||
- Too complex for our needs?
|
||||
|
||||
### Atomix
|
||||
|
||||

|
||||
|
||||
Preferred leader election - cannot be implemented easily; a creative solution would be required.
|
||||
|
||||
Environment with 2 nodes - using only embedded replicas, there's no solution; Atomix also comes as a standalone server
which could be run outside the node as a 3rd entity to allow a quorum (see image above).
|
||||
|
||||
Pros:
|
||||
- Easy to get started with
|
||||
- Embedded, no added deployment complexity
|
||||
- Already used partially (Atomix Catalyst) in the notary cluster
|
||||
|
||||
Cons:
|
||||
- Not as popular as Zookeeper, less used
|
||||
- Documentation is underwhelming; no proper usage examples
|
||||
- No easy way of influencing leader election; will require some creative use of Atomix functionality either via distributed groups or other resources
|
||||
|
||||
## Recommendations
|
||||
|
||||
If Zookeeper is chosen, we would need to look into a solution for easy configuration and deployment (maybe docker
|
||||
images). Custom leader election can be implemented by following one of the
|
||||
[examples](https://github.com/SainTechnologySolutions/allprogrammingtutorials/tree/master/apache-zookeeper/leader-election)
|
||||
available online.
|
||||
|
||||
If Atomix is chosen, a solution to enforce some sort of preferred leader needs to be found. One way to do it would be to
|
||||
have the Corda cluster leader be a separate entity from the Atomix cluster leader. Implementing the election would then
|
||||
be done using the distributed resources made available by the framework.
|
||||
|
||||
## Conclusions
|
||||
|
||||
Whichever solution is chosen, using 2 nodes in a Hot-Warm environment is not ideal. A minimum of 3 is required to ensure proper failover.
|
||||
|
||||
Almost every configuration option that these frameworks offer should be exposed through node.conf.
|
||||
|
||||
We've looked into using Galera which is currently used for the notary cluster for storing the committed state hashes. It
|
||||
offers multi-master read/write and certification-based replication which is not leader based. It could be used to
|
||||
implement automatic failure detection and master election (similar to our current mutual exclusion). However, we found
|
||||
that it doesn't suit our needs because:
|
||||
|
||||
- it adds to deployment complexity
|
||||
- usable only with MySQL and InnoDB storage engine
|
||||
- we'd have to implement node failure detection and master election from scratch; in this regard both Atomix and Zookeeper are better suited
|
||||
|
||||
Our preference would be Zookeeper, despite it not being as lightweight and deployment-friendly as Atomix. The widespread
use, proper documentation and flexibility to use it not only for automatic failover and master election but also for
configuration management (something we might consider moving forward) make it a better fit for our needs.
|
@ -1,104 +0,0 @@
|
||||
# Design Review Board Meeting Minutes
|
||||
|
||||
**Date / Time:** Jan 31 2018, 11.00
|
||||
|
||||
## Attendees
|
||||
|
||||
- Matthew Nesbit (MN)
|
||||
- Bogdan Paunescu (BP)
|
||||
- James Carlyle (JC)
|
||||
- Mike Hearn (MH)
|
||||
- Wawrzyniec Niewodniczanski (WN)
|
||||
- Jonathan Sartin (JS)
|
||||
- Gavin Thomas (GT)
|
||||
|
||||
|
||||
## **Decision**
|
||||
|
||||
Proceed with recommendation to use Zookeeper as the master selection solution
|
||||
|
||||
|
||||
## **Primary Requirement of Design**
|
||||
|
||||
- Client can run just 2 nodes, master and slave
|
||||
- Current deployment model to not change significantly
|
||||
- Prioritised mastering or be able to automatically elect a master. Useful to allow clients to do rolling upgrades, or for use when a high spec machine is used for master
|
||||
- Nice to have: use for flow sharding and soft locking
|
||||
|
||||
## **Minutes**
|
||||
|
||||
MN presented a high level summary of the options:
|
||||
- Galera:
|
||||
- Negative: does not have leader election and failover capability.
|
||||
|
||||
- Atomix IO:
|
||||
- Positive: does integrate into node easily, can setup ports
|
||||
- Negative: requires min 3 nodes, cannot manipulate election (e.g. drop the master for rolling deployments / upgrades), cannot select the 'beefy' host for master where cost efficiencies have been used for the slave / DR, young library with limited functionality, poor documentation and examples
|
||||
|
||||
- Zookeeper (recommended option): industry standard widely used and trusted. May be able to leverage clients' incumbent Zookeeper infrastructure
|
||||
- Positive: has flexibility for storage and a potential for future proofing; good permissioning capabilities; standalone cluster of Zookeeper servers allows 2 nodes solution rather than 3
|
||||
- Negative: adds deployment complexity due to need for Zookeeper cluster split across data centers
|
||||
Wrapper library choice for Zookeeper requires some analysis
|
||||
|
||||
|
||||
MH: predictable source of API for RAFT implementations and Zookeeper compared to Atomix. It would be better to have the master
selector implemented as an abstraction.
|
||||
|
||||
MH: hybrid approach possible - 3rd node for oversight, i.e. 2 embedded in the node, 3rd is an observer. Zookeeper can
|
||||
have one node in primary data centre, one in secondary data centre and 3rd as tie-breaker
|
||||
|
||||
WN: why are we concerned about cost of 3 machines? MN: we're seeing / hearing clients wanting to run many nodes on one
|
||||
VM. Zookeeper is good for this since 1 Zookeeper cluster can serve 100+ nodes
|
||||
|
||||
MH: terminology clarification required: what holds the master lock? Ideally would be good to see design thinking around
|
||||
split node and which bits need HA. MB: as a long term vision, ideally have 1 database for many IDs and the flows for
|
||||
those IDs are load balanced. Regarding services internally to node being suspended, this is being investigated.
|
||||
|
||||
MH: regarding auto failover, in the event a database has its own perception of master and slave, how is this handled?
|
||||
Failure detector will need to grow or have local only schedule to confirm it is processing everything including
|
||||
connectivity between database and bus, i.e. implement a 'healthiness' concept
|
||||
|
||||
MH: can you get into a situation where the node fails over but the database does not, but database traffic continues to
|
||||
be sent to down node? MB: database will go offline leading to an all-stop event.
|
||||
|
||||
MH: can you have master affinity between node and database? MH: need watchdog / heartbeat solutions to confirm state of
|
||||
all components
|
||||
|
||||
JC: how long will this solution live? MB: will work for hot / hot flow sharding, multiple flow workers and soft locks,
|
||||
then this is long term solution. Service abstraction will be used so we are not wedded to Zookeeper however the
|
||||
abstraction work can be done later
|
||||
|
||||
JC: does the implementation with Zookeeper have an impact on whether cloud or physical deployments are used? MB: it's an
|
||||
internal component, not part of the larger Corda network therefore can be either. For the customer they will have to
|
||||
deploy a separate Zookeeper solution, but this is the same for Atomix.
|
||||
|
||||
WN: where Corda as a service is being deployed with many nodes in the cloud. Zookeeper will be better suited to big
|
||||
providers.
|
||||
|
||||
WN: concern is the customer expects to get everything on a plate, therefore will need to be educated on how to implement
|
||||
Zookeeper, but this is the same for other master selection solutions.
|
||||
|
||||
JC: is it possible to launch R3 Corda with a button on Azure marketplace to commission a Zookeeper? Yes, if we can
|
||||
resource it. But expectation is Zookeeper will be used by well-informed clients / implementers so one-click option is
|
||||
less relevant.
|
||||
|
||||
MH: how does failover work with HSMs?
|
||||
|
||||
MN: can replicate realm so failover is trivial
|
||||
|
||||
JC: how do we document Enterprise features? Publish design docs? Enterprise fact sheets? R3 Corda marketing material?
|
||||
Clear separation of documentation is required. GT: this is already achieved by having docs.corda.net for open source
|
||||
Corda and docs.corda.r3.com for enterprise R3 Corda
|
||||
|
||||
|
||||
### Next Steps
|
||||
|
||||
MN proposed the following steps:
|
||||
|
||||
1) Determine who has experience in the team to help select wrapper library
|
||||
2) Build container with Zookeeper for development
|
||||
3) Demo hot / cold with current R3 Corda Dev Preview release (writing a guide)
|
||||
4) Turn nodes passive or active
|
||||
5) Leader election
|
||||
6) Failure detection and tooling
|
||||
7) Edge case testing
|
@ -1,147 +0,0 @@
|
||||
# Design Review Board Meeting Minutes
|
||||
|
||||
**Date / Time:** 16/11/2017, 14:00
|
||||
|
||||
## Attendees
|
||||
|
||||
- Mark Oldfield (MO)
|
||||
- Matthew Nesbit (MN)
|
||||
- Richard Gendal Brown (RGB)
|
||||
- James Carlyle (JC)
|
||||
- Mike Hearn (MH)
|
||||
- Jose Coll (JoC)
|
||||
- Rick Parker (RP)
|
||||
- Andrey Bozhko (AB)
|
||||
- Dave Hudson (DH)
|
||||
- Nick Arini (NA)
|
||||
- Ben Abineri (BA)
|
||||
- Jonathan Sartin (JS)
|
||||
- David Lee (DL)
|
||||
|
||||
## Minutes
|
||||
|
||||
MO opened the meeting, outlining the agenda and meeting review process, and clarifying that consensus on each design decision would be sought from RGB, JC and MH.
|
||||
|
||||
MO set out ground rules for the meeting. RGB asked everyone to confirm they had read both documents; all present confirmed.
|
||||
|
||||
MN outlined the motivation for a Float as responding to organisations’ expectations of a ‘fire break’ protocol termination in the DMZ where manipulation and operation can be checked and monitored.
|
||||
|
||||
The meeting was briefly interrupted by technical difficulties with the GoToMeeting conferencing system.
|
||||
|
||||
MN continued to outline how the design was constrained by expected DMZ rules and influenced by currently perceived client expectations – e.g. making the float unidirectional. He gave a prelude to certain design decisions, e.g. the use of AMQP from the outset.
|
||||
|
||||
MN went onto describe the target solution in detail, covering the handling of both inbound and outbound connections. He highlighted implicit overlaps with the HA design – clustering support, queue names etc., and clarified that the local broker was not required to use AMQP.
|
||||
|
||||
### [TLS termination](./ssl-termination.md)
|
||||
|
||||
JC questioned where the TLS connection would terminate. MN outlined the pros and cons of termination on firewall vs. float, highlighting the consequence of float termination that access by the float to the private key was required, and that mechanisms may be needed to store that key securely.
|
||||
|
||||
MH contended that the need to propagate TLS headers etc. through to the node (for reinforcing identity checks etc.) implied a need to terminate on the float. MN agreed but noted that in practice the current node design did not make much use of that feature.
|
||||
|
||||
JC questioned how users would provision a TLS cert on a firewall – MN confirmed users would be able to do this themselves and were typically familiar with doing so.
|
||||
|
||||
RGB highlighted the distinction between the signing key for the TLS vs. identity certificates, and that this needed to be made clear to users. MN agreed that TLS private keys could be argued to be less critical from a security perspective, particularly when revocation was enabled.
|
||||
|
||||
MH noted potential to issue sub-certs with key usage flags as an additional mitigating feature.
|
||||
|
||||
RGB queried at what point in the flow a message would be regarded as trusted. MN set an expectation that the float would apply basic checks (e.g. stopping a connection talking on other topics etc.) but that subsequent sanitisation should happen in internal trusted portion.
|
||||
|
||||
RGB questioned whether the TLS key on the float could be re-used on the bridge to enable wrapped messages to be forwarded in an encrypted form – session migration. MH and MN maintained TLS forwarding could not work in that way, and this would not allow the ‘fire break’ requirement to inspect packets.
|
||||
|
||||
RGB concluded the bridge must effectively trust the firewall or bridge on the origin of incoming messages. MN raised the possibility of SASL verification, but noted objections by MH (clumsy because of multiple handshakes etc.).
|
||||
|
||||
JC queried whether SASL would allow passing of identity and hence termination at the firewall; MN confirmed this.
|
||||
|
||||
MH contended that the TLS implementation was specific to Corda in several ways which may challenge implementation using firewalls, and that typical firewalls (using old OpenSSL etc.) were probably not more secure than R3’s own solutions. RGB pointed out that the design was ultimately driven by client perception of security (MN: “security theatre”) rather than objective assessment. MH added that implementations would be firewall-specific and not all devices would support forwarding, support for AMQP etc.
|
||||
|
||||
RGB proposed messaging to clients that the option existed to terminate on the firewall if it supported the relevant requirements.
|
||||
|
||||
MN re-raised the question of key management. RGB asked about the risk implied from the threat of a compromised float. MN said an attacker who compromised a float could establish TLS connections in the name of the compromised party, and could inspect and alter packets including readable business data (assuming AMQP serialisation). MH gave an example of a MITM attack where an attacker could swap in their own single-use key allowing them to gain control of (e.g.) a cash asset; the TLS layer is the only current protection against that.
|
||||
|
||||
RGB queried whether messages could be signed by senders. MN raised potential threat of traffic analysis, and stated E2E encryption was definitely possible but not for March-April.
|
||||
|
||||
MH viewed the use-case for extra encryption as the consumer/SME market, where users would want to upload/download messages from a mailbox without needing to trust it – not the target market yet. MH maintained TLS was really strong and that assuming compromise of the float was not conceptually different from compromise of another device, e.g. the firewall. MN confirmed that use of an HSM would generally require signing on the HSM device for every session; MH observed this could be a bottleneck in the scenario of a restored node seeking to re-establish a large number of connections. It was observed that the float would still need access to a key provisioning access to the HSM, so this did not materially improve the security in a compromised float scenario.
|
||||
|
||||
MH advised against offering clients support for their own firewall since it would likely require R3 effort to test support and help with customisations.
|
||||
|
||||
MN described option 2b to tunnel through to the internal trusted portion of the float over a connection initiated from inside the internal network in order for the key to be loaded into memory at run-time; this would require a bit more code.
|
||||
|
||||
MH advocated option 2c - just to accept risk and store on file system – on the basis of time constraints, maintaining that TLS handshakes are complicated to code and hard to proxy. MH suggested upgrading to 2b or 2a later if needed. MH described how keys were managed at Google.
|
||||
|
||||
**DECISION CONFIRMED**: Accept option 2b - Terminate on float, inject key from internal portion of the float (RGB, JC, MH agreed)
|
||||
|
||||
### [E2E encryption](./e2e-encryption.md)
|
||||
|
||||
DH proposed that E2E encryption would be much better but conceded the time limitations and agreed that the threat scenario of a compromised DMZ device was the same under the proposed options. MN agreed.
|
||||
|
||||
MN argued for a placeholder vs. ignoring or scheduling work to build e2e encryption now. MH agreed, seeking more detailed proposals on what the placeholder was and how it would be used.
|
||||
|
||||
MH queried whether e2e encryption would be done at the app level rather than the AMQP level, raising questions what would happen on non-supporting nodes etc.
|
||||
|
||||
MN highlighted the link to AMQP serialisation work being done.
|
||||
|
||||
**DECISION CONFIRMED:** Add placeholder, subject to more detailed design proposal (RGB, JC, MH agreed)
|
||||
|
||||
### [AMQP vs. custom protocol](./p2p-protocol.md)
|
||||
|
||||
MN described alternative options involving onion-routing etc.
|
||||
|
||||
JoC questioned whether this would also allow support for load balancing; MN advised this would be too much change in direction in practice.
|
||||
|
||||
MH outlined his original reasoning for AMQP (lots of e.g. manageability features, not all of which would be needed at the outset but possibly in future) vs. other options e.g. MQTT.
|
||||
|
||||
MO questioned whether the broker would imply performance limitations.
|
||||
|
||||
RGB argued there were two separate concerns: Carrying messages from float to bridge and then bridge to node, with separate design options.
|
||||
|
||||
JC proposed the decision could be deferred until later. MN pointed out changing the protocol would compromise wire stability.
|
||||
|
||||
MH advocated sticking with AMQP for now and implementing a custom protocol later with suitable backwards-compatibility features when needed.
|
||||
|
||||
RGB queried whether full AMQP implementation should be done in this phase. MN provided explanation.
|
||||
|
||||
**DECISION CONFIRMED:** Continue to use AMQP (RGB, JC, MH agreed)
|
||||
|
||||
### [Pluggable broker prioritisation](./pluggable-broker.md)
|
||||
|
||||
MN outlined arguments for deferring pluggable brokers, whilst describing how he’d go about implementing the functionality. MH agreed with prioritisation for later.
|
||||
|
||||
JC queried whether broker providers could be asked to deliver the feature. AB mentioned that Solace seemed keen on working with R3 and could possibly be utilised. MH was sceptical, arguing that R3 resource would still be needed to support.
|
||||
|
||||
JoC noted a distinction in scope for P2P and/or RPC.
|
||||
|
||||
There was discussion of replacing the core protocol with JMS + plugins. RGB drew focus to the question of when to do so, rather than how.
|
||||
|
||||
AB noted Solace have functionality with conceptual similarities to the float, and questioned to what degree the float could be considered non-core technology. MH argued the nature of Corda as a P2P network made the float pretty core to avoiding dedicated network infrastructure.
|
||||
|
||||
**DECISION CONFIRMED:** Defer support for pluggable brokers until later, except in the event that a requirement to do so emerges from higher priority float / HA work. (RGB, JC, MH agreed)
|
||||
|
||||
### Inbound only vs. inbound & outbound connections
|
||||
|
||||
DL sought confirmation that the group was happy for the float to act as a Listener only. MN repeated the explanation of how outbound connections would be initiated through a SOCKS 4/5 proxy. No objections were raised.
|
||||
|
||||
### Overall design and implementation plan
|
||||
|
||||
MH requested more detailed proposals going forward on:
|
||||
|
||||
1) To what degree logs from different components need to be integrated (consensus was no requirement at this stage)
|
||||
|
||||
2) Bridge control protocols.
|
||||
|
||||
3) Scalability of hashing network map entries to queue names
|
||||
|
||||
4) Node admins' user experience – MH argued for documenting this in advance to validate design
|
||||
|
||||
5) Behaviour following termination of a remote node (retry frequency, back-off etc.)?
|
||||
|
||||
6) Impact on standalone nodes (no float)?
|
||||
|
||||
JC noted an R3 obligation with Microsoft to support AMQP-compliant Azure messaging. MN confirmed that support for pluggable brokers should cover that.
|
||||
|
||||
JC argued for documentation of procedures to be the next step as it is needed for the Project Agent Pilot phase. MH proposed sharing the advance documentation.
|
||||
|
||||
JoC questioned whether the Bridge Manager locked the design to Artemis; MO highlighted the transitional elements of the design.
|
||||
|
||||
RGB questioned the rationale for moving the broker out of the node. MN provided clarification.
|
||||
|
||||
**DECISION CONFIRMED**: Design to proceed as discussed (RGB, JC, MH agreed)
|
@ -1,55 +0,0 @@
|
||||
# Design Decision: End-to-end encryption
|
||||
|
||||
## Background / Context
|
||||
|
||||
End-to-end encryption is a desirable potential design feature for the [float](../design.md).
|
||||
|
||||
## Options Analysis
|
||||
|
||||
### 1. No end-to-end encryption
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Least effort
|
||||
2. Easier to fault find and manage
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. With no placeholder, it is very hard to add support later and maintain wire stability.
|
||||
2. May not get past security reviews of Float.
|
||||
|
||||
### 2. Placeholder only
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Allows wire stability when we have agreed an encrypted approach
|
||||
2. Shows that we are serious about security, even if this isn’t available yet.
|
||||
3. Allows later encrypted version to be an enterprise feature that can interoperate with OS versions.
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Doesn’t actually provide E2E, or define what an encrypted payload looks like.
|
||||
2. Doesn’t address any crypto features that target protecting the AMQP headers.
|
||||
|
||||
### 3. Implement end-to-end encryption
|
||||
|
||||
#### Advantages

1. Will protect the sensitive data fully.
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Lots of work.
|
||||
2. Difficult to get right.
|
||||
3. Re-inventing TLS.
|
||||
|
||||
## Recommendation and justification
|
||||
|
||||
Proceed with Option 2: Placeholder
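To make the notion of a placeholder concrete, a wire-level envelope might simply reserve a scheme identifier next to the payload. This is purely hypothetical; the design deliberately does not define the encrypted payload format:

```kotlin
// Purely hypothetical sketch of a payload envelope that reserves space for a future
// end-to-end encryption scheme. Nothing here is defined by the design; the point is
// only that fixing such a wrapper now preserves wire stability later.
data class PayloadEnvelope(
    val encryptionScheme: Int = 0,   // 0 = plaintext; future scheme identifiers reserved
    val payload: ByteArray           // today: the plain AMQP-serialised message body
)
```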
|
||||
|
||||
## Decision taken
|
||||
|
||||
Proceed with Option 2 - Add placeholder, subject to more detailed design proposal (RGB, JC, MH agreed)
|
||||
|
||||
.. toctree::
|
||||
|
||||
drb-meeting-20171116.md
|
||||
|
@ -1,75 +0,0 @@
|
||||
# Design Decision: P2P Messaging Protocol
|
||||
|
||||
## Background / Context
|
||||
|
||||
Corda requires messages to be exchanged between nodes via a well-defined protocol.
|
||||
|
||||
Determining this protocol is a critical upstream dependency for the design of key messaging components including the [float](../design.md).
|
||||
|
||||
## Options Analysis
|
||||
|
||||
### 1. Use AMQP
|
||||
|
||||
Under this option, P2P messaging will follow the [Advanced Message Queuing Protocol](https://www.amqp.org/).
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. As we have described in our marketing materials.
|
||||
2. Well-defined standard.
|
||||
3. Support for packet level flow control and explicit delivery acknowledgement.
|
||||
4. Will allow eventual swap out of Artemis for other brokers.
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. AMQP is a complex protocol with many layered state machines, for which it may prove hard to verify security properties.
|
||||
2. No support for a secure MAC in packet frames.
|
||||
3. No defined encryption mode beyond creating custom payload encryption and custom headers.
|
||||
4. No standardised support for queue creation/enumeration, or deletion.
|
||||
5. Use of broker durable queues and autonomous bridge transfers does not align with checkpoint timing, so that independent replication of the DB and Artemis data risks causing problems. (Writing to the DB doesn’t work currently and is probably also slow).
|
||||
|
||||
### 2. Develop a custom protocol
|
||||
|
||||
This option would discard existing Artemis server/AMQP support for peer-to-peer communications in favour of a custom
|
||||
implementation of the Corda MessagingService, which takes direct responsibility for message retries and stores the
|
||||
pending messages into the node's database. The wire level of this service would be built on top of a fully encrypted MIX
|
||||
network which would not require a fully connected graph, but rather send messages on randomly selected paths over the
|
||||
dynamically managed network graph topology.
|
||||
|
||||
Packet format would likely use the [SPHINX packet format](http://www0.cs.ucl.ac.uk/staff/G.Danezis/papers/sphinx-eprint.pdf) although with the body encryption updated to
|
||||
a modern AEAD scheme as in https://www.cs.ru.nl/~bmennink/pubs/16cans.pdf . In this scheme, nodes would be identified in
|
||||
the overlay network solely by Curve25519 public key addresses and floats would be dumb nodes that only run the MIX
|
||||
network code and don't act as message sources, or sinks. Intermediate traffic would not be readable except by the
|
||||
intended waypoint and only the final node can read the payload.
|
||||
|
||||
Point to point links would be standard TLS and the network certificates would be whatever is acceptable to the host
|
||||
institutions e.g. standard Verisign certs. It is assumed institutions would select partners to connect to that they
|
||||
trust and permission them individually in their firewalls. Inside the MIX network the nodes would be connected mostly in
|
||||
a static way and use standard HELLO packets to determine the liveness of neighbour routes, then use tunnelled gossip to
|
||||
distribute the signed/versioned Link topology messages. Nodes will also be allowed to advertise a public IP, so some
|
||||
dynamic links and publicly visible nodes would exist. Network map addresses would then be mappings from Legal Identity
|
||||
to these overlay network addresses, not to physical network locations.
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Can be defined with very small message surface area that is amenable to security analysis.
|
||||
2. Packet formats can follow best practice cryptography from the start and be matched to Corda’s needs.
|
||||
3. Doesn’t require a complete graph structure for network if we have intermediate routing.
|
||||
4. More closely aligns checkpointing and message delivery handling at the application level.
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Inconsistent with previous design statements published to external stakeholders.
|
||||
2. Effort implications - starting from scratch
|
||||
3. Technical complexity in developing a P2P protocol which is attack tolerant.
|
||||
|
||||
## Recommendation and justification
|
||||
|
||||
Proceed with Option 1
|
||||
|
||||
## Decision taken
|
||||
|
||||
Proceed with Option 1 - Continue to use AMQP (RGB, JC, MH agreed)
|
||||
|
||||
.. toctree::
|
||||
|
||||
drb-meeting-20171116.md
|
@ -1,62 +0,0 @@
|
||||
# Design Decision: Pluggable Broker prioritisation
|
||||
|
||||
## Background / Context
|
||||
|
||||
A decision on when to prioritise implementation of a pluggable broker has implications for delivery of key messaging
|
||||
components including the [float](../design.md).
|
||||
|
||||
## Options Analysis
|
||||
|
||||
### 1. Deliver pluggable brokers now
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Meshes with business opportunities from HPE and Solace Systems.
|
||||
2. Would allow us to interface to existing Bank middleware.
|
||||
3. Would allow us to switch away from Artemis if we need higher performance.
|
||||
4. Makes our AMQP story stronger.
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. More up-front work.
|
||||
2. Might slow us down on other priorities.
|
||||
|
||||
### 2. Defer development of pluggable brokers until later
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Still gets us where we want to go, just later.
|
||||
2. Work can be progressed as resource is available, rather than right now.
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Have to take care that we have sufficient abstractions that things like CORE connections can be replaced later.
|
||||
2. Leaves HPE and Solace hanging even longer.
|
||||
|
||||
|
||||
### 3. Never enable pluggable brokers
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. What we already have.
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Ties us to ArtemisMQ development speed.
|
||||
|
||||
2. Not good for our relationship with HPE and Solace.
|
||||
|
||||
3. Probably limits our maximum messaging performance longer term.
|
||||
|
||||
|
||||
## Recommendation and justification
|
||||
|
||||
Proceed with Option 2 (defer development of pluggable brokers until later)
|
||||
|
||||
## Decision taken
|
||||
|
||||
.. toctree::
|
||||
|
||||
drb-meeting-20171116.md
|
||||
|
||||
Proceed with Option 2 - Defer support for pluggable brokers until later, except in the event that a requirement to do so emerges from higher priority float / HA work. (RGB, JC, MH agreed)
|
@ -1,91 +0,0 @@
|
||||
# Design Decision: TLS termination point
|
||||
|
||||
## Background / Context
|
||||
|
||||
Design of the [float](../design.md) is critically influenced by the decision of where TLS connections to the node should
|
||||
be terminated.
|
||||
|
||||
## Options Analysis
|
||||
|
||||
### 1. Terminate TLS on Firewall
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Common practice for DMZ web solutions, often with an HSM associated with the Firewall, and should be familiar for banks to set up.
|
||||
2. Doesn’t expose our private key in the less trusted DMZ context.
|
||||
3. Bugs in the firewall TLS engine will be patched frequently.
|
||||
4. The DMZ float server would only require a self-signed certificate/private key to enable secure communications, so theft of this key has no impact beyond the compromised machine.
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. May limit cryptography options to RSA, and prevent checking of X500 names (only the root certificate checked) - Corda certificates are not totally standard.
|
||||
2. Doesn’t allow identification of the message source.
|
||||
3. May require additional work and SASL support code to validate the ultimate origin of connections in the float.
|
||||
|
||||
#### Variant option 1a: Include SASL connection checking
|
||||
|
||||
##### Advantages
|
||||
|
||||
1. Maintain authentication support
|
||||
2. Can authenticate against keys held internally e.g. Legal Identity not just TLS.
|
||||
|
||||
##### Disadvantages
|
||||
|
||||
1. More work than the do-nothing approach
|
||||
2. More protocol to design for sending across the inner firewall.
|
||||
|
||||
### 2. Direct TLS Termination onto Float
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Validate our PKI certificates directly ourselves.
|
||||
2. Allow messages to be reliably tagged with source.
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. We don’t currently use the identity to check incoming packets, only for connection authentication anyway.
|
||||
2. Management of Private Key a challenge requiring extra work and security implications. Options for this are presented below.
|
||||
|
||||
#### Variant Option 2a: Float TLS certificate via direct HSM
|
||||
|
||||
##### Advantages
|
||||
|
||||
1. Key can’t be stolen (only access to signing operations)
|
||||
2. Audit trail of signings.
|
||||
|
||||
##### Disadvantages
|
||||
|
||||
1. Accessing HSM from DMZ probably not allowed.
|
||||
2. Breaks the inbound-connection-only rule of modern DMZ.
|
||||
|
||||
#### Variant Option 2b: Tunnel signing requests to bridge manager
|
||||
|
||||
##### Advantages
|
||||
|
||||
1. No new connections involved from Float box.
|
||||
2. No access to actual private key from DMZ.
|
||||
|
||||
##### Disadvantages
|
||||
|
||||
1. Requires implementation of a message protocol, in addition to a key provider that can be passed to the standard SSLEngine, but proxies signing requests.
|
||||
|
||||
#### Variant Option 2c: Store key on local file system
|
||||
|
||||
##### Advantages
|
||||
|
||||
1. Simple with minimal extra code required.
|
||||
2. Delegates access control to bank’s own systems.
|
||||
3. Risks losing only the TLS private key, which can easily be revoked. This isn’t the legal identity key at all.
|
||||
|
||||
##### Disadvantages
|
||||
|
||||
1. Risks losing the TLS private key.
|
||||
2. Probably not allowed.
|
||||
|
||||
## Recommendation and justification
|
||||
|
||||
Proceed with Variant option 1a: Terminate on firewall; include SASL connection checking.
|
||||
|
||||
## Decision taken
|
||||
|
||||
[DNB Meeting, 16/11/2017](./drb-meeting-20171116.md): Proceed with option 2b - Terminate on float, inject key from internal portion of the float (RGB, JC, MH agreed)
|
@ -1,256 +0,0 @@
|
||||
# Float Design
|
||||
|
||||
.. important:: This design document describes a feature of Corda Enterprise.
|
||||
|
||||
## Overview
|
||||
|
||||
The role of the 'float' is to meet the requirements of organisations that will not allow direct incoming connections to
|
||||
their node, but would rather host a proxy component in a DMZ to achieve this. As such it needs to meet the requirements
|
||||
of modern DMZ security rules, which essentially assume that the entire machine in the DMZ may become compromised. At
|
||||
the same time, we expect that the Float can interoperate with directly connected nodes, possibly even those using open
|
||||
source Corda.
|
||||
|
||||
### Background
|
||||
|
||||
#### Current state of peer-to-peer messaging in Corda
|
||||
|
||||
The diagram below illustrates the current mechanism for peer-to-peer messaging between Corda nodes.
|
||||
|
||||

|
||||
|
||||
When a flow running on a Corda node triggers a requirement to send a message to a peer node, it first checks for
|
||||
pre-existence of an applicable message queue for that peer.
|
||||
|
||||
**If the relevant queue exists:**
|
||||
|
||||
1. The node submits the message to the queue and continues after receiving acknowledgement.
|
||||
2. The Core Bridge picks up the message and transfers it via a TLS socket to the inbox of the destination node.
|
||||
3. A flow on the recipient node receives the message from the peer and acknowledges consumption on the bus once the flow has checkpointed this progress.
|
||||
|
||||
**If the queue does not exist (messaging a new peer):**
|
||||
|
||||
1. The flow triggers creation of a new queue with a name encoding the identity of the intended recipient.
|
||||
2. When the queue creation has completed the node sends the message to the queue.
|
||||
3. The hosted Artemis server within the node has a queue creation hook which is called.
|
||||
4. The queue name is used to lookup the remote connection details and a new bridge is registered.
|
||||
5. The client certificate of the peer is compared to the expected legal identity X500 Name. If this is OK, message flow proceeds as for a pre-existing queue (above).
|
||||
|
||||
## Scope
|
||||
|
||||
* Goals:
|
||||
* Allow connection to a Corda node without requiring direct incoming connections from external participants.
|
||||
* Allow connections to a Corda node without requiring the node itself to have a public IP address. Separate TLS connection handling from the MQ broker.
|
||||
* Non-goals (out of scope):
|
||||
* Support for MQ brokers other than Apache Artemis
|
||||
|
||||
## Timeline
|
||||
For delivery by end Q1 2018.
|
||||
|
||||
## Requirements
|
||||
Allow connectivity in compliance with DMZ constraints commonly imposed by modern financial institutions; namely:
|
||||
1. Firewalls required between the internet and any device in the DMZ, and between the DMZ and the internal network
|
||||
2. Data passing from the internet and the internal network via the DMZ should pass through a clear protocol break in the DMZ.
|
||||
3. Only identified IPs and ports are permitted to access devices in the DMZ; this includes communications between devices co-located in the DMZ.
|
||||
4. Only a limited number of ports are opened in the firewall (<5) to make firewall operation manageable. These ports must change slowly.
|
||||
5. Any DMZ machine is typically multi-homed, with separate network cards handling traffic through the institutional
|
||||
firewall vs. to the Internet. (There is usually a further hidden management interface card accessed via a jump box for
|
||||
managing the box and shipping audit trail information). This requires that our software can bind listening ports to the
|
||||
correct network card not just to 0.0.0.0.
|
||||
6. No connections to be initiated by DMZ devices towards the internal network. Communications should be initiated from
|
||||
the internal network to form a bidirectional channel with the proxy process.
|
||||
7. No business data should be persisted on the DMZ box.
|
||||
8. An audit log of all connection events is required to track breaches. Latency information should also be tracked to
|
||||
facilitate management of connectivity issues.
|
||||
9. Processes on DMZ devices run as local accounts with no relationship to internal permission systems, or ability to
|
||||
enumerate devices on the internal network.
|
||||
10. Communications in the DMZ should use modern TLS, often with local-only certificates/keys that hold no value outside of use in predefined links.
|
||||
11. Where TLS is required to terminate on the firewall, provide a suitably secure key management mechanism (e.g. an HSM).
|
||||
12. Any proxy in the DMZ should be subject to the same HA requirements as the devices it is servicing
|
||||
13. Any business data passing through the proxy should be separately encrypted, so that no data is in the clear of the
|
||||
program memory if the DMZ box is compromised.
|
||||
|
||||
## Design Decisions
|
||||
|
||||
The following design decisions fed into this design:
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
decisions/p2p-protocol.md
|
||||
decisions/ssl-termination.md
|
||||
decisions/e2e-encryption.md
|
||||
decisions/pluggable-broker.md
|
||||
|
||||
## Target Solution
|
||||
|
||||
The proposed solution introduces a reverse proxy component ("**float**") which may be sited in the DMZ, as illustrated
|
||||
in the diagram below.
|
||||
|
||||

|
||||
|
||||
The main role of the float is to forward incoming AMQP link packets from authenticated TLS links to the AMQP Bridge
|
||||
Manager, then echo back final delivery acknowledgements once the Bridge Manager has successfully inserted the messages.
|
||||
The Bridge Manager is responsible for rejecting inbound packets on queues that are not local inboxes to prevent e.g.
|
||||
'cheating' messages onto management topics, faking outgoing messages etc.
|
||||
|
||||
The float is linked to the internal AMQP Bridge Manager via a single AMQP/TLS connection, which can contain multiple
|
||||
logical AMQP links. This link is initiated at the socket level by the Bridge Manager towards the float.
|
||||
|
||||
The float is a **listener only** and does not enable outgoing bridges (see Design Decisions, above). Outgoing bridge
formation and message sending come directly from the internal Bridge Manager, possibly via a SOCKS 4/5 proxy (easy
enough to enable in netty) or directly through the corporate firewall; initiating connections from the float would give
rise to security concerns.
|
||||
|
||||
The float is **not mandatory**; interoperability with older nodes, even those using direct AMQP from bridges in the
|
||||
node, is supported.
|
||||
|
||||
**No state will be serialized on the float**, although suitably protected logs will be recorded of all float activities.
|
||||
|
||||
**End-to-end encryption** of the payload is not delivered through this design (see Design Decisions, above). For current
|
||||
purposes, a header field indicating plaintext/encrypted payload is employed as a placeholder.
|
||||
|
||||
**HA** is enabled (this should be easy as the bridge manager can choose which float to make active). Only fully
|
||||
connected DMZ floats should activate their listening port.
|
||||
|
||||
Implementation of the float is expected to be based on existing AMQP Bridge Manager code - see Implementation Plan,
|
||||
below, for expected work stages.
|
||||
|
||||
### Bridge control protocol
|
||||
|
||||
The bridge control is designed to be as stateless as possible. Thus, nodes and bridges restarting must
|
||||
re-request/broadcast information to each other. Messages are sent to a 'bridge.control' address in Artemis as
|
||||
non-persistent messages with a non-durable queue. Each message should contain a duplicate message ID, which is also
|
||||
re-used as the correlation id in replies. Relevant scenarios are described below:
|
||||
|
||||
#### On bridge start-up, or reconnection to Artemis
|
||||
1. The bridge process should subscribe to the 'bridge.control' address.
|
||||
2. The bridge should start sending QueueQuery messages which will contain a unique message id and an identifier for the bridge sending the message.
|
||||
3. The bridge should continue to send these until at least one node replies with a matched QueueSnapshot message.
|
||||
4. The QueueSnapshot message replies from the nodes contain a correlationId field set to the unique id of the originating QueueQuery (the correlation id is null for unsolicited snapshots). The message payload is a list of inbox queue info items and a list of outbound queue info items. Each queue info item is a tuple of the legal X500 name (as expected on the destination TLS certificates) and the queue name, which should have the form of "internal.peers." + hash key of the legal identity (using the same algorithm as we use in the db to make the string). Note this queue name is a change from the current logic, but it will be more portable to length-constrained topics and will allow multiple inboxes on the same broker.
|
||||
5. The bridge should process the QueueSnapshot, initiating links to the outgoing targets. It should also add expected inboxes to its in-bound permission list.
|
||||
6. When an outgoing link is successfully formed the remote client certificate should be checked against the expected X500 name. Assuming the link is valid the bridge should subscribe to the related queue and start trying to forward the messages.
|
||||
|
||||
#### On node start-up, or reconnection to Artemis
|
||||
1. The node should subscribe to 'bridge.control'.
|
||||
2. The node should enumerate the queues and identify which have well-known identities in the network map cache. The appropriate information about its own inboxes and any known outgoing queues should be compiled into an unsolicited QueueSnapshot message with a null correlation id. This should be broadcast to update any bridges that are running.
|
||||
3. If any QueueQuery messages arrive these should be responded to with specific QueueSnapshot messages with the correlation id set.
|
||||
|
||||
#### On network map updates
|
||||
1. On receipt of any network map cache updates the information should be evaluated to see if any additional queues can now be mapped to a bridge. At this point a BridgeRequest packet should be sent which will contain the legal X500Name and queue name of the new update.
|
||||
|
||||
#### On flow message to Peer
|
||||
1. If a message is to be sent to a peer the code should (as it does now) check for queue existence in its cache and then on the broker. If it does exist it simply sends the message.
|
||||
2. If the queue is not listed in its cache it should block until the queue is created (this should be safe versus race conditions with other nodes).
|
||||
3. Once the queue is created the original message and subsequent messages can now be sent.
|
||||
4. In parallel a BridgeRequest packet should be sent to activate a new connection outwards. This will contain the legal X500Name and queue name of the new queue.
|
||||
5. Future QueueSnapshot requests should be responded to with the new queue included in the list.
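To make the scenarios above concrete, here is a minimal Kotlin sketch of what the bridge control message shapes might look like. The class and field names are illustrative only and do not represent a settled wire format.

```kotlin
import java.util.UUID

// A queue info item pairs the expected TLS legal identity with the Artemis queue name,
// e.g. "internal.peers." + hash of the legal identity (illustrative naming only).
data class QueueInfo(val legalX500Name: String, val queueName: String)

// Sent repeatedly by a (re)starting bridge until at least one node replies.
data class QueueQuery(val messageId: UUID = UUID.randomUUID(), val bridgeId: String)

// Reply from a node; correlationId is null for the unsolicited broadcast sent on node start-up.
data class QueueSnapshot(
    val correlationId: UUID?,
    val inboxQueues: List<QueueInfo>,
    val outboundQueues: List<QueueInfo>
)

// Sent by a node when a newly created outbound queue should be bridged to a peer.
data class BridgeRequest(val legalX500Name: String, val queueName: String)
```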
|
||||
|
||||
### Behaviour with a Float portion in the DMZ
|
||||
|
||||
1. On initial connection of an inbound bridge, AMQP is configured to run a SASL challenge response to (re-)validate the
|
||||
origin and confirm the client identity. (The most likely SASL mechanism for this is using https://tools.ietf.org/html/rfc3163
|
||||
as this allows reuse of our PKI certificates in the challenge response. Potentially we could forward some bridge control
|
||||
messages to cover the SASL exchange to the internal Bridge Controller. This would allow us to keep the private keys
|
||||
internal to the organisation, so we may also require a SASLAuth message type as part of the bridge control protocol.)
|
||||
2. The float restricts acceptable AMQP topics to the name space appropriate for inbound messages only. Hence, there
|
||||
should be no way to tunnel messages to bridge control, or RPC topics on the bus.
|
||||
3. On receipt of a message from the external network, the Float should append a header recording the source channel's X500
name, then create a Delivery for forwarding the message inwards (a sketch of this tagging step is given after this list).
|
||||
4. The internal Bridge Control Manager process validates the message further to ensure that it is targeted at a legitimate
|
||||
inbox (i.e. not an outbound queue) and then forwards it to the bus. Once delivered to the broker, the Delivery
|
||||
acknowledgements are cascaded back.
|
||||
5. On receiving Delivery notification from the internal side, the Float acknowledges back the correlated original Delivery.
|
||||
6. The Float should protect against excessive inbound messages by AMQP flow control and refusing to accept excessive unacknowledged deliveries.
|
||||
7. The Float only exposes its inbound server socket when activated by a valid AMQP link from the Bridge Control Manager
|
||||
to allow for a simple HA pool of DMZ Float processes. (Floats cannot run hot-hot as this would invalidate Corda's
|
||||
message ordering guarantees.)
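As a minimal sketch of step 3 above, the following Kotlin fragment shows how the Float might tag an inbound proton-j message with the validated source identity before forwarding it inwards. The property key and helper name are assumptions for illustration, not part of any existing API.

```kotlin
import org.apache.qpid.proton.amqp.messaging.ApplicationProperties
import org.apache.qpid.proton.message.Message

// Hypothetical header name; the real protocol would fix this as part of the design.
const val SOURCE_LEGAL_NAME_HEADER = "corda-source-legal-name"

fun tagWithSource(message: Message, sourceX500Name: String): Message {
    // Copy any existing application properties and add the validated source identity,
    // so the internal Bridge Manager can check it before forwarding to the bus.
    val props = HashMap<String, Any?>()
    message.applicationProperties?.value?.forEach { (key, value) -> props[key.toString()] = value }
    props[SOURCE_LEGAL_NAME_HEADER] = sourceX500Name
    message.applicationProperties = ApplicationProperties(props)
    return message
}
```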
|
||||
|
||||
## Implementation plan
|
||||
|
||||
### Proposed incremental steps towards a float
|
||||
|
||||
1. First, I would like to more explicitly split the RPC and P2P MessagingService instances inside the Node. They can
|
||||
keep the same interface, but this would let us develop P2P and RPC at different rates if required.
|
||||
|
||||
2. The current in-node design with Artemis Core bridges should first be replaced with an equivalent piece of code that
|
||||
initiates send only bridges using an in-house wrapper over the proton-j library. Thus, the current Artemis message
|
||||
objects will be picked up from existing queues using the CORE protocol via an abstraction interface to allow later
|
||||
pluggable replacement. The specific subscribed queues are controlled as before and bridges started by the existing code
|
||||
path. The only difference is the bridges will be the new AMQP client code. The remote Artemis broker should accept
|
||||
transferred packets directly onto its own inbox queue and acknowledge receipt via standard AMQP Delivery notifications.
|
||||
This in turn will be acknowledged back to the Artemis Subscriber to permanently remove the message from the source
|
||||
Artemis queue. The headers for deduplication, address names, etc. will need to be mapped to the AMQP messages and we will
have to take care over the message payload. This should be an envelope that is capable in the future of being
end-to-end encrypted. Where possible we should stay close to the current Artemis mappings (a minimal sketch of such a mapping is given after this list).
|
||||
|
||||
3. We need to define a bridge control protocol, so that we can have an out of process float/bridge. The current process
|
||||
is that on message send the node checks the target address to see if the target queue already exists. If the queue
|
||||
doesn't exist it creates a new queue which includes an encoding of the PublicKey in its name. This is picked up by a
|
||||
wrapper around the Artemis Server which is also hosted inside the node and can ask the network map cache for a
|
||||
translation to a target host and port. This in turn allows a new bridge to be provisioned. At node restart the
|
||||
re-population of the network map cache is followed to re-create the bridges to any unsent queues/messages.
|
||||
|
||||
4. My proposal for a bridge control protocol is partly influenced by the fact that AMQP does not have a built-in
|
||||
mechanism for queue creation/deletion/enumeration. Also, the flows cannot progress until they are sure that there is an
|
||||
accepting queue. Finally, if one runs a local broker it should be fine to run multiple nodes without any bridge
|
||||
processes. Therefore, I will leave the queue creation as the node's responsibility. Initially we can continue to use the
|
||||
existing CORE protocol for this. The requirement to initiate a bridge will change from being implicit signalling via
|
||||
server queue detection to being an explicit pub-sub message that requests bridge formation. This doesn't need
|
||||
durability, or acknowledgements, because when a bridge process starts it should request a refresh of the required bridge
|
||||
list. The typical create bridge messages should contain:
|
||||
|
||||
1. The queue name (ideally with the sha256 of the PublicKey, not the whole PublicKey as that may not work on brokers with queue name length constraints).
|
||||
2. The expected X500Name for the remote TLS certificate.
|
||||
3. The list of host and ports to attempt connection to. See separate section for more info.
|
||||
|
||||
5. Once we have the bridge protocol in place and a bridge out of process the broker can move out of process too, which
|
||||
is a requirement for clustering anyway. We can then start work on floating the bridge and making our broker pluggable.
|
||||
|
||||
1. At this point the bridge connection to the local queues should be upgraded to also be AMQP client, rather than CORE
|
||||
protocol, which will give the ability for the P2P bridges to work with other broker products.
|
||||
2. An independent task is to look at making the Bridge process HA, probably using a similar hot-warm mastering solution
|
||||
as the node, or atomix.io. The inactive node should track the control messages, but obviously doesn't initiate any
|
||||
bridges.
|
||||
3. Another potentially parallel piece of development is to start to build a float, which is essentially just splitting
|
||||
the bridge in two and putting in an intermediate hop AMQP/TLS link. The thin proxy in the DMZ zone should be as
|
||||
stateless as possible in this.
|
||||
4. Finally, the node should use AMQP to talk to its local broker cluster, but this will have to remain partly tied
|
||||
to Artemis, as queue creation will require sending management messages to the Artemis core, but we should be
|
||||
able to abstract this.
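Referring back to step 2 above, the following Kotlin sketch shows one possible mapping of an outgoing message onto a proton-j AMQP message, carrying the Artemis deduplication header and targeting the remote inbox queue. The exact set of headers and their names is an open design question; this is illustrative only.

```kotlin
import org.apache.qpid.proton.Proton
import org.apache.qpid.proton.amqp.Binary
import org.apache.qpid.proton.amqp.messaging.ApplicationProperties
import org.apache.qpid.proton.amqp.messaging.Data
import org.apache.qpid.proton.message.Message

// Sketch only: which headers are carried, and under what names, would be settled during
// implementation. "_AMQ_DUPL_ID" is the Artemis duplicate-detection property name.
fun toAmqpMessage(targetQueue: String, dedupId: String, payload: ByteArray): Message {
    val message = Proton.message()
    message.address = targetQueue                 // the remote peer's inbox queue
    message.isDurable = true                      // P2P messages must survive broker restarts
    message.applicationProperties = ApplicationProperties(mapOf("_AMQ_DUPL_ID" to dedupId))
    message.body = Data(Binary(payload))          // opaque envelope, later end-to-end encryptable
    return message
}
```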
|
||||
|
||||
### Float evolution
|
||||
|
||||
#### In-Process AMQP Bridging
|
||||
|
||||

|
||||
|
||||
In this phase of evolution we hook the same bridge creation code as before and use the same in-process data access to
|
||||
network map cache. However, we now implement AMQP sender clients using proton-j and netty for TLS layer and connection
|
||||
retry. This will also involve formalising the AMQP packet format of the Corda P2P protocol. Once a bridge makes a
|
||||
successful link to a remote node's Artemis broker it will subscribe to the associated local queue. The messages will be
|
||||
picked up from the local broker via an Artemis CORE consumer for simplicity of initial implementation. The queue
|
||||
consumer should be implemented with a simple generic interface as façade, to allow future replacement. The message will
|
||||
be sent across the AMQP protocol directly to the remote Artemis broker. Once acknowledgement of receipt is given with an
|
||||
AMQP Delivery notification the queue consumption will be acknowledged. This will remove the original item from the
|
||||
source queue. If delivery fails due to link loss the subscriber should be closed until a new link is established to
|
||||
ensure messages are not consumed. If delivery fails for other reasons there should be some form of periodic retry over
|
||||
the AMQP link. For authentication checks the client cert returned from the remote server will be checked and the link
|
||||
dropped if it doesn't match expectations.
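A minimal sketch of that authentication check, using only standard JDK TLS APIs, might look like the following. It assumes the expected legal name is already known from the bridge control information; the function name is illustrative.

```kotlin
import java.security.cert.X509Certificate
import javax.net.ssl.SSLPeerUnverifiedException
import javax.net.ssl.SSLSession
import javax.security.auth.x500.X500Principal

// Returns true only if the peer presented a certificate whose subject matches the
// expected X500 name; otherwise the caller should drop the link.
fun peerMatchesExpectedName(session: SSLSession, expectedLegalName: X500Principal): Boolean {
    val peerCert = try {
        session.peerCertificates.firstOrNull() as? X509Certificate
    } catch (e: SSLPeerUnverifiedException) {
        null
    } ?: return false
    return peerCert.subjectX500Principal == expectedLegalName
}
```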
|
||||
|
||||
#### Out of process Artemis Broker and Bridges
|
||||

|
||||
|
||||
Move the Artemis broker and bridge formation logic out of the node. This requires formalising the bridge creation
|
||||
requests, but allows clustered brokers, standardised AMQP usage and ultimately pluggable brokers. We should implement a
|
||||
netty socket server on the bridge and forward authenticated packets to the local Artemis broker inbound queues. An AMQP
|
||||
server socket is required for the float, although it should be transparent whether a NodeInfo refers to a bridge socket
|
||||
address, or an Artemis broker. The queue names should use the sha-256 of the PublicKey, not the full key. Also, the same
naming scheme should be used for both in and out queues, so that multiple distinct nodes can coexist on the same broker. This will simplify
|
||||
development as developers just run a background broker and shouldn't need to restart it. To export the network map
|
||||
information and to initiate bridges a non-durable bridge control protocol will be needed (in blue). Essentially the
|
||||
messages declare the local queue names and target TLS link information. For in-bound messages only messages for known
|
||||
inbox targets will be acknowledged. It should not be hard to make the bridges active-passive HA as they contain no
|
||||
persisted message state and simple RPC can resync the state of the bridge. Queue creation will remain with the node as
|
||||
this must use non-AMQP mechanisms and because flows should be able to queue sent messages even if the bridge is
|
||||
temporarily down. In parallel work can start to upgrade the local links to Artemis (i.e. the node-Artemis link and the
|
||||
Bridge Manager-Artemis link) to be AMQP clients as much as possible.
|
@ -1,50 +0,0 @@
|
||||
# Design Decision: Node starting & stopping
|
||||
|
||||
## Background / Context
|
||||
|
||||
The potential use of a crash shell is relevant to high availability capabilities of nodes.
|
||||
|
||||
## Options Analysis
|
||||
|
||||
### 1. Use crash shell
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Already built into the node.
|
||||
2. Potentially add custom commands.
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Won’t reliably work if the node is in an unstable state
|
||||
2. Not practical for running hundreds of nodes, as our customers are already trying to do.
|
||||
3. Doesn’t mesh with the user access controls of the organisation.
|
||||
4. Doesn’t interface to the existing monitoring and control systems i.e. Nagios, Geneos ITRS, Docker Swarm, etc.
|
||||
|
||||
### 2. Delegate to external tools
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Doesn’t require change from our customers
|
||||
2. Will work even if node is completely stuck
|
||||
3. Allows scripted node restart schedules
|
||||
4. Doesn’t raise questions about access control lists and audit
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. More uncertainty about what customers do.
|
||||
2. Might be more requirements on us to interact nicely with lots of different products.
|
||||
3. Might mean we get blamed for faults in other people’s control software.
|
||||
4. Doesn’t coordinate with the node for graceful shutdown.
|
||||
5. Doesn’t address any crypto features that target protecting the AMQP headers.
|
||||
|
||||
## Recommendation and justification
|
||||
|
||||
Proceed with Option 2: Delegate to external tools
|
||||
|
||||
## Decision taken
|
||||
|
||||
Restarts should be handled by polite shutdown, followed by a hard clear. (RGB, JC, MH agreed)
|
||||
|
||||
.. toctree::
|
||||
|
||||
drb-meeting-20171116.md
|
@ -1,46 +0,0 @@
|
||||
# Design Decision: Message storage
|
||||
|
||||
## Background / Context
|
||||
|
||||
Storage of messages by the message broker has implications for replication technologies which can be used to ensure both
|
||||
[high availability](../design.md) and disaster recovery of Corda nodes.
|
||||
|
||||
## Options Analysis
|
||||
|
||||
### 1. Storage in the file system
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Out of the box configuration.
|
||||
2. Recommended Artemis setup
|
||||
3. Faster
|
||||
4. Less likely to have interaction with DB Blob rules
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Unaligned capture time of journal data compared to DB checkpointing.
|
||||
2. Replication options on Azure are limited. Currently we may be forced to the ‘Azure Files’ SMB mount, rather than the ‘Azure Data Disk’ option. This is still being evaluated
|
||||
|
||||
### 2. Storage in node database
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Single point of data capture and backup
|
||||
2. Consistent solution between VM and physical box solutions
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Doesn’t work on H2, or SQL Server. From my own testing LargeObject support is broken. The current Artemis code base does allow some pluggability, but not of the large object implementation, only of the SQL statements. We should lobby for someone to fix the implementations for SQLServer and H2.
|
||||
2. Probably much slower, although this needs measuring.
|
||||
|
||||
## Recommendation and justification
|
||||
|
||||
Continue with Option 1: Storage in the file system
|
||||
|
||||
## Decision taken
|
||||
|
||||
Use storage in the file system (for now)
|
||||
|
||||
.. toctree::
|
||||
|
||||
drb-meeting-20171116.md
|
@ -1,118 +0,0 @@
|
||||
# Design Review Board Meeting Minutes
|
||||
|
||||
**Date / Time:** 16/11/2017, 16:30
|
||||
|
||||
## Attendees
|
||||
|
||||
- Mark Oldfield (MO)
|
||||
- Matthew Nesbit (MN)
|
||||
- Richard Gendal Brown (RGB)
|
||||
- James Carlyle (JC)
|
||||
- Mike Hearn (MH)
|
||||
- Jose Coll (JoC)
|
||||
- Rick Parker (RP)
|
||||
- Andrey Bozhko (AB)
|
||||
- Dave Hudson (DH)
|
||||
- Nick Arini (NA)
|
||||
- Ben Abineri (BA)
|
||||
- Jonathan Sartin (JS)
|
||||
- David Lee (DL)
|
||||
|
||||
## Minutes
|
||||
|
||||
The meeting re-opened following prior discussion of the float design.
|
||||
|
||||
MN introduced the design for high availability, clarifying that the design did not include support for DR-implied features (asynchronous replication etc.).
|
||||
|
||||
MN highlighted limitations in testability: Azure had confirmed support for geo replication but with limited control by the user and no testing facility; all R3 can do is test for impact on performance.
|
||||
|
||||
The design was noted to depend heavily on external dependencies for replication, with R3's testing capability limited to Azure. Agent banks may want to use SAN across dark fiber sites, redundant switches etc. not available to R3.
|
||||
|
||||
MN noted that certain databases are not yet officially supported in Corda.
|
||||
|
||||
### [Near-term-target](./near-term-target.md), [Medium-term target](./medium-term-target.md)
|
||||
|
||||
Outlining the hot-cold design, MN highlighted importance of ensuring only one node is active at one time. MN argued for having a tested hot-cold solution as a ‘backstop’. MN confirmed the work involved was to develop DB/SAN exclusion checkers and test appropriately.
|
||||
|
||||
JC queried whether unknowns exist for hot-cold. MN described limitations of Azure file replication.
|
||||
|
||||
JC noted there was optionality around both the replication mechanisms and the on-premises vs. cloud deployment.
|
||||
|
||||
### [Message storage](./db-msg-store.md)
|
||||
|
||||
Lack of support for storing Artemis messages via JDBC was raised, and the possibility for RedHat to provide an enhancement was discussed.
|
||||
|
||||
MH raised the alternative of using Artemis’ inbuilt replication protocol - MN confirmed this was in scope for hot-warm, but not hot-cold.
|
||||
|
||||
JC posited that file system/SAN replication should be OK for banks
|
||||
|
||||
**DECISION AGREED**: Use storage in the file system (for now)
|
||||
|
||||
AB asked about protections against corruption; RGB highlighted the need for testing on this. MH described previous testing activity, arguing for a performance cluster that repeatedly runs load tests, kills nodes, checks they come back, etc.
|
||||
|
||||
MN could not comment on testing status of current code. MH noted the notary hasn't been tested.
|
||||
|
||||
AB queried how basic node recovery would work. MN explained, highlighting the limitation for RPC callbacks.
|
||||
|
||||
JC proposed these limitations should be noted and explained to Finastra; move on.
|
||||
|
||||
There was discussion of how RPC observables could be made to persist across node outages. MN argued that for most applications, a clear signal of the outage that triggered clients to resubscribe was preferable. This was agreed.
|
||||
|
||||
JC argued for using Kafka.
|
||||
|
||||
MN presented the Hot-warm solution as a target for March-April and provided clarifications on differences vs. hot-cold and hot-hot.
|
||||
|
||||
JC highlighted that the clustered Artemis was an important intermediate step. MN highlighted other important features.
|
||||
|
||||
MO noted that different banks may opt for different solutions.
|
||||
|
||||
JoC raised the question of multi-IP per node.
|
||||
|
||||
MN described the Hot-hot solution, highlighting that flows remained 'sticky' to a particular instance but could be picked up by another when needed.
|
||||
|
||||
AB preferred the hot-hot solution. MN noted the many edge cases to be worked through.
|
||||
|
||||
AB Queried the DR story. MO stated this was out of scope at present.
|
||||
|
||||
There was discussion of the implications of not having synchronous replication.
|
||||
|
||||
MH questioned the need for a backup strategy that allows winding back the clock. MO stated this was out of scope at present.
|
||||
|
||||
MO drew attention to the expectation that Corda would be considered part of larger solutions with controlled restore procedures under BCP.
|
||||
|
||||
JC noted the variability in many elements as a challenge.
|
||||
|
||||
MO argued for providing a 'shrink-wrapped' solution based around equipment R3 could test (e.g. Azure)
|
||||
|
||||
JC argued for the need to manage testing of banks' infrastructure choices in order to reduce time to implementation.
|
||||
|
||||
There was discussion around the semantic difference between HA and DR. MH argued for a definition based around rolling backups. MN and MO shared banks' view of what DR is. MH contrasted this with Google definitions. AB noted HA and DR have different SLAs.
|
||||
|
||||
**DECISION AGREED:** Near-term target: Hot Cold; Medium-term target: Hot-warm (RGB, JC, MH agreed)
|
||||
|
||||
RGB queried why Artemis couldn't be run in clustered mode now. MN explained.
|
||||
|
||||
AB queried what Finastra asked for. MO implied nothing specific; MH maintained this would be needed anyway.
|
||||
|
||||
### [Broker separation](./external-broker.md)
|
||||
|
||||
MN outlined his rationale for Broker separation.
|
||||
|
||||
JC queried whether this would affect demos.
|
||||
|
||||
MN gave an assumption that HA was for enterprise only; RGB and JC pointed out that Enterprise might still be made available for non-production use.
|
||||
|
||||
**DECISION AGREED**: The broker should only be separated if required by other features (e.g. the float), otherwise not. (RGB, JC, MH agreed).
|
||||
|
||||
### [Load balancers and multi-IP](./ip-addressing.md)
|
||||
|
||||
The topic was discussed.
|
||||
|
||||
**DECISION AGREED**: The design can allow for optional load balancers to be implemented by clients.
|
||||
|
||||
### [Crash shell](./crash-shell.md)
|
||||
|
||||
MN provided outline explanation.
|
||||
|
||||
**DECISION AGREED**: Restarts should be handled by polite shutdown, followed by a hard clear. (RGB, JC, MH agreed)
|
||||
|
@ -1,48 +0,0 @@
|
||||
# Design Decision: Broker separation
|
||||
|
||||
## Background / Context
|
||||
|
||||
A decision of whether to extract the Artemis message broker as a separate component has implications for the design of
|
||||
[high availability](../design.md) for nodes.
|
||||
|
||||
## Options Analysis
|
||||
|
||||
### 1. No change (leave broker embedded)
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Least change
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Means that starting/stopping Corda is tightly coupled to starting/stopping Artemis instances.
|
||||
2. Risks resource leaks from one system component affecting other components.
|
||||
3. Not pluggable if we wish to have an alternative broker.
|
||||
|
||||
### 2. External broker
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Separates concerns
|
||||
2. Allows future pluggability and standardisation on AMQP
|
||||
3. Separates life cycles of the components
|
||||
4. Makes Artemis deployment much more out of the box.
|
||||
5. Allows easier tuning of VM resources for Flow processing workloads vs broker type workloads.
|
||||
6. Allows later encrypted version to be an enterprise feature that can interoperate with OS versions.
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. More work
|
||||
2. Requires creating a protocol to control external bridge formation.
|
||||
|
||||
## Recommendation and justification
|
||||
|
||||
Proceed with Option 2: External broker
|
||||
|
||||
## Decision taken
|
||||
|
||||
The broker should only be separated if required by other features (e.g. the float), otherwise not. (RGB, JC, MH agreed).
|
||||
|
||||
.. toctree::
|
||||
|
||||
drb-meeting-20171116.md
|
@ -1,46 +0,0 @@
|
||||
# Design Decision: IP addressing mechanism (near-term)
|
||||
|
||||
## Background / Context
|
||||
|
||||
End-to-end encryption is a desirable potential design feature for the [high availability support](../design.md).
|
||||
|
||||
## Options Analysis
|
||||
|
||||
### 1. Via load balancer
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Standard technology in banks and on clouds, often for non-HA purposes.
|
||||
2. Intended to allow us to wait for completion of network map work.
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. We do need to support multiple IP address advertisements in network map long term.
|
||||
2. Might involve a small amount of code if we find Artemis doesn’t like the health probes. So far, though, testing of the Azure Load Balancer suggests this is not needed.
3. Won’t work over very large data centre separations, but that doesn’t work for HA/DR either.
|
||||
|
||||
### 2. Via IP list in Network Map
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. More flexible
|
||||
2. More deployment options
|
||||
3. We will need it one day
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Have to write code to support it.
|
||||
2. Configuration is more complicated and the nodes are no longer equivalent, so you can’t just copy the config to the backup.
|
||||
3. Artemis has round robin and automatic failover, so we may have to expose a vendor specific config flag in the network map.
|
||||
|
||||
## Recommendation and justification
|
||||
|
||||
Proceed with Option 1: Via Load Balancer
|
||||
|
||||
## Decision taken
|
||||
|
||||
The design can allow for optional load balancers to be implemented by clients. (RGB, JC, MH agreed)
|
||||
|
||||
.. toctree::
|
||||
|
||||
drb-meeting-20171116.md
|
@ -1,49 +0,0 @@
|
||||
# Design Decision: Medium-term target for node HA
|
||||
|
||||
## Background / Context
|
||||
|
||||
Designing for high availability is a complex task which can only be delivered over an operationally-significant
|
||||
timeline. It is therefore important to determine whether an intermediate state design (deliverable for around March
|
||||
2018) is desirable as a precursor to longer term outcomes.
|
||||
|
||||
## Options Analysis
|
||||
|
||||
### 1. Hot-warm as interim state
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Simpler master/slave election logic
|
||||
2. Less edge cases with respect to messages being consumed by flows.
|
||||
3. Naive solution of just stopping/starting the node code is simple to implement.
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Still probably requires the Artemis MQ outside of the node in a cluster.
|
||||
2. May actually turn out more risky than hot-hot, because shutting down code is always prone to deadlocks and resource leakages.
|
||||
3. Some work would have to be thrown away when we create a full hot-hot solution.
|
||||
|
||||
### 2. Progress immediately to Hot-hot
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Horizontal scalability is what all our customers want.
|
||||
2. It simplifies many deployments as nodes in a cluster are all equivalent.
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. More complicated especially regarding message routing.
|
||||
2. Riskier to do this big-bang style.
|
||||
3. Might not meet deadlines.
|
||||
|
||||
## Recommendation and justification
|
||||
|
||||
Proceed with Option 1: Hot-warm as interim state.
|
||||
|
||||
## Decision taken
|
||||
|
||||
Adopt option 1: Medium-term target: Hot Warm (RGB, JC, MH agreed)
|
||||
|
||||
.. toctree::
|
||||
|
||||
drb-meeting-20171116.md
|
||||
|
@ -1,46 +0,0 @@
|
||||
# Design Decision: Near-term target for node HA
|
||||
|
||||
## Background / Context
|
||||
|
||||
Designing for high availability is a complex task which can only be delivered over an operationally-significant
|
||||
timeline. It is therefore important to determine the target state in the near term as a precursor to longer term
|
||||
outcomes.
|
||||
|
||||
## Options Analysis
|
||||
|
||||
### 1. No HA
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Reduces developer distractions.
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. No backstop if we miss our targets for fuller HA.
|
||||
2. No answer at all for simple DR modes.
|
||||
|
||||
### 2. Hot-cold (see [HA design doc](../design.md))
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Flushes out lots of basic deployment issues that will be of benefit later.
|
||||
2. If stuff slips we at least have a backstop position with hot-cold.
|
||||
3. For now, the only DR story we have is essentially a continuation of this mode
|
||||
4. The intent of decisions such as using a loadbalancer is to minimise code changes
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Distracts from the work for more complete forms of HA.
|
||||
2. Involves creating a few components that are not much use later, for instance the mutual exclusion lock.
|
||||
|
||||
## Recommendation and justification
|
||||
|
||||
Proceed with Option 2: Hot-cold.
|
||||
|
||||
## Decision taken
|
||||
|
||||
Adopt option 2: Near-term target: Hot Cold (RGB, JC, MH agreed)
|
||||
|
||||
.. toctree::
|
||||
|
||||
drb-meeting-20171116.md
|
@ -1,284 +0,0 @@
|
||||
# High availability support
|
||||
|
||||
.. important:: This design document describes a feature of Corda Enterprise.
|
||||
|
||||
## Overview
|
||||
### Background
|
||||
|
||||
The term high availability (HA) is used in this document to refer to the ability to rapidly handle any single component
|
||||
failure, whether due to physical issues (e.g. hard drive failure), network connectivity loss, or software faults.
|
||||
|
||||
Expectations of HA in modern enterprise systems are for systems to recover normal operation in a few minutes at most,
|
||||
while ensuring minimal/zero data loss. Whilst overall reliability is the overriding objective, it is desirable for Corda
|
||||
to offer HA mechanisms which are both highly automated and transparent to node operators. HA mechanisms must not involve
|
||||
any configuration changes that require more than an appropriate admin tool, or a simple start/stop of a process as that
|
||||
would need an Emergency Change Request.
|
||||
|
||||
HA naturally grades into requirements for Disaster Recovery (DR), which requires that there is a tested procedure to
|
||||
handle large scale multi-component failures e.g. due to data centre flooding, acts of terrorism. DR processes are
|
||||
permitted to involve significant manual intervention, although the complications of actually invoking a Business
|
||||
Continuity Plan (BCP) mean that the less manual intervention, the more competitive Corda will be in the modern vendor
|
||||
market. For modern financial institutions, maintaining comprehensive and effective BCP procedures is a legal
|
||||
requirement which is generally tested at least once a year.
|
||||
|
||||
However, until Corda is the system of record, or the primary system for transactions we are unlikely to be required to
|
||||
have any kind of fully automatic DR. In fact, we are likely to be restarted only once BCP has restored the most critical
|
||||
systems. In contrast, typical financial institutions maintain large, complex technology landscapes in which individual
|
||||
component failures can occur, such as:
|
||||
|
||||
* Small scale software failures
|
||||
* Mandatory data centre power cycles
|
||||
* Operating system patching and restarts
|
||||
* Short lived network outages
|
||||
* Middleware queue build-up
|
||||
* Machine failures
|
||||
|
||||
Thus, HA is essential for enterprise Corda, as is providing administrators with the help necessary for rapid fault diagnosis.
|
||||
|
||||
### Current node topology
|
||||
|
||||

|
||||
|
||||
The current solution has a single integrated process running in one JVM including Artemis, H2 database, Flow State
|
||||
Machine, P2P bridging. All storage is on the local file system. There is no HA capability other than manual restart of
|
||||
the node following failure.
|
||||
|
||||
#### Limitations
|
||||
|
||||
- All sub-systems must be started and stopped together.
|
||||
- Unable to handle partial failure e.g. Artemis.
|
||||
- Artemis cannot use its in-built HA capability (clustered slave mode) as it is embedded.
|
||||
- Cannot run the node with the flow state machine suspended.
|
||||
- Cannot use alternative message brokers.
|
||||
- Cannot run multiple nodes against the same broker.
|
||||
- Cannot use alternative databases to H2.
|
||||
- Cannot share the database across Corda nodes.
|
||||
- RPC clients do have automatic reconnect but there is no clear solution for resynchronising on reconnect.
|
||||
- The backup strategy is unclear.
|
||||
|
||||
## Requirements
|
||||
### Goals
|
||||
|
||||
* A logical Corda node should continue to function in the event of an individual component failure or (e.g.) restart.
|
||||
* No loss, corruption or duplication of data on the ledger due to component outages
|
||||
* Ensure continuity of flows throughout any disruption
|
||||
* Support software upgrades in a live network
|
||||
|
||||
### Non-goals (out of scope for this design document)
|
||||
|
||||
* Be able to distribute a node over more than two data centers.
|
||||
* Be able to distribute a node between data centers that are very far apart latency-wise (unless you don't care about performance).
|
||||
* Be able to tolerate arbitrary byzantine failures within a node cluster.
|
||||
* DR, specifically in the case of the complete failure of a site/datacentre/cluster or region will require a different
|
||||
solution to that specified here. For now DR is only supported where performant synchronous replication is feasible
|
||||
i.e. sites only a few miles apart.
|
||||
|
||||
## Timeline
|
||||
|
||||
This design document outlines a range of topologies which will be enabled through progressive enhancements from the
|
||||
short to long term.
|
||||
|
||||
On the timescales available for the current production pilot deployments we clearly do not have time to reach the ideal
|
||||
of a highly fault tolerant, horizontally scaled Corda.
|
||||
|
||||
Instead, I suggest that we can achieve only the simplest state of a standby Corda installation by January 5th, and
even this is contingent on other enterprise features, such as external database support and network map stabilisation,
being completed on this timescale, plus resolution of any issues raised by testing.
|
||||
|
||||
For the Enterprise GA timeline, I hope that we can achieve a more fully automatic node failover state, with the Artemis
|
||||
broker running as a cluster too. I include a diagram of a fully scaled Corda for completeness and so that I can discuss
|
||||
what work is re-usable/throw away.
|
||||
|
||||
With regards to DR it is unclear how this would work where synchronous replication is not feasible. At this point we can
|
||||
only investigate approaches as an aside to the main thrust of work for HA support. In the synchronous replication mode
|
||||
it is assumed that the file and database replication can be used to ensure a cold DR backup.
|
||||
|
||||
## Design Decisions
|
||||
|
||||
The following design decisions are assumed by this design:
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
decisions/near-term-target.md
|
||||
decisions/medium-term-target.md
|
||||
decisions/external-broker.md
|
||||
decisions/db-msg-store.md
|
||||
decisions/ip-addressing.md
|
||||
decisions/crash-shell.md
|
||||
|
||||
## Target Solution
|
||||
|
||||
### Hot-Cold (minimum requirement)
|
||||

|
||||
|
||||
Small scale software failures on a node are recovered from locally by restarting/resetting the offending component via
the external (to the JVM) "Health Watchdog" (HW) process. The HW process (e.g. a shell script or similar) would monitor
parameters of the java processes by periodically querying them (with a sleep period of a few seconds). This may require
the introduction of a few monitoring 'hooks' into the Corda codebase, or a "health" CorDapp the HW script can interface
with. There would be back-off logic to prevent continual restarts in the case of persistent failure.
|
||||
|
||||
We would provide a fully-functional sample HW script for Linux/Unix deployment platforms.
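The delivered sample would be a shell script, but the watchdog logic itself is simple; the Kotlin sketch below illustrates the poll/restart loop with back-off. The health endpoint URL and the restart callback are assumptions for illustration, not existing Corda APIs.

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Poll a health endpoint; restart the node process on failure, backing off on repeated failures.
fun watchdogLoop(healthUrl: String, restart: () -> Unit) {
    var backOffSeconds = 5L
    while (true) {
        val healthy = try {
            val conn = URL(healthUrl).openConnection() as HttpURLConnection
            conn.connectTimeout = 2_000
            conn.readTimeout = 2_000
            conn.responseCode == 200
        } catch (e: Exception) {
            false
        }
        if (healthy) {
            backOffSeconds = 5L               // reset back-off once the node reports healthy
            Thread.sleep(5_000)
        } else {
            restart()                         // e.g. kill and relaunch the node process
            Thread.sleep(backOffSeconds * 1_000)
            backOffSeconds = minOf(backOffSeconds * 2, 300L)  // cap to avoid restart storms
        }
    }
}
```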
|
||||
|
||||
The hot-cold design provides a backup VM and Corda deployment instance that can be manually started if the primary is
|
||||
stopped. The failed primary must be killed to ensure it is fully stopped.
|
||||
|
||||
For single-node deployment scenarios the simplest supported way to recover from failures is to re-start the entire set
|
||||
of Corda Node processes or reboot the node OS.
|
||||
|
||||
For a 2-node HA deployment scenario a load balancer determines which node is active and routes traffic to that node. The
|
||||
load balancer will need to monitor the health of the primary and secondary nodes and automatically route traffic from
|
||||
the public IP address to the only active end-point. An external solution is required for the load balancer and health
|
||||
monitor. In the case of Azure cloud deployments, no custom code needs to be developed to support the health monitor.
|
||||
|
||||
An additional component will be written to prevent accidental dual running which is likely to make use of a database
|
||||
heartbeat table. Code size should be minimal.
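A minimal sketch of such a heartbeat check is shown below, assuming a hypothetical single-row table `node_mutual_exclusion(owner, last_seen)`; the table and column names and the staleness window are illustrative only.

```kotlin
import java.sql.Connection
import java.sql.Timestamp
import java.time.Instant

// Attempt to claim (or retain) the active role. Returns false if another node has
// updated the heartbeat recently, in which case this node must not start processing.
fun tryBecomeActive(conn: Connection, myId: String, staleAfterSeconds: Long = 60): Boolean {
    conn.prepareStatement(
        "UPDATE node_mutual_exclusion SET owner = ?, last_seen = ? WHERE owner = ? OR last_seen < ?"
    ).use { stmt ->
        val now = Instant.now()
        stmt.setString(1, myId)
        stmt.setTimestamp(2, Timestamp.from(now))
        stmt.setString(3, myId)
        stmt.setTimestamp(4, Timestamp.from(now.minusSeconds(staleAfterSeconds)))
        // Zero rows updated means a live peer currently owns the lock.
        return stmt.executeUpdate() == 1
    }
}
```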
|
||||
|
||||
#### Advantages
|
||||
|
||||
- This approach minimises the need for new code so can be deployed quickly.
|
||||
- Use of a load balancer in the short term avoids the need for new code and configuration management to support the alternative approach of multiple advertised addresses for a single legal identity.
|
||||
- Configuration of the inactive mode should be a simple mirror of the primary.
|
||||
- Assumes external monitoring and management of the nodes (customer developed), e.g. the ability to identify node failure, so that Corda watchdog code will not be required.
|
||||
|
||||
#### Limitations
|
||||
|
||||
- Slow failover as this is manually controlled.
|
||||
- Requires external solutions for replication of database and Artemis journal data.
|
||||
- Replication mechanism on agent banks with real servers not tested.
|
||||
- Replication mechanism on Azure is under test but may prove to be too slow.
|
||||
- Compatibility with external load balancers not tested. Only Azure configuration tested.
|
||||
- Contingent on completion of database support and testing of replication.
|
||||
- Failure of database (loss of connection) may not be supported or may require additional code.
|
||||
- RPC clients are assumed to make short-lived RPC requests, e.g. from a REST server, so there is no support for long-lived clients operating across failover.
|
||||
- Replication time points of the database and Artemis message data are independent and may not fully synchronise (may work, subject to testing).
|
||||
- Health reporting and process controls need to be developed by the customer.
|
||||
|
||||
### Hot-Warm (Medium-term solution)
|
||||

|
||||
|
||||
Hot-warm aims to automate failover and provide failover of individual major components e.g. Artemis.
|
||||
|
||||
It involves two key changes to the hot-cold design:
|
||||
1) Separation and clustering of the Artemis broker.
|
||||
2) Start and stop of flow processing without JVM exit.
|
||||
|
||||
The consequences of these changes are that peer to peer bridging is separated from the node and a bridge control
|
||||
protocol must be developed. A leader election component is a precursor to load balancing; it is likely to be a combination
of custom code and a standard library and, in the short term, to operate via the database. Cleaner handling of
|
||||
disconnects from the external components (Artemis and the database) will also be needed.
|
||||
|
||||
#### Advantages
|
||||
|
||||
- Faster failover as no manual intervention.
|
||||
- We can use Artemis replication protocol to replicate the message store.
|
||||
- The approach is integrated with preliminary steps for the float.
|
||||
- Able to handle loss of network connectivity to the database from one node.
|
||||
- Extraction of Artemis server allows a more standard Artemis deployment.
|
||||
- Provides protection against resource leakage in Artemis or Node from affecting the other component.
|
||||
- VMs can be tuned to address different work load patterns of broker and node.
|
||||
- Bridge work allows chance to support multiple IP addresses without a load balancer.
|
||||
|
||||
#### Limitations
|
||||
|
||||
- This approach will require careful testing of resource management on partial shutdown.
|
||||
- No horizontal scaling support.
|
||||
- Deployment of master and slave may not be completely symmetric.
|
||||
- Care must be taken with upgrades to ensure master/slave election operates across updates.
|
||||
- Artemis clustering does require a designated master at start-up of its cluster hence any restart involving changing
|
||||
the primary node will require configuration management.
|
||||
- The development effort is much more significant than the hot-cold configuration.
|
||||
|
||||
### Hot-Hot (Long-term strategic solution)
|
||||

|
||||
|
||||
In this configuration, all nodes are actively processing work and share a clustered database. A mechanism for sharding
|
||||
or distributing the work load will need to be developed.
|
||||
|
||||
#### Advantages
|
||||
|
||||
- Faster failover as flows are picked up by other active nodes.
|
||||
- Rapid scaling by adding additional nodes.
|
||||
- Node deployment is symmetric.
|
||||
- Any broker that can support AMQP can be used.
|
||||
- RPC can gracefully handle failover because responsibility for the flow can be migrated across nodes without the client being aware.
|
||||
|
||||
#### Limitations
|
||||
|
||||
- Very significant work with many edge cases during failure.
|
||||
- Will require handling of more states than just checkpoints e.g. soft locks and RPC subscriptions.
|
||||
- Single flows will not be active on multiple nodes without future development work.
|
||||
|
||||
## Implementation plan

### Transitioning from Corda 2.0 to Manually Activated HA

The current Corda node is built to run as a fully contained single process with the Flow logic, H2 database and Artemis
broker all bundled together. This limits the options for automatic replication and for surviving subsystem failure.
Thus, we must use external mechanisms to replicate the data in the case of failure. We should also ensure that
accidental dual start is not possible in the case of mistakes or a slow shutdown of the primary.

Based on this situation, I suggest the following minimum development tasks are required for a tested HA deployment:

1. Complete and merge JDBC support for an external clustered database. Azure SQL Server has been identified as the most
   likely initial deployment. With this we should be able to point at an HA database instance for Ledger and Checkpoint data.
2. I am suggesting that for the near term we just use the Azure Load Balancer to hide the multiple machine addresses.
   This does require allowing a health monitoring link to the Artemis broker, but so far testing indicates that this
   operates without issue. Longer term we need to ensure that the network map and configuration support exists for the
   system to work with multiple TCP/IP endpoints advertised to external nodes. Ideally this should be rolled into the
   work for AMQP bridges and Floats.
3. Implement a very simple mutual exclusion feature, so that an enterprise node cannot start if another is running
   against the same database. This can be via a simple heartbeat update in the database, or possibly some other library
   (a sketch of the heartbeat approach follows this list). This feature should be enabled only when specified by
   configuration.
4. The replication of the Artemis message queues will have to be via an external mechanism. On Azure we believe that the
   only practical solution is the 'Azure Files' approach, which maps a virtual Samba drive. We are testing this in case
   it is too slow to work. The mounting of separate Data Disks is possible, but they can only be mounted to one VM at a
   time, so they would not be compatible with the goal of no change requests for HA.
5. Improve health monitoring to better indicate failures. Extending the existing JMX and logging support should
   achieve this, although we probably need to create a watchdog CorDapp that verifies that the State Machine and Artemis
   messaging are able to process new work and to monitor flow latency.
6. Test the checkpointing mechanism and confirm that failures don't corrupt the data by deploying an HA setup on Azure
   and driving flows through the system as we stop the node randomly and switch to the other node. If this reveals any
   issues we will have to fix them.
7. Confirm that the behaviour of the RPC Client API is stable through these restarts, from the perspective of a stateless
   REST server calling through to RPC. The RPC API should provide positive feedback to the application, so that it can
   respond in a controlled fashion when disconnected.
8. Work on flow hospital tools where needed.

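As a rough illustration of the heartbeat-based mutual exclusion mentioned in step 3, the sketch below claims a single lock row in the shared database and refreshes it periodically. The table name, column names and timeout are hypothetical, and a production version would need to handle clock skew, connection loss and configuration switches; this is a minimal sketch, not the proposed implementation.

```kotlin
import java.sql.Connection
import java.time.Duration
import java.time.Instant

// Sketch only: table "mutual_exclusion" with columns (node_id, last_heartbeat) is illustrative.
// Assumes auto-commit is disabled on the supplied connection and a single lock row per database.
class DatabaseMutualExclusion(private val connection: Connection,
                              private val nodeId: String,
                              private val heartbeatTimeout: Duration = Duration.ofSeconds(30)) {

    /** Returns true if this node may start, i.e. no other node has heartbeated recently. */
    fun tryAcquire(): Boolean {
        connection.prepareStatement("SELECT node_id, last_heartbeat FROM mutual_exclusion").use { stmt ->
            stmt.executeQuery().use { rs ->
                if (rs.next()) {
                    val owner = rs.getString("node_id")
                    val lastBeat = rs.getTimestamp("last_heartbeat").toInstant()
                    val stale = Instant.now().isAfter(lastBeat.plus(heartbeatTimeout))
                    if (owner != nodeId && !stale) return false   // another live node owns the lock
                }
            }
        }
        // Claim (or refresh) the single lock row. A real implementation would use row locking
        // or an optimistic update rather than delete-then-insert.
        connection.prepareStatement("DELETE FROM mutual_exclusion").use { it.executeUpdate() }
        connection.prepareStatement(
                "INSERT INTO mutual_exclusion (node_id, last_heartbeat) VALUES (?, CURRENT_TIMESTAMP)").use {
            it.setString(1, nodeId)
            it.executeUpdate()
        }
        connection.commit()
        return true
    }

    /** Called periodically while the node is running to keep the claim fresh. */
    fun heartbeat() {
        connection.prepareStatement(
                "UPDATE mutual_exclusion SET last_heartbeat = CURRENT_TIMESTAMP WHERE node_id = ?").use {
            it.setString(1, nodeId)
            it.executeUpdate()
        }
        connection.commit()
    }
}
```
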
### Moving Towards Automatic Failover HA

To move towards more automatic failover handling we need to ensure that the node can be partially active, i.e. actively
monitoring its health status and perhaps keeping major data structures in sync for faster activation, but not actually
processing flows. This needs to be reversible without leakage or destabilising the node, as it is common to use manually
driven master changes to help with software upgrades and to carry out regular node shutdown and maintenance. Also, to
reduce the risks associated with the uncoupled replication of the Artemis message data and the database, I would
recommend that we move the Artemis broker out of the node to allow us to create a failover cluster. This is also in line
with the goal of creating AMQP bridges and Floats.

To this end I would suggest packages of work that include:

1. Move the broker out of the node, which will require having a protocol that can be used to signal bridge creation and
   which decouples the network map. This is in line with the Flow work anyway.
2. Create a mastering solution, probably using Atomix.IO, although this might require a solution with a minimum of three
   nodes to avoid split-brain issues. Ideally this service should be extensible in the future to lead towards an eventual
   state with flow-level sharding. Alternatively, we may be able to add a quick enterprise adaptor to ZooKeeper as the
   master selector if time is tight (a ZooKeeper-based sketch follows this list). This will inevitably impact upon
   configuration and deployment support.
3. Test for leakage when we repeatedly start and stop the Node class, and fix any resource leaks or deadlocks that occur
   at shutdown.
4. Switch the Artemis client code to be able to use the HA mode connection type and thus take advantage of the rapid
   failover code. Also, ensure that we can support multiple public IP addresses reported in the network map.
5. Implement proper detection and handling of disconnects from the external database and/or Artemis broker, which should
   immediately drop the master status of the node and flush any incomplete flows.
6. We should start looking at how to make RPC proxies recover from disconnect/failover, although this is probably not a
   top priority. However, it would be good to capture the missed results of completed flows and ensure the API allows
   clients to unregister/re-register Observables.

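One possible shape for the ZooKeeper adaptor mentioned in item 2 is Apache Curator's `LeaderLatch` recipe. The sketch below is illustrative only: the connection string, latch path and callback wiring are assumptions, and the real adaptor would plug into the node's lifecycle so that losing leadership drops master status and halts flow processing.

```kotlin
import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.framework.recipes.leader.LeaderLatch
import org.apache.curator.framework.recipes.leader.LeaderLatchListener
import org.apache.curator.retry.ExponentialBackoffRetry

// Minimal sketch of ZooKeeper-based master selection using Curator's LeaderLatch.
fun startMasterSelection(nodeId: String, onBecomeMaster: () -> Unit, onLoseMaster: () -> Unit): LeaderLatch {
    val client = CuratorFrameworkFactory.newClient(
            "zk1:2181,zk2:2181,zk3:2181",             // three ZooKeeper servers to avoid split brain
            ExponentialBackoffRetry(1000, 3))
    client.start()

    val latch = LeaderLatch(client, "/corda/ha/master", nodeId)
    latch.addListener(object : LeaderLatchListener {
        override fun isLeader() = onBecomeMaster()    // activate the node: start processing flows
        override fun notLeader() = onLoseMaster()     // deactivate: stop flows, release resources
    })
    latch.start()
    return latch
}
```
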
## The Future

Hopefully, most of the work from the automatic failover mode can be adapted when we move to a full hot-hot sharding of
flows across nodes. The mastering solution will need to be modified to negotiate a finer-grained claim on individual
flows, rather than stopping the whole node. Also, the routing of messages will have to be thought about so that they
go to the correct node for processing, but fail over if that node dies. However, most of the other health monitoring and
operational aspects should be reusable.

We also need to look at DR issues, and in particular how we might handle asynchronous replication and possibly
alternative recovery/reconciliation mechanisms.

# Design Decision: Storage engine for committed state index

## Background / Context

The storage engine for the committed state index needs to support a single operation: "insert all values with unique
keys, or abort if any key conflict found". A wide range of solutions could be used for that, from embedded key-value
stores to full-fledged relational databases. However, since we don't need any extra features an RDBMS provides over a
simple key-value store, we'll only consider lightweight embedded solutions to avoid extra operational costs.

Most RDBMSs are also generally optimised for read performance (using B-tree based storage engines like InnoDB or
MyISAM). Our workload is write-heavy and uses "random" primary keys (state references), which leads to particularly poor
write performance for those types of engines – as we have seen with our Galera-based notary service. One exception is
the MyRocks storage engine, which is based on RocksDB, handles write workloads well, and is supported by Percona Server
and MariaDB. It is easier, however, to just use RocksDB directly.

## Options Analysis

### A. RocksDB

An embedded key-value store based on log-structured merge-trees (LSM). It is highly configurable and provides many
options for performance tuning, e.g. it can be tuned to run on different hardware – flash, hard disks or entirely
in-memory.

### B. LMDB

An embedded key-value store using B+ trees, with ACID semantics and support for transactions.

### C. MapDB

An embedded Java database engine providing persistent collection implementations. Uses memory-mapped files. Simple to
use, implements the Java collection interfaces. Provides a HashMap implementation that we can use for storing committed
states.

### D. MVStore

An embedded log-structured key-value store. Provides a simple persistent map abstraction. Supports multiple map
implementations (B-tree, R-tree, concurrent B-tree).

## Recommendation and justification

Performance test results when running on a MacBook Pro with Intel Core i7-4980HQ CPU @ 2.80GHz, 16 GB RAM and an SSD:



Multiple tests were run with varying numbers of transactions and input states per transaction: "1m x 1" denotes a
million transactions with one input state.

Proceed with Option A, as RocksDB provides the most tuning options and achieves by far the best write performance.

Note that the index storage engine can be replaced in the future with minimal changes required on the notary service.

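To make the single required operation concrete, the sketch below shows an "insert all keys uniquely or abort" commit against RocksDB's Java API. It is an illustration under assumptions, not the notary's actual implementation: key/value encoding is left as opaque byte arrays and a single writer thread is assumed, so the conflict check and the batched write do not need to be atomic with respect to other writers.

```kotlin
import org.rocksdb.Options
import org.rocksdb.RocksDB
import org.rocksdb.WriteBatch
import org.rocksdb.WriteOptions

class CommittedStateIndex(path: String) : AutoCloseable {
    init { RocksDB.loadLibrary() }

    private val db: RocksDB = RocksDB.open(Options().setCreateIfMissing(true), path)

    /**
     * Inserts all (stateRef -> txId) entries, or throws if any state has already been consumed.
     * Keys and values are opaque byte arrays here; the real index would encode the StateRef,
     * consuming transaction id and request log position.
     */
    fun commitAllOrAbort(entries: Map<ByteArray, ByteArray>) {
        // First pass: check for conflicts (safe because a single consumer thread is assumed).
        for (key in entries.keys) {
            db.get(key)?.let { existing ->
                throw IllegalStateException("Conflict: state already consumed by tx ${existing.toHex()}")
            }
        }
        // Second pass: write all entries in a single batch.
        WriteBatch().use { batch ->
            entries.forEach { (key, value) -> batch.put(key, value) }
            WriteOptions().use { opts -> db.write(opts, batch) }
        }
    }

    private fun ByteArray.toHex() = joinToString("") { "%02x".format(it) }

    override fun close() = db.close()
}
```
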
# Design Decision: Replication framework

## Background / Context

Multiple libraries/platforms exist for implementing fault-tolerant systems. In existing CFT notary implementations we
experimented with using a traditional relational database with active replication, as well as a pure state machine
replication approach based on CFT consensus algorithms.

## Options Analysis

### A. Atomix

*Raft-based fault-tolerant distributed coordination framework.*

Our first CFT notary implementation was based on Atomix. Atomix can be easily embedded into a Corda node and provides
abstractions for implementing custom replicated state machines. In our case the state machine manages committed Corda
contract states. When notarisation requests are sent to Atomix, they get forwarded to the leader node. The leader
persists the request to a log and replicates it to all followers. Once the majority of followers acknowledge receipt, it
applies the request to the user-defined state machine. In our case we commit all input states in the request to a
JDBC-backed map, or return an error if conflicts occur.

#### Advantages

1. Lightweight, easy to integrate – embeds into the Corda node.
2. Uses Raft for replication – simpler and requires less code than other algorithms like Paxos.

#### Disadvantages

1. Not designed for storing large datasets. State is expected to be maintained in memory only. On restart, each replica re-reads the entire command log to reconstruct the state. This behaviour is not configurable and would require code changes.
2. Does not support batching and is not optimised for performance.
3. Since version 2.0, only supports snapshot replication. This means that each replica has to periodically dump the entire commit log to disk, and replicas that fall behind have to download the _entire_ snapshot.
4. Limited tooling.

### B. Permazen

*Java persistence layer with a built-in Raft-based replicated key-value store.*

Conceptually similar to Atomix, but persists the state machine instead of the request log. Built around an abstract
persistent key-value store: requests get cleaned up after replication and processing.

#### Advantages

1. Lightweight, easy to integrate – embeds into the Corda node.
2. Uses Raft for replication – simpler and requires less code than other algorithms like Paxos.
3. Built around an (optionally) persistent key-value store – supports large datasets.

#### Disadvantages

1. Maintained by a single developer, used by a single company in production. Code quality and documentation look to be of a high standard though.
2. Not tested with large datasets.
3. Designed for read-write-delete workloads. Replicas that fall behind too much will have to download the entire state snapshot (similar to Atomix).
4. Does not support batching and is not optimised for performance.
5. Limited tooling.

### C. Apache Kafka

*Paxos-based distributed streaming platform.*

Atomix and Permazen implement both the replicated request log and the state machine, but Kafka only provides the log
component. In theory that means more complexity, since we have to implement request log processing and state machine
management ourselves, but for our use case it's fairly straightforward: consume requests and insert input states into a
database, marking the position of the last processed request. If the database is lost, we can just replay the log from
the beginning. The main benefit of this approach is that it gives more granular control and performance tuning
opportunities in different parts of the system.

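A minimal sketch of that consumer-side processing is shown below. The topic name, bootstrap servers and the `StateIndex` interface are assumptions introduced purely for illustration; the real implementation would deserialise notarisation requests and commit input states plus the last processed offset in one local transaction.

```kotlin
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.ByteArrayDeserializer
import java.time.Duration

// Hypothetical hooks into the committed state index; not real Corda APIs.
interface StateIndex {
    fun lastProcessedOffset(): Long
    fun commitBatch(requests: List<ByteArray>, lastOffset: Long)  // applies requests and records the offset atomically
}

fun consumeRequests(index: StateIndex) {
    val props = mapOf(
        ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG to "kafka1:9092,kafka2:9092,kafka3:9092",
        ConsumerConfig.GROUP_ID_CONFIG to "notary-worker-1",
        ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG to "false",           // we track our own position
        ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG to ByteArrayDeserializer::class.java.name,
        ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG to ByteArrayDeserializer::class.java.name
    )
    KafkaConsumer<ByteArray, ByteArray>(props).use { consumer ->
        val partition = TopicPartition("notarisation-requests", 0)    // single partition for global ordering
        consumer.assign(listOf(partition))
        consumer.seek(partition, index.lastProcessedOffset() + 1)     // resume from our own recorded position
        while (true) {
            val records = consumer.poll(Duration.ofMillis(500)).records(partition)
            if (records.isEmpty()) continue
            index.commitBatch(records.map { it.value() }, records.last().offset())
        }
    }
}
```
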
#### Advantages

1. Stable – used in production for many years.
2. Optimised for performance. Provides multiple configuration options for performance tuning.
3. Designed for managing large datasets (performance is not affected by dataset size).

#### Disadvantages

1. Relatively complex to set up and operate, requires a Zookeeper cluster. Note that some hosting providers offer Kafka as-a-service (e.g. Confluent Cloud), so we could delegate the setup and management.
2. Dictates a more complex notary service architecture.

### D. Custom Raft-based implementation

For even more granular control, we could replace Kafka with our own replicated log implementation. Kafka was started
before the Raft consensus algorithm was introduced, and uses Zookeeper for coordination, which is based on Paxos for
consensus. Paxos is known to be complex to understand and implement, and the main driver behind Raft was to create a
much simpler algorithm with equivalent functionality. Hence, while reimplementing Zookeeper would be an onerous task,
building a Raft-based alternative from scratch is somewhat feasible.

#### Advantages

Most of the implementations above have many extra features our use case does not require. We can implement a relatively
simple, clean and optimised solution that will most likely outperform the others (Thomas Schroeter has already built a
prototype).

#### Disadvantages

A large effort is required to make it highly performant and reliable.

### E. Galera

*Synchronous replication plugin for MySQL, uses certification-based replication.*

All of the options discussed so far were based on abstract state machine replication. Another approach is simply using a
more traditional RDBMS with active replication support. Note that most relational databases support some form of
replication in general; however, very few provide strong consistency guarantees and ensure no data loss. Galera is a
plugin for MySQL enabling synchronous multi-master replication.

Galera uses certification-based replication, which operates on write-sets: a database server executes the (database)
transaction, and only performs replication if the transaction requires write operations. If it does, the transaction is
broadcast to all other servers (using atomic broadcast). On delivery, each server executes a deterministic certification
phase, which decides if the transaction can commit or must abort. If a conflict occurs, the entire cluster rolls back
the transaction. This type of technique is quite efficient in low-conflict situations and allows read scaling (the
latter is mostly irrelevant for our use case).

#### Advantages

1. Very little code required on the Corda side to implement.
2. Stable – used in production for many years.
3. Large tooling and support ecosystem.

#### Disadvantages

1. Certification-based replication is based on database transactions. A replication round is performed on every transaction commit, and batching is not supported. To improve performance, we need to combine the committing of multiple Corda transactions into a single database transaction, which gets complicated when conflicts occur.
2. Only supports the InnoDB storage engine, which is based on B-trees. It works well for reads, but performs _very_ poorly on write-intensive workloads with "random" primary keys. In tests we were only able to achieve up to 60 TPS throughput. Moreover, the performance steadily drops as more data is added.

### F. CockroachDB

*Distributed SQL database built on a transactional and strongly-consistent key-value store. Uses Raft-based replication.*

On paper, CockroachDB looks like a great candidate, but it relies on sharding: data is automatically split into
partitions, and each partition is replicated using Raft. It performs great for single-shard database transactions, and
also natively supports cross-shard atomic commits. However, the majority of Corda transactions are likely to have more
than one input state, which means that most transaction commits will require cross-shard database transactions. In our
tests we were only able to achieve up to 30 TPS in a 3-DC deployment.

#### Advantages

1. Scales very well horizontally by sharding data.
2. Easy to set up and operate.

#### Disadvantages

1. Cross-shard atomic commits are slow. Since we expect most transactions to contain more than one input state, each transaction commit will very likely span multiple shards.
2. Fairly new, limited use in production so far.

## Recommendation and justification

Proceed with Option C. A Kafka-based solution strikes the best balance between performance and the required effort to
build a production-ready solution.

# High Performance CFT Notary Service

.. important:: This design document describes a feature of Corda Enterprise.

## Overview

This proposal describes the architecture and an implementation for a high performance crash fault-tolerant notary
service, operated by a single party.

## Background

For initial deployments, we expect to operate a single non-validating CFT notary service. The current Raft and Galera
implementations cannot handle more than 100-200 TPS, which is likely to be a serious bottleneck in the near future. To
support our clients and compete with other platforms we need a notary service that can handle TPS in the order of
1,000s.

## Scope

Goals:

- A CFT non-validating notary service that can handle more than 1,000 TPS. Stretch goal: 10,000 TPS.
- Disaster recovery strategy and tooling.
- Deployment strategy.

Out-of-scope:

- Validating notary service.
- Byzantine fault-tolerance.

## Timeline

No strict delivery timeline requirements; depends on client throughput needs. Estimated delivery by end of Q3 2018.

## Requirements

The notary service should be able to:

- Notarise more than 1,000 transactions per second, with an average of 4 inputs per transaction.
- Notarise a single transaction within 1s (from the service perspective).
- Tolerate a single node crash without affecting service availability.
- Tolerate a single data center failure.
- Tolerate a single disk failure/corruption.

## Design Decisions

.. toctree::
   :maxdepth: 2

   decisions/replicated-storage.md
   decisions/index-storage.md

## Target Solution

Having explored different solutions for implementing notaries, we propose the following architecture for a CFT notary,
consisting of two components:

1. A central replicated request log, which orders and stores all notarisation requests. Efficient append-only log
   storage can be used along with batched replication, making performance mainly dependent on network throughput.
2. Worker nodes that service clients and maintain a consumed state index. The state index is a simple key-value store
   containing committed state references and pointers to the corresponding request positions in the log. If lost, it can
   be reconstructed by replaying and applying request log entries. There is a range of fast key-value stores that can be
   used for the implementation.



At a high level, client notarisation requests first get forwarded to a central replicated request log. The requests are
then applied in order to the consumed state index in each worker to verify input state uniqueness. Each individual
request outcome (success/conflict) is then sent back to the initiating client by the worker responsible for it. To
emphasise, each worker will process _all_ notarisation requests, but only respond to the ones it received directly.

Messages (requests) in the request log are persisted and retained forever. The state index has a relatively low
footprint and can in theory be kept entirely in memory. However, when a worker crashes, replaying the log to recover the
index may take too long depending on the SLAs. Additionally, we expect applying the requests to the index to be much
faster than consuming request batches even with persistence enabled.

_Technically_, the request log can also be kept entirely in memory, and the cluster will still be able to tolerate up to
$f < n/2$ node failures. However, if for some reason the entire cluster is shut down (e.g. administrator error), all
requests will be forever lost! Therefore, we should avoid it.

The request log does not need to be a separate cluster, and the worker nodes _could_ maintain the request log replicas
locally. This would allow workers to consume ordered requests from the local copy rather than from a leader node across
the network. It is hard to say, however, if this would have a significant performance impact without performing tests in
the specific network environment (e.g. the bottleneck could be the replication step).

One advantage of hosting the request log in a separate cluster is that it makes it easier to independently scale the
number of worker nodes. If, for example, transaction validation and resolution is required when receiving a
notarisation request, we might find that a significant number of receivers is required to generate enough incoming
traffic to the request log. On the flip side, increasing the number of workers adds additional consumers and load on the
request log, so a balance needs to be found.

## Design Decisions

As the design decision documents below discuss, the most suitable platform for managing the request log was chosen to be
[Apache Kafka](https://kafka.apache.org/), and [RocksDB](http://rocksdb.org/) as the storage engine for the committed
state index.

| Heading | Recommendation |
| ---------------------------------------- | -------------- |
| [Replication framework](decisions/replicated-storage.md) | Option C |
| [Index storage engine](decisions/index-storage.md) | Option A |

TECHNICAL DESIGN
---

## Functional

A Kafka-based notary service does not deviate much from the high-level target solution architecture described above.



For our purposes we can view Kafka as a replicated durable queue that we can push messages (_records_) to and consume
from. Consuming a record just increments the consumer's position pointer and does not delete it. Old records eventually
expire and get cleaned up, but the expiry time can be set to "indefinite" so all data is retained (it's a supported
use-case).

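As a hedged illustration of that configuration, the sketch below creates the request topic with record expiry disabled and a single partition (for global ordering). The topic name, replication factor and broker addresses are assumptions, not prescribed values.

```kotlin
import org.apache.kafka.clients.admin.AdminClient
import org.apache.kafka.clients.admin.AdminClientConfig
import org.apache.kafka.clients.admin.NewTopic

// Sketch of creating the notarisation request topic: one partition, replicated across brokers,
// with both time- and size-based expiry disabled so the request log is retained forever.
fun createRequestLogTopic() {
    val admin = AdminClient.create(mapOf(
        AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG to "kafka1:9092,kafka2:9092,kafka3:9092"
    ))
    admin.use {
        val topic = NewTopic("notarisation-requests", 1, 3.toShort())   // 1 partition, replication factor 3
            .configs(mapOf(
                "retention.ms" to "-1",           // never expire records by age
                "retention.bytes" to "-1",        // never expire records by size
                "min.insync.replicas" to "2"      // pairs with acks=all on the producer side
            ))
        it.createTopics(listOf(topic)).all().get()
    }
}
```
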
The main caveat is that Kafka does not allow consuming records from replicas directly – all communication has to be
routed via a single leader node.

In Kafka, logical queues are called _topics_. Each topic can be split into multiple partitions. Topics are assigned a
_replication factor_, which specifies how many replicas Kafka should create for each partition. Each replicated
partition has an assigned leader node which producers and consumers can connect to. Partitioning topics and evenly
distributing partition leadership allows Kafka to scale well horizontally.

In our use-case, however, we can only use a single-partition topic for notarisation requests, which limits the total
capacity and throughput to a single machine. Partitioning requests would break global transaction ordering guarantees
for consumers. There is a [proposal](#kafka-throughput-scaling-via-partitioning) from Rick Parker on how we _could_ use
partitioning to potentially avoid traffic contention on the single leader node.

### Data model

Each record stored in the Kafka topic contains:

1. Transaction Id
2. List of input state references
3. Requesting party X.500 name
4. Notarisation request signature

The committed state index contains a map of:

`Input state reference: StateRef -> ( Transaction Id: SecureHash, Kafka record position: Long )`

It also stores a special key-value pair denoting the position of the last applied Kafka record.

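Expressed as Kotlin types, the record and index entry could look roughly as follows. These are illustrative shapes only, not the actual serialised wire format, and the request signature is simplified to raw bytes.

```kotlin
import net.corda.core.contracts.StateRef
import net.corda.core.crypto.SecureHash
import net.corda.core.identity.CordaX500Name

// One record in the Kafka request log (illustrative shape).
data class NotarisationRecord(
    val txId: SecureHash,
    val inputs: List<StateRef>,
    val requestingParty: CordaX500Name,
    val requestSignature: ByteArray      // simplified; the real type carries signature metadata
)

// One entry per consumed input state in the committed state index.
data class CommittedStateEntry(
    val consumingTxId: SecureHash,
    val kafkaRecordPosition: Long
)

// Conceptually the index is Map<StateRef, CommittedStateEntry>, plus a special entry
// recording the position of the last applied Kafka record.
```
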
## Non-Functional

### Fault tolerance, durability and consistency guarantees

Let's have a closer look at what exactly happens when a client sends a notarisation request to a notary worker node.



A small note on terminology: the "notary service" we refer to in this section is the internal long-running service in the Corda node.

1. Client sends a notarisation request to the chosen Worker node. The load balancing is handled on the client by Artemis (round-robin).
2. Worker acknowledges receipt and starts the service flow. The flow validates the request: verifies the transaction if needed, validates the timestamp and the notarisation request signature. The flow then forwards the request to the notary service, and suspends waiting for a response.
3. The notary service wraps the request in a Kafka record and sends it to the global log via a Kafka producer. The sends are asynchronous from the service's perspective, and the producer is configured to buffer records and perform sends in batches.
4. The Kafka leader node responsible for the topic partition replicates the received records to followers. The producer also specifies "ack" settings, which control when the records are considered to be committed. Only committed records are available for consumers. Using the "all" setting ensures that records are persisted to all replicas before they become available for consumption. **This ensures that no worker will consume a record that may later be lost if the Kafka leader crashes** (a producer configuration sketch follows this list).
7. The notary service maintains a separate thread that continuously attempts to pull new available batches of records from the Kafka leader node. It processes the received batches of notarisation requests – it commits input states to a local persistent key-value store. Once a batch is processed, the last record position in the Kafka partition is also persisted locally. On restart, the consumption of records is resumed from the last recorded position.
9. Kafka also tracks consumer positions in Zookeeper, and provides the ability for consumers to commit the last consumed position either synchronously or asynchronously. Since we don't require exactly-once delivery semantics, we opt for asynchronous position commits for performance reasons.
10. Once notarisation requests are processed, the notary service matches them against the ones received by this particular worker node, and resumes the flows to send responses back to the clients.

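A minimal sketch of the producer settings described in steps 3 and 4 is shown below: batched asynchronous sends, `acks=all` so a record only becomes committed once replicated, and automatic retries so a leader failover does not lose the batch. Broker addresses, topic name and batch sizes are illustrative values only.

```kotlin
import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.clients.producer.ProducerConfig
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.ByteArraySerializer

// Sketch of a producer matching the behaviour described in steps 3-4.
fun newRequestLogProducer(): KafkaProducer<ByteArray, ByteArray> {
    val props = mapOf<String, Any>(
        ProducerConfig.BOOTSTRAP_SERVERS_CONFIG to "kafka1:9092,kafka2:9092,kafka3:9092",
        ProducerConfig.ACKS_CONFIG to "all",               // record committed only when all in-sync replicas have it
        ProducerConfig.RETRIES_CONFIG to Int.MAX_VALUE,    // auto-retry across leader elections
        ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG to true,  // avoid duplicates on retry
        ProducerConfig.LINGER_MS_CONFIG to 5,              // small delay to allow batching
        ProducerConfig.BATCH_SIZE_CONFIG to 256 * 1024,
        ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG to ByteArraySerializer::class.java.name,
        ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG to ByteArraySerializer::class.java.name
    )
    return KafkaProducer(props)
}

// Sends are asynchronous; the returned future completes once the record is committed.
fun sendRequest(producer: KafkaProducer<ByteArray, ByteArray>, serialisedRequest: ByteArray) {
    producer.send(ProducerRecord("notarisation-requests", serialisedRequest))
}
```
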
Now let's consider the possible failure scenarios and how they are handled:

* 2: Worker fails to acknowledge the request. The Artemis broker on the client will redirect the message to a different worker node.
* 3: Worker fails right after acknowledging the request, and nothing is sent to the Kafka request log. Without some heartbeat mechanism the client can't know if the worker has failed or the request is simply taking a long time to process. For this reason clients have special logic to retry notarisation requests with different workers if a response is not received before a specified timeout.
* 4: Kafka leader fails before replicating records. The producer does not receive an ack and the batch send fails. A new leader is elected and all producers and consumers switch to it. The producer retries sending with the new leader (it has to be configured to auto-retry). The lost records were not considered to be committed and therefore were not made available for any consumers. Even if the producer did not re-send the batch to the new leader, client retries would fire and the requests would be reinserted into the "pipeline".
* 7: The worker fails after sending out a batch of requests. The requests will be replicated and processed by other worker nodes. However, other workers will not send back replies to clients that the failed worker was responsible for.
  The client will retry with another worker. That worker will have already processed the same request, and committing the input states will result in a conflict. Since the conflict is caused by the same Corda transaction, it will ignore it and send back a successful response.
* 8: The worker fails right after consuming a record batch. The consumer position is not recorded anywhere, so it would re-consume the batch once it's back up again.
* 9: The worker fails right after committing input states, but before recording the last processed record position. On restart, it will re-consume the last batch of requests it had already processed. Committing input states is idempotent, so re-processing the same request will succeed. Committing the consumer position to Kafka is strictly speaking not needed in our case, since we maintain it locally and manually "rewind" the partition to the last processed position on startup.
* 10: The worker fails just before sending back a response. The client will retry with another worker.

The above discussion only considers crash failures which don't lead to data loss. What happens if the crash also results in disk corruption/failure?

* If a Kafka leader node fails and loses all data, the machine can be re-provisioned, and the Kafka node will reconnect to the cluster and automatically synchronise all data from one of the replicas. It can only become a leader again once it has fully caught up.
* If a worker node fails and loses all data, it can replay the Kafka partition from the beginning to reconstruct the committed state index. To speed this up, periodic backups can be taken so the index can be restored from a more recent snapshot.

One open question is flow handling on the worker node. If the notary service flow is checkpointed and the worker crashes while the flow is suspended and waiting for a response (the completion of a future), on restart the flow will re-issue the request to the notary service. The service will in turn forward it to the request log (Kafka) for processing. If the worker node was down long enough for the client to retry the request with a different worker, a single notarisation request will get processed three times.

If the notary service flow is not checkpointed, the request won't be re-issued after restart, resulting in it being processed only twice. However, in the latter case the client will need to wait for the entire duration until the timeout expires, whereas if the worker is down for only a couple of seconds, the first approach would result in a much faster response time.

### Performance

Kafka provides various configuration parameters that allow control over producer and consumer record batch size, compression, buffer size, ack synchrony and other aspects. There are also guidelines on optimal filesystem setup.

RocksDB is highly tunable as well, providing different table format implementations, compression, bloom filters, compaction styles, and more.

Initial prototype tests showed up to *15,000* TPS for single-input state transactions, or *40,000* IPS (inputs/sec) for 1,000-input transactions. No performance drop was observed even after 1.2m transactions were notarised. The tests were run on three 8-core, 28 GB RAM Azure VMs in separate data centers.

With the recent introduction of notarisation request signatures the figures are likely to be much lower, as the request payload size is increased significantly. More tuning and testing is required.

### Scalability

It is not possible to scale beyond the peak throughput of a single machine. It is possible to scale the number of worker nodes for transaction verification and signing.

## Operational

As a general note, Kafka and Zookeeper are widely used in the industry and there are plenty of deployment guidelines and management tools available.

### Deployment

Different options are available. A single Kafka broker, Zookeeper replica and Corda notary worker node can be hosted on the same machine for simplicity and cost-saving. At the other extreme, every Kafka/Zookeeper/Corda node can be hosted on its own machine. The latter arguably provides more room for error, at the expense of extra operational costs and effort.

### Management

Kafka provides command-line tools for managing brokers and topics. Third-party UI-based tools are also available.

### Monitoring

Kafka exports a wide range of metrics via JMX. Datadog integration is available.

### Disaster recovery

Failure modes:

1. **Single machine or data center failure**. No backup/restore procedures are needed – nodes can catch up with the cluster on start. The RocksDB-backed committed state index keeps a pointer to the position of the last applied Kafka record, so it can resume where it left off after a restart.
2. **Multi-data center disaster leading to data loss**. Out of scope.
3. **User error**. It is possible for an admin to accidentally delete a topic – Kafka provides tools for that. However, topic deletion has to be explicitly enabled in the configuration (it is disabled by default). Keeping that option disabled should be a sufficient safeguard.
4. **Protocol-level corruption**. This covers scenarios where data stored in Kafka gets corrupted and the corruption is replicated to healthy replicas. In general, this is extremely unlikely to happen since Kafka records are immutable. The only such corruption in a practical sense could happen due to record deletion during compaction, which would occur if the broker is misconfigured to not retain records indefinitely. However, compaction is performed asynchronously and is local to the broker. In order for all data to be lost, _all_ brokers have to be misconfigured.

It is not possible to recover without any data loss in the event of 3 or 4. We can only _minimise_ data loss. There are two options:

1. Run a backup Kafka cluster. Kafka provides a tool that forwards messages from one cluster to another (asynchronously).
2. Take periodic physical backups of the Kafka topic.

In both scenarios the most recent requests will be lost. If data loss only occurs in Kafka, and the worker committed state indexes are intact, the notary could still function correctly and prevent double-spends of the transactions that were lost. However, in the non-validating notary scenario, the notarisation request signature and caller identity will be lost, and it will be impossible to trace the submitter of a fraudulent transaction. We could argue that the likelihood of request loss _and_ malicious transactions occurring at the same time is very low.

## Security

* **Communication**. Kafka supports SSL for both client-to-server and server-to-server communication. However, Zookeeper only supports SSL for client-to-server communication, which means that running Zookeeper across data centers will require setting up a VPN. For simplicity, we can reuse the same VPN for the Kafka cluster as well. The notary worker nodes can talk to Kafka either via SSL or the VPN.

* **Data privacy**. No transaction contents or PII are revealed or stored.

APPENDICES
---

## Kafka throughput scaling via partitioning

We have to use a single partition for global transaction ordering guarantees, but we could reduce the load on it by using it _just_ for ordering:

* Have a single-partition `transactions` topic where all worker nodes send only the transaction id.
* Have a separate _partitioned_ `payload` topic where workers send the entire notarisation request content: transaction id, input states, request signature (a single request can be around 1KB in size).

Workers would need to consume from the `transactions` partition to obtain the ordering, and from all `payload` partitions for the actual notarisation requests. A request will not be processed until its global order is known. Since Kafka tries to distribute leaders for different partitions evenly across the cluster, we would avoid a single Kafka broker handling all of the traffic. Load-wise, nothing changes from the worker node's perspective – it still has to process all requests – but a larger number of worker nodes could be supported.

# Monitoring and Logging Design

## Overview

The successful deployment and operation of Corda (and associated CorDapps) in a production environment requires a
supporting monitoring and management capability to ensure that both a Corda node (and its supporting middleware
infrastructure) and deployed CorDapps execute in a functionally correct and consistent manner. A pro-active monitoring
solution will enable the immediate alerting of unexpected behaviours, and the associated management tooling should
enable swift corrective action.

This design defines the monitoring metrics and logging outputs, and the associated implementation approach, required to
enable a proactive enterprise management and monitoring solution for Corda nodes and their associated CorDapps. This
also includes a set of "liveness" checks to verify and validate the correct functioning of a Corda node (and associated
CorDapps).



In the above diagram, the left-hand dotted box represents the components within scope for this design. It is
anticipated that 3rd party enterprise-wide system management solutions will closely follow the architectural component
breakdown in the right-hand box, and thus seamlessly integrate with the proposed Corda event generation and logging
design. The interface between the two is decoupled and based on textual log file parsing and the adoption of industry
standard JMX MBean events.

## Background

Corda currently exposes several forms of monitorable content:

* Application log files using [SLF4J](https://www.slf4j.org/) (the Simple Logging Facade for Java), which provides an
  abstraction over various concrete logging frameworks (several of which are used within other 3rd party libraries that
  Corda depends on). Corda itself uses the [Apache Log4j 2](https://logging.apache.org/log4j/2.x/) framework for logging
  output to a set of configured loggers (including a rolling file appender and the console). Currently the same set of
  rolling log files is used by both the node and the CorDapp(s) deployed to the node. The log file policy specifies a
  60-day rolling period (but preserving the most recent 10Gb) with a maximum of 10 log files per day.

* Industry-standard JMX-based metrics: both standard JVM and custom application metrics are exposed directly using the
  [Dropwizard.io](http://metrics.dropwizard.io/3.2.3/) *JmxReporter* facility. In addition, Corda also uses the
  [Jolokia](https://jolokia.org/) framework to make these accessible over an HTTP endpoint. Typically, these metrics are
  also collated by 3rd party tools to provide pro-active monitoring, visualisation and re-active management.

A full list of currently exposed metrics can be found in Appendix A.

The Corda flow framework also has *placeholder* support for recording additional audit data in application flows using a
simple *AuditService*. Audit event types are currently loosely defined and data is stored in string form (as a
description and contextual map of name-value pairs) together with a timestamp and principal name. This service does not
currently have an implementation that writes the audit event data to a persistent store.

The `ProgressTracker` component is used to report the progress of a flow throughout its business lifecycle, and is
typically configured to report the start of a specific business workflow step (often before and after message send and
receipt where other participants form part of a multi-staged business workflow). The progress tracking framework was
designed to become a vital part of how exceptions, errors, and other faults are surfaced to human operators for
investigation and resolution. It provides a means of exporting progress as a hierarchy of steps in a way that’s both
human readable and machine readable.

In addition, in-house Corda networks at R3 use the following tools:

* Standard [DataDog](https://docs.datadoghq.com/guides/overview/) probes are currently used to provide e-mail based
  alerting for running Corda nodes. [Telegraf](https://github.com/influxdata/telegraf) is used in conjunction with a
  [Jolokia agent](https://jolokia.org/agent.html) as a collector to parse emitted metric data and push these to DataDog.
* Investigation is underway to evaluate [ELK](https://logz.io/learn/complete-guide-elk-stack/) as a mechanism for parsing,
  indexing, storing, searching, and visualising log file data.

## Scope

### Goals

- Add new metrics at the level of a Corda node, individual CorDapps, and other supporting Corda components (float, bridge manager, doorman).
- Support liveness checking of the node, deployed flows and services.
- Review logging groups and severities in the node.
- Separate application logging from node logging.
- Implement the audit framework that is currently only a stubbed-out API.
- Ensure that Corda can be used with third party systems for monitoring, log collection and audit.

### Out of scope

- Recommendation of a specific set of monitoring tools.
- Monitoring of network infrastructure like the network map service.
- Monitoring of liveness of peers.

## Requirements

Expanding on the first goal identified above, the following requirements have been identified:

1. Node health
   - Message queues: latency, number of queues/messages, backlog, bridging establishment and connectivity (success / failure)
   - Database: connections (retries, errors), latency, query time
   - RPC metrics, latency, authentication/authorisation checking (eg. number of successful / failed attempts)
   - Signing performance (eg. signatures per sec)
   - Deployed CorDapps
   - Garbage collector and JVM statistics

2. CorDapp health
   - Number of flows broken down by type (including flow status and aging statistics: oldest, latest)
   - Flow durations
   - JDBC connections, latency/histograms

3. Logging
   - RPC logging
   - Shell logging (user/command pairs)
   - Message queue
   - Traces
   - Exception logging (including full stack traces)
   - Crash dumps (full stack traces)
   - Hardware Security Module (HSM) events
   - Per-CorDapp logging

4. Auditing
   - Security: login authentication and authorisation
   - Business event flow progress tracking
   - System events (particularly failures)

Audit data should be stored in a secure storage medium.
Audit data should include sufficient contextual information to enable optimal off-line analysis.
Auditing should apply to all Corda node processes (running CorDapps, notaries, oracles).

#### Use Cases

It is envisaged that operational management and support teams will use the metrics and information collated from this
design, either directly or through an integrated enterprise-wide systems management platform, to perform the following:

- Validate liveness and correctness of Corda nodes and deployed CorDapps, and the physical machine or VM they are hosted on.
- Use logging to troubleshoot operational failures (in conjunction with other supporting failure information: eg. GC logs, stack traces).
- Use reported metrics to fine-tune and tweak operational systems parameters (including dynamic setting of logging
  modules and severity levels to enable detailed logging).

## Design Decisions

The following design decisions are to be confirmed:

1. JMX for metric eventing and SLF4J for logging.
   Both of the above are widely adopted mechanisms that enable pluggability and seamless interoperability with other 3rd
   party enterprise-wide system management solutions.
2. Continue or discontinue usage of Jolokia? (TBC - most likely yes, subject to read-only security lock-down)
3. Separation of Corda Node and CorDapp log outputs (TBC)

## Proposed Solution

There are a number of activities and parts to the solution proposal:

1. Extend JMX metric reporting through the Corda Monitoring Service (and the associated Jolokia conversion to REST/JSON)
   coverage (see implementation details) to include all Corda services (vault, key management, transaction storage,
   network map, attachment storage, identity, cordapp provision) and subsystem components (state machine).

2. Review and extend Corda log4j2 coverage (see implementation details) to ensure:

   - consistent use of severities according to situation
   - consistent coverage across all modules and libraries
   - consistent output format with all relevant contextual information (node identity, user/execution identity, flow
     session identity, version information); a sketch of attaching such context via Log4j 2's `ThreadContext` follows
     this list
   - separation of Corda Node and CorDapp log outputs (TBC)

   For consistent interleaving reasons, it may be desirable to continue using combined log output.

   Publication of a *code style guide* to define when to use different severity levels.

3. Implement a CorDapp to perform sanity checking of the flow framework, fundamental Corda services (vault, identity), and
   dependent middleware infrastructure (message broker, database).

4. Revisit and enhance as necessary the [Audit service API](https://github.com/corda/corda/pull/620), and provide a
   persistence-backed implementation, to include:

   - specification of Business Event Categories (eg. user authentication and authorisation, flow-based triggering, Corda
     Service invocations, Oracle invocations, flow-based send/receive calls, RPC invocations)
   - auto-enabled with the Progress Tracker as a Business Event generator
   - an RDBMS-backed persistent store (independent of the Corda database), with adequate security controls (authenticated
     access and read-only permissioning). Captured information should be consistent with standard logging, and it may be
     desirable to define auditable loggers within log4j2 to automatically redirect certain types of log events to the
     audit service.

5. Ensure 3rd party middleware drivers (JDBC for database, MQ for messaging) and the JVM are correctly configured to export
   JMX metrics. Ensure the [JVM Hotspot VM command-line parameters](https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/clopts001.html)
   are tuned correctly to enable detailed troubleshooting upon failure. Many of these metrics are already automatically
   exposed to 3rd party profiling tools such as Yourkit.

   Apache Artemis has a comprehensive [management API](https://activemq.apache.org/artemis/docs/latest/management.html)
   that allows a user to modify a server configuration, create new resources (e.g. addresses and queues), inspect these
   resources (e.g. how many messages are currently held in a queue) and interact with them (e.g. to remove messages from a
   queue), and exposes key metrics using JMX (with role-based authentication via Artemis's JAAS plug-in support to
   ensure Artemis cannot be controlled via JMX).

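As one possible way to attach the contextual fields listed in item 2 to every log line, the sketch below uses Log4j 2's `ThreadContext` (MDC). The key names and surrounding flow machinery are illustrative assumptions; the actual naming convention would be defined by the code style guide, and the pattern layout would reference the keys with `%X{flowId}` and similar.

```kotlin
import org.apache.logging.log4j.LogManager
import org.apache.logging.log4j.ThreadContext

private val log = LogManager.getLogger("net.corda.node.flows")

// Illustrative only: wraps a unit of work so that all log output inside it carries
// node/flow/user context via the ThreadContext map.
fun <T> withLoggingContext(nodeIdentity: String, flowId: String, user: String, block: () -> T): T {
    ThreadContext.put("nodeIdentity", nodeIdentity)
    ThreadContext.put("flowId", flowId)
    ThreadContext.put("user", user)
    try {
        log.info("Flow started")
        return block()
    } finally {
        log.info("Flow finished")
        ThreadContext.clearMap()   // avoid leaking context onto reused threads
    }
}
```
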
##### Restrictions

As of Corda M11, Java serialisation in the Corda node has been restricted, meaning MBean access via the JMX port will no longer work.

Usage of Jolokia requires bundling an associated *jolokia-agent-war* file on the classpath, and associated configuration
to export JMX monitoring statistics and data over the Jolokia REST/JSON interface. An associated *jolokia-access.xml*
configuration file defines role-based permissioning for HTTP operations.

## Complementary solutions

A number of 3rd party libraries and frameworks have been proposed which solve different parts of the end to end
solution, albeit with most focusing on the Agent Collector (eg. collecting metrics from systems and outputting them to
some backend storage), Event Storage and Search, and Visualization aspects of Systems Management and Monitoring. These
include:

| Solution | Type (OS/£) | Description |
| ---------------------------------------- | ----------- | ---------------------------------------- |
| [Splunk](https://www.splunk.com/en_us/products.html) | £ | General purpose enterprise-wide system management solution which performs collection and indexing of data, searching, correlation and analysis, visualization and reporting, monitoring and alerting. |
| [ELK](https://logz.io/learn/complete-guide-elk-stack/) | OS | The ELK stack is a collection of 3 open source products from Elastic which provide an end to end enterprise-wide system management solution:<br />Elasticsearch: NoSQL database based on the Lucene search engine.<br />Logstash: a log pipeline tool that accepts inputs from various sources, executes different transformations, and exports the data to various targets.<br />Kibana: a visualization layer that works on top of Elasticsearch. |
| [ArcSight](https://software.microfocus.com/en-us/software/siem-security-information-event-management) | £ | Enterprise Security Manager. |
| [Collectd](https://collectd.org/) | OS | Collector agent (written in C circa 2005). Data acquisition and storage handled by over 90 plugins. |
| [Telegraf](https://github.com/influxdata/telegraf) | OS | Collector agent (written in Go, active community). |
| [Graphite](https://graphiteapp.org/) | OS | Monitoring tool that stores, retrieves, shares, and visualizes time-series data. |
| [StatsD](https://github.com/etsy/statsd) | OS | Collector daemon that runs on the [Node.js](http://nodejs.org/) platform and listens for statistics, like counters and timers, sent over [UDP](http://en.wikipedia.org/wiki/User_Datagram_Protocol) or [TCP](http://en.wikipedia.org/wiki/Transmission_Control_Protocol) and sends aggregates to one or more pluggable backend services (e.g., [Graphite](http://graphite.readthedocs.org/)). |
| [fluentd](https://www.fluentd.org/) | OS | Collector daemon which collects data directly from logs and databases. Often used to analyze event logs, application logs, and clickstreams (a series of mouse clicks). |
| [Prometheus](https://prometheus.io/) | OS | End to end monitoring solution using time-series data (eg. metric name and a set of key-value pairs) which includes collection, storage, query and visualization. |
| [NewRelic](https://newrelic.com/) | £ | Full stack instrumentation for application monitoring and real-time analytics solution. |

Most of the above solutions are not within the scope of this design proposal, but should be capable of ingesting the outputs (logging and metrics) defined by this design.

## Technical design

In general, the requirements outlined in this design are cross-cutting concerns which affect the Corda codebase holistically, both for logging and the capture/export of JMX metrics.

### Interfaces

* Public APIs impacted
  * No public APIs are impacted.
* Internal APIs impacted
  * No identified internal APIs are impacted.
* Services impacted:
  * No change is anticipated to the following service:
    * *Monitoring*
      This service defines and uses the *Codahale* `MetricsRegistry`, which is used by all other Corda services.
  * Changes are expected to:
    * *AuditService*
      This service has been specified but not implemented.
      The following event types have been defined (and may need reviewing):
      * `FlowAppAuditEvent`: used in `FlowStateMachine`, exposed on `FlowLogic` (but never called)
      * `FlowPermissionAuditEvent`: (as above)
      * `FlowStartEvent` (unused)
      * `FlowProgressAuditEvent` (unused)
      * `FlowErrorAuditEvent` (unused)
      * `SystemAuditEvent` (unused)
* Modules impacted
  * All modules packaged and shipped as part of a Corda distribution (as published to Artifactory / Maven): *core, node, node-api, node-driver, finance, confidential-identities, test-common, test-utils, webserver, jackson, jfx, mock, rpc*

### Functional

#### Health Checker

The Health Checker is a CorDapp which verifies the health and liveness of the Corda node it is deployed and running within by performing the following activities:

1. Corda network and middleware infrastructure connectivity checking:

   - Database connectivity
   - Message broker connectivity

2. Network Map participants summary (count, list)

   - Notary summary (type, number of cluster members)

3. Flow framework verification

   Implement a flow that performs a simple "in-node" (no external messaging to 3rd party processes) round trip, and by doing so exercises (a minimal sketch of such a flow follows this list):

   - flow checkpointing (including persistence to the relational data store)
   - message subsystem verification (creation of a send-to-self queue for the purpose of routing)
   - custom CordaService invocation (verify and validate the behaviour of an installed CordaService)
   - vault querying (verify and validate the behaviour of the vault query mechanism)

   [This CorDapp could perform a simple issuance of a fictional Corda token, a spend of the Corda token to self, a Corda token exit, plus a couple of vault queries in between: one using the VaultQuery API and the other using a custom query via a registered @CordaService.]

4. RPC triggering

   Automatic triggering of the above flow using RPC to exercise the following:

   - messaging subsystem verification (RPC queuing)
   - authentication and permissions checking (against the underlying configuration)

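A minimal skeleton of such a health-check flow is sketched below. It is not the proposed implementation: a fuller version would also issue and spend a fictional token, run vault queries and invoke a registered `@CordaService`, as described in item 3. The class name and the returned summary string are illustrative.

```kotlin
import co.paralleluniverse.fibers.Suspendable
import net.corda.core.flows.FlowLogic
import net.corda.core.flows.StartableByRPC
import net.corda.core.utilities.ProgressTracker

@StartableByRPC
class HealthCheckFlow : FlowLogic<String>() {
    object CHECKING : ProgressTracker.Step("Checking node services")
    object DONE : ProgressTracker.Step("Health check complete")

    override val progressTracker = ProgressTracker(CHECKING, DONE)

    @Suspendable
    override fun call(): String {
        progressTracker.currentStep = CHECKING
        // Exercises the network map cache (item 2); running as a flow also exercises the
        // flow framework and RPC start path when triggered remotely (item 4).
        val peerCount = serviceHub.networkMapCache.allNodes.size
        val notaryCount = serviceHub.networkMapCache.notaryIdentities.size
        progressTracker.currentStep = DONE
        return "OK: ${ourIdentity.name} sees $peerCount network map entries and $notaryCount notaries"
    }
}
```

Such a flow could plausibly be started via RPC or from the node shell, in line with the deployment note below.
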
The Health checker may be deployed as part of a Corda distribution and automatically invoked upon start-up and/or manually triggered via JMX or the nodes associated Crash shell (using the startFlow command)
|
||||
|
||||
Please note that the Health checker application is not responsible for determining the healthiness of a Corda Network. This is the responsibility of the network operator, and may include verification checks such as:
|
||||
|
||||
- correct functioning of Network Map Service (registration, discovery)
|
||||
- correct functioning of configured Notary
|
||||
- remote messaging subsystem (including bridge creation)
|
||||
|
||||
#### Metrics augmentation within Corda Subsystems and Components
|
||||
|
||||
*Codahale* provides the following types of reportable metrics (a registration sketch follows the list):
|
||||
|
||||
- Gauge: an instantaneous measurement of a value.
|
||||
- Counter: a gauge for a numeric value (specifically an `AtomicLong`) which can be incremented or decremented.
|
||||
- Meter: measures mean throughput, i.e. the rate of events over time (e.g. "requests per second"), as well as one-, five- and fifteen-minute exponentially weighted moving average throughputs.
|
||||
- Histogram: measures the statistical distribution of values in a stream of data (minimum, maximum, mean, median, 75th, 90th, 95th, 98th, 99th, and 99.9th percentiles).
|
||||
- Timer: measures both the rate at which a particular piece of code is called and the distribution of its duration (e.g. the rate of requests per second).
|
||||
- Health checks: provide a means of centralising service health checks (e.g. database or message broker connectivity).
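The sketch below shows, purely as an illustration, how these metric types might be registered against a Dropwizard/Codahale `MetricRegistry` and exposed over JMX. The metric names, the registry wiring and the assumption of the Metrics 3.x `JmxReporter` are all illustrative rather than references to Corda's actual registrations.

```
import com.codahale.metrics.Gauge
import com.codahale.metrics.JmxReporter
import com.codahale.metrics.MetricRegistry
import com.codahale.metrics.health.HealthCheck
import com.codahale.metrics.health.HealthCheckRegistry
import java.util.concurrent.ConcurrentLinkedQueue

val metrics = MetricRegistry()           // illustrative; the node exposes its own registry
val healthChecks = HealthCheckRegistry()
val pendingWork = ConcurrentLinkedQueue<String>()

fun registerExampleMetrics(databaseIsReachable: () -> Boolean) {
    // Gauge: instantaneous measurement, here the size of an in-memory queue.
    metrics.register("Example.QueueSize", Gauge<Int> { pendingWork.size })

    // Counter: incremented/decremented explicitly by application code.
    metrics.counter("Example.FlowErrors").inc()

    // Meter: rate of events over time (e.g. requests per second).
    metrics.meter("Example.Requests").mark()

    // Timer: rate plus duration distribution of a piece of code.
    metrics.timer("Example.VaultQueryDuration").time().use {
        // ... run the query being measured ...
    }

    // Health check: centralised yes/no answer, e.g. database connectivity.
    healthChecks.register("database", object : HealthCheck() {
        override fun check(): HealthCheck.Result =
            if (databaseIsReachable()) HealthCheck.Result.healthy()
            else HealthCheck.Result.unhealthy("cannot connect to the database")
    })

    // Expose everything over JMX (Dropwizard Metrics 3.x JmxReporter).
    JmxReporter.forRegistry(metrics).build().start()
}
```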
|
||||
|
||||
See Appendix A for a summary of the JMX metrics currently exported by the Corda codebase.
|
||||
|
||||
The following table identifies additional metrics to report for a Corda node (an illustrative instrumentation sketch follows the table):
|
||||
|
||||
| Component / Subsystem | Proposed Metric(s) |
|
||||
| ---------------------------------------- | ---------------------------------------- |
|
||||
| Database | Connectivity (health check) |
|
||||
| Corda Persistence | Database configuration details: <br />Data source properties: JDBC driver, JDBC driver class name, URL<br />Database properties: isolation level, schema name, init database flag<br />Run-time metrics: total & in flight connection, session, transaction counts; committed / rolled back transaction (counter); transaction durations (metric) |
|
||||
| Message Broker | Connectivity (health check) |
|
||||
| Corda Messaging Client | |
|
||||
| State Machine | Fiber thread pool queue size (counter), Live fibers (counter) , Fibers waiting for ledger commit (counter)<br />Flow Session Messages (counters): init, confirm, received, reject, normal end, error end, total received messages (for a given flow session, Id and state)<br />(in addition to existing metrics captured)<br />Flow error (count) |
|
||||
| Flow State Machine | Initiated flows (counter)<br />For a given flow session (counters): initiated flows, send, sendAndReceive, receive, receiveAll, retries upon send<br />For flow messaging (timers) to determine round trip latencies between send/receive interactions with counterparties.<br />Flow suspension metrics (count, age, wait reason, cordapp) |
|
||||
| RPC | For each RPC operation we should export metrics to report: calling user, round trip latency (timer), calling frequency (meter). Metric reporting should include the Corda RPC protocol version (should be the same as the node's Platform Version) in play. <br />Failed requests would be of particular interest for alerting. |
|
||||
| Vault | round trip latency of Vault Queries (timer)<br />Soft locking counters for reserve, release (counter), elapsed times soft locks are held for per flow id (timer, histogram), list of soft locked flow ids and associated stateRefs.<br />attempt to soft lock fungible states for spending (timer) |
|
||||
| Transaction Verification<br />(InMemoryTransactionVerifierService) | worker pool size (counter), verify duration (timer), verify throughput (meter), success (counter), failure (counter), in flight (counter) |
|
||||
| Notarisation | Notary details (type, members in cluster)<br />Counters for success, failures, failure types (conflict, invalid time window, invalid transaction, wrong notary), elapsed time (timer)<br />Ideally provide breakdown of latency across notarisation steps: state ref notary validation, signature checking, from sending to remote notary to receiving response |
|
||||
| RAFT Notary Service<br />(awaiting choice of new RAFT implementation) | should include similar metrics to previous RAFT (see appendix). |
|
||||
| SimpleNotaryService | success/failure uniqueness checking<br />success/failure time-window checking |
|
||||
| ValidatingNotaryService | as above plus success/failure of transaction validation |
|
||||
| RaftNonValidatingNotaryService | as `SimpleNotaryService`, plus timer for algorithmic execution latency |
|
||||
| RaftValidatingNotaryService | as `ValidatingNotaryService`, plus timer for algorithmic execution latency |
|
||||
| BFTNonValidatingNotaryService | as `RaftNonValidatingNotaryService` |
|
||||
| CorDapps<br />(CordappProviderImpl, CordappImpl) | list of corDapps loaded in node, path used to load corDapp jars<br />Details per CorDapp: name, contract class names, initiated flows, rpc flows, service flows, schedulable flows, services, serialization whitelists, custom schemas, jar path |
|
||||
| Doorman Server | TBC |
|
||||
| KeyManagementService | signing requests (count), fresh key requests (count), fresh key and cert requests (count), number of loaded keys (count) |
|
||||
| ContractUpgradeServiceImpl | number of authorisation upgrade requests (counter) |
|
||||
| DBTransactionStorage | number of transactions in storage map (cache) <br />cache size (max. 1024), concurrency level (def. 8) |
|
||||
| DBTransactionMappingStorage | as above |
|
||||
| Network Map | TBC (following re-engineering) |
|
||||
| Identity Service | number of parties, keys, principals (in cache)<br />Identity verification count & latency (count, metric) |
|
||||
| Attachment Service | counters for open, import, checking requests<br />(in addition to existing attachment count) |
|
||||
| Schema Service | list of registered schemas; schemaOptions per schema; table prefix. |
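As a sketch of how one of these rows could be instrumented, the snippet below wraps vault queries in a timer and counts soft-lock reservations. The class, its wiring into the real `NodeVaultService` and the metric names are assumptions for illustration, not existing Corda code.

```
import com.codahale.metrics.MetricRegistry
import java.util.concurrent.Callable

// Hypothetical wrapper illustrating the Vault row above.
class InstrumentedVault(metrics: MetricRegistry) {
    private val queryTimer = metrics.timer("Vault.QueryDuration")
    private val softLockReserves = metrics.counter("Vault.SoftLockReserves")

    // Times each query and records it in the "Vault.QueryDuration" timer.
    fun <T> timedQuery(query: () -> T): T = queryTimer.time(Callable { query() })

    // Counts soft-lock reservation attempts; the actual locking logic is elided.
    fun reserveSoftLocks(count: Int) = softLockReserves.inc(count.toLong())
}
```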
|
||||
|
||||
#### Logging augmentation within Corda Subsystems and Components
|
||||
|
||||
We need to ensure that Log4j2 log messages within Corda code are correctly categorized according to the defined severities (from most specific to least):
|
||||
|
||||
- ERROR: an error in the application, possibly recoverable.
|
||||
- WARNING: an event that might possibly lead to an error.
|
||||
- INFO: an event for informational purposes.
|
||||
- DEBUG: a general debugging event.
|
||||
- TRACE: a fine-grained debug message, typically capturing the flow through the application.
|
||||
|
||||
A *logging style guide* will be published to answer questions such as which severity level should be used, and why, when:
|
||||
|
||||
- A connection to a remote peer is unexpectedly terminated.
|
||||
- A database connection timed out but was successfully re-established.
|
||||
- A message was sent to a peer.
|
||||
|
||||
It is also important that we capture the right amount of contextual information to enable rapid identification and resolution of issues from log file output. Specifically, within Corda we should include the following information in logged messages (a sketch of one way to attach this context follows the list):
|
||||
|
||||
- Node identifier
|
||||
- User name
|
||||
- Flow id (runId, also referred to as `StateMachineRunId`), if logging within a flow
|
||||
- Other contextual flow information (e.g. counterparty), if logging within a flow
|
||||
- `FlowStackSnapshot` information for catastrophic flow failures.
|
||||
Note: it is unclear whether this information is currently intended for use in production; this needs confirming.
|
||||
- Session id information for RPC calls
|
||||
- CorDapp name, if logging from within a CorDapp
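One possible way to attach this context, sketched below, is the Log4j2 `ThreadContext` (MDC) together with a pattern layout containing `%X{nodeId}` and `%X{flowId}`. The helper and key names are illustrative, and because Corda flows run on fibers that can hop between threads, a purely thread-local context is only a partial solution.

```
import net.corda.core.flows.StateMachineRunId
import org.apache.logging.log4j.LogManager
import org.apache.logging.log4j.ThreadContext

private val log = LogManager.getLogger("net.corda.example")

// Illustrative helper: attach node and flow identifiers to the Log4j2 thread context so that
// a pattern layout containing %X{nodeId} and %X{flowId} emits them with every message.
fun <T> withLoggingContext(nodeId: String, flowId: StateMachineRunId, block: () -> T): T {
    ThreadContext.put("nodeId", nodeId)
    ThreadContext.put("flowId", flowId.toString())
    try {
        return block()
    } finally {
        ThreadContext.remove("nodeId")
        ThreadContext.remove("flowId")
    }
}

fun example() = withLoggingContext("PartyA", StateMachineRunId.createRandom()) {
    log.info("This message carries node and flow identifiers in its context")
}
```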
|
||||
|
||||
See Appendix B for a summary of the current logging and progress tracker reporting coverage within the Corda codebase.
|
||||
|
||||
##### Custom logging for enhanced visibility and troubleshooting
|
||||
|
||||
1. Database SQL logging is controlled via explicit configuration of the Hibernate log4j2 logger as follows:
|
||||
|
||||
```
|
||||
<Logger name="org.hibernate.SQL" level="debug" additivity="false">
|
||||
<AppenderRef ref="Console-Appender"/>
|
||||
</Logger>
|
||||
```
|
||||
|
||||
2. Message broker (Apache Artemis) advanced logging is enabled by configuring log4j2 for each of the six available [defined loggers](https://activemq.apache.org/artemis/docs/latest/logging.html). In general, Artemis logging is highly chatty, so the default logging level is actually toned down for one of the defined loggers:
|
||||
|
||||
```
|
||||
<Logger name="org.apache.activemq.artemis.core.server" level="error" additivity="false">
|
||||
<AppenderRef ref="RollingFile-Appender"/>
|
||||
</Logger>
|
||||
```
|
||||
|
||||
3. Corda coin selection advanced logging, including display of prepared statement parameters (which are not shown for certain database providers when Hibernate debug logging is enabled):
|
||||
|
||||
```
|
||||
<Logger name="net.corda.finance.contracts.asset.cash.selection" level="trace" additivity="false">
|
||||
<AppenderRef ref="Console-Appender"/>
|
||||
</Logger>
|
||||
```
|
||||
|
||||
#### Audit Service persistence implementation and enablement
|
||||
|
||||
1. Implementation of the existing `AuditService` API to write to a (pluggable) secure destination (database, message queue, other)
|
||||
2. Identification of the business events that we should audit, and instrumentation of code to ensure the `AuditService` is called with the correct event type for each business event.
|
||||
For Corda flows it would be a good idea to use the `ProgressTracker` component as a means of emitting business audit events (see the sketch below). Refer [here](https://docs.corda.net/head/flow-state-machines.html?highlight=progress%20tracker#progress-tracking) for a detailed description of the `ProgressTracker` API.
|
||||
3. Identification of System Events that should be automatically audited.
|
||||
4. Specification of a database schema and associated object relational mapping implementation.
|
||||
5. Setup and configuration of separate database and user account.
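A minimal sketch of the `ProgressTracker` approach suggested in item 2 follows. The flow and its steps are hypothetical, and the audit wiring is only indicated in comments since the `AuditService` itself is not yet implemented.

```
import co.paralleluniverse.fibers.Suspendable
import net.corda.core.flows.FlowLogic
import net.corda.core.utilities.ProgressTracker

// Hypothetical flow showing ProgressTracker steps doubling as business audit events.
class TradeSettlementFlow : FlowLogic<Unit>() {
    companion object {
        object VALIDATING : ProgressTracker.Step("Validating trade details")
        object SETTLING : ProgressTracker.Step("Settling trade")

        fun tracker() = ProgressTracker(VALIDATING, SETTLING)
    }

    override val progressTracker = tracker()

    @Suspendable
    override fun call() {
        // Each step change is observable (e.g. over RPC); an audit observer subscribed to
        // these updates could persist them as business audit events via the AuditService.
        progressTracker.currentStep = VALIDATING
        // ... business validation ...
        progressTracker.currentStep = SETTLING
        // ... settlement logic ...
    }
}
```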
|
||||
|
||||
## Software Development Tools and Programming Standards to be adopted
|
||||
|
||||
* Design patterns
|
||||
|
||||
[Michele] proposes the adoption of an [event-based propagation](https://r3-cev.atlassian.net/browse/ENT-1131) solution (and associated event-driven framework) that separates mainstream flow logic from business audit event triggering and JMX metric reporting, improving performance through parallelisation and minimising latency on the mainline execution thread. This approach would continue to use the same libraries for JMX event triggering and file logging.
|
||||
|
||||
* 3rd party libraries
|
||||
|
||||
[Jolokia](https://jolokia.org/) is a JMX-HTTP bridge giving access to the raw data and operations without connecting to the JMX port directly. Jolokia defines the JSON and REST formats for accessing MBeans, and provides client libraries to work with that protocol as well.
|
||||
|
||||
[Dropwizard Metrics](http://metrics.dropwizard.io/3.2.3/) (formerly Codahale) provides a toolkit of ways to measure the behavior of critical components in a production environment.
|
||||
|
||||
* Supporting tools
|
||||
|
||||
[VisualVM](http://visualvm.github.io/) is a visual tool integrating command-line JDK tools and lightweight profiling capabilities.
|
||||
|
||||
## Appendix A - Corda exposed JMX Metrics
|
||||
|
||||
The following metrics are exposed directly by a Corda Node at run-time:
|
||||
|
||||
| Module | Metric | Description |
|
||||
| ------------------------ | ---------------------------- | ---------------------------------------- |
|
||||
| Attachment Service | Attachments | Counts number of attachments persisted in database. |
|
||||
| Verification Service | VerificationsInFlight | Gauge of number of in flight verifications handled by the out of process verification service. |
|
||||
| Verification Service | Verification.Duration | Timer |
|
||||
| Verification Service | Verification.Success | Count |
|
||||
| Verification Service | Verification.Failure | Count |
|
||||
| RAFT Uniqueness Provider | RaftCluster.ThisServerStatus | Gauge |
|
||||
| RAFT Uniqueness Provider | RaftCluster.MembersCount | Count |
|
||||
| RAFT Uniqueness Provider | RaftCluster.Members | Gauge, containing a list of members (by server address) |
|
||||
| State Machine Manager | Flows.InFlight | Gauge (number of instances of state machine manager) |
|
||||
| State Machine Manager | Flows.CheckpointingRate | Meter |
|
||||
| State Machine Manager | Flows.Started | Count |
|
||||
| State Machine Manager | Flows.Finished | Count |
|
||||
| Flow State Machine | FlowDuration | Timer |
|
||||
|
||||
Additionally, JMX metrics are generated within the Corda *node-driver* performance testing utilities. Specifically, the `startPublishingFixedRateInjector` utility defines and exposes `QueueSize` and `WorkDuration` metrics.
|
||||
|
||||
## Appendix B - Corda Logging and Reporting coverage
|
||||
|
||||
Primary node services exposed publicly via ServiceHub (SH) or internally by ServiceHubInternal (SHI):
|
||||
|
||||
| Service | Type | Implementation | Logging summary |
|
||||
| ---------------------------------------- | ---- | ---------------------------------- | ---------------------------------------- |
|
||||
| VaultService | SH | NodeVaultService | extensive coverage including Vault Query api calls using `HibernateQueryCriteriaParser` |
|
||||
| KeyManagementService | SH | PersistentKeyManagementService | none |
|
||||
| ContractUpgradeService | SH | ContractUpgradeServiceImpl | none |
|
||||
| TransactionStorage | SH | DBTransactionStorage | none |
|
||||
| NetworkMapCache | SH | NetworkMapCacheImpl | some logging (11x info, 1x warning) |
|
||||
| TransactionVerifierService | SH | InMemoryTransactionVerifierService | |
|
||||
| IdentityService | SH | PersistentIdentityService | some logging (error, debug) |
|
||||
| AttachmentStorage | SH | NodeAttachmentService | minimal logging (info) |
|
||||
| | | | |
|
||||
| TransactionStorage | SHI | DBTransactionStorage | see SH |
|
||||
| StateMachineRecordedTransactionMappingStorage | SHI | DBTransactionMappingStorage | none |
|
||||
| MonitoringService | SHI | MonitoringService | none |
|
||||
| SchemaService | SHI | NodeSchemaService | none |
|
||||
| NetworkMapCacheInternal | SHI | PersistentNetworkMapCache | see SH |
|
||||
| AuditService | SHI | <unimplemented> | |
|
||||
| MessagingService | SHI | NodeMessagingClient | Good coverage (error, warning, info, trace) |
|
||||
| CordaPersistence | SHI | CordaPersistence | INFO coverage within `HibernateConfiguration` |
|
||||
| CordappProviderInternal | SHI | CordappProviderImpl | none |
|
||||
| VaultServiceInternal | SHI | NodeVaultService | see SH |
|
||||
| | | | |
|
||||
|
||||
Corda subsystem components:
|
||||
|
||||
| Name | Implementation | Logging summary |
|
||||
| -------------------------- | ---------------------------------------- | ---------------------------------------- |
|
||||
| NotaryService | SimpleNotaryService | some logging (warn) via `TrustedAuthorityNotaryService` |
|
||||
| NotaryService | ValidatingNotaryService | as above |
|
||||
| NotaryService | RaftValidatingNotaryService | some coverage (info, debug) within `RaftUniquenessProvider` |
|
||||
| NotaryService | RaftNonValidatingNotaryService | as above |
|
||||
| NotaryService | BFTNonValidatingNotaryService | Logging coverage (info, debug) |
|
||||
| Doorman | DoormanServer (Enterprise only) | Some logging (info, warn, error), and use of `println` |
|
||||
| | | |
|
||||
|
||||
Corda core flows:
|
||||
|
||||
| Flow name | Logging | Exception handling | Progress Tracking |
|
||||
| --------------------------------------- | ------------------- | ---------------------------------------- | ----------------------------- |
|
||||
| FinalityFlow | none | NotaryException | NOTARISING, BROADCASTING |
|
||||
| NotaryFlow | none | NotaryException (NotaryError types: TimeWindowInvalid, TransactionInvalid, WrongNotary), IllegalStateException, some via `check` assertions | REQUESTING, VALIDATING |
|
||||
| NotaryChangeFlow | none | StateReplacementException | SIGNING, NOTARY |
|
||||
| SendTransactionFlow | none | FetchDataFlow.HashNotFound (FlowException) | |
|
||||
| ReceiveTransactionFlow | none | SignatureException, AttachmentResolutionException, TransactionResolutionException, TransactionVerificationException | |
|
||||
| ResolveTransactionsFlow | none | FetchDataFlow.HashNotFound (FlowException), ExcessivelyLargeTransactionGraph (FlowException) | |
|
||||
| FetchAttachmentsFlow | none | FetchDataFlow.HashNotFound | |
|
||||
| FetchTransactionsFlow | none | FetchDataFlow.HashNotFound | |
|
||||
| FetchDataFlow | some logging (info) | FetchDataFlow.HashNotFound | |
|
||||
| AbstractStateReplacementFlow.Instigator | none | StateReplacementException | SIGNING, NOTARY |
|
||||
| AbstractStateReplacementFlow.Acceptor | none | StateReplacementException | VERIFYING, APPROVING |
|
||||
| CollectSignaturesFlow | none | IllegalArgumentException via `require` assertions | COLLECTING, VERIFYING |
|
||||
| CollectSignatureFlow | none | as above | |
|
||||
| SignTransactionFlow | none | FlowException, possibly other (general) Exception | RECEIVING, VERIFYING, SIGNING |
|
||||
| ContractUpgradeFlow | none | FlowException | |
|
||||
| | | | |
|
||||
|
||||
Corda finance flows:
|
||||
|
||||
| Flow name | Logging | Exception handling | Progress Tracking |
|
||||
| -------------------------- | ------- | ---------------------------------------- | ---------------------------------------- |
|
||||
| AbstractCashFlow | none | CashException (FlowException) | GENERATING_ID, GENERATING_TX, SIGNING_TX, FINALISING_TX |
|
||||
| CashIssueFlow | none | CashException (via call to `FinalityFlow`) | GENERATING_TX, SIGNING_TX, FINALISING_TX |
|
||||
| CashPaymentFlow | none | CashException (caused by `InsufficientBalanceException` or thrown by `FinalityFlow`), SwapIdentitiesException | GENERATING_ID, GENERATING_TX, SIGNING_TX, FINALISING_TX |
|
||||
| CashExitFlow | none | CashException (caused by `InsufficientBalanceException` or thrown by `FinalityFlow`) | GENERATING_TX, SIGNING_TX, FINALISING_TX |
|
||||
| CashIssueAndPaymentFlow | none | any thrown by `CashIssueFlow` and `CashPaymentFlow` | as `CashIssueFlow` and `CashPaymentFlow` |
|
||||
| TwoPartyDealFlow.Primary | none | | GENERATING_ID, SENDING_PROPOSAL |
|
||||
| TwoPartyDealFlow.Secondary | none | IllegalArgumentException via `require` assertions | RECEIVING, VERIFYING, SIGNING, COLLECTING_SIGNATURES, RECORDING |
|
||||
| TwoPartyTradeFlow.Seller | none | FlowException, IllegalArgumentException via `require` assertions | AWAITING_PROPOSAL, VERIFYING_AND_SIGNING |
|
||||
| TwoPartyTradeFlow.Buyer | none | IllegalArgumentException via `require` assertions, IllegalStateException | RECEIVING, VERIFYING, SIGNING, COLLECTING_SIGNATURES, RECORDING |
|
||||
|
||||
Confidential identities flows:
|
||||
|
||||
| Flow name | Logging | Exception handling | Progress Tracking |
|
||||
| ------------------------ | ------- | ---------------------------------------- | ---------------------------------------- |
|
||||
| SwapIdentitiesFlow | | | |
|
||||
| IdentitySyncFlow.Send | none | IllegalArgumentException via `require` assertions, IllegalStateException | SYNCING_IDENTITIES |
|
||||
| IdentitySyncFlow.Receive | none | CertificateExpiredException, CertificateNotYetValidException, InvalidAlgorithmParameterException | RECEIVING_IDENTITIES, RECEIVING_CERTIFICATES |
|
||||
|
||||
## Appendix C - Apache Artemis JMX Event types and Queuing Metrics
|
||||
|
||||
The following table contains a list of Notification Types and associated perceived importance to a Corda node at run-time:
|
||||
|
||||
| Name | Code | Importance |
|
||||
| --------------------------------- | :--: | ---------- |
|
||||
| BINDING_ADDED | 0 | |
|
||||
| BINDING_REMOVED | 1 | |
|
||||
| CONSUMER_CREATED | 2 | Medium |
|
||||
| CONSUMER_CLOSED | 3 | Medium |
|
||||
| SECURITY_AUTHENTICATION_VIOLATION | 6 | Very high |
|
||||
| SECURITY_PERMISSION_VIOLATION | 7 | Very high |
|
||||
| DISCOVERY_GROUP_STARTED | 8 | |
|
||||
| DISCOVERY_GROUP_STOPPED | 9 | |
|
||||
| BROADCAST_GROUP_STARTED | 10 | N/A |
|
||||
| BROADCAST_GROUP_STOPPED | 11 | N/A |
|
||||
| BRIDGE_STARTED | 12 | High |
|
||||
| BRIDGE_STOPPED | 13 | High |
|
||||
| CLUSTER_CONNECTION_STARTED | 14 | Soon |
|
||||
| CLUSTER_CONNECTION_STOPPED | 15 | Soon |
|
||||
| ACCEPTOR_STARTED | 16 | |
|
||||
| ACCEPTOR_STOPPED | 17 | |
|
||||
| PROPOSAL | 18 | |
|
||||
| PROPOSAL_RESPONSE | 19 | |
|
||||
| CONSUMER_SLOW | 21 | High |
|
||||
|
||||
The following table summarises the types of metrics associated with message queues:
|
||||
|
||||
| Metric | Description |
|
||||
| ----------------- | ---------------------------------------- |
|
||||
| count | total number of messages added to a queue since the server started |
|
||||
| countDelta | number of messages added to the queue *since the last message counter update* |
|
||||
| messageCount | *current* number of messages in the queue |
|
||||
| messageCountDelta | *overall* number of messages added/removed from the queue *since the last message counter update*. A positive value indicates that more messages were added than removed, a negative value the reverse. |
|
||||
| lastAddTimestamp | timestamp of the last time a message was added to the queue |
|
||||
| updateTimestamp | timestamp of the last message counter update |
|
||||
|
@ -1,69 +0,0 @@
|
||||

|
||||
|
||||
--------------------------------------------
|
||||
Design Decision: CPU certification method
|
||||
============================================
|
||||
|
||||
## Background / Context
|
||||
|
||||
Remote attestation is done in two main steps.
|
||||
1. Certification of the CPU. This boils down to some kind of Intel signature over a key that only a specific enclave has
|
||||
access to.
|
||||
2. Using the certified key to sign business logic specific enclave quotes and providing the full chain of trust to
|
||||
challengers.
|
||||
|
||||
This design question concerns the way we can manage a certification key. A more detailed description is
|
||||
[here](../details/attestation.md).
|
||||
|
||||
## Options Analysis
|
||||
|
||||
### A. Use Intel's recommended protocol
|
||||
|
||||
This involves using ``aesmd`` and the Intel SDK to establish an opaque attestation key that transparently signs quotes.
|
||||
Then for each enclave we need to do several round trips to IAS to get a revocation list (which we don't need) and request
|
||||
a direct Intel signature over the quote (which we shouldn't need as the trust has been established already during EPID
|
||||
join).
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. We have a PoC implemented that does this
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Frequent round trips to Intel infrastructure
|
||||
2. Intel can reproduce the certifying private key
|
||||
3. Involves unnecessary protocol steps and features we don't need (EPID)
|
||||
|
||||
### B. Use Intel's protocol to bootstrap our own certificate
|
||||
|
||||
This involves using Intel's current attestation protocol to have Intel sign over our own certifying enclave's
|
||||
certificate that derives its certification key using the sealing fuse values.
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Certifying key not reproducible by Intel
|
||||
2. Allows for our own CPU enrollment process, should we need one
|
||||
3. Infrequent round trips to Intel infrastructure (only needed once per microcode update)
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Still uses the EPID protocol
|
||||
|
||||
### C. Intercept Intel's recommended protocol
|
||||
|
||||
This involves using Intel's current protocol as is but instead of doing round trips to IAS to get signatures over quotes
|
||||
we try to establish the chain of trust during EPID provisioning and reuse it later.
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Uses Intel's current protocol
|
||||
2. Infrequent round trips to Intel infrastructure
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. The provisioning protocol is underdocumented and it's hard to decipher how to construct the trust chain
|
||||
2. The chain of trust is not a traditional certificate chain but rather a sequence of signed messages
|
||||
|
||||
## Recommendation and justification
|
||||
|
||||
Proceed with Option B. This is the most readily available and flexible option.
|
@ -1,59 +0,0 @@
|
||||

|
||||
|
||||
--------------------------------------------
|
||||
Design Decision: Enclave language of choice
|
||||
============================================
|
||||
|
||||
## Background / Context
|
||||
|
||||
In the long run we would like to use the JVM for all enclave code. This is so that later on we can solve the problem of
|
||||
side channel attacks on the bytecode level (e.g. oblivious RAM) rather than putting this burden on enclave functionality
|
||||
implementors.
|
||||
|
||||
As we plan to use a JVM in the long run anyway and we already have an embedded Avian implementation I think the best
|
||||
course of action is to immediately use this together with the full JDK. To keep the native layer as minimal as possible
|
||||
we should forward enclave calls with little to no marshalling to the embedded JVM. All subsequent sanity checks,
|
||||
including ones currently handled by the edger8r-generated code, should be done inside the JVM. Accessing native enclave
|
||||
functionality (including OCALLs and reading memory from untrusted heap) should be through a centrally defined JNI
|
||||
interface. This way when we switch from Avian we have a very clear interface to code against both from the hosted code's
|
||||
side and from the ECALL/OCALL side.
|
||||
|
||||
The question remains what the thin native layer should be written in. Currently we use C++, but various alternatives
|
||||
popped up, most notably Rust.
|
||||
|
||||
## Options Analysis
|
||||
|
||||
### A. C++
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. The Intel SDK is written in C++
|
||||
2. [Reproducible binaries](https://wiki.debian.org/ReproducibleBuilds)
|
||||
3. The native parts of Avian, HotSpot and SubstrateVM are written in C/C++
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Unsafe memory accesses (unless strict adherence to modern C++)
|
||||
2. Quirky build
|
||||
3. Larger attack surface
|
||||
|
||||
### B. Rust
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Safe memory accesses
|
||||
2. Easier to read/write code, easier to audit
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Does not produce reproducible binaries currently (but it's [planned](https://github.com/rust-lang/rust/issues/34902))
|
||||
2. We would mostly be using it for unsafe things (raw pointers, calling C++ code)
|
||||
|
||||
## Recommendation and justification
|
||||
|
||||
Proceed with Option A (C++) and keep the native layer as small as possible. Rust currently doesn't produce reproducible
|
||||
binary code, and we need the native layer mostly to handle raw pointers and call Intel SDK functions anyway, so we
|
||||
wouldn't really leverage Rust's safe memory features.
|
||||
|
||||
Having said that, once Rust implements reproducible builds we may switch to it; in that case the thinness of the native
|
||||
layer will be a big benefit.
|
@ -1,58 +0,0 @@
|
||||

|
||||
|
||||
--------------------------------------------
|
||||
Design Decision: Key-value store implementation
|
||||
============================================
|
||||
|
||||
This is a simple choice of technology.
|
||||
|
||||
## Options Analysis
|
||||
|
||||
### A. ZooKeeper
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Tried and tested
|
||||
2. HA team already uses ZooKeeper
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Clunky API
|
||||
2. No HTTP API
|
||||
3. Hand-rolled protocol
|
||||
|
||||
### B. etcd
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Very simple API, UNIX philosophy
|
||||
2. gRPC
|
||||
3. Tried and tested
|
||||
4. MVCC
|
||||
5. Kubernetes uses it in the background already
|
||||
6. "Successor" of ZooKeeper
|
||||
7. Cross-platform, OSX and Windows support
|
||||
8. Resiliency, supports backups for disaster recovery
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. HA team uses ZooKeeper
|
||||
|
||||
### C. Consul
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. End to end discovery including UIs
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Not very widespread
|
||||
2. Need to store other metadata as well
|
||||
3. HA team uses ZooKeeper
|
||||
|
||||
## Recommendation and justification
|
||||
|
||||
Proceed with Option B (etcd). It's practically a successor of ZooKeeper, the interface is quite simple, it focuses on
|
||||
primitives (CAS, leases, watches etc) and is tried and tested by many heavily used applications, most notably
|
||||
Kubernetes. In fact we have the option to use etcd indirectly by writing Kubernetes extensions; this would have the
|
||||
advantage of getting readily available CLI and UI tools to manage an enclave cluster.
|
@ -1,81 +0,0 @@
|
||||

|
||||
|
||||
--------------------------------------------
|
||||
Design Decision: Strategic SGX roadmap
|
||||
============================================
|
||||
|
||||
## Background / Context
|
||||
|
||||
The statefulness of the enclave affects the complexity of both the infrastructure and attestation greatly.
|
||||
The infrastructure needs to take care of tracking enclave state for request routing, and we need extra care if we want
|
||||
to make sure that old keys cannot be used to reveal sealed secrets.
|
||||
|
||||
As the first step the easiest thing to do would be to provide an infrastructure for hosting *stateless* enclaves that
|
||||
are only concerned with enclave to non-enclave attestation. This provides a framework to do provable computations,
|
||||
without the headache of handling sealed state and the various implied upgrade paths.
|
||||
|
||||
In the first phase we want to facilitate the rolling out of full enclave images (JAR linked into the image)
|
||||
regardless of what the enclaves are doing internally. The contract of an enclave is the host-enclave API (attestation
|
||||
protocol) and the exposure of the static set of channels the enclave supports. Furthermore the infrastructure will allow
|
||||
deployment in a cloud environment and trivial scalability of enclaves through starting them on-demand.
|
||||
|
||||
The first phase will allow for a "fixed stateless provable computations as a service" product, e.g. provable builds or
|
||||
RNG.
|
||||
|
||||
The question remains on how we should proceed afterwards. In terms of infrastructure we have a choice of implementing
|
||||
sealed state or focusing on dynamic loading of bytecode. We also have the option to delay this decision until the end of
|
||||
the first phase.
|
||||
|
||||
## Options Analysis
|
||||
|
||||
### A. Implement sealed state
|
||||
|
||||
Implementing sealed state involves solving the routing problem; for this we can use the concept of active channel sets.
|
||||
Furthermore we need to solve various additional security issues around guarding sealed secret provisioning, most notably
|
||||
expiration checks. This would involve implementing a future-proof calendar time oracle, which may turn out to be
|
||||
impossible, or not quite good enough. We may decide that we cannot actually provide strong privacy guarantees and need
|
||||
to enforce epochs as mentioned [here](../details/time.md).
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. We would solve long term secret persistence early, allowing for a longer time frame for testing upgrades and
|
||||
reprovisioning before we integrate Corda
|
||||
2. Allows "fixed stateful provable computations as a service" product, e.g. HA encryption
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. There are some unsolved issues (Calendar time, sealing epochs)
|
||||
2. It would delay non-stateful Corda integration
|
||||
|
||||
### B. Implement dynamic code loading
|
||||
|
||||
Implementing dynamic loading involves sandboxing of the bytecode, providing bytecode verification and perhaps
|
||||
storage/caching of JARs (although it may be better to develop a more generic caching layer and use channels themselves
|
||||
to do the upload). Doing bytecode verification is quite involved as Avian does not support verification, so this
|
||||
would mean switching to a different JVM. This JVM would be either HotSpot or SubstrateVM; we are doing some preliminary
|
||||
exploratory work to assess their feasibility. If we choose this path it opens up the first true integration point with
|
||||
Corda by enabling semi-validating notaries - these are non-validating notaries that check an SGX signature over the
|
||||
transaction. It would also enable an entirely separate generic product for verifiable pure computation.
|
||||
|
||||
#### Advantages
|
||||
|
||||
1. Early adoption of Graal if we choose to go with it (the alternative is HotSpot)
|
||||
2. Allows first integration with Corda (semi-validating notaries)
|
||||
3. Allows "generic stateless provable computation as a service" product, i.e. anything expressible as a JAR
|
||||
4. Holding off on sealed state
|
||||
|
||||
#### Disadvantages
|
||||
|
||||
1. Adopting Graal too early may result in a maintenance headache later
|
||||
|
||||
## Recommendation and justification
|
||||
|
||||
Proceed with Option B, dynamic code loading. It would make us very early adopters of Graal (with the implied ups and
|
||||
downs), and most importantly kickstart collaboration between R3 and Oracle. We would also move away from Avian, which we
|
||||
wanted to do anyway. It would also give us more time to think about the issues around sealed state, do exploratory work
|
||||
on potential solutions, and there may be further development from Intel's side. Furthermore we need dynamic loading for
|
||||
any fully fledged Corda integration, so we should finish this ASAP.
|
||||
|
||||
## Appendix: Proposed roadmap breakdown
|
||||
|
||||

|
@ -1,84 +0,0 @@
|
||||
# SGX Infrastructure design
|
||||
|
||||
.. important:: This design document describes a feature of Corda Enterprise.
|
||||
|
||||
This document is intended as a design description of the infrastructure around the hosting of SGX enclaves, interaction
|
||||
with enclaves and storage of encrypted data. It assumes basic knowledge of SGX concepts, and some knowledge of
|
||||
Kubernetes for parts specific to that.
|
||||
|
||||
## High level description
|
||||
|
||||
The main idea behind the infrastructure is to provide a highly available cluster of enclave services (hosts) which can
|
||||
serve enclaves on demand. It provides an interface for enclave business logic that's agnostic with regard to the
|
||||
infrastructure, similar to serverless architectures. The enclaves will use an opaque reference
|
||||
to other enclaves or services in the form of enclave channels. Channels hide attestation details
|
||||
and provide a loose coupling between enclave/non-enclave functionality and specific enclave images/services implementing
|
||||
it. This loose coupling allows easier upgrade of enclaves, relaxed trust (whitelisting), dynamic deployment, and
|
||||
horizontal scaling as we can spin up enclaves dynamically on demand when a channel is requested.
|
||||
|
||||
For more information see:
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
details/serverless.md
|
||||
details/channels.md
|
||||
|
||||
## Infrastructure components
|
||||
|
||||
Here are the major components of the infrastructure. Note that this doesn't include business logic specific
|
||||
infrastructure pieces (like ORAM blob storage for Corda privacy model integration).
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
details/kv-store.md
|
||||
details/discovery.md
|
||||
details/host.md
|
||||
details/enclave-storage.md
|
||||
details/ias-proxy.md
|
||||
|
||||
## Infrastructure interactions
|
||||
|
||||
* **Enclave deployment**:
|
||||
This includes uploading of the enclave image/container to enclave storage and adding of the enclave metadata to the
|
||||
key-value store.
|
||||
|
||||
* **Enclave usage**:
|
||||
This includes using the discovery service to find a specific enclave image and a host to serve it, then connecting to
|
||||
the host, authenticating (attestation) and proceeding with the needed functionality.
|
||||
|
||||
* **Ops**:
|
||||
This includes management of the cluster (Kubernetes/Kubespray) and management of the metadata relating to discovery to
|
||||
control enclave deployment (e.g. canary, incremental, rollback).
|
||||
|
||||
## Decisions to be made
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
decisions/roadmap.md
|
||||
decisions/certification.md
|
||||
decisions/enclave-language.md
|
||||
decisions/kv-store.md
|
||||
|
||||
## Further details
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
details/attestation.md
|
||||
details/time.md
|
||||
details/enclave-deployment.md
|
||||
|
||||
## Example deployment
|
||||
|
||||
This is an example of how two Corda parties may use the above infrastructure. In this example R3 is hosting the IAS
|
||||
proxy and the enclave image store and the parties host the rest of the infrastructure, aside from Intel components.
|
||||
|
||||
Note that this is flexible: the parties may decide to host their own proxies (as long as they whitelist their keys) or
|
||||
the enclave image store (although R3 will need to have a repository of the signed enclaves somewhere).
|
||||
We may also decide to go the other way and have R3 host the enclave hosts and the discovery service, shared between
|
||||
parties (if e.g. they don't have access to/want to maintain SGX capable boxes).
|
||||
|
||||

|
@ -1,92 +0,0 @@
|
||||
### Terminology recap
|
||||
|
||||
**measurement**: The hash of an enclave image, uniquely pinning the code and related configuration
|
||||
**report**: A datastructure produced by an enclave including the measurement and other non-static properties of the
|
||||
running enclave instance (like the security version number of the hardware)
|
||||
**quote**: A signed report of an enclave produced by Intel's quoting enclave.
|
||||
|
||||
# Attestation
|
||||
|
||||
The goal of attestation is to authenticate enclaves. We are concerned with two variants of this, enclave to non-enclave
|
||||
attestation and enclave to enclave attestation.
|
||||
|
||||
In order to authenticate an enclave we need to establish a chain of trust rooted in an Intel signature certifying that a
|
||||
report is coming from an enclave running on genuine Intel hardware.
|
||||
|
||||
Intel's recommended attestation protocol is split into two phases.
|
||||
|
||||
1. Provisioning
|
||||
The first phase's goal is to establish an Attestation Key (AK), a.k.a. EPID key, unique to the SGX installation.
|
||||
The establishment of this key uses an underdocumented protocol similar to the attestation protocol:
|
||||
- Intel provides a Provisioning Certification Enclave (PCE). This enclave has special privileges in that it can derive a
|
||||
key in a deterministic fashion based on the *provisioning* fuse values. Intel stores these values in their databases
|
||||
and can do the same derivation to later check a signature from PCE.
|
||||
- Intel provides a separate enclave called the Provisioning Enclave (PvE), also privileged, which interfaces with the PCE
|
||||
(using local attestation) to certify the PvE's report and talks with a special Intel endpoint to join an EPID group
|
||||
anonymously. During the join Intel verifies the PCE's signature. Once the join has happened, the PvE creates a related
|
||||
private key(the AK) that cannot be linked by Intel to a specific CPU. The PvE seals this key (also sometimes referred
|
||||
to as the "EPID blob") to MRSIGNER, which means it can only be unsealed by Intel enclaves.
|
||||
|
||||
2. Attestation
|
||||
- When a user wants to do attestation of their own enclave they need to do so through the Quoting Enclave (QE), also
|
||||
signed by Intel. This enclave can unseal the EPID blob and use the key to sign over user-provided reports.
|
||||
- The signed quote in turn is sent to the Intel Attestation Service, which can check whether the quote was signed by a
|
||||
key in the EPID group. Intel also checks whether the QE was provided with an up-to-date revocation list.
|
||||
|
||||
The end result is a signature of Intel over a signature of the AK over the user enclave quote. Challengers can then
|
||||
simply check this chain to make sure that the user-provided data in the quote (probably another key) comes from a
|
||||
genuine enclave.
|
||||
|
||||
All enclaves involved (PCE, PvE, QE) are owned by Intel, so this setup basically forces us to use Intel's infrastructure
|
||||
during attestation (which in turn forces us to do e.g. mutual TLS, maintain our own proxies, etc.). There are two ways we
|
||||
can get around this.
|
||||
|
||||
1. Hook the provisioning phase. During the last step of provisioning the PvE constructs a chain of trust rooted in
|
||||
Intel. If we can extract some provable chain that allows proving of membership based on an EPID signature then we can
|
||||
essentially replicate what IAS does.
|
||||
2. Bootstrap our own certification. This would involve deriving another certification key based on sealing fuse values
|
||||
and getting an Intel signature over it using the original IAS protocol. This signature would then serve the same
|
||||
purpose as the certificate in 1.
|
||||
|
||||
## Non-enclave to enclave channels
|
||||
|
||||
When a non-enclave connects to a "leaf" enclave the goal is to establish a secure channel between the non-enclave and
|
||||
the enclave by authenticating the enclave and possibly authenticating the non-enclave. In addition we want to provide
|
||||
secrecy of the non-enclave's identity. To this end we can use SIGMA-I to do a Diffie-Hellman key exchange between the non-enclave
|
||||
identity and the enclave identity.
|
||||
|
||||
The enclave proves the authenticity of its identity by providing a certificate chain rooted in Intel. If we do our own
|
||||
enclave certification then the chain goes like this:
|
||||
|
||||
* Intel signs quote of certifying enclave containing the certifying key pair's public part.
|
||||
* Certifying key signs report of leaf enclave containing the enclave's temporary identity.
|
||||
* Enclave identity signs the relevant bits in the SIGMA protocol.
|
||||
|
||||
Intel's signature may be cached on disk, and the certifying enclave signature over the temporary identity may be cached
|
||||
in enclave memory.
|
||||
|
||||
We can provide various invalidations, e.g. the non-enclave won't accept a signature if X time has passed since Intel's
|
||||
signature, or if R3's whitelisting cert has expired, etc.
|
||||
|
||||
If the enclave needs to authorise the non-enclave the situation is a bit more complicated. Let's say the enclave holds
|
||||
some secret that it should only reveal to authorised non-enclaves. Authorisation is expressed as a whitelisting
|
||||
signature over the non-enclave identity. How do we check the expiration of the whitelisting key's certificate?
|
||||
|
||||
Calendar time inside enclaves deserves its own [document](time.md); the gist is that we simply don't have access to time
|
||||
unless we trust a calendar time oracle.
|
||||
|
||||
Note however that we probably won't need in-enclave authorisation for *stateless* enclaves, as these have no secrets to
|
||||
reveal at all. Authorisation would simply serve as access control, and we can solve access control in the hosting
|
||||
infrastructure instead.
|
||||
|
||||
## Enclave to enclave channels
|
||||
|
||||
Doing remote attestation between enclaves is similar to enclave to non-enclave, only this time authentication involves
|
||||
verifying the chain of trust on both sides. However note that this is also predicated on having access to a calendar
|
||||
time oracle, as this time the expiration checks of the chain must be done inside enclaves. So in a sense both enclave to enclave
|
||||
and stateful enclave to non-enclave attestation force us to trust a calendar time oracle.
|
||||
|
||||
But note that remote enclave to enclave attestation is mostly required when there *is* sealed state (secrets to share
|
||||
with the other enclave). One other use case is the reduction of audit surface, once it comes to that. We may be able to
|
||||
split stateless enclaves into components that have different upgrade lifecycles. By doing so we ease the auditors' job
|
||||
by reducing the enclaves' contracts and code size.
|
@ -1,75 +0,0 @@
|
||||
# Enclave channels
|
||||
|
||||
AWS Lambdas may be invoked by name, and are simple request-response type RPCs. The lambda's name abstracts the
|
||||
specific JAR or code image that implements the functionality, which allows upgrading of a lambda without disrupting
|
||||
the rest of the lambdas.
|
||||
|
||||
Any authentication required for the invocation is done by a different AWS service (IAM), and is assumed to be taken
|
||||
care of by the time the lambda code is called.
|
||||
|
||||
Serverless enclaves also require ways to be addressed; let's call these "enclave channels". Each such channel may be
|
||||
identified with a string similar to Lambdas, however unlike lambdas we need to incorporate authentication into the
|
||||
concept of a channel in the form of attestation.
|
||||
|
||||
Furthermore unlike Lambdas we can implement a generic two-way communication channel. This reintroduces state into the
|
||||
enclave logic. However note that this state is in-memory only, and because of the transient nature of enclaves (they
|
||||
may be "lost" at any point) enclave authors are in general incentivised to either keep in-memory state minimal (by
|
||||
sealing state) or make their functionality idempotent (allowing retries).
|
||||
|
||||
We should be able to determine an enclave's supported channels statically. Enclaves may store this data for example in a
|
||||
specific ELF section or a separate file. The latter may be preferable as it may be hard to have a central definition of
|
||||
channels in an ELF section if we use JVM bytecode. Instead we could have a specific static JVM datastructure that can be
|
||||
extracted from the enclave statically during the build.
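As a purely hypothetical illustration of such a static datastructure (the object and channel names are invented here), build tooling could look for something like the following in the enclave JAR:

```
// Hypothetical static declaration of an enclave's channels; build tooling could extract
// this object from the enclave JAR. Channel names here are invented for illustration.
object EnclaveChannels {
    const val PROVISION_BOOTSTRAP = "keys/provision/bootstrap"
    const val PROVISION_FROM_SIBLING = "keys/provision/from-sibling"
    const val SIGN = "keys/sign"

    // Everything the enclave could ever serve.
    val supported: Set<String> = setOf(PROVISION_BOOTSTRAP, PROVISION_FROM_SIBLING, SIGN)

    // What a fresh enclave (no sealed state yet) starts out serving; see "Sealed state" below.
    val initiallyActive: Set<String> = setOf(PROVISION_BOOTSTRAP, PROVISION_FROM_SIBLING)
}
```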
|
||||
|
||||
## Sealed state
|
||||
|
||||
Sealing keys tied to specific CPUs seem to throw a wrench in the requirement of statelessness. Routing a request to an
|
||||
enclave that has associated sealed state cannot be the same as routing to one which doesn't. How can we transparently
|
||||
scale enclaves like Lambdas if fresh enclaves by definition don't have associated sealed state?
|
||||
|
||||
Take key provisioning as an example: we want some key to be accessible by a number of enclaves; how do we
|
||||
differentiate between enclaves that have the key provisioned versus ones that don't? We need to somehow expose an
|
||||
opaque version of the enclave's sealed state to the hosting infrastructure for this.
|
||||
|
||||
The way we could do this is by expressing this state in terms of a changing set of "active" enclave channels. The
|
||||
enclave can statically declare the channels it potentially supports, and start with some initial subset of them as
|
||||
active. As the enclave's lifecycle (sealed state) evolves it may change this active set to something different,
|
||||
thereby informing the hosting infrastructure that it shouldn't route certain requests there, or that it can route some
|
||||
other ones.
|
||||
|
||||
Take the above key provisioning example. An enclave can be in two states, unprovisioned or provisioned. When it's
|
||||
unprovisioned its set of active channels will be related to provisioning (for example, request to bootstrap key or
|
||||
request from sibling enclave); when it's provisioned its active set will be related to the usage of the key and
|
||||
provisioning of the key itself to unprovisioned enclaves.
|
||||
|
||||
The enclave's initial set of active channels defines how enclaves may be scaled horizontally, as these are the
|
||||
channels that will be active for the freshly started enclaves without sealed state.
|
||||
|
||||
"Hold on" you might say, "this means we didn't solve the scalability of stateful enclaves!".
|
||||
|
||||
This is partly true. However in the above case we can force certain channels to be part of the initial active set! In
|
||||
particular the channels that actually use the key (e.g. for signing) may be made "stateless" by lazily requesting
|
||||
provisioning of the key from sibling enclaves. Enclaves may be spun up on demand, and as long as there is at least one
|
||||
sibling enclave holding the key it will be provisioned as needed. This hints at a general pattern of hiding stateful
|
||||
functionality behind stateless channels, if we want them to scale automatically.
|
||||
|
||||
Note that this doesn't mean we can't have external control over the provisioning of the key. For example we probably
|
||||
want to enforce redundancy across N CPUs. This requires looping in the hosting infrastructure; we cannot
|
||||
enforce this invariant purely in enclave code.
|
||||
|
||||
As we can see, the set of active enclave channels is inherently tied to the sealed state of the enclave; therefore we
|
||||
should make updating both of them an atomic operation.
|
||||
|
||||
### Side note
|
||||
|
||||
Another way to think about enclaves using sealed state is like an actor model. The sealed state is the actor's state,
|
||||
and state transitions may be executed by any enclave instance running on the same CPU. By transitioning the actor state
|
||||
one can also transition the type of messages the actor can receive atomically (= active channel set).
|
||||
|
||||
## Potential gRPC integration
|
||||
|
||||
It may be desirable to expose a built-in serialisation and network protocol. This would tie us to a specific protocol,
|
||||
but in turn it would ease development.
|
||||
|
||||
An obvious candidate for this is gRPC as it supports streaming and a specific serialization protocol. We need to
|
||||
investigate how we can integrate it so that channels are basically responsible for tunneling gRPC packets.
|
@ -1,88 +0,0 @@
|
||||
# Discovery
|
||||
|
||||
In order to understand enclave discovery and routing we first need to understand the mappings between CPUs, VMs and
|
||||
enclave hosts.
|
||||
|
||||
The cloud provider manages a number of physical machines (CPUs), each of those machines hosts a hypervisor which in
|
||||
turn hosts a number of guest VMs. Each VM in turn may host a number of enclave host containers (together with required
|
||||
supporting software like aesmd) and the sgx device driver. Each enclave host in turn may host several enclave instances.
|
||||
For the sake of simplicity let's assume that an enclave host may only host a single enclave instance per measurement.
|
||||
|
||||
We can figure out the identity of the CPU the VM is running on by using a dedicated enclave to derive a unique ID
|
||||
specific to the CPU. For this we can use EGETKEY with pre-defined inputs to derive a seal key sealed to MRENCLAVE. This
|
||||
provides a 128-bit value reproducible only on the same CPU in this manner. Note that this is completely safe, as the
|
||||
value won't be used for encryption and is specific to the measurement doing this. With this ID we can reason about
|
||||
physical locality of enclaves without looping in the cloud provider.
|
||||
Note: we should set OWNEREPOCH to a static value before doing this.
|
||||
|
||||
We don't need an explicit handle on the VM's identity, the mapping from VM to container will be handled by the
|
||||
orchestration engine (Kubernetes).
|
||||
|
||||
Similarly to VM identity, the specific host container's identity (IP address/DNS A record) is also tracked by Kubernetes;
|
||||
however, we do need access to this identity in order to implement discovery.
|
||||
|
||||
When an enclave instance seals a secret that piece of data is tied to the measurement+CPU combo. The secret can only be
|
||||
revealed to an enclave with the same measurement running on the same CPU. However the management of this secret is
|
||||
tied to the enclave host container, which we may have several of running on the same CPU, possibly all of them hosting
|
||||
enclaves with the same measurement.
|
||||
|
||||
To solve this we can introduce a *sealing identity*. This is basically a generated ID/namespace for a collection of
|
||||
secrets belonging to a specific CPU. It is generated when a fresh enclave host starts up and subsequently the host will
|
||||
store sealed secrets under this ID. These secrets should survive host death, so they will be persisted in etcd (together
|
||||
with the associated active channel sets). Every host owns a single sealing identity, but not every sealing identity may
|
||||
have an associated host (e.g. in case the host died).
|
||||
|
||||
## Mapping to Kubernetes
|
||||
|
||||
The following mapping of the above concepts to Kubernetes concepts is not yet fleshed out and requires further
|
||||
investigation into Kubernetes capabilities.
|
||||
|
||||
VMs correspond to Nodes, and enclave hosts correspond to Pods. The host's identity is the same as the Pod's, which is
|
||||
the Pod's IP address/DNS A record. From Kubernetes's point of view enclave hosts provide a uniform stateless Headless
|
||||
Service. This means we can use their scaling/autoscaling features to provide redundancy across hosts (to balance load).
|
||||
|
||||
However we'll probably need to tweak their (federated?) ReplicaSet concept in order to provide redundancy across CPUs
|
||||
(to be tolerant of CPU failures), or perhaps use their anti-affinity feature somehow, to be explored.
|
||||
|
||||
The concept of a sealing identity is very close to the stable identity of Pods in Kubernetes StatefulSets. However I
|
||||
couldn't find a way to use this directly as we need to tie the sealing identity to the CPU identity, which in Kubernetes
|
||||
would translate to a requirement to pin stateful Pods to Nodes based on a dynamically determined identity. We could
|
||||
however write an extension to handle this metadata.
|
||||
|
||||
## Registration
|
||||
|
||||
When an enclave host is started it first needs to establish its sealing identity. To this end it needs to check
|
||||
whether there are any sealing identities available for the CPU it's running on. If not, it can generate a fresh one and
|
||||
lease it for a period of time (and update the lease periodically) and atomically register its IP address in the process.
|
||||
If an existing identity is available the host can take over it by leasing it. There may be existing Kubernetes
|
||||
functionality to handle some of this.
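A minimal sketch of this registration step is shown below. The `KvStore` interface and all names are hypothetical; a real implementation would sit on top of etcd leases (see the key-value store design decision) or equivalent Kubernetes machinery.

```
import java.time.Duration
import java.util.UUID

// Hypothetical key-value store client; a real implementation would sit on top of etcd leases.
interface KvStore {
    fun sealingIdentitiesFor(cpuId: String): List<String>
    fun tryLease(sealingId: String, hostAddress: String, ttl: Duration): Boolean
    fun renewLease(sealingId: String, ttl: Duration)
}

// Sketch of the registration step described above: take over an existing sealing identity
// for this CPU if one can be leased, otherwise generate and lease a fresh one.
fun registerHost(kv: KvStore, cpuId: String, hostAddress: String): String {
    val ttl = Duration.ofSeconds(30)
    val sealingId = kv.sealingIdentitiesFor(cpuId).firstOrNull { kv.tryLease(it, hostAddress, ttl) }
        ?: UUID.randomUUID().toString().also { kv.tryLease(it, hostAddress, ttl) }
    // A background task would then call kv.renewLease(sealingId, ttl) periodically.
    return sealingId
}
```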
|
||||
|
||||
Non-enclave services (like blob storage) could register similarly, but in this case we can take advantage of Kubernetes'
|
||||
existing discovery infrastructure to abstract a service behind a Service cluster IP. We do need to provide the metadata
|
||||
about supported channels though.
|
||||
|
||||
## Resolution
|
||||
|
||||
The enclave/service discovery problem boils down to:
|
||||
"Given a channel, my trust model and my identity, give me an enclave/service that serves this channel, trusts me, and I
|
||||
trust them".
|
||||
|
||||
This may be done in the following steps:
|
||||
|
||||
1. Resolve the channel to a set of measurements supporting it
|
||||
2. Filter the measurements to trusted ones and ones that trust us
|
||||
3. Pick one of the measurements randomly
|
||||
4. Find an alive host that has the channel in its active set for the measurement
|
||||
|
||||
1 may be done by maintaining a channel -> measurements map in etcd. This mapping would effectively define the enclave
|
||||
deployment and would be the central place to control incremental roll-out or rollbacks.
|
||||
|
||||
2 requires storing of additional metadata per advertised channel, namely a datastructure describing the enclave's trust
|
||||
predicate. A similar datastructure is provided by the discovering entity - these two predicates can then be used to
|
||||
filter measurements based on trust.
|
||||
|
||||
3 is where we may want to introduce more control if we want to support incremental roll-out/canary deployments.
|
||||
|
||||
4 is where various (non-MVP) optimisation considerations come to mind. We could add a load balancer, do autoscaling based
|
||||
on load (although Kubernetes already provides support for this), prefer looping back to the same
|
||||
host to allow local attestation, or prefer hosts that have the enclave image cached locally or warmed up.
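The sketch below restates steps 1-4 as code against hypothetical in-memory data structures; in reality the channel and host metadata would be read from the key-value store, and the trust predicates would come from the stored metadata described above.

```
// Illustrative data model for the four resolution steps above.
data class Measurement(val hash: String, val trustsCaller: (String) -> Boolean)
data class EnclaveHost(val address: String, val activeChannels: Set<String>)

class Discovery(
    private val channelToMeasurements: Map<String, List<Measurement>>,  // step 1
    private val measurementToHosts: Map<String, List<EnclaveHost>>      // step 4
) {
    fun resolve(channel: String, callerIdentity: String, callerTrusts: (Measurement) -> Boolean): EnclaveHost? {
        val mutuallyTrusted = channelToMeasurements[channel].orEmpty()
            .filter { callerTrusts(it) && it.trustsCaller(callerIdentity) }  // step 2: mutual trust
        val chosen = mutuallyTrusted.shuffled().firstOrNull() ?: return null // step 3: random pick
        return measurementToHosts[chosen.hash].orEmpty()
            .firstOrNull { channel in it.activeChannels }                    // step 4: host serving the channel
    }
}
```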
|
@ -1,16 +0,0 @@

# Enclave deployment

What happens if we roll out a new enclave image?

In production we need to sign the image directly with the R3 key as MRSIGNER (process to be designed), as well as create
any whitelisting signatures needed (e.g. from auditors) in order to allow existing enclaves to trust the new one.

We need to make the enclave build sources available to users - we can package this up as a single container pinning all
build dependencies and source code. Docker-style image layering/caching will come in handy here.

Once the image, build containers and related signatures are created we need to push this to the main R3 enclave storage.

Enclave infrastructure owners (e.g. Corda nodes) may then start using the images depending on their upgrade policy. This
involves updating their key-value store so that new channel discovery requests resolve to the new measurement, which in
turn will trigger the image download on demand on enclave hosts. We can potentially add pre-caching here to reduce
latency for first-time enclave users.
@ -1,7 +0,0 @@

# Enclave storage

The enclave storage is a simple static content server. It should allow uploading and serving of enclave images based
on their measurement. We may also want to store metadata about the enclave build itself (e.g. github link/commit hash).

We may need to extend its responsibilities to serve other SGX-related static content such as whitelisting signatures
over measurements.
@ -1,11 +0,0 @@

# Enclave host

An enclave host's responsibility is the orchestration of the communication with hosted enclaves.

It is responsible for (a rough interface sketch follows the list):
* Leasing a sealing identity
* Getting a CPU certificate in the form of an Intel-signed quote
* Downloading and starting of requested enclaves
* Driving attestation and subsequent encrypted traffic
* Using discovery to connect to other enclaves/services
* Various caching layers (and invalidation thereof) for the CPU certificate, hosted enclave quotes and enclave images
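
Purely as an illustration of those responsibilities, a hypothetical Kotlin interface might look like the sketch below;
none of these types exist in the codebase and the shapes are assumptions.

```kotlin
import java.net.InetSocketAddress

// Hypothetical data carriers; real equivalents would be richer.
data class SealingIdentity(val id: String, val cpuId: String)
data class Quote(val bytes: ByteArray)
data class Measurement(val hash: String)

interface EnclaveHost {
    // Lease (or create) a sealing identity for the CPU we are running on.
    fun leaseSealingIdentity(): SealingIdentity

    // Obtain the CPU certificate in the form of an Intel-signed quote (cached and refreshed periodically).
    fun cpuQuote(): Quote

    // Download (if needed) and start an enclave image identified by its measurement.
    fun startEnclave(measurement: Measurement)

    // Drive attestation with a peer and return an encrypted session for subsequent traffic.
    fun attest(peer: InetSocketAddress): EncryptedSession

    // Use discovery to find another enclave/service serving the given channel.
    fun discover(channel: String): InetSocketAddress?
}

interface EncryptedSession {
    fun send(payload: ByteArray)
    fun receive(): ByteArray
}
```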
@ -1,10 +0,0 @@

# IAS proxy

The Intel Attestation Service proxy's responsibility is simply to forward requests to and from the IAS.

The reason we need this proxy is that Intel requires us to do mutual TLS with them for each attestation round trip.
For this we need an R3-maintained private key, and as we want third parties to be able to do attestation we need to
store this private key in these proxies.

Alternatively we may decide to circumvent this mutual TLS requirement completely by distributing the private key with
the host containers.
@ -1,13 +0,0 @@

# Key-value store

To solve enclave to enclave and enclave to non-enclave communication we need a way to route requests correctly. There
are readily available discovery solutions out there, however we have some special requirements because of the inherent
statefulness of enclaves (route to an enclave with the correct state) and the dynamic nature of trust between them (route
to an enclave I can trust and that trusts me). To store metadata about discovery we need some kind of distributed
key-value store.

The key-value store needs to store information about the following entities (sketched in code after the list):
* Enclave image: measurement and supported channels
* Sealing identity: the sealing ID, the corresponding CPU ID and the host leasing it (if any)
* Sealed secret: the sealing ID, the sealing measurement, the sealed secret and the corresponding active channel set
* Enclave deployment: mapping from channel to set of measurements
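
To make the shape of these records concrete, here is a minimal Kotlin sketch of what the stored values might look like;
the names and fields are assumptions for illustration, not an existing schema.

```kotlin
// Hypothetical record types for the discovery key-value store.
data class EnclaveImageRecord(
    val measurement: String,              // MRENCLAVE hash of the image
    val supportedChannels: Set<String>
)

data class SealingIdentityRecord(
    val sealingId: String,
    val cpuId: String,
    val leasedByHost: String?,            // host address holding the lease, if any
    val leaseExpiresAtMillis: Long?
)

data class SealedSecretRecord(
    val sealingId: String,
    val sealingMeasurement: String,       // measurement of the enclave that sealed the secret
    val sealedSecret: ByteArray,
    val activeChannels: Set<String>
)

data class EnclaveDeploymentRecord(
    val channel: String,
    val measurements: Set<String>         // measurements currently serving this channel
)
```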
@ -1,33 +0,0 @@

# Serverless architectures

In 2014 Amazon launched AWS Lambda, which they coined a "serverless architecture". It essentially creates an abstraction
layer which hides the infrastructure details. Users provide "lambdas", which are stateless functions that may invoke
other lambdas, access other AWS services etc. Because Lambdas are inherently stateless (any state they need must be
accessed through a service) they may be loaded and executed on demand. This is in contrast with microservices, which
are inherently stateful. Internally AWS caches the lambda images and even caches JIT compiled/warmed up code in order
to reduce latency. Furthermore the lambda invocation interface provides a convenient way to scale these lambdas: as the
functions are stateless AWS can spin up new VMs to push lambda functions to. The user simply pays for CPU usage; all
the infrastructure pain is hidden by Amazon.

Google and Microsoft followed suit a couple of years later with Cloud Functions and Azure Functions.

This way of splitting hosting computation from a hosted restricted computation is not a new idea; examples are web
frameworks (web server vs application), MapReduce (Hadoop vs mappers/reducers), or even the cloud (hypervisors vs VMs)
and the operating system (kernel vs userspace). The common pattern is: the hosting layer hides some kind of complexity,
imposes some restriction on the guest layer (and provides a simpler interface in turn), and transparently multiplexes
a number of resources for them.

The relevant key features of serverless architectures are 1. on-demand scaling and 2. business logic independent of
hosting logic.

# Serverless SGX?

How are Amazon Lambdas relevant to SGX? Enclaves exhibit very similar features to Lambdas: they are pieces of business
logic completely independent of the hosting functionality. Not only that, enclaves treat hosts as adversaries! This
provides a very clean separation of concerns which we can exploit.

If we could provide a similar infrastructure for enclaves as Amazon provides for Lambdas it would not only allow easy
HA and scaling, it would also decouple the burden of maintaining the infrastructure from the enclave business logic.
Furthermore our plan of using the JVM within enclaves also aligns with the optimizations Amazon implemented (e.g.
keeping warmed-up enclaves around). Optimizations like upgrading to local attestation also become orthogonal to
enclave business logic. Enclave code can focus on the specific functionality at hand; everything else is taken care of.
@ -1,69 +0,0 @@

# Time in enclaves

In general we know that any one crypto algorithm will be broken in X years' time. The usual way to mitigate this is by
using certificate expiration. If a peer with an expired certificate tries to connect we reject it in order to enforce
freshness of their key.

In order to check certificate expiration we need some notion of calendar time. However in SGX's threat model the host
of the enclave is considered malicious, so we cannot rely on their notion of time. Intel provides trusted time through
their PSW, however this uses the Management Engine, which is known to be a proprietary, vulnerable piece of architecture.

Therefore in order to check calendar time in general we need some kind of time oracle. We can burn the oracle's
identity into the enclave and request timestamped signatures from it. This already raises questions with regards to the
oracle's identity itself, however for the time being let's assume we have something like this in place.

### Timestamped nonces

The most straightforward way to implement calendar time checks is to generate a nonce *after* the DH exchange, send it to
the oracle and have it sign over it with a timestamp. The nonce is required to avoid replay attacks. A malicious host
may delay the delivery of the signature indefinitely, even until after the certificate expires. However note that the
DH happened before the nonce was generated, which means even if an attacker can crack the expired key they would not be
able to steal the DH session, only try creating new ones, which will fail at the timestamp check.

This seems workable, however note that it would impose a full round trip to the oracle *per DH exchange*.
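
A minimal Kotlin sketch of the check described above, assuming the oracle signs `nonce || timestamp` with an ECDSA key;
the wire format and class names are illustrative assumptions, not a specified protocol.

```kotlin
import java.nio.ByteBuffer
import java.security.PublicKey
import java.security.SecureRandom
import java.security.Signature
import java.time.Instant

// The oracle's public key is assumed to be burnt into the enclave image.
class TimeOracleVerifier(private val oraclePublicKey: PublicKey, private val certificateExpiry: Instant) {

    // Generated *after* the DH exchange and sent to the oracle.
    fun freshNonce(): ByteArray = ByteArray(32).also { SecureRandom().nextBytes(it) }

    // The oracle is assumed to reply with (timestampMillis, signature over nonce || timestampMillis).
    fun verifyTimestamp(nonce: ByteArray, timestampMillis: Long, signature: ByteArray): Boolean {
        val signedPayload = ByteBuffer.allocate(nonce.size + Long.SIZE_BYTES)
            .put(nonce)
            .putLong(timestampMillis)
            .array()
        val verifier = Signature.getInstance("SHA256withECDSA").apply {
            initVerify(oraclePublicKey)
            update(signedPayload)
        }
        // The session is only accepted if the signature checks out and the timestamp predates certificate expiry.
        return verifier.verify(signature) && Instant.ofEpochMilli(timestampMillis).isBefore(certificateExpiry)
    }
}
```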

### Timestamp-encrypted channels

In order to reduce the round trips required for timestamp checking we can invert the responsibility for checking the
timestamp. We can do this by encrypting the channel traffic with an additional key generated by the enclave but that can
only be revealed by the time oracle. The enclave encrypts the encryption key with the oracle's public key, so the peer
trying to communicate with the enclave must forward the encrypted key to the oracle. The oracle in turn will check the
timestamp and reveal the contents (perhaps double encrypted with a DH-derived key). The peer can cache the key and later
use the same encryption key with the enclave. It is then the peer's responsibility to get rid of the key after a while.

Note that this mitigates attacks where the attacker is a third party trying to exploit an expired key, but this method
does *not* mitigate against malicious peers that keep the encryption key around until after expiration (i.e. they
"become" malicious).
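
To illustrate the "key that only the oracle can reveal" idea, here is a small Kotlin sketch using standard JCE
primitives (AES for the channel key, RSA-OAEP to wrap it for the oracle); the construction is an assumption for
illustration only, not a specified protocol.

```kotlin
import java.security.KeyPairGenerator
import javax.crypto.Cipher
import javax.crypto.KeyGenerator
import javax.crypto.SecretKey

object TimestampEncryptedChannelSketch {
    @JvmStatic
    fun main(args: Array<String>) {
        // Stand-in for the time oracle's long-term key pair (its public half would be burnt into the enclave).
        val oracleKeys = KeyPairGenerator.getInstance("RSA").apply { initialize(2048) }.generateKeyPair()

        // Enclave side: generate the extra channel key and wrap it so only the oracle can unwrap it.
        val channelKey: SecretKey = KeyGenerator.getInstance("AES").apply { init(256) }.generateKey()
        val wrapper = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding")
        wrapper.init(Cipher.WRAP_MODE, oracleKeys.public)
        val wrappedKey = wrapper.wrap(channelKey)   // this blob is what the peer forwards to the oracle

        // Oracle side: check the enclave certificate's expiry against its own clock, then unwrap and return the key.
        val unwrapper = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding")
        unwrapper.init(Cipher.UNWRAP_MODE, oracleKeys.private)
        val revealedKey = unwrapper.unwrap(wrappedKey, "AES", Cipher.SECRET_KEY) as SecretKey

        check(revealedKey.encoded.contentEquals(channelKey.encoded)) { "round trip failed" }
    }
}
```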

### Oracle key break

So given an oracle we can secure a channel against expired keys and potentially improve performance by trusting
once-authorized enclave peers not to become malicious.

However what happens if the oracle key itself is broken? There's a chicken-and-egg problem where we can't check the
expiration of the time oracle's certificate itself! Once the oracle's key is broken an attacker can fake timestamping
replies (or decrypt the timestamp encryption key), which in turn allows it to bypass the expiration check.

The main issue with this is in relation to sealed secrets, and sealed secret provisioning between enclaves. If an
attacker can fake being e.g. an authorized enclave then it can extract old secrets. We have yet to come up with a
solution to this, and I don't think it's possible.

Instead, knowing that current crypto algorithms are bound to be broken at *some* point in the future, rather than trying
to make sealing future-proof we can become explicit about the time-boundedness of security guarantees.

### Sealing epochs

Let's call the time period in which a certain set of algorithms is considered safe a *sealing epoch*. During this
period sealed data at rest is considered to be secure. However once the epoch finishes old sealed data is considered to
be potentially compromised. We can then think of sealed data as an append-only log of secrets with overlapping epoch
intervals where the "breaking" of old epochs is constantly catching up with new ones.

In order to make sure that this works we need to enforce an invariant where secrets only flow from old epochs to newer
ones, never the other way around.

This translates to the ledger nicely: data in old epochs is generally not valuable anymore, so it's safe to consider
it compromised. Note however that in the privacy model an epoch transition requires a full re-provisioning of the
ledger to the new set of algorithms/enclaves.

In any case this is an involved problem, and I think we should defer fleshing it out for now as we won't need it
for the first round of stateless enclaves.
@ -1,317 +0,0 @@

# SGX Integration

This document is intended as a design description of how we can go about integrating SGX with Corda. As the
infrastructure design of SGX is quite involved (detailed elsewhere) but otherwise flexible we can discuss the possible
integration points separately, without delving into lower level technical detail.

For the purposes of this document we can think of SGX as a way to provision secrets to a remote node with the
knowledge that only trusted code (= enclave) will operate on it. Furthermore it provides a way to durably encrypt data
in a scalable way while also ensuring that the encryption key is never leaked (unless the encrypting enclave is
compromised).

Broadly speaking there are two dimensions to deciding how we can integrate SGX: *what* we store in the ledger and
*where* we store it.

The first dimension is the what: this relates to what we have so far called the "integrity model" vs the "privacy model".

In the **integrity model** we rely on SGX to ensure the integrity of the ledger. Using this assumption we can cut off
the transaction body and only store an SGX-backed signature over filtered transactions. Namely we would only store
information required for notarisation of the current and subsequent spending transactions. This seems neat at first
sight, however note that if we do this naively then if an attacker can impersonate an enclave they'll gain write
access to the ledger, as the fake enclave can sign transactions as valid without having run verification.

In the **privacy model** we store the full transaction backchain (encrypted) and we keep provisioning it between nodes
on demand, just like in the current Corda implementation. This means we only rely on SGX for the privacy aspects - if
an enclave is compromised we only lose privacy, the verification cannot be eluded by providing a fake signature.

The other dimension is the where: currently in non-SGX Corda the full transaction backchain is provisioned between non-
notary nodes, and is also provisioned to notaries in the case they are validating ones. With SGX+BFT notaries we have
the possibility to offload the storage of the encrypted ledger (or encrypted signatures thereof) to notary nodes (or
dedicated oracles) and only store bookkeeping information required for further ledger updates in non-notary nodes. The
storage policy is very important: customers want control over the persistence of even encrypted data, and with the
introduction of recent regulation (GDPR) unrestricted provisioning of sensitive data will be illegal by law, even when
encrypted.

We'll explore the different combinations of choices below. Note that we don't need to commit to any one of them, we may
decide to implement several.

## Privacy model + non-notary provisioning

Let's start with the model that's closest to the current Corda implementation as this is an easy segue into the
possibilities with SGX. We also have a simple example and a corresponding neat diagram (thank you Kostas!!) we showed
to a member bank Itau to indicate in a semi-handwavy way what the integration will look like.

We have a CorDapp X used by nodes A and B. The CorDapp contains a flow XFlow and a (deterministic) contract XContract.
The two nodes are negotiating a transaction T2. T2 consumes a state that comes from transaction T1.

Let's assume that both A and B are happy with T2, except Node A hasn't established the validity of it yet. Our goal is
to prove the validity of T2 to A without revealing the details of T1.

The following diagram shows an overview of how this can be achieved. Note that the diagram is highly oversimplified
and is meant to communicate the high-level data flow relevant to Corda.



* In order to validate T2, A asks its enclave whether T2 is valid.
* The enclave sees that T2 depends on T1, so it consults its sealed ledger as to whether it contains T1.
* If it does then this means T1 has been verified already, so the enclave moves on to the verification of T2.
* If the ledger doesn't contain T1 then the enclave needs to retrieve it from node B.
* In order to do this A's enclave needs to prove to B's enclave that it is indeed a trusted enclave B can provision T1
  to. This proof is what the attestation process provides.
* Attestation is done in the clear: (TODO attestation diagram)
  * A's enclave generates a keypair, the public part of which is sent to Node B in a datastructure signed by Intel;
    this is called the quote(1).
  * Node B's XFlow may do various checks on this datastructure that cannot be performed by B's enclave, for example
    checking of the timeliness of Intel's signature(2).
  * Node B's XFlow then forwards the quote to B's enclave, which will check Intel's signature and whether it trusts
    A's enclave. For the sake of simplicity we can assume this to be a strict check that A is running the exact same
    enclave B is.
  * At this point B's enclave has established trust in A's enclave, and has the public part of the key generated by
    A's enclave.
  * The nodes repeat the above process the other way around so that A's enclave establishes trust in B's and gets hold
    of B's public key(3).
  * Now they proceed to perform an ephemeral Diffie-Hellman key exchange using the keys in the quotes(4).
  * The ephemeral key is then used to encrypt further communication. Beyond this point the nodes' flows (and anything
    outside of the enclaves) have no way of seeing what data is being exchanged; all the nodes can do is forward the
    encrypted messages.
* Once attestation is done B's enclave provisions T1 to A's enclave using the DH key. If there are further
  dependencies those would be provisioned as well.
* A's enclave then proceeds to verify T1 using the embedded deterministic JVM to run XContract. The verified
  transaction is then sealed to disk(5). We repeat this for T2.
* If verification or attestation fails at any point the enclave returns to A's XFlow with a failure. Otherwise if all
  is good the enclave returns with a success. At this point A's XFlow knows that T2 is valid, but hasn't seen T1 in
  the clear.

(1) This is simplified, the actual protocol is a bit different. Namely the quote is not generated every time A requires provisioning, but is rather generated periodically.

(2) There is a way to do this check inside the enclave, however it requires switching on the Intel ME, which in general isn't available on machines in the cloud and is known to have vulnerabilities.

(3) We need symmetric trust even if the secrets seem to only flow from B to A. Node B may try to fake being an enclave to fish for information from A.

(4) The generated keys in the quotes are used to authenticate the respective parts of the DH key exchange.

(5) Sealing means encryption of data using a key unique to the enclave and CPU. The data may be subsequently unsealed (decrypted) by the enclave, even if the enclave was restarted. Also note that there is another layer of abstraction needed which we don't detail here, needed for redundancy of the encryption key.

To summarise, the journey of T1 is (a rough sketch in code follows the list):

1. Initially it's sitting encrypted in B's storage.
2. B's enclave decrypts it using its seal key specific to B's enclave + CPU combination.
3. B's enclave encrypts it using the ephemeral DH key.
4. The encrypted transaction is sent to A. The safety of this (namely that A's enclave doesn't reveal the transaction to node A) hinges on B's enclave's trust in A's enclave, which is expressed as a check of A's enclave measurement during attestation, which in turn requires auditing of A's enclave code and reproducing of the measurement.
5. A's enclave decrypts the transaction using the DH key.
6. A's enclave verifies the transaction using a deterministic JVM.
7. A's enclave encrypts the transaction using A's seal key specific to A's enclave + CPU combination.
8. The encrypted transaction is stored in A's storage.
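
Purely to make the re-encryption pipeline concrete, the following Kotlin sketch mimics steps 2-7 with AES-GCM standing
in for both the seal key and the DH-derived session key; the real sealing and attested key exchange are SGX primitives,
and transaction verification is elided.

```kotlin
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.KeyGenerator
import javax.crypto.SecretKey
import javax.crypto.spec.GCMParameterSpec

// AES-GCM helpers standing in for seal/unseal and DH-session encrypt/decrypt.
private fun encrypt(key: SecretKey, plaintext: ByteArray): ByteArray {
    val iv = ByteArray(12).also { SecureRandom().nextBytes(it) }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, GCMParameterSpec(128, iv))
    return iv + cipher.doFinal(plaintext)
}

private fun decrypt(key: SecretKey, blob: ByteArray): ByteArray {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, GCMParameterSpec(128, blob.copyOfRange(0, 12)))
    return cipher.doFinal(blob.copyOfRange(12, blob.size))
}

fun main() {
    val aesKeyGen = KeyGenerator.getInstance("AES").apply { init(256) }
    val sealKeyB = aesKeyGen.generateKey()     // stand-in for B's enclave + CPU seal key
    val sealKeyA = aesKeyGen.generateKey()     // stand-in for A's enclave + CPU seal key
    val dhSessionKey = aesKeyGen.generateKey() // stand-in for the ephemeral attested DH key

    val t1 = "transaction T1".toByteArray()
    val atRestInB = encrypt(sealKeyB, t1)                  // 1. sitting sealed in B's storage
    val unsealedByB = decrypt(sealKeyB, atRestInB)         // 2. B's enclave unseals it
    val onTheWire = encrypt(dhSessionKey, unsealedByB)     // 3. re-encrypted under the DH key, 4. sent to A
    val insideAEnclave = decrypt(dhSessionKey, onTheWire)  // 5. A's enclave decrypts it
    // 6. (verification with the deterministic JVM would happen here)
    val atRestInA = encrypt(sealKeyA, insideAEnclave)      // 7. sealed with A's key, 8. stored by A
    check(decrypt(sealKeyA, atRestInA).contentEquals(t1))
}
```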

As we can see, in this model each non-notary node runs their own SGX enclave and related storage. Validation of the
backchain happens by secure provisioning of it between enclaves, plus subsequent verification and storage. However
there is one important thing missing from the example (actually it has several, but those are mostly technical detail):
the notary!

In reality we cannot establish the full validity of T2 at this point of the negotiation, we need to first notarise it.
This model gives us some flexibility in this regard: we can use a validating notary (also running SGX) or a
non-validating one. This indicates that the enclave API should be split in two, mirroring the signature check choice
in SignedTransaction.verify. Only when the transaction is fully signed and notarised should it be persisted (sealed).
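
One hedged way to picture that split is the hypothetical Kotlin interface below, with one entry point that checks
contract validity only and one that additionally requires all signatures (including the notary's) before the result may
be sealed; this mirrors the signature-check flag idea on SignedTransaction.verify but is not an actual Corda API.

```kotlin
// Hypothetical enclave verification API; SignedTransactionBlob is an opaque, enclave-encrypted transaction.
class SignedTransactionBlob(val bytes: ByteArray)

interface VerificationEnclave {
    // Pre-notarisation: run contract verification over the backchain, but do not require the notary signature
    // and do not seal the transaction yet.
    fun verifyContractsOnly(tx: SignedTransactionBlob): Boolean

    // Post-notarisation: require all signatures including the notary's, and only then persist (seal) the
    // transaction to the enclave's encrypted storage.
    fun verifyAndSeal(tx: SignedTransactionBlob): Boolean
}
```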

This model has both advantages and disadvantages. On one hand it is the closest to what we have now - we (and users)
are familiar with this model, we can fairly easily nest it into the existing codebase and it gives us flexibility with
regard to notary modes. On the other hand it is a compromising answer to the regulatory problem. If we use non-
validating notaries then the backchain storage is restricted to participants, however consider the following example:
if we have a transaction X that parties A and B can process legally, but a later transaction Y that has X in its
backchain is sent for verification to party C, then C will process and store X as well, which may be illegal.

## Privacy model + notary provisioning

This model would work similarly to the previous one, except non-notary nodes wouldn't need to run SGX or care about
storage of the encrypted ledger, it would all be done in notary nodes. Nodes would connect to SGX-capable notary nodes,
and after attestation the nodes can be sure that the notary has run verification before signing.

This fixes the choice of using validating notaries, as notaries would be the only entities capable of verification:
only they have access to the full backchain inside enclaves.

Note that because we still provision the full backchain between notary members for verification, we don't necessarily
need a BFT consensus on validity - if an enclave is compromised an invalid transaction will be detected at the next
backchain provisioning.

This model reduces the number of responsibilities of a non-notary node, in particular it wouldn't need to provide
storage for the backchain or verification, but could simply trust notary signatures. Also it wouldn't need to host SGX
enclaves, only partake in the DH exchange with notary enclaves. The node's responsibilities would be reduced to the
orchestration of ledger updates (flows) and related bookkeeping (vault, network map). This split would also enable us
to be flexible with regards to the update orchestration: trust in the validity of the ledger would cease to depend on
the transaction resolution currently embedded into flows - we could provide a from-scratch light-weight implementation
of a "node" (say a mobile app) that doesn't use flows and related code at all, it just needs to be able to connect to
notary enclaves to notarise; validity is taken care of by notaries.

Note that although we wouldn't require validation checks from non-notary nodes, in theory it would be safe to allow
them to do so (if they want a stronger-than-BFT guarantee).

Of course this model has disadvantages too. From the regulatory point of view it is a strictly worse solution than the
non-notary provisioning model: the backchain would be provisioned between notary nodes not owned by the actual
participants in the backchain. It also prevents us from using non-validating notaries.

## Integrity model + non-notary provisioning

In this model we would trust SGX-backed signatures and related attestation datastructures (quote over signature key
signed by Intel) as proof of validity. When nodes A and B are negotiating a transaction it's enough to provision SGX
signatures over the dependency hashes to one another, there's no need to provision the full backchain.

This sounds very simple and efficient, and it's even more private than the privacy model as we're only passing
signatures around, not transactions. However there are a couple of issues that need addressing. If an SGX enclave is
compromised a malicious node can provide a signature over an invalid transaction that checks out, and nobody will ever
know about it, because the original transaction will never be verified. One way we can mitigate this is by requiring a
BFT consensus signature, or perhaps a threshold signature is enough. We could decouple verification into "verifying
oracles" which verify in SGX and return signatures over transaction hashes, and require a certain number of them to
convince the notary to notarise and subsequent nodes to trust validity. Another issue is enclave updates. If we find a
vulnerability in an enclave and update it, what happens to the already signed backchain? Historical transactions have
signatures that are rooted in SGX quotes belonging to old untrusted enclave code. One option is to simply have a
cutoff date before which we accept old signatures. This requires a consensus-backed timestamp on the notary signature.
Another option would be to keep the old ledger around and re-verify it with the new enclaves. However if we do this we
lose the benefits of the integrity model - we get back the regulatory issue, and we don't gain the performance benefits.
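
As an illustration of the "require a certain number of verifying oracles" idea, a hypothetical acceptance check might
look like the Kotlin sketch below; the types, and the choice of a simple m-of-n count rather than a real threshold
signature scheme, are assumptions for illustration only.

```kotlin
import java.security.PublicKey
import java.security.Signature

// A validity attestation from one verifying oracle: a signature over the transaction hash.
data class ValidityAttestation(val oracleKey: PublicKey, val signature: ByteArray)

// Accept a transaction hash as valid only if at least `threshold` distinct, known oracles signed it.
fun acceptedByQuorum(
    txHash: ByteArray,
    attestations: List<ValidityAttestation>,
    knownOracles: Set<PublicKey>,
    threshold: Int
): Boolean {
    val validSigners = attestations
        .filter { it.oracleKey in knownOracles }
        .filter {
            Signature.getInstance("SHA256withECDSA").run {
                initVerify(it.oracleKey)
                update(txHash)
                verify(it.signature)
            }
        }
        .map { it.oracleKey }
        .toSet()
    return validSigners.size >= threshold
}
```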

## Integrity model + notary provisioning

This is similar to the previous model, only once again non-notary nodes wouldn't need to care about verifying or
collecting proofs of validity before sending the transaction off for notarisation. All of the complexity would be
hidden by notary nodes, which may use validating oracles or perhaps combine consensus over validity with consensus
over spending. This model would be a very clean separation of concerns which solves the regulatory problem (almost)
and is quite efficient as we don't need to keep provisioning the chain. One potential issue with regards to regulation
is the tip of the ledger (the transaction being notarised) - this is sent to notaries and although it is not stored it
may still be against the law to receive it and hold it in volatile memory, even inside an enclave. I'm unfamiliar with
the legal details of whether this is good enough. If this is an issue, one way we could address it would be to scope
the validity checks required for notarisation within legal boundaries and only require "full" consensus on the
spentness check. Of course this has the downside that ledger participants outside of the regulatory boundary need to
trust the BFT-SGX of the scope. I'm not sure whether it's possible to do any better, after all we can't send the
transaction body outside the scope in any shape or form.

## Threat model

In all models we have the following actors, which may or may not overlap depending on the model:

* Notary quorum members
* Non-notary nodes/entities interacting with the ledger
* Identities owning the verifying enclave hosting infrastructure
* Identities owning the encrypted ledger/signature storage infrastructure
* R3 = enclave whitelisting identity
* Network Map = contract whitelisting identity
* Intel

We have two major ways of compromise:

* compromise of a non-enclave entity (notary, node, R3, Network Map, storage)
* compromise of an enclave.

In the case of **notaries** compromise means malicious signatures, for **nodes** it's malicious transactions, for **R3**
it's signing malicious enclaves, for **Network Map** it's signing malicious contracts, for **storage** it's read-write
access to encrypted data, and for **Intel** it's forging of quotes or signing over invalid ones.

A compromise of an **enclave** means some form of access to the enclave's temporary identity key. This may happen
through direct hardware compromise (extracting of fuse values) and subsequent forging of a quote, or leaking of secrets
through weakness of the enclave-host boundary or other side channels like Spectre (hacking). In any case it allows an
adversary to impersonate an enclave and therefore to intercept enclave traffic and forge signatures.

The actors relevant to SGX are enclave hosts, storage infrastructure owners, regular nodes and R3.

* **Enclave hosts**: enclave code is specifically written with malicious (compromised) hosts in mind. That said we
  cannot be 100% secure against yet undiscovered side channel attacks and other vulnerabilities, so we need to be
  prepared for the scenario where enclaves get compromised. The privacy model effectively solves this problem by
  always provisioning and re-verifying the backchain. An impersonated enclave may be able to see what's on the ledger,
  but tampering with it will not check out at the next provisioning. On the other hand if a compromise happens in the
  integrity model an attacker can forge a signature over validity. We can mitigate this with a BFT guarantee by
  requiring a consensus over validity. This way we effectively provide the same guarantee for validity as notaries
  provide with regards to double spends.

* **Storage infrastructure owner**:
  * A malicious actor would need to crack the encryption key to decrypt transactions
    or transaction signatures. Although this is highly unlikely, we can mitigate by preparing for and forcing key
    updates (i.e. we won't provision new transactions to enclaves using old keys).
  * What an attacker *can* do is simply erase encrypted data (or perhaps re-encrypt it as part of ransomware), blocking
    subsequent resolution and verification. In the non-notary provisioning models we can't really mitigate this as the
    tip of the ledger (or the signature over it) may only be stored by a single non-notary entity (assumed to be compromised).
    However if we require consensus over validity between notary or non-notary entities (e.g. validating oracles) then
    this implicitly provides redundancy of storage.
  * Furthermore storage owners can spy on the enclave's activity by observing access patterns to the encrypted blobs.
    We can mitigate this by implementing ORAM storage.

* **Regular nodes**: if a regular node is compromised the attacker may gain access to the node's long term key that
  allows them to Diffie-Hellman with an enclave, or get the ephemeral DH value calculated during attestation directly.
  This means they can man-in-the-middle between the node and the enclave. From the ledger's point of view we are
  prepared for this scenario as we never leak sensitive information to the node from the enclave, however it opens the
  possibility that the attacker can fake enclave replies (e.g. validity checks) and can sniff on secrets flowing from
  the node to the enclave. We can mitigate the fake enclave replies by requiring an extra signature on messages.
  Sniffing cannot really be mitigated, but one could argue that if the transient DH key (that lives temporarily in
  volatile memory) or long term key (that probably lives in an HSM) was leaked then the attacker has access to node
  secrets anyway.

* **R3**: the entity that's whitelisting enclaves effectively controls attestation trust, which means they can
  backdoor the ledger by whitelisting a secret-revealing/signature-forging enclave. One way to mitigate this is by
  requiring a threshold signature/consensus over new trusted enclave measurements. Another way would be to use "canary"
  keys controlled by neutral parties. These parties' responsibility would simply be to publish enclave measurements (and
  perhaps the reproducing build) to the public before signing over them. The "publicity" and signature would be checked
  during attestation, so a quote with a non-public measurement would be rejected. Although this wouldn't prevent
  backdoors (unless the parties also do auditing), it would make them public.

* **Intel**: There are two ways a compromised Intel can interact with the ledger maliciously, both of which provide a backdoor.
  * It can sign over invalid quotes. This can be mitigated by implementing our own attestation service. Intel told us
    we'll be able to do this in the future (by downloading a set of certificates tied to CPU+CPUSVN combos that may be
    used to check QE signatures).
  * It can produce valid quotes without an enclave. This is due to the fact that they store one half of the SGX-
    specific fuse values in order to validate quotes flexibly. One way to circumvent this would be to only use the
    other half of the fuse values (the seal values) which they don't store (or so they claim). However this requires
    our own "enrollment" process of CPUs where we replicate the provisioning process based off of the seal values and
    verify manually that the provisioning public key comes from the CPU. And even if we do this all we did was move
    the requirement of trust from Intel to R3.

Note however that even if an attacker compromises Intel and decides to backdoor they would need to connect to the
ledger participants in order to take advantage. The flow framework and the business network concept act as a form of
ACL on data that would make an Intel backdoor quite useless.

## Summary

As we can see we have a number of options here; all of them have advantages and disadvantages.

#### Privacy + non-notary

**Pros**:
* Closest to our current non-SGX model
* Strong guarantee of validity
* Flexible with respect to notary modes

**Cons**:
* Regulatory problem about provisioning of the ledger
* Relies on ledger participants to do validation checks
* No redundancy across ledger participants

#### Privacy + notary

**Pros**:
* Strong guarantee of validity
* Separation of concerns, allows lightweight ledger participants
* Redundancy across notary nodes

**Cons**:
* Regulatory problem about provisioning of the ledger

#### Integrity + non-notary

**Pros**:
* Efficient validity checks
* No storage of the sensitive transaction body, only signatures

**Cons**:
* Enclave impersonation compromises the ledger (unless consensus validation is used)
* Relies on ledger participants to do validation checks
* No redundancy across ledger participants

#### Integrity + notary

**Pros**:
* Efficient validity checks
* No storage of the sensitive transaction body, only signatures
* Separation of concerns, allows lightweight ledger participants
* Redundancy across notary nodes

**Cons**:
* Only a BFT guarantee over validity
* Temporary storage of the transaction in RAM may be against regulation

Personally I'm strongly leaning towards an integrity model where SGX compromise is mitigated by a BFT consensus over validity (perhaps done by a validating oracle cluster). This would solve the regulatory problem, it would be efficient and the infrastructure would have a very clean separation of concerns between notary and non-notary nodes, allowing lighter-weight interaction with the ledger.
@ -1,39 +0,0 @@



--------------------------------------------
Design Decision: <Description heading>
============================================

## Background / Context

Short outline of decision point.

## Options Analysis

### A. <Option summary>

#### Advantages

1.
2.

#### Disadvantages

1.
2.

### B. <Option summary>

#### Advantages

1.
2.

#### Disadvantages

1.
2.

## Recommendation and justification

Proceed with Option <A or B or ... >
@ -1,76 +0,0 @@

# Design doc template

## Overview

Please read the [Design Review Process](../design-review-process.md) before completing a design.

Each section of the document should be at the second level (two hashes at the start of a line).

This section should describe the desired change or feature, along with background on why it's needed and what problem
it solves.

An outcome of the design document should be an implementation plan that defines JIRA stories and tasks to be completed
to produce shippable, demonstrable, executable code.

Please complete and/or remove section headings as appropriate to the design being proposed. These are provided as
guidance and to structure the design in a consistent and coherent manner.

## Background

Description of the existing solution (if any) and/or rationale for the requirement.

* Reference(s) to discussions held elsewhere (slack, wiki, etc).
* Definitions, acronyms and abbreviations

## Goals

What's in scope to be solved.

## Non-goals

What won't be tackled as part of this design, either because it's not needed/wanted, or because it will be tackled later
as part of a separate design effort. Figuring out what you will *not* do is frequently a useful exercise.

## Timeline

* Is this a short, medium or long-term solution?
* For a short-term design, is it evolvable / extensible or a stop-gap (e.g. potentially throwaway)?

## Requirements

* Reference(s) to any of the following:
  * Captured Product Backlog JIRA entry
  * Internal White Paper feature item and/or visionary feature
  * Project related requirement (POC, RFP, Pilot, Prototype) from
    * Internal Incubator / Accelerator project
    * Direct from Customer, ISV, SI, Partner
* Use Cases
* Assumptions

## Design Decisions

List of design decisions identified in defining the target solution.

For each item, please complete the attached [Design Decision template](decisions/decision.md)

Use the ``.. toctree::`` feature to list out the design decision docs here (see the source of this file for an example).

.. toctree::
   :maxdepth: 2

   decisions/decision.md

## Design

Think about:

* Public API, backwards compatibility impact.
* UI requirements, if any. Illustrate with UI Mockups and/or wireframes.
* Data model & serialization impact and changes required.
* Infrastructure services: persistence (schemas), messaging.
* Impact on performance, scalability, high availability
* Versioning, upgradability, migration
* Management: audit, alerting, monitoring, backup/recovery, archiving
* Data privacy, authentication, access control
* Logging
* Testability
@ -1,346 +0,0 @@

.. raw:: html

    <style> .red {color:red} </style>

.. role:: red

Deterministic modules
=====================

A Corda contract's verify function should always produce the same results for the same input data. To that end,
Corda provides the following modules:

#. ``core-deterministic``
#. ``serialization-deterministic``
#. ``jdk8u-deterministic``

These are reduced versions of Corda's ``core`` and ``serialization`` modules and the OpenJDK 8 ``rt.jar``, where the
non-deterministic functionality has been removed. The intention here is that all CorDapp classes required for
contract verification should be compiled against these modules to prevent them containing non-deterministic behaviour.

.. note:: These modules are only a development aid. They cannot guarantee determinism without also including
   deterministic versions of all their dependent libraries, e.g. ``kotlin-stdlib``.

Generating the deterministic modules
------------------------------------

JDK 8
    ``jdk8u-deterministic`` is a "pseudo JDK" image that we can point the Java and Kotlin compilers to. It downloads the
    ``rt.jar`` containing a deterministic subset of the Java 8 APIs from Artifactory.

    To build a new version of this JAR and upload it to Artifactory, see the ``create-jdk8u`` module. This is a
    standalone Gradle project within the Corda repository that will clone the ``deterministic-jvm8`` branch of Corda's
    `OpenJDK repository <https://github.com/corda/openjdk>`_ and then build it. (This currently requires a C++ compiler,
    GNU Make and a UNIX-like development environment.)

Corda Modules
    ``core-deterministic`` and ``serialization-deterministic`` are generated from Corda's ``core`` and ``serialization``
    modules respectively using both `ProGuard <https://www.guardsquare.com/en/proguard>`_ and Corda's ``JarFilter`` Gradle
    plugin. Corda developers configure these tools by applying Corda's ``@KeepForDJVM`` and ``@DeleteForDJVM``
    annotations to elements of ``core`` and ``serialization`` as described :ref:`here <deterministic_annotations>`.

The build generates each of Corda's deterministic JARs in six steps:

#. Some *very few* classes in the original JAR must be replaced completely. This is typically because the original
   class uses something like ``ThreadLocal``, which is not available in the deterministic Java APIs, and yet the
   class is still required by the deterministic JAR. We must keep such classes to a minimum!
#. The patched JAR is analysed by ProGuard for the first time using the following rule:

   .. sourcecode:: groovy

       keep '@interface net.corda.core.KeepForDJVM { *; }'

   ..

   ProGuard works by calculating how much code is reachable from given "entry points", and in our case these entry
   points are the ``@KeepForDJVM`` classes. The unreachable classes are then discarded by ProGuard's ``shrink``
   option.
#. The remaining classes may still contain non-deterministic code. However, there is no way of writing a ProGuard rule
   explicitly to discard anything. Consider the following class:

   .. sourcecode:: kotlin

       @CordaSerializable
       @KeepForDJVM
       data class UniqueIdentifier @JvmOverloads @DeleteForDJVM constructor(
           val externalId: String? = null,
           val id: UUID = UUID.randomUUID()
       ) : Comparable<UniqueIdentifier> {
           ...
       }

   ..

   While CorDapps will definitely need to handle ``UniqueIdentifier`` objects, all of the secondary constructors
   generate a new random ``UUID`` and so are non-deterministic. Hence the next "determinising" step is to pass the
   classes to the ``JarFilter`` tool, which strips out all of the elements which have been annotated as
   ``@DeleteForDJVM`` and stubs out any functions annotated with ``@StubOutForDJVM``. (Stub functions that
   return a value will throw ``UnsupportedOperationException``, whereas ``void`` or ``Unit`` stubs will do nothing.)
#. After the ``@DeleteForDJVM`` elements have been filtered out, the classes are rescanned using ProGuard to remove
   any more code that has now become unreachable.
#. The remaining classes define our deterministic subset. However, the ``@kotlin.Metadata`` annotations on the compiled
   Kotlin classes still contain references to all of the functions and properties that ProGuard has deleted. Therefore
   we now use the ``JarFilter`` to delete these references, as otherwise the Kotlin compiler will pretend that the
   deleted functions and properties are still present.
#. Finally, we use ProGuard again to validate our JAR against the deterministic ``rt.jar``:

   .. literalinclude:: ../../core-deterministic/build.gradle
       :language: groovy
       :start-after: DOCSTART 01
       :end-before: DOCEND 01
   ..

   This step will fail if ProGuard spots any Java API references that still cannot be satisfied by the deterministic
   ``rt.jar``, and hence it will break the build.

Configuring IntelliJ with a deterministic SDK
---------------------------------------------

We would like to configure IntelliJ so that it will highlight uses of non-deterministic Java APIs as :red:`not found`.
Or, more specifically, we would like IntelliJ to use the ``deterministic-rt.jar`` as a "Module SDK" for deterministic
modules rather than the ``rt.jar`` from the default project SDK, to make IntelliJ consistent with Gradle.

This is possible, but slightly tricky to configure because IntelliJ will not recognise an SDK containing only the
``deterministic-rt.jar`` as being valid. It also requires that IntelliJ delegate all build tasks to Gradle, and that
Gradle be configured to use the Project's SDK.

Creating the Deterministic SDK
    Gradle creates a suitable JDK image in the project's ``jdk8u-deterministic/jdk`` directory, and you can
    configure IntelliJ to use this location for this SDK. However, you should also be aware that IntelliJ SDKs
    are available for *all* projects to use.

    To create this JDK image, execute the following:

    .. code-block:: bash

        $ gradlew jdk8u-deterministic:copyJdk

    ..

    Now select ``File/Project Structure/Platform Settings/SDKs`` and add a new JDK SDK with the
    ``jdk8u-deterministic/jdk`` directory as its home. Rename this SDK to something like "1.8 (Deterministic)".

    This *should* be sufficient for IntelliJ. However, if IntelliJ realises that this SDK does not contain a
    full JDK then you will need to configure the new SDK by hand:

    #. Create a JDK Home directory with the following contents:

       ``jre/lib/rt.jar``

       where ``rt.jar`` here is this renamed artifact:

       .. code-block:: xml

           <dependency>
               <groupId>net.corda</groupId>
               <artifactId>deterministic-rt</artifactId>
               <classifier>api</classifier>
           </dependency>

       ..

    #. While IntelliJ is *not* running, locate the ``config/options/jdk.table.xml`` file in IntelliJ's configuration
       directory. Add an empty ``<jdk>`` section to this file:

       .. code-block:: xml

           <jdk version="2">
               <name value="1.8 (Deterministic)"/>
               <type value="JavaSDK"/>
               <version value="java version &quot;1.8.0&quot;"/>
               <homePath value=".. path to the deterministic JDK directory .."/>
               <roots>
               </roots>
           </jdk>

       ..

    #. Open IntelliJ and select ``File/Project Structure/Platform Settings/SDKs``. The "1.8 (Deterministic)" SDK
       should now be present. Select it and then click on the ``Classpath`` tab. Press the "Add" / "Plus" button to
       add ``rt.jar`` to the SDK's classpath. Then select the ``Annotations`` tab and include the same JAR(s) as
       the other SDKs.

Configuring the Corda Project
    #. Open the root ``build.gradle`` file and define this property:

       .. code-block:: gradle

           buildscript {
               ext {
                   ...
                   deterministic_idea_sdk = '1.8 (Deterministic)'
                   ...
               }
           }

       ..

Configuring IntelliJ
    #. Go to ``File/Settings/Build, Execution, Deployment/Build Tools/Gradle``, and configure Gradle's JVM to be the
       project's JVM.

    #. Go to ``File/Settings/Build, Execution, Deployment/Build Tools/Gradle/Runner``, and select these options:

       - Delegate IDE build/run action to Gradle
       - Run tests using the Gradle Test Runner

    #. Delete all of the ``out`` directories that IntelliJ has previously generated for each module.

    #. Go to ``View/Tool Windows/Gradle`` and click the ``Refresh all Gradle projects`` button.

These steps will enable IntelliJ's presentation compiler to use the deterministic ``rt.jar`` with the following modules:

- ``core-deterministic``
- ``serialization-deterministic``
- ``core-deterministic:testing:common``

but still build everything using Gradle with the full JDK.

Testing the deterministic modules
---------------------------------

The ``core-deterministic:testing`` module executes some basic JUnit tests for the ``core-deterministic`` and
``serialization-deterministic`` JARs. These tests are compiled against the deterministic ``rt.jar``, although
they are still executed using the full JDK.

The ``testing`` module also has two sub-modules:

``core-deterministic:testing:data``
    This module generates test data such as serialised transactions and elliptic curve key pairs using the full
    non-deterministic ``core`` library and JDK. This data is all written into a single JAR which the ``testing``
    module adds to its classpath.

``core-deterministic:testing:common``
    This module provides the test classes which the ``testing`` and ``data`` modules need to share. It is therefore
    compiled against the deterministic API subset.


.. _deterministic_annotations:

Applying @KeepForDJVM and @DeleteForDJVM annotations
----------------------------------------------------

Corda developers need to understand how to annotate classes in the ``core`` and ``serialization`` modules correctly
in order to maintain the deterministic JARs.

.. note:: Every Kotlin class still has its own ``.class`` file, even when all of those classes share the same
   source file. Also, annotating the file:

   .. sourcecode:: kotlin

       @file:KeepForDJVM
       package net.corda.core.internal

   ..

   *does not* automatically annotate any class declared *within* this file. It merely annotates any
   accompanying Kotlin ``xxxKt`` class.

For more information about how ``JarFilter`` is processing the byte-code inside ``core`` and ``serialization``,
use Gradle's ``--info`` or ``--debug`` command-line options.

Deterministic Classes
    Classes that *must* be included in the deterministic JAR should be annotated as ``@KeepForDJVM``.

    .. literalinclude:: ../../core/src/main/kotlin/net/corda/core/KeepForDJVM.kt
        :language: kotlin
        :start-after: DOCSTART 01
        :end-before: DOCEND 01
    ..

    To preserve any Kotlin functions, properties or type aliases that have been declared outside of a ``class``,
    you should annotate the source file's ``package`` declaration instead:

    .. sourcecode:: kotlin

        @file:JvmName("InternalUtils")
        @file:KeepForDJVM
        package net.corda.core.internal

        infix fun Temporal.until(endExclusive: Temporal): Duration = Duration.between(this, endExclusive)

    ..

Non-Deterministic Elements
    Elements that *must* be deleted from classes in the deterministic JAR should be annotated as ``@DeleteForDJVM``.

    .. literalinclude:: ../../core/src/main/kotlin/net/corda/core/DeleteForDJVM.kt
        :language: kotlin
        :start-after: DOCSTART 01
        :end-before: DOCEND 01
    ..

    You must also ensure that a deterministic class's primary constructor does not reference any classes that are
    not available in the deterministic ``rt.jar``. The biggest risk here would be that ``JarFilter`` would delete the
    primary constructor and that the class could no longer be instantiated, although ``JarFilter`` will print a warning
    in this case. However, it is also likely that the "determinised" class would have a different serialisation
    signature than its non-deterministic version and so become unserialisable on the deterministic JVM.

    Primary constructors that have non-deterministic default parameter values must still be annotated as
    ``@DeleteForDJVM`` because they cannot be refactored without breaking Corda's binary interface. The Kotlin compiler
    will automatically apply this ``@DeleteForDJVM`` annotation - along with any others - to all of the class's
    secondary constructors too. The ``JarFilter`` plugin can then remove the ``@DeleteForDJVM`` annotation from the
    primary constructor so that it can subsequently delete only the secondary constructors.

    The annotations that ``JarFilter`` will "sanitise" from primary constructors in this way are listed in the plugin's
    configuration block, e.g.

    .. sourcecode:: groovy

        task jarFilter(type: JarFilterTask) {
            ...
            annotations {
                ...

                forSanitise = [
                    "net.corda.core.DeleteForDJVM"
                ]
            }
        }

    ..

    Be aware that package-scoped Kotlin properties are all initialised within a common ``<clinit>`` block inside
    their host ``.class`` file. This means that when ``JarFilter`` deletes these properties, it cannot also remove
    their initialisation code. For example:

    .. sourcecode:: kotlin

        package net.corda.core

        @DeleteForDJVM
        val map: MutableMap<String, String> = ConcurrentHashMap()

    ..

    In this case, ``JarFilter`` would delete the ``map`` property but the ``<clinit>`` block would still create
    an instance of ``ConcurrentHashMap``. The solution here is to refactor the property into its own file and then
    annotate the file itself as ``@DeleteForDJVM`` instead.

Non-Deterministic Function Stubs
    Sometimes it is impossible to delete a function entirely. Or a function may have some non-deterministic code
    embedded inside it that cannot be removed. For these rare cases, there is the ``@StubOutForDJVM``
    annotation:

    .. literalinclude:: ../../core/src/main/kotlin/net/corda/core/StubOutForDJVM.kt
        :language: kotlin
        :start-after: DOCSTART 01
        :end-before: DOCEND 01
    ..

    This annotation instructs ``JarFilter`` to replace the function's body with either an empty body (for functions
    that return ``void`` or ``Unit``) or one that throws ``UnsupportedOperationException``. For example:

    .. sourcecode:: kotlin

        fun necessaryCode() {
            nonDeterministicOperations()
            otherOperations()
        }

        @StubOutForDJVM
        private fun nonDeterministicOperations() {
            // etc
        }

    ..
@ -51,6 +51,7 @@ application development please continue to refer to `the main project documentat

    certificate-revocation
    node-internals-index.rst
    json.rst
    deterministic-modules.rst
    troubleshooting.rst

.. conditional-toctree::
@ -81,6 +82,7 @@ application development please continue to refer to `the main project documentat

    component-library-index.rst
    serialization-index.rst
    json.rst
    deterministic-modules.rst
    troubleshooting.rst

.. conditional-toctree::

@ -62,8 +62,7 @@ that provide the current time, random number generators, libraries that provide

libraries, for example. Ultimately, the only information available to the contract when verifying the transaction is
the information included in the transaction itself.

Developers can pre-verify their CorDapps are deterministic by linking their CorDapps against the deterministic modules
(see the :doc:`Deterministic Corda Modules <deterministic-modules>`).
Developers can pre-verify their CorDapps are deterministic by linking their CorDapps against the deterministic modules.

Contract limitations
--------------------

@ -1,12 +0,0 @@

Release process
===============

.. toctree::
   :maxdepth: 1

   release-notes
   changelog
   contributing
   codestyle
   testing
   api-scanner