CORDA-1514 Add our patented nested Merkle trees protocol to documentation

2025-06-01 07:00:54 +00:00 · 2018-07-17 15:36:10 +01:00 · 2018-07-17 15:36:10 +01:00 · 8a54badc6a
commit 8a54badc6a
parent 921b132658
6 changed files with 96 additions and 48 deletions
--- a/docs/source/key-concepts-oracles.rst
+++ b/docs/source/key-concepts-oracles.rst
@ -31,52 +31,12 @@ If a node wishes to use a given fact in a transaction, they request a command as
 the oracle considers the fact to be true, they send back the required command. The node then includes the command in
 their transaction, and the oracle will sign the transaction to assert that the fact is true.

+For privacy purposes, the oracle does not require to have access on every part of the transaction and the only
+information it needs to see is their embedded, related to this oracle, command(s). We should also provide
+guarantees that all of the commands requiring a signature from this oracle should be visible to
+the oracle entity, but not the rest. To achieve that we use filtered transactions, in which the transaction proposer(s)
+uses a nested Merkle tree approach to "tear off" the unrelated parts of the transaction. See :doc:`key-concepts-tearoffs`
+for more information on how transaction tear-offs work.
+
 If they wish to monetize their services, oracles can choose to only sign a transaction and attest to the validity of
-the fact it contains for a fee.
-
-Transaction tear-offs
---------------------
-To sign a transaction, the only information the oracle needs to see is their embedded command. Providing any
-additional transaction data to the oracle would constitute a privacy leak. Similarly, a non-validating notary only
-needs to see a transaction's input states.
-
-To combat this, the transaction proposer(s) uses a Merkle tree to "tear off" any parts of the transaction that the
-oracle/notary doesn't need to see before presenting it to them for signing. A Merkle tree is a well-known cryptographic
-scheme that is commonly used to provide proofs of inclusion and data integrity. Merkle trees are widely used in
-peer-to-peer networks, blockchain systems and git.
-
-The advantage of a Merkle tree is that the parts of the transaction that were torn off when presenting the transaction
-to the oracle cannot later be changed without also invalidating the oracle's digital signature.
-
-Transaction Merkle trees
-^^^^^^^^^^^^^^^^^^^^^^^^
-A Merkle tree is constructed from a transaction by splitting the transaction into leaves, where each leaf contains
-either an input, an output, a command, or an attachment. The Merkle tree also contains the other fields of the
-``WireTransaction``, such as the time-window, the notary, the type and the signers.
-
-Next, the Merkle tree is built in the normal way by hashing the concatenation of nodes’ hashes below the current one
-together. It’s visible on the example image below, where ``H`` denotes sha256 function, "+" - concatenation.
-
-.. image:: resources/merkleTree.png
-
-The transaction has two input states, one output state, one attachment, one command and a time-window. For brevity
-we didn't include all leaves on the diagram (type, notary and signers are presented as one leaf labelled Rest - in
-reality they are separate leaves). Notice that if a tree is not a full binary tree, leaves are padded to the nearest
-power of 2 with zero hash (since finding a pre-image of sha256(x) == 0 is hard computational task) - marked light
-green above. Finally, the hash of the root is the identifier of the transaction, it's also used for signing and
-verification of data integrity. Every change in transaction on a leaf level will change its identifier.
-
-Hiding data
-^^^^^^^^^^^
-Hiding data and providing the proof that it formed a part of a transaction is done by constructing Partial Merkle Trees
-(or Merkle branches). A Merkle branch is a set of hashes, that given the leaves’ data, is used to calculate the
-root’s hash. Then that hash is compared with the hash of a whole transaction and if they match it means that data we
-obtained belongs to that particular transaction.
-
-.. image:: resources/partialMerkle.png
-
-In the example above, the node ``H(f)`` is the one holding command data for signing by Oracle service. Blue leaf
-``H(g)`` is also included since it's holding time-window information. Nodes labelled ``Provided`` form the Partial
-Merkle Tree, black ones are omitted. Having time-window with the command that should be in a violet node place and
-branch we are able to calculate root of this tree and compare it with original transaction identifier - we have a
-proof that this command and time-window belong to this transaction.
+the fact it contains for a fee.
--- a/docs/source/key-concepts-tearoffs.rst
+++ b/docs/source/key-concepts-tearoffs.rst
@ -0,0 +1,87 @@
+Transaction tear-offs
+=====================
+
+.. topic:: Summary
+
+   * *Hide transaction components for privacy purposes*
+   * *Oracles and non-validating notaries can only see their "related" transaction components, but not the full transaction details*
+
+Overview
+--------
+There are cases where some of the entities involved on the transaction could only have partial visibility on the
+transaction parts. For instance, when an oracle should sign a transaction, the only information it needs to see is their
+embedded, related to this oracle, command(s). Similarly, a non-validating notary only needs to see a transaction's input
+states. Providing any additional transaction data to the oracle would constitute a privacy leak.
+
+To combat this, we use the concept of filtered transactions, in which the transaction proposer(s) uses a nested Merkle
+tree approach to "tear off" any parts of the transaction that the oracle/notary doesn't need to see before presenting it
+to them for signing. A Merkle tree is a well-known cryptographic scheme that is commonly used to provide proofs of
+inclusion and data integrity. Merkle trees are widely used in peer-to-peer networks, blockchain systems and git.
+
+The advantage of a Merkle tree is that the parts of the transaction that were torn off when presenting the transaction
+to the oracle cannot later be changed without also invalidating the oracle's digital signature.
+
+Transaction Merkle trees
+^^^^^^^^^^^^^^^^^^^^^^^^
+A Merkle tree is constructed from a transaction by splitting the transaction into leaves, where each leaf contains
+either an input, an output, a command, or an attachment. The final nested tree structure also contains the
+other fields of the transaction, such as the time-window, the notary and the required signers. As shown in the picture
+below, the only component type that is requiring two trees instead of one is the command, which is split into
+command data and required signers for visibility purposes.
+
+Corda is using a patent-pending approach using nested Merkle trees per component type. Briefly, a component sub-tree
+is generated for each component type (i.e., inputs, outputs, attachments). Then, the roots of these sub-trees
+form the leaves of the top Merkle tree and finally the root of this tree represents the transaction id.
+
+Another important feature is that a nonce is deterministically generated for each component in a way that each nonce
+is independent. Then, we use the nonces along with their corresponding components to calculate the component hash,
+which is the actual Merkle tree leaf. Nonces are required to protect against brute force attacks that otherwise would
+reveal the content of low-entropy hashed values (i.e., a single-word text attachment).
+
+After computing the leaves, each Merkle tree is built in the normal way by hashing the concatenation of nodes’ hashes
+below the current one together. It’s visible on the example image below, where ``H`` denotes sha256 function, "+" - concatenation.
+
+.. image:: resources/merkleTreeFull.png
+   :scale: 35%
+   :align: center
+
+The transaction has three input states, two output states, two commands, one attachment, a notary and a time-window.
+Notice that if a tree is not a full binary tree, leaves are padded to the nearest
+power of 2 with zero hash (since finding a pre-image of sha256(x) == 0 is hard computational task) - marked light
+green above. Finally, the hash of the root is the identifier of the transaction, it's also used for signing and
+verification of data integrity. Every change in transaction on a leaf level will change its identifier.
+
+Hiding data
+^^^^^^^^^^^
+Hiding data and providing the proof that it formed a part of a transaction is done by constructing partial Merkle trees
+(or Merkle branches). A Merkle branch is a set of hashes, that given the leaves’ data, is used to calculate the
+root’s hash. Then, that hash is compared with the hash of a whole transaction and if they match it means that data we
+obtained belongs to that particular transaction. In the following we provide concrete examples on the data visible to a
+an oracle and a non-validating notary, respectively.
+
+Let's assume that only the first command should be visible to an Oracle. We should also provide guarantees that all of
+the commands requiring a signature from this oracle should be visible to the oracle entity, but not the rest. Here is how
+this filtered transaction will be represented in the Merkle tree structure.
+
+.. image:: resources/SubMerkleTree_Oracle.png
+   :scale: 35%
+   :align: center
+
+Blue nodes and ``H(c2)`` are provided to the Oracle service, while the black ones are omitted. ``H(c2)`` is required, so
+that the Oracle can compute ``H(commandData)`` without being to able to see the second command, but at the same time
+ensuring ``CommandData1`` is part of the transaction. It is highlighted that all signers are visible, so as to have a
+proof that no related command (that the Oracle should see) has been maliciously filtered out. Additionally, hashes of
+sub-trees (violet nodes) are also provided in the current Corda protocol. The latter is required for special cases, i.e.,
+when required to know if a component group is empty or not.
+
+Having all of the aforementioned data, one can calculate the root of the top tree and compare it with original
+transaction identifier - we have a proof that this command and time-window belong to this transaction.
+
+Along the same lines, if we want to send the same transaction to a non-validating notary we should hide all components
+apart from input states, time-window and the notary information. This data is enough for the notary to know which
+input states should be checked for double-spending, if the time-window is valid and if this transaction should be
+notarised by this notary.
+
+.. image:: resources/SubMerkleTree_Notary.png
+   :scale: 35%
+   :align: center
--- a/docs/source/key-concepts.rst
+++ b/docs/source/key-concepts.rst
@ -23,6 +23,7 @@ This section should be read in order:
   key-concepts-time-windows
   key-concepts-oracles
   key-concepts-node
+   key-concepts-tearoffs
   key-concepts-tradeoffs

 The detailed thinking and rationale behind these concepts are presented in two white papers:
--- a/docs/source/resources/SubMerkleTree_Notary.png
+++ b/docs/source/resources/SubMerkleTree_Notary.png
--- a/docs/source/resources/SubMerkleTree_Oracle.png
+++ b/docs/source/resources/SubMerkleTree_Oracle.png
--- a/docs/source/resources/merkleTreeFull.png
+++ b/docs/source/resources/merkleTreeFull.png