Tech white paper refresh, part 1 (#5233)

Tech white paper refresh, part 1. In part 1:

* A new section is added on package namespace ownership and the no-overlap rule.
* The spelling of "serialize" is standardized on the US spelling used by the code, and some content on serialization is added to the docs.
* A variety of smaller edits are made to improve readability.
* Spelling fixes.
* The discussion of C-I is temporarily removed, pending later re-addition in a new privacy section.
* Reference states are described.
* More TODOs are added to help me keep track of things that are needed.
* The discussion of time and clock sync is updated.
* The discussion of identity lookups is removed.
2025-04-07 11:27:01 +00:00 · 2019-06-21 14:29:42 +01:00 · 2019-06-21 14:29:42 +01:00 · a2c5cd1947
commit a2c5cd1947
parent c7cb6ef725
2 changed files with 313 additions and 241 deletions
--- a/docs/source/whitepaper/Ref.bib
+++ b/docs/source/whitepaper/Ref.bib
@ -75,13 +75,6 @@
    year = 2016
 }

-@misc{CordaIntro,
-    title = "\emph{{Corda: An introduction}}",
-    author = "{{Brown, Carlyle, Grigg, Hearn}}",
-    howpublished = "{\url{http://r3cev.com/s/corda-introductory-whitepaper-final.pdf}}",
-    year = 2016
-}
-
@misc{PaymentChannels,
    title = "Bitcoin micropayment channels",
    author = "{{Mike Hearn}}",
@ -371,4 +364,20 @@ publisher = {USENIX Association},
    author = {Tim Swanson},
    howpublished = {\url{http://tabbforum.com/opinions/settlement-risks-involving-public-blockchains}},
    year = {2016}
+}
+
+@misc{DeserialisingPickles,
+    author = {Lawrence and Frohoff},
+    howpublished = {\url{http://frohoff.github.io/appseccali-marshalling-pickles/}},
+    year = {2016}
+}
+
+@misc{MetaWidget,
+    howpublished = {\url{http://www.metawidget.org/}},
+    year = {2018}
+}
+
+@misc{ReflectionUI,
+    howpublished = {\url{http://javacollection.net/reflectionui/}},
+    year = {2018}
 }
--- a/docs/source/whitepaper/corda-technical-whitepaper.tex
+++ b/docs/source/whitepaper/corda-technical-whitepaper.tex
@ -40,7 +40,7 @@

 \maketitle
 \begin{center}
-Version 0.5
+Version 1.0
 \end{center}

 \vspace{10mm}
@ -48,28 +48,23 @@ Version 0.5
 \begin{abstract}

 A decentralised database with minimal trust between nodes would allow for the creation of a global ledger. Such a ledger
-would have many useful applications in finance, trade, supply chain tracking and more. We present Corda, a decentralised
+would have many useful applications in finance, trade, healthcare and more. We present Corda, a decentralised
 global database, and describe in detail how it achieves the goal of providing a platform for decentralised app
 development. We elaborate on the high level description provided in the paper \emph{Corda: An
 introduction}\cite{CordaIntro} and provide a detailed technical discussion.

 \end{abstract}
 \vfill
-\begin{center}
-\scriptsize{
-\textsc{This document describes the Corda design as intended. The reference
-implementation does not implement everything described within at this time.}
-}
-\end{center}
+
 \newpage
 \tableofcontents
 \newpage
 \section{Introduction}

-In many industries significant effort is needed to keep organisation specific databases in sync with each
-other. In the financial sector the effort of keeping different databases synchronised, reconciling them to ensure
+In many industries significant effort is needed to keep organisation specific databases in sync with each other.
+In the financial sector the effort of keeping different databases synchronised, reconciling them to ensure
 they actually are synchronised and resolving the `breaks' that occur when they are not represents a significant
-fraction of the total work a bank actually does!
+fraction of the total work a bank actually does.

 Why not just use a shared relational database? This would certainly solve a lot of problems using only existing technology,
 but it would also raise more questions than answers:
@ -79,8 +74,8 @@ but it would also raise more questions than answers:
 \item In which countries would it be hosted? What would stop that country abusing the mountain of sensitive information it would have?
 \item What if it were hacked?
 \item Can you actually scale a relational database to fit the entire financial system?
-\item What happens if The Financial System\texttrademark~needs to go down for maintenance?
-\item What kind of nightmarish IT bureaucracy would guard changes to the database schemas?
+\item What happens if the database needs to go down for maintenance? Does the economy stop?
+\item What kind of nightmarish IT bureaucracy would guard schema changes?
 \item How would you manage access control?
 \end{itemize}

@ -88,7 +83,7 @@ We can imagine many other questions. A decentralised database attempts to answer

 In this paper we differentiate between a \emph{decentralised} database and a \emph{distributed} database. A distributed
 database like BigTable\cite{BigTable} scales to large datasets and transaction volumes by spreading the data over many
-computers. However it is assumed that the computers in question are all run by a single homogenous organisation and that
+computers. However it is assumed that the computers in question are all run by a single homogeneous organisation and that
 the nodes comprising the database all trust each other not to misbehave or leak data. In a decentralised database, such
 as the one underpinning Bitcoin\cite{Bitcoin}, the nodes make much weaker trust assumptions and actively cross-check
 each other's work. Such databases trade performance and usability for security and global acceptance.
@ -96,9 +91,10 @@ each other's work. Such databases trade performance and usability for security a
 \emph{Corda} is a decentralised database platform with the following novel features:

 \begin{itemize}
-\item New transaction types can be defined using JVM\cite{JVM} bytecode.
+\item Nodes are arranged in an authenticated peer to peer network. All communication is direct. Gossip is not used.
+\item New transaction types can be defined using JVM\cite{JVM} bytecode. The bytecode is statically analysed and rewritten
+on the fly to be fully deterministic, and to implement deterministic execution time quotas.
 \item Transactions may execute in parallel, on different nodes, without either node being aware of the other's transactions.
-\item Nodes are arranged in an authenticated peer to peer network. All communication is direct.
 \item There is no block chain\cite{Bitcoin}. Transaction races are deconflicted using pluggable \emph{notaries}. A single
 Corda network may contain multiple notaries that provide their guarantees using a variety of different algorithms. Thus
 Corda is not tied to any particular consensus algorithm. (\cref{sec:notaries})
@ -107,20 +103,20 @@ another node on demand, but there is no global broadcast of \emph{all} transacti
 \item Bytecode-to-bytecode transpilation is used to allow complex, multi-step transaction building protocols called
 \emph{flows} to be modelled as blocking code. The code is transformed into an asynchronous state machine, with
 checkpoints written to the node's backing database when messages are sent and received. A node may potentially have
-millions of flows active at once and they may last days, across node restarts and even upgrades. Flows expose progress
-information to node administrators and users and may interact with people as well as other nodes. A Flow library is provided
-to enable developers to re-use common Flow types such as notarisation, membership broadcast and so on.
+millions of flows active at once and they may last days, across node restarts and even certain kinds of upgrade. Flows expose progress
+information to node administrators and users and may interact with people as well as other nodes. A library of flows is provided
+to enable developers to re-use common protocols such as notarisation, membership broadcast and so on.
 \item The data model allows for arbitrary object graphs to be stored in the ledger. These graphs are called \emph{states} and are the atomic unit of data.
 \item Nodes are backed by a relational database and data placed in the ledger can be queried using SQL as well as joined
-with private tables, thanks to slots in the state definitions that are reserved for join keys.
+with private tables. States can declare a relational mapping using the JPA standard.
 \item The platform provides a rich type system for the representation of things like dates, currencies, legal entities and
 financial entities such as cash, issuance, deals and so on.
-\item States can declare a relational mapping and can be queried using SQL.
-\item Integration with existing systems is considered from the start. The network can support rapid bulk data imports
-from other database systems without placing load on the network. Events on the ledger are exposed via an embedded JMS
-compatible message broker.
+\item The network can support rapid bulk data imports from other database systems without placing load on the network.
+Events on the ledger are exposed via an embedded JMS compatible message broker.
 \item States can declare scheduled events. For example a bond state may declare an automatic transition to an
 ``in default'' state if it is not repaid in time.
+\item Advanced privacy controls allow users to anonymise identities, and initial support is provided for running
+smart contracts inside memory spaces encrypted and protected by Intel SGX.
 \end{itemize}

 Corda follows a general philosophy of reusing existing proven software systems and infrastructure where possible.
@ -130,12 +126,13 @@ Comparisons with Bitcoin and Ethereum will be provided throughout.

 \section{Overview}

-Corda is a platform for the writing of ``CorDapps'': applications that extend the global database with new capabilities.
-Such apps define new data types, new inter-node protocol flows and the ``smart contracts'' that determine allowed changes.
+Corda is a platform for the writing and execution of ``CorDapps'': applications that extend the global database with new capabilities.
+Such apps define new data types, new inter-node protocol flows and the so-called ``smart contracts'' that determine
+allowed changes.

 What is a smart contract? That depends on the model of computation we are talking about. There are two competing
 computational models used in decentralised databases: the virtual computer model and the UTXO model. The virtual
-computer model is used by Ethereum\cite{Ethereum}. It models the database as the in-memory state of a
+computer model is used by Ethereum\cite{Ethereum} and Hyperledger Fabric. It models the database as the in-memory state of a
 global computer with a single thread of execution determined by the block chain. In the UTXO model, as used in
 Bitcoin, the database is a set of immutable rows keyed by \texttt{(hash:output index)}. Transactions define
 outputs that append new rows and inputs which consume existing rows. The term ``smart contract'' has a different
@ -165,15 +162,17 @@ The Corda transaction format has various other features which are described in l
 A Corda network consists of the following components:

 \begin{itemize}
-\item Nodes, communicating using AMQP/1.0 over TLS. Nodes use a relational database for data storage.
-\item A permissioning service that automates the process of provisioning TLS certificates.
-\item A network map service that publishes information about nodes on the network.
-\item One or more notary services. A notary may itself be distributed over multiple nodes.
+\item Nodes, operated by \emph{parties}, communicating using AMQP/1.0 over TLS.
+\item A \emph{doorman} service that grants parties permission to use the network by provisioning identity certificates.
+\item A network map service that publishes information about how to connect to nodes on the network.
+\item One or more notary services. A notary may itself be distributed over a coalition of different parties.
 \item Zero or more oracle services. An oracle is a well known service that signs transactions if they state a fact
 and that fact is considered to be true. They may also optionally provide the facts. This is how the ledger can be
 connected to the real world, despite being fully deterministic.
 \end{itemize}

+% TODO: Add section on zones and network parameters
+
 A purely in-memory implementation of the messaging subsystem is provided which can inject simulated latency between
 nodes and visualise communications between them. This can be useful for debugging, testing and educational purposes.

@ -191,39 +190,38 @@ arbitrary self-selected usernames. The permissioning service can implement any p
 identities it signs are globally unique. Thus an entirely anonymous Corda network is possible if a suitable
 IP obfuscation system like Tor\cite{Dingledine:2004:TSO:1251375.1251396} is also used.

-Whilst simple string identities are likely sufficient for some networks, the financial industry typically requires some
-level of \emph{know your customer} checking, and differentiation between different legal entities, branches and desks
+Whilst simple string identities are likely sufficient for some networks, industrial deployments typically require some
+level of identity verification, as well as differentiation between different legal entities, branches and desks
 that may share the same brand name. Corda reuses the standard PKIX infrastructure for connecting public keys to
-identities and thus names are actually X.500 names. When a single string is sufficient the \emph{common name} field can
-be used alone, similar to the web PKI. In more complex deployments the additional structure X.500 provides may be useful
-to differentiate between entities with the same name. For example there are at least five different companies called
-\emph{American Bank} and in the past there may have been more than 40 independent banks with that name.
+identities and thus names are actually X.500 names. Because legal names are unique only within a jurisdiction, the
+additional structure X.500 provides is useful to differentiate between entities with the same name. For example
+there are at least five different companies called \emph{American Bank} and in the past there may have been more
+than 40 independent banks with that name.

 More complex notions of identity that may attest to many time-varying attributes are not handled at this layer of the
-system: the base identity is always just an X.500 name. Note that even though messaging is always identified, transactions
-themselves may still contain anonymous public keys.
-
-% TODO: Currently the node only lets you pick the CN and the rest of the X.500 name is dummy data.
+system: the base identity is always just an X.500 name. Note that even though messaging is always identified, ledger data
+itself may contain only anonymised public keys.

 \subsection{The network map}

-Every network requires a network map service, which may itself be composed of multiple cooperating nodes. This is
-similar to Tor's concept of \emph{directory authorities}. The network map publishes the IP addresses through which
-every node on the network can be reached, along with the identity certificates of those nodes and the services they
-provide. On receiving a connection, nodes check that the connecting node is in the network map.
+Every network requires a network map. This is similar to Tor's concept of \emph{directory authorities}. The network
+map service publishes information about each node such as the set of IP addresses it listens on (multiple IP
+addresses are supported for failover and load balancing purposes), the version of the protocol it speaks, and which
+identity certificates it hosts. Each data structure describing a node is signed by the identity keys it claims to
+host. The network map service is therefore not trusted to specify node data correctly, only to distribute it.

-The network map abstracts the underlying IP addresses of the nodes from more useful business concepts like identities
-and services. Each participant on the network, called a \emph{party}, publishes one or more IP addresses in the
-network map. Equivalent domain names may be helpful for debugging but are not required. User interfaces and APIs
-always work in terms of identities -- there is thus no equivalent to Bitcoin's notion of an address (hashed public key),
-and user-facing applications rely on auto-completion and search rather than QRcodes to identify a logical recipient.
+The network map abstracts the underlying network locations of the nodes to more useful business concepts like
+identities and services. Domain names for the underlying IP addresses may be helpful for debugging but are not
+required. User interfaces and APIs always work in terms of identities -- there is thus no equivalent to Bitcoin's
+notion of an address (hashed public key), and user-facing applications rely on auto-completion and search rather
+than QR codes to identify a counterparty.

 It is possible to subscribe to network map changes and registering with the map is the first thing a node does at
-startup. Nodes may optionally advertise their nearest city for load balancing and network visualisation purposes.
+startup.

-The map is a document that may be cached and distributed throughout the network. The map is therefore not required
-to be highly available: if the map service becomes unreachable new nodes may not join the network and existing nodes
-may not change their advertised service set, but otherwise things continue as normal.
+The map is a set of files that may be cached and distributed via HTTP based content delivery networks. The
+underlying map infrastructure is therefore not required to be highly available: if the map service becomes
+unreachable, nodes cannot join the network or change their IP addresses, but otherwise things continue as normal.

 \subsection{Message delivery}

@ -237,31 +235,65 @@ setups and thus the message routing component of a node can be separated from th
 Being outside the firewall or in the firewall's `de-militarised zone' (DMZ) is required to ensure that nodes can
 connect to anyone on the network, and be connected to in turn. In this way a node can be split into multiple
 sub-services that do not have duplex connectivity yet can still take part in the network as first class citizens.
-Additionally, a single node may have multiple advertised IP addresses.

 The reference implementation provides this functionality using the Apache Artemis message broker, through which it
 obtains journalling, load balancing, flow control, high availability clustering, streaming of messages too large to fit
 in RAM and many other useful features. The network uses the \emph{AMQP/1.0}\cite{AMQP} protocol which is a widely
 implemented binary messaging standard, combined with TLS to secure messages in transit and authenticate the endpoints.

-\subsection{Serialization, sessioning, deduplication and signing}
+\subsection{Serialization}\label{subsec:serialization}

-All messages are encoded using a compact binary format. Each message has a UUID set in an AMQP header which is used
-as a deduplication key, thus accidentally redelivered messages will be ignored.
-
-% TODO: Describe the serialization format in more detail once finalised.
+All messages are encoded using an extended form of the AMQP/1.0 binary format (\emph{Advanced Message Queuing
+Protocol}\cite{AMQP}). Each message has a UUID set in an AMQP header which is used as a deduplication key, thus
+accidentally redelivered messages will be ignored.
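
The deduplication rule described above can be illustrated with a minimal Python sketch. It is not taken from the Corda code base: the class name `DeduplicatingReceiver` and its in-memory set of seen IDs are assumptions made for illustration, and a real node would need to persist the set so that it survives restarts.

```python
import uuid


class DeduplicatingReceiver:
    """Hypothetical receiver that ignores messages whose ID was already handled."""

    def __init__(self):
        self._seen = set()   # IDs of messages already processed (in memory only)
        self.delivered = []  # payloads handed on to the application

    def on_message(self, message_id, payload):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        self.delivered.append(payload)
        return True


receiver = DeduplicatingReceiver()
msg_id = uuid.uuid4()
first = receiver.on_message(msg_id, b"pay 1000")
second = receiver.on_message(msg_id, b"pay 1000")  # accidental redelivery
```

The second call is dropped because the UUID, not the payload, is the deduplication key.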

 Messages may also have an associated organising 64-bit \emph{session ID}. Note that this is distinct from the AMQP
-notion of a session. Sessions can be long lived and persist across node restarts and network outages. They exist in order
-to group messages that are part of a \emph{flow}, described in more detail below.
+notion of a session. Sessions can be long lived and persist across node restarts and network outages. They exist in
+order to group messages that are part of a \emph{flow}, described in more detail below.

-Messages that are successfully processed by a node generate a signed acknowledgement message called a `receipt'. Note that
-this is distinct from the unsigned acknowledgements that live at the AMQP level and which simply flag that a message was
-successfully downloaded over the wire. A receipt may be generated some time after the message is processed in the case
-where acknowledgements are being batched to amortise signing overhead, and the receipt identifies the message by the hash
-of its content. The purpose of the receipts is to give a node undeniable evidence that a counterparty received a
-notification that would stand up later in a dispute mediation process. Corda does not attempt to support deniable
-messaging.
+Corda uses AMQP and extends it with more advanced types and embedded binary schemas, such that all messages are
+self-describing. Because ledger data typically represents business agreements and data, it may persist for years and
+survive many upgrades and infrastructure changes. We require that data is always interpretable in strongly typed
+form, even if that data has been stored to a context-free location like a file, or the clipboard.
+
+Although based on AMQP, Corda's type system is fundamentally the Java type system. Java types are mapped to AMQP/1.0
+types whenever practical, but ledger data will frequently contain business types that the AMQP type system does not
+define. Fortunately, AMQP is extensible and supports standard concepts like polymorphism and interfaces, so it is
+straightforward to define a natural Java mapping. Type schemas are hashed to form a compact `fingerprint' that
+identifies the type. Fingerprints connect types to the embedded binary schemas that describe them and are also
+useful as cache keys. The AMQP type system and schema language support a form of annotations that we map to Java
+annotations.
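
As a rough illustration of schema fingerprinting, the Python sketch below hashes a canonical description of a type. The text does not specify the hash function or the canonical form, so the use of SHA-256 over sorted-key JSON, the `fingerprint` helper, and the `com.example.Cash` schemas are all assumptions.

```python
import hashlib
import json


def fingerprint(schema):
    """Hash a canonical encoding of the schema (assumed: SHA-256 over JSON)."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two hypothetical versions of the same business type.
cash_v1 = {"name": "com.example.Cash",
           "fields": [["amount", "long"], ["currency", "string"]]}
cash_v2 = {"name": "com.example.Cash",
           "fields": [["amount", "long"], ["currency", "string"], ["issuer", "string"]]}

# A fingerprint can serve as a cache key for the schema it identifies.
schema_cache = {fingerprint(cash_v1): cash_v1}
```

Because the encoding is canonical, the same schema always yields the same fingerprint, while any change to the field list yields a different one.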
+
+Object serialization frameworks must always consider security. Corda requires all types that may appear in
+serialized streams to mark themselves as safe for deserialization, and objects are only created via their
+constructors. Thus any data invariants that are enforced by constructors or setter methods are also enforced for
+deserialized data. Additionally, requests to deserialize an object specify the expected types. These two mechanisms
+block gadget-based attacks\cite{DeserialisingPickles}. Such attacks frequently affect any form of data
+deserialization regardless of format: they have been found not only in Java object serialization
+frameworks but also JSON and XML parsers. They occur when a deserialization framework may instantiate too large a
+space of types which were not written with malicious input in mind.
+
+The serialization framework supports advanced forms of data evolution. When a stream is deserialized Corda attempts
+to map it to the named Java classes. If those classes don't exactly match, a process called `evolution' is
+triggered, which automatically maps the data as smoothly as possible. For example, deserializing an old object will
+attempt to use a constructor that matches the serialized schema, allowing default values in new code to fill in the
+gaps. When old code reads data written by newer code, new fields will be discarded if safe to do so. Various forms of type
+adaptation are supported, and type-safe enums can have unknown values mapped to a default enum value as well.
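
The two directions of evolution can be sketched in Python as follows. This is a simplified model, not the framework's algorithm: the `evolve` function and the field names are hypothetical, records are plain dictionaries, and unknown fields are always dropped, whereas the text says they are discarded only when it is safe to do so.

```python
def evolve(serialized, known_fields, defaults):
    """Map a serialized record onto the fields the reading code knows about.

    Missing fields are filled from defaults; unknown fields are dropped.
    """
    result = {}
    for name in known_fields:
        if name in serialized:
            result[name] = serialized[name]
        elif name in defaults:
            result[name] = defaults[name]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return result


# New code reading old data: the missing field takes its default value.
old_data = {"amount": 100}
read_by_new = evolve(old_data, ["amount", "currency"], {"currency": "GBP"})

# Old code reading newer data: the unknown field is discarded.
new_data = {"amount": 100, "currency": "USD", "memo": "invoice 7"}
read_by_old = evolve(new_data, ["amount", "currency"], {})
```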
+
+If no suitable class is found at all, the framework performs \emph{class synthesis}. The embedded schema data will
+be used to generate the bytecode for a suitable holder type and load it into the JVM on the fly. These new classes
+will then be instantiated to hold the deserialized data. The new classes will implement any interfaces the schema is
+specified as supporting if those interfaces are found on the Java classpath. In this way the framework supports a
+form of generic programming. Tools can work with serialized data without having a copy of the app that generated it.
+The returned objects can be accessed either using reflection, or a simple interface that automates accessing
+properties by name and is just a friendlier way to access fields reflectively. Creating genuine object graphs like
+this is superior to the typical approach of defining a format-specific generic data holder type (XML's DOM
+\texttt{Element}, \texttt{JSONObject} etc) because there is already a large ecosystem of tools and technologies that
+know how to work with objects via reflection. Synthesised object graphs can be fed straight into JSON or YAML
+serializers to get back text, inserted into a scripting engine for usage with dynamic languages like JavaScript or
+Python, fed to JPA for database persistence and query or a Bean Validation engine for integrity checking, or even
+used to automatically generate GUIs using a toolkit like MetaWidget\cite{MetaWidget} or
+ReflectionUI\cite{ReflectionUI}.

 \section{Flow framework}\label{sec:flows}

@ -273,7 +305,7 @@ counterparty a shared transaction that spends that pot, with extra transactions
 other fails to terminate properly. Such protocols typically involve reliable private message passing, checkpointing to
 disk, signing of transactions, interaction with the p2p network, reporting progress to the user, maintaining a complex
 state machine with timeouts and error cases, and possibly interaction with internal systems on either side. All
-this can become quite involved. The implementation of Bitcoin payment channels in the bitcoinj library is approximately
+this can become quite involved. The implementation of payment channels in the \texttt{bitcoinj} library is approximately
 9000 lines of Java, very little of which involves cryptography.

 As another example, the core Bitcoin protocol only allows you to append transactions to the ledger. Transmitting other
@ -290,15 +322,15 @@ form of communication is global broadcast, in Corda \emph{all} communication tak
 called flows.

 The flow framework presents a programming model that looks to the developer as if they have the ability to run millions
-of long lived threads which can survive node restarts, and even node upgrades. APIs are provided to send and receive
-object graphs to and from other identities on the network, embed sub-flows, and report progress to observers. In this
-way business logic can be expressed at a very high level, with the details of making it reliable and efficient
-abstracted away. This is achieved with the following components.
+of long lived threads which can survive node restarts. APIs are provided to send and receive
+serialized object graphs to and from other identities on the network, embed sub-flows, handle version evolution and
+report progress to observers. In this way business logic can be expressed at a very high level, with the details of
+making it reliable and efficient abstracted away. This is achieved with the following components.

 \paragraph{Just-in-time state machine compiler.}Code that is written in a blocking manner typically cannot be stopped
 and transparently restarted later. The first time a flow's \texttt{call} method is invoked a bytecode-to-bytecode
 transformation occurs that rewrites the classes into a form that implements a resumable state machine. These state
-machines are sometimes called fibers or coroutines, and the transformation engine Corda uses (Quasar) is capable of rewriting
+machines are sometimes called coroutines, and the transformation engine Corda uses (Quasar) is capable of rewriting
 code arbitrarily deep in the stack on the fly. The developer may thus break his or her logic into multiple methods and
 classes, use loops, and generally structure their program as if it were executing in a single blocking thread. There's only a
 small list of things they should not do: sleeping, directly accessing the network APIs, or doing other tasks that might
@ -306,11 +338,11 @@ block outside of the framework.

 \paragraph{Transparent checkpointing.}When a flow wishes to wait for a message from another party (or input from a
 human being) the underlying stack frames are suspended onto the heap, then crawled and serialized into the node's
-underlying relational database using an object serialization framework. The written objects are prefixed with small
-schema definitions that allow some measure of portability across changes to the layout of objects, although
-portability across changes to the stack layout is left for future work. Flows are resumed and suspended on demand, meaning
-it is feasible to have far more flows active at once than would fit in memory. The checkpointing process is atomic with
-changes to local storage and acknowledgement of network messages.
+underlying relational database (however, the AMQP framework isn't used in this case). The written objects are prefixed
+with small schema definitions that allow some measure of portability across changes to the layout of objects,
+although portability across changes to the stack layout is left for future work. Flows are resumed and suspended on
+demand, meaning it is feasible to have far more flows active at once than would fit in memory. The checkpointing process
+is atomic with respect to changes to the database and acknowledgement of network messages.

 \paragraph{Identity to IP address mapping.}Flows are written in terms of identities. The framework takes care of routing
 messages to the right IP address for a given identity, following movements that may take place whilst the flow is active
@ -325,12 +357,13 @@ steps can have sub-trackers for invoked sub-flows.

 \paragraph{Flow hospital.}Flows can pause if they throw exceptions or explicitly request human assistance. A flow that
 has stopped appears in the \emph{flow hospital} where the node's administrator may decide to kill the flow or provide it
-with a solution. The ability to request manual solutions is useful for cases where the other side isn't sure why you
-are contacting them, for example, the specified reason for sending a payment is not recognised, or when the asset used for
-a payment is not considered acceptable.
+with a solution. Some flows that end up in the hospital will be retried automatically by the node itself, for example
+in case of database deadlocks that require a retry. The ability to request manual solutions is useful for cases
+where the other side isn't sure why you are contacting them, for example, the specified reason for sending a payment
+is not recognised, or when the asset used for a payment is not considered acceptable.

-Flows are named using reverse DNS notation and several are defined by the base protocol. Note that the framework is
-not required to implement the wire protocols, it is just a development aid.
+Flows are identified using Java class names, i.e. reverse DNS notation, and several are defined by the base protocol.
+Note that the framework is not required to implement the wire protocols, it is just a development aid.

 % TODO: Revisit this diagram once it matches the text more closely.
 %\begin{figure}[H]
@ -343,12 +376,13 @@ not required to implement the wire protocols, it is just a development aid.
 When a transaction is presented to a node as part of a flow it may need to be checked. Simply sending you
 a message saying that I am paying you \pounds1000 is only useful if you are sure I own the money I'm using to pay you.
 Checking transaction validity is the responsibility of the \texttt{ResolveTransactions} flow. This flow performs
-a breadth-first search over the transaction graph, downloading any missing transactions into local storage and
-validating them. The search bottoms out at the issuance transactions. A transaction is not considered valid if
-any of its transitive dependencies are invalid.
+a breadth-first search over the transaction graph, downloading any missing transactions into local storage from
+the counterparty, and validating them. The search bottoms out at the issuance transactions. A transaction is not
+considered valid if any of its transitive dependencies are invalid.

 It is required that a node be able to present the entire dependency graph for a transaction it is asking another
-node to accept. Thus there is never any confusion about where to find transaction data. Because transactions are
+node to accept. Thus there is never any confusion about where to find transaction data and there is never any
+need to reach out to dozens of nodes which may or may not be currently available. Because transactions are
 always communicated inside a flow, and flows embed the resolution flow, the necessary dependencies are fetched
 and checked automatically from the correct peer. Transactions propagate around the network lazily and there is
 no need for distributed hash tables.
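The resolution walk described above can be sketched as follows. This is a minimal illustration, not the \texttt{ResolveTransactions} implementation: the dictionary layout, the \texttt{fetch\_from\_counterparty} callable and the \texttt{is\_valid} callable are all hypothetical names chosen for the example.

```python
from collections import deque


def resolve_transactions(root_id, local_store, fetch_from_counterparty, is_valid):
    """Breadth-first walk of the dependency graph rooted at root_id.

    local_store: dict of tx id -> tx, where tx["inputs"] lists the ids of the
        transactions it depends on (hypothetical layout for this sketch).
    fetch_from_counterparty: callable returning the tx for an id we lack.
    is_valid: callable that checks a single transaction on its own.
    Returns True only if the root and every transitive dependency is valid.
    """
    queue = deque([root_id])
    seen = {root_id}
    while queue:
        tx_id = queue.popleft()
        if tx_id not in local_store:
            # Missing data always comes from the peer that presented the root.
            local_store[tx_id] = fetch_from_counterparty(tx_id)
        tx = local_store[tx_id]
        if not is_valid(tx):
            return False
        # Issuance transactions have no inputs, so the search bottoms out there.
        for dep in tx["inputs"]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return True
```

Because the only source of missing data is the counterparty, the walk never needs a network-wide lookup service.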
@ -393,39 +427,58 @@ or not can be identified by the identifier of the creating transaction and the i
 Transactions consist of the following components:

 \begin{labeling}{Input references}
-\item [Input references] These are \texttt{(hash, output index)} pairs that point to the states a
+\item [Consuming input references.] These are \texttt{(hash, output index)} pairs that point to the states a
 transaction is consuming.
-\item [Output states] Each state specifies the notary for the new state, the contract(s) that define its allowed
+\item [Output states.] Each state specifies the notary for the new state, the contract(s) that define its allowed
 transition functions and finally the data itself.
-\item [Attachments] Transactions specify an ordered list of zip file hashes. Each zip file may contain
-code, data, certificates or supporting documentation for the transaction. Contract code has access to the contents
-of the attachments when checking the transaction for validity.
-\item [Commands] There may be multiple allowed output states from any given input state. For instance
+\item [Non-consuming input references.] These are also \texttt{(hash, output index)} pairs, however these `reference
+states' are not consumed by the act of referencing them. Reference states are useful for importing data that gives
+context to other states but which is only changed from time to time. Note that the pointed-to state must be unconsumed
+at the time the transaction is notarised: if it's been consumed itself as part of a different transaction, the referencing
+transaction will not be notarised. In this way, non-consuming input references can help prevent the execution of
+transactions that rely on out-of-date reference data.
+\item [Attachments.] Transactions specify an ordered list of zip file hashes. Each zip file may contain
+code, data or supporting documentation for the transaction. Contract code has access to the contents
+of the attachments when checking the transaction for validity. Attachments have no concept of `spentness' and are useful
+for things like holiday calendars, timezone data, bytecode that defines the contract logic and state objects, and so on.
+\item [Commands.] There may be multiple allowed output states from any given input state. For instance
 an asset can be moved to a new owner on the ledger, or issued, or exited from the ledger if the asset has been
 redeemed by the owner and no longer needs to be tracked. A command is essentially a parameter to the contract
 that specifies more information than is obtainable from examination of the states by themselves (e.g. data from an oracle
-service). Each command has an associated list of public keys. Like states, commands are object graphs.
-\item [Signatures] The set of required signatures is equal to the union of the commands' public keys.
-\item [Type] Transactions can either be normal or notary-changing. The validation rules for each are
+service). Each command has an associated list of public keys. Like states, commands are object graphs. Commands therefore
+define what a transaction \emph{does} in a conveniently accessible form.
+\item [Signatures.] The set of required signatures is equal to the union of the commands' public keys. Signatures can use
+a variety of cipher suites, as Corda implements cryptographic agility.
+\item [Type.] Transactions can either be normal, notary-changing or explicit upgrades. The validation rules for each are
 different.
-\item [Timestamp] When present, a timestamp defines a time range in which the transaction is considered to
-have occurrred. This is discussed in more detail below.
-\item [Summaries] Textual summaries of what the transaction does, checked by the involved smart contracts. This field
-is useful for secure signing devices (see \cref{sec:secure-signing-devices}).
+\item [Timestamp.] When present, a timestamp defines a time range in which the transaction is considered to
+have occurred. This is discussed in more detail below.
+% \item [Network parameters.] Specifies the hash and epoch of the network parameters that were in force at the time the
+% transaction was notarised. See \cref{sec:network-params} for more details.
+% \item [Summaries] Textual summaries of what the transaction does, checked by the involved smart contracts. This field
+% is useful for secure signing devices (see \cref{sec:secure-signing-devices}).
 \end{labeling}
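The difference between consuming and non-consuming input references can be illustrated with a toy notary. This is a sketch under stated assumptions: the \texttt{spent} map and the \texttt{notarise} function are hypothetical and stand in for whatever uniqueness service a real notary uses.

```python
def notarise(spent, tx_id, inputs, references):
    """Toy uniqueness check for one transaction.

    spent: dict mapping a consumed (tx hash, output index) pair to the id of
        the transaction that consumed it (hypothetical structure).
    inputs: state references this transaction consumes.
    references: state references it reads without consuming.
    Returns True and records the inputs as spent, or False on a conflict.
    """
    # A consumed input is a double spend, unless this same transaction consumed it.
    if any(spent.get(ref, tx_id) != tx_id for ref in inputs):
        return False
    # Reference states must still be unconsumed, but referencing does not consume them.
    if any(ref in spent for ref in references):
        return False
    for ref in inputs:
        spent[ref] = tx_id
    return True
```

Any number of transactions can reference the same state until some transaction consumes it; after that, transactions relying on the stale reference are rejected.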

-% TODO: Update this once transaction types are separated.
 % TODO: This description ignores the participants field in states, because it probably needs a rethink.
-% TODO: Specify the elliptic curve used here once we finalise our choice.
 % TODO: Summaries aren't implemented.

-Signatures are appended to the end of a transaction and transactions are identified by the hash used for signing, so
-signature malleability is not a problem. There is never a need to identify a transaction including its accompanying
-signatures by hash. Signatures can be both checked and generated in parallel, and they are not directly exposed to
-contract code. Instead contracts check that the set of public keys specified by a command is appropriate, knowing that
-the transaction will not be valid unless every key listed in every command has a matching signature. Public key
-structures are themselves opaque. In this way algorithmic agility is retained: new signature algorithms can be deployed
-without adjusting the code of the smart contracts themselves.
+Transactions are identified by the root of a Merkle tree computed over the components. The transaction format is
+structured so that it's possible to deserialize some components but not others: a \emph{filtered transaction} is one
+in which only some components are retained (e.g. the inputs) and a Merkle branch is provided that proves the
+inclusion of those components in the original full transaction. We say these components have been `torn off'. This
+feature is particularly useful for keeping data private from notaries and oracles. See \cref{sec:tear-offs}.
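A minimal sketch of how a Merkle branch proves that a retained component belongs to the full transaction is given below. It assumes SHA-256, one leaf per serialized component and zero-hash padding up to a power of two; these are illustrative choices and the actual transaction format may differ in its hashing and padding details.

```python
import hashlib

ZERO = bytes(32)  # padding leaf; an assumption of this sketch


def sha256(data):
    return hashlib.sha256(data).digest()


def build_levels(components):
    """Returns every level of the tree: leaf hashes first, root level last."""
    level = [sha256(c) for c in components]
    size = 1
    while size < len(level):
        size *= 2
    level += [ZERO] * (size - len(level))
    levels = [level]
    while len(level) > 1:
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(components):
    return build_levels(components)[-1][0]


def branch(components, index):
    """Sibling hashes, leaf to root, proving components[index] is included."""
    path = []
    for level in build_levels(components)[:-1]:
        path.append(level[index ^ 1])  # the neighbour sharing the same parent
        index //= 2
    return path


def verify_branch(component, index, path, root):
    """Recomputes the root from one visible component and its branch."""
    node = sha256(component)
    for sibling in path:
        node = sha256(sibling + node) if index % 2 else sha256(node + sibling)
        index //= 2
    return node == root
```

A notary or oracle holding only the visible component, its position and the branch can confirm the transaction identifier without seeing the components that were torn off.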
+
+Signatures are appended to the end of a transaction. Thus signature malleability as seen in the Bitcoin protocol is
+not a problem. There is never a need to identify a transaction with its accompanying signatures by hash. Signatures
+can be both checked and generated in parallel, and they are not directly exposed to contract code. Instead contracts
+check that the set of public keys specified by a command is appropriate, knowing that the transaction will not be
+valid unless every key listed in every command has a matching signature. Public key structures are themselves
+opaque. In this way high performance through parallelism is possible and algorithmic agility is retained. New
+signature algorithms can be deployed without adjusting the code of the smart contracts themselves.
+
+This transaction structure is fairly complex relative to competing systems. The Corda data model is designed for
+richness, evolution over time and high performance. The cost of this is that transactions have more components than
+in simpler systems.

 \begin{figure}[H]
 \includegraphics[width=\textwidth]{cash}
@ -446,14 +499,14 @@ specified on the command(s) are those of the parties whose signatures would be r
 In this case, it means that the verify() function must check that the command has specified a key corresponding to the
 identity of the issuer of the cash state. The Corda framework is responsible for checking that the transaction has been
 signed by all keys listed by all commands in the transaction. In this way, a verify() function only needs to ensure that
-all parties who need to sign the transaction are specified in Commands, with the framework responsible for ensuring that
+all parties who need to sign the transaction are specified in commands, with the framework responsible for ensuring that
 the transaction has been signed by all parties listed in all commands.
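The framework-side check can be sketched as a set computation. The function name and the representation of commands as \texttt{(name, keys)} pairs are hypothetical, and plain strings stand in for public keys.

```python
def missing_signers(commands, signatures):
    """commands: list of (name, set of public keys) pairs.
    signatures: set of public keys for which a valid signature is present.
    Returns the keys that still need to sign; empty means fully signed."""
    required = set().union(*(keys for _, keys in commands))
    return required - signatures
```

The contract's verify() function decides which keys belong in each command; the framework only enforces that the resulting set is covered.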

 \subsection{Composite keys}\label{sec:composite-keys}

 The term ``public key'' in the description above actually refers to a \emph{composite key}. Composite keys are trees in
 which leaves are regular cryptographic public keys with accompanying algorithm identifiers. Nodes in the tree specify
-both the weights of each child and a threshold weight that must be met. The validty of a set of signatures can be
+both the weights of each child and a threshold weight that must be met. The validity of a set of signatures can be
 determined by walking the tree bottom-up, summing the weights of the keys that have a valid signature and comparing
 against the threshold. By using weights and thresholds a variety of conditions can be encoded, including boolean
 formulas with AND and OR.
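The evaluation rule can be sketched recursively. The tree encoding below (a string for a leaf key, a dictionary for an interior node) is an assumption made for the example and is not the platform's representation.

```python
def is_fulfilled(node, signed):
    """node: a leaf public key (str), or a dict of the form
        {"threshold": int, "children": [(weight, child), ...]}.
    signed: set of leaf keys that have a valid signature.
    Children are evaluated first, so the weights are summed bottom-up."""
    if isinstance(node, str):
        return node in signed
    total = sum(w for w, child in node["children"] if is_fulfilled(child, signed))
    return total >= node["threshold"]
```

With equal weights of 1, a threshold of 1 encodes OR and a threshold equal to the number of children encodes AND. Unequal weights express rules such as `one senior key, or any two junior keys'.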
@ -467,16 +520,18 @@ Composite keys are useful in multiple scenarios. For example, assets can be plac
 composite key where one leaf key is owned by a user, and the other by an independent risk analysis system. The
 risk analysis system refuses to sign if the transaction seems suspicious, like if too much value has been
 transferred in too short a time window. Another example involves encoding corporate structures into the key,
-allowing a CFO to sign a large transaction alone but his subordinates are required to work together. Composite keys
-are also useful for notaries. Each participant in a distributed notary is represented by a leaf, and the threshold
-is set such that some participants can be offline or refusing to sign yet the signature of the group is still valid.
+allowing a CFO to sign a large transaction alone while requiring subordinates to work together.
+
+Composite keys are also useful for byzantine fault tolerant notaries. Each participant in a distributed notary is
+represented by a leaf, and the threshold is set such that some participants can be offline or refusing to sign
+yet the signature of the group is still valid.

 Whilst there are threshold signature schemes in the literature that allow composite keys and signatures to be produced
 mathematically, we choose the less space efficient explicit form in order to allow a mixture of keys using different
 algorithms. In this way old algorithms can be phased out and new algorithms phased in without requiring all
 participants in a group to upgrade simultaneously.

-\subsection{Timestamps}\label{sec:timestamps}
+\subsection{Time handling}\label{sec:timestamps}

 Transaction timestamps specify a \texttt{[start, end]} time window within which the transaction is asserted to have
 occurred. Timestamps are expressed as windows because in a distributed system there is no true time, only a large number
@ -503,14 +558,18 @@ to a notary may be unpredictable if submission occurs right on a boundary of the
 perspective of all other observers the notary's signature is decisive: if the signature is present, the transaction
 is assumed to have occurred within that time.

-\paragraph{Reference clocks.}In order to allow for relatively tight time windows to be used when transactions are fully
-under the control of a single party, notaries are expected to be synchronised to the atomic clocks at the US Naval
-Observatory. Accurate feeds of this clock can be obtained from GPS satellites. Note that Corda uses the Java
-timeline\cite{JavaTimeScale} which is UTC with leap seconds spread over the last 1000 seconds of the day, thus each day
-always has exactly 86400 seconds. Care should be taken to ensure that changes in the GPS leap second counter are
-correctly smeared in order to stay synchronised with Java time. When setting a transaction time window care must be
-taken to account for network propagation delays between the user and the notary service, and messaging within the notary
-service.
+\paragraph{Reference clocks.}In order to allow for relatively tight time windows to be used when transactions are
+fully under the control of a single party, notaries are expected to be synchronised to International Atomic Time
+(TAI). Accurate feeds of this clock can be obtained from GPS satellites and long-wave radio. Note that Corda uses
+the Google/Amazon timeline, which is UTC with a leap smear from noon to noon across the leap event, thus each day
+always has exactly 86400 seconds.
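As a worked illustration of the smear, the sketch below assumes a linear smear of one inserted leap second over the 86400 smeared seconds from noon to noon. The function and constant names are hypothetical.

```python
from fractions import Fraction

SMEAR_WINDOW = 86400  # smeared seconds between the two noons


def smeared_seconds(si_elapsed, leap=1):
    """Reading of the smeared clock, in seconds since the noon that starts the
    smear, after si_elapsed SI seconds have passed.
    leap is +1 for an inserted leap second (linear smear assumed)."""
    si_window = SMEAR_WINDOW + leap
    if not 0 <= si_elapsed <= si_window:
        raise ValueError("outside the smear window")
    return Fraction(si_elapsed) * SMEAR_WINDOW / si_window
```

Under this assumption each smeared second lasts 86401/86400 SI seconds, roughly 11.6 microseconds longer than usual, and the clock never shows a 61st second.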
+
+\paragraph{Timezones.}Business agreements typically specify times in local time zones rather than offsets from
+midnight UTC on January 1st 1970, although the latter would be more civilised. Because the Corda type system is the
+Java type system, developers can embed \texttt{java.time.ZonedDateTime} in their states to represent a time
+specified in a specific time zone. This ensures correct handling of daylight saving transitions and timezone
+definition changes. Future versions of the platform will allow timezone data files to be attached to transactions,
+to make such calculations entirely deterministic.

 \subsection{Attachments and contract bytecodes}

@ -518,18 +577,26 @@ Transactions may have a number of \emph{attachments}, identified by the hash of
 and transmitted separately to transaction data and are fetched by the standard resolution flow only when the
 attachment has not previously been seen before.

-Attachments are always zip files\cite{ZipFormat} and cannot be referred to individually by contract code. The files
-within the zips are collapsed together into a single logical file system, with overlapping files being resolved in
-favour of the first mentioned. Not coincidentally, this is the mechanism used by Java classpaths.
+Attachments are always zip files\cite{ZipFormat}. The files within the zips are collapsed together into a single
+logical file system and class path.

-Smart contracts in Corda are defined using JVM bytecode as specified in \emph{``The Java Virtual Machine Specification SE 8 Edition''}\cite{JVM},
+Smart contracts in Corda are defined using a restricted form of JVM bytecode as specified in
+\emph{``The Java Virtual Machine Specification SE 8 Edition''}\cite{JVM},
 with some small differences that are described in a later section. A contract is simply a class that implements
 the \texttt{Contract} interface, which in turn exposes a single function called \texttt{verify}. The verify
 function is passed a transaction and either throws an exception if the transaction is considered to be invalid,
 or returns with no result if the transaction is valid. The set of verify functions to use is the union of the contracts
-specified by each state (which may be expressed as constraints, see \cref{sec:contract-constraints}). Embedding the
-JVM specification in the Corda specification enables developers to write code in a variety of languages, use well
-developed toolchains, and to reuse code already authored in Java or other JVM compatible languages.
+specified by each state, which are expressed as a class name combined with a \emph{constraint} (see \cref{sec:contract-constraints}).
+Embedding the JVM specification in the Corda specification enables developers to write code in a variety of
+languages, use well developed toolchains, and to reuse code already authored in Java or other JVM compatible languages.
+A good example of this feature in action is the ability to embed the ISDA Common Domain Model directly into CorDapps.
+The CDM is a large collection of types mapped to Java classes that model derivatives trading in a standardised way.
+It is common for industry groups to define such domain models and for them to have a Java mapping.
+
+Current versions of the platform only execute attachments that have been previously installed (and thus
+whitelisted), or attachments that are signed by the same signer as a previously installed attachment. Thus
+nodes may fail to reach consensus on long transaction chains that involve apps your counterparty has not seen.
+Future versions of the platform will run contract bytecode inside a deterministic JVM. See \cref{sec:djvm}.

 The Java standards also specify a comprehensive type system for expressing common business data. Time and calendar
 handling is provided by an implementation of the JSR 310 specification, decimal calculations can be performed either
@ -537,14 +604,13 @@ using portable (`\texttt{strictfp}') floating point arithmetic or the provided b
 libraries have been carefully engineered by the business Java community over a period of many years and it makes
 sense to build on this investment.

-Contract bytecode also defines the states themselves, which may be arbitrary object graphs. Because JVM classes
-are not a convenient form to work with from non-JVM platforms the allowed types are restricted and a standardised
-binary encoding scheme is provided. States may label their properties with a small set of standardised annotations.
-These can be useful for controlling how states are serialised to JSON and XML (using JSR 367 and JSR 222 respectively),
-for expressing static validation constraints (JSR 349) and for controlling how states are inserted into relational
-databases (JSR 338). This feature is discussed later.
+Contract bytecode also defines the states themselves, which may be directed acyclic object graphs. States may label
+their properties with a small set of standardised annotations. These can be useful for controlling how states are
+serialized to JSON and XML (using JSR 367 and JSR 222 respectively), for expressing static validation constraints
+(JSR 349) and for controlling how states are inserted into relational databases (JSR 338). This feature is discussed later.
+Future versions of the platform may additionally support cyclic object graphs.

-Attachments may also contain data files that support the contract code. These may be in the same zip as the
+\paragraph{Data files.}Attachments may also contain data files that support the contract code. These may be in the same zip as the
 bytecode files, or in a different zip that must be provided for the transaction to be valid. Examples of such
 data files might include currency definitions, timezone data and public holiday calendars. Any public data may
 be referenced in this way. Attachments are intended for data on the ledger that many parties may wish to reuse
@ -552,20 +618,97 @@ over and over again. Data files are accessed by contract code using the same API
 would be accessed. The platform imposes some restrictions on what kinds of data can be included in attachments
 along with size limits, to avoid people placing inappropriate files on the global ledger (videos, PowerPoints etc).

-% TODO: No such abuse limits are currently in place.
-
 Note that the creator of a transaction gets to choose which files are attached. Therefore, it is typical that
-states place constraints on the data they're willing to accept. Attachments \emph{provide} data but do not
-\emph{authenticate} it, so if there's a risk of someone providing bad data to gain an economic advantage
-there must be a constraints mechanism to prevent that from happening. This is rooted at the contract constraints
-encoded in the states themselves: a state can not only name a class that implements the \texttt{Contract}
-interface but also place constraints on the zip/jar file that provides it. That constraint can in turn be used to
-ensure that the contract checks the authenticity of the data -- either by checking the hash of the data directly,
-or by requiring the data to be signed by some trusted third party.
+states place constraints on the data they're willing to accept. These mechanisms are discussed in
+\cref{sec:contract-constraints}.

-% TODO: The code doesn't match this description yet.
+\paragraph{Signing.}Attachments may be signed using the JAR signing standard. No particular certificate is necessary
+for this: Corda accepts self-signed certificates for JARs. The signatures are useful for two purposes. Firstly, they
+allow states to express that they can be satisfied by any attachment signed by a particular provider. This allows
+on-ledger code to be upgraded over time. And secondly, signed JARs may provide classes in `\emph{claimed packages}',
+which are discussed below.

-\subsection{Hard forks, specifications and dispute resolution}
+\subsection{Contract constraints}\label{sec:contract-constraints}
+
+In Bitcoin contract logic is embedded inside every transaction. Programs are small and data is inlined into the
+bytecode, so upgrading code that's been added to the ledger is neither possible nor necessary. There's no need for a
+mechanism to tie code and data together. In Corda contract logic may be far more complex. It will usually reflect a
+changing business world which means it may need to be upgraded from time to time.
+
+The easiest way of tying states to the contract code that defines them is by hash. This is equivalent to other
+ledger platforms and is referred to as a \emph{hash constraint}. Hash constraints work well for very simple and stable
+programs, but more complicated contracts may need to be upgraded. In this case it may be preferable for states to
+refer to contracts by the identity of the signer (a \emph{signature constraint}). Because contracts are stored in
+zip files, and because a Java Archive (JAR) file is just a zip with some extra files inside, it is possible to use
+the standard JAR signing infrastructure to identify the source of contract code. Simple constraints such as ``any
+contract of this name signed by these keys'' allow for some upgrade flexibility, at the cost of increased exposure
+to rogue contract developers. Requiring combinations of signatures helps reduce the risk of a rogue or hacked
+developer publishing a bad contract version, at the cost of increased difficulty in releasing new versions. State
+creators may also specify third parties they wish to review contract code. Regardless of which set of tradeoffs is
+chosen, the framework can accommodate them.
+
+A contract constraint may use a composite key of the type described in~\cref{sec:composite-keys}. The standard JAR
+signing protocol allows for multiple signatures from different private keys, thus being able to satisfy composite
+keys. The allowed signing algorithms are \texttt{SHA256withRSA} and \texttt{SHA256withECDSA}. Note that the
+cryptographic algorithms used for code signing may not always be the same as those used for transaction signing, as
+for code signing we place initial focus on being able to re-use the infrastructure.
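The two constraint styles can be contrasted in a short sketch. The tuple encoding of a constraint is hypothetical, and the signature case is simplified to `all listed keys must have signed' rather than a full composite key.

```python
import hashlib


def satisfies(constraint, jar_bytes, jar_signers):
    """constraint: ("hash", hex digest) or ("signature", set of required keys).
    jar_bytes: raw bytes of the attachment.
    jar_signers: set of keys whose JAR signatures verified."""
    kind, value = constraint
    if kind == "hash":
        # Pins one exact version of the code: no upgrades possible.
        return hashlib.sha256(jar_bytes).hexdigest() == value
    if kind == "signature":
        # Accepts any version signed by every required key.
        return value <= jar_signers
    raise ValueError("unknown constraint kind: " + kind)
```

Listing more required keys in a signature constraint narrows who can publish an acceptable upgrade, which is the tradeoff between flexibility and exposure described above.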
+
+\subsection{Precise naming}\label{subsec:precise-naming}
+
+In any system that combines typed data with potentially malicious adversaries, it's important to always ensure
+names are not allowed to become ambiguous or mixed up. Corda achieves this via a combination of different
+features.
+
+\paragraph{No overlap rule.} Within a transaction attachments form a Java classpath. Class names are resolved
+by locating the defining class file within the set of attachments and loading it via the deterministic JVM.
+Unfortunately, out of the box Java allows different JAR files to define the same class name. Whichever JAR
+happens to come first on the classpath is the one that gets used, but conventionally a classpath is not meant
+to have an important ordering. This problem is a frequent source of confusion and bugs in Java software,
+especially when different versions of the same module are combined into one program. On the ledger an adversary
+can craft a malicious transaction that attempts to trick a node or application into thinking it does one thing
+whilst actually doing another. To prevent attackers from building deliberate classpath conflicts to change the
+behaviour of code, a transaction in which two file paths overlap between attachments is invalid. A small number
+of files that are expected to overlap normally, such as files in the \texttt{META-INF} directory, are excluded.
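A sketch of the no-overlap check follows. It works on lists of file paths rather than real zip files, and it assumes that directory entries and everything under \texttt{META-INF/} are the excluded paths; the exact exclusion list is not specified here.

```python
def overlapping_paths(attachments):
    """attachments: list of iterables of file paths, one iterable per zip.
    Returns the set of paths that appear in more than one attachment."""
    seen, clashes = set(), set()
    for paths in attachments:
        # set() so a path repeated inside one zip is not counted as a clash.
        for path in set(paths):
            if path.endswith("/") or path.startswith("META-INF/"):
                continue  # assumed exclusions for this sketch
            if path in seen:
                clashes.add(path)
            else:
                seen.add(path)
    return clashes
```

A transaction would be rejected whenever the returned set is non-empty, so the order of attachments can never change which class file is loaded.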
+
+\paragraph{Package namespace ownership.} Corda allows parts of the Java package namespace to be reserved for
+particular developers, identified by a public key (which may or may not be an identity on the node's zone).
+Any JAR that exports a class in an owned package namespace but which is not signed by the owning key is
+considered to be invalid. Reserving a package namespace is optional but can simplify the data model and make
+applications more secure.
+
+The reason for this is related to a mismatch between the way the ledger names code and the way programming
+languages do. In the distributed ledger world a bundle of code is referenced by hash or signing key, but in
+source code English-like module names are used. In the Java ecosystem these names are broken into components
+separated by dots, and there's a strong convention that names are chosen to start with the reversed domain
+name of the developer's website. For example a developer who works for MegaCorp may use
+\texttt{com.megacorp.superproduct.submodule} as a prefix for the names used in that specific product and
+submodule.
+
+However this is only a convention. Nothing prevents anyone from publishing code that uses MegaCorp's package
+namespace. Normally this isn't a problem as developers learn the correct names via some secure means, like
+browsing an encrypted website of good reputation. But on a distributed ledger data can be encountered which
+was crafted by a malicious adversary, usually a trading partner who hasn't been extensively verified or who
+has been compromised. Such an adversary could build a transaction with a custom state and attachment that
+defined classes with the same name as used by a real app. Whilst the core ledger can differentiate between the
+two applications, if data is serialized or otherwise exposed via APIs that rely on ordinary types and class
+names the hash or signer of the original attachment can easily get lost.
+
+For example, if a state is serialized to JSON at any point then \emph{any} type that has the same shape can
+appear legitimate. In Corda serialization types are ultimately identified by class name, as is true for all
+other forms of serialization. Thus deserializing data and assuming the data represents a state only reachable
+by the contract logic would be risky if the developer forgot to check that the original smart contract was the
+intended contract and not one violating the naming convention.
+
+By enforcing the Java naming convention cryptographically and elevating it to the status of a consensus rule,
+developers can assume that a \texttt{com.megacorp.superproduct.DealState} type always obeys the rules enforced
+by the smart contract published by that specific company. They cannot get confused by a mismatch between the
+human readable self-assigned name and the cryptographic but non-human readable hash or key based name the
+ledger really uses.
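The ownership rule can be sketched as a prefix check over the class names a JAR exports. The \texttt{owned} map and the use of strings for keys are assumptions made for the example.

```python
def violates_ownership(owned, jar_classes, jar_signers):
    """owned: dict of reserved package name -> owning public key.
    jar_classes: fully qualified names of the classes the JAR defines.
    jar_signers: set of keys that signed the JAR.
    Returns True if any class sits in a reserved package without the
    owner's signature on the JAR."""
    for name in jar_classes:
        for package, owner in owned.items():
            # Match on a whole package component, not a raw string prefix.
            if name.startswith(package + ".") and owner not in jar_signers:
                return True
    return False
```

Matching on \texttt{package + "."} means that reserving \texttt{com.megacorp} covers its sub-packages but does not capture an unrelated package such as \texttt{com.megacorpx}.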
+
+% TODO: Discuss confidential identities.
+% TODO: Discuss the crypto suites used in Corda.
+
+\subsection{Hard forks, bug fixes and dispute resolution}

 Decentralised ledger systems often differ in their underlying political ideology as well as their technical
 choices. The Ethereum project originally promised ``unstoppable apps'' which would implement ``code as law''. After
@ -574,7 +717,8 @@ as a hack at all given the lack of any non-code specification of what the progra
 eventually led to a split in the community.

 As Corda contracts are simply zip files, it is easy to include a PDF or other documents describing what a contract
-is meant to actually do. There is no requirement to use this mechanism, and there is no requirement that these
+is meant to actually do. A \texttt{@LegalProseReference} annotation is provided which by convention contains a URL
+or URI to a specification document. There is no requirement to use this mechanism, and there is no requirement that these
 documents have any legal weight. However in financial use cases it's expected that they would be legal contracts that
 take precedence over the software implementations in case of disagreement.

@ -582,68 +726,7 @@ It is technically possible to write a contract that cannot be upgraded. If such
 existed only on the ledger, like a cryptocurrency, then that would provide an approximation of ``code as law''. We
 leave discussion of the wisdom of this concept to political scientists and reddit.

-\paragraph{Platform logging}There is no direct equivalent in Corda of a block chain ``hard fork'', so the only solution
-to discarding buggy or fraudulent transaction chains would be to mutually agree out of band to discard an entire
-transaction subgraph. As there is no global visibility either this mutual agreement would not need to encompass all
-network participants: only those who may have received and processed such transactions. The flip side of lacking global
-visibility is that there is no single point that records who exactly has seen which transactions. Determining the set
-of entities that'd have to agree to discard a subgraph means correlating node activity logs. Corda nodes log sufficient
-information to ensure this correlation can take place. The platform defines a flow to assist with this, which can be
-used by anyone. A tool is provided that generates an ``investigation request'' and sends it to a seed node. The flow
-signals to the node administrator that a decision is required, and sufficient information is transmitted to the node to
-try and convince the administrator to take part (e.g. a signed court order). If the administrator accepts the request
-through the node explorer interface, the next hops in the transaction chain are returned. In this way the tool can
-semi-automatically crawl the network to find all parties that would be affected by a proposed rollback. The platform
-does not take a position on what types of transaction rollback are justified and provides only minimal support for
-implementing rollbacks beyond locating the parties that would have to agree.
-
-% TODO: DB logging of tx transmits is COR-544.
-
-Once involved parties are identified there are at least two strategies for editing the ledger. One is to extend
-the transaction chain with new transactions that simply correct the database to match the intended reality. For
-this to be possible the smart contract must have been written to allow arbitrary changes outside its normal
-business logic when a sufficient threshold of signatures is present. This strategy is simple and makes the most
-sense when the number of parties involved in a state is small and parties have no incentive to leave bad information
-in the ledger. For asset states that are the result of theft or fraud the only party involved in a state may
-resist attempts to patch things up in this way, as they may be able to benefit in the real world from the time
-lag between the ledger becoming inaccurate and it catching up with reality. In this case a more complex approach
-can be used in which the involved parties minus the uncooperative party agree to mark the relevant states as
-no longer consumed/spent. This is essentially a limited form of database rollback.
-
-\subsection{Identity lookups}\label{sec:identity-lookups}
-
-In all block chain inspired systems there exists a tension between wanting to know who you are dealing with and
-not wanting others to know. A standard technique is to use randomised public keys in the shared data, and keep
-the knowledge of the identity that key maps to private. For instance, it is considered good practice to generate
-a fresh key for every received payment. This technique exploits the fact that verifying the integrity of the ledger
-does not require knowing exactly who took part in the transactions, only that they followed the agreed upon
-rules of the system.
-
-Platforms such as Bitcoin and Ethereum have relatively ad-hoc mechanisms for linking identities and keys. Typically
-it is the user's responsibility to manually label public keys in their wallet software using knowledge gleaned from
-websites, shop signs and so on. Because these mechanisms are ad hoc and tedious many users don't bother, which
-can make it hard to figure out where money went later. It also complicates the deployment of secure signing devices
-and risk analysis engines. Bitcoin has BIP 70\cite{BIP70} which specifies a way of signing a ``payment
-request'' using X.509 certificates linked to the web PKI, giving a cryptographically secured and standardised way
-of knowing who you are dealing with. Identities in this system are the same as used in the web PKI: a domain name,
-email address or EV (extended validation) organisation name.
-
-Corda takes this concept further. States may define fields of type \texttt{Party}, which encapsulates an identity
-and a public key. When a state is deserialised from a transaction in its raw form, the identity field of the
-\texttt{Party} object is null and only the public (composite) key is present. If a transaction is deserialised
-in conjunction with X.509 certificate chains linking the transient public keys to long term identity keys the
-identity field is set. In this way a single data representation can be used for both the anonymised case, such
-as when validating dependencies of a transaction, and the identified case, such as when trading directly with
-a counterparty. Trading flows incorporate sub-flows to transmit certificates for the keys used, which are then
-stored in the local database. However the transaction resolution flow does not transmit such data, keeping the
-transactions in the chain of custody pseudonymous.
-
-\paragraph{Deterministic key derivation} Corda allows for but does not mandate the use of deterministic key
-derivation schemes such as BIP 32\cite{BIP32}. The infrastructure does not assume any mathematical relationship
-between public keys because some cryptographic schemes are not compatible with such systems. Thus we take the
-efficiency hit of always linking transient public keys to longer term keys with X.509 certificates.
-
-% TODO: Discuss the crypto suites used in Corda.
+% TODO: Rewrite the section on confidential identities and move it under a new privacy section.

 \subsection{Oracles and tear-offs}\label{sec:tear-offs}

@ -735,26 +818,6 @@ by index alone.

 % TODO: Interaction of encumbrances with notary change transactions.

-\subsection{Contract constraints}\label{sec:contract-constraints}
-
-The easiest way of tying states to the contract code that defines them is by hash. This works for very simple
-and stable programs, but more complicated contracts may need to be upgraded. In this case it may be preferable
-for states to refer to contracts by the identity of the signer. Because contracts are stored in zip files, and
-because a Java Archive (JAR) file is just a zip with some extra files inside, it is possible to use the standard
-JAR signing infrastructure to identify the source of contract code. Simple constraints such as ``any contract of
-this name signed by these keys'' allow for some upgrade flexibility, at the cost of increased exposure to rogue
-contract developers. Requiring combinations of signatures helps reduce the risk of a rogue or hacked developer
-publishing a bad contract version, at the cost of increased difficulty in releasing new versions. State creators
-may also specify third parties they wish to review contract code. Regardless of which set of tradeoffs is chosen,
-the framework can accommodate them.
-
-A contract constraint may use a composite key of the type described in \cref{sec:composite-keys}. The standard JAR
-signing protocol allows for multiple signatures from different private keys, thus being able to satisfy composite
-keys. The allowed signing algorithms are \texttt{SHA256withRSA} and \texttt{SHA256withECDSA}. Note that the
-cryptographic algorithms used for code signing may not always be the same as those used for transaction signing,
-as for code signing we place initial focus on being able to re-use the infrastructure.
-
-% TODO: Contract constraints aren't implemented yet so this design may change based on feedback.

 \subsection{Event scheduling}\label{sec:event-scheduling}

@ -1497,7 +1560,7 @@ a requirement.

 % TODO: Nothing related to data distribution groups is implemented.

-\section{Deterministic JVM}
+\section{Deterministic JVM}\label{sec:djvm}

 It is important that all nodes that process a transaction always agree on whether it is valid or not. Because
 transaction types are defined using JVM bytecode, this means the execution of that bytecode must be fully