Delete the notes directory, it is long since obsoleted by the wiki and the docs site.

This commit is contained in:
Mike Hearn 2015-12-15 17:19:54 +01:00
parent e3cfe0ae49
commit d6cdc8b8de
3 changed files with 0 additions and 233 deletions

View File

@ -1,195 +0,0 @@
General design scratchpad
Do we need blocks at all? Blocks are an artifact of proof-of-work, which isn't acceptable on private block chains
due to the excessive energy usage, unclear incentives model and so on. They're also very useful for SPV operation,
but we have no such requirements here.
Possible alternative, blend of ideas from:
* Google Spanner
* Hawk
* Bitcoin
* Ethereum
* Intel/TCG
+ some of my own ideas
# Blockless operation
* A set of timestampers are set up around the world with clocks synchronised to GPS time (the most accurate clock
available as it's constantly recalibrated against the US Naval Observatory atomic clock). Public timestampers
are available already and can be easily used in the prototyping phase, but as they're intended for low traffic
applications eventually we'd want our own.
There is a standard protocol for timestamp servers (RFC 3161). It appears to include everything that we might want
and little more, i.e. it's a good place to start. A more modern version of it with the same features can be easily
generated later.
* All transactions submitted to the global network must be timestamped by a recognised TSP (i.e. signed by a root cert
owned by R3).
* Transactions are ordered according to these timestamps. They are assumed to be precise enough that conflicts where
two transactions have actually equal times can ~never happen: a trivial resolution algorithm (e.g. based on whichever
hash is lower) can be used in case that ever happens by fluke.
* If need be, clock uncertainty can be measured and overlapping intervals can result in conflict/reject states, as in
Spanner's TrueTime. The timestamping protocol (RFC 3161) exposes clock uncertainty.
* Transactions are timestamped as a group. This ensures that if multiple transactions are needed to carry out a trade,
individual transactions cannot be extracted by a malicious party and executed independently as the original bundle
will always be able to win, when broadcast.
* Nodes listen to a broadcast feed of timestamped transactions. They know how to roll back transactions and replay
new ones in case of conflict, but this is done independent of any block construct.
* Nodes that are catching up simply download all the transactions from peers that occur after the time they shut down.
They can be sure they didn't miss any by asking peers to calculate a UTXO set summary at a given time and then
verifying it against their own local calculations (this is slow, but shouldn't normally flag any issues so it can
be done asynchronously).
* Individual transactions/UTXOs can specify time bounds, e.g. "30 seconds". A node compares a transaction timestamp
to its own local clock and applies the specified bound to the local clock: if the transaction is out of bounds and
the node isn't catching up, then it is dropped. This prevents people timestamping a malicious transaction X and
keeping it private, then broadcasting a publicly timestamped transaction Y, then overriding Y with X long after the
underlying trade has become irreversible. Because time bounds are specified on a _per transaction_ basis, it is
arbitrarily controllable: traders that want very, very fast clearing can specify a small time boundary and it's up
to them to ensure their own systems are capable of getting an accurate trusted timestamp and broadcasting it within
that tight bound. Traders that care less, e.g. because the trade represents physical movement of real goods, can use
a much larger time bound and get more robustness against transient network/hardware hiccups.
* Like in Ethereum, transactions can update stored state (contracts? receipts? what is the right term?)
This can be called transaction-chains. All transactions are public.
For political expedience, we may wish to impose a (not strictly necessary) block quantisation anyway, so the popular
term 'block chain' can be applied and also for auditing/reporting convenience.
# Privacy
* Transactions can have two halves: the public side and the private side. The public side is a "normal" transaction that
includes a program of sufficient power to verify various kinds of signatures and proofs. The optional private side
is an arbitrary program which is executed by a third party. Various techniques are used to lower the trust required
in the third parties. We can call these notaries.
* It's up to the contract designer to decide how much they rely on notaries - if at all. They are technically not
required at all: the system would work (and scale) without them. But they can be used to improve privacy.
* Simplest "dummy" notary is just a machine that signs the output of the program to state it ran it properly. The notary
is trusted to execute the program correctly and privately. The signature is checked by the public side. This allows
traders to perform e.g. a Dutch auction with only the final results being reflected on the public network.
* Next best is an SGX based notary. This can provide both privacy and assurance that the code is executed correctly,
assuming Intel is trustworthy. Note: it's a safe assumption that if R3 becomes very popular with financial networks,
intelligence agencies will attempt to gain covert access to it given the NSA/GCHQ hacking of Western Union and clear
interest in SWIFT data. Thus care must be used to ensure the (entirely unprovable) SGX computers are not interdicted
during delivery.
* In addition, zero knowledge proofs can be considered as a supplement to SGX. They can give extra assurance against
corrupted notaries calculating incorrect results. However, unlike SGX, they cannot reduce the amount of information
the notary sees, and thus they are strictly a "backup". In addition they have _severe_ caveats, in particular, a
complex and expensive setup phase that must be executed for each contract (in fact for each version of each contract),
and execution of the private side is extremely slow.
This makes them suitable only for contracts that are basically finalised and in which the highest levels of assurance
are required, and fast or frequent trading is not required. The technology may well improve over time.
* In some cases homomorphic encryption could be used as a privacy supplement to SGX.
# Scaling
* Global broadcast systems are frequently attacked for 'not scaling'. But this is an absolute statement in a world of
tradeoffs: technically speaking the NASDAQ is a broadcast system as you can subscribe to data feeds via e.g. OPRA.
Some of these feeds can reach millions of messages per second. Nonetheless, financial firms are capable of digesting
them without issue. Even the largest feeds have finite traffic and predictable growth patterns.
* We can assume powerful hardware, as the primary users of this system would be financial institutions. There is no
requirement to run on people's laptops, outside of testing/devnet scenarios. For instance it's safe to assume SSD
based storage: we can simply tell institutions that want to get on the network to buy a proper server.
* There is no requirement for lightweight/mobile clients, unlike in Bitcoin.
* Transaction checking is highly parallelisable.
* Therefore, as long as transactions are kept computationally cheap, there should be no problem reaching even very high
levels of traffic.
Conclusion: scaling in a Bitcoin style manner should not be a problem, even if high level languages like Java or Kotlin
are in use.
# Programmability
* The public side of a transaction must use a globally agreed execution environment, like the EVM is for Ethereum.
The private sides can run anything: as the public side checks a proof of execution of the private side, there is
no requirement that the private side use any particular language or runtime.
* Inventing a custom VM and language doesn't make sense: there is only one special requirement that is different
to most VMs and that's the ability to impose hard CPU usage limits. But existing VMs can be extended to deliver
this functionality much more easily than entirely new VMs+languages can be created.
* For prototyping and possibly for production use, we should use the JVM:
* Sandboxing already available, easy to use
* Several languages available, developers are familiar
* If host environment also runs on the JVM, no time wasted on interop issues, see the Ethereum ABI issues
* HotSpot already has a CPU/memory tracking API and can interrupt threads (but lacks the ability to hard shut down
malicious code)
* Code annotations can be used to customise whatever languages are used for contract-specific use cases.
* Can be forced to run in interpreted mode at first, but if we need the extra performance later due to high traffic
the JIT compiler will automatically make contract code fast.
* Has industrial strength debugging/monitoring tools.
* Banks are already deeply familiar with it.
# Transaction design
Use a vaguely bitcoin-like design with "states" which are consumed and generated by "contracts" (programs). Everyone
runs the same programs simultaneously in order to verify state transitions. Transactions consist of input states,
output states and "commands" which represent signed auxiliary inputs to the transitions.
------
# Useful technologies
FIX SBE is a very (very) efficient binary encoding designed for HFT:
http://real-logic.github.io/simple-binary-encoding/
It's mostly analogous to protocol buffers but imposes some additional constraints and has an uglier API, in return for
much higher performance. It probably isn't useful during the prototyping phase. But it may be a useful optimisation
later.
CopyCat is an implementation of Raft (similar to Paxos), as an embeddable framework. Raft/Paxos type algorithms are not
suitable as the basis for a global distributed ledger due to tiny throughput, but may be useful as a subcomponent of
other algorithms. For instance possibly a multi-step contract protocol could use Raft/Paxos between a limited number of
counterparties to synchronise changes.
http://kuujo.github.io/copycat/user-manual/introduction/
------
# Prototyping
Stream 1:
1. Implement a simple star topology for message routing (full p2p can come later). Ensure it's got a clean modular API.
2. Implement a simple chat app on top of it. This will be useful later for sending commands to faucets, bots, etc.
3. Add auto-update
4. Design a basic transaction/transaction bundle abstraction and implement timestamping of the bundles. Make chat lines
into "transactions", so they are digitally signed and timestamped properly.
5. Implement detection of conflicts and rollbacks.
Stream 2: Design straw-man contracts and data structures (in Java or Kotlin) for
1. payments
2. simplified bond auctions
3. maybe a CDS

View File

@ -1,19 +0,0 @@
# Simple payment
CashState:
- Issuing institution
- Deposit reference (pointer into internal ledger)
- Currency code
- Claim size (initial state = size of original deposit)
- Public key of current owner
ExitCashState:
- Amount to reduce claim size by
- Signature signed by ownerPubKey
State transition function (contract):
1. If input states contains an ExitCashState, set reduceByAmount=state.amount
1. For all proposed output states, they must all be instances of CashState
For all proposed input states, they must all be instances of CashState
2. Sum claim sizes in all predecessor states. Sum claim sizes in all successor states
3. Accept if outputSum == inputSum - reduceByAmount

View File

@ -1,19 +0,0 @@
How to represent pointers to states in the type system? Opaque or exposed as hashes?
# Create states vs check states?
1. Derive output states entirely from input states + signed commands, *or*
2. Be given the output states and check they're valid
The advantage of 1 is that it feels safer: you can't forget to check something in the output state by accident. On
the other hand, then it's up to the platform to validate equality between the states (probably by serializing them
and comparing bit strings), and that would make unit testing harder as the generic machinery can't give good error
messages for a given mismatch. Also it means you can't do an equivalent of OP_RETURN and insert extra no-op states
in the output list that are ignored by all the input contracts. Does that matter if extensibility/tagging is built in
more elegantly? Is it better to prevent this for the usual spam reasons?
The advantage of 2 is that it seems somehow more extensible: old contracts would ignore fields added to new states if
they didn't understand them (or is that a disadvantage?)
# What precisely is signed at each point?