= Specification Document Outline =
While we do not yet have a clear set of specification documents for Tahoe
(explaining the file formats, so that others can write interoperable
implementations), this document is intended to lay out an outline for what
these specs ought to contain. Think of this as the ISO 7-Layer Model for
Tahoe.
We currently imagine 4 documents.
== #1: Share Format, Encoding Algorithm ==
This document will describe the way that files are encrypted and encoded into
shares. It will include a specification of the share format, and explain both
the encoding and decoding algorithms. It will cover both mutable and
immutable files.
The immutable encoding algorithm, as described by this document, will start
with a plaintext series of bytes, encoding parameters "k" and "N", and either
an encryption key or a mechanism for deterministically deriving the key from
the plaintext (the CHK specification). The algorithm will end with a set of N
shares, and a set of values that must be included in the filecap to provide
confidentiality (the encryption key) and integrity (the UEB hash).
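As an illustration of the "deterministically deriving the key from the plaintext" option, here is a minimal sketch. The tag string, the use of SHA-256, and the 16-byte key length are assumptions made for this example only; the real CHK specification would define its own construction.

```python
import hashlib

KEY_BYTES = 16  # assumed key length for this sketch, not taken from the spec


def derive_convergent_key(plaintext: bytes) -> bytes:
    """Derive an encryption key from the plaintext alone.

    Hypothetical construction: SHA-256 over a made-up domain-separation
    tag followed by the plaintext, truncated to KEY_BYTES.
    """
    h = hashlib.sha256()
    h.update(b"sketch-chk-key-v1:")  # tag invented for this example
    h.update(plaintext)
    return h.digest()[:KEY_BYTES]
```

Because the key depends only on the plaintext, two uploads of the same bytes yield the same key, which is the property the encoding algorithm relies on when no explicit key is supplied.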
The immutable decoding algorithm will start with the filecap values (key and
UEB hash) and "k" shares. It will explain how to validate the shares against
the integrity information, how to reverse the erasure-coding, and how to
decrypt the resulting ciphertext. It will result in the original plaintext
bytes (or some subrange thereof).
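The validation step of decoding can be sketched as follows. This is a deliberately flattened stand-in: it assumes the integrity value in the filecap is a single hash over a list of per-share hashes, and it uses SHA-256 with invented tags. The actual share format is expected to be more elaborate; the function names here are hypothetical.

```python
import hashlib


def share_hash(share: bytes) -> bytes:
    # Hash of one share; the tag is made up for this sketch.
    return hashlib.sha256(b"sketch-share:" + share).digest()


def integrity_root(share_hashes: list[bytes]) -> bytes:
    # Stand-in for the integrity value carried in the filecap.
    return hashlib.sha256(b"sketch-root:" + b"".join(share_hashes)).digest()


def validate_shares(root: bytes, share_hashes: list[bytes],
                    shares: dict[int, bytes], k: int) -> dict[int, bytes]:
    """Return k shares that pass validation, or raise ValueError."""
    # First check the hash list itself against the value from the filecap.
    if integrity_root(share_hashes) != root:
        raise ValueError("share-hash list does not match the integrity value")
    good: dict[int, bytes] = {}
    for num, data in sorted(shares.items()):
        in_range = 0 <= num < len(share_hashes)
        if in_range and share_hash(data) == share_hashes[num]:
            good[num] = data
        if len(good) == k:
            return good
    raise ValueError("fewer than k valid shares available")
```

Only shares that pass this check would be handed to the erasure decoder; corrupted or substituted shares are skipped rather than trusted.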
The sections on mutable files will contain similar information.
This document is *not* responsible for explaining the filecap format, since
full filecaps may need to contain additional information as described in
document #3. Likewise, it is not responsible for explaining where to put the
generated shares or where to find them again later.
It is also not responsible for explaining the access control mechanisms
surrounding share upload, download, or modification ("Accounting" is the
business of controlling share upload to conserve space, and mutable file
shares require some sort of access control to prevent non-writecap holders
from destroying shares). We don't yet have a document dedicated to explaining
these, but let's call it "Access Control" for now.
== #2: Share Exchange Protocol ==
This document explains the wire-protocol used to upload, download, and modify
shares on the various storage servers.
Given the N shares created by the algorithm described in document #1, and a
set of servers who are willing to accept those shares, the protocols in this
document will be sufficient to get the shares onto the servers. Likewise,
given a set of servers who hold at least k shares, these protocols will be
enough to retrieve the shares necessary to begin the decoding process
described in document #1. The notion of a "storage index" is used to
reference a particular share: the storage index is generated by the encoding
process described in document #1.
This document does *not* describe how to identify or choose those servers;
rather, it explains what to do once they have been selected (by the mechanisms
in document #3).
This document also explains the protocols that a client uses to ask a server
whether or not it is willing to accept an uploaded share, and whether it has
a share available for download. These protocols will be used by the
mechanisms in document #3 to help decide where the shares should be placed.
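To make the shape of these operations concrete, here is a toy in-memory server. It is not the Foolscap protocol and says nothing about the wire format; the class name, method names, and the capacity-based acceptance rule are all hypothetical, chosen only to mirror the questions described above (will you accept this share, do you have shares, store, fetch).

```python
class ToyStorageServer:
    """In-memory stand-in for a storage server (illustrative only)."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        # Shares are keyed by (storage_index, share_number).
        self._shares: dict[tuple[bytes, int], bytes] = {}

    def used_bytes(self) -> int:
        return sum(len(data) for data in self._shares.values())

    def will_accept(self, storage_index: bytes, share_number: int, size: int) -> bool:
        # Assumed policy: refuse duplicates and anything that exceeds capacity.
        if (storage_index, share_number) in self._shares:
            return False
        return self.used_bytes() + size <= self.capacity_bytes

    def put_share(self, storage_index: bytes, share_number: int, data: bytes) -> None:
        if not self.will_accept(storage_index, share_number, len(data)):
            raise ValueError("server refused the share")
        self._shares[(storage_index, share_number)] = data

    def list_shares(self, storage_index: bytes) -> list[int]:
        # Answers "do you have a share available for download?"
        return sorted(num for (si, num) in self._shares if si == storage_index)

    def get_share(self, storage_index: bytes, share_number: int) -> bytes:
        # Raises KeyError if the share is not held.
        return self._shares[(storage_index, share_number)]
```

A server-selection algorithm (document #3) would call something like `will_accept` and `list_shares` while deciding where to place or find shares, and only then move share data.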
Where cryptographic mechanisms are necessary to implement access-control
policy, this document will explain those mechanisms.
In the future, Tahoe will be able to use multiple protocols to speak to
storage servers. There will be alternative forms of this document, one for
each protocol. The first one to be written will describe the Foolscap-based
protocol that Tahoe currently uses, but we anticipate a subsequent one to
describe a more HTTP-based protocol.
== #3: Server Selection Algorithm, filecap format ==
This document has two interrelated purposes. With a deeper understanding of
the issues, we may be able to separate these more cleanly in the future.
The first purpose is to explain the server selection algorithm. Given a set
of N shares, where should those shares be uploaded? Given some information
stored about a previously-uploaded file, how should a downloader locate and
recover at least k shares? Given a previously-uploaded mutable file, how
should a modifier locate all (or most of) the shares with a reasonable amount
of work?
This question implies many things, all of which should be explained in this
document:
* the notion of a "grid", nominally a set of servers who could potentially
  hold shares, which might change over time
* a way to configure which grid should be used
* a way to discover which servers are a part of that grid
* a way to decide which servers are reliable enough to be worth sending
  shares to
* an algorithm to handle servers which refuse shares
* a way for a downloader to locate which servers have shares
* a way to choose which shares should be used for download
The server-selection algorithm has several obviously competing goals:
* minimize the amount of work that must be done during upload
* minimize the total storage resources used
* avoid "hot spots", balance load among multiple servers
* maximize the chance that enough shares will be downloadable later, by
  uploading lots of shares, and by placing them on reliable servers
* minimize the work that the future downloader must do
* tolerate temporary server failures, permanent server departure, and new
  server insertions
* minimize the amount of information that must be added to the filecap
The server-selection algorithm is defined in some context: some set of
expectations about the servers or grid with which it is expected to operate.
Different algorithms are appropriate for different situations, so there will
be multiple alternatives of this document.
The first version of this document will describe the algorithm that the
current (1.3.0) release uses, which is heavily weighted towards the two main
use case scenarios for which Tahoe has been designed: the small, stable
friendnet, and the allmydata.com managed grid. In both cases, we assume that
the storage servers are online most of the time, they are uniformly highly
reliable, and that the set of servers does not change very rapidly. The
server-selection algorithm for this environment uses a permuted server list
to achieve load-balancing, uses all servers identically, and derives the
permutation key from the storage index to avoid adding a new field to the
filecap.
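The permuted-list idea can be sketched in a few lines. This is an assumption-laden illustration: it hashes the storage index together with each server identifier using SHA-256 and sorts by the result, and it places shares round-robin along that order. The exact hash, the identifier format, and the wrap-around placement rule are choices made for this example, not a description of the 1.3.0 code.

```python
import hashlib


def permuted_servers(storage_index: bytes, server_ids: list[bytes]) -> list[bytes]:
    """Order servers by a hash keyed on the storage index.

    Every client that knows the same server set and storage index
    computes the same order, so no extra filecap field is needed.
    """
    def sort_key(server_id: bytes) -> bytes:
        return hashlib.sha256(storage_index + server_id).digest()

    return sorted(server_ids, key=sort_key)


def place_shares(storage_index: bytes, server_ids: list[bytes], n: int) -> dict[int, bytes]:
    """Assign share numbers 0..n-1 to servers along the permuted list.

    Wrapping around when n exceeds the server count is an assumption
    of this sketch.
    """
    order = permuted_servers(storage_index, server_ids)
    if not order:
        raise ValueError("no servers available")
    return {share: order[share % len(order)] for share in range(n)}
```

Since different storage indexes produce different orderings, the first servers in the list vary from file to file, which is how the permutation spreads load. A downloader repeats the same computation and queries servers from the front of the list.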
An alternative algorithm could give clients more precise control over share
placement, for example for a user who wishes to make sure that k+1 shares are
located in each datacenter (to allow downloads to take place using only local
bandwidth). This algorithm could skip the permuted list and use other
mechanisms to accomplish load-balancing (or ignore the issue altogether). It
could add additional information to the filecap (like a list of which servers
received the shares) in lieu of performing a search at download time, perhaps
at the expense of a repairer's ability to move shares to a new server after
the initial upload. It might make up for this by storing "location hints"
next to each share, to indicate where other shares are likely to be found,
and obligating the repairer to update these hints.
The second purpose of this document is to explain the format of the file
capability string (or "filecap" for short). There are multiple kinds of
capabilities (read-write, read-only, verify-only, repaircap, lease-renewal
cap, traverse-only, etc). There are multiple ways to represent the filecap
(compressed binary, human-readable, clickable-HTTP-URL, "tahoe:" URL, etc),
but they must all contain enough information to reliably retrieve a file
(given some context, of course). It must at least contain the confidentiality
and integrity information from document #1 (i.e. the encryption key and the
UEB hash). It must also contain whatever additional information the
upload-time server-selection algorithm generated that will be required by the
downloader.
For some server-selection algorithms, the additional information will be
minimal. For example, the 1.3.0 release uses the hash of the encryption key
as a storage index, uses the storage index to permute the server list, and
uses an Introducer to learn the current list of servers. This allows a
"close-enough" list of servers to be compressed into a filecap field that is
already required anyway (the encryption key). It also adds k and N to the
filecap, to speed up the downloader's search (the downloader knows how many
shares it needs, so it can send out multiple queries in parallel).
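A sketch of a filecap carrying exactly these fields might look like the following. The "toycap:" string layout, the base32 encoding, the tag, and the truncation to 16 bytes are all invented for illustration and should not be read as the real filecap format; the point is only that the storage index is recomputed from the key rather than stored.

```python
import base64
import hashlib
from dataclasses import dataclass


def _b32(data: bytes) -> str:
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()


def _unb32(text: str) -> bytes:
    # Restore the padding that _b32 stripped before decoding.
    return base64.b32decode(text.upper() + "=" * (-len(text) % 8))


@dataclass(frozen=True)
class ToyFilecap:
    key: bytes       # confidentiality: the encryption key
    ueb_hash: bytes  # integrity: the UEB hash
    k: int           # shares needed
    n: int           # shares produced

    def storage_index(self) -> bytes:
        # Derived from the key, so it needs no field of its own.
        return hashlib.sha256(b"sketch-storage-index:" + self.key).digest()[:16]

    def to_string(self) -> str:
        return f"toycap:{_b32(self.key)}:{_b32(self.ueb_hash)}:{self.k}:{self.n}"

    @classmethod
    def from_string(cls, text: str) -> "ToyFilecap":
        prefix, key, ueb_hash, k, n = text.split(":")
        if prefix != "toycap":
            raise ValueError("not a toy filecap")
        return cls(_unb32(key), _unb32(ueb_hash), int(k), int(n))
```

A downloader holding such a string can recover the storage index, rebuild the permuted server list, and, because it knows k, send out that many queries in parallel.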
But other server-selection algorithms might require more information. Each
variant of this document will explain how to encode that additional
information into the filecap, and how to extract and use that information at
download time.
These two purposes are interrelated. A filecap that is interpreted in the
context of the allmydata.com commercial grid, which uses tahoe-1.3.0, implies
a specific server-selection algorithm, a specific Introducer, and therefore a
fairly-specific set of servers to query for shares. A filecap which is meant
to be interpreted on a different sort of grid would need different
information.
Some filecap formats can be designed to contain more information (and depend
less upon context), such as the way an HTTP URL implies the existence of a
single global DNS system. Ideally a tahoe filecap should be able to specify
which "grid" it lives in, with enough information to allow a compatible
implementation of Tahoe to locate that grid and retrieve the file (regardless
of which server-selection algorithm was used for upload).
This more-universal format might come at the expense of reliability, however.
Tahoe-1.3.0 filecaps do not contain hostnames, because the failure of DNS or
an individual host might then impact file availability (although the
Introducer does contain DNS names or IP addresses).
== #4: Directory Format ==
Tahoe directories are a special way of interpreting and managing the contents
of a file (either mutable or immutable). These "dirnode" files are basically
serialized tables that map child name to filecap/dircap. This document
describes the format of these files.
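As a rough picture of such a serialized table, the sketch below stores a mapping from child name to cap string as UTF-8 JSON. JSON is used here only because it is convenient for an example; it is an assumption and not the dirnode format, and the cap strings are opaque placeholders.

```python
import json


def serialize_dirnode(children: dict[str, str]) -> bytes:
    """Serialize a child-name -> cap-string table to bytes."""
    # sort_keys makes the output deterministic for a given table.
    return json.dumps(children, sort_keys=True, ensure_ascii=False).encode("utf-8")


def parse_dirnode(data: bytes) -> dict[str, str]:
    """Parse bytes produced by serialize_dirnode, rejecting other shapes."""
    table = json.loads(data.decode("utf-8"))
    if not isinstance(table, dict):
        raise ValueError("dirnode must be a table")
    if not all(isinstance(cap, str) for cap in table.values()):
        raise ValueError("every child must map to a cap string")
    return table
```

The resulting bytes would then be stored as the contents of an ordinary mutable or immutable file, which is what makes a directory "a special way of interpreting" a file.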
Tahoe-1.3.0 directories are "transitively readonly", which is accomplished by
applying an additional layer of encryption to the list of child writecaps.
The key for this encryption is derived from the containing file's writecap.
This document must explain how to derive this key and apply it to the
appropriate portion of the table.
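The key-derivation idea can be sketched as below. Everything specific here is an assumption: the tag, the use of SHA-256, and especially the cipher, which is a toy hash-based XOR keystream chosen so the example runs with the standard library alone. It provides no authentication and is not suitable for real use; the actual specification would name a real cipher.

```python
import hashlib
import os

NONCE_BYTES = 16  # per-entry nonce length, chosen for this sketch


def derive_writecap_key(dir_writecap: bytes) -> bytes:
    # Only holders of the directory's writecap can compute this key.
    return hashlib.sha256(b"sketch-dirnode-writekey:" + dir_writecap).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 in counter mode. NOT a vetted cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def seal_child_writecap(dir_writecap: bytes, child_writecap: bytes) -> bytes:
    """Encrypt one child writecap for storage in the directory table."""
    nonce = os.urandom(NONCE_BYTES)
    stream = _keystream(derive_writecap_key(dir_writecap), nonce, len(child_writecap))
    return nonce + bytes(a ^ b for a, b in zip(child_writecap, stream))


def open_child_writecap(dir_writecap: bytes, sealed: bytes) -> bytes:
    """Recover a child writecap; requires the directory's writecap."""
    nonce, body = sealed[:NONCE_BYTES], sealed[NONCE_BYTES:]
    stream = _keystream(derive_writecap_key(dir_writecap), nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))
```

A reader holding only the directory's readcap can parse the table and see child readcaps, but cannot derive this key, so the child writecaps stay opaque. That is what makes read-only access transitive.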
Future versions of the directory format are expected to contain
"deep-traversal caps", which allow verification/repair of files without
exposing their plaintext to the repair agent. This document will be
responsible for explaining traversal caps too.
Future versions of the directory format will probably contain an index and
more advanced data structures (for efficiency and fast lookups), instead of a
simple flat list of (childname, childcap). This document will also need to
describe metadata formats, including what access-control policies are defined
for the metadata.