mirror of
https://github.com/tahoe-lafs/tahoe-lafs.git
synced 2025-01-18 18:56:28 +00:00
docs/specifications: add an outline of the spec documents we'd like to have some day
This commit is contained in:
parent
d8e00c37b5
commit
bbef104315
210
docs/specifications/outline.txt
Normal file
210
docs/specifications/outline.txt
Normal file
@ -0,0 +1,210 @@
|
||||
= Specification Document Outline =
|
||||
|
||||
While we do not yet have a clear set of specification documents for Tahoe
|
||||
(explaining the file formats, so that others can write interoperable
|
||||
implementations), this document is intended to lay out an outline for what
|
||||
these specs ought to contain. Think of this as the ISO 7-Layer Model for
|
||||
Tahoe.
|
||||
|
||||
We currently imagine 4 documents.
|
||||
|
||||
== #1: Share Format, Encoding Algorithm ==
|
||||
|
||||
This document will describe the way that files are encrypted and encoded into
|
||||
shares. It will include a specification of the share format, and explain both
|
||||
the encoding and decoding algorithms. It will cover both mutable and
|
||||
immutable files.
|
||||
|
||||
The immutable encoding algorithm, as described by this document, will start
|
||||
with a plaintext series of bytes, encoding parameters "k" and "N", and either
|
||||
an encryption key or a mechanism for deterministically deriving the key from
|
||||
the plaintext (the CHK specification). The algorithm will end with a set of N
|
||||
shares, and a set of values that must be included in the filecap to provide
|
||||
confidentiality (the encryption key) and integrity (the UEB hash).
|
||||
|
||||
The immutable decoding algorithm will start with the filecap values (key and
|
||||
UEB hash) and "k" shares. It will explain how to validate the shares against
|
||||
the integrity information, how to reverse the erasure-coding, and how to
|
||||
decrypt the resulting ciphertext. It will result in the original plaintext
|
||||
bytes (or some subrange thereof).
|
||||
|
||||
The sections on mutable files will contain similar information.
|
||||
|
||||
This document is *not* responsible for explaining the filecap format, since
|
||||
full filecaps may need to contain additional information as described in
|
||||
document #3. Likewise it it not responsible for explaining where to put the
|
||||
generated shares or where to find them again later.
|
||||
|
||||
It is also not responsible for explaining the access control mechanisms
|
||||
surrounding share upload, download, or modification ("Accounting" is the
|
||||
business of controlling share upload to conserve space, and mutable file
|
||||
shares require some sort of access control to prevent non-writecap holders
|
||||
from destroying shares). We don't yet have a document dedicated to explaining
|
||||
these, but let's call it "Access Control" for now.
|
||||
|
||||
|
||||
== #2: Share Exchange Protocol ==
|
||||
|
||||
This document explains the wire-protocol used to upload, download, and modify
|
||||
shares on the various storage servers.
|
||||
|
||||
Given the N shares created by the algorithm described in document #1, and a
|
||||
set of servers who are willing to accept those shares, the protocols in this
|
||||
document will be sufficient to get the shares onto the servers. Likewise,
|
||||
given a set of servers who hold at least k shares, these protocols will be
|
||||
enough to retrieve the shares necessary to begin the decoding process
|
||||
described in document #1. The notion of a "storage index" is used to
|
||||
reference a particular share: the storage index is generated by the encoding
|
||||
process described in document #1.
|
||||
|
||||
This document does *not* describe how to identify or choose those servers,
|
||||
rather it explains what to do once they have been selected (by the mechanisms
|
||||
in document #3).
|
||||
|
||||
This document also explains the protocols that a client uses to ask a server
|
||||
whether or not it is willing to accept an uploaded share, and whether it has
|
||||
a share available for download. These protocols will be used by the
|
||||
mechanisms in document #3 to help decide where the shares should be placed.
|
||||
|
||||
Where cryptographic mechanisms are necessary to implement access-control
|
||||
policy, this document will explain those mechanisms.
|
||||
|
||||
In the future, Tahoe will be able to use multiple protocols to speak to
|
||||
storage servers. There will be alternative forms of this document, one for
|
||||
each protocol. The first one to be written will describe the Foolscap-based
|
||||
protocol that tahoe currently uses, but we anticipate a subsequent one to
|
||||
describe a more HTTP-based protocol.
|
||||
|
||||
== #3: Server Selection Algorithm, filecap format ==
|
||||
|
||||
This document has two interrelated purposes. With a deeper understanding of
|
||||
the issues, we may be able to separate these more cleanly in the future.
|
||||
|
||||
The first purpose is to explain the server selection algorithm. Given a set
|
||||
of N shares, where should those shares be uploaded? Given some information
|
||||
stored about a previously-uploaded file, how should a downloader locate and
|
||||
recover at least k shares? Given a previously-uploaded mutable file, how
|
||||
should a modifier locate all (or most of) the shares with a reasonable amount
|
||||
of work?
|
||||
|
||||
This question implies many things, all of which should be explained in this
|
||||
document:
|
||||
|
||||
* the notion of a "grid", nominally a set of servers who could potentially
|
||||
hold shares, which might change over time
|
||||
* a way to configure which grid should be used
|
||||
* a way to discover which servers are a part of that grid
|
||||
* a way to decide which servers are reliable enough to be worth sending
|
||||
shares
|
||||
* an algorithm to handle servers which refuse shares
|
||||
* a way for a downloader to locate which servers have shares
|
||||
* a way to choose which shares should be used for download
|
||||
|
||||
The server-selection algorithm has several obviously competing goals:
|
||||
|
||||
* minimize the amount of work that must be done during upload
|
||||
* minimize the total storage resources used
|
||||
* avoid "hot spots", balance load among multiple servers
|
||||
* maximize the chance that enough shares will be downloadable later, by
|
||||
uploading lots of shares, and by placing them on reliable servers
|
||||
* minimize the work that the future downloader must do
|
||||
* tolerate temporary server failures, permanent server departure, and new
|
||||
server insertions
|
||||
* minimize the amount of information that must be added to the filecap
|
||||
|
||||
The server-selection algorithm is defined in some context: some set of
|
||||
expectations about the servers or grid with which it is expected to operate.
|
||||
Different algorithms are appropriate for different situtations, so there will
|
||||
be multiple alternatives of this document.
|
||||
|
||||
The first version of this document will describe the algorithm that the
|
||||
current (1.3.0) release uses, which is heavily weighted towards the two main
|
||||
use case scenarios for which Tahoe has been designed: the small, stable
|
||||
friendnet, and the allmydata.com managed grid. In both cases, we assume that
|
||||
the storage servers are online most of the time, they are uniformly highly
|
||||
reliable, and that the set of servers does not change very rapidly. The
|
||||
server-selection algorithm for this environment uses a permuted server list
|
||||
to achieve load-balancing, uses all servers identically, and derives the
|
||||
permutation key from the storage index to avoid adding a new field to the
|
||||
filecap.
|
||||
|
||||
An alternative algorithm could give clients more precise control over share
|
||||
placement, for example by a user who wished to make sure that k+1 shares are
|
||||
located in each datacenter (to allow downloads to take place using only local
|
||||
bandwidth). This algorithm could skip the permuted list and use other
|
||||
mechanisms to accomplish load-balancing (or ignore the issue altogether). It
|
||||
could add additional information to the filecap (like a list of which servers
|
||||
received the shares) in lieu of performing a search at download time, perhaps
|
||||
at the expense of allowing a repairer to move shares to a new server after
|
||||
the initial upload. It might make up for this by storing "location hints"
|
||||
next to each share, to indicate where other shares are likely to be found,
|
||||
and obligating the repairer to update these hints.
|
||||
|
||||
The second purpose of this document is to explain the format of the file
|
||||
capability string (or "filecap" for short). There are multiple kinds of
|
||||
capabilties (read-write, read-only, verify-only, repaircap, lease-renewal
|
||||
cap, traverse-only, etc). There are multiple ways to represent the filecap
|
||||
(compressed binary, human-readable, clickable-HTTP-URL, "tahoe:" URL, etc),
|
||||
but they must all contain enough information to reliably retrieve a file
|
||||
(given some context, of course). It must at least contain the confidentiality
|
||||
and integrity information from document #1 (i.e. the encryption key and the
|
||||
UEB hash). It must also contain whatever additional information the
|
||||
upload-time server-selection algorithm generated that will be required by the
|
||||
downloader.
|
||||
|
||||
For some server-selection algorithms, the additional information will be
|
||||
minimal. For example, the 1.3.0 release uses the hash of the encryption key
|
||||
as a storage index, and uses the storage index to permute the server list,
|
||||
and uses an Introducer to learn the current list of servers. This allows a
|
||||
"close-enough" list of servers to be compressed into a filecap field that is
|
||||
already required anyways (the encryption key). It also adds k and N to the
|
||||
filecap, to speed up the downloader's search (the downloader knows how many
|
||||
shares it needs, so it can send out multiple queries in parallel).
|
||||
|
||||
But other server-selection algorithms might require more information. Each
|
||||
variant of this document will explain how to encode that additional
|
||||
information into the filecap, and how to extract and use that information at
|
||||
download time.
|
||||
|
||||
These two purposes are interrelated. A filecap that is interpreted in the
|
||||
context of the allmydata.com commercial grid, which uses tahoe-1.3.0, implies
|
||||
a specific peer-selection algorithm, a specific Introducer, and therefore a
|
||||
fairly-specific set of servers to query for shares. A filecap which is meant
|
||||
to be interpreted on a different sort of grid would need different
|
||||
information.
|
||||
|
||||
Some filecap formats can be designed to contain more information (and depend
|
||||
less upon context), such as the way an HTTP URL implies the existence of a
|
||||
single global DNS system. Ideally a tahoe filecap should be able to specify
|
||||
which "grid" it lives in, with enough information to allow a compatible
|
||||
implementation of Tahoe to locate that grid and retrieve the file (regardless
|
||||
of which server-selection algorithm was used for upload).
|
||||
|
||||
This more-universal format might come at the expense of reliability, however.
|
||||
Tahoe-1.3.0 filecaps do not contain hostnames, because the failure of DNS or
|
||||
an individual host might then impact file availability (however the
|
||||
Introducer contains DNS names or IP addresses).
|
||||
|
||||
== #4: Directory Format ==
|
||||
|
||||
Tahoe directories are a special way of interpreting and managing the contents
|
||||
of a file (either mutable or immutable). These "dirnode" files are basically
|
||||
serialized tables that map child name to filecap/dircap. This document
|
||||
describes the format of these files.
|
||||
|
||||
Tahoe-1.3.0 directories are "transitively readonly", which is accomplished by
|
||||
applying an additional layer of encryption to the list of child writecaps.
|
||||
The key for this encryption is derived from the containing file's writecap.
|
||||
This document must explain how to derive this key and apply it to the
|
||||
appropriate portion of the table.
|
||||
|
||||
Future versions of the directory format are expected to contain
|
||||
"deep-traversal caps", which allow verification/repair of files without
|
||||
exposing their plaintext to the repair agent. This document wil be
|
||||
responsible for explaining traversal caps too.
|
||||
|
||||
Future versions of the directory format will probably contain an index and
|
||||
more advanced data structures (for efficiency and fast lookups), instead of a
|
||||
simple flat list of (childname, childcap). This document will also need to
|
||||
describe metadata formats, including what access-control policies are defined
|
||||
for the metadata.
|
Loading…
Reference in New Issue
Block a user