a few edits to architecture.txt and related docs

Zooko O'Whielacronx 2007-09-21 14:12:26 -07:00
parent 7e1b67cf2e
commit f5518eca92
2 changed files with 22 additions and 23 deletions

architecture.txt

@@ -9,9 +9,10 @@ virtual drive, and the application that sits on top.
 The lowest layer is the "grid", basically a DHT (Distributed Hash Table)
 which maps URIs to data. The URIs are relatively short ascii strings
 (currently about 140 bytes), and each is used as a reference to an immutable
-arbitrary-length sequence of data bytes. This data is distributed around the
-grid across a large number of nodes, such that a statistically unlikely number
-of nodes would have to be unavailable for the data to become unavailable.
+arbitrary-length sequence of data bytes. This data is encrypted and
+distributed around the grid across a large number of nodes, such that a
+statistically unlikely number of nodes would have to be unavailable for the
+data to become unavailable.
 
 The middle layer is the virtual drive: a tree-shaped data structure in which
 the intermediate nodes are directories and the leaf nodes are files. Each
@@ -27,9 +28,9 @@ later, a user can recover older versions of their files. Other sorts of
 applications can run on top of the virtual drive, of course -- anything that
 has a use for a secure, robust, distributed filestore.
 
-Note: some of the description below indicates design targets rather than
-actual code present in the current release. Please take a look at roadmap.txt
-to get an idea of how much of this has been implemented so far.
+Note: some of the text below describes design targets rather than actual code
+present in the current release. Please take a look at roadmap.txt to get an
+idea of how much of this has been implemented so far.
 
 
 THE BIG GRID OF PEERS
@@ -46,11 +47,11 @@ StorageServer, which offers to hold data for a limited period of time (a
 that would cause it to consume more space than it wants to provide. When a
 lease expires, the data is deleted. Peers might renew their leases.
 
-This storage is used to hold "shares", which are themselves used to store
-files in the grid. There are many shares for each file, typically between 10
-and 100 (the exact number depends upon the tradeoffs made between
-reliability, overhead, and storage space consumed). The files are indexed by
-a "StorageIndex", which is derived from the encryption key, which may be
+This storage is used to hold "shares", which are encoded pieces of files in
+the grid. There are many shares for each file, typically between 10 and 100
+(the exact number depends upon the tradeoffs made between reliability,
+overhead, and storage space consumed). The files are indexed by a
+"StorageIndex", which is derived from the encryption key, which may be
 randomly generated or it may be derived from the contents of the file. Leases
 are indexed by StorageIndex, and a single StorageServer may hold multiple
 shares for the corresponding file. Multiple peers can hold leases on the same
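
(As an aside, the k-of-N share scheme this hunk describes is easy to sketch
with the zfec library that codec.py wraps. The 3-of-10 parameters, the
zero-padding, and the choice of surviving shares below are illustrative
assumptions, not the project's defaults.)

    import zfec

    k, m = 3, 10    # any k of the m shares suffice to rebuild the file
    data = b"some file contents to be spread across the grid"

    # zfec's low-level Encoder takes k equal-length primary blocks, so pad
    # the data out to a multiple of k before splitting it.
    blocksize = (len(data) + k - 1) // k
    padded = data.ljust(blocksize * k, b"\x00")
    primary = [padded[i * blocksize:(i + 1) * blocksize] for i in range(k)]

    # Produce all m shares: the k primary blocks plus m-k parity blocks.
    shares = zfec.Encoder(k, m).encode(primary, list(range(m)))

    # Any k shares plus their share numbers recover the primary blocks;
    # here we pretend only shares 4, 5, and 6 survived.
    recovered = zfec.Decoder(k, m).decode(shares[4:4 + k], list(range(4, 4 + k)))
    assert b"".join(recovered)[:len(data)] == data
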
@@ -90,7 +91,7 @@ be used to reconstruct the whole file. The shares are then deposited in
 StorageServers in other peers.
 
 A tagged hash of the encryption key is used to form the "storage index",
-which is used for both peer selection (described below) and to index shares
+which is used for both server selection (described below) and to index shares
 within the StorageServers on the selected peers.
 
 A variety of hashes are computed while the shares are being produced, to
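
(A minimal sketch of the tagged-hash derivation, assuming SHA-256 and a
netstring-style tag prefix; the actual tag string and construction in the
source may differ.)

    from hashlib import sha256

    def tagged_hash(tag: bytes, data: bytes) -> bytes:
        # A length-delimited tag prefix keeps hashes computed for different
        # purposes from colliding even when they cover the same input bytes.
        return sha256(b"%d:%s," % (len(tag), tag) + data).digest()

    encryption_key = b"\x00" * 16    # placeholder key, not a real one
    storage_index = tagged_hash(b"storage_index", encryption_key)
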
@@ -173,10 +174,10 @@ accurate. The plan is to store this capability next to the URI in the virtual
 drive structure.
 
 
-PEER SELECTION
+SERVER SELECTION
 
 When a file is uploaded, the encoded shares are sent to other peers. But to
-which ones? The "peer selection" algorithm is used to make this choice.
+which ones? The "server selection" algorithm is used to make this choice.
 
 In the current version, the verifierid is used to consistently-permute the
 set of all peers (by sorting the peers by HASH(verifierid+peerid)). Each file
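
(The consistent permutation fits in a few lines; SHA-256 stands in for
HASH() here, which is an assumption about the concrete hash function.)

    from hashlib import sha256

    def permute_peers(verifierid: bytes, peerids: list) -> list:
        # Every node that knows the verifierid computes the same ordering,
        # so uploaders and downloaders walk the grid in the same sequence
        # without coordinating with each other.
        return sorted(peerids, key=lambda p: sha256(verifierid + p).digest())
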

codemap.txt

@@ -3,12 +3,9 @@ CODE OVERVIEW
 
 A brief map to where the code lives in this distribution:
 
-src/zfec: the erasure-coding library, turns data into shares and back again.
-          When installed, this provides the 'zfec' package.
-
-src/allmydata: the bulk of the code for this project. When installed, this
-               provides the 'allmydata' package. This includes a few pieces
-               copied from the PyCrypto package, in allmydata/Crypto/* .
+src/allmydata: the code for this project. When installed, this provides the
+               'allmydata' package. This includes a few pieces copied from
+               the PyCrypto package, in allmydata/Crypto/* .
 
 Within src/allmydata/ :
 
@@ -29,12 +26,13 @@ Within src/allmydata/ :
 
 storageserver.py: provides storage services to other nodes
 
-codec.py: low-level erasure coding, wraps zfec
+codec.py: low-level erasure coding, wraps the zfec library
 
 encode.py: handles turning data into shares and blocks, computes hash trees
-upload.py: upload-side peer selection, reading data from upload sources
-download.py: download-side peer selection, share retrieval, decoding
+
+upload.py: upload server selection, reading data from upload sources
+download.py: download server selection, share retrieval, decoding
 
 dirnode.py: implements the directory nodes. One part runs on the
             global vdrive server, the other runs inside a client