a few edits to architecture.txt and related docs

Zooko O'Whielacronx 2007-09-21 14:12:26 -07:00
parent 7e1b67cf2e
commit f5518eca92
2 changed files with 22 additions and 23 deletions

View File

@@ -9,9 +9,10 @@ virtual drive, and the application that sits on top.
The lowest layer is the "grid", basically a DHT (Distributed Hash Table)
which maps URIs to data. The URIs are relatively short ascii strings
(currently about 140 bytes), and each is used as a reference to an immutable
-arbitrary-length sequence of data bytes. This data is distributed around the
-grid across a large number of nodes, such that a statistically unlikely number
-of nodes would have to be unavailable for the data to become unavailable.
+arbitrary-length sequence of data bytes. This data is encrypted and
+distributed around the grid across a large number of nodes, such that a
+statistically unlikely number of nodes would have to be unavailable for the
+data to become unavailable.
The middle layer is the virtual drive: a tree-shaped data structure in which
the intermediate nodes are directories and the leaf nodes are files. Each
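To make the availability claim in this hunk concrete: under k-of-n erasure
coding, the file survives unless more than n-k servers are down at once. A
minimal Python sketch, using hypothetical parameters (3-of-10 encoding, 90%
per-server availability) that are not specified in this excerpt:

  # Sketch only: probability that at least k of n shares are on reachable
  # servers, assuming independent failures. Parameters are illustrative.
  from math import comb

  def file_availability(k, n, p):
      return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

  print(file_availability(3, 10, 0.9))  # ~0.9999996, vs. 0.9 for a single copy
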
@@ -27,9 +28,9 @@ later, a user can recover older versions of their files. Other sorts of
applications can run on top of the virtual drive, of course -- anything that
has a use for a secure, robust, distributed filestore.
-Note: some of the description below indicates design targets rather than
-actual code present in the current release. Please take a look at roadmap.txt
-to get an idea of how much of this has been implemented so far.
+Note: some of the text below describes design targets rather than actual code
+present in the current release. Please take a look at roadmap.txt to get an
+idea of how much of this has been implemented so far.
THE BIG GRID OF PEERS
@@ -46,11 +47,11 @@ StorageServer, which offers to hold data for a limited period of time (a
that would cause it to consume more space than it wants to provide. When a
lease expires, the data is deleted. Peers might renew their leases.
-This storage is used to hold "shares", which are themselves used to store
-files in the grid. There are many shares for each file, typically between 10
-and 100 (the exact number depends upon the tradeoffs made between
-reliability, overhead, and storage space consumed). The files are indexed by
-a "StorageIndex", which is derived from the encryption key, which may be
+This storage is used to hold "shares", which are encoded pieces of files in
+the grid. There are many shares for each file, typically between 10 and 100
+(the exact number depends upon the tradeoffs made between reliability,
+overhead, and storage space consumed). The files are indexed by a
+"StorageIndex", which is derived from the encryption key, which may be
randomly generated or it may be derived from the contents of the file. Leases
are indexed by StorageIndex, and a single StorageServer may hold multiple
shares for the corresponding file. Multiple peers can hold leases on the same
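The lease bookkeeping described in this hunk can be pictured as two maps keyed
by StorageIndex. A toy Python sketch under that reading; the names and the
expiry details here are hypothetical, not the project's actual StorageServer
API:

  # Toy model (hypothetical names): shares and leases are both indexed by
  # StorageIndex; a server may hold several shares for one file, and deletes
  # them once every lease on that StorageIndex has expired.
  import time

  class ToyStorageServer:
      def __init__(self):
          self.shares = {}  # storage_index -> {share_number: share_bytes}
          self.leases = {}  # storage_index -> {peer_id: expiration_time}

      def put_share(self, storage_index, share_number, data):
          self.shares.setdefault(storage_index, {})[share_number] = data

      def add_lease(self, storage_index, peer_id, duration):
          # renewal is just re-adding the lease with a later expiration
          self.leases.setdefault(storage_index, {})[peer_id] = time.time() + duration

      def expire_leases(self):
          now = time.time()
          for si in list(self.leases):
              live = {p: t for p, t in self.leases[si].items() if t > now}
              if live:
                  self.leases[si] = live
              else:  # last lease gone: the shares are deleted
                  del self.leases[si]
                  self.shares.pop(si, None)
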
@@ -90,7 +91,7 @@ be used to reconstruct the whole file. The shares are then deposited in
StorageServers in other peers.
A tagged hash of the encryption key is used to form the "storage index",
-which is used for both peer selection (described below) and to index shares
+which is used for both server selection (described below) and to index shares
within the StorageServers on the selected peers.
A variety of hashes are computed while the shares are being produced, to
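A "tagged hash" prefixes the hash input with a purpose tag, so the storage
index can never collide with other uses of the same encryption key. A minimal
sketch; the hash function (SHA-256), the tag string, and the netstring framing
are illustrative assumptions, not taken from this diff:

  # Sketch: derive the storage index as a tagged hash of the encryption key.
  # SHA-256 and the tag b"storage_index" are assumptions for illustration.
  import hashlib

  def tagged_hash(tag, data):
      # netstring-style prefix "<len>:<tag>," keeps different tags unambiguous
      return hashlib.sha256(b"%d:%s," % (len(tag), tag) + data).digest()

  def storage_index(encryption_key):
      return tagged_hash(b"storage_index", encryption_key)
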
@@ -173,10 +174,10 @@ accurate. The plan is to store this capability next to the URI in the virtual
drive structure.
-PEER SELECTION
+SERVER SELECTION
When a file is uploaded, the encoded shares are sent to other peers. But to
-which ones? The "peer selection" algorithm is used to make this choice.
+which ones? The "server selection" algorithm is used to make this choice.
In the current version, the verifierid is used to consistently-permute the
set of all peers (by sorting the peers by HASH(verifierid+peerid)). Each file
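The consistent permutation in this last hunk is small enough to show directly:
every client sorts the same peer set by HASH(verifierid+peerid), so all of
them derive the same per-file ordering without coordinating. A sketch, with
SHA-256 standing in for the unspecified HASH:

  # Sorting by a hash that mixes in the verifierid gives each file its own
  # pseudo-random but deterministic ordering over the peers.
  import hashlib

  def permute_peers(verifierid, peer_ids):
      return sorted(peer_ids,
                    key=lambda peerid: hashlib.sha256(verifierid + peerid).digest())

  # e.g. permute_peers(b"file-verifier-id", [b"peer1", b"peer2", b"peer3"])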

View File

@@ -3,12 +3,9 @@ CODE OVERVIEW
A brief map to where the code lives in this distribution:
src/zfec: the erasure-coding library, turns data into shares and back again.
          When installed, this provides the 'zfec' package.
-src/allmydata: the bulk of the code for this project. When installed, this
-               provides the 'allmydata' package. This includes a few pieces
-               copied from the PyCrypto package, in allmydata/Crypto/* .
+src/allmydata: the code for this project. When installed, this provides the
+               'allmydata' package. This includes a few pieces copied from
+               the PyCrypto package, in allmydata/Crypto/* .
Within src/allmydata/ :
@@ -29,12 +26,13 @@ Within src/allmydata/ :
storageserver.py: provides storage services to other nodes
-codec.py: low-level erasure coding, wraps zfec
+codec.py: low-level erasure coding, wraps the zfec library
encode.py: handles turning data into shares and blocks, computes hash trees
-upload.py: upload-side peer selection, reading data from upload sources
-download.py: download-side peer selection, share retrieval, decoding
+upload.py: upload server selection, reading data from upload sources
+download.py: download server selection, share retrieval, decoding
dirnode.py: implements the directory nodes. One part runs on the
            global vdrive server, the other runs inside a client