a few edits to architecture.txt and related docs

commit f5518eca92
parent 7e1b67cf2e
@@ -9,9 +9,10 @@ virtual drive, and the application that sits on top.
 The lowest layer is the "grid", basically a DHT (Distributed Hash Table)
 which maps URIs to data. The URIs are relatively short ascii strings
 (currently about 140 bytes), and each is used as a reference to an immutable
-arbitrary-length sequence of data bytes. This data is distributed around the
-grid across a large number of nodes, such that a statistically unlikely number
-of nodes would have to be unavailable for the data to become unavailable.
+arbitrary-length sequence of data bytes. This data is encrypted and
+distributed around the grid across a large number of nodes, such that a
+statistically unlikely number of nodes would have to be unavailable for the
+data to become unavailable.
 
 The middle layer is the virtual drive: a tree-shaped data structure in which
 the intermediate nodes are directories and the leaf nodes are files. Each
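
As an illustration of the reliability claim in the hunk above: a file encoded
into N shares is recoverable as long as any k of them survive, so it is lost
only when more than N-k of the servers holding its shares are unreachable.
The sketch below (Python, with hypothetical parameters and an assumption of
independent server failures, not the project's defaults) estimates that
probability:

  from math import comb

  def prob_file_lost(k, n, p):
      # A k-of-n encoded file is lost only if fewer than k of its n shares
      # remain reachable; p is the availability of each (independent) server.
      return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

  # e.g. a hypothetical 25-of-100 encoding on servers that are up 90% of the time
  print(prob_file_lost(k=25, n=100, p=0.9))
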
@@ -27,9 +28,9 @@ later, a user can recover older versions of their files. Other sorts of
 applications can run on top of the virtual drive, of course -- anything that
 has a use for a secure, robust, distributed filestore.
 
-Note: some of the description below indicates design targets rather than
-actual code present in the current release. Please take a look at roadmap.txt
-to get an idea of how much of this has been implemented so far.
+Note: some of the text below describes design targets rather than actual code
+present in the current release. Please take a look at roadmap.txt to get an
+idea of how much of this has been implemented so far.
 
 
 THE BIG GRID OF PEERS
@@ -46,11 +47,11 @@ StorageServer, which offers to hold data for a limited period of time (a
 that would cause it to consume more space than it wants to provide. When a
 lease expires, the data is deleted. Peers might renew their leases.
 
-This storage is used to hold "shares", which are themselves used to store
-files in the grid. There are many shares for each file, typically between 10
-and 100 (the exact number depends upon the tradeoffs made between
-reliability, overhead, and storage space consumed). The files are indexed by
-a "StorageIndex", which is derived from the encryption key, which may be
+This storage is used to hold "shares", which are encoded pieces of files in
+the grid. There are many shares for each file, typically between 10 and 100
+(the exact number depends upon the tradeoffs made between reliability,
+overhead, and storage space consumed). The files are indexed by a
+"StorageIndex", which is derived from the encryption key, which may be
 randomly generated or it may be derived from the contents of the file. Leases
 are indexed by StorageIndex, and a single StorageServer may hold multiple
 shares for the corresponding file. Multiple peers can hold leases on the same
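
To illustrate the indexing described in this hunk: the encryption key is
either randomly generated or derived from the file's contents, and the
storage index is in turn derived from the key, so servers can index shares
and leases without learning the key or the plaintext. The tag strings, hash
function, and sizes in this sketch are placeholders, not the project's actual
derivation:

  import os
  from hashlib import sha256

  def tagged_hash(tag: bytes, data: bytes) -> bytes:
      # Placeholder tagged hash; the real scheme's tags and hash differ.
      return sha256(tag + b":" + data).digest()

  def make_encryption_key(contents: bytes = None) -> bytes:
      if contents is None:
          return os.urandom(16)                 # randomly generated key
      return tagged_hash(b"content-key", contents)[:16]   # derived from the file

  def storage_index(key: bytes) -> bytes:
      # Shares and leases are indexed by this value, not by the key itself.
      return tagged_hash(b"storage-index", key)[:16]
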
@@ -90,7 +91,7 @@ be used to reconstruct the whole file. The shares are then deposited in
 StorageServers in other peers.
 
 A tagged hash of the encryption key is used to form the "storage index",
-which is used for both peer selection (described below) and to index shares
+which is used for both server selection (described below) and to index shares
 within the StorageServers on the selected peers.
 
 A variety of hashes are computed while the shares are being produced, to
@@ -173,10 +174,10 @@ accurate. The plan is to store this capability next to the URI in the virtual
 drive structure.
 
 
-PEER SELECTION
+SERVER SELECTION
 
 When a file is uploaded, the encoded shares are sent to other peers. But to
-which ones? The "peer selection" algorithm is used to make this choice.
+which ones? The "server selection" algorithm is used to make this choice.
 
 In the current version, the verifierid is used to consistently-permute the
 set of all peers (by sorting the peers by HASH(verifierid+peerid)). Each file
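
The consistent permutation mentioned in the last two context lines can be
sketched as below: every node that sorts the peer list by
HASH(verifierid+peerid) obtains the same per-file ordering, so uploaders and
downloaders agree on which servers to try first. SHA-256 and the raw byte
concatenation here are assumptions, not the project's exact encoding:

  from hashlib import sha256

  def permute_peers(verifierid: bytes, peerids: list) -> list:
      # Sort peers by HASH(verifierid + peerid): a different but stable
      # ordering for every file, identical on every node that computes it.
      return sorted(peerids, key=lambda peerid: sha256(verifierid + peerid).digest())

Using a per-file permutation spreads shares across the grid while still
letting a downloader regenerate the same candidate list on its own.
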
@@ -3,12 +3,9 @@ CODE OVERVIEW
 
 A brief map to where the code lives in this distribution:
 
-src/zfec: the erasure-coding library, turns data into shares and back again.
-          When installed, this provides the 'zfec' package.
-
-src/allmydata: the bulk of the code for this project. When installed, this
-               provides the 'allmydata' package. This includes a few pieces
-               copied from the PyCrypto package, in allmydata/Crypto/* .
+src/allmydata: the code for this project. When installed, this provides the
+               'allmydata' package. This includes a few pieces copied from
+               the PyCrypto package, in allmydata/Crypto/* .
 
 Within src/allmydata/ :
 
@@ -29,12 +26,13 @@ Within src/allmydata/ :
 
 storageserver.py: provides storage services to other nodes
 
-codec.py: low-level erasure coding, wraps zfec
+codec.py: low-level erasure coding, wraps the zfec library
 
 encode.py: handles turning data into shares and blocks, computes hash trees
-upload.py: upload-side peer selection, reading data from upload sources
-
-download.py: download-side peer selection, share retrieval, decoding
+
+upload.py: upload server selection, reading data from upload sources
+
+download.py: download server selection, share retrieval, decoding
 
 dirnode.py: implements the directory nodes. One part runs on the
             global vdrive server, the other runs inside a client