mirror of https://github.com/tahoe-lafs/tahoe-lafs.git
a few edits to architecture.txt and related docs
commit f5518eca92 (parent: 7e1b67cf2e)
@@ -9,9 +9,10 @@ virtual drive, and the application that sits on top.
 The lowest layer is the "grid", basically a DHT (Distributed Hash Table)
 which maps URIs to data. The URIs are relatively short ascii strings
 (currently about 140 bytes), and each is used as a reference to an immutable
-arbitrary-length sequence of data bytes. This data is distributed around the
-grid across a large number of nodes, such that a statistically unlikely number
-of nodes would have to be unavailable for the data to become unavailable.
+arbitrary-length sequence of data bytes. This data is encrypted and
+distributed around the grid across a large number of nodes, such that a
+statistically unlikely number of nodes would have to be unavailable for the
+data to become unavailable.
 
 The middle layer is the virtual drive: a tree-shaped data structure in which
 the intermediate nodes are directories and the leaf nodes are files. Each
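
To make the "statistically unlikely" claim concrete: with k-of-m erasure
coding (described in later hunks), a file stays retrievable as long as at
least k of the m share-holding servers are reachable. A minimal sketch of the
arithmetic, assuming independent server availability p; the 3-of-10 and
p=0.9 figures are hypothetical, chosen only for illustration:

    # Probability that a k-of-m encoded file remains retrievable, assuming
    # each server is independently up with probability p.
    from math import comb

    def availability(k, m, p):
        # the file survives if at least k of the m servers are reachable
        return sum(comb(m, j) * p**j * (1 - p)**(m - j)
                   for j in range(k, m + 1))

    print(availability(3, 10, 0.9))  # -> ~0.9999996
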
@@ -27,9 +28,9 @@ later, a user can recover older versions of their files. Other sorts of
 applications can run on top of the virtual drive, of course -- anything that
 has a use for a secure, robust, distributed filestore.
 
-Note: some of the description below indicates design targets rather than
-actual code present in the current release. Please take a look at roadmap.txt
-to get an idea of how much of this has been implemented so far.
+Note: some of the text below describes design targets rather than actual code
+present in the current release. Please take a look at roadmap.txt to get an
+idea of how much of this has been implemented so far.
 
 
 THE BIG GRID OF PEERS
@@ -46,11 +47,11 @@ StorageServer, which offers to hold data for a limited period of time (a
 that would cause it to consume more space than it wants to provide. When a
 lease expires, the data is deleted. Peers might renew their leases.
 
-This storage is used to hold "shares", which are themselves used to store
-files in the grid. There are many shares for each file, typically between 10
-and 100 (the exact number depends upon the tradeoffs made between
-reliability, overhead, and storage space consumed). The files are indexed by
-a "StorageIndex", which is derived from the encryption key, which may be
+This storage is used to hold "shares", which are encoded pieces of files in
+the grid. There are many shares for each file, typically between 10 and 100
+(the exact number depends upon the tradeoffs made between reliability,
+overhead, and storage space consumed). The files are indexed by a
+"StorageIndex", which is derived from the encryption key, which may be
 randomly generated or it may be derived from the contents of the file. Leases
 are indexed by StorageIndex, and a single StorageServer may hold multiple
 shares for the corresponding file. Multiple peers can hold leases on the same
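
For readers unfamiliar with shares: below is a rough sketch of the k-of-m
round trip using the zfec library that this project wraps (see codec.py in
the code map further down). The parameters are illustrative, the padding
handling is simplified, and exact zfec signatures and return conventions may
differ between versions:

    # Split a file into k primary blocks, encode to m shares, then recover
    # the file from an arbitrary subset of k shares.
    import zfec

    k, m = 3, 10                    # any 3 of the 10 shares suffice
    data = b"abcdefghijkl"          # length padded to a multiple of k
    size = len(data) // k
    primary = [data[i*size:(i+1)*size] for i in range(k)]

    shares = zfec.Encoder(k, m).encode(primary)   # m equal-sized shares

    # pretend only shares 2, 5, and 9 survived:
    recovered = zfec.Decoder(k, m).decode(
        [shares[2], shares[5], shares[9]], [2, 5, 9])
    assert b"".join(recovered) == data
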
@@ -90,7 +91,7 @@ be used to reconstruct the whole file. The shares are then deposited in
 StorageServers in other peers.
 
 A tagged hash of the encryption key is used to form the "storage index",
-which is used for both peer selection (described below) and to index shares
+which is used for both server selection (described below) and to index shares
 within the StorageServers on the selected peers.
 
 A variety of hashes are computed while the shares are being produced, to
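
A "tagged hash" simply prefixes the hash input with a purpose-specific tag,
so that hashes computed for different purposes live in separate domains. A
minimal sketch of the idea; the tag string, key size, and truncation here
are made up, not the values the code actually uses:

    from hashlib import sha256
    import os

    def tagged_hash(tag, value):
        # length-prefix the tag (netstring-style) so different tags can
        # never produce colliding hash inputs
        return sha256(b"%d:%s," % (len(tag), tag) + value).digest()

    encryption_key = os.urandom(16)              # hypothetical per-file key
    storage_index = tagged_hash(b"example_storage_index", encryption_key)[:16]
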
@@ -173,10 +174,10 @@ accurate. The plan is to store this capability next to the URI in the virtual
 drive structure.
 
 
-PEER SELECTION
+SERVER SELECTION
 
 When a file is uploaded, the encoded shares are sent to other peers. But to
-which ones? The "peer selection" algorithm is used to make this choice.
+which ones? The "server selection" algorithm is used to make this choice.
 
 In the current version, the verifierid is used to consistently-permute the
 set of all peers (by sorting the peers by HASH(verifierid+peerid)). Each file
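
The consistent permutation is what lets uploaders and downloaders agree on
share placement without coordinating: every node sorts the same peer list
the same way for a given file. A sketch under the hunk's own definition,
with made-up peer ids and SHA-256 standing in for HASH:

    from hashlib import sha256

    def permuted_peers(verifierid, peerids):
        # every node computes the same per-file ordering, since the sort
        # key depends only on the verifierid and the peerid
        return sorted(peerids,
                      key=lambda peerid: sha256(verifierid + peerid).digest())

    peers = [b"peer-a", b"peer-b", b"peer-c", b"peer-d"]
    order = permuted_peers(b"file-verifierid", peers)
    # shares are offered to peers in this order until enough have accepted
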
@@ -3,12 +3,9 @@ CODE OVERVIEW
 
 A brief map to where the code lives in this distribution:
 
-src/zfec: the erasure-coding library, turns data into shares and back again.
-          When installed, this provides the 'zfec' package.
-
-src/allmydata: the bulk of the code for this project. When installed, this
-               provides the 'allmydata' package. This includes a few pieces
-               copied from the PyCrypto package, in allmydata/Crypto/* .
+src/allmydata: the code for this project. When installed, this provides the
+               'allmydata' package. This includes a few pieces copied from
+               the PyCrypto package, in allmydata/Crypto/* .
 
 Within src/allmydata/ :
 
@@ -29,12 +26,13 @@ Within src/allmydata/ :
 
 storageserver.py: provides storage services to other nodes
 
-codec.py: low-level erasure coding, wraps zfec
+codec.py: low-level erasure coding, wraps the zfec library
 
 encode.py: handles turning data into shares and blocks, computes hash trees
-upload.py: upload-side peer selection, reading data from upload sources
 
-download.py: download-side peer selection, share retrieval, decoding
+upload.py: upload server selection, reading data from upload sources
+
+download.py: download server selection, share retrieval, decoding
 
 dirnode.py: implements the directory nodes. One part runs on the
             global vdrive server, the other runs inside a client