mirror of
https://github.com/tahoe-lafs/tahoe-lafs.git
synced 2024-12-23 14:52:26 +00:00
docs: edit architecture.txt with Amber's help

parent 07a45cd232
commit 5bc69329fc
@@ -3,54 +3,43 @@
 OVERVIEW

-The high-level view of this system consists of three layers: the grid, the
-virtual drive, and the application that sits on top.
+At a high level, this system consists of three layers: the grid, the
+filesystem, and the application.

-The lowest layer is the "grid", basically a DHT (Distributed Hash Table)
-which maps capabilities to data. The capabilities are relatively short ascii
-strings, and each is used as a reference to an arbitrary-length sequence of
-data bytes. This data is encrypted and distributed around the grid across a
-large number of nodes, such that a large fraction of the nodes would have to
-be unavailable for the data to become unavailable.
+The lowest layer is the "grid", a mapping from capabilities to
+data. The capabilities are relatively short ascii strings, each used
+as a reference to an arbitrary-length sequence of data bytes. This
+data is encrypted and distributed across a number of nodes, such that
+it will survive the loss of most of the nodes.

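The capability-to-data mapping described above can be sketched as a toy
in-memory grid. This is illustrative only: `ToyGrid` is a hypothetical name,
and real Tahoe-LAFS capabilities also encode the decryption key and integrity
information, with the data stored as encoded shares on many nodes rather than
in one dictionary.

```python
import hashlib

class ToyGrid:
    """Toy stand-in for the grid: maps short ascii capability strings
    to byte sequences. (Illustrative sketch, not Tahoe's design.)"""

    def __init__(self):
        self._store = {}

    def put(self, data: bytes) -> str:
        # Derive a short ascii capability from the content.
        cap = hashlib.sha256(data).hexdigest()[:26]
        self._store[cap] = data
        return cap

    def get(self, cap: str) -> bytes:
        # The capability is the only reference needed to fetch the bytes.
        return self._store[cap]

grid = ToyGrid()
cap = grid.put(b"hello, grid")
assert grid.get(cap) == b"hello, grid"
```

Anyone holding the capability string can retrieve the bytes; anyone without
it cannot name them at all, which is the essence of capability-based access.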
-The middle layer is the virtual drive: a directed-acyclic-graph-shaped data
-structure in which the intermediate nodes are directories and the leaf nodes
-are files. The leaf nodes contain only the file data -- they don't contain
-any metadata about the file except for the length. The edges that lead to
-leaf nodes have metadata attached to them about the file that they point to.
-Therefore, the same file may have different metadata associated with it if it
-is dereferenced through different edges.
+The middle layer is the decentralized filesystem: a directed graph in
+which the intermediate nodes are directories and the leaf nodes are
+files. The leaf nodes contain only the file data -- they contain no
+metadata about the file other than the length. The edges leading to
+leaf nodes have metadata attached to them about the file they point
+to. Therefore, the same file may be associated with different
+metadata if it is dereferenced through different edges.

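The edge-carries-the-metadata design can be sketched in a few lines. The
class and method names here (`FileNode`, `DirNode`, `link`) are hypothetical,
chosen for the sketch, not taken from the Tahoe codebase.

```python
class FileNode:
    """Leaf node: holds only the bytes; length is the only intrinsic metadata."""
    def __init__(self, data: bytes):
        self.data = data
    def __len__(self):
        return len(self.data)

class DirNode:
    """Intermediate node: maps child names to (node, edge_metadata) pairs."""
    def __init__(self):
        self.children = {}
    def link(self, name, node, **metadata):
        # Metadata lives on the edge, not on the file itself.
        self.children[name] = (node, metadata)
    def get(self, name):
        return self.children[name]

# The same file reached through two different edges carries
# different metadata, exactly as the paragraph above describes.
photo = FileNode(b"...jpeg bytes...")
home = DirNode()
home.link("vacation.jpg", photo, mtime=1199145600)
shared = DirNode()
shared.link("pic.jpg", photo, mtime=1200000000)
```

Both directories point at the same `FileNode`, but each edge records its own
`mtime`, so the file's apparent metadata depends on the path used to reach it.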
-The top layer is where the applications that use this virtual drive operate.
-Allmydata uses this for a backup service, in which the application copies the
-files to be backed up from the local disk into the virtual drive on a
-periodic basis. By providing read-only access to the same virtual drive
-later, a user can recover older versions of their files. Other sorts of
-applications can run on top of the virtual drive, of course -- anything that
-has a use for a secure, decentralized, fault-tolerant filesystem.
+The top layer consists of the applications using the filesystem.
+Allmydata.com uses it for a backup service: the application
+periodically copies files from the local disk onto the decentralized
+filesystem. We later provide read-only access to those files, allowing
+users to recover them. The filesystem can be used by other
+applications, too.


-THE BIG GRID OF STORAGE SERVERS
+THE GRID OF STORAGE SERVERS

-Underlying the grid is a large collection of peer nodes. These are processes
-running on a wide variety of computers, all of which know about each other in
-some way or another. They establish TCP connections to one another using
-Foolscap, an encrypted+authenticated remote message passing library (using
-TLS connections and self-authenticating identifiers called "FURLs").
+Underlying the grid is a collection of peer nodes -- processes running
+on computers. They establish TCP connections to each other using
+Foolscap, a secure remote message passing library.

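The removed text mentions Foolscap's self-authenticating identifiers
("FURLs"), which have the shape pb://<tubid>@<host:port>/<swissnum>: the
tubid commits to the peer's TLS certificate, so the connecting side can
verify it reached the intended node without any certificate authority. A
minimal sketch of that idea follows; `parse_furl` and `peer_is_authentic`
are hypothetical names, the digest choice is illustrative, and Foolscap's
real tubid derivation differs in detail.

```python
import hashlib

def parse_furl(furl: str):
    """Split a FURL of the form pb://<tubid>@<host:port>/<swissnum>.
    Simplified sketch of the documented FURL shape, not Foolscap's parser."""
    rest = furl[len("pb://"):]
    tubid, rest = rest.split("@", 1)
    location, swissnum = rest.split("/", 1)
    return tubid, location, swissnum

def peer_is_authentic(furl: str, peer_cert_der: bytes) -> bool:
    # Self-authentication: the identifier in the FURL commits to the
    # certificate presented over TLS, so no external CA is needed.
    tubid, _, _ = parse_furl(furl)
    return hashlib.sha1(peer_cert_der).hexdigest() == tubid

cert = b"stand-in DER certificate bytes"
furl = "pb://%s@127.0.0.1:56070/storage" % hashlib.sha1(cert).hexdigest()
assert peer_is_authentic(furl, cert)
```

The swissnum part plays a different role: it is an unguessable string that
grants access to a particular remote object once the connection is verified.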
-Each peer offers certain services to the others. The primary service is the
-StorageServer, which offers to hold data. Each StorageServer has a quota, and
-it will reject storage requests that would cause it to consume more space
-than it wants to provide.
-
-This storage is used to hold "shares", which are encoded pieces of files in
-the grid. There are many shares for each file, typically between 10 and 100
-(the exact number depends upon the tradeoffs made between reliability,
-overhead, and storage space consumed). The files are indexed by a
-"StorageIndex", which is derived from the encryption key, which is derived
-from the contents of the file. Leases are indexed by StorageIndex, and a
-single StorageServer may hold multiple shares for the corresponding
-file. Multiple peers can hold leases on the same file.
+Each peer offers certain services to the others. The primary service
+is that of the storage server, which holds data in the form of
+"shares". Shares are encoded pieces of files. There are a
+configurable number of shares for each file, 12 by default. Normally,
+each share is stored on a separate server, but a single server can
+hold multiple shares for a single file.

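The derivation chain in the removed paragraph -- file contents to encryption
key to StorageIndex -- and the share-placement rule in the new text can be
sketched as follows. This is illustrative only: real Tahoe-LAFS uses tagged
SHA-256d hashes, AES, and zfec erasure coding, and the function names here
(`derive_key`, `storage_index`, `place_shares`) are hypothetical.

```python
import hashlib

def derive_key(file_contents: bytes) -> bytes:
    # Convergent encryption: the key is derived from the contents, so
    # identical files produce identical ciphertext and shares.
    return hashlib.sha256(b"key:" + file_contents).digest()[:16]

def storage_index(key: bytes) -> str:
    # The StorageIndex is derived from the key, not the plaintext, and
    # is safe to reveal to storage servers.
    return hashlib.sha256(b"si:" + key).hexdigest()[:32]

def place_shares(si: str, servers, total_shares=12):
    # Server order is permuted per-file by the StorageIndex (a much
    # simplified version of Tahoe's peer-selection idea). With fewer
    # servers than shares, one server ends up holding several shares.
    order = sorted(servers,
                   key=lambda s: hashlib.sha256((si + s).encode()).digest())
    placement = {}
    for shnum in range(total_shares):
        placement.setdefault(order[shnum % len(order)], []).append(shnum)
    return placement

key = derive_key(b"file contents")
si = storage_index(key)
assert derive_key(b"file contents") == key  # convergent: same contents, same key
```

Because the key depends only on the contents, two users uploading the same
file produce the same StorageIndex, letting the grid deduplicate shares.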
 Peers learn about each other through the "introducer". Each peer connects to
 this central introducer at startup, and receives a list of all other peers
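The introducer's role -- a central rendezvous point that hands each
connecting peer the list of all the others -- can be sketched as below.
The class and method names are hypothetical, and the real introducer pushes
announcements over persistent Foolscap connections rather than answering a
one-shot call.

```python
class Introducer:
    """Toy central introducer: peers announce themselves and receive
    the current list of all other known peers. (Illustrative sketch.)"""

    def __init__(self):
        self.peers = set()

    def hello(self, peer_furl: str):
        # Register the newcomer and return everyone it should connect to.
        others = sorted(self.peers - {peer_furl})
        self.peers.add(peer_furl)
        return others

intro = Introducer()
assert intro.hello("pb://a@h1/x") == []
assert intro.hello("pb://b@h2/x") == ["pb://a@h1/x"]
```

Note that the introducer only performs introductions; the data itself flows
directly between peers, so the central node is not a bandwidth bottleneck.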