docs: update docs/architecture.txt to more fully and correctly explain the upload procedure
parent e225f573b9
commit 77aabe7066

@@ -139,63 +139,67 @@ key-value layer.

SERVER SELECTION

When a file is uploaded, the encoded shares are sent to some servers. But to
which ones? The "server selection" algorithm is used to make this choice.

The storage index is used to consistently-permute the set of all servers (by
sorting them by HASH(storage_index+nodeid)). Each file gets a different
permutation, which (on average) will evenly distribute shares among the grid
and avoid hotspots. Each server has announced its available space when it
connected to the introducer, and we use that available space information to
remove any servers that cannot hold an encoded share for our file. Then we
ask some of the servers thus removed if they are already holding any encoded
shares for our file; we use this information later. (We ask any servers which
are in the first 2*N elements of the permuted list.)

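A minimal sketch of the permutation step, in Python. The function name, the
use of SHA-256 as the hash, and the (server_id, available_space) tuples are
illustrative assumptions, not the actual Tahoe-LAFS API:

    import hashlib

    def permuted_servers(storage_index, servers):
        # servers: iterable of (server_id, available_space) tuples, where
        # server_id and storage_index are bytes.  Sorting by
        # HASH(storage_index + server_id) gives every file its own stable
        # ordering of the same set of servers.
        def permuted_key(server):
            server_id, _available_space = server
            return hashlib.sha256(storage_index + server_id).digest()
        return sorted(servers, key=permuted_key)

Because the key depends on the storage index, repeated uploads of the same
file see the same ordering, while different files are spread across different
servers.
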
We then use the permuted list of servers to ask each server, in turn, if it
will hold a share for us (a share that was not reported as being already
present when we talked to the full servers earlier, and that we have not
already planned to upload to a different server). We plan to send a share to
a server by sending an 'allocate_buckets() query' to the server with the
number of that share. Some will say yes, they can hold that share; others
(those who have become full since they announced their available space) will
say no. When a server refuses our request, we take that share to the next
server on the list. In the response to allocate_buckets() the server will
also inform us of any shares of that file that it already has. We keep going
until we run out of shares that need to be stored. At the end of the process,
we'll have a table that maps each share number to a server, and then we can
begin the encode and push phase, using the table to decide where each share
should be sent.

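The placement loop can be sketched roughly as follows. This is an
illustration only: it assumes a simplified, synchronous
allocate_buckets(storage_index, sharenums) call that returns two sets of
share numbers, (already_have, allocated), rather than Tahoe-LAFS's real
remote interface:

    def place_shares(storage_index, permuted_servers, total_shares):
        share_to_server = {}                    # the share-to-server table
        homeless = list(range(total_shares))    # shares still needing a home
        servers = list(permuted_servers)
        while homeless and servers:
            before = len(homeless)
            for server in list(servers):
                if not homeless:
                    break
                share = homeless.pop(0)         # offer one share per query
                try:
                    already_have, allocated = server.allocate_buckets(
                        storage_index, set([share]))
                except EnvironmentError:        # assumed failure mode
                    servers.remove(server)      # unreachable or errored
                    homeless.insert(0, share)   # offer this share elsewhere
                    continue
                for s in already_have:          # shares it already holds
                    share_to_server.setdefault(s, server)
                    if s in homeless:
                        homeless.remove(s)
                if share in allocated or share in already_have:
                    share_to_server.setdefault(share, server)
                else:
                    homeless.insert(0, share)   # refused: next server gets it
            if len(homeless) >= before:
                break                           # a full pass placed nothing
        return share_to_server

The real uploader also batches multiple shares into a single query on later
passes (see below) and keeps more bookkeeping than this sketch shows.
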
Most of the time, this will result in one share per server, which gives us
maximum reliability. If there are fewer writable servers than there are
unstored shares, we'll be forced to loop around, eventually giving multiple
shares to a single server.

If we have to loop through the server list a second time, we accelerate the
query process by asking each server to hold multiple shares on the second
pass. In most cases, this means we'll never send more than two queries to any
given server.

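As a toy illustration of the wrap-around (ordinary Python, not Tahoe-LAFS
code), distributing N=10 shares over 4 writable servers leaves each server
with two or three shares, and on the second pass each server's extra shares
can be requested in a single query:

    >>> from collections import Counter
    >>> servers = ["A", "B", "C", "D"]
    >>> placements = Counter(servers[i % len(servers)] for i in range(10))
    >>> sorted(placements.items())
    [('A', 3), ('B', 3), ('C', 2), ('D', 2)]
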
If a server is unreachable, or has an error, or refuses to accept any of our
shares, we remove it from the permuted list, so we won't query it again for
this file. If a server already has shares for the file we're uploading, we
add that information to the share-to-server table. This lets us do less work
for files which have been uploaded once before, while making sure we still
wind up with as many shares as we desire.

If we are unable to place every share that we want, but we still managed to
place enough shares on enough servers to achieve a condition called "servers
of happiness", then we'll do the upload anyway. If we cannot achieve "servers
of happiness", the upload is declared a failure.

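One way to compute such a measure is as the size of a maximum matching
between shares and distinct servers: how many shares can be paired off with
servers so that no server is counted twice. The sketch below (the names and
data shapes are assumptions, not the actual Tahoe-LAFS code) uses the
standard augmenting-path algorithm:

    def servers_of_happiness(share_to_servers):
        # share_to_servers: dict mapping a share number to the set of
        # servers that hold (or will hold) that share.
        matched_share_for = {}              # server -> share matched to it

        def try_to_match(share, seen):
            # Try to give 'share' a server of its own, possibly by
            # re-homing a previously matched share (augmenting path).
            for server in share_to_servers.get(share, ()):
                if server in seen:
                    continue
                seen.add(server)
                if (server not in matched_share_for
                        or try_to_match(matched_share_for[server], seen)):
                    matched_share_for[server] = share
                    return True
            return False

        return sum(1 for share in share_to_servers
                   if try_to_match(share, set()))

If this number reaches the configured threshold (7 with the defaults
described in the next paragraph), the upload is considered successful.
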
The current defaults use k=3, servers_of_happiness=7, and N=10. N=10 means
that we'll try to place 10 shares. k=3 means that we need any three shares to
recover the file. servers_of_happiness=7 means that we'll consider the upload
to be successful if we can place shares on enough servers that there are 7
different servers, the correct functioning of any k of which guarantees the
availability of the file.

N=10 and k=3 means there is a 3.3x expansion factor: the file is stored as N
shares, each about 1/k of the original size, so it takes up roughly N/k times
as much space on the grid. On a small grid, you should set N about equal to
the number of storage servers in your grid; on a large grid, you might set it
to something smaller to avoid the overhead of contacting every server to
place a file. In either case, you should then set k such that N/k reflects
your desired availability goals. The best value for servers_of_happiness will
depend on how you use Tahoe-LAFS. In a friendnet with a variable number of
servers, it might make sense to set it to the smallest number of servers that
you expect to have online and accepting shares at any given time.

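As a quick check of the arithmetic (plain Python, nothing Tahoe-specific),
the expansion factor is just N divided by k:

    >>> N, k = 10, 3
    >>> round(N / float(k), 1)
    3.3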