docs: update docs/architecture.txt to more fully and correctly explain the upload procedure

This commit is contained in:
Zooko O'Whielacronx 2010-05-13 21:34:58 -07:00
parent e225f573b9
commit 77aabe7066


@@ -139,63 +139,67 @@ key-value layer.
SERVER SELECTION

When a file is uploaded, the encoded shares are sent to some servers. But to
which ones? The "server selection" algorithm is used to make this choice.

The storage index is used to consistently-permute the set of all server nodes
(by sorting them by HASH(storage_index+nodeid)). Each file gets a different
permutation, which (on average) will evenly distribute shares among the grid
and avoid hotspots. Each server has announced its available space when it
connected to the introducer, and we use that available space information to
remove any servers that cannot hold an encoded share for our file. Then we ask
some of the servers thus removed if they are already holding any encoded
shares for our file; we use this information later. (We ask any servers which
are in the first 2*N elements of the permuted list.)

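
The consistent permutation can be sketched in a few lines of Python. This is
an illustration, not Tahoe-LAFS's actual code: SHA-256 stands in for the
project's tagged hash, and the server-id byte strings are made up; the only
property that matters is that every uploader derives the same ordering for the
same storage index.

```python
import hashlib

def permute_servers(storage_index, server_ids):
    """Sort server ids by HASH(storage_index + nodeid), yielding a
    per-file ordering that every node computes identically."""
    return sorted(server_ids,
                  key=lambda nodeid: hashlib.sha256(storage_index + nodeid).digest())

servers = [b"server-a", b"server-b", b"server-c"]
ordering = permute_servers(b"storage-index-1", servers)
# The same storage index always yields the same ordering, regardless of the
# input order; a different storage index generally yields a different one,
# which spreads load across the grid.
```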
We then use the permuted list of servers to ask each server, in turn, if it
will hold a share for us (a share that was not reported as being already
present when we talked to the full servers earlier, and that we have not
already planned to upload to a different server). We plan to send a share to a
server by sending an 'allocate_buckets() query' to the server with the number
of that share. Some will say yes, they can hold that share; others (those who
have become full since they announced their available space) will say no; when
a server refuses our request, we take that share to the next server on the
list. In the response to allocate_buckets() the server will also inform us of
any shares of that file that it already has. We keep going until we run out of
shares that need to be stored. At the end of the process, we'll have a table
that maps each share number to a server, and then we can begin the encode and
push phase, using the table to decide where each share should be sent.

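
The first pass of that loop can be sketched as follows. This is a simplified
illustration: `will_accept` is a hypothetical stand-in for the real
allocate_buckets() remote query, and the sketch ignores the already-present
and already-planned shares described above.

```python
def place_shares(share_numbers, permuted_servers, will_accept):
    """Offer one share to each server in permuted order; a refusal
    carries that share forward to the next server on the list."""
    placement = {}             # share number -> server: the encode+push table
    unplaced = list(share_numbers)
    for server in permuted_servers:
        if not unplaced:
            break
        if will_accept(server, unplaced[0]):
            placement[unplaced.pop(0)] = server
    return placement, unplaced  # leftovers go to a second pass

full = {"server-x"}             # pretend server-x filled up after announcing
placement, leftover = place_shares(
    range(3), ["server-x", "server-y", "server-z"],
    will_accept=lambda server, share: server not in full)
# Share 0 skips the full server-x and lands on server-y; share 1 lands on
# server-z; share 2 is left over for a second pass around the list.
```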
Most of the time, this will result in one share per server, which gives us
maximum reliability. If there are fewer writable servers than there are
unstored shares, we'll be forced to loop around, eventually giving multiple
shares to a single server.

If we have to loop through the node list a second time, we accelerate the
query process by asking each node to hold multiple shares on the second pass.
In most cases, this means we'll never send more than two queries to any given
node.

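
The second-pass acceleration can be sketched like this. Again `will_accept`
is a hypothetical stand-in for allocate_buckets(), and the even batch-sizing
heuristic is an assumption made for illustration; the point is only that
batching keeps each server to at most one extra query.

```python
import math

def second_pass(leftover, servers, will_accept):
    """Offer each server a whole batch of the still-unplaced shares,
    instead of one query per share."""
    placement = {}
    remaining = list(leftover)
    if not servers:
        return placement, remaining
    batch = math.ceil(len(remaining) / len(servers))
    for server in servers:
        accepted = [s for s in remaining[:batch] if will_accept(server, s)]
        for share in accepted:
            placement[share] = server
            remaining.remove(share)
    return placement, remaining

placement, remaining = second_pass([7, 8, 9], ["server-y", "server-z"],
                                   will_accept=lambda server, share: True)
# server-y takes shares 7 and 8 in a single query; server-z takes share 9.
```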
If a server is unreachable, or has an error, or refuses to accept any of our
shares, we remove it from the permuted list, so we won't query it again for
this file. If a server already has shares for the file we're uploading, we add
that information to the share-to-server table. This lets us do less work for
files which have been uploaded once before, while making sure we still wind up
with as many shares as we desire.

If we are unable to place every share that we want, but we still managed to
place enough shares on enough servers to achieve a condition called "servers
of happiness", then we'll do the upload anyway. If we cannot achieve "servers
of happiness", the upload is declared a failure.

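
Under the simplified description above, the success test amounts to counting
distinct servers. (This is only a sketch of the idea in this document; the
full servers-of-happiness criterion in Tahoe-LAFS is stricter, based on a
maximum matching between shares and servers.)

```python
def is_happy(placement, servers_of_happiness):
    """Simplified check: did shares land on at least
    servers_of_happiness distinct servers?"""
    return len(set(placement.values())) >= servers_of_happiness

placement = {0: "A", 1: "B", 2: "B", 3: "C"}   # four shares, three servers
# With servers_of_happiness=3 this upload succeeds; with the default of 7
# it would be declared a failure.
```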
The current defaults use k=3, servers_of_happiness=7, and N=10. N=10 means
that we'll try to place 10 shares. k=3 means that we need any three shares to
recover the file. servers_of_happiness=7 means that we'll consider the upload
to be successful if we can place shares on enough servers that there are 7
different servers, the correct functioning of any k of which guarantees the
availability of the file.

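
The expansion factor follows directly from the parameters: each of the N
shares is roughly 1/k of the file's size, so total storage is N/k times the
original. For example:

```python
k, N = 3, 10
expansion = N / k            # each share is ~1/k of the file, N shares total
print(f"{expansion:.1f}x")   # prints "3.3x", matching the figure above
```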
N=10 and k=3 means there is a 3.3x expansion factor. On a small grid, you
should set N about equal to the number of storage servers in your grid; on a should set N about equal to the number of storage servers in your grid; on a
large grid, you might set it to something smaller to avoid the overhead of large grid, you might set it to something smaller to avoid the overhead of
contacting every server to place a file. In either case, you should then set k contacting every server to place a file. In either case, you should then set k
such that N/k reflects your desired availability goals. The correct value for such that N/k reflects your desired availability goals. The best value for
servers_of_happiness will depend on how you use Tahoe-LAFS. In a friendnet with servers_of_happiness will depend on how you use Tahoe-LAFS. In a friendnet with
a variable number of servers, it might make sense to set it to the smallest a variable number of servers, it might make sense to set it to the smallest
number of servers that you expect to have online and accepting shares at any number of servers that you expect to have online and accepting shares at any