docs: update docs/architecture.txt to more fully and correctly explain the upload procedure

This commit is contained in:
Zooko O'Whielacronx 2010-05-13 21:34:58 -07:00
parent e225f573b9
commit 77aabe7066


@@ -139,63 +139,67 @@ key-value layer.
SERVER SELECTION

When a file is uploaded, the encoded shares are sent to some servers. But to
which ones? The "server selection" algorithm is used to make this choice.

The storage index is used to consistently-permute the set of all server nodes
(by sorting them by HASH(storage_index+nodeid)). Each file gets a different
permutation, which (on average) will evenly distribute shares among the grid
and avoid hotspots. Each server has announced its available space when it
connected to the introducer, and we use that available space information to
remove any servers that cannot hold an encoded share for our file. Then we ask
some of the servers thus removed if they are already holding any encoded
shares for our file; we use this information later. (We ask any servers which
are in the first 2*N elements of the permuted list.)

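
The consistent permutation can be sketched in a few lines of Python. This is
an illustration, not Tahoe-LAFS's actual code: SHA-256 stands in for the
project's tagged hash, and the server-id byte strings are made up; the only
property that matters is that every uploader derives the same ordering for the
same storage index.

```python
import hashlib

def permute_servers(storage_index, server_ids):
    """Sort server ids by HASH(storage_index + nodeid), yielding a
    per-file ordering that every node computes identically."""
    return sorted(server_ids,
                  key=lambda nodeid: hashlib.sha256(storage_index + nodeid).digest())

servers = [b"server-a", b"server-b", b"server-c"]
ordering = permute_servers(b"storage-index-1", servers)
# The same storage index always yields the same ordering, regardless of the
# input order; a different storage index generally yields a different one,
# which spreads load across the grid.
```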
We then use the permuted list of servers to ask each server, in turn, if it
will hold a share for us (a share that was not reported as being already
present when we talked to the full servers earlier, and that we have not
already planned to upload to a different server). We plan to send a share to a
server by sending an 'allocate_buckets() query' to the server with the number
of that share. Some will say yes, they can hold that share; others (those who
have become full since they announced their available space) will say no; when
a server refuses our request, we take that share to the next server on the
list. In the response to allocate_buckets() the server will also inform us of
any shares of that file that it already has. We keep going until we run out of
shares that need to be stored. At the end of the process, we'll have a table
that maps each share number to a server, and then we can begin the encode and
push phase, using the table to decide where each share should be sent.

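
The first pass of that loop can be sketched as follows. This is a simplified
illustration: `will_accept` is a hypothetical stand-in for the real
allocate_buckets() remote query, and the sketch ignores the already-present
and already-planned shares described above.

```python
def place_shares(share_numbers, permuted_servers, will_accept):
    """Offer one share to each server in permuted order; a refusal
    carries that share forward to the next server on the list."""
    placement = {}             # share number -> server: the encode+push table
    unplaced = list(share_numbers)
    for server in permuted_servers:
        if not unplaced:
            break
        if will_accept(server, unplaced[0]):
            placement[unplaced.pop(0)] = server
    return placement, unplaced  # leftovers go to a second pass

full = {"server-x"}             # pretend server-x filled up after announcing
placement, leftover = place_shares(
    range(3), ["server-x", "server-y", "server-z"],
    will_accept=lambda server, share: server not in full)
# Share 0 skips the full server-x and lands on server-y; share 1 lands on
# server-z; share 2 is left over for a second pass around the list.
```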
Most of the time, this will result in one share per server, which gives us
maximum reliability. If there are fewer writable servers than there are
unstored shares, we'll be forced to loop around, eventually giving multiple
shares to a single server.

If we have to loop through the node list a second time, we accelerate the
query process by asking each node to hold multiple shares on the second pass.
In most cases, this means we'll never send more than two queries to any given
node.

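
The second-pass acceleration can be sketched like this. Again `will_accept`
is a hypothetical stand-in for allocate_buckets(), and the even batch-sizing
heuristic is an assumption made for illustration; the point is only that
batching keeps each server to at most one extra query.

```python
import math

def second_pass(leftover, servers, will_accept):
    """Offer each server a whole batch of the still-unplaced shares,
    instead of one query per share."""
    placement = {}
    remaining = list(leftover)
    if not servers:
        return placement, remaining
    batch = math.ceil(len(remaining) / len(servers))
    for server in servers:
        accepted = [s for s in remaining[:batch] if will_accept(server, s)]
        for share in accepted:
            placement[share] = server
            remaining.remove(share)
    return placement, remaining

placement, remaining = second_pass([7, 8, 9], ["server-y", "server-z"],
                                   will_accept=lambda server, share: True)
# server-y takes shares 7 and 8 in a single query; server-z takes share 9.
```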
If a server is unreachable, or has an error, or refuses to accept any of our
shares, we remove it from the permuted list, so we won't query it again for
this file. If a server already has shares for the file we're uploading, we add
that information to the share-to-server table. This lets us do less work for
files which have been uploaded once before, while making sure we still wind up
with as many shares as we desire.

If we are unable to place every share that we want, but we still managed to
place enough shares on enough servers to achieve a condition called "servers
of happiness", then we'll do the upload anyway. If we cannot achieve "servers
of happiness", the upload is declared a failure.

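
Under the simplified description above, the success test amounts to counting
distinct servers. (This is only a sketch of the idea in this document; the
full servers-of-happiness criterion in Tahoe-LAFS is stricter, based on a
maximum matching between shares and servers.)

```python
def is_happy(placement, servers_of_happiness):
    """Simplified check: did shares land on at least
    servers_of_happiness distinct servers?"""
    return len(set(placement.values())) >= servers_of_happiness

placement = {0: "A", 1: "B", 2: "B", 3: "C"}   # four shares, three servers
# With servers_of_happiness=3 this upload succeeds; with the default of 7
# it would be declared a failure.
```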
The current defaults use k=3, servers_of_happiness=7, and N=10. N=10 means
that we'll try to place 10 shares. k=3 means that we need any three shares to
recover the file. servers_of_happiness=7 means that we'll consider the upload
to be successful if we can place shares on enough servers that there are 7
different servers, the correct functioning of any k of which guarantees the
availability of the file.

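
The expansion factor follows directly from the parameters: each of the N
shares is roughly 1/k of the file's size, so total storage is N/k times the
original. For example:

```python
k, N = 3, 10
expansion = N / k            # each share is ~1/k of the file, N shares total
print(f"{expansion:.1f}x")   # prints "3.3x", matching the figure above
```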
N=10 and k=3 means there is a 3.3x expansion factor. On a small grid, you
should set N about equal to the number of storage servers in your grid; on a should set N about equal to the number of storage servers in your grid; on a
large grid, you might set it to something smaller to avoid the overhead of large grid, you might set it to something smaller to avoid the overhead of
contacting every server to place a file. In either case, you should then set k contacting every server to place a file. In either case, you should then set k
such that N/k reflects your desired availability goals. The correct value for such that N/k reflects your desired availability goals. The best value for
servers_of_happiness will depend on how you use Tahoe-LAFS. In a friendnet with servers_of_happiness will depend on how you use Tahoe-LAFS. In a friendnet with
a variable number of servers, it might make sense to set it to the smallest a variable number of servers, it might make sense to set it to the smallest
number of servers that you expect to have online and accepting shares at any number of servers that you expect to have online and accepting shares at any