docs: formatting.

david-sarah 2010-12-12 12:11:15 -08:00
parent dfd9c8a949
commit a86724ccd8
3 changed files with 42 additions and 41 deletions

View File

@@ -24,7 +24,7 @@ There are three layers: the key-value store, the filesystem, and the
 application.
 The lowest layer is the key-value store. The keys are "capabilities" -- short
-ascii strings -- and the values are sequences of data bytes. This data is
+ASCII strings -- and the values are sequences of data bytes. This data is
 encrypted and distributed across a number of nodes, such that it will survive
 the loss of most of the nodes. There are no hard limits on the size of the
 values, but there may be performance issues with extremely large values (just
@@ -173,7 +173,7 @@ connected to the introducer, and we use that available space information to
 remove any servers that cannot hold an encoded share for our file. Then we ask
 some of the servers thus removed if they are already holding any encoded shares
 for our file; we use this information later. (We ask any servers which are in
-the first 2*N elements of the permuted list.)
+the first 2*``N`` elements of the permuted list.)
 We then use the permuted list of servers to ask each server, in turn, if it
 will hold a share for us (a share that was not reported as being already
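The permuted-list step above can be sketched in a few lines. Tahoe-LAFS derives a per-file ordering of servers by hashing each server's id together with the file's storage index; the exact hash function and id formats below are illustrative assumptions, not the project's actual wire formats:

```python
import hashlib

def permuted_server_list(storage_index: bytes, server_ids):
    """Return the servers in a per-file pseudorandom order.

    Sorting by a hash of (server id + storage index) gives every file its
    own stable permutation of the server ring, so shares spread evenly
    across servers without any central coordination.
    """
    return sorted(server_ids,
                  key=lambda sid: hashlib.sha256(sid + storage_index).digest())

# The upload logic described above would ask the first 2*N entries of this
# list whether they already hold shares for the file.
```

Because the ordering depends only on the storage index and the server ids, a downloader can recompute the same list later without any stored state.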
@@ -222,23 +222,23 @@ process reside on only one storage server. We hope to extend
 at the end of the upload process, the appropriate upload health check fails,
 the upload is considered a failure.
-The current defaults use k=3, servers_of_happiness=7, and N=10. N=10 means that
-we'll try to place 10 shares. k=3 means that we need any three shares to
-recover the file. servers_of_happiness=7 means that we'll consider an immutable
-file upload to be successful if we can place shares on enough servers that
-there are 7 different servers, the correct functioning of any k of which
-guarantee the availability of the immutable file.
+The current defaults use ``k``=3, ``servers_of_happiness``=7, and ``N``=10.
+``N``=10 means that we'll try to place 10 shares. ``k``=3 means that we need
+any three shares to recover the file. ``servers_of_happiness``=7 means that
+we'll consider an immutable file upload to be successful if we can place shares
+on enough servers that there are 7 different servers, the correct functioning
+of any ``k`` of which guarantees the availability of the immutable file.
-N=10 and k=3 means there is a 3.3x expansion factor. On a small grid, you
-should set N about equal to the number of storage servers in your grid; on a
+``N``=10 and ``k``=3 means there is a 3.3x expansion factor. On a small grid, you
+should set ``N`` about equal to the number of storage servers in your grid; on a
 large grid, you might set it to something smaller to avoid the overhead of
-contacting every server to place a file. In either case, you should then set k
-such that N/k reflects your desired availability goals. The best value for
-servers_of_happiness will depend on how you use Tahoe-LAFS. In a friendnet with
-a variable number of servers, it might make sense to set it to the smallest
+contacting every server to place a file. In either case, you should then set ``k``
+such that ``N``/``k`` reflects your desired availability goals. The best value for
+``servers_of_happiness`` will depend on how you use Tahoe-LAFS. In a friendnet
+with a variable number of servers, it might make sense to set it to the smallest
 number of servers that you expect to have online and accepting shares at any
 given time. In a stable environment without much server churn, it may make
-sense to set servers_of_happiness = N.
+sense to set ``servers_of_happiness`` = ``N``.
 When downloading a file, the current version just asks all known servers for
 any shares they might have. Once it has received enough responses that it
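The defaults discussed above (``k``=3, ``N``=10, ``servers_of_happiness``=7) can be sanity-checked with a short sketch. Note that the health check here is a deliberate simplification: Tahoe-LAFS's real servers-of-happiness criterion evaluates a bipartite matching between servers and shares, not a bare count of servers.

```python
def expansion_factor(k: int, n: int) -> float:
    """With k-out-of-n erasure coding, n shares are stored but any k of
    them suffice to rebuild the file, so storage grows by n/k."""
    return n / k

def upload_is_happy(servers_with_shares: int, happiness: int = 7) -> bool:
    """Simplified health check: the upload counts as successful if shares
    landed on at least `happiness` distinct servers. (The real criterion
    is a matching between servers and shares, not a plain count.)"""
    return servers_with_shares >= happiness

# The defaults k=3, N=10 give the 3.3x expansion factor mentioned above.
```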
@@ -260,7 +260,7 @@ times), if possible.
 clockwise from 0 with a basket. Each time it encountered a share, it put it
 in the basket; each time it encountered a server, it gave it as many shares
 from the basket as they'd accept. This reduced the number of queries
-(usually to 1) for small grids (where N is larger than the number of
+(usually to 1) for small grids (where ``N`` is larger than the number of
 nodes), but resulted in extremely non-uniform share distribution, which
 significantly hurt reliability (sometimes the permutation resulted in most
 of the shares being dumped on a single node).
@@ -395,7 +395,7 @@ which nodes ought to hold shares for this file, and to see if those nodes are
 still around and willing to provide the data. If the file is not healthy
 enough, the File Repairer is invoked to download the ciphertext, regenerate
 any missing shares, and upload them to new nodes. The goal of the File
-Repairer is to finish up with a full set of "N" shares.
+Repairer is to finish up with a full set of ``N`` shares.
 There are a number of engineering issues to be resolved here. The bandwidth,
 disk IO, and CPU time consumed by the verification/repair process must be
@@ -498,11 +498,11 @@ File encoding and peer-node selection parameters can be adjusted to achieve
 different goals. Each choice results in a number of properties; there are
 many tradeoffs.
-First, some terms: the erasure-coding algorithm is described as K-out-of-N
-(for this release, the default values are K=3 and N=10). Each grid will have
-some number of nodes; this number will rise and fall over time as nodes join,
-drop out, come back, and leave forever. Files are of various sizes, some are
-popular, others are unpopular. Nodes have various capacities, variable
+First, some terms: the erasure-coding algorithm is described as ``k``-out-of-``N``
+(for this release, the default values are ``k``=3 and ``N``=10). Each grid will
+have some number of nodes; this number will rise and fall over time as nodes
+join, drop out, come back, and leave forever. Files are of various sizes, some
+are popular, others are unpopular. Nodes have various capacities, variable
 upload/download bandwidths, and network latency. Most of the mathematical
 models that look at node failure assume some average (and independent)
 probability 'P' of a given node being available: this can be high (servers
@@ -510,14 +510,14 @@ tend to be online and available >90% of the time) or low (laptops tend to be
 turned on for an hour then disappear for several days). Files are encoded in
 segments of a given maximum size, which affects memory usage.
-The ratio of N/K is the "expansion factor". Higher expansion factors improve
-reliability very quickly (the binomial distribution curve is very sharp), but
-consumes much more grid capacity. When P=50%, the absolute value of K affects
-the granularity of the binomial curve (1-out-of-2 is much worse than
+The ratio of ``N``/``k`` is the "expansion factor". Higher expansion factors
+improve reliability very quickly (the binomial distribution curve is very sharp),
+but consume much more grid capacity. When P=50%, the absolute value of ``k``
+affects the granularity of the binomial curve (1-out-of-2 is much worse than
 50-out-of-100), but high values asymptotically approach a constant (i.e.
 500-of-1000 is not much better than 50-of-100). When P is high and the
-expansion factor is held at a constant, higher values of K and N give much
-better reliability (for P=99%, 50-out-of-100 is much much better than
+expansion factor is held at a constant, higher values of ``k`` and ``N`` give
+much better reliability (for P=99%, 50-out-of-100 is much much better than
 5-of-10, roughly 10^50 times better), because there are more shares that can
 be lost without losing the file.
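The binomial argument above can be made concrete. Assuming each server is independently available with probability p, a ``k``-out-of-``N`` file is recoverable exactly when at least ``k`` of its ``N`` shares are reachable:

```python
from math import comb

def survival_probability(k: int, n: int, p: float) -> float:
    """P(file is recoverable) = P(at least k of n independent shares are
    available), i.e. the upper tail of a Binomial(n, p) distribution."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Example (from the text): at p = 0.99, the *failure* probability of
# 50-out-of-100 is vastly smaller than that of 5-out-of-10, even though
# both have the same 2x expansion factor.
```

Comparing ``1 - survival_probability(...)`` for different (k, n) pairs at a fixed expansion factor reproduces the trade-offs described in this section.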
@@ -537,7 +537,7 @@ rate at which nodes come and go will be higher (requiring network maintenance
 traffic). Also, the File Repairer work will increase with larger grids,
 although then the job can be distributed out to more nodes.
-Higher values of N increase overhead: more shares means more Merkle hashes
+Higher values of ``N`` increase overhead: more shares means more Merkle hashes
 that must be included with the data, and more nodes to contact to retrieve
 the shares. Smaller segment sizes reduce memory usage (since each segment
 must be held in memory while erasure coding runs) and improve "alacrity"
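The Merkle-hash overhead mentioned above grows only logarithmically in ``N``: a membership proof for one leaf of a binary hash tree needs roughly log2(N) sibling hashes. This is a back-of-the-envelope sketch of that scaling, not Tahoe-LAFS's actual share format:

```python
from math import ceil, log2

def merkle_proof_hashes(num_shares: int) -> int:
    """Approximate number of sibling ("uncle") hashes each share must carry
    to prove its place in a binary Merkle tree over num_shares leaves.
    Doubling N adds about one hash per share, not double the overhead."""
    return ceil(log2(num_shares))
```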

View File

@@ -8,13 +8,13 @@ The New York Times has recently reported that the current U.S. administration
 is proposing a bill that would apparently, if passed, require communication
 systems to facilitate government wiretapping and access to encrypted data:
-http://www.nytimes.com/2010/09/27/us/27wiretap.html (login required; username/password pairs available at http://www.bugmenot.com/view/nytimes.com).
+`<http://www.nytimes.com/2010/09/27/us/27wiretap.html>`_ (login required; username/password pairs
+available at `bugmenot <http://www.bugmenot.com/view/nytimes.com>`_).
-Commentary by the Electronic Frontier Foundation
-(https://www.eff.org/deeplinks/2010/09/government-seeks ), Peter Suderman /
-Reason (http://reason.com/blog/2010/09/27/obama-administration-frustrate ),
-Julian Sanchez / Cato Institute
-(http://www.cato-at-liberty.org/designing-an-insecure-internet/ ).
+Commentary by the
+`Electronic Frontier Foundation <https://www.eff.org/deeplinks/2010/09/government-seeks>`_,
+`Peter Suderman / Reason <http://reason.com/blog/2010/09/27/obama-administration-frustrate>`_,
+`Julian Sanchez / Cato Institute <http://www.cato-at-liberty.org/designing-an-insecure-internet/>`_.
The core Tahoe developers promise never to change Tahoe-LAFS to facilitate
government access to data stored or transmitted by it. Even if it were
@@ -23,8 +23,9 @@ technically feasible to do so without severely compromising Tahoe-LAFS'
 security against other attackers. There have been many examples in which
 backdoors intended for use by government have introduced vulnerabilities
 exploitable by other parties (a notable example being the Greek cellphone
-eavesdropping scandal in 2004/5). RFCs 1984 and 2804 elaborate on the
-security case against such backdoors.
+eavesdropping scandal in 2004/5). RFCs `1984 <http://tools.ietf.org/html/rfc1984>`_
+and `2804 <http://tools.ietf.org/html/rfc2804>`_ elaborate on the security case
+against such backdoors.
Note that since Tahoe-LAFS is open-source software, forks by people other than
the current core developers are possible. In that event, we would try to

View File

@@ -140,7 +140,7 @@ starting point: some specific directory that we will refer to as a
 "starting directory". For a given starting directory, the
 "``ls [STARTING_DIR]``" command would list the contents of this directory,
 the "``ls [STARTING_DIR]/dir1``" command would look inside this directory
-for a child named "dir1" and list its contents,
+for a child named "``dir1``" and list its contents,
 "``ls [STARTING_DIR]/dir1/subdir2``" would look two levels deep, etc.
 Note that there is no real global "root" directory, but instead each
@@ -256,9 +256,9 @@ Command Syntax Summary
 In these summaries, ``PATH``, ``TOPATH`` or ``FROMPATH`` can be one of::
-* ``[SUBDIRS/]FILENAME`` for a path relative to the default ``tahoe:`` alias;
-* ``ALIAS:[SUBDIRS/]FILENAME`` for a path relative to another alias;
-* ``DIRCAP/[SUBDIRS/]FILENAME`` or ``DIRCAP:./[SUBDIRS/]FILENAME`` for a path relative to a directory cap.
+  * ``[SUBDIRS/]FILENAME`` for a path relative to the default ``tahoe:`` alias;
+  * ``ALIAS:[SUBDIRS/]FILENAME`` for a path relative to another alias;
+  * ``DIRCAP/[SUBDIRS/]FILENAME`` or ``DIRCAP:./[SUBDIRS/]FILENAME`` for a path relative to a directory cap.
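The three path forms above can be illustrated with a small parser. This is a hypothetical sketch, not Tahoe-LAFS's actual CLI internals; it assumes (as Tahoe caps do) that a ``DIRCAP`` begins with the literal prefix ``URI:``, and the function name and return shape are invented for illustration:

```python
def parse_path(arg: str, default_alias: str = "tahoe"):
    """Split a CLI path argument into (root, relative_path), where root is
    an alias name or a directory cap, covering the three forms above."""
    if arg.startswith("URI:"):
        # DIRCAP:./[SUBDIRS/]FILENAME or DIRCAP/[SUBDIRS/]FILENAME
        if ":./" in arg:
            cap, rest = arg.split(":./", 1)
        else:
            cap, _, rest = arg.partition("/")
        return cap, rest
    if ":" in arg:
        # ALIAS:[SUBDIRS/]FILENAME
        alias, _, rest = arg.partition(":")
        return alias, rest
    # bare [SUBDIRS/]FILENAME, relative to the default alias
    return default_alias, arg
```

The ``DIRCAP:./`` form exists because caps themselves contain colons, so a plain colon cannot separate a cap from its path the way it separates an alias.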
Command Examples