mirror of
https://github.com/tahoe-lafs/tahoe-lafs.git
synced 2024-12-23 14:52:26 +00:00
docs: formatting.
This commit is contained in:
parent dfd9c8a949, commit a86724ccd8
@@ -24,7 +24,7 @@ There are three layers: the key-value store, the filesystem, and the
 application.
 
 The lowest layer is the key-value store. The keys are "capabilities" -- short
-ascii strings -- and the values are sequences of data bytes. This data is
+ASCII strings -- and the values are sequences of data bytes. This data is
 encrypted and distributed across a number of nodes, such that it will survive
 the loss of most of the nodes. There are no hard limits on the size of the
 values, but there may be performance issues with extremely large values (just
@@ -173,7 +173,7 @@ connected to the introducer, and we use that available space information to
 remove any servers that cannot hold an encoded share for our file. Then we ask
 some of the servers thus removed if they are already holding any encoded shares
 for our file; we use this information later. (We ask any servers which are in
-the first 2*N elements of the permuted list.)
+the first 2*``N`` elements of the permuted list.)
 
 We then use the permuted list of servers to ask each server, in turn, if it
 will hold a share for us (a share that was not reported as being already
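The per-file permuted server list mentioned in this hunk can be sketched in a few lines of Python. This is an illustrative toy, not Tahoe's exact construction (the real code uses its own keyed hash of server ids and the storage index); the function and variable names here are hypothetical.

```python
import hashlib

def permuted_servers(storage_index: bytes, server_ids: list) -> list:
    """Order servers pseudo-randomly, but deterministically per file."""
    # Sorting by hash(server_id + storage_index) gives every file its own
    # rotation of the server ring, so load spreads across the grid.
    return sorted(server_ids,
                  key=lambda sid: hashlib.sha256(sid + storage_index).digest())

servers = [b"server-%d" % i for i in range(20)]
ring = permuted_servers(b"<storage index of our file>", servers)

N = 10
# The "first 2*N elements" of the permuted list are the servers asked
# whether they already hold shares for this file.
already_holding_candidates = ring[:2 * N]
```

Because the ordering depends on the storage index, two different files see two different ring rotations, which is what keeps early servers from absorbing every upload.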
@@ -222,23 +222,23 @@ process reside on only one storage server. We hope to extend
 at the end of the upload process, the appropriate upload health check fails,
 the upload is considered a failure.
 
-The current defaults use k=3, servers_of_happiness=7, and N=10. N=10 means that
-we'll try to place 10 shares. k=3 means that we need any three shares to
-recover the file. servers_of_happiness=7 means that we'll consider an immutable
-file upload to be successful if we can place shares on enough servers that
-there are 7 different servers, the correct functioning of any k of which
-guarantee the availability of the immutable file.
+The current defaults use ``k``=3, ``servers_of_happiness``=7, and ``N``=10.
+``N``=10 means that we'll try to place 10 shares. ``k``=3 means that we need
+any three shares to recover the file. ``servers_of_happiness``=7 means that
+we'll consider an immutable file upload to be successful if we can place shares
+on enough servers that there are 7 different servers, the correct functioning
+of any ``k`` of which guarantee the availability of the immutable file.
 
-N=10 and k=3 means there is a 3.3x expansion factor. On a small grid, you
-should set N about equal to the number of storage servers in your grid; on a
+``N``=10 and ``k``=3 means there is a 3.3x expansion factor. On a small grid, you
+should set ``N`` about equal to the number of storage servers in your grid; on a
 large grid, you might set it to something smaller to avoid the overhead of
-contacting every server to place a file. In either case, you should then set k
-such that N/k reflects your desired availability goals. The best value for
-servers_of_happiness will depend on how you use Tahoe-LAFS. In a friendnet with
-a variable number of servers, it might make sense to set it to the smallest
+contacting every server to place a file. In either case, you should then set ``k``
+such that ``N``/``k`` reflects your desired availability goals. The best value for
+``servers_of_happiness`` will depend on how you use Tahoe-LAFS. In a friendnet
+with a variable number of servers, it might make sense to set it to the smallest
 number of servers that you expect to have online and accepting shares at any
 given time. In a stable environment without much server churn, it may make
-sense to set servers_of_happiness = N.
+sense to set ``servers_of_happiness`` = ``N``.
 
 When downloading a file, the current version just asks all known servers for
 any shares they might have. Once it has received enough responses that it
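The defaults discussed in this hunk can be sanity-checked with a small Python sketch. Note that the real servers-of-happiness criterion is a maximum bipartite matching between servers and shares; the distinct-server count below covers only the simple case where no server holds a duplicate of a share it is being credited for, and all names are illustrative.

```python
k, servers_of_happiness, N = 3, 7, 10     # the current defaults

# N/k is the expansion factor: roughly 3.3x the file size is stored.
expansion_factor = N / k

def happy_enough(placements: dict, happiness: int) -> bool:
    # placements maps share number -> server id. Counting distinct servers
    # is a simplification of the real maximum-matching computation.
    return len(set(placements.values())) >= happiness

# 10 shares spread over 8 distinct servers: enough for happiness=7.
placements = {sharenum: "server-%d" % (sharenum % 8) for sharenum in range(N)}
upload_succeeded = happy_enough(placements, servers_of_happiness)
```

With these defaults, any 3 of the 10 placed shares reconstruct the file, and the upload only counts as healthy when at least 7 distinct servers contribute to that guarantee.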
@@ -260,7 +260,7 @@ times), if possible.
 clockwise from 0 with a basket. Each time it encountered a share, it put it
 in the basket, each time it encountered a server, give it as many shares
 from the basket as they'd accept. This reduced the number of queries
-(usually to 1) for small grids (where N is larger than the number of
+(usually to 1) for small grids (where ``N`` is larger than the number of
 nodes), but resulted in extremely non-uniform share distribution, which
 significantly hurt reliability (sometimes the permutation resulted in most
 of the shares being dumped on a single node).
@@ -395,7 +395,7 @@ which nodes ought to hold shares for this file, and to see if those nodes are
 still around and willing to provide the data. If the file is not healthy
 enough, the File Repairer is invoked to download the ciphertext, regenerate
 any missing shares, and upload them to new nodes. The goal of the File
-Repairer is to finish up with a full set of "N" shares.
+Repairer is to finish up with a full set of ``N`` shares.
 
 There are a number of engineering issues to be resolved here. The bandwidth,
 disk IO, and CPU time consumed by the verification/repair process must be
@@ -498,11 +498,11 @@ File encoding and peer-node selection parameters can be adjusted to achieve
 different goals. Each choice results in a number of properties; there are
 many tradeoffs.
 
-First, some terms: the erasure-coding algorithm is described as K-out-of-N
-(for this release, the default values are K=3 and N=10). Each grid will have
-some number of nodes; this number will rise and fall over time as nodes join,
-drop out, come back, and leave forever. Files are of various sizes, some are
-popular, others are unpopular. Nodes have various capacities, variable
+First, some terms: the erasure-coding algorithm is described as ``k``-out-of-``N``
+(for this release, the default values are ``k``=3 and ``N``=10). Each grid will
+have some number of nodes; this number will rise and fall over time as nodes
+join, drop out, come back, and leave forever. Files are of various sizes, some
+are popular, others are unpopular. Nodes have various capacities, variable
 upload/download bandwidths, and network latency. Most of the mathematical
 models that look at node failure assume some average (and independent)
 probability 'P' of a given node being available: this can be high (servers
@@ -510,14 +510,14 @@ tend to be online and available >90% of the time) or low (laptops tend to be
 turned on for an hour then disappear for several days). Files are encoded in
 segments of a given maximum size, which affects memory usage.
 
-The ratio of N/K is the "expansion factor". Higher expansion factors improve
-reliability very quickly (the binomial distribution curve is very sharp), but
-consumes much more grid capacity. When P=50%, the absolute value of K affects
-the granularity of the binomial curve (1-out-of-2 is much worse than
+The ratio of ``N``/``k`` is the "expansion factor". Higher expansion factors
+improve reliability very quickly (the binomial distribution curve is very sharp),
+but consumes much more grid capacity. When P=50%, the absolute value of ``k``
+affects the granularity of the binomial curve (1-out-of-2 is much worse than
 50-out-of-100), but high values asymptotically approach a constant (i.e.
 500-of-1000 is not much better than 50-of-100). When P is high and the
-expansion factor is held at a constant, higher values of K and N give much
-better reliability (for P=99%, 50-out-of-100 is much much better than
+expansion factor is held at a constant, higher values of ``k`` and ``N`` give
+much better reliability (for P=99%, 50-out-of-100 is much much better than
 5-of-10, roughly 10^50 times better), because there are more shares that can
 be lost without losing the file.
 
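The reliability comparisons in this hunk come straight from the binomial tail. A small independent-failure model in Python (an illustrative sketch, not code from the Tahoe codebase) reproduces the shape of the argument:

```python
from math import comb

def p_recoverable(p: float, k: int, n: int) -> float:
    # A k-out-of-N file survives iff at least k of its n shares sit on
    # available servers; with independent per-server availability p this
    # is a binomial tail probability.
    return sum(comb(n, i) * (p ** i) * ((1 - p) ** (n - i))
               for i in range(k, n + 1))

# Same 2x expansion factor, different granularity, reliable servers (P=99%):
failure_5_of_10   = 1 - p_recoverable(0.99, 5, 10)
failure_50_of_100 = 1 - p_recoverable(0.99, 50, 100)
# failure_50_of_100 is vastly smaller: more shares can be lost before the
# file becomes unrecoverable.
```

Plugging in the defaults (``k``=3, ``N``=10) at P=99% gives an availability indistinguishable from 1 in double precision, which is why the text focuses on the expansion-factor and granularity tradeoffs rather than raw numbers.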
@@ -537,7 +537,7 @@ rate at which nodes come and go will be higher (requiring network maintenance
 traffic). Also, the File Repairer work will increase with larger grids,
 although then the job can be distributed out to more nodes.
 
-Higher values of N increase overhead: more shares means more Merkle hashes
+Higher values of ``N`` increase overhead: more shares means more Merkle hashes
 that must be included with the data, and more nodes to contact to retrieve
 the shares. Smaller segment sizes reduce memory usage (since each segment
 must be held in memory while erasure coding runs) and improves "alacrity"
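The Merkle-hash overhead mentioned above grows only logarithmically in ``N``. A back-of-the-envelope sketch (a generic binary hash tree, not Tahoe's exact hash-tree layout):

```python
from math import ceil, log2

def merkle_path_length(n_shares: int) -> int:
    # To verify one share against the root hash, a client needs one
    # sibling hash per level of a binary tree built over the shares.
    return ceil(log2(n_shares)) if n_shares > 1 else 0

hashes_n10  = merkle_path_length(10)    # default N=10
hashes_n100 = merkle_path_length(100)   # 10x more shares, modestly longer paths
```

So raising ``N`` from 10 to 100 lengthens each share's verification path by only a few hashes, while the per-upload cost of contacting more nodes grows linearly.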
@@ -8,13 +8,13 @@ The New York Times has recently reported that the current U.S. administration
 is proposing a bill that would apparently, if passed, require communication
 systems to facilitate government wiretapping and access to encrypted data:
 
-http://www.nytimes.com/2010/09/27/us/27wiretap.html (login required; username/password pairs available at http://www.bugmenot.com/view/nytimes.com).
+`<http://www.nytimes.com/2010/09/27/us/27wiretap.html>`_ (login required; username/password pairs
+available at `bugmenot <http://www.bugmenot.com/view/nytimes.com>`_).
 
-Commentary by the Electronic Frontier Foundation
-(https://www.eff.org/deeplinks/2010/09/government-seeks ), Peter Suderman /
-Reason (http://reason.com/blog/2010/09/27/obama-administration-frustrate ),
-Julian Sanchez / Cato Institute
-(http://www.cato-at-liberty.org/designing-an-insecure-internet/ ).
+Commentary by the
+`Electronic Frontier Foundation <https://www.eff.org/deeplinks/2010/09/government-seeks>`_,
+`Peter Suderman / Reason <http://reason.com/blog/2010/09/27/obama-administration-frustrate>`_,
+`Julian Sanchez / Cato Institute <http://www.cato-at-liberty.org/designing-an-insecure-internet/>`_.
 
 The core Tahoe developers promise never to change Tahoe-LAFS to facilitate
 government access to data stored or transmitted by it. Even if it were
@@ -23,8 +23,9 @@ technically feasible to do so without severely compromising Tahoe-LAFS'
 security against other attackers. There have been many examples in which
 backdoors intended for use by government have introduced vulnerabilities
 exploitable by other parties (a notable example being the Greek cellphone
-eavesdropping scandal in 2004/5). RFCs 1984 and 2804 elaborate on the
-security case against such backdoors.
+eavesdropping scandal in 2004/5). RFCs `1984 <http://tools.ietf.org/html/rfc1984>`_
+and `2804 <http://tools.ietf.org/html/rfc2804>`_ elaborate on the security case
+against such backdoors.
 
 Note that since Tahoe-LAFS is open-source software, forks by people other than
 the current core developers are possible. In that event, we would try to
@@ -140,7 +140,7 @@ starting point: some specific directory that we will refer to as a
 "starting directory". For a given starting directory, the
 "``ls [STARTING_DIR]``" command would list the contents of this directory,
 the "``ls [STARTING_DIR]/dir1``" command would look inside this directory
-for a child named "dir1" and list its contents,
+for a child named "``dir1``" and list its contents,
 "``ls [STARTING_DIR]/dir1/subdir2``" would look two levels deep, etc.
 
 Note that there is no real global "root" directory, but instead each
@@ -256,9 +256,9 @@ Command Syntax Summary
 
 In these summaries, ``PATH``, ``TOPATH`` or ``FROMPATH`` can be one of::
 
-* ``[SUBDIRS/]FILENAME`` for a path relative to the default ``tahoe:`` alias;
-* ``ALIAS:[SUBDIRS/]FILENAME`` for a path relative to another alias;
-* ``DIRCAP/[SUBDIRS/]FILENAME`` or ``DIRCAP:./[SUBDIRS/]FILENAME`` for a path relative to a directory cap.
+  * ``[SUBDIRS/]FILENAME`` for a path relative to the default ``tahoe:`` alias;
+  * ``ALIAS:[SUBDIRS/]FILENAME`` for a path relative to another alias;
+  * ``DIRCAP/[SUBDIRS/]FILENAME`` or ``DIRCAP:./[SUBDIRS/]FILENAME`` for a path relative to a directory cap.
 
 
 Command Examples