architecture.txt: update to include tahoe2, dirnodes, leases

Brian Warner 2007-09-17 18:24:48 -07:00
parent 645927ca73
commit 77d973471b


@@ -182,71 +182,75 @@ set of all peers (by sorting the peers by HASH(verifierid+peerid)). Each file
gets a different permutation, which (on average) will evenly distribute
shares among the grid and avoid hotspots.

We use this permuted list of peers to ask each peer, in turn, if it will hold
on to a share for us, by sending an 'allocate_buckets() query' to each one.
Some will say yes, others (those who are full) will say no: when a peer
refuses our request, we just take that share to the next peer on the list. We
keep going until we run out of shares to place. At the end of the process,
we'll have a table that maps each share number to a peer, and then we can
begin the encode+push phase, using the table to decide where each share
should be sent.

Most of the time, this will result in one share per peer, which gives us
maximum reliability (since it disperses the failures as widely as possible).
If there are fewer usable peers than there are shares, we'll be forced to
loop around, eventually giving multiple shares to a single peer. This reduces
reliability, so it isn't the sort of thing we want to happen all the time,
and either indicates that the default encoding parameters are set incorrectly
(creating more shares than you have peers), or that the grid does not have
enough space (many peers are full). But apart from that, it doesn't hurt. If
we have to loop through the peer list a second time, we accelerate the query
process by asking each peer to hold multiple shares on the second pass. In
most cases, this means we'll never send more than two queries to any given
peer.

If a peer is unreachable, or has an error, or refuses to accept any of our
shares, we remove them from the permuted list, so we won't query them a
second time for this file. If a peer already has shares for the file we're
uploading (or if someone else is currently sending them shares), we add that
information to the share-to-peer table. This lets us do less work for files
which have been uploaded once before, while making sure we still wind up with
as many shares as we desire.
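
The following rough Python sketch illustrates the selection loop described
above. It is illustrative only: the peer objects, the exact signature of the
allocate_buckets() query, and the error handling are assumptions, not the
real storage protocol.

    import hashlib

    def permuted_peers(peers, verifierid):
        # rank each peer by HASH(verifierid+peerid); every file therefore
        # sees the peers in a different, but repeatable, order
        return sorted(peers,
                      key=lambda p: hashlib.sha256(verifierid + p.peerid).digest())

    def place_shares(peers, verifierid, num_shares):
        landlords = {}                        # share number -> peer holding it
        remaining = list(range(num_shares))   # shares still needing a home
        candidates = permuted_peers(peers, verifierid)
        while remaining and candidates:
            progress = False
            for peer in list(candidates):
                if not remaining:
                    break
                sharenum = remaining[0]
                try:
                    # hypothetical query; the real one can ask about several
                    # shares at once (especially on the second pass), and also
                    # reports any shares the peer already has from an earlier
                    # upload, which go straight into the landlords table
                    accepted = peer.allocate_buckets(verifierid, [sharenum])
                except Exception:
                    candidates.remove(peer)   # unreachable or erroring: drop it
                    continue
                if sharenum in accepted:
                    landlords[sharenum] = peer
                    remaining.remove(sharenum)
                    progress = True
                else:
                    candidates.remove(peer)   # full: don't ask this peer again
            if not progress:
                break
        return landlords                      # may be short if the grid is full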

If we are unable to place every share that we want, but we still managed to
place a quantity known as "shares of happiness", we'll do the upload anyway.
If we cannot place at least this many, the upload is declared a failure.

The current defaults use k=3, shares_of_happiness=7, and N=10, meaning that
we'll try to place 10 shares, we'll be happy if we can place 7, and we need
to get back any 3 to recover the file. This results in a 3.3x expansion
factor. In general, you should set N about equal to the number of peers in
your grid, then set N/k to achieve your desired availability goals.
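
To spell out the arithmetic behind that figure: a file is stored as N shares,
any k of which are enough to rebuild it, so the total storage consumed
(across all storage servers) is N/k times the size of the original file:

    expansion factor = N / k = 10 / 3 = 3.33  (about 3.3x)

The 25-of-100 encoding mentioned below for larger networks would give
100 / 25 = 4x instead.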

When downloading a file, the current release just asks all known peers for
any shares they might have, chooses the minimal necessary subset, then starts
downloading and processing those shares. A later release will use the full
algorithm to reduce the number of queries that must be sent out. This
algorithm uses the same consistent-hashing permutation as on upload, but
stops after it has located k shares (instead of all N). This reduces the
number of queries that must be sent before downloading can begin.
All the "A" list peers are asked for any shares they might have. If enough of The actual number of queries is directly related to the availability of the
them can provide a share, the download phase begins and those shares are peers and the degree of overlap between the peerlist used at upload and at
retrieved and decoded. If not, the "B" list peers are contacted, etc. This download. For stable grids, this overlap is very high, and usually the first
routine will eventually find all the peers that have shares, and will find k queries will result in shares. The number of queries grows as the stability
them quickly if there is significant overlap between the set of peers that decreases. Some limits may be imposed in large grids to avoid querying a
were present when the file was uploaded and the set of peers that are present million peers; this provides a tradeoff between the work spent to discover
as it is downloaded (i.e. if the "peerlist stability" is high). Some limits that a file is unrecoverable and the probability that a retrieval will fail
may be imposed in large grids to avoid querying a million peers; this when it could have succeeded if we had just tried a little bit harder. The
provides a tradeoff between the work spent to discover that a file is appropriate value of this tradeoff will depend upon the size of the grid, and
unrecoverable and the probability that a retrieval will fail when it could will change over time.
have succeeded if we had just tried a little bit harder. The appropriate
value of this tradeoff will depend upon the size of the grid, and will change
over time.
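
To make the k-out-of-N stopping rule concrete, here is a similarly
hypothetical sketch of the download-side query loop. It reuses the
permuted_peers() helper from the upload sketch above, and the
get_share_numbers() call is an assumed name, not the actual remote interface.

    def locate_shares(peers, verifierid, k, query_limit=None):
        found = {}                       # share number -> a peer holding it
        queries = 0
        for peer in permuted_peers(peers, verifierid):
            if query_limit is not None and queries >= query_limit:
                break                    # give up: declare the file unrecoverable
            queries += 1
            try:
                sharenums = peer.get_share_numbers(verifierid)
            except Exception:
                continue                 # unreachable peers are simply skipped
            for s in sharenums:
                found.setdefault(s, peer)
            if len(found) >= k:
                break                    # enough shares located: start downloading
        return found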

Other peer selection algorithms are possible. One earlier version (known as
"tahoe 3") used the permutation to place the peers around a large ring,
distributed shares evenly around the same ring, then walked clockwise from 0
with a basket: each time we encounter a share, put it in the basket, each
time we encounter a peer, give them as many shares from our basket as they'll
accept. This reduced the number of queries (usually to 1) for small grids
(where N is larger than the number of peers), but resulted in extremely
non-uniform share distribution, which significantly hurt reliability
(sometimes the permutation resulted in most of the shares being dumped on a
single peer).

Another algorithm (known as "denver airport"[2]) uses the permuted hash to
decide on an approximate target for each share, then sends lease requests via

@@ -262,75 +266,177 @@ SWARMING DOWNLOAD, TRICKLING UPLOAD

Because the shares being downloaded are distributed across a large number of
peers, the download process will pull from many of them at the same time. The
current encoding parameters require 3 shares to be retrieved for each
segment, which means that up to 3 peers will be used simultaneously. For
larger networks, 25-of-100 encoding is preferred, meaning 25 peers can be
used simultaneously. This allows the download process to use the sum of the
available peers' upload bandwidths, resulting in downloads that take full
advantage of the common 8x disparity between download and upload bandwidth on
modern ADSL lines.

On the other hand, uploads are hampered by the need to upload encoded shares
that are larger than the original data (3.3x larger with the current default
encoding parameters), through the slow end of the asymmetric connection. This
means that on a typical 8x ADSL line, uploading a file will take about 26
times longer than downloading it again later.
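
That figure is simply the product of the two penalties described above: the
expanded data has to travel over the slow direction of the link.

    upload time / download time = expansion factor * download/upload disparity
                                = 3.3 * 8 = about 26

(With 25-of-100 encoding and its 4x expansion, the same product is
4 * 8 = 32.)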

Smaller expansion ratios can reduce this upload penalty, at the expense of
reliability. See RELIABILITY, below. A project known as "offloaded uploading"
can eliminate the penalty, if there is a node somewhere else in the network
that is willing to do the work of encoding and upload for you.

VDRIVE and DIRNODES: THE VIRTUAL DRIVE LAYER

The "virtual drive" layer is responsible for mapping human-meaningful The "virtual drive" layer is responsible for mapping human-meaningful
pathnames (directories and filenames) to pieces of data. The actual bytes pathnames (directories and filenames) to pieces of data. The actual bytes
inside these files are referenced by URI, but the "filetree" is where the inside these files are referenced by URI, but the "vdrive" is where the
directory names, file names, and metadata are kept. directory names, file names, and metadata are kept.

In the current release, the virtual drive is a graph of "dirnodes". Each
dirnode represents a single directory, and thus contains a table of named
children. These children are either other dirnodes or actual files. All
children are referenced by their URI. Each client creates a "private vdrive"
dirnode at startup. The clients also receive access to a "global vdrive"
dirnode from the central introducer/vdrive server, which is shared between
all clients and serves as an easy demonstration of having multiple writers
for a single dirnode.

The dirnode itself has two forms of URI: one is read-write and the other is
read-only. The table of children inside the dirnode has a read-write and
read-only URI for each child. If you have a read-only URI for a given
dirnode, you will not be able to access the read-write URI of the children.
This results in "transitively read-only" dirnode access.
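
A minimal sketch of the data structure involved (the field names and methods
are illustrative, not the actual dirnode format):

    class Dirnode:
        def __init__(self):
            # each child is known by both of its URIs; writers need the
            # read-write URI, readers only ever see the read-only one
            self.children = {}   # child name -> (rw_uri, ro_uri)

        def list_children(self, readonly):
            # a holder of this dirnode's read-only URI never learns the
            # read-write URIs of the children, which is what makes the
            # read-only property transitive
            if not readonly:
                return dict(self.children)
            return dict((name, (None, ro_uri))
                        for name, (rw_uri, ro_uri) in self.children.items())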

By having two different URIs, you can choose which you want to share with
someone else. If you create a new directory and share the read-write URI for
it with a friend, then you will both be able to modify its contents. If
instead you give them the read-only URI, then they will *not* be able to
modify the contents. Any URI that you receive can be attached to any dirnode
that you can modify, so very powerful shared+published directory structures
can be built from these components.

This structure enables individual users to have their own personal space, with
links to spaces that are shared with specific other users, and other spaces
that are globally visible. Eventually the application layer will present
these pieces in a way that allows the sharing of a specific file or the
creation of a "virtual CD" as easily as dragging a folder onto a user icon.

In the current release, these dirnodes are *not* distributed. Instead, each
dirnode lives on a single host, in a file on its local (physical) disk. In
addition, all dirnodes are on the same host, known as the "Introducer And
VDrive Node". This simplifies implementation and consistency, but obviously
has a drastic effect on reliability: the file data can survive multiple host
failures, but the vdrive that points to that data cannot. Fixing this
situation is a high priority task.

LEASES, REFRESHING, GARBAGE COLLECTION

Shares are uploaded to a storage server, but they do not necessarily stay
there forever. We are anticipating three main share-lifetime management modes
for Tahoe: 1) per-share leases which expire, 2) per-account timers which
expire and cancel all leases for the account, and 3) centralized account
management without expiration timers.

Multiple clients may be interested in a given share, for example if two
clients uploaded the same file, or if two clients are sharing a directory and
both want to make sure the files therein remain available. Consequently, each
share (technically each "bucket", which may contain multiple shares for a
single storage index) has a set of leases, one per client. One way to
visualize this is with a large table, with shares (i.e. buckets, or storage
indices, or files) as the rows, and accounts as columns. Each square of this
table might hold a lease.
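
A minimal sketch of that table (the names are illustrative only; the real
server is free to keep its lease records in whatever form is convenient):

    class LeaseTable:
        def __init__(self):
            self.leases = {}   # (storage_index, account) -> expiration time

        def add_or_renew(self, storage_index, account, expires):
            self.leases[(storage_index, account)] = expires

        def cancel(self, storage_index, account):
            self.leases.pop((storage_index, account), None)

        def bucket_is_unwanted(self, storage_index):
            # a bucket (and the shares in it) can be deleted once no account
            # holds a lease on it any longer
            return not any(si == storage_index for (si, _) in self.leases)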

Using limited-duration leases reduces the storage consumed by clients who
have (for whatever reason) forgotten about the share they once cared about.
Clients are supposed to explicitly cancel leases for every file that they
remove from their vdrive, and when the last lease is removed on a share, the
storage server deletes that share. However, the storage server might be
offline when the client deletes the file, or the client might experience a
bug or a race condition that results in forgetting about the file. Using
leases that expire unless otherwise renewed ensures that these lost files
will not consume storage space forever. On the other hand, they require
periodic maintenance, which can become prohibitively expensive for large
grids. In addition, clients who go offline for a while are then obligated to
get someone else to keep their files alive for them.

In the first mode, each client holds a limited-duration lease on each share
(typically one month), and clients are obligated to periodically renew these
leases to keep them from expiring (typically once a week). In this mode, the
storage server does not know anything about which client is which: it only
knows about leases.

In the second mode, each server maintains a list of clients and which leases
they hold. This is called the "account list", and each time a client wants to
upload a share or establish a lease, it provides credentials to allow the
server to know which Account it will be using. Rather than putting individual
timers on each lease, the server puts a timer on the Account. When the
account expires, all of the associated leases are cancelled.

In this mode, clients are obligated to renew the Account periodically, but
not the (thousands of) individual share leases. Clients which forget about
files are still incurring a storage cost for those files. An occasional
reconciliation process (in which the client presents the storage server with
a list of all the files it cares about, and the server removes leases for
anything that isn't on the list) can be used to free this storage, but the
effort involved is large, so reconciliation must be done very infrequently.

Our plan is to have the clients create their own Accounts, based upon the
possession of a private key. Clients can create as many accounts as they
wish, but they are responsible for their own maintenance. Servers can add up
all the leases for each account and present a report of usage, in bytes per
account. This is intended for friendnet scenarios where it would be nice to
know how much space your friends are consuming on your disk.

In the third mode, the Account objects are centrally managed, and are not
expired by the storage servers. In this mode, the client presents credentials
that are issued by a central authority, such as a signed message which the
storage server can verify. The storage used by this account is not freed
unless and until the central account manager says so.

This mode is more appropriate for a commercial offering, in which use of the
storage servers is contingent upon a monthly fee, or other membership
criteria. Being able to ask the storage usage for each account (or establish
limits on it) helps to enforce whatever kind of membership policy is desired.

Each lease is created with a pair of secrets: the "renew secret" and the
"cancel secret". These are just random-looking strings, derived by hashing
other higher-level secrets, starting with a per-client master secret. Anyone
who knows the secret is allowed to restart the expiration timer, or cancel
the lease altogether. Having these be individual values allows the original
uploading node to delegate these capabilities to others.
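
A sketch of that derivation (the tag strings, and the collapsing of the
intermediate higher-level secrets into a single step, are assumptions for
illustration, not the real derivation schedule):

    import hashlib

    def tagged_hash(tag, *values):
        h = hashlib.sha256()
        h.update(tag)
        for v in values:
            h.update(v)
        return h.digest()

    def lease_secrets(client_master_secret, storage_index):
        # derived, never stored: the client can recompute these at will, and
        # anyone it hands them to gains the power to renew or cancel the lease
        renew_secret = tagged_hash(b"lease-renew-secret",
                                   client_master_secret, storage_index)
        cancel_secret = tagged_hash(b"lease-cancel-secret",
                                    client_master_secret, storage_index)
        return renew_secret, cancel_secret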

In the current release, clients provide lease secrets to the storage server,
and each lease contains an expiration time, but there is no facility to
actually expire leases, nor are there explicit owners (the "ownerid" field of
each lease is always set to zero). In addition, many features have not been
implemented yet: the client should claim leases on files which are added to
the vdrive by linking (as opposed to uploading), and the client should cancel
leases on files which are removed from the vdrive, but neither has been
written yet. This means that shares are not ever deleted in this release.

FILE REPAIRER

Shares may go away because the storage server hosting them has suffered a
failure: either temporary downtime (affecting availability of the file), or a
permanent data loss (affecting the reliability of the file). Hard drives
crash, power supplies explode, coffee spills, and asteroids strike. The goal
of a robust distributed filesystem is to survive these setbacks.

To work against this slow, continual loss of shares, a File Checker is used
to periodically count the number of shares still available for any given
file. A more extensive form of checking known as the File Verifier can
download the crypttext of the target file and perform integrity checks (using
strong hashes) to make sure the data is still intact. When the file is found
to have decayed below some threshold, the File Repairer can be used to
regenerate and re-upload the missing shares. These processes are conceptually
distinct (the repairer is only run if the checker/verifier decides it is
necessary), but in practice they will be closely related, and may run in the
same process.
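
A sketch of how the three pieces fit together (the method names, the grid
object, and the repair threshold shown are illustrative assumptions, not the
actual interface):

    def check_and_maybe_repair(grid, verifierid, k, N, threshold=7):
        # File Checker: count how many distinct shares are still reachable
        available = grid.count_available_shares(verifierid)
        if available < k:
            return "unrecoverable"      # not enough shares left to decode
        if available < threshold:
            # File Repairer: fetch any k shares, regenerate the missing ones,
            # and upload them to fresh peers
            crypttext = grid.download_crypttext(verifierid)
            grid.upload_missing_shares(verifierid, crypttext, N)
            return "repaired"
        return "healthy"                # above threshold: nothing to do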

The repairer process does not get the full URI of the file to be maintained:
it merely gets the "repairer capability" subset, which does not include the

@@ -368,32 +474,32 @@ The design goal for this project is that an attacker may be able to deny
service (i.e. prevent you from recovering a file that was uploaded earlier)
but can accomplish none of the following three attacks:

1) violate confidentiality: the attacker gets to view data to which you have
   not granted them access
2) violate consistency: the attacker convinces you that the wrong data is
   actually the data you were intending to retrieve
3) violate mutability: the attacker gets to modify a dirnode (either the
   pathnames or the file contents) to which you have not given them
   mutability rights

Data validity and consistency (the promise that the downloaded data will
match the originally uploaded data) is provided by the hashes embedded in the
URI. Data confidentiality (the promise that the data is only readable by
people with the URI) is provided by the encryption key embedded in the URI.
Data availability (the hope that data which has been uploaded in the past
will be downloadable in the future) is provided by the grid, which
distributes failures in a way that reduces the correlation between individual
node failure and overall file recovery failure.

Many of these security properties depend upon the usual cryptographic
assumptions: the resistance of AES and RSA to attack, the resistance of
SHA256 to pre-image attacks, and upon the proximity of 2^-128 and 2^-256 to
zero. A break in AES would allow a confidentiality violation, a pre-image
break in SHA256 would allow a consistency violation, and a break in RSA would
allow a mutability violation. The discovery of a collision in SHA256 is
unlikely to allow much, but could conceivably allow a consistency violation
in data that was uploaded by the attacker. If SHA256 is threatened, further
analysis will be warranted.

There is no attempt made to provide anonymity, neither of the origin of a
piece of data nor the identity of the subsequent downloaders. In general,

@@ -403,34 +509,36 @@ for a coalition of more than 1% of the nodes to correlate the set of peers
who are all uploading or downloading the same file, even if the attacker does
not know the contents of the file in question.

Also note that the file size and (when convergence is being used) a keyed
hash of the plaintext are not protected. Many people can determine the size
of the file you are accessing, and if they already know the contents of a
given file, they will be able to determine that you are uploading or
downloading the same one.

A likely enhancement is the ability to use distinct encryption keys for each
file, avoiding the file-correlation attacks at the expense of increased
storage consumption. This is known as "non-convergent" encoding.
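
For concreteness, the difference between the two modes comes down to how the
encryption key is chosen; the construction below is a simplified illustration,
not the actual key-derivation scheme:

    import hashlib, os

    def convergent_key(plaintext, convergence_secret=b""):
        # a (keyed) hash of the plaintext: the same file always produces the
        # same key and therefore the same shares, so duplicate uploads
        # converge, but anyone who already knows the plaintext can recognize
        # that you are storing it
        return hashlib.sha256(convergence_secret + plaintext).digest()[:16]

    def non_convergent_key():
        # a fresh random key per upload: no file-correlation attacks, but two
        # uploads of the same file no longer share storage
        return os.urandom(16)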

The capability-based security model is used throughout this project. Dirnode
operations are expressed in terms of distinct read and write capabilities.
The URI of a file is the read-capability: knowing the URI is equivalent to
the ability to read the corresponding data. The capability to validate and
repair a file is a subset of the read-capability. When distributed dirnodes
are implemented (with SSK slots), the capability to read an SSK slot will be
a subset of the capability to modify it. These capabilities may be expressly
delegated (irrevocably) by simply transferring the relevant secrets. Special
forms of SSK slots can be used to make revocable delegations of particular
directories. Dirnode references contain Foolscap "FURLs", which are also
capabilities and provide access to an instance of code running on a central
server: these can be delegated just as easily as any other capability, and
can be made revocable by delegating access to a forwarder instead of the
actual target.

The application layer can provide whatever security/access model is desired,
but we expect the first few to also follow capability discipline: rather than
user accounts with passwords, each user will get a FURL to their private
dirnode, and the presentation layer will give them the ability to break off
pieces of this vdrive for delegation or sharing with others on demand.

RELIABILITY

@@ -489,6 +597,11 @@ still retaining high reliability, but large unstable grids (where nodes are
coming and going very quickly) may require more repair/verification bandwidth
than actual upload/download traffic.

Tahoe nodes that run a webserver have a page dedicated to provisioning
decisions: this tool may help you evaluate different expansion factors and
view the disk consumption of each. It is also acquiring some sections with
availability/reliability numbers, as well as preliminary cost analysis data.
This tool will continue to evolve as our analysis improves.

------------------------------

@@ -496,12 +609,8 @@ than actual upload/download traffic.

[2]: all of these names are derived from the location where they were
     concocted, in this case in a car ride from Boulder to DEN. To be
     precise, "tahoe 1" was an unworkable scheme in which everyone who holds
     shares for a given file would form a sort of cabal which kept track of
     all the others, "tahoe 2" is the first-100-peers in the permuted hash
     (the approach this document now describes), and "tahoe 3" (or perhaps
     "potrero hill 1") was the ring-and-basket approach described above as an
     earlier version.