mirror of https://github.com/tahoe-lafs/tahoe-lafs.git

docs/dirnodes.txt: rewrite to reflect 0.7.0's RSA-based SDMF dirnodes

parent 583cc34d2f
commit f4c0167552
@@ -1,9 +1,3 @@
-NOTE: this file starts by explaining old-style (no longer used) centralized
-directories, but it also has useful discussion of security, efficiency, and
-usage, so I'm not removing it from the source code distribution just yet.
-Hopefully its contents which are still relevant will be reworked into a new
-document.
-
 
 = Tahoe Directory Nodes =
 
@@ -21,18 +15,22 @@ This document examines the middle layer, the "filesystem".
 
 == DHT Primitives ==
 
-In the lowest layer (DHT), we've defined two operations thus far, both of
-which refer to "CHK URIs", which reference immutable data:
+In the lowest layer (DHT), there are two operations that reference immutable
+data (which we refer to as "CHK URIs" or "CHK read-capabilities" or "CHK
+read-caps"). One puts data into the grid (but only if it doesn't exist
+already), the other retrieves it:
 
  chk_uri = put(data)
  data = get(chk_uri)
 
-We anticipate creating mutable slots in the DHT layer at some point, which
-will add some new operations to this layer:
+We also have three operations which reference mutable data (which we refer to
+as "mutable slots", or "mutable write-caps and read-caps", or sometimes "SSK
+slots"). One creates a slot with some initial contents, a second replaces the
+contents of a pre-existing slot, and the third retrieves the contents:
 
- slotname = create_slot()
- set(slotname, data)
- data = get(slotname)
+ mutable_uri = create(initial_data)
+ replace(mutable_uri, new_data)
+ data = get(mutable_uri)
 
 == Filesystem Goals ==
 
@@ -56,8 +54,8 @@ particular order:
 
 1: functional. Code which does not work doesn't count.
 2: easy to document, explain, and understand
-3: private: it should not be possible for others to see the contents of a
-   directory
+3: confidential: it should not be possible for others to see the contents of
+   a directory
 4: integrity: it should not be possible for others to modify the contents
    of a directory
 5: available: directories should survive host failure, just like files do
@@ -67,102 +65,97 @@ particular order:
 9: monotonicity: everybody looking at a directory should see the same
    sequence of updates
 
-We do not meet all of these goals. For the current release, we favored #1,
-#2, and #7 above the rest, which led us to the following design. In a later
-section, we discuss some alternate designs and potential changes to the
-existing code that can help us achieve the other goals.
+Some of these goals are mutually exclusive. For example, availability and
+consistency are opposing, so it is not possible to achieve #5 and #8 at the
+same time. Moreover, it takes a more complex architecture to get close to the
+available-and-consistent ideal, so #2/#6 is in opposition to #5/#8.
+
+Tahoe-0.7.0 introduced distributed mutable files, which use public key
+cryptography for integrity, and erasure coding for availability. These
+achieve roughly the same properties as immutable CHK files, but their
+contents can be replaced without changing their identity. Dirnodes are then
+just a special way of interpreting the contents of a specific mutable file.
+Earlier releases used a "vdrive server": this server was abolished in the
+0.7.0 release.
+
+For details of how mutable files work, please see "mutable.txt" in this
+directory.
+
+For the current 0.7.0 release, we achieve most of our desired properties. The
+integrity and availability of dirnodes is equivalent to that of regular
+(immutable) files, with the exception that there are more simultaneous-update
+failure modes for mutable slots. Delegation is quite strong: you can give
+read-write or read-only access to any subtree, and the data format used for
+dirnodes is such that read-only access is transitive: i.e. if you grant Bob
+read-only access to a parent directory, then Bob will get read-only access
+(and *not* read-write access) to its children.
+
+Relative to the previous "vdrive-server" based scheme, the current
+distributed dirnode approach gives better availability, but cannot guarantee
+updateness quite as well, and requires far more network traffic for each
+retrieval and update. Mutable files are somewhat less available than
+immutable files, simply because of the increased number of combinations
+(shares of an immutable file are either present or not, whereas there are
+multiple versions of each mutable file, and you might have some shares of
+version 1 and other shares of version 2). In extreme cases of simultaneous
+update, mutable files might suffer from non-monotonicity.
 
-In tahoe-0.4.0, each "dirnode" is stored as a file on a single "vdrive
-server". The name of this file is an unguessable string. The contents are an
-encrypted representation of the directory's name-to-child mapping. Foolscap
-is used to provide remote access to this file. A collection of "directory
-URIs" are used to hold all the parameters necessary to access, read, and
-write this dirnode.
-
-== Dirnode secret values ==
-
-Each dirnode begins life as a "writekey", a randomly-generated AES key. This
-key is hashed (using a tagged hash, see src/allmydata/util/hashutil.py for
-details) to form the "readkey". The readkey is hashed to form the "storage
-index". The writekey is hashed with a different tag to form the "write
-enabler".
-
-Clients who have read-write access to the dirnode know the writekey, and can
-derive all the other secrets from it. Clients with merely read-only access to
-the dirnode know the readkey (and can derive the storage index), but do not
-know the writekey or the write enabler. The vdrive server knows only the
-storage index and the write enabler.
-
 == Dirnode capability URIs ==
 
+As mentioned before, dirnodes are simply a special way to interpret the
+contents of a mutable file, so the secret keys and capability strings
+described in "mutable.txt" are all the same. Each dirnode contains an RSA
+public/private keypair, and the holder of the "write capability" will be able
+to retrieve the private key (as well as the AES encryption key used for the
+data itself). The holder of the "read capability" will be able to obtain the
+public key and the AES data key, but not the RSA private key needed to modify
+the data.
+
 The "write capability" for a dirnode grants read-write access to its
-contents. This is expressed in concrete form as the "dirnode write URI": a
-printable string which contains the following pieces of information:
-
- furl of the vdrive server hosting this dirnode
- writekey
-
-The "read capability" grants read-only access to a dirnode, and its "dirnode
-read URI" contains:
-
- furl of the vdrive server hosting this dirnode
- readkey
+contents. This is expressed in concrete form as the "dirnode write cap": a
+printable string which contains the necessary secrets to grant this access.
+Likewise, the "read capability" grants read-only access to a dirnode, and can
+be represented by a "dirnode read cap" string.
 
 For example,
-URI:DIR:pb://xextf3eap44o3wi27mf7ehiur6wvhzr6@207.7.153.180:56677,127.0.0.1:56677/vdrive:shrrn75qq3x7uxfzk326ncahd4======
+URI:DIR2:swdi8ge1s7qko45d3ckkyw1aac%3Aar8r5j99a4mezdojejmsfp4fj1zeky9gjigyrid4urxdimego68o
 is a write-capability URI, while
-URI:DIR-RO:pb://xextf3eap44o3wi27mf7ehiur6wvhzr6@207.7.153.180:56677,127.0.0.1:56677/vdrive:4c2legsthoe52qywuaturgwdrm======
+URI:DIR2-RO:buxjqykt637u61nnmjg7s8zkny:ar8r5j99a4mezdojejmsfp4fj1zeky9gjigyrid4urxdimego68o
 is a read-capability URI, both for the same dirnode.
 
 
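These DIR2 cap strings are meant to be treated as opaque, but their coarse
structure is visible. A minimal parsing sketch, for illustration only: the
real parser lives in allmydata.uri, the field labels below are assumptions,
and the %3A in the write-cap example above is just a URL-escaped colon
introduced by the web page.

 from urllib.parse import unquote

 def parse_dircap(cap: str):
     # Split a dirnode cap string into its kind and its secret fields.
     prefix, kind, rest = unquote(cap).split(":", 2)
     assert prefix == "URI" and kind in ("DIR2", "DIR2-RO")
     return kind, rest.split(":")

 kind, secrets = parse_dircap(
     "URI:DIR2-RO:buxjqykt637u61nnmjg7s8zkny:"
     "ar8r5j99a4mezdojejmsfp4fj1zeky9gjigyrid4urxdimego68o")
 # kind == "DIR2-RO"; secrets[0] would be the read secret and secrets[1]
 # an integrity fingerprint (labels assumed, not taken from this document)
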
 == Dirnode storage format ==
 
-Each dirnode is stored in a single file, saved on the vdrive server, using
-the (base32-encoded) storage index as a filename. The contents of this file
-are a serialized dictionary which maps H_name (explained below) to a tuple
-with three values: (E_name, E_write, E_read). The vdrive server is made
-available as a Foolscap "Referenceable" object, with the following
-operations:
+Each dirnode is stored in a single mutable file, distributed in the Tahoe
+grid. The contents of this file are a serialized list of netstrings, one per
+child. Each child is a list of four netstrings: (name, rocap, rwcap,
+metadata). (remember that the contents of the mutable file are encrypted by
+the read-cap, so this section describes the plaintext contents of the mutable
+file, *after* it has been decrypted by the read-cap).
 
- create_dirnode(index, write_enabler) -> None
- list(index) -> list of (E_name, E_write, E_read) tuples
- get(index, H_name) -> (E_write, E_read)
- set(index, write_enabler, H_name, E_name, E_write, E_read)
- delete(index, write_enabler, H_name)
+The name is simply a UTF-8 -encoded child name. The 'rocap' is a read-only
+capability URI to that child, either an immutable (CHK) file, a mutable file,
+or a directory. The 'rwcap' is a read-write capability URI for that child,
+encrypted with the dirnode's write-cap: this enables the "transitive
+readonlyness" property, described further below. The 'metadata' is a
+JSON-encoded dictionary of type,value metadata pairs. Some metadata keys are
+pre-defined, the rest are left up to the application.
 
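The nested-netstring framing is easiest to see in code. The following sketch
is illustrative only; the helper names (netstring, pack_children) are invented
here, and the real serializer lives in the allmydata source:

 def netstring(payload: bytes) -> bytes:
     # netstring framing: b"<decimal length>:<payload>,"
     return b"%d:%s," % (len(payload), payload)

 def pack_children(children) -> bytes:
     # children: dict mapping unicode child name -> (rocap, encrypted_rwcap,
     # metadata_json) byte strings. Each child becomes one netstring that
     # itself contains four netstrings, and the dirnode plaintext is just
     # the concatenation of the per-child netstrings.
     out = []
     for name, (rocap, rwcap, metadata) in sorted(children.items()):
         entry = (netstring(name.encode("utf-8")) + netstring(rocap) +
                  netstring(rwcap) + netstring(metadata))
         out.append(netstring(entry))
     return b"".join(out)
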
-For any given entry of this dictionary, the following values are obtained by
-hashing or encryption:
+Each rwcap is stored as IV + ciphertext + MAC. The IV is a 16-byte random
+value. The ciphertext is obtained by using AES in CTR mode on the rwcap URI
+string, using a key that is formed from a tagged hash of the IV and the
+dirnode's writekey. The MAC is a 32-byte SHA-256 -based HMAC (using that same
+AES key) over the (IV+ciphertext) pair.
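A rough Python sketch of that per-entry encryption. It assumes the
third-party 'cryptography' package (not the library the real implementation
uses), and the tag string "rwcap-key" plus the helper names are invented for
illustration; the real tagged hashes are defined in
src/allmydata/util/hashutil.py and may use different tags:

 import os, hmac, hashlib
 from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

 def tagged_hash(tag: bytes, *values: bytes) -> bytes:
     # Stand-in for the tagged hashes in src/allmydata/util/hashutil.py:
     # netstring-prefix the tag and each value so fields cannot collide.
     h = hashlib.sha256()
     for piece in (tag,) + values:
         h.update(b"%d:%s," % (len(piece), piece))
     return h.digest()

 def encrypt_rwcap(writekey: bytes, rwcap: bytes) -> bytes:
     iv = os.urandom(16)                                 # 16-byte random IV
     # per-entry AES key from a tagged hash of the IV and the writekey
     key = tagged_hash(b"rwcap-key", iv, writekey)[:16]
     # a zero CTR counter is safe here because the key is unique per IV
     enc = Cipher(algorithms.AES(key), modes.CTR(b"\x00" * 16)).encryptor()
     ciphertext = enc.update(rwcap) + enc.finalize()
     # 32-byte SHA-256 HMAC (same key) over the IV+ciphertext pair
     mac = hmac.new(key, iv + ciphertext, hashlib.sha256).digest()
     return iv + ciphertext + mac

Decryption reverses this: split off the leading 16-byte IV and the trailing
32-byte MAC, re-derive the key from the IV and the writekey, verify the HMAC,
then decrypt the remainder.
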
 
-H_name is the hash of the readkey and the child's name.
-E_name is the child's name, encrypted with the readkey
-E_write is the child's write-URI, encrypted with the writekey
-E_read is the child's read-URI, encrypted with the readkey
-
-All encryption uses AES in CTR mode, in which the high-order 10 or 12 bytes
-of the 16-byte key are used as an IV (randomly chosen each time the data is
-changed), and the remaining bytes are used as the CTR-mode offset. An
-HMAC-SHA256 is computed for each encrypted value and stored alongside. The
-stored E_name/E_write/E_read values are thus the concatenation of IV,
-encrypted data, and HMAC.
-
-When a new dirnode is created, it records the write_enabler. All operations
-that modify an existing dirnode (set and delete) require the write_enabler be
-presented.
-
-This approach ensures that clients who do not have the read or write keys
-(including the vdrive server, which knows the storage index but not the keys)
-will be unable to see any of the contents of the dirnode. Clients who have
-the readkey but not the writekey will not be allowed to modify the dirnode.
-The H_name value allows clients to perform lookups of specific keys rather
-than requiring them to download the whole dirnode for each operation.
-
-By putting both read-only and read-write child access capabilities in each
-entry, encrypted by different keys, this approach provides transitive
-read-only-ness: if a client has only a readkey for the parent dirnode, they
-will only get readkeys (and not writekeys) for any children, including other
-directories. When we create mutable slots in the mesh and we start having
-read-write file URIs, we can use the same approach to ensure that read-only
-access to a directory means read-only access to the files as well.
+If Bob has read-only access to the 'bar' directory, and he adds it as a child
+to the 'foo' directory, then he will put the read-only cap for 'bar' in both
+the rwcap and rocap slots (encrypting the rwcap contents as described above).
+If he has full read-write access to 'bar', then he will put the read-write
+cap in the 'rwcap' slot, and the read-only cap in the 'rocap' slot. Since
+other users who have read-only access to 'foo' will be unable to decrypt its
+rwcap slot, this limits those users to read-only access to 'bar' as well,
+thus providing the transitive readonlyness that we desire.
 
 
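A sketch of that client-side rule, reusing the hypothetical encrypt_rwcap
helper from the earlier sketch (names and the empty-JSON metadata default are
illustrative, not the actual allmydata API):

 def make_child_entry(writekey, name, child_rwcap, child_rocap):
     # If we hold only a read cap for the child, the read cap fills both
     # slots; readers and writers of this dirnode then decrypt nothing
     # stronger, and transitive readonlyness falls out automatically.
     best = child_rwcap if child_rwcap is not None else child_rocap
     return (name, child_rocap, encrypt_rwcap(writekey, best), b"{}")
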
 == Design Goals, redux ==
@@ -171,11 +164,13 @@ How well does this design meet the goals?
 
 #1 functional: YES: the code works and has extensive unit tests
 #2 documentable: YES: this document is the existence proof
-#3 private: MOSTLY: see the discussion below
-#4 integrity: MOSTLY: the vdrive server can rollback individual slots
-#5 availability: BARELY: if the vdrive server is offline, the dirnode will
-                         be unusable. If the vdrive server fails,
-                         the dirnode will be lost forever.
+#3 confidential: YES: see below
+#4 integrity: MOSTLY: a coalition of storage servers can rollback individual
+                      mutable files, but not a single one. No server can
+                      substitute fake data as genuine.
+#5 availability: YES: as long as 'k' storage servers are present and have
+                      the same version of the mutable file, the dirnode will
+                      be available.
 #6 efficient: MOSTLY:
      network: single dirnode lookup is very efficient, since clients can
               fetch specific keys rather than being required to get or set
@@ -198,98 +193,71 @@ How well does this design meet the goals?
 
 
 
-=== Privacy leaks in the vdrive server ===
+=== Confidentiality leaks in the vdrive server ===
 
-Dirnodes are very private against other clients: traffic between the client
-and the vdrive server is protected by the Foolscap SSL connection, so they
-can observe very little. Storage index values are hashes of secrets and thus
-unguessable, and they are not made public, so other clients cannot snoop
-through encrypted dirnodes that they have not been told about.
+Dirnodes (and the mutable files upon which they are based) are very private
+against other clients: traffic between the client and the storage servers is
+protected by the Foolscap SSL connection, so they can observe very little.
+Storage index values are hashes of secrets and thus unguessable, and they are
+not made public, so other clients cannot snoop through encrypted dirnodes
+that they have not been told about.
 
-On the other hand, the vdrive server gets to see the access patterns of each
-client who is using dirnodes hosted there. The childnames and URIs are
-encrypted and not visible to anyone (including the vdrive server), but the
-vdrive server is in a good position to infer a lot of data about the
-directory structure. It knows the length of all childnames, and from the
-length of the child URIs themselves it can tell whether children are file
-URIs vs. directory URIs vs read-only directory URIs. By watching a client's
-access patterns it can deduce the connection between (encrypted) child 1 and
-target directory 2 (i.e. if the client does a 'get' of the first child, then
-immediately does an operation on directory 2, it can assume the two are
-related). From this the vdrive server can build a graph with the same shape
-as the filesystem, even though the nodes and edges will be unlabeled.
-
-By providing CHK-level storage services as well (or colluding with a server
-who is), the vdrive server can infer the storage index of file nodes that are
-downloaded shortly after their childname is looked up.
+Storage servers can observe access patterns and see ciphertext, but they
+cannot see the plaintext (of child names, metadata, or URIs). If an attacker
+operates a significant number of storage servers, they can infer the shape of
+the directory structure by assuming that directories are usually accessed
+from root to leaf in rapid succession. Since filenames are usually much
+shorter than read-caps and write-caps, the attacker can use the length of the
+ciphertext to guess the number of children of each node, and might be able to
+guess the length of the child names (or at least their sum). From this, the
+attacker may be able to build up a graph with the same shape as the plaintext
+filesystem, but with unlabeled edges and unknown file contents.
 
 
 === Integrity failures in the vdrive server ===
 
-The HMAC prevents the vdrive server from modifying the child names or child
-URI values without detection: changing a few bytes will cause an HMAC failure
-that the client can detect. This means the vdrive server can make the dirnode
+The mutable file's integrity mechanism (RSA signature on the hash of the file
+contents) prevents the storage server from modifying the dirnode's contents
+without detection. Therefore the storage servers can make the dirnode
 unavailable, but not corrupt it.
 
-However, the vdrive server can perform a rollback attack: either replacing an
-individual entry in the encrypted table with an old version, or replacing the
-entire table. Despite not knowing what the child names or URIs are, the
-vdrive server can undo changes made by authorized clients. It could also
-perform selective rollback, showing different clients different versions of
-the filesystem. To solve this problem either requires mutable data (like a
-sequence number or hash) to be stored in the URI which points to this dirnode
-(rendering them non-constant, and losing most of their value), or requires
-spreading the dirnode out over multiple non-colluding servers (which might
-improve availability but makes updateness and monotonicity harder).
+A sufficient number of colluding storage servers can perform a rollback
+attack: replace all shares of the whole mutable file with an earlier version.
+When retrieving the contents of a mutable file, the client queries more than
+one server and uses the highest available version number. This ensures that
+one or two misbehaving storage servers cannot cause this rollback on their
+own.
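A toy version of that selection rule (names invented; the real retrieval
logic is more involved, since it must also validate signatures and collect
enough shares of the chosen version):

 def choose_version(responses):
     # responses: (server_id, seqnum, roothash) tuples gathered by querying
     # several storage servers for the same mutable slot. Taking the highest
     # sequence number (ties broken by roothash) means a rollback succeeds
     # only if *every* queried server has been rolled back.
     best_seqnum, best_roothash = max(
         (seqnum, roothash) for (_server, seqnum, roothash) in responses)
     return best_seqnum, best_roothash
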
 
 
-=== Improving the availability of dirnodes ===
-
-Clearly it is somewhat disappointing to have a sexy distributed filestore at
-the bottom layer and then have a single-point-of-failure vdrive server on top
-of it. However, this approach meets many of the design goals and is extremely
-simple to explain and implement. There are many avenues to improve the
-reliability and availability of dirnodes. (note that reliability and
-availability can be separate goals).
-
-A simple way to improve the reliability of dirnodes would be to make the
-vdrive server be responsible for saving the dirnode contents in a fashion
-that will survive the failure of its local disk, for example by simply
-rsync'ing all the dirnodes off to a separate machine on a periodic basis, and
-pulling them back in the case of disk failure.
-
-To improve availability, we must allow clients to access their dirnodes even
-if the vdrive server is offline. The first step here is to create multiple
-vdrive servers, putting a list of furls into the DIR:URI, with instructions
-to update all of them during write, and accept the first answer that comes
-back during read. This introduces issues of updateness and monotonicity: if a
-dirnode is changed while one of the vdrive servers is offline, the servers
-will diverge, and subsequent clients will see different contents depending
-upon which server they ask.
-
-A more comforting way to improve both reliability and availability is to
-spread the dirnodes out over the mesh in the same way that CHK files work.
-The general name for this approach is the "SSK directory slot", a structure
-for keeping a mutable slot on multiple hosts, setting and retrieving its
-contents at various times, and reconciling differences by comparing sequence
-numbers. The "slot name" is the hash of a public key, which is also used to
-sign updates, such that the SSK storage hosts will only accept updates from
-those in possession of the corresponding private key. This approach (although
-not yet implemented) will provide fairly good reliability and availability
-properties, at the expense of complexity and updateness/monotonicity. It can
-also improve integrity, since an attacker would have to corrupt multiple
-storage servers to successfully perform a rollback attack.
-
-Reducing centralization can improve reliability, as long as the overall
-reliability of the mesh is greater than the reliability of the original
-centralized services.
-
 === Improving the efficiency of dirnodes ===
 
-By storing each child of a dirnode in a separate element of the dictionary,
-we provide efficient directory traversal and clean+simple dirnode delegation
-behavior. This comes at the cost of efficiency for other operations,
-specifically things that operate on multiple dirnodes at once.
+The current mutable-file -based dirnode scheme suffers from certain
+inefficiencies. A very large directory (with thousands or millions of
+children) will take a significant time to extract any single entry, because
+the whole file must be downloaded first, then parsed and searched to find the
+desired child entry. Likewise, modifying a single child will require the
+whole file to be re-uploaded.
+
+The current design assumes (and in some cases, requires) that dirnodes remain
+small. The mutable files on which dirnodes are based are currently using
+"SDMF" ("Small Distributed Mutable File") design rules, which state that the
+size of the data shall remain below one megabyte. More advanced forms of
+mutable files (MDMF and LDMF) are in the design phase to allow efficient
+manipulation of larger mutable files. This would reduce the work needed to
+modify a single entry in a large directory.
+
+Judicious caching may help improve the reading-large-directory case. Some
+form of mutable index at the beginning of the dirnode might help as well. The
+MDMF design rules allow for efficient random-access reads from the middle of
+the file, which would give the index something useful to point at.
+
+The current SDMF design generates a new RSA public/private keypair for each
+directory. This takes considerable time and CPU effort, generally one or two
+seconds per directory. We have designed (but not yet built) a DSA-based
+mutable file scheme which will use shared parameters to reduce the
+directory-creation effort to a bare minimum (picking a random number instead
+of generating two random primes).
 
 
 When a backup program is run for the first time, it needs to copy a large
 amount of data from a pre-existing filesystem into reliable storage. This
@@ -305,11 +273,11 @@ whole block of data (and presumeably cache it for a while to avoid lots of
 re-fetches), and modification operations would need to replace the whole
 thing at once. This "realm" approach would have the added benefit of
 combining more data into a single encrypted bundle (perhaps hiding the shape
-of the graph from the vdrive server better), and would reduce round-trips
-when performing deep directory traversals (assuming the realm was already
-cached). It would also prevent fine-grained rollback attacks from working:
-the vdrive server could change the entire dirnode to look like an earlier
-state, but it could not independently roll back individual edges.
+of the graph from a determined attacker), and would reduce round-trips when
+performing deep directory traversals (assuming the realm was already cached).
+It would also prevent fine-grained rollback attacks from working: a coalition
+of storage servers could change the entire realm to look like an earlier
+state, but it could not independently roll back individual directories.
 
 The drawbacks of this aggregation would be that small accesses (adding a
 single child, looking up a single child) would require pulling or pushing a
@@ -324,7 +292,10 @@ all-or-nothing access control, the act of delegating any directory from the
 middle of the realm would require the realm first be split into the upper
 piece that isn't being shared and the lower piece that is. This splitting
 would have to be done in response to what is essentially a read operation,
-which is not traditionally supposed to be a high-effort action.
+which is not traditionally supposed to be a high-effort action. On the other
+hand, it may be possible to aggregate the ciphertext, but use distinct
+encryption keys for each component directory, to get the benefits of both
+schemes at once.
 
 
 === Dirnode expiration and leases ===
@@ -333,16 +304,16 @@ Dirnodes are created any time a client wishes to add a new directory. How
 long do they live? What's to keep them from sticking around forever, taking
 up space that nobody can reach any longer?
 
-Our plan is to define the vdrive servers to keep dirnodes alive with
-"leases". Clients which know and care about specific dirnodes can ask to keep
-them alive for a while, by renewing a lease on them (with a typical period of
-one month). Clients are expected to assist in the deletion of dirnodes by
-canceling their leases as soon as they are done with them. This means that
-when a client deletes a directory, it should also cancel its lease on that
-directory. When the lease count on a dirnode goes to zero, the vdrive server
-can delete the related storage. Multiple clients may all have leases on the
-same dirnode: the server may delete the dirnode only after all of the leases
-have gone away.
+Mutable files are created with limited-time "leases", which keep the shares
+alive until the last lease has expired or been cancelled. Clients which know
+and care about specific dirnodes can ask to keep them alive for a while, by
+renewing a lease on them (with a typical period of one month). Clients are
+expected to assist in the deletion of dirnodes by canceling their leases as
+soon as they are done with them. This means that when a client deletes a
+directory, it should also cancel its lease on that directory. When the lease
+count on a given share goes to zero, the storage server can delete the
+related storage. Multiple clients may all have leases on the same dirnode:
+the server may delete the shares only after all of the leases have gone away.
 
 We expect that clients will periodically create a "manifest": a list of
 so-called "refresh capabilities" for all of the dirnodes and files that they
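A toy model of that lease bookkeeping. The real storage server tracks
distinct renew- and cancel-secrets per lease; this sketch collapses them into
a single token for brevity:

 import time

 class ShareLeases:
     # Lease table for a single stored share (illustrative only).
     RENEWAL_PERIOD = 31 * 24 * 3600     # "typical period of one month"

     def __init__(self):
         self.expirations = {}           # lease token -> expiration time

     def renew(self, token):
         self.expirations[token] = time.time() + self.RENEWAL_PERIOD

     def cancel(self, token):
         self.expirations.pop(token, None)

     def deletable(self):
         # the share may be deleted only once every lease has expired
         # or been cancelled
         now = time.time()
         return all(t <= now for t in self.expirations.values())
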
@@ -362,13 +333,9 @@ to be deleted.
 == Starting Points: root dirnodes ==
 
 Any client can record the URI of a directory node in some external form (say,
-in a local file) and use it as the starting point of later traversal. The
-current vdrive servers are configured to create a "root" dirnode at startup
-and publish its URI to the world: this forms the basis of the "global shared
-vdrive" used in the demonstration application. In addition, client code is
-currently designed to create a new (unattached) dirnode at startup and record
-its URI: this forms the root of the "per-user private vdrive" presented as
-the "~" directory.
+in a local file) and use it as the starting point of later traversal. Each
+Tahoe user is expected to create a new (unattached) dirnode when they first
+start using the grid, and record its URI for later use.
 
 == Mounting and Sharing Directories ==
 