docs/uri.txt: update to reflect mutable files

Brian Warner 2008-02-14 17:59:29 -07:00
parent 7927495cbe
commit 92640dc6e5


@ -4,7 +4,8 @@
Each file and directory in a Tahoe filesystem is described by a "URI". There
are different kinds of URIs for different kinds of objects, and there are
different kinds of URIs to provide different kinds of access to those
objects. Each URI is a string representation of a "capability" or "cap", and
there are read-caps, write-caps, verify-caps, and others.
Each URI provides both '''location''' and '''identification''' properties.
'''location''' means that holding the URI is sufficient to locate the data it
@ -17,8 +18,9 @@ limited in their abilities by the identification properties of the URI.
Some URIs are subsets of others. In particular, if you know a URI which
allows you to modify some object, you can produce a weaker read-only URI and
give it to someone else, and they will be able to read that object but not
modify it. Directories, for example, have a read-cap which is derived from
the write-cap: anyone with read/write access to the directory can produce a
limited URI that grants read-only access, but not the other way around.
source:src/allmydata/uri.py is the main place where URIs are processed. It is
the authoritative definition point for all the URI types described
@ -30,10 +32,13 @@ The lowest layer of the Tahoe architecture (the "grid") is responsible for
mapping URIs to data. This is basically a distributed hash table, in which
the URI is the key, and some sequence of bytes is the value.
There are two kinds of entries in this table: immutable and mutable. For
immutable entries, the URI represents a fixed chunk of data. The URI itself
is derived from the data when it is uploaded into the grid, and can be used
to locate and download that data from the grid at some time in the future.
For mutable entries, the URI identifies a "slot" or "container", which can be
filled with different pieces of data at different times.
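To make the immutable/mutable distinction concrete, here is a toy in-memory model of the grid in Python. It is only an illustration: the class and method names are hypothetical, a plain SHA-256 of the data stands in for Tahoe's real URI derivation, and encryption, erasure coding and share placement are all left out.

```python
import hashlib


class ToyGrid:
    """Hypothetical toy model of the grid's two kinds of table entries.

    Not Tahoe's API: immutable keys are plain SHA-256 hex digests and
    mutable slots are caller-chosen names.
    """

    def __init__(self):
        self._immutable = {}  # key derived from the data itself
        self._mutable = {}    # key names a slot whose contents can change

    def put_immutable(self, data):
        # The key is computed from the data, so it always names this exact data.
        key = hashlib.sha256(data).hexdigest()
        self._immutable[key] = data
        return key

    def get_immutable(self, key):
        return self._immutable[key]

    def put_mutable(self, slot_id, data):
        # Replaces the slot's contents; the slot's identity stays the same.
        self._mutable[slot_id] = data

    def get_mutable(self, slot_id):
        # Returns whatever was most recently placed in the slot.
        return self._mutable[slot_id]
```

Uploading new data to the immutable side always produces a new key, while the mutable side keeps one name and changes what it refers to.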
It is important to note that the "files" described by these URIs are just a
bunch of bytes, and that __no__ filenames or other metadata is retained at
@ -49,7 +54,7 @@ computed to help validate the data afterwards (providing the "identification"
property). All of these pieces, plus information about the file's size and
the number of shares into which it has been distributed, are put into the
"CHK" URI. The storage index is derived by hashing the read key (using a
tagged SHA-256d hash, then truncated to 128 bits), so it does not need to be
physically present in the URI.
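The storage-index derivation can be sketched in Python as below. This assumes that "tagged" means a netstring-framed tag is prepended to the value before hashing, and that SHA-256d means SHA-256 applied twice; the tag string is a placeholder, not the one defined in the Tahoe source.

```python
import hashlib


def netstring(s: bytes) -> bytes:
    # "<length>:<bytes>," framing, so the tag cannot run into the value.
    return b"%d:%s," % (len(s), s)


def sha256d(data: bytes) -> bytes:
    # SHA-256d: SHA-256 applied to the output of SHA-256.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def tagged_sha256d(tag: bytes, value: bytes) -> bytes:
    # Assumed construction: hash(netstring(tag) + value).
    return sha256d(netstring(tag) + value)


# Hypothetical tag string; the real one lives in the Tahoe source.
STORAGE_INDEX_TAG = b"example_storage_index_tag_v1"


def storage_index_from_readkey(readkey: bytes) -> bytes:
    # Truncate the 256-bit tagged hash to 128 bits (16 bytes).
    return tagged_sha256d(STORAGE_INDEX_TAG, readkey)[:16]
```

Because the function is deterministic, anyone holding the read key can recompute the storage index, which is why it need not appear in the URI.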
The current format for CHK URIs is the concatenation of the following
@ -62,13 +67,24 @@ base32 encoding of the SHA-256 hash of the URI Extension Block,
(needed-shares) is an ascii decimal representation of the number of shares
required to reconstruct this file, (total-shares) is the same representation
of the total number of shares created, and (size) is an ascii decimal
representation of the size of the data represented by this URI. All base32
encodings are expressed in lower-case, with the trailing '=' signs removed.
For example, the following is a CHK URI, generated from the contents of the
architecture.txt document that lives next to this one in the source tree:
URI:CHK:ihrbeov7lbvoduupd4qblysj7a:bg5agsdt62jb34hxvxmdsbza6do64f4fg5anxxod2buttbo6udzq:3:10:28733
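A minimal Python sketch of building and parsing this format follows. It assumes a 16-byte read key and a 32-byte URI Extension Block hash (consistent with the 26- and 52-character fields in the example above); the CHKURI type and helper names are hypothetical, and source:src/allmydata/uri.py remains the authoritative implementation.

```python
import base64
from typing import NamedTuple


class CHKURI(NamedTuple):
    key: bytes          # read key (assumed 16 bytes)
    ueb_hash: bytes     # SHA-256 hash of the URI Extension Block (32 bytes)
    needed_shares: int
    total_shares: int
    size: int


def b32encode_nopad(data: bytes) -> str:
    # Lower-case base32 with the trailing '=' padding stripped.
    return base64.b32encode(data).decode("ascii").lower().rstrip("=")


def b32decode_nopad(text: str) -> bytes:
    # Restore the padding to a multiple of 8 characters before decoding.
    padding = "=" * (-len(text) % 8)
    return base64.b32decode(text.upper() + padding)


def format_chk(u: CHKURI) -> str:
    return "URI:CHK:%s:%s:%d:%d:%d" % (
        b32encode_nopad(u.key), b32encode_nopad(u.ueb_hash),
        u.needed_shares, u.total_shares, u.size)


def parse_chk(uri: str) -> CHKURI:
    prefix = "URI:CHK:"
    if not uri.startswith(prefix):
        raise ValueError("not a CHK URI: %r" % uri)
    key, ueb_hash, needed, total, size = uri[len(prefix):].split(":")
    return CHKURI(b32decode_nopad(key), b32decode_nopad(ueb_hash),
                  int(needed), int(total), int(size))
```

Parsing the example above yields needed-shares 3, total-shares 10 and a size of 28733 bytes, and formatting the result reproduces the original string.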
Historical note: The name "CHK" is somewhat inaccurate and continues to be
used for historical reasons. "Content Hash Key" means that the encryption key
is derived by hashing the contents, which gives the useful property that
encoding the same file twice will result in the same URI. However, this is an
optional step: by passing a different flag to the appropriate API call, Tahoe
will generate a random encryption key instead of hashing the file. This gives
the useful property that the URI or storage index does not reveal anything
about the file's contents (except its size), which improves privacy. The
URI:CHK: prefix really indicates that an immutable file is in use, without
saying anything about how the key was derived.
=== LIT URIs ===
@ -83,27 +99,49 @@ directly in the URI.
The format of a LIT URI is simply a fixed prefix concatenated with the base32
encoding of the file's data:
URI:LIT:bjuw4y3movsgkidbnrwg26lemf2gcl3xmvrc6kropbuhi3lmbi
The LIT URI for an empty file is "URI:LIT:", and the LIT URI for a 5-byte
file that contains the string "hello" is "URI:LIT:nbswy3dp".
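Because the LIT format is just a prefix plus unpadded lower-case base32, it can be sketched with the Python standard library alone (the function names are hypothetical):

```python
import base64

LIT_PREFIX = "URI:LIT:"


def lit_uri(data: bytes) -> str:
    # Lower-case base32 of the file's bytes, trailing '=' padding removed.
    return LIT_PREFIX + base64.b32encode(data).decode("ascii").lower().rstrip("=")


def lit_data(uri: str) -> bytes:
    if not uri.startswith(LIT_PREFIX):
        raise ValueError("not a LIT URI: %r" % uri)
    encoded = uri[len(LIT_PREFIX):].upper()
    # Re-pad to a multiple of 8 characters before decoding.
    return base64.b32decode(encoded + "=" * (-len(encoded) % 8))
```

For example, lit_uri(b"hello") returns "URI:LIT:nbswy3dp", and lit_uri(b"") returns "URI:LIT:".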
=== Mutable File URIs ===
The other kind of DHT entry is the "mutable slot", in which the URI names a
container into which data can be placed, and from which it can be retrieved,
without changing the identity of the container.
These slots have write-caps (which allow read/write access), read-caps (which
only allow read access), and verify-caps (which allow a file checker/repairer
to confirm that the contents exist, but do not let it decrypt them).
Mutable slots use public-key cryptography to provide data integrity, and put a
hash of the public key in the URI. As a result, the data validation is
limited to confirming that the data retrieved matches __some__ data that was
uploaded in the past, but not __which__ version of that data.
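The "hash of the public key in the URI" check can be sketched as below. This assumes the public key is available as serialized bytes (the exact serialization is an assumption here); verifying the signature on the data with that key is a separate step that needs an RSA library and is not shown.

```python
import hashlib


def pubkey_matches_fingerprint(pubkey: bytes, fingerprint: bytes) -> bool:
    # The cap carries only a hash of the public key; a downloader that
    # fetches the full key from a share checks it against that hash
    # before trusting any signature made with it.
    return hashlib.sha256(pubkey).digest() == fingerprint
```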
The format of the write-cap for mutable files is:
URI:SSK:(writekey):(fingerprint)
where (writekey) is the base32 encoding of the 16-byte AES encryption key
that is used to encrypt the RSA private key, and (fingerprint) is the
base32-encoded 32-byte SHA-256 hash of the RSA public key. For more details about
the way these keys are used, please see docs/mutable.txt .
The format for mutable read-caps is:
URI:SSK-RO:(readkey):(fingerprint)
The read-cap is just like the write-cap except it contains the other AES
encryption key: the one used for encrypting the mutable file's contents. This
second key is derived by hashing the writekey, which allows the holder of a
write-cap to produce a read-cap, but not the other way around. The
fingerprint is the same in both caps.
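The one-way relationship between the two caps can be sketched as follows. The tag and the truncated SHA-256 used in readkey_from_writekey are placeholders for illustration (see docs/mutable.txt for the real derivation), and the function names are hypothetical.

```python
import base64
import hashlib

# Hypothetical tag; the real writekey-to-readkey derivation may differ.
READKEY_TAG = b"example_writekey_to_readkey_v1"


def b32(data: bytes) -> str:
    return base64.b32encode(data).decode("ascii").lower().rstrip("=")


def unb32(text: str) -> bytes:
    return base64.b32decode(text.upper() + "=" * (-len(text) % 8))


def readkey_from_writekey(writekey: bytes) -> bytes:
    # One-way: the readkey is a truncated hash of the writekey, so a
    # read-cap holder cannot recover the writekey.
    return hashlib.sha256(READKEY_TAG + writekey).digest()[:16]


def diminish_ssk(write_cap: str) -> str:
    prefix = "URI:SSK:"
    if not write_cap.startswith(prefix):
        raise ValueError("not an SSK write-cap: %r" % write_cap)
    writekey_b32, fingerprint_b32 = write_cap[len(prefix):].split(":")
    readkey = readkey_from_writekey(unb32(writekey_b32))
    # The fingerprint is copied unchanged from the write-cap.
    return "URI:SSK-RO:%s:%s" % (b32(readkey), fingerprint_b32)
```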
Historical note: the "SSK" prefix is a perhaps-inaccurate reference to
"Sub-Space Keys" from the Freenet project, which uses a vaguely similar
structure to provide mutable file access.
== Directory URIs ==
@ -112,25 +150,19 @@ of directories and files, the "vdrive" layer (which sits on top of the grid
layer) needs to keep track of "directory nodes", or "dirnodes" for short.
source:docs/dirnodes.txt describes how these work.
Dirnodes are contained inside mutable files, and are thus simply a particular
way to interpret the contents of these files. As a result, a directory
write-cap looks a lot like a mutable-file write-cap:
URI:DIR2:(writekey):(fingerprint)
Likewise, directory read-caps (which provide read-only access to the
directory) look much like mutable-file read-caps:
URI:DIR2-RO:(readkey):(fingerprint)
Historical note: the "DIR2" prefix is used because the non-distributed
dirnodes in earlier Tahoe releases had already claimed the "DIR" prefix.
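Since dirnodes live inside mutable files, converting between a directory cap and the underlying mutable-file cap can be sketched as a prefix swap. This rests on an assumption the text above only suggests: that the (writekey)/(readkey) and (fingerprint) fields of a DIR2 cap are exactly those of the mutable file that holds the directory. The function names are hypothetical.

```python
# Assumed pairing of directory-cap prefixes with mutable-file-cap prefixes.
_PREFIXES = [("URI:DIR2:", "URI:SSK:"), ("URI:DIR2-RO:", "URI:SSK-RO:")]


def dircap_to_filecap(dircap: str) -> str:
    for dir_prefix, ssk_prefix in _PREFIXES:
        if dircap.startswith(dir_prefix):
            # Same (key):(fingerprint) fields; only the type prefix differs.
            return ssk_prefix + dircap[len(dir_prefix):]
    raise ValueError("not a DIR2 cap: %r" % dircap)


def filecap_to_dircap(filecap: str) -> str:
    for dir_prefix, ssk_prefix in _PREFIXES:
        if filecap.startswith(ssk_prefix):
            return dir_prefix + filecap[len(ssk_prefix):]
    raise ValueError("not an SSK cap: %r" % filecap)
```

Note that "URI:DIR2-RO:" does not start with "URI:DIR2:" (the character after "DIR2" differs), so the prefix checks cannot match the wrong cap type.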
== Internal Usage of URIs ==