mirror of
https://github.com/tahoe-lafs/tahoe-lafs.git
synced 2025-01-18 18:56:28 +00:00
docs/uri.txt: update to reflect mutable files
parent
7927495cbe
commit
92640dc6e5
110
docs/uri.txt
@@ -4,7 +4,8 @@
 Each file and directory in a Tahoe filesystem is described by a "URI". There
 are different kinds of URIs for different kinds of objects, and there are
 different kinds of URIs to provide different kinds of access to those
-objects.
+objects. Each URI is a string representation of a "capability" or "cap", and
+there are read-caps, write-caps, verify-caps, and others.

 Each URI provides both '''location''' and '''identification''' properties.
 '''location''' means that holding the URI is sufficient to locate the data it
@@ -17,8 +18,9 @@ limited in their abilities by the identification properties of the URI.
 Some URIs are subsets of others. In particular, if you know a URI which
 allows you to modify some object, you can produce a weaker read-only URI and
 give it to someone else, and they will be able to read that object but not
-modify it. Each URI represents some '''capability''', and some capabilities
-are derived from others.
+modify it. Directories, for example, have a read-cap which is derived from
+the write-cap: anyone with read/write access to the directory can produce a
+limited URI that grants read-only access, but not the other way around.

 source:src/allmydata/uri.py is the main place where URIs are processed. It is
 the authoritative definition point for all the URI types described
@@ -30,10 +32,13 @@ The lowest layer of the Tahoe architecture (the "grid") is responsible for
 mapping URIs to data. This is basically a distributed hash table, in which
 the URI is the key, and some sequence of bytes is the value.

-At present, all the entries in this DHT are immutable. That means that each
-URI represents a fixed chunk of data. The URI itself is derived from the data
-when it is uploaded into the grid, and can be used to locate and download
-that data from the grid at some time in the future.
+There are two kinds of entries in this table: immutable and mutable. For
+immutable entries, the URI represents a fixed chunk of data. The URI itself
+is derived from the data when it is uploaded into the grid, and can be used
+to locate and download that data from the grid at some time in the future.
+
+For mutable entries, the URI identifies a "slot" or "container", which can be
+filled with different pieces of data at different times.

 It is important to note that the "files" described by these URIs are just a
 bunch of bytes, and that __no__ filenames or other metadata is retained at
@@ -49,7 +54,7 @@ computed to help validate the data afterwards (providing the "identification"
 property). All of these pieces, plus information about the file's size and
 the number of shares into which it has been distributed, are put into the
 "CHK" uri. The storage index is derived by hashing the read key (using a
-tagged SHA-256 hash, then truncated to 128 bits), so it does not need to be
+tagged SHA-256d hash, then truncated to 128 bits), so it does not need to be
 physically present in the URI.

 The current format for CHK URIs is the concatenation of the following
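
As a rough illustration of the storage-index derivation touched by this hunk, the following Python sketch computes a tagged SHA-256d hash (SHA-256 applied twice) of a read key and truncates it to 128 bits. The netstring-style framing of the tag and the tag value itself are assumptions made for this example and do not come from the document; source:src/allmydata/uri.py and the code it calls remain the authoritative definition.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """SHA-256d: SHA-256 applied to the SHA-256 digest of the data."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def tagged_hash(tag: bytes, value: bytes, truncate_to: int = 32) -> bytes:
    """Hash `value` under a domain-separation tag, then truncate.

    Assumption: the tag is framed as a netstring ("<len>:<tag>,") and
    prepended to the value before hashing.
    """
    framed_tag = b"%d:%s," % (len(tag), tag)
    return sha256d(framed_tag + value)[:truncate_to]


# Hypothetical tag, chosen only for this sketch.
STORAGE_INDEX_TAG = b"example_read_key_to_storage_index_v1"


def storage_index_from_read_key(read_key: bytes) -> bytes:
    """Derive a 128-bit (16-byte) storage index from the read key."""
    return tagged_hash(STORAGE_INDEX_TAG, read_key, truncate_to=16)
```

Because the storage index is a pure function of the read key, anyone holding the URI can recompute it, which is why it does not need to appear in the URI.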
@@ -62,13 +67,24 @@ base32 encoding of the SHA-256 hash of the URI Extension Block,
 (needed-shares) is an ascii decimal representation of the number of shares
 required to reconstruct this file, (total-shares) is the same representation
 of the total number of shares created, and (size) is an ascii decimal
-representation of the size of the data represented by this URI.
+representation of the size of the data represented by this URI. All base32
+encodings are expressed in lower-case, with the trailing '=' signs removed.

 For example, the following is a CHK URI, generated from the contents of the
 architecture.txt document that lives next to this one in the source tree:

-URI:CHK:ihrbeov7lbvoduupd4qblysj7a======:bg5agsdt62jb34hxvxmdsbza6do64f4fg5anxxod2buttbo6udzq====:3:10:28733
+URI:CHK:ihrbeov7lbvoduupd4qblysj7a:bg5agsdt62jb34hxvxmdsbza6do64f4fg5anxxod2buttbo6udzq:3:10:28733

+Historical note: The name "CHK" is somewhat inaccurate and continues to be
+used for historical reasons. "Content Hash Key" means that the encryption key
+is derived by hashing the contents, which gives the useful property that
+encoding the same file twice will result in the same URI. However, this is an
+optional step: by passing a different flag to the appropriate API call, Tahoe
+will generate a random encryption key instead of hashing the file: this gives
+the useful property that the URI or storage index does not reveal anything
+about the file's contents (except filesize), which improves privacy. The
+URI:CHK: prefix really indicates that an immutable file is in use, without
+saying anything about how the key was derived.

 === LIT URIs ===

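
To make the CHK layout in this hunk concrete, here is a minimal Python sketch that splits a CHK URI into its fields and decodes the lower-case, unpadded base32 parts. The names `CHKCap`, `parse_chk_uri`, `key`, and `ueb_hash` are invented for the example, and treating the first field as the 16-byte key is an assumption drawn from the surrounding text; this is not the parser in source:src/allmydata/uri.py.

```python
import base64
from typing import NamedTuple


class CHKCap(NamedTuple):
    key: bytes           # assumed: the first base32 field is the 16-byte key
    ueb_hash: bytes      # SHA-256 hash of the URI Extension Block
    needed_shares: int
    total_shares: int
    size: int


def b32decode_unpadded(text: str) -> bytes:
    """Decode lower-case base32 whose trailing '=' signs were removed."""
    padding = "=" * (-len(text) % 8)
    return base64.b32decode(text.upper() + padding)


def parse_chk_uri(uri: str) -> CHKCap:
    prefix = "URI:CHK:"
    if not uri.startswith(prefix):
        raise ValueError("not a CHK URI")
    fields = uri[len(prefix):].split(":")
    if len(fields) != 5:
        raise ValueError("expected 5 colon-separated fields after the prefix")
    key_s, hash_s, needed_s, total_s, size_s = fields
    return CHKCap(
        key=b32decode_unpadded(key_s),
        ueb_hash=b32decode_unpadded(hash_s),
        needed_shares=int(needed_s),
        total_shares=int(total_s),
        size=int(size_s),
    )


cap = parse_chk_uri(
    "URI:CHK:ihrbeov7lbvoduupd4qblysj7a:"
    "bg5agsdt62jb34hxvxmdsbza6do64f4fg5anxxod2buttbo6udzq:3:10:28733"
)
```

Applied to the example URI from this hunk, the sketch yields a 16-byte first field, a 32-byte hash, 3 needed shares, 10 total shares, and a size of 28733 bytes.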
@@ -83,27 +99,49 @@ directly in the URI.
 The format of a LIT URI is simply a fixed prefix concatenated with the base32
 encoding of the file's data:

-URI:LIT:bjuw4y3movsgkidbnrwg26lemf2gcl3xmvrc6kropbuhi3lmbi======
+URI:LIT:bjuw4y3movsgkidbnrwg26lemf2gcl3xmvrc6kropbuhi3lmbi

 The LIT URI for an empty file is "URI:LIT:", and the LIT URI for a 5-byte
 file that contains the string "hello" is "URI:LIT:nbswy3dp".

 === Mutable File URIs ===

-TODO: update this documentation for v0.7.0 which does have decentralized mutable files and decentralized directories
-The current release does not provide for mutable files, hence all file URIs
-correspond to immutable data. Future releases will probably add mutable
-files, creating a new class of Mutable File URIs. These URIs will contain the
-hash of a public key and also a symmetric read- or write- key. The URI refers
-to a "mutable slot" into which arbitrary data can be uploaded at various
-times. Each time this kind of URI is submitted to the Downloader, the caller
-will receive the current contents of the slot (i.e. the data that was most
-recently uploaded to it). The public key will be used to validate the data.
+The other kind of DHT entry is the "mutable slot", in which the URI names a
+container in which data can be placed and retrieved without changing the
+identity of the container.

-Note that this form of validation is limited to confirming that the data
-retrieved matches __some__ data that was uploaded in the past. The downloader
-may still be vulnerable to replay attacks, although the distributed storage
-mechanism will probably minimize this vulnerability.
+These slots have write-caps (which allow read/write access), read-caps (which
+only allow read-access), and verify-caps (which allow a file checker/repairer
+to confirm that the contents exist, but do not let it decrypt the
+contents).
+
+Mutable slots use public key technology to provide data integrity, and put a
+hash of the public key in the URI. As a result, the data validation is
+limited to confirming that the data retrieved matches _some_ data that was
+uploaded in the past, but not _which_ version of that data.
+
+The format of the write-cap for mutable files is:
+
+URI:SSK:(writekey):(fingerprint)
+
+Where (writekey) is the base32 encoding of the 16-byte AES encryption key
+that is used to encrypt the RSA private key, and (fingerprint) is the base32
+encoded 32-byte SHA-256 hash of the RSA public key. For more details about
+the way these keys are used, please see docs/mutable.txt .
+
+The format for mutable read-caps is:
+
+URI:SSK-RO:(readkey):(fingerprint)
+
+The read-cap is just like the write-cap except it contains the other AES
+encryption key: the one used for encrypting the mutable file's contents. This
+second key is derived by hashing the writekey, which allows the holder of a
+write-cap to produce a read-cap, but not the other way around. The
+fingerprint is the same in both caps.
+
+Historical note: the "SSK" prefix is a perhaps-inaccurate reference to
+"Sub-Space Keys" from the Freenet project, which uses a vaguely similar
+structure to provide mutable file access.

 == Directory URIs ==

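
The LIT and mutable-file formats in this hunk can be sketched in a few lines of Python. The LIT helpers follow the text directly (fixed prefix plus lower-case, unpadded base32 of the data). The read-key derivation is only an illustration of "derived by hashing the writekey": the tag `READKEY_TAG`, its netstring framing, and the use of SHA-256d truncated to 16 bytes are assumptions, and every function name here is hypothetical. docs/mutable.txt describes the real key schedule.

```python
import base64
import hashlib


def b32encode_unpadded(data: bytes) -> str:
    """Lower-case base32 with the trailing '=' signs removed."""
    return base64.b32encode(data).decode("ascii").lower().rstrip("=")


def b32decode_unpadded(text: str) -> bytes:
    return base64.b32decode(text.upper() + "=" * (-len(text) % 8))


def make_lit_uri(data: bytes) -> str:
    """A LIT URI carries the file's bytes directly in the URI."""
    return "URI:LIT:" + b32encode_unpadded(data)


def read_lit_uri(uri: str) -> bytes:
    prefix = "URI:LIT:"
    if not uri.startswith(prefix):
        raise ValueError("not a LIT URI")
    return b32decode_unpadded(uri[len(prefix):])


# Hypothetical domain-separation tag, used only in this sketch.
READKEY_TAG = b"example_writekey_to_readkey_v1"


def derive_readkey(writekey: bytes) -> bytes:
    """One-way derivation: the readkey is a truncated hash of the writekey."""
    framed = b"%d:%s," % (len(READKEY_TAG), READKEY_TAG) + writekey
    return hashlib.sha256(hashlib.sha256(framed).digest()).digest()[:16]


def make_write_cap(writekey: bytes, fingerprint: bytes) -> str:
    return "URI:SSK:%s:%s" % (
        b32encode_unpadded(writekey),
        b32encode_unpadded(fingerprint),
    )


def write_cap_to_read_cap(write_cap: str) -> str:
    """Attenuate a write-cap to a read-cap; the fingerprint is unchanged."""
    prefix = "URI:SSK:"
    if not write_cap.startswith(prefix):
        raise ValueError("not a mutable-file write-cap")
    writekey_s, fingerprint_s = write_cap[len(prefix):].split(":")
    readkey = derive_readkey(b32decode_unpadded(writekey_s))
    return "URI:SSK-RO:%s:%s" % (b32encode_unpadded(readkey), fingerprint_s)
```

Because the derivation is a hash, a read-cap holder cannot recover the writekey, which matches the "not the other way around" property described above.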
@@ -112,25 +150,19 @@ of directories and files, the "vdrive" layer (which sits on top of the grid
 layer) needs to keep track of "directory nodes", or "dirnodes" for short.
 source:docs/dirnodes.txt describes how these work.

-TODO: update this documentation for v0.7.0 which has decentralized mutable files and decentralized directories
-In the current release, each dirnode is stored (in encrypted form) on a
-single "vdrive server". The Foolscap FURL that points at this server is kept
-inside the "dirnode URI", as well as the read-key or write-key used in the
-encryption. There are two forms of dirnode URIs: the read-write form contains
-the write-key (from which the read-key can be derived by hashing), while the
-read-only form only contains the read-key. The storage index is derived from
-the read-key, so both kinds of URIs implicitly contain the storage index.
+Dirnodes are contained inside mutable files, and are thus simply a particular
+way to interpret the contents of these files. As a result, a directory
+write-cap looks a lot like a mutable-file write-cap:

-The format of a read-write directory URI is the literal string "URI:DIR:",
-followed by the FURL of the vdrive server, another ":", then the
-base32-encoded representation of the write-key. For example:
+URI:DIR2:(writekey):(fingerprint)

-URI:DIR:pb://ugltpehrf73gnb4qbjigxmmzbmznjxo6@10.0.0.16:59571,127.0.0.1:59571/vdrive:x2amqa52r6kqe7iemndilvtntm======
+Likewise directory read-caps (which provide read-only access to the
+directory) look much like mutable-file read-caps:

-A read-only directory URI is similar: "DIR-RO" is used instead of "DIR", and
-the read-key is used instead of the write-key:
+URI:DIR2-RO:(readkey):(fingerprint)

-URI:DIR-RO:pb://ugltpehrf73gnb4qbjigxmmzbmznjxo6@10.0.0.16:59571,127.0.0.1:59571/vdrive:l4dqkt3lianmxecxv7nol3ka2i======
+Historical note: the "DIR2" prefix is used because the non-distributed
+dirnodes in earlier Tahoe releases had already claimed the "DIR" prefix.

 == Internal Usage of URIs ==

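
Since the directory caps in this hunk differ from the mutable-file caps only by prefix, a small prefix table is enough to tell the URI kinds in this document apart. The Python sketch below is illustrative only: the table, the `classify_uri` name, and the (object, access) labels are made up for the example, and it covers just the prefixes shown in this diff.

```python
# Prefix -> (kind of object, access granted). Labels are this sketch's own.
URI_KINDS = {
    "URI:CHK:": ("immutable file", "read"),
    "URI:LIT:": ("literal file", "read"),
    "URI:SSK:": ("mutable file", "read/write"),
    "URI:SSK-RO:": ("mutable file", "read-only"),
    "URI:DIR2:": ("directory", "read/write"),
    "URI:DIR2-RO:": ("directory", "read-only"),
}


def classify_uri(uri: str) -> tuple:
    """Return (object kind, access) for a URI, based only on its prefix."""
    # Each prefix ends with ':', so "URI:SSK:" can never match "URI:SSK-RO:...".
    for prefix, kind in URI_KINDS.items():
        if uri.startswith(prefix):
            return kind
    raise ValueError("unrecognized URI prefix")
```

For example, `classify_uri("URI:DIR2-RO:(readkey):(fingerprint)")` returns `("directory", "read-only")`.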