mirror of
https://github.com/tahoe-lafs/tahoe-lafs.git
synced 2025-01-11 15:32:39 +00:00
docs/file-encoding.txt: move this over from the wiki
parent 7ea7fd751e
commit f9fe63fd7a
148
docs/file-encoding.txt
Normal file
@@ -0,0 +1,148 @@
== FileEncoding ==

When the client wishes to upload an immutable file, the first step is to
decide upon an encryption key. There are two methods: convergent or random.
The goal of the convergent-key method is to make sure that multiple uploads
of the same file will result in only one copy on the grid, whereas the
random-key method does not provide this "convergence" feature.

The convergent-key method computes the SHA-256d hash of a single-purpose tag,
the encoding parameters, a "convergence secret", and the contents of the
file. It uses a portion of the resulting hash as the AES encryption key.
There are security concerns with using this approach (the
"partial-information guessing attack", please see ticket #365 for some
references), so Tahoe uses a separate (randomly-generated) "convergence
secret" for each node, stored in NODEDIR/private/convergence. The encoding
parameters (k, N, and the segment size) are included in the hash to make sure
that two different encodings of the same file will get different keys. This
method requires an extra IO pass over the file to compute this key, and
encryption cannot be started until the pass is complete. This means that the
convergent-key method will require at least two total passes over the file.
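The convergent-key derivation described above can be sketched roughly as
follows. This is an illustration only, not Tahoe's actual code: the tag
string, the field order, the way the encoding parameters are serialized, and
the choice of the first 16 bytes as the key are all assumptions made for the
example.

```python
import hashlib
import os


def netstring(data: bytes) -> bytes:
    """Frame a value as <length>:<data>, so adjacent fields cannot run together."""
    return b"%d:%s," % (len(data), data)


def sha256d(data: bytes) -> bytes:
    """SHA-256d: SHA-256 applied to the SHA-256 digest of the input."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


# Hypothetical tag, for illustration only.
CONVERGENCE_TAG = b"example_convergent_encryption_key_v1"


def convergent_key(secret: bytes, k: int, n: int, segment_size: int,
                   plaintext: bytes) -> bytes:
    """Derive a key from the tag, encoding parameters, secret, and contents."""
    params = b"%d,%d,%d" % (k, n, segment_size)  # assumed serialization
    preimage = (netstring(CONVERGENCE_TAG) + netstring(params)
                + netstring(secret) + plaintext)
    return sha256d(preimage)[:16]  # "a portion of the resulting hash" (assumed: 16 bytes)


secret_a = os.urandom(32)  # stand-in for the contents of NODEDIR/private/convergence
secret_b = os.urandom(32)
data = b"the same file contents"

key_1 = convergent_key(secret_a, 3, 10, 128 * 1024, data)
key_2 = convergent_key(secret_a, 3, 10, 128 * 1024, data)  # same inputs, same key
key_3 = convergent_key(secret_a, 4, 10, 128 * 1024, data)  # different k
key_4 = convergent_key(secret_b, 3, 10, 128 * 1024, data)  # different secret
```

Here key_1 and key_2 match, while changing either the encoding parameters or
the convergence secret yields an unrelated key. A real uploader would feed the
file into the hash incrementally during the extra IO pass rather than holding
it in memory.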

The random-key method simply chooses a random encryption key. Convergence is
disabled, but this method does not require a separate IO pass, so upload
can be done with a single pass. This mode makes it easier to perform
streaming upload.

Regardless of which method is used to generate the key, the plaintext file is
encrypted (using AES in CTR mode) to produce a ciphertext. This ciphertext is
then erasure-coded and uploaded to the servers. Two hashes of the ciphertext
are generated as the encryption proceeds: a flat hash of the whole
ciphertext, and a Merkle tree. These are used to verify the correctness of
the erasure decoding step, and can be used by a "verifier" process to make
sure the file is intact without requiring the decryption key.

The encryption key is hashed (with SHA-256d and a single-purpose tag) to
produce the "Storage Index". This Storage Index (or SI) is used to identify
the shares produced by the method described below. The grid can be thought of
as a large table that maps each Storage Index to a ciphertext. Since the
ciphertext is stored as erasure-coded shares, it can also be thought of as a
table that maps SI to shares.

Anybody who knows a Storage Index can retrieve the associated ciphertext:
ciphertexts are not secret.

[[Image(file-encoding1.png)]]

The ciphertext file is then broken up into segments. The last segment is
likely to be shorter than the rest. Each segment is erasure-coded into a
number of "subshares". This takes place one segment at a time. (In fact,
encryption and erasure-coding take place at the same time, once per plaintext
segment.) Larger segment sizes result in less overhead overall, but increase
both the memory footprint and the "alacrity" (the number of bytes we have to
receive before we can deliver validated plaintext to the user). The current
default segment size is 128KiB.
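As a small illustration of the segmentation step, the sketch below splits a
ciphertext into fixed-size segments and shows that only the last one comes
out short. The function name is hypothetical, and a real uploader would
process one segment at a time instead of building the whole list in memory.

```python
SEGMENT_SIZE = 128 * 1024  # the default segment size quoted above


def split_into_segments(ciphertext: bytes, segment_size: int = SEGMENT_SIZE):
    """Return the segments in order; only the final one may be short."""
    return [ciphertext[i:i + segment_size]
            for i in range(0, len(ciphertext), segment_size)]


# A 300 KiB ciphertext gives two full segments and one 44 KiB remainder.
segments = split_into_segments(bytes(300 * 1024))
segment_lengths = [len(s) for s in segments]
```

With these numbers, segment_lengths is [131072, 131072, 45056]. A downloader
must receive at least one whole segment before it can hand validated
plaintext to the user, which is why a larger segment size raises both memory
use and the alacrity figure.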

One subshare from each segment is sent to each shareholder (aka leaseholder,
aka landlord, aka storage node, aka peer). The "share" held by each remote
shareholder is nominally just a collection of these subshares. The file will
be recoverable once at least k of the N shares have been retrieved.

[[Image(file-encoding2.png)]]

The subshares are hashed as they are generated and transmitted. These
subshare hashes are put into a Merkle hash tree. When the last subshare has
been created, the Merkle tree is completed and delivered to the peer. Later,
when we retrieve these subshares, the peer will send many of the Merkle hash
tree nodes ahead of time, so we can validate each subshare independently.

The root of this subshare hash tree is called the "subshare root hash" and
is used in the next step.
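A minimal sketch of building a subshare hash tree and taking its root is
shown below. The tag strings are hypothetical, and padding the leaf row out
to a power of two is an assumption made to keep the example simple; the text
here does not specify how an uneven number of leaves is handled.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def netstring(data: bytes) -> bytes:
    return b"%d:%s," % (len(data), data)


# Hypothetical tags, for illustration only.
LEAF_TAG = netstring(b"example_subshare_leaf_v1")
NODE_TAG = netstring(b"example_merkle_node_v1")
EMPTY_TAG = netstring(b"example_merkle_empty_v1")


def leaf_hash(subshare: bytes) -> bytes:
    return sha256d(LEAF_TAG + subshare)


def node_hash(left: bytes, right: bytes) -> bytes:
    return sha256d(NODE_TAG + netstring(left) + netstring(right))


def build_tree(leaves):
    """Return the tree as a list of rows, from the leaf row up to the root row."""
    if not leaves:
        raise ValueError("need at least one leaf")
    row = list(leaves)
    # Pad to a power of two so every node has exactly two children (assumption).
    while len(row) & (len(row) - 1):
        row.append(sha256d(EMPTY_TAG))
    rows = [row]
    while len(row) > 1:
        row = [node_hash(row[i], row[i + 1]) for i in range(0, len(row), 2)]
        rows.append(row)
    return rows


subshares = [b"subshare-%d" % i for i in range(5)]
rows = build_tree([leaf_hash(s) for s in subshares])
subshare_root_hash = rows[-1][0]

# Corrupting any one subshare changes the root.
tampered = list(subshares)
tampered[2] = b"corrupted"
tampered_root = build_tree([leaf_hash(s) for s in tampered])[-1][0]
```

Leaves and interior nodes are hashed under different tags so that a leaf hash
can never be mistaken for the hash of two children.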

[[Image(file-encoding3.png)]]

There is a higher-level Merkle tree called the "share hash tree". Its leaves
are the subshare root hashes from each share. The root of this tree is called
the "share root hash" and is included in the "URI Extension Block", aka UEB.
The ciphertext hash and Merkle tree are also put here, along with the
original file size and the encoding parameters. The UEB contains all the
non-secret values that could be put in the URI, but would have made the URI
too big. So instead, the UEB is stored with the share, and the hash of the
UEB is put in the URI.

The URI then contains the secret encryption key and the UEB hash. It also
contains the basic encoding parameters (k and N) and the file size, to make
download more efficient (by knowing the number of required shares ahead of
time, sufficient download queries can be generated in parallel).

The URI (also known as the immutable-file read-cap, since possessing it
grants the holder the capability to read the file's plaintext) is then
represented as a (relatively) short printable string like so:

URI:CHK:auxet66ynq55naiy2ay7cgrshm:6rudoctmbxsmbg7gwtjlimd6umtwrrsxkjzthuldsmo4nnfoc6fa:3:10:1000000
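To make the layout of that string concrete, here is a rough parsing sketch.
It assumes the colon-separated fields are, in order, the encryption key, the
UEB hash, k, N, and the file size, and that the first two are lowercase
base32 without padding. Both assumptions are inferred from the example string
and the description above, and the names used are hypothetical.

```python
import base64
from typing import NamedTuple


class ReadCap(NamedTuple):
    key: bytes       # secret encryption key
    ueb_hash: bytes  # hash of the URI Extension Block
    k: int           # shares required
    n: int           # shares produced
    size: int        # file size in bytes


def b32decode_unpadded(text: str) -> bytes:
    """Decode lowercase base32 written without '=' padding."""
    padded = text.upper() + "=" * (-len(text) % 8)
    return base64.b32decode(padded)


def parse_chk_uri(uri: str) -> ReadCap:
    prefix, kind, key, ueb_hash, k, n, size = uri.split(":")
    if (prefix, kind) != ("URI", "CHK"):
        raise ValueError("not an immutable-file read-cap")
    return ReadCap(b32decode_unpadded(key), b32decode_unpadded(ueb_hash),
                   int(k), int(n), int(size))


cap = parse_chk_uri(
    "URI:CHK:auxet66ynq55naiy2ay7cgrshm:"
    "6rudoctmbxsmbg7gwtjlimd6umtwrrsxkjzthuldsmo4nnfoc6fa:3:10:1000000"
)
```

Under those assumptions the example decodes to a 16-byte key and a 32-byte
UEB hash, with k=3, N=10, and a file size of 1000000 bytes.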

[[Image(file-encoding4.png)]]

During download, when a peer begins to transmit a share, it first transmits
all of the parts of the share hash tree that are necessary to validate its
subshare root hash. Then it transmits the portions of the subshare hash tree
that are necessary to validate the first subshare. Then it transmits the
first subshare. It then continues this loop: transmitting any portions of the
subshare hash tree needed to validate subshare#N, then sending subshare#N.

[[Image(file-encoding5.png)]]

So the "share" that is sent to the remote peer actually consists of three
pieces, sent in a specific order as they become available, and retrieved
during download in a different order according to when they are needed.

The first piece is the subshares themselves, one per segment. The last
subshare will likely be shorter than the rest, because the last segment is
probably shorter than the rest. The second piece is the subshare hash tree,
consisting of a total of about two hashes per subshare. The third piece is a
hash chain from the share hash tree, consisting of log2(numshares) hashes.

During upload, all subshares are sent first, followed by the subshare hash
tree, followed by the share hash chain. During download, the share hash chain
is delivered first, followed by the subshare root hash. The client then uses
the hash chain to validate the subshare root hash. Then the peer delivers
enough of the subshare hash tree to validate the first subshare, followed by
the first subshare itself. The subshare hash chain is used to validate the
subshare, which is then passed (along with the first subshare from several
other peers) into decoding, to produce the first segment of ciphertext, which
is then decrypted to produce the first segment of plaintext, which is finally
delivered to the user.
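The hash-chain check used during download can be sketched as follows: given
one leaf, its position, and the sibling hashes along the path, the client
recomputes the root and compares it with the value it already trusts. The tag
is hypothetical, and the example assumes a power-of-two number of leaves.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Hypothetical tag; both inputs are fixed-length digests in this sketch.
    return sha256d(b"example_merkle_node_v1:" + left + right)


def build_rows(leaves):
    """Rows of a binary hash tree, leaf row first (leaf count a power of two)."""
    rows = [list(leaves)]
    while len(rows[-1]) > 1:
        prev = rows[-1]
        rows.append([node_hash(prev[i], prev[i + 1])
                     for i in range(0, len(prev), 2)])
    return rows


def hash_chain(rows, index):
    """Sibling hashes needed to recompute the root from leaf `index`."""
    chain = []
    for row in rows[:-1]:
        chain.append(row[index ^ 1])  # the neighbour sharing the same parent
        index //= 2
    return chain


def validate(leaf, index, chain, root):
    """Recompute the path from the leaf up to the root and compare."""
    current = leaf
    for sibling in chain:
        if index % 2 == 0:
            current = node_hash(current, sibling)
        else:
            current = node_hash(sibling, current)
        index //= 2
    return current == root


# Eight stand-in "subshare root hashes", one per share.
leaves = [sha256d(b"subshare root %d" % i) for i in range(8)]
rows = build_rows(leaves)
share_root_hash = rows[-1][0]

chain = hash_chain(rows, 5)
ok = validate(leaves[5], 5, chain, share_root_hash)
bad = validate(sha256d(b"corrupted"), 5, chain, share_root_hash)
```

With eight leaves the chain holds three hashes, matching the log2(numshares)
figure above; the genuine leaf validates and the corrupted one does not.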

[[Image(file-encoding6.png)]]

== Hashes ==

All hashes use SHA-256d, as defined in Practical Cryptography (by Ferguson
and Schneier). All hashes use a single-purpose tag. For example, the hash
that converts an encryption key into a storage index is defined as follows:

SI = SHA256d(netstring("allmydata_immutable_key_to_storage_index_v1") + key)

When two separate values need to be combined together in a hash, we wrap each
in a netstring.
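The storage-index formula translates almost directly into code. The sketch
below returns the full 32-byte digest, since any truncation is not specified
here; tagged_pair_hash and its tag argument are hypothetical names added to
illustrate the netstring-wrapping rule.

```python
import hashlib


def netstring(data: bytes) -> bytes:
    """Encode as <decimal length>:<data>, (note the trailing comma)."""
    return b"%d:%s," % (len(data), data)


def sha256d(data: bytes) -> bytes:
    """SHA-256 applied twice: SHA256(SHA256(data))."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def storage_index(key: bytes) -> bytes:
    tag = netstring(b"allmydata_immutable_key_to_storage_index_v1")
    return sha256d(tag + key)


def tagged_pair_hash(tag: bytes, first: bytes, second: bytes) -> bytes:
    """Combine two values under one tag, each wrapped in its own netstring."""
    return sha256d(netstring(tag) + netstring(first) + netstring(second))


si = storage_index(bytes(16))  # an all-zero 16-byte key, purely for illustration

# Without netstrings, ("ab", "c") and ("a", "bc") would concatenate to the
# same bytes; with them, the two pairs hash differently.
h1 = tagged_pair_hash(b"example_tag_v1", b"ab", b"c")
h2 = tagged_pair_hash(b"example_tag_v1", b"a", b"bc")
```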

Using SHA-256d (instead of plain SHA-256) guards against length-extension
attacks. Using the tag protects our Merkle trees against attacks in which the
hash of a leaf is confused with a hash of two children (allowing an attacker
to generate corrupted data that nevertheless appears to be valid), and is
simply good "cryptographic hygiene". The "Chosen Protocol Attack" by Kelsey,
Schneier, and Wagner (http://www.schneier.com/paper-chosen-protocol.html) is
relevant. Putting the tag in a netstring guards against attacks that seek to
confuse the end of the tag with the beginning of the subsequent value.