mirror of
https://github.com/tahoe-lafs/tahoe-lafs.git
synced 2025-01-06 21:18:43 +00:00
149 lines
7.8 KiB
Plaintext
149 lines
7.8 KiB
Plaintext
|
|
||
|
== FileEncoding ==
|
||
|
|
||
|
When the client wishes to upload an immutable file, the first step is to
|
||
|
decide upon an encryption key. There are two methods: convergent or random.
|
||
|
The goal of the convergent-key method is to make sure that multiple uploads
|
||
|
of the same file will result in only one copy on the grid, whereas the
|
||
|
random-key method does not provide this "convergence" feature.
|
||
|
|
||
|
The convergent-key method computes the SHA-256d hash of a single-purpose tag,
|
||
|
the encoding parameters, a "convergence secret", and the contents of the
|
||
|
file. It uses a portion of the resulting hash as the AES encryption key.
|
||
|
There are security concerns with using convergence this approach (the
|
||
|
"partial-information guessing attack", please see ticket #365 for some
|
||
|
references), so Tahoe uses a separate (randomly-generated) "convergence
|
||
|
secret" for each node, stored in NODEDIR/private/convergence . The encoding
|
||
|
parameters (k, N, and the segment size) are included in the hash to make sure
|
||
|
that two different encodings of the same file will get different keys. This
|
||
|
method requires an extra IO pass over the file, to compute this key, and
|
||
|
encryption cannot be started until the pass is complete. This means that the
|
||
|
convergent-key method will require at least two total passes over the file.
|
||
|
|
||
|
The random-key method simply chooses a random encryption key. Convergence is
|
||
|
disabled, however this method does not require a separate IO pass, so upload
|
||
|
can be done with a single pass. This mode makes it easier to perform
|
||
|
streaming upload.
|
||
|
|
||
|
Regardless of which method is used to generate the key, the plaintext file is
|
||
|
encrypted (using AES in CTR mode) to produce a ciphertext. This ciphertext is
|
||
|
then erasure-coded and uploaded to the servers. Two hashes of the ciphertext
|
||
|
are generated as the encryption proceeds: a flat hash of the whole
|
||
|
ciphertext, and a Merkle tree. These are used to verify the correctness of
|
||
|
the erasure decoding step, and can be used by a "verifier" process to make
|
||
|
sure the file is intact without requiring the decryption key.
|
||
|
|
||
|
The encryption key is hashed (with SHA-256d and a single-purpose tag) to
|
||
|
produce the "Storage Index". This Storage Index (or SI) is used to identify
|
||
|
the shares produced by the method described below. The grid can be thought of
|
||
|
as a large table that maps Storage Index to a ciphertext. Since the
|
||
|
ciphertext is stored as erasure-coded shares, it can also be thought of as a
|
||
|
table that maps SI to shares.
|
||
|
|
||
|
Anybody who knows a Storage Index can retrieve the associated ciphertext:
|
||
|
ciphertexts are not secret.
|
||
|
|
||
|
|
||
|
[[Image(file-encoding1.png)]]
|
||
|
|
||
|
The ciphertext file is then broken up into segments. The last segment is
|
||
|
likely to be shorter than the rest. Each segment is erasure-coded into a
|
||
|
number of "subshares". This takes place one segment at a time. (In fact,
|
||
|
encryption and erasure-coding take place at the same time, once per plaintext
|
||
|
segment). Larger segment sizes result in less overhead overall, but increase
|
||
|
both the memory footprint and the "alacrity" (the number of bytes we have to
|
||
|
receive before we can deliver validated plaintext to the user). The current
|
||
|
default segment size is 128KiB.
|
||
|
|
||
|
One subshare from each segment is sent to each shareholder (aka leaseholder,
|
||
|
aka landlord, aka storage node, aka peer). The "share" held by each remote
|
||
|
shareholder is nominally just a collection of these subshares. The file will
|
||
|
be recoverable when a certain number of shares have been retrieved.
|
||
|
|
||
|
[[Image(file-encoding2.png)]]
|
||
|
|
||
|
The subshares are hashed as they are generated and transmitted. These
|
||
|
subshare hashes are put into a Merkle hash tree. When the last share has been
|
||
|
created, the merkle tree is completed and delivered to the peer. Later, when
|
||
|
we retrieve these subshares, the peer will send many of the merkle hash tree
|
||
|
nodes ahead of time, so we can validate each subshare independently.
|
||
|
|
||
|
The root of this subshare hash tree is called the "subshare root hash" and
|
||
|
used in the next step.
|
||
|
|
||
|
[[Image(file-encoding3.png)]]
|
||
|
|
||
|
There is a higher-level Merkle tree called the "share hash tree". Its leaves
|
||
|
are the subshare root hashes from each share. The root of this tree is called
|
||
|
the "share root hash" and is included in the "URI Extension Block", aka UEB.
|
||
|
The ciphertext hash and Merkle tree are also put here, along with the
|
||
|
original file size, and the encoding parameters. The UEB contains all the
|
||
|
non-secret values that could be put in the URI, but would have made the URI
|
||
|
too big. So instead, the UEB is stored with the share, and the hash of the
|
||
|
UEB is put in the URI.
|
||
|
|
||
|
The URI then contains the secret encryption key and the UEB hash. It also
|
||
|
contains the basic encoding parameters (k and N) and the file size, to make
|
||
|
download more efficient (by knowing the number of required shares ahead of
|
||
|
time, sufficient download queries can be generated in parallel).
|
||
|
|
||
|
The URI (also known as the immutable-file read-cap, since possessing it
|
||
|
grants the holder the capability to read the file's plaintext) is then
|
||
|
represented as a (relatively) short printable string like so:
|
||
|
|
||
|
URI:CHK:auxet66ynq55naiy2ay7cgrshm:6rudoctmbxsmbg7gwtjlimd6umtwrrsxkjzthuldsmo4nnfoc6fa:3:10:1000000
|
||
|
|
||
|
[[Image(file-encoding4.png)]]
|
||
|
|
||
|
During download, when a peer begins to transmit a share, it first transmits
|
||
|
all of the parts of the share hash tree that are necessary to validate its
|
||
|
subshare root hash. Then it transmits the portions of the subshare hash tree
|
||
|
that are necessary to validate the first subshare. Then it transmits the
|
||
|
first subshare. It then continues this loop: transmitting any portions of the
|
||
|
subshare hash tree to validate subshare#N, then sending subshare#N.
|
||
|
|
||
|
[[Image(file-encoding5.png)]]
|
||
|
|
||
|
So the "share" that is sent to the remote peer actually consists of three
|
||
|
pieces, sent in a specific order as they become available, and retrieved
|
||
|
during download in a different order according to when they are needed.
|
||
|
|
||
|
The first piece is the subshares themselves, one per segment. The last
|
||
|
subshare will likely be shorter than the rest, because the last segment is
|
||
|
probably shorter than the rest. The second piece is the subshare hash tree,
|
||
|
consisting of a total of two SHA-1 hashes per subshare. The third piece is a
|
||
|
hash chain from the share hash tree, consisting of log2(numshares) hashes.
|
||
|
|
||
|
During upload, all subshares are sent first, followed by the subshare hash
|
||
|
tree, followed by the share hash chain. During download, the share hash chain
|
||
|
is delivered first, followed by the subshare root hash. The client then uses
|
||
|
the hash chain to validate the subshare root hash. Then the peer delivers
|
||
|
enough of the subshare hash tree to validate the first subshare, followed by
|
||
|
the first subshare itself. The subshare hash chain is used to validate the
|
||
|
subshare, then it is passed (along with the first subshare from several other
|
||
|
peers) into decoding, to produce the first segment of crypttext, which is
|
||
|
then decrypted to produce the first segment of plaintext, which is finally
|
||
|
delivered to the user.
|
||
|
|
||
|
[[Image(file-encoding6.png)]]
|
||
|
|
||
|
== Hashes ==
|
||
|
|
||
|
All hashes use SHA-256d, as defined in Practical Cryptography (by Ferguson
|
||
|
and Schneier). All hashes use a single-purpose tag, e.g. the hash that
|
||
|
converts an encryption key into a storage index is defined as follows:
|
||
|
|
||
|
SI = SHA256d(netstring("allmydata_immutable_key_to_storage_index_v1") + key)
|
||
|
|
||
|
When two separate values need to be combined together in a hash, we wrap each
|
||
|
in a netstring.
|
||
|
|
||
|
Using SHA-256d (instead of plain SHA-256) guards against length-extension
|
||
|
attacks. Using the tag protects our Merkle trees against attacks in which the
|
||
|
hash of a leaf is confused with a hash of two children (allowing an attacker
|
||
|
to generate corrupted data that nevertheless appears to be valid), and is
|
||
|
simply good "cryptograhic hygiene". The "Chosen Protocol Attack" by Kelsey,
|
||
|
Schneier, and Wagner (http://www.schneier.com/paper-chosen-protocol.html) is
|
||
|
relevant. Putting the tag in a netstring guards against attacks that seek to
|
||
|
confuse the end of the tag with the beginning of the subsequent value.
|