Update docs, notably performance.rst, to include MDMF. fixes #1772

david-sarah 2012-06-23 23:13:38 +00:00
parent c1faaa2ca2
commit 514fb096be
3 changed files with 87 additions and 67 deletions

View File

@@ -365,10 +365,13 @@ Client Configuration
 mutable-type parameter in the webapi. If you do not specify a value here,
 Tahoe-LAFS will use SDMF for all newly-created mutable files.

-Note that this parameter only applies to mutable files. Mutable
-directories, which are stored as mutable files, are not controlled by
-this parameter and will always use SDMF. We may revisit this decision in
-future versions of Tahoe-LAFS.
+Note that this parameter applies only to files, not to directories.
+Mutable directories, which are stored in mutable files, are not
+controlled by this parameter and will always use SDMF. We may revisit
+this decision in future versions of Tahoe-LAFS.
+
+See `<frontends/specifications/mutable.rst>`_ for details about mutable
+file formats.

 Frontend Configuration
 ======================
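For illustration, a minimal sketch of how the ``mutable-type`` parameter mentioned above can be supplied per-request through the webapi when creating a mutable file. It assumes a local gateway listening on the default webapi port (3456); the helper name is made up, and the snippet is an illustrative sketch rather than code from the Tahoe-LAFS source::

    # Hypothetical sketch: create a mutable file via the webapi, selecting
    # its format with the mutable-type query parameter described above.
    import urllib.request

    def create_mutable_file(data, fmt="sdmf"):
        url = "http://127.0.0.1:3456/uri?mutable=true&mutable-type=" + fmt
        req = urllib.request.Request(url, data=data, method="PUT")
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode("ascii")   # write cap of the new file

    sdmf_cap = create_mutable_file(b"small document")           # SDMF (default)
    mdmf_cap = create_mutable_file(b"large document", "mdmf")   # MDMF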

View File

@@ -10,8 +10,8 @@ Performance costs for some common operations
 6. `Inserting/Removing B bytes in an A-byte mutable file`_
 7. `Adding an entry to an A-entry directory`_
 8. `Listing an A entry directory`_
-9. `Performing a file-check on an A-byte file`_
-10. `Performing a file-verify on an A-byte file`_
+9. `Checking an A-byte file`_
+10. `Verifying an A-byte file (immutable)`_
 11. `Repairing an A-byte file (mutable or immutable)`_

 ``K`` indicates the number of shares required to reconstruct the file
@@ -23,7 +23,7 @@ Performance costs for some common operations
 ``A`` indicates the number of bytes in a file

-``B`` indicates the number of bytes of a file which are being read or
+``B`` indicates the number of bytes of a file that are being read or
 written

 ``G`` indicates the number of storage servers on your grid
@@ -179,8 +179,8 @@ directory be downloaded from the grid. So listing an A entry
 directory requires downloading a (roughly) 330 * A byte mutable
 file, since each directory entry is about 300-330 bytes in size.

-Performing a file-check on an ``A``-byte file
-=============================================
+Checking an ``A``-byte file
+===========================

 cpu: ~G
@@ -193,8 +193,8 @@ about. Note that neither of these values directly depend on the size
 of the file. This is relatively inexpensive, compared to the verify
 and repair operations.

-Performing a file-verify on an ``A``-byte file
-==============================================
+Verifying an A-byte file (immutable)
+====================================

 cpu: ~N/K*A
@@ -204,9 +204,24 @@ memory footprint: N/K*S

 notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext
 shares that were originally uploaded to the grid and integrity checks
-them. This is (for well-behaved grids) more expensive than downloading
-an A-byte file, since only a fraction of these shares are necessary to
-recover the file.
+them. This is (for grids with good redundancy) more expensive than
+downloading an A-byte file, since only a fraction of these shares would
+be necessary to recover the file.
+
+Verifying an A-byte file (mutable)
+==================================
+
+cpu: ~N/K*A
+
+network: N/K*A
+
+memory footprint: N/K*A
+
+notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext
+shares that were originally uploaded to the grid and integrity checks
+them. This is (for grids with good redundancy) more expensive than
+downloading an A-byte file, since only a fraction of these shares would
+be necessary to recover the file.
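To make the ``~N/K*A`` figures above concrete, here is a small worked example with the default 3-of-10 erasure coding (illustrative numbers only)::

    # Verify cost vs. plain download cost, assuming K=3, N=10 (the defaults).
    K, N = 3, 10
    A = 10 * 10**6                 # a 10 MB file

    verify_bytes = N / K * A       # every share is downloaded and checked
    download_bytes = A             # a plain read needs only K shares' worth

    print(verify_bytes / 10**6)    # ~33.3 MB of ciphertext shares to verify
    print(download_bytes / 10**6)  # 10.0 MB to merely read the file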

 Repairing an ``A``-byte file (mutable or immutable)
 ===================================================

View File

@@ -2,8 +2,6 @@
 Mutable Files
 =============

-This describes the "RSA-based mutable files" which were shipped in Tahoe v0.8.0.
-
 1. `Mutable Formats`_
 2. `Consistency vs. Availability`_
 3. `The Prime Coordination Directive: "Don't Do That"`_
@@ -19,33 +17,38 @@ This describes the "RSA-based mutable files" which were shipped in Tahoe v0.8.0.
 6. `Large Distributed Mutable Files`_
 7. `TODO`_

-Mutable File Slots are places with a stable identifier that can hold data
-that changes over time. In contrast to CHK slots, for which the
-URI/identifier is derived from the contents themselves, the Mutable File Slot
-URI remains fixed for the life of the slot, regardless of what data is placed
-inside it.
+Mutable files are places with a stable identifier that can hold data that
+changes over time. In contrast to immutable slots, for which the
+identifier/capability is derived from the contents themselves, the mutable
+file identifier remains fixed for the life of the slot, regardless of what
+data is placed inside it.

-Each mutable slot is referenced by two different URIs. The "read-write" URI
+Each mutable file is referenced by two different caps. The "read-write" cap
 grants read-write access to its holder, allowing them to put whatever
-contents they like into the slot. The "read-only" URI is less powerful, only
+contents they like into the slot. The "read-only" cap is less powerful, only
 granting read access, and not enabling modification of the data. The
-read-write URI can be turned into the read-only URI, but not the other way
+read-write cap can be turned into the read-only cap, but not the other way
 around.

-The data in these slots is distributed over a number of servers, using the
-same erasure coding that CHK files use, with 3-of-10 being a typical choice
-of encoding parameters. The data is encrypted and signed in such a way that
-only the holders of the read-write URI will be able to set the contents of
-the slot, and only the holders of the read-only URI will be able to read
-those contents. Holders of either URI will be able to validate the contents
-as being written by someone with the read-write URI. The servers who hold the
-shares cannot read or modify them: the worst they can do is deny service (by
-deleting or corrupting the shares), or attempt a rollback attack (which can
-only succeed with the cooperation of at least k servers).
+The data in these files is distributed over a number of servers, using the
+same erasure coding that immutable files use, with 3-of-10 being a typical
+choice of encoding parameters. The data is encrypted and signed in such a way
+that only the holders of the read-write cap will be able to set the contents
+of the slot, and only the holders of the read-only cap will be able to read
+those contents. Holders of either cap will be able to validate the contents
+as being written by someone with the read-write cap. The servers who hold the
+shares are not automatically given the ability to read or modify them: the
+worst they can do is deny service (by deleting or corrupting the shares), or
+attempt a rollback attack (which can only succeed with the cooperation of at
+least k servers).

 Mutable Formats
 ===============

+History
+-------
+
 When mutable files first shipped in Tahoe-0.8.0 (15-Feb-2008), the only
 version available was "SDMF", described below. This was a
 limited-functionality placeholder, intended to be replaced with
@@ -75,8 +78,11 @@ SDMF a clean subset of MDMF, where any single-segment MDMF file could be
 handled by the old SDMF code). In the fall of 2011, Kevan's code was finally
 integrated, and first made available in the Tahoe-1.9.0 release.

-The main improvement of MDMF is the use of multiple segments: individual
-128KiB sections of the file can be retrieved or modified independently. The
+SDMF vs. MDMF
+-------------
+
+The improvement of MDMF is the use of multiple segments: individual 128-KiB
+sections of the file can be retrieved or modified independently. The
 improvement can be seen when fetching just a portion of the file (using a
 Range: header on the webapi), or when modifying a portion (again with a
 Range: header). It can also be seen indirectly when fetching the whole file:
@@ -84,12 +90,14 @@ the first segment of data should be delivered faster from a large MDMF file
 than from an SDMF file, although the overall download will then proceed at
 the same rate.

-We've decided to make it opt-in for the first release while we shake out the
-bugs, just in case a problem is found which requires an incompatible format
-change. All new mutable files will be in SDMF format unless the user
-specifically chooses to use MDMF instead. The code can read and modify
-existing files of either format without user intervention. We expect to make
-MDMF the default in a subsequent release, perhaps 2.0.
+We've decided to make it opt-in for now: mutable files default to
+SDMF format unless explicitly configured to use MDMF, either in ``tahoe.cfg``
+(see `<configuration.rst>`__) or in the WUI or CLI command that creates a
+new mutable file.
+
+The code can read and modify existing files of either format without user
+intervention. We expect to make MDMF the default in a subsequent release,
+perhaps 2.0.

 Which format should you use? SDMF works well for files up to a few MB, and
 can be handled by older versions (Tahoe-1.8.3 and earlier). If you do not
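For illustration, a minimal sketch of the partial read with a Range: header described above, which is where MDMF's segment-at-a-time retrieval pays off. It assumes a local gateway on the default webapi port (3456); the cap is a placeholder and the snippet is not taken from the Tahoe-LAFS source::

    # Hypothetical sketch: fetch only the first 128 KiB of a file through
    # the webapi, rather than downloading the whole thing.
    import urllib.request

    cap = "URI:MDMF:..."           # placeholder for a real read- or write-cap
    url = "http://127.0.0.1:3456/uri/" + cap
    req = urllib.request.Request(url, headers={"Range": "bytes=0-131071"})
    with urllib.request.urlopen(req) as resp:
        first_segment = resp.read()   # only the first 128 KiB is transferred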
@@ -114,8 +122,9 @@ As we develop more sophisticated mutable slots, the API may expose multiple
 read versions to the application layer. The tahoe philosophy is to defer most
 consistency recovery logic to the higher layers. Some applications have
 effective ways to merge multiple versions, so inconsistency is not
-necessarily a problem (i.e. directory nodes can usually merge multiple "add
-child" operations).
+necessarily a problem (i.e. directory nodes can usually merge multiple
+"add child" operations).

 The Prime Coordination Directive: "Don't Do That"
 =================================================
@@ -697,38 +706,30 @@ Medium Distributed Mutable Files

 These are just like the SDMF case, but:

-* we actually take advantage of the Merkle hash tree over the blocks, by
-  reading a single segment of data at a time (and its necessary hashes), to
-  reduce the read-time alacrity
-* we allow arbitrary writes to the file (i.e. seek() is provided, and
-  O_TRUNC is no longer required)
-* we write more code on the client side (in the MutableFileNode class), to
-  first read each segment that a write must modify. This looks exactly like
-  the way a normal filesystem uses a block device, or how a CPU must perform
-  a cache-line fill before modifying a single word.
-* we might implement some sort of copy-based atomic update server call,
-  to allow multiple writev() calls to appear atomic to any readers.
+* We actually take advantage of the Merkle hash tree over the blocks, by
+  reading a single segment of data at a time (and its necessary hashes), to
+  reduce the read-time alacrity.
+* We allow arbitrary writes to any range of the file.
+* We add more code to first read each segment that a write must modify.
+  This looks exactly like the way a normal filesystem uses a block device,
+  or how a CPU must perform a cache-line fill before modifying a single
+  word (a sketch of this follows the list).
+* We might implement some sort of copy-based atomic update server call,
+  to allow multiple writev() calls to appear atomic to any readers.
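For illustration, a toy model of the read-modify-write step in the list above, not Tahoe's actual ``MutableFileNode`` code; ``read_segment`` and ``write_segment`` are hypothetical stand-ins for downloading and uploading one segment, and the 128-KiB segment size is the one mentioned earlier::

    SEGMENT_SIZE = 128 * 1024      # MDMF segment size discussed above

    def affected_segments(offset, length):
        # Which segments does a write of `length` bytes at `offset` touch?
        first = offset // SEGMENT_SIZE
        last = (offset + length - 1) // SEGMENT_SIZE
        return range(first, last + 1)

    def write_range(read_segment, write_segment, offset, data):
        # Read-modify-write one segment at a time (the "cache-line fill").
        # For simplicity, assume the write stays inside the current file size.
        for seg in affected_segments(offset, len(data)):
            seg_start = seg * SEGMENT_SIZE
            buf = bytearray(read_segment(seg))           # fetch only this segment
            lo = max(offset, seg_start)
            hi = min(offset + len(data), seg_start + SEGMENT_SIZE)
            buf[lo - seg_start:hi - seg_start] = data[lo - offset:hi - offset]
            write_segment(seg, bytes(buf))               # upload only this segment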

 MDMF slots provide fairly efficient in-place edits of very large files (a few
-GB). Appending data is also fairly efficient, although each time a power of 2
-boundary is crossed, the entire file must effectively be re-uploaded (because
-the size of the block hash tree changes), so if the filesize is known in
-advance, that space ought to be pre-allocated (by leaving extra space between
-the block hash tree and the actual data).
-
-MDMF1 uses the Merkle tree to enable low-alacrity random-access reads. MDMF2
-adds cache-line reads to allow random-access writes.
+GB). Appending data is also fairly efficient.

 Large Distributed Mutable Files
 ===============================

-LDMF slots use a fundamentally different way to store the file, inspired by
-Mercurial's "revlog" format. They enable very efficient insert/remove/replace
-editing of arbitrary spans. Multiple versions of the file can be retained, in
-a revision graph that can have multiple heads. Each revision can be
-referenced by a cryptographic identifier. There are two forms of the URI, one
-that means "most recent version", and a longer one that points to a specific
-revision.
+LDMF slots (not implemented) would use a fundamentally different way to store
+the file, inspired by Mercurial's "revlog" format. This would enable very
+efficient insert/remove/replace editing of arbitrary spans. Multiple versions
+of the file can be retained, in a revision graph that can have multiple heads.
+Each revision can be referenced by a cryptographic identifier. There are two
+forms of the URI, one that means "most recent version", and a longer one that
+points to a specific revision.

 Metadata can be attached to the revisions, like timestamps, to enable rolling
 back an entire tree to a specific point in history.
@@ -736,6 +737,7 @@ back an entire tree to a specific point in history.
 LDMF1 provides deltas but tries to avoid dealing with multiple heads. LDMF2
 provides explicit support for revision identifiers and branching.

 TODO
 ====