mirror of
https://github.com/tahoe-lafs/tahoe-lafs.git
synced 2025-04-26 05:49:44 +00:00
Update docs, notably performance.rst, to include MDMF. fixes #1772
commit 514fb096be (parent c1faaa2ca2)
docs
@@ -365,10 +365,13 @@ Client Configuration
 mutable-type parameter in the webapi. If you do not specify a value here,
 Tahoe-LAFS will use SDMF for all newly-created mutable files.

-Note that this parameter only applies to mutable files. Mutable
-directories, which are stored as mutable files, are not controlled by
-this parameter and will always use SDMF. We may revisit this decision in
-future versions of Tahoe-LAFS.
+Note that this parameter applies only to files, not to directories.
+Mutable directories, which are stored in mutable files, are not
+controlled by this parameter and will always use SDMF. We may revisit
+this decision in future versions of Tahoe-LAFS.
+
+See `<frontends/specifications/mutable.rst>`_ for details about mutable
+file formats.

 Frontend Configuration
 ======================
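The fallback rule in the hunk above (no value configured means SDMF) can be sketched as follows. This is a minimal illustration, not Tahoe-LAFS code: the helper name is hypothetical, and the ``[client]`` section and ``mutable.format`` key are assumed here because the hunk itself does not show the option name.

```python
import configparser


def default_mutable_format(cfg_text):
    """Return the default format for new mutable files: 'sdmf' or 'mdmf'.

    Assumption: the option lives in a [client] section under the key
    'mutable.format'. The hunk above does not show the key name.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # fallback covers both a missing section and a missing option,
    # matching "if you do not specify a value here, ... SDMF".
    value = parser.get("client", "mutable.format", fallback="sdmf")
    return value.strip().lower()


print(default_mutable_format("[client]\nmutable.format = mdmf\n"))
print(default_mutable_format("[node]\nnickname = example\n"))
```

Per the note in the hunk, whatever value is configured would apply only to files; mutable directories stay on SDMF.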
@@ -10,8 +10,8 @@ Performance costs for some common operations
 6. `Inserting/Removing B bytes in an A-byte mutable file`_
 7. `Adding an entry to an A-entry directory`_
 8. `Listing an A entry directory`_
-9. `Performing a file-check on an A-byte file`_
-10. `Performing a file-verify on an A-byte file`_
+9. `Checking an A-byte file`_
+10. `Verifying an A-byte file (immutable)`_
 11. `Repairing an A-byte file (mutable or immutable)`_

 ``K`` indicates the number of shares required to reconstruct the file
@@ -23,7 +23,7 @@ Performance costs for some common operations

 ``A`` indicates the number of bytes in a file

-``B`` indicates the number of bytes of a file which are being read or
+``B`` indicates the number of bytes of a file that are being read or
 written

 ``G`` indicates the number of storage servers on your grid
@@ -179,8 +179,8 @@ directory be downloaded from the grid. So listing an A entry
 directory requires downloading a (roughly) 330 * A byte mutable
 file, since each directory entry is about 300-330 bytes in size.

-Performing a file-check on an ``A``-byte file
-=============================================
+Checking an ``A``-byte file
+===========================

 cpu: ~G

@@ -193,8 +193,8 @@ about. Note that neither of these values directly depend on the size
 of the file. This is relatively inexpensive, compared to the verify
 and repair operations.

-Performing a file-verify on an ``A``-byte file
-==============================================
+Verifying an A-byte file (immutable)
+====================================

 cpu: ~N/K*A

@@ -204,9 +204,24 @@ memory footprint: N/K*S

 notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext
 shares that were originally uploaded to the grid and integrity checks
-them. This is (for well-behaved grids) more expensive than downloading
-an A-byte file, since only a fraction of these shares are necessary to
-recover the file.
+them. This is (for grids with good redundancy) more expensive than
+downloading an A-byte file, since only a fraction of these shares would
+be necessary to recover the file.

+Verifying an A-byte file (mutable)
+==================================
+
+cpu: ~N/K*A
+
+network: N/K*A
+
+memory footprint: N/K*A
+
+notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext
+shares that were originally uploaded to the grid and integrity checks
+them. This is (for grids with good redundancy) more expensive than
+downloading an A-byte file, since only a fraction of these shares would
+be necessary to recover the file.
+
 Repairing an ``A``-byte file (mutable or immutable)
 ===================================================
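To make the verification costs in the hunk above concrete, here is a small worked sketch. It assumes ``S`` is the segment size and defaults it to 128 KiB; the function name and the integer rounding are illustrative choices, not part of Tahoe-LAFS.

```python
def verify_costs(a_bytes, k, n, segment_size=128 * 1024, mutable=False):
    """Approximate byte counts for verifying an A-byte file.

    network: N/K*A for both mutable and immutable files.
    memory:  N/K*S for immutable files, N/K*A for mutable files,
             following the figures in the hunk above.
    Integer division is used, so results are rounded down.
    """
    if k <= 0 or n < k:
        raise ValueError("expected 0 < k <= n")
    network = a_bytes * n // k
    memory = (a_bytes if mutable else segment_size) * n // k
    return {"network_bytes": network, "memory_bytes": memory}


# 3-of-10 encoding, 3 MB file: all ten shares are fetched, so roughly
# 10 MB crosses the network, versus about 3 MB for a plain download.
print(verify_costs(3_000_000, k=3, n=10))
print(verify_costs(3_000_000, k=3, n=10, mutable=True))
```

The difference between the two memory figures is the point of the new section: a mutable verify is listed as holding on the order of the whole expanded file, while an immutable verify works one segment at a time.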
@@ -2,8 +2,6 @@
 Mutable Files
 =============

-This describes the "RSA-based mutable files" which were shipped in Tahoe v0.8.0.
-
 1. `Mutable Formats`_
 2. `Consistency vs. Availability`_
 3. `The Prime Coordination Directive: "Don't Do That"`_
@@ -19,33 +17,38 @@ This describes the "RSA-based mutable files" which were shipped in Tahoe v0.8.0.
 6. `Large Distributed Mutable Files`_
 7. `TODO`_

-Mutable File Slots are places with a stable identifier that can hold data
-that changes over time. In contrast to CHK slots, for which the
-URI/identifier is derived from the contents themselves, the Mutable File Slot
-URI remains fixed for the life of the slot, regardless of what data is placed
-inside it.
+Mutable files are places with a stable identifier that can hold data that
+changes over time. In contrast to immutable slots, for which the
+identifier/capability is derived from the contents themselves, the mutable
+file identifier remains fixed for the life of the slot, regardless of what
+data is placed inside it.

-Each mutable slot is referenced by two different URIs. The "read-write" URI
+Each mutable file is referenced by two different caps. The "read-write" cap
 grants read-write access to its holder, allowing them to put whatever
-contents they like into the slot. The "read-only" URI is less powerful, only
+contents they like into the slot. The "read-only" cap is less powerful, only
 granting read access, and not enabling modification of the data. The
-read-write URI can be turned into the read-only URI, but not the other way
+read-write cap can be turned into the read-only cap, but not the other way
 around.

-The data in these slots is distributed over a number of servers, using the
-same erasure coding that CHK files use, with 3-of-10 being a typical choice
-of encoding parameters. The data is encrypted and signed in such a way that
-only the holders of the read-write URI will be able to set the contents of
-the slot, and only the holders of the read-only URI will be able to read
-those contents. Holders of either URI will be able to validate the contents
-as being written by someone with the read-write URI. The servers who hold the
-shares cannot read or modify them: the worst they can do is deny service (by
-deleting or corrupting the shares), or attempt a rollback attack (which can
-only succeed with the cooperation of at least k servers).
+The data in these files is distributed over a number of servers, using the
+same erasure coding that immutable files use, with 3-of-10 being a typical
+choice of encoding parameters. The data is encrypted and signed in such a way
+that only the holders of the read-write cap will be able to set the contents
+of the slot, and only the holders of the read-only cap will be able to read
+those contents. Holders of either cap will be able to validate the contents
+as being written by someone with the read-write cap. The servers who hold the
+shares are not automatically given the ability to read or modify them: the worst
+they can do is deny service (by deleting or corrupting the shares), or
+attempt a rollback attack (which can only succeed with the cooperation of at
+least k servers).

+
 Mutable Formats
 ===============

+History
+-------
+
 When mutable files first shipped in Tahoe-0.8.0 (15-Feb-2008), the only
 version available was "SDMF", described below. This was a
 limited-functionality placeholder, intended to be replaced with
@@ -75,8 +78,11 @@ SDMF a clean subset of MDMF, where any single-segment MDMF file could be
 handled by the old SDMF code). In the fall of 2011, Kevan's code was finally
 integrated, and first made available in the Tahoe-1.9.0 release.

-The main improvement of MDMF is the use of multiple segments: individual
-128KiB sections of the file can be retrieved or modified independently. The
+SDMF vs. MDMF
+-------------
+
+The improvement of MDMF is the use of multiple segments: individual 128-KiB
+sections of the file can be retrieved or modified independently. The
 improvement can be seen when fetching just a portion of the file (using a
 Range: header on the webapi), or when modifying a portion (again with a
 Range: header). It can also be seen indirectly when fetching the whole file:
@@ -84,12 +90,14 @@ the first segment of data should be delivered faster from a large MDMF file
 than from an SDMF file, although the overall download will then proceed at
 the same rate.

-We've decided to make it opt-in for the first release while we shake out the
-bugs, just in case a problem is found which requires an incompatible format
-change. All new mutable files will be in SDMF format unless the user
-specifically chooses to use MDMF instead. The code can read and modify
-existing files of either format without user intervention. We expect to make
-MDMF the default in a subsequent release, perhaps 2.0.
+We've decided to make it opt-in for now: mutable files default to
+SDMF format unless explicitly configured to use MDMF, either in ``tahoe.cfg``
+(see `<configuration.rst>`__) or in the WUI or CLI command that created a
+new mutable file.
+
+The code can read and modify existing files of either format without user
+intervention. We expect to make MDMF the default in a subsequent release,
+perhaps 2.0.

 Which format should you use? SDMF works well for files up to a few MB, and
 can be handled by older versions (Tahoe-1.8.3 and earlier). If you do not
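The benefit of independent 128-KiB segments for ranged reads and writes, described in the "SDMF vs. MDMF" text above, comes down to simple index arithmetic. The sketch below is illustrative only: the helper name is hypothetical, and it ignores hashes, shares and encoding entirely.

```python
SEGMENT_SIZE = 128 * 1024  # 128 KiB, the segment size quoted in the text


def segments_for_range(offset, length, segment_size=SEGMENT_SIZE):
    """Return the indices of the segments that a byte range overlaps."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if length == 0:
        return range(0)
    first = offset // segment_size
    last = (offset + length - 1) // segment_size
    return range(first, last + 1)


# A 10-byte read at offset 300000 falls entirely inside segment 2,
# so only that one segment needs to be fetched.
print(list(segments_for_range(300000, 10)))
# A 2-byte read straddling the first boundary needs segments 0 and 1.
print(list(segments_for_range(131071, 2)))
```

With a single-segment format the same request would involve the whole file, which is why the text says the gain shows up for partial fetches and partial modifications.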
@@ -114,8 +122,9 @@ As we develop more sophisticated mutable slots, the API may expose multiple
 read versions to the application layer. The tahoe philosophy is to defer most
 consistency recovery logic to the higher layers. Some applications have
 effective ways to merge multiple versions, so inconsistency is not
-necessarily a problem (i.e. directory nodes can usually merge multiple "add
-child" operations).
+necessarily a problem (i.e. directory nodes can usually merge multiple
+"add child" operations).

+
 The Prime Coordination Directive: "Don't Do That"
 =================================================
@@ -697,38 +706,30 @@ Medium Distributed Mutable Files

 These are just like the SDMF case, but:

-* we actually take advantage of the Merkle hash tree over the blocks, by
+* We actually take advantage of the Merkle hash tree over the blocks, by
 reading a single segment of data at a time (and its necessary hashes), to
-reduce the read-time alacrity
-* we allow arbitrary writes to the file (i.e. seek() is provided, and
-O_TRUNC is no longer required)
-* we write more code on the client side (in the MutableFileNode class), to
-first read each segment that a write must modify. This looks exactly like
-the way a normal filesystem uses a block device, or how a CPU must perform
-a cache-line fill before modifying a single word.
-* we might implement some sort of copy-based atomic update server call,
+reduce the read-time alacrity.
+* We allow arbitrary writes to any range of the file.
+* We add more code to first read each segment that a write must modify.
+This looks exactly like the way a normal filesystem uses a block device,
+or how a CPU must perform a cache-line fill before modifying a single word.
+* We might implement some sort of copy-based atomic update server call,
 to allow multiple writev() calls to appear atomic to any readers.

 MDMF slots provide fairly efficient in-place edits of very large files (a few
-GB). Appending data is also fairly efficient, although each time a power of 2
-boundary is crossed, the entire file must effectively be re-uploaded (because
-the size of the block hash tree changes), so if the filesize is known in
-advance, that space ought to be pre-allocated (by leaving extra space between
-the block hash tree and the actual data).
+GB). Appending data is also fairly efficient.

-MDMF1 uses the Merkle tree to enable low-alacrity random-access reads. MDMF2
-adds cache-line reads to allow random-access writes.
+

 Large Distributed Mutable Files
 ===============================

-LDMF slots use a fundamentally different way to store the file, inspired by
-Mercurial's "revlog" format. They enable very efficient insert/remove/replace
-editing of arbitrary spans. Multiple versions of the file can be retained, in
-a revision graph that can have multiple heads. Each revision can be
-referenced by a cryptographic identifier. There are two forms of the URI, one
-that means "most recent version", and a longer one that points to a specific
-revision.
+LDMF slots (not implemented) would use a fundamentally different way to store
+the file, inspired by Mercurial's "revlog" format. This would enable very
+efficient insert/remove/replace editing of arbitrary spans. Multiple versions
+of the file can be retained, in a revision graph that can have multiple heads.
+Each revision can be referenced by a cryptographic identifier. There are two
+forms of the URI, one that means "most recent version", and a longer one that
+points to a specific revision.

 Metadata can be attached to the revisions, like timestamps, to enable rolling
 back an entire tree to a specific point in history.
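The "first read each segment that a write must modify" step in the MDMF list above is the familiar read-modify-write pattern. The following is a toy in-memory sketch of that pattern under simplifying assumptions: segments are plain ``bytes`` objects in a list, the write stays within the existing file length, and the function name is hypothetical. It does not model shares, hashes, signatures or server calls.

```python
def overwrite_range(segments, offset, data, segment_size):
    """Overwrite bytes in place, touching only the affected segments.

    segments: list of bytes objects, each segment_size long except
    possibly the last. Returns the indices of the rewritten segments.
    """
    total = sum(len(s) for s in segments)
    if offset < 0 or offset + len(data) > total:
        raise ValueError("write must stay within the existing file")
    touched = []
    written = 0
    while written < len(data):
        index, start = divmod(offset + written, segment_size)
        buf = bytearray(segments[index])        # read the whole segment
        chunk = data[written:written + segment_size - start]
        buf[start:start + len(chunk)] = chunk   # modify part of it
        segments[index] = bytes(buf)            # write the segment back
        touched.append(index)
        written += len(chunk)
    return touched


segs = [b"aaaa", b"bbbb", b"cc"]
print(overwrite_range(segs, 3, b"XYZ", 4), segs)
```

Only segments 0 and 1 are rewritten in the usage line; segment 2 is never read, which mirrors the block-device and cache-line analogy in the list.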
@@ -736,6 +737,7 @@ back an entire tree to a specific point in history.
 LDMF1 provides deltas but tries to avoid dealing with multiple heads. LDMF2
 provides explicit support for revision identifiers and branching.

+
 TODO
 ====
