2010-02-02 00:59:14 +00:00
= Performance costs for some common operations =

=== Publishing an A-byte immutable file ===

cost: O(A)

notes: An immutable file upload requires an additional I/O pass over the entire
source file before the upload process can start, since convergent
encryption derives the encryption key in part from the contents of the
source file.
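
The extra pass can be illustrated with a short Python sketch. It is a simplified stand-in, not Tahoe-LAFS's actual key derivation: the names derive_convergent_key and upload_passes, the plain SHA-256 hash, the 16-byte key length, and the convergence_secret parameter are all assumptions made for illustration. The point is only that the key cannot be known until every byte has been read once, so the upload pass has to read the file a second time.

```python
import hashlib
import io

CHUNK_SIZE = 64 * 1024  # read size for the hashing pass (arbitrary choice)


def derive_convergent_key(stream, convergence_secret=b""):
    """Pass 1: hash the whole plaintext to derive a 16-byte key.

    Simplified stand-in; not Tahoe-LAFS's actual derivation.
    """
    hasher = hashlib.sha256()
    hasher.update(convergence_secret)
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        hasher.update(chunk)
    return hasher.digest()[:16]


def upload_passes(stream):
    """Return the key and the number of bytes read by each pass."""
    key = derive_convergent_key(stream)
    bytes_hashed = stream.tell()
    stream.seek(0)  # pass 2 starts over: encrypt, encode, and upload
    bytes_uploaded = len(stream.read())
    return key, bytes_hashed, bytes_uploaded


key, hashed, uploaded = upload_passes(io.BytesIO(b"x" * 100_000))
print(len(key), hashed, uploaded)  # 16 100000 100000
```

Both passes read all A bytes, so the total stays O(A); the hashing pass only adds a constant factor to the I/O.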

=== Publishing an A-byte mutable file ===

cost: O(A) + a large constant for RSA + memory usage.

notes: Tahoe-LAFS generates a new RSA keypair for each mutable file that
it publishes to a grid. This takes 1 to 2 seconds on a typical desktop PC.

Part of the process of encrypting, encoding, and uploading a
mutable file to a Tahoe-LAFS grid requires that the entire file
be in memory at once. For larger files, this may cause
Tahoe-LAFS to have an unacceptably large memory footprint (at
least when uploading a mutable file).

=== Downloading B bytes of an A-byte immutable file ===

time/cost until the read is satisfied: variable; up to O(A).
cost of the entire operation: O(A) if the file isn't cached.

notes: When asked to read an arbitrary range of an immutable file,
Tahoe-LAFS will download from the beginning of the file up until
it has enough of the file to satisfy the requested read.
Depending on where in the file the requested range is, this can
mean that the entire file is downloaded before the request is
satisfied. Tahoe-LAFS will continue to download the rest of the
file even after the request is satisfied, so in any case where the
file actually has to be downloaded from the grid, reading part of an
immutable file will result in downloading all of the immutable
file. Ticket #798 is a proposal to change this behavior.

Tahoe-LAFS will cache files that are read in this manner for a
short while, so subsequent reads of the same file may be served
entirely from cache, depending on what part of the file they need
to read, what part of the file was read by previous reads, and
how much time has elapsed since the last read.
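
As a rough cost model of the uncached case, the following Python sketch counts the bytes fetched before a read is satisfied and over the whole operation. The function name immutable_read_cost and its parameters are hypothetical; the model ignores caching, encoding overhead, and segment boundaries.

```python
def immutable_read_cost(file_size, offset, length):
    """Model an uncached range read of an immutable file.

    Returns (bytes fetched before the read is satisfied,
             bytes fetched by the whole operation).
    """
    if offset < 0 or length < 0 or offset + length > file_size:
        raise ValueError("requested range lies outside the file")
    # The download starts at byte 0 and runs up to the end of the range.
    bytes_until_satisfied = offset + length
    # The rest of the file is still downloaded afterwards.
    bytes_total = file_size
    return bytes_until_satisfied, bytes_total


# Reading 4096 bytes at the start and at the end of a 1,000,000-byte file.
print(immutable_read_cost(1_000_000, 0, 4096))        # (4096, 1000000)
print(immutable_read_cost(1_000_000, 995_904, 4096))  # (1000000, 1000000)
```

In this model the position of the range changes only the latency of the read; the total transfer is A bytes either way.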

=== Downloading B bytes of an A-byte mutable file ===

cost: O(A)

notes: As currently implemented, mutable files must be downloaded in
their entirety before any part of them can be read. We are
exploring fixes for this; see ticket #393 for more information.

=== Modifying B bytes of an A-byte mutable file ===

cost: O(A)

notes: If you upload a changed version of a mutable file that you
earlier put onto your grid with, say, 'tahoe put --mutable',
Tahoe-LAFS will replace the old file with the new file on the
grid, rather than attempting to modify only those portions of the
file that have changed. Modifying a file in this manner is
essentially uploading the file over again, except that it re-uses
the existing RSA keypair instead of generating a new one.

=== Adding/Removing B bytes in an A-byte mutable file ===

cost: O(A)

notes: Modifying any part of a mutable file in Tahoe-LAFS requires that
the entire file be downloaded, modified, held in memory while it
is encrypted and encoded, and then re-uploaded. Note that this
sort of modification is mostly used internally for directories,
and isn't something that the WUI, CLI, or other interfaces will
do -- instead, they will simply overwrite the file to be
modified, as described in "Modifying B bytes of an A-byte mutable
file".
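
The download, modify, re-upload cycle described above can be sketched in a few lines of Python. This is an illustration only, not Tahoe-LAFS code: the dict named grid stands in for the storage grid, and splice_mutable_file, cap, and the other parameter names are hypothetical. Encryption and encoding are omitted.

```python
def splice_mutable_file(grid, cap, offset, remove_len, insert_bytes):
    """Replace remove_len bytes at offset with insert_bytes.

    Returns (bytes downloaded, bytes re-uploaded) to show that both
    transfers cover the whole file, however small the change is.
    """
    contents = grid[cap]  # download the entire file: O(A)
    # The whole file is held in memory while it is modified.
    modified = contents[:offset] + insert_bytes + contents[offset + remove_len:]
    grid[cap] = modified  # re-upload the entire file: O(A)
    return len(contents), len(modified)


grid = {"example-cap": b"hello world"}
print(splice_mutable_file(grid, "example-cap", 5, 0, b","))  # (11, 12)
print(grid["example-cap"])  # b'hello, world'
```

Inserting a single byte still moves all 11 bytes down and all 12 bytes back up, which is why the cost is O(A) rather than O(B).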

=== Adding an entry to an A-entry directory ===

cost: O(A) (roughly)

notes: In Tahoe-LAFS, directories are implemented as specialized mutable
files. So adding an entry to a directory is essentially adding B
(actually, 300-330) bytes somewhere in an existing mutable file.

=== Listing an A-entry directory ===

cost: O(A)

notes: Listing a directory requires that the mutable file storing the
directory be downloaded from the grid. So listing an A-entry
directory requires downloading a (roughly) 330 * A byte mutable
file, since each directory entry is about 300-330 bytes in size.
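
The arithmetic above is easy to turn into a quick estimate. The sketch below uses the 300-330 bytes per entry figure from the notes; the function name directory_size_range is hypothetical, and real entry sizes will vary with name length and other details.

```python
BYTES_PER_ENTRY_LOW = 300   # lower figure quoted in the notes
BYTES_PER_ENTRY_HIGH = 330  # upper figure quoted in the notes


def directory_size_range(entries):
    """Estimate the size in bytes of the mutable file backing a directory."""
    if entries < 0:
        raise ValueError("entries must be non-negative")
    return entries * BYTES_PER_ENTRY_LOW, entries * BYTES_PER_ENTRY_HIGH


print(directory_size_range(1000))  # (300000, 330000)
```

So listing a 1000-entry directory means downloading roughly 300 to 330 kB, even if only one entry is of interest.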

=== Checking an A-byte file ===

cost: variable; between O(N) and O(S), where N is the number of shares
generated when the file was initially uploaded, and S is the
number of servers on your grid.

notes: To check a file, Tahoe-LAFS queries the servers that it knows
about until it either runs out of servers, or finds all of the
shares that were originally uploaded. Note that neither of these
values depends directly on the size of the file. Checking is
relatively inexpensive, compared to the verify and repair
operations.
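
The stopping rule described above can be modeled with a small Python sketch. It is a simplification under stated assumptions: servers are queried one at a time in a fixed order, each server is represented as the set of share numbers it holds, and the name count_check_queries is hypothetical. It is not how the Tahoe-LAFS checker is implemented.

```python
def count_check_queries(servers, total_shares):
    """Query servers in order until every share is found or none are left.

    servers: list of sets, each holding the share numbers on one server.
    Returns (servers queried, distinct shares found).
    """
    found = set()
    queries = 0
    for held in servers:
        if len(found) >= total_shares:
            break  # all N shares located; stop early
        queries += 1
        found.update(held)
    return queries, len(found)


# A 20-server grid with 10 shares, one per server on the first 10 servers.
healthy = [{i} for i in range(10)] + [set() for _ in range(10)]
print(count_check_queries(healthy, 10))  # (10, 10)

# Same grid after the server holding share 3 lost it: every server is asked.
damaged = [set() if i == 3 else s for i, s in enumerate(healthy)]
print(count_check_queries(damaged, 10))  # (20, 9)
```

The healthy case stops after N queries and the damaged case runs through all S servers, which matches the O(N) to O(S) range given for the cost.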

=== Verifying an A-byte file ===

cost: O(A)

notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext
shares that were originally uploaded to the grid and integrity
checks them. This is, for well-behaved grids, likely to be more
expensive than downloading an A-byte file, since only a fraction
of these shares are necessary to recover the file.
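
To put a rough number on "more expensive", the sketch below compares the bytes fetched by a verify with those fetched by a plain download. It assumes k-of-N erasure coding in which each share is about A/k bytes, uses 3-of-10 as example parameters, and ignores hash-tree and other per-share overhead; the function name and parameters are hypothetical.

```python
def verify_vs_download_bytes(file_size, shares_needed=3, shares_total=10):
    """Return (bytes fetched to verify, bytes fetched to download).

    Assumes each share is about file_size / shares_needed bytes and
    ignores integrity metadata stored alongside the share data.
    """
    if not 0 < shares_needed <= shares_total:
        raise ValueError("need 0 < shares_needed <= shares_total")
    share_size = -(-file_size // shares_needed)  # ceiling division
    verify_bytes = share_size * shares_total     # every share is fetched
    download_bytes = share_size * shares_needed  # only k shares are fetched
    return verify_bytes, download_bytes


print(verify_vs_download_bytes(3_000_000))  # (10000000, 3000000)
```

Under these assumptions a verify moves about N/k times as much data as a download, which is still O(A) but with a larger constant.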

=== Repairing an A-byte file (mutable or immutable) ===

cost: variable; up to around O(A)

notes: To repair a file, Tahoe-LAFS generates and uploads missing shares
in the same way as when it initially uploads the file. So,
depending on how many shares are missing, this can be about as
expensive as uploading the file in the first place.