2010-02-01 16:59:14 -08:00
= Performance costs for some common operations =

1.  Publishing an A-byte immutable file
2.  Publishing an A-byte mutable file
3.  Downloading B bytes of an A-byte immutable file
4.  Downloading B bytes of an A-byte mutable file
5.  Modifying B bytes of an A-byte mutable file
6.  Inserting/Removing B bytes in an A-byte mutable file
7.  Adding an entry to an A-entry directory
8.  Listing an A-entry directory
9.  Performing a file-check on an A-byte file
10. Performing a file-verify on an A-byte file
11. Repairing an A-byte file (mutable or immutable)

== Publishing an A-byte immutable file ==

network: A
memory footprint: N/k*128KiB

notes: An immutable file upload requires an additional I/O pass over the entire
       source file before the upload process can start, since convergent
       encryption derives the encryption key in part from the contents of the
       source file.

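The extra I/O pass can be pictured with a minimal sketch. It is not
Tahoe-LAFS's actual key derivation: the function name convergent_key, the
convergence_secret parameter, the use of plain SHA-256, and the 16-byte key
length are all assumptions chosen for illustration. The point is only that
the key cannot be known until every byte of the source has been read once.

```python
import hashlib
import io

CHUNK_SIZE = 128 * 1024  # read in 128 KiB pieces so memory use stays bounded


def convergent_key(source, convergence_secret=b""):
    """Hash an entire binary stream into a 16-byte key (illustrative only)."""
    hasher = hashlib.sha256()
    # Hypothetical mixing step: a secret makes keys differ between users.
    hasher.update(convergence_secret)
    # This loop is the extra I/O pass: every byte is read before any upload.
    for chunk in iter(lambda: source.read(CHUNK_SIZE), b""):
        hasher.update(chunk)
    return hasher.digest()[:16]


# Identical contents give identical keys; any change gives a different key.
same_a = convergent_key(io.BytesIO(b"hello grid"))
same_b = convergent_key(io.BytesIO(b"hello grid"))
different = convergent_key(io.BytesIO(b"hello grid!"))
```

Because the stream is consumed in fixed-size chunks, this pass costs I/O
proportional to A but only a constant amount of memory.
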
== Publishing an A-byte mutable file ==

network: A
memory footprint: N/k*A

cpu: O(A) + a large constant for RSA keypair generation

notes: Tahoe-LAFS generates a new RSA keypair for each mutable file that
       it publishes to a grid. This takes up to 1 or 2 seconds on a
       typical desktop PC.

       Part of the process of encrypting, encoding, and uploading a
       mutable file to a Tahoe-LAFS grid requires that the entire file
       be in memory at once. For larger files, this may cause
       Tahoe-LAFS to have an unacceptably large memory footprint (at
       least when uploading a mutable file).

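To make the difference between the two publish footprints concrete, here is
a small sketch that evaluates N/k*128KiB (immutable) and N/k*A (mutable).
The function names are hypothetical, and the 3-of-10 encoding used in the
example is only an illustrative choice of k and N.

```python
KIB = 1024
SEGMENT_SIZE = 128 * KIB  # the 128KiB term in the immutable formula


def expansion_factor(k, n):
    """Return N/k, the ratio of encoded data to original data."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return n / k


def immutable_upload_footprint(k, n, segment_size=SEGMENT_SIZE):
    """N/k * 128KiB: independent of the file size A."""
    return expansion_factor(k, n) * segment_size


def mutable_upload_footprint(file_size, k, n):
    """N/k * A: grows linearly with the file size A."""
    return expansion_factor(k, n) * file_size


# Example with an assumed 3-of-10 encoding and a 1 GiB file.
one_gib = 1024 * 1024 * KIB
immutable_bytes = immutable_upload_footprint(3, 10)
mutable_bytes = mutable_upload_footprint(one_gib, 3, 10)
```

Under these assumptions the immutable footprint stays below half a mebibyte,
while the mutable footprint is more than three times the size of the file.
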
== Downloading B bytes of an A-byte immutable file ==

network: A
memory footprint: 128KiB

notes: When asked to read an arbitrary range of an immutable file,
       Tahoe-LAFS will download from the beginning of the file up until
       it has enough of the file to satisfy the requested read.
       Depending on where in the file the requested range is, this can
       mean that the entire file is downloaded before the request is
       satisfied. Tahoe-LAFS will continue to download the rest of the
       file even after the request is satisfied, so in any case where the
       file actually has to be downloaded from the grid, reading part of
       an immutable file will result in downloading all of the immutable
       file. Ticket #798 is a proposal to change this behavior.

       Tahoe-LAFS will cache files that are read in this manner for a
       short while, so subsequent reads of the same file may be served
       entirely from cache, depending on what part of the file they need
       to read, what part of the file was read by previous reads, and
       how much time has elapsed since the last read.

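The behavior described in these notes can be modeled with a short sketch.
It assumes the download proceeds in whole 128KiB segments from the start of
the file, and it introduces an offset parameter for where the requested
range begins; neither the function names nor that parameter come from
Tahoe-LAFS itself.

```python
import math

SEGMENT_SIZE = 128 * 1024  # assumed segment size, matching the 128KiB figure


def bytes_before_read_completes(file_size, offset, length,
                                segment_size=SEGMENT_SIZE):
    """Bytes fetched from the start of the file before the range is usable."""
    if offset < 0 or length < 0 or offset + length > file_size:
        raise ValueError("requested range lies outside the file")
    # Every segment up to the one holding the last requested byte is needed.
    segments_needed = math.ceil((offset + length) / segment_size)
    return min(file_size, segments_needed * segment_size)


def total_bytes_transferred(file_size):
    """The download continues to the end, so the network cost is always A."""
    return file_size
```

For a 1,000,000-byte file, a 10-byte read at offset 0 is satisfied after one
segment, whereas a read near the end of the file is satisfied only once the
whole file has arrived. In both cases the total transferred is A.
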
== Downloading B bytes of an A-byte mutable file ==

network: A
memory footprint: A

notes: As currently implemented, mutable files must be downloaded in
       their entirety before any part of them can be read. We are
       exploring fixes for this; see ticket #393 for more information.

== Modifying B bytes of an A-byte mutable file ==

network: A
memory footprint: N/k*A

notes: If you upload a changed version of a mutable file that you
       earlier put onto your grid with, say, 'tahoe put --mutable',
       Tahoe-LAFS will replace the old file with the new file on the
       grid, rather than attempting to modify only those portions of the
       file that have changed. Modifying a file in this manner is
       essentially uploading the file over again, except that it re-uses
       the existing RSA keypair instead of generating a new one.

== Inserting/Removing B bytes in an A-byte mutable file ==

network: A
memory footprint: N/k*A

notes: Modifying any part of a mutable file in Tahoe-LAFS requires that
       the entire file be downloaded, modified, held in memory while it is
       encrypted and encoded, and then re-uploaded. A future version of the
       mutable file layout ("LDMF") may provide efficient inserts and
       deletes. Note that this sort of modification is mostly used internally
       for directories, and isn't something that the WUI, CLI, or other
       interfaces will do -- instead, they will simply overwrite the file to
       be modified, as described in "Modifying B bytes of an A-byte mutable
       file".

== Adding an entry to an A-entry directory ==

network: O(A)
memory footprint: N/k*A

notes: In Tahoe-LAFS, directories are implemented as specialized mutable
       files. So adding an entry to a directory is essentially adding B
       (actually, 300-330) bytes somewhere in an existing mutable file.

== Listing an A-entry directory ==

network: O(A)
memory footprint: N/k*A

notes: Listing a directory requires that the mutable file storing the
       directory be downloaded from the grid. So listing an A-entry
       directory requires downloading a (roughly) 330 * A byte mutable
       file, since each directory entry is about 300-330 bytes in size.

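As a quick worked example of the 300-330 bytes-per-entry estimate, the
sketch below returns the lower and upper bounds on the size of the mutable
file backing a directory. The function and constant names are hypothetical.

```python
ENTRY_BYTES_LOW = 300   # lower per-entry estimate from the notes
ENTRY_BYTES_HIGH = 330  # upper per-entry estimate from the notes


def directory_size_range(entries):
    """Return (low, high) byte estimates for a directory with this many entries."""
    if entries < 0:
        raise ValueError("entry count cannot be negative")
    return entries * ENTRY_BYTES_LOW, entries * ENTRY_BYTES_HIGH


low, high = directory_size_range(1000)
```

With these figures a 1000-entry directory is a mutable file of roughly
300,000 to 330,000 bytes, all of which must be downloaded to list it.
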
== Performing a file-check on an A-byte file ==

network: O(S), where S is the number of servers on your grid
memory footprint: negligible

notes: To check a file, Tahoe-LAFS queries all the servers that it knows
       about. Note that neither the network cost nor the memory footprint
       depends directly on the size of the file. This is relatively
       inexpensive, compared to the verify and repair operations.

== Performing a file-verify on an A-byte file ==

network: N/k*A
memory footprint: N/k*128KiB

notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext
       shares that were originally uploaded to the grid and integrity
       checks them. This is, for well-behaved grids, likely to be more
       expensive than downloading an A-byte file, since only a fraction
       of these shares are necessary to recover the file.

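The gap between verifying and downloading follows directly from the N/k
factor. The sketch below computes it exactly with fractions; the function
names are hypothetical and the 3-of-10 encoding is only an example.

```python
from fractions import Fraction


def verify_traffic(file_size, k, n):
    """Bytes fetched when all N shares are downloaded: N/k * A."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return Fraction(n, k) * file_size


def verify_to_download_ratio(k, n):
    """How many times more data a verify moves than a plain download of A."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return Fraction(n, k)


ratio = verify_to_download_ratio(3, 10)
traffic = verify_traffic(3_000_000, 3, 10)
```

With the assumed 3-of-10 encoding, verifying a 3,000,000-byte file moves
10,000,000 bytes, a little over three times the cost of downloading it.
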
== Repairing an A-byte file (mutable or immutable) ==

network: variable; up to around O(A)
memory footprint: from 128KiB to (1+N/k)*128KiB

notes: To repair a file, Tahoe-LAFS downloads the file, and generates/uploads
       missing shares in the same way as when it initially uploads the file.
       So, depending on how many shares are missing, this can be about as
       expensive as initially uploading the file in the first place.
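
One rough way to see why the repair cost is variable is to add the download
of the file to the upload of the regenerated shares. The sketch below
assumes each share is about A/k bytes (which is what the N/k*A expansion
implies for N shares) and ignores protocol overhead; the function name and
the shares_missing parameter are hypothetical.

```python
def repair_traffic(file_size, k, n, shares_missing):
    """Estimate bytes moved by a repair: one download plus the missing shares."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    if not 1 <= shares_missing <= n - k:
        raise ValueError("need at least one missing share and k surviving shares")
    download = file_size                     # recover the file: A bytes
    upload = shares_missing * file_size / k  # assumed share size of A/k bytes
    return download + upload


light_repair = repair_traffic(3000, 3, 10, 1)
heavy_repair = repair_traffic(3000, 3, 10, 7)
```

Under these assumptions the worst case, with only k shares surviving, moves
N/k*A bytes in total, which matches the expansion factor that applies to an
initial upload.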