mirror of
https://github.com/tahoe-lafs/tahoe-lafs.git
synced 2025-01-19 03:06:33 +00:00
c4f8376a20
80-cols, remove spurious whitespace. Add rst2html.py rule to Makefile.
298 lines
14 KiB
ReStructuredText
298 lines
14 KiB
ReStructuredText
===========================
|
|
Garbage Collection in Tahoe
|
|
===========================
|
|
|
|
1. `Overview`_
|
|
2. `Client-side Renewal`_
|
|
3. `Server Side Expiration`_
|
|
4. `Expiration Progress`_
|
|
5. `Future Directions`_
|
|
|
|
Overview
|
|
========
|
|
|
|
When a file or directory in the virtual filesystem is no longer referenced,
|
|
the space that its shares occupied on each storage server can be freed,
|
|
making room for other shares. Tahoe currently uses a garbage collection
|
|
("GC") mechanism to implement this space-reclamation process. Each share has
|
|
one or more "leases", which are managed by clients who want the
|
|
file/directory to be retained. The storage server accepts each share for a
|
|
pre-defined period of time, and is allowed to delete the share if all of the
|
|
leases are cancelled or allowed to expire.
|
|
|
|
Garbage collection is not enabled by default: storage servers will not delete
|
|
shares without being explicitly configured to do so. When GC is enabled,
|
|
clients are responsible for renewing their leases on a periodic basis at
|
|
least frequently enough to prevent any of the leases from expiring before the
|
|
next renewal pass.
|
|
|
|
There are several tradeoffs to be considered when choosing the renewal timer
|
|
and the lease duration, and there is no single optimal pair of values. See
|
|
the "lease-tradeoffs.svg" diagram to get an idea for the tradeoffs involved.
|
|
If lease renewal occurs quickly and with 100% reliability, than any renewal
|
|
time that is shorter than the lease duration will suffice, but a larger ratio
|
|
of duration-over-renewal-time will be more robust in the face of occasional
|
|
delays or failures.
|
|
|
|
The current recommended values for a small Tahoe grid are to renew the leases
|
|
once a week, and to give each lease a duration of 31 days. Renewing leases
|
|
can be expected to take about one second per file/directory, depending upon
|
|
the number of servers and the network speeds involved. Note that in the
|
|
current release, the server code enforces a 31 day lease duration: there is
|
|
not yet a way for the client to request a different duration (however the
|
|
server can use the "expire.override_lease_duration" configuration setting to
|
|
increase or decrease the effective duration to something other than 31 days).
|
|
|
|
Client-side Renewal
|
|
===================
|
|
|
|
If all of the files and directories which you care about are reachable from a
|
|
single starting point (usually referred to as a "rootcap"), and you store
|
|
that rootcap as an alias (via "tahoe create-alias"), then the simplest way to
|
|
renew these leases is with the following CLI command::
|
|
|
|
tahoe deep-check --add-lease ALIAS:
|
|
|
|
This will recursively walk every directory under the given alias and renew
|
|
the leases on all files and directories. (You may want to add a ``--repair``
|
|
flag to perform repair at the same time). Simply run this command once a week
|
|
(or whatever other renewal period your grid recommends) and make sure it
|
|
completes successfully. As a side effect, a manifest of all unique files and
|
|
directories will be emitted to stdout, as well as a summary of file sizes and
|
|
counts. It may be useful to track these statistics over time.
|
|
|
|
Note that newly uploaded files (and newly created directories) get an initial
|
|
lease too: the ``--add-lease`` process is only needed to ensure that all
|
|
older objects have up-to-date leases on them.
|
|
|
|
For larger systems (such as a commercial grid), a separate "maintenance
|
|
daemon" is under development. This daemon will acquire manifests from
|
|
rootcaps on a periodic basis, keep track of checker results, manage
|
|
lease-addition, and prioritize repair needs, using multiple worker nodes to
|
|
perform these jobs in parallel. Eventually, this daemon will be made
|
|
appropriate for use by individual users as well, and may be incorporated
|
|
directly into the client node.
|
|
|
|
Server Side Expiration
|
|
======================
|
|
|
|
Expiration must be explicitly enabled on each storage server, since the
|
|
default behavior is to never expire shares. Expiration is enabled by adding
|
|
config keys to the "[storage]" section of the tahoe.cfg file (as described
|
|
below) and restarting the server node.
|
|
|
|
Each lease has two parameters: a create/renew timestamp and a duration. The
|
|
timestamp is updated when the share is first uploaded (i.e. the file or
|
|
directory is created), and updated again each time the lease is renewed (i.e.
|
|
"``tahoe check --add-lease``" is performed). The duration is currently fixed
|
|
at 31 days, and the "nominal lease expiration time" is simply $duration
|
|
seconds after the $create_renew timestamp. (In a future release of Tahoe, the
|
|
client will get to request a specific duration, and the server will accept or
|
|
reject the request depending upon its local configuration, so that servers
|
|
can achieve better control over their storage obligations).
|
|
|
|
The lease-expiration code has two modes of operation. The first is age-based:
|
|
leases are expired when their age is greater than their duration. This is the
|
|
preferred mode: as long as clients consistently update their leases on a
|
|
periodic basis, and that period is shorter than the lease duration, then all
|
|
active files and directories will be preserved, and the garbage will
|
|
collected in a timely fashion.
|
|
|
|
Since there is not yet a way for clients to request a lease duration of other
|
|
than 31 days, there is a tahoe.cfg setting to override the duration of all
|
|
leases. If, for example, this alternative duration is set to 60 days, then
|
|
clients could safely renew their leases with an add-lease operation perhaps
|
|
once every 50 days: even though nominally their leases would expire 31 days
|
|
after the renewal, the server would not actually expire the leases until 60
|
|
days after renewal.
|
|
|
|
The other mode is an absolute-date-cutoff: it compares the create/renew
|
|
timestamp against some absolute date, and expires any lease which was not
|
|
created or renewed since the cutoff date. If all clients have performed an
|
|
add-lease some time after March 20th, you could tell the storage server to
|
|
expire all leases that were created or last renewed on March 19th or earlier.
|
|
This is most useful if you have a manual (non-periodic) add-lease process.
|
|
Note that there is not much point to running a storage server in this mode
|
|
for a long period of time: once the lease-checker has examined all shares and
|
|
expired whatever it is going to expire, the second and subsequent passes are
|
|
not going to find any new leases to remove.
|
|
|
|
The tahoe.cfg file uses the following keys to control lease expiration::
|
|
|
|
[storage]
|
|
|
|
expire.enabled = (boolean, optional)
|
|
|
|
If this is True, the storage server will delete shares on which all
|
|
leases have expired. Other controls dictate when leases are considered to
|
|
have expired. The default is False.
|
|
|
|
expire.mode = (string, "age" or "cutoff-date", required if expiration enabled)
|
|
|
|
If this string is "age", the age-based expiration scheme is used, and the
|
|
"expire.override_lease_duration" setting can be provided to influence the
|
|
lease ages. If it is "cutoff-date", the absolute-date-cutoff mode is
|
|
used, and the "expire.cutoff_date" setting must be provided to specify
|
|
the cutoff date. The mode setting currently has no default: you must
|
|
provide a value.
|
|
|
|
In a future release, this setting is likely to default to "age", but in
|
|
this release it was deemed safer to require an explicit mode
|
|
specification.
|
|
|
|
expire.override_lease_duration = (duration string, optional)
|
|
|
|
When age-based expiration is in use, a lease will be expired if its
|
|
"lease.create_renew" timestamp plus its "lease.duration" time is
|
|
earlier/older than the current time. This key, if present, overrides the
|
|
duration value for all leases, changing the algorithm from:
|
|
|
|
if (lease.create_renew_timestamp + lease.duration) < now:
|
|
expire_lease()
|
|
|
|
to:
|
|
|
|
if (lease.create_renew_timestamp + override_lease_duration) < now:
|
|
expire_lease()
|
|
|
|
The value of this setting is a "duration string", which is a number of
|
|
days, months, or years, followed by a units suffix, and optionally
|
|
separated by a space, such as one of the following:
|
|
|
|
7days
|
|
31day
|
|
60 days
|
|
2mo
|
|
3 month
|
|
12 months
|
|
2years
|
|
|
|
This key is meant to compensate for the fact that clients do not yet have
|
|
the ability to ask for leases that last longer than 31 days. A grid which
|
|
wants to use faster or slower GC than a 31-day lease timer permits can
|
|
use this parameter to implement it. The current fixed 31-day lease
|
|
duration makes the server behave as if "lease.override_lease_duration =
|
|
31days" had been passed.
|
|
|
|
This key is only valid when age-based expiration is in use (i.e. when
|
|
"expire.mode = age" is used). It will be rejected if cutoff-date
|
|
expiration is in use.
|
|
|
|
expire.cutoff_date = (date string, required if mode=cutoff-date)
|
|
|
|
When cutoff-date expiration is in use, a lease will be expired if its
|
|
create/renew timestamp is older than the cutoff date. This string will be
|
|
a date in the following format:
|
|
|
|
2009-01-16 (January 16th, 2009)
|
|
2008-02-02
|
|
2007-12-25
|
|
|
|
The actual cutoff time shall be midnight UTC at the beginning of the
|
|
given day. Lease timers should naturally be generous enough to not depend
|
|
upon differences in timezone: there should be at least a few days between
|
|
the last renewal time and the cutoff date.
|
|
|
|
This key is only valid when cutoff-based expiration is in use (i.e. when
|
|
"expire.mode = cutoff-date"). It will be rejected if age-based expiration
|
|
is in use.
|
|
|
|
expire.immutable = (boolean, optional)
|
|
|
|
If this is False, then immutable shares will never be deleted, even if
|
|
their leases have expired. This can be used in special situations to
|
|
perform GC on mutable files but not immutable ones. The default is True.
|
|
|
|
expire.mutable = (boolean, optional)
|
|
|
|
If this is False, then mutable shares will never be deleted, even if
|
|
their leases have expired. This can be used in special situations to
|
|
perform GC on immutable files but not mutable ones. The default is True.
|
|
|
|
Expiration Progress
|
|
===================
|
|
|
|
In the current release, leases are stored as metadata in each share file, and
|
|
no separate database is maintained. As a result, checking and expiring leases
|
|
on a large server may require multiple reads from each of several million
|
|
share files. This process can take a long time and be very disk-intensive, so
|
|
a "share crawler" is used. The crawler limits the amount of time looking at
|
|
shares to a reasonable percentage of the storage server's overall usage: by
|
|
default it uses no more than 10% CPU, and yields to other code after 100ms. A
|
|
typical server with 1.1M shares was observed to take 3.5 days to perform this
|
|
rate-limited crawl through the whole set of shares, with expiration disabled.
|
|
It is expected to take perhaps 4 or 5 days to do the crawl with expiration
|
|
turned on.
|
|
|
|
The crawler's status is displayed on the "Storage Server Status Page", a web
|
|
page dedicated to the storage server. This page resides at $NODEURL/storage,
|
|
and there is a link to it from the front "welcome" page. The "Lease
|
|
Expiration crawler" section of the status page shows the progress of the
|
|
current crawler cycle, expected completion time, amount of space recovered,
|
|
and details of how many shares have been examined.
|
|
|
|
The crawler's state is persistent: restarting the node will not cause it to
|
|
lose significant progress. The state file is located in two files
|
|
($BASEDIR/storage/lease_checker.state and lease_checker.history), and the
|
|
crawler can be forcibly reset by stopping the node, deleting these two files,
|
|
then restarting the node.
|
|
|
|
Future Directions
|
|
=================
|
|
|
|
Tahoe's GC mechanism is undergoing significant changes. The global
|
|
mark-and-sweep garbage-collection scheme can require considerable network
|
|
traffic for large grids, interfering with the bandwidth available for regular
|
|
uploads and downloads (and for non-Tahoe users of the network).
|
|
|
|
A preferable method might be to have a timer-per-client instead of a
|
|
timer-per-lease: the leases would not be expired until/unless the client had
|
|
not checked in with the server for a pre-determined duration. This would
|
|
reduce the network traffic considerably (one message per week instead of
|
|
thousands), but retain the same general failure characteristics.
|
|
|
|
In addition, using timers is not fail-safe (from the client's point of view),
|
|
in that a client which leaves the network for an extended period of time may
|
|
return to discover that all of their files have been garbage-collected. (It
|
|
*is* fail-safe from the server's point of view, in that a server is not
|
|
obligated to provide disk space in perpetuity to an unresponsive client). It
|
|
may be useful to create a "renewal agent" to which a client can pass a list
|
|
of renewal-caps: the agent then takes the responsibility for keeping these
|
|
leases renewed, so the client can go offline safely. Of course, this requires
|
|
a certain amount of coordination: the renewal agent should not be keeping
|
|
files alive that the client has actually deleted. The client can send the
|
|
renewal-agent a manifest of renewal caps, and each new manifest should
|
|
replace the previous set.
|
|
|
|
The GC mechanism is also not immediate: a client which deletes a file will
|
|
nevertheless be consuming extra disk space (and might be charged or otherwise
|
|
held accountable for it) until the ex-file's leases finally expire on their
|
|
own. If the client is certain that they've removed their last reference to
|
|
the file, they could accelerate the GC process by cancelling their lease. The
|
|
current storage server API provides a method to cancel a lease, but the
|
|
client must be careful to coordinate with anyone else who might be
|
|
referencing the same lease (perhaps a second directory in the same virtual
|
|
drive), otherwise they might accidentally remove a lease that should have
|
|
been retained.
|
|
|
|
In the current release, these leases are each associated with a single "node
|
|
secret" (stored in $BASEDIR/private/secret), which is used to generate
|
|
renewal- and cancel- secrets for each lease. Two nodes with different secrets
|
|
will produce separate leases, and will not be able to renew or cancel each
|
|
others' leases.
|
|
|
|
Once the Accounting project is in place, leases will be scoped by a
|
|
sub-delegatable "account id" instead of a node secret, so clients will be able
|
|
to manage multiple leases per file. In addition, servers will be able to
|
|
identify which shares are leased by which clients, so that clients can safely
|
|
reconcile their idea of which files/directories are active against the
|
|
server's list, and explicitly cancel leases on objects that aren't on the
|
|
active list.
|
|
|
|
By reducing the size of the "lease scope", the coordination problem is made
|
|
easier. In general, mark-and-sweep is easier to implement (it requires mere
|
|
vigilance, rather than coordination), so unless the space used by deleted
|
|
files is not expiring fast enough, the renew/expire timed lease approach is
|
|
recommended.
|
|
|