docs: more formatting cleanups and corrections. Spell webapi and wapi as web-API.
This commit is contained in:
parent 7be2c73f08
commit 45dd8b910a
@@ -14,9 +14,8 @@ To speed up backup operations, Tahoe maintains a small database known as the
uploaded recently.

This database lives in ``~/.tahoe/private/backupdb.sqlite``, and is a SQLite
single-file database. It is used by the "tahoe backup" command. In the
future, it will also be used by "tahoe mirror", and by "tahoe cp" when the
``--use-backupdb`` option is included.
single-file database. It is used by the "``tahoe backup``" command. In the
future, it may optionally be used by other commands such as "``tahoe cp``".

The purpose of this database is twofold: to manage the file-to-cap
translation (the "upload" step) and the directory-to-cap translation (the
@@ -24,7 +23,7 @@ translation (the "upload" step) and the directory-to-cap translation (the

The overall goal of optimizing backup is to reduce the work required when the
source disk has not changed (much) since the last backup. In the ideal case,
running "tahoe backup" twice in a row, with no intervening changes to the
running "``tahoe backup``" twice in a row, with no intervening changes to the
disk, will not require any network traffic. Minimal changes to the source
disk should result in minimal traffic.

@@ -32,12 +31,12 @@ This database is optional. If it is deleted, the worst effect is that a
subsequent backup operation may use more effort (network bandwidth, CPU
cycles, and disk IO) than it would have without the backupdb.

The database uses sqlite3, which is included as part of the standard python
library with python2.5 and later. For python2.4, Tahoe will try to install the
The database uses sqlite3, which is included as part of the standard Python
library with Python 2.5 and later. For Python 2.4, Tahoe will try to install the
"pysqlite" package at build-time, but this will succeed only if sqlite3 with
development headers is already installed. On Debian and Debian derivatives
you can install the "python-pysqlite2" package (which, despite the name,
actually provides sqlite3 rather than sqlite2), but on old distributions such
actually provides sqlite3 rather than sqlite2). On old distributions such
as Debian etch (4.0 "oldstable") or Ubuntu Edgy (6.10) the "python-pysqlite2"
package won't work, but the "sqlite3-dev" package will.

@@ -84,11 +83,11 @@ The database contains the following tables::
Upload Operation
================

The upload process starts with a pathname (like ~/.emacs) and wants to end up
with a file-cap (like URI:CHK:...).
The upload process starts with a pathname (like ``~/.emacs``) and wants to end up
with a file-cap (like ``URI:CHK:...``).

The first step is to convert the path to an absolute form
(/home/warner/.emacs) and do a lookup in the local_files table. If the path
(``/home/warner/.emacs``) and do a lookup in the local_files table. If the path
is not present in this table, the file must be uploaded. The upload process
is:

@@ -150,8 +149,8 @@ checked and found healthy, the 'last_upload' entry is updated.

Relying upon timestamps is a compromise between efficiency and safety: a file
which is modified without changing the timestamp or size will be treated as
unmodified, and the "tahoe backup" command will not copy the new contents
into the grid. The ``--no-timestamps`` can be used to disable this
unmodified, and the "``tahoe backup``" command will not copy the new contents
into the grid. The ``--no-timestamps`` option can be used to disable this
optimization, forcing every byte of the file to be hashed and encoded.
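To make the size/timestamp check concrete, here is a minimal sketch of how a
backup tool might consult such a database before deciding to re-upload a file.
It is illustrative only: the ``local_files`` table and its ``size``/``mtime``
columns are assumptions standing in for the schema summarized above, not
Tahoe's actual code::

    import os
    import sqlite3

    def needs_upload(db_path, filename):
        """Return True if the file looks new or changed relative to the backupdb."""
        abspath = os.path.abspath(filename)
        st = os.stat(abspath)
        conn = sqlite3.connect(db_path)
        try:
            row = conn.execute(
                "SELECT size, mtime FROM local_files WHERE path=?", (abspath,)
            ).fetchone()
        finally:
            conn.close()
        if row is None:
            return True            # never seen before: must upload
        size, mtime = row
        # identical size and mtime => treated as unmodified (the compromise above)
        return not (size == st.st_size and mtime == st.st_mtime)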
Directory Operations
@@ -162,17 +161,17 @@ dircap for each directory), the backup process must find or create a tahoe
directory node with the same contents. The contents are hashed, and the hash
is queried in the 'directories' table. If found, the last-checked timestamp
is used to perform the same random-early-check algorithm described for files
above, but no new upload is performed. Since "tahoe backup" creates immutable
above, but no new upload is performed. Since "``tahoe backup``" creates immutable
directories, it is perfectly safe to re-use a directory from a previous
backup.

If not found, the webapi "mkdir-immutable" operation is used to create a new
If not found, the web-API "mkdir-immutable" operation is used to create a new
directory, and an entry is stored in the table.

The comparison operation ignores timestamps and metadata, and pays attention
solely to the file names and contents.

By using a directory-contents hash, the "tahoe backup" command is able to
By using a directory-contents hash, the "``tahoe backup``" command is able to
re-use directories from other places in the backed up data, or from old
backups. This means that renaming a directory and moving a subdirectory to a
new parent both count as "minor changes" and will result in minimal Tahoe
@@ -184,7 +183,7 @@ directories from backup #1.

The best case is a null backup, in which nothing has changed. This will
result in minimal network bandwidth: one directory read and two modifies. The
Archives/ directory must be read to locate the latest backup, and must be
modified to add a new snapshot, and the Latest/ directory will be updated to
``Archives/`` directory must be read to locate the latest backup, and must be
modified to add a new snapshot, and the ``Latest/`` directory will be updated to
point to that same snapshot.
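The directory-contents hash can be illustrated with a small sketch. The
serialization below (sorted name/cap pairs joined with newlines) is an
assumption chosen for clarity, not the exact encoding Tahoe uses, but it shows
why identical directories hash identically no matter where they live::

    import hashlib

    def directory_contents_hash(children):
        """Hash a directory's children so identical directories hash the same.

        `children` maps child name -> child cap (URI string). The hash depends
        solely on names and contents, never on timestamps or other metadata.
        """
        h = hashlib.sha256()
        for name in sorted(children):
            h.update(("%s:%s\n" % (name, children[name])).encode("utf-8"))
        return h.hexdigest()

Renaming the parent directory, or moving the whole subtree elsewhere, leaves
this hash unchanged, which is why a previously-created immutable directory can
simply be re-used.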
@@ -149,7 +149,7 @@ set the ``tub.location`` option described below.
  tub.port = 8098
  tub.location = external-firewall.example.com:7912

* Run a node behind a Tor proxy (perhaps via torsocks), in client-only
* Run a node behind a Tor proxy (perhaps via ``torsocks``), in client-only
mode (i.e. we can make outbound connections, but other nodes will not
be able to connect to us). The literal '``unreachable.example.org``' will
not resolve, but will serve as a reminder to human observers that this
@@ -186,7 +186,7 @@ set the ``tub.location`` option described below.
a "log gatherer", which will be granted access to the logport. This can
be used by centralized storage grids to gather operational logs in a
single place. Note that when an old-style ``BASEDIR/log_gatherer.furl`` file
exists (see 'Backwards Compatibility Files', below), both are used. (For
exists (see `Backwards Compatibility Files`_, below), both are used. (For
most other items, the separate config file overrides the entry in
``tahoe.cfg``.)

@@ -208,12 +208,12 @@ set the ``tub.location`` option described below.
each connection to another node, if nothing has been heard for a while,
we will drop the connection. The duration of silence that passes before
dropping the connection will be between DT-2*KT and 2*DT+2*KT (please see
ticket #521 for more details). If we are sending a large amount of data
ticket `#521`_ for more details). If we are sending a large amount of data
to the other end (which takes more than DT-2*KT to deliver), we might
incorrectly drop the connection. The default behavior (when this value is
not provided) is to disable the disconnect timer.

See ticket #521 for a discussion of how to pick these timeout values.
See ticket `#521`_ for a discussion of how to pick these timeout values.
Using 30 minutes means we'll disconnect after 22 to 68 minutes of
inactivity. Receiving data will reset this timeout, however if we have
more than 22min of data in the outbound queue (such as 800kB in two
@@ -221,6 +221,8 @@ set the ``tub.location`` option described below.
contact us, our ping might be delayed, so we may disconnect them by
accident.

.. _`#521`: http://tahoe-lafs.org/trac/tahoe-lafs/ticket/521

``ssh.port = (strports string, optional)``

``ssh.authorized_keys_file = (filename, optional)``
@@ -236,8 +238,8 @@ set the ``tub.location`` option described below.

``tempdir = (string, optional)``

This specifies a temporary directory for the webapi server to use, for
holding large files while they are being uploaded. If a webapi client
This specifies a temporary directory for the web-API server to use, for
holding large files while they are being uploaded. If a web-API client
attempts to upload a 10GB file, this tempdir will need to have at least
10GB available for the upload to complete.

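As an illustrative example (the path is made up, and this assumes ``tempdir``
sits in the ``[node]`` section alongside the other options discussed here), a
node whose scratch space lives on a larger disk might use::

  [node]
  tempdir = /mnt/bigdisk/tahoe-tmp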
@@ -400,10 +402,11 @@ and pays attention to the ``[node]`` section, but not the others.

The Introducer node maintains some different state than regular client nodes.

``BASEDIR/introducer.furl`` : This is generated the first time the introducer
node is started, and used again on subsequent runs, to give the introduction
service a persistent long-term identity. This file should be published and
copied into new client nodes before they are started for the first time.
``BASEDIR/introducer.furl``
  This is generated the first time the introducer node is started, and used
  again on subsequent runs, to give the introduction service a persistent
  long-term identity. This file should be published and copied into new client
  nodes before they are started for the first time.

Other Files in BASEDIR
@@ -572,14 +575,17 @@ these are not the default values), merely a legal one.
  ssh.port = 8022
  ssh.authorized_keys_file = ~/.ssh/authorized_keys

  [client]
  introducer.furl = pb://ok45ssoklj4y7eok5c3xkmj@tahoe.example:44801/ii3uumo
  helper.furl = pb://ggti5ssoklj4y7eok5c3xkmj@helper.tahoe.example:7054/kk8lhr

  [storage]
  enabled = True
  readonly_storage = True
  sizelimit = 10000000000

  [helper]
  run_helper = True
@@ -12,7 +12,7 @@ Debian Support
Overview
========

One convenient way to install Tahoe-LAFS is with debian packages.
One convenient way to install Tahoe-LAFS is with Debian packages.
This document attempts to explain how to complete a desert island build for
people in a hurry. It also attempts to explain more about our Debian packaging
for those willing to read beyond the simple pragmatic packaging exercises.
@@ -21,7 +21,7 @@ TL;DR supporting package building instructions
==============================================

There are only four supporting packages that are currently not available from
the debian apt repositories in Debian Lenny::
the Debian apt repositories in Debian Lenny::

  python-foolscap python-zfec argparse zbase32

@@ -99,23 +99,23 @@ a source release, do the following::
  sudo dpkg -i ../allmydata-tahoe_1.6.1-r4262_all.deb

You should now have a functional desert island build of Tahoe with all of the
supported libraries as .deb packages. You'll need to edit the Debian specific
/etc/defaults/allmydata-tahoe file to get Tahoe started. Data is by default
stored in /var/lib/tahoelafsd/ and Tahoe runs as the 'tahoelafsd' user.
supported libraries as .deb packages. You'll need to edit the Debian-specific
``/etc/defaults/allmydata-tahoe`` file to get Tahoe started. Data is by default
stored in ``/var/lib/tahoelafsd/`` and Tahoe runs as the 'tahoelafsd' user.

Building Debian Packages
========================

The Tahoe source tree comes with limited support for building debian packages
The Tahoe source tree comes with limited support for building Debian packages
on a variety of Debian and Ubuntu platforms. For each supported platform,
there is a "deb-PLATFORM-head" target in the Makefile that will produce a
debian package from a darcs checkout, using a version number that is derived
Debian package from a darcs checkout, using a version number that is derived
from the most recent darcs tag, plus the total number of revisions present in
the tree (e.g. "1.1-r2678").

To create debian packages from a Tahoe tree, you will need some additional
To create Debian packages from a Tahoe tree, you will need some additional
tools installed. The canonical list of these packages is in the
"Build-Depends" clause of misc/sid/debian/control , and includes::
"Build-Depends" clause of ``misc/sid/debian/control``, and includes::

  build-essential
  debhelper
@@ -127,13 +127,13 @@ tools installed. The canonical list of these packages is in the
  python-twisted-core

In addition, to use the "deb-$PLATFORM-head" target, you will also need the
"debchange" utility from the "devscripts" package, and the "fakeroot" package.
"``debchange``" utility from the "devscripts" package, and the "fakeroot" package.

Some recent platforms can be handled by using the targets for the previous
release, for example if there is no "deb-hardy-head" target, try building
"deb-gutsy-head" and see if the resulting package will work.

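A concrete invocation, using one of the platform targets named above (shown
purely as an illustration; pick whichever ``deb-PLATFORM-head`` target the
Makefile actually provides for your release), looks like::

  make deb-gutsy-head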
Note that we haven't tried to build source packages (.orig.tar.gz + dsc) yet,
Note that we haven't tried to build source packages (``.orig.tar.gz`` + dsc) yet,
and there are no such source packages in our APT repository.

Using Pre-Built Debian Packages
@@ -146,16 +146,16 @@ describes this repository.

The ``tahoe-lafs.org`` APT repository also includes Debian packages of support
libraries, like Foolscap, zfec, pycryptopp, and everything else you need that
isn't already in debian.
isn't already in Debian.

Building From Source on Debian Systems
======================================

Many of Tahoe's build dependencies can be satisfied by first installing
certain debian packages: simplejson is one of these. Some debian/ubuntu
platforms do not provide the necessary .egg-info metadata with their
certain Debian packages: simplejson is one of these. Some Debian/Ubuntu
platforms do not provide the necessary ``.egg-info`` metadata with their
packages, so the Tahoe build process may not believe they are present. Some
Tahoe dependencies are not present in most debian systems (such as foolscap
Tahoe dependencies are not present in most Debian systems (such as foolscap
and zfec): debs for these are made available in the APT repository described
above.

@@ -164,9 +164,9 @@ that it needs to run and which are not already present in the build
environment).

We have observed occasional problems with this acquisition process. In some
cases, setuptools will only be half-aware of an installed debian package,
cases, setuptools will only be half-aware of an installed Debian package,
just enough to interfere with the automatic download+build of the dependency.
For example, on some platforms, if Nevow-0.9.26 is installed via a debian
For example, on some platforms, if Nevow-0.9.26 is installed via a Debian
package, setuptools will believe that it must download Nevow anyways, but it
will insist upon downloading that specific 0.9.26 version. Since the current
release of Nevow is 0.9.31, and 0.9.26 is no longer available for download,
@@ -21,4 +21,4 @@ automatically, but older filesystems may not have it enabled::

If "dir_index" is present in the "features:" line, then you're all set. If
not, you'll need to use tune2fs and e2fsck to enable and build the index. See
<http://wiki.dovecot.org/MailboxFormat/Maildir> for some hints.
`<http://wiki.dovecot.org/MailboxFormat/Maildir>`_ for some hints.
@@ -82,7 +82,7 @@ clients.
"key-generation" service, which allows a client to offload their RSA key
generation to a separate process. Since RSA key generation takes several
seconds, and must be done each time a directory is created, moving it to a
separate process allows the first process (perhaps a busy webapi server) to
separate process allows the first process (perhaps a busy web-API server) to
continue servicing other requests. The key generator exports a FURL that can
be copied into a node to enable this functionality.
@@ -96,8 +96,8 @@ same way as "``tahoe run``".

"``tahoe stop [NODEDIR]``" will shut down a running node.

"``tahoe restart [NODEDIR]``" will stop and then restart a running node. This is
most often used by developers who have just modified the code and want to
"``tahoe restart [NODEDIR]``" will stop and then restart a running node. This
is most often used by developers who have just modified the code and want to
start using their changes.

@@ -107,15 +107,15 @@ Filesystem Manipulation
These commands let you examine a Tahoe-LAFS filesystem, providing basic
list/upload/download/delete/rename/mkdir functionality. They can be used as
primitives by other scripts. Most of these commands are fairly thin wrappers
around webapi calls, which are described in `<webapi.rst>`_.
around web-API calls, which are described in `<webapi.rst>`_.

By default, all filesystem-manipulation commands look in ``~/.tahoe/`` to figure
out which Tahoe-LAFS node they should use. When the CLI command makes webapi
calls, it will use ``~/.tahoe/node.url`` for this purpose: a running Tahoe-LAFS
node that provides a webapi port will write its URL into this file. If you want
to use a node on some other host, just create ``~/.tahoe/`` and copy that node's
webapi URL into this file, and the CLI commands will contact that node instead
of a local one.
By default, all filesystem-manipulation commands look in ``~/.tahoe/`` to
figure out which Tahoe-LAFS node they should use. When the CLI command makes
web-API calls, it will use ``~/.tahoe/node.url`` for this purpose: a running
Tahoe-LAFS node that provides a web-API port will write its URL into this
file. If you want to use a node on some other host, just create ``~/.tahoe/``
and copy that node's web-API URL into this file, and the CLI commands will
contact that node instead of a local one.

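Scripts can reproduce the same discovery step directly. The sketch below is
illustrative (Python 3 shown; the ``/uri/DIRCAP?t=json`` request should be
checked against `<webapi.rst>`_ before relying on it, and real code should
URL-escape the cap as that document describes)::

    import json
    import os
    from urllib.request import urlopen

    def read_node_url(basedir="~/.tahoe"):
        """Return the web-API base URL the running node wrote into node.url."""
        path = os.path.join(os.path.expanduser(basedir), "node.url")
        with open(path) as f:
            return f.read().strip().rstrip("/")

    def list_directory(dircap):
        """Fetch a directory listing as JSON, much as the CLI wrappers do."""
        url = "%s/uri/%s?t=json" % (read_node_url(), dircap)
        with urlopen(url) as resp:
            return json.load(resp)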
These commands also use a table of "aliases" to figure out which directory
they ought to use as a starting point. This is explained in more detail below.
@@ -258,7 +258,8 @@ In these summaries, ``PATH``, ``TOPATH`` or ``FROMPATH`` can be one of::

* ``[SUBDIRS/]FILENAME`` for a path relative to the default ``tahoe:`` alias;
* ``ALIAS:[SUBDIRS/]FILENAME`` for a path relative to another alias;
* ``DIRCAP/[SUBDIRS/]FILENAME`` or ``DIRCAP:./[SUBDIRS/]FILENAME`` for a path relative to a directory cap.
* ``DIRCAP/[SUBDIRS/]FILENAME`` or ``DIRCAP:./[SUBDIRS/]FILENAME`` for a
  path relative to a directory cap.

Command Examples
@@ -37,7 +37,7 @@ All Tahoe-LAFS client nodes can run a frontend FTP server, allowing regular FTP
clients (like /usr/bin/ftp, ncftp, and countless others) to access the
virtual filesystem. They can also run an SFTP server, so SFTP clients (like
/usr/bin/sftp, the sshfs FUSE plugin, and others) can too. These frontends
sit at the same level as the webapi interface.
sit at the same level as the web-API interface.

Since Tahoe-LAFS does not use user accounts or passwords, the FTP/SFTP servers
must be configured with a way to first authenticate a user (confirm that a
@@ -30,7 +30,7 @@ What's involved in a download?
==============================

Downloads are triggered by read() calls, each with a starting offset (defaults
to 0) and a length (defaults to the whole file). A regular webapi GET request
to 0) and a length (defaults to the whole file). A regular web-API GET request
will result in a whole-file read() call.

Each read() call turns into an ordered sequence of get_segment() calls. A
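The mapping from a byte-range read() to segments is simple index arithmetic.
A sketch (the 128 KiB segment size is only an illustrative default, and real
downloads also have to cope with a short final segment)::

    def segments_for_read(offset, length, segment_size=128 * 1024):
        """Return the ordered segment numbers a read(offset, length) touches."""
        if length <= 0:
            return []
        first = offset // segment_size
        last = (offset + length - 1) // segment_size
        return list(range(first, last + 1))

    # e.g. a whole-file read of a 300 KiB file touches segments [0, 1, 2]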
@@ -109,7 +109,7 @@ actions to upload, rename, and delete files.

When an error occurs, the HTTP response code will be set to an appropriate
400-series code (like 404 Not Found for an unknown childname, or 400 Bad Request
when the parameters to a webapi operation are invalid), and the HTTP response
when the parameters to a web-API operation are invalid), and the HTTP response
body will usually contain a few lines of explanation as to the cause of the
error and possible responses. Unusual exceptions may result in a 500 Internal
Server Error as a catch-all, with a default response body containing
@@ -231,9 +231,9 @@ contain unicode filenames, and cannot contain binary strings that are not
representable as such.

All Tahoe operations that refer to existing files or directories must include
a suitable read- or write- cap in the URL: the webapi server won't add one
a suitable read- or write- cap in the URL: the web-API server won't add one
for you. If you don't know the cap, you can't access the file. This allows
the security properties of Tahoe caps to be extended across the webapi
the security properties of Tahoe caps to be extended across the web-API
interface.

Slow Operations, Progress, and Cancelling
@@ -436,22 +436,22 @@ Creating A New Directory
}

For forward-compatibility, a mutable directory can also contain caps in
a format that is unknown to the webapi server. When such caps are retrieved
a format that is unknown to the web-API server. When such caps are retrieved
from a mutable directory in a "ro_uri" field, they will be prefixed with
the string "ro.", indicating that they must not be decoded without
checking that they are read-only. The "ro." prefix must not be stripped
off without performing this check. (Future versions of the webapi server
off without performing this check. (Future versions of the web-API server
will perform it where necessary.)

If both the "rw_uri" and "ro_uri" fields are present in a given PROPDICT,
and the webapi server recognizes the rw_uri as a write cap, then it will
and the web-API server recognizes the rw_uri as a write cap, then it will
reset the ro_uri to the corresponding read cap and discard the original
contents of ro_uri (in order to ensure that the two caps correspond to the
same object and that the ro_uri is in fact read-only). However this may not
happen for caps in a format unknown to the webapi server. Therefore, when
writing a directory the webapi client should ensure that the contents
happen for caps in a format unknown to the web-API server. Therefore, when
writing a directory the web-API client should ensure that the contents
of "rw_uri" and "ro_uri" for a given PROPDICT are a consistent
(write cap, read cap) pair if possible. If the webapi client only has
(write cap, read cap) pair if possible. If the web-API client only has
one cap and does not know whether it is a write cap or read cap, then
it is acceptable to set "rw_uri" to that cap and omit "ro_uri". The
client must not put a write cap into a "ro_uri" field.
@@ -462,7 +462,7 @@ Creating A New Directory
Also, if the "no-write" field is set to true in the metadata of a link to
a mutable child, it will cause the link to be diminished to read-only.

Note that the webapi-using client application must not provide the
Note that the web-API-using client application must not provide the
"Content-Type: multipart/form-data" header that usually accompanies HTML
form submissions, since the body is not formatted this way. Doing so will
cause a server error as the lower-level code misparses the request body.
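To make the cap-placement rules concrete, here is a hedged sketch of how a
client might assemble the children map for a mkdir-with-children style
request. The helper name, the example caps, and the exact request body format
are illustrative; consult the operation descriptions in this document for the
authoritative details::

    import json

    def child_propdict(rw_uri=None, ro_uri=None, metadata=None):
        """Build one child entry, following the rules above.

        A write cap goes in "rw_uri" and never in "ro_uri"; a known
        (write cap, read cap) pair may fill in both fields; a single cap of
        unknown strength goes in "rw_uri" with "ro_uri" omitted.
        """
        propdict = {}
        if rw_uri is not None:
            propdict["rw_uri"] = rw_uri
        if ro_uri is not None:
            propdict["ro_uri"] = ro_uri
        propdict["metadata"] = metadata or {}
        return propdict

    children = {
        "notes.txt": ["filenode", child_propdict(ro_uri="URI:CHK:...")],
        "subdir":    ["dirnode",  child_propdict(rw_uri="URI:DIR2:...",
                                                 ro_uri="URI:DIR2-RO:...")],
    }
    body = json.dumps(children)   # sent as the body of the mkdir request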
@@ -480,18 +480,18 @@ Creating A New Directory
immutable files, literal files, and deep-immutable directories.

For forward-compatibility, a deep-immutable directory can also contain caps
in a format that is unknown to the webapi server. When such caps are retrieved
in a format that is unknown to the web-API server. When such caps are retrieved
from a deep-immutable directory in a "ro_uri" field, they will be prefixed
with the string "imm.", indicating that they must not be decoded without
checking that they are immutable. The "imm." prefix must not be stripped
off without performing this check. (Future versions of the webapi server
off without performing this check. (Future versions of the web-API server
will perform it where necessary.)

The cap for each child may be given either in the "rw_uri" or "ro_uri"
field of the PROPDICT (not both). If a cap is given in the "rw_uri" field,
then the webapi server will check that it is an immutable read-cap of a
then the web-API server will check that it is an immutable read-cap of a
*known* format, and give an error if it is not. If a cap is given in the
"ro_uri" field, then the webapi server will still check whether known
"ro_uri" field, then the web-API server will still check whether known
caps are immutable, but for unknown caps it will simply assume that the
cap can be stored, as described above. Note that an attacker would be
able to store any cap in an immutable directory, so this check when
@@ -729,7 +729,7 @@ In Tahoe earlier than v1.4.0, 'mtime' and 'ctime' keys were populated
instead of the 'tahoe':'linkmotime' and 'tahoe':'linkcrtime' keys. Starting
in Tahoe v1.4.0, the 'linkmotime'/'linkcrtime' keys in the 'tahoe' sub-dict
are populated. However, prior to Tahoe v1.7beta, a bug caused the 'tahoe'
sub-dict to be deleted by webapi requests in which new metadata is
sub-dict to be deleted by web-API requests in which new metadata is
specified, and not to be added to existing child links that lack it.

From Tahoe v1.7.0 onward, the 'mtime' and 'ctime' fields are no longer
@@ -829,7 +829,7 @@ Attaching an existing File or Directory by its read- or write-cap

Note that this operation does not take its child cap in the form of
separate "rw_uri" and "ro_uri" fields. Therefore, it cannot accept a
child cap in a format unknown to the webapi server, unless its URI
child cap in a format unknown to the web-API server, unless its URI
starts with "ro." or "imm.". This restriction is necessary because the
server is not able to attenuate an unknown write cap to a read cap.
Unknown URIs starting with "ro." or "imm.", on the other hand, are
@@ -1138,7 +1138,7 @@ Attaching An Existing File Or Directory (by URI)
directory, with a specified child name. This behaves much like the PUT t=uri
operation, and is a lot like a UNIX hardlink. It is subject to the same
restrictions as that operation on the use of cap formats unknown to the
webapi server.
web-API server.

This will create additional intermediate directories as necessary, although
since it is expected to be triggered by a form that was retrieved by "GET
@@ -1796,7 +1796,7 @@ This is the "Welcome Page", and contains a few distinct sections::
Static Files in /public_html
============================

The webapi server will take any request for a URL that starts with /static
The web-API server will take any request for a URL that starts with /static
and serve it from a configurable directory which defaults to
$BASEDIR/public_html . This is configured by setting the "[node]web.static"
value in $BASEDIR/tahoe.cfg . If this is left at the default value of
@@ -1804,7 +1804,7 @@ value in $BASEDIR/tahoe.cfg . If this is left at the default value of
served with the contents of the file $BASEDIR/public_html/subdir/foo.html .

This can be useful to serve a javascript application which provides a
prettier front-end to the rest of the Tahoe webapi.
prettier front-end to the rest of the Tahoe web-API.

Safety and security issues -- names vs. URIs
@@ -1850,7 +1850,7 @@ parent directory, so it isn't any harder to use the URI for this purpose.

The read and write caps in a given directory node are separate URIs, and
can't be assumed to point to the same object even if they were retrieved in
the same operation (although the webapi server attempts to ensure this
the same operation (although the web-API server attempts to ensure this
in most cases). If you need to rely on that property, you should explicitly
verify it. More generally, you should not make assumptions about the
internal consistency of the contents of mutable directories. As a result
@@ -1895,7 +1895,7 @@ Coordination Directive.

Tahoe nodes implement internal serialization to make sure that a single Tahoe
node cannot conflict with itself. For example, it is safe to issue two
directory modification requests to a single tahoe node's webapi server at the
directory modification requests to a single tahoe node's web-API server at the
same time, because the Tahoe node will internally delay one of them until
after the other has finished being applied. (This feature was introduced in
Tahoe-1.1; back with Tahoe-1.0 the web client was responsible for serializing
@@ -28,7 +28,7 @@ next renewal pass.

There are several tradeoffs to be considered when choosing the renewal timer
and the lease duration, and there is no single optimal pair of values. See
the "lease-tradeoffs.svg" diagram to get an idea for the tradeoffs involved.
the `<lease-tradeoffs.svg>`_ diagram to get an idea for the tradeoffs involved.
If lease renewal occurs quickly and with 100% reliability, then any renewal
time that is shorter than the lease duration will suffice, but a larger ratio
of duration-over-renewal-time will be more robust in the face of occasional
@@ -48,14 +48,14 @@ Client-side Renewal

If all of the files and directories which you care about are reachable from a
single starting point (usually referred to as a "rootcap"), and you store
that rootcap as an alias (via "tahoe create-alias"), then the simplest way to
renew these leases is with the following CLI command::
that rootcap as an alias (via "``tahoe create-alias``" for example), then the
simplest way to renew these leases is with the following CLI command::

  tahoe deep-check --add-lease ALIAS:

This will recursively walk every directory under the given alias and renew
the leases on all files and directories. (You may want to add a ``--repair``
flag to perform repair at the same time). Simply run this command once a week
flag to perform repair at the same time.) Simply run this command once a week
(or whatever other renewal period your grid recommends) and make sure it
completes successfully. As a side effect, a manifest of all unique files and
directories will be emitted to stdout, as well as a summary of file sizes and
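One convenient way to run that renewal on a schedule is a weekly cron job.
This crontab line is only an illustration (it assumes the ``tahoe`` executable
is on cron's PATH and that the rootcap is stored under the default ``tahoe:``
alias)::

  # every Sunday at 03:00, renew leases under the default alias
  0 3 * * 0  tahoe deep-check --add-lease tahoe: >>/var/log/tahoe-renew.log 2>&1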
@@ -78,7 +78,7 @@ Server Side Expiration

Expiration must be explicitly enabled on each storage server, since the
default behavior is to never expire shares. Expiration is enabled by adding
config keys to the "[storage]" section of the tahoe.cfg file (as described
config keys to the ``[storage]`` section of the ``tahoe.cfg`` file (as described
below) and restarting the server node.

Each lease has two parameters: a create/renew timestamp and a duration. The
@@ -89,7 +89,7 @@ at 31 days, and the "nominal lease expiration time" is simply $duration
seconds after the $create_renew timestamp. (In a future release of Tahoe, the
client will get to request a specific duration, and the server will accept or
reject the request depending upon its local configuration, so that servers
can achieve better control over their storage obligations).
can achieve better control over their storage obligations.)

The lease-expiration code has two modes of operation. The first is age-based:
leases are expired when their age is greater than their duration. This is the
@@ -99,7 +99,7 @@ active files and directories will be preserved, and the garbage will
collected in a timely fashion.

Since there is not yet a way for clients to request a lease duration of other
than 31 days, there is a tahoe.cfg setting to override the duration of all
than 31 days, there is a ``tahoe.cfg`` setting to override the duration of all
leases. If, for example, this alternative duration is set to 60 days, then
clients could safely renew their leases with an add-lease operation perhaps
once every 50 days: even though nominally their leases would expire 31 days
@@ -117,22 +117,22 @@ for a long period of time: once the lease-checker has examined all shares and
expired whatever it is going to expire, the second and subsequent passes are
not going to find any new leases to remove.

The tahoe.cfg file uses the following keys to control lease expiration::
The ``tahoe.cfg`` file uses the following keys to control lease expiration:

[storage]
``[storage]``

expire.enabled = (boolean, optional)
``expire.enabled = (boolean, optional)``

  If this is True, the storage server will delete shares on which all
  If this is ``True``, the storage server will delete shares on which all
  leases have expired. Other controls dictate when leases are considered to
  have expired. The default is False.
  have expired. The default is ``False``.

expire.mode = (string, "age" or "cutoff-date", required if expiration enabled)
``expire.mode = (string, "age" or "cutoff-date", required if expiration enabled)``

  If this string is "age", the age-based expiration scheme is used, and the
  "expire.override_lease_duration" setting can be provided to influence the
  ``expire.override_lease_duration`` setting can be provided to influence the
  lease ages. If it is "cutoff-date", the absolute-date-cutoff mode is
  used, and the "expire.cutoff_date" setting must be provided to specify
  used, and the ``expire.cutoff_date`` setting must be provided to specify
  the cutoff date. The mode setting currently has no default: you must
  provide a value.

@@ -140,24 +140,24 @@ The tahoe.cfg file uses the following keys to control lease expiration::
  this release it was deemed safer to require an explicit mode
  specification.

expire.override_lease_duration = (duration string, optional)
``expire.override_lease_duration = (duration string, optional)``

  When age-based expiration is in use, a lease will be expired if its
  "lease.create_renew" timestamp plus its "lease.duration" time is
  ``lease.create_renew`` timestamp plus its ``lease.duration`` time is
  earlier/older than the current time. This key, if present, overrides the
  duration value for all leases, changing the algorithm from:
  duration value for all leases, changing the algorithm from::

    if (lease.create_renew_timestamp + lease.duration) < now:
        expire_lease()

  to:
  to::

    if (lease.create_renew_timestamp + override_lease_duration) < now:
        expire_lease()

  The value of this setting is a "duration string", which is a number of
  days, months, or years, followed by a units suffix, and optionally
  separated by a space, such as one of the following:
  separated by a space, such as one of the following::

    7days
    31day
@@ -175,14 +175,14 @@ The tahoe.cfg file uses the following keys to control lease expiration::
  31days" had been passed.

  This key is only valid when age-based expiration is in use (i.e. when
  "expire.mode = age" is used). It will be rejected if cutoff-date
  ``expire.mode = age`` is used). It will be rejected if cutoff-date
  expiration is in use.

expire.cutoff_date = (date string, required if mode=cutoff-date)
``expire.cutoff_date = (date string, required if mode=cutoff-date)``

  When cutoff-date expiration is in use, a lease will be expired if its
  create/renew timestamp is older than the cutoff date. This string will be
  a date in the following format:
  a date in the following format::

    2009-01-16 (January 16th, 2009)
    2008-02-02
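Putting those keys together, a storage server that wants age-based expiration
with a 60-day effective lease duration might use a section like the following
(values shown purely as an illustration)::

  [storage]
  enabled = True
  expire.enabled = True
  expire.mode = age
  expire.override_lease_duration = 60 days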
239 docs/logging.rst
@@ -23,33 +23,33 @@ record information about what is happening inside the Tahoe node. This is
primarily for use by programmers and grid operators who want to find out what
went wrong.

The foolscap logging system is documented here:
The foolscap logging system is documented at
`<http://foolscap.lothar.com/docs/logging.html>`_.

http://foolscap.lothar.com/docs/logging.html

The foolscap distribution includes a utility named "flogtool" (usually at
/usr/bin/flogtool) which is used to get access to many foolscap logging
features.
The foolscap distribution includes a utility named "``flogtool``" (usually
at ``/usr/bin/flogtool`` on Unix) which is used to get access to many
foolscap logging features.

Realtime Logging
================

When you are working on Tahoe code, and want to see what the node is doing,
the easiest tool to use is "flogtool tail". This connects to the tahoe node
and subscribes to hear about all log events. These events are then displayed
to stdout, and optionally saved to a file.
the easiest tool to use is "``flogtool tail``". This connects to the Tahoe
node and subscribes to hear about all log events. These events are then
displayed to stdout, and optionally saved to a file.

"flogtool tail" connects to the "logport", for which the FURL is stored in
BASEDIR/private/logport.furl . The following command will connect to this
port and start emitting log information:
"``flogtool tail``" connects to the "logport", for which the FURL is stored
in ``BASEDIR/private/logport.furl`` . The following command will connect to
this port and start emitting log information::

  flogtool tail BASEDIR/private/logport.furl

The "--save-to FILENAME" option will save all received events to a file,
where they can be examined later with "flogtool dump" or "flogtool
web-viewer". The --catch-up flag will ask the node to dump all stored events
before subscribing to new ones (without --catch-up, you will only hear about
events that occur after the tool has connected and subscribed).
The ``--save-to FILENAME`` option will save all received events to a file,
where they can be examined later with "``flogtool dump``" or
"``flogtool web-viewer``". The ``--catch-up`` option will ask the node to
dump all stored events before subscribing to new ones (without ``--catch-up``,
you will only hear about events that occur after the tool has connected and
subscribed).

Incidents
=========
@@ -57,41 +57,41 @@ Incidents
Foolscap keeps a short list of recent events in memory. When something goes
wrong, it writes all the history it has (and everything that gets logged in
the next few seconds) into a file called an "incident". These files go into
BASEDIR/logs/incidents/ , in a file named
"incident-TIMESTAMP-UNIQUE.flog.bz2". The default definition of "something
goes wrong" is the generation of a log event at the log.WEIRD level or
higher, but other criteria could be implemented.
``BASEDIR/logs/incidents/`` , in a file named
"``incident-TIMESTAMP-UNIQUE.flog.bz2``". The default definition of
"something goes wrong" is the generation of a log event at the ``log.WEIRD``
level or higher, but other criteria could be implemented.

The typical "incident report" we've seen in a large Tahoe grid is about 40kB
compressed, representing about 1800 recent events.

These "flogfiles" have a similar format to the files saved by "flogtool tail
--save-to". They are simply lists of log events, with a small header to
indicate which event triggered the incident.
These "flogfiles" have a similar format to the files saved by
"``flogtool tail --save-to``". They are simply lists of log events, with a
small header to indicate which event triggered the incident.

The "flogtool dump FLOGFILE" command will take one of these .flog.bz2 files
and print their contents to stdout, one line per event. The raw event
dictionaries can be dumped by using "flogtool dump --verbose FLOGFILE".
The "``flogtool dump FLOGFILE``" command will take one of these ``.flog.bz2``
files and print their contents to stdout, one line per event. The raw event
dictionaries can be dumped by using "``flogtool dump --verbose FLOGFILE``".

The "flogtool web-viewer" command can be used to examine the flogfile in a
web browser. It runs a small HTTP server and emits the URL on stdout. This
view provides more structure than the output of "flogtool dump": the
parent/child relationships of log events is displayed in a nested format.
"flogtool web-viewer" is still fairly immature.
The "``flogtool web-viewer``" command can be used to examine the flogfile
in a web browser. It runs a small HTTP server and emits the URL on stdout.
This view provides more structure than the output of "``flogtool dump``":
the parent/child relationships of log events is displayed in a nested format.
"``flogtool web-viewer``" is still fairly immature.

Working with flogfiles
======================

The "flogtool filter" command can be used to take a large flogfile (perhaps
one created by the log-gatherer, see below) and copy a subset of events into
a second file. This smaller flogfile may be easier to work with than the
original. The arguments to "flogtool filter" specify filtering criteria: a
predicate that each event must match to be copied into the target file.
--before and --after are used to exclude events outside a given window of
time. --above will retain events above a certain severity level. --from
retains events sent by a specific tubid. --strip-facility removes events that
were emitted with a given facility (like foolscap.negotiation or
tahoe.upload).
The "``flogtool filter``" command can be used to take a large flogfile
(perhaps one created by the log-gatherer, see below) and copy a subset of
events into a second file. This smaller flogfile may be easier to work with
than the original. The arguments to "``flogtool filter``" specify filtering
criteria: a predicate that each event must match to be copied into the
target file. ``--before`` and ``--after`` are used to exclude events outside
a given window of time. ``--above`` will retain events above a certain
severity level. ``--from`` retains events sent by a specific tubid.
``--strip-facility`` removes events that were emitted with a given facility
(like ``foolscap.negotiation`` or ``tahoe.upload``).

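For instance, a command along these lines copies only the more severe events
and drops foolscap's negotiation chatter (illustrative; the option names come
from the list above, but check ``flogtool filter --help`` for the exact value
syntax and argument order)::

  flogtool filter --above WEIRD --strip-facility foolscap.negotiation \
      big.flog.bz2 severe-only.flog.bz2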
Gatherers
=========
@@ -99,16 +99,16 @@ Gatherers
In a deployed Tahoe grid, it is useful to get log information automatically
transferred to a central log-gatherer host. This offloads the (admittedly
modest) storage requirements to a different host and provides access to
logfiles from multiple nodes (webapi/storage/helper) nodes in a single place.
logfiles from multiple nodes (web-API, storage, or helper) in a single place.

There are two kinds of gatherers. Both produce a FURL which needs to be
placed in the NODEDIR/log_gatherer.furl file (one FURL per line) of the nodes
that are to publish their logs to the gatherer. When the Tahoe node starts,
it will connect to the configured gatherers and offer its logport: the
gatherer will then use the logport to subscribe to hear about events.
placed in the ``NODEDIR/log_gatherer.furl`` file (one FURL per line) of
each node that is to publish its logs to the gatherer. When the Tahoe node
starts, it will connect to the configured gatherers and offer its logport:
the gatherer will then use the logport to subscribe to hear about events.

The gatherer will write to files in its working directory, which can then be
examined with tools like "flogtool dump" as described above.
examined with tools like "``flogtool dump``" as described above.

Incident Gatherer
-----------------
@@ -121,38 +121,38 @@ functions are written after examining a new/unknown incident. The idea is to
recognize when the same problem is happening multiple times.

A collection of classification functions that are useful for Tahoe nodes are
provided in misc/incident-gatherer/support_classifiers.py . There is roughly
one category for each log.WEIRD-or-higher level event in the Tahoe source
code.
provided in ``misc/incident-gatherer/support_classifiers.py`` . There is
roughly one category for each ``log.WEIRD``-or-higher level event in the
Tahoe source code.

The incident gatherer is created with the "flogtool create-incident-gatherer
WORKDIR" command, and started with "tahoe start". The generated
"gatherer.tac" file should be modified to add classifier functions.
The incident gatherer is created with the "``flogtool create-incident-gatherer
WORKDIR``" command, and started with "``tahoe start``". The generated
"``gatherer.tac``" file should be modified to add classifier functions.

The incident gatherer writes incident names (which are simply the relative
pathname of the incident-\*.flog.bz2 file) into classified/CATEGORY. For
example, the classified/mutable-retrieve-uncoordinated-write-error file
contains a list of all incidents which were triggered by an uncoordinated
pathname of the ``incident-\*.flog.bz2`` file) into ``classified/CATEGORY``.
For example, the ``classified/mutable-retrieve-uncoordinated-write-error``
file contains a list of all incidents which were triggered by an uncoordinated
write that was detected during mutable file retrieval (caused when somebody
changed the contents of the mutable file in between the node's mapupdate step
and the retrieve step). The classified/unknown file contains a list of all
and the retrieve step). The ``classified/unknown`` file contains a list of all
incidents that did not match any of the classification functions.

At startup, the incident gatherer will automatically reclassify any incident
report which is not mentioned in any of the classified/* files. So the usual
workflow is to examine the incidents in classified/unknown, add a new
classification function, delete classified/unknown, then bounce the gatherer
with "tahoe restart WORKDIR". The incidents which can be classified with the
new functions will be added to their own classified/FOO lists, and the
remaining ones will be put in classified/unknown, where the process can be
repeated until all events are classifiable.
report which is not mentioned in any of the ``classified/\*`` files. So the
usual workflow is to examine the incidents in ``classified/unknown``, add a
new classification function, delete ``classified/unknown``, then bounce the
gatherer with "``tahoe restart WORKDIR``". The incidents which can be
classified with the new functions will be added to their own ``classified/FOO``
lists, and the remaining ones will be put in ``classified/unknown``, where
the process can be repeated until all events are classifiable.

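A classifier is just a small function that inspects the triggering event and
names a category. The sketch below shows the idea only: the exact signature
the gatherer expects and the way functions are registered in ``gatherer.tac``
should be taken from the foolscap documentation, so treat both as assumptions
here::

    def classify_uncoordinated_write(trigger):
        """Classify an incident from its triggering log event.

        `trigger` is assumed to be the event dictionary that tripped the
        incident (the same dictionaries "flogtool dump --verbose" shows).
        Return a category name, or None to let other classifiers try.
        """
        text = trigger.get("format") or trigger.get("message") or ""
        if "UncoordinatedWriteError" in text:
            return "mutable-retrieve-uncoordinated-write-error"
        return None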
The incident gatherer is still fairly immature: future versions will have a
web interface and an RSS feed, so operations personnel can track problems in
the storage grid.

In our experience, each Incident takes about two seconds to transfer from the
node which generated it to the gatherer. The gatherer will automatically
In our experience, each incident takes about two seconds to transfer from
the node that generated it to the gatherer. The gatherer will automatically
catch up to any incidents which occurred while it is offline.

Log Gatherer
@@ -163,20 +163,20 @@ the connected nodes, regardless of severity. This server writes these log
events into a large flogfile that is rotated (closed, compressed, and
replaced with a new one) on a periodic basis. Each flogfile is named
according to the range of time it represents, with names like
"from-2008-08-26-132256--to-2008-08-26-162256.flog.bz2". The flogfiles
"``from-2008-08-26-132256--to-2008-08-26-162256.flog.bz2``". The flogfiles
contain events from many different sources, making it easier to correlate
things that happened on multiple machines (such as comparing a client node
making a request with the storage servers that respond to that request).

The Log Gatherer is created with the "flogtool create-gatherer WORKDIR"
command, and started with "tahoe start". The log_gatherer.furl it creates
then needs to be copied into the BASEDIR/log_gatherer.furl file of all nodes
which should be sending it log events.
The Log Gatherer is created with the "``flogtool create-gatherer WORKDIR``"
command, and started with "``tahoe start``". The ``log_gatherer.furl`` it
creates then needs to be copied into the ``BASEDIR/log_gatherer.furl`` file
of all nodes that should be sending it log events.

The "flogtool filter" command, described above, is useful to cut down the
The "``flogtool filter``" command, described above, is useful to cut down the
potentially-large flogfiles into a more narrowly-focussed form.

Busy nodes, particularly wapi nodes which are performing recursive
Busy nodes, particularly web-API nodes which are performing recursive
deep-size/deep-stats/deep-check operations, can produce a lot of log events.
To avoid overwhelming the node (and using an unbounded amount of memory for
the outbound TCP queue), publishing nodes will start dropping log events when
@@ -186,19 +186,20 @@ the outbound queue grows too large. When this occurs, there will be gaps
Local twistd.log files
======================

[TODO: not yet true, requires foolscap-0.3.1 and a change to allmydata.node]
[TODO: not yet true, requires foolscap-0.3.1 and a change to ``allmydata.node``]

In addition to the foolscap-based event logs, certain high-level events will
be recorded directly in human-readable text form, in the
BASEDIR/logs/twistd.log file (and its rotated old versions: twistd.log.1,
twistd.log.2, etc). This form does not contain as much information as the
``BASEDIR/logs/twistd.log`` file (and its rotated old versions: ``twistd.log.1``,
``twistd.log.2``, etc). This form does not contain as much information as the
flogfiles available through the means described previously, but they are
immediately available to the curious developer, and are retained until the
twistd.log.NN files are explicitly deleted.

Only events at the log.OPERATIONAL level or higher are bridged to twistd.log
(i.e. not the log.NOISY debugging events). In addition, foolscap internal
events (like connection negotiation messages) are not bridged to twistd.log .
Only events at the ``log.OPERATIONAL`` level or higher are bridged to
``twistd.log`` (i.e. not the ``log.NOISY`` debugging events). In addition,
foolscap internal events (like connection negotiation messages) are not
bridged to ``twistd.log``.

Adding log messages
===================
@ -207,63 +208,65 @@ When adding new code, the Tahoe developer should add a reasonable number of
new log events. For details, please see the Foolscap logging documentation,
but a few notes are worth stating here:

* use a facility prefix of "tahoe.", like "tahoe.mutable.publish"
* use a facility prefix of "``tahoe.``", like "``tahoe.mutable.publish``"

* assign each severe (log.WEIRD or higher) event a unique message
  identifier, as the umid= argument to the log.msg() call. The
  misc/coding_tools/make_umid script may be useful for this purpose. This will make it
  easier to write a classification function for these messages.
* assign each severe (``log.WEIRD`` or higher) event a unique message
  identifier, as the ``umid=`` argument to the ``log.msg()`` call. The
  ``misc/coding_tools/make_umid`` script may be useful for this purpose.
  This will make it easier to write a classification function for these
  messages.

* use the parent= argument whenever the event is causally/temporally
* use the ``parent=`` argument whenever the event is causally/temporally
  clustered with its parent. For example, a download process that involves
  three sequential hash fetches could announce the send and receipt of those
  hash-fetch messages with a parent= argument that ties them to the overall
  download process. However, each new wapi download request should be
  unparented.
  hash-fetch messages with a ``parent=`` argument that ties them to the
  overall download process. However, each new web-API download request
  should be unparented.

* use the format= argument in preference to the message= argument. E.g.
  use log.msg(format="got %(n)d shares, need %(k)d", n=n, k=k) instead of
  log.msg("got %d shares, need %d" % (n,k)). This will allow later tools to
  analyze the event without needing to scrape/reconstruct the structured
  data out of the formatted string.
* use the ``format=`` argument in preference to the ``message=`` argument.
  E.g. use ``log.msg(format="got %(n)d shares, need %(k)d", n=n, k=k)``
  instead of ``log.msg("got %d shares, need %d" % (n,k))``. This will allow
  later tools to analyze the event without needing to scrape/reconstruct
  the structured data out of the formatted string.

* Pass extra information as extra keyword arguments, even if they aren't
  included in the format= string. This information will be displayed in the
  "flogtool dump --verbose" output, as well as being available to other
  tools. The umid= argument should be passed this way.
  included in the ``format=`` string. This information will be displayed in
  the "``flogtool dump --verbose``" output, as well as being available to
  other tools. The ``umid=`` argument should be passed this way.

* use log.err for the catch-all addErrback that gets attached to the end of
  any given Deferred chain. When used in conjunction with LOGTOTWISTED=1,
  log.err() will tell Twisted about the error-nature of the log message,
  causing Trial to flunk the test (with an "ERROR" indication that prints a
  copy of the Failure, including a traceback). Don't use log.err for events
  that are BAD but handled (like hash failures: since these are often
  deliberately provoked by test code, they should not cause test failures):
  use log.msg(level=BAD) for those instead.
* use ``log.err`` for the catch-all ``addErrback`` that gets attached to
  the end of any given Deferred chain. When used in conjunction with
  ``LOGTOTWISTED=1``, ``log.err()`` will tell Twisted about the error-nature
  of the log message, causing Trial to flunk the test (with an "ERROR"
  indication that prints a copy of the Failure, including a traceback).
  Don't use ``log.err`` for events that are ``BAD`` but handled (like hash
  failures: since these are often deliberately provoked by test code, they
  should not cause test failures): use ``log.msg(level=BAD)`` for those
  instead.
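
The guidelines above can be combined in a single call. The following is a
rough sketch, not an excerpt from the Tahoe source: the function name, the
facility name, and the ``umid=`` values are invented for illustration, but
the ``log.msg()`` keywords (``format=``, ``parent=``, ``level=``, ``umid=``,
and extra keyword arguments) are the ones described in the bullets above::

  from foolscap.logging import log

  def check_shares(n, k, bad_shnum=None):
      # Parent event for the overall operation; log.msg() returns an event
      # number that child events can reference via parent=.
      lp = log.msg(format="checking shares: got %(n)d, need %(k)d",
                   n=n, k=k, level=log.OPERATIONAL,
                   facility="tahoe.example.check", umid="aaaaaaaa")
      if bad_shnum is not None:
          # A severe but handled problem: use level=BAD rather than
          # log.err(), pass the share number as an extra keyword argument
          # (available to tools even though it is not in the format=
          # string), and give the event its own umid= (normally generated
          # with misc/coding_tools/make_umid).
          log.msg(format="hash failure in share %(shnum)d",
                  shnum=bad_shnum, parent=lp, level=log.BAD,
                  facility="tahoe.example.check", umid="bbbbbbbb")
      return lp
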

Log Messages During Unit Tests
==============================

If a test is failing and you aren't sure why, start by enabling
FLOGTOTWISTED=1 like this:
``FLOGTOTWISTED=1`` like this::

  make test FLOGTOTWISTED=1

With FLOGTOTWISTED=1, sufficiently-important log events will be written into
_trial_temp/test.log, which may give you more ideas about why the test is
failing. Note, however, that _trial_temp/log.out will not receive messages
below the level=OPERATIONAL threshold, due to this issue:
<http://foolscap.lothar.com/trac/ticket/154>
With ``FLOGTOTWISTED=1``, sufficiently-important log events will be written
into ``_trial_temp/test.log``, which may give you more ideas about why the
test is failing. Note, however, that ``_trial_temp/log.out`` will not receive
messages below the ``level=OPERATIONAL`` threshold, due to this issue:
`<http://foolscap.lothar.com/trac/ticket/154>`_


If that isn't enough, look at the detailed foolscap logging messages instead,
by running the tests like this:
by running the tests like this::

  make test FLOGFILE=flog.out.bz2 FLOGLEVEL=1 FLOGTOTWISTED=1

The first environment variable will cause foolscap log events to be written
to ./flog.out.bz2 (instead of merely being recorded in the circular buffers
to ``./flog.out.bz2`` (instead of merely being recorded in the circular buffers
for the use of remote subscribers or incident reports). The second will cause
all log events to be written out, not just the higher-severity ones. The
third will cause twisted log events (like the markers that indicate when each
@ -271,13 +274,13 @@ unit test is starting and stopping) to be copied into the flogfile, making it
easier to correlate log events with unit tests.

Enabling this form of logging appears to roughly double the runtime of the
unit tests. The flog.out.bz2 file is approximately 2MB.
unit tests. The ``flog.out.bz2`` file is approximately 2MB.

You can then use "flogtool dump" or "flogtool web-viewer" on the resulting
flog.out file.
You can then use "``flogtool dump``" or "``flogtool web-viewer``" on the
resulting ``flog.out`` file.

("flogtool tail" and the log-gatherer are not useful during unit tests, since
there is no single Tub to which all the log messages are published).
("``flogtool tail``" and the log-gatherer are not useful during unit tests,
since there is no single Tub to which all the log messages are published).

It is possible for setting these environment variables to cause spurious test
failures in tests with race condition bugs. All known instances of this have
@ -1502,6 +1502,6 @@
sodipodi:role="line"
id="tspan2822"
x="469.52924"
y="342.69528">Tahoe-LAFS WAPI</tspan></text>
y="342.69528">Tahoe-LAFS web-API</tspan></text>
</g>
</svg>
@ -88,14 +88,14 @@ contents of an authority string. These authority strings can be shared with
others just like filecaps and dircaps: knowledge of the authority string is
both necessary and complete to wield the authority it represents.

webapi requests will include the authority necessary to complete the
Web-API requests will include the authority necessary to complete the
operation. When used by a CLI tool, the authority is likely to come from
~/.tahoe/private/authority (i.e. it is ambient to the user who has access to
that node, just like aliases provide similar access to a specific "root
directory"). When used by the browser-oriented WUI, the authority will [TODO]
somehow be retained on each page in a way that minimizes the risk of CSRF
attacks and allows safe sharing (cut-and-paste of a URL without sharing the
storage authority too). The client node receiving the webapi request will
storage authority too). The client node receiving the web-API request will
extract the authority string from the request and use it to build the storage
server messages that it sends to fulfill that request.

@ -449,12 +449,11 @@ using a foreign tahoe node, or when asking a Helper to upload a specific
file. Attenuations (see below) should be used to limit the delegated
authority in these cases.

In the programmatic webapi interface (colloquially known as the "WAPI"), any
operation that consumes storage will accept a storage-authority= query
argument, the value of which will be the printable form of an authority
string. This includes all PUT operations, POST t=upload and t=mkdir, and
anything which creates a new file, creates a directory (perhaps an
intermediate one), or modifies a mutable file.
In the programmatic web-API, any operation that consumes storage will accept
a storage-authority= query argument, the value of which will be the printable
form of an authority string. This includes all PUT operations, POST t=upload
and t=mkdir, and anything which creates a new file, creates a directory
(perhaps an intermediate one), or modifies a mutable file.
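
For concreteness, here is a rough sketch of what such a request might look
like from a Python client, assuming a local node whose web-API listens on
127.0.0.1:3456 (the usual default) and an authority string kept in
``~/.tahoe/private/authority`` as described earlier. The
``storage-authority=`` argument is the one proposed in this section, so this
is illustrative only::

  import os.path
  from http.client import HTTPConnection
  from urllib.parse import urlencode

  # Read the ambient storage authority granted to this user (the proposed
  # location for CLI tools).
  with open(os.path.expanduser("~/.tahoe/private/authority")) as f:
      authority = f.read().strip()

  # Unlinked upload: PUT the file body to /uri, passing the authority
  # string as the proposed storage-authority= query argument.
  with open("example.txt", "rb") as f:
      body = f.read()

  conn = HTTPConnection("127.0.0.1", 3456)
  conn.request("PUT", "/uri?" + urlencode({"storage-authority": authority}),
               body=body)
  filecap = conn.getresponse().read().decode("ascii")  # the new file-cap
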

Alternatively, the authority string can also be passed through an HTTP
header. A single "X-Tahoe-Storage-Authority:" header can be used with the
@ -501,7 +500,7 @@ servers in a single grid and sum them together, providing a grid-wide usage
number for each account. This could be used by e.g. clients in a commercial
grid to report overall-space-used to the end user.

There will be webapi URLs available for all of these reports.
There will be web-API URLs available for all of these reports.

TODO: storage servers might also have a mechanism to apply space-usage limits
to specific account ids directly, rather than requiring that these be
@ -516,7 +515,7 @@ beginning with the storage-authority data structure and working upwards. This
section is organized to follow the storage authority, starting from the point
of grant. The discussion will thus begin at the storage server (where the
authority is first created), work back to the client (which receives the
authority as a webapi argument), then follow the authority back to the
authority as a web-API argument), then follow the authority back to the
servers as it is used to enable specific storage operations. It will then
detail the accounting tables that the storage server is obligated to
maintain, and describe the interfaces through which these tables are accessed
@ -124,7 +124,7 @@
<p>The <a href="http://tahoe-lafs.org/trac/tahoe-lafs/wiki/SftpFrontend">SftpFrontend</a> page
on the wiki has more information about using SFTP with Tahoe-LAFS.</p>

<h3>The WAPI</h3>
<h3>The Web-API</h3>

<p>Want to program your Tahoe-LAFS node to do your bidding? Easy! See <a
href="frontends/webapi.rst">webapi.rst</a>.</p>
@ -498,8 +498,9 @@ overwrite() tells the client to ignore this cached version information, and
to unconditionally replace the mutable file's contents with the new data.
This should not be used in delta application, but rather in situations where
you want to replace the file's contents with completely unrelated ones. When
raw files are uploaded into a mutable slot through the tahoe webapi (using
POST and the ?mutable=true argument), they are put in place with overwrite().
raw files are uploaded into a mutable slot through the Tahoe-LAFS web-API
(using POST and the ?mutable=true argument), they are put in place with
overwrite().
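
The exact web-API spelling has varied, but as a rough sketch (again assuming
a local node on 127.0.0.1:3456), creating a mutable slot and then
unconditionally replacing its contents might look like this; under the
scheme described above, the second request is the one that ends up going
through overwrite()::

  from http.client import HTTPConnection

  conn = HTTPConnection("127.0.0.1", 3456)

  # Create a new mutable slot holding the initial contents; the response
  # body is the write-cap for the new mutable file.
  conn.request("PUT", "/uri?mutable=true", body=b"version 1\n")
  writecap = conn.getresponse().read().decode("ascii").strip()

  # Unconditionally replace the contents of the slot, ignoring whatever
  # version it currently holds.
  conn.request("PUT", "/uri/" + writecap, body=b"version 2\n")
  conn.getresponse().read()
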

The peer-selection and data-structure manipulation (and signing/verification)
steps will be implemented in a separate class in allmydata/mutable.py .