mirror of
https://github.com/tahoe-lafs/tahoe-lafs.git
synced 2025-02-20 17:52:50 +00:00
docs: reflow docs/logging.rst to fill-column 77
This commit is contained in:
parent cd006ed46b
commit 41999430e0
117 docs/logging.rst
The foolscap logging system is documented at
`<http://foolscap.lothar.com/docs/logging.html>`_.

The foolscap distribution includes a utility named "``flogtool``" (usually at
``/usr/bin/flogtool`` on Unix) which is used to get access to many foolscap
logging features.

Realtime Logging
================
this port and start emitting log information::

    flogtool tail BASEDIR/private/logport.furl

The ``--save-to FILENAME`` option will save all received events to a file,
where they can be examined later with "``flogtool dump``" or "``flogtool
web-viewer``". The ``--catch-up`` option will ask the node to dump all stored
events before subscribing to new ones (without ``--catch-up``, you will only
hear about events that occur after the tool has connected and subscribed).
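Each saved event is, underneath, a dictionary. As a rough sketch of what a
dump-style one-line-per-event rendering involves (the keys ``num``, ``time``,
and ``format`` here are illustrative assumptions, not foolscap's exact
schema):

.. code-block:: python

    # Hypothetical sketch: render foolscap-style log events one line per
    # event, similar in spirit to "flogtool dump".  The event keys used here
    # ("num", "time", "format") are assumptions for illustration only.
    import time

    def render_event(event):
        """Format one event dict as a single human-readable line."""
        stamp = time.strftime("%H:%M:%S", time.gmtime(event["time"]))
        # Lazy %-interpolation: the format string plus the keyword data.
        text = event["format"] % event
        return "%s #%d: %s" % (stamp, event["num"], text)

    events = [
        {"num": 1, "time": 0, "format": "connected to %(furl)s",
         "furl": "pb://..."},
        {"num": 2, "time": 1, "format": "got %(n)d shares, need %(k)d",
         "n": 8, "k": 3},
    ]

    for line in map(render_event, events):
        print(line)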

Incidents
=========
level or higher, but other criteria could be implemented.

The typical "incident report" we've seen in a large Tahoe grid is about 40kB
compressed, representing about 1800 recent events.

These "flogfiles" have a similar format to the files saved by "``flogtool
tail --save-to``". They are simply lists of log events, with a small header
to indicate which event triggered the incident.

The "``flogtool dump FLOGFILE``" command will take one of these ``.flog.bz2``
files and print their contents to stdout, one line per event. The raw event
dictionaries can be dumped by using "``flogtool dump --verbose FLOGFILE``".

The "``flogtool web-viewer``" command can be used to examine the flogfile in
a web browser. It runs a small HTTP server and emits the URL on stdout. This
view provides more structure than the output of "``flogtool dump``": the
parent/child relationships of log events are displayed in a nested format.
"``flogtool web-viewer``" is still fairly immature.

Working with flogfiles
======================
The "``flogtool filter``" command can be used to take a large flogfile
(perhaps one created by the log-gatherer, see below) and copy a subset of
events into a second file. This smaller flogfile may be easier to work with
than the original. The arguments to "``flogtool filter``" specify filtering
criteria: a predicate that each event must match to be copied into the target
file. ``--before`` and ``--after`` are used to exclude events outside a given
window of time. ``--above`` will retain events above a certain severity
level. ``--from`` retains events sent by a specific tubid.
``--strip-facility`` removes events that were emitted with a given facility
(like ``foolscap.negotiation`` or ``tahoe.upload``).
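These options amount to a single predicate applied to each event. A hedged
sketch of the idea in Python (the ``time``, ``level``, and ``tubid`` keys are
assumptions for illustration, not flogtool's actual event schema):

.. code-block:: python

    # Hypothetical sketch of "flogtool filter"-style predicates over event
    # dicts.  Keys ("time", "level", "tubid") are illustrative assumptions.

    def make_filter(before=None, after=None, above=None, from_tubid=None):
        """Build one predicate combining the --before/--after/--above/--from
        ideas."""
        def predicate(event):
            if before is not None and event["time"] >= before:
                return False
            if after is not None and event["time"] <= after:
                return False
            if above is not None and event["level"] < above:
                return False
            if from_tubid is not None and event.get("tubid") != from_tubid:
                return False
            return True
        return predicate

    events = [
        {"time": 10, "level": 20, "tubid": "abc"},
        {"time": 50, "level": 35, "tubid": "def"},
        {"time": 90, "level": 10, "tubid": "abc"},
    ]
    keep = make_filter(after=5, before=80, above=30)
    print([e["time"] for e in events if keep(e)])  # only the middle event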
modest) storage requirements to a different host and provides access to
logfiles from multiple nodes (web-API, storage, or helper) in a single place.

There are two kinds of gatherers: "log gatherer" and "stats gatherer". Each
produces a FURL which needs to be placed in the ``NODEDIR/tahoe.cfg`` file of
each node that is to publish to the gatherer, under the keys
"log_gatherer.furl" and "stats_gatherer.furl" respectively. When the Tahoe
node starts, it will connect to the configured gatherers and offer its
logport: the gatherer will then use the logport to subscribe to hear about
provided in ``misc/incident-gatherer/support_classifiers.py``. There is
roughly one category for each ``log.WEIRD``-or-higher level event in the
Tahoe source code.

The incident gatherer is created with the "``flogtool
create-incident-gatherer WORKDIR``" command, and started with "``tahoe
start``". The generated "``gatherer.tac``" file should be modified to add
classifier functions.
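Conceptually, a classifier is a function from the triggering event to a
category name (or None to let other classifiers try). The sketch below is
hypothetical: the function names, event keys, and dispatch loop are invented
for illustration and are not foolscap's actual ``gatherer.tac`` API:

.. code-block:: python

    # Hypothetical sketch of incident-classifier functions: each takes the
    # triggering log event (a dict) and returns a category string or None.
    # Function names and event keys here are invented for illustration.

    def classify_uncoordinated_write(trigger):
        if "UncoordinatedWriteError" in trigger.get("format", ""):
            return "mutable-retrieve-uncoordinated-write-error"
        return None

    def classify_all(trigger, classifiers):
        """Apply classifiers in order; unmatched incidents go to 'unknown'."""
        for classify in classifiers:
            category = classify(trigger)
            if category is not None:
                return category
        return "unknown"

    trigger = {"format": "error during retrieve: UncoordinatedWriteError"}
    print(classify_all(trigger, [classify_uncoordinated_write]))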

The incident gatherer writes incident names (which are simply the relative
pathname of the ``incident-\*.flog.bz2`` file) into ``classified/CATEGORY``.
For example, the ``classified/mutable-retrieve-uncoordinated-write-error``
file contains a list of all incidents which were triggered by an
uncoordinated write that was detected during mutable file retrieval (caused
when somebody changed the contents of the mutable file in between the node's
mapupdate step and the retrieve step). The ``classified/unknown`` file
contains a list of all incidents that did not match any of the classification
functions.

At startup, the incident gatherer will automatically reclassify any incident
report which is not mentioned in any of the ``classified/\*`` files. So the
usual workflow is to examine the incidents in ``classified/unknown``, add a
new classification function, delete ``classified/unknown``, then restart the
gatherer with "``tahoe restart WORKDIR``". The incidents which can be
classified with the new functions will be added to their own
``classified/FOO`` lists, and the remaining ones will be put in
``classified/unknown``, where the process can be repeated until all events
are classifiable.

The incident gatherer is still fairly immature: future versions will have a
web interface and an RSS feed, so operations personnel can track problems in
the storage grid.

In our experience, each incident takes about two seconds to transfer from the
node that generated it to the gatherer. The gatherer will automatically catch
up to any incidents which occurred while it was offline.

Log Gatherer
------------
contain events from many different sources, making it easier to correlate
things that happened on multiple machines (such as comparing a client node
making a request with the storage servers that respond to that request).

Create the Log Gatherer with the "``flogtool create-gatherer WORKDIR``"
command, and start it with "``tahoe start``". Then copy the contents of the
``log_gatherer.furl`` file it creates into the ``BASEDIR/tahoe.cfg`` file
(under the key ``log_gatherer.furl`` of the section ``[node]``) of all nodes
that should be sending it log events. (See `<configuration.rst>`_.)
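Because the furl is just a key in an INI-style section, the edit can be
scripted. A minimal sketch using Python's stdlib ``configparser`` (the furl
value and the file contents are placeholders, and Tahoe's own config parser
may treat the file slightly differently):

.. code-block:: python

    # Sketch: insert a log_gatherer.furl value into a tahoe.cfg-style file
    # using only the stdlib.  The furl string below is a placeholder.
    import configparser
    import io

    cfg_text = """\
    [node]
    nickname = demo
    """

    cfg = configparser.ConfigParser()
    cfg.read_string(cfg_text)
    cfg["node"]["log_gatherer.furl"] = "pb://EXAMPLE@tcp:host:port/swissnum"

    # Write the updated config back out (to a buffer here, for illustration).
    out = io.StringIO()
    cfg.write(out)
    print(out.getvalue())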

The "``flogtool filter``" command, described above, is useful to cut down the
potentially large flogfiles into a more focussed form.
Local twistd.log files
======================

In addition to the foolscap-based event logs, certain high-level events will
be recorded directly in human-readable text form, in the
``BASEDIR/logs/twistd.log`` file (and its rotated old versions:
``twistd.log.1``, ``twistd.log.2``, etc). This form does not contain as much
information as the flogfiles available through the means described
previously, but these files are immediately available to the curious
developer, and are retained until the twistd.log.NN files are explicitly
deleted.

Only events at the ``log.OPERATIONAL`` level or higher are bridged to
``twistd.log`` (i.e. not the ``log.NOISY`` debugging events). In addition,
but a few notes are worth stating here:

  clustered with its parent. For example, a download process that involves
  three sequential hash fetches could announce the send and receipt of those
  hash-fetch messages with a ``parent=`` argument that ties them to the
  overall download process. However, each new web-API download request should
  be unparented.

* use the ``format=`` argument in preference to the ``message=`` argument.
  E.g. use ``log.msg(format="got %(n)d shares, need %(k)d", n=n, k=k)``
  instead of ``log.msg("got %d shares, need %d" % (n,k))``. This will allow
  later tools to analyze the event without needing to scrape/reconstruct the
  structured data out of the formatted string.

* Pass extra information as extra keyword arguments, even if they aren't
  included in the ``format=`` string. This information will be displayed in
  the "``flogtool dump --verbose``" output, as well as being available to
  other tools. The ``umid=`` argument should be passed this way.

* use ``log.err`` for the catch-all ``addErrback`` that gets attached to the
  end of any given Deferred chain. When used in conjunction with
  ``LOGTOTWISTED=1``, ``log.err()`` will tell Twisted about the error-nature
  of the log message, causing Trial to flunk the test (with an "ERROR"
  indication that prints a copy of the Failure, including a traceback).
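The benefit of the ``format=`` convention can be seen with plain
%-interpolation, which is all it relies on (this is ordinary Python, not
foolscap's API):

.. code-block:: python

    # Why format= beats pre-rendered strings: the raw values stay
    # machine-readable alongside the format string.

    event = {"format": "got %(n)d shares, need %(k)d", "n": 8, "k": 3}

    # A tool can render the human-readable message on demand...
    rendered = event["format"] % event
    print(rendered)

    # ...while another tool reads the structured fields directly, with no
    # scraping of the formatted string required:
    assert event["n"] >= event["k"]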
by running the tests like this::

    make test FLOGFILE=flog.out.bz2 FLOGLEVEL=1 FLOGTOTWISTED=1

The first environment variable will cause foolscap log events to be written
to ``./flog.out.bz2`` (instead of merely being recorded in the circular
buffers for the use of remote subscribers or incident reports). The second
will cause all log events to be written out, not just the higher-severity
ones. The third will cause twisted log events (like the markers that indicate
when each unit test is starting and stopping) to be copied into the flogfile,
making it easier to correlate log events with unit tests.
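The flogfile itself is bz2-compressed. As a rough stand-in for what writing
and re-reading such a file involves (JSON lines are used purely for
illustration; foolscap's real on-disk serialization differs):

.. code-block:: python

    # Sketch: write a list of event dicts to a .bz2 file and read them back.
    # JSON-lines is an illustrative stand-in for the real flogfile format.
    import bz2
    import json
    import os
    import tempfile

    events = [{"num": i, "format": "event %(num)d"} for i in range(3)]

    path = os.path.join(tempfile.mkdtemp(), "flog.out.bz2")
    with bz2.open(path, "wt") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")

    with bz2.open(path, "rt") as f:
        recovered = [json.loads(line) for line in f]

    print(len(recovered))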

Enabling this form of logging appears to roughly double the runtime of the
unit tests. The ``flog.out.bz2`` file is approximately 2MB.