mirror of
https://github.com/tahoe-lafs/tahoe-lafs.git
synced 2024-12-19 04:57:54 +00:00
stats-gatherer: add --hostname/--location/--port
Updates docs, tests, explains how to update an old gatherer.
This commit is contained in:
parent
d1d988410b
commit
93bb3e995a
@ -279,11 +279,12 @@ boxes, as long as the stats-gatherer has a reachable IP address.)
|
||||
|
||||
The stats-gatherer is created in the same fashion as regular tahoe client
|
||||
nodes and introducer nodes. Choose a base directory for the gatherer to live
|
||||
in (but do not create the directory). Then run:
|
||||
in (but do not create the directory). Choose the hostname that should be
|
||||
advertised in the gatherer's FURL. Then run:
|
||||
|
||||
::
|
||||
|
||||
tahoe create-stats-gatherer $BASEDIR
|
||||
tahoe create-stats-gatherer --hostname=HOSTNAME $BASEDIR
|
||||
|
||||
and start it with "tahoe start $BASEDIR". Once running, the gatherer will
|
||||
write a FURL into $BASEDIR/stats_gatherer.furl .
|
||||
@ -295,19 +296,23 @@ under a key named "stats_gatherer.furl", like so:
|
||||
::
|
||||
|
||||
[client]
|
||||
stats_gatherer.furl = pb://qbo4ktl667zmtiuou6lwbjryli2brv6t@192.168.0.8:49997/wxycb4kaexzskubjnauxeoptympyf45y
|
||||
stats_gatherer.furl = pb://qbo4ktl667zmtiuou6lwbjryli2brv6t@HOSTNAME:PORTNUM/wxycb4kaexzskubjnauxeoptympyf45y
|
||||
|
||||
or simply copy the stats_gatherer.furl file into the node's base directory
|
||||
(next to the tahoe.cfg file): it will be interpreted in the same way.
|
||||
|
||||
The first time it is started, the gatherer will listen on a random unused TCP
|
||||
port, so it should not conflict with anything else that you have running on
|
||||
that host at that time. On subsequent runs, it will re-use the same port (to
|
||||
keep its FURL consistent). To explicitly control which port it uses, write
|
||||
the desired portnumber into a file named "portnum" (i.e. $BASEDIR/portnum),
|
||||
and the next time the gatherer is started, it will start listening on the
|
||||
given port. The portnum file is actually a "strports specification string",
|
||||
as described in :doc:`configuration`.
|
||||
When the gatherer is created, it will allocate a random unused TCP port, so
|
||||
it should not conflict with anything else that you have running on that host
|
||||
at that time. To explicitly control which port it uses, run the creation
|
||||
command with ``--location=`` and ``--port=`` instead of ``--hostname=``. If
|
||||
you use a hostname of ``example.org`` and a port number of ``1234``, then
|
||||
run::
|
||||
|
||||
tahoe create-stats-gatherer --location=tcp:example.org:1234 --port=tcp:1234
|
||||
|
||||
``--location=`` is a Foolscap FURL hints string (so it can be a
|
||||
comma-separated list of connection hints), and ``--port=`` is a Twisted
|
||||
"server endpoint specification string", as described in :doc:`configuration`.
|
||||
|
||||
Once running, the stats gatherer will create a standard JSON file in
|
||||
``$BASEDIR/stats.json``. Once a minute, the gatherer will pull stats
|
||||
@ -322,16 +327,17 @@ Other tools can be built to examine these stats and render them into
|
||||
something useful. For example, a tool could sum the
|
||||
"storage_server.disk_avail' values from all servers to compute a
|
||||
total-disk-available number for the entire grid (however, the "disk watcher"
|
||||
daemon, in misc/operations_helpers/spacetime/, is better suited for this specific task).
|
||||
daemon, in misc/operations_helpers/spacetime/, is better suited for this
|
||||
specific task).
|
||||
|
||||
Using Munin To Graph Stats Values
|
||||
=================================
|
||||
|
||||
The misc/munin/ directory contains various plugins to graph stats for Tahoe
|
||||
nodes. They are intended for use with the Munin_ system-management tool, which
|
||||
typically polls target systems every 5 minutes and produces a web page with
|
||||
graphs of various things over multiple time scales (last hour, last month,
|
||||
last year).
|
||||
The misc/operations_helpers/munin/ directory contains various plugins to
|
||||
graph stats for Tahoe nodes. They are intended for use with the Munin_
|
||||
system-management tool, which typically polls target systems every 5 minutes
|
||||
and produces a web page with graphs of various things over multiple time
|
||||
scales (last hour, last month, last year).
|
||||
|
||||
Most of the plugins are designed to pull stats from a single Tahoe node, and
|
||||
are configured with the e.g. http://localhost:3456/statistics?t=json URL. The
|
||||
|
@ -1,14 +1,59 @@
|
||||
|
||||
import os, sys
|
||||
|
||||
from twisted.python import usage
|
||||
from allmydata.scripts.common import NoDefaultBasedirOptions
|
||||
from allmydata.scripts.create_node import write_tac
|
||||
from allmydata.util.assertutil import precondition
|
||||
from allmydata.util.encodingutil import listdir_unicode, quote_output
|
||||
from allmydata.util import fileutil, iputil
|
||||
|
||||
|
||||
class CreateStatsGathererOptions(NoDefaultBasedirOptions):
|
||||
subcommand_name = "create-stats-gatherer"
|
||||
optParameters = [
|
||||
("hostname", None, None, "Hostname of this machine, used to build location"),
|
||||
("location", None, None, "FURL connection hints, e.g. 'tcp:HOSTNAME:PORT'"),
|
||||
("port", None, None, "listening endpoint, e.g. 'tcp:PORT'"),
|
||||
]
|
||||
def postOptions(self):
|
||||
if self["hostname"] and (not self["location"]) and (not self["port"]):
|
||||
pass
|
||||
elif (not self["hostname"]) and self["location"] and self["port"]:
|
||||
pass
|
||||
else:
|
||||
raise usage.UsageError("You must provide --hostname, or --location and --port.")
|
||||
|
||||
description = """
|
||||
Create a "stats-gatherer" service, which is a standalone process that
|
||||
collects and stores runtime statistics from many server nodes. This is a
|
||||
tool for operations personnel to keep track of free disk space, server
|
||||
load, and protocol activity, across a fleet of Tahoe storage servers.
|
||||
|
||||
The "stats-gatherer" listens on a TCP port and publishes a Foolscap FURL
|
||||
by writing it into a file named "stats_gatherer.furl". You must copy this
|
||||
FURL into the servers' tahoe.cfg, as the [client] stats_gatherer.furl=
|
||||
entry. Those servers will then establish a connection to the
|
||||
stats-gatherer and publish their statistics on a periodic basis. The
|
||||
gatherer writes a summary JSON file out to disk after each update.
|
||||
|
||||
The stats-gatherer listens on a configurable port, and writes a
|
||||
configurable hostname+port pair into the FURL that it publishes. There
|
||||
are two configuration modes you can use.
|
||||
|
||||
* In the first, you provide --hostname=, and the service chooses its own
|
||||
TCP port number. If the host is named "example.org" and you provide
|
||||
--hostname=example.org, the node will pick a port number (e.g. 12345)
|
||||
and use location="tcp:example.org:12345" and port="tcp:12345".
|
||||
|
||||
* In the second, you provide both --location= and --port=, and the
|
||||
service will refrain from doing any allocation of its own. --location=
|
||||
must be a Foolscap "FURL connection hint sequence", which is a
|
||||
comma-separated list of "tcp:HOSTNAME:PORTNUM" strings. --port= must be
|
||||
a Twisted server endpoint specification, which is generally
|
||||
"tcp:PORTNUM". So, if your host is named "example.org" and you want to
|
||||
use port 6789, you should provide --location=tcp:example.org:6789 and
|
||||
--port=tcp:6789. You are responsible for making sure --location= and
|
||||
--port= match each other.
|
||||
"""
|
||||
|
||||
|
||||
def create_stats_gatherer(config, out=sys.stdout, err=sys.stderr):
|
||||
@ -26,6 +71,15 @@ def create_stats_gatherer(config, out=sys.stdout, err=sys.stderr):
|
||||
else:
|
||||
os.mkdir(basedir)
|
||||
write_tac(basedir, "stats-gatherer")
|
||||
if config["hostname"]:
|
||||
portnum = iputil.allocate_tcp_port()
|
||||
location = "tcp:%s:%d" % (config["hostname"], portnum)
|
||||
port = "tcp:%d" % portnum
|
||||
else:
|
||||
location = config["location"]
|
||||
port = config["port"]
|
||||
fileutil.write(os.path.join(basedir, "location"), location+"\n")
|
||||
fileutil.write(os.path.join(basedir, "port"), port+"\n")
|
||||
return 0
|
||||
|
||||
subCommands = [
|
||||
|
@ -11,7 +11,7 @@ from twisted.application.internet import TimerService
|
||||
from zope.interface import implements
|
||||
from foolscap.api import eventually, DeadReferenceError, Referenceable, Tub
|
||||
|
||||
from allmydata.util import log, fileutil
|
||||
from allmydata.util import log
|
||||
from allmydata.util.encodingutil import quote_local_unicode_path
|
||||
from allmydata.interfaces import RIStatsProvider, RIStatsGatherer, IStatsProducer
|
||||
|
||||
@ -294,24 +294,19 @@ class StatsGathererService(service.MultiService):
|
||||
self.stats_gatherer = JSONStatsGatherer(self.basedir, verbose)
|
||||
self.stats_gatherer.setServiceParent(self)
|
||||
|
||||
portnumfile = os.path.join(self.basedir, "portnum")
|
||||
try:
|
||||
portnum = open(portnumfile, "r").read()
|
||||
with open(os.path.join(self.basedir, "location")) as f:
|
||||
location = f.read().strip()
|
||||
except EnvironmentError:
|
||||
portnum = None
|
||||
self.listener = self.tub.listenOn(portnum or "tcp:0")
|
||||
d = self.tub.setLocationAutomatically()
|
||||
if portnum is None:
|
||||
d.addCallback(self.save_portnum)
|
||||
d.addCallback(self.tub_ready)
|
||||
d.addErrback(log.err)
|
||||
raise ValueError("Unable to find 'location' in BASEDIR, please rebuild your stats-gatherer")
|
||||
try:
|
||||
with open(os.path.join(self.basedir, "port")) as f:
|
||||
port = f.read().strip()
|
||||
except EnvironmentError:
|
||||
raise ValueError("Unable to find 'port' in BASEDIR, please rebuild your stats-gatherer")
|
||||
|
||||
def save_portnum(self, junk):
|
||||
portnum = self.listener.getPortnum()
|
||||
portnumfile = os.path.join(self.basedir, 'portnum')
|
||||
fileutil.write(portnumfile, '%d\n' % (portnum,))
|
||||
|
||||
def tub_ready(self, ignored):
|
||||
self.tub.listenOn(port)
|
||||
self.tub.setLocation(location)
|
||||
ff = os.path.join(self.basedir, self.furl_file)
|
||||
self.gatherer_furl = self.tub.registerReference(self.stats_gatherer,
|
||||
furlFile=ff)
|
||||
|
@ -494,6 +494,11 @@ class SystemTestMixin(pollmixin.PollMixin, testutil.StallMixin):
|
||||
def _set_up_stats_gatherer(self, res):
|
||||
statsdir = self.getdir("stats_gatherer")
|
||||
fileutil.make_dirs(statsdir)
|
||||
portnum = iputil.allocate_tcp_port()
|
||||
location = "tcp:127.0.0.1:%d" % portnum
|
||||
fileutil.write(os.path.join(statsdir, "location"), location)
|
||||
port = "tcp:%d:interface=127.0.0.1" % portnum
|
||||
fileutil.write(os.path.join(statsdir, "port"), port)
|
||||
self.stats_gatherer_svc = StatsGathererService(statsdir)
|
||||
self.stats_gatherer = self.stats_gatherer_svc.stats_gatherer
|
||||
self.add_service(self.stats_gatherer_svc)
|
||||
|
@ -184,14 +184,14 @@ class CreateNode(unittest.TestCase):
|
||||
rc = runner.runner(argv, stdout=out, stderr=err)
|
||||
return rc, out.getvalue(), err.getvalue()
|
||||
|
||||
def do_create(self, kind):
|
||||
def do_create(self, kind, *args):
|
||||
basedir = self.workdir("test_" + kind)
|
||||
command = "create-" + kind
|
||||
is_client = kind in ("node", "client")
|
||||
tac = is_client and "tahoe-client.tac" or ("tahoe-" + kind + ".tac")
|
||||
|
||||
n1 = os.path.join(basedir, command + "-n1")
|
||||
argv = ["--quiet", command, "--basedir", n1]
|
||||
argv = ["--quiet", command, "--basedir", n1] + list(args)
|
||||
rc, out, err = self.run_tahoe(argv)
|
||||
self.failUnlessEqual(err, "")
|
||||
self.failUnlessEqual(out, "")
|
||||
@ -226,7 +226,7 @@ class CreateNode(unittest.TestCase):
|
||||
|
||||
# test that the non --basedir form works too
|
||||
n2 = os.path.join(basedir, command + "-n2")
|
||||
argv = ["--quiet", command, n2]
|
||||
argv = ["--quiet", command] + list(args) + [n2]
|
||||
rc, out, err = self.run_tahoe(argv)
|
||||
self.failUnlessEqual(err, "")
|
||||
self.failUnlessEqual(out, "")
|
||||
@ -236,7 +236,7 @@ class CreateNode(unittest.TestCase):
|
||||
|
||||
# test the --node-directory form
|
||||
n3 = os.path.join(basedir, command + "-n3")
|
||||
argv = ["--quiet", "--node-directory", n3, command]
|
||||
argv = ["--quiet", "--node-directory", n3, command] + list(args)
|
||||
rc, out, err = self.run_tahoe(argv)
|
||||
self.failUnlessEqual(err, "")
|
||||
self.failUnlessEqual(out, "")
|
||||
@ -247,7 +247,7 @@ class CreateNode(unittest.TestCase):
|
||||
if kind in ("client", "node", "introducer"):
|
||||
# test that the output (without --quiet) includes the base directory
|
||||
n4 = os.path.join(basedir, command + "-n4")
|
||||
argv = [command, n4]
|
||||
argv = [command] + list(args) + [n4]
|
||||
rc, out, err = self.run_tahoe(argv)
|
||||
self.failUnlessEqual(err, "")
|
||||
self.failUnlessIn(" created in ", out)
|
||||
@ -282,7 +282,7 @@ class CreateNode(unittest.TestCase):
|
||||
self.do_create("introducer")
|
||||
|
||||
def test_stats_gatherer(self):
|
||||
self.do_create("stats-gatherer")
|
||||
self.do_create("stats-gatherer", "--hostname=127.0.0.1")
|
||||
|
||||
def test_subcommands(self):
|
||||
# no arguments should trigger a command listing, via UsageError
|
||||
@ -291,6 +291,44 @@ class CreateNode(unittest.TestCase):
|
||||
[],
|
||||
run_by_human=False)
|
||||
|
||||
def test_stats_gatherer_good_args(self):
|
||||
rc = runner.runner(["create-stats-gatherer", "--hostname=foo",
|
||||
self.mktemp()])
|
||||
self.assertEqual(rc, 0)
|
||||
rc = runner.runner(["create-stats-gatherer", "--location=tcp:foo:1234",
|
||||
"--port=tcp:1234", self.mktemp()])
|
||||
self.assertEqual(rc, 0)
|
||||
|
||||
def test_stats_gatherer_bad_args(self):
|
||||
# missing hostname/location/port
|
||||
argv = "create-stats-gatherer D"
|
||||
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
|
||||
run_by_human=False)
|
||||
|
||||
# missing port
|
||||
argv = "create-stats-gatherer --location=foo D"
|
||||
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
|
||||
run_by_human=False)
|
||||
|
||||
# missing location
|
||||
argv = "create-stats-gatherer --port=foo D"
|
||||
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
|
||||
run_by_human=False)
|
||||
|
||||
# can't provide both
|
||||
argv = "create-stats-gatherer --hostname=foo --port=foo D"
|
||||
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
|
||||
run_by_human=False)
|
||||
|
||||
# can't provide both
|
||||
argv = "create-stats-gatherer --hostname=foo --location=foo D"
|
||||
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
|
||||
run_by_human=False)
|
||||
|
||||
# can't provide all three
|
||||
argv = "create-stats-gatherer --hostname=foo --location=foo --port=foo D"
|
||||
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
|
||||
run_by_human=False)
|
||||
|
||||
class RunNode(common_util.SignalMixin, unittest.TestCase, pollmixin.PollMixin,
|
||||
RunBinTahoeMixin):
|
||||
|
18
topfiles/2773.docs
Normal file
18
topfiles/2773.docs
Normal file
@ -0,0 +1,18 @@
|
||||
The "stats-gatherer", an operation-helper service used to collect runtime
|
||||
statistics from a fleet of Tahoe storage servers, must now be assigned a
|
||||
hostname, or location+port pair, at creation time. The "tahoe
|
||||
create-stats-gatherer" command now requires either "--hostname=", or both
|
||||
"--location=" and "--port".
|
||||
|
||||
Previously, "tahoe create-stats-gatherer NODEDIR" would attempt to guess its
|
||||
location by running something like /sbin/ifconfig to collect local IP
|
||||
addresses. While this works if the host has a public IP address (or at least
|
||||
lives in the same LAN as the storage servers it monitors), most sysadmins
|
||||
would prefer the FURL be created with a real hostname.
|
||||
|
||||
To keep your old stats-gatherers working, with their original FURL, you must
|
||||
determine a suitable --location and --port, and write their values into
|
||||
NODEDIR/location and NODEDIR/port, respectively. Or you could simply rebuild
|
||||
it by re-running "tahoe create-stats-gatherer" with the new arguments.
|
||||
|
||||
See docs/stats.rst for details.
|
Loading…
Reference in New Issue
Block a user