stats-gatherer: add --hostname/--location/--port

Updates docs, tests, explains how to update an old gatherer.
Brian Warner 2016-05-04 15:04:12 -07:00
parent d1d988410b
commit 93bb3e995a
6 changed files with 157 additions and 41 deletions


@ -279,11 +279,12 @@ boxes, as long as the stats-gatherer has a reachable IP address.)
The stats-gatherer is created in the same fashion as regular tahoe client The stats-gatherer is created in the same fashion as regular tahoe client
nodes and introducer nodes. Choose a base directory for the gatherer to live nodes and introducer nodes. Choose a base directory for the gatherer to live
in (but do not create the directory). Then run: in (but do not create the directory). Choose the hostname that should be
advertised in the gatherer's FURL. Then run:
:: ::
tahoe create-stats-gatherer $BASEDIR tahoe create-stats-gatherer --hostname=HOSTNAME $BASEDIR
and start it with "tahoe start $BASEDIR". Once running, the gatherer will and start it with "tahoe start $BASEDIR". Once running, the gatherer will
write a FURL into $BASEDIR/stats_gatherer.furl . write a FURL into $BASEDIR/stats_gatherer.furl .
@ -295,19 +296,23 @@ under a key named "stats_gatherer.furl", like so:
:: ::
[client] [client]
stats_gatherer.furl = pb://qbo4ktl667zmtiuou6lwbjryli2brv6t@192.168.0.8:49997/wxycb4kaexzskubjnauxeoptympyf45y stats_gatherer.furl = pb://qbo4ktl667zmtiuou6lwbjryli2brv6t@HOSTNAME:PORTNUM/wxycb4kaexzskubjnauxeoptympyf45y
or simply copy the stats_gatherer.furl file into the node's base directory or simply copy the stats_gatherer.furl file into the node's base directory
(next to the tahoe.cfg file): it will be interpreted in the same way. (next to the tahoe.cfg file): it will be interpreted in the same way.
The first time it is started, the gatherer will listen on a random unused TCP When the gatherer is created, it will allocate a random unused TCP port, so
port, so it should not conflict with anything else that you have running on it should not conflict with anything else that you have running on that host
that host at that time. On subsequent runs, it will re-use the same port (to at that time. To explicitly control which port it uses, run the creation
keep its FURL consistent). To explicitly control which port it uses, write command with ``--location=`` and ``--port=`` instead of ``--hostname=``. If
the desired portnumber into a file named "portnum" (i.e. $BASEDIR/portnum), you use a hostname of ``example.org`` and a port number of ``1234``, then
and the next time the gatherer is started, it will start listening on the run::
given port. The portnum file is actually a "strports specification string",
as described in :doc:`configuration`. tahoe create-stats-gatherer --location=tcp:example.org:1234 --port=tcp:1234
``--location=`` is a Foolscap FURL hints string (so it can be a
comma-separated list of connection hints), and ``--port=`` is a Twisted
"server endpoint specification string", as described in :doc:`configuration`.
Once running, the stats gatherer will create a standard JSON file in Once running, the stats gatherer will create a standard JSON file in
``$BASEDIR/stats.json``. Once a minute, the gatherer will pull stats ``$BASEDIR/stats.json``. Once a minute, the gatherer will pull stats
@ -322,16 +327,17 @@ Other tools can be built to examine these stats and render them into
something useful. For example, a tool could sum the something useful. For example, a tool could sum the
"storage_server.disk_avail' values from all servers to compute a "storage_server.disk_avail' values from all servers to compute a
total-disk-available number for the entire grid (however, the "disk watcher" total-disk-available number for the entire grid (however, the "disk watcher"
daemon, in misc/operations_helpers/spacetime/, is better suited for this specific task). daemon, in misc/operations_helpers/spacetime/, is better suited for this
specific task).
Using Munin To Graph Stats Values Using Munin To Graph Stats Values
================================= =================================
The misc/munin/ directory contains various plugins to graph stats for Tahoe The misc/operations_helpers/munin/ directory contains various plugins to
nodes. They are intended for use with the Munin_ system-management tool, which graph stats for Tahoe nodes. They are intended for use with the Munin_
typically polls target systems every 5 minutes and produces a web page with system-management tool, which typically polls target systems every 5 minutes
graphs of various things over multiple time scales (last hour, last month, and produces a web page with graphs of various things over multiple time
last year). scales (last hour, last month, last year).
Most of the plugins are designed to pull stats from a single Tahoe node, and Most of the plugins are designed to pull stats from a single Tahoe node, and
are configured with the e.g. http://localhost:3456/statistics?t=json URL. The are configured with the e.g. http://localhost:3456/statistics?t=json URL. The


@ -1,14 +1,59 @@
import os, sys import os, sys
from twisted.python import usage
from allmydata.scripts.common import NoDefaultBasedirOptions from allmydata.scripts.common import NoDefaultBasedirOptions
from allmydata.scripts.create_node import write_tac from allmydata.scripts.create_node import write_tac
from allmydata.util.assertutil import precondition from allmydata.util.assertutil import precondition
from allmydata.util.encodingutil import listdir_unicode, quote_output from allmydata.util.encodingutil import listdir_unicode, quote_output
from allmydata.util import fileutil, iputil
class CreateStatsGathererOptions(NoDefaultBasedirOptions): class CreateStatsGathererOptions(NoDefaultBasedirOptions):
subcommand_name = "create-stats-gatherer" subcommand_name = "create-stats-gatherer"
optParameters = [
("hostname", None, None, "Hostname of this machine, used to build location"),
("location", None, None, "FURL connection hints, e.g. 'tcp:HOSTNAME:PORT'"),
("port", None, None, "listening endpoint, e.g. 'tcp:PORT'"),
]
def postOptions(self):
if self["hostname"] and (not self["location"]) and (not self["port"]):
pass
elif (not self["hostname"]) and self["location"] and self["port"]:
pass
else:
raise usage.UsageError("You must provide --hostname, or --location and --port.")
description = """
Create a "stats-gatherer" service, which is a standalone process that
collects and stores runtime statistics from many server nodes. This is a
tool for operations personnel to keep track of free disk space, server
load, and protocol activity, across a fleet of Tahoe storage servers.
The "stats-gatherer" listens on a TCP port and publishes a Foolscap FURL
by writing it into a file named "stats_gatherer.furl". You must copy this
FURL into the servers' tahoe.cfg, as the [client] stats_gatherer.furl=
entry. Those servers will then establish a connection to the
stats-gatherer and publish their statistics on a periodic basis. The
gatherer writes a summary JSON file out to disk after each update.
The stats-gatherer listens on a configurable port, and writes a
configurable hostname+port pair into the FURL that it publishes. There
are two configuration modes you can use.
* In the first, you provide --hostname=, and the service chooses its own
TCP port number. If the host is named "example.org" and you provide
--hostname=example.org, the node will pick a port number (e.g. 12345)
and use location="tcp:example.org:12345" and port="tcp:12345".
* In the second, you provide both --location= and --port=, and the
service will refrain from doing any allocation of its own. --location=
must be a Foolscap "FURL connection hint sequence", which is a
comma-separated list of "tcp:HOSTNAME:PORTNUM" strings. --port= must be
a Twisted server endpoint specification, which is generally
"tcp:PORTNUM". So, if your host is named "example.org" and you want to
use port 6789, you should provide --location=tcp:example.org:6789 and
--port=tcp:6789. You are responsible for making sure --location= and
--port= match each other.
"""
def create_stats_gatherer(config, out=sys.stdout, err=sys.stderr): def create_stats_gatherer(config, out=sys.stdout, err=sys.stderr):
@ -26,6 +71,15 @@ def create_stats_gatherer(config, out=sys.stdout, err=sys.stderr):
else: else:
os.mkdir(basedir) os.mkdir(basedir)
write_tac(basedir, "stats-gatherer") write_tac(basedir, "stats-gatherer")
if config["hostname"]:
portnum = iputil.allocate_tcp_port()
location = "tcp:%s:%d" % (config["hostname"], portnum)
port = "tcp:%d" % portnum
else:
location = config["location"]
port = config["port"]
fileutil.write(os.path.join(basedir, "location"), location+"\n")
fileutil.write(os.path.join(basedir, "port"), port+"\n")
return 0 return 0
subCommands = [ subCommands = [
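
The two configuration modes handled by `postOptions` and `create_stats_gatherer` above can be sketched as a single standalone function. This is only an illustration, not code from the commit: `resolve_location_and_port` and the local `allocate_tcp_port` are hypothetical names, and the socket-based allocator is an assumed stand-in for `iputil.allocate_tcp_port()`, whose implementation is not shown in this diff.

```python
import socket


def allocate_tcp_port():
    """Ask the OS for a currently unused TCP port.

    Hypothetical stand-in for iputil.allocate_tcp_port(), which this
    diff calls but does not show.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        return s.getsockname()[1]
    finally:
        s.close()


def resolve_location_and_port(hostname=None, location=None, port=None,
                              allocate=allocate_tcp_port):
    """Return the (location, port) strings that would be written to BASEDIR.

    Mode 1: only hostname is given, so a port is allocated.
    Mode 2: location and port are both given and used verbatim.
    Any other combination is rejected.
    """
    if hostname and not location and not port:
        portnum = allocate()
        return "tcp:%s:%d" % (hostname, portnum), "tcp:%d" % portnum
    if not hostname and location and port:
        return location, port
    raise ValueError("You must provide --hostname, or --location and --port.")
```

Passing the allocator as a parameter is a choice made for this sketch so that the hostname mode can be exercised with a fixed port number; the commit itself calls the allocator directly.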


@ -11,7 +11,7 @@ from twisted.application.internet import TimerService
from zope.interface import implements from zope.interface import implements
from foolscap.api import eventually, DeadReferenceError, Referenceable, Tub from foolscap.api import eventually, DeadReferenceError, Referenceable, Tub
from allmydata.util import log, fileutil from allmydata.util import log
from allmydata.util.encodingutil import quote_local_unicode_path from allmydata.util.encodingutil import quote_local_unicode_path
from allmydata.interfaces import RIStatsProvider, RIStatsGatherer, IStatsProducer from allmydata.interfaces import RIStatsProvider, RIStatsGatherer, IStatsProducer
@ -294,24 +294,19 @@ class StatsGathererService(service.MultiService):
self.stats_gatherer = JSONStatsGatherer(self.basedir, verbose) self.stats_gatherer = JSONStatsGatherer(self.basedir, verbose)
self.stats_gatherer.setServiceParent(self) self.stats_gatherer.setServiceParent(self)
portnumfile = os.path.join(self.basedir, "portnum")
try: try:
portnum = open(portnumfile, "r").read() with open(os.path.join(self.basedir, "location")) as f:
location = f.read().strip()
except EnvironmentError: except EnvironmentError:
portnum = None raise ValueError("Unable to find 'location' in BASEDIR, please rebuild your stats-gatherer")
self.listener = self.tub.listenOn(portnum or "tcp:0") try:
d = self.tub.setLocationAutomatically() with open(os.path.join(self.basedir, "port")) as f:
if portnum is None: port = f.read().strip()
d.addCallback(self.save_portnum) except EnvironmentError:
d.addCallback(self.tub_ready) raise ValueError("Unable to find 'port' in BASEDIR, please rebuild your stats-gatherer")
d.addErrback(log.err)
def save_portnum(self, junk): self.tub.listenOn(port)
portnum = self.listener.getPortnum() self.tub.setLocation(location)
portnumfile = os.path.join(self.basedir, 'portnum')
fileutil.write(portnumfile, '%d\n' % (portnum,))
def tub_ready(self, ignored):
ff = os.path.join(self.basedir, self.furl_file) ff = os.path.join(self.basedir, self.furl_file)
self.gatherer_furl = self.tub.registerReference(self.stats_gatherer, self.gatherer_furl = self.tub.registerReference(self.stats_gatherer,
furlFile=ff) furlFile=ff)
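
The new startup path reads `location` and `port` the same way, so the pattern can be sketched as one small helper. This is a hedged illustration only: `read_required_setting` is a hypothetical name that does not appear in the commit, which inlines the two `try` blocks instead.

```python
import os


def read_required_setting(basedir, name):
    """Read BASEDIR/<name> and return its contents without whitespace.

    A missing or unreadable file is reported as a ValueError, mirroring
    the error raised by the new StatsGathererService code.
    """
    try:
        with open(os.path.join(basedir, name)) as f:
            return f.read().strip()
    except EnvironmentError:
        raise ValueError("Unable to find %r in BASEDIR, please rebuild "
                         "your stats-gatherer" % name)
```

With such a helper, startup would reduce to reading `"port"` and `"location"` and passing them to `listenOn()` and `setLocation()` respectively, as the diff above does.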


@ -494,6 +494,11 @@ class SystemTestMixin(pollmixin.PollMixin, testutil.StallMixin):
def _set_up_stats_gatherer(self, res): def _set_up_stats_gatherer(self, res):
statsdir = self.getdir("stats_gatherer") statsdir = self.getdir("stats_gatherer")
fileutil.make_dirs(statsdir) fileutil.make_dirs(statsdir)
portnum = iputil.allocate_tcp_port()
location = "tcp:127.0.0.1:%d" % portnum
fileutil.write(os.path.join(statsdir, "location"), location)
port = "tcp:%d:interface=127.0.0.1" % portnum
fileutil.write(os.path.join(statsdir, "port"), port)
self.stats_gatherer_svc = StatsGathererService(statsdir) self.stats_gatherer_svc = StatsGathererService(statsdir)
self.stats_gatherer = self.stats_gatherer_svc.stats_gatherer self.stats_gatherer = self.stats_gatherer_svc.stats_gatherer
self.add_service(self.stats_gatherer_svc) self.add_service(self.stats_gatherer_svc)


@ -184,14 +184,14 @@ class CreateNode(unittest.TestCase):
rc = runner.runner(argv, stdout=out, stderr=err) rc = runner.runner(argv, stdout=out, stderr=err)
return rc, out.getvalue(), err.getvalue() return rc, out.getvalue(), err.getvalue()
def do_create(self, kind): def do_create(self, kind, *args):
basedir = self.workdir("test_" + kind) basedir = self.workdir("test_" + kind)
command = "create-" + kind command = "create-" + kind
is_client = kind in ("node", "client") is_client = kind in ("node", "client")
tac = is_client and "tahoe-client.tac" or ("tahoe-" + kind + ".tac") tac = is_client and "tahoe-client.tac" or ("tahoe-" + kind + ".tac")
n1 = os.path.join(basedir, command + "-n1") n1 = os.path.join(basedir, command + "-n1")
argv = ["--quiet", command, "--basedir", n1] argv = ["--quiet", command, "--basedir", n1] + list(args)
rc, out, err = self.run_tahoe(argv) rc, out, err = self.run_tahoe(argv)
self.failUnlessEqual(err, "") self.failUnlessEqual(err, "")
self.failUnlessEqual(out, "") self.failUnlessEqual(out, "")
@ -226,7 +226,7 @@ class CreateNode(unittest.TestCase):
# test that the non --basedir form works too # test that the non --basedir form works too
n2 = os.path.join(basedir, command + "-n2") n2 = os.path.join(basedir, command + "-n2")
argv = ["--quiet", command, n2] argv = ["--quiet", command] + list(args) + [n2]
rc, out, err = self.run_tahoe(argv) rc, out, err = self.run_tahoe(argv)
self.failUnlessEqual(err, "") self.failUnlessEqual(err, "")
self.failUnlessEqual(out, "") self.failUnlessEqual(out, "")
@ -236,7 +236,7 @@ class CreateNode(unittest.TestCase):
# test the --node-directory form # test the --node-directory form
n3 = os.path.join(basedir, command + "-n3") n3 = os.path.join(basedir, command + "-n3")
argv = ["--quiet", "--node-directory", n3, command] argv = ["--quiet", "--node-directory", n3, command] + list(args)
rc, out, err = self.run_tahoe(argv) rc, out, err = self.run_tahoe(argv)
self.failUnlessEqual(err, "") self.failUnlessEqual(err, "")
self.failUnlessEqual(out, "") self.failUnlessEqual(out, "")
@ -247,7 +247,7 @@ class CreateNode(unittest.TestCase):
if kind in ("client", "node", "introducer"): if kind in ("client", "node", "introducer"):
# test that the output (without --quiet) includes the base directory # test that the output (without --quiet) includes the base directory
n4 = os.path.join(basedir, command + "-n4") n4 = os.path.join(basedir, command + "-n4")
argv = [command, n4] argv = [command] + list(args) + [n4]
rc, out, err = self.run_tahoe(argv) rc, out, err = self.run_tahoe(argv)
self.failUnlessEqual(err, "") self.failUnlessEqual(err, "")
self.failUnlessIn(" created in ", out) self.failUnlessIn(" created in ", out)
@ -282,7 +282,7 @@ class CreateNode(unittest.TestCase):
self.do_create("introducer") self.do_create("introducer")
def test_stats_gatherer(self): def test_stats_gatherer(self):
self.do_create("stats-gatherer") self.do_create("stats-gatherer", "--hostname=127.0.0.1")
def test_subcommands(self): def test_subcommands(self):
# no arguments should trigger a command listing, via UsageError # no arguments should trigger a command listing, via UsageError
@ -291,6 +291,44 @@ class CreateNode(unittest.TestCase):
[], [],
run_by_human=False) run_by_human=False)
def test_stats_gatherer_good_args(self):
rc = runner.runner(["create-stats-gatherer", "--hostname=foo",
self.mktemp()])
self.assertEqual(rc, 0)
rc = runner.runner(["create-stats-gatherer", "--location=tcp:foo:1234",
"--port=tcp:1234", self.mktemp()])
self.assertEqual(rc, 0)
def test_stats_gatherer_bad_args(self):
# missing hostname/location/port
argv = "create-stats-gatherer D"
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
run_by_human=False)
# missing port
argv = "create-stats-gatherer --location=foo D"
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
run_by_human=False)
# missing location
argv = "create-stats-gatherer --port=foo D"
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
run_by_human=False)
# can't provide both
argv = "create-stats-gatherer --hostname=foo --port=foo D"
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
run_by_human=False)
# can't provide both
argv = "create-stats-gatherer --hostname=foo --location=foo D"
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
run_by_human=False)
# can't provide all three
argv = "create-stats-gatherer --hostname=foo --location=foo --port=foo D"
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
run_by_human=False)
class RunNode(common_util.SignalMixin, unittest.TestCase, pollmixin.PollMixin, class RunNode(common_util.SignalMixin, unittest.TestCase, pollmixin.PollMixin,
RunBinTahoeMixin): RunBinTahoeMixin):

topfiles/2773.docs Normal file

@ -0,0 +1,18 @@
The "stats-gatherer", an operation-helper service used to collect runtime
statistics from a fleet of Tahoe storage servers, must now be assigned a
hostname, or location+port pair, at creation time. The "tahoe
create-stats-gatherer" command now requires either "--hostname=", or both
"--location=" and "--port=".
Previously, "tahoe create-stats-gatherer NODEDIR" would attempt to guess its
location by running something like /sbin/ifconfig to collect local IP
addresses. While this works if the host has a public IP address (or at least
lives in the same LAN as the storage servers it monitors), most sysadmins
would prefer the FURL be created with a real hostname.
To keep your old stats-gatherers working, with their original FURL, you must
determine a suitable --location and --port, and write their values into
NODEDIR/location and NODEDIR/port, respectively. Or you could simply rebuild
it by re-running "tahoe create-stats-gatherer" with the new arguments.
See docs/stats.rst for details.
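
The manual upgrade described above could be scripted roughly as follows. This is a sketch under stated assumptions, not a tool shipped with this commit: `migrate_old_gatherer` is a hypothetical name, the old `NODEDIR/portnum` file is assumed to hold either a bare port number or a simple `tcp:PORT` string, and the hostname to advertise must be chosen by the operator.

```python
import os


def migrate_old_gatherer(nodedir, hostname):
    """Derive NODEDIR/location and NODEDIR/port from an old NODEDIR/portnum.

    Assumes the old file contains a bare number ("49997") or a simple
    "tcp:49997" string; anything else is rejected rather than guessed at.
    """
    with open(os.path.join(nodedir, "portnum")) as f:
        spec = f.read().strip()
    if spec.isdigit():
        portnum = int(spec)
    elif spec.startswith("tcp:") and spec[4:].isdigit():
        portnum = int(spec[4:])
    else:
        raise ValueError("unrecognized portnum contents: %r" % spec)

    location = "tcp:%s:%d" % (hostname, portnum)
    port = "tcp:%d" % portnum
    for name, value in (("location", location), ("port", port)):
        with open(os.path.join(nodedir, name), "w") as f:
            f.write(value + "\n")
    return location, port
```

For example, `migrate_old_gatherer("/path/to/gatherer", "example.org")` on a node whose `portnum` file contains `49997` would write `tcp:example.org:49997` and `tcp:49997`. Storage servers that still hold the old FURL connect using the hints embedded in that FURL, so the listening port is the value that matters most to preserve.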