stats: add a simple stats gathering system
We want to collect runtime statistics from multiple nodes, primarily for
server-monitoring purposes. This patch implements a simple version of such
a system, as a skeleton to build more sophistication upon.
Each client now looks for a 'stats_gatherer.furl' config file. If it has
been configured to use a stats gatherer, it internally instantiates a
StatsProvider. This is the central place to which code that wishes to offer
stats for monitoring reports them, either by calling
stats_provider.count('stat.name', value) to increment a counter, or by
registering an object as a stats producer with sp.register_producer(obj).
Upon startup, the StatsProvider connects to the StatsGatherer server and
registers itself. The StatsGatherer is then responsible for periodically
polling the attached providers to retrieve their data. The provider queries
each registered producer whenever the gatherer queries the provider; both
the internal 'counters' and the queried 'stats' are then reported to the
gatherer.
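For illustration, a component might feed the provider roughly like this;
the get_stats() callback name is an assumption for this sketch, since only
count() and register_producer() are specified above:

    stats_provider.count('uploader.bytes_uploaded', 1234)  # bump a counter

    class UploaderStats:          # hypothetical producer object
        def get_stats(self):      # assumed callback, polled via the provider
            return {'uploader.files_uploaded': 10}

    stats_provider.register_producer(UploaderStats())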
The patch provides a simple gatherer app (cf. 'make stats-gatherer-run')
which prints its furl and listens for incoming connections. Once a minute,
the gatherer polls all connected providers and writes the retrieved data
into a pickle file.
Also included is a munin plugin which knows how to read the gatherer's
stats.pickle and output data munin can interpret. This plugin,
tahoe-stats.py, can be symlinked under multiple different names within
munin's 'plugins' directory; it inspects argv to determine which data to
display, doing a lookup in a table within the file. It looks in the
environment for 'statsfile' to determine the path to the gatherer's
stats.pickle. An example plugins-conf.d file is provided.
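The exact layout of stats.pickle is not documented in this message, but
judging from how the plugin below indexes it, it appears to be a dict keyed
by node tubid; a minimal sketch (field names taken from the plugin code,
values invented):

    stats = {
        'abcd1234': {                      # tubid of the reporting node
            'timestamp': 1201740000.0,     # when this node last reported
            'nickname': 'node-one',
            'stats': {
                'counters': {'storage_server.bytes_added': 17},
                'stats':    {'storage_server.consumed': 12345},
            },
        },
    }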
#!/usr/bin/python

import os
import pickle
import re
import sys
import time

STAT_VALIDITY = 300 # 5min limit on reporting stats
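
# Table of munin graphs this plugin can render. The plugin is symlinked
# into munin's plugins directory once per key below; the symlink name
# selects which stat to report and which output templates to use.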
PLUGINS = {
    'tahoe_storage_consumed':
        { 'statid': 'storage_server.consumed',
          'category': 'stats',
          'configheader': '\n'.join(['graph_title Tahoe Storage Server Space Consumed',
                                     'graph_vlabel bytes',
                                     'graph_category tahoe_storage_server',
                                     'graph_info This graph shows space consumed',
                                     'graph_args --base 1024',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_storage_allocated':
        { 'statid': 'storage_server.allocated',
          'category': 'stats',
          'configheader': '\n'.join(['graph_title Tahoe Storage Server Space Allocated',
                                     'graph_vlabel bytes',
                                     'graph_category tahoe_storage_server',
                                     'graph_info This graph shows space allocated',
                                     'graph_args --base 1024',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_runtime_load_avg':
        { 'statid': 'load_monitor.avg_load',
          'category': 'stats',
          'configheader': '\n'.join(['graph_title Tahoe Runtime Load Average',
                                     'graph_vlabel load',
                                     'graph_category tahoe',
                                     'graph_info This graph shows average reactor delay',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_runtime_load_peak':
        { 'statid': 'load_monitor.max_load',
          'category': 'stats',
          'configheader': '\n'.join(['graph_title Tahoe Runtime Load Peak',
                                     'graph_vlabel load',
                                     'graph_category tahoe',
                                     'graph_info This graph shows peak reactor delay',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_storage_bytes_added':
        { 'statid': 'storage_server.bytes_added',
          'category': 'counters',
          'configheader': '\n'.join(['graph_title Tahoe Storage Server Bytes Added',
                                     'graph_vlabel bytes',
                                     'graph_category tahoe_storage_server',
                                     'graph_info This graph shows cumulative bytes added',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_storage_bytes_freed':
        { 'statid': 'storage_server.bytes_freed',
          'category': 'counters',
          'configheader': '\n'.join(['graph_title Tahoe Storage Server Bytes Removed',
                                     'graph_vlabel bytes',
                                     'graph_category tahoe_storage_server',
                                     'graph_info This graph shows cumulative bytes removed',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    }
def smash_name(name):
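    # make the node name safe to use as a munin field name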
    return re.sub('[^a-zA-Z0-9]', '_', name)

def open_stats(fname):
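    # load the most recent snapshot of node stats written by the gatherer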
    f = open(fname, 'rb')
    stats = pickle.load(f)
    f.close()
    return stats

def main(argv):
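    # munin runs the plugin under a graph-specific symlink name; a first
    # argument of 'config' requests graph metadata, no argument requests
    # the current values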
    graph_name = os.path.basename(argv[0])
    if graph_name.endswith('.py'):
        graph_name = graph_name[:-3]

    plugin_conf = PLUGINS.get(graph_name)
    if plugin_conf is None:
        # not in the original: fail early with a clear message rather than
        # a TypeError further down when invoked under an unknown name
        raise RuntimeError("Unknown graph name '%s'" % graph_name)

    for k, v in os.environ.items():
        if k.startswith('statsfile'):
            stats_file = v
            break
    else:
        raise RuntimeError("No 'statsfile' env var found")

    stats = open_stats(stats_file)

    now = time.time()
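
    # print one template line per node, skipping nodes whose last report
    # is older than STAT_VALIDITY seconds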
    def output_nodes(output_section):
        for tubid, nodestats in stats.items():
            if (now - nodestats.get('timestamp', 0)) > STAT_VALIDITY:
                continue
            name = smash_name("%s_%s" % (nodestats['nickname'], tubid[:4]))
            category = plugin_conf['category']
            statid = plugin_conf['statid']
            value = nodestats['stats'][category].get(statid)
            if value is not None:
                args = { 'name': name, 'value': value }
                print plugin_conf[output_section] % args

    if len(argv) > 1:
        if argv[1] == 'config':
            print plugin_conf['configheader']
            output_nodes('graph_config')
            sys.exit(0)

    output_nodes('graph_render')

if __name__ == '__main__':
    main(sys.argv)
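
# Example deployment (paths are illustrative, not from this patch): symlink
# this file into munin's plugins directory once per graph, and point the
# 'statsfile' environment variable at the gatherer's pickle via a
# plugin-conf.d entry, e.g.:
#
#   ln -s /path/to/tahoe-stats.py /etc/munin/plugins/tahoe_storage_consumed
#
#   [tahoe_*]
#   env.statsfile /path/to/stats.pickle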