tahoe-lafs/misc/munin/tahoe_stats

399 lines
19 KiB
Plaintext

stats: add a simple stats gathering system

We want to collect runtime statistics from multiple nodes, primarily for server monitoring. This is a simple implementation of such a system, a skeleton on which to build more sophistication.

Each client now looks for a 'stats_gatherer.furl' config file. If it has been configured to use a stats gatherer, it internally instantiates a StatsProvider. This is the central place to which code that wishes to offer stats for monitoring reports them, either by calling stats_provider.count('stat.name', value) to increment a counter, or by registering a class as a stats producer with sp.register_producer(obj).

The StatsProvider connects to the StatsGatherer server and presents itself as a provider upon startup. The StatsGatherer is then responsible for polling the attached providers periodically to retrieve their data. Whenever the gatherer queries a provider, the provider queries each registered producer. Both the internal 'counters' and the queried 'stats' are then reported to the gatherer.

This commit provides a simple gatherer app (cf. make stats-gatherer-run) which prints its furl and listens for incoming connections. Once a minute, the gatherer polls all connected providers and writes the retrieved data into a pickle file.

Also included is a munin plugin which knows how to read the gatherer's stats.pickle and output data munin can interpret. This plugin, tahoe-stats.py, can be symlinked under multiple different names within munin's 'plugins' directory; it inspects argv to determine which data to display, doing a lookup in a table within that file. It looks in the environment for 'statsfile' to determine the path to the gatherer's stats.pickle. An example plugins-conf.d file is provided.
2008-01-31 03:11:07 +00:00
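The counter/producer model described in the commit message can be sketched roughly as follows. This is a toy illustration, not Tahoe's actual StatsProvider: the class names, the `delta` default, and the producer method name `get_stats()` are assumptions made for the example, and the network connection to the gatherer is left out entirely.

```python
class SketchStatsProvider:
    """Toy stand-in for the StatsProvider described above (hypothetical)."""

    def __init__(self):
        self.counters = {}
        self.producers = []

    def count(self, name, delta=1):
        # Increment a named counter, creating it on first use.
        self.counters[name] = self.counters.get(name, 0) + delta

    def register_producer(self, producer):
        # Producers are only polled when the gatherer asks for data.
        self.producers.append(producer)

    def get_stats(self):
        # Query every registered producer and merge the results.
        stats = {}
        for producer in self.producers:
            stats.update(producer.get_stats())
        return {'counters': dict(self.counters), 'stats': stats}


class DiskUsageProducer:
    """Hypothetical producer that reports a fixed value."""

    def get_stats(self):
        return {'storage_server.consumed': 4096}


provider = SketchStatsProvider()
provider.register_producer(DiskUsageProducer())
provider.count('uploader.files_uploaded')
provider.count('uploader.bytes_uploaded', 1500)

# What a gatherer poll would receive from this provider.
report = provider.get_stats()
print(report)
```

The two top-level keys, 'counters' and 'stats', match the two categories the munin plugin selects between.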
#!/usr/bin/python
import os
import pickle
import re
import sys
import time
STAT_VALIDITY = 300 # 5min limit on reporting stats
PLUGINS = {
    'tahoe_storage_consumed':
        { 'statid': 'storage_server.consumed',
          'category': 'stats',
          'configheader': '\n'.join(['graph_title Tahoe Storage Server Space Consumed',
                                     'graph_vlabel bytes',
                                     'graph_category tahoe_storage_server',
                                     'graph_info This graph shows space consumed',
                                     'graph_args --base 1024',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_storage_allocated':
        { 'statid': 'storage_server.allocated',
          'category': 'stats',
          'configheader': '\n'.join(['graph_title Tahoe Storage Server Space Allocated',
                                     'graph_vlabel bytes',
                                     'graph_category tahoe_storage_server',
                                     'graph_info This graph shows space allocated',
                                     'graph_args --base 1024',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_runtime_load_avg':
        { 'statid': 'load_monitor.avg_load',
          'category': 'stats',
          'configheader': '\n'.join(['graph_title Tahoe Runtime Load Average',
                                     'graph_vlabel load',
                                     'graph_category tahoe',
                                     'graph_info This graph shows average reactor delay',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_runtime_load_peak':
        { 'statid': 'load_monitor.max_load',
          'category': 'stats',
          'configheader': '\n'.join(['graph_title Tahoe Runtime Load Peak',
                                     'graph_vlabel load',
                                     'graph_category tahoe',
                                     'graph_info This graph shows peak reactor delay',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_storage_bytes_added':
        { 'statid': 'storage_server.bytes_added',
          'category': 'counters',
          'configheader': '\n'.join(['graph_title Tahoe Storage Server Bytes Added',
                                     'graph_vlabel bytes',
                                     'graph_category tahoe_storage_server',
                                     'graph_info This graph shows cumulative bytes added',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_storage_bytes_freed':
        { 'statid': 'storage_server.bytes_freed',
          'category': 'counters',
          'configheader': '\n'.join(['graph_title Tahoe Storage Server Bytes Removed',
                                     'graph_vlabel bytes',
                                     'graph_category tahoe_storage_server',
                                     'graph_info This graph shows cumulative bytes removed',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_helper_incoming_files':
        { 'statid': 'chk_upload_helper.incoming_count',
          'category': 'stats',
          'configheader': '\n'.join(['graph_title Tahoe Upload Helper Incoming File Count',
                                     'graph_vlabel n files',
                                     'graph_category tahoe_helper',
                                     'graph_info This graph shows number of incoming files',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_helper_incoming_filesize':
        { 'statid': 'chk_upload_helper.incoming_size',
          'category': 'stats',
          'configheader': '\n'.join(['graph_title Tahoe Upload Helper Incoming File Size',
                                     'graph_vlabel bytes',
                                     'graph_category tahoe_helper',
                                     'graph_info This graph shows total size of incoming files',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_helper_incoming_files_old':
        { 'statid': 'chk_upload_helper.incoming_size_old',
          'category': 'stats',
          'configheader': '\n'.join(['graph_title Tahoe Upload Helper Incoming Old Files',
                                     'graph_vlabel bytes',
                                     'graph_category tahoe_helper',
                                     'graph_info This graph shows total size of old incoming files',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_helper_encoding_files':
        { 'statid': 'chk_upload_helper.encoding_count',
          'category': 'stats',
          'configheader': '\n'.join(['graph_title Tahoe Upload Helper Encoding File Count',
                                     'graph_vlabel n files',
                                     'graph_category tahoe_helper',
                                     'graph_info This graph shows number of encoding files',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_helper_encoding_filesize':
        { 'statid': 'chk_upload_helper.encoding_size',
          'category': 'stats',
          'configheader': '\n'.join(['graph_title Tahoe Upload Helper Encoding File Size',
                                     'graph_vlabel bytes',
                                     'graph_category tahoe_helper',
                                     'graph_info This graph shows total size of encoding files',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_helper_encoding_files_old':
        { 'statid': 'chk_upload_helper.encoding_size_old',
          'category': 'stats',
          'configheader': '\n'.join(['graph_title Tahoe Upload Helper Encoding Old Files',
                                     'graph_vlabel bytes',
                                     'graph_category tahoe_helper',
                                     'graph_info This graph shows total size of old encoding files',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_helper_active_uploads':
        { 'statid': 'chk_upload_helper.active_uploads',
          'category': 'stats',
          'configheader': '\n'.join(['graph_title Tahoe Upload Helper Active Files',
                                     'graph_vlabel n files',
                                     'graph_category tahoe_helper',
                                     'graph_info This graph shows number of files actively being processed by the helper',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_helper_upload_requests':
        { 'statid': 'chk_upload_helper.upload_requests',
          'category': 'counters',
          'configheader': '\n'.join(['graph_title Tahoe Upload Helper Upload Requests',
                                     'graph_vlabel requests',
                                     'graph_category tahoe_helper',
                                     'graph_info This graph shows the number of upload requests arriving at the helper',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.type DERIVE',
                                     '%(name)s.min 0',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_helper_upload_already_present':
        { 'statid': 'chk_upload_helper.upload_already_present',
          'category': 'counters',
          'configheader': '\n'.join(['graph_title Tahoe Upload Helper Uploads Already Present',
                                     'graph_vlabel requests',
                                     'graph_category tahoe_helper',
                                     'graph_info This graph shows the number of uploads whose files are already present in the grid',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.type DERIVE',
                                     '%(name)s.min 0',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_helper_upload_need_upload':
        { 'statid': 'chk_upload_helper.upload_need_upload',
          'category': 'counters',
          'configheader': '\n'.join(['graph_title Tahoe Upload Helper Uploads Needing Upload',
                                     'graph_vlabel requests',
                                     'graph_category tahoe_helper',
                                     'graph_info This graph shows the number of uploads whose files are not already present in the grid',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.type DERIVE',
                                     '%(name)s.min 0',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_helper_encoded_bytes':
        { 'statid': 'chk_upload_helper.encoded_bytes',
          'category': 'counters',
          'configheader': '\n'.join(['graph_title Tahoe Upload Helper Encoded Bytes',
                                     'graph_vlabel bytes',
                                     'graph_category tahoe_helper',
                                     'graph_info This graph shows the number of bytes encoded by the helper',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.type DERIVE',
                                     '%(name)s.min 0',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_helper_fetched_bytes':
        { 'statid': 'chk_upload_helper.fetched_bytes',
          'category': 'counters',
          'configheader': '\n'.join(['graph_title Tahoe Upload Helper Fetched Bytes',
                                     'graph_vlabel bytes',
                                     'graph_category tahoe_helper',
                                     'graph_info This graph shows the number of bytes fetched by the helper',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.type DERIVE',
                                     '%(name)s.min 0',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_uploader_bytes_uploaded':
        { 'statid': 'uploader.bytes_uploaded',
          'category': 'counters',
          'configheader': '\n'.join(['graph_title Tahoe Uploader Bytes Uploaded',
                                     'graph_vlabel bytes',
                                     'graph_category tahoe_traffic',
                                     'graph_info This graph shows the number of bytes uploaded',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.type DERIVE',
                                     '%(name)s.min 0',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_uploader_files_uploaded':
        { 'statid': 'uploader.files_uploaded',
          'category': 'counters',
          'configheader': '\n'.join(['graph_title Tahoe Uploader Files Uploaded',
                                     'graph_vlabel files',
                                     'graph_category tahoe_traffic',
                                     'graph_info This graph shows the number of files uploaded',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.type DERIVE',
                                     '%(name)s.min 0',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_mutable_files_published':
        { 'statid': 'mutable.files_published',
          'category': 'counters',
          'configheader': '\n'.join(['graph_title Tahoe Mutable Files Published',
                                     'graph_vlabel files',
                                     'graph_category tahoe_traffic',
                                     'graph_info This graph shows the number of mutable files published',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.type DERIVE',
                                     '%(name)s.min 0',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
    'tahoe_mutable_files_retrieved':
        { 'statid': 'mutable.files_retrieved',
          'category': 'counters',
          'configheader': '\n'.join(['graph_title Tahoe Mutable Files Retrieved',
                                     'graph_vlabel files',
                                     'graph_category tahoe_traffic',
                                     'graph_info This graph shows the number of mutable files retrieved',
                                     ]),
          'graph_config': '\n'.join(['%(name)s.label %(name)s',
                                     '%(name)s.type DERIVE',
                                     '%(name)s.min 0',
                                     '%(name)s.draw LINE1',
                                     ]),
          'graph_render': '\n'.join(['%(name)s.value %(value)s',
                                     ]),
        },
}

def smash_name(name):
    # Replace every character other than ASCII letters and digits with '_'.
    return re.sub('[^a-zA-Z0-9]', '_', name)

def open_stats(fname):
    # Load the gatherer's pickle; 'with' closes the file even if loading fails.
    with open(fname, 'rb') as f:
        return pickle.load(f)
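
def main(argv):
    # The plugin is symlinked under several names; the link name picks the graph.
    graph_name = os.path.basename(argv[0])
    if graph_name.endswith('.py'):
        graph_name = graph_name[:-3]

    plugin_conf = PLUGINS.get(graph_name)
    if plugin_conf is None:
        raise RuntimeError("Unknown plugin name %r" % (graph_name,))

    for k,v in os.environ.items():
        if k.startswith('statsfile'):
            stats_file = v
            break
    else:
        # for/else: reached only when the loop finished without a 'break'.
        raise RuntimeError("No 'statsfile' env var found")

    stats = open_stats(stats_file)

    now = time.time()
    def output_nodes(output_section, check_time):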
        for tubid, nodestats in stats.items():
            # Skip nodes whose last report is older than STAT_VALIDITY seconds.
            if check_time and (now - nodestats.get('timestamp', 0)) > STAT_VALIDITY:
                continue
            name = smash_name("%s_%s" % (nodestats['nickname'], tubid[:4]))
            #value = nodestats['stats'][plugin_conf['category']].get(plugin_conf['statid'])
            category = plugin_conf['category']
            statid = plugin_conf['statid']
            value = nodestats['stats'][category].get(statid)
            if value is not None:
                args = { 'name': name, 'value': value }
                print(plugin_conf[output_section] % args)

    if len(argv) > 1:
        if argv[1] == 'config':
            print(plugin_conf['configheader'])
            output_nodes('graph_config', False)
            sys.exit(0)

    output_nodes('graph_render', True)

if __name__ == '__main__':
    main(sys.argv)
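
To try the plugin without a running gatherer, a small stats.pickle can be generated by hand. The sketch below assumes the per-node layout implied by output_nodes() above (a 'nickname', a 'timestamp', and a 'stats' dict keyed by category). The tub IDs, nicknames, values, and the helper name `render` are made up for illustration, and the real gatherer may store additional fields.

```python
import os
import pickle
import re
import tempfile
import time

# Hypothetical gatherer output: one fresh node and one stale node.
sample = {
    'abcd1234': {
        'nickname': 'node-1',
        'timestamp': time.time(),
        'stats': {
            'stats': {'storage_server.consumed': 4096},
            'counters': {'uploader.files_uploaded': 7},
        },
    },
    'ef567890': {
        'nickname': 'node-2',
        'timestamp': time.time() - 1000,  # older than the 300 s limit
        'stats': {
            'stats': {'storage_server.consumed': 8192},
            'counters': {},
        },
    },
}


def render(stats, category, statid, validity=300):
    # Mirrors output_nodes('graph_render', True) for a single stat.
    lines = []
    now = time.time()
    for tubid, nodestats in sorted(stats.items()):
        if now - nodestats.get('timestamp', 0) > validity:
            continue
        raw_name = '%s_%s' % (nodestats['nickname'], tubid[:4])
        name = re.sub('[^a-zA-Z0-9]', '_', raw_name)
        value = nodestats['stats'][category].get(statid)
        if value is not None:
            lines.append('%s.value %s' % (name, value))
    return lines


# Round-trip through a pickle file, as the plugin would read it.
path = os.path.join(tempfile.mkdtemp(), 'stats.pickle')
with open(path, 'wb') as f:
    pickle.dump(sample, f)
with open(path, 'rb') as f:
    loaded = pickle.load(f)

lines = render(loaded, 'stats', 'storage_server.consumed')
print('\n'.join(lines))  # prints: node_1_abcd.value 4096
```

The stale node is skipped because its timestamp falls outside the validity window. Pointing the plugin's 'statsfile' environment variable at the generated file should produce the same line for the tahoe_storage_consumed graph.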