I'd implemented stats gathering hooks in the helper a while back.
Brian did the same without reference to my changes. This reconciles
those two changes, encompassing all the stats in both changes,
implemented through the stats_provider interface.
this also provide templates for all 10 helper graphs in the
tahoe-stats munin plugin.
adds a stats_producer for the upload helper, which provides a series of counters
to the stats gatherer, under the name 'chk_upload_helper'.
it examines both the 'incoming' directory, and the 'encoding' dir, providing
inc_count inc_size inc_size_old enc_count enc_size enc_size_old, respectively
the number of files in each dir, the total size thereof, and the aggregate
size of all files older than 48hrs
the windows (cygwin) buildslave has been failing the key generator test
it turns out that the time check on whether to refill the pool, and the
reactor, are interacting such that when the maybe_refill_pool call posted
on the reactor fires, the test on whether to fill the pool fails.
this adds a loop in the failure case to retry each 1s until it is time
to refill the pool, thus mitigating this timing accuracy problem on
windows.
the RIStatsProvider interface requires that counter and stat values be
ChoiceOf(float, int, long) the recent changes to storage server to not
track 'consumed' led to returning None as the value of a counter.
this causes violations to be experienced by nodes whose stats are being
gathered.
this patch simply omits that stat if 'consumed' is not being tracked.
one of the storage servers is throwing foolscap violations about the
return value of get_stats(). this adds a log of the data returned
to the foolscap log event stream at the debug level '12' (between
NOISY(10) and OPERATIONAL(20)) hopefully this will facilitate
finding the cause of this problem.
the timeouts on uses of 'poll' were there purely to make sure a test doesn't
poll indefinitely. however having such timeouts makes tests susceptible
to premature timeouts under high load, or on slow machines. (e.g. cygwin
slaves running in virtual machines on loaded hosts)
purportedly trial by default applies a timeout to tests to prevent them
hanging out indefinitely, so these poll timeouts are redundant and cause
intermittent failures on slow hosts. hence they're more bother than they're
worth, and should be culled.