Unfinished bits: doc in webapi.txt, test handling of badly formed JSON, return reasonable HTTP response, examination of the effect of this patch on code coverage -- but I'm committing it anyway because MikeB can use it and I'm being called to dinner...
this adds an interface, IStatsProducer, defining the get_stats() method
which the stats provider calls upon and registered producer, and made the
register_producer() method check that interface is implemented.
also refine the startup logic, so that the stats provider doesn't try and
connect out to the stats gatherer until after the node declares the tub
'ready'. this is to address an issue whereby providers would attach to
the gatherer without providing a valid furl, and hence the gatherer would
be unable to determine the tubid of the connected client, leading to lost
samples.
We have a desire to collect runtime statistics from multiple nodes primarily
for server monitoring purposes. This implements a simple implementation of
such a system, as a skeleton to build more sophistication upon.
Each client now looks for a 'stats_gatherer.furl' config file. If it has
been configured to use a stats gatherer, then it instantiates internally
a StatsProvider. This is a central place for code which wishes to offer
stats up for monitoring to report them to, either by calling
stats_provider.count('stat.name', value) to increment a counter, or by
registering a class as a stats producer with sp.register_producer(obj).
The StatsProvider connects to the StatsGatherer server and provides its
provider upon startup. The StatsGatherer is then responsible for polling
the attached providers periodically to retrieve the data provided.
The provider queries each registered producer when the gatherer queries
the provider. Both the internal 'counters' and the queried 'stats' are
then reported to the gatherer.
This provides a simple gatherer app, (c.f. make stats-gatherer-run)
which prints its furl and listens for incoming connections. Once a
minute, the gatherer polls all connected providers, and writes the
retrieved data into a pickle file.
Also included is a munin plugin which knows how to read the gatherer's
stats.pickle and output data munin can interpret. this plugin,
tahoe-stats.py can be symlinked as multiple different names within
munin's 'plugins' directory, and inspects argv to determine which
data to display, doing a lookup in a table within that file.
It looks in the environment for 'statsfile' to determine the path to
the gatherer's stats.pickle. An example plugins-conf.d file is
provided.
Also allow an optional leading "http://127.0.0.1:8123/uri/".
Also fix a few unit tests to generate bogus Dirnode URIs of the modern form instead of the former form.
* use new decentralized directories everywhere instead of old centralized directories
* provide UI to them through the web server
* provide UI to them through the CLI
* update unit tests to simulate decentralized mutable directories in order to test other components that rely on them
* remove the notion of a "vdrive server" and a client thereof
* remove the notion of a "public vdrive", which was a directory that was centrally published/subscribed automatically by the tahoe node (you can accomplish this manually by making a directory and posting the URL to it on your web site, for example)
* add a notion of "wait_for_numpeers" when you need to publish data to peers, which is how many peers should be attached before you start. The default is 1.
* add __repr__ for filesystem nodes (note: these reprs contain a few bits of the secret key!)
* fix a few bugs where we used to equate "mutable" with "not read-only". Nowadays all directories are mutable, but some might be read-only (to you).
* fix a few bugs where code wasn't aware of the new general-purpose metadata dict the comes with each filesystem edge
* sundry fixes to unit tests to adjust to the new directories, e.g. don't assume that every share on disk belongs to a chk file.
I'm not 100% sure that this is correct, but it looks reasonable, it passes unit
tests (although note that unit tests are currently not covering the new mutable
files very well), and it makes the "view JSON" link on a directory work instead
of raising an assertion error.
The URI typenames need revision, and only a few dirnode methods are
implemented. Filenodes are non-functional, but URI/key-management is in
place. There are a lot of classes with names like "NewDirectoryNode" that
will need to be rename once we decide what (if any) backwards compatibility
want to retain.
By writing something like "25 75 100" into a file named 'encoding_parameters'
in the central Introducer's base directory, all clients which use that
introducer will be advised to use 25-out-of-100 encoding for files (i.e.
100 shares will be produced, 25 are required to reconstruct, and the upload
process will be happy if it can find homes for at least 75 shares). The
default values are "3 7 10". For small meshes, the defaults are probably
good, but for larger ones it may be appropriate to increase the number of
shares.
This will make it easier to change RIBucketWriter in the future to reduce the wire
protocol to just open/write(offset,data)/close, and do all the structuring on the
client end. The ultimate goal is to store each bucket in a single file, to reduce
the considerable filesystem-quantization/inode overhead on the storage servers.
If the error occurs before any data has been sent, we can give a sensible
error message (code 500, stack trace, etc). This will cover most of the error
cases. The ones that aren't covered are when we run out of good peers after
successfully decoding the first segment, either because they go away or
because their shares are corrupt.
Previously, exceptions during a web download caused a hang rather than some
kind of exception or error message. This patch improves the situation by
terminating the HTTP download rather than letting it hang forever. The
behavior still isn't ideal, however, because the error can occur too late to
abort the HTTP request cleanly (i.e. with an error code). In fact, the
Content-Type header and response code have already been set by the time any
download errors have been detected, so the browser is committed to displaying
an image or whatever (thus any error message we put into the stream is
unlikely to be displayed in a meaningful way).
These allow client-side code to conveniently retrieve the IDirectoryNode
instances for both the global shared public root directory, and the per-user
private root directory.
The only SHA-1 hash that remains is used in the permutation of nodeids,
where we need to decide if we care about performance or long-term security.
I suspect that we could use a much weaker hash (and faster) hash for
this purpose. In the long run, we'll be doing thousands of such hashes
for each file uploaded or downloaded (one per known peer).
This (compatibility-breaking) change moves much of the validation data and
encoding parameters out of the URI and into the so-called "thingA" block
(which will get a better name as soon as we find one we're comfortable with).
The URI retains the "storage_index" (a generalized term for the role that
we're currently using the verifierid for, the unique index for each file
that gets used by storage servers to decide which shares to return), the
decryption key, the needed_shares/total_shares counts (since they affect
peer selection), and the hash of the thingA block.
This shortens the URI and lets us add more kinds of validation data without
growing the URI (like plaintext merkle trees, to enable strong incremental
plaintext validation), at the cost of maybe 150 bytes of alacrity. Each
storage server holds an identical copy of the thingA block.
This is an incompatible change: new messages have been added to the storage
server interface, and the URI format has changed drastically.
Add a new method to RIIntroducer, to allow the central introducer node to
remove peers from the active set after they've gone away. Without this,
client nodes accumulate stale peer FURLs forever. This introduces a
compatibility break, as old introducers won't know about the 'lost_peers'
message, although the errors produced are probably harmless.
Add a new method to RIIntroducer, to allow the central introducer node to
remove peers from the active set after they've gone away. Without this,
client nodes accumulate stale peer FURLs forever. This introduces a
compatibility break, as old introducers won't know about the 'lost_peers'
message, although the errors produced are probably harmless.
This is a potentially disruptive and potentially ugly change to the code base,
because I renamed the object that serves in both roles from "Queen" to
"IntroducerAndVdrive", which is a bit of an ugly name.
However, I think that clarity is important enough in this release to make this
change. All unit tests pass. I'm now darcs recording this patch in order to
pull it to other machines for more testing.
It now takes a sequence of buffers instead of a single string for both encode and decode, and it also takes a separate sequence of shareids for decode instead of a sequence of tuples, and it returns a sequence of buffers instead of a single string.
Added metadata to the bucket store, which is used to hold the share number
(but the bucket doesn't know that, it just gets a string).
Modified the codec interfaces a bit.
Try to pass around URIs to/from download/upload instead of verifierids.
URI format is still in flux.
Change the current (primitive) file encoder to use a ReplicatingEncoder
because it provides ICodecEncoder. We will be moving to the (less primitive)
file encoder (currently in allmydata.encode_new) eventually, but for now
this change lets us test out PyRS or zooko's upcoming C-based RS codec in
something larger than a single unit test. This primitive file encoder only
uses a single segment, and has no merkle trees.
Also added allmydata.util.deferredutil for a DeferredList wrapper that
errbacks (but only when all component Deferreds have fired) if there were
any errors, which unfortunately is not a behavior available from the standard
DeferredList.