These constraints were originally intended to protect against attacks on the
storage server protocol layer which exhaust memory in the peer. However,
defending against that sort of DoS is hard -- it probably isn't completely
achievable -- it costs development time to think about, and it sometimes
imposes limits on legitimate users which we don't necessarily want to impose.
So, for now, we forget about limiting the amount of RAM that a foolscap peer
can cause you to use.
Reworked checker-result identifiers so that the internal ones use binary
strings (nodeid, storage index) and the web/JSON ones use base32-encoded
strings. The immutable verifier is still incomplete (it returns imaginary
healthy results).
Removed the Checker service and removed checker results storage (both the
in-memory storage and the tiny stub of sqlite-based storage). Added
ICheckable; all checking/verifying is now done by calling the check() method
on filenodes and dirnodes (immutable files, literal files, mutable files, and
directory instances).
Checker results are returned in a Results instance, with an html() method for
display. Checker results have been temporarily removed from the wui directory
listing until we make some other fixes.
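A rough sketch of the shape of this API (the zope.interface style and the
Results fields shown here are illustrative assumptions, not the actual
declarations):

    # Illustrative sketch only; field names and zope.interface usage are assumptions.
    from zope.interface import Interface, implementer

    class ICheckable(Interface):
        def check(verify=False):
            """Check this node's health and return a results object."""

    class Results(object):
        """Holds the outcome of a check; html() renders it for display."""
        def __init__(self, healthy, details=""):
            self.healthy = healthy
            self.details = details

        def html(self):
            status = "Healthy" if self.healthy else "Not healthy"
            return "<div>%s: %s</div>" % (status, self.details)

    @implementer(ICheckable)
    class LiteralFileNode(object):
        def check(self, verify=False):
            # LIT files carry their data inside the URI, so there is nothing
            # on the grid to check.
            return Results(healthy=True, details="literal file")
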
Also fixed client.create_node_from_uri() to create LiteralFileNodes properly,
since they have different checking behavior. Previously we were creating full
FileNodes with LIT uris inside, which were downloadable but not checkable.
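A minimal sketch of the dispatch involved (the class bodies here are
stand-ins, not the real Tahoe node classes):

    # Sketch only: FileNode/LiteralFileNode stand in for the real node classes.
    class FileNode:
        def __init__(self, uri, client):
            self.uri = uri
            self.client = client

    class LiteralFileNode(FileNode):
        pass

    def create_node_from_uri(client, uri_string):
        # LIT URIs carry their data inline, so they need a node type with its
        # own (trivial) checking behaviour rather than a full FileNode.
        if uri_string.startswith("URI:LIT:"):
            return LiteralFileNode(uri_string, client)
        return FileNode(uri_string, client)
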
This adds a new service to pre-generate RSA key pairs. It allows the
expensive (i.e. slow) key generation to be placed into a process outside the
node, so that the node's reactor will not block when it needs a key pair, but
can instead retrieve one from a pool of already-generated key pairs in the
key-generator service.
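A minimal sketch of the pool idea (the class, the pool size, and the use of
the 'cryptography' library are illustrative assumptions; the real service
speaks foolscap):

    # Illustrative only: keep a pool of pre-generated RSA key pairs so the
    # caller never has to block on key generation.
    import queue
    import threading

    from cryptography.hazmat.primitives.asymmetric import rsa

    class KeyPool:
        def __init__(self, size=16, key_bits=2048):
            self._pool = queue.Queue(maxsize=size)
            filler = threading.Thread(target=self._fill, args=(key_bits,))
            filler.daemon = True
            filler.start()

        def _fill(self, key_bits):
            while True:
                # The expensive step happens off the caller's thread.
                key = rsa.generate_private_key(public_exponent=65537,
                                               key_size=key_bits)
                self._pool.put(key)  # blocks once the pool is full

        def get_key(self):
            # Cheap for the caller as long as the pool is non-empty.
            return self._pool.get()
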
It adds a 'tahoe create-key-generator' command which initialises an empty
directory with a tahoe-key-generator.tac file, which can then be run via
twistd. The service stashes its .pem and port number for furl stability, and
writes the furl of the key-generator service to key_generator.furl, also
printing it to stdout.
By placing a key_generator.furl file into the node's config directory
(e.g. ~/.tahoe), a node will attempt to connect to such a service, and will
use it when creating mutable files (i.e. directories) whenever possible. If
the key-generator service is unavailable, the node will perform the key
generation locally instead, as before.
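The fallback amounts to: use BASEDIR/key_generator.furl if it exists and the
remote service answers, otherwise generate locally. A rough sketch, with
hypothetical helper names (get_remote_key and generate_key_locally are not
the real method names):

    import os

    def get_mutable_file_keypair(basedir, generate_key_locally, get_remote_key=None):
        """Prefer the remote key-generator service when one is configured and
        reachable; fall back to local (slow) key generation otherwise."""
        furl_path = os.path.join(basedir, "key_generator.furl")
        if get_remote_key is not None and os.path.exists(furl_path):
            with open(furl_path) as f:
                furl = f.read().strip()
            try:
                return get_remote_key(furl)  # pull a key pair from the remote pool
            except Exception:
                pass  # service unavailable: fall through to local generation
        return generate_key_locally()  # same behaviour as before
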
Now upload or encode methods take a required argument named "convergence" which can be either None, indicating no convergent encryption at all, or a string, which is the "added secret" to be mixed in to the content hash key. If you want traditional convergent encryption behavior, set the added secret to be the empty string.
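For illustration, the effect on the encryption key might be sketched like
this (the SHA-256 derivation here is a stand-in, not Tahoe's actual tagged
hash):

    import hashlib
    import os

    def encryption_key(plaintext, convergence):
        if convergence is None:
            return os.urandom(16)              # non-convergent: fresh random key
        assert isinstance(convergence, bytes)  # b"" gives traditional convergence
        return hashlib.sha256(convergence + plaintext).digest()[:16]
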
This patch also renames "content hash key" to "convergent encryption" in argument names and variable names. (A different and larger renaming is needed in order to clarify that Tahoe supports immutable files which are not encrypted with a content hash key, a.k.a. convergent encryption.)
This patch also changes a few unit tests to use non-convergent encryption, because it doesn't matter for what they are testing and non-convergent encryption is slightly faster.
Unfinished bits: docs in webapi.txt, test handling of badly formed JSON, returning a reasonable HTTP response, examination of the effect of this patch on code coverage -- but I'm committing it anyway because MikeB can use it and I'm being called to dinner...
This adds an interface, IStatsProducer, defining the get_stats() method
which the stats provider calls on each registered producer, and makes the
register_producer() method check that the interface is implemented.
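A minimal sketch of the interface and the registration check (zope.interface
style assumed; the actual declarations may differ):

    from zope.interface import Interface, implementer

    class IStatsProducer(Interface):
        def get_stats():
            """Return a dict mapping stat names to current numeric values."""

    def register_producer(producers, producer):
        # The new check: refuse objects that do not declare the interface.
        if not IStatsProducer.providedBy(producer):
            raise ValueError("producer must provide IStatsProducer")
        producers.append(producer)

    @implementer(IStatsProducer)
    class ExampleProducer(object):
        def get_stats(self):
            return {"example.operations": 3}
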
It also refines the startup logic, so that the stats provider doesn't try to
connect out to the stats gatherer until after the node declares the tub
'ready'. This addresses an issue whereby providers would attach to the
gatherer without providing a valid furl, and hence the gatherer would be
unable to determine the tubid of the connected client, leading to lost
samples.
We have a desire to collect runtime statistics from multiple nodes, primarily
for server monitoring purposes. This provides a simple implementation of such
a system, as a skeleton to build more sophistication upon.
Each client now looks for a 'stats_gatherer.furl' config file. If it has
been configured to use a stats gatherer, then it internally instantiates a
StatsProvider. This is a central place to which code wishing to offer stats
for monitoring can report them, either by calling
stats_provider.count('stat.name', value) to increment a counter, or by
registering a class as a stats producer with sp.register_producer(obj).
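A sketch of those two reporting styles (the StatsProvider internals shown
here are illustrative):

    class StatsProvider(object):
        """Counters are incremented in place; producers are queried lazily
        when the gatherer polls."""
        def __init__(self):
            self.counters = {}
            self.producers = []

        def count(self, name, delta=1):
            self.counters[name] = self.counters.get(name, 0) + delta

        def register_producer(self, producer):
            self.producers.append(producer)

        def get_stats(self):
            stats = {}
            for producer in self.producers:
                stats.update(producer.get_stats())
            return {"counters": dict(self.counters), "stats": stats}

    sp = StatsProvider()
    sp.count('uploader.files_uploaded')
    sp.count('uploader.bytes_uploaded', 4096)
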
The StatsProvider connects to the StatsGatherer server and registers itself
upon startup. The StatsGatherer is then responsible for polling the attached
providers periodically to retrieve the data provided. The provider queries
each registered producer when the gatherer queries the provider. Both the
internal 'counters' and the queried 'stats' are then reported to the
gatherer.
This also provides a simple gatherer app (c.f. 'make stats-gatherer-run')
which prints its furl and listens for incoming connections. Once a minute,
the gatherer polls all connected providers and writes the retrieved data into
a pickle file.
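The data flow can be sketched as a simple poll-and-dump loop (the real
gatherer is foolscap/twisted based; the pickle layout shown here is an
assumption):

    import pickle
    import time

    def gather_forever(providers, statsfile="stats.pickle", interval=60):
        # providers: dict mapping tubid -> connected provider reference
        while True:
            snapshot = {}
            for tubid, provider in providers.items():
                snapshot[tubid] = provider.get_stats()  # counters + producer stats
            with open(statsfile, "wb") as f:
                pickle.dump(snapshot, f)
            time.sleep(interval)
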
Also included is a munin plugin which knows how to read the gatherer's
stats.pickle and output data munin can interpret. This plugin,
tahoe-stats.py, can be symlinked under multiple different names within
munin's 'plugins' directory; it inspects argv to determine which data to
display, doing a lookup in a table within that file. It looks in the
environment for 'statsfile' to determine the path to the gatherer's
stats.pickle. An example plugins-conf.d file is provided.
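The plugin pattern described above can be sketched as follows (the mapping
table, the pickle layout, and the stat names are assumptions):

    import os
    import pickle
    import sys

    PLUGINS = {
        # hypothetical: symlink name -> (stat name, munin graph title)
        "tahoe_files_uploaded": ("uploader.files_uploaded", "files uploaded"),
    }

    def main():
        stat_name, title = PLUGINS[os.path.basename(sys.argv[0])]
        field = stat_name.replace(".", "_")
        if len(sys.argv) > 1 and sys.argv[1] == "config":
            print("graph_title %s" % title)
            print("%s.label %s" % (field, title))
            return
        with open(os.environ["statsfile"], "rb") as f:
            data = pickle.load(f)
        total = 0
        for node in data.values():
            # assumed layout: tubid -> {'counters': {...}, 'stats': {...}}
            total += node.get("counters", {}).get(stat_name, 0)
            total += node.get("stats", {}).get(stat_name, 0)
        print("%s.value %s" % (field, total))

    if __name__ == "__main__":
        main()
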
Also allow an optional leading "http://127.0.0.1:8123/uri/".
Also fix a few unit tests to generate bogus Dirnode URIs of the modern form instead of the former form.
* use new decentralized directories everywhere instead of old centralized directories
* provide UI to them through the web server
* provide UI to them through the CLI
* update unit tests to simulate decentralized mutable directories in order to test other components that rely on them
* remove the notion of a "vdrive server" and a client thereof
* remove the notion of a "public vdrive", which was a directory that was centrally published/subscribed automatically by the tahoe node (you can accomplish this manually by making a directory and posting the URL to it on your web site, for example)
* add a notion of "wait_for_numpeers" when you need to publish data to peers, which is how many peers should be attached before you start. The default is 1. (A rough sketch of this appears after this list.)
* add __repr__ for filesystem nodes (note: these reprs contain a few bits of the secret key!)
* fix a few bugs where we used to equate "mutable" with "not read-only". Nowadays all directories are mutable, but some might be read-only (to you).
* fix a few bugs where code wasn't aware of the new general-purpose metadata dict that comes with each filesystem edge
* sundry fixes to unit tests to adjust to the new directories, e.g. don't assume that every share on disk belongs to a chk file.
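A rough sketch of the wait_for_numpeers idea referenced above (the real code
is Deferred-based; this polling loop just shows the gating behaviour):

    import time

    def wait_for_numpeers(get_connected_peers, numpeers=1, poll=0.5, timeout=30):
        """Block until at least `numpeers` peers are attached, then return."""
        deadline = time.time() + timeout
        while len(get_connected_peers()) < numpeers:
            if time.time() > deadline:
                raise RuntimeError("only %d peers connected, wanted %d"
                                   % (len(get_connected_peers()), numpeers))
            time.sleep(poll)
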
I'm not 100% sure that this is correct, but it looks reasonable, it passes unit
tests (although note that unit tests are currently not covering the new mutable
files very well), and it makes the "view JSON" link on a directory work instead
of raising an assertion error.
The URI typenames need revision, and only a few dirnode methods are
implemented. Filenodes are non-functional, but URI/key-management is in
place. There are a lot of classes with names like "NewDirectoryNode" that
will need to be renamed once we decide what (if any) backwards compatibility
we want to retain.
By writing something like "25 75 100" into a file named 'encoding_parameters'
in the central Introducer's base directory, all clients which use that
introducer will be advised to use 25-out-of-100 encoding for files (i.e.
100 shares will be produced, 25 are required to reconstruct, and the upload
process will be happy if it can find homes for at least 75 shares). The
default values are "3 7 10". For small meshes, the defaults are probably
good, but for larger ones it may be appropriate to increase the number of
shares.
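For reference, the three numbers parse as k / shares-of-happiness / N (the
dict keys below are illustrative):

    def parse_encoding_parameters(text):
        k, happy, n = [int(x) for x in text.split()]
        assert 1 <= k <= happy <= n
        return {"needed_shares": k,     # required to reconstruct the file
                "happy_shares": happy,  # upload is happy if this many are placed
                "total_shares": n}      # shares produced

    print(parse_encoding_parameters("25 75 100"))
    print(parse_encoding_parameters("3 7 10"))   # the defaults
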
This will make it easier to change RIBucketWriter in the future to reduce the wire
protocol to just open/write(offset,data)/close, and do all the structuring on the
client end. The ultimate goal is to store each bucket in a single file, to reduce
the considerable filesystem-quantization/inode overhead on the storage servers.
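A rough sketch of the reduced bucket-writer surface this is working toward
(the class here is illustrative; the real change would be to the foolscap
RIBucketWriter declaration):

    class ReducedBucketWriter(object):
        """Wire protocol shrinks to open/write(offset, data)/close; all share
        structuring (block layout, hash trees, ...) happens on the client."""
        def __init__(self, backing_file):
            self._f = backing_file  # one file per bucket on the storage server

        def write(self, offset, data):
            self._f.seek(offset)
            self._f.write(data)

        def close(self):
            self._f.close()
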
If the error occurs before any data has been sent, we can give a sensible
error message (code 500, stack trace, etc). This will cover most of the error
cases. The ones that aren't covered are when we run out of good peers after
successfully decoding the first segment, either because they go away or
because their shares are corrupt.
Previously, exceptions during a web download caused a hang rather than some
kind of exception or error message. This patch improves the situation by
terminating the HTTP download rather than letting it hang forever. The
behavior still isn't ideal, however, because the error can occur too late to
abort the HTTP request cleanly (i.e. with an error code). In fact, the
Content-Type header and response code have already been set by the time any
download errors have been detected, so the browser is committed to displaying
an image or whatever (thus any error message we put into the stream is
unlikely to be displayed in a meaningful way).
These allow client-side code to conveniently retrieve the IDirectoryNode
instances for both the global shared public root directory, and the per-user
private root directory.
The only SHA-1 hash that remains is used in the permutation of nodeids,
where we need to decide if we care about performance or long-term security.
I suspect that we could use a much weaker (and faster) hash for
this purpose. In the long run, we'll be doing thousands of such hashes
for each file uploaded or downloaded (one per known peer).
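For context, the permutation in question ranks every known peer by a hash of
its nodeid together with the file's index, i.e. one hash per peer per file (a
simplified sketch):

    import hashlib

    def permuted_peer_list(nodeids, storage_index):
        # nodeids and storage_index are byte strings
        return sorted(nodeids,
                      key=lambda nid: hashlib.sha1(nid + storage_index).digest())
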
This (compatibility-breaking) change moves much of the validation data and
encoding parameters out of the URI and into the so-called "thingA" block
(which will get a better name as soon as we find one we're comfortable with).
The URI retains the "storage_index" (a generalized term for the role that
we're currently using the verifierid for, the unique index for each file
that gets used by storage servers to decide which shares to return), the
decryption key, the needed_shares/total_shares counts (since they affect
peer selection), and the hash of the thingA block.
This shortens the URI and lets us add more kinds of validation data without
growing the URI (like plaintext merkle trees, to enable strong incremental
plaintext validation), at the cost of maybe 150 bytes of alacrity. Each
storage server holds an identical copy of the thingA block.
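To summarize the split (the field names below are just a restatement of the
text, not the literal URI or thingA syntax):

    # What stays in the URI versus what moves into the shared thingA block.
    URI_FIELDS = [
        "storage_index",   # per-file index used by storage servers to find shares
        "decryption_key",
        "needed_shares",   # k: affects peer selection
        "total_shares",    # N: affects peer selection
        "thingA_hash",     # hash of the thingA block held by every storage server
    ]

    THINGA_FIELDS = [
        "encoding_parameters",  # moved out of the URI
        "validation_data",      # e.g. hash trees; more kinds (such as plaintext
                                # merkle trees) can be added without growing the URI
    ]
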
This is an incompatible change: new messages have been added to the storage
server interface, and the URI format has changed drastically.