By writing something like "25 75 100" into a file named 'encoding_parameters'
in the central Introducer's base directory, all clients which use that
introducer will be advised to use 25-out-of-100 encoding for files (i.e.
100 shares will be produced, 25 are required to reconstruct, and the upload
process will be happy if it can find homes for at least 75 shares). The
default values are "3 7 10". For small meshes, the defaults are probably
good, but for larger ones it may be appropriate to increase the number of
shares.
This will make it easier to change RIBucketWriter in the future to reduce the wire
protocol to just open/write(offset,data)/close, and do all the structuring on the
client end. The ultimate goal is to store each bucket in a single file, to reduce
the considerable filesystem-quantization/inode overhead on the storage servers.
If the error occurs before any data has been sent, we can give a sensible
error message (code 500, stack trace, etc). This will cover most of the error
cases. The ones that aren't covered are when we run out of good peers after
successfully decoding the first segment, either because they go away or
because their shares are corrupt.
Previously, exceptions during a web download caused a hang rather than some
kind of exception or error message. This patch improves the situation by
terminating the HTTP download rather than letting it hang forever. The
behavior still isn't ideal, however, because the error can occur too late to
abort the HTTP request cleanly (i.e. with an error code). In fact, the
Content-Type header and response code have already been set by the time any
download errors have been detected, so the browser is committed to displaying
an image or whatever (thus any error message we put into the stream is
unlikely to be displayed in a meaningful way).
These allow client-side code to conveniently retrieve the IDirectoryNode
instances for both the global shared public root directory, and the per-user
private root directory.
The only SHA-1 hash that remains is used in the permutation of nodeids,
where we need to decide if we care about performance or long-term security.
I suspect that we could use a much weaker hash (and faster) hash for
this purpose. In the long run, we'll be doing thousands of such hashes
for each file uploaded or downloaded (one per known peer).
This (compatibility-breaking) change moves much of the validation data and
encoding parameters out of the URI and into the so-called "thingA" block
(which will get a better name as soon as we find one we're comfortable with).
The URI retains the "storage_index" (a generalized term for the role that
we're currently using the verifierid for, the unique index for each file
that gets used by storage servers to decide which shares to return), the
decryption key, the needed_shares/total_shares counts (since they affect
peer selection), and the hash of the thingA block.
This shortens the URI and lets us add more kinds of validation data without
growing the URI (like plaintext merkle trees, to enable strong incremental
plaintext validation), at the cost of maybe 150 bytes of alacrity. Each
storage server holds an identical copy of the thingA block.
This is an incompatible change: new messages have been added to the storage
server interface, and the URI format has changed drastically.
Add a new method to RIIntroducer, to allow the central introducer node to
remove peers from the active set after they've gone away. Without this,
client nodes accumulate stale peer FURLs forever. This introduces a
compatibility break, as old introducers won't know about the 'lost_peers'
message, although the errors produced are probably harmless.
Add a new method to RIIntroducer, to allow the central introducer node to
remove peers from the active set after they've gone away. Without this,
client nodes accumulate stale peer FURLs forever. This introduces a
compatibility break, as old introducers won't know about the 'lost_peers'
message, although the errors produced are probably harmless.
This is a potentially disruptive and potentially ugly change to the code base,
because I renamed the object that serves in both roles from "Queen" to
"IntroducerAndVdrive", which is a bit of an ugly name.
However, I think that clarity is important enough in this release to make this
change. All unit tests pass. I'm now darcs recording this patch in order to
pull it to other machines for more testing.
It now takes a sequence of buffers instead of a single string for both encode and decode, and it also takes a separate sequence of shareids for decode instead of a sequence of tuples, and it returns a sequence of buffers instead of a single string.
Added metadata to the bucket store, which is used to hold the share number
(but the bucket doesn't know that, it just gets a string).
Modified the codec interfaces a bit.
Try to pass around URIs to/from download/upload instead of verifierids.
URI format is still in flux.
Change the current (primitive) file encoder to use a ReplicatingEncoder
because it provides ICodecEncoder. We will be moving to the (less primitive)
file encoder (currently in allmydata.encode_new) eventually, but for now
this change lets us test out PyRS or zooko's upcoming C-based RS codec in
something larger than a single unit test. This primitive file encoder only
uses a single segment, and has no merkle trees.
Also added allmydata.util.deferredutil for a DeferredList wrapper that
errbacks (but only when all component Deferreds have fired) if there were
any errors, which unfortunately is not a behavior available from the standard
DeferredList.