in trying to test my fix for the failure of the offloaded unit test on windows
(by closing the reader before unlinking the encoding file - which, perhaps
disturbingly, doesn't actually make a difference in my windows environment)
I was unable to, because the unit test failed every time with a connection lost
error.
after much more time than I'd like to admit, I eventually managed to
track that down to a part of the unit test which is supposed to be dropping
a connection. it looks like the exceptions that get thrown on unix, or at
least in all the specific environments brian tested in, for that dropped
connection are different from what is thrown on my box (which is running py2.4
and twisted 2.4.0, for reference). adding ConnectionLost to the list of
expected exceptions makes the test pass.
curiously, though, my test still logs a NotEnoughWritersError, and I'm not
currently able to fathom why that exception isn't leading to any overall
failure of the unit test itself.
for general interest, a large part of the time spent trying to track this down
was lost to the state of logging. I added a whole bunch of logging to try
and track down where the tests were failing, but then spent a bunch of time
searching in vain for that log output. as far as I can tell at this point
the unit tests are themselves logging to foolscap's log module, but that isn't
being directed anywhere, so all the tests' logging is being black holed.
* rename my_private_dir.cap to root_dir.cap
* move it into the private subdir
* change the cmdline argument "--root-uri=[private]" to "--dir-uri=[root]"
The underlying issue is recorded in #211: one corrupt share in a query
response will cause us to ignore the remaining shares in that response, even
if they are good. In our tests (with N=10 but only 5 peers), this can leave
us with too few shares to recover the file.
The temporary workaround is to use 10 peers, to make sure we never get
multiple shares per response. The real fix will be to fix the control flow.
This fixes #209.
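
A minimal sketch of the control-flow problem recorded in #211, next to the per-share handling a real fix would need. Every name here (`CorruptShareError`, `validate_share`, the dict-shaped response) is hypothetical and is not taken from the codebase:

```python
class CorruptShareError(Exception):
    """Hypothetical error raised when a share fails validation."""


def validate_share(data):
    # Stand-in check for illustration only: empty data counts as corrupt.
    if not data:
        raise CorruptShareError("share failed validation")
    return data


def accept_until_first_corrupt(response):
    # Behaviour described in #211: the first corrupt share ends processing,
    # so the remaining (possibly good) shares in the response are ignored.
    good = {}
    for sharenum, data in sorted(response.items()):
        try:
            good[sharenum] = validate_share(data)
        except CorruptShareError:
            break
    return good


def accept_each_share(response):
    # Per-share handling: a corrupt share is skipped, the rest are kept.
    good = {}
    for sharenum, data in sorted(response.items()):
        try:
            good[sharenum] = validate_share(data)
        except CorruptShareError:
            continue
    return good
```

With N=10 and only 5 peers, each response carries more than one share, which is why the first variant can leave too few shares to recover the file.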
* use new decentralized directories everywhere instead of old centralized directories
* provide UI to them through the web server
* provide UI to them through the CLI
* update unit tests to simulate decentralized mutable directories in order to test other components that rely on them
* remove the notion of a "vdrive server" and a client thereof
* remove the notion of a "public vdrive", which was a directory that was centrally published/subscribed automatically by the tahoe node (you can accomplish this manually by making a directory and posting the URL to it on your web site, for example)
* add a notion of "wait_for_numpeers" when you need to publish data to peers, which is how many peers should be attached before you start. The default is 1.
* add __repr__ for filesystem nodes (note: these reprs contain a few bits of the secret key!)
* fix a few bugs where we used to equate "mutable" with "not read-only". Nowadays all directories are mutable, but some might be read-only (to you).
* fix a few bugs where code wasn't aware of the new general-purpose metadata dict that comes with each filesystem edge
* sundry fixes to unit tests to adjust to the new directories, e.g. don't assume that every share on disk belongs to a chk file.
It turns out that we actually have *two* files in our storage servers at the
time that test_vdrive asserts things about the shares. I suppose that
test_vdrive happens to pass on all other operating systems because the
filesystem happens to return the right share as the first one in a
"listdir()". The fix in this patch is slightly kludgey -- allow either share
to pass -- but good enough.
By writing something like "25 75 100" into a file named 'encoding_parameters'
in the central Introducer's base directory, all clients which use that
introducer will be advised to use 25-out-of-100 encoding for files (i.e.
100 shares will be produced, 25 are required to reconstruct, and the upload
process will be happy if it can find homes for at least 75 shares). The
default values are "3 7 10". For small meshes, the defaults are probably
good, but for larger ones it may be appropriate to increase the number of
shares.
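
As a rough illustration of the file format described above, the three numbers could be parsed like this. The function name and the ordering check are assumptions made for the sketch, not the Introducer's actual code:

```python
def parse_encoding_parameters(text):
    """Parse text such as "25 75 100" into (needed, happy, total).

    needed: shares required to reconstruct the file
    happy:  shares that must find homes for the upload to be happy
    total:  shares produced
    """
    fields = text.split()
    if len(fields) != 3:
        raise ValueError("expected three integers, got %r" % (text,))
    needed, happy, total = [int(field) for field in fields]
    # Assumption: a sane configuration satisfies needed <= happy <= total.
    if not (1 <= needed <= happy <= total):
        raise ValueError("inconsistent parameters: %r" % (text,))
    return needed, happy, total


DEFAULT_PARAMETERS = parse_encoding_parameters("3 7 10")
```

Calling `parse_encoding_parameters("25 75 100")` yields the 25-out-of-100 encoding with a happiness threshold of 75.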
If the error occurs before any data has been sent, we can give a sensible
error message (code 500, stack trace, etc). This will cover most of the error
cases. The ones that aren't covered are when we run out of good peers after
successfully decoding the first segment, either because they go away or
because their shares are corrupt.
Previously, exceptions during a web download caused a hang rather than some
kind of exception or error message. This patch improves the situation by
terminating the HTTP download rather than letting it hang forever. The
behavior still isn't ideal, however, because the error can occur too late to
abort the HTTP request cleanly (i.e. with an error code). In fact, the
Content-Type header and response code have already been set by the time any
download errors have been detected, so the browser is committed to displaying
an image or whatever (thus any error message we put into the stream is
unlikely to be displayed in a meaningful way).
These allow client-side code to conveniently retrieve the IDirectoryNode
instances for both the global shared public root directory, and the per-user
private root directory.
The only SHA-1 hash that remains is used in the permutation of nodeids,
where we need to decide if we care about performance or long-term security.
I suspect that we could use a much weaker (and faster) hash for
this purpose. In the long run, we'll be doing thousands of such hashes
for each file uploaded or downloaded (one per known peer).
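
To show where the cost comes from, here is a sketch of a per-file peer permutation that computes one SHA-1 per known peer. The function name, the `key` argument, and the choice to hash `peerid + key` are assumptions for illustration; the exact hash input used by the real code is not stated here:

```python
import hashlib


def permute_peers(peerids, key):
    """Return peerids in a per-file pseudorandom order.

    peerids: iterable of bytes node identifiers
    key:     bytes value that varies per file (hypothetical)
    """
    # One SHA-1 per known peer: this is the work that would be repeated
    # thousands of times per upload or download on a large mesh.
    return sorted(peerids,
                  key=lambda peerid: hashlib.sha1(peerid + key).digest())
```

Swapping `hashlib.sha1` for a cheaper function would change only the sort key, which is why the choice is mostly a trade-off between speed and long-term security.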
This (compatibility-breaking) change moves much of the validation data and
encoding parameters out of the URI and into the so-called "thingA" block
(which will get a better name as soon as we find one we're comfortable with).
The URI retains the "storage_index" (a generalized term for the role that
we're currently using the verifierid for: the unique index for each file
that gets used by storage servers to decide which shares to return), the
decryption key, the needed_shares/total_shares counts (since they affect
peer selection), and the hash of the thingA block.
This shortens the URI and lets us add more kinds of validation data without
growing the URI (like plaintext merkle trees, to enable strong incremental
plaintext validation), at the cost of maybe 150 bytes of alacrity. Each
storage server holds an identical copy of the thingA block.
This is an incompatible change: new messages have been added to the storage
server interface, and the URI format has changed drastically.
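
A sketch of the split described above: the URI carries only a few fields plus a hash of the thingA block, and the block fetched from a storage server is checked against that hash. The `CompactURI` type, the helper names, and the use of SHA-256 are assumptions for illustration, not the actual URI format:

```python
import hashlib
from collections import namedtuple

# Hypothetical container for the fields the URI retains.
CompactURI = namedtuple("CompactURI", [
    "storage_index",   # index storage servers use to find shares
    "key",             # decryption key
    "needed_shares",   # affects peer selection
    "total_shares",    # affects peer selection
    "thingA_hash",     # hash of the thingA block
])


def make_uri(storage_index, key, needed_shares, total_shares, thingA_block):
    # Assumption: SHA-256 over the serialized block.
    digest = hashlib.sha256(thingA_block).digest()
    return CompactURI(storage_index, key, needed_shares, total_shares, digest)


def check_thingA(uri, thingA_block):
    # A downloader fetches the block from any storage server (each holds an
    # identical copy) and accepts it only if it matches the hash in the URI.
    return hashlib.sha256(thingA_block).digest() == uri.thingA_hash
```

Because only the hash lives in the URI, more validation data can be added to the block later without making the URI longer.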
It does indeed take longer than 2400 seconds to run test_upload_and_download on a virtual windows machine when the underlying real machine is heavily loaded down with filesystem analysis runs...
This is a potentially disruptive and potentially ugly change to the code base,
because I renamed the object that serves in both roles from "Queen" to
"IntroducerAndVdrive", which is a bit of an ugly name.
However, I think that clarity is important enough in this release to make this
change. All unit tests pass. I'm now darcs recording this patch in order to
pull it to other machines for more testing.