The filesystem which gets my vote for most undeservedly popular is ext3, and it has a hard limit of 32,000 entries in a directory. Many other filesystems (even ones that I like more than I like ext3) have either hard limits or bad performance consequences or weird edge cases when you get too many entries in a single directory.
This patch makes it so that there is a layer of intermediate directories between the "shares" directory and the actual storage-index directory (the one whose name contains the entire storage index (z-base-32 encoded) and which contains one or more share files named by their share number).
The intermediate directories are named by the first 14 bits of the storage index, which means there are at most 16384 of them. (This also means that the intermediate directory names are not a leading prefix of the storage-index directory names -- to do that would have required us to have intermediate directories limited to either 1024 (2-char), which is too few, or 32768 (3-chars of a full 5 bits each), which would overrun ext3's funny hard limit of 32,000.))
This closes#150, and please see the "convertshares.py" script attached to #150 to convert your old tahoe-0.7.0 storage/shares directory into a new tahoe-0.8.0 storage/shares directory.
in trying to test my fix for the failure of the offloaded unit test on windows
(by closing the reader before unlinking the encoding file - which, perhaps
disturbingly doesn't actually make a difference in my windows environment)
I was unable too because the unit test failed every time with a connection lost
error.
after much more time than I'd like to admit it took, I eventually managed to
track that down to a part of the unit test which is supposed to be be dropping
a connection. it looks like the exceptions that get thrown on unix, or at
least all the specific environments brian tested in, for that dropped
connection are different from what is thrown on my box (which is running py2.4
and twisted 2.4.0, for reference) adding ConnectionLost to the list of
expected exceptions makes the test pass.
though curiously still my test logs a NotEnoughWritersError error, and I'm not
currently able to fathom why that exception isn't leading to any overall
failure of the unit test itself.
for general interest, a large part of the time spent trying to track this down
was lost to the state of logging. I added a whole bunch of logging to try
and track down where the tests were failing, but then spent a bunch of time
searching in vain for that log output. as far as I can tell at this point
the unit tests are themselves logging to foolscap's log module, but that isn't
being directed anywhere, so all the test's logging is being black holed.
* rename my_private_dir.cap to root_dir.cap
* move it into the private subdir
* change the cmdline argument "--root-uri=[private]" to "--dir-uri=[root]"
The underlying issue is recorded in #211: one corrupt share in a query
response will cause us to ignore the remaining shares in that response, even
if they are good. In our tests (with N=10 but only 5 peers), this can leave
us with too few shares to recover the file.
The temporary workaround is to use 10 peers, to make sure we never get
multiple shares per response. The real fix will be to fix the control flow.
This fixes#209.
* use new decentralized directories everywhere instead of old centralized directories
* provide UI to them through the web server
* provide UI to them through the CLI
* update unit tests to simulate decentralized mutable directories in order to test other components that rely on them
* remove the notion of a "vdrive server" and a client thereof
* remove the notion of a "public vdrive", which was a directory that was centrally published/subscribed automatically by the tahoe node (you can accomplish this manually by making a directory and posting the URL to it on your web site, for example)
* add a notion of "wait_for_numpeers" when you need to publish data to peers, which is how many peers should be attached before you start. The default is 1.
* add __repr__ for filesystem nodes (note: these reprs contain a few bits of the secret key!)
* fix a few bugs where we used to equate "mutable" with "not read-only". Nowadays all directories are mutable, but some might be read-only (to you).
* fix a few bugs where code wasn't aware of the new general-purpose metadata dict the comes with each filesystem edge
* sundry fixes to unit tests to adjust to the new directories, e.g. don't assume that every share on disk belongs to a chk file.
It turns out that we actually have *two* files in our storage servers at the
time that test_vdrive asserts things about the shares. I suppose that
test_vdrive happens to pass on all other operating systems because the
filesystem happens to return the right share as the first one in a
"listdir()". The fix in this patch is slightly kludgey -- allow either share
to pass -- but good enough.
By writing something like "25 75 100" into a file named 'encoding_parameters'
in the central Introducer's base directory, all clients which use that
introducer will be advised to use 25-out-of-100 encoding for files (i.e.
100 shares will be produced, 25 are required to reconstruct, and the upload
process will be happy if it can find homes for at least 75 shares). The
default values are "3 7 10". For small meshes, the defaults are probably
good, but for larger ones it may be appropriate to increase the number of
shares.
If the error occurs before any data has been sent, we can give a sensible
error message (code 500, stack trace, etc). This will cover most of the error
cases. The ones that aren't covered are when we run out of good peers after
successfully decoding the first segment, either because they go away or
because their shares are corrupt.
Previously, exceptions during a web download caused a hang rather than some
kind of exception or error message. This patch improves the situation by
terminating the HTTP download rather than letting it hang forever. The
behavior still isn't ideal, however, because the error can occur too late to
abort the HTTP request cleanly (i.e. with an error code). In fact, the
Content-Type header and response code have already been set by the time any
download errors have been detected, so the browser is committed to displaying
an image or whatever (thus any error message we put into the stream is
unlikely to be displayed in a meaningful way).
These allow client-side code to conveniently retrieve the IDirectoryNode
instances for both the global shared public root directory, and the per-user
private root directory.
The only SHA-1 hash that remains is used in the permutation of nodeids,
where we need to decide if we care about performance or long-term security.
I suspect that we could use a much weaker hash (and faster) hash for
this purpose. In the long run, we'll be doing thousands of such hashes
for each file uploaded or downloaded (one per known peer).