It is a Deferred that indicates when things have actually stopped. Failing to
return it means callers will assume everything stopped synchronously, which is
certainly not the case.
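A minimal sketch of the pattern, assuming the usual Twisted service shutdown
flow (the class and helper names here are illustrative, not the actual code):

```python
from twisted.application import service
from twisted.internet import defer

class ExampleService(service.Service):
    def _shutdown(self):
        # Stand-in for real teardown work; fires when everything has stopped.
        return defer.succeed(None)

    def stopService(self):
        service.Service.stopService(self)
        d = self._shutdown()
        # Returning the Deferred lets callers wait for the real shutdown;
        # dropping this return makes shutdown look synchronous.
        return d
```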
Remove old hypothesis tests
Fix allmydata.test.cli.test_cli.Errors.test_get
this was broken due to differing share placements, which we now need to allow.
Fix test_5_overdue_immutable
This change makes the test not depend on the value
of PYTHONHASHSEED.
Revert "Fix test_5_overdue_immutable"
This reverts commit 5f3696d9a53e7df8781a2c463c7112282397cd69.
fix test to actually hang the first 5 *servers*
sort keys for stable output
use file-context-managers
remove probably-unneeded assert (that fails sometimes)
another non-deterministic test?
The server-selection permutation uses SHA1 to combine the file's storage
index (known as the "peer selection index" in this context) and each server's
"server permutation seed". This is the only thing in Tahoe that uses SHA1.
With this change, we stop importing sha1 from anywhere else.
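A sketch of the idea (the helper and field names are illustrative, and the
exact byte ordering inside the hash is an assumption, not the actual
implementation):

```python
import hashlib

def permuted_server_order(peer_selection_index, servers):
    """Order servers by SHA1 of (permutation seed + peer-selection index).

    Each server's position depends on both its own seed and the file's
    storage index, so different files see different server orderings.
    """
    def key(server):
        seed = server["permutation-seed"]        # hypothetical field name
        return hashlib.sha1(seed + peer_selection_index).digest()
    return sorted(servers, key=key)
```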
The node now attempts to create Tor/I2P connection handlers (if the
right libraries are available), and will use them for tor/i2p FURL hints
by default. For now it only creates default handlers: there is not yet
any code to interpret the `[tor]`/`[i2p]` sections of tahoe.cfg which
would let you override that process.
The node also parses the `[connections]` section, allowing `tcp: tor` to
use Tor for all outbound TCP connections. It defaults to `tcp: tcp`, of
course.
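For illustration, a `tahoe.cfg` fragment along these lines (exact option
syntax may vary between releases):

```
[connections]
# send all outbound TCP connections over Tor (the default is "tcp = tcp")
tcp = tor
```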
Static storage-server connections will now honor the `connections:`
overrides in `servers.yaml`, allowing specific servers to use TCP where
they would normally be restricted to Tor.
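A hypothetical `servers.yaml` entry showing the kind of per-server override
meant here; the surrounding layout is an assumption, and only the
`connections:` override itself comes from the description above:

```yaml
storage:
  v0-exampleserverid:
    ann:
      anonymous-storage-FURL: pb://...
    connections:
      # this server may be reached over plain TCP even when Tor is the default
      tcp: tcp
```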
refs ticket:2788
refs ticket:517
This adds Node._create_tub(), which knows how to make a Tub with all the
right options and connection handlers that were specified in
tahoe.cfg (the connection handlers are disabled for now, but they'll get
implemented soon).
The new Node.create_main_tub() calls it. This main Tub is used:
* to connect to the Introducer
* to host the Helper (if enabled)
* to host the Storage Server (if enabled)
Node._create_tub() is also passed into the StorageFarmBroker, which
passes it into each NativeStorageServer, to create the (separate) Tub
for each server connection. _create_tub knows about the options, and
NativeStorageServer can override the connection handlers. This way we
don't need to pass tub options or default handlers into Client,
StorageFarmBroker, or NativeStorageServer.
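A minimal sketch of the shape of this change (simplified names, not the
actual Tahoe code): a single tub-maker closure carries the tub options and
default connection handlers, and callers may override handlers per server.

```python
from foolscap.api import Tub

def make_tub_maker(tub_options, default_handlers):
    def create_tub(handler_overrides=None):
        tub = Tub()
        for name, value in tub_options.items():
            tub.setOption(name, value)
        handlers = dict(default_handlers)
        handlers.update(handler_overrides or {})
        for hint_type, handler in handlers.items():
            # e.g. hint_type "tor" -> a Tor-capable connection handler
            tub.addConnectionHintHandler(hint_type, handler)
        return tub
    return create_tub
```

The main Tub and each NativeStorageServer's per-connection Tub would then be
built by calling the same maker, optionally with per-server handler overrides.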
A number of tests create NativeStorageServer objects: these were updated
to match the new arguments. test_storage_client was simplified because
we no longer need to mock out the Tub() constructor.
This contains several merged patches. Individual messages follow, latest first:
* Fix a warning from check-miscaptures.
* In retrieve.py, explicitly test whether a key is in self.servermap.proxies
rather than catching KeyError.
* Added a new comment to the MDMF version of the removed test, explaining
why the SDMF version was removed.
* Removed test_corrupt_all_block_hash_tree_late, since the entire block_hash_tree
is cached in the servermap for an SDMF file.
* Fixed several tests that require files larger than the servermap cache.
* Remove unused test_response_cache_memory_leak().
* Exercise the cache.
* Test infrastructure for counting cache misses on MDMF files.
* Removed the ResponseCache. Instead, the MDMFSlotReadProxy initialized
by ServerMap is kept around so Retrieve can access it. The ReadProxy
has a cache of the first 1000 bytes initially read from each share by
the ServerMap. We're able to satisfy a number of requests out of this
cache, so roundtrips are reduced from 84 to 60 in test_deepcheck_mdmf.
It is still unclear under what conditions the cache holds
fewer than 1000 bytes. Also, this breaks some existing unit tests that
depend on the internal behavior of ResponseCache.
* The servermap.proxies (a cache of SlotReadProxies) is now keyed
by (verinfo, serverid, shnum) rather than just (serverid, shnum); see
the sketch after this message.
* Minor cosmetic changes
* Added a test failure if the number of cache misses is too high.
Author: Andrew Miller <amiller@dappervision.com>
Signed-off-by: David-Sarah Hopwood <davidsarah@jacaranda.org>
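The sketch referenced above, showing the cache-lookup pattern in simplified
form (class and method names are illustrative, not the actual Tahoe classes):

```python
class ProxyCache:
    """Keep one read proxy per (verinfo, serverid, shnum)."""

    def __init__(self):
        self.proxies = {}

    def get_or_create(self, verinfo, serverid, shnum, make_proxy):
        key = (verinfo, serverid, shnum)
        # test membership explicitly rather than catching KeyError
        if key not in self.proxies:
            self.proxies[key] = make_proxy(serverid, shnum)   # cache miss
        return self.proxies[key]
```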
If a server did not respond to the pre-repair filecheck, but did respond
to the repair, that server was not correctly added to the
RepairResults.data["servers-responding"] list. (This resulted from a
buggy usage of DictOfSets.union() in filenode.py).
In addition, servers to which filecheck queries were sent, but which did not
respond, were incorrectly added to the servers-responding list
anyway. (This resulted from code in checker.py not paying attention
to the 'responded' flag).
The first bug was neatly masked by the second: it's pretty rare to have
a server suddenly start responding in the one-second window between a
filecheck and a subsequent repair, and if the server was around for the
filecheck, you'd never notice the problem. I only spotted the smelly
code while I was changing it for IServer cleanup purposes.
I added coverage to test_repairer.py for this. Trying to get that test
to fail before fixing the first bug is what led me to discover the
second bug. I also had to update test_corrupt_file_verno, since it was
incorrectly asserting that 10 servers responded, when in fact one of
them throws an error (but the second bug was causing it to be reported
anyway).
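A sketch of the corrected bookkeeping described above (function and attribute
names are hypothetical):

```python
def servers_responding(filecheck_results, repair_responders):
    """Collect every server that actually answered a query."""
    responded = set()
    for server, result in filecheck_results.items():
        if result.responded:             # skip servers that never answered the filecheck
            responded.add(server)
    responded.update(repair_responders)  # servers that answered during the repair count too
    return responded
```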
The remaining work is to write additional tests.
src/allmydata/test/no_network.py:
This supports tests in which servers leave the grid only to return with
their shares intact at a later time.
src/allmydata/test/test_mutable.py:
The UCWEs in the incident reports associated with #1628 all seem to be
associated with shares that the servermap knows about, but which aren't
accounted for during the publish process for whatever reason. Specifically,
it looks like the publisher is only capable of keeping track of a single
storage server for a given share. This makes the repair process worse than
it was pre-MDMF at updating all of the shares of a particular file to the
newest version, and can also cause spurious UCWEs. This test simulates such
a layout and fails if a UCWE is raised. We need to write another test to
ensure that all copies of a share are updated to the latest version (or
alter this test to do that), so that the test suite doesn't pass unless both
regressions are fixed.
We want the publisher to follow the existing share placement when uploading
a new version of a mutable file, and we don't want this test to pass unless
it does.
src/allmydata/mutable/publish.py:
Before this commit, the publisher only kept track of a single writer for
each share. This is insufficient to handle updates in which a single share
may live on multiple servers. At best, an update will modify only one of the
existing copies of a share instead of all of them. In other cases, the update
will encounter the other copies while publishing some other share, interpret
them as a sign of an uncoordinated update, and fail. Keeping track of all of
the writers helps ensure that every existing share is updated, and helps
avoid spurious uncoordinated write errors.
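A minimal sketch of the difference (illustrative names only, not the actual
Publish code):

```python
from collections import defaultdict

class PublishSketch:
    def __init__(self):
        # shnum -> list of writers, one per server holding a copy of that share
        self.writers = defaultdict(list)

    def add_writer(self, shnum, writer):
        # The pre-fix equivalent was `self.writers[shnum] = writer`, which
        # silently dropped all but one of the servers holding that share.
        self.writers[shnum].append(writer)

    def publish(self, shnum, new_share_data):
        for writer in self.writers[shnum]:
            writer.put_share(new_share_data)   # hypothetical writer API
```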