The wait_for_connections() method, which is used at the start of
test_system to make sure that all the clients are connected to all the
servers, did not also wait for clients to be connected to their Helpers.
Every once in a while, the helper connection would take a bit longer,
and then
test_system.SystemTest.test_filesystem._test_web._got_welcome_helper
would fail, because we'd check for a helper connection before it was
ready.
The fix is to modify wait_for_connections's polling predicate to look
for helper connections (if configured) as well as the regular
introducer- and server- connections.
Tested by temporarily adding a large (30s) delay to the connectTo() call
in Uploader.startService, simulating a long helper
connection-establishment delay. This makes the test fail consistently.
Then I fixed wait_for_connections(), and the test passed (slowly). Then
I removed the delay.
Closes#1467
Improve the column headers to make it clear that this list shows Tub
IDs. (we can't show pubkey-based serverids because clients don't give
those to us: only servers provide pubkeys). This should be the only
place in the whole webapi that shows TubIDs for modern (V2-introducer)
nodes.
This makes it easy to distinguish between old V1-Introducer
nodes (identified by their Foolscap TubID) and new V2 nodes (identified
by their ed25519 pubkey).
This fixes a few places where we used to display a tubid even if we had
a pubkey, making it hard to visually correlate servers in two different
displays. It also cleans up the way we pass serverids to the JS-based
download timeline.
The "introweb" subscribed-clients list still shows tubids.
The _upload_resumable() test interrupts a Helper upload partway
through (by shutting down the Helper), then restarts the Helper and
resumes the upload. The control flow is kind of tricky: to do anything
"partway through" requires adding a hook to the Uploadable. The previous
flow depended upon a (fragile) call to self.stall(), which waits a fixed
number of seconds.
This removes one of those stall() calls (the remainder is in
test/common.py and I'll try removing it in a subsequent revision). It
also removes some now-redundant wait_for_connections() calls, since
bounce_client() doesn't fire its Deferred until the client has finished
coming back up (and uses wait_for_connections() internally to do so).
There was one corner case (where the client disconnects at just the
wrong time) that could have dropped a Deferred, leading to an Unhandled
Error. Clean up the control flow to avoid this case.
This prepares for invitation-based reciprocal-permission Accounting. In
the scheme I'm developing, nodes publish "I accept shares from Y"
messages, which are assembled into a graph, and server will accept
shares from any client node reachable in this graph. For this to work,
the serverX->clientY edge must be connectable to the serverY->clientZ
edge, which means "clientY" and "serverY" must be connected. If clientY
and serverY are two distinct keys, they must be cross-signed. Life is
easier if there's just one key "Y", rather than distinct client- and
server- keys. Calling this one key "server.privkey" would be confusing.
"node.privkey" and "node.pubkey" makes more sense.
One-server-per-node is a pretty easy restriction. Originally I was
thinking that the client.key should be provided in each webapi call,
just like a filecap is, making a single node useable by multiple users
(Accounting principals), and not providing any ambient storage
authority. But I've been unable to think of a comfortable WUI for
that (at least without requiring javascript), nor a friendly way to
transfer account authority (e.g. writecaps that include storage
authority). So I'm more willing to have one-client-per-node these days.
(and note that this rename doesn't seriously preclude
many-clients-per-node or zero-clients-per-node anyways, it just makes
one-client-per-node less awkward)
DeepResultsBase also has a get_corrupt_shares(), and it is populated
from CheckResults.get_corrupt_shares(). It has been updated too, along
with get_remaining_corrupt_shares().
Remove temporary get_new_corrupt_shares() and
get_new_incompatible_shares().
This changes all code which feeds CheckResults(sharemap=) to provide
IServer instances, but CheckResults converts these to old-style
serverids during output, so downstream code doesn't have to change yet.
It adds a temporary get_new_sharemap(), which *does* return IServer
instances, so the immutable repairer can build new CheckResults from an
old one. This will go away when get_sharemap() is updated to return
IServer (and downstream code is updated too).
i.e. change set_data() to accept lots of parameters, instead of taking
a single dictionary with lots of keys. Also Convert all CheckResults
creators to use it.
The goal is to make CheckResults more strongly typed, and remove the
ambiguous ".data" field in favor of a bunch of specific counters and
sharelists, so I can changes .sharemap and .servermap to use IServer
instances instead of string serverids. By cleaning this up first, I hope
to get that task done with less debugging.
The Fake*Node classes in test/common.py were accumulating share data in
a class-level dictionary, which persisted from one test run to the next.
As a result, running test_web.py over and over (with trial's
--until-failure feature) made this dictionary grow without bound,
eventually running out of memory.
This fix moves that dictionary into the FakeClient built fresh for each
test, so it doesn't build up. It does the same thing for "file_types",
which was much smaller but still lived at the class level.
Closes#1729