Commit Graph

3283 Commits

Author SHA1 Message Date
Brian Warner
b2dcbbb62d test/common.py: fix race condition waiting for the helper connection
The wait_for_connections() method, which is used at the start of
test_system to make sure that all the clients are connected to all the
servers, did not also wait for clients to be connected to their Helpers.
Every once in a while, the helper connection would take a bit longer,
and then
test_system.SystemTest.test_filesystem._test_web._got_welcome_helper
would fail, because we'd check for a helper connection before it was
ready.

The fix is to modify wait_for_connections's polling predicate to look
for helper connections (if configured) as well as the regular
introducer- and server- connections.

Tested by temporarily adding a large (30s) delay to the connectTo() call
in Uploader.startService, simulating a long helper
connection-establishment delay. This makes the test fail consistently.
Then I fixed wait_for_connections(), and the test passed (slowly). Then
I removed the delay.

Closes #1467
2012-06-14 12:18:35 -07:00
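
The polling predicate described in the commit above could look roughly like the following. This is a minimal synchronous sketch, not the project's code: the real test suite is Twisted-based, and `FakeClient`, `all_connected`, `poll`, and the attribute names are hypothetical stand-ins invented for illustration.

```python
import time


class FakeClient:
    """Hypothetical stand-in for a client node under test."""

    def __init__(self, servers_connected, helper_configured, helper_connected):
        self.servers_connected = servers_connected
        self.helper_configured = helper_configured
        self.helper_connected = helper_connected


def all_connected(clients):
    """Predicate: True only when every client has all of its connections."""
    for c in clients:
        if not c.servers_connected:
            return False
        # The fix: a configured helper must also be connected.
        if c.helper_configured and not c.helper_connected:
            return False
    return True


def poll(predicate, interval=0.01, timeout=1.0):
    """Call predicate() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()
```

A client whose helper is configured but not yet connected keeps the predicate false, so a test that polls on it would not proceed until the helper connection is ready.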
david-sarah
daa24bce8b Clarify documentation of RIStorageServer.slot_testv_and_readv_and_writev. fixes #1744 2012-06-13 16:51:35 +00:00
Brian Warner
ef2db56104 introweb: the Subscribed Clients list shows tubids, not serverids
Improve the column headers to make it clear that this list shows Tub
IDs. (we can't show pubkey-based serverids because clients don't give
those to us: only servers provide pubkeys). This should be the only
place in the whole webapi that shows TubIDs for modern (V2-introducer)
nodes.
2012-06-12 14:37:27 -07:00
Brian Warner
c073565ecc Display serverids consistently as 8-char pubkey, or 6-char tubid.
This makes it easy to distinguish between old V1-Introducer
nodes (identified by their Foolscap TubID) and new V2 nodes (identified
by their ed25519 pubkey).

This fixes a few places where we used to display a tubid even if we had
a pubkey, making it hard to visually correlate servers in two different
displays. It also cleans up the way we pass serverids to the JS-based
download timeline.

The "introweb" subscribed-clients list still shows tubids.
2012-06-12 14:30:34 -07:00
Brian Warner
51a5aaa639 test_system.py: wait for the Helper connection properly before uploading 2012-06-11 23:19:30 -07:00
Brian Warner
5e432b12a1 test_system.py: clean up control flow, reduce use of stall()
The _upload_resumable() test interrupts a Helper upload partway
through (by shutting down the Helper), then restarts the Helper and
resumes the upload. The control flow is kind of tricky: to do anything
"partway through" requires adding a hook to the Uploadable. The previous
flow depended upon a (fragile) call to self.stall(), which waits a fixed
number of seconds.

This removes one of those stall() calls (the remainder is in
test/common.py and I'll try removing it in a subsequent revision). It
also removes some now-redundant wait_for_connections() calls, since
bounce_client() doesn't fire its Deferred until the client has finished
coming back up (and uses wait_for_connections() internally to do so).
2012-06-11 18:22:35 -07:00
Brian Warner
82519ef93f test_system.py: fix minor typo 2012-06-11 18:16:36 -07:00
Brian Warner
a809e4caba offloaded.py: don't drop the Deferred
There was one corner case (where the client disconnects at just the
wrong time) that could have dropped a Deferred, leading to an Unhandled
Error. Clean up the control flow to avoid this case.
2012-06-11 18:16:02 -07:00
Brian Warner
e1093cbb33 introducer: add sequence-numbers to announcements, ignore replays
This will support revocation of Accounting recommendation records,
assuming the gossip-based broadcast channel isn't easily jammed.
2012-06-10 19:10:22 -07:00
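
Ignoring replayed announcements by sequence number can be sketched as below. The commit does not describe the announcement format or how announcements are keyed, so `AnnouncementCache`, `accept`, and the `(key, seqnum, announcement)` shape are assumptions made purely for illustration.

```python
class AnnouncementCache:
    """Hypothetical cache that keeps only the newest announcement per key."""

    def __init__(self):
        self._latest = {}  # key -> (seqnum, announcement)

    def accept(self, key, seqnum, announcement):
        """Return True if the announcement is new, False if it is a replay."""
        old = self._latest.get(key)
        if old is not None and seqnum <= old[0]:
            # Same or older sequence number: treat as a replay and ignore it.
            return False
        self._latest[key] = (seqnum, announcement)
        return True

    def get(self, key):
        """Return the newest accepted announcement for key, or None."""
        entry = self._latest.get(key)
        return None if entry is None else entry[1]
```

Because only strictly larger sequence numbers are accepted, a newer record (for example a revocation) cannot be displaced by re-sending an older one.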
Brian Warner
bf416af49e client.py: rename "server key" to "node key", use old name if present
This prepares for invitation-based reciprocal-permission Accounting. In
the scheme I'm developing, nodes publish "I accept shares from Y"
messages, which are assembled into a graph, and a server will accept
shares from any client node reachable in this graph. For this to work,
the serverX->clientY edge must be connectable to the serverY->clientZ
edge, which means "clientY" and "serverY" must be connected. If clientY
and serverY are two distinct keys, they must be cross-signed. Life is
easier if there's just one key "Y", rather than distinct client- and
server- keys. Calling this one key "server.privkey" would be confusing.
"node.privkey" and "node.pubkey" makes more sense.

One-server-per-node is a pretty easy restriction. Originally I was
thinking that the client.key should be provided in each webapi call,
just like a filecap is, making a single node useable by multiple users
(Accounting principals), and not providing any ambient storage
authority. But I've been unable to think of a comfortable WUI for
that (at least without requiring javascript), nor a friendly way to
transfer account authority (e.g. writecaps that include storage
authority). So I'm more willing to have one-client-per-node these days.

(and note that this rename doesn't seriously preclude
many-clients-per-node or zero-clients-per-node anyways, it just makes
one-client-per-node less awkward)
2012-06-10 18:14:55 -07:00
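
The "reachable in this graph" rule from the commit above can be illustrated with a breadth-first search. This is only a sketch of the general idea: the Accounting scheme was still under development, and the `accepts` mapping and the `reachable` function are hypothetical names, with one key per node as the commit proposes.

```python
from collections import deque


def reachable(accepts, start):
    """Return every node reachable from start by following accept edges.

    accepts maps a node key to the set of node keys it has published
    "I accept shares from Y" messages for.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in accepts.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # Exclude the starting node itself from the result.
    seen.discard(start)
    return seen
```

With a single key per node, the edge X->Y connects directly to Y->Z, so Z is reachable from X without any cross-signing step.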
Brian Warner
26d3869076 node.py: add get_private_config()
Also add tests for this and the pre-existing private-config methods.
2012-06-10 17:46:38 -07:00
Brian Warner
518e4cec98 Fix text in Publish Status results. Closes #1762. 2012-06-08 15:21:46 -07:00
Brian Warner
188c7fecf5 CheckResults corrupt/incompatible shares now return IServers
DeepResultsBase also has a get_corrupt_shares(), and it is populated
from CheckResults.get_corrupt_shares(). It has been updated too, along
with get_remaining_corrupt_shares().

Remove temporary get_new_corrupt_shares() and
get_new_incompatible_shares().
2012-06-02 11:39:12 -07:00
Brian Warner
da9ac55294 CheckResults.get_servers_responding() now returns IServers
Remove temporary get_new_servers_responding().
2012-06-02 11:39:12 -07:00
Brian Warner
957a5315aa CheckResults.get_sharemap() now returns IServers
Remove temporary get_new_sharemap().
2012-06-02 11:39:11 -07:00
Brian Warner
76fca000df CheckResults: pass IServer to corrupt/incompatible share locators
Getters still return serverid. Adds temporary get_new_corrupt_shares()
and get_new_incompatible_shares().
2012-06-02 11:39:11 -07:00
Brian Warner
dd8178ee6d CheckResults: pass IServer to servers_responding=, getter returns serverid
Add temporary get_new_servers_responding().
2012-06-02 11:39:11 -07:00
Brian Warner
a4c95609c7 CheckResults: pass IServer to sharemap=, but get_sharemap() returns serverids
This changes all code which feeds CheckResults(sharemap=) to provide
IServer instances, but CheckResults converts these to old-style
serverids during output, so downstream code doesn't have to change yet.

It adds a temporary get_new_sharemap(), which *does* return IServer
instances, so the immutable repairer can build new CheckResults from an
old one. This will go away when get_sharemap() is updated to return
IServer (and downstream code is updated too).
2012-06-02 11:39:11 -07:00
Brian Warner
c03b6aff97 CheckResults: internal cleanup
replace the one-big-dictionary with normal private attributes
2012-06-02 11:39:11 -07:00
Brian Warner
437de4340b CheckResults: privatize remaining attributes 2012-06-02 11:39:10 -07:00
Brian Warner
0fcc054a61 CheckResults: use fat init, add type-checking assertions
Added assertions for sharemap, servermap, servers_responding,
list_corrupt_shares, and list_incompatible_shares.
2012-06-02 11:39:10 -07:00
Brian Warner
fba26d7bae mutable/checker: refactor to make CheckResults easier to change 2012-06-02 11:39:10 -07:00
Brian Warner
8daacbcf69 CheckResults: replace get_data() with as_dict(), use getters in web status 2012-06-02 11:39:10 -07:00
Brian Warner
4867dca3f0 use the new CheckResult getters almost everywhere
The remaining get_data() calls are either in
web.check_results.json_check_results(), or functioning as repr()s in
various unit test failure cases.
2012-06-02 11:39:10 -07:00
Brian Warner
1883d393c6 CheckResults: replace get_data() with a bunch of individual getters 2012-06-02 11:39:10 -07:00
Brian Warner
ccfcd4de37 change CheckResults to use a fat set_data()
i.e. change set_data() to accept lots of parameters, instead of taking
a single dictionary with lots of keys. Also convert all CheckResults
creators to use it.
2012-06-02 11:39:10 -07:00
Brian Warner
d446897282 CheckResults: simplify self._data 2012-06-02 11:39:09 -07:00
Brian Warner
e313cf6406 CheckResults: start hiding .data, first step to clean it up
The goal is to make CheckResults more strongly typed, and remove the
ambiguous ".data" field in favor of a bunch of specific counters and
sharelists, so I can change .sharemap and .servermap to use IServer
instances instead of string serverids. By cleaning this up first, I hope
to get that task done with less debugging.
2012-06-02 11:39:09 -07:00
Brian Warner
17c5384f79 immutable.CiphertextFileNode.check_and_repair: simplify for refactoring
There were too many nested functions here, making some upcoming changes
too difficult, so let's refactor it first.
2012-06-02 11:39:09 -07:00
david-sarah
2ee1bc7148 Catch exceptions from CLI in order to prevent the Ubuntu crash monolog from triggering. refs #1746 2012-05-20 15:35:29 +00:00
Brian Warner
3ba77925d9 node.py: stop stripping whitespace in write_private_config()
It's nice to add newlines to the saved file, so 'cat' is easy to use. We
still strip on the input side, in get_or_create_private_config().
2012-05-30 00:17:55 -07:00
Brian Warner
bfee999e20 test_web.py: fix memory leak when run with --until-failure
The Fake*Node classes in test/common.py were accumulating share data in
a class-level dictionary, which persisted from one test run to the next.
As a result, running test_web.py over and over (with trial's
--until-failure feature) made this dictionary grow without bound,
eventually running out of memory.

This fix moves that dictionary into the FakeClient built fresh for each
test, so it doesn't build up. It does the same thing for "file_types",
which was much smaller but still lived at the class level.

Closes #1729
2012-05-22 15:39:49 -07:00
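
The difference between a class-level dictionary and one owned by a per-test object can be shown in a few lines. The class names below echo the commit message, but the bodies and the `all_contents` attribute name are assumptions made for illustration, not the project's code.

```python
class LeakyFakeNode:
    # Class-level dictionary: shared by every instance, so it survives
    # from one test run to the next.
    all_contents = {}

    def __init__(self, name, data):
        self.all_contents[name] = data


class FakeClient:
    """Built fresh for each test, so its dictionary starts out empty."""

    def __init__(self):
        self.all_contents = {}


class FixedFakeNode:
    def __init__(self, client, name, data):
        # Store share data on the per-test client instead of the class.
        client.all_contents[name] = data


def run_leaky_tests(n):
    """Simulate n test runs; return the size of the shared dictionary."""
    for i in range(n):
        LeakyFakeNode("file-%d" % i, b"x")
    return len(LeakyFakeNode.all_contents)


def run_fixed_tests(n):
    """Simulate n test runs; return the dictionary size seen by each run."""
    sizes = []
    for i in range(n):
        client = FakeClient()
        FixedFakeNode(client, "file-%d" % i, b"x")
        sizes.append(len(client.all_contents))
    return sizes
```

In the leaky version the dictionary grows with every run, while in the fixed version each run sees only its own data.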
Brian Warner
bcdfb5802e test/check_memory.py: oops, fix one last ur.uri -> ur.get_uri() 2012-05-22 08:50:36 -07:00
Brian Warner
3a1c02cfdf change UploadResults to return IServers, update users to match
This finally changes all callers of get_servermap()/get_sharemap() to
accept IServers, and changes UploadResults to provide them.
2012-05-21 21:18:37 -07:00
Brian Warner
843739486a UploadResults: store IServers internally, but still return serverids
This stores IDisplayableServer-providing instances (StubServers or
NativeStorageServers) in the .servermap and .sharemap dictionaries. But
get_servermap()/get_sharemap() still return data structures with
serverids, not IServers, by translating their data on the way out. This
lets us put off changing the callers for a little bit longer.
2012-05-21 21:18:25 -07:00
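
Storing server objects internally while still returning serverids from the getters might look like this. It is a rough sketch: `UploadResultsSketch` is a made-up class, and the `get_serverid()` method on the stub is an assumption, since the commit only names `.get_name()` "and friends".

```python
class StubServer:
    """Hypothetical minimal server object that only knows its serverid."""

    def __init__(self, serverid):
        self._serverid = serverid

    def get_serverid(self):
        return self._serverid


class UploadResultsSketch:
    def __init__(self, sharemap):
        # Internal form: share number -> set of server objects.
        self._sharemap = sharemap

    def get_sharemap(self):
        # Translate on the way out, so existing callers still see serverids.
        return {
            shnum: {server.get_serverid() for server in servers}
            for shnum, servers in self._sharemap.items()
        }
```

Callers keep working against the old return type, and the getter can later be switched to return the server objects themselves once the callers are updated.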
Brian Warner
97a1eb6ebf split IDisplayableServer from IServer, add sb.get_stub_server()
IDisplayableServer includes just enough functionality to call
.get_name() and friends, which is all that the UploadResults really
need. IServer is a superset that includes actual share-manipulation
methods. StubServer instances provide only IDisplayableServer, while
actual NativeStorageServer instances provide the full IServer interface.

When the Helper sends a serverid (so we know what to call the server but
nothing else about it, and have no corresponding NativeStorageServer
object to reference), but we want to store an IDisplayableServer in the
UploadResults, we create a synthetic StubServer "server" and store that
instead.
2012-05-21 21:17:27 -07:00
Brian Warner
3d771132a8 switch UploadResults to use get_uri(), hide internal ._uri
Complete the getter-based transformation, by hiding ".uri" and updating
callers to use get_uri(). Also don't set a dummy self._uri, leave it
undefined until someone calls set_uri().
2012-05-21 21:14:44 -07:00
Brian Warner
29b11531b5 switch UploadResults to use getters, hide internal data, for all but .uri
This hides attributes with e.g. _sharemap, and creates getters like
get_sharemap() to access them, for every field except .uri . This will
make it easier to modify the internal representation of .sharemap
without requiring callers to adjust quite yet.

".uri" has so many users that it seemed better to update it in a
subsequent patch.
2012-05-21 21:14:28 -07:00
Brian Warner
08f5bc8e2f convert UploadResults to a fat init
Populate most of UploadResults (except .uri, which is learned later when
using a Helper) in the constructor, instead of allowing creators to
write to attributes later. This will help isolate the fields that we
want to change to use IServers.
2012-05-21 21:14:14 -07:00
Brian Warner
b71234c538 add HelperUploadResults
This splits the pb.Copyable on-wire object (HelperUploadResults) out
from the local results object (UploadResults). To maintain compatibility
with older Helpers, we have to leave pb.Copyable classes alone and
unmodified, but we want to change UploadResults to use IServers instead
of serverids. So by using a different class on the wire, and translating
to/from it on either end, we can accomplish both.
2012-05-21 21:14:00 -07:00
Brian Warner
b3af012b13 Uploader cleanup: create results at end, not beginning
This will make it easier to populate the UploadResults during __init__,
instead of doing it one-field-at-a-time later.
2012-05-21 21:13:47 -07:00
Brian Warner
0df833eac9 clean up Helper to make later changes easier
Fix up control flow inside the Helper, to make it more friendly for
later refactoring.
2012-05-21 21:13:32 -07:00
Brian Warner
e60982c851 helper: remove timings["existence_check"], aka "Already-In-Grid Check"
This measured how long the Helper took to do a filecheck before asking
for ciphertext. The "Contacting Helper" report includes both
existence_check and the client-helper RTT.

For non-overlapping uploads, it was being returned correctly. But when
multiple upload requests overlapped, and the file was not already in the
grid, the filecheck would only run once, and its existence_check time
would be reported for all uploaders (even if they didn't have to wait
for that time). Cleaning that up proved too difficult: the only correct
place to report this time is from the initial remote_upload_chk() call,
but the return value of that is too constrained to accommodate it in the
needs-upload case.

So I'm removing it altogether. Eventually I plan to add a proper
events/times field and record more data, including this check, in a form
that can be drawn on a nice zoomable timeline view.

Old clients talking to a new Helper (which doesn't supply the value)
will tolerate the loss (they'll just display an empty field on the web
view).
2012-05-21 21:13:11 -07:00
Brian Warner
393c0729de test_checker: minor improvement in fake-server setup
This prepares for testing the differences between tubid and pubkey-based
name/longname.
2012-05-21 19:49:36 -07:00
david-sarah
4ddcde3094 Since we now require Python 2.5, we can use os.SEEK_END. 2012-05-16 21:39:48 +00:00
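
For reference, `os.SEEK_END` (available since Python 2.5) names the "seek relative to end of file" mode that was previously written as the literal `2`. The snippet below is a generic illustration using an in-memory buffer, not code from this commit.

```python
import io
import os

f = io.BytesIO(b"hello")
# Seek to the end of the stream; equivalent to f.seek(0, 2).
f.seek(0, os.SEEK_END)
size = f.tell()  # position at the end equals the stream length
```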
david-sarah
a1a1b5bf8a Simplifications resulting from requiring Python 2.5 and therefore being able to use sqlite3 from the standard library. This also drops sqlite3 from the set of versions and paths we report. 2012-05-16 02:47:25 +00:00
david-sarah
0fc196ea5f Require Python 2.5. 2012-05-16 02:41:49 +00:00
Brian Warner
cc366903ce dictutil.DictOfSets: remove .union() method, it was misleading
Unlike set.union(), which returns a new set, DictOfSets.union() modified
the DictOfSets in-place. The name collision bit me when I changed some
code from using DictOfSets to a normal set, and expected that
set.union() would modify the set in-place. Since there was only one user
of DictOfSets.union, I figured it was safer to just get rid of it.
2012-05-16 16:55:09 -07:00
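
The behaviour of the built-in set that the commit above contrasts with DictOfSets.union() is easy to check: `set.union()` returns a new set and leaves the original alone, while `set.update()` modifies it in place. The variable names are illustrative only.

```python
original = {1, 2}

# union() returns a new set; original is left untouched.
merged = original.union({3})
unchanged = set(original)  # snapshot taken before the in-place update

# update() is the in-place operation; it returns None.
result = original.update({3})
```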
Brian Warner
9acf5beebd immutable repairer: populate servers-responding properly
If a server did not respond to the pre-repair filecheck, but did respond
to the repair, that server was not correctly added to the
RepairResults.data["servers-responding"] list. (This resulted from a
buggy usage of DictOfSets.union() in filenode.py).

In addition, servers to which filecheck queries were sent, but did not
respond, were incorrectly added to the servers-responding list
anyways. (This resulted from code in checker.py not paying attention
to the 'responded' flag).

The first bug was neatly masked by the second: it's pretty rare to have
a server suddenly start responding in the one-second window between a
filecheck and a subsequent repair, and if the server was around for the
filecheck, you'd never notice the problem. I only spotted the smelly
code while I was changing it for IServer cleanup purposes.

I added coverage to test_repairer.py for this. Trying to get that test
to fail before fixing the first bug is what led me to discover the
second bug. I also had to update test_corrupt_file_verno, since it was
incorrectly asserting that 10 servers responded, when in fact one of
them throws an error (but the second bug was causing it to be reported
anyways).
2012-05-16 16:55:09 -07:00
david-sarah
3738c3e2d1 fileutil.py: use try/finally to close file in write_atomically. 2012-05-16 23:08:39 +00:00
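
The try/finally pattern named in the last commit can be sketched as follows. This is an assumption-laden illustration rather than the project's implementation: the `.tmp` suffix and the use of `os.replace()` (Python 3.3+, so not available to the 2012 codebase) are choices made here for a self-contained example.

```python
import os


def write_atomically(target, contents, mode="b"):
    """Write contents to a temporary file, then rename it over target."""
    tmp = target + ".tmp"
    f = open(tmp, "w" + mode)
    try:
        f.write(contents)
    finally:
        # Runs even if write() raises, so the file handle is never leaked.
        f.close()
    # Rename only after the file has been written and closed.
    os.replace(tmp, target)
```

If `write()` raises, the handle is still closed and the rename is skipped, so the previous contents of the target are left in place.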