This is all minor stuff: unreachable debug code (that should be commented-out
instead of in an 'if False:' block), unnecessary 'pass' and 'global'
statements, redundantly-initialized variables. No behavior changes. Nothing
here was actually broken, it just looked suspicious to the static analysis at
https://lgtm.com/projects/g/tahoe-lafs/tahoe-lafs/alerts/?mode=list .
Squashed all commits that were meejah's between
30d68fb499f300a393fa0ced5980229f4bb6efda
and
33c268ed3a8c63a809f4403e307ecc13d848b1ab
On the branch meejah:1382.markberger-rewrite-rebase.6 as
per review
Remove old hypothesis tests
Fix allmydata.test.cli.test_cli.Errors.test_get
this was broken due to differing share placements
whereas we need to allow this.
Fix test_5_overdue_immutable
This change makes the test not depend on the value
of PYTHONHASHSEED.
Revert "Fix test_5_overdue_immutable"
This reverts commit 5f3696d9a53e7df8781a2c463c7112282397cd69.
fix test to actually hang the first 5 *servers*
sort keys for stable output
use file-context-managers
remove probably-unneeded assert (that fails sometimes)
another non-deterministic test?
correctly calculate happiness
guard with except
fix tests, and happiness calculation
remove debug
fix placements to None
happiness calc shouldn't have to filter None
WIP fixing some tests etc
Comment out all debug print statements
Add hypothesis tests for the old servers of happiness implementation
Attempt to speed up meejah's servers of happiness
WIP
Fix test_calc_happy
WIP
refactor hypothesis to be 'pytest style' and add another one
get rid of 'shares->set(1 thing)' in generate_mappings return
Add a unittest hypothesis came up with
fix tests since we return peers, not sets-of-1-peer
add more debug
add a unit-test that's like test_problem_layout_ticket_1128
fix bug
add a note
fix utest
unit-test for bigger numbers
re-insert markberger code for testing
results of pairing with david
unit-test for happiness calculation
unused function
put old servers_of_happiness() calculation back for now
test for calculate_happiness
remove some redundant functions
Add comments 10 and 8 from the servers of happiness spec
Fix bug in _filter_g3 for servers of happiness
Remove usage of HappinessUpload class
here we modifying the PeerSelector class.
we make sure to correctly calculate the happiness value
by ignoring keys who's value are None...
Remove HappinessUpload and tests
Replace helper servers_of_happiness
we replace it's previous implementation with a new
wrapper function that uses share_placement
This is a change I've wanted to make for many years, because when we get
to HTTP-based servers, we won't have tubids for them. What held me back
was that there's code all over the place that uses the serverid for
various purposes, so I wasn't sure it was safe. I did a big push a few
years ago to use IServer instances instead of serverids in most
places (in #1363), and to split out the values that actually depend upon
tubid into separate accessors (like get_lease_seed and
get_foolscap_write_enabler_seed), which I think took care of all the
important uses.
There are a number of places that use get_serverid() as dictionary key
to track shares (Checker results, mutable servermap). I believe these
are happy to use pubkeys instead of tubids: the only thing they do with
get_serverid() is to compare it to other values obtained from
get_serverid(). A few places in the WUI used serverid to compute display
values: these were fixed.
The main trouble was the Helper: it returns a HelperUploadResults (a
Copyable) with a share->server mapping that's keyed by whatever the
Helper's get_serverid() returns. If the uploader and the helper are on
different sides of this change, the Helper could return values that the
uploader won't recognize. This is cosmetic: that mapping is only used to
display the upload results on the "Recent and Active Operations" page.
I've added code to StorageFarmBroker.get_stub_server() to fall back to
tubids when looking up a server, so this should still work correctly
when the uploader is new and the Helper is old. If the Helper is new and
the uploader is old, the upload results will show unusual server ids.
refs ticket:1363
Adds:
- a JSON endpoint
- CLI to display information
- QueuedItem + IQueuedItem for uploader/downloader
- IProgress interface + PercentProgress implementation
- progress= args to many upload/download APIs
The unnecessary request was from the upload helper to the sender, and it was
there in order to trigger the sender to deliver cleartext hashes. But we've
long since removed cleartext hashes.
Unit tests pass both before and after this change, and code-coverage shows that
the block I changed is exercised in unit tests.
There was one corner case (where the client disconnects at just the
wrong time) that could have dropped a Deferred, leading to an Unhandled
Error. Clean up the control flow to avoid this case.
DeepResultsBase also has a get_corrupt_shares(), and it is populated
from CheckResults.get_corrupt_shares(). It has been updated too, along
with get_remaining_corrupt_shares().
Remove temporary get_new_corrupt_shares() and
get_new_incompatible_shares().
This changes all code which feeds CheckResults(sharemap=) to provide
IServer instances, but CheckResults converts these to old-style
serverids during output, so downstream code doesn't have to change yet.
It adds a temporary get_new_sharemap(), which *does* return IServer
instances, so the immutable repairer can build new CheckResults from an
old one. This will go away when get_sharemap() is updated to return
IServer (and downstream code is updated too).
i.e. change set_data() to accept lots of parameters, instead of taking
a single dictionary with lots of keys. Also Convert all CheckResults
creators to use it.
The goal is to make CheckResults more strongly typed, and remove the
ambiguous ".data" field in favor of a bunch of specific counters and
sharelists, so I can changes .sharemap and .servermap to use IServer
instances instead of string serverids. By cleaning this up first, I hope
to get that task done with less debugging.
This stores IDisplayableServer-providing instances (StubServers or
NativeStorageServers) in the .servermap and .sharemap dictionaries. But
get_servermap()/get_sharemap() still return data structures with
serverids, not IServers, by translating their data on the way out. This
lets us put off changing the callers for a little bit longer.
Complete the getter-based transformation, by hiding ".uri" and updating
callers to use get_uri(). Also don't set a dummy self._uri, leave it
undefined until someone calls set_uri().
This hides attributes with e.g. _sharemap, and creates getters like
get_sharemap() to access them, for every field except .uri . This will
make it easier to modify the internal representation of .sharemap
without requiring callers to adjust quite yet.
".uri" has so many users that it seemed better to update it in a
subsequent patch.
Populate most of UploadResults (except .uri, which is learned later when
using a Helper) in the constructor, instead of allowing creators to
write to attributes later. This will help isolate the fields that we
want to change to use IServers.
This splits the pb.Copyable on-wire object (HelperUploadResults) out
from the local results object (UploadResults). To maintain compatibility
with older Helpers, we have to leave pb.Copyable classes alone and
unmodified, but we want to change UploadResults to use IServers instead
of serverids. So by using a different class on the wire, and translating
to/from it on either end, we can accomplish both.
This measured how long the Helper took to do a filecheck before asking
for ciphertext. The "Contacting Helper" report includes both
existence_check and the client-helper RTT.
For non-overlapping uploads, it was being returned correctly. But when
multiple upload requests overlapped, and the file was not already in the
grid, the filecheck would only run once, and its existence_check time
would be reported for all uploaders (even if they didn't have to wait
for that time). Cleaning that up proved too difficult: the only correct
place to report this time is from the initial remote_upload_chk() call,
but the return value of that is too constrained to accomodate it in the
needs-upload case.
So I'm removing it altogether. Eventually I plan to add a proper
events/times field and record more data, including this check, in a form
that can be drawn on a nice zoomable timeline view.
Old clients talking to a new Helper (which doesn't supply the value)
will tolerate the loss (they'll just display an empty field on the web
view).
If a server did not respond to the pre-repair filecheck, but did respond
to the repair, that server was not correctly added to the
RepairResults.data["servers-responding"] list. (This resulted from a
buggy usage of DictOfSets.union() in filenode.py).
In addition, servers to which filecheck queries were sent, but did not
respond, were incorrectly added to the servers-responding list
anyawys. (This resulted from code in the checker.py not paying attention
to the 'responded' flag).
The first bug was neatly masked by the second: it's pretty rare to have
a server suddenly start responding in the one-second window between a
filecheck and a subsequent repair, and if the server was around for the
filecheck, you'd never notice the problem. I only spotted the smelly
code while I was changing it for IServer cleanup purposes.
I added coverage to test_repairer.py for this. Trying to get that test
to fail before fixing the first bug is what led me to discover the
second bug. I also had to update test_corrupt_file_verno, since it was
incorrectly asserting that 10 servers responded, when in fact one of
them throws an error (but the second bug was causing it to be reported
anyways).
* use d3.js v2.4.6
* add a "toggle misc events" button, to get hash/bitmap-checking details
* only draw data that's on screen, for speed
* add fragment-arg to fetch timeline data.json from somewhere else
This consistently records all immutable uploads in the Recent Uploads And
Downloads page, regardless of code path. Previously, certain webapi upload
operations (like PUT /uri/$DIRCAP/newchildname) failed to pass the History
object and were left out.
Shares are still verified in parallel, but within a share, don't request a
block until the previous block has been verified and the memory we used to hold
it has been freed up.
Patch originally due to Brian. This version has a mockery-patchery-style test
which is "low tech" (it implements the patching inline in the test code instead
of using an extension of the mock.patch() function from the mock library) and
which unpatches in case of exception.
fixes#1395
This patch is a rebase of a patch originally written by Brian. I didn't change any of the intent of Brian's patch, just ported it to current trunk.
refs #1363
This patch was originally written by Brian, but was re-recorded by Zooko to use
darcs replace instead of hunks for any file in which it would result in fewer
total hunks.
refs #1363
This patch was written by Brian but was re-recorded by Zooko (with David-Sarah looking on) to use darcs replace instead of editing to rename the three variables to their new names.
refs #1363
When blocks terminate (either COMPLETE or CORRUPT/DEAD/BADSEGNUM), the
_shares_from_server dict was being popped incorrectly (using shnum as the
index instead of serverid). I'm still thinking through the consequences of
this bug. It was probably benign and really hard to detect. I think it would
cause us to incorrectly believe that we're pulling too many shares from a
server, and thus prefer a different server rather than asking for a second
share from the first server. The diversity code is intended to spread out the
number of shares simultaneously being requested from each server, but with
this bug, it might be spreading out the total number of shares requested at
all, not just simultaneously. (note that SegmentFetcher is scoped to a single
segment, so the effect doesn't last very long).
No behavioral changes, just updating variable/method names and log messages.
The effects outside these three files should be minimal: some exception
messages changed (to say "server" instead of "peer"), and some internal class
names were changed. A few things still use "peer" to minimize external
changes, like UploadResults.timings["peer_selection"] and
happinessutil.merge_peers, which can be changed later.
Pass around IServer instance instead of (peerid, rref) tuple. Replace
"descriptor" with "server". Other replacements:
get_all_servers -> get_connected_servers/get_known_servers
get_servers_for_index -> get_servers_for_psi (now returns IServers)
This change still needs to be pushed further down: lots of code is now
getting the IServer and then distributing (peerid, rref) internally.
Instead, it ought to distribute the IServer internally and delay
extracting a serverid or rref until the last moment.
no_network.py was updated to retain parallelism.
* repairer (really the uploader) reads beyond end of input file (Uploadable)
* new-downloader does not tolerate overreads
* uploader does lots of tiny reads (inefficient)
This fixes the last two. The uploader still does a single overread at the end
of the input file, but now that's ok so we can leave it in place. The
uploader now expects the Uploadable to behave like a normal disk
file (reading beyond EOF will return less data than was asked for), and now
the new-downloadable behaves that way.
fixes#1191
Patch by Brian. This patch description was actually written by Zooko, but I forged Brian's name on the "author" field so that he would get credit for this patch in revision control history.
deliver all shares at once instead of feeding them out one-at-a-time.
Also fix distribution of real-number-of-segments information: now all
CommonShares (not just the ones used for the first segment) get a
correctly-sized hashtree. Previously, the late ones might not, which would
make them crash and get dropped (causing the download to fail if the initial
set were insufficient, perhaps because one of their servers went away).
Update tests, add some TODO notes, improve variable names and comments.
Improve logging: add logparents, set more appropriate levels.
This avoids spamming the "recent uploads and downloads" /status page from
FileNode instances that were created for a directory read but which nobody is
ever going to read from. I also cleaned up the way DownloadStatus instances
are made to only ever do it in the CiphertextFileNode, not in the
higher-level plaintext FileNode. Also fixed DownloadStatus handling of read
size, thanks to David-Sarah for the catch.
seems to avoid the #1155 log message which reveals the URI (and filecap).
Also add an [ERROR] marker to the flog entry, since unregisterProducer also
makes interrupted downloads appear "200 OK"; this makes it more obvious that
the download did not complete.