seems to avoid the #1155 log message which reveals the URI (and filecap).
Also add an [ERROR] marker to the flog entry, since unregisterProducer also
makes interrupted downloads appear "200 OK"; this makes it more obvious that
the download did not complete.
The lost-progress bug occurred when two simultaneous read() calls fetched
different segments and the first one failed (due to corruption, or the other
bugs in #1154): the second read() would never complete. While in this state,
cancelling the second read by having its consumer call stopProducing() would
trigger the cancel-intolerance bug. Finally, in downloader.node.Cancel,
prevent late cancels by adding an 'active' flag.
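As an illustration, a minimal sketch of that 'active' flag (not necessarily
the exact downloader.node.Cancel code):

    class Cancel:
        def __init__(self, f):
            self._f = f          # callback to run on a timely cancel
            self.active = True   # cleared once the operation finishes or cancels

        def cancel(self):
            # a late cancel (arriving after the flag was cleared) is a no-op,
            # so it can no longer corrupt the downloader's state
            if self.active:
                self.active = False
                self._f(self)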
The Range header causes n.read() to be called with an offset= of type 'long',
which eventually got used in a Spans/DataSpans object's __len__ method.
Apparently Python doesn't permit __len__() to return longs, only ints.
Rewrote Spans/DataSpans to use s.len() instead of len(s) (a.k.a. s.__len__()).
Added a test in test_download. Note that test_web didn't catch this because
it uses mock FileNodes for speed: it's probably time to rewrite that.
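To illustrate the __len__ problem (a Python 2 sketch with made-up sizes, not
the actual Spans code):

    class DataSpans(object):
        def __init__(self, size):
            self._size = size   # becomes a long once Range offsets pass 2**31
        def __len__(self):
            return self._size   # len() refuses longs on Python 2
        def len(self):
            return self._size   # an ordinary method may return a long

    len(DataSpans(5L * 2**30))   # raises TypeError (or OverflowError,
                                 # depending on the Python 2 version)
    DataSpans(5L * 2**30).len()  # fine: returns 5368709120L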
There is still an unresolved error-recovery problem in #1154, so I'm not
closing the ticket quite yet.
The fixed 10-second timer will eventually be replaced with a per-server
value, calculated based on observed response times.
test_hung_server.py: enhance to exercise the DYHB=OVERDUE state. Split existing
mutable+immutable tests into two pieces for clarity. Re-enabled several tests.
Deleted the now-obsolete "test_failover_during_stage_4".
- Make some important utility functions clearer and more thoroughly
documented.
- Assert in upload.servers_of_happiness that the buckets attributes
of PeerTrackers passed to it are mutually disjoint.
- Get rid of some silly non-Pythonisms that I didn't see when I first
wrote these patches.
- Make sure that should_add_server returns True when queried about a
  shnum that it doesn't know about yet.
- Change Tahoe2PeerSelector.preexisting_shares to map a shareid to a set
  of peerids, and alter dependent code to deal with that.
- Remove upload.should_add_servers, because it is no longer necessary.
- Move upload.shares_of_happiness and upload.shares_by_server to a utility
file.
- Make assorted small changes in Tahoe2PeerSelector.
- Compute servers_of_happiness using a bipartite matching algorithm that
  we know is optimal, instead of an ad-hoc greedy algorithm that isn't
  (see the sketch after this list).
- Change servers_of_happiness to just take a sharemap as an argument, and
  change its callers to merge existing_shares and used_peers before
  calling it.
- Change an error message in the encoder to be more appropriate for
servers of happiness.
- Clarify the wording of an error message in immutable/upload.py
- Refactor a happiness failure message to happinessutil.py, and make
immutable/upload.py and immutable/encode.py use it.
- Move the word "only" as far to the right as possible in failure
messages.
- Use a better definition of progress during peer selection.
- Do read-only peer share detection queries in parallel, not sequentially.
- Clean up logging semantics; print the query statistics whenever an
upload is unsuccessful, not just in one case.
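As an illustration of the matching approach named in the list above, a minimal
sketch (hypothetical names; the real code lives in happinessutil.py). It runs
the classic augmenting-path algorithm over a sharemap that maps each share
number to the set of servers holding that share:

    def servers_of_happiness(sharemap):
        # Size of a maximum bipartite matching between shares and servers:
        # each matched (share, server) pair uses a distinct server.
        matched = {}  # serverid -> shnum currently matched to it

        def try_assign(shnum, seen):
            for server in sharemap.get(shnum, set()):
                if server in seen:
                    continue
                seen.add(server)
                # Take a free server, or displace its current share if that
                # share can be re-matched elsewhere (an augmenting path).
                if server not in matched or try_assign(matched[server], seen):
                    matched[server] = shnum
                    return True
            return False

        return sum(1 for shnum in sharemap if try_assign(shnum, set()))

    # Three shares spread over two servers yield at most two disjoint pairs:
    # servers_of_happiness({0: {"A"}, 1: {"A", "B"}, 2: {"B"}}) == 2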
When I first implemented #778, I just altered the error messages to refer to
servers where they referred to shares. The resulting error messages weren't
very good. These are a bit better.
The Tahoe2PeerSelector returned either NoSharesError or NotEnoughSharesError
for a variety of error conditions that weren't informatively described by them.
This patch creates a new error, UploadHappinessError, replaces uses of
NoSharesError and NotEnoughSharesError with it, and rewords the accompanying
error messages to be more in line with the new servers_of_happiness
behavior. See ticket #834 for more information.
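For illustration, the shape of the change (a sketch; the message text and
variable names here are approximate, not the exact upload code):

    class UploadHappinessError(Exception):
        """Raised when shares cannot be placed on enough distinct servers
        to satisfy the happiness threshold."""

    # callers that previously raised NoSharesError/NotEnoughSharesError:
    def check_happiness(effective_happiness, happy):
        if effective_happiness < happy:
            raise UploadHappinessError(
                "shares could be placed on only %d server(s), but we were "
                "asked to place them on at least %d"
                % (effective_happiness, happy))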
This can be useful if one of the shares it has already begun downloading
fails. See #287 for discussion. This fixes the part of #287 that was a
regression caused by #928: namely, it restores fail-over in case a share is
corrupted (or the server returns an error or disconnects). It does not fix
the related issue mentioned in #287 in which a server hangs and doesn't reply
to requests for blocks.
This should put an end to the phenomenon I've been seeing in which a single
hung server can cause all downloads on a grid to hang. It should also speed
up all downloads by (a) not waiting for responses to queries that it doesn't
need, and (b) downloading shares from the servers that answered the initial
query the fastest.
Also, when deciding whether the download has enough shares, do not count how
many buckets you've gotten -- instead, count how many *unique* shares those
buckets hold. This appears to correct a slightly weird behavior in the
current download code in which receiving >= K different buckets that all held
the same share number would make it think it had enough to download the file
when in fact it didn't.
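A sketch of the counting change (the bucket layout and names are
illustrative, not the actual download code):

    def have_enough_shares(buckets, k):
        # buckets: a list of (sharenum, bucket_proxy) pairs received so far.
        # Old, buggy test: len(buckets) >= k.  New test: count distinct
        # share numbers, since k copies of one share cannot rebuild the file.
        unique_shnums = set(shnum for (shnum, _bucket) in buckets)
        return len(unique_shnums) >= k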
This patch needs tests before it is actually ready for trunk.
allmydata.util.log.err() either takes a Failure as the first positional
argument, or takes no positional arguments and must be invoked in an
exception handler. Fixed its signature to match both foolscap.logging.log.err
and twisted.python.log.err. Included a brief unit test.
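The two calling conventions the fixed signature accepts (a usage sketch;
do_something() is a hypothetical stand-in):

    from twisted.python.failure import Failure
    from allmydata.util import log

    log.err(Failure(ValueError("boom")))  # a Failure as the first argument

    try:
        do_something()
    except Exception:
        log.err()  # no positional args: captures the active exception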
Stop checking separately for ConnectionDone/ConnectionLost, since those have
been folded into DeadReferenceError since foolscap-0.3.1. Write
rrefutil.trap_deadref() in terms of rrefutil.trap_and_discard() to improve
code coverage.
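A sketch of the relationship (the real definitions live in
allmydata.util.rrefutil; exact signatures may differ):

    # foolscap >= 0.3.1 folds ConnectionDone/ConnectionLost into this:
    from foolscap import DeadReferenceError

    def trap_and_discard(f, *errorTypes):
        f.trap(*errorTypes)  # re-raises the Failure unless it matches
        # a matching failure is swallowed (the errback returns None)

    def trap_deadref(f):
        return trap_and_discard(f, DeadReferenceError)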
Mutable servermap updates and the immutable checker, when run with
add_lease=True, send both the do-you-have-block and add-lease commands in
parallel, to avoid an extra round trip time. Many older servers have problems
with add-lease and raise various exceptions, which don't generally matter.
The client-side code was catching+ignoring some of them, but unrecognized
exceptions were passed through to the DYHB code, concealing the DYHB results
from the checker, making it think the server had no shares.
The fix is to separate the code paths. Both commands are sent at the same
time, but the errback path from add-lease is handled separately. Known
exceptions are ignored; the others (both unknown-remote and all-local) are
logged (at log.WEIRD, which will trigger an Incident), but neither will affect
the DYHB results.
The add-lease message is sent first, and we know that the server handles them
synchronously. So when the checker is done, we can be sure that all the
add-lease messages have been retired. This makes life easier for unit tests.
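A sketch of the separated code paths (illustrative: the function name, the
trapped exception, and the secrets are assumptions, not the exact checker
code):

    from allmydata.util import log

    def query_server(rref, storage_index, renew_secret, cancel_secret):
        # add-lease is sent first, on its own errback chain
        d_lease = rref.callRemote("add_lease", storage_index,
                                  renew_secret, cancel_secret)
        def _ignore_known(f):
            f.trap(IndexError)  # e.g. servers too old to handle add-lease
        def _log_unexpected(f):
            # unknown-remote and all-local errors: log (triggering an
            # Incident) and swallow, so they never reach the DYHB path
            log.msg("error during add_lease", failure=f, level=log.WEIRD)
        d_lease.addErrback(_ignore_known)
        d_lease.addErrback(_log_unexpected)

        # DYHB goes out at the same time; its results reach the checker
        # regardless of what happened to the add-lease message
        return rref.callRemote("get_buckets", storage_index)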
* remove Downloader.download_to_data/download_to_filename/download_to_filehandle
* remove download.Data/FileName/FileHandle targets
* remove filenode.download/download_to_data/download_to_filename methods
* leave Downloader.download (the whole Downloader will go away eventually)
* add util.consumer.MemoryConsumer/download_to_data, for convenience
  (this is mostly used by unit tests, but it gets used by enough non-test
  code to warrant putting it in allmydata.util; sketched below)
* update tests
* removes about 180 lines of code. Yay negative code days!
Overall plan is to rewrite immutable/download.py and leave filenode.read() as
the sole read-side API.
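A sketch of the new convenience helpers (written from memory; the real code
is in allmydata.util.consumer and may differ in detail):

    from zope.interface import implementer
    from twisted.internet.interfaces import IConsumer

    @implementer(IConsumer)
    class MemoryConsumer(object):
        """Accumulate everything a producer writes into a list of chunks."""
        def __init__(self):
            self.chunks = []
            self.done = False
        def registerProducer(self, producer, streaming):
            self.producer = producer
            if streaming:
                producer.resumeProducing()  # push producer: one kick-off suffices
            else:
                while not self.done:        # pull producer: keep asking
                    producer.resumeProducing()
        def write(self, data):
            self.chunks.append(data)
        def unregisterProducer(self):
            self.done = True

    def download_to_data(filenode):
        d = filenode.read(MemoryConsumer())
        d.addCallback(lambda mc: "".join(mc.chunks))
        return d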
The proper hierarchy is:
IFilesystemNode
+IFileNode
++IMutableFileNode
++IImmutableFileNode
+IDirectoryNode
Also expand test_client.py (NodeMaker) to hit all IFilesystemNode types.
* stop caching most_recent_size in dirnode, rely upon backing filenode for it
* start caching most_recent_size in MutableFileNode
* return None when you don't know, not "?"
* only render None as "?" in the web "more info" page
* add get_size/get_current_size to UnknownNode
* "cap" means a python instance which encapsulates a filecap/dircap (uri.py)
* "uri" means a string with a "URI:" prefix
* FileNode instances are created with (and retain) a cap instance, and
generate uri strings on demand
* .get_cap/get_readcap/get_verifycap/get_repaircap return cap instances
* .get_uri/get_readonly_uri return uri strings
* add filenode.download_to_filename() for control.py; should find a better way
* use MutableFileNode.init_from_cap, not .init_from_uri
* directory URI instances: use get_filenode_cap, not get_filenode_uri
* update/cleanup bench_dirnode.py to match, add Makefile target to run it
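An illustration of the convention (the filecap shown is a truncated
placeholder):

    from allmydata import uri

    cap = uri.from_string("URI:CHK:...")   # uri string -> cap instance
    uri_string = cap.to_string()           # cap instance -> "URI:..." string

    # FileNode accessors follow the same split:
    #   .get_cap()/.get_readcap()/.get_verifycap()/.get_repaircap()
    #       -> cap instances (defined in uri.py)
    #   .get_uri()/.get_readonly_uri()
    #       -> uri strings, generated on demand from the retained cap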
This makes it more obvious that the Helper currently generates leases with
the Helper's own secrets, rather than getting values from the client, which
is arguably a bug that will likely be resolved with the Accounting project.