When blocks terminate (either COMPLETE or CORRUPT/DEAD/BADSEGNUM), the
_shares_from_server dict was being popped incorrectly (using shnum as the
index instead of serverid). I'm still thinking through the consequences of
this bug. It was probably benign and really hard to detect. I think it would
cause us to incorrectly believe that we're pulling too many shares from a
server, and thus prefer a different server rather than asking for a second
share from the first server. The diversity code is intended to spread out the
number of shares simultaneously being requested from each server, but with
this bug, it might be spreading out the total number of shares requested at
all, not just simultaneously. (note that SegmentFetcher is scoped to a single
segment, so the effect doesn't last very long).
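For illustration, a minimal sketch of the mistake, with made-up names (the real code lives in SegmentFetcher and uses its own data structures):

    _shares_from_server = {}   # serverid -> set of shares currently in flight

    def block_terminated(serverid, share):
        shares = _shares_from_server.get(serverid, set())
        shares.discard(share)
        if not shares:
            # the bug popped by shnum here, which usually removed nothing,
            # so the server kept looking busier than it really was
            _shares_from_server.pop(serverid, None)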
No behavioral changes, just updating variable/method names and log messages.
The effects outside these three files should be minimal: some exception
messages changed (to say "server" instead of "peer"), and some internal class
names were changed. A few things still use "peer" to minimize external
changes, like UploadResults.timings["peer_selection"] and
happinessutil.merge_peers, which can be changed later.
I'm skeptical that the test was proceeding correctly and merely ran out of time; it seems more likely that it had gotten hung. But if we raise the timeout to an even more extravagant number and the test still times out, we can be even more certain that the test was never going to finish.
Pass around IServer instance instead of (peerid, rref) tuple. Replace
"descriptor" with "server". Other replacements:
get_all_servers -> get_connected_servers/get_known_servers
get_servers_for_index -> get_servers_for_psi (now returns IServers)
This change still needs to be pushed further down: lots of code is now
getting the IServer and then distributing (peerid, rref) internally.
Instead, it ought to distribute the IServer internally and delay
extracting a serverid or rref until the last moment.
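A rough sketch of the intended pattern (class and method names here are assumptions, not the actual code):

    class ShareRequest:
        def __init__(self, server, shnum):
            self.server = server      # an IServer, not a (peerid, rref) tuple
            self.shnum = shnum

        def describe(self):
            # extract a human-readable name only when it is actually needed
            return "sh%d on %s" % (self.shnum, self.server.get_name())

        def send(self, *args):
            # extract the rref only at the moment of the remote call
            return self.server.get_rref().callRemote("slot_readv", *args)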
no_network.py was updated to retain parallelism.
The service generated by strports.service() changed in 10.2, and the ugly
private-attribute-reading hack we used to glean a kernel-allocated port
number (e.g. when using "tcp:0", especially during unit tests) broke, causing
Tahoe to be completely unusable with Twisted-10.2. The new ugly
private-attribute-reading hack starts by figuring out what sort of service
was generated, then reads different attributes accordingly.
This also hushes a warning when using schemeless strports strings like "0" or
"3456", by quietly prepending a "tcp:" scheme, since 10.2 complains about
those. It also adds getURL() and getPortnum() accessors to the "webish"
service, rather than having unit tests dig through _url and _portnum and such
to find out what they are.
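A hedged sketch of the approach (the private attribute names are Twisted internals and may vary between versions; this is not the exact code):

    from twisted.application import internet, strports

    def make_service(portstr, factory):
        if portstr.isdigit():
            portstr = "tcp:" + portstr   # hush the 10.2 schemeless warning
        return strports.service(portstr, factory)

    def get_portnum(svc, callback):
        # after startService, figure out what kind of service we got and
        # read the matching private attribute to learn the allocated port
        if isinstance(svc, internet.TCPServer):
            callback(svc._port.getHost().port)
        else:
            # Twisted >= 10.2 returns a StreamServerEndpointService whose
            # _waitingForPort Deferred fires with the IListeningPort
            def _got_port(lp):
                callback(lp.getHost().port)
                return lp
            svc._waitingForPort.addCallback(_got_port)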
I made some changes to the status web pages.
status.xhtml:
- Delete unused webform_css link
- Align tables on the left
tahoe-css:
- Make some minor changes to code syntax
- Change table.status-download-events style to match the other tables
status.py:
- Align table on the left
- Change table headers
- Add heading tags
- Modify google api graph: add image border, calculate height to fit data
signed-off-by: zooko@zooko.com
fixes #1219
I personally used "tahoe start/restart -m ../MY-TESTNET/node*" all the time,
to spin up or update a local testgrid while iterating over new code. However,
with the recent switch from "subprocess.Popen(/bin/twistd)" to "import and
call twistd.run()" in scripts/startstop_node.py (yay fewer processes!),
"start -m" broke, and fixing it requires os.fork, which is unavailable on
windows (boo windows!). And I was probably the only one using -m. So in the
interests of uniformity among platforms and simpler code (yay negative code
days!), we're just removing -m from everything. I will start using a little
shell script or something to simulate the removed functionality.
This patch also cleans up CLI-function calling a bit: get the basedir from
the config dict (instead of sometimes from a separate argument), and always
return a numeric exit code.
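A rough sketch of that convention (names are illustrative, not the actual CLI code):

    import sys

    def cli_command(config):
        basedir = config['basedir']        # from the config dict, not a separate arg
        try:
            do_the_work(basedir)           # hypothetical helper
        except Exception, e:
            print >>sys.stderr, "error: %s" % (e,)
            return 1
        return 0                           # always a numeric exit code

    # the dispatcher can then do: sys.exit(cli_command(config))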
Specifically, test_runner.CreateNode.test_client failed, because the
os.fork-is-present test decided that --multiple should not be allowed on
windows, even though --multiple works just fine for 'tahoe create-client'.
The only restriction on --multiple is for 'tahoe start' and 'tahoe restart'.
This needs a different approach, probably by cleaning up BasedirMixin. We
should only be withholding --multiple on windows for "start" and
"restart". (we should continue withholding --multiple on all platforms for
"run").
This reverts (git) commit f3adb037ae:
"startstop_node.py: fix "tahoe start -m" by forking before non-final targets"
* don't advertise -m flag on tahoe start/restart/run unless os.fork is
available (i.e. not on windows)
* test_runner.py: add test to exercise "start/stop/restart -m"
For one thing, this makes missing-dependency failures into DistributionNotFound errors instead of ImportErrors, which might be more useful to the user. For another thing, if someone is using distributions that were installed with --multi-version, then they might not be importable until after require_auto_deps() has been run. (The docs claim that this would be the case, but we don't have an example of this happening at this time.)
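A minimal sketch of the idea (the dependency strings are illustrative, not the real list):

    import pkg_resources

    def require_auto_deps():
        for dep in ["zfec >= 1.1.0", "foolscap >= 0.6.0"]:
            # raises pkg_resources.DistributionNotFound (more useful to the
            # user than a bare ImportError) if the dependency is missing; for
            # --multi-version installs it also activates the distribution so
            # that the import below can succeed
            pkg_resources.require(dep)

    require_auto_deps()
    import zfec   # only guaranteed importable after require_auto_deps() has run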
* repairer (really the uploader) reads beyond end of input file (Uploadable)
* new-downloader does not tolerate overreads
* uploader does lots of tiny reads (inefficient)
This fixes the last two. The uploader still does a single overread at the end
of the input file, but now that's ok so we can leave it in place. The
uploader now expects the Uploadable to behave like a normal disk
file (reading beyond EOF will return less data than was asked for), and now
the new-downloader behaves that way.
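A simplified sketch of the disk-file-like behavior the uploader now relies on (the real Uploadable interface returns Deferreds; names here are illustrative):

    class FileUploadable:
        def __init__(self, f):
            self._f = f
        def read(self, length):
            # like a normal disk file, a read at or past EOF returns fewer
            # bytes than requested (possibly zero) instead of failing, so a
            # single overread at the end of the file is harmless
            return [self._f.read(length)]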
This removes the need to use a locally-built (dependency) bin/twistd, and
removes a big chunk of behavior differences between unix and windows. It
also happens to resolve the "client node probably started" uncertainty.
Might help with #1190, #602, and #71.
fixes #530. I earlier tried this twice (see #530 for history) and then twice rolled it back due to some problems that arose. However, I didn't write down what the problems were in enough detail on the ticket that I can tell today whether those problems are still issues, so here goes the third attempt. (I did write down on the ticket that it would not create site.py or .pth files in the target directory with --multi-version mode, but I didn't explain why *that* was a problem.)
fixes #1191
Patch by Brian. This patch description was actually written by Zooko, but I forged Brian's name on the "author" field so that he would get credit for this patch in revision control history.
If you are investigating the bug in new-downloader, one way to investigate might be to change this ordering to a different fixed order (e.g. rotate by 4 instead of rotate by 5) and observe how the behavior of new-downloader differs in that case.
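For illustration only (the real ordering lives in the downloader code), a fixed rotation looks like:

    def rotate(items, k):
        k %= len(items)
        return items[k:] + items[:k]

    # e.g. compare new-downloader's behavior with rotate(servers, 4) versus
    # rotate(servers, 5)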
Kyle's OpenBSD buildslave used 41 reads when doing this test. The fact that I'm blindly bumping this number up to match the observed behavior probably means this isn't a good criterion to be testing for anyway. But perhaps someone else (Brian) could investigate why that run on Kyle's OpenBSD box took four more reads than we expected, and whether the fact that it took 41 reads to do this operation is indicative of an actual problem.
deliver all shares at once instead of feeding them out one-at-a-time.
Also fix distribution of real-number-of-segments information: now all
CommonShares (not just the ones used for the first segment) get a
correctly-sized hashtree. Previously, the late ones might not, which would
make them crash and get dropped (causing the download to fail if the initial
set were insufficient, perhaps because one of their servers went away).
Update tests, add some TODO notes, improve variable names and comments.
Improve logging: add logparents, set more appropriate levels.
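A hypothetical sketch of the two changes (class and method names assumed, not the actual code):

    def deliver_shares(common_shares, num_segments, consumer):
        # tell every CommonShare the real segment count first, so late shares
        # also get a correctly-sized hashtree
        for cs in common_shares:
            cs.set_numsegs(num_segments)
        # then deliver the whole batch at once, not one share at a time
        consumer.got_shares(common_shares)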
This avoids spamming the "recent uploads and downloads" /status page from
FileNode instances that were created for a directory read but which nobody is
ever going to read from. I also cleaned up how DownloadStatus instances are
created, so that it only ever happens in the CiphertextFileNode, not in the
higher-level plaintext FileNode. Also fixed DownloadStatus handling of read
size, thanks to David-Sarah for the catch.
seems to avoid the #1155 log message which reveals the URI (and filecap).
Also add an [ERROR] marker to the flog entry, since unregisterProducer also
makes interrupted downloads appear "200 OK"; this makes it more obvious that
the download did not complete.
The lost-progress bug occurred when two simultaneous read() calls fetched
different segments, and the first one failed (due to corruption, or the other
bugs in #1154): the second read() would never complete. While in this state,
cancelling the second read by having its consumer call stopProducing() would
trigger the cancel-intolerance bug. Finally, in downloader.node.Cancel,
prevent late cancels by adding an 'active' flag.
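A minimal sketch of the 'active' flag (assumed shape, not necessarily the exact class):

    class Cancel:
        def __init__(self, f):
            self._f = f
            self.active = True
        def cancel(self):
            # once the read has completed (or was already cancelled), a late
            # stopProducing() becomes a no-op instead of corrupting state
            if self.active:
                self.active = False
                self._f(self)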
The Range header causes n.read() to be called with an offset= of type 'long',
which eventually got used in a Spans/DataSpans object's __len__ method.
Apparently python doesn't permit __len__() to return longs, only ints.
Rewrote Spans/DataSpans to use s.len() instead of len(s) aka s.__len__().
Added a test in test_download. Note that test_web didn't catch this because
it uses mock FileNodes for speed: it's probably time to rewrite that.
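An illustration of the restriction, under Python 2 (simplified; not the real Spans class):

    class Spans(object):
        def __init__(self, length):
            self._length = length
        def len(self):
            return self._length   # works for arbitrarily large (long) lengths
        def __len__(self):
            return self._length   # len(s) is where the long return value bit us

    s = Spans(2**62)
    print s.len()                 # fine
    # print len(s)                # this is the call that failed with a long offset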
There is still an unresolved error-recovery problem in #1154, so I'm not
closing the ticket quite yet.