We're removing this function because it is currently unused, because it is dangerous, and because the bug described in #1528 leaks the cancellation secret, which allows anyone who knows a file's storage index to abuse this function to delete shares of that file.
ref. #1528
This check needs to be done with each fetch from the storage server, to
detect when someone has changed the share (i.e. our servermap goes stale).
Doing it just once at the beginning of retrieve isn't enough: a write might
occur after the first segment but before the second, etc.
_try_to_validate_prefix() was not removed: it will be used by the future
check-with-each-fetch code.
test_mutable.Roundtrip.test_corrupt_all_seqnum_late was disabled, since it
fails until this check is brought back. (The corruption it applies only
touches the prefix, not the block data, so the check-less retrieve actually
tolerates it.) Don't forget to re-enable it once the check is brought back.
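For context, a minimal sketch of the kind of check-with-each-fetch logic described above, with invented attribute and exception names (the real code will presumably reuse _try_to_validate_prefix()):

    # Hypothetical sketch only: names are placeholders, not the real Retrieve API.
    class StaleServermapError(Exception):
        """The share changed on the server after we built our servermap."""

    def _check_prefix_on_fetch(self, shnum, fetched_data):
        # The signed prefix (roughly: seqnum, root hash, encoding parameters)
        # sits at the front of the share; if it no longer matches what the
        # servermap recorded, a write landed between our segment fetches.
        prefix = fetched_data[:self._expected_prefix_length]
        if prefix != self._expected_prefix:
            raise StaleServermapError("share %d changed during retrieve" % shnum)
        return fetched_data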
This is a neat trick to reduce Foolscap overhead, but the need for an
explicit flush() complicates the Retrieve path and makes it prone to
lost-progress bugs.
Also change test_mutable.FakeStorageServer to tolerate multiple reads of the
same share in a row, a limitation exposed by turning off the queue.
Note that the downloader will still fetch a segment for a zero-length
read, which is wasteful. Fixing that isn't specifically required to fix
#1512, but it should probably be fixed before 1.9.
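A minimal sketch of the short-circuit that would avoid the wasted fetch, using hypothetical method and helper names rather than the real downloader API:

    from twisted.internet import defer

    def read(self, consumer, offset=0, size=None):
        # Hypothetical early return: a zero-length read needs no segment data,
        # so don't schedule (and then discard) a whole segment fetch.
        if size == 0:
            return defer.succeed(consumer)
        return self._start_segment_fetches(consumer, offset, size)  # assumed helper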
This first step shaves 15% off the runtime: from 139s to 119s on my laptop.
It also fixes a couple of places where a Deferred was being dropped, which
would cause two tests to run in parallel and also confuse error reporting.
This consistently records all immutable uploads in the Recent Uploads And
Downloads page, regardless of code path. Previously, certain webapi upload
operations (like PUT /uri/$DIRCAP/newchildname) failed to pass the History
object and were left out.
Without this, we get a regression when modifying a mutable file that was
created with more shares (larger N) than our current tahoe.cfg. The
modification attempt creates new versions of the (0,1,..,newN-1) shares, but
leaves the old versions of the (newN,..,oldN-1) shares alone (and throws an
assertion error in SDMFSlotWriteProxy.finish_publishing in the process).
The mixed versions that result (some shares with e.g. N=10, some with N=20,
such that both versions are recoverable) cause problems for the Publish code,
even before MDMF landed. Might be related to refs #1390 and refs #1042.
These changes are their own patch because they cut across so many of the
changes I've made in implementing MDMF that it would be difficult to split
them up into the other patches.
- Learn how to create MDMF files and directories through the
mutable-type argument.
- Operate with the interface changes associated with MDMF and #993.
- Learn how to do partial updates of mutable files.
The changes in layout.py are mostly concerned with the MDMF share
format. In particular, we define read and write proxy objects used by
retrieval, publishing, and other code to write and read the MDMF share
format. We create equivalent proxies for SDMF objects so that these
objects can be suitably general.
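A rough sketch of the proxy idea (not the actual classes or method names in layout.py): retrieval and publish code talk to one small interface, and the concrete proxy hides whether the bytes on the storage server are laid out as SDMF or MDMF.

    class ShareReadProxy(object):
        """Toy interface: read pieces of one share from a remote slot."""
        def __init__(self, rref, storage_index, shnum):
            self._rref = rref              # remote reference to the storage server
            self._storage_index = storage_index
            self._shnum = shnum

        def get_verinfo(self):
            """Return a Deferred firing with (seqnum, root_hash, ...)."""
            raise NotImplementedError

        def get_block_and_salt(self, segnum):
            """Return a Deferred firing with (block, salt) for one segment."""
            raise NotImplementedError

    class SDMFReadProxy(ShareReadProxy):
        # SDMF stores the whole file as a single segment, so segnum is always 0.
        pass

    class MDMFReadProxy(ShareReadProxy):
        # MDMF stores many segments, each with its own salt, plus offsets and
        # hash trees that allow fetching segments individually.
        pass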
A candidate patch for #1429 has a bug in its use of FilePath.is_dir() to detect whether the configured local dir exists and is a directory: FilePath.is_dir() raises an exception, instead of returning False, if the path doesn't exist. This test makes sure that DropUploader.__init__ raises different exceptions for those two cases.
refs #1429
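A sketch of the distinction the test wants DropUploader.__init__ to make, assuming Twisted's FilePath; the exception names here are placeholders, not necessarily what the real patch raises:

    from twisted.python.filepath import FilePath

    class LocalDirMissingError(Exception):      # placeholder exception names
        pass

    class LocalDirNotADirError(Exception):
        pass

    def check_local_dir(local_dir):
        fp = FilePath(local_dir)
        if not fp.exists():
            raise LocalDirMissingError(local_dir)   # path is absent entirely
        if not fp.isdir():
            raise LocalDirNotADirError(local_dir)   # path exists but is not a directory
        return fp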
This is a subset of a patch that David-Sarah attached to #1429. This is just the unit-tests part of that patch, and uses darcs replace instead of hunks to change the names.
refs #1429
Shares are still verified in parallel, but within a share, don't request a
block until the previous block has been verified and the memory we used to hold
it has been freed up.
Patch originally due to Brian. This version has a mockery-patchery-style test
which is "low tech" (it implements the patching inline in the test code instead
of using an extension of the mock.patch() function from the mock library) and
which unpatches in case of exception.
fixes #1395
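A sketch of the serialized fetching described above, using a plain Deferred chain; fetch_block() and verify_block() stand in for the verifier's real methods:

    from twisted.internet import defer

    def verify_share_serially(fetch_block, verify_block, num_blocks):
        d = defer.succeed(None)
        for blocknum in range(num_blocks):
            def _fetch(ignored, blocknum=blocknum):
                return fetch_block(blocknum)      # request the next block only now
            def _verify(blockdata, blocknum=blocknum):
                verify_block(blocknum, blockdata)
                # Return None so the chain drops its reference to blockdata,
                # letting that memory be freed before the next request goes out.
                return None
            d.addCallback(_fetch)
            d.addCallback(_verify)
        return d

Different shares can still run such chains in parallel (e.g. gathered with a DeferredList), preserving the cross-share parallelism mentioned above.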
Check for the existence of any of them, and if any are found, raise an exception that will abort the startup of the node.
This is a backwards-incompatible change for anyone who is still using old-style configuration files.
fixes #1385
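The check itself is simple; the general pattern looks like this, with a placeholder list of legacy file names rather than the actual set the node looks for:

    import os

    class OldConfigError(Exception):
        """Raised (and allowed to propagate) so node startup aborts."""

    def check_for_old_config_files(basedir, legacy_names):
        found = [name for name in legacy_names
                 if os.path.exists(os.path.join(basedir, name))]
        if found:
            raise OldConfigError("Found pre-tahoe.cfg configuration files: %s. "
                                 "Please move these settings into tahoe.cfg."
                                 % ", ".join(sorted(found)))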
This patch is a rebase of a patch originally written by Brian. I didn't change any of the intent of Brian's patch, just ported it to current trunk.
refs #1363
This patch was originally written by Brian, but was re-recorded by Zooko to use
darcs replace instead of hunks for any file in which it would result in fewer
total hunks.
refs #1363
Apparently nobody actually ran the tests: not the two authors (stercor, terrell), not the three reviewers (warner, davidsarah, terrell), and not the committer (me). This is presumably due to #20.
fixes #1412
interfaces.py: modified the return type of RIStatsProvider.get_stats to allow for None as a return value
NEWS.rst, stats.py: documentation of change to get_latencies
stats.rst: now documents percentile modification in get_latencies
test_storage.py: test_latencies now expects None in output categories that contain too few samples for the associated percentile to be unambiguously reported.
fixes #1392
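A sketch of the "None for unreportable percentiles" rule; the thresholds are assumptions for illustration (a percentile such as 99.9 only becomes unambiguous with on the order of 1000 samples):

    def latency_percentiles(samples):
        samples = sorted(samples)
        n = len(samples)
        out = {}
        for label, frac, needed in [("50_0_percentile", 0.50, 2),
                                    ("90_0_percentile", 0.90, 10),
                                    ("99_0_percentile", 0.99, 100),
                                    ("99_9_percentile", 0.999, 1000)]:
            if n < needed:
                out[label] = None      # too few samples to report this one
            else:
                out[label] = samples[int(n * frac)]
        return out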
No behavioral changes, just updating variable/method names and log messages.
The effects outside these three files should be minimal: some exception
messages changed (to say "server" instead of "peer"), and some internal class
names were changed. A few things still use "peer" to minimize external
changes, like UploadResults.timings["peer_selection"] and
happinessutil.merge_peers, which can be changed later.
I'm skeptical that the test was proceeding correctly but ran out of time. It seems more likely that it had gotten hung. But if we raise the timeout to an even more extravagant number then we can be even more certain that the test was never going to finish.
Pass around IServer instance instead of (peerid, rref) tuple. Replace
"descriptor" with "server". Other replacements:
get_all_servers -> get_connected_servers/get_known_servers
get_servers_for_index -> get_servers_for_psi (now returns IServers)
This change still needs to be pushed further down: lots of code is now
getting the IServer and then distributing (peerid, rref) internally.
Instead, it ought to distribute the IServer internally and delay
extracting a serverid or rref until the last moment.
no_network.py was updated to retain parallelism.
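The direction of that further push, as a sketch with assumed accessor names and a made-up remote method:

    from twisted.python import log

    def send_share(server, shnum, data):
        # Pass the IServer itself down the call chain; extract a printable name
        # and the remote reference only at the last moment.
        log.msg("sending share %d to server %s" % (shnum, server.get_name()))
        rref = server.get_rref()
        return rref.callRemote("put_share", shnum, data)   # made-up remote method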
The service generated by strports.service() changed in 10.2, and the ugly
private-attribute-reading hack we used to glean a kernel-allocated port
number (e.g. when using "tcp:0", especially during unit tests) broke, causing
Tahoe to be completely unusable with Twisted-10.2. The new ugly
private-attribute-reading hack starts by figuring out what sort of service
was generated, then reads different attributes accordingly.
This also hushes a warning when using schemeless strports strings like "0" or
"3456", by quietly prepending a "tcp:" scheme, since 10.2 complains about
those. It also adds getURL() and getPortnum() accessors to the "webish"
service, rather than having unit tests dig through _url and _portnum and such
to find out what they are.
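The scheme-prepending part in isolation looks roughly like this (the helper name is made up; the real logic sits next to the strports.service() call):

    def normalize_strports(portstr):
        # Twisted 10.2 warns about schemeless descriptions like "0" or "3456",
        # so quietly treat a bare port number as TCP.
        if portstr.isdigit():
            return "tcp:" + portstr
        return portstr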
I personally used "tahoe start/restart -m ../MY-TESTNET/node*" all the time,
to spin up or update a local testgrid while iterating over new code. However,
with the recent switch from "subprocess.Popen(/bin/twistd)" to "import and
call twistd.run()" in scripts/startstop_node.py (yay fewer processes!),
"start -m" broke, and fixing it requires os.fork, which is unavailable on
windows (boo windows!). And I was probably the only one using -m. So in the
interests of uniformity among platforms and simpler code (yay negative code
days!), we're just removing -m from everything. I will start using a little
shell script or something to simulate the removed functionality.
This patch also cleans up CLI-function calling a bit: get the basedir from
the config dict (instead of sometimes from a separate argument), and always
return a numeric exit code.
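The calling convention, with a made-up subcommand for illustration: each CLI function gets the parsed config, pulls the basedir out of it, and returns a numeric exit code.

    import os
    import sys

    def cmd_example(config, out=sys.stdout, err=sys.stderr):
        basedir = config["basedir"]          # no separate basedir argument
        if not os.path.isdir(basedir):
            print("no node directory found at %s" % basedir, file=err)
            return 1                         # failure: nonzero exit code
        print("operating on the node in %s" % basedir, file=out)
        return 0                             # success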
Specifically, test_runner.CreateNode.test_client failed, because the
os.fork-is-present test decided that --multiple should not be allowed on
windows, even though --multiple works just fine for 'tahoe create-client'.
The only restriction on --multiple is for 'tahoe start' and 'tahoe restart'.
This needs a different approach, probably by cleaning up BasedirMixin. We
should only be withholding --multiple on windows for "start" and
"restart". (we should continue withholding --multiple on all platforms for
"run").
This reverts (git) commit f3adb037ae:
"startstop_node.py: fix "tahoe start -m" by forking before non-final targets"
* don't advertise -m flag on tahoe start/restart/run unless os.fork is
  available (i.e. not on windows)
* test_runner.py: add test to exercise "start/stop/restart -m"
* repairer (really the uploader) reads beyond end of input file (Uploadable)
* new-downloader does not tolerate overreads
* uploader does lots of tiny reads (inefficient)
This fixes the last two. The uploader still does a single overread at the end
of the input file, but now that's ok so we can leave it in place. The
uploader now expects the Uploadable to behave like a normal disk
file (reading beyond EOF will return less data than was asked for), and now
the new-downloader-based Uploadable behaves that way.
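The "normal disk file" contract the uploader now relies on, as a sketch (read_chunks() is a made-up helper): asking for more bytes than remain yields a short, possibly empty, result rather than an error.

    def read_chunks(f, chunk_size):
        while True:
            data = f.read(chunk_size)   # may return fewer bytes near EOF
            if not data:
                break                   # empty read: we are past the end
            yield data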
If you are investigating the bug in new-downloader, one way to investigate might be to change this ordering to a different fixed order (e.g. rotate by 4 instead of rotate by 5) and observe how the behavior of new-downloader differs in that case.
Kyle's OpenBSD buildslave used 41 reads when doing this test. The fact that I'm blindly bumping this number up to match the observed behavior probably means this isn't a good criterion to be testing for anyway. But perhaps someone else (Brian) could investigate why that run on Kyle's OpenBSD box took four more reads than we expected, and whether the fact that it took 41 reads to do this operation is indicative of an actual problem.
Deliver all shares at once instead of feeding them out one at a time.
Also fix distribution of real-number-of-segments information: now all
CommonShares (not just the ones used for the first segment) get a
correctly-sized hashtree. Previously, the late ones might not, which would
make them crash and get dropped (causing the download to fail if the initial
set were insufficient, perhaps because one of their servers went away).
Update tests, add some TODO notes, improve variable names and comments.
Improve logging: add logparents, set more appropriate levels.
The lost-progress bug occurred when two simultaneous read() calls fetched
different segments, and the first one failed (due to corruption, or the other
bugs in #1154): the second read() would never complete. While in this state,
cancelling the second read by having its consumer call stopProducing() would
trigger the cancel-intolerance bug. Finally, in downloader.node.Cancel,
prevent late cancels by adding an 'active' flag.
The Range header causes n.read() to be called with an offset= of type 'long',
which eventually got used in a Spans/DataSpans object's __len__ method.
Apparently python doesn't permit __len__() to return longs, only ints.
Rewrote Spans/DataSpans to use s.len() instead of len(s) aka s.__len__().
Added a test in test_download. Note that test_web didn't catch this because
it uses mock FileNodes for speed: it's probably time to rewrite that.
There is still an unresolved error-recovery problem in #1154, so I'm not
closing the ticket quite yet.
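A toy stand-in for the workaround (the real Spans/DataSpans classes also merge, subtract, and query ranges; this only shows why len() is a method): on Python 2, __len__ had to return a plain int, so an object that can cover byte ranges past 2**31 exposes an explicit len() method instead of supporting the len() builtin.

    class SpansSketch(object):
        def __init__(self):
            self._spans = []            # list of (start, length) pairs

        def add(self, start, length):
            self._spans.append((start, length))

        def len(self):
            # Total bytes covered; may exceed a 32-bit int, which is why this
            # is an ordinary method rather than __len__.
            return sum(length for (_start, length) in self._spans)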
The fixed 10-second timer will eventually be replaced with a per-server
value, calculated based on observed response times.
test_hung_server.py: enhance to exercise DYHB=OVERDUE state. Split existing
mutable+immutable tests into two pieces for clarity. Reenabled several tests.
Deleted the now-obsolete "test_failover_during_stage_4".
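One plausible shape for the per-server calculation (the constants are placeholders, not values from the ticket): take a padded high percentile of that server's observed DYHB response times, with the current 10-second value as a floor and as the fallback when there is no history yet.

    def overdue_timeout(response_times, floor=10.0, padding=2.0):
        if not response_times:
            return floor                    # no history yet: keep the fixed timer
        ordered = sorted(response_times)
        p90 = ordered[int(len(ordered) * 0.9)]
        return max(floor, padding * p90)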