This should tolerate offset/size combinations that read the last byte of
the file, something which was broken before. It quits early in the case
of zero-byte reads, to simplify the resulting "which segments do I need"
logic. Probably addresses ticket:2459.
test_cli.Help was too sensitive to the way that the --help output was
wrapped, which caused failures on travis when COLUMNS= was set low and
the expected strings were split across separate lines.
Also:
* do some light refactoring of create-client/node
* make it clear that these commands' --basedir options do the same as
the global --node-directory option
* use "global-options" instead of "global-opts"
Subcommands "--help" is now rendered as:
```
tahoe [global-options] COMMAND [options] ARGS
(use 'tahoe --help' to view global options)
USAGE (flags/options)
DESCRIPTION
DESCRIPTION_UNWRAPPED
```
The new .description and .description_unwrapped fields allow
commands (subclasses of twisted.python.usage.Usage) better control over
how their explanations are rendered: the old .longdesc field was wrapped
unpleasantly.
This avoids an error case where an empty child name resulted in a
duplicate mkdir. It adds a precondition check to guard against empty
child names, and some test cases. It also cleans up a funny redundancy
noticed earlier (refs ticket:2329).
FURLs are unguessable, but an attacker who somehow learned this FURL
could overwrite files and read sensitive data.
This will break the memory tests. I will add a new interface to support
the memory tests soon.
refs ticket:1737
refs ticket:2394
It's kind of a hack, but Twisted changed the API and I couldn't find a
cleaner way to detect which form of "permissions" value the Twisted FTP
server wants.
I've manually tested it against 14.0.2 and 15.0.0.
This tests ftpd, but not sftpd. Doing this sort of test on sftpd
requires the creation of a valid pubkey/privkey file pair, which is more
work than I want to do right now.
init_ftp/init_sftp were changed to interpret the configured
accounts.file as relative to the node's basedir, with
abspath_expanduser_unicode(accountfile, base=self.basedir).
This would happen naturally in a real node, since it os.chdir()s
to the basedir before doing anything. But tests don't do that.
Author: Brian Warner <warner@lothar.com>
Author: Daira Hopwood <daira@jacaranda.org>
Signed-off-by: Daira Hopwood <daira@jacaranda.org>
This substantially changes the internals of "tahoe cp", to behave in
accordance with the scheme developed in ticket:2329. test_cli_cp.py got
a large new test to exercise all the various combinations. This also
changes the set of error messages that "tahoe cp" can produce.
This modifies try_copy(), inserts a new implementation of
copy_things_to_directory() (and supporting methods), and fixes a few
bugs elsewhere.
fixes ticket:2329
This code will be replaced in the next commit with an entirely different
approach, and modifying it in a single commit would yield a completely
unreadable diff.
Replaces the location 'AUTO' with the autodetected IP/port combination.
Author: Chris Kerr <debdepba@dasganma.tk>
Signed-off-by: Daira Hopwood <daira@jacaranda.org>
When calculating the query boundary for updates to mutable files,
instead of using servers that used to have shares, use servers we
have added to the servermap. This way the querying process won't finish
until we have finished interacting with the servers that have shares.
This fixes the race condition which sometimes caused the querying process
to finish before the updater was done talking to servers with shares.
twisted.web.http.Request.setHeader() really wants a "bytes" object, but
we've been passing integers like len(body). Twisted-12.3 started to
complain about this (with a DeprecationWarning), but the warning is
usually silenced because py2.7 disables deprecations by default.
This fixes Tahoe's misbehavior, but others remain (in Nevow, at least).
I plan to set up some tooling to run tests with
PYTHONWARNINGS=default::DeprecationWarning and collect others. We won't
be able to fix the ones that occur outside of Tahoe, but at least we
should be able to fix our own.
refs ticket:2312
This replaces the status display which was only distinct by color which is a usability issue for color-blind users. This commit includes test coverage by way of pattern matching on rendered templates. The PNG icons are conversions of original SVG source which I've included and placed in the public domain.
The latest setuptools (version 8) changed the way dependency
specifications ("I can handle libfoo version 2 or 3, but not 4") are
interpreted. The new version follows PEP440, which is simpler but
somewhat less expressive. Tahoe's _auto_deps.py now uses dep-specs which
are correctly parsed by both old and new setuptools.
Fixes ticket:2354.
* Restrict the requirements in _auto_deps.py to work with either the old
or PEP 440 semantics.
* Update check_requirement and tests to take account of changes for PEP
440 compatibility.
* Fix an error message.
* Remove a superfluous TODO.
add get_available_space() to NativeStorageServer
It uses a new 'available-space' key in the server's v1 version dict, or falls
back to 'maximum-immutable-share-size' (which presently always has the same
value but could have a different meaning in the future).
This is a squash merge of 9773555bb87fab71145ad7a0e84785a4e92d11f7
Instead of constructing a sys.argv for 'twistd' that reads the node's
.tac file, we construct arguments that tell twistd to use a special
in-memory-only plugin that creates the desired node instance directly.
We still use the name of the .tac file to decide which kind of instance
to make (Client, IntroducerNode, KeyGenerator, StatsGatherer), but never
actually read the contents of the .tac file. Later improvements could
change this to look inside the tahoe.cfg for a nodetype= directive, etc.
This also makes it easy to have "tahoe start BASEDIR" pass the rest of
its arguments on to twistd, so e.g. "tahoe start BASEDIR --nodaemon
--profile=prof.out" does what you'd expect "twistd --nodaemon
--profile=prof.out" to do. "tahoe run BASEDIR" is thus simply aliased to
"tahoe start BASEDIR --nodaemon". This removes the need to special-case
--profile and --syslog.
I also removed some of the default logging behavior:
before:
'tahoe start' = 'twistd --logfile BASEDIR logs/twistd.log'
'tahoe start --profile' adds '--profile=profiling_results.prof --savestats'
'tahoe run' = 'twistd --nodaemon --logfile BASEDIR/logs/tahoesvc.log'
after:
'tahoe start' = 'twistd --logfile BASEDIR logs/twistd.log'
unless --logfile, --nodaemon, or --syslog are passed
'tahoe start --profile' invalid, use 'tahoe start --profile=OUTPUT'
'tahoe run' = 'twistd --nodaemon'
so log messages go to stdout
This finally enables 'tahoe run' to work with all node types, including
the key-generator and stats-gatherer.
It gets 'tahoe start' one step closer to accepting --reactor= . To
actually accomplish this will require this file, the enclosing
__init_.py files, and everything they import to avoid importing the
reactor. (if anything imports twisted.internet.reactor before
startstop_node.start() gets to run, then --reactor= comes too late).
That will take a lot of work, and requires lazy-loading of many core
libraries (foolscap.logging in particular), and removing a lot of code
from src/allmydata/__init__.py .
Note fix following issues from origial commit:
refactor unittests, fix style, add test
(0) use CommonFixture as mixin to increase DRYness
(1) self.failUnlessIn('size', metadata.keys()) --> self.failUnlessIn('size', metdata)
(2) test_size_is_not_None --> test_size_is_0 AND test_size_is_1000
Solution was very simple to implement, no content disposition header was
necessary.
Tested with both Firefox and Chrome, using binary image file stored in
folder, as well as with text data using LIT cap.
The stdlib 'subprocess' module in python-2.7.4 through 2.7.7 suffers
from http://bugs.python.org/issue18851 which causes unrelated file
descriptors to be closed when `subprocess.call()` fails the `exec()`,
such as when the executable being invoked does not actually exist. There
appears to be some randomness involved. This was fixed in python-2.7.8.
Tahoe's iputil.py uses subprocess.call on many different "ifconfig"-type
executables, most of which don't exist on any given platform (added in
git commit 8e31d66cd0). This results in a lot of file-descriptor
closing, which (at least during unit tests) tends to clobber important
things like Tub TCP sockets. This seems to be the root cause behind
ticket:2121, in which normal code tries to close already-closed sockets,
crashing the unit tests. Since different platforms have different
ifconfigs, some platforms will experience more failed execs than others,
so this bug could easily behave differently on linux vs freebsd, as well
as working normally on python-2.7.8 or 2.7.4.
This patch inserts a guard to make sure that os.path.isfile() is true
before allowing Popen.call() to try executing the target. This ought to
be enough to avoid the bug. It changes both iputil.py and
allmydata.__init__ (which uses Popen for calling "lsb_release"), which
are all the places where 'subprocess' is used outside of unit tests.
Other potential fixes: use the 'subprocess32' module from PyPI (which is
a bug-free backport of the Python3 stdlib subprocess module, but would
introduce a new dependency), or require python >= 2.7.8 (but this would
rule out development/deployment on the current OS-X 10.9 release, which
ships with 2.7.5, as well as other distributions like Ubuntu 14.04 LTS).
I believe this closes ticket:2121, and given the apparent relationship
between 2121 and 2023, I think it also closes ticket:2023 (although
since 2023 doesn't have copies of the failing log files, it's hard to
tell). I'm hoping that this will tide us over until 1.11 is released, at
which point we can execute on the plan to remove iputil.py entirely by
changing the way that nodes learn their externally-facing IP address.
Some Travis-CI workers report persistently empty disks, causing spurious
test failures. It's not really that important to assert used>0, so this
relaxes the test.
Closes ticket:2290
Add a tooltip to explain what SDMF means. Cannot find a definition for MDMF; I presume "Medium" but at the risk of being wrong, I don't want to just blindly make that suggested change.
Closes ticket:2281 (trac).
This removes src/allmydata/test/trial_coverage.py, which was a
in-process way to run trial tests under the "coverage" code-coverage
tool. These days, the preferred way to do this is with "coverage run",
although the actual invocation is a bit messy because of the way
bin/trial uses subprocess.call() to invoke the real entrypoint script
with the right PYTHONPATH (see #1698 for details). Hopefully this will
be improved to use a simpler "coverage run .." command in the future.
This patch also removes twisted/plugins/allmydata_trial.py, which
enabled the "--reporter=bwverbose-coverage" option. Finally it modifies
setup.py to stop looking for that option and adding "trialcoverage" to
the dependencies list, which gets us closer to removing "setup_requires"
entirely.
This is an initial conversion of the directory pages from the old style
to the new style which is based on Twitter Bootstrap.
Still some remaining work to be done. You can see a screenshot here:
http://i.imgur.com/MPEngGx.png
This should hopefully satisfy the Debian requirement to include original
sources. The old minified files for d3 and jquery were 63k and 91k
respectively, while the new unminified files are 133k and 293k.
We also change the order of setting up attributes in Retrieve so that _raise_notenoughshareserror() can be called from _setup_download.
Signed-off-by: Daira Hopwood <david-sarah@jacaranda.org>
A simple refactoring. Doesn't even require a new or updated unit test.
Author: Zooko O'Whielacronx <zooko@zooko.com>
Signed-off-by: Daira Hopwood <david-sarah@jacaranda.org>
For a brief while (in between releases 1.9 and 1.10, specifically from
revision bc21726 on 12-Mar-2012, until bf416af on 10-Jun-2012), the new
introducer code stored its node key in NODEDIR/private/server.privkey .
After that point, it was updated to store this key in
NODEDIR/private/node.privkey instead. Fallback code was added to read
from the old location if present (so that folks using development
versions could keep their node keys after the bf416af change).
This patch removes the fallback code. If you have a node which was run
under a version of Tahoe within this range, you need to manually update
your node by running:
mv NODEDIR/private/server.privkey NODEDIR/private/node.privkey
and then restart the node. If you accidentally start an older node with
code after this patch, it will create a new key (and other peers will
think a new server has appeared). You can either stick with the new key,
or use the command above to switch back to the old key.
See docs/nodekeys.rst (not yet written) for details about the node key
and how it is used.
The new rules for "bin/tahoe ARG1.. SUBCOMMAND ARG2.." arg:
* --node-directory is only accepted in ARG1, not ARG2
* create-*/start/stop/restart accept --basedir in ARG2, or an explicit
basedir argument
* only one of --node-directory/--basedir/explicit-basedir is accepted
* --quiet/--version is only accepted in ARG1, not ARG2
Closes#166
This should now fail quickly (during "tahoe start"). Previously this
would silently treat an unparseable size as "0", and the only way to
discover that it had had a problem would be to look at the foolscap log,
or examine the storage-service web page for the unexpected "Reserved
Size" number.
Previously, Introducers always used a swissnum of "introducer", so
anyone who could learn the (public) tubid of the introducer would be
able to connect to and use it. This changes new Introducers to use the
same randomly-generated swissnum as clients and storage servers do, so
that you absolutely must learn the introducer.furl from someone who
knows it already before you can connect.
This change also moves the location of the file that stores
introducer.furl from BASEDIR/introducer.furl to
BASEDIR/private/introducer.furl, since that's where we keep the private
things. The first time an introducer is started with the new code, it
will move any existing BASEDIR/introducer.furl into the new place.
Note that this will not change the FURL of existing introducers: it will
only affect newly created ones. When you change an introducer's FURL,
you must also update all of the nodes (clients and storage servers)
which connect to it, so upgrading it to an unguessable one isn't
something we should do automatically.
This stores the sequence number in BASEDIR/announcement-seqnum, and
increments it each time any service is published (every service
announcement is regenerated with the new sequence number). As everyone
knows, time is an illusion, and occasionally goes backwards, so a
counter is generally safer (and reveals less information about the
node).
Later, we'll improve the introducer client to tolerate rollbacks (where,
perhaps due to a VM being restarted from an earlier checkpoint, the
stored sequence number reverts to an earlier version).
I want mode="w" (i.e. text, with newline conversion) for code that
writes newline-terminated strings (which should also be human readable)
to files. I like to use things like "cat .tahoe/permutation-seed"
without seeing the seed jammed together with the next command prompt.
twisted.web.html.escape was used to produce html-encoded string (to then look
it up in "value" attribute), but behavior of that function has changed between
Twisted 12.2.0 (simple custom implementation) and 12.3.0 (imported from stdlib
cgi module).
This also simplifies how case-insensitivity is handled, and fixes a corner case
where the wrong exception was raised when the size ends in "BB".
fixes#1812
Signed-off-by: David-Sarah Hopwood <davidsarah@mint>
This contains several merged patches. Individual messages follow, latest first:
* Fix a warning from check-miscaptures.
* In retrieve.py, explicitly test whether a key is in self.servermap.proxies
rather than catching KeyError.
* Added a new comment to the MDMF version of the test I removed, explaining
the removal of the SDMF version.
* Removed test_corrupt_all_block_hash_tree_late, since the entire block_hash_tree
is cached in the servermap for an SDMF file.
* Fixed several tests that require files larger than the servermap cache.
* Remove unused test_response_cache_memory_leak().
* Exercise the cache.
* Test infrastructure for counting cache misses on MDMF files.
* Removed the ResponseCache. Instead, the MDMFSlotReadProxy initialized
by ServerMap is kept around so Retrieve can access it. The ReadProxy
has a cache of the first 1000 bytes initially read from each share by
the ServerMap. We're able to satisfy a number of requests out of this
cache, so roundtrips are reduced from 84 to 60 in test_deepcheck_mdmf.
There is still some mystery about under what conditions the cache has
fewer than 1000 bytes. Also this breaks some existing unit tests that
depend on the inner behavior of ResponseCache.
* The servermap.proxies (a cache of SlotReadProxies) is now keyed
by (verinfo,serverid,shnum) rather than just (serverid,shnum)
* Minor cosmetic changes
* Added a test failure if the number of cache misses is too high.
Author: Andrew Miller <amiller@dappervision.com>
Signed-off-by: David-Sarah Hopwood <davidsarah@jacaranda.org>
This prints out which things are different when two sets are expected to be the
same. This was useful to me when debugging the code under test. Hm, this
pattern might be more generally useful...
The unnecessary request was from the upload helper to the sender, and it was
there in order to trigger the sender to deliver cleartext hashes. But we've
long since removed cleartext hashes.
Unit tests pass both before and after this change, and code-coverage shows that
the block I changed is exercised in unit tests.
This probably only works on Linux. It uses sudo to mount and unmount the tmpfs,
which may prompt for a password. refs #20
Signed-off-by: David-Sarah Hopwood <david-sarah@jacaranda.org>
Nevow automatically HTML-escapes strings passed in stan without a raw marker.
Written by MK_FG. fixes#1143
Signed-off-by: David-Sarah Hopwood <david-sarah@jacaranda.org>
The current Twisted release is 12.1.0, which (like 12.0.0 before it)
isn't compatible with foolscap-0.6.2 and earlier. We previously required
foolscap>=0.6.1, since that's all we actually need from foolscap itself.
_auto_deps specifies twisted>=11.0.0, so any system that can't meet that
will install the current Twisted (12.1.0), which will give them
something incompatible with foolscap-0.6.1 and 0.6.2 .
If we're limited to setuptools's declarative constraint language (and
can't have a function which evaluates the available dependency versions
and gives recommendations on which to change), then the only safe
approach is to make sure that any acceptable Foolscap version will be
compatible with all acceptable Twisted versions. So, bump the foolscap
dependency to >=0.6.3, which covers all currently-known
incompatibilities.
The wait_for_connections() method, which is used at the start of
test_system to make sure that all the clients are connected to all the
servers, did not also wait for clients to be connected to their Helpers.
Every once in a while, the helper connection would take a bit longer,
and then
test_system.SystemTest.test_filesystem._test_web._got_welcome_helper
would fail, because we'd check for a helper connection before it was
ready.
The fix is to modify wait_for_connections's polling predicate to look
for helper connections (if configured) as well as the regular
introducer- and server- connections.
Tested by temporarily adding a large (30s) delay to the connectTo() call
in Uploader.startService, simulating a long helper
connection-establishment delay. This makes the test fail consistently.
Then I fixed wait_for_connections(), and the test passed (slowly). Then
I removed the delay.
Closes#1467
Improve the column headers to make it clear that this list shows Tub
IDs. (we can't show pubkey-based serverids because clients don't give
those to us: only servers provide pubkeys). This should be the only
place in the whole webapi that shows TubIDs for modern (V2-introducer)
nodes.
This makes it easy to distinguish between old V1-Introducer
nodes (identified by their Foolscap TubID) and new V2 nodes (identified
by their ed25519 pubkey).
This fixes a few places where we used to display a tubid even if we had
a pubkey, making it hard to visually correlate servers in two different
displays. It also cleans up the way we pass serverids to the JS-based
download timeline.
The "introweb" subscribed-clients list still shows tubids.
The _upload_resumable() test interrupts a Helper upload partway
through (by shutting down the Helper), then restarts the Helper and
resumes the upload. The control flow is kind of tricky: to do anything
"partway through" requires adding a hook to the Uploadable. The previous
flow depended upon a (fragile) call to self.stall(), which waits a fixed
number of seconds.
This removes one of those stall() calls (the remainder is in
test/common.py and I'll try removing it in a subsequent revision). It
also removes some now-redundant wait_for_connections() calls, since
bounce_client() doesn't fire its Deferred until the client has finished
coming back up (and uses wait_for_connections() internally to do so).
There was one corner case (where the client disconnects at just the
wrong time) that could have dropped a Deferred, leading to an Unhandled
Error. Clean up the control flow to avoid this case.
This prepares for invitation-based reciprocal-permission Accounting. In
the scheme I'm developing, nodes publish "I accept shares from Y"
messages, which are assembled into a graph, and server will accept
shares from any client node reachable in this graph. For this to work,
the serverX->clientY edge must be connectable to the serverY->clientZ
edge, which means "clientY" and "serverY" must be connected. If clientY
and serverY are two distinct keys, they must be cross-signed. Life is
easier if there's just one key "Y", rather than distinct client- and
server- keys. Calling this one key "server.privkey" would be confusing.
"node.privkey" and "node.pubkey" makes more sense.
One-server-per-node is a pretty easy restriction. Originally I was
thinking that the client.key should be provided in each webapi call,
just like a filecap is, making a single node useable by multiple users
(Accounting principals), and not providing any ambient storage
authority. But I've been unable to think of a comfortable WUI for
that (at least without requiring javascript), nor a friendly way to
transfer account authority (e.g. writecaps that include storage
authority). So I'm more willing to have one-client-per-node these days.
(and note that this rename doesn't seriously preclude
many-clients-per-node or zero-clients-per-node anyways, it just makes
one-client-per-node less awkward)
DeepResultsBase also has a get_corrupt_shares(), and it is populated
from CheckResults.get_corrupt_shares(). It has been updated too, along
with get_remaining_corrupt_shares().
Remove temporary get_new_corrupt_shares() and
get_new_incompatible_shares().
This changes all code which feeds CheckResults(sharemap=) to provide
IServer instances, but CheckResults converts these to old-style
serverids during output, so downstream code doesn't have to change yet.
It adds a temporary get_new_sharemap(), which *does* return IServer
instances, so the immutable repairer can build new CheckResults from an
old one. This will go away when get_sharemap() is updated to return
IServer (and downstream code is updated too).
i.e. change set_data() to accept lots of parameters, instead of taking
a single dictionary with lots of keys. Also Convert all CheckResults
creators to use it.
The goal is to make CheckResults more strongly typed, and remove the
ambiguous ".data" field in favor of a bunch of specific counters and
sharelists, so I can changes .sharemap and .servermap to use IServer
instances instead of string serverids. By cleaning this up first, I hope
to get that task done with less debugging.
The Fake*Node classes in test/common.py were accumulating share data in
a class-level dictionary, which persisted from one test run to the next.
As a result, running test_web.py over and over (with trial's
--until-failure feature) made this dictionary grow without bound,
eventually running out of memory.
This fix moves that dictionary into the FakeClient built fresh for each
test, so it doesn't build up. It does the same thing for "file_types",
which was much smaller but still lived at the class level.
Closes#1729
This stores IDisplayableServer-providing instances (StubServers or
NativeStorageServers) in the .servermap and .sharemap dictionaries. But
get_servermap()/get_sharemap() still return data structures with
serverids, not IServers, by translating their data on the way out. This
lets us put off changing the callers for a little bit longer.
IDisplayableServer includes just enough functionality to call
.get_name() and friends, which is all that the UploadResults really
need. IServer is a superset that includes actual share-manipulation
methods. StubServer instances provide only IDisplayableServer, while
actual NativeStorageServer instances provide the full IServer interface.
When the Helper sends a serverid (so we know what to call the server but
nothing else about it, and have no corresponding NativeStorageServer
object to reference), but we want to store an IDisplayableServer in the
UploadResults, we create a synthetic StubServer "server" and store that
instead.
Complete the getter-based transformation, by hiding ".uri" and updating
callers to use get_uri(). Also don't set a dummy self._uri, leave it
undefined until someone calls set_uri().
This hides attributes with e.g. _sharemap, and creates getters like
get_sharemap() to access them, for every field except .uri . This will
make it easier to modify the internal representation of .sharemap
without requiring callers to adjust quite yet.
".uri" has so many users that it seemed better to update it in a
subsequent patch.
Populate most of UploadResults (except .uri, which is learned later when
using a Helper) in the constructor, instead of allowing creators to
write to attributes later. This will help isolate the fields that we
want to change to use IServers.
This splits the pb.Copyable on-wire object (HelperUploadResults) out
from the local results object (UploadResults). To maintain compatibility
with older Helpers, we have to leave pb.Copyable classes alone and
unmodified, but we want to change UploadResults to use IServers instead
of serverids. So by using a different class on the wire, and translating
to/from it on either end, we can accomplish both.
This measured how long the Helper took to do a filecheck before asking
for ciphertext. The "Contacting Helper" report includes both
existence_check and the client-helper RTT.
For non-overlapping uploads, it was being returned correctly. But when
multiple upload requests overlapped, and the file was not already in the
grid, the filecheck would only run once, and its existence_check time
would be reported for all uploaders (even if they didn't have to wait
for that time). Cleaning that up proved too difficult: the only correct
place to report this time is from the initial remote_upload_chk() call,
but the return value of that is too constrained to accomodate it in the
needs-upload case.
So I'm removing it altogether. Eventually I plan to add a proper
events/times field and record more data, including this check, in a form
that can be drawn on a nice zoomable timeline view.
Old clients talking to a new Helper (which doesn't supply the value)
will tolerate the loss (they'll just display an empty field on the web
view).
Unlike set.union(), which returns a new set, DictOfSets.union() modified
the DictOfSets in-place. The name collision bit me when I changed some
code from using DictOfSets to a normal set, and expected that
set.union() would modify the set in-place. Since there was only one user
of DictOfSets.union, I figured it was safer to just get rid of it.
If a server did not respond to the pre-repair filecheck, but did respond
to the repair, that server was not correctly added to the
RepairResults.data["servers-responding"] list. (This resulted from a
buggy usage of DictOfSets.union() in filenode.py).
In addition, servers to which filecheck queries were sent, but did not
respond, were incorrectly added to the servers-responding list
anyawys. (This resulted from code in the checker.py not paying attention
to the 'responded' flag).
The first bug was neatly masked by the second: it's pretty rare to have
a server suddenly start responding in the one-second window between a
filecheck and a subsequent repair, and if the server was around for the
filecheck, you'd never notice the problem. I only spotted the smelly
code while I was changing it for IServer cleanup purposes.
I added coverage to test_repairer.py for this. Trying to get that test
to fail before fixing the first bug is what led me to discover the
second bug. I also had to update test_corrupt_file_verno, since it was
incorrectly asserting that 10 servers responded, when in fact one of
them throws an error (but the second bug was causing it to be reported
anyways).
Previously, test_runner sometimes fails because the _node_has_started()
poller fires after the portnum file has been opened, but before it has
actually been filled, allowing the test process to observe an empty file,
which flunks the test.
This adds a new fileutil.write_atomically() function (using the usual
write-to-.tmp-then-rename approach), and uses it for both node.url and
client.port . These files are written a bit before the node is really up and
running, but they're late enough for test_runner's purposes, which is to know
when it's safe to read client.port and use 'tahoe restart' (and therefore
SIGINT) to restart the node.
The current node/client code doesn't offer any better "are you really done
with startup" indicator.. the ideal approach would be to either watch the
logfile, or connect to its flogport, but both are a hassle. Changing the node
to write out a new "all done" file would be intrusive for regular
operations.
t=info contains randomly-generated ophandles, and t=rename-form contains the
name of the child being renamed, so neither is eligible for a
short-circuiting ETag. Enhanced test_web to exercise this. Had to improve
FakeCHKFileNode slightly to let it participate. Refs #443.
When client does a conditional GET/HEAD with If-none-match:, if the condition
fails (ie, the client's ETag matches the file's) then we can short-circuit
the whole process and immediately return an empty body.
Like immutable files, the ETag is based on the storage index. However, since
a directory is a special interpretation of a file, it is distinguished from
the file by prepending "DIR:" onto the start of the ETag, and adding
-representation on the end (where -representation is the ?t= argument, json,
info, etc).
It also checks the return of setETag and avoids generating a representation
if the client already has it.
test_web.py: use shouldFail2(), safer than old shouldFail()
directory.py: forbid slashes in from_name=, return BAD_REQUEST instead of
GONE when trying to move into a non-directory
The move webapi function now takes a target_type argument which lets it
know whether the target is a subdirectory name or URI. This is an
improvement over the old system in which the move handler tried to guess
whether the target was a name or a URI. Also fixed a little docs
copypaste problem and tweaked some line wrapping.
This adds "move file" capability to the web UI's directory display. The
support and test framework is heavily based on the similar "rename file"
feature. Unit tests and documentation are included. Multiple in-progress
versions of this patch may be found in ticket 1579. This version
includes arbitrary URI target support and is compatible with the change
from tahoe_css to tahoe.css.
'serverid' is the pubkey (for V2 clients), falling back to the tubid (for V1
clients). This also required cleaning up the way the index is created for the
old V1 introducer.
This significantly cleans up the IntroducerServer web-status renderers.
Instead of poking around in the introducer's internals, now the web-status
renderers get clean AnnouncementDescriptor and SubscriberDescriptor
objects. They are still somewhat foolscap-centric, but will provide a clean
abstraction boundary for future improvements.
The specific #1721 bug was that old (V1) subscribers were handled by
wrapping their RemoteReference in a special WrapV1SubscriberInV2Interface
object, but the web-status display was trying to peek inside the object to
learn what host+port it was associated with, and the wrapper did not proxy
those extra attributes.
A test was added to test_introducer to make sure the introweb page renders
properly and at least contains the nicknames of both the V1 and V2 clients.
This was a premature feature addition to the mock filenode, and gets in the
way of the IServer refactoring I'm trying to do. Best to remove it now and
re-introduce it in a better form later when it's actually needed.
This avoids the name collision between the actual results
objects (defined in allmydata.check_results) and the code that renders
these objects into HTML (defined in allmydata.web.check_results). Only
the web-side objects were renamed.
This fixes bug #1689. Repair was using MODE_READ to build the servermap,
which doesn't try hard enough to grab the privkey, and also doesn't guarantee
sending queries to all servers. This patch adds a new MODE_REPAIR which does
both, and does a separate, distinct mapupdate to start wth repair cycle,
instead of relying upon the (MODE_CHECK) mapupdate leftover from the
filecheck that triggered the repair.
SystemTest has a couple of different phases, separated by a poller which
waits for everything to be idle (all messages delivered, none in flight). It
does this by watching some internal "_debug_outstanding" counters in the
server and in each client, and waiting for them to hit zero.
Just before the last phase, we replace the server with a new one (to make
sure clients re-send their messages properly). Unfortunately, the polling
function closed over the variable holding the original server, and didn't see
the replacement. It kept polling the old server, and failed to notice the
outstanding messages for the new server. The last phase of the test (check3)
was started too early, which failed (since some messages had not yet been
delivered), and then exploded in a flurry of dirty-reactor errors (because
some messages were delivered after test shutdown).
This replaces the closed-over-variable with a "self.the_introducer", which
seems to fix the race.
One additional place to look at in the future: the client
announcement-receive path (remote_announce) uses an eventually(). If the
message has been received and the eventual-send posted (but not yet executed)
when the poller sees it, the poller might erroneously conclude that the
client is idle and cause the same problem as above. To fix this, the poller
(probably all pollers) could be enhanced to do a flushEventualQueue before
querying the are-we-done-yet predicate function.
This still leaves immutable-publish results incorrectly using tubids instead
of serverids. That will need some more work, since it might change the Helper
interface.
This introduces new client and server halves to the Introducer (renaming the
old one with a _V1 suffix). Both have fallbacks to accomodate talking to a
different version: the publishing client switches on whether the server's
.get_version() advertises V2 support, the server switches on which
subscription method was invoked by the subscribing client.
The V2 protocol sends a three-tuple of (serialized announcement dictionary,
signature, pubkey) for each announcement. The V2 server dispatches messages
to subscribers according to the service-name, and throws errors for invalid
signatures, but does not otherwise examine the messages. The V2 receiver's
subscription callback will receive a (serverid, ann_dict) pair. The
'serverid' will be equal to the pubkey if all of the following are true:
the originating client is V2, and was told a privkey to use
the announcement went through a V2 server
the signature is valid
If not, 'serverid' will be equal to the tubid portion of the announced FURL,
as was the case for V1 receivers.
Servers will create a keypair if one does not exist yet, stored in
private/server.privkey .
The signed announcement dictionary puts the server FURL in a key named
"anonymous-storage-FURL", which anticipates upcoming Accounting-related
changes in the server advertisements. It also provides a key named
"permutation-seed-base32" to tell clients what permutation seed to use. This
is computed at startup, using tubid if there are existing shares, otherwise
the pubkey, to retain share-order compatibility for existing servers.
The "#section" declaration (which matches id="section") should have been
".section" (which matches class="section").
The welcome page has a feature that I actually liked: the little "This
Client" sidebar sits just to the right of the start of the Controls block.
Fixing .section broke that (the clear:both introduces a gap, forcing the
Controls block to start strictly below the bottom of the This Client block).
So I also removed class="section" from the Controls block to allow them to
share the horizontal space again.
The removed assertions are appropriate for a download that seeks to
return plaintext to a caller; if we don't have at least k active remote
shares, then we can't hope to do that. They're not appropriate for a
verification operation; a user can try to verify a file that has fewer
than k shares available, so that shouldn't be treated as an error.
Instead, we proceed with fewer than k shares, and ensure that we
terminate the download if we have no shares at all and we're verifying.
test_verify_mdmf_all_bad_sharedata tests for the regression described
in ticket 1648. In particular, it will trigger the misplaced assertion
in the share activation code. It also tests to make sure that
verification continues with fewer than k shares.
* replace DeferredList with gatherResults, simplify result handling
* use BadShareError to signal recoverable problems in either fetch or
validate, catch after _validate_block
* _validate_block is thus not responsible for noticing fetch problems
* rename _validation_or_decoding_failed() to _handle_bad_share()
* _get_needed_hashes() returns two Deferreds, instead of a hard-to-unpack
DeferredList
The remaining work is to write additional tests.
src/allmydata/test/no_network.py:
This supports tests in which servers leave the grid only to return with
their shares intact at a later time.
src/allmydata/test/test_mutable.py:
The UCWEs in the incident reports associated with #1628 all seem to be
associated with shares that the servermap knows about, but which aren't
accounted for during the publish process for whatever reason. Specifically,
it looks like the publisher is only capable of keeping track of a single
storage server for a given share. This makes the repair process worse than
it was pre-MDMF at updating all of the shares of a particular file to the
newest version, and can also cause spurious UCWEs. This test simulates such
a layout and fails if an UCWE is thrown. We need to write another test to
ensure that all copies of a share are updated to the latest version (or
alter this test to do that), so that the test suite doesn't pass unless both
regressions are fixed.
We want the publisher to follow the existing share placement when uploading
a new version of a mutable file, and we don't want this test to pass unless
it does.
src/allmydata/mutable/publish.py:
Before this commit, the publisher only kept track of a single writer for
each share. This is insufficient to handle updates in which a single share
may live on multiple servers. In the best case, an update will only update
one of the existing shares instead of all of them. In some cases, the update
will encounter the existing shares when publishing some other share,
interpret it as a sign of an uncoordinated update, and fail. Keeping track
of all of the writers helps ensure that all existing shares are updated, and
helps avoid spurious uncoordinated write errors.
This replaces the setup.cfg aliases that run "darcsver" before each major
command with the new "update_version". update_version is defined in setup.py,
and tries to get a version string from either darcs or git (or leaves the
existing _version.py alone if neither VC metadata is available).
Also clean up a tiny typo in verlib.py that messed up syntax hilighting.
We're slowly moving away from Nevow, and marcusw's previous patch removed
uses of the formless CSS file, so now we can stop testing that nevow can find
that file, and remove the lingering unused "import formless" call.
They were probably meant to be links to webform_css, but we aren't really
using Nevow's form-generation code anyways, so they can just be removed.
Thanks to 'marcusw' for the catch.
* use d3.js v2.4.6
* add a "toggle misc events" button, to get hash/bitmap-checking details
* only draw data that's on screen, for speed
* add fragment-arg to fetch timeline data.json from somewhere else
rolling back:
Thu Sep 29 23:46:28 MDT 2011 zooko@zooko.com
* debugprint the values of blocks and hashes thereof; make the test data and the seg size small in order to make the debugprints easy to look at
M ./src/allmydata/mutable/publish.py -1 +2
M ./src/allmydata/mutable/retrieve.py +3
M ./src/allmydata/test/test_mutable.py -2 +2
This fixes a test failure found against current Twisted trunk in
test_mutable.Filenode.test_retrieve_producer_mdmf (when it uses
PausingAndStoppingConsumer). There must be some sort of race: I could
make it fail against Twisted-11.0 if I just increased the 0.5s delay in
test_download.PausingAndStoppingConsumer to about 0.6s, and could make
Twisted-trunk pass by reducing it to about 0.3s .
I fixed the test (as opposed to the bug) by replacing the delay with a
simple reliable eventually(), and adding extra asserts to fail the test
if the consumer's write() method is called while the producer is
supposed to be paused
The bug itself was that mutable.retrieve.Retrieve wasn't checking the
"stopped" flag after resuming from a pause, and thus delivered one
segment to a consumer that wasn't expecting it. I split out
stopped-flag-checking to separate function, which is now called
immediately after _check_for_paused(). I also cleaned up some Deferred
usage and whitespace.
* fix tahoe.cfg control of default mutable type
* tolerate arbitrary case in [client]mutable.format value
* small docs improvements
* use get_mutable_type() as a format-is-mutable predicate
* tighten up error message
* fix CLI commands (put, mkdir) to send format=, not mutable-type=
* fix tests
* test_cli: fix tests that observe t=json output, don't ignore failures in
'tahoe put'
* fix handling of version= to make it easier to use the default
* interpret ?mutable=true&format=MDMF as MDMF, not SDMF
The filecaps used to be produced with hints for 'k' and segsize, but they
weren't actually used, and doing so had the potential to limit how we change
those filecaps in the future. Also the parsing code had some problems dealing
with other numbers of extensions. Removing the existing fields and making the
parser tolerate (and ignore) extra ones makes MDMF more future-proof.
Really, all the upload/modify APIs should take a string or a filehandle, and
internally wrap it as needed. Callers should not need to be aware of
Uploadable() or MutableData() classes.
* storage server ignores requests to extend shares by sending a new_length
* storage server fills exposed holes (created by sending a write vector whose offset begins after the end of the current data) with 0 to avoid "palimpsest" exposure of previous contents
* storage server zeroes out lease info at the old location when moving it to a new location
ref. #1528
Declare explicitly that we prevent this problem in the server's version dict.
fixes#1528 (there are two patches that are each a sufficient fix to #1528 and this is one of them)
We're removing this function because it is currently unused, because it is dangerous, and because the bug described in #1528 leaks the cancellation secret, which allows anyone who knows a file's storage index to abuse this function to delete shares of that file.
fixes#1528 (there are two patches that are each a sufficient fix to #1528 and this is one of them)
We're removing this function because it is currently unused, because it is dangerous, and because the bug described in #1528 leaks the cancellation secret, which allows anyone who knows a file's storage index to abuse this function to delete shares of that file.
ref. #1528
This ought to close the potential for dropped errors and hanging downloads.
Verify needs to be examined, I may have broken it, although all tests pass.
This check needs to be done with each fetch from the storage server, to
detect when someone has changed the share (i.e. our servermap goes stale).
Doing it just once at the beginning of retrieve isn't enough: a write might
occur after the first segment but before the second, etc.
_try_to_validate_prefix() was not removed: it will be used by the future
check-with-each-fetch code.
test_mutable.Roundtrip.test_corrupt_all_seqnum_late was disabled, since it
fails until this check is brought back. (the corruption it applies only
touches the prefix, not the block data, so the check-less retrieve actually
tolerates it). Don't forget to re-enable it once the check is brought back.
This is a neat trick to reduce Foolscap overhead, but the need for an
explicit flush() complicates the Retrieve path and makes it prone to
lost-progress bugs.
Also change test_mutable.FakeStorageServer to tolerate multiple reads of the
same share in a row, a limitation exposed by turning off the queue.
Note that the downloader will still fetch a segment for a zero-length
read, which is wasteful. Fixing that isn't specifically required to fix
#1512, but it should probably be fixed before 1.9.
This first step shaves 15% off the runtime: from 139s to 119s on my laptop.
It also fixes a couple of places where a Deferred was being dropped, which
would cause two tests to run in parallel and also confuse error reporting.
This consistently records all immutable uploads in the Recent Uploads And
Downloads page, regardless of code path. Previously, certain webapi upload
operations (like PUT /uri/$DIRCAP/newchildname) failed to pass the History
object and were left out.
publish:
* encrypt and encode times are cumulative, not just current-segment
retrieve:
* same for decrypt and decode times
* update "current status" to include segment number
* set status to Finished/Failed when download is complete
* set progress to 1.0 when complete
More improvements to consider:
* progress is currently 0% or 100%: should calculate how many segments are
involved (remembering retrieve can be less than the whole file) and set it
to a fraction
* "fetch" time is fuzzy: what we want is to know how much of the delay is not
our own fault, but since we do decode/decrypt work while waiting for more
shares, it's not straightforward