The new rules for "bin/tahoe ARG1.. SUBCOMMAND ARG2.." arg:
* --node-directory is only accepted in ARG1, not ARG2
* create-*/start/stop/restart accept --basedir in ARG2, or an explicit
basedir argument
* only one of --node-directory/--basedir/explicit-basedir is accepted
* --quiet/--version is only accepted in ARG1, not ARG2
Closes#166
This should now fail quickly (during "tahoe start"). Previously this
would silently treat an unparseable size as "0", and the only way to
discover that it had had a problem would be to look at the foolscap log,
or examine the storage-service web page for the unexpected "Reserved
Size" number.
Previously, Introducers always used a swissnum of "introducer", so
anyone who could learn the (public) tubid of the introducer would be
able to connect to and use it. This changes new Introducers to use the
same randomly-generated swissnum as clients and storage servers do, so
that you absolutely must learn the introducer.furl from someone who
knows it already before you can connect.
This change also moves the location of the file that stores
introducer.furl from BASEDIR/introducer.furl to
BASEDIR/private/introducer.furl, since that's where we keep the private
things. The first time an introducer is started with the new code, it
will move any existing BASEDIR/introducer.furl into the new place.
Note that this will not change the FURL of existing introducers: it will
only affect newly created ones. When you change an introducer's FURL,
you must also update all of the nodes (clients and storage servers)
which connect to it, so upgrading it to an unguessable one isn't
something we should do automatically.
This stores the sequence number in BASEDIR/announcement-seqnum, and
increments it each time any service is published (every service
announcement is regenerated with the new sequence number). As everyone
knows, time is an illusion, and occasionally goes backwards, so a
counter is generally safer (and reveals less information about the
node).
Later, we'll improve the introducer client to tolerate rollbacks (where,
perhaps due to a VM being restarted from an earlier checkpoint, the
stored sequence number reverts to an earlier version).
twisted.web.html.escape was used to produce html-encoded string (to then look
it up in "value" attribute), but behavior of that function has changed between
Twisted 12.2.0 (simple custom implementation) and 12.3.0 (imported from stdlib
cgi module).
This also simplifies how case-insensitivity is handled, and fixes a corner case
where the wrong exception was raised when the size ends in "BB".
fixes#1812
Signed-off-by: David-Sarah Hopwood <davidsarah@mint>
This contains several merged patches. Individual messages follow, latest first:
* Fix a warning from check-miscaptures.
* In retrieve.py, explicitly test whether a key is in self.servermap.proxies
rather than catching KeyError.
* Added a new comment to the MDMF version of the test I removed, explaining
the removal of the SDMF version.
* Removed test_corrupt_all_block_hash_tree_late, since the entire block_hash_tree
is cached in the servermap for an SDMF file.
* Fixed several tests that require files larger than the servermap cache.
* Remove unused test_response_cache_memory_leak().
* Exercise the cache.
* Test infrastructure for counting cache misses on MDMF files.
* Removed the ResponseCache. Instead, the MDMFSlotReadProxy initialized
by ServerMap is kept around so Retrieve can access it. The ReadProxy
has a cache of the first 1000 bytes initially read from each share by
the ServerMap. We're able to satisfy a number of requests out of this
cache, so roundtrips are reduced from 84 to 60 in test_deepcheck_mdmf.
There is still some mystery about under what conditions the cache has
fewer than 1000 bytes. Also this breaks some existing unit tests that
depend on the inner behavior of ResponseCache.
* The servermap.proxies (a cache of SlotReadProxies) is now keyed
by (verinfo,serverid,shnum) rather than just (serverid,shnum)
* Minor cosmetic changes
* Added a test failure if the number of cache misses is too high.
Author: Andrew Miller <amiller@dappervision.com>
Signed-off-by: David-Sarah Hopwood <davidsarah@jacaranda.org>
This prints out which things are different when two sets are expected to be the
same. This was useful to me when debugging the code under test. Hm, this
pattern might be more generally useful...
This probably only works on Linux. It uses sudo to mount and unmount the tmpfs,
which may prompt for a password. refs #20
Signed-off-by: David-Sarah Hopwood <david-sarah@jacaranda.org>
Nevow automatically HTML-escapes strings passed in stan without a raw marker.
Written by MK_FG. fixes#1143
Signed-off-by: David-Sarah Hopwood <david-sarah@jacaranda.org>
The wait_for_connections() method, which is used at the start of
test_system to make sure that all the clients are connected to all the
servers, did not also wait for clients to be connected to their Helpers.
Every once in a while, the helper connection would take a bit longer,
and then
test_system.SystemTest.test_filesystem._test_web._got_welcome_helper
would fail, because we'd check for a helper connection before it was
ready.
The fix is to modify wait_for_connections's polling predicate to look
for helper connections (if configured) as well as the regular
introducer- and server- connections.
Tested by temporarily adding a large (30s) delay to the connectTo() call
in Uploader.startService, simulating a long helper
connection-establishment delay. This makes the test fail consistently.
Then I fixed wait_for_connections(), and the test passed (slowly). Then
I removed the delay.
Closes#1467
This makes it easy to distinguish between old V1-Introducer
nodes (identified by their Foolscap TubID) and new V2 nodes (identified
by their ed25519 pubkey).
This fixes a few places where we used to display a tubid even if we had
a pubkey, making it hard to visually correlate servers in two different
displays. It also cleans up the way we pass serverids to the JS-based
download timeline.
The "introweb" subscribed-clients list still shows tubids.
The _upload_resumable() test interrupts a Helper upload partway
through (by shutting down the Helper), then restarts the Helper and
resumes the upload. The control flow is kind of tricky: to do anything
"partway through" requires adding a hook to the Uploadable. The previous
flow depended upon a (fragile) call to self.stall(), which waits a fixed
number of seconds.
This removes one of those stall() calls (the remainder is in
test/common.py and I'll try removing it in a subsequent revision). It
also removes some now-redundant wait_for_connections() calls, since
bounce_client() doesn't fire its Deferred until the client has finished
coming back up (and uses wait_for_connections() internally to do so).
DeepResultsBase also has a get_corrupt_shares(), and it is populated
from CheckResults.get_corrupt_shares(). It has been updated too, along
with get_remaining_corrupt_shares().
Remove temporary get_new_corrupt_shares() and
get_new_incompatible_shares().
This changes all code which feeds CheckResults(sharemap=) to provide
IServer instances, but CheckResults converts these to old-style
serverids during output, so downstream code doesn't have to change yet.
It adds a temporary get_new_sharemap(), which *does* return IServer
instances, so the immutable repairer can build new CheckResults from an
old one. This will go away when get_sharemap() is updated to return
IServer (and downstream code is updated too).
i.e. change set_data() to accept lots of parameters, instead of taking
a single dictionary with lots of keys. Also Convert all CheckResults
creators to use it.
The goal is to make CheckResults more strongly typed, and remove the
ambiguous ".data" field in favor of a bunch of specific counters and
sharelists, so I can changes .sharemap and .servermap to use IServer
instances instead of string serverids. By cleaning this up first, I hope
to get that task done with less debugging.
The Fake*Node classes in test/common.py were accumulating share data in
a class-level dictionary, which persisted from one test run to the next.
As a result, running test_web.py over and over (with trial's
--until-failure feature) made this dictionary grow without bound,
eventually running out of memory.
This fix moves that dictionary into the FakeClient built fresh for each
test, so it doesn't build up. It does the same thing for "file_types",
which was much smaller but still lived at the class level.
Closes#1729
This stores IDisplayableServer-providing instances (StubServers or
NativeStorageServers) in the .servermap and .sharemap dictionaries. But
get_servermap()/get_sharemap() still return data structures with
serverids, not IServers, by translating their data on the way out. This
lets us put off changing the callers for a little bit longer.
Complete the getter-based transformation, by hiding ".uri" and updating
callers to use get_uri(). Also don't set a dummy self._uri, leave it
undefined until someone calls set_uri().
This hides attributes with e.g. _sharemap, and creates getters like
get_sharemap() to access them, for every field except .uri . This will
make it easier to modify the internal representation of .sharemap
without requiring callers to adjust quite yet.
".uri" has so many users that it seemed better to update it in a
subsequent patch.
Populate most of UploadResults (except .uri, which is learned later when
using a Helper) in the constructor, instead of allowing creators to
write to attributes later. This will help isolate the fields that we
want to change to use IServers.
This splits the pb.Copyable on-wire object (HelperUploadResults) out
from the local results object (UploadResults). To maintain compatibility
with older Helpers, we have to leave pb.Copyable classes alone and
unmodified, but we want to change UploadResults to use IServers instead
of serverids. So by using a different class on the wire, and translating
to/from it on either end, we can accomplish both.
Unlike set.union(), which returns a new set, DictOfSets.union() modified
the DictOfSets in-place. The name collision bit me when I changed some
code from using DictOfSets to a normal set, and expected that
set.union() would modify the set in-place. Since there was only one user
of DictOfSets.union, I figured it was safer to just get rid of it.
If a server did not respond to the pre-repair filecheck, but did respond
to the repair, that server was not correctly added to the
RepairResults.data["servers-responding"] list. (This resulted from a
buggy usage of DictOfSets.union() in filenode.py).
In addition, servers to which filecheck queries were sent, but did not
respond, were incorrectly added to the servers-responding list
anyawys. (This resulted from code in the checker.py not paying attention
to the 'responded' flag).
The first bug was neatly masked by the second: it's pretty rare to have
a server suddenly start responding in the one-second window between a
filecheck and a subsequent repair, and if the server was around for the
filecheck, you'd never notice the problem. I only spotted the smelly
code while I was changing it for IServer cleanup purposes.
I added coverage to test_repairer.py for this. Trying to get that test
to fail before fixing the first bug is what led me to discover the
second bug. I also had to update test_corrupt_file_verno, since it was
incorrectly asserting that 10 servers responded, when in fact one of
them throws an error (but the second bug was causing it to be reported
anyways).
Previously, test_runner sometimes fails because the _node_has_started()
poller fires after the portnum file has been opened, but before it has
actually been filled, allowing the test process to observe an empty file,
which flunks the test.
This adds a new fileutil.write_atomically() function (using the usual
write-to-.tmp-then-rename approach), and uses it for both node.url and
client.port . These files are written a bit before the node is really up and
running, but they're late enough for test_runner's purposes, which is to know
when it's safe to read client.port and use 'tahoe restart' (and therefore
SIGINT) to restart the node.
The current node/client code doesn't offer any better "are you really done
with startup" indicator.. the ideal approach would be to either watch the
logfile, or connect to its flogport, but both are a hassle. Changing the node
to write out a new "all done" file would be intrusive for regular
operations.
t=info contains randomly-generated ophandles, and t=rename-form contains the
name of the child being renamed, so neither is eligible for a
short-circuiting ETag. Enhanced test_web to exercise this. Had to improve
FakeCHKFileNode slightly to let it participate. Refs #443.
test_web.py: use shouldFail2(), safer than old shouldFail()
directory.py: forbid slashes in from_name=, return BAD_REQUEST instead of
GONE when trying to move into a non-directory
The move webapi function now takes a target_type argument which lets it
know whether the target is a subdirectory name or URI. This is an
improvement over the old system in which the move handler tried to guess
whether the target was a name or a URI. Also fixed a little docs
copypaste problem and tweaked some line wrapping.
This adds "move file" capability to the web UI's directory display. The
support and test framework is heavily based on the similar "rename file"
feature. Unit tests and documentation are included. Multiple in-progress
versions of this patch may be found in ticket 1579. This version
includes arbitrary URI target support and is compatible with the change
from tahoe_css to tahoe.css.