This should put an end to the phenomenon I've been seeing in which a single hung server could cause all downloads on a grid to hang. It should also speed up all downloads by (a) not waiting for responses to queries that it doesn't need, and (b) downloading shares from the servers which answered the initial query the fastest.
Also, when deciding whether the download has enough shares, do not count how many buckets you've gotten -- instead count how many buckets point to *unique* share numbers. This fixes a slightly weird behavior in the current download code in which receiving >= K different buckets that all held the same share number would make it think it had enough to download the file when in fact it didn't.
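A minimal sketch of the counting change (plain Python; the variable names are
illustrative, not the actual download-code names):

    # three bucket responses, but only two distinct share numbers
    responses = [(0, "bucket-a"), (0, "bucket-b"), (1, "bucket-c")]
    k = 3
    # buggy: len(responses) >= k would claim we have enough
    # fixed: only distinct share numbers count toward k
    unique_sharenums = set(sharenum for (sharenum, bucket) in responses)
    have_enough = len(unique_sharenums) >= k   # False: only 2 unique shares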
This patch needs tests before it is actually ready for trunk.
Having both test_node() and test_client() (one of which calls the other) felt
confusing to me, so I changed it to have test_node(), test_client(), and a
common do_create() helper method.
This patch displays a warning to the user in two cases:
1. When special files like symlinks, fifos, devices, etc. are found in the
local source.
2. If files or directories are not readable by the user running the 'tahoe
backup' command.
In verbose mode, the number of skipped files and directories is printed at the
end of the backup.
Exit status returned by 'tahoe backup':
- 0 everything went fine
- 1 the backup failed
- 2 files were skipped during the backup
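A hedged sketch of how a wrapper script might react to the new exit statuses
(the paths and the -v flag here are illustrative):

    import subprocess

    rc = subprocess.call(["tahoe", "backup", "-v",
                          "/home/me/Documents", "tahoe:backups"])
    if rc == 0:
        print("backup finished cleanly")
    elif rc == 2:
        print("backup finished, but some files or directories were skipped")
    else:  # rc == 1
        print("backup failed")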
allmydata.util.log.err() either takes a Failure as the first positional
argument, or takes no positional arguments and must be invoked in an
exception handler. Fixed its signature to match both foolscap.logging.log.err
and twisted.python.log.err . Included a brief unit test.
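A brief sketch of the two calling conventions the fixed signature supports
(do_something() is a stand-in):

    from twisted.python.failure import Failure
    from allmydata.util import log

    def do_something():
        raise ValueError("boom")   # stand-in for real work

    # convention 1: a Failure as the first positional argument
    try:
        do_something()
    except Exception:
        log.err(Failure())

    # convention 2: no positional arguments; must be invoked inside an
    # exception handler so the current exception can be captured
    try:
        do_something()
    except Exception:
        log.err()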
Stop checking separately for ConnectionDone/ConnectionLost, since those have
been folded into DeadReferenceError since foolscap-0.3.1 . Write
rrefutil.trap_deadref() in terms of rrefutil.trap_and_discard() to improve
code coverage.
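A hedged sketch of the relationship described above (the real rrefutil
signatures may differ, and the DeadReferenceError import path varies by
foolscap version):

    from foolscap.api import DeadReferenceError

    def trap_and_discard(f, *errorTypes):
        # swallow the failure if it is one of errorTypes; re-raise otherwise
        f.trap(*errorTypes)
        # returning None discards the failure

    def trap_deadref(f):
        # since foolscap-0.3.1, connection loss surfaces as
        # DeadReferenceError, so no separate ConnectionDone/ConnectionLost
        # handling is needed
        return trap_and_discard(f, DeadReferenceError)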
Verifier misses
The results (described in #819) match our expectations: it misses corruption
in unused share fields and in most container fields (which are only visible
to the storage server, not the client). 1265 bytes of a 2753 byte
share (hosting a 56-byte file with an artificially small segment size) are
unused, mostly in the unused tail of the overallocated UEB space (765 bytes),
and the allocated-but-unwritten plaintext_hash_tree (480 bytes).
The bug was that a disconnected server could cause us to re-enter the initial
loop() call, sending multiple queries to a single server and provoking an
incorrect UCWE (UncoordinatedWriteError). To fix it, stall the loop() with an
eventual.fireEventually(), so the disconnect is handled cleanly instead of
producing weird errors. Closes #874 and #786.
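A sketch of the stall, assuming foolscap's eventual-send helper:

    from foolscap.eventual import fireEventually

    def _start_next_round(self):
        # re-enter loop() on a later reactor turn, so an errback that
        # fires synchronously (e.g. from an already-disconnected server)
        # cannot re-enter loop() and send duplicate queries
        d = fireEventually()
        d.addCallback(lambda ignored: self.loop())
        return d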
Previously, if the file had 0 shares, this would raise TypeError as it tried
to call download_version(None). If the file had some shares but fewer than
'k', it would incorrectly raise MustForceRepairError.
Added get_successful() to the IRepairResults API, to give repair() a place to
report non-code-bug problems like this.
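A hedged usage sketch of the new method (the surrounding calls are
illustrative):

    def report_repair(node, check_results):
        # node: a mutable filenode; check_results: results of a prior check
        d = node.repair(check_results)
        def _repair_done(repair_results):
            # get_successful() reports non-code-bug failures, e.g. a file
            # with 0 shares or with fewer than 'k' shares
            if repair_results.get_successful():
                print("repair succeeded")
            else:
                print("repair was unable to reconstruct the file")
        d.addCallback(_repair_done)
        return d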
Mutable servermap updates and the immutable checker, when run with
add_lease=True, send both the do-you-have-block and add-lease commands in
parallel, to avoid an extra round trip time. Many older servers have problems
with add-lease and raise various exceptions, which don't generally matter.
The client-side code was catching+ignoring some of them, but unrecognized
exceptions were passed through to the DYHB code, concealing the DYHB results
from the checker, making it think the server had no shares.
The fix is to separate the code paths. Both commands are sent at the same
time, but the errback path from add-lease is handled separately. Known
exceptions are ignored, the others (both unknown-remote and all-local) are
logged (log.WEIRD, which will trigger an Incident), but neither will affect
the DYHB results.
The add-lease message is sent first, and we know that the server handles them
synchronously. So when the checker is done, we can be sure that all the
add-lease messages have been retired. This makes life easier for unit tests.
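A hedged sketch of the separated code paths (assuming a foolscap
RemoteReference 'rref'; the known-exception list is illustrative):

    from allmydata.util import log

    def query_server(rref, storage_index, renew_secret, cancel_secret):
        # add-lease goes out first; the server handles it synchronously,
        # so it is retired by the time the checker finishes
        d_lease = rref.callRemote("add_lease", storage_index,
                                  renew_secret, cancel_secret)
        def _lease_failed(f):
            if f.check(IndexError):
                return None   # a known old-server complaint: ignore it
            # unknown-remote and all-local errors: log them (log.WEIRD
            # triggers an Incident), but never let them touch the DYHB path
            log.msg("error during add_lease", failure=f, level=log.WEIRD)
            return None
        d_lease.addErrback(_lease_failed)

        # the do-you-have-block query is sent at the same time, and its
        # results flow to the checker regardless of what add-lease did
        return rref.callRemote("get_buckets", storage_index)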
web/filenode.py: also serve edge metadata when using t=json on a
DIRCAP/childname object.
tahoe_ls.py: list file objects as if we were listing one-entry directories.
Show edge metadata if we have it, which will be true when doing
'tahoe ls DIRCAP/filename' and false when doing 'tahoe ls FILECAP'.
This forbids operations that would implicitly create a directory with a
zero-length (empty string) name, like what you'd get if you did "tahoe put
local /oops/blah" (#358) or "POST /uri/CAP//?t=mkdir" (#676). The error
message is fairly friendly too.
Also added code to "tahoe put" to catch this error beforehand and suggest the
correct syntax (i.e. without the leading slash).
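A hedged sketch of the webapi-side check (the WebError/http names are the
usual Tahoe web-layer ones; the message text is paraphrased):

    from twisted.web import http
    from allmydata.web.common import WebError

    def require_nonempty_path_components(segments):
        # an empty segment means a double slash, which would implicitly
        # create a directory whose name is the empty string
        for name in segments:
            if name == "":
                raise WebError("The webapi does not allow empty pathname "
                               "components (i.e. a double slash)",
                               http.BAD_REQUEST)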
The webapi has been looking for an Accept header since 1.4.0, but it treats a
missing header as equal to */* (to honor RFC2616). This change finally
modifies our CLI tools to ask for "text/plain, application/octet-stream",
which seems roughly correct (we either want a plain-text traceback or error
message, or an uninterpreted chunk of binary data to save to disk). Some day
we'll figure out how JSON fits into this scheme.
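A sketch of the request the CLI tools now make (stdlib httplib; the port and
path are stand-ins):

    import httplib

    conn = httplib.HTTPConnection("127.0.0.1", 3456)  # default webapi port
    cap_path = "/uri/URI%3ACHK%3A..."    # hypothetical cap, url-encoded
    conn.request("GET", cap_path,
                 headers={"Accept": "text/plain, application/octet-stream"})
    resp = conn.getresponse()
    # success: an uninterpreted chunk of binary data to save to disk
    # error: a plain-text traceback or error message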
Under normal conditions this wouldn't cause any problems, but if the shares
are really sparse (perhaps because new servers were added), then
file-modifying operations might stop looking too early and leave old shares
in place.
* remove Downloader.download_to_data/download_to_filename/download_to_filehandle
* remove download.Data/FileName/FileHandle targets
* remove filenode.download/download_to_data/download_to_filename methods
* leave Downloader.download (the whole Downloader will go away eventually)
* add util.consumer.MemoryConsumer/download_to_data, for convenience
(this is mostly used by unit tests, but it gets used by enough non-test
code to warrant putting it in allmydata.util; see the sketch below)
* update tests
* removes about 180 lines of code. Yay negative code days!
Overall plan is to rewrite immutable/download.py and leave filenode.read() as
the sole read-side API.
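A minimal usage sketch of the new convenience helper:

    from allmydata.util.consumer import download_to_data

    def fetch_contents(filenode):
        # download_to_data() attaches a MemoryConsumer to filenode.read()
        # and returns a Deferred that fires with the complete contents
        d = download_to_data(filenode)
        def _got(data):
            print("got %d bytes" % len(data))
            return data
        d.addCallback(_got)
        return d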
* backups now share dirnodes with any previous backup, in any location,
so renames and moves are handled very efficiently
* "tahoe backup" no longer bothers reading the previous snapshot
* if you switch grids, you should delete ~/.tahoe/private/backupdb.sqlite,
to force new uploads of all files and directories
The proper hierarchy is:
IFilesystemNode
+IFileNode
++IMutableFileNode
++IImmutableFileNode
+IDirectoryNode
Also expand test_client.py (NodeMaker) to hit all IFilesystemNode types.
* stop caching most_recent_size in dirnode, rely upon backing filenode for it
* start caching most_recent_size in MutableFileNode
* return None when you don't know, not "?"
* only render None as "?" in the web "more info" page
* add get_size/get_current_size to UnknownNode
* change t=mkdir-with-children to not use multipart/form encoding. Instead,
the request body is all JSON. t=mkdir-immutable uses this format too (see
the sketch after this list).
* make nodemaker.create_immutable_dirnode() get convergence from SecretHolder,
but let callers override it
* raise NotDeepImmutableError instead of using assert()
* add mutable= argument to DirectoryNode.create_subdirectory(), default True
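A hedged sketch of the new request body for t=mkdir-with-children (the
child-entry layout follows the webapi's t=json convention; caps are
abbreviated placeholders):

    import simplejson

    # each child is [nodetype, {"ro_uri"/"rw_uri"/"metadata": ...}]
    children = {
        "file1": ["filenode", {"ro_uri": "URI:CHK:...",
                               "metadata": {"ctime": 1202777696.76}}],
        "subdir": ["dirnode", {"rw_uri": "URI:DIR2:..."}],
    }
    body = simplejson.dumps(children)
    # POST /uri/$DIRCAP/[SUBDIRS../]NAME?t=mkdir-with-children
    # with 'body' as the all-JSON request body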
* "cap" means a python instance which encapsulates a filecap/dircap (uri.py)
* "uri" means a string with a "URI:" prefix
* FileNode instances are created with (and retain) a cap instance, and
generate uri strings on demand
* .get_cap/get_readcap/get_verifycap/get_repaircap return cap instances
* .get_uri/get_readonly_uri return uri strings
* add filenode.download_to_filename() for control.py, should find a better way
* use MutableFileNode.init_from_cap, not .init_from_uri
* directory URI instances: use get_filenode_cap, not get_filenode_uri
* update/cleanup bench_dirnode.py to match, add Makefile target to run it
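A small sketch of the terminology in action, using the accessor pairs listed
above:

    def show_cap_vs_uri(filenode):
        # .get_cap() returns a python instance (defined in uri.py)...
        cap = filenode.get_cap()
        # ...while .get_uri() returns a string with a "URI:" prefix,
        # generated on demand from the retained cap instance
        uri_string = filenode.get_uri()
        assert uri_string.startswith("URI:")
        return cap, uri_string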
This is safer: in the earlier API, an old webapi server would silently ignore
the initial children, and clients trying to set them would have to fetch the
newly-created directory to discover the incompatibility. In the new API,
clients using t=mkdir-with-children against an old webapi server will get a
clear error.
Create the directory with its initial children in a single operation,
instead of creating an empty file and then adding the children later.
This should speed up mkdir(initial_children) considerably, removing two
roundtrips and an entire read-modify-write cycle, probably bringing it down
to a single roundtrip. A quick test (against the volunteergrid) suggests a
30% speedup.
test_dirnode: add new tests to enforce the restrictions that interfaces.py
claims for create_new_mutable_directory(): no UnknownNodes, and metadata
values must be dicts
interfaces.py: define INodeMaker, document argument values, change
create_new_mutable_directory() to take dict-of-nodes. Change
dirnode.set_nodes() and dirnode.create_subdirectory() too.
nodemaker.py: use INodeMaker, update create_new_mutable_directory()
client.py: have create_dirnode() delegate initial_children= to nodemaker
dirnode.py (Adder): take dict-of-nodes instead of list-of-nodes, which
updates set_nodes() and create_subdirectory()
web/common.py (convert_initial_children_json): create dict-of-nodes
web/directory.py: same
web/unlinked.py: same
test_dirnode.py: update tests to match
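A hedged sketch of the dict-of-nodes form these methods now expect (the
(node, metadata) pair layout follows the usual set_nodes convention; names
are illustrative):

    def make_populated_subdir(parent, file_node, dir_node):
        # keys are child names; values are (node, metadata-dict-or-None);
        # UnknownNodes are rejected, per the interfaces.py restrictions
        initial_children = {
            u"report.txt": (file_node, {"tags": ["important"]}),
            u"archive":    (dir_node, None),
        }
        return parent.create_subdirectory(u"new-subdir", initial_children)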
create_mutable_file() now accepts a callable, which is invoked with the new
MutableFileNode and is supposed to return the initial contents. This can be
used by e.g. a new dirnode which needs the filenode's writekey to encrypt its
initial children.
create_mutable_file() still accepts a bytestring too, or None for an empty
file.
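A hedged sketch of the callable form (pack_children_into_contents is a
hypothetical stand-in for the dirnode packing step):

    def make_dirnode_contents(initial_children):
        # returns a callable suitable for create_mutable_file()
        def _contents(filenode):
            # the new MutableFileNode already exists here, so its writekey
            # is available to encrypt the children's write-caps
            return pack_children_into_contents(filenode, initial_children)
        return _contents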
We need to carefully document the licence of figleaf in order to get Tahoe-LAFS into Ubuntu Karmic Koala. However, figleaf isn't really a part of Tahoe-LAFS per se -- this is just a "convenience copy" of a development tool. The quickest way to make Tahoe-LAFS acceptable for Karmic, then, is to remove figleaf from the Tahoe-LAFS tarball itself. People who want to run figleaf on Tahoe-LAFS (as everyone should want) can install figleaf themselves. I haven't tested this -- there may be incompatibilities between upstream figleaf and the copy that we had here...
This makes it more obvious that the Helper currently generates leases with
the Helper's own secrets, rather than getting values from the client, which
is arguably a bug that will likely be resolved with the Accounting project.
The Downloader is now a direct child of the client; access it with
client.downloader instead of client.getServiceNamed("downloader"). The single
"Downloader" instance is
scheduled for demolition anyways, to be replaced by individual
filenode.download calls.
* stop using IURI as an adapter
* pass cap strings around instead of URI instances
* move filenode/dirnode creation duties from Client to new NodeMaker class
* move other Client duties to KeyGenerator, SecretHolder, History classes
* stop passing Client reference to dirnode/filenode constructors
- pass less-powerful references instead, like StorageBroker or Uploader
* always create DirectoryNodes by wrapping a filenode (mutable for now)
* remove some specialized mock classes from unit tests
Detailed list of changes (done one at a time, then merged together)
always pass a string to create_node_from_uri(), not an IURI instance
always pass a string to IFilesystemNode constructors, not an IURI instance
stop using IURI() as an adapter, switch on cap prefix in create_node_from_uri()
client.py: move SecretHolder code out to a separate class
test_web.py: hush pyflakes
client.py: move NodeMaker functionality out into a separate object
LiteralFileNode: stop storing a Client reference
immutable Checker: remove Client reference, it only needs a SecretHolder
immutable Upload: remove Client reference, leave SecretHolder and StorageBroker
immutable Repairer: replace Client reference with StorageBroker and SecretHolder
immutable FileNode: remove Client reference
mutable.Publish: stop passing Client
mutable.ServermapUpdater: get StorageBroker in constructor, not by peeking into Client reference
MutableChecker: reference StorageBroker and History directly, not through Client
mutable.FileNode: removed unused indirection to checker classes
mutable.FileNode: remove Client reference
client.py: move RSA key generation into a separate class, so it can be passed to the nodemaker
move create_mutable_file() into NodeMaker
test_dirnode.py: stop using FakeClient mockups, use NoNetworkGrid instead. This simplifies the code, but takes longer to run (17s instead of 6s). This should come down later when other cleanups make it possible to use simpler (non-RSA) fake mutable files for dirnode tests.
test_mutable.py: clean up basedir names
client.py: move create_empty_dirnode() into NodeMaker
dirnode.py: get rid of DirectoryNode.create
remove DirectoryNode.init_from_uri, refactor NodeMaker for customization, simplify test_web's mock Client to match
stop passing Client to DirectoryNode, make DirectoryNode.create_with_mutablefile the normal DirectoryNode constructor, start removing client from NodeMaker
remove Client from NodeMaker
move helper status into History, pass History to web.Status instead of Client
test_mutable.py: fix minor typo
webapi.txt: clarify replace=only-files argument, mention replace= on POST t=uri
test_cli.py: insert whitespace between logical operations
web.common.parse_replace_arg: make it case-insensitive, to match the docs
Turn the assertions (at least the ones we actually exercise during tests)
into more specific exceptions, so they don't get optimized away. The best
rule to follow is probably this: if an exception is worth testing, then it's
part of the API, and AssertionError should never be part of the API.
Closes #749.
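An illustration of the rule (the names are generic, not lifted from the Tahoe
code):

    # assert statements disappear under 'python -O', so checks that
    # callers and tests rely on must raise real exceptions instead
    class BadSegmentNumberError(Exception):   # part of the API, testable
        pass

    def get_segment(segments, segnum):
        # before: assert 0 <= segnum < len(segments)
        if not 0 <= segnum < len(segments):
            raise BadSegmentNumberError("segnum %d is out of range" % segnum)
        return segments[segnum]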
The per-entry encryption key is now derived by hashing a per-entry salt
together with the dirnode writecap, not just the dirnode writecap. The
previous code (which only hashed the dirnode writecap) would use the same key
for all children, which is very bad. This is the correct implementation of
#750.
If you open up a directory containing thousands of files, it currently computes the cache filename and checks for the cache file on disk immediately for each immutable file in that directory. With this patch, it delays those steps until you try to do something with an immutable file that could use the cache.
The idea is that future versions of Tahoe will add new URI types that this
version won't recognize, but might store them in directories that we *can*
read. We should handle these "objects from the future" as best we can.
Previous releases of Tahoe would just explode. With this change, we'll
continue to be able to work with everything else in the directory.
The code change is to wrap anything we don't recognize as an UnknownNode
instance (as opposed to a FileNode or DirectoryNode). Then webapi knows how
to render these (mostly by leaving fields blank), deep-check knows to skip
over them, deep-stats counts them in "count-unknown". You can rename and
delete these things, but you can't add new ones (because we wouldn't know how
to generate a readcap to put into the dirnode's rocap slot, and because this
lets us catch typos better).
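A hedged sketch of the wrapping step (prefix dispatch per the NodeMaker
changes elsewhere in this series; the helper method names are illustrative):

    def create_from_cap(self, writecap, readcap=None):
        cap = writecap or readcap
        if cap.startswith("URI:CHK:") or cap.startswith("URI:LIT:"):
            return self._create_immutable_filenode(cap)
        if cap.startswith("URI:SSK:") or cap.startswith("URI:SSK-RO:"):
            return self._create_mutable_filenode(cap)
        if cap.startswith("URI:DIR2:") or cap.startswith("URI:DIR2-RO:"):
            return self._create_dirnode(cap)
        # an object from the future: keep it rather than exploding; webapi
        # renders it with blank fields, deep-check skips it, and deep-stats
        # counts it under "count-unknown"
        return UnknownNode(writecap, readcap)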
This reduces the total test time on my laptop from 400s to 283s.
* src/allmydata/test/test_system.py (SystemTest.test_mutable._test_debug):
Remove assertion about container_size/data_size, this changes with keysize
and was too variable anyways.
* src/allmydata/mutable/filenode.py (MutableFileNode.create): add keysize=
* src/allmydata/dirnode.py (NewDirectoryNode.create): same
* src/allmydata/client.py (Client.DEFAULT_MUTABLE_KEYSIZE): add default,
this overrides the one in MutableFileNode
This will at least turn the really really weird error when a repair of a
readonly mutable file is attempted into a merely really weird assertion that
mentions "repair currently requires a writecap".
Instead of testing to see that the previous SDMF filesize limit was being
obeyed, we now test to make sure that we can insert files larger than that
limit.
The underlying bug is ticketed in http://divmod.org/trac/ticket/2830 and
doesn't need a Tahoe-side change; in addition, this test fails on win32 for
unrelated reasons (and test_client is the place to think about the win32
issue).
and deny the Helper the ability to mount a partial-information-guessing
attack. This will probably break compatibility between new clients and very
old (pre-1.0) helpers.
* calculate depth-first with math instead of traversing the actual tree
* don't mark a node with a red dot if you instead compare it with an extant hash value (tiny optimization)
* edit a comment about checking the root node
* emit lease expiry date in ISO-8601'ish format as well as Brian's format
* rename iso_utc_time_to_localseconds() to iso_utc_time_to_seconds()
* add iso_utc_date()
* simplify the body of iso_utc_time_to_seconds()
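A usage sketch of the renamed and added helpers (module path assumed to be
allmydata.util.time_format):

    from allmydata.util import time_format

    t = 1230000000.0   # an arbitrary seconds-since-epoch timestamp
    print(time_format.iso_utc_date(t))   # just the date, e.g. "2008-12-23"
    # the renamed parser goes the other way: an ISO-8601'ish string back
    # to seconds-since-epoch (UTC, hence dropping "local" from the name)
    secs = time_format.iso_utc_time_to_seconds("2008-12-23T04:00:00")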
A test failed on draco (MacPPC) because it took 67.1 seconds to get around to running the test, and the node had already stopped itself when the hotline file was 60 seconds old.
This walks slowly through all shares, examining their leases, deciding which
are still valid and which have expired. Once enabled, it will then remove the
expired leases, and delete shares which no longer have any valid leases. Note
that there is not yet a tahoe.cfg option to enable lease-deletion: the
current code is read-only. A subsequent patch will add a tahoe.cfg knob to
control this, as well as docs. Some other minor items included in this patch:
tahoe debug dump-share has a new --leases-only flag
storage sharefile/leaseinfo code is cleaned up
storage web status page (/storage) has more info, more test coverage
space-left measurement on OS-X should be more accurate (it was off by 2048x):
use statvfs f_frsize instead of f_bsize
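A sketch of the space-left fix (statvfs fields; storedir is a stand-in):

    import os

    def space_left(storedir):
        s = os.statvfs(storedir)
        # f_bsize is the "preferred" I/O size, which on OS-X can be far
        # larger than the fundamental allocation unit, overstating free
        # space; f_frsize is the correct multiplier
        return s.f_frsize * s.f_bavail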
Classify failures by checking for the specific exception types directly,
instead of examining the value returned by f.trap, because the latter appears
to squash exception types down into their base classes (i.e. since
ShareVersionIncompatible is a subclass of LayoutInvalid, calling trap() on a
Failure wrapping ShareVersionIncompatible can return LayoutInvalid). All this
resulted in 'incompatible' shares being misclassified as 'corrupt'.
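A self-contained illustration of the trap() behavior (stand-in exception
classes):

    from twisted.python.failure import Failure

    class LayoutInvalid(Exception):
        pass

    class ShareVersionIncompatible(LayoutInvalid):
        pass

    f = Failure(ShareVersionIncompatible())
    # trap() returns the first matching type from its argument list, and
    # a base class matches a subclass instance, so the specific type is
    # lost:
    assert f.trap(LayoutInvalid, ShareVersionIncompatible) is LayoutInvalid
    # checking for the specific type first keeps the distinction:
    assert f.check(ShareVersionIncompatible) is ShareVersionIncompatible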