This substantially changes the internals of "tahoe cp", to behave in
accordance with the scheme developed in ticket:2329. test_cli_cp.py got
a large new test to exercise all the various combinations. This also
changes the set of error messages that "tahoe cp" can produce.
This modifies try_copy(), inserts a new implementation of
copy_things_to_directory() (and supporting methods), and fixes a few
bugs elsewhere.
fixes ticket:2329
This code will be replaced in the next commit with an entirely different
approach, and modifying it in a single commit would yield a completely
unreadable diff.
Instead of constructing a sys.argv for 'twistd' that reads the node's
.tac file, we construct arguments that tell twistd to use a special
in-memory-only plugin that creates the desired node instance directly.
We still use the name of the .tac file to decide which kind of instance
to make (Client, IntroducerNode, KeyGenerator, StatsGatherer), but never
actually read the contents of the .tac file. Later improvements could
change this to look inside the tahoe.cfg for a nodetype= directive, etc.
This also makes it easy to have "tahoe start BASEDIR" pass the rest of
its arguments on to twistd, so e.g. "tahoe start BASEDIR --nodaemon
--profile=prof.out" does what you'd expect "twistd --nodaemon
--profile=prof.out" to do. "tahoe run BASEDIR" is thus simply aliased to
"tahoe start BASEDIR --nodaemon". This removes the need to special-case
--profile and --syslog.
I also removed some of the default logging behavior:
before:
'tahoe start' = 'twistd --logfile BASEDIR logs/twistd.log'
'tahoe start --profile' adds '--profile=profiling_results.prof --savestats'
'tahoe run' = 'twistd --nodaemon --logfile BASEDIR/logs/tahoesvc.log'
after:
'tahoe start' = 'twistd --logfile BASEDIR logs/twistd.log'
unless --logfile, --nodaemon, or --syslog are passed
'tahoe start --profile' invalid, use 'tahoe start --profile=OUTPUT'
'tahoe run' = 'twistd --nodaemon'
so log messages go to stdout
This finally enables 'tahoe run' to work with all node types, including
the key-generator and stats-gatherer.
It gets 'tahoe start' one step closer to accepting --reactor= . To
actually accomplish this will require this file, the enclosing
__init_.py files, and everything they import to avoid importing the
reactor. (if anything imports twisted.internet.reactor before
startstop_node.start() gets to run, then --reactor= comes too late).
That will take a lot of work, and requires lazy-loading of many core
libraries (foolscap.logging in particular), and removing a lot of code
from src/allmydata/__init__.py .
The new rules for "bin/tahoe ARG1.. SUBCOMMAND ARG2.." arg:
* --node-directory is only accepted in ARG1, not ARG2
* create-*/start/stop/restart accept --basedir in ARG2, or an explicit
basedir argument
* only one of --node-directory/--basedir/explicit-basedir is accepted
* --quiet/--version is only accepted in ARG1, not ARG2
Closes#166
* fix CLI commands (put, mkdir) to send format=, not mutable-type=
* fix tests
* test_cli: fix tests that observe t=json output, don't ignore failures in
'tahoe put'
* fix handling of version= to make it easier to use the default
* interpret ?mutable=true&format=MDMF as MDMF, not SDMF
The filecaps used to be produced with hints for 'k' and segsize, but they
weren't actually used, and doing so had the potential to limit how we change
those filecaps in the future. Also the parsing code had some problems dealing
with other numbers of extensions. Removing the existing fields and making the
parser tolerate (and ignore) extra ones makes MDMF more future-proof.
I rerecorded this patch, originally by David-Sarah, to use "darcs replace" instead of editing to do the renames. This uncovered one missed rename in Client.init_drop_uploader. (Which also means that code isn't exercised by the current unit tests.)
refs #1429
I personally used "tahoe start/restart -m ../MY-TESTNET/node*" all the time,
to spin up or update a local testgrid while iterating over new code. However,
with the recent switch from "subprocess.Popen(/bin/twistd)" to "import and
call twistd.run()" in scripts/startstop_node.py (yay fewer processes!),
"start -m" broke, and fixing it requires os.fork, which is unavailable on
windows (boo windows!). And I was probably the only one using -m. So in the
interests of uniformity among platforms and simpler code (yay negative code
days!), we're just removing -m from everything. I will start using a little
shell script or something to simulate the removed functionality.
This patch also cleans up CLI-function calling a bit: get the basedir from
the config dict (instead of sometimes from a separate argument), and always
return a numeric exit code.
Specifically, test_runner.CreateNode.test_client failed, because the
os.fork-is-present test decided that --multiple should not be allowed on
windows, even though --multiple works just fine for 'tahoe create-client'.
The only restriction on --multiple is for 'tahoe start' and 'tahoe restart'.
This needs a different approach, probably by cleaning up BasedirMixin. We
should only be withholding --multiple on windows for "start" and
"restart". (we should continue withholding --multiple on all platforms for
"run").
This reverts (git) commit f3adb037ae:
"startstop_node.py: fix "tahoe start -m" by forking before non-final targets"
* don't advertise -m flag on tahoe start/restart/run unless os.fork is
available (i.e. windows)
* test_runner.py: add test to exercise "start/stop/restart -m"
This removes the need to use a locally-built (dependency) bin/twistd, and
removes a big chunk of behavior differences between unix and windows. It
also happens to resolve the "client node probably started" uncertainty.
Might help with #1190, #602, and #71.
fixes#530. I earlier tried this twice (see #530 for history) and then twice rolled it back due to some problems that arose. However, I didn't write down what the problems were in enough detail on the ticket that I can tell today whether those problems are still issues, so here goes the third attempt. (I did write down on the ticket that it would not create site.py or .pth files in the target directory with --multi-version mode, but I didn't explain why *that* was a problem.)
pyflakes pointed out that the exception handler fallback called an un-imported function, showing that the fallback wasn't being exercised.
I'm not 100% sure that this patch is right and would appreciate François or someone reviewing it.
Tahoe CLI commands working on local files, for instance 'tahoe cp' or 'tahoe
backup', have been improved to correctly handle filenames containing non-ASCII
characters.
In the case where Tahoe encounters a filename which cannot be decoded using the
system encoding, an error will be returned and the operation will fail. Under
Linux, this typically happens when the filesystem contains filenames encoded
with another encoding, for instance latin1, than the system locale, for
instance UTF-8. In such case, you'll need to fix your system with tools such
as 'convmv' before using Tahoe CLI.
All CLI commands have been improved to support non-ASCII parameters such as
filenames and aliases on all supported Operating Systems except Windows as of
now.
This patch modifies the regular expression used for verifying of '--node-url'
parameter. Support for accessing a Tahoe gateway over HTTPS was already
present, thanks to Python's urllib.
This handles the case where we upload a new tahoe directory for a
previously-processed local directory, possibly creating a new dircap (if the
metadata had changed). Now we replace the old dirhash->dircap record. The
previous behavior left the old record in place (with the old dircap and
timestamps), so we'd never stop creating new directories and never converge
on a null backup.
These edits were suggested by my watching over Jake Appelbaum's shoulder as he completely ignored/skipped/missed install.html and also as he decided that debian.txt wouldn't help him with basic installation. Then I threw in a few docs edits that have been sitting around in my sandbox asking to be committed for months.
This patch displays a warning to the user in two cases:
1. When special files like symlinks, fifos, devices, etc. are found in the
local source.
2. If files or directories are not readables by the user running the 'tahoe
backup' command.
In verbose mode, the number of skipped files and directories is printed at the
end of the backup.
Exit status returned by 'tahoe backup':
- 0 everything went fine
- 1 the backup failed
- 2 files were skipped during the backup
web/filenode.py: also serve edge metadata when using t=json on a
DIRCAP/childname object.
tahoe_ls.py: list file objects as if we were listing one-entry directories.
Show edge metadata if we have it, which will be true when doing
'tahoe ls DIRCAP/filename' and false when doing 'tahoe ls
FILECAP'
This forbids operations that would implicitly create a directory with a
zero-length (empty string) name, like what you'd get if you did "tahoe put
local /oops/blah" (#358) or "POST /uri/CAP//?t=mkdir" (#676). The error
message is fairly friendly too.
Also added code to "tahoe put" to catch this error beforehand and suggest the
correct syntax (i.e. without the leading slash).
The webapi has been looking for an Accept header since 1.4.0, but it treats a
missing header as equal to */* (to honor RFC2616). This change finally
modifies our CLI tools to ask for "text/plain, application/octet-stream",
which seems roughly correct (we either want a plain-text traceback or error
message, or an uninterpreted chunk of binary data to save to disk). Some day
we'll figure out how JSON fits into this scheme.
* backups now share dirnodes with any previous backup, in any location,
so renames and moves are handled very efficiently
* "tahoe backup" no longer bothers reading the previous snapshot
* if you switch grids, you should delete ~/.tahoe/private/backupdb.sqlite,
to force new uploads of all files and directories
* emit lease expiry date in ISO-8601'ish format as well as Brian's format
* rename iso_utc_time_to_localseconds() to iso_utc_time_to_seconds()
* add iso_utc_date()
* simplify the body of iso_utc_time_to_seconds()
This walks slowly through all shares, examining their leases, deciding which
are still valid and which have expired. Once enabled, it will then remove the
expired leases, and delete shares which no longer have any valid leases. Note
that there is not yet a tahoe.cfg option to enable lease-deletion: the
current code is read-only. A subsequent patch will add a tahoe.cfg knob to
control this, as well as docs. Some other minor items included in this patch:
tahoe debug dump-share has a new --leases-only flag
storage sharefile/leaseinfo code is cleaned up
storage web status page (/storage) has more info, more tests coverage
space-left measurement on OS-X should be more accurate (it was off by 2048x)
(use stat .f_frsize instead of f_bsize)
It is currently hardcoded in setup.py to be 'allmydata-tahoe'. Ticket #556 is to make it configurable by a runtime command-line argument to setup.py: "--appname=foo", but I suddenly wondered if we really wanted that and at the same time realized that we don't need that for tahoe-1.3.0 release, so this patch just hardcodes it in setup.py.
setup.py inspects a file named 'src/allmydata/_appname.py' and assert that it contains the string "__appname__ = 'allmydata-tahoe'", and creates it if it isn't already present. src/allmydata/__init__.py import _appname and reads __appname__ from it. The rest of the Python code imports allmydata and inspects "allmydata.__appname__", although actually every use it uses "allmydata.__full_version__" instead, where "allmydata.__full_version__" is created in src/allmydata/__init__.py to be:
__full_version__ = __appname + '-' + str(__version__).
All the code that emits an "application version string" when describing what version of a protocol it supports (introducer server, storage server, upload helper), or when describing itself in general (introducer client), usese allmydata.__full_version__.
This fixes ticket #556 at least well enough for tahoe-1.3.0 release.
This was somewhat sad; the assertion didn't say what path caused the
error, what went wrong. So... silently skip over things that are
neither dirs nor files.
Because the unit tests on the VirtualZooko? buildslave failed when it took 31 seconds for a process to go away.
Perhaps getting warning message after only 5 seconds instead of 40 seconds is desirable, and we should change the unit tests and set this back to 5, but I don't know exactly how to change the unit tests. Perhaps match this particular warning message about the shutdown taking a while and allow the code under test to pass if the only stderr that it emits is this warning.
The code for validating the share hash tree and the block hash tree has been rewritten to make sure it handles all cases, to share metadata about the file (such as the share hash tree, block hash trees, and UEB) among different share downloads, and not to require hashes to be stored on the server unnecessarily, such as the roots of the block hash trees (not needed since they are also the leaves of the share hash tree), and the root of the share hash tree (not needed since it is also included in the UEB). It also passes the latest tests including handling corrupted shares well.
ValidatedReadBucketProxy takes a share_hash_tree argument to its constructor, which is a reference to a share hash tree shared by all ValidatedReadBucketProxies for that immutable file download.
ValidatedReadBucketProxy requires the block_size and share_size to be provided in its constructor, and it then uses those to compute the offsets and lengths of blocks when it needs them, instead of reading those values out of the share. The user of ValidatedReadBucketProxy therefore has to have first used a ValidatedExtendedURIProxy to compute those two values from the validated contents of the URI. This is pleasingly simplifies safety analysis: the client knows which span of bytes corresponds to a given block from the validated URI data, rather than from the unvalidated data stored on the storage server. It also simplifies unit testing of verifier/repairer, because now it doesn't care about the contents of the "share size" and "block size" fields in the share. It does not relieve the need for share data v2 layout, because we still need to store and retrieve the offsets of the fields which come after the share data, therefore we still need to use share data v2 with its 8-byte fields if we want to store share data larger than about 2^32.
Specify which subset of the block hashes and share hashes you need while downloading a particular share. In the future this will hopefully be used to fetch only a subset, for network efficiency, but currently all of them are fetched, regardless of which subset you specify.
ReadBucketProxy hides the question of whether it has "started" or not (sent a request to the server to get metadata) from its user.
Download is optimized to do as few roundtrips and as few requests as possible, hopefully speeding up download a bit.
Because the unit tests on the VirtualZooko buildslave failed when it took 16 seconds for a process to go away.
Perhaps getting notification after only 5 seconds instead of 20 seconds is desirable, and we should change the unit tests and set this back to 5, but I don't know exactly how to change the unit tests. Perhaps match this particular warning message about the shutdown taking a while and allow the code under test to pass if the only stderr that it emits is this warning.
We're just going to mark unicode in the cli as unsupported for tahoe-lafs-1.3.0. Unicode filenames on the command-line do actually work for some platforms and probably only if the platform encoding is utf-8, but I'm not sure, and in any case for it to be marked as "supported" it would have to work on all platforms, be thoroughly tested, and also we would have to understand why it worked. :-)
Also encode all args to urllib as utf-8 because urllib doesn't handle unicode objects.
I'm not sure if it is appropriate to *assume* utf-8 encoding of cli args. Perhaps the Right thing to do is to detect the platform encoding. Any ideas?
This patch is mostly due to François Deppierraz.
in the recent reconciliation of webopen patches, I wound up adjusting
webopen to 'pass through' the state of the trailing slash on the given
argument to the resultant url passed to the browser. this change
removes the requirement that arguments must be directories, and allows
webopen to be used with files. it also broke the tests that assumed
that webopen would always normalise the url to have a trailing slash.
in fixing the tests, I realised that, IMHO, there's something deeply
awry with the way tahoe handles paths; specifically in the combination
of '/' being the name of the root path within an alias, but a leading
slash on paths, e.g. 'alias:/path', is catagorically incorrect. i.e.
'tahoe:' == 'tahoe:/' == '/'
but 'tahoe:/foo' is an invalid path, and must be 'tahoe:foo'
I wound up making the internals of webopen simply spot a 'path' of
'/' and smash it to '', which 'fixes' webopen to match the behaviour
of tahoe's path handling elsewhere, but that special case sort of
points to the weirdness.
(fwiw, I personally found the fact that the leading / in a path was
disallowed to be weird - I'm just used to seeing paths qualified by
the leading / I guess - so in a debate about normalising path handling
I'd vote to include the /)
I think this is largely attributable to a cleanup patch I'd made
which never got committed upstream somehow, but at any rate various
conflicting changes to webopen had been made. This cleans up the
conflicts therein, and hopefully brings 'tahoe webopen' in line with
other cli commands.
moved the body of webopen out of cli.py into tahoe_webopen.py
made its invocation consistent with the other cli commands, most
notably replacing its 'vdrive path' with the same alias parsing,
allowing usage such as 'tahoe webopen private:Pictures/xti'
this adds a new service to pre-generate RSA key pairs. This allows
the expensive (i.e. slow) key generation to be placed into a process
outside the node, so that the node's reactor will not block when it
needs a key pair, but instead can retrieve them from a pool of already
generated key pairs in the key-generator service.
it adds a tahoe create-key-generator command which initialises an
empty dir with a tahoe-key-generator.tac file which can then be run
via twistd. it stashes its .pem and portnum for furl stability and
writes the furl of the key gen service to key_generator.furl, also
printing it to stdout.
by placing a key_generator.furl file into the nodes config directory
(e.g. ~/.tahoe) a node will attempt to connect to such a service, and
will use that when creating mutable files (i.e. directories) whenever
possible. if the keygen service is unavailable, it will perform the
key generation locally instead, as before.
runner provides the main point of entry for the 'tahoe' command, and
provides various subcommands by default. this provides a hook whereby
additional subcommands can be added in in other contexts, providing a
simple way to extend the (sub)commands space available through 'tahoe'
base62 encoding fits more information into alphanumeric chars while avoiding the troublesome non-alphanumeric chars of base64 encoding. In particular, this allows us to work around the ext3 "32,000 entries in a directory" limit while retaining the convenient property that the intermediate directory names are leading prefixes of the storage index file names.
adds a 'run' commands to bin/tahoe / tahoe.exe
it loads a client node into the tahoe process itself,
running in the base dir specified by --basedir/-C and
defaulting to the current working dir.
it runs synchronously, and the tahoe process blocks until
the reactor is stopped.
this is probably not of very high utility in the unix case of bin/tahoe
but is useful when working with native builds, e.g. py2exe's tahoe.exe,
to examine and debug the runtime environment, linking problems etc.
taking the same arguments as tahoe ls, it does a webbrowser.open to the page
specified by those args. hence "tahoe webopen" will open a browser to the
root dir specified in private/root_dir.cap by default.
this might be a good alternative to the start.html page.
* rename my_private_dir.cap to root_dir.cap
* move it into the private subdir
* change the cmdline argument "--root-uri=[private]" to "--dir-uri=[root]"
* use new decentralized directories everywhere instead of old centralized directories
* provide UI to them through the web server
* provide UI to them through the CLI
* update unit tests to simulate decentralized mutable directories in order to test other components that rely on them
* remove the notion of a "vdrive server" and a client thereof
* remove the notion of a "public vdrive", which was a directory that was centrally published/subscribed automatically by the tahoe node (you can accomplish this manually by making a directory and posting the URL to it on your web site, for example)
* add a notion of "wait_for_numpeers" when you need to publish data to peers, which is how many peers should be attached before you start. The default is 1.
* add __repr__ for filesystem nodes (note: these reprs contain a few bits of the secret key!)
* fix a few bugs where we used to equate "mutable" with "not read-only". Nowadays all directories are mutable, but some might be read-only (to you).
* fix a few bugs where code wasn't aware of the new general-purpose metadata dict the comes with each filesystem edge
* sundry fixes to unit tests to adjust to the new directories, e.g. don't assume that every share on disk belongs to a chk file.
It isn't currently used, and I don't remember what part of its behavior was so much better than tahoe_put.py, and Brian has subsequently improved tahoe_put.py.
Thanks to Brian for helping me figure out the cleaner way to do this: take the
first result from which("twistd"), and if it has the extension ".bat" or
".exe" then execute it, else execute python and give it as the first argument.
It doesn't exit properly afterward, and it might not do the best things with non-success responses from the server.
(See tahoe-put-web2ish.py for an example of better response handling.)
There are actually two versions in this patch, one of which requires twisted.web2 and the other of which uses the Python standard library's socket module. The socketish one doesn't know when the web server is done so it hangs after doing its thing. (Oh -- maybe I should add an HTTP header asking the web server to close the connection when finished.) The web2ish one works better in that respect. Neither of them handle error responses from the server very well yet.
After lunch I intend to finish the socketish one.
To try one, mv src/allmydata/scripts/tahoe_put-{socketish,web2ish}.py src/allmydata/scripts/tahoe_put.py .
If you want to try the web2ish one, and you can't find a web2 package to install, you can get one from:
http://allmydata.org/~zooko/repos/twistedweb2snarf/
This is a potentially disruptive and potentially ugly change to the code base,
because I renamed the object that serves in both roles from "Queen" to
"IntroducerAndVdrive", which is a bit of an ugly name.
However, I think that clarity is important enough in this release to make this
change. All unit tests pass. I'm now darcs recording this patch in order to
pull it to other machines for more testing.
Note that using "whatever version of python the name 'python' maps to in the current shell environment" is more error-prone that specifying which python you mean, such as by executing "/usr/bin/python setup.py" instead of executing "./setup.py". When you build tahoe (by running "make") it will make a copy of bin/allmydata-tahoe in instdir/bin/allmydata-tahoe with the shebang line rewritten to execute the specific version of python that was used when building instead of to execute "/usr/bin/env python".
However, it seems better that the default for lazy people be "whatever 'python' means currently" instead of "whatever 'python' meant to the manufacturer of your operating system".