Remove install.html (long since deprecated).
Also replace some obsolete references to install.html with references to quickstart.rst.
Fix some broken internal references within docs/historical/historical_known_issues.txt.
Thanks to Ravi Pinjala and Patrick McDonald.
refs #1227
When blocks terminate (either COMPLETE or CORRUPT/DEAD/BADSEGNUM), the
_shares_from_server dict was being popped incorrectly (using shnum as the
index instead of serverid). I'm still thinking through the consequences of
this bug. It was probably benign and really hard to detect. I think it would
cause us to incorrectly believe that we're pulling too many shares from a
server, and thus prefer a different server rather than asking for a second
share from the first server. The diversity code is intended to spread out the
number of shares simultaneously being requested from each server, but with
this bug, it might be spreading out the total number of shares requested at
all, not just simultaneously. (note that SegmentFetcher is scoped to a single
segment, so the effect doesn't last very long).
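To make the keying mistake concrete, a minimal runnable sketch (the dict layout and names are illustrative, not the actual SegmentFetcher attributes):

    # _shares_from_server maps serverid -> shares in flight from that server
    shares_from_server = {"server-A": {0, 5}, "server-B": {3}}
    shnum, serverid = 5, "server-A"

    shares_from_server.pop(shnum, None)     # the bug: wrong key, silently misses
    shares_from_server.pop(serverid, None)  # the fix: pop by the server's id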
No behavioral changes, just updating variable/method names and log messages.
The effects outside these three files should be minimal: some exception
messages changed (to say "server" instead of "peer"), and some internal class
names were changed. A few things still use "peer" to minimize external
changes, like UploadResults.timings["peer_selection"] and
happinessutil.merge_peers, which can be changed later.
I'm skeptical that the test was proceeding correctly but ran out of time. It seems more likely that it had gotten hung. But if we raise the timeout to an even more extravagant number then we can be even more certain that the test was never going to finish.
Pass around IServer instance instead of (peerid, rref) tuple. Replace
"descriptor" with "server". Other replacements:
get_all_servers -> get_connected_servers/get_known_servers
get_servers_for_index -> get_servers_for_psi (now returns IServers)
This change still needs to be pushed further down: lots of code is now
getting the IServer and then distributing (peerid, rref) internally.
Instead, it ought to distribute the IServer internally and delay
extracting a serverid or rref until the last moment.
no_network.py was updated to retain parallelism.
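A hedged sketch of the direction this is heading (get_serverid()/get_rref() are assumed accessor names on IServer):

    # old style: every layer carries the (peerid, rref) tuple
    def query_servers_old(peers):
        for (peerid, rref) in peers:
            rref.callRemote("get_version")

    # new style: pass the IServer around, extract at the last moment
    def query_servers_new(storage_broker):
        for server in storage_broker.get_connected_servers():
            server.get_rref().callRemote("get_version")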
The service generated by strports.service() changed in 10.2, and the ugly
private-attribute-reading hack we used to glean a kernel-allocated port
number (e.g. when using "tcp:0", especially during unit tests) broke, causing
Tahoe to be completely unusable with Twisted-10.2 . The new ugly
private-attribute-reading hack starts by figuring out what sort of service
was generated, then reads different attributes accordingly.
This also hushes a warning when using schemeless strports strings like "0" or
"3456", by quietly prepending a "tcp:" scheme, since 10.2 complains about
those. It also adds getURL() and getPortnum() accessors to the "webish"
service, rather than having unit tests dig through _url and _portnum and such
to find out what they are.
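A sketch of the branching hack (private attribute names as observed in Twisted; treat this as illustrative, not Tahoe's exact code):

    from twisted.application import internet

    # Twisted < 10.2 has no StreamServerEndpointService at all
    _EndpointService = getattr(internet, "StreamServerEndpointService", None)

    def get_portnum(svc):
        if _EndpointService is not None and isinstance(svc, _EndpointService):
            # 10.2+: the IListeningPort arrives via a private Deferred;
            # assumes the service has started and the Deferred has fired
            port = svc._waitingForPort.result
        else:
            # older Twisted: internet.TCPServer keeps it in _port
            port = svc._port
        return port.getHost().port

    def normalize_strports(s):
        # hush the 10.2 warning about schemeless strings like "0" or "3456"
        return ("tcp:" + s) if s.isdigit() else s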
Make some changes to the status web pages.
status.xhtml:
- Delete unused webform_css link
- Align tables on the left
tahoe-css:
- Make some minor changes to code syntax
- Change table.status-download-events style to look like the other tables
status.py:
- Align table on the left
- Change table header
- Add heading tags
- Modify Google API graph: add image border, calculate height to fit the data
signed-off-by: zooko@zooko.com
fixes #1219
I personally used "tahoe start/restart -m ../MY-TESTNET/node*" all the time,
to spin up or update a local testgrid while iterating over new code. However,
with the recent switch from "subprocess.Popen(/bin/twistd)" to "import and
call twistd.run()" in scripts/startstop_node.py (yay fewer processes!),
"start -m" broke, and fixing it requires os.fork, which is unavailable on
windows (boo windows!). And I was probably the only one using -m. So in the
interests of uniformity among platforms and simpler code (yay negative code
days!), we're just removing -m from everything. I will start using a little
shell script or something to simulate the removed functionality.
This patch also cleans up CLI-function calling a bit: get the basedir from
the config dict (instead of sometimes from a separate argument), and always
return a numeric exit code.
Specifically, test_runner.CreateNode.test_client failed, because the
os.fork-is-present test decided that --multiple should not be allowed on
windows, even though --multiple works just fine for 'tahoe create-client'.
The only restriction on --multiple is for 'tahoe start' and 'tahoe restart'.
This needs a different approach, probably by cleaning up BasedirMixin. We
should only be withholding --multiple on windows for "start" and
"restart". (we should continue withholding --multiple on all platforms for
"run").
This reverts (git) commit f3adb037ae:
"startstop_node.py: fix "tahoe start -m" by forking before non-final targets"
* don't advertise the -m flag on tahoe start/restart/run unless os.fork is
available (i.e. hide it on windows)
* test_runner.py: add test to exercise "start/stop/restart -m"
For one thing, this makes missing-dependency failures into DistributionNotFound errors instead of ImportErrors, which might be more useful to the user. For another thing, if someone is using distributions that were installed with --multi-version, then they might not be importable until after require_auto_deps() has been run. (The docs claim that this would be the case, but we don't have an example of this happening at this time.)
* repairer (really the uploader) reads beyond end of input file (Uploadable)
* new-downloader does not tolerate overreads
* uploader does lots of tiny reads (inefficient)
This fixes the last two. The uploader still does a single overread at the end
of the input file, but now that's ok so we can leave it in place. The
uploader now expects the Uploadable to behave like a normal disk
file (reading beyond EOF will return less data than was asked for), and now
the new-downloader behaves that way.
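The disk-file semantics being assumed, in a runnable sketch (the real Uploadable API is richer than a bare file handle):

    import tempfile

    f = tempfile.TemporaryFile()
    f.write(b"short")
    f.seek(0)
    data = f.read(4096)   # overread past EOF: returns the 5 bytes present
    assert data == b"short" and len(data) < 4096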
This removes the need to use a locally-built (dependency) bin/twistd, and
removes a big chunk of behavior differences between unix and windows. It
also happens to resolve the "client node probably started" uncertainty.
Might help with #1190, #602, and #71.
fixes #530. I earlier tried this twice (see #530 for history) and then twice rolled it back due to some problems that arose. However, I didn't write down what the problems were in enough detail on the ticket that I can tell today whether those problems are still issues, so here goes the third attempt. (I did write down on the ticket that it would not create site.py or .pth files in the target directory with --multi-version mode, but I didn't explain why *that* was a problem.)
fixes #1191
Patch by Brian. This patch description was actually written by Zooko, but I forged Brian's name on the "author" field so that he would get credit for this patch in revision control history.
If you are investigating the bug in new-downloader, one way to investigate might be to change this ordering to a different fixed order (e.g. rotate by 4 instead of rotate by 5) and observe how the behavior of new-downloader differs in that case.
Kyle's OpenBSD buildslave used 41 reads when doing this test. The fact that I'm blindly bumping this number up to match the observed behavior probably means this isn't a good criterion to be testing for anyway. But perhaps someone else (Brian) could investigate why that run on Kyle's OpenBSD box took four more reads than we expected, and whether the fact that it took 41 reads to do this operation is indicative of an actual problem.
deliver all shares at once instead of feeding them out one-at-a-time.
Also fix distribution of real-number-of-segments information: now all
CommonShares (not just the ones used for the first segment) get a
correctly-sized hashtree. Previously, the late ones might not, which would
make them crash and get dropped (causing the download to fail if the initial
set were insufficient, perhaps because one of their servers went away).
Update tests, add some TODO notes, improve variable names and comments.
Improve logging: add logparents, set more appropriate levels.
This avoids spamming the "recent uploads and downloads" /status page from
FileNode instances that were created for a directory read but which nobody is
ever going to read from. I also cleaned up the way DownloadStatus instances
are made to only ever do it in the CiphertextFileNode, not in the
higher-level plaintext FileNode. Also fixed DownloadStatus handling of read
size, thanks to David-Sarah for the catch.
seems to avoid the #1155 log message which reveals the URI (and filecap).
Also add an [ERROR] marker to the flog entry, since unregisterProducer also
makes interrupted downloads appear "200 OK"; this makes it more obvious that
the download did not complete.
The lost-progress bug occurred when two simultaneous read() calls fetched
different segments, and the first one failed (due to corruption, or the other
bugs in #1154): the second read() would never complete. While in this state,
cancelling the second read by having its consumer call stopProducing() would
trigger the cancel-intolerance bug. Finally, in downloader.node.Cancel,
prevent late cancels by adding an 'active' flag.
The Range header causes n.read() to be called with an offset= of type 'long',
which eventually got used in a Spans/DataSpans object's __len__ method.
Apparently python doesn't permit __len__() to return longs, only ints.
Rewrote Spans/DataSpans to use s.len() instead of len(s) aka s.__len__() .
Added a test in test_download. Note that test_web didn't catch this because
it uses mock FileNodes for speed: it's probably time to rewrite that.
There is still an unresolved error-recovery problem in #1154, so I'm not
closing the ticket quite yet.
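A sketch of the workaround's shape (not the real Spans implementation):

    class Spans(object):
        def __init__(self, spans):
            self._spans = list(spans)   # (start, length) pairs

        def len(self):
            # a plain method instead of __len__: the total may be a long
            # (e.g. offsets from a Range: header), which Python 2's len()
            # would reject with "__len__() should return an int"
            return sum(length for (_start, length) in self._spans)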
The fixed 10-second timer will eventually be replaced with a per-server
value, calculated based on observed response times.
test_hung_server.py: enhance to exercise DYHB=OVERDUE state. Split existing
mutable+immutable tests into two pieces for clarity. Reenabled several tests.
Deleted the now-obsolete "test_failover_during_stage_4".
This bug had the effect of making uploads sometimes (rarely) appear to succeed when they had actually not distributed the shares well enough to achieve the desired servers-of-happiness level.
This patch also renames some instances of "find_shares()" to "find_all_shares()" and other instances to "find_uri_shares()" as appropriate -- the conflation between those names confused me at first when writing these tests.
Impose micro-POLA by passing only the writekey instead of the whole node object to {{{_encrypt_rw_uri()}}}. Remove DummyImmutableFileNode in nodemaker.py, which is obviated by this. Add micro-optimization by precomputing the netstring of the empty string and branching on whether the writekey is present or not outside of {{{_encrypt_rw_uri()}}}. Add doc about writekey to docstring.
fixes #967
CSS modification so the page is displayed correctly with Internet Explorer 8.
The links at the top of the directory.xhtml page were not displayed on the
same line as they are with Firefox.
Fix parsing of a Range: header to support:
- multiple ranges (parsed, but not returned)
- suffix byte ranges ("-2139")
- correct handling of incorrectly formatted range headers
(correct behaviour is to ignore the header and return the full
file)
- return appropriate error for ranges outside the file
Multiple ranges are parsed, but only the first range is returned.
Returning multiple ranges requires using the multipart/byterange
content type.
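A hedged sketch of the parsing rules described above (illustrative, not the webapi's actual code):

    def parse_range(header, filesize):
        # returns a list of (first, last) byte pairs, or None when the
        # header is incorrectly formatted (correct behaviour: ignore it
        # and serve the whole file)
        try:
            unit, _, rangeset = header.partition("=")
            if unit.strip().lower() != "bytes":
                return None
            ranges = []
            for spec in rangeset.split(","):
                first, _, last = spec.strip().partition("-")
                if first == "":                 # suffix range, e.g. "-2139"
                    length = int(last)
                    ranges.append((max(filesize - length, 0), filesize - 1))
                else:
                    start = int(first)
                    end = int(last) if last else filesize - 1
                    ranges.append((start, min(end, filesize - 1)))
            return ranges
        except ValueError:
            return None

The error response for ranges entirely outside the file, and the multipart/byterange encoding needed to return more than the first range, are left out of this sketch.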
pyflakes pointed out that the exception handler fallback called an un-imported function, showing that the fallback wasn't being exercised.
I'm not 100% sure that this patch is right and would appreciate François or someone reviewing it.
This test ensures that open(a_unicode_string) is used on Unicode platforms
(Windows or MacOS X) and that open(a_correctly_encoded_bytestring) is used on
other platforms such as Unix.
Tahoe CLI commands working on local files, for instance 'tahoe cp' or 'tahoe
backup', have been improved to correctly handle filenames containing non-ASCII
characters.
In the case where Tahoe encounters a filename which cannot be decoded using the
system encoding, an error will be returned and the operation will fail. Under
Linux, this typically happens when the filesystem contains filenames encoded
with an encoding (for instance latin1) other than the system locale (for
instance UTF-8). In such cases, you'll need to fix your system with tools such
as 'convmv' before using the Tahoe CLI.
All CLI commands have been improved to support non-ASCII parameters such as
filenames and aliases on all supported operating systems, except Windows for
now.
- Renamed xhtml Title from "Allmydata - Tahoe" to "Tahoe-LAFS"
- Renamed Tahoe to Tahoe-LAFS in page content
- Changed Tahoe-LAFS home page link to http://tahoe-lafs.org (added target="blank")
- Deleted commented css script in info.xhtml
- Make some important utility functions clearer and more thoroughly
documented.
- Assert in upload.servers_of_happiness that the buckets attributes
of PeerTrackers passed to it are mutually disjoint.
- Get rid of some silly non-Pythonisms that I didn't see when I first
wrote these patches.
- Make sure that should_add_server returns true when queried about a
shnum that it doesn't know about yet.
- Change Tahoe2PeerSelector.preexisting_shares to map a shareid to a set
of peerids, alter dependencies to deal with that.
- Remove upload.should_add_servers, because it is no longer necessary.
- Move upload.shares_of_happiness and upload.shares_by_server to a utility
file.
- Change some points in Tahoe2PeerSelector.
- Compute servers_of_happiness using a bipartite matching algorithm that
we know is optimal instead of an ad-hoc greedy algorithm that isn't (a
sketch follows this list).
- Change servers_of_happiness to just take a sharemap as an argument,
change its callers to merge existing_shares and used_peers before
calling it.
- Change an error message in the encoder to be more appropriate for
servers of happiness.
- Clarify the wording of an error message in immutable/upload.py
- Refactor a happiness failure message to happinessutil.py, and make
immutable/upload.py and immutable/encode.py use it.
- Move the word "only" as far to the right as possible in failure
messages.
- Use a better definition of progress during peer selection.
- Do read-only peer share detection queries in parallel, not sequentially.
- Clean up logging semantics; print the query statistics whenever an
upload is unsuccessful, not just in one case.
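The sketch referenced above: happiness as a maximum bipartite matching (Kuhn's augmenting-path algorithm; the function name and the shape of the sharemap are assumptions, and Tahoe's real implementation differs):

    def servers_of_happiness(sharemap):
        # sharemap: shnum -> set(serverid). Happiness is the size of a
        # maximum matching in the bipartite graph pairing each share with
        # one server that holds it.
        matched = {}  # serverid -> shnum

        def augment(shnum, seen):
            for server in sharemap.get(shnum, set()):
                if server not in seen:
                    seen.add(server)
                    if server not in matched or augment(matched[server], seen):
                        matched[server] = shnum
                        return True
            return False

        return sum(1 for shnum in sharemap if augment(shnum, set()))

    # e.g. two servers each holding both of two shares -> happiness 2
    assert servers_of_happiness({0: {"A", "B"}, 1: {"A", "B"}}) == 2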
When I first implemented #778, I just altered the error messages to refer to
servers where they referred to shares. The resulting error messages weren't
very good. These are a bit better.
The Tahoe2PeerSelector returned either NoSharesError or NotEnoughSharesError
for a variety of error conditions that those exceptions didn't describe well.
This patch creates a new error, UploadHappinessError, replaces uses of
NoSharesError and NotEnoughSharesError with it, and alters the error message
raised with the errors to be more in line with the new servers_of_happiness
behavior. See ticket #834 for more information.
- Fix comments and confusing naming.
- Add tests for the new error messages suggested by David-Sarah
and Zooko.
- Alter existing tests for new error messages.
- Make sure that the tests continue to work with the trunk.
- Add a test for a mutual disjointedness assertion that I added to
upload.servers_of_happiness.
- Fix the comments to correctly reflect read-onlyness
- Add a test for an edge case in should_add_server
- Add an assertion to make sure that share redistribution works as it
should
- Alter tests to work with revised servers_of_happiness semantics
- Remove tests for should_add_server, since that function no longer exists.
- Alter tests to know about merge_peers, and to use it before calling
servers_of_happiness.
- Add tests for merge_peers.
- Add Zooko's puzzles to the tests.
- Edit encoding tests to expect the new kind of failure message.
- Edit tests to expect error messages with the word "only" moved as far
to the right as possible.
- Extended and cleaned up some helper functions.
- Changed some tests to call more appropriate helper functions.
- Added a test for the failing redistribution algorithm.
- Added a test for the progress message.
- Added a test for the upper bound on readonly peer share discovery.
This patch modifies the regular expression used to verify the '--node-url'
parameter. Support for accessing a Tahoe gateway over HTTPS was already
present, thanks to Python's urllib.
This handles the case where we upload a new tahoe directory for a
previously-processed local directory, possibly creating a new dircap (if the
metadata had changed). Now we replace the old dirhash->dircap record. The
previous behavior left the old record in place (with the old dircap and
timestamps), so we'd never stop creating new directories and never converge
on a null backup.
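In sqlite terms the fix amounts to an upsert where an insert-if-absent used to be; a runnable sketch (the real backupdb schema differs):

    import sqlite3, time

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE directories"
               " (dirhash TEXT PRIMARY KEY, dircap TEXT, last_uploaded REAL)")

    def did_upload_directory(dirhash, dircap):
        # REPLACE overwrites the stale dirhash->dircap record, so a
        # re-upload with changed metadata converges instead of leaving the
        # old record (and old timestamps) in place
        db.execute("REPLACE INTO directories VALUES (?,?,?)",
                   (dirhash, dircap, time.time()))

    did_upload_directory("h1", "URI:DIR2:old")
    did_upload_directory("h1", "URI:DIR2:new")   # replaces, doesn't duplicate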
These edits were suggested by my watching over Jake Appelbaum's shoulder as he completely ignored/skipped/missed install.html and also as he decided that debian.txt wouldn't help him with basic installation. Then I threw in a few docs edits that have been sitting around in my sandbox asking to be committed for months.
To test the changes for #577, we need a deterministic way to simulate
the passage of long periods of time. twisted.internet.task.Clock seems,
from my Googling, to be the way to go for this functionality. I changed
a few things so that OphandleTable would use twisted.internet.task.Clock
when testing:
* WebishServer.__init__ now takes an optional 'clock' parameter,
which it passes to the root.Root instance it creates.
* root.Root.__init__ now takes an optional 'clock' parameter, which it
passes to the OphandleTable.__init__ method.
* OphandleTable.__init__ now takes an optional 'clock' parameter. If
it is provided, and it isn't None, its callLater method will be used
to schedule ophandle expirations (as opposed to using
reactor.callLater, which is what OphandleTable does normally).
* The WebMixin object in test_web.py now sets a self.clock parameter,
which is a twisted.internet.task.Clock that it feeds to the
WebishServer it creates.
Tests using the WebMixin can control the passage of time in
OphandleTable by accessing self.clock.
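The mechanism in isolation, for reference (task.Clock is real Twisted API; the ophandle detail is a sketch):

    from twisted.internet import task

    clock = task.Clock()
    expired = []
    # OphandleTable does roughly: clock.callLater(lifetime, expire, handle)
    clock.callLater(3600, expired.append, "ophandle-1")

    clock.advance(3599)
    assert expired == []
    clock.advance(1)            # deterministic passage of time
    assert expired == ["ophandle-1"]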
It still lacks the right HTML report (the builtin report is very pretty, but
lacks the "lines uncovered" numbers that I want), and the half-finished
delta-from-last-run measurements.
This can be useful if one of the ones that he has already begun downloading fails. See #287 for discussion. This fixes the part of #287 that was a regression caused by #928, namely: fail-over now works again when a share is corrupted (or the server returns an error or disconnects). This does not fix the related issue mentioned in #287 if a server hangs and doesn't reply to requests for blocks.
This should put an end to the phenomenon I've been seeing that a single hung server can cause all downloads on a grid to hang. Also it should speed up all downloads by (a) not-waiting for responses to queries that it doesn't need, and (b) downloading shares from the servers which answered the initial query the fastest.
Also, do not count how many buckets you've gotten when deciding whether the download has enough shares or not -- instead count how many buckets to *unique* shares you've gotten. This appears to improve a slightly weird behavior in the current download code in which receiving >= K different buckets all for the same share number would make it think it had enough to download the file when in fact it hadn't.
This patch needs tests before it is actually ready for trunk.
Having both test_node() and test_client() (one of which calls the other) felt
confusing to me, so I changed it to have test_node(), test_client(), and a
common do_create() helper method.
This patch displays a warning to the user in two cases:
1. When special files like symlinks, fifos, devices, etc. are found in the
local source.
2. If files or directories are not readable by the user running the 'tahoe
backup' command.
In verbose mode, the number of skipped files and directories is printed at the
end of the backup.
Exit status returned by 'tahoe backup':
- 0 everything went fine
- 1 the backup failed
- 2 files were skipped during the backup
allmydata.util.log.err() either takes a Failure as the first positional
argument, or takes no positional arguments and must be invoked in an
exception handler. Fixed its signature to match both foolscap.logging.log.err
and twisted.python.log.err . Included a brief unit test.
Stop checking separately for ConnectionDone/ConnectionLost, since those have
been folded into DeadReferenceError since foolscap-0.3.1 . Write
rrefutil.trap_deadref() in terms of rrefutil.trap_and_discard() to improve
code coverage.
Verifier misses
The results (described in #819) match our expectations: it misses corruption
in unused share fields and in most container fields (which are only visible
to the storage server, not the client). 1265 bytes of a 2753 byte
share (hosting a 56-byte file with an artificially small segment size) are
unused, mostly in the unused tail of the overallocated UEB space (765 bytes),
and the allocated-but-unwritten plaintext_hash_tree (480 bytes).
The bug was that a disconnected server could cause us to re-enter the initial
loop() call, sending multiple queries to a single server, provoking an
incorrect UCWE. To fix it, stall the loop() with an eventual.fireEventually(),
which avoids the weird errors. Closes #874 and #786.
Previously, if the file had 0 shares, this would raise TypeError as it tried
to call download_version(None). If the file had some shares but fewer than
'k', it would incorrectly raise MustForceRepairError.
Added get_successful() to the IRepairResults API, to give repair() a place to
report non-code-bug problems like this.
Mutable servermap updates and the immutable checker, when run with
add_lease=True, send both the do-you-have-block and add-lease commands in
parallel, to avoid an extra round trip time. Many older servers have problems
with add-lease and raise various exceptions, which don't generally matter.
The client-side code was catching+ignoring some of them, but unrecognized
exceptions were passed through to the DYHB code, concealing the DYHB results
from the checker, making it think the server had no shares.
The fix is to separate the code paths. Both commands are sent at the same
time, but the errback path from add-lease is handled separately. Known
exceptions are ignored, the others (both unknown-remote and all-local) are
logged (log.WEIRD, which will trigger an Incident), but neither will affect
the DYHB results.
The add-lease message is sent first, and we know that the server handles them
synchronously. So when the checker is done, we can be sure that all the
add-lease messages have been retired. This makes life easier for unit tests.
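A sketch of the separated paths (helper names are hypothetical; 'add_lease' and 'get_buckets' are the storage-protocol message names):

    from allmydata.util import log

    def query(rref, storage_index, renew_secret, cancel_secret, add_lease):
        if add_lease:
            # sent first, with its own errback chain, so its failures can
            # never contaminate the DYHB results
            d2 = rref.callRemote("add_lease", storage_index,
                                 renew_secret, cancel_secret)
            d2.addErrback(_handle_add_lease_failure)
        return rref.callRemote("get_buckets", storage_index)

    def _handle_add_lease_failure(f):
        if f.check(IndexError):   # stand-in for the known benign exceptions
            return None           # older servers raise these; just ignore
        # unknown-remote and local failures: log as WEIRD (triggers an
        # Incident) but still swallow, leaving DYHB unaffected
        log.msg("error in add_lease", failure=f, level=log.WEIRD)
        return None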
web/filenode.py: also serve edge metadata when using t=json on a
DIRCAP/childname object.
tahoe_ls.py: list file objects as if we were listing one-entry directories.
Show edge metadata if we have it, which will be true when doing
'tahoe ls DIRCAP/filename' and false when doing 'tahoe ls
FILECAP'
This forbids operations that would implicitly create a directory with a
zero-length (empty string) name, like what you'd get if you did "tahoe put
local /oops/blah" (#358) or "POST /uri/CAP//?t=mkdir" (#676). The error
message is fairly friendly too.
Also added code to "tahoe put" to catch this error beforehand and suggest the
correct syntax (i.e. without the leading slash).
The webapi has been looking for an Accept header since 1.4.0, but it treats a
missing header as equal to */* (to honor RFC2616). This change finally
modifies our CLI tools to ask for "text/plain, application/octet-stream",
which seems roughly correct (we either want a plain-text traceback or error
message, or an uninterpreted chunk of binary data to save to disk). Some day
we'll figure out how JSON fits into this scheme.
under normal conditions, this wouldn't cause any problems, but if the shares
are really sparse (perhaps because new servers were added), then
file-modifying operations might stop looking too early and leave old shares in place
* remove Downloader.download_to_data/download_to_filename/download_to_filehandle
* remove download.Data/FileName/FileHandle targets
* remove filenode.download/download_to_data/download_to_filename methods
* leave Downloader.download (the whole Downloader will go away eventually)
* add util.consumer.MemoryConsumer/download_to_data, for convenience
(this is mostly used by unit tests, but it gets used by enough non-test
code to warrant putting it in allmydata.util)
* update tests
* removes about 180 lines of code. Yay negative code days!
Overall plan is to rewrite immutable/download.py and leave filenode.read() as
the sole read-side API.
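Typical replacement usage, given an IFileNode 'filenode' (download_to_data is the new util.consumer helper named above):

    from allmydata.util.consumer import download_to_data

    d = download_to_data(filenode)   # filenode.read() into a MemoryConsumer
    d.addCallback(lambda data: len(data))   # 'data' is the full plaintext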
* backups now share dirnodes with any previous backup, in any location,
so renames and moves are handled very efficiently
* "tahoe backup" no longer bothers reading the previous snapshot
* if you switch grids, you should delete ~/.tahoe/private/backupdb.sqlite,
to force new uploads of all files and directories
The proper hierarchy is:
IFilesystemNode
+IFileNode
++IMutableFileNode
++IImmutableFileNode
+IDirectoryNode
Also expand test_client.py (NodeMaker) to hit all IFilesystemNode types.
* stop caching most_recent_size in dirnode, rely upon backing filenode for it
* start caching most_recent_size in MutableFileNode
* return None when you don't know, not "?"
* only render None as "?" in the web "more info" page
* add get_size/get_current_size to UnknownNode
* change t=mkdir-with-children to not use multipart/form encoding. Instead,
the request body is all JSON. t=mkdir-immutable uses this format too.
* make nodemaker.create_immutable_dirnode() get convergence from SecretHolder,
but let callers override it
* raise NotDeepImmutableError instead of using assert()
* add mutable= argument to DirectoryNode.create_subdirectory(), default True
* "cap" means a python instance which encapsulates a filecap/dircap (uri.py)
* "uri" means a string with a "URI:" prefix
* FileNode instances are created with (and retain) a cap instance, and
generate uri strings on demand
* .get_cap/get_readcap/get_verifycap/get_repaircap return cap instances
* .get_uri/get_readonly_uri return uri strings
* add filenode.download_to_filename() for control.py, should find a better way
* use MutableFileNode.init_from_cap, not .init_from_uri
* directory URI instances: use get_filenode_cap, not get_filenode_uri
* update/cleanup bench_dirnode.py to match, add Makefile target to run it
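An illustration of the vocabulary (sketch only; "URI:CHK:..." stands in for a real filecap string):

    from allmydata import uri

    filecap = uri.from_string("URI:CHK:...")       # "cap": a python instance
    assert filecap.to_string().startswith("URI:")  # "uri": the string form
    readcap = filecap.get_readonly()               # caps derive other caps
    readonly_uri = readcap.to_string()             # strings only on demand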
This is safer: in the earlier API, an old webapi server would silently ignore
the initial children, and clients trying to set them would have to fetch the
newly-created directory to discover the incompatibility. In the new API,
clients using t=mkdir-with-children against an old webapi server will get a
clear error.
instead of creating an empty file and then adding the children later.
This should speed up mkdir(initial_children) considerably, removing two
roundtrips and an entire read-modify-write cycle, probably bringing it down
to a single roundtrip. A quick test (against the volunteergrid) suggests a
30% speedup.
test_dirnode: add new tests to enforce the restrictions that interfaces.py
claims for create_new_mutable_directory(): no UnknownNodes, metadata dicts
interfaces.py: define INodeMaker, document argument values, change
create_new_mutable_directory() to take dict-of-nodes. Change
dirnode.set_nodes() and dirnode.create_subdirectory() too.
nodemaker.py: use INodeMaker, update create_new_mutable_directory()
client.py: have create_dirnode() delegate initial_children= to nodemaker
dirnode.py (Adder): take dict-of-nodes instead of list-of-nodes, which
updates set_nodes() and create_subdirectory()
web/common.py (convert_initial_children_json): create dict-of-nodes
web/directory.py: same
web/unlinked.py: same
test_dirnode.py: update tests to match
invoked with the new MutableFileNode and is supposed to return the initial
contents. This can be used by e.g. a new dirnode which needs the filenode's
writekey to encrypt its initial children.
create_mutable_file() still accepts a bytestring too, or None for an empty
file.
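Sketched usage of the callable form (pack_children_sketch is a hypothetical stand-in for the real child-packing code):

    def initial_contents(filenode):
        # invoked with the brand-new MutableFileNode, so the writekey is
        # available before any contents exist
        writekey = filenode.get_writekey()
        return pack_children_sketch(children, writekey)

    d = nodemaker.create_mutable_file(initial_contents)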