mirror of
https://github.com/tahoe-lafs/tahoe-lafs.git
synced 2024-12-27 08:22:32 +00:00
User visible changes in Tahoe. -*- outline -*-

* Release 1.XXX (200X-YY-ZZ)

** Improvements

Uploads of immutable files now use pipelined writes, improving upload speed
slightly (10%) over high-latency connections. (#392)

Processing large directories has been sped up, by removing an O(N^2) algorithm
from the dirnode decoding path and retaining unmodified encrypted entries.
(#750, #752)

The human-facing web interface (aka the "WUI") received a significant CSS
makeover by Kevin Reid, making it much prettier and easier to read. The WUI
"check" and "deep-check" forms now include a "Renew Lease" checkbox,
mirroring the CLI --add-lease option, so leases can be added or renewed from
the web interface.

The CLI "tahoe webopen" command, when run without arguments, will bring up
the "Welcome Page" (node status and mkdir/upload forms).

The 3.5MB limit on mutable files was removed, so it should be possible to
upload arbitrarily-sized mutable files. Note, however, that the data format
and algorithm remain the same, so larger files will suffer from slow
transfers, high data-transfer overhead, high memory consumption, and poor
alacrity until "MDMF" mutable files (#393) are implemented. (#694)

This version of Tahoe will tolerate directory entries that contain filecap
formats which it does not recognize: files and directories from the future.
Previous versions would fail badly, preventing the user from seeing or
editing anything else in those directories. These unrecognized objects can be
renamed and deleted, but cannot be read, written, or copied. This should
improve the user experience when we add new cap formats in the future. (#683)

** Bugfixes

deep-check-and-repair now tolerates read-only directories, such as the ones
produced by the "tahoe backup" CLI command. Read-only directories and
read-only mutable files are checked, but not repaired. Previous versions threw
an exception when attempting the repair and failed to process the remaining
contents. We cannot yet repair these read-only objects, but at least this
version allows the rest of the check+repair to proceed. (#625)

A bug in 1.4.1 which caused a server to be listed multiple times (and
frequently broke all connections to that server) was fixed. (#653)

The plaintext-hashing code was removed from the Helper interface, removing
the Helper's ability to mount a partial-information-guessing attack. (#722)

** Platform/packaging changes

Tahoe now runs on NetBSD.

Unit test timeouts have been raised to allow the tests to complete on
extremely slow platforms like embedded ARM-based NAS boxes. An ARM-specific
data-corrupting bug in an older version of Crypto++ (5.5.2) was identified;
ARM users are encouraged to use a recent Crypto++/pycryptopp, which avoids
this problem.

Tahoe now requires a SQLite library, either the sqlite3 that comes built-in
with python2.5/2.6, or the add-on pysqlite2 if you're using python2.4. In the
previous release, this was only needed for the "tahoe backup" command; now it
is mandatory.
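The choice between the two SQLite bindings named above can be sketched as a
guarded import. This is an illustration only, not a copy of Tahoe's own
import code; the helper name open_db is hypothetical.

```python
# Prefer the sqlite3 module bundled with python2.5 and later; fall back to
# the add-on pysqlite2 package, which exposes the same DB-API interface.
try:
    import sqlite3
except ImportError:
    from pysqlite2 import dbapi2 as sqlite3


def open_db(path):
    # Hypothetical helper: either binding provides connect().
    return sqlite3.connect(path)


conn = open_db(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (42)")
row = conn.execute("SELECT x FROM t").fetchone()
conn.close()
```

Code written against the name sqlite3 after this import does not need to know
which of the two packages was actually loaded.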
|
|
|
|
Several minor documentation updates were made.

To help get Tahoe into Linux distributions like Fedora and Debian, packaging
improvements are being made in both Tahoe and related libraries like
pycryptopp and zfec.

** dependency updates

 foolscap-0.4.1
 no python-2.4.0 or 2.4.1 (2.4.2 is good)
  (they contained a bug in base64.b32decode)
 avoid python-2.6 on windows with mingw: compiler issues
 python2.4 requires pysqlite2 (2.5 and 2.6 do not)
 no python-3.x
 pycryptopp-0.5.15
|
|
|
|
|
|
* Release 1.4.1 (2009-04-13)

** Garbage Collection

The big feature for this release is the implementation of garbage collection,
allowing Tahoe storage servers to delete shares for old deleted files. When
enabled, this uses a "mark and sweep" process: clients are responsible for
updating the leases on their shares (generally by running "tahoe deep-check
--add-lease"), and servers are allowed to delete any share which does not
have an up-to-date lease. The process is described in detail in
docs/garbage-collection.txt .

The server must be configured to enable garbage-collection, by adding
directives to the [storage] section of tahoe.cfg that define an age limit for
shares. The default configuration will not delete any shares.

Both servers and clients should be upgraded to this release to make the
garbage-collection as pleasant as possible. 1.2.0 servers do not have the
code to perform the update-lease operation, while 1.3.0 servers have
update-lease but will return an exception for unknown storage indices,
causing clients to emit an Incident for each exception, slowing the add-lease
process down to a crawl. 1.3.0 clients did not have the add-lease operation
at all.

** Security/Usability Problems Fixed

A super-linear algorithm in the Merkle Tree code was fixed, which previously
caused e.g. download of a 10GB file to take several hours before the first
byte of plaintext could be produced. The new "alacrity" is about 2 minutes. A
future release should reduce this to a few seconds by fixing ticket #442.

The previous version permitted a small timing attack (due to our use of
strcmp) against the write-enabler and lease-renewal/cancel secrets. An
attacker who could measure response-time variations of approximately 3ns
against a very noisy background time of about 15ms might be able to guess
these secrets. We do not believe this attack was actually feasible. This
release closes the attack by first hashing the two strings to be compared
with a random secret.
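The idea of hashing both strings with a random secret before comparing them
can be sketched as follows. This is a minimal illustration, not Tahoe's
actual code: the function name, the use of HMAC-SHA256, and the 32-byte key
are all assumptions made for the example.

```python
import hashlib
import hmac
import os

# A random key chosen once per process. Because the attacker does not know
# it, the digests below are unpredictable, so the time taken to compare them
# reveals nothing useful about the original secrets.
_COMPARE_KEY = os.urandom(32)


def timing_safe_equal(a, b):
    """Compare two byte strings by comparing keyed hashes of them."""
    digest_a = hmac.new(_COMPARE_KEY, a, hashlib.sha256).digest()
    digest_b = hmac.new(_COMPARE_KEY, b, hashlib.sha256).digest()
    return digest_a == digest_b


same = timing_safe_equal(b"write-enabler", b"write-enabler")
different = timing_safe_equal(b"write-enabler", b"write-enablex")
```

Later Python versions also provide hmac.compare_digest, which performs a
constant-time comparison directly.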
|
|
|
|
** webapi changes

In most cases, HTML tracebacks will only be sent if an "Accept: text/html"
header was provided with the HTTP request. This will generally cause browsers
to get an HTMLized traceback but send regular text/plain tracebacks to
non-browsers (like the CLI clients). More errors have been mapped to useful
HTTP error codes.

The streaming webapi operations (deep-check and manifest) now have a way to
indicate errors (an output line that starts with "ERROR" instead of being
legal JSON). See docs/frontends/webapi.txt for details.
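A client consuming such a stream might distinguish the two kinds of line as
sketched below. The names StreamError and parse_stream and the sample field
names are hypothetical, and treating everything from the "ERROR" line onward
as the error report is an assumption; docs/frontends/webapi.txt is the
authority on the real format.

```python
import json


class StreamError(Exception):
    """Raised when the stream reports an error instead of more JSON."""


def parse_stream(lines):
    """Yield one decoded JSON unit per line; raise on an ERROR line."""
    it = iter(lines)
    for line in it:
        line = line.strip()
        if not line:
            continue
        if line.startswith("ERROR"):
            # Assumption: the rest of the stream belongs to the error report.
            rest = [line] + [extra.rstrip() for extra in it]
            raise StreamError("\n".join(rest))
        yield json.loads(line)


good = list(parse_stream(['{"path": ["a"]}', '{"path": ["b"]}']))

try:
    list(parse_stream(['{"path": ["a"]}', "ERROR: boom", "details"]))
    error_text = None
except StreamError as e:
    error_text = str(e)
```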
|
|
|
|
The storage server now has its own status page (at /storage), linked from the
Welcome page. This page shows progress and results of the two new
share-crawlers: one which merely counts shares (to give an estimate of how
many files/directories are being stored in the grid), and another which
examines leases and reports how much space would be freed if GC were enabled.
The page also shows how much disk space is present, used, reserved, and
available for the Tahoe server, and whether the server is currently running in
"read-write" mode or "read-only" mode.

When a directory node cannot be read (perhaps because of insufficient shares),
a minimal webapi page is created so that the "more-info" links (including a
Check/Repair operation) will still be accessible.

A new "reliability" page was added, with the beginnings of work on a
statistical loss model. You can tell this page how many servers you are using
and their independent failure probabilities, and it will tell you the
likelihood that an arbitrary file will survive each repair period. The
"numpy" package must be installed to access this page. A partial paper,
written by Shawn Willden, has been added to docs/proposed/lossmodel.lyx .
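To give a feel for the kind of question such a loss model answers, here is a
deliberately simplified binomial sketch. It is not the model used by the
reliability page or by the paper: it assumes one share per server, the same
failure probability for every server, and no repair within the period. The
function name is hypothetical.

```python
from math import comb


def survival_probability(k, n, p_fail):
    """P(at least k of n shares survive), servers failing independently."""
    p_ok = 1.0 - p_fail
    return sum(
        comb(n, i) * p_ok ** i * p_fail ** (n - i)
        for i in range(k, n + 1)
    )


# 3-of-10 encoding with each server having a 10% chance of being lost
# during one repair period.
p = survival_probability(3, 10, 0.10)
```

A file encoded k-of-n is lost only when fewer than k shares remain, which is
why the sum starts at i = k.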
|
|
|
|
** CLI changes

"tahoe check" and "tahoe deep-check" now accept an "--add-lease" argument, to
update a lease on all shares. This is the "mark" side of garbage collection.

In many cases, CLI error messages have been improved: the ugly HTMLized
traceback has been replaced by a normal python traceback.

"tahoe deep-check" and "tahoe manifest" now have better error reporting.
"tahoe cp" is now non-verbose by default.

"tahoe backup" now accepts several "--exclude" arguments, to ignore certain
files (like editor temporary files and version-control metadata) during
backup.

On Windows, the CLI now accepts local paths like "c:\dir\file.txt", which
were previously interpreted as Tahoe paths using a "c:" alias.

The "tahoe restart" command now uses "--force" by default (meaning it will
start a node even if it didn't look like there was one already running).

The "tahoe debug consolidate" command was added. This takes a series of
independent timestamped snapshot directories (such as those created by the
allmydata.com Windows backup program, or a series of "tahoe cp -r" commands)
and creates new snapshots that use shared read-only directories whenever
possible (like the output of "tahoe backup"). In the most common case (when
the snapshots are fairly similar), the result will use significantly fewer
directories than the original, allowing "deep-check" and similar tools to run
much faster. In some cases, the speedup can be an order of magnitude or more.
This tool is still somewhat experimental, and only needs to be run on large
backups produced by something other than "tahoe backup", so it was placed
under the "debug" category.

"tahoe cp -r --caps-only tahoe:dir localdir" is a diagnostic tool which,
instead of copying the full contents of files into the local directory,
merely copies their filecaps. This can be used to verify the results of a
"consolidation" operation.

** other fixes

The codebase no longer raises RuntimeError as a kind of assert(). Specific
exception classes were created for each previous instance of RuntimeError.

Many unit tests were changed to use a non-network test harness, speeding them
up considerably.

Deep-traversal operations (manifest and deep-check) now walk individual
directories in alphabetical order. Occasional turn breaks are inserted to
prevent a stack overflow when traversing directories with hundreds of
entries.

The experimental SFTP server had its path-handling logic changed slightly, to
accommodate more SFTP clients, although there are still issues (#645).
|
|
|
|
|
|
* Release 1.3.0 (2009-02-13)

** Checker/Verifier/Repairer

The primary focus of this release has been writing a checker / verifier /
repairer for files and directories. "Checking" is the act of asking storage
servers whether they have a share for the given file or directory: if there
are not enough shares available, the file or directory will be
unrecoverable. "Verifying" is the act of downloading and cryptographically
asserting that the server's share is undamaged: it requires more work
(bandwidth and CPU) than checking, but can catch problems that simple
checking cannot. "Repair" is the act of replacing missing or damaged shares
with new ones.

This release includes a full checker, a partial verifier, and a partial
repairer. The repairer is able to handle missing shares: new shares are
generated and uploaded to make up for the missing ones. This is currently the
best application of the repairer: to replace shares that were lost because of
server departure or permanent drive failure.

The repairer in this release is somewhat able to handle corrupted shares. The
limitations are:

 * The immutable verifier is incomplete: not all shares are used, and not all
   fields of those shares are verified. Therefore the immutable verifier has
   only a moderate chance of detecting corrupted shares.
 * The mutable verifier is mostly complete: all shares are examined, and most
   fields of the shares are validated.
 * The storage server protocol offers no way for the repairer to replace or
   delete immutable shares. If corruption is detected, the repairer will
   upload replacement shares to other servers, but the corrupted shares will
   be left in place.
 * Read-only directories and read-only mutable files must be repaired by
   someone who holds the write-cap: the read-cap is insufficient. Moreover,
   the deep-check-and-repair operation will halt with an error if it attempts
   to repair one of these read-only objects.
 * Some forms of corruption can cause both download and repair operations to
   fail. A future release will fix this, since download should be tolerant of
   any corruption as long as there are at least 'k' valid shares, and repair
   should be able to fix any file that is downloadable.

If the downloader, verifier, or repairer detects share corruption, the
servers which provided the bad shares will be notified (via a file placed in
the BASEDIR/storage/corruption-advisories directory) so their operators can
manually delete the corrupted shares and investigate the problem. In
addition, the "incident gatherer" mechanism will automatically report share
corruption to an incident gatherer service, if one is configured. Note that
corrupted shares indicate hardware failures, serious software bugs, or malice
on the part of the storage server operator, so a corrupted share should be
considered highly unusual.

Periodically checking and repairing all files and directories keeps objects
in the Tahoe filesystem resistant to recoverability failures due to missing
and/or broken servers.

This release includes a wapi mechanism to initiate checks on individual
files and directories (with or without verification, and with or without
automatic repair). A related mechanism is used to initiate a "deep-check" on
a directory: recursively traversing the directory and its children, checking
(and/or verifying/repairing) everything underneath. Both mechanisms can be
run with an "output=JSON" argument, to obtain machine-readable check/repair
status results. These results include a copy of the filesystem statistics
from the "deep-stats" operation (including total number of files, size
histogram, etc). If repair is possible, a "Repair" button will appear on the
results page.

The client web interface now features some extra buttons to initiate check
and deep-check operations. When these operations finish, they display a
results page that summarizes any problems that were encountered. All
long-running deep-traversal operations, including deep-check, use a
start-and-poll mechanism, to avoid depending upon a single long-lived HTTP
connection. docs/frontends/webapi.txt has details.

** Efficient Backup

The "tahoe backup" command, new in this release, creates efficient
versioned backups of a local directory. Given a local pathname and a target
Tahoe directory, this will create a read-only snapshot of the local directory
in $target/Archives/$timestamp. It will also create $target/Latest, which is
a reference to the latest such snapshot. Each time you run "tahoe backup"
with the same source and target, a new $timestamp snapshot will be added.
These snapshots will share directories that have not changed since the last
backup, to speed up the process and minimize storage requirements. In
addition, a small database is used to keep track of which local files have
been uploaded already, to avoid uploading them a second time. This
drastically reduces the work needed to do a "null backup" (when nothing has
changed locally), making "tahoe backup" suitable to run from a daily cronjob.
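The role of the small database can be illustrated with a sketch like the one
below. It is not the schema or logic that "tahoe backup" actually uses: the
class name, the table layout, and the rule that a file counts as unchanged
when its path, size, and mtime all match are assumptions for the example.

```python
import sqlite3


class BackupDB:
    """Remember which local files were uploaded, and to which filecap."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS uploaded "
            "(path TEXT PRIMARY KEY, size INTEGER, mtime REAL, filecap TEXT)"
        )

    def lookup(self, path, size, mtime):
        """Return the stored filecap if the file looks unchanged, else None."""
        row = self.conn.execute(
            "SELECT filecap FROM uploaded WHERE path=? AND size=? AND mtime=?",
            (path, size, mtime),
        ).fetchone()
        return row[0] if row else None

    def record(self, path, size, mtime, filecap):
        """Store (or update) the filecap produced by uploading this file."""
        self.conn.execute(
            "INSERT OR REPLACE INTO uploaded VALUES (?, ?, ?, ?)",
            (path, size, mtime, filecap),
        )
        self.conn.commit()


db = BackupDB()
first = db.lookup("notes.txt", 10, 1234.5)        # not uploaded yet
db.record("notes.txt", 10, 1234.5, "URI:CHK:placeholder")
second = db.lookup("notes.txt", 10, 1234.5)       # unchanged: reuse the cap
changed = db.lookup("notes.txt", 11, 1234.5)      # size differs: upload again
```

In a "null backup" every lookup succeeds, so no file contents need to be
uploaded.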
|
|
|
|
Note that the "tahoe backup" CLI command must be used in conjunction with a
1.3.0-or-newer Tahoe client node; there was a bug in the 1.2.0 webapi
implementation that would prevent the last step (create $target/Latest) from
working.

** Large Files

The 12GiB (approximate) immutable-file-size limitation is lifted. This
release knows how to handle so-called "v2 immutable shares", which permit
immutable files of up to about 18 EiB (about 3*10^14). These v2 shares are
created if the file to be uploaded is too large to fit into v1 shares. v1
shares are created if the file is small enough to fit into them, so that
files created with tahoe-1.3.0 can still be read by earlier versions if they
are not too large. Note that storage servers also had to be changed to
support larger files, and this release is the first release in which they are
able to do that. Clients will detect which servers are capable of supporting
large files on upload and will not attempt to upload shares of a large file
to a server which doesn't support it.

** FTP/SFTP Server

Tahoe now includes experimental FTP and SFTP servers. When configured with a
suitable method to translate username+password into a root directory cap, they
provide simple access to the virtual filesystem. Remember that FTP is
completely unencrypted: passwords, filenames, and file contents are all sent
over the wire in cleartext, so FTP should only be used on a local (127.0.0.1)
connection. This feature is still in development: there are no unit tests
yet, and behavior with respect to Unicode filenames is uncertain. Please see
docs/frontends/FTP-and-SFTP.txt for configuration details. (#512, #531)

** CLI Changes

This release adds the 'tahoe create-alias' command, which is a combination of
'tahoe mkdir' and 'tahoe add-alias'. This also allows you to start using a
new tahoe directory without exposing its URI in the argv list, which is
publicly visible (through the process table) on most unix systems. Thanks to
Kevin Reid for bringing this issue to our attention.

The single-argument form of "tahoe put" was changed to create an unlinked
file. I.e. "tahoe put bar.txt" will take the contents of a local "bar.txt"
file, upload them to the grid, and print the resulting read-cap; the file
will not be attached to any directories. This seemed a bit more useful than
the previous behavior (copy stdin, upload to the grid, attach the resulting
file into your default tahoe: alias in a child named 'bar.txt').

"tahoe put" was also fixed to handle mutable files correctly: "tahoe put
bar.txt URI:SSK:..." will read the contents of the local bar.txt and use them
to replace the contents of the given mutable file.

The "tahoe webopen" command was modified to accept aliases. This means "tahoe
webopen tahoe:" will cause your web browser to open to a "wui" page that
gives access to the directory associated with the default "tahoe:" alias. It
should also accept leading slashes, like "tahoe webopen tahoe:/stuff".

Many esoteric debugging commands were moved down into a "debug" subcommand:

 tahoe debug dump-cap
 tahoe debug dump-share
 tahoe debug find-shares
 tahoe debug catalog-shares
 tahoe debug corrupt-share

The last command ("tahoe debug corrupt-share") flips a random bit of the
given local sharefile. This is used to test the file verifying/repairing
code, and obviously should not be used on user data.

The CLI might not correctly handle arguments which contain non-ascii
characters in Tahoe v1.3 (although depending on your platform it
might, especially if your platform can be configured to pass such
characters on the command-line in utf-8 encoding). See
http://allmydata.org/trac/tahoe/ticket/565 for details.

** Web changes

The "default webapi port", used when creating a new client node (and in the
getting-started documentation), was changed from 8123 to 3456, to reduce
confusion when Tahoe is accessed through a Firefox browser on which the
"Torbutton" extension has been installed. Port 8123 is occasionally used as a
Tor control port, so Torbutton adds 8123 to Firefox's list of "banned ports"
to avoid CSRF attacks against Tor. Once 8123 is banned, it is difficult to
diagnose why you can no longer reach a Tahoe node, so the Tahoe default was
changed. Note that 3456 is reserved by IANA for the "vat" protocol, but there
are arguably more Torbutton+Tahoe users than vat users these days. Note that
this will only affect newly-created client nodes. Pre-existing client nodes,
created by earlier versions of tahoe, may still be listening on 8123.

All deep-traversal operations (start-manifest, start-deep-size,
start-deep-stats, start-deep-check) now use a start-and-poll approach,
instead of using a single (fragile) long-running synchronous HTTP connection.
All these "start-" operations use POST instead of GET. The old "GET
manifest", "GET deep-size", and "POST deep-check" operations have been
removed.
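The polling half of a start-and-poll client can be sketched as below. The
HTTP details are deliberately left out: fetch_status stands for a
caller-supplied function (hypothetical here) that makes one short request
for the operation's status and returns a dict, and the boolean "finished"
key is an assumption about that status format. docs/frontends/webapi.txt
describes the real interface.

```python
import time


def poll_until_finished(fetch_status, interval=1.0, max_polls=60,
                        sleep=time.sleep):
    """Call fetch_status() until it reports completion, then return it.

    Each call is a separate short request, so no single long-lived HTTP
    connection has to survive for the whole operation.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("finished"):
            return status
        sleep(interval)
    raise TimeoutError("operation did not finish after %d polls" % max_polls)


# Demonstration with canned replies instead of a real node.
replies = iter([
    {"finished": False},
    {"finished": False},
    {"finished": True, "count": 3},
])
result = poll_until_finished(lambda: next(replies), interval=0,
                             sleep=lambda seconds: None)
```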
|
|
|
|
The new "POST start-manifest" operation, when it finally completes, results
in a table of (path,cap), instead of the list of verifycaps produced by the
old "GET manifest". The table is available in several formats: use
output=html, output=text, or output=json to choose one. The JSON output also
includes stats, and a list of verifycaps and storage-index strings.

The "return_to=" and "when_done=" arguments have been removed from the
t=check and deep-check operations.

The top-level status page (/status) now has a machine-readable form, via
"/status/?t=json". This includes information about the currently-active
uploads and downloads, which may be useful for frontends that wish to display
progress information. There is no easy way to correlate the activities
displayed here with recent wapi requests, however.

Any files in BASEDIR/public_html/ (configurable) will be served in response
to requests in the /static/ portion of the URL space. This will simplify the
deployment of javascript-based frontends that can still access wapi calls
by conforming to the (regrettable) "same-origin policy".

The welcome page now has a "Report Incident" button, which is tied into the
"Incident Gatherer" machinery. If the node is attached to an incident
gatherer (via log_gatherer.furl), then pushing this button will cause an
Incident to be signalled: this means recent log events are aggregated and
sent in a bundle to the gatherer. The user can push this button after
something strange takes place (and they can provide a short message to go
along with it), and the relevant data will be delivered to a centralized
incident-gatherer for later processing by operations staff.

The "HEAD" method should now work correctly, in addition to the usual "GET",
"PUT", and "POST" methods. "HEAD" is supposed to return exactly the same
headers as "GET" would, but without any of the actual response body data. For
mutable files, this now does a brief mapupdate (to figure out the size of the
file that would be returned), without actually retrieving the file's
contents.

The "GET" operation on files can now support the HTTP "Range:" header,
allowing requests for partial content. This allows certain media players to
correctly stream audio and movies out of a Tahoe grid. The current
implementation uses a disk-based cache in BASEDIR/private/cache/download ,
which holds the plaintext of the files being downloaded. Future
implementations might not use this cache. GET for immutable files now returns
an ETag header.
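A client can ask for part of a file by adding a "Range:" header to an
ordinary GET, as in the sketch below. The node address, the /uri/ path, and
the FILECAP placeholder are assumptions for the example; the request is
built but not sent, so the snippet runs without a node.

```python
from urllib.request import Request


def range_request(url, first_byte, last_byte):
    """Build (but do not send) a GET request for an inclusive byte range."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    header = "bytes=%d-%d" % (first_byte, last_byte)
    return Request(url, headers={"Range": header})


# Hypothetical node URL with a placeholder cap; substitute real values.
req = range_request("http://127.0.0.1:3456/uri/FILECAP", 0, 1023)
```

Passing req to urllib.request.urlopen would send it. A server that honors
the header answers with HTTP status 206 (Partial Content) and only the
requested bytes.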
|
|
|
|
Each file and directory now has a "Show More Info" web page, which contains
much of the information that was crammed into the directory page before. This
includes readonly URIs, storage index strings, object type, buttons to
control checking/verifying/repairing, and deep-check/deep-stats buttons (for
directories). For mutable files, the "replace contents" upload form has been
moved here too. As a result, the directory page is now much simpler and
cleaner, and several potentially-misleading links (like t=uri) are now gone.

Slashes are discouraged in Tahoe file/directory names, since they cause
problems when accessing the filesystem through the wapi. However, there are
a couple of accidental ways to generate such names. This release tries to
make it easier to correct such mistakes by escaping slashes in several
places, allowing slashes in the t=info and t=delete commands, and in the
source (but not the target) of a t=rename command.

** Packaging

Tahoe's dependencies have been extended to require the "[secure_connections]"
feature from Foolscap, which will cause pyOpenSSL to be required and/or
installed. If OpenSSL and its development headers are already installed on
your system, this can occur automatically. Tahoe now uses pollreactor
(instead of the default selectreactor) to work around a bug between pyOpenSSL
and the most recent release of Twisted (8.1.0). This bug only affects unit
tests (hang during shutdown), and should not impact regular use.

The Tahoe source code tarballs now come in two different forms: regular and
"sumo". The regular tarball contains just Tahoe, nothing else. When building
from the regular tarball, the build process will download any unmet
dependencies from the internet (starting with the index at PyPI) so it can
build and install them. The "sumo" tarball contains copies of all the
libraries that Tahoe requires (foolscap, twisted, zfec, etc), so using the
"sumo" tarball should not require any internet access during the build
process. This can be useful if you want to build Tahoe while on an airplane,
on a desert island, or in another bandwidth-limited environment.

Similarly, allmydata.org now hosts a "tahoe-deps" tarball which contains the
latest versions of all these dependencies. This tarball, located at
http://allmydata.org/source/tahoe/deps/tahoe-deps.tar.gz, can be unpacked in
the tahoe source tree (or in its parent directory), and the build process
should satisfy its downloading needs from it instead of reaching out to PyPI.
This can be useful if you want to build Tahoe from a darcs checkout while on
that airplane or desert island.

Because of the previous two changes ("sumo" tarballs and the "tahoe-deps"
bundle), most of the files have been removed from misc/dependencies/ . This
brings the regular Tahoe tarball down to 2MB (compressed), and the darcs
checkout (without history) to about 7.6MB. A full darcs checkout will still
be fairly large (because of the historical patches which included the
dependent libraries), but a 'lazy' one should now be small.

The default "make" target is now an alias for "setup.py build", which itself
is an alias for "setup.py develop --prefix support", with some extra work
before and after (see setup.cfg). Most of the complicated platform-dependent
code in the Makefile was rewritten in Python and moved into setup.py,
simplifying things considerably.

Likewise, the "make test" target now delegates most of its work to "setup.py
test", which takes care of getting PYTHONPATH configured to access the tahoe
code (and dependencies) that gets put in support/lib/ by the build_tahoe
step. This should allow unit tests to be run even when trial (which is part
of Twisted) wasn't already installed (in this case, trial gets installed to
support/bin because Twisted is a dependency of Tahoe).

Tahoe is now compatible with the recently-released Python 2.6, although it
is recommended to use Tahoe on Python 2.5, on which it has received more
thorough testing and deployment.

Tahoe is now compatible with simplejson-2.0.x . The previous release assumed
that simplejson.loads always returned unicode strings, which is no longer the
case in 2.0.x .

** Grid Management Tools

Several tools have been added or updated in the misc/ directory, mostly munin
plugins that can be used to monitor a storage grid.

The misc/spacetime/ directory contains a "disk watcher" daemon (startable
with 'tahoe start'), which can be configured with a set of HTTP URLs
(pointing at the wapi '/statistics' page of a bunch of storage servers),
and will periodically fetch disk-used/disk-available information from all the
servers. It keeps this information in an Axiom database (a sqlite-based
library available from divmod.org). The daemon computes time-averaged rates
of disk usage, as well as a prediction of how much time is left before the
grid is completely full.

The misc/munin/ directory contains a new set of munin plugins
(tahoe_diskleft, tahoe_diskusage, tahoe_doomsday) which talk to the
disk-watcher and provide graphs of its calculations.

To support the disk-watcher, the Tahoe statistics component (visible through
the wapi at the /statistics/ URL) now includes disk-used and disk-available
information. Both are derived through an equivalent of the unix 'df' command
(i.e. they ask the kernel for the number of free blocks on the partition that
encloses the BASEDIR/storage directory). In the future, the disk-available
number will be further influenced by the local storage policy: if that policy
says that the server should refuse new shares when less than 5GB is left on
the partition, then "disk-available" will report zero even though the kernel
sees 5GB remaining.
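The arithmetic in that last example can be sketched as follows. The function
names are hypothetical and this is not Tahoe's statistics code; os.statvfs is
the standard unix call behind a 'df'-style measurement.

```python
import os


def free_bytes_on(path):
    """Ask the kernel for free space on the partition enclosing path."""
    st = os.statvfs(path)
    # f_bavail counts blocks available to unprivileged users, which is
    # what 'df' reports as available; f_frsize is the block size.
    return st.f_bavail * st.f_frsize


def disk_available(free_bytes, reserved_bytes=0):
    """Space a server would advertise: free space minus the reservation."""
    return max(0, free_bytes - reserved_bytes)


GB = 10 ** 9
at_threshold = disk_available(5 * GB, reserved_bytes=5 * GB)
with_headroom = disk_available(8 * GB, reserved_bytes=5 * GB)
measured = disk_available(free_bytes_on("."))
```

With 5GB free and a 5GB reservation the advertised figure is zero, matching
the case described above.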
|
|
|
|
The 'tahoe_overhead' munin plugin interacts with an allmydata.com-specific
server which reports the total of the 'deep-size' reports for all active user
accounts, and compares this with the disk-watcher data to report overhead
percentages. This provides information on how much space could be recovered
once Tahoe implements some form of garbage collection.

** Configuration Changes: single INI-format tahoe.cfg file

The Tahoe node is now configured with a single INI-format file, named
"tahoe.cfg", in the node's base directory. Most of the previous
multiple-separate-files are still read for backwards compatibility (the
embedded SSH debug server and the advertised_ip_addresses files are the
exceptions), but new directives will only be added to tahoe.cfg . The "tahoe
create-client" command will create a tahoe.cfg for you, with sample values
commented out. (ticket #518)

tahoe.cfg now has controls for the foolscap "keepalive" and "disconnect"
timeouts (#521).

tahoe.cfg now has controls for the encoding parameters: "shares.needed" and
"shares.total" in the "[client]" section. The default parameters are still
3-of-10.

The inefficient storage 'sizelimit' control (which established an upper bound
on the amount of space that a storage server is allowed to consume) has been
replaced by a lightweight 'reserved_space' control (which establishes a lower
bound on the amount of remaining space). The storage server will reject all
writes that would cause the remaining disk space (as measured by a '/bin/df'
equivalent) to drop below this value. The "[storage]reserved_space="
tahoe.cfg parameter controls this setting. (Note that this only affects
immutable shares: it is an outstanding bug that reserved_space does not
prevent the allocation of new mutable shares, nor does it prevent the growth
of existing mutable shares.)
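As an illustration of the directives named above, the following sketch parses
a sample tahoe.cfg fragment with Python's standard INI parser. The section
and option names come from the text; writing reserved_space as a plain byte
count is an assumption made so the value parses as an integer, and the
parsing code is not Tahoe's own.

```python
from configparser import ConfigParser

SAMPLE_CFG = """
[client]
shares.needed = 3
shares.total = 10

[storage]
reserved_space = 5000000000
"""

cfg = ConfigParser()
cfg.read_string(SAMPLE_CFG)

k = cfg.getint("client", "shares.needed")
n = cfg.getint("client", "shares.total")
reserved = cfg.getint("storage", "reserved_space")

# With k-of-n encoding each share is roughly 1/k of the file, so storing
# all n shares takes about n/k times the file size.
expansion = n / k
```

With the default 3-of-10 parameters, any 3 shares are enough to recover a
file, and the grid stores a little over three times the file's size.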

** Other Changes

Clients now declare which versions of the protocols they support. This is
part of a new backwards-compatibility system:
http://allmydata.org/trac/tahoe/wiki/Versioning .

The version strings for human inspection (as displayed on the Welcome web
page, and included in logs) now include a platform identifier (frequently
including a Linux distribution name, processor architecture, etc).

Several bugs have been fixed, including one that would cause an exception (in
the logs) if a wapi download operation was cancelled (by closing the TCP
connection, or pushing the "stop" button in a web browser).

Tahoe now uses Foolscap "Incidents", writing an "incident report" file to
logs/incidents/ each time something weird occurs. These reports are available
to an "incident gatherer" through the flogtool command. For more details,
please see the Foolscap logging documentation. An incident-classifying plugin
function is provided in misc/incident-gatherer/classify_tahoe.py .

If clients detect corruption in shares, they now automatically report it to
the server holding that share, if it is new enough to accept the report.
These reports are written to files in BASEDIR/storage/corruption-advisories .

The 'nickname' setting is now defined to be a UTF-8-encoded string, allowing
non-ASCII nicknames.

The 'tahoe start' command will now accept a --syslog argument and pass it
through to twistd, making it easier to launch non-Tahoe nodes (like the
cpu-watcher) and have them log to syslogd instead of a local file. This is
useful when running a Tahoe node out of a USB flash drive.

The Mac GUI in src/allmydata/gui/ has been improved.


* Release 1.2.0 (2008-07-21)

** Security

This release makes the immutable-file "ciphertext hash tree" mandatory.
Previous releases allowed the uploader to decide whether their file would
have an integrity check on the ciphertext or not. A malicious uploader could
use this to create a readcap that would download as one file or a different
one, depending upon which shares the client fetched first, with no errors
raised. There are other integrity checks on the shares themselves, preventing
a storage server or other party from violating the integrity properties of
the read-cap: this failure was only exploitable by the uploader who gives you
a carefully constructed read-cap. If you download the file with Tahoe 1.2.0
or later, you will not be vulnerable to this problem. (#491)

This change does not introduce a compatibility issue, because all existing
versions of Tahoe will emit the ciphertext hash tree in their shares.

** Dependencies

Tahoe now requires Foolscap-0.2.9 . It also requires pycryptopp 0.5 or newer,
since earlier versions had a bug which, when built with specific compiler
versions, could sometimes result in incorrect encryption behavior. Both
packages are included in the Tahoe source tarball in misc/dependencies/ , and
should be built automatically when necessary.

** Web API

Web API directory pages should now contain properly-slash-terminated links to
other directories. They have also stopped using absolute links in forms and
pages (which interfered with the use of a front-end load-balancing proxy).

The behavior of the "Check This File" button changed, in conjunction with
larger internal changes to file checking/verification. The button triggers an
immediate check as before, but the outcome is shown on its own page, and does
not get stored anywhere. As a result, the web directory page no longer shows
historical checker results.

A new "Deep-Check" button has been added, which allows a user to initiate a
recursive check of the given directory and all files and directories
reachable from it. This can cause quite a bit of work, and has no
intermediate progress information or feedback about the process. In addition,
the results of the deep-check are extremely limited. A later release will
improve this behavior.

The web server's behavior with respect to non-ASCII (Unicode) filenames in
the "GET save=true" operation has been improved. To achieve maximum
compatibility with variously buggy web browsers, the server does not try to
figure out the character set of the inbound filename. It just echoes the same
bytes back to the browser in the Content-Disposition header. This seems to
make both IE7 and Firefox work correctly.

** Checker/Verifier/Repairer

Tahoe is slowly acquiring convenient tools to check up on file health,
examine existing shares for errors, and repair files that are not fully
healthy. This release adds a mutable checker/verifier/repairer, although
testing is very limited, and there are no web interfaces to trigger repair
yet. The "Check" button next to each file or directory on the wapi page
will perform a file check, and the "deep check" button on each directory will
recursively check all files and directories reachable from there (which may
take a very long time).

Future releases will improve access to this functionality.

** Operations/Packaging

A "check-grid" script has been added, along with a Makefile target. This is
intended (with the help of a pre-configured node directory) to check upon the
health of a Tahoe grid, uploading and downloading a few files. This can be
used as a monitoring tool for a deployed grid, to be run periodically and to
signal an error if it ever fails. It also helps with compatibility testing,
to verify that the latest Tahoe code is still able to handle files created by
an older version.

The munin plugins from misc/munin/ are now copied into any generated debian
packages, and are made executable (and uncompressed) so they can be symlinked
directly from /etc/munin/plugins/ .

Ubuntu "Hardy" was added as a supported debian platform, with a Makefile
target to produce hardy .deb packages. Some notes have been added to
docs/debian.txt about building Tahoe on a debian/ubuntu system.

Storage servers now measure operation rates and latency-per-operation, and
provide results through the /statistics web page as well as the stats
gatherer. Munin plugins have been added to match.

** Other

Tahoe nodes now use Foolscap "incident logging" to record unusual events to
their NODEDIR/logs/incidents/ directory. These incident files can be examined
by Foolscap logging tools, or delivered to an external log-gatherer for
further analysis. Note that Tahoe now requires Foolscap-0.2.9, since 0.2.8
had a bug that complained about "OSError: File exists" when trying to create
the incidents/ directory for a second time.

If no servers are available when retrieving a mutable file (like a
directory), the node now reports an error instead of hanging forever. Earlier
releases would not only hang (causing the wapi directory listing to get
stuck half-way through), but the internal dirnode serialization would cause
all subsequent attempts to retrieve or modify the same directory to hang as
well. (#463)

A minor internal exception (reported in logs/twistd.log, in the
"stopProducing" method) was fixed, which complained about "self._paused_at
not defined" whenever a file download was stopped from the web browser end.


* Release 1.1.0 (2008-06-11)

** CLI: new "alias" model

The new CLI code uses an scp/rsync-like interface, in which directories in
the Tahoe storage grid are referenced by a colon-suffixed alias. The new
commands look like:
 tahoe cp local.txt tahoe:virtual.txt
 tahoe ls work:subdir

More functionality is available through the CLI: creating unlinked files and
directories, recursive copy in or out of the storage grid, hardlinks, and
retrieving the raw read- or write-caps through the 'ls' command. Please read
docs/CLI.txt for complete details.

** wapi: new pages, new commands

Several new pages were added to the web API:

/helper_status : to describe what a Helper is doing
/statistics : reports node uptime, CPU usage, other stats
/file : for easy file-download URLs, see #221
/cap == /uri : future compatibility

The localdir=/localfile= and t=download operations were removed. These
required special configuration to enable anyway, but this feature was a
security problem, and was mostly obviated by the new "cp -r" command.

Several new options to the GET command were added:

t=deep-size : add up the size of all immutable files reachable from the directory
t=deep-stats : return a JSON-encoded description of number of files, size
distribution, total size, etc
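
As a rough sketch of how a client might address these operations, the
snippet below builds the request URL without sending it. The node address
(http://127.0.0.1:3456), the placeholder directory cap, and the helper name
are assumptions made for illustration; the /uri/DIRCAP path form follows the
examples used elsewhere in these notes.

```python
from urllib.parse import quote

def deep_query_url(node_url, dircap, operation):
    """Build a GET URL for t=deep-size or t=deep-stats on a directory cap.

    Hypothetical helper; it only formats the URL and performs no request.
    """
    if operation not in ("deep-size", "deep-stats"):
        raise ValueError("unsupported operation: %r" % (operation,))
    # Percent-encode the cap so its colons are safe inside a URL path.
    encoded_cap = quote(dircap, safe="")
    return "%s/uri/%s?t=%s" % (node_url.rstrip("/"), encoded_cap, operation)

NODE_URL = "http://127.0.0.1:3456"   # assumed local node address
DIRCAP = "URI:DIR2:example"          # placeholder, not a real directory cap

stats_url = deep_query_url(NODE_URL, DIRCAP, "deep-stats")
print(stats_url)
```

The body returned for t=deep-stats is JSON, so a caller would normally pass
it to json.loads() before reading individual fields.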

POST is now preferred over PUT for most operations which cause side-effects.

Most wapi calls now accept overwrite=, and default to overwrite=true .

"POST /uri/DIRCAP/parent/child?t=mkdir" is now the preferred API to create
multiple directories at once, rather than ...?t=mkdir-p .

PUT to a mutable file ("PUT /uri/MUTABLEFILECAP", "PUT /uri/DIRCAP/child")
will modify the file in-place.

** more munin graphs in misc/munin/

tahoe-introstats
tahoe-rootdir-space
tahoe_estimate_files
mutable files published/retrieved
tahoe_cpu_watcher
tahoe_spacetime

** New Dependencies

zfec 1.1.0
foolscap 0.2.8
pycryptopp 0.5
setuptools (now required at runtime)

** New Mutable-File Code

The mutable-file handling code (mostly used for directories) has been
completely rewritten. The new scheme has a better API (with a modify()
method) and is less likely to lose data when several uncoordinated writers
change a file at the same time.

In addition, a single Tahoe process will coordinate its own writes. If you
make two concurrent directory-modifying wapi calls to a single tahoe node,
it will internally make one of them wait for the other to complete. This
prevents auto-collision (#391).

The new mutable-file code also detects errors during publish better. Earlier
releases might believe that a mutable file was published when in fact it
failed.

** other features

The node now monitors its own CPU usage, as a percentage, measured every 60
seconds. 1/5/15 minute moving averages are available on the /statistics web
page and via the stats-gathering interface.
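
One way to derive such averages is sketched below. It assumes the node
records its cumulative CPU time once per 60-second interval; the class and
method names are hypothetical, and the node's real monitor may be built
differently.

```python
from collections import deque

class CPUUsageAverager:
    """Keep one CPU-seconds reading per interval and report moving averages."""

    def __init__(self, interval=60.0):
        self.interval = interval
        # 16 readings span 15 one-minute intervals, the longest window needed.
        self.samples = deque(maxlen=16)

    def add_sample(self, cpu_seconds):
        """Record cumulative CPU time, e.g. a reading of time.process_time()."""
        self.samples.append(cpu_seconds)

    def average(self, minutes):
        """Fraction of one CPU used over the last `minutes` intervals.

        Returns None until enough samples have been collected.
        """
        if len(self.samples) <= minutes:
            return None
        used = self.samples[-1] - self.samples[-1 - minutes]
        return used / (minutes * self.interval)

# Simulate a node that burns 6 CPU-seconds in every 60-second interval.
monitor = CPUUsageAverager()
for minute in range(6):
    monitor.add_sample(minute * 6.0)

for window in (1, 5, 15):
    value = monitor.average(window)
    label = "not enough data" if value is None else "%.0f%%" % (value * 100)
    print("%2d minute average: %s" % (window, label))
```

Storing cumulative readings rather than per-minute deltas means each window
needs only one subtraction, whatever its length.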

Clients now accelerate reconnection to all servers after being offline
(#374). When a client is offline for a long time, it scales back reconnection
attempts to approximately once per hour, so it may take a while to make the
first attempt, but once any attempt succeeds, the other server connections
will be retried immediately.

A new "offloaded KeyGenerator" facility can be configured, to move RSA key
generation out from, say, a wapi node, into a separate process. RSA keys
can take several seconds to create, and so a wapi node which is being used
for directory creation will be unavailable for anything else during this
time. The Key Generator process will pre-compute a small pool of keys, to
speed things up further. This also takes better advantage of multi-core CPUs,
or SMP hosts.

The node will only use a potentially-slow "du -s" command at startup (to
measure how much space has been used) if the "sizelimit" parameter has been
configured (to limit how much space is used). Large storage servers should
turn off sizelimit until a later release improves the space-management code,
since "du -s" on a terabyte filesystem can take hours.

The Introducer now allows new announcements to replace old ones, to avoid
buildups of obsolete announcements.

Immutable files are limited to about 12GiB (when using the default 3-of-10
encoding), because larger files would be corrupted by the four-byte
share-size field on the storage servers (#439). A later release will remove
this limit. Earlier releases would allow >12GiB uploads, but the resulting
file would be unretrievable.
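
The 12GiB figure follows from simple arithmetic, shown here as a sketch. It
assumes that each share holds about 1/3 of the file under 3-of-10 encoding
and ignores per-share overhead such as hash trees, which is why the real
limit is only "about" this size. The function name is made up for the
example.

```python
GIB = 2**30

def max_file_size(shares_needed, size_field_bytes=4):
    """Approximate upper bound on file size when each share must fit the size field."""
    # A four-byte unsigned field can describe a share of at most roughly 4 GiB.
    max_share_size = 2 ** (8 * size_field_bytes)
    # Each share carries about 1/shares_needed of the file.
    return shares_needed * max_share_size

limit = max_file_size(shares_needed=3)
print(limit // GIB, "GiB")
```

Under the same assumptions, an encoding with a larger "shares.needed" value
would raise the ceiling, since each share would then carry a smaller
fraction of the file.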

The docs/ directory has been rearranged, with old docs put in
docs/historical/ and not-yet-implemented ones in docs/proposed/ .

The Mac OS-X FUSE plugin has a significant bug fix: earlier versions would
corrupt writes that used seek() instead of writing the file in linear order.
The rsync tool is known to write files in this non-linear way. This has been
fixed.