mirror of
https://github.com/tahoe-lafs/tahoe-lafs.git
synced 2024-12-19 13:07:56 +00:00
1398 lines
68 KiB
Plaintext
|
|
= The Tahoe REST-ful Web API =
|
|
|
|
1. Enabling the web-API port
|
|
2. Basic Concepts: GET, PUT, DELETE, POST
|
|
3. URLs, Machine-Oriented Interfaces
|
|
4. Browser Operations: Human-Oriented Interfaces
|
|
5. Welcome / Debug / Status pages
|
|
6. Static Files in /public_html
|
|
7. Safety and security issues -- names vs. URIs
|
|
8. Concurrency Issues
|
|
|
|
|
|
== Enabling the web-API port ==
|
|
|
|
Every Tahoe node is capable of running a built-in HTTP server. To enable
|
|
this, just write a port number into the "[node]web.port" line of your node's
|
|
tahoe.cfg file. For example, writing "web.port = 3456" into the "[node]"
|
|
section of $NODEDIR/tahoe.cfg will cause the node to run a webserver on port
|
|
3456.
|
|
|
|
This string is actually a Twisted "strports" specification, meaning you can
|
|
get more control over the interface to which the server binds by supplying
|
|
additional arguments. For more details, see the documentation on
|
|
twisted.application.strports:
|
|
http://twistedmatrix.com/documents/current/api/twisted.application.strports.html
|
|
|
|
Writing "tcp:3456:interface=127.0.0.1" into the web.port line does the same
|
|
but binds to the loopback interface, ensuring that only the programs on the
|
|
local host can connect. Using
|
|
"ssl:3456:privateKey=mykey.pem:certKey=cert.pem" runs an SSL server.
|
|
|
|
This webport can be set when the node is created by passing a --webport
|
|
option to the 'tahoe create-client' command. By default, the node listens on
|
|
port 3456, on the loopback (127.0.0.1) interface.
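As a rough illustration of where this setting lives, the following Python sketch parses a tahoe.cfg fragment with the standard configparser module and reads back the web.port value. The sample file contents and the read_web_port() helper are made up for this example; only the "[node]" section and the "web.port" key come from the text above.

```python
import configparser

# Hypothetical tahoe.cfg contents: listen on port 3456, loopback only
SAMPLE_CFG = """\
[node]
web.port = tcp:3456:interface=127.0.0.1
"""

def read_web_port(cfg_text):
    """Return the strports string from the [node] section, or None."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get("node", "web.port", fallback=None)

print(read_web_port(SAMPLE_CFG))
```

If the key is missing, the helper returns None, which a caller could treat as "web-API port not enabled".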
|
|
|
|
== Basic Concepts ==
|
|
|
|
As described in architecture.txt, each file and directory in a Tahoe virtual
|
|
filesystem is referenced by an identifier that combines the designation of
|
|
the object with the authority to do something with it (such as read or modify
|
|
the contents). This identifier is called a "read-cap" or "write-cap",
|
|
depending upon whether it enables read-only or read-write access. These
|
|
"caps" are also referred to as URIs.
|
|
|
|
The Tahoe web-based API is "REST-ful", meaning it implements the concepts of
|
|
"REpresentational State Transfer": the original scheme by which the World
|
|
Wide Web was intended to work. Each object (file or directory) is referenced
|
|
by a URL that includes the read- or write- cap. HTTP methods (GET, PUT, and
|
|
DELETE) are used to manipulate these objects. You can think of the URL as a
|
|
noun, and the method as a verb.
|
|
|
|
In REST, the GET method is used to retrieve information about an object, or
|
|
to retrieve some representation of the object itself. When the object is a
|
|
file, the basic GET method will simply return the contents of that file.
|
|
Other variations (generally implemented by adding query parameters to the
|
|
URL) will return information about the object, such as metadata. GET
|
|
operations are required to have no side-effects.
|
|
|
|
PUT is used to upload new objects into the filesystem, or to replace an
|
|
existing object. DELETE is used to delete objects from the filesystem. Both
|
|
PUT and DELETE are required to be idempotent: performing the same operation
|
|
multiple times must have the same side-effects as only performing it once.
|
|
|
|
POST is used for more complicated actions that cannot be expressed as a GET,
|
|
PUT, or DELETE. POST operations can be thought of as a method call: sending
|
|
some message to the object referenced by the URL. In Tahoe, POST is also used
|
|
for operations that must be triggered by an HTML form (including upload and
|
|
delete), because otherwise a regular web browser has no way to accomplish
|
|
these tasks. In general, everything that can be done with a PUT or DELETE can
|
|
also be done with a POST.
|
|
|
|
Tahoe's web API is designed for two different consumers. The first is a
|
|
program that needs to manipulate the virtual file system. Such programs are
|
|
expected to use the RESTful interface described above. The second is a human
|
|
using a standard web browser to work with the filesystem. This user is given
|
|
a series of HTML pages with links to download files, and forms that use POST
|
|
actions to upload, rename, and delete files.
|
|
|
|
== URLs ==
|
|
|
|
Tahoe uses a variety of read- and write- caps to identify files and
|
|
directories. The most common of these is the "immutable file read-cap", which
|
|
is used for most uploaded files. These read-caps look like the following:
|
|
|
|
URI:CHK:ime6pvkaxuetdfah2p2f35pe54:4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a:3:10:202
|
|
|
|
The next most common is a "directory write-cap", which provides both read and
|
|
write access to a directory, and looks like this:
|
|
|
|
URI:DIR2:djrdkfawoqihigoett4g6auz6a:jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq
|
|
|
|
There are also "directory read-caps", which start with "URI:DIR2-RO:", and
|
|
give read-only access to a directory. Finally there are also mutable file
|
|
read- and write- caps, which start with "URI:SSK", and give access to mutable
|
|
files.
|
|
|
|
(later versions of Tahoe will make these strings shorter, and will remove the
|
|
unfortunate colons, which must be escaped when these caps are embedded in
|
|
URLs).
|
|
|
|
To refer to any Tahoe object through the web API, you simply need to combine
|
|
a prefix (which indicates the HTTP server to use) with the cap (which
|
|
indicates which object inside that server to access). Since the default Tahoe
|
|
webport is 3456, the most common prefix is one that will use a local node
|
|
listening on this port:
|
|
|
|
http://127.0.0.1:3456/uri/ + $CAP
|
|
|
|
So, to access the directory named above (which happens to be the
|
|
publicly-writable sample directory on the Tahoe test grid, described at
|
|
http://allmydata.org/trac/tahoe/wiki/TestGrid), the URL would be:
|
|
|
|
http://127.0.0.1:3456/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/
|
|
|
|
(note that the colons in the directory-cap are url-encoded into "%3A"
|
|
sequences).
|
|
|
|
Likewise, to access the file named above, use:
|
|
|
|
http://127.0.0.1:3456/uri/URI%3ACHK%3Aime6pvkaxuetdfah2p2f35pe54%3A4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a%3A3%3A10%3A202
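The escaping shown above can be sketched in Python with the standard urllib.parse.quote() function. The cap_url() helper and the BASE constant are names invented for this example; the prefix and the sample directory cap are the ones used in this document.

```python
from urllib.parse import quote

# Default local node prefix, as described in the text above
BASE = "http://127.0.0.1:3456"

def cap_url(cap, base=BASE):
    """Build a web-API URL for a cap, percent-encoding its colons."""
    # safe="" makes quote() escape ':' (and '/') instead of passing them through
    return base + "/uri/" + quote(cap, safe="")

dircap = ("URI:DIR2:djrdkfawoqihigoett4g6auz6a:"
          "jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq")
# A trailing slash is appended by hand for the directory form
print(cap_url(dircap) + "/")
```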
|
|
|
|
In the rest of this document, we'll use "$DIRCAP" as shorthand for a read-cap
|
|
or write-cap that refers to a directory, and "$FILECAP" to abbreviate a cap
|
|
that refers to a file (whether mutable or immutable). So those URLs above can
|
|
be abbreviated as:
|
|
|
|
http://127.0.0.1:3456/uri/$DIRCAP/
|
|
http://127.0.0.1:3456/uri/$FILECAP
|
|
|
|
The operation summaries below will abbreviate these further, by eliding the
|
|
server prefix. They will be displayed like this:
|
|
|
|
/uri/$DIRCAP/
|
|
/uri/$FILECAP
|
|
|
|
|
|
=== Child Lookup ===
|
|
|
|
Tahoe directories contain named children, just like directories in a regular
|
|
local filesystem. These children can be either files or subdirectories.
|
|
|
|
If you have a Tahoe URL that refers to a directory, and want to reference a
|
|
named child inside it, just append the child name to the URL. For example, if
|
|
our sample directory contains a file named "welcome.txt", we can refer to
|
|
that file with:
|
|
|
|
http://127.0.0.1:3456/uri/$DIRCAP/welcome.txt
|
|
|
|
(or http://127.0.0.1:3456/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/welcome.txt)
|
|
|
|
Multiple levels of subdirectories can be handled this way:
|
|
|
|
http://127.0.0.1:3456/uri/$DIRCAP/tahoe-source/docs/webapi.txt
|
|
|
|
In this document, when we need to refer to a URL that references a file using
|
|
this child-of-some-directory format, we'll use the following string:
|
|
|
|
/uri/$DIRCAP/[SUBDIRS../]FILENAME
|
|
|
|
The "[SUBDIRS../]" part means that there are zero or more (optional)
|
|
subdirectory names in the middle of the URL. The "FILENAME" at the end means
|
|
that this whole URL refers to a file of some sort, rather than to a
|
|
directory.
|
|
|
|
When we need to refer specifically to a directory in this way, we'll write:
|
|
|
|
/uri/$DIRCAP/[SUBDIRS../]SUBDIR
|
|
|
|
|
|
Note that all components of pathnames in URLs are required to be UTF-8
|
|
encoded, so "resume.doc" (with an acute accent on both E's) would be accessed
|
|
with:
|
|
|
|
http://127.0.0.1:3456/uri/$DIRCAP/r%C3%A9sum%C3%A9.doc
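A minimal Python sketch of this encoding follows. The child_url() helper and the short directory cap are hypothetical; the point is only that each path component is UTF-8 encoded and then percent-escaped separately, so that a slash can never leak out of a child name.

```python
from urllib.parse import quote

def child_url(base, dircap, *names):
    """Join a directory cap and child names into a web-API URL."""
    parts = [quote(dircap, safe="")]
    # quote() encodes str input as UTF-8 before percent-escaping it
    parts.extend(quote(name, safe="") for name in names)
    return base + "/uri/" + "/".join(parts)

# "URI:DIR2:aaaa:bbbb" is a made-up stand-in for a real directory cap
print(child_url("http://127.0.0.1:3456", "URI:DIR2:aaaa:bbbb", "r\u00e9sum\u00e9.doc"))
```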
|
|
|
|
Also note that the filenames inside upload POST forms are interpreted using
|
|
whatever character set was provided in the conventional '_charset' field, and
|
|
defaults to UTF-8 if not otherwise specified. The JSON representation of each
|
|
directory contains native unicode strings. Tahoe directories are specified to
|
|
contain unicode filenames, and cannot contain binary strings that are not
|
|
representable as such.
|
|
|
|
All Tahoe operations that refer to existing files or directories must include
|
|
a suitable read- or write- cap in the URL: the wapi server won't add one
|
|
for you. If you don't know the cap, you can't access the file. This allows
|
|
the security properties of Tahoe caps to be extended across the wapi
|
|
interface.
|
|
|
|
== Slow Operations, Progress, and Cancelling ==
|
|
|
|
Certain operations can be expected to take a long time. The "t=deep-check",
|
|
described below, will recursively visit every file and directory reachable
|
|
from a given starting point, which can take minutes or even hours for
|
|
extremely large directory structures. A single long-running HTTP request is a
|
|
fragile thing: proxies, NAT boxes, browsers, and users may all grow impatient
|
|
with waiting and give up on the connection.
|
|
|
|
For this reason, long-running operations have an "operation handle", which
|
|
can be used to poll for status/progress messages while the operation
|
|
proceeds. This handle can also be used to cancel the operation. These handles
|
|
are created by the client, and passed in as an "ophandle=" query argument
|
|
to the POST or PUT request which starts the operation. The following
|
|
operations can then be used to retrieve status:
|
|
|
|
GET /operations/$HANDLE?output=HTML (with or without t=status)
|
|
GET /operations/$HANDLE?output=JSON (same)
|
|
|
|
These two retrieve the current status of the given operation. Each operation
|
|
presents a different sort of information, but in general the page retrieved
|
|
will indicate:
|
|
|
|
* whether the operation is complete, or if it is still running
|
|
* how much of the operation is complete, and how much is left, if possible
|
|
|
|
Note that the final status output can be quite large: a deep-manifest of a
|
|
directory structure with 300k directories and 200k unique files is about
|
|
275MB of JSON, and might take two minutes to generate. For this reason, the
|
|
full status is not provided until the operation has completed.
|
|
|
|
The HTML form will include a meta-refresh tag, which will cause a regular
|
|
web browser to reload the status page about 60 seconds later. This tag will
|
|
be removed once the operation has completed.
|
|
|
|
There may be more status information available under
|
|
/operations/$HANDLE/$ETC : i.e., the handle forms the root of a URL space.
|
|
|
|
POST /operations/$HANDLE?t=cancel
|
|
|
|
This terminates the operation, and returns an HTML page explaining what was
|
|
cancelled. If the operation handle has already expired (see below), this
|
|
POST will return a 404, which indicates that the operation is no longer
|
|
running (either it was completed or terminated). The response body will be
|
|
the same as a GET /operations/$HANDLE on this operation handle, and the
|
|
handle will be expired immediately afterwards.
|
|
|
|
The operation handle will eventually expire, to avoid consuming an unbounded
|
|
amount of memory. The handle's time-to-live can be reset at any time, by
|
|
passing a retain-for= argument (with a count of seconds) to either the
|
|
initial POST that starts the operation, or the subsequent GET request which
|
|
asks about the operation. For example, if a 'GET
|
|
/operations/$HANDLE?output=JSON&retain-for=600' query is performed, the
|
|
handle will remain active for 600 seconds (10 minutes) after the GET was
|
|
received.
|
|
|
|
In addition, if the GET includes a release-after-complete=True argument, and
|
|
the operation has completed, the operation handle will be released
|
|
immediately.
|
|
|
|
If a retain-for= argument is not used, the default handle lifetimes are:
|
|
|
|
* handles will remain valid at least until their operation finishes
|
|
* uncollected handles for finished operations (i.e. handles for operations
|
|
which have finished but for which the GET page has not been accessed since
|
|
completion) will remain valid for one hour, or for the total time consumed
|
|
by the operation, whichever is greater.
|
|
* collected handles (i.e. the GET page has been retrieved at least once
|
|
since the operation completed) will remain valid for ten minutes.
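The polling pattern described in this section might be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive client: the helper names are invented, and the boolean "finished" key in the JSON status is an ASSUMPTION, since this document only says that the status indicates whether the operation is complete.

```python
import json
import time
import urllib.request
from urllib.parse import quote

def status_url(base, handle, retain_for=None):
    """Build the JSON status URL for a client-chosen operation handle."""
    url = base + "/operations/" + quote(handle, safe="") + "?output=JSON"
    if retain_for is not None:
        # Reset the handle's time-to-live (in seconds) on every poll
        url += "&retain-for=%d" % retain_for
    return url

def fetch_json(url):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def wait_for_operation(base, handle, fetch=fetch_json, delay=5.0, sleep=time.sleep):
    """Poll until the status reports completion, then return it.

    ASSUMPTION: the status is a JSON object with a boolean "finished" key.
    """
    while True:
        status = fetch(status_url(base, handle, retain_for=600))
        if status.get("finished"):
            return status
        sleep(delay)
```

The fetch and sleep arguments exist only so the loop can be exercised without a running node; a real caller would leave them at their defaults.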
|
|
|
|
Many "slow" operations can begin to use unacceptable amounts of memory when
|
|
operating on large directory structures. The memory usage increases when the
|
|
ophandle is polled, as the results must be copied into a JSON string, sent
|
|
over the wire, then parsed by a client. So, as an alternative, many "slow"
|
|
operations have streaming equivalents. These equivalents do not use operation
|
|
handles. Instead, they emit line-oriented status results immediately. Client
|
|
code can cancel the operation by simply closing the HTTP connection.
|
|
|
|
== Programmatic Operations ==
|
|
|
|
Now that we know how to build URLs that refer to files and directories in a
|
|
Tahoe virtual filesystem, what sorts of operations can we do with those URLs?
|
|
This section contains a catalog of GET, PUT, DELETE, and POST operations that
|
|
can be performed on these URLs. This set of operations is aimed at programs
|
|
that use HTTP to communicate with a Tahoe node. The next section describes
|
|
operations that are intended for web browsers.
|
|
|
|
=== Reading A File ===
|
|
|
|
GET /uri/$FILECAP
|
|
GET /uri/$DIRCAP/[SUBDIRS../]FILENAME
|
|
|
|
This will retrieve the contents of the given file. The HTTP response body
|
|
will contain the sequence of bytes that make up the file.
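A minimal Python sketch of this operation, using only the standard library, could look like the following. The file_request() and read_file() helpers are invented for this example, and read_file() needs a running node to do anything useful.

```python
import urllib.request
from urllib.parse import quote

def file_request(base, cap, *path):
    """Build a GET request for a file cap, or for a child path under a dircap."""
    parts = [quote(cap, safe="")] + [quote(p, safe="") for p in path]
    return urllib.request.Request(base + "/uri/" + "/".join(parts), method="GET")

def read_file(base, cap, *path):
    """Return the file contents as bytes (requires a running node)."""
    with urllib.request.urlopen(file_request(base, cap, *path)) as resp:
        return resp.read()
```

For example, read_file("http://127.0.0.1:3456", dircap, "welcome.txt") would fetch the child named "welcome.txt" from the directory referenced by dircap.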
|
|
|
|
To view files in a web browser, you may want more control over the
|
|
Content-Type and Content-Disposition headers. Please see the next section,
|
|
"Browser Operations", for details on how to modify these URLs for that
|
|
purpose.
|
|
|
|
=== Writing/Uploading A File ===
|
|
|
|
PUT /uri/$FILECAP
|
|
PUT /uri/$DIRCAP/[SUBDIRS../]FILENAME
|
|
|
|
Upload a file, using the data from the HTTP request body, and add whatever
|
|
child links and subdirectories are necessary to make the file available at
|
|
the given location. Once this operation succeeds, a GET on the same URL will
|
|
retrieve the same contents that were just uploaded. This will create any
|
|
necessary intermediate subdirectories.
|
|
|
|
To use the /uri/$FILECAP form, $FILECAP must be a write-cap for a mutable file.
|
|
|
|
In the /uri/$DIRCAP/[SUBDIRS../]FILENAME form, if the target file is a
|
|
writable mutable file, that file's contents will be overwritten in-place. If
|
|
it is a read-cap for a mutable file, an error will occur. If it is an
|
|
immutable file, the old file will be discarded, and a new one will be put in
|
|
its place.
|
|
|
|
When creating a new file, if "mutable=true" is in the query arguments, the
|
|
operation will create a mutable file instead of an immutable one.
|
|
|
|
This returns the file-cap of the resulting file. If a new file was created
|
|
by this method, the HTTP response code (as dictated by rfc2616) will be set
|
|
to 201 CREATED. If an existing file was replaced or modified, the response
|
|
code will be 200 OK.
|
|
|
|
Note that the 'curl -T localfile http://127.0.0.1:3456/uri/$DIRCAP/foo.txt'
|
|
command can be used to invoke this operation.
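The same upload can be sketched in Python. The helper names below are hypothetical; the "mutable=true" argument and the 201/200 distinction are the ones described above, and upload() requires a running node.

```python
import urllib.request
from urllib.parse import quote

def upload_request(base, dircap, path, data, mutable=False):
    """Build a PUT request that stores `data` (bytes) at dircap/path."""
    parts = [quote(dircap, safe="")] + [quote(p, safe="") for p in path]
    url = base + "/uri/" + "/".join(parts)
    if mutable:
        # Only matters when a new file is created
        url += "?mutable=true"
    return urllib.request.Request(url, data=data, method="PUT")

def upload(base, dircap, path, data, mutable=False):
    """Send the PUT and return (created, filecap)."""
    req = upload_request(base, dircap, path, data, mutable)
    with urllib.request.urlopen(req) as resp:
        # 201 means a new file was created, 200 means one was replaced
        created = resp.status == 201
        return created, resp.read().decode("ascii").strip()
```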
|
|
|
|
PUT /uri
|
|
|
|
This uploads a file, and produces a file-cap for the contents, but does not
|
|
attach the file into the virtual drive. No directories will be modified by
|
|
this operation. The file-cap is returned as the body of the HTTP response.
|
|
|
|
If "mutable=true" is in the query arguments, the operation will create a
|
|
mutable file, and return its write-cap in the HTTP response. The default is
|
|
to create an immutable file, returning the read-cap as a response.
|
|
|
|
=== Creating A New Directory ===
|
|
|
|
POST /uri?t=mkdir
|
|
PUT /uri?t=mkdir
|
|
|
|
Create a new empty directory and return its write-cap as the HTTP response
|
|
body. This does not make the newly created directory visible from the
|
|
virtual drive. The "PUT" operation is provided for backwards compatibility:
|
|
new code should use POST.
|
|
|
|
POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir
|
|
PUT /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir
|
|
|
|
Create new directories as necessary to make sure that the named target
|
|
($DIRCAP/SUBDIRS../SUBDIR) is a directory. This will create additional
|
|
intermediate directories as necessary. If the named target directory already
|
|
exists, this will make no changes to it.
|
|
|
|
This will return an error if a blocking file is present at any of the parent
|
|
names, preventing the server from creating the necessary parent directory.
|
|
|
|
The write-cap of the new directory will be returned as the HTTP response
|
|
body.
|
|
|
|
POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME
|
|
|
|
Create a new empty directory and attach it to the given existing directory.
|
|
This will create additional intermediate directories as necessary.
|
|
|
|
The URL of this form points to the parent of the bottom-most new directory,
|
|
whereas the previous form has a URL that points directly to the bottom-most
|
|
new directory.
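The unlinked form and the "name=" form might be built in Python like this. The mkdir_request() helper is a made-up name for this sketch; the URLs follow the POST forms listed above.

```python
import urllib.request
from urllib.parse import quote, urlencode

def mkdir_request(base, parent_cap=None, name=None):
    """Build a POST t=mkdir request.

    With no parent_cap the new directory is unlinked; otherwise it is
    attached to the parent directory under the given name.
    """
    if parent_cap is None:
        url = base + "/uri?t=mkdir"
    else:
        query = urlencode({"t": "mkdir", "name": name})
        url = base + "/uri/" + quote(parent_cap, safe="") + "/?" + query
    # The request carries no body; the write-cap comes back in the response
    return urllib.request.Request(url, data=b"", method="POST")
```

Passing the result to urllib.request.urlopen() and reading the response body would yield the write-cap of the new directory.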
|
|
|
|
=== Get Information About A File Or Directory (as JSON) ===
|
|
|
|
GET /uri/$FILECAP?t=json
|
|
GET /uri/$DIRCAP?t=json
|
|
GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json
|
|
GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json
|
|
|
|
This returns a machine-parseable JSON-encoded description of the given
|
|
object. The JSON always contains a list, and the first element of the list
|
|
is always a flag that indicates whether the referenced object is a file or a
|
|
directory. If it is a file, then the information includes file size and URI,
|
|
like this:
|
|
|
|
GET /uri/$FILECAP?t=json :
|
|
GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json :
|
|
|
|
[ "filenode", { "ro_uri": file_uri,
|
|
"verify_uri": verify_uri,
|
|
"size": bytes,
|
|
"mutable": false,
|
|
"metadata": {"ctime": 1202777696.7564139,
|
|
"mtime": 1202777696.7564139
|
|
}
|
|
} ]
|
|
|
|
If it is a directory, then it includes information about the children of
|
|
this directory, as a mapping from child name to a set of data about the
|
|
child (the same data that would appear in a corresponding GET?t=json of the
|
|
child itself). The child entries also include metadata about each child,
|
|
including creation- and modification- timestamps. The output looks like
|
|
this:
|
|
|
|
GET /uri/$DIRCAP?t=json :
|
|
GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json :
|
|
|
|
[ "dirnode", { "rw_uri": read_write_uri,
|
|
"ro_uri": read_only_uri,
|
|
"verify_uri": verify_uri,
|
|
"mutable": true,
|
|
"children": {
|
|
"foo.txt": [ "filenode", { "ro_uri": uri,
|
|
"size": bytes,
|
|
"metadata": {
|
|
"ctime": 1202777696.7564139,
|
|
"mtime": 1202777696.7564139
|
|
}
|
|
} ],
|
|
"subdir": [ "dirnode", { "rw_uri": rwuri,
|
|
"ro_uri": rouri,
|
|
"metadata": {
|
|
"ctime": 1202778102.7589991,
|
|
"mtime": 1202778111.2160511
|
|
}
|
|
} ]
|
|
} } ]
|
|
|
|
In the above example, note how 'children' is a dictionary in which the keys
|
|
are child names and the values depend upon whether the child is a file or a
|
|
directory. The value is mostly the same as the JSON representation of the
|
|
child object (except that directories do not recurse -- the "children"
|
|
entry of the child is omitted, and the directory view includes the metadata
|
|
that is stored on the directory edge).
|
|
|
|
The rw_uri field will be present in the information about a directory
|
|
if and only if you have read-write access to that directory. The verify_uri
|
|
field will be present if and only if the object has a verify-cap
|
|
(non-distributed LIT files do not have verify-caps).
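To show how a program might consume this structure, here is a small Python sketch that walks the "children" dictionary of a t=json directory reply. The list_children() helper and the sample reply (with placeholder caps) are invented for illustration; the field names are the ones shown above.

```python
import json

def list_children(node_json):
    """Return {name: (kind, size_or_None, writable)} for a directory reply."""
    kind, info = json.loads(node_json)
    if kind != "dirnode":
        raise ValueError("expected a dirnode, got %r" % kind)
    result = {}
    for name, (child_kind, child_info) in info["children"].items():
        # rw_uri is present only when we hold read-write access to the child
        writable = "rw_uri" in child_info
        result[name] = (child_kind, child_info.get("size"), writable)
    return result

# Made-up reply; the cap strings are placeholders, not real caps
SAMPLE_REPLY = """
[ "dirnode", { "rw_uri": "URI:DIR2:example-rw",
               "ro_uri": "URI:DIR2-RO:example-ro",
               "mutable": true,
               "children": {
                 "foo.txt": [ "filenode", { "ro_uri": "URI:CHK:example",
                                            "size": 202,
                                            "metadata": {} } ],
                 "subdir": [ "dirnode", { "rw_uri": "URI:DIR2:child-rw",
                                          "ro_uri": "URI:DIR2-RO:child-ro",
                                          "metadata": {} } ]
               } } ]
"""

print(list_children(SAMPLE_REPLY))
```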
|
|
|
|
|
|
=== Attaching an existing File or Directory by its read- or write- cap ===
|
|
|
|
PUT /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri
|
|
|
|
This attaches a child object (either a file or directory) to a specified
|
|
location in the virtual filesystem. The child object is referenced by its
|
|
read- or write- cap, as provided in the HTTP request body. This will create
|
|
intermediate directories as necessary.
|
|
|
|
This is similar to a UNIX hardlink: by referencing a previously-uploaded
|
|
file (or previously-created directory) instead of uploading/creating a new
|
|
one, you can create two references to the same object.
|
|
|
|
The read- or write- cap of the child is provided in the body of the HTTP
|
|
request, and this same cap is returned in the response body.
|
|
|
|
The default behavior is to overwrite any existing object at the same
|
|
location. To prevent this (and make the operation return an error instead of
|
|
overwriting), add a "replace=false" argument, as "?t=uri&replace=false".
|
|
With replace=false, this operation will return an HTTP 409 "Conflict" error
|
|
if there is already an object at the given location, rather than overwriting
|
|
the existing object. Note that "true", "t", and "1" are all synonyms for
|
|
"True", and "false", "f", and "0" are synonyms for "False". The parameter is
|
|
case-insensitive.
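A hedged Python sketch of this operation, including the replace=false case, is shown below. The link_request() and link() helpers are hypothetical names; link() needs a running node and simply reports whether the node refused the request with a 409.

```python
import urllib.error
import urllib.request
from urllib.parse import quote

def link_request(base, dircap, childname, childcap, replace=True):
    """Build a PUT t=uri request that attaches childcap under childname."""
    url = "%s/uri/%s/%s?t=uri" % (
        base, quote(dircap, safe=""), quote(childname, safe=""))
    if not replace:
        url += "&replace=false"
    # The cap to attach travels in the request body
    return urllib.request.Request(url, data=childcap.encode("ascii"), method="PUT")

def link(base, dircap, childname, childcap, replace=True):
    """Return True if linked, False if refused with 409 Conflict."""
    req = link_request(base, dircap, childname, childcap, replace)
    try:
        with urllib.request.urlopen(req):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 409:
            return False
        raise
```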
|
|
|
|
=== Deleting a File or Directory ===
|
|
|
|
DELETE /uri/$DIRCAP/[SUBDIRS../]CHILDNAME
|
|
|
|
This removes the given name from its parent directory. CHILDNAME is the
|
|
name to be removed, and $DIRCAP/SUBDIRS.. indicates the directory that will
|
|
be modified.
|
|
|
|
Note that this does not actually delete the file or directory that the name
|
|
points to from the tahoe grid -- it only removes the named reference from
|
|
this directory. If there are other names in this directory or in other
|
|
directories that point to the resource, then it will remain accessible
|
|
through those paths. Even if all names pointing to this object are removed
|
|
from their parent directories, someone with possession of its read-cap
|
|
can continue to access the object through that cap.
|
|
|
|
The object will only become completely unreachable once 1: there are no
|
|
reachable directories that reference it, and 2: nobody is holding a read-
|
|
or write- cap to the object. (This behavior is very similar to the way
|
|
hardlinks and anonymous files work in traditional unix filesystems).
|
|
|
|
This operation will not modify more than a single directory. Intermediate
|
|
directories which were implicitly created by PUT or POST methods will *not*
|
|
be automatically removed by DELETE.
|
|
|
|
This method returns the file- or directory- cap of the object that was just
|
|
removed.
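In Python, the request could be built roughly as follows. The unlink_request() helper is an invented name; it insists on at least one path component, since this operation removes a name from a parent directory rather than acting on a bare cap.

```python
import urllib.request
from urllib.parse import quote

def unlink_request(base, dircap, *path):
    """Build a DELETE request that removes the last name in `path`."""
    if not path:
        raise ValueError("need at least a CHILDNAME to remove")
    parts = [quote(dircap, safe="")] + [quote(p, safe="") for p in path]
    return urllib.request.Request(base + "/uri/" + "/".join(parts), method="DELETE")
```

Sending it with urllib.request.urlopen() against a running node would return the cap of the object whose name was removed, as described above.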
|
|
|
|
== Browser Operations ==
|
|
|
|
This section describes the HTTP operations that provide support for humans
|
|
running a web browser. Most of these operations use HTML forms that use POST
|
|
to drive the Tahoe node.
|
|
|
|
Note that for all POST operations, the arguments listed can be provided
|
|
either as URL query arguments or as form body fields. URL query arguments are
|
|
separated from the main URL by "?", and from each other by "&". For example,
|
|
"POST /uri/$DIRCAP?t=upload&mutable=true". Form body fields are usually
|
|
specified by using <input type="hidden"> elements. For clarity, the
|
|
descriptions below display the most significant arguments as URL query args.
|
|
|
|
=== Viewing A Directory (as HTML) ===
|
|
|
|
GET /uri/$DIRCAP/[SUBDIRS../]
|
|
|
|
This returns an HTML page, intended to be displayed to a human by a web
|
|
browser, which contains HREF links to all files and directories reachable
|
|
from this directory. These HREF links do not have a t= argument, meaning
|
|
that a human who follows them will get pages also meant for a human. It also
|
|
contains forms to upload new files, and to delete files and directories.
|
|
Those forms use POST methods to do their job.
|
|
|
|
=== Viewing/Downloading a File ===
|
|
|
|
GET /uri/$FILECAP
|
|
GET /uri/$DIRCAP/[SUBDIRS../]FILENAME
|
|
|
|
This will retrieve the contents of the given file. The HTTP response body
|
|
will contain the sequence of bytes that make up the file.
|
|
|
|
If you want the HTTP response to include a useful Content-Type header,
|
|
either use the second form (which starts with a $DIRCAP), or add a
|
|
"filename=foo" query argument, like "GET /uri/$FILECAP?filename=foo.jpg".
|
|
The bare "GET /uri/$FILECAP" does not give the Tahoe node enough information
|
|
to determine a Content-Type (since Tahoe immutable files are merely
|
|
sequences of bytes, not typed+named file objects).
|
|
|
|
If the URL has both filename= and "save=true" in the query arguments, then
|
|
the server will add a "Content-Disposition: attachment" header, along with a
|
|
filename= parameter. When a user clicks on such a link, most browsers will
|
|
offer to let the user save the file instead of displaying it inline (indeed,
|
|
most browsers will refuse to display it inline). "true", "t", "1", and other
|
|
case-insensitive equivalents are all treated the same.
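A page generator that wants to emit such links might build them like this in Python. The download_url() helper is a made-up name; the "filename=" and "save=true" arguments are the ones described above.

```python
from urllib.parse import quote, urlencode

def download_url(base, filecap, filename, save=False):
    """Build a download link that hints a filename to the browser."""
    query = {"filename": filename}
    if save:
        # Ask for a "Content-Disposition: attachment" header
        query["save"] = "true"
    return base + "/uri/" + quote(filecap, safe="") + "?" + urlencode(query)

print(download_url("http://127.0.0.1:3456", "URI:CHK:aa:bb:3:10:202", "foo.jpg", save=True))
```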
|
|
|
|
Character-set handling in URLs and HTTP headers is a dubious art[1]. For
|
|
maximum compatibility, Tahoe simply copies the bytes from the filename=
|
|
argument into the Content-Disposition header's filename= parameter, without
|
|
trying to interpret them in any particular way.
|
|
|
|
|
|
GET /named/$FILECAP/FILENAME
|
|
|
|
This is an alternate download form which makes it easier to get the correct
|
|
filename. The Tahoe server will provide the contents of the given file, with
|
|
a Content-Type header derived from the given filename. This form is used to
|
|
get browsers to use the "Save Link As" feature correctly, and also helps
|
|
command-line tools like "wget" and "curl" use the right filename. Note that
|
|
this form can *only* be used with file caps; it is an error to use a
|
|
directory cap after the /named/ prefix.
|
|
|
|
=== Get Information About A File Or Directory (as HTML) ===
|
|
|
|
GET /uri/$FILECAP?t=info
|
|
GET /uri/$DIRCAP/?t=info
|
|
GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR/?t=info
|
|
GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=info
|
|
|
|
This returns a human-oriented HTML page with more detail about the selected
|
|
file or directory object. This page contains the following items:
|
|
|
|
object size
|
|
storage index
|
|
JSON representation
|
|
raw contents (text/plain)
|
|
access caps (URIs): verify-cap, read-cap, write-cap (for mutable objects)
|
|
check/verify/repair form
|
|
deep-check/deep-size/deep-stats/manifest (for directories)
|
|
replace-contents form (for mutable files)
|
|
|
|
=== Creating a Directory ===
|
|
|
|
POST /uri?t=mkdir
|
|
|
|
This creates a new directory, but does not attach it to the virtual
|
|
filesystem.
|
|
|
|
If a "redirect_to_result=true" argument is provided, then the HTTP response
|
|
will cause the web browser to be redirected to a /uri/$DIRCAP page that
|
|
gives access to the newly-created directory. If you bookmark this page,
|
|
you'll be able to get back to the directory again in the future. This is the
|
|
recommended way to start working with a Tahoe server: create a new unlinked
|
|
directory (using redirect_to_result=true), then bookmark the resulting
|
|
/uri/$DIRCAP page. There is a "Create Directory" button on the Welcome page
|
|
to invoke this action.
|
|
|
|
If "redirect_to_result=true" is not provided (or is given a value of
|
|
"false"), then the HTTP response body will simply be the write-cap of the
|
|
new directory.
|
|
|
|
POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=CHILDNAME
|
|
|
|
This creates a new directory as a child of the designated SUBDIR. This will
|
|
create additional intermediate directories as necessary.
|
|
|
|
If a "when_done=URL" argument is provided, the HTTP response will cause the
|
|
web browser to redirect to the given URL. This provides a convenient way to
|
|
return the browser to the directory that was just modified. Without a
|
|
when_done= argument, the HTTP response will simply contain the write-cap of
|
|
the directory that was just created.
|
|
|
|
|
|
=== Uploading a File ===
|
|
|
|
POST /uri?t=upload
|
|
|
|
This uploads a file, and produces a file-cap for the contents, but does not
|
|
attach the file into the virtual drive. No directories will be modified by
|
|
this operation.
|
|
|
|
The file must be provided as the "file" field of an HTML encoded form body,
|
|
produced in response to an HTML form like this:
|
|
<form action="/uri" method="POST" enctype="multipart/form-data">
|
|
<input type="hidden" name="t" value="upload" />
|
|
<input type="file" name="file" />
|
|
<input type="submit" value="Upload Unlinked" />
|
|
</form>
|
|
|
|
If a "when_done=URL" argument is provided, the response body will cause the
|
|
browser to redirect to the given URL. If the when_done= URL has the string
|
|
"%(uri)s" in it, that string will be replaced by a URL-escaped form of the
|
|
newly created file-cap. (Note that without this substitution, there is no
|
|
way to access the file that was just uploaded).
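The effect of this substitution can be mimicked in a few lines of Python. This is an illustration of the behavior described above, not the node's actual code: the expand_when_done() helper is hypothetical, and the exact escaping applied by the server is an ASSUMPTION (here every reserved character in the cap is percent-encoded).

```python
from urllib.parse import quote

def expand_when_done(when_done, filecap):
    """Mimic the described replacement of "%(uri)s" in a when_done= URL."""
    # ASSUMPTION: the cap is escaped the same way as when it appears in /uri/ URLs
    return when_done.replace("%(uri)s", quote(filecap, safe=""))

# "URI:CHK:aa:bb:3:10:202" is a made-up stand-in for a real file cap
print(expand_when_done("/uri/%(uri)s?t=info", "URI:CHK:aa:bb:3:10:202"))
```

A form author would therefore put something like "/uri/%(uri)s" in the when_done field so that the browser lands on a page for the file that was just uploaded.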
|
|
|
|
The default (in the absence of when_done=) is to return an HTML page that
|
|
describes the results of the upload. This page will contain information
|
|
about which storage servers were used for the upload, how long each
|
|
operation took, etc.
|
|
|
|
If a "mutable=true" argument is provided, the operation will create a
|
|
mutable file, and the response body will contain the write-cap instead of
|
|
the upload results page. The default is to create an immutable file,
|
|
returning the upload results page as a response.
|
|
|
|
|
|
POST /uri/$DIRCAP/[SUBDIRS../]?t=upload
|
|
|
|
This uploads a file, and attaches it as a new child of the given directory.
|
|
The file must be provided as the "file" field of an HTML encoded form body,
|
|
produced in response to an HTML form like this:
|
|
<form action="." method="POST" enctype="multipart/form-data">
|
|
<input type="hidden" name="t" value="upload" />
|
|
<input type="file" name="file" />
|
|
<input type="submit" value="Upload" />
|
|
</form>
|
|
|
|
A "name=" argument can be provided to specify the new child's name,
|
|
otherwise it will be taken from the "filename" field of the upload form
|
|
(most web browsers will copy the last component of the original file's
|
|
pathname into this field). To avoid confusion, name= is not allowed to
|
|
contain a slash.
|
|
|
|
If there is already a child with that name, and it is a mutable file, then
|
|
its contents are replaced with the data being uploaded. If it is not a
|
|
mutable file, the default behavior is to remove the existing child before
|
|
creating a new one. To prevent this (and make the operation return an error
|
|
instead of overwriting the old child), add a "replace=false" argument, as
|
|
"?t=upload&replace=false". With replace=false, this operation will return an
|
|
HTTP 409 "Conflict" error if there is already an object at the given
|
|
location, rather than overwriting the existing object. Note that "true",
|
|
"t", and "1" are all synonyms for "True", and "false", "f", and "0" are
|
|
synonyms for "False". The parameter is case-insensitive.
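The boolean convention described here can be sketched in Python as follows. The parse_boolean_arg() helper is invented for this example and is not the node's own parser; rejecting any other spelling with a ValueError is an ASSUMPTION, since this document does not say how unrecognized values are handled.

```python
_TRUE_WORDS = {"true", "t", "1"}
_FALSE_WORDS = {"false", "f", "0"}

def parse_boolean_arg(value):
    """Interpret a query-argument string using the synonyms listed above."""
    lowered = value.lower()  # the parameter is case-insensitive
    if lowered in _TRUE_WORDS:
        return True
    if lowered in _FALSE_WORDS:
        return False
    # ASSUMPTION: anything else is treated as an error in this sketch
    raise ValueError("not a recognized boolean: %r" % value)
```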
|
|
|
|
This will create additional intermediate directories as necessary, although
|
|
since it is expected to be triggered by a form that was retrieved by "GET
|
|
/uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
|
|
already exist.
|
|
|
|
If a "mutable=true" argument is provided, any new file that is created will
|
|
be a mutable file instead of an immutable one. <input type="checkbox"
|
|
name="mutable" /> will give the user a way to set this option.
|
|
|
|
If a "when_done=URL" argument is provided, the HTTP response will cause the
|
|
web browser to redirect to the given URL. This provides a convenient way to
|
|
return the browser to the directory that was just modified. Without a
|
|
when_done= argument, the HTTP response will simply contain the file-cap of
|
|
the file that was just uploaded (a write-cap for mutable files, or a
|
|
read-cap for immutable files).
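
As a rough illustration of how the query arguments above combine, the following Python sketch builds the URL for a t=upload POST. The function name build_upload_url and the example base URL are assumptions made for illustration. The sketch only constructs the URL; the file itself would still have to be sent as the "file" field of a multipart/form-data body.

```python
from urllib.parse import quote, urlencode


def build_upload_url(base, dircap, subdirs=(), name=None, replace=None,
                     mutable=False, when_done=None):
    """Build the URL for POST /uri/$DIRCAP/[SUBDIRS../]?t=upload."""
    # Percent-encode the cap and each path segment (":" becomes %3A).
    path = "/uri/" + quote(dircap, safe="")
    for segment in subdirs:
        path += "/" + quote(segment, safe="")
    path += "/"
    args = [("t", "upload")]
    if name is not None:
        if "/" in name:
            raise ValueError("name= is not allowed to contain a slash")
        args.append(("name", name))
    if replace is not None:
        args.append(("replace", "true" if replace else "false"))
    if mutable:
        args.append(("mutable", "true"))
    if when_done is not None:
        args.append(("when_done", when_done))
    return base + path + "?" + urlencode(args)
```

For example, passing replace=False yields a URL ending in "?t=upload&name=...&replace=false", which asks the node to answer with 409 "Conflict" instead of overwriting an existing child.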
|
|
|
|
POST /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=upload
|
|
|
|
This also uploads a file and attaches it as a new child of the given
|
|
directory. It is a slight variant of the previous operation, as the URL
|
|
refers to the target file rather than the parent directory. It is otherwise
|
|
identical: this accepts mutable= and when_done= arguments too.
|
|
|
|
POST /uri/$FILECAP?t=upload
|
|
|
|
This modifies the contents of an existing mutable file in-place. An error is
|
|
signalled if $FILECAP does not refer to a mutable file. It behaves just like
|
|
the "PUT /uri/$FILECAP" form, but uses a POST for the benefit of HTML forms
|
|
in a web browser.
|
|
|
|
=== Attaching An Existing File Or Directory (by URI) ===
|
|
|
|
POST /uri/$DIRCAP/[SUBDIRS../]?t=uri&name=CHILDNAME&uri=CHILDCAP
|
|
|
|
This attaches a given read- or write- cap "CHILDCAP" to the designated
|
|
directory, with a specified child name. This behaves much like the PUT t=uri
|
|
operation, and is a lot like a UNIX hardlink.
|
|
|
|
This will create additional intermediate directories as necessary, although
|
|
since it is expected to be triggered by a form that was retrieved by "GET
|
|
/uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
|
|
already exist.
|
|
|
|
=== Deleting A Child ===
|
|
|
|
POST /uri/$DIRCAP/[SUBDIRS../]?t=delete&name=CHILDNAME
|
|
|
|
This instructs the node to delete a child object (file or subdirectory) from
|
|
the given directory. Note that the entire subtree is removed. This is
|
|
somewhat like "rm -rf" (from the point of view of the parent), but other
|
|
references into the subtree will see that the child subdirectories are not
|
|
modified by this operation. Only the link from the given directory to its
|
|
child is severed.
|
|
|
|
=== Renaming A Child ===
|
|
|
|
POST /uri/$DIRCAP/[SUBDIRS../]?t=rename&from_name=OLD&to_name=NEW
|
|
|
|
This instructs the node to rename a child of the given directory. This is
|
|
exactly the same as removing the child, then adding the same child-cap under
|
|
the new name. This operation cannot move the child to a different directory.
|
|
|
|
This operation will replace any existing child of the new name, making it
|
|
behave like the UNIX "mv -f" command.
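
The rename semantics can be modelled locally with a plain dictionary. This is only a sketch of the described behavior, not Tahoe code: rename_child is a hypothetical name, the dictionary stands in for a directory's name-to-cap mapping, and the KeyError for a missing child is a property of the model rather than a documented node response.

```python
def rename_child(children, from_name, to_name):
    """Model t=rename on a dict mapping child name -> child cap."""
    # Remove the child under its old name (KeyError if it does not exist).
    cap = children.pop(from_name)
    # Re-add the same cap under the new name, replacing any existing child,
    # which mirrors the "mv -f" behavior described above.
    children[to_name] = cap
    return children
```

Because only the name-to-cap mapping changes, the object referenced by the cap is untouched; any other directory that links to it keeps working.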
|
|
|
|
=== Other Utilities ===
|
|
|
|
GET /uri?uri=$CAP
|
|
|
|
This causes a redirect to /uri/$CAP, and retains any additional query
|
|
arguments (like filename= or save=). This is for the convenience of web
|
|
forms which allow the user to paste in a read- or write- cap (obtained
|
|
through some out-of-band channel, like IM or email).
|
|
|
|
Note that this form merely redirects to the specific file or directory
|
|
indicated by the $CAP: unlike the GET /uri/$DIRCAP form, you cannot
|
|
traverse to children by appending additional path segments to the URL.
|
|
|
|
GET /uri/$DIRCAP/[SUBDIRS../]?t=rename-form&name=$CHILDNAME
|
|
|
|
This provides a useful facility to browser-based user interfaces. It
|
|
returns a page containing a form targeting the "POST $DIRCAP t=rename"
|
|
functionality described above, with the provided $CHILDNAME present in the
|
|
'from_name' field of that form. I.e. this presents a form offering to
|
|
rename $CHILDNAME, requesting the new name, and submitting POST rename.
|
|
|
|
GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri
|
|
|
|
This returns the file- or directory- cap for the specified object.
|
|
|
|
GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=readonly-uri
|
|
|
|
This returns a read-only file- or directory- cap for the specified object.
|
|
If the object is an immutable file, this will return the same value as
|
|
t=uri.
|
|
|
|
=== Debugging and Testing Features ===
|
|
|
|
These URLs are less likely to be helpful to the casual Tahoe user, and are
|
|
mainly intended for developers.
|
|
|
|
POST $URL?t=check
|
|
|
|
This triggers the FileChecker to determine the current "health" of the
|
|
given file or directory, by counting how many shares are available. The
|
|
page that is returned will display the results. This can be used as a "show
|
|
me detailed information about this file" page.
|
|
|
|
If a verify=true argument is provided, the node will perform a more
|
|
intensive check, downloading and verifying every single bit of every share.
|
|
|
|
If an output=JSON argument is provided, the response will be
|
|
machine-readable JSON instead of human-oriented HTML. The data is a
|
|
dictionary with the following keys:
|
|
|
|
storage-index: a base32-encoded string with the object's storage index,
|
|
or an empty string for LIT files
|
|
summary: a string, with a one-line summary of the stats of the file
|
|
results: a dictionary that describes the state of the file. For LIT files,
|
|
this dictionary has only the 'healthy' key, which will always be
|
|
True. For distributed files, this dictionary has the following
|
|
keys:
|
|
count-shares-good: the number of good shares that were found
|
|
count-shares-needed: 'k', the number of shares required for recovery
|
|
count-shares-expected: 'N', the number of total shares generated
|
|
count-good-share-hosts: the number of distinct storage servers with
|
|
good shares. If this number is less than
|
|
count-shares-good, then some shares are doubled
|
|
up, increasing the correlation of failures. This
|
|
indicates that one or more shares should be
|
|
moved to an otherwise unused server, if one is
|
|
available.
|
|
count-wrong-shares: for mutable files, the number of shares for
|
|
versions other than the 'best' one (highest
|
|
sequence number, highest roothash). These are
|
|
either old ...
|
|
count-recoverable-versions: for mutable files, the number of
|
|
recoverable versions of the file. For
|
|
a healthy file, this will equal 1.
|
|
count-unrecoverable-versions: for mutable files, the number of
|
|
unrecoverable versions of the file.
|
|
For a healthy file, this will be 0.
|
|
count-corrupt-shares: the number of shares with integrity failures
|
|
list-corrupt-shares: a list of "share locators", one for each share
|
|
that was found to be corrupt. Each share locator
|
|
is a list of (serverid, storage_index, sharenum).
|
|
needs-rebalancing: (bool) True if there are multiple shares on a single
|
|
storage server, indicating a reduction in reliability
|
|
that could be resolved by moving shares to new
|
|
servers.
|
|
servers-responding: list of base32-encoded storage server identifiers,
|
|
one for each server which responded to the share
|
|
query.
|
|
healthy: (bool) True if the file is completely healthy, False otherwise.
|
|
Healthy files have at least N good shares. Overlapping shares
|
|
(indicated by count-good-share-hosts < count-shares-good) do not
|
|
currently cause a file to be marked unhealthy. If there are at
|
|
least N good shares, then corrupt shares do not cause the file to
|
|
be marked unhealthy, although the corrupt shares will be listed
|
|
in the results (list-corrupt-shares) and should be manually
|
|
removed to avoid wasting time in subsequent downloads (as the
|
|
downloader rediscovers the corruption and uses alternate shares).
|
|
sharemap: dict mapping share identifier to list of serverids
|
|
(base32-encoded strings). This indicates which servers are
|
|
holding which shares. For immutable files, the shareid is
|
|
an integer (the share number, from 0 to N-1). For
|
|
mutable files, it is a string of the form
|
|
'seq%d-%s-sh%d', containing the sequence number, the
|
|
roothash, and the share number.
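
A client consuming the output=JSON form of t=check might reduce the response to a few facts, as in this Python sketch. The function name summarize_check and the shape of the returned summary are assumptions; the key names are the ones listed above. Since LIT files carry only the 'healthy' key, every other key is treated as optional.

```python
import json


def summarize_check(response_text):
    """Reduce a t=check&output=JSON response body to a small summary dict."""
    data = json.loads(response_text)
    results = data["results"]
    summary = {"healthy": results["healthy"], "doubled_up": False, "corrupt": []}
    good = results.get("count-shares-good")
    hosts = results.get("count-good-share-hosts")
    if good is not None and hosts is not None:
        # Fewer hosts than good shares means some shares share a server.
        summary["doubled_up"] = hosts < good
    # Each share locator is a (serverid, storage_index, sharenum) triple.
    summary["corrupt"] = [tuple(loc)
                          for loc in results.get("list-corrupt-shares", [])]
    return summary
```

The "doubled_up" flag corresponds to the count-good-share-hosts < count-shares-good condition, which does not by itself make a file unhealthy.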
|
|
|
|
POST $URL?t=start-deep-check (must add &ophandle=XYZ)
|
|
|
|
This initiates a recursive walk of all files and directories reachable from
|
|
the target, performing a check on each one just like t=check. The result
|
|
page will contain a summary of the results, including details on any
|
|
file/directory that was not fully healthy.
|
|
|
|
t=start-deep-check can only be invoked on a directory. An error (400
|
|
BAD_REQUEST) will be signalled if it is invoked on a file. The recursive
|
|
walker will deal with loops safely.
|
|
|
|
This accepts the same verify= argument as t=check.
|
|
|
|
Since this operation can take a long time (perhaps a second per object),
|
|
the ophandle= argument is required (see "Slow Operations, Progress, and
|
|
Cancelling" above). The response to this POST will be a redirect to the
|
|
corresponding /operations/$HANDLE page (with output=HTML or output=JSON to
|
|
match the output= argument given to the POST). The deep-check operation
|
|
will continue to run in the background, and the /operations page should be
|
|
used to find out when the operation is done.
|
|
|
|
Detailed check results for non-healthy files and directories will be
|
|
available under /operations/$HANDLE/$STORAGEINDEX, and the HTML status will
|
|
contain links to these detailed results.
|
|
|
|
The HTML /operations/$HANDLE page for incomplete operations will contain a
|
|
meta-refresh tag, set to 60 seconds, so that a browser which uses
|
|
deep-check will automatically poll until the operation has completed.
|
|
|
|
The JSON page (/operations/$HANDLE?output=JSON) will contain a
|
|
machine-readable JSON dictionary with the following keys:
|
|
|
|
finished: a boolean, True if the operation is complete, else False. Some
|
|
of the remaining keys may not be present until the operation
|
|
is complete.
|
|
root-storage-index: a base32-encoded string with the storage index of the
|
|
starting point of the deep-check operation
|
|
count-objects-checked: count of how many objects were checked. Note that
|
|
non-distributed objects (i.e. small immutable LIT
|
|
files) are not checked, since for these objects,
|
|
the data is contained entirely in the URI.
|
|
count-objects-healthy: how many of those objects were completely healthy
|
|
count-objects-unhealthy: how many were damaged in some way
|
|
count-corrupt-shares: how many shares were found to have corruption,
|
|
summed over all objects examined
|
|
list-corrupt-shares: a list of "share identifiers", one for each share
|
|
that was found to be corrupt. Each share identifier
|
|
is a list of (serverid, storage_index, sharenum).
|
|
list-unhealthy-files: a list of (pathname, check-results) tuples, for
|
|
each file that was not fully healthy. 'pathname' is
|
|
a list of strings (which can be joined by "/"
|
|
characters to turn it into a single string),
|
|
relative to the directory on which deep-check was
|
|
invoked. The 'check-results' field is the same as
|
|
that returned by t=check&output=JSON, described
|
|
above.
|
|
stats: a dictionary with the same keys as the t=start-deep-stats command
|
|
(described below)
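
Programs that start a deep-check need to poll the /operations/$HANDLE page until "finished" is true. The following Python sketch shows one way to structure that loop. The names wait_for_operation and fetch_status are hypothetical: fetch_status stands for a caller-supplied function that GETs /operations/$HANDLE?output=JSON and returns the decoded dictionary. The 60-second default merely mirrors the meta-refresh interval of the HTML page.

```python
import time


def wait_for_operation(fetch_status, handle, interval=60.0, sleep=time.sleep):
    """Poll an operation handle until its status reports finished.

    fetch_status(handle) must return the decoded JSON status dictionary.
    """
    while True:
        status = fetch_status(handle)
        # Keys other than "finished" may be missing until the operation is done.
        if status.get("finished"):
            return status
        sleep(interval)
```

Passing the sleep function as an argument keeps the loop easy to test and lets a caller substitute its own scheduling.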
|
|
|
|
POST $URL?t=check&repair=true
|
|
|
|
This performs a health check of the given file or directory, and if the
|
|
checker determines that the object is not healthy (some shares are missing
|
|
or corrupted), it will perform a "repair". During repair, any missing
|
|
shares will be regenerated and uploaded to new servers.
|
|
|
|
This accepts the same verify=true argument as t=check. When an output=JSON
|
|
argument is provided, the machine-readable JSON response will contain the
|
|
following keys:
|
|
|
|
storage-index: a base32-encoded string with the object's storage index,
|
|
or an empty string for LIT files
|
|
repair-attempted: (bool) True if repair was attempted
|
|
repair-successful: (bool) True if repair was attempted and the file was
|
|
fully healthy afterwards. False if no repair was
|
|
attempted, or if a repair attempt failed.
|
|
pre-repair-results: a dictionary that describes the state of the file
|
|
before any repair was performed. This contains exactly
|
|
the same keys as the 'results' value of the t=check
|
|
response, described above.
|
|
post-repair-results: a dictionary that describes the state of the file
|
|
after any repair was performed. If no repair was
|
|
performed, post-repair-results and pre-repair-results
|
|
will be the same. This contains exactly the same keys
|
|
as the 'results' value of the t=check response,
|
|
described above.
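
Note that repair-successful is False both when no repair was attempted and when an attempt failed, so a client has to look at both flags. A small Python sketch (describe_repair is a hypothetical helper name, and the returned strings are arbitrary labels):

```python
def describe_repair(data):
    """Classify a decoded t=check&repair=true&output=JSON response."""
    if not data["repair-attempted"]:
        # repair-successful is also False here, but nothing actually failed.
        return "no repair attempted"
    if data["repair-successful"]:
        return "repaired"
    return "repair failed"
```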
|
|
|
|
POST $URL?t=start-deep-check&repair=true (must add &ophandle=XYZ)
|
|
|
|
This triggers a recursive walk of all files and directories, performing a
|
|
t=check&repair=true on each one.
|
|
|
|
Like t=start-deep-check without the repair= argument, this can only be
|
|
invoked on a directory. An error (400 BAD_REQUEST) will be signalled if it
|
|
is invoked on a file. The recursive walker will deal with loops safely.
|
|
|
|
This accepts the same verify=true argument as t=start-deep-check. It uses
|
|
the same ophandle= mechanism as start-deep-check. When an output=JSON
|
|
argument is provided, the response will contain the following keys:
|
|
|
|
finished: (bool) True if the operation has completed, else False
|
|
root-storage-index: a base32-encoded string with the storage index of the
|
|
starting point of the deep-check operation
|
|
count-objects-checked: count of how many objects were checked
|
|
|
|
count-objects-healthy-pre-repair: how many of those objects were completely
|
|
healthy, before any repair
|
|
count-objects-unhealthy-pre-repair: how many were damaged in some way
|
|
count-objects-healthy-post-repair: how many of those objects were completely
|
|
healthy, after any repair
|
|
count-objects-unhealthy-post-repair: how many were damaged in some way
|
|
|
|
count-repairs-attempted: repairs were attempted on this many objects.
|
|
count-repairs-successful: how many repairs resulted in healthy objects
|
|
count-repairs-unsuccessful: how many repairs did not result in
|
|
completely healthy objects
|
|
count-corrupt-shares-pre-repair: how many shares were found to have
|
|
corruption, summed over all objects
|
|
examined, before any repair
|
|
count-corrupt-shares-post-repair: how many shares were found to have
|
|
corruption, summed over all objects
|
|
examined, after any repair
|
|
list-corrupt-shares: a list of "share identifiers", one for each share
|
|
that was found to be corrupt (before any repair).
|
|
Each share identifier is a list of (serverid,
|
|
storage_index, sharenum).
|
|
list-remaining-corrupt-shares: like list-corrupt-shares, but mutable shares
|
|
that were successfully repaired are not
|
|
included. These are shares that need
|
|
manual processing. Since immutable shares
|
|
cannot be modified by clients, all corruption
|
|
in immutable shares will be listed here.
|
|
list-unhealthy-files: a list of (pathname, check-results) tuples, for
|
|
each file that was not fully healthy. 'pathname' is
|
|
relative to the directory on which deep-check was
|
|
invoked. The 'check-results' field is the same as
|
|
that returned by t=check&repair=true&output=JSON,
|
|
described above.
|
|
stats: a dictionary with the same keys as the t=start-deep-stats command
|
|
(described below)
|
|
|
|
POST $DIRURL?t=start-manifest (must add &ophandle=XYZ)
|
|
|
|
This operation generates a "manifest" of the given directory tree, mostly
|
|
for debugging. This is a table of (path, filecap/dircap), for every object
|
|
reachable from the starting directory. The path will be slash-joined, and
|
|
the filecap/dircap will contain a link to the object in question. This page
|
|
gives immediate access to every object in the virtual filesystem subtree.
|
|
|
|
This operation uses the same ophandle= mechanism as deep-check. The
|
|
corresponding /operations/$HANDLE page has three different forms. The
|
|
default is output=HTML.
|
|
|
|
If output=text is added to the query args, the results will be a text/plain
|
|
list. The first line is special: it is either "finished: yes" or "finished:
|
|
no"; if the operation is not finished, you must periodically reload the
|
|
page until it completes. The rest of the results are a plaintext list, with
|
|
one file/dir per line, slash-separated, with the filecap/dircap separated
|
|
by a space.
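
The output=text form is simple enough to parse by hand. The Python sketch below assumes that each entry line holds the slash-joined path first and the cap last, separated by a single space; that ordering is an assumption, as the description above does not state it explicitly. Splitting at the last space keeps paths that themselves contain spaces intact. The name parse_text_manifest is hypothetical.

```python
def parse_text_manifest(text):
    """Parse the output=text manifest into (finished, [(path, cap), ...])."""
    lines = text.splitlines()
    first = lines[0].strip()
    if first not in ("finished: yes", "finished: no"):
        raise ValueError("unexpected first line: %r" % (first,))
    finished = (first == "finished: yes")
    entries = []
    for line in lines[1:]:
        if not line:
            continue
        # Assumption: the cap is the final space-separated field on the line.
        path, cap = line.rsplit(" ", 1)
        entries.append((path, cap))
    return finished, entries
```

If the first element of the result is False, the caller should reload the page later and parse it again.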
|
|
|
|
If output=JSON is added to the query args, then the results will be a
|
|
JSON-formatted dictionary with six keys. Note that because large directory
|
|
structures can result in very large JSON results, the full results will not
|
|
be available until the operation is complete (i.e. until output["finished"]
|
|
is True):
|
|
|
|
finished (bool): if False then you must reload the page until True
|
|
origin_si (base32 str): the storage index of the starting point
|
|
manifest: list of (path, cap) tuples, where path is a list of strings.
|
|
verifycaps: list of (printable) verify cap strings
|
|
storage-index: list of (base32) storage index strings
|
|
stats: a dictionary with the same keys as the t=start-deep-stats command
|
|
(described below)
|
|
|
|
POST $DIRURL?t=start-deep-size (must add &ophandle=XYZ)
|
|
|
|
This operation generates a number (in bytes) containing the sum of the
|
|
filesize of all directories and immutable files reachable from the given
|
|
directory. This is a rough lower bound of the total space consumed by this
|
|
subtree. It does not include space consumed by mutable files, nor does it
|
|
take expansion or encoding overhead into account. Later versions of the
|
|
code may improve this estimate upwards.
|
|
|
|
The /operations/$HANDLE status output consists of two lines of text:
|
|
|
|
finished: yes
|
|
size: 1234
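
A sketch of reading this two-line status in Python follows. The name parse_deep_size is hypothetical, and the handling of a missing "size:" line (returning None) is an assumption about what an unfinished operation might report.

```python
def parse_deep_size(text):
    """Parse the deep-size status text into (finished, size_in_bytes)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    finished = fields.get("finished") == "yes"
    # Assumption: the size line may be absent until the walk completes.
    size = int(fields["size"]) if "size" in fields else None
    return finished, size
```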
|
|
|
|
POST $DIRURL?t=start-deep-stats (must add &ophandle=XYZ)
|
|
|
|
This operation performs a recursive walk of all files and directories
|
|
reachable from the given directory, and generates a collection of
|
|
statistics about those objects.
|
|
|
|
The result (obtained from the /operations/$HANDLE page) is a
|
|
JSON-serialized dictionary with the following keys (note that some of these
|
|
keys may be missing until 'finished' is True):
|
|
|
|
finished: (bool) True if the operation has finished, else False
|
|
count-immutable-files: count of how many CHK files are in the set
|
|
count-mutable-files: same, for mutable files (does not include directories)
|
|
count-literal-files: same, for LIT files (data contained inside the URI)
|
|
count-files: sum of the above three
|
|
count-directories: count of directories
|
|
size-immutable-files: total bytes for all CHK files in the set, =deep-size
|
|
size-mutable-files (TODO): same, for current version of all mutable files
|
|
size-literal-files: same, for LIT files
|
|
size-directories: size of directories (includes size-literal-files)
|
|
size-files-histogram: list of (minsize, maxsize, count) buckets,
|
|
with a histogram of filesizes, 5dB/bucket,
|
|
for both literal and immutable files
|
|
largest-directory: number of children in the largest directory
|
|
largest-immutable-file: number of bytes in the largest CHK file
|
|
|
|
size-mutable-files is not implemented, because it would require extra
|
|
queries to each mutable file to get their size. This may be implemented in
|
|
the future.
|
|
|
|
Assuming no sharing, the basic space consumed by a single root directory is
|
|
the sum of size-immutable-files, size-mutable-files, and size-directories.
|
|
The actual disk space used by the shares is larger, because of the
|
|
following sources of overhead:
|
|
|
|
integrity data
|
|
expansion due to erasure coding
|
|
share management data (leases)
|
|
backend (ext3) minimum block size
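
The "basic space consumed" figure can be computed directly from a decoded deep-stats dictionary. In this Python sketch, basic_space_consumed is a hypothetical helper; because size-mutable-files is not implemented, a missing key is counted as zero, which makes the result a lower bound.

```python
def basic_space_consumed(stats):
    """Sum the three size keys that make up the basic space estimate."""
    keys = ("size-immutable-files", "size-mutable-files", "size-directories")
    # Missing keys (e.g. the unimplemented size-mutable-files) count as 0.
    return sum(stats.get(key, 0) for key in keys)
```

size-literal-files is deliberately left out of the sum, since size-directories already includes it. The overheads listed above (integrity data, erasure-coding expansion, leases, and minimum block size) are not modelled here.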
|
|
|
|
POST $URL?t=stream-manifest
|
|
|
|
This operation performs a recursive walk of all files and directories
|
|
reachable from the given starting point. For each such unique object
|
|
(duplicates are skipped), a single line of JSON is emitted to the HTTP
|
|
response channel. When the walk is complete, a final line of JSON is emitted
|
|
which contains the accumulated file-size/count "deep-stats" data.
|
|
|
|
A CLI tool can split the response stream on newlines into "response units",
|
|
and parse each response unit as JSON. Each such parsed unit will be a
|
|
dictionary, and will contain at least the "type" key: a string, one of
|
|
"file", "directory", or "stats".
|
|
|
|
For all units that have a type of "file" or "directory", the dictionary will
|
|
contain the following keys:
|
|
|
|
"path": a list of strings, with the path that is traversed to reach the
|
|
object
|
|
"cap": a writecap for the file or directory, if available, else a readcap
|
|
"verifycap": a verifycap for the file or directory
|
|
"repaircap": the weakest cap which can still be used to repair the object
|
|
"storage-index": a base32 storage index for the object
|
|
|
|
Note that non-distributed files (i.e. LIT files) will have values of None
|
|
for verifycap, repaircap, and storage-index, since these files can neither
|
|
be verified nor repaired, and are not stored on the storage servers.
|
|
|
|
The last unit in the stream will have a type of "stats", and will contain
|
|
the keys described in the "start-deep-stats" operation, above.
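
The "split on newlines, parse each unit as JSON" approach might look like the following Python sketch. The name parse_stream_manifest is hypothetical; the argument is any iterable of lines, such as a file-like HTTP response body.

```python
import json


def parse_stream_manifest(lines):
    """Split a t=stream-manifest response into (objects, stats)."""
    objects = []
    stats = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        unit = json.loads(line)  # each response unit is one JSON dictionary
        if unit["type"] == "stats":
            stats = unit         # expected to be the last unit in the stream
        else:
            objects.append(unit)  # a "file" or "directory" unit
    return objects, stats
```

If stats is still None after the stream ends, the walk did not run to completion and the object list should be treated as partial.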
|
|
|
|
|
|
== Other Useful Pages ==
|
|
|
|
The portion of the web namespace that begins with "/uri" (and "/named") is
|
|
dedicated to giving users (both humans and programs) access to the Tahoe
|
|
virtual filesystem. The rest of the namespace provides status information
|
|
about the state of the Tahoe node.
|
|
|
|
GET / (the root page)
|
|
|
|
This is the "Welcome Page", and contains a few distinct sections:
|
|
|
|
Node information: library versions, local nodeid, services being provided.
|
|
|
|
Filesystem Access Forms: create a new directory, view a file/directory by
|
|
URI, upload a file (unlinked), download a file by
|
|
URI.
|
|
|
|
Grid Status: introducer information, helper information, connected storage
|
|
servers.
|
|
|
|
GET /status/
|
|
|
|
This page lists all active uploads and downloads, and contains a short list
|
|
of recent upload/download operations. Each operation has a link to a page
|
|
that describes file sizes, servers that were involved, and the time consumed
|
|
in each phase of the operation.
|
|
|
|
A GET of /status/?t=json will contain a machine-readable subset of the same
|
|
data. It returns a JSON-encoded dictionary. The only key defined at this
|
|
time is "active", with a value that is a list of operation dictionaries, one
|
|
for each active operation. Once an operation is completed, it will no longer
|
|
appear in data["active"].
|
|
|
|
Each op-dict contains a "type" key, one of "upload", "download",
|
|
"mapupdate", "publish", or "retrieve" (the first two are for immutable
|
|
files, while the latter three are for mutable files and directories).
|
|
|
|
The "upload" op-dict will contain the following keys:
|
|
|
|
type (string): "upload"
|
|
storage-index-string (string): a base32-encoded storage index
|
|
total-size (int): total size of the file
|
|
status (string): current status of the operation
|
|
progress-hash (float): 1.0 when the file has been hashed
|
|
progress-ciphertext (float): 1.0 when the file has been encrypted.
|
|
progress-encode-push (float): 1.0 when the file has been encoded and
|
|
pushed to the storage servers. For helper
|
|
uploads, the ciphertext value climbs to 1.0
|
|
first, then encoding starts. For unassisted
|
|
uploads, ciphertext and encode-push progress
|
|
will climb at the same pace.
|
|
|
|
The "download" op-dict will contain the following keys:
|
|
|
|
type (string): "download"
|
|
storage-index-string (string): a base32-encoded storage index
|
|
total-size (int): total size of the file
|
|
status (string): current status of the operation
|
|
progress (float): 1.0 when the file has been fully downloaded
|
|
|
|
Front-ends which want to report progress information are advised to simply
|
|
average together all the progress-* indicators. A slightly more accurate
|
|
value can be found by ignoring the progress-hash value (since the current
|
|
implementation hashes synchronously, so clients will probably never see
|
|
progress-hash!=1.0).
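
The averaging advice above can be written as a short Python function. The name upload_progress is hypothetical, the function is meant for "upload" op-dicts only, and returning 0.0 when no progress-* keys are present is an assumption.

```python
def upload_progress(op, ignore_hash=False):
    """Average the progress-* indicators of an "upload" op-dict."""
    keys = [k for k in op if k.startswith("progress-")]
    if ignore_hash:
        # Hashing is synchronous, so progress-hash is probably always 1.0.
        keys = [k for k in keys if k != "progress-hash"]
    if not keys:
        return 0.0  # assumption: report no progress if nothing is known
    return sum(op[k] for k in keys) / len(keys)
```

With ignore_hash=True the result reflects only ciphertext and encode-push progress, which is the slightly more accurate value mentioned above.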
|
|
|
|
GET /provisioning/
|
|
|
|
This page provides a basic tool to predict the likely storage and bandwidth
|
|
requirements of a large Tahoe grid. It provides forms to input things like
|
|
total number of users, number of files per user, average file size, number
|
|
of servers, expansion ratio, hard drive failure rate, etc. It then provides
|
|
numbers like how many disks per server will be needed, how many read
|
|
operations per second should be expected, and the likely MTBF for files in
|
|
the grid. This information is very preliminary, and the model upon which it
|
|
is based still needs a lot of work.
|
|
|
|
GET /helper_status/
|
|
|
|
If the node is running a helper (i.e. if [helper]enabled is set to True in
|
|
tahoe.cfg), then this page will provide a list of all the helper operations
|
|
currently in progress. If "?t=json" is added to the URL, it will return a
|
|
JSON-formatted list of helper statistics, which can then be used to produce
|
|
graphs to indicate how busy the helper is.
|
|
|
|
GET /statistics/
|
|
|
|
This page provides "node statistics", which are collected from a variety of
|
|
sources.
|
|
|
|
load_monitor: every second, the node schedules a timer for one second in
|
|
the future, then measures how late the subsequent callback
|
|
is. The "load_average" is this tardiness, measured in
|
|
seconds, averaged over the last minute. It is an indication
|
|
of a busy node, one which is doing more work than can be
|
|
completed in a timely fashion. The "max_load" value is the
|
|
highest value that has been seen in the last 60 seconds.
|
|
|
|
cpu_monitor: every minute, the node uses time.clock() to measure how much
|
|
CPU time it has used, and it uses this value to produce
|
|
1min/5min/15min moving averages. These values range from 0%
|
|
(0.0) to 100% (1.0), and indicate what fraction of the CPU
|
|
has been used by the Tahoe node. Not all operating systems
|
|
provide meaningful data to time.clock(): they may report 100%
|
|
CPU usage at all times.
|
|
|
|
uploader: this counts how many immutable files (and bytes) have been
|
|
uploaded since the node was started
|
|
|
|
downloader: this counts how many immutable files have been downloaded
|
|
since the node was started
|
|
|
|
publishes: this counts how many mutable files (including directories) have
|
|
been modified since the node was started
|
|
|
|
retrieves: this counts how many mutable files (including directories) have
|
|
been read since the node was started
|
|
|
|
There are other statistics that are tracked by the node. The "raw stats"
|
|
section shows a formatted dump of all of them.
|
|
|
|
By adding "?t=json" to the URL, the node will return a JSON-formatted
|
|
dictionary of stats values, which can be used by other tools to produce
|
|
graphs of node behavior. The misc/munin/ directory in the source
|
|
distribution provides some tools to produce these graphs.
|
|
|
|
GET / (introducer status)
|
|
|
|
For Introducer nodes, the welcome page displays information about both
|
|
clients and servers which are connected to the introducer. Servers make
|
|
"service announcements", and these are listed in a table. Clients will
|
|
subscribe to hear about service announcements, and these subscriptions are
|
|
listed in a separate table. Both tables contain information about what
|
|
version of Tahoe is being run by the remote node, their advertised and
|
|
outbound IP addresses, their nodeid and nickname, and how long they have
|
|
been available.
|
|
|
|
By adding "?t=json" to the URL, the node will return a JSON-formatted
|
|
dictionary of stats values, which can be used to produce graphs of connected
|
|
clients over time. This dictionary has the following keys:
|
|
|
|
["subscription_summary"] : a dictionary mapping service name (like
|
|
"storage") to an integer with the number of
|
|
clients that have subscribed to hear about that
|
|
service
|
|
["announcement_summary"] : a dictionary mapping service name to an integer
|
|
with the number of servers which are announcing
|
|
that service
|
|
["announcement_distinct_hosts"] : a dictionary mapping service name to an
|
|
integer which represents the number of
|
|
distinct hosts that are providing that
|
|
service. If two servers have announced
|
|
FURLs which use the same hostnames (but
|
|
different ports and tubids), they are
|
|
considered to be on the same host.
|
|
|
|
|
|
== Static Files in /public_html ==
|
|
|
|
The wapi server will take any request for a URL that starts with /static
|
|
and serve it from a configurable directory which defaults to
|
|
$BASEDIR/public_html . This is configured by setting the "[node]web.static"
|
|
value in $BASEDIR/tahoe.cfg . If this is left at the default value of
|
|
"public_html", then http://localhost:3456/static/subdir/foo.html will be
|
|
served with the contents of the file $BASEDIR/public_html/subdir/foo.html .
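
The URL-to-file mapping described above can be sketched in Python as follows. This is an illustration of the mapping only, not the server's implementation: static_file_path is a hypothetical name, and refusing paths that climb out of the static directory is an assumption about sensible behavior rather than something this document specifies.

```python
import posixpath


def static_file_path(url_path, basedir, static_dir="public_html"):
    """Map a /static/... URL path to a file path under $BASEDIR."""
    prefix = "/static/"
    if not url_path.startswith(prefix):
        raise ValueError("not a /static URL: %r" % (url_path,))
    relative = posixpath.normpath(url_path[len(prefix):])
    # Assumption: refuse paths that would escape the static directory.
    if relative == ".." or relative.startswith("../") or relative.startswith("/"):
        raise ValueError("path escapes the static directory")
    return posixpath.join(basedir, static_dir, relative)
```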
|
|
|
|
This can be useful to serve a javascript application which provides a
|
|
prettier front-end to the rest of the Tahoe wapi.
|
|
|
|
|
|
== Safety and security issues -- names vs. URIs ==
|
|
|
|
Summary: use explicit file- and dir- caps whenever possible, to reduce the
|
|
potential for surprises when the virtual drive is changed while you aren't
|
|
looking.
|
|
|
|
The vdrive provides a mutable filesystem, but the ways that the filesystem
|
|
can change are limited. The only thing that can change is that the mapping
|
|
from child names to child objects that each directory contains can be changed
|
|
by adding a new child name pointing to an object, removing an existing child
|
|
name, or changing an existing child name to point to a different object.
|
|
|
|
Obviously if you query Tahoe for information about the filesystem and then
|
|
act upon the filesystem (such as by getting a listing of the contents of a
|
|
directory and then adding a file to the directory), then the filesystem might
|
|
have been changed after you queried it and before you acted upon it.
|
|
However, if you use the URI instead of the pathname of an object when you act
upon the object, then the only change that can happen is that, if the object
is a directory, the set of child names it has might be different. If, on the
other hand, you act upon the object using its pathname, then a different
object might be in that place, which can result in more kinds of surprises.
|
|
|
|
For example, suppose you are writing code which recursively downloads the
|
|
contents of a directory. The first thing your code does is fetch the listing
|
|
of the contents of the directory. For each child that it fetched, if that
|
|
child is a file then it downloads the file, and if that child is a directory
|
|
then it recurses into that directory. Now, if the download and the recurse
|
|
actions are performed using the child's name, then the results might be
|
|
wrong, because for example a child name that pointed to a sub-directory when
|
|
you listed the directory might have been changed to point to a file (in which
|
|
case your attempt to recurse into it would result in an error and the file
|
|
would be skipped), or a child name that pointed to a file when you listed the
|
|
directory might now point to a sub-directory (in which case your attempt to
|
|
download the child would result in a file containing HTML text describing the
|
|
sub-directory!).
|
|
|
|
If your recursive algorithm uses the URI of the child instead of the name of
|
|
the child, then those kinds of mistakes just can't happen. Note that both the
|
|
child's name and the child's URI are included in the results of listing the
|
|
parent directory, so it isn't any harder to use the URI for this purpose.
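
A recursive download that follows caps rather than names might be structured like this Python sketch. All names here are hypothetical: list_children(dircap) stands for a caller-supplied function that fetches a directory listing and yields (name, kind, cap) tuples, and download_file(cap, path) stands for whatever fetches one file by its cap. The sketch also remembers which directory caps it has visited, so a directory reachable twice under the same cap is walked only once.

```python
def walk_by_cap(dircap, list_children, download_file, path=(), seen=None):
    """Recursively download a tree, addressing every object by its cap."""
    seen = set() if seen is None else seen
    if dircap in seen:
        return  # already walked this directory (also guards against loops)
    seen.add(dircap)
    for name, kind, cap in list_children(dircap):
        child_path = path + (name,)
        if kind == "directory":
            # Recurse using the cap from the listing, never the child's name.
            walk_by_cap(cap, list_children, download_file, child_path, seen)
        else:
            download_file(cap, child_path)
```

Because every request names "this particular object", a child that is re-pointed between the listing and the download cannot turn a file fetch into a directory fetch or vice versa.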
|
|
|
|
In general, use names if you want "whatever object (whether file or
directory) is found by following this name (or sequence of names) when my
request reaches the server". Use URIs if you want "this particular object".

== Concurrency Issues ==

Tahoe uses both mutable and immutable files. Mutable files can be created
explicitly by doing an upload with ?mutable=true added, or implicitly by
creating a new directory (since a directory is just a special way to
interpret a given mutable file).

Mutable files suffer from the same consistency-vs-availability tradeoff that
all distributed data storage systems face. It is not possible to
simultaneously achieve perfect consistency and perfect availability in the
face of network partitions (servers being unreachable or faulty).
|
|
|
|
Tahoe tries to achieve a reasonable compromise, but there is a basic rule in
place, known as the Prime Coordination Directive: "Don't Do That". What this
means is that if write-access to a mutable file is available to several
parties, then those parties are responsible for coordinating their activities
to avoid multiple simultaneous updates. This could be achieved by having
these parties talk to each other and use some sort of locking mechanism, or
by serializing all changes through a single writer.
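As a rough illustration of "serializing all changes through a single writer",
the sketch below funnels every update to one shared value through a single
lock. It is not part of Tahoe: the SingleWriter class and add_child helper
are hypothetical names, and a lock like this only coordinates threads inside
one process. Parties on different machines would need their own agreed-upon
mechanism.

```python
import threading


class SingleWriter:
    """Apply updates to one shared value strictly one at a time."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._value = initial

    def update(self, modify):
        # Read-modify-write happens entirely under the lock, so two
        # updates can never interleave and overwrite each other.
        with self._lock:
            self._value = modify(self._value)
            return self._value

    def read(self):
        with self._lock:
            return self._value


def add_child(name):
    """Return a modifier that adds one child name to a frozenset."""
    return lambda children: children | {name}


writer = SingleWriter(frozenset())
threads = [
    threading.Thread(target=writer.update, args=(add_child("file%d" % i),))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
final = writer.read()
```

Every one of the eight additions survives, because each update sees the
result of the previous one rather than a stale copy.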
|
|
|
|
The consequences of performing uncoordinated writes can vary. Some of the
writers may lose their changes, as somebody else wins the race condition. In
many cases the file will be left in an "unhealthy" state, meaning that there
are not as many redundant shares as we would like (reducing the reliability
of the file against server failures). In the worst case, the file can be left
in such an unhealthy state that no version is recoverable, even the old ones.
It is this small possibility of data loss that prompts us to issue the Prime
Coordination Directive.

Tahoe nodes implement internal serialization to make sure that a single Tahoe
node cannot conflict with itself. For example, it is safe to issue two
directory modification requests to a single Tahoe node's web-API server at
the same time, because the Tahoe node will internally delay one of them until
after the other has finished being applied. (This feature was introduced in
Tahoe-1.1; back in Tahoe-1.0, web clients were responsible for serializing
their web requests themselves.)

For more details, please see the "Consistency vs Availability" and "The Prime
Coordination Directive" sections of mutable.txt, in the same directory as
this file.
|
|
|
|
|
|
[1]: URLs and HTTP and UTF-8, Oh My

HTTP does not provide a mechanism to specify the character set used to
encode non-ASCII names in URLs (RFC2396#2.1). We prefer the convention that
the filename= argument shall be a URL-encoded UTF-8 encoded unicode object.
For example, suppose we want to provoke the server into using a filename of
"f i a n c e-acute e" (i.e. F I A N C U+00E9 E). The UTF-8 encoding of this
is 0x66 0x69 0x61 0x6e 0x63 0xc3 0xa9 0x65 (or "fianc\xC3\xA9e", as Python's
repr() function would show). To encode this into a URL, the non-ASCII
bytes must be escaped with the urlencode '%XX' mechanism, giving us
"fianc%C3%A9e". Thus, the first line of the HTTP request will be "GET
/uri/CAP...?save=true&filename=fianc%C3%A9e HTTP/1.1". Not all browsers
provide this: IE7 uses the Latin-1 encoding, which is fianc%E9e.
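The encoding steps above can be reproduced with Python's standard library.
This is a small sketch for checking the byte values, not code taken from
Tahoe; the "CAP..." in the request line is left elided exactly as in the
text.

```python
from urllib.parse import quote

filename = "fianc\u00e9e"                 # F I A N C U+00E9 E

# Step 1: encode the unicode name as UTF-8 bytes.
utf8_bytes = filename.encode("utf-8")

# Step 2: escape the bytes with the '%XX' mechanism.
utf8_arg = quote(utf8_bytes)

# For comparison, the Latin-1 form that the text attributes to IE7.
latin1_arg = quote(filename.encode("latin-1"))

request_line = "GET /uri/CAP...?save=true&filename=%s HTTP/1.1" % utf8_arg
```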
|
|
|
|
The response header will need to indicate a non-ASCII filename. The actual
mechanism to do this is not clear. For ASCII filenames, the response header
would look like:

 Content-Disposition: attachment; filename="english.txt"

If Tahoe were to enforce the UTF-8 convention, it would need to decode the
URL argument into a unicode string, and then encode it back into a sequence
of bytes when creating the response header. One possibility would be to use
unencoded UTF-8. Developers suggest that IE7 might accept this:

 #1: Content-Disposition: attachment; filename="fianc\xC3\xA9e"
  (note, the last four bytes of that line, not including the newline, are
  0xC3 0xA9 0x65 0x22)

RFC2231#4 (dated 1997) suggests that the following might work, and some
developers (http://markmail.org/message/dsjyokgl7hv64ig3) have reported that
it is supported by Firefox (but not IE7):

 #2: Content-Disposition: attachment; filename*=utf-8''fianc%C3%A9e
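For reference, the RFC2231-style form (#2) can be generated as in the sketch
below. The rfc2231_disposition function is a hypothetical helper written for
this illustration; it is not something Tahoe does, and whether a given
browser accepts the result is subject to the caveats in the text.

```python
from urllib.parse import quote


def rfc2231_disposition(filename):
    """Build a header line using the filename*=charset''value form."""
    # Percent-encode the UTF-8 bytes of the name; safe="" escapes "/" too.
    encoded = quote(filename, safe="", encoding="utf-8")
    return "Content-Disposition: attachment; filename*=utf-8''" + encoded


header = rfc2231_disposition("fianc\u00e9e")
```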
|
|
|
|
My reading of RFC2616#19.5.1 (which defines Content-Disposition) says that
the filename= parameter is defined to be wrapped in quotes (presumably to
allow spaces without breaking the parsing of subsequent parameters), which
would give us:

 #3: Content-Disposition: attachment; filename*=utf-8''"fianc%C3%A9e"

However this is contrary to the examples in the email thread listed above.

Developers report that IE7 (when it is configured for UTF-8 URL encoding,
which is not the default in Asian countries), will accept:

 #4: Content-Disposition: attachment; filename=fianc%C3%A9e

However, for maximum compatibility, Tahoe simply copies bytes from the URL
into the response header, rather than enforcing the UTF-8 convention. This
means it does not try to decode the filename from the URL argument, nor does
it encode the filename into the response header.
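The byte-copying behaviour can be pictured with the sketch below. It is an
assumption-laden illustration, not Tahoe's actual code: the
passthrough_disposition function is hypothetical, and wrapping the bytes in
double quotes (as in form #1) is assumed here rather than confirmed by this
document.

```python
from urllib.parse import unquote_to_bytes


def passthrough_disposition(url_filename_arg):
    """Copy the filename bytes from the URL argument into the header.

    Only the '%XX' escapes are undone. The bytes are never decoded into
    unicode, so whatever charset the browser used is passed straight
    through.
    """
    raw = unquote_to_bytes(url_filename_arg)
    return b'Content-Disposition: attachment; filename="' + raw + b'"'


utf8_header = passthrough_disposition("fianc%C3%A9e")    # UTF-8 browser
latin1_header = passthrough_disposition("fianc%E9e")     # Latin-1 browser
```

With a UTF-8 request the header ends in the bytes 0xC3 0xA9 0x65 0x22, the
same tail noted for form #1. With a Latin-1 request the single byte 0xE9 is
echoed back unchanged.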
|