mirror of
https://github.com/tahoe-lafs/tahoe-lafs.git
synced 2025-01-25 21:59:19 +00:00
1013 lines
46 KiB
Plaintext
= The Tahoe REST-ful Web API =
|
|
|
|
1. Enabling the web-API port
|
|
2. Basic Concepts: GET, PUT, DELETE, POST
|
|
3. URLs, Machine-Oriented Interfaces
|
|
4. Browser Operations: Human-Oriented Interfaces
|
|
5. Welcome / Debug / Status pages
|
|
6. Safety and security issues -- names vs. URIs
|
|
7. Concurrency Issues
|
|
|
|
|
|
== Enabling the web-API port ==
|
|
|
|
Every Tahoe node is capable of running a built-in HTTP server. To enable
|
|
this, just write a port number into a file named "webport" in the node's base
|
|
directory. For example, writing "8123" into $NODEDIR/webport will cause the
|
|
node to run a webserver on port 8123.
|
|
|
|
This string is actually a Twisted "strports" specification, meaning you can
|
|
get more control over the interface to which the server binds by supplying
|
|
additional arguments. For more details, see the documentation on
|
|
twisted.application.strports:
|
|
http://twistedmatrix.com/documents/current/api/twisted.application.strports.html
|
|
|
|
Writing "tcp:8123:interface=127.0.0.1" into $NODEDIR/webport does the same
|
|
but binds to the loopback interface, ensuring that only the programs on the
|
|
local host can connect. Using
|
|
"ssl:8123:privateKey=mykey.pem:certKey=cert.pem" runs an SSL server.
|
|
|
|
This webport can be set when the node is created by passing a --webport
|
|
option to the 'tahoe create-client' command. By default, the node listens on
|
|
port 8123, on the loopback (127.0.0.1) interface.
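As a rough sketch of the step described above, the following Python helper writes a strports string into the node's webport file. The name `write_webport` and the temporary directory standing in for $NODEDIR are illustrative assumptions, not part of Tahoe.

```python
import os
import tempfile


def write_webport(nodedir, spec="tcp:8123:interface=127.0.0.1"):
    """Write a Twisted strports string into NODEDIR/webport.

    Hypothetical helper: Tahoe itself only requires that the file exist.
    """
    path = os.path.join(nodedir, "webport")
    with open(path, "w") as f:
        f.write(spec)
    return path


if __name__ == "__main__":
    # A throwaway directory stands in for the real $NODEDIR.
    with tempfile.TemporaryDirectory() as nodedir:
        with open(write_webport(nodedir)) as f:
            print(f.read())
```

Passing "8123" as the spec gives the short form; the default shown here binds to the loopback interface only.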
|
|
|
|
== Basic Concepts ==
|
|
|
|
As described in architecture.txt, each file and directory in a Tahoe virtual
|
|
filesystem is referenced by an identifier that combines the designation of
|
|
the object with the authority to do something with it (such as read or modify
|
|
the contents). This identifier is called a "read-cap" or "write-cap",
|
|
depending upon whether it enables read-only or read-write access. These
|
|
"caps" are also referred to as URIs.
|
|
|
|
The Tahoe web-based API is "REST-ful", meaning it implements the concepts of
|
|
"REpresentational State Transfer": the original scheme by which the World
|
|
Wide Web was intended to work. Each object (file or directory) is referenced
|
|
by a URL that includes the read- or write- cap. HTTP methods (GET, PUT, and
|
|
DELETE) are used to manipulate these objects. You can think of the URL as a
|
|
noun, and the method as a verb.
|
|
|
|
In REST, the GET method is used to retrieve information about an object, or
|
|
to retrieve some representation of the object itself. When the object is a
|
|
file, the basic GET method will simply return the contents of that file.
|
|
Other variations (generally implemented by adding query parameters to the
|
|
URL) will return information about the object, such as metadata. GET
|
|
operations are required to have no side-effects.
|
|
|
|
PUT is used to upload new objects into the filesystem, or to replace an
|
|
existing object. DELETE is used to delete objects from the filesystem. Both
|
|
PUT and DELETE are required to be idempotent: performing the same operation
|
|
multiple times must have the same side-effects as only performing it once.
|
|
|
|
POST is used for more complicated actions that cannot be expressed as a GET,
|
|
PUT, or DELETE. POST operations can be thought of as a method call: sending
|
|
some message to the object referenced by the URL. In Tahoe, POST is also used
|
|
for operations that must be triggered by an HTML form (including upload and
|
|
delete), because otherwise a regular web browser has no way to accomplish
|
|
these tasks.
|
|
|
|
Tahoe's web API is designed for two different consumers. The first is a
|
|
program that needs to manipulate the virtual file system. Such programs are
|
|
expected to use the RESTful interface described above. The second is a human
|
|
using a standard web browser to work with the filesystem. This user is given
|
|
a series of HTML pages with links to download files, and forms that use POST
|
|
actions to upload, rename, and delete files.
|
|
|
|
== URLs ==
|
|
|
|
Tahoe uses a variety of read- and write- caps to identify files and
|
|
directories. The most common of these is the "immutable file read-cap", which
|
|
is used for most uploaded files. These read-caps look like the following:
|
|
|
|
URI:CHK:ime6pvkaxuetdfah2p2f35pe54:4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a:3:10:202
|
|
|
|
The next most common is a "directory write-cap", which provides both read and
|
|
write access to a directory, and looks like this:
|
|
|
|
URI:DIR2:djrdkfawoqihigoett4g6auz6a:jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq
|
|
|
|
There are also "directory read-caps", which start with "URI:DIR2-RO:", and
|
|
give read-only access to a directory. Finally there are also mutable file
|
|
read- and write- caps, which start with "URI:SSK", and give access to mutable
|
|
files.
|
|
|
|
(later versions of Tahoe will make these strings shorter, and will remove the
|
|
unfortunate colons, which must be escaped when these caps are embedded in
|
|
URLs).
|
|
|
|
To refer to any Tahoe object through the web API, you simply need to combine
|
|
a prefix (which indicates the HTTP server to use) with the cap (which
|
|
indicates which object inside that server to access). Since the default Tahoe
|
|
webport is 8123, the most common prefix is one that will use a local node
|
|
listening on this port:
|
|
|
|
http://127.0.0.1:8123/uri/ + $CAP
|
|
|
|
So, to access the directory named above (which happens to be the
|
|
publicly-writable sample directory on the Tahoe test grid, described at
|
|
http://allmydata.org/trac/tahoe/wiki/TestGrid), the URL would be:
|
|
|
|
http://127.0.0.1:8123/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/
|
|
|
|
(note that the colons in the directory-cap are url-encoded into "%3A"
|
|
sequences).
|
|
|
|
Likewise, to access the file named above, use:
|
|
|
|
http://127.0.0.1:8123/uri/URI%3ACHK%3Aime6pvkaxuetdfah2p2f35pe54%3A4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a%3A3%3A10%3A202
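The prefix-plus-cap rule can be sketched in a few lines of Python. The helper name `cap_to_url` is our own, and the prefix assumes a local node on the default webport; the only real work is percent-encoding the colons.

```python
from urllib.parse import quote

# Assumed prefix: a local node listening on the default webport.
PREFIX = "http://127.0.0.1:8123/uri/"


def cap_to_url(cap, prefix=PREFIX):
    """Percent-encode a cap (each colon becomes %3A) and append it to the prefix."""
    return prefix + quote(cap, safe="")


if __name__ == "__main__":
    print(cap_to_url(
        "URI:CHK:ime6pvkaxuetdfah2p2f35pe54:"
        "4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a:3:10:202"
    ))
```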
|
|
|
|
In the rest of this document, we'll use "$DIRCAP" as shorthand for a read-cap
|
|
or write-cap that refers to a directory, and "$FILECAP" to abbreviate a cap
|
|
that refers to a file (whether mutable or immutable). So those URLs above can
|
|
be abbreviated as:
|
|
|
|
http://127.0.0.1:8123/uri/$DIRCAP/
|
|
http://127.0.0.1:8123/uri/$FILECAP
|
|
|
|
The operation summaries below will abbreviate these further, by eliding the
|
|
server prefix. They will be displayed like this:
|
|
|
|
/uri/$DIRCAP/
|
|
/uri/$FILECAP
|
|
|
|
|
|
=== Child Lookup ===
|
|
|
|
Tahoe directories contain named children, just like directories in a regular
|
|
local filesystem. These children can be either files or subdirectories.
|
|
|
|
If you have a Tahoe URL that refers to a directory, and want to reference a
|
|
named child inside it, just append the child name to the URL. For example, if
|
|
our sample directory contains a file named "welcome.txt", we can refer to
|
|
that file with:
|
|
|
|
http://127.0.0.1:8123/uri/$DIRCAP/welcome.txt
|
|
|
|
(or http://127.0.0.1:8123/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/welcome.txt)
|
|
|
|
Multiple levels of subdirectories can be handled this way:
|
|
|
|
http://127.0.0.1:8123/uri/$DIRCAP/tahoe-source/docs/webapi.txt
|
|
|
|
In this document, when we need to refer to a URL that references a file using
|
|
this child-of-some-directory format, we'll use the following string:
|
|
|
|
/uri/$DIRCAP/[SUBDIRS../]FILENAME
|
|
|
|
The "[SUBDIRS../]" part means that there are zero or more (optional)
|
|
subdirectory names in the middle of the URL. The "FILENAME" at the end means
|
|
that this whole URL refers to a file of some sort, rather than to a
|
|
directory.
|
|
|
|
When we need to refer specifically to a directory in this way, we'll write:
|
|
|
|
/uri/$DIRCAP/[SUBDIRS../]SUBDIR
|
|
|
|
|
|
Note that all components of pathnames in URLs are required to be UTF-8
|
|
encoded, so "resume.doc" (with an acute accent on both E's) would be accessed
|
|
with:
|
|
|
|
http://127.0.0.1:8123/uri/$DIRCAP/r%C3%A9sum%C3%A9.doc
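A minimal Python sketch of child lookup with UTF-8 path components follows. The helper `child_url` and the placeholder cap in the usage line are assumptions made for illustration; the encoding itself is the standard percent-escaping of UTF-8 bytes.

```python
from urllib.parse import quote

PREFIX = "http://127.0.0.1:8123/uri/"  # assumed local node, default webport


def child_url(dircap, path_components, prefix=PREFIX):
    """Build a URL for a child reached through a directory cap.

    Each component is UTF-8 encoded and percent-escaped separately,
    then the pieces are joined with "/".
    """
    parts = [quote(dircap, safe="")]
    parts += [quote(name, safe="") for name in path_components]
    return prefix + "/".join(parts)


if __name__ == "__main__":
    # "URI:DIR2:placeholder" is not a real cap.
    print(child_url("URI:DIR2:placeholder", ["r\u00e9sum\u00e9.doc"]))
```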
|
|
|
|
Also note that the filenames inside upload POST forms are interpreted using
|
|
whatever character set was provided in the conventional '_charset' field, and
|
|
default to UTF-8 if not otherwise specified. The JSON representation of each
|
|
directory contains native unicode strings. Tahoe directories are specified to
|
|
contain unicode filenames, and cannot contain binary strings that are not
|
|
representable as such.
|
|
|
|
All Tahoe operations that refer to existing files or directories must include
|
|
a suitable read- or write- cap in the URL: the webapi server won't add one
|
|
for you. If you don't know the cap, you can't access the file. This allows
|
|
the security properties of Tahoe caps to be extended across the webapi
|
|
interface.
|
|
|
|
== Programmatic Operations ==
|
|
|
|
Now that we know how to build URLs that refer to files and directories in a
|
|
Tahoe virtual filesystem, what sorts of operations can we do with those URLs?
|
|
This section contains a catalog of GET, PUT, DELETE, and POST operations that
|
|
can be performed on these URLs. This set of operations is aimed at programs
|
|
that use HTTP to communicate with a Tahoe node. The next section describes
|
|
operations that are intended for web browsers.
|
|
|
|
=== Reading A File ===
|
|
|
|
GET /uri/$FILECAP
|
|
GET /uri/$DIRCAP/[SUBDIRS../]FILENAME
|
|
|
|
This will retrieve the contents of the given file. The HTTP response body
|
|
will contain the sequence of bytes that make up the file.
|
|
|
|
To view files in a web browser, you may want more control over the
|
|
Content-Type and Content-Disposition headers. Please see the next section,
|
|
"Browser Operations", for details on how to modify these URLs for that
|
|
purpose.
|
|
|
|
=== Writing/Uploading A File ===
|
|
|
|
PUT /uri/$FILECAP
|
|
PUT /uri/$DIRCAP/[SUBDIRS../]FILENAME
|
|
|
|
Upload a file, using the data from the HTTP request body, and add whatever
|
|
child links and subdirectories are necessary to make the file available at
|
|
the given location. Once this operation succeeds, a GET on the same URL will
|
|
retrieve the same contents that were just uploaded. This will create any
|
|
necessary intermediate subdirectories.
|
|
|
|
To use the /uri/$FILECAP form, $FILECAP must be a write-cap for a mutable file.
|
|
|
|
In the /uri/$DIRCAP/[SUBDIRS../]FILENAME form, if the target file is a
|
|
writable mutable file, that file's contents will be overwritten in-place. If
|
|
it is a read-cap for a mutable file, an error will occur. If it is an
|
|
immutable file, the old file will be discarded, and a new one will be put in
|
|
its place.
|
|
|
|
When creating a new file, if "mutable=true" is in the query arguments, the
|
|
operation will create a mutable file instead of an immutable one.
|
|
|
|
This returns the file-cap of the resulting file. If a new file was created
|
|
by this method, the HTTP response code (as dictated by rfc2616) will be set
|
|
to 201 CREATED. If an existing file was replaced or modified, the response
|
|
code will be 200 OK.
|
|
|
|
Note that the 'curl -T localfile http://127.0.0.1:8123/uri/$DIRCAP/foo.txt'
|
|
command can be used to invoke this operation.
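For programs that prefer not to shell out to curl, the same PUT can be sketched with Python's standard library. The function names here are our own, and `upload` assumes a node is actually listening at the given URL; only the 201/200 distinction comes from the text above.

```python
import urllib.request


def build_put(url, data, mutable=False):
    """Return a PUT request whose body is the file contents."""
    if mutable:
        url += "?mutable=true"
    return urllib.request.Request(url, data=data, method="PUT")


def describe_status(code):
    """Map the documented response codes to a short description."""
    return {201: "created", 200: "replaced"}.get(code, "unexpected")


def upload(url, data, mutable=False):
    """Send the PUT and return (description, file-cap).

    Requires a running node; not exercised by this sketch.
    """
    with urllib.request.urlopen(build_put(url, data, mutable)) as resp:
        return describe_status(resp.status), resp.read().decode("ascii").strip()
```

Keeping request construction separate from sending makes the URL and method easy to inspect without touching the network.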
|
|
|
|
PUT /uri
|
|
|
|
This uploads a file, and produces a file-cap for the contents, but does not
|
|
attach the file into the virtual drive. No directories will be modified by
|
|
this operation. The file-cap is returned as the body of the HTTP response.
|
|
|
|
If "mutable=true" is in the query arguments, the operation will create a
|
|
mutable file, and return its write-cap in the HTTP response. The default is
|
|
to create an immutable file, returning the read-cap as a response.
|
|
|
|
=== Creating A New Directory ===
|
|
|
|
POST /uri?t=mkdir
|
|
PUT /uri?t=mkdir
|
|
|
|
Create a new empty directory and return its write-cap as the HTTP response
|
|
body. This does not make the newly created directory visible from the
|
|
virtual drive. The "PUT" operation is provided for backwards compatibility:
|
|
new code should use POST.
|
|
|
|
POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir
|
|
PUT /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir
|
|
|
|
Create new directories as necessary to make sure that the named target
|
|
($DIRCAP/SUBDIRS../SUBDIR) is a directory. This will create additional
|
|
intermediate directories as necessary. If the named target directory already
|
|
exists, this will make no changes to it.
|
|
|
|
This will return an error if a blocking file is present at any of the parent
|
|
names, preventing the server from creating the necessary parent directory.
|
|
|
|
The write-cap of the new directory will be returned as the HTTP response
|
|
body.
|
|
|
|
POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME
|
|
|
|
Create a new empty directory and attach it to the given existing directory.
|
|
This will create additional intermediate directories as necessary.
|
|
|
|
The URL of this form points to the parent of the bottom-most new directory,
|
|
whereas the previous form has a URL that points directly to the bottom-most
|
|
new directory.
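The difference between the two attached-mkdir forms is easiest to see side by side. In this Python sketch the helper names and the prefix are assumptions; each function only builds the request, leaving it to the caller to send it to a running node.

```python
import urllib.request
from urllib.parse import quote, urlencode

PREFIX = "http://127.0.0.1:8123/uri/"  # assumed local node, default webport


def mkdir_by_path(dircap, subdirs, prefix=PREFIX):
    """URL names the new directory itself: .../SUBDIR?t=mkdir"""
    path = "/".join(quote(p, safe="") for p in [dircap] + list(subdirs))
    return urllib.request.Request(prefix + path + "?t=mkdir",
                                  data=b"", method="POST")


def mkdir_by_name(dircap, parents, name, prefix=PREFIX):
    """URL names the parent; the child is given as name=: .../?t=mkdir&name=NAME"""
    path = "/".join(quote(p, safe="") for p in [dircap] + list(parents))
    query = urlencode({"t": "mkdir", "name": name})
    return urllib.request.Request(prefix + path + "/?" + query,
                                  data=b"", method="POST")
```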
|
|
|
|
=== Get Information About A File Or Directory (as JSON) ===
|
|
|
|
GET /uri/$FILECAP?t=json
|
|
GET /uri/$DIRCAP?t=json
|
|
GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json
|
|
GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json
|
|
|
|
This returns a machine-parseable JSON-encoded description of the given
|
|
object. The JSON always contains a list, and the first element of the list
|
|
is always a flag that indicates whether the referenced object is a file or a
|
|
directory. If it is a file, then the information includes file size and URI,
|
|
like this:
|
|
|
|
GET /uri/$FILECAP?t=json :
|
|
GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json :
|
|
|
|
 [ "filenode", { "ro_uri": file_uri,
                 "size": bytes,
                 "mutable": false,
                 "metadata": {"ctime": 1202777696.7564139,
                              "mtime": 1202777696.7564139
                             }
               } ]
|
|
|
|
If it is a directory, then it includes information about the children of
|
|
this directory, as a mapping from child name to a set of data about the
|
|
child (the same data that would appear in a corresponding GET?t=json of the
|
|
child itself). The child entries also include metadata about each child,
|
|
including creation- and modification- timestamps. The output looks like
|
|
this:
|
|
|
|
GET /uri/$DIRCAP?t=json :
|
|
GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json :
|
|
|
|
 [ "dirnode", { "rw_uri": read_write_uri,
                "ro_uri": read_only_uri,
                "mutable": true,
                "children": {
                  "foo.txt": [ "filenode", { "ro_uri": uri,
                                             "size": bytes,
                                             "metadata": {
                                               "ctime": 1202777696.7564139,
                                               "mtime": 1202777696.7564139
                                             }
                                           } ],
                  "subdir":  [ "dirnode", { "rw_uri": rwuri,
                                            "ro_uri": rouri,
                                            "metadata": {
                                              "ctime": 1202778102.7589991,
                                              "mtime": 1202778111.2160511
                                            }
                                          } ]
                } } ]
|
|
|
|
In the above example, note how 'children' is a dictionary in which the keys
|
|
are child names and the values depend upon whether the child is a file or a
|
|
directory. The value is mostly the same as the JSON representation of the
|
|
child object (except that directories do not recurse -- the "children"
|
|
entry of the child is omitted, and the directory view includes the metadata
|
|
that is stored on the directory edge).
|
|
|
|
The rw_uri field will be present in the information about a directory
if and only if you have read-write access to that directory.
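A client might consume the t=json output along the following lines. This is a sketch only: the sample text is invented in the shape described above (the caps are placeholders), and `summarize` is a hypothetical helper.

```python
import json

# Made-up sample in the documented shape; the caps are placeholders.
SAMPLE = """
[ "dirnode", { "rw_uri": "URI:DIR2:placeholder-rw",
               "ro_uri": "URI:DIR2-RO:placeholder-ro",
               "mutable": true,
               "children": {
                 "foo.txt": [ "filenode", { "ro_uri": "URI:CHK:placeholder",
                                            "size": 202,
                                            "metadata": {"ctime": 1202777696.7564139,
                                                         "mtime": 1202777696.7564139} } ],
                 "subdir": [ "dirnode", { "ro_uri": "URI:DIR2-RO:placeholder-sub",
                                          "metadata": {"ctime": 1202778102.7589991,
                                                       "mtime": 1202778111.2160511} } ]
               } } ]
"""


def summarize(text):
    """Return (node type, whether rw_uri is present, {child name: child type})."""
    nodetype, info = json.loads(text)
    children = {name: child[0]
                for name, child in info.get("children", {}).items()}
    return nodetype, "rw_uri" in info, children


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Because the first list element is always the type flag, unpacking the two-element list is enough to branch on files versus directories.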
|
|
|
|
|
|
=== Attaching an existing File or Directory by its read- or write- cap ===
|
|
|
|
PUT /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri
|
|
|
|
This attaches a child object (either a file or directory) to a specified
|
|
location in the virtual filesystem. The child object is referenced by its
|
|
read- or write- cap, as provided in the HTTP request body. This will create
|
|
intermediate directories as necessary.
|
|
|
|
This is similar to a UNIX hardlink: by referencing a previously-uploaded
|
|
file (or previously-created directory) instead of uploading/creating a new
|
|
one, you can create two references to the same object.
|
|
|
|
The read- or write- cap of the child is provided in the body of the HTTP
|
|
request, and this same cap is returned in the response body.
|
|
|
|
The default behavior is to overwrite any existing object at the same
|
|
location. To prevent this (and make the operation return an error instead of
|
|
overwriting), add a "replace=false" argument, as "?t=uri&replace=false".
|
|
With replace=false, this operation will return an HTTP 409 "Conflict" error
|
|
if there is already an object at the given location, rather than overwriting
|
|
the existing object. Note that "true", "t", and "1" are all synonyms for
"True", and "false", "f", and "0" are synonyms for "False". The parameter is
|
|
case-insensitive.
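The attach operation, including the replace=false case, might be driven from Python roughly as follows. The function names are ours, and `attach` assumes a reachable node; treating a 409 as "name already taken" follows the description above.

```python
import urllib.error
import urllib.request


def build_attach(child_url, childcap, replace=True):
    """PUT <child_url>?t=uri with the child's cap as the request body."""
    query = "?t=uri" if replace else "?t=uri&replace=false"
    return urllib.request.Request(child_url + query,
                                  data=childcap.encode("ascii"),
                                  method="PUT")


def attach(child_url, childcap, replace=True):
    """Return the cap echoed by the node, or None on a 409 Conflict.

    Requires a running node; not exercised by this sketch.
    """
    try:
        req = build_attach(child_url, childcap, replace)
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode("ascii").strip()
    except urllib.error.HTTPError as e:
        if e.code == 409:
            return None  # an object already exists at that name
        raise
```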
|
|
|
|
=== Deleting a File or Directory ===
|
|
|
|
DELETE /uri/$DIRCAP/[SUBDIRS../]CHILDNAME
|
|
|
|
This removes the given name from its parent directory. CHILDNAME is the
|
|
name to be removed, and $DIRCAP/SUBDIRS.. indicates the directory that will
|
|
be modified.
|
|
|
|
Note that this does not actually delete the file or directory that the name
|
|
points to from the tahoe grid -- it only removes the named reference from
|
|
this directory. If there are other names in this directory or in other
|
|
directories that point to the resource, then it will remain accessible
|
|
through those paths. Even if all names pointing to this object are removed
|
|
from their parent directories, someone with possession of its read-cap
|
|
can continue to access the object through that cap.
|
|
|
|
The object will only become completely unreachable once 1: there are no
|
|
reachable directories that reference it, and 2: nobody is holding a read-
|
|
or write- cap to the object. (This behavior is very similar to the way
|
|
hardlinks and anonymous files work in traditional unix filesystems).
|
|
|
|
This operation will not modify more than a single directory. Intermediate
|
|
directories which were implicitly created by PUT or POST methods will *not*
|
|
be automatically removed by DELETE.
|
|
|
|
This method returns the file- or directory- cap of the object that was just
|
|
removed.
|
|
|
|
== Browser Operations ==
|
|
|
|
This section describes the HTTP operations that provide support for humans
|
|
running a web browser. Most of these operations use HTML forms that use POST
|
|
to drive the Tahoe node.
|
|
|
|
Note that for all POST operations, the arguments listed can be provided
|
|
either as URL query arguments or as form body fields. URL query arguments are
|
|
separated from the main URL by "?", and from each other by "&". For example,
|
|
"POST /uri/$DIRCAP?t=upload&mutable=true". Form body fields are usually
|
|
specified by using <input type="hidden"> elements. For clarity, the
|
|
descriptions below display the most significant arguments as URL query args.
|
|
|
|
=== Viewing A Directory (as HTML) ===
|
|
|
|
GET /uri/$DIRCAP/[SUBDIRS../]
|
|
|
|
This returns an HTML page, intended to be displayed to a human by a web
|
|
browser, which contains HREF links to all files and directories reachable
|
|
from this directory. These HREF links do not have a t= argument, meaning
|
|
that a human who follows them will get pages also meant for a human. It also
|
|
contains forms to upload new files, and to delete files and directories.
|
|
Those forms use POST methods to do their job.
|
|
|
|
=== Viewing/Downloading a File ===
|
|
|
|
GET /uri/$FILECAP
|
|
GET /uri/$DIRCAP/[SUBDIRS../]FILENAME
|
|
|
|
This will retrieve the contents of the given file. The HTTP response body
|
|
will contain the sequence of bytes that make up the file.
|
|
|
|
If you want the HTTP response to include a useful Content-Type header,
|
|
either use the second form (which starts with a $DIRCAP), or add a
|
|
"filename=foo" query argument, like "GET /uri/$FILECAP?filename=foo.jpg".
|
|
The bare "GET /uri/$FILECAP" does not give the Tahoe node enough information
|
|
to determine a Content-Type (since Tahoe immutable files are merely
|
|
sequences of bytes, not typed+named file objects).
|
|
|
|
If the URL has both filename= and "save=true" in the query arguments, then
|
|
the server will add a "Content-Disposition: attachment" header, along with a
|
|
filename= parameter. When a user clicks on such a link, most browsers will
|
|
offer to let the user save the file instead of displaying it inline (indeed,
|
|
most browsers will refuse to display it inline). "true", "t", "1", and other
|
|
case-insensitive equivalents are all treated the same.
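Links of this kind are simple to generate. The Python sketch below builds a download URL carrying filename= and, optionally, save=true; `download_url` and the prefix are illustrative assumptions.

```python
from urllib.parse import quote, urlencode

PREFIX = "http://127.0.0.1:8123/uri/"  # assumed local node, default webport


def download_url(filecap, filename, save=False, prefix=PREFIX):
    """Build GET /uri/$FILECAP?filename=...[&save=true]."""
    query = {"filename": filename}
    if save:
        query["save"] = "true"
    return prefix + quote(filecap, safe="") + "?" + urlencode(query)


if __name__ == "__main__":
    # "URI:CHK:placeholder" is not a real cap.
    print(download_url("URI:CHK:placeholder", "foo.jpg", save=True))
```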
|
|
|
|
Character-set handling in URLs and HTTP headers is a dubious art[1]. For
|
|
maximum compatibility, Tahoe simply copies the bytes from the filename=
|
|
argument into the Content-Disposition header's filename= parameter, without
|
|
trying to interpret them in any particular way.
|
|
|
|
|
|
GET /named/$FILECAP/FILENAME
|
|
|
|
This is an alternate download form which makes it easier to get the correct
|
|
filename. The Tahoe server will provide the contents of the given file, with
|
|
a Content-Type header derived from the given filename. This form is used to
|
|
get browsers to use the "Save Link As" feature correctly, and also helps
|
|
command-line tools like "wget" and "curl" use the right filename. Note that
|
|
this form can *only* be used with file caps; it is an error to use a
|
|
directory cap after the /named/ prefix.
|
|
|
|
=== Creating a Directory ===
|
|
|
|
POST /uri?t=mkdir
|
|
|
|
This creates a new directory, but does not attach it to the virtual
|
|
filesystem.
|
|
|
|
If a "redirect_to_result=true" argument is provided, then the HTTP response
|
|
will cause the web browser to be redirected to a /uri/$DIRCAP page that
|
|
gives access to the newly-created directory. If you bookmark this page,
|
|
you'll be able to get back to the directory again in the future. This is the
|
|
recommended way to start working with a Tahoe server: create a new unlinked
|
|
directory (using redirect_to_result=true), then bookmark the resulting
|
|
/uri/$DIRCAP page. There is a "Create Directory" button on the Welcome page
|
|
to invoke this action.
|
|
|
|
If "redirect_to_result=true" is not provided (or is given a value of
|
|
"false"), then the HTTP response body will simply be the write-cap of the
|
|
new directory.
|
|
|
|
POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=CHILDNAME
|
|
|
|
This creates a new directory as a child of the designated SUBDIR. This will
|
|
create additional intermediate directories as necessary.
|
|
|
|
If a "when_done=URL" argument is provided, the HTTP response will cause the
|
|
web browser to redirect to the given URL. This provides a convenient way to
|
|
return the browser to the directory that was just modified. Without a
|
|
when_done= argument, the HTTP response will simply contain the write-cap of
|
|
the directory that was just created.
|
|
|
|
|
|
=== Uploading a File ===
|
|
|
|
POST /uri?t=upload
|
|
|
|
This uploads a file, and produces a file-cap for the contents, but does not
|
|
attach the file into the virtual drive. No directories will be modified by
|
|
this operation.
|
|
|
|
The file must be provided as the "file" field of an HTML encoded form body,
|
|
produced in response to an HTML form like this:
|
|
<form action="/uri" method="POST" enctype="multipart/form-data">
|
|
<input type="hidden" name="t" value="upload" />
|
|
<input type="file" name="file" />
|
|
<input type="submit" value="Upload Unlinked" />
|
|
</form>
|
|
|
|
If a "when_done=URL" argument is provided, the response body will cause the
|
|
browser to redirect to the given URL. If the when_done= URL has the string
|
|
"%(uri)s" in it, that string will be replaced by a URL-escaped form of the
|
|
newly created file-cap. (Note that without this substitution, there is no
|
|
way to access the file that was just uploaded).
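To make the "%(uri)s" substitution concrete, here is a small Python model of the behavior described above. It is not Tahoe's own code: `expand_when_done` is a hypothetical name, and the template and cap in the usage line are placeholders.

```python
from urllib.parse import quote


def expand_when_done(template, filecap):
    """Replace the literal string %(uri)s with the URL-escaped file-cap."""
    return template.replace("%(uri)s", quote(filecap, safe=""))


if __name__ == "__main__":
    print(expand_when_done("/uri/%(uri)s?filename=upload.bin",
                           "URI:CHK:placeholder"))
```

A template without the marker is returned unchanged, which is the case the text warns about: the redirect then carries no reference to the new file.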
|
|
|
|
The default (in the absence of when_done=) is to return an HTML page that
|
|
describes the results of the upload. This page will contain information
|
|
about which storage servers were used for the upload, how long each
|
|
operation took, etc.
|
|
|
|
If a "mutable=true" argument is provided, the operation will create a
|
|
mutable file, and the response body will contain the write-cap instead of
|
|
the upload results page. The default is to create an immutable file,
|
|
returning the upload results page as a response.
|
|
|
|
|
|
POST /uri/$DIRCAP/[SUBDIRS../]?t=upload
|
|
|
|
This uploads a file, and attaches it as a new child of the given directory.
|
|
The file must be provided as the "file" field of an HTML encoded form body,
|
|
produced in response to an HTML form like this:
|
|
<form action="." method="POST" enctype="multipart/form-data">
|
|
<input type="hidden" name="t" value="upload" />
|
|
<input type="file" name="file" />
|
|
<input type="submit" value="Upload" />
|
|
</form>
|
|
|
|
A "name=" argument can be provided to specify the new child's name,
|
|
otherwise it will be taken from the "filename" field of the upload form
|
|
(most web browsers will copy the last component of the original file's
|
|
pathname into this field). To avoid confusion, name= is not allowed to
|
|
contain a slash.
|
|
|
|
If there is already a child with that name, and it is a mutable file, then
|
|
its contents are replaced with the data being uploaded. If it is not a
|
|
mutable file, the default behavior is to remove the existing child before
|
|
creating a new one. To prevent this (and make the operation return an error
|
|
instead of overwriting the old child), add a "replace=false" argument, as
|
|
"?t=upload&replace=false". With replace=false, this operation will return an
|
|
HTTP 409 "Conflict" error if there is already an object at the given
|
|
location, rather than overwriting the existing object. Note that "true",
|
|
"t", and "1" are all synonyms for "True", and "false", "f", and "0" are
|
|
synonyms for "False". The parameter is case-insensitive.
|
|
|
|
This will create additional intermediate directories as necessary, although
|
|
since it is expected to be triggered by a form that was retrieved by "GET
|
|
/uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
|
|
already exist.
|
|
|
|
If a "mutable=true" argument is provided, any new file that is created will
|
|
be a mutable file instead of an immutable one. <input type="checkbox"
|
|
name="mutable" /> will give the user a way to set this option.
|
|
|
|
If a "when_done=URL" argument is provided, the HTTP response will cause the
|
|
web browser to redirect to the given URL. This provides a convenient way to
|
|
return the browser to the directory that was just modified. Without a
|
|
when_done= argument, the HTTP response will simply contain the file-cap of
|
|
the file that was just uploaded (a write-cap for mutable files, or a
|
|
read-cap for immutable files).
|
|
|
|
POST /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=upload
|
|
|
|
This also uploads a file and attaches it as a new child of the given
|
|
directory. It is a slight variant of the previous operation, as the URL
|
|
refers to the target file rather than the parent directory. It is otherwise
|
|
identical: this accepts mutable= and when_done= arguments too.
|
|
|
|
POST /uri/$FILECAP?t=upload
|
|
|
|
=== Attaching An Existing File Or Directory (by URI) ===
|
|
|
|
POST /uri/$DIRCAP/[SUBDIRS../]?t=uri&name=CHILDNAME&uri=CHILDCAP
|
|
|
|
This attaches a given read- or write- cap "CHILDCAP" to the designated
|
|
directory, with a specified child name. This behaves much like the PUT t=uri
|
|
operation, and is a lot like a UNIX hardlink.
|
|
|
|
This will create additional intermediate directories as necessary, although
|
|
since it is expected to be triggered by a form that was retrieved by "GET
|
|
/uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
|
|
already exist.
|
|
|
|
=== Deleting A Child ===
|
|
|
|
POST /uri/$DIRCAP/[SUBDIRS../]?t=delete&name=CHILDNAME
|
|
|
|
This instructs the node to delete a child object (file or subdirectory) from
|
|
the given directory. Note that the entire subtree is removed. This is
|
|
somewhat like "rm -rf" (from the point of view of the parent), but other
|
|
references into the subtree will see that the child subdirectories are not
|
|
modified by this operation. Only the link from the given directory to its
|
|
child is severed.
|
|
|
|
=== Renaming A Child ===
|
|
|
|
POST /uri/$DIRCAP/[SUBDIRS../]?t=rename&from_name=OLD&to_name=NEW
|
|
|
|
This instructs the node to rename a child of the given directory. This is
|
|
exactly the same as removing the child, then adding the same child-cap under
|
|
the new name. This operation cannot move the child to a different directory.
|
|
|
|
This operation will replace any existing child of the new name, making it
|
|
behave like the UNIX "mv -f" command.
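Although this operation is meant to be driven by an HTML form, a program can issue the same POST. The Python sketch below passes the arguments as URL query arguments; `build_rename` is a hypothetical helper, and it only constructs the request.

```python
import urllib.request
from urllib.parse import urlencode


def build_rename(dir_url, old, new):
    """POST <dir_url>/?t=rename&from_name=OLD&to_name=NEW"""
    if not dir_url.endswith("/"):
        dir_url += "/"
    query = urlencode({"t": "rename", "from_name": old, "to_name": new})
    return urllib.request.Request(dir_url + "?" + query,
                                  data=b"", method="POST")
```

Since an existing child named NEW is replaced, a cautious caller may want to fetch the directory's t=json listing first and check for a collision.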
|
|
|
|
=== Other Utilities ===
|
|
|
|
GET /uri?uri=$CAP
|
|
|
|
This causes a redirect to /uri/$CAP, and retains any additional query
|
|
arguments (like filename= or save=). This is for the convenience of web
|
|
forms which allow the user to paste in a read- or write- cap (obtained
|
|
through some out-of-band channel, like IM or email).
|
|
|
|
Note that this form merely redirects to the specific file or directory
|
|
indicated by the $CAP: unlike the GET /uri/$DIRCAP form, you cannot
|
|
traverse to children by appending additional path segments to the URL.
|
|
|
|
GET /uri/$DIRCAP/[SUBDIRS../]?t=rename-form&name=$CHILDNAME
|
|
|
|
This provides a useful facility to browser-based user interfaces. It
|
|
returns a page containing a form targeting the "POST $DIRCAP t=rename"
|
|
functionality described above, with the provided $CHILDNAME present in the
|
|
'from_name' field of that form. I.e. this presents a form offering to
|
|
rename $CHILDNAME, requesting the new name, and submitting POST rename.
|
|
|
|
GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri
|
|
|
|
This returns the file- or directory- cap for the specified object.
|
|
|
|
GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=readonly-uri
|
|
|
|
This returns a read-only file- or directory- cap for the specified object.
|
|
If the object is an immutable file, this will return the same value as
|
|
t=uri.
|
|
|
|
=== Debugging and Testing Features ===
|
|
|
|
These URLs are less likely to be helpful to the casual Tahoe user, and are
|
|
mainly intended for developers.
|
|
|
|
POST $URL?t=check
|
|
|
|
This triggers the FileChecker to determine the current "health" of the
|
|
given file or directory, by counting how many shares are available. The
|
|
page that is returned will display the results. This can be used as a "show
|
|
me detailed information about this file" page.
|
|
|
|
If a when_done=url argument is provided, the return value will be a redirect
|
|
to that URL instead of the checker results.
|
|
|
|
If a return_to=url argument is provided, the returned page will include a
|
|
link to the given URL entitled "Return to the parent directory".
|
|
|
|
If a verify=true argument is provided, the node will perform a more
|
|
intensive check, downloading and verifying every single bit of every share.
|
|
|
|
POST $URL?t=deep-check
|
|
|
|
This triggers a recursive walk of all files and directories reachable from
|
|
the target, performing a check on each one just like t=check. The result
|
|
page will contain a summary of the results, including details on any
|
|
file/directory that was not fully healthy.
|
|
|
|
t=deep-check is most useful to invoke on a directory. If invoked on a file,
|
|
it will just check that single object. The recursive walker will deal with
|
|
loops safely.
|
|
|
|
This accepts the same verify=, when_done=, and return_to= arguments as
|
|
t=check.
|
|
|
|
Be aware that this can take a long time: perhaps a second per object.
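
The query arguments accepted by t=check and t=deep-check can be assembled
as in the following sketch. The function name check_url is hypothetical; it
only builds the URL to which the POST would be sent, and does not contact a
node.

```python
from urllib.parse import urlencode


def check_url(url, deep=False, verify=False, when_done=None, return_to=None):
    """Return the URL for a POST $URL?t=check (or t=deep-check) request."""
    args = [("t", "deep-check" if deep else "check")]
    if verify:
        # Ask for the more intensive check that downloads every share.
        args.append(("verify", "true"))
    if when_done is not None:
        args.append(("when_done", when_done))
    if return_to is not None:
        args.append(("return_to", return_to))
    # urlencode() percent-escapes the argument values.
    return url + "?" + urlencode(args)
```

For example, check_url(url, deep=True, verify=True) requests a recursive
walk with full verification, which is the slowest combination.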
|
|
|
|
GET $DIRURL?t=manifest
|
|
|
|
Return an HTML-formatted manifest of the given directory, for debugging.
|
|
This is a table of verifier-caps.
|
|
|
|
GET $DIRURL?t=deep-size
|
|
|
|
Return a number (in bytes) containing the sum of the filesize of all
immutable files reachable from the given directory. This is a rough lower
bound of the total space consumed by this subtree. It does not include
space consumed by directories or mutable files, nor does it take
expansion or encoding overhead into account. Later versions of the code may
improve this estimate upwards.
|
|
|
|
GET $DIRURL?t=deep-stats
|
|
|
|
Return a JSON-encoded dictionary that lists interesting statistics about
|
|
the set of all files and directories reachable from the given directory:
|
|
|
|
count-immutable-files: count of how many CHK files are in the set
|
|
count-mutable-files: same, for mutable files (does not include directories)
|
|
count-literal-files: same, for LIT files (data contained inside the URI)
|
|
count-files: sum of the above three
|
|
count-directories: count of directories
|
|
size-immutable-files: total bytes for all CHK files in the set, =deep-size
|
|
size-mutable-files (TODO): same, for current version of all mutable files
|
|
size-literal-files: same, for LIT files
|
|
size-directories: size of directories (includes size-literal-files)
|
|
size-files-histogram: list of (minsize, maxsize, count) buckets,
                      with a histogram of filesizes, 5dB/bucket,
                      for both literal and immutable files
|
|
largest-directory: number of children in the largest directory
|
|
largest-immutable-file: number of bytes in the largest CHK file
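
This document does not spell out the bucket boundaries behind "5dB/bucket"
in size-files-histogram. One plausible reading, assumed in the sketch
below, is that each bucket's upper bound is sqrt(10) (i.e. 5dB) times the
previous one, rounded down, with a separate bucket for empty files. The
function name histogram_bucket is hypothetical, and the boundaries a real
node reports may differ.

```python
from math import isqrt


def histogram_bucket(size):
    """Return the (minsize, maxsize) bucket that would hold `size` bytes."""
    if size < 0:
        raise ValueError("size must be non-negative")
    if size == 0:
        return (0, 0)
    low, k = 1, 1
    while True:
        # floor(10 ** (k / 2)), computed exactly with integer arithmetic.
        high = isqrt(10 ** k)
        if size <= high:
            return (low, high)
        low, k = high + 1, k + 1
```

Under this assumption the first few buckets are (0, 0), (1, 3), (4, 10),
(11, 31), and (32, 100).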
|
|
|
|
size-mutable-files is not implemented, because it would require extra
|
|
queries to each mutable file to get their size. This may be implemented in
|
|
the future.
|
|
|
|
Assuming no sharing, the basic space consumed by a single root directory is
|
|
the sum of size-immutable-files, size-mutable-files, and size-directories.
|
|
The actual disk space used by the shares is larger, because of the
|
|
following sources of overhead:
|
|
|
|
integrity data
|
|
expansion due to erasure coding
|
|
share management data (leases)
|
|
backend (ext3) minimum block size
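
The "basic space consumed" sum described above can be computed from a
t=deep-stats response as in the following sketch. The function name
basic_space is hypothetical. Because size-mutable-files is described above
as not implemented, the sketch treats a missing key as zero, which makes
the result a lower bound.

```python
import json


def basic_space(deep_stats_json):
    """Sum the size-* values that make up the basic space of a subtree."""
    stats = json.loads(deep_stats_json)
    keys = ("size-immutable-files", "size-mutable-files", "size-directories")
    # A key that the node does not report contributes nothing to the sum.
    return sum(stats.get(key, 0) for key in keys)
```

The result excludes the overhead sources listed above, so the disk space
actually used by the shares will be larger.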
|
|
|
|
== Other Useful Pages ==
|
|
|
|
The portion of the web namespace that begins with "/uri" (and "/named") is
|
|
dedicated to giving users (both humans and programs) access to the Tahoe
|
|
virtual filesystem. The rest of the namespace provides status information
|
|
about the state of the Tahoe node.
|
|
|
|
GET / (the root page)
|
|
|
|
This is the "Welcome Page", and contains a few distinct sections:
|
|
|
|
Node information: library versions, local nodeid, services being provided.
|
|
|
|
Filesystem Access Forms: create a new directory, view a file/directory by
|
|
URI, upload a file (unlinked), download a file by
|
|
URI.
|
|
|
|
Grid Status: introducer information, helper information, connected storage
|
|
servers.
|
|
|
|
GET /status/
|
|
|
|
This page lists all active uploads and downloads, and contains a short list
|
|
of recent upload/download operations. Each operation has a link to a page
|
|
that describes file sizes, servers that were involved, and the time consumed
|
|
in each phase of the operation.
|
|
|
|
A GET of /status/?t=json will return a machine-readable subset of the same
|
|
data. It returns a JSON-encoded dictionary. The only key defined at this
|
|
time is "active", with a value that is a list of operation dictionaries, one
|
|
for each active operation. Once an operation is completed, it will no longer
|
|
appear in data["active"].
|
|
|
|
Each op-dict contains a "type" key, one of "upload", "download",
|
|
"mapupdate", "publish", or "retrieve" (the first two are for immutable
|
|
files, while the latter three are for mutable files and directories).
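
As a sketch of how a front-end might consume this data, the following
function counts the active operations for immutable files and for mutable
files and directories, using only the "active" and "type" keys described
above. The function name count_active is hypothetical.

```python
import json
from collections import Counter

IMMUTABLE_TYPES = ("upload", "download")
MUTABLE_TYPES = ("mapupdate", "publish", "retrieve")


def count_active(status_json):
    """Return (immutable_count, mutable_count) for the active operations."""
    data = json.loads(status_json)
    counts = Counter(op["type"] for op in data["active"])
    immutable = sum(counts[t] for t in IMMUTABLE_TYPES)
    mutable = sum(counts[t] for t in MUTABLE_TYPES)
    return immutable, mutable
```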
|
|
|
|
The "upload" op-dict will contain the following keys:
|
|
|
|
type (string): "upload"
storage-index-string (string): a base32-encoded storage index
total-size (int): total size of the file
status (string): current status of the operation
progress-hash (float): 1.0 when the file has been hashed
progress-ciphertext (float): 1.0 when the file has been encrypted
progress-encode-push (float): 1.0 when the file has been encoded and
                              pushed to the storage servers. For helper
                              uploads, the ciphertext value climbs to 1.0
                              first, then encoding starts. For unassisted
                              uploads, ciphertext and encode-push progress
                              will climb at the same pace.
|
|
|
|
The "download" op-dict will contain the following keys:
|
|
|
|
type (string): "download"
|
|
storage-index-string (string): a base32-encoded storage index
|
|
total-size (int): total size of the file
|
|
status (string): current status of the operation
|
|
progress (float): 1.0 when the file has been fully downloaded
|
|
|
|
Front-ends which want to report progress information are advised to simply
|
|
average together all the progress-* indicators. A slightly more accurate
|
|
value can be found by ignoring the progress-hash value (since the current
|
|
implementation hashes synchronously, so clients will probably never see
|
|
progress-hash!=1.0).
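
The averaging advice above can be sketched as follows. The function name
overall_progress is hypothetical; it accepts either an "upload" or a
"download" op-dict as described above.

```python
def overall_progress(op, ignore_hash=True):
    """Collapse the progress indicators of one op-dict into a single float."""
    # A "download" op-dict carries a single "progress" value.
    if "progress" in op:
        return op["progress"]
    # An "upload" op-dict carries several progress-* values to average.
    keys = [key for key in op if key.startswith("progress-")]
    if ignore_hash:
        # Hashing is synchronous, so progress-hash adds no information.
        keys = [key for key in keys if key != "progress-hash"]
    if not keys:
        raise ValueError("op-dict has no progress indicators")
    return sum(op[key] for key in keys) / len(keys)
```

Passing ignore_hash=False gives the simple average of all the progress-*
indicators instead.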
|
|
|
|
GET /provisioning/
|
|
|
|
This page provides a basic tool to predict the likely storage and bandwidth
|
|
requirements of a large Tahoe grid. It provides forms to input things like
|
|
total number of users, number of files per user, average file size, number
|
|
of servers, expansion ratio, hard drive failure rate, etc. It then provides
|
|
numbers like how many disks per server will be needed, how many read
|
|
operations per second should be expected, and the likely MTBF for files in
|
|
the grid. This information is very preliminary, and the model upon which it
|
|
is based still needs a lot of work.
|
|
|
|
GET /helper_status/
|
|
|
|
If the node is running a helper (i.e. if "$BASEDIR/run_helper" is
|
|
non-empty), then this page will provide a list of all the helper operations
|
|
currently in progress. If "?t=json" is added to the URL, it will return a
|
|
JSON-formatted list of helper statistics, which can then be used to produce
|
|
graphs to indicate how busy the helper is.
|
|
|
|
GET /statistics/
|
|
|
|
This page provides "node statistics", which are collected from a variety of
|
|
sources.
|
|
|
|
load_monitor: every second, the node schedules a timer for one second in
              the future, then measures how late the subsequent callback
              is. The "load_average" is this tardiness, measured in
              seconds, averaged over the last minute. It is an indication
              of a busy node, one which is doing more work than can be
              completed in a timely fashion. The "max_load" value is the
              highest value that has been seen in the last 60 seconds.

cpu_monitor: every minute, the node uses time.clock() to measure how much
             CPU time it has used, and it uses this value to produce
             1min/5min/15min moving averages. These values range from 0%
             (0.0) to 100% (1.0), and indicate what fraction of the CPU
             has been used by the Tahoe node. Not all operating systems
             provide meaningful data to time.clock(): they may report 100%
             CPU usage at all times.

uploader: this counts how many immutable files (and bytes) have been
          uploaded since the node was started

downloader: this counts how many immutable files have been downloaded
            since the node was started

publishes: this counts how many mutable files (including directories) have
           been modified since the node was started

retrieves: this counts how many mutable files (including directories) have
           been read since the node was started
|
|
|
|
There are other statistics that are tracked by the node. The "raw stats"
|
|
section shows a formatted dump of all of them.
|
|
|
|
By adding "?t=json" to the URL, the node will return a JSON-formatted
|
|
dictionary of stats values, which can be used by other tools to produce
|
|
graphs of node behavior. The misc/munin/ directory in the source
|
|
distribution provides some tools to produce these graphs.
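
To make the load_monitor description above concrete, the following sketch
shows how tardiness samples could be turned into "load_average" and
"max_load". It is not the node's own code: the class name LoadMonitor, the
record() method, and the idea of keeping one sample per second in a
60-entry window are assumptions made for this example.

```python
from collections import deque


class LoadMonitor:
    """Track how late timer callbacks fire over a sliding window."""

    def __init__(self, window=60):
        # One sample per second, so 60 samples cover the last minute.
        self.samples = deque(maxlen=window)

    def record(self, scheduled_time, actual_time):
        # Tardiness in seconds; a callback cannot usefully be "early".
        self.samples.append(max(0.0, actual_time - scheduled_time))

    @property
    def load_average(self):
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    @property
    def max_load(self):
        return max(self.samples, default=0.0)
```

A node that keeps up with its work would record tardiness values close to
zero; sustained larger values indicate a busy node.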
|
|
|
|
GET / (introducer status)
|
|
|
|
For Introducer nodes, the welcome page displays information about both
|
|
clients and servers which are connected to the introducer. Servers make
|
|
"service announcements", and these are listed in a table. Clients will
|
|
subscribe to hear about service announcements, and these subscriptions are
|
|
listed in a separate table. Both tables contain information about what
version of Tahoe is being run by the remote node, its advertised and
outbound IP addresses, its nodeid and nickname, and how long it has
been available.
|
|
|
|
By adding "?t=json" to the URL, the node will return a JSON-formatted
|
|
dictionary of stats values, which can be used to produce graphs of connected
|
|
clients over time.
|
|
|
|
|
|
== Safety and Security Issues -- Names vs. URIs ==
|
|
|
|
Summary: use explicit file- and dir- caps whenever possible, to reduce the
|
|
potential for surprises when the virtual drive is changed while you aren't
|
|
looking.
|
|
|
|
The vdrive provides a mutable filesystem, but the ways that the filesystem
can change are limited. The only thing that can change is the mapping from
child names to child objects that each directory contains: a new child name
can be added pointing to an object, an existing child name can be removed,
or an existing child name can be changed to point to a different object.
|
|
|
|
If you query Tahoe for information about the filesystem and then act upon
the filesystem (for example, by listing the contents of a directory and then
adding a file to it), the filesystem might have changed after your query and
before your action. However, if you act upon an object using its URI instead
of its pathname, then the only change that can happen is that, if the object
is a directory, its set of child names might be different. If, on the other
hand, you act upon the object using its pathname, then a different object
might be in that place, which can result in more kinds of surprises.
|
|
|
|
For example, suppose you are writing code which recursively downloads the
|
|
contents of a directory. The first thing your code does is fetch the listing
|
|
of the contents of the directory. For each child that it fetched, if that
|
|
child is a file then it downloads the file, and if that child is a directory
|
|
then it recurses into that directory. Now, if the download and the recurse
|
|
actions are performed using the child's name, then the results might be
|
|
wrong, because for example a child name that pointed to a sub-directory when
|
|
you listed the directory might have been changed to point to a file (in which
|
|
case your attempt to recurse into it would result in an error and the file
|
|
would be skipped), or a child name that pointed to a file when you listed the
|
|
directory might now point to a sub-directory (in which case your attempt to
|
|
download the child would result in a file containing HTML text describing the
|
|
sub-directory!).
|
|
|
|
If your recursive algorithm uses the URI of the child instead of the name of
|
|
the child, then those kinds of mistakes just can't happen. Note that both the
|
|
child's name and the child's URI are included in the results of listing the
|
|
parent directory, so it isn't any harder to use the URI for this purpose.
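
A recursive download that follows URIs rather than names can be sketched
as below. To keep the example independent of the listing format, it takes
two caller-supplied functions that are assumptions of this sketch:
list_children(dircap), which returns a dictionary mapping each child name
to a (kind, cap) pair, and download_file(cap, path). The kind strings
"dirnode" and "filenode" are likewise placeholders.

```python
def walk_by_cap(dircap, list_children, download_file, path="", seen=None):
    """Recursively visit a directory, addressing every child by its cap."""
    seen = set() if seen is None else seen
    if dircap in seen:
        # Directories can form loops; visit each directory only once.
        return
    seen.add(dircap)
    for name, (kind, cap) in sorted(list_children(dircap).items()):
        child_path = path + "/" + name
        if kind == "dirnode":
            # Recurse using the cap from the listing, not the child name.
            walk_by_cap(cap, list_children, download_file, child_path, seen)
        else:
            download_file(cap, child_path)
```

Because each child is fetched by the cap that appeared in the listing, a
concurrent rename or replacement in the parent directory cannot turn a
file download into a directory fetch or the reverse.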
|
|
|
|
In general, use names if you want "whatever object (whether file or
|
|
directory) is found by following this name (or sequence of names) when my
|
|
request reaches the server". Use URIs if you want "this particular object".
|
|
|
|
== Concurrency Issues ==
|
|
|
|
Tahoe uses both mutable and immutable files. Mutable files can be created
|
|
explicitly by doing an upload with ?mutable=true added, or implicitly by
|
|
creating a new directory (since a directory is just a special way to
|
|
interpret a given mutable file).
|
|
|
|
Mutable files suffer from the same consistency-vs-availability tradeoff that
|
|
all distributed data storage systems face. It is not possible to
|
|
simultaneously achieve perfect consistency and perfect availability in the
|
|
face of network partitions (servers being unreachable or faulty).
|
|
|
|
Tahoe tries to achieve a reasonable compromise, but there is a basic rule in
|
|
place, known as the Prime Coordination Directive: "Don't Do That". What this
|
|
means is that if write-access to a mutable file is available to several
|
|
parties, then those parties are responsible for coordinating their activities
|
|
to avoid multiple simultaneous updates. This could be achieved by having
|
|
these parties talk to each other and using some sort of locking mechanism, or
|
|
by serializing all changes through a single writer.
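
As one illustration of the "single writer" approach, the sketch below
funnels changes from several producer threads through one worker thread,
so that updates are applied one at a time. The class name SingleWriter and
the apply_change callback are hypothetical. The sketch only coordinates
writers inside one process; parties on different machines would need some
other agreed-upon mechanism.

```python
import queue
import threading


class SingleWriter:
    """Apply queued changes strictly one after another."""

    def __init__(self, apply_change):
        self._apply_change = apply_change
        self._changes = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, change):
        # Safe to call from any thread; the change is applied later.
        self._changes.put(change)

    def close(self):
        # None is the shutdown marker; wait for queued changes to finish.
        self._changes.put(None)
        self._thread.join()

    def _run(self):
        while True:
            change = self._changes.get()
            if change is None:
                return
            self._apply_change(change)
```

Here apply_change would be whatever function performs the actual update,
for example by issuing one webapi request and waiting for it to complete.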
|
|
|
|
The consequences of performing uncoordinated writes can vary. Some of the
|
|
writers may lose their changes, as somebody else wins the race condition. In
|
|
many cases the file will be left in an "unhealthy" state, meaning that there
|
|
are not as many redundant shares as we would like (reducing the reliability
|
|
of the file against server failures). In the worst case, the file can be left
|
|
in such an unhealthy state that no version is recoverable, not even old ones.
|
|
It is this small possibility of data loss that prompts us to issue the Prime
|
|
Coordination Directive.
|
|
|
|
Tahoe nodes implement internal serialization to make sure that a single Tahoe
|
|
node cannot conflict with itself. For example, it is safe to issue two
|
|
directory modification requests to a single Tahoe node's webapi server at the
|
|
same time, because the Tahoe node will internally delay one of them until
|
|
after the other has finished being applied. (This feature was introduced in
Tahoe-1.1; with Tahoe-1.0, web clients were responsible for serializing
their web requests themselves.)
|
|
|
|
For more details, please see the "Consistency vs Availability" and "The Prime
|
|
Coordination Directive" sections of mutable.txt, in the same directory as
|
|
this file.
|
|
|
|
|
|
[1]: URLs and HTTP and UTF-8, Oh My
|
|
|
|
HTTP does not provide a mechanism to specify the character set used to
|
|
encode non-ascii names in URLs (rfc2396#2.1). We prefer the convention that
|
|
the filename= argument shall be a URL-encoded UTF-8 encoded unicode object.
|
|
For example, suppose we want to provoke the server into using a filename of
|
|
"f i a n c e-acute e" (i.e. F I A N C U+00E9 E). The UTF-8 encoding of this
|
|
is 0x66 0x69 0x61 0x6e 0x63 0xc3 0xa9 0x65 (or "fianc\xC3\xA9e", as python's
|
|
repr() function would show). To encode this into a URL, the non-ASCII
bytes must be escaped with the urlencode '%XX' mechanism, giving us
|
|
"fianc%C3%A9e". Thus, the first line of the HTTP request will be "GET
|
|
/uri/CAP...?save=true&filename=fianc%C3%A9e HTTP/1.1". Not all browsers
|
|
provide this: IE7 uses the Latin-1 encoding, which is fianc%E9e.
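
The two encodings above can be reproduced with Python's standard library,
as in this short sketch (the variable names are arbitrary):

```python
from urllib.parse import quote

name = "fianc\u00e9e"  # F I A N C U+00E9 E

# quote() encodes text as UTF-8 by default before escaping it.
utf8_escaped = quote(name, safe="")
# The Latin-1 form, as the text above reports for IE7.
latin1_escaped = quote(name, safe="", encoding="latin-1")

print(utf8_escaped)    # fianc%C3%A9e
print(latin1_escaped)  # fianc%E9e
```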
|
|
|
|
The response header will need to indicate a non-ASCII filename. The actual
|
|
mechanism to do this is not clear. For ASCII filenames, the response header
|
|
would look like:
|
|
|
|
Content-Disposition: attachment; filename="english.txt"
|
|
|
|
If Tahoe were to enforce the utf-8 convention, it would need to decode the
|
|
URL argument into a unicode string, and then encode it back into a sequence
|
|
of bytes when creating the response header. One possibility would be to use
|
|
unencoded utf-8. Developers suggest that IE7 might accept this:
|
|
|
|
#1: Content-Disposition: attachment; filename="fianc\xC3\xA9e"
|
|
(note, the last four bytes of that line, not including the newline, are
|
|
0xC3 0xA9 0x65 0x22)
|
|
|
|
RFC2231#4 (dated 1997) suggests that the following might work, and some
|
|
developers (http://markmail.org/message/dsjyokgl7hv64ig3) have reported that
|
|
it is supported by firefox (but not IE7):
|
|
|
|
#2: Content-Disposition: attachment; filename*=utf-8''fianc%C3%A9e
|
|
|
|
My reading of RFC2616#19.5.1 (which defines Content-Disposition) says that
|
|
the filename= parameter is defined to be wrapped in quotes (presumably to
|
|
allow spaces without breaking the parsing of subsequent parameters), which
|
|
would give us:
|
|
|
|
#3: Content-Disposition: attachment; filename*=utf-8''"fianc%C3%A9e"
|
|
|
|
However this is contrary to the examples in the email thread listed above.
|
|
|
|
Developers report that IE7 (when it is configured for UTF-8 URL encoding,
which is not the default in Asian countries) will accept:
|
|
|
|
#4: Content-Disposition: attachment; filename=fianc%C3%A9e
|
|
|
|
However, for maximum compatibility, Tahoe simply copies bytes from the URL
|
|
into the response header, rather than enforcing the utf-8 convention. This
|
|
means it does not try to decode the filename from the URL argument, nor does
|
|
it encode the filename into the response header.
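
For comparison, a front-end that did want to produce form #2 above could
build the header as in the following sketch. This is not what Tahoe does
(as noted above, it copies the bytes through unchanged), and the function
name content_disposition_rfc2231 is hypothetical.

```python
from urllib.parse import quote


def content_disposition_rfc2231(filename):
    """Return a Content-Disposition header line in the style of form #2."""
    # Percent-escape the UTF-8 bytes of the filename; the language field
    # between the two apostrophes is left empty.
    escaped = quote(filename, safe="")
    return "Content-Disposition: attachment; filename*=utf-8''" + escaped
```

Whether a given browser honors this form is a separate question, as the
reports cited above indicate.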
|