tahoe-lafs/docs/webapi.txt
Zooko O'Whielacronx 4f2244bfdd webapi.txt: shorter and hopefully clearer description of names vs. identifiers
Brian (and anyone who has an interest in the API and documentation): please review.
2007-08-15 12:28:04 -07:00


== connecting to the tahoe node ==
Writing "8011" into $NODEDIR/webport causes the node to run a webserver on
port 8011. Writing "tcp:8011:interface=127.0.0.1" into $NODEDIR/webport does
the same but binds to the loopback interface, ensuring that only the programs
on the local host can connect. Using
"ssl:8011:privateKey=mykey.pem:certKey=cert.pem" would run an SSL server. See
twisted.application.strports for more details.
If $NODEDIR/webpassword exists, it will be used (somehow) to require HTTP
Digest Authentication for all webserver connections. XXX specify how
== vdrive ==
The node provides some small number of "virtual drives". In the 0.5
release, this number is two: the first is the global shared vdrive, the
second is the private non-shared vdrive. We will call these "global" and
"private" for now.
For the purpose of this document, let us assume that the vdrives currently
contain the following directories and files:
global/
global/Documents/
global/Documents/notes.txt
private/
private/Pictures/
private/Pictures/tractors.jpg
private/Pictures/family/
private/Pictures/family/bobby.jpg
Within the webserver, there is a tree of resources. The top-level "vdrive"
resource gives access to files and directories in all of the user's virtual
drives. For example, the URL that corresponds to notes.txt would be:
http://localhost:8011/vdrive/global/Documents/notes.txt
and the URL for tractors.jpg would be:
http://localhost:8011/vdrive/private/Pictures/tractors.jpg
In addition, each directory has a corresponding URL. The Pictures URL is:
http://localhost:8011/vdrive/private/Pictures
Now, what can we do with these URLs? By varying the HTTP method
(GET/PUT/POST/DELETE) and by appending a type-indicating query argument, we
control what we want to do with the data and how it should be presented.
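As a minimal sketch, the name-based URLs above can be assembled from a base
URL, a vdrive name, and a sequence of child names. The helper name
vdrive_url is hypothetical, and percent-encoding each name is an assumption
that this document does not specify.

```python
from urllib.parse import quote


def vdrive_url(base, vdrive, *names):
    """Build a name-based URL under the top-level "vdrive" resource."""
    segments = ["vdrive", vdrive, *names]
    # Assumption: each name is percent-encoded on its own, so that a name
    # can never introduce an extra slash into the path.
    return base.rstrip("/") + "/" + "/".join(quote(s, safe="") for s in segments)


notes_url = vdrive_url("http://localhost:8011", "global", "Documents", "notes.txt")
pictures_url = vdrive_url("http://localhost:8011", "private", "Pictures")
```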
=== Manipulating files and directories by name ===
In the following examples "$URL" is a shorthand for a URL like the ones
described above, with "vdrive/" as the top level, followed by a
slash-separated sequence of file or directory names. "$NEWURL" is a
shorthand for a URL pointing to a location in the vdrive where currently
nothing exists.
GET $URL
If the given place in the vdrive contains a file, then this simply
retrieves the contents of the file. The Content-Type is set according to
the vdrive's metadata (if available) or by using the usual
filename-extension-magic built into most webservers. The file's contents
are provided in the body of the HTTP response.
If the given place contains a directory, then this returns an HTML page,
intended to be used by humans, which contains HREF links to all files and
directories reachable from this dirnode. These HREF links do not have a t=
argument, meaning that a human who follows them will get pages also meant
for a human. It also contains forms to upload new files, and to delete
files and directories. These forms use POST methods to do their job.
You can add the "save=true" argument, which adds a 'Content-Disposition:
attachment' header to prompt most web browsers to save the file to disk
rather than attempting to display it.
GET $URL?t=json
This returns machine-parseable information about the named file or
directory in the HTTP response body. This information contains a flag that
indicates whether the thing is a file or a directory.
If it is a file, then the information includes file size, metadata (like
Content-Type), and URIs, like this:
[ 'filenode', { 'mutable': bool, 'uri': file_uri, 'size': bytes } ]
If it is a directory, then it includes a flag to indicate whether this is a
read-write dirnode or a read-only dirnode, and information about the
children of this directory, as a mapping from child name to a set of
metadata about the child (the same data that would appear in a
corresponding GET?t=json of the child itself). Like this:
[ 'dirnode', { 'mutable': bool, 'uri': uri, 'children': children } ]
where 'children' is a dictionary in which the keys are child names
and the values depend upon whether the child is a file or a directory:
'foo.txt': [ 'filenode', { 'mutable': bool, 'uri': uri, 'size': bytes } ]
'subdir': [ 'dirnode', { 'mutable': bool, 'uri': uri } ]
Note that the value is the same as the JSON representation returned by a
GET $URL?t=json of the child itself (except that dirnodes do not recurse --
the "children" entry of the child is omitted).
Before writing code that uses these results, please see the "names versus
identifiers" section below, which describes the TOCTTOU
(time-of-check-to-time-of-use) bugs that can result.
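The following sketch shows one way a client might interpret a t=json
response body, assuming the body is standard JSON shaped like the templates
above. The function names and the placeholder URI strings in the sample are
made up for illustration.

```python
import json


def summarize_node(body):
    """Return (kind, info) for the two-element list in a t=json response."""
    kind, info = json.loads(body)
    if kind not in ("filenode", "dirnode"):
        raise ValueError("unexpected node type: %r" % (kind,))
    return kind, info


def child_kinds(body):
    """Map each child name of a directory listing to 'filenode' or 'dirnode'."""
    kind, info = summarize_node(body)
    if kind != "dirnode":
        raise ValueError("not a directory")
    return {name: child[0] for name, child in info["children"].items()}


# Hypothetical response body; the URI strings are placeholders, not real URIs.
sample = json.dumps(["dirnode", {
    "mutable": True,
    "uri": "URI-OF-DIR",
    "children": {
        "foo.txt": ["filenode", {"mutable": False, "uri": "URI-OF-FILE", "size": 1234}],
        "subdir": ["dirnode", {"mutable": True, "uri": "URI-OF-SUBDIR"}],
    },
}])
```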
GET $URL?t=uri
This returns the URI of the given file or directory in the HTTP response
body. If you have read-write access to that resource then this returns a
URI which provides read-write access. If you have read-only access to that
resource then this returns a URI which provides read-only access.
GET $URL?t=readonly-uri
This returns the URI providing read-only access to the given file or
directory (whether you have read-only or read-write access to it).
(Currently all files are immutable so everyone has read-only access to all
files.)
PUT $URL?t=uri
This attaches a child (either a file or a directory) to the vdrive at the
given location. The URI of the child is provided in the body of the HTTP
request. This can be used to attach a shared directory to the
vdrive. Intermediate directories are created on-demand just like with the
regular PUT command.
DELETE $URL
This deletes the given file or directory from the vdrive. If it is a
directory then this deletes all of its children. Note that this *does not*
delete any parent directories, so a sequence of 'PUT $NEWURL' and 'DELETE
$NEWURL' does not necessarily return the vdrive to its original state (it
may leave some intermediate directory nodes).
=== Manipulating files by name ===
PUT $NEWURL
This uploads a file to the given place in the vdrive. It will create
intermediate directory nodes as necessary. The file's contents are taken
from the body of the HTTP request. For convenience, the HTTP response
contains the URI that results from uploading the file, although the client
is not obligated to do anything with the URI. According to the HTTP/1.1
specification (RFC 2616), this should return a 200 (OK) code when modifying
an existing file, and a 201 (Created) code when creating a new file.
To use this, run 'curl -T localfile http://localhost:8011/vdrive/global/newfile'
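A rough Python equivalent of the curl command, using only the standard
library, might look like the sketch below. The helper names are
hypothetical, and the request is only prepared here, not sent, since sending
it requires a running node.

```python
import urllib.request


def build_put_request(new_url, contents):
    """Prepare (but do not send) a PUT whose body is the file's contents."""
    return urllib.request.Request(new_url, data=contents, method="PUT")


def is_success(status):
    """200 means an existing file was modified, 201 means one was created."""
    return status in (200, 201)


req = build_put_request("http://localhost:8011/vdrive/global/newfile", b"hello\n")

# To actually send it against a running node:
#   with urllib.request.urlopen(req) as resp:
#       file_uri = resp.read()   # the response contains the new file's URI
```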
=== Manipulating directories by name ===
PUT $NEWURL?t=mkdir
Create a new empty directory at the given path. The HTTP response contains
the URI of the given directory, although the client is not obligated to do
anything with it.
GET $URL?t=rename-form&name=$CHILDNAME
This provides a useful facility to browser-based user interfaces. It
returns a page containing a form targeting the "POST $URL t=rename"
functionality described below, with the provided $CHILDNAME present in the
'from_name' field of that form. I.e. this presents a form offering to
rename $CHILDNAME, requesting the new name, and submitting POST rename.
== URIs ==
A separate top-level resource namespace ("uri/" instead of "vdrive/") is used
to get access to files and dirnodes that are indexed directly by URI, rather
than by going through the vdrive. The resource thus referenced is used the
same way as if it were accessed through the vdrive (including accessing a
directory's children with "$URI/childname").
For example, this identifies a file or directory:
http://localhost:8011/uri/$URI
And this identifies a file or directory "foo" in a subdirectory "somedir" of
the identified directory:
http://localhost:8011/uri/$URI/somedir/foo
In the following examples, "$URI_URL" is a shorthand for a URL like the one
above, with "uri/" as the top level, followed by a URI.
Note that since tahoe URIs may contain slashes (in particular, dirnode URIs
contain a FURL, which resembles a regular HTTP URL and starts with pb://),
when URIs are used in this form, they must be specially quoted. All slashes
in the URI must be replaced by '!' characters. XXX consider changing the
allmydata.org uri format to relieve the user of this requirement.
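The slash-quoting rule can be sketched as follows. The function names are
hypothetical, and the sample URI is a made-up placeholder that only imitates
the "contains a pb:// FURL" shape; it is not a real tahoe URI. Any other
quoting that a URI might need is not addressed here.

```python
def quote_tahoe_uri(uri):
    """Replace every slash in a URI with '!' for use under the uri/ resource."""
    return uri.replace("/", "!")


def uri_url(base, uri, *children):
    """Build a uri/-based URL, optionally followed by child names."""
    parts = ["uri", quote_tahoe_uri(uri), *children]
    return base.rstrip("/") + "/" + "/".join(parts)


# Placeholder URI for illustration only.
example_uri = "EXAMPLE:pb://node/swissnum"
bobby_url = uri_url("http://localhost:8011", example_uri, "family", "bobby.jpg")
```

Only the URI itself is quoted; the slashes that separate child names remain
ordinary path separators.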
GET $URI_URL
GET $URI_URL?t=json
GET $URI_URL?t=uri
GET $URI_URL?t=readonly-uri
These each behave the same way that their name-based URL equivalent does,
described in the "Manipulating files and directories by name" section
above. The difference is
that which file or directory you access does not depend on the contents of
parent directories as it does with the name-based URLs, since a URI
uniquely identifies an object regardless of its location.
Since files accessed directly this way do not have a filename (from which a
MIME-type can be derived), one can be specified using a 'filename=' query
argument. This filename is also the one used if the 'save=true' argument is
set. For example:
GET http://localhost:8011/uri/$TRACTORS_URI?filename=tractors.jpg
If the URI represents a directory, you can append additional path segments
to $URI_URL to access children of that directory. For example, if we first
obtained the URI of the "private/Pictures" directory by doing:
GET http://localhost:8011/vdrive/private/Pictures?t=uri -> PICTURES_URI
then we could download "private/Pictures/family/bobby.jpg" by fetching:
GET http://localhost:8011/uri/$PICTURES_URI/family/bobby.jpg
Note that since the $URI_URL already contains the URI, the only use for the
"?t=readonly-uri" command is if the thing identified is a directory and you
have read-write access to it and you want to get a URI which provides
read-only access to it. "?t=uri" is completely redundant but included for
completeness.
GET http://localhost:8011/uri?uri=$URI
This causes a redirect to /uri/$URI, and retains any additional query
arguments (like filename= or save=). This is for the convenience of web
forms which allow the user to paste in a URI (obtained through some
out-of-band channel, like IM or email).
Note that this form merely redirects to the specific node indicated by the
URI: unlike the GET /uri/$URI form, you cannot traverse to child nodes by
appending additional path segments to the URL.
The $URI provided as a query argument is allowed to contain slashes. The
redirection provided will escape the slashes with exclamation points, as
described above.
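A sketch of building the query-argument form is shown below. The helper
name is hypothetical, and it assumes the node decodes percent-encoded query
arguments in the usual way, which is what lets the URI keep its slashes here.

```python
from urllib.parse import urlencode


def uri_redirect_url(base, uri, **extra):
    """Build a /uri?uri=... URL, carrying along arguments such as filename=."""
    query = {"uri": uri}
    query.update(extra)
    # urlencode percent-encodes slashes and colons inside the values.
    return base.rstrip("/") + "/uri?" + urlencode(query)


# The URI is a made-up placeholder, not a real tahoe URI.
redirect_url = uri_redirect_url(
    "http://localhost:8011", "EXAMPLE:pb://node/swissnum", filename="tractors.jpg"
)
```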
== names versus identifiers ==
The vdrive provides a mutable filesystem, but the ways that the filesystem
can change are limited. The only thing that can change is the mapping from
child names to child objects that each directory contains: a new child name
can be added pointing to an object, an existing child name can be removed,
or an existing child name can be changed to point to a different object.
Obviously if you query tahoe for information about the filesystem and then
act upon the filesystem (such as by getting a listing of the contents of a
directory and then adding a file to the directory), then the filesystem might
have been changed after you queried it and before you acted upon it.
However, if you use the URI instead of the pathname of an object when you act
upon the object, then the only change that can happen is that, if the object
is a directory, the set of child names it has might be different. If, on the
other hand, you act upon the object using its pathname, then a different
object might be in that place, which can result in more kinds of surprises.
For example, suppose you are writing code which recursively downloads the
contents of a directory. The first thing your code does is fetch the listing
of the contents of the directory. For each child that it fetched, if that
child is a file then it downloads the file, and if that child is a directory
then it recurses into that directory. Now, if the download and the recurse
actions are performed using the child's name, then the results might be
wrong. A child name that pointed to a sub-directory when you listed the
directory might have been changed to point to a file, in which case your
attempt to recurse into it would result in an error and the file would be
skipped. Or a child name that pointed to a file when you listed the
directory might now point to a sub-directory, in which case your attempt to
download the child would result in a file containing HTML text describing the
sub-directory!
If your recursive algorithm uses the URI of the child instead of the name of
the child, then those kinds of mistakes just can't happen. Note that both the
child's name and the child's URI are included in the results of listing the
parent directory, so it isn't harder to use the URI for this purpose.
In general, use names if you want "whatever object (whether file or
directory) is found by following this name (or sequence of names) when my
request reaches the server". Use URIs if you want "this particular object".
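The recursive-download example can be sketched so that every step after the
first listing uses the child's URI rather than its name. This is only an
illustration: collect_files is a hypothetical helper, fetch_json stands in
for "GET /uri/$URI?t=json" against a real node, and the in-memory table
below (with placeholder URI strings) replaces the node so the sketch can run
on its own.

```python
import json


def collect_files(fetch_json, dir_uri, prefix=""):
    """Return a list of (path, file_uri) for every file below dir_uri.

    fetch_json(uri) must return the t=json body for the object with that
    URI. Children are visited by URI, never by name, so a rename in the
    parent cannot turn a listed file into a directory or vice versa.
    """
    kind, info = json.loads(fetch_json(dir_uri))
    if kind != "dirnode":
        raise ValueError("expected a directory")
    found = []
    for name, (child_kind, child_info) in sorted(info["children"].items()):
        path = prefix + name
        if child_kind == "filenode":
            found.append((path, child_info["uri"]))
        else:
            found.extend(collect_files(fetch_json, child_info["uri"], path + "/"))
    return found


# Stand-in for a node: maps placeholder URIs to t=json response bodies.
fake_node = {
    "dir-pictures": json.dumps(["dirnode", {
        "mutable": True, "uri": "dir-pictures", "children": {
            "tractors.jpg": ["filenode", {"mutable": False, "uri": "file-tractors", "size": 10}],
            "family": ["dirnode", {"mutable": True, "uri": "dir-family"}],
        }}]),
    "dir-family": json.dumps(["dirnode", {
        "mutable": True, "uri": "dir-family", "children": {
            "bobby.jpg": ["filenode", {"mutable": False, "uri": "file-bobby", "size": 20}],
        }}]),
}

files = collect_files(fake_node.__getitem__, "dir-pictures")
```

Each (path, file_uri) pair can then be downloaded through the uri/ resource;
the path is kept only to decide where to store the result locally.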
== POST forms ==
POST $URL
t=upload
name=childname (optional)
file=newfile
This instructs the node to upload a file into the given dirnode. We need
this because forms are the only way for a web browser to upload a file
(browsers do not know how to do PUT or DELETE). The file's contents and the
new child name will be included in the form's arguments. This can only be
used to upload a single file at a time. To avoid confusion, name= is not
allowed to contain a slash (a 400 Bad Request error will result).
POST $URL
t=mkdir
name=childname
This instructs the node to create a new empty directory. The name of the
new child directory will be included in the form's arguments.
POST $URL
t=uri
name=childname
uri=newuri
This instructs the node to attach a child that is referenced by URI (just
like the PUT $URL?t=uri method). The name and URI of the new child
will be included in the form's arguments.
POST $URL
t=delete
name=childname
This instructs the node to delete a file from the given dirnode. The name
of the child to be deleted will be included in the form's arguments.
POST $URL
t=rename
from_name=oldchildname
to_name=newchildname
This instructs the node to rename a child within the given dirnode. The
child specified by 'from_name' is removed, and reattached as a child named
for 'to_name'. This is unconditional and will replace any child already
present under 'to_name', akin to 'mv -f' in unix parlance.
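For a non-browser client, the t=rename form might be submitted as in the
sketch below. It assumes the node accepts ordinary urlencoded form bodies
for this operation, which this document does not state (a browser's upload
form would use multipart encoding). The helper name is hypothetical and the
request is prepared but not sent.

```python
import urllib.request
from urllib.parse import urlencode


def build_rename_request(dir_url, from_name, to_name):
    """Prepare a POST carrying the t=rename form fields for a directory."""
    fields = {"t": "rename", "from_name": from_name, "to_name": to_name}
    body = urlencode(fields).encode("ascii")
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(dir_url, data=body)


rename_req = build_rename_request(
    "http://localhost:8011/vdrive/private/Pictures",
    "tractors.jpg",
    "old-tractors.jpg",
)
```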
== XMLRPC ==
http://localhost:8011/xmlrpc
This resource provides an XMLRPC server on which all of the previous
operations can be expressed as function calls taking a "pathname" argument.
This is provided for applications that want to think of everything in terms
of XMLRPC.
listdir(vdrivename, path) -> dict of (childname -> (stuff))
put(vdrivename, path, contents) -> URI
get(vdrivename, path) -> contents
mkdir(vdrivename, path) -> URI
put_localfile(vdrivename, path, localfilename) -> URI
get_localfile(vdrivename, path, localfilename)
put_localdir(vdrivename, path, localdirname) # recursive
get_localdir(vdrivename, path, localdirname) # recursive
put_uri(vdrivename, path, URI)
etc..
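A client could reach these functions with Python's standard xmlrpc.client
module, roughly as sketched here. The method names come from the list above;
whether the node implements each of them, and the exact form of the path
argument, are not specified in this document and are assumptions.

```python
import xmlrpc.client

# Creating the proxy does not open a connection; calls are made lazily.
proxy = xmlrpc.client.ServerProxy("http://localhost:8011/xmlrpc")

# Against a running node, a directory listing would be requested with:
#   children = proxy.listdir("global", "Documents")

# The XML payload such a call would send can be inspected without a node:
payload = xmlrpc.client.dumps(("global", "Documents"), methodname="listdir")
```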
== Testing/Debugging Commands ==
GET $URL?t=download&localfile=$LOCALPATH
GET $URL?t=download&localdir=$LOCALPATH
The localfile= form instructs the node to download the given file and write
it into the local filesystem at $LOCALPATH. The localdir= form instructs
the node to recursively download everything from the given directory and
below into the local filesystem. To avoid surprises, the localfile= form
will signal an error if $URL actually refers to a directory, likewise if
localdir= is used with a $URL that refers to a file.
This request will only be accepted from an HTTP client connection
originating at 127.0.0.1 . This request is most useful when the client node
and the HTTP client are operated by the same user. $LOCALPATH should be an
absolute pathname.
This form is only implemented for testing purposes, because of a trivially
easy attack: any web server that the local browser visits could serve an
IMG tag that causes the local node to modify the local filesystem.
Therefore this form is only enabled if you create a file named
'webport_allow_localfile' in the node's base directory.
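A sketch of building these debugging URLs is shown below. The helper name is
hypothetical; it enforces the "absolute pathname" recommendation on the
client side and percent-encodes the path, which is an assumption about how
the node parses its query arguments.

```python
import os.path
from urllib.parse import urlencode


def local_download_url(url, localpath, recursive=False):
    """Build a t=download URL using localfile= (file) or localdir= (directory)."""
    if not os.path.isabs(localpath):
        raise ValueError("$LOCALPATH should be an absolute pathname")
    key = "localdir" if recursive else "localfile"
    return url + "?" + urlencode({"t": "download", key: localpath})


download_url = local_download_url(
    "http://localhost:8011/vdrive/global/Documents/notes.txt", "/tmp/notes.txt"
)
```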
PUT $NEWURL?t=upload&localfile=$LOCALPATH
PUT $NEWURL?t=upload&localdir=$LOCALPATH
This uploads a file or directory from the node's local filesystem to the
vdrive. As with "GET $URL?t=download&localfile=$LOCALPATH", this request
will only be accepted from an HTTP connection originating from 127.0.0.1 .
The localfile= form expects that $LOCALPATH will point to a file on the
node's local filesystem, and causes the node to upload that one file into
the vdrive at the given location. Any parent directories will be created in
the vdrive as necessary.
The localdir= form expects that $LOCALPATH will point to a directory on the
node's local filesystem, and it causes the node to perform a recursive
upload of the directory into the vdrive at the given location, creating
parent directories as necessary. When the operation is complete, the
directory referenced by $NEWURL will contain all of the files and
directories that were present in $LOCALPATH, so this is equivalent to the
unix commands:
mkdir -p $NEWURL; cp -r $LOCALPATH/* $NEWURL/
Note that the "curl" utility can be used to provoke this sort of recursive
upload, since the -T option will make it use an HTTP 'PUT':
curl -T /dev/null 'http://localhost:8011/vdrive/global/newdir?t=upload&localdir=/home/user/directory-to-upload'
This form is only implemented for testing purposes, because any attacker's
web server that a local browser visits could serve an IMG tag that causes
the local node to modify the local filesystem. Therefore this form is only
enabled if you create a file named 'webport_allow_localfile' in the node's
base directory.
GET $URL?t=manifest
Return an HTML-formatted manifest of the given directory, for debugging.