docs/webapi.rst: capitalization, formatting of section on URL character encoding, and a correction about Internet Explorer.

david-sarah 2010-12-29 19:40:49 -08:00
parent 83f97dcf7d
commit 53aeb3a3bc


@@ -226,8 +226,8 @@ with::
 Also note that the filenames inside upload POST forms are interpreted using
 whatever character set was provided in the conventional '_charset' field, and
 defaults to UTF-8 if not otherwise specified. The JSON representation of each
-directory contains native unicode strings. Tahoe directories are specified to
-contain unicode filenames, and cannot contain binary strings that are not
+directory contains native Unicode strings. Tahoe directories are specified to
+contain Unicode filenames, and cannot contain binary strings that are not
 representable as such.
 All Tahoe operations that refer to existing files or directories must include
@@ -467,7 +467,7 @@ Creating A New Directory
 form submissions, since the body is not formatted this way. Doing so will
 cause a server error as the lower-level code misparses the request body.
-Child file names should each be expressed as a unicode string, then used as
+Child file names should each be expressed as a Unicode string, then used as
 keys of the dictionary. The dictionary should then be converted into JSON,
 and the resulting string encoded into UTF-8. This UTF-8 bytestring should
 then be used as the POST body.
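The steps described in this hunk (Unicode names as dictionary keys, then JSON, then UTF-8 bytes) could be sketched roughly as follows. This is an illustration only: the function name ``build_post_body`` and the empty-list placeholder values are assumptions, since the hunk does not show what each child entry must contain.

```python
import json


def build_post_body(children: dict) -> bytes:
    """Serialize a dict keyed by Unicode child names into a UTF-8 JSON body.

    Hypothetical helper: the shape of each value is not specified in the
    hunk above, so callers must supply whatever the webapi expects.
    """
    # ensure_ascii=False keeps non-ASCII names as real characters in the
    # JSON text, so the UTF-8 encoding step below is what turns them into bytes.
    json_text = json.dumps(children, ensure_ascii=False)
    return json_text.encode("utf-8")


# Placeholder values ([]) stand in for the real child descriptions.
example_children = {"fianc\u00e9e.txt": [], "english.txt": []}
post_body = build_post_body(example_children)
```

The resulting ``post_body`` is a bytestring, which matches the requirement that the POST body be UTF-8 rather than a form-encoded submission.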
@@ -1908,16 +1908,19 @@ Coordination Directive" sections of `mutable.rst <../specifications/mutable.rst>
 .. [1] URLs and HTTP and UTF-8, Oh My
 HTTP does not provide a mechanism to specify the character set used to
-encode non-ascii names in URLs (rfc2396#2.1). We prefer the convention that
-the filename= argument shall be a URL-encoded UTF-8 encoded unicode object.
+encode non-ASCII names in URLs
+(`RFC3986#2.1 <http://tools.ietf.org/html/rfc3986#section-2.1>`_).
+We prefer the convention that the ``filename=`` argument shall be a
+URL-encoded UTF-8 encoded Unicode string.
 For example, suppose we want to provoke the server into using a filename of
-"f i a n c e-acute e" (i.e. F I A N C U+00E9 E). The UTF-8 encoding of this
-is 0x66 0x69 0x61 0x6e 0x63 0xc3 0xa9 0x65 (or "fianc\xC3\xA9e", as python's
-repr() function would show). To encode this into a URL, the non-printable
-characters must be escaped with the urlencode '%XX' mechansim, giving us
-"fianc%C3%A9e". Thus, the first line of the HTTP request will be "GET
-/uri/CAP...?save=true&filename=fianc%C3%A9e HTTP/1.1". Not all browsers
-provide this: IE7 uses the Latin-1 encoding, which is fianc%E9e.
+"f i a n c e-acute e" (i.e. f i a n c U+00E9 e). The UTF-8 encoding of this
+is 0x66 0x69 0x61 0x6e 0x63 0xc3 0xa9 0x65 (or "fianc\\xC3\\xA9e", as python's
+``repr()`` function would show). To encode this into a URL, the non-printable
+characters must be escaped with the urlencode ``%XX`` mechansim, giving us
+"fianc%C3%A9e". Thus, the first line of the HTTP request will be
+"``GET /uri/CAP...?save=true&filename=fianc%C3%A9e HTTP/1.1``". Not all
+browsers provide this: IE7 by default uses the Latin-1 encoding, which is
+fianc%E9e (although it has a configuration option to send URLs as UTF-8).
 The response header will need to indicate a non-ASCII filename. The actual
 mechanism to do this is not clear. For ASCII filenames, the response header
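The encoding walk-through in this hunk can be checked with Python's standard ``urllib.parse.quote``. The sketch below only reproduces the two percent-encoded forms the text mentions (UTF-8 and Latin-1). The variable names are illustrative, and ``CAP...`` is kept as the same elided placeholder used in the documentation.

```python
from urllib.parse import quote

# "fiancee" with an e-acute (U+00E9) as the sixth character.
name = "fianc\u00e9e"

# The raw UTF-8 bytes: 0x66 0x69 0x61 0x6e 0x63 0xc3 0xa9 0x65.
utf8_bytes = name.encode("utf-8")

# Percent-encode using UTF-8, the convention the documentation prefers.
utf8_quoted = quote(name, safe="", encoding="utf-8")

# Percent-encode using Latin-1, which the text says IE7 sends by default.
latin1_quoted = quote(name, safe="", encoding="latin-1")

# First line of the HTTP request, with the cap left elided as in the docs.
request_line = "GET /uri/CAP...?save=true&filename=" + utf8_quoted + " HTTP/1.1"
```

Here ``utf8_quoted`` is ``fianc%C3%A9e`` and ``latin1_quoted`` is ``fianc%E9e``, so a server cannot tell from the URL alone which character set the browser used.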
@@ -1925,10 +1928,10 @@ Coordination Directive" sections of `mutable.rst <../specifications/mutable.rst>
 Content-Disposition: attachment; filename="english.txt"
-If Tahoe were to enforce the utf-8 convention, it would need to decode the
-URL argument into a unicode string, and then encode it back into a sequence
+If Tahoe were to enforce the UTF-8 convention, it would need to decode the
+URL argument into a Unicode string, and then encode it back into a sequence
 of bytes when creating the response header. One possibility would be to use
-unencoded utf-8. Developers suggest that IE7 might accept this::
+unencoded UTF-8. Developers suggest that IE7 might accept this::
 #1: Content-Disposition: attachment; filename="fianc\xC3\xA9e"
 (note, the last four bytes of that line, not including the newline, are
@@ -1937,7 +1940,7 @@ Coordination Directive" sections of `mutable.rst <../specifications/mutable.rst>
 `RFC2231#4 <http://tools.ietf.org/html/rfc2231#section-4>`_
 (dated 1997): suggests that the following might work, and
 `some developers have reported <http://markmail.org/message/dsjyokgl7hv64ig3>`_
-that it is supported by firefox (but not IE7)::
+that it is supported by Firefox (but not IE7)::
 #2: Content-Disposition: attachment; filename*=utf-8''fianc%C3%A9e
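For illustration, header form #2 could be produced from a Unicode filename as sketched below. This is not what Tahoe does (the documentation says it copies bytes from the URL unchanged). The helper name ``rfc2231_disposition`` is hypothetical, and using ``quote`` with ``safe=""`` is an assumption that is stricter than the RFC requires.

```python
from urllib.parse import quote


def rfc2231_disposition(filename: str) -> str:
    """Build a Content-Disposition header using the filename*= extended form.

    Hypothetical helper: percent-encodes the UTF-8 bytes of the filename and
    prefixes the charset and an empty language tag (charset'language'value).
    """
    encoded = quote(filename, safe="", encoding="utf-8")
    return "Content-Disposition: attachment; filename*=utf-8''" + encoded


header_2 = rfc2231_disposition("fianc\u00e9e")
```

With the example filename, ``header_2`` matches line #2 above byte for byte.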
@@ -1951,11 +1954,11 @@ Coordination Directive" sections of `mutable.rst <../specifications/mutable.rst>
 However this is contrary to the examples in the email thread listed above.
 Developers report that IE7 (when it is configured for UTF-8 URL encoding,
-which is not the default in asian countries), will accept::
+which is not the default in Asian countries), will accept::
 #4: Content-Disposition: attachment; filename=fianc%C3%A9e
 However, for maximum compatibility, Tahoe simply copies bytes from the URL
-into the response header, rather than enforcing the utf-8 convention. This
+into the response header, rather than enforcing the UTF-8 convention. This
 means it does not try to decode the filename from the URL argument, nor does
 it encode the filename into the response header.