From 4a0313d84f7ec84fc3b6368d2da1fb50305391c6 Mon Sep 17 00:00:00 2001 From: Jean-Paul Calderone Date: Fri, 13 Nov 2020 12:31:52 -0500 Subject: [PATCH] Here's my first pass at it --- docs/specifications/index.rst | 1 + docs/specifications/url.rst | 103 ++++++++++++++++++++++++++++++++++ 2 files changed, 104 insertions(+) create mode 100644 docs/specifications/url.rst diff --git a/docs/specifications/index.rst b/docs/specifications/index.rst index 7d99934f6..2029c9e5a 100644 --- a/docs/specifications/index.rst +++ b/docs/specifications/index.rst @@ -8,6 +8,7 @@ the data formats used by Tahoe. :maxdepth: 2 outline + url uri file-encoding URI-extension diff --git a/docs/specifications/url.rst b/docs/specifications/url.rst new file mode 100644 index 000000000..369f78392 --- /dev/null +++ b/docs/specifications/url.rst @@ -0,0 +1,103 @@ +URLs +==== + +The goal of this document is to completely specify the construction and use of the URLs by Tahoe-LAFS for service location. +This includes, but is not limited to, the original Foolscap-based URLs. +These are not to be confused with the URI-like capabilities Tahoe-LAFS uses to refer to stored data. +An attempt is also made to outline the rationale for certain choices about these URLs. +The intended audience for this document is Tahoe-LAFS maintainers and other developers interested in interoperating with Tahoe-LAFS or these URLs. + +Background +---------- + +Tahoe-LAFS first used Foolscap_ for network communication. +Foolscap connection setup takes as an input a Foolscap URL or a *fURL*. +A fURL includes three components: + +* the base32-encoded SHA1 hash of the DER form of an x509v3 certificate +* zero or more network addresses +* an object identifier + +A Foolscap client tries to connect to each network address in turn. +If a connection is established then TLS is negotiated. +The server is authenticated by matching its certificate against the hash in the fURL. +A matching certificate serves as proof that the handshaking peer is the correct server. +This serves as the process by which the client authenticates the server. + +The client can then exercise further Foolscap functionality using the fURL's object identifier. +If the object identifier is an unguessable, secret string then it serves as a capability. +This serves as the process by which the server authorizes the client. + +NURLs +----- + +The authentication and authorization properties of fURLs are a good fit for Tahoe-LAFS' requirements. +These are not inherently tied to the Foolscap protocol itself. +In particular they are beneficial to :doc:`http-storage-node-protocol` which uses HTTP instead of Foolscap. +It is conceivable they will also be used with WebSockets at some point as well. + +Continuing to refer to these URLs as fURLs when they are being used for other protocols may cause confusion. +Therefore, +this document coins the name *NURL* for these URLs. +This can be considered to expand to "New URLs" or "Authe*N*ticating URLs" or "Authorizi*N*g URLs" as the reader prefers. + +Syntax +------ + +The EBNF for a NURL is as follows:: + + scheme = "pb://" + + hostname = domain-name | ipv4-address | ipv6-address + net-loc = hostname, [ ":" port ] + net-loc-list = net-loc, [ { ",", net-loc } ] + + swiss-number = segment + + nurl = scheme, hash, "@", net-loc-list, "/", swiss-number + +See https://tools.ietf.org/html/rfc4648#section-5 for the definition of ``urlsafe-base64-string`` +(RFC 4648 provides an ad hoc definition rather than EBNF). +See https://tools.ietf.org/html/rfc3986#section-3.3 for the definition of ``segment``. + +Versions +-------- + +Though all NURLs are syntactically compatible some semantic differences are allowed. +These differences are separated into distinct versions. + +Version 0 +--------- + +A Foolscap fURL is considered the canonical definition of a version 0 NURL. +Notably, +the hash component is defined as the base32-encoded SHA1 hash of the DER form of an x509v3 certificate. +A version 0 NURL is identified by the length of the hash string which is always 32 bytes. + +Version 1 +--------- + +The hash component of a version 1 NURL differs in three ways from the prior version. + +1. The hash function used is SHA3-224 instead of SHA1. + The security of SHA1 `continues to be eroded`_. + Contrariwise SHA3 is currently the most recent addition to the SHA family by NIST. + The 224 bit instance is chosen to keep the output short and because it offers greater collision resistance than SHA1 was thought to offer even at its inception + (prior to security research showing actual collision resistance is lower). +2. The hash is computed over the certificate's SPKI instead of the whole certificate. + This allows certificate re-generation so long as the public key remains the same. + This is useful to allow contact information to be updated or extension of validity period. + Use of an SPKI hash has also been `explored by the web community`_ during its flirtation with using it for HTTPS certificate pinning + (though this is now largely abandoned). +3. The hash is encoded using urlsafe-base64 (without padding) instead of base32. + This provides a more compact representation and minimizes the usability impacts of switching from a 160 bit hash to a 224 bit hash. + +A version 1 NURL is identified by the length of the hash string which is always 38 bytes. + +It is possible for a client to unilaterally upgrade a version 0 NURL to a version 1 NURL. +After establishing and authenticating a connection the client will have received a copy of the server's certificate. +This is sufficient to compute the new hash and rewrite the NURL to upgrade it to version 1. +This provides stronger authentication assurances for future uses but it is not required. + +.. _`continues to be eroded`: https://en.wikipedia.org/wiki/SHA-1#Cryptanalysis_and_validation +.. _`explored by the web community`: https://www.imperialviolet.org/2011/05/04/pinning.html