tahoe-lafs/docs/proposed/accounting-overview.txt

723 lines
36 KiB
Plaintext
Raw Normal View History

= Accounting =
"Accounting" is the arena of the Tahoe system that concerns measuring,
controlling, and enabling the ability to upload and download files, and to
create new directories. In contrast with the capability-based access control
model, which dictates how specific files and directories may or may not be
manipulated, Accounting is concerned with resource consumption: how much disk
space a given person/account/entity can use.
2009-05-23 18:40:11 +00:00
Tahoe releases up to and including 1.4.1 have a nearly-unbounded resource
usage model. Anybody who can talk to the Introducer gets to talk to all the
Storage Servers, and anyone who can talk to a Storage Server gets to use as
much disk space as they want (up to the reserved_space= limit imposed by the
server, which affects all users equally). Not only is the per-user space
usage unlimited, it is also unmeasured: the owner of the Storage Server has
no way to find out how much space Alice or Bob is using.
The goals of the Accounting system are thus:
* allow the owner of a storage server to control who gets to use disk space,
with separate limits per user
* allow both the server owner and the user to measure how much space the user
is consuming, in an efficient manner
* provide grid-wide aggregation tools, so a set of cooperating server
operators can easily measure how much a given user is consuming across all
servers. This information should also be available to the user in question.
For the purposes of this document, the terms "Account" and "User" are mostly
interchangeable. The fundamental unit of Accounting is the "Account", in that
usage and quota enforcement is performed separately for each account. These
accounts might correspond to individual human users, or they might be shared
among a group, or a user might have an arbitrary number of accounts.
Accounting interacts with Garbage Collection. To protect their shares from
GC, clients maintain limited-duration leases on those shares: when the last
lease expires, the share is deleted. Each lease has a "label", which
indicates the account or user which wants to keep the share alive. A given
account's "usage" (their per-server aggregate usage) is simply the sum of the
sizes of all shares on which they hold a lease. The storage server may limit
the user to a fixed "quota" (an upper bound on their usage). To keep a file
2009-05-23 18:40:11 +00:00
alive, the user must be willing to use up some of their quota.
Note that a popular file might have leases from multiple users, in which case
one user might take a chance and decline to add their own lease, saving some
of their quota and hoping that the other leases continue to keep the file
alive despite their personal unwillingness to contribute to the effort. One
could imagine a "pro-rated quotas" scheme, in which a 10MB file with 5
leaseholders would deduct 2MB from each leaseholder's quota. We have decided
to not implement pro-rated quotas, because such a scheme would make usage
values hard to predict: a given account might suddenly go over quota solely
because of a third party's actions.
== Authority Flow ==
The authority to consume space on the storage server originates, of course,
with the storage server operator. These operators start with complete control
over their space, and delegate portions of it to others: either directly to
clients who want to upload files, or to intermediaries who can then delegate
attenuated authority onwards. The operators have various reasons for wanting
to share their space: monetary consideration, expectations of in-kind
2009-05-23 18:40:11 +00:00
exchange, or simple generosity. But the final authority always rests with the
operator.
2009-05-23 18:40:11 +00:00
The server operator grants limited authority over their space by configuring
their server to accept requests that demonstrate knowledge of certain
secrets. They then share those secrets with the client who intends to use
this space, or with an intermediary who will generate still more secrets and
share those with the client. Eventually, an upload or create-directory
operation will be performed that needs this authority. Part of the operation
will involve proving knowledge of the secret to the storage server, and the
server will require this proof before accepting the uploaded share or adding
a new lease.
The authority is expressed as a string, containing cryptographically-signed
messages and keys. The string also contains "restrictions", which are
annotations that explain the limits imposed upon this authority, either by
the original grantor (the storage server operator) or by one of the
intermediaries. Authority can be reduced but not increased. Any holder of a
given authority can delegate some or all of it to another party.
The authority string may be short enough to include as an argument to a CLI
command (--with-authority ABCDE), or it may be long enough that it must be
stashed in a file and referenced in some other fashion (--with-authority-file
~/.my_authority). There are CLI tools to create brand new authority strings,
to derive attenuated authorities from an existing one, and to explain the
contents of an authority string. These authority strings can be shared with
others just like filecaps and dircaps: knowledge of the authority string is
both necessary and complete to wield the authority it represents.
Web-API requests will include the authority necessary to complete the
operation. When used by a CLI tool, the authority is likely to come from
~/.tahoe/private/authority (i.e. it is ambient to the user who has access to
that node, just like aliases provide similar access to a specific "root
directory"). When used by the browser-oriented WUI, the authority will [TODO]
somehow be retained on each page in a way that minimizes the risk of CSRF
attacks and allows safe sharing (cut-and-paste of a URL without sharing the
storage authority too). The client node receiving the web-API request will
extract the authority string from the request and use it to build the storage
2009-05-23 19:03:59 +00:00
server messages that it sends to fulfill that request.
== Definition Of Authority ==
2009-05-23 19:03:59 +00:00
The term "authority" is used here in the object-capability sense: it refers
to the ability of some principal to cause some action to occur, whether
because they can do it themselves, or because they can convince some other
principal to do it for them. In Tahoe terms, "storage authority" is the
ability to do one of the following actions:
* upload a new share, thus consuming storage space
* adding a new lease to a share, thus preventing space from being reclaimed
* modify an existing mutable share, potentially increasing the space consumed
2009-05-23 19:03:59 +00:00
The Accounting effort may involve other kinds of authority that get limited
in a similar manner as storage authority, like the ability to download a
2009-05-23 19:03:59 +00:00
share or query whether a given share is present: anything that may consume
CPU time, disk bandwidth, or other limited resources. The authority to renew
or cancel a lease may be controlled in a similar fashion.
Storage authority, as granted from a server operator to a client, is not
simply a binary "use space or not" grant. Instead, it is parameterized by a
number of "restrictions". The most important of these restrictions (with
respect to the goals of Accounting) is the "Account Label".
=== Account Labels ===
A Tahoe "Account" is defined by a variable-length sequence of small integers.
(they are not required to be small, the actual limit is 2**64, but neither
2009-05-23 19:03:59 +00:00
are they required to be unguessable). For the purposes of discussion, these
lists will be expressed as period-joined strings: the two-element list (1,4)
will be displayed here as "1.4".
These accounts are arranged in a hierarchy: the account identifier 1.4 is
considered to be a "parent" of 1.4.2 . There is no relationship between the
values used by unrelated accounts: 1.4 is unrelated to 2.4, despite both
coincidentally using a "4" in the second element.
Each lease has a label, which contains the Account identifier. The storage
server maintains an aggregate size count for each label prefix: when asked
2009-05-23 19:03:59 +00:00
about account 1.4, it will report the amount of space used by shares labeled
1.4, 1.4.2, 1.4.7, 1.4.7.8, etc (but *not* 1 or 1.5).
The "Account Label" restriction allows a client to apply any label it wants,
2009-05-23 19:03:59 +00:00
as long as that label begins with a specific prefix. If account 1 is
associated with Alice, then Alice will receive a storage authority string
2009-05-23 19:03:59 +00:00
that contains a "must start with 1" restriction, enabling her to to use
storage space but obligating her to lease her shares with a label that can be
traced back to her. She can delegate part of her authority to others (perhaps
with other non-label restrictions, such as a space restriction or time limit)
with or without an additional label restriction. For example, she might
2009-05-23 19:03:59 +00:00
delegate some of her authority to her friend Amy, with a 1.4 label
restriction. Amy could then create labels with 1.4 or 1.4.7, but she could
not create labels with the same 1 identifier that Alice can do, nor could she
create labels with 1.5 (which Alice might have given to her other friend
Annette). The storage server operator can ask about the usage of 1 to find
out how much Alice is responsible for (which includes the space that she has
delegated to Amy and Annette), and none of the A-users can avoid being
counted in this total. But Alice can ask the storage server about the usage
of 1.4 to find out how much Amy has taken advantage of her gift. Likewise,
Alice has control over any lease with a label that begins with 1, so she can
cancel Amy's leases and free the space they were consuming. If this seems
surprising, consider that the storage server operator considered Alice to be
responsible for that space anyways: with great responsibility (for space
consumed) comes great power (to stop consuming that space).
=== Server Space Restriction ===
The storage server's basic control over how space usage (apart from the
binary use-it-or-not authority granted by handing out an authority string at
all) is implemented by keeping track of the space used by any given account
2009-05-23 19:03:59 +00:00
identifier. If account 1.4 sends a request to allocate a 1MB share, but that
1MB would bring the 1.4 usage over its quota, the request will be denied.
For this to be useful, the storage server must give each usage-limited
principal a separate account, and it needs to configure a size limit at the
same time as the authority string is minted. For a friendnet, the CLI "add
account" tool can do both at once:
tahoe server add-account --quota 5GB Alice
--> Please give the following authority string to "Alice", who should
provide it to the "tahoe add-authority" command
(authority string..)
This command will allocate an account identifier, add Alice to the "pet name
table" to associate it with the new account, and establish the 5GB sizelimit.
Both the sizelimit and the petname can be changed later.
Note that this restriction is independent for each server: some additional
mechanism must be used to provide a grid-wide restriction.
Also note that this restriction is not expressed in the authority string. It
is purely local to the storage server.
=== Attenuated Server Space Restriction ===
TODO (or not)
The server-side space restriction described above can only be applied by the
storage server, and cannot be attenuated by other delegates. Alice might be
allowed to use 5GB on this server, but she cannot use that restriction to
delegate, say, just 1GB to Amy.
Instead, Alice's sub-delegation should include a "server_size" restriction
key, which contains a size limit. The storage server will only honor a
request that uses this authority string if it does not cause the aggregate
usage of this authority string's account prefix to rise above the given size
limit.
Note that this will not enforce the desired restriction if the size limits
are not consistent across multiple delegated authorities for the same label.
For example, if Amy ends up with two delagations, A1 (which gives her a size
limit of 1GB) and A2 (which gives her 5GB), then she can consume 5GB despite
the limit in A1.
=== Other Restrictions ===
Many storage authority restrictions are meant for internal use by tahoe tools
as they delegate short-lived subauthorities to each other, and are not likely
to be set by end users.
* "SI": a storage index string. The authority can only be used to upload
shares of a single file.
* "serverid": a server identifier. The authority can only be used when
talking to a specific server
* "UEB_hash": a binary hash. The authority can only be used to upload shares
of a single file, identified by its share's contents. (note: this
restricton would require the server to parse the share and validate the
hash)
* "before": a timestamp. The authority is only valid until a specific time.
Requires synchronized clocks or a better definition of "timestamp".
* "delegate_to_furl": a string, used to acquire a FURL for an object that
contains the attenuated authority. When it comes time to actually use the
authority string to do something, this is the first step.
* "delegate_to_key": an ECDSA pubkey, used to grant attenuated authority to
a separate private key.
== User Experience ==
The process starts with Bob the storage server operator, who has just created
a new Storage Server:
tahoe create-node
--> creates ~/.tahoe
# edit ~/.tahoe/tahoe.cfg, add introducer.furl, configure storage, etc
Now Bob decides that he wants to let his friend Alice use 5GB of space on his
new server.
tahoe server add-account --quota=5GB Alice
--> Please give the following authority string to "Alice", who should
provide it to the "tahoe add-authority" command
(authority string XYZ..)
Bob copies the new authority string into an email message and sends it to
Alice. Meanwhile, Alice has created her own client, and attached it to the
same Introducer as Bob. When she gets the email, she pastes the authority
string into her local client:
tahoe client add-authority (authority string XYZ..)
--> new authority added: account (1)
Now all CLI commands that Alice runs with her node will take advantage of
Bob's space grant. Once Alice's node connects to Bob's, any upload which
needs to send a share to Bob's server will search her list of authorities to
find one that allows her to use Bob's server.
When Alice uses her WUI, upload will be disabled until and unless she pastes
one or more authority strings into a special "storage authority" box. TODO:
Once pasted, we'll use some trick to keep the authority around in a
convenient-yet-safe fashion.
When Alice uses her javascript-based web drive, the javascript program will
be launched with some trick to hand it the storage authorities, perhaps via a
fragment identifier (http://server/path#fragment).
If Alice decides that she wants Amy to have some space, she takes the
authority string that Bob gave her and uses it to create one for Amy:
tahoe authority dump (authority string XYZ..)
--> explanation of what is in XYZ
tahoe authority delegate --account 4,1 --space 2GB (authority string XYZ..)
--> (new authority string ABC..)
Alice sends the ABC string to Amy, who uses "tahoe client add-authority" to
start using it.
Later, Bob would like to find out how much space Alice is using. He brings up
his node's Storage Server Web Status page. In addition to the overall usage
numbers, the page will have a collapsible-treeview table with lines like:
AccountID Usage TotalUsage Petname
(1) 1.5GB 2.5GB Alice
+(1,4) 1.0GB 1.0GB ?
This indicates that Alice, as a whole, is using 2.5GB. It also indicates that
Alice has delegated some space to a (1,4) account, and that delegation has
used 1.0GB. Alice has used 1.5GB on her own, but is responsible for the full
2.5GB. If Alice tells Bob that the subaccount is for Amy, then Bob can assign
a pet name for (1,4) with "tahoe server add-pet-name 1,4 Amy". Note that Bob
is not aware of the 2GB limit that Alice has imposed upon Amy: the size
restriction may have appeared on all the requests that have showed up thus
far, but Bob has no way of being sure that a less-restrictive delgation
hasn't been created, so his UI does not attempt to remember or present the
restrictions it has seen before.
=== Friendnet ===
A "friendnet" is a set of nodes, each of which is both a storage server and a
client, each operated by a separate person, all of which have granted storage
rights to the others.
The simplest way to get a friendnet started is to simply grant storage
authority to everybody. "tahoe server enable-ambient-storage-authority" will
configure the storage server to give space to anyone who asks. This behaves
just like a 1.3.0 server, without accounting of any sort.
The next step is to restrict server use to just the participants. "tahoe
server disable-ambient-storage-authority" will undo the previous step, then
there are two basic approaches:
* "full mesh": each node grants authority directory to all the others.
First, agree upon a userid number for each participant (the value doesn't
matter, as long as it is unique). Each user should then use "tahoe server
add-account" for all the accounts (including themselves, if they want some
of their shares to land on their own machine), including a quota if they
wish to restrict individuals:
tahoe server add-account --account 1 --quota 5GB Alice
--> authority string for Alice
tahoe server add-account --account 2 --quota 5GB Bob
--> authority string for Bob
tahoe server add-account --account 3 --quota 5GB Carol
--> authority string for Carol
Then email Alice's string to Alice, Bob's string to Bob, etc. Once all
users have used "tahoe client add-authority" on everything, each server
will accept N distinct authorities, and each client will hold N distinct
authorities.
* "account manager": the group designates somebody to be the "AM", or
"account manager". The AM generates a keypair and publishes the public key
to all the participants, who create a local authority which delgates full
storage rights to the corresponding private key. The AM then delegates
account-restricted authority to each user, sending them their personal
authority string:
AM:
tahoe authority create-authority --write-private-to=private.txt
--> public.txt
# email public.txt to all members
AM:
tahoe authority delegate --from-file=private.txt --account 1 --quota 5GB
--> alice_authority.txt # email this to Alice
tahoe authority delegate --from-file=private.txt --account 2 --quota 5GB
--> bob_authority.txt # email this to Bob
tahoe authority delegate --from-file=private.txt --account 3 --quota 5GB
--> carol_authority.txt # email this to Carol
...
Alice:
# receives alice_authority.txt
tahoe client add-authority --from-file=alice_authority.txt
# receives public.txt
tahoe server add-authorization --from-file=public.txt
Bob:
# receives bob_authority.txt
tahoe client add-authority --from-file=bob_authority.txt
# receives public.txt
tahoe server add-authorization --from-file=public.txt
Carol:
# receives carol_authority.txt
tahoe client add-authority --from-file=carol_authority.txt
# receives public.txt
tahoe server add-authorization --from-file=public.txt
If the members want to see names next to their local usage totals, they
can set local petnames for the accounts:
tahoe server set-petname 1 Alice
tahoe server set-petname 2 Bob
tahoe server set-petname 3 Carol
Alternatively, the AM could provide a usage aggregator, which will collect
usage values from all the storage servers and show the totals in a single
place, and add the petnames to that display instead.
The AM gets more authority than anyone else (they can spoof everybody),
but each server has just a single authorization instead of N, and each
client has a single authority instead of N. When a new member joins the
group, the amount of work that must be done is significantly less, and
only two parties are involved instead of all N:
AM:
tahoe authority delegate --from-file=private.txt --account 4 --quota 5GB
--> dave_authority.txt # email this to Dave
Dave:
# receives dave_authority.txt
tahoe client add-authority --from-file=dave_authority.txt
# receives public.txt
tahoe server add-authorization --from-file=public.txt
Another approach is to let everybody be the AM: instead of keeping the
private.txt file secret, give it to all members of the group (but not to
outsiders). This lets current members bring new members into the group
without depending upon anybody else doing work. It also renders any notion
of enforced quotas meaningless, so it is only appropriate for actual
friends who are voluntarily refraining from spoofing each other.
=== Commercial Grid ===
A "commercial grid", like the one that allmydata.com manages as a for-profit
service, is characterized by a large number of independent clients (who do
not know each other), and by all of the storage servers being managed by a
single entity. In this case, we use an Account Manager like above, to
collapse the potential N*M explosion of authorities into something smaller.
We also create a dummy "parent" account, and give all the real clients
subaccounts under it, to give the operations personnel a convenient "total
space used" number. Each time a new customer joins, the AM is directed to
create a new authority for them, and the resulting string is provided to the
customer's client node.
AM:
tahoe authority create-authority --account 1 \
--write-private-to=AM-private.txt --write-public-to=AM-public.txt
Each time a new storage server is brought up:
SERVER:
tahoe server add-authorization --from-file=AM-public.txt
Each time a new client joins:
AM:
N = next_account++
tahoe authority delegate --from-file=AM-private.txt --account 1,N
--> new_client_authority.txt # give this to new client
== Programmatic Interfaces ==
The storage authority can be passed as a string in a single serialized form,
which is cut-and-pasteable and printable. It uses minimal punctuation, to
make it possible to include it as a URL query argument or HTTP header field
without requiring character-escaping.
Before passing it over HTTP, however, note that revealing the authority
string to someone is equivalent to irrevocably delegating all that authority
to them. While this is appropriate when transferring authority from, say, a
receptive storage server to your local agent, it is not appropriate when
using a foreign tahoe node, or when asking a Helper to upload a specific
file. Attenuations (see below) should be used to limit the delegated
authority in these cases.
In the programmatic web-API, any operation that consumes storage will accept
a storage-authority= query argument, the value of which will be the printable
form of an authority string. This includes all PUT operations, POST t=upload
and t=mkdir, and anything which creates a new file, creates a directory
(perhaps an intermediate one), or modifies a mutable file.
Alternatively, the authority string can also be passed through an HTTP
header. A single "X-Tahoe-Storage-Authority:" header can be used with the
printable authority string. If the string is too large to fit in a single
header, the application can provide a series of numbered
"X-Tahoe-Storage-Authority-1:", "X-Tahoe-Storage-Authority-2:", etc, headers,
and these will be sorted in alphabetical order (please use 08/09/10/11 rather
than 8/9/10/11), stripped of leading and trailing whitespace, and
concatenated. The HTTP header form can accomodate larger authority strings,
since these strings can grow too large to pass as a query argument
(especially when several delegations or attenuations are involved). However,
depending upon the HTTP client library being used, passing extra HTTP headers
may be more complicated than simply modifying the URL, and may be impossible
in some cases (such as javascript running in a web browser).
TODO: we may add a stored-token form of authority-passing to handle
environments in which query-args won't work and headers are not available.
This approach would use a special PUT which takes the authority string as the
HTTP body, and remembers it on the server side in associated with a
brief-but-unguessable token. Later operations would then use the authority by
passing a --storage-authority-token=XYZ query argument. These authorities
would expire after some period.
== Quota Management, Aggregation, Reporting ==
The storage server will maintain enough information to efficiently compute
usage totals for each account referenced in all of their leases, as well as
all their parent accounts. This information is used for several purposes:
* enforce server-space restrictions, by selectively rejecting storage
requests which would cause the account-usage-total to rise above the limit
specified in the enabling authorization string
* report individual account usage to the account-holder (if a client can
consume space under account A, they are also allowed to query usage for
account A or a subaccount).
* report individual account usage to the storage-server operator, possibly
associated with a pet name
* report usage for all accounts to the storage-server operator, possibly
associated with a pet name, in the form of a large table
* report usage for all accounts to an external aggregator
The external aggregator would take usage information from all the storage
servers in a single grid and sum them together, providing a grid-wide usage
number for each account. This could be used by e.g. clients in a commercial
grid to report overall-space-used to the end user.
There will be web-API URLs available for all of these reports.
TODO: storage servers might also have a mechanism to apply space-usage limits
to specific account ids directly, rather than requiring that these be
expressed only through authority-string limitation fields. This would let a
storage server operator revoke their space-allocation after delivering the
authority string.
== Low-Level Formats ==
This section describes the low-level formats used by the Accounting process,
beginning with the storage-authority data structure and working upwards. This
section is organized to follow the storage authority, starting from the point
of grant. The discussion will thus begin at the storage server (where the
authority is first created), work back to the client (which receives the
authority as a web-API argument), then follow the authority back to the
servers as it is used to enable specific storage operations. It will then
detail the accounting tables that the storage server is obligated to
maintain, and describe the interfaces through which these tables are accessed
by other parties.
=== Storage Authority ===
==== Terminology ====
Storage Authority is represented as a chain of certificates and a private
key. Each certificate authorizes and restricts a specific private key. The
initial certificate in the chain derives its authority by being placed in the
storage server's tahoe.cfg file (i.e. by being authorized by the storage
server operator). All subsequent certificates are signed by the authorized
private key that was identified in the previous certificate: they derive
their authority by delegation. Each certificate has restrictions which limit
the authority being delegated.
authority: ([cert[0], cert[1], cert[2] ...], privatekey)
The "restrictions dictionary" is a table which establishes an upper bound on
how this authority (or any attenuations thereof) may be used. It is
effectively a set of key-value pairs.
A "signing key" is an EC-DSA192 private key string, as supplied to the
pycryptopp SigningKey() constructor, and is 12 bytes long. A "verifying key"
is an EC-DSA192 public key string, as produced by pycryptopp, and is 24 bytes
long. A "key identifier" is a string which securely identifies a specific
signing/verifying keypair: for long RSA keys it would be a secure hash of the
public key, but since ECDSA192 keys are so short, we simply use the full
verifying key verbatim. A "key hint" is a variable-length prefix of the key
identifier, perhaps zero bytes long, used to help a recipient reduce the
number of verifying keys that it must search to find one that matches a
signed message.
==== Authority Chains ====
The authority chain consists of a list of certificates, each of which has a
serialized restrictions dictionary. Each dictionary will have a
"delegate-to-key" field, which delegates authority to a private key,
referenced with a key identifier. In addition, the non-initial certs are
signed, so they each contain a signature and a key hint:
cert[0]: serialized(restrictions_dictionary)
cert[1]: serialized(restrictions_dictionary), signature, keyhint
cert[2]: serialized(restrictions_dictionary), signature, keyhint
In this example, suppose cert[0] contains a delegate-to-key field that
identifies a keypair sign_A/verify_A. In this case, cert[1] will have a
signature that was made with sign_A, and the keyhint in cert[1] will
reference verify_A.
cert[0].restrictions[delegate-to-key] = A_keyid
cert[1].signature = SIGN(sign_A, serialized(cert[0].restrictions))
cert[1].keyhint = verify_A
cert[1].restrictions[delegate-to-key] = B_keyid
cert[2].signature = SIGN(sign_B, serialized(cert[1].restrictions))
cert[2].keyhint = verify_B
cert[2].restrictions[delete-to-key] = C_keyid
In this example, the full storage authority consists of the cert[0,1,2] chain
and the sign_C private key: anyone who is in possession of both will be able
to exert this authority. To wield the authority, a client will present the
cert[0,1,2] chain and an action message signed by sign_C; the server will
validate the chain and the signature before performing the requested action.
The only circumstances that might prompt the client to share the sign_C
private key with another party (including the server) would be if it wanted
to irrevocably share its full authority with that party.
==== Restriction Dictionaries ====
Within a restriction dictionary, the following keys are defined. Their full
meanings are defined later.
'accountid': an arbitrary-length sequence of integers >=0, restricting the
accounts which can be manipulated or used in leases
'SI': a storage index (binary string), controlling which file may be
manipulated
'serverid': binary string, limiting which server will accept requests
'UEB-hash': binary string, limiting the content of the file being manipulated
'before': timestamp (seconds since epoch), limits the lifetime of this
authority
'server-size': integer >0, maximum aggregate storage (in bytes) per account
'delegate-to-key': binary string (DSA pubkey identifier)
'furl-to': printable FURL string
==== Authority Serialization ====
There is only one form of serialization: a somewhat-compact URL-safe
cut-and-pasteable printable form. We are interested in minimizing the size of
the resulting authority, so rather than using a general-purpose (perhaps
JSON-based) serialization scheme, we use one that is specialized for this
task.
This URL-safe form will use minimal punctuation to avoid quoting issues when
used in a URL query argument. It would be nice to avoid word-breaking
characters that make cut-and-paste troublesome, however this is more
difficult because most non-alphanumeric characters are word-breaking in at
least one application.
The serialized storage authority as a whole contains a single version
identifier and magic number at the beginning. None of the internal components
contain redundant version numbers: they are implied by the container. If
components are serialized independently for other reasons, they may contain
version identifers in that form.
Signing keys (i.e. private keys) are URL-safe-serialized using Zooko's base62
alphabet, which offers almost the same density as standard base64 but without
any non-URL-safe or word-breaking characters. Since we used fixed-format keys
(EC-DSA, 192bit, with SHA256), the private keys are fixed-length (96 bits or
12 bytes), so there is no length indicator: all URL-safe-serialized signing
keys are 17 base62 characters long. The 192-bit verifying keys (i.e. public
keys) use the same approach: the URL-safe form is 33 characters long.
An account-id sequence (a variable-length sequence of non-negative numbers)
is serialized by representing each number in decimal ASCII, then joining the
pieces with commas. The string is terminated by the first non-[0-9,]
character encountered, which will either be the key-identifier letter of the
next field, or the dictionary-terminating character at the end.
2009-05-23 18:40:11 +00:00
Any single integral decimal number (such as the "before" timestamp field, or
the "server-size" field) is serialized as a variable-length sequence of ASCII
decimal digits, terminated by any non-digit.
The restrictions dictionary is serialized as a concatenated series of
key-identifier-letter / value string pairs, ending with the marker "E.". The
URL-safe form uses a single printable letter to indicate the which key is
being serialized. Each type of value string is serialized differently:
"A": accountid: variable-length sequence of comma-joned numbers
2009-05-23 18:40:11 +00:00
"I": storage index: fixed-length 26-character *base32*-encoded storage index
"P": server id (peer id): fixed-length 32-character *base32* encoded serverid
(matching the printable Tub.tubID string that Foolscap provides)
"U": UEB hash: fixed-length 43-character base62 encoded UEB hash
"B": before: variable-length sequence of decimal digits, seconds-since-epoch.
"S": server-size: variable-length sequence of decimal digits, max size in bytes
"D": delegate-to-key: ECDSA public key, 33 base62 characters.
"F": furl-to: variable-length FURL string, wrapped in a netstring:
"%d:%s," % (len(FURL), FURL). Note that this is rarely pasted.
"E.": end-of-dictionary marker
The ECDSA signature is serialized as a variable number of base62 characters,
terminated by a period. We expect the signature to be about 384 bits (48
bytes) long, or 65 base62 characters. A missing signature (such as for the
initial cert) is represented as a single period.
The key hint is serialized with a base62-encoded serialized hint string (a
byte-quantized prefix of the serialized public key), terminated by a period.
An empty hint would thus be serialized as a single period. For the current
design, we expect the key hint to be empty.
The full storage authority string consists of a certificate chain and a
delegate private key. Given the single-certificate serialization scheme
described above, the full authority is serialized as follows:
* version prefix: depends upon the application, but for storage-authority
2009-05-23 18:40:11 +00:00
chains this will be "sa0-", for Storage-Authority Version 0.
* serialized certificates, concatenated together
* serialized private key (to which the last certificate delegates authority)
Note that this serialization form does not have an explicit terminator, so
the environment must provide a length indicator or some other way to identify
the end of the authority string. The benefit of this approach is that the
full string will begin and end with alphanumeric characters, making
cut-and-paste easier (increasing the size of the mouse target: anywhere
within the final component will work).
Also note that the period is a reserved delimiter: it cannot appear in the
serialized restrictions dictionary. The parser can remove the version prefix,
split the rest on periods, and expect to see 3*k+1 fields, consisting of k
(restriction-dictionary,signature,keyhint) 3-tuples and a single private key
at the end.
Some examples:
2009-05-23 18:40:11 +00:00
(example A)
cert[0] delegates account 1,4 to (pubkey ZlFA / privkey 1f2S):
2009-05-23 18:40:11 +00:00
sa0-A1,4D2lFA6LboL2xx0ldQH2K1TdSrwuqMMiME3E...1f2SI9UJPXvb7vdJ1
2009-05-23 18:40:11 +00:00
(example B)
cert[0] delegates account 1,4 to ZlFA/1f2S
cert[1] subdelegates 5GB and subaccount 1,4,7 to pubkey 0BPo/06rt:
2009-05-23 18:40:11 +00:00
sa0-A1,4D2lFA6LboL2xx0ldQH2K1TdSrwuqMMiME3E...A1,4,7S5000000000D0BPoGxJ3M4KWrmdpLnknhJABrWip5e9kPE,7cyhQvv5axdeihmOzIHjs85TcUIYiWHdsxNz50GTerEOR5ucj2TITPXxyaCUli1oF...06rtcPQotR3q4f2cT
== Problems ==
Problems which have thus far been identified with this approach:
* allowing arbitrary subaccount generation will permit a DoS attack, in
which an authorized uploader consumes lots of DB space by creating an
unbounded number of randomly-generated subaccount identifiers. OTOH, they
can already attach an unbounded number of leases to any file they like,
consuming a lot of space.