mirror of
https://github.com/tahoe-lafs/tahoe-lafs.git
synced 2025-01-13 16:29:51 +00:00
713 lines
35 KiB
Plaintext
= Accounting =

"Accounting" is the arena of the Tahoe system that concerns measuring,
controlling, and enabling the ability to upload and download files, and to
create new directories. In contrast with the capability-based access control
model, which dictates how specific files and directories may or may not be
manipulated, Accounting is concerned with resource consumption: how much disk
space a given person/account/entity can use.

The 1.3.0 and earlier releases have a nearly-unbounded resource usage model.
Anybody who can talk to the Introducer gets to talk to all the Storage
Servers, and anyone who can talk to a Storage Server gets to use as much disk
space as they want (up to the reserved_space= limit imposed by the server,
which affects all users equally). Not only is the per-user space usage
unlimited, it is also unmeasured: the owner of the Storage Server has no way
to find out how much space Alice or Bob is using.

The goals of the Accounting system are thus:

* allow the owner of a storage server to control who gets to use disk space,
  with separate limits per user
* allow both the server owner and the user to measure how much space the user
  is consuming, in an efficient manner
* provide grid-wide aggregation tools, so a set of cooperating server
  operators can easily measure how much a given user is consuming across all
  servers. This information should also be available to the user in question.

For the purposes of this document, the terms "Account" and "User" are mostly
interchangeable. The fundamental unit of Accounting is the "Account", in that
usage measurement and quota enforcement are performed separately for each
account. These accounts might correspond to individual human users, or they
might be shared among a group, or a user might have an arbitrary number of
accounts.

Accounting interacts with Garbage Collection. To protect their shares from
GC, clients maintain limited-duration leases on those shares: when the last
lease expires, the share is deleted. Each lease has a "label", which
indicates the account or user which wants to keep the share alive. A given
account's "usage" (their per-server aggregate usage) is simply the sum of the
sizes of all shares on which they hold a lease. The storage server may limit
the user to a fixed "quota" (an upper bound on their usage). To keep a file
alive, the user must be willing to use up some of their quota. A popular file
might have leases from multiple users, in which case one user might take a
chance and decline to add their own lease, saving some of their quota and
hoping that the other leases continue to keep the file alive despite their
personal unwillingness to contribute to the effort.

== Authority Flow ==

The authority to consume space on the storage server originates, of course,
with the storage server operator. These operators start with complete control
over their space, and delegate portions of it to others: either directly to
clients who want to upload files, or to intermediaries who can then delegate
attenuated authority onwards. The operators have various reasons for wanting
to share their space: monetary consideration, expectations of in-kind
exchange, or simple generosity. But the first and final authority rests with
them.

The server operator grants restricted authority over their space by
configuring their server to accept requests that demonstrate knowledge of
certain secrets. They then share those secrets with the client who intends to
use this space, or an intermediary who will generate still more secrets and
share those with the client. Eventually, an upload or create-directory
operation will be performed that needs this authority. Part of the operation
will involve proving knowledge of the secret to the storage server, and the
server will require this proof before accepting the uploaded share or adding
a new lease.

The authority is expressed as a string, containing cryptographically-signed
messages and keys. The string also contains "restrictions", which are
annotations that explain the limits imposed upon this authority, either by
the original grantor (the storage server operator) or by one of the
intermediaries. Authority can be reduced but not increased. Any holder of a
given authority can delegate some or all of it to another party.

The authority string may be short enough to include as an argument to a CLI
command (--with-authority ABCDE), or it may be long enough that it must be
stashed in a file and referenced in some other fashion (--with-authority-file
~/.my_authority). There are CLI tools to create brand new authority strings,
to derive attenuated authorities from an existing one, and to explain the
contents of an authority string. These authority strings can be shared with
others just like filecaps and dircaps: knowledge of the authority string is
both necessary and sufficient to wield the authority it represents.

webapi requests will include the authority necessary to complete the
operation. When used by a CLI tool, the authority is likely to come from
~/.tahoe/private/authority (i.e. it is ambient to the user who has access to
that node, just like aliases provide similar access to a specific "root
directory"). When used by the browser-oriented WUI, the authority will [TODO]
somehow be retained on each page in a way that minimizes the risk of CSRF
attacks and allows safe sharing (cut-and-paste of a URL without sharing the
storage authority too). The client node receiving the webapi request will
extract the authority string from the request and use it to build the storage
server messages that it uses to fulfill the request.

== Definition Of Authority ==

The term "authority" is used here somewhat casually: in the object-capability
world, the word refers to the ability of some principal to cause some action
to occur, whether because they can do it themselves, or because they can
convince some other principal to do it for them. In Tahoe terms, "storage
authority" is the ability to do one of the following actions:

* upload a new share, thus consuming storage space
* add a new lease to a share, thus preventing space from being reclaimed
* modify an existing mutable share, potentially increasing the space consumed

The Accounting effort may involve other kinds of authority that get limited
in a similar manner as storage authority, like the ability to download a
share: things that may consume CPU time, disk bandwidth, or other limited
resources. There is also the authority to renew or cancel a lease, which may
be controlled in a similar fashion.

Storage authority, as granted from a server operator to a client, is not
simply a binary "use space or not" grant. Instead, it is parameterized by a
number of "restrictions". The most important of these restrictions (with
respect to the goals of Accounting) is the "Account Label".

=== Account Labels ===

A Tahoe "Account" is defined by a variable-length sequence of small integers
(they are not required to be small: the actual limit is 2**64, but neither
are they required to be unguessable). These accounts are arranged in a
hierarchy: the account identifier (1,4) is considered to be a "parent" of
(1,4,2). There is no relationship between the values used by unrelated
accounts: (1,4) is unrelated to (2,4), despite both coincidentally using a
"4" in the second element.

Each lease has a label, which contains the Account identifier. The storage
server maintains an aggregate size count for each label prefix: when asked
about account (1,4), it will report the amount of space used by shares
labeled (1,4), (1,4,2), (1,4,7), (1,4,7,8), etc. (but *not* (1) or (1,5)).
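
The prefix aggregation described above can be sketched in a few lines of
Python. This is only an illustration: the lease table layout and the
usage_for() name are assumptions made for the example, not part of any Tahoe
interface, and the sketch assumes each account holds at most one lease per
share.

```python
# Hypothetical lease table: (account label, share size in bytes).
# Labels are tuples of integers, e.g. (1, 4) for account (1,4).
def usage_for(prefix, leases):
    """Sum the sizes of all leases whose label starts with `prefix`."""
    n = len(prefix)
    return sum(size for label, size in leases if label[:n] == prefix)

leases = [
    ((1,), 100),
    ((1, 4), 200),
    ((1, 4, 2), 300),
    ((1, 4, 7, 8), 400),
    ((1, 5), 500),
    ((2, 4), 600),
]

# (1,4) covers (1,4), (1,4,2) and (1,4,7,8), but not (1) or (1,5).
print(usage_for((1, 4), leases))  # 900
```

A real server would presumably keep running totals per prefix rather than
rescanning every lease, since the totals are meant to be cheap to query.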

The "Account Label" restriction allows a client to apply any label it wants,
as long as that label begins with a specific prefix. If account (1) is
associated with Alice, then Alice will receive a storage authority string
that contains a "must start with (1)" restriction, enabling her to use
storage space but obligating her to lease her shares with a label that can be
traced back to her. She can delegate part of her authority to others (perhaps
with other non-label restrictions, such as a space restriction or time limit)
with or without an additional label restriction. For example, she might
delegate some of her authority to her friend Amy, with a (1,4) label
restriction. Amy could then create labels with (1,4) or (1,4,7), but she
could not create labels with the same (1) identifier that Alice can use, nor
could she create labels with (1,5) (which Alice might have given to her other
friend Annette). The storage server operator can ask about the usage of (1)
to find out how much Alice is responsible for (which includes the space that
she has delegated to Amy and Annette), and none of the A-users can avoid
being counted in this total. But Alice can ask the storage server about the
usage of (1,4) to find out how much Amy has taken advantage of her gift.
Likewise, Alice has control over any lease with a label that begins with (1),
so she can cancel Amy's leases and free the space they were consuming. If
this seems surprising, consider that the storage server operator considered
Alice to be responsible for that space anyway: with great responsibility
(for space consumed) comes great power (to stop consuming that space).
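
As a rough sketch of the "must start with" rule, the check a server might
apply to a requested lease label, and the matching rule for delegation, could
look like this. The function names are hypothetical and chosen only for
illustration.

```python
def label_allowed(restriction_prefix, label):
    """True if `label` begins with the prefix named in the restriction."""
    return label[:len(restriction_prefix)] == restriction_prefix

def may_delegate(parent_prefix, child_prefix):
    """A delegation may keep or lengthen the prefix, never shorten or change it."""
    return label_allowed(parent_prefix, child_prefix)

ALICE, AMY, ANNETTE = (1,), (1, 4), (1, 5)

print(label_allowed(AMY, (1, 4, 7)))  # True: Amy may label shares (1,4,7)
print(label_allowed(AMY, ALICE))      # False: Amy may not use Alice's (1)
print(label_allowed(AMY, ANNETTE))    # False: nor Annette's (1,5)
print(may_delegate(ALICE, AMY))       # True: Alice can hand out (1,4)
```

Because Alice's prefix (1) matches every label Amy or Annette can create,
the same test also explains why Alice can query or cancel their leases.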

=== Server Space Restriction ===

The storage server's basic control over space usage (apart from the
binary use-it-or-not authority granted by handing out an authority string at
all) is implemented by keeping track of the space used by any given account
identifier. If account (1,4) sends a request to allocate a 1MB share, but
that 1MB would bring the (1,4) usage over its quota, the request will be
denied.
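
A minimal sketch of this check follows. The AccountTable class and its
methods are hypothetical. The sketch also assumes that the quotas of parent
accounts are checked along with the quota of the requesting account, which
this document does not state explicitly.

```python
class QuotaExceeded(Exception):
    pass

class AccountTable:
    """Hypothetical per-server accounting table."""

    def __init__(self):
        self.quota = {}   # account label (tuple) -> limit in bytes
        self.leases = []  # (account label, share size in bytes)

    def usage(self, prefix):
        return sum(size for label, size in self.leases
                   if label[:len(prefix)] == prefix)

    def allocate(self, label, size):
        # Assumption: check the label itself and each of its parents.
        for n in range(1, len(label) + 1):
            prefix = label[:n]
            limit = self.quota.get(prefix)
            if limit is not None and self.usage(prefix) + size > limit:
                raise QuotaExceeded(prefix)
        self.leases.append((label, size))

MB = 1000 * 1000
table = AccountTable()
table.quota[(1, 4)] = 2 * MB
table.allocate((1, 4), 1 * MB)
table.allocate((1, 4), 1 * MB)      # exactly at quota: accepted
try:
    table.allocate((1, 4), 1 * MB)  # would exceed the quota
except QuotaExceeded:
    print("denied")
```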

For this to be useful, the storage server must give each usage-limited
principal a separate account, and it needs to configure a size limit at the
same time as the authority string is minted. For a friendnet, the CLI "add
account" tool can do both at once:

  tahoe server add-account --quota 5GB Alice
  --> Please give the following authority string to "Alice", who should
      provide it to the "tahoe add-authority" command
      (authority string..)

This command will allocate an account identifier, add Alice to the "pet name
table" to associate it with the new account, and establish the 5GB sizelimit.
Both the sizelimit and the petname can be changed later.

Note that this restriction is independent for each server: some additional
mechanism must be used to provide a grid-wide restriction.

Also note that this restriction is not expressed in the authority string. It
is purely local to the storage server.

=== Attenuated Server Space Restriction ===

TODO (or not)

The server-side space restriction described above can only be applied by the
storage server, and cannot be attenuated by other delegates. Alice might be
allowed to use 5GB on this server, but she cannot use that restriction to
delegate, say, just 1GB to Amy.

Instead, Alice's sub-delegation should include a "server_size" restriction
key, which contains a size limit. The storage server will only honor a
request that uses this authority string if it does not cause the aggregate
usage of this authority string's account prefix to rise above the given size
limit.

Note that this will not enforce the desired restriction if the size limits
are not consistent across multiple delegated authorities for the same label.
For example, if Amy ends up with two delegations, A1 (which gives her a size
limit of 1GB) and A2 (which gives her 5GB), then she can consume 5GB despite
the limit in A1.
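
The A1/A2 problem can be shown with a small sketch. The request_allowed()
helper is hypothetical: it models only the per-request comparison against
the "server_size" value carried by whichever authority string is presented.

```python
GB = 1000 ** 3

def request_allowed(server_size, current_usage, request_size):
    """Honor a request only if usage stays within this authority's limit."""
    return current_usage + request_size <= server_size

A1, A2 = 1 * GB, 5 * GB  # two delegations for the same (1,4) prefix
usage = 1 * GB           # Amy has already filled the A1 allowance

print(request_allowed(A1, usage, 1 * GB))  # False: A1 refuses
print(request_allowed(A2, usage, 1 * GB))  # True: A2 still accepts
```

Since the server sees only the limit attached to the presented authority, it
has no basis for preferring the smaller of the two.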

=== Other Restrictions ===

Many storage authority restrictions are meant for internal use by tahoe tools
as they delegate short-lived subauthorities to each other, and are not likely
to be set by end users.

* "SI": a storage index string. The authority can only be used to upload
  shares of a single file.
* "serverid": a server identifier. The authority can only be used when
  talking to a specific server.
* "UEB_hash": a binary hash. The authority can only be used to upload shares
  of a single file, identified by its share's contents. (note: this
  restriction would require the server to parse the share and validate the
  hash)
* "before": a timestamp. The authority is only valid until a specific time.
  Requires synchronized clocks or a better definition of "timestamp".
* "delegate_to_furl": a string, used to acquire a FURL for an object that
  contains the attenuated authority. When it comes time to actually use the
  authority string to do something, this is the first step.
* "delegate_to_key": an ECDSA pubkey, used to grant attenuated authority to
  a separate private key.

== User Experience ==

The process starts with Bob the storage server operator, who has just created
a new Storage Server:

  tahoe create-client
  --> creates ~/.tahoe
  # edit ~/.tahoe/tahoe.cfg, add introducer.furl, configure storage, etc

Now Bob decides that he wants to let his friend Alice use 5GB of space on his
new server.

  tahoe server add-account --quota=5GB Alice
  --> Please give the following authority string to "Alice", who should
      provide it to the "tahoe add-authority" command
      (authority string XYZ..)

Bob copies the new authority string into an email message and sends it to
Alice. Meanwhile, Alice has created her own client, and attached it to the
same Introducer as Bob. When she gets the email, she pastes the authority
string into her local client:

  tahoe client add-authority (authority string XYZ..)
  --> new authority added: account (1)

Now all CLI commands that Alice runs with her node will take advantage of
Bob's space grant. Once Alice's node connects to Bob's, any upload which
needs to send a share to Bob's server will search her list of authorities to
find one that allows her to use Bob's server.

When Alice uses her WUI, upload will be disabled until and unless she pastes
one or more authority strings into a special "storage authority" box. TODO:
Once pasted, we'll use some trick to keep the authority around in a
convenient-yet-safe fashion.

When Alice uses her javascript-based web drive, the javascript program will
be launched with some trick to hand it the storage authorities, perhaps via a
fragment identifier (http://server/path#fragment).

If Alice decides that she wants Amy to have some space, she takes the
authority string that Bob gave her and uses it to create one for Amy:

  tahoe authority dump (authority string XYZ..)
  --> explanation of what is in XYZ
  tahoe authority delegate --account 1,4 --space 2GB (authority string XYZ..)
  --> (new authority string ABC..)

Alice sends the ABC string to Amy, who uses "tahoe client add-authority" to
start using it.

Later, Bob would like to find out how much space Alice is using. He brings up
his node's Storage Server Web Status page. In addition to the overall usage
numbers, the page will have a collapsible-treeview table with lines like:

  AccountID   Usage   TotalUsage   Petname
  (1)         1.5GB   2.5GB        Alice
  +(1,4)      1.0GB   1.0GB        ?

This indicates that Alice, as a whole, is using 2.5GB. It also indicates that
Alice has delegated some space to a (1,4) account, and that delegation has
used 1.0GB. Alice has used 1.5GB on her own, but is responsible for the full
2.5GB. If Alice tells Bob that the subaccount is for Amy, then Bob can assign
a pet name for (1,4) with "tahoe server add-pet-name 1,4 Amy". Note that Bob
is not aware of the 2GB limit that Alice has imposed upon Amy: the size
restriction may have appeared on all the requests that have shown up thus
far, but Bob has no way of being sure that a less-restrictive delegation
hasn't been created, so his UI does not attempt to remember or present the
restrictions it has seen before.
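
The two usage columns can be derived from the lease labels alone. The sketch
below is an illustration with a hypothetical usage_rows() helper: "Usage"
counts only leases carrying exactly that label, while "TotalUsage" also
includes every subaccount. Sizes are in MB to keep the arithmetic exact.

```python
def usage_rows(leases):
    """Return (label, own usage, total usage) for each label holding leases."""
    own = {}
    for label, size in leases:
        own[label] = own.get(label, 0) + size
    rows = []
    for label in sorted(own):
        total = sum(size for other, size in own.items()
                    if other[:len(label)] == label)
        rows.append((label, own[label], total))
    return rows

# Alice holds 1500 MB under (1); Amy holds 1000 MB under (1,4).
for row in usage_rows([((1,), 1500), ((1, 4), 1000)]):
    print(row)
# ((1,), 1500, 2500)
# ((1, 4), 1000, 1000)
```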

=== Friendnet ===

A "friendnet" is a set of nodes, each of which is both a storage server and a
client, each operated by a separate person, all of which have granted storage
rights to the others.

The simplest way to get a friendnet started is to simply grant storage
authority to everybody. "tahoe server enable-ambient-storage-authority" will
configure the storage server to give space to anyone who asks. This behaves
just like a 1.3.0 server, without accounting of any sort.

The next step is to restrict server use to just the participants. "tahoe
server disable-ambient-storage-authority" will undo the previous step, then
there are two basic approaches:

* "full mesh": each node grants authority directly to all the others.
  First, agree upon a userid number for each participant (the value doesn't
  matter, as long as it is unique). Each user should then use "tahoe server
  add-account" for all the accounts (including themselves, if they want some
  of their shares to land on their own machine), including a quota if they
  wish to restrict individuals:

    tahoe server add-account --account 1 --quota 5GB Alice
    --> authority string for Alice
    tahoe server add-account --account 2 --quota 5GB Bob
    --> authority string for Bob
    tahoe server add-account --account 3 --quota 5GB Carol
    --> authority string for Carol

  Then email Alice's string to Alice, Bob's string to Bob, etc. Once all
  users have used "tahoe client add-authority" on everything, each server
  will accept N distinct authorities, and each client will hold N distinct
  authorities.

* "account manager": the group designates somebody to be the "AM", or
  "account manager". The AM generates a keypair and publishes the public key
  to all the participants, who create a local authority which delegates full
  storage rights to the corresponding private key. The AM then delegates
  account-restricted authority to each user, sending them their personal
  authority string:

    AM:
      tahoe authority create-authority --write-private-to=private.txt
      --> public.txt
      # email public.txt to all members
    AM:
      tahoe authority delegate --from-file=private.txt --account 1 --quota 5GB
      --> alice_authority.txt # email this to Alice
      tahoe authority delegate --from-file=private.txt --account 2 --quota 5GB
      --> bob_authority.txt # email this to Bob
      tahoe authority delegate --from-file=private.txt --account 3 --quota 5GB
      --> carol_authority.txt # email this to Carol
      ...
    Alice:
      # receives alice_authority.txt
      tahoe client add-authority --from-file=alice_authority.txt
      # receives public.txt
      tahoe server add-authorization --from-file=public.txt
    Bob:
      # receives bob_authority.txt
      tahoe client add-authority --from-file=bob_authority.txt
      # receives public.txt
      tahoe server add-authorization --from-file=public.txt
    Carol:
      # receives carol_authority.txt
      tahoe client add-authority --from-file=carol_authority.txt
      # receives public.txt
      tahoe server add-authorization --from-file=public.txt

  If the members want to see names next to their local usage totals, they
  can set local petnames for the accounts:

    tahoe server set-petname 1 Alice
    tahoe server set-petname 2 Bob
    tahoe server set-petname 3 Carol

  Alternatively, the AM could provide a usage aggregator, which will collect
  usage values from all the storage servers and show the totals in a single
  place, and add the petnames to that display instead.

  The AM gets more authority than anyone else (they can spoof everybody),
  but each server has just a single authorization instead of N, and each
  client has a single authority instead of N. When a new member joins the
  group, the amount of work that must be done is significantly less, and
  only two parties are involved instead of all N:

    AM:
      tahoe authority delegate --from-file=private.txt --account 4 --quota 5GB
      --> dave_authority.txt # email this to Dave
    Dave:
      # receives dave_authority.txt
      tahoe client add-authority --from-file=dave_authority.txt
      # receives public.txt
      tahoe server add-authorization --from-file=public.txt

  Another approach is to let everybody be the AM: instead of keeping the
  private.txt file secret, give it to all members of the group (but not to
  outsiders). This lets current members bring new members into the group
  without depending upon anybody else doing work. It also renders any notion
  of enforced quotas meaningless, so it is only appropriate for actual
  friends who are voluntarily refraining from spoofing each other.

=== Commercial Grid ===

A "commercial grid", like the one that allmydata.com manages as a for-profit
service, is characterized by a large number of independent clients (who do
not know each other), and by all of the storage servers being managed by a
single entity. In this case, we use an Account Manager like above, to
collapse the potential N*M explosion of authorities into something smaller.
We also create a dummy "parent" account, and give all the real clients
subaccounts under it, to give the operations personnel a convenient "total
space used" number. Each time a new customer joins, the AM is directed to
create a new authority for them, and the resulting string is provided to the
customer's client node.

  AM:
    tahoe authority create-authority --account 1 \
      --write-private-to=AM-private.txt --write-public-to=AM-public.txt

Each time a new storage server is brought up:

  SERVER:
    tahoe server add-authorization --from-file=AM-public.txt

Each time a new client joins:

  AM:
    N = next_account++
    tahoe authority delegate --from-file=AM-private.txt --account 1,N
    --> new_client_authority.txt # give this to new client

== Programmatic Interfaces ==

The storage authority can be passed as a string in a single serialized form,
which is cut-and-pasteable and printable. It uses minimal punctuation, to
make it possible to include it as a URL query argument or HTTP header field
without requiring character-escaping.

Before passing it over HTTP, however, note that revealing the authority
string to someone is equivalent to irrevocably delegating all that authority
to them. While this is appropriate when transferring authority from, say, a
receptive storage server to your local agent, it is not appropriate when
using a foreign tahoe node, or when asking a Helper to upload a specific
file. Attenuations (see below) should be used to limit the delegated
authority in these cases.

In the programmatic webapi interface (colloquially known as the "WAPI"), any
operation that consumes storage will accept a storage-authority= query
argument, the value of which will be the printable form of an authority
string. This includes all PUT operations, POST t=upload and t=mkdir, and
anything which creates a new file, creates a directory (perhaps an
intermediate one), or modifies a mutable file.
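
A client-side sketch of attaching the proposed query argument is shown
below, using only the Python standard library. The helper name, the node URL
and the "ABCDE" placeholder are assumptions for illustration. A real
authority string would be much longer, and storage-authority= is a proposal
in this document rather than an existing webapi argument.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_storage_authority(url, authority):
    """Return `url` with a storage-authority= query argument appended."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("storage-authority", authority))
    return urlunsplit(parts._replace(query=urlencode(query)))

# Hypothetical local node URL and placeholder authority string.
print(with_storage_authority("http://127.0.0.1:3456/uri?t=mkdir", "ABCDE"))
# http://127.0.0.1:3456/uri?t=mkdir&storage-authority=ABCDE
```

urlencode() escapes any character that is not URL-safe, so the sketch also
works if the serialized form turns out to need escaping after all.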

Alternatively, the authority string can also be passed through an HTTP
header. A single "X-Tahoe-Storage-Authority:" header can be used with the
printable authority string. If the string is too large to fit in a single
header, the application can provide a series of numbered
"X-Tahoe-Storage-Authority-1:", "X-Tahoe-Storage-Authority-2:", etc, headers,
and these will be sorted in alphabetical order (please use 08/09/10/11 rather
than 8/9/10/11), stripped of leading and trailing whitespace, and
concatenated. The HTTP header form can accommodate larger authority strings,
since these strings can grow too large to pass as a query argument
(especially when several delegations or attenuations are involved). However,
depending upon the HTTP client library being used, passing extra HTTP headers
may be more complicated than simply modifying the URL, and may be impossible
in some cases (such as javascript running in a web browser).
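
The sort, strip and concatenate rule for numbered headers might be
implemented along these lines. The function name and the plain-dict
representation of the request headers are assumptions. Header names are
lowercased first because HTTP header names are case-insensitive.

```python
PREFIX = "x-tahoe-storage-authority"

def authority_from_headers(headers):
    """Rebuild the authority string from a dict of header name -> value."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if PREFIX in lowered:
        return lowered[PREFIX].strip()
    # Alphabetical sort of the numbered header names, as described above.
    numbered = sorted(n for n in lowered if n.startswith(PREFIX + "-"))
    if not numbered:
        return None
    return "".join(lowered[name].strip() for name in numbered)

print(authority_from_headers({
    "X-Tahoe-Storage-Authority-02": "CD ",
    "X-Tahoe-Storage-Authority-01": " AB",
}))  # ABCD

# Without zero-padding, "-10" sorts before "-8" and "-9":
print(authority_from_headers({
    "X-Tahoe-Storage-Authority-8": "a",
    "X-Tahoe-Storage-Authority-9": "b",
    "X-Tahoe-Storage-Authority-10": "c",
}))  # cab
```

The second call shows why the text asks for 08/09/10/11: a purely
alphabetical sort puts the pieces in the wrong order otherwise.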

TODO: we may add a stored-token form of authority-passing to handle
environments in which query-args won't work and headers are not available.
This approach would use a special PUT which takes the authority string as the
HTTP body, and remembers it on the server side in association with a
brief-but-unguessable token. Later operations would then use the authority by
passing a storage-authority-token=XYZ query argument. These authorities
would expire after some period.

== Quota Management, Aggregation, Reporting ==

The storage server will maintain enough information to efficiently compute
usage totals for each account referenced in all of their leases, as well as
all their parent accounts. This information is used for several purposes:

* enforce server-space restrictions, by selectively rejecting storage
  requests which would cause the account-usage-total to rise above the limit
  specified in the enabling authorization string
* report individual account usage to the account-holder (if a client can
  consume space under account A, they are also allowed to query usage for
  account A or a subaccount).
* report individual account usage to the storage-server operator, possibly
  associated with a pet name
* report usage for all accounts to the storage-server operator, possibly
  associated with a pet name, in the form of a large table
* report usage for all accounts to an external aggregator

The external aggregator would take usage information from all the storage
servers in a single grid and sum them together, providing a grid-wide usage
number for each account. This could be used by e.g. clients in a commercial
grid to report overall-space-used to the end user.

There will be webapi URLs available for all of these reports.

TODO: storage servers might also have a mechanism to apply space-usage limits
to specific account ids directly, rather than requiring that these be
expressed only through authority-string limitation fields. This would let a
storage server operator revoke their space-allocation after delivering the
authority string.
|
||
|
|
||
|
== Low-Level Formats ==
|
||
|
|
||
|
This section describes the low-level formats used by the Accounting process,
|
||
|
beginning with the storage-authority data structure and working upwards. This
|
||
|
section is organized to follow the storage authority, starting from the point
|
||
|
of grant. The discussion will thus begin at the storage server (where the
|
||
|
authority is first created), work back to the client (which receives the
|
||
|
authority as a webapi argument), then follow the authority back to the
|
||
|
servers as it is used to enable specific storage operations. It will then
|
||
|
detail the accounting tables that the storage server is obligated to
|
||
|
maintain, and describe the interfaces through which these tables are accessed
|
||
|
by other parties.
|
||
|
|
||
|
=== Storage Authority ===
|
||
|
|
||
|
==== Terminology ====
|
||
|
|
||
|
Storage Authority is represented as a chain of certificates and a private
|
||
|
key. Each certificate authorizes and restricts a specific private key. The
|
||
|
initial certificate in the chain derives its authority by being placed in the
|
||
|
storage server's tahoe.cfg file (i.e. by being authorized by the storage
|
||
|
server operator). All subsequent certificates are signed by the authorized
|
||
|
private key that was identified in the previous certificate: they derive
|
||
|
their authority by delegation. Each certificate has restrictions which limit
|
||
|
the authority being delegated.
|
||
|
|
||
|
authority: ([cert[0], cert[1], cert[2] ...], privatekey)
|
||
|
|
||
|
The "restrictions dictionary" is a table which establishes an upper bound on
|
||
|
how this authority (or any attenuations thereof) may be used. It is
|
||
|
effectively a set of key-value pairs.
|
||
|
|
||
|
A "signing key" is an EC-DSA192 private key string, as supplied to the
|
||
|
pycryptopp SigningKey() constructor, and is 12 bytes long. A "verifying key"
|
||
|
is an EC-DSA192 public key string, as produced by pycryptopp, and is 24 bytes
|
||
|
long. A "key identifier" is a string which securely identifies a specific
|
||
|
signing/verifying keypair: for long RSA keys it would be a secure hash of the
|
||
|
public key, but since ECDSA192 keys are so short, we simply use the full
|
||
|
verifying key verbatim. A "key hint" is a variable-length prefix of the key
|
||
|
identifier, perhaps zero bytes long, used to help a recipient reduce the
|
||
|
number of verifying keys that it must search to find one that matches a
|
||
|
signed message.
|
||
|
|
||
|
==== Authority Chains ====

The authority chain consists of a list of certificates, each of which has a
serialized restrictions dictionary. Each dictionary will have a
"delegate-to-key" field, which delegates authority to a private key,
referenced with a key identifier. In addition, the non-initial certs are
signed, so they each contain a signature and a key hint:

cert[0]: serialized(restrictions_dictionary)
cert[1]: serialized(restrictions_dictionary), signature, keyhint
cert[2]: serialized(restrictions_dictionary), signature, keyhint

In this example, suppose cert[0] contains a delegate-to-key field that
identifies a keypair sign_A/verify_A. In this case, cert[1] will have a
signature that was made with sign_A, and the keyhint in cert[1] will
reference verify_A.

cert[0].restrictions[delegate-to-key] = A_keyid

cert[1].signature = SIGN(sign_A, serialized(cert[0].restrictions))
cert[1].keyhint = verify_A
cert[1].restrictions[delegate-to-key] = B_keyid

cert[2].signature = SIGN(sign_B, serialized(cert[1].restrictions))
cert[2].keyhint = verify_B
cert[2].restrictions[delegate-to-key] = C_keyid

In this example, the full storage authority consists of the cert[0,1,2] chain
and the sign_C private key: anyone who is in possession of both will be able
to exert this authority. To wield the authority, a client will present the
cert[0,1,2] chain and an action message signed by sign_C; the server will
validate the chain and the signature before performing the requested action.
The only circumstance that might prompt the client to share the sign_C
private key with another party (including the server) would be if it wanted
to irrevocably share its full authority with that party.

==== Restriction Dictionaries ====

Within a restriction dictionary, the following keys are defined. Their full
meanings are defined later.

'accountid': an arbitrary-length sequence of integers >=0, restricting the
  accounts which can be manipulated or used in leases
'SI': a storage index (binary string), controlling which file may be
  manipulated
'serverid': binary string, limiting which server will accept requests
'UEB-hash': binary string, limiting the content of the file being manipulated
'before': timestamp (seconds since epoch), limits the lifetime of this
  authority
'server-size': integer >0, maximum aggregate storage (in bytes) per account
'delegate-to-key': binary string (ECDSA pubkey identifier)
'furl-to': printable FURL string

==== Authority Serialization ====

There is only one form of serialization: a somewhat-compact URL-safe
cut-and-pasteable printable form. We are interested in minimizing the size of
the resulting authority, so rather than using a general-purpose (perhaps
JSON-based) serialization scheme, we use one that is specialized for this
task.

This URL-safe form will use minimal punctuation to avoid quoting issues when
used in a URL query argument. It would be nice to avoid word-breaking
characters that make cut-and-paste troublesome; however, this is more
difficult because most non-alphanumeric characters are word-breaking in at
least one application.

The serialized storage authority as a whole contains a single version
identifier and magic number at the beginning. None of the internal components
contain redundant version numbers: they are implied by the container. If
components are serialized independently for other reasons, they may contain
version identifiers in that form.

Signing keys (i.e. private keys) are URL-safe-serialized using Zooko's base62
alphabet, which offers almost the same density as standard base64 but without
any non-URL-safe or word-breaking characters. Since we use fixed-format keys
(EC-DSA, 192bit, with SHA256), the private keys are fixed-length (96 bits or
12 bytes), so there is no length indicator: all URL-safe-serialized signing
keys are 17 base62 characters long. The 192-bit verifying keys (i.e. public
keys) use the same approach: the URL-safe form is 33 characters long.
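
The character counts quoted in this section follow from the size of the
base62 alphabet. As a rough check, the sketch below computes the smallest
number of base62 characters that can represent every value of a given bit
width. The function name base62_length is hypothetical, and the sketch
assumes the encoder treats each value as one fixed-width integer, which may
not match the real base62 encoder in every detail.

```python
def base62_length(num_bits):
    """Return the smallest n such that 62**n >= 2**num_bits."""
    n = 0
    capacity = 1            # number of distinct values n characters can hold
    while capacity < 2 ** num_bits:
        capacity *= 62
        n += 1
    return n


print(base62_length(96))    # 96-bit signing key    -> 17
print(base62_length(192))   # 192-bit verifying key -> 33
print(base62_length(384))   # ~384-bit signature    -> 65
```

Integer arithmetic is used instead of logarithms so that the result does not
depend on floating-point rounding near a boundary.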
An account-id sequence (a variable-length sequence of non-negative numbers)
is serialized by representing each number in decimal ASCII, then joining the
pieces with commas. The string is terminated by the first non-[0-9,]
character encountered, which will either be the key-identifier letter of the
next field, or the dictionary-terminating character at the end.

Any single decimal number (such as the "before" timestamp field, or the
"server-size" field) is serialized as a variable-length sequence of ASCII
decimal digits, terminated by any non-digit.
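
The two variable-length numeric encodings can be sketched as follows. The
helper names are hypothetical, and the parsers are stricter than the text
requires: they assume at least one digit is present and reject an empty
value.

```python
import re

_ACCOUNTID = re.compile(r"[0-9]+(?:,[0-9]+)*")
_DECIMAL = re.compile(r"[0-9]+")


def serialize_accountid(numbers):
    """Join non-negative integers with commas, e.g. [1, 4] -> "1,4"."""
    return ",".join(str(n) for n in numbers)


def parse_accountid(text, pos=0):
    """Parse an account-id sequence starting at `pos`.

    Returns (numbers, next_pos), where next_pos indexes the first
    character that is not part of the sequence.
    """
    match = _ACCOUNTID.match(text, pos)
    if match is None:
        raise ValueError("no account-id sequence at offset %d" % pos)
    return [int(piece) for piece in match.group(0).split(",")], match.end()


def parse_decimal(text, pos=0):
    """Parse a run of decimal digits starting at `pos`."""
    match = _DECIMAL.match(text, pos)
    if match is None:
        raise ValueError("no decimal number at offset %d" % pos)
    return int(match.group(0)), match.end()


print(serialize_accountid([1, 4, 7]))
print(parse_accountid("1,4D2lFA"))       # stops at the letter "D"
print(parse_decimal("5000000000D0BPo"))  # stops at the letter "D"
```

Because both parsers return the offset of the terminating character, a
caller can continue reading the next key-identifier letter from there.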
The restrictions dictionary is serialized as a concatenated series of
key-identifier-letter / value string pairs, ending with the marker "E.". The
URL-safe form uses a single printable letter to indicate which key is
being serialized. Each type of value string is serialized differently:

"A": accountid: variable-length sequence of comma-joined numbers
"I": storage index: fixed-length 22-character base62-encoded storage index
"P": server id (peer id): fixed-length 32-character *base32* encoded serverid
  (matching the printable Tub.tubID string that Foolscap provides)
"U": UEB hash: fixed-length 43-character base62 encoded UEB hash
"B": before: variable-length sequence of decimal digits, seconds-since-epoch.
"S": server-size: variable-length sequence of decimal digits, max size in bytes
"D": delegate-to-key: ECDSA public key, 33 base62 characters.
"F": furl-to: variable-length FURL string, wrapped in a netstring:
  "%d:%s," % (len(FURL), FURL). Note that this is rarely pasted.
"E.": end-of-dictionary marker

The ECDSA signature is serialized as a variable number of base62 characters,
terminated by a period. We expect the signature to be about 384 bits (48
bytes) long, or 65 base62 characters. A missing signature (such as for the
initial cert) is represented as a single period.

The key hint is serialized as a base62-encoded hint string (a byte-quantized
prefix of the serialized public key), terminated by a period. An empty hint
would thus be serialized as a single period. For the current design, we
expect the key hint to be empty.
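
Putting the dictionary, signature, and key-hint rules together, a single
certificate could be serialized roughly as shown below. This is a sketch
under several assumptions: the function names are hypothetical, all binary
values are assumed to be already encoded as printable base62/base32 strings,
and the fields are emitted in the order of the list above, since the text
does not say whether field order is significant.

```python
def _fixed(value, length):
    """Check that an already-encoded value has its fixed printable length."""
    if len(value) != length:
        raise ValueError("expected %d characters, got %d" % (length, len(value)))
    return value


def serialize_restrictions(restrictions):
    """Serialize a restrictions dictionary, ending with the "E." marker."""
    fixed_length_fields = (          # (dictionary key, letter, length)
        ("SI", "I", 22),
        ("serverid", "P", 32),
        ("UEB-hash", "U", 43),
    )
    parts = []
    if "accountid" in restrictions:
        parts.append("A" + ",".join(str(n) for n in restrictions["accountid"]))
    for key, letter, length in fixed_length_fields:
        if key in restrictions:
            parts.append(letter + _fixed(restrictions[key], length))
    if "before" in restrictions:
        parts.append("B%d" % restrictions["before"])
    if "server-size" in restrictions:
        parts.append("S%d" % restrictions["server-size"])
    if "delegate-to-key" in restrictions:
        parts.append("D" + _fixed(restrictions["delegate-to-key"], 33))
    if "furl-to" in restrictions:
        furl = restrictions["furl-to"]
        parts.append("F%d:%s," % (len(furl), furl))   # netstring wrapping
    parts.append("E.")
    return "".join(parts)


def serialize_certificate(restrictions, signature="", keyhint=""):
    """Dictionary, then signature and key hint, each ended by a period.

    An empty signature or hint therefore shows up as a lone period.
    """
    return serialize_restrictions(restrictions) + signature + "." + keyhint + "."


initial = serialize_certificate(
    {"accountid": [1, 4],
     "delegate-to-key": "2lFA6LboL2xx0ldQH2K1TdSrwuqMMiME3"})
print(initial)
```

With the values used in this document's first example, the initial
certificate comes out as the dictionary followed by three periods: one from
the "E." marker, one for the missing signature, and one for the empty hint.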
The full storage authority string consists of a certificate chain and a
delegate private key. Given the single-certificate serialization scheme
described above, the full authority is serialized as follows:

* version prefix: depends upon the application, but for storage-authority
  chains this will be "sa0-", for storage-authority version 0.
* serialized certificates, concatenated together
* serialized private key (to which the last certificate delegates authority)

Note that this serialization form does not have an explicit terminator, so
the environment must provide a length indicator or some other way to identify
the end of the authority string. The benefit of this approach is that the
full string will begin and end with alphanumeric characters, making
cut-and-paste easier (increasing the size of the mouse target: anywhere
within the final component will work).

Also note that the period is a reserved delimiter: it cannot appear in the
serialized restrictions dictionary. The parser can remove the version prefix,
split the rest on periods, and expect to see 3*k+1 fields, consisting of k
(restriction-dictionary,signature,keyhint) 3-tuples and a single private key
at the end.
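
The split-on-periods parse described above can be sketched as follows. The
function names are hypothetical, and the components are left in their
encoded string form; decoding the dictionary, signature, and keys is outside
this sketch.

```python
VERSION_PREFIX = "sa0-"


def parse_authority(text):
    """Split a serialized authority into (certificates, private_key).

    Each certificate comes back as a (restrictions, signature, keyhint)
    tuple of still-encoded strings. The restrictions string keeps the "E"
    of its "E." end marker, because the period is consumed by the split.
    """
    if not text.startswith(VERSION_PREFIX):
        raise ValueError("missing %r version prefix" % VERSION_PREFIX)
    fields = text[len(VERSION_PREFIX):].split(".")
    if len(fields) % 3 != 1:
        raise ValueError("expected 3*k+1 period-separated fields")
    certs = [tuple(fields[i:i + 3]) for i in range(0, len(fields) - 1, 3)]
    return certs, fields[-1]


def serialize_authority(certs, private_key):
    """Inverse of parse_authority(), for already-encoded components."""
    body = "".join("%s.%s.%s." % cert for cert in certs)
    return VERSION_PREFIX + body + private_key


# The single-certificate example string from this document.
example = "sa0-A1,4D2lFA6LboL2xx0ldQH2K1TdSrwuqMMiME3E...1f2SI9UJPXvb7vdJ1"
certs, private_key = parse_authority(example)
print(certs)
print(private_key)
```

The field-count check only catches gross framing errors. A real parser would
also need to validate each field's contents, since a misplaced delimiter can
still yield 3*k+1 fields.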
Some examples:

cert[0] delegates account 1,4 to (pubkey ZlFA / privkey 1f2S):
sa0-A1,4D2lFA6LboL2xx0ldQH2K1TdSrwuqMMiME3E...1f2SI9UJPXvb7vdJ1

cert[0] delegates account 1,4 to ZlFA/1f2S
cert[1] subdelegates 5GB and subaccount 1,4,7 to pubkey 0BPo/06rt:
sa0-A1,4D2lFA6LboL2xx0ldQH2K1TdSrwuqMMiME3E...A1,4,7S5000000000D0BPoGxJ3M4KWrmdpLnknhJABrWip5e9kPE.7cyhQvv5axdeihmOzIHjs85TcUIYiWHdsxNz50GTerEOR5ucj2TITPXxyaCUli1oF..06rtcPQotR3q4f2cT


== Problems ==

Problems which have thus far been identified with this approach:

* allowing arbitrary subaccount generation will permit a DoS attack, in
  which an authorized uploader consumes lots of DB space by creating an
  unbounded number of randomly-generated subaccount identifiers. OTOH, they
  can already attach an unbounded number of leases to any file they like,
  consuming a lot of space.
|
||
|
|