= Accounting =

"Accounting" is the arena of the Tahoe system that concerns measuring,
controlling, and enabling the ability to upload and download files, and to
create new directories. In contrast with the capability-based access control
model, which dictates how specific files and directories may or may not be
manipulated, Accounting is concerned with resource consumption: how much disk
space a given person/account/entity can use.

Tahoe releases up to and including 1.4.1 have a nearly-unbounded resource
usage model. Anybody who can talk to the Introducer gets to talk to all the
Storage Servers, and anyone who can talk to a Storage Server gets to use as
much disk space as they want (up to the reserved_space= limit imposed by the
server, which affects all users equally). Not only is the per-user space
usage unlimited, it is also unmeasured: the owner of the Storage Server has
no way to find out how much space Alice or Bob is using.

The goals of the Accounting system are thus:

* allow the owner of a storage server to control who gets to use disk space,
  with separate limits per user
* allow both the server owner and the user to measure how much space the user
  is consuming, in an efficient manner
* provide grid-wide aggregation tools, so a set of cooperating server
  operators can easily measure how much a given user is consuming across all
  servers. This information should also be available to the user in question.

For the purposes of this document, the terms "Account" and "User" are mostly
interchangeable. The fundamental unit of Accounting is the "Account", in that
usage and quota enforcement are performed separately for each account. These
accounts might correspond to individual human users, or they might be shared
among a group, or a user might have an arbitrary number of accounts.

Accounting interacts with Garbage Collection. To protect their shares from
GC, clients maintain limited-duration leases on those shares: when the last
lease expires, the share is deleted. Each lease has a "label", which
indicates the account or user that wants to keep the share alive. A given
account's "usage" (their per-server aggregate usage) is simply the sum of the
sizes of all shares on which they hold a lease. The storage server may limit
the user to a fixed "quota" (an upper bound on their usage). To keep a file
alive, the user must be willing to use up some of their quota.

Note that a popular file might have leases from multiple users, in which case
one user might take a chance and decline to add their own lease, saving some
of their quota and hoping that the other leases continue to keep the file
alive despite their personal unwillingness to contribute to the effort. One
could imagine a "pro-rated quotas" scheme, in which a 10MB file with 5
leaseholders would deduct 2MB from each leaseholder's quota. We have decided
to not implement pro-rated quotas, because such a scheme would make usage
values hard to predict: a given account might suddenly go over quota solely
because of a third party's actions.

== Accounting Implementation ==

The implementation of these accounting features is tracked in this ticket:

https://tahoe-lafs.org/trac/tahoe-lafs/ticket/666

== Authority Flow ==

The authority to consume space on the storage server originates, of course,
with the storage server operator. These operators start with complete control
over their space, and delegate portions of it to others: either directly to
clients who want to upload files, or to intermediaries who can then delegate
attenuated authority onwards. The operators have various reasons for wanting
to share their space: monetary consideration, expectations of in-kind
exchange, or simple generosity. But the final authority always rests with the
operator.

The server operator grants limited authority over their space by configuring
their server to accept requests that demonstrate knowledge of certain
secrets. They then share those secrets with the client who intends to use
this space, or with an intermediary who will generate still more secrets and
share those with the client. Eventually, an upload or create-directory
operation will be performed that needs this authority. Part of the operation
will involve proving knowledge of the secret to the storage server, and the
server will require this proof before accepting the uploaded share or adding
a new lease.

The authority is expressed as a string, containing cryptographically-signed
messages and keys. The string also contains "restrictions", which are
annotations that explain the limits imposed upon this authority, either by
the original grantor (the storage server operator) or by one of the
intermediaries. Authority can be reduced but not increased. Any holder of a
given authority can delegate some or all of it to another party.

The authority string may be short enough to include as an argument to a CLI
command (--with-authority ABCDE), or it may be long enough that it must be
stashed in a file and referenced in some other fashion (--with-authority-file
~/.my_authority). There are CLI tools to create brand new authority strings,
to derive attenuated authorities from an existing one, and to explain the
contents of an authority string. These authority strings can be shared with
others just like filecaps and dircaps: knowledge of the authority string is
both necessary and sufficient to wield the authority it represents.

Web-API requests will include the authority necessary to complete the
operation. When used by a CLI tool, the authority is likely to come from
~/.tahoe/private/authority (i.e. it is ambient to the user who has access to
that node, just like aliases provide similar access to a specific "root
directory"). When used by the browser-oriented WUI, the authority will [TODO]
somehow be retained on each page in a way that minimizes the risk of CSRF
attacks and allows safe sharing (cut-and-paste of a URL without sharing the
storage authority too). The client node receiving the web-API request will
extract the authority string from the request and use it to build the storage
server messages that it sends to fulfill that request.

== Definition Of Authority ==

The term "authority" is used here in the object-capability sense: it refers
to the ability of some principal to cause some action to occur, whether
because they can do it themselves, or because they can convince some other
principal to do it for them. In Tahoe terms, "storage authority" is the
ability to do one of the following actions:

* upload a new share, thus consuming storage space
* add a new lease to a share, thus preventing space from being reclaimed
* modify an existing mutable share, potentially increasing the space consumed

The Accounting effort may involve other kinds of authority that get limited
in a manner similar to storage authority, like the ability to download a
share or query whether a given share is present: anything that may consume
CPU time, disk bandwidth, or other limited resources. The authority to renew
or cancel a lease may be controlled in a similar fashion.

Storage authority, as granted from a server operator to a client, is not
simply a binary "use space or not" grant. Instead, it is parameterized by a
number of "restrictions". The most important of these restrictions (with
respect to the goals of Accounting) is the "Account Label".

=== Account Labels ===

A Tahoe "Account" is defined by a variable-length sequence of small integers.
(They are not required to be small: the actual limit is 2**64. Neither are
they required to be unguessable.) For the purposes of discussion, these
lists will be expressed as period-joined strings: the two-element list (1,4)
will be displayed here as "1.4".

These accounts are arranged in a hierarchy: the account identifier 1.4 is
considered to be a "parent" of 1.4.2. There is no relationship between the
values used by unrelated accounts: 1.4 is unrelated to 2.4, despite both
coincidentally using a "4" in the second element.

Each lease has a label, which contains the Account identifier. The storage
server maintains an aggregate size count for each label prefix: when asked
about account 1.4, it will report the amount of space used by shares labeled
1.4, 1.4.2, 1.4.7, 1.4.7.8, etc (but *not* 1 or 1.5).

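The per-prefix aggregation described above can be sketched in a few lines of
Python. This is only an illustration, not the storage server's actual data
structure: the names Label, is_prefix, aggregate_usage and the sample byte
counts are all hypothetical, and labels are modeled as tuples of integers.

```python
from typing import Dict, Tuple

Label = Tuple[int, ...]  # e.g. (1, 4) is displayed as "1.4"


def is_prefix(prefix: Label, label: Label) -> bool:
    """Return True if `label` equals `prefix` or is one of its descendants."""
    return label[:len(prefix)] == prefix


def aggregate_usage(lease_sizes: Dict[Label, int], account: Label) -> int:
    """Sum the bytes leased under `account` and all of its subaccounts."""
    return sum(size for label, size in lease_sizes.items()
               if is_prefix(account, label))


# Hypothetical per-label totals, in bytes.
lease_sizes = {
    (1,): 100,
    (1, 4): 10,
    (1, 4, 2): 20,
    (1, 4, 7): 30,
    (1, 4, 7, 8): 40,
    (1, 5): 500,
}

# Counts 1.4, 1.4.2, 1.4.7 and 1.4.7.8, but not 1 or 1.5.
print(aggregate_usage(lease_sizes, (1, 4)))
```

Asking about (1,) instead would include every entry in this table, which is
how a parent account ends up being charged for its subaccounts.
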
The "Account Label" restriction allows a client to apply any label it wants,
as long as that label begins with a specific prefix. If account 1 is
associated with Alice, then Alice will receive a storage authority string
that contains a "must start with 1" restriction, enabling her to use
storage space but obligating her to lease her shares with a label that can be
traced back to her. She can delegate part of her authority to others (perhaps
with other non-label restrictions, such as a space restriction or time limit)
with or without an additional label restriction. For example, she might
delegate some of her authority to her friend Amy, with a 1.4 label
restriction. Amy could then create labels with 1.4 or 1.4.7, but she could
not create labels with the same 1 identifier that Alice can use, nor could she
create labels with 1.5 (which Alice might have given to her other friend
Annette). The storage server operator can ask about the usage of 1 to find
out how much Alice is responsible for (which includes the space that she has
delegated to Amy and Annette), and none of the A-users can avoid being
counted in this total. But Alice can ask the storage server about the usage
of 1.4 to find out how much Amy has taken advantage of her gift. Likewise,
Alice has control over any lease with a label that begins with 1, so she can
cancel Amy's leases and free the space they were consuming. If this seems
surprising, consider that the storage server operator considered Alice to be
responsible for that space anyway: with great responsibility (for space
consumed) comes great power (to stop consuming that space).

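As a rough sketch of the label restriction, the check a server might apply
to a requested lease label is a simple prefix comparison. The function name
label_allowed and the variable names below are hypothetical, chosen only to
mirror the Alice/Amy example; the real check would be part of validating the
authority string.

```python
from typing import Tuple

Label = Tuple[int, ...]


def label_allowed(required_prefix: Label, requested: Label) -> bool:
    """A lease label is acceptable only if it starts with the required prefix."""
    return requested[:len(required_prefix)] == required_prefix


alice_prefix = (1,)    # Alice's "must start with 1" restriction
amy_prefix = (1, 4)    # the restriction in Alice's delegation to Amy

checks = {
    "amy uses 1.4": label_allowed(amy_prefix, (1, 4)),
    "amy uses 1.4.7": label_allowed(amy_prefix, (1, 4, 7)),
    "amy uses 1": label_allowed(amy_prefix, (1,)),
    "amy uses 1.5": label_allowed(amy_prefix, (1, 5)),
    "alice uses 1.4.7": label_allowed(alice_prefix, (1, 4, 7)),
}
for name, ok in checks.items():
    print(name, "->", "allowed" if ok else "denied")
```

The same comparison explains why Alice retains control over Amy's leases:
every label Amy can create also satisfies Alice's shorter prefix.
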
=== Server Space Restriction ===

The storage server's basic control over space usage (apart from the
binary use-it-or-not authority granted by handing out an authority string at
all) is implemented by keeping track of the space used by any given account
identifier. If account 1.4 sends a request to allocate a 1MB share, but that
1MB would bring the 1.4 usage over its quota, the request will be denied.

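A minimal sketch of this admission check follows. The AccountLedger class,
its method names, and the byte values are hypothetical. Checking the quotas
of parent accounts as well as the requesting account is an assumption made
here, based on the rule that a parent is responsible for the space used by
its subaccounts.

```python
class QuotaExceeded(Exception):
    """Raised when an allocation would push an account over its quota."""


class AccountLedger:
    """Toy per-account usage table with optional quotas, all in bytes."""

    def __init__(self):
        self.usage = {}   # label tuple -> bytes leased directly under it
        self.quota = {}   # label tuple -> maximum aggregate bytes

    def aggregate(self, account):
        """Bytes used by `account` and every subaccount beneath it."""
        return sum(size for label, size in self.usage.items()
                   if label[:len(account)] == account)

    def allocate(self, label, size):
        """Record `size` bytes under `label`, or raise QuotaExceeded."""
        # Assumption: the label itself and each of its parents are checked.
        for i in range(1, len(label) + 1):
            account = label[:i]
            limit = self.quota.get(account)
            if limit is not None and self.aggregate(account) + size > limit:
                raise QuotaExceeded(account)
        self.usage[label] = self.usage.get(label, 0) + size


MB = 10**6
ledger = AccountLedger()
ledger.quota[(1, 4)] = 2 * MB
ledger.allocate((1, 4), 1500000)      # accepted: 1.5MB is under the 2MB quota
try:
    ledger.allocate((1, 4), 1 * MB)   # would bring 1.4 to 2.5MB
    denied = False
except QuotaExceeded:
    denied = True
print("second request denied:", denied)
```
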
For this to be useful, the storage server must give each usage-limited
principal a separate account, and it needs to configure a size limit at the
same time as the authority string is minted. For a friendnet, the CLI "add
account" tool can do both at once:

  tahoe server add-account --quota 5GB Alice
   --> Please give the following authority string to "Alice", who should
       provide it to the "tahoe add-authority" command
       (authority string..)

This command will allocate an account identifier, add Alice to the "pet name
table" to associate it with the new account, and establish the 5GB sizelimit.
Both the sizelimit and the petname can be changed later.

Note that this restriction is independent for each server: some additional
mechanism must be used to provide a grid-wide restriction.

Also note that this restriction is not expressed in the authority string. It
is purely local to the storage server.

=== Attenuated Server Space Restriction ===

TODO (or not)

The server-side space restriction described above can only be applied by the
storage server, and cannot be attenuated by other delegates. Alice might be
allowed to use 5GB on this server, but she cannot use that restriction to
delegate, say, just 1GB to Amy.

Instead, Alice's sub-delegation should include a "server_size" restriction
key, which contains a size limit. The storage server will only honor a
request that uses this authority string if it does not cause the aggregate
usage of this authority string's account prefix to rise above the given size
limit.

Note that this will not enforce the desired restriction if the size limits
are not consistent across multiple delegated authorities for the same label.
For example, if Amy ends up with two delegations, A1 (which gives her a size
limit of 1GB) and A2 (which gives her 5GB), then she can consume 5GB despite
the limit in A1.

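The A1/A2 problem can be reproduced with a small simulation. This is a
sketch under the assumption that the server checks each request only against
the size limit carried by the authority presented with that request; the
function names and the 1GB upload size are hypothetical.

```python
GB = 10**9


def request_allowed(current_usage, request_size, server_size_limit):
    """Honor a request only if aggregate usage stays within the limit."""
    return current_usage + request_size <= server_size_limit


def max_reachable_usage(limits, chunk, attempts):
    """Upload `chunk` bytes repeatedly, presenting any authority that permits it."""
    usage = 0
    for _ in range(attempts):
        for limit in limits:
            if request_allowed(usage, chunk, limit):
                usage += chunk
                break
    return usage


A1 = 1 * GB   # the delegation Alice intended Amy to use
A2 = 5 * GB   # a second, less restrictive delegation for the same label

print(max_reachable_usage([A1], GB, 6) // GB)        # with A1 alone
print(max_reachable_usage([A1, A2], GB, 6) // GB)    # with both A1 and A2
```

With A1 alone the usage stops at 1GB; once A2 is also available, the limit
in A1 no longer binds and usage reaches 5GB.
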
=== Other Restrictions ===

Many storage authority restrictions are meant for internal use by tahoe tools
as they delegate short-lived subauthorities to each other, and are not likely
to be set by end users.

* "SI": a storage index string. The authority can only be used to upload
  shares of a single file.
* "serverid": a server identifier. The authority can only be used when
  talking to a specific server.
* "UEB_hash": a binary hash. The authority can only be used to upload shares
  of a single file, identified by its share's contents. (note: this
  restriction would require the server to parse the share and validate the
  hash)
* "before": a timestamp. The authority is only valid until a specific time.
  Requires synchronized clocks or a better definition of "timestamp".
* "delegate_to_furl": a string, used to acquire a FURL for an object that
  contains the attenuated authority. When it comes time to actually use the
  authority string to do something, this is the first step.
* "delegate_to_key": an ECDSA pubkey, used to grant attenuated authority to
  a separate private key.

== User Experience ==

The process starts with Bob the storage server operator, who has just created
a new Storage Server:

  tahoe create-node
   --> creates ~/.tahoe
  # edit ~/.tahoe/tahoe.cfg, add introducer.furl, configure storage, etc

Now Bob decides that he wants to let his friend Alice use 5GB of space on his
new server.

  tahoe server add-account --quota=5GB Alice
   --> Please give the following authority string to "Alice", who should
       provide it to the "tahoe add-authority" command
       (authority string XYZ..)

Bob copies the new authority string into an email message and sends it to
Alice. Meanwhile, Alice has created her own client, and attached it to the
same Introducer as Bob. When she gets the email, she pastes the authority
string into her local client:

  tahoe client add-authority (authority string XYZ..)
   --> new authority added: account (1)

Now all CLI commands that Alice runs with her node will take advantage of
Bob's space grant. Once Alice's node connects to Bob's, any upload which
needs to send a share to Bob's server will search her list of authorities to
find one that allows her to use Bob's server.

When Alice uses her WUI, upload will be disabled until and unless she pastes
one or more authority strings into a special "storage authority" box. TODO:
Once pasted, we'll use some trick to keep the authority around in a
convenient-yet-safe fashion.

When Alice uses her javascript-based web drive, the javascript program will
be launched with some trick to hand it the storage authorities, perhaps via a
fragment identifier (http://server/path#fragment).

If Alice decides that she wants Amy to have some space, she takes the
authority string that Bob gave her and uses it to create one for Amy:

  tahoe authority dump (authority string XYZ..)
   --> explanation of what is in XYZ
  tahoe authority delegate --account 1,4 --space 2GB (authority string XYZ..)
   --> (new authority string ABC..)

Alice sends the ABC string to Amy, who uses "tahoe client add-authority" to
start using it.

Later, Bob would like to find out how much space Alice is using. He brings up
his node's Storage Server Web Status page. In addition to the overall usage
numbers, the page will have a collapsible-treeview table with lines like:

  AccountID   Usage   TotalUsage   Petname
  (1)         1.5GB   2.5GB        Alice
  +(1,4)      1.0GB   1.0GB        ?

This indicates that Alice, as a whole, is using 2.5GB. It also indicates that
Alice has delegated some space to a (1,4) account, and that delegation has
used 1.0GB. Alice has used 1.5GB on her own, but is responsible for the full
2.5GB. If Alice tells Bob that the subaccount is for Amy, then Bob can assign
a pet name for (1,4) with "tahoe server add-pet-name 1,4 Amy". Note that Bob
is not aware of the 2GB limit that Alice has imposed upon Amy: the size
restriction may have appeared on all the requests that have shown up thus
far, but Bob has no way of being sure that a less-restrictive delegation
hasn't been created, so his UI does not attempt to remember or present the
restrictions it has seen before.

=== Friendnet ===

A "friendnet" is a set of nodes, each of which is both a storage server and a
client, each operated by a separate person, all of which have granted storage
rights to the others.

The simplest way to get a friendnet started is to simply grant storage
authority to everybody. "tahoe server enable-ambient-storage-authority" will
configure the storage server to give space to anyone who asks. This behaves
just like a 1.3.0 server, without accounting of any sort.

The next step is to restrict server use to just the participants. "tahoe
server disable-ambient-storage-authority" will undo the previous step. After
that, there are two basic approaches:

* "full mesh": each node grants authority directly to all the others.
  First, agree upon a userid number for each participant (the value doesn't
  matter, as long as it is unique). Each user should then use "tahoe server
  add-account" for all the accounts (including themselves, if they want some
  of their shares to land on their own machine), including a quota if they
  wish to restrict individuals:

    tahoe server add-account --account 1 --quota 5GB Alice
     --> authority string for Alice
    tahoe server add-account --account 2 --quota 5GB Bob
     --> authority string for Bob
    tahoe server add-account --account 3 --quota 5GB Carol
     --> authority string for Carol

  Then email Alice's string to Alice, Bob's string to Bob, etc. Once all
  users have used "tahoe client add-authority" on everything, each server
  will accept N distinct authorities, and each client will hold N distinct
  authorities.

* "account manager": the group designates somebody to be the "AM", or
  "account manager". The AM generates a keypair and publishes the public key
  to all the participants, who create a local authority which delegates full
  storage rights to the corresponding private key. The AM then delegates
  account-restricted authority to each user, sending them their personal
  authority string:

    AM:
     tahoe authority create-authority --write-private-to=private.txt
      --> public.txt
     # email public.txt to all members
    AM:
     tahoe authority delegate --from-file=private.txt --account 1 --quota 5GB
      --> alice_authority.txt # email this to Alice
     tahoe authority delegate --from-file=private.txt --account 2 --quota 5GB
      --> bob_authority.txt # email this to Bob
     tahoe authority delegate --from-file=private.txt --account 3 --quota 5GB
      --> carol_authority.txt # email this to Carol
     ...
    Alice:
     # receives alice_authority.txt
     tahoe client add-authority --from-file=alice_authority.txt
     # receives public.txt
     tahoe server add-authorization --from-file=public.txt
    Bob:
     # receives bob_authority.txt
     tahoe client add-authority --from-file=bob_authority.txt
     # receives public.txt
     tahoe server add-authorization --from-file=public.txt
    Carol:
     # receives carol_authority.txt
     tahoe client add-authority --from-file=carol_authority.txt
     # receives public.txt
     tahoe server add-authorization --from-file=public.txt

  If the members want to see names next to their local usage totals, they
  can set local petnames for the accounts:

    tahoe server set-petname 1 Alice
    tahoe server set-petname 2 Bob
    tahoe server set-petname 3 Carol

  Alternatively, the AM could provide a usage aggregator, which will collect
  usage values from all the storage servers and show the totals in a single
  place, and add the petnames to that display instead.

  The AM gets more authority than anyone else (they can spoof everybody),
  but each server has just a single authorization instead of N, and each
  client has a single authority instead of N. When a new member joins the
  group, the amount of work that must be done is significantly less, and
  only two parties are involved instead of all N:

    AM:
     tahoe authority delegate --from-file=private.txt --account 4 --quota 5GB
      --> dave_authority.txt # email this to Dave
    Dave:
     # receives dave_authority.txt
     tahoe client add-authority --from-file=dave_authority.txt
     # receives public.txt
     tahoe server add-authorization --from-file=public.txt

  Another approach is to let everybody be the AM: instead of keeping the
  private.txt file secret, give it to all members of the group (but not to
  outsiders). This lets current members bring new members into the group
  without depending upon anybody else doing work. It also renders any notion
  of enforced quotas meaningless, so it is only appropriate for actual
  friends who are voluntarily refraining from spoofing each other.

=== Commercial Grid ===

A "commercial grid", like the one that allmydata.com manages as a for-profit
service, is characterized by a large number of independent clients (who do
not know each other), and by all of the storage servers being managed by a
single entity. In this case, we use an Account Manager like above, to
collapse the potential N*M explosion of authorities into something smaller.
We also create a dummy "parent" account, and give all the real clients
subaccounts under it, to give the operations personnel a convenient "total
space used" number. Each time a new customer joins, the AM is directed to
create a new authority for them, and the resulting string is provided to the
customer's client node.

  AM:
   tahoe authority create-authority --account 1 \
     --write-private-to=AM-private.txt --write-public-to=AM-public.txt

Each time a new storage server is brought up:

  SERVER:
   tahoe server add-authorization --from-file=AM-public.txt

Each time a new client joins:

  AM:
   N = next_account++
   tahoe authority delegate --from-file=AM-private.txt --account 1,N
    --> new_client_authority.txt # give this to new client

== Programmatic Interfaces ==

The storage authority can be passed as a string in a single serialized form,
which is cut-and-pasteable and printable. It uses minimal punctuation, to
make it possible to include it as a URL query argument or HTTP header field
without requiring character-escaping.

Before passing it over HTTP, however, note that revealing the authority
string to someone is equivalent to irrevocably delegating all that authority
to them. While this is appropriate when transferring authority from, say, a
receptive storage server to your local agent, it is not appropriate when
using a foreign tahoe node, or when asking a Helper to upload a specific
file. Attenuations (see below) should be used to limit the delegated
authority in these cases.

In the programmatic web-API, any operation that consumes storage will accept
a storage-authority= query argument, the value of which will be the printable
form of an authority string. This includes all PUT operations, POST t=upload
and t=mkdir, and anything which creates a new file, creates a directory
(perhaps an intermediate one), or modifies a mutable file.

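As an illustration, a client could attach the proposed storage-authority=
argument to a web-API URL as sketched below. The helper name
with_storage_authority, the node address, and the "ABCDE" placeholder are
assumptions for the example; only the query-argument name comes from the
text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_storage_authority(url, authority):
    """Return `url` with a storage-authority= query argument appended."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("storage-authority", authority))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical local node URL and placeholder authority string.
url = with_storage_authority("http://127.0.0.1:3456/uri?t=mkdir", "ABCDE")
print(url)
```

Because urlencode escapes any unsafe characters, this works even if a
serialized authority turns out to need escaping after all.
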
Alternatively, the authority string can also be passed through an HTTP
header. A single "X-Tahoe-Storage-Authority:" header can be used with the
printable authority string. If the string is too large to fit in a single
header, the application can provide a series of numbered
"X-Tahoe-Storage-Authority-1:", "X-Tahoe-Storage-Authority-2:", etc, headers,
and these will be sorted in alphabetical order (please use 08/09/10/11 rather
than 8/9/10/11), stripped of leading and trailing whitespace, and
concatenated. The HTTP header form can accommodate larger authority strings,
since these strings can grow too large to pass as a query argument
(especially when several delegations or attenuations are involved). However,
depending upon the HTTP client library being used, passing extra HTTP headers
may be more complicated than simply modifying the URL, and may be impossible
in some cases (such as javascript running in a web browser).

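The numbered-header scheme could look roughly like this on both ends. The
function names, the chunk size, and the sample string are hypothetical; the
header names, the zero-padding advice, and the sort/strip/concatenate steps
follow the description above.

```python
PREFIX = "X-Tahoe-Storage-Authority"


def split_authority(authority, chunk_size):
    """Split an authority string into one or more header name/value pairs."""
    chunks = [authority[i:i + chunk_size]
              for i in range(0, len(authority), chunk_size)]
    if len(chunks) <= 1:
        return {PREFIX: authority}
    # Zero-pad the counter so alphabetical order matches numeric order.
    width = len(str(len(chunks)))
    return {"%s-%0*d" % (PREFIX, width, n): chunk
            for n, chunk in enumerate(chunks, start=1)}


def join_authority(headers):
    """Reassemble the string: sort by header name, strip, concatenate."""
    parts = sorted((name.lower(), value) for name, value in headers.items()
                   if name.lower().startswith(PREFIX.lower()))
    return "".join(value.strip() for _, value in parts)


authority = "abcdefghijklmnopqrstuvwxy"          # placeholder, 25 characters
headers = split_authority(authority, 2)          # 13 headers: -01 through -13
print(sorted(headers))
print(join_authority(headers) == authority)
```

Without the zero-padding, "-10" would sort before "-2" and the pieces would
be concatenated in the wrong order.
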
TODO: we may add a stored-token form of authority-passing to handle
environments in which query-args won't work and headers are not available.
This approach would use a special PUT which takes the authority string as the
HTTP body, and remembers it on the server side in association with a
brief-but-unguessable token. Later operations would then use the authority by
passing a storage-authority-token=XYZ query argument. These authorities
would expire after some period.

== Quota Management, Aggregation, Reporting ==

The storage server will maintain enough information to efficiently compute
usage totals for each account referenced in any of its leases, as well as
all their parent accounts. This information is used for several purposes:

* enforce server-space restrictions, by selectively rejecting storage
  requests which would cause the account-usage-total to rise above the limit
  specified in the enabling authorization string
* report individual account usage to the account-holder (if a client can
  consume space under account A, they are also allowed to query usage for
  account A or a subaccount).
* report individual account usage to the storage-server operator, possibly
  associated with a pet name
* report usage for all accounts to the storage-server operator, possibly
  associated with a pet name, in the form of a large table
* report usage for all accounts to an external aggregator

The external aggregator would take usage information from all the storage
servers in a single grid and sum them together, providing a grid-wide usage
number for each account. This could be used by e.g. clients in a commercial
grid to report overall-space-used to the end user.

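The aggregator's core job is a per-account sum across servers, which might
be sketched as follows. The report format (a mapping from account label to
bytes used, one per server) and all names and numbers are assumptions; the
document does not define the report format.

```python
from collections import Counter


def grid_usage(per_server_reports):
    """Sum per-account usage across every server's report."""
    totals = Counter()
    for report in per_server_reports.values():
        totals.update(report)   # adds each account's bytes to its running total
    return dict(totals)


# Hypothetical reports from two storage servers, in bytes.
reports = {
    "server-a": {(1,): 10, (1, 4): 5},
    "server-b": {(1,): 7, (2,): 3},
}
totals = grid_usage(reports)
print(totals)
```
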
There will be web-API URLs available for all of these reports.

TODO: storage servers might also have a mechanism to apply space-usage limits
to specific account ids directly, rather than requiring that these be
expressed only through authority-string limitation fields. This would let a
storage server operator revoke their space-allocation after delivering the
authority string.

== Low-Level Formats ==

This section describes the low-level formats used by the Accounting process,
beginning with the storage-authority data structure and working upwards. This
section is organized to follow the storage authority, starting from the point
of grant. The discussion will thus begin at the storage server (where the
authority is first created), work back to the client (which receives the
authority as a web-API argument), then follow the authority back to the
servers as it is used to enable specific storage operations. It will then
detail the accounting tables that the storage server is obligated to
maintain, and describe the interfaces through which these tables are accessed
by other parties.

=== Storage Authority ===

==== Terminology ====

Storage Authority is represented as a chain of certificates and a private
key. Each certificate authorizes and restricts a specific private key. The
initial certificate in the chain derives its authority by being placed in the
storage server's tahoe.cfg file (i.e. by being authorized by the storage
server operator). All subsequent certificates are signed by the authorized
private key that was identified in the previous certificate: they derive
their authority by delegation. Each certificate has restrictions which limit
the authority being delegated.

  authority: ([cert[0], cert[1], cert[2] ...], privatekey)

The "restrictions dictionary" is a table which establishes an upper bound on
how this authority (or any attenuations thereof) may be used. It is
effectively a set of key-value pairs.

A "signing key" is an EC-DSA192 private key string, as supplied to the
pycryptopp SigningKey() constructor, and is 12 bytes long. A "verifying key"
is an EC-DSA192 public key string, as produced by pycryptopp, and is 24 bytes
long. A "key identifier" is a string which securely identifies a specific
signing/verifying keypair: for long RSA keys it would be a secure hash of the
public key, but since ECDSA192 keys are so short, we simply use the full
verifying key verbatim. A "key hint" is a variable-length prefix of the key
identifier, perhaps zero bytes long, used to help a recipient reduce the
number of verifying keys that it must search to find one that matches a
signed message.

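The role of the key hint can be shown with a small lookup sketch. The
function name candidate_keys and the byte strings below are made up for the
example (they are not real verifying keys); the only behavior taken from
the text is that a hint is a prefix of the key identifier and may be empty.

```python
def candidate_keys(known_verifying_keys, key_hint):
    """Return the known keys whose identifier starts with the hint."""
    return [key for key in known_verifying_keys if key.startswith(key_hint)]


# Placeholder identifiers standing in for full verifying keys.
known = [b"\x01\xaa-key-A", b"\x01\xbb-key-B", b"\x02\xcc-key-C"]

print(len(candidate_keys(known, b"")))          # empty hint: try every key
print(len(candidate_keys(known, b"\x01")))      # one byte narrows the search
print(candidate_keys(known, b"\x01\xbb"))       # two bytes pick out a single key
```

The recipient still has to verify the signature against each remaining
candidate; the hint only shrinks the set of keys it must try.
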
==== Authority Chains ====

The authority chain consists of a list of certificates, each of which has a
serialized restrictions dictionary. Each dictionary will have a
"delegate-to-key" field, which delegates authority to a private key,
referenced with a key identifier. In addition, the non-initial certs are
signed, so they each contain a signature and a key hint:

  cert[0]: serialized(restrictions_dictionary)
  cert[1]: serialized(restrictions_dictionary), signature, keyhint
  cert[2]: serialized(restrictions_dictionary), signature, keyhint

In this example, suppose cert[0] contains a delegate-to-key field that
identifies a keypair sign_A/verify_A. In this case, cert[1] will have a
signature that was made with sign_A, and the keyhint in cert[1] will
reference verify_A.

  cert[0].restrictions[delegate-to-key] = A_keyid

  cert[1].signature = SIGN(sign_A, serialized(cert[0].restrictions))
  cert[1].keyhint = verify_A
  cert[1].restrictions[delegate-to-key] = B_keyid

  cert[2].signature = SIGN(sign_B, serialized(cert[1].restrictions))
  cert[2].keyhint = verify_B
  cert[2].restrictions[delegate-to-key] = C_keyid

In this example, the full storage authority consists of the cert[0,1,2] chain
and the sign_C private key: anyone who is in possession of both will be able
to exert this authority. To wield the authority, a client will present the
cert[0,1,2] chain and an action message signed by sign_C; the server will
validate the chain and the signature before performing the requested action.
The only circumstances that might prompt the client to share the sign_C
private key with another party (including the server) would be if it wanted
to irrevocably share its full authority with that party.

==== Restriction Dictionaries ====

Within a restriction dictionary, the following keys are defined. Their full
meanings are defined later.

 'accountid': an arbitrary-length sequence of integers >=0, restricting the
              accounts which can be manipulated or used in leases
 'SI': a storage index (binary string), controlling which file may be
       manipulated
 'serverid': binary string, limiting which server will accept requests
 'UEB-hash': binary string, limiting the content of the file being manipulated
 'before': timestamp (seconds since epoch), limits the lifetime of this
           authority
 'server-size': integer >0, maximum aggregate storage (in bytes) per account
 'delegate-to-key': binary string (ECDSA pubkey identifier)
 'furl-to': printable FURL string

==== Authority Serialization ====

There is only one form of serialization: a somewhat-compact URL-safe
cut-and-pasteable printable form. We are interested in minimizing the size of
the resulting authority, so rather than using a general-purpose (perhaps
JSON-based) serialization scheme, we use one that is specialized for this
task.

This URL-safe form will use minimal punctuation to avoid quoting issues when
used in a URL query argument. It would be nice to avoid word-breaking
characters that make cut-and-paste troublesome; however, this is difficult
because most non-alphanumeric characters are word-breaking in at least one
application.

The serialized storage authority as a whole contains a single version
identifier and magic number at the beginning. None of the internal components
contain redundant version numbers: they are implied by the container. If
components are serialized independently for other reasons, they may contain
version identifiers in that form.

Signing keys (i.e. private keys) are URL-safe-serialized using Zooko's base62
alphabet, which offers almost the same density as standard base64 but without
any non-URL-safe or word-breaking characters. Since we use fixed-format keys
(EC-DSA, 192-bit, with SHA256), the private keys are fixed-length (96 bits or
12 bytes), so there is no length indicator: all URL-safe-serialized signing
keys are 17 base62 characters long. The 192-bit verifying keys (i.e. public
keys) use the same approach: the URL-safe form is 33 characters long.

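The character counts quoted above follow from the alphabet size: a value of
N bits needs the smallest number of base62 characters n such that
62**n >= 2**N. The sketch below checks this arithmetic with exact integers;
base62_length is a hypothetical helper name, not part of any Tahoe API.

```python
def base62_length(num_bits):
    """Smallest n such that 62**n >= 2**num_bits (exact integer arithmetic)."""
    n = 0
    capacity = 1  # number of distinct values representable by n characters
    while capacity < 2 ** num_bits:
        capacity *= 62
        n += 1
    return n


print(base62_length(96))   # 17, the signing-key length
print(base62_length(192))  # 33, the verifying-key length
```

Each base62 character carries about 5.95 bits, against exactly 6 bits for
base64, which is why the density is described as almost the same.
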
An account-id sequence (a variable-length sequence of non-negative numbers)
is serialized by representing each number in decimal ASCII, then joining the
pieces with commas. The string is terminated by the first non-[0-9,]
character encountered, which will either be the key-identifier letter of the
next field, or the dictionary-terminating character at the end.

Any single integral decimal number (such as the "before" timestamp field, or
the "server-size" field) is serialized as a variable-length sequence of ASCII
decimal digits, terminated by any non-digit.

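A minimal sketch of these two rules follows. The function names and the
(value, next_position) return convention are assumptions made for
illustration; only the comma-joined decimal format and the "terminated by
the first non-matching character" rule come from the text.

```python
import re

_ACCOUNT_RUN = re.compile(r"[0-9,]*")
_DIGIT_RUN = re.compile(r"[0-9]+")


def serialize_accountid(numbers):
    """Join non-negative integers with commas, e.g. (1, 4, 7) -> '1,4,7'."""
    if any(n < 0 for n in numbers):
        raise ValueError("account-id numbers must be >= 0")
    return ",".join(str(n) for n in numbers)


def parse_accountid(text, pos=0):
    """Consume the run of [0-9,] characters starting at pos.

    Returns (tuple_of_ints, next_pos). A malformed run such as '1,,4'
    raises ValueError from int().
    """
    match = _ACCOUNT_RUN.match(text, pos)
    run = match.group(0)
    numbers = tuple(int(piece) for piece in run.split(",")) if run else ()
    return numbers, match.end()


def parse_decimal(text, pos=0):
    """Consume a run of ASCII digits starting at pos; return (int, next_pos)."""
    match = _DIGIT_RUN.match(text, pos)
    if match is None:
        raise ValueError("expected a decimal digit at position %d" % pos)
    return int(match.group(0)), match.end()


print(serialize_accountid((1, 4, 7)))
print(parse_accountid("A1,4D2lFA", 1))
print(parse_decimal("S5000000000D0BPo", 1))
```

In both parsers the returned position points at the terminating character,
which the caller then interprets as the next key-identifier letter.
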
The restrictions dictionary is serialized as a concatenated series of
key-identifier-letter / value string pairs, ending with the marker "E.". The
URL-safe form uses a single printable letter to indicate which key is
being serialized. Each type of value string is serialized differently:

 "A": accountid: variable-length sequence of comma-joined numbers
 "I": storage index: fixed-length 26-character *base32*-encoded storage index
 "P": server id (peer id): fixed-length 32-character *base32* encoded serverid
      (matching the printable Tub.tubID string that Foolscap provides)
 "U": UEB hash: fixed-length 43-character base62 encoded UEB hash
 "B": before: variable-length sequence of decimal digits, seconds-since-epoch.
 "S": server-size: variable-length sequence of decimal digits, max size in bytes
 "D": delegate-to-key: ECDSA public key, 33 base62 characters.
 "F": furl-to: variable-length FURL string, wrapped in a netstring:
      "%d:%s," % (len(FURL), FURL). Note that this is rarely pasted.
 "E.": end-of-dictionary marker

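The table above could be applied roughly as in the following sketch. Two
things here are assumptions rather than part of the design: the fields are
emitted in the order the table lists them, and the "I", "P", "U" and "D"
values are expected to arrive already base32/base62-encoded (the encoders
themselves are not shown). The name serialize_restrictions is hypothetical.

```python
# (dictionary key, key-identifier letter), in the order of the table above.
FIELD_ORDER = [
    ("accountid", "A"),
    ("SI", "I"),
    ("serverid", "P"),
    ("UEB-hash", "U"),
    ("before", "B"),
    ("server-size", "S"),
    ("delegate-to-key", "D"),
    ("furl-to", "F"),
]

# Character lengths of the fixed-length, pre-encoded string fields.
FIXED_LENGTHS = {"I": 26, "P": 32, "U": 43, "D": 33}


def serialize_restrictions(restrictions):
    """Serialize a restrictions dict into letter/value pairs ending in 'E.'."""
    out = []
    for name, letter in FIELD_ORDER:
        if name not in restrictions:
            continue
        value = restrictions[name]
        if letter == "A":
            text = ",".join(str(n) for n in value)
        elif letter in FIXED_LENGTHS:
            if len(value) != FIXED_LENGTHS[letter]:
                raise ValueError("field %r must be %d characters"
                                 % (name, FIXED_LENGTHS[letter]))
            text = value
        elif letter in ("B", "S"):
            text = "%d" % value
        else:  # "F": netstring-wrapped FURL
            text = "%d:%s," % (len(value), value)
        out.append(letter + text)
    return "".join(out) + "E."


print(serialize_restrictions({
    "accountid": (1, 4),
    "delegate-to-key": "2lFA6LboL2xx0ldQH2K1TdSrwuqMMiME3",
}))
```

With these inputs the output matches the restrictions-dictionary portion of
example A, up to and including its "E." marker.
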
The ECDSA signature is serialized as a variable number of base62 characters,
terminated by a period. We expect the signature to be about 384 bits (48
bytes) long, or 65 base62 characters. A missing signature (such as for the
initial cert) is represented as a single period.

The key hint is serialized as a base62-encoded hint string (a byte-quantized
prefix of the serialized public key), terminated by a period. An empty hint
would thus be serialized as a single period. For the current design, we
expect the key hint to be empty.

The full storage authority string consists of a certificate chain and a
delegate private key. Given the single-certificate serialization scheme
described above, the full authority is serialized as follows:

* version prefix: depends upon the application, but for storage-authority
  chains this will be "sa0-", for Storage-Authority Version 0.
* serialized certificates, concatenated together
* serialized private key (to which the last certificate delegates authority)

Note that this serialization form does not have an explicit terminator, so
the environment must provide a length indicator or some other way to identify
the end of the authority string. The benefit of this approach is that the
full string will begin and end with alphanumeric characters, making
cut-and-paste easier (increasing the size of the mouse target: anywhere
within the final component will work).

Also note that the period is a reserved delimiter: it cannot appear in the
serialized restrictions dictionary, other than as the final character of its
"E." end marker. The parser can remove the version prefix, split the rest on
periods, and expect to see 3*k+1 fields, consisting of k
(restriction-dictionary,signature,keyhint) 3-tuples and a single private key
at the end.

Some examples:

(example A)
cert[0] delegates account 1,4 to (pubkey ZlFA / privkey 1f2S):

 sa0-A1,4D2lFA6LboL2xx0ldQH2K1TdSrwuqMMiME3E...1f2SI9UJPXvb7vdJ1

(example B)
cert[0] delegates account 1,4 to ZlFA/1f2S
cert[1] subdelegates 5GB and subaccount 1,4,7 to pubkey 0BPo/06rt:

 sa0-A1,4D2lFA6LboL2xx0ldQH2K1TdSrwuqMMiME3E...A1,4,7S5000000000D0BPoGxJ3M4KWrmdpLnknhJABrWip5e9kPE,7cyhQvv5axdeihmOzIHjs85TcUIYiWHdsxNz50GTerEOR5ucj2TITPXxyaCUli1oF...06rtcPQotR3q4f2cT

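The "remove the prefix, split on periods, expect 3*k+1 fields" procedure
described earlier in this section can be sketched as follows and applied to
example A. The function name parse_authority and its return shape are
assumptions for illustration; it only splits the string and does not decode
the dictionaries or check any signatures.

```python
PREFIX = "sa0-"
SIGNING_KEY_LENGTH = 17  # base62 characters, per the text above


def parse_authority(text):
    """Split a serialized storage authority into certificates and private key.

    Returns (certs, private_key), where certs is a list of
    (restrictions, signature, keyhint) string triples. Splitting on periods
    consumes the period of each "E." marker, so every restrictions string
    ends with a bare "E".
    """
    if not text.startswith(PREFIX):
        raise ValueError("missing %r version prefix" % PREFIX)
    fields = text[len(PREFIX):].split(".")
    if len(fields) % 3 != 1:
        raise ValueError("expected 3*k+1 period-separated fields")
    private_key = fields[-1]
    if len(private_key) != SIGNING_KEY_LENGTH:
        raise ValueError("private key must be %d characters"
                         % SIGNING_KEY_LENGTH)
    certs = [tuple(fields[i:i + 3]) for i in range(0, len(fields) - 1, 3)]
    return certs, private_key


EXAMPLE_A = "sa0-A1,4D2lFA6LboL2xx0ldQH2K1TdSrwuqMMiME3E...1f2SI9UJPXvb7vdJ1"

certs, private_key = parse_authority(EXAMPLE_A)
print(certs)
print(private_key)
```

For example A this yields one certificate with an empty signature and an
empty key hint, followed by the 17-character private key.
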
== Problems ==

Problems which have thus far been identified with this approach:

* allowing arbitrary subaccount generation will permit a DoS attack, in
  which an authorized uploader consumes lots of DB space by creating an
  unbounded number of randomly-generated subaccount identifiers. OTOH, they
  can already attach an unbounded number of leases to any file they like,
  consuming a lot of space.