tahoe-lafs/roadmap.txt

['*' means complete]
Connection Management:
*v1: foolscap, no relay, live=connected-to-introducer, broadcast updates, fully connected topology
*v2: configurable IP address -- http://allmydata.org/trac/tahoe/ticket/22
v3: live != connected-to-introducer, connect on demand
v4: decentralized introduction -- http://allmydata.org/trac/tahoe/ticket/68
v5: relay?
File Encoding:
*v1: single-segment, no merkle trees
*v2: multiple-segment (LFE)
*v3: merkle tree to verify each share
*v4: merkle tree to verify each segment
*v5: merkle tree on plaintext and crypttext: incremental validation
v6: only retrieve the minimal number of hashes instead of all of them
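
A minimal sketch of the idea behind v4 and v6, not the actual Tahoe hash tree. It assumes SHA-256 and made-up "leaf:"/"node:" tags, and all function names are hypothetical. It builds a merkle tree over segments, then checks one segment using only its chain of sibling hashes instead of the whole tree.

```python
import hashlib

def leaf_hash(segment: bytes) -> bytes:
    # "leaf:" tag is an assumption for this sketch, not Tahoe's real tagging
    return hashlib.sha256(b"leaf:" + segment).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    # "node:" tag is likewise an assumption
    return hashlib.sha256(b"node:" + left + right).digest()

def build_levels(segments):
    """Return all tree levels: levels[0] holds leaf hashes, levels[-1] == [root]."""
    level = [leaf_hash(s) for s in segments]
    levels = []
    while True:
        # pad odd-sized levels by duplicating the last hash
        if len(level) > 1 and len(level) % 2 == 1:
            level = level + [level[-1]]
        levels.append(level)
        if len(level) == 1:
            break
        level = [node_hash(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return levels

def proof_for(levels, index):
    """Collect the sibling hash at each level: the minimal set needed for one segment."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof

def verify_segment(segment, index, proof, root):
    """Recompute the path from one segment up to the root and compare."""
    h = leaf_hash(segment)
    for sibling in proof:
        if index % 2 == 0:
            h = node_hash(h, sibling)
        else:
            h = node_hash(sibling, h)
        index //= 2
    return h == root
```

A downloader in this sketch would fetch the root once (from a trusted place such as the URI Extension), then fetch proof_for(...) alongside each segment and call verify_segment before using the data.
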
Share Encoding:
*v1: fake it (replication)
*v2: PyRS
*v2.5: ICodec-based codecs, but still using replication
*v3: C-based Reed-Solomon
URI:
*v1: really big
*v2: store URI Extension with shares
*v3: derive storage index from readkey
v4: perhaps derive more information from version and filesize, to remove
    codec_name, codec_params, tail_codec_params, needed_shares,
    total_shares, segment_size from the URI Extension
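
A rough illustration of v3 (derive storage index from readkey), assuming a tagged, truncated SHA-256. The tag string, the 16-byte length, and the function name are placeholders for this sketch, not the values Tahoe actually uses.

```python
import hashlib

# Hypothetical tag: keeps this hash distinct from other uses of the readkey.
STORAGE_INDEX_TAG = b"example_storage_index_tag:"

def derive_storage_index(readkey: bytes, length: int = 16) -> bytes:
    """Derive a storage index deterministically from the readkey.

    Anyone holding the readkey can recompute the index, so the index
    does not need to be carried separately in the URI.
    """
    return hashlib.sha256(STORAGE_INDEX_TAG + readkey).digest()[:length]
```

The hash is one-way, so a storage server that sees the index learns nothing useful about the readkey.
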
Upload Peer Selection:
*v1: permuted peer list, consistent hash
*v2: permute peers by verifierid and arrange around ring, intermixed with
     shareids on the same range, each share goes to the
     next-clockwise-available peer
v3: reliability/goodness-point counting?
v4: denver airport (chord)?
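
The v1/v2 schemes above can be sketched roughly as follows. This is an assumption-laden illustration, not the real selection code: it uses SHA-256 of verifierid+peerid as the ring position, spaces shareids evenly around the same ring, and treats "available" as a caller-supplied predicate. All names are hypothetical.

```python
import hashlib
from bisect import bisect_left

RING = 2 ** 256  # size of the ring: the SHA-256 output range

def ring_position(verifierid: bytes, peerid: bytes) -> int:
    # Per-file permutation: the same peer lands at a different spot for each verifierid.
    return int.from_bytes(hashlib.sha256(verifierid + peerid).digest(), "big")

def permuted_peers(verifierid, peerids):
    """v1-style permuted peer list: peers sorted by their per-file ring position."""
    return sorted(peerids, key=lambda p: ring_position(verifierid, p))

def assign_shares(verifierid, peerids, total_shares, is_available=lambda p: True):
    """v2-style: each share goes to the next-clockwise available peer."""
    ring = permuted_peers(verifierid, peerids)
    positions = [ring_position(verifierid, p) for p in ring]
    assignment = {}
    for shnum in range(total_shares):
        # Assumption: shareids are spaced evenly around the ring.
        share_pos = shnum * RING // total_shares
        start = bisect_left(positions, share_pos)
        # Walk clockwise (wrapping around) until an available peer is found.
        for step in range(len(ring)):
            peer = ring[(start + step) % len(ring)]
            if is_available(peer):
                assignment[shnum] = peer
                break
    return assignment
```

Because positions depend only on verifierid and peerid, a downloader can recompute the same ring and ask the next-clockwise peers first.
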
Download Peer Selection:
*v1: ask all peers
v2: permute peers and shareids as in upload, ask next-clockwise peers first
    (the "A" list), if necessary ask the ones after them, etc.
v3: denver airport?
Directory/Filesystem Maintenance:
*v1: vdrive-based tree of MutableDirectoryNodes, persisted to vdrive's disk,
     no accounts
*v2: single-host dirnodes, one tree per user, plus one global mutable space
*v3: distributed storage for dirnodes
v4: maintain file manifest, delete on remove
v5: figure out accounts, users, quotas, snapshots, versioning, etc
Checker/Repairer:
*v1: none
v1.5: maintain file manifest
v2: centralized checker, repair agent
v3: nodes also check their own files
Storage:
*v1: no deletion, one directory per verifierid, no owners of shares,
     leases never expire
*v2: multiple shares per verifierid [zooko]
*v3: disk space limits on storage servers -- ticket #34
v4: deletion
v5: leases expire, delete expired data on demand, multiple owners per share
UI:
*v1: readonly webish (nevow, URLs are filepaths)
*v2: read/write webish, mkdir, del (files)
*v2.5: del (directories)
*v3: CLI tool.
v4: FUSE (linux) -- http://allmydata.org/trac/tahoe/ticket/36
v5: WebDAV
Operations/Deployment/Doc/Free Software/Community:
- move this file into the wiki?
back pocket ideas:
 when nodes are unable to reach storage servers, make a note of it, inform
 verifier/checker eventually. verifier/checker then puts server under
 observation or otherwise looks for differences between its self-reported
 availability and the experiences of others

 store filetable URI in the first 10 peers that appear after your own nodeid
  each entry has a sequence number, maybe a timestamp
  on recovery, find the newest

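
A possible shape for the filetable idea above, purely as a sketch. It assumes "after your own nodeid" means sorted nodeid order with wraparound, and that each stored entry is a (seqnum, timestamp, uri) tuple. Both are guesses, and the function names are hypothetical.

```python
from bisect import bisect_right

def peers_after(nodeid: bytes, peerids, count: int = 10):
    """Return up to `count` peers that follow nodeid in sorted order, wrapping around."""
    ring = sorted(p for p in peerids if p != nodeid)
    start = bisect_right(ring, nodeid)
    return [ring[(start + i) % len(ring)] for i in range(min(count, len(ring)))]

def newest_entry(entries):
    """Pick the entry with the highest sequence number; timestamp breaks ties.

    `entries` is an iterable of (seqnum, timestamp, uri) tuples gathered
    from whichever of those peers answered. Returns None if none answered.
    """
    return max(entries, key=lambda e: (e[0], e[1]), default=None)
```

On recovery the node would query peers_after(own_nodeid, known_peers) and feed the replies to newest_entry.
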
 multiple categories of leases:
  1: committed leases -- we will not delete these in any case, but will instead
     tell an uploader that we are full
   1a: active leases
   1b: in-progress leases (partially filled, not closed, pb connection is
       currently open)
  2: uncommitted leases -- we will delete these in order to make room for new
     lease requests
   2a: interrupted leases (partially filled, not closed, pb connection is
       currently not open, but they might come back)
   2b: expired leases
  (I'm not sure about the precedence of these last two. Probably deleting
   expired leases instead of deleting interrupted leases would be okay.)
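
One way the lease categories above could drive space reclamation, as a sketch only. It assumes expired leases are deleted before interrupted ones (the tentative precedence from the note above) and that a lease is a (lease_id, state, size) tuple. The class and function names are hypothetical.

```python
from enum import Enum

class LeaseState(Enum):
    ACTIVE = "1a"        # committed: never deleted
    IN_PROGRESS = "1b"   # committed: never deleted
    INTERRUPTED = "2a"   # uncommitted: may be deleted to make room
    EXPIRED = "2b"       # uncommitted: may be deleted to make room

# Assumed precedence: expired first, then interrupted.
EVICTION_ORDER = [LeaseState.EXPIRED, LeaseState.INTERRUPTED]

def choose_evictions(leases, bytes_needed):
    """Return lease ids to delete to free bytes_needed, or None if we are full.

    Only uncommitted leases are considered. None means that even deleting
    all of them would not free enough space, so the uploader is told
    that the server is full.
    """
    chosen, freed = [], 0
    for state in EVICTION_ORDER:
        for lease_id, lease_state, size in leases:
            if freed >= bytes_needed:
                return chosen
            if lease_state is state:
                chosen.append(lease_id)
                freed += size
    return chosen if freed >= bytes_needed else None
```
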
big questions:
convergence?
peer list maintenance: lots of entries