Nathan Wilcox observed that the storage server can rely on the size of the share file combined with the count of leases to unambiguously identify the location of the leases. This means that it can hold any size share data, even though the field nominally used to hold the size of the share data is only 32 bits wide.
With this patch, the storage server still writes the "size of the share data" field (just in case the server gets downgraded to an earlier version which requires that field, or the share file gets moved to another server which is of an earlier vintage), but it doesn't use it. Also, with this patch, the server no longer rejects requests to write shares which are >= 2^32 bytes in size, and it no longer rejects attempts to read such shares.
This fixes http://allmydata.org/trac/tahoe/ticket/346 (increase share-size field to 8 bytes, remove 12GiB filesize limit), although there remains open a question of how clients know that a given server can handle large shares (by using the new versioning scheme, probably).
Note that share size is also limited by another factor -- how big of a file we can store on the local filesystem on the server. Currently allmydata.com typically uses ext3 and I think we typically have block size = 4 KiB, which means that the largest file is about 2 TiB. Also, the hard drives themselves are only 1 TB, so the largest share is definitely slightly less than 1 TB, which means (when K == 3), the largest file is less than 3 TB.
This patch also refactors the creation of new sharefiles so that only a single fopen() is used.
This patch also helps with the unit-testing of repairer, since formerly it was unclear what repairer should expect to find if the "share data size" field was corrupted (some corruptions would have no effect, others would cause failure to download). Now it is clear that repairer is not required to notice if this field is corrupted since it has no effect on download. :-)
The filesystem which gets my vote for most undeservedly popular is ext3, and it has a hard limit of 32,000 entries in a directory. Many other filesystems (even ones that I like more than I like ext3) have either hard limits or bad performance consequences or weird edge cases when you get too many entries in a single directory.
This patch makes it so that there is a layer of intermediate directories between the "shares" directory and the actual storage-index directory (the one whose name contains the entire storage index (z-base-32 encoded) and which contains one or more share files named by their share number).
The intermediate directories are named by the first 14 bits of the storage index, which means there are at most 16384 of them. (This also means that the intermediate directory names are not a leading prefix of the storage-index directory names -- to do that would have required us to have intermediate directories limited to either 1024 (2-char), which is too few, or 32768 (3-chars of a full 5 bits each), which would overrun ext3's funny hard limit of 32,000.))
This closes#150, and please see the "convertshares.py" script attached to #150 to convert your old tahoe-0.7.0 storage/shares directory into a new tahoe-0.8.0 storage/shares directory.