52 Commits

Author SHA1 Message Date
Brian Warner
6160af5f50 encode.py: update comments, max_segment_size is now 2MiB 2007-10-16 11:00:29 -07:00
Zooko O'Whielacronx
426721f3f2 update a few documents, comments, and defaults to mention 3-of-10 instead of 25-of-100 2007-10-15 19:53:59 -07:00
Brian Warner
8b14ad1673 encode.py: log a percentage complete as well as the raw byte counts 2007-08-09 18:28:45 -07:00
Brian Warner
08ee80176a Encoder.__repr__: mention the file being encoded 2007-08-09 18:26:56 -07:00
Brian Warner
684966d103 encode.py: add a reactor turn barrier between segments, to allow Deferreds to retire and free their arguments, all in the name of reducing memory footprint 2007-08-09 18:26:17 -07:00
Brian Warner
e6e9ddc588 refactor upload/encode, to split encrypt and encode responsibilities 2007-07-23 19:31:53 -07:00
Brian Warner
9af506900b upload: refactor to enable streaming upload. not all tests pass yet 2007-07-19 18:21:44 -07:00
Brian Warner
20c980d02b reduce MAX_SEGMENT_SIZE from 2MB to 1MB, to compensate for the large blocks that 3-of-10 produces 2007-07-16 13:48:34 -07:00
Brian Warner
d206843625 encode.py: minor typo 2007-07-13 17:00:06 -07:00
Brian Warner
7589a8ee82 storage: we must truncate short segments. Now most tests pass (except uri_extension) 2007-07-13 16:38:25 -07:00
Brian Warner
1f8e407d9c more #85 work, system test still fails 2007-07-13 15:09:01 -07:00
Brian Warner
cd8648d39b storage: use one file per share instead of 7 (#85). work-in-progress, tests still fail 2007-07-13 14:04:49 -07:00
Brian Warner
5399395c27 allow the introducer to set default encoding parameters. Closes #84.
By writing something like "25 75 100" into a file named 'encoding_parameters'
in the central Introducer's base directory, all clients which use that
introducer will be advised to use 25-out-of-100 encoding for files (i.e.
100 shares will be produced, 25 are required to reconstruct, and the upload
process will be happy if it can find homes for at least 75 shares). The
default values are "3 7 10". For small meshes, the defaults are probably
good, but for larger ones it may be appropriate to increase the number of
shares.
2007-07-12 15:33:30 -07:00
Brian Warner
dce1dc2730 storage: wrap buckets in a local proxy
This will make it easier to change RIBucketWriter in the future to reduce the wire
protocol to just open/write(offset,data)/close, and do all the structuring on the
client end. The ultimate goal is to store each bucket in a single file, to reduce
the considerable filesystem-quantization/inode overhead on the storage servers.
2007-07-08 23:27:46 -07:00
Brian Warner
382888899b refactor URI_extension handlers out of encode/download and into uri.py 2007-06-11 18:25:18 -07:00
Brian Warner
584dc4ae94 handle uri_extension with a non-bencode serialization scheme 2007-06-08 16:17:54 -07:00
Brian Warner
c9ef291c02 rename thingA to 'uri extension' 2007-06-08 15:59:16 -07:00
Brian Warner
f62a544b93 remove several leftover defintions of netstring() 2007-06-07 22:13:18 -07:00
Brian Warner
c049941529 move almost all hashing to SHA256, consolidate into hashutil.py
The only SHA-1 hash that remains is used in the permutation of nodeids,
where we need to decide if we care about performance or long-term security.
I suspect that we could use a much weaker hash (and faster) hash for
this purpose. In the long run, we'll be doing thousands of such hashes
for each file uploaded or downloaded (one per known peer).
2007-06-07 21:47:21 -07:00
Brian Warner
cabba59fe7 test_encode.py: even more testing of merkle trees, getting fairly comprehensive now 2007-06-07 21:24:39 -07:00
Brian Warner
f3846da4ab encode.py: hush pyflakes warnings 2007-06-07 13:18:55 -07:00
Brian Warner
b2caf7fb9a encode/download: reduce memory footprint by deleting large intermediate buffers as soon as possible, improve hash tree usage 2007-06-07 13:15:58 -07:00
Brian Warner
c81f2b01ff encode.py: fix generation of plaintext/crypttext merkle trees 2007-06-07 13:14:14 -07:00
Brian Warner
e04ff3adac fetch plaintext/crypttext merkle trees during download, but don't check the segments against them yet 2007-06-07 00:15:41 -07:00
Brian Warner
dcf5abb51c encode.py: fix pyflakes warning 2007-06-07 02:56:16 -07:00
Brian Warner
5cbdc240e2 encode: add plaintext/crypttext merkle trees to the shares, and the thingA block. Still needs tests and download-side verification. 2007-06-06 19:40:20 -07:00
Brian Warner
f4c048bbeb encode.py: clean up handling of lost peers during upload, add some logging 2007-06-06 12:40:16 -07:00
Brian Warner
6bb9debc16 encode: tolerate lost peers, as long as we still get enough shares out. Closes #17. 2007-06-06 10:32:40 -07:00
Brian Warner
3dfd26970b move validation data to thingA, URI has storage_index plus thingA hash
This (compatibility-breaking) change moves much of the validation data and
encoding parameters out of the URI and into the so-called "thingA" block
(which will get a better name as soon as we find one we're comfortable with).
The URI retains the "storage_index" (a generalized term for the role that
we're currently using the verifierid for, the unique index for each file
that gets used by storage servers to decide which shares to return), the
decryption key, the needed_shares/total_shares counts (since they affect
peer selection), and the hash of the thingA block.

This shortens the URI and lets us add more kinds of validation data without
growing the URI (like plaintext merkle trees, to enable strong incremental
plaintext validation), at the cost of maybe 150 bytes of alacrity. Each
storage server holds an identical copy of the thingA block.

This is an incompatible change: new messages have been added to the storage
server interface, and the URI format has changed drastically.
2007-06-01 18:48:01 -07:00
Zooko O'Whielacronx
e0a18d12af globally search and replace "mesh" with "grid" and adjust description of the effect of NAT on the topology 2007-04-30 13:06:09 -07:00
Brian Warner
4b2298937b use real encryption, generate/store/verify verifierid and fileid 2007-04-25 17:53:10 -07:00
Brian Warner
49e992b8b6 make test_encode less CPU-intense by using 4-out-of-10 encoding instead of 25-out-of-100 2007-04-19 10:56:15 -07:00
Brian Warner
2d0e240466 encode: handle uploads of the same file multiple times. Unfortunately we have to do almost as much work the second time around, to compute the full URI 2007-04-18 18:29:10 -07:00
Brian Warner
85b36e348b encode.py: remove unused pad() code 2007-04-17 21:22:32 -07:00
Brian Warner
b84d6ed07f encode: fix multi-segment uploads: lambdas inside for loops require special attention to make sure you are capturing the *value* of the loop variable and not just the slot it lives in 2007-04-17 20:29:08 -07:00
Brian Warner
e7ec4ff4e5 factor out the tagged hash function used for subshares/blocks 2007-04-17 20:27:56 -07:00
Brian Warner
ff8cb4d32e encode: make MAX_SEGMENT_SIZE controllable, to support tests which force the use of multiple segments. Also, remove not-very-useful upload-side debug messages 2007-04-16 19:29:57 -07:00
Brian Warner
03fbee6ade encode.py: remove an unused import 2007-04-12 20:09:32 -07:00
Brian Warner
30133a7cdf hash trees: further cleanup, to make sure we're validating the right thing
hashtree.py: improve the methods available for finding out which hash nodes
 are needed. Change set_hashes() to require that every hash provided can
 be validated up to the root.
download.py: validate from the top down, including the URI-derived roothash
 in the share hash tree, and stashing the thus-validated share hash for use
 in the block hash tree.
2007-04-12 19:41:48 -07:00
Brian Warner
d8215e0c6f rename chunk.py to hashtree.py 2007-04-12 13:13:25 -07:00
Brian Warner
5168c7c8d6 encode: add more logging to investigate occasional test failures 2007-04-06 18:04:38 -07:00
Brian Warner
97a40bf20f encode/upload: add more logging, to understand the test failure on a slow buildslave 2007-04-06 15:45:45 -07:00
Brian Warner
8d2def5b04 encode: clean up some weirdness that was there to make unit tests easier to write 2007-04-05 22:36:18 -07:00
Brian Warner
919ca3e902 rename encode_new.py to encode.py, now that there isn't an old one anymore 2007-04-05 21:17:42 -07:00
Brian Warner
417c17755b use the word 'codec' for erasure coding, for now. 'encode' is used for file-level segmentation/hashing 2007-01-11 20:51:27 -07:00
Brian Warner
42c0d2e336 disable figleaf tracing during py_ecc, since it takes *forever*, especially on the slow buildslave 2007-01-05 18:12:04 -07:00
Brian Warner
3d3a7a5b8d encode.py: add some timing comments 2007-01-05 00:48:42 -07:00
Brian Warner
409d92e746 only run a single (short) py_ecc test on slave3, since it is so slow the tests timeout 2007-01-05 00:42:52 -07:00
Brian Warner
e1c6ee9dcf encoding: fix the last py_ecc problem, tests pass now 2007-01-05 00:06:42 -07:00
Brian Warner
c91d14dca8 fix our use of py_ecc (set log2FieldSize=8 explicitly) 2007-01-04 23:50:21 -07:00