add README.txt

parent 643c4994c0
commit 54fdcc2626

pyfec/README.txt (new file, 63 lines):

This package provides an "erasure code", or "forward error correction code".

It is licensed under the GNU General Public License (see the COPYING file for
details).

The most widely known example of an erasure code is the RAID-5 algorithm,
which ensures that in the event of the loss of any one hard drive, the stored
data can be completely recovered. The algorithm in the pyfec package has a
similar effect, but instead of recovering from the loss of only one element,
it can be parameterized to choose in advance the number of elements whose
loss it can recover from.

This package is largely based on the old "fec" library by Luigi Rizzo et al.,
which is a simple, fast, mature, and optimized implementation of erasure
coding. The pyfec package makes several changes to the original "fec"
package, including the addition of a Python API, refactoring of the C API to
be faster (for the way that I use it, at least), and a few clean-ups and
micro-optimizations of the core code itself.

This package performs two operations, encoding and decoding. Encoding takes
some input data and expands its size by producing extra "check blocks".
Decoding takes some blocks -- any combination of original blocks of data
(also called "primary shares") and check blocks (also called "secondary
shares") -- and produces the original data.

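For concreteness, here is a minimal sketch of the encoding side, written
against a hypothetical Encoder-style API -- the names "fec.Encoder" and
"encode" are assumptions, so consult the Python docstrings for the real
interface:

    # Hypothetical usage sketch -- the class and method names are
    # assumptions; consult the pyfec docstrings for the actual API.
    import fec

    k, m = 3, 10
    data = b"x" * 3000

    # Split the input into k equal-sized primary shares.
    piece = len(data) // k
    primary = [data[i * piece:(i + 1) * piece] for i in range(k)]

    # Encoding yields the full set of m shares: the k primary shares
    # plus m - k check blocks (secondary shares).
    enc = fec.Encoder(k, m)
    shares = enc.encode(primary)
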
The encoding is parameterized by two integers, k and m. m is the total
number of shares produced, and k is how many of those shares are necessary
to reconstruct the original data. m is required to be at least 1 and at
most 255, and k is required to be at least 1 and at most m. (Note that when
k == m, every share is required, so there is no redundancy and no point in
doing erasure coding.)

Note that each "primary share" is a segment of the original data, so its
size is 1/k'th of the size of the original data, and each "secondary share"
is of the same size, so the total space used by all the shares is about m/k
times the size of the original data.

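A quick worked example of the space overhead, with made-up parameters:

    # With k = 3 and m = 10, a 3000-byte input yields 10 shares of 1000
    # bytes each, for a total of 10000 bytes -- m/k (about 3.3) times
    # the size of the original data.
    k, m = 3, 10
    data_size = 3000
    share_size = data_size // k       # each share is 1/k'th of the data
    total_size = m * share_size       # all m shares together
    assert total_size == data_size * m // k
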
The decoding step requires as input any k of the shares which were produced
by the encoding step, and produces as output the data that was earlier input
to the encoding step.

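Continuing the sketch from above (the names are still assumptions, not the
exact API), decoding from any k of the shares might look like this:

    # Hypothetical decoding sketch -- names are assumptions; see the
    # Python docstrings for the real interface.
    import fec

    k, m = 3, 10
    data = b"x" * 3000
    piece = len(data) // k
    primary = [data[i * piece:(i + 1) * piece] for i in range(k)]
    shares = fec.Encoder(k, m).encode(primary)

    # Any k of the m shares suffice.  Each share is identified by its
    # index (0 through m-1) so the decoder knows which ones it was given.
    dec = fec.Decoder(k, m)
    recovered = dec.decode([shares[9], shares[1], shares[5]], [9, 1, 5])
    assert b"".join(recovered) == data
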
This package also includes a Python interface. See the Python docstrings
for usage details.

See also the filefec.py module, which has a utility function for efficiently
reading a file and encoding it piece by piece.

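The idea behind piece-by-piece encoding can be sketched as follows; this
illustrates the approach only and is not filefec.py's actual interface:

    # Segment-at-a-time encoding -- an illustration of the idea only,
    # not the actual filefec.py interface.  fec.Encoder is the same
    # hypothetical API used in the sketches above.
    import fec

    k, m = 3, 10
    SEGMENT = k * 4096      # read k share-sized pieces at a time

    enc = fec.Encoder(k, m)
    with open("bigfile", "rb") as f:
        while True:
            segment = f.read(SEGMENT)
            if not segment:
                break
            # Pad the final short segment so it splits evenly into k pieces.
            segment = segment.ljust(SEGMENT, b"\0")
            piece = SEGMENT // k
            primary = [segment[i * piece:(i + 1) * piece] for i in range(k)]
            shares = enc.encode(primary)
            # ... write each of the m shares to its own output stream ...
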
Beware of a "gotcha" that can result from the combination of mutable buffers
and the fact that pyfec never makes an unnecessary data copy. That is:
whenever one of the shares produced from a call to encode() or decode() has
the same contents as one of the shares passed as input, then pyfec will
return as output a pointer (in the C API) or a Python reference (in the
Python API) to the object which was passed to it as input. This is efficient
as it avoids making an unnecessary copy of the data. But if the object which
was passed as input is mutable and if that object is mutated after the call
to pyfec returns, then the result from pyfec -- which is just a reference to
that same object -- will also be mutated. This subtlety is the price you pay
for avoiding data copying. If you don't want to have to worry about this,
then simply use immutable objects (e.g. Python strings) to hold the data
that you pass to pyfec.

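A small sketch of the hazard, using the same hypothetical API names as in
the sketches above:

    # The aliasing gotcha -- hypothetical API names as in the earlier
    # sketches.
    import fec

    k, m = 3, 4
    primary = [bytearray(b"a" * 8), bytearray(b"b" * 8), bytearray(b"c" * 8)]
    shares = fec.Encoder(k, m).encode(primary)

    # Where an output share has the same contents as an input share, pyfec
    # may hand back a reference to the very object that was passed in:
    if shares[0] is primary[0]:
        primary[0][0] = 0       # mutating the input afterwards...
        # ...has now also changed shares[0]: it is the same object.

    # Passing immutable objects (e.g. byte strings) avoids the hazard.
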
Enjoy!

Zooko Wilcox-O'Hearn