#LyX 1.6.2 created this file. For more info see http://www.lyx.org/
|
|
\lyxformat 345
|
|
\begin_document
|
|
\begin_header
|
|
\textclass amsart
|
|
\use_default_options true
|
|
\begin_modules
|
|
theorems-ams
|
|
theorems-ams-extended
|
|
\end_modules
|
|
\language english
|
|
\inputencoding auto
|
|
\font_roman default
|
|
\font_sans default
|
|
\font_typewriter default
|
|
\font_default_family default
|
|
\font_sc false
|
|
\font_osf false
|
|
\font_sf_scale 100
|
|
\font_tt_scale 100
|
|
|
|
\graphics default
|
|
\float_placement h
|
|
\paperfontsize default
|
|
\spacing single
|
|
\use_hyperref false
|
|
\papersize default
|
|
\use_geometry false
|
|
\use_amsmath 1
|
|
\use_esint 1
|
|
\cite_engine basic
|
|
\use_bibtopic false
|
|
\paperorientation portrait
|
|
\secnumdepth 3
|
|
\tocdepth 3
|
|
\paragraph_separation indent
|
|
\defskip medskip
|
|
\quotes_language english
|
|
\papercolumns 1
|
|
\papersides 1
|
|
\paperpagestyle default
|
|
\tracking_changes false
|
|
\output_changes false
|
|
\author ""
|
|
\author ""
|
|
\end_header
|
|
|
|
\begin_body
|
|
|
|
\begin_layout Title
|
|
Tahoe Distributed Filesharing System Loss Model
|
|
\end_layout
|
|
|
|
\begin_layout Author
|
|
Shawn Willden
|
|
\end_layout
|
|
|
|
\begin_layout Date
|
|
07/22/2009
|
|
\end_layout
|
|
|
|
\begin_layout Address
|
|
South Weber, Utah
|
|
\end_layout
|
|
|
|
\begin_layout Email
|
|
shawn@willden.org
|
|
\end_layout
|
|
|
|
\begin_layout Abstract
|
|
The abstract goes here
|
|
\end_layout
|
|
|
|
\begin_layout Section
|
|
Problem Statement
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
The allmydata Tahoe distributed file system uses Reed-Solomon erasure coding
|
|
to split files into
|
|
\begin_inset Formula $N$
|
|
\end_inset
|
|
|
|
shares which are delivered to randomly-selected peers in a distributed
|
|
network.
|
|
The file can later be reassembled from any
|
|
\begin_inset Formula $k\leq N$
|
|
\end_inset
|
|
|
|
of the shares, if they are available.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Over time shares are lost for a variety of reasons.
|
|
Storage servers may crash, be destroyed or simply be removed from the network.
|
|
To mitigate such losses, Tahoe network clients employ a repair agent which
|
|
scans the peers once per time period
|
|
\begin_inset Formula $A$
|
|
\end_inset
|
|
|
|
and determines how many of the shares remain.
|
|
If fewer than
|
|
\begin_inset Formula $L$
|
|
\end_inset
|
|
|
|
(
|
|
\begin_inset Formula $k\leq L\leq N$
|
|
\end_inset
|
|
|
|
) shares remain, then the repairer reconstructs the file and redistributes the missing shares, bringing availability back up to full.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
The question we're trying to answer is "What is the probability that we'll
|
|
be able to reassemble the file at some later time
|
|
\begin_inset Formula $T$
|
|
\end_inset
|
|
|
|
?".
|
|
We'd also like to be able to determine what values we should choose for
|
|
|
|
\begin_inset Formula $k$
|
|
\end_inset
|
|
|
|
,
|
|
\begin_inset Formula $N$
|
|
\end_inset
|
|
|
|
,
|
|
\begin_inset Formula $A$
|
|
\end_inset
|
|
|
|
, and
|
|
\begin_inset Formula $L$
|
|
\end_inset
|
|
|
|
in order to ensure
|
|
\begin_inset Formula $Pr[loss]\leq r$
|
|
\end_inset
|
|
|
|
for some threshold probability
|
|
\begin_inset Formula $r$
|
|
\end_inset
|
|
|
|
.
|
|
This is an optimization problem because although we could obtain very low
|
|
|
|
\begin_inset Formula $Pr[loss]$
|
|
\end_inset
|
|
|
|
by selecting conservative parameters, these choices have costs.
|
|
The peer storage and bandwidth consumed by the share distribution process
|
|
are approximately
|
|
\begin_inset Formula $\nicefrac{N}{k}$
|
|
\end_inset
|
|
|
|
times the size of the original file, so we would like to minimize
|
|
\begin_inset Formula $\nicefrac{N}{k}$
|
|
\end_inset
|
|
|
|
, consistent with
|
|
\begin_inset Formula $Pr[loss]\leq r$
|
|
\end_inset
|
|
|
|
.
|
|
Likewise, a frequent and aggressive repair process keeps the number of
|
|
shares available close to
|
|
\begin_inset Formula $N,$
|
|
\end_inset
|
|
|
|
but at a cost in bandwidth and processing time as the repair agent downloads
|
|
|
|
\begin_inset Formula $k$
|
|
\end_inset
|
|
|
|
shares, reconstructs the file and uploads new shares to replace those that
|
|
are lost.
|
|
\end_layout
|
|
|
|
\begin_layout Section
|
|
Reliability
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
The probability that the file becomes unrecoverable is dependent upon the
|
|
probability that the peers to whom we send shares are able to return those
|
|
copies on demand.
|
|
Shares that are corrupted are detected and discarded, so there is no need
|
|
to distinguish between corruption and loss.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Many factors affect share availability.
|
|
Availability can be temporarily interrupted by peer unavailability due
|
|
to network outages, power failures or administrative shutdown, among other
|
|
reasons.
|
|
Availability can be permanently lost due to failure or corruption of storage
|
|
media, catastrophic damage to the peer system, administrative error, withdrawal
|
|
from the network, malicious corruption, etc.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
The existence of intermittent failure modes motivates the introduction of
|
|
a distinction between
|
|
\noun on
|
|
availability
|
|
\noun default
|
|
and
|
|
\noun on
|
|
reliability
|
|
\noun default
|
|
.
|
|
Reliability is the probability that a share is retrievable assuming intermittent failures can be waited out, so reliability considers only permanent failures.
|
|
Availability considers all failures, and is focused on the probability
|
|
of retrieval within some defined time frame.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Another consideration is that some failures affect multiple shares.
|
|
If multiple shares of a file are stored on a single hard drive, for example,
|
|
failure of that drive may lose them all.
|
|
Catastrophic damage to a data center may destroy all shares on all peers
|
|
in that data center.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
While the types of failures that may occur are quite consistent across peers,
|
|
their probabilities differ dramatically.
|
|
A professionally-administered server with redundant storage, power and
|
|
Internet located in a carefully-monitored data center with automatic fire
|
|
suppression systems is much less likely to become either temporarily or
|
|
permanently unavailable than the typical virus and malware-ridden home
|
|
computer on a single cable modem connection.
|
|
A variety of situations in between exist as well, such as the case of the
|
|
author's home file server, which is administered by an IT professional
|
|
and uses RAID level 6 redundant storage, but runs on old, cobbled-together
|
|
equipment, and has a consumer-grade Internet connection.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
To begin with, let's use a simple definition of reliability:
|
|
\end_layout
|
|
|
|
\begin_layout Definition
|
|
|
|
\noun on
|
|
Reliability
|
|
\noun default
|
|
is the probability
|
|
\begin_inset Formula $p_{i}$
|
|
\end_inset
|
|
|
|
that a share
|
|
\begin_inset Formula $s_{i}$
|
|
\end_inset
|
|
|
|
will survive to (be retrievable at) time
|
|
\begin_inset Formula $T=A$
|
|
\end_inset
|
|
|
|
, ignoring intermittent failures.
|
|
That is, the probability that the share will be retrievable at the end
|
|
of the current repair cycle, and therefore usable by the repairer to regenerate
|
|
any lost shares.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Reliability
|
|
\begin_inset Formula $p_{i}$
|
|
\end_inset
|
|
|
|
is clearly dependent on
|
|
\begin_inset Formula $A$
|
|
\end_inset
|
|
|
|
.
|
|
Short repair cycles offer less time for shares to
|
|
\begin_inset Quotes eld
|
|
\end_inset
|
|
|
|
decay
|
|
\begin_inset Quotes erd
|
|
\end_inset
|
|
|
|
into unavailability.
|
|
\end_layout
|
|
|
|
\begin_layout Subsection
|
|
Peer Reliability
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Since peer reliability is the basis for any computations we may do on share
|
|
and file reliability, we must have a way to estimate it.
|
|
Modeling the reliability of hardware, of software and of human performance is, in each case, a complex topic and the subject of much ongoing research.
|
|
In particular, the reliability of one of the key components of any peer
|
|
from our perspective -- the hard drive where file shares are stored --
|
|
is the subject of much current debate.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
A common assumption about hardware failure is that it follows the
|
|
\begin_inset Quotes eld
|
|
\end_inset
|
|
|
|
bathtub curve
|
|
\begin_inset Quotes erd
|
|
\end_inset
|
|
|
|
, with frequent failures during the first few months, a constant failure
|
|
rate for a few years and then a rising failure rate as the hardware wears
|
|
out.
|
|
This curve is often flattened by burn-in stress testing and by periodic replacement that ensures in-service components never reach
|
|
\begin_inset Quotes eld
|
|
\end_inset
|
|
|
|
old age
|
|
\begin_inset Quotes erd
|
|
\end_inset
|
|
|
|
.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
In any case, we're generally going to ignore all of that complexity and
|
|
focus on the bottom of the bathtub, assuming constant failure rates.
|
|
This is a particularly reasonable assumption as long as we're focused on
|
|
failures during a particular, relatively short interval
|
|
\begin_inset Formula $A$
|
|
\end_inset
|
|
|
|
.
|
|
Towards the end of this paper, as we examine failures over many repair
|
|
intervals, the assumption becomes more tenuous, and we note some of the
|
|
issues.
|
|
\end_layout
|
|
|
|
\begin_layout Subsubsection
|
|
Estimate Adaptation
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Even assuming constant failure rates, however, it will be rare that the
|
|
duration of
|
|
\begin_inset Formula $A$
|
|
\end_inset
|
|
|
|
coincides with the period over which the available failure rate data was measured, particularly since we want
|
|
to view
|
|
\begin_inset Formula $A$
|
|
\end_inset
|
|
|
|
as a tunable parameter.
|
|
It's necessary to be able to adapt failure rates baselined against any given
|
|
duration to the selected value of
|
|
\begin_inset Formula $A$
|
|
\end_inset
|
|
|
|
.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Another issue is that failure rates of hardware, etc., are necessarily continuous
|
|
in nature, while the per-interval failure/survival rates that are of interest
|
|
for file reliability calculations are discrete -- a peer either survives
|
|
or fails during the interval.
|
|
The continuous nature of failure rates means that the common and obvious
|
|
methods for estimating failure rates result in values that follow continuous,
|
|
not discrete distributions.
|
|
The difference is minor for small failure probabilities, and converges
|
|
to zero as the number of intervals goes to infinity, but is important enough
|
|
in some cases to be worth correcting for.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Continuous failure rates are described in terms of mean time to failure,
|
|
and under the assumption that failure rates are constant, are exponentially
|
|
distributed.
|
|
Under these assumptions, the probability density of a machine failing at time
\begin_inset Formula $t$
\end_inset

 is
|
|
\begin_inset Formula \[
|
|
f\left(t\right)=\lambda e^{-\lambda t}\]
|
|
|
|
\end_inset
|
|
|
|
where
|
|
\begin_inset Formula $\lambda$
|
|
\end_inset
|
|
|
|
represents the per unit-time failure rate.
|
|
The probability that a machine fails at or before time
|
|
\begin_inset Formula $A$
|
|
\end_inset
|
|
|
|
is therefore
|
|
\begin_inset Formula \begin{align}
|
|
F\left(A\right) & =\int_{0}^{A}f\left(x\right)dx\nonumber \\
|
|
& =\int_{0}^{A}\lambda e^{-\lambda x}dx\nonumber \\
|
|
& =1-e^{-\lambda A}\label{eq:failure-time}\end{align}
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
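\begin_layout Standard
Under the same constant-failure-rate assumption, the reliability of a single peer over one repair interval is the complement of its failure probability. When the failure rate is small compared with the interval length, that reliability is roughly linear in the interval length. This short derivation step makes the link to the definition of reliability explicit:
\begin_inset Formula \[
p=1-F\left(A\right)=e^{-\lambda A}\approx1-\lambda A\qquad\textnormal{when }\lambda A\ll1\]

\end_inset


\end_layout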
|
|
|
|
\begin_layout Standard
|
|
Note that
|
|
\begin_inset Formula $A$
|
|
\end_inset
|
|
|
|
and
|
|
\begin_inset Formula $\lambda$
|
|
\end_inset
|
|
|
|
in
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
reference "eq:failure-time"
|
|
|
|
\end_inset
|
|
|
|
must be expressed in consistent time units.
|
|
If they're different, unit conversions should be applied in the normal
|
|
way.
|
|
For example, if the estimate for
|
|
\begin_inset Formula $\lambda$
|
|
\end_inset
|
|
|
|
is 750 failures per million hours, and
|
|
\begin_inset Formula $A$
|
|
\end_inset
|
|
|
|
is one month, then either
|
|
\begin_inset Formula $A$
|
|
\end_inset
|
|
|
|
should be represented as
|
|
\begin_inset Formula $30\cdot24/1000000=.00072$
|
|
\end_inset
|
|
|
|
 (in millions of hours), or
|
|
\begin_inset Formula $\lambda$
|
|
\end_inset
|
|
|
|
should be converted to failures per month.
|
|
Or both may be converted to hours.
|
|
\end_layout
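\begin_layout Standard
The unit conversion above can be checked numerically. The following is a minimal Python sketch that assumes a constant failure rate, as in the exponential model above. The helper name interval_failure_probability is hypothetical and is not part of Tahoe. It expresses the interval in millions of hours to match a rate quoted per million hours, then evaluates the failure probability for the one-month example.
\end_layout

\begin_layout LyX-Code
```python
import math


def interval_failure_probability(failures_per_million_hours: float,
                                 interval_hours: float) -> float:
    """Hypothetical helper: probability that a peer with a constant failure
    rate fails at or before the end of one repair interval,
    F(A) = 1 - exp(-lambda * A).

    lambda is quoted in failures per million hours, so A is first converted
    to millions of hours to keep the units consistent.
    """
    interval_million_hours = interval_hours / 1_000_000
    return 1.0 - math.exp(-failures_per_million_hours * interval_million_hours)


# The example from the text: lambda = 750 failures per million hours and
# A = one 30-day month = 720 hours = 0.00072 million hours.
month_hours = 30 * 24
f_month = interval_failure_probability(750, month_hours)
print(f"F(A) = {f_month:.4f}, reliability p = {1 - f_month:.4f}")
```
\end_layout

\begin_layout Standard
With these numbers the product of rate and interval is 0.54. A peer with this failure rate would therefore fail during a one-month interval with probability of roughly 0.42.
\end_layout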
|
|
|
|
\begin_layout Subsubsection
|
|
Acquiring Peer Reliability Estimates
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Need to write this.
|
|
\end_layout
|
|
|
|
\begin_layout Subsection
|
|
Uniform Reliability
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
name "sub:Fixed-Reliability"
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
In the simplest case, the peers holding the file shares all have the same
|
|
reliability
|
|
\begin_inset Formula $p$
|
|
\end_inset
|
|
|
|
, and are all independent from one another.
|
|
Let
|
|
\begin_inset Formula $K$
|
|
\end_inset
|
|
|
|
be a random variable that represents the number of shares that survive
|
|
|
|
\begin_inset Formula $A$
|
|
\end_inset
|
|
|
|
.
|
|
Each share's survival can be viewed as an independent Bernoulli trial with
|
|
a success probability of
|
|
\begin_inset Formula $p$
|
|
\end_inset
|
|
|
|
, which means that
|
|
\begin_inset Formula $K$
|
|
\end_inset
|
|
|
|
follows the binomial distribution with parameters
|
|
\begin_inset Formula $N$
|
|
\end_inset
|
|
|
|
and
|
|
\begin_inset Formula $p$
|
|
\end_inset
|
|
|
|
.
|
|
That is,
|
|
\begin_inset Formula $K\sim B(N,p)$
|
|
\end_inset
|
|
|
|
.
|
|
\end_layout
|
|
|
|
\begin_layout Theorem
|
|
Binomial Distribution Theorem
|
|
\end_layout
|
|
|
|
\begin_layout Theorem
|
|
Consider
|
|
\begin_inset Formula $n$
|
|
\end_inset
|
|
|
|
independent Bernoulli trials
|
|
\begin_inset Foot
|
|
status collapsed
|
|
|
|
\begin_layout Plain Layout
|
|
A Bernoulli trial is simply a test of some sort that results in one of two
|
|
outcomes, one of which is designated success and the other failure.
|
|
The classic example of a Bernoulli trial is a coin toss.
|
|
\end_layout
|
|
|
|
\end_inset
|
|
|
|
that succeed with probability
|
|
\begin_inset Formula $p$
|
|
\end_inset
|
|
|
|
, and let
|
|
\begin_inset Formula $K$
|
|
\end_inset
|
|
|
|
be a random variable that represents the number,
|
|
\begin_inset Formula $m$
|
|
\end_inset
|
|
|
|
, of successes,
|
|
\begin_inset Formula $0\le m\le n$
|
|
\end_inset
|
|
|
|
.
|
|
We say that
|
|
\begin_inset Formula $K$
|
|
\end_inset
|
|
|
|
follows the Binomial Distribution with parameters n and p, denoted
|
|
\begin_inset Formula $K\sim B(n,p)$
|
|
\end_inset
|
|
|
|
.
|
|
The probability mass function (PMF) of K is a function that gives the probability that
|
|
\begin_inset Formula $K$
|
|
\end_inset
|
|
|
|
takes a particular value
|
|
\begin_inset Formula $m$
|
|
\end_inset
|
|
|
|
(the probability that there are exactly
|
|
\begin_inset Formula $m$
|
|
\end_inset
|
|
|
|
successful trials, and therefore
|
|
\begin_inset Formula $n-m$
|
|
\end_inset
|
|
|
|
failures).
|
|
The PMF of K is
|
|
\begin_inset Formula \begin{equation}
|
|
Pr[K=m]=f(m;n,p)=\binom{n}{m}p^{m}(1-p)^{n-m}\label{eq:binomial-pmf}\end{equation}
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Proof
|
|
Consider the specific case of exactly
|
|
\begin_inset Formula $m$
|
|
\end_inset
|
|
|
|
successes followed by
|
|
\begin_inset Formula $n-m$
|
|
\end_inset
|
|
|
|
failures. Because each success has probability
|
|
\begin_inset Formula $p$
|
|
\end_inset
|
|
|
|
, each failure has probability
|
|
\begin_inset Formula $1-p$
|
|
\end_inset
|
|
|
|
, and the trials are independent, the probability of this exact case occurring
|
|
is
|
|
\begin_inset Formula $p^{m}\left(1-p\right)^{\left(n-m\right)}$
|
|
\end_inset
|
|
|
|
, the product of the probabilities of the outcome of each trial.
|
|
\end_layout
|
|
|
|
\begin_layout Proof
|
|
Now consider any reordering of these
|
|
\begin_inset Formula $m$
|
|
\end_inset
|
|
|
|
successes and
|
|
\begin_inset Formula $n-m$
|
|
\end_inset
|
|
|
|
failures.
|
|
Any such reordering occurs with the same probability
|
|
\begin_inset Formula $p^{m}\left(1-p\right)^{\left(n-m\right)}$
|
|
\end_inset
|
|
|
|
, but with the terms of the product reordered.
|
|
Since multiplication is commutative, each such reordering has the same
|
|
probability.
|
|
There are n-choose-m such orderings, and the orderings are mutually exclusive events (at most one of them can occur), meaning we can sum the probabilities of the individual orderings,
|
|
so the probability that any ordering of
|
|
\begin_inset Formula $m$
|
|
\end_inset
|
|
|
|
successes and
|
|
\begin_inset Formula $n-m$
|
|
\end_inset
|
|
|
|
failures occurs is given by
|
|
\begin_inset Formula \[
|
|
\binom{n}{m}p^{m}\left(1-p\right)^{\left(n-m\right)}\]
|
|
|
|
\end_inset
|
|
|
|
which is the right-hand-side of equation
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
reference "eq:binomial-pmf"
|
|
|
|
\end_inset
|
|
|
|
.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
A file survives if at least
|
|
\begin_inset Formula $k$
|
|
\end_inset
|
|
|
|
of the
|
|
\begin_inset Formula $N$
|
|
\end_inset
|
|
|
|
shares survive.
|
|
Equation
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
reference "eq:binomial-pmf"
|
|
|
|
\end_inset
|
|
|
|
gives the probability that exactly
|
|
\begin_inset Formula $i$
|
|
\end_inset
|
|
|
|
shares survive, for any
|
|
\begin_inset Formula $0\leq i\leq N$
|
|
\end_inset
|
|
|
|
, so the probability that fewer than
|
|
\begin_inset Formula $k$
|
|
\end_inset
|
|
|
|
survive is the sum of the probabilities that
|
|
\begin_inset Formula $0,1,2,\ldots,k-1$
|
|
\end_inset
|
|
|
|
shares survive.
|
|
That is:
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
\begin_inset Formula \begin{equation}
|
|
Pr[file\, lost]=\sum_{i=0}^{k-1}\binom{N}{i}p^{i}(1-p)^{N-i}\label{eq:simple-failure}\end{equation}
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
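\begin_layout Standard
The binomial PMF and the file-loss sum above translate directly into code. The following Python sketch sums the binomial PMF over the losing outcomes, 0 through k-1 surviving shares. The function names are hypothetical, and the parameters are chosen only for illustration.
\end_layout

\begin_layout LyX-Code
```python
from math import comb


def binomial_pmf(m: int, n: int, p: float) -> float:
    """Pr[K = m] for K ~ B(n, p): exactly m of n independent trials succeed."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)


def pr_file_lost(k: int, N: int, p: float) -> float:
    """Pr[file lost]: fewer than k of the N shares survive, each share
    surviving independently with the same probability p."""
    return sum(binomial_pmf(i, N, p) for i in range(k))


# Illustrative parameters only: k = 3, N = 10, p = 0.9.
print(pr_file_lost(3, 10, 0.9))
```
\end_layout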
|
|
|
|
\begin_layout Subsection
|
|
Independent Reliability
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
name "sub:Independent-Reliability"
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Equation
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
reference "eq:simple-failure"
|
|
|
|
\end_inset
|
|
|
|
assumes that all shares have the same probability of survival, but as explained
|
|
above, this is not necessarily true.
|
|
A more accurate model allows each share
|
|
\begin_inset Formula $s_{i}$
|
|
\end_inset
|
|
|
|
an independent probability of survival
|
|
\begin_inset Formula $p_{i}$
|
|
\end_inset
|
|
|
|
.
|
|
Each share's survival can still be treated as an independent Bernoulli
|
|
trial, but with success probability
|
|
\begin_inset Formula $p_{i}$
|
|
\end_inset
|
|
|
|
.
|
|
Under this assumption,
|
|
\begin_inset Formula $K$
|
|
\end_inset
|
|
|
|
follows a generalized binomial distribution (also known as the Poisson binomial distribution) with parameters
|
|
\begin_inset Formula $N$
|
|
\end_inset
|
|
|
|
and
|
|
\begin_inset Formula $p_{1},p_{2},\dots,p_{N}$
|
|
\end_inset
|
|
|
|
.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
The PMF for this generalized
|
|
\begin_inset Formula $K$
|
|
\end_inset
|
|
|
|
does not have a simple closed-form representation.
|
|
However, the PMFs for random variables representing individual share survival
|
|
do.
|
|
Let
|
|
\begin_inset Formula $K_{i}$
|
|
\end_inset
|
|
|
|
be a random variable such that:
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
\begin_inset Formula \[
|
|
K_{i}=\begin{cases}
|
|
1 & \textnormal{if }s_{i}\textnormal{ survives}\\
|
|
0 & \textnormal{if }s_{i}\textnormal{ fails}\end{cases}\]
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
The PMF for
|
|
\begin_inset Formula $K_{i}$
|
|
\end_inset
|
|
|
|
is very simple:
|
|
\begin_inset Formula \[
|
|
Pr[K_{i}=j]=\begin{cases}
|
|
p_{i} & j=1\\
|
|
1-p_{i} & j=0\end{cases}\]
|
|
|
|
\end_inset
|
|
|
|
which can also be expressed as
|
|
\begin_inset Formula \[
|
|
Pr[K_{i}=j]=g_{i}\left(j\right)=\left(1-p_{i}\right)\left(1-j\right)+p_{i}j\]
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Since each
\begin_inset Formula $K_{i}$
\end_inset

 is the survivor count (either 0 or 1) of the single share
\begin_inset Formula $s_{i}$
\end_inset

, adding up all of the individual survivor counts gives the group survivor count.
|
|
That is:
|
|
\begin_inset Formula \[
|
|
\sum_{i=1}^{N}K_{i}=K\]
|
|
|
|
\end_inset
|
|
|
|
Effectively, we have separated
|
|
\begin_inset Formula $K$
|
|
\end_inset
|
|
|
|
into the series of Bernoulli trials that make it up.
|
|
\end_layout
|
|
|
|
\begin_layout Theorem
|
|
Discrete Convolution Theorem
|
|
\end_layout
|
|
|
|
\begin_layout Theorem
|
|
Let
|
|
\begin_inset Formula $X$
|
|
\end_inset
|
|
|
|
and
|
|
\begin_inset Formula $Y$
|
|
\end_inset
|
|
|
|
be independent discrete random variables with probability mass functions given by
|
|
\begin_inset Formula $Pr\left[X=x\right]=f(x)$
|
|
\end_inset
|
|
|
|
and
|
|
\begin_inset Formula $Pr\left[Y=y\right]=g(y).$
|
|
\end_inset
|
|
|
|
Let
|
|
\begin_inset Formula $Z$
|
|
\end_inset
|
|
|
|
be the discrete random variable obtained by summing
|
|
\begin_inset Formula $X$
|
|
\end_inset
|
|
|
|
and
|
|
\begin_inset Formula $Y$
|
|
\end_inset
|
|
|
|
.
|
|
\end_layout
|
|
|
|
\begin_layout Theorem
|
|
The probability mass function of
|
|
\begin_inset Formula $Z$
|
|
\end_inset
|
|
|
|
is given by
|
|
\begin_inset Formula \[
|
|
Pr[Z=z]=h(z)=\left(f\star g\right)(z)\]
|
|
|
|
\end_inset
|
|
|
|
where
|
|
\begin_inset Formula $\star$
|
|
\end_inset
|
|
|
|
denotes the discrete convolution operation:
|
|
\begin_inset Formula \[
|
|
\left(f\star g\right)\left(n\right)=\sum_{m=-\infty}^{\infty}f\left(m\right)g\left(n-m\right)\]
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Proof
|
|
The proof is beyond the scope of this paper.
|
|
\end_layout
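\begin_layout Standard
For PMFs supported on the non-negative integers, which is the only case needed here, the convolution sum becomes a finite double loop. The following is a minimal Python sketch. The name convolve and the representation of a PMF as a list indexed from 0 are assumptions of this illustration.
\end_layout

\begin_layout LyX-Code
```python
def convolve(f: list[float], g: list[float]) -> list[float]:
    """Discrete convolution of two PMFs stored as probability vectors.

    f[x] = Pr[X = x] and g[y] = Pr[Y = y] for x, y = 0, 1, 2, ...; outside
    those ranges the PMFs are zero, so the infinite sum in the theorem reduces
    to the finitely many pairs (x, y) with x + y = z.  For independent X and Y
    the result h satisfies h[z] = Pr[X + Y = z].
    """
    h = [0.0] * (len(f) + len(g) - 1)
    for x, px in enumerate(f):
        for y, py in enumerate(g):
            h[x + y] += px * py
    return h


# Two independent fair coin tosses, each counted as 0 (tails) or 1 (heads).
coin = [0.5, 0.5]
print(convolve(coin, coin))  # [0.25, 0.5, 0.25]
```
\end_layout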
|
|
|
|
\begin_layout Standard
|
|
If we denote the PMF of
|
|
\begin_inset Formula $K$
|
|
\end_inset
|
|
|
|
with
|
|
\begin_inset Formula $f$
|
|
\end_inset
|
|
|
|
and the PMF of
|
|
\begin_inset Formula $K_{i}$
|
|
\end_inset
|
|
|
|
with
|
|
\begin_inset Formula $g_{i}$
|
|
\end_inset
|
|
|
|
(more formally,
|
|
\begin_inset Formula $Pr[K=x]=f(x)$
|
|
\end_inset
|
|
|
|
and
|
|
\begin_inset Formula $Pr[K_{i}=x]=g_{i}(x)$
|
|
\end_inset
|
|
|
|
) then since
|
|
\begin_inset Formula $K=\sum_{i=1}^{N}K_{i}$
|
|
\end_inset
|
|
|
|
, according to the discrete convolution theorem
|
|
\begin_inset Formula $f=g_{1}\star g_{2}\star g_{3}\star\ldots\star g_{N}$
|
|
\end_inset
|
|
|
|
.
|
|
Since convolution is associative, this can also be written as
|
|
|
|
|
|
|
|
\begin_inset Formula \begin{equation}
|
|
f=(\ldots((g_{1}\star g_{2})\star g_{3})\star\ldots)\star g_{N}\label{eq:convolution}\end{equation}
|
|
|
|
\end_inset
|
|
|
|
Therefore,
|
|
\begin_inset Formula $f$
|
|
\end_inset
|
|
|
|
can be computed as a sequence of convolution operations on the simple PMFs
|
|
of the random variables
|
|
\begin_inset Formula $K_{i}$
|
|
\end_inset
|
|
|
|
.
|
|
In fact, for large
|
|
\begin_inset Formula $N$
|
|
\end_inset
|
|
|
|
, equation
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
reference "eq:convolution"
|
|
|
|
\end_inset
|
|
|
|
turns out to be a more effective means of computing the PMF of
|
|
\begin_inset Formula $K$
|
|
\end_inset
|
|
|
|
than direct evaluation of the binomial PMF, even in the case of shares with identical survival probability. The reason is that the calculation of
|
|
\begin_inset Formula $\binom{n}{m}$
|
|
\end_inset
|
|
|
|
in equation
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
reference "eq:binomial-pmf"
|
|
|
|
\end_inset
|
|
|
|
produces very large values that overflow unless arbitrary precision numeric
|
|
representations are used.
|
|
\end_layout
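\begin_layout Standard
The left-to-right folding in the convolution equation can be sketched in Python as follows. Each step convolves the running PMF with the two-element Bernoulli PMF of one more share, so the code never forms a binomial coefficient. The function names and the per-share reliabilities in the usage line are hypothetical. The binomial list appears only as a cross-check for the identical-probability case.
\end_layout

\begin_layout LyX-Code
```python
from math import comb


def add_share(pmf: list[float], p: float) -> list[float]:
    """Convolve a survivor-count PMF with the Bernoulli PMF [1 - p, p] of one
    more independent share."""
    out = [0.0] * (len(pmf) + 1)
    for m, q in enumerate(pmf):
        out[m] += q * (1.0 - p)  # the new share is lost: count unchanged
        out[m + 1] += q * p      # the new share survives: count goes up by one
    return out


def survival_pmf(ps: list[float]) -> list[float]:
    """PMF of K = K_1 + ... + K_N, folded left to right as in the equation."""
    pmf = [1.0]  # no shares yet: zero survivors with certainty
    for p in ps:
        pmf = add_share(pmf, p)
    return pmf


def pr_file_lost_independent(k: int, ps: list[float]) -> float:
    """Probability that fewer than k shares survive."""
    return sum(survival_pmf(ps)[:k])


# Cross-check: with identical p the folded PMF matches the binomial PMF.
ps_same = [0.9] * 10
binomial = [comb(10, m) * 0.9**m * 0.1 ** (10 - m) for m in range(11)]
assert all(abs(a - b) < 1e-12 for a, b in zip(survival_pmf(ps_same), binomial))

# Hypothetical per-share reliabilities p_i, differing from share to share.
print(pr_file_lost_independent(3, [0.99, 0.95, 0.9, 0.9, 0.8, 0.7]))
```
\end_layout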
|
|
|
|
\begin_layout Standard
|
|
Note also that it is not necessary to have very simple PMFs like those of
|
|
the
|
|
\begin_inset Formula $K_{i}$
|
|
\end_inset
|
|
|
|
.
|
|
Any share or set of shares that has a known PMF can be combined with any
|
|
other set with a known PMF by convolution, as long as the two share sets
|
|
are independent.
|
|
The reverse holds as well; given a group with an empirically-derived PMF,
|
|
it is theoretically possible (by deconvolution) to solve for an individual PMF, and thereby
|
|
determine
|
|
\begin_inset Formula $p_{i}$
|
|
\end_inset
|
|
|
|
even when per-share data is unavailable.
|
|
\end_layout
|
|
|
|
\begin_layout Subsection
|
|
Multiple Failure Modes
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
name "sub:Multiple-Failure-Modes"
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
In modeling share survival probabilities, it's useful to be able to analyze
|
|
separately each of the various failure modes.
|
|
For example, if reliable statistics for disk failure can be obtained, then
|
|
a probability mass function for that form of failure can be generated.
|
|
Similarly, statistics on other hardware failures, administrative errors,
|
|
network losses, etc., can all be estimated independently.
|
|
If those estimates can then be combined into a single PMF for a share,
|
|
then we can use it to predict failures for that share.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Combining independent failure modes for a single share is straightforward.
|
|
If
|
|
\begin_inset Formula $p_{i,j}$
|
|
\end_inset
|
|
|
|
is the probability that share
\begin_inset Formula $i$
\end_inset

 survives the
\begin_inset Formula $j$
\end_inset

th failure mode,
|
|
\begin_inset Formula $1\leq j\leq m$
|
|
\end_inset
|
|
|
|
, then
|
|
\begin_inset Formula \[
|
|
Pr[K_{i}=k]=f_{i}(k)=\begin{cases}
|
|
\prod_{j=1}^{m}p_{i,j} & k=1\\
|
|
1-\prod_{j=1}^{m}p_{i,j} & k=0\end{cases}\]
|
|
|
|
\end_inset
|
|
|
|
is the survival PMF.
|
|
\end_layout
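\begin_layout Standard
Combining failure modes amounts to taking the product of the per-mode survival probabilities. The following short Python sketch assumes, as the text does, that the modes are independent. The function name is hypothetical. The sample figures are the RAID and administrative-error probabilities used for the data center servers in the example that follows.
\end_layout

\begin_layout LyX-Code
```python
from math import prod


def share_survival_pmf(mode_survival_probs: list[float]) -> list[float]:
    """Survival PMF [Pr[K_i = 0], Pr[K_i = 1]] of one share that survives only
    if it survives every independent failure mode j, each with probability
    p_{i,j}; the overall survival probability is their product."""
    p_i = prod(mode_survival_probs)
    return [1.0 - p_i, p_i]


# RAID failure (0.0002) and administrative error (0.003).
print(share_survival_pmf([1 - 0.0002, 1 - 0.003]))
```
\end_layout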
|
|
|
|
\begin_layout Subsection
|
|
Multi-share failures
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
name "sub:Multi-share-failures"
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
If there are failure modes that affect multiple computers, we can also construct
|
|
the PMF that predicts their survival.
|
|
The key observation is that the PMF has non-zero probabilities only for
|
|
|
|
\begin_inset Formula $0$
|
|
\end_inset
|
|
|
|
survivors and
|
|
\begin_inset Formula $n$
|
|
\end_inset
|
|
|
|
survivors, where
|
|
\begin_inset Formula $n$
|
|
\end_inset
|
|
|
|
is the number of shares in the set.
|
|
If
|
|
\begin_inset Formula $p$
|
|
\end_inset
|
|
|
|
is the probability of survival, the PMF of
|
|
\begin_inset Formula $K$
|
|
\end_inset
|
|
|
|
, a random variable representing the number of survivors is
|
|
\begin_inset Formula \[
|
|
Pr[K=k]=f(k)=\begin{cases}
|
|
p & k=n\\
|
|
0 & 0<k<n\\
|
|
1-p & k=0\end{cases}\]
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Group failures due to multiple independent causes can be combined as in
|
|
section
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
reference "sub:Multiple-Failure-Modes"
|
|
|
|
\end_inset
|
|
|
|
, as long as they apply to the whole group.
|
|
\end_layout
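\begin_layout Standard
A whole-group PMF puts all of its mass on 0 survivors and n survivors. Independent whole-group causes combine by multiplying their survival probabilities. The following Python sketch shows both steps. The function name is hypothetical. The sample figures are the network and volcano probabilities from the example that follows, and they are assumed to be independent.
\end_layout

\begin_layout LyX-Code
```python
from math import prod


def group_survival_pmf(n: int, cause_survival_probs: list[float]) -> list[float]:
    """Survivor-count PMF for a set of n shares that are lost together.

    The group survives only if it survives every independent whole-group
    failure cause, so its survival probability p is the product of the
    per-cause survival probabilities.  Then Pr[K = n] = p, Pr[K = 0] = 1 - p,
    and Pr[K = k] = 0 for 0 < k < n.
    """
    p = prod(cause_survival_probs)
    pmf = [0.0] * (n + 1)
    pmf[0] += 1.0 - p
    pmf[n] += p  # += keeps the PMF summing to 1 even when n == 0
    return pmf


# Four servers sharing a network link (0.0001) and a volcano (0.04).
print(group_survival_pmf(4, [1 - 0.0001, 1 - 0.04]))
```
\end_layout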
|
|
|
|
\begin_layout Example
|
|
Putting the Pieces Together
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Sections
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
reference "sub:Fixed-Reliability"
|
|
|
|
\end_inset
|
|
|
|
through
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
reference "sub:Multi-share-failures"
|
|
|
|
\end_inset
|
|
|
|
provide ways of calculating the survival probability mass functions for
|
|
a variety of share failure structures and modes.
|
|
As an example of how these pieces can be used, consider a network with
|
|
the following peers:
|
|
\end_layout
|
|
|
|
\begin_layout Itemize
|
|
Four servers located in a data center in Nebraska.
|
|
The machines have multiply-redundant Internet connections, with a failure
|
|
probability of 0.0001.
|
|
They store their shares on RAID arrays with failure probability of 0.0002.
|
|
The administrative staff makes data-destroying errors with probability
|
|
0.003.
|
|
\end_layout
|
|
|
|
\begin_layout Itemize
|
|
Four servers located in a data center on the island of Hawaii.
|
|
These servers have the same failure probabilities as the servers in Nebraska,
|
|
except that the data center is near the edge of the crater on Mount Kilauea
|
|
(nobody said examples had to be realistic).
|
|
There is a 0.04 chance that the volcano will erupt and bury the data center
|
|
in molten lava, destroying it entirely.
|
|
\end_layout
|
|
|
|
\begin_layout Itemize
|
|
Four PCs located in random homes, connected to the Internet via assorted
|
|
cable modems and DSL.
|
|
Their network connections fail with probability 0.009.
|
|
Their disks fail with probability 0.001.
|
|
Their users destroy data with probability 0.05.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
If one share is placed on each of these 12 computers, what's the probability
|
|
mass function of share survival? To more compactly describe PMFs, we'll
|
|
denote them as probability vectors of the form
|
|
\begin_inset Formula $\left[\alpha_{0},\alpha_{1},\alpha_{2},\ldots,\alpha_{n}\right]$
|
|
\end_inset
|
|
|
|
where
|
|
\begin_inset Formula $\alpha_{i}$
|
|
\end_inset
|
|
|
|
is the probability that exactly
|
|
\begin_inset Formula $i$
|
|
\end_inset
|
|
|
|
shares survive.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
The servers in the two data centers have individual failure probabilities
|
|
of RAID failure (.0002) and administrative error (.003) giving an individual
|
|
survival probability of
|
|
\begin_inset Formula \[
|
|
(1-.0002)\cdot(1-.003)=.9998\cdot.997=.9968\]
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Using
|
|
\begin_inset Formula $p=.9968,n=4$
|
|
\end_inset
|
|
|
|
in equation
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
reference "eq:binomial-pmf"
|
|
|
|
\end_inset
|
|
|
|
gives the survival PMF
|
|
\begin_inset Formula \[
|
|
\left[1.049\times10^{-10},1.307\times10^{-7},6.105\times10^{-5},0.01268,0.9873\right]\]
|
|
|
|
\end_inset
|
|
|
|
which applies to each group of four servers.
|
|
However, each data center also has a .0001 chance of data connection loss,
|
|
which affects all four servers at once, and Hawaii has the additional .04
|
|
probability of severe lava burn.
|
|
If the network fails at a location, all the machines go offline together.
|
|
The probability that 0 machines survive is the probability that they all
|
|
fail for individual reasons (
|
|
\begin_inset Formula $1.049\cdot10^{-10}$
|
|
\end_inset
|
|
|
|
) plus the probability they all fail because of a network outage (
|
|
\begin_inset Formula $.0001$
|
|
\end_inset
|
|
|
|
) less the probability they fail for both reasons:
|
|
\begin_inset Formula \[
|
|
\left(1.049\times10^{-10}\right)+\left(0.0001\right)-\left[\left(1.049\times10^{-10}\right)\cdot\left(0.0001\right)\right]\approxeq0.0001\]
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
That's the
|
|
\begin_inset Formula $i=0$
|
|
\end_inset
|
|
|
|
element of the combined PMF.
|
|
The combined probability of survival of
|
|
\begin_inset Formula $0<i\leq4$
|
|
\end_inset
|
|
|
|
servers is simpler: it's the probability they survive individual failure,
|
|
from the individual failure PMF above, times the probability they survive
|
|
network failure (.9999).
|
|
 So the combined survival PMF of the Nebraska servers, which we'll denote 
|
|
\begin_inset Formula $n(i)$
|
|
\end_inset
|
|
|
|
, is
|
|
\begin_inset Formula \[
|
|
n(i)=\left[0.0001,1.306\times10^{-7},6.104\times10^{-5},0.01268,0.9872\right]\]
|
|
|
|
\end_inset
|
|
|
|
 which has the interesting property that complete failure is nearly 1000 times more
|
|
likely than survival of one server.
|
|
This is because the probability of a network outage is so much greater
|
|
than simultaneous
|
|
\begin_inset Foot
|
|
status collapsed
|
|
|
|
\begin_layout Plain Layout
|
|
Of course, the failures need not be truly simultaneous; they just have to happen
|
|
in the same interval between repair runs.
|
|
\end_layout
|
|
|
|
\end_inset
|
|
|
|
independent failure of three servers.
|
|
\end_layout
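The two steps used for the Nebraska group (a binomial PMF for independent server failures, then folding in the all-or-nothing network outage) can be sketched in a few lines of Python. This is an illustrative sketch, not code from Tahoe; binomial_pmf and with_group_failure are hypothetical helper names, and it assumes, as the text does, that the group event is independent of the individual server failures.

```python
from math import comb


def binomial_pmf(n, p):
    """PMF [a_0, ..., a_n] of the number of survivors among n independent
    servers that each survive with probability p."""
    return [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]


def with_group_failure(pmf, group_survival):
    """Fold in an independent event that takes the whole group offline.

    Any i > 0 servers survive only if the group event did not happen, so
    those entries are scaled by group_survival; all remaining probability
    mass lands on i = 0."""
    combined = [a * group_survival for a in pmf]
    combined[0] = 1.0 - sum(combined[1:])
    return combined


server_survival = (1 - 0.0002) * (1 - 0.003)        # RAID failure, admin error
individual = binomial_pmf(4, server_survival)
n_pmf = with_group_failure(individual, 1 - 0.0001)  # Nebraska network outage
print([f"{x:.4g}" for x in n_pmf])
```

The i = 0 entry is computed as one minus the mass of the other entries, which is algebraically the same inclusion-exclusion sum given above.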
|
|
|
|
\begin_layout Standard
|
|
Applying the same process to the Hawaii servers, but with group survival
|
|
probability of
|
|
\begin_inset Formula $(1-.0001)(1-.04)=.9799$
|
|
\end_inset
|
|
|
|
gives the survival PMF
|
|
\begin_inset Formula \[
|
|
h(i)=\left[0.0201,1.280\times10^{-7},5.982\times10^{-5},0.01242,0.9674\right]\]
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Applying the convolution operator to
|
|
\begin_inset Formula $n(i)$
|
|
\end_inset
|
|
|
|
and
|
|
\begin_inset Formula $h(i)$
|
|
\end_inset
|
|
|
|
, the survival PMF of all eight servers is:
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
\begin_inset Formula \[
|
|
\left(n\star h\right)\left(i\right)=\begin{cases}
|
|
2.010\times10^{-6} & i=0\\
|
|
2.639\times10^{-9} & i=1\\
|
|
1.233\times10^{-6} & i=2\\
|
|
2.560\times10^{-4} & i=3\\
|
|
0.01994 & i=4\\
|
|
1.769\times10^{-6} & i=5\\
|
|
2.756\times10^{-4} & i=6\\
|
|
0.02452 & i=7\\
|
|
0.9550 & i=8\end{cases}\]
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
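The convolution operator used to combine the two groups can be sketched directly from its definition, assuming the groups fail independently. The name convolve is a hypothetical helper, and the inputs are the rounded n(i) and h(i) vectors from the text, so trailing digits may differ slightly from the displayed values.

```python
def convolve(f, g):
    """PMF of the total number of surviving shares in two independent
    groups, given each group's survival PMF: out[m] = sum f[i] * g[m - i]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


# Rounded PMFs n(i) and h(i) as given in the text.
n_pmf = [0.0001, 1.306e-7, 6.104e-5, 0.01268, 0.9872]
h_pmf = [0.0201, 1.280e-7, 5.982e-5, 0.01242, 0.9674]
nh_pmf = convolve(n_pmf, h_pmf)
for i, pr in enumerate(nh_pmf):
    print(i, f"{pr:.4g}")
```

For array inputs, numpy.convolve computes the same full-length discrete convolution.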
|
|
|
|
\begin_layout Standard
|
|
\begin_inset VSpace defskip
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Note that losing four shares (
|
|
\begin_inset Formula $i=4$
|
|
\end_inset
|
|
|
|
) is more than 10,000 times more likely than losing three (
|
|
\begin_inset Formula $i=5$
|
|
\end_inset
|
|
|
|
).
|
|
This is because both data centers have a whole-center failure mode, and
|
|
the Hawaii center's lava burn probability is so high.
|
|
 Similarly, the probability of losing all of them is nearly 1000 times higher than
|
|
the probability of losing all but one.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
For the home PCs, their individual probability of survival is
|
|
\begin_inset Formula \[
|
|
(1-.009)\cdot(1-.001)\cdot(1-.05)=.991\cdot.999\cdot.95=.9405\]
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
We can then apply equation
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
reference "eq:binomial-pmf"
|
|
|
|
\end_inset
|
|
|
|
with
|
|
\begin_inset Formula $N=4$
|
|
\end_inset
|
|
|
|
and
|
|
\begin_inset Formula $p=.9405$
|
|
\end_inset
|
|
|
|
to compute the PMF
|
|
\begin_inset Formula $g(i),0\leq i\leq4$
|
|
\end_inset
|
|
|
|
for the PCs and finally compute
|
|
\begin_inset Formula $f(i)=\left(g\star\left(n\star h\right)\right)\left(i\right)$
|
|
\end_inset
|
|
|
|
, the PMF of the whole share set.
|
|
Summing the values of
|
|
\begin_inset Formula $f(i)$
|
|
\end_inset
|
|
|
|
for
|
|
\begin_inset Formula $0\leq i\leq k-1$
|
|
\end_inset
|
|
|
|
 gives the probability that fewer than
|
|
\begin_inset Formula $k$
|
|
\end_inset
|
|
|
|
shares survive and the file is unrecoverable.
|
|
For this example, those sums are shown in table
|
|
\begin_inset CommandInset ref
|
|
LatexCommand vref
|
|
reference "tab:Example-PMF"
|
|
|
|
\end_inset
|
|
|
|
.
|
|
\begin_inset Float table
|
|
wide false
|
|
sideways false
|
|
status collapsed
|
|
|
|
\begin_layout Plain Layout
|
|
\align center
|
|
\begin_inset Tabular
|
|
<lyxtabular version="3" rows="13" columns="4">
|
|
<features>
|
|
<column alignment="center" valignment="top" width="0">
|
|
<column alignment="center" valignment="top" width="0">
|
|
<column alignment="center" valignment="top" width="0">
|
|
<column alignment="center" valignment="top" width="0">
|
|
<row>
|
|
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $k$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $Pr[K=k]$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $Pr[file\, loss]=Pr[K<k]$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $N/k$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
</row>
|
|
<row>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
1
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $1.60\times10^{-9}$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $2.53\times10^{-11}$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
12
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
</row>
|
|
<row>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
2
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $3.80\times10^{-8}$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $1.63\times10^{-9}$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
6
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
</row>
|
|
<row>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
3
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $4.04\times10^{-7}$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $3.96\times10^{-8}$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
4
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
</row>
|
|
<row>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
4
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $2.06\times10^{-6}$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $4.44\times10^{-7}$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
3
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
</row>
|
|
<row>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
5
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $2.10\times10^{-5}$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $2.50\times10^{-6}$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
2.4
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
</row>
|
|
<row>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
6
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $0.000428$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $2.35\times10^{-5}$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
2
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
</row>
|
|
<row>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
7
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $0.00417$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $0.000452$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
1.7
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
</row>
|
|
<row>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
8
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $0.0157$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $0.00462$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
1.5
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
</row>
|
|
<row>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
9
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $0.00127$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $0.0203$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
1.3
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
</row>
|
|
<row>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
10
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $0.0230$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $0.0216$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
1.2
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
</row>
|
|
<row>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
11
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $0.208$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $0.0446$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
1.1
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
</row>
|
|
<row>
|
|
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
12
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $0.747$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Formula $0.253$
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
|
|
\begin_inset Text
|
|
|
|
\begin_layout Plain Layout
|
|
1
|
|
\end_layout
|
|
|
|
\end_inset
|
|
</cell>
|
|
</row>
|
|
</lyxtabular>
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Plain Layout
|
|
\begin_inset Caption
|
|
|
|
\begin_layout Plain Layout
|
|
\align left
|
|
\begin_inset CommandInset label
|
|
LatexCommand label
|
|
name "tab:Example-PMF"
|
|
|
|
\end_inset
|
|
|
|
Example share survival PMF and file loss probabilities
|
|
\end_layout
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Plain Layout
|
|
|
|
\end_layout
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
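The full pipeline behind the example table (per-group PMFs, two convolutions, then cumulative sums for the file loss column) can be reproduced with a short script. This is a sketch under stated assumptions: the helper names are hypothetical, the Hawaii group survival probability is hard-coded to the .9799 that the text uses for h(i), and all failures that are not grouped together are treated as independent.

```python
from itertools import accumulate
from math import comb


def binomial_pmf(n, p):
    """Survivor-count PMF for n independent servers with survival p."""
    return [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]


def with_group_failure(pmf, group_survival):
    """Scale i > 0 by the group survival; the rest of the mass goes to i = 0."""
    combined = [a * group_survival for a in pmf]
    combined[0] = 1.0 - sum(combined[1:])
    return combined


def convolve(f, g):
    """PMF of the sum of two independent survivor counts."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


server_p = (1 - 0.0002) * (1 - 0.003)                              # data-center server
n_pmf = with_group_failure(binomial_pmf(4, server_p), 1 - 0.0001)  # Nebraska
h_pmf = with_group_failure(binomial_pmf(4, server_p), 0.9799)      # Hawaii, value used in the text
g_pmf = binomial_pmf(4, (1 - 0.009) * (1 - 0.001) * (1 - 0.05))    # home PCs
f_pmf = convolve(g_pmf, convolve(n_pmf, h_pmf))                    # all 12 shares

N = len(f_pmf) - 1
loss = list(accumulate(f_pmf))[:-1]    # loss[k - 1] = Pr[K < k]
for k in range(1, N + 1):
    print(f"{k:2d}  {f_pmf[k]:.3g}  {loss[k - 1]:.3g}  {N / k:.1f}")
```

Because the table's cumulative column was built from rounded values, the script's last digits can differ slightly from it.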
|
|
|
|
\begin_layout Standard
|
|
The table demonstrates the importance of the selection of
|
|
\begin_inset Formula $k$
|
|
\end_inset
|
|
|
|
, and the tradeoff against file size expansion.
|
|
Note that the survival of exactly 9 servers is significantly less likely
|
|
than the survival of 8 or 10 servers.
|
|
This is, again, an artifact of the group failure modes.
|
|
Because of this, there is no reason to choose
|
|
\begin_inset Formula $k=9$
|
|
\end_inset
|
|
|
|
over
|
|
\begin_inset Formula $k=10$
|
|
\end_inset
|
|
|
|
.
|
|
 Normally, reducing the number of shares needed for reassembly improves the
|
|
file's chances of survival, but in this case it provides a minuscule gain
|
|
 in reliability at the cost of a roughly 11% increase in bandwidth and storage consumed.
|
|
\end_layout
|
|
|
|
\begin_layout Subsection
|
|
Share Duplication
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Before moving on to consider issues other than single-interval file loss,
|
|
let's analyze one more possibility, that of
|
|
\begin_inset Quotes eld
|
|
\end_inset
|
|
|
|
cheap
|
|
\begin_inset Quotes erd
|
|
\end_inset
|
|
|
|
file repair via share duplication.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Initially, files are split using erasure coding, which creates
|
|
\begin_inset Formula $N$
|
|
\end_inset
|
|
|
|
unique shares, any
|
|
\begin_inset Formula $k$
|
|
\end_inset
|
|
|
|
 of which can be used to reconstruct the file.
|
|
When shares are lost, proper repair downloads some
|
|
\begin_inset Formula $k$
|
|
\end_inset
|
|
|
|
 shares, reconstructs the original file, uses the erasure coding
|
|
 algorithm to regenerate the lost shares, and redeploys them to peers
|
|
in the network.
|
|
This is a somewhat expensive process.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
A cheaper repair option is simply to direct some peer that has share
|
|
\begin_inset Formula $s_{i}$
|
|
\end_inset
|
|
|
|
to send a copy to another peer, thus increasing by one the number of shares
|
|
in the network.
|
|
This is not as good as actually replacing the lost share, though.
|
|
Suppose that more shares were lost, leaving only
|
|
\begin_inset Formula $k$
|
|
\end_inset
|
|
|
|
shares remaining.
|
|
If two of those shares are identical, because one was duplicated in this
|
|
fashion, then only
|
|
\begin_inset Formula $k-1$
|
|
\end_inset
|
|
|
|
shares truly remain, and the file can no longer be reconstructed.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
However, such cheap repair is not completely pointless; it does increase
|
|
file survivability.
|
|
But by how much?
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Effectively, share duplication simply increases the probability that
|
|
\begin_inset Formula $s_{i}$
|
|
\end_inset
|
|
|
|
will survive, by providing two locations from which to retrieve it.
|
|
We can view the two copies of the single share as one, but with a higher
|
|
probability of survival than would be provided by either of the two peers.
|
|
In particular, if
|
|
\begin_inset Formula $p_{1}$
|
|
\end_inset
|
|
|
|
and
|
|
\begin_inset Formula $p_{2}$
|
|
\end_inset
|
|
|
|
 are the respective probabilities that the two peers survive, and the peers fail independently, then
|
|
\begin_inset Formula \[
|
|
Pr[s_{i}\, survives]=p_{1}+p_{2}-p_{1}p_{2}\]
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
More generally, if a single share is deployed on
|
|
\begin_inset Formula $n$
|
|
\end_inset
|
|
|
|
peers, each with a PMF
|
|
\begin_inset Formula $f_{i}(j),0\leq j\leq1,1\leq i\leq n$
|
|
\end_inset
|
|
|
|
, the share survival count is a random variable
|
|
\begin_inset Formula $K$
|
|
\end_inset
|
|
|
|
and the probability of share loss is
|
|
\begin_inset Formula \[
|
|
Pr[K=0]=(f_{1}\star f_{2}\star\ldots\star f_{n})(0)\]
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
From that, we can construct the PMF of the duplicated share in the obvious way, 
|
|
\begin_inset Formula $\left[Pr[K=0],1-Pr[K=0]\right]$
|
|
\end_inset
|
|
|
|
, which can then be convolved with the other share PMFs to produce the share set PMF.
|
|
\end_layout
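The duplicated-share PMF can be sketched as a convolution of one Bernoulli PMF per copy. The helper names are hypothetical and the copies are assumed to fail independently; for two peers the survival entry reduces to the p1 + p2 - p1 p2 formula above.

```python
from functools import reduce


def convolve(f, g):
    """PMF of the sum of two independent counts."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def duplicated_share_pmf(peer_survival):
    """Survival PMF [Pr[lost], Pr[survives]] of one share held by several
    peers. Each copy is an independent Bernoulli PMF [1 - p_i, p_i]; the
    share is lost only when the convolved copy count is 0."""
    copies = reduce(convolve, [[1 - p, p] for p in peer_survival])
    lost = copies[0]  # Pr[K = 0]: every copy is gone
    return [lost, 1 - lost]


print(duplicated_share_pmf([0.8, 0.7]))  # second entry equals p1 + p2 - p1*p2
```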
|
|
|
|
\begin_layout Example
|
|
Suppose a file has
|
|
\begin_inset Formula $N=10,k=3$
|
|
\end_inset
|
|
|
|
and that all servers have survival probability
|
|
\begin_inset Formula $p=.9$
|
|
\end_inset
|
|
|
|
.
|
|
Given a full complement of shares,
|
|
\begin_inset Formula $Pr[\textrm{file\, loss}]=3.74\times10^{-7}$
|
|
\end_inset
|
|
|
|
.
|
|
Suppose that four shares are lost, which increases
|
|
\begin_inset Formula $Pr[\textrm{file\, loss}]$
|
|
\end_inset
|
|
|
|
to
|
|
\begin_inset Formula $.00127$
|
|
\end_inset
|
|
|
|
, a value
|
|
\begin_inset Formula $3400$
|
|
\end_inset
|
|
|
|
times greater.
|
|
Rather than doing a proper reconstruction, we could direct four peers still
|
|
 holding shares to send a copy of their share to a new peer, which changes
|
|
 the composition of the share set from six unique 
|
|
\begin_inset Quotes eld
|
|
\end_inset
|
|
|
|
standard
|
|
\begin_inset Quotes erd
|
|
\end_inset
|
|
|
|
 shares to two standard shares, each with survival probability
|
|
\begin_inset Formula $.9$
|
|
\end_inset
|
|
|
|
and four
|
|
\begin_inset Quotes eld
|
|
\end_inset
|
|
|
|
doubled
|
|
\begin_inset Quotes erd
|
|
\end_inset
|
|
|
|
shares, each with survival probability
|
|
\begin_inset Formula $2p-p^{2}\approxeq.99$
|
|
\end_inset
|
|
|
|
.
|
|
\end_layout
|
|
|
|
\begin_layout Example
|
|
Combining the two single-peer share PMFs with the four double-share PMFs
|
|
 gives a new file loss probability of
|
|
\begin_inset Formula $6.64\times10^{-6}$
|
|
\end_inset
|
|
|
|
.
|
|
Not as good as a full repair, but still quite respectable.
|
|
Also, if storage were not a concern, all six shares could be duplicated,
|
|
for a
|
|
\begin_inset Formula $Pr[file\, loss]=1.48\times10^{-7}$
|
|
\end_inset
|
|
|
|
, which is actually about 2.5 times better than the nominal case.
|
|
\end_layout
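The numbers in this example can be checked with a small script that convolves one Bernoulli PMF per distinct share and sums the entries below k. It is a sketch assuming independent peers; loss_probability is a hypothetical helper name.

```python
from functools import reduce


def convolve(f, g):
    """PMF of the sum of two independent counts."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def loss_probability(share_survival, k):
    """Pr[fewer than k distinct shares survive]; each entry of share_survival
    is the survival probability of one distinct share."""
    pmf = reduce(convolve, [[1 - p, p] for p in share_survival])
    return sum(pmf[:k])


p = 0.9
doubled = 2 * p - p**2                                # p1 + p2 - p1*p2 with p1 = p2 = p
full = loss_probability([p] * 10, 3)                  # N = 10, nothing lost
degraded = loss_probability([p] * 6, 3)               # four shares lost
cheap = loss_probability([p] * 2 + [doubled] * 4, 3)  # four shares duplicated
all_doubled = loss_probability([doubled] * 6, 3)      # all six duplicated
for name, value in [("full", full), ("degraded", degraded),
                    ("cheap", cheap), ("all doubled", all_doubled)]:
    print(f"{name:12s} {value:.3g}")
```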
|
|
|
|
\begin_layout Example
|
|
The reason such cheap repairs may be attractive in many cases is that distributed
|
|
 bandwidth is cheaper than bandwidth through a single peer.
|
|
This is particularly true if that single peer has a very slow connection,
|
|
which is common for home computers -- especially in the outbound direction.
|
|
\end_layout
|
|
|
|
\begin_layout Section
|
|
Long-Term Reliability
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Thus far, we've focused entirely on the probability that a file survives
|
|
the interval
|
|
\begin_inset Formula $A$
|
|
\end_inset
|
|
|
|
between repair times.
|
|
The probability that a file survives long-term, though, is also important.
|
|
As long as the probability of failure during a repair period is non-zero,
|
|
a given file will eventually be lost.
|
|
We want to know the probability of surviving for time
|
|
\begin_inset Formula $T$
|
|
\end_inset
|
|
|
|
, and how the parameters
|
|
\begin_inset Formula $A$
|
|
\end_inset
|
|
|
|
(time between repairs) and
|
|
\begin_inset Formula $L$
|
|
\end_inset
|
|
|
|
(allowed share low watermark) affect survival time.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
To model file survival time, let
|
|
\begin_inset Formula $T$
|
|
\end_inset
|
|
|
|
 be a random variable denoting the time at which a given file becomes
|
|
 unrecoverable, and
|
|
\begin_inset Formula $R(t)=Pr[T>t]$
|
|
\end_inset
|
|
|
|
be a function that gives the probability that the file survives to time
|
|
|
|
\begin_inset Formula $t$
|
|
\end_inset
|
|
|
|
.
|
|
|
|
\begin_inset Formula $R(t)$
|
|
\end_inset
|
|
|
|
 is the survival function (one minus the cumulative distribution function) of
|
|
\begin_inset Formula $T$
|
|
\end_inset
|
|
|
|
.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Most survival functions are continuous, but
|
|
\begin_inset Formula $R(t)$
|
|
\end_inset
|
|
|
|
 is inherently discrete.
|
|
The time steps are the repair intervals, each of length
|
|
\begin_inset Formula $A$
|
|
\end_inset
|
|
|
|
, so
|
|
\begin_inset Formula $T$
|
|
\end_inset
|
|
|
|
-values are multiples of
|
|
\begin_inset Formula $A$
|
|
\end_inset
|
|
|
|
.
|
|
During each interval, the file's shares degrade according to the probability
|
|
mass function of
|
|
\begin_inset Formula $K$
|
|
\end_inset
|
|
|
|
.
|
|
\end_layout
|
|
|
|
\begin_layout Subsection
|
|
Aggressive Repair
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Let's first consider the case of an aggressive repairer.
|
|
Every interval, this repairer checks the file for share losses and restores
|
|
them.
|
|
Thus, at the beginning of each interval, the file always has
|
|
\begin_inset Formula $N$
|
|
\end_inset
|
|
|
|
shares, distributed on servers with various individual and group failure
|
|
probabilities, which will survive or fail per the output of random variable
|
|
|
|
\begin_inset Formula $K$
|
|
\end_inset
|
|
|
|
.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
For any interval, then, the probability that the file will survive is
|
|
\begin_inset Formula $f\left(k\right)=Pr[K\geq k]$
|
|
\end_inset
|
|
|
|
.
|
|
 Since the outcomes of successive intervals are independent, and assuming the
|
|
share reliabilities remain constant over time,
|
|
\begin_inset Formula \begin{equation}
|
|
R\left(t\right)=f(k)^{t}\end{equation}
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
This simple survival function makes it easy to select parameters
|
|
\begin_inset Formula $N$
|
|
\end_inset
|
|
|
|
and
|
|
\begin_inset Formula $k$
|
|
\end_inset
|
|
|
|
such that
|
|
\begin_inset Formula $R(t)\geq r$
|
|
\end_inset
|
|
|
|
, where
|
|
\begin_inset Formula $r$
|
|
\end_inset
|
|
|
|
is a user-specified parameter indicating the desired probability of survival
|
|
to time
|
|
\begin_inset Formula $t$
|
|
\end_inset
|
|
|
|
.
|
|
Specifically, we can solve for
|
|
\begin_inset Formula $f\left(k\right)$
|
|
\end_inset
|
|
|
|
in
|
|
\begin_inset Formula $r\leq f\left(k\right)^{t}$
|
|
\end_inset
|
|
|
|
, giving:
|
|
\begin_inset Formula \begin{equation}
|
|
f\left(k\right)\geq\sqrt[t]{r}\end{equation}
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
So, given the per-interval file survival probability
|
|
\begin_inset Formula $f\left(k\right)$
|
|
\end_inset
|
|
|
|
, to assure the survival of a file to time
|
|
\begin_inset Formula $t$
|
|
\end_inset
|
|
|
|
with probability at least
|
|
\begin_inset Formula $r$
|
|
\end_inset
|
|
|
|
, choose
|
|
\begin_inset Formula $k$
|
|
\end_inset
|
|
|
|
such that
|
|
\begin_inset Formula $f\left(k\right)\geq\sqrt[t]{r}$
|
|
\end_inset
|
|
|
|
.
|
|
For example, if
|
|
\begin_inset Formula $A$
|
|
\end_inset
|
|
|
|
is one month, and
|
|
\begin_inset Formula $r=1-\nicefrac{1}{10^{6}}$
|
|
\end_inset
|
|
|
|
and
|
|
\begin_inset Formula $t=120$
|
|
\end_inset
|
|
|
|
, or 10 years, we calculate
|
|
\begin_inset Formula $f\left(k\right)\geq\sqrt[120]{.999999}\approx0.999999992$
|
|
\end_inset
|
|
|
|
.
|
|
Per the PMF of table
|
|
\begin_inset CommandInset ref
|
|
LatexCommand ref
|
|
reference "tab:Example-PMF"
|
|
|
|
\end_inset
|
|
|
|
, this means
|
|
\begin_inset Formula $k=2$
|
|
\end_inset
|
|
|
|
 achieves the goal at the cost of a six-fold expansion in stored file
|
|
size.
|
|
If the lesser goal of no more than
|
|
\begin_inset Formula $\nicefrac{1}{1000}$
|
|
\end_inset
|
|
|
|
probability of loss is taken, then since
|
|
\begin_inset Formula $\sqrt[120]{.999}\approx.999992$
|
|
\end_inset
|
|
|
|
,
|
|
\begin_inset Formula $k=5$
|
|
\end_inset
|
|
|
|
achieves the goal with an expansion factor of
|
|
\begin_inset Formula $2.4$
|
|
\end_inset
|
|
|
|
.
|
|
\end_layout
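Choosing k from the survival requirement is a short search over the cumulative loss probabilities. The sketch below reads the Pr[K=k] column of the example table (taking Pr[K=0] from its Pr[K<1] entry) and returns the largest k whose per-interval survival f(k) is at least the t-th root of r, since a larger k means less expansion; largest_k is a hypothetical name.

```python
from itertools import accumulate

# Pr[K = k] for k = 0..12, from the example table
# (Pr[K = 0] is the table's Pr[K < 1] entry).
share_pmf = [2.53e-11, 1.60e-9, 3.80e-8, 4.04e-7, 2.06e-6, 2.10e-5,
             0.000428, 0.00417, 0.0157, 0.00127, 0.0230, 0.208, 0.747]


def largest_k(pmf, r, t):
    """Largest k with f(k) = 1 - Pr[K < k] >= r**(1/t), or None."""
    threshold = r ** (1.0 / t)
    loss = [0.0] + list(accumulate(pmf))  # loss[k] = Pr[K < k]
    ok = [k for k in range(1, len(pmf)) if 1 - loss[k] >= threshold]
    return max(ok) if ok else None


N = len(share_pmf) - 1
for r in (1 - 1e-6, 1 - 1e-3):
    k = largest_k(share_pmf, r, t=120)  # 120 monthly intervals = 10 years
    print(f"r={r}: k={k}, expansion N/k={N / k:.1f}")
```

With t = 120 this selects k = 2 for r = 1 - 10^-6 and k = 5 for r = .999, matching the choices above.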
|
|
|
|
\begin_layout Subsection
|
|
Repair Cost
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
The simplicity and predictability of aggressive repair is attractive, but
|
|
there is a downside: Repairs cost processing power and bandwidth.
|
|
The processing power is proportional to the size of the file, since the
|
|
whole file must be reconstructed and then re-processed using the Reed-Solomon
|
|
algorithm, while the bandwidth cost is proportional to the number of missing
|
|
shares that must be replaced,
|
|
\begin_inset Formula $N-K$
|
|
\end_inset
|
|
|
|
.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Let
|
|
\begin_inset Formula $c\left(s,d,k\right)$
|
|
\end_inset
|
|
|
|
be a cost function that combines the processing cost of regenerating a
|
|
file of size
|
|
\begin_inset Formula $s$
|
|
\end_inset
|
|
|
|
and the bandwidth cost of downloading a file of size
|
|
\begin_inset Formula $s$
|
|
\end_inset
|
|
|
|
and uploading
|
|
\begin_inset Formula $d$
|
|
\end_inset
|
|
|
|
shares each of size
|
|
\begin_inset Formula $\nicefrac{s}{k}$
|
|
\end_inset
|
|
|
|
.
|
|
Also, let
|
|
\begin_inset Formula $D$
|
|
\end_inset
|
|
|
|
denote the random variable
|
|
\begin_inset Formula $N-K$
|
|
\end_inset
|
|
|
|
, which is the number of shares that must be redistributed to bring the
|
|
file share set back up to
|
|
\begin_inset Formula $N$
|
|
\end_inset
|
|
|
|
after degrading during an interval.
|
|
The probability mass function of
|
|
\begin_inset Formula $D$
|
|
\end_inset
|
|
|
|
is
|
|
\begin_inset Formula \[
|
|
Pr[D=d]=f(d)=\begin{cases}
|
|
Pr\left[K=N\right]+Pr[K<k] & d=0\\
|
|
Pr\left[K=N-d\right] & 0<d\leq N-k\\
|
|
0 & N-k<d\leq N\end{cases}\]
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Assuming the cost function is linear in the number of uploaded shares, the expected cost of repairs in a given interval is simply
|
|
\begin_inset Formula $c\left(s,E\left[D\right],k\right)$
|
|
\end_inset
|
|
|
|
where E is the expected value function -- in this case:
|
|
\begin_inset Formula \begin{align*}
|
|
E[D] & =\sum_{d=0}^{N}d\cdot Pr\left[D=d\right]\\
|
|
& =0\cdot Pr\left[D=0\right]+\sum_{d=1}^{N-k}\left\{ d\cdot Pr\left[K=N-d\right]\right\} +\sum_{d=N-k+1}^{N}\left\{ d\cdot0\right\} \\
|
|
& =\sum_{d=1}^{N-k}d\cdot Pr\left[K=N-d\right]\end{align*}
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
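The PMF of D and its expectation translate directly into code. In this sketch (hypothetical helper names), D is built from a share survival PMF of K exactly as in the case analysis above, and the parameters N = 10, k = 3, p = .9 are borrowed from the share duplication example purely for illustration.

```python
from math import comb


def binomial_pmf(n, p):
    """Survivor-count PMF for n independent shares with survival p."""
    return [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]


def repair_pmf(k_pmf, k):
    """PMF of D, the number of shares redistributed after one interval,
    from the share survival PMF of K (length N + 1) and threshold k."""
    N = len(k_pmf) - 1
    d_pmf = [0.0] * (N + 1)
    # No upload if every share survived or the file is already lost.
    d_pmf[0] = k_pmf[N] + sum(k_pmf[:k])
    for d in range(1, N - k + 1):
        d_pmf[d] = k_pmf[N - d]
    return d_pmf


def expected_repairs(d_pmf):
    """E[D] = sum over d of d * Pr[D = d]."""
    return sum(d * pr for d, pr in enumerate(d_pmf))


d_pmf = repair_pmf(binomial_pmf(10, 0.9), k=3)
print(sum(d_pmf), expected_repairs(d_pmf))
```

Here E[D] falls just below N(1-p) = 1, because the rare outcomes with K < k contribute nothing: the file is already lost and no repair is attempted.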
|
|
|
|
\begin_layout Standard
|
|
Since each interval starts with a full complement of shares, the expected
|
|
 repair cost for each interval is the same, and the cost for a file that survives
|
|
for
|
|
\begin_inset Formula $t$
|
|
\end_inset
|
|
|
|
intervals is
|
|
\begin_inset Formula $t\cdot c\left(s,E\left[D\right],k\right)$
|
|
\end_inset
|
|
|
|
.
|
|
To calculate the lifetime repair cost, we sum the per-interval costs over all
|
|
intervals as
|
|
\begin_inset Formula $t\rightarrow\infty$
|
|
\end_inset
|
|
|
|
, weighting each interval's cost by the probability that the file is still recoverable at the start of that interval.
|
|
The expected lifetime repair cost is therefore
|
|
\begin_inset Formula \begin{align*}
|
|
\sum_{t=1}^{\infty}R\left(t-1\right)c\left(s,E\left[D\right],k\right) & =c\left(s,E\left[D\right],k\right)\sum_{t=1}^{\infty}R\left(t-1\right)\\
|
|
& =c\left(s,E\left[D\right],k\right)\sum_{t=1}^{\infty}f\left(k\right)^{t-1}\\
|
|
& =c\left(s,E\left[D\right],k\right)\cdot\frac{1}{1-f\left(k\right)}\\
|
|
& =\frac{c\left(s,E\left[D\right],k\right)}{1-f\left(k\right)}\end{align*}
|
|
|
|
\end_inset
|
|
|
|
|
|
\end_layout
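
\begin_layout Standard
The closed form rests on the geometric series whose terms are f(k)^(t-1), which sums to 1/(1-f(k)) and converges because f(k)<1. Reading f(k) here as the probability that the file survives a single interval, the Python sketch below compares the closed form with a truncated version of the sum. The per-interval cost and survival probability are hypothetical numbers chosen for illustration.
\end_layout

```python
def lifetime_cost_closed_form(c_interval, p_survive):
    # c(s, E[D], k) / (1 - f(k)); requires 0 <= p_survive < 1.
    return c_interval / (1.0 - p_survive)


def lifetime_cost_truncated(c_interval, p_survive, intervals):
    # sum_{t=1}^{T} R(t-1) * c(s, E[D], k), with R(t-1) = p_survive**(t-1):
    # interval t is paid for only if the file survived the previous t-1 intervals.
    return sum(p_survive ** (t - 1) * c_interval for t in range(1, intervals + 1))


c_interval, p_survive = 2.5, 0.99  # hypothetical per-interval cost and survival
closed = lifetime_cost_closed_form(c_interval, p_survive)
approx = lifetime_cost_truncated(c_interval, p_survive, 5000)
print(abs(closed - approx) < 1e-8)  # True
```

\begin_layout Standard
With f(k)=0.99 the expected number of intervals over which repair costs are paid, 1/(1-f(k)), is 100, so the lifetime cost is about 100 times the per-interval cost.
\end_layout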
|
|
|
|
\begin_layout Standard
|
|
Future costs should also be discounted, since CPU time and bandwidth are
|
|
both expected to get cheaper over time.
|
|
To accommodate this, we introduce an additional per-period discount rate
|
|
\begin_inset Formula $r$
|
|
\end_inset
|
|
|
|
.
|
|
As is conventional for a per-period rate of discount, the discount multiplier
|
|
at time
|
|
\begin_inset Formula $t$
|
|
\end_inset
|
|
|
|
is
|
|
\begin_inset Formula $\left(1-r\right)^{t}$
|
|
\end_inset
|
|
|
|
.
|
|
This gives:
|
|
\begin_inset Formula \begin{align*}
|
|
\sum_{t=1}^{\infty}\left(1-r\right)^{t}R\left(t-1\right)c\left(s,E\left[D\right],k\right) & =c\left(s,E\left[D\right],k\right)\sum_{t=1}^{\infty}\left(1-r\right)^{t}f\left(k\right)^{t-1}\\
|
|
& =c\left(s,E\left[D\right],k\right)\left(1-r\right)\sum_{t=1}^{\infty}\left(1-r\right)^{t-1}f\left(k\right)^{t-1}\\
|
|
& =\frac{c\left(s,E\left[D\right],k\right)\left(1-r\right)}{1-\left(1-r\right)f\left(k\right)}\end{align*}
|
|
|
|
\end_inset
|
|
|
|
If
|
|
\begin_inset Formula $r=0$
|
|
\end_inset
|
|
|
|
, this reduces to the undiscounted result, as one would expect.
|
|
\end_layout
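
\begin_layout Standard
The discounted sum is again a geometric series, now with ratio (1-r)f(k), so it converges whenever (1-r)f(k)<1. The Python sketch below uses the same hypothetical reading of f(k) as a per-interval survival probability and illustrative numbers. It evaluates the closed form, checks it against a truncated sum, and confirms that r=0 recovers the undiscounted result.
\end_layout

```python
def discounted_lifetime_cost(c_interval, p_survive, r):
    # c(s, E[D], k) * (1 - r) / (1 - (1 - r) * f(k)); needs (1 - r) * p_survive < 1.
    return c_interval * (1 - r) / (1 - (1 - r) * p_survive)


def discounted_truncated(c_interval, p_survive, r, intervals):
    # sum_{t=1}^{T} (1 - r)**t * R(t-1) * c(s, E[D], k), with R(t-1) = p_survive**(t-1).
    return sum((1 - r) ** t * p_survive ** (t - 1) * c_interval
               for t in range(1, intervals + 1))


c_interval, p_survive, r = 2.5, 0.99, 0.05  # hypothetical values
closed = discounted_lifetime_cost(c_interval, p_survive, r)
approx = discounted_truncated(c_interval, p_survive, r, 2000)
print(abs(closed - approx) < 1e-9)  # True
# With r = 0 the formula reduces to c / (1 - f(k)).
print(abs(discounted_lifetime_cost(c_interval, p_survive, 0.0)
          - c_interval / (1 - p_survive)) < 1e-12)  # True
```

\begin_layout Standard
In these illustrative units, a 5% per-period discount lowers the lifetime cost from about 250 to about 40.
\end_layout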
|
|
|
|
\begin_layout Subsection
|
|
Non-aggressive Repair
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Need to write this.
|
|
\end_layout
|
|
|
|
\begin_layout Section
|
|
Time-Sensitive Retrieval
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
The above work has almost entirely ignored the distinction between availability
|
|
and reliability.
|
|
\end_layout
|
|
|
|
\begin_layout Standard
|
|
Need to write this.
|
|
\end_layout
|
|
|
|
\end_body
|
|
\end_document
|