mirror of
https://github.com/tahoe-lafs/tahoe-lafs.git
synced 2025-02-08 04:10:25 +00:00
Lossmodel updates
Various improvements to the lossmodel, plus addition of README.lossmodel that provides a link to the PDF.
This commit is contained in:
parent
560b09d696
commit
3782c27ac5
28
docs/proposed/README.lossmodel
Normal file
28
docs/proposed/README.lossmodel
Normal file
@ -0,0 +1,28 @@
|
|||||||
|
The lossmodel.lyx file is the source document for an in-progress paper
|
||||||
|
that analyzes the probability of losing files stored in a Tahoe
|
||||||
|
Least-acces File System under various scenarios. It describes:
|
||||||
|
|
||||||
|
1. How to estimate peer reliabilities, based on peer MTBF failure
|
||||||
|
data.
|
||||||
|
|
||||||
|
2. How to compute file loss probabilities, based on a given set of
|
||||||
|
shares stored on peers with estimated reliabilities. The peer
|
||||||
|
reliabilities do not have to be uniform, and the model takes into
|
||||||
|
account the file repair process.
|
||||||
|
|
||||||
|
3. How to estimate Tahoe parameters for k (shares needed), n (shares
|
||||||
|
distributed) and A (repair interval) to achieve a file reliability
|
||||||
|
target.
|
||||||
|
|
||||||
|
4. How to compute the estimated repair cost over time, discounted at
|
||||||
|
a fixed rate, of maintaining a file for a time period T.
|
||||||
|
|
||||||
|
Future work will also address the latter three issues in the context
|
||||||
|
of "non-aggressive" repair, where repair will only be performed if
|
||||||
|
too many shares are lost, and it will also extend the repair cost
|
||||||
|
estimation model to suggest cost functions appropriate for common
|
||||||
|
network architectures.
|
||||||
|
|
||||||
|
A PDF of the current version of the file may be downloaded from:
|
||||||
|
|
||||||
|
http://willden.org/~shawn/lossmodel.pdf
|
@ -1,4 +1,4 @@
|
|||||||
#LyX 1.6.1 created this file. For more info see http://www.lyx.org/
|
#LyX 1.6.2 created this file. For more info see http://www.lyx.org/
|
||||||
\lyxformat 345
|
\lyxformat 345
|
||||||
\begin_document
|
\begin_document
|
||||||
\begin_header
|
\begin_header
|
||||||
@ -56,7 +56,7 @@ Shawn Willden
|
|||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Date
|
\begin_layout Date
|
||||||
01/14/2009
|
07/22/2009
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Address
|
\begin_layout Address
|
||||||
@ -81,8 +81,8 @@ The allmydata Tahoe distributed file system uses Reed-Solomon erasure coding
|
|||||||
\begin_inset Formula $N$
|
\begin_inset Formula $N$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
shares, each of which is then delivered to a randomly-selected peer in
|
shares which are delivered to randomly-selected peers in a distributed
|
||||||
a distributed network.
|
network.
|
||||||
The file can later be reassembled from any
|
The file can later be reassembled from any
|
||||||
\begin_inset Formula $k\leq N$
|
\begin_inset Formula $k\leq N$
|
||||||
\end_inset
|
\end_inset
|
||||||
@ -112,7 +112,7 @@ s the missing ones, bringing the availability back up to full.
|
|||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
The question we're trying to answer is "What's the probability that we'll
|
The question we're trying to answer is "What is the probability that we'll
|
||||||
be able to reassemble the file at some later time
|
be able to reassemble the file at some later time
|
||||||
\begin_inset Formula $T$
|
\begin_inset Formula $T$
|
||||||
\end_inset
|
\end_inset
|
||||||
@ -136,11 +136,11 @@ The question we're trying to answer is "What's the probability that we'll
|
|||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
in order to ensure
|
in order to ensure
|
||||||
\begin_inset Formula $Pr[loss]\leq t$
|
\begin_inset Formula $Pr[loss]\leq r$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
for some threshold probability
|
for some threshold probability
|
||||||
\begin_inset Formula $t$
|
\begin_inset Formula $r$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
.
|
.
|
||||||
@ -149,45 +149,33 @@ The question we're trying to answer is "What's the probability that we'll
|
|||||||
\begin_inset Formula $Pr[loss]$
|
\begin_inset Formula $Pr[loss]$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
by choosing small
|
by selecting conservative parameters, these choices have costs.
|
||||||
\begin_inset Formula $k,$
|
|
||||||
\end_inset
|
|
||||||
|
|
||||||
large
|
|
||||||
\begin_inset Formula $N$
|
|
||||||
\end_inset
|
|
||||||
|
|
||||||
, small
|
|
||||||
\begin_inset Formula $A$
|
|
||||||
\end_inset
|
|
||||||
|
|
||||||
, and setting
|
|
||||||
\begin_inset Formula $L=N$
|
|
||||||
\end_inset
|
|
||||||
|
|
||||||
, these choices have costs.
|
|
||||||
The peer storage and bandwidth consumed by the share distribution process
|
The peer storage and bandwidth consumed by the share distribution process
|
||||||
are approximately
|
are approximately
|
||||||
\begin_inset Formula $\nicefrac{N}{k}$
|
\begin_inset Formula $\nicefrac{N}{k}$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
times the size of the original file, so we would like to reduce this ratio
|
times the size of the original file, so we would like to minimize
|
||||||
as far as possible consistent with
|
\begin_inset Formula $\nicefrac{N}{k}$
|
||||||
\begin_inset Formula $Pr[loss]\leq t$
|
\end_inset
|
||||||
|
|
||||||
|
, consistent with
|
||||||
|
\begin_inset Formula $Pr[loss]\leq r$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
.
|
.
|
||||||
Likewise, frequent and aggressive repair process can be used to ensure
|
Likewise, a frequent and aggressive repair process keeps the number of
|
||||||
that the number of shares available at any time is very close to
|
shares available close to
|
||||||
\begin_inset Formula $N,$
|
\begin_inset Formula $N,$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
but at a cost in bandwidth as the repair agent downloads
|
but at a cost in bandwidth and processing time as the repair agent downloads
|
||||||
|
|
||||||
\begin_inset Formula $k$
|
\begin_inset Formula $k$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
shares to reconstruct the file and uploads new shares to replace those
|
shares, reconstructs the file and uploads new shares to replace those that
|
||||||
that are lost.
|
are lost.
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Section
|
\begin_layout Section
|
||||||
@ -198,13 +186,13 @@ Reliability
|
|||||||
The probability that the file becomes unrecoverable is dependent upon the
|
The probability that the file becomes unrecoverable is dependent upon the
|
||||||
probability that the peers to whom we send shares are able to return those
|
probability that the peers to whom we send shares are able to return those
|
||||||
copies on demand.
|
copies on demand.
|
||||||
Shares that are returned in corrupted form can be detected and discarded,
|
Shares that are corrupted are detected and discarded, so there is no need
|
||||||
so there is no need to distinguish between corruption and loss.
|
to distinguish between corruption and loss.
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
There are a large number of factors that affect share availability.
|
Many factors affect share availability.
|
||||||
Availability can be temporarily interrupted by peer unavailability, due
|
Availability can be temporarily interrupted by peer unavailability due
|
||||||
to network outages, power failures or administrative shutdown, among other
|
to network outages, power failures or administrative shutdown, among other
|
||||||
reasons.
|
reasons.
|
||||||
Availability can be permanently lost due to failure or corruption of storage
|
Availability can be permanently lost due to failure or corruption of storage
|
||||||
@ -238,12 +226,12 @@ Another consideration is that some failures affect multiple shares.
|
|||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
While the types of failures that may occur are pretty consistent across
|
While the types of failures that may occur are quite consistent across peers,
|
||||||
even very different peers, their probabilities differ dramatically.
|
their probabilities differ dramatically.
|
||||||
A professionally-administered blade server with redundant storage, power
|
A professionally-administered server with redundant storage, power and
|
||||||
and Internet located in a carefully-monitored data center with automatic
|
Internet located in a carefully-monitored data center with automatic fire
|
||||||
fire suppression systems is much less likely to become either temporarily
|
suppression systems is much less likely to become either temporarily or
|
||||||
or permanently unavailable than the typical virus and malware-ridden home
|
permanently unavailable than the typical virus and malware-ridden home
|
||||||
computer on a single cable modem connection.
|
computer on a single cable modem connection.
|
||||||
A variety of situations in between exist as well, such as the case of the
|
A variety of situations in between exist as well, such as the case of the
|
||||||
author's home file server, which is administered by an IT professional
|
author's home file server, which is administered by an IT professional
|
||||||
@ -268,7 +256,7 @@ Reliability
|
|||||||
\begin_inset Formula $s_{i}$
|
\begin_inset Formula $s_{i}$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
will surve to (be retrievable at) time
|
will survive to (be retrievable at) time
|
||||||
\begin_inset Formula $T=A$
|
\begin_inset Formula $T=A$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
@ -278,8 +266,12 @@ Reliability
|
|||||||
any lost shares.
|
any lost shares.
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Definition
|
\begin_layout Standard
|
||||||
Reliability is clearly dependent on
|
Reliability
|
||||||
|
\begin_inset Formula $p_{i}$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
is clearly dependent on
|
||||||
\begin_inset Formula $A$
|
\begin_inset Formula $A$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
@ -296,7 +288,181 @@ decay
|
|||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Subsection
|
\begin_layout Subsection
|
||||||
Fixed Reliability
|
Peer Reliability
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
|
Since peer reliability is the basis for any computations we may do on share
|
||||||
|
and file reliability, we must have a way to estimate it.
|
||||||
|
Reliability modeling of hardware, software and human performance are each
|
||||||
|
complex topics, the subject of much ongoing research.
|
||||||
|
In particular, the reliability of one of the key components of any peer
|
||||||
|
from our perspective -- the hard drive where file shares are stored --
|
||||||
|
is the subject of much current debate.
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
|
A common assumption about hardware failure is that it follows the
|
||||||
|
\begin_inset Quotes eld
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
bathtub curve
|
||||||
|
\begin_inset Quotes erd
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
, with frequent failures during the first few months, a constant failure
|
||||||
|
rate for a few years and then a rising failure rate as the hardware wears
|
||||||
|
out.
|
||||||
|
This curve is often flattened by burn-in stress testing, and by periodic
|
||||||
|
replacement that assures that in-service components never reach
|
||||||
|
\begin_inset Quotes eld
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
old age
|
||||||
|
\begin_inset Quotes erd
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
.
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
|
In any case, we're generally going to ignore all of that complexity and
|
||||||
|
focus on the bottom of the bathtub, assuming constant failure rates.
|
||||||
|
This is a particularly reasonable assumption as long as we're focused on
|
||||||
|
failures during a particular, relatively short interval
|
||||||
|
\begin_inset Formula $A$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
.
|
||||||
|
Towards the end of this paper, as we examine failures over many repair
|
||||||
|
intervals, the assumption becomes more tenuous, and we note some of the
|
||||||
|
issues.
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Subsubsection
|
||||||
|
Estimate Adaptation
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
|
Even assuming constant failure rates, however, it will be rare that the
|
||||||
|
duration of
|
||||||
|
\begin_inset Formula $A$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
coincides with the available failure rate data, particularly since we want
|
||||||
|
to view
|
||||||
|
\begin_inset Formula $A$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
as a tunable parameter.
|
||||||
|
It's necessary to be able adapt failure rates baselined against any given
|
||||||
|
duration to the selected value of
|
||||||
|
\begin_inset Formula $A$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
.
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
|
Another issue is that failure rates of hardware, etc., are necessarily continuous
|
||||||
|
in nature, while the per-interval failure/survival rates that are of interest
|
||||||
|
for file reliability calculations are discrete -- a peer either survives
|
||||||
|
or fails during the interval.
|
||||||
|
The continuous nature of failure rates means that the common and obvious
|
||||||
|
methods for estimating failure rates result in values that follow continuous,
|
||||||
|
not discrete distributions.
|
||||||
|
The difference is minor for small failure probabilities, and converges
|
||||||
|
to zero as the number of intervals goes to infinity, but is important enough
|
||||||
|
in some cases to be worth correcting for.
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
|
Continuous failure rates are described in terms of mean time to failure,
|
||||||
|
and under the assumption that failure rates are constant, are exponentially
|
||||||
|
distributed.
|
||||||
|
Under these assumptions, the probability that a machine fails at time
|
||||||
|
\begin_inset Formula $t$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
, is
|
||||||
|
\begin_inset Formula \[
|
||||||
|
f\left(t\right)=\lambda e^{-\lambda t}\]
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
where
|
||||||
|
\begin_inset Formula $\lambda$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
represents the per unit-time failure rate.
|
||||||
|
The probability that a machine fails at or before time
|
||||||
|
\begin_inset Formula $A$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
is therefore
|
||||||
|
\begin_inset Formula \begin{align}
|
||||||
|
F\left(t\right) & =\int_{0}^{A}f\left(x\right)dx\nonumber \\
|
||||||
|
& =\int_{0}^{A}\lambda e^{-\lambda x}dx\nonumber \\
|
||||||
|
& =1-e^{-\lambda A}\label{eq:failure-time}\end{align}
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
|
Note that
|
||||||
|
\begin_inset Formula $A$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
and
|
||||||
|
\begin_inset Formula $\lambda$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
in
|
||||||
|
\begin_inset CommandInset ref
|
||||||
|
LatexCommand ref
|
||||||
|
reference "eq:failure-time"
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
must be expressed in consistent time units.
|
||||||
|
If they're different, unit conversions should be applied in the normal
|
||||||
|
way.
|
||||||
|
For example, if the estimate for
|
||||||
|
\begin_inset Formula $\lambda$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
is 750 failures per million hours, and
|
||||||
|
\begin_inset Formula $A$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
is one month, then either
|
||||||
|
\begin_inset Formula $A$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
should be represented as
|
||||||
|
\begin_inset Formula $30\cdot24/1000000=.00072$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
, or
|
||||||
|
\begin_inset Formula $\lambda$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
should be converted to failures per month.
|
||||||
|
Or both may be converted to hours.
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Subsubsection
|
||||||
|
Acquiring Peer Reliability Estimates
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
|
Need to write this.
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Subsection
|
||||||
|
Uniform Reliability
|
||||||
\begin_inset CommandInset label
|
\begin_inset CommandInset label
|
||||||
LatexCommand label
|
LatexCommand label
|
||||||
name "sub:Fixed-Reliability"
|
name "sub:Fixed-Reliability"
|
||||||
@ -323,8 +489,8 @@ In the simplest case, the peers holding the file shares all have the same
|
|||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
.
|
.
|
||||||
Each share's survival can be viewed as an indepedent Bernoulli trial with
|
Each share's survival can be viewed as an independent Bernoulli trial with
|
||||||
a succes probability of
|
a success probability of
|
||||||
\begin_inset Formula $p$
|
\begin_inset Formula $p$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
@ -332,7 +498,7 @@ In the simplest case, the peers holding the file shares all have the same
|
|||||||
\begin_inset Formula $K$
|
\begin_inset Formula $K$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
follows the binomial distribution with paramaters
|
follows the binomial distribution with parameters
|
||||||
\begin_inset Formula $N$
|
\begin_inset Formula $N$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
@ -377,7 +543,15 @@ A Bernoulli trial is simply a test of some sort that results in one of two
|
|||||||
\begin_inset Formula $K$
|
\begin_inset Formula $K$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
be a random variable that represents the number of successes.
|
be a random variable that represents the number,
|
||||||
|
\begin_inset Formula $m$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
, of successes,
|
||||||
|
\begin_inset Formula $0\le m\le n$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
.
|
||||||
We say that
|
We say that
|
||||||
\begin_inset Formula $K$
|
\begin_inset Formula $K$
|
||||||
\end_inset
|
\end_inset
|
||||||
@ -387,7 +561,8 @@ A Bernoulli trial is simply a test of some sort that results in one of two
|
|||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
.
|
.
|
||||||
The probability that
|
The probability mass function (PMF) of K is a function that gives the probabili
|
||||||
|
ty that
|
||||||
\begin_inset Formula $K$
|
\begin_inset Formula $K$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
@ -403,9 +578,10 @@ A Bernoulli trial is simply a test of some sort that results in one of two
|
|||||||
\begin_inset Formula $n-m$
|
\begin_inset Formula $n-m$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
failures) is called the probability mass function and is given by:
|
failures).
|
||||||
|
The PMF of K is
|
||||||
\begin_inset Formula \begin{equation}
|
\begin_inset Formula \begin{equation}
|
||||||
Pr[K=m]=f(m;n,p)=\binom{n}{p}p^{m}(1-p)^{n-m}\label{eq:binomial-pmf}\end{equation}
|
Pr[K=m]=f(m;n,p)=\binom{n}{m}p^{m}(1-p)^{n-m}\label{eq:binomial-pmf}\end{equation}
|
||||||
|
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
@ -455,7 +631,8 @@ Now consider any reordering of these
|
|||||||
Since multiplication is commutative, each such reordering has the same
|
Since multiplication is commutative, each such reordering has the same
|
||||||
probability.
|
probability.
|
||||||
There are n-choose-m such orderings, and each ordering is an independent
|
There are n-choose-m such orderings, and each ordering is an independent
|
||||||
event, so the probability that any ordering of
|
event, meaning we can sum the probabilities of the individual orderings,
|
||||||
|
so the probability that any ordering of
|
||||||
\begin_inset Formula $m$
|
\begin_inset Formula $m$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
@ -544,7 +721,7 @@ reference "eq:simple-failure"
|
|||||||
|
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
assumes that each share has the same probability of survival, but as explained
|
assumes that all shares have the same probability of survival, but as explained
|
||||||
above, this is not necessarily true.
|
above, this is not necessarily true.
|
||||||
A more accurate model allows each share
|
A more accurate model allows each share
|
||||||
\begin_inset Formula $s_{i}$
|
\begin_inset Formula $s_{i}$
|
||||||
@ -570,11 +747,7 @@ reference "eq:simple-failure"
|
|||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
and
|
and
|
||||||
\begin_inset Formula $p_{i}$
|
\begin_inset Formula $p_{1},p_{2},\dots,p_{N}$
|
||||||
\end_inset
|
|
||||||
|
|
||||||
where
|
|
||||||
\begin_inset Formula $1\leq i\leq N$
|
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
.
|
.
|
||||||
@ -589,7 +762,7 @@ The PMF for this generalized
|
|||||||
However, the PMFs for random variables representing individual share survival
|
However, the PMFs for random variables representing individual share survival
|
||||||
do.
|
do.
|
||||||
Let
|
Let
|
||||||
\begin_inset Formula $S_{i}$
|
\begin_inset Formula $K_{i}$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
be a random variable such that:
|
be a random variable such that:
|
||||||
@ -597,7 +770,7 @@ The PMF for this generalized
|
|||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
\begin_inset Formula \[
|
\begin_inset Formula \[
|
||||||
S_{i}=\begin{cases}
|
K_{i}=\begin{cases}
|
||||||
1 & \textnormal{if }s_{i}\textnormal{ survives}\\
|
1 & \textnormal{if }s_{i}\textnormal{ survives}\\
|
||||||
0 & \textnormal{if }s_{i}\textnormal{ fails}\end{cases}\]
|
0 & \textnormal{if }s_{i}\textnormal{ fails}\end{cases}\]
|
||||||
|
|
||||||
@ -608,14 +781,20 @@ S_{i}=\begin{cases}
|
|||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
The PMF for
|
The PMF for
|
||||||
\begin_inset Formula $S_{i}$
|
\begin_inset Formula $K_{i}$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
is very simple:
|
is very simple:
|
||||||
\begin_inset Formula \[
|
\begin_inset Formula \[
|
||||||
Pr[S_{i}=j]=\begin{cases}
|
Pr[K_{i}=j]=\begin{cases}
|
||||||
1-p_{i} & j=0\\
|
p_{i} & j=1\\
|
||||||
p_{i} & j=1\end{cases}\]
|
1-p_{i} & j=0\end{cases}\]
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
which can also be expressed as
|
||||||
|
\begin_inset Formula \[
|
||||||
|
Pr[K_{i}=j]=f\left(j\right)=\left(1-p_{i}\right)\left(1-j\right)+p_{i}\left(j\right)\]
|
||||||
|
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
@ -624,7 +803,7 @@ p_{i} & j=1\end{cases}\]
|
|||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
Note that since each
|
Note that since each
|
||||||
\begin_inset Formula $S_{i}$
|
\begin_inset Formula $K_{i}$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
represents the count of shares
|
represents the count of shares
|
||||||
@ -635,16 +814,15 @@ Note that since each
|
|||||||
counts, we get the group survivor count.
|
counts, we get the group survivor count.
|
||||||
That is:
|
That is:
|
||||||
\begin_inset Formula \[
|
\begin_inset Formula \[
|
||||||
\sum_{i=1}^{N}S_{i}=K\]
|
\sum_{i=1}^{N}K_{i}=K\]
|
||||||
|
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
Effectively,
|
Effectively, we have separated
|
||||||
\begin_inset Formula $K$
|
\begin_inset Formula $K$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
has just been separated into the series of Bernoulli trials that make it
|
into the series of Bernoulli trials that make it up.
|
||||||
up.
|
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Theorem
|
\begin_layout Theorem
|
||||||
@ -709,54 +887,38 @@ where
|
|||||||
|
|
||||||
\begin_layout Proof
|
\begin_layout Proof
|
||||||
The proof is beyond the scope of this paper.
|
The proof is beyond the scope of this paper.
|
||||||
\begin_inset Foot
|
|
||||||
status collapsed
|
|
||||||
|
|
||||||
\begin_layout Plain Layout
|
|
||||||
\begin_inset Quotes eld
|
|
||||||
\end_inset
|
|
||||||
|
|
||||||
Beyond the scope of this paper
|
|
||||||
\begin_inset Quotes erd
|
|
||||||
\end_inset
|
|
||||||
|
|
||||||
usually means
|
|
||||||
\begin_inset Quotes eld
|
|
||||||
\end_inset
|
|
||||||
|
|
||||||
Too long and nasty to bore you with
|
|
||||||
\begin_inset Quotes erd
|
|
||||||
\end_inset
|
|
||||||
|
|
||||||
.
|
|
||||||
In this case it means
|
|
||||||
\begin_inset Quotes eld
|
|
||||||
\end_inset
|
|
||||||
|
|
||||||
The author hasn't the foggiest idea why this is true, or how to prove it,
|
|
||||||
but reliable authorities say it's real, and in practice it works a treat.
|
|
||||||
\begin_inset Quotes erd
|
|
||||||
\end_inset
|
|
||||||
|
|
||||||
|
|
||||||
\end_layout
|
|
||||||
|
|
||||||
\end_inset
|
|
||||||
|
|
||||||
If you don't believe it's true, look it up on Wikipedia, which is never
|
|
||||||
wrong.
|
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
Applying the discrete convolution theorem, if
|
If we denote the PMF of
|
||||||
\begin_inset Formula $Pr[K=i]=f(i)$
|
\begin_inset Formula $K$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
with
|
||||||
|
\begin_inset Formula $f$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
and the PMF of
|
||||||
|
\begin_inset Formula $K_{i}$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
with
|
||||||
|
\begin_inset Formula $g_{i}$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
(more formally,
|
||||||
|
\begin_inset Formula $Pr[K=x]=f(x)$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
and
|
and
|
||||||
\begin_inset Formula $Pr[S_{i}=j]=g_{i}(j)$
|
\begin_inset Formula $Pr[K_{i}=x]=g_{i}(x)$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
, then
|
) then since
|
||||||
|
\begin_inset Formula $K=\sum_{i=1}^{N}K_{i}$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
, according to the discrete convolution theorem
|
||||||
\begin_inset Formula $f=g_{1}\star g_{2}\star g_{3}\star\ldots\star g_{N}$
|
\begin_inset Formula $f=g_{1}\star g_{2}\star g_{3}\star\ldots\star g_{N}$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
@ -777,7 +939,7 @@ Therefore,
|
|||||||
|
|
||||||
can be computed as a sequence of convolution operations on the simple PMFs
|
can be computed as a sequence of convolution operations on the simple PMFs
|
||||||
of the random variables
|
of the random variables
|
||||||
\begin_inset Formula $S_{i}$
|
\begin_inset Formula $K_{i}$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
.
|
.
|
||||||
@ -796,8 +958,13 @@ reference "eq:convolution"
|
|||||||
\begin_inset Formula $K$
|
\begin_inset Formula $K$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
even in the case of the standard binomial distribution, primarily because
|
than the binomial theorem.
|
||||||
the binomial calculation in equation
|
even in the case of shares with identical survival probability.
|
||||||
|
The reason it's better is because the calculation of
|
||||||
|
\begin_inset Formula $\binom{n}{m}$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
in equation
|
||||||
\begin_inset CommandInset ref
|
\begin_inset CommandInset ref
|
||||||
LatexCommand ref
|
LatexCommand ref
|
||||||
reference "eq:binomial-pmf"
|
reference "eq:binomial-pmf"
|
||||||
@ -811,24 +978,20 @@ reference "eq:binomial-pmf"
|
|||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
Note also that it is not necessary to have very simple PMFs like those of
|
Note also that it is not necessary to have very simple PMFs like those of
|
||||||
the
|
the
|
||||||
\begin_inset Formula $S_{i}$
|
\begin_inset Formula $K_{i}$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
.
|
.
|
||||||
Any share or set of shares that has a known PMF can be combined with any
|
Any share or set of shares that has a known PMF can be combined with any
|
||||||
other set with a known PMF by convolution, as long as the two share sets
|
other set with a known PMF by convolution, as long as the two share sets
|
||||||
are independent.
|
are independent.
|
||||||
Since PMFs are easily represented as simple lists of probabilities, where
|
The reverse holds as well; given a group with an empirically-derived PMF,
|
||||||
the
|
in it's theoretically possible to solve for an individual PMF, and thereby
|
||||||
\begin_inset Formula $i$
|
determine
|
||||||
|
\begin_inset Formula $p_{i}$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
th element in the list corresponds to
|
even when per-share data is unavailable.
|
||||||
\begin_inset Formula $Pr[K=i]$
|
|
||||||
\end_inset
|
|
||||||
|
|
||||||
, these functions are easily managed in software, and computing the convolution
|
|
||||||
is both simple and efficient.
|
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Subsection
|
\begin_layout Subsection
|
||||||
@ -845,8 +1008,8 @@ name "sub:Multiple-Failure-Modes"
|
|||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
In modeling share survival probabilities, it's useful to be able to analyze
|
In modeling share survival probabilities, it's useful to be able to analyze
|
||||||
separately each of the various failure modes.
|
separately each of the various failure modes.
|
||||||
If reliable statistics for disk failure can be obtained, then a probability
|
For example, if reliable statistics for disk failure can be obtained, then
|
||||||
mass function for that form of failure can be generated.
|
a probability mass function for that form of failure can be generated.
|
||||||
Similarly, statistics on other hardware failures, administrative errors,
|
Similarly, statistics on other hardware failures, administrative errors,
|
||||||
network losses, etc., can all be estimated independently.
|
network losses, etc., can all be estimated independently.
|
||||||
If those estimates can then be combined into a single PMF for a share,
|
If those estimates can then be combined into a single PMF for a share,
|
||||||
@ -873,7 +1036,7 @@ th failure mode of share
|
|||||||
|
|
||||||
, then
|
, then
|
||||||
\begin_inset Formula \[
|
\begin_inset Formula \[
|
||||||
Pr[S_{i}=k]=f_{i}(k)=\begin{cases}
|
Pr[K_{i}=k]=f_{i}(k)=\begin{cases}
|
||||||
\prod_{j=1}^{m}p_{i,j} & k=1\\
|
\prod_{j=1}^{m}p_{i,j} & k=1\\
|
||||||
1-\prod_{j=1}^{m}p_{i,j} & k=0\end{cases}\]
|
1-\prod_{j=1}^{m}p_{i,j} & k=0\end{cases}\]
|
||||||
|
|
||||||
@ -918,12 +1081,12 @@ If there are failure modes that affect multiple computers, we can also construct
|
|||||||
\begin_inset Formula $K$
|
\begin_inset Formula $K$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
, a random variable representing the number of surviors is
|
, a random variable representing the number of survivors is
|
||||||
\begin_inset Formula \[
|
\begin_inset Formula \[
|
||||||
Pr[K=i]=f(i)=\begin{cases}
|
Pr[K=k]=f(k)=\begin{cases}
|
||||||
p & i=n\\
|
p & k=n\\
|
||||||
0 & 0<i<n\\
|
0 & 0<i<n\\
|
||||||
1-p & i=0\end{cases}\]
|
1-p & k=0\end{cases}\]
|
||||||
|
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
@ -994,7 +1157,7 @@ Four PCs located in random homes, connected to the Internet via assorted
|
|||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
If one share is placed on each of these 20 computers, what's the probability
|
If one share is placed on each of these 12 computers, what's the probability
|
||||||
mass function of share survival? To more compactly describe PMFs, we'll
|
mass function of share survival? To more compactly describe PMFs, we'll
|
||||||
denote them as probability vectors of the form
|
denote them as probability vectors of the form
|
||||||
\begin_inset Formula $\left[\alpha_{o},\alpha_{1},\alpha_{2},\ldots\alpha_{n}\right]$
|
\begin_inset Formula $\left[\alpha_{o},\alpha_{1},\alpha_{2},\ldots\alpha_{n}\right]$
|
||||||
@ -1012,13 +1175,18 @@ If one share is placed on each of these 20 computers, what's the probability
|
|||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
The servers in the two data centers have individual survival probabilities
|
The servers in the two data centers have individual failure probabilities
|
||||||
of RAID failure (.0002) and administrative error (.003) giving
|
of RAID failure (.0002) and administrative error (.003) giving an individual
|
||||||
|
survival probability of
|
||||||
\begin_inset Formula \[
|
\begin_inset Formula \[
|
||||||
(1-.0002)\cdot(1-.003)=.9998\cdot.997=.9968\]
|
(1-.0002)\cdot(1-.003)=.9998\cdot.997=.9968\]
|
||||||
|
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
|
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
Using
|
Using
|
||||||
\begin_inset Formula $p=.9968,n=4$
|
\begin_inset Formula $p=.9968,n=4$
|
||||||
\end_inset
|
\end_inset
|
||||||
@ -1046,26 +1214,30 @@ which applies to each group of four servers.
|
|||||||
\begin_inset Formula $1.049\cdot10^{-10}$
|
\begin_inset Formula $1.049\cdot10^{-10}$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
) times the probability they all fail because of a network outage (
|
) plus the probability they all fail because of a network outage (
|
||||||
\begin_inset Formula $.0001$
|
\begin_inset Formula $.0001$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
) less the probability they fail for both reasons:
|
) less the probability they fail for both reasons:
|
||||||
\begin_inset Formula \[
|
\begin_inset Formula \[
|
||||||
\left(1.049\times10^{-10}\right)+\left(0.0001\right)-\left[\left(1.049\times10^{-10}\right)\cdot\left(0.0001\right)\right]=0.0001\]
|
\left(1.049\times10^{-10}\right)+\left(0.0001\right)-\left[\left(1.049\times10^{-10}\right)\cdot\left(0.0001\right)\right]\approxeq0.0001\]
|
||||||
|
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
|
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
That's the
|
That's the
|
||||||
\begin_inset Formula $0$
|
\begin_inset Formula $i=0$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
th element of the combined PMF.
|
element of the combined PMF.
|
||||||
The combined probability of survival of
|
The combined probability of survival of
|
||||||
\begin_inset Formula $0<i\leq4$
|
\begin_inset Formula $0<i\leq4$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
servers is simpler: it's the probility they survive individual failure,
|
servers is simpler: it's the probability they survive individual failure,
|
||||||
from the individual failure PMF above, times the probability they survive
|
from the individual failure PMF above, times the probability they survive
|
||||||
network failure (.9999).
|
network failure (.9999).
|
||||||
So the combined survival PMF, which we'll denote as
|
So the combined survival PMF, which we'll denote as
|
||||||
@ -1096,9 +1268,9 @@ Of course, the failures need not be truly simultaneous, they just have happen
|
|||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
The same process for the Hawaii servers, but with group survival probability
|
We apply the same process for the Hawaii servers, but with group survival
|
||||||
of
|
probability of
|
||||||
\begin_inset Formula $(1-.0001)(1-.02)=.9799$
|
\begin_inset Formula $(1-.0001)(1-.04)=.9799$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
gives the survival PMF
|
gives the survival PMF
|
||||||
@ -1107,10 +1279,7 @@ h(i)=\left[0.0201,1.280\times10^{-7},5.982\times10^{-5},0.01242,0.9674\right]\]
|
|||||||
|
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
which has the unusual property that it's more likely that all of the servers
|
|
||||||
will be lost than that only one will survive.
|
|
||||||
This is because in order for exactly one to survive, it's necessary for
|
|
||||||
three to have the
|
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
@ -1140,10 +1309,30 @@ Applying the convolution operator to
|
|||||||
|
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
Note the interesting fact that losing four shares is 10,000 times more likely
|
|
||||||
than losing three.
|
\end_layout
|
||||||
This is because both data centers have a whole-center failure modes, and
|
|
||||||
|
\begin_layout Standard
|
||||||
|
\begin_inset VSpace defskip
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
|
Note that losing four shares (
|
||||||
|
\begin_inset Formula $i=4$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
) is 10,000 times more likely than losing three (
|
||||||
|
\begin_inset Formula $i=5$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
).
|
||||||
|
This is because both data centers have a whole-center failure mode, and
|
||||||
the Hawaii center's lava burn probability is so high.
|
the Hawaii center's lava burn probability is so high.
|
||||||
|
Similarly, the probability of losing all of them is 1000 times higher than
|
||||||
|
the probability of losing all but one.
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
@ -1173,16 +1362,16 @@ reference "eq:binomial-pmf"
|
|||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
to compute the PMF
|
to compute the PMF
|
||||||
\begin_inset Formula $f(i),0\leq i\leq4$
|
\begin_inset Formula $g(i),0\leq i\leq4$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
for the PCs and finally compute
|
for the PCs and finally compute
|
||||||
\begin_inset Formula $s(i)=\left(f\star\left(n\star h\right)\right)\left(i\right)$
|
\begin_inset Formula $f(i)=\left(g\star\left(n\star h\right)\right)\left(i\right)$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
, the PMF of the whole share set.
|
, the PMF of the whole share set.
|
||||||
Summing the values of
|
Summing the values of
|
||||||
\begin_inset Formula $s(i)$
|
\begin_inset Formula $f(i)$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
for
|
for
|
||||||
@ -1848,7 +2037,7 @@ The table demonstrates the importance of the selection of
|
|||||||
|
|
||||||
.
|
.
|
||||||
Normally, reducing the number of shares needed for reassembly improve the
|
Normally, reducing the number of shares needed for reassembly improve the
|
||||||
file's chances of survival, but in this case it provides a miniscule gain
|
file's chances of survival, but in this case it provides a minuscule gain
|
||||||
in reliability at the cost of a 10% increase in bandwidth and storage consumed.
|
in reliability at the cost of a 10% increase in bandwidth and storage consumed.
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
@ -1898,10 +2087,6 @@ A cheaper repair option is simply to direct some peer that has share
|
|||||||
in the network.
|
in the network.
|
||||||
This is not as good as actually replacing the lost share, though.
|
This is not as good as actually replacing the lost share, though.
|
||||||
Suppose that more shares were lost, leaving only
|
Suppose that more shares were lost, leaving only
|
||||||
\begin_inset Formula $ $
|
|
||||||
\end_inset
|
|
||||||
|
|
||||||
|
|
||||||
\begin_inset Formula $k$
|
\begin_inset Formula $k$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
@ -1917,7 +2102,7 @@ A cheaper repair option is simply to direct some peer that has share
|
|||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
However, such cheap repair is not completely pointless; it does increase
|
However, such cheap repair is not completely pointless; it does increase
|
||||||
file survivability.
|
file survivability.
|
||||||
The question is: By how much?
|
But by how much?
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
@ -1955,12 +2140,12 @@ More generally, if a single share is deployed on
|
|||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
, the share survival count is a random variable
|
, the share survival count is a random variable
|
||||||
\begin_inset Formula $S$
|
\begin_inset Formula $K$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
and the probability of share loss is
|
and the probability of share loss is
|
||||||
\begin_inset Formula \[
|
\begin_inset Formula \[
|
||||||
Pr[S=0]=(f_{1}\star f_{2}\star\ldots\star f_{n})(0)\]
|
Pr[K=0]=(f_{1}\star f_{2}\star\ldots\star f_{n})(0)\]
|
||||||
|
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
@ -2023,7 +2208,7 @@ doubled
|
|||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
shares, each with survival probability
|
shares, each with survival probability
|
||||||
\begin_inset Formula $2p-p^{2}\approx.99$
|
\begin_inset Formula $2p-p^{2}\approxeq.99$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
.
|
.
|
||||||
@ -2066,11 +2251,11 @@ Thus far, we've focused entirely on the probability that a file survives
|
|||||||
The probability that a file survives long-term, though, is also important.
|
The probability that a file survives long-term, though, is also important.
|
||||||
As long as the probability of failure during a repair period is non-zero,
|
As long as the probability of failure during a repair period is non-zero,
|
||||||
a given file will eventually be lost.
|
a given file will eventually be lost.
|
||||||
We want to know what the probability of surviving for time
|
We want to know the probability of surviving for time
|
||||||
\begin_inset Formula $T$
|
\begin_inset Formula $T$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
is, and how the parameters
|
, and how the parameters
|
||||||
\begin_inset Formula $A$
|
\begin_inset Formula $A$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
@ -2078,7 +2263,7 @@ Thus far, we've focused entirely on the probability that a file survives
|
|||||||
\begin_inset Formula $L$
|
\begin_inset Formula $L$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
(share low watermark) affect survival time.
|
(allowed share low watermark) affect survival time.
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
@ -2113,7 +2298,7 @@ Most survival functions are continuous, but
|
|||||||
\begin_inset Formula $R(t)$
|
\begin_inset Formula $R(t)$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
is inherently discrete, and stochastic.
|
is inherently discrete and stochastic.
|
||||||
The time steps are the repair intervals, each of length
|
The time steps are the repair intervals, each of length
|
||||||
\begin_inset Formula $A$
|
\begin_inset Formula $A$
|
||||||
\end_inset
|
\end_inset
|
||||||
@ -2136,7 +2321,7 @@ Most survival functions are continuous, but
|
|||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Subsection
|
\begin_layout Subsection
|
||||||
Aggressive Repairs
|
Aggressive Repair
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
@ -2148,7 +2333,7 @@ Let's first consider the case of an aggressive repairer.
|
|||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
shares, distributed on servers with various individual and group failure
|
shares, distributed on servers with various individual and group failure
|
||||||
probalities, which will survive or fail per the output of random variable
|
probabilities, which will survive or fail per the output of random variable
|
||||||
|
|
||||||
\begin_inset Formula $K$
|
\begin_inset Formula $K$
|
||||||
\end_inset
|
\end_inset
|
||||||
@ -2226,7 +2411,11 @@ So, given a PMF
|
|||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
, choose
|
, choose
|
||||||
\begin_inset Formula $k:f\left(k\right)\geq\sqrt[t]{r}$
|
\begin_inset Formula $k$
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
such that
|
||||||
|
\begin_inset Formula $f\left(k\right)\geq\sqrt[t]{r}$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
.
|
.
|
||||||
@ -2235,7 +2424,7 @@ So, given a PMF
|
|||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
is one month, and
|
is one month, and
|
||||||
\begin_inset Formula $r=1-\nicefrac{1}{1000000}$
|
\begin_inset Formula $r=1-\nicefrac{1}{10^{6}}$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
and
|
and
|
||||||
@ -2243,7 +2432,7 @@ So, given a PMF
|
|||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
, or 10 years, we calculate
|
, or 10 years, we calculate
|
||||||
\begin_inset Formula $f\left(k\right)\geq\sqrt[120]{.999999}\cong0.999999992$
|
\begin_inset Formula $f\left(k\right)\geq\sqrt[120]{.999999}\approx0.999999992$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
.
|
.
|
||||||
@ -2258,7 +2447,7 @@ reference "tab:Example-PMF"
|
|||||||
\begin_inset Formula $k=2$
|
\begin_inset Formula $k=2$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
, achieves the goal, at the cose of a six-fold expansion in stored file
|
, achieves the goal, at the cost of a six-fold expansion in stored file
|
||||||
size.
|
size.
|
||||||
If the lesser goal of no more than
|
If the lesser goal of no more than
|
||||||
\begin_inset Formula $\nicefrac{1}{1000}$
|
\begin_inset Formula $\nicefrac{1}{1000}$
|
||||||
@ -2396,9 +2585,9 @@ Since each interval starts with a full complement of shares, the expected
|
|||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
It may also be useful to discount future cost, since CPU and bandwidth are
|
It is also necessary to discount future cost, since CPU and bandwidth are
|
||||||
both going to get cheaper over time.
|
both going to get cheaper over time.
|
||||||
To accomodate this, we throw in an addition per-period discount rate
|
To accommodate this, we throw in an addition per-period discount rate
|
||||||
\begin_inset Formula $r$
|
\begin_inset Formula $r$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
@ -2429,6 +2618,14 @@ If
|
|||||||
this collapses to the previous result, as one would expect.
|
this collapses to the previous result, as one would expect.
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Subsection
|
||||||
|
Non-aggressive Repair
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
|
Need to write this.
|
||||||
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Section
|
\begin_layout Section
|
||||||
Time-Sensitive Retrieval
|
Time-Sensitive Retrieval
|
||||||
\end_layout
|
\end_layout
|
||||||
@ -2436,8 +2633,10 @@ Time-Sensitive Retrieval
|
|||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
The above work has almost entirely ignored the distinction between availability
|
The above work has almost entirely ignored the distinction between availability
|
||||||
and reliability.
|
and reliability.
|
||||||
In reality, temporary and permanent failures need to be modeled separately,
|
\end_layout
|
||||||
and
|
|
||||||
|
\begin_layout Standard
|
||||||
|
Need to write this.
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\end_body
|
\end_body
|
||||||
|
Loading…
x
Reference in New Issue
Block a user