[Mnet-devel] erasure code optimization project
Lenny G Arbage
alengarbage at yahoo.com
Thu Aug 5 21:39:37 BST 2004
--- Bryce Wilcox-O'Hearn <zooko at zooko.com> wrote:
>
> > There is a paper
> > at the website that goes into some depth about
> > "inefficiency," and shows that values of about
> k*1.05
> > are achievable.
>
> Is it this one?
>
> ``Design, Evaluation and Comparison of Four Large
> Block FEC Codecs,
> LDPC, LDGM, LDGM Staircase and LDGM Triangle, plus a
> Reed-Solomon Small
> Block FEC Codec'',
> V. Roca, C. Neumann
>
> I haven't read it yet.
That's the one. While I'm not an expert in the
codec field by any stretch, its worth thumbing
through. See especially Sec 4. Spending some time
staring at Figures 3 and 4 might also be interesting.
> This paper:
>
> ``On the Practical Use of LDPC Erasure Codes for
> Distributed Storage
> Applications''
> by James S. Plank and Michael G. Thomason
>
>
http://www.cs.utk.edu/~plank/plank/papers/CS-03-510.html
>
> Seems to indicate that the "1.05" number is an
> asymptotic estimate, and
> that for many practical applications the "1.31" that
> you observed is
> more likely!
The Plank paper is an excellent study. Contrast the
conclusions with those that are drawn from graphs 3 &
4 in the INRIA paper -- I don't think I've given this
enough thought to reconcile the two perspectives, but
the INRIA paper's use of largish (1024B) shares
throughout certainly might be one of the reasons their
results appear so favorable...
> Is this too much? Let's assume for the moment that
> we would have to
> download 1.31 times as many blocks with a LDPC than
> with Reed-Solomon,
> but that the LDPC code would decode 5x as fast.
> Then if the time to
> download the file were less than 3/5 of the time to
> decode the file
> (with Reed-Solomon), then switching to LDPC would be
> a win. So
> approximately it is better to use LDPC if
> Reed-Solomon takes half-again
> as long to decode the file as you have to wait to
> download the file.
>
> However, consider that you can download and decode
> in parallel, so in
> practice Reed-Solomon might continue to be faster
> even if it takes
> twice as long to perform the Reed-Solomon decoding
> as it does to
> download the file.
>
> Unfortunately, the person who encoded and published
> the file in the
> first place didn't know the ratio between download
> speed and decode
> speed on your computer, so we need to think of a
> good strategy in
> advance.
>
> One possible approach would be to fix the block size
> at 64 KB for every
> block, and fix the expansion ratio to 3. (M=3*K)
>
> Then if filesize / 64 KB were greater than 84, which
> means that we
> could no longer fit into the GF(2^8) Reed-Solomon,
> we could switch to
> LDPC instead.
>
> I'm not sure that is better than our current plan of
> keeping GF(2^8)
> Reed-Solomon and just making the blocks bigger.
I'm somewhat agnostic on this point, to tell you the
truth. I'd considered the same tradeoff in the design
of a system similar to Mnet (done for a course I took)
-- using RSE for smaller files and LDGM for larger.
But simplicity is often better than cleverness, even
when cleverness actually pays off.
The other curves that neither of these papers
discuss are the relative performance gains of CPUs vs.
Networks, and what those curves might imply for
systems that want to be "built to last."
I was just browsing through mnetlib/filesystem. It
looks like the implementation allows plugging in a new
encoder/decoder in a fairly clean way (good for you!)
-- is that assessment accurate?
-Lenny
__________________________________
Do you Yahoo!?
Yahoo! Mail - You care about security. So do we.
http://promotions.yahoo.com/new_mail
More information about the Mnet-devel
mailing list