Rateless Codes and Big Downloads
may have changed. You can check the current page or check for previous versions at the Internet Archive.
Yahoo! is not affiliated with the authors of this page or responsible for its content.
Rateless Codes and Big Downloads
Rateless Codes and Big Downloads
Petar Maymounkov and David Mazi`eres
牏·"ェΙ! #"%$'&()")(
NYU Department of Computer Science
Abstract
This paper presents a novel algorithm for download-
ing big les from multiple sources in peer-to-peer
networks. The algorithm is simple, but offers sev-
eral compelling properties.
It ensures low hand-
shaking overhead between peers that download les
(or parts of a les) from each other. It is computa-
tionally efcient, with cost linear in the amount of
data transfered. Most importantly, when nodes leave
the network in the middle of uploads, the algorithm
minimizes the duplicate information shared by nodes
with truncated downloads. Thus, any two peers with
partial knowledge of a given le can almost always
fully benet from each others knowledge. Our algo-
rithm is made possible by the recent introduction of
linear-time, rateless erasure codes.
1
Introduction
One of the most prominent uses of peer-to-peer sys-
tems is to download lesoften very large les,
such as movies [9]. More often than not, these les
are available at least in part at more than one node on
the network. This observation has inspired a number
of different algorithms for multi-source le down-
load [3], including some that have already been de-
ployed [1].
The basic multi-source download problem is sim-
ple. A set of nodes, called source nodes, have com-
plete knowledge of a certain le. A set of nodes we
call requesting nodes wish to obtain a copies of that
le. The goal is to transfer the le to the request-
ing nodes in a fast and bandwidth-efcient manner.
In practice, the task is complicated by the fact that
nodes can join or leave the system, aborting down-
loads and initiating new requests at any time. Thus, a
download algorithm can make very few assumptions
about the uptime or bandwidth capacity of participat-
ing nodes.
Most multi-source download algorithms take the
same general approach.
When a requesting node
needs a le, it rst locates a set nodes with full or par-
tial knowledge of that le. It then contacts as many
of them as necessary to download the le efciently.
For each source node the requesting node contacts,
the two must reconcile the differences in their knowl-
edge of the le. Then either the requesting node
downloads any non-overlapping information, or of-
ten both nodes exchange any non-overlapping infor-
mation they have about the le.
An effective multi-source download algorithm
should meet two main challenges. First, it should
maximize the utility of nodes with partial knowledge
of a le to each other. This, in turn, means minimiz-
ing the amount of overlapping information nodes are
likely to have. We call this property the availability
aspect of the algorithm, because it allows nodes with
truncated downloads to reconstruct a le even in the
event that every source node with the complete le
has left the network.
The second challenge of a multi-source down-
load algorithm is to make the reconciliation phase
as bandwidth-efcient as possible. This phase is an
instance of the more general set reconciliation prob-
lem [10, 4, 11, 2]. Unfortunately, existing set recon-
ciliation algorithms are not practical for multi-source
download algorithms. They are either too computa-
tionally costly, suboptimal in terms of message com-
plexity, or simply too complicated to implement.
In this paper, we propose an algorithm that com-
bines near-optimal availability with a simple yet
practical reconciliation phase not based on the gen-
eral set reconciliation problem.
Our approach is
1
made possible by the recent introduction of locally-
encodable, linear-time decodable, rateless erasure
codes. It exploits particular properties of the way
le contents tend to disperse over nodes in a peer-
to-peer system. The paper is presented in terms of a
new erasure code called on-line codes [8]. The re-
cently published LT-codes [7] are similar to on-line
codes, but have O
牏·&エЗ
running time, compared
to linear time for on-line codes.
The next section gives an overview of erasure
codes and their use in multi-source downloads and
introduces on-line codes. Section 3 details on-line
codes and their implementation. Section 4 describes
our multi-source download algorithm. Section 5 dis-
cusses aspects of the algorithm and open questions.
2
Loss-resilient codes
Erasure codes transform a message of
blocks into
an encoding of more than
blocks such that one can
recover the message from only a fraction of the en-
coded blocks. A block is just a xed-length bit string,
with block size a parameter of the algorithm. Many
erasure codes, including the on-line codes in this pa-
per, support blocks of arbitrary size, from a single
bit to tens of kilobytes or more. Choice of block size
is driven by the fact that a fragment of an encoded
block conveys no useful information about the orig-
inal message. Thus, blocks should be small enough
that aborted block transfers do not waste appreciable
bandwidth.
Conventional erasure codes also have a rate pa-
rameter, specifying the fraction of the encoded out-
put blocks required to reconstruct the original mes-
sage. An optimal erasure code with rate
trans-
forms a message of
blocks into
blocks such
that any
sufce to decode the original message.
Because of the cost of optimal codes, people often
employ near-optimal codes, which require
!"
output blocks to reconstruct the message for any
xed
$#&%
. The cost of a smaller
is increased
computation. Currently there is only one instance
of linear-time, near-optimal, conventional erasure
codes, called Tornado codes [5].
Linear-time, near-optimal erasure codes have al-
ready been used for multi-source downloads [3]. The
basic approach is for source nodes to encode les
with rate
(')021
to achieve a large expansion fac-
tor. When a node downloads les, the source node
sends a pseudo-random permutation of the encoded
blocks to the requesting node, deriving the permuta-
tion from the requesting nodes ID. Using this tech-
nique, two nodes that each have downloaded
%4365
encoded blocks of an
-block le will likely have
enough information between them to reconstruct the
le. Thus, the le will remain available even if all
source nodes leave the network. To generalize the
technique to more requesters, however, the expan-
sion factor
0
would need to grow proportionally to
the number of truncated downloads. Unfortunately,
even Tornado codes become impractically expensive
and memory-intensive for rates
7'80@9
.
A new class of erasure codes, rateless, locally-
encodable codes, addresses the problem. The two
novel properties of these codesratelessness and
local encodability, go hand-in-hand.
Ratelessness
means that each message of size
has practically
innite encoding.
Local encodability means that
any one encoding block can be computed quickly
and independently of the others.
Replacing con-
ventional xed-rate codes with rateless, locally-
encodable ones in the above scenario and making
some additional use of the unique properties of rate-
less codes leads to the multi-source download algo-
rithm presented in section 4 of this paper.
There are two instances of rateless codes, LT
codes [7] and on-line codes [8], both recently pro-
posed. We present our algorithm in terms of on-
line codes because they are more efcient, requiring
A
0
time to generate each encoding block and
A
牏
time to decode a message of length
. LT codes, in
contrast, take
A
燘&エЗ
and
A
牏·&エ
time respec-
tively (though they require no preprocessing, which
may make them more convenient for other settings).
The next section describes the implementation of on-
line codes in greater detail.
3
On-line codes
This section explains how to implement on-line
codes. A more detailed description and analysis of
the algorithm is available in [8]. On-line codes are
characterized by two parameters,
and
D
(in addi-
tion to the block size).
determines the degree of
2
OUTER
INNER
INNER
OUTER
ⅲⅲ
Partially-recovered message
and auxiliary blocks
Message blocks
Auxiliary blocks
Recovered message blocks
Received check blocks
Lossy channel
Check blocks
Figure 1: Overall design of online codes.
suboptimalitya message of
blocks can, with high
probability, be decoded from
"
output blocks.
D
, discussed subsequently, affects the success proba-
bility of the algorithmit may fail to reconstruct the
message with negligible probability
21
Εě
.
The overall structure of on-line codes has two lay-
ers, depicted in Figure 1. To encode a message, in
an outer encoding, we rst generate a small number
of auxiliary blocks and append them to the original
message to produce a composite message. The com-
posite message has the property that knowledge of
any
$!21
fraction of its blocks is sufcient to re-
cover the entire original message. Next, in