Byzantine Fault Tolerance of Inverse de Bruijn Overlay Networks for ...

ased overlay networks, called inverse de
Bruijn (IDB) graph, to enable multi-path P2P routing. The IDB overlays provide multiple entry
points and multiple routes between node pair. These features enable secure routing services in
O(log
k
n) time with O(k log
k
n) lookup messages and O(klog
k2
n) joining messages, where the
graph index k 2 and the full overlay has n=k
d
nodes. For large index k >> 2, the IDB overlays
handle random and Byzantine faults effectively, far beyond the capability of Chord or CAN.
The IDB overlays establish a Byzantine fault tolerance (BFT) framework, by which
secure P2P routing is guaranteed by using multiple routes to bypass faulty route. New BFT
routing algorithms are developed for secure joining and lookup operations. Consequently, fault
tolerance is achieved by majority vote on data returned from multiple routes. The paper presents
graph-theoretic properties of sparse IDB overlay networks and the protocols needed to establish
the BFT. Experimental results are reported on simulated IDB and Chord overlays. Both
theoretical and experimental results support the claimed BFT advantages. The performance gain
is obtained with small increase in messaging overhead. In summary, the IDB overlays are ideal
to cope with peer collusions or Byzantine attacks in DHT-based P2P systems.
Keywords:
Peer-to-peer systems, overlay networks, Byzantine faults, fault tolerance,
distributed hash table, Chord overlays, de Bruijn graphs, peer joining,
lookup services, and multi-path routing.


Manuscript submitted to IEEE Transactions on Parallel and Distributed Systems (TPDS) on October 20, 2006.
This research was supported by NSF ITR grant ACI-0325409. Y. Chen and K. Hwang are with the University of
Southern California, Los Angeles, CA 90089. Corresponding author: Kai Hwang, Email: kaihwang@usc.edu.

2
1. Introduction
Distributed hash table (DHT) has been suggested for building structured P2P systems by
many researchers [10, 24, 25, 29, 30, 34]. DHT-based overlays appear in different topologies
such as the Chord [34], CAN [29], and Pastry [32]. These BHT-based overlays are self-
organizing and flexible in adapting to changes in distributed environments [25]. However, their
adaptive features are exactly the sources of system vulnerability exploited by collusive peer
attacks in an open network environment. Peer routing and lookup services are very vulnerable in
structured P2P systems. The malicious peers could collude to miss-route messages, relay
corrupted data, or forge the routing information.
Hence, secure routing and lookup services have become the most challenging problem in
trusted P2P computing [2, 7, 8, 9, 14, 19, 23]. Routing faults in P2P systems are classified into
two types: fail-stop faults and Byzantine faults. Fail-stop faults are resulted from node or link
failures. Peers can detect the fail-stop faults among themselves. The faulty peer must stop
functioning and thus be expelled from the system, automatically. In other words, P2P systems
can tolerate fail-stop faults by simply isolating the faulty peers.
On the other hand, Byzantine faults in a P2P system are known as attacks, which are
difficult to detect due to its non-stop nature in faulty peers. The attackers disrupt the routing
functions in peer joining and lookup services by misleading users in message routing across the
peers. Byzantine faults have the potential to paralyze the entire P2P network operations, if not
properly controlled. Lamport, et al

[20] have formulated the Byzantine problem for distributed
computing. We attack the problem in P2P network systems.
Previous works on P2P fault tolerance [2, 11, 15, 19, 22, 30, 33] dealt most with random
attacks, by which each peer suffers a fault independently. However, these solutions cannot
handle peer collusions, effectively. We want to design a P2P overlay network to safeguard the
critical routing functions under Byzantine attacks from peer collusion [1, 3, 7]. In Table 1, we
list the key parameters used in this paper. Here, we consider a faulty node as one which provides
incorrect routing information in joining or lookup operations.
In the sequel, we denote the node degree by k, the dimension d, and s the sparseness of an
overlay network. The number of nodes of a DHT overlay is denoted by n. The maximum value
3
of n is equal to k
d
. The actual number of nodes in a sparse DHT overlay is expressed by s譳
d
,
where 0 < s 1. The route multiplicity m is the number of alternate routes between a node pair.
The fault node ration f specifies the fraction of real nodes that are faulty. The collusion group
size c refers to the number of peers colluding in Byzantine attacks. The English letters refer to
network parameters. The Greek letters are reserved for performance metrics.
Table 1. Notations of Key Parameters and Performance Metrics Used
Name, Notation
Brief Definition and Default Range
Full overlay size, n = k
d

Maximum number of nodes in an overlay network
A Chord overlay G(n, d)
A Chord overlay with maximum n = 2
d
nodes
Sparse DB(k, d, s)
overlay network
de Bruijn graph with node degree k, dimension d,
and sparseness s, where 0 < s 1
Sparse IDB (k, d, s)
overlay network
Inverse de Bruijn graph with node degree k,
dimension d, and sparseness s for 0 < s 1
Route multiplicity, m
Alternate routing paths between a node pair
Faulty node ratio, f
Fraction of faulty nodes in a P2P system, 0 f 1
Collusion group size, c
No. of peers colluding in Byzantine attacks, 1 c f譵
BFT Probability,
Probability of Byzantine fault-tolerant routing service
Average route length,
Average routing path length in terms of hop count
Route overlapping ratio,
Percentage of routing paths having common nodes

The rest of the paper is organized as follows: Section 2 discusses related works in
Byzantine fault tolerance in P2P routing. Section 3 introduces the IDB overlays, compared with
Chord overlay networks. Section 4 specifies the fault-tolerant peer joining process. We model
the joining and lookup faults in both IDB and chord networks. Section 5 describes the fault-
tolerating lookup services in BHT-based overlays. In section 6, we report the experimental
results on the proposed Byzantine fault tolerance (BFT) overlay networks. Finally, we
summarize the research findings and discuss further research needed.
2. Related Work and Our New Approach
Ever since the pioneering work on Byzantine problem by Lamport, et al

[20], many
researchers have studied Byzantine faults in P2P systems [1, 3, 5, 6, 27]. The problem gets even
4
worse in censor-free DHT-based overlay networks like Chord, CAN, and Pastry [12, 27]. Most
DHT-based P2P systems are subject to Byzantine fault attacks. To apply structured P2P
technology in the real world, a P2P system must be designed to tolerate both fail-stop faults and
Byzantine attacks from the peers. Yaos work [36] on protocols for secure communication is also
related to these kind of faults.
The de Bruijn (DB) graph was previously studied by [4, 16, 22, 31] for fault-tolerating
P2P computing. This DB graph is denoted by DB (k, d), where k 2 is the node degree (or
index) and d is the dimension of the graph. The graph has a key-search space of n = k
d
nodes
(keys). Each node is encoded by a k-ary string of d digits from the digit set {0, 1, 2,., k-1}. In
the binary case with k = 2, each node is identified by a string of d bits. Each node is connected
to at most k neighbors. Each neighbor has an encoding obtained by left-shifting one digit of the
current node encoding with a k-ary digit entering from the rightmost digital position.
We developed a new class of overlay networks, called inverse de Bruijn (IDB) graph,
which selects the routing path by right-shifting the successive node encodings. Throughout the
paper, we will prove that IDB graph outperforms the DB graph, Chord and CAN overlays in
lookup latency, route non-overlapping, and BFT capabilities. Attacks along P2P routing path
degrades the performance the P2P networks. Table 2 compares the fault tolerance and lookup
complexity of three classes of DHT-based overlay networks. The entries for Chord, CAN, and S-
Chord are well known. The new results on IDB overlays are reported in subsequent sections.
Table 2 Fault Tolerance and Lookup Complexity of DHT-based Overlay Networks
For an n-node IDB overlay, the network diameter is log
k
n. The lookup operation requires to
send O(k log
k
n) messages. The lookup latency equals O(log
k
n). There are O(k log
k 2
n) messages
Overlay
Network
Fault Tolerance and
System Capability
Lookup
Latency
Lookup
Messages
Peer Joining
Messages
CAN [29],
Chord [34]
Tolerate random faults
in fail-stop mode
O(log
2
n) O(log
2
n)
O(log
2
n)
S-Chord, [11]
Tolerate random faults in fail-
stop or Byzantine modes
(log
2
n)
(log
23
n)
(log
23
n)
Inverse De Bruijn
(IDB) Overlays
(This paper)
Tolerate random faults and
peer collusions in fail-stop
and Byzantine modes
O(log
k
n)
(Theorem 1)
O(k log
k
n)
(Theorem 4)
O(k log
k 2
n)
(Theorem 4)
5
needed per new peer to join the system. There are O(k) links stored at each node. Most previous
work handles random Byzantine faults, which are instantaneous. Each peer suffers a Byzantine
fault independently with a probability less than 0.5. The S-Chord [12] can resist more collusions.
However, this capability does not exist in other dynamic P2P systems.
To secure P2P routing, we characterize Byzantine faults and propose a no