ON/OFF Model: A New Tool to Understand BGP Update Burst

Xiaoliang Zhao, Daniel Massey
University of Southern California
Information Sciences Institute
Email: {xzhao, masseyd}@isi.edu
Mohit Lad, Lixia Zhang
Computer Science Department
U. of California, Los Angeles
Email: {mohit, lixia}@cs.ucla.edu
Abstract
BGP, the inter-domain routing protocol, can exhibit
complex behaviors under various conditions. Although
BGP log data have been made available in recent
years, the sheer size of the log data makes it
difficult to interpret BGP behavior using only the
raw BGP update messages, and understanding the global
routing dynamics in today's Internet remains a great
challenge.
In this paper we focus on the analysis of BGP update
bursts, a commonly observed event that occurs at
varying frequency. We define a BGP update burst as an
occurrence of a large number of BGP updates that are
separated by very short time intervals. To investigate
the causes of such bursts we developed an ON/OFF model
which can be used to classify BGP bursts into two
classes: stable routing changes and transient route
flapping. A stable routing change means an existing
route is replaced by a new route that lasts for a long
time period, while transient route flapping means a
series of routing updates occurs for the prefix over a
short time period but at the end of the burst the route
is the same as the original route. By applying our
ON/OFF model to BGP routing updates over the last two
years, we found that the ON/OFF model is an effective
way to identify stable routing changes, such as those
caused by physical failures in the network, and that
about half of the update bursts are caused by transient
route flapping. Further investigation reveals the
specific causes for a number of the transient flappings.
Overall, the development of the ON/OFF model helps us
make a significant step towards a complete
understanding of the global routing dynamics.
I. Introduction
The Internet consists of a large number of Autonomous
Systems (ASes) that exchange routing information with
each other to learn the best paths to destinations.
Presently, BGP (Border Gateway Protocol) is the de
facto inter-AS routing protocol and is designed to adapt
to link failures, AS topology changes and routing policy
changes. BGP is a path vector routing protocol: each
BGP router advertises to its neighbors (peers) the
entire AS path to each destination. To exchange
routing information, BGP peers establish peering
sessions. Whenever a new BGP session is set up between
two peers, the complete routing tables are exchanged
between them. After this initial exchange, routers only
send update messages for routes that change or new
routes that are added. Information exchanged by BGP
is used for global routing. Therefore, faults or attacks
in the BGP infrastructure can lead to problems such as
denial of service and misdirected traffic.
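
The exchange described above (a full table at session setup, then only changes) can be illustrated with a minimal sketch. The `PeerRib` class and its method names are hypothetical and exist only for this illustration; real BGP updates carry many more attributes than an AS path, and the prefixes and AS numbers below are documentation-reserved values, not data from this study.

```python
from typing import Dict, Tuple

Prefix = str
ASPath = Tuple[int, ...]


class PeerRib:
    """Routes learned from a single peer, keyed by prefix (illustrative only)."""

    def __init__(self) -> None:
        self.routes: Dict[Prefix, ASPath] = {}

    def announce(self, prefix: Prefix, as_path: ASPath) -> None:
        # An announcement adds a new route or replaces the existing one.
        self.routes[prefix] = as_path

    def withdraw(self, prefix: Prefix) -> None:
        # A withdrawal removes the route if one is present.
        self.routes.pop(prefix, None)


rib = PeerRib()

# Initial exchange after session setup: the peer sends its complete table.
full_table = {
    "192.0.2.0/24": (64500, 64501),
    "198.51.100.0/24": (64500, 64502),
}
for prefix, path in full_table.items():
    rib.announce(prefix, path)

# Afterwards only changes are sent: one route changes path, one is withdrawn.
rib.announce("192.0.2.0/24", (64500, 64503, 64501))
rib.withdraw("198.51.100.0/24")
```

After these two incremental updates the table holds a single route, for 192.0.2.0/24, with the new three-AS path.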
Ideally, there would be a solid understanding of BGP's
behavior as a protocol, its response to faults, and
its vulnerabilities to attacks. In practice, however,
the BGP infrastructure constitutes a large-scale system
and can exhibit complex behaviors under various
conditions.
BGP log data have become available in recent years,
provided by Oregon Route-Views [1] and RIPE [2]. Each
service operates one or more monitoring points,
which are BGP routers that peer with routers within
ISPs. A monitoring point archives its BGP routing
table snapshots and the BGP updates received from its
peers. These update messages, which signal either a
route change or a route attribute change, are caused by
events such as a physical link failure, the emergence
of a better route, or simply a policy change. Due to
the large-scale deployment of BGP and to routing
policies, the events themselves are hidden from
observers at the monitoring points. Instead, what we
see at these monitoring points is the result of the
events. For instance, a physical link failure is an
event that would cause the routers at the ends of the
link to send update messages to their neighboring
routers. Depending on how many of these routers use
this link, the updates are propagated further. At a
remote monitoring point, all we see is update messages,
without any indication of what kind of event caused
them. This problem, together with the sheer size of the
log data, makes it difficult to interpret BGP behavior
using only the raw BGP update messages. Therefore,
understanding BGP's dynamic behavior continues to be
an open challenge.
In this paper, we propose a model that is a
significant step toward a complete understanding of the
global routing dynamics. This paper is an attempt to
demystify the events behind the updates observed
at monitoring points and to gain some high-level
insight into what these updates can tell us about the
type of changes in BGP routes. In particular, we study
BGP update message bursts. A BGP burst is a series of
updates triggered by routing changes. We show that with
our model we can gain considerable insight into the
events causing these bursts. We classify BGP bursts
into two classes: transient routing changes and
non-transient routing changes. A transient routing
change is one in which a route, after a series of
routing updates, is eventually restored to the original
route, while a non-transient change is one in which a
route is replaced by another route for a significantly
long time. Transient changes, if better understood,
could benefit operational practice, for example by
optimizing some BGP parameters to better handle such
changes.
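
The two definitions above (a burst as closely spaced updates, and a transient change as a burst that ends on the original route) can be sketched as follows. This is a simplified illustration and not the ON/OFF model presented in Section III: the names `Update`, `split_into_bursts`, and `classify`, and the `max_gap` threshold, are assumptions introduced here. The sketch also does not check that a burst contains a large number of updates, or that a replacement route persists for a long time.

```python
from typing import List, NamedTuple, Optional, Tuple

# None stands for "no route" (the prefix was withdrawn).
ASPath = Optional[Tuple[int, ...]]


class Update(NamedTuple):
    time: float      # seconds, on any consistent clock
    as_path: ASPath  # announced path, or None for a withdrawal


def split_into_bursts(updates: List[Update], max_gap: float) -> List[List[Update]]:
    """Group time-ordered updates for one prefix.

    An update that arrives more than max_gap seconds after the
    previous one starts a new group.
    """
    bursts: List[List[Update]] = []
    for update in updates:
        if bursts and update.time - bursts[-1][-1].time <= max_gap:
            bursts[-1].append(update)
        else:
            bursts.append([update])
    return bursts


def classify(route_before: ASPath, burst: List[Update]) -> str:
    """Compare the route held before the burst with the route left after it."""
    route_after = burst[-1].as_path
    return "transient" if route_after == route_before else "non-transient"


# Hypothetical update stream for one prefix whose original route is (64500, 64501).
original = (64500, 64501)
stream = [
    Update(100.0, None),                   # route withdrawn
    Update(110.0, (64500, 64502, 64501)),  # alternate path announced
    Update(125.0, (64500, 64501)),         # original path restored
    Update(5000.0, (64500, 64503)),        # later, a different path
    Update(5010.0, (64500, 64504)),        # and another one
]
bursts = split_into_bursts(stream, max_gap=60.0)
labels = [classify(original, burst) for burst in bursts]
```

With these inputs the stream splits into two groups. The first ends on the original route and is labeled transient; the second ends on a different route and is labeled non-transient.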
By applying our ON/OFF model to BGP routing
updates over the last two years, we found that the
ON/OFF model is an effective way to identify stable
routing changes, such as those caused by physical
failures in the network, and that about half of the
update bursts are caused by transient route flapping.
Further investigation reveals the specific causes for
a number of the transient flappings. Overall, the
development of the ON/OFF model helps us make a
significant step towards a complete understanding of
the global routing dynamics.
The paper is organized as follows. Section II describes
the methodology used for data processing.
Section III presents the ON/OFF model. Section IV
shows that, with an ON timer of five minutes, 50% of
all BGP bursts are transient changes, and presents
statistics on the duration distribution of BGP bursts.
Section V studies several cases of BGP bursts and finds
that some of them are caused by worm activity or
faults, which suggests revisiting the protocol design
to respond better to such changes.
II. Data Source
We analyzed BGP routing updates collected by RIPE
NCC [2] during several months in 2001 and 2002. RIPE
NCC has eight data monitoring points (rrc00 - rrc07).
We selected the rrc00 monitoring point and gathered
data from the BGP routers listed in Table I. Some
of these routers are located in global ISPs and others
are located in regional ISPs. Geographically, the
routers are located in different countries including
the United States, Japan and three European countries.
We chose the rrc00 monitoring point because it
receives full routing tables from ISPs. If an ISP
provides only partial routing tables and then withdraws
its route to a prefix, the withdrawal is ambiguous: it
may indicate that the ISP has lost its route to this
prefix, or that the ISP has simply changed routes and
the new route does not match the partial export policy.
TABLE I
rrc00's peering ASes that we examined

Location      ASes that rrc00's peers belong to
US            AS7018 (AT&T), AS2914 (Verio)
Netherlands   AS3333 (RIPE NCC), AS1103 (SURFnet), AS3257 (Tiscali Global)
Switzerland   AS513 (CERN), AS9177 (Nextra)
Britain       AS3549 (Global Crossing)
Japan         AS4777 (NSPIXP2)

It should also be noted that BGP updates are sent to
the monitoring point via multi-hop BGP connections.
In the operational Internet, nearly all ISP peerings are
through BGP routers sharing a common physical link,
where BGP updates are sent via a TCP connection over
a single link/hop. However, the BGP monitoring point
rrc00 peers with ISP routers via TCP connections
that cross multiple router hops and links. When the
multi-hop session fails, the monitoring point reports
a session state change. Note that if a peering session
is reset, all routes are implicitly withdrawn and, when
the new peering session is started again,