Pulsar: A resource-control architecture for time-critical service ...

applications
M. Astley
S. Bhola
M. J. Ward
K. Shagin
H. Paz
G. Gershinsky
The complexity of real-time systems is growing extremely rapidly, as they move from
isolated devices to multilevel networked systems. Traditional methodologies for
developing and managing these systems are not scaling to meet the requirements of a
new generation of distributed applications. While developers of complex real-time
applications are looking to service-oriented architecture to address their needs for
ease of development and flexibility of integration, current software infrastructures for
service-oriented applications do not address the issue of predictable latency for the
applications they host. In this paper, we present Pulsar, a resource-control architecture
for managing the end-to-end latency of a set of distributed, time-critical applications.
The primary entity of Pulsar is called a controller, which regulates an aspect of resource
allocation or scheduling policy. Controllers utilize policy configurations, which may
include latency targets to be achieved or resource allocations to be honored, and
interact with resource allocators and schedulers (e.g., thread schedulers, memory
allocators, or bandwidth reservation mechanisms) to effect local policy. Controllers
also provide feedback on how well they are executing a policy. Pulsar includes an
application model which captures resource-sensitive behavior and requirements and
is independent of high-level programming models and application programming
interfaces.
INTRODUCTION
As real-time systems become increasingly complex
and accommodate a new generation of distributed
applications, traditional methodologies for develop-
ing and managing these systems are not scaling to
meet the requirements of these applications. While
service-oriented architecture (SOA) offers an ap-
proach to addressing the needs of developers of
complex real-time applications with respect to ease
of development and flexibility of integration, the
issue of predictable latency for these SOA applica-
tions is inadequately addressed by current software
infrastructures.

Copyright 2008 by International Business Machines Corporation. Copying in
printed form for private use is permitted without payment of royalty provided
that (1) each reproduction is done without alteration and (2) the Journal
reference and IBM copyright notice are included on the first page. The title
and abstract, but no other portions, of this paper may be copied or distributed
royalty free without further permission by computer-based and other
information-service systems. Permission to republish any other portion of the
paper must be obtained from the Editor. 0018-8670/08/$5.00 © 2008 IBM
IBM SYSTEMS JOURNAL, VOL 47, NO 2, 2008
ASTLEY ET AL.
Real-time applications which are deployed over
such infrastructures have several key features:
• Distributed resources. Components of an application
  may be distributed, and may compete with
  other distributed applications for resources such
  as CPU, networking bandwidth, and memory;
• Mixed requirements. The set of applications may
  include hard real-time (i.e., those in which
  deadlines must be met and all events must be
  handled) and soft real-time applications, as well as
  non-real-time workloads;
• Dynamic loads. Workloads may be event-driven
  and can vary significantly over time; and
• Dynamic resources. The set of available resources
  (both on the server and the network) may change
  over time due to failures or other reasons.
For example, a trade execution application for a
stock exchange may have components which are
distributed (for scalability and fault-tolerance) over
a cluster of servers. The application is responsible
for providing timely data from the market, as well as
accepting and executing orders to buy or sell
securities. The load experienced by the application
is heavily influenced by trade volume, which varies
both predictably (e.g., due to the trade backlog at
market open) and unpredictably (e.g., due to
breaking news). In addition, the application may
service a variety of clients with different require-
ments and importance. Large investment banks, for
example, are willing to pay a high premium to
ensure a bounded latency on trade execution.
Individual investors, on the other hand, may settle
for a low-cost, best-effort service.
Satisfying the timeliness requirements for such an
application is a challenging problem. For example,
the dynamic load and availability of the system
prohibit traditional techniques such as static allo-
cation and scheduling. Instead, the system must be
able to shift resource allocation spontaneously as
needed. Likewise, the system must optimize re-
source allocation according to differing service level
and latency requirements. For example, clients may
express their requirements as a service level
agreement (SLA), which quantifies the importance
and flexibility of meeting requirements at different
service levels. SLAs can be used to allocate
resources in a manner which optimizes overall
benefit to clients. Finally, scalable and fault-tolerant
control mechanisms are needed to minimize over-
head and provide resiliency in the face of server and
network failures.
In this paper, we present Pulsar, a resource-control
architecture for managing the end-to-end latency of
a set of distributed, time-critical applications. Pulsar
applications are described using a programming
model that captures resource-sensitive behavior and
requirements and is independent of high-level
programming models and application programming
interfaces (APIs). In particular, timeliness requirements
are modeled as utility functions which map
end-to-end latency to utility. Depending
on the shape of the utility function, the system is
able to make trade-offs between total utility and
available resources. These trade-offs are reflected in
resource-control policies which are distributed
among the nodes of the system.
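To illustrate how the shape of a utility function can drive such trade-offs, the following Python sketch defines two possible shapes and compares the total utility of two candidate allocations. It is only a sketch under stated assumptions: the function names (`step_utility`, `linear_decay_utility`, `total_utility`), the shapes themselves, and all numeric values are hypothetical and are not taken from Pulsar.

```python
from typing import Callable, Sequence

# A utility function maps an end-to-end latency (in milliseconds) to a utility value.
UtilityFn = Callable[[float], float]


def step_utility(deadline_ms: float, value: float) -> UtilityFn:
    """Hypothetical hard-deadline shape: full value up to the deadline, zero after."""
    def utility(latency_ms: float) -> float:
        return value if latency_ms <= deadline_ms else 0.0
    return utility


def linear_decay_utility(target_ms: float, cutoff_ms: float, value: float) -> UtilityFn:
    """Hypothetical soft shape: full value up to target_ms, then a linear
    decay that reaches zero at cutoff_ms."""
    if cutoff_ms <= target_ms:
        raise ValueError("cutoff_ms must be greater than target_ms")

    def utility(latency_ms: float) -> float:
        if latency_ms <= target_ms:
            return value
        if latency_ms >= cutoff_ms:
            return 0.0
        return value * (cutoff_ms - latency_ms) / (cutoff_ms - target_ms)
    return utility


def total_utility(utilities: Sequence[UtilityFn], latencies_ms: Sequence[float]) -> float:
    """Sum the utility each application obtains at its observed latency."""
    return sum(u(latency) for u, latency in zip(utilities, latencies_ms))


# Illustrative clients only: a premium client with a strict bound and a
# best-effort client that tolerates slower service.
premium = step_utility(deadline_ms=10.0, value=100.0)
best_effort = linear_decay_utility(target_ms=50.0, cutoff_ms=250.0, value=10.0)

# Two candidate allocations, described by the latencies they would produce
# for (premium, best_effort).
favor_premium = total_utility([premium, best_effort], [8.0, 150.0])
favor_best_effort = total_utility([premium, best_effort], [12.0, 50.0])
```

With these assumed numbers, the allocation that keeps the premium client within its bound yields the higher total utility, even though the best-effort client is served more slowly.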
Resource-control policies are constructed and en-
forced using controllers, which are arranged in a
hierarchy in order to regulate various aspects of
resource allocation or scheduling policy. Top-level
controllers establish overall resource-control poli-
cies based on utility trade-offs. These policies are
passed to child controllers and may include
latency targets which must be achieved or resource
allocations which must be honored. Intermediate
controllers (e.g., one per server) translate policies
received from parent controllers into policies dele-
gated to child controllers. At the lowest level,
controllers interact directly with resource allocators
and schedulers, e.g., thread schedulers, memory
allocators, or bandwidth reservation mechanisms,
in order to effect local policies.
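The delegation pattern described above can be sketched as a small tree of controllers. This is a minimal illustration, not Pulsar's implementation: the `Policy` and `Controller` classes, the even split of a latency budget among children, and the controller names are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Policy:
    """Hypothetical policy record: a latency target in milliseconds."""
    latency_target_ms: float


@dataclass
class Controller:
    """A node in the controller hierarchy."""
    name: str
    children: List["Controller"] = field(default_factory=list)
    policy: Optional[Policy] = None

    def translate(self, policy: Policy) -> List[Policy]:
        """Turn the parent's policy into one policy per child.

        Assumption for this sketch: the children act as sequential stages,
        so the latency budget is split evenly among them.
        """
        share = policy.latency_target_ms / len(self.children)
        return [Policy(share) for _ in self.children]

    def enforce(self, policy: Policy) -> None:
        """Leaf hook: a real controller would configure a thread scheduler,
        memory allocator, or bandwidth reservation here."""
        print(f"{self.name}: enforcing {policy.latency_target_ms:.1f} ms locally")

    def apply(self, policy: Policy) -> None:
        """Record the policy, then delegate to children or enforce locally."""
        self.policy = policy
        if not self.children:
            self.enforce(policy)
            return
        for child, child_policy in zip(self.children, self.translate(policy)):
            child.apply(child_policy)


# Illustrative hierarchy: one top-level controller, one controller per
# server, and leaf controllers next to the local resource mechanisms.
leaf_a = Controller("thread-scheduler@server1")
leaf_b = Controller("bandwidth@server1")
leaf_c = Controller("thread-scheduler@server2")
server1 = Controller("server1", [leaf_a, leaf_b])
server2 = Controller("server2", [leaf_c])
top = Controller("top", [server1, server2])

top.apply(Policy(latency_target_ms=40.0))
```

A real translation step would depend on the application structure and on feedback from the children; the even split is used here only to keep the example short.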
Specifically, we present: (1) a distributed,
hierarchical control mechanism for distributed real-
time applications; (2) an abstract model of resources
which allows high-level control of these applications
without exposing explicit resource-control mecha-
nisms (e.g., resource pooling and sharing); (3) a
framework for utility-based optimization which
maximizes total system utility based on a novel
intermediate model of application requirements; and
(4) a method for online feedback and error
correction to improve system performance.
The paper is structured as follows. We present a
programming model for real-time applications in the
next section. This is followed by a description of an
architecture which supports this model. We demonstrate
our approach by way of a distributed
power-line fault detection application which we
describe in the section "Scenario: Power-line fault
detection." In the section "Evaluation," we evaluate
the performance of our architecture in this scenario.
We review related work in the subsequent section,
followed by our conclusion.
PROGRAMMING MODEL
Whereas real-time applications typically target a
specific high-level programming model, such as the
Real-time CORBA** specification [1] or the Real-time
Specification for Java** [2], our techniques are focused
on low-level resource management mechanisms
which are often independent of high-level
APIs. As a result, in this paper we describe
applications in terms of a low-level model which
captures resource-sensitive behavior and requirements.
We view our model as a possible intermediate
target for high-level APIs and languages,
including those that facilitate building service-oriented
applications like Web Services Business
Process Execution Language (WS-BPEL) [3] and Service
Component Architecture (SCA) [4].
Applications with real-time deadlines are defined in
terms of jobs, job sets, and job flows. A job is a
discrete unit of computation that is localized on a
node in the system. Jobs may execute concurrently,
and all jobs are preemptible. Constraints among jobs
(e.g., mutual exclusion) may be specified in job sets
as described below. Jobs are organized into job sets
consisting of a job list (a list of jobs contained in the
set), a dependency graph (a directed acyclic graph
[DAG] with a unique root, one or more leaf nodes,
vertices which represent jobs, and directed edges
which represent dependencies between jobs), and
trigger events (a set of external events which cause
the job set to be executed).
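A job set as described above can be sketched as a small data structure that holds the job list, the dependency edges, and the trigger events, and that checks the two structural properties named in the text: a unique root and no cycles. The `JobSet` class, its method names, the edge direction (prerequisite first, dependent second), and the example job names are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class JobSet:
    jobs: List[str]                # job list
    edges: List[Tuple[str, str]]   # assumed direction: (prerequisite, dependent)
    trigger_events: Set[str]       # external events that cause execution

    def successors(self) -> Dict[str, List[str]]:
        """Map each job to the jobs that depend on it."""
        succ: Dict[str, List[str]] = {job: [] for job in self.jobs}
        for src, dst in self.edges:
            succ[src].append(dst)
        return succ

    def roots(self) -> List[str]:
        """Jobs with no incoming edge."""
        has_incoming = {dst for _, dst in self.edges}
        return [job for job in self.jobs if job not in has_incoming]

    def leaves(self) -> List[str]:
        """Jobs with no outgoing edge."""
        has_outgoing = {src for src, _ in self.edges}
        return [job for job in self.jobs if job not in has_outgoing]

    def execution_order(self) -> List[str]:
        """Topological order via Kahn's algorithm; raises ValueError on a cycle."""
        succ = self.successors()
        indegree = {job: 0 for job in self.jobs}
        for _, dst in self.edges:
            indegree[dst] += 1
        ready = [job for job in self.jobs if indegree[job] == 0]
        order: List[str] = []
        while ready:
            job = ready.pop(0)
            order.append(job)
            for nxt in succ[job]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        if len(order) != len(self.jobs):
            raise ValueError("dependency graph contains a cycle")
        return order

    def validate(self) -> None:
        """Check the unique-root and acyclicity requirements."""
        if len(self.roots()) != 1:
            raise ValueError("job set must have exactly one root")
        self.execution_order()


# Hypothetical job set triggered by an incoming order.
order_job_set = JobSet(
    jobs=["parse", "price", "risk", "publish"],
    edges=[("parse", "price"), ("parse", "risk"),
           ("price", "publish"), ("risk", "publish")],
    trigger_events={"order_received"},
)
order_job_set.validate()
schedule = order_job_set.execution_order()
```

The topological order is one valid sequential schedule; because jobs may execute concurrently, `price` and `risk` in this example could also run in parallel once `parse` completes.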
The unique root of a job set is called the start job and
the leaf nodes are called end jobs. An edge in a job
set dependency graph is either a local edge or a
remote edge, according to