Tuning the WCET of Embedded Applications

Archive. Yahoo! is not affiliated with the authors of this page or responsible for its content.
Tuning the WCET of Embedded Applications Tuning the WCET of Embedded Applications
Wankang Zhao
1
, Prasad Kulkarni
1
, David Whalley
1
,
Christopher Healy
2
, Frank Mueller
3
, Gang-Ryung Uh
4
1
Computer Science Dept., Florida State University, Tallahassee, FL 32306-4530; e-mail: whalley@cs.fsu.edu
2
Computer Science Dept., Furman University, Greenville, SC 29613; e-mail: chris.healy@fur man.edu
3
Computer Science Dept., North Carolina State University, Raleigh, NC 27695; e-mail: mueller@cs.ncsu.edu
4
Computer Science Dept., Boise State University, Boise, ID 83725; e-mail: uh@cs.boisestate.edu
Abstract
It is advantageous to not only calculate the WCET of an
application, but to also perform transformations to reduce
the WCET since an application with a lower WCET will
be less likely to violate its timing constraints. In this
paper we describe an environment consisting of an inter-
active compilation system and a timing analyzer, where a
user can interactively tune the WCET of an application.
After each optimization phase is applied, the timing ana-
lyzer is automatically invoked to calculate the WCET of
the function being tuned. Thus, a user can easily gauge
the progress of reducing the WCET. In addition, the user
can apply a genetic algorithm to search for an effective
optimization sequence that best reduces the WCET. Using
the genetic algorithm, we show that the WCET for a num-
ber of applications can be reduced by 7% on average as
compared to the default batch optimization sequence.
1. Introduction
Generating acceptable code for applications on embed-
ded systems is challenging. Unlike most general-purpose
applications, embedded applications often have to meet
various stringent constraints, such as time, space, and
power. Constraints on time are commonly formulated as
worst-case (WC) constraints. If these constraints are not
met, even only occasionally in a hard real-time system,
then the system may not be considered functional. The
worst-case execution time (WCET) must be calculated to
determine if a timing constraint will be met.
Unfortunately, many embedded system developers
empirically estimate the WCET by testing the application
and measuring the execution time. Testing alone is unsafe
since the WC input data is often difcult to derive. This
approach can result in an unsafe application since WC
timing constraints may not be met when an application is
deployed. More knowledgeable developers will test, mea-
sure, and make conservative assumptions in case the tim-
ing measurements do not truly reect the WCET, which
can result in loose estimates and higher overall costs.
Thus, accurate WCET predictions are required to produce
safe and cost effective embedded systems.
Accurate
WCET predictions can only be obtained by a tool that stat-
ically analyzes an application to calculate the WCET.
Such a tool is called a timing analyzer, and the process of
performing this calculation is called timing analysis.
WCET constraints can impact power consumption as
well. In order to conserve power, one can determine the
WC number of cycles required for a task and lower the
clock rate to still meet the timing constraint with less
slack. In contrast, conservative assumptions concerning
WCET may result in a processor being deployed that has a
higher clock rate and consumes more power.
Automatically generating acceptable code for embed-
ded microprocessors with a compiler is often much more
difcult than generating code for general-purpose proces-
sors. Besides sometimes having to meet a variety of con-
icting constraints, embedded microprocessors are typi-
cally much less regular and have many specialized archi-
tectural features. Because of the typical large volumes
produced for a product involving an embedded computer
system, many embedded systems applications are still
being developed in assembly language by hand in order to
meet the imposed constraints and to deal with the dif-
culty of exploiting the features of the machine. In fact,
two of the authors of this paper have recently spent time in
industry and have personally witnessed the development
and maintenance of assembly code applications. How-
ev er, dev eloping an application in assembly has many dis-
advantages that include higher development and mainte-
nance costs and less portable code.
It would be desirable to develop embedded system
applications in a high level language and still be able to
tune the WCET of an application. We hav e provided this
capability by integrating a WCET timing analyzer with an
interactive compilation system called VISTA (Vpo Inter-
active System for Tuning Applications) [1, 2]. One fea-
ture of VISTA is that it can automatically obtain
-1- performance feedback information, which can be used by
both the application developer and the compiler to make
phase ordering decisions. This information can include a
variety of measures, such as execution time or code size.
In this paper we describe how we modied VISTA so it
can use WCET as one of its performance criteria.
The remainder of the paper is structured as follows.
First, we review related work on improving, displaying,
and estimating WCET. Second, we give a brief overview
of the timing analyzer that we used in this work. Third,
we summarize the StarCore SC100 and how we retargeted
our compiler and timing analyzer for this processor.
Fourth, we describe VISTA and how we integrated the
timing analyzer with this framework. Fifth, we show the
benets that were achieved by performing searches using a
genetic algorithm to improve the WCET. Finally, we dis-
cuss future plans for developing compiler optimizations to
improve WCET and give the conclusions of the paper.
2. Related Work
There have been a variety of different techniques used
for timing analysis of optimized code over the years [3, 4,
5, 6, 7]. However, we are unaware of any timing analyzer
whose predictions are used by a compiler to select which
optimizations should be applied.
While there has been much work on developing com-
piler optimizations to reduce execution time and, to a
lesser extent, compiler optimizations to reduce space and
power consumption, there has been very little work where
compiler optimizations have been developed to reduce
WC performance. Marlowe and Masticola outlined how a
variety of standard compiler optimizations could poten-
tially affect timing constraints of critical portions in a task.
However, no implementation was described [8]. Hong and
Gerber developed a programming language with timing
constructs and used a trace scheduling approach to
improve code in what would be deemed a critical section
of the program. However, no empirical results were given
since the implementation did not interface with a timing
analyzer to serve as a guide for the optimizations or to
evaluate the impact on reducing WCET [9]. Both of these
papers outlined strategies that attempt to move code out-
side of critical portions within an application that have
been designated by a user to contain timing constraints. In
contrast, most real-time systems use the WCET of entire
tasks to determine if a schedule can be met. Lee et. al.
used WCET information to choose how to generate code
on a dual instruction set processor for the ARM and the
Thumb [10]. ARM code is generated for a selected subset
of basic blocks that can impact the WCET. Thumb code is
generated for the remaining blocks to minimize code size.
In contrast, we are using WCET information to select
compiler optimizations, as opposed to which instruction
set to select for code generation.
A user interface was developed at Florida State Univer-
sity that allows users to select portions of source code and
obtain timing predictions. Unlike VISTA, this interface
did not allow the user to affect the generated code or pro-
vide feedback during the compilation process [11, 12, 13].
Genetic algorithms have long been used to search for
solutions in a space that is too large to exhaustively evalu-
ate. Genetic algorithms have been used to search for
effective optimization sequences to improve speed, space,
or a combination of both [14, 2]. Genetic algorithms have
also been used in the context of timing analysis for empiri-
cally estimating the WCET, where mutations on the input
resulted in different execution times (objective function)
[15, 16, 17]. Our approach, in contrast, relies on a genetic
algorithm to identify optimization phase sequences that
result in reduced WCET, which is an orthogonal problem.
3. The Timing Analyzer
In this section we briey describe the timing analyzer
that we have previously developed and that served as the
starting point for the timing analyzer in this study. Figure
1 depicts the organization of the framework that was used
by the authors in the past to make WCET predictions. The
VPO (Very Portable Optimizer) compiler [18] was modi-
ed to produce the control ow and constraint information
as a side effect of the compilation of a source le. A static
cache simulator uses the control-ow information to give a
caching categorization for each instruction and data mem-