SyCHOSys: Compiled Energy-Performance Cycle Simulation

Appears in Workshop on Complexity-Effective Design, 27th ISCA, Vancouver, Canada, June 2000
Ronny Krashinsky, Seongmoo Heo, Michael Zhang, and Krste Asanović
MIT Laboratory for Computer Science, Cambridge, MA 02139
{ronny|heomoo|rzhang|krste}@lcs.mit.edu
Abstract
SyCHOSys (Synchronous Circuit Hardware Orchestra-
tion System) generates high-speed energy-performance cy-
cle simulators by compiling a processor description into
efficient C++ code. This framework can custom compile
a cycle simulator with arbitrary mixed levels of simulation
detail ranging from gate-level to purely behavioral mod-
els. In addition, SyCHOSys can compile detailed energy
statistics gathering code into the simulator and generate
a custom analysis tool to combine the resulting statistics
with capacitance values extracted from circuit layout infor-
mation to give energy dissipation. To increase simulation
speed, we group circuit nodes having the same switching
activity and only count transitions once per group. We have
also developed energy estimation techniques that exploit
the properties of well-designed low-power microproces-
sors to improve the accuracy of simple transition-sensitive
energy models. We evaluate SyCHOSys using a custom
datapath circuit, and show close agreement ( 7% error)
with SPICE energy numbers, while simulating over 7 or-
ders of magnitude faster than SPICE and 5 orders of mag-
nitude faster than PowerMill. We also describe a structural
energy-performance simulation of a pipelined MIPS pro-
cessor built with SyCHOSys that can track all internal sig-
nal node transitions at 16 kHz.
1 Introduction
Energy dissipation is emerging as a key constraint for
both high-performance and embedded microprocessor de-
signs, requiring architects to consider energy in addition to
performance when evaluating design decisions. Unfortu-
nately, estimating energy dissipation for a candidate design
is considerably more difficult than estimating performance.
Circuit simulators such as SPICE [9] or PowerMill [7]
provide accurate energy numbers but run much too slowly
to evaluate the effect of architectural modifications on large
benchmark programs. A number of techniques have been
proposed to estimate energy dissipation at higher levels of
abstraction. One class of methods makes use of statistical
measures of circuit complexity and/or expected activity
to estimate energy dissipation [6, 10]. Although these
methods quickly provide estimates, they can give large errors
for test inputs that don't match the modeled statistics,
and cannot give cycle-by-cycle breakdowns of where en-
ergy was dissipated. For architectural studies, transition-
sensitive methods are more useful. These methods mea-
sure the actual signal transitions caused by an input work-
load and use them to animate energy models [8]. This tech-
nique has the advantage of providing detailed energy infor-
mation on a cycle by cycle basis, but has the disadvantage
of requiring dynamic simulation of whole program execu-
tion. One approach for obtaining the required fast proces-
sor simulator is to hand craft a C or C++ RTL model for a
particular processor configuration, such as in the SimplePower
system [15], but writing and modifying such models
is time-consuming and error-prone.
To support our research into new energy-efficient architectures,
we are developing a fast but flexible energy-performance
simulation framework named SyCHOSys
(Synchronous Circuit Hardware Orchestration System).
SyCHOSys is fast because it translates a structural ma-
chine description and related statistics gathering code into
inlined C++ code which is then compiled with a native
C++ compiler. The resulting cycle simulator is compara-
ble in performance to hand-crafted simulators and an or-
der of magnitude faster than commercial compiled Verilog
simulators. SyCHOSys is flexible because it allows arbitrary
C++ code to be included in the simulator. In addition,
rather than generate a closed stand-alone simulator,
SyCHOSys produces a C++ object that can itself be
linked with other C++ code. SyCHOSys supports cycle
simulation at all levels of detail from purely behavioral to
gate level, and allows arbitrary forms of statistics gather-
ing code to be included. SyCHOSys saves effort com-
pared with a hand-crafted simulator because it automati-
cally schedules code block execution to satisfy all inter-
module data dependencies. In addition, it can automati-
cally add code to monitor inter-module activity for energy
transition counting. The structural input description al-
lows SyCHOSys to group nodes that have the same switch-
ing behavior to reduce the run-time overhead of transition counting.
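The node-grouping idea can be illustrated with a minimal sketch. It assumes a group is a set of circuit nodes that always toggle together on a 32-bit bus, so a single counter serves the whole group. The names NodeGroup, Observe, and SwitchingEnergy_fJ are hypothetical, and the energy formula is the textbook (1/2)CV^2 per transition rather than the calibrated models developed in this paper.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical sketch: one transition counter shared by a group of nodes
// that always switch together (not the actual SyCHOSys data structure).
struct NodeGroup {
    double capacitance_fF;          // assumed: summed per-bit capacitance of the group
    std::uint32_t last = 0;         // bus value seen on the previous cycle
    std::uint64_t transitions = 0;  // total bit toggles observed so far

    explicit NodeGroup(double c) : capacitance_fF(c) {}

    // Record this cycle's value; each toggled bit is counted once for the group.
    void Observe(std::uint32_t value) {
        transitions += std::bitset<32>(value ^ last).count();
        last = value;
    }
};

// Textbook switching energy, (1/2) * C * Vdd^2 per transition, in femtojoules.
// This is an assumption for illustration, not the paper's calibrated model.
double SwitchingEnergy_fJ(const NodeGroup& g, double vdd) {
    assert(vdd >= 0.0);
    return 0.5 * g.capacitance_fF * vdd * vdd *
           static_cast<double>(g.transitions);
}
```

Counting once per group means the per-cycle cost is one XOR and one population count no matter how many physical nodes the group contains; the capacitance is applied only afterwards, when the counts are combined into an energy figure.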
A further contribution of this paper is the development
of accurate energy models driven by the limited infor-
mation available from cycle-accurate transition counting.
As described below, we exploit the properties of well-designed
low-power microprocessors to calibrate our models
to give energy numbers within PowerMill's error ( 7%
from SPICE) while allowing simulation over 5 orders of
magnitude faster.
The remainder of this paper is structured as follows.
Section 2 describes the structure of the SyCHOSys cycle
simulation system using a simple circuit example. Sec-
tion 3 describes the fast energy modeling techniques we are
developing for use with cycle simulators. In particular, we
focus on fast accurate techniques for estimating datapath
energy. Section 4 describes how energy statistics gather-
ing is added into the compiled cycle simulator. Section 5
evaluates the speed and accuracy of our datapath modeling
technique for the GCD circuit. Section 6 discusses the pro-
cessor models we are developing, and Section 7 describes
our plans for future work. Finally, Section 8 compares
SyCHOSys with other related work, and Section 9 sum-
marizes the paper.
2 SyCHOSys Overview
SyCHOSys generates cycle simulators from flattened
structural netlists, as shown in Figure 1. We use our
SyCHONet language to describe the structural netlists, and
C++ as the behavioral modeling language for netlist leaf
cells. SyCHOSched takes a structural netlist as input,
and statically schedules evaluation of the behavioral blocks
specified in the netlist. It outputs C++ code containing calls
to the blocks' behavioral methods. The resulting code can
then be compiled and linked together with the blocks'
definitions and with an external C++ environment that drives
the simulation by calling the statically scheduled evaluation
methods.
Figure 1: SyCHOSys framework. (Diagram labels: SyCHONet, SyCHOSched, scheduled C++ code, SyCHOTick C++ code, SyCHOLib C++ library, gcc, SyCHOTick simulator.)
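Static scheduling can be illustrated with a generic sketch. The text does not spell out the algorithm SyCHOSched uses, so the code below assumes a plain depth-first topological sort in which clocked blocks break dependency cycles because their outputs are already known at the start of a cycle. Block, Visit, and ScheduleCombinational are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical netlist entry: a block and the names of the blocks driving it.
struct Block {
    std::string name;
    bool clocked;                     // true for state elements such as flip-flops
    std::vector<std::string> inputs;
};

namespace {
enum class Mark { kUnvisited, kInProgress, kDone };

// Depth-first visit that appends a block only after all of its combinational drivers.
void Visit(const Block& b,
           const std::map<std::string, const Block*>& by_name,
           std::map<std::string, Mark>& mark,
           std::vector<std::string>& order) {
    if (b.clocked) return;  // assumed: state outputs are valid at the start of the cycle
    Mark& m = mark[b.name];
    if (m == Mark::kDone) return;
    if (m == Mark::kInProgress)
        throw std::runtime_error("combinational loop at " + b.name);
    m = Mark::kInProgress;
    for (const std::string& in : b.inputs) {
        auto it = by_name.find(in);
        if (it == by_name.end())
            throw std::runtime_error("unknown input " + in);
        assert(it->second != nullptr);
        Visit(*it->second, by_name, mark, order);
    }
    m = Mark::kDone;
    order.push_back(b.name);
}
}  // namespace

// Returns the combinational blocks in an order that satisfies all data dependencies.
std::vector<std::string> ScheduleCombinational(const std::vector<Block>& netlist) {
    std::map<std::string, const Block*> by_name;
    for (const Block& b : netlist) by_name[b.name] = &b;
    std::map<std::string, Mark> mark;
    std::vector<std::string> order;
    for (const Block& b : netlist) Visit(b, by_name, mark, order);
    return order;
}
```

Because the order is fixed before the simulator runs, a generator in this style could emit the evaluation calls as straight-line code, with no event queue or run-time dependency check.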
Figure 2: GCD circuit. (Diagram blocks: Control, X, Y, Zero?, Sub.) Note that the registers receive enable signals from the Control, and that Zero and Sub are dynamic logic.
GCD(x, y) {
  if (x < y)       return GCD(y, x);
  else if (y != 0) return GCD(x-y, y);
  else             return x;
}
Figure 3: Euclid's greatest common divisor algorithm.
To help explain the operation of SyCHOSys, we show a
small example synchronous circuit in Figure 2. This circuit
implements Euclid's greatest common divisor (GCD)
algorithm shown in Figure 3.
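To make the link between the recursive algorithm and the clocked circuit concrete, here is a hypothetical cycle-level model. It assumes each recursive call corresponds to one clock cycle: a swap enables both registers, while a subtraction enables only X. GcdCircuit, Tick, and RunGcd are illustrative names, and the control behavior is inferred from the algorithm rather than taken from the actual control block.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cycle-level model of the GCD datapath (not generated by SyCHOSys).
struct GcdCircuit {
    std::uint32_t x;
    std::uint32_t y;
    std::uint64_t cycles = 0;  // number of register-update cycles executed

    GcdCircuit(std::uint32_t x0, std::uint32_t y0) : x(x0), y(y0) {}

    // Advance one clock cycle; returns false once X holds the result.
    bool Tick() {
        const bool x_less_y = (x < y);  // condition derived from the subtractor
        const bool y_zero = (y == 0);   // condition derived from the zero detector
        if (x_less_y) {
            // Swap: both registers are enabled, X loads Y and Y loads X.
            const std::uint32_t old_x = x;
            x = y;
            y = old_x;
        } else if (!y_zero) {
            // Subtract: only X is enabled and loads X - Y.
            x = x - y;
        } else {
            return false;  // done: neither register is enabled
        }
        ++cycles;
        return true;
    }
};

// Runs the model to completion and returns the value left in X.
std::uint32_t RunGcd(std::uint32_t a, std::uint32_t b) {
    GcdCircuit c(a, b);
    while (c.Tick()) {
    }
    return c.x;
}
```

For example, RunGcd(12, 18) steps through (18, 12), (6, 12), (12, 6), (6, 6), (0, 6), and (6, 0) before stopping with 6 in X.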
2.1 SyCHONet
The SyCHONet representation of the circuit is shown in
Figure 4. The SyCHONet format consists of one line for
each component in the circuit. Each SyCHONet line specifies
the name of the component, the behavioral type of the
component, and an ordered list of the component's inputs.
Additionally, components such as flip-flops, latches, and
dynamic logic which have clock-dependent behavior are
tagged as such in the netlist. Untagged components are
assumed to be combinational logic blocks. The SyCHONet
format is designed to be machine-generated from a hierarchical
design description such as structural Verilog.
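The one-line-per-component format can be illustrated with a small parser sketch. The exact SyCHONet grammar is not given in the text, so the code assumes the line shape seen in Figure 4: a name, a braced field holding an optional clock tag followed by the behavioral type, and a parenthesized input list. NetlistEntry, Trim, and ParseEntry are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one netlist line.
struct NetlistEntry {
    std::string name;
    std::string clock_tag;  // empty for untagged (combinational) components
    std::string type;       // behavioral type, e.g. a template instantiation
    std::vector<std::string> inputs;
};

// Strips leading and trailing spaces and tabs.
std::string Trim(const std::string& s) {
    const std::string::size_type b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const std::string::size_type e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Parses a line of the assumed shape:  Name { [TAG] Type } (in1, in2, ...);
// Returns false if the line does not match that shape.
bool ParseEntry(const std::string& line, NetlistEntry& out) {
    out = NetlistEntry{};
    const std::string::size_type lb = line.find('{');
    const std::string::size_type rb = line.find('}', lb);
    const std::string::size_type lp = line.find('(', rb);
    const std::string::size_type rp = line.find(')', lp);
    if (lb == std::string::npos || rb == std::string::npos ||
        lp == std::string::npos || rp == std::string::npos) {
        return false;
    }
    out.name = Trim(line.substr(0, lb));
    if (out.name.empty()) return false;

    // The braced field holds either "Type" or "TAG Type".
    std::istringstream body(line.substr(lb + 1, rb - lb - 1));
    std::vector<std::string> tokens;
    std::string tok;
    while (body >> tok) tokens.push_back(tok);
    if (tokens.size() == 1) {
        out.type = tokens[0];
    } else if (tokens.size() == 2) {
        out.clock_tag = tokens[0];
        out.type = tokens[1];
    } else {
        return false;
    }

    // The parenthesized field is a comma-separated, ordered input list.
    std::istringstream args(line.substr(lp + 1, rp - lp - 1));
    while (std::getline(args, tok, ',')) {
        tok = Trim(tok);
        if (!tok.empty()) out.inputs.push_back(tok);
    }
    return true;
}
```

An empty clock_tag corresponds to the rule that untagged components are treated as combinational logic.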
2.2 SyCHOLib
SyCHOLib is a library of behavioral models. Each C++
behavioral component defines an Evaluate() method
that maps inputs to outputs. Figure 5 shows the Mux2
class. These methods can be parameterized using the C++
template mechanism, e.g., to accommodate variable
bit-widths. When parameterized components are included in a
SyCHONet, the template parameters are specified, as with
the Mux2 shown in Figure 4. Additionally, some components
define more than one evaluation method; for example,
dynamic logic components define a Precharge()
method in addition to the Evaluate() method.
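As a rough illustration of a component with two evaluation methods, the sketch below models a dynamic zero detector. ZeroDetect is a hypothetical class, not the library's own Zero component. The method signatures, the public output member, and the assumption that the output precharges high are all chosen for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical dynamic-logic component in the style described above.
template <int bits>
class ZeroDetect {
    static_assert(bits >= 1 && bits <= 32, "sketch supports 1 to 32 bits");

public:
    // Precharge phase: the output node returns to its precharged value
    // (assumed high here, which corresponds to "input is zero").
    inline void Precharge() { output = true; }

    // Evaluate phase: the output stays high only if the low `bits` bits of
    // the input are all zero; any set bit discharges it.
    inline void Evaluate(std::uint32_t in) {
        const std::uint32_t mask =
            static_cast<std::uint32_t>((std::uint64_t{1} << bits) - 1);
        output = ((in & mask) == 0);
    }

    bool output = false;
};
```

Keeping the two phases as separate methods lets a scheduler call Precharge() in one clock phase and Evaluate() in the other, and an energy model could count the precharge and discharge transitions of the output node separately.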
X           { N-CLK     FF_En<32> } (NextX.output, Ctrl.Xen);
Y           { N-CLK     FF_En<32> } (X.output, Ctrl.Yen);
NextX       {           Mux2<32>  } (Y.output, XSubY.output, Ctrl.XMuxSel);
XSubY       { H-DYNAMIC Sub<32>   } (X.output, Y.output);
YZero       { H-DYNAMIC Zero<32>  } (Y.output);
YZeroLatch  { H-LATCH   Latch<1>  } (YZero.output);
XLessYLatch { H-LATCH   Latch<1>  } (XSubY.signbit);
Ctrl        {           GCDCtrl   } (XLessYLatch.output, YZeroLatch.output);
Figure 4: Netlist for GCD circuit.
template<int bits>
class Mux2 {
public:
Mux2(){};
inline void