Interconnect-Power Dissipation in a Microprocessor
Nir Magen
Intel Israel (74) Ltd.
Mobile Platform Group
Haifa 31015, Israel
(972) 48656830
nir.magen@intel.com
Avinoam Kolodny
Electrical Engineering Dept.
Technion
Haifa 32000, Israel
(972) 48294764
kolodny@ee.technion.ac.il
Uri Weiser
Intel Israel (74) Ltd.
Corporate Technology Group
Petach-Tikva 49527, Israel
(972) 39207246
uri.weiser@intel.com
Nachum Shamir
Intel Israel (74) Ltd.
Mobile Platform Group
Haifa 31015, Israel
(972) 48655913
nachum.shamir@intel.com
ABSTRACT
Interconnect power is dynamic power dissipation due to switching
of interconnection capacitances. This paper describes the
characterization of interconnect power in a state-of-the-art
high-performance microprocessor designed for power efficiency.
The analysis showed that interconnect power is over 50% of the
dynamic power. Over 90% of the interconnect power is consumed
by only 10% of the interconnections. Relations of interconnect
power to wire length distribution and hierarchy level of nets were
examined. In light of the results, router algorithms were
modified to use larger wire spacing and minimal-length routing
for the high-power-consuming interconnects. The power-aware
router algorithm was tested on synthesized blocks, demonstrating
an average saving of 14% in the dynamic power consumption
without timing degradation or area increase. The results
demonstrate the obtainable benefits of tuning physical design
algorithms to save power.
Categories and Subject Descriptors
B.7.1 [Integrated Circuits]: Design Styles - Microprocessors;
B.7.2 [Integrated Circuits]: Design Aids - Placement and
Routing.
General Terms
Performance, Design.
Keywords
Interconnect power, low-power design, routing, wire spacing.
1. INTRODUCTION
Power dissipation of high-performance microprocessors is
becoming a limiting factor and hence design for efficient power
consumption is becoming a major design consideration. Dynamic
power is currently the main component of the power dissipation
[1]. Dynamic power consumption due to periodical switching of
capacitors is approximated by the well-known expression:
$$P = \sum_j AF_j \, C_j \, V^2 f \qquad (1)$$
where $AF_j$ and $C_j$ are the Activity Factor (AF) and capacitance
for the j-th signal, V is the supply voltage and f is the clock
frequency. In this calculation we neglect the short-circuit current,
which is later added by using an overall factor of about 10% as
shown by [2]. This paper focuses on Interconnect Power, i.e.
energy dissipation due to the switching of interconnection
capacitances, which are part of the total switched capacitance
$C_j$ of each net.
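As a minimal sketch of equation (1), the following Python snippet sums the per-net switched capacitance and then applies the overall short-circuit factor of about 10% mentioned above. The function name, parameter names and all numeric values are illustrative assumptions, not data from the processor studied here.

```python
from typing import Iterable, Tuple


def dynamic_power(nets: Iterable[Tuple[float, float]],
                  vdd: float,
                  freq_hz: float,
                  short_circuit_factor: float = 0.10) -> float:
    """Return total dynamic power in watts, following equation (1).

    nets: (activity_factor, capacitance_in_farads) pairs, one per signal j.
    short_circuit_factor: assumed overall adder for short-circuit current
    (about 10% in the text); pass 0.0 to get the bare equation (1).
    """
    switching = sum(af * c for af, c in nets) * vdd ** 2 * freq_hz
    return switching * (1.0 + short_circuit_factor)


# Illustrative values only (assumptions, not measurements).
example_nets = [(0.10, 50e-15), (0.50, 20e-15), (1.00, 5e-15)]
p_bare = dynamic_power(example_nets, vdd=1.0, freq_hz=1e9,
                       short_circuit_factor=0.0)
p_total = dynamic_power(example_nets, vdd=1.0, freq_hz=1e9)
```

Because V and f are common to all nets, the relative contribution of any net to the total depends only on its product of activity factor and capacitance.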
Previous studies have commented on the growing significance
of the interconnect power [3] [4]. Various methods were used to
reduce the power consumption. Most efforts were invested in
voltage reduction [5] and frequency optimizations, such as [6],
gate sizing [7] and clock gating [8]. However, no direct design
effort to reduce the interconnect switched capacitance is known to
the authors.
In this work we study the role of interconnect power in the
overall dynamic power consumption. We quantify the magnitude
of this significant component and characterize the top power
consuming interconnections in order to detect power saving
opportunities. We also study methods to reduce the interconnect
power consumption by tuning and optimization of routing
algorithms. Our approach is based on a detailed case-study of a
recent microprocessor designed for power-efficiency.
In Section 2 we present our methodology for estimation and
extraction of the interconnect power component. Results of the
case study analysis are described in Section 3, along with possible
directions for power reduction in interconnect design. As a proof
of concept, we developed interconnect-power-aware router
algorithms and performed design experiments that are described
in Section 4. Ideas for future work are discussed in Section 5 and
the conclusions are summarized in Section 6.
2. POWER EVALUATION METHODOLOGY
The interconnect power analysis was performed on a
state-of-the-art microprocessor designed for power-efficiency,
consisting of 77 million transistors, fabricated in 0.13 µm
technology. Dynamic power dissipation was analyzed using a
Stochastic Dynamic Power estimation (SDPE) technique [9].
Activity factors for all signals were extracted for 32 high-power
and focused stress tests by SDPE simulations. The power data
generated by the SDPE showed excellent correlation to silicon
measurements of the same tests, showing differences of less than
5% of the total power.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
SLIP'04, February 14-15, 2004, Paris, France.
Copyright 2004 ACM 1-58113-818-0/04/0002$5.00.
In order to analyze the interconnect power
consumption we generated a study database that included
structural and power attributes for the entire processor core
(excluding the second level cache, since it is very large, highly
regular and consumes negligible amount of dynamic power). The
stored information included the following data for each signal net:
Interconnect Length: summing all metal segments between the
drivers and receivers. The net length was extracted from the
layout data. Repeater-separated segments were considered as part
of the original net. The interconnect-length-based analysis does not
include the global clock grid, because its unique grid layout
makes its total length singular and quite meaningless. Local clock
signals that branch out of the clock grid are included, though.
Capacitances: summing all types of capacitive loads, including
diffusion capacitances of drivers, capacitances of the metal wiring
and gate load of the receivers. Repeater gate and diffusion
capacitances were added to the original net.
The metal capacitance includes cross-capacitance to neighboring
nets, with a Miller factor of 1. When neighboring wires switch
simultaneously the energy associated with cross-capacitance may
be zero (if both wires switch in the same direction), or double the
energy of independent transitions (if they switch in opposite
directions). This is similar to the effect of crosstalk on delay. In
delay analysis the nets are often decoupled and the cross-
capacitance is multiplied by a Miller Factor [10]. This is
justified since delay analysis seeks a worst-case delay failure.
However, unlike delay which is localized in space and time,
dissipated power is a cumulative parameter which integrates many
transitions by all signals. Therefore, it is reasonable to assume
that random simultaneous transitions average-out, using an
average Miller factor of 1 (meaning that the extracted
cross-capacitances are not multiplied by any factor).
Fan out: counting all gate-connected transistors, for modeling
simplicity (e.g. an inverter load is considered to be a fan out of 2).
Activity Factor: average calculated over all SDPE tests.
Design hierarchy data: separating the signals into local and
global sets. Nets inside a functional unit block (FUB) are
considered to be local nets, while those interconnecting the blocks
are global nets.
Miscellaneous characterization data: including the metal layer
information and net classification (clock, signal nets and others).
Using these attributes, we could analyze various aspects of
interconnect-related contributions to the total dynamic power
expressed in equation (1).
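The per-net attributes listed above can be sketched as a small record type. The class, field and function names below are hypothetical, and splitting the metal capacitance into a ground part and a cross-coupled part is an assumption made only to show where the Miller factor enters. Since V and f are common to all nets in equation (1), power shares reduce to ratios of activity factor times capacitance.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Net:
    """Hypothetical per-net record mirroring the attributes listed above."""
    name: str
    length_um: float        # total metal length between drivers and receivers
    c_diffusion: float      # driver diffusion capacitance (F)
    c_wire_ground: float    # metal capacitance excluding cross-coupling (F)
    c_wire_cross: float     # extracted cross-capacitance to neighbors (F)
    c_gate: float           # receiver gate load (F)
    fan_out: int            # number of gate-connected transistors
    activity_factor: float  # average over all tests
    is_global: bool         # True if the net connects different blocks

    def c_interconnect(self, miller: float = 1.0) -> float:
        return self.c_wire_ground + miller * self.c_wire_cross

    def c_total(self, miller: float = 1.0) -> float:
        return self.c_diffusion + self.c_gate + self.c_interconnect(miller)


# Equally likely same-direction (factor 0) and opposite-direction (factor 2)
# neighbor transitions average to a Miller factor of 1.
average_miller = 0.5 * 0.0 + 0.5 * 2.0


def interconnect_share(nets: List[Net], miller: float = 1.0) -> float:
    """Fraction of dynamic power (AF * C) attributable to wiring."""
    wire = sum(n.activity_factor * n.c_interconnect(miller) for n in nets)
    total = sum(n.activity_factor * n.c_total(miller) for n in nets)
    return wire / total


def top_fraction_share(nets: List[Net], fraction: float = 0.10,
                       miller: float = 1.0) -> float:
    """Share of interconnect power drawn by the top `fraction` of nets."""
    powers = sorted((n.activity_factor * n.c_interconnect(miller)
                     for n in nets), reverse=True)
    k = max(1, round(fraction * len(powers)))
    return sum(powers[:k]) / sum(powers)


# Illustrative values only (assumptions, not extracted data).
nets = [
    Net("a", 1000.0, 2e-15, 60e-15, 40e-15, 8e-15, 4, 0.2, True),
    Net("b", 20.0, 1e-15, 1e-15, 1e-15, 2e-15, 2, 0.1, False),
]
share = interconnect_share(nets, miller=average_miller)
top_share = top_fraction_share(nets)
```

Keeping the Miller factor as a parameter makes it easy to compare the power-oriented average of 1 against the worst-case factors typically used in delay analysis.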
3. MICROPROCESSOR ANALYSIS RESULTS
3.1 Interconnect-Length Based Analysis
The interconnection attributes were examined to reveal
dependencies on wire length. The global clock grid is excluded
from this analysis. First, the wire length distribution in the
processor core was analyzed, as in [1]. The number of nets versus
the net length is plotted over the figure published in [1], shown
here as Figure 1.
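A wire length distribution of this kind can be sketched by counting nets in logarithmic length bins. The function name, the micrometer unit and the choice of one bin per decade are assumptions for illustration; the text does not state how the published distribution was binned.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def length_histogram(lengths_um: Iterable[float],
                     bins_per_decade: int = 1) -> Dict[float, int]:
    """Count nets per logarithmic length bin.

    Keys are the lower bin edges (assumed to be in micrometers),
    values are the number of nets whose length falls in that bin.
    """
    counts: Counter = Counter()
    for length in lengths_um:
        if length <= 0:
            continue  # skip degenerate nets with no metal
        index = math.floor(math.log10(length) * bins_per_decade)
        counts[10 ** (index / bins_per_decade)] += 1
    return dict(sorted(counts.items()))


# Illustrative lengths only (assumptions, not layout data).
hist = length_histogram([1.5, 3.0, 12.0, 40.0, 950.0, 2500.0])
```

Plotting the counts against the bin edges on logarithmic axes gives the usual view of many short local nets and few long global ones.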
[Figure 1: number of nets versus net length, logarithmic axes]