TFA: A Threshold-Based Filtering Algorithm for Propagation Delay and ...
Yahoo! is not affiliated with the authors of this page or responsible for its content.
TFA: A Threshold-Based Filtering Algorithm for Propagation Delay and Output Slew Calculation of High-Speed VLSI Interconnect
TFA: A Threshold-Based Filtering Algorithm for Propagation Delay and Output Slew
Calculation of High-Speed VLSI Interconnects
S. Abbaspour, A.H. Ajami
*
, M. Pedram, and E. Tuncer
*
Dept. of EE Systems, University of Southern California, Los Angeles, CA 90089
*
Magma Design Automation, Santa Clara, CA 95054
Abstract -
This paper describes an efficient threshold-based
filtering algorithm (TFA) for calculating the interconnect delay and
slew (transition time) in high-speed VLSI circuits. The key idea is to
divide the circuit nets into three groups of low, medium and high
complexity nets, whereby for low and medium complexity nets either
the first moment of the impulse response or the first and second
moments are used. For the high-complexity nets, which are
encountered infrequently, TFA resorts to the AWE method. The key
contribution of the paper is to come up with very effective and efficient
way of classifying the nets into these three groups. Experimental
results show that on a large industrial circuit using a state-of-the-art
commercial timing analysis that incorporates TFA, we were able to
achieve delay and slew estimation accuracies that are quite
comparable with the full-blown AWE-based calculators at runtimes
that were only 14% higher than those of a simple Elmore-delay
calculator.
1 Introduction
As the CMOS process technologies scale down towards
nanometer regions, the accuracy and efficiency of gate
propagation delay and interconnect delay calculation become
more critical to the successful timing closure of the integrated
circuit design flow. This is mainly due to the rapidly increasing
circuit speeds, the rising chip temperatures, and the growing
weight of the various parasitic effects in the modern designs.
Circuit delay in VLSI circuits consists of two components:
the delays of electrical signals through the wires (known as the
interconnect propagation delay) and the 50% propagation
delay of the driving gates (known as the gate propagation
delay). Consider the circuit in Figure 1. The overall delay from
output pin A of gate G
1
to the output pin C of the gate G
2
(which will be referred to as the stage delay) is written as the
sum of the interconnect delay from output pin A to the input
pin B of gate G
2
and gate propagation delay from input pin B to
the output pin C of the gate G
2
:
AC
AB
BC
Delay
Delay
Delay
=
+
(1)
This stage delay definition is used due to the fact that the
error in estimating the input slew of a gate does not reflect as a
dramatic error in the output slew of that gate. Therefore,
according to this definition, the stages can be considered
independent from each other [1].
The most efficient method to calculate the interconnect
delays in VLSI circuits is by using the first moment of the
impulse response, also known as the Elmore delay, which is an
upper-bound on the 50% propagation delay in RC trees [2]. To
improve the accuracy of the Elmore delay, higher order
moment matching techniques and model-order reduction
methods have been employed [3], [4], [5]. Indeed, the higher-
order moment matching methods can result in a very high
accuracy for RC trees while being much faster than the SPICE
[6] simulation. However, state-of-the-art
EDA tools should be
able to calculate the path delay and propagated slew very
efficiently [7], [8], i.e., in nearly linear time in the size of the
RC tree. This is mainly because of the scale of such
calculations both in terms of the number of RC tree
configurations that must be processed in a large complex
design in order to complete its timing calculation and the
number of times such calculations must be repeated during the
circuit optimization flow in order to achieve a high quality
design solution. Even the fastest higher-order moment
matching techniques is quite inefficient to be employed in the
static timing analysis engines of high-capacity EDA tools.
Therefore, despite its inherent, yet well-known and well-
documented, shortcomings of the Elmore delay (cf. [10]), the
runtime efficiency of the Elmore delay calculator makes it the
top choice of delay calculation schemes in many of the
commercial tools both during the front-end and back-end
optimizations.
A
B
C
G
1
G
2
Figure 1: General delay model of an RC tree driven by a
CMOS gate and driving other gates.
The gate propagation delay can be divided into two terms:
the intrinsic gate delay and the (extrinsic) gate load delay. The
intrinsic gate delay arises from the native characteristics of the
CMOS transistors as switching devices in the gates whereas the
gate load delay accounts for the timing effect of the load of a
logic cell on its input-to-output switching speed. Clearly, the
intrinsic gate delay is equal to the gate propagation delay under
zero load conditions.
Figure 2(a) depicts a CMOS gate, which drives a purely
capacitive load (C
L
), where one of its inputs switches with a
signal transition time of T
in
causing a change in the output of
the gate. The gate propagation delay is a function of the input
transition time and the output load. In commercial ASIC cell
libraries, it is usual to characterize different output transition
times (e.g. 10%, 50%, and 90%) as a function of the input
transition time and output capacitance by:
( ,
)
in
L
t
f T C
=
(2)
where
denotes the percentage of the output transition time, t
is the output delay with respect to the 50% point of the input
signal, and f
is the corresponding delay function.
In VDSM technologies, the effect of interconnect
resistances on the gate load delay should not be ignored. Using
the sum of all gate and interconnect capacitances (known as the
total-capacitance method) as the capacitive load of a logic cell
tends to yield very pessimistic gate load delay estimates [8]. A
better approximation for an n
th
order RC load (i.e., one with n
distributed capacitances to ground) as seen at the output node
of a logic cell is to use a second order RC-
model [11], [12].
From the RC-
load, the timing analysis engine can calculate
an appropriate capacitance (known as the effective-capacitance
method) as the load for gate delay calculation and input-to-
output slew propagation [13], [15].
Among the various sources of error that affect the accuracy
of a path delay calculator, the most important factor is the error
in slews. The reason is that if the input slew of some gate or
interconnect segment on the path is miscalculated, this error
will potentially not only propagate to the path of the output, but
is also exacerbated along the path because of the strong
dependency of the output slew and delay of the gate or
interconnect on the input slew. Thus, it is important to have a
delay calculator that is capable of accurately calculating slew
rates along a path.
V(v)
t(s)
Gate/Cell
T
in
C
L
90%
50%
10%
t
10%
t
50%
t
90%
(a)
(b)
Figure 2: (a) A gate driving a capacitive load, (b) explanation
of t
.
.
.
.
Most EDA tools use a progressive refinement approach
combined with local optimizations to complete physical design
of a target circuit. Local design iteration loops comprising of a
number of optimization steps are commonplace. Global design
iteration loops are possible but undesirable due to the cost of
such iterations and the dangers of design closure failure [7]
.
Many of the design decisions made during these local
optimization steps closely rely on the results of a static timing
analysis engine to determine the circuit delay in general and the
timing-critical paths in particular. Early in the physical design
process, there is a lot of uncertainty about the exact locations of
logic cells, the routing, and the I/O assignments. As a result,
the interconnect capacitance values that are used at this early
stage tend to exhibit a high degree of inaccuracy. That is in fact
why statistical wire load delay models were proposed in the
first place and were heavily used until a few years ago, when it
became clear that the only way to get timing closure on a high-
performance circuit design is to adopt a progressive refinement
approach whereby the early design planning (including netlist
partitioning, macro-cell placement, top-level routing, etc.) and
detailed netlist optimizations (including cell selection,
buffering, gate and wire sizing, etc.) go hand in hand.
Regardless of all this, the fact remains that the early
interconnect load estimates can be quite coarse. Therefore,
using an elaborate timing analyzer that provides highly
accurate delay estimates, albeit at a high computational cost, is
in fact overkill.
The amount of stage delay error is very dependent on the
methods used for calculating the gate and interconnect delays.
A comparison between the Elmore and the AWE methods for
interconnect delay and slew calculation for a commercial high-
performance 90nm design (with 120,000 gates for all of its