Energy-Efficient GHz-Class Charge-Recovery Logic

Visvesh S. Sathe, Member, IEEE, Juang-Ying Chueh, Member, IEEE, and
Marios C. Papaefthymiou, Senior Member, IEEE
Abstract: In this paper, we present Boost Logic, a charge-recovery
circuit family that can operate efficiently at clock frequencies
in excess of 1 GHz. To achieve high energy efficiency,
Boost Logic relies on a combination of aggressive voltage scaling,
gate overdrive, and charge-recovery techniques. In post-layout
simulations of 16-bit multipliers with a 0.13-µm CMOS process at
1 GHz, a Boost Logic implementation achieves 5 times higher energy
efficiency than its minimum-energy pipelined, voltage-scaled,
static CMOS counterpart at the expense of 3 times longer latency.
In a fully integrated test chip implemented using a 0.13-µm bulk
silicon process and on-chip inductors, chains of Boost Logic gates
operate at clock frequencies up to 1.3 GHz with a 1.5-V supply.
When resonating at 850 MHz with a 1.2-V supply, the Boost Logic
test chip achieves 60% charge-recovery.
Index Terms: Adiabatic, charge-recovery, energy recovery, resonant
systems.
I. INTRODUCTION
Power minimization has become a primary concern in
VLSI design. Several techniques are used
to curb dynamic and leakage power in conventional CMOS circuits.
One of the most effective methods is pipelining and subsequent
voltage scaling to minimize energy dissipation at a given
operating frequency. At high operating frequencies, however,
the energy and delay overhead of pipeline registers becomes significant
and degrades overall system efficiency.
In systems with significant switching activity, charge-recovery
circuits have the potential to dissipate less energy than
their pipelined, voltage-scaled CMOS counterparts. Several
charge-recovery logic styles have been proposed [1]–[5]. Over
a range of relatively low operating frequencies (a few hundred
megahertz), these charge-recovery techniques have been
shown to achieve lower energy dissipation when compared to
voltage-scaled CMOS. Achieving energy savings over CMOS
at higher operating frequencies has remained elusive, however.
Although performance limits of charge-recovery circuits
are fundamentally determined by the need for gradually tran-
sitioning power-clocks, prevalent operating frequencies in
charge-recovery circuits are more a consequence of design than
any such fundamental constraint. Some of the main factors that
lead to lower speeds in charge-recovery circuits are the use of
diode-connected transistors [6], [7], the use of pMOS devices
in evaluation trees [8], [9], and the excessive time required to
resolve the complementary outputs of the dual-rail gates during
evaluation [2], [4], [10].

Manuscript received April 21, 2006; revised August 10, 2006. This
work was supported in part by the U.S. Army Research Office under Grant
DAADA19-03-1-0122.
The authors are with the Department of Electrical Engineering and Computer
Science, University of Michigan, Ann Arbor, MI 48109-2121 USA (e-mail:
vssathe@eecs.umich.edu).
Digital Object Identifier 10.1109/JSSC.2006.885053

In this paper, we present a novel dynamic charge-recovery
logic family called Boost Logic [11]. Boost Logic achieves significant
energy savings over voltage-scaled static CMOS across
a range of frequencies much higher than currently demonstrated
in charge-recovery literature. A unique feature of Boost Logic
gates that enables energy-efficient and high-throughput operation
is an aggressively scaled, conventionally switching Logic
stage that operates in tandem with a charge-recovery Boost
stage. Logic performs the logical evaluation of a Boost Logic
gate, operating at an ultra-low DC supply voltage of approximately
one threshold voltage, $V_{th}$. After Logic pre-resolves the
differential outputs of a Boost Logic gate to the level of about
one threshold voltage, Boost amplifies the difference between
the output nodes to the full rail in an energy-efficient charge-recovery
manner, providing a large gate overdrive to fanout gates
and thereby reducing delay in their Logic stages. Thus, Boost
Logic achieves lower energy dissipation without incurring the
performance degradation typical of conventional voltage-scaled
designs.
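
As a rough illustration of why scaling the Logic supply toward one threshold voltage reduces energy, recall the textbook first-order model in which a node of capacitance C that is conventionally charged and discharged through a swing V dissipates about C·V² per cycle. The Python sketch below applies only that model. The function names, the 10-fF load, and the 0.3-V scaled swing are illustrative assumptions and are not taken from this paper.

```python
def switching_energy(capacitance_f: float, swing_v: float) -> float:
    """First-order energy dissipated per conventional charge/discharge cycle (C * V^2)."""
    return capacitance_f * swing_v ** 2


def energy_ratio(full_swing_v: float, scaled_swing_v: float) -> float:
    """Ratio of full-swing to scaled-swing switching energy for the same capacitance."""
    return switching_energy(1.0, full_swing_v) / switching_energy(1.0, scaled_swing_v)


# Assumed, illustrative values: a 10-fF node, a 1.2-V full rail, and a 0.3-V scaled swing.
load_f = 10e-15
print("full-swing energy (J):  ", switching_energy(load_f, 1.2))
print("scaled-swing energy (J):", switching_energy(load_f, 0.3))
print("ratio:", energy_ratio(1.2, 0.3))
```

Under these assumed numbers the quadratic dependence on swing cuts the conventionally switched energy by a factor of about 16. The model ignores leakage, short-circuit current, and the energy spent in the Boost stage.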
Fig. 1(a) illustrates the concept behind Boost Logic. Each
Boost Logic gate consists of two parts operating in tandem over
nonoverlapping time intervals: A conventionally switching
logical evaluation stage (Logic) and a charge-recovering stage
(Boost). Fig. 1(b) shows simplified voltage waveforms of a
Boost Logic gate output. In the first phase of its operation,
Logic resolves the output nodes to its two supply rails. In
the second phase of its operation, Boost amplifies this voltage
difference between the outputs by making them track complementary
resonating clock signals that swing to the full rail.
These clocks will henceforth be referred to as
power-clocks. This full-rail swing provides fanout Logic stages
with a large gate overdrive, allowing
them to perform evaluation at frequencies much higher than
expected of such aggressively voltage-scaled logic. Although
Boost enables aggressive voltage scaling in Logic, significantly
reducing energy dissipation, it is vital that the power dissipation
of Boost itself does not nullify these advantages. To that end, an
initial voltage difference is provided to Boost by Logic, greatly
aiding its sense-amplifying action, and resulting in efficient
charge-recovery.
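
The two-phase behavior described above can be sketched with an idealized waveform model. The Python function below is a minimal illustration and not the circuit's actual response: it assumes sinusoidal power-clocks that split from mid-rail, a 50% split between the Logic and Boost phases, and Logic rails placed symmetrically about mid-rail. The names `vdd`, `v_hi`, and `v_lo` and all numeric values are hypothetical.

```python
import math


def boost_gate_outputs(t: float, period: float, vdd: float,
                       v_hi: float, v_lo: float) -> tuple[float, float]:
    """Idealized (out, out_bar) voltages for a gate that evaluates to logic 1."""
    phase = (t % period) / period
    if phase < 0.5:
        # Logic phase: outputs sit at the pre-resolved Logic rail levels.
        return v_hi, v_lo
    # Boost phase: complementary power-clocks leave mid-rail, peak, and return.
    swing = (vdd / 2.0) * math.sin(2.0 * math.pi * (phase - 0.5))
    phi = vdd / 2.0 + swing
    phi_bar = vdd / 2.0 - swing
    # Each output follows its power-clock once the clock passes the pre-resolved level.
    return max(v_hi, phi), min(v_lo, phi_bar)


# Assumed values: 1.2-V peak power-clock, Logic rails 0.3 V apart around mid-rail.
for step in range(8):
    t = step / 8.0
    out, out_bar = boost_gate_outputs(t, 1.0, 1.2, 0.75, 0.45)
    print(f"t={t:.3f}  out={out:.3f}  out_bar={out_bar:.3f}")
```

In this sketch the small pre-resolved difference from the Logic phase decides which output follows the rising power-clock, which is the role the text assigns to Logic in aiding the sense-amplifying action of Boost.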
Although previously proposed logic families have used the
idea of increased gate overdrive through the use of bootstrapping
techniques [3], [12], these methods lack the robustness offered
by the use of a Boost stage. These methods are also limited in
the amount of achievable gate overdrive. More recently, LVS
logic has been proposed [13], where sense amplifiers are used
to amplify low-swing gate outputs.
0018-9200/$20.00 © 2007 IEEE

Fig. 1. Boost Logic. (a) Cascade. (b) Simplified waveform.
We have characterized the performance and energy efficiency
of Boost Logic through extensive simulations. Specifically, we
have explored the robustness of Boost Logic gates to clock skew.
We have also investigated the effect of power-supply variation
on the energy and performance of Boost Logic. To comparatively
assess the energy efficiency of Boost Logic, we have implemented
16-bit carry-save multipliers in both Boost Logic and
pipelined, voltage-scaled static CMOS. Both multipliers have
been designed for operation at 1 GHz. In post-layout simulations
with extracted parasitics, Boost Logic achieves approximately
5 times higher energy efficiency than its minimum-energy static
counterpart, although at the cost of 3 times longer latency.
Boost Logic derives significant performance improvements
from the use of low-threshold-voltage (low-$V_{th}$) devices in its
Logic stage. Post-layout simulations of a 16-bit multiplier with
low-$V_{th}$ devices result in a 33% reduction in latency and a
32.6% reduction in energy dissipation as compared to Boost Logic
with regular-$V_{th}$ devices.
Beyond simulations, in this paper we provide measurements
from a fully-integrated test chip that demonstrates the operation
of Boost Logic gate cascades at operating frequencies exceeding
1 GHz. We present chip measurements obtained over a range
of frequencies in the neighborhood of its natural frequency.
Moreover, we present measurements exploring the trade-offs involved
in on-chip clock generation and the trade-offs between
the DC supply and power-clock voltages with regard to overall
chip efficiency. When operating at its resonant frequency of
850 MHz, the Boost Logic chip achieves 60% recovery in its
resonant portion. Driven 17% off resonance at 1 GHz, it still recovers
40% of its resonating energy.
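
The off-resonance figure and the recovery percentages quoted above can be checked with simple arithmetic. The Python sketch below is only a worked calculation with hypothetical helper names: it takes the frequencies and recovery fractions from the text and assumes that the energy not recovered in the resonant portion is dissipated.

```python
def percent_off_resonance(drive_hz: float, resonant_hz: float) -> float:
    """Relative offset of the drive frequency from resonance, in percent."""
    return 100.0 * (drive_hz - resonant_hz) / resonant_hz


def dissipated_fraction(recovered_fraction: float) -> float:
    """Fraction of the resonating energy that is not recovered."""
    return 1.0 - recovered_fraction


offset = percent_off_resonance(1.0e9, 850.0e6)
print(f"1 GHz is {offset:.1f}% above the 850-MHz resonant frequency")
print(f"not recovered at 850 MHz: {dissipated_fraction(0.60):.0%}")
print(f"not recovered at 1 GHz:   {dissipated_fraction(0.40):.0%}")
```

The computed offset is about 17.6%, in line with the roughly 17% quoted in the text. Under the stated assumption, moving from resonance to 1 GHz raises the unrecovered share of the resonating energy from 40% to 60%.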
The remainder of the paper is organized as follows. In
Section II, we present Boost Logic and discuss its structure
and operation. We explain how the Boost stage achieves significant
charge-recovery at high frequencies. The results of
extensive simulations investigating the sensitivity of Boost
gates to power-supply fluctuation and clock skew are discussed
in Section III. In Section IV, we compare the energy and
throughput of a 16-bit Boost Logic carry-save multiplier
with its voltage-scaled pipelined CMOS counterpart. We also
demonstrate the performance benefits obtained from the use of
low-threshold-voltage (low-$V_{th}$) devices through simulation
results obtained on a 16-bit Boost Logic multiplier implemented
with low-$V_{th}$ devices. In
Section V, we present the Boost Logic test chip along with
measurements obtained. Conclusions are given in Section VI.
II. BOOST LOGIC BASICS
In this section, we first analyze the structure and operation
of Boost Logic. We subsequently consider the energy and delay
equations that govern the operation of Boost Logic and show
how Boost Logic achieves high throughput with significant energy
savings.
A. Structure
Fig. 2 shows the structure of a Boost Logic gate. Boost
Logic is a two-phase, dual-rail, partially charge-recovering
logic. The structure of a Boost gate can be divided into two
parts: logical evaluation (Logic) and charge-recovering
amplification (Boost). Logic can be implemented in any
transistor topology as long as it supports the use of the clocked
transistors M5–M8. These clocked transistors decouple Logic
from the output nodes when Boost drives them. Depending
on the gate implementation and the operating frequency, we
have found the use of the clocked CMOS or pseudo-nMOS
logic styles (shown in Fig. 2) to be particularly effective. The
implementation of the clocked pseudo-nMOS logic evaluation
trades off the voltage difference in the pre-resolved output
nodes (pseudo