Self-Healing Asynchronous Arrays
Song Peng and Rajit Manohar
Computer Systems Laboratory
Cornell University
Ithaca, NY 14853, USA
Abstract
This paper presents a systematic method for designing a self-healing
asynchronous array in the presence of errors. Spare resources are added
in one of three different ways, and the asynchronous circuit is forced
to stall in case of failure. A deadlock detector then activates the
specific self-reconfiguration logic, so that the array circuit is
reconfigured around the faulty components and recovers from errors
automatically. Experimental evaluations show that this method requires
less hardware cost, a smaller critical circuit size, and lower
performance overhead than traditional NMR-based techniques, and that it
is more scalable.
1 Introduction
The continuous advance of microelectronics has led to
a substantial reduction in both transistor dimensions and
power supply voltages, helping VLSI circuits operate faster
and consume less active power. However, technology scal-
ing causes circuits to be more sensitive to defects in fabri-
cation [3] and threatens the nearly unlimited lifetime relia-
bility standards that we have come to expect [18]. The re-
duced amount of charge stored on circuit nodes also makes
circuits more susceptible to transient faults [3]. Thus, fault
tolerant design, which improves both fabrication yield and
chip reliability, is once again becoming an important issue.
While there is a wealth of literature that examines fault
tolerance in clocked logic [8], less attention has been paid to
asynchronous circuits. The absence of clock signals means
that a faulty clockless circuit might exhibit problems that
would not normally arise in a clocked system [9], making
existing fault tolerance techniques for synchronous systems
ineffective or inefficient. For instance, the most widely used
approach to achieving fault tolerance in clocked VLSI systems is
hardwired duplication-and-comparison, such as N-modular Redundancy
(NMR) [8]. However, it is non-trivial to apply such
duplication-and-comparison techniques to asynchronous logic without
significant gate timing assumptions [19]. Unlike clocked systems, where
the outputs from all replicas can be sampled at the same time and thus
easily compared against each other, the local handshake in asynchronous
circuits makes it unclear when outputs that are not directly related
are expected to match. In addition, faults in asynchronous logic may
prevent the result from appearing on the output, permanently blocking
the comparison procedure.
E-mail: {speng,rajit}@csl.cornell.edu
Besides hardwired duplication-and-comparison, another possible fault
tolerance approach, which can be conveniently formulated as a graph
problem [5], is to utilize self-checking and reconfiguration to
maintain functionality in the presence of failures. Although this
approach incurs fault detection and reconfiguration overheads as well
as fault recovery time, smaller hardware redundancy and less power
consumption make it an attractive defect/fault tolerance method [3].
Moreover, the absence of a comparison procedure makes this approach
better suited for asynchronous circuits.
To reduce design complexity, a systematic way to build a
reconfigurable fault tolerant asynchronous system is to make each of
its components fault tolerant. In a digital VLSI system, many
computation modules such as adders, array multipliers, and FIR filters
can be modeled as a linear array or a collection of linear arrays with
external inputs and outputs, given that communication propagates
linearly through them. Thus, the construction of a self-healing
asynchronous array provides the basis for reconfigurable fault tolerant
asynchronous VLSI design at a fine-grained level.
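
To illustrate what "modeled as a linear array" means for an adder, the following is a minimal sketch of a ripple-carry adder written as a chain of identical cells, where the only inter-cell communication is the carry passed to the next cell. The function names `full_adder` and `ripple_add` are illustrative and do not come from the paper, and the sketch is purely functional (it models neither timing nor handshakes).

```python
from typing import Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One array cell: external inputs a, b; carry-in from the previous cell."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(x: int, y: int, width: int) -> Tuple[int, int]:
    """Chain `width` cells; the carry propagates linearly from cell to cell."""
    carry = 0
    result = 0
    for i in range(width):
        a = (x >> i) & 1          # external input bit of cell i
        b = (y >> i) & 1          # external input bit of cell i
        s, carry = full_adder(a, b, carry)
        result |= s << i          # external output bit of cell i
    return result, carry          # sum bits and the final carry-out
```

Each loop iteration corresponds to one cell of the array, so a faulty cell in such a chain blocks everything downstream of it.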
The class of asynchronous circuits considered in this paper is
quasi-delay-insensitive (QDI). QDI circuits are designed to operate
correctly under the assumption that gates and wires have arbitrary
finite delay, except for a small number of special wires known as
isochronic forks [12]. A QDI system can be viewed as a collection of
concurrent hardware modules (called processes) that communicate atomic
data items (called tokens) with each other through one-to-one
message-passing channels. The message-passing channels usually consist
of data and acknowledge rails. The notion of causality and
event-ordering is implemented in terms of handshake protocols on those
channels [12].
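
As a rough illustration of a handshake on a channel with data and acknowledge rails, the sketch below steps a one-bit dual-rail channel through a four-phase (return-to-zero) handshake. The four-phase protocol and the names `DualRailChannel` and `transfer` are assumptions made for this example; the text above says only that channels carry data and acknowledge rails and use handshake protocols.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DualRailChannel:
    """One-bit channel: two data rails plus one acknowledge rail."""
    t: int = 0    # rail that encodes a logical 1
    f: int = 0    # rail that encodes a logical 0
    ack: int = 0

    def valid(self) -> bool:
        return (self.t ^ self.f) == 1

    def neutral(self) -> bool:
        return self.t == 0 and self.f == 0


def transfer(ch: DualRailChannel, bit: int) -> Tuple[int, List[str]]:
    """Pass one token through an assumed four-phase handshake."""
    trace: List[str] = []
    assert ch.neutral() and ch.ack == 0
    # Phase 1: the sender raises exactly one data rail.
    if bit:
        ch.t = 1
    else:
        ch.f = 1
    trace.append("data valid")
    # Phase 2: the receiver sees valid data, latches it, raises acknowledge.
    assert ch.valid()
    received = ch.t
    ch.ack = 1
    trace.append("ack up")
    # Phase 3: the sender returns the data rails to neutral.
    ch.t = ch.f = 0
    trace.append("data neutral")
    # Phase 4: the receiver lowers acknowledge; the channel is idle again.
    assert ch.neutral()
    ch.ack = 0
    trace.append("ack down")
    return received, trace
```

Each phase waits on the previous one, which is how event ordering is enforced without a clock. If any phase never happens, the channel simply stops making progress.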
The following contributions are made in this paper. First, we propose
a general framework of reconfigurable fault tolerant design for
asynchronous circuits, as well as the 2- and 3-Dimensional
implementation methods (Section 2). Second, we develop three fault
tolerant array models for this framework (Section 3) and present the
construction of the corresponding self-reconfiguration logic
(Section 4). Third, we evaluate all the self-healing designs of the
different array models, and show that they result in smaller hardware
cost, higher performance, and lower energy overheads than the
traditional NMR method, as well as better scalability (Section 5).
Fourth, we analyze the relationship between reconfiguration complexity
and spare resource cost, compare the self-healing designs of the
different array models, and assess the advantages of each scheme
(Section 5).
2 General Framework of Self-Healing Asynchronous Circuit Design
In this section, we propose a general framework of self-healing
asynchronous circuits with respect to an arbitrary number of hard and
soft errors, shown in Figure 1.
[Figure 1 blocks: Target Circuit w/ Fault Tolerant Graph Topology; Deadlock Detection; Reconfiguration Logic]
Figure 1. Block diagram of a reconfigurable self-healing asynchronous circuit.
The target asynchronous circuit is built on a K-fault tolerant graph
model with spare resources. Pass gates, whose control inputs come from
the reconfiguration logic, are added to the wires of graph edges to
make the target circuit reconfigurable. Self-checking logic is added to
the target circuit so that it deadlocks in the presence of failure
(fail-stop). When the target circuit deadlocks, the deadlock detection
logic recognizes this and activates the online reconfiguration logic,
which reconfigures the target circuit around the faulty components. The
computation restarts from the beginning or from the last architectural
checkpoint after the circuit has been reconfigured.
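
The control flow described above (fail-stop, deadlock detection, reconfiguration, restart) can be sketched behaviorally as follows. This is only a toy model under stated assumptions: every cell computes `x + 1`, a faulty cell stalls by returning `None`, the stalled position is assumed to be known to the reconfiguration step, and a spare is swapped directly into the stalled position. The class names `Cell` and `SelfHealingArray` are hypothetical, and the paper's actual array models and reconfiguration logic (Sections 3 and 4) are not reproduced here.

```python
from typing import List, Optional, Tuple


class Cell:
    def __init__(self, name: str) -> None:
        self.name = name
        self.faulty = False

    def step(self, x: int) -> Optional[int]:
        # Fail-stop assumption: a faulty cell emits no token (it stalls).
        return None if self.faulty else x + 1


class SelfHealingArray:
    def __init__(self, n: int, k: int) -> None:
        self.cells: List[Cell] = [Cell(f"c{i}") for i in range(n + k)]
        self.active: List[int] = list(range(n))         # cells wired in
        self.spares: List[int] = list(range(n, n + k))  # K spare cells
        self.reconfigurations = 0

    def _run_once(self, x: int) -> Tuple[Optional[int], Optional[int]]:
        """Return (result, None) on success or (None, stalled_position)."""
        value: Optional[int] = x
        for pos, idx in enumerate(self.active):
            value = self.cells[idx].step(value)
            if value is None:
                return None, pos
        return value, None

    def compute(self, x: int) -> int:
        while True:
            result, stalled = self._run_once(x)
            if stalled is None:
                return result
            # Deadlock detected: reconfigure around the stalled cell.
            if not self.spares:
                raise RuntimeError("no spare left: more than K faults")
            self.active[stalled] = self.spares.pop(0)
            self.reconfigurations += 1
            # The loop then restarts the computation from the beginning.
```

With `n = 4` and `k = 2`, marking one active cell faulty and calling `compute` again triggers one reconfiguration and then returns the same result as the fault-free array.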
Since no extra circuitry other than self-checking logic and pass gates
is on the critical path, a small performance overhead is expected in
this reconfigurable fault tolerant design. Moreover, there is no
switching activity in the reconfiguration logic when the target
asynchronous circuit operates correctly, so a low energy overhead is
also anticipated. Unlike the hardwired NMR method, where the latency
increases severely with large K due to the dramatically higher
complexity of the voter, the performance overhead of this fault
tolerant design does not increase significantly with K, because the
number of gates used in each configuration remains the same. Thus, this
reconfigurable fault tolerant design is expected to scale well in terms
of performance and energy overhead with respect to K.
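
To make the voter comparison concrete, here is a minimal sketch of a majority voter, assuming NMR masks K faulty replicas by majority voting and therefore uses N = 2K + 1 replicas. That relation is the standard one for majority voting and is not stated in this paper; the function names `replicas_needed` and `majority_vote` are illustrative.

```python
from collections import Counter
from typing import Sequence


def replicas_needed(k: int) -> int:
    """Replicas for a majority voter to mask k faulty ones (assumed 2k + 1)."""
    return 2 * k + 1


def majority_vote(outputs: Sequence[int]) -> int:
    """Return the value produced by a strict majority of the replicas."""
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise ValueError("no strict majority among replica outputs")
    return value
```

The voter must examine every replica output on every operation, so its input count grows with K. In the reconfigurable design, by contrast, only the currently configured cells take part in normal operation.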
In this framework, the reconfiguration logic and deadlock detection
circuits must be fault-free to achieve fault tolerance. Thus, those
circuits are critical (error-sensitive) and must be made highly
reliable. With traditional 2D (2-Dimensional) integration technology,
those circuits could be implemented using conservative layout design
rules and large transistor sizing (even with thicker oxide). With
recent 3D integration technology [2], where planar device layers are
stacked in a three-dimensional structure and adjacent device planes can
be connected by short vertical wires, all the error-sensitive
transistors can be placed onto a separate device layer fabricated with
a robust/conservative (micron or submicron) technology, while the
target circuits are placed onto another device layer with an aggressive
(deep-submicron or nanometer) technology.
We choose a half-buffer based circuit template, the precharge half
buffer (PCHB) [10], as the target QDI circuit. A PCHB circuit can have
multiple inputs and outputs, and it can be used to construct almost any
pipelined QDI logic. For instance, the asynchronous MiniMIPS
microprocessor [13] uses PCHBs for more than 90% of its circuits. Thus,
implementing self-healing behavior in PCHB circuits takes an important
step toward fault tolerance in general asynchronous logic. Similar to a
precharge domino circuit in synchronous design, a PCHB circuit performs
computations using pull-down (NMOS) networks, making it fast. In this
circuit, each variable is usually dual-rail encoded with an explicit
acknowledge. Validity and neutrality of the inputs and the output(s)
are checked and synchronized (by C-elements), which generates the
common acknowledge to all inputs and the precharge/enable signal for
data computation.
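
The validity/neutrality synchronization can be sketched with a behavioral model of a Muller C-element, whose output changes only when all of its inputs agree. The names `CElement`, `validity`, and `completion` are illustrative, and the sketch does not model the polarity or wiring of the acknowledge and precharge/enable signals in an actual PCHB.

```python
from typing import Sequence, Tuple

Rails = Tuple[int, int]   # (true rail, false rail) of one dual-rail variable


class CElement:
    """Muller C-element: follows the inputs when they agree, else holds."""

    def __init__(self, init: int = 0) -> None:
        self.out = init

    def update(self, *inputs: int) -> int:
        if all(v == 1 for v in inputs):
            self.out = 1
        elif all(v == 0 for v in inputs):
            self.out = 0
        return self.out       # unchanged when the inputs disagree


def validity(rails: Rails) -> int:
    """1 when the dual-rail pair carries data, 0 when it is neutral."""
    t, f = rails
    return t | f


def completion(c: CElement, pairs: Sequence[Rails]) -> int:
    """Rises once all pairs are valid; falls once all pairs are neutral."""
    return c.update(*(validity(p) for p in pairs))
```

Because the output holds its value while some pairs are valid and others are neutral, a rail stuck in one state prevents the next transition of the completion signal, and the handshake cannot advance.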
By adding a separate data validity rail to each variable, replicating
all explicit acknowledges, and crosschecking between duplicated
internal control signals, we developed a PCHB-based circuit template
(called FS-PCHB) [15] which achieves fail-stop with respect to both
hard and soft errors. Figure 2 shows the block diagram of an FS-PCHB
circuit.
In Figure 2,