Conventional DRAM architectures have
reached their practical upper limit in
operating frequency and bus width.
Mass-market CPUs operating at over 200 MHz
and media processors executing more than 2
GOPS (gigaoperations per second)¹,² are now
in production. Their external memory bandwidth
of approximately 500 Mbytes/s cannot
meet increasing application demands. In
addition, the CPU is no longer the only major
consumer of main memory bandwidth.
A modern multimedia PC's graphics
accelerator, media processor, and system I/O
all consume significant memory bandwidth.
Efforts to extend conventional DRAMs
have included scaling the SDRAM's memory
clock from 66-MHz to 100-MHz operation.
However, this adaptation created numerous
system design issues while offering only 33%
additional peak bandwidth. This article
explores the memory bandwidth scaling
problem and then describes our solution,
the Direct RDRAM device, which successfully
meets multimedia requirements and fits
seamlessly into the PC chassis.
Bandwidth scaling problem
A user's perception of the interactivity and
performance of a multimedia computer is
largely determined by processing throughput.
Fast processors are necessary, but memory
bandwidth also plays a key, though
often-overlooked, role.
Because current memory subsystems can
only transfer data for one requester at a time,
the length of time required to finish a transfer
in progress adds to the latency of any
pending requests. For a given bus width and
clock frequency, the amount of time the bus
is occupied depends on the transfer size and
the memory bus bandwidth. Therefore,
memory bandwidth directly affects memory
system latency.
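The relationship between bandwidth and latency described above can be sketched numerically. The following is a minimal illustration, not a model of any particular memory controller: the function names and the queued transfer sizes are hypothetical, and the only figures taken from the article are the approximate 500-Mbytes/s and 1.6-Gbytes/s bandwidths.

```python
def bus_occupancy_ns(transfer_bytes: float, bandwidth_mbytes_per_s: float) -> float:
    """Time the shared bus is busy moving one transfer, in nanoseconds."""
    bytes_per_ns = bandwidth_mbytes_per_s * 1e6 / 1e9
    return transfer_bytes / bytes_per_ns


def wait_before_service_ns(queued_transfer_bytes, bandwidth_mbytes_per_s: float) -> float:
    """Latency added to a new request by the transfers already ahead of it.

    Only one requester is served at a time, so the occupancy times add up.
    """
    return sum(bus_occupancy_ns(b, bandwidth_mbytes_per_s)
               for b in queued_transfer_bytes)


if __name__ == "__main__":
    queued = [32, 64, 32]  # hypothetical transfer sizes in bytes
    for bw in (500.0, 1600.0):  # Mbytes/s figures quoted in the article
        wait = wait_before_service_ns(queued, bw)
        print(f"{bw:6.0f} Mbytes/s: new request waits {wait:.0f} ns")
```

For the same queue of transfers, raising the bus bandwidth shortens the wait of every pending request in proportion, which is the sense in which bandwidth feeds directly into latency.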
In multimedia computing, bandwidth-dependent
latency is a dominant factor in
the memory subsystem's performance. The
focus of mass-market computing on
bandwidth-intensive, multimedia-oriented
applications further increases the memory
bandwidth requirements. So, in next-generation
multimedia PCs, memory bandwidth
will largely determine a user's perception of
interactivity and performance.
Traditional approaches to increasing memory
bandwidth include speeding up the memory
clock, increasing the bus width, or both.
For conventional DRAMs, these approaches
are reaching their practical limits.
Clock rate scaling. This approach is the
most technically challenging. The legacy
matrix interconnection topology of SDRAM-based
systems simply does not lend itself to
economic scaling beyond 100 MHz. Even
the transition from 66-MHz to 100-MHz system
operation is expected to be challenging
due to stringent system timing requirements
that dictate precise component and PCB
modeling.³
There are several classes of system nets in
a conventional memory system. For example,
an SDRAM-based system may have an
address net, a clock net, a data net, a DQM
net, and a control net (CS, WE, RAS, CAS)
(Figure 1). Each net has a loading and
settling time different from those of the other nets.
A key issue limiting memory bus frequency
in these systems is that the loading
on these nets increases nonuniformly from
net to net as memory modules are added to
the system (Figure 2a).
Motherboards are designed to operate
reliably at both minimum and maximum system
memory capacities. The system timing
depends on the signal loading, which, in
turn, depends on the number and storage
capacity of modules inserted. Since the delay
of the various nets scales nonuniformly, the
system's timing margin degrades.
IEEE Micro
0272-1732/97/$10.00 © 1997 IEEE
Providing three times the memory bandwidth of the 66-MHz SDRAM
subsystem, Direct RDRAM modules fit seamlessly into the
existing mechanical space and airflow environment of the
industry-standard PC chassis.
DIRECT RAMBUS TECHNOLOGY: THE NEW MAIN MEMORY STANDARD
Richard Crisp
Rambus Inc.
Another frequency-limiting factor in SDRAM-based systems
results from the fact that the SDRAM modules (DIMMs)
are connected in parallel to the primary bus transmission
lines routed on the motherboard. Because each DIMM signal
has either a heavy capacitive load or a long module routing
trace (or both), each DIMM signal represents a significant
stub load on the motherboard. These stubs cause troublesome
signal reflections if left unterminated.
Systems that operate significantly faster than 66 MHz need
faster DRAMs to deliver balanced performance. Often the
DRAM modules must be buffered, either on the module or
the motherboard. Though buffered modules reduce the
dependence of the motherboard timing on the module loading,
and reduce the effect of the stubs, they have disadvantages.
Buffered modules require additional components,
PCB area, routing, and system power. They also add one or
two clock cycles to every memory access, depending on the
extent of the buffering.
Data transfer. A second approach to increasing memory
bandwidth is transferring memory data on both clock edges
without changing the properties of any other nets. Since the
address net has the highest loading-dependent delay, leaving
that network unchanged simplifies the design task. Yet
one of the critical problems is meeting the required setup
and hold specifications for the data bus at each device.
Changing to a rising- and falling-edge clocked data bus
necessarily requires improved clock access time specifications.
Current SDRAMs require nearly a whole clock cycle to
establish valid data on the output pins. For example, an
SDRAM with a 10-ns cycle time has a worst-case output delay
from the rising clock edge of 9 ns. To trigger the output
buffer to drive data on both clock edges while maintaining
a 10-ns minimum clock period, such SDRAMs must reduce
their output delay by at least a factor of two.
Due to the difficulty in meeting bus-timing constraints, the
maximum system clock frequency must be reduced from that
of a single-edge clocked system to avoid violating critical
timing specifications.⁴ Any clock rate reduction would therefore
come at the expense of memory control bandwidth. In
many cases, memory control bandwidth limits performance
because each memory word may come from a nearby but
different address such as in the nonsequential accesses
characteristic of texture map rendering.
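The output-delay requirement for double-edge clocking described above can be checked with simple arithmetic. This is a simplified sketch: it assumes the only constraint is that data launched on one driving edge must be valid before the next driving edge, and it ignores setup time, hold time, and signal flight time. The function name is hypothetical; the 10-ns period and 9-ns output delay are the article's example values.

```python
def output_window_ns(clock_period_ns: float, driving_edges_per_cycle: int) -> float:
    """Time available between consecutive data-driving clock edges."""
    return clock_period_ns / driving_edges_per_cycle


if __name__ == "__main__":
    period_ns = 10.0        # SDRAM cycle time from the article's example
    output_delay_ns = 9.0   # worst-case clock-to-output delay from the example

    single_edge = output_window_ns(period_ns, 1)
    double_edge = output_window_ns(period_ns, 2)

    print("single-edge window:", single_edge, "ns, fits:", output_delay_ns <= single_edge)
    print("double-edge window:", double_edge, "ns, fits:", output_delay_ns <= double_edge)
    # Halving the output delay is the minimum needed to fit the shorter window.
    print("halved delay fits:", output_delay_ns / 2 <= double_edge)
```

Under this simplification, the 9-ns part fits a single-edge 10-ns cycle but not the window between a rising and a falling edge, which is why the output delay has to drop by at least a factor of two.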
Increased bus width. A third scaling
approach involves increasing the
bus width to 128 bits. Although a simple
idea electrically, it comes at the
expense of doubling pins, memory,
word width, I/O power, and memory
granularity. Furthermore, doubling the
bus width creates a host of mechanical
and PCB layout problems.
Since a wide, high-speed bus can
generate large transient currents in the
driver elements, a significant number
of ground and power pins are needed
on the controller to support a large
number of bus I/O pins. Because a
64-bit SDRAM or page-mode/EDO
bus interface typically uses between 110 and 130 pins (counting
power and ground pins), a 128-bit-wide bus will have
significantly more than 200 pins.
Core logic chipsets used in PCs have ports for the CPU, the
graphics, one or more system I/O buses, and a 64-bit memory
interface. These core logic chips require as many as 472
pins today,⁵ so doubling the memory interface pins is not an
attractive option. The extra pins take more silicon area,
increase package cost, and increase on-chip supply noise.
Wide buses also consume more power. For example, a 128-bit
LVTTL bus operating at 100 MHz and driving a 3.3-V
swing into an 80-pF load/pin consumes over 5.5 watts versus
2.75 watts for a 64-bit bus.
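The power figures above are consistent with the standard dynamic switching power relation P = N · a · C · V² · f. The sketch below is an illustration under a stated assumption, not the author's calculation: it assumes each pin completes one full charge and discharge cycle every two clock periods (data alternating every cycle on a single-edge bus), which gives an activity factor a of 0.5. The function and parameter names are hypothetical.

```python
def bus_switching_power_w(pins: int, load_farads: float, swing_volts: float,
                          clock_hz: float, activity: float = 0.5) -> float:
    """Dynamic power of a bus: pins * activity * C * V^2 * f.

    activity=0.5 is an assumption: one charge/discharge cycle per pin
    every two clock periods.
    """
    return pins * activity * load_farads * swing_volts ** 2 * clock_hz


if __name__ == "__main__":
    for width in (64, 128):
        watts = bus_switching_power_w(width, 80e-12, 3.3, 100e6)
        print(f"{width}-bit bus: {watts:.2f} W")
```

With these inputs the 128-bit result lands just above 5.5 W and the 64-bit result at half of that, close to the article's 2.75-W figure. Power scales linearly with the pin count, so doubling the bus width doubles the I/O power.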
Increasing the bus width also increases the memory granularity.
For a 128-bit bus using 64-Mbit devices (4M × 16), the
minimum memory capacity is 64 Mbytes. If ×8 devices are
used, the granularity jumps to 128 Mbytes. When the 256-Mbit
generation reaches cost parity with the 64-Mbit generation,
the granularity issue will worsen by a factor of four. The
granularity issue is particularly important in applications that
require only a small amount of high-bandwidth memory, such
as 3D graphics and DVD playback.
November/December 1997
Figure 1. Nets in a conventional DRAM system. Labels: memory control, control, data/DQM, addr/clock, conventional DRAM modules.
Figure 2. Signal loading versus memory capacity for SDRAM (a) and Direct RDRAM (b) systems. Both panels plot loading and delay against memory capacity. Panel (a) labels the address, clock, data, and control nets separately; panel (b) labels them together.
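The granularity figures quoted above for a 128-bit bus follow from the number of devices needed to fill the bus width. A minimal sketch, with hypothetical function and parameter names, assuming a single rank of identical devices:

```python
def min_capacity_mbytes(bus_width_bits: int, device_width_bits: int,
                        device_density_mbits: int) -> int:
    """Smallest memory that fills the bus with one rank of identical devices."""
    devices = bus_width_bits // device_width_bits
    return devices * device_density_mbits // 8  # 8 bits per byte


if __name__ == "__main__":
    print("64-Mbit x16:", min_capacity_mbytes(128, 16, 64), "Mbytes")
    print("64-Mbit x8: ", min_capacity_mbytes(128, 8, 64), "Mbytes")
    print("256-Mbit x16:", min_capacity_mbytes(128, 16, 256), "Mbytes")
```

Because the device count is fixed by the bus width, the minimum capacity grows in direct proportion to device density, which is why the move from 64-Mbit to 256-Mbit parts worsens granularity by a factor of four.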
Direct Rambus technology
Our solution, the Direct Rambus DRAM (RDRAM), takes
another approach that provides 1.6-Gbytes/s bandwidth from
a single DRAM. It nears 95% efficiency when subjected to
typical multimedia PC main memory
workloads. Using a 16-bit data field
and a separate 8-bit address and control
field, a Direct RDRAM independently
controls and schedules all row
and column resources as well as I/O
data. Direct RDRAMs, while using
conventional PCB and connector
technology, bring high speed and
low-power operating modes to serve
the needs of both line-operated and
portable products.
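The 1.6-Gbytes/s figure and the "three times the 66-MHz SDRAM subsystem" comparison can be reproduced from data width and transfer rate. In this sketch the 16-bit data field and the 64-bit, 66-MHz SDRAM bus come from the article; the Direct RDRAM transfer rate of 800 million transfers per second is an assumption, since this section does not state it. Function and variable names are hypothetical.

```python
def peak_bandwidth_mbytes_per_s(data_width_bits: int, transfers_per_second: float) -> float:
    """Peak bandwidth = bytes per transfer * transfers per second."""
    return data_width_bits / 8 * transfers_per_second / 1e6


if __name__ == "__main__":
    # Assumption: 800 million transfers/s on the 16-bit Direct RDRAM data field.
    rdram = peak_bandwidth_mbytes_per_s(16, 800e6)
    # 64-bit SDRAM bus, one transfer per 66-MHz clock.
    sdram = peak_bandwidth_mbytes_per_s(64, 66e6)
    print(f"Direct RDRAM: {rdram:.0f} Mbytes/s")
    print(f"66-MHz SDRAM: {sdram:.0f} Mbytes/s")
    print(f"ratio: {rdram / sdram:.2f}")
```

The comparison shows how a narrow bus at a high transfer rate can out-deliver a bus four times wider running at a conventional clock.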
Our technology uses a narrow bus
topology operating at a high clock
rate to solve the memory bandwidth
problem. A Direct Rambus channel
includes a controller and one or