Distributed Hot Swap Management in an Embedded System

The Host then has an
opportunity to prepare the board before
initiating the second phase by lighting a
blue LED on the board, which indicates
that it is now safe to fully extract the
board. Hot Swap thus increases the avail-
ability of CompactPCI systems by allow-
ing the removal and replacement of
peripheral boards without the need to
power down the entire chassis. Unfor-
tunately, the System Host is still a single
point of failure. If the System Host fails or
needs to be upgraded, the entire chassis
must still be powered down in order for it
to be removed and replaced.
The Hot Swap specification imposes cer-
tain requirements on the software running
on the System Host. In particular, when
detecting that a peripheral board is about
to be removed, the System Host is re-
quired to ensure that the board being
extracted no longer accesses the PCI bus,
and that the board is not accessed from the
PCI bus. This is necessary to ensure the
integrity of signals on the bus as the board
is removed. This requirement is reason-
ably easy to meet when all communica-
tion is between the peripheral board and
the System Host, but significantly harder
if there is direct communication between
peripheral boards.
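The ordering that this requirement imposes on the Host can be sketched as follows. This is a minimal Python model rather than driver code, and it assumes a simplified view of the hardware: the `Board` class, its flags, and `handle_extraction_request` are hypothetical names, and real Host software would manipulate PCI configuration registers instead of Python attributes.

```python
from dataclasses import dataclass


@dataclass
class Board:
    """Hypothetical model of a peripheral board as seen by the System Host."""
    name: str
    bus_master_enabled: bool = True     # board may initiate PCI transactions
    host_mappings_active: bool = True   # other agents may still target the board
    blue_led_on: bool = False           # lit only when extraction is safe


def handle_extraction_request(board: Board) -> None:
    """Quiesce traffic in both directions, then signal 'safe to extract'."""
    # Stop the board from accessing the PCI bus ...
    board.bus_master_enabled = False
    # ... and stop anything on the bus from accessing the board.
    board.host_mappings_active = False
    # Only once both conditions hold may the blue LED be lit.
    if not board.bus_master_enabled and not board.host_mappings_active:
        board.blue_led_on = True


board = Board("slot-3")
handle_extraction_request(board)
```

With direct board-to-board traffic, the second step would have to reach every peer that might address the board, which is what makes the requirement harder to meet.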
An alternative approach
In many CompactPCI systems, periph-
eral boards have a reasonable amount of
intelligence. They may have fully func-
tional CPUs and even run Operating
Systems. For these boards, it is natural
that they communicate directly with one
another.
In systems with a large amount of data to
process in a short period of time, the band-
width of the CompactPCI bus may not be
sufficient. This is exacerbated by the fact
that the available bandwidth is shared
between all the peripheral boards and the
System Host.
In such a system, it may make sense for
peripheral boards to communicate directly
via some other medium, quite possibly in
a point-to-point manner. This is the impe-
tus behind PICMG 2.16, the forthcoming
PICMG 2.18, and other related standards
that introduce alternative communication
links between peripheral boards. In sys-
tems such as these, the CompactPCI back-
plane may be little more than a source of
power and a clock. A clock signal can eas-
ily be generated on the peripheral boards
if it is not present on the backplane but is
needed for a local PCI bus on the board.
This leaves the non-redundant System
Host with just one job, supporting Hot
Swap of the peripheral boards.
Removing the System Host
If the work involved in handling Hot Swap
events could be distributed to the periph-
eral boards, the System Host could be
removed completely, eliminating the sin-
gle point of failure.
Many modern embedded systems must minimize downtime caused by failures, maintenance, or upgrades. This need is
addressed by permitting insertion and removal of boards while the system is active. Both CompactPCI and VME systems
support such live insertion and/or removal of boards, known as Hot Swap. It is important that notification of such a change in
configuration is communicated appropriately throughout the system so that it can react properly, perhaps by redistribut-
ing work between the available resources or shutting down any direct communication between boards before the boards
involved are physically removed from the system. In a CompactPCI system, the notification takes the form of an interrupt
to a System Host that takes the appropriate action. However, not only does the System Host constitute a single point of
failure, but in many systems where there is no other traffic over the CompactPCI bus, handling this interrupt may be the
only reason for the System Host to be present at all. Elimination of this centralized control function would therefore lead
to a cheaper, more robust and optimal system architecture.
In an embedded multiprocessor system, entities that need to know about insertion or removal events may be distributed
throughout the system. Such entities may care about Hot Swap events in one of two ways. They may be interested in
knowing about a Hot Swap event within some defined time of it having occurred, or they may need to know about an
impending Hot Swap event and take some action before it occurs. In the former case, asynchronous notification of
completion of the event is sufficient; the latter case, however, demands synchronous action on the part of the interested entity.
The mechanism used for both cases should ideally provide complete decoupling of interested entities and sources of Hot
Swap events and achieve an acceptable response time so that completion and notification of events is not unduly delayed.
This article discusses how a modified version of the Real-Time CORBA Notification Service may be used to meet these
requirements.
By Chris Brand and Geoff Holt
Reprinted from CompactPCI Systems / November 2003
Copyright 2003
In a CompactPCI Hot Swap system, per-
ipheral boards have a micro-switch at-
tached to the lower board latch. When the
latch changes state between open and
closed, the System Host is interrupted and
can deal with the event. A Hot Swap Control
and Status Register in the configuration
space of the device is used to communicate
the details of the event. With a CPU
on the peripheral board, it is relatively
simple to implement similar functionality
entirely on the board itself. Opening or
closing the latch generates an interrupt to
the CPU on the peripheral board. That
CPU can query the hardware to determine
exactly what happened and how to re-
spond. It is even possible to implement a
register that behaves exactly like a Hot
Swap Control and Status Register, in
which case the software that runs on the
local CPU to handle Hot Swap events may
be exactly the same software that usually
runs on the System Host. This allows each
peripheral board to independently detect
that it is about to be removed.
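A register of this kind could be emulated in software along the following lines. This is a sketch under stated assumptions: the bit positions follow the Hot Swap Control and Status Register layout as it is commonly described for PICMG 2.1 and should be checked against the specification, and the `EmulatedHsCsr` class and its method names are hypothetical.

```python
# Bit positions are assumptions based on the commonly described HS_CSR layout;
# verify them against the Hot Swap specification before relying on them.
EIM = 1 << 1  # interrupt mask (read/write)
LOO = 1 << 3  # blue LED on/off (read/write)
EXT = 1 << 6  # extraction pending (write 1 to clear)
INS = 1 << 7  # insertion pending (write 1 to clear)


class EmulatedHsCsr:
    """Hypothetical software register mimicking HS_CSR on a board with its own CPU."""

    def __init__(self) -> None:
        self.value = 0

    def latch_changed(self, latch_closed: bool) -> bool:
        """Called from the latch interrupt; returns True if software should be notified."""
        self.value |= INS if latch_closed else EXT
        return not (self.value & EIM)

    def read(self) -> int:
        return self.value

    def write(self, data: int) -> None:
        # EIM and LOO behave as ordinary read/write bits.
        rw_mask = EIM | LOO
        self.value = (self.value & ~rw_mask) | (data & rw_mask)
        # INS and EXT are cleared by writing 1 to them.
        self.value &= ~(data & (INS | EXT))


csr = EmulatedHsCsr()
csr.latch_changed(latch_closed=False)  # operator opens the latch
csr.write(EXT | LOO)                   # acknowledge the event and light the LED
```

Because the emulated register presents the same read and write-one-to-clear behaviour, handler code written against the real register would not need to know which of the two it is talking to.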
In a system with no System Host, there
can be no communication over the Com-
pactPCI backplane, so there is nothing
there that needs stopping. There may, of
course, be other communication links that
need to be moved to a quiescent state.
Insertion must be handled as well as
extraction. An insertion may look exactly
like a power-up: the board moves from an
unpowered state to one where it has power.
It is also possible, however, that the latch
was opened, causing the extraction
processing to be performed, and then
re-closed without the board actually being
extracted and power-cycled. This change in
state needs to be handled in much the same
way as when the board first powers up.
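One way to make the two insertion paths share their handling is a small state machine on the local CPU. The sketch below is illustrative only: the state names, the `BoardLifecycle` class, and the `startups` counter are hypothetical and stand in for whatever start-up and shutdown work a real board performs.

```python
from enum import Enum, auto


class State(Enum):
    STARTING = auto()   # bringing links and services up
    ACTIVE = auto()     # in service
    QUIESCED = auto()   # latch open, links shut down, safe to extract


class BoardLifecycle:
    """Hypothetical lifecycle tracker run by the board's own CPU."""

    def __init__(self) -> None:
        # A genuine power-up always enters the start-up path.
        self.state = State.STARTING
        self.startups = 0
        self._start()

    def _start(self) -> None:
        # Shared by power-up and by latch re-closure.
        self.startups += 1
        self.state = State.ACTIVE

    def latch_opened(self) -> None:
        if self.state is State.ACTIVE:
            # Extraction processing (quiescing links) would happen here.
            self.state = State.QUIESCED

    def latch_closed(self) -> None:
        # The board was never pulled, so no power cycle occurred;
        # reuse the same start-up path as a first power-up.
        if self.state is State.QUIESCED:
            self.state = State.STARTING
            self._start()


board = BoardLifecycle()
board.latch_opened()
board.latch_closed()   # back in service without a power cycle
```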
It is possible to design a board that detects
if there is a clock on the CompactPCI bus.
If it detects such a clock, it should meet all
the normal CompactPCI requirements. If
it does not detect such a clock, it can
assume that the CompactPCI backplane
is not being used for communication and
that there is no System Host present. In
this case, the local CPU can take over the
handling of Hot Swap management tasks
as described previously.
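The decision described here reduces to a single start-up check. In the sketch below, `backplane_clock_present` stands for the result of a hardware probe that the article does not detail, and `HotSwapMode` and `select_hot_swap_mode` are hypothetical names.

```python
from enum import Enum


class HotSwapMode(Enum):
    HOST_MANAGED = "host"    # behave as a normal CompactPCI peripheral
    SELF_MANAGED = "local"   # local CPU handles Hot Swap events


def select_hot_swap_mode(backplane_clock_present: bool) -> HotSwapMode:
    """Choose who manages Hot Swap, based on whether a bus clock was detected."""
    if backplane_clock_present:
        # A clock implies the backplane is in use and a System Host may be present.
        return HotSwapMode.HOST_MANAGED
    # No clock: assume no System Host and take over management locally.
    return HotSwapMode.SELF_MANAGED
```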
Types of interaction with Hot Swap events
In an embedded system with no central-
ized control, entities that care about inser-
tion or removal events (Hot Swap event
consumers) may be distributed on differ-
ent heterogeneous processors throughout
the system.
Such consumers may care about Hot
Swap events in one of two ways. They
may merely need to react to a Hot Swap
event after it has happened. For example,
different consumers may need to:
- Initialize hardware that has been
  inserted.
- Instantiate proxies for hardware that
  has been inserted.
- Update the system state on a user
  interface.
Although an asynchronous mechanism is
sufficient, and in fact desirable, for these
types of notifications, it still needs to be
efficient
enough to achieve acceptable response
times.
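The asynchronous style of notification can be illustrated with in-process queues. This is only a sketch of the decoupling idea, not of the notification service discussed in this article: `AsyncNotifier`, `consume`, and the consumer and event names are hypothetical, and a distributed system would deliver events across processors rather than between threads.

```python
import queue
import threading
from typing import List, Optional, Tuple


class AsyncNotifier:
    """Hypothetical fire-and-forget channel: publish() never waits for consumers."""

    def __init__(self) -> None:
        self._inboxes: List[queue.Queue] = []

    def subscribe(self) -> queue.Queue:
        inbox: queue.Queue = queue.Queue()
        self._inboxes.append(inbox)
        return inbox

    def publish(self, event: Optional[str]) -> None:
        for inbox in self._inboxes:
            inbox.put(event)   # enqueue and return; delivery happens later

    def close(self) -> None:
        self.publish(None)     # None is the shutdown sentinel


def consume(name: str, inbox: queue.Queue, log: List[Tuple[str, str]]) -> None:
    """Consumer loop: react to each event after it has happened."""
    while True:
        event = inbox.get()
        if event is None:
            return
        log.append((name, event))


def demo() -> List[Tuple[str, str]]:
    notifier = AsyncNotifier()
    log: List[Tuple[str, str]] = []
    threads = []
    for name in ("driver-init", "proxy-factory", "ui"):
        t = threading.Thread(target=consume, args=(name, notifier.subscribe(), log))
        t.start()
        threads.append(t)
    notifier.publish("inserted:slot-3")
    notifier.close()
    for t in threads:
        t.join()
    return sorted(log)
```

The source of the event does not know how many consumers exist or how long each takes, which is the property that matters for this class of notification.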
Alternatively, they may need to be directly
involved in an impending Hot Swap
extraction event in order to perform some
processing before the extraction actually
happens. For example, different consumers
may need to:
- Prepare themselves and save their state
  if they are physically located on the
  hardware to be removed.
- Shut down communications if
  they have a direct connection to the
  hardware to be removed.
- Reconfigure the application if they
  are acting in some type of system
  supervisory role.
Clearly a synchronous mechanism is re-
quired to notify all directly involved enti-
ties in a timely manner and ensure that
they have completed their processing
before extraction is allowed to proceed.
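The synchronous case differs in that the source must wait for every consumer to finish before extraction proceeds. A minimal sketch, assuming each consumer is a simple callable and that a timeout bounds the wait (the function name, consumer names, and timeout policy are all hypothetical):

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict


def request_extraction(
    consumers: Dict[str, Callable[[str], None]], board: str, timeout_s: float
) -> bool:
    """Return True only if every consumer finished preparing within the timeout."""
    if not consumers:
        return True
    pool = ThreadPoolExecutor(max_workers=len(consumers))
    try:
        # Notify all consumers in parallel so one slow consumer does not delay the rest.
        futures = [pool.submit(callback, board) for callback in consumers.values()]
        done, not_done = wait(futures, timeout=timeout_s)
        # Proceed only if nothing is still running and nothing failed.
        return not not_done and all(f.exception() is None for f in done)
    finally:
        pool.shutdown(wait=False)


def save_state(board: str) -> None:
    time.sleep(0.01)   # stand-in for saving state held on the board


def close_links(board: str) -> None:
    time.sleep(0.01)   # stand-in for shutting down direct connections


safe = request_extraction(
    {"state-saver": save_state, "link-manager": close_links}, "slot-3", timeout_s=1.0
)
```

Only when the call returns True would the board go on to indicate that extraction is safe; what to do on a timeout is a policy decision outside the scope of this sketch.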
General requirements for decentralized Hot Swap event notification
In order to support decentralized Hot
Swap in a heterogeneous computing envi-
ronment, an event notification mechanism
is required that:
- Provides both efficient synchronous
  and asynchronous notification
  services (while not required, it is
  desirable that these services are
  accessed via a common interface).
- Supports diverse target processors.
- Supports location transparency so
  that events are distributed identically
  irrespective of whether they are local
  to the hardware in question, within
  the same chassis, or somewhere else
  in an arbitrarily large system.
- Provides loose (i.e., determined at
  run-time) coupling between sources
  and consumers of Hot Swap events
  (since system configuration is by
  definition changing on the fly).
- Is