A Comparison of Bus Architectures for Safety-Critical Embedded Systems

CSL Technical Report, September 2001
Minor revision June 2002
John Rushby
Computer Science Laboratory
SRI International
Menlo Park CA 94025 USA
This research was supported by NASA Langley Research Center under contract
NAS1-20334 and Cooperative Agreement NCC-1-377 with Honeywell Tucson, and
by the DARPA MoBIES program under contract F33615-00-C-1700 with US Air
Force Research Laboratory.
Computer Science Laboratory, 333 Ravenswood Ave., Menlo Park, CA 94025. Telephone: (650) 326-6200. Facsimile: (650) 859-2844
Abstract
Avionics and control systems for aircraft use distributed, fault-tolerant computer
systems to provide safety-critical functions such as flight and engine control. These
systems are becoming modular, meaning that they are based on standardized architectures
and components, and integrated, meaning that some of the components are shared by
different functions, possibly of different criticality levels.
The modular architectures that support these functions must provide mechanisms for
coordinating the distributed components that provide a single function (e.g., distributing
sensor readings and actuator commands appropriately, and assisting replicated components
to perform the function in a fault-tolerant manner), while protecting functions from faults
in each other. Such an architecture must tolerate hardware faults in its own components and
must provide very strong guarantees on the correctness and reliability of its own mecha-
nisms and services.
One of the essential services provided by this kind of modular architecture is communi-
cation of information from one distributed component to another, so a (physical or logical)
communication bus is one of its principal components, and the protocols used for control
and communication on the bus are among its principal mechanisms. Consequently, these
architectures are often referred to as buses (or databuses), although this term understates
their complexity, sophistication, and criticality.
The capabilities once found in aircraft buses are becoming available in buses aimed at
the automobile market, where the economies of scale ensure low prices. The low price of
the automobile buses then renders them attractive to certain aircraft applications, provided
they can achieve the safety required.
In this report, I describe and compare the architectures of two avionics and two auto-
mobile buses in the interest of deducing principles common to all of them, the main differ-
ences in their design choices, and the tradeoffs made. The avionics buses considered are
the Honeywell SAFEbus (the backplane data bus used in the Boeing 777 Airplane Informa-
tion Management System) and the NASA SPIDER (an architecture being developed as a
demonstrator for certification under the new DO-254 guidelines); the automobile buses
considered are the TTTech Time-Triggered Architecture (TTA), recently adopted by Audi for
automobile applications, and by Honeywell for avionics and aircraft control functions, and
FlexRay, which is being developed by a consortium of BMW, DaimlerChrysler, Motorola,
and Philips.
I consider these buses from the perspective of their fault hypotheses, mechanisms, ser-
vices, and assurance.
Contents
List of Figures
1 Introduction
2 Comparison
  2.1 The Four Buses
    2.1.1 SAFEbus
    2.1.2 TTA
    2.1.3 SPIDER
    2.1.4 FlexRay
  2.2 Fault Hypothesis and Fault Containment Units
    2.2.1 SAFEbus
    2.2.2 TTA
    2.2.3 SPIDER
    2.2.4 FlexRay
  2.3 Clock Synchronization
    2.3.1 SAFEbus
    2.3.2 TTA
    2.3.3 SPIDER
    2.3.4 FlexRay
  2.4 Bus Guardians
    2.4.1 SAFEbus
    2.4.2 TTA
    2.4.3 SPIDER
    2.4.4 FlexRay
  2.5 Startup and Restart
    2.5.1 SAFEbus
    2.5.2 TTA
    2.5.3 SPIDER
    2.5.4 FlexRay
  2.6 Services
    2.6.1 SAFEbus
    2.6.2 TTA
    2.6.3 SPIDER
    2.6.4 FlexRay
  2.7 Flexibility
    2.7.1 SAFEbus
    2.7.2 TTA
    2.7.3 SPIDER
    2.7.4 FlexRay
  2.8 Assurance
    2.8.1 SAFEbus
    2.8.2 TTA
    2.8.3 SPIDER
    2.8.4 FlexRay
3 Conclusion
Bibliography
List of Figures
1.1 Generic Bus Configuration
1.2 Bus Interconnect
1.3 Star Interconnect
1.4 SPIDER Interconnect
Chapter 1
Introduction
Embedded systems generally operate as closed-loop control systems: they repeatedly sam-
ple sensors, calculate appropriate control responses, and send those responses to actuators.
In safety-critical applications, such as fly- and drive-by-wire (where there are no direct
connections between the pilot and the aircraft control surfaces, nor between the driver and the
car steering and brakes), requirements for ultra-high reliability demand fault tolerance and
extensive redundancy. The embedded system then becomes a distributed one, and the basic
control loop is complicated by mechanisms for synchronization, voting, and redundancy
management.
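The basic loop, with the simplest form of redundancy management added, can be sketched as follows. This is only an illustrative sketch under assumptions of my own choosing, not a description of any of the buses considered here: the names vote and control_step, the proportional control law, and the three simulated sensors are all hypothetical, and a mid-value select (median) over three replicated readings stands in for the voting mechanisms that real architectures provide.

```python
from statistics import median


def vote(readings):
    """Mid-value select: with an odd number of redundant readings,
    the median masks a single arbitrarily wrong value."""
    return median(readings)


def control_step(sensors, setpoint, gain):
    """One iteration of the closed loop: sample every redundant sensor,
    vote on the readings, and compute the actuator command.
    The proportional law used here is an assumption for illustration."""
    readings = [read() for read in sensors]
    measured = vote(readings)
    return gain * (setpoint - measured)


# Three hypothetical replicated sensors; the third has failed and
# reports a wildly wrong value.
sensors = [lambda: 10.0, lambda: 10.25, lambda: 99.0]

# The faulty reading is outvoted, so the command is computed from 10.25.
command = control_step(sensors, setpoint=12.0, gain=0.5)
```

In a real distributed system the replicated readings arrive over the bus from separate components, so the vote is meaningful only if all replicas sample at about the same time and every voter sees the same set of values; providing those guarantees is exactly what complicates the basic loop.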
Systems used in safety-critical applications have traditionally been federated, meaning
that each function (e.g., autopilot or autothrottle in an aircraft, and brakes or suspension
in a car) has its own fault-tolerant embedded control system with only minor interconnec-
tions to the systems of other functions. This provides a strong barrier to fault propagation:
because the systems supporting different functions do not share resources, the failure of
one function has little effect on the continued operation of others. The federated approach
is expensive, however (because each function has its own replicated system), so recent ap-
plications are moving toward more integrated solutions in which some resources are shared
across different functions. The new danger here is that faults may propagate from one func-
tion to another; partitioning is the problem of restoring to integrated systems the strong de-
fenses against fault propagation that are naturally present in federated systems. A dual issue
is that of strong composability: here we would like to take separately developed functions
and have them run without interference on an integrated system platform with negligible
integration effort.
The problems of fault tolerance, partitioning, and strong composability are challenging
ones. If handled in an ad-hoc manner, their mechanisms can become the primary sources of
faults and of unreliability in the resulting architecture [Mac88]. Fortunately, most aspects
of these problems are independent of the particular functions concerned, and they can be
handled in a principled and correct manner by generic mechanisms implemented as an
architecture for distribut