A review of process fault detection and diagnosis Part I: Quantitative ...

Venkat Venkatasubramanian a,*, Raghunathan Rengaswamy b,*, Kewen Yin c, Surya N. Kavuri d
a Laboratory for Intelligent Process Systems, School of Chemical Engineering, Purdue University, West Lafayette, IN 47907, USA
b Department of Chemical Engineering, Clarkson University, Potsdam, NY 13699-5705, USA
c Department of Wood and Paper Science, University of Minnesota, St. Paul, MN 55108, USA
d BP, Houston, TX, USA
Received 12 February 2001; accepted 22 April 2002
Abstract
Fault detection and diagnosis is an important problem in process engineering. It is the central component of abnormal event
management (AEM), which has attracted a lot of attention recently. AEM deals with the timely detection, diagnosis and correction
of abnormal conditions or faults in a process. Early detection and diagnosis of process faults while the plant is still operating in a
controllable region can help avoid abnormal event progression and reduce productivity loss. Since the petrochemical industries lose
an estimated 20 billion dollars every year, they have rated AEM as their number one problem that needs to be solved. Hence, there is
considerable interest in this field now from industrial practitioners as well as academic researchers, as opposed to a decade or so ago.
There is an abundance of literature on process fault diagnosis ranging from analytical methods to artificial intelligence and statistical
approaches. From a modelling perspective, there are methods that require accurate process models, semi-quantitative models, or
qualitative models. At the other end of the spectrum, there are methods that do not assume any form of model information and rely
only on historic process data. In addition, given the process knowledge, there are different search techniques that can be applied to
perform diagnosis. Such a bewildering array of methodologies and alternatives often poses a difficult challenge to any
aspirant who is not a specialist in these techniques. Some of these ideas seem so far apart from one another that a non-expert
researcher or practitioner is often left wondering about the suitability of a method for his or her diagnostic situation. While there
have been some excellent reviews in this field in the past, they often focused on a particular branch, such as analytical models, of this
broad discipline. The basic aim of this three-part series of papers is to provide a systematic and comparative study of various
diagnostic methods from different perspectives. We broadly classify fault diagnosis methods into three general categories and review
them in three parts. They are quantitative model-based methods, qualitative model-based methods, and process history-based
methods. In the first part of the series, the problem of fault diagnosis is introduced and approaches based on quantitative models are
reviewed. In the remaining two parts, methods based on qualitative models and process history data are reviewed. Furthermore,
these disparate methods will be compared and evaluated based on a common set of criteria introduced in the first part of the series.
We conclude the series with a discussion on the relationship of fault diagnosis to other process operations and on emerging trends
such as hybrid blackboard-based frameworks for fault diagnosis.
© 2002 Published by Elsevier Science Ltd.
Keywords: Fault detection; Diagnosis; Process safety
* Corresponding authors. Tel.: +1-765-494-0734; fax: +1-765-494-0805 (V. Venkatasubramanian); Tel.: +1-315-268-4423; fax: +1-315-268-6654 (R. Rengaswamy).
E-mail addresses: venkat@ccn.purdue.edu (V. Venkatasubramanian), raghu@clarkson.edu (R. Rengaswamy).
Computers and Chemical Engineering 27 (2003) 293–311
www.elsevier.com/locate/compchemeng
0098-1354/03/$ - see front matter
PII: S0098-1354(02)00160-6
1. Introduction
The discipline of process control has made tremendous advances in the last three decades with the advent of computer
control of complex processes. Low-level control actions such as opening and closing valves, called regulatory control,
which used to be performed by human operators, are now routinely performed in an automated manner with the aid of
computers, with considerable success. With progress in distributed control and model predictive control systems, the
benefits to various industrial segments such as chemical, petrochemical, cement, steel, power and desalination
industries have been enormous. However, a very important control task in managing process plants still remains largely
a manual activity, performed by human operators. This is the task of responding to abnormal events in a process. It
involves the timely detection of an abnormal event, diagnosis of its causal origins, and then the taking of appropriate
supervisory control decisions and actions to bring the process back to a normal, safe operating state. This entire
activity has come to be called Abnormal Event Management (AEM), a key component of supervisory control.
However, this complete reliance on human operators to cope with such abnormal events and emergencies has become
increasingly difficult due to several factors. One is the broad scope of the diagnostic activity, which encompasses a
variety of malfunctions such as process unit failures, process unit degradation, parameter drifts and so on. It is
further complicated by the size and complexity of modern process plants. For example, in a large process plant there
may be as many as 1500 process variables observed every few seconds (Bailey, 1984), leading to information overload.
In addition, the emphasis is often on quick diagnosis, which imposes certain constraints and demands on the diagnostic
activity. Furthermore, the task of fault diagnosis is made difficult by the fact that the process measurements may
often be insufficient, incomplete and/or unreliable due to a variety of causes such as sensor biases or failures.
Given such difficult conditions, it should come as no surprise that human operators tend to make erroneous decisions
and take actions which make matters even worse, as reported in the literature. Industrial statistics show that about
70% of industrial accidents are caused by human errors. These abnormal events have significant economic, safety and
environmental impact. Despite advances in computer-based control of chemical plants, the fact that two of the worst
ever chemical plant accidents, namely Union Carbide's Bhopal, India, accident and Occidental Petroleum's Piper Alpha
accident (Lees, 1996), happened in recent times is a troubling development. Another major recent incident is the
explosion at the Kuwait Petrochemical's Mina Al-Ahmedi refinery in June of 2000, which resulted in about 100 million
dollars in damages.
Further, industrial statistics have shown that even though major catastrophes and disasters from chemical plant
failures may be infrequent, minor accidents are very common, occurring on a day-to-day basis, resulting in many
occupational injuries and illnesses, and costing society billions of dollars every year (Bureau of Labor Statistics,
1998; McGraw-Hill Economics, 1985; National Safety Council, 1999). It is estimated that the petrochemical industry
alone in the US incurs approximately 20 billion dollars in annual losses due to poor AEM (Nimmo, 1995). The cost is
much higher when one includes similar situations in other industries such as pharmaceutical, specialty chemicals,
power and so on. Similar accidents cost the British economy up to 27 billion dollars every year (Laser, 2000).
Thus, here is the next grand challenge for control engineers. In the past, the control community showed how
regulatory control could be automated using computers and thereby removed from the hands of human operators. This has
led to great progress in product quality and consistency, process safety and process efficiency. The current challenge
is the automation of AEM using intelligent control systems, thereby providing human operators with assistance in this
most pressing area of need. People in the process industries view this as the next major milestone in control systems
research and application.
The automation of process fault detection and diagnosis forms the first step in AEM. Due to the broad scope of the
process fault diagnosis problem and the difficulties in its real time solution, various computer-aided approaches have
been developed over the years. They cover a wide variety of techniques such as the early attempts using fault trees
and digraphs, analytical approaches, and knowledge-based systems and neural networks in more recent studies. From a
modelling perspective, there are methods that require accurate process models, semi-quantitative models, or
qualitative models. At the other end of the spectrum, there are methods that do not assume any form of model
information and rely only on process history information. In addition, given the process knowledge, there are
different search techniques that can be applied to perform diagnosis. Such a bewildering array of methodologies and
alternatives often poses a difficult challenge to any aspirant who is not a specialist in these techniques. Some of
these ideas seem so far apart from one another that a non-expert researcher or practitioner is often left wondering
about the suitability of a method for his or her diagnostic situation. While there have been some excellent reviews
in this field in the past, they often focused on a particular branch, such as analytical models, of this broad