PII: S0168-9274(98)00118-4

tioning techniques based on sparse approximate inverses are considered.
A description of the preconditioners is given, and the results of an experimental comparison performed on one
processor of a Cray C98 vector computer using sparse matrices from a variety of applications are presented.
A comparison with more standard preconditioning techniques, such as incomplete factorizations, is also included.
Robustness, convergence rates, and implementation issues are discussed.

1999 Elsevier Science B.V. and
IMACS. All rights reserved.
Keywords: Sparse linear systems; Sparse matrices; Preconditioned Krylov subspace methods; Incomplete
factorizations; Factorized approximate inverses; SPAI; FSAI; Incomplete biconjugation
1. Introduction
One of the most important problems in scientic computing is the development of efcient parallel
iterative solvers for large, sparse systems of linear equations Ax
= b. Krylov subspace methods,
which require at each iteration matrixvector products and a few vector operations (dot products,
vector updates), can be efciently implemented on high-performance computers, but they necessitate
preconditioning in order to be effective. Many of the most popular general-purpose preconditioners, such
as those based on incomplete factorizations of A, are fairly robust and result in good convergence rates,
but are highly sequential and it is difcult to implement them efciently on parallel computers, especially
for unstructured problems. Thus, preconditioning is currently the main stumbling block precluding high
performance in the solution of large, sparse linear systems. Corresponding author. E-mail: benzi@lanl.gov.
1
Work supported in part by the Department of Energy through grant W-7405-ENG-36 with Los Alamos National Laboratory.
2
E-mail: tuma@uivt.cas.cz. Work supported in part by the Grant Agency of the Czech Academy of Sciences through grants
No. 2030706 and 205/96/0921.
0168-9274/99/$20.00

1999 Elsevier Science B.V. and IMACS. All rights reserved.
PII: S 0 1 6 8 - 9 2 7 4 ( 9 8 ) 0 0 1 1 8 - 4 306
M. Benzi, M. Tuma / Applied Numerical Mathematics 30 (1999) 305340
Since the early days of vector and parallel processing, a great amount of work has been devoted to the
problem of extracting as much inherent parallelism as possible from the best serial preconditioners, such
as SSOR and incomplete factorization methods. As a result, good performance is possible for certain
classes of problems (e.g., highly structured matrices arising from the discretization of PDEs on regular
grids). On the other hand, it is still very difcult to solve general linear systems with an irregular sparsity
pattern efciently on vector and parallel computers.
Another line of research consists in developing alternative preconditioning methods which have natural
parallelism. Among the rst techniques of this type we mention polynomial preconditioners, which
are based on approximating the inverse of the coefcient matrix A with a low-degree polynomial in
the matrix. These methods have a long history (see, e.g., [20,24]), but came into vogue only after the
rst vector processors had become available [38,58]. Polynomial preconditioners only require matrix
vector products with A and therefore have excellent potential for parallelization, but they are not as
effective as incomplete factorization methods at reducing the number of iterations. With polynomial
preconditioning, the reduction in the number of iterations tends to be compensated by the additional
matrixvector products to be performed at each iteration. More precisely, it was shown by Axelsson [4]
that the cost per iteration increases linearly with the number m
+1 of terms in the polynomial, whereas the
number of iterations decreases more slowly than O(1/(m
+ 1)). Therefore, polynomial preconditioners
cannot be very effective, especially on serial and shared memory computers. For distributed memory
parallel machines they are slightly more attractive, due to the reduced number of dot products and matrix
accesses. Another attractive feature is that polynomial preconditioners require almost no additional
storage besides that needed for the coefcient matrix A and have very small set-up times. Also, they can
be used in a matrix-free context, and they are easily combined with other preconditioners. On the other
hand, polynomial preconditioners have serious difculties handling the case of matrices with general
complex spectra (i.e., with eigenvalues on both sides of the imaginary axis). In summary, it is fair to
say that polynomial preconditioning is unlikely to yield a satisfactory solution to the problem of high-
performance preconditioning for general sparse matrices.
In recent years there has been a growing interest in yet another class of algebraic preconditioners
sparse approximate inverses. These methods are based on approximating the inverse matrix directly,
as in polynomial preconditioning. However, with polynomial preconditioning the approximate inverse
is available only implicitly in the form of a polynomial in the coefcient matrix A; in contrast, with
sparse approximate inverse preconditioning a matrix M
A
1
is explicitly computed and stored. The
preconditioning operation reduces to a matrixvector product with M. Methods of this kind were rst
proposed in the early 1970s (see [10,48]), but they received little attention, due to the lack of effective
strategies for automatically determining a good nonzero pattern for the sparse approximate inverse.
Several such strategies have recently been developed, thus spurring renewed interest in this class of
preconditioners.
While parallel processing has been the main motivation driving the development of approximate
inverse techniques, there has been at least one other inuence. It is well known that incomplete
factorization techniques can fail on matrices which are strongly nonsymmetric and/or indenite. The
failure is usually due to some form of instability, either in the incomplete factorization itself (zero or very
small pivots), or in the back substitution phase, or both; see [30]. Most approximate inverse techniques
are largely immune from these problems, and therefore constitute an important complement to more
standard preconditioning methods even on serial computers. M. Benzi, M. Tuma / Applied Numerical Mathematics 30 (1999) 305340
307
We remark that approximate inverse techniques rely on the tacit assumption that for a given sparse
matrix
A, it is possible to nd a sparse matrix M which is a good approximation of A
1
. However,
this is not at all obvious, since the inverse of a sparse matrix is usually dense. More precisely, it can be
proved that the inverse of an irreducible sparse matrix is structurally dense. This means that for a given
irreducible sparsity pattern, it is always possible to assign numerical values to the nonzeros in such a way
that all entries of the inverse will be nonzero; see [40]. Nevertheless, it is often the case that many of the
entries in the inverse of a sparse matrix are small in absolute value, thus making the approximation of
A
1
with a sparse matrix possible. Recently, much research has been devoted to the difcult problem of
capturing the important entries of A
1
automatically.
There exist currently several alternative proposals for constructing sparse approximate inverse
preconditioners, a few of which have been compared with standard incomplete factorization methods.
However, a direct comparison between different approximate inverse techniques on a broad range of
problems is still lacking. The present paper attempts to ll this gap. Our main contribution consists in a
systematic computational study aimed at assessing the effectiveness of the various methods for different
types of problems. While we refrain from ranking the various methods in some linear ordering, which
would make little sense in a subject like preconditioning for general sparse matrices, we are able to
draw some tentative conclusions about the various methods and to provide some guidelines for their
use. In addition, we compare the approximate inverse techniques with various incomplete factorization
strategies, showing that under appropriate circumstances approximate inverse preconditioning can indeed
be superior to more established methods.
The paper is organized as follows. In Section 2 we give an overview of sparse approximate inverse
techniques. In Section 3 we provide some information on how these techniques were implemented for
our experiments. The results of numerical experiments, carried out on a Cray C98 vector computer, are
presented and discussed in Section 4. The main results of this study are summarized in Section 5. An
extensive list of references completes the paper.
2. Overview of approximate inverse methods
In this section we give an overview of approximate inverse techniques and some of their features.
Because the emphasis is on the practical use of these methods, we do not give a detailed description
of their theoretical properties, for which the interested reader is referred to the original papers. See
also the overviews in [5,29,71]. It is convenient to group the different methods into three categories.
First, we consider approximate inverse methods based on Frobenius norm minimization. Second, we
describe factorized sparse approximate inverses. Finally, we consider preconditioning methods which
consist of an incomplete factorization followed by an approximate inversion of the incomplete factors.
The approximate inversion can be achieved in many ways, each leading to a different preconditioner.
Throughout this section we shall be concerned with linear systems of the form Ax
= b where A is a real
n
n matrix and b is a given real