A Bayesian approach to crack detection in electrically conducting media

INSTITUTE OF PHYSICS PUBLISHING Inverse Problems 17 (2001) 121136 www.iop.org/Journals/ip INVERSE PROBLEMS PII: S0266-5611(01)16980-7 A Bayesian approach to crack detection in electrically conducting media
Kim E Andersen1,3 , Stephen P Brooks2 and Martin B Hansen1
1 Department of Mathematical Sciences, Aalborg University, Fredrik Bajers Vej 7E, DK-9220 Aalborg , Denmark 2 Statistical Laboratory, Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WB, UK E-mail: emil@math.auc.dk (K E Andersen) Received 9 September 2000, in final form 21 November 2000 Abstract In this paper, we review powerful new computational techniques which facilitate the Bayesian approach to statistical inference and discuss how they may be used to solve general inverse problems. Their power and flexibility is illustrated by the problem of detecting a finite set of linear non-intersecting perfectly insulating cracks in a homogeneously electrically conducting medium. In this case, efficient algorithms only exist if the number of cracks is known a priori. However, in this paper we demonstrate how uncertainty about the number of cracks can be incorporated into the modelling process and assessed together with crack locations. 1. Introduction It is often desirable to determine the interior structure of a given body in a non-destructive manner. Common non-destructive evaluation systems are based upon tomographic principles where the reconstruction is an inverse problem. The primary interest is in some object but data are only accessible about some transform F . The data are observed at m discrete points ti and are corrupted by noise, such that they take the form (t) = F (t) + (t), t = t1 , . . . , tm , (1) where (t) denotes a random noise process, typically assumed to be Gaussian. The most obvious approach for reconstructing is to estimate by ^ using standard statistical methods (Efromovich 1999) and then use ^ = F -1 ^ as an estimate of . However, such problems are generally ill posed (Hadamard 1923), which means that either the inverse operator F -1 does not exist, or it is an unbounded operator. In the latter case, even small perturbations of the data, and thereby ^ , may result in unacceptably large distortions of ^ . In order to avoid this instability Tikhonov regularization (Tikhonov 1963a, b) is often used. This is a method of balancing the degree of misfit of the solution and a functional J ( )
3 Author to whom correspondence should be addressed. 2001 IOP Publishing Ltd Printed in the UK 121 0266-5611/01/010121+16$30.00 122 K E Andersen et al which measures the `appropriateness' of the function . The specification of J is crucial and is usually chosen on the basis of prior information about the solution space. The Bayesian approach has therefore enjoyed some popularity as a tool for formalizing the incorporation of prior information and for properly accounting for uncertainty (Tarantola 1987). To add to this popularity, powerful new simulation techniques are now available which allow even more complicated and realistic models (see Mosegaard and Tarantola 1995, Mosegaard 1998, DahlJensen et al 1998, Fox and Nicholls 1997, Mosegaard and Rygaard-Hjalsted 1999, Kaipio et al 2000). In this paper we review some of the recent progress in Bayesian computation, focusing upon Markov chain Monte Carlo (MCMC) methods, and indicate how they may be used to solve general inverse problems. For illustration, we will consider the problem of detecting a finite set of linear non-intersecting perfectly insulating cracks in homogeneously conducting media. A variety of different imaging algorithms exist for non-destructive evaluation on the basis of electrical impedance tomography data. Many are general-purpose algorithms which seek to reconstruct the entire conductivity profile within the object (see e.g. Barber and Brown 1986, Santosa and Vogelius 1990, Cheney et al 1998). Such algorithms typically base all computations for the underlying conductance problem upon a two-dimensional finiteelement formulation. However, as illustrated by Bryan and Vogelius (1994), the efficiency and versatility of these algorithms can be improved dramatically when structural prior information about the profile is available. Bryan and Vogelius (1994) develop a crack detection algorithm based upon a variation of Newton's method in which an initial crack configuration 0 is iterated until a solution is met. However, this method has the drawback that the solution obtained is critically dependent upon the initial crack configuration, as the initial number of cracks remains fixed throughout the iterative estimation procedure. Since the number of cracks is rarely known a priori, ad hoc techniques are generally used to overcome this problem. In this paper, we show how problems associated with the a priori determination of the number of cracks may be overcome by explicitly incorporating this source of uncertainty within the modelling process. In this way we determine not only the number of cracks but also the crack locations in the media under study. In section 2 we summarize the mathematical model for the voltage potential in the presence of cracks in an electrically conducting media. In section 3 we introduce the Bayesian approach to statistical inference and the computational techniques required for its implementation. Specific implementational issues within the reconstruction algorithm are addressed in section 4. In section 5 we address the behaviour and performance of our approach and finally in section 6 we discuss in the light of the achieved results further potential developments. 2. Modelling boundary measurements in the presence of cracks Suppose that any crack lies within a two-dimensional simply connected conductor and is modelled as a perfectly insulating line segment. In this paper we assume that a given crack is parametrized by
(t) = a + t (b - a), 0 t 1, where a = (x1 , y1 ) and b = (x2 , y2 ) denote s endpoints in . We let = n k denote k=1 a finite collection of n non-intersecting cracks in (see figure 1). At low frequencies and with a given background conductivity (x) satisfying 0 < 0 (x) 1 for x \ , Maxwell's steady-state conductance read ( v) = 0 in \ Bayesian crack detection
(0,1)
21 23 25 27 29 19 17 15 13 11 9 7 5 3 1 59 57 55 39 41 43 45 47 49 51 53 123 (-1,0) 31 33 35 37 (1,0) (0,-1) Figure 1. Schematic representation of the experimental setup. The object has an interior crack which is to be determined. Current is flowing in at electrode xin = 51 and out at electrode xout = 9. Corresponding potential voltage differences are recorded. v =0 on k , k = 1, . . . , n, subject to appropriate boundary conditions on , e.g. v= on , where v denotes the voltage potential induced in , denotes the normal field to and is a known function. For mathematical convenience, the steady-state equations are rewritten to obtain in ( -1 u) = 0 u = ck on k , \ k = 1, . . . , n, for a particular set of constants ck , k = 1, . . . , n. The function u represents the -harmonic conjugate to v given by (u) = v in \ , where denotes counterclockwise rotation by /2. The Neumann boundary condition takes the form u = on , -1 which is used to model the applied input current. As current flows into the object at electrode xin and exits at electrode xout (see figure 1) and the electrodes are modelled as point sources, then the function is defined as (x) = xin (x) - xout (x), where xin (x) = 1 0 if xin = x, else.
x , The function xout (x) is defined similarly. By application of standard potential theory arguments, Bryan and Vogelius (1994) develop an integral equation formulation of this boundary value problem based upon the assumption that the object has constant background conductivity 1. As this formulation depends upon singular boundary data u, a more useful formulation is obtained by constructing a perturbation v = u - u0 which is smooth up to . Here u0 satisfies the conductance equations for x 124 K E Andersen et al when no cracks are present. As v is smooth up to , any problems associated with the lack of regularity of u are avoided. For the purposes of this paper, we let denote the collection of allowable crack configurations, i.e. = n , where n is the set of n non-intersecting cracks fully n=0 within . Furthermore, we let S : H 1 ( ) denote the solution operator of the derived integral equation, where H 1 ( ) is a Sobolev space of order one (for more details consult Bryan and Vogelius 1994). To discretize the integral formulation obtained by Bryan and Vogelius (1994) we let z (t) parametrize for t-values 0 t < 1 and parametrize k for values k t < k + 1 for any k = 1, . . . , n. Furthermore we let K(ti , tj ) = [z (ti ), z (tj )]|z (tj )|, z(tj ) G(ti , tj ) = [z (ti ), z (tj )]|z (tj )|, where / z(tj ) on the Green function denotes the normal outward derivative with respect to z (tj ) and 1 log |x - y |, 2
x, y R2 . is (x, y ) = Finally, we let u(x(t)) kj = v[z (tj )] on
t=tj k (k = 1, . . . , n), on (k = 0), where / on k denotes the normal flux across the crack. Now, by application of Nystr m's method (see e.g. Atkinson 1976), the integrals in the o integral formulation of Bryan and Vogelius (1994) are discretized by quadrature rules to obtain a set of linear equations. Hence, if we let m denote the number of nodes to be used in the quadrature rule, we obtain the following linear system:
1 - 2 0i + m j =1 m j =1 m j =1 n K(ti , tj )j 0j -
m n m G(ti , k + tj )j kj = 0, k=1 j =1 K(l + ti , tj )j 0j - lj j |z (l + tj )| = 0 G(l + ti , k + tj )j kj - cl = -u0 [z (l + ti )],
m k=1 j =1 and
j =1 0j j |z (tj )| = 0, for i = 1, . . . , m and l = 1, . . . , n. Due to linear dependences in the first m equations, any one of them may be ignored, leaving a quadratic linear system of mn + m + n equations. The remaining linear system is, for a moderate level of discretization, well conditioned due to the presence of the logarithmic singularity along the first part of the diagonal s = t. The singularity can be dealt with by a simple form of product integration (Atkinson 1976) based upon the applied quadrature rule. Of course, the quadrature rule used on the boundary need not be the same as the one used on the cracks. As the solution is smooth on , a simple trapezoidal rule can be used with weights wi = 1/m for i = 1, . . . , m. However, on the cracks care must be taken due to singularities at the endpoints. Bryan and Vogelius (1994) propose using a quadrature rule that places many points close to, but not at, the endpoints. The weights are chosen to accommodate Nystr m's method corresponding to a midpoint rule with variably o spaced nodes. Bayesian crack detection 125 Figure 2. The deterministic reconstruction algorithm applied to unperturbed simulated crack data: (a) the experimental setup; (b)(e) crack estimates for one, two, three and four cracks fitted to data (initial guess, ----; estimate, ). In practice, the crack detection algorithm of Bryan and Vogelius (1994) begins with an initial guess 0 for which the corresponding boundary voltage data can be predicted by solving the linear system described above. This crack configuration is then iteratively improved with respect to the residual sum of squares using a Newtonian updating scheme. We refer the reader to Bryan and Vogelius (1994) for further details. Their algorithm was tested on a large number of simulated data sets with linear cracks and it appeared to work well when the number of cracks in the initial configuration was the same as the number in the true configuration. Bryan and Vogelius (1994) also consider the problem associated with estimating the number of cracks. Using simulated data with e.g. three `true' cracks, they proceed as illustrated in figure 2 and described as follows. The reconstruction procedure is begun by attempting to fit the simulated data with just one crack with endpoints situated at (-0.35, 0.0) and (-0.35, 0.35). A root for the linear system is found, but at this point it is impossible to say whether or not the simulated data come from just a single crack or several cracks. The middle one-tenth of the estimated crack is cut out and two new cracks are formed and used as the initial guess for a new run of the algorithm. This process is repeated so that at each stage the largest crack of those in the current configuration is split and this new configuration is used as an initial value for repeating the process with an additional crack. The estimated cracks are shown in figures 2(b)(d). Bryan and Vogelius (1994) point out that three things are likely to occur when splitting a crack to obtain a new crack configuration: (1) the cracks find different locations, (2) the cracks remain essentially where they are or (3) one of the split cracks shrinks and the other cracks remain where they were previously estimated to be. The behaviour described in (2) or (3) usually indicates that the previous number of cracks used was the right number to fit to 126 K E Andersen et al the data. When fitting four cracks to the data behaviour (3) is observed. From this, Bryan and Vogelius (1994) conclude that three cracks is the right number of cracks to fit to the data. In this paper, we provide a more rigorous framework for determining the number and position of the cracks. We begin by obtaining the likelihood function, L, derived from (1) under the assumption that the errors (t) are independently Gaussian distributed with mean zero and variance 2 , i.e. 1 (2) L( , | ) m exp{-V ( , )}, where V ( ,) =
m i=1 [ (ti ) - F (ti )]2 /2 2 , F (ti ) = S (ti ) - S (ti-1 ) is the recorded voltage differences and S is the solution operator introduced in section 2. The classical maximum-likelihood approach to models described in such a manner is to seek the configuration and parameter 2 that maximizes this function. However, maximum-likelihood estimation corresponds to nonregularized inversion and usually leads to ill posed problems, and therefore additionally regularization is called for. In the next section we adopt the Bayesian approach to regularization, in which priors are used to compress the probability mass of the likelihood function to a smaller support. 3. The Bayesian approach to statistical inverse problems If we assume that the observed data are described by the model given in (1), then the statistical problem is to estimate and 2 given . For this we need to determine the posterior distribution of = ( , 2 ), denoted by ( | ), which describes our beliefs about the possible crack configurations and variance parameter after having observed the data . Let us assume also that our beliefs about the possible configurations before having observed any data can be described by a prior distribution denoted by p( ). Bayes' formula may then be used to derive the posterior distribution as a function of the prior distribution and the data represented through the likelihood function. We thus have that ( | ) p( )L( | ). The posterior distribution is the basis for statistical inference and is typically summarized by obtaining marginal moments such as the posterior means of the parameters of interest. However, these are not the only scalar summaries. Bayesian point estimation is based upon selecting the parameter value which minimizes the expected value (under the posterior) of a user-specified loss function which attributes a cost to discrepancies between the estimate and the `true' value, (see e.g. Bernado and Smith 2000). The posterior mean minimizes the expected loss under a certain quadratic error loss function - 2 . However, if we felt that the absolute error loss function - were more appropriate, we would use the posterior mode for example. In practice, the user should decide on the best criteria to use before deciding on a form of point estimate. In this paper, we shall adopt the quadratic error loss function and use the posterior mean for inference. The problem is therefore reduced to the computational task of performing the integration E ( ) = ( | ) d , typically over a large parameter space. Often, explicit evaluation is impossible and traditionally one would use numerical integration or analytic approximation methods. However, MCMC Bayesian crack detection 127 methods provide an alternative integration technique whereby posterior means, for example, are estimated by using the sample mean from a series of random draws from the posterior distribution. These random draws are obtained by constructing an irreducible Markov chain { 1 , 2 , . . .} with state space and with stationary distribution ( | ). MCMC sampling was first introduced by Metropolis et al (1953) and was subsequently adapted by Hastings (1970). Over the past ten years such methods have enjoyed widespread popularity within the statistical literature and there exist various standard techniques for constructing the necessary chains (see e.g. Brooks 1998, Robert and Casella 1999). For the purposes of this paper, we shall distinguish between three separate simulation algorithms that will be combined to provide a technique suitable for the crack detection problem. We begin by describing the MetropolisHastings updating scheme for updating a crack configuration without altering the number of cracks. We then describe the reversible jump MCMC (RJMCMC) update, which allows us to update the number of cracks, and finally we introduce the simulated tempering method as a means of increasing the computational efficiency of the overall simulation process. 3.1. MetropolisHastings updates MetropolisHastings updates are used to move around the parameter space by proposing moves which are subsequently either accepted or rejected. Suppose that we are currently in configuration , then we draw a new configuration from some proposal density q( , ). This proposal is then accepted with probability ( , ) = min 1, ( | )q( , ) . ( | )q( , ) (3) However, if the proposal is rejected, the chain remains in the current state. Many proposal distributions lead to irreducible Markov chains which ensure the convergence of the posterior mean estimate, though several forms possess useful analytic properties. For example, when the proposal distribution q is symmetric, i.e. q( , ) = q( , ), the acceptance function in (3) reduces to ( , ) = min 1, ( | ) , ( | ) which is essentially the original Metropolis update proposed by Metropolis et al (1953). The MetropolisHastings updating scheme takes the simple algorithmic form Initialize 0 for t = 1 to T do Sample from q given t-1 Compute the acceptance probability t = ( t-1 , ) Sample uniform random variable ut on (0, 1) if ut < t then t = otherwise t = t-1 end In practice the MetropolisHastings updating scheme can be used either to update the entire state vector or individual elements. Since `large' jumps tend to have correspondingly small acceptance probabilities, typical MCMC algorithms consist of a sequence of updates focusing upon each element of the state vector in turn. We shall discuss this approach, known as the single-component MetropolisHastings scheme, in more detail in section 4. 128 K E Andersen et al 3.2. Reversible jump MCMC The MetropolisHastings updates are used to update the state vector, essentially moving the positions of the current cracks. In order to move between configurations with different numbers of cracks, we require what are known as RJMCMC updates, since such updates involve moving between states of different dimensions. For example, the introduction of a new crack to the current crack configuration will increase the dimensionality of the state vector, since additional parameters will be needed to describe the position of the new crack. If we view a crack configuration as a point process on the product space one can adopt the algorithm of Geyer and Mller (1994), extending the basic MetropolisHastings algorithm to unconditional point processes. Later Green (1995) via the RJMCMC methodology provided a general framework, extending the basic MetropolisHastings algorithm to general state spaces, so that becomes a general measure, rather than a density. Also, the proposal density q( , ) is replaced by the proposal distribution q( , d ). is generated by Suppose a dimension-changing move m is proposed and the proposal a deterministic invertible function f ( , u), where u is a continuous random variable. Then Green (1995) shows that if rm ( ) denotes the probability of choosing move type m when in state , q(u) denotes the density function of u and ( | ) denotes the posterior density of , then the corresponding acceptance probability becomes ( , ) = min 1, ( | )rm ( ) f ( , u) ( | )rm ( )q(u) ( , u) , (4) where m denotes the reverse move to m. Note that the final term in the above ratio is the Jacobian, arising from the change of variables associated with moving from one space to the other. Algorithmically, the reversible jump updating procedure proceeds identically to that for MetropolisHastings updates. The MetropolisHastings and reversible jump updates will produce a Markov chain with the required stationary distribution. However, the resulting chain may be slow to move around the state space so that large run lengths are required in order to obtain reliable inference. To improve the speed with which the state space is traversed (often termed the mixing rate), Marini and Parisi (1992) and Geyer and Thompson (1995) suggest the use of simulated tempering. 3.3. Simulated tempering Simulated tempering is based upon using a series of stationary distributions 1 , . . . , L and augmenting the state vector to include an indicator variable signalling which distribution is being used at any time. If we let 1 = and choose l , l = 2, . . . , L, so that movement around the state space becomes easier as l increases, then we can run a chain on the augmented state space and base inference only upon observations in the chain attributed to distribution 1 . Since movement is easier for larger l, we obtain a more rapidly mixing chain and movement within the target distribution 1 is facilitated by brief tours in other temperatures. The chain swaps between temperatures i and j at time t and in state t with probability min 1, j ( i (
t| )cj pj i , | )ci pij t where pij denotes the probability of proposing the chain to move from sampler i to sampler j . The ci s are approximate normalizing constants that ensure that the chain divides its time roughly equally among the L different samplers (Geyer and Thompson 1995). Bayesian crack detection 129 4. Implementation In this section, we provide a detailed description on the parametrization of the models and on the updating mechanisms required within the MCMC simulation for the crack detection problem. For simplicity we also assume that is the unit circle. In order to ensure that the posterior is dominated by the likelihood, we adopt a vague prior distribution on the set of allowable crack configurations so that the number of cracks is Poisson distributed with mean . Then for any number of cracks n, we place a prior distribution on the possible crack configurations, n , by assuming that any configuration in n is equally likely. These considerations can be formalized by assuming that the prior distribution p on has a density (with respect to the unit rate Poisson process on the product space ) which satisfies p( ) n( ) 1 I( ), where n( ) is the number of cracks in configuration and 1 denotes the indicator function. I The interested reader is referred to Daley and Vere-Jones (1988) for further details. For our purposes it is sufficient to note that the density is well defined. In order to move freely around the state space, we propose using two within-model moves, that allow for the movement of cracks within , and one between-model move that adds and deletes cracks to/from the current configuration. It is obvious that the addition and deletion moves are the minimum requirement for moving completely around the state space. However, the introduction of other moves such as the within-model moves supplement these to make movement around the state space more rapid. Once we have a set of moves we then want to make them as efficient as possible by scaling the proposals appropriately. If we make jumps which are too `small', we will tend to accept them, but movement will be slow. If we make the jumps too `large', they will often be rejected and once again movement is slow. Optimal scales are therefore usually obtained via pilot tuning. Moreover, the proposals are chosen to make the scheme irreducible, so that convergence of the posterior mean is ensured. The first move, which we refer to as move type A, involves taking the coordinates of the endpoints of any particular crack and sampling a new value for each within a neighbourhood of the current value. This simultaneously translates and scales the chosen crack. Move type B involves rotating a chosen crack about its centre and, finally, move type C enables us to add a crack by proposing a new crack randomly placed within . The reverse move in which a crack is deleted will be referred to as move type C . These moves are illustrated in figure 3. Within each iteration, we begin by deciding whether we will perform a between-model or within-model move. With probability bd we will perform a between-model move, which will be either a death (i.e. we delete a crack--this is proposed with probability d ) or a birth (i.e. we add a new crack--this is proposed with probability 1 - d ). If we choose to perform a withinmodel move, then with probability r we will choose to rotate all cracks, else we translate all cracks. These updates involve taking each crack in turn and performing the selected move type. The updating scheme is graphically illustrated in figure 4. We now consider the move types in turn and explain how each is performed. 4.1. Translating a crack Suppose that we have decided to attempt a move of type A. We do this by considering in turn the four parameters (x1 , y1 , x2 and y2 ) of each crack and updating each parameter by selecting a new value within some small region of the current value. More specifically, to update any one of the four parameters, we sample a value uniformly in a symmetric interval around the current position with width 2 l. In proposing a change to any one of these parameters, we 130 K E Andersen et al Translate a crack Rotate a crack Add/kill a crack A B C C Figure 3. The proposed updates we use in the MCMC algorithm: from left to right we illustrate the different possible transitions for translating (A), rotating (B), and adding (C) and deleting (C ) cracks. Figure 4. Schematic representation of the tree that the transition kernel ascends to pick a transition type. propose a move from to , with qA ( , the acceptance probability in (3) reduces to A ( , ) = min{1, 1 I( ) = 1/2 l. Since this proposal is symmetric, , )]}. ) exp[V ( , ) - V ( Once the first of the four parameters has had a new value proposed and subsequently either accepted or rejected, we move on to the second. This continues until all four parameters have undergone this procedure. When all four parameters have been updated, we move on to the next crack until all cracks have been updated in turn. 4.2. Rotating a crack If we choose to make a move of type B, we first let denote an arbitrarily chosen crack in and assume that it is rotated anticlockwise about its centre according to a rotation parameter . We parametrize by (x1 , y1 , x2 , y2 ) with its centre given by (c1 , c2 ) = (x1 + x2 , y1 + y2 )/2. Let denote the angular direction from (c1 , c2 ) to (x1 , y1 ) with respect to the horizontal axis, then the crack is rotated counter-clockwise according to a rotation parameter uniformly sampled in [- , ]. Hence, the new crack is given by its new endpoints (x1 , y1 ) = (c1 , c2 ) + [cos( + ), sin( + )]/2, (x2 , y2 ) = (c1 , c2 ) - [cos( + ), sin( + )]/2, Bayesian crack detection 131 where = (x1 - x2 )2 + (y1 - y2 )2 denotes the length of . Thus, qB ( , ) = 1/2 and B is identical to A above. When the first crack has been updated, we move on to the next crack until all cracks have been updated in turn. 4.3. Adding/deleting a crack Assume that configuration consists of n cracks, then, in the birth move, a new crack is proposed by sampling two points a = (x1 , y1 ) and b = (x2 , y2 ) uniformly in . By u we denote the vector of random variables (x1 , x2 , y1 , y2 ) for the new crack position so that qC (u) = 1/ 2 . Here, the probability of proposing move type C is given by rC = bd (1 - d ) and the probability of proposing the reverse is given by rC = bd d /(n + 1). The latter being derived from the product of the probabilities of picking a between-model move, then a death and finally picking crack from n + 1 cracks in . The Jacobian term in the acceptance probability given by |f ( , u)/( , u)| is simply one, since f ( , u) = ( , u). Hence, 2 d exp[V ( , )-V ( , )] . (1 - d )(n + 1) The death of a crack is performed similarly, with one of the n cracks being proposed for deletion, a move which is subsequently accepted with probability (1 - d )n C ( , ) = min 1, exp[V ( , ) - V ( , )] . 2 d C ( , ) = min 1, 1 I( ) 4.4. Updating the variance The moves that we have so far described deal with movements in . However, the error variance 2 is also a parameter that requires updating. We assume a priori that the precision = 1/ 2 is gamma distributed with parameters 1 and 2 , in which case the posterior conditional density for is given by ( | , ) 1 +n-1 exp - 2 +
m i=1 [ (ti ) - F (ti )]2 /2 i.e. the conditional distribution of is gamma distributed with parameters 1 + n and 2 + m [ (ti ) - F (ti )]2 /2. It is easy to show that if we choose this distribution as i=1 our proposal, then the corresponding acceptance probability is identically equal to one. This is in fact a special case of the MetropolisHastings update, known as a Gibbs sampling update, see Geman and Geman (1984) and Gelfand and Smith (1990). We update the variance in this manner at each iteration. 4.5. Simulated tempering updates Pilot runs of the simulation appeared to exhibit rather slow mixing behaviour, primarily due to the existence of local maxima in the likelihood such that new cracks can be slow to move to more likely positions. To overcome this problem, the state space of the chain can be augmented to introduce simulated tempering updates, which greatly improves sampling performance. These updates were attempted after every ten iterations so that at temperature i we propose a move to 1 i1 with probabilities pii+1 = pii-1 = 2 for i = 2, . . . , L - 1 and p12 = pLL-1 = 1 (see also table 1). This change is accepted with probability cj pj i -m { exp[V ( , )]}1/j -1/i , min 1, ci pij 132 K E Andersen et al Table 1. Prior temperatures and corresponding proposal steps and prior probabilities used in the crack detecting MCMC algorithm. Shown also is an example of observed frequencies after a run of the chain for 50 000 iterations. Temperature 1 1.5 2 4 7.5 l 0.01 0.02 0.05 0.10 0.15 0.25 /48 /24 /12 /6 /4 /2 Prior probability 0.3627 0.1874 0.1471 0.1192 0.1009 0.0827 Observed frequencies 0.162 0.161 0.188 0.177 0.150 0.162 where j denotes the proposed new temperature (j = i 1). The Markov chain we have proposed here for crack detection was implemented and tested (experimental code-checking simulation can be performed by letting V 0, so that the chain is entirely prior driven). Several simulations were performed varying only the -parameters and, since estimates from these different runs are similar, we assume that all are sampling from the same (and correct) stationary distribution. In practice it is important to monitor the performances of the MCMC algorithm so that we can be sure that the chain(s) has settled to the correct distribution and that we have produced a sufficiently long sample for inference. There are two issues here. Firstly, the chain takes some time to settle to its stationary distribution and output from this initial part of the chain (or burn-in period) is usually discarded. Secondly, once the chain has converged to its stationary distribution, we must run the chain long enough to obtain reliable inference. Slowly mixing chains require longer run lengths than more rapidly mixing ones, so it is often useful to be able to obtain some measure of the mixing properties of the chain. Various methods for determining run lengths on the basis of an initial pilot run have been proposed in the literature, from simple methods based upon autocorrelation function plots to more complex methods involving information about the transition kernel as well as the chain output. Brooks and Roberts (1998) provide an overview and some advice as to which methods may be of most use in any particular context. For the crack detection problem extensions to the original method of Gelman and Rubin (1992) appear to be the most suitable (see Brooks and Gelman 1998, Brooks and Guidici 2000). 5. Results In this section, we consider the performance of our approach on simulated data similar to that used by Bryan and Vogelius (1994) and discussed in section 2. The simulated data come from three cracks, of which two are only 0.05 units away from the boundary, to which 60 electrodes have been attached. The data are perturbed by Gaussian noise with zero mean and variance 2 = 0.25 approximately corresponding to a 10% noise-to-signal ratio. For the simulations, we chose bd = 1/4, r = 2/15 and d = 1/2, so that the chain spends more time translating and rotating cracks than inserting and deleting them. A tempering scheme based on the temperatures 1, 1.5, 2, 4, 7.5 and are used and a temperature change is proposed after every ten iterations. For the prior on the number of cracks we choose = 0.37. To efficiently explore we suggest proposing `bigger' moves in higher temperatures and different values of l and are therefore used. Table 1 provides these values together with the prior temperature probabilities which are based upon a series of pilot runs where the probabilities are adjusted from run to run to obtain values that make the chain visit the different Bayesian crack detection 133 Figure 5. (a) Samples from ( | ) and (b) estimated cracks with 95% credible intervals (credible intervals, ----, mean cracks, ). temperatures roughly equally often (see Geyer and Thompson 1995). The temperature scheme used here and described in table 1 has been tailored (via pilot tuning) to this particular problem and may not be suitable for other data sets. In general some degree of pilot tuning is necessary in order for the simulations to work well (see Geyer and Thompson 1995, who describe a variety of more sophisticated methods for choosing both the prior temperature probabilities and the spacing between successive temperatures). These methods may be of most use when handling problems with densely packed non-intersecting cracks. The MCMC algorithm was started with 2 = 1 in the same initial crack configuration used by Bryan and Vogelius (1994) and ran for 50 000 iterations with 1 = 15 and 2 = 1 2 . In the following interpretation of the simulation results we need a partial ordering of the cracks in order to uniquely identify the different cracks in any given configuration. Assume that is a finite collection of n linear non-intersecting cracks where crack i is uniquely i i i i parametrized by its endpoints (x1 , y1 ) and (x2 , y2 ). A partial ordering of the cracks is then i i obtained by ordering each crack individually, so that x2 x1 for i = 1, . . . , n. Moreover, the n 1 2 cracks are ordered globally so that x1 x1 x1 . The output from the chain whilst in the cold distribution is of particular interest, since it is only these that are drawn from ( | ). Figure 5(a) shows the sampled crack configurations from , whereas figure 5(b) shows the corresponding mean crack configuration with 95% credible intervals. As the cracks are clearly split into three separate clusters, mean and credible intervals are constructed within each cluster. This method is sufficient for the problem at hand, but for even higher noise levels one may encounter problems with `overlapping' clusters of cracks. In such cases an intensity plot of the posterior probabilities for the cracks passing any point in may be more appropriate. Each figure is based upon 8100 samples from . The corresponding posterior means are given in table 2 together with 95% credible intervals. In figure 6 we give the trace plots for the temperatures, the number of cracks, the first 1 coordinate of the first crack (x1 ) and the variance 2 . From figure 6(a) it is apparent that the chain divides its time roughly equally between the six different samplers. A trace plot for the number of cracks is shown in figure 6(b). We can see that the chain appears to move fairly freely between having zero and three cracks. Actually, the posterior distribution is concentrated on configurations with three cracks. Figure 6(c) shows the trace plot for the first coordinate of the 134 K E Andersen et al Table 2. The resulting posterior means, standard deviances and credible intervals for the locations of the three cracks, together with the error variance. 95% CI Parameter
1 x1 1 y1 1 x2 1 y2 2 x1 2 y1 2 x2 2 y2 3 x1 3 y1 3 x2 3 y2 Estimate 0.5827 0.0076 0.9475 0.0002 -0.4608 -0.4140 -0.1678 -0.4174 -0.7009 -0.6715 -0.5783 -0.5381 0.2785 St. d. 0.0103 0.0108 0.0093 0.0032 0.0103 0.0104 0.0099 0.0101 0.0097 0.0098 0.0093 0.0098 0.0363 Lower 0.5685 -0.0383 0.9456 -0.0087 -0.4957 -0.4339 -0.1790 -0.4502 -0.7244 -0.6866 -0.6088 -0.5634 0.2073 Upper 0.6089 0.0039 0.9820 0.0037 -0.4555 -0.3932 -0.1401 -0.4106 -0.6863 -0.6482 -0.5725 -0.5250 0.3496 Truth 0.5833 0.0000 0.9500 0.0000 -0.4583 -0.4167 -0.1667 -0.4167 -0.7083 -0.6667 -0.5833 -0.5417 0.2500 2 1.0 0.8 0.2 0.0 0
0 1 Number of cracks 2 3 4 I/T 0.4 0.6 5 0 100000 200000 300000 400000 500000 Iteration (a) 0.45 0.50 10000 20000 30000 40000 50000 Iteration (b) 0.62 x1 1 2 0.56 0.58 0.20 0.25 0.30 0.35 0.40 0.60 0 2000 4000 6000 Iteration (c) 8000 0 2000 4000 6000 Iteration (d) 8000 Figure 6. Trace plots from the Markov chain: (a) the visited temperatures; (b) number of cracks; (c) the trace of the first coordinate in the first crack's endpoint in the cold distribution; (d) the error variance. Bayesian crack detection 135 first crack. Finally, figure 6(d) shows the trace plots for the variance, which exhibits excellent mixing properties. 6. Discussion In this paper we have developed a pratical Bayesian approach for the reconstruction of an unknown set of linear cracks based upon electrostatic boundary measurements. The reconstruction technique is an extension of a very efficient algorithm developed by Bryan and Vogelius (1994). The advantage of the Bayesian approach is that uncertainty about the number of cracks is a part of the modelling process and thereby the use of ad hoc techniques is avoided. Moreover, the method allows for statistical inference, in the sense that uncertainty about the estimates can be evaluated. The Bayesian reconstruction technique has been tested on a substantial number of different crack configurations data and relatively precise reconstructions were provided, even under high noise levels. Consequently the Bayesian reconstruction technique seems rather promising and very robust and it would therefore be interesting to test it on real crack data, as well as applying it to other problems. However, we note that the method is computationally intensive. For comparison it took 40 min to run 50 000 simulations on a shared 400 MHz RISC processor with 1024 MB memory. Nevertheless, by constructing a proposal distribution based on the gradient of the forward map much more efficient Monte Carlo methods may be derived. Since the gradient already has been provided by Bryan and Vogelius (1994), this modified algorithm would be fairly easy to implement. Acknowledgments The authors would like to thank two anonymous referees for their constructive remarks and criticisms on an earlier draft of this paper which greatly improved the clarity and focus of the final manuscript. This research was partially supported by the EU TMR network ERBFMRX-CT96-0095 on `Computational and statistical methods for the analysis of spatial data'. The work of MBH was also supported by MaPhySto--Centre for Mathematical Physics and Stochastics, funded by a grant from the Danish National Research Foundation. References
Atkinson K E 1976 A Survey of Numerical Methods for the Solution of Fredholm Integral Equations of the Second Kind (Philadelphia, PA: SIAM) Barber D and Brown B 1986 Recent developments in applied potential tomography--APT Information Processing in Medical Imaging ed S Bacharach (Amsterdam: Nijhoff) pp 10621 Bernado J M and Smith A F M 2000 Bayesian Theory (New York: Wiley) Brooks S P 1998 Markov chain Monte Carlo method and its application Statistician 47 69100 Brooks S P and Gelman A 1998 General methods for monitoring convergence of iterative simulations J. Comput. Graph. Stat. 7 43455 Brooks S P and Guidici P 2000 MCMC convergence assessment via two-way ANOVA J. Comput. Graph. Stat. 9 26685 Brooks S P and Roberts G O 1998 Assessing convergence of Markov chain Monte Carlo algorithms Stat. Comput. 8 31935 Bryan K and Vogelius M 1994 A computational algorithm to determine crack locations from electrostatic boundary measurements. The case of multiple cracks Int. J. Eng. Sci. 32 579603 Cheney M, Isaacson D and Newell J C 1998 Electrical impedance tomography SIAM Rev. 41 85101 Dahl-Jensen D, Mosegaard K, Gundestrup N, Clow G D, Johnsen S J, Jensen A W and Balling N 1998 Past temperatures directly from the Greenland ice sheet Science 282 26871 136 K E Andersen et al Daley D J and Vere-Jones D 1988 An Introduction to the Theory of Point Processes (New York: Springer) Efromovich S 1999 Nonparametric Curve Estimation (New York: Springer) Fox C and Nicholls G 1997 Sampling conductivity images via MCMC The Art and Science of Bayesian Image Analysis, Proc. Leeds Annual Statistics Research Workshop ed K V Mardia, C A Gill and R G Aykroyd (Leeds: Leeds University Press) pp 91100 Gelfand A E and Smith A F M 1990 Sampling-based approaches to calculating marginal densities J. Am. Stat. Assoc. 85 398409 Gelman A and Rubin D B 1992 Inference from iterative simulation using multiple sequence Stat. Sci. 7 45772 Geman S and Geman D 1984 Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images IEEE Trans. Pattern Anal. Mach. Intell. 6 72141 Geyer C J and Mller J 1994 Simulation procedures and likelihood inference for spatial point processes Scand. J. Stat. 21 35973 Geyer C J and Thompson E A 1995 Annealing Markov chain Monte Carlo with applications to ancestral inference J. Am. Stat. Assoc. 90 90920 Green P 1995 Reversible jump Markov chain Monte Carlo computation and Bayesian model determination Biometrika 82 71132 Hadamard J 1923 Lectures on the Cauchy Problem in Linear Partial Differential Equations (New Haven, CT: Yale University Press) Hastings W K 1970 Monte Carlo sampling methods using Markov chains and their applications Biometrika 57 97109 Kaipio J P, Kolehmainen V, Somersalo E and Vauhkonen M 2000 Statistical inversion and Monte Carlo sampling methods in electrical impedance tomography Inverse Problems 16 1487522 Marini E and Parisi G 1992 Simulated tempering: a new Monte Carlo scheme Europhys. Lett. 19 4518 Metropolis N, Rosenbluth A W, Rosenbluth M N, Teller A H and Teller E 1953 Equations of state calculations by fast computing machines J. Chem. Phys. 21 108792 Mosegaard K 1998 Resolution analysis of general inverse problems through inverse Monte Carlo sampling Inverse Problems 14 40526 Mosegaard K and Rygaard-Hjalsted C 1999 Probabilistic analysis of implicit inverse problems Inverse Problems 15 57383 Mosegaard K and Tarantola A 1995 Monte Carlo sampling of solutions to inverse problems J. Geophys. Res. B 100 12 43147 Robert C P and Casella G 1999 Monte Carlo Statistical Methods (New York: Springer) Santosa F and Vogelius M 1990 A backprojection algorithm for electrical impedance imaging SIAM J. Appl. Math. 50 21643 Tarantola A 1987 Inverse Problem Theory (Amsterdam: Elsevier) Tikhonov A N 1963a Regularization of incorrectly posed problems Sov. Dokl. 4 16247 ----1963b Solution of incorrectly formulated problems and the regularization method Sov. Dokl. 4 10358