Load Estimator (LOADEST): A FORTRAN Program for Estimating Constituent Loads in Streams and Rivers
where $a_0$ and $a_j$ are model coefficients, NV is the number of explanatory variables, and $X_j$ is an explanatory variable (footnote 2).
Equation (4) is then exponentiated to yield an estimate of instantaneous load:

$$\hat{L}_{RC} = \exp\left(a_0 + \sum_{j=1}^{NV} a_j X_j\right) \qquad (5)$$

where $\hat{L}_{RC}$ is a rating curve estimate of instantaneous load. Development of load estimates using equations 4 and 5 is thus
a 3-step process:
(1) Model Formulation. The form of the linear model (the right-hand side of equation 4) is determined based
on the user's knowledge of the hydrologic and biogeochemical system. Each explanatory variable ($X_j$) is a
function of a data variable (streamflow or time, for example) that is thought to influence instantaneous load. The
number and form of explanatory variables are highly dependent on the system under study and the constituent of
interest. A simple model with a single explanatory variable (log streamflow) is often sufficient for prediction of
suspended-sediment load (Crawford, 1991), whereas a model with six explanatory variables based on various
functions of streamflow and time is often applicable to nutrients (Cohn and others, 1992a). Additional guidance
on model formulation is provided elsewhere (Judge and others, 1988; Draper and Smith, 1998; Helsel and Hirsch,
2002).
(2) Model Calibration. Given the form of the regression model, a time series of constituent load and the
explanatory variables is used to develop the model coefficients ($a_0$ and $a_j$, equ. 4) by using ordinary least squares (OLS)
regression. The regression equation then is used to calculate estimates of log load [$\ln(\hat{L})$] for each observation
in the time series (the calibration data set). Residual error for each observation is equal to the difference between
observed and estimated values of log load [$\ln(L) - \ln(\hat{L})$].
(3) Load Estimation. Estimates of the instantaneous load are obtained using the retransformed version of the
regression model (equ. 5) and a time series of explanatory variables (the estimation data set). Individual estimates
of instantaneous load then are used to determine the total (equ. 2) or mean (equ. 3) load.
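The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the LOADEST FORTRAN code. It assumes the single-variable model mentioned in step 1 (log streamflow only), uses synthetic data with made-up coefficients, and takes a plain average of the instantaneous loads as a stand-in for the mean-load calculation of equation 3. The function names `calibrate` and `rating_curve_load` are hypothetical.

```python
import math
import random


def calibrate(x, log_load):
    """Step 2: OLS fit of ln(L) = a0 + a1*x for one explanatory variable."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(log_load) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, log_load))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    # Residual error: observed minus estimated log load.
    residuals = [yi - (a0 + a1 * xi) for xi, yi in zip(x, log_load)]
    return a0, a1, residuals


def rating_curve_load(a0, a1, x):
    """Step 3: retransform the fitted model (equation 5, no bias correction)."""
    return [math.exp(a0 + a1 * xi) for xi in x]


# Step 1: choose the model form; here X_1 = ln(streamflow).
random.seed(0)
flow_cal = [random.lognormvariate(3.0, 0.8) for _ in range(200)]
x_cal = [math.log(q) for q in flow_cal]
# Synthetic "observed" log loads; 1.5 and 1.2 are invented true coefficients.
log_load = [1.5 + 1.2 * xi + random.gauss(0.0, 0.3) for xi in x_cal]

a0, a1, residuals = calibrate(x_cal, log_load)

# Estimation data set: streamflow values at which loads are wanted.
flow_est = [10.0, 20.0, 40.0]
loads = rating_curve_load(a0, a1, [math.log(q) for q in flow_est])
mean_load = sum(loads) / len(loads)
print(a0, a1, mean_load)
```

The fitted coefficients should land close to the invented values used to generate the data. The loads returned by `rating_curve_load` carry the retransformation bias of the uncorrected rating curve.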
As outlined above, estimation of constituent loads using the regression approach is theoretically straightforward. Several
statistical complications arise, however, when dealing with real-world data. Load calculations within LOADEST are
therefore more complex than the calculations described above. Three of these complicating factors (retransformation bias,
data censoring, and nonnormality) are described below, where the three load estimation methods used within LOADEST are
detailed. Additional issues that are germane to all three methods are described in Sections 2.3 and 2.4.
2.2 Load Estimation Methods used within LOADEST
The load estimation process is complicated by retransformation bias, data censoring, and nonnormality. As noted by
Ferguson (1986), rating curve estimates (equ. 5) of instantaneous load are biased; estimates may underestimate the true load
by as much as 50 percent. This retransformation bias is addressed by introducing bias correction factors for the calculation of
instantaneous load. Data censoring occurs when one or more observations used in the calibration step have constituent
concentrations that are less than the laboratory detection limit (Gilbert, 1987). Although substitution (setting C equal to one-
half the detection limit, for example) appears to be a simple remedy for the replacement of less-than values, none of the
substitution methods commonly used yield adequate results (Helsel and Cohn, 1988). A more rigorous treatment of censored
data is therefore required. A final complication is the assumption of OLS regression that the model residuals are normally
distributed. Alternate methods for estimating model coefficients are applicable when model residuals do not follow a normal
distribution. Because of these complications, LOADEST provides three methods for load estimation; each method is described
below.

Footnote 2: The $i$ subscript is omitted from $L$ in equation 4 and all subsequent equations.
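A short derivation shows where the retransformation bias comes from. It assumes, for illustration only, that the log-space residual $\varepsilon$ is normally distributed with zero mean and variance $\sigma^2$ and that the coefficients are known exactly. The symbols $\varepsilon$ and $\sigma^2$ are introduced here and do not appear in the text above.

$$L = \exp\left(a_0 + \sum_{j=1}^{NV} a_j X_j + \varepsilon\right), \qquad E[L] = \exp\left(a_0 + \sum_{j=1}^{NV} a_j X_j\right) E\left[e^{\varepsilon}\right] = \hat{L}_{RC}\, \exp\left(\frac{\sigma^2}{2}\right)$$

Under these assumptions the rating curve estimate falls short of the expected load by the factor $\exp(-\sigma^2/2)$. An underestimate of 50 percent corresponds to $\exp(-\sigma^2/2) = 0.5$, that is, $\sigma^2 = 2 \ln 2 \approx 1.39$. This is only the idealized, large-sample picture. The correction factors LOADEST applies also account for uncertainty in the estimated coefficients.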
2.2.1 Maximum Likelihood Estimation (MLE)
As an alternative to OLS regression, model coefficients ($a_0$ and $a_j$, equ. 4) may be calculated using the method of
maximum likelihood (MLE). When the calibration data set includes censored data, implementation of MLE also is known as
tobit regression (Helsel and Hirsch, 2002). As with OLS, tobit regression assumes that model residuals are normally distributed
with constant variance.
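To make the treatment of censored data concrete, the sketch below evaluates a tobit-style log-likelihood for a model with one explanatory variable. It is an illustration under stated assumptions, not the LOADEST implementation. It assumes left-censoring at a known threshold (the log of the detection-limit load) and normal residuals with constant standard deviation `sigma`. The function names are hypothetical. Uncensored observations contribute the normal density of their residual. Censored observations contribute the probability that the log load lies below the threshold.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(a0, a1, sigma, x, y, censored):
    """Log-likelihood for ln(L) = a0 + a1*x + e, with e ~ N(0, sigma^2).

    For a censored observation, y holds the log of the censoring threshold,
    and the observation contributes P(ln L < y). Note that math.log raises
    an error if that probability underflows to zero (threshold far below
    the fitted value); a production routine would guard against this.
    """
    total = 0.0
    for xi, yi, ci in zip(x, y, censored):
        z = (yi - (a0 + a1 * xi)) / sigma
        if ci:
            total += math.log(norm_cdf(z))
        else:
            total += norm_logpdf(z) - math.log(sigma)
    return total


# Made-up data: the first observation is censored at a log threshold of 2.5.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.5, 2.9, 4.2, 5.1]
censored = [True, False, False, False]

good = tobit_loglik(1.0, 1.0, 0.3, x, y, censored)
poor = tobit_loglik(0.0, 1.0, 0.3, x, y, censored)
print(good > poor)
```

Maximum likelihood estimation amounts to searching for the coefficients and `sigma` that maximize this function. The search itself is omitted here to keep the sketch short.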
Given the model coefficients provided by regression, estimates of instantaneous load may be obtained by retransforming
equation 4. When the calibration data set is uncensored, the bias correction factor of Bradu and Mundlak (1970) provides a
minimum variance unbiased estimate (MVUE) of instantaneous load (Cohn and others, 1989):

$$\hat{L}_{MVUE} = \exp\left(a_0 + \sum_{j=1}^{NV} a_j X_j\right) g_m(m, s^2, V) \qquad (6)$$

where $\hat{L}_{MVUE}$ is the MLE estimate of instantaneous load, m is the number of degrees of freedom, $s^2$ is the residual variance,
and V is a function of the explanatory variables (Cohn and others, 1989). The model coefficients in equation 6 ($a_0$ and $a_j$) are
estimated by maximum likelihood; the bias correction factor [$g_m(m, s^2, V)$] is an approximation of the infinite series given in
Finney (1941). Within LOADEST, $g_m(m, s^2, V)$ is replaced by a similar function, phi (Likes, 1980).
Under the MLE method, estimates of instantaneous load are developed for all of the observations in the estimation data
set using equation 6. Mean load estimates for various time periods then are calculated using equation 3 (where $\hat{L} = \hat{L}_{MVUE}$).
Standard errors reflecting the uncertainty in each estimate of mean load are calculated by using the method described by Likes
(1980) and Gilroy and others (1990) (for specifics, see equations 9-25 in Gilroy and others, 1990).
2.2.2 Adjusted Maximum Likelihood Estimation (AMLE)
For the case of censored data, model coefficients estimated by tobit regression (MLE, Section 2.2.1) exhibit first-order
bias. In addition, the Bradu-Mundlak bias correction factor ($g_m$, equation 6) results in biased estimates of instantaneous load.
By using adjusted maximum likelihood estimation (AMLE; Cohn, 1988; Cohn and others, 1992b), first-order bias in the model
coefficients is eliminated using the calculations given in Shenton and Bowman (1977). A nearly unbiased (Cohn, 1988)
estimate of instantaneous load then is given by:
$$\hat{L}_{AMLE} = \exp\left(a_0 + \sum_{j=1}^{NV} a_j X_j\right) H(a, b, s^2, \alpha, \kappa) \qquad (7)$$

where $\hat{L}_{AMLE}$ is the AMLE estimate of instantaneous load, a and b are functions of the explanatory variables (Cohn and
others, 1992b), $\alpha$ and $\kappa$ are parameters of the gamma distribution, and $s^2$ is the residual variance. The model coefficients in
equation 7 ($a_0$ and $a_j$) are maximum likelihood estimates corrected for first-order bias; the bias correction factor
[$H(a, b, s^2, \alpha, \kappa)$] is an approximation of the infinite series given in Cohn and others (1992b).
Under AMLE, estimates of instantaneous load are developed for all of the observations in the estimation data set using
equation 7. Mean load estimates for various time periods then are calculated using equation 3 (where $\hat{L} = \hat{L}_{AMLE}$). The
uncertainty associated with each estimate of mean load is expressed in terms of the standard error (SE) and the standard error
of prediction (SEP). The SE for each mean load estimate (Cohn and others, 1992b; equ. 35) represents the variability that may
be attributed to the model calibration (parameter uncertainty). Calculation of the SEP begins with an estimate of parameter
uncertainty (the SE) and adds the unexplained variability about the model (random error). Because SEP incorporates
parameter uncertainty and random error, it is larger than SE and provides a better description of how closely estimated loads
correspond to actual loads. The SEP is therefore the preferred method of describing uncertainty in loads and is used within
LOADEST to develop 95 percent confidence intervals for each estimate of mean load.
2.2.3 Least Absolute Deviation (LAD)
All of the regression methods discus