Deconstructing Arguments From The Case Against Hypothesis Testing

Journal of Modern Applied Statistical Methods

Copyright © 2003 JMASM, Inc.
November, 2003, Vol. 2, No. 2, 467-474 1538-9472/03/$30.00
Deconstructing Arguments From The Case Against Hypothesis Testing
Shlomo S. Sawilowsky
Educational Evaluation and Research

Wayne State University



The main purpose of this article is to contest the propositions that (1) hypothesis tests should be
abandoned in favor of confidence intervals, and (2) science has not benefited from hypothesis testing. The
minor purpose is to propose that (1) descriptive statistics, graphics, and effect sizes do not obviate the need for
hypothesis testing, (2) significance testing (reporting p values and leaving it to the reader to determine
significance) is subjective and outside the realm of the scientific method, and (3) Bayesian and qualitative
methods should be used for Bayesian and qualitative research studies, respectively.

Key words: Hypothesis testing, bracketed intervals, significance testing, effect size, Bayes, qualitative


Introduction

There has been an increasing amount of journal
space given to the case against hypothesis
testing over the past quarter of a century. The
ensuing debate has taken many directions and
has been graced with many forms of
argumentation (see, e.g., Sawilowsky, 2003a;
Knapp & Sawilowsky, 2001). Two styles of
attack against hypothesis testing are contested
here.

The first is the proposition that
hypothesis testing should be abandoned in favor
of confidence intervals. (I prefer the term
bracketed instead of confidence interval for
reasons noted in Sawilowsky, 2003a.) Ancillary
to this attack is the proposition that hypothesis
testing is tolerable if and only if it is (a)
buttressed with a report of effect sizes, (b)
accompanied by graphical displays, or (c)
Bayesian.

The second style of attack is that
hypothesis testing should be abandoned due to
philosophical arguments. An example is
embodied in the question of whether science has
benefited from hypothesis testing.
_______________________________________

Shlomo Sawilowsky is Professor of Education
and Wayne State University Distinguished
Faculty Fellow. Email: shlomo@wayne.edu.
The author gratefully acknowledges discussions
on the ether with Rabbi Chaim Moshe Bergstein.

The Confidence Interval Attack

Neyman (1934), who discovered the
bracketed interval, equated the probabilities
associated with its lower and upper bound with
the ordinary concept of probability (1934, p.
590). Initially, he seemed to equate it with the
fiducial argument promulgated by Fisher (1930).
The presumed lack of difference in the
derivation of bracketed intervals and fiducial
probabilities was the focus of the discussion
subsequent to the reading of Neyman's (1934)
paper before the Royal Statistical Society.
Bowley (1934) raised the question and presented
his answer, "I am not at all sure that the
'confidence' is not a 'confidence trick.'... Does
it really take us any further?... I think it does
not" (p. 609). He considered bracketed intervals
to be nothing more than ordinary probabilities
expressed in a new form.

Neyman (1934) replied that questions
raised in the discussion on the confidence
intervals "would require too much space. In fact,
to clear up the matter entirely, a separate
publication is needed...[and] this is in
preparation" (p. 623). He alluded to the nature of
the response that would follow: "It has been
suggested in the discussion that I used the term
'confidence coefficient' instead of the term
'fiducial probability.' This is certainly a
misunderstanding" (p. 623). Did Neyman
differentiate between his proposed bracketed
interval and the venerable hypothesis test?



No. Neyman (1935) immediately
disabused readers of the statistical literature of
this notion. He stated, "The problem of
estimation in its form of confidence intervals
stands entirely within the bound of the theory of
probability" (p. 116), as does hypothesis testing.
How, then, did the claim that bracketed intervals
are superior and preferred eventually arise as a
weapon in the arsenal of the camp attempting to
make a case against hypothesis testing?

Neyman (1941) reviewed the
development of the bracketed interval, which is
translated from the Polish "przedział ufności."
He mentioned this phrase in 1930 in lectures at
the University of Warsaw and the Central
College (Agriculture) in Warsaw, Poland. Prior
to the redaction of the theory, Pytkowski (1932)
published a practical application.

Neyman (1941) recounted that he had
noticed numerical similarities obtained with his
method and that of the fiducial argument. As a
result, he had initially assumed the two
paradigms were identical. Neyman was satisfied
with considering the bracketed interval as an
extension of the fiducial argument because
Fisher (1930) had priority.
Eventually, Neyman (1934) became
estranged from the fiducial argument. He no
longer considered the two theories
interchangeable. He left the reasons unstated in
his opening presentation before the Society.

Fisher (1934) attended the reading as a
discussant. Historical accounts of the exchange
were varied. Some expressed chagrin with
Fisher, who offered minimal comments on the
new methodology, and instead concentrated on
the relative merits of random vs. purposive
sampling selection. Others, in noting Bowley's
(1934) comment that the paper was difficult to
understand, assumed that Fisher might have
neglected to read Neyman's paper prior to the
reading and simply didn't follow it. Still others
proposed that this was Fisher's feeble attempt at
blocking his baton from being passed to
Neyman, just as Karl Pearson had tried in vain
two decades prior with Fisher.

These reports misrepresented Fisher's
response. Most of his comments were directed to
the sampling problem because that was the
primary thesis of Neyman's (1934) paper.
Moreover, a careful review of the published
discussion indicates that Fisher understood the
paper's implication quite well. His response was
a terse defense of the fiducial argument as the
explanation of ordinary probability.

Neyman (1941) was surprised! "Fiducial
probability" and the "fiducial distribution of a
parameter" were "more or less, lapsus linguae,
difficult to avoid in the early stages of a new
theory" (p. 129). The fiducial argument was
vague, misconceived, and vacuous in explaining
ordinary probability.

The aftermath took the form of
considerable and animated debate in the
literature on the fiducial argument. Many
mathematical statisticians, regardless of
theoretical persuasion, joined in the fray by
publishing their support or concern. Wald
(1939), Wald and Wolfowitz (1939), and Welch
(1939) sided with the bracketed interval. Fisher
(1935), Starkey (1938), Sukhatme (1938), and
Yates (1939) defended the fiducial argument.
Pitman (1939) opined that the two theories were
essentially the same, as did Bartlett (1939) to a
lesser extent.

Bartlett (1936, 1939) also escalated the
debate with the contention that where results
diverge, the fault lies within the fiducial
argument. As can be imagined, Fisher (1937,
1939a, 1939b) and Yates (1939) took up the
gauntlet. Jeffreys (1940) attempted to restore
calm by claiming that the bracketed interval and
the fiducial argument were both subsumed under
inverse probability in the system of Bayes. This
had no effect on the debate, of course, because
few of the combatants were Bayesian. The
controversy would only die with Fisher.

Neyman (1941) succinctly described the
relationship between the two theories: "There is
none" (p. 130) because "the theories of fiducial
argument and of confidence intervals differ in
their basic conceptions" (p. 149). He was:

inclined to think that the literature on the
theory of fiducial argument was born out
of ideas similar to those underlying the
theory of confidence intervals. These
ideas, however, seem to have been too
vague to crystallize into a mathematical
theory. Instead, they resulted in
misconceptions of fiducial probability
and fiducial distribution of a
parameter.... In this light, the theory of
fiducial inference is simply non-existent.
(p. 149)


Return to the confidence interval
attack against hypothesis testing. Fisher's
fiducial