Analysis of variance with just summary statistics as input

Citation metadata

Author: Richard S. Lehman
Date: May 1993
From: The American Statistician (Vol. 47, Issue 2)
Publisher: American Statistical Association
Document Type: Article
Length: 6,874 words


Article Preview:


More times than I care to recount, I have been asked how to use various statistical software packages to perform a one-way analysis of variance (ANOVA) and appropriate multiple comparison procedures on summary statistics for which the original data have been discarded.

An alternative to the technique presented by Larson (1992) is to generate a sample of size $n$ having mean $\bar{x}$ and sample variance $s^2$ by using $n - 2$ values of $\bar{x}$ and assigning $\bar{x} - a$ and $\bar{x} + a$ as the two remaining values, where $a$ is chosen to produce the desired variance. Simple algebra indicates that $a = s\sqrt{(n - 1)/2}$.
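The algebra behind this construction is short: the n - 2 values equal to the mean contribute nothing to the sum of squared deviations, and the two remaining values contribute a^2 each, so the sample variance is 2a^2/(n - 1). Setting that equal to the target variance gives the stated value of a. The following is a minimal Python sketch of the construction, not code from the letter; the function name `surrogate_sample` and the example values (n = 10, mean 50, standard deviation 4) are illustrative assumptions.

```python
import math
import statistics


def surrogate_sample(n, mean, sd):
    """Build n values whose sample mean and sample SD match the inputs.

    Hypothetical helper: n - 2 values sit at the mean, and the last two
    sit at mean - a and mean + a, with a = sd * sqrt((n - 1) / 2).
    """
    if n < 2:
        raise ValueError("n must be at least 2 to define a sample variance")
    a = sd * math.sqrt((n - 1) / 2)
    return [mean] * (n - 2) + [mean - a, mean + a]


# Illustrative inputs only; these are not taken from the article.
sample = surrogate_sample(10, 50.0, 4.0)
print(len(sample), statistics.mean(sample), statistics.stdev(sample))
```

Up to floating-point rounding, the printed mean and standard deviation should match the inputs, so the list can be typed into any package that accepts raw data.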

Compared with the techniques Larson illustrates using a SAS program, this alternative procedure is easier to explain to clients and simpler to employ with software packages that require the user to enter the data or that lack the necessary looping or generating capabilities.

Milton W. Loyer, Department of Mathematical Sciences, Lycoming College, Williamsport, PA 17701

The article by Larson in the May 1992 issue of The American Statistician presents a procedure that, in expanded form, is very useful. It is easily applied to designs more complex than one-way ANOVAs, including factorial designs with fixed and random effects, multivariate analysis of variance, and others. The procedure has been an automatic feature of the MULTIVARIANCE program since its first publication in 1968 (see "New Developments in Statistical Computing," 1991, 45, 246). The sufficient statistics are the matrix of cell means, a vector of cell frequencies, and the variance-covariance matrices (or correlation matrices and standard deviations).

In particular, the method has been used to:

* Allow students to reanalyze textbook examples or published results when the raw data are not available;

* Reanalyze very large data sets, computing the sufficient statistics on the first run and using them as input on subsequent runs. This is an efficient way to perform several analyses when N or p is very large;

* Review manuscripts submitted for publication. In one instance, I suspected that a study would have reported very different results if a more appropriate analysis had been employed. I used this feature of MULTIVARIANCE and, to the editor's delight but the author's dismay, showed that the major conclusion in the paper should be reversed. What a review!

Jeremy D. Finn, Graduate School of Education, State University of New York at Buffalo, Buffalo, NY 14260

Larson (1992) provides an interesting way to generate a set of data for an ANOVA design beginning with only group statistics. His surrogate data sets have the same number of values per condition as the original data, as well as the same condition means and standard deviations. In addition, the surrogate data produce an ANOVA output identical to the original.
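The claim that surrogate data reproduce the original ANOVA follows from the fact that the one-way F statistic depends only on the group sizes, means, and standard deviations. The Python sketch below illustrates this under stated assumptions: it computes F directly from summary statistics and again from surrogate raw data. Larson's own generating scheme is not reproduced in this preview, so the surrogate groups here use the two-point construction from Loyer's letter as a stand-in. All function names and the three-group example values are hypothetical.

```python
import math


def anova_from_summary(ns, means, sds):
    """One-way fixed-effects ANOVA F from group sizes, means, and sample SDs."""
    k, n_total = len(ns), sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def anova_from_raw(groups):
    """The same F statistic computed from raw observations."""
    k, n_total = len(groups), sum(len(g) for g in groups)
    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def surrogate_group(n, mean, sd):
    """Stand-in surrogate data (Loyer's construction, not Larson's)."""
    a = sd * math.sqrt((n - 1) / 2)
    return [mean] * (n - 2) + [mean - a, mean + a]


# Hypothetical summary statistics for three conditions.
ns, means, sds = [8, 10, 12], [20.0, 23.5, 19.0], [3.0, 4.2, 3.6]

f_summary, df1, df2 = anova_from_summary(ns, means, sds)
groups = [surrogate_group(n, m, s) for n, m, s in zip(ns, means, sds)]
f_raw, _, _ = anova_from_raw(groups)
print(f_summary, f_raw, df1, df2)
```

The two F values should agree up to floating-point rounding, which is why any surrogate data set that matches the group sizes, means, and standard deviations yields the same ANOVA table.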

One use for surrogate data sets is in teaching, permitting an instructor to supply a...

Source Citation
Lehman, Richard S. "Analysis of variance with just summary statistics as input." The American Statistician, vol. 47, no. 2, May 1993, p. 157. Accessed 21 May 2022.

Gale Document Number: GALE|A13838424