## *The Analysis of Variance*, by Henry Scheffé

Below we give extracts from the Preface to Henry Scheffé's classic text *The Analysis of Variance*, as well as an extract from the Introduction and short extracts from twelve reviews. All but one of these reviews are of the first edition of 1959; the remaining one is of the 1999 reprint of that first edition. We have simply listed the reviews in alphabetical order by the reviewer's surname.

**1. From the Preface.**

In this book I have tried to elucidate in a unified way what appears to me at present to be the basic theory of the analysis of variance. This necessitates considering several different mathematical models for the subject. The theory in Part I, namely that for fixed-effects models with independent observations of equal variance, I judge to be jelled into a fairly permanent form, but the theory of Part II, namely that under other models, I expect will undergo considerable extension and revision. Perhaps this presentation will help stimulate the needed growth. What I feel most apologetic about is the little I have to offer the reader on the unbalanced cases of the random-effects models and mixed models. These cannot be generally avoided in planning biological experiments, especially in genetics, the situation being unlike that in the physical sciences. This gap in the theory I have not been able to fill. ... This book contains 117 problems at the ends of the chapters and appendices, of which 38 require numerical computations with "real" data. The variety of applications in these 38 problems should give some idea of the broad applicability of the analysis of variance, even though the problems were chosen only because they furnish suitable examples of the methods described in the text, and with no conscious attempt at inclusion of many substantive fields. The importance of carrying through a considerable amount of numerical work is greater here than it is in learning most branches of statistics. Indeed, some practitioners of the analysis of variance would regard the computational techniques as the most important part of the subject, and consider as perverted my emphasis on the choice of mathematical models. 
I realise that many practitioners have developed reliable intuitive and verbal paths to the correct analysis in given situations without defining the model, but I find it easier to follow the path to which I am constrained by the choice of model; the approach of choosing the model and then making the analysis dictated by it seems to me also to be simpler to teach, as well as more appropriate for a book on the theory of the subject.

**2. From the Introduction.**

The following rough definition of our subject may serve tentatively: The analysis of variance is a statistical technique for analysing measurements depending on several kinds of effects operating simultaneously, to decide which kinds of effects are important and to estimate the effects. The measurements or observations may be in an experimental science like genetics or a non-experimental one like astronomy. A theory of analysing measurements naturally has implications about how the experiment should be planned or the observations should be taken, i.e., experimental design. Historically, the present technique of analysis of variance has been developed mainly in connection with problems of agricultural experimentation.

An agricultural experiment of a relatively simple structure to which the analysis of variance would be applicable would be the following: In each of three localities four varieties of tomatoes are grown in tanks containing chemical solutions. Two different chemical solutions, which we shall call "treatments", are used, with different proportions of the chemicals. For each treatment in each locality there is a mixing tank from which the fluid is pumped to all the tanks on this treatment, connected "in parallel:" We do not want a "series" connection, where the outflow from one tank is the inflow to another, because this would confound the effects of the varieties in these two tanks with the effects (if any) of order in the "series" connection. The tanks are arranged outdoors with the same orientation, so that the plants in one tank will not appreciably shade those in another, etc. For each treatment in the three localities the chemicals are renewed according to the same specifications. Each variety is grown in a separate tank, with the same number of plants in each. The yield of each tank is the weight of ripe tomatoes produced. The yield from a tank may depend on the variety, the chemical treatment, and the locality. In particular, it will depend on interactions among these factors, a useful concept of the analysis of variance ... The sort of questions for which our theory offers answers is the following: Are the varieties different in yield when averaged over the two treatments and three localities? Do the yields demonstrate differential effects of the varieties for different localities? How can we quantitatively express the differences with a given degree of confidence? Etc.
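The layout described above (three localities, four varieties, two treatments, one tank per combination) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the yields are produced by a made-up formula, `synthetic_yield`, and are not data from the book, and the names `variety_effect` and `interaction` are hypothetical. The sketch computes the variety means averaged over treatments and localities, and the variety-by-locality interaction terms that the questions above refer to.

```python
from itertools import product
from statistics import mean

LOCALITIES = range(3)   # three localities
VARIETIES = range(4)    # four tomato varieties
TREATMENTS = range(2)   # two chemical solutions ("treatments")


def synthetic_yield(loc, var, trt):
    """Made-up yield of one tank (arbitrary units); NOT data from the book."""
    return 20.0 + 2.0 * var + 1.5 * loc + 3.0 * trt + 0.5 * var * loc


# One tank, hence one yield, per (locality, variety, treatment) combination.
yields = {
    (loc, var, trt): synthetic_yield(loc, var, trt)
    for loc, var, trt in product(LOCALITIES, VARIETIES, TREATMENTS)
}

grand_mean = mean(yields.values())

# Variety means, averaged over the two treatments and three localities.
variety_mean = {
    v: mean(y for (loc, var, trt), y in yields.items() if var == v)
    for v in VARIETIES
}
# Locality means, averaged over the four varieties and two treatments.
locality_mean = {
    l: mean(y for (loc, var, trt), y in yields.items() if loc == l)
    for l in LOCALITIES
}
# Variety-by-locality means, averaged over the two treatments.
cell_mean = {
    (v, l): mean(yields[(l, v, t)] for t in TREATMENTS)
    for v in VARIETIES
    for l in LOCALITIES
}

# Main effect of each variety: deviation of its mean from the grand mean.
variety_effect = {v: variety_mean[v] - grand_mean for v in VARIETIES}

# Interaction: what remains of a cell mean after removing both main effects.
interaction = {
    (v, l): cell_mean[(v, l)] - variety_mean[v] - locality_mean[l] + grand_mean
    for v in VARIETIES
    for l in LOCALITIES
}

for v in VARIETIES:
    print(f"variety {v}: mean {variety_mean[v]:.2f}, effect {variety_effect[v]:+.2f}")
```

Non-zero entries in `interaction` correspond to "differential effects of the varieties for different localities"; deciding whether such differences are larger than chance variation would allow is the inferential step that the book's theory supplies and that this sketch does not attempt.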

**3. Review by: David Roxbee Cox.**

*Journal of the Royal Statistical Society*, Series A (General) **123**(4) (1960), 482-483.

The first adjective that comes to mind to describe this book is 'professional'. The author has mastery of his subject and its literature, and clearly has taken great care over the organisation of the book, over the details of the writing, and over the provision of illustrative examples and exercises. The mathematical analysis is carried through with much attention to detail and to the need to provide results of sufficient generality to cover a wide range of special cases. The main tools are vector and matrix algebra; appendices set out the principal mathematical results used in the book. While the flavour of the book is mathematical, I did not feel that any mathematical complications had been introduced for their own sake. ... Altogether this is a most important book, deserving to be widely read. Statisticians working in fields where analysis of variance is used extensively are likely to find the book extremely valuable in consolidating their knowledge of the theoretical side of the subject. The work is intended also as a text for students; it seems to me right, however, that students should have first a mathematically more elementary and intuitive introduction to the subject.

**4. Review by: Arthur Pentland Dempster.**

*Technometrics* **2**(4) (1960), 517.

This book is an excellent, clearly written text on the theory of analysis of variance. The author fashions a compromise between the aim of reaching students whose mathematics barely includes some calculus and the aim of surveying modern theoretical research in the field. As a result he covers the middle ground best, but the student lacking mathematical depth will find the going hard, and it is doubtful that the book will give much guidance to future theoretical research. The student will find ample exercises, especially in Part I. There is extensive footnoting directed at the reader who wants more depth than the main text affords. The author's viewpoint is that of a mathematical statistician; that is, the first concerns are with the definition of mathematical models and with the application of the methods of mathematical statistics to these models to produce appropriate point estimates, confidence intervals and hypothesis tests. Still, a secondary aim of relating the theory to practice is not neglected and there is considerable discussion of where the models came from, how to decide among competing statistical methods, how to carry out computations, etc.

**5. Review by: Paul Sumner Dwyer.**

*SIAM Review* **5**(1) (1963), 84-86.

This book integrates the basic theory of those topics which the author considers as belonging to analysis of variance into a unified presentation. The author does distinguish between the treatment of the fixed effects models with independent observations of equal variance which he covers in the six chapters of Part I and the treatment of other models which appears in the four chapters of Part II. ... In planning an important undertaking such as this, the author must give special attention to the mathematical background of the prospective readers and determine how the work should be organized for persons with the background specified. Since most of the derivations of the text are algebraic rather than analytic, the author has specified an understanding of calculus as the prerequisite in the field of analysis. In the field of algebra, he has provided several appendices which the reader is advised to master before undertaking the study of the textual material, if he is unfamiliar with this basic algebra. ... There are places in the text where more extensive training in analysis is needed for adequate understanding. Generally, however, the author indicates these places and suggests that the reader with minimum training may skip the material indicated without being handicapped in understanding the bulk of the material in the later sections and chapters. But the presentation is essentially mathematical and the author uses precise and extensive notation for the variety of topics which he has covered. ... All in all, this is an excellent book which will be used not only by the student who studies it in a systematic manner but also by the research worker who needs a good reference source to the theory, methods, and techniques which are essential to the solution of problems with the analysis of variance.

**6. Review by: N L Johnson.**

*Journal of the Institute of Actuaries (1886-1994)* **86**(2) (1960), 229-230.

The present book will provide a final answer to all those who excuse the use of the 'recipe' books on the grounds that 'nothing else is available'. Here is a book giving a thorough treatment of most of the basic ideas, and much of the subsequently elaborated superstructure of analysis of variance as understood by a well-trained mathematical statistician of the present day. The demands on the mathematical equipment of the reader are severe, but not excessively so. The text is, indeed, 'algebraic', as compared with the 'arithmetic' writing of pioneers such as R A Fisher, J Wishart and F Yates. Although it does seem possible that a useful text could be produced with rather fewer letters and rather more numbers, a careful study indicates that some real practical purpose lies behind most of the analysis.

**7. Review by: Thomas E Kurtz.**

*Amer. Math. Monthly* **67**(9) (1960), 933.

This book presents the theory of the analysis of variance and is not a cookbook, though computational methods are outlined and sufficient problems included to give the reader ample opportunity for practice. The author stresses vector space and geometrical ideas; for instance, the sum-of-squares decomposition may be viewed as resolving the observation vector into components lying in specified subspaces. In the opinion of the reviewer, this book is more comprehensive and complete than its predecessors, and is recommended to all serious students of statistics.
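The geometric view mentioned in this review can be illustrated with a small sketch. It assumes a one-way layout with three groups and uses made-up observations (not taken from the book); the names `mean_part`, `between_part` and `within_part` are hypothetical. The observation vector is resolved into three mutually orthogonal components, and the sums of squares are the squared lengths of those components.

```python
from statistics import mean

# Hypothetical observations for three groups (made-up numbers, not from the book).
groups = [[4.0, 5.0, 6.0], [7.0, 9.0, 8.0], [1.0, 2.0, 3.0]]

# The observation vector: all observations, group after group.
y = [obs for g in groups for obs in g]
grand = mean(y)

# Component in the subspace spanned by the all-ones vector.
mean_part = [grand] * len(y)
# Component that is constant within each group and orthogonal to the ones vector.
between_part = [mean(g) - grand for g in groups for _ in g]
# Component orthogonal to every vector that is constant within groups.
within_part = [obs - mean(g) for g in groups for obs in g]


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


ss_total = dot(y, y)
ss_mean = dot(mean_part, mean_part)
ss_between = dot(between_part, between_part)
ss_within = dot(within_part, within_part)

print("components orthogonal:",
      dot(mean_part, between_part) == 0
      and dot(mean_part, within_part) == 0
      and dot(between_part, within_part) == 0)
print("squared lengths add up:", ss_mean + ss_between + ss_within == ss_total)
```

Because the three components are orthogonal and sum to `y`, the Pythagorean theorem gives the decomposition of the total sum of squares; this is one simple instance of the idea, not a reproduction of the book's general treatment.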

**8. Review by: S L.**

*Population (French Edition)* **16**(3) (1961), 569.

The analysis of variance, a powerful tool of statistical analysis, is frequently used in demography. The book by Scheffé, professor of statistics at the University of California, Berkeley, is a remarkably polished account, the product of many years of teaching and practice. In addition to important mathematical developments, the book contains an examination of various models, treatment of numerical calculations, and suggestions as to the presentation of results. In the first part the author covers the case of non-random effects and the organization of experiments (Latin squares, incomplete block designs, etc.), and in the second part the case of random effects and mixed models. Particular attention is devoted to examining the validity of the results with respect to the basic assumptions: normality, independence of observations, and equal variances, among others. This is particularly important in an area such as demography, where the data to analyse rarely come from organized experiments, and where failure to satisfy these assumptions, at least in part, is rather the rule. An appendix introducing vector and matrix algebra ends the book.

**9. Review by: C C Li.**

*The Quarterly Review of Biology* **36**(2) (1961), 154.

A review by a biologist and for biologists of a book on statistical theory may not do full justice to it, and this could be the case with Scheffé's book. It is a highly formal, comprehensive, and mathematical treatment of the theory of the analysis of variance. Throughout the 10 chapters packed with derivations of theorems and formulas, there is not a single numerical example in the text except for two small samples in connection with the permutation tests. Nor does the book deal with the design of experiments, although some topics concerning design have been touched on incidentally. Contrary to the earnest hope of the author, the treatise cannot be considered suitable for self-study, at least not by users of statistical methods.

**10. Review by: Dennis Victor Lindley.**

*The Mathematical Gazette* **83**(498) (1999), 571-572.

The book under review is a reissue of the 1959 original which was amongst the first to provide a complete account of the theory as it was understood in the late 50s. The University of California had then, under Neyman, a leading school of mathematical statistics and Henry Scheffé was one of its luminaries. The book soon became deservedly popular with mathematical statisticians because of its clear and precise exposition. The current edition is a reprint of the original, without any additional material, the author having died as the result of an accident in 1977. ... Science and mathematics, and the blend of these that is modern statistics, are subjects in which the ideas of one generation build heavily on those of earlier generations, so that texts rapidly become out-of-date. I was surprised therefore to find how well the material of 40 years ago is still pertinent and readable today; indeed, it is hard to think of a recent text that could replace Scheffé's. ... One development has been to say that he looked at the problem in the wrong way so that his treatment is fundamentally unsound, a better and easier one being available. Others say he was broadly right and have extended his ideas. Whoever is correct, the book provides a fine description of the mathematics of the analysis of variance as it was in 1959 and, in essentials, as its basics remain today.

**11. Review by: Robin L Plackett.**

*The Mathematical Gazette* **45**(353) (1961), 272.

The problems of statistical inference are thoroughly examined, and it would be difficult to find any issues of importance which have been overlooked. Although the layout of the calculations is described, and numerical exercises are included, this is not one of the many books which discuss in detail the presentation and non-mathematical interpretation of experimental results. The author has made many studies on analysis of variance models, and his full survey of their theoretical aspects is a welcome addition to the relevant textbooks.

**12. Review by: Leonard Jimmy Savage.**

*Mathematical Reviews* MR0116429 **(22 #7217)**.

One can hardly say succinctly just what the analysis of variance is. But there is unquestionably a nexus of topics, each connected to some of the others, that goes by this name and covers a large portion of the theory of statistics that is of frequent applicability. ... The coverage is unusually complete and penetrating, and for many such minor topics as are not treated in detail here, there are adequate references to the literature. There are topics to be found here that are in no other book, and of course there are some altogether new developments. ... While occasionally avoiding certain technical refinements of little practical importance, the many mathematical passages of the book are general, complete and rigorous. ... No book that covers such a great variety of technical ideas as this one can be really light reading. ... a different choice of mathematical methods might have helped. Also, in spite of evident thought and ingenuity, the technical notations remain dazzling and hard to remember ... Whatever may be said in criticism is overshadowed by the merits of this book which is unique in its field and will be indispensable to all who seriously do, study, or teach, statistics in connection with experimentation.

**13. Review by: Alan Stuart.**

*Economica, New Series* **28**(112) (1961), 453-454.

Professor Scheffé spares no mathematics, and gathers together nearly all the useful results in his field. The deliberate generality of his exposition has clarity (and, in places, elegance) as reward for the pertinacious reader. The readiness to admit our ignorance, where it exists, brings out the pressing need for further research in various directions. Above all, it is made plain that Analysis of Variance (Fisher's most brilliant discovery of how to separate out the effects of different factors upon a variable) is not a method, but a class of methods. Before we can decide which to use, we must specify a mathematical model for the observations we make. (We "empirical" British statisticians have been much less concerned with the development of the alternative models than have the Americans, but this is simply due to our laziness and to a lingering incorrect feeling that the observations contain "information" which cannot really be changed by a change in the underlying model.) It is precisely here that collaboration between the applied scientist and the specialist statistician is most needed; the former to reveal the facts of his situation, the latter to formalise these and specify the appropriate mode of analysis. Scheffé's book will be essential to specialist statisticians and to teachers of mathematical statistics, but his correct refusal to shirk mathematical issues makes it hard work for others. It is well worth the effort.

**14. Review by: Colin White.**

*American Scientist* **48**(3) (1960), 460.

The procedures known as the analysis [of variance are the] most widely used technique in statistics. There are applied statisticians who make their living largely as a result of their detailed knowledge of this method and their skill in coping with the computational problems that arise. Frequently, such persons, in the words of the author, "have developed reliable intuitive and verbal paths to the correct analysis in given situations without defining the model." The present book, however, is firmly based on the use of mathematical models. Nevertheless, it is well-suited to scientists who wish to obtain deeper insight into this branch of statistics. ... This is a timely and meaty book.