A Proposed Basic Course in Statistics

George W Snedecor, Professor of Statistics, Iowa State College, presented his 'Proposed Basic Course in Statistics' in the Journal of the American Statistical Association 43 (241) (1948), 53-60. We give a version of his article below.


INTRODUCTION

A basic introductory course in statistics was advocated by the National Research Council Committee on Applied Mathematical Statistics in its stimulating report of May, 1947, Reprint and Circular Series, Number 128. The functions of such a course are stated as follows: "First, it should form part of a general education and as such it should be self-contained. Secondly, it should provide essential training for students majoring in the natural and social sciences, which may be developed further in later courses. Finally, it should interest promising young students in statistics as a profession."

Concerning laboratory work in statistics, the Report contains these penetrating remarks: "Often the main objective of such laboratories is the calculation of means, variances, correlation coefficients and other statistical quantities from numerical data of various types. The laboratory work should place more emphasis on interpreting or drawing inferences from data and on the nature of those inferences. Simple experiments should be devised for illustrating probability laws of various kinds, for carrying out sampling operations and other random processes. The traditional flipping of coins and rolling of dice are not adequate for illustrating many important random processes. The mathematical theory of many of these processes is too complicated to be handled at an elementary level; their experimental demonstration will give the beginning student some feeling for their significance."

The writer finds himself in entire agreement with this report of the Committee on Applied Mathematical Statistics. He finds, also, that many other teachers realise the need for revision in the traditional methods of presenting the subject. These statisticians are not only striving to reorient their own ideas but are experimenting with new methods of teaching. It seems timely to propose a concrete program which may serve as a springboard for discussions.

Two merging trends emphasise the necessity for changes in the teaching of statistics. One trend is the astonishing growth in the impact of statistics on society. For many years after the American Statistical Association was founded in 1839, popular knowledge of the subject was confined almost entirely to governmental statistics and the data of economics, though insurance was beginning to enter its present dominating position in our social structure. Today statistics has captured the popular fancy. The extensive data on the sports pages, the various opinion polls, popularity tests of many kinds including those of radio programs, the cost-of-living indexes with their repercussions on wage policies, surveys of consumer preference, crop estimates, quality control - these are some outstanding examples of the preoccupation of people with statistical evidence.

The second trend merging with the first is the no less astonishing growth of statistical theory. Estimation and the testing of hypotheses have clothed with living flesh the dry bones of numerical data. The emphasis on sample-to-population inferences has put new meaning into statistical terminology. The Fisherian concept of "information" has shifted interest from the formal rules of calculation and summarisation to the vital processes of getting information into the data by use of appropriate sampling and experimental designs.

The merging of these two trends into a single stream makes new demands on our statistical personnel. Neither the isolated theorist nor the submerged practitioner is able to keep abreast of the current. The specialist in mathematical statistics must acquaint himself with practical problems, and equally, the work-a-day statistician must familiarise himself with the logic of statistical theory: fortunately, this is being made more readily available to those who do not have the benefit of mathematical symbolism. It is only by the intermixture of the two streams that statistics can freely flow onward.

The fact should not be overlooked that the recent upsurge of interest in statistics is based on an ancient pattern of thinking, a form that has developed along with the thought process itself and that is more primitive and more extensive than the logical forms. Concepts of type and of departure from type are counterparts of statistical averages of location (means, regression coefficients, etc.) and of measures of scale such as the interquartile interval and standard deviation. The process of sampling is deeply embedded in human actions. Judgments about probability together with consequent behaviour determine much of the pattern of our daily lives. Anyone entering the profession of statistics, especially the teaching of it, should enjoy the confidence that he is engaging in a fundamental activity of mankind.

Since statistics is so intimate a part of our social organisation, the conclusion is inevitable that it should be taught generally to our young people. As indicated in the National Research Council Committee report, the elements will doubtless be introduced into the high school curriculum as soon as there is a sufficient supply of trained teachers. Meanwhile, the basic course, taught at the freshman or sophomore level, should serve as the foundation of college curricula to produce such teachers along with other professional statisticians.

Essentially, my idea is to bring the student into awareness of and harmony with the statistical content of our society. This content is extensive. It includes gossip, news and probability theory; sports and old age pensions; gambling and weather prediction; birth rates and living costs; the stock market and epidemics. Such large-scale activities as insurance and the census must be integrated with the more academic concepts of probability, distributions, sampling, estimates and tests of hypotheses.

In working out the following syllabus, the author has set himself the ideal of presenting sound statistics in an interesting fashion. It would seem inadvisable to introduce so vital a subject in an austere and forbidding style. On the other hand, one must be constantly on guard lest he produce false impressions that will later have to be tediously eradicated. Only the elements of statistical thinking should be incorporated in this basic course, but the elements should constitute strategically chosen timbers that will fit readily into the projected structure.

It is a deep-seated conviction of the author that the tools of statistics should be brought out and sharpened only after the need for them has been felt. Tables, graphs, and calculations of the various averages have all too often been presented as the subject matter of statistics rather than as its implements. The very young people whom we wish to enlist, those with imagination and intellectual enthusiasms, may be alienated by dull routine if introduced before any necessity for it becomes evident.

It is clear that no more than an outline of a basic course can be suggested now. The outline will be changed and filled in by the efforts of many teachers who will gain experience from trying the experiment. Some have already started; the important thing now is to get more people working at the job.

PROPOSED OUTLINE OF COURSE

I. Introduction. The interest people show in counting and measuring. Some historical items showing the antiquity of the habit. Statistical form of much human thinking. Uncritical attitude of people toward numerical statements. Illustrate fallacies. Interesting and uninteresting statistics. Illustrative material: batting averages, birth rates, election returns, opinion polls, vital statistics, the average man.

II. Inquiry by Sampling. Propose a contract to find out for a radio station the number of listeners to Program X. Develop limitations in extent of inquiry and in number of units interviewed. Discuss design of sample, including size. Quota sampling, random sampling, area methods. Use of mail and telephone.

Assume sampling completed and ballots counted. Contrast known fraction of listeners in sample with unknown fraction in the population. Take confidence interval from table and explain meaning. Expand to population total number of listeners, using census information about area sampled.
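As a rough modern sketch of this computation, the Python below replaces the printed table with the large-sample normal approximation for a proportion, with z = 1.96 for 95% confidence. It then expands the interval to a population total. The figures (400 households interviewed, 120 listeners, 50,000 households in the area) are hypothetical and do not come from the article.

```python
import math

def proportion_interval(listeners, interviewed, z=1.96):
    """Approximate confidence interval for a population proportion.

    Uses the large-sample normal approximation p_hat +/- z*sqrt(p_hat(1-p_hat)/n)
    in place of the printed table the syllabus refers to.
    """
    p_hat = listeners / interviewed
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / interviewed)
    return p_hat - half_width, p_hat + half_width

def expand_to_population(interval, households):
    """Scale an interval for the proportion into an interval for the total count."""
    low, high = interval
    return low * households, high * households

# Hypothetical survey: 120 of 400 interviewed households listen to Program X,
# in an area that census figures put at 50,000 households.
low, high = proportion_interval(120, 400)
total_low, total_high = expand_to_population((low, high), 50_000)
print(f"proportion of listeners: {low:.3f} to {high:.3f}")
print(f"listening households: {total_low:,.0f} to {total_high:,.0f}")
```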

This sampling problem has been introduced ahead of the chapter on a priori probability for three reasons: (i) It is the newer and more practical concept; (ii) It leads immediately into a modern social problem instead of into the more ancient and perhaps less honourable one of games of chance; (iii) The term population has its obvious and fundamental meaning, sample-to-population inferences being inevitable. Experience has convinced me that the student feels himself on familiar ground.

Laboratory: Conduct opinion poll sampling of student body; construct questionnaire, design sample, discuss interview technique, tabulate results, make inferences about population. Students enjoy this experience, ask many penetrating questions and open opportunities for sound instruction.

III. Random Sampling from Population with Known Constitution. Use bowl containing equal numbers of beads of two colours, or use table of random numbers with equal numbers of odd and even digits. Draw many samples of 10, recording numbers of "successes." Contrast sample ratios with that of population. Set confidence interval from each sample and determine proportion of correct statements made. Compare results with theory. Emphasise sampling variation and observe improved reliability when samples are combined into larger samples. Compare with poll of radio listeners where parameter is unknown.
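The bead-bowl experiment can also be simulated. This sketch assumes the "table" of confidence intervals is built from the Clopper-Pearson (exact binomial) 95% interval. That is one standard choice, not necessarily the table Snedecor had in mind. The comparison `random() < 0.5` stands in for drawing a bead from a bowl with equal numbers of the two colours.

```python
import math
import random

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect_root(f, lo, hi, iters=60):
    """Root of a monotone f on [lo, hi], given that f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(x, n, conf=0.95):
    """Clopper-Pearson confidence interval for a proportion, from x successes in n."""
    tail = (1 - conf) / 2
    # Lower limit: the p at which P(X >= x) equals the tail probability.
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - tail, 0.0, 1.0)
    # Upper limit: the p at which P(X <= x) equals the tail probability.
    upper = 1.0 if x == n else bisect_root(
        lambda p: binom_cdf(x, n, p) - tail, 0.0, 1.0)
    return lower, upper

def coverage(num_samples=1000, n=10, p=0.5, seed=1):
    """Fraction of samples whose tabled interval contains the true proportion p."""
    table = [exact_interval(x, n) for x in range(n + 1)]  # one row per possible count
    rng = random.Random(seed)
    correct = 0
    for _ in range(num_samples):
        successes = sum(rng.random() < p for _ in range(n))
        low, high = table[successes]
        correct += low <= p <= high
    return correct / num_samples

print(exact_interval(5, 10))
print(coverage())
```

Because the counts are discrete, an exact 95% interval is conservative. With samples of 10 the proportion of correct statements should come out somewhat above 0.95.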

Laboratory: Extend sampling experience by use of coins or dice. Record result of each toss for later use.

IV. Frequency Distribution. Tabulate numbers of samples with 0 to 10 successes. Present distribution graphically. The mode as an estimate. The small fraction of extreme values as an explanation of confidence in sampling. The mean number of successes in the aggregate of each student's sampling. Emphasise the greater reliability and utility of the mean as compared with the mode. Combine samples of 10 into larger groups and observe (i) the greater variation in the number of successes and (ii) the lesser variation in the proportion of successes. Law of Large Numbers.
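A minimal simulation of the last point, assuming a population in which half the units are "successes". It draws many samples of 10 and of 40 (equivalent to four samples of 10 combined) and compares the spread of the counts with the spread of the proportions. The sample sizes and the number of repetitions are arbitrary choices.

```python
import random
import statistics

def spread(sample_size, repetitions=2000, p=0.5, seed=11):
    """Standard deviations of the success count and of the success proportion."""
    rng = random.Random(seed)
    counts = [sum(rng.random() < p for _ in range(sample_size))
              for _ in range(repetitions)]
    proportions = [c / sample_size for c in counts]
    return statistics.stdev(counts), statistics.stdev(proportions)

for size in (10, 40):
    sd_count, sd_prop = spread(size)
    print(f"n={size:3d}: SD of count {sd_count:.2f}, SD of proportion {sd_prop:.3f}")
```

Theory gives the count a standard deviation of sqrt(np(1-p)), which grows with n. The proportion has sqrt(p(1-p)/n), which shrinks. This is the Law of Large Numbers in miniature.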

V. Insurance. Next to government, the greatest cooperative social enterprise in America. Historical items. The premium, the expectation of loss, is the price of protection equally enjoyed by all participants. Life insurance and mortality tables. Expectation of life. Emphasise definite statements about uncertain events. Pure premium for single year term insurance. Level premiums for term and life policies. Insurance vs. savings accounts.

Laboratory: Calculating machines will be required to compute various premiums. Insurance is now a social problem of great importance. The soundness of the trend away from pure insurance, which is predominantly statistical, towards various forms of savings devices, which are mainly financial, is questionable. The student should be able clearly to distinguish between insurance and investment.
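To show the arithmetic the calculating machines were needed for, here is a sketch of pure (net) premiums under textbook actuarial assumptions. The death benefit is paid at the end of the year of death, premiums are paid at the start of each year while the insured is alive, and money earns a fixed interest rate. The one-year death probabilities below are hypothetical placeholders, not values from any real mortality table.

```python
def one_year_premium(q, face, interest):
    """Pure premium for one year of term insurance: the expected loss, discounted a year."""
    return face * q / (1 + interest)

def level_term_premium(qs, face, interest):
    """Level annual pure premium for term insurance lasting len(qs) years.

    qs[t] is the probability that a person alive at the start of year t dies
    during that year.
    """
    v = 1 / (1 + interest)   # discount factor for one year
    alive = 1.0              # probability of surviving to the start of year t
    pv_benefits = 0.0        # expected present value of the death benefit
    pv_annuity = 0.0         # expected present value of 1 paid each year while alive
    for t, q in enumerate(qs):
        pv_annuity += alive * v**t
        pv_benefits += alive * q * face * v ** (t + 1)
        alive *= 1 - q
    return pv_benefits / pv_annuity

# Hypothetical one-year death probabilities for five successive ages.
qs = [0.004, 0.0045, 0.005, 0.0056, 0.0063]
print([round(one_year_premium(q, 1000, 0.03), 2) for q in qs])
print(round(level_term_premium(qs, 1000, 0.03), 2))
```

The level premium is a weighted average of the one-year premiums. Early in the term the insured pays more than the current risk, and this builds the reserve that the later, costlier years draw on.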

VI. A Priori Probability I: Games of Chance. A primitive, universal human interest. Probability and expectation. Conditions of fair play. Effects of limits on resources and time: sample vs. population. Playing against The House. Systems of play. Probabilities of runs.

Laboratory: Try a system like the Martingale, noting winners and losers at end of specified number of throws. Balance accounts in the aggregate. Work out probabilities in some game.

This is one of the chapters that affords great opportunity for exposing popular fallacies.
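Here is a sketch of the Martingale laboratory. Each hypothetical player stakes one unit on an even-money bet, doubles the stake after every loss, and returns to one unit after a win. The win probability 18/38 (red on an American double-zero roulette wheel), the bankroll, and the number of spins are all assumptions chosen for illustration. When a player cannot cover the doubled stake, the sketch assumes he bets whatever remains.

```python
import random

def play_martingale(bankroll, spins, p_win, base_bet=1, rng=None):
    """Final bankroll of one player using the Martingale doubling system."""
    if rng is None:
        rng = random.Random()
    bet = base_bet
    for _ in range(spins):
        bet = min(bet, bankroll)     # assumption: stake whatever is left if short
        if bet == 0:                 # ruined: no money left to play
            break
        if rng.random() < p_win:
            bankroll += bet
            bet = base_bet           # after a win, start again at one unit
        else:
            bankroll -= bet
            bet *= 2                 # after a loss, double the stake
    return bankroll

def martingale_class(players=30, bankroll=100, spins=100, p_win=18 / 38, seed=7):
    """Net gain of each player, the counts of winners and losers, and the total."""
    rng = random.Random(seed)
    nets = [play_martingale(bankroll, spins, p_win, rng=rng) - bankroll
            for _ in range(players)]
    winners = sum(net > 0 for net in nets)
    losers = sum(net < 0 for net in nets)
    return nets, winners, losers, sum(nets)

nets, winners, losers, total = martingale_class()
print(f"{winners} winners, {losers} losers, aggregate balance {total:+d}")
```

Each single bet carries the house's edge of 2/38 of the stake. However the doubling is arranged, the expected aggregate balance is therefore a loss, which is the fallacy the laboratory is meant to expose.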

VII. A Priori Probability II: The Binomial Distribution. Develop first with probability of 1/2. Compare sample distributions with theoretical. Raise question of testing hypotheses. Develop distribution with probability different from 1/2. Skewed distributions. Mean and mode.

Laboratory: Throw dice to get data for samples for asymmetrical binomial. Compare samples with population. Save data for testing in next chapter.

VIII. Empirical Probability. Prevalence of vague judgments and actions based on them. Sampling basis of most calculated probability. Sampling from populations with unknown probability. Illustrate by throwing a loaded die. Calculate chi-square for hypothesis of perfect balance. Get distribution of chi-square from samples of 10. Test of hypothesis. Meaning and use of table of chi-square. Contrast samplings from populations with known and unknown probability. Emphasise the more practical problem of sampling from unknown probability with resulting inferences and tests. Ordinarily the parameter is forever unknown, but pertinent hypotheses intrude themselves.

Laboratory: Extend the experimental basis for the chi-square distribution. Test an ample number of hypotheses about various samplings that have been made. Extend to more than one degree of freedom. Test the goodness of fit of samples from binomial distributions.
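The following sketch covers the chi-square work of this chapter. It starts with the simplest one-degree-of-freedom case: throws are classified as odd or even, so a perfectly balanced die gives each class probability 1/2. The loading in the last lines, where a six is 2.5 times as likely as any other face, is hypothetical. The values 3.841 and 11.070 are the 5% points of chi-square for 1 and 5 degrees of freedom.

```python
import random

def chi_square(observed, expected):
    """Pearson's goodness-of-fit statistic: the sum of (O - E)^2 / E over classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def face_counts(rng, throws, weights):
    """Counts of faces 1..6 in a run of throws of a die with the given face weights."""
    counts = [0] * 6
    for face in rng.choices(range(1, 7), weights=weights, k=throws):
        counts[face - 1] += 1
    return counts

def odd_even_chi_squares(num_samples, throws=10, weights=(1,) * 6, seed=3):
    """Sampling distribution of chi-square (1 d.f.) for the odd/even hypothesis."""
    rng = random.Random(seed)
    values = []
    for _ in range(num_samples):
        counts = face_counts(rng, throws, weights)
        odd = counts[0] + counts[2] + counts[4]
        values.append(chi_square([odd, throws - odd], [throws / 2, throws / 2]))
    return values

# Balanced die: how often does chi-square exceed the tabled 5% point?
balanced = odd_even_chi_squares(2000)
print(sum(v > 3.841 for v in balanced) / len(balanced))

# Hypothetical loaded die, tested against perfect balance on all six faces (5 d.f.).
rng = random.Random(5)
counts = face_counts(rng, 600, (1, 1, 1, 1, 1, 2.5))
print(counts, chi_square(counts, [100] * 6), "vs 11.070")
```

With only 10 throws the statistic can take just a few values. Only 0, 1, 9 or 10 odd faces exceed 3.841, so the observed fraction should fall near 0.02 rather than 0.05. The table is an approximation that improves as samples grow.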

This chapter is the climax of the first part of the course. A substantial body of statistical theory has been accumulated together with some practical problems and a number of socially advantageous applications. Sampling from specified populations has been emphasised. The binomial distribution has been developed. A sampling distribution of chi-square has been built up, leading to confidence in the table. Estimates and tests of hypotheses have been justified and applied. Uncertain inference has been exemplified. If the student ends his contact with the course here he will have had experience with the fundamental concepts of statistics.

IX. Measurement. Start with guesses at the length of an 18-20 inch bar. Distribution of guessed lengths. Measure with scale and summarise in distribution. Emphasise variation as characteristic of all measurement. Parameter is again always unknown. Develop idea of normal distribution as a model like the binomial. Properties of the normal distribution. Mean and standard deviation of sample as appropriate estimates of parameters. Methods of calculation. Repeated measurements of the same thing compared to measurements of the members of a sample from a normal population. Emphasise conceptual character of models and impossibility of learning parameters by sampling.

Laboratory: Let all measure height of one member of class. Distribution of heights of men and women. Perform some psychological experiment leading to near-normal distribution.

X. Sampling Distributions. Each member of class draws 10 or more samples of 10 from a normal population, using table of random digits for randomisation. Mean and variance are calculated for each sample. Tabulate distribution of each, showing normality of first and skewness of second. Estimate population mean and variance from each. Calculate t for each sample, using known population mean. Distribution of t. Calculate confidence interval based on each sample and verify fraction of correct statements.

Laboratory: Construct normal curve from parameters of sampled population. Fit normal distribution to distribution of means. Test normality of distribution of variance.
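Here is a sketch of this chapter's sampling experiment, with Python's pseudo-random normal generator standing in for the table of random digits. The population mean 30 and standard deviation 5 are hypothetical. The value 2.262 is the two-sided 5% point of t with 9 degrees of freedom, which suits samples of 10.

```python
import math
import random
import statistics

SAMPLE_SIZE = 10
T_05 = 2.262  # two-sided 5% point of t with SAMPLE_SIZE - 1 = 9 degrees of freedom

def t_statistic(sample, mu):
    """t = (sample mean - mu) / (s / sqrt(n)), using the known population mean mu."""
    return (statistics.mean(sample) - mu) / (statistics.stdev(sample) / math.sqrt(len(sample)))

def sampling_experiment(num_samples=2000, mu=30.0, sigma=5.0, seed=2):
    """Means, variances, t values, and the fraction of correct 95% interval statements."""
    rng = random.Random(seed)
    means, variances, ts = [], [], []
    correct = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(SAMPLE_SIZE)]
        mean, s = statistics.mean(sample), statistics.stdev(sample)
        means.append(mean)
        variances.append(s * s)
        ts.append(t_statistic(sample, mu))
        half_width = T_05 * s / math.sqrt(SAMPLE_SIZE)
        correct += mean - half_width <= mu <= mean + half_width
    return means, variances, ts, correct / num_samples

means, variances, ts, fraction_correct = sampling_experiment()
print(f"average of sample means:     {statistics.mean(means):.2f}")
print(f"average of sample variances: {statistics.mean(variances):.2f}")
print(f"fraction of correct interval statements: {fraction_correct:.3f}")
```

Tabulating `means` gives a roughly normal distribution. Tabulating `variances` shows the right skew the syllabus mentions.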

XI. Sampling From Some Human Population. If time and facilities are available, this may be an actual sampling. Usually this chapter will be limited to tabulation, summarisation and presentation of available data from some sample survey. Calculate mean and variance. Test normality of distribution. Confidence intervals. Expand estimates to population totals.

Laboratory: Most of this chapter is the laboratory type of work.

XII. Non-Normal Distributions. Rectangular and skewed. The median as an estimate. Sample distributions of mean and median. Emphasise that no known actual distribution is normal. Use of median and related order statistics.

Laboratory: Draw complete set of samples from some small rectangular populations showing central tendency of means.
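One way to carry out this laboratory exhaustively, assuming samples are drawn with replacement, is to enumerate every ordered sample from a small rectangular population and tabulate the means. The population {1, 2, 3} and the sample sizes are illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def distribution_of_means(population, sample_size):
    """Frequency of each sample mean over all ordered samples drawn with replacement."""
    return Counter(Fraction(sum(s), sample_size)
                   for s in product(population, repeat=sample_size))

for size in (1, 2, 3):
    dist = distribution_of_means([1, 2, 3], size)
    print(size, {str(m): f for m, f in sorted(dist.items())})
```

The flat distribution for single draws becomes triangular for samples of 2 and more peaked for samples of 3, piling up around the population mean.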

XIII. Regression. Growth curves and economic trends. Calculation of linear regression on several assumptions. Deviations of individuals from trend; case studies. Estimates, confidence statements and tests of hypotheses.

Laboratory: Get results of aptitude tests and college grades. Estimate latter from former. Each student calculates his own deviation.
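Here is a sketch of the regression laboratory, using ordinary least squares for a straight line. The aptitude scores and grade points are invented for illustration. Each residual plays the role of a student's "own deviation" from the fitted line.

```python
import statistics

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical aptitude scores and first-year grade-point averages.
aptitude = [52, 61, 47, 70, 58, 66, 44, 75]
grades = [2.4, 2.9, 2.1, 3.4, 2.6, 3.1, 2.3, 3.5]

a, b = fit_line(aptitude, grades)
for x, y in zip(aptitude, grades):
    predicted = a + b * x
    print(f"aptitude {x}: predicted {predicted:.2f}, actual {y}, deviation {y - predicted:+.2f}")
```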

XIV. Index Numbers. Construct a simple cost-of-living index with direct economic meaning: the changing cost of a specific bill of goods. Emphasise necessity of examining computation before attaching meaning to an index.

Laboratory: Construct cost-of-student-living index in your college. Continue from year to year.
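Here is a minimal sketch of a fixed-basket (Laspeyres-type) cost-of-living index. It prices the same bill of goods in each year and expresses its cost relative to the base year. The items, quantities, and prices are hypothetical.

```python
def fixed_basket_index(quantities, base_prices, current_prices):
    """Cost of a fixed bill of goods at current prices, as a percentage of its base cost."""
    base_cost = sum(q * base_prices[item] for item, q in quantities.items())
    current_cost = sum(q * current_prices[item] for item, q in quantities.items())
    return 100 * current_cost / base_cost

# Hypothetical term's bill of goods for one student, and prices in two years.
basket = {"meals": 270, "textbooks": 6, "laundry": 40, "movie tickets": 12}
prices_year1 = {"meals": 0.45, "textbooks": 3.50, "laundry": 0.30, "movie tickets": 0.40}
prices_year2 = {"meals": 0.50, "textbooks": 3.75, "laundry": 0.35, "movie tickets": 0.40}

print(round(fixed_basket_index(basket, prices_year1, prices_year2), 1))
```

Examining the computation, as the syllabus urges, shows that meals make up about three quarters of this basket's base cost. The index therefore mostly tracks the price of meals.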

XV. Correlation in Normal Bivariate Population. Estimates. Confidence interval and test of hypothesis $\rho = 0$. Mental tests. Inherited characteristics. The variance of differences. Correlation and regression. Rank correlation in samples from non-normal populations.

Laboratory: Construct sampling distribution of r in small samples.
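Here is a sketch of the small-sample laboratory on r. It assumes two independent normal variables, so the population $\rho = 0$, and uses samples of 5 pairs. It also applies the usual test of $\rho = 0$, $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with $n-2$ degrees of freedom. The value 3.182 is the two-sided 5% point of t for 3 degrees of freedom.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_for_zero_rho(r, n):
    """t statistic for the hypothesis rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def r_distribution(num_samples=2000, n=5, seed=4):
    """Values of r from samples of n pairs of independent standard normal variables."""
    rng = random.Random(seed)
    rs = []
    for _ in range(num_samples):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        rs.append(pearson_r(xs, ys))
    return rs

rs = r_distribution()
# Tabulate r in class intervals of width 0.2 from -1 to 1.
bins = [0] * 10
for r in rs:
    bins[min(int((r + 1) / 0.2), 9)] += 1
print(bins)
print(sum(abs(t_for_zero_rho(r, 5)) > 3.182 for r in rs) / len(rs))
```

With only 5 pairs, r scatters over most of the range from -1 to 1 even though the variables are unrelated. The t test should reject $\rho = 0$ in about 5% of samples.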

XVI. Sampling from More than One Population; e.g., men's heights and women's. Combining estimates - under what circumstances is it appropriate? Stratified sampling. Estimates of mean and variance - weights. Analysis of variance in groups.

Laboratory: Draw samples and verify various estimates. Distribution of F.
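Here is a sketch of the F ratio from a one-way analysis of variance, followed by a small sampling experiment. Several groups are drawn from the same hypothetical normal population (mean 50, standard deviation 10), so the ratio should exceed the tabled 5% point only about 5% of the time. For three groups of 8, with 2 and 21 degrees of freedom, that point is 3.47.

```python
import random
import statistics

def f_ratio(groups):
    """One-way analysis of variance: mean square between groups / mean square within."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [statistics.mean(g) for g in groups]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    return (between / (k - 1)) / (within / (n - k))

def f_distribution(num_samples=2000, num_groups=3, group_size=8, seed=6):
    """F ratios for groups all drawn from one hypothetical normal population."""
    rng = random.Random(seed)
    return [f_ratio([[rng.gauss(50, 10) for _ in range(group_size)]
                     for _ in range(num_groups)])
            for _ in range(num_samples)]

fs = f_distribution()
print(sum(f > 3.47 for f in fs) / len(fs))
```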

The foregoing chapters constitute a course occupying two quarters. With omission of chapters V, XI, XIV, and the latter parts of some other chapters, a semester's course is available. For a third quarter, one or more of the remaining topics may be expanded to suit special groups: quality control for engineers and industrial economists, assays for entomologists, etc.

XVII. Statistical Instruments of the Federal and State Governments. The census with population studies. Public health and vital statistics. Marketing and other economic statistics. Crop estimates. Relief and security statistics.

XVIII. Financial and Business Statistics. Markets and market records. Trends and forecasts. Consumer acceptance and preferences.

XIX. Quality Control. Historical items. Control chart with statistical features. Contracts involving control contrasted with those providing inspection. Sequential test.

XX. Assays. Governmental regulations. Pure food and drugs. Insecticides. Vitamins. Purity of seed.

XXI. Experimentation. Science and experiments. Alternation of induction and experiment. Control of extraneous variation. Groups and "randomised blocks." Analysis of variance. Statistical control: covariance. Broadening the basis of inference: factorial design. Interaction.

XXII. The Statistical Attitude. Group vs. individual. The usual vs. the exceptional. News vs. everyday life. Detection of fallacies. Probability vs. certainty. Evidence vs. proof.

The mathematical accompaniment of such a course can be adapted to the training of the students. It may vary from simple algebraic derivations of formulas with practice in the notation to a full course in mathematical statistics. Since the logical concepts of statistical theory can be presented and verified experimentally, without the symbolism of mathematics, it is the author's opinion that the mathematical formulation should follow some more general presentation such as that outlined in this syllabus.

On writers of texts for this course will rest the heavy responsibility of making their appeal directly to the student. At present there are not enough teachers to go around. An instructor's manual containing background material (such, for example, as the data from a consumer's preference survey) would seem to be a necessity, since this kind of source material is not available to many. In fact, such a manual may well be expanded into a book on the teaching of elementary statistics.

Last Updated September 2020