New York. John Wiley & Sons, Inc.
Cochran opens the book with an Introduction to both the book and the methods of sampling theory. We present an extract from that Introduction below:
1. Advantages of the sampling method.
Our knowledge, our attitudes, and our actions are based to a very large extent upon samples. This is equally true in everyday life and in scientific research. A person's opinion of an institution that conducts thousands of transactions every day is often determined by the one or two encounters which he has had with the institution in the course of several years. The traveller who spends 10 days in a foreign country and then proceeds to write a book telling the inhabitants how to revive their industries, reform their political system, balance their budget, and improve the food in their hotels is a familiar figure of fun. But in a real sense he differs from the political scientist who devotes 20 years to living and studying in the country only in that he bases his conclusions on a much smaller sample of experience and is less likely to be aware of the extent of his ignorance. In every branch of science we lack the resources to study more than a fragment of the phenomena that might advance our knowledge.
Until recent years, relatively little attention was given to the problem of how to draw a good sample. This does not matter so long as the material from which we are sampling is uniform, so that any kind of sample gives almost the same results. Laboratory diagnoses about the state of our health are made from a few drops of blood. This procedure is based on the assumption that the circulating blood is always well mixed and that one drop tells the same story as another - an assumption which we as laymen fervently hope is correct. But when the material is far from uniform, as is often the case, the method by which the sample is obtained is critical, and the study of techniques that ensure a trustworthy sample becomes important.
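A small simulation can make this contrast concrete. The sketch below is not from Cochran's text, and its data are invented. It uses two hypothetical populations of 1000 units: one perfectly uniform, and one whose values rise steadily in the order the units are stored. Taking the first 20 units is a convenience rule assumed here for illustration. It works for the uniform material but not for the ordered material. A simple random sample, by contrast, is not tied to the order in which the units happen to be stored.

```python
import random
import statistics

def first_n_mean(population, n):
    """Mean of the first n units: a convenience rule that trusts the storage order."""
    return statistics.mean(population[:n])

def random_mean(population, n, rng):
    """Mean of a simple random sample of n units drawn without replacement."""
    return statistics.mean(rng.sample(population, n))

# Invented populations of 1000 units each.
uniform = [50.0] * 1000                          # well mixed: every unit the same
ordered = [float(v) for v in range(1, 1001)]     # far from uniform, stored in rising order

rng = random.Random(1)
for name, pop in [("uniform", uniform), ("ordered", ordered)]:
    # true mean, convenience-sample mean, random-sample mean
    print(name, statistics.mean(pop), first_n_mean(pop, 20), random_mean(pop, 20, rng))
```

For the uniform population, all three figures are 50.0: any sample tells the same story. For the ordered population, the true mean is 500.5, but the first 20 units average only 10.5.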
This book contains an account of the body of theory that has been built up to provide a background for good sampling methods. In most of the applications for which this theory was constructed, the aggregate about which information is desired is finite and delimited - the inhabitants of a town, the machines in a factory, the fish in a lake. In some cases it may seem feasible to obtain accurate information by taking a complete enumeration or census of the aggregate. Administrators who have been accustomed to dealing with censuses have sometimes been suspicious of samples and reluctant to use them in place of censuses. Although this attitude is losing ground, it may be well to list the principal advantages of sampling as compared with complete enumeration.
i. Reduced cost. If data are secured from only a small fraction of the aggregate, expenditures may be expected to be smaller than if a complete census is attempted.
ii. Greater speed. For the same reason, the data can be collected and summarized more quickly with a sample than with a complete count. This may be a vital consideration when the information is urgently needed.
iii. Greater scope. In certain types of inquiry, highly trained personnel or specialized equipment, limited in availability, must be used to obtain the data. A complete census may then be impracticable: the choice lies between obtaining the information by sampling or not at all. Thus surveys which rely on sampling have more scope and flexibility as to the types of information that can be obtained. On the other hand, if information is wanted for many subdivisions or segments of the population, it may be found that a complete enumeration offers the best solution.
iv. Greater accuracy. Because personnel of higher quality can be employed and can be given intensive training, a sample may actually produce more accurate results than the kind of complete enumeration that it is feasible to take.
2. The principal steps in a sample survey.
As a preliminary to a discussion of the role which theory plays in a sample survey, it is convenient to describe briefly the steps that are usually involved in the planning and execution of a survey. Surveys vary greatly in their complexity. To take a sample from 5000 cards, neatly arranged and numbered in a file, is an easy task. It is another matter to sample the inhabitants of a region where transport is by water through the forests, where there are no maps, where fifteen different dialects are spoken, and where the inhabitants are suspicious of a stranger, and very suspicious of an inquisitive stranger. Problems which are baffling in one survey may be trivial or non-existent in another.
The principal steps in a survey are grouped somewhat arbitrarily under nine headings.
i. Statement of the objectives of the survey. A lucid statement of the objectives is most helpful. Without this, it is easy in a complex survey to forget the objectives when engrossed in the details of planning, and to make decisions that are at variance with the objectives.
ii. Definition of the population to be sampled. The word population will be used to denote the aggregate from which the sample is chosen. The definition of the population may present no problem, as when sampling a batch of electric light bulbs in order to estimate the average length of life of a bulb. In sampling a population of farms, on the other hand, rules must be set up to define a farm, and borderline cases will arise. These rules must be usable in practice: the enumerator must be able to decide in the field, without much hesitation, whether a doubtful case belongs to the population or not.
Whenever possible, the population to be sampled should obviously coincide with the population about which information is wanted. Sometimes this requirement is judged, rightly or wrongly, to be too difficult. In a new area of research, where the collection of data presents perplexing problems of measurement, it may be decided to concentrate the resources on this aspect of the survey, choosing a population that is compact and easy to sample, although this is not the broader population about which information is really wanted. In this event one should also collect any comparative information about the two populations that helps to show whether inferences to the broader population can be attempted.
iii. Determination of the data to be collected. It is well to verify that all the data are relevant to the purpose of the survey, and that no essential data are omitted. There is frequently a tendency to collect too many data, some of which are never subsequently examined.
iv. Methods of measurement. When the kinds of data that are needed have been decided, there may be a choice as to the methods of measurement to be employed. For instance, data about a person's state of health may be obtained from statements which he makes or from a more or less thorough medical examination. With human populations, the manner and the order in which questions are asked may produce substantial differences in the results: see e.g. Payne (1951).
v. Choice of sampling unit. As a preliminary to the selection of a sample, the population must be subdivided in some way into parts which will be called sampling units, or units. The sampling units must together comprise the whole of the population, and they must be non-overlapping, in the sense that every element in the population belongs to one and only one unit. Sometimes the appropriate unit is obvious, as with a population of light bulbs, where the unit is the single bulb. Sometimes there is a considerable choice of unit. In sampling the people in a town, the unit might be an individual person, the members of a household, or all persons dwelling in the same city block. In sampling an agricultural crop, the unit is likely to be an area of land whose shape and dimensions are at our disposal.
The construction of a complete list of sampling units, sometimes called a frame, may be one of the major practical problems. Sometimes the frame is impossible to construct, as with the population of fish in a lake.
vi. Selection of the sample. There is now a variety of procedures by which the sample may be selected. The selection involves also a decision about the size of the sample, which in turn requires a provisional estimate of the cost of the survey, to ensure that the sample will fall within the allowable budget.
vii. Organization of the field work. In extensive surveys, many problems of business administration are involved. The personnel must receive training in the purpose of the survey and in the methods of measurement to be employed and must be adequately supervised in their work. A procedure for early checking of the quality of the returns may be invaluable. Plans must be made for handling non-response, that is, the failure of the enumerator to obtain information from certain of the units in the sample.
viii. Summary and analysis of the data. The first step is to edit the completed questionnaires, in the hope of amending recording errors, or at least of deleting data that are obviously erroneous. Decisions about tabulating procedure are needed in the case where answers to certain questions were omitted by some respondents or had to be deleted in the editing process. Thereafter, the tabulations which lead to the estimates are performed. Different methods of estimation may be available for the same data.
ix. Information gained for future surveys. The more information we have initially about a population, the easier it is to devise a sample which will give accurate estimates. Any completed sample is potentially a guide to improved future sampling, through the data which it supplies about the means, standard deviations, and nature of the variability of the principal measurements, and about the costs involved in getting the data. Sampling practice advances more rapidly when provisions are made to assemble and record information of this type.
There is another important respect in which any completed sample facilitates future samples. Things never go exactly as planned in a complex survey. The alert sampler learns to recognize mistakes in execution and to see that they do not occur in future surveys.
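Steps v and vi can be sketched together in Python. This is an illustration under stated assumptions, not a procedure from Cochran's text. The helper names (`is_valid_frame`, `affordable_sample_size`, `select_units`), the household data, and the linear cost model (a fixed overhead plus a constant cost per unit) are all hypothetical.

The first function checks the two conditions the text places on sampling units: together they cover the whole population, and every element belongs to one and only one unit. The other two functions fix a sample size from a provisional budget and then draw a simple random sample, which is just one of the many possible selection procedures.

```python
import random
from collections import Counter

def is_valid_frame(population, units):
    """True if every element of `population` lies in exactly one unit
    and no unit contains anything outside the population."""
    counts = Counter(element for unit in units for element in unit)
    each_once = all(counts[element] == 1 for element in population)
    nothing_extra = set(counts) <= set(population)
    return each_once and nothing_extra

def affordable_sample_size(budget, overhead, cost_per_unit, frame_size):
    """Largest n with overhead + cost_per_unit * n <= budget, capped at the frame size.
    The linear cost model is an illustrative assumption."""
    if budget < overhead:
        return 0
    return min(int((budget - overhead) // cost_per_unit), frame_size)

def select_units(units, n, seed=None):
    """Simple random sample of n units drawn without replacement from the frame."""
    return random.Random(seed).sample(units, n)

# Step v: people in a town grouped into households (invented names).
people = ["ann", "bob", "cy", "dee", "eve"]
households = [("ann", "bob"), ("cy",), ("dee", "eve")]
print(is_valid_frame(people, households))                       # True
print(is_valid_frame(people, [("ann", "bob"), ("bob", "cy")]))  # False: bob twice, dee and eve missing

# Step vi: how many households a made-up budget allows, then the draw itself.
n = affordable_sample_size(budget=100, overhead=40, cost_per_unit=25, frame_size=len(households))
print(n, select_units(households, n, seed=3))
```

The household frame passes the check. The overlapping list fails it, because one person falls in two units while two others fall in none.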
3. The role of sampling theory.
This list of the steps in a sample survey has been given in order to emphasize that sampling is a practical business, which calls for several different types of skill. In some of the steps - the definition of the population, the determination of the data to be collected and of the methods of measurement, and the organization of the field work - sampling theory plays at most a minor role. Although these topics will not be discussed further in this book, their importance should be realized. Sampling demands attention to all phases of the activity: poor work in one phase may ruin a survey in which everything else is done well.
The purpose of sampling theory is to make sampling more efficient. It attempts to develop methods of sample selection and of estimation that provide, at the lowest possible cost, estimates that are precise enough for our purpose. This principle of specified precision at minimum cost recurs repeatedly in the presentation of theory.
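The principle of specified precision at minimum cost can be made concrete for one simple case: estimating a mean by simple random sampling without replacement. The sketch relies on the standard formula for this case, which this extract does not state. With N units, sample size n, and population variance S² (divisor N - 1), the variance of the sample mean is (S²/n)(1 - n/N).

The function names and planning figures below are invented. The value of S² would in practice be a provisional guess, for example from an earlier survey. If cost rises with n (an assumption), the smallest n that meets the precision target is also the cheapest.

```python
import math

def srs_variance_of_mean(S2, n, N):
    """Variance of the mean of a simple random sample of n from N units: (S2 / n) * (1 - n / N)."""
    return S2 / n * (1 - n / N)

def min_sample_size(S2, target_variance, N):
    """Smallest n whose variance of the mean does not exceed target_variance."""
    n0 = S2 / target_variance            # the answer if the population were infinite
    n = math.ceil(n0 / (1 + n0 / N))     # adjust n0 for a finite population of N units
    return min(n, N)

# Invented planning figures: S2 = 100, N = 5000 units, and a required
# standard error of 1, i.e. a target variance of 1.
n = min_sample_size(S2=100, target_variance=1, N=5000)
print(n, srs_variance_of_mean(100, n, 5000), srs_variance_of_mean(100, n - 1, 5000))
```

With these figures, n comes out as 99. One unit fewer would leave the variance just above the target of 1.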
In order to apply this principle, we must be able to predict, for any sampling procedure that is under consideration, the precision and the cost to be expected. So far as precision is concerned, we cannot foretell exactly how large an error will be present in an estimate in any specific situation, for this would require a knowledge of the true value for the population. Instead, the precision of a sampling procedure is judged by examining the frequency distribution which is generated for the estimate if the procedure is applied again and again to the same population. This is, of course, the standard technique by which precision is judged in statistical theory.
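The idea of applying a procedure "again and again to the same population" can be imitated by simulation. The sketch below is illustrative: the population values are invented, and the procedure is assumed to be simple random sampling of 4 units, with the sample mean as the estimate. It repeats the procedure 10,000 times and prints a rough frequency distribution of the estimates, together with their average and standard deviation.

```python
import math
import random
import statistics
from collections import Counter

def repeated_sample_means(population, n, repetitions, seed=0):
    """Draw a simple random sample of size n from the same population `repetitions` times
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(repetitions)]

population = [3, 7, 1, 9, 4, 12, 6, 8, 2, 10]     # invented values; true mean 6.2
means = repeated_sample_means(population, n=4, repetitions=10_000)

# Frequency distribution of the estimate, grouped into bins of width 1.
counts = Counter(math.floor(m) for m in means)
for value in sorted(counts):
    print(f"{value:3d} {'#' * (counts[value] // 100)}")

print("average of the estimates:", round(statistics.mean(means), 2))
print("standard deviation of the estimates:", round(statistics.pstdev(means), 2))
```

The spread of this distribution, rather than the error of any single sample, is what measures the precision of the procedure.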
A further simplification is introduced. With samples of the sizes that are common in practice, there is often good reason to suppose that the sample estimates are approximately normally distributed. Consequently the sampling variance of the estimate is used to provide, in inverse terms, a measure of its precision. A considerable part of the theory deals with the calculation of formulas for the sampling variances of estimates obtained by various procedures.
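In symbols, write $\hat{\theta}$ for an estimate of a population value $\theta$, and $V(\hat{\theta})$ for its sampling variance. This notation is chosen here for illustration and is not taken from the extract. For an estimate that is unbiased, or nearly so, the two ideas in this paragraph can be written as follows. The multiplier 1.96 is the standard normal point for 95%, so the second statement holds only insofar as the normal approximation does.

```latex
\text{precision} \propto \frac{1}{V(\hat{\theta})},
\qquad
\Pr\!\left( \lvert \hat{\theta} - \theta \rvert \le 1.96 \sqrt{V(\hat{\theta})} \right) \approx 0.95 .
```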
The study of sampling from an infinite population is a relatively old and well-established discipline. The development of theory specifically for application to sample surveys is quite recent. Nearly all the references in this book are less than 20 years old and the majority are less than 10 years old. The primary stimulus to sample survey theory was the increasing use of sample surveys as a means of obtaining information. Most of the work in sample survey theory has been done by persons who are also actively engaged in the conduct of surveys. In their turn, the advances in theory increased the scope and utility of the sampling method and contributed to a further growth in the practical use of surveys.
One difference between sample survey theory and the older theory of sampling is that the populations with which we have to deal in survey work contain a finite number of units. The methods used to prove theorems are different, and the results are slightly more complicated, when sampling is from a finite instead of an infinite population. For practical purposes these differences in results for finite and infinite populations are seldom important. Whenever the size of the sample is small relative to the size of the population, as happens in the great majority of applications, results derived from an infinite population are fully adequate. In general, results for finite populations will be presented in this book. In some of the more difficult problems, the theory for infinite populations will be used to simplify the presentation.
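The size of the finite-population effect can be checked on a tiny example. The sketch below uses invented data and relies on the standard result for simple random sampling without replacement, which this extract does not state. Under that result, the variance of the sample mean is (S²/n)(1 - n/N), where S² has divisor N - 1, while the infinite-population result is S²/n.

The code lists every possible sample of 4 units from 10, treats them as equally likely, and compares the exact variance of the sample mean with both formulas. It then shows how quickly the factor 1 - n/N approaches 1 as the population grows.

```python
import math
from itertools import combinations
from statistics import mean, pvariance, variance

def exact_variance_of_mean(population, n):
    """Variance of the sample mean over every possible simple random sample of size n
    (all samples equally likely), found by listing them all."""
    sample_means = [mean(s) for s in combinations(population, n)]
    return pvariance(sample_means)

population = [3, 7, 1, 9, 4, 12, 6, 8, 2, 10]   # invented values
N, n = len(population), 4
S2 = variance(population)                       # divisor N - 1
finite = S2 / n * (1 - n / N)                   # finite-population formula
infinite = S2 / n                               # ignores the factor 1 - n/N

exact = exact_variance_of_mean(population, n)
print(exact, finite, infinite, math.isclose(exact, finite))

# When n is small relative to N, the factor 1 - n/N is close to 1 and the formulas agree.
for N_big in (100, 10_000, 1_000_000):
    print(N_big, 1 - n / N_big)
```

Here n/N = 0.4, so the two formulas differ noticeably. With the same sample size drawn from a population of a million units, the correction factor is essentially 1.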
WILLIAM G COCHRAN
The Johns Hopkins University