Since much interest has been evinced in the historical origin of the statistical theory underlying the methods of this book, and as some misapprehensions have occasionally gained publicity, ascribing to the originality of the author methods well known to some previous writers, or ascribing to his predecessors modern developments of which they were quite unaware, it is hoped that the following notes on the principal contributors to statistical theory will be of value to students who wish to see the modern work in its historical setting.
Thomas Bayes' celebrated essay published in 1763 is well known as containing the first attempt to use the theory of probability as an instrument of inductive reasoning; that is, for arguing from the particular to the general, or from the sample to the population. It was published posthumously, and we do not know what views Bayes would have expressed had he lived to publish on the subject. We do know that the reason for his hesitation to publish was his dissatisfaction with the postulate required for the celebrated "Bayes' Theorem." While we must reject this postulate, we should also recognise Bayes' greatness in perceiving the problem to be solved, in making an ingenious attempt at its solution, and finally in realising more clearly than many subsequent writers the underlying weakness of his attempt.
Whereas Bayes excelled in logical penetration, Laplace (1820) was unrivalled for his mastery of analytic technique. He admitted the principle of inverse probability, quite uncritically, into the foundations of his exposition. On the other hand, it is to him we owe the principle that the distribution of a quantity compounded of independent parts shows a whole series of features - the mean, variance, and other cumulants - which are simply the sums of like features of the distributions of the parts. These seem to have been later discovered independently by Thiele (1889), but mathematically Laplace's methods were more powerful than Thiele's and far more influential on the development of the subject in France and England. A direct result of Laplace's study of the distribution of the resultant of numerous independent causes was the recognition of the normal law of error, a law more usually ascribed, with some reason, to his great contemporary, Gauss.
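The additive property of cumulants described above can be checked on a small exact example. The sketch below is an illustration only: the two discrete distributions (`die` and `skewed`) and the helper names are invented for the purpose, and only the first three cumulants (the mean, the variance and the third central moment) are examined.

```python
from fractions import Fraction
from itertools import product


def convolve(p, q):
    """Distribution of X + Y for independent X ~ p, Y ~ q (dicts: value -> probability)."""
    out = {}
    for (x, px), (y, qy) in product(p.items(), q.items()):
        out[x + y] = out.get(x + y, 0) + px * qy
    return out


def first_three_cumulants(p):
    """Mean, variance and third central moment, i.e. the first three cumulants."""
    mean = sum(x * w for x, w in p.items())
    variance = sum((x - mean) ** 2 * w for x, w in p.items())
    third = sum((x - mean) ** 3 * w for x, w in p.items())
    return mean, variance, third


# Two hypothetical independent parts, held as exact fractions.
die = {face: Fraction(1, 6) for face in range(1, 7)}
skewed = {0: Fraction(7, 10), 1: Fraction(2, 10), 5: Fraction(1, 10)}

total = convolve(die, skewed)

# Cumulants of the compound quantity versus the sums of the parts' cumulants.
of_total = first_three_cumulants(total)
summed = tuple(
    a + b for a, b in zip(first_three_cumulants(die), first_three_cumulants(skewed))
)
print(of_total == summed)
```

Exact fractions are used so that the comparison is free from rounding error; with floating-point probabilities the two sides would agree only approximately.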
Gauss, moreover, approached the problem of statistical estimation in an empirical spirit, raising the question of the estimation not only of probabilities but of other quantitative parameters. He perceived the aptness for this purpose of the Method of Maximum Likelihood, although he attempted to derive and justify this method from the principle of inverse probability. The method has been attacked on this ground, but it has no real connection with inverse probability. Gauss, further, perfected the systematic fitting of regression formulae, simple and multiple, by the method of least squares, which, in the cases to which it is appropriate, is a particular example of the method of maximum likelihood.
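The sense in which least squares is a particular case of maximum likelihood may be sketched for a simple regression line. The assumption, which is not spelled out in the paragraph above, is that the errors are independent and normally distributed with a common variance; the symbols $y_i$, $x_i$, $\alpha$, $\beta$ and $\sigma^2$ are introduced here for illustration only.

```latex
\begin{align*}
L(\alpha,\beta,\sigma^2)
  &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left(-\frac{(y_i-\alpha-\beta x_i)^2}{2\sigma^2}\right), \\
\log L
  &= -\frac{n}{2}\log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\alpha-\beta x_i)^2 .
\end{align*}
```

For any fixed $\sigma^2$ the first term does not involve $\alpha$ or $\beta$, so that maximising $\log L$ with respect to these two parameters is the same as minimising the sum of squares $\sum (y_i-\alpha-\beta x_i)^2$.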
The first of the distributions characteristic of modern tests of significance, though originating with Helmert, was rediscovered by K. Pearson in 1900, for the measure of discrepancy between observation and hypothesis, known as χ². This, I believe, is the great contribution to statistical methods by which the unsurpassed energy of Prof. Pearson's work will be remembered. It supplies an exact and objective measure of the joint discrepancy from their expectations of a number of normally distributed, and mutually correlated, variates. In its primary application to frequencies, which are discontinuous variates, the distribution is necessarily only an approximate one, but when small frequencies are excluded the approximation is satisfactory. The distribution is exact for other problems solved later. With respect to frequencies, the apparent goodness of fit is often exaggerated by the inclusion of vacant or nearly vacant classes which contribute little or nothing to the observed χ², but increase its expectation, and by the neglect of the effect on this expectation of adjusting the parameters of the population to fit those of the sample. The need for correction on this score was for long ignored, and later disputed, but is now, I believe, admitted. The chief cause of error tending to lower the apparent goodness of fit is the use of inefficient methods of fitting. This limitation could scarcely have been foreseen in 1900, when the very rudiments of the theory of estimation were unknown.
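The two points made above about the expectation of Pearson's measure of discrepancy can be put in arithmetical form. The sketch below assumes the usual modern statement of the correction, namely that the expectation equals the degrees of freedom, being the number of classes, less one for the fixed total, less one for each parameter fitted from the sample. The function name and the die-throwing frequencies are invented for illustration.

```python
def pearson_chi_square(observed, expected, fitted_parameters=0):
    """Return (chi-square, degrees of freedom) for observed and expected frequencies.

    Assumes the expected frequencies sum to the same total as the observed,
    which accounts for the single degree of freedom always deducted.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    degrees_of_freedom = len(observed) - 1 - fitted_parameters
    return statistic, degrees_of_freedom


# Hypothetical data: 120 throws of a die supposed unbiased, no parameters fitted.
observed = [18, 22, 20, 25, 15, 20]
expected = [20.0] * 6
chi2, df = pearson_chi_square(observed, expected)
print(chi2, df)
```

An additional class, however small its observed contribution, adds one to the degrees of freedom, while each fitted parameter deducts one; both adjustments alter the value with which the observed discrepancy is to be compared.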
The study of the exact sampling distributions of statistics commences in 1908 with "Student's" paper The Probable Error of a Mean. Once the true nature of the problem was indicated, a large number of sampling problems were within reach of mathematical solution. "Student" himself gave in this and a subsequent paper the correct solutions for three such problems - the distribution of the estimate of the variance, that of the mean divided by its estimated standard deviation, and that of the estimated correlation coefficient between independent variates. These sufficed to establish the position of the distributions of χ² and of t in the theory of samples, though further work was needed to show how many other problems of testing significance could be reduced to these same two forms, and to the more inclusive distribution of z. "Student's" work was not quickly appreciated, and from the first edition it has been one of the chief purposes of this book to make better known the effect of his researches, and of mathematical work consequent upon them, on the one hand, in refining the traditional doctrine of the theory of errors and mathematical statistics, and on the other, in simplifying the arithmetical processes required in the interpretation of data.
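The second of the three quantities named above, the mean divided by its estimated standard deviation, may be computed as in the following sketch. It takes the modern form of t, in which the variance is estimated with the divisor n - 1; this choice of form, the function name and the sample values are assumptions made for illustration and are not drawn from "Student's" paper.

```python
import math
from statistics import mean, stdev


def student_t(sample, hypothesised_mean=0.0):
    """Return (t, degrees of freedom) for a single sample.

    t is the deviation of the sample mean from the hypothesised mean,
    divided by the estimated standard deviation of that mean.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed to estimate the variance")
    # stdev uses the divisor n - 1; dividing by sqrt(n) gives the standard error of the mean.
    standard_error = stdev(sample) / math.sqrt(n)
    return (mean(sample) - hypothesised_mean) / standard_error, n - 1


# Hypothetical sample of five differences.
sample = [1.2, -0.4, 0.8, 2.0, 0.4]
t, df = student_t(sample)
print(t, df)
```

The value obtained is to be referred to the distribution of t for the stated degrees of freedom, and not to the normal distribution, which is the point of "Student's" contribution for small samples.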