# Aitken: Statistical Mathematics

The Oliver and Boyd series of mathematical texts was widely used by students from the 1940s to the 1960s. The books were sold at a price that students could afford and tended to cover the right amount of material for a lecture course. One of the books in the series was Statistical Mathematics by A C Aitken. Below we give the title page and Aitken's Introduction to the little book, taken from the 1939 edition:

Statistical Mathematics
by
A C Aitken

Oliver and Boyd
Edinburgh and London
1939

STATISTICS AS A SCIENCE: AXIOMS OF PROBABILITY

1. Introductory.

The word "statistics" is defined in the Concise Oxford Dictionary as follows: in the plural, "numerical facts systematically collected, as statistics of population, crime"; in the singular, "science of collecting, classifying and using statistics." This definition adequately conveys the present meaning of the word; but the term was once restricted, as its derivation shows, to systematic collections of data descriptive of political communities, a domain partly taken over now by the more special word "demography."

The word statistics (in the plural) is used nowadays to characterize "numerical facts systematically collected" in any field whatever of observation or experiment. The technique of collecting data and the principles to be heeded in order to avoid bias in the interpretation are described at length and exemplified in chapters of more extensive treatises which the reader may consult. He may also form a general idea of practical details by studying the prefatory description of method in some actual published investigation, for example into housing and economic conditions in a particular town or area. In any case the principles to be observed in arranging a statistical investigation can be thoroughly grasped only when the analysis used to interpret the data is well understood; and this involves a knowledge of the Science of statistics (in the singular).

The intermediate stage of tabulation, by which collected data are set out in the most perspicuous form for analysis or inspection with a particular aim, is also usually the subject of a chapter, with illustrative examples and criticisms, in larger treatises than the present one. Here again the reader may learn much from the attentive perusal of statistical year-books and similar publications, and from the results tabulated in other published investigations. The principles are those of logical classification of different categories; and the art of tabulation rests in making the relation of the categories and the numbers in various categories as clear as possible to the eye yet compact on the printed page. Thus one may have statistics of employed persons according to age, sex, district, trade and wage; how can the respective numbers best be set out in one or more tables with rows and columns, row-totals, column-totals, sub-totals and grand totals? This is a typical problem of tabulation, and the chief aids towards resolving it rest on experience and common sense.

Statistics involves classification by number in categories. Let us note for further reference the possible relations of individuals in two categories $A$ and $B$. It may be that an individual of the collection cannot be both $A$ and $B$ at the same time; for example if a coin falls "heads," it certainly has not fallen "tails." The categories $A$ and $B$ are then mutually exclusive; their relation is that of "either ... or." On the other hand, the categories $A$ and $B$ may be of such a kind that an individual may belong to both at the same time; the relation of such categories is that of "both ... and."

2. Statistics as a Science.

The concern of the present book will for the most part be with statistics (in the singular) as a science. The typical order of development of the "exact" sciences (as they are somewhat loosely called) has been along the following lines. First of all, the examination of data collected in a particular field of inquiry is found to disclose elements of regularity, suggesting a law or laws. This is the stage of inductive synthesis. These laws are expressed, if possible, in the form of logical or numerical axioms, resembling those of Euclidean geometry. The methods of logic and mathematics are then brought into play to develop the consequences of the axioms, producing an assemblage of theorems or propositions. This department of the science, namely the posing of axioms and the deduction of theorems, is usually called the pure branch of the science. Even if future observations should invalidate the axioms extrinsically, the discrepancies between theory and fact being too great to be explained away, these axioms and the deductions based on them would still have an abstract validity, as a logical structure of propositions exempt from self-contradiction; but for the description and explanation of the phenomena a new set of axioms would have to be found. On the other side, the corroborative part of the science consists in interpreting the abstract functions, formulae, equations, constants, invariants and the like, which occur in the pure formulation, as measures and measurable relations of actual phenomena, or numbers constructed from those measures in a definite way. This interpretative discipline constitutes the applied branch of the science.

Such a division or dichotomy into pure and applied can be recognized in almost any science. A good example is Newtonian dynamics, according to which the motions of all bodies in the universe were presumed to obey certain axioms and postulates, namely Newton's laws of force and motion and the law of gravitation. Later experiments, more numerous, more delicate, more comprehensive, suggested that this formulation, though describing almost all observed dynamical phenomena with a precision unprecedented in history, did not sufficiently account for certain exceptional facts, such as the precession of the perihelion of Mercury. The discrepancies between prediction and actuality were extraordinarily small, but they were persistent. There thus arose a theory, or rather a succession of supplementary theories, of relativity, formulated on a new axiomatic basis by which the discrepancies of the earlier one might be reconciled, or removed. This reformulation of hypotheses still proceeds, is still incomplete, and undergoes modification from time to time.

What is the axiomatic basis of the science of statistics, and what are the facts upon which the inductive synthesis is based? The facts are certain regularities which have been observed in the proportionate frequency with which certain simple events happen or do not happen, when the circumstances under which they may occur are reconstructed again and again in repeated trials; and the axioms, and the structure of theorems founded upon them, constitute the subject called mathematical probability. As for the facts, anyone who is interested can collect a few for himself. Spin an ordinary coin a large number of times, and one can hardly fail to notice that the proportions of heads and of tails are very nearly equal; or shake a well-made die repeatedly from a dice-box and one will find that after many trials each face of the die has turned up in about one-sixth of the total number of trials.

Example.

The reader is recommended to experiment with simple repeated trials of this kind, and for future reference to record the results in sequence, in the order in which they occur. For example, the record of spins of a coin might be
00101 01110 01101 00001 10111 ...

or the like, where "1" denotes "heads," and "0" "tails."
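For readers who prefer to try the experiment on a computer first, the following is a minimal sketch in Python. It assumes that a pseudo-random generator is an acceptable stand-in for a physical coin, which is itself an idealization; the function names `spin_record`, `format_record` and `proportion_of_heads` are illustrative and do not come from the book.

```python
import random

def spin_record(n, seed=None):
    """Return a list of n simulated spins: 1 for heads, 0 for tails."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n)]

def format_record(spins, group=5):
    """Write the record in groups of digits, in the style '00101 01110 ...'."""
    digits = "".join(str(s) for s in spins)
    return " ".join(digits[i:i + group] for i in range(0, len(digits), group))

def proportion_of_heads(spins):
    """Relative frequency of heads in the record."""
    return sum(spins) / len(spins)

if __name__ == "__main__":
    # The seed is arbitrary; it only makes the run repeatable.
    print(format_record(spin_record(25, seed=1939)))
    for n in (10, 100, 1000, 10000):
        print(n, proportion_of_heads(spin_record(n, seed=1939)))
```

Keeping the record in sequence, rather than only the totals, matters later, when the order of the results is itself under discussion.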

It is instinctive to look for some cause for this approximate equality of frequency in heads and tails, and natural to locate this cause as somehow resident in the two-sided nature and appreciable symmetry of the coin; or to ascribe the approximate equality of frequency of the faces of the die to its six-sided and nearly uniform configuration. Simple ideas such as these suggest by generalization and abstraction the axioms of probability; but the choice of axioms may be made in various ways, which lead to different formulations of the theory of probability.

3. Survey of Various Definitions of Probability.

No single particular definition of probability has so far met with predominating acceptance. The requisites of a satisfactory basis would be these: breadth of application, sufficient closeness to the intuitions in which the concept originates, and freedom from excessive complexity or abstruseness. No theory as yet proposed has been able to make these requisites compatible. We may survey some contrasting standpoints.

Probability as the Logic of Uncertain Inference.
One view is that probability may be regarded as a kind of extension of classical logic, an extension conveniently described as the "logic of uncertain inference." This view has been expounded by J M Keynes in A Treatise on Probability (London, 1921), especially in Part II, Chapters X-XVII, where references to earlier expositions are given. Probability is here regarded as "the degree of our rational belief" in the truth of a given proposition, such belief being contingent on a body of relevant knowledge. A logical algebra is developed, but the theorems are stated in symbolic, not in numerical or metrical terms, and can be applied to the objective problems of statistics only by an abrupt and dubious transition from the symbolic to the metrical.

Probability à Priori, and Probability as Relative Frequency.
As our simple illustrations of the coin and the die have suggested, the crude intuition of probability rests on the observation that when a given set of circumstances $S$, such as a symmetrical coin spun rapidly, has been present on numerous occasions in the past, it has been associated in a nearly constant proportion of those occasions with some event $E$, such as the fall of "heads."

The apriorist theory directs attention to the set of circumstances $S$, or rather to the invariant part of $S$. In many spins of a coin or die something remains unchanged, namely those properties which describe the coin or die as a rigid constant configuration. The apriorist will regard the probabilities of falls 1, 2, 3, 4, 5, 6 of a die as some part of the description of the die, as measuring indeed some quality resident in the structure of the die, before any spinning is performed. Now the classical a priori definition took account only of a very limited class of "systems" $S$, namely those possessing symmetry, in the sense that the different aspects (such as faces 1, 2, 3, 4, 5, 6 of the die) were presumed physically indistinguishable. Such an assumption is an idealization of the facts, for we can never hope to test completely the symmetry of any actual coin or die; not only would the tests be infinitely many and impossibly delicate, but the concept of the rigidity and permanence in time of a material body is not sustained by modern physics. However, symmetry being presumed, the six faces 1, 2, 3, 4, 5, 6 were characterized as "equally likely" to be found uppermost after any throw, and the probability $\frac{1}{6}$ was attributed to each of these "events." More generally, if $n$ equally likely aspects of a proposed system $S$ were discriminated, $m$ of these being favourable to the event $E$, the probability of $E$ with respect to $S$ was defined as $p(E ; S) = \frac{m}{n}$.
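The classical ratio can be illustrated by a short sketch, assuming a finite list of aspects that are simply taken to be equally likely. The function name `classical_probability` and the two events chosen are illustrative only.

```python
from fractions import Fraction

def classical_probability(aspects, is_favourable):
    """p(E; S) = m/n for n equally likely aspects, m of them favourable to E."""
    aspects = list(aspects)
    n = len(aspects)
    if n == 0:
        raise ValueError("the system must have at least one aspect")
    m = sum(1 for a in aspects if is_favourable(a))
    return Fraction(m, n)

die = range(1, 7)  # the six faces, presumed physically indistinguishable
ace = classical_probability(die, lambda face: face == 1)
even = classical_probability(die, lambda face: face % 2 == 0)
print(ace, even)  # 1/6 1/2
```

The code counts aspects and nothing more; whether the aspects really are equally likely is a question it cannot answer.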

Criticism is easy. The logician will not fail to pounce upon the words "equally likely," pointing out that they are synonymous with "equally probable", and that therefore probability is being defined by what is probable, a circulus in definiendo being thus committed. Postponing the defence, we may pass on to inquire what could be the definition of probability, should the tests have disclosed asymmetry in $S$. The inquiry is most pertinent, for the heterogeneous and the asymmetrical are the prevalent order of nature, the homogeneous and the symmetrical being the exception. One has no difficulty for example in conceiving a die which might be an irregular hexahedron, heterogeneous in density and with non-parallel and unequal opposite edges and faces. Such dice, and more complicated asymmetrical systems, have been subjected to repeated trials, which have shown a tendency of relative frequency of falls towards a constancy resembling that observed in symmetrical systems.

Stability of Relative Frequency.
Another view from the angle of "common sense," in some respects antithetical to the view just mentioned, is the frequency view. Here the invariability of the configurative part of $S$, whether symmetrical or unsymmetrical, is tacitly assumed, and attention is concentrated upon the sequence of trials, and the incidence of $E$ in these. For example, the die is thrown again and again. When $E$ occurs, let us write 1; when $E$ does not occur, let us write 0. A succession of $n$ trials then gives a sequence
$A = a_{1} a_{2} a_{3} a_{4} \ldots a_{n}$,      (1)

each $a_{j}$ being 1 or 0.

Let $m$ be the number of 1's in this sequence. A very limited experience, such as spinning a coin or die 10 times on several occasions, will show that in a finite number $n$ of trials made upon the same system $S$ on two or more occasions, different values of $m$ are not only possible but usual. Thus, if $E$ is the throw of an ace with a single die, 100 throws may on one occasion give $m = 15$ and on another occasion give $m = 20$. It follows that in order to define a probability $p(E ; S)$ which shall be unique and not discordant with experience, we must idealize once again, postulating a limiting process as $n$ tends to infinity and writing
$\lim \frac{m}{n} = p(E ; S)$,     (2)

where the limit is taken as $n \to \infty$.
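Both observations, the variation of $m$ between occasions and the settling of $\frac{m}{n}$ as $n$ grows, can be reproduced with a simulated die. The sketch below assumes a pseudo-random generator in place of real throws, and its function names are illustrative. A finite run can suggest the limit in (2) but cannot establish it.

```python
import random

def count_aces(n, rng):
    """Number m of aces in n throws of a simulated fair die."""
    return sum(1 for _ in range(n) if rng.randint(1, 6) == 1)

def occasions(n, k, seed=None):
    """Values of m obtained on k separate occasions of n throws each."""
    rng = random.Random(seed)
    return [count_aces(n, rng) for _ in range(k)]

def frequency_ratios(checkpoints, seed=None):
    """Ratio m/n recorded at increasing n along one sequence of throws."""
    rng = random.Random(seed)
    m, n, ratios = 0, 0, {}
    for target in sorted(checkpoints):
        m += count_aces(target - n, rng)  # extend the same sequence
        n = target
        ratios[n] = m / n
    return ratios

if __name__ == "__main__":
    print(occasions(100, 5, seed=0))  # m usually differs between occasions
    print(frequency_ratios([100, 1000, 10000, 100000], seed=0))
```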

This is in fact a definition, supported by a certain school of statisticians, based upon the limit of frequency ratio or relative frequency $\frac{m}{n}$. Though at first sight attractive, it fades a little on scrutiny. Granted the postulate of this limit $p$ for one sequence of trials upon $S$, can we accept the more stringent postulate that the same limiting value $p$ is obtained for any other infinite sequence of trials on $S$? Not without further assumptions, for one might imagine a mechanism sufficiently delicate to throw heads with a coin, or an ace with a die, on almost all occasions. There is therefore some restriction on the manner of throwing, or on the initial state of $S$. This restriction is usually stated in the form of a condition that successive throws must be "random," but this merely transfers the burden of explanation to a new and undefined concept, "randomness." To discuss various attempts to define randomness would take us too far afield. It is easy to say that randomness is absence of any law; but what is "law" in this connexion?

Another difficulty is that the tendency of relative frequency $\frac{m}{n}$ towards a limit $p$ is different in nature from the corresponding tendency to a limit which mathematicians have discerned and used in the infinite sequences of mathematical analysis. To take a classical example, in the sequence defining a certain simple geometric series,
$1, \quad 1 - \frac{1}{2}, \quad 1 - \frac{1}{2} + \frac{1}{4}, \quad 1 - \frac{1}{2} + \frac{1}{4} - \frac{1}{8}, \quad \ldots$     (3)

the deviations of the successive terms from $\frac{2}{3}$ are respectively $\frac{1}{3}, -\frac{1}{6}, \frac{1}{12}, -\frac{1}{24}, \ldots$, each being numerically half its predecessor, so that, given a small number $\epsilon$, such as $\frac{1}{1000000}$, we can always find some term sufficiently far along the sequence, after and including which all terms deviate from $\frac{2}{3}$ by less than $\epsilon$. Thus $\frac{2}{3}$ is the limit of this sequence. But what can be asserted concerning the sign and magnitude of the deviation $\alpha_{n}$ considered as a function of $n$, in
$\alpha_{n} = \frac{m}{n} - p(E ; S)$?

It would seem that the only kind of assertion about $\alpha_{n}$ which would carry conviction would itself involve somewhere the notion of probability; and here the risk of committing a circle in definition again raises its head.
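The contrast between the two kinds of convergence can be made concrete with a small sketch. It computes the deviations of sequence (3) exactly and, for comparison, a simulated $\alpha_{n}$; the simulation assumes a pseudo-random generator with a fixed chance $p$ of success, and all function names are illustrative.

```python
from fractions import Fraction
import random

def partial_sums(count):
    """First `count` terms of sequence (3): 1, 1 - 1/2, 1 - 1/2 + 1/4, ..."""
    total, term, sums = Fraction(0), Fraction(1), []
    for _ in range(count):
        total += term
        sums.append(total)
        term *= Fraction(-1, 2)
    return sums

def first_index_within(epsilon):
    """Smallest k (counting from 1) with |s_k - 2/3| < epsilon.

    Every later term also qualifies, because each deviation is
    numerically half its predecessor.
    """
    limit, total, term, k = Fraction(2, 3), Fraction(0), Fraction(1), 0
    while True:
        total += term
        k += 1
        if abs(total - limit) < epsilon:
            return k
        term *= Fraction(-1, 2)

def alpha(n, p, rng):
    """Deviation m/n - p after n simulated trials with success chance p."""
    m = sum(1 for _ in range(n) if rng.random() < p)
    return m / n - p

if __name__ == "__main__":
    print([s - Fraction(2, 3) for s in partial_sums(4)])
    print(first_index_within(Fraction(1, 1000000)))  # 20
    rng = random.Random(0)
    print([alpha(n, 0.5, rng) for n in (10, 100, 1000, 10000)])
```

For sequence (3) the index returned is certain, and the bound holds for every later term. For $\alpha_{n}$ no such index can be named in advance: the simulated deviations tend to shrink, but neither their sign nor their size at a given $n$ is determined.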

It should be added that the chief defects of the approach to probability by limit of frequency ratio have lately been removed by the work of von Mises, Copeland, Dörge, Wald and others. These writers admit only certain sequences $A$ of suitable postulated properties, including that of limiting ratio; but some logical difficulties remain, and the modified formulations lose the primitive simplicity in which they originated.

It would seem, however, that a more natural course, and one more in line with the general method of science, would be to try to explain the effect, namely the relative frequency of $E$, by an analysis of the cause, namely the system $S$. This suggests a return to the a priori standpoint; and it may be noted that several authors at the present time, Fréchet, Kolmogorov, Cramér and others, have been independently engaged in rehabilitating the a priori definition by furnishing it with a better axiomatic basis.

4. Probability as Measure of a Sub-Aggregate.
Let us examine more closely the system $S$, keeping some simple system such as a coin or die in mind. The approximately constant element in our sequences $A$, namely the almost stable frequency ratio of $E$, must reflect - at least so our intuition suggests - the constant element of $S$, such as the rigid configuration of a coin or die; the irregularity which we name randomness doubtless reflects the variable part of $S$, such as the initial position, velocity and angular velocity of projection. What is $S$ when an unsymmetrical and heterogeneous die is spun and falls? It consists of
(i) the die, specified as a particular constant rigid body,
(ii) the floor or table on which it may impinge or finally rest,
(iii) the surrounding air, and so on; together with
(iv) the circumstances of projection, described by coordinates of initial position, momentum and angular momentum.

The coordinates specifying the rigidity of the die and the configuration of the table or floor are constant components of $S$; the other initial coordinates of $S$ are variable. The set of coordinates of $S$ at the instant of projection may be called the initial phase. Each variable coordinate, such as the initial position, or the initial momentum, has a certain field of variation. Hence we must assume a set of possible phases which, if they can be enumerated in some order, may be designated by $S_{1} , S_{2} , ..., S_{j} , ...$; and this ensemble of possible initial phases $S_{j}$ constitutes an aggregate $S$ of the kind specially studied in pure mathematics. [We use the same letter $S$ as before, regarding the system now as the totality of its possible phases.] If dynamical determinism be assumed, but not otherwise, the initial phase will decide whether or not the event $E$ will occur. Consequently the possible initial phases may be classified as $E$-phases or not-$E$-phases (let us say $Ê$-phases), so that the whole phase aggregate is divided into two sub-aggregates. Now the question of assigning a measure to such aggregates has been deeply studied in modern pure mathematics, the guiding idea being that of extending as widely as possible the scope of a concept familiar in simple cases, namely the cardinal number of a finite set of objects, the length of a line, the area of a surface, the volume of a solid. If $M$ is the measure of the whole aggregate $S$ of possible phases, and $pM$ the measure of the aggregate of $E$-phases contained in it, then $p$ is the probability $p(E ; S)$.

Something has been glossed over here; there is the tacit assumption that the initial phases are "equally likely." But let us insist that the question of equal likeliness is not one for the abstract formulation at all; for to specify the aggregate is in effect to say that its elements, the initial phases, are equally likely. For example, if the aggregate were of points on a continuous line segment, and the measure were ordinary length, then we have implied in this description that all points in the segment are equally likely. On the other hand, the question of equal likeliness is crucial in the application to experiment or observation, that is, in applied statistics, where a wrong choice of the aggregate may alter all the probabilities. This has long been known in problems of so-called geometrical probability. For example, given a circle, let a chord be drawn across it at random: what is the probability that the length of the chord exceeds half the diameter? It depends entirely on the manner in which the chord is drawn. If it is done by taking a point on the circumference and then drawing the chord at any angle, all angles being thus supposed equally likely, then the probability is $\frac{2}{3}$; but if it is done by taking any diameter and drawing the chord at right angles to any point taken in the diameter, the diameters and points being equally likely, then the probability is $\frac{\sqrt{3}}{2}$.
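The dependence on the manner of drawing can be checked by simulation. The sketch below draws chords by each of the two constructions and estimates the proportion that are longer than the radius. The estimates can be compared with the exact values $\frac{2}{3}$ and $\frac{\sqrt{3}}{2} \approx 0.866$, which follow from elementary geometry. The function names, the number of trials and the use of a pseudo-random generator are assumptions made for illustration.

```python
import math
import random

def chord_by_angle(rng, radius=1.0):
    """Fix a point on the circumference and draw the chord at an angle theta
    to the diameter through that point, theta uniform in (-pi/2, pi/2)."""
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    return 2 * radius * math.cos(theta)

def chord_by_diameter_point(rng, radius=1.0):
    """Take a point uniformly on a diameter and draw the chord at right
    angles to the diameter through that point."""
    d = rng.uniform(-radius, radius)  # signed distance from the centre
    return 2 * math.sqrt(radius * radius - d * d)

def estimate(draw, trials=200_000, seed=0, radius=1.0):
    """Proportion of chords longer than half the diameter, i.e. the radius."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if draw(rng, radius) > radius)
    return hits / trials

if __name__ == "__main__":
    print(estimate(chord_by_angle), 2 / 3)
    print(estimate(chord_by_diameter_point), math.sqrt(3) / 2)
```

By the symmetry of the circle, the choice of the point on the circumference in the first construction, and of the diameter in the second, does not affect the length of the chord, so neither is simulated.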

The inclusion of the words "equally likely" in a definition is in fact a concession; it puts the reader more gently at terms with the abstract formulation by anticipating its chief future application. The usage is not uncommon. When a point is defined as "that which has position but no magnitude" the same appeal is made to an application, but the same suspicion of a circle in definition is incurred, for how can position be defined without the notion of a point? And if a straight line is defined as "lying evenly" between its extreme points, what else does "evenly" mean but "in a straight line"? Every definition which is not pure abstraction must appeal somewhere to intuition or experience by using some such verbal counter as "point," "straight line" or "equally likely," under the stigma of seeming to commit a circle in definition.

This prologue, though it has omitted many subtler points which could be amplified at very great length, must now be cut short.
To summarize:
(i) events $E$ are conceived as associated with, or caused by, phases $S_{j}$ of circumstances;
(ii) each $S_{j}$ gives rise unambiguously either to $E$ or to $Ê$;
(iii) the phases $S_{j}$ form in their totality a set or aggregate $S$, of which the phases favourable to $E$, and those favourable to $Ê$, form complementary subsets;
(iv) a measure $M$ can be given to the whole set $S$, and if $pM$ is the measure of the subset favourable to $E$, then $p$ is the probability $p(E ; S)$ of $E$ with respect to $S$;
(v) the question of equal likeliness of phases is the same as the question of specifying the aggregate and its measure, and in practical applications this must be determined by the circumstances of the particular problem.

Let us finally add that the word phase can be extended to include coordinates other than dynamical ones; also that the name "fundamental probability set" is used by some writers for the set $S$ of phases $S_{j}$.
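Points (ii) to (v) of the summary can be sketched for the simplest case, a finite aggregate of phases with a weight attached to each. This is a much coarser model than the continuous phase aggregates the text has in mind. The function `probability` and the weights given to the asymmetrical die are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def probability(phases, leads_to_E, measure):
    """p(E; S) = (measure pM of the E-phases) / (measure M of all phases).

    phases      -- finite iterable of phases S_j
    leads_to_E  -- function: phase -> True if the phase gives rise to E
    measure     -- function: phase -> non-negative weight of that phase
    """
    phases = list(phases)
    M = sum(Fraction(measure(s)) for s in phases)
    if M == 0:
        raise ValueError("the whole aggregate must have positive measure")
    pM = sum(Fraction(measure(s)) for s in phases if leads_to_E(s))
    return pM / M

faces = [1, 2, 3, 4, 5, 6]

# Counting measure: every phase has weight 1, i.e. "equally likely".
uniform = probability(faces, lambda f: f == 6, lambda f: 1)

# Hypothetical weights for an asymmetrical die, for illustration only.
weights = {1: 1, 2: 2, 3: 2, 4: 2, 5: 2, 6: 3}
loaded = probability(faces, lambda f: f == 6, lambda f: weights[f])

print(uniform, loaded)  # 1/6 1/4
```

The same event receives a different probability under each measure, which is point (v) in miniature: specifying the aggregate and its measure is what settles the question of equal likeliness.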

Last Updated July 2008