Harold Jeffreys on Probability

Between 1919 and 1923 Harold Jeffreys and Dorothy Wrinch wrote three papers on probability and scientific inference. These are:

D Wrinch and H Jeffreys, On Some Aspects of the Theory of Probability, Philosophical Magazine 38 (1919), 715-731.

D Wrinch and H Jeffreys, On Certain Fundamental Principles of Scientific Inquiry, Philosophical Magazine 42 (1921), 369-390.

D Wrinch and H Jeffreys, On Certain Fundamental Principles of Scientific Inquiry, Philosophical Magazine 45 (1923), 368-374.

In their views of probability they were influenced by William Ernest Johnson and John Maynard Keynes. Dorothy Wrinch had, in fact, attended lectures by Johnson. Jeffreys used the ideas from these papers in his book Scientific Inference published by Cambridge University Press in 1931. We give below an extract from Jeffreys' book on Probability.


Harold Jeffreys

1. What is probability?

Suppose that a man wishes to catch a train announced to start at 1.00 p.m. When he is a quarter of a mile from the station he looks back and sees that a church clock some distance away indicates 12.55. Will he catch the train?

From previous experience he knows that a quarter of a mile in five minutes means comfortable walking without wasting time. The distance, with slight exertion, can be done in four minutes. Hence he may reasonably expect to catch the train, especially if he hurries slightly. But he has to get a ticket before he will be admitted to the platform. If he finds nobody waiting at the booking office this is a matter of ten seconds; but if there is a queue of ten people it will take two minutes, and he has no means of knowing which will occur in this case. Again, though the church clock is usually reliable, it has been known on a few occasions to be as much as three minutes slow. If that is so on this occasion, and the train is punctual, his chance of catching the train disappears. On the other hand, if the train is a few minutes late, as sometimes happens, he will catch it even if there is a queue and the clock is slow. Further, there is always the possibility of something quite unforeseen, such as an accident on the line. In that event the 11.14 train may arrive at 1.30 and his problem will be solved.
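The arithmetic behind the man's uncertainty can be tabulated. The following toy enumeration uses only the timings quoted in the passage (a four-minute walk, a possible two-minute queue, a clock possibly three minutes slow, five minutes showing on the clock); which combination actually obtains is, as Jeffreys stresses, unknown to him:

```python
from itertools import product

# Illustrative enumeration of the train scenarios from the passage.
# All timings are the ones Jeffreys quotes; a punctual train is assumed.
WALK = 4        # minutes, walking with slight exertion
TIME_LEFT = 5   # minutes until 1.00 p.m. according to the church clock

catches = {}
for clock_slow, queue in product([0, 3], [0, 2]):
    # clock_slow: minutes the church clock may be behind true time
    # queue: minutes spent at the booking office (ten seconds ~ 0)
    total = WALK + queue + clock_slow
    catches[(clock_slow, queue)] = total <= TIME_LEFT

for (slow, q), ok in catches.items():
    print(f"clock {slow} min slow, queue {q} min: "
          f"{'catches' if ok else 'misses'} the train")
```

Only the most favourable combination succeeds, which is exactly why the man's knowledge supports neither a definite assertion of the proposition nor of its denial.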

Now we notice that in this situation the man has some definite information, which is relevant to the proposition "he will catch the train". But numerous other possibilities, none of which he can foresee, are also intensely relevant. Therefore his available knowledge, though relevant to the proposition at issue, is not such as to make it possible to assert definitely that this proposition is true or false. Further, extra data will have a definite effect on his attitude to the proposition. If he meets an astronomer whose watch has just been compared with a wireless time signal, and who assures him that the church clock is accurate, he feels more confident. On the other hand, if a crowded omnibus passes him he expects his worst fears about the queue to be verified. Thus the attitude to the proposition under discussion does not amount to a definite assertion of its truth or falsehood; it is an impression capable of being modified at any time by the acquisition of new knowledge.

Probability expresses a relation between a proposition and a set of data. When the data imply that the proposition is true, the probability is said to amount to certainty; when they imply that it is false, the probability becomes impossibility. All intermediate degrees of probability can arise.

The relation of the laws of science to the data of observation is one of probability. The more facts are in agreement with the inferences from a law, the higher the probability of the law becomes; but a single fact not in agreement may reduce a law, previously practically certain, to the status of an impossible one. A specimen of a practically certain law is Ohm's law for solid conductors. Newton's inverse square law of gravitation first became probable when it was shown to give the correct ratio of gravity at the earth's surface to the acceleration of the moon in its orbit. Its probability increased as it was shown to fit the motions of the planets, satellites, and comets, and those of double stars, with an astonishing degree of accuracy. Leverrier's discovery of the excess motion of the perihelion of Mercury scarcely changed this situation, for the phenomenon was qualitatively explicable by the attraction of the visible matter within Mercury's orbit. Newton's law was first shown to be wrong, as a universal proposition, when it was found that such matter could not actually be present in sufficient quantity to account for the anomalous motion of Mercury.

The fundamental notion of probability is intelligible a priori to everybody, and is regularly used in everyday life. Whenever a man says "I think so" or "I think not" or "I am nearly sure of that" he is speaking in terms of this concept; but an addition has crept in. If three persons are presented with the same set of facts, one may assert that he is nearly certain of a result, another that he believes it probable, while the third will express no opinion at all. This might suggest that probability is a matter of differences between individuals. But an analogous situation arises with regard to purely logical inference. One person, reading the proof of Euclid's fifth proposition, is completely convinced; another is entirely unable to grasp it; while there is, at any rate, one case on record when a student said that the author had rendered the result highly probable. Nobody says on this account that logical demonstration is a matter for personal opinion. We say that the proposition is either proved or not proved, and that such differences of opinion are the result of not understanding the proof, either through inherent incapacity or through not having taken the necessary trouble. The logical demonstration is right or wrong as a matter of the logic itself, and is not a matter for personal judgment. We say the same about probability. On a given set of data p we say that a proposition q has in relation to these data one and only one probability. If any person assigns a different probability, he is simply wrong, and for the same reasons as we assign in the case of logical judgments. Personal differences in assigning probabilities in everyday life are not due to any ambiguity in the notion of probability itself, but to mental differences between individuals, to differences in the data available to them, and to differences in the amount of care taken to evaluate the probability.

2. Principles of probability

The mathematical discussion of probability depends on the principle that probabilities can be expressed by means of numbers. This depends in turn on two deeper postulates:
1. If we have two sets of data p and p', and two propositions q and q', and we consider the probabilities of q given p, and of q' given p', then whatever p, p', q, q' may be, the probability of q given p is either greater than, equal to, or less than that of q' given p'.

2. All propositions impossible on the data have the same probability, which is not greater than any other probability; and all propositions certain on the data have the same probability, which is not less than any other probability.
The relations greater than and less than are transitive; that is, if one probability is greater than a second, and the second greater than a third, then the first probability is greater than the third. If one probability is greater than a second, the second is said to be less than the first; and if neither of two probabilities is greater than the other we say that they are equal. This postulate ensures the existence of a definite order among probabilities, such that each probability follows all smaller ones and precedes all greater ones.

Such an order once established, we can construct a correspondence between probabilities and real numbers, so that to every probability corresponds one and only one number, and so that of every pair of probabilities the less corresponds to the smaller number. When this is done the system of numbers can be used as a scale of reference for probabilities. But the choice is not yet unique. Obviously if x_1, x_2, ..., x_n are a set of positive numbers in increasing order of magnitude, x_1^2, x_2^2, ..., x_n^2 are another set, exp(x_1), exp(x_2), ..., exp(x_n) a third, x_1/(1 + x_1), x_2/(1 + x_2), ..., x_n/(1 + x_n) a fourth, and any number of such sets can be found, such that if probabilities correspond term by term with the numbers of one set in order of magnitude they will correspond equally well with those of any other set. We need a further rule before we can decide what number to attach to any given probability. Such a rule is a mere method of working, or convention; it expresses no new assumption. We decide that
3. If several propositions are mutually contradictory on the data, the number attached to the probability that some one of them is true shall be the sum of those attached to the probabilities that each separately is true.
If we do this it follows at once that 0 is the number to be attached to a proposition impossible on the data. For consider any three mutually exclusive propositions p, q, r, and suppose we have the further datum that p is true. The number attached to a proposition impossible on the data being a, it follows that the numbers attached to q and r separately on the data are both a. Hence, by our rule, since q and r are mutually exclusive, the number attached to the proposition that one of them is true is 2a. But the proposition "q or r is true" is itself impossible on the data and therefore has the number a attached to it. Hence 2a = a, and therefore a = 0.

Again, let us consider any set of m equally probable and mutually contradictory propositions, and call the number attached to any one of them, on the same data, x. If we select any t of them, the number attached to the proposition that one of these t is true is tx, by our rule.

Now take t = m, and suppose that on our data there is just one true proposition among the m, but that we have no means of knowing which it is. The number attached to the proposition that one of the m propositions is true is mx. But on our data this proposition is certain, and therefore mx is the number corresponding to certainty, which is a definite constant by Prop. 2. We therefore choose 1 as the constant to be attached to certainty. This is another convention. Thus mx = 1, and we derive the rule:
4. If m propositions are equally probable on the data and mutually contradictory, and one of them is known to be true, each has the number 1/m associated with it. Further, the proposition that one out of any t of them is true has the number t/m associated with it.
The conditions for the application of this method are practically realizable. Suppose that m balls, one of them with a characteristic mark on it, but indistinguishable by touch, were placed in a bag and shaken. t balls are then withdrawn. Then the proposition that any particular ball is the marked one is inconsistent with the proposition that any other is marked, and all such propositions are equally probable. We have therefore a set of equally probable and mutually exclusive propositions, m in number. Our rule therefore has a practical application. Also m may be any integer, and t may be any integer less than m or equal to it. Hence
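The bag-of-balls model lends itself to an empirical check of rule 4. A small Monte Carlo sketch (the particular numbers m = 10, t = 3 are illustrative, not from the text):

```python
import random

def marked_in_draw(m, t, trials=100_000):
    """Estimate the probability that the marked ball is among t balls
    drawn from a bag of m balls, one of which is marked."""
    hits = 0
    for _ in range(trials):
        draw = random.sample(range(m), t)  # withdraw t distinct balls
        if 0 in draw:                      # ball 0 plays the marked ball
            hits += 1
    return hits / trials

m, t = 10, 3
estimate = marked_in_draw(m, t)
print(f"estimated {estimate:.3f}; rule 4 gives t/m = {t/m}")
```

The estimated frequency settles near t/m = 0.3, as rule 4 requires for t equally probable, mutually exclusive alternatives out of m.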
5. Any rational proper fraction, including 0 and 1, can be a probability number.
We shall call the class of probabilities expressible by rational fractions R-probabilities.

It follows from this that any probability can be made to correspond to a real number, rational or irrational. For any given probability P either corresponds to a rational fraction or does not. In the former case the proposition is granted. In the latter case every R-probability is either greater or less than P. Hence P divides the R-probabilities into two classes R_1 and R_2, such that the probabilities in R_1 are all less than P and those in R_2 are all greater than P. Also, since the relation "greater than" among probabilities is transitive, every fraction corresponding to an R_2-probability is greater than every fraction corresponding to an R_1-probability. Hence P determines a cut in the series of rational fractions. But this is precisely the method of defining a real irrational number. When it is specified which rational fractions are on one side of the cut and which on the other side, there is one and only one real number that can occupy the cut. We then associate the probability P with this number. In this way we arrive at the result:
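The cut construction can be mimicked numerically: supposing only that a probability can be compared with rational fractions, repeated bisection over rationals pins down the real number occupying the cut. A sketch, with the (hypothetical) value 1/√2 standing in for a probability that matches no rational fraction:

```python
from fractions import Fraction
import math

P = 1 / math.sqrt(2)   # stand-in for a non-rational probability

lo, hi = Fraction(0), Fraction(1)   # rational fractions bracketing P
for _ in range(40):
    mid = (lo + hi) / 2             # a rational fraction t/m
    if float(mid) < P:              # mid falls in the lower class
        lo = mid
    else:                           # mid falls in the upper class
        hi = mid

# The two classes of rationals squeeze onto a single real number: the cut.
print(float(lo), float(hi))
```

After forty bisections the two classes differ by 2^-40, and both endpoints agree with 1/√2 to about eleven decimal places, illustrating that one and only one real number can occupy the cut.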
6. Every probability can be associated with a real number, rational or irrational.
We still have to prove that the results given by our rules are consistent; that is, if a probability P is greater than another probability Q, that the number associated with P by our rules is greater than that associated with Q. Suppose first that P and Q are both R-probabilities. Then we can find four integers t, m, r, s so that the number associated with P is t/m and that associated with Q is r/s. Now consider a class of ms mutually exclusive propositions containing one true one. We may divide them up into m sets of s each; one and only one of these sets contains the true proposition. The probability-number that one of t of these sets contains the true proposition is t/m. But this is also the probability-number that one of ts propositions selected from the original ms propositions shall be the true one, which by our rule is ts/ms and equal to t/m, as it should be. Thus t/m is the number associated with the proposition that one out of ts alternatives is true; similarly r/s is associated with the proposition that one out of rm alternatives is true. If then P is greater than Q, the number of alternatives needed to give probability P must exceed that needed to give probability Q; therefore ts is greater than rm. But this is equivalent to saying that t/m is greater than r/s; and therefore the greater probability is associated with the greater number.
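The cross-multiplication step at the heart of this proof, that ts > rm exactly when t/m > r/s, can be checked mechanically. An exhaustive check over small numerators and denominators (the bound 12 is arbitrary):

```python
from fractions import Fraction

# Verify: for positive denominators m, s, comparing the counts of
# alternatives out of ms (ts versus rm) agrees with comparing the
# fractions t/m and r/s themselves.
for m in range(1, 12):
    for s in range(1, 12):
        for t in range(0, m + 1):
            for r in range(0, s + 1):
                assert (t * s > r * m) == (Fraction(t, m) > Fraction(r, s))

print("cross-multiplication comparison verified")
```

Both sides reduce to comparing ts with rm over the common denominator ms, which is why the greater probability always receives the greater number.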

Consistency is therefore proved for R-probabilities. For others the result is easily generalized. For if two non-rational probabilities are associated with real numbers a and b, of which a is the greater, we can find a rational fraction t/m lying between them. Then the probability associated with a is greater than that associated with t/m, and that associated with t/m is greater than that associated with b. Hence, in virtue of the transitive property of the relation more probable than, the probability associated with a is greater than that associated with b. In other words, the greater number corresponds to the greater probability.

We have seen how definite numbers can be associated with probabilities, so that the higher number always corresponds to the higher probability. In consequence of our fundamental assumption our rules always imply the existence of a definite probability-number. The rules, as we stated before, are conventions and not hypotheses; for if the probability-number assigned by our rules is x, any function of x that always increases with x would satisfy the fundamental assumption. But the choice that we have made seems to be far the most convenient. Henceforth we shall have no need to speak of probabilities apart from their associated numbers, and when we speak of the probability of a proposition on given data we shall mean the number associated with the probability by our rules.

Last Updated August 2007