Harold Jeffreys on Probability

Between 1919 and 1923 Harold Jeffreys and Dorothy Wrinch wrote three papers on probability and scientific inference. These are:

D Wrinch and H Jeffreys, On Some Aspects of the Theory of Probability, Philosophical Magazine 38 (1919), 715-731.

D Wrinch and H Jeffreys, On Certain Fundamental Principles of Scientific Inquiry, Philosophical Magazine 42 (1921), 369-390.

D Wrinch and H Jeffreys, On Certain Fundamental Principles of Scientific Inquiry, Philosophical Magazine 45 (1923), 368-374.

In their views of probability they were influenced by William Ernest Johnson and John Maynard Keynes. Dorothy Wrinch had, in fact, attended lectures by Johnson. Jeffreys used the ideas from these papers in his book Scientific Inference published by Cambridge University Press in 1931. We give below an extract from Jeffreys' book on Probability.


Harold Jeffreys

1. What is probability?

Suppose that a man wishes to catch a train announced to start at 1.00 p.m. When he is a quarter of a mile from the station he looks back and sees that a church clock some distance away indicates 12.55. Will he catch the train?

From previous experience he knows that a quarter of a mile in five minutes means comfortable walking without wasting time. The distance, with slight exertion, can be done in four minutes. Hence he may reasonably expect to catch the train, especially if he hurries slightly. But he has to get a ticket before he will be admitted to the platform. If he finds nobody waiting at the booking office this is a matter of ten seconds; but if there is a queue of ten people it will take two minutes, and he has no means of knowing which will occur in this case. Again, though the church clock is usually reliable, it has been known on a few occasions to be as much as three minutes slow. If that is so on this occasion, and the train is punctual, his chance of catching the train disappears. On the other hand, if the train is a few minutes late, as sometimes happens, he will catch it even if there is a queue and the clock is slow. Further, there is always the possibility of something quite unforeseen, such as an accident on the line. In that event the 11.14 train may arrive at 1.30 and his problem will be solved.
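The arithmetic behind the man's uncertainty can be tabulated. The following toy enumeration uses only the timings quoted in the passage (a four-minute walk, a possible two-minute queue, a clock possibly three minutes slow, five minutes showing on the clock); which combination actually obtains is, as Jeffreys stresses, unknown to him:

```python
from itertools import product

# Illustrative enumeration of the train scenarios from the passage.
# All timings are the ones Jeffreys quotes; a punctual train is assumed.
WALK = 4        # minutes, walking with slight exertion
TIME_LEFT = 5   # minutes until 1.00 p.m. according to the church clock

catches = {}
for clock_slow, queue in product([0, 3], [0, 2]):
    # clock_slow: minutes the church clock may be behind true time
    # queue: minutes spent at the booking office (ten seconds ~ 0)
    total = WALK + queue + clock_slow
    catches[(clock_slow, queue)] = total <= TIME_LEFT

for (slow, q), ok in catches.items():
    print(f"clock {slow} min slow, queue {q} min: "
          f"{'catches' if ok else 'misses'} the train")
```

Only the most favourable combination succeeds, which is exactly why the man's knowledge supports neither a definite assertion of the proposition nor of its denial.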

Now we notice that in this situation the man has some definite information, which is relevant to the proposition "he will catch the train". But numerous other possibilities, none of which he can foresee, are also intensely relevant. Therefore his available knowledge, though relevant to the proposition at issue, is not such as to make it possible to assert definitely that this proposition is true or false. Further, extra data will have a definite effect on his attitude to the proposition. If he meets an astronomer whose watch has just been compared with a wireless time signal, and who assures him that the church clock is accurate, he feels more confident. On the other hand, if a crowded omnibus passes him he expects his worst fears about the queue to be verified. Thus the attitude to the proposition under discussion does not amount to a definite assertion of its truth or falsehood; it is an impression capable of being modified at any time by the acquisition of new knowledge.

Probability expresses a relation between a proposition and a set of data. When the data imply that the proposition is true, the probability is said to amount to certainty; when they imply that it is false, the probability becomes impossibility. All intermediate degrees of probability can arise.

The relation of the laws of science to the data of observation is one of probability. The more facts are in agreement with the inferences from a law, the higher the probability of the law becomes; but a single fact not in agreement may reduce a law, previously practically certain, to the status of an impossible one. A specimen of a practically certain law is Ohm's law for solid conductors. Newton's inverse square law of gravitation first became probable when it was shown to give the correct ratio of gravity at the earth's surface to the acceleration of the moon in its orbit. Its probability increased as it was shown to fit the motions of the planets, satellites, and comets, and those of double stars, with an astonishing degree of accuracy. Leverrier's discovery of the excess motion of the perihelion of Mercury scarcely changed this situation, for the phenomenon was qualitatively explicable by the attraction of the visible matter within Mercury's orbit. Newton's law was first shown to be wrong, as a universal proposition, when it was found that such matter could not actually be present in sufficient quantity to account for the anomalous motion of Mercury.

The fundamental notion of probability is intelligible a priori to everybody, and is regularly used in everyday life. Whenever a man says "I think so" or "I think not" or "I am nearly sure of that" he is speaking in terms of this concept; but an addition has crept in. If three persons are presented with the same set of facts, one may assert that he is nearly certain of a result, another that he believes it probable, while the third will express no opinion at all. This might suggest that probability is a matter of differences between individuals. But an analogous situation arises with regard to purely logical inference. One person, reading the proof of Euclid's fifth proposition, is completely convinced; another is entirely unable to grasp it; while there is, at any rate, one case on record when a student said that the author had rendered the result highly probable. Nobody says on this account that logical demonstration is a matter for personal opinion. We say that the proposition is either proved or not proved, and that such differences of opinion are the result of not understanding the proof, either through inherent incapacity or through not having taken the necessary trouble. The logical demonstration is right or wrong as a matter of the logic itself, and is not a matter for personal judgment. We say the same about probability. On a given set of data p we say that a proposition q has in relation to these data one and only one probability. If any person assigns a different probability, he is simply wrong, and for the same reasons as we assign in the case of logical judgments. Personal differences in assigning probabilities in everyday life are not due to any ambiguity in the notion of probability itself, but to mental differences between individuals, to differences in the data available to them, and to differences in the amount of care taken to evaluate the probability.

2. Principles of probability

The mathematical discussion of probability depends on the principle that probabilities can be expressed by means of numbers. This depends in turn on two deeper postulates:
1. If we have two sets of data p and p', and two propositions q and q', and we consider the probabilities of q given p, and of q' given p', then whatever p, p', q, q' may be, the probability of q given p is either greater than, equal to, or less than that of q' given p'.

2. All propositions impossible on the data have the same probability, which is not greater than any other probability; and all propositions certain on the data have the same probability, which is not less than any other probability.
The relations greater than and less than are transitive; that is, if one probability is greater than a second, and the second greater than a third, then the first probability is greater than the third. If one probability is greater than a second, the second is said to be less than the first; and if neither of two probabilities is greater than the other we say that they are equal. This postulate ensures the existence of a definite order among probabilities, such that each probability follows all smaller ones and precedes all greater ones.

Such an order once established, we can construct a correspondence between probabilities and real numbers, so that to every probability corresponds one and only one number, and so that of every pair of probabilities the less corresponds to the smaller number. When this is done the system of numbers can be used as a scale of reference for probabilities. But the choice is not yet unique. Obviously if x_1, x_2, ..., x_n are a set of positive numbers in increasing order of magnitude, x_1^2, x_2^2, ..., x_n^2 are another set, exp(x_1), exp(x_2), ..., exp(x_n) a third, x_1/(1 + x_1), x_2/(1 + x_2), ..., x_n/(1 + x_n) a fourth, and any number of such sets can be found, such that if probabilities correspond term by term with the numbers of one set in order of magnitude they will correspond equally well with those of any other set. We need a further rule before we can decide what number to attach to any given probability. Such a rule is a mere method of working, or convention; it expresses no new assumption. We decide that
3. If several propositions are mutually contradictory on the data, the number attached to the probability that some one of them is true shall be the sum of those attached to the probabilities that each separately is true.
If we do this it follows at once that 0 is the number to be attached to a proposition impossible on the data. For consider any three mutually exclusive propositions p, q, r, and suppose we have the further datum that p is true. The number attached to a proposition impossible on the data being a, it follows that the numbers attached to q and r separately on the data are both a. Hence, by our rule, since q and r are mutually exclusive, the number attached to the proposition that one of them is true is 2a. But the proposition "q or r is true" is itself impossible on the data and therefore has the number a attached to it. Hence 2a = a, and therefore a = 0.

Again, let us consider any set of m equally probable and mutually contradictory propositions, and call the number attached to any one of them, on the same data, x. If we select any t of them, the number attached to the proposition that one of these t is true is tx, by our rule.

Now take t = m, and suppose that on our data there is just one true proposition among the m, but that we have no means of knowing which it is. The number attached to the proposition that one of the m propositions is true is mx. But on our data this proposition is certain, and therefore mx is the number corresponding to certainty, which is a definite constant by Prop. 2. We therefore choose 1 as the constant to be attached to certainty. This is another convention. Thus mx = 1, and we derive the rule:
4. If m propositions are equally probable on the data and mutually contradictory, and one of them is known to be true, each has the number 1/m associated with it. Further, the proposition that one out of any t of them is true has the number t/m associated with it.
The conditions for the application of this method are practically realizable. Suppose that m balls, one of them with a characteristic mark on it, but indistinguishable by touch, were placed in a bag and shaken. t balls are then withdrawn. Then the proposition that any particular ball is the marked one is inconsistent with the proposition that any other is marked, and all such propositions are equally probable. We have therefore a set of equally probable and mutually exclusive propositions, m in number. Our rule therefore has a practical application. Also m may be any integer, and t may be any integer less than m or equal to it. Hence
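The bag-of-balls model lends itself to an empirical check of rule 4. A small Monte Carlo sketch (the particular numbers m = 10, t = 3 are illustrative, not from the text):

```python
import random

def marked_in_draw(m, t, trials=100_000):
    """Estimate the probability that the marked ball is among t balls
    drawn from a bag of m balls, one of which is marked."""
    hits = 0
    for _ in range(trials):
        draw = random.sample(range(m), t)  # withdraw t distinct balls
        if 0 in draw:                      # ball 0 plays the marked ball
            hits += 1
    return hits / trials

m, t = 10, 3
estimate = marked_in_draw(m, t)
print(f"estimated {estimate:.3f}; rule 4 gives t/m = {t/m}")
```

The estimated frequency settles near t/m = 0.3, as rule 4 requires for t equally probable, mutually exclusive alternatives out of m.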
5. Any rational proper fraction, including 0 and 1, can be a probability number.
We shall call the class of probabilities expressible by rational fractions R-probabilities.

It follows from this that any probability can be made to correspond to a real number, rational or irrational. For any given probability P either corresponds to a rational fraction or does not. In the former case the proposition is granted. In the latter case every R-probability is either greater or less than P. Hence P divides the R-probabilities into two classes R_1 and R_2, such that the probabilities in R_1 are all less than P and those in R_2 are all greater than P. Also, since the relation "greater than" among probabilities is transitive, every fraction corresponding to an R_2-probability is greater than every fraction corresponding to an R_1-probability. Hence P determines a cut in the series of rational fractions. But this is precisely the method of defining a real irrational number. When it is specified which rational fractions are on one side of the cut and which on the other side, there is one and only one real number that can occupy the cut. We then associate the probability P with this number. In this way we arrive at the result:
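The cut construction can be mimicked numerically: supposing only that a probability can be compared with rational fractions, repeated bisection over rationals pins down the real number occupying the cut. A sketch, with the (hypothetical) value 1/√2 standing in for a probability that matches no rational fraction:

```python
from fractions import Fraction
import math

P = 1 / math.sqrt(2)   # stand-in for a non-rational probability

lo, hi = Fraction(0), Fraction(1)   # rational fractions bracketing P
for _ in range(40):
    mid = (lo + hi) / 2             # a rational fraction t/m
    if float(mid) < P:              # mid falls in the lower class
        lo = mid
    else:                           # mid falls in the upper class
        hi = mid

# The two classes of rationals squeeze onto a single real number: the cut.
print(float(lo), float(hi))
```

After forty bisections the two classes differ by 2^-40, and both endpoints agree with 1/√2 to about eleven decimal places, illustrating that one and only one real number can occupy the cut.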
6. Every probability can be associated with a real number, rational or irrational.
We still have to prove that the results given by our rules are consistent; that is, if a probability P is greater than another probability Q, that the number associated with P by our rules is greater than that associated with Q. Suppose first that P and Q are both R-probabilities. Then we can find four integers t, m, r, s so that the number associated with P is t/m and that associated with Q is r/s. Now consider a class of ms mutually exclusive propositions containing one true one. We may divide them up into m sets of s each; one and only one of these sets contains the true proposition. The probability-number that one of t of these sets contains the true proposition is t/m. But this is also the probability-number that one of ts propositions selected from the original ms propositions shall be the true one, which by our rule is ts/ms and equal to t/m, as it should be. Thus t/m is the number associated with the proposition that one out of ts alternatives is true; similarly r/s is associated with the proposition that one out of rm alternatives is true. If then P is greater than Q, the number of alternatives needed to give probability P must exceed that needed to give probability Q; therefore ts is greater than rm. But this is equivalent to saying that t/m is greater than r/s; and therefore the greater probability is associated with the greater number.
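The cross-multiplication step at the heart of this proof, that ts > rm exactly when t/m > r/s, can be checked mechanically. An exhaustive check over small numerators and denominators (the bound 12 is arbitrary):

```python
from fractions import Fraction

# Verify: for positive denominators m, s, comparing the counts of
# alternatives out of ms (ts versus rm) agrees with comparing the
# fractions t/m and r/s themselves.
for m in range(1, 12):
    for s in range(1, 12):
        for t in range(0, m + 1):
            for r in range(0, s + 1):
                assert (t * s > r * m) == (Fraction(t, m) > Fraction(r, s))

print("cross-multiplication comparison verified")
```

Both sides reduce to comparing ts with rm over the common denominator ms, which is why the greater probability always receives the greater number.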

Consistency is therefore proved for R-probabilities. For others the result is easily generalized. For if two non-rational probabilities are associated with real numbers a and b, of which a is the greater, we can find a rational fraction t/m lying between them. Then the probability associated with a is greater than that associated with t/m, and that associated with t/m is greater than that associated with b. Hence, in virtue of the transitive property of the relation more probable than, the probability associated with a is greater than that associated with b. In other words, the greater number corresponds to the greater probability.

We have seen how definite numbers can be associated with probabilities, so that the higher number always corresponds to the higher probability. In consequence of our fundamental assumption our rules always imply the existence of a definite probability-number. The rules, as we stated before, are conventions and not hypotheses; for if the probability-number assigned by our rules is x, any function of x that always increases with x would satisfy the fundamental assumption. But the choice that we have made seems to be far the most convenient. Henceforth we shall have no need to speak of probabilities apart from their associated numbers, and when we speak of the probability of a proposition on given data we shall mean the number associated with the probability by our rules.

Last Updated August 2007