The foundations of the calculus of probabilities
Paul Lévy delivered the lecture Les fondements du calcul des probabilités to an audience at the École Polytechnique. Much of the material is similar to that in Lévy's article of the same title, which appeared in Dialectica in June 1949. We give an English translation of part of Lévy's lecture to the École Polytechnique, which was published in French in Revue de Métaphysique et de Morale 59 (2) (1954), 164-179.
- Laplace ended the Introduction to his Treatise on Probability with a reflection that cannot be repeated too often: "The theory of probability is, basically, only common sense reduced to a calculus: it allows us to appreciate with precision what sound minds feel by a sort of instinct, often without being able to account for it."
Unfortunately, common sense is not the most widely shared thing in the world, and arguably no one can boast of having infallible common sense. Laplace himself, who did not lack this in general, proved it by following Condorcet in erroneous considerations on judicial errors. Both had forgotten that the errors most to be feared result from circumstances which deceive all, or nearly all of the judges at the same time; they are systematic errors, which do not relate to the calculation of probabilities.
Since Laplace, other eminent scientists who were also men of great sense, Cournot, Joseph Bertrand, Henri Poincaré, and more recently Émile Borel, have set out the foundations of the calculus of probability. There is probably not much original left to say on this subject. It is all the same useful to talk about it again, because errors of interpretation are still frequent. Those who search for a martingale forget that, in Joseph Bertrand's phrase, the ball has neither consciousness nor memory. It is not, however, to combat such a gross error that I decided to speak to you today about these problems, but because, for the explanation and the justification of the axioms, there are two theories involved, the empiricist theory and the rationalist theory, and each has supporters who do not lay down their arms. Yet the question is not much more complex than that of the foundations of geometry, and, to highlight the analogy that exists, despite undeniable differences between the logical frameworks of geometry and the calculus of probability, let us first recall the now uncontested character of the axioms of geometry.
1° There is, above all, a fundamental axiom: the displacement of an invariable solid is possible, and this in all directions, that is to say that in three-dimensional space the possible displacements form a continuous group with six parameters.
2° There are three geometries compatible with this axiom, and an additional axiom is needed to characterise Euclidean geometry, which, as a first approximation, is that of the world in which we live.
3° A more complete discussion is necessary to know if one or the other of these geometries corresponds to reality.
We know what the conclusions of modern science are. Not only does there not exist a rigorously invariable solid, but we cannot speak of space independently of time. Classical geometry is only a convenient approximation.
- Coming now to the calculus of probabilities, we will follow a similar order by first studying the origin and mathematical formulation of axioms, and then discussing their practical value. From the first point of view, it is indisputable that the notion of probability made its appearance thanks to games of chance. Their study was what the observation of solid bodies was for geometry. They always involve experiments in each of which there are a number of possible cases which are said to be equally probable.
To arrive at the first axiom of mathematical theory, we do not need to know what this means. The probability is a certain aggregate which, in the case of games of chance, is assumed to be evenly distributed among the different possible cases. In the study of other problems, it will be assumed to be unequally distributed among the different possible cases. If there is an uncountable infinity of possible cases, the distribution can be continuous, that is to say that each isolated case will have zero probability, and we can only define a law of probability by giving ourselves the probabilities of certain groups of cases. In the most general case, probability is therefore a set function, defined, if not for all conceivable sets of possible cases, at least for a family of sets that we can call measurable. This function must always be positive, equal to unity for the set of all possible cases, and completely additive; this last condition is exactly the equivalent of the classic principle of total probabilities: if several events or a countable infinity of events are mutually exclusive, i.e. if they correspond to groups of cases no two of which have a case in common, the probability that one of them is realised is obtained by adding their probabilities. Modern analysis, by introducing the expression additive set function, has basically added nothing to this old principle.
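The three conditions just stated can be checked mechanically on a small finite example. The following is only a sketch: the fair die, the helper names `prob` and `satisfies_axioms`, and the restriction to finitely many cases are assumptions made for illustration, and countable additivity reduces here to finite additivity.

```python
from fractions import Fraction
from itertools import combinations

def prob(weights, event):
    """Probability of a set of cases: the sum of the weights of its cases."""
    return sum((weights[case] for case in event), Fraction(0))

def satisfies_axioms(weights):
    """Check positivity, total mass 1, and additivity on disjoint events."""
    cases = list(weights)
    if any(w < 0 for w in weights.values()):
        return False
    if prob(weights, cases) != 1:
        return False
    # Every subset of the finite set of cases is treated as an event.
    events = [set(c) for r in range(len(cases) + 1)
              for c in combinations(cases, r)]
    # Additivity: for disjoint E and F, P(E union F) = P(E) + P(F).
    return all(
        prob(weights, e | f) == prob(weights, e) + prob(weights, f)
        for e in events for f in events if not (e & f)
    )

# Hypothetical example: a fair die, probability evenly distributed.
die = {face: Fraction(1, 6) for face in range(1, 7)}
even, low = {2, 4, 6}, {1}

print(prob(die, even), prob(die, low), prob(die, even | low))
print(satisfies_axioms(die))
```

Exact fractions are used so that the additivity check is an equality test and not a comparison of rounded floating-point values.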
- This first principle is not sufficient to distinguish the probability of any physical quantity capable of being decomposed into finite or infinitesimal elements. The real basis of the calculation of probabilities is the second principle or principle of compound probabilities, which makes it possible to replace the set of two experiments by a single experiment: if the first experiment is likely to carry out an event $A$, and the second an event $B$, the probability that $A$ and $B$ are both realised is $\alpha\beta$, $\alpha$ being the probability of $A$, and $\beta$ the conditional probability of $B$, if $A$ is realised. If the two experiments are independent (a case which is often the only one considered), $\beta$ is simply the probability of $B$. But the notion of conditional probability is also clear when the two experiments are successive, and the law of probability of the second depends on the result of the first. It should only be observed that $A$ can be the union of cases $A_i$, of probabilities $\alpha_i$, and to which correspond for $B$ conditional probabilities $\beta_i$; the probability that $A$ and $B$ are both realised then results from the combined application of the two principles: it is $\sum \alpha_i \beta_i$, the symbol $\sum$ being able to represent an integral.
If the two experiments are not successive, the meaning of $\beta$ is not a priori so clear. We then have an interest in first giving ourselves the law of compound probability, on which the set of the two experiments depends, and we define $\beta$ by equating $\alpha\beta$ to the probability that $A$ and $B$ are both realised. The principle of compound probabilities then remains true, because we define conditional probability so that it remains true.
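A small numerical illustration may help fix the two directions described above. It is a sketch under assumptions of my own: a hypothetical two-stage experiment (choose an urn, then draw a ball), with `alpha` standing for the probabilities of the cases of the first experiment and `beta` for the conditional probabilities of the event in the second. None of the figures come from the lecture.

```python
from fractions import Fraction

# Hypothetical two-stage experiment: choose an urn, then draw a ball from it.
# alpha[u] = probability of choosing urn u (a case of the first experiment)
# beta[u]  = conditional probability of drawing a white ball, given urn u
alpha = {"urn1": Fraction(1, 2), "urn2": Fraction(1, 3), "urn3": Fraction(1, 6)}
beta = {"urn1": Fraction(1, 4), "urn2": Fraction(1, 2), "urn3": Fraction(1, 1)}

# Compound probabilities: P(urn u and white) = alpha[u] * beta[u].
joint = {urn: alpha[urn] * beta[urn] for urn in alpha}

# Total probabilities: the urns are mutually exclusive cases, so we add.
p_white = sum(joint.values(), Fraction(0))

# The other direction: start from the joint law and define the conditional
# probability as the ratio that makes the compound principle hold.
recovered_beta = {urn: joint[urn] / alpha[urn] for urn in alpha}

print(joint)
print(p_white)
print(recovered_beta == beta)
```

The last line prints `True` by construction, which is the point of the remark: once the joint law is given, the conditional probability is defined so that the principle remains true.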
No new principle is necessary to pass from the case of two experiments to the case of a finite number or of a countable infinity of experiments, if they follow one another in time. These cases were for a long time the almost exclusive object of the work of probability. But a new chapter in science was born thirty years ago and has developed rapidly; it is the study of stochastic processes, in which time is a continuous variable. Chance intervenes, or can intervene, at any moment, and we cannot limit ourselves to considering a series of distinct moments. But it is difficult to reason from the start on this continuous intervention of chance. We are thus led to first consider a series of increasing values of time, then to go back to consider more and more intermediate instants, and that interpolation is essentially based on the definition of the notion of conditional probability starting from the compound probability (given itself a priori, or else obtained by an earlier application of the second principle).
We cannot present today the mathematical development of these theories. It may be the subject of subsequent lectures. Our current object being to discuss the foundations of the calculus of probabilities, we need only note that, mathematically, everything derives from the initial statement of the second principle, relating to the case of two successive experiments; it is in this case that it is a question of justifying it. We do not rule out any essential difficulty by limiting ourselves, as we are going to do, to the case where the second experiment includes a finite number of cases, all equally probable.
- It is here that we must distinguish the two theories of which we have spoken. Rationalist theory is based on a subjective notion: two cases seem to me equally probable if I see no reason to expect one rather than the other. It is then obvious that if, having learned the realisation of $A$, I consider two cases $B_1$ and $B_2$ as equally probable, it follows that before being informed of this realisation, I had to consider the two successions $A B_1$ and $A B_2$ as equally probable. The principle of compound probabilities can easily be deduced from this remark.
I am ill-qualified to present empiricist theory, which I have never fully understood. What is certain is that its supporters dismiss all subjective notions, and only want to talk about what is verifiable. Probability then only has meaning if it can be defined as the limit frequency in the course of an unlimited series of experiments. It is this series, considered as a whole, which becomes the basis of the theory, and, once admitted that probability is a limit frequency, the principle of compound probabilities becomes evident.
- I will say immediately what seems to me to constitute an essential objection to this theory: a collection of objects cannot have other properties than those which result from the properties of these objects. If Gustave Lebon spoke of crowd psychology, he could not ignore that he was talking about the reactions of individuals to each other and that crowd psychology is only a synthesis of individual psychologies. Likewise, if a series of experiments presents certain characteristics, this must result from the characteristics of the individual experiments. That these characters are difficult to formulate and to understand well, it is possible; but that is no reason not to try to understand them. Science must seek to understand. Newton sought to understand why the planets obey Kepler's laws; he thus, for an empirical and clear notion, substituted the mysterious notion of an attraction at a distance. Yet no one has denied that this was considerable progress. Likewise, if the notion of probability relative to a single experiment seems difficult to understand, its introduction, to explain the existence of a limit frequency in certain series of experiments, constitutes an important advance. To refuse this notion is to note a fact and give up understanding it.
- Although this remark seems decisive to me, I will examine the consequences of rejecting this notion. Richard von Mises, an excellent mathematician, who tried almost thirty years ago to renew the empiricist theory, gave the name collective to a series of experiments in which there is a limiting frequency which can be considered as a probability. He saw clearly that the existence of a limit frequency is not a sufficient condition for it to be called a probability. Thus, when the sequence of results obtained is periodic, there is indeed a limit frequency for each of the possible cases, and yet this cannot be the effect of chance. To answer this objection, von Mises displayed unnecessary ingenuity; he could only rule out the cases presenting certain characteristics that chance has no chance of producing. But any given sequence, having a zero probability of being realised, should then be discarded. Von Mises placed himself in the same position as the organiser of a lottery who, after the sale of a million tickets, was asked to exclude from the drawing of the single jackpot those tickets which did not have a one in a hundred thousand chance of winning; he would have to remove them all. With subjective theory, on the contrary, there is no paradox. A certain sequence must be realised, but we do not know in advance which one, and, as there is an uncountable infinity of possible and equally probable sequences, each of them has a zero probability.
- Another objection to empiricist theory is the impossibility in which it finds itself of proving Bernoulli's theorem: having postulated the existence of a limit frequency, it cannot hope to prove it. This existence is, moreover, only almost certain, and it should not be given as certain.
Here again, the subjective theory rules out any difficulty. A series of $n$ experiments is a compound experiment; the frequency $f$ of each possible case has $n + 1$ possible values, of which the second principle makes it possible to calculate the probabilities, and we thus note that, if $n$ is large, the frequency $f$ of each case very probably differs very little from its probability $p$. In the limit, $f$ tends in probability to $p$, that is to say that, whatever the positive number $\varepsilon$, the probability of $|f - p| > \varepsilon$ tends towards zero; this is Bernoulli's classical theorem. E Borel and F P Cantelli have since shown that $f - p$ tends even almost surely to zero. All this does not imply any vicious circle, if we accept taking as a starting point the relative probability of a single experiment.
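Bernoulli's theorem can be observed numerically by computing, from the binomial law, the exact probability that the frequency deviates from the probability by more than a fixed tolerance. The sketch below assumes independent, identical trials; the function name and the chosen values of the probability, the tolerance and the numbers of trials are illustrative only.

```python
from fractions import Fraction
from math import comb

def prob_deviation_exceeds(n, p, eps):
    """Exact P(|k/n - p| > eps), k being the number of successes in n
    independent trials of success probability p (binomial law)."""
    total = Fraction(0)
    for k in range(n + 1):
        if abs(Fraction(k, n) - p) > eps:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

p, eps = Fraction(1, 2), Fraction(1, 20)
for n in (10, 100, 1000):
    # The probability of a deviation larger than eps shrinks as n grows.
    print(n, float(prob_deviation_exceeds(n, p, eps)))
```

Exact rational arithmetic is used so that cases lying exactly on the boundary of the tolerance are not miscounted through rounding.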
It should be noted that, if we obtain a result that seems verifiable, it is because a sufficiently small probability is practically equivalent to impossibility, in the sense that we do not expect the occurrence of a very unlikely event, and that more often than not we act as if it were impossible. This principle applies even if there is only one experiment: if we focus our attention on a very unlikely case, we can neglect it, which does not mean that we can neglect all similar cases, or even repeat this operation too often without expecting a certain percentage of disappointments; even if a probability is very small, its product by a sufficiently large number ceases to be negligible. These are common sense questions which it seems unnecessary to dwell on.
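The closing remark, that a very small probability multiplied by a sufficiently large number ceases to be negligible, is easy to illustrate. The sketch assumes independent repetitions, and the probability of one in a million is an arbitrary choice.

```python
from math import expm1, log1p

def prob_at_least_once(p, n):
    """Probability that an event of probability p occurs at least once in
    n independent trials: 1 - (1 - p)**n, computed stably for small p."""
    return -expm1(n * log1p(-p))

p = 1e-6  # hypothetical "practically impossible" event
for n in (1, 1_000, 1_000_000, 10_000_000):
    # n * p is the product mentioned in the text.
    print(n, n * p, prob_at_least_once(p, n))
```

While the product of the probability and the number of trials stays small, the chance of at least one occurrence is close to that product; once the product reaches the order of one, the event can no longer be neglected.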
- There are fates that must be killed. I thought I had to fight, once again, a theory which has a singular tendency to rise from its ashes. But you must not make me say what I did not say. What I am asking is that we recognise the need to introduce the notion of probability in the study of a single experiment, and I claim that without this notion we cannot understand anything about the calculation of probabilities. On the other hand, I do not dispute that this notion interests us above all because it can lead to verifiable properties in the case of a large number of experiments. We can in particular say: probability is something hidden which is brought to light by repeating the experiment; the frequency is a measurement which is all the more precise as the number of repetitions is greater. This statement of Bernoulli's theorem is undoubtedly the one which best shows the respective roles of probability and frequency, in rationalist theory, and in the case of games of chance where the probability is well defined. It should, moreover, be noted that one can study series in which the successive experiments are not identical to each other, and obtain results which will greatly generalise Bernoulli's theorem. The frequency must then be compared to the average probability. These series of experiments are not collectives, and their interpretation in empiricist theories seems to me unnecessarily complicated, if not impossible.
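For series in which the successive experiments are not identical, the comparison of the frequency with the average probability can be made concrete. The sketch below assumes independent trials and uses Chebyshev's inequality, which is one standard route to such a result and not necessarily the one Lévy had in mind; the cycle of probabilities is hypothetical.

```python
def frequency_bounds(probs, eps):
    """For independent trials with success probabilities probs, return the
    average probability, the variance of the observed frequency, and
    Chebyshev's upper bound on P(|frequency - average| >= eps)."""
    n = len(probs)
    average = sum(probs) / n
    variance = sum(p * (1 - p) for p in probs) / n**2
    bound = min(1.0, variance / eps**2)
    return average, variance, bound

# Hypothetical series: the success probability cycles through 0.2, 0.5, 0.8.
for n in (30, 300, 3000, 30000):
    probs = [(0.2, 0.5, 0.8)[i % 3] for i in range(n)]
    print(n, frequency_bounds(probs, eps=0.05))
```

The bound tends to zero as the number of trials grows even though no two consecutive trials share the same probability, so the series is not a collective in von Mises's sense.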
- I have now spoken enough of these theories; I, in turn, must respond to the objections of the empiricists. M Chapelon was once astonished that one could base a science on a negative idea, that of saying: we have no reason .... This objection obliges me to clarify what I had initially implied. Our ignorance is not enough for us to be able to regard different cases as equally probable; more precisely, if we do, it is only a provisional judgment on the value of which we have no illusions. So, if John, Paul and Peter are candidates in the elections, I can find it convenient to express my ignorance by saying that in my opinion each has a one in three chance of being elected; but that does not mean that, if they were destined to compete many times, I would expect each to be elected roughly one time in three.
In games of chance, there is something else. In a card game, for example, the important fact is the identity of the form of the different cards, which cannot be distinguished from each other by touch. They all play the same role in the movements that the player makes when shuffling the cards, and if the deck seems to me well shuffled, that is to say if I am unable to deduce anything from the information I may have on its previous state, the different permutations seem equally probable to me. In the same way with dice, or the game of heads or tails, the symmetry of the die or of the coin constitutes a precise reason for considering the possible cases as equally probable, if one has no other information. Similar reasons exist in all games of chance, and it seems to me that the a priori evaluation of a probability can only be satisfactory when there are reasons of this kind.
- It remains for us to examine the most serious objection: science, which wants to be objective, cannot rely on a subjective notion. Just because I believe two cases to be equally probable does not mean that their frequencies will be asymptotically equal. We must therefore carefully examine the passage from the subjective to the objective; I must, in each case, ask myself if I am not mistaken in my assessment. We must first of all discuss the notion of a very unlikely event, on which everything is based. We must admit that we cannot demonstrate its objective character; but common sense obliges us to admit it. Thus it is not impossible that letters drawn at random reconstruct the text of this lecture; yet none of you doubts that, in writing it, I proceeded otherwise. What I want to point out now is that probabilists only explicitly state a principle that in other sciences is commonly applied, saying that a sufficiently small probability is practically equivalent to impossibility. So I spoke earlier about Kepler's laws: we could consider the first as established only because five points determine an ellipse, and it could not be the effect of chance if, to a high degree of approximation, hundreds of observed positions for each known planet lay on an ellipse. Nobody hesitated to consider that we had discovered a law of nature, which would remain verified in the future. No experimental law would be of value if we did not admit this kind of reasoning. It is a similar reasoning that the probabilist makes when he says, for example: in a game of a thousand tosses of a coin, each case will occur about five hundred times, with a gap which, as we realise without any calculation, cannot exceed two hundred. One realises, in fact, instinctively that, among the immense number of theoretically possible games, there is only a tiny fraction which gives a greater difference.
If by chance such a gap were realised (and it would not be, even after billions and billions of similar experiments), one should not hesitate to conclude that there was a cause other than chance. The conditions of the game would not have been what we thought. However, the calculation of the number of possible series confirms this conclusion, and the fact that we feel a priori that the probability is an approximate value of the frequency does not reduce the value of this calculation.
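The calculation of the number of possible series mentioned here can be carried out exactly. The sketch below assumes a fair coin and independent tosses, and counts the sequences of a thousand tosses in which the number of heads differs from five hundred by more than two hundred; the function name is illustrative.

```python
from fractions import Fraction
from math import comb

def prob_gap_exceeds(n, gap):
    """Exact probability that the number of heads in n fair tosses differs
    from n/2 by more than gap, obtained by counting sequences."""
    # |k - n/2| > gap is written as |2k - n| > 2*gap to stay in integers.
    favourable = sum(
        comb(n, k) for k in range(n + 1) if abs(2 * k - n) > 2 * gap
    )
    return Fraction(favourable, 2**n)

p = prob_gap_exceeds(1000, 200)
print(float(p))
```

The fraction obtained is far smaller than one in a billion billion, which is consistent with the remark that such a gap would not be seen even after billions and billions of games.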
- The conclusion of these remarks is that we have the right to make calculations relating to chance. We only have to make sure, when we evaluate the probabilities relating to an experiment, that we have analysed the conditions in which it is carried out and clearly distinguished the essential conditions, which will be reproduced if we repeat the experiment, from the accessory conditions, which vary from one time to the next; in short, that no systematic error remains. Then the probability becomes objective, and we can be confident in our predictions, knowing that they are never completely certain. This requires a discussion which varies from one problem to another, and to which games of chance lend themselves particularly well. We will be satisfied with studying the shuffling of the cards. Although our object is a discussion of principles, we must first recall the mathematical scheme of the theory of shuffling cards.
- Other games of chance give rise to similar comments. In roulette, the perfect equality of the squares plays the role that the identity of the cards played earlier. What replaces the shuffle is that the ball is thrown hard enough that the uncertainty extends over a large number of turns. This again is a condition which is never perfectly realised, and there is perhaps a very slight systematic error which could be revealed if the croupier's physical disposition remained the same for long enough. From this point of view, the wheels that we see at fairs are preferable, because the initial position of the wheel varies from one turn to another, and there cannot be any cause of systematic error other than construction defects. The theory of throwing a die is arguably more difficult. But if its symmetry is perfect, and if its initial position varies from one throw to the next, there cannot be any systematic error; any reason one would think of invoking in favour of one side applies likewise to the others. If, on the contrary, the die has the shape of an irregular polyhedron, or only if it is badly centred, the theory becomes more difficult, if not impossible. We get the impression that there may be a certain integral invariant which must play a role. But, for my part, once the argument based on symmetry no longer exists, a theory based on the study of such an invariant would not entirely persuade me. I prefer to mix empiricism with theory by saying: the notion of probability being made familiar by the study of simple cases, we realise that reasons which are difficult to analyse exactly mean that each face has a well-determined probability; but experiment alone enables these probabilities to be determined, with a precision all the greater the longer it is carried out.
- I hope that you will not reproach me for going outside my field of competence if I now speak to you about the applications of probability to other sciences, and first of all to physics. The first to date was the kinetic theory of gases, and it is perhaps the most perfect, because the theory is sufficient in this case to predict physical phenomena. Under the most frequent conditions, the number of molecules is so large that there is no need to distinguish the probabilities and the frequencies, and the study of the probabilities makes it possible to define physical constants, temperature and pressure, and to explain the laws of diffusion. In rarefied gases, the number of molecules is less, and there are fluctuations which theory explains perfectly.
The study of radioactivity has not reached such a satisfactory stage. We know that an atom in a radioactive body has a certain probability of decaying; at least we know that this way of considering the question leads to a satisfactory explanation of the observed phenomena. But we do not know the construction of the atom well enough to explain and calculate a priori this probability as we calculate that of turning up the king of hearts.
The statistics of Bose and Fermi, the uncertainty principle of Heisenberg, the probability wave theory of Louis de Broglie are likewise syntheses of experimental facts. They explain the observed facts but do not satisfy the mind, because we in turn do not explain them by phenomena of a more familiar aspect, and they can, like almost all physical theories, be only more or less exact. As you know, these theories have led some scholars to a strange conclusion: determinism would be doomed, and it would be impossible for probabilistic theory this time to be compatible with an underlying determinism. Despite the authority of von Neumann, who claimed to demonstrate this impossibility definitively, I have never doubted the falsity of this opinion. The kinetic theory of gases is not in contradiction with determinism; it implies, on the contrary, that each molecule follows a trajectory well determined by physical laws. Likewise, in games of chance, we know that the outcome of the toss is actually determined by infinitesimal and unpredictable variations, from one time to the next, in the player's gestures. It could not be otherwise with the phenomena studied by Heisenberg and Louis de Broglie, and I was delighted to learn recently that the latter, having discovered the error made by von Neumann, seems to return to determinism. I am deeply convinced that no scientific law is more certain than what is sometimes called the deterministic hypothesis: every phenomenon has a cause, or a set of causes, which determines it in its finest details. This is also the opinion of M Jean Ullmo, who will shortly give you a lecture on this question.
- I do not want to stray too far from my topic. But after this deterministic profession of faith, I think I should at least tell you briefly how I respond to the objections that are often made to integral determinism. One is that determinism leads to fatalism: why take any trouble, if everything is determined in advance? This objection is based in my opinion on a complete misunderstanding. Those who raise it forget that one of the elements in the chain of causes and effects is precisely human desires and efforts. No doubt Pasteur could not fail to discover the remedy for rabies, but it was because he was Pasteur, that is to say because he had the desire to achieve it and the necessary genius. In other cases, I know in advance how such and such a person will react under such circumstances; because I know his character, I know what he will do; his action is in a sense determined, since I can foresee it; in another sense he is free, which only means that this person has only obeyed his will and has not been subjected to any external constraint. There is no other conceivable freedom than this.
A more serious objection is this: Does not the basis of morality collapse if monsters have the excuse of having been created as they are, with certain instincts for which they are not responsible? Should we not agree with Philinte, who looks at them with the same feelings as when he sees
Evil monkeys or raging wolves?
I do not hesitate to say that the psychologist must agree with Philinte. But moralists and educators should know that it sometimes depends on them whether an individual endowed with certain instincts becomes a normal being, or a criminal. No one disputes that there is a morality without which life in society would not be possible, that normal beings have an innate feeling of it, and that they often believe they explain this feeling by that of a freedom of which they do not know at all just what it means. To explain my thinking to you exactly, I would have to both waste your time and perhaps touch on matters which are best not discussed here. I will only say to those who see a contradiction between determinism and morality that moralists must take the world as it is, without believing that nature has the concern of justifying morality and human laws; but, for my part, integral determinism seems to me to be in contradiction neither with morality nor with the feeling of our responsibility.
- Let us return to the applications of the calculus of probabilities. I do not know if, apart from those I have already mentioned, there are some which allow an a priori evaluation of probabilities which is really satisfactory. In any case, more often than not, we can repeat what I said earlier about a die which is irregular. These are complex phenomena in which we can think, by analogy with those we have studied, that reasons impossible to analyse exactly tend to produce certain phenomena with more or less constant frequencies. The study of games of chance will not have been useless, since it allows the scientist to be guided by the feeling of this analogy. But the frequencies cannot be predicted a priori, and experimental statistics will be the basis of research. Naturally, the statistician will endeavour to group only comparable phenomena. Often he will discover the existence of parameters which influence the phenomena studied, and, if the observed results are not numerous enough to be grouped into series in each of which the parameters are constant, the statistician will apply other methods to study the influence of these parameters.
If the theory does not predict frequencies, it will still be useful. Any attempt to explain a phenomenon, even if it does not lead to a precise explanation, makes it possible to foresee the parameters which may intervene, and whose influence must be systematically studied. If the sketched theory allows more accurate predictions, statistics will show whether these predictions are true. Thus, just as in any science, experiment and theory must complement and join each other, statistics, which studies series of experimental results, and the calculus of probabilities, based on theoretical considerations, have opposite starting points, but try to come together.
- To conclude, I would like to choose a particular example in order to present in concrete form some remarks of a general nature on the concept of probability relating to a single case. Let us take a determined individual, Jean N, and, to clarify ideas, suppose he is exactly forty years old, lives in a low-rent apartment in Clichy, and is a taxi driver. What is the probability that he will die in a year? Or rather, can we talk about this probability?
We cannot speak here, as with the choice of a card in a well-shuffled deck, of a probability that can be evaluated a priori, and we can only resort to statistics. But a statistic implies a grouping of analogous cases; it relates to what M Fréchet calls a category of tests. To which category will we attach the case of Jean? We know too much about him for there to be other truly analogous cases, and to begin with we can only consult a statistic that is far too general. We will no doubt easily find a statistic which answers the following question: among the men who, in France, in the first half of this century, have reached forty years old, what is the proportion of those who died before forty-one? This proportion gives us a first approximation of the probability relative to Jean; but this is only a rough approximation. There are many corrections to be made.
First, the mortality is not the same in a suburb of Paris and in the whole country; it may also depend on housing conditions. We may be lucky enough to find a statistic relating to Clichy, and another relating to mortality in homes with a level of hygiene comparable to that of the building where Jean lives. But these statistics are unlikely to be broken down by sex and age; we cannot do better than to correct our first approximation as if he were a person of unknown age and sex, drawn at random from the population, and to judge whether the two corrections deduced from the two statistics found are simply added, or are composed according to a more complicated law.
But there are many other things to consider. Jean is a taxi driver; there is an occupational risk, and, as it can be increased or decreased according to skill and prudence, it can only be roughly evaluated.
This evaluation will result in a correction which will be positive or negative, depending on whether this occupational risk is greater or less for Jean than for the average individual of the population studied first. It is then necessary to take into account the state of health of Jean, the diseases which he has had, the information which one can have on his tendency to contract such or such a disease, and on the average longevity of his family. Finally, it is necessary to know his way of life, to know if he is married or single, to know how he will be treated in the event of illness, and if the way he spends his possible holidays creates an additional risk.
I will not continue this enumeration. I wanted to show you that, in an individual case, there is generally a great deal of information available which makes it necessary to modify the initial assessment. Each one can undoubtedly be the object of a statistical study, which indicates the correction which it is necessary to make. But it only indicates this roughly, and we do not know a priori whether these corrections are independent or not. Also the probability sought is only something roughly defined, like the height of a wave or the dimensions of a cloud. It is subjective in the sense that it depends both on the information we have and on our ability to appreciate the consequences. It also has an objective value. But, before explaining my thoughts on this point, I want to make another point.
There is one category of data that I have not considered so far, and that is historical events. A war or an epidemic can increase the likelihood that Jean will die within a year; progress in medicine can, on the contrary, diminish it. Now, these are circumstances whose probability cannot be objectively defined, since the repetition of roughly analogous experiences is not conceivable. One can undoubtedly speak subjectively about the probability of such an event; but this probability will vary considerably with the information available, and can never be assessed with precision. Of a hoped-for or feared event we can only say that it is very probable, or very unlikely, or that we are in the intermediate case where the probabilities of the two possible eventualities are of the same order of magnitude; but one will only express an opinion more or less satisfactorily justified, and in any case any precise evaluation would be illusory. It is therefore dangerous to apply the calculus of probabilities to historical events. It should also be noted that, for the problem which concerns us, even if we could estimate the probability of a war, we would still be far from knowing the probability that it results in the death of Jean.
We can therefore only assess the probability of Jean dying in the year by ruling out the possibility of a phenomenon such as a war, an exceptional epidemic, or medical progress relating to a disease with which Jean is threatened. As in all the applications of the calculus of probabilities, it is necessary to eliminate all the causes of systematic errors, or at least (since this does not depend on us), know that our forecasts are only valid if certain external circumstances are not changed. With this condition, the probability of which I spoke earlier has an objective value. By that I mean that, subject to the stated caveat, it allows an approximate forecast of certain frequencies. If, for example, we evaluate in the same way the probabilities relating to a few thousand individuals, scattered enough in the population so that it is unlikely that several of them die in the same accident, the number of those who die in the year, and its probable value deduced from these assessments, are likely to differ even less the more information we have on these individuals, and the better we have assessed the probable effects of the circumstances that have come to our knowledge. Naturally, the order of magnitude to be expected for this difference can in no case be lower than what it is when the probabilities are perfectly known. The uncertainty of the evaluation, added to the accidental error itself, is likely to increase it rather than decrease it.
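The final remark, that the difference to be expected cannot be smaller than it is when the probabilities are perfectly known, can be given an order of magnitude. The sketch below assumes independent individuals and exactly known probabilities; under those assumptions the number of deaths has a standard deviation equal to the square root of the sum of p(1 - p) over the individuals. The figures are purely illustrative and are not real mortality data.

```python
from math import sqrt

def expected_and_spread(probs):
    """Expected number of deaths and its standard deviation, assuming the
    individuals are independent and each probability is exactly known."""
    expected = sum(probs)
    spread = sqrt(sum(p * (1 - p) for p in probs))
    return expected, spread

# Purely illustrative figures, not real mortality data:
# 3000 individuals assessed at 0.004 and 2000 assessed at 0.008.
probs = [0.004] * 3000 + [0.008] * 2000
expected, spread = expected_and_spread(probs)
print(expected, spread)
```

With these hypothetical values the expected number is 28 and the accidental fluctuation is of the order of 5; any error in the individual assessments would come in addition to this irreducible spread.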
Last Updated September 2020