Why these postulates?
Recall Cox’s five postulates:
- Cox-plausibilities are real numbers.
- Consistency with Boolean algebra: if two claims are equal in Boolean algebra, then they have equal Cox-plausibility.
- There exists a conjunction function f such that for any two claims A, B, and any prior information X, (A ∧ B | X) = f( (A | X), (B | A ∧ X) ).
- There exists a negation relation (actually a function too, say g). In slightly different notation than in the previous post, it is: (¬A | X) = g( (A | X) ).
- The negation relation and conjunction function (and their domains) satisfy technical regularity conditions.
I doubt such evidence will be forthcoming. The reason is that the key property that distinguishes the reals from the other possible choices for the domain is that the reals are totally ordered. By making this choice, we ensure that the Cox-plausibilities of all the claims in whatever universe of discourse we’re considering will also be totally ordered. Conversely, if the domain is only a poset, then for at least one pair of claims, we won’t be able to pick the one with the larger plausibility (or state that they’re equally plausible).
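To make the comparability point concrete, here's a minimal Python sketch. The subset order on sets is a standard example of a poset; the example sets are my own invention, chosen only to exhibit an incomparable pair.

```python
# Subsets ordered by inclusion form a poset, not a total order:
# some pairs are simply incomparable.

def comparable(a: frozenset, b: frozenset) -> bool:
    """True if a <= b or b <= a under the subset order."""
    return a <= b or b <= a

x = frozenset({1, 2})
y = frozenset({2, 3})
z = frozenset({1, 2, 3})

print(comparable(x, z))  # {1, 2} is contained in {1, 2, 3}: comparable
print(comparable(x, y))  # neither contains the other: incomparable
```

If plausibilities lived in such a domain, a pair of claims could end up like x and y above: neither more plausible than the other, nor equally plausible.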
This is not intrinsically a strike against posets as accurate representations of some set of available information — I acknowledge that a lack of universal comparability may indeed be quite appropriate in some settings. Rather, my intuition is that these settings are precisely the ones for which the paucity of prior information ensures that very little is actually achievable in practice.
This postulate says that the Cox-plausibility for a given Boolean expression depends on its truth table rather than the particular symbols used in the expression. Personally I have no qualms accepting this postulate, and it would surprise me if anyone found it controversial. Perhaps something interesting could be generated without this postulate, but I feel sure that any such development will not have semantics appropriate to plausibility.
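As a concrete illustration of what the postulate demands, here's a small Python sketch (the `truth_table` helper and the De Morgan example are mine, not part of the original argument): two syntactically different expressions with the same truth table must receive the same Cox-plausibility.

```python
from itertools import product

def truth_table(expr):
    """Evaluate a two-variable Boolean expression on all four assignments."""
    return tuple(expr(a, b) for a, b in product([False, True], repeat=2))

# Two different symbol strings, one truth table (De Morgan's law):
# postulate 2 says they must be assigned equal Cox-plausibility.
conj = lambda a, b: a and b
demorgan = lambda a, b: not ((not a) or (not b))

assert truth_table(conj) == truth_table(demorgan)
```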
I won’t give a full argument for the necessity of this particular form, but I will offer an example of the kind of reasoning that leads to it.
Instead of the conjunction function I gave above, suppose we consider (A ∧ B | X) = f( (A | B ∧ X), (B | A ∧ X) ).
Can this possibly serve as the functional relationship between the plausibility of a conjunction and the plausibilities of the conjuncts?
It can’t. Consider the plausibility of the claim that some person (say, the seventh person you encounter tomorrow) has blue eyes. This is a conjunction of the claim that the person has a blue left eye (LB) and the claim that the person has a blue right eye (RB). These conjuncts are not logically equivalent, but they are “nearly” so in some sense: given that one eye is blue, it is all but certain that the other is too. But in different states of prior information (perhaps about one’s geographic location, e.g., Iceland, India) the plausibility of the conjunction can vary quite freely. That is, (LB | RB ∧ X) and (RB | LB ∧ X) sit near certainty for every such X, while (LB ∧ RB | X) varies widely as X varies, and so for any sufficiently regular f, the value f( (LB | RB ∧ X), (RB | LB ∧ X) ) is essentially constant across these states of information — so this functional form isn’t flexible enough to do the job we’d want it to do.
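A numeric sketch of the argument (all numbers invented for illustration; a 0-to-1 scale is used purely for concreteness):

```python
# Under either state of prior information, knowing one eye is blue
# makes the other eye's blueness all but certain -- but the
# plausibility of the conjunction (both eyes blue) differs greatly.
info = {
    "Iceland": {"LB_given_RB": 0.999, "RB_given_LB": 0.999, "LB_and_RB": 0.75},
    "India":   {"LB_given_RB": 0.999, "RB_given_LB": 0.999, "LB_and_RB": 0.02},
}

# A function f of the two conditional plausibilities alone receives
# identical inputs in both cases...
args_iceland = (info["Iceland"]["LB_given_RB"], info["Iceland"]["RB_given_LB"])
args_india = (info["India"]["LB_given_RB"], info["India"]["RB_given_LB"])
assert args_iceland == args_india

# ...yet it would need to produce different outputs. No single f will do.
assert info["Iceland"]["LB_and_RB"] != info["India"]["LB_and_RB"]
```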
In a similar way, test cases can be defined and applied to all of the possible functional forms, of which there are a small number. Only two survive: the one I gave in postulate #3 and the one obtained from it by interchanging B and A.
Curiously, the literature did not contain a full account of such a test for quite some time; the full procedure was carried out in Whence the laws of probability? by A.J.M. Garrett.
Actually, there’s no real reason to separate negation and conjunction into separate postulates. The link immediately above is to a paper which uses NAND instead of conjunction and negation separately. It’s also possible to get to Cox’s theorem by considering conjunction and disjunction.
Cox’s original proof assumed that the conjunction function and negation relation were twice-differentiable; Jaynes offered a proof that assumed just differentiability. Mathematically speaking, these are quite strong assumptions; by using them, Cox and Jaynes left open the possibility of non-differentiable solutions.
Frankly, given my training as an engineer, I’m not too concerned that other, non-differentiable solutions might exist. I’d be happy to consider such solutions as they appear in the literature, but for me, the differentiable solution is enough to be going on with. But if you don’t share my cavalier attitude, be comforted, for you are not alone. Within the past 15 years or so, interest in the necessary assumptions for Cox’s theorem was revived by a paper of J. Y. Halpern purporting to give a counter-example to the theorem. The counter-example violated one of the regularity conditions Cox did in fact assume; Halpern acknowledged this and went on to argue that the necessary regularity conditions were nevertheless “not natural”.
This argument prompted a straight-up counterargument by K. S. Van Horn and also research into more natural regularity conditions. (The K. S. Van Horn paper is my go-to link when referring to Cox’s theorem in blog comments; if you like what you’ve read here, read that next.) More recently, Frank J. Tipler (of Omega Point infamy) and a co-author have written a paper that assumes a lot of mathematical machinery with which I am not familiar and claims to give a “trivial” proof of Cox’s theorem. And very recently, K. Knuth and J. Skilling have offered an approach they call “simple and clear” and which they claim unites and extends the approaches of Kolmogorov and Cox.
Assuming, then, that somewhere in the morass of links above there exists a set of satisfactory postulates, where does that leave us? Let me characterize Cox’s theorem in three ways, two negative and one positive.
First off, Cox’s theorem is not a straitjacket. Unlike other approaches to Bayesian foundations, we made no loaded claims of rational behavior, preference, or, arguably, belief. We can jump into and out of any joint prior probability distribution (over parameters/hypotheses and data) we care to set up, examining the consequences of each in turn.
Second, Cox’s theorem is not a guarantee of soundness. Just as in classical logic, nothing in it protects us from garbage-in, garbage-out. If we want to argue that the conclusions of a Bayesian analysis are well-warranted, we must justify the prior distributions we used in terms of the available prior information.
Finally, Cox’s theorem is a guarantee of validity. It justifies Bayes’ theorem as an objectively well-founded method for computing the plausibility of claims post-data given the plausibility of claims pre-data.
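As a minimal sketch of the pre-data-to-post-data computation this licenses (all numbers invented; the hypothesis and error rates are placeholders, not from any real analysis):

```python
# Bayes' theorem: update a pre-data plausibility to a post-data one.
prior = 0.01               # plausibility of hypothesis H before seeing data
p_data_given_h = 0.95      # plausibility of the observed data if H is true
p_data_given_not_h = 0.05  # plausibility of the observed data if H is false

# Total plausibility of the data, then the post-data plausibility of H.
evidence = prior * p_data_given_h + (1 - prior) * p_data_given_not_h
posterior = prior * p_data_given_h / evidence

print(round(posterior, 4))
```

Cox's theorem tells us this arithmetic is the uniquely valid way (up to rescaling) to carry plausibilities from the pre-data state to the post-data state; whether the inputs deserve those numbers is the soundness question from the previous point.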