(I’m frustrated with the length of this post and how much time it’s taking me to finish it, so I’m splitting it into two parts.)
I subscribe to a school of thought some call “Jaynesian” after Edwin T. Jaynes. Its foundation is a theorem of Richard T. Cox, a physicist who studied electric eels, not to be confused with the eminent statistician Sir David R. Cox. Since my first project will be to engage with Professor Mayo’s diametrically opposed views on the proper way to use (and think about the use of) statistics in science, it seems worthwhile to describe the theorem and the reasons I take it to be foundational to statistics — of the Bayesian variety, at least.
This section is somewhat disjointed; I chose conciseness and links over a well-written and complete introduction.
Cox sought to formalize the notion of the plausibility of a claim. (Following Professor Mayo, I use “claim” in preference to “assertion” or “proposition”.) The kinds of claims I’m talking about are those that admit two extremes of plausibility: these kinds of claims could either be known to be true, and thus be as plausible as it is possible to be; or they could be known to be false, and thus be as implausible as it is possible to be. In particular, I’m not talking about the kinds of claims that can still be under dispute once all the relevant facts are known, e.g., aesthetic or moral judgments.
Even in antiquity, we knew how to guarantee that only true conclusions would be inferred from true premises. But absent any premises, the rules of that method — the syllogisms of Aristotelian logic — cannot operate; to get something worthwhile out, we have to put something worthwhile in. Likewise, we need some background knowledge — oh heck, let’s just call it prior information — on which to base the assessment of the plausibility of a claim.
Boolean algebra provides rules for manipulating arbitrary combinations of conjunctions ($\wedge$), disjunctions ($\vee$), and negations ($\neg$) of claims. (Such combinations are themselves claims.) Actually, disjunction can be defined in terms of conjunction and negation, so Cox focused on just those two operations.
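As a quick sanity check of that reduction (the code and names below are my illustration, not Cox’s), a truth-table sweep confirms the De Morgan identity that defines disjunction in terms of conjunction and negation:

```python
from itertools import product

# De Morgan: A or B is logically equivalent to not((not A) and (not B)),
# so disjunction is definable from conjunction and negation alone.
demorgan_holds = all(
    (a or b) == (not ((not a) and (not b)))
    for a, b in product([True, False], repeat=2)
)
print(demorgan_holds)  # True
```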
So that’s the background. Next, I’ll give Cox’s postulates, a statement of the theorem, and a high-level overview of the proof. In the post following this one, I’ll discuss why I think the postulates are reasonable and how the resulting perspective guides my approach to statistics.
I’ll refer to quantities of the type considered by Cox as Cox-plausibilities to distinguish them from the informal notion of plausibility. (Cox, writing in 1946, used the term “likelihood”, which I will avoid for obvious reasons.) But first, I need some notation: upper-case letters will refer to claims; I’ll use X as the symbol for some generic prior information. For Cox-plausibilities I’ll use, e.g., the symbol A|X to represent the (real-valued) Cox-plausibility of claim A evaluated on prior information X.
The following five postulates set out the properties that a system of Cox-plausibilities must have. They are:
- Cox-plausibilities are real numbers.
- Consistency with Boolean algebra: if two claims are equal in Boolean algebra, then they have equal Cox-plausibility.
- There exists a conjunction function f such that for any two claims A, B, and any prior information X, $$A \wedge B \,|\, X = f\!\left(A \,|\, X,\ B \,|\, A \wedge X\right).$$
- There exists a negation relation. (Okay, it’s actually a function too.) For now I defer giving it a symbol; in words, it says that the Cox-plausibility of the negation of a claim is a function of the Cox-plausibility of the claim (all relative to the same prior information).
- The negation relation and conjunction function (and their domains) satisfy technical regularity conditions.
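To make the postulates concrete, here is a minimal sketch (the function names and the card-drawing example are mine, not Cox’s) of one familiar system with the required shape: ordinary probability, with multiplication serving as the conjunction function and x ↦ 1 − x as the negation function:

```python
# Sketch: ordinary probability supplies a model of the postulates.
# The conjunction function is f(x, y) = x * y, applied as
#   P(A and B | X) = f(P(A | X), P(B | A and X)),
# and the negation function is h(x) = 1 - x.

def f(x, y):   # conjunction function
    return x * y

def h(x):      # negation function
    return 1.0 - x

# Concrete claims about one draw from a standard 52-card deck:
#   A = "the card is a heart", B = "the card is a face card"
p_A = 13 / 52              # P(A | X)
p_B_given_A = 3 / 13       # P(B | A and X)
p_A_and_B = f(p_A, p_B_given_A)

assert abs(p_A_and_B - 3 / 52) < 1e-12   # P(heart and face card)
assert abs(h(p_A) - 39 / 52) < 1e-12     # P(not a heart)
```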
Succinctly stated, Cox’s theorem is: any system of Cox-plausibilities is isomorphic to probability.
The proof as seen from 30,000 ft
Conjunction is associative in Boolean algebra: $$(A \wedge B) \wedge C = A \wedge (B \wedge C).$$
The consistency postulate (and the regularity conditions on the domain of the conjunction function) then require that the conjunction function obey a functional equation sometimes called the Associativity Equation: $$f\!\left(f(x, y), z\right) = f\!\left(x, f(y, z)\right).$$
The Associativity Equation plus the regularity conditions imply that there must exist a strictly increasing invertible function g such that $$g(A \wedge B \,|\, X) = g(A \,|\, X)\, g(B \,|\, A \wedge X).$$
This equation is significant enough to deserve its own name: it’s the product rule. Note that for the function p, defined as $$p(\,\cdot\,) = g(\,\cdot\,)^{r}$$ for any fixed exponent $r > 0$, the product rule still holds.
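That invariance is just the identity (xy)^r = x^r y^r; a quick numeric spot check (the exponent symbol r and the particular numbers are my illustration):

```python
# If g_AB = g_A * g_B_given_A (the product rule in terms of g), then
# p = g**r satisfies p_AB = p_A * p_B_given_A for any fixed r > 0,
# because (x * y)**r == x**r * y**r.
g_A, g_B_given_A = 0.4, 0.7
g_AB = g_A * g_B_given_A

checks = [
    abs(g_AB**r - g_A**r * g_B_given_A**r) < 1e-12
    for r in (0.5, 1.0, 2.0, 3.7)
]
print(all(checks))  # True
```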
The negation relation can now be stated as: there exists a function h such that $$g(\neg A \,|\, X) = h\!\left(g(A \,|\, X)\right).$$
Together, the consistency requirement, the product rule, and the negation relation yield another functional equation: $$x\, h\!\left(\frac{h(y)}{x}\right) = y\, h\!\left(\frac{h(x)}{y}\right).$$
Invoking those regularity conditions again, the general solution of the above equation is $$h(x) = \left(1 - x^{m}\right)^{1/m}$$ for some constant $m > 0$.
Stated in terms of p (taking the exponent in the definition of p to be this same m), the above equation is: $$p(\neg A \,|\, X) = 1 - p(A \,|\, X),$$
which can be rearranged to give the sum rule: $$p(A \,|\, X) + p(\neg A \,|\, X) = 1.$$
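As a small worked example (the die-roll claims are my illustration, not part of the derivation), the sum and product rules combine via De Morgan to give the familiar rule for disjunctions:

```python
# Illustrative numbers for two claims about one roll of a fair die:
#   A = "the roll is even", B = "the roll is at most 2"
p_A = 3 / 6
p_B = 2 / 6
p_A_and_B = 1 / 6                     # only the roll "2" satisfies both
p_A_or_B = p_A + p_B - p_A_and_B      # disjunction rule via De Morgan
assert abs(p_A_or_B - 4 / 6) < 1e-12  # rolls {1, 2, 4, 6}
```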
Taken together, the sum rule and product rule are the usual starting point for expositions of probability theory, so that’s the whole thing.