A foundation for Bayesian statistics, part two: Cox’s postulates

Why these postulates?

Recall Cox’s five postulates:

  1. Cox-plausibilities are real numbers.
  2. Consistency with Boolean algebra: if two claims are equal in Boolean algebra, then they have equal Cox-plausibility.
  3. There exists a conjunction function f such that for any two claims A, B, and any prior information X,

A\wedge B|X=f\left(A|X,\, B|A\wedge X\right).

  4. There exists a negation relation (actually a function too). In slightly different notation than in the previous post, it is:

\neg A|X=h\left(A|X\right).

  5. The negation relation and conjunction function (and their domains) satisfy technical regularity conditions.
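To make postulates 3 and 4 concrete, here is a minimal Python sketch. It assumes that ordinary probability plays the role of Cox-plausibility, so that f(x, y) = x*y and h(x) = 1 - x; that is where the theorem ends up, not something the postulates themselves state. The joint table and the helper names (joint, prob, cond) are made up for illustration.

```python
from fractions import Fraction as F

# Hypothetical joint distribution over the truth values of (A, B), given X.
joint = {
    (True, True): F(3, 10),
    (True, False): F(2, 10),
    (False, True): F(1, 10),
    (False, False): F(4, 10),
}

def prob(event):
    """Plausibility of an event, given as a predicate on (a, b)."""
    return sum(p for (a, b), p in joint.items() if event(a, b))

def cond(event, given):
    """Plausibility of `event` once `given` is added to the prior information."""
    return prob(lambda a, b: event(a, b) and given(a, b)) / prob(given)

def f(x, y):
    """Assumed conjunction function: the product rule."""
    return x * y

def h(x):
    """Assumed negation function: the sum rule."""
    return 1 - x

A = lambda a, b: a
B = lambda a, b: b

conj_direct = prob(lambda a, b: a and b)   # A and B | X, read off the table
conj_via_f = f(prob(A), cond(B, A))        # f(A|X, B|A and X)
neg_direct = prob(lambda a, b: not a)      # not-A | X, read off the table
neg_via_h = h(prob(A))                     # h(A|X)

print(conj_direct, conj_via_f, neg_direct, neg_via_h)
```

Exact fractions are used so that the two routes to each plausibility can be compared with ==, without floating-point noise.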

Real numbers

Choices other than the reals are possible: one might choose a domain with less structure, such as a lattice, or larger dimensionality, such as the set of real intervals as in Dempster-Shafer theory. At present, it seems to me that choosing one of these other possibilities as the domain doesn’t buy us much in terms of usefulness for practical applications. That said, I could be swayed from this view by relevant evidence.

I doubt such evidence will be forthcoming. The reason is that the key property that distinguishes the reals from the other possible choices for the domain is that the reals are totally ordered. By making this choice, we ensure that the Cox-plausibilities of all the claims in whatever universe of discourse we’re considering will also be totally ordered. Conversely, if the domain is only a poset, then for at least one pair of claims, we won’t be able to pick the one with the larger plausibility (or state that they’re equally plausible).

This is not intrinsically a strike against posets as accurate representations of some set of available information — I acknowledge that a lack of universal comparability may indeed be quite appropriate in some settings. Rather, my intuition is that these settings are precisely the ones for which the paucity of prior information ensures that very little is actually achievable in practice.

Consistency postulate

This postulate says that the Cox-plausibility for a given Boolean expression depends on its truth table rather than the particular symbols used in the expression. Personally I have no qualms accepting this postulate, and it would surprise me if anyone found it controversial. Perhaps something interesting could be generated without this postulate, but I feel sure that any such development will not have semantics appropriate to plausibility.
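As a small illustration of “depends on its truth table rather than the particular symbols”, the following Python sketch compares expressions by enumerating their truth tables. The helper truth_table and the example expressions are my own choices for illustration.

```python
from itertools import product

def truth_table(expr, n_vars):
    """Outputs of `expr` over all 2**n_vars truth assignments, in a fixed order."""
    return tuple(bool(expr(*vals)) for vals in product([False, True], repeat=n_vars))

# Two syntactically different expressions for the same claim (De Morgan's law).
lhs = lambda a, b: not (a and b)
rhs = lambda a, b: (not a) or (not b)

# A genuinely different claim, for contrast.
other = lambda a, b: a or b

same_claim = truth_table(lhs, 2) == truth_table(rhs, 2)
different_claim = truth_table(lhs, 2) != truth_table(other, 2)

print(same_claim, different_claim)
```

Under the consistency postulate, lhs and rhs must receive the same Cox-plausibility in every state of prior information, while nothing forces lhs and other to agree.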

Conjunction function

I won’t give a full argument for the necessity of this particular form, but I will offer an example of the kind of reasoning that leads to it.

Instead of the conjunction function I gave above, suppose we consider

A\wedge B|X=f_?\left(A|B\wedge X,\, B|A\wedge X\right).

Can this possibly serve as the functional relationship between the plausibility of a conjunction and the plausibility of the conjuncts?

It can’t. Consider the plausibility of the claim that some person (say, the seventh person you encounter tomorrow) has blue eyes. This is a conjunction of the claim that the person has a blue left eye (L_B) and the claim that the person has a blue right eye (R_B). These conjuncts are not logically equivalent, but they are “nearly” so in some sense. Yet in different states of prior information (perhaps about one’s geographic location, e.g., Iceland or India) the plausibility of the conjunction can vary quite freely. That is,

R_{B}|L_{B}\wedge Iceland\approx R_{B}|L_{B}\wedge India,

L_{B}|R_{B}\wedge Iceland\approx L_{B}|R_{B}\wedge India,

and so for any sufficiently regular f_?,

f_{?}\left(L_{B}|R_{B}\wedge Iceland,\, R_{B}|L_{B}\wedge Iceland\right)\approx f_{?}\left(L_{B}|R_{B}\wedge India,\, R_{B}|L_{B}\wedge India\right).


But

L_{B}\wedge R_{B}|Iceland\ne L_{B}\wedge R_{B}|India,

so this functional form isn’t flexible enough to do the job we’d want it to do.
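A toy numerical version of the blue-eyes argument, as a Python sketch. The head counts are invented purely for illustration (they are not real eye-colour data), and I again assume probabilities stand in for plausibilities. The counts are rigged so that the two conditional plausibilities are exactly equal in both places, which is a stronger version of the approximate equality above.

```python
from fractions import Fraction as F

def plausibilities(both, left_only, right_only, neither):
    """Plausibilities computed from hypothetical head counts."""
    total = both + left_only + right_only + neither
    return {
        "LB and RB": F(both, total),
        "LB": F(both + left_only, total),
        "RB | LB": F(both, both + left_only),
        "LB | RB": F(both, both + right_only),
    }

# Invented counts: blue eyes common in one place, rare in the other.
iceland = plausibilities(both=800, left_only=1, right_only=1, neither=198)
india = plausibilities(both=800, left_only=1, right_only=1, neither=39198)

# The arguments of the candidate f_? coincide...
same_inputs = (iceland["RB | LB"] == india["RB | LB"]
               and iceland["LB | RB"] == india["LB | RB"])
# ...but the conjunction it is supposed to produce does not.
same_output = iceland["LB and RB"] == india["LB and RB"]

# The form in postulate 3 also takes LB|X as an argument, which does differ.
recovered = {name: d["LB"] * d["RB | LB"] for name, d in
             {"Iceland": iceland, "India": india}.items()}

print(same_inputs, same_output, recovered)
```

Any function fed identical arguments must return identical values, so no choice of f_? can reproduce both conjunction plausibilities here. The form in postulate 3 escapes because its first argument carries the dependence on location.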

In a similar way, test cases can be defined and applied to all of the possible functional forms, of which there are a small number. Only two survive: the one I gave in postulate #3 and the one obtained from it by interchanging A and B.

Curiously, the literature did not contain a full account of such a test for quite some time; the full procedure was carried out in “Whence the laws of probability?” by A.J.M. Garrett.

Negation relation

Actually, there’s no real reason to split negation and conjunction into separate postulates. Garrett’s paper, cited just above, uses NAND instead of conjunction and negation separately. It’s also possible to get to Cox’s theorem by considering conjunction and disjunction.
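The Boolean fact that makes a single NAND postulate sufficient is that NAND alone can express negation, conjunction, and disjunction. Here is a short Python check of that fact by brute force over truth values; it only illustrates the Boolean identities and makes no attempt to reproduce the derivation in any paper.

```python
from itertools import product

def nand(a, b):
    """NAND: false only when both inputs are true."""
    return not (a and b)

def neg(a):
    """not-A written as A NAND A."""
    return nand(a, a)

def conj(a, b):
    """A and B written as the negation of (A NAND B)."""
    return neg(nand(a, b))

def disj(a, b):
    """A or B written as (not-A) NAND (not-B)."""
    return nand(neg(a), neg(b))

all_match = all(
    neg(a) == (not a) and conj(a, b) == (a and b) and disj(a, b) == (a or b)
    for a, b in product([False, True], repeat=2)
)

print(all_match)
```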

Regularity conditions

Cox’s original proof assumed that the conjunction function and negation relation were twice-differentiable; Jaynes offered a proof that assumed just differentiability. Mathematically speaking, these are quite strong assumptions; by using them, Cox and Jaynes left open the possibility of non-differentiable solutions.

Frankly, given my training as an engineer, I’m not too concerned that other, non-differentiable solutions might exist. I’d be happy to consider such solutions as they appear in the literature, but for me, the differentiable solution is enough to be going on with. But if you don’t share my cavalier attitude, be comforted, for you are not alone. Within the past 15 years or so, interest in the necessary assumptions for Cox’s theorem was revived by a paper of J. Y. Halpern purporting to give a counter-example to the theorem. The counter-example violated one of the regularity conditions Cox did in fact assume; Halpern acknowledged this and went on to argue that the necessary regularity conditions were nevertheless “not natural”.

This argument prompted a straight-up counterargument by K. S. Van Horn and also research into more natural regularity conditions. (The K. S. Van Horn paper is my go-to link when referring to Cox’s theorem in blog comments; if you like what you’ve read here, read that next.) More recently, Frank J. Tipler (of Omega Point infamy) and a co-author have written a paper that assumes a lot of mathematical machinery with which I am not familiar and claims to give a “trivial” proof of Cox’s theorem. And very recently, K. Knuth and J. Skilling have offered an approach they call “simple and clear” and which they claim unites and extends the approaches of Kolmogorov and Cox.

Summing up

Assuming, then, that somewhere in the morass of links above there exists a set of satisfactory postulates, where does that leave us? Let me characterize Cox’s theorem in three ways, two negative and one positive.

First off, Cox’s theorem is not a straitjacket. Unlike other approaches to Bayesian foundations, we made no loaded claims of rational behavior, preference, or, arguably, belief. We can jump into and out of any or all joint (over parameters/hypotheses and data) prior probability distributions we care to set up, examining the consequences of each in turn.

Second, Cox’s theorem is not a guarantee of soundness. Just as in classical logic, nothing in it protects us from garbage-in-garbage-out. If we want to argue that the conclusions of a Bayesian analysis are well-warranted, we must justify the prior distributions we used in terms of the available prior information.

Finally, Cox’s theorem is a guarantee of validity. It justifies Bayes’ theorem as an objectively well-founded method for computing the plausibility of claims post-data given the plausibility of claims pre-data.
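To spell out the sense of “validity” here: once the conjunction function is the product rule, the Boolean equality of A and B with B and A gives Bayes’ theorem, and the post-data plausibilities are fixed by the pre-data ones. A minimal Python sketch, with hypothetical hypotheses H1, H2 and made-up numbers:

```python
from fractions import Fraction as F

def posterior(prior, likelihood):
    """Bayes' theorem over mutually exclusive, exhaustive hypotheses.

    prior[h]      : plausibility of h before the data
    likelihood[h] : plausibility of the observed data given h
    """
    joint = {h: prior[h] * likelihood[h] for h in prior}  # product rule
    evidence = sum(joint.values())                        # plausibility of the data
    return {h: joint[h] / evidence for h in joint}

# Made-up inputs; justifying them is the job of the analyst, not the theorem.
prior = {"H1": F(1, 2), "H2": F(1, 2)}
likelihood = {"H1": F(9, 10), "H2": F(3, 10)}

post = posterior(prior, likelihood)
print(post)
```

The function guarantees only that the output follows from the inputs. If the prior or the likelihood is garbage, so is the posterior, which is the soundness caveat above.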

  1. Christian Hennig said:

    Nice, thanks.
    I observe that there are different kinds of arguments used. Some are about what people think plausibility “really is” and how it should be handled, some others are of a pragmatic nature: “We need real numbers/differentiability, because this allows us to use this setup in a certain convenient way/make the kind of statement we’d like to make/prove a nice theorem.”
    I have no problems with this as long as it’s transparent, but I think that such an approach has very limited power to convince people who are sceptical, because whatever results you get in a practical situation, it always leaves open the objection “things could have been done in a quite different way, violating a convenience assumption here and there, and then results could have been very different”. It is basically a subjective choice if you are willing to make these assumptions or not (not denying that there are certain reasons in their favour). Fine by me, but I think that they are often sold to people as being something stronger.
    Another problem with this approach is that if you arrive at probabilities implicitly deducing them theoretically from such assumptions, this helps less with interpreting them and makes them much more abstract quantities. For example, Cox’ theorem still leaves open certain monotonic transformations of probabilities, which for example means that the intuitive interpretation of a probability of 60% as “a bit more than half of the cake”, which in some sense is granted by both relative frequency and betting rate interpretations of probability, does not work here.

    • I hope I have not made arguments about what people in general think plausibility really is; I meant to describe only what *I* desire out of a formalization of plausibility. I agree that the assumptions that go into the proof are often sold to people as being something stronger; I tried to be clear about the fact that technical assumptions of some kind are necessary, if not the specific ones used to arrive at probability. That said, if someone wants to argue that things could have been done in a quite different way, it really falls to them to actually *do* things in a different way and show what the consequences are. I promise to listen carefully to anyone who makes a constructive argument, e.g., Halpern, Mayo*. I also promise to pay attention to those who claim that the consequences of doing things *this* way are undesirable, e.g., Robins, Owhadi.

      * For instance (putting aside the fact that Mayo isn’t aiming at plausibility), the error-statistical approach views the warrant for claims as test-relative, thereby denying Cox’s first postulate.

    • I just want to add a comment about, “For example, Cox’ theorem still leaves open certain monotonic transformations of probabilities, which for example means that the intuitive interpretation of a probability of 60% as “a bit more than half of the cake”, which in some sense is granted by both relative frequency and betting rate interpretations of probability, does not work here.”

      The law of large numbers still works here. On the frequency interpretation, the meaning of that theorem is a bit of a puzzle: didn’t we already define probability as frequency? Or if we didn’t — maybe we even claim that the law of large numbers *justifies* the frequency interpretation! — then what is the interpretation of the probability law of any single, one-off random variable? In the Jaynesian approach, the derivation of probability as Cox-plausibility answers the latter question.

      • Christian Hennig said:

        To me the law of large numbers from a frequentist point of view is not a puzzle, but I accept that many explain it in such a way that it becomes one. The thing is that you need two different kinds of repetitions; one for the definition of probability (let’s call it defr) and one that is a probability model already, as used in the LOLN. The interpretation of the LOLN is then through potentially infinite repetition (defr) of the modelled probability sequence that *models* infinite repetition in probability terms.
        So no, the LOLN doesn’t *justify* the frequentist interpretation, but it makes sense, at least if you feel able to juggle three different kinds of infinite sequences (defr of a single observation, modelled, and defr of the model).

      • I see what you’re saying; I never thought of it that way. I never needed to — I encountered Jaynes before I gained whatever understanding I can claim of probability theory’s measure theoretical underpinnings.

  2. Entsophy said:

    Van Horn’s paper is the best. This work all seems to suffer from the same problem though. They’re trying to make the proof as simple as possible. That’s a big mistake if you’re trying to use the theorem to convince someone about Bayes.

    If convincing doubters is the goal, then it’s better to have a set of axioms where it’s clear what price in practice is paid by not accepting Cox’s conclusion. If you had that, the proof could be 100 pages long and it wouldn’t matter. Without that, the proof could be two lines and Frequentists will just shrug it off.

    • I think the idea is to make the assumptions as unobjectionable as possible, not to make the proof as simple as possible. Admittedly, the Tipler paper does tout a “trivial” proof by assuming a bunch of mathy things; I seem to recall that an earlier draft of that paper on the arXiv made more of the (purportedly) compelling nature of the assumptions. (I’d have to go digging through Wikipedia and textbooks and maybe MathExchange/MathOverflow to get enough background to actually see if the Tipler approach is compelling, and I can’t be arsed.)

  3. As I read the argument on real numbers, it seems to be that if you do not assume a total order going in then you do not get a total order going out, so one cannot use (Bayesian) probability alone to solve problems. But you might use something else, such as Keynes’ confidence or attitude to risk.

    As an example, in the UK we used to feed pigs to pigs. There was thought to be a low risk (i.e., Bayesian probability of serious harm). But we perhaps ought to have taken a more precautionary approach, noting that there was little evidence to support the estimated risk.

    Similarly, whenever a medic talks probabilities, I like to look at the evidence base. Is this irrational?

    • “As I read the argument on real numbers, it seems to be that if you do not assume a total order going in then you do not get a total order going out, so one cannot use (Bayesian) probability alone to solve problems.”

      I don’t think the latter clause follows from the former, but each clause is correct on its own.

      One cannot use Bayesian probability alone to solve problems in the same sense that one cannot use the rules of logical deduction alone to solve problems. In logic one needs premises upon which to perform inference; in Bayesian probability one needs prior information and data.

      • Taking what you say at face value, could we agree that in Ellsberg’s examples one lacks enough ‘prior information and data’, in which case we might agree that Cox’s assumptions are often debatable? (I would say ‘not met’ or ‘beg the question’.)

      • I don’t know if I’d go that far. Ellsberg’s examples demonstrate what people actually choose rather than what an ideal reasoner with unlimited computational resources (and divorced from all utility concerns) would expect.

        An ideal reasoner would need to consider human psychology and game-theoretical concerns (e.g., issues like how did the question come to be posed and what does the person posing the question hope to gain), so the hypothesis space is huge. Humans run into problems with bounded capacity for introspection and bounded computational resources in these situations; Cox’s postulates are applicable in principle but not in practice, as it were.

        Maybe this seems like pointless hairsplitting to you…

  4. Pedro Terán said:

    I’m glad I’ve discovered this blog. However, I strongly disagree with you on this topic (I am also interested in your reading of Mayo, but I haven’t gone through it yet).

    It looks like what you want is a justification for using Bayesian reasoning in practice. Regarding that modest aim (far more modest than Jaynes’), the argument seems to be a non sequitur.

    If using a plausibility measure that satisfies Cox’s postulates is important to you, then everybody already knew probability did so. Cox’s argument is not necessary for that, and it leaves you with the extra task of defending Cox’s postulates.

    What exactly is the ‘validity’ you vindicate from invoking Cox’s argument? It leaves things at a point in which several alternatives to probability are still on the table. For example, unlike in the Dutch book argument, if my plausibility of A is .9 and my plausibility of not-A is .9 too, that’s perfectly fine. According to the argument, if I do so then it is ‘valid’ as well to assign them any other values that are still equal. Not only is this clearly wrong (I doubt there actually exists an approach to uncertainty in which an assignment of .999 and .999 has the same meaning as an assignment of .001 and .001) but it offers no reason to prefer the Bayesian solution .5 and .5 to any other.

    Two sensible things about the Cox argument are:
    a) It offers a non-frequentist explanation of the formula P(A|B)=P(AB)/P(B).
    b) If you accept Cox’s postulates, then you might not mind using probability instead. (But has there ever been anybody who accepted Cox’s postulates and rejected probability at the same time?)

    • Welcome Pedro! I too am glad you’ve discovered the blog. Unfortunately it’s on a bit of a hiatus while I pursue a time-intensive business opportunity.

      I’m not sure what you mean by “extra”. Anyone who applies Bayesian reasoning ought to both defend the prior probability assessments in specific analyses and also defend the approach in general. This is my defense of the approach in general.

      The kind of validity I have in mind is an extension of the validity of deductive arguments, i.e., a posterior plausibility assessment is a logical consequence of the corresponding prior plausibility assessment. This is essentially identical to your point (a).

      As to what picks out the Bayesian solution of 0.5 — or say rather, what picks out the probability scale from the set of all possible isomorphically-related plausibility scales — the answer is that in a collection of identically distributed Bernoulli random variables, under a certain condition (exchangeability, which is weaker than independence) the law of large numbers still works and makes the expected frequency equal to 0.5, and not 0.999 or 0.001, even if you’re using a plausibility scale in which one or the other of those values is the numerical plausibility of both A and not-A. It can be proven that under exchangeability, the probability scale has numerical equality with expected frequency in general. We use the probability scale because of this nice property — that’s the whole story.

      • Pedro Terán said:

        Thank you for your reply. I hope the business opportunity will work out fine.

        OK, I am aware of the exchangeable LLN. Although you have a point there, people who assign plausibility .999 to both A and not-A would be shocked if the asymptotic frequencies of A and not-A in a sequence of experiments turned out to be .999 and .999. Thus they are unlikely to buy that argument.

        That is my general problem with the Cox argument. It succeeds in making one feel justified if one already believes the conclusions to be correct, since it is about reaching the same conclusions from weaker premises than one previously did. But it is not very forceful to people who accept neither the premises nor the conclusions of the argument (the conclusions may depend on who states them: “It is acceptable to be a Bayesian” or “It’s unacceptable not to be a Bayesian”).

        As regards the postulates themselves, I feel we should distinguish between postulating them as reasonable for *some* problems or for *all* problems.

        Since frequencies satisfy them, there can be little discussion that they are reasonable and sensible in some problems. But usually what people (e.g. Jaynes) want is to establish that Bayesianism is ‘valid’ for all problems and non-Bayesianism is ‘invalid’ for all problems. In that context, I personally don’t find any of the postulates harmless, and I find (2), (3), (4), and (5) unacceptable, or extremely implausible, as universal features of sound reasoning.

        I’ll try to read the discussion of Mayo tomorrow.

      • I’m curious — can you give examples of scenarios where postulates (2)-(5) would lead you astray? I’m particularly interested in problems with (2).

      • I posted a comment yesterday but something may have gone wrong as it didn’t either appear or prompt an ‘awaiting moderation’ warning.

        Traditionally, pro-Bayesian arguments are conditional on the reader accepting classical logic as common ground already recognized as being ‘true’. Assumption (2) is the same thing as assuming the Lindenbaum-Tarski algebra at hand to be a Boolean algebra.

        If you want Cox’s theorem to say ‘OK, rest assured that what you’re doing is reasonable’, then that’s no problem if you never get to feel there is a need for using objects which do not obey Boolean algebra rules. But if, à la Jaynes, you want Cox’s theorem to say ‘OK, rest assured that what everybody else in the world is doing is wrong’, then the argument becomes a non-starter.

        For example, intuitionists deny that ‘A or not-A’ is a tautology, wherefore they can only be expected to concoct plausibility assignments where Plaus(A or not-A)=Plaus(Truth) is violated frequently. Cox’s theorem remains silent about such assignments, in particular it does not prove that they should be isomorphic to probabilities.

        Another example is fuzzy logics, or plausible reasoning about fuzzy sets, since they do not form a Boolean algebra.

        I have been revisiting Van Horn’s paper today. He violates intuitionistic logic as early as his usage of the double negation rule in Proposition 2. Even earlier, his requirement R2(3) is incompatible with most fuzzy logics, since ‘B and B’ is not equivalent to B in them (here I am assuming Van Horn’s state of information B,B,X is the same thing as state of information B,X).

        It may be possible to try and use Cox’s theorem to rule out the p-value as a ‘competitor’ of the posterior probability. But it is a totally different game to rule out other uncertainty theories in the market, which are at odds with the common logical assumptions of both Bayesianism and frequentism.

        I hope this clarifies (2). Time permitting, I’ll remark on some of the other assumptions.

      • Only new commenters’ comments are held for moderation; once I approve such a comment, the commenter is no longer considered “new” and all subsequent comments should appear immediately.

        I don’t think you’ve written anything wrong, exactly, but it does seem off-base somehow. To me, intuitionistic and fuzzy logics don’t seem to be “about” truth per se. Intuitionistic logics seem to be about constructive provability, and fuzzy logics seem to be about modeling inherently vague concepts via continuous degrees of set membership.

        I deny that Jaynes says everyone else is doing it wrong. He doesn’t say, “Never use intuitionistic logic and fuzzy logic ever.” He does say one shouldn’t use them to model truth and/or plausibility, and that seems eminently reasonable to me — they are the wrong tools for that job (as far as I can see; I invite correction on this point).

        From this perspective, the propositions ‘A’ and ‘not-A’ in intuitionistic logic translate to “A is constructively provable” and “not-A is constructively provable” in colloquial English. These can’t both be true in a consistent formal system, but their negations can be.

        As to fuzzy logic, this link goes into more depth about the “right tool for the job” idea; and even Lotfi Zadeh thinks that probability theory and fuzzy logic are complementary rather than competitive.

        (Even though I’m disagreeing with you, I’m enjoying our conversation! Looking forward to more.)

    • I don’t think we disagree. I think we see the same pieces but assemble them in a different hierarchical order and look for a different global sense.

      Concerning the underlying role of logic, I have a more abstract point of view. To me, the original motivation of intuitionistic logic or why it has a place under the sun in human culture, and whether it ‘is about’ truth or not, is not essential here. The relevant thing is that, after identifying all propositions Jaynes’ robot considers equivalent, the resultant algebraic structure might possibly be a Heyting algebra instead of a Boolean algebra. If the robot examined its knowledge base and extracted the underlying ‘laws’ not broken by what it considers true knowledge, they would be the same as for an intuitionist, even for completely different reasons or for no reason at all. But, reason or no reason, we certainly would want its methods of plausible reasoning to be consistent with the internal logic of its knowledge base.

      If you accept that said logic can be Boolean logic, then you already concede that it could as well be a weaker logic, simply because it is a weaker assumption. If one claims to have the unique general, universally valid way of plausible reasoning, which nonetheless depends critically on the underlying logic being Boolean logic, something should be said about it.

      Jaynes had two things to say. First, that he disregards non-classical logics because he is content to follow a venerable tradition. This is an acceptable position but it is of a pragmatic, not foundational, nature; and it begs the question whether the foundational role of Cox’s theorem might actually fail for them. He also has the heuristic impression that ‘logic can only move forward’, which is a serious misunderstanding of how research in mathematized areas works.

      Second, out of the book he has taken a more explicit stance against them. For instance (p.268 in this paper: http://bayes.wustl.edu/etj/articles/backward.look.pdf ),
      “Fuzzy Sets are –quite obviously … crude approximations to Bayesian prior probabilities. They were created only because their practitioners … concluded that probability theory is not applicable to such problems. As soon as one recognizes probability as the general way of specifying incomplete information, the reason to introduce Fuzzy Sets disappears … Likewise, much of Artificial Intelligence (AI) is … usable in some restricted kind of problems. In Bayesian inference … without any limitation to a restricted class of problems.” (Jaynes’ emphasis; the AI part is unrelated to my point but supports my former ‘universally valid’ bit.)

      The argument is: Under some assumptions, plausible reasoning must proceed by the rules of probability. Fuzzy sets do not satisfy those assumptions. Therefore, there never was a need of inventing fuzzy sets (instead of: Therefore, I cannot say anything about fuzzy sets.)

      There are two ways in which fuzzy sets might get involved with Cox’s theorem, and I think it is important not to conflate them. One is as a representation of uncertainty, and the other as part of the logical machinery that powers the reasoning.

      It seems unavoidable that there is a persistent confusion between the actual aims of fuzzy sets and fuzzy logic, as forms of uncertainty/knowledge representation, and those of Jaynes’ ‘probability as logic’ project. Since fuzzy sets are subjective, uncertainty-tinged, and presented as a many-valued logic, many Bayesians –fond of Jaynes’ ‘probability as logic’ or otherwise– have persistently misread that they are meant for the same purpose they want probability for. Work on fuzzy sets is persistently mistaken as being motivated by a rejection of probability to which probability defenders should react (see e.g. expressions in paper titles like Cheeseman’s ‘In defense of probability’ or O’Hagan’s ‘Probability is perfect’).

      That is pure nonsense. If fuzzy sets are crude approximations to prior probabilities, why are there tens of thousands of papers on fuzzy sets yet nobody has ever bothered to define an update method to obtain ‘posterior’ fuzzy sets? And why are there hundreds of thousands of papers on probability yet nobody has ever bothered to define what the union and the intersection of two prior probabilities are?

      Thus fuzzy sets are not plausibility assessments in the sense of Cox’s argument, but one may certainly want to reason about pieces of knowledge that are formalized as fuzzy sets instead of ordinary sets. For instance, will tomorrow afternoon be cloudy? I don’t know, but I have to decide whether I’ll go to the beach. The weather is extremely variable this time of the year, the breeze is still strong and cold, and the beach is 55 minutes from my place. If it is non-cloudy, being there will be pleasurable. If it’s cloudy, I’ll waste 110 minutes of my life plus all the time I will be there hidden behind a rock, cursing my decision-making patterns.

      Set C=’Cloudy’ and let’s say my plausibility assignment is Pl(C)=.7, Pl(not-C)=.8. It is not rescalable to a probability via Cox’s theorem because I also assign Pl(C and not-C)=.6, which is inconsistent with Assumption (2). Still, it is a probability measure without any need of rescaling. My assignment violates no Kolmogorov axiom: since I don’t consider C and not-C to be disjoint events, they need not sum up to 1. I’ll even assign Pl(C or not-C)=.9, for modularity to be preserved (.6+.9=.7+.8).

      Your way of looking at this is to say that the propositions ‘C will happen’ and its negation are contradictory, so I cannot be speaking about the truth of whether event C will happen or not. And (if I understood you correctly), since I am not reasoning about truth, by using probability or ‘plausible reasoning’ I am invading a territory where fuzzy sets are not the right tool.

      It is true that, when C and not-C are looked at from the standpoint of fuzzy set theory, ‘not-C will happen’ does not match the meaning of the classically negated proposition ‘C will not happen’. In this sense I can do little to convince you not to interpret it as a meaning shift instead of a meaning extension.

      My way of looking at it is that I’m doing the same thing you do, in that you couldn’t detect anything wrong in Pl(C)=.7, Pl(D)=.8, Pl(C and D)=.6, Pl(C or D)=.9 for two non-disjoint events C,D. The only problem is that I label D with the name ‘not-C’, but you yourself imply that my label ‘not-‘ has a different meaning from your label ‘(true) not-‘, so why should that be a problem at all. My label ‘not-‘ means (again my abstract approach) that I commit to certain restrictions in choosing the plausibilities of some compound events involving ‘not-‘ tags. In all cases, your usage of the label ‘not-‘ makes you commit to the same restrictions (and others), which is why I see no undue trespassing at all.

      This is the sense in which Zadeh preaches that probability and fuzzy logic are complementary, rather than competitive. In problems in which fuzzy logic is a more appropriate modelling tool than binary logic, probability should be equipped to work with fuzzy sets, fuzzy events, or fuzzy propositions, instead of only binary sets, binary events, and binary propositions (see e.g. http://link.springer.com/chapter/10.1007%2F978-3-540-44465-7_1 ).

      As I mentioned before, there are two levels of discussion. Zadeh’s claim is a dual one. ‘Not competitive’ means that fuzzy sets are not a tool for doing the same thing probability does. But ‘complementary’ does not mean a situation of ‘separate magisteria’. It means that probability is built on classical logic but there is little intrinsic reason why it should be so.

      I hope this extremely winding reply clarifies something. If not, sorry for the verbosity 🙂

      • Oddly, I did have to approve the above comment… I guess I just don’t understand WordPress’s moderation system.

        I think you hit the nail on the head when you wrote that we don’t disagree, we just see the same pieces but assemble them in a different hierarchical order and looking for a different global sense.

        I don’t stand with Jaynes and others when they dismiss fuzzy logic; to me they seem to be laboring under a misapprehension. That said, whenever a fuzzy set membership function is considered relevant to a problem, the domain of the membership function is the fundamental set underlying the whole thing. It seems appropriate to go ahead and define a (the?) relevant sigma-algebra on the domain and then put a probability measure on that sigma-algebra — which is just what you’re saying in your cloudy-day-at-the-beach example, I think.

        I’m not familiar enough with the jargon to put this very precisely, but it seems to me that a similar thing happens for intuitionistic logic. You say the robot might find a Heyting algebra of propositions — I say that for any such set of propositions, the set of intuitionistically permitted “truth” assignments is mutually exclusive and exhaustive, and the robot properly puts the probability distribution on that fundamental underlying set.

      • I don’t want to step out of your interest zone, so I’ll try not to add new ramifications to the discussion.

        I think keeping both aspects separate as you suggest is a legitimate strategy. Although it seems to defeat the purpose of using fuzzy sets, i.e. having more expressive power than ordinary sets allow, there can be no problem with that if you don’t feel the need to have extra expressive power.

        In the cloudiness example, the probability measure is defined on a family of fuzzy sets which is not a Boolean algebra. I call those fuzzy sets events, but I guess you might want to call them non-events instead.

        One question is how you would communicate with somebody in that mindset. For instance, if I told you that yesterday afternoon the (non-)event not-C, having a certain membership function, happened, how would you update your prior probability that the cloud cover percentage was under 20%?

        In any case, don’t stop what you’re doing to think of an answer to that. I pose it more as a rhetorical question.

        The insistence on truth seems to me a lot like calling non-Euclidean geometries ‘not really true geometries’ because they are not about ‘true’ points and lines. Take for instance elliptic geometry.

        Just as you claim intuitionistic reasoning about a proposition P to be really classical reasoning about the proposition ‘P is provable’, in the geometry analogy you might say elliptic geometry is ‘really’ about relationships between objects on the surface of a sphere. You might say the Euclidean sphere is ‘true’ and the elliptic space is ‘not true’, and that elliptic geometry is not about points and lines in the elliptic space, but about pairs of diametrically opposed points and maximal circles on the sphere. In line with your comment on Assumption (2), you might say that elliptic geometry doesn’t have semantics appropriate to be a geometry, since it is truly about circles, not lines.

      • I’m comfortable saying that elliptic geometry is really ‘about’ relationships in models that satisfy the axioms of the theory in question. One may prefer a convention that geometry considers theories that satisfy the parallel postulate; I personally prefer the convention that geometry considers theories that satisfy the first four of Euclid’s Axioms, whence ‘elliptic geometry’ is a sub-theory of ‘geometry’.

        Postulate 2 says that ‘plausible inference’ is about ‘propositions’ under the convention that ‘propositions’ satisfy the Law of the Excluded Middle. In an old paper of Dubois and Prade you linked on one of your blogs, de Finetti is quoted as writing,

        ‘Propositions are assigned two values, true or false, and no other, not because there “exists” an a priori truth called “excluded middle law”, but because we call “propositions” logical entities built in such a way that only a yes/no answer is possible… A logic, similar to the usual one, but leaving room for three or more [truth] values, cannot aim but at compressing several ordinary propositions into a single many-valued logical entity, which may well turn out to be very useful…’

        Insofar as the extra expressive power is just lossless compression, I guess you’re right — I don’t feel that I need it.

      • There’s a misquotation of De Finetti in a Dubois-Prade paper, from retranslating into English a translation error in the French edition. It might potentially be that fragment (or not; I mean it just as a warning).

        There is another bit from De Finetti, quoted in p.2 of
        according to which he considers his approach to probability as a many-valued logic. That might shed light on what he meant by saying that many-valued logic might be useful.

        It is interesting that De Finetti was aware of many-valued logic, which had been initiated only in the 1920s by Lukasiewicz and his students in Poland.

        I’m not sure if you imply that quoting De Finetti suggests a lack of posterior or serious arguments for many-valued logic (since you jump from De Finetti’s statement to the conclusion that you don’t need it). If so, quoting De Finetti is obviously a rhetorical device directed at Bayesians.

        Let me recap my position so far:
        1. Using Bayesian reasoning is OK.
        2. Not caring about logic except for Boolean logic is OK.
        3. The claim that probability is the general, universal, only valid way to reason about uncertainty does not follow from Cox’s argument.
        4. Proponents of Cox’s argument are happy with its premises, even if many things other people do are not covered by them. Those other people can’t be blamed if they find Cox’s argument for Bayesianism normatively irrelevant to them.
        5. I would be happy to see someone extend Cox’s argument to the weakest logical assumptions possible. (There is ongoing research to extend the Dutch book argument since 2006, and an early 1994 paper by Jeff Paris.)

        I don’t think we can or want to say more about Assumption (2).

        One reason why I give Assumption (2) what you probably think is a disproportionate importance is that I don’t read the assumptions in arguments for Bayesianism as having been chosen innocently. In many cases, e.g. Cox’s argument, or the Dutch book argument, or the decision-theoretic approach in the Bernardo-Smith book, it is possible to start from weaker assumptions and still carry the argument through; only, solutions other than the traditional probability formulas appear.

        Since weaker assumptions are easier to accept, it would look like the most convincing variants are those in which probability doesn’t turn out to be the unique way to go. But those variants are not the ones used in the mainstream Bayesian literature: only those with more restrictive assumptions and ‘the desired conclusion’ are presented.

        From the outside, when you see that phenomenon repeated across arguments independently defended by different people, it takes some self-restraint not to suspect that the specific sets of assumptions may have been chosen because they lead to ‘the desired conclusion’ and then introduced as self-evident or common-sensical, instead of the other way round.

        That is sad, and detracts from the credibility of the whole Bayesian project. After all, it is far from evident that Bayesian probabilities should satisfy the same formulas as frequencies. Variants which don’t lead only to those formulas can be interpreted as counterexamples of a sort against those that do. So, whenever the assumptions in the arguments are explained by handwaving, and there’s in my opinion a lot of handwaving in e.g. Jaynes, I fear the worst.

        (Just to clarify what I mean by handwaving: “Since the propositions now being considered are of the Aristotelian logical type which must be either true or false, the logical product [A and not-A] is always false, the logical sum [A or not-A] always true. The plausibility that A is false must depend in some way on the plausibility that it is true.” Jaynes, p.30.)

      • I agree with your 5 points. But there is one more thing that I want to say about postulate 2: when I actually went back and read what I wrote about it in the original post, I realized that your comments and my responses don’t seem to be focused on the postulate per se, but rather on the fact that the whole program being undertaken here is to extend classical logic, which basically requires limiting the universe of discourse to Boolean variables in the first place. Postulate 2 takes that as a given and just says that, e.g., X = ‘(A and B) or C’ has the same plausibility as Y = ‘(A or C) and (B or C)’ as a consequence of the fact that X and Y have the same truth table.
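
The truth-table claim for X and Y can be checked mechanically. The following is a small sketch in Python; the helper name `same_truth_table` is invented for illustration. It enumerates all eight assignments to A, B, C and compares the two formulas.

```python
from itertools import product

def x(a, b, c):
    """X = (A and B) or C."""
    return (a and b) or c

def y(a, b, c):
    """Y = (A or C) and (B or C)."""
    return (a or c) and (b or c)

def same_truth_table(f, g, n_vars):
    """Return True if f and g agree on every Boolean assignment to n_vars variables."""
    assignments = product([False, True], repeat=n_vars)
    return all(f(*values) == g(*values) for values in assignments)

print(same_truth_table(x, y, 3))
```

Since X and Y agree on every assignment, they are equal in Boolean algebra, and Postulate 2 then requires them to receive the same plausibility.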

        I took the de Finetti quote as pointing out that one many-valued proposition is information-theoretically equivalent to a set of true/false propositions. Just as one programming language can be better suited to some tasks than another, a many-valued logic can be useful; nevertheless, just as one universal Turing machine can simulate a different UTM, nothing fundamentally inexpressible in terms of Boolean logic is actually going on in many-valued logics. (Edited to add: oh hey, apparently what I understood to be the idea behind the quote goes by the name Suszko’s Thesis.)

        It’s pretty clear that a certain amount of ‘targeting’ of the conclusion is going on, and Jaynes’s hand-waving in particular has come in for a lot of criticism from mathematicians and philosophers. But the fact that all of these very different approaches (one not on your list is Abraham Wald’s Complete Class Theorem) converge at the same place is reassuring to me. The sets of assumptions might be strong, but they are sufficiently different from each other to convey a certain robustness, to my thinking at least.

        I too am interested in what is possible when Cox’s assumptions are weakened. In fact, a fellow reader of Professor Mayo’s blog, Alexandre Galvão Patriota, asked me to have a look at a draft of a paper on the topic (the paper has since been submitted). If I recall correctly, he showed that possibility theory can be obtained that way. (I found that actually giving useful feedback was a pretty big project and I had other pressing commitments at the time, alas.)

      • (Btw, sorry for the grammatical inaccuracies. I’ve been writing late in the night. Also, for the record, Paris’s paper is from 2001, not 1994. That’s another Paris paper on Cox’s theorem.)

        If one day somebody is crazy enough to go through all this exchange, they may appreciate some references on ‘more general’ variants, or at least some names to google. On Cox’s theorem there’s Romas Aleliunas’s work in 1986-90, which seems to have been distributed mainly through AI conferences and philosophical journals. On the Dutch book argument, see Peter Walley’s 1991 book on imprecise probabilities. On Bayesian decision theory, there’s plenty of material since David Schmeidler’s 1989 paper. There have been a lot of Econometrica papers in the last ten or fifteen years by Massimo Marinacci and co-workers, among other people. Don’t trust my dates though 😉

        I knew Wald presented an argument, but I never got around to looking it up. From what little I know, it is based on procedure admissibility, which is what deterred me, since Lindley’s admissibility argument does not do the job (Goodman-Nguyen-Rogers 1991). But maybe the arguments are otherwise different.

        Of course, Assumption (2) is reasonable within the project of extending classical logic to probability, which is itself a reasonable project. The problem is when people follow e.g. Jaynes’ dictum “It is clear that, not only is the quantitative use of the rules of probability theory as extended logic the only sound way to conduct inference; it is the failure to follow those rules strictly that has for many years been leading to unnecessary errors, paradoxes, and controversies” (my emphasis).

        I don’t think Suszko’s argument is applicable to our situation. I think it is a defensive argument (in the sense of not being directed at showing that what I deny is right) and therefore unnecessary, since I’m not denying anything of what you do. Moreover, it’s the geometry thing again: only one model is ‘right’ and the others are ‘unnecessary’ because they could be rewritten in the language of the first.

        I find this kind of argument dubious because it depends on already accepting the exceptionality of the ‘favourite’ model. Since every Boolean algebra is a model of every fuzzy logic, I could use the exact same argument to claim Boolean logic is unnecessary: I can speak about Boolean objects without committing to Boolean logic. The question is: I can, so what?

        I know a 2013 paper by Patriota in Fuzzy Sets and Systems. He defines a ‘measure of evidence’ for a hypothesis which is a possibility measure. Thus it’s attractive to try and extend Cox’s theorem to cover possibility.

        In my understanding (I haven’t checked the details), possibility measures are already allowed by Aleliunas’s version of Cox’s theorem.
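
For readers unfamiliar with possibility measures, here is a minimal sketch of how they differ from probability, using the standard definition in which the possibility of an event is the maximum of a possibility distribution over its outcomes. The three-outcome distribution `pi` and the function names are invented for illustration and do not come from Patriota’s or Aleliunas’s work.

```python
def possibility(event, pi):
    """Possibility of an event: the largest possibility among its outcomes."""
    return max((pi[outcome] for outcome in event), default=0.0)

def necessity(event, pi):
    """Necessity of an event: one minus the possibility of its complement."""
    complement = set(pi) - set(event)
    return 1.0 - possibility(complement, pi)

# Hypothetical possibility distribution over sky conditions (max value is 1).
pi = {"clear": 1.0, "partly": 0.7, "overcast": 0.2}
a = {"clear"}
b = {"partly", "overcast"}

# Maxitivity replaces additivity: Poss(A or B) = max(Poss(A), Poss(B)).
print(possibility(a | b, pi) == max(possibility(a, pi), possibility(b, pi)))
# An event and its complement can both be highly possible.
print(possibility(a, pi) + possibility(b, pi))
```

Here the possibilities of an event and its complement add up to more than 1, so the measure is not additive. That is why a version of Cox’s theorem that admits possibility measures has to relax something in the usual postulates.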
