# How “Severity, part one” got Mayo wrong (and a preview of “Two Severities”)

In the comments to my first post on severity, Professor Mayo noted some apparent and some actual misstatements of her views. To avert misunderstandings, she directed readers to two of her articles, one of which opens by making this distinction:

“Error statistics refers to a standpoint regarding both (1) a general philosophy of science and the roles probability plays in inductive inference, and (2) a cluster of statistical tools, their interpretation, and their justification.”

In Mayo’s writings I see two interrelated notions of severity corresponding to the two items listed in the quote: (1) an informal severity notion that Mayo uses when discussing philosophy of science and specific scientific investigations, and (2) Mayo’s formalization of severity at the data analysis level.

One of my besetting flaws is a tendency to take a narrow conceptual focus to the detriment of the wider context. In the case of Severity, part one, I think I ended up making claims about severity that were wrong. I was narrowly focused on severity in sense (2) (in fact, on one specific equation within (2)) but used a mish-mash of ideas and terminology drawn from all of my readings of Mayo’s work. When read through a philosophy-of-science lens, the result is a distorted and misstated version of severity in sense (1).

As a philosopher of science, I’m a rank amateur; I’m not equipped to add anything to the conversation about severity as a philosophy of science. My topic is statistics, not philosophy, and so I want to warn readers against interpreting Severity, part one as a description of Mayo’s philosophy of science; it’s more of a wordy introduction to the formal definition of severity in sense (2).


Awaiting your simple toy model with interest, as I often struggle to follow Mayo’s examples on her blog and am hoping this one will shed some light on what SEV actually entails.

– A side note

I find myself regularly asking questions like “what is the probability that there is also signal in my data and not just noise” as well as “how well does a Poisson distribution with a measured rate fit my data.” Both of these seem to be at least reasonable questions. Or does the supposed lack of philosophical consistency between these questions indicate I am somehow doing science wrong?

I don’t see any lack of philosophical consistency. Care to explain?

The former involves computing Bayes factors for the signal+noise hypothesis vs the noise-only hypothesis given a background measurement and a foreground search. The latter entails (at least for me) using the Pearson ChiSq test or the Kolmogorov-Smirnov test to check how well the data matches a hypothesized likelihood function.

As I find the techniques for both eminently useful in understanding the character of my results, I am not going to stop. But the “how dare you pick and choose statistical methods irrespective of the philosophical underpinnings” stance seems to be real and not a product of my imagination. I see value in discussing foundations in this subject, but the nature of the discourse usually leads me to conclude it was a waste of time.
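As a rough illustration of the Pearson chi-squared check mentioned above (not West's actual procedure, which isn't described here), the sketch below compares binned event counts against a Poisson distribution whose rate is the sample mean. The function names, the binning choice `max_bin`, and the data are all made up for illustration, and the degrees-of-freedom count is the usual textbook approximation.

```python
import math
from collections import Counter

def poisson_pmf(k, rate):
    """Probability of exactly k events for a Poisson distribution with the given rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)

def pearson_chi2_poisson(counts, max_bin):
    """Pearson chi-squared statistic for `counts` against a fitted Poisson.

    Bins are 0, 1, ..., max_bin - 1, plus a final "max_bin or more" bin.
    Assumes max_bin >= 2 and a positive sample mean.
    Returns (statistic, degrees_of_freedom, fitted_rate).
    """
    n = len(counts)
    rate = sum(counts) / n  # "measured rate": the sample mean
    observed = Counter(min(c, max_bin) for c in counts)

    stat = 0.0
    tail = 1.0  # probability mass not yet assigned to a bin
    for k in range(max_bin):
        p = poisson_pmf(k, rate)
        tail -= p
        expected = n * p
        stat += (observed.get(k, 0) - expected) ** 2 / expected

    # Final bin collects everything at max_bin and above.
    expected_tail = n * tail
    stat += (observed.get(max_bin, 0) - expected_tail) ** 2 / expected_tail

    # (number of bins) - 1, minus one more for the estimated rate.
    dof = (max_bin + 1) - 1 - 1
    return stat, dof, rate

# Hypothetical counts per observation interval (invented data).
counts = [0, 1, 2, 1, 0, 3, 1, 2, 0, 1, 4, 2, 1, 0, 1, 2, 3, 1, 0, 2]
stat, dof, rate = pearson_chi2_poisson(counts, max_bin=3)
print(f"rate={rate:.2f}, chi2={stat:.3f}, dof={dof}")
```

The statistic would then be compared against a chi-squared distribution with `dof` degrees of freedom. With only 20 observations the expected counts per bin are small, so a real analysis would want more data or coarser bins.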

If you’re satisfied with the output of a Chi-squared or K-S test, I’m not going to gainsay you — you know best what fits your requirements.

That said, the whole point of doing such a test is that on the basis of your prior information, you can’t rule out the possibility that your data isn’t Poisson. From a Bayesian point of view, that indicates that your prior information is best represented by a prior probability measure on the space of pmfs on the naturals; such things fall into the category of “Bayesian nonparametrics”. There are Bayesian nonparametric pmf estimation approaches that will answer the question “how well does a Poisson distribution with a measured rate fit my data” in a fully Bayesian way — although once you have your posterior measure, it can answer substantive questions beyond just the question of Poisson-ness too, so perhaps the whole issue of Poisson-ness would fall by the wayside.

I’ll never say “how dare you”, but I will say “this is how I would do it if I had unbounded resources and no time constraints”.

The Chi-squared test has a perfectly good Bayesian justification (although the explanation for what’s happening and why is different) which is why Jaynes had no problem using it.

It’s true that there’s a sensible Bayesian interpretation of the Chi-squared test. I forgot about that!

(The questions such tests answer are usually not the substantive questions of interest, though. What I wrote about how having the Bayesian nonparametric posterior measure in hand likely makes such questions superfluous still goes.)

West: I think your first question is captured along the following (related) lines: How well does “noise alone” accord with the data (e.g., as in Higgs Boson experiments: discerning if background alone could readily explain observed bumps), or you might be asking how much of it could be accounted for by sources of “noise”. The probability actually refers to the method’s giving rise to thus and so observed pattern–that is, it’s a probability given by the sampling distribution.

Of course it could be that you’re asking about the relative frequency with which noise generates such and such data, in which case, the prior probability is frequentist. In either case, the question is well captured by frequentist methods.

I know certain people on this blog may wish to deny I could know what I’m talking about regarding these essentially philosophical and logical issues (which is why I rarely drop by), but that should not deter you from getting to the bottom of things. Good luck.

Mayo: The question I find myself working on these days is “what is the likelihood of obtaining N coincident pairs from spatially separate instruments given estimates of the single detector event rates and the width of the coincidence window.” Quite a bit of frequentist thinking, courtesy of my advisor, went into designing the hierarchical model that answers this question, despite my own Bayesian proclivities. Since real detector data rarely has the fortuitous properties of being Poisson distributed in time or chi-squared in power, it was necessary to develop tests to check my modeling assumptions (as mentioned in my first comment). It also helps to check that the algorithm is working by feeding it data where I know the answer a priori, while assuming as little as possible about its properties.

I guess the wall of text above was an attempt to provide some sort of explanation of my heterodox statistical habits.

At this point, my hanging about blogs that discuss the topics of inference, probability and the like, amounts to an effort to improve my communication skills and understand different perspectives. Not sure yet whether commenting has been an overall positive or negative.
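For readers unfamiliar with the coincidence question raised above, here is a deliberately simplified sketch. It is not the hierarchical model West describes. It assumes the two detectors produce independent, constant-rate Poisson event streams, that `window` is the full width of the coincidence window, and that each rate times the window is much less than one, so the number of accidental pairs is approximately Poisson. All names and numbers are hypothetical.

```python
import math

def expected_accidentals(rate_a, rate_b, window, live_time):
    """Expected number of chance coincident pairs.

    rate_a, rate_b: single-detector event rates (events per second).
    window: full width of the coincidence window (seconds).
    live_time: total observation time (seconds).
    Assumes independent, stationary Poisson event streams.
    """
    return rate_a * rate_b * window * live_time

def poisson_pmf(n, mean):
    """Probability of exactly n events given a Poisson mean."""
    return math.exp(-mean) * mean**n / math.factorial(n)

def prob_at_least(n, mean):
    """Probability of n or more events given a Poisson mean."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(n))

# Hypothetical numbers: 0.1 Hz and 0.2 Hz singles rates,
# a 10 ms window, and 1000 s of observation.
mu = expected_accidentals(0.1, 0.2, 0.01, 1000.0)
print(f"expected accidental pairs: {mu:.3f}")
print(f"P(exactly 2 pairs by chance): {poisson_pmf(2, mu):.4f}")
print(f"P(2 or more pairs by chance): {prob_at_least(2, mu):.4f}")
```

The tail probability from `prob_at_least` is the kind of "could background alone explain this?" number discussed in this thread. Real detector data, as noted above, rarely satisfies the Poisson assumption, which is why the modeling checks matter.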

West: Thanks for your response. Who is your advisor? I find the statistical inferences in high energy physics very illuminating. Even though I’m a complete outsider, I hope to learn a lot more through Robert Cousins in the next year.

“Likelihood” alludes to fixed data, and so I think your first use should be probability.

I too wonder whether commenting on blogs, or having a blog is positive or negative (errorstatistics.com).

Mayo,

You have my personal word that I will never respond to your comments or criticize anything you write again in any venue. So please educate the world about the error of their Bayesian ways free of worry.

Regards,

Joseph

Corey: Apologies for jacking the thread.

Not at all! One of the reasons I started the blog was to have conversations with people who care about these things.

(And I’ve jacked a good few comment threads in my time; I’d not really be in a position to complain even if I minded.)

Trying to steer this back towards SEV

Many physics experiments tackle the twin problems of detection and parameter estimation of quiet signals buried in background. Sometimes it involves counting rare particle decays, as in the case of the Higgs. Other times it involves detecting anomalous fluctuations in a noisy time series.

How would one using this methodology (SEV) distinguish signal from noise or, in the case of a single hypothesis, quantify departure from the expectation from noise alone? Does one get parameter estimation for free by solving the detection problem, or are additional tests needed? What data products are needed before making any inferences? Is it possible to do prediction in this paradigm, and not just post-experiment analysis?

While no doubt commenters can answer these questions for other inference paradigms, I am interested in how someone practicing error statistics would tackle these issues. I promise these questions stem from curiosity, not a desire to antagonize.

My next post will give the simplest SEV analysis I can imagine; I’m not going to attempt an answer to this question until that’s done. Then we’ll be able to try to make correspondences between the simple analysis and the signal detection problem.

Hey Corey, it looks like you’re on track to beat 2013’s record of 5 posts/year. Wahoo!