At the most basic level, I want to measure things. The parameter of a distribution labels, and hence represents, that distribution. To me, a sample of n = 1 is a bad measurement of a distribution.

Yes, I can set up and solve a decision problem for an n = 1 case, but that involves a whole lot of other factors. For whatever reason, in real-life decisions I often seem to use a minimax approach rather than betting on the most probable case.

PS: Looking for good discussions, I found this piece by Efron, which is actually a pretty fascinating read (I don’t think I’d read it before):

https://projecteuclid.org/download/pdf_1/euclid.aos/1176345778

(‘Maximum likelihood and decision theory’)

Are we interested in

– Properties of collectives of estimates, OR

– Properties of estimates of collectives

?

E.g. in this case we are asked to make an estimate (or decision or whatever) based on a single sample. You could think of this as a decision problem where the goal is to make decisions in a series of n=1 cases. You want a good decision rule to follow. So this framing is about a collective of decisions (i.e. a decision rule!) and its properties.

On the other hand, another natural way to think of statistics is as estimating the characteristics (= parameters) of collectives (= distributions or datasets). In this case any estimate of a population based on n = 1 is likely to be bad, no? The key question is then: can I estimate the characteristics of a population with some n > 1 but finite, such that I have some guarantee of having a good estimate?

The answer is, presumably, yes for some characteristics and given some further assumptions. In the present case we could define a sequence of estimates Tn, one for each n – e.g. the maximum likelihood estimate. Now I don’t care about having a good estimate for n = 1; I just care about having a good estimate for some finite n > 1. E.g., that the estimate ‘stabilises’ for n > N0, for some N0, so that I don’t have to collect more (or infinite amounts of!) data. The law of large numbers and the central limit theorem (and more sophisticated limit theorems) help us out here.
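A minimal sketch of this ‘stabilisation’ idea (a hypothetical example, not from the post): for N(μ, 1) data the maximum likelihood estimate of μ is just the sample mean, and we can watch it settle down as n grows.

```python
# Hypothetical illustration: the MLE of a normal mean stabilises as n grows,
# as the law of large numbers / central limit theorem suggest it should.
import random
import statistics

random.seed(42)
true_mu = 1.0  # invented 'true' parameter for the simulation

def mle_mean(sample):
    """For N(mu, 1) data the MLE of mu is the sample mean."""
    return statistics.fmean(sample)

data = [random.gauss(true_mu, 1.0) for _ in range(10_000)]

for n in (1, 10, 100, 1_000, 10_000):
    est = mle_mean(data[:n])
    print(f"n = {n:>6}: T_n = {est:+.3f}, error = {abs(est - true_mu):.3f}")
```

The n = 1 estimate can be wildly off; by the time n is in the thousands the estimate has ‘stabilised’ in exactly the sense above.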

Now, whether these problems amount to something like the same thing (as Fisher-Neyman unification advocates would argue) I don’t know. But I do know that there is a difference between a sequence of M separate n = 1 estimates and a single estimate based on M measurements. The following classic bad joke illustrates this:

> Three statisticians go hunting. When they see a rabbit, the first one shoots, missing it on the left. The second one shoots and misses it on the right. The third one shouts: “We’ve hit it!”

Regardless, as mentioned to Corey, you could set up an essentially equivalent example without an explicit gap and instead with a discontinuous pdf. The math would work the same but perhaps the intuition is different.

In this case there is no physical reason for the gap, and so the distance between two data points on either side of it can be made arbitrarily small, d(x1, x2) → 0, while L(theta; x1) != L(theta; x2). That is: you claim different evidence based on an arbitrarily small difference in the data.

This seems more like an argument against using a likelihood-based approach when you allow discontinuous densities than an argument against it in the case where the gap arose from some clear physical censoring.

An obvious (but somewhat hacky) solution in the discontinuous case is to define the likelihood as a smoothed version of the discontinuous pdf and take the limit as the smoothing bandwidth goes to zero. Or to average the density over a small neighbourhood of the data, which amounts to the same sort of thing. But this is just a consequence of densities being somewhat ‘ill-defined’ objects, requiring either regularity conditions or regularisation methods.
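A toy version of the neighbourhood-averaging hack, using a made-up discontinuous density (not the model under discussion): two data points straddling the jump get very different raw likelihoods no matter how close together they are, but their neighbourhood-averaged likelihoods nearly agree.

```python
# Hypothetical discontinuous density on [0, 1] (invented for illustration):
#   f(x; theta) = theta      for x in [0, 0.5)
#               = 2 - theta  for x in [0.5, 1]
def f(x, theta):
    return theta if x < 0.5 else 2.0 - theta

def smoothed_likelihood(x, theta, h, grid=1000):
    """Average f(.; theta) over [x - h, x + h] via a simple Riemann sum."""
    total = 0.0
    for i in range(grid):
        t = (x - h) + (2 * h) * (i + 0.5) / grid
        total += f(t, theta)
    return total / grid

theta = 1.5
eps = 1e-6
x1, x2 = 0.5 - eps, 0.5 + eps

# Raw likelihoods disagree sharply across the jump, however small eps is:
print(f(x1, theta), f(x2, theta))  # 1.5 vs 0.5

# ...but the neighbourhood-averaged likelihoods nearly agree for h >> eps:
h = 0.01
print(smoothed_likelihood(x1, theta, h), smoothed_likelihood(x2, theta, h))
```

So the ‘evidence’ no longer flips discontinuously on an arbitrarily small change in the data, at the cost of introducing a bandwidth h.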

On the other hand, this hack is possibly not available for the ‘physical gap’ case. It doesn’t make as much sense to smooth over the large physical gap. So I’m back to wondering whether there should be explicit treatment of the censoring or not?

I do address this point in the post. It’s in the middle of the wall of text; you’ll find it with a Ctrl-F for “intuition”.

I want to be clear that I’m **not** saying severity is Wrong, capital W. The point is more to write down a simple model, as close as possible to the sort of exponential family model that always has default-Bayes/frequentist agreement, in which the “correct” Neyman-Pearson procedure is inescapable and in which the contrast between the kinds of inferences the two formal methods license is very stark.

But then I’d say your intuition that this is somehow wrong is no less questionable than the test/severity result, because indeed something as big as x = -1 is very rarely observed under μ ≤ 0, and much more often under, for example, μ = 1. So there is a case for μ > 0 here, isn’t there?

I’m generally wary of arguments like “such-and-such a result looks weird to my intuition, so it must be bad”. (Apparently you wrote an earlier post in which you asked people to vote on what they thought the answer should be – but aren’t we doing formal statistics partly because human intuitions about probabilities shouldn’t be trusted?)

The stuff Mayo writes about “discrepancy from the null” is mainly to address “howlers” about how tests treat all alternatives (and composite nulls) the same in the face of data. She uses SEV in the subjunctive mode to show what different tests would have licensed had they been performed (and then sliiiiiides on over to saying the resulting inferences are licensed even though the tests were not pre-specified). But fine, that’s severity the informal concept, not SEV the function.

Okay, let’s get our Neyman-Pearson on.

H0: μ ≤ 0

Ha: μ > 0.

Type I error rate α = 0.05

The Karlin-Rubin theorem (which applies here because the family has a monotone likelihood ratio in X) says the UMP test is the threshold test that rejects H0 when X > x_threshold. So we need x_threshold such that

Fr(X ≤ x_threshold ; μ = 0) = 1 – α,

and if the observed value of x exceeds x_threshold then we reject H0: μ ≤ 0.

That threshold value is… wait for it… wait for it…

x_threshold = -1.028

Yes, really.
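To see how a gap can produce a threshold like that, here is a sketch of the recipe with a hypothetical ‘gapped normal’ – a N(μ, 1) variable whose support excludes a fixed interval, with the removed mass renormalised away. The gap endpoints below are invented for illustration and do not reproduce the post’s exact -1.028, but they show the same effect: the CDF is flat across the gap, so inverting Fr(X ≤ x; μ = 0) = 1 – α can land the threshold below the gap’s lower edge.

```python
# Hypothetical 'gapped normal' (NOT the post's model): N(mu, 1) with the
# interval (GAP_LO, GAP_HI) removed from the support and renormalised.
from statistics import NormalDist

GAP_LO, GAP_HI = -1.0, 2.4  # invented gap, chosen so the effect is visible
ALPHA = 0.05

def gapped_cdf(x, mu):
    z = NormalDist(mu, 1.0)
    gap_mass = z.cdf(GAP_HI) - z.cdf(GAP_LO)
    kept = 1.0 - gap_mass  # mass remaining outside the gap
    if x < GAP_LO:
        return z.cdf(x) / kept
    if x < GAP_HI:
        return z.cdf(GAP_LO) / kept  # CDF is flat across the gap
    return (z.cdf(x) - gap_mass) / kept

def threshold(alpha, mu=0.0, lo=-10.0, hi=10.0):
    """Bisect for the smallest x with gapped_cdf(x; mu) >= 1 - alpha."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if gapped_cdf(mid, mu) >= 1 - alpha:
            hi = mid
        else:
            lo = mid
    return hi

print(f"x_threshold = {threshold(ALPHA):.3f}")  # lands just below GAP_LO
```

With these invented gap endpoints, 95% of the μ = 0 mass already sits below the gap, so the rejection threshold comes out just below -1 rather than at the no-gap value of 1.645.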

What severity you compute depends on the result of the test. If you reject the H0, you have evidence against it, and can ask, on top of this, whether you can also rule out values close to the H0 but not part of it. E.g., if you reject μ ≤ 0, you may wonder whether there's severe evidence even against μ ≤ c, c > 0, or equivalently, in favour of μ > c. If you don't reject H0, as here, you can only ask whether you have severe evidence against part of the alternative, i.e., against μ > c or equivalently in favour of μ ≤ c, c > 0, but you cannot have severe evidence against the H0, as you'd claim when claiming μ > 0.
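For concreteness, the standard textbook severity formulas for the plain N(μ, 1), n = 1 case (ignoring the gap in the post’s model) can be sketched as follows. After a non-rejection at x_obs, the severity of the claim μ ≤ c is P(X > x_obs; μ = c); after a rejection, the severity of μ > c is P(X ≤ x_obs; μ = c).

```python
# Textbook SEV formulas for the no-gap N(mu, 1), n = 1 case.
from statistics import NormalDist

def sev_upper(c, x_obs):
    """Severity of 'mu <= c' after a non-rejection."""
    return 1.0 - NormalDist(c, 1.0).cdf(x_obs)

def sev_lower(c, x_obs):
    """Severity of 'mu > c' after a rejection."""
    return NormalDist(c, 1.0).cdf(x_obs)

x_obs = -1.0  # the observed value discussed above
for c in (0.0, 0.5, 1.0):
    print(f"SEV(mu <= {c}) = {sev_upper(c, x_obs):.3f}")
```

Note the asymmetry described above: after a non-rejection you can only probe claims of the form μ ≤ c for c > 0, never claim severe evidence against H0 itself.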
