**by Greg Mayer**

*The New York Times* has a data analysis division which they call The Upshot; I think they created it to compensate for the loss of Nate Silver’s *538*, which was once hosted by the *Times*. The Upshot reporters and analysts tend to be policy wonks with some statistical savvy, so I took note of a big story they had on page 1 of Sunday’s (2 January) paper on why many prenatal tests “are usually wrong.”

The upshot, if you will, of the story is that many prenatal tests for rare chromosomal disorders unnecessarily alarm prospective parents because, even if the test result is positive, it is unlikely that the fetus actually has the disease. This is because, when a disease is rare, most positives are false positives, even when the test is quite accurate. For the five syndromes analyzed by the *Times*, the proportion of positive results that were false (i.e. “wrong” results) ranged from 80% to 93%!

The *Times* does not go into detail about how they got those figures, but from links in their footnotes, I think they are empirical estimates, based on studies which did more conclusive follow-up testing of individuals who tested positive. My first thought, when looking at Sunday’s paper itself (which of course doesn’t have links!), was that they had used Bayes’ Theorem, the manufacturers’ stated *sensitivity* and *specificity* for their tests (the two components of a test’s accuracy), and the known *prevalence* of the condition to calculate the proportion of positive results that are false.

Bayes’ Theorem is an important result in probability theory, first derived by the Rev. Thomas Bayes and published posthumously in 1763. There is controversy over the school of statistical inference known as Bayesian statistics, which centers on how one can form a “prior probability distribution”. In this case, however, we have an empirically derived prior: the prevalence, which can be thought of as the probability that an individual drawn at random from the population in which the prevalence is known (or well estimated) has the condition. There is thus no controversy over applying Bayes’ Theorem to disease diagnosis when the prevalence of the condition is known, as in the cases at hand.

Here’s how it works. (Remember, though, that I think the *Times* used empirical estimates of the rate, not this type of calculation.)

Using Bayes’ Theorem, we can say that the probability of having a disease (D) given a positive test result (+) depends on the sensitivity of the test (= the probability of a positive result given an individual has the disease, P(+∣D)), the specificity of the test (= the probability of a negative result given an individual does not have the disease, P(-∣ not D)), and the prevalence of the disease (= the probability that a random individual has the disease, P(D)). Formally,

P(D∣+) = P(+∣D)⋅P(D)/P(+)

where the terms are as defined above, and P(+) is the probability of a random individual testing positive. This is given by the sensitivity times the prevalence plus (1 - the specificity) times (1 - the prevalence), or

P(+) = P(+∣D)⋅P(D) + P(**+**∣ not D)⋅(1-P(D))

The whole thing in words can be put as

probability you are ill given a positive test =

sensitivity⋅prevalence/[sensitivity⋅prevalence + (**1-**specificity)⋅(1-prevalence)]

Let’s suppose we have a sensitive test, say P(+∣D)=.95, which is also quite specific, say P(-∣ not D)=.95 (sensitivity and specificity need not be equal; this is only a hypothetical), and a low prevalence, say P(D)=.01. Then

probability you are ill given a positive test =

= (.95)(.01)/[(.95)(.01)+(.05)(.99)]

= .16.

Thus, if you had a positive test, 84% of the time it would be “wrong”! This is right in the neighborhood of the rates found by the *Times* for the five conditions they examined. Notice that in this example, both sensitivity and specificity are high (which is good: you want both of these to be near the maximum of 1.0 if possible), but because prevalence is low (.01), the test is still usually “wrong”.
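For readers who would like to try other numbers, the calculation above can be sketched in a few lines of Python. This is only an illustration of the formula: the function name is my own invention, and apart from the .01 used above, the prevalences in the loop are arbitrary values chosen to show how strongly the answer depends on prevalence. None of these are figures from the *Times*.

```python
def prob_ill_given_positive(sensitivity, specificity, prevalence):
    """P(D | +) from Bayes' Theorem (hypothetical helper, not from the article)."""
    true_pos = sensitivity * prevalence                # P(+ | D) * P(D)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(+ | not D) * P(not D)
    return true_pos / (true_pos + false_pos)


# The hypothetical test from the text: sensitivity .95, specificity .95, prevalence .01
p = prob_ill_given_positive(0.95, 0.95, 0.01)
print(f"P(D | +) = {p:.2f}")  # 0.16, so about 84% of positives are "wrong"

# Same test, different (made-up) prevalences
for prevalence in (0.001, 0.01, 0.1, 0.5):
    p = prob_ill_given_positive(0.95, 0.95, prevalence)
    print(f"prevalence {prevalence}: P(D | +) = {p:.3f}")
```

With the test itself held fixed, the probability of being ill given a positive result climbs from under 2% at a prevalence of .001 to 95% at a prevalence of .5.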

In an earlier discussion of Bayes’ Theorem, Jerry noted:

> This [tests for rare conditions being usually wrong] is a common and counterintuitive result that could be of practical use to those of you who get a positive test. Such tests almost always mandate re-testing!

He’s absolutely right. A test with these properties is useful for screening, but not for diagnosis; you’d usually want to get a more definitive test before making any irreversible medical decisions. (For COVID-19, for example, PCR tests are more definitive than the quicker antigen tests.) The *Times* also discusses some of the unsavory aspects of the marketing of these tests, and the tragedy of the truly life-and-death decisions that can ensue, all of which flow from the results of the tests being misunderstood.
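To see why a second test helps, here is a rough Python sketch in which the probability after the first positive result becomes the “prevalence” going into the second test. It rests on assumptions of my own that may well fail in practice: the follow-up test is taken to have the same hypothetical sensitivity and specificity (.95 each) as the first, and its errors are taken to be independent of the first test’s errors. A real follow-up would normally be a different, more definitive test, and repeating the same test may simply repeat the same error.

```python
def update_on_positive(prior, sensitivity, specificity):
    """Probability of disease after one positive result (hypothetical helper)."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)


prevalence = 0.01  # the hypothetical prevalence used in the example above

# First positive result: start from the population prevalence
after_first = update_on_positive(prevalence, 0.95, 0.95)

# Second positive result: start from where the first test left us
# (assumes the two tests err independently, which is optimistic)
after_second = update_on_positive(after_first, 0.95, 0.95)

print(f"after one positive:  {after_first:.2f}")   # 0.16
print(f"after two positives: {after_second:.2f}")  # 0.78
```

Under these assumptions, a second positive result moves the probability of disease from about .16 to about .78: a big improvement, but still short of certainty.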

(Note: an alert reader spotted a mistake in the verbal equation, and in checking on it I spotted another in one of the symbolic equations. Both have now been corrected, and the corrections are shown in **bold** above. The numerical result was not affected, as I’d used the correct numbers for the calculation, even though my verbal expression of them was wrong!)

For a nice but brief discussion, with some mathematical details, of the application of Bayes’ Theorem to diagnosis, see sections 1.1-1.3 of Richard M. Royall’s *Statistical Evidence: A Likelihood Paradigm* (Chapman & Hall, London, 1996). Royall is not a Bayesian, which demonstrates the uncontroversial nature of the application of Bayes’ Theorem to diagnosis.