AIDS vaccine and statistical significance
October 20, 2009 at 1:01 PM by Dr. Drang
This morning came word that the results of the AIDS vaccine trial, reported as a modest success last month, were, on further analysis, not statistically significant. I’ve been waiting a couple of weeks for this shoe to drop, as even the originally reported results were pretty weak. I’m not claiming any expertise in vaccine research, but I do know how to do elementary statistics.
According to the LA Times article, the switch from statistically significant to not statistically significant came when seven of the test subjects were reclassified. That’s seven subjects out of over 16,000. How can such a small change flip the results from success to failure? Because the original results were on the edge of statistical significance to begin with, as anyone with a little statistics background could have calculated.
Here are the results as they were first given:
Observed | Vaccine | No vaccine |
---|---|---|
HIV positive | 51 | 74 |
HIV negative | 8,146 | 8,124 |
This is what’s known as a 2×2 contingency table. There are two treatments, vaccine and no vaccine, and two outcomes, HIV positive and HIV negative. The basic test of statistical significance poses the following question:
Assume that the vaccine has no effect, that there would have been 125 HIV-positive test subjects even if all 16,395 subjects had been untreated. This “pretend there’s no effect” assumption is called the null hypothesis. If one had then arbitrarily split the test population into two groups, 8,197 in one group and 8,198 in the other, what is the probability that the distribution of HIV-positive subjects would be at least as skewed as the actual results turned out to be?
By “at least as skewed” I mean a distribution of the HIV-positive subjects between the vaccinated and unvaccinated that’s at least as far from an equal distribution as the test results were. Since the test results were 51-74, distributions like 50-75, 49-76, 48-77, etc., all the way to 0-125, would all fall into the “at least as skewed” category. (So, by the way, would results skewed the opposite way: 74-51, 75-50, and so on. Results like this would suggest the vaccine has a negative effect, but that’s also a violation of the null hypothesis.)
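Under the null hypothesis, with the group sizes and the total number of infections fixed, the number of HIV-positive subjects landing in the vaccine group follows a hypergeometric distribution, so the probability in question can in principle be computed exactly. Here’s a minimal sketch of that idea, assuming the `hygecdf` function from the Octave-Forge statistics package is available; the chi-squared procedure developed below is the standard approximation to this exact calculation.

```
% Exact calculation under the null hypothesis: the 125 infections are
% scattered at random between groups of 8,197 and 8,198 subjects.
% hygecdf(x, t, m, n) gives P(at most x marked items in a sample of
% size n drawn without replacement from a population of t items, m of
% which are marked). Assumes the statistics package is installed.
p_low = hygecdf(51, 16395, 125, 8197)  % P(51 or fewer in the vaccine group)
p_two = 2 * p_low                      % crude two-sided probability
```

Doubling the one-sided probability is only a rough way of getting a two-sided value, but it lands in the same neighborhood as the chi-squared answer calculated below.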
If the calculated probability is sufficiently low—in other words, if it’s quite unlikely that the test results were due to chance alone—we say that the results are statistically significant. It’s fairly common to use 5%, one chance in twenty, as the upper limit for statistical significance, but lower values are sometimes used. In fact, it’s often thought to be good practice to report that probability, commonly called the p-value, instead of just saying whether the results are significant or not.
Which leaves us with the problem of calculating that probability. Fortunately, the 2×2 contingency table is a very well-studied problem, and the procedure is straightforward. First, we rewrite the table of test results (often called the observed values) including the row and column sums.
Observed | Vaccine | No vaccine | Sum |
---|---|---|---|
HIV positive | 51 | 74 | 125 |
HIV negative | 8,146 | 8,124 | 16,270 |
Sum | 8,197 | 8,198 | 16,395 |
If the null hypothesis is true, then the probability of any subject in the test group becoming HIV positive is

$$p = \frac{125}{16{,}395} = 0.0076243$$

or about three-quarters of one percent. We then make a similar table filled, not with the test results, but with the expected values based on the above probability.
Expected | Vaccine | No vaccine | Sum |
---|---|---|---|
HIV positive | 62.496 | 62.504 | 125 |
HIV negative | 8134.504 | 8135.496 | 16,270 |
Sum | 8,197 | 8,198 | 16,395 |
These are the expected results under the null hypothesis. Note that the row and column sums remain the same.
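Equivalently, each expected value is its row sum times its column sum, divided by the grand total. For the vaccinated, HIV-positive cell, for example,

$$\frac{125 \times 8{,}197}{16{,}395} = 62.496$$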
Now that we have the expected values, we calculate the deviation of the observed values from the expected values, which is just a subtraction. We’ll leave out the row and column sums this time, because they don’t deviate.
Deviation | Vaccine | No vaccine |
---|---|---|
HIV positive | -11.496 | 11.496 |
HIV negative | 11.496 | -11.496 |
These deviations are usually called errors, even though no mistakes have been made. Statisticians spend a lot of time studying the properties of errors. For a 2×2 contingency table with a sufficiently high count in each table cell (a common rule of thumb is an expected count of at least five per cell), statisticians have shown that if we square the errors,
Square error | Vaccine | No vaccine |
---|---|---|
HIV positive | 132.16 | 132.16 |
HIV negative | 132.16 | 132.16 |
standardize the square errors by dividing by the expected values,
Standardized | Vaccine | No vaccine |
---|---|---|
HIV positive | 2.11473 | 2.11447 |
HIV negative | 0.01625 | 0.01625 |
and add all the values in the table,
we get a number known as the chi-squared statistic, so called because, under the null hypothesis, the sum of the standardized square errors is a random variable with a chi-squared distribution.
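In symbols, with O the observed count and E the expected count in each cell, the statistic is

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 2.11473 + 2.11447 + 0.01625 + 0.01625 = 4.2617$$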
The chi-squared distribution is actually a family of distributions, parameterized by the number of degrees of freedom. For a 2×2 contingency table, the number of degrees of freedom is 1. Why is it called the degrees of freedom and why is it 1 in this case? Consider our 2×2 contingency table with none of the observed values filled in, but with the row and column sums fixed.
Observed | Vaccine | No vaccine | Sum |
---|---|---|---|
HIV positive | | | 125 |
HIV negative | | | 16,270 |
Sum | 8,197 | 8,198 | 16,395 |
If I were to give you just one of the missing values, you’d be able to fill in the rest of the table through subtraction from the row and column sums. In a sense, then, I am free to fill in only one of the values; all the others are contingent on that one. That’s why there’s only one degree of freedom in a 2×2 contingency table.
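For example, given only the 51 in the vaccinated, HIV-positive cell, the rest of the table follows from the fixed sums:

$$74 = 125 - 51, \qquad 8{,}146 = 8{,}197 - 51, \qquad 8{,}124 = 8{,}198 - 74$$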
(If you’re wondering why the row and column sums are fixed, it’s because those sums are bound up in the null hypothesis, and we are doing our calculations under that hypothesis.)
We’re now just one step away from our answer. The probability, under the null hypothesis, that the distribution of HIV-positive test subjects would be at least as skewed as the observed results is equal to the probability that a chi-squared random variable with one degree of freedom would be larger than 4.2617. In the old days, when I was a student, we’d look this number up in a chi-squared table, but today we have more convenient options.
I used the `chi2cdf` function in Octave to get an answer of 0.03898. You could also use this Wolfram Alpha page to get the same answer.¹ So there’s a 4% chance of getting results at least as skewed as those observed in the study. This is close to the usual 5% boundary, so it’s not surprising that reclassifying a few subjects would make the results not statistically significant.
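As an aside, there’s a package-free shortcut for the one-degree-of-freedom case: a chi-squared variable with one degree of freedom is the square of a standard normal, so its upper tail can be written with the complementary error function, which is built into Octave. A quick sketch:

```
% For one degree of freedom, a chi-squared variable is the square of a
% standard normal, so P(X > x) = 2*P(Z > sqrt(x)) = erfc(sqrt(x/2)).
p = erfc(sqrt(4.2617/2))   % 0.038981, matching chi2cdf
```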
Addendum
Here’s a transcript of the Octave session in which I made all the calculations. It takes a lot less time to do it than to explain it. Note that in some cases I’m using regular matrix operators (`-`, `*`), and in other cases I’m using element-by-element operators (`.^`, `./`).
```
octave-3.2.3:1> obs = [51, 74; 8146, 8124]
obs =

     51     74
   8146   8124

octave-3.2.3:2> sum(obs,1)
ans =

   8197   8198

octave-3.2.3:3> sum(obs,2)
ans =

     125
   16270

octave-3.2.3:4> p = sum(obs,2)(1)/sum(sum(obs))
p = 0.0076243
octave-3.2.3:5> exp = [p; 1-p]*sum(obs,1)
exp =

     62.496     62.504
   8134.504   8135.496

octave-3.2.3:6> err = obs - exp
err =

  -11.496   11.496
   11.496  -11.496

octave-3.2.3:7> sqerr = err.^2
sqerr =

   132.16   132.16
   132.16   132.16

octave-3.2.3:8> stdsqerr = sqerr./exp
stdsqerr =

   2.114726   2.114468
   0.016247   0.016245

octave-3.2.3:9> chi2stat = sum(sum(stdsqerr))
chi2stat = 4.2617
octave-3.2.3:10> 1 - chi2cdf(chi2stat,1)
ans = 0.038981
```
¹ Note that I used 10,000 as the “right endpoint” on the Wolfram Alpha page. What I really wanted as the right endpoint was infinity, but I couldn’t figure out how to tell Alpha that (neither “infinity” nor “Infinity” worked). Because the chi-squared density function drops off rapidly with increasing values, 10,000 was a good proxy for infinity. Even 100 would have given four digits of accuracy in the calculated probability.
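If you’d like to verify that claim about the endpoints, the same complementary-error-function identity used above shows how little probability lies beyond them:

```
% Upper-tail probability of a chi-squared(1) variable beyond each endpoint.
erfc(sqrt(100/2))     % about 1.5e-23, far below four-digit accuracy
erfc(sqrt(10000/2))   % underflows to zero in double precision
```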