AIDS vaccine and statistical significance
October 20, 2009 at 1:01 PM by Dr. Drang
This morning came word that the results of the AIDS vaccine trial, reported as a modest success last month, were, on further analysis, not statistically significant. I’ve been waiting a couple of weeks for this shoe to drop, as even the originally reported results were pretty weak. I’m not claiming any expertise in vaccine research, but I do know how to do elementary statistics.
According to the LA Times article, the switch from statistically significant to not statistically significant came when seven of the test subjects were reclassified. That’s seven subjects out of over 16,000. How can such a small change flip the results from success to failure? Because the original results were on the edge of statistical significance to begin with, as anyone with a little statistics background could have calculated.
Here are the results as they were first given:
Observed  Vaccine  No vaccine 

HIV positive  51  74 
HIV negative  8,146  8,124 
This is what’s known as a 2×2 contingency table. There are two treatments, vaccine and no vaccine, and two outcomes, HIV positive and HIV negative. The basic test of statistical significance poses the following question:
Assume that the vaccine has no effect, that there would have been 125 HIV-positive test subjects even if all 16,395 subjects had been untreated. This “pretend there’s no effect” assumption is called the null hypothesis. If one had then arbitrarily split the test population into two groups, 8,197 in one group and 8,198 in the other, what is the probability that the distribution of HIV-positive subjects would be at least as skewed as it turned out?
By “at least as skewed” I mean a distribution of the HIV-positive subjects between the vaccinated and unvaccinated that’s at least as far from an equal distribution as the test results were. Since the test results were 51-74, distributions like 50-75, 49-76, 48-77, etc., all the way to 0-125, would all fall into the “at least as skewed” category. (So, by the way, would results skewed the opposite way: 74-51, 75-50, and so on. Results like this would suggest the vaccine has a negative effect, but that’s also a violation of the null hypothesis.)
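As an aside, the question as posed can be answered by direct counting, without the chi-squared approximation that the rest of this post works through. The following is a minimal Python sketch (Python rather than the Octave used in the post, standard library only). It assumes the random split follows a hypergeometric distribution, and the helper name `prob_split` is my own invention for illustration.

```python
from math import comb

# Counts taken from the first table in the post.
positives = 125      # HIV-positive subjects overall
vaccinated = 8197    # size of the vaccine group
total = 16395        # all subjects in the trial

def prob_split(k):
    """Probability that exactly k of the positives land in the vaccine
    group when the subjects are split at random (hypergeometric)."""
    ways = comb(positives, k) * comb(total - positives, vaccinated - k)
    return ways / comb(total, vaccinated)

# "At least as skewed" as defined in the post: 51 or fewer positives in
# the vaccine group, or 74 or more (the opposite skew).
skewed = [k for k in range(positives + 1) if k <= 51 or k >= 74]
p_exact = sum(prob_split(k) for k in skewed)

print(f"exact probability of a split at least as skewed: {p_exact:.4f}")
```

The result should land at a few percent, in the same neighborhood as the approximate figure derived later in the post; the two methods are not expected to agree digit for digit.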
If the calculated probability is sufficiently low—in other words, if it’s quite unlikely that the test results were due to chance alone—we say that the results are statistically significant. It’s fairly common to use 5%, one chance in twenty, as the upper limit for statistical significance, but lower values are sometimes used. In fact, it’s often thought to be good practice to provide that probability instead of just reporting whether the results are significant or not.
Which leaves us with the problem of calculating that probability. Fortunately, the 2×2 contingency table is a very well-studied problem, and the procedure is straightforward. First, we rewrite the table of test results (often called the observed values) including the row and column sums.
Observed  Vaccine  No vaccine  Sum 

HIV positive  51  74  125 
HIV negative  8,146  8,124  16,270 
Sum  8,197  8,198  16,395 
If the null hypothesis is true, then the probability of any subject in the test group becoming HIV positive is
$$p=\frac{125}{16{,}395}=0.007624$$
or about three-quarters of one percent. We then make a similar table filled, not with the test results, but with the expected values based on the above probability:
Expected  Vaccine  No vaccine  Sum 

HIV positive  62.496  62.504  125 
HIV negative  8134.504  8135.496  16,270 
Sum  8,197  8,198  16,395 
These are the expected results under the null hypothesis. Note that the row and column sums remain the same.
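The expected values can be reproduced in a few lines. This is a minimal Python sketch rather than the Octave used in the post; the variable names are mine, and it relies on the standard shortcut that each expected cell equals its row sum times its column sum divided by the grand total, which is the same as multiplying the column sums by p and 1 − p.

```python
# Observed counts from the post: rows are HIV positive / HIV negative,
# columns are vaccine / no vaccine.
observed = [[51, 74],
            [8146, 8124]]

row_sums = [sum(row) for row in observed]          # HIV positive, HIV negative
col_sums = [sum(col) for col in zip(*observed)]    # vaccine, no vaccine
total = sum(row_sums)

# Under the null hypothesis, each cell is (row sum * column sum) / total.
expected = [[r * c / total for c in col_sums] for r in row_sums]

for label, row in zip(["HIV positive", "HIV negative"], expected):
    print(label, [round(x, 3) for x in row])
```

The printed rows should match the expected-value table above to three decimal places.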
Now that we have the expected values, we calculate the deviation of the observed values from the expected values, which is just a subtraction. We’ll leave out the row and column sums this time, because they don’t deviate.
Deviation  Vaccine  No vaccine 

HIV positive  -11.496  11.496 
HIV negative  11.496  -11.496 
These deviations are usually called errors, even though no mistakes have been made. Statisticians spend a lot of time studying the properties of errors. For a 2×2 contingency table with a sufficiently high count in each table cell, statisticians have shown that if we square the errors,
Square error  Vaccine  No vaccine 

HIV positive  132.16  132.16 
HIV negative  132.16  132.16 
standardize the square errors by dividing by the expected values,
Standardized  Vaccine  No vaccine 

HIV positive  2.11473  2.11447 
HIV negative  0.01625  0.01625 
and add all the values in the table,
$${\chi}^{2}=2.11473+2.11447+0.01625+0.01625=4.2617$$
we get a number known as the chi-squared statistic, so called because, under the null hypothesis, the sum of the standardized square errors is a random variable that (approximately) follows a chi-squared distribution.
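The square, standardize, and sum steps collapse into one short loop. Here is a minimal Python sketch of that calculation, using my own variable names and plain lists instead of Octave matrices.

```python
observed = [[51, 74],
            [8146, 8124]]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
total = sum(row_sums)

chi2 = 0.0
for i, r in enumerate(row_sums):
    for j, c in enumerate(col_sums):
        expected = r * c / total              # expected count under the null
        error = observed[i][j] - expected     # deviation ("error")
        chi2 += error ** 2 / expected         # standardized square error

print(f"chi-squared statistic: {chi2:.4f}")   # the post reports 4.2617
```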
The chi-squared distribution is actually a family of distributions, parameterized by the number of degrees of freedom. For a 2×2 contingency table, the number of degrees of freedom is 1. Why is it called the degrees of freedom and why is it 1 in this case? Consider our 2×2 contingency table with none of the observed values filled in, but with the row and column sums fixed.
Observed  Vaccine  No vaccine  Sum 

HIV positive  ?  ?  125 
HIV negative  ?  ?  16,270 
Sum  8,197  8,198  16,395 
If I were to give you just one of the missing values, you’d be able to fill in the rest of the table through subtraction from the row and column sums. In a sense, then, I am free to fill in only one of the values; all the others are contingent on that one. That’s why there’s only one degree of freedom in a 2×2 contingency table.
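The fill-in-by-subtraction argument can be made concrete. This is a small Python sketch, with a hypothetical helper `fill_table` of my own naming, that rebuilds the whole table from a single cell and the fixed sums.

```python
def fill_table(vaccine_positive, row_sums=(125, 16270), col_sums=(8197, 8198)):
    """Complete the 2x2 table from one cell plus the fixed row/column sums."""
    a = vaccine_positive
    b = row_sums[0] - a    # no vaccine, HIV positive
    c = col_sums[0] - a    # vaccine, HIV negative
    d = row_sums[1] - c    # no vaccine, HIV negative
    return [[a, b], [c, d]]

print(fill_table(51))   # [[51, 74], [8146, 8124]]
```

Changing the single argument changes every other cell, which is another way of seeing that only one value is free.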
(If you’re wondering why the row and column sums are fixed, it’s because those sums are bound up in the null hypothesis, and we are doing our calculations under that hypothesis.)
We’re now just one step away from our answer. The probability, under the null hypothesis, that the distribution of HIV-positive test subjects would be at least as skewed as the observed results is equal to the probability that a chi-squared random variable with one degree of freedom would be larger than 4.2617. In the old days, when I was a student, we’d look this number up in a chi-squared table, but today we have more convenient options.
I used the chi2cdf function in Octave to get an answer of 0.03898. You could also use this Wolfram Alpha page to get the same answer.^{1} So, under the null hypothesis, there’s about a 4% chance of getting results at least as skewed as those observed in the study. This is close to the usual 5% boundary, so it’s not surprising that reclassifying a few subjects would make the results not statistically significant.
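To see how close to the edge this is, here is a minimal Python sketch that recomputes the probability and then nudges the counts. Two things are assumptions on my part. First, instead of a chi-squared CDF routine it uses the identity that, for one degree of freedom, the upper-tail probability is erfc(√(x/2)). Second, the shifted tables are purely hypothetical: the post does not say how the seven subjects were actually reclassified, so these are illustrative perturbations, not the trial's revised data.

```python
from math import erfc, sqrt

def chi2_stat(table):
    """Chi-squared statistic for a 2x2 table of observed counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    return sum((table[i][j] - r * c / total) ** 2 / (r * c / total)
               for i, r in enumerate(row_sums)
               for j, c in enumerate(col_sums))

def p_value(table):
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2_stat(table) / 2))

reported = [[51, 74], [8146, 8124]]
print(f"reported counts: p = {p_value(reported):.4f}")

# HYPOTHETICAL perturbation: move `shift` vaccinated subjects from
# HIV negative to HIV positive. Not the actual reclassification.
for shift in (1, 2, 3):
    table = [[51 + shift, 74], [8146 - shift, 8124]]
    print(f"shift of {shift}: p = {p_value(table):.4f}")
```

In this sketch, a shift of just two subjects is enough to push the probability above 5%.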
Addendum
Here’s a transcript of the Octave session in which I made all the calculations. It takes a lot less time to do it than to explain it. Note that in some cases I’m using regular matrix operators (-, *), and in other cases I’m using element-by-element operators (.^, ./).
octave-3.2.3:1> obs = [51, 74; 8146, 8124]
obs =
51 74
8146 8124
octave-3.2.3:2> sum(obs,1)
ans =
8197 8198
octave-3.2.3:3> sum(obs,2)
ans =
125
16270
octave-3.2.3:4> p = sum(obs,2)(1)/sum(sum(obs))
p = 0.0076243
octave-3.2.3:5> exp = [p; 1-p]*sum(obs,1)
exp =
62.496 62.504
8134.504 8135.496
octave-3.2.3:6> err = obs - exp
err =
-11.496 11.496
11.496 -11.496
octave-3.2.3:7> sqerr = err.^2
sqerr =
132.16 132.16
132.16 132.16
octave-3.2.3:8> stdsqerr = sqerr./exp
stdsqerr =
2.114726 2.114468
0.016247 0.016245
octave-3.2.3:9> chi2stat = sum(sum(stdsqerr))
chi2stat = 4.2617
octave-3.2.3:10> 1 - chi2cdf(chi2stat,1)
ans = 0.038981

1. Note that I used 10,000 as the “right endpoint” on the Wolfram Alpha page. What I really wanted as the right endpoint was infinity, but I couldn’t figure out how to tell Alpha that (neither “infinity” nor “Infinity” worked). Because the chi-squared density function drops off rapidly with increasing values, 10,000 was a good proxy for infinity. Even 100 would have given four digits of accuracy in the calculated probability.
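The claim about the right endpoint is easy to check. This is a minimal Python sketch, assuming the same one-degree-of-freedom identity, P(χ² > x) = erfc(√(x/2)); the helper name `tail` is mine.

```python
from math import erfc, sqrt

def tail(x):
    """P(chi-squared with 1 degree of freedom > x)."""
    return erfc(sqrt(x / 2))

p_inf = tail(4.2617)              # right endpoint at infinity
p_100 = p_inf - tail(100)         # right endpoint at 100
p_10000 = p_inf - tail(10000)     # right endpoint at 10,000

print(f"to infinity: {p_inf:.6f}")
print(f"to 100:      {p_100:.6f}")
print(f"to 10,000:   {p_10000:.6f}")
print(f"probability beyond 100: {tail(100):.3e}")
```

The probability left beyond 100 is many orders of magnitude below the fourth decimal place, so either endpoint is a safe stand-in for infinity.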