Binomial baseball

While reading a recap of last night’s World Series game, I saw this statistic: of the 65 Series that have had a sixth game, the team with the 3–2 lead has won the Series 43 times. This was, I think, intended to show us that the Astros have a strong chance to beat out the Dodgers for the title. And they do. But not as strong as you might expect.

If the teams were evenly balanced and each game independent of the others, we would expect the team with the 3–2 lead to win 75% of the time. 50% of the time they’d win the sixth game and the Series would be over; 25% of the time (50% of the other 50%) they’d win in the seventh game. So the leading team “should” have won the Series 48 or 49 times out of 65, not 43 times.
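As a quick check on that 75% figure, here is a minimal sketch that works out the probability directly and then confirms it with a small simulation. It assumes, as above, fair and independent games; the function name leader_wins_series, the seed, and the trial count are my own choices for illustration.

```python
import random

# Exact calculation: the leader wins Game 6 outright (1/2),
# or loses Game 6 and then wins Game 7 (1/2 * 1/2).
p_series = 0.5 + 0.5 * 0.5
expected_wins = 65 * p_series

def leader_wins_series(rng):
    """Play out the rest of one Series from a 3-2 lead with 50-50 games."""
    if rng.random() < 0.5:       # leader wins Game 6
        return True
    return rng.random() < 0.5    # otherwise it comes down to Game 7

rng = random.Random(2017)        # fixed seed so the run is repeatable
trials = 100_000
simulated = sum(leader_wins_series(rng) for _ in range(trials)) / trials

print(p_series, expected_wins)   # 0.75 48.75
print(round(simulated, 3))       # should land close to 0.75
```

The expected count of 48.75 is where the “48 or 49” comes from.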

Is this five- or six-game difference meaningful? For that we need to do some calculations using the binomial distribution. SciPy, Python’s collection of scientific libraries, has a stats subpackage that includes a binom object for binomial distribution calculations. We can import it this way:

python:
from scipy.stats import binom

Let’s start by figuring out the probability that the leading team would win 43 times in 65 trials. With a 75% probability of winning the Series in each trial, the probability of 43 Series wins in 65 chances is calculated through

python:
binom.pmf(43, 65, .75)

where the pmf function gets its name from the standard abbreviation for “probability mass function.” The answer is 0.029 or just under 3%. This makes it seem very unlikely that our assumption of 50–50 games would lead to only 43 Series wins for the leading team.
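For anyone curious what pmf is doing, the binomial probability of exactly k successes in n trials is C(n, k) p^k (1 − p)^(n − k). Here is a sketch of that formula using only the standard library (math.comb needs Python 3.8 or later). The helper name binom_pmf is mine, not SciPy’s.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

prob_43 = binom_pmf(43, 65, 0.75)
print(round(prob_43, 3))   # 0.029
```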

But that isn’t the way these sorts of calculations are normally done. With 66 possible outcomes (anywhere from 0 to 65 wins), the probability of any single exact count is bound to be small, so it tells us little on its own. If we want to find out if a seemingly out-of-whack result is “statistically significant,” we should look at the probability of results that are at least as far away from our expectations as the actual result was. In our case, that means looking not only at the probability of 43 Series wins out of 65 chances, but also 42 wins, 41 wins, and so on. We then add up all of these “at least as weird” probabilities.

The usual terminology for this sort of summation is “cumulative distribution function,” which gives the probability of a result less than or equal to a given value, and binom has a method for it:

python:
binom.cdf(43, 65, .75)

The result is 0.0695, or about 7%. Another way of looking at this is that if our assumption of 50–50 games were correct, there’s a 93% chance that the leading team would win the Series more than 43 times in 65 chances.
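The “adding up” is literal: the cumulative value is just the sum of the individual probabilities for 0 through 43 wins. Here is a sketch with the standard library only. The names binom_pmf and binom_cdf are made up for illustration, and a brute-force sum like this is only meant for numbers this small.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """Probability of k or fewer successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

at_most_43 = binom_cdf(43, 65, 0.75)
more_than_43 = 1 - at_most_43    # the complementary event

print(round(at_most_43, 4), round(more_than_43, 2))   # 0.0695 0.93
```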

In hypothesis testing, the value 0.0695 is called the p-value, and it’s common in many fields to consider a result statistically significant if its p-value is less than 0.05. Using that criterion, we would not take the difference between our “null hypothesis” of 50–50 games and the World Series history as statistically significant.[1]
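To make the 0.05 cutoff concrete, this sketch scans for the largest win total that would have counted as significant under the same one-sided comparison. It uses a hand-rolled cumulative sum rather than SciPy, and the names binom_cdf, alpha, and threshold are mine.

```python
from math import comb

def binom_cdf(k, n, p):
    """Probability of k or fewer successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

alpha = 0.05        # conventional significance level
n, p = 65, 0.75     # Series with a Game 6; leader's chance under the null

# Largest number of leader wins whose lower-tail probability is below alpha
threshold = max(k for k in range(n + 1) if binom_cdf(k, n, p) < alpha)
print(threshold)    # 42
```

Under these assumptions, then, the historical record sits just one Series on the “not significant” side of the line.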

But it’s something for Dodgers fans to cling to.


  1. Yes, I’ve been a little breezy here with my definitions of null and alternative hypotheses and one-sided vs. two-sided rejection areas, but it’s just baseball.