# Coin flips and Stack Overflow

Here’s the wrong answer, given by Stack Overflow user truppo:

Each time you start a sequence of 7, there is a 1/(2^7) = 1 in 128 probability that it will be your desired sequence (no matter if its 7 heads/7 tails/something else), and thus a 127 in 128 probability that you will not.

In 100 flips, you can start this sequence 93 times, and in order NOT to hit at least one you need to fail all 93: (127/128)^93 = 48%.

Thus, chance of actually hitting is 100 - 48 = 52%.

What bothered me about this answer was that it was quite close to the probability I presented near the bottom of my post. Mine was an estimate calculated through Monte Carlo simulation and was about 54%. Could it be that truppo’s simple analysis gave the exact solution? As it turns out, the answer is no, and there are several reason why.

## Statistics and hypothesis testing

Let’s start with a statistical approach. If an event has a probability $p$ of occurring in a single trial, then the number of occurrences, $n$, in a set of $N$ independent trials is a random variable, by which I mean that its value is unknown. You cannot predict how many occurrences there will be before you run the $N$ trials, and the next time you run $N$ trials, the number of occurrences may be different.

The behavior of $n$ is governed by the binomial distribution and has an expected value (or mean) of $N p$ and a variance of $N p (1-p)$. We can use this information along with my Monte Carlo results to test the validity of truppo’s solution.

First, we’ll get a few more digits out of truppo’s answer.

If we assume that this is $p$, then the expected number of runs of 7 in 10,000 trials is 5178 and its standard deviation (square root of the variance) is 49.97.

The count of runs of 7 in my Monte Carlo solution was 5432, which is

standard deviations away from the mean. This is way out in the upper tail of the distribution. In a statistical sense, then, the Stack Overflow probability and mine aren’t close at all. The chance of getting a value more than 5 standard deviations (250 occurrences) from the mean is (via the binocdf function in Octave)

binocdf(5178-251, 10000, .51781) + (1 - (binocdf(5178+250, 10000, .51781)) = 5.3e-07


which is distinctly less than one in a million1. We can be pretty safe, then, in concluding that our initial assumption, that $p = 0.5178$, is wrong.

## Picking at the logic

Simply saying that truppo’s answer is wrong isn’t very satisfying. Why is it wrong? For that, we need to look into the logic of his solution.

Truppo starts by saying that the probability of getting 7 identical results in 7 flips is $1/128$. This is correct if we’re interested only in 7 heads or only in 7 tails, but we’re interested in runs of either kind. Therefore, the probability we should be starting out with is $1/64$, and the probability of not getting a run of 7 in 7 flips is $63/64$.

Truppo then looks at the sequence of 100 flips and says that you can have 93 runs of 7 within this sequence: flips 1–7, flips 2-8, flips 3–9, and so on. In fact, you can have 94 runs of 7 within 100 flips. The last one is 94–100. This is the kind of off-by-one error that Stack Overflow’s audience of programmers should be quite familiar with.

OK, so can we then calculate our probability by fixing those two obvious errors, but still keep truppo’s basic logic? Does