Coin flips and Stack Overflow

I didn’t intend to return to this topic, but I’ve been getting a few visits from a link on this page at stackoverflow.com. The question at stackoverflow is the same one I address in the Stochasticity post: what’s the probability of seeing a run of 7 identical outcomes (7 heads or 7 tails) in a sequence of 100 coin flips? A wrong answer has been posted (yes, someone is wrong on the internet), and I can’t do anything to fix it there; I can’t comment on or downgrade the wrong answer because I have no reputation points there, and I can’t post a correct answer because the question’s been closed. But I do want to talk about it because it involves some interesting topics.

The Stack Overflow answer

Here’s the wrong answer, given by Stack Overflow user truppo:

Each time you start a sequence of 7, there is a 1/(2^7) = 1 in 128 probability that it will be your desired sequence (no matter if its 7 heads/7 tails/something else), and thus a 127 in 128 probability that you will not.

In 100 flips, you can start this sequence 93 times, and in order NOT to hit at least one you need to fail all 93: (127/128)^93 = 48%.

Thus, chance of actually hitting is 100 - 48 = 52%.

What bothered me about this answer was that it was quite close to the probability I presented near the bottom of my post. Mine was an estimate calculated through Monte Carlo simulation and was about 54%. Could it be that truppo’s simple analysis gave the exact solution? As it turns out, the answer is no, and there are several reason why.

Statistics and hypothesis testing

Let’s start with a statistical approach. If an event has a probability [p] of occurring in a single trial, then the number of occurrences, [n], in a set of [N] independent trials is a random variable, by which I mean that its value is unknown. You cannot predict how many occurrences there will be before you run the [N] trials, and the next time you run [N] trials, the number of occurrences may be different.

The behavior of [n] is governed by the binomial distribution and has an expected value (or mean) of [N p] and a variance of [N p (1-p)]. We can use this information along with my Monte Carlo results to test the validity of truppo’s solution.

First, we’ll get a few more digits out of truppo’s answer.

[1 - \left ( \frac{127}{128} \right )^{93} = 0.5178]

If we assume that this is [p], then the expected number of runs of 7 in 10,000 trials is 5178 and its standard deviation (square root of the variance) is 49.97.

The count of runs of 7 in my Monte Carlo solution was 5432, which is

[\frac{5432 - 5178}{49.97} = 5.08]

standard deviations away from the mean. This is way out in the upper tail of the distribution. In a statistical sense, then, the Stack Overflow probability and mine aren’t close at all. The chance of getting a value more than 5 standard deviations (250 occurrences) from the mean is (via the binocdf function in Octave)

binocdf(5178-251, 10000, .51781) + (1 - (binocdf(5178+250, 10000, .51781)) = 5.3e-07

which is distinctly less than one in a million1. We can be pretty safe, then, in concluding that our initial assumption, that [p = 0.5178], is wrong.

Picking at the logic

Simply saying that truppo’s answer is wrong isn’t very satisfying. Why is it wrong? For that, we need to look into the logic of his solution.

Truppo starts by saying that the probability of getting 7 identical results in 7 flips is [1/128]. This is correct if we’re interested only in 7 heads or only in 7 tails, but we’re interested in runs of either kind. Therefore, the probability we should be starting out with is [1/64], and the probability of not getting a run of 7 in 7 flips is [63/64].

Truppo then looks at the sequence of 100 flips and says that you can have 93 runs of 7 within this sequence: flips 1–7, flips 2-8, flips 3–9, and so on. In fact, you can have 94 runs of 7 within 100 flips. The last one is 94–100. This is the kind of off-by-one error that Stack Overflow’s audience of programmers should be quite familiar with.

OK, so can we then calculate our probability by fixing those two obvious errors, but still keep truppo’s basic logic? Does

[1 - \left( \frac{63}{64} \right)^{94}]

give us the right answer?

No, because the most serious of truppo’s errors is implicit in that formula, which comes from the binominal distribution. In using the binomial distribution, truppo is saying that the 94 sets of 7 coin flips are statistically independent of one another, and that is clearly false. Flips 1–7, for example, cannot be independent of flips 2–8 because they have 6 flips in common. This error means that the answer of 52% is not merely numerically wrong, but conceptually wrong--the entire process of calculation is based on a mistake.

Finally

I don’t really mean to dump on someone who was, after all, just trying help someone else. But it would be good if the help were actually helpful. Stack Overflow is supposed to use the wisdom of the crowds to get good answers. If the crowds are kept from answering, the wisdom never appears.

Tags:


  1. Yes, I could have taken advantage of the central limit theorem and used a Gaussian approximation to the binomial, but where’s the fun in that?