How many samples?

In my Monte Carlo solution of the Two Child Problem, I did 10,000 simulations using Python’s random module. Was that enough? If I didn’t already know the answers (the usual case when doing Monte Carlo simulation), how could I estimate the accuracy of the results? Not surprisingly, the answers to these questions can be found in the field of statistics.

When the process being simulated consists of a series of independent trials, each trial having one of two results (which we’ll call passes and fails, although any two mutually exclusive results could be used), it’s said to be a Bernoulli sequence, and the count of passes follows a binomial distribution

$$P(x \;\textrm{passes in}\; n \;\textrm{trials}) = \frac{n!}{x! (n-x)!}\: p^x \: (1-p)^{n-x}$$

where $p$ is the probability of passing a single trial.¹ You may recognize the term with the factorials as the binomial coefficient.
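
If you want to check this formula numerically, Python’s standard library is enough. Here’s a minimal sketch; the function name binom_pmf is just for illustration, and math.comb requires Python 3.8 or later:

```python
# Binomial probability of x passes in n trials, each passing with probability p.
from math import comb

def binom_pmf(x, n, p):
    """P(x passes in n trials) = C(n, x) * p**x * (1 - p)**(n - x)"""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# e.g. the probability of exactly 3 heads in 10 fair coin flips
print(binom_pmf(3, 10, 0.5))   # 0.1171875
```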

An estimate of $p$ is easily calculated:

$$\hat{p} = \frac{x}{n}$$

where we put the hat over the $p$ to distinguish the estimate from the true (albeit typically unknown) value.

Because we could get a different value of $x$ every time we run a set of $n$ trials, $\hat{p}$ is a random variable. If the number of trials is large enough, $\hat{p}$ will have an approximately normal distribution with a mean of

$$\mu_{\hat{p}} = p$$

and a standard deviation of

$$\sigma_{\hat{p}} = \sqrt{\frac{p (1-p)}{n}}$$

That the mean of $\hat{p}$ is equal to $p$ signifies that $\hat{p}$ is an unbiased estimate of $p$: as we increase $n$, $\hat{p}$ will tend toward $p$.
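
One way to convince yourself of these two formulas is to simulate many batches of trials and look at the spread of the resulting $\hat{p}$ values. Here’s a rough sketch; the values of $p$, $n$, and the number of batches are chosen arbitrarily for illustration:

```python
# Empirical check of the mean and standard deviation of p-hat.
import random
from math import sqrt
from statistics import mean, pstdev

p, n, batches = 1/3, 7500, 2000
phats = [sum(random.random() < p for _ in range(n)) / n
         for _ in range(batches)]

print(mean(phats))               # close to p = 0.333
print(pstdev(phats))             # close to the theoretical value below
print(sqrt(p * (1 - p) / n))     # 0.00544
```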

Because we know the distribution of $\hat{p}$ and its parameters, we can calculate the probability that $\hat{p}$ will fall within some neighborhood, $\pm \delta$, around $p$,

$$P(p - \delta \le \hat{p} \le p + \delta) = \Phi\left(\frac{\delta}{\sigma_{\hat{p}}}\right) - \Phi\left(-\frac{\delta}{\sigma_{\hat{p}}}\right) = 1 - 2\Phi\left(-\frac{\delta}{\sigma_{\hat{p}}}\right)$$

where $\Phi$ is the standard normal cumulative distribution function,

$$\Phi(z) = \int_{-\infty}^z \frac{1}{\sqrt{2\pi}} e^{-\zeta^2/2} d\zeta$$

The $\Phi$ function can’t be evaluated in closed form, but many numerical analysis programs include a function for it. Octave has a function called normcdf that does the trick. Back in the not-so-good old days, you’d have to look it up in a table; every statistics textbook had a table of $\Phi$ in its appendix.
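
In Python, scipy provides the same function as norm.cdf (scipy is an extra dependency beyond the standard library). Here’s a sketch of the probability calculation above, with $p$, $n$, and $\delta$ chosen purely for illustration:

```python
# P(p - delta <= p-hat <= p + delta) = 1 - 2*Phi(-delta/sigma),
# evaluated with scipy's standard normal CDF (Octave's normcdf).
from math import sqrt
from scipy.stats import norm

p, n, delta = 1/3, 7500, 0.01             # arbitrary illustration values
sigma = sqrt(p * (1 - p) / n)
print(1 - 2 * norm.cdf(-delta / sigma))   # about 0.93
```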

This is all very nice, but you can only do these calculations if you know $p$, and the typical reason you’re doing the Monte Carlo simulation is that you don’t know $p$. What you have is a particular value of $\hat{p}$ and you want to know how good it is.

Enter the concept of the confidence interval. By flipping the terms around and using $\hat{p}$ to estimate the standard deviation,

$$\hat{\sigma}_{\hat{p}} = \sqrt{\frac{\hat{p} (1 - \hat{p})}{n}}$$

we can say with a confidence² of

$$1 - 2\:\Phi\left(-\frac{\delta}{\hat{\sigma}_{\hat{p}}}\right)$$

that the true value of $p$ is in the interval

$$\hat{p} \pm \delta$$

A more convenient way to work is to introduce a new variable,

$$\alpha = \Phi\left(-\frac{\delta}{\hat{\sigma}_{\hat{p}}}\right)$$

and choose a confidence level of $1 - 2\alpha$. Then

$$\delta = -\Phi^{-1}(\alpha)\:\hat{\sigma}_{\hat{p}} = \Phi^{-1}(1-\alpha)\:\hat{\sigma}_{\hat{p}}$$

where $\Phi^{-1}$ is the inverse of the standard normal CDF (Octave function norminv).
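
In Python, the corresponding function is scipy’s norm.ppf (again an extra dependency, not part of the standard library):

```python
# Inverse standard normal CDF: Octave's norminv corresponds to scipy's norm.ppf.
from scipy.stats import norm

alpha = 0.025                # for a 95% confidence level
print(norm.ppf(1 - alpha))   # about 1.96
```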

The confidence interval at this level is

$$\hat{p} \pm \Phi^{-1}(1-\alpha)\:\hat{\sigma}_{\hat{p}}$$

or, expanding out the $\hat{\sigma}_{\hat{p}}$,

$$\hat{p} \pm \Phi^{-1}(1-\alpha) \: \sqrt{\frac{\hat{p} (1 - \hat{p})}{n}}$$

That was a longish derivation, but the result is simple to use. Let’s say we’ve done a Monte Carlo simulation and we want to get the 95% confidence interval for $p$. Here’s what we do (a short code sketch follows the list):

  1. Note that for 95% confidence, $\alpha = 0.025$.
  2. Use whatever tools are available to determine that $\Phi^{-1}(0.975) = 1.96$.
  3. Take the values of $n$ and $\hat{p}$ from the simulation and calculate the interval

    $$\hat{p} \pm 1.96\sqrt{\frac{\hat{p} (1 - \hat{p})}{n}}$$
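
Here’s that recipe wrapped up as a small Python function, a sketch rather than a definitive implementation; the name confidence_interval and its arguments are chosen here for illustration:

```python
# Confidence interval for p, given x passes in n trials.
from math import sqrt
from scipy.stats import norm

def confidence_interval(x, n, conf=0.95):
    alpha = (1 - conf) / 2
    z = norm.ppf(1 - alpha)                  # 1.96 for 95% confidence
    phat = x / n
    delta = z * sqrt(phat * (1 - phat) / n)
    return phat - delta, phat + delta

print(confidence_interval(2520, 7535))       # roughly (0.324, 0.345)
```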

Now let’s say we’re planning a simulation and want to choose a value of $n$ to get a decent estimate of $p$.

  1. For 95% confidence, use $\Phi^{-1}(0.975) = 1.96$, as before.
  2. Get a preliminary value of $\hat{p}$, perhaps by doing a small Monte Carlo run.
  3. Decide how tight we want the confidence interval, $\pm \delta$, to be.
  4. Calculate $n$ via

    $$n = \hat{p} (1 - \hat{p}) \left(\frac{1.96}{\delta}\right)^2$$

Note that if we want a very tight confidence interval, $\delta$ will be small and $n$ will have to be large. This makes sense: to get a close estimate, we need a lot of samples.
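
As a sketch of the planning calculation, assuming a preliminary $\hat{p}$ of 0.33 and a target half-width of 0.005 (both values picked only for illustration):

```python
# Sample size needed for a given interval half-width at 95% confidence.
from math import ceil
from scipy.stats import norm

phat = 0.33                # from a small preliminary run (assumed value)
delta = 0.005              # desired half-width of the confidence interval
z = norm.ppf(0.975)        # 1.96

n = ceil(phat * (1 - phat) * (z / delta)**2)
print(n)                   # roughly 34,000 trials
```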

Although it’s a common confidence level, there’s nothing magical about 95%. We could just as easily use another value. Here are the $\Phi^{-1}$ values for a few others.

| Confidence | $\alpha$ | $\Phi^{-1}(1-\alpha)$ |
|:----------:|:--------:|:---------------------:|
| 99%        | 0.005    | 2.576                 |
| 95%        | 0.025    | 1.960                 |
| 90%        | 0.050    | 1.645                 |
| 75%        | 0.125    | 1.150                 |
| 50%        | 0.250    | 0.675                 |

As expected, a higher confidence level means a larger multiplier, so for the same interval width we need more samples.
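
The multipliers are easy to reproduce with scipy’s norm.ppf:

```python
# Multipliers for several confidence levels; they round to the table values above.
from scipy.stats import norm

for conf in (0.99, 0.95, 0.90, 0.75, 0.50):
    alpha = (1 - conf) / 2
    print(f"{conf:.0%}   alpha = {alpha:.3f}   multiplier = {norm.ppf(1 - alpha):.4f}")
```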

Let’s calculate the confidence interval for the Monte Carlo simulation of the first example of the Two Child Problem. The result was

If we restrict ourselves to families that have at least one son,
the probability of having two sons is 2520/7535 = 0.334

So $n$ is 7535,³ and $\hat{p}$ is 0.334. The 99% confidence interval is

$$0.334 \pm 2.576 \sqrt{\frac{0.334 \cdot 0.666}{7535}} = 0.334 \pm 0.014$$

So if we didn’t already know from earlier work that the true probability was $1/3$, we could use this result to say with 99% confidence that the true probability was between 0.320 and 0.348.
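
A quick check of that arithmetic in Python (same scipy assumption as above):

```python
# 99% confidence interval for 2520 passes in 7535 trials.
from math import sqrt
from scipy.stats import norm

phat = 2520 / 7535
delta = norm.ppf(0.995) * sqrt(phat * (1 - phat) / 7535)
print(f"{phat:.3f} +/- {delta:.3f}")   # 0.334 +/- 0.014
```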

I should mention that the formulas in this post aren’t restricted to Monte Carlo simulation. They’re valid any time you’re sampling from a pass/fail type of population.


  1. The count of fails also follows a binomial distribution; just substitute $1-p$ for $p$ and $n-x$ for $x$.

  2. We use the word confidence instead of probability because we’re really not calculating a probability here. The true value of $p$ isn’t a random variable; it’s a number that we just don’t happen to know.

  3. Why isn’t it 10,000? In this simulation we filtered out about a quarter of the sample families because they didn’t have sons, leaving 7535.