Variance of a sum
November 13, 2025 at 12:07 PM by Dr. Drang
Earlier this week, John D. Cook wrote a post about minimizing the variance of a sum of random variables. The sum he looked at was this:
$$Z = wX + (1 - w)Y$$

where $X$ and $Y$ are independent random variables, and $w$ is a deterministic value. The proportion of $Z$ that comes from $X$ is $w$ and the proportion that comes from $Y$ is $1 - w$. The goal is to choose $w$ to minimize the variance of $Z$. As Cook says, this is weighting the sum to minimize its variance.
The result he gets is
$$w = \frac{\sigma_Y^2}{\sigma_X^2 + \sigma_Y^2}$$

and one of the consequences of this is that if $X$ and $Y$ have equal variances, the $w$ that minimizes the variance of $Z$ is $1/2$.
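If you want a quick numerical check of that, here's a sketch using NumPy. It isn't from Cook's post; the standard deviations and the seed are arbitrary choices of mine. It compares the sample variance of $Z$ at the weight from Cook's formula against a few other weights:

```python
import numpy as np

rng = np.random.default_rng(42)

# arbitrary standard deviations for this check
sigma_x, sigma_y = 2.0, 1.0
n = 1_000_000

# independent random variables
x = rng.normal(0, sigma_x, n)
y = rng.normal(0, sigma_y, n)

# the minimizing weight for independent X and Y
w_star = sigma_y**2 / (sigma_x**2 + sigma_y**2)

# sample variance of Z = w*X + (1 - w)*Y over a handful of weights
for w in (0.0, 0.25, w_star, 0.5, 0.75, 1.0):
    z = w*x + (1 - w)*y
    print(f"w = {w:.3f}  var(Z) = {z.var():.4f}")
```

With $\sigma_X = 2$ and $\sigma_Y = 1$, the weight from Cook's formula is 0.2, and it should produce the smallest sample variance in the list.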
You might think that if the variances are equal, it shouldn't matter what proportions you use for the two random variables, but it does. That's due in no small part to the independence of $X$ and $Y$, which is part of the problem's setup.
A natural question to ask, then, is what happens if $X$ and $Y$ aren't independent. That's what we'll look into here.
First, a little review. The variance of a random variable, $X$, is defined as

$$\sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2\, f_X(x)\, dx$$

where $\mu_X$ is the mean value of $X$ and $f_X(x)$ is its probability density function (PDF). The most familiar PDF is the bell-shaped curve of the normal distribution.
The mean value is defined like this:

$$\mu_X = \int_{-\infty}^{\infty} x\, f_X(x)\, dx$$
People often like to work with the standard deviation instead of the variance. The relationship is

$$\sigma_X = \sqrt{\sigma_X^2}$$
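To make those definitions concrete, here's a little SymPy sketch (my addition, not part of the review above) that grinds through the integrals for a normal PDF:

```python
import sympy as sp

x = sp.symbols('x', real=True)
mu = sp.symbols('mu', real=True)
sigma = sp.symbols('sigma', positive=True)

# PDF of the normal distribution
f = sp.exp(-(x - mu)**2 / (2*sigma**2)) / (sigma*sp.sqrt(2*sp.pi))

# mean: integral of x * f(x) over the real line
mean = sp.integrate(x*f, (x, -sp.oo, sp.oo))
print(sp.simplify(mean))          # mu

# variance: integral of (x - mu)^2 * f(x) over the real line
var = sp.integrate((x - mu)**2 * f, (x, -sp.oo, sp.oo))
print(sp.simplify(var))           # sigma**2

# standard deviation is the square root of the variance
print(sp.sqrt(sp.simplify(var)))  # sigma
```

The integrals should come back as $\mu$ and $\sigma^2$, which is why the normal distribution's parameters are usually written that way.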
Now let’s consider two random variables, $X$ and $Y$. They have a joint PDF, $f_{XY}(x, y)$. The covariance of the two is defined like this:

$$\sigma_{XY} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y)\, f_{XY}(x, y)\, dx\, dy$$
It’s common to express the covariance in terms of the standard deviations and the correlation coefficient, $\rho$:

$$\sigma_{XY} = \rho\, \sigma_X \sigma_Y$$
If we were going to deal with more random variables, I’d explicitly include the variables as subscripts to $\rho$, but there’s no need to in the two-variable situation.
The correlation coefficient is a pure number and is always in this range:

$$-1 \le \rho \le 1$$
A positive value of $\rho$ means that the two variables tend to be above or below their respective mean values at the same time. A negative value of $\rho$ means that when one variable is above its mean, the other tends to be below its mean, and vice versa.
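Here's what those quantities look like for sampled data. This is a NumPy sketch with parameters I picked arbitrarily, not something from Cook's post:

```python
import numpy as np

rng = np.random.default_rng(1)

# draw correlated samples from a bivariate normal (arbitrary parameters)
sigma_x, sigma_y, rho = 2.0, 1.5, 0.6
cov_matrix = [[sigma_x**2, rho*sigma_x*sigma_y],
              [rho*sigma_x*sigma_y, sigma_y**2]]
x, y = rng.multivariate_normal([0, 0], cov_matrix, size=500_000).T

cov_xy = np.cov(x, y)[0, 1]        # sample covariance
corr = np.corrcoef(x, y)[0, 1]     # sample correlation coefficient

# the sample covariance should be close to rho * sigma_x * sigma_y
print(cov_xy, rho*sigma_x*sigma_y)  # both near 1.8
print(corr)                         # near 0.6
```

Flip the sign of `rho` and both the covariance and the correlation go negative, matching the interpretation above.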
If $X$ and $Y$ are independent, their joint PDF can be expressed as the product of two individual PDFs:

$$f_{XY}(x, y) = f_X(x)\, f_Y(y)$$

which means

$$\sigma_{XY} = 0$$

because the covariance integral splits into the product of two single integrals, each of which is zero because of the definition of the mean given above. Cook took advantage of this in his analysis to simplify his equations. We won’t be doing that.
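If you want to see that play out symbolically, here's a SymPy sketch using two hypothetical independent exponential PDFs. The PDFs are my example, chosen only because their integrals are easy:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

# two hypothetical independent PDFs (exponential, rates 1 and 2)
fX = sp.exp(-x)
fY = 2*sp.exp(-2*y)

# means from the definition above
mu_x = sp.integrate(x*fX, (x, 0, sp.oo))   # 1
mu_y = sp.integrate(y*fY, (y, 0, sp.oo))   # 1/2

# covariance with the joint PDF written as the product fX*fY
cov = sp.integrate((x - mu_x)*(y - mu_y)*fX*fY, (x, 0, sp.oo), (y, 0, sp.oo))
print(cov)   # 0
```

Integrating over $x$ first gives zero, so the whole double integral collapses to zero no matter what $f_Y$ is.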
Going back to our definition of $Z$,

$$Z = wX + (1 - w)Y$$

the variance of $Z$ is

$$\sigma_Z^2 = w^2 \sigma_X^2 + (1 - w)^2 \sigma_Y^2 + 2w(1 - w)\, \rho\, \sigma_X \sigma_Y$$
To get the value of $w$ that minimizes the variance, we take the derivative with respect to $w$ and set that equal to zero. This leads to

$$w = \frac{\sigma_Y^2 - \rho\, \sigma_X \sigma_Y}{\sigma_X^2 + \sigma_Y^2 - 2\rho\, \sigma_X \sigma_Y}$$
This reduces to Cook’s equation when $\rho = 0$, which is what we’d expect.
At this value of $w$, the variance of the sum is

$$\sigma_Z^2 = \frac{(1 - \rho^2)\, \sigma_X^2 \sigma_Y^2}{\sigma_X^2 + \sigma_Y^2 - 2\rho\, \sigma_X \sigma_Y}$$
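If you'd rather not push the algebra around by hand, here's a SymPy sketch of the same steps. It's my own check on the formulas above; the comments show what the printed expressions should be equivalent to, though SymPy may arrange them differently:

```python
import sympy as sp

w, rho = sp.symbols('w rho', real=True)
sx, sy = sp.symbols('sigma_x sigma_y', positive=True)

# variance of Z = w*X + (1 - w)*Y when X and Y have correlation rho
var_z = w**2*sx**2 + (1 - w)**2*sy**2 + 2*w*(1 - w)*rho*sx*sy

# set the derivative with respect to w to zero and solve
w_star = sp.solve(sp.diff(var_z, w), w)[0]
print(sp.simplify(w_star))
# should be equivalent to
#   (sigma_y**2 - rho*sigma_x*sigma_y) / (sigma_x**2 + sigma_y**2 - 2*rho*sigma_x*sigma_y)

# rho = 0 should recover Cook's result
print(sp.simplify(w_star.subs(rho, 0)))
# should be equivalent to sigma_y**2 / (sigma_x**2 + sigma_y**2)

# substitute w_star back in to get the minimum variance
min_var = sp.simplify(var_z.subs(w, w_star))
print(min_var)
# should be equivalent to
#   (1 - rho**2)*sigma_x**2*sigma_y**2 / (sigma_x**2 + sigma_y**2 - 2*rho*sigma_x*sigma_y)
```

And since the second derivative of $\sigma_Z^2$ with respect to $w$ is $2(\sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X\sigma_Y)$, which is positive except in the degenerate case of perfectly correlated variables with equal variances, this stationary point really is a minimum.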
Considering now the situation where $\sigma_X = \sigma_Y = \sigma$, the value of $w$ that minimizes the variance is

$$w = \frac{\sigma^2 - \rho\, \sigma^2}{2\sigma^2 - 2\rho\, \sigma^2} = \frac{1}{2}$$
which is the same result as before. In other words, when the variances of $X$ and $Y$ are equal, the variance of their sum is minimized by having equal amounts of both, regardless of their correlation. I don’t know about you, but I wasn’t expecting that.
Just because the minimizing value of $w$ doesn’t depend on the correlation coefficient, that doesn’t mean the variance itself doesn’t. The minimum variance of $Z$ when $\sigma_X = \sigma_Y = \sigma$ is

$$\sigma_Z^2 = \frac{(1 + \rho)\, \sigma^2}{2}$$
A pretty simple result and one that I did expect. When $X$ and $Y$ are positively correlated, their extremes tend to reinforce each other and the variance of $Z$ goes up. When $X$ and $Y$ are negatively correlated, their extremes tend to balance out, and $Z$ stays closer to its mean value.
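Here's one last numerical check of the equal-variance case (again, just a NumPy sketch with made-up numbers), sweeping the correlation coefficient and using the minimizing weight of 1/2:

```python
import numpy as np

rng = np.random.default_rng(7)

sigma = 2.0
n = 1_000_000

for rho in (-0.8, -0.4, 0.0, 0.4, 0.8):
    # correlated samples with equal variances
    cov = [[sigma**2, rho*sigma**2],
           [rho*sigma**2, sigma**2]]
    x, y = rng.multivariate_normal([0, 0], cov, size=n).T
    z = 0.5*x + 0.5*y   # the minimizing weight, w = 1/2
    print(f"rho = {rho:+.1f}  var(Z) = {z.var():.3f}  "
          f"(1 + rho)*sigma**2/2 = {(1 + rho)*sigma**2/2:.3f}")
```

The sample variances should track $(1 + \rho)\,\sigma^2/2$: negative correlation pulls the variance of $Z$ down toward zero, and positive correlation pushes it up toward $\sigma^2$.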