Variance of a sum

Earlier this week, John D. Cook wrote a post about minimizing the variance of a sum of random variables. The sum he looked at was this:

$$Z = tX + (1 - t)Y$$

where X and Y are independent random variables, and t is a deterministic value. The proportion of Z that comes from X is t and the proportion that comes from Y is $1 - t$. The goal is to choose t to minimize the variance of Z. As Cook says, this is weighting the sum to minimize its variance.

The result he gets is

$$t = \frac{\operatorname{Var}(Y)}{\operatorname{Var}(X) + \operatorname{Var}(Y)}$$

and one of the consequences of this is that if X and Y have equal variances, the t that minimizes the variance of Z is t=1/2.
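Before getting into the correlated case, here's a quick numerical check of Cook's result. This is just a sketch in NumPy; the seed, the sample size, and the particular variances (4 and 1) are arbitrary choices of mine, not anything from either post.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

# Independent X and Y with different variances
X = rng.normal(0, 2.0, n)   # Var(X) = 4
Y = rng.normal(0, 1.0, n)   # Var(Y) = 1

# Cook's formula for the optimal weight
t_star = np.var(Y) / (np.var(X) + np.var(Y))   # should be close to 1/5

# Sweep t and see where the sample variance of Z actually bottoms out
ts = np.linspace(0, 1, 101)
variances = [np.var(t*X + (1 - t)*Y) for t in ts]
print(t_star, ts[np.argmin(variances)])        # both near 0.2
```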

You might think that if the variances are equal, it shouldn’t matter what proportions you use for the two random variables, but it does. That’s due in no small part to the independence of X and Y, which is part of the problem’s setup.

A natural question to ask, then, is what happens if X and Y aren’t independent. That’s what we’ll look into here.


First, a little review. The variance of a random variable, X, is defined as

$$\operatorname{Var}(X) = \int_{-\infty}^{\infty} (x - \mu_X)^2\, f_X(x)\, dx$$

where $\mu_X$ is the mean value of X and $f_X(x)$ is its probability density function (PDF). The most familiar PDF is the bell-shaped curve of the normal distribution.

The mean value is defined like this:

$$\mu_X = \int_{-\infty}^{\infty} x\, f_X(x)\, dx$$

People often like to work with the standard deviation $\sigma_X$ instead of the variance. The relationship is

$$\operatorname{Var}(X) = \sigma_X^2$$
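If you want to see these definitions in action, here's a small NumPy sketch that evaluates the mean and variance integrals numerically for a normal distribution. The particular mean and standard deviation are arbitrary choices for illustration.

```python
import numpy as np

mu, sigma = 3.0, 2.0
x = np.linspace(mu - 10*sigma, mu + 10*sigma, 20_001)
dx = x[1] - x[0]

# Normal PDF, written out explicitly
pdf = np.exp(-(x - mu)**2 / (2*sigma**2)) / (sigma*np.sqrt(2*np.pi))

mean = np.sum(x * pdf) * dx                  # ≈ 3.0
var = np.sum((x - mean)**2 * pdf) * dx       # ≈ 4.0, i.e., sigma**2
print(mean, var, np.sqrt(var))               # the last value is the standard deviation
```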

Now let’s consider two random variables, X and Y. They have a joint PDF, $f_{XY}(x, y)$. The covariance of the two is defined like this:

$$\operatorname{Cov}(X, Y) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y)\, f_{XY}(x, y)\, dx\, dy$$

It’s common to express the covariance in terms of the standard deviations and the correlation coefficient, ρ:

$$\operatorname{Cov}(X, Y) = \rho\, \sigma_X \sigma_Y$$

If we were going to deal with more random variables, I’d explicitly include the variables as subscripts to ρ, but there’s no need to in the two-variable situation.

The correlation coefficient is a pure number and is always in this range:

$$-1 \le \rho \le 1$$

A positive value of ρ means that the two variables tend to be above or below their respective mean values at the same time. A negative value of ρ means that when one variable is above its mean, the other tends to be below its mean, and vice versa.
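Here's a short NumPy illustration of these two ideas, using a bivariate normal with unit variances. The correlation value of 0.6 is an arbitrary choice; np.cov and np.corrcoef return the sample covariance and correlation coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.6                                    # arbitrary correlation for illustration
cov_matrix = [[1.0, rho], [rho, 1.0]]        # unit variances, correlation rho
X, Y = rng.multivariate_normal([0, 0], cov_matrix, size=500_000).T

print(np.cov(X, Y)[0, 1])        # ≈ rho * sigma_X * sigma_Y = 0.6
print(np.corrcoef(X, Y)[0, 1])   # ≈ 0.6
```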

If X and Y are independent, their joint PDF can be expressed as the product of two individual PDFs:

$$f_{XY}(x, y) = f_X(x)\, f_Y(y)$$

which means

$$\begin{aligned}
\operatorname{Cov}(X, Y) &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y)\, f_X(x)\, f_Y(y)\, dx\, dy \\
&= \int_{-\infty}^{\infty} (x - \mu_X)\, f_X(x)\, dx \int_{-\infty}^{\infty} (y - \mu_Y)\, f_Y(y)\, dy \\
&= 0
\end{aligned}$$

because of the definition of the mean given above. Cook took advantage of this in his analysis to simplify his equations. We won’t be doing that.


Going back to our definition of Z,

$$Z = tX + (1 - t)Y$$

the variance of Z is

$$\sigma_Z^2 = t^2 \sigma_X^2 + 2t(1 - t)\,\rho\, \sigma_X \sigma_Y + (1 - t)^2 \sigma_Y^2$$

To get the value of t that minimizes the variance, we take the derivative with respect to t and set that equal to zero. This leads to

$$t = \frac{\sigma_Y^2 - \rho\, \sigma_X \sigma_Y}{\sigma_X^2 - 2\rho\, \sigma_X \sigma_Y + \sigma_Y^2}$$

This reduces to Cook’s equation when $\rho = 0$, which is what we’d expect.

At this value of t, the variance of the sum is

$$\sigma_Z^2 = \frac{(1 - \rho^2)\, \sigma_X^2 \sigma_Y^2}{\sigma_X^2 - 2\rho\, \sigma_X \sigma_Y + \sigma_Y^2}$$
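If you'd rather not grind through the calculus by hand, the whole derivation can be checked symbolically. Here's a SymPy sketch; the symbol names are mine, and the printed results may be factored a little differently than the fractions above.

```python
import sympy as sp

t, rho = sp.symbols('t rho', real=True)
sx, sy = sp.symbols('sigma_X sigma_Y', positive=True)

# Variance of Z = t*X + (1 - t)*Y with correlation rho
var_Z = t**2*sx**2 + 2*t*(1 - t)*rho*sx*sy + (1 - t)**2*sy**2

# Set the derivative with respect to t to zero and solve
t_star = sp.solve(sp.diff(var_Z, t), t)[0]
print(sp.simplify(t_star))                  # matches the fraction for t above
print(sp.simplify(t_star.subs(rho, 0)))     # Cook's result: sigma_Y**2/(sigma_X**2 + sigma_Y**2)

# Substitute back and compare with the minimum-variance expression above
min_var = sp.simplify(var_Z.subs(t, t_star))
target = (1 - rho**2)*sx**2*sy**2 / (sx**2 - 2*rho*sx*sy + sy**2)
print(sp.simplify(min_var - target))        # 0
```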

Considering now the situation where $\sigma_Y = \sigma_X$, the value of t that minimizes the variance is

$$t = \frac{\sigma_X^2 - \rho\, \sigma_X^2}{2\sigma_X^2 - 2\rho\, \sigma_X^2} = \frac{1}{2}$$

which is the same result as before. In other words, when the variances of X and Y are equal, the variance of their sum is minimized by having equal amounts of both, regardless of their correlation. I don’t know about you, but I wasn’t expecting that.
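Since I found this surprising, here's a Monte Carlo sanity check: for several correlation values (arbitrary choices of mine) and equal unit variances, the sample variance of Z bottoms out at t ≈ 1/2 every time.

```python
import numpy as np

rng = np.random.default_rng(7)
ts = np.linspace(0, 1, 201)

for rho in (-0.8, -0.3, 0.0, 0.4, 0.9):
    cov = [[1.0, rho], [rho, 1.0]]                    # equal (unit) variances
    X, Y = rng.multivariate_normal([0, 0], cov, size=200_000).T
    variances = [np.var(t*X + (1 - t)*Y) for t in ts]
    print(rho, ts[np.argmin(variances)])              # hovers around 0.5
```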

Although the minimizing value of t doesn’t depend on the correlation coefficient, the minimum variance itself does. The minimum variance of Z when $\sigma_Y = \sigma_X$ is

$$\sigma_Z^2 = \frac{1}{2}(1 + \rho)\, \sigma_X^2$$

A pretty simple result and one that I did expect. When X and Y are positively correlated, their extremes tend to reinforce each other and the variance of Z goes up. When X and Y are negatively correlated, their extremes tend to balance out, and Z stays closer to its mean value.
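And to close the loop, here's a quick check of that formula at t = 1/2. The particular standard deviation and correlation are arbitrary choices; the sample variance of Z should land close to the predicted ½(1 + ρ)σ².

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, rho = 1.5, -0.6                       # arbitrary values for the check
cov = [[sigma**2, rho*sigma**2],
       [rho*sigma**2, sigma**2]]             # equal variances, correlation rho
X, Y = rng.multivariate_normal([0, 0], cov, size=1_000_000).T

Z = 0.5*X + 0.5*Y                            # t = 1/2
print(np.var(Z))                             # sample variance of Z
print(0.5*(1 + rho)*sigma**2)                # predicted minimum: 0.45
```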