Backup

Based on the responses to this post, I ordered two 1 TB hard disks from Amazon, one to replace the bootable SuperDuper! backup drive that died a week ago and the other to act as a Time Machine backup.[1] They arrived yesterday and are now sitting near the back edge of my desk.

Two new hard drives

There’s nothing special about them—just a couple of cheapo Western Digital USB 2 drives. I thought about getting FireWire, but

  1. Apple is clearly moving away from FireWire with this new Thunderbolt thing; and
  2. my week’s experience with the portable drive (which is USB) showed that USB 2 was fast enough for incremental backups.

So the extra expense of FireWire was unnecessary.

Was it wise to buy two identical drives? Normally, I’d say no. The usual concern with backup systems is simultaneous failure. If we have two disks, A and B, the probability of both failing is

$$P(AB) = P(A \mid B)\,P(B)$$

where $P(A \mid B)$ is the conditional probability that A fails given that B fails.

If the disks are made by different manufacturers, it’s reasonable to believe they’re statistically independent, so

$$P(A \mid B) = P(A)$$

In words, this means that the failure of B has no effect on the likelihood of a failure of A. Thus,

$$P(AB) = P(A)\,P(B)$$

If, on the other hand, the disks are the same make and model, one would expect that failure of one makes failure of the other more likely:

$$P(A \mid B) > P(A)$$

This isn’t an expression of direct causality. The failure of drive B doesn’t cause the failure of A, but since the designs are the same, whatever conditions led to B’s failure are going to increase the probability of A’s failure.

So with drives from the same manufacturer, you get a higher probability of both failing:[2]

$$P(AB) = P(A \mid B)\,P(B) > P(A)\,P(B)$$
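To put rough numbers on that inequality, here is a minimal Python sketch. The probabilities are made up purely for illustration (they are not measured failure rates for any drive), and the function name `joint_failure` is just a label for the product rule.

```python
def joint_failure(p_a_given_b: float, p_b: float) -> float:
    """Product rule: P(AB) = P(A|B) * P(B)."""
    return p_a_given_b * p_b


# Hypothetical values, chosen only to illustrate the comparison.
p_a = 0.05          # assumed chance that drive A fails in some period
p_b = 0.05          # assumed chance that drive B fails in the same period
p_a_given_b = 0.20  # assumed: identical drives, so B failing raises A's odds

# Different manufacturers: independence means P(A|B) = P(A).
independent = joint_failure(p_a, p_b)

# Same make and model: P(A|B) > P(A).
correlated = joint_failure(p_a_given_b, p_b)

print(f"independent drives: {independent:.4f}")
print(f"identical drives:   {correlated:.4f}")
```

With these assumed inputs, the chance of losing both drives is four times higher for the identical pair, simply because the conditional probability is four times the unconditional one.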

Knowing this, why did I buy two of the same type of drive? First, I’ve had good experience with Western Digital disks, so I consider their probabilities of failure to be lower than those of other drives in the same price range.[3] Second, the two drives aren’t really doing the same thing, so while they are seeing the same temperature and humidity, they’re not being worked through the same duty cycles; this makes them more independent than drives being used for exactly the same purpose.

And third, they look nicer in a stack when they’re the same model.

One problem with two identical, featureless black boxes is that I would soon forget which was which. Hence the little labels. They’re printed on Avery 5167 labels, the kind usually used for return addresses. Avery makes template files available for all their label designs, but they’re built for MS Word, which does me no good. Fortunately, I have a little Perl script, called ptlabels, that I wrote last year when I needed to quickly make hundreds of sequentially numbered labels for test samples. The advantage of using ptlabels is that I can just make this little text file,

#Backup|1 TB
April 20, 2011

#Time Machine|1 TB
April 20, 2011

pass it as input to ptlabels, and watch the labels come out of my printer. Even if I had a copy of Word, this would be easier than using a template.
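For a sense of how little structure an input file like that needs, here is a small Python sketch that splits it into one block of lines per label. This is not ptlabels (which is a Perl script and isn't shown here). The only assumption is that a blank line separates one label from the next, and since the meaning of the `#` and `|` characters isn't spelled out, the sketch passes them through untouched. The name `parse_labels` is hypothetical.

```python
def parse_labels(text: str) -> list[list[str]]:
    """Split label input into blocks, assuming blank lines separate labels."""
    labels: list[list[str]] = []
    current: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line:
            current.append(line)    # another line of the current label
        elif current:
            labels.append(current)  # a blank line closes the current label
            current = []
    if current:
        labels.append(current)      # last label when there is no trailing blank line
    return labels


SAMPLE = """\
#Backup|1 TB
April 20, 2011

#Time Machine|1 TB
April 20, 2011
"""

for number, label in enumerate(parse_labels(SAMPLE), start=1):
    print(number, label)
```

Everything beyond this step (fonts, placement on the sheet, whatever `#` and `|` trigger) is the script's business and is left out.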

I don’t know how useful it was to put the date on the labels, but I had the room and figured it might come in handy some day. And it makes me look like I’m really organized.


  1. I wasn’t going naked for the week. I had a portable drive I enlisted to temporarily act as the SuperDuper! backup while I decided what to do. 

  2. There’s a lot more about conditional probabilities in this post about Bayes’ Theorem.

  3. I recognize that this may be simple fantasy, but there’s a limit to how much research I feel like doing.