On the importance of backups

I’m a bit paranoid about backing up, particularly when it comes to the crown jewels, i.e. my long-term research projects. Or, at least, I *thought* I was paranoid about backing up. It turns out I’m not. Consider the following two maxims of backups:

  • If you need your backups, you’re already in a compromised situation.
  • A backup is not a backup until you’ve seen a full restore.

My backup strategy is to take the content from my laptop and push it onto my home NAS box. I normally use a nice GUI program such as Deja Dup, which uses an implementation of the venerable rsync underneath. Then for the long-term research projects I use the git version control system (I also use darcs and have largely given up on bzr, due to SCM proliferation). I do a “git push” of all my projects from my laptop to my home NAS and to a server in Ireland. This is a paranoid strategy… or is it?
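As a rough sketch of the “push every project to more than one place” habit, here is how it might be scripted in Python. The remote names `nas` and `ireland` are my own placeholders (this post never names its remotes), and the script assumes each remote has already been added with `git remote add`. It only builds and runs plain `git push --all` commands, nothing cleverer.

```python
import subprocess
from typing import List

# Hypothetical remote names: stand-ins for the home NAS and the server in Ireland.
REMOTES = ["nas", "ireland"]


def push_commands(repo_dir: str, remotes: List[str]) -> List[List[str]]:
    """Build one `git push --all` command per remote for the repository at repo_dir."""
    return [["git", "-C", repo_dir, "push", "--all", remote] for remote in remotes]


def push_everywhere(repo_dir: str, remotes: List[str]) -> List[str]:
    """Run every push and return the names of the remotes whose push failed."""
    failed = []
    for cmd in push_commands(repo_dir, remotes):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # The remote name is the last element of the command.
            failed.append(cmd[-1])
    return failed
```

Calling `push_everywhere("/path/to/project", REMOTES)` (a placeholder path) returns an empty list only when every remote accepted the push. A non-empty list is the cue to go and look before the next disk dies.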

This last week the disk in my home NAS box died. I’ve since replaced it, but have not had the time (or an external 3.5″ SATA caddy) to recover the data. The disk is foobar’ed, so I’m expecting only a partial recovery. That’s fine. I have all my data on my laptop. However, bang! Maybe it was the cold, or just plain entropy, but my laptop has developed a hardware fault. Thankfully, it’s not a disk fault. If it were, I’d have lost my entire music collection, but not much else.

You see, when my NAS drive failed my paranoia kicked in. I asked my mate Neal to give me ssh access to his home NAS for my most important long-term research projects. I pushed up to his system, and to the system in Ireland. But think about it… if I had had a disk fault on my laptop while I was already in a compromised situation, I could have lost a lot of important data. Two independent random events were already conspiring against me, so why not a third? There was a window of a day between my having no NAS storage and my laptop failing. If I had replaced that disk a day later, I could have lost some data. Less important data, but data costs money.

All is well. I have all my data. Even my music collection (which isn’t too important with respect to my 8 years of research work). I’m also confident in my backups, as every time I use git to back up a research project I can verify, using “git log”, that all my data *is* actually backed up. There have been many occasions on which friends or colleagues have told me that they assumed their backups were working. They lost data because they never did a systematic full restore; they never verified that their backup worked by *actually* trying to recover some data. Hopefully that’ll never happen to me, but you can’t be too complacent… or too paranoid.
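The “seen a full restore” maxim can be checked mechanically for non-git data too. What follows is only a minimal sketch, assuming you have restored a backup into a scratch directory next to the original: it hashes every file in both trees with SHA-256 and reports anything missing or different. The function names are my own invention, not part of any backup tool.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def tree_digests(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    base = Path(root)
    digests = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            rel = path.relative_to(base).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def restore_problems(original: str, restored: str) -> List[str]:
    """List files from the original tree that are missing or altered in the restore."""
    want = tree_digests(original)
    got = tree_digests(restored)
    problems = []
    for rel in sorted(want):
        if rel not in got:
            problems.append(f"missing: {rel}")
        elif got[rel] != want[rel]:
            problems.append(f"differs: {rel}")
    return problems
```

An empty list from `restore_problems(original_dir, restored_dir)` means every original file came back byte for byte. It reads whole files into memory, so for something like a large music collection you would want to hash in chunks instead.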

2 Responses to “On the importance of backups”

  1. Aindriu says:

    Wholeheartedly agreed. A senior IT engineer I worked with had what to me seemed an overcautious attitude to backing up. Before making changes on systems he would do incremental images. He would drop data to USB sticks and take extra images to removable hard drives, then plug them into different machines to make sure they were readable. He would export registries and test reimporting.

    It took me a year to realise that what he was doing was just good sense. You don’t have any kind of RAID on this NAS machine? From the sounds of it, even RAID 1 would have been a good investment here.

  2. balor says:

    It’s a cheap NAS box. Built that way on purpose. It’s a fanless Mini-ITX system with a big disk and very small power consumption. I’ve got a 2.5″ HD in it too, so _really_ important stuff gets backed up to that. It’s a balance of paranoia and cost. I really didn’t want another ~10W disk in the machine.
