Archives: Redux

My recent Archives article was met with some controversy and debate, which is great. I love controversy and debate, and a terrific discussion ensued. That discussion has led me to think a bit harder on my archive plan, and I’d like to follow up on the matter with some of the specifics of said plan, and expand on some of the ideas touched on therein.

It’s Personal
In the Archives post I basically said I’d be archiving all my “non-essential data” to hard drives and reserving optical media archives for only the most essential archives. I should first point out that what I am talking about here is my personal data. This is not necessarily a method I’d use at work or for a client. Archive methods should be specific to the needs of the situation.

The Future
One of my rationales for using hard drives was that hard drives are more likely than optical to be accessible in 10 years with the equipment of the day. It’s this particular idea that received a great deal of criticism, and I’m starting to see why.

Just a few weeks ago I had occasion to archive some museum kiosks that ran from some very old PowerMacs. Luckily, these PowerMacs were just barely of the era when ATA drives were starting to be used as internal drives on Macs, so getting the data off these systems was fairly straightforward: I simply hooked the PowerMacs’ ATA drives up to a FireWire case and archived the data to DMG. Shortly thereafter, I wanted to perform a similar process with a slightly earlier vintage PowerMac. This machine, however, contained a SCSI drive, and finding a way to access and archive this drive proved almost impossible without going to extreme lengths and making obscure hardware purchases. Had there been some kind of optical archive of these systems, I would have almost certainly been able to pull a backup using today’s equipment.

I’m not sure what the future of optical media is. Until recently, I was pretty convinced it was not long for this world and would surely be displaced as a distribution medium by the web. But after thinking on the comments to that article, and talking to people way smarter than me on such matters, I realize I may be wrong. And if that’s the case, optical will be more likely to be readable than hard drives ten years in the future. But whatever the case, this is certainly true for media from ten years ago. You’re more likely to be able to read ten year old optical media than you are hard drives of that era.

Non-Essential Data
That said, I’d like to clarify the “non-essential data” qualifier I tossed into the article. To be clear, I’m not completely eschewing optical media for my archives. What the article represented was my shift from optical as my only form of backup to hard drives as a significant, if not primary, form of data backup and archive.

To get even more specific, in the past I archived everything to optical media. But with the huge amounts of data I now collect, that’s not really so practical anymore, nor is it necessary. So these days the bulk of my data will be archived to hard drive. That means large, non-essential data: things like ripped DVDs, video captures from tape, software installers, and data with a shelf life (i.e. data that is only useful for a period of time or that relies on old versions of software or hardware). This will allow easy storage and retrieval. And it should last long enough. The idea is that this data isn’t forever data. It’s stuff I want to keep around for a while, but if I haven’t needed it in ten years, I probably won’t ever need it again.

More important data — of which there’s really not that much, but stuff like big video projects (sans captured media), photos, my websites, contacts, stuff that would really kill me to lose — I’ll be burning to optical. That way I have double backups of it (I’ll also keep it in the hard drive archive), and I’ll have it on a more robust medium that may have a better chance of being readable than hard drives in the future.

So what’s really going on here, for me, is a prioritization of my data backups that’s reflected in my archive procedures. With this prioritization, I can now rely much more heavily on hard drives as an archive medium. Using hard drives I can back up and access a lot more stuff with much greater ease and speed. Doing this allows me to use optical media only for the most important data. But make no mistake: optical will still be an important component in my backup strategy.
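For the code-minded, that prioritization can be sketched as a tiny rule. This is only an illustration of the policy described above, not a tool I actually run: the `ArchiveItem` type, its `essential` flag and the `archive_targets` function are all hypothetical names made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Medium(Enum):
    HARD_DRIVE = "hard drive"
    OPTICAL = "optical"


@dataclass(frozen=True)
class ArchiveItem:
    name: str
    essential: bool  # hypothetical flag: would losing this really hurt?


def archive_targets(item: ArchiveItem) -> set[Medium]:
    """Return the media an item should be archived to.

    Everything goes to the hard drive archive; essential data is
    additionally burned to optical, giving it a second copy.
    """
    targets = {Medium.HARD_DRIVE}
    if item.essential:
        targets.add(Medium.OPTICAL)
    return targets


if __name__ == "__main__":
    for item in (ArchiveItem("ripped DVDs", False), ArchiveItem("photos", True)):
        media = sorted(m.value for m in archive_targets(item))
        print(f"{item.name}: {', '.join(media)}")
```

The one real decision is the `essential` flag, which is a judgment call made per item. The rule itself just guarantees that essential data always ends up in two places.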

Live Archive
I wanted to also take a minute to mention one way hard drives are somewhat future-proof and useful as a true archive, and this is the idea of a live, rolling archive.

In the lab where I used to work we kept (or tried to keep) a long-term archive of all student work that was accessible to incoming students so that they could look at and benefit from the work of their predecessors. Our students made all sorts of work, from web projects to video and animation projects to installations. And their work was initially being archived to all manner of media, from tape media to optical. There was no standard. By the time I got involved there were projects going back ten or fifteen years, and it was becoming clear that, no matter what medium we used today, we’d need to re-archive everything every so often as data access techniques and hardware evolved.

I believe that, in a case like this, where the archive is constantly growing and reaches back well over ten years, but to which access is always required, the concept of the hard-drive-as-archive-medium is a sound one. The implementation would be fairly simple in concept: everything, the entire archive, is kept on a hard drive to which the community has access. As the archive grows, say every few years, it is transferred to larger storage. As storage standards change, it is transferred to the latest, greatest medium of the day. Of course, redundant backups of the entire archive are also kept.

But since this data is constantly being re-archived, hard drives (or whatever replaces them in the future) make for a sensible way to have a rolling, live archive, and reduce the need for more permanent solutions like optical. Perhaps Chucky, in the comments to Archives, put it best:

“In other words, hard drive archival demands cycling your backups over time to new hard drives with fresh magnetic media and evolving HD interfaces.”
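One cycle of such a rolling archive might look something like the sketch below: copy the whole archive to the new storage, then compare checksums so the old drive is only retired once every file is confirmed intact. This is a minimal illustration using Python’s standard library, not the process we actually used in the lab; the function names are hypothetical, and the checksum verification step is my own assumption about what a careful migration would include.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large media files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def migrate_archive(old_root: Path, new_root: Path) -> list[str]:
    """Copy the archive to new storage and verify it.

    Returns the relative paths that are missing or differ on the new
    storage; an empty list means the copy verified cleanly.
    Note: shutil.copytree expects new_root not to exist yet.
    """
    shutil.copytree(old_root, new_root)
    before = snapshot(old_root)
    after = snapshot(new_root)
    return [rel for rel, digest in before.items() if after.get(rel) != digest]
```

Calling `migrate_archive(Path("/Volumes/OldArchive"), Path("/Volumes/NewArchive/archive"))` (both paths are made-up examples) would return a list of problem files. The old drive should only be wiped or retired when that list comes back empty and the redundant backups have been refreshed too.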

I guess the overarching lesson here, if there is one, is that your archive method should reflect the specifics of your situation; there is no one archive method for everyone. The corollary to that, for me, is that hard drives can (and will) now be a significant part of my archive method.

6 Comments

  1. Posted November 15, 2009 at 7:09 PM | Permalink

    Very interesting arguments here. Looking over the past 25 years, optical media always kept backward compatibility. Modern Blu-ray drives are still able to read CDs, DVDs and DL-DVDs. On the other side, HD interfaces are usually incompatible with each other, like SCSI, ATA and SATA.

    Very good post, like always.

    Regards,
    Rodrigo

  2. Posted November 16, 2009 at 9:31 AM | Permalink

    Yes, I may have erred slightly in my assumptions about the death of optical. Thanks be to the good old TASB community for righting my course. Being able to have perspective-altering discussions about this stuff is one of my favorite bonuses about having this site.

    -systemsboy

  3. Matt Smith
    Posted November 17, 2009 at 5:55 AM | Permalink

    I read the post preceding this, but didn’t comment. Personally, for home
    stuff (and by home stuff I also mean a lot of private work I do for
    people, contracts, video stuff, web stuff, a hell of a lot of personal
    photography, and time machine backups for 3/4 macs) I use something
    called a Drobo.

    It currently has 2x 1.5TB drives and a couple of 750GB drives in it. If
    you’re not familiar with the Drobo it’s a kind of ‘magic’ RAID (they
    call it BeyondRAID). It has many of the properties of RAID (perhaps not
    speed, at least with the standard Drobo), but you can put any sized
    drive in. The unit takes 4 drives, you can start off with just two. As
    one drive gets full you just eject that and stick in a larger one. The
    Drobo then goes about copying any data so that it’s all resilient. The
    Drobo has a hard limit of 16TB, but that’s a hell of a lot of storage in
    a fairly small box.

    Now obviously this is stuck with the SATA interface, but perhaps the
    more useful feature is that it converts that into USB/FW. For the
    foreseeable future (5 years I guess) you’ll be able to get SATA drives,
    and when you can’t you’ll just be able to keep this as a resilient
    boxed archive (perhaps keep one/two spare drives on hand for long term
    archival purposes in case a drive fails and you can’t purchase them
    anymore). As drives change, just buy a similar device with whatever
    interface drives use then and carry on like that. As long as FW/USB
    remains backwards compatible you’ll not have to re-archive.

    Interestingly Sun are heading towards SSD type storage for long-term /
    archival storage. Their thinking behind this is that SSD is far more
    resilient to corruption issues, and for large archival purposes there’s
    a huge net benefit of lower electricity costs (HDDs keep spinning,
    regardless of whether the data’s being accessed). I think if Sun are
    planning on moving to drives as opposed to optical / tape then it’s a
    pretty sure bet that that’s what everyone will be doing (at least in
    the Enterprise).

  4. Posted November 17, 2009 at 12:57 PM | Permalink

    I keep hearing about Drobos, and they sound promising for certain applications. I’ve heard about some people having problems with them, but those may have been in the early days of the device. I’ve heard a lot of good buzz as well, and I’ve considered them from time to time. I’m not sure I could quite justify the expense of one yet; I just don’t have that much data. I’m also not convinced it would make a great longer-term solution, as you mentioned. I would add that a RAID device might not be totally appropriate for longer term archiving because of the trouble and expense of reconstructing the RAID, and because of the interdependency of the drives for said reconstruction.

    Nonetheless, Drobo remains on my radar as an intriguing short-term backup solution for certain situations, and mine could certainly turn into one of those situations someday.

    Thanks for the info. I find the bit about Sun really intriguing.

    -systemsboy

  5. Matt Smith
    Posted November 18, 2009 at 5:29 AM | Permalink

    Well the good thing about a Drobo (besides its expense) is that as long as they’re SATA drives, it doesn’t matter what RPM, what manufacturer, what capacity etc. It will take any SATA disk.

    I had a 1.5TB drive go (when I only had the one). The data auto secured itself over the remaining 3 750GB drives (thankfully I just about had enough capacity with those 3 drives). That took about 19 hours. I replaced the faulty 1.5TB drive, and it secured all the data again (this took about 48 hours). All the time the Drobo is useable. I believe technically it’s slower but I didn’t really notice.

    That’s why I think it’s an ideal long term storage solution, as long as you have a spare SATA drive to protect against long-term failure then I think it’s an ideal, although perhaps costly, solution.

  6. Posted November 19, 2009 at 2:16 AM | Permalink

    I recently came across your blog and have been reading along. I thought I would leave my first comment. I don’t know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often.
