Backups, virtualization and shared storage

In practice, backup operations are usually dictated by corporate data retention policies, such as: “We must have a recovery window of N days, and we must store the backups off-site.” It is rare, in my experience, for an organization to actually specify the required media (e.g., tape) or where the off-site location needs to be. Typically, though, backups are written to tape on-site and then moved to an off-site location soon after.

With the adoption of virtualized environments (that live on shared storage), these operations took an odd turn. Many organizations are still backing up their virtual machines via agent/server backup applications: EMC/Legato NetWorker, IBM Tivoli, Symantec BackupExec, etc. And they run the same way on virtual machines as they do on physical machines: you install an agent on the (virtual machine) client, and the backup server connects to that agent and sucks a (full) copy out of the virtual machine. You may run a full backup every night, or incrementals, but either way you’re accomplishing the same thing: you’re taking the contents of the VM, which lives on shared storage, out through the VM itself.

My question is: why not just back up the virtual machine where it sits? Why not just tell your Filer to back it up primarily? A .vmdk is the same thing no matter where or how it sits.

A couple of weeks ago I had a customer (a very big organization) remark that the single biggest workload in their virtualized environment was backups. That is, their Filers see the most IOPS and their switches see the most traffic at 4 a.m. every morning, when they kick off their system backups. This is because each VM is backed up via an agent, and the backup server pulls the entire contents of the virtual machine out through the virtual machine itself.
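To get a rough feel for why that 4 a.m. window dominates, here is a back-of-envelope sketch. Every number in it (VM count, VM size, daily change rate) is made up for illustration; none of it comes from the customer above.

```python
# All figures are hypothetical, chosen only to illustrate the arithmetic.
VM_COUNT = 500            # assumed number of virtual machines
VM_SIZE_GB = 40           # assumed average size of each VM
DAILY_CHANGE_RATE = 0.02  # assumed fraction of data that changes per day


def agent_full_backup_gb(vm_count, vm_size_gb):
    """Data read from shared storage and sent over the network by a nightly full."""
    return vm_count * vm_size_gb


def changed_data_gb(vm_count, vm_size_gb, change_rate):
    """Data that actually changed since the previous day."""
    return vm_count * vm_size_gb * change_rate


full = agent_full_backup_gb(VM_COUNT, VM_SIZE_GB)
delta = changed_data_gb(VM_COUNT, VM_SIZE_GB, DAILY_CHANGE_RATE)

print(f"nightly full via agents: {full:,.0f} GB")
print(f"data that changed:       {delta:,.0f} GB")
```

With these assumed inputs, the agents drag every gigabyte through the VMs and the network each night, while only a small slice of it is actually new.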

This problem isn’t limited to virtual machines, either. Many organizations that deploy SQL Server (or, to a lesser extent, Oracle) are running full backups of the actual SQL databases, even though those databases are living on shared storage. Again, while this is feasible and effective, it is hardly efficient.

My question is: why not just back up the SQL Server (or Oracle) database where it sits? Why not just tell your Filer to back it up primarily? An SQL database is the same no matter where it sits.

In the case of NetApp (my employer), it doesn’t matter if you’re using Oracle Database over NFS on Linux or SQL Server over FC on Windows: your database is living in WAFL, in a whole big series of 4KB blocks. And because we can take snapshots of anything stored on WAFL no matter how you access it, why not just take a snapshot of the database? It’s just a bunch of blocks, right? After all, the Filer doesn’t care if it’s a chunk of SQL or a chunk of VMDK or a chunk of Word document. Blocks are blocks are blocks, and if it’s in a block we can snap it (and we can dedupe it).
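The “blocks are blocks” idea can be sketched with a toy model. To be clear, this is not how WAFL is implemented; `ToyVolume` and everything in it are hypothetical names I made up to show the general shape of the idea: files are lists of pointers to 4KB blocks, a snapshot copies the pointers rather than the data, and identical blocks are stored once.

```python
import hashlib

BLOCK_SIZE = 4096  # 4KB, matching the block size mentioned above


class ToyVolume:
    """Hypothetical toy model of a block store; NOT how WAFL actually works."""

    def __init__(self):
        self.blocks = {}     # digest -> block data, each unique block stored once
        self.files = {}      # file name -> list of block digests
        self.snapshots = {}  # snapshot label -> frozen copy of the pointer table

    def write(self, name, data):
        digests = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # dedupe identical blocks
            digests.append(digest)
        self.files[name] = digests

    def snapshot(self, label):
        # Copies only the pointer lists, never the block data itself.
        self.snapshots[label] = {n: list(d) for n, d in self.files.items()}

    def read(self, name, snapshot=None):
        table = self.files if snapshot is None else self.snapshots[snapshot]
        return b"".join(self.blocks[d] for d in table[name])


vol = ToyVolume()
vol.write("db.mdf", b"A" * BLOCK_SIZE * 3)
vol.write("vm.vmdk", b"A" * BLOCK_SIZE * 2 + b"B" * BLOCK_SIZE)
print(len(vol.blocks))  # 6 logical blocks, but only 2 unique ones stored

vol.snapshot("nightly")
vol.write("db.mdf", b"C" * BLOCK_SIZE * 3)  # overwrite after the snapshot
print(vol.read("db.mdf", snapshot="nightly") == b"A" * BLOCK_SIZE * 3)  # True
```

The toy volume doesn’t know or care whether a block came from a database file or a virtual disk, which is the whole point. It also never frees blocks, so a real system needs more bookkeeping than this.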

So why not just take a snapshot of it? One common objection is “Well, a snapshot is fine, but I have to store them off-site.” Okay, cool, I understand; I was the same way. So let’s take that snapshot and move it off-site, to another Filer in a different building or a different city or a different country. “Well, that sounds okay, but my auditor tells me it has to be read-only.” Okay, cool, I understand; I was the same way. So let’s take that snapshot and lock it as read-only. “Well, that sounds okay too, but my CIO tells me it has to live for 7 years.” Okay, cool, I understand; I was the same way. So let’s take that snapshot and vault it.

What I’m trying to get at is, the policies you’re living inside of (in this case, backup policies) shouldn’t dictate the technologies you use. Needing off-site backups doesn’t mean you’re beholden to tape. You should, under any policy at any organization, use the best tool or technology for the job. In the case of virtualized environments living on shared storage, what is the best tool? What is the best technology? If the data are blocks living on a Filer somewhere, why not just back them up where they belong: on the Filer itself?
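One way to see the separation between policy and technology is to write the policy down as data. The sketch below is purely illustrative: `BackupPolicy`, `BackupCopy`, `satisfies`, and the site names are all hypothetical, not part of any real product. The policy captures the three requirements from the conversation above (off-site, read-only, kept for 7 years), and notice that it has no field for the medium.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class BackupPolicy:
    """What the policy demands. Note there is no 'medium' field."""
    offsite: bool
    read_only: bool
    retention_days: int


@dataclass(frozen=True)
class BackupCopy:
    """One stored copy, on any medium (tape, replicated snapshot, ...)."""
    medium: str
    site: str
    read_only: bool
    created: date
    expires: date


def satisfies(copy, policy, primary_site):
    """Check a copy against the policy without ever looking at its medium."""
    if policy.offsite and copy.site == primary_site:
        return False
    if policy.read_only and not copy.read_only:
        return False
    kept_for = (copy.expires - copy.created).days
    return kept_for >= policy.retention_days


policy = BackupPolicy(offsite=True, read_only=True, retention_days=7 * 365)
created = date(2010, 1, 1)  # arbitrary example date

vaulted_snapshot = BackupCopy(
    medium="snapshot", site="dr-site", read_only=True,
    created=created, expires=created + timedelta(days=7 * 365),
)
local_snapshot = BackupCopy(
    medium="snapshot", site="hq", read_only=False,
    created=created, expires=created + timedelta(days=30),
)

print(satisfies(vaulted_snapshot, policy, primary_site="hq"))  # True
print(satisfies(local_snapshot, policy, primary_site="hq"))    # False
```

A tape in a vault and a locked, replicated snapshot would pass or fail the same checks. The policy cares about where the copy lives, whether it can be altered, and how long it is kept.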


One thought on “Backups, virtualization and shared storage”

  1. Are you saying to keep the local backup on the same disk array as the primary?

    A typical problem is where an off-siting process needs to take place, be that replicating the filer to remote site B or some other form of copy to a remote site.

    These remote copy processes have a high rate of failure (bandwidth, the WAN is down, or the “sneaker net” isn’t working today).

    If you are backing up to the same physical local system, you could be in a situation where you have not moved your data off-system or off-site for up to 48 hours.

    I would not feel comfortable with my data residing on a single system for 48 hours at a single site.

    If the data was copied to an entirely different system at the same site (a backup server, or another entirely separate disk system) and hasn’t been off-sited for 48 hours, that’s not ideal, but it feels a lot better than holding data, snapshots, etc. within a single system.

