Getting started with Clustered Data ONTAP & FC storage

A couple of days ago, I helped a customer get their cDOT system up and running using SAN storage. They had inherited a Cisco MDS switch running NX-OS and were having trouble getting their devices to log in to the fabric.

As you may know, Clustered Data ONTAP requires NPIV when using Fibre Channel storage, i.e., hosts connecting to a NetApp cluster via the Fibre Channel protocol. NPIV is N-Port ID Virtualization for Fibre Channel. It should not be confused with NPV, which is simply N-Port Virtualization. Scott Lowe has an excellent blog post comparing and contrasting the two.

NetApp uses NPIV to abstract the underlying hardware (i.e., FC HBA ports) away from the client-facing interfaces (i.e., Storage Virtual Machine logical interfaces). The use of logical interfaces, or LIFs, lets us carve a single physical HBA port into many logical ports. It also gives each LIF its own WWPN, distinct from the physical port's. This is particularly useful when it comes to zoning. If you buy an HBA today, you'll create your FC zones based on the LIF WWPNs and not the HBA's, so replacing that HBA later won't force you to rezone.

For example, I have a two-node FAS3170 cluster, and each node has two FC ports (only the first node is shown here):

dot82cm::*> fcp adapter show -fields fc-wwnn,fc-wwpn
node       adapter fc-wwnn                 fc-wwpn                 
---------- ------- ----------------------- ----------------------- 
dot82cm-01 2a      50:0a:09:80:89:6a:bd:4d 50:0a:09:81:89:6a:bd:4d 
dot82cm-01 2b      50:0a:09:80:89:6a:bd:4d 50:0a:09:82:89:6a:bd:4d 

(Note that this command must be run at an elevated privilege level in the clustershell, hence the *> prompt.) The LIFs, however, have their own WWPNs, thanks to NPIV:

dot82cm::> net int show -vserver vs1
  (network interface show)
            Logical    Status     Network            Current       Current Is
Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home
----------- ---------- ---------- ------------------ ------------- ------- ----
                         up/up    20:05:00:a0:98:0d:e7:76 
                                                     dot82cm-01    2a      true
                         up/up    20:06:00:a0:98:0d:e7:76 
                                                     dot82cm-01    2b      true

So, I have one Vserver (sorry, SVM!) with four LIFs, two of which are shown above. Suppose I remove my dual-port 8Gb FC HBA and replace it with, say, a dual-port 16Gb FC HBA. The WWPNs on the LIFs attached to the SVM will not change. So when you zone your FC switch, you'll use the LIF WWPNs.
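Here is a minimal Python sketch that makes the "zone on LIF WWPNs, not HBA WWPNs" rule concrete by classifying candidate zone members. The WWPN sets are copied from the outputs above. The function names `normalize` and `classify_zone_members` are hypothetical helpers for illustration, not part of any NetApp or Cisco tool.

```python
"""Hypothetical helper: flag zone members that use physical HBA WWPNs
instead of the NPIV-backed LIF WWPNs (values copied from the outputs above)."""

# Physical FC port WWPNs (from `fcp adapter show`); these change if the HBA is swapped.
HBA_WWPNS = {"50:0a:09:81:89:6a:bd:4d", "50:0a:09:82:89:6a:bd:4d"}

# LIF WWPNs (from `network interface show`); these survive an HBA replacement.
LIF_WWPNS = {"20:05:00:a0:98:0d:e7:76", "20:06:00:a0:98:0d:e7:76"}


def normalize(wwpn: str) -> str:
    """Lower-case and colon-separate a WWPN so '500A0981...' and '50:0a:09:81...' compare equal."""
    digits = wwpn.replace(":", "").replace("-", "").lower()
    if len(digits) != 16 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a 64-bit WWPN: {wwpn!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


def classify_zone_members(members):
    """Map each member WWPN to 'lif' (good), 'hba' (rezone it!), or 'other' (e.g. a host)."""
    result = {}
    for raw in members:
        wwpn = normalize(raw)
        if wwpn in LIF_WWPNS:
            result[wwpn] = "lif"
        elif wwpn in HBA_WWPNS:
            result[wwpn] = "hba"
        else:
            result[wwpn] = "other"
    return result


if __name__ == "__main__":
    # The two pwwn members of vsan101-zone, plus a mistaken HBA WWPN for contrast.
    candidates = ["20:00:00:25:b5:00:00:1a", "20:05:00:a0:98:0d:e7:76", "500A0981896ABD4D"]
    for wwpn, kind in classify_zone_members(candidates).items():
        print(wwpn, kind)
```

Anything tagged `hba` is a zone that would break the day that HBA is replaced.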

Speaking of FC switches, let's look at what we need. I'm using a Cisco Nexus 5020 in my lab, which means I'll need the NPIV (not NPV!) feature enabled. Verifying that is pretty simple:

nane-nx5020-sw# show feature | i npiv
npiv                  1         enabled

That's pretty much it. For a basic fabric configuration on a Nexus, you need the following to work with clustered Data ONTAP:

  1. The NPIV feature, enabled
  2. A Virtual Storage Area Network, or VSAN
  3. A zoneset
  4. A zone

I'm using VSAN 101; in most environments the default is VSAN 1. I have a single zoneset, which contains a single zone, and I'm using aliases to make the zone slightly easier to manage.

Here is the zoneset:

nane-nx5020-sw# show zoneset brief vsan 101
zoneset name vsan101-zoneset vsan 101
  zone vsan101-zone

You can see that the zoneset is named vsan101-zoneset, and it’s in VSAN 101. The member zone is rather creatively named vsan101-zone. Let’s look at the zone’s members:

nane-nx5020-sw# show zone vsan 101
zone name vsan101-zone vsan 101
  fcalias name ucs-esxi-1-vmhba1 vsan 101
    pwwn 20:00:00:25:b5:00:00:1a
  fcalias name dot82cm-01_fc_lif_1 vsan 101
    pwwn 20:05:00:a0:98:0d:e7:76

Note that the zone has two members defined by aliases (the ESXi host's HBA port and the cluster's FC LIF), and that each alias contains the relevant WWPN. Make sure you commit your zone changes and activate your zoneset!
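The switch-side pieces listed earlier (NPIV, VSAN, zoneset, zone) can be expressed as NX-OS configuration. As a rough sketch, the Python below prints commands that would rebuild the VSAN 101 setup shown above. It assumes `feature npiv`, `vsan database`, `fcalias name ... vsan ...`, `member pwwn`, `zone name`, `zoneset name`, `zoneset activate` and `zone commit` behave as on a typical Nexus 5000 or MDS; check them against your NX-OS release. Assigning switch interfaces to the VSAN is left out, and the function name `nxos_zoning_config` is hypothetical.

```python
def nxos_zoning_config(vsan, zoneset, zone, aliases):
    """Return NX-OS config lines for one VSAN, one zone built from fcaliases, and one zoneset.

    aliases is a list of (alias_name, pwwn) tuples; every alias becomes a zone member.
    Assigning switch interfaces to the VSAN is deliberately left out.
    """
    lines = ["feature npiv", "vsan database", f"  vsan {vsan}", "exit"]
    # One fcalias per WWPN, so the zone can refer to readable names.
    for name, pwwn in aliases:
        lines += [f"fcalias name {name} vsan {vsan}", f"  member pwwn {pwwn}", "exit"]
    # A single zone containing every alias.
    lines.append(f"zone name {zone} vsan {vsan}")
    lines += [f"  member fcalias {name}" for name, _ in aliases]
    lines.append("exit")
    # A zoneset holding that zone, which is then activated.
    lines += [f"zoneset name {zoneset} vsan {vsan}", f"  member {zone}", "exit"]
    lines.append(f"zoneset activate name {zoneset} vsan {vsan}")
    # Assumed to apply only when the VSAN runs in enhanced zoning mode.
    lines.append(f"zone commit vsan {vsan}")
    return lines


if __name__ == "__main__":
    config = nxos_zoning_config(
        101,
        "vsan101-zoneset",
        "vsan101-zone",
        [
            ("ucs-esxi-1-vmhba1", "20:00:00:25:b5:00:00:1a"),
            ("dot82cm-01_fc_lif_1", "20:05:00:a0:98:0d:e7:76"),
        ],
    )
    print("\n".join(config))
```

Generating the text rather than typing it makes it easy to diff against `show running-config` before pasting anything into the switch.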

Once you've configured your switch appropriately, you need to do four things from the NetApp perspective:

  1. Create an initiator group
  2. Populate that initiator group with the host’s WWPNs
  3. Create a LUN
  4. Map the LUN to the relevant initiator group
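The four steps above correspond to a handful of clustershell commands. The Python below is a minimal sketch that builds them for the names used later in this post (vs1, vm5_fcp_igrp, /vol/vm5_fcp_volume/vm5_fcp_lun1). It assumes clustered Data ONTAP 8.2-style syntax for `igroup create`, `igroup add`, `lun create` and `lun map`, and that the volume already exists. Verify the parameters on your release. The function name is hypothetical.

```python
def ontap_fcp_provisioning(vserver, igroup, initiators, lun_path, size, ostype="vmware"):
    """Return clustershell commands for the four NetApp-side steps, in order."""
    # 1. Create the initiator group; the OS type drives host-specific settings such as ALUA.
    cmds = [f"igroup create -vserver {vserver} -igroup {igroup} -protocol fcp -ostype {ostype}"]
    # 2. Populate it with the host's WWPNs (one command per initiator).
    cmds += [
        f"igroup add -vserver {vserver} -igroup {igroup} -initiator {wwpn}"
        for wwpn in initiators
    ]
    # 3. Create the LUN inside an existing volume.
    cmds.append(f"lun create -vserver {vserver} -path {lun_path} -size {size} -ostype {ostype}")
    # 4. Map the LUN to the initiator group.
    cmds.append(f"lun map -vserver {vserver} -path {lun_path} -igroup {igroup}")
    return cmds


if __name__ == "__main__":
    for cmd in ontap_fcp_provisioning(
        "vs1",
        "vm5_fcp_igrp",
        ["20:00:00:25:b5:00:00:1a"],
        "/vol/vm5_fcp_volume/vm5_fcp_lun1",
        "250GB",
    ):
        print(cmd)
```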

When creating your initiator group, you'll need to select a host OS type; this ensures the correct ALUA settings, among other things. After the initiator group is populated, it should look something like this:

dot82cm::> igroup show -vserver vs1
Vserver   Igroup       Protocol OS Type  Initiators
--------- ------------ -------- -------- ------------------------------------
vs1       vm5_fcp_igrp fcp      vmware   20:00:00:25:b5:00:00:1a

We're almost there! Now all we need to do is map the LUN to the initiator group. I've already done this for one LUN:

dot82cm::> lun show -vserver vs1
Vserver   Path                            State   Mapped   Type        Size
--------- ------------------------------- ------- -------- -------- --------
vs1       /vol/vm5_fcp_volume/vm5_fcp_lun1 
                                          online  mapped   vmware      250GB

We can see that the LUN is mapped, but how do we know which initiator group it’s mapped to?

dot82cm::> lun mapped show -vserver vs1
Vserver    Path                                      Igroup        LUN ID  Protocol
---------- ----------------------------------------  ------------  ------  --------
vs1        /vol/vm5_fcp_volume/vm5_fcp_lun1          vm5_fcp_igrp       0  fcp

Now we have all the pieces in place! We have a Vserver (or SVM), vs1. It contains a volume, vm5_fcp_volume, which in turn contains a single LUN, vm5_fcp_lun1. That LUN is mapped to an initiator group called vm5_fcp_igrp of type vmware, over protocol FCP. And that initiator group contains a single WWPN that corresponds to the WWPN of my ESXi host.

Clear as mud?


Is VMware forcing the homogenization of enterprise storage?

Stephen Foskett wrote this article about possible futures of enterprise storage; in turn it was further examined by Scott Lowe here.

Stephen's article poses this question: is VMware forcing the homogenization of enterprise storage? (If so, will that come at the expense of its corporate parent, EMC?) Is VMware's progress in technologies like VMFS and snapshots leading customers to make different purchasing decisions, away from traditional storage vendors? My answer is: no. At least, not in a significant fashion.

For VMware's storage offerings to prosper, they need to supplant existing storage vendors that don't offer significant value-adds. Those value-adds can come in many forms, though: great hardware, great software, or perhaps even great pricing. The companies that don't offer anything strategic or unique are the ones likely to fall by the wayside, because theirs are the products VMware can “replace” with its own offerings.

It's important, then, to keep in mind the key differentiators of various storage vendors. If a vendor has a product or solution that fills a particular niche better than anyone else, it's likely to survive, even with VMware competing as a software vendor in its own right. However, as with any performer in any market, if you make a poor product in a competitive industry, you're going to be replaced. Only one thing makes VMware's offerings uniquely potent: they can sometimes better leverage the vSphere platform, because VMware literally wrote the APIs they're hooking into.

Migrating vCenter Server from SQL Server Express to SQL Server Standard

Under vSphere 4, we were using SQL Server Express 2005. When we upgraded to vSphere 5, we kept the same database (even though vSphere 5 comes bundled with SQL Server Express 2008). However, we had long since surpassed 5 hosts, the most VMware supports on SQL Server Express. So VMware suggested we migrate from SQL Server Express 2005 to SQL Server Standard 2008 R2. Here is a quick synopsis of how that happened:


  1. Stop all VMware services, in particular VMware vCenter
  2. Install SQL Server 2008 R2 Management Studio
  3. Perform a full backup of the VIM_VCDB database (I do this via SQL Server Management Studio)
  4. Uninstall VMware vCenter Server
  5. Move the backup somewhere, like C:\
  6. Uninstall SQL Server 2005

Installing SQL Server 2008 R2:

  1. Install SQL Server 2008 R2 Enterprise Edition
  2. Installation features:
    • Database Engine Services
    • Management Tools – Basic
    • Management Tools – Complete
  3. Service Accounts:
    • SQL Server Agent: NT AUTHORITY\SYSTEM; startup type Automatic
    • SQL Server Database Engine: NT AUTHORITY\SYSTEM; startup type Automatic
    • SQL Server Browser: NT AUTHORITY\LOCAL S...; startup type Automatic

Restoring the vCenter Server DB:

  1. Launch SQL Server Management Studio & connect to your instance
  2. Right-click on Databases and choose Restore Database…
  3. Select the backup file and the database name (probably VIM_VCDB5 or something unique)
  4. Create a new, 64-bit System DSN and point it to the new database. Use SQL Server Native Client 10.0 as your driver
  5. Make sure the default database is VIM_VCDB5, not master!
  6. Start the SQL Server Agent, if it's not already running

Now you should be able to install vCenter Server 5. When prompted, select the new DSN you created, and make sure you use your existing database!

If you use VMware Update Manager, you will also need to re-create a 32-bit System DSN and point it to the same VIM_VCDB5 database. You create a 32-bit DSN with the 32-bit ODBC Data Source Administrator, located at C:\Windows\SysWOW64\odbcad32.exe. (You'll still use SQL Server Native Client 10.0 as the driver, though.)
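The 32-bit versus 64-bit DSN split is easy to get backwards, because on 64-bit Windows the folder named System32 holds the 64-bit ODBC administrator and SysWOW64 holds the 32-bit one. Here is a tiny Python sketch of that mapping. The `DSN_BITNESS` table reflects the steps above (64-bit DSN for vCenter Server 5, 32-bit for Update Manager). The function name is hypothetical, and the default Windows directory is assumed to be C:\Windows.

```python
import ntpath

# From the steps above: vCenter Server 5 uses a 64-bit DSN, Update Manager a 32-bit one.
DSN_BITNESS = {"vCenter Server": 64, "Update Manager": 32}


def odbc_admin_path(bits, windir="C:\\Windows"):
    """Return the odbcad32.exe that manages DSNs of the given bitness on 64-bit Windows."""
    if bits == 64:
        folder = "System32"  # counter-intuitively, the 64-bit tools live here
    elif bits == 32:
        folder = "SysWOW64"  # 32-bit tools live under the WOW64 folder
    else:
        raise ValueError("bits must be 32 or 64")
    return ntpath.join(windir, folder, "odbcad32.exe")


if __name__ == "__main__":
    for product, bits in DSN_BITNESS.items():
        print(f"{product}: {odbc_admin_path(bits)}")
```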

Fixing slow boot times with ESXi & NetApp iSCSI LUNs

I've been using a mix of iSCSI LUNs and NFS datastores since VMware ESX 3.5, and I used them quite heavily in ESX & ESXi 4 without issue. Since moving to ESXi 5, though, I've noticed that my ESXi hosts are taking a long time to boot: more than 45 minutes! During the boot process, they hang on this screen for the majority of that time:

You can see that the message is vmware_vaaip_netapp loaded successfully. I did some debugging, my boss chimed in with his suggestions, and we got the issue squared away this morning. I had narrowed the symptoms down to a single cause: the presence of a dynamic iSCSI target. You can have as many NFS datastores as you want, and even as many iSCSI software HBAs as you want. The moment you add a dynamic iSCSI target, though, you have issues, at least in our environment. That's because we have a large number of server VLANs (several dozen), and our NetApp filers provide file services to almost all of them:

[root@palin ~]# rsh blender iscsi portal show
Network portals:
IP address        TCP Port  TPGroup  Interface
                  3260      2000     vif0-260
                  3260      2001     vif0-265
                  3260      2003     vif0-278
                  3260      2006     vif0-251
                  3260      2007     vif0-252
                  3260      2008     vif0-254
                  3260      2009     vif0-256
                  3260      2010     vif0-257
                  3260      2011     vif0-258
                  3260      2012     vif0-259
                  3260      2013     vif0-262
                  3260      2014     vif0-264
                  3260      2015     vif0-266
                  3260      2016     vif0-267
                  3260      2017     vif0-268
                  3260      2018     vif0-269
                  3260      2019     vif0-270
                  3260      2021     vif0-272
                  3260      2022     vif0-273
                  3260      2023     vif0-274
                  3260      2024     vif0-275
                  3260      2025     vif0-276
                  3260      2026     vif0-277
                  3260      2027     vif0-281

You can see that there are two dozen VIFs there, each on its own VLAN. In my case, I'm looking for the target that sits on vif0-265; I don't care about any of the other targets. The trouble is that my ESXi hosts only have VLAN 265 trunked to their VMkernel ports, so the only target they can actually reach is the one on VLAN 265. After I explained this to my boss, he said, “I bet the Filer is enumerating all of those portals to the ESXi host, and 99% of them are timing out” (because they are unreachable).
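To see how lopsided that is, here is a minimal Python sketch that splits the portals from the output above into reachable and unreachable sets for a host that only has VLAN 265 trunked. It assumes, as the output suggests, that each interface is named vif0-<VLAN ID>. The function name `split_portals` is hypothetical.

```python
import re

# Interface column of the `iscsi portal show` output above, in the same order.
PORTAL_INTERFACES = (
    "vif0-260 vif0-265 vif0-278 vif0-251 vif0-252 vif0-254 vif0-256 vif0-257 "
    "vif0-258 vif0-259 vif0-262 vif0-264 vif0-266 vif0-267 vif0-268 vif0-269 "
    "vif0-270 vif0-272 vif0-273 vif0-274 vif0-275 vif0-276 vif0-277 vif0-281"
)


def split_portals(portal_text, trunked_vlans):
    """Split advertised portals into (reachable, unreachable) lists.

    Assumes each interface is named vif0-<VLAN ID>, so the suffix tells us the VLAN.
    """
    reachable, unreachable = [], []
    for vlan_text in re.findall(r"\bvif0-(\d+)\b", portal_text):
        iface = f"vif0-{vlan_text}"
        if int(vlan_text) in trunked_vlans:
            reachable.append(iface)
        else:
            unreachable.append(iface)
    return reachable, unreachable


if __name__ == "__main__":
    ok, dead = split_portals(PORTAL_INTERFACES, {265})
    print(f"{len(ok)} reachable, {len(dead)} left to time out: {', '.join(dead)}")
```

Every entry in the unreachable list is a login attempt the host has to wait out before it can move on.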

Turns out, he was right! This is taken from the iscsi(1) man page:

* If a network interface is disabled for iSCSI use (via iscsi interface disable), then it is not accessible to any initiator regardless of any accesslists in effect.

Since we're using initiator groups and not accesslists, this is our problem: the Filer is indeed enumerating every portal it has configured (all two dozen!), even though our ESXi host is only trunked to one of them. So I'm waiting for 23 separate connections to time out so that one connection can work. So we came up with this:

[root@palin ~]# rsh blender iscsi interface accesslist add vif0-265   
Adding interface vif0-265 to the accesslist for

[root@palin ~]# rsh blender iscsi interface accesslist show     
Initiator Name                      Access List    vif0-265

Now that the initiator has an accesslist restricting it to that VIF, the Filer reports only the target portal on the correct VIF & VLAN (in this case, vif0-265). Problem solved! Now all I have to do is go through and add the rest of my iSCSI initiator names to my Filers, and Robert will be my father's brother.
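Adding the rest of the initiators is repetitive, so a small script helps. The Python below is a minimal sketch that builds one `rsh` command per initiator. It assumes the Data ONTAP 7-Mode form `iscsi interface accesslist add <initiator> <interface>`; check the iscsi man page on your filer before running anything. The IQNs in the usage example are hypothetical placeholders, not real host names.

```python
import shlex


def accesslist_commands(filer, interface, initiators):
    """Return one rsh command per initiator, restricting it to a single iSCSI interface."""
    return [
        # shlex.quote leaves IQN-safe characters (letters, digits, '.', ':', '-') untouched
        # and quotes anything else, so odd names can't break the shell command.
        "rsh {} iscsi interface accesslist add {} {}".format(
            shlex.quote(filer), shlex.quote(iqn), shlex.quote(interface)
        )
        for iqn in initiators
    ]


if __name__ == "__main__":
    # Hypothetical initiator names; substitute the IQNs of your own ESXi hosts.
    hosts = ["iqn.1998-01.com.vmware:esx-host-a", "iqn.1998-01.com.vmware:esx-host-b"]
    for cmd in accesslist_commands("blender", "vif0-265", hosts):
        print(cmd)
```

Printing the commands instead of executing them gives you a chance to review the list before touching the Filer.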

Deleting problem files from VMFS datastores

Following up on my issues from yesterday, I had a bunch of files (old, bad snapshot deltas) that I needed to delete. Problem was, I couldn't:

/vmfs/volumes/datastore/vm/badsnapshots # rm -rf *
rm: cannot remove 'foghorn-000101-ctk.vmdk': Invalid argument
rm: cannot remove 'foghorn-000101-delta.vmdk': Invalid argument

Try as I might, I couldn't get rid of them. lsof showed that the files weren't locked; indeed, I was able to move them, just not delete them. So I cheated by writing a character to each file (to verify its sanity and update its mtime):

for f in *; do echo "a" > "$f"; done

Then, rm worked. Victory!

vSphere problems with vpxa on hosts

I had a very bizarre issue recently: two of my 20 vSphere ESXi 5 hosts disconnected from their clusters. When I tried to reconnect the hosts (or remove and re-add them), I got an error message saying the host couldn't be added because "timed waiting for vpxa to start". Bad grammar theirs, not mine!

After I filed a support request with VMware, a very helpful engineer helped me determine the cause. Looking through the vpxa logs (/var/log/vpxa.log), he noticed that some virtual machines on each host had lots of snapshot files, and vCenter Server was having trouble managing those hosts. So we enabled SSH on the problematic ESXi host and took a look:

/vmfs/volumes/4e68dec0-274d0c10-21f1-002655806654/Foghorn(Test BlackBaud DB) # ls
Foghorn-000001-ctk.vmdk    Foghorn-000041-delta.vmdk  Foghorn-000081.vmdk        Foghorn-000122-ctk.vmdk    Foghorn-000162-delta.vmdk  Foghorn-000202.vmdk
Foghorn-000001-delta.vmdk  Foghorn-000041.vmdk        Foghorn-000082-ctk.vmdk    Foghorn-000122-delta.vmdk  Foghorn-000162.vmdk        Foghorn-000203-ctk.vmdk

I cut that off, because there were more than 200 delta files for that VM! Obviously, the snapshot process had spun way out of control for this particular VM. It's unclear why this happened, but removing those VMs from the hosts allowed me to add the hosts back to their clusters.
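Counting delta files by eye gets old at 200+. Here is a minimal Python sketch that summarizes a datastore directory listing (for example, `ls` output copied off the host). It assumes deltas follow the <vm>-NNNNNN-delta.vmdk naming seen above and that filenames contain no spaces. The function name is hypothetical.

```python
import re

# Snapshot deltas are named <vm>-NNNNNN-delta.vmdk, as in the listing above.
DELTA_RE = re.compile(r"^(?P<vm>.+)-(?P<idx>\d{6})-delta\.vmdk$")

# The two lines of `ls` output shown above.
SAMPLE_LISTING = """
Foghorn-000001-ctk.vmdk    Foghorn-000041-delta.vmdk  Foghorn-000081.vmdk        Foghorn-000122-ctk.vmdk    Foghorn-000162-delta.vmdk  Foghorn-000202.vmdk
Foghorn-000001-delta.vmdk  Foghorn-000041.vmdk        Foghorn-000082-ctk.vmdk    Foghorn-000122-delta.vmdk  Foghorn-000162.vmdk        Foghorn-000203-ctk.vmdk
"""


def snapshot_chain_report(filenames):
    """Return {vm_name: (delta_file_count, highest_snapshot_index)}."""
    report = {}
    for name in filenames:
        match = DELTA_RE.match(name)
        if match is None:
            continue  # descriptors, -ctk change-tracking files, etc.
        count, highest = report.get(match["vm"], (0, 0))
        report[match["vm"]] = (count + 1, max(highest, int(match["idx"])))
    return report


if __name__ == "__main__":
    # Splitting on whitespace assumes no spaces in the filenames themselves.
    for vm, (count, highest) in snapshot_chain_report(SAMPLE_LISTING.split()).items():
        print(f"{vm}: {count} delta files, highest index {highest}")
```

Run against a full listing, a large delta count for a single VM is the kind of runaway chain described above.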

After that, I simply cloned the problematic VMs (which automatically flattens the snapshots) into new VMs and the problem was solved.

Ballooning results are in!

And boy, are they ugly! The first graph shows the performance of our VM with 3GB of vRAM, of which 2.5GB is reserved. Its virtual disk is stored locally (i.e., on the SCSI drive of the blade that the ESX host and VM are running on). Note the high orange Memory Active line, which the VM can enjoy because we reserved a whole 2.5GB for it. This means that when we run a workload that chews through vRAM (via the stress project), the performance of this VM does not suffer. Specifically, it barely needs to balloon, as indicated by the very low blue Memory Balloon line:

Local disk when ballooning

(Note that the test finished by around 2:57pm, hence the rapid decline of active memory.)

Now look what happens to our poor VM with 1GB of vRAM, none of it reserved, whose virtual disk sits on a comparatively slow NFS mount. Active memory (the orange line) is high as we generate a disk I/O workload (via bonnie++), but on top of that we see a huge demand for balloon memory, indicated by the blue/teal line. This poor VM has the worst of both worlds: not enough memory allocated to it, and when it swaps, it swaps really slowly.

NFS disk when ballooning

(Note that the test finished by around 2:57pm, hence the somewhat rapid decline of balloon memory.)

To give you an idea of just how poor the performance becomes in the test, here is a “before” (i.e., NFS disk with no need to balloon) and “after” (i.e., NFS disk and strong need to balloon) test of the VM’s filesystem performance.

Excel graphs

That is incredible — incredibly poor! When under RAM duress, the starving, memory-hungry VM performs at barely 1/40th the performance of the fat, memory-happy VM. I can only imagine how upset this makes the guest OS.