Mixing ESX and ESXi in vSphere 4.1 clusters?

As everyone knows, vSphere 4.1 is the last release to include classic ESX; everything after it is going to be ESXi-only. So, we’re going to need to convert our ESX clusters to ESXi sooner rather than later. The differences between ESX and ESXi are well documented, so I won’t get into them here. However, one thing that has come up in my attempts to test ESX-to-ESXi conversions (and by “conversions” I mean “removing the ESX host, installing ESXi on it and adding it back to the cluster”) is the difference in network architecture.

In short, ESX has Service Consoles and ESXi does not; ESXi has Management Networks instead. In ESX, a Service Console is its own port type, not a VMkernel port; in ESXi, a Management Network is simply a type of VMkernel port. In ESX, your first Service Console’s device ID is probably vswif0. In ESXi, your first Management Network is probably vmk0.
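
If you’d rather see that difference from the API than from the vSphere Client, a rough pyVmomi sketch along these lines should list each host’s Service Console (vswif) and VMkernel (vmk) interfaces. To be clear, this is just an illustration I’ve put together: it assumes pyVmomi and a vCenter connection, and the vCenter name and credentials are placeholders, not values from my environment.

    # Rough sketch, not production code: list Service Console (vswif) and
    # VMkernel (vmk) interfaces for every host known to vCenter.
    # The vCenter hostname and credentials below are placeholders.
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host='vcenter.example.com', user='admin', pwd='secret')
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)

    for host in view.view:
        net = host.config.network
        print(host.name)
        # ESX only: Service Console ports appear as consoleVnic (vswif0, ...)
        for vnic in (net.consoleVnic or []):
            print('  Service Console: %s -> %s' % (vnic.device, vnic.portgroup))
        # ESX and ESXi: VMkernel ports (vmk0, vmk1, ...); on ESXi the first
        # of these is normally the Management Network
        for vnic in (net.vnic or []):
            print('  VMkernel:        %s -> %s' % (vnic.device, vnic.portgroup))

    Disconnect(si)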

In my organization, this complicates things when it comes to vMotion of VMs between an ESX and an ESXi host in the same cluster. Here’s why. First, the network layout on one of our ESX hosts:

Label             Public/private   Device   Switch     Uplinks          Mode
Service Console   Public           vswif0   vSwitch0   vmnic0, vmnic1   Active/Active
VMkernel 0        Private          vmk0     vSwitch1   vmnic2, vmnic3   Active/Passive
VMkernel 1        Private          vmk1     vSwitch1   vmnic3, vmnic2   Active/Passive

Compare that with ESXi on identical hardware:

Label                Public/private   Device   Switch     Uplinks          Mode
Management Network   Public           vmk0     vSwitch0   vmnic0, vmnic1   Active/Active
VMkernel 0           Private          vmk1     vSwitch1   vmnic2, vmnic3   Active/Passive
VMkernel 1           Private          vmk2     vSwitch1   vmnic3, vmnic2   Active/Passive

Note the key difference: on ESX, the two VMkernels that connect to the datastores are vmk0 and vmk1. On ESXi, the two VMkernels that connect to the datastores are vmk1 and vmk2. This is because the first VMkernel, vmk0, is actually the Management Network port in ESXi. So, now we have a mismatch in the architecture between ESX and ESXi.
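
Another way to see the mismatch, without walking through the migration wizard, is to ask each host which VMkernel device it has selected for vMotion. Here’s a rough pyVmomi sketch of that check; again, the host names are placeholders and I’m assuming si is an existing connection like the one above.

    # Rough sketch: report which vmk device each host has selected for vMotion.
    # Host names are placeholders; 'si' is an existing SmartConnect session.
    from pyVmomi import vim

    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    hosts = dict((h.name, h) for h in view.view)

    for name in ('esx01.example.com', 'esxi01.example.com'):
        nic_mgr = hosts[name].configManager.virtualNicManager
        cfg = nic_mgr.QueryNetConfig('vmotion')
        selected = set(cfg.selectedVnic or [])
        devices = [v.device for v in (cfg.candidateVnic or []) if v.key in selected]
        print('%s vMotion device(s): %s' % (name, ', '.join(devices) or 'none'))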

When you attempt the vMotion, vSphere recognizes that the two hosts are using different ports for their VMkernels (vmk0 vs. vmk1) and raises a warning. Now, you can accept this warning and the vMotion will continue. But, and this is a big but, if you attempt to put one of these hosts into Maintenance Mode, vSphere will not proceed past this warning and your VMs will never migrate off the host because of this conflict.

There is one way to alleviate this issue: if I designate vmk1, rather than vmk0, as the vMotion port on each host, then yes, it works, and vSphere won’t warn me about different VMkernel devices being used on the two hosts. However, this doesn’t cure the problem with Maintenance Mode: the VMs still will not vMotion automatically when you put a host into Maintenance Mode.
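
For what it’s worth, that re-pointing can also be scripted instead of clicked through. Below is a hedged sketch of the idea using the host’s virtual NIC manager via pyVmomi; the host names are placeholders and I haven’t run this exact snippet, so treat it as an outline rather than a recipe.

    # Rough sketch: make vmk1 (instead of vmk0) the vMotion interface on each
    # host, so both hosts advertise the same vMotion device ID.
    # Host names are placeholders; 'si' is an existing SmartConnect session.
    from pyVmomi import vim

    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)

    for host in view.view:
        if host.name not in ('esx01.example.com', 'esxi01.example.com'):
            continue
        nic_mgr = host.configManager.virtualNicManager
        try:
            # Drop vmk0 from vMotion duty; skip the error if it was never selected.
            nic_mgr.DeselectVnicForNicType('vmotion', 'vmk0')
        except vim.fault.HostConfigFault:
            pass
        # Mark vmk1 as the vMotion interface.
        nic_mgr.SelectVnicForNicType('vmotion', 'vmk1')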

Has anyone encountered this in their own testing? What have you done to mitigate the issue? This is with virtually identical hardware on the two hosts; one is an HP BL465c G6 blade and the other a BL465c G7. Datastores are accessed via iSCSI, though the same problem occurs using NFS. And no, going to Fibre Channel is not an option 😀 Both hosts are at vSphere build 348481, i.e. vSphere 4.1.0.