VMware ESX, Microsoft NLB and why you should care

…especially if you use Cisco switches!

We are reasonably heavy users of Microsoft’s network load-balancing (NLB) technology on Windows Server 2003 and 2008. Specifically, we’re clustering a reasonably large (4,000-mailbox) instance of Exchange Server 2007, as well as numerous SQL Server 2005 and 2008 instances, across physical, virtual and hybrid deployments.

While I’m not our resident Microsoft expert, my understanding of the fundamentals of this kind of clustering is this: servers transmit heartbeats/keepalives to the other nodes, and react accordingly. This is complicated by the fact that you have two options for heartbeat transmission: unicast and multicast. Microsoft’s default is unicast; VMware’s recommendation is multicast, and you can read the rationale for this decision here. (There is even more information here. For information about setting up multicast NLB, go here.)
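For what it’s worth, the two modes differ in the cluster MAC address that NLB derives from the virtual IP: 02-BF plus the VIP octets in unicast mode, 03-BF plus the VIP octets in multicast mode, and 01-00-5E-7F plus the last two VIP octets in IGMP multicast mode. Here’s a little Python sketch of that mapping; the helper name and the Cisco dotted-hex output format are my own choices:

```python
def nlb_mac(cluster_ip: str, mode: str) -> str:
    """Derive the cluster MAC that Microsoft NLB uses for a given
    cluster (virtual) IP, per NLB's documented addressing scheme."""
    octets = [int(o) for o in cluster_ip.split(".")]
    if mode == "unicast":
        # Unicast mode: 02-BF followed by the four VIP octets.
        mac = [0x02, 0xBF] + octets
    elif mode == "multicast":
        # Multicast mode: 03-BF followed by the four VIP octets.
        mac = [0x03, 0xBF] + octets
    elif mode == "igmp":
        # IGMP multicast mode: 01-00-5E-7F plus the last two VIP octets.
        mac = [0x01, 0x00, 0x5E, 0x7F] + octets[2:]
    else:
        raise ValueError("mode must be unicast, multicast or igmp")
    # Render in Cisco dotted-hex notation, e.g. 03bf.0a00.0032.
    hexstr = "".join(f"{b:02x}" for b in mac)
    return ".".join(hexstr[i:i + 4] for i in range(0, len(hexstr), 4))
```

So a cluster VIP of 10.0.0.50 in multicast mode gives 03bf.0a00.0032, which is exactly the MAC you’d have to hard-code on the switch side (see below).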

Where this turns into an embarrassment-of-riches problem is if you’re using Cisco switches. You can read a somewhat wordy explanation here, which summarizes the problem better than I can. But essentially it boils down to this: VMware recommends you use multicast NLB. In order to use multicast NLB, you need to (*gulp*) hard-code MACs and ARP entries in your Cisco switching infrastructure. For those with relatively small systems infrastructure, this isn’t the biggest deal in the world. But when you have more than 12,000 ports on campus, it presents some serious scalability (and, subsequently, feasibility) problems. Every time you set up a cluster — which admittedly might not be that often — you’re going to have to coordinate configuration changes to your switching platforms.
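For reference, the static entries Cisco documents for multicast NLB look roughly like this; the VIP (10.0.0.50), VLAN and ports here are made up, and the MAC is just 03-BF followed by the VIP octets:

```
! On the L3 gateway: statically map the cluster VIP to the NLB
! multicast MAC, since the switch won't learn it via ARP on its own.
arp 10.0.0.50 03bf.0a00.0032 ARPA

! Static MAC entry so frames for the cluster MAC are sent only to
! the ports where the NLB nodes actually live, instead of flooding.
mac address-table static 03bf.0a00.0032 vlan 10 interface GigabitEthernet1/1 GigabitEthernet1/2
```

Multiply that by every cluster and every gateway pair and you can see why this doesn’t scale nicely on a large campus.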

How does this make you feel?

When incorrectly configured — which is to say, the switches have not been configured at all to handle the uniqueness of multicast NLB — you can experience problems where nodes seemingly drop off the network. This causes a split-brain scenario, where the cluster hasn’t actually failed over but it thinks that it has. When you add shared storage into the mix, things develop the inertia to turn pear-shaped very quickly. Which is what we’re seeing now, and what we’re trying to debug…


8 thoughts on “VMware ESX, Microsoft NLB and why you should care”

  1. Pingback: Twitter Trackbacks for VMware ESX, Microsoft NLB and why you should care « Bitpushr's Blog [bitpushr.wordpress.com] on Topsy.com

  2. Microsoft is the party at fault here. NLB was the cheap option for load balancing in the days when hubs were king and dedicated load balancers did not exist.

    Using the same MAC address on every server (even as a secondary MAC address) completely breaks the fundamental theory behind Ethernet switches. As such, Microsoft has deprecated the use of NLB for more than eight years.

    You shouldn’t be using it. It’s time to get modern and implement IOS Server Load Balancing (at the very least), or get a load balancing appliance.

    Additionally, NLB incurs a massive CPU penalty on each server, since every server MUST process every packet, even if it is not the final handler. Since Microsoft’s networking stack isn’t very efficient, this can waste up to 25% of your CPU on network processing instead of application processing. You could be throwing away many tens of thousands of dollars in compute.

    You can check my site for an article on simple IOS SLB, and there are plenty of samples on Cisco’s site. Note that IOS SLB is a standard (free) feature in most Cisco routing switches.
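    A minimal IOS SLB setup along the lines this commenter suggests might look like the sketch below; the farm name, vserver name and addresses are invented for illustration:

```
! Define the pool of real servers behind the virtual IP.
ip slb serverfarm WEBFARM
 real 10.1.1.10
  inservice
 real 10.1.1.11
  inservice
!
! Define the virtual server that clients actually connect to.
ip slb vserver WEB-VIP
 virtual 10.1.1.100 tcp 80
 serverfarm WEBFARM
 inservice
```

    Unlike NLB, only the switch/router sees every packet here; each real server receives only the connections dispatched to it.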

  3. Oh, I should mention that Cisco’s multicast implementation is a bit problematic as well.

    They took the purist view that there should be an IGMP querier on a router interface that has IP multicast enabled. Cisco is at fault for that, although, technically, they are correct.

    Again, you shouldn’t be using NLB.

  4. It’s actually a good thing *a very good thing* that the Cisco L3 switch does not automatically populate its ARP entries with mcast MACs — Why? — It would be a huge security and performance risk. Any rogue station could send ARP responses with multicast MACs, and because mcast MACs are flooded like broadcasts (if no IGMP entry exists), all stations on the subnet would receive traffic intended for just one station. Not only would that flood the network with unnecessary traffic (a performance risk), it would make what was intended to be unicast data available to all stations, including the rogue one (a security risk).

    Furthermore, you would not need to configure *every* switch in the network. Only the two switches acting as the L3 default gateway for your NLB members would need configuration. The individual L2 switches the NLB members may be connected to do not need any special configuration, because IGMP Snooping will automatically learn the appropriate ports for forwarding the NLB mcast MAC. In this case, since IGMP Snooping is supported and on by default on all modern Cisco L2 switches — having Cisco switches actually made your life easier, not worse! 🙂


  5. Pingback: links for 2009-09-10 | benway.net

  6. Pingback: links for 2009-09-10 | Savage Nomads

  7. I am not so sure about all these comments here. True, with IGMP the switches will learn the IGMP group and ports. However, the multicast MAC address used by the Microsoft cluster (03-BF-xx-xx-xx-xx) is NOT an IANA-standard multicast MAC address (i.e. 01:00:5e:yy:yy:yy). Therefore, the switch might learn the group but program the wrong MAC address in its forwarding tables.
    Or does the switch program the source MAC of the IGMP report instead? This is unclear to me, and it is not what I am seeing: I have enabled IGMP multicast mode for NLB, and I see the IGMP groups, but I still don’t see the MAC address in the tables/interfaces.
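A few show commands may help pin down what the switch actually learned in a situation like comment 7’s; the exact commands and output vary by platform, and the VIP (10.0.0.50) is hypothetical:

```
! Did IGMP snooping learn the group and its member ports?
show ip igmp snooping groups

! Is there a multicast MAC entry for the cluster, static or learned?
show mac address-table multicast

! And what does the gateway's ARP table say about the cluster VIP?
show ip arp 10.0.0.50
```

If the group shows up under snooping but no corresponding MAC entry exists, that points at exactly the group-to-MAC programming question the commenter raises.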

