Conclusion, and possible vindication

From the previous graph, we can see that when the ESX host does not have enough pRAM — and, by extension, when the VMs do not have unfettered access to enough vRAM — performance goes to hell as soon as the memory balloon inflates. Note that I deliberately put only a small amount of pRAM in the ESX host, and then slightly overcommitted the vRAM in the two VMs. I did this for a reason: I wanted to deliberately have one VM with a reservation actively use all of its memory (via the very useful stress tool), which in turn causes the other VM — with no reservation for its vRAM — to suffer as soon as the first VM begins a memory-intensive workload.

I got the results I was expecting, insofar as the second VM, with little vRAM, suffers drastically when it must make its memory balloon calls to disk. Now, the only thing left to do was to prove that I could address the problem by adding more pRAM to the ESX host. So, I added another 4GB of pRAM to the ESX host (bringing it to a total of 8GB), and left the two VMs as they were (one with 3GB of vRAM assigned & 2.5GB reserved, the other with 1GB of vRAM and 0GB reserved.)

The results were what I expected — because the ESX host has enough RAM to allow the second, starving VM to balloon to RAM instead of NFS disk, its performance when it balloons (due to the first VM having a high memory reservation and a memory-intensive workload) is virtually the same as when it doesn’t balloon.

I suppose that, for now at least, the moral of the story is that you have to watch your performance! :-) Seriously though, just because there is some amount of free pRAM on the ESX host doesn’t mean the VMs can necessarily access it in a manner that keeps performance at acceptable levels. If you need to balloon your memory, you want to do it on the absolute fastest storage (hopefully pRAM) that you can. Abstracting this away and doing it to disk via swap is going to have pretty serious performance implications; if you have many VMs all doing this at once, then it is going to get very ugly, very quickly.
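
If you want to keep an eye on this yourself, esxtop on the ESX service console is probably the easiest place to look. The keystroke and column names below are from memory, so treat them as assumptions and check the documentation for your ESX version:

# on the ESX service console; press 'm' to switch to the memory view
# watch MCTLSZ (balloon size per VM) and SWCUR (amount swapped to disk per VM)
esxtop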

Ballooning results are in!

And boy are they ugly! The first graph is the performance of our VM with 3GB of vRAM, of which 2.5GB is reserved. Its virtual disk is mounted locally (i.e., it’s on the Blade’s SCSI drive that the ESX host & VM are running on.) Note the high orange Memory Active line, which the VM is able to enjoy because we reserved it a whole 2.5GB. This means that when we run a workload (via the stress project) that chews vRAM, the performance of this VM does not suffer. Specifically, it does not need to consume Memory Balloon, as indicated by the very low blue line:

Local disk when ballooning

(Note that the test finished by around 2:57pm, hence the rapid decline of active memory.)
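
For the record, the memory pressure on this VM comes from the stress tool mentioned above. I haven’t reproduced the exact invocation here, so the worker count and allocation size below are placeholders rather than the figures used in this test:

# spawn 4 workers, each repeatedly allocating and dirtying 512MB, for ten minutes
stress --vm 4 --vm-bytes 512M --timeout 600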

Now look what happens to our poor 1GB vRAM VM, which has 0GB reserved and whose virtual disk is on a comparatively slow NFS mount. Active memory (the orange line) is high as we generate a disk I/O workload (via bonnie++), but in addition to that we see a huge demand for balloon memory, indicated by the blue/teal line. This poor VM has the worst of both worlds — not enough memory allocated to it, and when it swaps, it swaps really slowly.

NFS disk when ballooning

(Note that the test finished by around 2:57pm, hence the somewhat rapid decline of balloon memory.)
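
The disk I/O workload on this VM comes from bonnie++. Again, the command below is my own sketch (the mount point and file size are assumptions) rather than the exact invocation used in the test:

# run against the NFS-backed filesystem, with a dataset larger than the VM's 1GB of vRAM
# so the page cache can't mask the disk; -u is needed when running as root
bonnie++ -d /mnt/nfs-test -s 2048 -u nobody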

To give you an idea of just how poor the performance becomes, here is a “before” (i.e., NFS disk with no need to balloon) and “after” (i.e., NFS disk and a strong need to balloon) comparison of the VM’s filesystem performance.

Excel graphs

That is incredible — incredibly poor! When under RAM duress, the starving, memory-hungry VM delivers barely 1/40th the performance of the fat, memory-happy VM. I can only imagine how upset this makes the guest OS.

Ballooning memory, shared storage & file I/O performance

For quite some time now, my team & I have been chasing a strange, inconsistent problem — a performance issue, specifically — within our VMware infrastructure. It is difficult to reproduce on purpose, but the issue does recur reasonably frequently. And now we may finally have identified its cause… maybe!

What we’re seeing is that, on both Linux and Windows VMs, the VMs are periodically “losing” their virtual disks. The disks themselves are located on shared storage; in our case they are on NetApp filers and shared to the ESX hosts via NFS. Windows VMs seem to be much more resilient in dealing with the issue than Linux VMs — despite using journaled filesystems, Linux VMs will sometimes experience such severe problems that their filesystems actually crash (though this probably happens only once every few months or so.) More typically, though, in the Linux world the issue manifests as a SCSI error. This makes sense, given that VMware abstracts the virtual disks away behind a SCSI driver in Linux (specifically, for us it’s the mptscsi driver). Here’s an example of the issue:

mptscsi: ioc0: attempting task abort! (sc=f7dcb940)
scsi0 : destination target 0, lun 0
command = Write (10) 00 00 83 3d d0 00 00 08 00
mptbase: ioc0: WARNING - IOCStatus(0x004b): SCSI IOC Terminated on CDB=2a id=0 lun=0
mptscsi: ioc0: task abort: SUCCESS (sc=f7dcb940)
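
As a quick-and-dirty way of seeing how often a given guest is hitting this, you can count the abort messages in the kernel log (the path assumes a Red Hat-style syslog setup):

# count SCSI task aborts logged by the mptscsi driver
grep -c "task abort" /var/log/messages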

Up until recently, I had looked at virtually every possibility that could cause a slowdown between the ESX hosts (in our case, HP BL465c Blades) and the shared storage (NetApp FAS-3020s). Or so I thought. I’d looked at packet frame sizes, NIC teaming algorithms, duplex matching, spanning-tree configs, NFS keepalives, NFS window sizes and a whole bunch of other things I’ve since forgotten about. My boss had a suggestion that I hadn’t thought of — perhaps the root cause of everything was balloon memory and VMs swapping? At first, I didn’t believe him; after all, we have VMs using 167GB of RAM spread across a cluster of six hosts that has a total of 228GB. While that is an admittedly high average utilization of 73% per Blade, we only have 36GB in the Blades and they will take up to 64GB each. So, we have headroom to grow our RAM resources if we can find the money in our collective wallet.
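
As an aside, a couple of those network-level checks can be done straight from the ESX service console. The commands below are from memory, so double-check them against your ESX version:

esxcfg-nics -l      # physical NIC link state, speed and duplex as the host sees them
esxcfg-vswitch -l   # vSwitch and port group layout, including MTU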

His theory was an interesting one. When VMs experience balloon memory conditions (that is, they demand more memory than is available to them from the pool), they essentially stop using RAM for their temporary storage (as they have run out) and start using disk instead. Under the best conditions (i.e., fast local disk), this involves a hopefully-slight hit to performance. But what if the disk they’re swapping to is not, in fact, local — what if it is a disk presented via NFS over a LAN, as part of a larger shared storage pool? Simply put, you cannot expect a 2Gb NFS link (a pair of 1Gb NICs bonded together) to deliver the same real-world latency or throughput as a fast local SATA, SAS or SCSI disk. Now what if this problem is occurring not on a single VM but on multiple VMs? Or even dozens? You would have a cascading problem: some VMs run out of RAM and begin swapping to what is really an NFS disk. More VMs run out of RAM and they, too, begin swapping to an NFS disk. And your disk — in actuality, the TCP/IP pipe to your disk — becomes more and more congested.
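
Incidentally, you can watch this happening from inside a Linux guest with nothing fancier than vmstat. When the si/so (swap-in/swap-out) columns start climbing, the guest is paging out to whatever its virtual disk actually sits on:

# report memory and swap activity every five seconds; watch the si/so columns
vmstat 5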

To test his theory, we concocted a strange kind of environment. We built a single ESX host with 4 CPUs (which are essentially irrelevant here) and 4GB of physical RAM. Then we deployed two Red Hat VMs: one had 3GB of RAM and a truly local (i.e., on the Blade itself) virtual disk; the other had 1GB of RAM and an NFS-mounted virtual disk. But the key is this: the 3GB VM got all of its RAM reserved, and the 1GB VM had none of its RAM reserved. His theory is that, if we introduce a workload on the 3GB (local disk) VM that chews up all the RAM it has gleefully reserved, then the 1GB VM, when faced with a workload of its own, will need to swap to disk. NFS disk. And it will do so slowly. And then we will see performance go to hell.

I’m running the tests right now, but so far it looks like he will be proved right. Which is embarrassing for yours truly ;-)