Conclusion, and possible vindication

From the previous graph, we can see that when the ESX host does not have enough pRAM (and, by extension, the VMs do not have unfettered access to enough vRAM), performance goes to hell as soon as the memory balloon inflates. Note that I deliberately put only a small amount of pRAM in the ESX host and then slightly overcommitted the vRAM in the two VMs. I did this for a reason: I wanted one VM, with a reservation, to actively use all of its memory (via the very useful stress tool), which in turn causes the other VM, which has no reservation for its vRAM, to suffer as soon as the first VM begins a memory-intensive workload.
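
For the curious, here is a minimal sketch of the kind of invocation I mean; the worker count and allocation sizes below are illustrative rather than the exact figures from my runs, sized so that roughly 2.5GB of guest memory stays actively dirtied:

    # Run inside the VM that holds the memory reservation. Five workers each
    # allocate and continuously touch 512MB of memory for ten minutes.
    stress --vm 5 --vm-bytes 512M --timeout 600s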

I got the results I was expecting, insofar as the second VM, with little vRAM, suffered drastically when it had to make its memory balloon calls to disk. Now, the only thing left to do was to prove that I could address the problem by adding more pRAM to the ESX host. So I added another 4GB of pRAM to the ESX host (bringing it to a total of 8GB) and left the two VMs as they were (one with 3GB of vRAM assigned and 2.5GB reserved, the other with 1GB of vRAM and nothing reserved).
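
(As an aside, if you want to confirm the assigned and reserved figures without clicking through the client, both live in each VM's .vmx file. The datastore paths and VM names below are hypothetical, but the parameters are the standard ones: memSize is the assigned vRAM in MB and sched.mem.min is the reservation in MB.)

    # On the ESX host; paths and VM names are placeholders for your own.
    grep -E 'memSize|sched\.mem\.min' /vmfs/volumes/datastore1/bigvm/bigvm.vmx
    #   memSize = "3072"          <- 3GB of vRAM assigned
    #   sched.mem.min = "2560"    <- 2.5GB reserved
    grep -E 'memSize|sched\.mem\.min' /vmfs/volumes/datastore1/smallvm/smallvm.vmx
    #   memSize = "1024"          <- 1GB of vRAM; no sched.mem.min means no reservation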

The results were what I expected: because the ESX host now has enough RAM to allow the second, starving VM to balloon to RAM instead of to NFS disk, its performance when it balloons (due to the first VM having a high memory reservation and a memory-intensive workload) is virtually the same as when it doesn't balloon.
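
If you would rather watch this happen than infer it from benchmark numbers, esxtop's memory view on the host shows the per-VM balloon and swap activity directly (column sets vary a little between ESX versions, so treat this as a rough pointer rather than gospel):

    # On the ESX host (service console or SSH session):
    esxtop            # press 'm' to switch to the memory view
    # MCTLSZ       - current balloon size for each VM, in MB
    # SWCUR        - VMkernel swap currently in use for the VM, in MB
    # SWR/s, SWW/s - swap read/write rates; sustained non-zero values here are
    #                exactly the "ballooning to disk" case that hurts so badly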

I suppose that, for now at least, the moral of the story is that you have to watch your performance! :-) Seriously though, just because there is some amount of free pRAM on the ESX host doesn't mean the VMs can necessarily access it in a manner that keeps performance at acceptable levels. If you need to balloon your memory, you want to do it on the absolute fastest storage you can (hopefully pRAM). Abstracting this away and ballooning to disk via swap has pretty serious performance implications; if you have many VMs all doing this at once, it is going to get very ugly very quickly.
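
From inside the guest, the cheapest early-warning signs are the balloon size reported by VMware Tools and the guest's own swap traffic; this sketch assumes a reasonably current Tools install that includes vmware-toolbox-cmd:

    # Inside the Linux guest:
    vmware-toolbox-cmd stat balloon   # how much memory the balloon driver has claimed
    vmstat 1                          # watch the si/so columns; steady swap-in/out
                                      # while the balloon is inflated means you are
                                      # paying the disk penalty described above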

2 thoughts on “Conclusion, and possible vindication”

  1. Good set of posts. Personally, the takeaway is “vswap can be good, but it can also be very bad”. It’s good in the sense that oversubscription of EVERY asset (CPU cycles via virtualization, Memory via TPS+ballooning, storage via thin-provisioning) means you can be more efficient.

    BUT – when you need that resource, the underlying behavior becomes the behavior of the overall config.

    So – in the case of vswap, make sure it’s placed on a datastore with a significant amount of performance. This can be an NFS datastore, but if it IS an NFS datastore, design it with the same discipline you would with an FC network. Also recognize the differences in the multipathing behavior (through future vSphere releases, each NFS datastore will use only a single TCP session, which means a single ethernet link, regardless of how link aggregation/failover is configured).

    A good “try it” is to move the vSwap to local disk. I don’t recommend this practice in general, as it limits a lot of advanced use cases, but it is a quick/easy way to isolate performance issues.

  2. Great investigatory work going on around memory overcommit and its relationship to storage.

    I may have overlooked it (gosh I hope I didn’t), but have you run the same tests with vswap located on VMFS (say via iSCSI) and again with vswap located on local storage?

    This data would be great to share from tests run by a third party.

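Following up on the “try it” suggestion in the first comment above: per-VM swapfile placement can be overridden with the sched.swap.dir option in the VM's .vmx file, so pointing it at a local VMFS datastore is a quick way to take the NFS link out of the equation. A minimal sketch, with hypothetical datastore and VM names:

    # On the ESX host; the VM must be powered off before editing its .vmx.
    echo 'sched.swap.dir = "/vmfs/volumes/local-vmfs/swap"' >> /vmfs/volumes/datastore1/smallvm/smallvm.vmx
    # On the next power-on the .vswp file is created under the new directory
    # instead of alongside the VM on the NFS datastore.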
