Ballooning results are in!

And boy are they ugly! The first graph shows the performance of our VM with 3GB of vRAM, of which 2.5GB is reserved. Its virtual disk is mounted locally (i.e., it’s on the Blade’s SCSI drive that the ESX host and VM are running on). Note the high orange Memory Active line, which the VM is able to enjoy because we reserved it a whole 2.5GB. This means that when we run a workload (via the stress project) that chews vRAM, the performance of this VM does not suffer. Specifically, it does not need to give up memory to the balloon driver, indicated by the very low blue Memory Balloon line:

Local disk when ballooning

(Note that the test finished by around 2:57pm, hence the rapid decline of active memory.)
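The post doesn’t record the exact stress parameters used (a commenter asks about them below). For anyone wanting to reproduce this kind of memory pressure, a plausible invocation might look like the following; the worker count, allocation size, and duration here are assumptions, not the values from this test.

```shell
# Hypothetical stress invocation -- assumed parameters, not the ones this
# test actually used. Two VM workers, each allocating and touching 1 GiB
# of memory, for ten minutes: enough to outgrow a guest's unreserved vRAM
# and invite ballooning on an overcommitted host.
STRESS_CMD="stress --vm 2 --vm-bytes 1G --timeout 600s"
echo "$STRESS_CMD"
```

On the VM with the 2.5GB reservation, this same workload runs without triggering the balloon, which is exactly the contrast the graph above shows.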

Now look what happens to our poor 1GB vRAM VM, which has 0GB reserved and whose virtual disk is on a comparatively slow NFS mount. Active memory (the orange line) is high as we generate a disk I/O workload (via bonnie++), but on top of that we see a huge demand for balloon memory, indicated by the blue/teal line. This poor VM has the worst of both worlds: not enough memory allocated to it, and when it swaps, it swaps really slowly.

NFS disk when ballooning

(Note that the test finished by around 2:57pm, hence the somewhat rapid decline of balloon memory.)
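Likewise, the exact bonnie++ parameters aren’t recorded in the post. A typical invocation for a guest with 1GB of RAM might look like this; the target directory, file size, and RAM figure are all assumptions for illustration.

```shell
# Hypothetical bonnie++ invocation -- assumed parameters. -s sets the
# test-file size (MB) to twice the guest's 1 GiB of RAM so the page cache
# can't hide disk latency; -d points at the NFS-backed filesystem
# (/mnt/nfs-test is a made-up path); -u drops privileges for the run.
BONNIE_CMD="bonnie++ -d /mnt/nfs-test -s 2048 -r 1024 -u nobody"
echo "$BONNIE_CMD"
```

Running this “before” (memory plentiful) and “after” (under balloon pressure) is how you’d produce the kind of comparison shown in the graphs below.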

To give you an idea of just how poor the performance becomes in the test, here is a “before” (i.e., NFS disk with no need to balloon) and “after” (i.e., NFS disk and strong need to balloon) test of the VM’s filesystem performance.

Excel graphs

That is incredible — incredibly poor! When under RAM duress, the starving, memory-hungry VM performs at barely 1/40th the performance of the fat, memory-happy VM. I can only imagine how upset this makes the guest OS.


One thought on “Ballooning results are in!”

  1. A few questions to help my understanding:
    1) You are running the stress project for memory consumption, right? Which parameters were you using?
    2) You were also running bonnie++? Was it just running on the second, small, starved VM? What parameters were you using for bonnie++?
    3) Did you collect guest swap statistics in the starved VM? vmstat should show that.
    4) Were you seeing any host swap?

    I am curious exactly what was causing the performance problems in the second VM:
    * That the balloon was non-zero and the guest had to page or…
    * That the balloon was non-zero and the guest had to page *slowly* or…
    * That the guest paging activity was interfering with the guest’s normal IO or…
    * That the host was swapping.


