Setting the stage
I've been working intensely with an enterprise-licensed vCenter Operations Manager lately and find it a powerful tool to analyse, monitor and optimize vSphere landscapes (and possibly others; I have not had the opportunity to work with the AWS adapter, for instance). On the side I play around with a few foundation-level instances at various customers.
Recently I stumbled across a very annoying issue with the custom interface that I will briefly outline here without going into it too deeply just yet. vCOps allows you to create custom tags and assign them to resources. Google it, it's well worth using this feature as it effectively groups resources. So instead of creating a dashboard and filtering for a whole bunch of resources, you end up filtering for your custom resource tag only. Whenever you refresh the dashboard, vCOps will then pull the metrics from the tagged resources. Tag another resource (say you add a datastore to your environment and tag it using your custom tag) and your dashboard will automatically display the newly added resource and its metrics.
The problem is that it only works so well with the heatmap widget, but that's not the topic of the post. Should you want info on this, feel free to reach out via the comments below or Twitter @Str0hhut.
In a recent discussion on the issue with my friend Iwan (@e1_ang) he pointed out that the vApp version of vCOps allows custom grouping of resources while the Windows version does not. That got me thinking about what else I might have missed in the vApp version, so I started comparing the user interfaces. In the vApp version you can indeed create a group by clicking the configuration link in the top right corner and then "Manage Group Types". However, I have not yet figured out how to assign resources to my newly created group. Info on this is also appreciated via comments or Twitter.
The hidden gem
Back to the topic. While scouting out the menu of the vApp I discovered an interesting link that I highly recommend all aspiring downsizers and everyone else interested to click on.
What follows is a dashboard that is remarkably similar to the custom dashboards of a licensed vCOps edition. Compared to what vCOps says about itself in the regular dashboard, this one gives you far more detail about what's happening. I have downsized my deployment by quite a bit (the UI VM has been capped at 3 GB RAM and 1 vCPU, the Analytics VM at 5 GB RAM and 1 vCPU), but my impression was that it has been running quite well so far. Sure, the health of the Analytics VM is at 51 and there is a memory constraint, but other than that it feels quite snappy, the graphs are all up and running, data is plentiful and almost everything works as expected. Almost: I've been suspecting that the restricted resources are the reason the "Normal" calculation is not working for every resource. The newly found dashboard confirms that at a rather detailed level.
As you can see from the screenshot, both my collection and analytics tiers appear to be in bad shape, whereas my presentation tier is happily buzzing along. Furthermore, if you drill into the tiers you'll get an in-depth view of which services are affected.
This is the view you get when you drill into the tree as follows:
- double click Collection
- double click vCenter Operations Collector (the bottommost icon of the resulting tree)
- double click vCenter Operations Adapter (top right corner of the resulting tree)
I then selected the OS resource from the health tree and expanded the Memory Usage folder in the metric selector to find some interesting metrics.
Interestingly enough, it's not the Analytics VM that is swapping like crazy, but the UI VM.
So I started asking myself how I could find out which of the two VMs is really undersized. It seems obvious that the Analytics VM needs more power (or does it? Read on!), since the analytics tier is on red alert. Going back to the default dashboard, both VMs report that they are memory constrained. Both VMs are demanding 100% of their configured memory, yet both are using only about 50% of what they have. My best guess at this point is a memory constraint on the host vCOps is running on. Both the vCenter client and vCOps provide plenty of evidence that this is the case. This does not come as a surprise: yes, the host is very much constrained, which is the reason I sized the vCOps VMs down to begin with.
I'm still curious about why the UI VM is swapping so much when the Analytics VM does not seem to be swapping at all. Better yet, when I logged into each VM to see what the Linux kernel had to say about it, I got very different answers:
Analytics VM:
localhost:~ # free -m
             total       used       free     shared    buffers     cached
Mem:          4974       4937         37          0         12        789
-/+ buffers/cache:       4134        839
Swap:         4102          0       4102
UI VM:
vcops:~ # free -m
             total       used       free     shared    buffers     cached
Mem:          3018       2999         19          0          0        391
-/+ buffers/cache:       2607        411
Swap:         4102       1264       2838
Wait, what? So the Analytics VM is not swapping; we knew that already. In fact, the kernel of the Analytics VM even managed to allocate a small amount of memory as an I/O buffer! It seems to me that, despite what vCOps thinks of itself, from an OS point of view the Analytics VM is sized just about right.
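To make the comparison a little less eyeball-driven, here is a minimal Python sketch that parses the two `free -m` outputs shown above and reports how much memory the applications really hold (the "-/+ buffers/cache" row) and how much swap is in use. The function names `parse_free` and `summarize` are my own invention for illustration, and the parser assumes the older `free` layout with a "-/+ buffers/cache" row, as printed by these VMs.

```python
# Sketch only: parse_free() and summarize() are hypothetical helper names.
# Assumes the older procps `free -m` layout that includes a
# "-/+ buffers/cache" row, as shown in the outputs above.

ANALYTICS_VM = """
             total       used       free     shared    buffers     cached
Mem:          4974       4937         37          0         12        789
-/+ buffers/cache:       4134        839
Swap:         4102          0       4102
"""

UI_VM = """
             total       used       free     shared    buffers     cached
Mem:          3018       2999         19          0          0        391
-/+ buffers/cache:       2607        411
Swap:         4102       1264       2838
"""


def parse_free(text):
    """Extract the interesting numbers (in MB) from `free -m` output."""
    stats = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Mem:":
            stats["mem_total"] = int(fields[1])
            stats["buffers"] = int(fields[5])
            stats["cached"] = int(fields[6])
        elif fields[0] == "-/+":
            # Row looks like: -/+ buffers/cache: <used> <free>
            stats["app_used"] = int(fields[2])
            stats["app_free"] = int(fields[3])
        elif fields[0] == "Swap:":
            stats["swap_total"] = int(fields[1])
            stats["swap_used"] = int(fields[2])
    return stats


def summarize(text):
    """Return application memory use and swap use as percentages."""
    s = parse_free(text)
    return {
        "app_used_pct": round(100.0 * s["app_used"] / s["mem_total"], 1),
        "swap_used_pct": round(100.0 * s["swap_used"] / s["swap_total"], 1),
        "has_swapped": s["swap_used"] > 0,
    }


for name, output in (("Analytics VM", ANALYTICS_VM), ("UI VM", UI_VM)):
    print(name, summarize(output))
```

Keep in mind that the swap figure from `free` only tells you how much data currently sits in swap, not whether the VM is actively swapping at this very moment.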
Summary
This is not yet a real summary of what is happening in this particular environment or of how well the vCOps vApp handles downsizing. I have not gathered nearly enough information to fully analyze the situation and draw conclusions, and my brain is buzzing with ideas and paths to follow. For now I have set memory reservations for both VMs to force the host to provide each with its entitlement. I will have to think this over and investigate some more in the days to come, as well as observe how the memory reservations change the picture. Stay tuned, this may get interesting.
To be continued...