Metric Correlation with vRealize Operations

vRealize Operations is a tool of many talents. It keeps an ever-vigilant watch over the performance and capacity of your VMware SDDC. Self-driving operations means that you can focus on other tasks while vRealize Operations takes care of the routine. Workloads can be balanced to avoid capacity constraints and performance bottlenecks. Configuration drifts can be automatically remediated. And alerts can be triggered and assigned to the appropriate team when human intervention is needed. We’ve done a lot to ease the burden of operations management, but one area that still requires the skill of an IT administrator is troubleshooting.



vRealize Operations includes several troubleshooting dashboards that walk you through a typical troubleshooting workflow. These dashboards cover vSphere clusters, hosts, VMs, applications, vSAN, and datastores. I always recommend them as a starting point for determining where your problems lie.



Then there are times when you need to dig deeper and look across many metrics, or even across other VMs or hosts. These tasks can be much more tedious. You end up opening dozens of metrics and looking for a correlation between them, which costs you valuable time while your applications suffer and users breathe down your neck.


Introducing Metric Correlation


If we haven’t given you enough reasons to upgrade to 7.5 already, then here’s one more! End the frustration! Jump up and down in jubilation! Because here comes Metric Correlation!! New to vRealize Operations 7.5 is the ability to automatically identify a correlation between metrics.

To illustrate how this feature works, let’s look at a Virtual Machine that runs a data processing job twice a day for several hours at a time.



Looking at CPU demand, we can clearly see when the job runs. If we want to see what other resources are consumed during this job, we could either sift through the metrics ourselves or we can simply tell vROps to find a correlation for us. Let’s try the latter!!

To use metric correlation, simply navigate to an object’s All Metrics page. Open the metric you want to correlate and set the time range. Seven days is the default. Then open the dropdown located at the top right corner of the metric and open correlation. There are two options in the correlation menu. Let’s look at both of these.


All Self-Metrics




All self-metrics will search for a correlation across all non-instanced metrics for that object. This means that vROps will correlate our selected metric against all other top-level metrics. For example, we’ll check for a correlation with top-level CPU metrics, but not against the usage metrics of the individual cores.
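To make the "non-instanced" idea concrete, here is a minimal Python sketch of how a candidate list could be narrowed down. It is not how vRealize Operations is implemented. It assumes metric keys are written as group|metric, with an instance marked as group:instance|metric, and the key names in the example are hypothetical.

```python
def is_instanced(metric_key: str) -> bool:
    """Return True if the key points at one instance, such as a single CPU core.

    Assumption: keys look like "group|metric" for top-level metrics and
    "group:instance|metric" for instanced ones.
    """
    group = metric_key.split("|", 1)[0]
    return ":" in group


def self_metric_candidates(metric_keys, selected):
    """Keep every top-level metric except the one being correlated."""
    return [k for k in metric_keys if k != selected and not is_instanced(k)]


# Hypothetical metric keys for a single VM.
keys = [
    "cpu|demandPct",
    "cpu|usage_average",
    "cpu:0|usage_average",   # per-core metric, skipped
    "cpu:1|usage_average",   # per-core metric, skipped
    "mem|usage_average",
]

candidates = self_metric_candidates(keys, "cpu|demandPct")
print(candidates)
```

With the sample keys above, only the two top-level metrics other than the selected one remain as candidates.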



As you can see, we’ve identified a correlation between CPU and Memory. vRealize Operations will return 10 results, but you can tell it to fetch more by scrolling to the bottom and clicking show more.


Now that vRealize Operations has identified a correlation between CPU, Memory, and Storage, we can pin these to the all metrics chart by clicking on the push pin.



Correlating between peer VMs


Now, what if we wanted to see which peer VMs have the same CPU demand pattern? We can simply tell vROps to look for a correlation between peer objects.


vROps will look across all peer objects for a correlation with the selected metric. In this case, we have selected CPU Demand % for a VM, so vROps will look across all peer VMs for another one with a similar CPU Demand % profile. A peer is an object of the same type that belongs to the same parent. For example, VMs that live on the same host are peers, and so are ESXi hosts that reside in the same cluster.
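The peer rule (same type, same parent) is easy to express in a few lines. The Python sketch below only illustrates that definition. The InventoryObject class, its fields, and the object names are hypothetical and do not come from the vRealize Operations API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryObject:
    name: str
    kind: str    # object type, e.g. "VirtualMachine" or "HostSystem"
    parent: str  # name of the parent object (host for a VM, cluster for a host)


def peers_of(obj, inventory):
    """Return objects of the same type that share the same parent."""
    return [
        other for other in inventory
        if other != obj and other.kind == obj.kind and other.parent == obj.parent
    ]


# Hypothetical inventory: two hosts in one cluster, three VMs spread across them.
inventory = [
    InventoryObject("esx-01", "HostSystem", "cluster-1"),
    InventoryObject("esx-02", "HostSystem", "cluster-1"),
    InventoryObject("vm-a", "VirtualMachine", "esx-01"),
    InventoryObject("vm-b", "VirtualMachine", "esx-01"),
    InventoryObject("vm-c", "VirtualMachine", "esx-02"),
]

vm_peers = [o.name for o in peers_of(inventory[2], inventory)]
host_peers = [o.name for o in peers_of(inventory[0], inventory)]
print(vm_peers, host_peers)
```

Here vm-a has a single peer, vm-b, because vm-c sits on a different host, while esx-01 and esx-02 are peers of each other because they share a cluster.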

Let’s see if there are any other peer VMs with similar CPU demand.


Sure enough, vRealize Operations discovered a similar CPU demand pattern in ERA-02, another VM running a similar job. You’ll notice that the values are drastically different, with one VM reaching 50% CPU demand and the other nearly 100%. This doesn’t really matter, as vROps is only looking for metrics with similar patterns, and not necessarily similar values. You can correlate metrics for as long as three months, or just a few hours. vROps just needs 11 data points at a minimum to run a correlation search.
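Why doesn't the difference in values matter? The article doesn't say which algorithm vROps uses, so the Python sketch below substitutes a plain Pearson correlation coefficient as an assumption. Pearson shares the property described above: it scores the shape of two series and ignores their scale. The sample data is invented, and the MIN_POINTS check simply mirrors the 11-point minimum mentioned above.

```python
from math import sqrt

MIN_POINTS = 11  # minimum number of data points the article cites


def pattern_correlation(xs, ys):
    """Pearson correlation of two series (an assumed stand-in, not vROps code)."""
    if len(xs) != len(ys):
        raise ValueError("series must have the same length")
    if len(xs) < MIN_POINTS:
        raise ValueError(f"need at least {MIN_POINTS} data points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series has no pattern to match
    return cov / sqrt(var_x * var_y)


# Invented CPU demand samples: the second VM peaks twice as high as the first.
vm_one = [10, 12, 50, 48, 11, 10, 49, 50, 12, 10, 11, 50]
vm_two = [v * 2 for v in vm_one]

r = pattern_correlation(vm_one, vm_two)
print(round(r, 3))
```

The two series never share a value, yet the coefficient comes out at essentially 1.0 because the peaks and troughs line up. Under this assumption, a VM peaking at 50% and one peaking near 100% would still be reported as correlated.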


Metric Correlation between ESXi hosts


Let’s look at another example of how Metric Correlation can be used. We have a vSAN cluster that is experiencing some intermittent issues. While doing some load testing, we discovered something interesting on one of our hosts.


We’re generating a constant load on this cluster so we would expect to see everything running nice and flat. However, when we look at the network transmit rates, we see a lot of drops. At this point, we don’t know if this is an issue with just this host or if it’s more widespread. We can quickly see if the other hosts are having similar issues by correlating this metric with the peer ESXi hosts.


Sure enough, the other hosts are seeing the same thing.


By correlating this metric with the other hosts in the cluster, we can broaden the scope of our search quickly. Because this issue was impacting all hosts in the cluster, we started looking at what these hosts have in common. In this case, we identified a faulty top-of-rack (ToR) switch as the culprit. Once it was swapped out, things started working normally again! Thank you vRealize Operations!


Metric Correlation is a powerful new feature in vRealize Operations 7.5. Correlation is key to pointing us in the right direction when investigating issues. By leveraging this new feature to quickly identify other areas of interest, you’ll save time troubleshooting and resolve issues faster. Try it for yourself by upgrading to vRealize Operations 7.5 today or take advantage of the 60-day free trial! For more information, check out the vRealize Operations product page.

