In a recent innovations meeting one of my colleagues suggested the use of pvscsi over the default LSI Logic drivers. The idea being the same as with vmxnet3 over e1000g adapters to save CPU resources. However that does not automatically yield a performance improvement. Other people have talked about their findings in the past and VMware themselves have said something about it too. To my surprise my findings were a bit different than proposed by VMware.
The CPU utilization difference between LSI and PVSCSI at hundreds of IOPS is insignificant. But at larger numbers of IOPS, PVSCSI can save a lot of CPU cycles.
My setup was very simple, a 64bit W2K8R2 VM with 4GB Ram, 2 vCPUs on an empty ESX cluster and empty storage. I was running my tests during off hours so impact by other VMs on the possibly shared storage (I do not know for sure, unfortunately, how the storage is setup in detail. Thus I don't know if the arrays are shared or dedicated. The controllers will be shared however.) is unlikely, the assigned FC storage LUNs for test purposes only. Apart from the OS drives the VM had two extra VMDKs, each using its own dedicated virtual SCSI controller, pvscsi and LSI Logic SAS.
I might have done something seriously wrong but here's what I found:
Using iometer's Default Access Specification (100% random access at 2kb block sizes, 67% read) I did indeed find very significant differences, but not what I had expected:
pvscsi: Avg Latency: 6.28ms, Avg IOPS: 158, Avg CPU Load 53%
LSI: Avg Latency: 3.16ms, Avg IOPS: 316, AVG CPU Load 34%
Multiple runs confirmed these findings.
Later changing the access specs to a more real world scenario VMware's proposition became more and more true, the values approached each other. At 60% random IO both adapters managed roughly 300 IOPS at 10% CPU load.
Conclusion
I cannot conclude much as I know too little about the storage configuration. However I wanted to see what happened if I scaled up a little. Using the very same storage I deployed a NexentaStor CE, gave it 16GB Ram for caching and 2 VMDKs on the same datastores as the initial VM (each Eager Zeroed Thick) and configured a Raid0-ZPool. I configured 4 zVol LUNs inside the storage appliance and handed them out via iSCSI, migrated the W2K8 VM into the provided storage (and nested ESXi for that matter, just to make it a little more irrelevant) and ran the same tests again. Now utilizing multiple layers of caching I got quite different values:
pvscsi: Avg Latency 1.59ms, Avg IOPS 626, Avg CPU Load 11%
LSI: Avg Latency 1.72ms, Avg IOPS 582, Avg CPU Load 21%
The performance impact is indeed insignificant, none of this is interesting for enterprise workloads. The CPU utilization difference is significant however, as it nearly doubles! As I said before all of this is irrelevant and pretty much a waste of time, it just shows that the platform doesn't have the bang properly make use of a paravirualized scsi controller to begin with. To me that is a little disappointing and an eye opener.
Follow up
Overriding capacity management I migrated the VM onto a production cluster to see whether the storage systems there are a bit more capable. However again the results are not what I expected:
pvscsi: Avg Latency 0.58ms, Avg IOPS 1708, Avg CPU Load 17%
LSI: Avg Latency 0.47ms, Avg IOPS 2126, Avg CPU Load 21%
Again I conclude that none of this is relevant, unfortunately, and I'm going to have to go into questioning the engineering team who set up this storage platform to find some answers as to how they decided what to set up.
No comments:
Post a Comment