vSphere Metrics Watch: Four Key Storage Metrics
Now that we’ve covered which memory and processor metrics to keep an eye on, let’s take a closer look at storage. When programs aren’t running, they’re stored on disk along with the data those programs require. In addition, in many modern data centers, servers boot directly from storage, so files related to the hypervisor (vSphere) itself are often located on storage as well.
Disk Latency
Disk latency is the amount of time it takes for a read or write operation sent to a storage device to actually be processed. This metric accounts for delays that might be introduced by RAID and other storage configuration decisions.
If you’re seeing average disk latency exceeding 20 milliseconds or so, or major spikes on an ongoing basis, you may be running into issues that could affect workloads. For example, I’ve seen disk latency issues break Microsoft clusters.
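As a rough sketch of those rules of thumb, a monitoring script could flag both a high average and individual spikes. This is an illustrative helper, not part of vSphere or any VMware tooling; the 20 ms average and 50 ms spike thresholds are assumptions you’d tune for your own workloads:

```python
# Illustrative helper: flag problematic disk latency from a series of
# per-interval latency samples (in milliseconds). The thresholds are
# rules of thumb, not vSphere defaults.

def check_disk_latency(samples_ms, avg_threshold=20.0, spike_threshold=50.0):
    """Return a list of warnings for the given latency samples."""
    warnings = []
    avg = sum(samples_ms) / len(samples_ms)
    if avg > avg_threshold:
        warnings.append(f"average latency {avg:.1f} ms exceeds {avg_threshold} ms")
    spikes = [s for s in samples_ms if s > spike_threshold]
    if spikes:
        warnings.append(f"{len(spikes)} spike(s) above {spike_threshold} ms")
    return warnings

# A mostly healthy device with one major spike still gets flagged:
print(check_disk_latency([8, 12, 9, 65, 11, 10]))
# → ['1 spike(s) above 50.0 ms']
```

In practice you would feed this the `disk.totalLatency.average` samples collected from your monitoring tool rather than hard-coded values.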
Obviously, every workload has different levels of tolerance, so do a bit of study before jumping to too many conclusions. Further, when it comes to disk latency, there can be any number of causes, each of which calls for a different resolution.
- Perhaps you have too few disk spindles serving too many I/Os. Solutions may include adding spindles or moving to solid-state disks. You may also need to investigate the storage head end to make sure it can keep up with the load.
- Sometimes, adding additional RAM to a virtual machine can be helpful in these cases, but only if adding that RAM has the potential to lead to less I/O hitting storage.
- If you have antimalware software on your virtual machines, make sure it’s not running at the same time on all of your VMs. That activity can spike I/O, increasing overall latency.
- Make sure no virtual machines are swapping to disk. VM swap activity drives additional storage I/O; see the earlier discussion of memory metrics for more information.
Kernel Command Latency
VMFS operations in vSphere are run through the VMkernel. As such, there will necessarily be some latency imposed by the VMkernel while it processes storage commands. This latency value should be very low, as in sub-2ms, but could be as high as 4ms. If you see it go beyond that figure, something isn’t right: the server is experiencing too much load, which could result in poor storage performance.
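The sub-2ms/4ms figures above can be sketched as a simple classifier. This is a hypothetical helper for a monitoring script, not a VMware API; the band boundaries come straight from the rules of thumb in this section:

```python
# Illustrative helper: classify VMkernel command latency (milliseconds)
# against the rules of thumb above: under ~2 ms is healthy, up to 4 ms
# is tolerable, and anything beyond that suggests an overloaded host.

def classify_kernel_latency(latency_ms):
    if latency_ms < 2.0:
        return "healthy"
    if latency_ms <= 4.0:
        return "elevated"
    return "overloaded"

print(classify_kernel_latency(1.2))  # → healthy
print(classify_kernel_latency(6.5))  # → overloaded
```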
Queue Latency
Queue latency is the amount of time a storage command sits in a queue in the storage stack before being serviced. It is a subset of overall kernel latency, but it’s an important metric in its own right. In general, if you see queue latency exceeding 1ms, that’s too much, and you should examine your storage environment for optimization opportunities.
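Because queue latency is a component of kernel latency, subtracting it out shows how much time the VMkernel spent actually processing commands versus leaving them queued. A minimal sketch, assuming latency figures in milliseconds and the 1 ms queue threshold from this section (the function name and output shape are illustrative):

```python
# Illustrative breakdown: queue latency is part of overall kernel
# latency, so kernel - queue approximates actual processing time.
# The 1 ms queue threshold is the rule of thumb from the article.

def analyze_queue_latency(kernel_ms, queue_ms, queue_threshold=1.0):
    processing_ms = kernel_ms - queue_ms
    return {
        "queue_ms": queue_ms,
        "processing_ms": processing_ms,
        "investigate": queue_ms > queue_threshold,
    }

print(analyze_queue_latency(kernel_ms=3.5, queue_ms=2.0))
# → {'queue_ms': 2.0, 'processing_ms': 1.5, 'investigate': True}
```

A result like the one above, where most of the kernel latency is queue time, points at the storage path rather than the VMkernel itself.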
Datastore Disk Usage
Finally, storage isn’t just about performance. There’s an obvious capacity component that is critically important to most organizations as well. In a vSphere environment, you need to make sure that your storage devices, whether they’re VMFS or NFS, don’t run out of space. This is generally a pretty easy task if you use disk formats that consume all of their space as soon as you create new volumes (thick provisioning). If, however, you use thin provisioning to maximize the available capacity of your storage environment, it becomes possible to overallocate the amount of physical storage you have. In these instances, a volume can run out of disk space, at which point the virtual machines on that volume will stop operating.
To prevent this from happening, make sure your volumes always have enough free space, and monitor usage closely if you’re using thin provisioning.
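The overallocation risk described above can be sketched numerically: compare what’s allocated to thin disks against the datastore’s physical capacity. This is an illustrative calculation, not a VMware API; the figures and the 10% free-space floor are assumptions:

```python
# Illustrative sketch: with thin provisioning, the space *allocated* to
# virtual disks can exceed the datastore's physical capacity. This
# helper computes the overcommit ratio and flags datastores that are
# both overallocated and low on free space. All figures are in GB;
# the 10% free-space floor is an illustrative choice.

def datastore_risk(capacity_gb, used_gb, allocated_gb, min_free_pct=10.0):
    free_pct = (capacity_gb - used_gb) / capacity_gb * 100
    overcommit = allocated_gb / capacity_gb
    return {
        "overcommit_ratio": round(overcommit, 2),
        "free_pct": round(free_pct, 1),
        "at_risk": overcommit > 1.0 and free_pct < min_free_pct,
    }

# A 2 TB datastore with 2.6 TB of thin disks allocated, ~95% used:
print(datastore_risk(capacity_gb=2048, used_gb=1950, allocated_gb=2662))
# → {'overcommit_ratio': 1.3, 'free_pct': 4.8, 'at_risk': True}
```

An overcommit ratio above 1.0 isn’t a problem by itself; it only becomes dangerous when free space is also running low, which is exactly the combination this flags.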
Storage plays a critical role in the overall performance of an environment, but it is also responsible for housing an organization’s data assets. Make sure your vSphere storage environment is running in peak condition while also having enough capacity to meet ongoing needs.