Sizing and Performance
Undersizing or oversizing virtual machines can have a negative impact on your VM and on other VMs running on the same hypervisor host. Please ensure you are following these best practices when creating or modifying VMs.
- Start with one vCPU per VM and increase as needed.
- Vendors often suggest resource requirements that are much higher than necessary to ensure the application works well the first time. Take vendor suggestions with a grain of salt.
- Do not allocate more vCPUs than needed to a VM as this can unnecessarily limit resource availability for other VMs and increase CPU Ready wait time (waiting for CPU) on the VM.
Adding more vCPUs (virtual CPUs) to a VM can actually result in less computing power.
Virtualization makes highly efficient use of the CPU, memory, and storage resources available on a physical server. Hypervisor overhead accounts for only about 5% of the physical host's resources, leaving the vast majority available for use by virtual servers.
A hypervisor is a load-sharing computing system. Every VM running on a hypervisor host gets an equal share of CPU, memory, storage, and network attention from the system scheduler. Because a badly configured VM can affect the amount of resources available to its neighbors, properly sizing VM workloads is essential to maximizing performance and minimizing wasted resources.
In the physical world, a server is limited by physical factors - for CPU, that means the processor family, the number of cores per CPU socket, and the speed of the physical processors present. If an application on a physical server is slow, the problem is often remedied by buying a larger server, adding more memory, or swapping the storage out for faster disks or SSDs.
Virtualized server CPU resources work in a similar manner. However, because a virtualized server is a shared system, increasing the number of CPU resources for a VM can actually throttle the VM and cause it to perform more slowly than intended.
Suppose a physical host has two CPU sockets, each with a 16-core CPU. With every clock cycle, the two CPUs can execute up to 32 instructions at once.
The number of VMs running on a host can vary, but we generally want to run around 50 per host. These VMs vary in size from 1 vCPU up to 16 vCPUs, and their workloads range from nothing running other than the OS to intensive video processing.
As mentioned earlier, all VMs have the same priority. The CPU resource scheduler acts in a round-robin fashion and checks in with each VM to see if it has any CPU work to execute. If the VM has CPU work to perform at that time, the scheduler assigns cores on the physical CPUs to perform the work - one physical CPU core for each vCPU configured for the VM. If the VM does not have any work to perform, the scheduler skips to the next VM in line. Once all of the physical cores are filled, or the scheduler runs out of time, the physical CPUs execute the instructions as part of that cycle.
We have 32 cores available to schedule each clock cycle. What happens if we fill 30 of them, and the next VM that has work to do has 4 vCPUs?
The answer is that the scheduler tells the 4-vCPU VM that no CPU resources are available and skips to the next VM. When the scheduler finds a VM with 2 vCPUs that has work to perform, it fills the remaining slots and executes the work. The amount of time the skipped VM has to wait to get work scheduled is recorded, in milliseconds, in the CPU statistics as "CPU Ready" time - the VM was ready for CPU attention, but had to wait X milliseconds to get scheduled.
Will that 4-vCPU VM get priority in the next scheduling interval? Not necessarily - the CPU resource scheduler simply checks again at the next regular interval to see if the VM has work waiting. In practice, the odds are good that a 4-vCPU VM gets scheduled regularly. But what if the VM had 16 vCPUs or more instead of 4? On a busy host, the chance of getting work scheduled each clock cycle decreases as the number of configured vCPUs grows.
Take a theoretical example of a host that has 6 very busy 2 vCPU VMs and one 32 vCPU VM. The 32 vCPU VM will probably show gigantic co-stop times because it can't get work scheduled until there's a clock cycle where none of the other VMs have work to be performed and all 32 physical cores are free.
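The effect is easy to see in a toy simulation. The sketch below is not VMware's actual scheduler - the strict all-or-nothing co-scheduling, the 90% "busy" probability, and the VM names are simplifying assumptions - but it reproduces the scenario above: six busy 2-vCPU VMs and one 32-vCPU VM competing for 32 physical cores.

```python
import random

# Toy model only: each cycle, a VM is dispatched when enough free physical
# cores exist for ALL of its vCPUs at once; otherwise it records a wait.
PHYSICAL_CORES = 32      # 2 sockets x 16 cores, as in the example above
CYCLES = 10_000          # scheduling intervals to simulate
BUSY_PROBABILITY = 0.9   # assumed chance a "busy" VM has runnable work

vms = [{"name": f"small-{i}", "vcpus": 2, "waits": 0} for i in range(6)]
vms.append({"name": "huge", "vcpus": 32, "waits": 0})

for _ in range(CYCLES):
    free_cores = PHYSICAL_CORES
    for vm in vms:                              # round-robin over the VMs
        if random.random() > BUSY_PROBABILITY:  # no runnable work this cycle
            continue
        if vm["vcpus"] <= free_cores:
            free_cores -= vm["vcpus"]           # all vCPUs scheduled together
        else:
            vm["waits"] += 1                    # co-stopped: not enough cores

for vm in vms:
    print(f'{vm["name"]:8s} waited on {vm["waits"] / CYCLES:6.1%} of cycles')
```

In this toy model the six small VMs essentially never wait, while the 32-vCPU VM waits on nearly every cycle in which it has work - which is exactly what a large co-stop value looks like.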
So will adding more vCPUs make a VM faster? The most accurate answer is "it depends". If you have 4 threads with work ready to be executed every clock cycle, then yes, 4 vCPUs will definitely help your VM run better. If you only have 2 threads ready to do work each cycle, the 2 additional vCPUs are wasted. You may not see a difference in performance at first, but as the physical hosts get busier, over-provisioned VMs will eventually see performance suffer.
VMware uses a CPU statistic called "CPU Ready" that tracks how long a VM workload waits for CPU attention from the host it is running on.
This counter can be found in the VMware vSphere Client by highlighting your VM, clicking the Monitor tab, clicking Performance, and choosing Advanced. Click Chart Options, choose CPU on the left side, and select a Timespan of Real-time. Under "Select Counters for this chart", choose "Readiness". The display will show a value for each vCPU configured for the VM, plus an aggregate value for the VM as a whole.
The definition of "Readiness" is: "Percentage of time that the Virtual Machine was ready, but could not get scheduled to run on one of the physical CPU." You want this value to be as low as possible, ideally. We get concerned at the 5% mark. Occasional spikes over 5% can be OK - it can indicate a host that is just very busy, but sustained bursts over 5% can also indicate that an additional core may benefit performance on the machine. However, an additional core is only beneficial if there is a corresponding additional thread of work in the VM that is waiting to be executed. The UFIT Virtual Platform team should examine this VM further to see if adding additional cores would benefit performance.
VMware uses a statistical counter called "CPU Co-Stop" to measure the amount of time a VM has to wait to execute instructions because of co-scheduling - the scheduler having to find enough free physical cores on the host to execute one clock cycle's worth of work waiting in the VM's threads.
The definition of "CPU Co-Stop" is: Time that the virtual machine is ready to run, but is unable to run due to co-scheduling constraints. In order for a single instruction set to run, the scheduler has to find the same number of physical CPU cores free that match the number of vCPU's configured for a virtual machine. Even vCPUs that have no work queued up to perform.
This counter can be found in the VMware vSphere Client by highlighting your VM, clicking the Monitor tab, clicking Performance, and choosing Advanced. Click Chart Options, choose CPU on the left side, and select a Timespan of Real-time. Under "Select Counters for this chart", choose "Co-Stop". The chart will show a value for each vCPU configured for the VM, plus an aggregate value for the VM as a whole. The values in the chart are milliseconds summed over a 20-second sampling interval. To calculate a wait percentage, divide the value shown by 20000 and multiply by 100.
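For example, the arithmetic above in a few lines of Python (the sample values are made up for illustration):

```python
SAMPLE_INTERVAL_MS = 20_000   # real-time samples are summed over 20 seconds

def summation_to_percent(ms_value, interval_ms=SAMPLE_INTERVAL_MS):
    """Convert a summation counter sample (ms per interval), such as
    Co-Stop or CPU Ready, to a percentage of the sampling interval."""
    return ms_value / interval_ms * 100

print(summation_to_percent(1200))  # 6.0  -> above the 5% concern threshold
print(summation_to_percent(400))   # 2.0  -> fine
```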
As with CPU Ready, the threshold for concern is 5%. Sustained values above 5% can be a sign of too many assigned vCPUs, or of a VM running on a very busy host. Further examination by the ICT Hypervisor team can determine whether lowering the number of vCPUs or moving the VM to a different host would help performance.
IMPORTANT NOTE
CPU Co-Stop numbers will be deceptively low if the physical host has sufficient CPU capacity to handle the full workload of all of the VMs (including yours) running on it (wasted vCPUs or not).
If the physical host has plenty of processing power and no VMs ever have to wait for work to be scheduled, Co-Stop values will remain low. This is especially true of newer hardware early in its life. However, the negative effects of over-provisioning become more pronounced as the physical hosts get busier, and those numbers will eventually start climbing as performance starts to suffer.
If you do not know how much work your VM will perform in our environment, start with as little as possible. Start with 2 vCPUs to enable multi-threading - it is easy to add vCPU capacity if the VM needs it, whereas removing vCPU capacity requires downtime for the virtual server.
Watch the performance counters on the Real-Time CPU display in the VMware vSphere Client. If the "Usage in MHz" counter stays at 100% for sustained periods, you may have a runaway process, or you may have threads with additional work waiting to be performed (check the CPU Ready values). Occasional spikes to 100% are perfectly normal - you just do not want high sustained use. Be sure to look at the values for the individual vCPUs in the Real-Time CPU display rather than the aggregate for the VM as a whole. Ask the UFIT Virtual Platform team to examine activity on the VM if you are unsure.
If you look at the Usage in MHz screen and see activity on some vCPUs and zero or very low values on the others, it may be a sign that you have assigned more vCPUs than are being used. Reducing the number of vCPUs can be beneficial - especially for VMs with higher vCPU counts.
Because of the shared nature of virtualization, physical sizing requirements are rarely relevant to creating performant VMs. Start with a low number of vCPUs, examine performance statistics in the vCenter console once the product is running, and add vCPUs only if needed. Remember that it is easy to hot-add vCPUs, but removing them requires powering the VM down.
Check the "Usage in MHz" counters in the Real-Time CPU display under Monitor, Performance, Advanced in the VMware vSphere Client for your VM. If they show high sustained values for all of the vCPU's assigned to it, then you may need to add more capacity. Check the CPU Ready counters in that same graph to see how long the wait is. If it is a low %Ready value per vCPU, then your speed issue may be caused by other factors, like a poorly constructed database query, not using indexes, etc.
Consider reducing the number of vCPUs assigned to the VM. The fewer vCPUs a VM has, the easier it is for the scheduler to find room for its work. In more than one case, reducing the number of assigned vCPUs has improved performance for a VM that seemed sluggish, and you can easily go back the other direction if it does not help.