
Category: VMware

Understanding VMware vSphere Distributed Resource Scheduler (DRS)

 

Diagram: a vSphere cluster with three hosts, where VMs are migrated from the heavily loaded Host A to Hosts B and C for load balancing.

VMware vSphere Distributed Resource Scheduler (DRS) is a critical feature for dynamic resource management in virtualized environments. By intelligently balancing workloads across hosts, DRS ensures optimal performance and efficient use of resources. In this article, we’ll explore how DRS works, its benefits, and how to configure it in your environment.

 

Key Benefits of DRS

  1. Workload Balancing:
    Ensures that no host is overloaded, improving the overall cluster performance.
  2. Power Efficiency:
    Shuts down underutilized hosts during periods of low demand to save energy and reduce costs (via Distributed Power Management, DPM).
  3. Improved VM Performance:
    Allocates resources dynamically to prevent resource contention.
  4. Operational Simplicity:
    Automates the process of balancing workloads, reducing manual intervention.

How Does DRS Work?

  1. Resource Monitoring:
    DRS continuously monitors CPU, memory, and other resources across the cluster.
  2. vMotion:
    Uses VMware vMotion to migrate VMs between hosts without downtime.
  3. Dynamic Thresholds:
    Balances workloads based on thresholds defined by the user (e.g., conservative vs. aggressive).

Configuring DRS in vSphere

  1. Enable DRS:
    • In the vCenter Server, right-click on your cluster and go to “Settings.”
    • Enable DRS and set the automation level (manual, partially automated, or fully automated).
  2. Set Resource Pools (Optional):
    • Create resource pools to allocate resources to groups of VMs as needed.
  3. Define Rules:
    • Create affinity and anti-affinity rules to control VM placement.
  4. Monitor Performance:
    • Use the DRS dashboard to analyze the balancing actions and cluster performance.

Best Practices for Using DRS

  • Always enable vMotion alongside DRS for seamless migrations.
  • Regularly review automation levels to match operational requirements.
  • Test rules and thresholds in a controlled environment before applying them in production.
  • Combine DRS with VMware High Availability (HA) for enhanced fault tolerance.

 

Resolving ESXi Host CPU Overload Issues

An ESXi host experiencing CPU overload typically exhibits symptoms such as:

  • VMs becoming unresponsive or slow.
  • High CPU Ready Times in vSphere performance metrics.
  • Consistently maxed-out CPU usage in the host’s performance tab.

Common causes include:

  1. Oversized VMs: Allocating more vCPUs than needed.
  2. Resource Contention: Too many VMs competing for CPU resources.
  3. Misconfigured Resource Pools: Imbalanced resource allocation.
  4. Unoptimized Applications: Inefficient software consuming excessive CPU.
  5. Background Processes: Host-level tasks like backups or snapshots running during peak hours.

Steps to Troubleshoot and Resolve

  1. Analyze Performance Metrics
  • In the vSphere Client, go to Monitor > Performance for the affected host or VMs.
  • Look for:
    • CPU Usage (%): High values indicate overload.
    • CPU Ready (%): High values (above 5%) indicate VMs waiting too long for CPU.
    • Co-Stop (%): High values indicate vCPU scheduling issues.
  2. Optimize VM Configurations
  • Reduce the number of vCPUs allocated to each VM unless absolutely necessary. Many applications perform well with fewer vCPUs.
  • Power off unused or idle VMs to free up resources.
  3. Check and Reconfigure Resource Pools
  • Review resource pools to ensure proper allocation.
  • Avoid strict limits unless required, as they can starve VMs of CPU during peak loads.
  4. Balance Workloads Across Hosts
  • Use vMotion to migrate high-load VMs to hosts with spare CPU capacity.
  • Enable DRS (Distributed Resource Scheduler), if available, to balance workloads automatically.
  5. Address Application-Level Issues
  • Identify high-CPU-consuming processes within the VMs.
  • Work with application owners to optimize software settings or update inefficient programs.
  6. Update ESXi and Guest OS Drivers
  • Ensure that the ESXi host and VMware Tools are updated to the latest versions. Outdated software can lead to inefficient CPU usage.
  7. Monitor Background Tasks
  • Stagger resource-intensive tasks such as backups, virus scans, or snapshots to run during off-peak hours.
  8. Add Host Resources
  • If the cluster consistently runs at high capacity, consider adding more hosts or upgrading the existing hardware to handle increased demand.
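As a quick illustration of the CPU Ready rule of thumb from step 1, a short shell filter can flag VMs above the 5% threshold. The VM names and values here are mocked; real numbers would come from esxtop in batch mode or the vSphere performance charts:

```shell
# Mocked "vmname ready_percent" samples; on a real host, export them with
# esxtop in batch mode (esxtop -b -n 5 > /tmp/esxtop.csv) and extract the
# per-VM %RDY values
printf 'vm-web 2.1\nvm-db 7.4\nvm-app 4.9\n' |
  awk '$2 > 5 { print $1 " exceeds the 5% CPU Ready threshold" }'
```

With the sample values above, only `vm-db` is reported.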

Preventive Measures

  • Monitor Regularly: Use vRealize Operations or another monitoring tool to proactively track resource usage.
  • Enable DRS: Automate load balancing to prevent bottlenecks.
  • Right-Size VMs: Periodically evaluate and adjust vCPU and memory allocations based on actual usage patterns.
  • Reserve Resources Strategically: Use reservations for critical VMs but avoid over-reserving resources unnecessarily.
  • Plan Capacity: Regularly review cluster capacity to ensure it aligns with business needs and future growth.

How to Configure High Availability (HA) in VMware vSphere

High Availability (HA) in VMware vSphere is an essential feature to ensure service continuity in case of host or VM failures. This article will guide you through the steps to configure HA in vSphere and highlight best practices for maintaining a robust and reliable environment.

Before enabling HA on a cluster, ensure the following requirements are met:

  • Proper vSphere licensing.
  • A configured cluster in vSphere.
  • A stable and redundant management network.
  • Shared storage accessible by all hosts in the cluster.

Step-by-Step Guide to Configure HA in vSphere

  1. Access the vCenter Server:
    • Log in to the vCenter interface.
  2. Create or Select a Cluster:
    • Right-click on the Datacenter, select “New Cluster,” and set up the cluster.
  3. Enable High Availability:
    • In the cluster menu, click “Configure.”
    • Under “Availability,” click “Edit.”
    • Enable HA, set failover policies, and click “OK.”
  4. Add Hosts to the Cluster:
    • Drag and drop hosts into the cluster or add them manually.
  5. Test the Configuration:
    • Simulate a failure to confirm that VMs automatically restart on another host.

Best Practices for Configuring HA

  • Configure multiple paths for management networks.
  • Use Distributed Resource Scheduler (DRS) with HA for load balancing.
  • Regularly monitor the cluster to detect issues before failures occur.

“Scan or remediation is not supported on … because of unsupported OS” for certain operating systems

 

VMware published this workaround so you can manually add the operating system to the list of guests supported by VMware Update Manager.

  1. Connect to your vCenter Server Appliance via SSH and log in.
  2. Create a backup of the vci-integrity.xml file:
mkdir /backup && cp /usr/lib/vmware-updatemgr/bin/vci-integrity.xml /backup/
  3. Open the vci-integrity.xml file in the vi editor:
vi /usr/lib/vmware-updatemgr/bin/vci-integrity.xml
  4. Locate the <vci_vcIntegrity> ….. </vci_vcIntegrity> section.
  5. Enter edit mode by pressing Insert or the letter i.
  6. Before the </vci_vcIntegrity> line, add the following lines, depending on the operating system configured in your virtual machine. If you are adding both versions of the same OS (e.g., Windows Server 2019 and 2022), place both guest IDs inside a single wrapper block rather than repeating the wrapper element.
  • For Debian 11 (32 bit):
    <supportedLinuxGuestIds>
      <debian11Guest/>
    </supportedLinuxGuestIds>
  • For Debian 11 (64 bit):
    <supportedLinuxGuestIds>
      <debian11_64Guest/>
    </supportedLinuxGuestIds>
  • For Red Hat Enterprise Linux 9 (64 bit):
    <supportedLinuxGuestIds>
      <rhel9_64Guest/>  
    </supportedLinuxGuestIds>

Other Linux distributions can hit the same issue; in that case, pick the matching ID from the list of all supported Linux guest OS IDs:

asianux3Guest
asianux3_64Guest
asianux4Guest
asianux4_64Guest
asianux5_64Guest
centosGuest
centos64Guest
coreos64Guest
debian4Guest
debian4_64Guest
debian5Guest
debian5_64Guest
debian6Guest
debian6_64Guest
debian7Guest
debian7_64Guest
debian8Guest
debian8_64Guest
oracleLinuxGuest
oracleLinux64Guest
rhel7Guest
rhel7_64Guest
rhel6Guest
rhel6_64Guest
rhel5Guest
rhel5_64Guest
rockylinux_64Guest
fedoraGuest
fedora64Guest
sles12Guest
sles12_64Guest
sles11Guest
sles11_64Guest
sles10Guest
sles10_64Guest
opensuseGuest
opensuse64Guest
ubuntuGuest
ubuntu64Guest
otherLinuxGuest
otherLinux64Guest
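Because a stray character in vci-integrity.xml can break Update Manager, it is worth confirming the edited file is still well-formed XML before restarting the service. The snippet below runs the check against a mocked fragment; on the VCSA you would point it at /usr/lib/vmware-updatemgr/bin/vci-integrity.xml instead (python3 availability on your appliance version is an assumption):

```shell
# Mock a fragment with the structure from the steps above; note that
# several guest IDs can sit inside one supportedLinuxGuestIds block
VCI=$(mktemp)
cat > "$VCI" <<'EOF'
<vci_vcIntegrity>
  <supportedLinuxGuestIds>
    <debian11_64Guest/>
    <rhel9_64Guest/>
  </supportedLinuxGuestIds>
</vci_vcIntegrity>
EOF

# A missing or mismatched tag makes this exit non-zero instead of printing
python3 -c 'import sys, xml.dom.minidom; xml.dom.minidom.parse(sys.argv[1])' "$VCI" \
  && echo "fragment is well-formed"
```

After a successful check on the appliance, restart Update Manager so the change takes effect, for example with `service-control --stop vmware-updatemgr && service-control --start vmware-updatemgr` (service name as used on recent VCSA releases).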

Round Robin ESXi

Recommended round-robin (VMW_PSP_RR) claim rules to add on your ESXi hosts, depending on your storage array:

  • Dell Compellent – esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "COMPELNT" -P "VMW_PSP_RR" -O "iops=3"
  • Dell EMC PowerMax – esxcli storage nmp satp rule add -s "VMW_SATP_SYMM" -V "EMC" -M "SYMMETRIX" -P "VMW_PSP_RR" -O "iops=1"
  • Huawei – esxcli storage nmp satp rule add -V HUAWEI -M XSG1 -s VMW_SATP_DEFAULT_AA -P VMW_PSP_RR -O iops=1 -c tpgs_off
  • IBM SVC – esxcli storage nmp satp set --default-psp VMW_PSP_RR --satp VMW_SATP_SVC
  • Dell EMC Unity – esxcli storage nmp satp rule add --satp "VMW_SATP_ALUA_CX" --vendor "DGC" --psp "VMW_PSP_RR" --psp-option "iops=1" --claim-option "tpgs_on"

A datastore may still appear (and refuse to unmount) because an active process is using it, or because an ISO is mounted or a snapshot exists on it.

First, select your cluster, right-click, and rescan the storage.

Second, in vCenter select the datastore and check which hosts it is still mounted on; you can also open the datastore’s VMs tab to check whether any VM still resides there.

Log in to the ESXi host over SSH and check the output of:

lsof | grep <name of datastore>

Then run esxcli storage filesystem list to find the datastore’s UUID, and check:

lsof | grep <storage uuid>

Next, replace the naa. ID with your datastore’s device ID and run:

vsish -e ls /storage/scsifw/devices/<naa. id of your datastore>/worlds/ | sed 's:/::' | while read i; do ps | grep $i; done

This maps the world IDs still attached to the device to their processes. Identify the lock-holding process, stop it, and then unmount the datastore.
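The while-loop in the vsish command above simply strips the trailing slash from each world ID and looks it up in the process list. Against mocked world IDs (real ones come from the vsish listing on the host), the pattern behaves like this:

```shell
# Mocked world-ID listing; vsish prints each entry with a trailing slash,
# which sed removes before the per-ID lookup
printf '1001/\n1002/\n' | sed 's:/::' | while read -r id; do
  # On the ESXi host this line would be:  ps | grep "$id"
  echo "would inspect world $id"
done
```

Each world ID that still holds the device open will then show up with its owning process in the `ps` output.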

 

Remove vSphere Replication flag from VM

In my case the customer had removed the vSphere Replication appliance, so I needed to remove the replication flag from all previously replicated VMs.

  1. Log in to the ESXi host over SSH.
  2. Execute this command to list the VM IDs:
    vim-cmd vmsvc/getallvms | awk '$3 ~ /^\[/ {print $1}'
  3. Execute this command to remove the flag (replace 1011 with your VM's ID):
    vim-cmd hbrsvc/vmreplica.disable 1011
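The awk filter in step 2 works because, in getallvms output, the third column is the VM's .vmx path, which always starts with a bracketed datastore name; rows that don't match (such as the header) are skipped. Against a mocked output line (names and IDs are illustrative):

```shell
# Mocked `vim-cmd vmsvc/getallvms` output: header plus one VM row; the
# File column (field 3) starts with "[datastore]", which awk keys on
printf 'Vmid Name File Guest\n1011 web01 [ds1]web01/web01.vmx rhel9_64Guest\n' |
  awk '$3 ~ /^\[/ {print $1}'
```

This prints `1011`, the ID you would then pass to `vmreplica.disable`.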