
Author: Luciano Batalha

The Problem: ESXi Host Loses Network Connectivity

Step-by-Step Troubleshooting Guide

  1. Verify Network Configuration on the Host
  • Log in via DCUI (Direct Console User Interface) or SSH.
  • Check the IP address, gateway, and DNS settings:

esxcli network ip interface ipv4 get

  • Confirm that the host can ping the gateway or vCenter server:

ping <gateway-IP>

  2. Check vSwitch and Port Group Settings
  • In the vSphere Client, navigate to Networking > Virtual Switches.
  • Ensure the following:
    • The physical NICs (vmnics) are attached and active.
    • Port groups have the correct VLAN ID settings.
  • Verify the load balancing and failover policies for inconsistencies.
  3. Inspect Physical Network Connections
  • Check for issues with cables, switches, or ports.
  • Check the link lights on the NICs and the port activity indicators on the switch.
  • Replace faulty cables or move connections to different switch ports if needed.
  4. Test and Reconfigure NICs
  • Verify the status of all NICs:

esxcli network nic list

  • Re-enable or restart any problematic NICs:

esxcli network nic down -n <vmnic-name>
esxcli network nic up -n <vmnic-name>

  5. Address Driver or Firmware Issues
  • Check the HCL (Hardware Compatibility List) for your ESXi version.
  • Update or reinstall the NIC driver and firmware if outdated or incompatible:

esxcli software vib install -v /path/to/driver.vib

  6. Monitor for IP Conflicts
  • Inspect the ARP tables on the switch or router to detect conflicting IP addresses.
  • Assign a new static IP address to the host if conflicts are found.
  7. Restart Management Network
  • Restart the management network via the DCUI:
    • Select Troubleshooting Options > Restart Management Network.
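  • Alternatively, restart the management agents via SSH (this restarts hostd, vpxa, and the other management services, not only the network):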

services.sh restart

  8. Examine Logs for Deeper Insights
  • Review network-related logs for clues:

tail -f /var/log/vmkernel.log

tail -f /var/log/hostd.log
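
The NIC status check and the log review above lend themselves to a little scripting. The following is a minimal sketch, not an official tool: `down_nics` filters the output of `esxcli network nic list` for adapters whose link is down, and `nic_log_events` pulls recent lines that mention a vmnic from a log file. Both function names are made up for this example, and the column position assumes the usual table layout (Name, PCI Device, Driver, Admin Status, Link Status, ...), so check the header on your host before relying on it.

```sh
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of ESXi.

# Read `esxcli network nic list` output on stdin and print the name of
# every NIC whose Link Status column (assumed to be field 5) is "Down".
down_nics() {
    awk '$1 ~ /^vmnic[0-9]+$/ && $5 == "Down" { print $1 }'
}

# Print the last N lines (default 20) of a log file that mention a vmnic.
# $1 = log file path, $2 = optional line count.
nic_log_events() {
    grep -i 'vmnic' "$1" | tail -n "${2:-20}"
}
```

On a host this could be used as `esxcli network nic list | down_nics` and `nic_log_events /var/log/vmkernel.log 50`.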

Resolving a Common VMware ESXi Issue – PSOD on Boot

The Purple Screen of Death (PSOD) is VMware’s equivalent of the “blue screen” in Windows. It halts the ESXi host and displays diagnostic information on a purple background. One scenario that often triggers a PSOD is a hardware compatibility or driver issue, especially after an upgrade or new hardware deployment.

Root Cause Analysis

Common reasons for a PSOD on boot include:

  1. Incompatible Drivers: Using a driver version that doesn’t match the hardware or ESXi version.
  2. Faulty Hardware: Issues with RAM, storage controllers, or network adapters.
  3. Configuration Errors: Misconfigured BIOS or firmware settings.
  4. Corrupted Filesystem: Problems with the ESXi boot partition.

Step-by-Step Resolution

  1. Gather Information from the PSOD Screen
  • Note the error message and codes displayed on the PSOD.
  • Look for references to specific drivers, memory modules, or hardware.
  2. Reboot in Recovery Mode
  • Restart the host and enter Recovery Mode from the boot menu.
  • Once a shell is available, check the VMkernel log for errors recorded around the crash:

tail -n 200 /var/log/vmkernel.log

  3. Verify Hardware Compatibility
  • Cross-check the hardware against VMware’s Hardware Compatibility List (HCL) to ensure support for your ESXi version.
  • If issues arise, update firmware or replace problematic components.
  4. Roll Back Drivers
  • If a driver is causing the issue, try rolling back to a previous version:

esxcli software vib remove -n <driver-name>

  • Reboot the host and confirm stability.
  5. Check for Corrupted Boot Files
  • Boot the ESXi installation media and select Repair System.
  • Reinstall or repair the ESXi system files without wiping the datastore.
  6. Update or Patch ESXi
  • Check VMware’s knowledge base for patches or updates related to your error code.
  • Update your ESXi host:

esxcli software profile update -d <URL-to-depot> -p <profile-name>
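
Before removing or updating a driver, it helps to confirm the exact VIB name and installed version. A minimal sketch, assuming the usual `esxcli software vib list` layout with the name in the first column and the version in the second; `find_vib` is a hypothetical helper name, not an ESXi command.

```sh
#!/bin/sh
# Read `esxcli software vib list` output on stdin and print "name version"
# for each VIB whose name contains the pattern in $1 (case-insensitive).
# The first two lines (header and separator) are skipped.
find_vib() {
    awk -v pat="$1" 'NR > 2 && index(tolower($1), tolower(pat)) { print $1, $2 }'
}
```

For example, `esxcli software vib list | find_vib ixgben` would show the matching driver VIBs, and the name it prints is what `esxcli software vib remove -n` expects.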

Advanced VMware ESXi Troubleshooting

Troubleshooting VMware ESXi environments often requires advanced techniques and a deep understanding of where critical information is stored. From log files to configuration files, knowing where to look can significantly reduce downtime and speed up root cause analysis. In this guide, we’ll explore essential file locations, logs, and tips to troubleshoot ESXi like a pro.

VMware ESXi stores log files that capture system events, kernel activities, and errors. These logs are invaluable for diagnosing issues.

| Log File | Description | Location |
|---|---|---|
| vmkernel.log | Records kernel activities, such as VM operations. | /var/log/vmkernel.log |
| hostd.log | Tracks management service operations (Hostd). | /var/log/hostd.log |
| vpxa.log | Logs vCenter agent operations. | /var/log/vpxa.log |
| messages | General system messages (hardware-related issues). | /var/log/messages |
| vmkwarning.log | Captures warnings from the VMkernel. | /var/log/vmkwarning.log |
| esxupdate.log | Logs patching and upgrade activities. | /var/log/esxupdate.log |
| shell.log | Tracks shell commands entered by users. | /var/log/shell.log |

Accessing ESXi Logs

1. Using vSphere Client:

  • Navigate to the host in the vSphere Client.
  • Go to Monitor > Logs to view and download logs.

2. Accessing Logs via SSH:

  • Enable SSH on the ESXi host.
  • Use an SSH client to connect to the host and navigate to /var/log.

3. Using DCUI (Direct Console User Interface):

  • Log in to the DCUI.
  • Navigate to View System Logs and select the desired log file.
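
When it is not obvious which log to open first, counting keyword matches per file over SSH can narrow things down. The following is a rough sketch under the assumption that the logs of interest end in `.log` and live in one directory; `log_hits` is a hypothetical helper name.

```sh
#!/bin/sh
# Count case-insensitive matches of a keyword in each *.log file.
# $1 = keyword, $2 = log directory (defaults to /var/log).
# Prints "file:count" for every file with at least one match.
log_hits() {
    for f in "${2:-/var/log}"/*.log; do
        [ -f "$f" ] || continue
        # grep -c prints 0 and exits non-zero when nothing matches
        n=$(grep -ic -- "$1" "$f" || true)
        [ "$n" -gt 0 ] && echo "$f:$n"
    done
    return 0
}
```

Running `log_hits "vmnic0"` on the host prints one line per log that mentions the keyword, which gives a starting point for `tail` or the DCUI log viewer.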

Configuration File Locations

Understanding where ESXi stores its configuration files can help diagnose startup and configuration issues.

| Configuration File | Description | Location |
|---|---|---|
| esx.conf | ESXi host configuration. | /etc/vmware/esx.conf |
| config.xml | Hostd configuration file. | /etc/vmware/hostd/config.xml |
| datastores.xml | Datastore configuration. | /etc/vmware/vmware/datastores.xml |
| vpxa.cfg | vCenter agent configuration. | /etc/opt/vmware/vpxa/vpxa.cfg |
| .vmx files | VM configuration files. | Stored in the VM’s directory |

Best Practices for Troubleshooting ESXi

  1. Take Backups Before Changes:
    • Always back up the ESXi configuration (vicfg-cfgbackup) before troubleshooting.
  2. Use VMware Knowledge Base (KB):
    • Cross-reference error messages with VMware’s KB articles for guidance.
  3. Isolate the Problem:
    • Identify whether the issue is related to hardware, VM, network, or storage.
  4. Monitor Resource Usage:
    • Use esxtop to view real-time resource usage and identify bottlenecks.
  5. Engage VMware Support:
    • If an issue persists, collect logs and open a support ticket with VMware.

Virtual Machine Hardening in VMware ESXi

Virtual machines (VMs) are critical assets in any VMware ESXi environment. Securing them is as important as securing physical infrastructure. Hardening your VMs ensures that they are protected against threats and vulnerabilities. In this guide, we’ll discuss actionable steps and best practices to harden your VMs running on VMware ESXi.

Why Harden Virtual Machines?

VMs often house critical workloads, sensitive data, and business applications. Without proper hardening, they can become easy targets for attackers. Hardening virtual machines reduces their attack surface by disabling unnecessary features, securing communications, and enforcing access controls.

Virtual Machine Hardening Checklist

1. Limit Virtual Hardware Exposure

  • Disable Unnecessary Devices:
    Remove devices like floppy drives, parallel ports, and CD-ROMs unless required.
  • Restrict Network Adapter Settings:
    Disable promiscuous mode, forged transmits, and MAC address changes in VM network adapter settings.
  • Set VM Memory and CPU Limits:
    Use resource limits to prevent resource exhaustion by malicious or misconfigured VMs.

2. Secure Boot and UEFI

  • Enable Secure Boot for supported guest operating systems to prevent unauthorized OS changes.

3. Use VM Encryption

  • Encrypt virtual disks and VM configuration files using VMware’s built-in encryption feature.
  • Use a Key Management Server (KMS) to manage encryption keys securely.

Network Security for Virtual Machines

1. Isolate VM Traffic

  • Use VLANs to segment VM traffic.
  • Separate management, storage, and application traffic.

2. Enable Firewalls

  • Configure distributed firewalls in VMware NSX (if available) to control VM communication.

3. Monitor Network Traffic

  • Use tools like VMware vRealize Network Insight to analyze and monitor network traffic patterns.

Access Control and Authentication

1. Restrict Access to VM Management

  • Assign least privilege roles to users accessing VMs via vCenter or ESXi.
  • Enable two-factor authentication for vSphere accounts.

2. Disable Unnecessary VM Services

  • Disable VM features like copy/paste between guest and host, drag-and-drop, and unnecessary COM ports.

3. Guest OS Hardening

  • Update and patch the guest operating system regularly.
  • Disable guest OS features not required for the workload.

Logging and Monitoring

1. Enable VM Activity Logging

  • Ensure logs capture VM actions such as power on/off, migrations, and configuration changes.
  • Send logs to a central syslog server for analysis.

2. Monitor VM Health and Anomalies

  • Use VMware vRealize Operations or third-party tools for proactive monitoring.
  • Set alerts for unusual activity like high CPU usage or network spikes.

Best Practices for Virtual Machine Hardening

  1. Regularly Update VMware Tools:
    • Keep VMware Tools up-to-date to ensure compatibility and security improvements.
  2. Perform Regular Security Audits:
    • Periodically review VM configurations to ensure compliance with security policies.
  3. Backup VMs Securely:
    • Encrypt VM backups to protect data in case of a breach.
  4. Follow VMware’s Security Configuration Guide:
    • Use VMware’s official hardening guide as a reference for best practices.

Recommended PowerCLI Settings

Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.copy.disable" -Value $true

Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.dnd.disable" -Value $true

Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.setGUIOptions.enable" -Value $false

Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.paste.disable" -Value $true

Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.diskShrink.disable" -Value $true

Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.diskWiper.disable" -Value $true

Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.ghi.launchmenu.change" -Value $true

Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.memSchedFakeSampleStats.disable" -Value $true

Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.unity.push.update.disable" -Value $true

Get-VM -Name XXX | New-AdvancedSetting -Name "tools.guestlib.enableHostInfo" -Value $false

Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.device.connectable.disable" -Value $true

Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.device.edit.disable" -Value $true

Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.getCreds.disable" -Value $true

Get-VM -Name XXX | New-AdvancedSetting -Name "guest.command.enabled" -Value $false

Get-VM -Name XXX | New-AdvancedSetting -Name "vmci0.unrestricted" -Value $false

Get-VM -Name XXX | New-AdvancedSetting -Name "log.rotateSize" -Value "1000000"

Get-VM -Name XXX | New-AdvancedSetting -Name "log.keepOld" -Value "10"

Get-VM -Name XXX | New-AdvancedSetting -Name "tools.setInfo.sizeLimit" -Value "1048576"


How to Identify the RDM ID of a Virtual Machine in VMware vSphere

Raw Device Mapping (RDM) allows virtual machines to directly access a physical storage device, making it ideal for scenarios like clustering and certain database setups. However, when troubleshooting or managing storage configurations, it’s essential to identify the RDM ID associated with a virtual machine. In this guide, we’ll explain how to find the RDM ID using the vSphere Client and the ESXi CLI.

Identifying the RDM ID via vSphere Client

Steps to Follow:

  1. Open the vSphere Client and log in to the vCenter Server.
  2. Navigate to the VM Settings of the virtual machine with the RDM.
  3. Locate the disk marked as Physical (RDM) or Virtual (RDM).
  4. Check the Device Name (e.g., vmhba2:0:1) in the disk settings. This is the identifier for the RDM.
    Tip: You may also see the RDM file path in the datastore browser, which points to the mapping file (e.g., /vmfs/volumes/datastore_name/vm_name/rdm.vmdk).

Using the ESXi CLI to Identify RDM Mappings

Steps to Use CLI:

  1. Enable SSH on the ESXi Host:
    • Use the vSphere Client or DCUI to enable SSH access to the ESXi host.
  2. SSH into the Host:
    • Use an SSH client (e.g., PuTTY) to log in to the ESXi host.
  3. List RDM Mappings:
    Run the following command to identify RDM devices attached to the VM:

    ls -l /vmfs/devices/rvm

    This command displays the mappings of RDM devices.

  4. Verify VM Device Links:
    Identify the RDM mappings for the specific VM by cross-referencing the VM’s disk configuration.
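
For the cross-referencing step, `vmkfstools -q` (query RDM) run against a VM's mapping file reports the device the RDM points to. A minimal sketch, assuming the output contains a line that begins with `Maps to:` followed by the device identifier; `rdm_target` is a hypothetical helper name, and the path in the usage line reuses the placeholder path from the tip above.

```sh
#!/bin/sh
# Read `vmkfstools -q <mapping-file>.vmdk` output on stdin and print the
# device identifier from the "Maps to:" line (assumed output format).
rdm_target() {
    sed -n 's/^Maps to:[[:space:]]*//p'
}
```

Usage on the host would look like `vmkfstools -q /vmfs/volumes/datastore_name/vm_name/rdm.vmdk | rdm_target`. The identifier it prints can then be matched against the device list in the vSphere Client.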

Best Practices for Managing RDMs

  1. Label RDM Devices Clearly:
    Use consistent naming conventions in vSphere to avoid confusion.
  2. Document RDM Configurations:
    Maintain detailed records of RDM devices and their associated VMs.
  3. Monitor Storage I/O:
    Regularly check the performance and health of physical devices backing RDMs.
  4. Test Before Modifying:
    If you plan to migrate or change an RDM device, ensure compatibility and test in a non-production environment.

Object type requires hosted I/O

After a power outage, the VM will not power on and throws the following error:

Object type requires hosted I/O

Log in to the ESXi host over SSH.
Browse to the VM folder containing the disk files.

Run the following command to check the disk:

vmkfstools -x check "test.vmdk"
Disk needs repaired

If the check reports that the disk needs repair, run:

vmkfstools -x repair "test.vmdk"
Disk was successfully repaired.

Start the VM.
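
If the VM has several disks, each descriptor file can be checked in turn. The sketch below only builds the list of candidate files; it assumes the check should run against descriptor files such as test.vmdk and skips the data files that sit next to them (`-flat`, `-delta`, `-sesparse`, `-ctk`). `descriptor_vmdks` is a hypothetical helper name.

```sh
#!/bin/sh
# List VMDK descriptor files in a VM folder, skipping extent and tracking
# files, which are not descriptors.
# $1 = VM folder (defaults to the current directory).
descriptor_vmdks() {
    for f in "${1:-.}"/*.vmdk; do
        [ -f "$f" ] || continue
        case "$f" in
            *-flat.vmdk|*-delta.vmdk|*-sesparse.vmdk|*-ctk.vmdk) continue ;;
        esac
        echo "$f"
    done
}
```

From inside the VM folder, `descriptor_vmdks . | while IFS= read -r d; do vmkfstools -x check "$d"; done` would run the check on every descriptor. Repair only the disks that report a problem.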

VMware Free ESXi No Longer Free

As of February 12, 2024, VMware, under Broadcom’s ownership, discontinued the free version of its ESXi hypervisor.

This change is part of VMware’s transition from perpetual licensing to subscription-based models, aiming to streamline and simplify its product offerings.

Consequently, the free vSphere Hypervisor (ESXi 7.x and 8.x) is no longer available for download or use.

Users seeking virtualization solutions are now encouraged to explore VMware’s subscription-based offerings, such as VMware Cloud Foundation and VMware vSphere Foundation.

For those who previously utilized the free ESXi version, it’s important to note that while existing installations may continue to function, they will no longer receive updates or official support. Transitioning to a subscription-based model will ensure access to the latest features, security patches, and technical support.

This shift reflects a broader industry trend towards subscription services, offering customers more flexibility and continuous access to product enhancements. Organizations currently relying on the free ESXi hypervisor should assess their virtualization needs and consider migrating to VMware’s subscription-based solutions to maintain a secure and supported virtual infrastructure.

Configuring and Testing NTP on VMware ESXi

Accurate time synchronization is critical in VMware environments for tasks like logging, auditing, cluster synchronization, and troubleshooting. VMware ESXi uses NTP to synchronize time with external servers. In this guide, we’ll walk through the steps to configure, test, and troubleshoot NTP on ESXi.

Configuring NTP on ESXi Using vSphere Client

Steps to Configure NTP:

  1. Log in to the vSphere Client:
    Open the vSphere Client and connect to your vCenter Server.
  2. Navigate to the Host Settings:
    • Select the ESXi host from the inventory.
    • Go to Host > Configure > System > Time Configuration.
  3. Edit Time Configuration:
    • Click Edit in the Time Configuration section.
    • Select Use Network Time Protocol (Enable NTP).
    • Add the NTP server addresses (e.g., time.google.com, pool.ntp.org).
    • Ensure the NTP service is set to Start and Stop with Host.
  4. Start the NTP Service:
    • Go to Services under the Host settings.
    • Locate the NTP Daemon (ntpd) service and click Start.

Tips:

  • Use at least two NTP servers for redundancy.
  • Ensure the ESXi host can reach the NTP servers over the network.

Configuring NTP via the ESXi Host Client

If you’re managing a standalone ESXi host:

  1. Log in to the ESXi Host Client.
  2. Navigate to Manage > System > Time & Date.
  3. Click Edit Settings and enter the NTP servers.
  4. Enable the Start and Stop with Host option for the NTP service.
  5. Save the settings and manually start the NTP service.

Testing NTP Configuration

1. Verify NTP Status:

Use the vSphere Client or Host Client to check the NTP service status under the Services section.

2. Test NTP via CLI:

SSH into the ESXi host and run:

ntpq -p

This command shows the NTP peer list and synchronization status. Look for:

  • Remote: The NTP server.
  • Offset: Time difference between the host and the NTP server.
  • Reach: Indicates whether the server is reachable.
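
The Reach column is easy to misread. ntpq prints it as an octal number that represents an 8-bit shift register, where each bit records whether one of the last eight polls got a reply, so 377 means all eight succeeded. The sketch below converts a reach value into a count of successful polls; `reach_polls` is a hypothetical helper name.

```sh
#!/bin/sh
# Convert an ntpq "reach" value (octal, 0-377) into the number of
# successful polls among the last eight.
reach_polls() {
    value=$((0$1))            # the leading 0 makes the shell read it as octal
    count=0
    while [ "$value" -gt 0 ]; do
        count=$((count + (value & 1)))   # add the lowest bit
        value=$((value >> 1))            # move on to the next poll
    done
    echo "$count"
}
```

For example, `reach_polls 377` prints 8 and `reach_polls 17` prints 4, because octal 17 is binary 00001111. A value that stays at 0 means no poll has been answered.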

3. Check System Time:

Run the following command to verify the host’s current time:

date

Troubleshooting NTP Issues

1. Verify Network Connectivity:

Ensure the ESXi host can reach the NTP server.

  • Use ping or nc to test connectivity. NTP uses UDP, so pass -u to nc, and treat a successful UDP probe only as a hint, since it shows no more than that no error came back:
    ping <ntp_server>
    nc -uzv <ntp_server> 123

2. Check Firewall Rules:

  • Ensure port 123/UDP is open on the host and between the ESXi host and the NTP server.

3. Review NTP Logs:

  • Check the NTP service logs on the ESXi host:
    cat /var/log/ntp.log

4. Restart NTP Service:

  • If synchronization issues persist, restart the NTP service:
    /etc/init.d/ntpd restart

5. Verify Time Drift:

  • Check if the host’s hardware clock is out of sync using:
    esxcli hardware clock get
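
When reviewing `ntpq -p` output during troubleshooting, large offsets are the first thing to look for. A minimal sketch, assuming the standard ten-column peer table (remote, refid, st, t, when, poll, reach, delay, offset, jitter) with the offset in milliseconds in column 9; `large_offsets` is a hypothetical helper name and the 100 ms default is an arbitrary threshold chosen for illustration.

```sh
#!/bin/sh
# Read `ntpq -p` output on stdin and print "peer offset_ms" for every peer
# whose absolute offset exceeds the threshold in milliseconds.
# $1 = threshold (default 100). The peer name keeps ntpq's tally
# character (*, +, -) when one is present.
large_offsets() {
    awk -v limit="${1:-100}" '
        NR > 2 && NF >= 10 {
            off = $9 + 0
            if (off < 0) off = -off
            if (off > limit + 0) print $1, $9
        }'
}
```

On the host, `ntpq -p | large_offsets 50` lists only the peers that are more than 50 ms away from the local clock.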

Troubleshooting Datastore Connectivity Issues in VMware ESXi

The Problem: Datastore Connectivity Lost

An ESXi host may lose connection to one or more datastores for several reasons:

  1. Network Configuration Issues (for NFS/iSCSI): Incorrect IP settings or firewalls.
  2. Storage Array Failures: Issues with the backend SAN or NAS hardware.
  3. Pathing Problems: Multipath configuration errors or a single-path failure.
  4. Corrupted Filesystem: Datastore metadata issues on the storage device.

How to Troubleshoot and Resolve

  1. Verify the Storage Status in vSphere
  • Navigate to Storage > Datastores in vSphere Client.
  • Check if the affected datastore is listed and its status (e.g., Inactive or Not Connected).
  2. Validate Physical Connections
  • For iSCSI or NFS datastores:
    • Ensure the host’s VMkernel NICs are online and configured with the correct IP.
    • Test network connectivity to the storage target using ping or vmkping:

vmkping <storage-IP>

  • For SAN-based datastores:
    • Inspect HBA (Host Bus Adapter) connections and verify the fiber cables or SFPs.
  3. Rescan Storage Adapters
  • Perform a manual rescan to detect any lost paths or devices:
    • Go to Host > Storage Adapters > Rescan All in the vSphere Client.
    • Alternatively, use SSH:

esxcli storage core adapter rescan --all

  4. Check Multipath Configuration
  • List the multipathing configuration for each device:

esxcli storage nmp device list

  • Look for any inactive paths and troubleshoot based on the path state.
  • For active-active arrays, ensure the correct Path Selection Policy (PSP) is set (e.g., Round Robin).

  5. Validate Storage Array Health
  • Log in to the storage array management interface to check for:
    • Controller failures.
    • Degraded or offline LUNs/volumes.
  • Restart the affected LUN if needed, ensuring no other hosts are dependent on it.

  6. Recreate the Datastore Mount (NFS/iSCSI)
  • If the datastore remains inaccessible:
    • Unmount the datastore from the ESXi host.
    • Recreate the connection by adding the NFS or iSCSI target. For an NFS datastore:

esxcli storage nfs add --host=<server-IP> --share=<nfs-path> --volume-name=<name>

  7. Repair Filesystem Corruption (VMFS)
  • Use VMware’s built-in recovery tools to fix VMFS issues:

vmkfstools -R /vmfs/devices/disks/<datastore-ID>
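
One addition to the vmkping test earlier in this procedure: when the storage network uses jumbo frames, a default-size ping can succeed while full-size frames are dropped. A common check is a do-not-fragment ping sized to the MTU. The sketch below computes the ICMP payload for a given MTU by subtracting 28 bytes (20 for the IPv4 header, 8 for the ICMP header). `icmp_payload` is a hypothetical helper name, and `vmk1` in the usage line is a placeholder for your storage VMkernel interface.

```sh
#!/bin/sh
# Print the largest ICMP payload that fits in one unfragmented IPv4 packet
# for the given MTU: MTU minus 20 bytes (IPv4 header) and 8 bytes (ICMP header).
icmp_payload() {
    echo $(( $1 - 28 ))
}
```

For a 9000-byte MTU this gives 8972, so the test becomes `vmkping -I vmk1 -d -s "$(icmp_payload 9000)" <storage-IP>`. If that fails while a plain vmkping works, look for an MTU mismatch somewhere along the path.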