Month: November 2024

The Problem: ESXi Host Loses Network Connectivity

Step-by-Step Troubleshooting Guide

  1. Verify Network Configuration on the Host
  • Log in via DCUI (Direct Console User Interface) or SSH.
  • Check the IP address, gateway, and DNS settings:

esxcli network ip interface ipv4 get

  • Confirm that the host can ping the gateway or vCenter server:

ping <gateway-IP>
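
The interface command above reports the address and netmask; the default gateway and DNS servers live in separate esxcli namespaces. A minimal sketch, assuming the usual column layout of `esxcli network ip route ipv4 list` (a row starting with `default` whose third column is the gateway). `default_gateway` is a hypothetical helper name, not part of ESXi:

```shell
# Print the default gateway from `esxcli network ip route ipv4 list` output on stdin.
# Assumption: the row whose first column is "default" holds the gateway in column 3.
default_gateway() {
    awk '$1 == "default" { print $3; exit }'
}

# On the host (ESXi Shell or SSH):
#   esxcli network ip route ipv4 list     # routing table, including the default route
#   esxcli network ip dns server list     # configured DNS servers
#   gw=$(esxcli network ip route ipv4 list | default_gateway)
#   vmkping -c 3 "$gw"                    # ping through the VMkernel network stack
```

Using `vmkping` rather than `ping` makes it explicit that the test goes out through a VMkernel interface.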

  2. Check vSwitch and Port Group Settings
  • In the vSphere Client, navigate to Networking > Virtual Switches.
  • Ensure the following:
    • The physical NICs (vmnics) are attached and active.
    • Port groups have the correct VLAN ID settings.
  • Review the load balancing and failover (NIC teaming) policies and correct any inconsistencies.
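
If the vSphere Client is unreachable because the host is offline, the same checks can be made from the ESXi Shell. The sketch below assumes standard vSwitches and the usual column layout of `esxcli network vswitch standard portgroup list`; `portgroup_vlans` is a hypothetical helper, and it assumes vSwitch names contain no spaces:

```shell
# Summarize port group name, vSwitch, and VLAN ID from
# `esxcli network vswitch standard portgroup list` output on stdin.
# Port group names may contain spaces, so columns are counted from the right:
# the last field is the VLAN ID and the third from last is the vSwitch.
portgroup_vlans() {
    awk 'NR > 2 && NF >= 4 {
        name = $1
        for (i = 2; i <= NF - 3; i++) name = name " " $i
        printf "%s|%s|%s\n", name, $(NF - 2), $NF
    }'
}

# On the host:
#   esxcli network vswitch standard list                              # uplinks per vSwitch
#   esxcli network vswitch standard portgroup list | portgroup_vlans
#   esxcli network vswitch standard policy failover get -v vSwitch0   # teaming and failover
```

A VLAN ID of 0 means the port group is untagged, which only works if the physical switch port is an access port or has a matching native VLAN.
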
  3. Inspect Physical Network Connections
  • Check for issues with cables, switches, or ports.
  • Check the link lights on the NICs and the port activity indicators on the switch to confirm that each link is up.
  • Replace faulty cables or move connections to different switch ports if needed.
  4. Test and Reconfigure NICs
  • Verify the status of all NICs:

esxcli network nic list

  • Re-enable or restart any problematic NICs:

esxcli network nic down -n <vmnic-name>
esxcli network nic up -n <vmnic-name>
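
On hosts with many uplinks it can help to filter the NIC list down to the ones that need attention. A small sketch, assuming the usual column order of `esxcli network nic list` (Name, PCI Device, Driver, Admin Status, Link Status, ...); `down_nics` is a hypothetical helper name:

```shell
# Print the names of NICs whose link is down, reading
# `esxcli network nic list` output on stdin.
# Assumption: column 1 is the NIC name and column 5 is the link status.
down_nics() {
    awk 'NR > 2 && $5 == "Down" { print $1 }'
}

# On the host:
#   esxcli network nic list | down_nics
```

Before bouncing a NIC with the down/up commands, check that it is not the only uplink carrying your SSH session; if it is, run the commands from the DCUI or a remote console instead.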

  5. Address Driver or Firmware Issues
  • Check the HCL (Hardware Compatibility List) for your ESXi version.
  • Update or reinstall the NIC driver and firmware if outdated or incompatible:

esxcli software vib install -v /path/to/driver.vib
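
To compare against the HCL you first need the driver and firmware versions the NIC is actually running. The sketch below assumes that `esxcli network nic get -n <vmnic-name>` prints a "Driver Info" section with `Driver`, `Firmware Version`, and `Version` lines; `nic_driver_info` is a hypothetical helper name:

```shell
# Extract driver name, firmware version, and driver version from
# `esxcli network nic get -n <vmnic-name>` output on stdin.
# Assumption: each value appears on its own "Label: value" line.
nic_driver_info() {
    awk -F': ' '/^[ \t]*(Driver|Firmware Version|Version):/ {
        sub(/^[ \t]+/, "", $1)   # strip the indentation before the label
        print $1 "=" $2
    }'
}

# On the host:
#   esxcli network nic get -n vmnic0 | nic_driver_info
```

The reported driver and firmware pair can then be checked against the HCL entry for the adapter before installing a different VIB.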

  6. Monitor for IP Conflicts
  • Inspect the ARP tables on the switch or router for an IP address that maps to more than one MAC address.
  • Assign a new static IP address to the host if conflicts are found.
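
A conflict shows up as one IP address mapped to more than one MAC address. As a rough sketch, assuming you have exported the ARP table to plain text with the IP in the first column and the MAC in the second (the export format varies by switch vendor, and `find_ip_conflicts` is a hypothetical helper):

```shell
# Report IP addresses that appear with more than one distinct MAC address.
# Input on stdin: one "IP MAC" pair per line; extra columns are ignored.
find_ip_conflicts() {
    awk 'NF >= 2 {
        mac = tolower($2)                 # treat AA:BB and aa:bb as the same MAC
        key = $1 SUBSEP mac
        if (!(key in seen)) {
            seen[key] = 1
            count[$1]++
            macs[$1] = macs[$1] " " mac
        }
    }
    END {
        for (ip in count)
            if (count[ip] > 1) print ip ":" macs[ip]
    }'
}

# The host's own ARP cache can be listed with:
#   esxcli network ip neighbor list
```

An empty result means no address in the export is claimed by two different MACs.
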
  7. Restart Management Network
  • Restart the management network via the DCUI:
    • Press F2, log in, and select Restart Management Network from the System Customization menu.
  • Alternatively, restart the management agents (hostd, vpxa, and related services) via SSH:

services.sh restart
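
To restart the management interface itself from the command line, the VMkernel interface can be disabled and re-enabled. This is a sketch that assumes the management interface is `vmk0`, which is the default but not guaranteed. `restart_mgmt_interface` and the `ESXCLI` override (useful for a dry run) are hypothetical names:

```shell
# Disable and re-enable a VMkernel interface (default: vmk0).
# Set ESXCLI=echo to print the commands instead of running them.
restart_mgmt_interface() {
    vmk="${1:-vmk0}"
    cli="${ESXCLI:-esxcli}"
    "$cli" network ip interface set -e false -i "$vmk"
    "$cli" network ip interface set -e true -i "$vmk"
}

# Dry run:   ESXCLI=echo restart_mgmt_interface vmk0
# Real run:  restart_mgmt_interface vmk0
```

Disabling the interface interrupts any SSH session that uses it, so this is safest from the ESXi Shell on the local console.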

  8. Examine Logs for Deeper Insights
  • Review network-related logs for clues:

tail -f /var/log/vmkernel.log

tail -f /var/log/hostd.log
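
Following the whole log can be noisy, so it may help to narrow it to lines that mention an uplink and its link state. The filter below is only a sketch: the exact wording of link messages depends on the NIC driver and ESXi version, and `nic_link_events` is a hypothetical helper name:

```shell
# Keep only log lines that mention a vmnic and the word "link" (case-insensitive).
# Reads log text on stdin.
nic_link_events() {
    awk 'tolower($0) ~ /vmnic[0-9]+/ && tolower($0) ~ /link/'
}

# On the host:
#   nic_link_events < /var/log/vmkernel.log | tail -n 20
#   nic_link_events < /var/log/vobd.log | tail -n 20
```

Matching timestamps of link-down events against the time connectivity was lost helps separate a physical link problem from a configuration problem.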

Resolving a Common VMware ESXi Issue – PSOD on Boot

The PSOD (Purple Screen of Death) is VMware’s equivalent of the “blue screen” in Windows. It halts the ESXi host and displays diagnostic information on a purple background. One scenario that often triggers a PSOD is a hardware compatibility or driver issue, especially after an upgrade or new hardware deployment.

Root Cause Analysis

Common reasons for a PSOD on boot include:

  1. Incompatible Drivers: Using a driver version that doesn’t match the hardware or ESXi version.
  2. Faulty Hardware: Issues with RAM, storage controllers, or network adapters.
  3. Configuration Errors: Misconfigured BIOS or firmware settings.
  4. Corrupted Filesystem: Problems with the ESXi boot partition.

Step-by-Step Resolution

  1. Gather Information from the PSOD Screen
  • Note the error message and codes displayed on the PSOD.
  • Look for references to specific drivers, memory modules, or hardware.
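  2. Reboot in Recovery Mode
  • Restart the host and press Shift+R at the boot loader to enter Recovery Mode. Recovery Mode reverts the host to the previously installed ESXi image, so it mainly helps when the PSOD began after an upgrade.
  • Once the host boots, review the VMkernel log from the ESXi Shell or SSH:

tail -n 200 /var/log/vmkernel.log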

  3. Verify Hardware Compatibility
  • Cross-check the hardware against VMware’s Hardware Compatibility List (HCL) to ensure support for your ESXi version.
  • If issues arise, update firmware or replace problematic components.
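
HCL entries for I/O devices are keyed by PCI vendor, device, and subsystem IDs, so it helps to collect those from the host first. A sketch, assuming `vmkchdev -l` prints one line per device in the form `<PCI address> <VID:DID> <SVID:SSID> <owner> <alias>`; `nic_pci_ids` is a hypothetical helper name:

```shell
# Print "<vmnic> <VID:DID> <SVID:SSID>" for each network adapter,
# reading `vmkchdev -l` output on stdin.
# Assumption: the device alias (for example vmnic0) is the last field.
nic_pci_ids() {
    awk '$NF ~ /^vmnic[0-9]+$/ { print $NF, $2, $3 }'
}

# On the host:
#   vmkchdev -l | nic_pci_ids
```

The resulting IDs can be entered in the HCL search for I/O devices to find the supported driver and firmware combinations.
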
  4. Roll Back Drivers
  • If a driver is causing the issue, remove it so that a previous known-good version can be installed in its place:

esxcli software vib remove -n <driver-name>

  • Reboot the host and confirm stability.
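
Before removing anything, it is worth confirming the exact VIB name and the version currently installed. A small sketch, assuming the usual `esxcli software vib list` layout with the name in the first column and the version in the second; `vib_version` is a hypothetical helper and `ixgben` is only an example driver name:

```shell
# Print the installed version of a VIB by exact name,
# reading `esxcli software vib list` output on stdin.
vib_version() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# On the host:
#   esxcli software vib list | vib_version ixgben
```

Recording the version before the change gives you a reference point if the older driver does not resolve the PSOD.
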
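  5. Check for Corrupted Boot Files
  • Boot the ESXi installation media and select the repair or reinstall option that preserves the existing VMFS datastore (recent installers label it Install ESXi, preserve VMFS datastore).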
  • Reinstall or repair the ESXi system files without wiping the datastore.
  6. Update or Patch ESXi
  • Check VMware’s knowledge base for patches or updates related to your error code.
  • Update your ESXi host:

esxcli software profile update -d <URL-to-depot> -p <profile-name>
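
The update command needs the name of an image profile from the depot, and the host should be in maintenance mode first. The following is a sketch of that sequence, not a complete patching procedure. The depot path and profile name in the usage comments are placeholders, and `update_from_depot` and the `ESXCLI` dry-run override are hypothetical names:

```shell
# Enter maintenance mode, then apply an image profile from a depot.
# Arguments: <depot path or URL> <profile name>
# Set ESXCLI=echo to print the commands instead of running them.
update_from_depot() {
    depot="$1"
    profile="$2"
    cli="${ESXCLI:-esxcli}"
    "$cli" system maintenanceMode set -e true &&
    "$cli" software profile update -d "$depot" -p "$profile"
}

# List the profiles a depot offers:
#   esxcli software sources profile list -d /vmfs/volumes/datastore1/depot.zip
# Dry run with placeholder values:
#   ESXCLI=echo update_from_depot /vmfs/volumes/datastore1/depot.zip ESXi-example-profile
```

After the update completes, reboot the host and take it out of maintenance mode once it is stable.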