Step-by-Step Troubleshooting Guide
esxcli network ip interface ipv4 get
ping <gateway-IP>
esxcli network nic list
esxcli network nic down -n <vmnic-name>
esxcli network nic up -n <vmnic-name>
esxcli software vib install -v /path/to/driver.vib
services.sh restart
tail -f /var/log/vmkernel.log
tail -f /var/log/hostd.log
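When a host has several uplinks, filtering the NIC list for links that are down can speed up the first pass through the steps above. The sketch below assumes the column layout printed by recent ESXi releases, where the fifth column is Link Status. `down_nics` is a hypothetical helper name, and the layout should be checked against your own host's output before relying on it.

```shell
#!/bin/sh
# down_nics: read `esxcli network nic list` output on stdin and print the
# names of physical NICs whose link is down.
# Assumption: column 5 is "Link Status" (Name, PCI Device, Driver,
# Admin Status, Link Status, ...). Older builds may use a different layout.
down_nics() {
    awk '$1 ~ /^vmnic/ && $5 == "Down" { print $1 }'
}

# Usage on a host:
#   esxcli network nic list | down_nics
```

Each name it prints is a candidate for the `esxcli network nic down` / `up` cycle shown above.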
The Purple Screen of Death (PSOD) is VMware’s equivalent of the Windows “blue screen.” It halts the ESXi host and displays diagnostic information on a purple background. A common trigger is a hardware compatibility or driver issue, especially after an upgrade or new hardware deployment.
Root Cause Analysis
Common reasons for a PSOD on boot include:
Step-by-Step Resolution
less /var/log/vmkernel.log
esxcli software vib remove -n <driver-name>
esxcli software profile update -d <URL-to-depot> -p <profile-name>
Troubleshooting VMware ESXi environments often requires advanced techniques and a deep understanding of where critical information is stored. From log files to configuration files, knowing where to look can significantly reduce downtime and speed up root cause analysis. In this guide, we’ll explore essential file locations, logs, and tips to troubleshoot ESXi like a pro.
VMware ESXi stores log files that capture system events, kernel activities, and errors. These logs are invaluable for diagnosing issues.
Log File | Description | Location |
---|---|---|
vmkernel.log | Records kernel activities, such as VM operations. | /var/log/vmkernel.log |
hostd.log | Tracks management service operations (Hostd). | /var/log/hostd.log |
vpxa.log | Logs vCenter agent operations. | /var/log/vpxa.log |
messages | General system messages (hardware-related issues). | /var/log/messages |
vmkwarning.log | Captures warnings from the VMkernel. | /var/log/vmkwarning.log |
esxupdate.log | Logs patching and upgrade activities. | /var/log/esxupdate.log |
shell.log | Tracks shell commands entered by users. | /var/log/shell.log |
All of these logs are stored under /var/log.
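A quick way to triage any of the logs listed above is to pull out only the most recent lines that mention trouble. The following is a minimal sketch: `scan_log` is a hypothetical helper, and the keyword list is an assumption to adjust for the problem you are chasing.

```shell
#!/bin/sh
# scan_log: print the most recent lines of a log that mention common
# trouble keywords.
# Usage: scan_log <logfile> [max_lines]
scan_log() {
    logfile=$1
    max_lines=${2:-20}
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 1
    fi
    # Case-insensitive match; the keyword list is illustrative only.
    grep -iE 'error|warning|failed|timeout' "$logfile" | tail -n "$max_lines"
}

# Usage on a host:
#   scan_log /var/log/vmkernel.log 50
```

Starting with a small `max_lines` keeps the output readable; widen it once you know which subsystem is involved.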
Understanding where ESXi stores its configuration files can help diagnose startup and configuration issues.
Configuration File | Description | Location |
---|---|---|
esx.conf | ESXi host configuration. | /etc/vmware/esx.conf |
config.xml | Hostd configuration file. | /etc/vmware/hostd/config.xml |
datastores.xml | Datastore configuration. | /etc/vmware/vmware/datastores.xml |
vpxa.cfg | vCenter agent configuration. | /etc/opt/vmware/vpxa/vpxa.cfg |
.vmx files | VM configuration files. | Stored in the VM’s directory |
Back up the host configuration (for example, with vicfg-cfgbackup) before troubleshooting.
Use esxtop to view real-time resource usage and identify bottlenecks.
Virtual machines (VMs) are critical assets in any VMware ESXi environment. Securing them is as important as securing physical infrastructure. Hardening your VMs ensures that they are protected against threats and vulnerabilities. In this guide, we’ll discuss actionable steps and best practices to harden your VMs running on VMware ESXi.
VMs often house critical workloads, sensitive data, and business applications. Without proper hardening, they can become easy targets for attackers. Hardening virtual machines reduces their attack surface by disabling unnecessary features, securing communications, and enforcing access controls.
Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.copy.disable" -Value $true
Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.dnd.disable" -Value $true
Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.setGUIOptions.enable" -Value $false
Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.paste.disable" -Value $true
Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.diskShrink.disable" -Value $true
Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.diskWiper.disable" -Value $true
Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.ghi.launchmenu.change" -Value $true
Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.memSchedFakeSampleStats.disable" -Value $true
Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.unity.push.update.disable" -Value $true
Get-VM -Name XXX | New-AdvancedSetting -Name "tools.guestlib.enableHostInfo" -Value $false
Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.device.connectable.disable" -Value $true
Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.device.edit.disable" -Value $true
Get-VM -Name XXX | New-AdvancedSetting -Name "isolation.tools.getCreds.disable" -Value $true
Get-VM -Name XXX | New-AdvancedSetting -Name "guest.command.enabled" -Value $false
Get-VM -Name XXX | New-AdvancedSetting -Name "vmci0.unrestricted" -Value $false
Get-VM -Name XXX | New-AdvancedSetting -Name "log.rotateSize" -Value "1000000"
Get-VM -Name XXX | New-AdvancedSetting -Name "log.keepOld" -Value "10"
Get-VM -Name XXX | New-AdvancedSetting -Name "tools.setInfo.sizeLimit" -Value "1048576"
Raw Device Mapping (RDM) allows virtual machines to directly access a physical storage device, making it ideal for scenarios like clustering and certain database setups. However, when troubleshooting or managing storage configurations, it’s essential to identify the RDM ID associated with a virtual machine. In this guide, we’ll explain how to find the RDM ID using the vSphere Client and the ESXi CLI.
Note the physical device identifier (for example, vmhba2:0:1) in the disk settings. This is the identifier for the RDM.
Locate the RDM mapping file (for example, /vmfs/volumes/datastore_name/vm_name/rdm.vmdk).
This command displays the mappings of RDM devices.
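One way to query an RDM mapping file from the ESXi shell is `vmkfstools -q`, which reports whether the disk is a raw device mapping and which device it maps to. The sketch below is an assumption about that output format (a line beginning with `Maps to:`); `rdm_target` is a hypothetical helper name, and the path in the usage comment is the placeholder from the step above.

```shell
#!/bin/sh
# rdm_target: read `vmkfstools -q <rdm.vmdk>` output on stdin and print the
# device identifier the mapping file points to.
# Assumption: the output contains a line of the form "Maps to: vml.<id>".
rdm_target() {
    sed -n 's/^Maps to: *//p'
}

# Usage on a host:
#   vmkfstools -q /vmfs/volumes/datastore_name/vm_name/rdm.vmdk | rdm_target
```

If nothing is printed, the disk is probably not an RDM, or the output format differs on your ESXi version.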
After a power outage, the VM will not power on and throws the following error:
Object type requires hosted I/O
Log in to the ESXi host over SSH.
Browse to the VM folder containing the disk files.
Run the following command:
vmkfstools -x check "test.vmdk"
Disk needs repair.
vmkfstools -x repair "test.vmdk"
Disk was successfully repaired.
Start the VM.
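The check-then-repair sequence above can be wrapped so that the repair only runs when the check asks for it. This is a minimal sketch, assuming the check output contains the phrase "needs repair" as in the example above; `check_and_repair_vmdk` is a hypothetical helper name.

```shell
#!/bin/sh
# check_and_repair_vmdk: run the vmkfstools check on a disk and repair it
# only if the check reports that a repair is needed.
# Usage: check_and_repair_vmdk <disk.vmdk>
check_and_repair_vmdk() {
    vmdk=$1
    result=$(vmkfstools -x check "$vmdk" 2>&1)
    echo "$result"
    case $result in
        # Assumption: the check output includes "needs repair".
        *"needs repair"*) vmkfstools -x repair "$vmdk" ;;
    esac
}

# Usage on a host, from the VM folder with the VM powered off:
#   check_and_repair_vmdk test.vmdk
```

Taking a copy of the disk files before repairing is a sensible precaution if the datastore has room.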
As of February 12, 2024, VMware, under Broadcom’s ownership, discontinued the free version of its ESXi hypervisor.
This change is part of VMware’s transition from perpetual licensing to subscription-based models, aiming to streamline and simplify its product offerings.
Consequently, the free vSphere Hypervisor (ESXi 7.x and 8.x) is no longer available for download or use.
Users seeking virtualization solutions are now encouraged to explore VMware’s subscription-based offerings, such as VMware Cloud Foundation and VMware vSphere Foundation.
For those who previously utilized the free ESXi version, it’s important to note that while existing installations may continue to function, they will no longer receive updates or official support. Transitioning to a subscription-based model will ensure access to the latest features, security patches, and technical support.
This shift reflects a broader industry trend towards subscription services, offering customers more flexibility and continuous access to product enhancements. Organizations currently relying on the free ESXi hypervisor should assess their virtualization needs and consider migrating to VMware’s subscription-based solutions to maintain a secure and supported virtual infrastructure.
Accurate time synchronization is critical in VMware environments for tasks like logging, auditing, cluster synchronization, and troubleshooting. VMware ESXi uses NTP to synchronize time with external servers. In this guide, we’ll walk through the steps to configure, test, and troubleshoot NTP on ESXi.
Addresses of one or more NTP servers (e.g., time.google.com, pool.ntp.org).
If you’re managing a standalone ESXi host:
Use the vSphere Client or Host Client to check the NTP service status under the Services section.
SSH into the ESXi host and run:
ntpq -p
This command shows the NTP peer list and synchronization status. Look for:
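In `ntpq -p` output, the peer the host is currently synchronized to is marked with a leading asterisk. The sketch below extracts that peer so the check can be scripted; `selected_ntp_peer` is a hypothetical helper name.

```shell
#!/bin/sh
# selected_ntp_peer: read `ntpq -p` output on stdin and print the peer
# marked with a leading "*" (the current synchronization source).
# Returns non-zero if no peer is selected.
selected_ntp_peer() {
    awk '
        /^\*/ { sub(/^\*/, "", $1); print $1; found = 1 }
        END   { if (!found) exit 1 }
    '
}

# Usage on a host:
#   ntpq -p | selected_ntp_peer || echo "not synchronized to any peer"
```

Note that `ntpq` truncates long host names in the remote column, so the printed name may be shortened.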
Run the following command to verify the host’s current time:
date
Use ping or nc to test connectivity (NTP uses UDP port 123, so nc needs the -u flag):
ping <ntp_server>
nc -zuv <ntp_server> 123
2. Check Firewall Rules:
cat /var/log/ntp.log
If synchronization issues persist, restart the NTP service:
/etc/init.d/ntpd restart
hwclock --show
The Problem: Datastore Connectivity Lost
An ESXi host may lose connection to one or more datastores for several reasons:
How to Troubleshoot and Resolve
vmkping <storage-IP>
esxcli storage core adapter rescan --all
Use the command:
esxcli storage nmp device list
Look for any inactive paths and troubleshoot based on the path state.
For active-active arrays, ensure the correct Path Selection Policy (PSP) is set (e.g., Round Robin).
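Per-path state is also reported by `esxcli storage core path list`, which prints a `State:` line for each path. The sketch below tallies those states so dead or standby paths stand out; `count_path_states` is a hypothetical helper name, and the field name is an assumption to confirm against your host's output.

```shell
#!/bin/sh
# count_path_states: read `esxcli storage core path list` output on stdin
# and print how many paths are in each state, one "state=count" per line.
# Assumption: each path block contains a line of the form "   State: <state>".
count_path_states() {
    awk -F': ' '
        $1 ~ /^[ \t]*State$/ { n[$2]++ }
        END { for (s in n) printf "%s=%d\n", s, n[s] }
    ' | sort
}

# Usage on a host:
#   esxcli storage core path list | count_path_states
```

Any count next to a state other than active is worth tracing back to its runtime name (adapter, target, LUN) in the full output.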
Log in to the storage array management interface to check for:
Controller failures.
Degraded or offline LUNs/volumes.
Restart the affected LUN if needed, ensuring no other hosts are dependent on it.
If the datastore remains inaccessible:
Unmount the datastore from the ESXi host.
Recreate the connection by adding the NFS or iSCSI target:
esxcli storage nfs add --host=<server-IP> --share=<nfs-path> --volume-name=<name>
Repair Filesystem Corruption (VMFS)
vmkfstools -R /vmfs/devices/disks/<datastore-ID>