I have some ESXi servers running and needed to monitor their hardware with Nagios.
I found check_esx_wbem.py, a Python script that checks hardware health through the VMware CIM interface (if CIM is not enabled on your hosts yet, you will need to enable it first).
The script requires Python and the pywbem module. In my case, a quick aptitude install did the job ;)
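Before wiring anything into Nagios, it can help to confirm that the Python interpreter Nagios will run can actually import pywbem. Here is a minimal sketch using only the standard library. The helper name `module_available` is hypothetical and not part of check_esx_wbem.py.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be imported by this interpreter."""
    # find_spec() returns None when the module cannot be located on sys.path.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Only reports the result; it does not install anything.
    if module_available("pywbem"):
        print("pywbem is installed")
    else:
        print("pywbem is missing; install it before running check_esx_wbem.py")
```

Run it with the same interpreter path you will use in the Nagios command definition (for example /usr/bin/python), since a different interpreter may have a different set of installed modules.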
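Usage is simple. Note that the first argument is the full CIM URL (https on port 5989, as in the example), not a bare hostname: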
Usage : ./check_esx_wbem.py hostname user password [verbose]
Example : ./check_esx_wbem.py https://myesxi:5989 root password
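To get a feel for what the script queries behind that URL, here is a minimal sketch of a single CIM query with pywbem. It is not the script's own code. It assumes a recent pywbem release (for the `no_verification` argument), the `root/cimv2` namespace, and a self-signed certificate on the ESXi host; the helper names `first_status` and `list_elements` are hypothetical.

```python
from typing import Iterator, Optional, Sequence, Tuple


def first_status(op_status: Optional[Sequence[int]]) -> Optional[int]:
    """Return the first OperationalStatus code, or None if the property is absent or empty."""
    # OperationalStatus is an array property in CIM; the script's verbose output
    # prints a single code per element, so only the first entry is kept here.
    if not op_status:
        return None
    return int(op_status[0])


def list_elements(url: str, user: str, password: str,
                  classname: str) -> Iterator[Tuple[Optional[str], Optional[int]]]:
    """Yield (ElementName, first OperationalStatus) for each instance of `classname`."""
    # Imported lazily (on first iteration) so first_status() works without pywbem.
    import pywbem

    conn = pywbem.WBEMConnection(
        url,                             # e.g. "https://myesxi:5989"
        (user, password),
        default_namespace="root/cimv2",  # assumption: namespace holding the hardware classes
        no_verification=True,            # assumption: host presents a self-signed certificate
    )
    for inst in conn.EnumerateInstances(classname):
        yield inst.get("ElementName"), first_status(inst.get("OperationalStatus"))
```

For example, iterating over `list_elements("https://myesxi:5989", "root", "password", "CIM_NumericSensor")` and printing each pair should give one line per sensor, roughly matching the "Element Name" and "Element Op Status" lines of the verbose output.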
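With verbose, the script prints each CIM class it checks, followed by the name and operational status of every element in that class, and finishes with the overall result. For example: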
20101014 17:09:14 Check classe CIM_ComputerSystem
20101014 17:09:15 Element Name = System Board 7:1
20101014 17:09:15 Element Op Status = 0
20101014 17:09:15 Element Name = System Board 7:2
20101014 17:09:15 Element Op Status = 0
20101014 17:09:15 Element Name = System Board 7:3
20101014 17:09:15 Element Op Status = 0
20101014 17:09:15 Element Name = System Board 7:4
20101014 17:09:15 Element Op Status = 0
20101014 17:09:15 Element Name = System Board 7:5
20101014 17:09:15 Element Op Status = 0
20101014 17:09:15 Element Name = System Board 7:6
20101014 17:09:15 Element Op Status = 0
20101014 17:09:15 Element Name = System Board 7:7
20101014 17:09:15 Element Op Status = 0
20101014 17:09:15 Element Name = System Board 7:8
20101014 17:09:15 Element Op Status = 0
20101014 17:09:15 Element Name = System Board 7:9
20101014 17:09:15 Element Op Status = 0
20101014 17:09:15 Element Name = System Internal Expansion Board 16:1
20101014 17:09:15 Element Op Status = 0
20101014 17:09:15 Element Name = System Internal Expansion Board 16:2
20101014 17:09:15 Element Op Status = 0
20101014 17:09:15 Element Name = System Internal Expansion Board 16:3
20101014 17:09:15 Element Op Status = 0
20101014 17:09:15 Element Name = System Internal Expansion Board 16:4
20101014 17:09:15 Element Op Status = 0
20101014 17:09:15 Element Name = System Internal Expansion Board 16:5
20101014 17:09:15 Element Op Status = 0
20101014 17:09:15 Element Name = System Internal Expansion Board 16:6
20101014 17:09:15 Element Op Status = 0
20101014 17:09:15 Element Name = System Internal Expansion Board 16:7
20101014 17:09:15 Element Op Status = 0
20101014 17:09:15 Element Name = esxi.example.com
20101014 17:09:15 Element Name = Hardware Management Controller (Node 0)
20101014 17:09:15 Element Op Status = 0
20101014 17:09:15 Check classe CIM_NumericSensor
20101014 17:09:15 Element Name = System Board 8 Power Meter
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = System Board 7 Temp 24
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = System Board 6 Temp 23
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = System Board 5 Temp 22
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = Drive Backplane 1 Temp 21
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = Memory Module 9 Temp 20
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = Processor 3 Temp 19
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = System Internal Expansion Board 7 Temp 18
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = System Internal Expansion Board 6 Temp 17
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = System Internal Expansion Board 5 Temp 16
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = System Internal Expansion Board 4 Temp 15
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = System Internal Expansion Board 3 Temp 14
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = System Internal Expansion Board 2 Temp 13
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = System Internal Expansion Board 1 Temp 12
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = Memory Module 8 Temp 11
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = Memory Module 7 Temp 10
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = Memory Module 6 Temp 9
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = Memory Module 4 Temp 7
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = Memory Module 3 Temp 6
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = Memory Module 2 Temp 5
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = Memory Module 1 Temp 4
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = Processor 1 Temp 2
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = External Environment 1 Temp 1
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = System Board 4 Fans
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = System Board 2 Fan 2
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Element Name = System Board 1 Fan 1
20101014 17:09:15 Element Op Status = 2
20101014 17:09:15 Check classe CIM_Memory
20101014 17:09:16 Element Name = Proc 1 Level-1 Cache
20101014 17:09:16 Element Op Status = 0
20101014 17:09:16 Element Name = Proc 1 Level-1 Cache
20101014 17:09:16 Element Op Status = 0
20101014 17:09:16 Element Name = Proc 1 Level-1 Cache
20101014 17:09:16 Element Op Status = 0
20101014 17:09:16 Element Name = Proc 1 Level-1 Cache
20101014 17:09:16 Element Op Status = 0
20101014 17:09:16 Element Name = Proc 1 Level-2 Cache
20101014 17:09:16 Element Op Status = 0
20101014 17:09:16 Element Name = Proc 1 Level-2 Cache
20101014 17:09:16 Element Op Status = 0
20101014 17:09:16 Element Name = Proc 1 Level-2 Cache
20101014 17:09:16 Element Op Status = 0
20101014 17:09:16 Element Name = Proc 1 Level-2 Cache
20101014 17:09:16 Element Op Status = 0
20101014 17:09:16 Element Name = Proc 1 Level-3 Cache
20101014 17:09:16 Element Op Status = 0
20101014 17:09:16 Element Name = Memory
20101014 17:09:16 Element Op Status = 2
20101014 17:09:16 Check classe CIM_Processor
20101014 17:09:16 Element Name = Proc 1
20101014 17:09:16 Element Op Status = 2
20101014 17:09:16 Check classe CIM_RecordLog
20101014 17:09:16 Element Name = IPMI SEL
20101014 17:09:16 Element Op Status = 2
20101014 17:09:16 Check classe OMC_DiscreteSensor
20101014 17:09:16 Element Name = Power Supply 3 Power Supplies
20101014 17:09:16 Element Op Status = 2
20101014 17:09:16 Element Name = System Chassis 3 Ext. Health LED
20101014 17:09:16 Element Name = System Chassis 2 Int. Health LED
20101014 17:09:16 Element Name = System Chassis 1 UID Light
20101014 17:09:16 Check classe VMware_StorageExtent
20101014 17:09:16 Check classe VMware_Controller
20101014 17:09:17 Check classe VMware_StorageVolume
20101014 17:09:17 Check classe VMware_Battery
20101014 17:09:17 Check classe VMware_SASSATAPort
OK
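In the sample output every element reports an Op Status of 0 or 2. In the CIM schema's OperationalStatus value map, 0 means Unknown and 2 means OK, which fits the overall OK on the last line. The sketch below shows one way such codes could be rolled up into a single Nagios result. It is not check_esx_wbem.py's actual code: the mapping from CIM codes to Nagios states, the severity ranking, and the function name `overall_state` are assumptions for illustration.

```python
from typing import Iterable, Optional

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed mapping: the code meanings come from the CIM OperationalStatus
# value map, but the Nagios state chosen for each one is illustrative.
CIM_TO_NAGIOS = {
    0: OK,        # Unknown (what several elements in the sample output report)
    1: OK,        # Other
    2: OK,        # OK
    3: WARNING,   # Degraded
    4: WARNING,   # Stressed
    5: WARNING,   # Predictive Failure
    6: CRITICAL,  # Error
    7: CRITICAL,  # Non-Recoverable Error
}

# Ranking used to pick the "worst" state; UNKNOWN sits between OK and WARNING.
SEVERITY = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}


def overall_state(statuses: Iterable[Optional[int]]) -> int:
    """Roll per-element OperationalStatus codes up into one Nagios exit code."""
    worst = OK
    for code in statuses:
        if code is None:
            continue  # element exposes no OperationalStatus (e.g. the host entry)
        state = CIM_TO_NAGIOS.get(code, UNKNOWN)  # unmapped codes become UNKNOWN
        if SEVERITY[state] > SEVERITY[worst]:
            worst = state
    return worst
```

A plugin built along these lines would end with something like `sys.exit(overall_state(codes))`, because Nagios decides the service state from the plugin's exit code, not from the text it prints.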
Nagios Integration
Create a check command definition in Nagios, such as this one:
define command {
    command_name    check_esxi
    command_line    /usr/bin/python /usr/lib/nagios/plugins/check_esx_wbem.py https://'$HOSTADDRESS$':5989 '$ARG1$' '$ARG2$' verbose
}
Create a service tied to a host:
define service {
    host_name              ESXi-server
    service_description    Hardware ESXi
    use                    generic-service
    check_command          check_esxi!root!password
    register               1
}
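The two definitions fit together through Nagios macros: in check_esxi!root!password, the part before the first "!" selects the command definition, root and password fill $ARG1$ and $ARG2$, and $HOSTADDRESS$ becomes the address of the host the service is attached to. The sketch below mimics that substitution in simplified form (real Nagios supports many more macros and escaping rules), then uses shlex to show the arguments the plugin receives once the shell strips the single quotes. The function names are hypothetical, and 192.0.2.10 is a documentation address.

```python
import shlex
from typing import List


def expand_command(command_line: str, check_command: str, host_address: str) -> str:
    """Simplified Nagios macro expansion for $HOSTADDRESS$ and $ARGn$."""
    # check_command has the form "<command_name>!<ARG1>!<ARG2>..."; the part
    # before the first "!" only selects which command definition to use.
    args = check_command.split("!")[1:]
    expanded = command_line.replace("$HOSTADDRESS$", host_address)
    for i, value in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{i}$", value)
    return expanded


def plugin_argv(command_line: str, check_command: str, host_address: str) -> List[str]:
    """Split the expanded command the way a POSIX shell would (quotes removed)."""
    return shlex.split(expand_command(command_line, check_command, host_address))
```

With the command line above, the plugin should see https://192.0.2.10:5989, root, and password as its first three arguments, which matches the usage line of check_esx_wbem.py.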
Restart Nagios and presto: you are now monitoring the hardware of your ESXi server.