Feature #1012
xymon scripts that query /dev/disk/by-id look at the wrong disks when a raid disk is present
Status: | Closed | Start date: | 05/06/2017
---|---|---|---
Priority: | Normal | Due date: |
Assignee: | - | % Done: | 100%
Category: | - | Spent time: | -
Target version: | - | |
Description
The xymon-smart.sh and xymon-hddtmp.sh scripts use the following line to query disks:
ls /dev/disk/by-id/* | grep -ve '-part' -ve '/wwn' |
The line should be changed to the following to exclude the mdadm RAID devices:
ls /dev/disk/by-id/* | grep -ve '-part' -ve '/wwn' -ve '/md' |
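To illustrate the effect of the extra `/md` pattern, here is a minimal sketch that runs the proposed filter over a made-up listing. The entry names in `sample_ids` are hypothetical and only stand in for what `/dev/disk/by-id` might contain on a host with one mdadm array; `sample_ids` and `filter_ids` are helper names introduced for this example.

```shell
#!/bin/sh
# Hypothetical sample of /dev/disk/by-id entries (names are invented).
sample_ids() {
    cat <<'EOF'
/dev/disk/by-id/ata-EXAMPLE_DISK_A
/dev/disk/by-id/ata-EXAMPLE_DISK_A-part1
/dev/disk/by-id/md-name-host:0
/dev/disk/by-id/md-uuid-00000000:00000000:00000000:00000000
/dev/disk/by-id/wwn-0x5000000000000001
/dev/disk/by-id/wwn-0x5000000000000001-part1
EOF
}

# Same patterns as the proposed line: drop partitions, wwn aliases
# and md (mdadm) devices, keeping one entry per physical disk.
filter_ids() {
    grep -v -e '-part' -e '/wwn' -e '/md'
}

sample_ids | filter_ids
```

With this sample input, only the `ata-EXAMPLE_DISK_A` entry is left after filtering.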
In addition, disks that are members of an mdadm RAID array are not listed by the subsequent mount query, so
an additional check against /proc/mdstat should be added for the RAID member disks:
#check if device is directly mounted
if ! mount | grep -q /dev/$DISKDEV
then
# check if device is mounted by mdadm
if ! cat /proc/mdstat | grep -q $DISKDEV
then
continue
fi
fi
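The check above uses a plain substring match, so a device such as `sda` would also match a member named `sdaa1`. A slightly stricter variant is sketched below. It assumes that member devices appear in /proc/mdstat in the form `sda1[0]` or `nvme0n1p2[0]`; the helper name `in_mdstat` is hypothetical and is not part of the attached scripts.

```shell
#!/bin/sh
# in_mdstat DEVICE: read mdstat-style text on stdin and succeed if DEVICE
# (or one of its partitions) is listed as an array member.
# Assumption: members look like "sda[0]", "sda1[0]" or "nvme0n1p2[0]".
in_mdstat() {
    grep -Eq "(^|[[:space:]])${1}(p?[0-9]+)?\[[0-9]+\]"
}

# Demo against a sample line instead of the real /proc/mdstat.
sample='md0 : active raid1 sdb1[1] sda1[0]'
printf '%s\n' "$sample" | in_mdstat sda && echo "sda: array member"
printf '%s\n' "$sample" | in_mdstat sdc || echo "sdc: not in any array"
```

In a script like the ones attached below, the call would be `in_mdstat "$DISKDEV" < /proc/mdstat`. The pattern is a best-effort sketch and has not been checked against every device naming scheme.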
Associated revisions
xymon: closes #1012
History
Updated by thekingofspain almost 8 years ago
I tried to attach two files when submitting this defect and received a web error. This caused two duplicate defects to be created. My versions of the files are attached below.
xymon-hddtemp.sh
#!/bin/bash
# NOTE: Must be run as root, so you probably need to set up sudo for this.
# The [[ ... ]] tests below are bash syntax, hence the bash shebang.
ls /dev/disk/by-id/* | grep -ve '-part' -ve '/md-' -ve '/wwm- ' |
while read DISK
do
DISKDEV=`ls -l $DISK | awk -F/ '{print $NF}'`
DISKNAME=`echo $DISK | awk -F/ '{print $5}' | tr ":" "_"`
#check if device is optical
if [[ $DISKDEV == "sr"* ]]
then
continue
fi
#check if device is mounted
if ! mount | grep -q /dev/$DISKDEV
then
# check if device is used by mdadm
if ! cat /proc/mdstat | grep -q $DISKDEV
then
continue
fi
fi
#check if SMART is disabled and enable
DRES=`sudo /usr/bin/smartctl -A $DISK`
if [[ $DRES == *"SMART Disabled. Use option -s with argument 'on'"* ]]
then
sudo /usr/bin/smartctl -s on $DISK
DRES=`sudo /usr/bin/smartctl -A $DISK`
fi
hddtemp=`echo "$DRES" | grep Temperature_Celsius | awk '{print $10}'`
TEMP=": $hddtemp"
if [[ $hddtemp == "" ]]
then
TEMP="- No Temp Sensor Found"
COLOR="4&clear"
elif test $hddtemp -gt 55
then
COLOR="1&red"
elif test $hddtemp -ge 50
then
COLOR="2&yellow"
else
COLOR="3&green"
fi
echo "${COLOR} $DISKNAME $TEMP"
done > /tmp/hddcheck
COLOR=`cat /tmp/hddcheck | awk '{print $1}' | sort | uniq | head -1 | cut -c3-`
# Report status to Xymon Server
$XYMON $XYMSRV "status ${MACHINE}.hddtemp ${COLOR} Hard Drive Temperatures (in °C)
xymon-smart.sh
#!/bin/bash
# NOTE: Must be run as root, so you probably need to set up sudo for this.
# The [[ ... ]] tests below are bash syntax, hence the bash shebang.
if test -f /tmp/dres; then rm -f /tmp/dres; fi
ls /dev/disk/by-id/* | grep -ve '-part' -ve '/md-' -ve '/wwn-' |
while read DISK
do
DISKDEV=`ls -l $DISK | awk -F/ '{print $NF}'`
#check if device is optical
if [[ $DISKDEV == "sr"* ]]
then
continue
fi
#check if device is directly mounted
if ! mount | grep -q /dev/$DISKDEV
then
# check if device is used by mdadm
if ! cat /proc/mdstat | grep -q $DISKDEV
then
continue
fi
fi
DRES=`sudo /usr/bin/smartctl -H -n standby $DISK`
DCODE=$?
#check if SMART is disabled and enable
if [[ $DRES == *"SMART Disabled. Use option -s with argument 'on'"* ]]
then
sudo /usr/bin/smartctl -s on $DISK
DRES=`sudo /usr/bin/smartctl -H -n standby $DISK`
DCODE=$?
fi
DSTBY=$(( $DCODE & 2 ))
DFAIL=$(( $DCODE & 8 ))
DWARN=$(( $DCODE & 32 ))
if test $DSTBY -ne 0
then
COLOR="4&clear"
elif test $DFAIL -ne 0
then
COLOR="1&red"
elif test $DWARN -ne 0
then
COLOR="2&yellow"
else
COLOR="3&green"
fi
echo "${COLOR} $DISK (/dev/$DISKDEV)"
echo "${COLOR} $DISK (/dev/$DISKDEV)" | cut -c2- >>/tmp/dres
echo "" >>/tmp/dres
echo "$DRES" | egrep -v "^smartctl|^Copyright|^$|^===" >>/tmp/dres
echo "-----------------------------------------------------------------------------" >>/tmp/dres
echo "" >>/tmp/dres
echo "" >>/tmp/dres
done >/tmp/dcheck
COLOR=`cat /tmp/dcheck | awk '{print $1}' | sort | uniq | head -1 | cut -c3-`
$XYMON $XYMSRV "status ${MACHINE}.smart ${COLOR} SMART Health Check
`cat /tmp/dcheck | cut -c2-`
============================== Detailed status ==============================
`cat /tmp/dres`
"
rm -f /tmp/dres /tmp/dcheck
exit 0
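The color logic in xymon-smart.sh can be read as two small steps: map the smartctl exit status bits to a numbered color tag, then pick the most severe tag across all disks by sorting. The sketch below restates that logic as standalone functions so it can be tried without touching real disks. The function names `smart_color` and `worst_color` are hypothetical, and the bit meanings in the comments follow the exit status description in the smartctl manual page.

```shell
#!/bin/sh
# smart_color CODE: map a smartctl exit status to the tags used by the script.
smart_color() {
    code=$1
    if [ $(( code & 2 )) -ne 0 ]; then
        echo "4&clear"     # bit 1: device open failed or device in standby (-n)
    elif [ $(( code & 8 )) -ne 0 ]; then
        echo "1&red"       # bit 3: SMART status check returned DISK FAILING
    elif [ $(( code & 32 )) -ne 0 ]; then
        echo "2&yellow"    # bit 5: an attribute was at or below threshold in the past
    else
        echo "3&green"
    fi
}

# worst_color: read "N&color ..." lines on stdin and print the most severe
# color. The numeric prefix makes red sort first and clear sort last.
worst_color() {
    awk '{print $1}' | sort | head -1 | cut -c3-
}

# Demo with invented exit codes for three disks.
for c in 0 32 8; do
    echo "$(smart_color "$c") disk-with-exit-$c"
done | worst_color
```

The numeric prefix is what lets a plain `sort` act as a severity ranking, which is why the script strips it with `cut` only at the very end.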
Updated by thekingofspain almost 8 years ago
Typo in xymon-hddtmp.sh:
the filter should be '/wwn-', not '/wwm-'.
Note: the baseline version of the xymon-hddtmp.sh script did not have the wwn filter, but xymon-smart.sh did.
Updated by brfransen almost 8 years ago
- Tracker changed from Bug to Feature
Updated by brfransen almost 7 years ago
- % Done changed from 0 to 100
- Status changed from New to Closed
Applied in changeset 168df166590c264c05a7385776d80c30591a6ba6.