1) Objectives/Description
To set up a system and network monitoring application that monitors our entire network smoothly and dynamically. Our chosen application is Nagios. Nagios is a system and network monitoring application: it watches the hosts and services that we specify, alerting us when things go bad and when they get better. Nagios has many features that let us monitor the entire network easily.
After completing the installation we'll end up with:
- Nagios and the plugins installed underneath /usr/local/nagios
- Nagios configured to monitor a few aspects of our local system (CPU load, disk usage, current users etc.) and a few network services (SMTP, POP3, HTTP, SNMP etc.)
- The Nagios web interface accessible at http://localhost/nagios/
2) System Requirements
The only requirements for running Nagios are a machine running Linux (or a UNIX variant) and a C compiler. The machine should also have TCP/IP configured, as most service checks will be performed over the network.
2.1 Prerequisites:
· OS: Red Hat Enterprise Linux 4/5/6 or CentOS 4/5/6
Make sure that we have the following packages installed on our system:
- Apache
- GCC Compiler
- GD Development libraries
3) Installation Procedure
During portions of the installation we'll need root access to the machine. First we have to check whether all the prerequisite packages are installed. To check the packages:
#rpm -qa | grep httpd
#rpm -qa | grep gcc
#rpm -qa | grep gd
#rpm -qa | grep glibc
If any of the packages are not installed, we have to install them.
#rpm -Uvh httpd-* or #yum -y install httpd-*
#rpm -Uvh gcc-* or #yum -y install gcc-*
#rpm -Uvh gd-* or #yum -y install gd-*
#rpm -Uvh glibc-* or #yum -y install glibc-*
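The per-package checks above can be collapsed into one loop. The following is a minimal sketch, assuming the package names httpd, gcc, gd and glibc used above. The function name missing_packages is ours and is not part of rpm or Nagios. It relies on rpm -q returning a non-zero exit status for a package that is not installed.

```shell
# Print the names of the given packages that rpm does not report as installed.
# missing_packages is a hypothetical helper name, not an rpm feature.
missing_packages() {
    missing=""
    for pkg in "$@"; do
        # rpm -q exits non-zero when the package is not installed
        if ! rpm -q "$pkg" >/dev/null 2>&1; then
            missing="$missing $pkg"
        fi
    done
    # Strip the leading space before printing
    echo "${missing# }"
}

# Example:
#   missing_packages httpd gcc gd glibc
```

Whatever names it prints are the ones that still need to be installed with rpm or yum.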
3.1 Create Account Information:
Create a new nagios user account and give it a password.
#/usr/sbin/useradd nagios
#passwd nagios
Create a new nagcmd group for allowing external commands to be submitted through the web interface. Add both the nagios user and the apache user to the group.
#/usr/sbin/groupadd nagcmd
#/usr/sbin/usermod -G nagcmd nagios
#/usr/sbin/usermod -G nagcmd apache
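Before moving on, it can be worth confirming that both users really ended up in the group. A small sketch, assuming the user and group names created above; in_group is a hypothetical helper name, and it only reads the output of id -nG.

```shell
# Succeed if user $1 belongs to group $2, based on the output of id -nG.
# in_group is a hypothetical helper name used only for this sketch.
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Example:
#   in_group nagios nagcmd && in_group apache nagcmd && echo "group setup OK"
```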
3.2 Download Nagios and the Plugins:
Download the source code tarballs of both Nagios and the Nagios plugins (visit http://www.nagios.org/download/ for links to the latest versions).
Create a directory for storing the downloads.
#mkdir ~/downloads
#cd ~/downloads
3.3 Compile and Install Nagios:
Extract the Nagios source code tarball.
#cd ~/downloads
#tar xzf nagios-3.0.2.tar.gz
#cd nagios-3.0.2
Run the Nagios configure script, passing the name of the group we created earlier like so:
#./configure --with-command-group=nagcmd
Compile the Nagios source code.
#make
#make all
Install binaries, init scripts, sample config files and set permissions on the external command directory.
#make install
#make install-init
#make install-config
#make install-commandmode
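Since each install step depends on the previous one having succeeded, the make targets can also be run from a small wrapper that stops at the first failure. This is only a sketch: build_nagios is a hypothetical function name, and the target list is simply the one given above.

```shell
# Run the Nagios make targets in order and stop at the first failure.
# build_nagios is a hypothetical wrapper; the targets are the ones listed above.
build_nagios() {
    for target in all install install-init install-config install-commandmode; do
        echo "==> make $target"
        make "$target" || { echo "make $target failed" >&2; return 1; }
    done
}

# Example (run as root from the nagios-3.0.2 source directory, after ./configure):
#   build_nagios
```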
3.4 Customize Configuration
Sample configuration files have now been installed in the /usr/local/nagios/etc directory. These sample files should work fine for getting started with Nagios. We'll need to make just one change before we proceed.
Edit the /usr/local/nagios/etc/objects/contacts.cfg config file with our favourite editor and change the email address associated with the nagiosadmin contact definition to the address we'd like to use for receiving alerts.
#vi /usr/local/nagios/etc/objects/contacts.cfg
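If we prefer not to edit the file by hand, the same change can be scripted. The sketch below assumes the sample contact still carries the default address nagios@localhost; set_contact_email is a hypothetical helper name and admin@example.com is a placeholder address.

```shell
# Replace the default contact address in a contacts.cfg file.
# Assumption: the file still contains the sample address nagios@localhost.
# set_contact_email is a hypothetical helper; a backup is kept as FILE.bak.
set_contact_email() {
    cfg="$1"
    addr="$2"
    cp "$cfg" "$cfg.bak" || return 1
    # Use | as the sed delimiter so a / in the address cannot break the command
    sed "s|nagios@localhost|$addr|" "$cfg.bak" > "$cfg"
}

# Example:
#   set_contact_email /usr/local/nagios/etc/objects/contacts.cfg admin@example.com
```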
3.5 Configure the Web Interface
Install the Nagios web config file in the Apache conf.d directory.
#make install-webconf
Create a nagiosadmin account for logging into the Nagios web interface and assign a password to it.
#htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
Restart Apache to make the new settings take effect.
#service httpd restart
3.6 Compile and Install the Nagios Plugins
Extract the Nagios plugins source code tarball.
#cd ~/downloads
#tar xzf nagios-plugins-1.4.11.tar.gz
#cd nagios-plugins-1.4.11
Compile and install the plugins.
#./configure --with-nagios-user=nagios --with-nagios-group=nagios
#make
#make install
Add Nagios to the list of system services and have it automatically start when the system boots.
#chkconfig --add nagios
#chkconfig nagios on
Verify the sample Nagios configuration files.
#/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no errors, start Nagios.
#service nagios start
We should now be able to access the Nagios web interface at the URL below. We'll be prompted for the username (nagiosadmin) and password we specified earlier.
http://localhost/nagios/
Make sure the machine's firewall rules are configured to allow access to the web server if we want to access the Nagios interface remotely.
4) Configuration
Once we get Nagios installed and running properly, we'll no doubt want to start monitoring more than just our local machine. So we need to edit some configuration files for monitoring Windows/Linux machines, routers/switches, network printers and publicly available services (HTTP, FTP, SSH etc.).
All configuration files reside in /usr/local/nagios/etc. The main configuration files are:
· /usr/local/nagios/etc/cgi.cfg
· /usr/local/nagios/etc/nagios.cfg
· /usr/local/nagios/etc/resource.cfg
4.1 Monitoring Routers & Switches:
4.1.1 Configuring Nagios:
To monitor a network switch/router we need to edit the main Nagios config file:
#vi /usr/local/nagios/etc/nagios.cfg
Remove the leading pound (#) sign from the following line in the main configuration file:
#cfg_file=/usr/local/nagios/etc/objects/switch.cfg
Save the file and exit.
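The same one-line change can be made without opening an editor. The following sketch assumes the commented line appears in nagios.cfg exactly as shown above, with nothing after the file name; enable_cfg_file is a hypothetical helper name.

```shell
# Uncomment a cfg_file line in nagios.cfg by removing its leading "#".
# enable_cfg_file is a hypothetical helper; a backup is kept as FILE.bak.
enable_cfg_file() {
    main_cfg="$1"
    object_cfg="$2"
    cp "$main_cfg" "$main_cfg.bak" || return 1
    # Only the line that names exactly this object file is touched
    sed "s|^#cfg_file=$object_cfg\$|cfg_file=$object_cfg|" "$main_cfg.bak" > "$main_cfg"
}

# Example:
#   enable_cfg_file /usr/local/nagios/etc/nagios.cfg \
#       /usr/local/nagios/etc/objects/switch.cfg
```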
Now we need to define some object definitions in the switch.cfg file.
#vi /usr/local/nagios/etc/objects/switch.cfg
Add a new host definition for the switch that we're going to monitor. A sample host definition is already in switch.cfg. Change the host_name, alias, and address fields to appropriate values for the switch.
define host {
use generic-switch ; Inherit default values from template
host_name DHK01_EDGE_SW ; The name we're giving to the switch
alias DHK01_EDGE_SW ; A longer name associated with the switch
address 172.30.0.7 ; IP address of the switch
hostgroups EDGE_SW ; Host groups this switch is associated with
}
4.1.2 Monitoring Packet Loss and RTA:
Now we can add some service definitions (to the same configuration file) to monitor different aspects of the switch.
Add the following service definition in order to monitor packet loss and round trip average between the Nagios host and the switch every 5 minutes under normal conditions.
define service{
use generic-service ; Inherit values from a template
host_name DHK01_EDGE_SW ; The name of the host the service is associated with
service_description PING ; The service description
check_command check_ping!200.0,20%!600.0,60% ; The command used to monitor the service
normal_check_interval 5 ; Check the service every 5 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until its final/hard state is determined
}
This service will be:
- CRITICAL if the round trip average (RTA) is greater than 600 milliseconds or the packet loss is 60% or more.
- WARNING if the RTA is greater than 200 ms or the packet loss is 20% or more.
- OK if the RTA is less than 200 ms and the packet loss is less than 20%.
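The three rules above can be written out as a small function to make the order of evaluation explicit: the CRITICAL thresholds are tested first, then the WARNING thresholds. This is an illustration of the rules as stated here, not the check_ping plugin itself, which decides the exact boundary cases on its own. ping_state is a hypothetical name.

```shell
# Classify a ping result using the rules listed above.
# $1 = round trip average in ms, $2 = packet loss in percent.
# ping_state is a hypothetical helper, not the check_ping plugin itself;
# an RTA of exactly 200 or 600 ms is treated as not exceeding the threshold.
ping_state() {
    awk -v rta="$1" -v loss="$2" 'BEGIN {
        rta += 0; loss += 0    # force numeric comparison
        if (rta > 600 || loss >= 60)      print "CRITICAL"
        else if (rta > 200 || loss >= 20) print "WARNING"
        else                              print "OK"
    }'
}

# Example:
#   ping_state 250 0
```

Note that a single bad value is enough: a fast link with 60% loss is still CRITICAL.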
4.1.3 Monitoring SNMP Status Information:
If the switch or router supports SNMP, we can monitor a lot of information by using the check_snmp plugin. Add the following service definition to monitor the uptime of the switch.
define service{
use generic-service ;Inherit values from template
host_name DHK01_EDGE_SW
service_description Uptime
check_command check_snmp!-C public -o sysUpTime.0
}
In the check_command directive of the service definition above, the "-C public" tells the plugin that the SNMP community name to be used is "public" and the "-o sysUpTime.0" indicates which OID should be checked.
If we want to ensure that a specific port/interface on the switch
is in an up state, we could add a service definition like this:
define service{
use generic-service ; Inherit values from a template
host_name DHK01_EDGE_SW
service_description Port 1 Link Status
check_command check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
}
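It is often easier to try an SNMP check by hand before putting it into a service definition. The sketch below assumes the plugins were installed under /usr/local/nagios/libexec (the usual location for the /usr/local/nagios prefix, but an assumption here) and that the check_snmp command passes the host address with -H. snmp_check and the CHECK_SNMP variable are hypothetical names.

```shell
# Run check_snmp by hand against a host, with the same arguments used in
# the service definitions above. CHECK_SNMP may point at a different plugin
# path; the default assumes the /usr/local/nagios prefix.
# snmp_check is a hypothetical wrapper name.
snmp_check() {
    host="$1"
    shift
    "${CHECK_SNMP:-/usr/local/nagios/libexec/check_snmp}" -H "$host" "$@"
}

# Examples:
#   snmp_check 172.30.0.7 -C public -o sysUpTime.0
#   snmp_check 172.30.0.7 -C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
```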
Once we've added the new host and service definitions to the switch.cfg file, we're ready to start monitoring the router/switch. To do this, we'll need to verify our configuration and restart Nagios.
To verify the configuration, run Nagios with the -v command line option like so:
#/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Restarting/reloading is necessary whenever we modify the configuration files and want those changes to take effect.
#/etc/rc.d/init.d/nagios reload
Or
#service nagios restart
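Verification and reload can be tied together so that a broken configuration never gets loaded. A minimal sketch, assuming the paths used throughout this guide; safe_reload, NAGIOS_BIN and NAGIOS_CFG are hypothetical names introduced for this example.

```shell
# Verify the configuration and reload Nagios only if verification passes.
# safe_reload is a hypothetical helper; NAGIOS_BIN and NAGIOS_CFG default
# to the paths used in this guide.
safe_reload() {
    bin="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
    cfg="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
    if "$bin" -v "$cfg" >/dev/null; then
        service nagios reload
    else
        echo "Configuration check failed; Nagios was not reloaded" >&2
        return 1
    fi
}

# Example (as root):
#   safe_reload
```

When the check fails, running the -v command directly shows the full error output.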