Sunday, December 13, 2015

Nagios Installation and Configuration

1) Objectives/Description
 
The objective is to set up a system and network monitoring application that monitors our entire network. Our chosen application is Nagios.


Nagios is a system and network monitoring application. It watches the hosts and services that we specify, alerting us when things go bad and when they get better. Nagios has many features that make it easy to monitor an entire network.



When the installation is complete:
  • Nagios and the plugins will be installed underneath /usr/local/nagios
  • Nagios will be configured to monitor a few aspects of our local system (CPU load, disk usage, current users etc.) and a few network services (SMTP, POP3, HTTP, SNMP etc.).
  • The Nagios web interface will be accessible at http://localhost/nagios/


2) System Requirements
 
The only requirements for running Nagios are a machine running Linux (or a UNIX variant) and a C compiler. The machine should also have TCP/IP configured, as most service checks are performed over the network.
 
 
2.1 Prerequisites:
 
  • OS: Red Hat Enterprise Linux 4/5/6 or CentOS 4/5/6

Make sure that we have the following packages installed on our system:
  • Apache
  • GCC Compiler
  • GD Development libraries


3) Installation Procedure

     During portions of the installation we'll need root access to the machine. First, we check whether all the prerequisite packages are installed:

   #rpm -qa | grep httpd
   #rpm -qa | grep gcc
   #rpm -qa | grep gd
   #rpm -qa | grep glibc

   If any of the packages are missing, install them:

   #rpm -Uvh httpd-* or #yum -y install httpd-*
   #rpm -Uvh gcc-* or #yum -y install gcc-*
   #rpm -Uvh gd-* or #yum -y install gd-*
   #rpm -Uvh glibc-* or #yum -y install glibc-*
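
   The check-and-install steps above can be combined into a small helper. This is only a minimal sketch and not part of the original procedure: the function name missing_packages and the QUERY_CMD variable are made up for illustration, and it assumes that "rpm -q" exits with a non-zero status when a package is not installed. Note that "rpm -q" matches exact package names, unlike the grep used above.

```shell
#!/bin/sh
# QUERY_CMD is an assumption of this sketch: it defaults to "rpm -q"
# and can be overridden, for example for a dry run on a non-RPM system.
QUERY_CMD="${QUERY_CMD:-rpm -q}"

# missing_packages NAME...: print the names that are not installed
missing_packages() {
    missing=""
    for pkg in "$@"; do
        # A non-zero exit status is treated as "not installed"
        if ! $QUERY_CMD "$pkg" >/dev/null 2>&1; then
            missing="$missing $pkg"
        fi
    done
    # Unquoted on purpose so the names come out separated by single spaces
    echo $missing
}

# Usage on the Nagios host (run as root):
#   pkgs=$(missing_packages httpd gcc gd glibc)
#   [ -n "$pkgs" ] && yum -y install $pkgs
```

   The guard on an empty list matters because yum reports an error when it is asked to install nothing.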

   3.1 Create Account Information:

   Create a new nagios user account and give it a password

   #/usr/sbin/useradd nagios
   #passwd nagios
   
   Create a new nagcmd group to allow external commands to be submitted through the web interface. Add both the nagios user and the apache user to the group (the -a option appends the group instead of replacing the user's existing supplementary groups).
   #/usr/sbin/groupadd nagcmd
   #/usr/sbin/usermod -a -G nagcmd nagios
   #/usr/sbin/usermod -a -G nagcmd apache
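
   To confirm that both users ended up in the nagcmd group, the group list printed by "id -nG" can be checked. The helper below is a hedged sketch; has_group is a hypothetical name introduced here, not a standard command.

```shell
#!/bin/sh
# has_group LIST NAME: succeed if NAME appears as a whole word in LIST,
# where LIST is a space-separated group list such as the output of "id -nG".
has_group() {
    for g in $1; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Usage on the Nagios host:
#   has_group "$(id -nG nagios)" nagcmd && echo "nagios is in nagcmd"
#   has_group "$(id -nG apache)" nagcmd && echo "apache is in nagcmd"
```

   Comparing whole words avoids a false match on a group whose name merely contains "nagcmd".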


   3.2 Download Nagios and the Plugins:
   Download the source code tarballs of both Nagios and the Nagios plugins (visit http://www.nagios.org/download/ for links to the latest versions).
   Create a directory for storing the downloads.
   #mkdir ~/downloads
   #cd ~/downloads
   
   3.3 Compile and Install Nagios:
   Extract the Nagios source code tarball.
   #cd ~/downloads
   #tar xzf nagios-3.0.2.tar.gz
   #cd nagios-3.0.2

   Run the Nagios configure script, passing the name of the group we created earlier:
   #./configure --with-command-group=nagcmd

   Compile the Nagios source code.

   #make all

   Install binaries, init scripts, sample config files and set permissions on the external command directory.

   #make install
   #make install-init
   #make install-config
   #make install-commandmode

   3.4 Customize Configuration
   Sample configuration files have now been installed in the /usr/local/nagios/etc directory. These sample files should work fine for getting started with Nagios. We'll need to make just one change before we proceed.

   Edit the /usr/local/nagios/etc/objects/contacts.cfg config file with our favourite editor and change the email address associated with the nagiosadmin contact definition to the address we'd like to use for receiving alerts.
   #vi /usr/local/nagios/etc/objects/contacts.cfg

   3.5 Configure the Web Interface
   Install the Nagios web config file in the Apache conf.d directory.
   #make install-webconf
   
   Create a nagiosadmin account for logging into the Nagios web interface and assign a password to it.
   #htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
   
   Restart Apache to make the new settings take effect.
   #service httpd restart
   
   3.6 Compile and Install the Nagios Plugins
   Extract the Nagios plugins source code tarball.
   #cd ~/downloads
   #tar xzf nagios-plugins-1.4.11.tar.gz
   #cd nagios-plugins-1.4.11

   Compile and install the plugins.
   #./configure --with-nagios-user=nagios --with-nagios-group=nagios
   #make
   #make install

   3.7 Start Nagios
   Add Nagios to the list of system services and have it automatically start when the system boots.
   #chkconfig --add nagios
   #chkconfig nagios on

   Verify the sample Nagios configuration files.

   #/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

   If there are no errors, start Nagios.
   #service nagios start
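
   The "verify first, then start" rule above can be enforced with a small wrapper. This is a sketch under assumptions: run_if_config_ok, NAGIOS_BIN and NAGIOS_CFG are names invented for this example, and it relies on the Nagios binary returning a non-zero exit status when the -v check finds errors.

```shell
#!/bin/sh
# Paths from this guide; both can be overridden through the environment.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# run_if_config_ok COMMAND...: run COMMAND only if the config check passes
run_if_config_ok() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        "$@"
    else
        echo "Configuration check failed; not running: $*" >&2
        return 1
    fi
}

# Usage on the Nagios host (run as root):
#   run_if_config_ok service nagios start
```

   The same wrapper works for later restarts, for example "run_if_config_ok service nagios restart".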

   3.8 Login to the Web Interface
   We should now be able to access the Nagios web interface at the URL below. We'll be prompted for the username (nagiosadmin) and password we specified earlier.
   http://localhost/nagios/
   Make sure the machine's firewall rules are configured to allow access to the web server if we want to access the Nagios interface remotely.


   4) Configuration
   Once Nagios is installed and running properly, we'll no doubt want to monitor more than just our local machine. To do so, we need to edit some configuration files for monitoring Windows/Linux machines, routers/switches, network printers and publicly available services (HTTP, FTP, SSH etc.).
   
   All configuration files reside in /usr/local/nagios/etc. The main configuration files are:
  • /usr/local/nagios/etc/cgi.cfg
  • /usr/local/nagios/etc/nagios.cfg
  • /usr/local/nagios/etc/resource.cfg
   
   4.1 Monitoring Routers & Switches:

4.1.1 Configuring Nagios:
   To monitor a network switch/router we need to edit the main Nagios config file.
   #vi /usr/local/nagios/etc/nagios.cfg
   
   Remove the leading pound (#) sign from the following line in the main configuration file:
   #cfg_file=/usr/local/nagios/etc/objects/switch.cfg
   Save the file and exit.
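
   The same change can be made without opening an editor. The helper below is a minimal sketch; enable_cfg_file is a hypothetical name, and it assumes the commented-out line starts at the beginning of the line exactly as shown above.

```shell
#!/bin/sh
# enable_cfg_file MAIN_CFG OBJECT_FILE
# Uncomment the "#cfg_file=OBJECT_FILE" line in MAIN_CFG.
# A copy of the original is kept as MAIN_CFG.bak.
enable_cfg_file() {
    main_cfg=$1
    object_file=$2
    cp "$main_cfg" "$main_cfg.bak" || return 1
    # "|" is used as the sed delimiter so the slashes in the path need no escaping
    sed "s|^#cfg_file=$object_file|cfg_file=$object_file|" "$main_cfg.bak" > "$main_cfg"
}

# Usage on the Nagios host (run as root):
#   enable_cfg_file /usr/local/nagios/etc/nagios.cfg \
#       /usr/local/nagios/etc/objects/switch.cfg
```

   Writing back into the existing file, rather than moving a new file over it, keeps the original ownership and permissions.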
   
   Now we need to define some object definitions in the switch.cfg file.
   #vi /usr/local/nagios/etc/objects/switch.cfg
   
   Add a new host definition for the switch that we're going to monitor. A sample host definition is already in switch.cfg. Change the host_name, alias, and address fields to appropriate values for the switch.
define host {
        use        generic-switch  ; Inherit default values from template
        host_name  DHK01_EDGE_SW  ; The name we're giving to the switch
        alias      DHK01_EDGE_SW  ; A longer name associated with switch
        address    172.30.0.7     ; IP address of the switch
        hostgroups EDGE_SW        ; Host groups this switch is associated with
        }

4.1.2 Monitoring Packet Loss and RTA:
Now we can add some service definitions (to the same configuration file) to monitor different aspects of the switch.

Add the following service definition in order to monitor packet loss and round trip average between the Nagios host and the switch every 5 minutes under normal conditions.
define service{
        use                    generic-service ; Inherit values from a template
        host_name              DHK01_EDGE_SW  ; The name of the host the service is associated with
        service_description    PING           ; The service description
        check_command          check_ping!200.0,20%!600.0,60% ; The command used to monitor the service
        normal_check_interval  5       ; Check the service every 5 minutes under normal conditions
        retry_check_interval   1       ; Re-check the service every minute until its final/hard state is determined
        }
This service will be:
  • CRITICAL if the round trip average (RTA) is greater than 600 milliseconds or the packet loss is 60% or more.
  • WARNING if the RTA is greater than 200 ms or the packet loss is 20% or more.
  • OK if the RTA is less than 200 ms and the packet loss is less than 20%.
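
The threshold rule above can be written out as a short function. This is an illustrative sketch of the rule as stated in this section, not the check_ping source code: classify_ping is a hypothetical name, it accepts whole numbers only, and a round trip average of exactly 200 ms or 600 ms is treated as not exceeding the limit.

```shell
#!/bin/sh
# classify_ping RTA_MS LOSS_PCT: print the state implied by the
# thresholds 200.0,20% (warning) and 600.0,60% (critical).
classify_ping() {
    rta=$1
    loss=$2
    if [ "$rta" -gt 600 ] || [ "$loss" -ge 60 ]; then
        echo CRITICAL
    elif [ "$rta" -gt 200 ] || [ "$loss" -ge 20 ]; then
        echo WARNING
    else
        echo OK
    fi
}

classify_ping 50 0     # prints OK
classify_ping 250 5    # prints WARNING
classify_ping 100 60   # prints CRITICAL
```

The critical test comes first because any reading that crosses the critical limits also crosses the warning limits.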


4.1.3 Monitoring SNMP Status Information:
If the switch or router supports SNMP, we can monitor a lot of information by using the check_snmp plugin. Add the following service definition to monitor the uptime of the switch.
define service{
        use                    generic-service ;Inherit values from template
        host_name              DHK01_EDGE_SW
        service_description    Uptime 
        check_command          check_snmp!-C public -o sysUpTime.0
        }
In the check_command directive of the service definition above, the "-C public" tells the plugin that the SNMP community name to be used is "public" and the "-o sysUpTime.0" indicates which OID should be checked.
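
Before relying on a new SNMP service, it can help to run the plugin by hand with the same arguments. The sketch below only prints the command line to run. It rests on two assumptions that should be checked on the actual system: that the plugins were installed in /usr/local/nagios/libexec, and that the check_snmp command in commands.cfg is defined as "$USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$". The name snmp_check_cmd is made up for this example.

```shell
#!/bin/sh
# Assumed plugin directory; override through the environment if it differs.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"

# snmp_check_cmd HOST ARGS...: print the command line that a
# "check_snmp!ARGS" service would run for HOST under the assumptions above.
snmp_check_cmd() {
    host=$1
    shift
    printf '%s\n' "$PLUGIN_DIR/check_snmp -H $host $*"
}

# Example for the switch defined earlier; run the printed command
# as the nagios user to see the plugin's raw output.
snmp_check_cmd 172.30.0.7 -C public -o sysUpTime.0
```

If the printed command fails when run by hand, the problem lies with SNMP access to the switch rather than with the Nagios configuration.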


If we want to ensure that a specific port/interface on the switch is in an up state, we could add a service definition like this:
define service{
        use                    generic-service ; Inherit values from a template
        host_name              DHK01_EDGE_SW
        service_description    Port 1 Link Status
        check_command          check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
        }
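
When several ports need the same check, the definitions differ only in the interface index. The following sketch generates them; port_service is a hypothetical helper, and the port range 1 to 4 is an arbitrary example. The output should be reviewed before it is appended to switch.cfg.

```shell
#!/bin/sh
# port_service HOST PORT: print a link-status service definition
# for one port, following the definition shown above.
port_service() {
    cat <<EOF
define service{
        use                    generic-service
        host_name              $1
        service_description    Port $2 Link Status
        check_command          check_snmp!-C public -o ifOperStatus.$2 -r 1 -m RFC1213-MIB
        }
EOF
}

# Example: definitions for ports 1 to 4 of the switch defined earlier
for port in 1 2 3 4; do
    port_service DHK01_EDGE_SW "$port"
done
```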


Once we've added the new host and service definitions to the switch.cfg file, we're ready to start monitoring the router/switch. To do this, we'll need to verify our configuration and restart Nagios.
To verify the configuration, run Nagios with the -v command line option:
#/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Restarting or reloading is necessary whenever we modify the configuration files and want the changes to take effect.
#/etc/rc.d/init.d/nagios reload
Or

#service nagios restart


  Enjoy :)



















