Alister West

home is where your code is ...

A Systems Monitoring Solution

Monitoring multiple hosts involves several levels of monitoring and alerting, both local and external.

Application Level

Monitor services for a host. This is either user-space or root-space monitoring, depending on how the box is set up.

  • User

    • user-space is left mostly to the user.
    • ubic monitors perl services. (ubic in turn is monitored by a cron script)
  • Root

    • custom background process monitors root services (crond, ftpd, apache, mail (qmail), antivirus (clamd), spam (spamassassin), sshd, freespace, mount, load, memory, ntpd)
    • also monitors user services installed in root space but run as a different user:group (mysql, modperl, varnish, solr, memcached)
    • restart a service if its process can't be found. notify on problems. escalate as per config.
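
The restart / notify / escalate rule above can be sketched as a small decision function. This is only a minimal sketch, not the actual background process: `ServiceRule`, `max_restarts` and `next_action` are hypothetical names, and the "config" is reduced to a single assumed restart limit.

```python
from dataclasses import dataclass


@dataclass
class ServiceRule:
    """Hypothetical per-service config entry."""
    name: str
    max_restarts: int = 3  # assumed limit before escalating


def next_action(rule: ServiceRule, is_running: bool, failed_checks: int) -> str:
    """Decide what the monitor does for one service on one pass.

    failed_checks counts consecutive passes where the process was missing,
    including the current one.
    """
    if is_running:
        return "ok"
    if failed_checks <= rule.max_restarts:
        # Process is missing: restart it and send a notification.
        return "restart+notify"
    # Restarts are not helping: hand over to escalation.
    return "escalate"
```

For example, with `ServiceRule("sshd", max_restarts=2)` the first two missing-process passes return `"restart+notify"` and the third returns `"escalate"`.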

Host Machine

  • snmp service for remote monitoring.
  • simple snmp stats in realtime (diskstats, httpd-stat)
  • services (apache, qmail) dump info to file for quick lookup (when connection issues are possible)
  • run apache stats every 1min
  • run mail stats every 5min
  • dump system processes every 1min into stats-HH-MM.out (each file is kept until the next write to the same name, 24hrs later)
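
The stats-HH-MM.out naming gives a rolling 24-hour window for free, because the same minute on the next day maps to the same file name. A minimal sketch of that idea, assuming the dump text is produced elsewhere (`stats_filename` and `write_dump` are hypothetical names):

```python
from datetime import datetime
from pathlib import Path


def stats_filename(now: datetime) -> str:
    """Name of the per-minute dump file, following the stats-HH-MM.out pattern."""
    return now.strftime("stats-%H-%M.out")


def write_dump(directory: Path, now: datetime, dump: str) -> Path:
    """Write one process dump, overwriting the file from the same minute yesterday."""
    path = directory / stats_filename(now)
    path.write_text(dump)  # write_text truncates any existing file
    return path
```

Run once a minute, this never creates more than 24 * 60 = 1440 files, so no separate cleanup job is needed.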

Monitoring Services

  • monitor-box: monitors services with ping, telnet, etc. alerts/escalates on errors.
  • mon sends emails to the monitor machine, which handles escalation, alerts, etc.
  • for clustered boxes look at snmp ping (99% good enough to check host-up/down)
  • machines send their syslogs to a centralised server (network support is built in; it is configured in syslog-ng.conf)
    • syslog-ng can also combine error-log of multiple web-apps into one log.
    • ErrorLog "|/usr/bin/logger -p local6.err -t mysite" # apache ErrorLog; local6.err is an example priority, since -p needs a facility.level argument
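
The telnet-style check done by the monitor-box amounts to "can I open a TCP connection to this port?". A minimal sketch using only the standard library; `port_is_open` and `failed_checks` are hypothetical names, and the real monitor-box will have its own alerting and escalation around this:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False


def failed_checks(targets: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Return the (host, port) pairs that did not answer."""
    return [(host, port) for host, port in targets if not port_is_open(host, port)]
```

Anything returned by `failed_checks` is what would be passed on for alerting.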

Graphing Services

  • graphing-box: custom multi-worker app uses snmp data from all hosts.
  • host list sync'd from master server.
  • pull snmp data from all hosts
  • data stored in $data/$hosts/$service.rrd files.
  • images generated from .rrd files.
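
The $data/$hosts/$service.rrd layout can be sketched as below. This is an illustration under assumptions, not the graphing app itself: `rrd_path` and `rrd_update_command` are hypothetical helpers, and feeding values in through the `rrdtool update` command line is an assumption about how the .rrd files get written.

```python
from pathlib import Path


def rrd_path(data_dir: str, host: str, service: str) -> Path:
    """Path of the .rrd file for one service on one host."""
    return Path(data_dir) / host / f"{service}.rrd"


def rrd_update_command(rrd: Path, timestamp: int, value: float) -> list[str]:
    """Argument list for `rrdtool update <file> <timestamp>:<value>`."""
    return ["rrdtool", "update", str(rrd), f"{timestamp}:{value}"]
```

A worker would call `rrd_path` for each host in the sync'd host list and pass the resulting command to `subprocess.run` after pulling the snmp value.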
By Alister West