Argus - The All Seeing, System and Network Monitoring Software

Notifications

Whenever something changes state, Argus can notify someone about it.

In the real world, people forget, go out of range of their paging service, or let their batteries die. So, by default, when Argus wants to be sure someone knows about something, it requires that the notification be acknowledged, and it will resend and/or escalate the notification until someone acknowledges it.

Configuring

To specify where to send notifications:

	notify:		mail:support@example.com  qpage:joe

You can specify more than one destination. Currently, notification by mail and by qpage is supported.

By default, Services generate notifications and Groups do not. This can be changed:

	Group "Foo" {
		sendnotify:	yes
		Service Ping {
			sendnotify:	no
		}
	}

You can change the message that gets sent:

	Service UDP/SNMP {
		# cisco inlet temperature
		oid:		.1.3.6.1.4.1.9.9.13.1.3.1.3.1
		maxvalue:	27
		messagedn:	server room is too hot
		messageup:	server room has cooled back down
	}

If you only want one message sent, and never resent or escalated:

	autoack:	yes

By default, notifications are resent every 5 minutes. To change this, specify a value in seconds:

	renotify:	120
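
Putting the parameters above together, a configuration might look like the sketch below. The values are illustrative, and it assumes that notify and renotify may be set inside a Group and are picked up by the Services within it, which this page does not state.

	Group "Foo" {
		notify:		mail:support@example.com  qpage:joe
		# resend every 2 minutes instead of the default 5
		renotify:	120
		Service Ping {
			# other parameters omitted
			messagedn:	host has stopped answering pings
			messageup:	host is answering pings again
		}
	}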

Escalating

After attempting to notify someone of a problem repeatedly, you may want to try notifying someone else:

	escalate:    10 qpage:manager; 30 qpage:cio; 60 qpage:ceo

which means:

  • after 10 minutes page the manager
  • after 30 minutes page the CIO
  • after 1 hour page the CEO

In previous versions of Argus, the number in the escalate parameter was interpreted differently: in one version it was the number of pages sent, and in another it was a time in seconds. If you are running an older version, check which interpretation it uses.
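
As a further sketch, escalation steps presumably accept the same method:recipient form that notify does, so a mail address could serve as an early step. That is an assumption, and the address and times here are illustrative:

	# after 15 minutes mail the on-call list, after 45 minutes page the manager
	escalate:    15 mail:oncall@example.com; 45 qpage:manager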

Acknowledging

On the webpage, click "Un-Acked Notifies". Ack them one-by-one by clicking "Ack", or ack several by checking their checkboxes and clicking "Ack Checked", or ack all of them by clicking "Ack All".

The ability to ack is controlled by the "acl_ntfyack" access control list. The ability to "Ack All" is controlled by the "acl_ntfyackall" access control list.

In order to remain sane, the author highly recommends verifying that anyone who may be sent notifications is also permitted to acknowledge them.

Adding Other Notification Methods

Support for user-defined notification methods was added in version 3.2.

Argus comes with mail and qpage built in. To add another method, pick a name for it and add a Method block to the top of the config file:

	Method "annoy" {
		command:	winpopup %R
		send:		%M
	}

	notify:		annoy:bob

This will cause notifications to be sent to 'bob' using the 'winpopup' program. Various % sequences can be used:

  • %R - the recipient ('bob' in the example)
  • %F - the from address (set with the 'mailfrom' parameter)
  • %M - the message
  • %E - extra info (things like 'RESENT' or 'ESCALATED')
  • %N - the method name ('annoy' in the example)
  • %P - the priority (set with the 'priority' parameter)
  • %S - the object state (up or down)
  • %C - the number of times the notification has been sent
  • %I - the notification id number
  • %O - the full object name
  • %O{param} - the current value of the specified object parameter (3.3)
  • %T - the notification creation time
  • %T{format} - the notification creation time, using the given strftime format (3.3)
  • %Y - the severity level (set with the 'severity' parameter) (3.3)
  • %Z - the timezone used for the notification creation time

In the message text (messageup and messagedn), some additional sequences can be used:

  • %i - the notification id number (3.3)
  • %o{param} - the value of the specified object parameter at message creation time (3.3)
  • %p - the priority (set with the 'priority' parameter) (3.3)
  • %r - the reason the object went down
  • %s - the object state (up or down) (3.3)
  • %t - the notification creation time (3.3)
  • %t{format} - the notification creation time, using the given strftime format (3.3)
  • %v - the value returned by the test
  • %y - the severity level (set with the 'severity' parameter) (3.3)
  • %z - the timezone used for the notification creation time (3.3)
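
As a sketch of how these sequences combine, the example below defines a hypothetical method named 'syslog' that hands each notification to the standard Unix 'logger' program, and a service whose messages include the tested value and the failure reason. It assumes that the text given in 'send' is passed to the command on its standard input (logger reads its message from there when none is given on the command line) and that notify may be set inside a Service; neither is stated on this page. The recipient 'ops' is a placeholder, since the command does not use %R.

	# at the top of the config file
	Method "syslog" {
		command:	logger -t argus-%N
		send:		[%I] %O is %S: %M %E
	}

	Service UDP/SNMP {
		# cisco inlet temperature
		oid:		.1.3.6.1.4.1.9.9.13.1.3.1.3.1
		maxvalue:	27
		notify:		syslog:ops
		messagedn:	server room is too hot (%v): %r
		messageup:	server room has cooled back down (%v)
	}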