10 Tips for Monitoring Best Practices (Alerting and Notifications)

Monitoring tools are only as good as YOU make them!  Not really what you wanted to hear?  Sorry, but the truth will set you FREE.  This article covers 10 tips for monitoring best practices.  Guess what… they are all about YOU, not the monitoring tools or features!

All monitoring solutions simply do what YOU tell them to.  Monitoring tools come in two varieties:

  • Sensitive types:  All alerts are automatically on, at the most sensitive level
  • Quiet types:  All alerts are disabled, and you need to enable them manually.

Tip 1:  Don’t just use the defaults, plan and configure alerts that work for you!

Good monitoring tools allow granular alerting, often used for “Escalation Alerting”.  This means you can set up alerts and thresholds based on the number of failures for a particular metric.  Here is a quick example:

Alert/Threshold 1:  Email to the “new hire”; he/she is young and is up at 2am anyway

Alert/Threshold 2:  Email/SMS to the lead engineer… the new hire is in over his/her head

Alert/Threshold 3:  Email/SMS/phone call to YOU; put on your cape and save the day!
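The escalation ladder above can be sketched in a few lines of Python. This is only an illustration, not any vendor's API: the failure counts (1, 3, 5), the `EscalationStep` type, and the `current_step` function are all assumptions made up for the example, so pick thresholds that fit your own systems.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EscalationStep:
    min_failures: int           # consecutive failures needed to reach this step
    recipient: str              # who gets notified (hypothetical role name)
    channels: Tuple[str, ...]   # how they get notified (hypothetical channel names)


# Hypothetical thresholds; the failure counts are assumptions for illustration.
# Steps must be listed in ascending order of min_failures.
ESCALATION_POLICY = (
    EscalationStep(1, "new hire", ("email",)),
    EscalationStep(3, "lead engineer", ("email", "sms")),
    EscalationStep(5, "you", ("email", "sms", "phone")),
)


def current_step(consecutive_failures: int,
                 policy: Tuple[EscalationStep, ...] = ESCALATION_POLICY
                 ) -> Optional[EscalationStep]:
    """Return the highest step whose threshold has been reached, or None."""
    reached = [step for step in policy
               if consecutive_failures >= step.min_failures]
    return reached[-1] if reached else None
```

With this policy, four consecutive failures reach the lead engineer by email and SMS, and the phone only rings once the count hits five.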

Tip 2:  Classify your systems based on importance.

Not all systems are as critical as others.  Identify the most important systems and be sure their alerting is a bit more sensitive than the others.  To do this I suggest making a list of all of your systems,  then placing them into logical categories.  Here are a few fun example categories:

Category 1:  Crap I’m about to lose my job/client!!!!

Category 2:  My Boss is going to kick my butt….

Category 3:  Here come the phone calls….

Category 4:  Forgot to power the server on last week… nobody noticed….
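One way to put these categories to work is to keep them in a small lookup table so that alert sensitivity follows from the category. The sketch below is only an assumption of how that might look: the system names, check intervals, and failure counts are invented for illustration, and treating unlisted systems as Category 2 is a design choice, not a rule.

```python
# Hypothetical settings per category: lower number = more critical,
# so it gets checked more often and alerts after fewer failures.
CATEGORY_SETTINGS = {
    1: {"check_interval_s": 60, "failures_before_alert": 1},
    2: {"check_interval_s": 120, "failures_before_alert": 2},
    3: {"check_interval_s": 300, "failures_before_alert": 3},
    4: {"check_interval_s": 900, "failures_before_alert": 5},
}

# Hypothetical inventory: every system name here is made up.
SYSTEM_CATEGORIES = {
    "checkout-api": 1,
    "company-website": 2,
    "internal-wiki": 3,
    "old-test-box": 4,
}


def alert_settings(system: str, default_category: int = 2) -> dict:
    """Look up alert settings for a system.

    Systems missing from the inventory fall back to default_category,
    so a forgotten server is still watched fairly closely.
    """
    category = SYSTEM_CATEGORIES.get(system, default_category)
    return {"category": category, **CATEGORY_SETTINGS[category]}
```

Keeping the list in one place also gives you something concrete to review whenever a new system goes live.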

Tip 3:  Never allow for a single point of failure

Not all monitoring tools are created equal.  On-premises solutions are a single point of failure (who monitors the monitoring solution?).  When a SaaS monitoring tool offers more than one location, you should be using more than one.  Agent-based monitoring can monitor each device, and the agents can also monitor their fellow agents.
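To see why more than one location matters, here is a minimal sketch of how results from several check locations might be combined. The location names and the `classify_outage` function are hypothetical, and real tools may combine results differently; the point is only that a single failing location tells you less than several.

```python
def classify_outage(results: dict) -> str:
    """Combine check results from several monitoring locations.

    results maps a location name to True (check passed) or False (failed).
    Returns "up", "down", or "partial: <failed locations>".
    """
    if not results:
        raise ValueError("need a result from at least one location")
    failed = sorted(loc for loc, ok in results.items() if not ok)
    if not failed:
        return "up"
    if len(failed) == len(results):
        # Every location agrees, so the system itself is likely down.
        return "down"
    # Only some locations fail: could be a network path or the location itself.
    return "partial: " + ", ".join(failed)
```

A "partial" result is a hint to look at the network or the monitoring location before waking anyone up about the server.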

Tip 4:  Know your Audience, and what gets their attention

In a world where there are more cellphones in the US than there are people, knowing which media will get your and your team’s attention is key to a successful monitoring solution.  Monitoring tools that provide a wide range of alerting methods help ensure that, when an alert comes, someone will be there to hear it.

Tip 5:  Test your monitoring tool and alert system

I suggest that you keep a few spare subdomains around for testing.  These can be handy when trying to verify your alerting and escalation rules.

Tip 6:  Never set up an email filter for your Alerts

Never set up email filters/rules for your alerts.  If you have done this, your system is not set up correctly; please go back to Tip 1.  Automatic rules that move alerts into folders result in an effect called “out of sight, out of mind”.  This is a quick way to ignore alerts and increase downtime.

Tip 7:  If everything is quiet, something is wrong

Very few systems have 100% uptime.  Downtime is sometimes unavoidable.  Keep an eye on your monitoring tool.  If you have not received an alert in a while, be sure that everything is still configured correctly.
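A simple way to act on this tip is to check how long it has been since the last alert arrived. The sketch below assumes you can get the timestamp of the most recent alert from somewhere (your inbox, a log, an export from the tool); the 14-day limit and the function name are made-up values for illustration, so tune them to how noisy your systems normally are.

```python
from datetime import datetime, timedelta


def is_suspiciously_quiet(last_alert_at: datetime,
                          now: datetime,
                          max_silence: timedelta = timedelta(days=14)) -> bool:
    """Return True when no alert has arrived for longer than max_silence.

    A True result does not mean something is broken; it is a prompt to
    re-check that monitors and notification settings are still in place.
    """
    return now - last_alert_at > max_silence
```

Pair a check like this with the test subdomains from Tip 5: if things have been quiet for too long, trigger a test alert and confirm it still reaches the right people.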

Tip 8:  Create a process for how Alerts are resolved

Creating a process for how alerts are handled allows for the quickest resolution and holds all parties accountable.  Emailing 5 people in a group without a clear owner only causes confusion and more work.

Tip 9:  Ask for help, that is why Vendors have Support!

You are not alone when setting up a monitoring tool.  Good vendors have support and technical staff who are there to assist you.  Take advantage of their knowledge of the product and their experience with other customers.  Be direct and simply ask them to review your setup before something fails and you find out it was configured incorrectly.

Tip 10:  Document Everything!!

It is always important to document exactly how you have set up your monitoring tool and why.  You will want to be sure that this documentation can be updated dynamically and that all team members know where to find it.