Editor’s Note: This post was updated in August 2016 for accuracy and comprehensiveness.
Monitoring tools are only as good as YOU make them! Not really what you wanted to hear? Well, at least the truth will set you free! This article will cover 10 tips for monitoring best practices. Guess what… they are all about you – not the monitoring tools or features!
All monitoring solutions simply do what YOU tell them to. Usually, monitoring tools come in two varieties:
Sensitive type – all alerts are automatically switched on.
Quiet type – all alerts are disabled and you need to enable them manually.
To create a manageable alerting system, simply follow these rules!
And don’t forget to check out Monitis’ new alerting integrations, designed to boost your operational efficiency!
1: Plan and Configure Alerts That Work for You!
Good monitoring tools allow for granular alerting, which is often used for escalation alerting. This means you can set up alerts and thresholds based on the number of failures for a particular metric. Here is a quick example:
Alert Threshold 1: Email to the “new hire” (he/she is young and is up at 2 A.M. anyway)
Alert Threshold 2: Email/SMS to the lead engineer… that new hire is in over his head!
Alert Threshold 3: Email/SMS/phone call to YOU. Put on your cape and save the day!
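To make the escalation idea concrete, here is a minimal Python sketch. It assumes each check simply reports pass or fail, and that the three tiers fire at 1, 3, and 5 consecutive failures; those counts, the recipient labels, and the `EscalationTracker` name are illustrative assumptions, not settings from any particular monitoring product.

```python
from dataclasses import dataclass

# Hypothetical escalation tiers: (consecutive failures, recipient, channels).
# The failure counts are illustrative assumptions, not vendor defaults.
ESCALATION_TIERS = [
    (1, "new hire", ("email",)),
    (3, "lead engineer", ("email", "sms")),
    (5, "you", ("email", "sms", "phone")),
]


@dataclass
class EscalationTracker:
    consecutive_failures: int = 0

    def record(self, check_passed: bool) -> list[tuple[str, tuple[str, ...]]]:
        """Record one check result and return the notifications it triggers."""
        if check_passed:
            self.consecutive_failures = 0  # a passing check resets escalation
            return []
        self.consecutive_failures += 1
        # Notify only when the count first reaches a tier's threshold, so each
        # tier is contacted once per outage rather than on every failed check.
        return [
            (recipient, channels)
            for threshold, recipient, channels in ESCALATION_TIERS
            if self.consecutive_failures == threshold
        ]


if __name__ == "__main__":
    tracker = EscalationTracker()
    for passed in [False, False, False, False, False, True]:
        for recipient, channels in tracker.record(passed):
            print(f"notify {recipient} via {'/'.join(channels)}")
```

Firing each tier exactly once keeps the new hire from receiving a fresh email every minute while the lead engineer and then you are brought in.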
2: Set Priorities! Classify Your Systems Based on Importance
Not all systems are as critical as others. Identify the most important systems and be sure their alerting is a bit more sensitive than the others. To do this, I suggest making a list of all of your systems, then placing them into logical categories. Here are some example categories that you should definitely adopt:
Category 1: I’m about to lose my job/client!!
Category 2: My boss is going to kick my butt….
Category 3: Here come the phone calls…
Category 4: Forgot to power the server on last week… nobody noticed….
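One way to turn these categories into "more sensitive" alerting is to give each one a check interval and a failure tolerance. The Python sketch below assumes hypothetical values and a hypothetical system inventory (`checkout-api`, `internal-wiki`, `old-build-box`); the point is only that higher-priority systems are checked more often and alert after fewer failures.

```python
from enum import IntEnum


class Category(IntEnum):
    """The four importance categories above (1 = most critical)."""
    LOSE_JOB_OR_CLIENT = 1
    BOSS_UNHAPPY = 2
    PHONE_CALLS = 3
    NOBODY_NOTICED = 4


# Hypothetical sensitivity per category: check interval in seconds and
# how many consecutive failures to allow before the first alert.
SENSITIVITY = {
    Category.LOSE_JOB_OR_CLIENT: {"interval_s": 60, "failures_before_alert": 1},
    Category.BOSS_UNHAPPY: {"interval_s": 300, "failures_before_alert": 2},
    Category.PHONE_CALLS: {"interval_s": 900, "failures_before_alert": 3},
    Category.NOBODY_NOTICED: {"interval_s": 3600, "failures_before_alert": 5},
}

# Hypothetical inventory: every system is assigned exactly one category.
SYSTEMS = {
    "checkout-api": Category.LOSE_JOB_OR_CLIENT,
    "internal-wiki": Category.PHONE_CALLS,
    "old-build-box": Category.NOBODY_NOTICED,
}


def settings_for(system: str) -> dict:
    """Look up a system's alerting sensitivity through its category."""
    return SENSITIVITY[SYSTEMS[system]]


def worst_case_detection_s(system: str) -> int:
    """Rough upper bound on seconds from outage start to first alert."""
    s = settings_for(system)
    return s["interval_s"] * s["failures_before_alert"]


if __name__ == "__main__":
    for name in SYSTEMS:
        print(name, worst_case_detection_s(name))
```

Computing the worst-case detection time per system is a quick sanity check that your Category 1 systems really do alert in minutes, not hours.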
3: Never Allow a Single Point of Failure
On-premises solutions are a single point of failure (who monitors the monitoring solution?). If your SaaS monitoring tool offers checks from more than one location, use more than one location. Agent-based monitoring can monitor each device, and the agents can also monitor one another.
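When a check runs from several locations, the results have to be combined somehow. The Python sketch below assumes a simple majority rule and hypothetical location names; it also reports locations that returned nothing at all, because a silent probe is exactly the "who monitors the monitoring solution?" case and deserves its own alert.

```python
from typing import Optional


def site_status(results: dict[str, Optional[bool]]) -> tuple[str, list[str]]:
    """Combine one check run from several probe locations.

    results maps a location name to True (check passed), False (check
    failed), or None (the location itself did not report).
    Returns (status, silent_locations).
    """
    silent = sorted(loc for loc, ok in results.items() if ok is None)
    reporting = [ok for ok in results.values() if ok is not None]
    if not reporting:
        # Nobody reported: the monitoring, not the site, is what is broken.
        return "UNKNOWN", silent
    failures = reporting.count(False)
    # Declare the site down only if more than half of the reporting
    # locations saw a failure, so one location's local network trouble
    # does not raise a false alarm on its own.
    status = "DOWN" if failures * 2 > len(reporting) else "UP"
    return status, silent


if __name__ == "__main__":
    print(site_status({"us-east": False, "eu-west": True, "ap-south": None}))
```

The majority rule is one design choice among several; some teams alert on any two failing locations instead. Either way, the silent-location list should feed its own alert.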
4: Know Your Audience
In a world with more cellphones than people, knowing which media will get the attention of you and your team is key to a successful monitoring solution.
Monitoring tools that provide a wide range of alerting methods help ensure that, when an alert comes, someone will be there to hear it.
5: Test Your Monitoring Tool and Alert System
I suggest that you keep a few spare subdomains around for testing. They are handy for verifying your alerting and escalation rules: point a monitor at one, take it down on purpose, and confirm that each escalation step actually fires.
6: Never Set Up an Email Filter for Your Alerts
Never set up email filters or rules for your alerts. If you have done this, then your system is not set up correctly. Please go back to Tip 1. Automatic rules that move alerts into folders result in an “out of sight, out of mind” effect. This is a quick way to ignore alerts and increase downtime.
7: If Everything is Quiet, Something is Wrong…
Very few systems have 100% uptime; downtime is sometimes unavoidable. Keep an eye on your monitoring tool. If you have not received an alert in a while, make sure that everything is still configured correctly.
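"Too quiet" can be checked automatically instead of relying on someone noticing. The Python sketch below assumes two hypothetical inputs: the time of the last real alert, and the time of the last scheduled test alert (for example, one triggered daily against a spare test subdomain). The limits of 14 days and 1 day are placeholders to tune for your environment.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical limits on acceptable silence; tune these to your systems.
MAX_QUIET = timedelta(days=14)    # no real alerts at all
MAX_TEST_GAP = timedelta(days=1)  # no scheduled test alert delivered


def quiet_warnings(last_alert: Optional[datetime],
                   last_test_alert: Optional[datetime],
                   now: datetime) -> list[str]:
    """Return warnings when the alert stream has been suspiciously silent."""
    warnings = []
    # A missing test alert means the delivery path itself may be broken.
    if last_test_alert is None or now - last_test_alert > MAX_TEST_GAP:
        warnings.append("scheduled test alert not received: check the alert pipeline")
    # A long gap in real alerts may mean monitors were disabled or misconfigured.
    if last_alert is None or now - last_alert > MAX_QUIET:
        warnings.append("no alerts for a long time: re-check monitor configuration")
    return warnings


if __name__ == "__main__":
    now = datetime(2016, 8, 31, 12, 0)
    print(quiet_warnings(now - timedelta(days=30), now - timedelta(hours=2), now))
```

The test-alert check catches a broken delivery path quickly, while the longer real-alert check catches monitors that were quietly switched off.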
8: Create a Process for How Alerts are Resolved
Creating a process for handling alerts allows for the quickest resolution and holds all parties accountable. Emailing five people as a group only causes confusion and more work.
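A resolution process usually boils down to one named owner per alert and a clear sequence of steps. The Python sketch below assumes a hypothetical weekly on-call rotation (the names are placeholders) and a simple open, acknowledged, resolved lifecycle with an explicit hand-off; it illustrates the idea and is not any vendor's workflow.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical weekly on-call rotation; names are placeholders.
ROTATION = ["alice", "bob", "carol"]


def on_call(day: date) -> str:
    """Return the single person who owns new alerts during that ISO week."""
    week = day.isocalendar()[1]
    return ROTATION[week % len(ROTATION)]


@dataclass
class Alert:
    summary: str
    owner: str
    state: str = "open"
    log: list[tuple[str, str]] = field(default_factory=list)

    def acknowledge(self, who: str) -> None:
        # Only the assigned owner acknowledges, so responsibility is never ambiguous.
        if who != self.owner:
            raise PermissionError(f"{who} is not the owner ({self.owner})")
        self.state = "acknowledged"
        self.log.append(("acknowledged", who))

    def hand_off(self, new_owner: str) -> None:
        # Ownership moves explicitly instead of being shared by a group.
        self.log.append(("handed off", f"{self.owner} -> {new_owner}"))
        self.owner = new_owner
        self.state = "open"

    def resolve(self, who: str, note: str) -> None:
        if self.state != "acknowledged" or who != self.owner:
            raise ValueError("the owner must acknowledge before resolving")
        if not note.strip():
            raise ValueError("record what fixed the problem")
        self.state = "resolved"
        self.log.append(("resolved", f"{who}: {note}"))
```

The log doubles as raw material for Tip 10: every alert ends with a record of who handled it and what fixed it.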
9: Ask for Help
You are not alone when setting up a monitoring tool. Good vendors have support and technical staff who are there to assist you. Take advantage of their knowledge of the product and their experience with other customers. Be direct: ask them to review your setup before something fails and you discover it was configured incorrectly.
10: Document Everything!!
It is crucial to document exactly how you have set up your monitoring tool and why. Make sure that this documentation can be edited in real time and is easily accessible to all team members.