Generic server monitoring with Monitis & M3
Had I been told to monitor a cat chasing a mouse with Monitis, my answer would have been – “Yes, it’s probably possible”.
With the not-so-recent addition of M3 to the arsenal of monitoring tools Monitis can utilize, it is possible to monitor anything. However this alone is far from being enough. Smart implementations of proper applicative monitoring is what should be practiced.
This whole article is a port of Konstantin Kosenkov’s hard work into M3. Most of the credit should go to him. Konstantin managed to design and implement a few very interesting counters in a PostgreSQL (PgSQL in short) installation.
PostgreSQL in a nutshell
PgSQL is widely popular and we at Monitis decided it’s time to build a comprehensive set of monitors in order to provide proper server monitoring and increased uptime if you are using PgSQL.
However if you are not a PgSQL user, I’d still suggest to continue reading this article as the monitors and counters suggested here are invaluable for any database installation and not necessarily PgSQL.
In the next few paragraphs I’ll outline the counters we’ve chosen to represent, one by one.
Operating system counters
Please have this M3 configuration file open while continuing to read as it makes things a lot clearer.
First two monitors implemented are not PgSQL counters. However provide proper a nice overview of what’s going on in the system.
The first three counters are showing the CPU utilization while the fourth shows the load average.
Next two counters in the next monitor show read and write requests per second to the PgSQL device. Combine both of them and we get what we call – IOPS – Input/Output Operations Per Second.
Just looking at these two monitors in Monitis should give you a good idea whether the system is or not overloaded.
TPS – Transactions per second
TPS can give you a good idea as to how busy your system is and how fast your business and application is growing, together with the “busy hours” of your business.
Another idea is to monitor your TPS and think of ways to offload these transactions from your DB.
TPS is measured with a very simple query. M3 will query the total number of transactions, then query it again and calculate the difference over that time.
Cache Hit Ratio & Commit Ratio
Every DB implementation strives to load as much data as it can into memory (RAM). But not always it can load all of the data into memory.
Whenever a query is made, if the query was served from data which resided in the DB memory – it’ll be called a cache hit. If the DB will have to fetch the data from a disk, it’ll be called a cache miss.
The ‘Hit Ratio’ counter will show you how well your DB is doing in caching data. A higher value means you are doing well – meaning the DB barely has to access the disk – meaning queries are returned much quicker.
Commit ratio on the other hand shows you how many commits (writes) are actually successful in the PgSQL instance. Anything lower than 80%-90% might point at a serious problem.
Database I/O – Cache hits vs. Blocks read
We feature two counters under this monitor:
- Blocks read by PgSQL
- Cache hits by PgSQL
These two counters were coupled in another monitor to give you a better view about the ‘Cache Hit Ratio’ counter we introduced before.
Dividing cache hits by blocks read should give you the cache hit ratio. The close the cache hit number is to the number of blocks read – the better your PgSQL instance is performing – meaning it fetches less from the disk and more from memory.
User connections vs. Max user connections
This monitor is very self-explanatory. We’ll display the number of currently connected users vs. the maximum users number the PgSQL is able to handle.
Memory & disk footprint
Both these monitors are very simple yet show you how much your PgSQL instance occupies from your whole system.
Memory counters will show a total on the server, used and used by PgSQL.
Same goes for disk – total, used and used by PgSQL.
Configuring alerts in Monitis is easy. Configuring these alerts will ensure a better uptime for your server. Be the first to know if you run out of disk or if your PgSQL instance is in need for more RAM.
How does it work?
Check out M3 from here.
Edit M3Templates.pm – filling in the correct DB you want to monitor, the device it resides on and your API and secret key. Then simply invoke:
# ./AddMonitors.pl pgsql_statistics.xml # ./TimerRun.pl pgsql_statistics.xml
It’s running – login to Monitis, sit back and enjoy the statistics gathering!