How I Implemented a Monitoring System for My Workplace's Servers

It was a peaceful noon. No phone calls, white noise coming from the air conditioner, eyes struggling to stay open thanks to a big lunch.

Then all of a sudden the server went down. The phone started ringing nonstop, anxiety kicked in, and fingers slammed the keyboard frantically to check what had happened.

Then, after some chaos, the problem was solved. The manager came in and asked, 'What happened? Why?'

No one could explain clearly. Why? Of course, the lack of data!


 

Killing the comfort zone

Every day can be pretty peaceful when working as a sysadmin, as long as the servers and networking devices are not making a ruckus. Even when they caused chaos, the peaceful day came back once my team had cleared the problem. This kept my company in the comfort zone of 'dealing with problems when they come and going back to sleep once they are done'.


Well, we couldn't do a lot about that because we were all busy with simple things anyway, like resetting the printer or an access point.

Okay, I won't continue ranting about my workplace's job description, but the best doctors say that prevention is always better than cure. And I am really eager to take on harder jobs rather than just being a helpdesk!

To prevent chaos, we need DATA. Why wait for a server to hang because the root filesystem is 100% full of logs when you can deploy software that notifies you once it reaches 95%? This is what we needed: monitoring software.
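To make the 95% idea concrete, here is a minimal sketch in Python of the kind of check a monitoring tool runs for you. It is not how Pandora FMS or Zabbix implement it. The function names and the THRESHOLD_PERCENT constant are my own, made up for illustration, and a real tool would send a notification instead of printing.

```python
import shutil

# Assumed alert threshold, taken from the 95% example in the text.
THRESHOLD_PERCENT = 95.0


def usage_percent(total: int, used: int) -> float:
    """Return used space as a percentage of total space."""
    if total <= 0:
        raise ValueError("total must be positive")
    return used * 100 / total


def should_alert(percent: float, threshold: float = THRESHOLD_PERCENT) -> bool:
    """True once usage has reached the threshold."""
    return percent >= threshold


def check_path(path: str = "/") -> str:
    """Build a one-line status message for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    percent = usage_percent(usage.total, usage.used)
    status = "ALERT" if should_alert(percent) else "ok"
    return f"{status}: {path} is {percent:.1f}% full"


if __name__ == "__main__":
    # A real setup would run this on a schedule and notify someone on ALERT.
    print(check_path("/"))
```

Running something like this from cron would already beat finding out from the phone calls, but it gives no history and no graphs, which is why a proper monitoring system is worth the effort.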

I googled a lot and finally implemented my first one: Pandora FMS.


The first attempt

Installed everything just fine, linked it to some servers, and done. Fancy graphs came out, and the manager seemed pleased.

But not me. I guess the software just didn't suit my needs. It felt clunky, ate a lot of resources (maybe due to my lack of optimization), had little community support, and so on.

Disclaimer: this was at an early stage. I don't know how Pandora FMS performs now, because this was a long time ago, maybe when it was still in beta. I browsed the website recently and it has changed dramatically.

So I was not amused, and googled some more. Then another one came up: Zabbix.


The second attempt

To be honest, Zabbix also confused me. I think it had a steep learning curve, and I was still a newbie back then, so it made sense. At first it also kind of overwhelmed me because of the many items it discovered, and I didn't understand what parts or things they monitored.

But after some more research, I understood most of it. Thinking that hopping to yet another monitoring tool would waste more time, I decided to stick with Zabbix and optimize it further.

Years passed, and I am pretty impressed. I am still using Zabbix, optimizing it more and bringing out its strengths, such as TimescaleDB and notifications via Telegram.


The benefits

It is obvious that we changed big time, from a fixing attitude to a preventing attitude. I think this is the important part of being a sysadmin: analyzing historical data from monitoring systems. It helped a lot, even if at first it may sound kind of lame.

Our servers went from several outages each month to maybe just one, or none at all. The cause of downtime can be quite simple, and the monitoring system picked those causes up nicely, telling us sysadmins to do our job.

The manager can also be pretty well tamed by the fancy graphs coming from the monitoring software, so it's quite the win-win. Nowadays I just can't imagine running servers without monitoring. It's just that good.


 

Next steps

Preventing disaster is cool, but I decided to go further. This year, I want to implement another monitoring tool to be used alongside Zabbix.

It looks like it will be Netdata. I need real-time server data for troubleshooting, which is something I am missing in Zabbix (yes, Zabbix is more for historical and long-run purposes).

I also want to build a performance and utilization dashboard that tells me how much RAM sits unused across all my servers, and it could probably serve other purposes too. If it could generate a monthly report, that would be cool as well.
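As a rough idea of what that report could compute, here is a small Python sketch. It assumes "wasted RAM" means installed memory minus the peak usage seen during the reporting period, which is only one possible definition. The MemorySample class, the host names, and the numbers are all made up for illustration, and in practice the figures would come from the monitoring system rather than being typed in.

```python
from dataclasses import dataclass


@dataclass
class MemorySample:
    host: str           # hypothetical server name
    total_mib: int      # installed RAM in MiB
    peak_used_mib: int  # highest usage seen in the reporting period, in MiB


def unused_report(samples):
    """Return (host, unused MiB, unused percent) rows, largest waste first."""
    rows = []
    for s in samples:
        unused = s.total_mib - s.peak_used_mib
        percent = round(unused * 100 / s.total_mib, 1)
        rows.append((s.host, unused, percent))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the output.
    samples = [
        MemorySample("db01", 32768, 30720),
        MemorySample("web01", 16384, 6144),
    ]
    for host, unused, percent in unused_report(samples):
        print(f"{host}: {unused} MiB never used ({percent}%)")
```

Sorting by the amount of unused memory puts the most over-provisioned servers at the top, which is the part a monthly report would most likely care about.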