Inquisitive, student, teacher and a wanna be story teller. Working on https://learningpaths.io/
Monitor all your server infrastructure
While working on the infrastructure setup for our product http://bit.ly/use-highlights, I was looking for open source server monitoring tools that don’t cost a bomb and that we can customise as we go forward. The two major contenders were Nagios and Zabbix. I read a couple of posts comparing the two, but https://www.comparitech.com/net-admin/nagios-vs-zabbix was my favourite comparison, as I felt it was very objective and detailed. One major advantage I saw in Nagios was that you could upgrade to Nagios XI with the existing setup. But for our scenarios we felt that Zabbix was more than sufficient, so we went ahead with it.
This is not going to be a detailed HowTo blog. Instead, it is a summary of my learnings, links to the installation guides that worked for me, links to detailed TODO articles and my rules of thumb.
There are three major components to Zabbix: Zabbix Server, Zabbix Agent and Zabbix Web Interface. Zabbix Server is the one which collects all the relevant data from the servers you want to monitor. Each server you want to monitor runs a Zabbix Agent. You can do the monitoring without the Zabbix web interface, but I would suggest using it as it makes the experience better.
If you just want to get a feel for the UI before installing anything, head over to https://zabbix.org/zabbix/index.php and “sign in as guest”. You will not have the Configuration and Administration tabs, but you can check out the Monitoring, Inventory and Reports sections.
I will skip the details here because you can find the HOWTO articles on the internet. We are using Ubuntu servers and the following articles by https://twitter.com/tecadmin were very handy.
By default the Zabbix Web Interface uses Apache. If you want to use Nginx in its place, add the following configurations in your nginx.
Think of everything you want to monitor in terms of Host Groups, User Groups and Templates.
Zabbix Server : This is the main server that monitors everything. For installation follow this link. Generally the Web Interface is also installed on this same server. I personally feel (though I am not really sure) that it is better to keep it separate from the servers which need to be monitored. That way, even if the actual server farm is down, we will at least be able to see the downtimes and investigate why the servers went down.
Zabbix Agent : This is a piece of software that sends the stats from the server being monitored to the Zabbix server. If you want to monitor the Zabbix server as well, you can install the Zabbix agent on that server too.
Zabbix Host : A host is a server that you need to monitor. So if you need to monitor three different servers, you install the Zabbix agent on each of them.
Adding hosts in Zabbix Server : Once you have installed the Zabbix server and the agents on their respective servers, you also need to add the hosts in the Zabbix server.
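If you would rather script this step than click through the web interface, Zabbix also exposes a JSON-RPC API with a host.create method. The following is only a minimal sketch that builds the request body and sends nothing. The token, host name, IP address, group ID and template ID are all made-up placeholders, and it assumes the 3.4-era convention of passing the auth token in the request body.

```python
import json


def host_create_request(auth_token, host_name, agent_ip,
                        group_ids, template_ids, request_id=1):
    """Build the JSON-RPC body for the Zabbix API method host.create."""
    return {
        "jsonrpc": "2.0",
        "method": "host.create",
        "params": {
            "host": host_name,
            "interfaces": [{
                "type": 1,        # 1 = Zabbix agent interface
                "main": 1,        # default interface for this type
                "useip": 1,       # connect by IP rather than DNS name
                "ip": agent_ip,
                "dns": "",
                "port": "10050",  # default agent port
            }],
            # Host groups and templates are referenced by their IDs.
            "groups": [{"groupid": g} for g in group_ids],
            "templates": [{"templateid": t} for t in template_ids],
        },
        "auth": auth_token,
        "id": request_id,
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    body = host_create_request("hypothetical-token", "app-server-1",
                               "192.0.2.10", ["15"], ["10001"])
    print(json.dumps(body, indent=2))
```

The body would be POSTed to the api_jsonrpc.php endpoint of your web interface. You would look up the real group and template IDs first, for example in the UI.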
Host Groups : A host group can be thought of as a categorisation or a tag. I feel that being liberal with the use of Host Groups can be advantageous. Following are some of the Host Groups I created: #LearningPaths, #staging, #live, #database, #mongodb, #appserver, #search. These host groups will come in very handy when you are using templates or creating actions.
Templates : Think of these as monitoring templates that you can apply to various host groups. As a rule of thumb, I always apply templates to Host Groups. If I ever feel like removing a host or applying a template to only one host, that is an indicator for me that I should rework the Host Groups or add a new host group which meets the current criteria. This has saved me a lot of time. I am mentioning Templates before the other concepts as I feel that it is very important to think of everything from the perspective of templates and groups.
We are planning to use the following templates for monitoring Highlights Servers.
Template OS Linux — This comes by default with Zabbix installation. This provides most of the parameters that are generally available in other monitoring services like CPU usage, CPU load, Memory usage etc. This is already configured and data is more than sufficient for server level monitoring. We changed the trigger for average CPU load as it was raising too many false alarms.
We are currently configuring the following two. If you have had any luck installing these two, please leave a comment. I am able to get all the relevant data but I am not able to push it to the Zabbix server. I think it has something to do with the trapper settings. I am still figuring it out.
Zabbix MongoDB Template — https://github.com/omni-lchen/zabbix-mongodb
Zabbix Elastic Search Templates — https://github.com/zarplata/zabbix-agent-extension-elasticsearch
As I mentioned earlier, I have found it useful to always think in terms of Templates and Host Groups. So if you are planning to create any new items/triggers, make sure to add them to a relevant Template and then associate it with a host group.
First create one item for every data point that you want to track. You can follow the steps from https://www.zabbix.com/documentation/3.4/manual/config/items/item
Then you can create the triggers based on these item values. You can create a trigger by following the steps from https://www.zabbix.com/documentation/3.4/manual/config/triggers/trigger
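Conceptually, a trigger is just a condition evaluated over recent values of an item. The sketch below is not Zabbix code and not the exact trigger we configured. It is a hypothetical illustration of why a condition on the average of recent samples raises fewer false alarms than one on a single reading. The threshold and sample values are invented.

```python
from statistics import mean


def load_trigger_fires(samples, threshold=5.0):
    """Fire only when the average of the recent samples exceeds the threshold.

    `samples` stands in for the recent history of one item, e.g. CPU load.
    An empty history never fires.
    """
    return bool(samples) and mean(samples) > threshold


if __name__ == "__main__":
    short_spike = [1.0, 1.0, 9.0, 1.0, 1.0]   # one bad reading
    sustained = [6.0, 7.0, 8.0]               # load stays high
    print(load_trigger_fires(short_spike))
    print(load_trigger_fires(sustained))
```

In Zabbix itself you would express the same idea in the trigger expression, using a function such as avg() over a time window on the item.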
While the web interface is good I felt that for debugging it is better to use the console.
On monitoring server
Check that you can connect to the agent on port 10050.
telnet ip-of-your-agent 10050
apt install zabbix-get
zabbix_get -s ip-of-your-agent -k agent.ping
zabbix_get -s ip-of-your-agent -k agent.version
zabbix_get -s ip-of-your-agent -k agent.hostname
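If telnet is not available on the box, the same reachability check can be scripted. This is a minimal sketch using only the Python standard library. The function name can_connect is my own and is not part of any Zabbix tooling.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens the socket;
        # the with-block closes it again as soon as the check is done.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, can_connect("ip-of-your-agent", 10050) checks the agent port from the monitoring server. The same function can be pointed at port 10051 on the Zabbix server when checking from the agent side.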
On the servers/clients that need to be monitored
In basic configurations (Passive Agent) your monitoring server will ask for the data. So if your server can communicate with the agent that should be sufficient.
But if you are using Active Agent mode, then you need to make sure that your agent can connect to your monitoring server and push data to it.
Check that you can connect to the server on port 10051, the port the Zabbix server listens on for incoming data (10050 is the agent's own port).
telnet ip-of-your-server 10051
Check that zabbix-sender is installed; if not, install it.
sudo apt-get install zabbix-sender
Once Zabbix sender is installed you can run a command like
zabbix_sender -vv -z [serverIp] -p 10051 -s [clientName] -k traptest -o "Test value"
Please note that all the data pushed to the server this way must go to an item of type Zabbix trapper. So in the above example you should have created an item of type trapper on the server with the key traptest. Also make sure that the data you are sending is of the type specified while creating the item on the server.
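To make it clearer what zabbix_sender puts on the wire, here is a small sketch of the packet format as I understand it from the Zabbix protocol documentation: a "ZBXD" header with a flags byte, a little-endian length, and a JSON body. It only builds and parses the bytes and sends nothing. The function names are my own, and clientName and traptest are the placeholders from the command above.

```python
import json
import struct

HEADER = b"ZBXD\x01"  # protocol signature plus flags byte


def build_sender_packet(host, key, value):
    """Build the bytes for a single-value 'sender data' request."""
    payload = json.dumps({
        "request": "sender data",
        "data": [{"host": host, "key": key, "value": str(value)}],
    }).encode("utf-8")
    # 4-byte little-endian payload length followed by 4 reserved bytes.
    return HEADER + struct.pack("<II", len(payload), 0) + payload


def parse_packet(packet):
    """Validate the header and return the decoded JSON body."""
    if packet[:5] != HEADER:
        raise ValueError("not a Zabbix packet")
    length, _reserved = struct.unpack("<II", packet[5:13])
    return json.loads(packet[13:13 + length].decode("utf-8"))


if __name__ == "__main__":
    packet = build_sender_packet("clientName", "traptest", "Test value")
    print(parse_packet(packet))
```

The host value has to match the host name configured on the server, and the key has to match the trapper item, which is why a mismatch in either usually shows up as a failed value in the zabbix_sender output.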
The dashboard is customisable, so you can change it to have all the relevant problems listed there. The other feature I personally liked was screens. We added all the heartbeat graphs of our servers to a screen so that we could track critical data like this in one place.
I think notifications are great in Zabbix as they are highly configurable. We used SendGrid with Zabbix. The default notifications in the Zabbix UI didn’t work well for us, so we went the script route. We used the library https://github.com/mkgin/sendgrid_zabbix_alert for this. Debugging Zabbix notifications can be a little irritating. I have just created a draft here and will update it when I get time.
What has your experience been? Are you using Zabbix or any other server monitoring tools? Do share your notes in comments.