Monitor all your server infrastructure

Background

While working on the infrastructure setup for our product, I was looking for open source tools for server infrastructure monitoring that don't cost a bomb and that we can customise as we go forward. The two major contenders were Nagios and Zabbix. I read a couple of posts comparing the two, but https://www.comparitech.com/net-admin/nagios-vs-zabbix was my favourite comparison, as I felt that it was very objective and detailed. One major advantage I saw with Nagios was that you could upgrade to Nagios XI with the existing setup. But for our scenarios we felt that Zabbix was more than sufficient, so we went ahead with it.

This is not going to be a detailed HowTo blog. Instead, it is a summary of my learnings, links to the installation guides that worked for me, links to detailed TODO articles and my thumb rules.

Some basics before installation

There are three major components to Zabbix: Zabbix Server, Zabbix Agent and Zabbix Web Interface. The Zabbix server collects all the relevant data from the servers you want to monitor. The servers you want to monitor run the Zabbix agent. You can run the monitoring server without the Zabbix web interface, but I would suggest using it as it makes the experience better.

If you just want to get a feel of the UI before installing anything, head over to https://zabbix.org/zabbix/index.php and “sign in as guest”. You will not have the Configuration and Administration tabs, but you can check out the Monitoring, Inventory and Reports sections.

Installation

I will skip the details here because you can find HOWTO articles on the internet. We are using Ubuntu servers and the following articles by https://twitter.com/tecadmin were very handy.

Installing the Zabbix Server - https://tecadmin.net/install-zabbix-on-ubuntu/
Installing Zabbix Agents - https://tecadmin.net/install-zabbix-agent-on-ubuntu-and-debian
Adding Host in Zabbix Server to Monitor - https://tecadmin.net/add-host-zabbix-server-monitor

By default the Zabbix web interface uses Apache. If you want to use Nginx in its place, add the following configuration to your Nginx setup.

<a href="https://medium.com/media/a0aef49548c4c6f0923538919e0124fc/href">https://medium.com/media/a0aef49548c4c6f0923538919e0124fc/href</a>

Thumb Rule

Think of everything you want to do for monitoring in terms of Host Groups, User Groups and Templates.

Quick Glossary

Zabbix Server: This is the main server that monitors everything. For installation, follow the Zabbix Server guide linked above. Generally the web interface is also installed on this server. I personally feel (though I am not really sure) that it is better to keep it separate from the servers that need to be monitored. That way, even if the actual server farm is down, we will still be able to at least monitor the downtimes and see why the servers might have gone down.

Zabbix Agent: This is a piece of software that sends the stats from the server being monitored to the Zabbix server. If you want to monitor the Zabbix server as well, you can install the Zabbix agent on that server too.

Zabbix Host: A host is a server that you need to monitor. So if you need to monitor three different servers, you install the Zabbix agent on all three.

Adding hosts in Zabbix Server: Once you have installed the Zabbix server and agent on the respective servers, you also need to add the hosts in the Zabbix server.

Host Groups: A host group can be thought of as categorisation or tagging. I feel that being liberal with the use of host groups can be advantageous. Following are some of the host groups I created: #LearningPaths, #staging, #live, #database, #mongodb, #appserver, #search. These host groups come in very handy when you are using templates or creating actions.

Templates: Think of these as monitoring templates that you can apply to various host groups. As a rule of thumb I always apply templates to host groups. If I ever feel like removing a host or applying a template to only one host, that is an indicator for me that I should rework the host groups or add a new host group that meets the current criteria. This has saved me a lot of time. I am mentioning templates before other criteria because I feel it is very important to think about everything from the perspective of templates and groups.

Highlights: We are planning to use the following templates for monitoring servers.

Template OS Linux - This comes by default with the Zabbix installation. It provides most of the parameters that are generally available in other monitoring services, like CPU usage, CPU load, memory usage etc. It is already configured and the data is more than sufficient for server-level monitoring. We changed the trigger for average CPU load as it was raising too many false alarms.

We are currently configuring the following two. If you have had any luck with installing these two, please leave a comment. I am able to get all the relevant data but I am not able to push it to the Zabbix server. I think it has something to do with the trap settings. I am still figuring it out.

Zabbix MongoDB Template - https://github.com/omni-lchen/zabbix-mongodb
Zabbix Elastic Search Templates - https://github.com/zarplata/zabbix-agent-extension-elasticsearch

Customizations

As I mentioned earlier, I have found it useful to always think in terms of templates and host groups. So if you are planning to create any new items/triggers, make sure to add them to a relevant template and then associate it with a host group.

First create one item for every data point that you want to track. You can follow the steps from https://www.zabbix.com/documentation/3.4/manual/config/items/item and then create the triggers based on these item values.
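To make the item/trigger relationship a bit more concrete, here is a rough sketch of what a trigger expression looks like in the Zabbix 3.4 syntax. The template name and item key are the ones I believe the stock Template OS Linux uses for per-core CPU load; the 5 minute window and the threshold of 5 are illustrative assumptions, not the values we ended up with.

```
{Template OS Linux:system.cpu.load[percpu,avg1].avg(5m)}>5
```

The part before the colon is the host or template that owns the item. It is followed by the item key, the function applied to the collected values, and the comparison that turns the result into a problem. Widening the window or raising the threshold is the kind of change that can cut down on false alarms.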
You can create a trigger by following the steps from https://www.zabbix.com/documentation/3.4/manual/config/triggers/trigger

Debugging

While the web interface is good, I felt that for debugging it is better to use the console.

On the monitoring server

Check that you can connect to the agent on port 10050.

    telnet ip-of-your-agent 10050

Using zabbix-get:

    apt install zabbix-get
    zabbix_get -s ip-of-your-agent -k agent.ping
    zabbix_get -s ip-of-your-agent -k agent.version
    zabbix_get -s ip-of-your-agent -k agent.hostname

On the servers/clients that need to be monitored

In the basic configuration (passive agent), your monitoring server asks for the data. So if your server can communicate with the agent, that should be sufficient. But if you are using active agent mode, then you need to make sure that your agent can connect to your monitoring server and push data.

Check that you can connect to the server on port 10051, the port the Zabbix server listens on by default.

    telnet ip-of-your-server 10051

Check that zabbix-sender is installed; if not, install it.

    sudo apt-get install zabbix-sender

Once zabbix-sender is installed, you can run a command like

    zabbix_sender -vv -z [serverIp] -p 10051 -s [clientName] -k traptest -o "Test value"

Please note that data pushed to the server with zabbix_sender has to land in an item of type trapper. So in the above example you should have created an item of type trapper on the server with the key traptest. Also make sure that the data you are sending matches the type specified while creating the item on the server.

Reports

The dashboard is customisable, so you can change it to have all the relevant problems listed there. The other feature I personally liked was screens. We added the heartbeat graphs of all our servers to a screen, so we could track critical data like this in one place.

Notifications

I think notifications are great in Zabbix as they are highly configurable. We used SendGrid with Zabbix. The default notifications in the Zabbix UI didn't work well for us, so we used the script route.
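For context, the script route boils down to Zabbix calling an executable from its alert scripts directory with whatever parameters the media type defines. Below is a minimal debugging sketch, not the script we used. It assumes the media type passes {ALERT.SENDTO}, {ALERT.SUBJECT} and {ALERT.MESSAGE} as three arguments; the function name log_alert, the ZBX_ALERT_LOG variable and the log path are made up for illustration. It only appends what it receives to a file, which helps confirm that Zabbix is invoking the script at all before wiring in SendGrid.

```shell
#!/usr/bin/env bash
# Hypothetical debug alert script: records the arguments Zabbix passes to it.
# Assumption: the media type is configured with three script parameters,
# {ALERT.SENDTO}, {ALERT.SUBJECT} and {ALERT.MESSAGE}, in that order.

# Hypothetical log location; override with ZBX_ALERT_LOG if needed.
LOG_FILE="${ZBX_ALERT_LOG:-/tmp/zabbix_alert_debug.log}"

# Append one timestamped line per alert to the log file.
log_alert() {
    local to="$1"
    local subject="$2"
    local message="$3"
    printf '%s to=%s subject=%s message=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$to" "$subject" "$message" >> "$LOG_FILE"
}

# Only log when called with exactly the three expected arguments.
if [ "$#" -eq 3 ]; then
    log_alert "$@"
fi
```

Saved under a name of your choice and run by hand with a recipient, a subject and a message, it should add one line to the log file. If the same line shows up when a trigger fires, the media type and action are wired correctly and any remaining problem is on the sending side.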
We used the library https://github.com/mkgin/sendgrid_zabbix_alert

The debugging of Zabbix notifications can be a little irritating. I have just created a draft here and will update it when I get time.

What has your experience been? Are you using Zabbix or any other server monitoring tools? Do share your notes in the comments.