Monitoring your system is required. It helps you detect issues before they cause major downtime that affects your customers and damages your business reputation. It also helps you plan growth based on the real usage of your system. But collecting metrics from different data sources isn't enough: you need to personalize your monitoring to meet your own business needs and define the right alerts so that any abnormal change in the system is reported.

In this post, I will show you how to set up a resilient continuous monitoring platform with only open source projects, and how to define an event alert to report changes in the system.

Clone the following GitHub repository:

    git clone https://github.com/mlabouardy/terraform-aws-labs.git

1 - Terraform & AWS

In the tick-stack/terraform directory, update the variables.tfvars file with your own AWS credentials (make sure you have the right IAM policies):

    region = "AWS REGION"
    access_key = "YOUR AWS ACCESS KEY ID"
    secret_key = "YOUR AWS SECRET KEY"
    key_name = "YOUR SSH KEY PAIR"

Issue the following command to download the AWS provider plugin:

    terraform init

Issue the following command to provision the infrastructure:

    terraform apply -var-file=variables.tfvars

2 - Ansible & Docker

Update the inventory file with your instance DNS name:

    [servers]
    ec2-52-206-156-244.compute-1.amazonaws.com

Then, install the Ansible custom role:

    ansible-galaxy install mlabouardy.tick

Execute the Ansible Playbook:

    ansible-playbook --private-key=aws.pem -i inventory playbook.yml

Point your browser to http://DNS_NAME:8083 and you should see the InfluxDB Admin Dashboard.

Now, create an InfluxDB Data Source in Chronograf (http://DNS_NAME:8888).

Then create a new dashboard. You can create multiple graphs to visualize different types of metrics.

Note: For in-depth details on how to create interactive and dynamic dashboards in Chronograf, check my previous tutorial.

To do something with the collected data, such as alerting, you need to process it.
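If you rebuild the instance often, the inventory update and the two URLs from the steps above are easy to script. Here is a minimal sketch. The function names write_inventory and tick_urls are hypothetical helpers, not part of the repository, and the ports are the ones used in this post (8083 and 8888):

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not shipped with the
# terraform-aws-labs repository or the mlabouardy.tick role.

# Write an Ansible inventory with a single host in the [servers] group.
# $1 = instance DNS name, $2 = output file (defaults to ./inventory).
write_inventory() {
  host="$1"
  file="${2:-inventory}"
  printf '[servers]\n%s\n' "$host" > "$file"
}

# Print the two URLs this post points the browser to.
# $1 = instance DNS name.
tick_urls() {
  host="$1"
  printf 'InfluxDB admin: http://%s:8083\n' "$host"
  printf 'Chronograf:     http://%s:8888\n' "$host"
}
```

After sourcing the file, something like `write_inventory "$DNS_NAME" && tick_urls "$DNS_NAME"` would regenerate the inventory and print the links, where DNS_NAME is a variable you set yourself to the instance's public DNS name.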
Make sure Kapacitor is enabled, since it is the component that processes the data and triggers alerts. Define a new alert to send a Slack notification if the CPU utilization is higher than 70%.

To test it out, we need to generate some workload. For this case, I used stress:

    apt-get install stress

Stressing the CPU:

    stress --cpu 4 --timeout 20s

After a few seconds, you should receive a Slack notification.
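To make the alert rule concrete: it boils down to comparing each CPU utilization sample against the 70% threshold. The sketch below illustrates that comparison in plain shell. It is not how Kapacitor implements alerts, and cpu_alert_level is a hypothetical name introduced only for this example:

```shell
#!/bin/sh
# Hypothetical helper, not part of Kapacitor: classify a CPU utilization
# sample (in percent) against a threshold (in percent, default 70).
cpu_alert_level() {
  usage="$1"
  threshold="${2:-70}"
  # awk does the floating-point comparison that plain sh cannot do.
  awk -v u="$usage" -v t="$threshold" \
    'BEGIN { if (u + 0 > t + 0) print "CRITICAL"; else print "OK" }'
}

# Example: classify a few made-up samples.
for sample in 12.5 70 93.1; do
  printf '%s%% -> %s\n' "$sample" "$(cpu_alert_level "$sample")"
done
```

Only values strictly above the threshold are flagged, which matches the "higher than 70%" wording of the rule, so a sample of exactly 70 stays OK.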