Hello everyone! In the previous article, I started describing my experience of developing a Python Telegram bot for checking the operability and monitoring of my service located on a remote server. In a nutshell, when you're working on a pet project or even some work tasks, you might want to have all the current system status information at hand without spending a lot of time on development (I particularly like being able to see and manage everything through a Telegram bot).

In the previous part, we looked at a way to get instant metrics on demand. In this part, we will set up simple alerting, i.e., receiving a message in the bot when the system starts to fail. In the third part, we will cover collecting analytics and receiving charts online.

As in the previous part, the example is based on a real task, but I will mark the places in the code that you can change to fit your own logic, so that the main part of the example can be reused. In this case, I need to get an alert as a Telegram message when the node I'm interested in loses its network connection or, for some other reason, its last synchronized block starts lagging behind the network's last block.
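The alert condition described above can be sketched in a few lines. This is only an illustration under stated assumptions: `needs_alert` and its parameters are hypothetical names, and the default threshold of 5 blocks is an assumed value. The bot's actual check appears later in the article.

```python
from typing import Optional


def needs_alert(node_block: Optional[int], reference_block: int, threshold: int = 5) -> bool:
    """Hypothetical helper: True when the node is silent or too far from the reference block."""
    # A node that did not answer at all is treated as a failure
    if node_block is None:
        return True
    # Otherwise compare the absolute distance to the reference with the threshold
    return abs(node_block - reference_block) > threshold


# Sample values only, not real chain data
print(needs_alert(1000, 1002))  # 2 blocks behind, within the threshold
print(needs_alert(990, 1002))   # 12 blocks behind, beyond the threshold
print(needs_alert(None, 1002))  # the node did not respond
```

Keeping the decision in a small pure function like this makes it easy to swap in your own rule (for example, alerting only when the node is behind, not ahead) without touching the Telegram part.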
As before, we first need to set up the virtual environment:

    cd ~
    virtualenv -p python3.8 up_env  # creating an environment
    source ~/up_env/bin/activate  # activating the environment

and install the necessary dependencies:

    pip install python-telegram-bot
    pip install "python-telegram-bot[job-queue]" --pre
    pip install --upgrade python-telegram-bot==13.6.0  # the code was written in the times before version 20, so here the version is explicitly specified
    pip install numpy  # needed for the median value function
    pip install web3  # needed for requests to nodes (replace with what you need)

The functions.py file does not change in this case and remains the same as in the previous part:

    import numpy as np
    import multiprocessing
    from web3 import Web3  # add those libraries needed for your task

    # Helper function that checks a single node
    def get_last_block_once(rpc):
        try:
            w3 = Web3(Web3.HTTPProvider(rpc))
            block_number = w3.eth.block_number
            if isinstance(block_number, int):
                return block_number
            else:
                return None
        except Exception as e:
            print(f'{rpc} - {repr(e)}')
            return None

    # Main function to check the status of the service that will be called
    def check_service():
        # pre-prepared list of reference nodes
        # for any network, it can be found on the website https://chainlist.org/
        list_of_public_nodes = [
            'https://polygon.llamarpc.com',
            'https://polygon.rpc.blxrbdn.com',
            'https://polygon.blockpi.network/v1/rpc/public',
            'https://polygon-mainnet.public.blastapi.io',
            'https://rpc-mainnet.matic.quiknode.pro',
            'https://polygon-bor.publicnode.com',
            'https://poly-rpc.gateway.pokt.network',
            'https://rpc.ankr.com/polygon',
            'https://polygon-rpc.com'
        ]

        # parallel processing of requests to all nodes
        with multiprocessing.Pool(processes=len(list_of_public_nodes)) as pool:
            results = pool.map(get_last_block_once, list_of_public_nodes)
        last_blocks = [b for b in results if b is not None and isinstance(b, int)]

        # define the maximum and median value of the current block
        med_val = int(np.median(last_blocks))
        max_val = int(np.max(last_blocks))

        # determine the number of nodes with the maximum and median value
        med_support = np.sum([1 for x in last_blocks if x == med_val])
        max_support = np.sum([1 for x in last_blocks if x == max_val])

        return max_val, max_support, med_val, med_support

Now let's look at the main bot file, alert_bot.py. Since different tasks may need only alerting or only instant values on request, I don't build a bot that can do everything at once. Instead, I split this functionality into separate small examples. In this case, the main bot file only includes alerting, but you can combine everything you need into one bot.

So, we import libraries and functions from the file above and set the necessary constants:

    import telegram
    from telegram.ext import Updater
    from functions import get_last_block_once, check_service

    # The address of the node, the state of which I'm tracking (also a public node in this case)
    OBJECT_OF_CHECKING = 'https://polygon-mainnet.chainstacklabs.com'

    # Threshold for highlighting critical lag
    THRESHOLD = 5

    # Your Telegram account ID.
    # The easiest way to find out is through the @chatIDrobot bot
    USER_ID = 123456789

Next, we describe a function that will be called regularly by the timer:

    def check_for_alert(context):
        # Call of the main function to check the network state
        max_val, max_support, med_val, med_support = check_service()
        # Call of the function to check the state of the inspected node
        last_block = get_last_block_once(OBJECT_OF_CHECKING)

        # Forming a message to be sent to Telegram
        message = ""

        # Information about the state of nodes in the external network (median, maximum, and number of nodes)
        message += f"Public median block number {med_val} (on {med_support}) RPCs\n"
        message += f"Public maximum block number +{max_val - med_val} (on {max_support}) RPCs\n"

        # this variable will store the decision whether to send an alert
        # in case the node is lagging or didn't respond
        to_send = False

        # state check
        if last_block is not None:
            out_text = str(last_block - med_val) if last_block - med_val < 0 else '+' + str(last_block - med_val)

            # Comparison with the threshold
            if abs(last_block - med_val) > THRESHOLD:
                to_send = True
                message += f"The node block number shift ⚠️<b>{out_text}</b>⚠️"
            else:
                message += f"The node block number shift {out_text}"
        else:
            # Handling the case when the node didn't respond
            to_send = True
            message += "The node has ⚠️<b>not responded</b>⚠️"

        # triggering the alert and sending a message to the user
        if to_send:
            context.bot.send_message(chat_id=USER_ID, text=message, parse_mode="HTML")

Next, you only have to write the part where the bot is initialized and a regular job checking the node's state is connected to it:

    # Your Telegram bot token obtained through BotFather
    token = "xxx"

    # creating a bot instance
    bot = telegram.Bot(token=token)
    updater = Updater(token=token, use_context=True)
    dispatcher = updater.dispatcher
    job_queue = updater.job_queue

    # Here, the interval variable (in seconds) sets the frequency
    # of launching the check - in this case, every 10 minutes
    job_queue.run_repeating(check_for_alert, interval=10.0 * 60.0, first=0.0)

    # bot start
    updater.start_polling()

Next, the code can be run on any VPS, with a systemd unit file configured beforehand, through:

    source ~/up_env/bin/activate
    python alert_bot.py

As a result, the bot works as follows: if an alert has been triggered, I receive a message with the problem information.

In the following article, I will describe how to implement the remaining task: retrieving graphs on request that show how everything has been progressing over the past X hours/days. It will consist of two parts: a script, triggered by cron, that logs events, and a bot that builds graphs from the logs based on user requests.

The source code of the project is available in the GitHub repository. If you found this tutorial helpful, feel free to star it on GitHub, I would appreciate it🙂

Also published here.
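For readers who want to try the alert logic before wiring up a real bot, the message-building part of `check_for_alert` can be exercised on its own. The sketch below is not part of the repository: `build_alert` and `FakeBot` are hypothetical names introduced for illustration, the block numbers are made-up sample values, and the stub only mimics the `send_message` call used in the article.

```python
from typing import Optional, Tuple


def build_alert(last_block: Optional[int], med_val: int, threshold: int = 5) -> Tuple[bool, str]:
    """Hypothetical helper: return (to_send, message) for a single check."""
    if last_block is None:
        # The node did not respond at all
        return True, "The node has ⚠️<b>not responded</b>⚠️"
    shift = last_block - med_val
    out_text = f"{shift:+d}"  # always show the sign, e.g. -13, +0, +4
    if abs(shift) > threshold:
        return True, f"The node block number shift ⚠️<b>{out_text}</b>⚠️"
    return False, f"The node block number shift {out_text}"


class FakeBot:
    """Stand-in for context.bot that records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send_message(self, chat_id, text, parse_mode=None):
        self.sent.append((chat_id, text, parse_mode))


bot = FakeBot()
for block in (1000, 988, None):  # sample values, not real chain data
    to_send, message = build_alert(block, med_val=1001)
    if to_send:
        bot.send_message(chat_id=123456789, text=message, parse_mode="HTML")

# Only the lagging node and the silent node produce a message
for _, text, _ in bot.sent:
    print(text)
```

Once the output looks right, the same helper could be called from the timer function with the real `context.bot` in place of the stub.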