Assets Monitor as a Function

Date: 2020-04-27

In this article, I’ll show you how to create an Assets Monitor with Python 3.7 and the Serverless framework on AWS Lambda.

Here is the plan:

Set up the Serverless framework

Get AWS credentials

Write the backend of the Assets Monitor with Python 3.7

Deploy the backend to Lambda

Schedule the runs

Before we start

In order to go through this tutorial, make sure you have:

Installed Python 3.7

Installed Node.js v6.5.0 or later

AWS account with admin rights (You can create one for free right here)



Step 1: Serverless framework

Let’s start from the beginning. To write and deploy our Lambda function easily, we will use an awesome framework called Serverless. It’s written in Node.js, so we need npm to install it. Let’s go:

npm install -g serverless

After this, let’s create a new project from a template:

serverless create --template aws-python3 --path my-assets-monitor

It will create a new folder my-assets-monitor with two files:

handler.py: a template for your Python code

serverless.yml: the configuration file





Step 2: AWS Credentials

This is the best part of this process, because once you’ve got credentials, you’ll never deal with AWS again. It’s super easy:

Log in to your AWS Console, go to My Security Credentials > Users and click the blue “Add User” button.

Specify a username (something like “serverless-admin”) and tick only the “Programmatic access” checkbox.

On the second page, choose “Attach existing policies directly” and look for “Administrator Access”.

Once you’ve created the user, copy your “Access key ID” and “Secret access key”.

This is what you actually need to continue.

(Tip: save them where you can find them ;) )

Congrats. You’ve got your keys. Open your terminal and execute:

serverless config credentials --provider aws --key xxxxxxxxxxxxxx --secret xxxxxxxxxxxxxx

Step 3: Write Assets Monitor Code

I’m not going to teach you how to write Python, so just copy this code and paste it into your `handler.py` file:

import json
import os
import sys
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

# Make the packages installed into ./vendored importable inside Lambda.
here = os.path.dirname(os.path.realpath(__file__))
sys.path.append(os.path.join(here, "./vendored"))

import requests
from bs4 import BeautifulSoup

from mailer import Mailer

mailer = Mailer(
    os.environ['TARGET_URL'],
    os.environ['SOURCE_EMAIL'],
    os.environ['DESTINATION_EMAIL'],
)

internal_urls = set()
external_urls = set()


def multi_threading(func, args, workers):
    with ThreadPoolExecutor(workers) as ex:
        res = ex.map(func, args)
    return list(res)


def is_valid(url):
    parsed = urlparse(url)
    return bool(parsed.netloc) and bool(parsed.scheme)


def check_status(url):
    try:
        resp = requests.get(url, timeout=5)
    except requests.exceptions.RequestException:
        # An unreachable asset counts as a failure instead of crashing the run.
        mailer.assets.append(url)
        return
    if resp.status_code > 399:
        mailer.assets.append(url)


def request_url(url):
    try:
        response = requests.get(url, timeout=5)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, "html.parser")
    except requests.exceptions.ConnectTimeout:
        errors = ['Connection timed out to your target']
        mailer.send_errors(errors)
        return False
    except requests.exceptions.ConnectionError as err:
        errors = [err]
        mailer.send_errors(errors)
        return False
    except requests.exceptions.HTTPError:
        errors = [f'Your target raised <strong>{response.status_code}</strong> status code']
        mailer.send_errors(errors)
        return False
    return soup


def get_all_website_links(url):
    urls = set()
    url_parsed = urlparse(url)
    domain_name = url_parsed.netloc
    soup = request_url(url)
    if soup is False:
        return False
    for a_tag in soup.find_all(["a", "link", "img", "script"]):
        # <a> and <link> carry their target in href, <img> and <script> in src.
        source = 'src'
        if a_tag.name == "a" or a_tag.name == "link":
            source = 'href'
        href = a_tag.attrs.get(source)
        if href == "" or href is None or '#' in href:
            continue
        parsed_href = urlparse(href)
        if parsed_href.netloc == "":
            # Relative link: prefix it with the scheme and domain of the target.
            if href[0] == "/":
                href = url_parsed.scheme + "://" + domain_name + href
            else:
                href = url_parsed.scheme + "://" + domain_name + "/" + href
        if not is_valid(href):
            continue
        if href in internal_urls:
            continue
        if domain_name not in href:
            if href not in external_urls:
                external_urls.add(href)
            continue
        urls.add(href)
        internal_urls.add(href)
    return urls


def main(event, context):
    # Lambda may reuse a warm container, so reset state left by the previous run.
    internal_urls.clear()
    external_urls.clear()
    mailer.assets.clear()

    crawled_links = get_all_website_links(os.environ['TARGET_URL'])
    if crawled_links is False:
        response = {
            "statusCode": 500,
            "body": "Error raised trying to get the target"
        }
        return response

    multi_threading(check_status, crawled_links, 20)
    # Only send the report when at least one asset failed.
    if mailer.assets:
        mailer.send_mail()

    response = {
        "statusCode": 200,
    }
    return response
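As a side note, the manual string concatenation in get_all_website_links could be swapped for the standard library's urljoin. The sketch below is not part of the original handler: resolve_link and is_internal are hypothetical helpers. They also behave a little differently, because they keep links that contain a '#' (dropping only the fragment) and compare hosts exactly instead of using a substring check.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def resolve_link(page_url, href):
    """Return an absolute http(s) URL for href, or None if it should be skipped."""
    if not href:
        return None
    href = href.strip()
    if not href or href.startswith("#"):
        return None  # empty value or a pure in-page anchor
    # urljoin handles "/path", "path", "../path" and "//host/path" forms.
    absolute, _fragment = urldefrag(urljoin(page_url, href))
    parsed = urlparse(absolute)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None  # e.g. mailto:, javascript:, tel:
    return absolute


def is_internal(page_url, link):
    """True when link points at exactly the same host as page_url."""
    return urlparse(link).netloc == urlparse(page_url).netloc
```

With this approach a link such as mailto:me@example.com is skipped rather than being glued onto the domain and reported as a broken asset.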

Now create a new file in the same directory, call it mailer.py, and paste this code into it:

import os

import boto3


class Mailer:
    def __init__(self, base_url, source_email, target_email=None):
        aws_access_key_id = os.environ['AWS_KEY']
        aws_secret_access_key = os.environ['AWS_SECRET']
        self.client = boto3.client(
            'ses',
            aws_access_key_id=aws_access_key_id,
            aws_secret_access_key=aws_secret_access_key,
            region_name='us-east-1',
        )
        self.target_email = source_email
        if target_email is not None:
            self.target_email = target_email
        self.base_url = base_url
        self.source_email = source_email
        self.assets = []

    def send_mail(self):
        subject = "Assets Monitor Asset Failure"
        body = f"""
        <h2>There's a problem with one of your assets!</h2>
        <h4>Base URL: <a href={self.base_url}>{self.base_url}</a></h4>
        """
        for link in self.assets:
            body += f'<a href="{link}"><p>{link}</p></a><br>'
        self.send(subject, body)

    def send_errors(self, errors):
        subject = "Assets Monitor Website Failure"
        body = f"""
        <h3>There's a problem with your monitored website:</h3>
        <h4>{self.base_url}</h4>
        """
        for err in errors:
            body += f"<p>{err}</p><br>"
        self.send(subject, body)

    def send(self, subject, body):
        self.client.send_email(
            Source=self.source_email,
            Destination={
                'ToAddresses': [
                    self.target_email,
                ],
            },
            Message={
                'Subject': {
                    'Data': subject,
                    'Charset': 'UTF-8'
                },
                'Body': {
                    'Html': {
                        'Data': body,
                        'Charset': 'UTF-8'
                    }
                }
            },
            ReplyToAddresses=[
                self.source_email,
            ],
        )
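If you want to check the email body without touching SES, one option is to build the HTML in a plain function. This is only a sketch: build_asset_report is a hypothetical helper that does not exist in the Mailer class above, and escaping the URLs with html.escape is my own addition.

```python
from html import escape


def build_asset_report(base_url, failed_links):
    """Return the HTML body listing every failed link, with values escaped."""
    safe_base = escape(base_url, quote=True)
    parts = [
        "<h2>There's a problem with one of your assets!</h2>",
        f'<h4>Base URL: <a href="{safe_base}">{safe_base}</a></h4>',
    ]
    for link in failed_links:
        # Escape "&", "<", ">" and quotes so a URL cannot break the markup.
        safe_link = escape(link, quote=True)
        parts.append(f'<p><a href="{safe_link}">{safe_link}</a></p>')
    return "\n".join(parts)
```

A function like this can be called from send_mail and passed straight to send, and it can be tried locally with a list of made-up links.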

If you are lazy enough, just fork or clone it from my GitHub repo

Also, you will need to create a requirements.txt file containing:

requests
bs4

and execute this command to install the packages locally:

pip install -r requirements.txt -t vendored

Step 4: Deploy to AWS Lambda

Pretty much like on Heroku, all you need is one configuration file “serverless.yml”. Go ahead and edit yours to make it look like this:

service: my-asset-monitor

provider:
  name: aws
  runtime: python3.7
  stage: dev
  region: us-east-1
  environment:
    AWS_KEY: ""
    AWS_SECRET: ""
    TARGET_URL: ""
    SOURCE_EMAIL: ""
    DESTINATION_EMAIL: ""

functions:
  post:
    handler: handler.main
    events:
      - schedule:
          name: asset-checker-schedule
          description: 'Schedule asset checking every 30 minutes'
          rate: rate(30 minutes)

Notice that you need to change:

AWS_KEY -> The access key ID that we created in Step 2

AWS_SECRET -> The secret access key that we created in Step 2

TARGET_URL -> The website that you want to monitor (Ex: 'https://google.com')

SOURCE_EMAIL -> Your verified email that you can get/verify here

DESTINATION_EMAIL -> The email that the notifications will be sent to

You can also change rate(30 minutes) to whatever you want. You can find the rate syntax here
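Since an empty value in serverless.yml only shows up as a failure at run time, it can help to check the variables in one place. The sketch below is an assumption on my side, not something the handler does today: load_settings and REQUIRED_VARS are hypothetical names, and the fallback for DESTINATION_EMAIL simply mirrors the default in Mailer.

```python
import os

# Variables the monitor cannot run without (names taken from serverless.yml).
REQUIRED_VARS = ("AWS_KEY", "AWS_SECRET", "TARGET_URL", "SOURCE_EMAIL")


def load_settings(environ=os.environ):
    """Return the monitor settings, raising if a required variable is empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {name: environ[name] for name in REQUIRED_VARS}
    # Like Mailer, fall back to the sender address when no recipient is set.
    settings["DESTINATION_EMAIL"] = (
        environ.get("DESTINATION_EMAIL") or environ["SOURCE_EMAIL"]
    )
    return settings
```

Calling load_settings() at the top of handler.py would make a forgotten value fail with a readable message instead of a KeyError.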





And the magic happens once you execute this in your terminal:

serverless deploy

It will pack all your files into a .zip archive and upload it to AWS, then create an Amazon CloudWatch Events rule that runs the monitor automatically on schedule.

The serverless backend of your monitor is now live and running.

Now if any asset or link on your website is broken (not found, server error, etc.), you will get an email that looks like this:

Also, if there's any error accessing your target, you will get an email like this:

Keep in mind

At the beginning it will cost you nothing, because the AWS Lambda free usage tier includes 400,000 GB-seconds of compute time per month.

But if you have other Lambda functions, or you multiply the monitors, you can reach a point where AWS starts charging you.

Our current usage is 60/30 * 24 * 30 = 1,440 calls per month.

Each call takes between 1 and 5 seconds, depending on the number of assets, and uses 256 MB of memory.

So 1,440 * 3 (average seconds) / 4 (256 MB is a quarter of a GB) = 1,080 GB-seconds of usage per month, far below the free tier.
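Here is a rough sketch of that arithmetic in Python, so you can plug in your own numbers. It assumes one run every 30 minutes (two per hour), an average of 3 seconds per run, 256 MB of memory and a 30-day month; monthly_gb_seconds is a hypothetical helper, and the request-count part of the free tier is ignored.

```python
FREE_TIER_GB_SECONDS = 400_000  # monthly Lambda free-tier compute allowance


def monthly_gb_seconds(interval_minutes, avg_seconds, memory_mb, days=30):
    """Return (invocations, GB-seconds) per month for one scheduled function."""
    invocations = (60 / interval_minutes) * 24 * days
    # Lambda bills duration multiplied by the configured memory in GB.
    gb_seconds = invocations * avg_seconds * (memory_mb / 1024)
    return invocations, gb_seconds


calls, usage = monthly_gb_seconds(interval_minutes=30, avg_seconds=3, memory_mb=256)
print(f"{calls:.0f} invocations, {usage:.0f} GB-seconds "
      f"({usage / FREE_TIER_GB_SECONDS:.2%} of the free tier)")
```

With these assumptions a single monitor comes to 1,440 invocations and 1,080 GB-seconds a month, so it would take a few hundred monitors like this one to use up the compute allowance.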

The serverless approach is very convenient in some cases because it allows you to build very scalable solutions. It also changes the way you think about managing and paying for servers.

The Serverless framework gives you a very simple tool to deploy your function without needing to know AWS or any other cloud provider in depth.

AWS gives you a nice free tier for its services, so you can build your MVPs totally for free, go live, and start paying only once you reach a certain number of users.

