Assets Monitor as a Function

In this article, I’ll show you how to build an Assets Monitor with Python 3.7 and the Serverless framework, running on AWS Lambda.

My name is Uria Franko and I’m a freelance developer.

Here is the plan

Set up the Serverless framework
Get AWS credentials
Write the backend of the Assets Monitor with Python 3.7
Deploy the backend to Lambda
Schedule the runs

Before we start

In order to go through this tutorial, make sure you have:

Installed Python 3.7
Installed Node.js v6.5.0 or later
AWS account with admin rights (You can create one for free right here)

Step 1: Serverless framework

Let’s start from the beginning. To write and deploy our Lambda easily, we will use an awesome framework called Serverless. It’s written in Node.js, so we need npm to install it. Let’s go:

npm install -g serverless

After this, let’s create a new project from template:

serverless create --template aws-python3 --path my-assets-monitor

It will create a new folder my-assets-monitor with two files:

handler.py — a template for your Python code
serverless.yml — configuration file

Step 2: AWS Credentials

This is the best part of this process, because once you’ve got credentials, you’ll never deal with AWS again. It’s super easy:

Create a new IAM user: specify a username (something like “serverless-admin”) and tick only the “Programmatic access” checkbox.

On the second page, choose “Attach existing policies directly” and look for “Administrator Access”.

Once you have created the user, copy your “Access key ID” and “Secret access key”.
These are what you actually need to continue.
(Tip: save them where you can find them ;) )

Congrats. You’ve got your keys. Open your terminal and execute:

serverless config credentials --provider aws --key xxxxxxxxxxxxxx --secret xxxxxxxxxxxxxx
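If you want to double-check that the keys were saved, here is a minimal sketch. It assumes the command above wrote them to the standard AWS credentials file (~/.aws/credentials) under a [default] profile; the helper name has_profile_keys is made up for this example.

```python
import configparser


def has_profile_keys(credentials_text, profile="default"):
    """Return True if the INI text holds both AWS keys for the given profile."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    return bool(section.get("aws_access_key_id")) and bool(
        section.get("aws_secret_access_key")
    )


# Sample content in the same shape as ~/.aws/credentials (placeholder values)
sample = """
[default]
aws_access_key_id = xxxxxxxxxxxxxx
aws_secret_access_key = xxxxxxxxxxxxxx
"""

print(has_profile_keys(sample))
```

To check your real file, read it with open(os.path.expanduser("~/.aws/credentials")).read() and pass the text to the helper.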

Step 3: Write Assets Monitor Code

I’m not going to teach you how to write Python, so just copy this code and paste it into your `handler.py` file:

import json
import os
import sys
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

here = os.path.dirname(os.path.realpath(__file__))
sys.path.append(os.path.join(here, "./vendored"))

import requests
from bs4 import BeautifulSoup
from mailer import Mailer

# DESTINATION_EMAIL must match the variable name defined in serverless.yml
mailer = Mailer(os.environ['TARGET_URL'], os.environ['SOURCE_EMAIL'], os.environ['DESTINATION_EMAIL'])
internal_urls = set()
external_urls = set()


def multi_threading(func, args, workers):
    with ThreadPoolExecutor(workers) as ex:
        res = ex.map(func, args)
    return list(res)


def is_valid(url):
    parsed = urlparse(url)
    return bool(parsed.netloc) and bool(parsed.scheme)


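def check_status(url):
    # Record the asset as broken if the request fails or returns a 4xx/5xx status
    try:
        resp = requests.get(url, timeout=5)
    except requests.exceptions.RequestException:
        mailer.assets.append(url)
        return
    if resp.status_code > 399:
        mailer.assets.append(url)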


def request_url(url):
    try:
        response = requests.get(url, timeout=5)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, "html.parser")
    except requests.exceptions.ConnectTimeout as err:
        errors = ['Connection timed out to your target']
        mailer.send_errors(errors)
        return False
    except requests.exceptions.ConnectionError as err:
        errors = [err]
        mailer.send_errors(errors)
        return False
    except requests.exceptions.HTTPError as err:
        errors = [f'Your target raised <strong>{response.status_code}</strong> status code']
        mailer.send_errors(errors)
        return False
    return soup


def get_all_website_links(url):
    urls = set()
    url_parsed = urlparse(url)
    domain_name = url_parsed.netloc
    soup = request_url(url)
    if soup is False:
        return False
    for a_tag in soup.find_all(["a", "link", "img", "script"]):
        source = 'src'
        if a_tag.name == "a" or a_tag.name == "link":
            source = 'href'
        href = a_tag.attrs.get(source)
        if href == "" or href is None or '#' in href:
            continue
        parsed_href = urlparse(href)
        if parsed_href.netloc == "":
            if href[0] == "/":
                href = url_parsed.scheme + "://" + domain_name + href
            else:
                href = url_parsed.scheme + "://" + domain_name + "/" + href
        if not is_valid(href):
            continue
        if href in internal_urls:
            continue
        if domain_name not in href:
            if href not in external_urls:
                external_urls.add(href)
            continue
        urls.add(href)
        internal_urls.add(href)
    return urls


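def main(event, context):
    # Reset module-level state: Lambda may reuse a warm container between runs
    internal_urls.clear()
    external_urls.clear()
    mailer.assets = []
    crawled_links = get_all_website_links(os.environ['TARGET_URL'])
    if crawled_links is False:
        response = {
            "statusCode": 500,
            "body": "Error raised trying to get the target"
        }
        return response
    multi_threading(check_status, crawled_links, 20)
    # Only send the failure email when at least one asset is broken
    if mailer.assets:
        mailer.send_mail()
    response = {
        "statusCode": 200,
    }
    return response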
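A side note on how links are resolved. The handler glues relative links onto the site root and treats a link as internal when the domain name appears anywhere in it. Here is a small sketch of a stricter alternative, using urljoin and an exact host comparison from the standard library. The helper names resolve_link and is_internal and the example.com URLs are made up for illustration; this is not what the code above does.

```python
from urllib.parse import urljoin, urlparse


def resolve_link(page_url, href):
    """Turn a possibly relative href into an absolute URL, the way a browser would."""
    return urljoin(page_url, href)


def is_internal(page_url, link):
    """Compare hosts exactly instead of checking for a substring."""
    return urlparse(page_url).netloc == urlparse(link).netloc


base = "https://example.com/blog/post"

# Root-relative path: resolved against the site root
print(resolve_link(base, "/static/app.js"))

# Plain relative path: resolved against the page's directory, not the root
print(resolve_link(base, "img/logo.png"))

# A substring check would call this internal because "example.com" appears in the query
print(is_internal(base, "https://cdn.example.org/lib.js?ref=example.com"))
```

Keep in mind that urljoin resolves "img/logo.png" relative to the current page's folder, so the result can differ from the handler's root-based concatenation.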

Now create a new file in the same directory, call it mailer.py, and paste this code into it:

import boto3
import os


class Mailer:

    def __init__(self, base_url, source_email, target_email = None):
        aws_access_key_id = os.environ['AWS_KEY']
        aws_secret_access_key = os.environ['AWS_SECRET']
        self.client = boto3.client('ses', aws_access_key_id = aws_access_key_id,
                                   aws_secret_access_key = aws_secret_access_key,
                                   region_name = 'us-east-1')

        self.target_email = source_email
        if target_email is not None:
            self.target_email = target_email
        self.base_url = base_url
        self.source_email = source_email
        self.assets = []

    def send_mail(self):
        subject = "Assets Monitor Asset Failure"
        body = f"""
        <h2>There's a problem with one of your assets!</h2>
        <h4>Base URL: <a href="{self.base_url}">{self.base_url}</a></h4>
        """
        for link in self.assets:
            body += f'<a href="{link}"><p>{link}</p></a><br>'
        self.send(subject, body)

    def send_errors(self, errors):
        subject = "Assets Monitor Website Failure"
        body = f"""
        <h3>There's a problem with your monitored website:</h3>
        <h4>{self.base_url}</h4>
        """
        for err in errors:
            body += f"<p>{err}</p><br>"
        self.send(subject, body)

    def send(self, subject, body):
        self.client.send_email(
            Source = self.source_email,
            Destination = {
                'ToAddresses': [
                    self.target_email,
                ],
            },
            Message = {
                'Subject': {
                    'Data': subject,
                    'Charset': 'UTF-8'
                },
                'Body': {
                    'Html': {
                        'Data': body,
                        'Charset': 'UTF-8'
                    }
                }
            },
            ReplyToAddresses = [
                self.source_email,
            ],
        )
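One thing the mailer does not do is escape the URLs before putting them into the HTML body. Here is a hedged sketch of how the asset list could be built with html.escape from the standard library. The function name build_asset_body is hypothetical and not part of the Mailer class above.

```python
from html import escape


def build_asset_body(base_url, broken_links):
    """Build the HTML body for the asset-failure email, escaping every URL."""
    safe_base = escape(base_url)
    parts = [
        "<h2>There's a problem with one of your assets!</h2>",
        f'<h4>Base URL: <a href="{safe_base}">{safe_base}</a></h4>',
    ]
    for link in broken_links:
        safe_link = escape(link)
        parts.append(f'<a href="{safe_link}"><p>{safe_link}</p></a><br>')
    return "\n".join(parts)


body = build_asset_body(
    "https://example.com",
    ["https://example.com/logo.png?w=100&h=50"],
)
print(body)
```

Escaping matters mostly for links with query strings, where a raw & can confuse some email clients.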

If you’re feeling lazy, just fork or clone it from my GitHub repo.

You will also need to create a requirements.txt file containing:

requests
bs4

and execute this command to install it locally:

pip install -r requirements.txt -t vendored
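In case you wonder why the -t vendored flag matters: handler.py appends the vendored folder to sys.path, so anything installed there becomes importable inside Lambda. Here is a small sketch of that mechanism. It uses a temporary folder and a made-up module name (fake_vendored_pkg) instead of the real packages, and the helper import_from_vendored is hypothetical.

```python
import importlib
import os
import sys
import tempfile


def import_from_vendored(vendored_dir, module_name):
    """Add a folder to sys.path (once) and import a module from it."""
    if vendored_dir not in sys.path:
        sys.path.append(vendored_dir)
    # Make sure the import system notices files created after startup
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a package that pip would place in ./vendored
    with open(os.path.join(tmp, "fake_vendored_pkg.py"), "w") as fh:
        fh.write("VERSION = '1.0'\n")
    module = import_from_vendored(tmp, "fake_vendored_pkg")
    print(module.VERSION)
```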

Step 4: Deploy to AWS Lambda

Pretty much like on Heroku, all you need is one configuration file “serverless.yml”. Go ahead and edit yours to make it look like this:

service: my-asset-monitor

provider:
  name: aws
  runtime: python3.7
  stage: dev
  region: us-east-1
  environment:
    AWS_KEY: ""
    AWS_SECRET: ""
    TARGET_URL: ""
    SOURCE_EMAIL: ""
    DESTINATION_EMAIL: ""



functions:
  post:
    handler: handler.main
    events:
      - schedule:
          name: asset-checker-schedule
          description: 'Schedule asset checking every 30 minutes'
          rate: rate(30 minutes)

Notice that you need to change these values:

AWS_KEY -> The AWS access key ID that we created
AWS_SECRET -> The AWS secret access key that we created
TARGET_URL -> The website that you want to monitor (Ex: 'https://google.com')
SOURCE_EMAIL -> Your verified email that you can get/verify here
DESTINATION_EMAIL -> The email that the notifications will be sent to

You can also change rate(30 minutes) to whatever you want; you can find the rate syntax here
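To get a feel for the rate syntax, here is a minimal sketch of a validator. It assumes the AWS rule that a rate expression is rate(value unit), with the unit being minute, hour or day, singular for a value of 1 and plural otherwise. The function name rate_to_minutes is made up for this example.

```python
import re

_MINUTES_PER_UNIT = {"minute": 1, "hour": 60, "day": 1440}


def rate_to_minutes(expression):
    """Validate a rate(...) expression and return the interval in minutes."""
    match = re.fullmatch(r"rate\((\d+) (minute|hour|day)(s?)\)", expression.strip())
    if not match:
        raise ValueError(f"not a rate expression: {expression!r}")
    value = int(match.group(1))
    unit = match.group(2)
    plural = bool(match.group(3))
    # A value of 1 needs the singular unit; any larger value needs the plural
    if value < 1 or (value == 1) == plural:
        raise ValueError(f"unit does not agree with value: {expression!r}")
    return value * _MINUTES_PER_UNIT[unit]


print(rate_to_minutes("rate(30 minutes)"))
print(rate_to_minutes("rate(1 hour)"))
```

So rate(1 hour) is fine, while rate(1 hours) is rejected by this check.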

And the magic happens once you execute this in your terminal:

serverless deploy

It will pack all your files into a .zip archive and upload it to AWS, then it will create a CloudWatch Events rule that runs the monitor automatically on the schedule.

The serverless backend of your monitor is now live and running.

Now, if any asset or link on your website is broken (not found, server error, etc.), you will get an email that looks like this:

Also, if there's any error accessing your target, you will get an email like this:

Keep in mind

At the beginning it will cost you nothing, because the AWS Lambda free usage tier includes 400,000 GB-seconds of compute time per month.
But if you have other Lambda functions, or you run many monitors, you can reach the point where AWS starts charging you.

Our current usage is 60/30 * 24 * 30 = 1,440 calls per month.
Each call takes between 1 and 5 seconds, depending on the number of assets, assuming 256 MB of memory.

So 1,440 * 3 (average seconds) / 4 (256 MB is a quarter of 1 GB) = 1,080 GB-seconds of usage per month, far below the free tier.
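If you want to redo the estimate for your own setup, here is a small sketch. It assumes the rate(30 minutes) schedule from serverless.yml, a 30-day month, an average run of 3 seconds and 256 MB of memory; the function name monthly_gb_seconds is made up for this example.

```python
FREE_TIER_GB_SECONDS = 400_000


def monthly_gb_seconds(interval_minutes, avg_seconds, memory_mb, days=30):
    """Return (invocations, GB-seconds) for one scheduled function per month."""
    invocations = (24 * 60 // interval_minutes) * days
    gb_seconds = invocations * avg_seconds * (memory_mb / 1024)
    return invocations, gb_seconds


invocations, gb_seconds = monthly_gb_seconds(30, 3, 256)
share = gb_seconds / FREE_TIER_GB_SECONDS

print(invocations, "invocations per month")
print(gb_seconds, "GB-seconds per month")
print(f"{share:.2%} of the free tier")
```

Change the interval or the memory size to see how quickly several monitors add up.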

The serverless approach is very convenient in some cases because it allows you to build very scalable solutions. It also changes the way you think about managing and paying for servers.

The Serverless framework gives you a very simple tool for deploying your function without needing to know AWS or any other cloud provider in depth.

AWS gives you a nice free tier for its services, so you can build your MVPs totally for free, go live, and start paying only once you reach a certain number of users.
