
Assets Monitor as a Function


By Uria Franko (@uria-franko)

In this article, I’ll show you how to build an assets monitor with Python 3.7 and a serverless AWS Lambda function.
My name is Uria Franko and I’m a freelance developer.

Here is the plan

  • Set up the Serverless framework
  • Get AWS credentials
  • Write the backend of the assets monitor with Python 3.7
  • Deploy the backend to Lambda
  • Schedule it to run

Before we start

In order to go through this tutorial, make sure you have:
  • Installed Python 3.7
  • Installed Node.js v6.5.0 or later
  • AWS account with admin rights (You can create one for free right here)

Step 1: Serverless framework

Let’s start from the beginning. To write and deploy our Lambda function easily, we will use an awesome framework called Serverless. It’s written in Node.js, so we need npm to install it. Let’s go:
npm install -g serverless
After this, let’s create a new project from template:
serverless create --template aws-python3 --path my-assets-monitor
It will create a new folder my-assets-monitor with two files:
  1. handler.py  — a template for your Python code
  2. serverless.yml — configuration file

Step 2: AWS Credentials

This is the best part of the process, because once you’ve got credentials you’ll rarely need to touch the AWS console again. It’s super easy. Create a new IAM user in the AWS console:
  • Specify a username (something like “serverless-admin”) and tick only the “Programmatic access” checkbox.
  • On the second page, choose “Attach existing policies directly” and look for “AdministratorAccess”.
  • Once you’ve created the user, copy your “Access key ID” and “Secret access key”.
    This is what you actually need to continue.
    (Tip: save them where you can find them ;) )
Congrats. You’ve got your keys. Open your terminal and execute:
serverless config credentials --provider aws --key xxxxxxxxxxxxxx --secret xxxxxxxxxxxxxx
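To confirm the keys were saved, you can read them back with a few lines of Python. This is only a sketch: it assumes the command above stored the keys under the `default` profile in `~/.aws/credentials` (the usual location), and `read_aws_profile` and `mask` are hypothetical helper names, not part of Serverless or boto3.

```python
import configparser
from pathlib import Path


def read_aws_profile(path, profile="default"):
    """Return (access_key_id, secret_access_key) for a profile, or None if absent."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently ignored
    if not parser.has_section(profile):
        return None
    section = parser[profile]
    key = section.get("aws_access_key_id")
    secret = section.get("aws_secret_access_key")
    if not key or not secret:
        return None
    return key, secret


def mask(value, visible=4):
    """Hide everything except the last few characters before printing."""
    return "*" * max(len(value) - visible, 0) + value[-visible:]


if __name__ == "__main__":
    # Assumed location of the shared credentials file.
    creds = read_aws_profile(Path.home() / ".aws" / "credentials")
    if creds is None:
        print("No default profile found")
    else:
        print("Access key ID:", mask(creds[0]))
```

Masking the key before printing keeps it out of terminal history and screenshots.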

Step 3: Write Assets Monitor Code

I’m not going to teach you how to write Python, so just copy this code and paste it into your `handler.py` file:
import json
import os
import sys
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

here = os.path.dirname(os.path.realpath(__file__))
sys.path.append(os.path.join(here, "./vendored"))

import requests
from bs4 import BeautifulSoup
from mailer import Mailer

# The variable names must match the environment block in serverless.yml
mailer = Mailer(os.environ['TARGET_URL'], os.environ['SOURCE_EMAIL'], os.environ['DESTINATION_EMAIL'])
internal_urls = set()
external_urls = set()


def multi_threading(func, args, workers):
    with ThreadPoolExecutor(workers) as ex:
        res = ex.map(func, args)
    return list(res)


def is_valid(url):
    parsed = urlparse(url)
    return bool(parsed.netloc) and bool(parsed.scheme)


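def check_status(url):
    # Record the URL as failed on a 4xx/5xx status or if the request itself fails
    try:
        resp = requests.get(url, timeout=5)
    except requests.exceptions.RequestException:
        mailer.assets.append(url)
        return
    if resp.status_code > 399:
        mailer.assets.append(url)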


def request_url(url):
    try:
        response = requests.get(url, timeout=5)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, "html.parser")
    except requests.exceptions.Timeout as err:  # covers connect and read timeouts
        errors = ['Connection timed out to your target']
        mailer.send_errors(errors)
        return False
    except requests.exceptions.ConnectionError as err:
        errors = [err]
        mailer.send_errors(errors)
        return False
    except requests.exceptions.HTTPError as err:
        errors = [f'Your target raised <strong>{response.status_code}</strong> status code']
        mailer.send_errors(errors)
        return False
    return soup


def get_all_website_links(url):
    urls = set()
    url_parsed = urlparse(url)
    domain_name = url_parsed.netloc
    soup = request_url(url)
    if soup is False:
        return False
    for a_tag in soup.find_all(["a", "link", "img", "script"]):
        source = 'src'
        if a_tag.name == "a" or a_tag.name == "link":
            source = 'href'
        href = a_tag.attrs.get(source)
        if href == "" or href is None or '#' in href:
            continue
        parsed_href = urlparse(href)
        if parsed_href.netloc == "":
            if href[0] == "/":
                href = url_parsed.scheme + "://" + domain_name + href
            else:
                href = url_parsed.scheme + "://" + domain_name + "/" + href
        if not is_valid(href):
            continue
        if href in internal_urls:
            continue
        if domain_name not in href:
            if href not in external_urls:
                external_urls.add(href)
            continue
        urls.add(href)
        internal_urls.add(href)
    return urls


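def main(event, context):
    # Reset module-level state: warm Lambda containers reuse it between invocations
    internal_urls.clear()
    external_urls.clear()
    mailer.assets.clear()
    crawled_links = get_all_website_links(os.environ['TARGET_URL'])
    if crawled_links is False:
        response = {
            "statusCode": 500,
            "body": "Error raised trying to get the target"
        }
        return response
    multi_threading(check_status, crawled_links, 20)
    # Only send the report when at least one asset failed
    if mailer.assets:
        mailer.send_mail()
    response = {
        "statusCode": 200,
    }
    return response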
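The handler builds absolute URLs by gluing the scheme and domain onto each relative link. As an alternative, here is a minimal sketch using `urljoin` from the standard library, which resolves relative paths against the page they appear on. `resolve_link` and `is_internal` are hypothetical names, not part of the handler above, and the behavior differs slightly: only links that start with `#` are skipped, and internal links are matched on the exact host rather than by substring.

```python
from urllib.parse import urljoin, urlparse


def resolve_link(page_url, href):
    """Return an absolute http(s) URL for href, or None if it should be skipped."""
    if not href or href.startswith("#"):
        return None
    absolute = urljoin(page_url, href)
    parsed = urlparse(absolute)
    # Drop mailto:, javascript:, data: and anything else that cannot be fetched
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None
    return absolute


def is_internal(page_url, link):
    """True when link points at the same host as the monitored page."""
    return urlparse(link).netloc == urlparse(page_url).netloc


page = "https://example.com/blog/post"
print(resolve_link(page, "img/a.png"))      # https://example.com/blog/img/a.png
print(resolve_link(page, "/css/site.css"))  # https://example.com/css/site.css
print(resolve_link(page, "mailto:me@example.com"))  # None
```

Comparing hosts exactly avoids treating a URL such as `https://other.site/?ref=example.com` as internal just because the domain name appears somewhere in it.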
Now create a new file in the same directory, call it mailer.py, and paste this code into it:
import boto3
import os


class Mailer:

    def __init__(self, base_url, source_email, target_email = None):
        aws_access_key_id = os.environ['AWS_KEY']
        aws_secret_access_key = os.environ['AWS_SECRET']
        self.client = boto3.client('ses', aws_access_key_id = aws_access_key_id,
                                   aws_secret_access_key = aws_secret_access_key,
                                   region_name = 'us-east-1')

        self.target_email = source_email
        if target_email is not None:
            self.target_email = target_email
        self.base_url = base_url
        self.source_email = source_email
        self.assets = []

    def send_mail(self):
        subject = "Assets Monitor Asset Failure"
        body = f"""
        <h2>There's a problem with one of your assets!</h2>
        <h4>Base URL: <a href="{self.base_url}">{self.base_url}</a></h4>
        """
        for link in self.assets:
            body += f'<a href="{link}"><p>{link}</p></a><br>'
        self.send(subject, body)

    def send_errors(self, errors):
        subject = "Assets Monitor Website Failure"
        body = f"""
        <h3>There's a problem with your monitored website:</h3>
        <h4>{self.base_url}</h4>
        """
        for err in errors:
            body += f"<p>{err}</p><br>"
        self.send(subject, body)

    def send(self, subject, body):
        self.client.send_email(
            Source = self.source_email,
            Destination = {
                'ToAddresses': [
                    self.target_email,
                ],
            },
            Message = {
                'Subject': {
                    'Data': subject,
                    'Charset': 'UTF-8'
                },
                'Body': {
                    'Html': {
                        'Data': body,
                        'Charset': 'UTF-8'
                    }
                }
            },
            ReplyToAddresses = [
                self.source_email,
            ],
        )
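The report body is assembled from crawled URLs, which can contain characters such as `&` or `"` that break HTML attributes. A minimal sketch of the same report with escaping, assuming nothing beyond the standard library; `build_asset_report` is a hypothetical helper, not a method of the `Mailer` class above.

```python
from html import escape


def build_asset_report(base_url, broken_links):
    """Build the HTML body for the asset-failure email with every URL escaped."""
    safe_base = escape(base_url, quote=True)
    parts = [
        "<h2>There's a problem with one of your assets!</h2>",
        f'<h4>Base URL: <a href="{safe_base}">{safe_base}</a></h4>',
    ]
    for link in broken_links:
        safe = escape(link, quote=True)
        parts.append(f'<p><a href="{safe}">{safe}</a></p>')
    return "\n".join(parts)


print(build_asset_report("https://example.com", ["https://example.com/a.png?w=1&h=2"]))
```

The returned string could be passed to `send` as the `body` argument in place of the inline f-strings.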
If you’re feeling lazy, just fork or clone it from my GitHub repo.
You will also need to create a requirements.txt file containing:
requests
bs4
and run this command to install the dependencies locally:
pip install -r requirements.txt -t vendored

Step 4: Deploy to AWS Lambda

Pretty much like on Heroku, all you need is one configuration file “serverless.yml”. Go ahead and edit yours to make it look like this:
service: my-asset-monitor

provider:
  name: aws
  runtime: python3.7
  stage: dev
  region: us-east-1
  environment:
    AWS_KEY: ""
    AWS_SECRET: ""
    TARGET_URL: ""
    SOURCE_EMAIL: ""
    DESTINATION_EMAIL: ""



functions:
  post:
    handler: handler.main
    events:
      - schedule:
          name: asset-checker-schedule
          description: 'Schedule asset checking every 30 minutes'
          rate: rate(30 minutes)
Notice that you need to change:
  • AWS_KEY -> The access key ID that we created in Step 2
  • AWS_SECRET -> The secret access key that we created in Step 2
  • TARGET_URL -> The website that you want to monitor (Ex: 'https://google.com')
  • SOURCE_EMAIL -> Your SES-verified sender email, which you can get/verify here
  • DESTINATION_EMAIL -> The email address that the notifications will be sent to
  • You can also change rate(30 minutes) to whatever you want; you can find the rate syntax here
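An empty value in that list only shows up at runtime, when the function fails or SES rejects the email. A small sketch of a fail-fast check, using the variable names from the environment block of serverless.yml; `missing_settings` is a hypothetical helper that could be called at the top of `main`.

```python
import os

# Names taken from the environment block in serverless.yml
REQUIRED_VARS = ("AWS_KEY", "AWS_SECRET", "TARGET_URL", "SOURCE_EMAIL", "DESTINATION_EMAIL")


def missing_settings(env=None):
    """Return the names of required settings that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All settings present")
```

Passing a plain dictionary as `env` makes the check easy to try locally without touching real environment variables.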

And the magic happens once you execute this in your terminal:
serverless deploy
It will pack all your files into a .zip archive and upload it to AWS, then create a CloudWatch Events rule that runs the monitor automatically on the schedule.
The serverless backend of your monitor is now live and running.
Now, if any asset or link on your website is broken (not found, server error, etc.), you will get an email listing the failing URLs.
If there’s an error accessing the target itself, you will get an email describing that error instead.

Keep in mind

At the beginning it will cost you nothing, because the AWS Lambda free usage tier includes 400,000 GB-seconds of compute time per month.
But if you have other Lambda functions, or you multiply the monitors, it can get to a point where they charge you.
With the 30-minute schedule, our current usage is 60/30 * 24 * 30 = 1,440 calls per month.
Each call takes between 1 and 5 seconds, depending on the number of assets, with 256 MB of memory.
So 1,440 * 3 (average seconds) / 4 (256 MB is a quarter of a GB) = 1,080 GB-seconds of usage per month.
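The estimate is easy to recompute for other schedules. With rate(30 minutes) the function runs twice an hour, and Lambda bills GB-seconds as duration times memory in GB (1 GB = 1024 MB). The sketch below assumes the 3-second average and 256 MB figures mentioned above; `monthly_usage` is a hypothetical helper name.

```python
FREE_TIER_GB_SECONDS = 400_000


def monthly_usage(interval_minutes, avg_seconds, memory_mb, days=30):
    """Return (calls, gb_seconds) per month for a fixed-rate schedule."""
    calls = (60 / interval_minutes) * 24 * days
    gb_seconds = calls * avg_seconds * (memory_mb / 1024)
    return calls, gb_seconds


calls, gb_seconds = monthly_usage(interval_minutes=30, avg_seconds=3, memory_mb=256)
print(f"{calls:.0f} calls, {gb_seconds:.0f} GB-seconds "
      f"({gb_seconds / FREE_TIER_GB_SECONDS:.2%} of the free tier)")
# prints: 1440 calls, 1080 GB-seconds (0.27% of the free tier)
```

Changing `interval_minutes` to 1 or raising `memory_mb` shows how quickly several monitors would add up.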
The serverless approach is very convenient in some cases because it allows you to build very scalable solutions. It also changes the way you think about managing and paying for servers.
The Serverless framework gives you a very simple tool to deploy your function without needing to know AWS or any other cloud provider.
AWS gives you a nice free tier for its services, so you can build your MVPs totally for free, go live, and start paying only once you reach a certain number of users.
