How To Monitor a Forum for Keywords Using Python and AWS Lambda

by Kevin (@sahin.kevin), March 5th, 2020

While building ScrapingBee, I check different forums every day to help people with web scraping questions and to engage with the community.

This is very common for early-stage startups. There are many benefits to engaging with potential customers by answering their questions.

First, you get to know them better, which can give you ideas for product development.

Second, you provide value, which makes them trust you.

Some forums have a way to send you alerts about keywords or tags; others don't.

Today we are going to see how you can quickly create a CRON job with AWS Lambda and Python to check for keywords in a forum topic.

Prerequisites

In order to scaffold and deploy our project to AWS Lambda, we will use the Serverless framework.

It's a great project that makes building/configuring your cloud functions really easy with a simple configuration file.

It handles many different clouds (AWS, Google Cloud, Azure...) and different languages.

Here are the instructions to install it: https://serverless.com/framework/docs/providers/aws/guide/quick-start/

We will use two very popular Python packages: Requests to fetch the page and BeautifulSoup to parse the HTML code:

 pip install requests
 pip install beautifulsoup4
 pip freeze > requirements.txt

If we didn't use the Serverless framework, we would need to package the dependencies into a zip archive and upload it to AWS ourselves.

Thanks to Serverless, we can use a plugin that parses the requirements.txt file and automatically takes care of packaging the dependencies for Lambda.

To do so:

 npm init

After accepting all the defaults, install the plugin (for example with npm install --save serverless-python-requirements) and add this to your serverless.yml:


# serverless.yml
plugins:
  - serverless-python-requirements
custom:
  pythonRequirements:
    dockerizePip: non-linux

You can get more information about this here: https://serverless.com/blog/serverless-python-packaging/

Keyword monitoring

We are going to monitor keywords on IndieHackers.com, a popular forum for bootstrapped founders.

Here is a simple piece of code that checks all titles for "design":

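import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin


def hello(event, context):
    base_url = "https://www.indiehackers.com/"
    # Fetch the home page; fail early on network or HTTP errors
    r = requests.get(base_url, timeout=10)
    r.raise_for_status()
    soup = BeautifulSoup(r.text, 'html.parser')

    # Select the title link of every topic in the feed
    matches = soup.select('a.feed-item__title-link')
    keyword = 'design'
    matching_links = []
    for i in matches:
        # Note: this substring test is case-sensitive
        if keyword in i.text:
            # urljoin avoids a double slash when href starts with "/"
            matching_links.append(urljoin(base_url, i.get('href')))

    response = {
        "statusCode": 200,
        "body": matching_links
    }

    return response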
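
The substring test in the handler is case-sensitive and handles a single keyword. As a minimal sketch, assuming the (title, href) pairs have already been extracted from the page, the matching step could be isolated in a small function that ignores case and accepts several keywords. The name find_matches and the sample data are hypothetical and only serve as illustration.

```python
from urllib.parse import urljoin


def find_matches(items, keywords, base_url="https://www.indiehackers.com/"):
    """Return absolute URLs of items whose title contains any keyword.

    items is an iterable of (title, href) pairs; matching ignores case.
    """
    lowered = [k.lower() for k in keywords]
    links = []
    for title, href in items:
        if any(k in title.lower() for k in lowered):
            # urljoin avoids a double slash when href starts with "/"
            links.append(urljoin(base_url, href))
    return links


# Hypothetical sample data standing in for the scraped titles
sample = [
    ("How I found a Designer for my SaaS", "/post/abc"),
    ("Monthly revenue update", "/post/def"),
]
print(find_matches(sample, ["design", "logo"]))
```

Inside the handler, the pairs could be built from the BeautifulSoup results with something like [(a.text, a.get('href')) for a in matches].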

Now all we need to do is send a Slack notification (or an email) when something matches our keyword.

It's really easy with Slack: you just have to create an app to get a webhook URL, as explained here.

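# WEBHOOK_URL is the incoming webhook URL of your Slack app
if matching_links:
    payload = {"text": f"Found a topic matching the keyword on Indie Hackers: {matching_links}"}
    # json= serializes the payload and sets the Content-Type header
    slack_request = requests.post(WEBHOOK_URL, json=payload)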
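
One way to keep the webhook URL out of the code is to read it from an environment variable, and to build the message only when there is something to report. The following is a sketch under those assumptions: the variable name SLACK_WEBHOOK_URL and the helper build_slack_message are hypothetical names chosen for illustration, not part of Slack's or Serverless's API.

```python
import os


def build_slack_message(matching_links, keyword):
    """Return a Slack webhook payload, or None when nothing matched."""
    if not matching_links:
        return None
    lines = "\n".join(f"- {link}" for link in matching_links)
    return {"text": f"Found topics matching '{keyword}' on Indie Hackers:\n{lines}"}


# SLACK_WEBHOOK_URL is an assumed environment variable name
WEBHOOK_URL = os.environ.get("SLACK_WEBHOOK_URL")

message = build_slack_message(["https://www.indiehackers.com/post/abc"], "design")
if message is not None and WEBHOOK_URL:
    # Only reached when a webhook URL is configured
    import requests
    requests.post(WEBHOOK_URL, json=message, timeout=10)
```

With the Serverless framework, such a variable can typically be declared under provider.environment in serverless.yml so that it is available to the function at runtime.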

And voilà, that's it for the code.

Deployment & invocation

In order to invoke your function once it is deployed:

serverless invoke -f hello --log

To automate the function invocation with a CRON job, add a schedule event to the function in your serverless.yml:

functions:
  hello:
    handler: handler.hello
    events:
      - schedule: rate(1 day)

There are different ways to write schedule expressions with AWS; you can find a detailed article here.

And now the deployment command:
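
AWS schedule expressions come in two forms: rate(value unit) and cron(...) with six fields. One detail that is easy to get wrong with rate is that the unit is singular when the value is 1 and plural otherwise. As a small sketch, a hypothetical helper (rate_expression is not an AWS or Serverless function) could build the string you then paste into serverless.yml:

```python
def rate_expression(value, unit):
    """Build an AWS rate expression such as 'rate(1 day)' or 'rate(5 minutes)'.

    unit is the singular form: 'minute', 'hour' or 'day'.
    """
    if unit not in ("minute", "hour", "day"):
        raise ValueError(f"unsupported unit: {unit}")
    if not isinstance(value, int) or value < 1:
        raise ValueError("value must be a positive integer")
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


print(rate_expression(1, "day"))     # rate(1 day)
print(rate_expression(5, "minute"))  # rate(5 minutes)

# A cron expression has six fields: minutes, hours, day-of-month, month,
# day-of-week, year. This one means every day at 09:00 UTC.
DAILY_AT_NINE_UTC = "cron(0 9 * * ? *)"
```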

serverless deploy

And that's it. Easy, right?

I hope you liked this article. It was a short introduction to the Serverless framework and how easy it makes building simple utility scripts like this one.

If you like web scraping, I just wrote an article about the different web scraping tools available. Don't hesitate to take a look.

Stay tuned for other blog posts about web scraping :)