Start capturing website user data in 5 minutes or less with no developer resources or coding experience needed.
With Amazon Web Services’ recent release of Amazon Personalize to general availability, we at Data in the Raw can now share what our data ninjas have been working on. The simplicity and speed of Data in the Raw coupled with the ease of Amazon Personalize now allows users of both platforms to build world-class recommendation engines powered by the same algorithms used on Amazon.com.
Using minimal code (trust us, we will show you below!) and at a fraction of the cost of other platforms, you can start providing impactful recommendations to your users today.
AWS and Data in the Raw remove the historical complexity and overhead of personalization platforms to make implementation fast and ROI immediate.
Data Scientists and Analysts have struggled alongside their marketing counterparts for years to build recommendation engines that directly connect users to products or content. Most U.S. internet users agree that relevant content from brands increases their purchase intent, yet the challenges and costs of providing relevant recommendations seem insurmountable to marketers.
Advancements at AWS and innovation from recent upstart Data in the Raw completely change the game for building recommendation platforms while owning your own data.
Data in the Raw is a proven and preferred partner for many websites looking to own their user data and self-service big data pipeline. At a fraction of the cost of other platforms, Data in the Raw leverages the latest and most sophisticated technologies so our clients can focus on using data to drive decisions and not worry about infrastructure or security.
For more information on Amazon Personalize, check out Amazon Personalize, and as always, to find out more about Data in the Raw, check out </[email protected]> to sign up for a free trial and start capturing data today.
All signed up for Data in the Raw? Let’s go through the tutorial on how to build your first personalization campaign using Data in the Raw and Amazon Personalize.
To familiarize yourself with the process of creating personalization campaigns, our Data in the Raw team recommends you start by following the “Getting Started” documentation for Amazon Personalize using their demo data.
After you know the basics of Amazon Personalize you can begin building a personalized recommendation system for your website utilizing your own Data in the Raw clickstream data.
We will start by defining a different schema in step 1.8 of the Amazon Personalize “Getting Started” documentation. Instead of the default user-item interaction schema suggested by AWS, we will modify it slightly with the schema below.
{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    {
      "name": "USER_ID",
      "type": "string"
    },
    {
      "name": "ITEM_ID",
      "type": "string"
    },
    {
      "name": "TIMESTAMP",
      "type": "long"
    },
    {
      "name": "EVENT_TYPE",
      "type": "string"
    }
  ],
  "version": "1.0"
}
Using this schema allows Data in the Raw clients to ingest data directly from their own data streams. USER_ID should correspond to the Data in the Raw User variable if you have implemented user authentication, or MachineId if you have not. ITEM_ID most commonly corresponds to the Data in the Raw PageLabel variable, while EVENT_TYPE most commonly corresponds to the PageType variable. Lastly, TIMESTAMP corresponds to CreatedAt. For the purposes of this demo we will assume you have correctly added your PageLabel and PageType variables in your Data in the Raw implementation. For more information and a how-to about PageLabel and PageType, visit our documentation page after registering.
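To make the mapping above concrete, here is a small sketch of a helper that converts a single Data in the Raw event into an interaction row matching the schema. The field names (user, machineId, pageLabel, pageType, createdAt) follow the conventions used later in this tutorial; adjust them to match your own implementation.

```python
import time

def to_interaction(event):
    """Map a Data in the Raw event dict onto the USER_ID, ITEM_ID,
    TIMESTAMP, EVENT_TYPE fields of the Avro schema above."""
    # Fall back to MachineId when no authenticated user is present.
    user_id = event.get("user") or event["machineId"]
    # CreatedAt arrives as a string like '2019-06-12 14:03:22.123456';
    # Personalize expects TIMESTAMP as Unix epoch seconds (long).
    ts = int(time.mktime(time.strptime(event["createdAt"], "%Y-%m-%d %H:%M:%S.%f")))
    return {
        "USER_ID": user_id,
        "ITEM_ID": event["pageLabel"],
        "TIMESTAMP": ts,
        "EVENT_TYPE": event["pageType"],
    }
```

The same timestamp conversion appears in the Lambda function later in this tutorial.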
After you have your data imported, you will be able to create a solution and campaign tailored to your individual customers using the steps in the Amazon Personalize “Getting Started” documentation. Our recommendation is to start with the HRNN recipe offered in Amazon Personalize, since it relies only on clickstream data and therefore matches the scope of these tutorials.
Now that you have followed the first tutorial and created a recommendation model based on your own site’s clickstream data, what’s next? A recommender with no new data is pretty boring, right? Next, you use Data in the Raw and AWS Lambda to produce real-time recommendations for your users faster, easier, and cheaper than any other platform, with the scale, reliability, and security of Amazon Web Services.
To start sending data to Amazon Personalize, you have to create a tracking ID. Simply go to Amazon Personalize > Dataset groups > Your dataset group > Event trackers and follow the basic steps. Super simple and you are finished. Since we have completely removed the need for an SDK or any development work, this completes the required configuration of Amazon Personalize.
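If you prefer to script this step instead of using the console, the same event tracker can be created with boto3’s personalize client. This is a sketch under assumptions: the tracker name and dataset group ARN below are placeholders, and the helper names are our own.

```python
def create_event_tracker_params(name, dataset_group_arn):
    """Build the request for personalize.create_event_tracker.
    The call returns the trackingId you will reference from the Lambda."""
    return {"name": name, "datasetGroupArn": dataset_group_arn}

def create_tracker(name, dataset_group_arn):
    # boto3 is imported here so the pure helper above can be
    # exercised without AWS access or credentials.
    import boto3
    personalize = boto3.client("personalize", region_name="us-west-2")
    response = personalize.create_event_tracker(
        **create_event_tracker_params(name, dataset_group_arn)
    )
    return response["trackingId"]
```

Either way, what you need out of this step is the tracking ID itself.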
Now you will use a simple AWS Lambda function that pulls data out of your real-time Data in the Raw pipe and feeds it directly into your Personalize endpoint. Managing data with a Lambda function allows you to enhance, filter, or modify data in multiple ways while keeping added latency minimal.
Our Data in the Raw team demonstrates this example using Python 3, but you can build your function in any supported Lambda language. With Python having so many great packages and the support of Boto3, this solution is a no-brainer for Data Ninjas everywhere.
However, while Boto3 is natively supported in AWS Lambda inline functions, AWS has historically been slow to update Lambda to the most recent Boto3 build. As such, you will not be able to use the inline editing functionality offered by AWS Lambda. Never fear, though: Data in the Raw offers you the packaged Lambda function here to follow along.
After you have downloaded the zip folder, all the code you will need is in the personalizeStream.py file. Let’s review the Python code now. For those of you familiar with Python and more specifically AWS Lambda packaging of Python, feel free to skip forward to the configuration steps.
# Here we are just importing some standard Python packages.
# Not all of these are used in this tutorial, but as you explore more
# they will become very handy, so we went ahead and packaged them for you.
import json
import boto3
import time
import os
import csv

# Here we start building our standard Lambda function.
# AWS provides multiple tutorials on how to build Lambda functions,
# so we will only cover the highest level here.
def lambda_handler(event, context):
    count = 0

    # We define a boto3 resource to access your S3 bucket
    # and a boto3 client to access the Personalize endpoint.
    # You will see several environment variables referenced with the
    # os.environ syntax. We will talk about these variables in the how-to.
    s3 = boto3.resource(
        's3',
        aws_access_key_id=os.environ['aws_access_key_id'],
        aws_secret_access_key=os.environ['aws_secret_access_key'],
        region_name='us-west-2'
    )
    personalize_events = boto3.client(
        service_name='personalize-events',
        aws_access_key_id=os.environ['aws_access_key_id'],
        aws_secret_access_key=os.environ['aws_secret_access_key'],
        region_name='us-west-2'
    )

    # Here we provide two S3 objects.
    # The first is the one you will use to automate your data stream with the
    # S3 "object created" trigger.
    # The second (commented out) is one you can use to test your function
    # against a specific S3 file.
    # If you are not familiar with the Lambda event object, the first form
    # lets us capture the location and name of the file whose creation
    # triggered our Lambda (the data we want to process and send).
    # Finally, we read the data into an object to process and send to
    # Amazon Personalize.
    content_object = s3.Object(os.environ['bucket_id'], event['Records'][0]['s3']['object']['key'])
    # content_object = s3.Object(os.environ['bucket_id'], 'bucket/file')
    file_content = content_object.get()['Body'].read().decode('utf-8')

    # This is a simple date format string to ensure we send data to
    # Personalize in the format we specified.
    p = '%Y-%m-%d %H:%M:%S.%f'

    # This is the Python function we define to send data to Amazon Personalize.
    # As you continue to build out the data you send, you will add
    # data points to the properties.
    def put_e(u, s, e, sA, eT, iI):
        personalize_events.put_events(
            trackingId=os.environ['trackingId'],
            userId=u,
            sessionId=s,
            eventList=[{
                'eventId': e,
                'sentAt': sA,
                'eventType': eT,
                'properties': json.dumps({'itemId': iI})
            }]
        )

    # The following section is where most edits to make the Lambda
    # work for your instance will be made.
    # In this example we assume you have fully integrated Data in the Raw
    # and only want to send product interactions for authenticated users.
    # data is a simple helper that loads and formats the newline-delimited
    # JSON we read from the S3 object.
    data = json.loads("[" + file_content.replace("}\n{", "},\n{") + "]")
    for d in data:
        if d["type"] == "pageload" and d["pageType"] == "product" and d["pageLabel"] != "not set" and "user" in d:
            put_e(d["user"], d["sessionId"], d["id"], int(time.mktime(time.strptime(d["createdAt"], p))), 'viewed', d["pageLabel"])
            count = count + 1

    # Finally, for cases when you are manually testing the Lambda,
    # this returns a count of the lines you sent to Personalize.
    return {'data_sent': str(count)}
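One line worth dwelling on is the json.loads call. Data in the Raw delivers newline-delimited JSON (one object per line) rather than a JSON array, so the replace call stitches the stream into an array before parsing. Here is the same trick in isolation; the sample records are made up for illustration.

```python
import json

# Newline-delimited JSON: one object per line, no surrounding array.
file_content = '{"id": "a", "type": "pageload"}\n{"id": "b", "type": "click"}'

# Inserting commas between adjacent objects and wrapping the whole
# stream in brackets turns it into a single parseable JSON array.
data = json.loads("[" + file_content.replace("}\n{", "},\n{") + "]")
```

This keeps the Lambda dependency-free; for more complex streams you could instead parse each line separately with json.loads per line.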
Since there is a good chance you did not have to make any edits to the code in the prior step, this tutorial continues with the assumption that you have uploaded your packaged zip folder to an S3 bucket and have the proper permissions to access the S3 buckets and Personalize endpoints. If you are having trouble with permissions or configuration, reach out to the Data in the Raw team and we will help wherever we can.
Now, let’s talk about the Lambda configuration and variables followed by the S3 trigger and how to view the logs to make sure your function is working.
After creating your new Lambda function you will be on the configuration page. The following configuration variables should be added and updated to your personal settings:
Code entry type: Upload a file from Amazon S3. The location should follow the format https://s3.amazonaws.com/bucket/file (it is where you stored your zipped code in the previous step)
Runtime: Python 3.6
Handler: personalizeStream.lambda_handler
aws_access_key_id: Provided in your AWS permissions when you set up users
aws_secret_access_key: Provided in your AWS permissions when you set up users
bucket_id: The S3 Bucket where Data in the Raw delivers data to
trackingId: The tracking ID you created in Amazon Personalize, found under Event Trackers > Event Tracker Overview > Tracking ID
Timeout: Typically the default is fine, but for safety we have increased this to 1 minute
Now that you have successfully built your Lambda function you need to tell it when to run. Since Data in the Raw continuously streams data to a provided S3 location this step is very simple.
First, add an S3 trigger to your Lambda function in the Designer pane. Your configurations should be as follows:
Bucket: The bucket connected to your Data in the Raw pipeline
Event Type: All object create events
Prefix: your folder (if you provided one), followed by the really long sequence Data in the Raw created to deliver your data to
Enable Trigger: TRUE
Since Data in the Raw delivers data in real time, we have adopted this file naming convention: really long sequence > year > month > day > files
Since we defined the Prefix up to the really long sequence, your Lambda will now trigger for every file created in this location, continuously updating your personalized recommendation engine with real-time data.
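To make the trigger’s scope concrete, here is a tiny sketch of how the Prefix filter behaves against keys laid out by that naming convention. The folder and sequence names are placeholders, not your real values.

```python
# Illustrative key layout: <folder>/<long sequence>/<year>/<month>/<day>/<file>.
PREFIX = "your-folder/aLongDeliverySequence/"

def matches_trigger(key, prefix=PREFIX):
    """Mimic the S3 trigger's Prefix filter: the Lambda fires only for
    newly created objects whose key starts with the configured prefix."""
    return key.startswith(prefix)
```

Anything Data in the Raw writes under that prefix, on any date, invokes the Lambda; objects elsewhere in the bucket do not.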
Congratulations! You have now officially created your own world-class personalization engine that uses the same algorithms as Amazon.com, at a fraction of the cost, while owning all your data.
To monitor your Lambda, AWS offers multiple options. The easiest is to view your CloudWatch logs from the Lambda interface: simply go to Lambda > Functions > Your Function > Monitoring > View logs in CloudWatch. This gives you information about each run of your Lambda and is where you can find error reports if you ever have any.
Testing your Amazon Personalize recommendations is done the same way as in the Amazon tutorial. Use the User variable (MachineId if you have not fully integrated Data in the Raw) for the user you want to view recommendations for. A fun way to test is to use your own user variable; this will help you understand and contextualize how Amazon Personalize accounts for new data.
At this point you have an endpoint and API that can be called from any web-based script to return personalized recommendations.
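As a minimal sketch of that call, fetching recommendations with boto3’s personalize-runtime client might look like the following. The campaign ARN, region, and helper names are placeholders, not values from your account.

```python
def get_recommended_items(campaign_arn, user_id, num_results=10):
    """Call a Personalize campaign and return a plain list of item ids
    (your PageLabel values)."""
    # boto3 is imported here so the pure helper below can be
    # exercised without AWS access or credentials.
    import boto3
    runtime = boto3.client("personalize-runtime", region_name="us-west-2")
    response = runtime.get_recommendations(
        campaignArn=campaign_arn,
        userId=user_id,
        numResults=num_results,
    )
    return extract_item_ids(response)

def extract_item_ids(response):
    # get_recommendations returns an itemList of dicts,
    # one per recommended item.
    return [item["itemId"] for item in response.get("itemList", [])]
```

The returned item ids are the PageLabel values you ingested, so they map straight back to pages or products on your site.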
These recommendations can be used anywhere, whether on your site, in email, with customer service or on a store front. At Data in the Raw, we love to hear how you are using your data and we love to help whenever we can. For additional information or help with integrations please reach out to Data in the Raw.
As always: Own your data, Build cool stuff, And Ninja on. Get started at </[email protected]>