By making off-the-rack machine learning models accessible for anyone to use, cloud ML services like Amazon Personalize help make ML-driven customer experiences available to teams at any scale. You no longer need in-house data science and machine learning experts to get the benefit of propensity scoring or product recommendations.

Key Challenges with Machine Learning

However, while models can be outsourced, your data can't. The effectiveness of machine learning insights will always be limited by the quality and completeness of the data they are based on. Cloud ML platforms (by themselves) leave three key challenges unsolved:

- Collecting and supplying quality user data to train and update your model
- Making the insights gained from the model available where they are needed
- Knowing how well your ML-driven experiences are working

These are infrastructure challenges, and one way they can be overcome is with a Customer Data Platform (CDP). The goal of a CDP is to get customer data from wherever it is, organize it into a single view of the customer, and make that view available to all services that need it. Instead of treating machine learning as just another data silo, a CDP can help you build machine learning insights into your core data infrastructure by connecting ML-driven learnings to additional external services for activation. Let's dig into how a CDP can help you solve each of the three infrastructure challenges.

1. Collecting and Supplying Quality User Data

To train an ML model, you need accurate data about user behavior, and lots of it. Data quality can be broken down into three components:

- Identity resolution - To generate recommendations based on all of a user's actions, you need to be able to resolve user identity. Many off-the-rack ML solutions skip this requirement, tracking activity occurring on a particular device and calculating insights for that device only. This method is convenient, but it doesn't reflect a customer's true history of interaction with your brand across your digital properties, and can therefore lead to incomplete insights. Identity resolution is a core capability of a CDP.
- Consistency across platforms - Once you solve the identity resolution challenge, you still need to map data from all those different sources to a single schema that you can use to train your model. This means bringing together multiple teams of developers across multiple languages and platforms to collect data under a single schema.
- Updating in real time - Finally, you need to upload all that data to your ML platform and keep updating it in close to real time, or your recommendations will quickly become outdated.

2. Making ML Insights Available and Actionable

Just as the data that powers an ML model can come from any platform, the insights that machine learning models generate are most valuable when they can be used to power personalized experiences on your website, apps, brick-and-mortar stores, call centers, etc. Without modern customer data infrastructure, making ML actionable is a huge challenge. For example, say you've used ML to generate churn risk scores for your customers. Without the ability to automatically connect those insights to additional systems:

- Can your call center automation system treat high-risk customers differently?
- Do your customer support representatives know when they're speaking with a high churn risk customer?
- Can your website surface retention offers?
- Can you segment on churn risk in your ESP?
Without the data connections provided by a CDP, making your ML scores available where they're needed would require dedicated development work and additional cost.

3. The Project: Personalized Product Recommendations

Items4U ("The finest items, which you will particularly enjoy") operates a retail business across its website, native iOS and Android apps, and a network of brick-and-mortar stores throughout the country. Our challenge is that the sheer number of items we offer can make the shopping experience on our apps feel a little scattershot. I want to use ML to figure out which products I should focus on surfacing for each user. By the end of this project, I'll have set up a mechanism to deliver personalized product recommendations to each user, which will automatically continue to grow and improve over time.

I'll be using Amazon Personalize, with mParticle as my Customer Data Platform. At the end of the project, I'll be using Amplitude to measure success.

At a high level, the data flow looks like this:

- mParticle collects commerce data from my website, apps and stores. Each action is attributed to a master mParticle User ID and forwarded on to Amazon, using Amazon's Kinesis streaming service.
- An AWS Lambda function converts the data into a format that can be used to train ML models and uploads it to Amazon Personalize.
- The same function requests custom product recommendations from Amazon Personalize, and uploads the recs back to a master customer profile in mParticle.
- The mParticle customer profile powers personalization on the Items4U website and apps, as well as making the same information available in my messaging and analytics platforms.

There's a fair amount of work required to set up the AWS assets we need, but the good news is that most of it can be automated for subsequent iterations. For this reason, I'm using the AWS CLI and other scripting-friendly tools wherever possible.

In this post, we'll walk through how to:

- Collect commerce event data through mParticle
- Create a Kinesis Stream and start streaming event data from mParticle
- Create a Personalize dataset group
- Create an AWS Lambda function to load data into my Personalize dataset group until I have enough data to train an ML model
- Create a Personalize campaign
- Update my Lambda function to request recommendations for each customer and store the recommendations on mParticle's customer profile

Collect Data with a CDP

To train an ML model to give product recommendations, I need data about how my customers interact with products. Fortunately, I don't have to start from scratch just for ML. Capturing commerce data is a core function of mParticle, and by the time a retail brand like Items4U is ready to explore ML, the required data is already being captured and used for more basic use cases, like app analytics, segmentation and re-targeting.

When ready to begin integrating ML with a CDP, I've already:

- Set up inputs to collect data from the following channels: iOS, Android, Web, Custom Feed (Point of Sale), Custom Feed (Amazon Personalize)
- Added mParticle's client-side SDKs to my iOS, Android and Web apps, and configured my point-of-sale platform to forward purchase events to mParticle using the NodeJS server-side SDK (see the sketch below).
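For reference, wiring up the web input comes down to initializing mParticle's web SDK with that input's API key. Here's a minimal sketch using the npm distribution; the key is a placeholder, and I'm omitting any optional configuration beyond the development-mode flag:

```javascript
// Hypothetical web app entry point; 'YOUR_WEB_API_KEY' is a placeholder for the web input's key
import mParticle from '@mparticle/web-sdk';

mParticle.init('YOUR_WEB_API_KEY', {
    isDevelopmentMode: true // switch off for production traffic
});
```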
Capture Product Interactions

mParticle uses a single standard schema for capturing commerce events, and this schema is enforced by the SDKs. This means I don't have to rely on individual developers on each platform picking the right event names. To my ML model, a purchase made through the iOS app will look the same as a purchase made on the website, or in-store. For example, here's how I would log a purchase on my web app.

```javascript
// 1. Create the product
var product = mParticle.eCommerce.createProduct(
    'Skateboard', // Name
    'prsx-10',    // SKU
    100.00,       // Price
    1             // Quantity
);

// 2. Summarize the transaction
var transactionAttributes = {
    Id: 'some-transaction-id',
    Revenue: 100,
    Tax: 9
};

// 3. Log the purchase event
mParticle.eCommerce.logProductAction(
    mParticle.ProductActionType.Purchase,
    [product],
    null, // optional custom attributes would go here
    null, // optional custom flags would go here
    transactionAttributes);
```

What mParticle forwards to downstream services, like my ML model (stripped down to just the fields we care about), will look like this:

```json
{
    "mpid": 761556044463767215, // master user identity
    "environment": "production",
    "user_identities": {
        "email": "user99@example.com"
    },
    "user_attributes": {
        "$firstname": "Milo",
        "$lastname": "Minderbinder"
    },
    "events": [{
        "data": {
            "product_action": {
                "action": "view_detail", // Other actions are "add_to_cart", "remove_from_cart", and "purchase"
                "products": [{
                    "id": "prsx-10", // Product SKU
                    "price": 100
                }]
            },
            "timestamp": 1604695231872
        },
        "event_type": "commerce_event"
    }]
}
```

Identity Resolution

Ideally, my product interaction data is linked to a customer ID that works on my website, on my mobile apps and in-store. Here, that's the mParticle ID (MPID). mParticle's identity resolution allows me to gradually build up identities for each channel and resolve those identities to a single MPID. For example: when a customer visits the website for the first time, I can link a cookie ID to the MPID. If the customer creates an account, I can add an email address, and perhaps a phone number. If they make an online purchase, I can add a secure hash of their credit card number. This means that if the same person then makes a purchase in a physical store with the same credit card, I can attribute that purchase to the same customer profile. This process lets me train my ML models based on a complete set of customer interactions.

Create the AWS Assets

For this use case I need to bring together mParticle and four AWS services:

- Kinesis: A stream receives events from mParticle.
- Personalize: A campaign creates product recommendations.
- Lambda: A function acts as a broker. It transforms data from mParticle into a format accepted by Personalize, and uploads product recommendations back to mParticle.
- IAM: Controls access and permissions for the other components.

These services can be configured in the AWS UI, but I'll be using Amazon's CLI tool. This way, I can reuse my work by creating a script to quickly spin up future iterations. I've followed Amazon's documentation to create an IAM user with access to the above four systems and log in to the console.

As I go, I'll save the Amazon Resource Name (ARN) for each asset I create. I'll need these ARNs to set up interactions between the different resources.

Create a Kinesis Stream

Kinesis is a tool for processing streaming data. mParticle will forward commerce event data to Kinesis, where it will be picked up by the Lambda function I'll set up later.

1. Create the stream

```bash
aws kinesis create-stream \
    --stream-name Items4UCommerceEventStream \
    --shard-count 1
```

Save the StreamARN from the response.

2. Create a role for mParticle to assume

For mParticle to be able to upload to the Kinesis stream, I need to create an IAM role for mParticle to assume. This role needs a policy allowing PutRecord access to Kinesis (sample), and a trust policy (sample) allowing mParticle to assume the role.
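The sample files aren't reproduced in full here, but the permissions policy (mp-kinesis-role.json) is just a standard IAM policy granting PutRecord on the new stream. A sketch, with the region and account ID as placeholders (the trust policy isn't shown because it needs mParticle's AWS account details, which come from the mParticle docs):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "kinesis:PutRecord",
            "Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/Items4UCommerceEventStream"
        }
    ]
}
```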
```bash
aws iam create-role \
    --role-name mparticle-kinesis-role \
    --assume-role-policy-document file:///path/to/mp-trust-policy.json

aws iam put-role-policy \
    --role-name mparticle-kinesis-role \
    --policy-name mp-kinesis-put \
    --policy-document file:///path/to/mp-kinesis-role.json
```

3. Connect mParticle to Kinesis

mParticle offers an "event" output for streaming event data to Kinesis. This can be set up and controlled from the mParticle dashboard. You can read an overview of event outputs in the mParticle docs.

Create Configuration

First, I need to create an overall configuration for Kinesis. This holds all the settings that will remain the same for every input I connect. Each mParticle integration requires different settings; for example, API keys are commonly required. For Kinesis, I've already granted mParticle write access using IAM, so I only need to provide my AWS account number here.

Connect All Sources

Now I need to connect each of my four inputs: iOS, Android, Web and POS, to Kinesis.

Set Filters

mParticle lets me switch each individual event name on or off for a particular output, like Kinesis. These filters help me ensure that I'm only sending to Kinesis the data that I need to train my ML model. I'm interested in 4 types of commerce events:

- Add to cart
- Add to wishlist
- Purchase
- View detail

In my filter settings, I leave these four events on, and turn everything else off.

Create a Dataset Group

Now I'm streaming events from mParticle to Kinesis, hurrah! But Kinesis is only a staging area. From here, I need to load the events into an Amazon Personalize Dataset Group. A Dataset Group is an overall container for a set of user data that can be used to train an ML model.

```bash
aws personalize create-dataset-group --name Items4UCommerceEvents
```

Save the datasetGroupArn from the response.

Create a Schema, Dataset, and Tracker

A Dataset Group can include up to three datasets:

- Items: Contains detail about products, including price, category, color, etc.
- Users: Contains detail about customers, like age, location, gender, etc.
- Interactions: Details interactions between users and items. For example, a user viewing a product, purchasing it, or adding it to a cart or wishlist.

Only the Interactions dataset is required, so to keep things simple it's the only one I'll use. I can come back later and improve future iterations of my model by adding other datasets.

Before I can create the dataset, I need a schema. For this example, I use the following elements:

- User ID - this will be the mParticle ID
- Session ID - mParticle automatically creates a unique ID for each session, which I can use
- Item ID - this will be the SKU of the product
- Event Type - this will be the type of product interaction: Add to Cart, Add to Wishlist, Purchase, or View Detail
- Timestamp - the time of the interaction. mParticle automatically records a timestamp for each interaction
As a Personalize JSON schema, it looks like this:

```json
{
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        { "name": "USER_ID", "type": "string" },
        { "name": "SESSION_ID", "type": "string" },
        { "name": "ITEM_ID", "type": "string" },
        { "name": "EVENT_TYPE", "type": "string" },
        { "name": "TIMESTAMP", "type": "long" }
    ],
    "version": "1.0"
}
```

1. Create the schema:

```bash
aws personalize create-schema \
    --name Items4UCommerceEventSchema \
    --schema file:///path/to/items4u-commerce-event-schema.json
```

Save the schemaArn from the response.

2. Create the dataset:

```bash
aws personalize create-dataset \
    --name Items4UCommerceEventsDataset \
    --schema-arn {{saved schema arn}} \
    --dataset-group-arn {{saved dataset group arn}} \
    --dataset-type Interactions
```

Save the datasetArn from the response.

3. Create the tracker:

A tracker is an ID linked to the dataset that lets me upload events.

```bash
aws personalize create-event-tracker \
    --name Items4UCommerceEventTracker \
    --dataset-group-arn {{saved dataset group arn}}
```

Save the trackingId from the response.

Train the Model

In order to train a Machine Learning solution, I need at least 1000 records in my dataset. One way to do this is to upload CSVs of historical events. mParticle integrates with several Data Warehouses, including Amazon Redshift. If I have access, I can easily create a training set from my past data. The CSV would look something like this:

```
USER_ID,EVENT_TYPE,ITEM_ID,SESSION_ID,TIMESTAMP
761556044463767215,view_detail,prsx-23,Q8bQC4gnO8J7ewB,1595492950
-6907502341961927698,purchase,prsx-14,VA9AUJBhoJXAKr7,1595492945
```

However, training the model on historical data is not strictly required, and since data warehouse access is often tightly controlled, this step can be a huge bottleneck in attempts to implement ML. An alternative way to train the model is simply to start forwarding real-time event data as it comes in. To do this I need to set up my Lambda function. Eventually, the function will perform three tasks every time a new event is received at my Kinesis stream:

- Transform the mParticle data into my Interactions schema, and upload it to my Personalize dataset.
- Call my Personalize Campaign and ask for updated product recommendations for the user.
- Use the mParticle API to store the updated recommendations on the mParticle user profile.

However, since I can't create a Personalize Campaign until I can train a Solution, this first version of the Lambda performs only the first task, while I collect the minimum 1000 events.

Lambdas can use several different languages and runtimes. I'll use Node for mine.
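Before writing the handler, I set up the function's dependencies locally. This is just a sketch of my local setup: the aws-sdk package is already available in the Lambda Node runtime, json-bigint is needed to parse 64-bit MPIDs, and the mparticle package won't be used until I complete the function later in this post.

```bash
mkdir Items4UPersonalizeLambda && cd Items4UPersonalizeLambda
npm init -y
npm install json-bigint mparticle
```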
The first version looks like this:

```javascript
// Import Dependencies
const AWS = require('aws-sdk');
const JSONBig = require('json-bigint')({ storeAsString: true }); // needed to parse 64-bit integer MPID

// Define the product actions we want to report to Personalize
const report_actions = ["purchase", "view_detail", "add_to_cart", "add_to_wishlist"];

// Initialize Personalize
const personalizeevents = new AWS.PersonalizeEvents({ apiVersion: '2018-03-22' });

exports.handler = (event, context) => {
    for (const record of event.Records) {
        // Parse encoded payload
        const payload = JSONBig.parse(Buffer.from(record.kinesis.data, 'base64').toString('ascii'));

        // Extract required params
        const events = payload.events;
        const mpid = payload.mpid;
        const sessionId = payload.message_id;
        const params = {
            sessionId: sessionId,
            userId: mpid,
            trackingId: process.env.TRACKING_ID
        };

        // Get interactions from events array
        const eventList = [];
        for (const e of events) {
            if (e.event_type === "commerce_event" && report_actions.indexOf(e.data.product_action.action) >= 0) {
                const timestamp = Math.floor(e.data.timestamp_unixtime_ms / 1000);
                const action = e.data.product_action.action;
                const event_id = e.data.event_id;
                for (const product of e.data.product_action.products) {
                    const obj = { itemId: product.id };
                    eventList.push({
                        properties: obj,
                        sentAt: timestamp,
                        eventId: event_id,
                        eventType: action
                    });
                }
            }
        }

        // Upload interactions to tracker
        if (eventList.length > 0) {
            params.eventList = eventList;
            personalizeevents.putEvents(params, function (err) {
                if (err) console.log(err, err.stack);
                else console.log(`Uploaded ${eventList.length} events`);
            });
        }
    }
};
```

1. Create the IAM role:

As before, I need to create an IAM role, this time granting my Lambda function the permissions it needs to access Kinesis and Personalize. The necessary trust policy can be found here.

```bash
aws iam create-role \
    --role-name items4u-lambda-personalize-role \
    --assume-role-policy-document file:///path/to/lambda-trust-policy.json
```

Save the Role.Arn from the response.

I can use off-the-rack managed policies to grant access to Kinesis and Personalize:

```bash
aws iam attach-role-policy \
    --role-name items4u-lambda-personalize-role \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaKinesisExecutionRole

aws iam attach-role-policy \
    --role-name items4u-lambda-personalize-role \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess
```

2. Create the Lambda:

To create the Lambda I need a zip file including the function itself, as well as its dependencies in the node_modules folder. I'll also need the mParticle API credentials for the Custom Feed I created for Amazon Personalize, supplied as environment variables for the Lambda, along with the Dataset Tracker ID.

```bash
aws lambda create-function \
    --function-name Items4UPersonalizeLambda \
    --runtime nodejs12.x \
    --zip-file fileb:///path/to/Items4UPersonalizeLambda.zip \
    --role {{role arn}} \
    --handler index.handler \
    --environment Variables="{MP_KEY=SomeAccessKey,MP_SECRET=SomeAccessSecret,TRACKING_ID=SomeTrackingID}"
```

3. Create an event-source mapping:

Configure the Lambda to be triggered by new events received at the Kinesis stream.

```bash
aws lambda create-event-source-mapping \
    --function-name Items4UPersonalizeLambda \
    --event-source-arn {{Kinesis stream arn}} \
    --starting-position LATEST
```
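Before moving on, I can sanity-check that the trigger was created. This step is optional; it just lists the mappings attached to the function:

```bash
aws lambda list-event-source-mappings \
    --function-name Items4UPersonalizeLambda
```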
Wait...

By now, every time a commerce event is collected across any of my app platforms, mParticle is forwarding it to Kinesis. From here, the Lambda uploads the event to my Personalize dataset. Now I need to wait to get at least 1000 records loaded. This can take some time. In the meantime, I can check the logs in AWS Cloudwatch to make sure the Lambda function is being invoked as expected.

Create an ML Campaign

A Personalize Campaign requires three components:

- A "Solution", which describes the particular ML recipe we want to use for the campaign. One dataset group can contain many solutions.
- A "Solution Version" is an instance of a Solution trained on a specific dataset.
- The "Campaign" is what will actually dispense product recommendations for a user.

1. Create a Solution:

For this example I'll use Amazon's 'User Personalization' recipe.

```bash
aws personalize create-solution \
    --name Items4URecsSolution \
    --dataset-group-arn {{dataset group ARN}} \
    --recipe-arn arn:aws:personalize:::recipe/aws-user-personalization
```

Save the solutionArn from the response.

2. Create a Solution Version:

```bash
aws personalize create-solution-version \
    --solution-arn {{solution ARN}}
```

Save the solutionVersionArn from the response.

The solution version takes some time to create. I can check in on its progress regularly with describe-solution-version until the response shows status: ACTIVE.

```bash
aws personalize describe-solution-version \
    --solution-version-arn {{solution version ARN}}
```

3. Create the Campaign:

```bash
aws personalize create-campaign \
    --name Items4UProductRecsCampaign \
    --solution-version-arn arn:aws:personalize:us-east-1:521255666488:solution/Items4URecsSolution/f58f24b6 \
    --min-provisioned-tps 1
```

Complete the Lambda

The final step is to update my Lambda function to request product recommendations from my new campaign, and send those recommendations back to mParticle. The updated Lambda looks like this:

```javascript
// Import Dependencies
const AWS = require('aws-sdk');
const JSONBig = require('json-bigint')({ storeAsString: true }); // needed to parse 64-bit integer MPID
const mParticle = require('mparticle');

// Define the product actions we want to report to Personalize
const report_actions = ["purchase", "view_detail", "add_to_cart", "add_to_wishlist"];

// Initialize Personalize and mParticle
const personalizeevents = new AWS.PersonalizeEvents({ apiVersion: '2018-03-22' });
const personalizeruntime = new AWS.PersonalizeRuntime({ apiVersion: '2018-05-22' });
const mp_api = new mParticle.EventsApi(new mParticle.Configuration(process.env.MP_KEY, process.env.MP_SECRET));

exports.handler = (event, context) => {
    for (const record of event.Records) {
        // Parse encoded payload
        const payload = JSONBig.parse(Buffer.from(record.kinesis.data, 'base64').toString('ascii'));

        // Extract required params
        const events = payload.events;
        const mpid = payload.mpid;
        const sessionId = payload.message_id;
        const params = {
            sessionId: sessionId,
            userId: mpid,
            trackingId: process.env.TRACKING_ID
        };

        // Get interactions from events array
        const eventList = [];
        for (const e of events) {
            if (e.event_type === "commerce_event" && report_actions.indexOf(e.data.product_action.action) >= 0) {
                const timestamp = Math.floor(e.data.timestamp_unixtime_ms / 1000);
                const action = e.data.product_action.action;
                const event_id = e.data.event_id;
                for (const product of e.data.product_action.products) {
                    const obj = { itemId: product.id };
                    eventList.push({
                        properties: obj,
                        sentAt: timestamp,
                        eventId: event_id,
                        eventType: action
                    });
                }
            }
        }

        if (eventList.length > 0) {
            params.eventList = eventList;
            // Upload interactions to tracker
            personalizeevents.putEvents(params, function (err, data) {
                if (err) console.log(err, err.stack);
                else {
                    // Request product recs
                    var params = {
                        campaignArn: process.env.CAMPAIGN_ARN,
                        numResults: '5',
                        userId: mpid
                    };
                    personalizeruntime.getRecommendations(params, function (err, data) {
                        if (err) console.log(err, err.stack);
                        else {
                            console.log(`Uploaded ${eventList.length} events`);
                            // Upload product recs to mParticle
                            const batch = new mParticle.Batch(mParticle.Batch.Environment.development);
                            batch.mpid = mpid;
                            const itemList = [];
                            for (const item of data.itemList) {
                                itemList.push(item.itemId);
                            }
                            batch.user_attributes = {};
                            batch.user_attributes.product_recs = itemList;
                            const event = new mParticle.AppEvent(mParticle.AppEvent.CustomEventType.other, 'AWS Recs Update', { product_recs: itemList.join() });
                            batch.addEvent(event);
                            console.log(JSON.stringify(batch));
                            const callback = function (error, data, response) {
                                if (error) {
                                    console.error(error);
                                } else {
                                    console.log('Product Recs updated successfully');
                                }
                            };
                            mp_api.uploadEvents(batch, callback);
                        }
                    });
                }
            });
        }
    }
};
```

As well as updating the code, I also need to add the CAMPAIGN_ARN environment variable.

When I request recs from a Personalize campaign, I can specify the number of recommendations I want. Here, I'm going for a top 5 -- enough to populate a carousel or an email widget.

The payload uploaded to mParticle by the Lambda will look like this:

```json
{
    "environment": "development",
    "mpid": "-6907502341961927698",
    "user_attributes": {
        "product_recs": ["prsx-4", "prsx-2", "prsx-15", "prsx-30", "prsx-28"]
    },
    "events": [
        {
            "data": {
                "custom_event_type": "other",
                "event_name": "AWS Recs Update",
                "custom_attributes": {
                    "product_recs": "prsx-4,prsx-2,prsx-15,prsx-30,prsx-28"
                }
            },
            "event_type": "custom_event"
        }
    ]
}
```

This payload records the product recommendations in two ways:

- As an event, to record what the current recs were at a specific time.
- As a user attribute. User attributes are kept up-to-date for each user by mParticle as new data is received. mParticle enriches incoming data with a complete set of user attributes, so any user activities captured on any platform will include the current set of recs as context.

I've now set up a Machine Learning system that can generate a set of product recommendations for every user and update them each time the user interacts with a product. Unlike a model trained on a one-off CSV upload, mine will continue to get better over time as the results of successful and unsuccessful recommendations feed back into the model in a flywheel pattern. By sending the recommendations back to my Customer Data Platform, I can use them for more granular analytics and customer experience use cases.

Tracking Success

Once we've set up the infrastructure to generate, continuously refine, and activate ML insights, the final piece of the puzzle is to figure out what works and what doesn't. For that, I need my Data Warehouse and my analytics platforms, such as Google Analytics or Amplitude.

The commerce data I'm collecting with mParticle is already enough to help me identify general trends. For example, I can tell if the average lifetime value of users is increasing since I started applying ML insights. To dig deeper, I need to understand which ML campaigns I'm deploying for each user, so that I can compare how successful they are. For example, I might want to compare results for my initial product recommendations recipe against results for a control group that sees a default set of products. Alternatively, if I go back and enrich my ML model with additional datasets, or try a different recipe altogether, I'll want to test the new campaign against the original to check that I've actually improved my outcomes.
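Setting up a second campaign to test against follows exactly the same steps as before. As a sketch, reusing the same dataset group with placeholder ARNs, a simple popularity-based recipe could serve as the "default products" baseline:

```bash
aws personalize create-solution \
    --name Items4UBaselineSolution \
    --dataset-group-arn {{dataset group ARN}} \
    --recipe-arn arn:aws:personalize:::recipe/aws-popularity-count
```

From there, I'd create a solution version and a campaign from it just as in the previous section, giving me a second campaign ARN to test against.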
We've already seen that maintaining a single complete customer profile helped me activate on ML insights across all platforms. The same benefits apply to analytics. By storing experiment and variant information on the user profile, mParticle automatically makes that data available to any analytics tools that you are forwarding customer data to.

I can use a service like Optimizely to set up my experiments, or I can set up a quick A/B test, just by tweaking my Lambda code a little. Below is a version of the Lambda I set up in part 2 of this post, modified to do a few extra tasks:

- Check to see if the current user has already been assigned to the A or B variant.
- Assign the user to a variant group, if necessary.
- Request product recs from one of two campaigns, depending on the variant.
- Record the variant info on the mParticle user profile.

Changes from earlier are marked with comments.

```javascript
const AWS = require('aws-sdk');
const JSONBig = require('json-bigint')({ storeAsString: true });
const mParticle = require('mparticle');
const trackingId = "bd973581-6505-46ae-9939-e0642a82b8b4";
const report_actions = ["purchase", "view_detail", "add_to_cart", "add_to_wishlist"];
const personalizeevents = new AWS.PersonalizeEvents({ apiVersion: '2018-03-22' });
const personalizeruntime = new AWS.PersonalizeRuntime({ apiVersion: '2018-05-22' });
const mp_api = new mParticle.EventsApi(new mParticle.Configuration(process.env.MP_KEY, process.env.MP_SECRET));

exports.handler = function (event, context) {
    for (const record of event.Records) {
        const payload = JSONBig.parse(Buffer.from(record.kinesis.data, 'base64').toString('ascii'));
        const events = payload.events;
        const mpid = payload.mpid;
        const sessionId = payload.message_id;
        const params = {
            sessionId: sessionId,
            userId: mpid,
            trackingId: trackingId
        };

        // Check for variant and assign one if not already assigned
        const variant_assigned = Boolean(payload.user_attributes.ml_variant);
        const variant = variant_assigned ? payload.user_attributes.ml_variant : Math.random() > 0.5 ? "A" : "B";

        const eventList = [];
        for (const e of events) {
            if (e.event_type === "commerce_event" && report_actions.indexOf(e.data.product_action.action) >= 0) {
                const timestamp = Math.floor(e.data.timestamp_unixtime_ms / 1000);
                const action = e.data.product_action.action;
                const event_id = e.data.event_id;
                for (const product of e.data.product_action.products) {
                    const obj = { itemId: product.id };
                    eventList.push({
                        properties: obj,
                        sentAt: timestamp,
                        eventId: event_id,
                        eventType: action
                    });
                }
            }
        }

        if (eventList.length > 0) {
            params.eventList = eventList;
            personalizeevents.putEvents(params, function (err, data) {
                if (err) console.log(err, err.stack);
                else {
                    var params = {
                        campaignArn: process.env[`CAMPAIGN_ARN_${variant}`], // Select campaign based on variant
                        numResults: '5',
                        userId: mpid
                    };
                    personalizeruntime.getRecommendations(params, function (err, data) {
                        if (err) console.log(err, err.stack);
                        else {
                            const batch = new mParticle.Batch(mParticle.Batch.Environment.development);
                            batch.mpid = mpid;
                            const itemList = [];
                            for (const item of data.itemList) {
                                itemList.push(item.itemId);
                            }
                            batch.user_attributes = {};
                            batch.user_attributes.product_recs = itemList;
                            // Record variant on mParticle user profile
                            if (!variant_assigned) {
                                batch.user_attributes.ml_variant = variant;
                            }
                            const event = new mParticle.AppEvent(mParticle.AppEvent.CustomEventType.other, 'AWS Recs Update');
                            event.custom_attributes = { product_recs: itemList.join() };
                            batch.addEvent(event);
                            const mp_callback = function (error, data, response) {
                                if (error) {
                                    console.error(error);
                                } else {
                                    console.log('API called successfully.');
                                }
                            };
                            mp_api.uploadEvents(batch, mp_callback);
                        }
                    });
                }
            });
        }
    }
};
```
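This version assumes the two campaign ARNs are available to the function as CAMPAIGN_ARN_A and CAMPAIGN_ARN_B environment variables. A sketch of how I might set them (keys and ARNs are placeholders, and since update-function-configuration replaces the whole variable map, the existing variables are repeated):

```bash
aws lambda update-function-configuration \
    --function-name Items4UPersonalizeLambda \
    --environment Variables="{MP_KEY=SomeAccessKey,MP_SECRET=SomeAccessSecret,CAMPAIGN_ARN_A={{campaign A ARN}},CAMPAIGN_ARN_B={{campaign B ARN}}}"
```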
With variant information stored on the user profile and automatically passed to Amplitude, I can now group my users by variant in any of my Amplitude charts.

User journeys are not always linear. For example, a customer might look at a recommended product on my website and not buy it immediately, but pick it up later in a brick-and-mortar store, or when they next use the native app. If I'm running my experiments and analytics on a per-device basis, I'll miss that conversion. Note that because both my analytics in Amplitude and the user bucketing for my A/B test are based on mParticle's master MPID, my A/B test is more complete than if I had bucketed per device. By using the mParticle ID, I can capture the full effect of my campaigns.

Wrapping Up

In Machine Learning, as in all data-centric tasks, the right infrastructure is key. When you use a CDP like mParticle to center your data infrastructure around a single, cross-platform customer record, your ML campaigns will be faster to set up, and more effective. By using a CDP, you can:

- Train your model with data about your customers, rather than just a set of devices and browser sessions.
- Use centralized quality controls to train your model on the best possible dataset, one that's consistent across all your data sources.
- Use scores and recommendations anywhere - not just on your website and apps, but in emails, push messages or even in your brick-and-mortar stores.
- Easily set up A/B tests and try out different models.
- Track the success of your ML experiences in your analytics platform.

To learn more about this use case, you can watch me demo it live with AWS on twitch.tv.

Previously published here.