
Building Machine Learning Models Using AWS SageMaker

by Priya Kumari, November 4th, 2021

Too Long; Didn't Read

Amazon SageMaker is a popular, fully managed service from Amazon that allows developers and data scientists to build, train, and deploy machine learning models quickly and without hassle. The platform removes the heavy lifting from every step of the machine learning process, which makes it easy to deploy quality models. SageMaker provides all the components used for machine learning in a single toolset, so models get to production faster, with much less effort and at much lower cost.



Amazon SageMaker is a popular, fully managed service from Amazon that allows developers and data scientists to build, train, and deploy machine learning models quickly and without hassle. The platform offers several advantages to end users by removing the heavy lifting from every step of the machine learning process, which makes it easy to deploy quality machine learning models:


• Users can build their own models for labeling workflows

• Amazon SageMaker Studio provides the first fully integrated development environment for machine learning

• Data scientists and ML engineers can build and deploy their own MXNet or TensorFlow models

• Marketers can automatically create high-quality machine learning models with the help of Amazon SageMaker Autopilot

• Users can also select ML instances on the fly


Using Amazon SageMaker, a fully managed service, developers and data scientists can quickly build, train, and deploy machine learning models, for example with the built-in XGBoost algorithm.


Taking ML models from conceptualization to production is often complex and time-consuming. One has to manage large amounts of data to train the model, choose the best algorithm for training, manage the compute capacity while training, and then deploy the model into an appropriate production environment.


The traditional machine learning development process is complex, expensive, and iterative, and it is made even tougher by the absence of tools covering the entire ML workflow. Stitching tools and workflows together is time-consuming and error-prone.


Amazon SageMaker reduces this complexity by making it easier to build and deploy ML models. SageMaker provides all the components used for machine learning in a single toolset, so models get to production faster, with much less effort and at much lower cost.


Users can choose the right algorithms and frameworks from a wide array of available options. SageMaker manages all the underlying infrastructure required to train a model at petabyte scale and helps deploy the model to production.

Advantages of Building Machine Learning Models Using SageMaker

1. Reduced Cost of Data Labeling With Amazon SageMaker Ground Truth

With the help of Amazon SageMaker Ground Truth, marketers can build and manage highly accurate training datasets quickly. Amazon claims that SageMaker Ground Truth allows marketers to reduce data labeling costs by up to 70%. One can use one's own labelers or vendors recommended by Amazon through the AWS Marketplace. Additionally, Ground Truth continuously learns from labels provided by humans to make high-quality automatic annotations that significantly lower labeling costs.


This fully managed data labeling service by Amazon allows marketers to easily label training data for machine learning at scale. Marketers can build highly precise training datasets to get started with their machine learning models in minutes through the SageMaker Ground Truth console. This can be done by using custom or built-in data labeling workflows.


These workflows support a wide array of use cases such as 3D point clouds, video, images, and text. As part of these workflows, labelers have access to assistive labeling features such as automatic 3D cuboid snapping, removal of distortion in 2D images, and auto-segment tools that reduce the time required to label datasets. Moreover, Ground Truth offers automatic data labeling that utilizes machine learning models to label your data.


A Flow Diagram to Explain How Amazon SageMaker Ground Truth Works

2. Models Can Be Built Using a Single Web Interface

With Amazon SageMaker, users can easily build and train machine learning (ML) models at scale, with access to everything needed to quickly load data, create notebooks, and use built-in algorithms and frameworks. Amazon SageMaker Studio is the first fully integrated development environment for machine learning: users can quickly upload data, create new notebooks, train and tune models, move back and forth between steps to adjust experiments and compare results, and deploy models to production, all in one place.


To generate models automatically, one can use Amazon SageMaker Autopilot along with SageMaker Studio. SageMaker Autopilot is the first automated machine learning capability that gives users complete control and visibility into their ML models. It automatically examines raw data, applies feature processors, picks the best set of algorithms, and trains and tunes multiple models. It also tracks the performance of the different models and ranks them, all within just a few clicks.


Amazon SageMaker Notebooks are one-click, shareable Jupyter notebooks that can be spun up quickly. One can choose from dozens of pre-built notebooks within SageMaker or from hundreds of algorithms and pre-trained models available in the AWS Marketplace.

3. Amazon SageMaker Supports the Leading Deep Learning Frameworks


With Amazon SageMaker, users can apply some of the most popular open-source deep learning frameworks to their use cases and get the best results. These frameworks include TensorFlow, PyTorch, Apache MXNet, Chainer, Keras, Gluon, scikit-learn, Horovod, and Deep Graph Library.


Users can run their models on Amazon SageMaker by packaging their code in a Docker image: that is, bringing your own container, putting your code inside the Docker image, and registering the image on Amazon ECR. Another way to train models on SageMaker is script mode, where you supply only a training script and SageMaker runs it inside a managed framework container.
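
As a rough illustration of script mode, here is a minimal sketch (not from the original article) that hands a training script to SageMaker's managed TensorFlow container; the script name, framework versions, and S3 path are placeholders:

from sagemaker import get_execution_role
from sagemaker.tensorflow import TensorFlow

role = get_execution_role()  # execution role of the notebook instance

# Script mode: SageMaker runs your own train.py inside its managed
# TensorFlow container, so no custom Docker image or ECR registration is needed
estimator = TensorFlow(
    entry_point='train.py',        # your training script (placeholder name)
    role=role,
    instance_count=1,
    instance_type='ml.m4.xlarge',
    framework_version='2.11',      # example version; pick one your script supports
    py_version='py39',
)
estimator.fit('s3://your-bucket/training-data')  # placeholder S3 path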


Building a Sample Machine Learning Model for a Bank to Predict Whether a Customer Will Enroll for a Certificate of Deposit

Such a model can be trained on the Bank Marketing Dataset, which contains information on customer demographics, responses to marketing events, and external factors. The data has been labeled for the users' convenience: a column in the dataset identifies whether a customer has enrolled for a product offered by the bank. The dataset is publicly accessible from the Machine Learning Repository curated by the University of California, Irvine.


To build this model, a user first needs to create an AWS account. Thereafter, the following steps need to be followed:

Step 1: Create an Amazon SageMaker Notebook Instance for Data Preparation

In this step, users create the notebook instance that they use to download and process their data. As part of the data preparation process, users create an Identity and Access Management (IAM) role that allows Amazon SageMaker to access data in Amazon S3.


Amazon SageMaker for Machine Learning

Once you log into the Amazon SageMaker console, select your preferred AWS Region in the top right corner, for example, US West (Oregon).


Creating a Notebook Instance in Amazon AWS

In the left navigation pane, select Notebook instances, and then choose Create notebook instance.


Notebook Instance Settings

Once on the Create notebook instance page, fill in the following fields in the Notebook instance settings box:

a) For Notebook instance name, type SageMaker-Tutorial.
b) For Notebook instance type, choose ml.t2.medium.
c) For Elastic inference, keep the default selection of none.


Create an IAM Role

In the Permissions and encryption section, for IAM role, choose Create a new role. In the Create an IAM role dialog box, choose Any S3 bucket, and then choose Create role. Those who already have a bucket they'd like to use can instead choose Specific S3 buckets and specify the bucket name.

Permissions and encryption in IAM role

Amazon SageMaker creates an IAM role named AmazonSageMaker-ExecutionRole-***.


Users can keep the default settings for the remaining options and choose Create notebook instance.

In the Notebook instances section, the new SageMaker-Tutorial instance is displayed with a status of Pending. The notebook is ready when the status changes to InService.

Step 2: Data Preparation

Notebook instances for data preparation

In this step, marketers use the Amazon SageMaker notebook instance to preprocess the data needed to train the machine learning model and then upload the data to Amazon S3.


After the users’ SageMaker-Tutorial notebook instance status changes to InService, they can choose the Open Jupyter option.

Opening up a new notebook from conda_python3

In the Jupyter notebook, marketers can choose New and then can choose conda_python3.


In a new code cell in their Jupyter notebook, marketers should run the following code and choose Run.


Code to import required libraries and define environment variables

The following code imports the required libraries and then defines the environment variables that users need to prepare the data, train the ML model, and deploy it.
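
A minimal sketch of this cell, assuming the SageMaker Python SDK v2 (the prefix value is an arbitrary S3 key prefix, and the XGBoost version shown is an example):

# Import the libraries used throughout the tutorial and define session variables
import os
import boto3
import sagemaker
import numpy as np
import pandas as pd
from sagemaker import get_execution_role

role = get_execution_role()                      # IAM role created in Step 1
my_region = boto3.session.Session().region_name  # region of the notebook instance
prefix = 'sagemaker/xgboost-dm'                  # arbitrary S3 key prefix

# Resolve the URI of the built-in XGBoost container for this region
xgboost_container = sagemaker.image_uris.retrieve('xgboost', my_region, '1.2-1')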


Code to Create S3 Bucket to Store Data

Marketers can create the S3 bucket to store their data by running the following code in the next code cell. They must ensure that they replace your-s3-bucket-name in the bucket_name variable with a unique S3 bucket name. If one doesn't receive a success message after running the code, one must change the bucket name and try again.
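
A sketch of the bucket-creation cell, using the variables defined above (note that us-east-1 buckets are created without a location constraint):

bucket_name = 'your-s3-bucket-name'  # <-- replace with a globally unique name
s3 = boto3.resource('s3')
try:
    if my_region == 'us-east-1':
        s3.create_bucket(Bucket=bucket_name)
    else:
        s3.create_bucket(Bucket=bucket_name,
                         CreateBucketConfiguration={'LocationConstraint': my_region})
    print('S3 bucket created successfully')
except Exception as e:
    print('S3 error: ', e)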



Users can then download the data into a dataframe by running the following code in the next code cell.
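
A sketch of the download cell; the URL below is a placeholder, so substitute the link to the cleaned Bank Marketing dataset from the AWS tutorial or the UCI repository:

import urllib.request

data_url = 'https://example.com/bank_clean.csv'  # placeholder URL
try:
    urllib.request.urlretrieve(data_url, 'bank_clean.csv')
    model_data = pd.read_csv('./bank_clean.csv', index_col=0)
    print('Success: data loaded into dataframe.')
except Exception as e:
    print('Data load error: ', e)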

Code to shuffle and split the data into the training data and test data

Marketers can shuffle and split the data into training data and test data by running the following code.
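
A minimal sketch; the fixed random_state keeps the shuffle reproducible:

# Shuffle all rows, then put the first 70% in train_data and the rest in test_data
train_data, test_data = np.split(model_data.sample(frac=1, random_state=1729),
                                 [int(0.7 * len(model_data))])
print(train_data.shape, test_data.shape)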


Seventy percent of the customers make up the training data, which is used during the model training loop with gradient-based optimization to iteratively refine the model parameters. Gradient-based optimization finds the model parameter values that minimize the model error, using the gradient of the model loss function. The remaining 30% of the customers make up the test data, which is used to evaluate the performance of the model and measure how well the trained model generalizes to unseen data.

Step 3: Train the ML Model

This step is all about marketers using their training dataset to train their machine learning models.


In a new code cell on their Jupyter notebook, marketers can copy and paste the following code and choose Run.


Code to reformat the header and first column of the training data and load the data from the S3 bucket


The following code reformats the header and first column of the training data and then loads the data from the S3 bucket. This step is required to use the Amazon SageMaker pre-built XGBoost algorithm.
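
A sketch of this cell, assuming the cleaned dataset one-hot encodes the target into y_yes and y_no columns; the built-in XGBoost algorithm expects the label in the first column of a headerless CSV:

# Move the label (y_yes) to the first column and write a headerless CSV
pd.concat([train_data['y_yes'],
           train_data.drop(['y_no', 'y_yes'], axis=1)],
          axis=1).to_csv('train.csv', index=False, header=False)

# Upload the training file to S3 and wrap its location in a TrainingInput
boto3.Session().resource('s3').Bucket(bucket_name).Object(
    os.path.join(prefix, 'train/train.csv')).upload_file('train.csv')
s3_input_train = sagemaker.inputs.TrainingInput(
    s3_data='s3://{}/{}/train'.format(bucket_name, prefix), content_type='csv')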

Next, to set up the Amazon SageMaker session, create an instance of the XGBoost model (an estimator), and define the model's hyperparameters, marketers must run the following code.
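
A sketch of the estimator cell; the hyperparameter values are illustrative defaults for binary classification, not prescriptions:

sess = sagemaker.Session()
xgb = sagemaker.estimator.Estimator(
    xgboost_container,               # built-in XGBoost image resolved earlier
    role,
    instance_count=1,
    instance_type='ml.m4.xlarge',
    output_path='s3://{}/{}/output'.format(bucket_name, prefix),
    sagemaker_session=sess)
xgb.set_hyperparameters(max_depth=5, eta=0.2, gamma=4, min_child_weight=6,
                        subsample=0.8, objective='binary:logistic', num_round=100)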

To start the training job, marketers must copy and paste the following code into the next code cell and choose Run.


Code to start the training job
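
The training cell itself can be as short as a single fit call on the TrainingInput defined above:

# Launch the managed training job; the logs stream into the notebook
xgb.fit({'train': s3_input_train})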

This code trains the model using gradient optimization on an ml.m4.xlarge instance. After a few minutes, marketers can see the training logs being generated in their Jupyter notebook.



Step 4: Deploy the Model

In this step, marketers deploy the trained model to an endpoint, reformat and load the CSV data and then run the model so that the predictions can be created.

Users must copy and paste the following code in a new cell on their Jupyter notebook and then must choose Run.


Code to deploy the trained model
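
A minimal sketch of the deployment cell (the instance type is an assumption; smaller instances also work):

# Provision a real-time inference endpoint backed by one instance
xgb_predictor = xgb.deploy(initial_instance_count=1,
                           instance_type='ml.m4.xlarge')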


This code deploys the model on a server and creates a SageMaker endpoint that applications can access. The entire process may take a few minutes to complete.

In order to predict whether customers in the test data enrolled for the bank product or not, the marketers can copy the following code into the next code cell and choose Run.


Code to predict whether customers in the test data enrolled for the bank product or not
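
A sketch of the prediction cell, again assuming the y_yes/y_no label columns and, as in the AWS tutorial, a comma-separated response from the endpoint:

from sagemaker.serializers import CSVSerializer

# Send only the feature columns to the endpoint, serialized as CSV
test_data_array = test_data.drop(['y_no', 'y_yes'], axis=1).values
xgb_predictor.serializer = CSVSerializer()
predictions = xgb_predictor.predict(test_data_array).decode('utf-8')

# Parse the comma-separated probabilities into a NumPy array
predictions_array = np.array(predictions.split(','), dtype=float)
print(predictions_array.shape)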

Step 5: Evaluate Model Performance


This step allows marketers to evaluate the performance and accuracy of their machine learning model. Users can run the following code in their Jupyter notebook.

Code to enable the comparison of actual vs. predicted values
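
A minimal sketch that cross-tabulates the observed labels against the rounded predicted probabilities:

# Rows: actual labels; columns: predicted labels (probabilities rounded to 0/1)
cm = pd.crosstab(index=test_data['y_yes'],
                 columns=np.round(predictions_array),
                 rownames=['Observed'], colnames=['Predicted'])
print(cm)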


This code compares the actual and predicted values in a table called a confusion matrix.

Based on the predictions, marketers can conclude that the model predicts whether a customer will enroll for a certificate of deposit accurately for 90% of the customers in the test data, with a precision of 65% (278/429) for enrolled and 90% (10,785/11,928) for didn't enroll.

Step 6: Clean Up Resources

In this step, users terminate the resources used in this tutorial. Terminating resources that are no longer being used reduces costs and is a best practice; not terminating them will result in charges to your account.

Deletion of Your Endpoint – Users can run the following code.


Code to delete your endpoint
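
A minimal sketch, assuming the SDK v2 predictor object created in Step 4:

# Tear down the hosted endpoint so it stops accruing charges
xgb_predictor.delete_endpoint(delete_endpoint_config=True)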

Next, users need to delete their training artifacts and S3 bucket by running the following code:


Code to delete training artifacts and S3 bucket
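
A minimal sketch that empties the bucket created in Step 2 (deleting the bucket itself afterwards is optional):

# Delete every object in the bucket (training data and model artifacts)
bucket_to_delete = boto3.resource('s3').Bucket(bucket_name)
bucket_to_delete.objects.all().delete()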

After the above step, users must stop and delete their SageMaker notebook by following the steps below:

Step 1: Open the SageMaker console.
Step 2: Under Notebook, choose Notebook instances.
Step 3: Choose the notebook instance used in this tutorial, then choose Actions, Stop. The instance may take several minutes to stop. When the Status changes to Stopped, move on to the next step.
Step 4: From Actions, choose Delete.

Wrap Up

This blog article helps readers understand how to use Amazon SageMaker to easily build, train, and deploy ML models. This powerful AWS service stands out for its modular design. SageMaker trains models in pre-configured environments and also facilitates model training through hyperparameter tuning. Amazon's pre-built models are highly optimized to run on AWS services.


Because these models are pre-built, users need to do little to build and validate them; XGBoost, LDA, PCA, and Seq2Seq models are all available pre-built in SageMaker. Moreover, SageMaker is also used for model deployment and offers multi-model endpoints. Users can easily build ML models because SageMaker provides everything they need: they can quickly connect to their training data and select the best algorithm and framework for their application. And since all of the underlying infrastructure is managed by AWS, models can be trained at petabyte scale.