Implementing The Perceptron Algorithm From Scratch In Python by @NKumar

February 19th, 2019

This is a follow-up to my previous post on the Perceptron Model. In this post, we will see how to implement the perceptron model on the breast cancer data set in Python. The data set is imbalanced, meaning the classes ‘0’ and ‘1’ are not represented equally. We will use sklearn’s train_test_split function to split the data in a 90:10 ratio for training and testing. The entire code discussed in the article is available in this GitHub repository.

A perceptron is a fundamental unit of a neural network: it takes weighted inputs, processes them, and is capable of performing binary classification.

If you want to skip the theory and jump straight into the code, go to the coding section further down.

Perceptron Recap

In the perceptron model, inputs can be real numbers, unlike the Boolean inputs of the MP Neuron Model. The output of the model is still binary {0, 1}. The perceptron takes an input x and computes the weighted sum of its components: if that sum is greater than the threshold b, the output is 1; otherwise the output is 0.
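As a quick illustration of the decision rule above, here is a minimal sketch. The function name `perceptron_output` and the example values are made up for this illustration; only the rule itself (output 1 when the weighted sum exceeds b) comes from the text.

```python
import numpy as np


def perceptron_output(x, w, b):
    """Return 1 if the weighted sum w.x is greater than the threshold b, else 0."""
    return 1 if np.dot(w, x) > b else 0


# Illustrative real-valued inputs and weights (not taken from the data set).
x = np.array([0.5, 0.2, 0.9])
w = np.array([1.0, -1.0, 0.5])   # weighted sum: 0.5 - 0.2 + 0.45 = 0.75

print(perceptron_output(x, w, b=0.5))   # 0.75 > 0.5, so the output is 1
print(perceptron_output(x, w, b=1.0))   # 0.75 is not > 1.0, so the output is 0
```

The same input can land on either side of the boundary depending on the threshold, which is why b has to be learned along with w.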

Learning Algorithm

The main goal of the learning algorithm is to find a weight vector w that perfectly separates the Positive set P (y = 1) from the Negative set N (y = 0). The perceptron learning algorithm goes like this:

Fig 2: Perceptron Algorithm
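Since the figure may not be visible here, the following is a rough sketch of the classic perceptron learning rule. It is an assumption about what the figure shows, not a copy of it. In this sketch the threshold is folded into w by prepending a constant 1 to every input, and the loop walks through the points in order (rather than picking them at random) so the run is reproducible. The function name `perceptron_learning` and the toy OR data are illustrative.

```python
import numpy as np


def perceptron_learning(X, Y, max_epochs=100):
    """Classic perceptron updates for labels in {0, 1}.

    Assumption of this sketch: a constant 1 is prepended to each input,
    so the threshold is learned as the first component of w.
    """
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(X_aug.shape[1])
    for _ in range(max_epochs):
        updated = False
        for x, y in zip(X_aug, Y):
            s = np.dot(w, x)
            if y == 1 and s < 0:       # positive point on the wrong side
                w = w + x
                updated = True
            elif y == 0 and s >= 0:    # negative point on the wrong side
                w = w - x
                updated = True
        if not updated:                # a full pass with no mistakes: converged
            break
    return w


# Toy, linearly separable data: the logical OR function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([0, 1, 1, 1])

w = perceptron_learning(X, Y)
X_aug = np.hstack([np.ones((len(X), 1)), X])
predictions = (X_aug @ w >= 0).astype(int)
print(predictions)   # [0 1 1 1]
```

The `max_epochs` cap is a safety net: on data that is not linearly separable the updates never stop on their own.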

To understand the learning algorithm in detail, and the intuition behind why updating the weights this way ends up classifying the Positive and Negative sets perfectly, refer to my previous post on the Perceptron Model.

Let's Code

The data set we will be using is the breast cancer data set from sklearn. It has 569 observations and 30 variables, excluding the class variable. The breast cancer data is imbalanced, meaning the classes ‘0’ and ‘1’ are not represented equally. In this example, we will not apply any sampling techniques to balance the data, because this is a simple implementation of the perceptron model.

Class Imbalance

Before we start building the Perceptron Model, we first need to load the required packages and the data set. The data set is available in the sklearn datasets module. Once we load the data, we grab the features and the response variable using the `breast_cancer.data` and `breast_cancer.target` attributes.
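A minimal sketch of this loading step could look like the following. The variable names `X` and `Y` are a choice made for this sketch, and `np.bincount` is used here only as a quick way to see the class imbalance.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Load the data set bundled with sklearn.
breast_cancer = load_breast_cancer()

X = breast_cancer.data     # features
Y = breast_cancer.target   # class labels, 0 or 1

print(X.shape)             # (569, 30)
print(np.bincount(Y))      # [212 357]: class 0 vs class 1 counts
```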

Perceptron Preprocessing

After fetching the X and Y variables, we will perform Min-Max scaling to bring all the features into the range 0 to 1. Before building the model, we will split the data so that we can train the model on the training data and test its performance on the testing data. We will use sklearn’s `train_test_split` function to split the data in a 90:10 ratio for training and testing respectively. Now that we are done with the preprocessing steps, we can start building the model. We will build our model inside a class called perceptron.
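The preprocessing described above can be sketched as follows. Using sklearn’s `MinMaxScaler` for the scaling is an assumption of this sketch (the scaling could equally be done by hand), and the `stratify` and `random_state` arguments are illustrative choices that keep the class proportions and make the split reproducible.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

breast_cancer = load_breast_cancer()
X, Y = breast_cancer.data, breast_cancer.target

# Min-Max scaling: every feature is mapped into the range 0 to 1.
X_scaled = MinMaxScaler().fit_transform(X)

# 90:10 split for training and testing.
X_train, X_test, Y_train, Y_test = train_test_split(
    X_scaled, Y, test_size=0.1, stratify=Y, random_state=1
)

print(X_train.shape, X_test.shape)   # (512, 30) (57, 30)
```

Note that this follows the order given in the text (scale first, then split). Fitting the scaler on the training portion only is a common alternative that keeps test-set information out of the preprocessing.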

In the perceptron class, we will create a constructor function, `__init__`. The constructor initializes the weight vector w and the threshold b to None.

Perceptron Model

The function `model` takes input values x as an argument, performs the weighted aggregation of the inputs (the dot product w.x), and returns 1 if the aggregation is greater than the threshold b, else 0. Next, we have the `predict` function, which takes input values x as an argument and, for every observation present in x, calculates the predicted outcome and returns a list of predictions.

Finally, we will implement the `fit` function to learn the best possible weight vector w and threshold value b for the given data. The function takes the input data (x and y), the learning rate, and the number of epochs as arguments.
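Putting the constructor, `model`, `predict` and `fit` together, one possible sketch of the class is shown below. It follows the description above, but the details are assumptions: the class name is capitalized as `Perceptron` per Python convention, the weights start at ones and the threshold at zero, and the update rule is the standard perceptron correction scaled by the learning rate. The code in the repository may differ.

```python
import numpy as np


class Perceptron:
    def __init__(self):
        # Both parameters are unknown until fit() is called.
        self.w = None
        self.b = None

    def model(self, x):
        # Output 1 if the weighted sum is greater than the threshold b, else 0.
        return 1 if np.dot(self.w, x) > self.b else 0

    def predict(self, X):
        # One prediction per observation (row) in X.
        return np.array([self.model(x) for x in X])

    def fit(self, X, Y, epochs=1, lr=1.0):
        # Assumed initialization: all-ones weights, zero threshold.
        self.w = np.ones(X.shape[1])
        self.b = 0.0
        for _ in range(epochs):
            for x, y in zip(X, Y):
                y_pred = self.model(x)
                if y == 1 and y_pred == 0:
                    # Missed a positive: raise the weighted sum, lower the threshold.
                    self.w = self.w + lr * x
                    self.b = self.b - lr
                elif y == 0 and y_pred == 1:
                    # False positive: lower the weighted sum, raise the threshold.
                    self.w = self.w - lr * x
                    self.b = self.b + lr


# Tiny check on linearly separable toy data: the logical AND function.
X_toy = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y_toy = np.array([0, 0, 0, 1])

toy_perceptron = Perceptron()
toy_perceptron.fit(X_toy, Y_toy, epochs=10, lr=1)
print(toy_perceptron.predict(X_toy))   # [0 0 0 1]
```

On this toy data the weights stop changing after a few epochs, so running more epochs does no harm.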

Perceptron Model Execution

Once the class is ready, we create a new perceptron object and call its `fit` method on the training data to learn the best possible parameters. We then evaluate the model on the test data by calculating the testing accuracy.
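An end-to-end run might look like the sketch below. The class is repeated in condensed form only so that the snippet runs on its own, and the hyperparameters (`epochs=100`, `lr=0.01`), the `MinMaxScaler` step, and the `stratify` and `random_state` arguments are illustrative assumptions rather than the settings used in the repository. The accuracy you get will depend on these choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


class Perceptron:
    """Condensed perceptron, repeated here so the snippet is standalone."""

    def __init__(self):
        self.w = None
        self.b = None

    def model(self, x):
        return 1 if np.dot(self.w, x) > self.b else 0

    def predict(self, X):
        return np.array([self.model(x) for x in X])

    def fit(self, X, Y, epochs=1, lr=1.0):
        self.w = np.ones(X.shape[1])   # assumed initialization
        self.b = 0.0
        for _ in range(epochs):
            for x, y in zip(X, Y):
                error = y - self.model(x)          # +1, 0 or -1
                self.w = self.w + lr * error * x   # no change when error is 0
                self.b = self.b - lr * error


# Load, scale and split the data (90:10).
data = load_breast_cancer()
X = MinMaxScaler().fit_transform(data.data)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, data.target, test_size=0.1, stratify=data.target, random_state=1
)

# Train on the training data.
perceptron = Perceptron()
perceptron.fit(X_train, Y_train, epochs=100, lr=0.01)

# Evaluate on the held-out test data.
Y_pred = perceptron.predict(X_test)
test_accuracy = accuracy_score(Y_test, Y_pred)
print(f"Test accuracy: {test_accuracy:.3f}")
```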

The entire code discussed in the article is present in this GitHub repository. Feel free to fork it or download it.

Further Improvements

You can try out a few possible improvements to increase the accuracy of the model:

• Vary the train-test size split and see if there is any change in accuracy.
• Try larger numbers of epochs and different learning rates, and visualize how the accuracy of the perceptron model changes.
• Initialize the weights of the perceptron model randomly and experiment.

Conclusion

In this article, we have seen how to implement the perceptron algorithm from scratch using Python.