
Data Preprocessing

by Aniket Kale, July 31st, 2017

Whenever we work with machine learning algorithms, the most important thing to keep in mind is to preprocess the data before feeding it to the algorithm.

Today, we are going to discuss two methods of data preprocessing: normalization and standardization.

Normalization: Normalization (also called min-max scaling) is a very common data preprocessing technique. With this method, we do not need to assume that our data is normally distributed. To scale the data, we calculate the minimum and maximum of each column. To normalize each value in a column, we subtract the column's minimum from it and divide by the difference between the maximum and the minimum. Every normalized value then lies between 0 and 1.

$$x_i' = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$

where $x_i$ is a value in column $x$.
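The column-wise procedure above can be sketched in plain Python. This is a minimal illustration, not the author's code: the function names min_max_normalize and normalize_columns and the sample table are hypothetical. The rule for a constant column (where max - min is zero and the formula is undefined) is also an assumption.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # Assumption: a constant column maps to all zeros, since the
        # formula would divide by zero here.
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]


def normalize_columns(rows):
    """Apply min-max normalization to each column of a row-major table."""
    columns = list(zip(*rows))  # transpose rows into columns
    scaled = [min_max_normalize(list(col)) for col in columns]
    return [list(row) for row in zip(*scaled)]  # transpose back into rows


# Hypothetical sample data: two columns on very different scales.
data = [
    [10, 200],
    [20, 400],
    [30, 600],
]
print(normalize_columns(data))
# → [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

After scaling, both columns share the same 0 to 1 range, even though their raw values differed by an order of magnitude.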

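Standardization: If we choose to standardize the data, we are assuming that our input data is normally distributed. We calculate the mean and standard deviation of each column. Then we subtract the mean from each value and divide by the standard deviation, so that each column ends up with a mean of 0 and a standard deviation of 1.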

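$$\mathrm{SD} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad z_i = \frac{x_i - \bar{x}}{\mathrm{SD}}$$

where $n$ is the number of values in the column and $\bar{x}$ is their mean.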
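The standardization formulas above translate almost line for line into plain Python. This is a minimal sketch, assuming the sample standard deviation (dividing by n - 1) as in the formula. The function name standardize is hypothetical, and so is the decision to raise an error for constant or single-value columns.

```python
import math


def standardize(column):
    """Return z-scores (x - mean) / SD, with SD computed using n - 1."""
    n = len(column)
    if n < 2:
        # Assumption: n - 1 would be zero, so refuse rather than divide by zero.
        raise ValueError("need at least two values to estimate SD with n - 1")
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    if sd == 0:
        # Assumption: a constant column has no spread to scale by.
        raise ValueError("column is constant; SD is zero")
    return [(x - mean) / sd for x in column]


values = [2.0, 4.0, 6.0]  # hypothetical column: mean 4, SD 2
print(standardize(values))
# → [-1.0, 0.0, 1.0]
```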

There are plenty of other data preprocessing methods. In Python, you can use preprocessing packages (for example, scikit-learn's sklearn.preprocessing module) to do the above tasks easily. However, I would suggest first understanding your data and then deciding which method to use to preprocess it.
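Library scalers can follow slightly different conventions, which is one reason to understand them before relying on them. For example, scikit-learn documents that its StandardScaler uses the population standard deviation (dividing by n) rather than the n - 1 version in the formula above. The minimal sketch below uses Python's standard statistics module to show how the two conventions differ. The helper z_scores and its sample flag are hypothetical names chosen for illustration.

```python
import statistics


def z_scores(column, sample=True):
    """Standardize a column using the standard-library statistics module.

    sample=True divides by n - 1 (statistics.stdev);
    sample=False divides by n (statistics.pstdev).
    """
    mean = statistics.mean(column)
    sd = statistics.stdev(column) if sample else statistics.pstdev(column)
    return [(x - mean) / sd for x in column]


values = [2.0, 4.0, 6.0]
print(z_scores(values, sample=True))
print(z_scores(values, sample=False))  # magnitudes of about 1.22 instead of 1.0
```

The difference shrinks as the column grows, but on small datasets it is worth knowing which convention a package uses.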