Data Preprocessing

by Aniket Kale, July 31st, 2017

Whenever we work with a machine learning algorithm, the most important thing to keep in mind is to pre-process the data before feeding it to the algorithm.

Today, we are going to discuss two methods of data preprocessing: normalization and standardization.

Normalization: Normalization is a very common technique in data preprocessing. It is typically chosen when we do not assume that the data is normally distributed. To scale the data, we calculate the min and max of each column. To normalize each value in a column, we subtract the column's min from the value and divide the result by (max - min), so every value ends up between 0 and 1.

normalized value = (value - min) / (max - min)
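As a minimal sketch of min-max normalization, here is the calculation for a single column in plain Python. The function name `min_max_normalize`, the sample `ages` column, and the handling of a constant column are assumptions made for illustration, not part of any particular library.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # Assumption for this sketch: a constant column has no spread,
        # so every value is mapped to 0.0 instead of dividing by zero.
        return [0.0 for _ in column]
    # (value - min) / (max - min) for every value in the column
    return [(value - lo) / span for value in column]


ages = [20, 30, 40, 60]  # hypothetical sample column
normalized_ages = min_max_normalize(ages)
print(normalized_ages)  # [0.0, 0.25, 0.5, 1.0]
```

The smallest value in the column always maps to 0 and the largest to 1, so a single extreme outlier can squeeze all the other values into a narrow band.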

Standardization: If we choose to standardize the data, we are assuming that the input data is normally distributed. We calculate the mean and standard deviation of each column, then subtract the mean from each value and divide by the standard deviation.


SD = sqrt( sum((value - mean)**2) / (count - 1) )

standardized value = (value - mean) / SD
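A rough sketch of standardization for a single column in plain Python is shown here. It uses the sample standard deviation (dividing by count - 1); the function name `standardize`, the sample `scores` column, and the error handling are assumptions made for illustration.

```python
import math


def standardize(column):
    """Return z-scores: (value - mean) / sample standard deviation."""
    n = len(column)
    if n < 2:
        raise ValueError("need at least two values to compute a sample SD")
    mean = sum(column) / n
    # Sample variance: sum of squared deviations divided by (count - 1)
    variance = sum((value - mean) ** 2 for value in column) / (n - 1)
    sd = math.sqrt(variance)
    if sd == 0:
        raise ValueError("all values are identical, so SD is zero")
    return [(value - mean) / sd for value in column]


scores = [10, 20, 30]  # hypothetical sample column
standardized_scores = standardize(scores)
print(standardized_scores)  # [-1.0, 0.0, 1.0]
```

After standardization the column has a mean of 0, and each value says how many standard deviations it sits above or below that mean.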

There are plenty of other data preprocessing methods. In Python, you can use preprocessing packages (for example, the scikit-learn preprocessing module) to do the above tasks easily, but I would suggest first understanding the data and then deciding which method to use to preprocess it.
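For reference, here is a small sketch of both methods using scikit-learn, assuming that scikit-learn and NumPy are installed and that this is the kind of preprocessing package meant above. The array `X` is made-up sample data. Note that `StandardScaler` divides by the population standard deviation (count, not count - 1), so its output differs slightly from a sample-SD calculation on small datasets.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data: 3 rows (samples) and 2 columns (features)
X = np.array([[10.0, 1.0],
              [20.0, 3.0],
              [30.0, 5.0]])

# Normalization: each column is rescaled to the range [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)

# Standardization: each column gets mean 0 and unit (population) SD
X_standard = StandardScaler().fit_transform(X)

print(X_minmax)
print(X_standard)
```

Both scalers work column by column. In a real project the scaler would normally be fitted on the training data only and then applied to the test data with `transform`.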