Today, we are going to discuss two methods of data preprocessing: normalization and standardization.
Normalization: Normalization is a very common data preprocessing technique. It is typically used when we do not assume that the data is normally distributed. To scale the data, we calculate the minimum and maximum of each column. To normalize each value in a column, we subtract the column's minimum from the value and divide the result by the difference between the maximum and the minimum. Every normalized value then lies between 0 and 1.
$$\text{normalized value} = \frac{\text{value} - \text{min}}{\text{max} - \text{min}}$$
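A minimal sketch of min-max normalization for a single column, written in plain Python so it needs no third-party packages. The function name min_max_normalize is hypothetical. Mapping a constant column (where max equals min) to all zeros is an assumption made here to avoid dividing by zero.

```python
def min_max_normalize(column):
    """Scale a list of numbers to [0, 1] using (value - min) / (max - min)."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # Assumption: a constant column has no spread, so map every value to 0.0.
        return [0.0 for _ in column]
    return [(value - lo) / span for value in column]


print(min_max_normalize([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```

The smallest value always maps to 0 and the largest to 1. A single extreme value therefore compresses all the other values into a narrow range.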
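Standardization: If we choose to standardize the data, we usually assume that the input data is (approximately) normally distributed. We calculate the mean and standard deviation (SD) of each column, then subtract the mean from each value and divide by the SD. The standardized column has a mean of 0 and an SD of 1.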
$$\text{SD} = \sqrt{\frac{\sum (\text{value} - \text{mean})^2}{\text{count} - 1}}$$
$$\text{standardized value} = \frac{\text{value} - \text{mean}}{\text{SD}}$$
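A minimal sketch of standardization for a single column, again in plain Python. The function name standardize is hypothetical. It relies on the standard library's statistics.stdev, which computes the sample standard deviation (dividing by count - 1, as in the SD formula) and needs at least two values. Raising an error for a constant column is an assumption made here to avoid dividing by zero.

```python
import statistics


def standardize(column):
    """Standardize a list of numbers as (value - mean) / SD.

    Uses the sample standard deviation, i.e. dividing by (count - 1).
    """
    mean = statistics.mean(column)
    # statistics.stdev divides by (count - 1) and needs at least two values.
    sd = statistics.stdev(column)
    if sd == 0:
        # Assumption: refuse to standardize a constant column rather than divide by zero.
        raise ValueError("column has zero standard deviation")
    return [(value - mean) / sd for value in column]


print(standardize([2, 4, 6]))  # [-1.0, 0.0, 1.0]
```

Unlike normalization, standardized values are not confined to a fixed range. A value three SDs above the mean simply becomes 3.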
There are many other data preprocessing methods. In Python, packages such as scikit-learn's preprocessing module make both of these transformations easy. Even so, I would suggest first understanding your data and then deciding which method to use to preprocess it.
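As one example of such a package, scikit-learn's sklearn.preprocessing module provides MinMaxScaler and StandardScaler, which apply the two transformations column by column. This sketch assumes scikit-learn and NumPy are installed, and the small array is made-up illustration data. One difference from the SD formula above: StandardScaler divides by the population standard deviation (count rather than count - 1), so its output differs slightly from a hand-computed sample-SD version.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Illustration data: 4 rows (samples), 2 columns (features).
X = np.array([[10.0, 2.0],
              [20.0, 4.0],
              [30.0, 6.0],
              [50.0, 8.0]])

# Each scaler learns per-column statistics (min/max or mean/SD) and applies them.
X_norm = MinMaxScaler().fit_transform(X)
X_std = StandardScaler().fit_transform(X)

print(X_norm)
print(X_std)
```

In practice a scaler is usually fitted on the training data only and then applied to new data with transform(). That way, both sets are scaled with the same min/max or mean/SD.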