As I mentioned in my previous article:
Implementation of Uni-Variate Polynomial Regression in Python using Gradient Descent Optimization from… (towardsdatascience.com)
apart from Gradient Descent Optimization, there is another approach known as Ordinary Least Squares (OLS) or the Normal Equation Method. This approach is by far the most successful one and is adopted in many Machine Learning toolboxes. There is a clear reason why OLS/the Normal Equation Method is so successful and widely preferred over Gradient Descent, which I will come to in the latter half of this article. First, let’s go through the approach and its Python implementation, then apply it to practice datasets, and finally see intuitively why it outperforms Gradient Descent.
The Ordinary Least Squares approach for finding the vector of parameters theta that is the minima of the Cost Function is given by:

theta_1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
theta_0 = mean(y) - theta_1 * mean(x)

where theta_1 and theta_0 are the parameters of Uni-Variate Linear Regression, x is the feature, y is the target and mean(x), mean(y) are their sample means. This form of OLS is known as the General Form.
It can also be expressed in Vector Algebra form, known as the Normal Equation:

theta = (X^T X)^(-1) X^T Y

where X is the feature set with a column of 1’s appended/concatenated and Y is the target set.
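To make the Normal Equation concrete before the full implementations, here is a minimal, self-contained NumPy sketch on toy data (the numbers below are made up purely for illustration):

import numpy as np

# toy single-feature data (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

# append the column of 1's to form the design matrix X
X = np.column_stack((np.ones(x.shape[0]), x))

# Normal Equation: theta = (X^T X)^(-1) X^T y  ->  [theta_0, theta_1]
theta = np.matmul(np.matmul(np.linalg.inv(np.matmul(X.T, X)), X.T), y)
print(theta)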
Now, let’s implement this in Python for Uni-Variate Linear Regression, Polynomial Regression and Multi-Variate Linear Regression:
OLS Uni-Variate Linear Regression using the General Form of OLS:
A 2-module Implementation of Uni-Variate Linear Regression using OLS is formulated:
=>hypothesis(): It is the function that calculates and outputs the hypothesis value of the Target Variable, given the parameters theta (theta_0, theta_1) and Feature X as input. The implementation of hypothesis() is given below:
import numpy as np

def hypothesis(theta, X):
    # hypothesis values for every sample: h[i] = theta_0 + theta_1 * x[i]
    h = np.ones((X.shape[0], 1))
    X = X.reshape(X.shape[0], 1)
    one_column = np.ones((X.shape[0], 1))
    X = np.concatenate((one_column, X), axis=1)   # prepend the column of 1's
    for i in range(0, X.shape[0]):
        h[i] = float(np.matmul(theta, X[i]))
    h = h.reshape(X.shape[0])
    return h
=>coeff_cal(): It is the principal function that takes the Feature Set, X and Label Set, y as input and returns the vector of parameters, theta that is the minima of the Cost Function. The implementation of coeff_cal() is given below:
def coeff_cal(X, y):
    # General Form of OLS: theta_1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
    num = sum(np.multiply((X - np.mean(X)), (y - np.mean(y))))
    din = sum(np.square(X - np.mean(X)))
    theta_1 = num / din
    # theta_0 = mean(y) - theta_1 * mean(x)
    theta_0 = np.mean(y) - theta_1 * np.mean(X)
    return np.array([theta_0, theta_1])
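As a quick sanity check (the synthetic values below are my own, not from the dataset), coeff_cal() should recover the slope and intercept of a perfectly linear relationship:

# synthetic data following y = 2 + 3x exactly
X_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_demo = 2 + 3 * X_demo

theta_demo = coeff_cal(X_demo, y_demo)
print(theta_demo)                       # expected: [2. 3.]
print(hypothesis(theta_demo, X_demo))   # should reproduce y_demo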
Now, applying the OLS Linear Regression on a Practice Dataset. Considering the Profit Prediction from Population Dataset available at
github.com/navoneel1092283/univariate_regression
Problem Statement: “Given the population of a city, analyze and predict the profit of a company using Linear Regression”
Data Reading into NumPy Arrays:
data = np.loadtxt('data1.txt', delimiter=',')
X_train = data[:,0]   # the feature set
y_train = data[:,1]   # the labels
Using the 2-module-OLS Uni-Variate Linear Regression:
theta = coeff_cal(X_train, y_train)
theta after OLS for Linear Regression
# getting the predictions...
predictions = hypothesis(theta, X_train)
The Regression Line Visualization of the obtained theta is done on a Scatter Plot:
import matplotlib.pyplot as plt

scatter = plt.scatter(X_train, y_train, label="training data")
regression_line = plt.plot(X_train, predictions, label="linear regression")
plt.legend()
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
The Regression Line Visualization comes out to be:
Regression Line Visualization after OLS Linear Regression
Now, let’s analyze the Model Performance and statistically compare OLS and Batch Gradient Descent. The Performance Analysis of Gradient Descent Uni-Variate Linear Regression is obtained from:
Implementation of Uni-Variate Polynomial Regression in Python using Gradient Descent Optimization from… (towardsdatascience.com)
Here, OLS performs better than BGD in terms of Mean Absolute Error.
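For reference, such error metrics can be computed directly from the predictions obtained above; this is only a sketch of how the numbers could be reproduced, not the code used to generate the article’s comparison table:

# error metrics for the OLS Linear Regression predictions
mae = np.mean(np.abs(predictions - y_train))      # Mean Absolute Error
mse = np.mean(np.square(predictions - y_train))   # Mean Squared Error
rmse = np.sqrt(mse)                               # Root Mean Squared Error
print(mae, mse, rmse)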
OLS Polynomial Regression using Vector Algebra Form of OLS
A 1-module Implementation of Uni-Variate Polynomial Regression using OLS is formulated:
=>hypothesis(): It is the function that calculates and outputs the hypothesis value of the Target Variable, given theta (theta_0, theta_1, theta_2, …., theta_n), Feature X and Degree of the Polynomial n as input. The implementation of hypothesis() is given below:
def hypothesis(theta, X, n):
    h = np.ones((X.shape[0], 1))
    theta = theta.reshape(1, n+1)
    for i in range(0, X.shape[0]):
        # build the row [1, x, x^2, ..., x^n] for sample i
        x_array = np.ones(n+1)
        for j in range(0, n+1):
            x_array[j] = pow(X[i], j)
        x_array = x_array.reshape(n+1, 1)
        h[i] = float(np.matmul(theta, x_array))
    h = h.reshape(X.shape[0])
    return h
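As a side note, the inner loop builds the row [1, x, x^2, …, x^n] one element at a time; the same hypothesis can optionally be computed in a vectorized way with np.vander. A minimal sketch, assuming X is a 1-D array and theta has n+1 entries:

def hypothesis_vectorized(theta, X, n):
    # design matrix with columns [1, x, x^2, ..., x^n]
    design = np.vander(X, n + 1, increasing=True)
    return np.matmul(design, np.asarray(theta).reshape(n + 1))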
=>Obtaining the minima of the Cost Function as theta, where x_array is the design matrix built from the training features (constructed below):
from numpy.linalg import inv
theta = np.matmul(np.matmul(inv(np.matmul(x_array.transpose(), x_array)), x_array.transpose()), y_train)
Now, applying the OLS Polynomial (here Quadratic) Regression on the same dataset.
data = np.loadtxt('data1.txt', delimiter=',')
X_train = data[:,0]   # the feature set
y_train = data[:,1]   # the labels
n = 2   # degree of the polynomial (quadratic)

x_array = np.ones((X_train.shape[0], n+1))
for i in range(0, X_train.shape[0]):
    for j in range(0, n+1):
        x_array[i][j] = pow(X_train[i], j)   # design matrix row: [1, x, x^2, ..., x^n]
theta = np.matmul(np.matmul(inv(np.matmul(x_array.transpose(), x_array)), x_array.transpose()), y_train)
theta after OLS for Polynomial Regression
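Explicitly inverting X^T X can become numerically unstable when the design matrix is ill-conditioned (for example, at higher polynomial degrees). A common alternative, not used in the original code, is to let NumPy solve the least-squares problem directly:

# least-squares solve as a more numerically stable alternative to the explicit inverse
theta_ls, residuals, rank, singular_values = np.linalg.lstsq(x_array, y_train, rcond=None)
# theta_ls should closely match the theta obtained from the Normal Equation above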
The Regression Line Visualization of the obtained theta is done on a Scatter Plot:
import matplotlib.pyplot as plt

# getting the predictions...
training_predictions = hypothesis(theta, X_train, 2)
scatter = plt.scatter(X_train, y_train, label="training data")
regression_line = plt.plot(X_train, training_predictions, label="polynomial (degree 2) regression")
plt.legend()
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
The Regression Line Visualization comes out to be:
Regression Line Visualization after OLS Polynomial Regression
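One caveat about the plot: plt.plot connects the points in the order they appear in the dataset, so if X_train is not sorted the degree-2 curve can look jagged. A small optional fix, assuming the arrays computed above, is to sort by the feature before plotting:

order = np.argsort(X_train)   # indices that sort the feature values
plt.scatter(X_train, y_train, label="training data")
plt.plot(X_train[order], training_predictions[order], label="polynomial (degree 2) regression")
plt.legend()
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')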
Performance Analysis:
The Model Performance is analyzed and a 2-way comparison is done. In other words, the performance of OLS Polynomial Regression is compared both with Gradient Descent Quadratic Regression and with OLS Linear Regression.
The performance of Gradient Descent Quadratic Regression is obtained from
Implementation of Uni-Variate Polynomial Regression in Python using Gradient Descent Optimization from… (towardsdatascience.com)
From the table, the following conclusions can be drawn.
Based on the above 3 points, it can be confirmed that
“OLS Approach is more successful than Gradient Descent Optimization”
Reason: The possible reason is that, if the Gradient Descent algorithm given in
Implementation of Uni-Variate Polynomial Regression in Python using Gradient Descent Optimization from… (towardsdatascience.com)
is studied carefully, it can be seen that no fixed number of iterations is mentioned; it only says “until convergence”. So, running an arbitrarily chosen number of iterations will not fetch the best performance. The implementation should be done in such a way that the algorithm itself finds the required number of iterations for convergence. Another important potential threat of Gradient Descent is
“What will happen if Local Minima is obtained instead of Global Minima ?”
Well, for this issue, the learning rate of Gradient Descent has to be chosen appropriately, so that the Cost Function decreases smoothly instead of oscillating (wiggling) around a minimum.
Modified Implementation of Gradient Descent (for Polynomial Regression) is given below:
def modified_BGD(theta, alpha, h, X, y, n):
    # n = 1 for Linear Regression
    k = 0   # number of iterations by the algorithm, set as counter
    # one initial update of the parameters
    theta[0] = theta[0] - (alpha/X.shape[0]) * sum(h - y)
    for j in range(1, n+1):
        theta[j] = theta[j] - (alpha/X.shape[0]) * sum((h - y) * pow(X, j))
    h = hypothesis(theta, X, n)
    cost = (1 / X.shape[0]) * 0.5 * sum(np.square(h - y))
    # keep updating until the cost stops decreasing
    while(1):
        theta[0] = theta[0] - (alpha/X.shape[0]) * sum(h - y)
        for j in range(1, n+1):
            theta[j] = theta[j] - (alpha/X.shape[0]) * sum((h - y) * pow(X, j))
        h = hypothesis(theta, X, n)
        if ((1/X.shape[0]) * 0.5 * sum(np.square(h - y))) >= cost:
            break
        cost = (1 / X.shape[0]) * 0.5 * sum(np.square(h - y))
        k = k + 1
    theta = theta.reshape(1, n+1)
    return theta, cost, k
Following this implementation of BGD, there will statistically be very little or no difference in the performance measures for Multi-Variate and Polynomial Regression.
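For completeness, a hypothetical call to modified_BGD() for the quadratic case might look like the sketch below; the initial theta and the learning rate are assumptions of mine and may need tuning for a given dataset:

n = 2                                 # degree of the polynomial
theta = np.zeros(n + 1)               # start with all parameters at zero (assumed initialization)
h = hypothesis(theta, X_train, n)     # initial hypothesis values
theta, cost, k = modified_BGD(theta, 0.0001, h, X_train, y_train, n)   # 0.0001 is an assumed learning rate
print(theta, cost, k)                 # learned parameters, final cost, iterations used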
From the above 2-way comparison table, it can also be confirmed that:
“General Polynomial Regression outperforms Linear Regression, if a proper degree of the polynomial is chosen”
OLS Multi-Variate Linear Regression using Vector Algebra Form of OLS
A 1-module Implementation of Multi-Variate Linear Regression using OLS is formulated:
=>hypothesis(): It is the function that calculates and outputs the hypothesis value of the Target Variable, given theta (theta_0, theta_1, theta_2, …., theta_n) and Feature Set X (with a column of 1’s, of length equal to the number of samples, appended) as input. The implementation of hypothesis() is given below:
def hypothesis(theta, X):
    n = X.shape[1]
    h = np.ones((X.shape[0], 1))
    theta = theta.reshape(1, n)
    for i in range(0, X.shape[0]):
        h[i] = float(np.matmul(theta, X[i]))
    h = h.reshape(X.shape[0])
    return h
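Since X already carries the column of 1’s here, the per-sample loop above can optionally be collapsed into a single matrix product; a minimal equivalent sketch:

def hypothesis_vectorized(theta, X):
    # h = X * theta, with theta flattened to length X.shape[1]
    return np.matmul(X, np.asarray(theta).reshape(X.shape[1]))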
=>Obtaining the minima of Cost Function as theta:
from numpy.linalg import inv
theta = np.matmul(np.matmul(inv(np.matmul(X_train.transpose(), X_train)), X_train.transpose()), y_train)
Now, let’s apply OLS/Normal Equation Method Linear Regression on the Housing Price Prediction Dataset of Portland, Oregon, in which the housing price depends on 2 features: size of the house (in sq. feet) and number of bedrooms. The dataset is available at:
github.com/navoneel1092283/multivariate_regression
Problem Statement: “Given the size of the house and number of bedrooms, analyze and predict the possible price of the house”
data = np.loadtxt('data2.txt', delimiter=',')
X_train = data[:,[0,1]]   # feature set
y_train = data[:,2]       # label set

one_column = np.ones((X_train.shape[0],1))
X_train = np.concatenate((one_column, X_train), axis = 1)
from numpy.linalg import inv
theta = np.matmul(np.matmul(inv(np.matmul(X_train.transpose(), X_train)), X_train.transpose()), y_train)
theta after Multi-Variate Linear Regression with Normal Equation Method
Side-by-Side Visualization of the Features against the Actual and Predicted Target Variable using 3-D Scatter Plots:
=>Actual Target Variable Visualization:
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D

sequence_containing_x_vals = list(X_train.transpose()[1])
sequence_containing_y_vals = list(X_train.transpose()[2])
sequence_containing_z_vals = list(y_train)

fig = pyplot.figure()
ax = Axes3D(fig)

ax.scatter(sequence_containing_x_vals, sequence_containing_y_vals, sequence_containing_z_vals)
ax.set_xlabel('Living Room Area', fontsize=10)
ax.set_ylabel('Number of Bed Rooms', fontsize=10)
ax.set_zlabel('Actual Housing Price', fontsize=10)
=>Prediction Target Variable Visualization:
# Getting the predictions...
predictions = hypothesis(theta, X_train)
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D

sequence_containing_x_vals = list(X_train.transpose()[1])
sequence_containing_y_vals = list(X_train.transpose()[2])
sequence_containing_z_vals = list(predictions)

fig = pyplot.figure()
ax = Axes3D(fig)

ax.scatter(sequence_containing_x_vals, sequence_containing_y_vals, sequence_containing_z_vals)
ax.set_xlabel('Living Room Area', fontsize=10)
ax.set_ylabel('Number of Bed Rooms', fontsize=10)
ax.set_zlabel('Housing Price Predictions', fontsize=10)
Actual Housing Price Vs Predicted Housing Price
Performance Analysis and Statistical Comparison with Gradient Descent Multi-Variate Linear Regression:
The Performance of Gradient Descent Multi-Variate Linear Regression is obtained from
Implementation of Multi-Variate Linear Regression in Python using Gradient Descent Optimization… (towardsdatascience.com)
Note: Here MAE, MSE and RMSE are all in dollars
Similarly, OLS/Normal Equation Method performed better than Gradient Descent. The Modified Implementation of Gradient Descent for Multi-Variate Linear Regression is given below:
def modified_BGD(theta, alpha, h, X, y, n):
    # n is the number of features
    k = 0   # required number of iterations, set as counter variable
    # one initial update of the parameters
    theta[0] = theta[0] - (alpha/X.shape[0]) * sum(h - y)
    for j in range(1, n+1):
        theta[j] = theta[j] - (alpha/X.shape[0]) * sum((h - y) * X.transpose()[j])
    h = hypothesis(theta, X, n)
    cost = (1 / X.shape[0]) * 0.5 * sum(np.square(h - y))
    # keep updating until the cost stops decreasing
    while(1):
        theta[0] = theta[0] - (alpha/X.shape[0]) * sum(h - y)
        for j in range(1, n+1):
            theta[j] = theta[j] - (alpha/X.shape[0]) * sum((h - y) * X.transpose()[j])
        h = hypothesis(theta, X, n)
        if ((1/X.shape[0]) * 0.5 * sum(np.square(h - y))) >= cost:
            break
        cost = (1 / X.shape[0]) * 0.5 * sum(np.square(h - y))
        k = k + 1
    theta = theta.reshape(1, n+1)
    return theta, cost, k
A major disadvantage of this Modified Implementation of Gradient Descent is that it takes a lot of time to converge, i.e., it may require a very large number of iterations (say, 600k) to reach the Global Minima. On the other hand, with OLS/the Normal Equation Method we get the Global-Minima theta by running a single statement.
That’s all about the Implementation of Uni-Variate, Polynomial and Multi-Variate Regression using the OLS/Normal Equation Approach.
For Personal Contacts regarding the article or discussions on Machine Learning/Data Mining or any department of Data Science, feel free to reach out to me on LinkedIn.
Navoneel Chakrabarty - Contributing Author - Towards Data Science | LinkedIn (www.linkedin.com)