As I mentioned in my previous article, "Implementation of Uni-Variate Polynomial Regression in Python using Gradient Descent Optimization" (towardsdatascience.com), apart from Gradient Descent Optimization, there is another approach known as Ordinary Least Squares (OLS) or the Normal Equation Method. This approach is by far the most successful and is adopted in many Machine Learning Toolboxes. There is a descriptive reason behind the OLS/Normal Equation Method being more successful and widely preferred over Gradient Descent, which I will come to in the latter half of this article. Firstly, let's move on to the approach and its Python implementation; secondly, apply it on practice datasets; and finally, describe intuitively why it outperforms Gradient Descent.

The Ordinary Least Squares approach finds the list/vector of parameters theta that is the minimum of the Cost Function. For Uni-Variate Linear Regression it is given by:

theta_1 = Σ (x_i − mean(x)) * (y_i − mean(y)) / Σ (x_i − mean(x))²
theta_0 = mean(y) − theta_1 * mean(x)

where theta_0 and theta_1 are the parameters of Uni-Variate Linear Regression, x is the feature and y is the target. This form of OLS is known as the General Form.

It can also be expressed in Vector Algebra, known as the Normal Equation:

theta = (XᵀX)⁻¹ XᵀY

Here X is the feature set with a column of 1's appended/concatenated and Y is the target set.

Now, let's implement this in Python for Uni-Variate Linear Regression, Polynomial Regression and Multi-Variate Linear Regression:

OLS Uni-Variate Linear Regression using the General Form of OLS:

A 2-module Implementation of Uni-Variate Linear Regression using OLS is formulated:

=> hypothesis(): It is the function that calculates and outputs the hypothesis value of the Target Variable, given theta (theta_0, theta_1, theta_2, ..., theta_n) and Feature X as input.
The implementation of hypothesis() is given below:

```python
import numpy as np

def hypothesis(theta, X):
    h = np.ones((X.shape[0], 1))
    X = X.reshape(X.shape[0], 1)
    one_column = np.ones((X.shape[0], 1))
    X = np.concatenate((one_column, X), axis=1)  # append a column of 1's
    for i in range(0, X.shape[0]):
        h[i] = float(np.matmul(theta, X[i]))
    h = h.reshape(X.shape[0])
    return h
```

=> coeff_cal(): It is the principal function that takes the Feature Set X and Label Set y as input and returns the vector of parameters theta that is the minimum of the Cost Function. The implementation of coeff_cal() is given below:

```python
def coeff_cal(X, y):
    num = sum(np.multiply((X - np.mean(X)), (y - np.mean(y))))
    din = sum(np.square(X - np.mean(X)))
    theta_1 = num / din
    theta_0 = np.mean(y) - theta_1 * np.mean(X)
    return np.array([theta_0, theta_1])
```

Now, applying the OLS Linear Regression on a Practice Dataset: the Profit Prediction from Population Dataset, available at navoneel1092283/univariate_regression (github.com).

Problem Statement: "Given the population of a city, analyze and predict the profit of a company using Linear Regression"

Data Reading into Numpy Arrays:

```python
data = np.loadtxt('data1.txt', delimiter=',')
X_train = data[:, 0]  # the feature set
y_train = data[:, 1]  # the labels
```

Using the 2-module-OLS Uni-Variate Linear Regression:

```python
theta = coeff_cal(X_train, y_train)
```

theta after OLS for Linear Regression

```python
# getting the predictions...
predictions = hypothesis(theta, X_train)
```

The Regression Line Visualization of the obtained theta is done on a Scatter Plot:

```python
import matplotlib.pyplot as plt

scatter = plt.scatter(X_train, y_train, label="training data")
regression_line = plt.plot(X_train, predictions, label="linear regression")
plt.legend()
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
```

The Regression Line Visualization comes out to be:

Regression Line Visualization after OLS Linear Regression

Now, let's analyze the Model Performance and statistically compare OLS and Batch
Gradient Descent. The Performance Analysis of Gradient Descent Uni-Variate Linear Regression is obtained from my previous article, "Implementation of Uni-Variate Polynomial Regression in Python using Gradient Descent Optimization" (towardsdatascience.com).

Here, OLS performs better than BGD in terms of Mean Absolute Error.

OLS Polynomial Regression using the Vector Algebra Form of OLS

A 1-module Implementation of Uni-Variate Polynomial Regression using OLS is formulated:

=> hypothesis(): It is the function that calculates and outputs the hypothesis value of the Target Variable, given theta (theta_0, theta_1, theta_2, ..., theta_n), Feature X and the Degree of the Polynomial n as input. The implementation of hypothesis() is given below:

```python
def hypothesis(theta, X, n):
    h = np.ones((X.shape[0], 1))
    theta = theta.reshape(1, n + 1)
    for i in range(0, X.shape[0]):
        x_array = np.ones(n + 1)
        for j in range(0, n + 1):
            x_array[j] = pow(X[i], j)  # the i-th row of the design matrix: 1, x, x^2, ..., x^n
        x_array = x_array.reshape(n + 1, 1)
        h[i] = float(np.matmul(theta, x_array))
    h = h.reshape(X.shape[0])
    return h
```

=> Obtaining the minimum of the Cost Function as theta (here x_array is the full design matrix with columns 1, x, x², ..., xⁿ, constructed in the application below):

```python
from numpy.linalg import inv

theta = np.matmul(np.matmul(inv(np.matmul(x_array.transpose(), x_array)),
                            x_array.transpose()), y_train)
```

Now, let's apply it to the same dataset.
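As a quick sanity check, on toy quadratic data the Normal Equation theta should agree with numpy.polyfit. This is a hedged sketch; the toy arrays below are illustrative, not the practice dataset:

```python
import numpy as np

# Toy data generated from y = 1 + 2x + 3x^2, so the recovered theta is known.
X_toy = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_toy = 1.0 + 2.0 * X_toy + 3.0 * X_toy ** 2
n = 2  # degree of the polynomial

# design matrix with columns [1, x, x^2], built the same way as in the article
x_array = np.ones((X_toy.shape[0], n + 1))
for i in range(0, X_toy.shape[0]):
    for j in range(0, n + 1):
        x_array[i][j] = pow(X_toy[i], j)

# Normal Equation: theta = (X^T X)^(-1) X^T y
theta = np.matmul(
    np.matmul(np.linalg.inv(np.matmul(x_array.T, x_array)), x_array.T), y_toy)
print(theta)                         # close to [1., 2., 3.]
print(np.polyfit(X_toy, y_toy, n))  # same coefficients, highest degree first
```

Both routes recover the generating coefficients on this noise-free data, which is a useful check before trusting the single-statement solve on a real dataset.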
OLS Polynomial (here Quadratic) Regression:

```python
data = np.loadtxt('data1.txt', delimiter=',')
X_train = data[:, 0]  # the feature set
y_train = data[:, 1]  # the labels

n = 2  # degree of the polynomial
x_array = np.ones((X_train.shape[0], n + 1))
for i in range(0, X_train.shape[0]):
    for j in range(0, n + 1):
        x_array[i][j] = pow(X_train[i], j)

theta = np.matmul(np.matmul(inv(np.matmul(x_array.transpose(), x_array)),
                            x_array.transpose()), y_train)
```

theta after OLS for Polynomial Regression

The Regression Line Visualization of the obtained theta is done on a Scatter Plot:

```python
import matplotlib.pyplot as plt

# getting the predictions...
training_predictions = hypothesis(theta, X_train, 2)

scatter = plt.scatter(X_train, y_train, label="training data")
regression_line = plt.plot(X_train, training_predictions,
                           label="polynomial (degree 2) regression")
plt.legend()
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
```

The Regression Line Visualization comes out to be:

Regression Line Visualization after OLS Polynomial Regression

Performance Analysis: The Model Performance is analyzed and a 2-way comparison is done. In other words, the performance of OLS Polynomial Regression is compared both with Gradient Descent Quadratic Regression and with OLS Linear Regression. The performance of Gradient Descent Quadratic Regression is obtained from my previous article, "Implementation of Uni-Variate Polynomial Regression in Python using Gradient Descent Optimization" (towardsdatascience.com).

From the table, the following conclusions can be drawn:

- Gradient Descent Quadratic Regression fetches the least Mean Absolute Error
- OLS Quadratic Regression fetches the least Mean Square Error and Root Mean Square Error
- OLS Quadratic and Linear Regression fetch the highest R-Square Score
Based on the above 3 points, it can be confirmed that "OLS Approach is more successful than Gradient Descent Optimization".

Reason: if the Gradient Descent algorithm given in my previous article, "Implementation of Uni-Variate Polynomial Regression in Python using Gradient Descent Optimization" (towardsdatascience.com), is carefully studied, it can be seen that no fixed number of iterations is mentioned; it just says "until convergence". So, giving a random number of iterations will not fetch the best performance. The implementation should be done in such a way that the algorithm itself finds the required number of iterations for convergence.

Another important potential threat in Gradient Descent is: "What will happen if a Local Minimum is obtained instead of the Global Minimum?" For this issue, the learning rate of Gradient Descent has to be chosen appropriately, in order to avoid the wiggliness (oscillation) of the Cost Function.

A Modified Implementation of Gradient Descent (for Polynomial Regression), which iterates until the cost stops decreasing, is given below:

```python
def modified_BGD(theta, alpha, h, X, y, n):
    # n = 1 for Linear Regression
    k = 0  # number of iterations taken by the algorithm, kept as a counter
    theta[0] = theta[0] - (alpha / X.shape[0]) * sum(h - y)
    for j in range(1, n + 1):
        theta[j] = theta[j] - (alpha / X.shape[0]) * sum((h - y) * pow(X, j))
    h = hypothesis(theta, X, n)
    cost = (1 / X.shape[0]) * 0.5 * sum(np.square(h - y))
    while (1):
        theta[0] = theta[0] - (alpha / X.shape[0]) * sum(h - y)
        for j in range(1, n + 1):
            theta[j] = theta[j] - (alpha / X.shape[0]) * sum((h - y) * pow(X, j))
        h = hypothesis(theta, X, n)
        if ((1 / X.shape[0]) * 0.5 * sum(np.square(h - y))) >= cost:
            break  # stop as soon as the cost no longer decreases
        cost = (1 / X.shape[0]) * 0.5 * sum(np.square(h - y))
        k = k + 1
    theta = theta.reshape(1, n + 1)
    return theta, cost, k
```

Following this implementation of BGD, there will statistically be very little or no difference between the performance measures of OLS and BGD for Multi-Variate and Polynomial Regression.
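The "run until the cost stops decreasing" idea behind modified_BGD can be demonstrated in isolation. Below is a minimal, self-contained sketch for uni-variate linear regression; the function name, learning rate and toy values are illustrative, not taken from the datasets above:

```python
import numpy as np

def gd_until_convergence(X, y, alpha=0.01):
    # Batch gradient descent that decides its own iteration count:
    # keep updating until the cost stops decreasing.
    theta0, theta1 = 0.0, 0.0
    m = X.shape[0]
    cost = np.inf
    k = 0  # iteration counter found by the algorithm itself
    while True:
        h = theta0 + theta1 * X
        theta0 -= (alpha / m) * np.sum(h - y)
        theta1 -= (alpha / m) * np.sum((h - y) * X)
        h = theta0 + theta1 * X
        new_cost = (0.5 / m) * np.sum(np.square(h - y))
        if new_cost >= cost:  # cost no longer improves: converged
            break
        cost = new_cost
        k += 1
    return theta0, theta1, k

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * X + 1.0  # noise-free line y = 2x + 1
t0, t1, k = gd_until_convergence(X, y)
print(t0, t1, k)
```

On this noise-free line the loop stops on its own after k iterations, with theta very close to theta_0 = 1 and theta_1 = 2; note how many iterations even this tiny problem needs compared with the single-statement OLS solution.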
Also, from the above 2-way comparison table, it can be confirmed that: "Polynomial Regression outperforms Linear Regression, if a proper degree of the polynomial is chosen"

OLS Multi-Variate Linear Regression using the Vector Algebra Form of OLS

A 1-module Implementation of Multi-Variate Linear Regression using OLS is formulated:

=> hypothesis(): It is the function that calculates and outputs the hypothesis value of the Target Variable, given theta (theta_0, theta_1, theta_2, ..., theta_n) and Feature Set X (with a column of 1's, of length equal to the number of samples, appended) as input. The implementation of hypothesis() is given below:

```python
def hypothesis(theta, X):
    n = X.shape[1]
    h = np.ones((X.shape[0], 1))
    theta = theta.reshape(1, n)
    for i in range(0, X.shape[0]):
        h[i] = float(np.matmul(theta, X[i]))
    h = h.reshape(X.shape[0])
    return h
```

=> Obtaining the minimum of the Cost Function as theta:

```python
from numpy.linalg import inv

theta = np.matmul(np.matmul(inv(np.matmul(X_train.transpose(), X_train)),
                            X_train.transpose()), y_train)
```

Application of the OLS/Normal Equation Method Linear Regression on the Housing Price Prediction Dataset of Portland, Oregon, in which the housing price depends on 2 features: the size of the house (in sq. feet) and the number of bedrooms.
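Before moving to the housing dataset, the single-statement Normal Equation can be sanity-checked on toy two-feature data. This is a hedged sketch with made-up numbers; np.linalg.solve is used in place of the explicit inverse, which gives the same theta for an invertible XᵀX but is numerically better behaved:

```python
import numpy as np

# Toy two-feature data generated as y = 3 + 2*x1 + 1*x2, so theta is known.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = 3.0 + 2.0 * X[:, 0] + 1.0 * X[:, 1]

# append the column of 1's, as in the article
one_column = np.ones((X.shape[0], 1))
Xb = np.concatenate((one_column, X), axis=1)

# solve (X^T X) theta = X^T y instead of forming the inverse explicitly
theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
print(theta)  # recovers [3., 2., 1.] on this noise-free data
```

Recovering the generating coefficients here confirms the single statement does all the work that the iterative methods need many passes for.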
The Data-Set is available at navoneel1092283/multivariate_regression (github.com).

Problem Statement: "Given the size of the house and the number of bedrooms, analyze and predict the possible price of the house"

```python
data = np.loadtxt('data2.txt', delimiter=',')
X_train = data[:, [0, 1]]  # feature set
y_train = data[:, 2]       # label set

one_column = np.ones((X_train.shape[0], 1))
X_train = np.concatenate((one_column, X_train), axis=1)

from numpy.linalg import inv
theta = np.matmul(np.matmul(inv(np.matmul(X_train.transpose(), X_train)),
                            X_train.transpose()), y_train)
```

theta after Multi-Variate Linear Regression with the Normal Equation Method

Side-by-side Visualization of the Actual and Predicted Target Variable using 3-D Scatter Plots:

=> Actual Target Variable Visualization:

```python
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D

sequence_containing_x_vals = list(X_train.transpose()[1])
sequence_containing_y_vals = list(X_train.transpose()[2])
sequence_containing_z_vals = list(y_train)

fig = pyplot.figure()
ax = Axes3D(fig)
ax.scatter(sequence_containing_x_vals, sequence_containing_y_vals,
           sequence_containing_z_vals)
ax.set_xlabel('Living Room Area', fontsize=10)
ax.set_ylabel('Number of Bed Rooms', fontsize=10)
ax.set_zlabel('Actual Housing Price', fontsize=10)
```

=> Prediction Target Variable Visualization:

```python
# Getting the predictions...
predictions = hypothesis(theta, X_train)

sequence_containing_x_vals = list(X_train.transpose()[1])
sequence_containing_y_vals = list(X_train.transpose()[2])
sequence_containing_z_vals = list(predictions)

fig = pyplot.figure()
ax = Axes3D(fig)
ax.scatter(sequence_containing_x_vals, sequence_containing_y_vals,
           sequence_containing_z_vals)
ax.set_xlabel('Living Room Area', fontsize=10)
ax.set_ylabel('Number of Bed Rooms', fontsize=10)
ax.set_zlabel('Housing Price Predictions', fontsize=10)
```

Actual Housing Price Vs Predicted Housing Price

Performance Analysis and Statistical Comparison with Gradient Descent Multi-Variate Linear Regression: The Performance of Gradient Descent Multi-Variate Linear Regression is obtained from my article, "Implementation of Multi-Variate Linear Regression in Python using Gradient Descent Optimization" (towardsdatascience.com).

Note: Here MAE, MSE and RMSE are all in dollars.

Similarly, the OLS/Normal Equation Method performed better than Gradient Descent. The Modified Implementation of Gradient Descent for Multi-Variate Linear Regression is given below:

```python
def modified_BGD(theta, alpha, h, X, y, n):
    # n is the number of features
    k = 0  # required number of iterations, kept as a counter variable
    theta[0] = theta[0] - (alpha / X.shape[0]) * sum(h - y)
    for j in range(1, n + 1):
        theta[j] = theta[j] - (alpha / X.shape[0]) * sum((h - y) * X.transpose()[j])
    h = hypothesis(theta, X, n)
    cost = (1 / X.shape[0]) * 0.5 * sum(np.square(h - y))
    while (1):
        theta[0] = theta[0] - (alpha / X.shape[0]) * sum(h - y)
        for j in range(1, n + 1):
            theta[j] = theta[j] - (alpha / X.shape[0]) * sum((h - y) * X.transpose()[j])
        h = hypothesis(theta, X, n)
        if ((1 / X.shape[0]) * 0.5 * sum(np.square(h - y))) >= cost:
            break  # stop as soon as the cost no longer decreases
        cost = (1 / X.shape[0]) * 0.5 * sum(np.square(h - y))
        k = k + 1
    theta = theta.reshape(1, n + 1)
    return theta, cost, k
```

A major disadvantage of the Modified Implementation of Gradient Descent is that it takes a lot of time to converge, i.e., it may require a very large number of iterations, say 600k, to reach the Global Minimum. On the other hand, with the OLS/Normal Equation Method, we get the Global Minimum theta by running a single statement.

That's all about the Implementation of Uni-Variate, Polynomial and Multi-Variate Regression using the OLS/Normal Equation Approach.

For personal contacts regarding the article or discussions on Machine Learning/Data Mining or any department of Data Science, feel free to reach out to me on LinkedIn.
Navoneel Chakrabarty - Contributing Author - Towards Data Science (www.linkedin.com)