## Learn, Code and Tune…

As I mentioned in my previous article:

**Implementation of Uni-Variate Polynomial Regression in Python using Gradient Descent Optimization from…**

*Learn, Code and Tune….*towardsdatascience.com

apart from Gradient Descent Optimization, there is another approach known as Ordinary Least Squares or Normal Equation Method. This approach, by far is the most successful and adopted in many Machine Learning Toolboxes. There is a descriptive reason behind OLS/Normal Equation Method being successful and widely preferred for Gradient Descent, to which I will be coming in the latter half of this article. Firstly, let’s move on to the Approach and its Python Implementation, secondly, applying it on Practice Datasets and finally, describing intuitively that why this over-performs Gradient Descent.

The Ordinary Least Squares approach for finding the list/vector of parameters which is the minima of the Cost Function, theta is given by:

It can be expressed in the form of Vector Algebra known as Normal Equation.

Now, let’s implement this in Python for Uni-Variate Linear Regression, Polynomial Regression and Multi-Variate Linear Regression:

**OLS Uni-Variate Linear Regression using the General Form of OLS:**

A 2-module Implementation of Uni-Variate Linear Regression using OLS is formulated:

=>hypothesis(): It is the function that calculates and outputs the hypothesis value of the Target Variable, given theta (theta_0, theta_1, theta_2, …., theta_n) and Feature X as input. The implementation of hypothesis() is given below:

defhypothesis(theta, X):

h = np.ones((X.shape[0],1))

X = X.reshape(X.shape[0], 1)

one_column = np.ones((X.shape[0],1))

X = np.concatenate((one_column, X), axis = 1)

for i in range(0,X.shape[0]):

h[i] = float(np.matmul(theta, X[i]))

h = h.reshape(X.shape[0])

return h

=>coeff_cal(): It is the principal function that takes the Feature Set, X and Label Set, y as input and returns the vector of parameters, theta that is the minima of the Cost Function. The implementation of coeff_cal() is given below:

defcoeff_cal(X,y):

num = sum(np.multiply((X - np.mean(X)), (y - np.mean(y))))

din = sum(np.square(X - np.mean(X)))

theta_1 = num/din

theta_0 = np.mean(y) - theta_1 * np.mean(X)

return np.array([theta_0,theta_1])

Now, applying the **OLS Linear Regression** on a Practice Dataset. Considering the Profit Prediction from Population Dataset available at

**navoneel1092283/univariate_regression**

*Contribute to navoneel1092283/univariate_regression development by creating an account on GitHub.*github.com

**Problem Statement**: “*Given the population of a city, analyze and predict the profit of a company using Linear Regression*”

Data Reading into Numpy Arrays :

data = np.loadtxt('data1.txt', delimiter=',')

X_train = data[:,0] #the feature_set

y_train = data[:,1] #the labels

Using the 2-module-OLS Uni-Variate Linear Regression:

theta = coeff_cal(X_train, y_train)

# getting the predictions...

predictions = hypothesis(theta, X_train)

The Regression Line Visualization of the obtained theta is done on Scatter Plot:

scatter = plt.scatter(X_train, y_train, label="training data")

regression_line = plt.plot(X_train, predictions,

label="linear regression")

plt.legend()

plt.xlabel('Population of City in 10,000s')

plt.ylabel('Profit in $10,000s')

The Regression Line Visualization comes out to be:

Now, let’s analyze the Model Performance and Statistically Compare OLS and Batch Gradient Descent in which the Performance Analysis of Gradient Descent Uni-Variate Linear Regression is obtained from:

**Implementation of Uni-Variate Polynomial Regression in Python using Gradient Descent Optimization from…**

*Learn, Code and Tune….*towardsdatascience.com

Here, OLS performs better than BGD in terms of Mean Absolute Error.

**OLS Polynomial Regression using Vector Algebra Form of OLS**

A 1-module Implementation of Uni-Variate Linear Regression using OLS is formulated:

=>hypothesis(): It is the function that calculates and outputs the hypothesis value of the Target Variable, given theta (theta_0, theta_1, theta_2, …., theta_n), Feature X and Degree of the Polynomial n as input. The implementation of the hypothesis() is given below:

defhypothesis(theta, X, n):

h = np.ones((X.shape[0],1))

theta = theta.reshape(1,n+1)

for i in range(0,X.shape[0]):

x_array = np.ones(n+1)

for j in range(0,n+1):

x_array[j] = pow(X[i],j)

x_array = x_array.reshape(n+1,1)

h[i] = float(np.matmul(theta, x_array))

h = h.reshape(X.shape[0])

return h

=>Obtaining the minima of Cost Function as theta:

from numpy.linalg import inv

theta = np.matmul(np.matmul(inv(np.matmul(x_array.transpose()

, x_array)), x_array.transpose()), y_train)

Now, applying the **OLS Polynomial (here Quadratic) Regression** on the same dataset.

data = np.loadtxt('data1.txt', delimiter=',')

X_train = data[:,0] #the feature_set

y_train = data[:,1] #the labels

x_array = np.ones((X_train.shape[0], n+1))

for i in range(0,X_train.shape[0]):

for j in range(0,n+1):

x_array[i][j] = pow(X_train[i],j)

theta = np.matmul(np.matmul(inv(np.matmul(x_array.transpose()

, x_array)), x_array.transpose()), y_train)

The Regression Line Visualization of the obtained theta is done on Scatter Plot:

import matplotlib.pyplot as plt

#getting the predictions...

training_predictions = hypothesis(theta, X_train, 2)

scatter = plt.scatter(X_train, y_train, label="training data")

regression_line = plt.plot(X_train, training_predictions, label="polynomial (degree 2) regression")

plt.legend()

plt.xlabel('Population of City in 10,000s')

plt.ylabel('Profit in $10,000s')

The Regression Line Visualization comes out to be:

Performance Analysis:

The Model Performance is analyzed and 2-way comparison is done. In other words, the performance of OLS Polynomial Regression is compared with Gradient Descent Quadratic Regression and is also compared with OLS Linear Regression.

The performance of Gradient Descent Quadratic Regression is obtained from

**Implementation of Uni-Variate Polynomial Regression in Python using Gradient Descent Optimization from…**

*Learn, Code and Tune….*towardsdatascience.com

From the table, the following conclusions can be drawn,

- Gradient Descent Quadratic Regression fetches the least Mean Absolute Error
- OLS Quadratic Regression fetches the least Mean Square Error and Root Mean Square Error
- OLS Quadratic and Linear Regression fetch the highest R Square Score.

Based on the above 3 points, it can be confirmed that

“*OLS Approach is more successful than Gradient Descent Optimization*”

**Reason : **The possible reason is that in Gradient Descent, if the Algorithm, given in

*Learn, Code and Tune….*towardsdatascience.com

is carefully studied, it can be seen that there is no fixed number of iterations mentioned. It is just mentioned “until convergence”. So, by giving a random number of iterations will not fetch best performance. The implementation should be done in such a way that the Algorithm itself finds the required number of iterations for convergence. Another important potential threat of Gradient Descent is

“*What will happen if Local Minima is obtained instead of Global Minima ?*”

Well, for this issue, the learning rate of Gradient Descent has to be chosen appropriately, in order to avoid the wiggliness of Cost Function.

Modified Implementation of Gradient Descent (for Polynomial Regression) is given below:

defmodified_BGD(theta, alpha, h, X, y, n):

# n = 1 for Linear Regression

k = 0 # number of iterations by the algorithm set as counter

theta[0] = theta[0] - (alpha/X.shape[0]) * sum(h - y)

for j in range(1,n+1):

theta[j] = theta[j]-(alpha/X.shape[0])*sum((h-y)*pow(X,j))

h = hypothesis(theta, X, n)

cost = (1 / X.shape[0]) * 0.5 * sum(np.square(h - y))

while(1):

theta[0] = theta[0] - (alpha/X.shape[0]) * sum(h - y)

for j in range(1,n+1):

theta[j]=theta[j]-(alpha/X.shape[0])*sum((h-y)*pow(X,j))

h = hypothesis(theta, X, n)

if ((1/X.shape[0])*0.5*sum(np.square(h - y))) >= cost:

break

cost = (1 / X.shape[0]) * 0.5 * sum(np.square(h - y))

k = k + 1

theta = theta.reshape(1,n+1)

return theta, cost, k

Following this implementation of BGD, there will statistically be very small or no difference in performance measures of Multi-Variate and Polynomial Regression.

Also from the above 2-way comparison table, it can also be confirmed that:

“*General Polynomial Regression over-performs Linear Regression, if proper degree of the polynomial is chosen*”

**OLS Multi-Variate Linear Regression using Vector Algebra Form of OLS**

A 1-module Implementation of Uni-Variate Linear Regression using OLS is formulated:

=>hypothesis(): It is the function that calculates and outputs the hypothesis value of the Target Variable, given theta (theta_0, theta_1, theta_2, …., theta_n) and Feature Set X (vector of 1’s of dimension = number of samples) as input. The implementation of hypothesis() is given below:

defhypothesis(theta, X):

n = X.shape[1]

h = np.ones((X.shape[0],1))

theta = theta.reshape(1,n)

for i in range(0,X.shape[0]):

h[i] = float(np.matmul(theta, X[i]))

h = h.reshape(X.shape[0])

return h

=>Obtaining the minima of Cost Function as theta:

from numpy.linalg import inv

theta = np.matmul(np.matmul(inv(np.matmul(X_train.transpose(), X_train)), X_train.transpose()), y_train)

Application of OLS/Normal Equation Method Linear Regression on Housing Price Prediction Dataset of Portland, Oregon in which the housing price depends on 2 features, size of the house (in sq.feet) and number of bedrooms. The Data-Set is available at,

**navoneel1092283/multivariate_regression**

*Contribute to navoneel1092283/multivariate_regression development by creating an account on GitHub.*github.com

**Problem Statement**: “*Given the size of the house and number of bedrooms, analyze and predict the possible price of the house”*

data = np.loadtxt('data2.txt', delimiter=',')

X_train = data[:,[0,1]] #feature set

y_train = data[:,2] # label set

one_column = np.ones((X_train.shape[0],1))

X_train = np.concatenate((one_column, X_train), axis = 1)

from numpy.linalg import inv

theta = np.matmul(np.matmul(inv(np.matmul(X_train.transpose()

, X_train)), X_train.transpose()), y_train)

Side-by-Side Visualization of Features and Target Variable Actual and Prediction using 3-D Scatter Plots :

=>Actual Target Variable Visualization:

from matplotlib import pyplot

from mpl_toolkits.mplot3d import Axes3D

sequence_containing_x_vals = list(X_train.transpose()[1])

sequence_containing_y_vals = list(X_train.transpose()[2])

sequence_containing_z_vals = list(y_train)

fig = pyplot.figure()

ax = Axes3D(fig)

ax.scatter(sequence_containing_x_vals, sequence_containing_y_vals,

sequence_containing_z_vals)

ax.set_xlabel('Living Room Area', fontsize=10)

ax.set_ylabel('Number of Bed Rooms', fontsize=10)

ax.set_zlabel('Actual Housing Price', fontsize=10)

=>Prediction Target Variable Visualization:

# Getting the predictions...

predictions = hypothesis(theta, X_train)

from matplotlib import pyplot

from mpl_toolkits.mplot3d import Axes3D

sequence_containing_x_vals = list(X_train.transpose()[1])

sequence_containing_y_vals = list(X_train.transpose()[2])

sequence_containing_z_vals = list(predictions)

fig = pyplot.figure()

ax = Axes3D(fig)

ax.scatter(sequence_containing_x_vals, sequence_containing_y_vals,

sequence_containing_z_vals)

ax.set_xlabel('Living Room Area', fontsize=10)

ax.set_ylabel('Number of Bed Rooms', fontsize=10)

ax.set_zlabel('Housing Price Predictions', fontsize=10)

Performance Analysis and Statistical Comparison with Gradient Descent Multi-Variate Linear Regression:

The Performance of Gradient Descent Multi-Variate Linear Regression is obtained from

**Implementation of Multi-Variate Linear Regression in Python using Gradient Descent Optimization…**

*Learn, Implement and Tune…*towardsdatascience.com

Similarly, OLS/Normal Equation Method performed better than Gradient Descent. The Modified Implementation of Gradient Descent for Multi-Variate Linear Regression is given below:

defmodified_BGD(theta, alpha, h, X, y, n):

# n is the number of features

k = 0 # required number of iterations set as counter variable

theta[0] = theta[0] - (alpha/X.shape[0]) * sum(h - y)

for j in range(1,n+1):

theta[j] = theta[j] - (alpha/X.shape[0]) * sum((h - y)

* X.transpose()[j])

h = hypothesis(theta, X, n)

cost = (1 / X.shape[0]) * 0.5 * sum(np.square(h - y))

while(1):

theta[0] = theta[0] - (alpha/X.shape[0]) * sum(h - y)

for j in range(1,n+1):

theta[j] = theta[j] - (alpha/X.shape[0]) * sum((h - y)

* X.transpose()[j])

h = hypothesis(theta, X, n)

if((1/X.shape[0])*0.5*sum(np.square(h-y_train))) >= cost:

break

cost = (1 / X.shape[0]) * 0.5 * sum(np.square(h - y))

k = k + 1

theta = theta.reshape(1,n+1)

return theta, cost, k

A major disadvantage of the Modified Implementation of Gradient Descent is that, it takes a lot of time to converge i.e., it requires to iterate a large number say, 600k iterations to converge or to reach the Global Minima. But on the other hand, by running a single statement, we are getting the theta Global Minima in case of OLS/Normal Equation Method.

That’s all about the Implementation of Uni-Variate, Polynomial and Multi-Variate Regression using OLS/Normal Equation Approach

For Personal Contacts regarding the article or discussions on Machine Learning/Data Mining or any department of Data Science, feel free to reach out to me on **LinkedIn****.**