This article illustrates how to build, in less than 5 minutes, a simple linear regression model with gradient descent. The goal is to predict a dependent variable (y) from an independent variable (X).
We want to predict salaries given years of experience. For that, we will explain a few concepts (gradient descent, the linear model) and code four functions: predict, mse_cost_function, gradient_descent, and print_graph.
After that, we will train our model with a chosen learning rate. Finally, we find the best coefficients and predict values never seen by the model.
In machine learning, a linear model is a regression model that searches for the relationship between the independent variable (X) and the dependent variable (y).
In this article, we dive into simple linear regression (with only one independent variable).
The formula for simple linear regression is:
y = B0 + B1x
y is the variable we want to predict
x is the independent variable (input variable)
B0 is the intercept: the value of y when x = 0
B1 is the coefficient (weight) linked to x.
When you build a simple linear regression model, the goal is to find the parameters B0 and B1. To find the best parameters, we use gradient descent.
Imagine your model finds that the best parameters are B0 = 10 and B1 = 12.
If you want to predict y (salary) based on new data (10 years of experience), you just need to calculate it.
y (salary) = 10 + (12 * 10) = 130k
With a simple calculation, your model succeeds in predicting a salary (130k) based on unknown data (10 years of experience).
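To double-check that arithmetic in code, here is the same calculation as a couple of lines of Python (B0 = 10 and B1 = 12 are just the made-up parameters from the example):

# Hypothetical parameters from the example above
B0, B1 = 10, 12
years_of_experience = 10
print(B0 + B1 * years_of_experience)  # 130, i.e. 130k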
Gradient descent is one of the methods to train the model and find the best parameters/coefficients (B0 and B1).
For that, it calculates the errors, derives the gradients (the partial derivatives of the cost with respect to each parameter), and adjusts B0 and B1 accordingly.
Below, I detail and explain the B0 and B1 calculations.
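For reference, with the cost J = (1/2n) * Σ(ŷi − yi)² (the 1/(2n) variant coded later in mse_cost_function), the partial derivatives come out to:
∂J/∂B0 = (1/n) * Σ(ŷi − yi)
∂J/∂B1 = (1/n) * Σ(ŷi − yi) * xi
Dividing the cost by 2n makes the 2 from the derivative cancel, which is why no factor of 2 appears in the gradients below.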
exp = np.array([1, 2, 3, 4, 5])
salaries = np.array([30, 40, 50, 60, 70])
learning_rate = 0.1
B0 = 2
B1 = 2
pred = B0 + B1 * exp
= 2 + (2 * [1, 2, 3, 4, 5])
= 2 + [2, 4, 6, 8, 10]
= [4, 6, 8, 10, 12]
errors = pred - salaries
= [4, 6, 8, 10, 12] - [30, 40, 50, 60, 70]
= [-26, -34, -42, -50, -58]
gradient_B0 = sum(errors) / len(exp)
= sum([-26, -34, -42, -50, -58]) / 5
= -42
gradient_B1 = sum(errors * exp) / len(exp)
= sum([-26 * 1, -34 * 2, -42 * 3, -50 * 4, -58 * 5]) / 5
= sum([-26, -68, -126, -200, -290]) / 5
= -142
B0 = B0 - (gradient_B0 * learning_rate)
= 2 - (-42 * 0.1)
= 2 - (-4.2)
= 6.2
B1 = B1 - (gradient_B1 * learning_rate)
= 2 - (-142 * 0.1)
= 2 - (-14.2)
= 16.2
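If you prefer to verify this arithmetic in code, here is a minimal NumPy sketch of that single step, using the same data and starting values as above:

import numpy as np

exp = np.array([1, 2, 3, 4, 5])
salaries = np.array([30, 40, 50, 60, 70])
learning_rate = 0.1
B0, B1 = 2, 2

pred = B0 + B1 * exp                           # [ 4  6  8 10 12]
errors = pred - salaries                       # [-26 -34 -42 -50 -58]
gradient_B0 = errors.sum() / len(exp)          # -42.0
gradient_B1 = (errors * exp).sum() / len(exp)  # -142.0

B0 -= learning_rate * gradient_B0
B1 -= learning_rate * gradient_B1
print(B0, B1)  # 6.2 16.2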
First, we define an arbitrary or random value for B0 and B1. Based on the formula B0 + B1 * exp, we calculate the predictions. Afterward, we calculate the errors: the predictions minus the real values (salaries). We use those errors to find gradient_B0 and gradient_B1.
In simple words, gradient descent tries to find the line that minimizes the errors.
For that, it updates B0 (Intercept) and B1 (Slope).
B0 represents the value of y when x is 0.
B1 represents the change in y for a unit change in x. For example, if y increases by 10 when x increases by 1, B1 would be 10.
In each iteration, gradient descent will reduce the cost by adjusting the intercept and slope with new values.
First of all, we start with the predict function.
def predict(exp, B0, B1):
    # Apply the linear model y = B0 + B1 * x
    return B0 + B1 * exp
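For example, with the data and starting values from the walkthrough above:

predict(np.array([1, 2, 3, 4, 5]), 2, 2)  # array([ 4,  6,  8, 10, 12])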
To understand it, I will share the formula of simple linear regression and briefly explain the role of coefficients B0 and B1.
linear regression formula: y = β0 + β1 · X + ε
y is the variable we want to predict (salary)
X is the independent variable (years of experience)
B0 and B1 are the coefficients we adjust to find the lowest cost. The lower the cost, the better the predictions will be.
ε is the error term: the noise in y that the line cannot explain.
The role of the cost function is to calculate the difference between the predictions and the real values. For this article's purpose, I only print the cost at each iteration. Depending on your needs, you can code your own cost function and use it to adjust the parameters of your model.
I use mean squared error: MSE = (1/n) * Σ(yi − ŷi)²
n is the number of samples
yi is the real value
ŷi is the predicted value
I wrote an article explaining the MSE in detail.
def mse_cost_function(error, predictions):
    # Mean squared error, divided by 2n so the gradient carries no extra factor of 2
    return np.sum(error ** 2) / (2 * len(predictions))
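As a sanity check, plugging in the errors and predictions from the hand-computed step above:

pred = np.array([4, 6, 8, 10, 12])
errors = pred - np.array([30, 40, 50, 60, 70])
print(mse_cost_function(errors, pred))  # 946.0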
We apply gradient descent using the learning rate. Its purpose is to scale how much the model parameters are adjusted at each iteration: it controls how quickly or slowly the algorithm converges to a minimum of the cost function.
I set its value to 0.01. Be careful: if the learning rate is too high, gradient descent may never converge to the minimum.
def gradient_descent(exp, salaries, B0, B1, learning_rate, num_iterations):
    num_samples = len(exp)
    cost_history = []
    for _ in range(num_iterations):
        # Predictions and errors with the current coefficients
        predictions = predict(exp, B0, B1)
        error = predictions - salaries
        # Partial derivatives of the cost with respect to B0 and B1
        gradient_B0 = np.sum(error) / num_samples
        gradient_B1 = np.sum(error * exp) / num_samples
        # Update the coefficients in the opposite direction of the gradient
        B0 -= learning_rate * gradient_B0
        B1 -= learning_rate * gradient_B1
        # Track the cost to visualize convergence later
        cost = mse_cost_function(error, predictions)
        cost_history.append(cost)
    return B0, B1, cost_history
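To make the learning-rate warning concrete, here is a small experiment sketch; it assumes the functions defined above and the toy data from the walkthrough, and the rates tried are arbitrary:

exp = np.array([1, 2, 3, 4, 5])
salaries = np.array([30, 40, 50, 60, 70])

for lr in [0.001, 0.01, 0.1, 0.2]:
    _, _, costs = gradient_descent(exp, salaries, 2, 2, lr, 100)
    print(f"lr={lr}: final cost = {costs[-1]:.4f}")

# On this toy dataset, the cost shrinks for the first three rates
# but explodes for lr=0.2: each update overshoots the minimum.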
We display the cost history we saved in the gradient descent.
You can try several learning rates and numbers of iterations to see the impact on the cost curve.
Below I tried with num_iterations = 200
def print_graph(exp, salary):
    # Scatter the data points and overlay the fitted line
    # (uses the global B0 and B1 found by gradient descent)
    plt.scatter(exp, salary, label="Real values")
    plt.plot(exp, predict(exp, B0, B1), color='red', label="Linear Regression")
    plt.xlabel("Years of experience")
    plt.ylabel("Salary")
    plt.legend()
    plt.show()
Importing the libraries and initializing variables.
exp is an independent variable representing years of experience
salaries is a dependent variable representing salary
I set up arbitrary values for B0, B1, learning_rate, and num_iterations.
num_iterations represents the number of iterations/steps the algorithm performs.
import numpy as np
import matplotlib.pyplot as plt
exp = np.array([1, 2, 3, 4, 5])
salaries = np.array([30, 40, 50, 60, 70])
B0 = 2
B1 = 2
learning_rate = 0.01
num_iterations = 1000
B0, B1, cost_history = gradient_descent(exp, salaries, B0, B1, learning_rate, num_iterations)
print_graph(exp, salaries)
We simply use matplotlib to plot cost_history.
It allows us to visualize the cost at each iteration: in the first iterations the cost is high, while at the end it tends toward 0.
plt.plot(range(num_iterations), cost_history, marker='o')
plt.xlabel('Iteration')
plt.ylabel('Cost')
plt.title('Cost Evolution during Gradient Descent')
plt.axis([0, num_iterations, 0, max(cost_history)])
plt.show()
After finding the best coefficients, B0 and B1, we are now able to predict new values. For 30 years of experience, our model predicts a salary of 339.73k.
new_prediction = predict(30, B0, B1)
print(new_prediction)
We saw the different steps to code a simple linear regression model: explaining concepts such as the linear relationship, gradient descent, the learning rate, and the coefficients representing the intercept and slope; implementing gradient descent in Python by calculating B0 and B1; and finally, plotting the cost evolution with matplotlib.
You can find the full source code on Kaggle and GitHub.