How to Classify When Linear Regression Fails?

by Sameer Negi
July 6th, 2018
Linear regression seems like a simple solution to our classification problems, but what happens when it fails? We will see in the problem below.

Suppose we want to predict Y ∈ {0, 1}, where X are the data samples; this is binary classification. Let's try it with linear regression.

[Figure: a linear regression line fit to the binary-labeled data, with the two classes cleanly separated at the 0.5 threshold]

Wow, linear regression has done the job! It is a really good fit. But what happens if I add a new data point to the given data set?

[Figure: after adding a far-away data point, the refit regression line tilts and the 0.5 threshold no longer separates the classes]

It's a really bad fit now. Linear regression didn't work. What should we do next? Where there is a problem, there is a solution, and here the solution is logistic regression. Let's start with a formal definition and get an idea of what linear and logistic regression are:
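To make the failure concrete, here is a minimal NumPy sketch (the toy data values, the 0.5-threshold trick, and the helper `fit_line` are my own illustration, not the author's code): we fit a line to 0/1 labels, classify by thresholding it at 0.5, and then watch one outlier tilt the line.

```python
import numpy as np

# Toy 1-D binary data (hypothetical values, for illustration only):
# small x -> class 0, large x -> class 1.
X = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

def fit_line(X, y):
    """Least-squares fit of y = w*x + b."""
    A = np.column_stack([X, np.ones_like(X)])
    w, b = np.linalg.lstsq(A, y, rcond=None)[0]
    return w, b

# Classify by thresholding the fitted line at 0.5.
w, b = fit_line(X, y)
preds = (w * X + b >= 0.5).astype(int)
print("accuracy before outlier:", (preds == y).mean())   # 1.0, a good fit

# Add one far-away positive example; the least-squares line tilts to
# chase it, and the 0.5 threshold now misclassifies a boundary point.
X2 = np.append(X, 40.0)
y2 = np.append(y, 1)
w2, b2 = fit_line(X2, y2)
preds2 = (w2 * X2 + b2 >= 0.5).astype(int)
print("accuracy after outlier:", (preds2 == y2).mean())  # degrades
```

On this toy data the thresholded line classifies everything correctly until the outlier at x = 40 drags the fit, after which the point at x = 6 falls on the wrong side of the 0.5 line.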

In linear regression, the outcome (dependent variable) is continuous. It can have any one of an infinite number of possible values. In logistic regression, the outcome (dependent variable) has only a limited number of possible values. Logistic regression is used when the response variable is categorical in nature.

Intuitively, it also doesn't make sense for h(x) to take values larger than 1 or smaller than 0 when we know that y ∈ {0, 1}. To fix this, let's change the form of our hypothesis h(x). We will choose the hypothesis as follows:

$$ h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}} $$

g(z) is called the sigmoid function or logistic function. It looks like this:

[Figure: the sigmoid function g(z), an S-shaped curve rising from 0 to 1 with g(0) = 0.5]
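In code, the sigmoid is a one-liner; a quick NumPy sketch (mine, not from the post):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))                      # 0.5, the midpoint
print(sigmoid(np.array([-10.0, 10.0])))  # tails approach 0 and 1
```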

Let’s assume that

$$ P(y = 1 \mid x; \theta) = h_\theta(x), \qquad P(y = 0 \mid x; \theta) = 1 - h_\theta(x) $$

If we combine the two equations above, they can be rewritten as follows:

$$ p(y \mid x; \theta) = \left( h_\theta(x) \right)^{y} \left( 1 - h_\theta(x) \right)^{1 - y} $$

In the equation above, when y = 0 the factor (h(x))^y becomes 1, so p(y|x; θ) = 1 − h(x); and when y = 1 the factor (1 − h(x))^(1−y) becomes 1, so p(y|x; θ) = h(x).

Now, assuming the m training examples were generated independently, the likelihood of the parameters is given by

$$ L(\theta) = \prod_{i=1}^{m} p\left( y^{(i)} \mid x^{(i)}; \theta \right) = \prod_{i=1}^{m} \left( h_\theta(x^{(i)}) \right)^{y^{(i)}} \left( 1 - h_\theta(x^{(i)}) \right)^{1 - y^{(i)}} $$

As usual, it is easier to maximize the log likelihood instead, which is given by:

$$ \ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + \left( 1 - y^{(i)} \right) \log \left( 1 - h_\theta(x^{(i)}) \right) $$
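Translating the log likelihood directly into code (a sketch; `theta`, `X`, and `y` are placeholders for your parameter vector, design matrix, and 0/1 labels):

```python
import numpy as np

def log_likelihood(theta, X, y):
    """l(theta) = sum_i [ y_i * log h(x_i) + (1 - y_i) * log(1 - h(x_i)) ],
    where h(x) = sigmoid(theta^T x) and X holds one example per row."""
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```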

Now we can maximize this with gradient ascent (the mirror image of gradient descent, since we are maximizing rather than minimizing). We already know that h(x) is given by the sigmoid function:

$$ h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}} $$

For ease of computation, note that the derivative of g(z) with respect to z has a simple closed form:

$$ g'(z) = g(z) \left( 1 - g(z) \right) $$
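This identity follows in one line from the definition of g(z); a quick derivation (not shown in the original images):

$$ g'(z) = \frac{d}{dz} \frac{1}{1 + e^{-z}} = \frac{e^{-z}}{\left( 1 + e^{-z} \right)^2} = \frac{1}{1 + e^{-z}} \cdot \left( 1 - \frac{1}{1 + e^{-z}} \right) = g(z) \left( 1 - g(z) \right) $$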

Now, if we take the partial derivative of the log likelihood with respect to θ and plug it into the stochastic gradient ascent rule, we get the update:

$$ \theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)} $$

If we compare the above rule to the least mean squares (LMS) update, it looks identical, but it is not: this is a different learning algorithm, because h_θ(x) is now defined as a non-linear function of θᵀx^(i).
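Putting the pieces together, here is a minimal stochastic gradient ascent loop for logistic regression (a sketch: the learning rate, epoch count, and toy data are my own choices, not from the post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, alpha=0.1, epochs=100):
    """Stochastic gradient ascent on the log likelihood.
    X: (m, n) design matrix (prepend a column of ones for the intercept),
    y: (m,) array of 0/1 labels. Returns the learned theta."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        for i in range(m):
            h = sigmoid(X[i] @ theta)
            # theta_j := theta_j + alpha * (y_i - h(x_i)) * x_ij
            theta += alpha * (y[i] - h) * X[i]
    return theta

# Usage on the earlier toy data, outlier included (hypothetical values):
X = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0, 40.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
Xb = np.column_stack([np.ones_like(X), X])      # add intercept column
theta = train_logistic(Xb, y)
preds = (sigmoid(Xb @ theta) >= 0.5).astype(int)
print("accuracy:", (preds == y).mean())         # the outlier no longer hurts
```

Because the sigmoid saturates, the far-away point at x = 40 contributes almost nothing to the gradient once it is classified correctly, which is exactly why the outlier that broke linear regression does not break logistic regression.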


If you find any inconsistency in my post, please feel free to point it out in the comments. Thanks for reading.

If you want to connect with me, please feel free to reach out on LinkedIn.


Sameer Negi - Autonomous Vehicle Trainee at Infosys | LinkedIn (www.linkedin.com)
