How to Classify When Linear Regression Fails? by @sameer.negi17

Linear regression can be a simple solution to classification problems, but what happens when it fails? We will see in the problem below.

Suppose we want to classify Y = {0, 1}, where X are the data samples; this is binary classification. Let's try it with linear regression.

[Figure: a linear regression line fit to the binary-labeled data, with predictions thresholded at 0.5]

Wow, linear regression has done the job. It is a really good fit. But what happens if I add a new data point to the given data set?

[Figure: the same data with one outlying point added; the refit line shifts, and the 0.5 threshold now misclassifies samples]

It's a really bad solution now: linear regression didn't work. What should we do next? If there is a problem, there is a solution, and here the solution is logistic regression. Let's start with a formal definition and get an idea of what linear and logistic regression are:
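To see this failure numerically, here is a minimal sketch (the data values and helper name are made up for illustration): fit a least-squares line to binary labels, classify by thresholding the line at 0.5, then add one far-away positive example and refit.

```python
import numpy as np

# Toy binary data: x <= 2 is labeled 0, x >= 3 is labeled 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

def lstsq_classify(x, y):
    """Fit y ~ a*x + b by least squares, then threshold predictions at 0.5."""
    A = np.vstack([x, np.ones_like(x)]).T
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return (A @ coef >= 0.5).astype(int)

print(lstsq_classify(x, y))        # matches the labels perfectly

# One extreme positive point tilts and flattens the line...
x2 = np.append(x, 50.0)
y2 = np.append(y, 1.0)
print(lstsq_classify(x2, y2)[:6])  # ...and the original points are no longer all correct
```

The outlier drags the fitted line toward itself, so the point where the line crosses 0.5 moves and previously correct samples fall on the wrong side.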

In linear regression, the outcome (dependent variable) is continuous. It can have any one of an infinite number of possible values. In logistic regression, the outcome (dependent variable) has only a limited number of possible values. Logistic regression is used when the response variable is categorical in nature.

Intuitively, it also doesn't make sense for h(x) to take values larger than 1 or smaller than 0 when we know that y ∈ {0, 1}. To fix this, let's change the form of our hypothesis h(x). We will choose the hypothesis as follows:

h(x) = g(theta^T x) = 1 / (1 + e^(−theta^T x))

g(z) is called the sigmoid function or logistic function. It looks as follows:

[Figure: the S-shaped sigmoid curve, rising from 0 to 1 and crossing 0.5 at z = 0]
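As a quick sketch (the function name is my own), the sigmoid can be written and sanity-checked in a few lines:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# g(0) = 0.5, and g saturates toward 1 and 0 for large |z|.
print(sigmoid(0.0))
print(sigmoid(10.0), sigmoid(-10.0))
```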

Let’s assume that

p(y = 1 | x; theta) = h(x)
p(y = 0 | x; theta) = 1 − h(x)

If we combine the above equations, they can be rewritten as follows:

p(y | x; theta) = h(x)^y * (1 − h(x))^(1 − y)

In the above equation, when y = 0, the factor h(x)^y becomes 1, so p(y|x; theta) = 1 − h(x); and when y = 1, the factor (1 − h(x))^(1 − y) becomes 1, so p(y|x; theta) = h(x).

Now, the likelihood of the parameters is given by

L(theta) = product over i of p(y[i] | x[i]; theta) = product over i of h(x[i])^(y[i]) * (1 − h(x[i]))^(1 − y[i])

and taking the log, the log likelihood that we will maximize is given by:

l(theta) = log L(theta) = sum over i of [ y[i] * log h(x[i]) + (1 − y[i]) * log(1 − h(x[i])) ]
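As a small sketch of this quantity (function and variable names are mine, not from the post), the log likelihood can be computed directly:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    """Sum over i of y[i]*log h(x[i]) + (1 - y[i])*log(1 - h(x[i]))."""
    h = sigmoid(X @ theta)
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# With theta = 0, h(x) = 0.5 for every sample, so l(theta) = n * log(0.5).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # intercept column + one feature
y = np.array([0.0, 0.0, 1.0, 1.0])
print(log_likelihood(np.zeros(2), X, y))  # 4 * log(0.5)
```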

Now we can use gradient ascent to maximize the log likelihood; we already know that h(x) is given by the sigmoid function.

theta := theta + alpha * ∇_theta l(theta)

For ease, the partial derivative of g(z) with respect to z is given by

g′(z) = g(z) * (1 − g(z))
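This identity is easy to verify numerically; a tiny sketch (my own check, not from the post) compares it against a central finite difference:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z, eps = 0.7, 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)  # finite-difference slope
analytic = sigmoid(z) * (1 - sigmoid(z))                     # g(z) * (1 - g(z))
print(abs(numeric - analytic))  # agreement to many decimal places
```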

Now we apply gradient ascent by taking the partial derivative of the log likelihood with respect to theta:

∂l(theta)/∂theta[j] = (y − h(x)) * x_j

which gives the update rule for each training example i:

theta[j] := theta[j] + alpha * (y[i] − h(x[i])) * x_j[i]

If we compare the above rule to the least mean squares (LMS) update, it looks identical, but it is not: this is a different learning algorithm, because h(x) is now defined as a non-linear function of theta^T x[i].
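Putting the pieces together, a minimal batch gradient-ascent loop might look like this (the function name, learning rate, and toy data are my own choices, not from the post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, iters=2000):
    """Batch gradient ascent on the log likelihood of logistic regression."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        h = sigmoid(X @ theta)
        theta += alpha * (X.T @ (y - h))  # theta[j] += alpha * sum_i (y[i] - h[i]) * x_j[i]
    return theta

# Toy 1-D data with an intercept column; x >= 3 is labeled 1.
X = np.array([[1.0, v] for v in [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
theta = fit_logistic(X, y)
preds = (sigmoid(X @ theta) >= 0.5).astype(int)
print(preds)  # recovers the labels
```

Unlike the least-squares fit earlier, this model's predictions stay in (0, 1), so a single extreme point cannot drag the decision threshold far away.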


If you find any inconsistency in my post, please feel free to point it out in the comments. Thanks for reading.

If you want to connect with me, please feel free to reach out on LinkedIn: Sameer Negi - Autonomous Vehicle Trainee - Infosys (www.linkedin.com).
