Predict a TIP using Machine learning

by Doron Segal, October 19th, 2017
This guide is the second part of a series of machine learning tutorials using Python and R. You can find part 1 here: https://medium.com/@doronsegal/a-starter-guide-for-machine-learning-f2fe14d1665c



**What is linear regression? (from Wikipedia)**

In statistics, linear regression is a linear approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X. The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression. (This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.)

Basically, it’s this function from middle school: Y = mx + b, where Y is the value on the y-axis, m is the slope, x is the value on the x-axis, and b is a constant (the intercept). In other words, Y equals the slope multiplied by x, plus b.
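To make the formula concrete, here is a minimal sketch in Python. The function name `predict_line` and the numbers are made up for illustration; they are not taken from the tutorial's dataset.

```python
def predict_line(x, m, b):
    """Return Y = m * x + b for slope m and intercept b."""
    return m * x + b


# Hypothetical line with slope 2 and intercept 1.
print(predict_line(3, m=2, b=1))  # 2 * 3 + 1 = 7
```

Training a linear regression model comes down to finding the m and b that best match the data.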

Let’s start by creating a model to predict the tip a waiter will get based on the client’s age (I will address the issues with this specific model below). To fit our dataset into a linear regression algorithm, X will be age and Y will be tip amount. Later I will demonstrate how to improve our model by using group size instead of age. In that example, X will be group size and Y will be tip amount.

This is just an example to demonstrate how to build a simple model using linear regression.

Age and tip do not have a strong correlation, and there are a number of other variables to take into account, such as the total number of diners, location, type of restaurant, meal type, etc.

The dataset:

This is a random dataset; it isn’t designed to make sense.

Python Example
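As a rough illustration of what fitting such a model involves, here is a minimal sketch that uses the closed-form least-squares formulas in plain Python rather than a library regressor. The function name `fit_line` and the (age, tip) pairs are assumptions made up for this sketch; they are not the tutorial's code or data.

```python
def fit_line(xs, ys):
    """Fit Y = m * x + b by ordinary least squares and return (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); assumes xs are not all equal.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b


# Made-up (age, tip) pairs, for illustration only.
ages = [22, 35, 41, 58, 63]
tips = [4.0, 9.5, 3.0, 7.0, 5.5]

m, b = fit_line(ages, tips)
predicted_tip = m * 30 + b  # predicted tip for a 30-year-old client
print(f"slope={m:.3f}, intercept={b:.3f}, tip at age 30={predicted_tip:.2f}")
```

A library such as scikit-learn wraps the same idea behind `fit` and `predict` methods and also handles more than one input variable.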

R example

Here’s the graph

This graph is our train dataset

Test dataset graph


**What can we learn from the graph?**

As you probably guessed, there is only a weak correlation between tip amount and client age. As data-driven people, we should always check all the possibilities and try not to be biased. Now let’s check a different dataset: tip vs. number of diners.
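Reading correlation off a graph is subjective, so it can help to put a number on it. The sketch below computes the Pearson correlation coefficient by hand; values near 0 mean a weak linear relationship, and values near 1 or -1 mean a strong one. The function name `pearson_r` and the sample data are assumptions for illustration, not the tutorial's dataset.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up (age, tip) pairs, for illustration only.
ages = [22, 35, 41, 58, 63]
tips = [4.0, 9.5, 3.0, 7.0, 5.5]
print(f"age vs tip: r = {pearson_r(ages, tips):.2f}")
```

The same function can be applied to any other pair of columns to compare how strongly each candidate input relates to the tip.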

Using a new dataset: number of diners and tip

Group size vs Tip

Now let’s crunch the numbers using our linear regression model

We are going to use the same code (see example above), but I’ll show you the new train dataset and test dataset graph.

How much a group of X will tip (train set): stronger correlation

How much a group of X will tip (test set): the test set follows the same trend as the train set
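One common way to check how well a fitted line matches held-out data is the coefficient of determination, R². A value of 1.0 means the line predicts the test points perfectly, and 0.0 means it does no better than always predicting the mean tip. In this minimal sketch, the function name `r_squared`, the slope and intercept, and the test points are all hypothetical values chosen for illustration.

```python
def r_squared(xs, ys, m, b):
    """Return R^2 of the line Y = m * x + b on the points (xs, ys).

    Assumes ys are not all equal (otherwise the total variation is zero).
    """
    mean_y = sum(ys) / len(ys)
    # Residual sum of squares: error left over after using the line.
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: error from always predicting the mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical fitted line and made-up (group size, tip) test points.
m, b = 2.8, 0.5
test_group_sizes = [2, 3, 5, 7]
test_tips = [6.0, 9.5, 14.0, 20.5]
print(f"R^2 on the test set: {r_squared(test_group_sizes, test_tips, m, b):.3f}")
```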

So, we see that there is a much stronger correlation between group size and tip amount than between client age and tip amount.

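We can query our model to predict the tip amount for a group of 5 diners using this command: my_test_tip = regressor.predict(np.array([[5]])). Assuming regressor is a scikit-learn estimator, predict expects a 2D array of shape (n_samples, n_features), hence the double brackets.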


**Summary**

This is a basic model meant only to demonstrate the basics of linear regression and how to create a model. In real life we use more variables, making our models much more complicated.

I’ve been putting some time into these tutorials, so I hope you find them useful. If you want more, press the clap button to show your support, or buy me a beer by sending some bitcoin love: 1Pg4BbrevSEWroo6zS6Kyvi1EMffpAgLac


Cheers!

For more from Doron Segal, check out my site: http://segaldoron.com