“Data are becoming the new raw material of business.”

Hello friends, today I am going to show you how to tell which model to choose just by looking at the dataset. So, let's get started!

What is a dataset?

A data set (or dataset) is a collection of data. Most commonly a data set corresponds to the contents of a single database table, or a single statistical data matrix, where every column of the table represents a particular variable, and each row corresponds to a given member of the data set in question.

Let's look at a dataset in the form of a csv file.

(Image: the dataset in a Jupyter Notebook)

Assume we have to work on this dataset, which has many columns and rows. Your first step is to identify the independent and dependent variables in your dataset.

A **dependent variable** (generally the last column in the dataset; here the last column is SalePrice) is the variable being tested and measured in a scientific experiment.

An **independent variable** (all the other variables, such as Street, LotShape, and SaleCondition) is the variable that is changed or controlled in a scientific experiment to test the effects on the dependent variable.

Now that we have seen what a dataset looks like, what you need to know is whether your problem is a Regression problem, a Classification problem, or a Clustering problem. For that, you need to look at the dependent variable, which we have just defined.

[Note: If you don't have a dependent variable, then it is a Clustering problem.]

Let's see what a dataset looks like without a DV (dependent variable).

"This data was collected on our social survey mobile platform. We have 300,000 millennial and Gen Z members and have collected 150,000,000 survey responses from this demographic to date." (Whatsgoodly)

Now, if your dataset contains a dependent variable, you have to check whether it has a continuous outcome or a categorical outcome.
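As a rough sketch of that check, assuming the dataset is a csv whose last column is the dependent variable, the snippet below splits the columns into independent and dependent variables and then guesses whether the outcome is continuous or categorical. The rows are invented toy values (only the column names come from the dataset above), and `split_variables`, `outcome_kind`, and the `max_classes` threshold are hypothetical names chosen for illustration, not part of any library.

```python
import csv
import io

# Toy rows invented for illustration; only the column names come from the article.
CSV_TEXT = """Street,LotShape,SaleCondition,SalePrice
Pave,Reg,Normal,208500
Pave,IR1,Normal,181500
Grvl,Reg,Abnorml,140000
Pave,IR1,Partial,250000
"""


def split_variables(csv_text, target=None):
    """Return (independent column names, dependent column values).

    Assumes the dependent variable is the last column unless `target` is given.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys())
    target = target or columns[-1]
    features = [c for c in columns if c != target]
    return features, [row[target] for row in rows]


def outcome_kind(values, max_classes=10):
    """Rough heuristic: return 'continuous' or 'categorical'.

    `max_classes` is an arbitrary threshold picked for this sketch.
    """
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        return "categorical"  # text labels such as "yes" / "no"
    distinct = set(numbers)
    # A handful of repeated whole numbers (e.g. 0/1) looks like class labels.
    looks_like_labels = (
        len(distinct) <= max_classes
        and len(distinct) < len(numbers)
        and all(n.is_integer() for n in distinct)
    )
    return "categorical" if looks_like_labels else "continuous"


features, sale_price = split_variables(CSV_TEXT)
print(features)                             # independent variables
print(outcome_kind(sale_price))             # kind of outcome for SalePrice
print(outcome_kind(["0", "1", "1", "0"]))   # kind of outcome for a 0/1 column
```

A heuristic like this only gives a first guess; a column of whole numbers can still be a continuous quantity, so it is worth looking at the values yourself.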
If it is a continuous outcome, then your problem is a Regression problem. And if it is a categorical outcome, then your problem is a Classification problem.

Let's see what a dataset looks like with a DV (dependent variable).

Regression Case: This is a House Prices dataset with lots of rows and columns. You have to predict SalePrice, which is the dependent variable; all the other columns are independent variables. You can easily see that it is a Regression problem, and we have to use a regression model on it, such as RandomForest or SVR.

(Image: the dataset in a Jupyter Notebook)

Classification Case: Now look at this dataset, in which you are given User ID, Gender, Age, and Estimated Salary, which are all independent variables, and you have to predict whether a new person will buy a new SUV or not.

[Note: One can easily see it is a Classification problem because the dependent variable, Purchased, has a binary output of 0 or 1 only, where 1 means the person will buy the SUV and 0 means the person will not buy the SUV.]

So, by now we have a good idea of how to classify our problem as Regression, Classification, or Clustering just by looking at the dataset.

Now, how would I know which model is the best one? For example, say you are working on Home Price Prediction and you have to predict the price of a house based on several parameters. But which model should I use, and what parameters should I pass to it? What you can do is use Grid Search, which tells you which parameters are best for your model.

What does Grid Search do?

It finds the optimal values for your model: it tries every combination of the parameter values you list and reports the combination that scores best under cross-validation. All you need to do is import the class from the sklearn library:

from sklearn.model_selection import GridSearchCV

Nobody in this world can tell you which model will give you the best performance or accuracy just by looking at the dataset.
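The Grid Search step described above can be sketched as follows. This is a minimal example under stated assumptions, not a recommended setup: the data is synthetic (generated with `make_regression` as a stand-in for a house-price table), the model is a `RandomForestRegressor`, and the parameter values in `param_grid` are arbitrary choices made only to keep the example small.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a house-price table: 200 rows, 5 numeric features.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Candidate values to try; these grids are illustrative, not recommendations.
param_grid = {
    "n_estimators": [10, 50],
    "max_depth": [3, None],
}

search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid=param_grid,
    cv=3,          # 3-fold cross-validation for every combination
    scoring="r2",  # score each combination by R^2
)
search.fit(X, y)

print(search.best_params_)  # combination with the best mean CV score
print(search.best_score_)   # mean cross-validated R^2 of that combination
```

Grid Search tunes the parameters of one model at a time. To compare several candidate models (for example RandomForest and SVR), one option is to run a search for each and compare their `best_score_` values.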
All you can do is classify your problem by looking at the dataset: whether the dataset is linear or non-linear, and whether the problem is a classification, regression, or clustering problem.

Don't be sad, because you have the Scikit Learn cheat sheet, which helps you pick the model.

If you find any difficulty in reading the cheat sheet, go to the Cheat Sheet link.

I hope you like this article! If you have any problem or query on any topic related to Data Science, let me know in the comment section!

I'll share more concepts soon on the LinkedIn.com Article column as well as Medium.

Give some love too!

_Mohit Sharma (themenyouwanttobe&Co.)_
themenyouwanttobe@gmail.com / Telegram

How do I know which model to choose for my machine learning problem?
