Building a machine learning model requires a series of steps: data preparation, data cleaning, feature engineering, model building, and model deployment. It can therefore take a data scientist a lot of time to create a solution that solves a business problem. To help speed up the process, you can use PyCaret, an open-source library that lets you run the end-to-end ML workflow with only a few lines of code.

What is PyCaret?

PyCaret is an open-source, low-code library in Python that aims to automate the development of machine learning models. It helps data scientists, analysts, ML engineers, and anyone learning machine learning to be more productive and reach conclusions faster.

The library has 70+ automated open-source algorithms and more than 25 pre-processing techniques that can help you build machine learning models with good performance. It supports supervised learning (classification and regression), clustering, anomaly detection, and natural language processing tasks.

PyCaret is a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, Microsoft LightGBM, spaCy, Optuna, Hyperopt, Ray, and many more.

You don't have to worry about data preparation, feature engineering, feature selection, or hyperparameter tuning. PyCaret can perform these tasks automatically with just a few lines of code.

Another benefit of the library is that, after building your machine learning model, you can directly deploy the transformation pipeline and trained model on Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP).
For classification and regression problems, PyCaret uses the following evaluation metrics:

For classification: Accuracy, AUC, Recall, Precision, F1, Kappa
For regression: MAE, MSE, RMSE, R2, RMSLE, MAPE

In this article, you will learn how to use the PyCaret library to automate the end-to-end machine learning process with little manual configuration.

How to Install PyCaret

Installation is easy and takes only a few minutes. All dependencies are installed along with PyCaret. You can view a list of dependencies here.

    pip install pycaret

Load Data

In this tutorial, we will use the "mobile price" dataset. The goal is to predict a price range indicating how high the price is. You can download the dataset here.

You can load the dataset by using the pandas library.

    # import packages
    import pandas as pd
    import numpy as np

    # load data
    data = pd.read_csv("/train.csv")
    data.head()

Let's check the shape of the dataset.

    data.shape
    (2000, 21)

As you can see, the dataset has 20 features and 1 target.

Prepare the Environment

The first step is to prepare the environment in which your machine learning experiments will run. You do this by calling the setup() function from the pycaret.classification module. In setup(), you pass the dataframe for your dataset and the target variable, which for this problem is price_range (the last column). You can also set the experiment name and other settings.

    from pycaret.classification import *

    # setup the environment
    grid = setup(data=data, target=data.columns[-1], html=False, silent=True, verbose=True, log_experiment=True, experiment_name='mobile_prices')

The silent argument shown here belongs to the PyCaret 2.x API; it appears to have been removed in PyCaret 3.0, so you may need to drop it on newer versions.

Note: As mentioned earlier, PyCaret handles data preprocessing automatically. These steps are applied within setup(), and all the operations performed in PyCaret are stored sequentially in a Pipeline.

Create a Model

Creating a model in PyCaret is simple and straightforward. You only need to pass one parameter, the model name, to the create_model() function. create_model() trains the algorithm, displays a table with k-fold cross-validated scores and their means for evaluation metrics such as accuracy and F1, and returns the trained model.

In this example, we train a K Neighbors classifier by passing the string "knn". You can click here to see a complete list of more than 60 estimators available in the PyCaret library.

    # create model
    knn = create_model('knn')

In the resulting score grid, the mean accuracy is 91.85%.

Compare Different Models

With PyCaret, you can train and evaluate the performance of all estimators available in the model library using k-fold cross-validation. The compare_models() function displays a score grid with average cross-validated scores for all estimators and returns the best-performing model.
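To make "average cross-validated score" concrete, here is a minimal plain-Python sketch of k-fold averaging. It is not PyCaret's implementation: the helper names (k_fold_indices, predict_1nn, cross_validated_accuracy), the toy data, and the one-nearest-neighbour stand-in estimator are all assumptions made for illustration, and PyCaret's own fold strategy and estimators differ.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous (train, test) index pairs."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


def predict_1nn(train_x, train_y, x):
    """Hypothetical stand-in estimator: label of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def cross_validated_accuracy(xs, ys, k):
    """Return per-fold accuracies and their mean (one row of a score grid)."""
    fold_scores = []
    for train, test in k_fold_indices(len(xs), k):
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        hits = sum(predict_1nn(train_x, train_y, xs[i]) == ys[i] for i in test)
        fold_scores.append(hits / len(test))
    return fold_scores, mean(fold_scores)


# Toy one-feature dataset (made up for this sketch).
xs = [1.0, 9.0, 2.0, 8.0, 3.0, 4.0]
ys = [0, 1, 0, 1, 0, 1]

fold_scores, mean_accuracy = cross_validated_accuracy(xs, ys, k=3)
print("per-fold accuracy:", fold_scores)
print("mean accuracy:", round(mean_accuracy, 4))
```

Each estimator in a comparison would get one such mean per metric, and the grid is then sorted by the chosen metric.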
    best = compare_models()

The score grid is sorted by accuracy. The best-performing estimator is the K Neighbors Classifier, followed by Linear Discriminant Analysis.

Sometimes accuracy is not a good evaluation metric, depending on the nature of your dataset. You can choose other evaluation metrics to determine which model performs better than the others.

Tune Model

You can also improve the performance of your model by tuning its hyperparameters. The tune_model() function from PyCaret can automatically tune the hyperparameters of a machine learning model by using different search algorithms, such as:

- Random Search
- Grid Search
- Bayesian search
- Tree-structured Parzen Estimator search

Now we can tune the KNN model to improve its performance.

    # tune model
    tuned_knn = tune_model(knn)

The output of the function is a score grid with CV scores and the trained model object. After tuning the hyperparameters of the KNN model, the mean accuracy improved from 91.85% to 93.00%.

Model Evaluation

You can evaluate your trained model by using the evaluate_model() function from PyCaret. The function displays a user interface for analyzing the performance of a trained model.

    evaluate_model(tuned_knn)

You can view the following plots and other performance details:

- Hyperparameters of the trained model
- Confusion matrix
- Precision Recall
- Class report (for classification problems)
- Learning curve
- Decision Boundary
- Error
- Validation Curve

Note: This function only works in an IPython-enabled notebook.

Make Prediction

To make a prediction on unseen data, you can use the predict_model() function. For a classification problem, the function predicts Label and Score (the probability of the predicted class) using a trained model. When data is None, it predicts label and score on the test or holdout set, which is 30% of the dataset by default.

    holdout_prediction = predict_model(tuned_knn)

The tuned KNN model still performs well on the test set, with an accuracy of 93.84%.

Save Trained Model

After running your machine learning experiments and settling on the best-performing model, you can save the entire pipeline, containing all preprocessing steps and the trained model object, as a binary pickle file by using the save_model() function. You need to pass the trained model object and the name of the model, which is used to create the pickle file.

    # saving model
    save_model(tuned_knn, model_name='knn_model')

Final Thoughts on PyCaret Library

In this article, you have learned the most important steps to build machine learning models by using the PyCaret library. The library has a lot of modules and examples to help you build machine learning models for different cases. Check the following resources if you are looking to go deeper:

- Regression
- Clustering
- Anomaly Detection
- Natural Language Processing
- Association Rule Mining

If you learned something new or enjoyed reading this article, please share it so that others can see it. Until then, see you in the next post!

You can also find me on Twitter @Davis_McDavid. And you can read more articles like this here.


Pycaret: A Faster Way to Build Machine Learning Models
