Scikit-learn is the most popular free, open-source Python machine learning library for data scientists and machine learning practitioners. The scikit-learn library contains a lot of efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction.

In this article, I'm happy to share with you the top 5 new features presented in the new version of scikit-learn (1.0).

TABLE OF CONTENTS
- Install Scikit-learn v1.0
- New Flexible Plotting API
- Feature Names Support
- Pearson's R Correlation Coefficient
- OneHotEncoder Improvements
- Histogram-based Gradient Boosting Models Are Now Stable

Install Scikit-learn v1.0

Firstly, make sure you install the latest version with pip:

```
pip install --upgrade scikit-learn
```

If you are using conda, use the following command:

```
conda install -c conda-forge scikit-learn
```

Note: Version 1.0.0 of scikit-learn requires Python 3.7+, NumPy 1.14.6+, and SciPy 1.1.0+. Matplotlib 2.2.2+ is an optional dependency.

Now, let's look at the new features!

1. New Flexible Plotting API

Scikit-learn 1.0 introduces a new flexible plotting API: display classes such as metrics.PrecisionRecallDisplay, metrics.DetCurveDisplay, and inspection.PartialDependenceDisplay now come with two class methods:

(a) from_estimator()

This class method lets you pass a fitted model together with the data and get the corresponding plot in a single call. Let's look at an example that uses PrecisionRecallDisplay to visualize precision and recall.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.metrics import PrecisionRecallDisplay
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

classifier = RandomForestClassifier(random_state=42)
classifier.fit(X_train, y_train)

display = PrecisionRecallDisplay.from_estimator(classifier, X_test, y_test)
plt.show()
```

(b) from_predictions()

With this class method, you can just pass the prediction results and get your plot. Let's look at an example that uses ConfusionMatrixDisplay to visualize the confusion matrix.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

classifier = RandomForestClassifier(random_state=42)
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)

# Note: from_predictions expects the true labels first, then the predictions
display = ConfusionMatrixDisplay.from_predictions(
    y_test, predictions, display_labels=classifier.classes_)
plt.show()
```

2. Feature Names Support (Pandas DataFrame)

In the new version of scikit-learn, you can track the names of the columns of your pandas DataFrame when working with transformers or estimators. When you pass a DataFrame to an estimator and call the fit method, the estimator will store the feature names in the feature_names_in_ attribute.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

X = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["age", "days", "duration"])
scaler = StandardScaler().fit(X)
scaler.feature_names_in_
```

Output:

```
array(['age', 'days', 'duration'], dtype=object)
```

Note: feature names support is only enabled when the column names in the DataFrame are all strings.

3. Pearson's R Correlation Coefficient

This is a new feature selection function (r_regression) that measures the linear relationship between each feature and the target in regression tasks. It is also known as Pearson's r.

Note: The cross-correlation between each regressor and the target is computed as E[(X[:, i] - mean(X[:, i])) * (y - mean(y))] / (std(X[:, i]) * std(y)), where E[...] denotes the mean over all samples, X holds the features of the dataset, and y is the target variable.

The following example shows how you can compute Pearson's r between each feature and the target.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.feature_selection import r_regression

X, y = fetch_california_housing(return_X_y=True)
print(X.shape)

p = r_regression(X, y)
print(p)
```

Output:

```
(20640, 8)
[ 0.68807521  0.10562341  0.15194829 -0.04670051 -0.02464968 -0.02373741
 -0.14416028 -0.04596662]
```

4. OneHotEncoder Improvements

The OneHotEncoder in scikit-learn 1.0 can accept values it has not seen before. You just need to set the handle_unknown parameter to 'ignore' (handle_unknown='ignore') when instantiating the transformer. When you transform data with an unknown category, the encoded columns for this feature will all be zeros.

In the following example, we pass an unknown category ('degree') when we transform the given data.

```python
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(handle_unknown='ignore')
X = [['secondary'], ['primary'], ['primary']]
enc.fit(X)

transformed = enc.transform([['degree'], ['primary'], ['secondary']]).toarray()
print(transformed)
```

Output:

```
[[0. 0.]
 [1. 0.]
 [0. 1.]]
```

Note: in the inverse transform, an unknown category will be labeled as None.

5. Histogram-based Gradient Boosting Models Are Now Stable

The two histogram-based gradient boosting estimators that were experimental in earlier releases of scikit-learn (HistGradientBoostingRegressor and HistGradientBoostingClassifier) are no longer experimental, and you can simply import and use them as:

```python
from sklearn.ensemble import HistGradientBoostingClassifier, HistGradientBoostingRegressor
```
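Since the snippet above only shows the import, here is a minimal usage sketch of my own; the synthetic dataset and parameter choices are illustrative assumptions, not part of the scikit-learn 1.0 announcement:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data, for illustration only
X, y = make_classification(random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# In 1.0 the plain import above is enough; before 1.0 you first had to run
# `from sklearn.experimental import enable_hist_gradient_boosting`
clf = HistGradientBoostingClassifier(random_state=42)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # mean accuracy on the held-out test set
```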
There are more new features in scikit-learn 1.0.0 that I did not mention in this article. You can find the highlights of the other features released in scikit-learn 1.0.0 here.

Congratulations, you have made it to the end of this article! I hope you have learned something new that will help you in your next machine learning project.

If you learned something new or enjoyed reading this article, please share it so that others can see it. Until then, see you in the next post!

You can also find me on Twitter @Davis_McDavid. And you can read more articles like this here.