
7 Scikit-Learn Secrets You Probably Didn’t Know About


As data scientists with Python programming skills, we use Scikit-Learn a lot. It’s often the first machine learning package taught to newcomers, yet it can carry a project all the way to production. However, much of what is taught covers only basic implementation, and Scikit-Learn holds many secrets that can improve our data workflow.

This article will discuss seven secrets from Scikit-Learn you probably didn’t know. Without further ado, let’s get into it.

1. Probability Calibration

Many classification models provide a probability output for each class. The problem is that these probability estimates are not necessarily well-calibrated, meaning they do not reflect the actual likelihood of the predicted outcome.

For example, your model might assign a 95% probability to the “fraud” class, yet only 70% of those predictions turn out to be correct. Probability calibration aims to adjust the probabilities so that they reflect the actual likelihood.

There are a few calibration methods, although the most common are sigmoid (Platt) calibration and isotonic regression. The following code uses Scikit-Learn’s CalibratedClassifierCV to calibrate a classifier.
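
A minimal sketch using CalibratedClassifierCV with sigmoid calibration; the synthetic dataset and the Gaussian Naive Bayes base model are only placeholders:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data as a stand-in for your own dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Wrap the base classifier with sigmoid (Platt) calibration
calibrated_model = CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=5)
calibrated_model.fit(X_train, y_train)

# Calibrated probability estimates for the test set
probabilities = calibrated_model.predict_proba(X_test)
```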

You can swap in any model as long as it provides a probability output. The method parameter lets you switch between “sigmoid” and “isotonic”.

For example, here is a Random Forest classifier with isotonic calibration.
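
A possible sketch, reusing the training data from the previous snippet:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier

# Isotonic calibration fits a non-parametric, monotonic mapping of the scores
calibrated_rf = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=100, random_state=42),
    method="isotonic",
    cv=5,
)
calibrated_rf.fit(X_train, y_train)
```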

If your model’s probability estimates do not reflect reality, consider calibrating your classifier.

2. Feature Union

The next secret we will explore is the implementation of the feature union. If you don’t know about it, feature union is a Scikit-Learn class that provides a way to combine multiple transformer objects into a single transformer.

It’s a valuable class when we want to perform multiple transformations and extractions from the same dataset and use them in parallel for our machine-learning modeling.

Let’s see how it works in the following code.
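
A minimal sketch using FeatureUnion and Pipeline; the Iris dataset, the chosen transformers, and the logistic regression classifier are only placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

X, y = load_iris(return_X_y=True)

# Run PCA and univariate feature selection in parallel, then concatenate their outputs
combined_features = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("select_best", SelectKBest(f_classif, k=2)),
])

# Use the combined features as a single step in a modeling pipeline
pipeline = Pipeline([
    ("features", combined_features),
    ("classifier", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)
```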

In the code above, we combined two transformers, PCA for dimensionality reduction and selection of the top features, into a single transformer with feature union. Wrapping the result in a pipeline lets the combined features run as one step of a single process.

It’s also possible to chain feature unions if you want finer control over feature manipulation and preprocessing. Here is an example that extends the previous method with an additional feature union.
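
One possible way to sketch this, reusing the objects from the previous snippet and nesting a second, purely illustrative FeatureUnion later in the pipeline:

```python
from sklearn.preprocessing import StandardScaler

# A second feature union applied to the output of the first one
second_union = FeatureUnion([
    ("pca_again", PCA(n_components=2)),
    ("select_again", SelectKBest(f_classif, k=1)),
])

chained_pipeline = Pipeline([
    ("first_union", combined_features),  # PCA + SelectKBest from the previous snippet
    ("scaler", StandardScaler()),
    ("second_union", second_union),
    ("classifier", LogisticRegression(max_iter=1000)),
])
chained_pipeline.fit(X, y)
```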

It’s an excellent methodology for those who need extensive preprocessing at the beginning of the machine learning modeling process.

3. Feature Agglomeration

The next secret we will explore is feature agglomeration. This is a Scikit-Learn method that reduces the number of features by using hierarchical clustering to merge similar ones.

Feature agglomeration is a dimensionality reduction technique, which means it is useful when there are many features and some of them are strongly correlated with each other. Because it is based on hierarchical clustering, features are merged according to the linkage criterion and distance metric we set.

Let’s see how it works in the following code.
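
A minimal sketch with FeatureAgglomeration; the breast cancer dataset and the choice of five clusters are only placeholders:

```python
from sklearn.cluster import FeatureAgglomeration
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)  # 30 partly correlated features

# Merge the 30 original features into 5 aggregated features
agglo = FeatureAgglomeration(n_clusters=5)
X_reduced = agglo.fit_transform(X)
print(X_reduced.shape)  # (569, 5)
```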

We control the number of output features by setting the number of clusters. Let’s see how to change the distance metric to cosine.
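
Continuing from the previous snippet (note that the metric parameter was named affinity in older Scikit-Learn releases, and ward linkage only supports Euclidean distance):

```python
# Cosine distance requires a non-ward linkage, e.g. average
agglo_cosine = FeatureAgglomeration(n_clusters=5, metric="cosine", linkage="average")
X_cosine = agglo_cosine.fit_transform(X)
```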

We can also change the linkage method with the following code.
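
For example, switching to complete linkage:

```python
# Complete linkage merges clusters based on their farthest pair of features
agglo_complete = FeatureAgglomeration(n_clusters=5, linkage="complete")
X_complete = agglo_complete.fit_transform(X)
```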

Then, we can also change the function used to aggregate the features within each cluster.
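
For instance, pooling each cluster of features with the median instead of the default mean:

```python
import numpy as np

# pooling_func controls how the features in each cluster are combined
agglo_median = FeatureAgglomeration(n_clusters=5, pooling_func=np.median)
X_median = agglo_median.fit_transform(X)
```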

Try experimenting with the feature agglomeration to acquire the best dataset for your modeling.

4. Predefined Split

The predefined split is a Scikit-Learn class used to define a custom cross-validation strategy. It specifies exactly how samples are assigned to the training and test sets. It’s a valuable method when we want to split our data in a specific way and the standard K-fold or stratified K-fold is insufficient.

Let’s try out predefined split using the code below.
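
A minimal sketch with PredefinedSplit; the synthetic dataset and the logistic regression model are only placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import PredefinedSplit, cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# -1 keeps a sample in the training set for every split;
# 0 assigns it to the first (and here only) test fold
test_fold = np.full(len(X), -1)
test_fold[100:] = 0  # first 100 samples train, the rest test

ps = PredefinedSplit(test_fold)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=ps)
```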

In the example above, we set the splitting scheme by keeping the first hundred samples in training and using the rest as the test set.

The strategy for splitting depends on your requirements; any assignment of samples to folds can be expressed through the values you place in the test_fold array.

This strategy offers a different take on the data-splitting process, so try it out to see if it benefits your workflow.

5. Warm Start

Have you ever trained a machine learning model on an extensive dataset and wanted to train it in batches? Or used online learning, where the model learns incrementally from streaming data? In these cases, you don’t want to retrain the model from the beginning every time.

This is where a warm start could help you.

Warm start is a parameter available in many Scikit-Learn models that reuses the previously trained solution when fitting the model again. This method is valuable when we don’t want to retrain our model from scratch.

For example, the code below shows the warm start process when we add more trees to the model and retrain it without starting from the beginning.
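
A minimal sketch with a Random Forest; the synthetic dataset and the tree counts are only placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Train an initial forest with 100 trees
model = RandomForestClassifier(n_estimators=100, warm_start=True, random_state=42)
model.fit(X, y)

# Add 50 more trees; the existing 100 trees are kept as they are
model.n_estimators += 50
model.fit(X, y)
print(len(model.estimators_))  # 150
```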

It’s also possible to do batch training with the warm start feature.
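
One way to sketch this, continuing from the snippet above; note that each new group of trees only sees its own batch, so this is an approximation rather than a replacement for training on the full data:

```python
import numpy as np

batches = np.array_split(np.arange(len(X)), 5)

batch_model = RandomForestClassifier(n_estimators=0, warm_start=True, random_state=42)
for batch_idx in batches:
    # Each call adds 20 new trees, trained only on the current batch
    batch_model.n_estimators += 20
    batch_model.fit(X[batch_idx], y[batch_idx])
```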

Experiment with warm start to keep your model up to date without sacrificing training time.

6. Incremental Learning

And speaking of incremental learning, we can use Scikit-Learn to do that, too. As mentioned above, incremental learning — or online learning — is a machine learning training process in which we sequentially introduce new data.

It’s often used when our dataset is extensive, or the data is expected to come in over time. It’s also used when we expect data distribution to change over time, so constant retraining is required, but not from scratch.

Several algorithms in Scikit-Learn support incremental learning through the partial_fit method, which allows model training to take place in batches.

Let’s look at a code example.
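
A minimal sketch using SGDClassifier’s partial_fit; the synthetic dataset and batch size are only placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=10_000, n_features=20, random_state=42)
classes = np.unique(y)  # partial_fit needs the full list of classes up front

model = SGDClassifier(random_state=42)

# Feed the data to the model in batches of 1,000 samples
for start in range(0, len(X), 1000):
    X_batch = X[start:start + 1000]
    y_batch = y[start:start + 1000]
    model.partial_fit(X_batch, y_batch, classes=classes)
```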

The incremental learning will keep running as long as the loop continues.

It’s also possible to perform incremental learning not only for model training but also for preprocessing.
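
For example, StandardScaler also supports partial_fit, so the scaling statistics can be updated batch by batch (reusing the data from the snippet above):

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# Update the running mean and variance one batch at a time
for start in range(0, len(X), 1000):
    scaler.partial_fit(X[start:start + 1000])

X_scaled = scaler.transform(X)
```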

If your modeling requires incremental learning, try the partial_fit method from Scikit-Learn.

7. Accessing Experimental Features

Not every class and function in Scikit-Learn has been released as stable. Some are still experimental, and we must enable them before using them.

To enable these features, we need to check which ones are still experimental and import the corresponding enabler from sklearn.experimental.

Let’s see an example code below.
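
A minimal sketch with IterativeImputer; the tiny array with missing values is only a placeholder:

```python
import numpy as np

# The enabler import must come before importing IterativeImputer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0], [3.0, np.nan], [np.nan, 6.0], [8.0, 4.0]])

imputer = IterativeImputer(random_state=42)
X_imputed = imputer.fit_transform(X)
```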

At the time this article was written, the IterativeImputer class is still in the experimental phase, so we need to import its enabler at the beginning, before we use the class.

Another feature that is still in the experimental phase is the halving search methodology.
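
A minimal sketch with HalvingGridSearchCV; the synthetic dataset, the Random Forest, and the parameter grid are only placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# The enabler import must come before importing the halving search classes
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5, 10]}
search = HalvingGridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```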

If you find a useful feature in Scikit-Learn but are unable to access it, it might still be in the experimental phase, so try importing its enabler first.

Conclusion

Scikit-Learn is a popular library that is used in many machine learning implementations. There are so many features in the library that there are undoubtedly many you are unaware of. To review, the seven secrets we covered in this article were:

  1. Probability Calibration
  2. Feature Union
  3. Feature Agglomeration
  4. Predefined Split
  5. Warm Start
  6. Incremental Learning
  7. Accessing Experimental Features

I hope this has helped!
