Feature selection is slow because it requires training many models. Find out how to make it blazingly fast thanks to “approximate-predictions”.
When developing a machine learning model, we usually start with a large set of features resulting from our feature engineering efforts.
Feature selection is the process of choosing a smaller subset of features that are optimal for our ML model.
Why do that instead of just keeping all the features?
- Memory. Big data takes up big space. Dropping features means you need less memory to handle your data. Sometimes there are also external constraints.
- Time. Retraining a model on less data can save you a lot of time.
- Accuracy. Less is more: this also goes for machine learning. Including redundant or irrelevant features means including unnecessary noise. It frequently happens that a model trained on less data performs better.
- Explainability. A smaller model is more easily explainable.
- Debugging. A smaller model is easier to maintain and troubleshoot.
Now, the main problem with feature selection is that it is very slow because it requires training many models.
In this article, we will see a trick that makes feature selection extremely fast thanks to “approximate-predictions”.
Let’s try to visualize the problem of feature selection. We start with N features, where N is typically hundreds or thousands.
Thus, the output of feature selection can be seen as an array of length N made of “yes”/“no” values, where each element tells us whether the corresponding feature is selected or not.
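To make this concrete, here is a minimal sketch of that representation, assuming a pandas DataFrame with a handful of made-up feature names (the data and column names are purely illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical feature matrix with N = 5 features (toy data for illustration).
X = pd.DataFrame(np.random.rand(100, 5), columns=["f1", "f2", "f3", "f4", "f5"])

# A "candidate" is simply a boolean array of length N:
# True means "keep this feature", False means "drop it".
candidate = np.array([True, False, True, True, False])

# Selecting the corresponding subset of columns.
X_subset = X.loc[:, candidate]
print(X_subset.columns.tolist())  # ['f1', 'f3', 'f4']
```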
The process of feature selection consists of trying different “candidates” and finally picking the best one (according to our performance metric).
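The naive way to do this is to train and evaluate one model per candidate, which is exactly what makes feature selection so slow. The following sketch shows that brute-force loop, assuming scikit-learn, a toy classification dataset, and randomly generated candidate masks (all of these are assumptions made only for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy data standing in for the real dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(10)])

# A few hypothetical candidates, each a boolean mask over the 10 features.
rng = np.random.default_rng(0)
candidates = []
for _ in range(20):
    mask = rng.random(10) < 0.5
    mask[rng.integers(10)] = True  # make sure at least one feature is kept
    candidates.append(mask)

best_score, best_candidate = -np.inf, None
for candidate in candidates:
    # This is the expensive part: one full cross-validated training per candidate.
    score = cross_val_score(
        RandomForestClassifier(random_state=0), X.loc[:, candidate], y, cv=5
    ).mean()
    if score > best_score:
        best_score, best_candidate = score, candidate

print("Best candidate:", X.columns[best_candidate].tolist())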