Optimal Feature Selection

Automatic selection of optimal features from the noise

Linear models struggle with high-dimensional noisy data

Linear models such as linear and logistic regression are among the most well-studied and best-understood predictive models. However, they are not well-suited to large numbers of features, and it is hard to know which subset of features gives the most significant and predictive model.

Finding the perfect subset is unfortunately an NP-hard problem, making exhaustive search computationally prohibitive for problems of practical size. Heuristic approaches exist, such as forward and backward stepwise variable selection, as well as regularization methods like the Lasso and elastic net. However, experiments with both synthetic and real data show that these approximate methods fail to find the exact solution, and as a result they often select many extra, false features.
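To see this effect concretely, here is a minimal sketch (using scikit-learn, not the Optimal Feature Selection product) that fits a cross-validated Lasso on synthetic data where only a handful of features carry signal, then counts how many false features it selects. All dimensions and names are illustrative assumptions:

```python
# Illustrative sketch: the Lasso tends to select extra, false features.
# Uses numpy and scikit-learn; dimensions are illustrative choices.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, k = 500, 2000, 10           # samples, total features, truly relevant features
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 1.0                    # only the first k features carry signal
y = X @ beta + 0.5 * rng.standard_normal(n)

lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
true_pos = np.sum(selected < k)
false_pos = selected.size - true_pos
print(f"Lasso selected {selected.size} features: "
      f"{true_pos} true, {false_pos} false")
```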

True variable selection with modern optimization

Harnessing the power of modern optimization, Optimal Feature Selection is the first known method that offers an exact yet computationally practical way to find the optimal set of features.
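To make precise what "exact" means here, the toy sketch below solves the best-subset problem by brute force: among all subsets of at most k features, it returns the one with the lowest squared error. Such enumeration is only feasible for tiny problems; Optimal Feature Selection solves the same problem at practical scale using modern optimization, which this illustrative code does not attempt to reproduce:

```python
# Toy sketch of the exact best-subset problem: among all subsets of at most
# k features, pick the one minimizing squared error. Brute force only works
# for tiny p; it is shown purely to define the problem being solved.
import itertools
import numpy as np

def best_subset(X, y, k):
    n, p = X.shape
    best_err, best_support = np.inf, ()
    for size in range(1, k + 1):
        for support in itertools.combinations(range(p), size):
            Xs = X[:, support]
            coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            err = np.sum((y - Xs @ coef) ** 2)
            if err < best_err:
                best_err, best_support = err, support
    return best_support

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 12))
y = X[:, 0] - 2 * X[:, 3] + 0.1 * rng.standard_normal(100)
print(best_subset(X, y, k=2))   # expect (0, 3)
```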

Compared to the Lasso, Optimal Feature Selection reduces the number of falsely selected features by half while maintaining the same true positive rate. This means it is more efficient at choosing the right variables, and the resulting model is simpler, more interpretable, and more accurate.

Optimal Feature Selection achieves the highest accuracy and a lower false alarm rate than the Lasso in an experiment with 2,000 features, of which only 10 are relevant.

High interpretability and accuracy with a fraction of selected features

We have applied Optimal Feature Selection to a variety of cases. In an automotive testing setting, Optimal Feature Selection reaches the best level of performance with 8 features, whereas the elastic net (the best-in-class benchmark method) needs close to 80 features to reach the same level of performance.

As the goal of this application was not only to predict accurately but also to learn the physical mechanism that drives the outcome, the simpler model stands out even more: it offers a practically small set of the best variables for the engineers to investigate further and act upon to improve outcomes.

Comparison of performance against the number of features selected, for Optimal Feature Selection and the elastic net.
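The sketch below shows the general shape of such a sweep (it does not reproduce the original experiment or data): fit an elastic net over a range of regularization strengths and record how test performance varies with the number of features selected. All names and dimensions here are illustrative assumptions:

```python
# Hedged sketch of a "performance vs. number of selected features" sweep
# using scikit-learn's elastic net; synthetic data, illustrative settings.
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.standard_normal((400, 100))
y = X[:, :5] @ np.ones(5) + 0.5 * rng.standard_normal(400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for alpha in [1.0, 0.3, 0.1, 0.03, 0.01]:
    model = ElasticNet(alpha=alpha, l1_ratio=0.5).fit(X_tr, y_tr)
    n_selected = np.count_nonzero(model.coef_)
    print(f"alpha={alpha:5.2f}: {n_selected:3d} features, "
          f"test R^2 = {model.score(X_te, y_te):.3f}")
```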

Extremely fast and scalable

Optimal Feature Selection is offered in two flavors: an exact method and fast heuristic solutions. Both scale well to large datasets, thanks to the progress made in modern optimization over the past decades.

In particular, for industry-scale datasets with millions of observations and thousands of features, the heuristic option can find the optimal features in a matter of seconds. In addition, experiments show that in the majority of cases studied, the heuristic options in fact find the exact solution, so no tradeoff is needed when the data scale is extremely large.
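For intuition on why such heuristics scale, here is a simplified sketch of iterative hard thresholding, one common heuristic family for sparse regression: each iteration costs only a gradient step plus a sort, so large n and p remain cheap. This is an assumption-laden illustration, not the product's actual algorithm:

```python
# Simplified iterative hard-thresholding (IHT) sketch, shown only to
# illustrate why sparse-regression heuristics scale; not the product's
# actual method. All settings are illustrative.
import numpy as np

def iht(X, y, k, steps=200):
    n, p = X.shape
    beta = np.zeros(p)
    # Step size from the largest singular value, so gradient steps are stable.
    L = np.linalg.norm(X, 2) ** 2
    for _ in range(steps):
        grad = X.T @ (X @ beta - y)       # gradient of 0.5 * ||X beta - y||^2
        beta = beta - grad / L
        # Keep only the k largest-magnitude coefficients (the hard threshold).
        support = np.argsort(np.abs(beta))[-k:]
        mask = np.zeros(p, dtype=bool)
        mask[support] = True
        beta[~mask] = 0.0
    return np.flatnonzero(beta)

rng = np.random.default_rng(2)
X = rng.standard_normal((1000, 200))
y = 3 * X[:, 5] - 2 * X[:, 17] + X[:, 42] + 0.1 * rng.standard_normal(1000)
print(sorted(iht(X, y, k=3)))   # expect [5, 17, 42]
```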

Computational time comparisons. Both exact and approximate methods remain tractable as the number of samples increases.

Want to try Optimal Feature Selection?
We provide free academic licenses and evaluation licenses for commercial use.
We also offer consulting services to develop interpretable solutions to your key problems.
