Transparent and automated predictive and prescriptive machine learning algorithms from cutting-edge research that empower you to push the boundary of data science
Optimal Decision Trees delivers state-of-the-art predictive performance on par with black-box methods.
It produces a single tree that users can follow and validate, enabling trust in the decision-making process.
It has been applied in 20+ research and industry projects in the past year alone, covering a wide range of fields including health care, banking, and insurance.
A tree output from a cancer mortality prediction study.
In a large computational study averaging results across 66 real-world datasets, Optimal Decision Trees outperforms CART and is comparable to random forests and XGBoost.
With the power of modern optimization, Optimal Decision Trees builds the entire tree holistically, rather than split-by-split like existing methods, closing the performance gap with black-box methods.
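To make the difference between split-by-split and holistic construction concrete, here is a minimal sketch on a made-up dataset. It is not the Optimal Decision Trees algorithm: the brute-force search over root splits is only a stand-in for the optimization-based approach, and the data, function names, and the use of misclassification count as the split criterion are all assumptions chosen for illustration.

```python
# Toy data (hypothetical): the label is x1 XOR x2, and x3 is a noisy copy of the label.
X = [(0, 0, 0), (0, 0, 0), (0, 1, 1), (0, 1, 1),
     (1, 0, 1), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
y = [0, 0, 1, 1, 1, 1, 0, 0]
N_FEATURES = 3

def leaf_errors(labels):
    """Misclassifications when a leaf predicts its majority class."""
    ones = sum(labels)
    return min(ones, len(labels) - ones)

def split(rows, f):
    """Partition row indices by the value of binary feature f."""
    left = [i for i in rows if X[i][f] == 0]
    right = [i for i in rows if X[i][f] == 1]
    return left, right

def stump_errors(rows, f):
    """Errors of a one-split tree on feature f over the given rows."""
    left, right = split(rows, f)
    return leaf_errors([y[i] for i in left]) + leaf_errors([y[i] for i in right])

def best_stump(rows):
    """Fewest errors achievable with a single split on these rows."""
    return min(stump_errors(rows, f) for f in range(N_FEATURES))

def greedy_depth2():
    """Pick the root that looks best on its own, then split each child."""
    rows = list(range(len(X)))
    root = min(range(N_FEATURES), key=lambda f: stump_errors(rows, f))
    left, right = split(rows, root)
    return root, best_stump(left) + best_stump(right)

def exhaustive_depth2():
    """Evaluate every root together with its best children, keep the best whole tree."""
    rows = list(range(len(X)))
    best = None
    for root in range(N_FEATURES):
        left, right = split(rows, root)
        errors = best_stump(left) + best_stump(right)
        if best is None or errors < best[1]:
            best = (root, errors)
    return best

print(greedy_depth2())      # (2, 2): root on x3, 2 training errors
print(exhaustive_depth2())  # (0, 0): root on x1, 0 training errors
```

The greedy builder commits to x3 because it is the only feature that helps in isolation, and it cannot recover afterwards. Scoring the tree as a whole finds the x1-then-x2 structure that fits this toy data exactly.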
Our approach enables unprecedented flexibility in model construction:
Regression models are widely used for their simplicity, but are built manually over many iterations of trial-and-error to accommodate multiple objectives.
We offer a fast, scalable algorithmic approach that finds the optimal regression model in a single step while incorporating all relevant considerations.
Automated Regression leverages superior performance from cutting-edge research in Optimal Feature Selection.
This method improves upon Lasso, a popular state-of-the-art variable selection algorithm, with higher accuracy and a lower false alarm rate. More exact variable selection yields better performance and higher interpretability.
Optimal Imputation as a preprocessing step delivers consistent performance gains in prediction tasks:
It serves as the basis for a novel data quality assessment tool that identifies outliers and automates the cleaning and validation process.
Compared against benchmark methods across 84 datasets, Optimal Imputation achieves the best imputation accuracy in the majority of datasets under all scenarios, with a significant reduction of 10-15% in imputation errors.
Traditional imputation approaches compromise data quality, introducing bias and limiting predictive power.
Optimal Imputation uses global optimization to find the best imputed values by exploiting the relationships across features.
Example data table (columns: Age, Gender, Years of Employment).
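As a rough illustration of why relationships across features matter for imputation, the sketch below compares mean imputation with a simple nearest-neighbor fill. This is not the Optimal Imputation method, which the text describes as using global optimization. The nearest-neighbor rule is a much simpler stand-in, and the records, function names, and choice of k are hypothetical.

```python
# Hypothetical records: (age, years_of_employment); None marks a missing value.
rows = [
    (25, 2),
    (30, 7),
    (45, 20),
    (50, 27),
    (52, None),
]

def mean_impute(rows, col):
    """Fill with the column mean, ignoring every other feature."""
    observed = [r[col] for r in rows if r[col] is not None]
    return sum(observed) / len(observed)

def knn_impute(rows, target, col, k=2):
    """Average `col` over the k complete rows closest to `target` on the other columns."""
    others = [c for c in range(len(target)) if c != col]
    complete = [r for r in rows if all(v is not None for v in r)]
    complete.sort(key=lambda r: sum((r[c] - target[c]) ** 2 for c in others))
    nearest = complete[:k]
    return sum(r[col] for r in nearest) / len(nearest)

target = rows[-1]
print(mean_impute(rows, 1))         # 14.0
print(knn_impute(rows, target, 1))  # 23.5
```

For a 52-year-old, the column mean of 14 years ignores the strong link between age and tenure in this toy data, while the neighbor-based fill of 23.5 years uses it.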
Typical data-to-decisions analytics feed point predictions from machine learning into optimization. However, this does not capture the uncertainty in those predictions, and prediction errors can be magnified by the downstream optimization.
Our Optimal Data-Driven Prescription combines machine learning and optimization by using data directly in optimization to incorporate estimates and uncertainties. In many case studies, this leads to significantly better outcomes.
Optimal Prescriptions (red) outperforms both the predict-then-prescribe approach (purple) and the baseline (gray), and is closest to the oracle (black dotted) in an inventory management case.
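The gap between predict-then-prescribe and using data directly in the optimization can be sketched with a toy single-period inventory problem. This is not the Optimal Data-Driven Prescription method itself: it uses plain sample average approximation over historical demand, and the demand values, cost parameters, and names are made up for illustration.

```python
# Hypothetical historical demand observations (made up for illustration).
demand = [20, 22, 25, 30, 35, 40, 60, 90]

UNDERAGE = 4.0  # assumed cost per unit of unmet demand
OVERAGE = 1.0   # assumed cost per unit of leftover stock

def cost(order, d):
    """Cost of ordering `order` units when realized demand is `d`."""
    return UNDERAGE * max(d - order, 0) + OVERAGE * max(order - d, 0)

def average_cost(order, samples):
    """Average cost of an order quantity over the demand samples."""
    return sum(cost(order, d) for d in samples) / len(samples)

# Predict-then-prescribe: order the point prediction (here, the sample mean).
point_prediction = sum(demand) / len(demand)

# Data-driven: pick the order that minimizes average cost over the samples,
# so the asymmetric costs and the spread of demand both shape the decision.
candidates = range(min(demand), max(demand) + 1)
saa_order = min(candidates, key=lambda q: average_cost(q, demand))

print(point_prediction, average_cost(point_prediction, demand))  # 40.25 43.4375
print(saa_order, average_cost(saa_order, demand))                # 60 38.5
```

Because running short costs four times as much as overstocking in this setup, the data-driven order sits well above the mean forecast. The comparison is made on the same samples used to choose the order, so it shows the mechanism rather than out-of-sample performance.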