Improving Malware Detection in Cybersecurity

Opening up the black-box to diagnose problems and foster collaboration

An AI revolution in cybersecurity?

Malware detection has a long tradition of heuristics and blacklist-based protection. These approaches are limited to defending against previously-seen threats, and cannot protect against novel attacks.

Machine learning has enabled a new paradigm of real-time malware detection, in which predictive models assess running programs for abnormal behavior. This approach has the potential to block never-before-seen malware.

Many cybersecurity vendors have been eager to deploy these AI-based protection algorithms without fully understanding how they behave, and in particular how they can be exploited.

Perils of black-box modeling

Our client had developed a production pipeline built on black-box AI models that detected malware with strong overall performance.

However, this performance was highly variable: strong on average, but on some days the ability to protect against the newest threats dropped significantly.

Neither the data scientists nor the security researchers could understand the black-box models or, more importantly, identify where they were failing.

Diagnosis using interpretable models

Optimal Classification Trees provided a transparent and interpretable view of the data and the prediction process, addressing the key issues:
  • Auditable for security flaws

    The model can be manually audited by security researchers to ensure there are no exploitable weak points

  • Identification of problem areas

    Examining the leaves of the tree where performance is weakest shows where the model is most uncertain (see the sketch after the figure caption below)

Figure: Example decision tree predicting attack probability, with the paths in red highlighting problem areas where the model is most uncertain
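To make this diagnostic step concrete, here is a minimal sketch of finding the weak leaves of a tree. It uses scikit-learn's DecisionTreeClassifier as a generic stand-in for an Optimal Classification Tree, and the feature matrix, labels, and tree depth are hypothetical placeholders rather than the client's actual pipeline.

```python
# Minimal sketch: fit an interpretable tree on behavioral telemetry and flag
# "weak" leaves where validation performance is poor. DecisionTreeClassifier
# stands in for an Optimal Classification Tree; the data here is synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((5000, 20))        # placeholder behavioral features
y = rng.integers(0, 2, 5000)      # placeholder labels: 1 = malware, 0 = benign

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A shallow tree keeps every decision path short enough to audit by hand.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)

# Per-leaf validation accuracy: the worst-performing leaves are the
# "problem areas" where the model is most uncertain.
leaf_ids = tree.apply(X_val)
preds = tree.predict(X_val)
for leaf in np.unique(leaf_ids):
    mask = leaf_ids == leaf
    acc = (preds[mask] == y_val[mask]).mean()
    print(f"leaf {leaf}: {mask.sum():4d} samples, accuracy {acc:.2f}")
```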

Unlocking a culture of collaboration

Having an interpretable model in hand enabled the security researchers and data scientists to collaborate closely on improving the pipeline.

Security researchers were able to analyze the samples falling into the weak areas of the tree and derive new features that helped the model better detect malware.
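Continuing the sketch above, one way to support that workflow is to pull out the validation samples routed to the weakest leaf so researchers can inspect them; the leaf-selection logic and variable names below are illustrative assumptions, not the client's actual tooling.

```python
# Continuation of the previous sketch: collect the samples that fall into the
# weakest leaf so security researchers can inspect them and propose new
# behavioral features that separate malware from benign programs there.
import numpy as np

leaf_ids = tree.apply(X_val)
preds = tree.predict(X_val)

# Validation accuracy per leaf, then pick the worst one.
leaf_acc = {
    leaf: (preds[leaf_ids == leaf] == y_val[leaf_ids == leaf]).mean()
    for leaf in np.unique(leaf_ids)
}
weakest = min(leaf_acc, key=leaf_acc.get)

# Hand these samples to the researchers; any feature they derive is appended
# as a new column and the tree is refit on the augmented data.
hard_samples = X_val[leaf_ids == weakest]
print(f"{len(hard_samples)} samples in weakest leaf "
      f"(accuracy {leaf_acc[weakest]:.2f})")
```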

The security researchers also became confident in the logic of the model and placed more trust in its predictions.

These improvements helped stabilize model performance and ensure consistently strong protection.

Unique Advantage

Why is the Interpretable AI solution unique?

  • Enables collaboration with domain experts

    The interpretability of the model allows security researchers to collaborate with data scientists to improve prediction quality

  • Auditable, robust and trusted

    Security researchers can easily validate the logic of the model, minimizing the risk of exploitable flaws in the protection engine

Want to try Interpretable AI software?
We provide free academic licenses and evaluation licenses for commercial use.
We also offer consulting services to develop interpretable solutions to your key problems.
