Case - Hard Drive Failure - Interpretable AI

Monitoring Hard Drives in a Data Center

Understanding how and why hard drives fail by monitoring their behavior

Correlation analyses are too simplistic

Every quarter, Backblaze, a large data storage provider, publishes data on the 130,000 hard drives in its data center. Each hard drive's operation is monitored daily and summarized in several SMART metrics. Backblaze also records when each hard drive fails.

To optimize the data center's operation, workers need to know in advance which hard drives are likely to fail. Backblaze has identified five SMART metrics that indicate impending failure, based on univariate correlation analyses between individual SMART metrics and failure rates, as well as on its workers' experience.
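As a rough illustration of what a univariate correlation analysis looks like, the sketch below computes the Pearson correlation between each metric and a failure flag, one metric at a time. The column names `metric_a` and `metric_b`, the helper `univariate_correlations`, and the eight rows of data are all made up for this example; they are not Backblaze's metrics or data. The toy data is built so that a drive fails only when both metrics are raised at once, an interaction that a one-metric-at-a-time analysis can only partly capture.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def univariate_correlations(rows, metrics, target="failed"):
    """Correlate each metric with the target independently of the others."""
    ys = [row[target] for row in rows]
    return {m: pearson([row[m] for row in rows], ys) for m in metrics}


# Hypothetical toy data, one row per drive (1 = metric raised / drive failed).
# By construction, a drive fails only when BOTH metrics are raised.
drives = [
    {"metric_a": 0, "metric_b": 0, "failed": 0},
    {"metric_a": 0, "metric_b": 1, "failed": 0},
    {"metric_a": 1, "metric_b": 0, "failed": 0},
    {"metric_a": 1, "metric_b": 1, "failed": 1},
    {"metric_a": 0, "metric_b": 0, "failed": 0},
    {"metric_a": 0, "metric_b": 1, "failed": 0},
    {"metric_a": 1, "metric_b": 0, "failed": 0},
    {"metric_a": 1, "metric_b": 1, "failed": 1},
]

correlations = univariate_correlations(drives, ["metric_a", "metric_b"])
for name, value in correlations.items():
    print(f"{name}: r = {value:.3f}")
```

In this toy data each metric alone shows only a moderate correlation with failure, even though the combination of the two determines failure exactly.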

Although interesting, these findings are overly simplistic and not immediately actionable. What insights can interpretable machine learning models provide?

Understanding the overall health of a hard drive

Going beyond simple correlation analyses, Optimal Survival Trees display paths to failure, showing how failures have historically been more likely under certain conditions (represented by SMART metric values). In a single augmented decision tree, we can observe the overall behavior of hard drives throughout their life cycle.
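To make the idea of a survival model more concrete, the sketch below computes a Kaplan-Meier survival curve for two groups of drives. This is not the Optimal Survival Trees algorithm; it only illustrates the kind of summary one might attach to a group of drives that share a set of conditions. The function name `kaplan_meier`, the two groups, and all durations are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival curve.

    durations: number of days each drive was observed.
    observed:  True if the drive failed at the end of its duration,
               False if it was still running (censored).
    Returns a list of (day, survival_probability) pairs, one per failure day.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")
    failure_days = sorted({d for d, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for day in failure_days:
        # Drives still under observation just before this day.
        at_risk = sum(1 for d in durations if d >= day)
        failures = sum(1 for d, e in zip(durations, observed) if e and d == day)
        survival *= 1.0 - failures / at_risk
        curve.append((day, survival))
    return curve


# Hypothetical groups of drives, e.g. split on whether some metric is raised.
degraded_curve = kaplan_meier(
    durations=[100, 200, 200, 300, 400],
    observed=[True, False, True, True, False],
)
healthy_curve = kaplan_meier(
    durations=[300, 400, 400, 500],
    observed=[False, True, False, False],
)

for label, curve in (("degraded", degraded_curve), ("healthy", healthy_curve)):
    print(label, [(day, round(p, 3)) for day, p in curve])
```

Comparing the two curves shows how one group's estimated survival probability drops earlier and further than the other's, which is the kind of contrast a path to failure is meant to expose.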

Read the paper

Predicting short-term failures

If we are monitoring hard drives on a shorter time scale, for instance to schedule maintenance activities, Optimal Classification Trees can predict failures within a fixed time window.

The final model can be easily visualized and understood by non-technical people, without sacrificing performance.
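Predicting failure within a fixed time window is a binary classification task, so each daily record needs a label saying whether the drive fails within the next few days. The sketch below shows one possible way to build such labels before training any classifier. The record layout, the helper `label_fails_within`, the serial numbers, and the 7-day window are assumptions made for illustration and are not taken from the Backblaze analysis.

```python
from datetime import date, timedelta


def label_fails_within(records, failure_dates, window_days):
    """Label each daily record with 1 if its drive fails within the window.

    records:       list of (serial, day, metrics) tuples, one per drive per day.
    failure_dates: dict mapping serial -> date of failure (failed drives only).
    window_days:   length of the prediction window, in days.
    Returns a list of (serial, day, metrics, label) tuples.
    """
    window = timedelta(days=window_days)
    labeled = []
    for serial, day, metrics in records:
        failed_on = failure_dates.get(serial)
        fails_soon = (
            failed_on is not None
            and timedelta(0) <= failed_on - day <= window
        )
        labeled.append((serial, day, metrics, int(fails_soon)))
    return labeled


# Hypothetical daily records: drive "A" fails on 30 Jan, drive "B" never fails.
records = [
    ("A", date(2020, 1, 1), {"metric_a": 0}),
    ("A", date(2020, 1, 25), {"metric_a": 3}),
    ("B", date(2020, 1, 1), {"metric_a": 0}),
]
failure_dates = {"A": date(2020, 1, 30)}

labeled = label_fails_within(records, failure_dates, window_days=7)
for serial, day, metrics, label in labeled:
    print(serial, day.isoformat(), metrics, label)
```

One caveat with this simple scheme: records taken less than one window before the end of the observation period get label 0 even though the outcome is not yet known, so in practice they would probably need to be dropped or handled separately.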

Our work was featured in a recent blog post by Backblaze, and we presented the analysis at a webinar hosted by Backblaze.

Read the blog post

Unique Advantage

Why is the Interpretable AI solution unique?

Detecting interpretable paths to failure

Optimal Trees can automatically display paths to failure, as well as healthy behaviors, featuring correlations between several SMART metrics simultaneously.

Specialized models for specific tasks

Depending on the question we are trying to answer, e.g. overall health monitoring or predicting failure within a given fixed time window, we can choose between Optimal Survival Trees and Optimal Classification Trees.

Adaptable to low data scenarios

If easily accessible data is scarce and comes from a short time frame, useful insights can still be obtained using Interpretable AI’s software modules.

Want to try Interpretable AI software?
We provide free academic licenses and evaluation licenses for commercial use.
We also offer consulting services to develop interpretable solutions to your key problems.
