Predicting House Sale Prices

Identify key location and quality drivers of house values using Regression Trees with linear predictions

Accurate price prediction is crucial for all players in real estate

In real estate, accurate prediction of house sale prices matters to buyers and sellers alike. For that reason, many Automated Valuation Models (AVMs) are on the market, such as Zillow's Zestimate®. Their accuracy, however, is debated, and because the predictions often come from black-box models, they offer little help for decision making beyond a single price estimate.

We aimed to understand how factors such as location, condition, and property size interact to drive high-value houses, using historical transaction data from King County, Washington. We started by using Optimal Regression Trees to predict sale prices. The resulting tree achieved an R-squared value of 0.69, a reasonable level of predictive performance. The tree shows that the size of the house (in terms of living space), the location, and quality-related factors such as view and grade all affect the final sale price.

Optimal Regression Tree predicting house sale price

Optimal Regression Trees with Linear Predictions to capture linear effects

In particular, we observe that the tree splits on the size of the house many times. This suggests that size is important, and intuition tells us that there is likely a strong linear relationship between the size of the house and the sale price.

On the other hand, a simple linear regression model would assign the same marginal value to additional living space for every house. This seems unrealistic and inaccurate, as we might expect more desirable houses to have a higher marginal value of additional living space. Ideally, a better model would separate the houses based on the various characteristics into groups of similar properties, and then apply a different marginal value for additional living space in each group.

In the past, trees of this kind, with a separate linear model in each leaf, were impractical to fit on even moderately-sized datasets. However, it is straightforward to extend the global optimization model that powers Optimal Trees to incorporate linear predictions. This means that Optimal Trees are unique in offering a practical way to train these trees with linear predictions at scale, allowing us to revisit the problem with more modeling power.
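The idea of giving each group of houses its own marginal value for living space can be sketched in plain Python. This is only an illustration under loud assumptions: the data is synthetic and noise-free (not the King County transactions), the single split on a `waterfront` flag is fixed by hand rather than learned, and the helper names `make_houses`, `fit_line`, and `r_squared` are hypothetical. It does not use or reproduce the Optimal Trees software, which learns the splits and the leaf models together.

```python
# Minimal sketch: one global regression line versus one line per segment.
# All numbers are made up for illustration only.

def make_houses():
    """Build synthetic houses whose price per square foot depends on waterfront."""
    houses = []
    for sqft in range(1000, 4001, 500):
        houses.append({"sqft": sqft, "waterfront": 0, "price": 100_000 + 200 * sqft})
        houses.append({"sqft": sqft, "waterfront": 1, "price": 300_000 + 500 * sqft})
    return houses

def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

houses = make_houses()
xs = [h["sqft"] for h in houses]
ys = [h["price"] for h in houses]

# One line for every house: a single marginal value per square foot.
global_slope, global_intercept = fit_line(xs, ys)
global_r2 = r_squared(ys, [global_intercept + global_slope * x for x in xs])

# One line per segment: the hand-picked split stands in for a tree leaf.
leaf_models = {}
for flag in (0, 1):
    group = [h for h in houses if h["waterfront"] == flag]
    leaf_models[flag] = fit_line([h["sqft"] for h in group],
                                 [h["price"] for h in group])

leaf_preds = []
for h in houses:
    slope, intercept = leaf_models[h["waterfront"]]
    leaf_preds.append(intercept + slope * h["sqft"])
leaf_r2 = r_squared(ys, leaf_preds)

print("global slope:", round(global_slope, 2), "R^2:", round(global_r2, 3))
print("leaf slopes:", {k: round(v[0], 2) for k, v in leaf_models.items()},
      "R^2:", round(leaf_r2, 3))
```

In this toy setup the single global line has to settle on one compromise slope for all houses, while the segmented model recovers a different slope for each group and fits the data more closely.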

More meaningful interpretations and actionable insights

We fit an Optimal Regression Tree with linear predictions on the same data, resulting in a tree with fewer splits and higher predictive power (R-squared of 0.76, up from 0.69). This tree suggests that, depending on the location, view, and waterfront setting, there is a different coefficient for how the size of the house affects the predicted sale price.
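One way to write down a prediction of this form is shown here as a sketch. The notation is an assumption introduced for illustration and does not come from the fitted model: $\ell(x)$ is the leaf that house $x$ falls into, $s(x)$ is its living space, and $\alpha_\ell$, $\beta_\ell$ are the intercept and size coefficient of leaf $\ell$. A real leaf model may include more features than size alone.

```latex
\hat{y}(x) = \alpha_{\ell(x)} + \beta_{\ell(x)} \, s(x),
\qquad
R^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}(x_i)\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}
```

A standard regression tree is the special case $\beta_\ell = 0$ in every leaf, and a single linear regression is the special case of a tree with one leaf.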

This data-driven segmentation of houses into groups with different linear regression models is intuitive for real-estate experts to understand. As a result, it makes it easier for buyers and sellers alike to use these tools as part of their pricing, marketing, and renovation decisions.

Optimal Regression Tree with linear predictions in the leaves

Unique Advantage

Why is the Interpretable AI solution unique?

  • More modeling flexibility

    The ability to incorporate linear predictions in Optimal Trees at scale unlocks unprecedented modeling power for decision trees.

  • Meaningful model interpretations

    The results are more intuitive and more actionable for both buyers and sellers.

Want to try Interpretable AI software?
We provide free academic licenses and evaluation licenses for commercial use.
We also offer consulting services to develop interpretable solutions to your key problems.

© 2020 Interpretable AI, LLC. All rights reserved.