Predicting House Sale Prices
Identify key location and quality drivers of house values using Regression Trees with linear predictions

Accurate price prediction is crucial for all players in real estate
In real estate, accurate prediction of house sale prices is important for both buyers and sellers. For that reason, many Automated Valuation Models (AVMs) exist on the market, such as Zillow's Zestimate®. However, their reliability is debated, and even when the predictions are accurate, they are often outputs of black-box models and so offer little help for decision making beyond a single price estimate.
We aimed to understand how factors such as location, condition, and property size interact to drive house values, using historical transaction data from King County, Washington. We started by using Optimal Regression Trees to predict sale prices. The resulting tree achieved an R-squared value of 0.69, a reasonable predictive performance. The tree shows that the size of the house (in terms of living space), the location, and quality-related factors such as view and grade all affect the final sale price.
Optimal Regression Tree predicting house sale price
Optimal Regression Trees with Linear Predictions to capture linear effects
A regression tree with constant predictions assigns a single price to every house in a leaf, so it can only capture the effect of living space through repeated splits on that feature. At the other extreme, a simple linear regression model would assign the same marginal value to additional living space for every house. This seems unrealistic, as we might expect more desirable houses to have a higher marginal value of additional living space. Ideally, a better model would separate the houses into groups of similar properties based on their characteristics, and then apply a different marginal value for additional living space in each group.
In the past, such regression trees with linear predictions in the leaves were impractical to fit on even moderately sized datasets. However, it is straightforward to extend the global optimization model that powers Optimal Trees to incorporate linear predictions. This means that Optimal Trees are unique in offering a practical way to train these trees with linear predictions at scale, allowing us to revisit the problem with more modeling power.
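The contrast between one marginal value for all houses and a separate marginal value per group can be sketched in a few lines of plain Python. The data below is synthetic and invented for illustration (it is not the King County data), the grouping by a waterfront flag is fixed by hand rather than learned, and `fit_line` and `r_squared` are hypothetical helper names. This is not the Optimal Trees algorithm, only an illustration of why group-specific slopes can fit better than a single slope.

```python
# Synthetic, hypothetical data: (living_space_sqft, is_waterfront, price_usd).
# These numbers are invented for illustration only.
houses = [
    (1000, False, 300_000),
    (1500, False, 400_000),
    (2000, False, 500_000),
    (2500, False, 600_000),
    (1000, True, 600_000),
    (1500, True, 850_000),
    (2000, True, 1_100_000),
    (2500, True, 1_350_000),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(ys, preds):
    """Share of the variance in ys explained by preds."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [h[0] for h in houses]
ys = [h[2] for h in houses]

# Model A: one linear regression, same marginal value per square foot for all houses.
global_intercept, global_slope = fit_line(xs, ys)
global_preds = [global_intercept + global_slope * x for x in xs]

# Model B: one linear regression per group (grouping chosen by hand here).
group_models = {}
for flag in (False, True):
    gx = [h[0] for h in houses if h[1] == flag]
    gy = [h[2] for h in houses if h[1] == flag]
    group_models[flag] = fit_line(gx, gy)

segmented_preds = [
    group_models[h[1]][0] + group_models[h[1]][1] * h[0] for h in houses
]

global_r2 = r_squared(ys, global_preds)
segmented_r2 = r_squared(ys, segmented_preds)

print("single slope:", global_slope, "R-squared:", round(global_r2, 3))
for flag, (intercept, slope) in group_models.items():
    print("waterfront =", flag, "slope:", slope)
print("segmented R-squared:", round(segmented_r2, 3))
```

On this toy data the pooled slope lies between the two group-specific slopes, so it understates the marginal value of space for one group and overstates it for the other. A tree with linear predictions aims to find such groups from the data instead of having them specified in advance.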

More meaningful interpretations and actionable insights
We fit an Optimal Regression Tree with linear predictions on the same data, resulting in a tree with fewer splits and higher predictive power (R-squared of 0.76, up from 0.69). This tree suggests that, depending on the location, view, and waterfront setting, there is a different coefficient for how the size of the house impacts the predicted sale price.
This data-driven segmentation of houses into groups with different linear regression models is intuitive for real-estate experts to understand. As a result, it makes it easier for buyers and sellers alike to use these tools as part of their pricing, marketing, and renovation decisions.
Optimal Regression Tree with linear predictions in the leaves
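To make the structure of such a tree concrete, the following minimal sketch shows how a prediction is produced: a house is routed through the splits to a leaf, and the leaf's own linear model is then applied. The class names (`Leaf`, `Split`), feature names, thresholds, and coefficients are all hypothetical and chosen for illustration; they are not the fitted tree from this case study and not the Interpretable AI API.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """A leaf holding its own linear model (hypothetical structure)."""
    intercept: float
    coefficients: Dict[str, float]  # feature name -> marginal value

    def predict(self, house: Dict[str, float]) -> float:
        return self.intercept + sum(
            coef * house[name] for name, coef in self.coefficients.items()
        )


@dataclass
class Split:
    """An internal node: go to `lower` if house[feature] < threshold, else `upper`."""
    feature: str
    threshold: float
    lower: "Node"
    upper: "Node"


Node = Union[Leaf, Split]


def find_leaf(node: Node, house: Dict[str, float]) -> Leaf:
    """Follow the splits until a leaf is reached."""
    while isinstance(node, Split):
        node = node.lower if house[node.feature] < node.threshold else node.upper
    return node


def predict(node: Node, house: Dict[str, float]) -> float:
    """Route the house to its leaf, then apply that leaf's linear model."""
    return find_leaf(node, house).predict(house)


# Hypothetical tree: all thresholds and coefficients are invented for illustration.
tree = Split(
    feature="waterfront",
    threshold=0.5,
    lower=Split(
        feature="view",
        threshold=2.0,
        lower=Leaf(100_000.0, {"sqft_living": 200.0}),
        upper=Leaf(150_000.0, {"sqft_living": 300.0}),
    ),
    upper=Leaf(200_000.0, {"sqft_living": 500.0}),
)

inland = {"waterfront": 0.0, "view": 0.0, "sqft_living": 2000.0}
waterfront = {"waterfront": 1.0, "view": 4.0, "sqft_living": 2000.0}

for name, house in [("inland", inland), ("waterfront", waterfront)]:
    leaf = find_leaf(tree, house)
    print(name, "predicted price:", predict(tree, house),
          "marginal value per sqft:", leaf.coefficients["sqft_living"])
```

Because each leaf exposes its coefficients directly, the marginal value of additional living space for a given house can be read off the leaf it falls into, which is what makes this kind of model easy to inspect.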
Unique Advantage
Why is the Interpretable AI solution unique?
- More modeling flexibility
  The ability to incorporate linear predictions in Optimal Trees at scale unlocks unprecedented modeling power for decision trees.
- Meaningful model interpretations
  The results are more intuitive and more actionable for both buyers and sellers.