Machine-Learning Real Estate Valuation: Not Only a Data Affair

Mar 11, 2019

Real estate prices are observed only when properties change hands. High transaction costs and the large size of a typical transaction (compared with equity markets, for example) lead to infrequent price observations for the same asset. Between transactions, real estate professionals and investors need to rely on valuations – the most likely price that would be obtained in the market had the property been put up for sale. A valuation is a hypothetical value, not an actual registered price.

Valuations are relatively straightforward, though still involved, exercises when properties that are similar in terms of hedonic variables[i] (also called comparables) have transacted in the market close to the valuation date. In the absence of reliable comparable transactions, the value of a piece of real estate (be it residential or commercial) needs to be assessed using a valuation method. From transparent discounted cash-flow spreadsheets to sophisticated econometric models, any reliable valuation stands to benefit from accurate forecasts of expected cash-flows and discount rates. The buying or selling decision is further influenced by the perceived current state of the real estate cycle and by its projected direction.

Predicting rents requires a good understanding of demand and supply dynamics in the space market, construction and financing of construction, the evolution of the natural vacancy rate and possible migration flows of both firms and workers, among the more prominent determinants. Forecasting of discount factors involves a thorough understanding of monetary policy (as a determinant of short rates) and of the term-structure of risk premia. If the valuation is done in a portfolio setting, future correlations of the above variables may also be needed in order to assess the diversification benefits of selling or acquiring exposure to a certain real estate sector.

Given the sensitivity of valuations to changes in discount-rates, it’s worth spending a few paragraphs on the determination of short-rates and the risk-free component of the discount rate.

Going deeper down the monetary policy rabbit hole, the level of short-term rates is usually steered by a central bank using a variant of the Taylor rule. This holds for most inflation-targeting central banks around the world. The rule is a formula which links short-rates to output gaps (which measure how far observed output is from a sustainable non-inflationary level) and inflation gaps (which measure how far observed inflation is from an assumed target, usually set at 2%). Positive gaps lead to increases in the interest-rate with negative gaps leading to decreases in the rate.
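As a minimal sketch, the rule described above can be written in a few lines of Python. The coefficients of 0.5 on each gap and the 2% neutral real rate follow Taylor's original 1993 parametrisation and are assumptions here, not values attributed to any particular central bank. The function and parameter names are hypothetical.

```python
def taylor_rule_rate(inflation, output_gap, inflation_target=2.0,
                     neutral_real_rate=2.0, weight_inflation=0.5,
                     weight_output=0.5):
    """Short-term policy rate (in percent) implied by a Taylor-type rule.

    All inputs are in percent. The default coefficients are assumptions
    taken from Taylor's 1993 parametrisation.
    """
    inflation_gap = inflation - inflation_target
    return (neutral_real_rate + inflation
            + weight_inflation * inflation_gap
            + weight_output * output_gap)


# Inflation at target and a closed output gap: the rate sits at neutral.
at_target = taylor_rule_rate(inflation=2.0, output_gap=0.0)    # 4.0
# Positive gaps push the rate up.
overheating = taylor_rule_rate(inflation=3.0, output_gap=1.0)  # 6.0
# Negative gaps pull it down.
slack = taylor_rule_rate(inflation=1.0, output_gap=-2.0)       # 1.5
```

Feeding forecasts of the two gaps into such a function gives a rough path for future short rates, which is the use made of the rule in the rest of this article.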

Conditional on a myriad of other factors, changes in the short end of the curve propagate along the entire yield curve, though not always homogeneously. Extending the above with a model of the term-structure of interest-rates, we may arrive at the risk-free rate expected at the future transaction date. Adding risk-premia on top, which are also quite variable and cyclical, we finally have our discount rates at different future points in time.
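To illustrate how sensitive a valuation is to the discount-rate path, here is a small discounted cash-flow sketch. It assumes annual cash-flows, a discount rate for each year equal to the risk-free rate plus a risk premium, and an exit value received in the final year. All numbers and names are hypothetical.

```python
def present_value(cash_flows, exit_value, risk_free_path, risk_premium_path):
    """Discount yearly cash-flows and an exit value with year-specific rates.

    The discount rate in each year is assumed to be the risk-free rate plus
    the risk premium for that year (both expressed as decimals).
    """
    if not (len(cash_flows) == len(risk_free_path) == len(risk_premium_path)):
        raise ValueError("all inputs must cover the same number of years")
    discount_factor = 1.0
    value = 0.0
    for cash_flow, risk_free, premium in zip(cash_flows, risk_free_path,
                                             risk_premium_path):
        # Compound the discount factor year by year.
        discount_factor /= 1.0 + risk_free + premium
        value += cash_flow * discount_factor
    # The exit value is received together with the last cash-flow.
    return value + exit_value * discount_factor


rents = [100.0] * 5                  # hypothetical net rents
premia = [0.03] * 5                  # hypothetical risk premia
base_value = present_value(rents, 2000.0, [0.02] * 5, premia)
# Same asset, risk-free path shifted up by one percentage point.
stressed_value = present_value(rents, 2000.0, [0.03] * 5, premia)
value_change = stressed_value / base_value - 1.0   # negative
```

Because the exit value is discounted over the whole holding period, even a modest upward shift in the rate path lowers the valuation noticeably.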

This brief slalom points to the data-hungry but also modelling-intensive nature of real estate valuation. The longer the required holding period for an asset, the further out one needs to forecast the above-mentioned variables. The further out we look, the murkier the estimates and thus the less reliable the valuation.

Real estate prices are notorious for their cyclical behavior across long periods of time, in both developed and developing markets. The same holds for credit. The interaction between GDP, credit and real estate represents a valuable source of information about the future direction of the property cycle. Just as a brief motivator, recall that the most recent US housing boom and bust featured lax lending standards and high loan-to-value (LTV) ratios among the important boom triggers, and the sudden, economy-wide credit stop as the bust galvanizer.

Cycle duration and amplitude vary across time and differ across countries. Changes in the structural characteristics of both real estate markets (such as local construction and zoning legislation) and credit markets (such as deregulation and changes in regulatory risk limits) conspire to alter the statistical properties of property cycles.

How and when do these changes show up in the aggregate real estate price data? Do these changes impact forecasting? How should the real estate valuation exercise reflect these changes? These questions may be tackled within a flexible modelling strategy in which the relevance of lead-lag structures and the choice of variables are best left to the diligent work of machine learning algorithms.
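A toy version of this idea is sketched below: every candidate variable and lag is scored by the hold-out error of a one-variable regression, and the best pair is kept. This exhaustive search is only a stand-in for richer machine learning methods, and the synthetic series (house prices built to follow credit with a two-period lag) and all names are hypothetical.

```python
import math


def ols_fit(x, y):
    """Intercept and slope of a one-variable least-squares regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope


def holdout_mse(predictor, target, lag, train_share=0.7):
    """Fit target[t] on predictor[t - lag]; score on the last observations."""
    x = predictor[:len(predictor) - lag]
    y = target[lag:]
    split = int(len(x) * train_share)
    intercept, slope = ols_fit(x[:split], y[:split])
    errors = [(intercept + slope * xi - yi) ** 2
              for xi, yi in zip(x[split:], y[split:])]
    return sum(errors) / len(errors)


def select_lead_lag(candidates, target, max_lag):
    """Return the (variable, lag) pair with the lowest hold-out error."""
    scores = {(name, lag): holdout_mse(series, target, lag)
              for name, series in candidates.items()
              for lag in range(1, max_lag + 1)}
    return min(scores, key=scores.get), scores


# Synthetic data: house price growth follows credit growth with a 2-period lag.
credit = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(60)]
gdp = [math.cos(0.17 * t) for t in range(60)]
house = [0.0, 0.0] + [0.8 * c for c in credit[:-2]]

best, scores = select_lead_lag({"credit": credit, "gdp": gdp}, house, max_lag=4)
# best is ("credit", 2) by construction of the synthetic data
```

In practice the search would run over many more variables and would be repeated on rolling samples, so that the selected lead-lag structure is allowed to change over time.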

Against this backdrop, and helped by the data released by Òscar Jordà, Moritz Schularick, and Alan M. Taylor with their “Macrofinancial History and the New Business Cycle Facts”, I analyze the time-varying trend and cyclical behavior of GDP, real estate prices and credit over more than 100 years for several developed economies. The goal is to identify and estimate the relevance of local vs. global cycles and the spillover effects of both financial and real cycles of other economies.

If a global low-frequency component is at work in the analyzed market, this statistical information is useful to improve the forecasts employed in the valuation exercise. Even though real estate markets remain predominantly local affairs, globalized trade and finance flows can lead to spillovers from one real estate market to another. A decomposition of the variables in a trend and a cyclical component is also useful to assess the possible actions of central banks 3 to 5 years from now and so provide an estimate of future short-rates. This is because the estimated cyclical components are the gaps needed as inputs in the typical Taylor rule.

Let’s take a first look at the data. I compute the 10-year moving average of growth rates of residential real estate prices, GDP/capita and credit to non-financial corporations. Some breaks are present in the data, generally observed over the WWII period, but otherwise the data is well behaved. The synchronicity of credit and house prices in the 10-year averages stands out (Fig. 1). This reinforces prior research which indicates the importance of credit in explaining house price dynamics. Naturally, one may wonder if the 10-year average is the relevant data span to use to analyze these three variables. ML may be employed here to search for the optimal window size, the one which minimizes forecasting error when the gaps are used in the forecasting model. This may well vary over time (in some periods the 10-year average might be the best choice, in others the 7-year average), as well as across variables and countries.
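The window search can be sketched as follows. Each candidate window is scored by how well the trailing average forecasts the next observation, with all windows evaluated on the same sample so the comparison is fair. The forecasting rule is a deliberately naive assumption, and the stylised series with a three-period cycle is hypothetical.

```python
def best_window(growth, windows):
    """Pick the moving-average window with the lowest one-step forecast error.

    The forecast for period t is the mean of the `window` observations
    before t. Every window is scored on the same evaluation sample.
    """
    start = max(windows)
    if start >= len(growth):
        raise ValueError("series too short for the largest window")
    scores = {}
    for window in windows:
        squared_errors = [
            (growth[t] - sum(growth[t - window:t]) / window) ** 2
            for t in range(start, len(growth))
        ]
        scores[window] = sum(squared_errors) / len(squared_errors)
    return min(scores, key=scores.get), scores


# Stylised growth rates repeating every three periods.
growth = [1.0, 2.0, 6.0] * 10
window, scores = best_window(growth, windows=[1, 2, 3, 4, 5])
# window == 3: averaging over exactly one full cycle gives the best forecast
```

Re-running the search on rolling sub-samples, or separately for each variable and country, would show whether the preferred window shifts over time.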

The 10-year average may be considered as a naïve estimate of the underlying trend. This is the low frequency component of the data. The cyclical component is then obtained by subtracting the trend value from the observed value at all points in time. The most recent period shows increased synchronization in the trends providing some additional explanation for the observed high correlation of real estate prices across most international markets.
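A minimal sketch of this decomposition is shown below. It assumes a trailing (backward-looking) moving average, which is one possible reading of the text, and the growth figures are hypothetical.

```python
def trailing_moving_average(series, window):
    """Naive trend: mean of the last `window` observations.

    Returns None for the early periods where fewer than `window`
    observations are available.
    """
    trend = []
    for t in range(len(series)):
        if t + 1 < window:
            trend.append(None)
        else:
            trend.append(sum(series[t + 1 - window:t + 1]) / window)
    return trend


def cyclical_component(series, trend):
    """Cycle (gap): observed value minus trend, where the trend exists."""
    return [None if level is None else observed - level
            for observed, level in zip(series, trend)]


growth = [2.0, 4.0, 6.0, 8.0]                        # hypothetical growth rates
trend = trailing_moving_average(growth, window=2)    # [None, 3.0, 5.0, 7.0]
cycle = cyclical_component(growth, trend)            # [None, 1.0, 1.0, 1.0]
```

With annual data and `window=10`, `trend` corresponds to the 10-year average discussed above and `cycle` to the gap around it.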

Figure 1

Additionally, a view of the joint dynamics across countries may be obtained by estimating a Loess curve across the available cross-section of 10-year averages (Fig. 2). This is a straightforward application aimed at revealing the presence of global factors across real estate, GDP and credit growth. The pink bands show recession periods for the US. An interesting empirical observation is the time-varying relationship between credit growth and output, and how it has changed qualitatively over time. Over most of the considered period, high credit growth rates led to both higher output and house price growth rates – recall we are talking about trends in growth rates. The three variables peak together in the 1920s, 1950s and 1980s. This is not the case, though, for the 2007 peak, where high credit growth led to high real estate price growth but a muted reaction in trend GDP. This observation carries forward, suggesting that Secular Stagnation may not offer a complete explanation of the muted post-crisis growth.
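For readers unfamiliar with Loess, the sketch below shows a simplified local linear smoother in its spirit: tricube weights over the nearest share of observations, with no robustness iterations. It is not the implementation behind Fig. 2. The pooled cross-section of three hypothetical countries and all names are assumptions.

```python
import math


def loess_at(x0, xs, ys, span=0.5):
    """Local linear fit at x0 using tricube weights on the nearest points."""
    n = len(xs)
    k = max(2, min(n, math.ceil(span * n)))
    # Bandwidth: distance to the k-th nearest observation.
    bandwidth = sorted(abs(x - x0) for x in xs)[k - 1]
    near = [(x - x0, y) for x, y in zip(xs, ys) if abs(x - x0) <= bandwidth]
    weights = [(1.0 - (abs(d) / bandwidth) ** 3) ** 3 if bandwidth > 0 else 1.0
               for d, _ in near]
    sw = sum(weights)
    if sw == 0.0:
        # All neighbours sit exactly on the bandwidth edge: plain average.
        return sum(y for _, y in near) / len(near)
    swx = sum(w * d for w, (d, _) in zip(weights, near))
    swy = sum(w * y for w, (_, y) in zip(weights, near))
    swxx = sum(w * d * d for w, (d, _) in zip(weights, near))
    swxy = sum(w * d * y for w, (d, y) in zip(weights, near))
    denom = sw * swxx - swx ** 2
    if denom < 1e-12:
        # No usable variation in x locally: fall back to the weighted mean.
        return swy / sw
    slope = (sw * swxy - swx * swy) / denom
    # x is centred on x0, so the fitted value at x0 is the intercept.
    return (swy - slope * swx) / sw


# Pooled cross-section: three hypothetical countries around a common trend.
years = list(range(1990, 2000))
common = {year: 2.0 + 0.1 * (year - 1990) for year in years}
xs, ys = [], []
for offset in (-1.0, 0.0, 1.0):          # country-specific deviations
    for year in years:
        xs.append(year)
        ys.append(common[year] + offset)

global_curve = [loess_at(year, xs, ys, span=0.5) for year in years]
```

Here the country deviations cancel out, so `global_curve` recovers the common trend. With real data the smoothed curve would serve as the estimate of the global factor.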

A more structural interpretation is possible and is pursued for example in a recent paper with Anh D.M. Nguyen from the Bank of Lithuania. In this paper, we build a multivariate unobserved component model to explain the evolution of output gaps for the US economy over more than a century. The working version of the paper is available at

Preliminary results have been presented and discussed in several recent research seminars at the Bank of Romania and the Bank of Ukraine, along with several academic conferences.

In a cross-country setting, ML offers a natural choice for selection of optimal weights used to compute the global average. Economic theory indicates that trade or GDP may be used to build the relevant weights. Although economically easier to interpret, this avenue may not be optimal from a forecasting perspective. Selecting a time-varying matrix of coefficients produces improved results for a forecasting exercise. Naturally, for each market, the estimation may be improved upon with country-specific individual transaction data.
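The contrast between theory-based and data-driven weights can be sketched as follows. The example compares hypothetical GDP-share weights with weights chosen by a grid search that minimises the one-step-ahead error in forecasting a target market from the weighted global average. The grid search is a simple stand-in for an ML procedure, the weights are static rather than time-varying, and all data and names are hypothetical.

```python
from itertools import product


def one_step_mse(weights, series_by_country, target):
    """Error of forecasting target[t + 1] with the weighted average at t."""
    countries = list(series_by_country)
    errors = []
    for t in range(len(target) - 1):
        global_average = sum(w * series_by_country[c][t]
                             for w, c in zip(weights, countries))
        errors.append((target[t + 1] - global_average) ** 2)
    return sum(errors) / len(errors)


def simplex_grid(n_countries, steps=10):
    """All non-negative weight vectors on a grid that sum to one."""
    for combo in product(range(steps + 1), repeat=n_countries - 1):
        if sum(combo) <= steps:
            last = steps - sum(combo)
            yield tuple(c / steps for c in combo) + (last / steps,)


def best_weights(series_by_country, target, steps=10):
    """Weights with the lowest one-step-ahead forecast error."""
    return min(simplex_grid(len(series_by_country), steps),
               key=lambda w: one_step_mse(w, series_by_country, target))


growth = {
    "A": [1.0, 2.0, 0.5, 3.0, 1.5, 2.5],
    "B": [2.0, 1.0, 3.0, 0.5, 2.5, 1.0],
    "C": [0.0, 4.0, 1.0, 2.0, 3.0, 0.5],
}
# Target market built to follow 70% of A and 30% of B with a one-period lag.
target = [0.0] + [0.7 * a + 0.3 * b
                  for a, b in zip(growth["A"][:-1], growth["B"][:-1])]

gdp_share = (0.5, 0.3, 0.2)               # hypothetical theory-based weights
fitted = best_weights(growth, target)     # (0.7, 0.3, 0.0)
```

Re-estimating `fitted` on a rolling window would be one simple way to let the weights vary over time.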

Figure 2

The quick data/modelling/insight detour highlights that data, although an essential input, need an interpretation lens. This is provided by models and prior conceptual frameworks which are individually suited to the selected question, in our case real estate valuation. It may well be the case that we obtain the same econometric insight from a carefully selected model linking the natural vacancy rate (NVR) to equilibrium rent as we do from a PCA of a large list of loosely connected covariates. The better strategy is clearly to extract any useful information from all available covariates. Without doubt, causal interpretability of results is preferred to a black-box approach. Data insights need to become actionable business decisions, decisions that can be easily communicated and understood, not wrapped in obscure statistical terminology. Although fuzzy at times, econ 101 remains an important source of causal interpretation and variable selection for model building and estimation.

The AI wave brings unique business and ethical challenges, challenges to which real estate will not remain immune. More importantly, it indicates the increased role played by high-purity, timely data. At PrepayWay we see blockchain-stored transaction data as a valuable factor of production, one of progressively greater value in an AI-driven economy. Our business model is developed recognizing the value of good quality data for the functioning of the real estate ecosystem. We therefore design our products and services not by obfuscating the value of this input to avoid compensating users for their data contribution, but by formally recognizing its worth and organizing the proper market needed to transact it.

[i] Floor area, location, year of construction, etc.

Posted on Mar 11, 2019 by Mihnea Constantinescu

© 2017-2019 PrepayWay. All rights reserved