Assessing the value of user-generated images of urban surroundings for house price estimation
Meixu Chen; Yunzhe Liu; Dani Arribas-Bel; Alex Singleton (2022). Landscape and Urban Planning, 226, 104486. DOI: 10.1016/j.landurbplan.2022.104486
Abstract
Determinants of housing prices are particularly significant for monitoring and understanding housing markets. Traditional variables are measured through official statistics or questionnaire surveys, which are labour-intensive and time-consuming. New forms of data, such as points of interest or street view imagery, have been used to extract housing location and neighbourhood features, but they cannot capture how different individuals recognise and evaluate the properties nearby, which may also be relevant to house price estimation. Therefore, this study investigates whether user-generated images may be used to monitor and understand housing prices and how they influence real estate values. Within this context, perceived scene features are extracted and quantified to blend with commonly used determinants of housing prices. Two machine learning algorithms, random forest and gradient boosting machines, are deployed alongside a typical housing price modelling approach, the hedonic price model. By comparing the performance and interpretability of different models, the relative importance of features and how they influence the estimation power of the models are visualised and analysed. The findings suggest that random forest predictions perform the best and are interpretable, with geotagged Flickr images adding 4.6 percentage points to the model’s accuracy (R²), from 61.9% to 66.5%. Although user-generated images add only minor value to house price estimation, they may be used as a supplementary data source to capture perception features. This could help the restructuring and optimisation of residential areas in future regional construction, planning and development.
Extended Summary
This research investigates whether user-generated social media images can enhance house price estimation by capturing human perceptions of urban environments that traditional data sources miss. Using Inner London between 2013 and 2015 as a case study, the paper combines traditional housing data from HM Land Registry with over one million geotagged Flickr photographs to extract perceived scene features around properties. The methodology employs three types of variables: structural features (property type, age, tenure), location features (distances to amenities, transport links), and novel scene features derived from social media images using computer vision techniques. A Places365 convolutional neural network identifies 365 scene categories from Flickr images, and the seven most relevant features are then selected through random forest feature importance analysis: plazas, crosswalks, palaces, restaurants, museums, industrial areas, and churches. Three modelling approaches are compared: traditional hedonic pricing models, random forest, and gradient boosting machines, with performance evaluated using cross-validation.
The findings demonstrate that machine learning methods significantly outperform traditional hedonic models, with random forest achieving the highest accuracy. When geotagged Flickr images are incorporated, model performance improves modestly but consistently across all approaches, with random forest showing an increase in R² of 4.6 percentage points, from 61.9% to 66.5%. The research reveals that perceived scene features, particularly attractive urban elements like plazas and palaces, have positive associations with house prices, whilst accessibility to transport infrastructure remains the strongest predictor. Distance to underground stations proves most influential, followed by property type and tenure arrangements.
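The gain reported for random forest is a difference between two R² values, so it can be read either as percentage points or as a relative improvement. The short sketch below is only an illustration of that arithmetic: the `r_squared` helper uses the standard definition of the coefficient of determination and is not taken from the paper, and the two scores are the ones quoted in the summary.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (standard definition)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Random forest scores quoted in the summary (without and with Flickr features).
r2_baseline = 0.619
r2_with_images = 0.665

# Absolute gain, in percentage points of R².
absolute_gain_pp = (r2_with_images - r2_baseline) * 100
# Relative gain, as a percentage of the baseline R².
relative_gain_pct = (r2_with_images - r2_baseline) / r2_baseline * 100

print(f"Absolute gain: {absolute_gain_pp:.1f} percentage points")
print(f"Relative gain: {relative_gain_pct:.1f}% of the baseline R²")
```

Reading the figure as percentage points keeps it consistent with the two R² values it is derived from.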
The study employs permutation importance and accumulated local effects plots to interpret machine learning results, making them as transparent as traditional econometric models. Whilst the contribution of user-generated images to predictive accuracy is relatively modest, they provide valuable insights into how human perception of, and interaction with, urban environments influence property values. The research suggests these supplementary data sources could inform urban planning decisions by revealing which neighbourhood characteristics people find most appealing. For policymakers and urban planners, this work demonstrates how social media data can complement traditional datasets to better understand housing markets and guide neighbourhood development strategies that enhance both property values and residential satisfaction.
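Permutation importance, one of the interpretation tools named above, can be sketched in a few lines: shuffle one feature at a time and record how much the model's R² drops. This is a minimal, library-free illustration and not the authors' implementation; the function names, the toy model and the two toy features (a distance in kilometres and an unused noise column) are all hypothetical.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R² when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = r_squared(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = r_squared(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: column 0 is a distance in km, column 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 5), data_rng.uniform(0, 1)] for _ in range(200)]


def toy_model(row):
    """Stand-in for a fitted model: price falls with distance, ignores noise."""
    return 500 - 60 * row[0]


y = [toy_model(row) for row in X]
importances = permutation_importance(toy_model, X, y)
print(importances)
```

In this toy setting the distance column receives a large importance and the noise column receives none, because the stand-in model never reads it. With a fitted random forest, `predict` would wrap the model's prediction call instead.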
Key Findings
- Random forest models outperform traditional hedonic pricing models, achieving an R² of 66.5% when incorporating social media image features
- Geotagged Flickr images provide modest but consistent improvements to house price prediction across all modelling approaches tested
- Distance to underground stations emerges as the strongest predictor of house prices, followed by property type and tenure
- Perceived scene features like plazas, palaces and crosswalks show positive associations with property values in Inner London
- Machine learning methods can be interpreted through permutation importance and accumulated local effects plots, whilst offering superior predictive performance to traditional hedonic models
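The accumulated local effects (ALE) plots used for interpretation can be approximated with a simple first-order estimate: split a feature into quantile bins, average the change in prediction across each bin for the points that fall in it, then accumulate and centre the result. The sketch below is a simplified illustration under stated assumptions, not the paper's code: `ale_first_order`, the toy model and the toy features are hypothetical, and effects are reported at each bin's upper edge only.

```python
import random


def ale_first_order(predict, X, j, n_bins=5):
    """Simplified first-order ALE for feature j, reported at bin upper edges."""
    values = sorted(row[j] for row in X)
    n = len(values)
    # Quantile-based bin edges taken from the data, de-duplicated and sorted.
    edges = sorted({values[round(q * (n - 1) / n_bins)] for q in range(n_bins + 1)})
    if len(edges) < 2:
        raise ValueError("feature has too few distinct values for ALE")

    local_effects, counts = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Points in (lo, hi]; the first bin also includes its lower edge.
        members = [row for row in X
                   if lo < row[j] <= hi or (lo == edges[0] and row[j] == lo)]
        # Change in prediction when feature j moves from lo to hi, others fixed.
        diffs = [predict(row[:j] + [hi] + row[j + 1:])
                 - predict(row[:j] + [lo] + row[j + 1:]) for row in members]
        local_effects.append(sum(diffs) / len(diffs) if diffs else 0.0)
        counts.append(len(members))

    # Accumulate the local effects across bins.
    accumulated, total = [], 0.0
    for effect in local_effects:
        total += effect
        accumulated.append(total)

    # Centre so the count-weighted mean effect is zero.
    mean_effect = sum(a * c for a, c in zip(accumulated, counts)) / sum(counts)
    return edges[1:], [a - mean_effect for a in accumulated]


# Hypothetical toy data and model: price falls by 60 per km of distance.
data_rng = random.Random(0)
X = [[data_rng.uniform(0, 5), data_rng.uniform(0, 1)] for _ in range(200)]


def toy_model(row):
    return 500 - 60 * row[0] + 10 * row[1]


grid, ale = ale_first_order(toy_model, X, j=0)
for upper_edge, effect in zip(grid, ale):
    print(f"distance <= {upper_edge:.2f} km: ALE = {effect:+.1f}")
```

For this linear toy model the ALE curve is a straight line with slope -60, which matches the coefficient on distance. For a non-linear model such as a random forest, the same procedure would trace out how the local effect changes along the feature's range.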
Citation
@article{chen2022assessing,
author = {Meixu Chen and Yunzhe Liu and Dani Arribas-Bel and Alex Singleton},
title = {Assessing the value of user-generated images of urban surroundings for house price estimation},
journal = {Landscape and Urban Planning},
year = {2022},
volume = {226},
pages = {104486},
doi = {10.1016/j.landurbplan.2022.104486}
}