Using convolutional autoencoders to extract visual features of leisure and retail environments

Author

Sam Comber; Daniel Arribas-Bel; Alex Singleton; Les Dolega

Published

October 1, 2020

Comber, S., Arribas-Bel, D., Singleton, A., & Dolega, L. (2020). Using convolutional autoencoders to extract visual features of leisure and retail environments. Landscape and Urban Planning, 202, 103887. DOI: 10.1016/j.landurbplan.2020.103887

Abstract

Visual characteristics of leisure and retail environments provide sensory cues that can influence how consumers experience and behave within these spaces. In this paper, we provide a computational method that summarises the “visual features” of shopping districts by analysing a national database of geocoded store frontage images. While the traditional focus of social scientific research explores how drivers such as proximity to shopping environments factor into location choice decisions, the visual characteristics that describe the enclosing urban area are often neglected. This is despite the assumption that consumers translate the visual appearance of a retail area into a judgement of its functional utility, which mediates consumer behaviour, patronage intention and the image a retail location projects to passers-by. Such judgements allow consumers to draw fine distinctions when evaluating competing destinations. Our approach introduces a deep learning model, the Convolutional Autoencoder, to extract visual features from storefront images of leisure and retail amenities. These features are partitioned into five clusters before several measures describing the environment around the leisure and retail properties are introduced to differentiate between the clusters and assess which variables are distinctive for particular groupings. Our empirical strategy unpacks different groupings from the clusters, which implies the existence of relationships between visual features of shopping areas and functional characteristics of the surrounding urban environment. Ultimately, using the example of retail landscapes, this paper demonstrates the utility of unsupervised deep learning methods for research questions in urban planning.
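For readers unfamiliar with the technique, the sketch below shows the general shape of a convolutional autoencoder in PyTorch: an encoder compresses each storefront image into a low-dimensional latent vector, and a decoder learns to reconstruct the image from it. The image resolution (64x64 RGB), layer sizes and 128-dimensional latent space are illustrative assumptions, not the architecture used in the paper.

import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        # Encoder: compress a 3x64x64 storefront image into a latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),   # -> 32x32x32
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # -> 64x16x16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),
        )
        # Decoder: reconstruct the image from the latent vector.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 16 * 16),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # -> 32x32x32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),   # -> 3x64x64
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

# Training minimises reconstruction error; the encoder output z then serves
# as the "visual feature" vector for each image.
model = ConvAutoencoder()
x = torch.rand(8, 3, 64, 64)  # dummy batch standing in for storefront images
reconstruction, z = model(x)
loss = nn.functional.mse_loss(reconstruction, x)

Because the network is trained only to reconstruct its own input, no human labels are required, which is what makes the approach unsupervised.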

Extended Summary

This paper investigates how visual characteristics of shopping environments relate to functional urban characteristics using deep learning techniques. The research applies Convolutional Autoencoders to analyse 314,542 street-level images of leisure and retail properties across England and Wales, collected by the Local Data Company in 2015. Unlike supervised machine learning approaches that require large datasets of human-labelled images, this study uses an unsupervised method to extract visual features automatically, without prior categorisation.

The methodology involves three stages: extracting visual features using computer vision algorithms, partitioning these features into clusters, and introducing variables describing the neighbourhood characteristics within 15-minute walk catchments around each property. The analysis identifies five distinct visual clusters representing different types of retail environments (a sketch of the clustering stage follows this summary):

  • Group A comprises 159,251 properties in bustling shopping areas with high street density, comparison retail outlets, and affluent residents.
  • Group B contains 24,567 properties in sparse retail environments with high vacancy rates, large floor areas, and fewer transport links, suggesting peri-urban warehouse spaces.
  • Group C represents 81,310 properties in upmarket hospitality areas with diverse restaurants, bars, and entertainment venues, characterised by extremely low vacancy rates and good transport connectivity.
  • Group D includes 6,962 properties on traditional high streets with longer average street lengths, diverse hospitality outlets, and less affluent local populations.
  • Group E contains 81,310 properties representing everyday consumption environments with mixed retail and service offerings.

The study demonstrates that visual-only features correlate significantly with measurable characteristics of built environments, including street network density, store diversity, transport accessibility, and socio-economic indicators. This addresses a limitation of traditional retail environment assessment methods, which rely on costly manual surveys with limited geographical coverage: the computational approach enables analysis at national scale whilst providing insights into the visual qualities that influence consumer behaviour and place perception.

The findings have practical implications for urban planning and retail location decisions, offering a tool for evaluating visual amenity alongside functional characteristics when optimising store locations or planning place marketing campaigns. The methodology also contributes to broader urban science applications, potentially supporting analysis of crime patterns, socio-economic conditions, or other urban phenomena through visual environment assessment.
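As a rough illustration of the second and third stages, the sketch below partitions hypothetical latent vectors into five groups with k-means and tallies the resulting cluster sizes. The dummy latent_features array, the scikit-learn k-means call and the group labelling are assumptions for illustration; the paper's exact partitioning procedure may differ.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Stand-in for the encoder output: one 128-dimensional latent vector per
# property (the real study encodes 314,542 storefront images).
rng = np.random.default_rng(0)
latent_features = rng.normal(size=(10_000, 128))

# Standardise the latent dimensions, then partition into five visual clusters
# (Groups A-E in the summary above).
scaled = StandardScaler().fit_transform(latent_features)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(scaled)

# Cluster labels can then be joined to variables describing each property's
# 15-minute walk catchment (street density, vacancy rate, transport links,
# store diversity, ...) to profile what distinguishes each group.
for k in range(5):
    print(f"Group {chr(ord('A') + k)}: {(labels == k).sum()} properties")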

Key Findings

  • Convolutional Autoencoders successfully identified five distinct visual clusters of retail environments across England and Wales using unsupervised machine learning techniques.
  • Visual features strongly correlate with functional urban characteristics including street density, transport accessibility, vacancy rates, and local socio-economic conditions.
  • Dense urban shopping areas with comparison retail outlets and affluent residents form the largest cluster, representing bustling commercial environments.
  • The methodology provides a scalable alternative to manual retail environment surveys whilst maintaining accuracy in characterising visual retail landscapes.
  • Unsupervised deep learning approaches offer advantages over supervised methods by eliminating requirements for large human-labelled training datasets in urban analysis.

Citation


@article{comber2020using,
  author = {Comber, Sam and Arribas-Bel, Daniel and Singleton, Alex and Dolega, Les},
  title = {Using convolutional autoencoders to extract visual features of leisure and retail environments},
  journal = {Landscape and Urban Planning},
  year = {2020},
  volume = {202},
  pages = {103887},
  doi = {10.1016/j.landurbplan.2020.103887}
}