Mapping the geodemographics of digital inequality in Great Britain: An integration of machine learning into small area estimation

Authors

Alex Singleton; Alexandros Alexiou; Rahul Savani

Published

July 1, 2020

Alex Singleton; Alexandros Alexiou; Rahul Savani (2020). Computers, Environment and Urban Systems, 82, 101486. DOI: 10.1016/j.compenvurbsys.2020.101486

Abstract

Geographic variation in digital inequality manifests as a result of a range of demographic, attitudinal, behavioural and locational factors. To better understand this multidimensional geography, our paper develops a new geodemographic classification for the spatial extent of Great Britain. In this model, we integrate a range of new small area measures that are drawn from multiple new forms of data including consumer purchasing data, survey and open data sources. Our analytical approach innovatively provides an integration of machine learning into a small-area estimation technique to obtain Lower Super Output Area / Data Zone estimates of Internet use, alongside a range of online engagement and consumption measures. Following the collation of a range of input measures, we implemented a more standard geodemographic framework that utilises the unsupervised clustering algorithm k-means to produce a map of the multidimensional characteristics of digital inequality for Great Britain; creating the Internet User Classification (IUC). Our outputs provide a new and nuanced understanding of the contemporary salient characteristics of digital inequality in Great Britain, which we evaluate both internally and externally within the context of preparations for the 2021 UK Census of the Population, exploring the geodemographic patterns of Census test response rates and the prevalence to complete the survey online. Our innovative work illustrates the strength of a geodemographic approach in mapping spatial patterns of digital inequality, and through the presented application concerning Census response rates and characteristics we demonstrate how the IUC can be operationalised within such settings for local intervention or benchmarking.

Extended Summary

This research aims to map the multidimensional geography of digital inequality across Great Britain by developing a comprehensive geodemographic classification. Digital inequality affects access to goods, services, civic engagement, and economic opportunities, making it crucial for public policy makers to understand its spatial patterns.

The study integrates machine learning techniques with traditional small area estimation methods to create detailed measures of internet use and digital engagement at the Lower Super Output Area and Data Zone level. The methodology combines multiple data sources including consumer purchasing data from two major online retailers, broadband infrastructure data from Ofcom, and the British Population Survey. Crucially, the research applies gradient boosting regression trees within a small area estimation framework to predict internet behaviours for areas with insufficient survey coverage. This innovative approach enables the creation of estimates for over 40,000 small areas across Great Britain.

Using k-means clustering analysis, the study creates the Internet User Classification (IUC), which identifies ten distinct groups representing different patterns of digital engagement. These range from ‘e-Cultural Creators’ (young, ethnically diverse populations with high social media use near universities) to ‘e-Withdrawn’ groups in deprived urban areas with minimal internet engagement. The classification reveals significant spatial variations in digital inequality, with infrastructure constraints particularly affecting rural areas whilst socioeconomic factors drive exclusion in urban neighbourhoods.

External validation using data from the 2017 Census test demonstrates the practical utility of this approach. Areas classified as having lower digital engagement showed reduced online survey completion rates, whilst groups like ‘e-Cultural Creators’ had high online engagement but lower overall response rates, suggesting different intervention strategies may be needed.

The research contributes methodologically by successfully integrating machine learning into small area estimation and substantively by providing a nuanced understanding of digital inequality’s geography. The findings have significant implications for public service delivery, particularly as government services increasingly move online. The IUC provides local authorities and national agencies with a tool for identifying areas requiring targeted digital inclusion interventions, understanding likely uptake of online services, and developing appropriate strategies for different community types. This work is particularly relevant given the accelerated digitalisation of public services during and after the COVID-19 pandemic.
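The estimation idea in this summary, fitting a gradient boosting model on areas with survey coverage and predicting for areas without it, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses single-split regression stumps rather than full regression trees, and the two covariates (`age`, `speed`), the synthetic internet-use rates, and all function names are hypothetical stand-ins chosen only to keep the example self-contained.

```python
import random

def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimising squared error."""
    overall = sum(residuals) / len(residuals)
    # Fallback stump: everything goes left and receives the overall mean.
    best_sse, best = float("inf"), (0, float("inf"), overall, overall)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Split between consecutive distinct values: rows with value <= lo go left.
        for lo in values[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= lo]
            right = [r for row, r in zip(X, residuals) if row[j] > lo]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best_sse:
                best_sse, best = sse, (j, lo, lm, rm)
    return best

def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fitted to current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        pred = [p + learning_rate * (lm if row[j] <= t else rm)
                for p, row in zip(pred, X)]
    return base, learning_rate, stumps

def predict(model, X):
    base, lr, stumps = model
    return [base + lr * sum(lm if row[j] <= t else rm for j, t, lm, rm in stumps)
            for row in X]

def mse(y, pred):
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)

# Hypothetical synthetic data: each "area" has two scaled covariates and a
# survey-derived internet-use rate. None of these values come from the paper.
rng = random.Random(0)

def make_area():
    age, speed = rng.random(), rng.random()
    rate = 0.9 - 0.4 * age + 0.1 * speed + rng.gauss(0, 0.03)
    return [age, speed], rate

surveyed = [make_area() for _ in range(60)]
X_surveyed = [x for x, _ in surveyed]
y_surveyed = [y for _, y in surveyed]
X_unsurveyed = [make_area()[0] for _ in range(40)]  # covariates only, no survey rate

model = fit_gbm(X_surveyed, y_surveyed)
mean_rate = sum(y_surveyed) / len(y_surveyed)
baseline_mse = mse(y_surveyed, [mean_rate] * len(y_surveyed))
train_mse = mse(y_surveyed, predict(model, X_surveyed))
estimates = predict(model, X_unsurveyed)  # model-based estimates for unsurveyed areas

print(f"baseline MSE {baseline_mse:.5f}, boosted MSE {train_mse:.5f}")
```

In practice a library implementation with proper trees and held-out validation would replace the hand-written stumps; the sketch only shows how the model borrows strength from surveyed areas to fill in unsurveyed ones.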

Key Findings

  • Developed Internet User Classification identifying ten distinct groups with different digital engagement patterns across Great Britain’s 40,000+ small areas.
  • Successfully integrated machine learning gradient boosting into small area estimation, achieving R² values between 0.72 and 0.89 for internet behaviour predictions.
  • Revealed significant spatial variations in digital inequality, from highly engaged ‘e-Cultural Creators’ near universities to ‘e-Withdrawn’ groups in deprived areas.
  • External validation using 2017 Census test data confirmed classification utility, with digitally excluded areas showing lower online survey completion rates.
  • Provided practical tool for targeting digital inclusion interventions and predicting online service uptake across different neighbourhood types.
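
The clustering step behind the classification can be sketched in the same hedged spirit. The example below is a plain Lloyd's-algorithm k-means on standardised area measures, written with the standard library only. It is an illustration under assumptions, not the IUC build: the two input measures, the three synthetic area types, and `k = 3` are hypothetical (the published classification has ten groups and many more inputs).

```python
import random

def standardise(rows):
    """Z-score each column so that no single measure dominates the distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(row, centroids[c]))
                      for row in rows]
        if new_labels == labels:  # assignments stable, so the algorithm has converged
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical synthetic areas with two measures each (for example, an online
# shopping rate and a social media use rate). Values are invented for illustration.
rng = random.Random(1)
area_types = [(0.8, 0.9), (0.5, 0.4), (0.1, 0.1)]
areas = [[rng.gauss(a, 0.05), rng.gauss(b, 0.05)]
         for a, b in area_types for _ in range(30)]

z = standardise(areas)
labels, centroids = kmeans(z, k=3)

# Within-cluster sum of squares versus total sum of squares about the global mean.
inertia = sum(sq_dist(row, centroids[lab]) for row, lab in zip(z, labels))
global_mean = [sum(col) / len(z) for col in zip(*z)]
total_ss = sum(sq_dist(row, global_mean) for row in z)

print(f"within-cluster SS {inertia:.2f} of total SS {total_ss:.2f}")
```

A single random start can settle in a local optimum, so production classifications typically rerun k-means from many starts and keep the solution with the lowest within-cluster sum of squares.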

Citation

@article{singleton2020mapping,
  author = {Alex Singleton and Alexandros Alexiou and Rahul Savani},
  title = {Mapping the geodemographics of digital inequality in Great Britain: An integration of machine learning into small area estimation},
  journal = {Computers, Environment and Urban Systems},
  year = {2020},
  volume = {82},
  pages = {101486},
  doi = {10.1016/j.compenvurbsys.2020.101486}
}