Towards Real-Time Geodemographics: Clustering Algorithm Performance for Large Multidimensional Spatial Databases

Authors

Muhammad Adnan; Paul A Longley; Alex D Singleton; Chris Brunsdon

Published

June 21, 2010

Muhammad Adnan; Paul A Longley; Alex D Singleton; Chris Brunsdon (2010). Transactions in GIS, 14(3), 283-297. DOI: 10.1111/j.1467-9671.2010.01197.x

Abstract

Geodemographic classifications provide discrete indicators of the social, economic and demographic characteristics of people living within small geographic areas. They have hitherto been regarded as products, which are the final “best” outcome that can be achieved using available data and algorithms. However, reduction in computational cost, increased network bandwidths and increasingly accessible spatial data infrastructures have together created the potential for the creation of classifications in near real time within distributed online environments. Yet paramount to the creation of truly real time geodemographic classifications is the ability for software to process and efficiently cluster large multidimensional spatial databases within a timescale that is consistent with online user interaction. To this end, this article evaluates the computational efficiency of a number of clustering algorithms with a view to creating geodemographic classifications “on the fly” at a range of different geographic scales.

Extended Summary

This research evaluates clustering algorithms for creating real-time geodemographic classifications that could replace the static, expert-produced neighbourhood typologies currently used by government and businesses. Geodemographic systems classify small geographical areas by their social, economic and demographic characteristics, helping organisations target services and understand local populations. However, existing classifications rely on outdated data sources like the decennial census and use closed methods controlled by commercial providers.

The study tests three clustering algorithms across different geographic scales using UK census data: k-means, Clara (Clustering Large Applications), and genetic algorithms. Performance was measured by computational speed, classification quality using silhouette width analysis, and efficiency with different data standardisation techniques including z-scores, range standardisation, and principal components analysis. The research examined datasets at Output Area, Lower Super Output Area, and Ward levels, representing increasingly large geographical units.

For large datasets with small numbers of clusters, Clara performed fastest, but k-means proved more efficient when creating many clusters. Genetic algorithms produced better quality classifications for large datasets but required more processing time. The choice of data standardisation method had minimal impact on computational performance across all algorithms. These findings suggest that different algorithms suit different scenarios in online geodemographic systems. Clara works best for quick analyses of large datasets requiring few neighbourhood types, whilst k-means excels when users need detailed classifications with many categories.

The research demonstrates that real-time geodemographic analysis is technically feasible, potentially democratising access to neighbourhood classification tools. This could enable local authorities, health services, and other organisations to create bespoke classifications using current data sources rather than relying on general-purpose commercial products. The implications extend beyond technical efficiency to questions of data ownership and algorithmic transparency in social classification systems. By making geodemographic methods more accessible and responsive, this work challenges the authority of expert-produced classifications and opens possibilities for more timely, application-specific analyses of neighbourhood characteristics.
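
The standardisation and quality measures named in this summary can be illustrated with a minimal sketch in standard-library Python. It is not the authors' implementation: the toy data, the function names, and the parameter values are all assumptions made for illustration, and the k-means routine is plain Lloyd's algorithm. It standardises two variables measured on very different scales, clusters them, and reports the elapsed time and mean silhouette width for each standardisation.

```python
import math
import random
import time


def z_scores(rows):
    """Standardise each column to mean 0 and population standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def range_standardise(rows):
    """Rescale each column linearly onto the interval [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]


def k_means(rows, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row."""
    rng = random.Random(seed)
    centres = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: nearest centre by Euclidean distance
        labels = [min(range(k), key=lambda j: math.dist(row, centres[j]))
                  for row in rows]
        # Update step: move each centre to the mean of its members
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centres[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def mean_silhouette(rows, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    clusters = {}
    for row, lab in zip(rows, labels):
        clusters.setdefault(lab, []).append(row)
    total = 0.0
    for row, lab in zip(rows, labels):
        own = clusters[lab]
        if len(own) == 1 or len(clusters) < 2:
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(row, o) for o in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(row, o) for o in other) / len(other)
                for key, other in clusters.items() if key != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(rows)


# Hypothetical toy data: 60 "areas", two variables on very different scales
rng = random.Random(42)
areas = ([[rng.gauss(20, 2), rng.gauss(30000, 1500)] for _ in range(30)]
         + [[rng.gauss(60, 2), rng.gauss(60000, 1500)] for _ in range(30)])

results = {}
for name, standardise in [("z-score", z_scores), ("range", range_standardise)]:
    data = standardise(areas)
    start = time.perf_counter()
    labels = k_means(data, k=2)
    elapsed = time.perf_counter() - start
    results[name] = mean_silhouette(data, labels)
    print(f"{name}: {elapsed:.4f}s, mean silhouette width {results[name]:.3f}")
```

A mean silhouette width close to 1 indicates compact, well-separated clusters. Timings from a toy run like this say nothing about the census-scale datasets examined in the study.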

Key Findings

  • Clara clustering algorithm performs fastest for large datasets when creating small numbers of neighbourhood clusters
  • K-means algorithm proves most efficient for detailed classifications requiring many cluster categories across all dataset sizes
  • Genetic algorithms produce highest quality neighbourhood classifications but require significantly longer computational processing time
  • Data standardisation methods show minimal impact on clustering performance, suggesting flexibility in online system design
  • Real-time geodemographic classification is technically feasible, potentially democratising neighbourhood analysis beyond commercial providers
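
One way to see why Clara scales to large datasets is that the expensive medoid search runs only on small random samples, while the full dataset is used just to score each candidate set of medoids. The sketch below illustrates that idea in standard-library Python under stated assumptions: the swap search is a simplified stand-in for full PAM, the sample size of 40 + 2k and the five samples follow commonly used defaults, and the data and function names are hypothetical. It is not the implementation benchmarked in the paper.

```python
import math
import random


def total_cost(rows, medoids):
    """Sum of distances from every row to its nearest medoid."""
    return sum(min(math.dist(row, m) for m in medoids) for row in rows)


def pam(sample, k, rng):
    """Simplified PAM: start from random medoids, accept any cost-lowering swap."""
    medoids = rng.sample(sample, k)
    best = total_cost(sample, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in sample:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(sample, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids


def clara(rows, k, n_samples=5, sample_size=None, seed=0):
    """Clara-style clustering: search on samples, evaluate on the full data."""
    rng = random.Random(seed)
    size = min(len(rows), sample_size or 40 + 2 * k)
    best_medoids, best_cost = None, math.inf
    for _ in range(n_samples):
        medoids = pam(rng.sample(rows, size), k, rng)
        cost = total_cost(rows, medoids)  # scored against every row
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    labels = [min(range(k), key=lambda j: math.dist(row, best_medoids[j]))
              for row in rows]
    return best_medoids, labels


# Hypothetical toy data: three well-separated groups of 500 points each
rng = random.Random(1)
centres = [(0, 0), (20, 0), (0, 20)]
points = [[rng.gauss(cx, 1), rng.gauss(cy, 1)]
          for cx, cy in centres for _ in range(500)]

medoids, labels = clara(points, k=3)
print("medoids:", [[round(v, 2) for v in m] for m in medoids])
```

Because the swap search touches only `size` rows per sample, its cost does not grow with the full dataset. The cost of each swap pass does grow with k, which is in line with the finding that Clara's advantage holds for small numbers of clusters.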

Citation

@article{adnan2010towards,
  author = {Muhammad Adnan and Paul A Longley and Alex D Singleton and Chris Brunsdon},
  title = {Towards Real-Time Geodemographics: Clustering Algorithm Performance for Large Multidimensional Spatial Databases},
  journal = {Transactions in GIS},
  year = {2010},
  volume = {14},
  number = {3},
  pages = {283--297},
  doi = {10.1111/j.1467-9671.2010.01197.x}
}