Grid-enabling Geographically Weighted Regression: A Case Study of Participation in Higher Education in England

Authors

Richard Harris; Alex Singleton; Daniel Grose; Chris Brunsdon; Paul Longley

Published

February 1, 2010

Richard Harris; Alex Singleton; Daniel Grose; Chris Brunsdon; Paul Longley (2010). Transactions in GIS, 14(1), 43-61. DOI: 10.1111/j.1467-9671.2009.01181.x

Abstract

Geographically Weighted Regression (GWR) is a method of spatial statistical analysis used to explore geographical differences in the effect of one or more predictor variables upon a response variable. However, as a form of local analysis, it does not scale well to (especially) large data sets because of the repeated processes of fitting and then comparing multiple regression surfaces. A solution is to make use of developing grid infrastructures, such as that provided by the National Grid Service (NGS) in the UK, treating GWR as an “embarrassing parallel” problem and building on existing software platforms to provide a bridge between an open source implementation of GWR (in R) and the grid system. To demonstrate the approach, we apply it to a case study of participation in Higher Education, using GWR to detect spatial variation in social, cultural and demographic indicators of participation.

Extended Summary

This research demonstrates how to overcome computational limitations of Geographically Weighted Regression (GWR) by implementing it on distributed computing systems to analyse large spatial datasets. GWR is a statistical method that reveals how relationships between variables change across geographical space, but traditional implementations struggle with large datasets because they require fitting thousands of separate regression models. The study addresses this ‘big n problem’ by treating GWR as an ‘embarrassingly parallel’ process, where calculations can be distributed across multiple computer processors simultaneously.

The research developed a grid-enabled version of GWR using the UK’s National Grid Service infrastructure, building on the open-source R statistical package. This approach allows each geographical location to be analysed independently on separate processors, with results pooled together afterwards. The time required for analysis becomes inversely proportional to the number of available processors, potentially reducing computation time from weeks to hours.

To demonstrate this grid-enabled approach, the research analysed higher education participation rates across 31,378 Lower Super Output Areas in England. The study examined how social, cultural and demographic factors influence university participation rates at the neighbourhood level, using variables including educational qualifications, Key Stage 4 attainment, car ownership, and ethnicity. The analysis revealed significant spatial variation in how these factors affect participation rates. For example, the relationship between car ownership and higher education participation was almost four times stronger in some areas than others, driven largely by a ‘London effect’ where high wealth concentration provides distinct pathways into higher education. The ethnicity variable showed particularly complex spatial patterns, ranging from negative to positive effects across different regions, with statistically significant relationships clustered in urban industrial areas reflecting historical immigration patterns.

This grid-enabled GWR provides a benchmark for other spatial statistical methods, demonstrating that computationally intensive local spatial analysis can be made feasible for large datasets. However, the research also identifies limitations including dependency on centralised computing infrastructure, memory constraints in R for very large datasets, and the challenge of interpreting complex geographical patterns revealed by increased analytical power. The work contributes to the emerging field of e-social science by showing how distributed computing can enhance geographical analysis capabilities, though sustainability concerns about continued funding for grid infrastructure remain.
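The "embarrassingly parallel" structure described in this summary can be illustrated with a minimal sketch. It is not the authors' R and National Grid Service implementation: it is a standard-library Python toy that assumes a single predictor, a Gaussian distance kernel with a fixed bandwidth, and a local thread pool standing in for grid nodes. All names (`local_fit`, `fit_chunk`, `parallel_gwr`, `bandwidth`, `n_workers`) and the data are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def local_fit(i, coords, x, y, bandwidth):
    """Weighted least squares of y on x, centred on location i."""
    cx, cy = coords[i]
    sw = swx = swy = swxx = swxy = 0.0
    for (px, py), xj, yj in zip(coords, x, y):
        d2 = (px - cx) ** 2 + (py - cy) ** 2
        w = math.exp(-0.5 * d2 / bandwidth ** 2)  # Gaussian kernel weight (assumed)
        sw += w
        swx += w * xj
        swy += w * yj
        swxx += w * xj * xj
        swxy += w * xj * yj
    # Closed-form weighted regression with an intercept and one slope.
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return i, intercept, slope


def fit_chunk(indices, coords, x, y, bandwidth):
    """One 'job': fit every location in a chunk, independently of other chunks."""
    return [local_fit(i, coords, x, y, bandwidth) for i in indices]


def parallel_gwr(coords, x, y, bandwidth, n_workers=4):
    """Split locations into chunks, fit each chunk separately, pool the results."""
    n = len(coords)
    chunks = [range(k, n, n_workers) for k in range(n_workers)]
    # Threads are only a stand-in here; on a grid each chunk would be a separate job.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda c: fit_chunk(c, coords, x, y, bandwidth), chunks)
    pooled = sorted(r for part in parts for r in part)  # restore location order
    return [(b0, b1) for _, b0, b1 in pooled]


# Made-up data: 20 locations on a line; the true slope is 1 in the west, 3 in the east.
coords = [(float(i), 0.0) for i in range(20)]
x = [float(i % 5) for i in range(20)]
y = [(1.0 if i < 10 else 3.0) * xi for i, xi in enumerate(x)]

betas = parallel_gwr(coords, x, y, bandwidth=2.0, n_workers=4)
print(betas[0], betas[-1])  # local (intercept, slope) at the two ends of the line
```

Because no local fit depends on any other, the chunks can run in any order and on any number of workers without changing the pooled result. This independence is what makes run time fall roughly in proportion to the number of processors.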

Key Findings

  • Grid-enabled GWR can reduce computation time from weeks to hours by distributing the analysis across multiple processors simultaneously
  • Higher education participation analysis reveals four-fold variation in car ownership effects, driven by London’s concentrated wealth patterns
  • Spatial relationships between ethnicity and university participation vary from negative to positive across England’s regions
  • Grid computing makes local spatial statistical analysis feasible for datasets exceeding 31,000 geographical units
  • Memory limitations in R and infrastructure sustainability pose ongoing challenges for large-scale spatial analysis

Citation


@article{harris2010gridenabling,
  author = {Richard Harris and Alex Singleton and Daniel Grose and Chris Brunsdon and Paul Longley},
  title = {Grid-enabling Geographically Weighted Regression: A Case Study of Participation in Higher Education in England},
  journal = {Transactions in GIS},
  year = {2010},
  volume = {14},
  number = {1},
  pages = {43--61},
  doi = {10.1111/j.1467-9671.2009.01181.x}
}