Predicting Participation in Higher Education: A Comparative Evaluation of the Performance of Geodemographic Classifications
Chris Brunsdon; Paul Longley; Alex Singleton; David Ashby (2011). Journal of the Royal Statistical Society Series A: Statistics in Society, 174(1), 17-30. DOI: 10.1111/j.1467-985X.2010.00641.x
Abstract
Participation in UK higher education is modelled by using Poisson regression techniques. Models using geodemographic classifications of neighbourhoods of varying levels of detail are compared with those using variables that are directly derived from the census, using a cross-validation approach. Increasing the detail of geodemographic classifiers appears to be justified in general, although the degree of improvement becomes more marginal as the level of detail is increased. The census variable approach performs comparably, although it is argued that this depends heavily on an appropriate choice of predictors. The paper concludes by discussing these results in a broader practice-oriented and pedagogic context.
Extended Summary
This research evaluates how effectively different neighbourhood classification systems can predict participation in UK higher education among 18- to 19-year-olds. Following the 2004 Higher Education Act, which allowed universities to charge variable tuition fees, institutions needed reliable methods to identify and target students from under-represented socio-economic groups to meet access requirements. The study compares geodemographic classifications, which categorise neighbourhoods by the demographic and socio-economic characteristics of their residents, with traditional statistical models that use census data directly.
The research used data from the Higher Education Statistics Agency covering all English students studying in English institutions in 2001, combined with 2001 UK census data. Two main geodemographic systems were examined: the public domain Output Area Classification (OAC) at three levels of detail (7 supergroups, 21 groups, and 52 subgroups) and the commercial Mosaic system (11 groups and 61 types). These were compared against Poisson regression models using six key census variables including population density, age structure, owner occupation rates, unemployment levels, and educational qualifications. Cross-validation was used to test the predictive accuracy of each approach.
The analysis reveals that more detailed geodemographic classifications generally perform better than simpler ones, though the improvement becomes marginal as complexity increases. The biggest performance gains occur when moving from basic national averages to broad neighbourhood categories, with diminishing returns from further subdivision. Census-based regression models slightly outperformed the most detailed geodemographic approaches, achieving a mean absolute deviation score of 0.594 compared to 0.612 for the best geodemographic model. However, this advantage was not consistent across all neighbourhood types and depended heavily on selecting appropriate predictor variables.
The research demonstrates clear geographical patterns in higher education participation, with the highest rates found in countryside and prospering suburban areas, whilst the lowest participation occurs in areas characterised as ‘constrained by circumstances’ or ‘municipal dependency’. Importantly, the study suggests geodemographic analysis can inform variable selection for more sophisticated statistical models.
The findings have significant implications for widening participation policies in higher education. Whilst sophisticated regression models offer slightly better predictive accuracy, geodemographic classifications provide more accessible and computationally simpler tools for practitioners without advanced statistical knowledge. The research supports the continued use of neighbourhood-based targeting systems whilst highlighting the importance of understanding their limitations and the marginal benefits of increased complexity.
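The comparison of classification levels can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the class labels (`A1`, `B2`, and so on), the function names, and the 5-fold scheme are all assumptions made for illustration, and the score here is simply the mean absolute difference between observed and predicted counts, which may differ from the exact definition used in the paper. The sketch relies on one standard result: a Poisson model with a single categorical predictor and a log-population offset has a closed-form maximum likelihood fit, in which each class's rate is its total entrants divided by its total population.

```python
import random
from collections import defaultdict


def fit_class_rates(areas, level):
    """Closed-form Poisson MLE for one categorical predictor with a
    log-population offset: rate per class = entrants / population."""
    entrants, population = defaultdict(float), defaultdict(float)
    for area in areas:
        entrants[area[level]] += area["entrants"]
        population[area[level]] += area["pop"]
    overall = sum(entrants.values()) / sum(population.values())
    rates = {c: entrants[c] / population[c] for c in population}
    return rates, overall


def cv_mad(areas, level, k=5, seed=0):
    """k-fold cross-validated mean absolute deviation between observed
    and predicted entrant counts (an assumed scoring rule)."""
    idx = list(range(len(areas)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for f in range(k):
        test = set(idx[f::k])
        train = [areas[i] for i in idx if i not in test]
        rates, overall = fit_class_rates(train, level)
        for i in test:
            area = areas[i]
            # Fall back to the overall rate for a class unseen in training.
            predicted = area["pop"] * rates.get(area[level], overall)
            total += abs(area["entrants"] - predicted)
    return total / len(areas)


def make_synthetic_areas(n=600, seed=1):
    """Synthetic neighbourhoods: hypothetical rates vary by fine class."""
    rng = random.Random(seed)
    true_rates = {"A1": 0.10, "A2": 0.20, "B1": 0.35, "B2": 0.50}
    areas = []
    for _ in range(n):
        fine = rng.choice(sorted(true_rates))
        pop = rng.randint(40, 80)
        # Binomial draw as a simple stand-in for a count process.
        entrants = sum(rng.random() < true_rates[fine] for _ in range(pop))
        areas.append({"pop": pop, "entrants": entrants,
                      "national": "all", "coarse": fine[0], "fine": fine})
    return areas


areas = make_synthetic_areas()
scores = {lvl: cv_mad(areas, lvl) for lvl in ("national", "coarse", "fine")}
for lvl, score in scores.items():
    print(f"{lvl:>8}: cross-validated MAD = {score:.2f}")
```

Running the script prints one score per level of detail, from a single national rate through coarse to fine classes. Fitting the census-variable models would need an iterative Poisson regression fit with continuous predictors, which this sketch leaves out.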
Key Findings
- More detailed geodemographic classifications improve prediction accuracy, but with diminishing returns as complexity increases beyond basic neighbourhood categories.
- Census-based regression models slightly outperform geodemographic approaches but require careful variable selection and advanced statistical expertise.
- Higher education participation shows clear geographical patterns, with highest rates in countryside and prospering suburban areas.
- Geodemographic analysis provides accessible targeting tools for practitioners whilst informing variable selection for sophisticated statistical models.
- Areas with large student populations paradoxically show low participation rates, because students typically apply from their parental addresses.
Citation
@article{brunsdon2011predicting,
author = {Chris Brunsdon and Paul Longley and Alex Singleton and David Ashby},
title = {Predicting Participation in Higher Education: A Comparative Evaluation of the Performance of Geodemographic Classifications},
journal = {Journal of the Royal Statistical Society Series A: Statistics in Society},
year = {2011},
volume = {174},
number = {1},
pages = {17--30},
doi = {10.1111/j.1467-985X.2010.00641.x}
}