Creating the 2011 area classification for output areas (2011 OAC)
Christopher G Gale; Alexander D Singleton; Andrew G Bates; Paul A Longley (2016). Journal of Spatial Information Science. DOI: 10.5311/JOSIS.2016.12.232
Abstract
This paper presents the methodology that has been used to create the 2011 Area Classification for Output Areas (2011 OAC). This extends a lineage of widely used public domain census-only geodemographic classifications in the UK. It provides an update to the successful 2001 OAC methodology, and summarizes the social and physical structure of neighborhoods using data from the 2011 UK Census. The results of a user engagement exercise that underpinned the creation of an updated methodology for the 2011 OAC are also presented. The 2011 OAC comprises 8 Supergroups, 26 Groups, and 76 Subgroups. An example of the results of the classification in Southampton is presented.
Extended Summary
This research describes the development of the 2011 Area Classification for Output Areas (2011 OAC), an updated geodemographic classification that categorises UK neighbourhoods by their social and demographic characteristics. The study combined systematic testing of data preparation methods, statistical clustering, and extensive user consultation to produce a more robust classification than its predecessor.
Using exclusively 2011 UK Census data covering all output areas in England, Wales, Scotland, and Northern Ireland, the research tested 27 combinations of data preparation methods, formed by crossing three rate calculation approaches (percentages, index scores, and mean differences), three transformation techniques (log10, Box-Cox, and inverse hyperbolic sine), and three standardisation methods (z-scores, range standardisation, and inter-decile range standardisation). These combinations were evaluated using correlation analysis and clustering-based sensitivity analysis. After more than 10,000 clustering iterations for each combination, the research selected percentage calculation, inverse hyperbolic sine transformation, and range standardisation as the most effective methodology.
The final classification uses 60 variables derived from 167 initial census variables, representing five key domains: demographic structure, household composition, housing, socio-economic characteristics, and employment. This is a substantial increase from the 41 variables used in the 2001 classification. The 2011 OAC employs k-means clustering with squared Euclidean distance to create a three-tier hierarchical structure comprising 8 Supergroups, 26 Groups, and 76 Subgroups.
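The selected preparation chain (percentages, then inverse hyperbolic sine, then range standardisation) can be illustrated with a minimal Python sketch. This is not the authors' R code: the function names, the toy counts, and the use of `np.arcsinh` as the inverse hyperbolic sine are assumptions made for illustration, and the exact form of each step in the published methodology may differ.

```python
import numpy as np

def to_percentages(counts, denominators):
    """Convert raw counts to percentages of a per-area denominator."""
    counts = np.asarray(counts, dtype=float)
    denominators = np.asarray(denominators, dtype=float)
    return 100.0 * counts / denominators[:, None]

def inverse_hyperbolic_sine(x):
    """IHS transform, log(x + sqrt(x^2 + 1)); unlike log10 it is defined at zero."""
    return np.arcsinh(x)

def range_standardise(x):
    """Rescale each column (variable) to the interval [0, 1]."""
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    # Guard against constant columns to avoid division by zero.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return (x - col_min) / col_range

def prepare(counts, denominators):
    """Apply the three steps in order: rate, transformation, standardisation."""
    rates = to_percentages(counts, denominators)
    return range_standardise(inverse_hyperbolic_sine(rates))

# Hypothetical toy data: 4 areas (rows) and 2 census variables (columns).
counts = np.array([[10, 0], [25, 5], [50, 20], [5, 40]])
population = np.array([100, 100, 200, 50])
prepared = prepare(counts, population)
print(prepared)
```

Each column of `prepared` lies between 0 and 1, so no single variable dominates a distance-based clustering purely because of its scale.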
Key improvements include better discrimination in urban areas, particularly London, enhanced representation of ethnic diversity, and improved coverage of demographic changes between 2001 and 2011. The research incorporated extensive user engagement through online surveys and stakeholder consultations, ensuring the classification meets diverse user needs across academic, commercial, and public sector applications. Unlike commercial geodemographic systems, the 2011 OAC maintains complete transparency by using open-source software (the R programming language) and making all code publicly available through GitHub. The classification provides enhanced differentiation of UK population characteristics whilst maintaining compatibility with previous systems. Validation against local area knowledge in Southampton demonstrated that cluster assignments accurately reflect known demographic patterns, with student areas, diverse communities, and deprived neighbourhoods correctly identified. The broader significance extends beyond academic research to practical applications in public service delivery, health planning, education policy, and commercial market analysis. This work establishes a new benchmark for open geodemographic classification, providing free access to sophisticated demographic analysis tools that were previously available only through expensive commercial systems.
Key Findings
- The 2011 OAC uses 60 variables compared to 41 in the previous system, providing enhanced demographic differentiation across UK neighbourhoods.
- Extensive testing of 27 methodological combinations identified inverse hyperbolic sine transformation as optimal for handling census data distributions.
- User engagement exercises guided classification design, prioritising improved urban discrimination and enhanced ethnic diversity representation over direct comparability.
- The three-tier hierarchical structure comprises 8 Supergroups, 26 Groups, and 76 Subgroups, offering flexible application across different analytical scales.
- Complete methodological transparency through open-source software and public code availability distinguishes this system from proprietary commercial alternatives.
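One way to picture the tiered structure is a top-down scheme in which k-means with squared Euclidean distance is run on all areas and then re-run separately inside each resulting cluster. The sketch below assumes that nesting approach, which this summary does not spell out, and shows only two tiers on synthetic data. The `kmeans` and `two_tier` helpers, the parameter values, and the "1a"-style codes are hypothetical and are not taken from the published R code.

```python
import numpy as np

def kmeans(data, k, n_runs=10, n_iter=50, seed=0):
    """Lloyd's k-means with squared Euclidean distance; keeps the best of n_runs."""
    rng = np.random.default_rng(seed)
    best_labels, best_wss = None, np.inf
    for _ in range(n_runs):
        # Initialise centres from k distinct data points.
        centres = data[rng.choice(len(data), size=k, replace=False)]
        for _ in range(n_iter):
            dist = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Recompute centres; an empty cluster keeps its previous centre.
            new_centres = np.array([
                data[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                for j in range(k)
            ])
            if np.allclose(new_centres, centres):
                break
            centres = new_centres
        dist = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        wss = dist[np.arange(len(data)), labels].sum()  # within-cluster sum of squares
        if wss < best_wss:
            best_labels, best_wss = labels, wss
    return best_labels

def two_tier(data, k_top, k_sub):
    """Cluster all rows, then re-cluster the members of each top-tier cluster."""
    top = kmeans(data, k_top)
    sub = np.zeros(len(data), dtype=int)
    for parent in range(k_top):
        idx = np.where(top == parent)[0]
        if len(idx) < k_sub:
            continue  # too few members to split further
        sub[idx] = kmeans(data[idx], k_sub, seed=parent + 1)
    return top, sub

# Synthetic stand-in for prepared census variables: 120 areas, 2 variables.
rng = np.random.default_rng(42)
blob_centres = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
data = np.vstack([c + rng.normal(scale=0.5, size=(40, 2)) for c in blob_centres])

top, sub = two_tier(data, k_top=3, k_sub=2)
# Illustrative codes such as "1a": top-tier number plus lower-tier letter.
codes = [f"{t + 1}{chr(ord('a') + s)}" for t, s in zip(top, sub)]
print(sorted(set(codes)))
```

Repeating the run from several random starts and keeping the solution with the lowest within-cluster sum of squares is a common way to reduce k-means' sensitivity to initialisation; a third tier would apply the same split once more inside each lower-tier cluster.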
Citation
@article{gale2016creating,
author = {Christopher G Gale and Alexander D Singleton and Andrew G Bates and Paul A Longley},
title = {Creating the 2011 area classification for output areas (2011 OAC)},
journal = {Journal of Spatial Information Science},
year = {2016},
doi = {10.5311/JOSIS.2016.12.232}
}