‘Research ready’ geographically enabled smart data
Paul A. Longley; James Cheshire; Alex Singleton (2024). Annals of GIS, 30(3), 267-273. DOI: 10.1080/19475683.2024.2353035
Abstract
This paper reviews and assesses the prospects for developing geographically enabled research ready data (RRD) with reference to current UK initiatives. Examples of projects for which such data have been provisioned are given.
Extended Summary
This paper examines how to develop geographically enabled ‘research ready data’ (RRD) from smart data sources to support social science research across diverse user communities. The research draws on UK initiatives in smart data infrastructure, particularly the Consumer Data Research Centre’s work with linked consumer registers (LCRs). The paper argues that traditional survey-based social investigation, whilst robust, has been transformed by networked computing and big data, creating new opportunities but also challenges around data selectivity, preparation, and inclusivity. The study outlines five key requirements for smart data research: understanding uncertainty and representation gaps; enabling individual-level linkage to avoid the ecological fallacy; improving academic access to commercially sensitive datasets; developing appropriate infrastructure with effective governance and disclosure control; and creating preparation procedures to avoid duplication of effort.
The research demonstrates these principles through the LCRs dataset, which links consumer lifestyle surveys, Electoral Registers, energy performance certificates, and property transactions using geographic referencing. This population-wide database covers adult individuals and their residential properties, enabling scale-free measures of coverage and longitudinal tracking.
The paper presents a framework for effective smart data RRD provision involving three core components: breadth (multilateral licensing and an integrated data spine), curation (internal and external validation procedures), and access (streamlined user interfaces and documentation). Geographic referencing emerges as crucial for linking datasets whilst maintaining individual-level detail within trusted research environments.
The study includes over 50 illustrative applications built using the LCRs, ranging from gentrification analysis to COVID-19 research, demonstrating the value of selective abstractions from comprehensive smart data holdings. The research concludes that future development requires AI-driven synthesis of missing population elements whilst retaining detailed geography within secure environments. This vision encompasses spatial data infrastructure that can frame any smart data source to known levels of precision, supporting inclusive social science whilst minimising algorithmic bias. The work emphasises that successful implementation requires multilateral data licensing agreements, extended legal and ethical support, and improved technical capacity for highly disaggregated spatial linkage of individual-level geographic data.
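The geographic linkage and coverage measurement described in this summary can be illustrated with a minimal sketch. It is not the authors' method: the record layouts, the field names (`person_id`, `address_id`, `area`, `energy_band`), the sample values, and the benchmark population counts are all hypothetical, chosen only to show how a shared address key might join person-level and property-level records and how coverage might be expressed per area.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only.
register = [
    {"person_id": 1, "address_id": "A1", "area": "E01"},
    {"person_id": 2, "address_id": "A1", "area": "E01"},
    {"person_id": 3, "address_id": "B7", "area": "E02"},
]
properties = [
    {"address_id": "A1", "energy_band": "C"},
    {"address_id": "C9", "energy_band": "E"},
]
# Assumed external benchmark of adult population counts per area.
adult_population = {"E01": 4, "E02": 2}


def link_by_address(people, dwellings):
    """Left-join person records to property attributes on address_id."""
    by_address = {d["address_id"]: d for d in dwellings}
    linked = []
    for p in people:
        match = by_address.get(p["address_id"])
        # Unmatched people are kept, with the property attribute left empty.
        linked.append({**p, "energy_band": match["energy_band"] if match else None})
    return linked


def coverage_by_area(people, benchmark):
    """Share of each area's benchmark adult population found in the register."""
    counts = defaultdict(int)
    for p in people:
        counts[p["area"]] += 1
    return {area: counts[area] / total for area, total in benchmark.items()}


linked = link_by_address(register, properties)
coverage = coverage_by_area(register, adult_population)
```

Keeping unmatched individuals in the linked output, rather than dropping them, is a deliberate choice in this sketch: it leaves representation gaps visible so that they can be measured rather than silently discarded.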
Key Findings
- Smart data require systematic consolidation that addresses uncertainty, individual-level linkage, improved academic access, infrastructure development, and preparation procedures.
- Geographic referencing enables effective linkage of multiple smart data sources whilst maintaining individual-level detail within trusted research environments.
- The linked consumer registers demonstrate successful creation of population-wide research ready data supporting over 50 diverse academic applications.
- Future smart data infrastructure requires multilateral licensing agreements and AI-driven synthesis of missing population elements whilst retaining detailed geography within secure environments.
- An effective provision framework encompasses breadth through integrated acquisition, curation through validation procedures, and access through streamlined user interfaces with comprehensive documentation.
Citation
@article{longley2024research,
author = {Paul A. Longley and James Cheshire and Alex Singleton},
title = {‘Research ready’ geographically enabled smart data},
journal = {Annals of GIS},
year = {2024},
volume = {30},
number = {3},
pages = {267--273},
doi = {10.1080/19475683.2024.2353035}
}