Geographic Data Science

Author

Alex Singleton; Daniel Arribas‐Bel

Published

January 1, 2021

Singleton, A. and Arribas-Bel, D. (2021). Geographic Data Science. Geographical Analysis, 53(1), 61–75. DOI: 10.1111/gean.12194

Abstract

It is widely acknowledged that the emergence of “Big Data” is having a profound and often controversial impact on the production of knowledge. In this context, Data Science has developed as an interdisciplinary approach that turns such “Big Data” into information. This article argues for the positive role that Geography can have on Data Science when being applied to spatially explicit problems; and inversely, makes the case that there is much that Geography and Geographical Analysis could learn from Data Science. We propose a deeper integration through an ambitious research agenda, including systems engineering, new methodological development, and work toward addressing some acute challenges around epistemology. We argue that such issues must be resolved in order to realize a Geographic Data Science, and that such a goal would be a desirable one.

Extended Summary

This paper explores how geography and data science can work together to better understand spatial problems and to create a new field called Geographic Data Science. The research examines the emergence of data science as a way to make sense of “big data”: massive datasets generated by smartphones, sensors, social media, and other digital technologies that often contain geographical information. However, the paper argues that current data science approaches often treat location as just another column in a database, ignoring the unique properties of spatial data that geographers have studied for decades. This is a significant missed opportunity, as many contemporary big datasets have clear spatial dimensions that require specialist geographical knowledge to analyse properly.
The paper proposes that geography can contribute critical perspectives to data science, particularly in understanding spatial relationships, dealing with geographical uncertainty, and ensuring that ethical considerations are embedded in spatial analyses. Geography’s long tradition of interdisciplinary work makes it well-positioned to engage with data science methods whilst providing essential context about place, scale, and spatial processes. Conversely, geography can benefit from data science’s advanced computational techniques, machine learning approaches, and infrastructure for handling large datasets.
The research outlines a three-stage integration process: coupling (linking tools between platforms), assimilation (embedding practices and methods), and full integration into Geographic Data Science. This progression would involve developing new systems for storing and visualising spatial big data, creating spatially-aware machine learning algorithms, and establishing epistemological frameworks that combine geographical theory with data-driven approaches.
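The claim that location is more than “just another column” can be made concrete with a spatial statistic. The sketch below, which is not taken from the paper, computes global Moran’s I (a standard measure of spatial autocorrelation) on two small made-up grids using binary rook-contiguity weights. The helper names `rook_weights` and `morans_i`, the 4 x 4 grids and their values are all hypothetical and chosen only for illustration.

```python
def rook_weights(rows, cols):
    """Hypothetical helper: rook-contiguity neighbours for a rows x cols grid.

    Cells are numbered row by row; two cells are neighbours if they share an edge.
    """
    neighbours = {}
    for r in range(rows):
        for c in range(cols):
            neighbours[r * cols + c] = [
                (r + dr) * cols + (c + dc)
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
    return neighbours


def morans_i(values, neighbours):
    """Global Moran's I with binary weights (w_ij = 1 if j is a neighbour of i).

    I = (n / W) * sum_i sum_j w_ij z_i z_j / sum_i z_i**2, where z_i = x_i - mean
    and W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    total_weight = sum(len(nbrs) for nbrs in neighbours.values())
    cross = sum(z[i] * z[j] for i, nbrs in neighbours.items() for j in nbrs)
    return (n / total_weight) * cross / sum(d * d for d in z)


grid = rook_weights(4, 4)
clustered = [1 if c < 2 else 0 for r in range(4) for c in range(4)]  # left half high
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]     # alternating cells
print(round(morans_i(clustered, grid), 3))     # 0.667: similar values sit together
print(round(morans_i(checkerboard, grid), 3))  # -1.0: neighbours always differ
```

Both grids contain eight 1s and eight 0s, so any summary that treats location as an ordinary column (mean, variance, histogram) cannot tell them apart; the spatial statistic separates them clearly.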
The paper highlights several technical challenges, including efficiently handling space-time data at scale, developing spatially-explicit unsupervised learning methods, and creating transparent alternatives to “black box” predictive models that can be scrutinised for social justice implications. The research argues that Geographic Data Science could enhance both prediction and explanation in spatial analysis, moving beyond the traditional divide between these approaches. This integration is particularly important as data science increasingly tackles inherently geographical problems like urban planning, environmental monitoring, and social inequality, but often lacks the theoretical foundation to handle spatial complexity appropriately. The broader significance lies in maintaining geography’s relevance in an increasingly data-driven world whilst ensuring that spatial analysis benefits from computational advances. This collaboration could prevent the reinvention of geographical concepts by data scientists working on spatial problems, whilst helping geography engage with contemporary technological developments and large-scale datasets that traditional geographical methods struggle to handle.
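The call for spatially-explicit unsupervised learning can be illustrated with a toy regionalisation: an agglomerative clustering in which only areas that share a border may be merged. The sketch below is a deliberately simplified stand-in (it merges the adjacent pair of clusters whose mean values are closest), not a method proposed in the paper. The functions `line_neighbours` and `constrained_agglomerative`, the strip of six areas and its values are hypothetical, and the brute-force search is far too slow for real datasets.

```python
def line_neighbours(n):
    """Hypothetical contiguity: n areas in a row, each touching the next."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def constrained_agglomerative(values, neighbours, k):
    """Merge adjacent clusters, closest mean values first, until k clusters remain.

    Only clusters containing areas that share a border may merge, which is the
    spatial constraint. Returns one cluster label per area.
    """
    labels = list(range(len(values)))              # every area starts alone
    members = {i: [i] for i in range(len(values))}

    def mean(cluster):
        return sum(values[a] for a in members[cluster]) / len(members[cluster])

    while len(members) > k:
        best = None
        for a, nbrs in neighbours.items():
            for b in nbrs:
                ca, cb = labels[a], labels[b]
                if ca == cb:
                    continue                        # already in the same cluster
                gap = abs(mean(ca) - mean(cb))
                if best is None or gap < best[0]:
                    best = (gap, ca, cb)
        if best is None:
            break                                   # no adjacent clusters left to merge
        _, keep, drop = best
        for area in members.pop(drop):
            labels[area] = keep
            members[keep].append(area)
    return labels


values = [1.0, 1.2, 5.0, 5.1, 1.1, 0.9]  # hypothetical values along a strip of six areas
print(constrained_agglomerative(values, line_neighbours(6), k=3))  # [0, 0, 2, 2, 4, 4]
```

The two low-valued ends of the strip stay in separate clusters because the high-valued middle lies between them, whereas a clustering on the values alone would tend to place areas 0, 1, 4 and 5 in the same group.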

Key Findings

  • Data science often treats location as supplementary information, ignoring unique spatial properties that require specialist geographical analysis methods and theoretical understanding.
  • Integration toward a Geographic Data Science should progress from coupling tools, through assimilating methods, to full integration built on new spatially-aware machine learning algorithms and epistemological frameworks.
  • Geography’s interdisciplinary tradition positions it well to provide critical perspectives on spatial big data whilst benefiting from data science’s computational techniques.
  • Technical challenges include developing efficient space-time data structures, spatially-explicit unsupervised learning, and transparent alternatives to black box predictive models.
  • This integration could maintain geography’s relevance in data-driven contexts whilst ensuring spatial analysis incorporates contemporary computational advances and large-scale datasets.
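For the space-time data structures mentioned in the findings, one of the simplest designs is a uniform bucket index that hashes each observation by spatial cell and time bin, so that a query inspects only the buckets overlapping its bounding box and time window. This is a minimal sketch assuming planar coordinates and numeric timestamps; the `SpaceTimeGrid` class and its parameters are hypothetical, and production systems more commonly rely on R-trees, geohashes or space-filling curves.

```python
from collections import defaultdict
from math import floor


class SpaceTimeGrid:
    """Hypothetical bucket index: points hashed by (x cell, y cell, time bin)."""

    def __init__(self, cell_size, time_bin):
        self.cell_size = cell_size
        self.time_bin = time_bin
        self.buckets = defaultdict(list)

    def _key(self, x, y, t):
        return (floor(x / self.cell_size), floor(y / self.cell_size), floor(t / self.time_bin))

    def insert(self, x, y, t, item):
        self.buckets[self._key(x, y, t)].append((x, y, t, item))

    def query(self, xmin, ymin, xmax, ymax, tmin, tmax):
        """Return items inside the box and time window, scanning only overlapping buckets."""
        kx0, ky0, kt0 = self._key(xmin, ymin, tmin)
        kx1, ky1, kt1 = self._key(xmax, ymax, tmax)
        hits = []
        for kx in range(kx0, kx1 + 1):
            for ky in range(ky0, ky1 + 1):
                for kt in range(kt0, kt1 + 1):
                    # .get avoids creating empty buckets in the defaultdict
                    for x, y, t, item in self.buckets.get((kx, ky, kt), ()):
                        if xmin <= x <= xmax and ymin <= y <= ymax and tmin <= t <= tmax:
                            hits.append(item)
        return hits


index = SpaceTimeGrid(cell_size=100.0, time_bin=3600.0)  # e.g. 100 m cells, 1 h bins (assumed units)
index.insert(120.0, 40.0, 1800.0, "a")
index.insert(130.0, 60.0, 9000.0, "b")    # same area, two hours later
index.insert(900.0, 900.0, 1900.0, "c")   # same hour, far away
print(index.query(100.0, 0.0, 200.0, 100.0, 0.0, 3600.0))  # ['a']
```

The bucket sizes are the main design choice: cells that are too small leave most buckets empty, while cells that are too large degrade queries towards a full scan.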

Citation


@article{singleton2021geographic,
  author = {Singleton, Alex and Arribas-Bel, Daniel},
  title = {Geographic Data Science},
  journal = {Geographical Analysis},
  year = {2021},
  volume = {53},
  number = {1},
  pages = {61--75},
  doi = {10.1111/gean.12194}
}