Open data products - A framework for creating valuable analysis ready data

Author

Dani Arribas-Bel; Mark Green; Francisco Rowe; Alex Singleton

Published

October 20, 2021

Dani Arribas-Bel; Mark Green; Francisco Rowe; Alex Singleton (2021). Journal of Geographical Systems, 23(4), 497-514. DOI: 10.1007/s10109-021-00363-5

Abstract

This paper develops the notion of “open data product”. We define an open data product as the open result of the processes through which a variety of data (open and not) are turned into accessible information through a service, infrastructure, analytics or a combination of all of them, where each step of development is designed to promote open principles. Open data products are born out of a (data) need and add value beyond simply publishing existing datasets. We argue that the process of adding value should adhere to the principles of open (geographic) data science, ensuring openness, transparency and reproducibility. We also contend that outreach, in the form of active communication and dissemination through dashboards, software and publication, is key to engaging end-users and ensuring societal impact. Open data products have major benefits. First, they enable insights from highly sensitive, controlled and/or secure data which may not be accessible otherwise. Second, they can expand the use of commercial and administrative data for the public good, leveraging their high temporal frequency and geographic granularity. We further argue that there is a compelling need for open data products as we experience the current data revolution. New, emerging data sources are unprecedented in temporal frequency and geographical resolution, but they are large, unstructured, fragmented and often hard to access due to privacy and confidentiality concerns. By transforming raw (open or “closed”) data into ready-to-use open data products, new dimensions of human geographical processes can be captured and analysed, as we illustrate with existing examples. We conclude by arguing that several parallels exist between the role that open source software played in enabling research on spatial analysis in the 1990s and early 2000s, and the opportunities that open data products offer to unlock the potential of new forms of (geo-)data.

Extended Summary

This research introduces the concept of “open data products” (ODPs), a framework for transforming complex, inaccessible data into analysis-ready information that benefits society. The paper addresses a central tension of the current data revolution: vast quantities of data are being generated, yet much of it remains unused because of privacy concerns, technical barriers and the unstructured nature of modern data sources. It defines an open data product as the open result of transparent processes that turn various data sources into accessible information through services, infrastructure and analytics, adhering to open principles at every stage of development.

The framework rests on four building blocks: identifying a problem that needs insight, adding value through data processing and analysis, applying the principles of open geographic data science, and conducting active outreach to maximise impact. It emphasises co-design with end users and transparent, reproducible processes that build trust amongst stakeholders.

The paper argues that open data products offer significant advantages over simply publishing existing datasets. They enable insights from highly sensitive or controlled data that would otherwise be inaccessible, whilst expanding the use of commercial and administrative data for public benefit. These ideas are illustrated through existing examples, including geodemographic classifications and data products developed in response to the COVID-19 pandemic. By generating analysis-ready datasets, open data products bridge the gap between raw data and user needs, unlocking research potential and informing evidence-based policy making.

The study also identifies challenges, including sustainability, funding models, and the need for specialised technical skills and infrastructure. Nevertheless, it positions open data products as essential tools for maximising the societal value of the current data explosion. It draws a parallel with the role that open-source software played in democratising access to spatial analysis methods in the 1990s and 2000s: just as open-source software made those methods widely available, open data products can unlock the potential of new forms of geographic data. The work contributes to the geographic data science literature with a framework that broadens understanding of how open data can be generated and what makes an open dataset genuinely useful, whilst ensuring usability and reliability for diverse stakeholders.
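As a minimal illustration of the value-adding step that turns sensitive data into something publishable, the sketch below aggregates individual-level records to counts per area and suppresses small cells before release. It is a hypothetical example, not the authors' method: the threshold `MIN_CELL_SIZE`, the `area` field and the function `build_open_counts` are assumptions made for illustration, and the records are made up.

```python
from collections import Counter

# Hypothetical disclosure threshold (an assumption): area counts below this
# value are withheld so that small groups of individuals cannot be singled out.
MIN_CELL_SIZE = 10


def build_open_counts(records, area_key="area", min_cell_size=MIN_CELL_SIZE):
    """Aggregate individual-level records into counts per area.

    Counts below `min_cell_size` are replaced by None (suppressed), so only
    aggregate figures at or above the threshold are released.
    """
    counts = Counter(record[area_key] for record in records)
    return {
        area: (n if n >= min_cell_size else None)
        for area, n in sorted(counts.items())
    }


if __name__ == "__main__":
    # Made-up records standing in for confidential microdata.
    records = [{"area": "A"}] * 12 + [{"area": "B"}] * 3 + [{"area": "C"}] * 10
    print(build_open_counts(records))  # {'A': 12, 'B': None, 'C': 10}
```

Small-cell suppression is only the simplest safeguard; a real product would typically combine it with other disclosure-control measures (for example, rounding or secondary suppression) agreed with the data owner.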

Key Findings

  • Open data products transform raw, complex data into accessible analysis-ready information through transparent, reproducible processes that add significant value beyond traditional data publishing.
  • The framework enables insights from sensitive or controlled data whilst expanding commercial and administrative data use for public benefit with high temporal and geographic granularity.
  • Four building blocks characterise successful open data products: problem identification, value addition, open geographic data science principles, and active stakeholder outreach and engagement.
  • Open data products parallel the democratising role of open-source software in spatial analysis, potentially unlocking new forms of geographic data for research and policy.
  • Examples including geodemographic classifications and COVID-19 response data products illustrate how open data products bridge gaps between available data and actionable insights for decision-making.
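Geodemographic classifications, one of the examples listed above, group small areas into types by clustering them on standardised socio-demographic variables, and k-means is a widely used algorithm for this. The following is a minimal pure-Python sketch under that assumption; the variables, values and the functions `standardise` and `kmeans` are hypothetical, and it is not the procedure behind the classifications discussed in the paper.

```python
import math
import random


def standardise(rows):
    """Z-score each variable so that no single variable dominates distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [tuple((x - m) / s for x, m, s in zip(row, means, sds)) for row in rows]


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length tuples.

    Returns (labels, centroids). The fixed seed makes the random choice of
    initial centroids, and hence the result, reproducible.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each area joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # no area changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its member areas.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical small areas described by two made-up shares
    # (e.g. share of students, share of residents aged 65+).
    areas = [
        (0.60, 0.05), (0.55, 0.08), (0.58, 0.06),
        (0.10, 0.40), (0.12, 0.35), (0.08, 0.42),
    ]
    labels, _ = kmeans(standardise(areas), k=2)
    print(labels)
```

k-means requires the number of clusters in advance and is sensitive to initialisation, so in practice several random starts are usually compared; the choice and weighting of input variables matter as much as the clustering algorithm itself.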

Citation


@article{arribasbel2021open,
  author = {Arribas-Bel, Dani and Green, Mark and Rowe, Francisco and Singleton, Alex},
  title = {Open data products - A framework for creating valuable analysis ready data},
  journal = {Journal of Geographical Systems},
  year = {2021},
  volume = {23},
  number = {4},
  pages = {497--514},
  doi = {10.1007/s10109-021-00363-5}
}