NO.183 Understanding the “Why” of Data and Knowledge Models
December 13 - 17, 2021 (Check-in: December 12, 2021)
- George H. L. Fletcher
- TU Eindhoven, Netherlands
- Marie Katsurai
- Doshisha University, Japan
- Juan F. Sequeda
- data.world, USA
- Hsiang-Yun Wu
- TU Vienna, Austria
Description of the Meeting
Context and Motivation
Data integration is the problem of 1) combining data residing in different sources, and 2) providing a unified view of this data. A modern manifestation of data integration is Knowledge Graphs, which integrate not just data but also knowledge at scale in the form of a graph data model. Knowledge Graphs have gained wide popularity in academia and industry in areas ranging from search to mapping.
An argument for integrating data using graphs is that graphs bridge the conceptualization gap between how people think about data and how data is physically stored. One can see this, for example, when people end up drawing graphs on the whiteboard when describing data. Anecdotally, we often hear that a graph data model is a natural representation of data coming from heterogeneous sources.
However, if we look at the history of data models, we observe the following:
1) There have been numerous types of data models and corresponding query languages used for data integration. We can group these into three types of data models: tabular, graph, and tree, each with many different flavors. The relational model continues to be dominant in practice, with additional lightweight usage in the form of CSV. Graph models such as RDF and property graph data models have come to the fore. XML and JSON as tree data models are prevalent. Tailored query languages have been designed and engineered for each of these data models.
2) What goes around comes around. We have seen many data models come and go several times over the past 50 years. The first databases before the relational model were network (graph) and hierarchical (tree). During the 1980s, Object-Oriented Databases were common and foundational graph data models were developed (see the Survey of Graph Database Models by Angles and Gutierrez).
3) We live increasingly in a data-driven world. Data and data analytics play a central role, not only in technical systems but also broadly in organizations and society, across government, academia, and industry, in both private and public spaces. A broad shift is underway, from data-centric analytics to human-centric analytics, where the emphasis is on understanding the role and impact of data in organizations and society, i.e., what do people do with data, and what does data do to people?
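To make the three families of data models named in observation 1 concrete, the following minimal sketch stores one and the same fact in a tabular, a graph, and a tree form and retrieves it from each. The fact itself (a person "Ada" who works for "Acme"), the predicate name `worksFor`, and the helper function names are hypothetical and chosen only for illustration; the triples are written in the spirit of RDF rather than with an actual RDF library.

```python
import csv
import io
import json

# Hypothetical fact, for illustration only: Ada works for Acme.

# Tabular: one row under a fixed header, as in a CSV file.
csv_text = "name,employer\nAda,Acme\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Graph: subject-predicate-object triples, in the spirit of RDF.
triples = {("Ada", "worksFor", "Acme")}

# Tree: a nested document, as in JSON.
tree = json.loads('{"person": {"name": "Ada", "employer": {"name": "Acme"}}}')


def employer_from_table(rows, name):
    """Scan the rows and project the employer column."""
    return next(r["employer"] for r in rows if r["name"] == name)


def employer_from_graph(triples, name):
    """Match the pattern (name, worksFor, ?employer)."""
    return next(o for s, p, o in triples if s == name and p == "worksFor")


def employer_from_tree(tree, name):
    """Navigate the path person -> employer -> name."""
    person = tree["person"]
    return person["employer"]["name"] if person["name"] == name else None
```

Each lookup answers the same question, but the access pattern differs: a scan with projection for the table, a pattern match for the graph, and a path navigation for the tree. This is the kind of difference that the tailored query languages mentioned above are built around.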
Based on these observations and the current popularity of Knowledge Graphs for data integration, we believe it is the right time to reflect on the tripartite relationship between data models, their corresponding query languages, and the people both using and producing integrated data. Importantly, this reflection should drive us towards a more deeply empirical and social understanding of data models, based not on anecdotes but on data and methodologically rigorous investigation.
Understanding the relationships between data models, query languages, and people
Thus, we argue that we need to understand how people perceive the way data is modeled and represented. In order to do so, we need to work with scientists and experts across communities to design methodologies, experiments, and user studies. This requires bringing together data management expertise in theory, systems, and semantics with communities who study people (e.g., human data interaction) and those who are actively using data (e.g., data journalists, political scientists, life scientists, etc.).
An initial list of questions we want to think about in the seminar:
- What is the role and function of data and knowledge modeling in organizations?
- Why do we keep inventing new data models?
- What problem is each new data model addressing?
- What are the overlapping features across these data models?
- How can we best create mappings between these data models?
- Is there an ideal data model for a particular type of user and for a particular type of task?
- What are the affordances necessary to help people in creating and using data models?
- Are new organizational roles needed for data and knowledge modeling and management?
Fundamentally, we need to embrace the role of human understanding in data modeling.
Goals and outcomes of the meeting
1) Build bridges between the data management, human-computer interaction, knowledge engineering, and semantic web research communities, practitioners and users of data analytics technologies, and the social sciences.
2) Compile existing user studies and methodologies and take steps towards proposing new ones.
3) Articulate a shared Vision Statement on open challenges, for peer-reviewed publication.
4) Develop concrete action plans for research collaborations and longer-term international projects of broad ambition.