Krister Lindén (University of Helsinki, Finland)
Peter Wittenburg (Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands)
The report “Riding the Wave”, produced by the High Level Expert Group on Scientific Data, pointed out what we have increasingly felt for years: we need to take urgent measures regarding our scientific data if we do not want to risk the disaster of no longer being able to access it. However, the report also emphasizes the opportunities offered by the information hidden in the growing volume of data. Clearly, we need to raise awareness among all stakeholders, such as researchers, data scientists, research organizations and even the public, of how important our data are for extracting the knowledge we will need in the coming decades and beyond. A concerted action is required that will result in a Collaborative Data Infrastructure (CDI) consisting of three layers: researchers as data generators and users; research infrastructures offering community-specific data services; and a data-oriented e-Infrastructure offering common data services. As the relations between data creators and consumers become more anonymous, we need new ways to establish trust, yet we do not yet understand all the steps to be taken. Since knowledge about the stored data objects is distributed vertically across these layers, responsibility for data curation is shared, yet we do not have proper mechanisms in place to synchronize curation decisions. The report makes a number of suggestions towards a 2030 vision for data management and access. We will relate this vision to the current reality of data management and access, using the linguistic domain as an example. We will describe the data architecture currently being worked out and indicate some of the limitations we face when establishing a data infrastructure.