In modern hypothesis-driven science, researchers increasingly rely on effective research data management services and infrastructures that facilitate the acquisition, processing, exchange and archival of research data sets. Such services make it possible to link interdisciplinary expertise and to combine different analytical results. Comparative and integrative analyses yield insight into research questions that goes far beyond what individual experiments can provide. In fundamental plant research specifically, the area this consortium focuses on, modern approaches need to integrate analyses across different system levels (such as genomics, transcriptomics, proteomics, metabolomics and phenomics). This integration is necessary to understand system-wide molecular physiological responses as a complex, dynamic adjustment of the interplay between genes, proteins and metabolites. As a consequence, a wide range of technologies and of experimental and computational methods are employed to pursue state-of-the-art research questions, making research a team effort across disciplines. The overall goal of DataPLANT is to provide the research data management practices, tools and infrastructure that enable such collaborative research in plant biology. In this context, common standards, software and infrastructure can ensure the availability, quality and interoperability of data, metadata and data-centric workflows. They are thus a key success factor and a crucial precondition for barrier-free, high-impact collaborative plant biology research.
To this end, the consortium pursues the following key objectives:
- A specific community standard for fundamental plant research (meta)data and workflow annotation, based on generic, existing and emerging standards and ontologies in plant science and beyond.
- A robust, federated research environment for data computation and management covering the complete data lifecycle.
- Assistive mechanisms ranging from data stewards to intelligent software services to build, link and maintain the complete research context during data acquisition, curation, analysis, and publication.
- Mechanisms for collaborative research based on the enrichment and automated crosslinking of plant-research-specific (meta)data to facilitate research context management.
- A platform for data provenance and research sharing, including a motivation and credit system that creates incentives to democratize research data.
- Comprehensive training to ensure data legacy through lectures, courses, workshops and summer schools, as well as openly available training material.
- A central plant data HUB for aggregating services and knowledge, generating a searchable compendium for research in plant biology.
DataPLANT provides an additional layer of services that complements existing generalist infrastructures. It focuses on supporting and easing complete and meaningful management of the research metadata context, which is often lacking or inadequate in fundamental plant science. In this manner, we augment and complement existing services in ways that go far beyond current best practices. DataPLANT aims to deliver well-annotated research data objects, to continuously improve the data literacy of plant researchers, and to integrate the plant research domain into the NFDI landscape. Becoming FAIR will drive science: increasing the level of annotation at the source and tracking provenance using community standards will maximize data discoverability and reuse.
The approaches to data management in the bioinformatics community will be transferred into professional structures with clear-cut responsibilities. While researchers remain in charge of the actual research, these structures will cover the complete data lifecycle, from planning to the sustainable referencing and (re)use of data after the original research project has been completed.
In this context, the project DataPLANT (Bioinformatics DATa ENvironment) was funded by the Ministry of Science, Research and the Arts of the State of Baden-Württemberg for four years as part of digital@bw. The aim of the project is to lay the foundation of a Science Data Centre in close cooperation with the life science community, such as research groups, and with infrastructure providers, such as libraries and computing centres.
DataPLANT builds on existing infrastructure such as bwSFS (Storage for Science), BinAC (Bioinformatics and Astrophysics Cluster) and the de.NBI cloud (German Network for Bioinformatics Infrastructure), as well as repositories operated by the university libraries of Konstanz and Tübingen. DataPLANT will also coordinate with other stakeholders in research data management.
During the project, rules for the preservation of and access to research data will be established and refined, and the infrastructure and scientific methods for data analysis will be advanced. A central question of the project is: how can subject-specific and organisational metadata be annotated, curated and marked up in a unified way that also takes legal and technical aspects, such as the handling of sensitive data, into account?
Data repositories will provide research infrastructure built on unified standards and workflows. This, in turn, will ease access to both infrastructure and data, contributing to equal opportunities in research, especially for young academics. At the same time, research groups will gain easier access to the national and international networks needed to process large amounts of data. A central part of the project is educating young researchers in methods of digitised research, state-of-the-art procedures in information technology, and research data management. DataPLANT aims to provide generic access to infrastructure and to develop methods for the long-term accessibility of data. In principle, DataPLANT is open to additional communities.