An NFDI consortium of plant research

In modern fundamental plant research, scientists produce high-dimensional polymorphic data. To facilitate the collection, processing, exchange, and archiving of these data, the DataPLANT consortium within the NFDI drives the digital transformation and democratization of research data in basic plant research.

DataPLANT is well prepared for the upcoming challenges because it combines expertise in basic plant research, information and computer sciences, and infrastructure. The targeted service landscape consists of technical-digital assistance as well as on-site personnel support, since the consortium was designed to be entirely user-centric.

DataPLANT’s vision

DataPLANT has made it its mission to develop a Research Data Management (RDM) system that meets community requirements and allows research datasets to be processed and contextualized according to the FAIR principles (Findable, Accessible, Interoperable, Reusable). To this end, we envision the Annotated Research Context (ARC), which covers the entire research cycle, from experiments to computational analyses, including the actual data and metadata as well as the resulting publications. This will ensure usability in practice and will lead to the formation of a central information resource.

To achieve this goal, the work of the DataPLANT consortium is organized into four Task Areas…

Task Area I – Driving Standardization

Task Area I promotes topics such as data quality, interoperability, and standardization. Besides the definition of metadata standards, the application of widely used ontologies, such as the Gene Ontology, KEGG Ontology, and Plant Ontology, will be supported in a first step. These existing ontologies will then be iteratively adapted to the needs of researchers and their specific experiments in order to describe workflows comprehensively.

The often long path from identifying a missing term to its final adoption into the descriptive reference ontology will be shortened significantly and made as frictionless as possible. A first approach to this is pursued with SWATE, a tool developed by DataPLANT.

Task Area II – Infrastructural Community Support

DataPLANT and the plant research community can draw on extensive infrastructural capacities for compute and storage, such as the de.NBI Cloud in Tübingen and Freiburg and the bwCloud, amongst others. Task Area II will provide the infrastructural basis for the developed tools. For this, the DataPLANT Hub, based on HUBZero, will serve as the central access point. Wherever possible, a single sign-on concept will be used for ease of use. Furthermore, a service will be set up to facilitate data and metadata versioning, collaboration among researchers, and data sharing within ongoing projects. To support researchers in publishing their work, a service based on InvenioRDM is being established. Unique identification will be accomplished using Digital Object Identifiers (DOIs) and ORCID iDs.

Task Area III – Personnel Community Support

Task Area III maintains continuous contact with the community and provides direct personnel support to the individual research groups in their everyday data management tasks. This is accomplished through Data Stewards, a core element of the DataPLANT research data management strategy, who bridge the gap between technical solutions, infrastructure, and researchers.

By providing advice on data and workflow management, the Data Stewards promote the standardization of metadata, the use of ontologies, and data provenance. Through direct on-site interaction, they can respond immediately to the requirements and needs of researchers. Training these Data Stewards is also the responsibility of Task Area III.

Task Area IV – Increasing Reach and Sustainable Development

Task Area IV is responsible for communication within the project, as well as for integration with the overall NFDI and the promotion of common efforts. A central task is extending the consortium's reach to all researchers in the field of fundamental plant research. Only in this way can the consortium credibly represent the research area and establish common standards and concepts. DataPLANT therefore aimed for close cooperation and coordination from the start and has already established a close connection with the INF project of SFB TRR175 and the Data Stewards of CEPLAS. Researchers who join the project later on are also intended to receive a certain level of support from Data Stewards, advice on application and legal issues, and access to infrastructure.