The backend storage infrastructure is evolving

The storage system bwSFS (Storage-for-Science) forms the geo-redundant, distributed technical backbone for basic storage services, research data management and data sharing. It contributes to the storage infrastructure for the DataPLANT community alongside the de.NBI infrastructure services. The central storage components of bwSFS are located at the Tübingen and Freiburg computer centers. The system is intended for a broad user base: besides DataPLANT, it also serves the local communities of the participating universities and the Science Data Center BioDATEN. To manage this user base reasonably and to achieve seamless integration into the envisioned DataPLANT services, federated management of project, user and group data is necessary. Even in the implementation phase of the software and services, which involves the subject sciences, it is becoming apparent that the existing methods for identity management are not sufficient. Compared to HPC services, storage services require a much deeper integration of existing infrastructures and more flexible user management, which should also include ORCID, for example.

To support research data management, InvenioRDM, which already includes a convenient user interface and an OAI-PMH interface, was chosen within bwSFS. All central institutions and projects involved in the RDM process were included in this decision at an early stage. In Freiburg, a GitLab instance will be used for versioning, collaboration and sharing of data of ongoing projects in the context of TA 4. Established services of the university libraries will be used for DOI allocation in Invenio, and ORCID will be used for persistent identification of researchers. In this way, resources for RDM will be pooled in order to improve support for the disciplines in implementing specific RDM requirements and to ensure better advice for researchers. To enforce the FAIR and Open Access principles, data management plans (DMPs) will be used to support standardization in TA 1 with guidelines for metadata management, archiving and licensing models.

Kick-Off Task Area 2 "Software / Services"

On October 30th, the participants in Task Area 2 gathered to discuss and define the direction of the developments in software, services and infrastructure. The participants bring diverse expertise from a broad range of fields, which fosters a lively exchange and yields a broad range of concepts and ideas to be discussed with the community. Christoph Garth is an expert in visualizing biological data in an HPC context, setting up workflows on large data sets and integrating them into the service landscape. Stefan Diesloch contributes deep insights into database design and the transformation of data. Björn Grüning is heavily involved in Galaxy, Bioconda, BioContainers, bioinformatics workflows, Elixir, EOSC(-life), EGI and Healthy Cloud. Heike Leitte is an expert on visualization, human-machine interaction, the use of Jupyter notebooks and machine learning. Sven Nahnsen brings in the QBIC insights on *omics workflows, portal design, Nextflow and metadata in the field of bioinformatics. Dirk von Suchodoletz brings in the provider's perspective on the technical aspects of research data management, service integration and general infrastructure. Jens Krüger, the head of TA2, is deeply involved in bwHPC (Baden-Württemberg HPC), the de.NBI Cloud, the Tübingen Machine Learning Cloud, EOSC-life, Healthy Cloud and the GHGA NFDI, and participated in MoSGrid. Klaus Rechert contributes his expertise in long-term access to digital objects, such as software preservation, long-term strategies and the archiving of mixtures of software and data.

Like the overall project, TA2 has to cope with a budget cut, which means rearranging the work packages and milestones. The personnel will be hired once the consortial contract is signed by all co-applicants, which is expected to happen soon. For the architectural framework of the software and services, building blocks such as the envisioned portal, a metadata registry, a data versioning service (based on Git and S3 storage provided by de.NBI Cloud and bwSFS), IsaTab, SWATE and the annotation of data sets were briefly discussed. To deepen the conversation and mutual understanding, there will be regular one-hour online meetings every second Friday, starting with community needs and a first draft of the ARC concept, followed by an introduction to the planned portal. Further meetings will deal with data versioning and the required underlying infrastructure, ideas and concepts for a metadata registry, workflow systems (Galaxy, Nextflow), federated storage and interfaces to it (data publication, Invenio), data modelling (conversion, machine learning, data quality) and data presentation (visualisation and ontologies). At a certain point, the topic of sustainable long-term access will become relevant (format analysis, risks, migration, software preservation, license management and preservation planning). DataPLANT's three-pillar model, consisting of standardization, software and services, and direct community support through data stewards, envisages an agile approach with gradual steps for planning and development, including regular community interaction. A starting point is the first outline of the ARC and the feedback from the call for ARCs, which will be compiled and discussed in the upcoming weeks.

DataPLANT is happy to announce a project coordinator

The DataPLANT NFDI was able to win over Cristina, a highly skilled researcher about to graduate in fundamental plant research, as project coordinator. She will help advance DataPLANT's mission of combining the expertise of basic plant research, information and computer science, and infrastructure specialists in order to support plant scientists in handling research data in a customized way.

She coordinates the DataPLANT consortium within the NFDI, which is concerned with supporting research data management in basic plant research. She is in close contact with the research community and the speakers of the consortium, and she will interact closely with thematically related NFDI consortia and the NFDI Directorate. To this end, she brings a wealth of experience in the plant community to advance collaboration with various stakeholders in Germany and internationally. She also coordinates the governance of DataPLANT and assists the “Data Stewards” in supporting the researchers in the individual groups. In addition, she represents the spokespersons in committees and associations at the federal and European level. Her profile: She holds a Master’s degree in plant sciences with a thesis on “Physiological and biochemical analyses of sugar transporter-dependent cold response of Arabidopsis thaliana”. Her further career as a plant physiologist began when she joined the department of Prof. Neuhaus, where she started her PhD as a member of the BMBF-funded Betahiemis project.

She already possesses excellent skills in the management of scientific projects. In the course of her research activities, she acquired knowledge in the management of large data sets and experience in scientific data analysis and data visualization. She has already come into contact with topics relevant to DataPLANT, such as ontologies, data containers and FAIR data. She has a wide range of publishing experience as well as experience in fund acquisition, third-party funding management and writing project reports. She is also familiar with committee work and the organization of conferences and workshops.

Cristina will take up her post in November. Further open positions in the DataPLANT NFDI, which will appear in the coming weeks, can be found on this web page, and positions in the context of the general NFDI are posted at the NFDI webpage at