DataPLANT will engage further with InvenioRDM for Data Publication
- Published on Sunday, 16 May 2021
DataPLANT envisions a much stronger role for data publications in the future. To make data publication possible, DataPLANT is preparing to use InvenioRDM, a modern, state-of-the-art framework. InvenioRDM is funded primarily through CERN, using both structural funding and grant money. Invenio is a partner-driven endeavor to answer data publication needs, and collaborating on InvenioRDM helps deliver a cheaper and better product. InvenioRDM works as an open collaboration without formal project agreements between partners. Instead, it relies on a governance model and a code of conduct as the basis for the collaboration. Legally, it requires that the source code is released under an open source license and that contributors retain their copyright. Invenio is open to onboarding new partners. DataPLANT plans to become a partner through the University of Freiburg and will commit resources to Invenio in the form of two person-months, focusing on testing, documentation, publication metadata, and outreach into the plant community. The DataCite metadata schema forms the core of the metadata model. Additionally, extensions for scientific metadata are possible.
In its current state of development, Invenio includes a comfortable user interface and an OAI-PMH interface for indexing and exchanging metadata with other systems. Invenio already includes the necessary interfaces for ORCID authentication and DOI/DataCite. It will use the bwSFS Storage for Science service as its storage backend. This storage system is distributed across four geographically separate locations at the computer centers in Tübingen and Freiburg that participate in DataPLANT. For the storage of the actual datasets (the envisioned annotated research context), InvenioRDM natively supports the S3 protocol and can therefore connect to the object storage infrastructure of bwSFS. In its implementation, Invenio takes advantage of S3 and acts as a broker: it hands out pre-signed URLs during data transfers so that clients connect directly to the object storage. This directly leverages the high availability and performance of the underlying bwSFS infrastructure.
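The broker idea behind pre-signed URLs can be sketched in a few lines. This is a simplified illustration of the general principle, not the code used by InvenioRDM or bwSFS: real S3 pre-signed URLs use AWS Signature Version 4 with query parameters such as `X-Amz-Expires` and `X-Amz-Signature`, whereas this sketch uses a plain HMAC over method, path, and expiry. The secret key, the host `s3.example.org`, and the function names `presign` and `verify` are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values for illustration only.
SECRET_KEY = b"demo-secret-known-to-broker-and-storage"
STORAGE_BASE = "https://s3.example.org"


def _signature(method: str, path: str, expires: int) -> str:
    """HMAC-SHA256 over the parts of the request that the URL authorizes."""
    message = f"{method}\n{path}\n{expires}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def presign(method: str, bucket: str, key: str, ttl: int = 300,
            now: Optional[int] = None) -> str:
    """Broker side: build a URL valid for one method on one object until expiry."""
    issued = int(time.time()) if now is None else now
    expires = issued + ttl
    path = f"/{bucket}/{key}"
    query = urlencode({"expires": expires,
                       "signature": _signature(method, path, expires)})
    return f"{STORAGE_BASE}{path}?{query}"


def verify(method: str, url: str, now: Optional[int] = None) -> bool:
    """Storage side: accept only an unexpired URL with a matching signature."""
    current = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        given = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if current > expires:
        return False
    expected = _signature(method, parsed.path, expires)
    # Constant-time comparison avoids leaking signature prefixes via timing.
    return hmac.compare_digest(given, expected)
```

In this scheme the broker never relays the dataset itself. It only issues a short-lived URL, and the client then uploads or downloads directly against the object storage, which checks the signature on its own.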
The extensive REST API of InvenioRDM offers additional possibilities for integration with third-party systems such as ORCID, enabling further workflows that are currently under discussion. Initial workflows have already been evaluated, such as the automatic creation of publication drafts from different systems. These include code versioning platforms such as GitLab or GitHub for new code releases, other compute environments such as HPC or Galaxy after completion of jobs, or the envisioned science gateway in the form of the DataPLANT Hub. For the allocation of digital object identifiers (DOI) in Invenio, established services of the participating university libraries will be leveraged.
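As a rough sketch of what automatic draft creation from a code release could look like, the following builds (but does not send) an HTTP request for a new draft. It assumes that drafts are created with `POST /api/records` using a bearer token and a DataCite-like JSON payload; the exact endpoint and required fields should be checked against the REST API documentation of the deployed InvenioRDM version. The `release` dictionary, the host name, and the helper names are hypothetical.

```python
import json
from urllib.request import Request


def release_to_metadata(release: dict) -> dict:
    """Map a (hypothetical) code release description to a draft payload."""
    creators = [
        {"person_or_org": {"type": "personal",
                           "given_name": given,
                           "family_name": family}}
        for given, family in release["authors"]
    ]
    return {
        # Assumed payload layout; verify against the target instance.
        "access": {"record": "public", "files": "public"},
        "files": {"enabled": True},
        "metadata": {
            "title": f'{release["repository"]} {release["tag"]}',
            "publication_date": release["date"],
            "resource_type": {"id": "software"},
            "creators": creators,
        },
    }


def build_draft_request(base_url: str, token: str, release: dict) -> Request:
    """Prepare the POST request that would create a publication draft."""
    body = json.dumps(release_to_metadata(release)).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/api/records",  # assumed endpoint
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical release event, e.g. received from a GitLab or GitHub webhook.
example_release = {
    "repository": "demo-tool",
    "tag": "v1.0.0",
    "date": "2021-05-16",
    "authors": [("Ada", "Lovelace")],
}
request = build_draft_request("https://invenio.example.org/", "TOKEN",
                              example_release)
# Sending is left out on purpose: urllib.request.urlopen(request) would
# submit the draft to a real instance.
```

Keeping request construction separate from sending makes the mapping from release data to publication metadata easy to test without a running repository.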
DataPLANT presented at E-Science-Tage
- Published on Tuesday, 09 March 2021
Once again, the E-Science Days were a complete success. Alongside the other fortunate consortia that successfully mastered the first round of funding, we were present with numerous contributions. Benedikt Venn (@BenediktVenn) used his poster to explain what an ARC (Annotated Research Context) actually is.
Jens Krüger, one of the co-speakers of DataPLANT, used his tech talk to go into more detail about the Metadata ToolChain that DataPLANT offers so far.
Furthermore, our speaker Dirk von Suchodoletz, with significant support from the professional poster design by our project coordinator Cristina Martins Rodrigues, put DataPLANT on the winners' podium with the poster about our Data Steward concept. First place went to Christian Schmidt (@SchmChristian) with his poster about NFDI4BioImage, a consortium we look forward to collaborating with in the future. Fingers crossed! Our sister project BioDATEN came third.
To conclude the event, Cristina Martins Rodrigues (@C_MRodrigues), our project coordinator, presented DataPLANT as one of nine scientific consortia of the NFDI during its workshop. The cross-cutting topics to be addressed jointly by all NFDI consortia seemed especially relevant to the audience. It remains to be seen what the NFDI landscape will look like after the completion of the 3rd funding round. We are excited!
DataPLANT presented at NFDI-Talks online series
- Published on Tuesday, 02 March 2021
In the second event of the NFDI Talks online series, DataPLANT presented considerations on project sustainability and community expansion, since the total number of researchers covered by the NFDI in Germany should expand steadily during the formation process. Additionally, the consortia should provide concepts for a long-term perspective after the first funding phase. The ongoing procedures for the formation of new consortia as well as the activities within the already funded consortia contribute to this. A central task of the consortia is to extend their visibility to the entirety of their research communities. Only in this way can they credibly represent their respective fields and establish common standards and concepts. One of the success criteria for the funded consortia is to reach their own community as completely as possible and to network with other research areas. To this end, a coordinated onboarding process should be agreed upon that defines how new members formally join a consortium and how they are integrated into the governance structure.
In addition, a long-term perspective of financial sustainability should be developed. This can be achieved in part through the design of future funding. A complementary path lies in greater involvement of the entire scientific community through the application process for DFG funding and the joint design of data management plans. Consultation with the NFDI could ensure that new projects work according to current workflows, with modern tools and jointly established standards, and that this is supported in the project process by the NFDI. Such support should function independently of the concrete funding progress of the individual NFDI consortia and the initial group of co-applicants and participants. For practical implementation, this could be realized, for example, through a co-applicant role of the respective technical consortium. This avoids questions about potential VAT liability of the supporting consortium through personnel and infrastructural contributions. (DFG: "As far as the financial side is concerned, the point to be clarified in particular is the extent to which consortia can offer services in return for payment without becoming liable to VAT and thus losing their non-profit status.")
Strategic considerations of sustainable development include recommendations and guidelines that should be given to future research and collaborative projects. Specialized science projects in virtually all disciplines typically require advice and support in the various aspects of data management, in handling tools and workflows, and in technical infrastructure. To be able to respond flexibly in this regard, a significant number of staff should be allocated to support these communities as data stewards. The DataPLANT consortium already arranges this support for the co-applicants according to their needs, but research groups joining later should also benefit from the NFDI. Therefore, it needs to be discussed what conditions and commitments should be attached to inclusion and, in particular, how the human and infrastructural resources needed to support the new groups can be generated.
A conceivable approach for the future would be that newly submitted research proposals can apply for funds for support services in the form of personnel or infrastructure, comparable to INF subprojects in SFBs. If approved, these funds would flow directly to the NFDI or the assigned consortium. This approach handles partial positions particularly well, since small projects cannot always apply for full positions for data management. It also ensures the sustainability and continuity of the experts employed by the NFDI. Recruitment and training of suitable personnel and follow-up employment after the end of the project are often unsolved challenges in the science enterprise, especially for a demanding range of tasks such as those of data stewards.
Participation of the researchers and involvement of the communities in the design of such processes can be ensured via the committees in the individual consortia and the NFDI association. In this way, such an approach can be coordinated with the relevant stakeholders, so that a clear procedure can be established without setting up duplicate structures. For example, DataPLANT began working with TRR175 and the Cluster of Excellence CEPLAS on close cooperation and coordination right after its start. Here, a close linkage with the INF project of the CRC and the data stewards of CEPLAS has already been established. Such considerations could potentially be expanded and further developed step by step for the entire NFDI.