DataPLANT will engage further with InvenioRDM for Data Publication

16 May 2021

DataPLANT envisions a much stronger role for data publications in the future. To make data publication possible, DataPLANT is preparing to use InvenioRDM, a modern, state-of-the-art framework. InvenioRDM is funded primarily through CERN, using both structural funding and grant money. Invenio is a partner-driven endeavor to answer data publication needs, and collaborating on InvenioRDM helps to achieve a cheaper and better product. InvenioRDM works as an open collaboration without formal project agreements between partners. Instead, it relies on a governance model and a code of conduct as the basis for the collaboration. Legally, it requires that the source code is released under an open source license and that contributors retain their copyright.

Invenio is open to onboarding new partners. DataPLANT plans to become a partner through the University of Freiburg and will commit resources to Invenio in the form of two person-months, focusing on testing, documentation, publication metadata, and outreach into the plant community. The DataCite metadata schema is the core basis for the metadata model. Additionally, extensions for scientific metadata are possible.
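To give a rough idea of what a DataCite-based metadata model with scientific extensions could look like, the following sketch checks a record for the six mandatory properties of the DataCite Metadata Schema 4.x (identifier, creator, title, publisher, publication year, resource type). The dictionary layout, the key names, the placeholder DOI, and the "extensions" block with its plant-specific field are assumptions made for illustration; they are not the actual InvenioRDM or DataPLANT data model.

```python
# Mandatory properties of the DataCite Metadata Schema 4.x.
# The key spelling used here is an illustrative assumption.
MANDATORY = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)


def missing_mandatory(record: dict) -> list:
    """Return the mandatory DataCite properties that are absent or empty."""
    return [name for name in MANDATORY if not record.get(name)]


# Hypothetical example record; the DOI is a placeholder, not a registered one.
record = {
    "identifier": {"identifier": "10.1234/example", "identifierType": "DOI"},
    "creators": [{"name": "Doe, Jane"}],
    "titles": [{"title": "Example plant phenotyping dataset"}],
    "publisher": "Example Repository",
    "publicationYear": "2021",
    "resourceType": {"resourceTypeGeneral": "Dataset"},
    # Hypothetical extension block for scientific (domain-specific) metadata.
    "extensions": {"plant:species": "Arabidopsis thaliana"},
}

if __name__ == "__main__":
    print("Missing mandatory properties:", missing_mandatory(record))
```

Keeping the domain-specific fields in a separate block leaves the DataCite core untouched, so the record stays exchangeable with systems that only understand the core schema.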

Invenio

In its current state of development, Invenio includes a comfortable user interface and an OAI-PMH interface for indexing and exchanging metadata with other systems. Invenio already includes the necessary interfaces for ORCID authentication and for DOI registration via DataCite. It will use the bwSFS Storage for Science service as its storage backend. This storage system is distributed over four geographically separate locations at the computer centers in Tübingen and Freiburg that participate in DataPLANT. For the storage of the actual datasets (the envisioned annotated research context), InvenioRDM natively supports the S3 protocol and can therefore connect to the object storage infrastructure of bwSFS. In its implementation, Invenio takes advantage of S3 and acts as a broker: it hands out pre-signed URLs for data transfers, so that clients connect directly to the object storage. This directly leverages the high availability and performance of the underlying bwSFS infrastructure.
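The broker pattern behind pre-signed URLs can be sketched in a few lines. The example below is a deliberately simplified illustration and not the actual S3 scheme: real S3 pre-signed URLs use AWS Signature Version 4 with query parameters such as X-Amz-Signature and X-Amz-Expires. The endpoint, the shared secret, the bucket and key names, and the functions `presign` and `verify` are all hypothetical. What it shows is the principle: the broker signs a time-limited URL, the client talks to the storage directly, and the storage checks signature and expiry without contacting the broker.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values for illustration only.
STORAGE_ENDPOINT = "https://s3.example.org"
SHARED_SECRET = b"demo-secret-known-to-broker-and-storage"


def _signature(method: str, path: str, expires: int) -> str:
    """HMAC-SHA256 over method, object path, and expiry timestamp."""
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SHARED_SECRET, message, hashlib.sha256).hexdigest()


def presign(method: str, bucket: str, key: str,
            ttl_seconds: int = 900, now: Optional[float] = None) -> str:
    """Broker side: return a time-limited URL the client can use directly."""
    issued = time.time() if now is None else now
    expires = int(issued) + ttl_seconds
    path = f"/{bucket}/{key}"
    query = urlencode({
        "expires": expires,
        "signature": _signature(method, path, expires),
    })
    return f"{STORAGE_ENDPOINT}{path}?{query}"


def verify(method: str, url: str, now: Optional[float] = None) -> bool:
    """Storage side: accept only unexpired URLs with a matching signature."""
    current = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        supplied = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if current > expires:
        return False
    expected = _signature(method, parsed.path, expires)
    return hmac.compare_digest(supplied, expected)


if __name__ == "__main__":
    url = presign("PUT", "arc-bucket", "datasets/arc-001.zip")
    print(url)
    print("accepted by storage:", verify("PUT", url))
```

Because the signature covers the HTTP method, the object path, and the expiry time, a URL issued for uploading one object cannot be reused to read it, to touch another object, or after the time limit has passed. The payload itself never flows through the broker.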

The extensive REST API of InvenioRDM offers additional possibilities for integration with third-party systems such as ORCID, enabling further workflows that are currently under discussion. Initial workflows have already been evaluated, such as the automatic creation of publication drafts from different systems. These include code versioning platforms such as GitLab or GitHub for new code releases, other compute environments such as HPC or Galaxy after completion of jobs, or the envisioned science gateway in the form of the DataPLANT Hub. For the allocation of digital object identifiers (DOIs) in Invenio, established services of the participating university libraries will be leveraged.
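As an illustration of such a workflow, the sketch below prepares the HTTP request that would create a publication draft for a new code release. It assumes the draft-creation endpoint `POST /api/records` with a bearer token, and a payload shape following the InvenioRDM REST API documentation; field names may differ between InvenioRDM versions. The host name, the token, the repository details, and the helper functions `release_to_draft_payload` and `build_draft_request` are hypothetical. The request is only constructed, not sent.

```python
import json
from urllib.request import Request

# Hypothetical instance URL; replace with the real deployment.
BASE_URL = "https://invenio.example.org"


def release_to_draft_payload(repo: str, tag: str, released_on: str,
                             family_name: str, given_name: str) -> dict:
    """Map a code release to a draft payload (assumed InvenioRDM layout)."""
    return {
        "access": {"record": "public", "files": "public"},
        "files": {"enabled": True},
        "metadata": {
            "title": f"{repo} {tag}",
            "publication_date": released_on,
            "resource_type": {"id": "software"},
            "creators": [{
                "person_or_org": {
                    "type": "personal",
                    "family_name": family_name,
                    "given_name": given_name,
                }
            }],
        },
    }


def build_draft_request(payload: dict, token: str) -> Request:
    """Build (but do not send) the request that would create the draft."""
    return Request(
        url=f"{BASE_URL}/api/records",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    payload = release_to_draft_payload(
        "example-tool", "v1.2.0", "2021-05-16", "Doe", "Jane"
    )
    request = build_draft_request(payload, token="REPLACE_WITH_API_TOKEN")
    print(request.get_method(), request.full_url)
    # To actually submit: urllib.request.urlopen(request)
```

A release hook on GitLab or GitHub, or a job-completion hook on an HPC or Galaxy system, could call the same two functions with its own metadata. The resulting draft would then remain unpublished until a person reviews and publishes it.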