DataPLANT News

Kick-Off Task Area 2 "Software/Service"

30 Oct 2020

On October 30th, the participants in Task Area 2 (TA2) gathered to discuss and define the direction of the developments in software, services and infrastructure. The participants bring diverse expertise from a broad range of fields, which fosters a lively exchange and yields a wide variety of concepts and ideas to be discussed with the community. Christoph Garth is an expert in visualizing biological data in an HPC context, setting up workflows on large data sets and integrating them into the service landscape. Stefan Dessloch contributes deep insights into database design and data transformation. Björn Grüning is heavily involved in Galaxy, Bioconda, BioContainers, bioinformatics workflows, ELIXIR, EOSC(-life), EGI and Healthy Cloud. Heike Leitte is an expert on visualization, human-machine interaction, the use of Jupyter notebooks and machine learning. Sven Nahnsen brings in the QBiC insights on *omics workflows, portal design, Nextflow and metadata in the field of bioinformatics. Dirk von Suchodoletz brings in the provider's perspective on the technical aspects of research data management, service integration and general infrastructure. Jens Krüger, the head of TA2, is deeply involved in bwHPC (Baden-Württemberg HPC), the de.NBI Cloud, the Tübingen Machine Learning Cloud, EOSC-life, Healthy Cloud and the GHGA NFDI consortium, and participated in MoSGrid. Klaus Rechert contributes his expertise in long-term access to digital objects, including software preservation, long-term strategies and archiving of combined software and data.

Like the overall project, TA2 has to cope with a budget cut and therefore has to rearrange its work packages and milestones. Personnel will be hired once the consortium agreement has been signed by all co-applicants, which is expected to happen soon. Regarding the architectural framework of the software and services, building blocks such as the envisioned portal, a metadata registry, a data versioning service (based on Git and S3 storage provided by the de.NBI Cloud and bwSFS), ISA-Tab, SWATE and the annotation of data sets were briefly discussed.

To deepen the conversation and mutual understanding, there will be regular one-hour online meetings every second Friday, starting with community needs and a first draft of the ARC concept, followed by an introduction to the planned portal. Further meetings will deal with data versioning and the required underlying infrastructure, ideas and concepts for a metadata registry, workflow systems (Galaxy, Nextflow), federated storage and interfaces to it (data publication, Invenio), data modelling (conversion, machine learning, data quality) and data presentation (visualisation and ontologies). At a later point the topic of sustainable long-term access will become relevant (format analysis, risks, migration, software preservation, license management and preservation planning).

DataPLANT's three-pillar model consists of standardization, software and services, and direct community support through data stewards. For planning and development it envisages an agile approach with gradual steps and regular community interaction. A starting point is the first outline of the ARC, together with the feedback from the call for ARCs, which is to be compiled and discussed in the upcoming weeks.