Reply to the review panel's protocol sent to the DFG

Today the DataPLANT consortium sent its three-page reply to the reviewers' remarks and used the opportunity to clarify a couple of open questions. We received quite positive feedback on our concept of data stewards as a central support element for the fundamental plant research community. This core element makes the training and recruitment of suitable people a central task of the DataPLANT consortium. Successfully training Data Stewards requires basic skills in the standardized handling, modeling, and analysis of data, as well as a solid domain understanding of quantitative plant biology, especially the methods used in this research field. In addition, practical skills in handling different software packages, analysis and workflow tools, as well as in teaching and training, are required. To ensure the training of Data Stewards from the start of DataPLANT, we propose a strategy that mixes immediate and long-term measures. Corresponding training and further education opportunities are already included in the curricula of the applicant institutions. It is also planned to make existing degree programs (e.g. in the field of data science) more flexible so that the basic training of Data Stewards can be provided within their framework. The development of tailor-made degree programs is also conceivable in the long term. With regard to both measures, the applicants were able to secure the support of the partner universities in preliminary discussions. For example, the Quantitative Plant Biology degree program at CEPLAS is a valuable addition for the recruitment of personnel.

The review panel rightly suggests that the services to be developed in DataPLANT should be flexible enough to ensure sustainability. A central goal in the design of the DataPLANT Hub is to provide flexible, demand-oriented, and sustainable support for research data management throughout the entire research cycle. An essential aspect of the Hub is the abstraction of workflows using the Common Workflow Language (CWL), which is widely used across scientific fields, enabling researchers to execute workflows on any platform and avoiding the dependencies mentioned. The intended annotation of workflows with metadata, as well as the quality measures for workflows yet to be developed, act as additional abstraction mechanisms. Currently, Galaxy is a mature and widely used workflow platform on which the DataPLANT Hub and other measures of the consortium can be based at an early stage. Nevertheless, a sole dependency on Galaxy is actively avoided through abstraction and the open design of the DataPLANT Hub.
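To illustrate the kind of platform independence CWL provides, a minimal tool description might look as follows. This is a purely hypothetical sketch, not part of any DataPLANT codebase; the tool name and file names are invented for illustration:

```yaml
# Hypothetical minimal CWL tool wrapper (illustrative only).
# Any CWL-conformant engine -- e.g. cwltool, Galaxy, or Toil --
# can execute the same description unchanged, which is the
# platform independence referred to above.
cwlVersion: v1.2
class: CommandLineTool
doc: Count the lines of an input file (stand-in for a real analysis step).
baseCommand: [wc, -l]
inputs:
  input_file:
    type: File
    inputBinding:
      position: 1
outputs:
  line_count:
    type: stdout
stdout: line_count.txt
```

The `doc` field hints at the metadata annotation mentioned above: because the description is declarative, engines and registries can read such annotations without executing the tool.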

DataPLANT aims to mirror the fundamental plant research community across all status groups in research and provider institutions. In order to ensure diversity, DataPLANT will implement the following multi-track strategy: 1) continuous onboarding of new participants and active recruitment of suitable individuals, with a special focus on diversity, from the growing plant community and beyond; 2) an exemplary role in the design of the initial governance: the governance bodies will be broadly staffed from the group of participants with respect to diversity, especially regarding career level, reputation, and gender balance, and we are also able to involve very experienced and proven scientists from whose experience with Clusters of Excellence DataPLANT will benefit; 3) development of promotion and qualification opportunities (especially in the context of Data Steward training) that allow participants at early career stages to get involved in DataPLANT early on and take on responsibility; 4) active use of the explicit diversity measures of the participating institutions.

Like the review panel, we consider the operating and financing model of DataPLANT to be a major challenge for long-term sustainability. While the services are initially offered free of charge to participants through project financing, a permanent offering requires relating demand to the costs incurred in refinancing resources. Supporting individual researchers and research associations in applying for research data management funding is a key aspect of DataPLANT. Our vision is that the mechanisms developed in DataPLANT will establish themselves sustainably in the community. DataPLANT brings its ideas and efforts to the overarching NFDI structure (e.g. via the cross-cutting topics) and is itself open to being integrated into the emerging frameworks and approaches of the NFDI. We are in continuous exchange with other consortia on these topics. First approaches to solutions already result from the integration of the providers into collaborative national and international infrastructures.

DataPLANT advised on a grant application and discussed compensation models with participants

The NFDI is starting to shape the future scientific landscape and should be taken into account when planning research projects and cooperations. Together with a participant, the “Plant Genome and Systems Biology” group at the Helmholtz-Zentrum München, DataPLANT discussed possible forms of support regarding both infrastructure services and consulting. Such cooperations would require agreed-upon compensation models. Both parties acknowledged the challenges of designing cost compensation models that comply with non-profit/public-benefit requirements and can be integrated into the central NFDI governance structure.

Calculating costs and refinancing becomes unavoidable at some point to allow sustainable cooperation. Different funding streams need to be taken into account. Every institution has funds to pay for commercial third-party services and consulting, but as a research institution it is nearly impossible to properly receive such funds. Direct flows of money in a consortium of differently organized and funded research institutions are an issue to be iterated on and solved via the corresponding cross-cutting topic. The data steward services can be clearly accounted for; thus, it could be an option to use vouchers or coupons for services in exchange for redirected financing via a grant application. An endorsement model could be established to foster such developments, in which research funding agencies see the NFDI as a service broker: the NFDI structure ensures good scientific practice by offering certified services, which are in turn applied for by research groups, and the funding agencies divert a certain amount of support to the NFDI corresponding to the equivalent of the services requested. Base-level funding, e.g. through universities or the NFDI, would compensate for consultation on unsuccessful applications. Such options need to be discussed and developed together with all stakeholders within DataPLANT, at the general NFDI level, and in the appropriate political sphere. Efficient ways of reimbursement without overhead and bureaucracy are needed to offer a sustainable model for long-term cooperation and viability. The concept outlined at the DFG NFDI workshop at the end of August last year, including the idea of the NFDI as a registered society, could be a feasible path toward a non-profit-oriented operating model for the individual consortia.

DataPLANT receives positive feedback from the DFG, but work still lies ahead

DataPLANT was present at the 3rd NFDI Community Workshop - Services for Research Data Management in Neuroscience, hosted on 10 February at Ludwig Maximilians University Munich, to exchange ideas with other NFDI consortia and discuss common topics like infrastructure provisioning. NFDI4Neuro as well as NFDI4BIMP intend to submit their proposals in this year's round of applications. The main topic of this community workshop was the exchange on infrastructure for the NFDI, with a particular focus on the providers' perspective.

The providers' perspective in the NFDI was elaborated by the BioDATEN science data center. Providers of research IT infrastructures face significant technological changes, fostered especially by resource virtualization. Many modern services and workflows are operated in an increasingly cloud-like fashion, where data and compute move into centralized, aggregated resources. Such shared resources allow new projects to be hosted faster. The necessary excess capacity is much easier to maintain and justify in centralized resources, as the shared overhead is typically much lower than in independent systems. Grant providers are starting to understand the changed technological landscape and to adapt their funding schemes, allowing buy-in into existing resources in preference to establishing individual ones for each new project. Users face requirements that are difficult to forecast: it is often impossible for them to define exactly the “right” configuration of a required resource (the sizing challenge). These challenges are addressed by IT-industry and science-driven cloud and aggregated HPC offerings. Aggregating resources into larger ones concentrates the increased effort of market analysis, system selection, proper procurement, and operation of (large-scale) IT infrastructures onto a few experts. Furthermore, such a strategy would eliminate the contradiction between typical project runtimes on the one hand and the (significant) delay of equipment provisioning and the usual depreciation periods of that equipment on the other.

The massive changes in the IT landscape and in user expectations increase the pressure on university (scientific) computing centers to reorient themselves. For many of them, cooperation is the chance to significantly widen their scope of IT services. It helps them keep up with demand from the scientific communities and offer a reasonably complete service portfolio. Organizationally, it allows for specialization and community focus. When defining future strategies and operating models, they might find a new purpose in supporting research data management by providing efficient infrastructures and consultation to the various scientific communities. Furthermore, it offers them the opportunity to participate in infrastructure calls. For researchers, these developments offer the offloading of non-domain-specific tasks and services. Suitable governance structures need to be implemented to ensure the lasting relevance of future computing centers through user participation and feedback loops. Close cooperation and consultation (as already practiced in Freiburg for the bwForCluster NEMO and the storage infrastructure bwSFS) helps all stakeholders to obtain suitable, up-to-date infrastructures tailored to their needs. Such structures are still in their infancy in the NFDI, but future NFDI-wide coordination should advance this topic.

The financing of IT infrastructures for the various scientific communities is often grant-driven and therefore inherently unsuitable for sustainable long-term services and research data management. The future should see the flow of funding change from a simple project-driven and organization-centered practice to demand-driven streams to different providers. Large infrastructure initiatives like de.NBI or the NFDI need not only to resolve the question of personnel employment (permanent vs. project-based) but also to define suitable business and operating models compatible with VAT regulations and the federal and state requirements for cash flows in mixed consortia.