DMP template

From EERAdata Wiki
Revision as of 11:23, 8 February 2023 by Valerias (talk | contribs) (Template for a DMP)

What is a Data Management Plan (DMP) about?

Data management is part of project management. It concerns both the content data handled in the work packages and the administrative data created in the project management process. In this sense, it extends to documents used for management, communication, dissemination and exploitation. The DMP is the place where the consortium decides on FAIR and open data practices (e.g., depth of FAIRification, standardized vocabulary to be used, data access protocols, etc.).

The data management plan is a living document that is updated during the project. In its initial form, it presents an agreement among the partners about data management practices. It is revised whenever new practices are enacted, e.g., when new forms of data processing are established.

The data management plan ensures a smooth workflow for data governance; in particular, it guarantees common standards among the partners that handle data. At the same time, the Horizon Europe template stresses the importance of the FAIR principles both for data practices during the project and for the use of project results after the end of the project.

A DMP:

  • identifies all relevant types of data in a project
  • encompasses data created, sourced, collected, processed and published in the project
  • covers data handled as research data in the project as well as data about the project, such as project meeting minutes, etc.
  • assigns access rights, licenses, and optionally versioning to all datasets
  • establishes a joint data vocabulary for datasets
  • ensures a smooth workflow for data within work packages, among work packages and to stakeholders outside of the project consortium
  • establishes responsibilities among partners
  • checks if specific measures on data privacy and security (GDPR) apply
  • checks if regulations on intellectual property rights (IPR) apply

Typical structure of a DMP:

  • Introduction: The purpose of the DMP is specified, its relation to other deliverables (incl. Grant Agreement), and the revision strategy.
  • Overview on data flows in the project:
  1. providing administrative information about the project, partners, and third parties (e.g., collecting ORCIDs);
  2. providing an overview of data sets in the project (e.g., creating a data flow diagram, collecting information about data sets in a table);
  3. collecting information on how to re-use and share data (e.g., in a table).
  • Overview on data curation in the project:
  1. providing information on the intended FAIRification effort, detailing the FAIRification workflow that the project wants to follow (e.g., how to address metadata & vocabularies, licensing, and IPR). As a minimum the Dublin Core Standard should be followed. For an overview of ontologies in a field see LOV - Linked Open Vocabulary.
  2. agreement on the handling of data in the project and outside (e.g., restricted/non-restricted data access as collected in the exemplary table below). As a default, all datasets connected to deliverables which have dissemination level PU (public) should be assigned a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
  3. providing information on how IPR is addressed (see the exemplary figure below)
  • Overview on the allocation of resources for data management and FAIRification: e.g., compilation in a table that details the staff responsible for data governance at each partner
  • Overview on issues related to data security and ethics
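The dataset table and the default licensing rule described above can be sketched in code. The following is a minimal illustration, not part of the EERAdata template: the class `DatasetRecord`, the helper names, and the choice of which Dublin Core elements to map are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Default stated in the text: datasets tied to PU (public) deliverables
# are assigned CC BY-SA 4.0 unless something else has been agreed.
DEFAULT_PUBLIC_LICENSE = "CC BY-SA 4.0"


@dataclass
class DatasetRecord:
    """One row of a hypothetical project dataset table."""
    title: str
    creator: str
    description: str
    data_format: str
    dissemination_level: str       # "PU" = public; other codes are project-specific
    license: Optional[str] = None  # None means no license has been assigned yet


def assign_default_license(record: DatasetRecord) -> DatasetRecord:
    """Fill in the default license for public datasets that have none."""
    if record.dissemination_level == "PU" and record.license is None:
        record.license = DEFAULT_PUBLIC_LICENSE
    return record


def to_dublin_core(record: DatasetRecord) -> dict:
    """Map the record to a small subset of Dublin Core elements."""
    return {
        "dc:title": record.title,
        "dc:creator": record.creator,
        "dc:description": record.description,
        "dc:format": record.data_format,
        "dc:rights": record.license or "not assigned",
    }
```

Keeping the license optional makes unlicensed datasets visible in the table instead of silently defaulting them, so only datasets marked PU pick up the default.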

Developing a DMP

Identify stakeholders - Example: EERAdata

The table below aligns EERAdata with the stakeholders listed in Miksa et al. 2019, together with the definitions of stakeholders given there.

| Stakeholder | Definition in Miksa et al. 2019 | Specs in EERAdata |
|---|---|---|
| Funder | funding agencies and foundations that specify requirements for DMPs and monitor compliance | H2020 program, so our reports and deliverables should be included in the DMP. Link to the other projects funded (through the CORDIS portal). |
| Ethics review | IRBs/REBs that authorize human subjects research | Our "new" deliverables should be included here and the agencies who use them. |
| Legal expert | technology transfer offices; copyright and patent lawyers | Name our institution's legal experts here? Links to documents: GA, CA, Project Management Handbook. |
| Researcher | Principal Investigator and collaborators, including postdoctoral researchers and graduate and undergraduate students | All consortium with ORCID and ResearchGate? |
| Publisher | purveyors of article and data publication services | ??? Link to publications of EERAdata, with DOIs. |
| Repository operator | general (e.g., Zenodo), disciplinary (e.g., GenBank, ICPSR), and institutional data repositories | Project and post-project hosts of the EERAdata platform, wiki, etc. (EERA; AIT; ENEA); additional EERAdata repository at GitHub. During the project: OnlyOffice. |
| Infrastructure provider | providers of systems for creating DMPs (DMPTool, DMPonline), grants administration, researcher profiles, etc. | ??? Same as hosts for the repositories? In case we use the DMP generation template, e.g. from TU WIEN. Not yet working. |
| Research support staff | data managers/curators, research administrators, and data librarians | Wider EERAdata consortium with links to admin staff. |
| Institutional administrator | office of research/sponsored programs, chief information officers, university librarians, others | H2020 EU portal, project officer. |

Abbreviations: DMP, data management plan; ICPSR, Inter-university Consortium for Political and Social Research; IRB, institutional review board; maDMP, machine-actionable DMP; REB, research ethics board.

Collect relevant information

To collect all relevant information, it is useful to send a questionnaire to project partners and/or task leaders. The following list compiles questions that can be adapted as needed:

  • Which data formats do you use when publishing the results of your work, e.g. as supplementary material to publications (xlsx, csv, netcdf, tiff, jpg, docx, json, aiff, mp3, mp4, pdf, xml, RDF serializations, specialized proprietary data formats)?
  • Which (industry) standards do you suggest for describing the data in your WP?
  • Which data, if any, is strictly embargoed and cannot be disclosed even to the consortium?
  • What data can be published at the end of the project, e.g. as supplementary material to a publication? Indicate if you expect limitations to the use of this data. Which data will be openly available?
  • Where do you save your data? Do you have regular routines to create back-ups? Are you using cloud solutions for this?
  • Who will be responsible in your team for the overall data governance? Do you have routines in place to check data quality and data storage? Who will be the contact point for data requests?
  • Do you collect or process data that require specific GDPR treatment, e.g. data anonymization for survey data? Do you have to notify any institution about collecting personal data due to national laws? If so, what is the timeline for the process?

Template: questionnaire to project partners

This questionnaire is used to collect data requirements from partners. It was distributed after the kickoff workshop and will be followed up in discussions with individual partners. The answers form an integral part of the data management plan, which is the primary tool for understanding what data inputs, methods, and outputs are going to be involved in the [project name] project.

Please either fill it out or pass it along to the person in your organization best suited to completing it. Please complete this questionnaire by the end of September so that we can move on to the next steps in the creation of the data management plan.

  1. Please specify the [project name] partner institution you are answering these questions for:
  2. Who is responsible for data governance in your institution (a person who will engage in the [project name] project)? Please provide name and email.
  3. Do you have a deputy? Please provide name and email.
  4. Please give the name of the dataset
  5. Context of data

Please describe the type and formats of data input (data output) that is needed to carry out the task together with a short explanation or title describing the context and characteristics of the corresponding dataset. What is the origin of the data? Is it raw data? Is it processed data? (If so, what process was applied or what software has been used?)

Example: Table of numeric data in Excel with the following headers: name of material, chemical characterization, amount in kg, etc.

The dataset comprises cleaned data that are purchased from the material supplier Company XYZ.

  6. Standardization of data

Is the data connected to any standard, e.g., industry standards such as ISO codes or a standardized vocabulary? Please provide the name of the standard and a link to its documentation. E.g., the data aligns with ISO-Code XYZ, or the dataset uses the standardized vocabulary from IUPAC Standards Online (IUPAC, International Union of Pure and Applied Chemistry). If the standardization information is contained in an internal document, please send this document to xxx@institute.

Example: The dataset follows an internal vocabulary that has been sent to the WP6 coordinators.

  7. Use of the data in [project name]

Is the dataset critical for a specific procedure or task in [project name]? When is the dataset needed (or when will it be available)? Will the data be updated?

Example: The dataset is critical input to task Tx.x and will be needed in M24.

  8. Confidentiality of data

Please use this field to briefly flag whether the data can be shared with the public, within the consortium, or among selected partners only, or whether it is strictly confidential. Also, do you have a license specification for the dataset (e.g., CC BY)?

Example: The dataset can be shared within the consortium. A license has not been assigned.

  9. Do you have another dataset that you would like to submit information for?

If no, the questionnaire will end.

If yes, you will fill out this section again for the next dataset.
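Once answers come back, it helps to record them in a uniform structure so that gaps can be followed up with individual partners. The sketch below is one possible way to do this. The class names, field names, and the set of sharing levels are assumptions derived from the questions above, not part of the EERAdata template.

```python
from dataclasses import dataclass, field
from typing import List

# Sharing levels paraphrased from the confidentiality question above.
SHARING_LEVELS = {"public", "consortium", "selected partners", "strictly confidential"}


@dataclass
class DatasetAnswer:
    """Answers to the per-dataset questions (hypothetical field names)."""
    name: str
    context: str
    standard: str = ""
    use_in_project: str = ""
    sharing: str = ""
    license: str = ""


@dataclass
class PartnerResponse:
    """One partner's questionnaire, with any number of datasets."""
    institution: str
    contact_name: str
    contact_email: str
    datasets: List[DatasetAnswer] = field(default_factory=list)


def find_gaps(response: PartnerResponse) -> List[str]:
    """Return follow-up items for answers that are missing or incomplete."""
    gaps = []
    if "@" not in response.contact_email:
        gaps.append("contact email missing or malformed")
    if not response.datasets:
        gaps.append("no dataset described")
    for ds in response.datasets:
        if ds.sharing not in SHARING_LEVELS:
            gaps.append(f"{ds.name}: sharing level not specified")
        if not ds.license:
            gaps.append(f"{ds.name}: no license assigned")
    return gaps
```

For a dataset like the one in the confidentiality example (shareable within the consortium, no license assigned), `find_gaps` would report the missing license as the only follow-up item.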

Template for a DMP

A data management plan establishes guidelines to manage all data collected, created, processed and published in a project. It also regulates the mechanisms to be used at the end of the project to share and preserve the project’s data. To this end, the data management plan outlines all intended activities regarding data management and data governance. By its nature, the document is a living document, intended to be adapted and modified as required during the course of the project.

The template developed by the EERAdata project has the following structure:

  • Introduction: The purpose of the DMP is specified, its relation to other deliverables (incl. Grant Agreement), and the revision strategy.
  • Overview on data flows in the project:
  1. providing administrative information about the project, partners, and third parties (e.g., collecting ORCIDs);
  2. providing overview on data sets in the project (e.g., creating a data flow diagram, collecting information about data sets in a table);
  3. collecting information on how to re-use and share data (e.g., in a table).
  • Overview on data curation in the project:
  1. providing information on the intended FAIRification effort, detailing the FAIRification workflow that the project wants to follow (e.g., how to address metadata & vocabularies, licensing, and IPR). As a minimum the Dublin Core Standard should be followed. For an overview of ontologies in a field see LOV - Linked Open Vocabulary.
  2. agreement on the handling of data in the project and outside (e.g., restricted/non-restricted data access as collected in the exemplary table below). As a default, all datasets connected to deliverables which have dissemination level PU (public) should be assigned a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
  3. providing information on how IPR is addressed (see the exemplary figure below)
  • Overview on the allocation of resources for data management and FAIRification: e.g., compilation in a table that details the staff responsible for data governance at each partner
  • Overview on issues related to data security and ethics

Download a deliverable template: [1]

Options for automated workflows and acting on behalf

Review of H2020 project DMPs

List of project DMPs

| Project DMP and link | Purpose/type of project | Structure/elements | Notes |
|---|---|---|---|
| REEEM, D8.2 DMP | Output: Stakeholder Interaction Portal, a Pathways Diagnostic Tool and an Energy System Learning Simulation. DMP for "data collection to populate models, calibrate them, as well as allow for data exchange between different types of models and different partners". | TOC: Project info, authors, history of changes, project summary, about, principles & summary, 1. DMP checklist (data collection, documentation & metadata, ethics & legal compliance, storage & backup, selection and preservation, data sharing, responsibilities & resources, data project impact assessment), 2. Definition and Matter, 3. Links. | PDF, not an XML document. Not machine-actionable. |
| HYbuild DMP | Output: develop two innovative compact hybrid electrical/thermal storage systems for stand-alone and district-connected buildings. The DMP outlines how data are collected or generated by the HYBUILD project, in terms of how it will be organized, stored, and shared. It specifies which data will be open access and which will be confidential within the consortium, as far as it is possible to do so at this stage. The report has been developed following the Horizon 2020 guidelines (EC DG R&I, 2017) with additional guidance from the joint OpenAIRE and EUDAT webinar "How to write a Data Management Plan" (OpenAIRE and EUDAT, 2018). | TOC: Executive summary, acronyms & abbreviations, glossary, 1. Introduction (aims of project, relation with other project activities, structure, partner contributions), 2. Approach (data availability and open access, data storage & sharing), 3. Descriptions of datasets (template, plus 39 individual data set descriptions), 4. Conclusions, 5. References. | PDF, not an XML document. Not machine-actionable. |
| RESLAG D1.2 DMP | Four large-scale demonstrations to recycle steel slag are considered: extraction of non-ferrous high added metals; TES for heat recovery applications; TES to increase dispatchability of the CSP plant electricity; production of innovative refractory ceramic compounds. The DMP is to ensure the accessibility and intelligibility of the data that will be generated during the RESLAG project in order to comply with the Guidelines of the "Open Research Data Pilot" (annex II). | TOC: Executive summary, nomenclature, list of figures & tables, 1. Intro, 2. Metadata strategy & standardization, 3. Fact sheet (data set descriptions, data set metadata), 4. Data sharing, 5. Storage and preservation, conclusion, two annexes. | No maDMP; link to Zenodo repository. |
| PANTERA, D.15 DMP | Identify and implement initiatives aimed at raising the participation of EU countries in the needed R&I for developing technologies, systems and markets in support of the common energy market and the energy transition. DMP: final version of the (open) Data Management Plan for the PANTERA project in month 2 of the project. This Data Management Handling Plan investigates the appropriate methodologies and open repositories for data management and dissemination and tries to offer through open access as much information generated by the PANTERA project. | TOC: Abbreviations, executive summary, 1. Objective of the report, 2. Framework for DMP, 3. Data archiving and preserving infrastructure, 4. Datasets and publications for DMP, 5. Ethics Management Plan, 6. Conclusion, 7. References, 8. Annex (list of figures, tables, Ethics Manual, consent form, opt form, Privacy Policy). | No maDMP; final DMP already in month 2 (so it is actually never used). |

Sources to learn about DMPs

MaDMP diagram from RDA model

https://ddi-alliance.atlassian.net/wiki/spaces/DDI4/pages/7864356/Active+Data+Management+Plans+Team

DMP tools

Literature

  • Cynthia Hudson Vitale, Heather Moulaison Sandy (2019) Data Management Plans: A Review. DESIDOC Journal of Library & Information Technology 39, 322-328. https://doi.org/10.14429/djlit.39.06.15086
  • Miksa T, Simms S, Mietchen D, Jones S (2019) Ten principles for machine-actionable data management plans. PLoS Comput Biol 15(3): e1006750. https://doi.org/10.1371/journal.pcbi.1006750
  • Data management plans: time wasting or time saving?
  • Sarah Jones, Robert Pergl, Rob Hooft, Tomasz Miksa, Robert Samors, Judit Ungvari, Rowena I. Davis, and Tina Lee (2020) Data Management Planning: How Requirements and Solutions are Beginning to Converge. Data Intelligence 2:1-2, 208-219. https://doi.org/10.1162/dint_a_00043
  • Daniel Spichtinger (2022) Uncommon Commons? Creative Commons Licencing in Horizon 2020 Data Management Plans. International Journal of Digital Curation 17. http://www.ijdc.net/article/view/840
  • Cardoso, J., Castro, L.J., Ekaputra, F.J. et al. (2022) DCSO: towards an ontology for machine-actionable data management plans. J Biomed Semant 13, 21. https://doi.org/10.1186/s13326-022-00274-4

This document describes how to use the DMP Common Standard Ontology for creating a machine-actionable data management plan. It draws heavily from the example provided here.
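As a rough illustration of what "machine-actionable" means in practice, the core of a DMP can be serialized as structured data rather than as a PDF. The sketch below builds a small JSON document. Its field names (`dmp`, `contact`, `mbox`, `dataset`, `distribution`, `data_access`, `license_ref`) are modeled on the RDA DMP Common Standard as an assumption; it covers only a subset of what a valid maDMP needs and should be checked against the standard before use. The function name `build_madmp` and the example values are hypothetical.

```python
import json

# Access levels assumed for this sketch; verify against the
# controlled vocabulary of the standard you actually target.
DATA_ACCESS_LEVELS = {"open", "shared", "closed"}


def build_madmp(title, contact_name, contact_mbox, datasets):
    """Build a nested dict describing a DMP.

    `datasets` is a list of (dataset_title, data_access, license_url)
    tuples; license_url may be None if no license has been assigned.
    """
    dataset_entries = []
    for ds_title, data_access, license_url in datasets:
        if data_access not in DATA_ACCESS_LEVELS:
            raise ValueError(f"unknown data_access value: {data_access!r}")
        distribution = {"title": ds_title, "data_access": data_access}
        if license_url:
            distribution["license"] = [{"license_ref": license_url}]
        dataset_entries.append({"title": ds_title, "distribution": [distribution]})
    return {
        "dmp": {
            "title": title,
            "contact": {"name": contact_name, "mbox": contact_mbox},
            "dataset": dataset_entries,
        }
    }


if __name__ == "__main__":
    plan = build_madmp(
        "DMP of [project name]",
        "Jane Doe",                      # placeholder contact
        "jane.doe@example.org",          # placeholder address
        [("Material table", "open",
          "https://creativecommons.org/licenses/by-sa/4.0/")],
    )
    print(json.dumps(plan, indent=2))
```

Unlike a PDF deliverable, a document in this form can be validated, queried, and exchanged between tools.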