- What is the DMP about?
- Draft for EERAdata DMP
- Questionnaire to collect relevant information
- Structure
- Options for automated workflows and acting on behalf
- Review of H2020 project DMPs
- Sources to learn about DMPs
- DMP tools
- Machine-actionable DMP
- Literature
What is the DMP about?
Data management is part of project management and concerns both the content data produced in the work packages and the administrative data created in the project management process. In this sense, it extends to documents used for management, communication, dissemination and exploitation.
The data management plan is a living document that is updated during the project. In its initial form, it presents an agreement among the partners about data management practices. It is revised whenever new practices are adopted, e.g. when new forms of data processing are established.
The data management plan ensures a smooth workflow for data governance and, in particular, guarantees common standards among the partners handling data. At the same time, the Horizon Europe template stresses the importance of the FAIR principles both for data practices during the project and for the use of project results after the project ends.
A few aspects:
- identifies all relevant types of data in a project
- encompasses data created, sourced, collected, processed and published in the project
- covers data handled as research data in the project, but also data about the project, such as project meeting minutes, etc.
- assigns access rights, licenses, and optionally versioning to all datasets
- establishes a joint data vocabulary for datasets
- ensures a smooth workflow for data within work packages, among work packages and to stakeholders outside of the project consortium
- establishes responsibilities among partners
- checks if specific measures on data privacy and security (GDPR) apply
- checks if regulations on intellectual property rights (IPR) apply
Draft for EERAdata DMP
- should comply with the maDMP principles, so the suggested format is XML (e.g., see the example of a maDMP mockup)
- should fulfill the objectives to:
  - Manage all data relevant for the project.
  - Adhere to FAIR and open data principles.
  - Establish a proof-of-concept for a DMP blueprint for the low carbon energy research community (e.g. ready to use for H2020 projects or for projects of other funding agencies, and supporting recommendation no. 1 of the Technopolis Report ‘Request systematically a data management plan (DMP) for all energy research applications to H 2020’).
- All the project activities will be compliant with the General Data Protection Regulation (GDPR).
In the application, we committed that the DMP would fulfill the ten principles of maDMPs (following Miksa et al. 2019):
- Integrate DMPs with the workflow of all stakeholders in the research data ecosystem: EERAdata will provide a collaborative workspace enabling this functionality.
- Allow an automated system to act on behalf of stakeholders: EERAdata will explore the possibility of automatically extracting information into DMPs and/or into entries in the federated database (see WP 3). This includes administrative information (e.g. information on the funding agency and participants, as collected in CORDIS), license information (e.g. the type of license, selected with the wizard from EUDAT), automated booking of necessary storage (e.g. in repositories such as Dataverse), automatic deposits of project data and associated metadata (e.g. towards automated reporting in H2020), and validation & compliance checks (e.g. by funding agencies).
- Make policies for machines and people: EERAdata follows strict templates for documents.
- Describe - for both machines and humans - the components of the data management ecosystem: EERAdata tests to what extent this request is general or project-specific (with implications for the suggested DMP blueprint).
- Use PIDs and controlled vocabularies: EERAdata follows this in the design of all WP activities.
- Follow a common data model for DMPs: EERAdata builds on the models suggested by the Working Group on DMP of the Research Data Alliance (RDA), see the structure below. Moreover, EERAdata involves Ana Slavec (RDA), an expert on DMPs, in the advisory board.
- Make DMPs available for human and machine consumption: This is the core of EERAdata and its DMP adheres to FAIR and open data principles.
- Support data management evaluation and monitoring: EERAdata explores whether the periodic reporting functionality of the EC Portal can be improved through linking to a project’s DMP.
- Make DMPs updatable, living, versioned documents: EERAdata understands its DMP as a living document designed for versioning.
- Make DMPs publicly available: The EERAdata DMP has dissemination level ‘PU’ (see D1.3).
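The evaluation-and-monitoring principle and the automated "validation & compliance" step listed above could start with a simple completeness check. The sketch below assumes a JSON-serialized maDMP following the RDA DMP Common Standard; the lists of required fields are our reading of version 1.0 of its JSON schema and should be checked against the schema itself. The draft DMP in the example is hypothetical.

```python
# Completeness check for a JSON-serialized maDMP (RDA DMP Common Standard).
# ASSUMPTION: the required-field lists are our reading of schema v1.0;
# verify them against the official JSON schema before relying on them.

REQUIRED_DMP_FIELDS = ["title", "created", "modified", "dmp_id", "contact",
                       "dataset", "ethical_issues_exist", "language"]
REQUIRED_DATASET_FIELDS = ["title", "dataset_id", "personal_data",
                           "sensitive_data"]


def missing_fields(madmp: dict) -> list:
    """Return the dotted paths of required fields that are absent."""
    dmp = madmp.get("dmp")
    if not isinstance(dmp, dict):
        return ["dmp"]
    missing = [f"dmp.{name}" for name in REQUIRED_DMP_FIELDS if name not in dmp]
    for i, dataset in enumerate(dmp.get("dataset", [])):
        missing += [f"dmp.dataset[{i}].{name}"
                    for name in REQUIRED_DATASET_FIELDS if name not in dataset]
    return missing


if __name__ == "__main__":
    # Hypothetical, deliberately incomplete draft.
    draft = {"dmp": {"title": "EERAdata DMP", "language": "eng",
                     "dataset": [{"title": "Workshop user data",
                                  "personal_data": "yes"}]}}
    for path in missing_fields(draft):
        print("missing:", path)
```

For the draft above, the check reports `dmp.created`, `dmp.modified`, `dmp.dmp_id`, `dmp.contact`, `dmp.ethical_issues_exist`, `dmp.dataset[0].dataset_id` and `dmp.dataset[0].sensitive_data`. A complete validation would rather run the official JSON schema through a schema validator; the hand-written check only illustrates the idea.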
The DMP is structured following the hierarchy of the RDA model, with the elements Contact information, Cost information, Track of changes, Staff involved, Datasets generated (incl. data quality assurance, data identification number, license, distribution, keywords, metadata, type), Description, Ethical issues, Language, Project information, and Title.
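To make the element list concrete, a skeleton maDMP can be generated programmatically. The sketch maps the elements above onto the field names of the RDA DMP Common Standard as we understand them (e.g. "Staff involved" to `contributor`, "Track of changes" to `created`/`modified`). The RDA standard is distributed with a JSON schema, so the sketch emits JSON; an XML serialization would need an additional mapping. The function names are hypothetical, and all names, addresses and URLs in the example are placeholders.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def minimal_madmp(title: str, dmp_url: str, contact_name: str,
                  contact_email: str, now: Optional[datetime] = None) -> dict:
    """Build a skeleton maDMP; field names follow the RDA common standard."""
    stamp = (now or datetime.now(timezone.utc)).replace(microsecond=0).isoformat()
    return {"dmp": {
        "title": title,
        "description": "",
        "language": "eng",                  # ISO 639-3 language code
        "created": stamp,                   # "created" and "modified"
        "modified": stamp,                  # keep track of changes
        "ethical_issues_exist": "unknown",  # "yes", "no" or "unknown"
        "dmp_id": {"identifier": dmp_url, "type": "url"},
        "contact": {"name": contact_name, "mbox": contact_email,
                    "contact_id": {"identifier": contact_email,
                                   "type": "other"}},
        "contributor": [],                  # staff involved
        "cost": [],                         # cost information
        "project": [],                      # e.g. sourced from CORDIS
        "dataset": [],                      # datasets generated
    }}


def add_dataset(madmp: dict, title: str, url: str, data_access: str,
                keywords: tuple = ()) -> None:
    """Append a dataset with one distribution; data_access: open/shared/closed."""
    madmp["dmp"]["dataset"].append({
        "title": title,
        "dataset_id": {"identifier": url, "type": "url"},
        "personal_data": "unknown",         # to be filled in from partner input
        "sensitive_data": "unknown",
        "keyword": list(keywords),
        "data_quality_assurance": [],
        "metadata": [],
        "distribution": [{"title": title, "data_access": data_access}],
    })


if __name__ == "__main__":
    dmp = minimal_madmp("EERAdata DMP", "https://example.org/eeradata-dmp",
                        "Jane Doe", "jane.doe@example.org",
                        now=datetime(2021, 1, 1, tzinfo=timezone.utc))
    add_dataset(dmp, "Pool of experts", "https://example.org/experts",
                "closed", keywords=("experts",))
    print(json.dumps(dmp, indent=2))
```

Passing `now` explicitly keeps the output reproducible; in regular use, omitting it stamps the current UTC time, and a later revision would update only `modified`.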
Stakeholders of the EERAdata DMP
The stakeholders below align with those listed in Miksa et al. 2019; the second column gives the definitions provided therein.
| Stakeholder | Definition in Miksa et al. 2019 | Specs in EERAdata |
|---|---|---|
| Funder | funding agencies and foundations that specify requirements for DMPs and monitor compliance | H2020 program, so our reports and deliverables should be included in the DMP. Link to other funded projects (through the CORDIS portal). |
| Ethics review | IRBs/REBs (institutional review boards/research ethics boards) that authorize human subjects research | Our "new" deliverables should be included here, together with the agencies that use them. |
| Legal expert | technology transfer offices; copyright and patent lawyers | Name our institutions' legal experts here? Links to documents: GA, CA, Project Management Handbook. |
| Researcher | Principal Investigator and collaborators, including postdoctoral researchers and graduate and undergraduate students | All consortium members, with ORCID and ResearchGate profiles? |
| Publisher | purveyors of article and data publication services | ??? Link to publications of EERAdata, with DOIs. |
| Repository operator | general (e.g., Zenodo), disciplinary (e.g., GenBank, ICPSR), and institutional data repositories | Project and post-project hosts of the EERAdata platform, WIKI, etc. (EERA; AIT; ENEA); additional EERAdata repository on GitHub. During the project: OnlyOffice. |
| Infrastructure provider | providers of systems for creating DMPs (DMPTool, DMPonline), grants administration, researcher profiles, etc. | ??? Same as the hosts of the repositories? In case we use a DMP generation template, e.g. from TU Wien (not yet working). |
| Research support staff | data managers/curators, research administrators, and data librarians | Wider EERAdata consortium, with links to admin staff. |
| Institutional administrator | office of research/sponsored programs, chief information officers, university librarians, others | H2020 EU portal, project officer. |
Questionnaire to collect relevant information
To collect all relevant information from project partners, it is useful to send a questionnaire to the project partners and/or task leaders. This list compiles a collection of questions that can be adapted to the
- Which data formats do you use when publishing the results of your work, e.g. as supplementary material to publications? xlsx, csv, netcdf, tiff, jpg, docx, json, aiff, mp3, mp4, pdf, xml, rdf serializations, specialized proprietary data formats?
- Which (industry) standards do you suggest for describing the data in your WP?
- Which data, if any, is strictly embargoed and cannot be disclosed even to the consortium?
- Which data can be published at the end of the project, e.g. as supplementary material to a publication? Indicate if you expect limitations to the use of this data. Which data will be openly available?
- Where do you save your data? Do you have regular routines to create back-ups? Are you using cloud solutions for this?
- Who in your team will be responsible for the overall data governance? Do you have routines in place to check data quality and data storage? Who will be the contact point for data requests?
- Do you collect or process data that require specific GDPR treatment, e.g. anonymization of survey data? Do national laws require you to notify any institution that you are collecting personal data? If so, what is the timeline for this process?
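Answers to these questions can be turned into dataset entries of the DMP. The sketch below is a hypothetical mapping: the class, function and field names are our own, and the rules (embargoed data are closed, publishable data are open, everything else is shared within the consortium) are one possible reading of the questions above, expressed with the `open`/`shared`/`closed` access values of the RDA DMP Common Standard.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionnaireAnswer:
    """One partner's answer about one dataset (hypothetical field names)."""
    work_package: str
    dataset_title: str
    formats: list = field(default_factory=list)  # e.g. ["csv", "netcdf"]
    embargoed: bool = False      # not disclosed even to the consortium
    publishable: bool = False    # may be published at the end of the project
    personal_data: bool = False  # requires specific GDPR treatment


def access_level(answer: QuestionnaireAnswer) -> str:
    """Map the answers onto the data_access values open/shared/closed."""
    if answer.embargoed:
        return "closed"
    if answer.publishable:
        return "open"
    return "shared"  # available within the consortium only


def to_dataset(answer: QuestionnaireAnswer) -> dict:
    """Build a maDMP dataset entry from one questionnaire answer."""
    return {
        "title": answer.dataset_title,
        # Provisional local identifier until a PID (e.g. a DOI) is assigned.
        "dataset_id": {"identifier": f"{answer.work_package}/{answer.dataset_title}",
                       "type": "other"},
        "personal_data": "yes" if answer.personal_data else "no",
        "sensitive_data": "unknown",
        "distribution": [{"title": answer.dataset_title,
                          "data_access": access_level(answer),
                          "format": list(answer.formats)}],
    }


if __name__ == "__main__":
    survey = QuestionnaireAnswer("WP2", "Survey responses", ["csv"],
                                 personal_data=True)
    print(to_dataset(survey)["distribution"][0]["data_access"])  # shared
```

Giving the embargo answer precedence over the publication answer is a deliberately conservative choice: contradictory answers end up closed rather than open.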
Structure
Chapters:
I. Administrative details
- source project data from CORDIS
- source consortium member data with roles from OnlyOffice (OO)
II. Data and project management policies
- data policies as described in (D1, D2, D3)
- project management policies as described in (CA, GA, Project handbook)
III. Re-using data
- Linking to other EU projects and existing data hubs and databases: database with links
- Linking to FAIR/O standards
- Linking to existing metadata frameworks
IV. Creating and collecting data
- pool of experts - link to database & consent forms from the other chapter
- user data collection during workshops - link to these products in open repositories (GitHub, storyboard, Dataverse, project wiki)
- production of project output - link to deliverables (stored in repositories of the project, published paper DOIs, project deliverables)
V. Processing data
- platform specs (incl. WIKI, website, project repositories)
VI. Interpreting data
- linking to publications, WIKI, platform, project repositories
VII. Preserving data
- linking to platform
- preservation policies adhering to FAIR/O
- persistent identifiers for the platform and repositories
VIII. Giving access to data
- Linking to data policies
- Linking to project output (pool of experts, deliverables, repositories, platform)
Options for automated workflows and acting on behalf
- Collating administrative data: Use Current Research Information Systems (CRIS); at the European level, there is EuroCRIS. Open question: how can we do this practically and automatically for our DMP? This question should be addressed to OpenAIRE.
- License selection: If the institutional policy recommends open access publishing and the data do not contain sensitive information, then CC0 could be the default setting for data, and CC BY for text and media. There is already a wizard from EUDAT. An overview of Creative Commons licenses is here. Additional information on licensing is provided here by the Center for Open Science.
- Not much seems to be available yet on data depositing and compliance/validation checks?
- C2Metadata (continuous capture of metadata): http://c2metadata.org/
- Common Workflow Language (CWL): https://www.commonwl.org/
- Crossref entry for the H2020 programme
- Grant ID for EERAdata as listed by CORDIS
- HVL identifier given by the Research Organization Registry
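The license-selection rule above is simple enough to encode directly. A minimal sketch, assuming CC BY version 4.0 (the text does not fix a version) and hypothetical item-type labels; whenever the rule does not apply, the function returns `None` so that the license is chosen manually, e.g. with the EUDAT wizard.

```python
from typing import Optional

CC0 = "https://creativecommons.org/publicdomain/zero/1.0/"
CC_BY = "https://creativecommons.org/licenses/by/4.0/"  # version 4.0 assumed

# Hypothetical labels for items that count as "text and media".
TEXT_AND_MEDIA = {"text", "image", "audio", "video"}


def default_license(item_type: str, open_access_policy: bool,
                    sensitive: bool) -> Optional[str]:
    """Suggest a default license URL, or None if a manual choice is needed."""
    if not open_access_policy or sensitive:
        return None   # rule does not apply; decide case by case
    if item_type == "dataset":
        return CC0
    if item_type in TEXT_AND_MEDIA:
        return CC_BY
    return None       # e.g. software is outside the rule


if __name__ == "__main__":
    print(default_license("dataset", open_access_policy=True, sensitive=False))
```

The returned URL could then be stored as the `license_ref` of a distribution in a maDMP.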
Review of H2020 project DMPs
List of project DMPs
| Project DMP and link | Purpose/type of project | Structure/elements | Notes |
|---|---|---|---|
| REEEM, D8.2 DMP | Output: Stakeholder Interaction Portal, a Pathways Diagnostic Tool and an Energy System Learning Simulation. DMP for "data collection to populate models, calibrate them, as well as allow for data exchange between different types of models and different partners" | TOC: Project info, authors, history of changes, project summary, about, principles & summary, 1. DMP checklist (data collection, documentation & metadata, ethics & legal compliance, storage & backup, selection and preservation, data sharing, responsibilities & resources, data project impact assessment), 2. Definition and Matter, 3. Links. | PDF, not an XML document. Not machine-actionable. |
| HYBUILD DMP | Output: two innovative compact hybrid electrical/thermal storage systems for stand-alone and district-connected buildings. The DMP outlines how data are collected or generated by the HYBUILD project, and how they will be organized, stored, and shared. It specifies which data will be open access and which will be confidential within the consortium, as far as it is possible to do so at this stage. The report follows the Horizon 2020 guidelines (EC DG R&I, 2017) with additional guidance from the joint OpenAIRE and EUDAT webinar “How to write a Data Management Plan” (OpenAIRE and EUDAT, 2018). | TOC: Executive summary, acronyms & abbreviations, glossary, 1. Introduction (aims of project, relation with other project activities, structure, partner contributions), 2. Approach (data availability and open access, data storage & sharing), 3. Descriptions of datasets (template, plus 39 individual dataset descriptions), 4. Conclusions, 5. References | PDF, not an XML document. Not machine-actionable. |
| RESLAG D1.2 DMP | Four large-scale demonstrations to recycle steel slag are considered: extraction of high added-value non-ferrous metals; TES for heat recovery applications; TES to increase the dispatchability of CSP plant electricity; production of innovative refractory ceramic compounds. The DMP is intended to ensure the accessibility and intelligibility of the data generated during the RESLAG project in order to comply with the Guidelines of the “Open Research Data Pilot” (annex II). | TOC: Executive summary, nomenclature, list of figures & tables, 1. Introduction, 2. Metadata strategy & standardization, 3. Fact sheet (dataset descriptions, dataset metadata), 4. Data sharing, 5. Storage and preservation, conclusion, two annexes | Not an maDMP; links to a Zenodo repository. |
Sources to learn about DMPs
- RDA DMP Common Standard for machine-actionable Data Management Plans: https://github.com/RDA-DMP-Common/RDA-DMP-Common-Standard
- maDMP hackathon-2020 https://github.com/RDA-DMP-Common/hackathon-2020
- collection of DMPs by the Digital Curation Centre
- Horizon Europe DMP template
- material provided by RDMkit by ELIXIR-CONVERGE
- Guidelines on Open Research Data and Data Management Plans by the European Research Council
- OpenAIRE: How to create a Data Management Plan
- Data Stewardship Wizard questionnaire for the project EERAdata: https://researchers.ds-wizard.org/questionnaires/detail/b686f656-cdd9-4981-b4c7-9598b2561c8d
DMP tools
- Tool by the Digital Curation Centre (DMPonline)
- Tool by the California Digital Library (DMPTool)
- Tool by German funding organizations
Machine-actionable DMP
A machine-actionable data management plan (maDMP) is a DMP that is standardized in such a way that software can be programmed against it. More details can be found in Simms et al. 2017 and Miksa et al. 2019. The DMP Common Standard Ontology (DCSO) has been developed to offer a standard terminology for data management plans. It makes it possible to draft machine-actionable DMPs in which all information is structured so that it can be processed by a machine. EERAdata used this terminology to offer a tutorial on writing machine-actionable DMPs.
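As an illustration of what "programming against" a DMP can mean, the sketch below answers a typical question (which datasets are openly accessible, and under which license?) directly from a JSON-serialized maDMP. It assumes the JSON field names of the RDA DMP Common Standard (`dataset`, `distribution`, `data_access`, `license`, `license_ref`) rather than the RDF representation based on DCSO; the example DMP and the function name are hypothetical.

```python
import json

# Hypothetical example DMP with one open and one closed dataset.
EXAMPLE = """
{"dmp": {"title": "Example DMP", "dataset": [
  {"title": "Deliverables", "distribution": [
    {"title": "PDF files", "data_access": "open",
     "license": [{"license_ref": "https://creativecommons.org/licenses/by/4.0/",
                  "start_date": "2021-01-01"}]}]},
  {"title": "Pool of experts", "distribution": [
    {"title": "Contact database", "data_access": "closed"}]}]}}
"""


def open_datasets(madmp_json: str) -> list:
    """Return (dataset title, license URLs) for every open distribution."""
    dmp = json.loads(madmp_json)["dmp"]
    result = []
    for dataset in dmp.get("dataset", []):
        for dist in dataset.get("distribution", []):
            if dist.get("data_access") == "open":
                licenses = [lic["license_ref"] for lic in dist.get("license", [])]
                result.append((dataset["title"], licenses))
    return result


if __name__ == "__main__":
    for title, licenses in open_datasets(EXAMPLE):
        print(title, licenses)
```

Because the structure is standardized, the same few lines work for any DMP that follows the common standard, which is exactly what a free-text PDF DMP cannot offer.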
Literature
- Hudson Vitale C, Moulaison Sandy H (2019) Data Management Plans: A Review. DESIDOC Journal of Library & Information Technology 39, 322-328. https://doi.org/10.14429/djlit.39.06.15086
- Miksa T, Simms S, Mietchen D, Jones S (2019) Ten principles for machine-actionable data management plans. PLoS Comput Biol 15(3): e1006750. https://doi.org/10.1371/journal.pcbi.1006750
- Data management plans: time wasting or time saving?
- Jones S, Pergl R, Hooft R, Miksa T, Samors R, Ungvari J, Davis RI, Lee T (2020) Data Management Planning: How Requirements and Solutions are Beginning to Converge. Data Intelligence 2(1-2), 208-219. https://doi.org/10.1162/dint_a_00043
- Spichtinger D (2022) Uncommon Commons? Creative Commons Licencing in Horizon 2020 Data Management Plans. International Journal of Digital Curation 17. http://www.ijdc.net/article/view/840
- Cardoso J, Castro LJ, Ekaputra FJ, et al. (2022) DCSO: towards an ontology for machine-actionable data management plans. J Biomed Semant 13, 21. https://doi.org/10.1186/s13326-022-00274-4