DMP

From EERAdata Wiki
Revision as of 11:20, 19 October 2022 by Valerias (talk | contribs) (Questionnaire to collect relevant information)

What is the DMP about?

Data management is part of project management. It concerns both the content-related data in the work packages and the administrative data created in the project management process. In this sense, it extends to documents used for management, communication, dissemination and exploitation.

The data management plan is a living document that is updated during the project. In its initial form, it presents an agreement among the partners about data management practices. It is revised whenever new practices are enacted, e.g. when new forms of data processing are established.

The data management plan ensures a smooth workflow for data governance and, in particular, guarantees common standards among the partners handling data. At the same time, the Horizon Europe template stresses the importance of the FAIR principles, both for data practices during the project and for the use of project results after its end.

In particular, the data management plan:

  • identifies all relevant types of data in a project
  • encompasses data created, sourced, collected, processed and published in the project
  • covers data handled as research data in the project but also data about the project, such as project meeting minutes
  • assigns access rights, licenses, and optionally versioning to all datasets
  • establishes a joint data vocabulary for datasets
  • ensures a smooth workflow for data within work packages, among work packages and to stakeholders outside of the project consortium
  • establishes responsibilities among partners
  • checks if specific measures on data privacy and security (GDPR) apply
  • checks if regulations on intellectual property rights (IPR) apply
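As a minimal sketch of how such a data inventory could be checked, the Python snippet below models datasets with access rights, a license and an optional version, and lists the entries that are still incomplete. The class name `DatasetEntry`, the field names and the sample values are assumptions made for illustration; they are not taken from the EERAdata DMP.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DatasetEntry:
    """One dataset in the project inventory (hypothetical structure)."""
    name: str
    work_package: str
    responsible_partner: str
    access_rights: Optional[str] = None   # e.g. "open", "consortium", "embargoed"
    license: Optional[str] = None         # e.g. "CC-BY-4.0"
    version: Optional[str] = None         # versioning is optional
    contains_personal_data: bool = False  # flags the need for a GDPR check


def missing_fields(entry: DatasetEntry) -> List[str]:
    """Return the mandatory fields that are still unassigned."""
    missing = []
    if not entry.access_rights:
        missing.append("access_rights")
    if not entry.license:
        missing.append("license")
    return missing


def inventory_report(entries: List[DatasetEntry]) -> Dict[str, List[str]]:
    """Map the name of each incomplete dataset to its missing fields."""
    report = {}
    for entry in entries:
        gaps = missing_fields(entry)
        if gaps:
            report[entry.name] = gaps
    return report


# Illustrative entries only; all names and values are invented.
inventory = [
    DatasetEntry("meeting-minutes", "WP1", "Partner A", "consortium", "CC-BY-4.0"),
    DatasetEntry("workshop-survey", "WP2", "Partner B", contains_personal_data=True),
]

gdpr_checks = [e.name for e in inventory if e.contains_personal_data]
print(inventory_report(inventory))
print(gdpr_checks)
```

Keeping the GDPR flag on the dataset entry itself means the privacy check can be derived from the same inventory that records access rights and licenses.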

Draft for EERAdata DMP

Our promise

  • should comply with maDMP, so the suggested format is XML (e.g., see an example of a maDMP mockup)
  • should fulfill the objectives to:
    • Manage all data relevant for the project.
    • Adhere to FAIR and open data principles.
    • Establish a proof-of-concept for a DMP blueprint for the low carbon energy research community (e.g. ready to use for H2020 projects or other funding agencies and to support recommendation no. 1 of the Technopolis Report ‘Request systematically a data management plan (DMP) for all energy research applications to H 2020’)
    • Ensure that all project activities comply with the General Data Protection Regulation (GDPR).

In the application, we promised that the DMP would fulfill the 10 principles or functionalities of maDMPs (following Miksa et al. 2019):

  1. Integrate DMPs with the workflow of all stakeholders in the research data ecosystem: EERAdata will provide a collaborative workspace enabling this functionality.
  2. Allow an automated system to act on behalf of stakeholders: EERAdata will explore the possibility of automatically extracting information into DMPs and/or into entries in the federated database (see WP 3). This includes administrative information (e.g. information on the funding agency and participants as collected in CORDIS), license information (e.g. the type of license, using the wizard from EUDAT), automated booking of necessary storage (e.g. using repositories such as DATAVERSE), automatic deposits of project data and associated metadata (e.g. towards automated reporting in H2020), and validation and compliance (e.g. by funding agencies).
  3. Make policies for machines and people: EERAdata follows strict templates for documents.
  4. Describe - for both machines and humans - the components of the data management ecosystem: EERAdata tests to what extent this request is general or project-specific (with implications for the suggested DMP blueprint).
  5. Use PIDs and controlled vocabularies: EERAdata follows this in the design of all WP activities.
  6. Follow a common data model for DMPs: EERAdata builds on the models suggested by the Working Group on DMP of the Research Data Alliance (RDA), see the structure below. Moreover, Ana Slavec (RDA), an expert on DMPs, is a member of the EERAdata advisory board.
  7. Make DMPs available for human and machine consumption: This is the core of EERAdata and its DMP adheres to FAIR and open data principles.
  8. Support data management evaluation and monitoring: EERAdata explores whether the periodic reporting functionality of the EC Portal can be improved through linking to a project’s DMP.
  9. Make DMPs updatable, living, versioned documents: EERAdata understands its DMP as a living document designed for versioning.
  10. Make DMPs publicly available.
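Principle 9 can be illustrated with a small sketch of a change log for a living, versioned DMP. The Python snippet below appends a revision whenever a practice changes and exposes the latest one. The class names `Revision` and `DmpHistory`, the simple `1.x` numbering scheme, and the dates and partner names are assumptions for illustration, not the versioning scheme actually used by EERAdata.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass(frozen=True)
class Revision:
    """One immutable entry in the DMP change log (hypothetical structure)."""
    version: str
    changed_on: date
    author: str
    summary: str


@dataclass
class DmpHistory:
    """Ordered list of revisions; the last entry is the current DMP version."""
    revisions: List[Revision] = field(default_factory=list)

    def add_revision(self, author: str, summary: str, changed_on: date) -> Revision:
        # Assumed numbering: the n-th revision (counting from 0) is labelled "1.n".
        version = f"1.{len(self.revisions)}"
        revision = Revision(version, changed_on, author, summary)
        self.revisions.append(revision)
        return revision

    def latest(self) -> Revision:
        return self.revisions[-1]


# Invented example entries.
history = DmpHistory()
history.add_revision("Partner A", "Initial agreement on data management practices", date(2020, 3, 1))
history.add_revision("Partner B", "Added a new form of data processing", date(2020, 9, 1))

print(history.latest().version, history.latest().summary)
```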

The EERAdata DMP has dissemination level ‘PU’ (see D1.3).

The structure follows the hierarchy of the RDA model and comprises: Contact information, Cost information, Track of changes, Staff involved, Datasets generated (incl. data quality assurance, data identification number, license, distribution, keywords, metadata, type), Description, Ethical issues, Language, Project information, and Title.
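To make the structure above more concrete, the following Python sketch serializes a DMP skeleton to XML with the standard library. The section list is taken from the structure above, but the XML element names (`dmp`, `track_of_changes`, `dataset`, ...) and the sample values are hypothetical; they are not the element names defined by the RDA model, which should be consulted for a real maDMP.

```python
import xml.etree.ElementTree as ET

# Top-level sections from the structure listed above (element names are assumed).
SECTIONS = [
    "title", "description", "language", "contact", "cost",
    "track_of_changes", "staff", "ethical_issues", "project",
]

# Per-dataset fields from the "Datasets generated" item (element names are assumed).
DATASET_FIELDS = [
    "identifier", "type", "keywords", "license",
    "distribution", "metadata", "data_quality_assurance",
]


def build_dmp(values: dict, datasets: list) -> ET.Element:
    """Build a <dmp> element with one child per section and one <dataset> per entry."""
    root = ET.Element("dmp")
    for section in SECTIONS:
        ET.SubElement(root, section).text = values.get(section, "")
    datasets_el = ET.SubElement(root, "datasets")
    for ds in datasets:
        ds_el = ET.SubElement(datasets_el, "dataset")
        for name in DATASET_FIELDS:
            ET.SubElement(ds_el, name).text = ds.get(name, "")
    return root


# Placeholder values for illustration only.
dmp = build_dmp(
    {"title": "EERAdata DMP", "language": "en"},
    [{"identifier": "placeholder-id", "license": "CC-BY-4.0", "type": "survey"}],
)
xml_text = ET.tostring(dmp, encoding="unicode")
print(xml_text)
```

Sections without a value are written as empty elements, so a partially filled DMP still has the complete skeleton and can be completed in later revisions.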

Stakeholders of EERAdata DMP

The stakeholders follow those listed in Miksa et al. 2019, together with the definitions provided therein.

| Stakeholder | Definition in Miksa et al. 2019 | Specs in EERAdata |
|---|---|---|
| Funder | funding agencies and foundations that specify requirements for DMPs and monitor compliance | H2020 program, so our reports and deliverables should be included in the DMP. Link to the other project funded (through CORDIS portal). |
| Ethics review | IRBs/REBs that authorize human subjects research | Our "new" deliverables should be included here and the agencies who use them. |
| Legal expert | technology transfer offices; copyright and patent lawyers | Name our institution's legal experts here? Links to documents: GA, CA, Project Management Handbook. |
| Researcher | Principal Investigator and collaborators, including postdoctoral researchers and graduate and undergraduate students | All consortium with ORCID and ResearchGate? |
| Publisher | purveyors of article and data publication services | ??? Link to publications of EERAdata, with DOIs. |
| Repository operator | general (e.g., Zenodo), disciplinary (e.g., GenBank, ICPSR), and institutional data repositories | Project and post-project hosts of EERAdata platform, WIKI, etc. (EERA; AIT; ENEA); additional EERAdata repository at GitHub. During project OnlyOffice. |
| Infrastructure provider | providers of systems for creating DMPs (DMPTool, DMPonline), grants administration, researcher profiles, etc. | ??? Same as hosts for the repositories?; In case we use the DMP generation template, e.g. from TU WIEN. Not yet working. |
| Research support staff | data managers/curators, research administrators, and data librarians | wider EERAdata consortium with links to admin staff. |
| Institutional administrator | office of research/sponsored programs, chief information officers, university librarians, others | H2020 EU portal, project officer. |

Abbreviations: DMP, data management plan; ICPSR, Inter-university Consortium for Political and Social Research; IRB, institutional review board; maDMP, machine-actionable DMP; REB, research ethics board.

Questionnaire to collect relevant information

To collect all relevant information, it is useful to send a questionnaire to project partners and/or task leaders. The following questions can be adapted as needed:

  • Which data formats do you use to publish the results of your work, e.g. as supplementary material to publications: xlsx, csv, netcdf, tiff, jpg, docx, json, aiff, mp3, mp4, pdf, xml, RDF serializations, or specialized proprietary data formats?
  • Which (industry) standards do you suggest for describing the data in your WP?
  • What data, if any, is strictly embargoed and cannot be disclosed even to the consortium?
  • What data can be published at the end of the project, e.g. as supplementary material to a publication? Indicate if you expect limitations to the use of this data. Which will be openly available?
  • Where do you save your data? Do you have regular routines to create back-ups? Are you using cloud solutions for this?
  • Who will be responsible in your team for the overall data governance? Do you have routines in place to check data quality and data storage? Who will be the contact point for data requests?
  • Do you collect or process data that require specific GDPR treatment, e.g. data anonymization for survey data? Do you have to notify any institution about collecting personal data due to national laws? If so, what is the timeline for the process?
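The answers to such a questionnaire can be captured in a structured form so that open points are easy to spot. The Python sketch below is one possible way to do this; the class `QuestionnaireResponse`, its fields and the follow-up rules are assumptions derived loosely from the questions above, not an existing EERAdata tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QuestionnaireResponse:
    """Answers of one partner or task leader (hypothetical structure)."""
    partner: str
    publication_formats: List[str] = field(default_factory=list)
    embargoed_data: List[str] = field(default_factory=list)
    storage_location: str = ""
    regular_backups: bool = False
    data_contact: str = ""
    needs_gdpr_treatment: bool = False


def follow_ups(response: QuestionnaireResponse) -> List[str]:
    """Derive open action items from one response (illustrative rules only)."""
    actions = []
    if response.embargoed_data:
        actions.append("document embargoed data and access restrictions")
    if not response.regular_backups:
        actions.append("agree on a back-up routine")
    if not response.data_contact:
        actions.append("name a contact point for data requests")
    if response.needs_gdpr_treatment:
        actions.append("plan GDPR measures such as anonymization")
    return actions


# Invented example responses.
partner_a = QuestionnaireResponse(
    partner="Partner A",
    publication_formats=["csv", "pdf"],
    storage_location="institutional server",
    regular_backups=True,
    data_contact="data steward",
    needs_gdpr_treatment=True,
)
partner_b = QuestionnaireResponse(partner="Partner B")

for response in (partner_a, partner_b):
    print(response.partner, follow_ups(response))
```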

Structure

Chapters

I. Administrative details

  • source project data from CORDIS
  • source consortium member data with roles from OO

II. Data and project management policies

  • data policies as described in (D1, D2, D3)
  • project management policies as described in (CA, GA, Project handbook)

III. Re-using data

  • Linking to other EU projects and existing data hubs and databases: database with links
  • Linking to FAIR/O standards
  • Linking to existing metadata frameworks

IV. Creating and collecting data

  • pool of experts - link to database & consent forms from the other chapter
  • user data collection during workshops - link to these products in open repositories (GitHub, storyboard, dataverse, project wiki)
  • production of project output - link to deliverables (stored in repositories of the project, published paper DOIs, project deliverables)

V. Processing data

  • platform specs (incl. WIKI, website, project repositories)

VI. Interpreting data

  • linking to publications, WIKI, platform, project repositories

VII. Preserving data

  • linking to platform
  • preservation policies adhering to FAIR/O
  • persistent identifier for the platform and repositories

VIII. Giving access to data

  • Linking to data policies
  • Linking to project output (pool of experts, deliverables, repositories, platform)

Options for automated workflows and acting on behalf

Review of H2020 project DMPs

List of project DMPs

| Project DMP and link | Purpose/type of project | Structure/elements | Notes |
|---|---|---|---|
| REEEM, D8.2 DMP | Output: Stakeholder Interaction Portal, a Pathways Diagnostic Tool and an Energy System Learning Simulation. DMP for "data collection to populate models, calibrate them, as well as allow for data exchange between different types of models and different partners" | TOC: Project info, authors, history of changes, project summary, about, principles & summary, 1. DMP checklist (data collection, documentation & metadata, ethics & legal compliance, storage & backup, selection and preservation, data sharing, responsibilities & resources, data project impact assessment), 2. Definition and Matter, 3. Links. | PDF, not an XML document. Not machine-actionable. |
| HYbuild DMP | Output: develop two innovative compact hybrid electrical/thermal storage systems for stand-alone and district-connected buildings. DMP outlines how data are collected or generated by the HYBUILD project, in terms of how it will be organized, stored, and shared. It specifies which data will be open access and which will be confidential within the consortium, as far as it is possible to do so at this stage. The report has been developed following the Horizon 2020 guidelines (EC DG R&I, 2017) with additional guidance from the joint OpenAIRE and EUDAT webinar “How to write a Data Management Plan” (OpenAIRE and EUDAT, 2018) | TOC: Executive summary, Acronyms & abbrev., Glossary, 1. Introduction (Aims of project, relation with other project activities, structure, partner contributions), 2. Approach (data availability and open access, data storage & sharing), 3. Descriptions of datasets (template, plus 39 individual data set descriptions), 4. Conclusions, 5. References | PDF, not an XML document. Not machine-actionable. |
| RESLAG D1.2 DMP | Four large-scale demonstrations to recycle steel slag are considered: Extraction of non-ferrous high added metals; TES for heat recovery applications; TES to increase dispatchability of the CSP plant electricity; Production of innovative refractory ceramic compounds. DMP is to ensure the accessibility and intelligibility of the data that will be generated during the RESLAG project in order to comply with the Guidelines of the “Open Research Data Pilot” (annex II). | TOC: Ex. sum, nomenclature, list of figs & tabs, 1. Intro, 2. Metadata strategy & standardization, 3. Fact sheet (data set descriptions, data set metadata), 4. Data sharing, 5. Storage and preservation, Conclusion, two annexes | No maDMP, link to Zenodo repository. |
| PANTERA, D.15 DMP | Identify and implement initiatives aimed at raising the participation of EU countries in the needed R&I for developing technologies, systems and markets in support of the common energy market and the energy transition. DMP: final version of the (open) Data Management Plan for the PANTERA project in month 2 of the project. This Data Management Handling Plan investigates the appropriate methodologies and open repositories for data management and dissemination and tries to offer through open access as much information generated by the PANTERA project. | TOC: Abbr., Exec sum, 1 Objective of the report, 2 Framework for DMP, 3 Data archiving and preserving infrastructure, 4 Datasets and publications for DMP, 5 Ethics Management Plan, 6 Conclusion, 7 References, 8 Annex (List of figs, tabs, Ethics Manual, Consent form, Opt form, Privacy Policy) | No maDMP, final DMP already in Month 2 (so it is actually never used). |

Sources to learn about DMPs

MaDMP diagram from RDA model

https://ddi-alliance.atlassian.net/wiki/spaces/DDI4/pages/7864356/Active+Data+Management+Plans+Team

DMP tools

Literature

  • Cynthia Hudson Vitale, Heather Moulaison Sandy (2019) Data Management Plans: A Review. DESIDOC Journal of Library & Information Technology 39, 322-328. https://doi.org/10.14429/djlit.39.06.15086
  • Miksa T, Simms S, Mietchen D, Jones S (2019) Ten principles for machine-actionable data management plans. PLoS Comput Biol 15(3): e1006750. https://doi.org/10.1371/journal.pcbi.1006750
  • Data management plans: time wasting or time saving?
  • Sarah Jones, Robert Pergl, Rob Hooft, Tomasz Miksa, Robert Samors, Judit Ungvari, Rowena I. Davis, and Tina Lee (2020) Data Management Planning: How Requirements and Solutions are Beginning to Converge. Data Intelligence 2:1-2, 208-219. https://doi.org/10.1162/dint_a_00043
  • Daniel Spichtinger (2022) Uncommon Commons? Creative Commons Licencing in Horizon 2020 Data Management Plans. International Journal of Digital Curation 17: http://www.ijdc.net/article/view/840
  • Cardoso, J., Castro, L.J., Ekaputra, F.J. et al. (2022) DCSO: towards an ontology for machine-actionable data management plans. J Biomed Semant 13, 21: https://doi.org/10.1186/s13326-022-00274-4