WS1

The first workshop of EERAdata is organized as an online webinar and hackathon from June 2 to 4, 2020.

Objective

  • Learn about FAIR and open data practices in the energy and other research communities;
  • Select databases to be investigated in the use cases; discuss and summarize the reasons for the selection;
  • Analyze the state of FAIR/O principles for the selected databases (with the help of the questionnaire and FAIR/O assessment tools);
  • Evaluate the state of FAIR/O principles for the metadata of the selected databases;
  • Suggest general principles for the design of metadata (use-case- and domain-specific);
  • Discuss top-level metadata adhering to FAIR/O principles (use-case- and domain-specific);
  • Learn about the FAIRification of Data Management Plans (DMPs).

Workshop concept of EERAdata

Read aheads

Obligatory:

Suggested reading on the history of metadata: Metadata: Shaping Knowledge from Antiquity to the Semantic Web by Richard Gartner, https://www.springer.com/gp/book/9783319408910

Join and collaborate interactively!

The workshop is a hackathon! EERAdata evolves only through joint effort! Join the discussions and work together online:

EERAdata story board

Objective: collect views from participants regarding energy data, energy metadata, FAIR/O problems, etc., to support the discussions, the identification of gaps and needs (WP2), and the design of the platform (WP3).

Questions:

  • Q#1: What is your user story regarding FAIR/O energy data?
  • Q#2: What is your user story regarding metadata?

Template to answer: As a <stakeholder>, I want <goal> so that <reason>.

How to post a story?

  1. Click on the link: https://padlet.com/janaschwanitz/aciywzt9fs1h70m2
  2. Click on the plus sign at the bottom right to add a story.
  3. Choose a title depending on the question you want to answer: for Q#1 choose “FAIR/O data”, for Q#2 choose “metadata”. In the example above, the title “metadata” was chosen.
  4. Type your sentence to let us know where the gaps and hopes are. The template is: As a <stakeholder>, I want <goal> so that <reason>.

Example: As a data-driven energy researcher, I want utopia, i.e. one-stop access to all relevant metadata, so that I can browse available data, check where they come from, and decide whether I can trust them for reuse.

How to comment on a story?

  • One can rate stories by clicking on a number of stars.
  • One can comment on stories, as well as on other comments, in words.
  • Commenting and adding stories is anonymous, as long as one does not log in to a personal profile at padlet.com.

EERAdata wiki - How to?

The easiest way is to simply start writing and editing. You can install an easy editor by following the steps described in the picture to the right.

How to install an easy editor

Consult the User's Guide for information on using the wiki software.

EERAdata on GitHub

Link here and share your thoughts and issues! Note: work in progress.

EERAdata @ ResearchGate

Link here and share your thoughts and issues! Note: work in progress.

Just have fun!

Metadata memory game

Read before playing: This is a little memory game where players need to identify triples, that is, three tiles that belong to one data set. All of them relate to low carbon energy. Some tiles have the form of a picture, others show metadata descriptions (in various formats), and some are even sound files. So, the game is to have fun while learning about metadata. Try to find all the triples. Note: if you have opened three tiles and they do not match, all of them will close automatically, just as in a real-life memory game. Link

Agenda and notes Day 1

Builds on the read-aheads. Online talks and discussions, with space for interaction with participants after each presentation and a moderated discussion of collected comments.

Time slot Topic
10.00-10.20 Welcome and introduction with workshop goals and procedures: “EERAdata - Towards Utopia for low carbon energy research”, Valeria Jana Schwanitz, HVL & PI EERAdata. Link: [1]. Main points:
  • In the energy system, the data revolution offers prospects, but we are far away from harvesting them. The time researchers spend on data governance (finding, cleaning, revising formats) and bureaucracy exceeds by far the time spent on the exciting stuff (thinking, creating new insights, collaborating, and discussing with others).
  • There is a lack of joint standards and common metadata formats to support researchers in finding and reusing heterogeneous data. Machine-actionability is also an issue.
  • The vision of developing a one-stop entry point for energy research is clear before our eyes: being able to search for data, access rich metadata, choose and select datasets, crunch data online in real time, and link to a "My-researcher-space", a platform that offers a personalized workspace for data analysis, paper writing, and research collaboration.
10.20-12.15 Online lectures:

“The EOSC Nordic: machine-actionable FAIR maturity evaluations & the FAIRification of data repositories”, Andreas Jaunsen, Nordforsk & EOSC-Nordic. Link: [2]. Main points:

  • Goal and vision of EOSC: enable researchers to access data across domains and disciplines as easily as possible, and support them in locating relevant data. All European data should be available to researchers, but this does not mean that all data is stored centrally. Instead, databases should be interconnected.
  • FAIR maturity evaluations: close to 100 repositories have been tested using an automated tool. Most of them score only 0.10 out of 1.
  • FAIR Evaluation Tool: [3]
  • EOSC Website: [4]

“OpenAIRE: Open Access Infrastructure for Research in Europe”, Ilaria Fava, OpenAIRE. Link: [5]. Main points:

  • Goal and vision of OpenAIRE: to "Bridge the worlds where Science is performed and where Science is published" by monitoring, accelerating, and supporting Open Access research and publishing.
  • OpenAIRE consists of 50 European partners, including 34 National Open Access Desks (NOADs) that provide support on issues related to Open Science policies, Open Science infrastructure, Open Research Data, and Open Access to publications.
  • Lessons learned: "Research is global, support is local". Regional differences in culture and maturity of open access infrastructure require support strategies specifically tailored to each region.
  • OpenAIRE Website: [6]
  • OpenAIRE Connect: platform that allows one to connect with the research community of a specific research field [7]
  • OpenAIRE Provide: platform that allows open access publishing of data [8]
  • Further Links copied from Conference chat:
    • Working Group on Rewards: [9]
    • Open Science Policy Platform: [10]
    • Clarivate Data Citation index: [11]

A short break of 15 min.

“Community-driven metadata and ontologies for Materials Science and their key role in artificial-intelligence tools”, Luca Ghiringhelli, FHI Berlin. Link: [12]. Main points:

  • The attributes of a data object can be data or metadata, depending on the context
  • "An Ontology is a formal (= machine readable) representation (= concepts, properties, relations, functions, constraints, axioms are explicitly defined) of the knowledge (= domain specific) of a community (= consensual) for a purpose (= question driven)."
  • Definition of FAIR data (see the metadata sketch after this list):
    • Findable: unique names, human-readable descriptions
    • Accessible: URL, accessible via API
    • Interoperable: typed, extensible schema -> ontologies
    • Reusable: hierarchical schema -> data analytics
  • NOMAD Meta Info: [13]
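
To make the four aspects above more concrete, here is a minimal sketch of what a single FAIR-oriented metadata record could look like. All field names, identifiers, and URLs are invented for illustration; they are not taken from NOMAD Meta Info or from any EERAdata database.

```python
# Illustrative sketch only: shows how the four FAIR aspects listed above can
# surface in one metadata record. All names, identifiers, and URLs are invented.
import json

record = {
    # Findable: a unique, persistent name plus a human-readable description
    "identifier": "doi:10.0000/example-dataset",  # hypothetical identifier
    "description": "Band gaps of candidate photovoltaic absorber materials",
    # Accessible: a resolvable URL and a programmatic access point (API)
    "landing_page": "https://repository.example.org/datasets/42",
    "api_endpoint": "https://repository.example.org/api/v1/datasets/42",
    # Interoperable: typed fields that point to an agreed vocabulary or ontology
    "property": {
        "term": "band_gap",
        "ontology_iri": "https://ontology.example.org/band_gap",
        "type": "float",
        "unit": "eV",
    },
    # Reusable: hierarchical context (provenance, license) that supports analytics
    "provenance": {"method": "DFT", "code": "example-code", "version": "1.2"},
    "license": "CC-BY-4.0",
}

print(json.dumps(record, indent=2))
```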

“Metadata practices from IRP Wind”, Anna Maria Sempreviva, DTU. Link: [14]. Main points:

  • Alternative interpretation of FAIR data: reusability of data is the final goal, with findability, accessibility, and interoperability being prerequisites for it. Reusing data for multiple purposes multiplies the value of the data.
  • Open data = available data <-> FAIR data = findable data
  • Issue: How to make data findable but safe (with regard to data protection, competitive advantages, etc.)?
    • Solution: Create a searchable data catalogue of distributed data
  • How to create a taxonomy?
    • Expert elicitation: a group of experts creates a taxonomy, which is then reviewed by the wider research community
      • "top-down" approach
      • + clearly defined, controlled vocabulary
      • - static, unable to adapt to new trends
    • Taxonomy based on author keywords: map the keywords used by authors according to similarity in meaning and frequency of usage (a minimal sketch follows this list)
      • "bottom-up" approach
      • + adaptable, able to track new trends
      • - mix of disciplines, models, etc.; many errors and ambiguities; single generic words with a broad range of possible interpretations
  • IRP Wind Website: [15]
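
A minimal sketch of the "bottom-up" idea described above: crudely normalize author keywords and count how often each normalized term occurs. The keyword list and normalization rules are invented; a real pipeline would add synonym mapping and similarity measures, which is exactly where the reported ambiguities arise.

```python
# Sketch of a bottom-up keyword grouping. Keywords and rules are illustrative only.
from collections import Counter

keywords = [
    "Wind turbines", "wind turbine", "Offshore wind", "offshore  wind",
    "LCOE", "levelized cost of energy", "wind turbine",
]

def normalize(kw: str) -> str:
    """Lowercase, collapse whitespace, and strip a trailing plural 's'."""
    kw = " ".join(kw.lower().split())
    return kw[:-1] if kw.endswith("s") else kw

counts = Counter(normalize(kw) for kw in keywords)
for term, n in counts.most_common():
    print(f"{term}: {n}")
# A fuller pipeline would also merge synonyms (e.g. "LCOE" vs "levelized cost
# of energy"), which requires similarity measures and introduces ambiguity.
```
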
12.15-13.00 Lunch Break. Play the EERAdata game “Utopia and metadata”. Or any time.
13.00-14.00 Online lectures:

“Humanities and data: for a community-driven path towards FAIRness”, Elena Giglia, UNITO. Re-using a presentation held at the Open Science Conference 2020 in Berlin. Presentation stored at Zenodo. Link: [16]. Main points:

  • The data management lifecycle: Identify research data -> Plan data management -> Collect/produce, structure & store -> Deposit for preservation, cite & share -> Dissemination
    • At which phase should the FAIR principles be applied?
  • "There is value and risk at being a first mover (regarding implementation of FAIR principles), but there is a higher risk at being a follower"
  • What is data in the humanities:
    • Never "raw" data
    • Data is always an expression of the method
    • There is always a choice (methodological, epistemological, political, ...)
    • There is always an interpretation, subjectivity (data are not generated by a machine)
    • There is always a discussion
  • Preliminary issues of FAIRness:
    • What language?
    • Lack of skills among researchers
    • Registry of existing tools
    • Need to preserve the specificity of how we do research in the humanities
    • Services and tools need to be sustainable
    • Applying the FAIR principles is time-consuming, and there is no incentive or reward to do so

“RISIS - An e-Infrastructure for the STI-Policy research community”, Thomas Scherngell, AIT. Link: [17]. Main points:

  • What is RISIS: First pan-European research infrastructure to study research and innovation dynamics and policies
    • Set of interlinked databases on: Firm Innovation capabilities, R&D output, Public research and Higher Education, Policy Learning.
    • Not all data are accessible, but the interlinking mechanisms are fully public
  • While metadata descriptions in, for instance, PDF format are not machine-readable, it is also important to have a format such as PDF that is easily understandable by humans, for instance to present your data and work to the outside world.
  • RISIS Website: [18]
  • RISIS Knowmak tool: provides indicators on Key Enabling Technologies and Societal Grand Challenges [19]
  • SIPER: Science and Innovation Policy Repository [20]
14.00-14.30 Break - Game “Utopia and metadata”. Or any time.
14.30-16.00 Discussion to compile a to-do list for the work in use cases on the second day. Serves as a guiding and aligning process. Led by WP2 (August Wierling/Valeria, HVL). What message do you take to day 2 for your use case?
  • Use case 1: The main issue for us is re-usability. We need to assess the databases that were chosen previously. Learn from best practices.
  • Use Case 2: Our main issue is privacy/ sensitivity of data. Security should come first. There is a tradeoff between universal metadata language and domain-specific language.
  • Use case 3: Linguistics is a problem. As soon as we change the application of the material, we also change related metadata. Find similarities of already existing metadata.
  • Use case 4: We face different languages and terminologies. The range of interpretation of the same terms is broad. We should address low hanging fruits but also aim at cracking hard nuts to improve FAIR/O principles.
  • General: EERAdata is probably more about asking the right questions to the energy research community than providing the right answers. There are already a lot of answers out there; we need to link them to our data issues. We envision being able to suggest low carbon energy metadata standards for and with the research community. Colleagues working on the EERAdata platform will join in on all use case discussions on day 2.
  • Motto: "Ontology is a formal representation of the matured knowledge of a community on a specific purpose".

Agenda Day 2

News of the day before

Not energy, but an inspiring connection, and perhaps unexpected, or perhaps not: DNA helps to puzzle together pieces of the Qumran scrolls. DNA taken from the animal skins that were used to write on ...


FAIR data and FAIR metadata - discussions in use cases, parallel sessions

Work in use cases on databases and metadata, led by use case leaders. Suggested outline:

  • 10-12 Discuss and update the preliminary state of FAIR/O for the use case. Use the prepared draft list of databases to check compliance with the FAIR principles (tools: WP3 questionnaire and others). Compare the assessment results for each database. Observe and discuss agreements and differences across the evaluation tools. Generate the overall picture of FAIR/O compliance for the use case to pin down the state of the art. Let’s see if we come to the same result as in our initial assessment for the application (traffic lights). Continuously make notes to report on results later. Select a responsible person. Objective: select 3-5 databases per use case; discuss which databases to select. One should cover use-case-specific challenges, one should have cross-use-case relevance, and one should be a low-hanging fruit for which it would be relatively easy to improve the current FAIR/O status.
  • 13-15 Joint brainstorming to discuss the FAIR/O state of the metadata for the selected databases. Evaluate: What is the current description of the metadata? How extensive is it? Is only administrative information provided, or a richer context description? What frameworks for metadata are used: taxonomy? thesaurus? ontology? How is the metadata information technically implemented: plain text file? XML? RDF? (see the short example after this outline) ... Identify use-case-specific issues with metadata - What are the gaps? What is perceived as a hard nut to crack? Pay special attention to the metadata of the databases and fill out the table provided by WP2. Continuously make notes to report results the next day! Select a responsible person!
  • From 15 Joint recording of lessons learned. Create and/or update the wiki for the use case with literature, gaps, best practices, FAIR/O discussion, metadata discussion, suggested next steps, .... Get your head around what to report the next day! Plan for 20 min. See the links to the wiki page templates below (Notes from Day 2).
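
To make the question "plain text file? XML? RDF?" more tangible, the short example below contrasts the same invented record as a free-text note and as typed XML fields using Dublin Core element names. It is purely illustrative and not a prescription for the use cases.

```python
# Sketch contrasting a human-readable metadata note with a machine-actionable
# serialization of the same record. Dataset, values, and the choice of Dublin
# Core elements are invented for illustration only.
import xml.etree.ElementTree as ET

# 1) Plain-text description: easy for humans, hard for machines to parse reliably.
plain_text = "Hourly load data for region X, 2015-2019, CSV, license CC-BY-4.0."

# 2) The same information as typed fields using Dublin Core element names.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)
record = ET.Element("metadata")
for name, value in [
    ("title", "Hourly load data for region X"),
    ("coverage", "2015/2019"),
    ("format", "text/csv"),
    ("rights", "CC-BY-4.0"),
]:
    ET.SubElement(record, f"{{{DC}}}{name}").text = value

print(plain_text)
print(ET.tostring(record, encoding="unicode"))
```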

Issues identified across use cases

Use case 1: Buildings efficiency. Use case 2: Power transmission & distribution networks. Use case 3: Material solutions for low carbon energy. Use case 4: Low carbon energy and energy efficiency policies.
Gaps & challenges per use case in a nutshell
  • qualitative nature of data limiting interoperability
  • data availability issues for time-series
  • multiplicity and scattered nature of data sources (households, industries, utility companies, municipalities)
  • Lack of standardization for metadata taxonomy and common vocabulary
  • Ambiguity on licensing issues for various types of energy data
  • Lack of unique identifier for energy data in most databases
  • microscopic use cases resembling the existing one about PV
  • link between microscopic and macroscopic materials (e.g., turbine blades)
  • metadata for applications of materials
  • linking to other fields
  • heterogeneous data make standardization difficult
  • policies are a topic linked to all use cases in EERAdata
  • metadata for images (e.g., maps) underdeveloped
  • complexity of complete provenance information
  • language and terminology are an issue (e.g., records instead of data)
  • stark discrepancy between FAIR assessment results by humans and machines

Use case 1: Continuously updated summary page UC1,

  • Detailed notes from WS1: WS1UC1
  • Detailed notes from WS2: WS2UC1
  • Detailed notes from WS3: WS3UC1
  • Detailed notes from WS4: WS4UC1
  • Detailed notes from WS5: WS5UC1
  • Detailed notes from WS6: WS6UC1

Use case 2: Continuously updated summary page UC2,

  • Detailed notes from WS1: WS1UC2
  • Detailed notes from WS2: WS2UC2
  • Detailed notes from WS3: WS3UC2
  • Detailed notes from WS4: WS4UC2
  • Detailed notes from WS5: WS5UC2
  • Detailed notes from WS6: WS6UC2

Use case 3: Continuously updated summary page UC3,

  • Detailed notes from WS1: WS1UC3
  • Detailed notes from WS2: WS2UC3
  • Detailed notes from WS3: WS3UC3
  • Detailed notes from WS4: WS4UC3
  • Detailed notes from WS5: WS5UC3
  • Detailed notes from WS6: WS6UC3

Use case 4: Continuously updated summary page UC4,

  • Detailed notes from WS1: WS1UC4
  • Detailed notes from WS2: WS2UC4
  • Detailed notes from WS3: WS3UC4
  • Detailed notes from WS4: WS4UC4
  • Detailed notes from WS5: WS5UC4
  • Detailed notes from WS6: WS6UC4

Note: This schedule is a suggestion. Adjust and organize breaks as needed.

Special session - FAIR data evaluation by machine

15.30 - 16.00 Andreas Jaunsen, Nordforsk. Presentation of the results from an automated FAIR evaluation for selected repositories suggested by the use cases.

The proposed list of databases to check:

Results note: 22 indicators of the FAIR Maturity Evaluation Service are tested. 0 stands for "failing the test", 1 stands for "passing the test". (A small sketch after the table shows how the aggregate percentages relate to the indicator string.)

Most of the proposed entries are web pages of repositories rather than individual datasets. Thus, a random dataset was selected from a few of them.

Database link | Result link | Result across 22 indicators | Aggregate result across FAIR categories (F, A, I, R, overall)
https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:JOC_1999_093_R_0001_01 | https://fair.etais.ee/evaluation/4842 | 1001100001010110000000 | 37.50% 40.00% 28.57% 0.00% 31.82%
https://data.jrc.ec.europa.eu/dataset/93d07f10-7757-485f-bb8e-3160536b97f8 | https://fair.etais.ee/evaluation/4843 | 1001110011110110011100 | 0.00% 80.00% 71.43% 0.00% 59.09%
https://onlinelibrary.wiley.com/doi/abs/10.1002/aenm.201902830 | https://fair.etais.ee/evaluation/4844 | 1000000001010000000000 | 12.50% 40.00% 0.00% 0.00% 13.64%
https://doi.org/10.25832/conventional_power_plants/2018-12-20 | https://fair.etais.ee/evaluation/4845 | 1101110011110110011100 | 62.50% 80.00% 71.43% 0.00% 63.64%

See also: https://www.rd-alliance.org/groups/fair-data-maturity-model-wg and https://fairsharing.github.io/FAIR-Evaluator-FrontEnd/#!/
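
The following sketch shows how the aggregate percentages can be reproduced from the 22-digit indicator string. The split of the 22 maturity indicators into 8 Findability, 5 Accessibility, 7 Interoperability, and 2 Reusability tests is inferred from the percentages in the table (an assumption, not taken from the evaluation service itself); it reproduces, for example, the first row.

```python
# Sketch: derive per-category and overall scores from a 22-digit indicator string.
# The F/A/I/R split of 8/5/7/2 is an assumption inferred from the table above.
GROUPS = [("F", 8), ("A", 5), ("I", 7), ("R", 2)]

def aggregate(indicators: str) -> dict:
    assert len(indicators) == 22
    bits = [int(c) for c in indicators]
    scores, start = {}, 0
    for name, size in GROUPS:
        chunk = bits[start:start + size]
        scores[name] = 100 * sum(chunk) / size
        start += size
    scores["overall"] = 100 * sum(bits) / len(bits)
    return scores

# Example: the first row of the table.
print(aggregate("1001100001010110000000"))
# -> F 37.5, A 40.0, I ~28.6, R 0.0, overall ~31.8
```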

Agenda and notes from Day 3

Reporting experience from use case applications. Prepare the next workshop on workflows and metadata. Introduction and discussion of data management plans.

Time slot Topic
10-12.00 Discussion led by WP2. Report from use case experiences (by use case leaders). Wrap up.
  • UC1 - Wrap up:
    • Identified Issues: often locally kept data (findability), poor maintenance of DBs (accessibility), privacy issues (re-usability), Dublin Core standards often not considered, lack of live and fresh data in many DBs, FAIRness assessment often subjective.
    • Next steps: Find recipes for FAIRification, Database specific implementation of FAIRness
  • UC2 - Wrap up:
    • Identified Issues: lack of standardization; DBs often cover similar topics but use different (meta-)data standards; varying degree of information on licensing; unclear how FAIR2.1 (provenance) should be interpreted.
    • Next steps: reassess datasets with FAIR principles other than Wilkinson's; more discussion with the teams of the other EERAdata UCs; cross-country deep dive with selected DBs.
  • UC3 - Wrap up:
    • Identified Issues: Different metadata standards depending on domain, no unique definition of metadata/ individual preferences, lack of standards, interoperability, Increasing amount of generated data leads to increasing costs of data management.
    • Next steps: How to connect to the other EERAdata UCs? How to connect to the EERA JPs?
  • UC4 - Wrap up:
    • Identified Issues: two groups of databases: policy DBs and policy-relevant DBs; dimensions of policy: policy domain (thematic grouping) and policy scope (administrative grouping, e.g. the administrative level a policy is aimed at)
    • Next steps: Agree on set of keywords, FAIR principles: comparison between Wilkinson and Mons system
12-12.30 Presentation "Some pitfalls in database licenses", Carsten Hoyer-Klick, DLR - German Aerospace Center. Main points:
  • Databases can be protected by general copyright law, by database generation rights, or by both
  • Database generation rights merely require a substantial investment
  • If no license is applied to a database, general copyright and database generation rights apply, which are very restrictive
  • Databases should be published under suitable permissive licenses (e.g. CC-BY)
  • Normally, publicly funded projects count as commercial projects under general copyright law
  • https://open-power-system-data.org/legal
12.30-13.30 Lunch break
13.30-15.00 Data Management Plans

Presentation “Introduction to DMP and best practices” by Trond Kvamme, NSD. Link: [21]. Main points:

  • Almost 80% of all publicly funded research data is never re-used
  • Content of a DMP: 1 Roles and responsibilities, 2 Legal and ethical issues, 3 Data description, 4 Documentation and data quality, 5 Storage and security, 6 Long-term preservation, 7 Data sharing and re-use
  • 5 quick tips for developing a DMP
    • Begin as early as possible and update regularly (according to H2020, at least one version at project start, midterm, and end)
    • Check what help you can get from your institution
    • Consider using an online DMP tool
    • Create a DMP that actually helps you. Do not treat the DMP as just another bureaucratic "must do"
    • Be as simple as possible but as precise as necessary. Make sure to clarify which part of your data each DMP section refers to. Large projects can have different data subsets that require different management plans.
  • DMP tools: DCC DMP online [22], DMPTool [23], easydmp [24], NSD DMP [25], Data Stewardship Wizard [26]

Presentation on Machine-actionable DMPs by Tomasz Miksa, TU Vienna. Link: [27]. Main points:

  • Current shortcomings of DMPs:
    • Manually completed, vague, not updated, considered bureaucracy, completed last minute
    • People make commitments in a DMP that affect other people without prior consultation, e.g. researchers stating that their institution will provide storage space without first contacting the responsible administration at the institution
    • Often more people are involved in a DMP than initially assumed, e.g. secondary persons not directly involved in the project, such as administrators, legal consultants, etc.
  • Currently, DMP designs are often driven by the wishes of the funders and ask for information that is not strictly necessary for well-structured data management.
  • A first aim of automated DMP tools would be to allow automated retrieval of administrative data from already existing databases at the institutions (e.g. researcher profiles, ORCID iDs, institution profiles, etc.), sparing project members from manually re-entering this information with every new DMP.
  • RDA DMP Common Standards: [28] (a minimal sketch of a machine-actionable DMP record follows this list)
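
As a rough illustration of what "machine-actionable" means in practice, here is a minimal sketch of a DMP record loosely modelled on the RDA DMP Common Standard linked above. All names, identifiers, and values are invented, and the field names are approximations; consult the standard itself for the authoritative schema.

```python
# Illustrative, loosely RDA-maDMP-shaped record. Field names are approximate and
# all values are invented; this is not an official EERAdata DMP.
import json

madmp = {
    "dmp": {
        "title": "EERAdata use case data management plan (example)",
        "created": "2020-06-04T12:00:00Z",
        "modified": "2020-06-04T12:00:00Z",
        "contact": {
            "name": "Jane Researcher",                                  # invented
            "mbox": "jane.researcher@example.org",
            "contact_id": {"identifier": "https://orcid.org/0000-0000-0000-0000",
                           "type": "orcid"},
        },
        "dataset": [{
            "title": "FAIR assessment results for selected databases",
            "personal_data": "no",
            "sensitive_data": "no",
            "distribution": [{
                "title": "CSV export",
                "license": [{"license_ref": "https://creativecommons.org/licenses/by/4.0/",
                             "start_date": "2020-06-04"}],
            }],
        }],
    }
}

# A tool could pre-fill the contact block from institutional systems (researcher
# profiles, ORCID) instead of asking the researcher to retype it each time.
print(json.dumps(madmp, indent=2))
```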

Discussion of EERAdata DMP draft (August, HVL). Main points:

  • Establish a Data Committee
  • allow provision of different licenses for sub-datasets within one database
  • Even though EERAdata only works with already existing databases, the results of the FAIR evaluation are new data that require new licensing

A short break of 15 min.
15.15-16.00 Wrap up of workshop with feedback from invited experts.
  • Promoting FAIR data: Why is FAIR open data good? How many more users, publications, etc.?
  • Need for more responsible management of research investment: we are spending billions on generating data; we need to make sure that this data is valuable and stays valuable
  • Can we minimize data, e.g. avoid double collection of data?



WS1 through the lens of an artist