WS1

From EERAdata Wiki
Revision as of 02:11, 4 June 2020 by Valerias (talk | contribs) (Agenda Day 1)

The first EERAdata workshop is organized as an online webinar and hackathon from 2 to 4 June 2020.

Objective

  • Learn about FAIR and open data practices in the energy and other research communities.
  • Choose the databases to be investigated in the use cases. Discuss and summarize the reasons for the selection.
  • Analyze the state of FAIR/O principles for the selected databases (with the help of the questionnaire and FAIR/O assessment tools).
  • Evaluate the state of FAIR/O principles for the metadata of the selected databases.
  • Suggest general principles for the design of metadata (use-case- and domain-specific).
  • Discuss top-level metadata adhering to FAIR/O principles (use-case- and domain-specific).
  • Learn about the FAIRification of Data Management Plans (DMP).

Read aheads

Obligatory:

Suggested read on the history of metadata: Metadata - Shaping Knowledge from Antiquity to the Semantic Web by Richard Gartner, https://www.springer.com/gp/book/9783319408910

Join and collaborate interactively!

The workshop is a hackathon! EERAdata can only evolve as a joint effort! Join the discussions and work together online:


EERAdata story board

Objective: collect views from participants regarding energy data, energy metadata, FAIR/O problems etc. to support discussions, identification of gaps & needs (WP2) and elements to design the platform (WP3).

Questions:

  • Q#1: What is your user story regarding FAIR/O energy data?
  • Q#2: What is your user story regarding metadata?

Template to answer: As a <stakeholder>, I want <goal> so that <reason>.

How to post a story?

  1. Click on the link: https://padlet.com/janaschwanitz/aciywzt9fs1h70m2
  2. Click on the plus sign at the bottom right to add a story.
  3. Choose a title according to the question you want to answer: “FAIR/O data” for Q#1 or “metadata” for Q#2. In the example above, the title “metadata” was chosen.
  4. Type your sentence to let us know where the gaps and hopes are, following the template: As a <stakeholder>, I want <goal> so that <reason>.

Example: As a data-driven energy researcher, I want utopia (i.e. one-stop access to all relevant metadata), so that I can browse available data, check where they come from and decide whether I can trust them when reusing them.

How to comment on a story?

  • You can rate stories by clicking on a number of stars.
  • You can comment on stories, and on other comments, in words.
  • Adding stories and commenting is anonymous, as long as you are not logged into your personal profile at padlet.com.

EERAdata wiki - How to?

The easiest way is simply to start writing and editing. You can install an easy editor by following the steps described in the picture to the right.

How to install an easy editor

Consult the User's Guide for information on using the wiki software.

EERAdata on github

Link here and share your thoughts and issues! Note, work in progress.

EERAdata @ Research Gate

Link here and share your thoughts and issues! Note, work in progress.

Just have fun!

Metadata memory game

Read before playing:

This is a little memory game in which players need to identify triples, that is, three tiles that belong to one data set. All of them relate to low carbon energy. Some tiles are pictures, others show metadata descriptions (in various formats) and some are even sound files. The aim is to have fun while learning about metadata. Try to find all triples. Note that if you have opened three tiles and they do not match, all of them close automatically, just as in a real-life memory game. Link

Agenda Day 1

Builds on read aheads. Online talks and discussions. Space for interaction with participants after each presentation. Moderated discussion of collected comments.

Time slot Topic
10.00-10.20 Welcome and introduction with workshop goals and procedures: “EERAdata - Towards Utopia for low carbon energy research”, Valeria Jana Schwanitz, HVL & PI EERAdata. Link: [1]
10.20-12.15 Online lectures:
  • “The EOSC Nordic: machine-actionable FAIR maturity evaluations & the FAIRification of data repositories” - Andreas Jaunsen, Nordforsk & EOSC-Nordic. Link: [2]
  • “OpenAIRE: Open Access Infrastructure for Research in Europe”, Ilaria Fava, OpenAIRE. Link: [3]

Short break of 15 min

  • “Community-driven metadata and ontologies for Materials Science and their key role in artificial-intelligence tools”, Luca Ghiringhelli, FHI Berlin. Link: [4]
  • “Metadata practices from IRP Wind”, Anna Maria Sempreviva, DTU. Link: [5]
12.15-13.00 Lunch Break. Play the EERAdata game “Utopia and metadata”. Or any time.
13.00-14.00 Online lectures:
  • “Humanities and data: for a community-driven path towards FAIRness”, Elena Giglia, UNITO. Re-using presentation held at the Open Science Conference 2020 in Berlin. Link to presentation stored at zenodo: https://zenodo.org/record/3776849#.XthYBnYza90
  • “RISIS - An e-Infrastructure for the STI-Policy research community”, Thomas Scherngell, AIT. Link: [6]
14.00-14.30 Break - Game “Utopia and metadata”. Or any time.
14.30-16.00 Discussion to compile a to-do list for work in use cases on the second day. Serves as a guiding and aligning process. Led by WP2, August Wierling/Valeria, HVL.

Notes from Day 1

Welcome and Introduction

  • Introduction by Valeria:
    • What the EERAdata project is about
    • The data revolution in the energy system is speeding up
    • Data on:
      • Weather patterns
    • More and more applications are becoming available
      • Digital twins: steering nuclear plants just from the screen
    • Buildings get energy passports
    • An era of digital reporting and compliance is arising
    • Importance of linking across different formats and topics
    • Time spent as a researcher: a large amount of time goes into data sorting, formatting, searching, etc.
  • Valeria invites everyone again to write down their story connected to energy data on the EERAdata storyboard (see the link on the main wiki page)
    • This will be valuable in moving forward with the ideas for the EERAdata project
  • Valeria's personal starting point regarding EERAdata:
    • Work towards enabling FAIR and open data for the low carbon energy research community
  • We hope to add what we learn in these 3 days to the EERA WIKI
    • guidance platform for low carbon energy researchers
  • At the end, we hope to arrive at a set of metadata standards for energy data
  • What could be the utopia for an energy data researcher:
    • Imagine looking for citizen-led initiatives in PV
    • Search for keywords on a website and receive a list of datasets related to this topic
    • For all those datasets:
      • see the number of initiatives available
      • when were they last updated
      • links provided to access the data
      • are they certified as FAIR/O, and which institution is certifying
      • refine search with specific filter options
      • information on the quality/ data collection and processing methodology
      • option for simple data analysis operations
      • -> get an overview of the richness of the data
    • -> we are still very far away from this utopia
  • Goals of this Workshop:
    • listen to various best practices
    • day 2: discuss the provided examples
    • day 3: presentation of findings from day 2 discussions
    • day 3 afternoon: Data management plans (DMPs)

The EOSC Nordic: machine-actionable FAIR maturity evaluations & the FAIRification of data repositories

  • Notes incomplete, will be added later
    • What is the EOSC:
      • goal:
        • enable researchers to access data across domains and disciplines as easily as possible
        • locating the relevant data
      • vision:
        • all European data should be available to research
        • does not mean that all data is stored centrally, but that all databases are interconnected
  • WP4 members include all Nordic countries
  • How can we improve datasets:
    • source code:
  • recommendations:
  • evaluations:
    • why do we evaluate repositories?
      • what is the level of FAIRness?
      • about 10 datasets per repository
      • FAIR score
      • Positive highlights:
      • FAIR score of all evaluated datasets:
        • majority at 0.10 out of 1
  • coming tasks:
  • question from Valeria: would it be possible to use their scoring system to assess one of the EERAdata datasets?
    • Answer:
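
The evaluation idea noted above (a FAIR score between 0 and 1 per dataset, with about 10 datasets evaluated per repository) could be sketched roughly as follows. This is only an illustration: the indicator names and the equal weighting are assumptions made here, not the tests used by the EOSC-Nordic evaluations.

```python
from statistics import mean

# Hypothetical indicators (assumed for this sketch); real evaluators run many more tests.
INDICATORS = (
    "has_persistent_id",
    "metadata_retrievable",
    "uses_shared_vocabulary",
    "has_license",
)


def fair_score(checks: dict[str, bool]) -> float:
    """Return the fraction of indicators a dataset passes (0.0 to 1.0)."""
    passed = sum(bool(checks.get(name, False)) for name in INDICATORS)
    return passed / len(INDICATORS)


def repository_score(datasets: list[dict[str, bool]]) -> float:
    """Average the scores of the datasets sampled from one repository."""
    return mean(fair_score(d) for d in datasets)


# Made-up sample data, not results from the talk.
sample = [
    {"has_persistent_id": True, "has_license": True},
    {"has_persistent_id": True},
    {},
]
print(round(repository_score(sample), 2))
```

A missing indicator counts as failed, so sparse metadata lowers the score rather than raising an error.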

OpenAIRE: Open Access Infrastructure for Research in Europe

  • Introduction to OpenAIRE:
    • 2007 pilot publications
  • OpenAIRE is bridging the two worlds where science is performed and published
  • In practice:
    • services that monitor, assess and accelerate open science
    • facilitate research communities' adoption of open science
  • Who is OpenAIRE:
    • 50 partners all over Europe
    • experts in open science in every EU member state
    • regional coordinators
    • topical coordinators working on policies for open science
    • OpenAIRE is an infrastructure
    • last phase of OpenAIRE: citizen science
      • OpenAIRE ends in 12/2020
    • 4 regions: Northern, Eastern, Western and Southern Europe
    • large differences in needs from country to country
      • -> find best solutions to develop open science for each country
    • connections to the open science community in North America, Japan, ...
  • OpenAIRE has published a set of guidelines in cooperation with an international network:
  • provided services:
    • providing policy advice
    • training and support
    • open science infrastructure
    • How to achieve FAIRness in sharing research data?
    • guidelines on how to provide open access to publications, not only data
  • How we support:
    • Helpdesk: questions, FAQ
  • OpenAIRE provides guides for:
    • researchers
    • content providers
    • funders
  • Webinars:
  • outreach:
    • 230 webinars
    • 22 NOADs are involved in EOSC WGs
  • Explore portal:
    • search interface for all content available through OpenAIRE
      • 40M publications: shows most crucial metadata
    • EERAdata record:
      • for now, no information, as no data has been collected from repositories yet
  • OpenAIRE Connect project:
    • gateway for research communities
    • allows research communities to build a gateway that collects different data relevant to their community
  • COVID-19 research community:
    • ca 3200 research datasets
    • 113 related projects
  • OpenAIRE Provide:
    • allows connection of repositories to OpenAIRE
  • Zenodo: repository developed by OpenAIRE that allows institutions to publish their research output
  • AGROS: machine-readable data repository
  • Comment from Andreas Jaunsen: today, research projects spend roughly 80% of their research time on data gathering/processing
    • -> open access aims to reduce this, as the data only has to be processed once
  • Carsten Hoyer comments:
    • create a metadata catalogue
    • link the provenance information of data
      • -> track who has done what with the data
  • Question from August:
  • Rich metadata: how can one search specifically for the research question of a single paper? How could this be achieved?
    • OpenAIRE Explore: works like a normal search engine -> high "noise" (unwanted search results)
  • Incentives for researchers to provide rich metadata?
    • Use of a common ontology allows for more refined search results

Community-driven metadata and ontologies for Materials Science and their key role in artificial-intelligence tools

  • Key message:
  • FAIR: Findable, Accessible, Interoperable, Reusable
    • Findable:
      • Uniqueness of the data
    • Accessible:
      • URL, accessible via API
    • Interoperability:
    • Reusability: metadata should be as descriptive as possible
  • NOMAD-FAIRDI Workshop:
    • Shared metadata and data formats for big-data driven material science
  • Data object:
    • Metadata:
      • Unique identifier, Structure of the data, Method
      • should contain information on the full provenance of the data:
        • where does it come from? Another database? A calculation? etc.
      • Definition:
  • NOMAD Repository structure:
    • NOMAD Repository
    • Conversion Layer
    • The archive
      • Three access points:
  • Computational material sciences:
  • Ontology:
  • Questions:
    • What happens to the metadata when you physically go from one structure to a combined structure? How do you combine the metadata of the two initial structures?
    • From the EERAdata perspective: in the materials for low carbon energy use case, what could be a valuable contribution of EERAdata to the research community?
      • Ontologies are driven by the use cases.
      • Summarizing: exercise a fine-grained use case such as solar PV. Link different levels of metadata. Further develop metadata at the use case level. How to link across use cases?
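
The data object described in these notes (metadata with a unique identifier, the structure of the data, the method, and the full provenance) can be illustrated with a small sketch. The class and field names, and the sample values, are assumptions made for illustration and are not the NOMAD metadata schema.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceStep:
    """One step in the history of a data object (hypothetical structure)."""
    source: str       # where the data came from, e.g. another database or a calculation
    description: str  # what was done in this step


@dataclass
class DataObject:
    identifier: str   # unique identifier
    structure: str    # structure of the data
    method: str       # method used to produce the data
    provenance: list[ProvenanceStep] = field(default_factory=list)

    def derive(self, identifier: str, method: str, description: str) -> "DataObject":
        """Create a derived object that carries the full provenance forward."""
        step = ProvenanceStep(source=self.identifier, description=description)
        return DataObject(identifier, self.structure, method, [*self.provenance, step])


# Placeholder identifiers and methods, invented for this example.
raw = DataObject("calc-001", "crystal structure", "initial calculation")
relaxed = raw.derive("calc-002", "geometry relaxation", "relaxed from calc-001")
print([step.source for step in relaxed.provenance])
```

Copying the provenance list on every derivation keeps each object's history independent, so a later step never changes the record of an earlier one. How to merge the histories of two combined structures, the question raised in the discussion, is left open here.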

Metadata practices from IRP Wind

  • Starting point was the inability to find information
  • This presentation is about research data
  • Digital objects can be assets due to the competitive advantages they provide
  • Interpretation of FAIR: F, A and I data result in R data
    • Reusable data multiplies the value of the data
  • "Data should be as open as possible but as closed as necessary"
  • Issue: How to make data findable but safe
    • Mistrust/fear that one's own data might be misused or that one loses the competitive advantage
  • Issue: Data findability
    • Datasets are organized in different ways
  • IRP metadata and taxonomies:
    • Base metadata as defined by the Dublin Core standard
    • Additional non-Dublin Core taxonomies
  • Creating a taxonomy:
    • Expert elicitation: a group of experts creates a taxonomy, which is then reviewed by the wider research community
    • Author keywords: map keywords used by authors by similarity in meaning and frequency of usage
      • Pros: Allows tracking of trends
      • Cons: Mix of disciplines, models, etc; Many errors and ambiguities; single generic words with a broad range of possible interpretations.
  • Virtual libraries:
    • Two aspects: search engine (metadata) and data storage (data)
      • Linked databases can be searched centrally, while the data is stored decentrally
  • Conclusion:
    • The purpose of a web data portal is to (1) safely connect data owners and users and (2) inform on the availability of data, not necessarily to provide direct access to data
  • Discussion:
    • How to create and maintain metadata? How to allow dynamic updates of data? How to deal with language issues (non-native speakers, translation, ...)?
    • Third option to create taxonomies: Use machines or algorithms to link the different taxonomy development approaches?
    • General metadata standards exist; what is needed are domain-specific metadata
    • Two separate categories: metadata and vocabularies
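
The author-keyword approach to building a taxonomy (mapping keywords by similarity in meaning and counting how often they are used) might look like the sketch below. The synonym table and the sample keywords are invented for illustration and are not taken from the IRP Wind taxonomy.

```python
from collections import Counter

# Hypothetical synonym map: keyword variant -> preferred taxonomy term.
SYNONYMS = {
    "wind turbines": "wind turbine",
    "wt": "wind turbine",
    "offshore wind": "offshore wind energy",
}


def normalise(keyword: str) -> str:
    """Lower-case, collapse whitespace and map known variants to one term."""
    key = " ".join(keyword.lower().split())
    return SYNONYMS.get(key, key)


def keyword_frequencies(papers: list[list[str]]) -> Counter:
    """Count in how many papers each normalised keyword appears."""
    counts = Counter()
    for keywords in papers:
        counts.update({normalise(k) for k in keywords})  # once per paper
    return counts


# Made-up author keywords for three papers.
papers = [["Wind turbines", "Offshore wind"], ["WT", "wake"], ["wind  turbine"]]
counts = keyword_frequencies(papers)
print(counts.most_common(2))
```

Unmapped keywords pass through unchanged, which keeps the ambiguous, generic words listed under "Cons" visible for manual review.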

Humanities and data: for a community-driven path towards FAIRness

  • Humanities are often a bit neglected in the data community
  • Who is ALLEA?
  • Who is Co-operas?
    • implementation network
    • goal: taking care of the whole cycle of humanities research from data gathering to publishing
  • Data management life cycle:
    • Identification
    • Planning
    • Gathering
    • maintaining, processing
    • dissemination
  • FAIR Data in humanities?
    • the FAIR principles will continue to shape the data landscape in the humanities
  • "There is risk and value in being a first mover, but there is more risk in being a follower"
  • Is data still a dirty word in the SSH?
    • In the humanities, we all use Data, even if we are not aware of it
    • Data in humanities is more prone to interpretation than in STEM disciplines -> harder to grasp
  • What is data in the humanities:
    • Never "raw" data
    • Data is always an expression of the method
    • there is always a choice
    • There is always an interpretation
    • there is always a discussion
  • Data pillars:
    • Data are so diverse that it is impossible to converge on a common definition
      • to define data, go as deep as possible into research practices
  • risk of oversimplification
  • preliminary issues of FAIRness:
    • In which step and how should FAIR principles be applied?
    • what language?
    • Lack of skills among researchers
    • registry of existing tools
    • need to preserve the specificities of how we do research in the humanities
    • services and tools need to be sustainable
  • Findability:
    • metadata needs to be separated from data
    • maintain richness of metadata
  • difficulties in humanities:
    • data in humanities are linked to cultural, political aspects
  • Next steps:
    • Data stewardship wizard for SSH
  • Accessible does not equate to open
  • interoperability:
    • data is often so closely tailored to a specific project that interoperability is impossible
  • reusability:
    • Copyright is still an issue
    • need of legal advice
    • In the end, reuse is the final aim of FAIRness
    • reusability is also a measure of the impact of your data
  • mindshift:
    • currently everything is closed, except for a few daring frontrunners
    • in future, everything should be open, except for a few absolutely necessary cases.

RISIS - An e-Infrastructure for the STI-Policy research community

  • RISIS:
    • within social sciences
    • strong connection to economics
    • Measuring innovation: long tradition of using quantitative indicators and qualitative models
    • started in 2014
    • focus on linking R&I datasets
    • harmonizing organization names (organizations are the main objects of study in this project)
  • Datasets:
    • firm innovation capabilities
    • R&D output databases
    • Public research and higher education
    • Database on policy learning
  • need to adapt to new developments within research communities
  • RISIS Knowmak tool:
    • openly and freely present indicators in R&I gathered from RISIS datasets
  • RISIS dataset portal
    • Check metadata of available datasets
    • choose relevant datasets
    • fill out the data access application form
    • application will be assessed by external reviewers
    • if approved, access is granted
  • Discussion:
    • How do you integrate your datasets with other datasets (e.g. business registers, national industry sector classifications)?
      • business registers are often not open and need to be purchased, hence integration is often not possible
      • external classifications (e.g. industry sector classifications) are often included, but an internal ontology was also developed
    • Are the metadata descriptions machine-readable? Or only available as PDF?
      • Yes, they are also available in machine-readable formats, but the PDF as a qualitative version is important for external presentation

Discussion/ to-do list for work in use cases

  • Questions:
    • What is the take home message for you?
  • Use case 1:
    • main goal is the re-usability
    • we need to assess the databases that were chosen previously
    • best practices: EOSC,
  • Use Case 2:
    • Main issue in this use case is privacy/ sensitivity
    • security comes first
    • Trade-off between universal metadata language and domain-specific language
  • Use case 3:
    • Linguistics is a problem
    • As soon as we change the application of material/ material data, we change/need to change the metadata
    • Find similarities of already existing metadata
  • Use case 4:
    • big challenge regarding the linguistic problem
      • Different languages
      • Different interpretation of same terms
    • Pick low-hanging databases that are far away from FAIR principles
  • EERAdata is probably more about asking the right questions than providing the right answers; there are already a lot of answers out there
  • Interests of HVL:
    • Assess FAIRness of our own COMETS database
    • Assess how EERAdata adheres to new EU funding requirements
  • Outcome of first and second workshop is hopefully some drafts for low carbon energy metadata standards, which can be reviewed by experts
  • Those responsible for the EERAdata platform will join all use case discussions on the second workshop day
  • "Ontology is a formal representation of the knowledge of a community on a specific purpose"

Agenda Day 2

News of the day before

Not energy, but an inspiring connection, perhaps unexpected, or perhaps not: DNA helps to puzzle together pieces of the Qumran scrolls. DNA taken from animal skins that were used to write on ...

METADATA - DISCUSSIONS in use cases, parallel sessions

Work in use cases on databases and metadata, led by use case leaders. Suggested outline:

  • 10-12 Discuss and update the preliminary state of FAIR/O for the use case. Use the prepared draft of databases to check compliance with FAIR principles (tools: WP3 questionnaire and others). Compare the assessment results for each database. Observe and discuss agreements and differences across the evaluation tools. Generate the overall picture of FAIR/O compliance for the use case to pin down the state of the art. Let’s see if we come to the same result as in our initial assessment for the application (traffic lights). Continuously make notes to report later on results. Select a responsible person. Objective: select 3-5 databases per use case. Discuss which databases to select: one that covers use-case-specific challenges, one with cross-use-case relevance, and one low-hanging fruit for which it would be relatively easy to improve the current FAIR/O status.
  • 13-15 Joint brainstorming to discuss the FAIR/O state of the metadata for the selected use cases. Evaluate: What is the current description of metadata? How extensive is it? Is only administrative information provided, or a richer context description? What frameworks for metadata are used: taxonomy? thesaurus? ontology? How is the metadata information technically implemented: plain text file? xml? rdf? ... Identify use-case-specific issues with metadata: What are the gaps? What is perceived as a hard nut to crack? Pay special attention to the metadata of the databases and fill out the table provided by WP2. Continuously make notes to report results the next day! Select a responsible person!
  • 15-17 Joint recording of lessons learned. Create and/or update the WIKI for the use case with literature, gaps, best practices, FAIR/O discussion, metadata discussion, suggested next steps, .... Get your head around what to report the next day! Plan for 20 min. See the links to WIKI page templates below (Notes from Day 2).
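
To make the question of technical implementation (plain text file? xml? rdf?) more concrete, the sketch below writes one small metadata record first as JSON and then as XML using Dublin Core element names. The record values and the wrapping "metadata" root element are made up for illustration. Only the Dublin Core namespace and the element names (title, creator, identifier, rights) come from the standard.

```python
import json
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC)

# Placeholder record, invented for this example.
record = {
    "title": "Example energy dataset",
    "creator": "Example institute",
    "identifier": "example-0001",
    "rights": "CC-BY-4.0",
}


def to_xml(rec: dict[str, str]) -> str:
    """Serialise the record as XML with one dc: element per field."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in rec.items():
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(json.dumps(record, indent=2))  # plain key-value view
print(to_xml(record))                # same content with namespaced elements
```

The XML form carries the namespace, so a machine can tell which vocabulary each field belongs to, while the plain JSON keys rely on the reader knowing what they mean.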

Gaps identified across use cases

Use case 1 | Use case 2 | Use case 3 | Use case 4
Buildings efficiency | Power transmission & distribution networks | Material solutions for low carbon energy | Low carbon energy and energy efficiency policies
  • qualitative nature of data limiting interoperability
  • data availability issues for time-series
  • multiplicity and scattered nature of data sources (households, industries, utility companies, municipalities)
  • Lack of standardization for metadata taxonomy and common vocabulary
  • Ambiguity on licensing issues for various types of energy data
  • Lack of unique identifier for energy data in most databases
  • microscopic use cases resembling the existing one about PV
  • link between microscopic and macroscopic materials (e.g., turbine blades)
  • metadata for applications of materials
  • linking to other fields
  • heterogeneous data make standardization difficult
  • policies are a topic linked to all use cases in EERAdata
  • metadata for images (e.g., maps) underdeveloped
  • complexity of complete provenance information
  • language and terminology are an issue (e.g., records instead of data)
  • stark discrepancy between FAIR assessment results by humans and machines

Note: This schedule is a suggestion. Adjust and organize breaks as needed.

Notes from Day 2

  • Use case 1 - Summary: UC1, Detailed notes: WS1UC1
  • Use case 2 - Summary: UC2, Detailed notes: WS1UC2
  • Use case 3 - Summary: UC3, Detailed notes: WS1UC3
  • Use case 4 - Summary: UC4, Detailed notes: WS1UC4

Agenda Day 3

Reporting experience from use case applications. Prepare the next workshop on workflows and metadata. Introduction and discussion of data management plans.

Time slot Topic
10-12.00 Discussion led by WP2. Report from use case experiences (by use case leaders). Wrap up.
12-12.30 "FAIR data and licensing", Carsten Hoyer-Klick, DLR - German Aerospace Center
12.30-13.30 Lunch break
13.30-15.00 Data Management Plans
  • Presentation “Introduction to DMP and best practices” by Trond Kvamme, NSD
  • Presentation on machine-actionable DMPs by Tomasz Miksa, TU Vienna
  • Discussion of EERAdata DMP draft (August, HVL)
  • Short break of 15 min
15.15-16.00 Wrap up of workshop with feedback from invited experts.

Notes from Day 3
