Set FAIRification objective

From EERAdata Wiki

In the pre-FAIRification process, objectives and guiding principles are defined to steer the overall FAIRification process. A goal of the pre-FAIRification process is to find an optimum between effort and sophistication. As Wilkinson et al. [1] put it: "Communities decide what FAIR practices are most important to them, essentially setting the targets for themselves, allowing members of that community to evolve over time while realistically operating within their budgets in order to achieve their best FAIR performance". This applies in particular to the amount of metadata provided and to the depth of provenance recorded for the compiled data. For example, information is often compiled from PDF documents. One possible decision is to cite the whole PDF document as the source without specifying where in the document the data are located. In general, the workload, competencies, and skills of the research team involved have to be analyzed and balanced against the benefits and level of detail provided to the end user. This is also the point at which to identify any issues with access rights to the data.

In this context, it is also useful to estimate the workload of FAIRification. Our experience from three projects can be found here.

Example

Use case 2 FAIRified an inventory of citizen-led initiatives active in the low-carbon energy domain. The data were collected in the H2020 project COMETS. While the data were collected as a set of linked files, the final inventory was set up as a single file, since no resources were available to maintain the inventory as a database with a user frontend. This had important consequences for the FAIRification process. In detail, the following decisions were taken:

  • To host the final file on a scientific repository.
  • As the project participated in the Open Research Data Pilot of the European Commission, the data had to be published as an open-access publication.
  • To answer the research questions identified in the user scenarios, it was crucial to preserve the relationships between the data. This meant, first and foremost, keeping the links between primary and secondary database keys while combining several linked datasets into a single document.
  • To promote openness of the data and offer them to a wide audience, proprietary data formats were avoided. The goal was to offer readable text instead of a binary format such as an xls file.
  • Due to the very dynamic evolution of citizen-led initiatives and their projects, the database will be outdated in the medium term. Ensuring extensibility is therefore important: it should be possible to add data to the inventory without major layout changes.
  • A core aim of the FAIR guiding principles is to ensure machine-actionability. As a consequence, special attention was paid to making the data as interoperable as possible. This implies using specific technical solutions (see the semantic model and linking data and metadata) that require educating the team members who compiled the inventory. Educational modules had to be developed on: a) metadata standards, b) RDF, and c) SPARQL as a query language.
  • Giving information on data sources was restricted to referring to the source document alone instead of specifying the particular location in the document.
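The decisions about preserving key relationships, using a readable text format, and keeping the inventory extensible can be illustrated with a minimal sketch. It is not the implementation used in the use case: the record contents, the field names (`initiative_id`, `project_id`, `source`), the helper functions, and the choice of JSON as the text format are all assumptions made for illustration. The sketch merges two linked tables into a single text document, checks that every reference still resolves, and records the source at document level only.

```python
import json

# Hypothetical example records; these are NOT data from the COMETS inventory.
# Each initiative cites its source document as a whole (no page or table location).
initiatives = [
    {"initiative_id": "I1", "name": "Example Energy Cooperative", "source": "report_2020.pdf"},
    {"initiative_id": "I2", "name": "Sample Solar Community", "source": "survey_2019.pdf"},
]
# Each project points back to its initiative through the initiative_id key.
projects = [
    {"project_id": "P1", "initiative_id": "I1", "technology": "solar PV"},
    {"project_id": "P2", "initiative_id": "I1", "technology": "wind"},
    {"project_id": "P3", "initiative_id": "I2", "technology": "solar PV"},
]


def combine(initiatives, projects):
    """Merge two linked tables into one document, keeping the linking keys."""
    known_ids = {row["initiative_id"] for row in initiatives}
    orphans = [p["project_id"] for p in projects if p["initiative_id"] not in known_ids]
    if orphans:
        # Refuse to write a file in which a reference can no longer be resolved.
        raise ValueError(f"projects reference unknown initiatives: {orphans}")
    # One top-level entry per original table; a further table can be added
    # later as a new entry without changing the existing layout.
    return {"initiatives": initiatives, "projects": projects}


def projects_of(inventory, initiative_id):
    """Follow the key relationship inside the combined document."""
    return [p["project_id"] for p in inventory["projects"]
            if p["initiative_id"] == initiative_id]


inventory = combine(initiatives, projects)

# Serialize to plain, readable text rather than a binary format.
text = json.dumps(inventory, indent=2, sort_keys=True)

# Reading the text back yields the same linked structure.
restored = json.loads(text)
print(projects_of(restored, "I1"))
```

Because the keys are kept explicit, a question such as "which projects belong to this initiative" can still be answered from the single file. The same relationships could equally be expressed as RDF triples and queried with SPARQL, which is the direction named in the list above.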

References

  1. Wilkinson, M.D., Dumontier, M., Sansone, S.-A. et al. Evaluating FAIR maturity through a scalable, automated, community-governed framework. Sci Data 6, 174 (2019). doi:10.1038/s41597-019-0184-5