FAIR guiding principles
These principles are outlined in Box #2 of Wilkinson et al.
To be Findable:
- F1. (meta)data are assigned a globally unique and persistent identifier
- F2. data are described with rich metadata (defined by R1 below)
- F3. metadata clearly and explicitly include the identifier of the data it describes
- F4. (meta)data are registered or indexed in a searchable resource
To be Accessible:
- A1. (meta)data are retrievable by their identifier using a standardized communications protocol
- A1.1 the protocol is open, free, and universally implementable
- A1.2 the protocol allows for an authentication and authorization procedure, where necessary
- A2. metadata are accessible, even when the data are no longer available
To be Interoperable:
- I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
- I2. (meta)data use vocabularies that follow FAIR principles
- I3. (meta)data include qualified references to other (meta)data
To be Reusable:
- R1. (meta)data are richly described with a plurality of accurate and relevant attributes
- R1.1. (meta)data are released with a clear and accessible data usage license
- R1.2. (meta)data are associated with detailed provenance
- R1.3. (meta)data meet domain-relevant community standards
To address inconsistent interpretations of these principles, Jacobsen et al. set out what each principle implies for possible implementations, starting from the objectives behind the FAIR principles:
Findability: Digital resources should be easy to find for both humans and computers. Extensive machine-actionable metadata are essential for automatic discovery of relevant datasets and services, and are therefore an essential component of the FAIRification process ...
Accessibility: Protocols for retrieving digital resources should be made explicit, for both humans and machines, including well-defined mechanisms to obtain authorization for access to protected data.
Interoperability: When two or more digital resources are related to the same topic or entity, it should be possible for machines to merge the information into a richer, unified view of that entity. Similarly, when a digital entity is capable of being processed by an online service, a machine should be capable of automatically detecting this compliance and facilitating the interaction between the data and that tool. This requires that the meaning (semantics) of each participating resource, be they data and/or services, is clear.
Reusability: Digital resources are sufficiently well described for both humans and computers, such that a machine is capable of deciding: if a digital resource should be reused (i.e., is it relevant to the task at-hand?); if a digital resource can be reused, and under what conditions (i.e., do I fulfill the conditions of reuse?); and who to credit if it is reused.
In their paper, Jacobsen et al. give recommendations for implementing each of the 15 principles. A few quotes follow:
A common example of a useful identifier is the Digital Object Identifier (DOI) which is guaranteed by the DOI specification to be globally unique and persistent. DOIs provide an additional service, under principle A1, of being able to direct calls to the source data to the location of that data, even if the identified data moves. This ensures that identifiers are stable and valid beyond the project that generated them. In some circumstances, again with DOIs being an example, third-party persistent identifiers may also provide support for principle A2 (that metadata exists beyond the lifespan of the data) since these identifiers may still be responsive to Web calls, and be capable of providing metadata, even if the source resource is no longer active.
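The role of a DOI under F1 and A1 can be illustrated with a small helper that normalizes DOI strings and builds the https://doi.org/ resolver URL; the resolver then redirects to wherever the object currently lives. This is a minimal sketch: the function names are hypothetical and the regular expression is a simplified assumption, not the full DOI syntax.

```python
import re
from urllib.parse import quote

# Simplified DOI shape (assumption): "10." + registrant code (optionally
# subdivided) + "/" + suffix. Not the complete DOI Handbook grammar.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")
PREFIXES = ("https://doi.org/", "http://doi.org/",
            "https://dx.doi.org/", "http://dx.doi.org/", "doi:")

def normalize_doi(raw: str) -> str:
    """Strip a resolver URL or 'doi:' label and return the bare DOI name."""
    doi = raw.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    # DOI names are case-insensitive, so a lower-cased form is safe for comparison.
    return doi.lower()

def resolver_url(raw: str) -> str:
    """URL at the doi.org proxy, which redirects to the object's current location."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")
```

For example, `resolver_url("doi: 10.1038/sdata.2016.18")` yields `https://doi.org/10.1038/sdata.2016.18`. Because citations store the DOI rather than the landing-page URL, moving the data only requires updating the registration behind the resolver.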
Whereas principle F1 enables unambiguous identification of resources of interest, principle F2 speaks to the ability to discover a resource of interest through, for example, search or filtering.... It is a challenge for each domain-specific community to define their own metadata descriptors necessary for optimizing findability. The minimal “richness” of the metadata should be defined so that it serves its intended purpose and should also be guided by the requirements of the other FAIR principles.... Examples of metadata schemata can be found in FAIRsharing and include for instance the Data Documentation Initiative (DDI), the HCLS Dataset Descriptors, and many domain-specific “minimal information” models that have been invented.
It is a challenge to each community to choose a machine-actionable metadata model that explicitly links a resource and its metadata. An example of a technology that provides this link is FAIR Data Point, which is based on the Data Catalog Vocabulary (DCAT) and provides not only unique identifiers for potentially multiple layers of metadata, but also a single, predictable, and searchable path through these layers of descriptors, down to the data object itself.
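The layered, predictable path described above can be sketched as JSON-LD-style Python dictionaries using real DCAT and Dublin Core terms (catalog, dataset, distribution). All example.org IRIs and the `walk_to_data` helper are hypothetical; a real FAIR Data Point serves each layer as a separate RDF resource over HTTP, so nesting them in one structure is a simplification.

```python
# Real W3C namespaces; every example.org IRI below is a hypothetical placeholder.
CONTEXT = {"dcat": "http://www.w3.org/ns/dcat#", "dct": "http://purl.org/dc/terms/"}

distribution = {
    "@id": "https://example.org/fdp/distribution/1",   # identifier of this layer
    "@type": "dcat:Distribution",
    "dct:title": "CSV export",
    "dcat:downloadURL": {"@id": "https://example.org/files/measurements.csv"},
}
dataset = {
    "@id": "https://example.org/fdp/dataset/1",
    "@type": "dcat:Dataset",
    "dct:title": "Example measurements",                 # descriptive metadata (F2)
    "dcat:keyword": ["example", "measurements"],
    "dcat:distribution": [distribution],                 # explicit link downwards (F3)
}
catalog = {
    "@context": CONTEXT,
    "@id": "https://example.org/fdp/catalog/1",
    "@type": "dcat:Catalog",
    "dct:title": "Example catalog",
    "dcat:dataset": [dataset],
}

def walk_to_data(catalog):
    """Follow catalog -> dataset -> distribution; yield (layer IRIs, data URL)."""
    for ds in catalog["dcat:dataset"]:
        for dist in ds["dcat:distribution"]:
            path = [catalog["@id"], ds["@id"], dist["@id"]]
            yield path, dist["dcat:downloadURL"]["@id"]
```

Each layer carries its own identifier and an explicit link to the layer below, so an agent can start from the catalog and reach the data object without guessing.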
Principle F4 states that digital resources must be registered or indexed in a searchable resource. The searchable resource provides the infrastructure by which a metadata record (F1) can be discovered, using either the attributes in that metadata (F2) or the identifier of the data object itself (F3).... Currently, each community must choose, and publicly declare, which search engine to use for its own purposes, general or field-specific, and should at a minimum provide metadata following the standard that is indexed by the search engine of choice. Communities should also provide a machine-readable interface definition that would allow an automated search without human intervention. ... An example of a generic searchable resource that supports manual exploration is Google Dataset Search.
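The two discovery routes named above (by metadata attributes, F2, or by the data identifier, F3) can be shown with a toy in-memory index. This is a hypothetical sketch: the class and the record keys `data_id` and `keywords` are assumptions, and real search engines use their own metadata standards and query interfaces.

```python
from collections import defaultdict

class MetadataIndex:
    """Toy searchable resource (F4): records are found either by attribute
    values (F2) or by the identifier of the data they describe (F3)."""

    def __init__(self):
        self._by_data_id = {}                  # data identifier -> metadata record
        self._by_keyword = defaultdict(set)    # lower-cased keyword -> data identifiers

    def register(self, record):
        data_id = record["data_id"]
        self._by_data_id[data_id] = record
        for kw in record.get("keywords", []):
            self._by_keyword[kw.lower()].add(data_id)

    def search(self, keyword):
        """Records whose keywords include `keyword` (case-insensitive)."""
        ids = self._by_keyword.get(keyword.lower(), ())
        return [self._by_data_id[i] for i in sorted(ids)]

    def lookup(self, data_id):
        """The metadata record for a data identifier, or None if unregistered."""
        return self._by_data_id.get(data_id)
```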
The “standardized communication protocol” is critical here. Its purpose is to provide a predictable way for an agent to access a resource, regardless of whether unrestricted access to the content of the resource is granted or not. An example of a standardized access protocol is the Hypertext Transfer Protocol (HTTP); however, FAIR does not preclude non-mechanized access protocols, such as a verbal request to the data holder in the case of highly sensitive data, so long as the access protocol is explicit and clearly defined.
Currently, communities should choose standardized communication protocols that are open, free and universally implementable. The most common example of a compliant protocol is HTTP, which underlies the majority of Web traffic. It has additional useful features, including the ability to request metadata in a preferred format and to inquire which formats are available.
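Requesting metadata in a preferred format is done with HTTP content negotiation: the client lists acceptable media types with q-values in the `Accept` header, and the server reports its choice in `Content-Type`. A minimal sketch using Python's standard `urllib.request`; the helper names are hypothetical, and whether a given server (for example the doi.org resolver) honours a particular media type depends on that server.

```python
import urllib.request

TURTLE = "text/turtle"            # RDF Turtle
JSON_LD = "application/ld+json"   # JSON-LD

def metadata_request(url, preferred=(TURTLE, JSON_LD), fallback="*/*;q=0.1"):
    """Build a GET request listing preferred metadata formats in decreasing
    order of preference (q = 1.0, 0.9, ...), with a low-priority fallback."""
    parts = []
    for rank, media_type in enumerate(preferred):
        q = round(1.0 - 0.1 * rank, 1)
        parts.append(media_type if q == 1.0 else f"{media_type};q={q}")
    parts.append(fallback)
    return urllib.request.Request(url, headers={"Accept": ", ".join(parts)}, method="GET")

def fetch(request, timeout=10):
    """Send the request; return the format the server chose and the body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.headers.get("Content-Type"), resp.read()
```

Usage: `fetch(metadata_request("https://doi.org/10.1038/sdata.2016.18"))` performs a network call, so its result depends on the remote server.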
This principle clearly demonstrates that FAIR is not equal to “open”. Some digital resources, such as data that have access restrictions based on ethical, legal or contractual constraints, require additional measures to be accessed. This often pertains to assuring that the access requester is indeed that requester (authentication), that the requester’s profile and credentials match the access conditions of the resource (authorization), and that the intended use matches permitted use cases (e.g. non-commercial purposes only) (see also R1.1, where there are requirements to provide explicit documentation about who may use the data, and for what purposes). At the level of technical implementation, an additional authentication and authorization procedure must be specified, if it is not already defined by the protocol (see A1.1). ... Again, the most common example of a compliant protocol is the HTTP protocol. Another example is the life science AAI protocol. Brewster et al. describe an early implementation of an ontology-based approach to this challenge.
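The three checks in the paragraph above (authentication, authorization against the requester's profile, and permitted use) are separate steps. A hypothetical sketch: the token store, roles, and purposes are assumptions standing in for a real identity provider such as an AAI service, and the return strings merely mirror HTTP status semantics (401 for failed authentication, 403 for refused authorization).

```python
import hmac
from dataclasses import dataclass, field

@dataclass
class AccessPolicy:
    """Hypothetical access conditions attached to a restricted resource (cf. R1.1)."""
    allowed_roles: set
    permitted_purposes: set = field(default_factory=lambda: {"non-commercial research"})

@dataclass
class Requester:
    user_id: str
    token: str
    roles: set
    purpose: str

# Hypothetical credential store standing in for a real identity provider.
TOKENS = {"alice": "s3cret-token"}

def authenticate(req: Requester) -> bool:
    """Is the requester who they claim to be? (constant-time token comparison)"""
    expected = TOKENS.get(req.user_id)
    return expected is not None and hmac.compare_digest(expected, req.token)

def authorize(req: Requester, policy: AccessPolicy) -> bool:
    """Do the requester's roles and intended use match the access conditions?"""
    return bool(req.roles & policy.allowed_roles) and req.purpose in policy.permitted_purposes

def request_access(req: Requester, policy: AccessPolicy) -> str:
    if not authenticate(req):
        return "401 authentication failed"
    if not authorize(req, policy):
        return "403 not authorized for this use"
    return "200 access granted"
```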
... given that those data may have been used and are referenced by others, it is important that consumers have, at the very least, access to high-quality metadata that describes those resources sufficiently to minimally understand their nature and their provenance, even when the relevant data are not available anymore. This principle relies heavily on the “second purpose” of principle F3 (the metadata record contains the identifier of the data), because in the case where the data record is no longer available, there must be a clear and precise way of discovering its historical metadata record.... Examples of early attempts to address this critical principle relate closely to the principles of digital curation, including the concept of a FAIR-compliant DMP (Data Management Plan). Many other efforts are underway to improve the long-term stewardship of reusable digital resources.
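Principle A2 can be sketched as a repository in which withdrawing data clears the payload but leaves the metadata record, which carries the data identifier (F3), resolvable. The class, method names, and the `status` field are hypothetical illustrations, not a specific repository's API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    metadata: dict            # kept indefinitely (A2)
    data: Optional[bytes]     # may be removed later

class Repository:
    """Toy repository: deleting data leaves its metadata resolvable by identifier."""

    def __init__(self):
        self._entries = {}

    def deposit(self, data_id, metadata, data):
        # F3: the metadata record carries the identifier of the data it describes.
        self._entries[data_id] = Entry(dict(metadata, identifier=data_id), data)

    def withdraw(self, data_id, reason):
        entry = self._entries[data_id]
        entry.data = None
        entry.metadata["status"] = f"withdrawn: {reason}"

    def resolve(self, data_id):
        """Return (metadata, data); data is None once withdrawn, metadata never is."""
        entry = self._entries.get(data_id)
        if entry is None:
            raise KeyError(data_id)
        return entry.metadata, entry.data
```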
Achieving a “common understanding” of digital resources through a globally understood “language” for machines is the purpose of principle I1, with an emphasis on “knowledge” and “knowledge representation”. ... the principle says that producers of digital resources are required to use a language (i.e., a representation of data/knowledge) that has a defined mechanism for mechanized interpretation (a machine-readable “grammar”), where, for example, the difference between an entity, as well as any relevant relationship between entities, is defined in the structure of the language itself. This allows machines to consume the information with at least a basic “understanding” of its content. ... The key consideration in this regard is that FAIR speaks to the ability of data to be reused by a generic agent, rather than a community-specific agent. This is most easily accomplished by making the knowledge available in the most widely used format(s), even if this means duplication of the information in the community-specific format. ... The most widely accepted choice to adhere to this principle, at the present time, is the Resource Description Framework (RDF), which is the W3C’s recommendation for how to represent knowledge on the Web in a machine-accessible format. Other choices may also be acceptable, for instance when they are already in widespread use within a given community. In that case, it would be helpful for the community to also provide a “translator” between their preferred format and a more widely used format such as RDF.
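To make the idea of a machine-readable “grammar” concrete, the sketch below writes RDF statements in the N-Triples syntax by hand: every statement is subject, predicate, object, and the syntax itself distinguishes IRIs (entities and relationships, in angle brackets) from literal values (in quotes). The vocabulary IRIs are real; the dataset IRI and helper names are hypothetical. In practice an RDF library such as rdflib would do this serialization.

```python
# Real vocabulary IRIs; the dataset IRI used in the usage note is a placeholder.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
DCT_TITLE = "http://purl.org/dc/terms/title"

def term(value):
    """Render ('iri', text) as <text>, or ('lit', text) as an escaped string literal."""
    kind, text = value
    if kind == "iri":
        return f"<{text}>"
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)
```

Usage: with `ds = ("iri", "https://example.org/dataset/1")`, the triple `(ds, ("iri", RDF_TYPE), ("iri", DCAT_DATASET))` serializes to `<https://example.org/dataset/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/dcat#Dataset> .`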
I2, ..., requires that the vocabulary terms used in the knowledge representation language (principle I1) can be sufficiently distinguished, by a machine, to ensure detection of “false agreements” as well as “false disagreements”.
Ontologies defined in the “Web Ontology Language” (OWL) and shared via a publicly accessible registry (e.g. BioPortal for life science ontologies) are examples of formally represented, accessible, mapped, and shared knowledge representations in a broadly applicable language for knowledge representation, that are also compliant with the Findability requirements of FAIR, since BioPortal provides a machine-accessible search interface.
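The “false agreements” and “false disagreements” of I2 appear when terms are compared by their labels instead of by the ontology IRIs they are mapped to. A hypothetical sketch: both term tables and all example.org IRIs are placeholders, not entries from a real ontology.

```python
# Hypothetical term tables: label as written in each dataset -> IRI of the
# ontology class it was mapped to (example.org IRIs are placeholders).
dataset_a = {"cold": "https://example.org/onto/CommonCold",
             "tumor": "https://example.org/onto/Neoplasm"}
dataset_b = {"cold": "https://example.org/onto/LowTemperature",
             "neoplasm": "https://example.org/onto/Neoplasm"}

def compare_vocabularies(a, b):
    """Compare by IRI rather than label.
    false agreement: same label, different meaning (different IRIs)
    false disagreement: different labels, same meaning (same IRI)"""
    shared_labels = a.keys() & b.keys()
    false_agreements = sorted(label for label in shared_labels if a[label] != b[label])
    label_in_b = {iri: label for label, iri in b.items()}
    false_disagreements = sorted(
        (label, label_in_b[iri]) for label, iri in a.items()
        if iri in label_in_b and label_in_b[iri] != label
    )
    return false_agreements, false_disagreements
```

Here “cold” is a false agreement (same word, different concepts) and “tumor”/“neoplasm” is a false disagreement (different words, same concept); neither is detectable from the labels alone.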
An important aspect of FAIR is that data or metadata, generally speaking, does not exist in a silo ... To be FAIR with respect to principle I3, the data could contain links to a resource containing city data (e.g., Wikidata), geographical and geospatial data, or other related domain resources that are generated by that city, so long as they are properly qualified references using meaningful, clearly interpretable relationships.
It is worth noting as an example that several “upper ontologies” such as the SemanticScience Integrated Ontology have a wide range of precisely defined relationships that can be used as-is, or as a starting point for a newly minted relationship that is more specific than the one provided in the upper ontology. The benefit of “inheriting” from higher-level relationships is that agents capable of understanding these higher-level concepts can infer at least a basic interpretation of the intent of the new relationship coined within the community, which enhances interoperability.
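The inheritance argument can be sketched in the spirit of `rdfs:subPropertyOf`: if a community relation is declared a sub-relation of an upper-ontology relation, an agent that only knows the upper relation can still read the statement at that more general level. All relation names here are hypothetical (“upper:” stands in for an upper ontology such as SIO), and the hierarchy is assumed to be acyclic.

```python
# Hypothetical relation hierarchy: relation -> its direct parent, in the spirit
# of rdfs:subPropertyOf. Assumed acyclic.
SUBPROPERTY_OF = {
    "community:wasMeasuredInCity": "upper:isLocatedIn",
    "upper:isLocatedIn": "upper:isRelatedTo",
}

def ancestors(relation):
    """All more general relations implied by `relation`, nearest first."""
    chain = []
    while relation in SUBPROPERTY_OF:
        relation = SUBPROPERTY_OF[relation]
        chain.append(relation)
    return chain

def interpret(triple, known_relations):
    """Restate a qualified reference using the most specific relation the
    agent understands; (s, p, o) with p below q entails (s, q, o)."""
    s, p, o = triple
    for rel in [p] + ancestors(p):
        if rel in known_relations:
            return (s, rel, o)
    return None  # the agent cannot interpret this relation at all
```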
... the focus of R1 is to enable machines and humans to assess if the discovered resource is appropriate for reuse, ... The term “plurality” is used to indicate that the metadata author should be as generous as possible, not presuming who the consumer might be, and therefore provide as much metadata as possible to support the widest variety of use-cases and agent needs.
Digital resources and their metadata must always, without exception, include a license that describes under which conditions the resource can be used, even if that is “unconditional” ... the absence of a license does not indicate “open”, but rather creates legal uncertainty that will deter (in fact, in many cases legally prevent) reuse. ... There are good reasons for choosing a CC0 license for data, and these considerations should be assessed, alongside all other considerations, when a community decides on the license they wish to apply. It is critical, however, that a license is chosen.
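The point that a missing license means “unknown” rather than “open” can be encoded directly in a reuse check. A minimal sketch: the `license` key and the function are hypothetical, while the two Creative Commons URLs are the real canonical license addresses.

```python
CC0 = "https://creativecommons.org/publicdomain/zero/1.0/"
CC_BY_4 = "https://creativecommons.org/licenses/by/4.0/"

def reuse_status(metadata):
    """Classify reuse conditions from a record's (hypothetical) 'license' field.
    A missing license is NOT treated as open: the conditions are unknown."""
    license_iri = metadata.get("license")
    if not license_iri:
        return "unknown: no license, do not assume reuse is permitted"
    if license_iri == CC0:
        return "unconditional (public domain dedication)"
    if license_iri == CC_BY_4:
        return "permitted with attribution"
    return f"see license terms at {license_iri}"
```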
Detailed provenance includes facets such as how the resource was generated, why it was generated, by whom, under what conditions, using what starting-data or source-resource, using what funding/resources, who owns the data, who should be given credit, and any filters or cleansing processes that have been applied post-generation. ... Provenance descriptions can for instance be implemented following community-specific templates according to the PROV-Template approach.
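The template idea can be sketched as a list of PROV-O statements whose subjects and objects are `var:` placeholders, expanded with concrete bindings for each generated resource. The PROV-O relation names are real; the template contents, the `ex:` values, and the `expand` function are hypothetical, and this is a simplified illustration rather than the PROV-Template expansion algorithm itself.

```python
# Simplified, PROV-Template-like pattern: PROV-O relations between "var:" placeholders.
TEMPLATE = [
    ("var:output", "prov:wasGeneratedBy", "var:activity"),
    ("var:activity", "prov:used", "var:input"),
    ("var:activity", "prov:wasAssociatedWith", "var:agent"),
    ("var:output", "prov:wasAttributedTo", "var:agent"),
    ("var:output", "prov:wasDerivedFrom", "var:input"),
]

def expand(template, bindings):
    """Replace each var: placeholder with its bound value; fail on unbound variables."""
    def bind(t):
        if t.startswith("var:"):
            if t not in bindings:
                raise KeyError(f"unbound template variable {t}")
            return bindings[t]
        return t
    return [tuple(bind(t) for t in triple) for triple in template]
```

A community fixes the template once; each dataset then only supplies the bindings, which keeps provenance records uniform across the community.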
... Communities must (then) take on the challenge of deciding which metadata elements, addressed within their community’s “boutique” standard(s), should be additionally represented using a more global standard (principles F2 and R1.2), even if this results in duplication of metadata, such that it can be used for search and interpretation by more generic, third-party agents.
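Such deliberate duplication can be sketched as a crosswalk that keeps every community-specific field and additionally copies mapped values under Dublin Core terms. The community field names and the mapping are hypothetical; the `dct:` terms are real DCMI Metadata Terms.

```python
# Hypothetical community ("boutique") field names mapped to Dublin Core terms.
# Fields without a mapping remain only in the community form.
CROSSWALK = {
    "sample_title": "dct:title",
    "collected_by": "dct:creator",
    "collection_date": "dct:created",
    "usage_terms": "dct:license",
}

def add_global_terms(record):
    """Return a copy of `record` that keeps every community field and also
    duplicates mapped values under their Dublin Core equivalents."""
    out = dict(record)
    for local_key, global_key in CROSSWALK.items():
        if local_key in record:
            out[global_key] = record[local_key]
    return out
```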
- Wilkinson et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018 (2016) doi: 10.1038/sdata.2016.18.
- Jacobsen et al. FAIR Principles: Interpretations and Implementation Considerations. Data Intelligence 2:10-29 (2020) doi: 10.1162/dint_r_00024.