CSVW for energy datasets
This page collects examples of improving the FAIRness of datasets using CSV on the Web (CSVW), an extension of the CSV format. Most importantly, CSVW makes it possible to tie metadata and data together, starting from a well-known and widely used data format. The standard offers a rich framework for annotating existing CSV documents with additional information and for transforming them into other structured data exchange formats such as JSON(-LD) and RDF. At the same time, CSVW is user-friendly: it offers a flexible mechanism that ranges from minimal FAIR extensions to elaborate context building for the data to be shared. CSV on the Web is a W3C recommendation in coherence with ...
FAIR principles
The example illustrates how CSV on the Web helps to realize the FAIR principles:
To be Findable:
F1. (meta)data are assigned a globally unique and eternally persistent identifier.
F2. data are described with rich metadata.
F3. (meta)data are registered or indexed in a searchable resource.
F4. metadata specify the data identifier.
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol.
A1.1. the protocol is open, free, and universally implementable.
A1.2. the protocol allows for an authentication and authorization procedure, where necessary.
A2. metadata are accessible, even when the data are no longer available.
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles.
I3. (meta)data include qualified references to other (meta)data.
To be Re-usable:
R1. (meta)data have a plurality of accurate and relevant attributes.
R1.1. (meta)data are released with a clear and accessible data usage license.
R1.2. (meta)data are associated with their provenance.
R1.3. (meta)data meet domain-relevant community standards.
Worked example
We start with a CSV file whose contents are shown in this table:
Name | status | year of foundation | national identifier | Street address | city | postal code | C/O | lat | lon | website | Activity | National industrial sector classification (if no other information on area of activity available) | Purpose (original language) | Purpose (translation) | date of removal | country code | Legal Form |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Kvarkenvinden 1 | active | 1998-01-27 | 769602-8096 | Norra Obbolavägen 89 | Umeå | 904 22 | | 63.80667 | 20.27364 | http://kvarkenvinden.se | wind onshore | 35110 | Föreningen har till ändamål att främja sina medlemmars ekonomiska intressen och dess miljöintresse genom att utöva driftsansvar över vindkraftverk i syfte att tillhandahålla vindenergi för medlemmarnas konsumtion. All genom föreningen genererad vindenergi ägs av medlemmarna. | The purpose of the association is to promote the financial interests of its members and its environmental interests by exercising operational responsibility for wind turbines in order to provide wind energy for the members' consumption. All wind energy generated by the association is owned by the members. | | SWE | C61P |
Ollebacken vind ekonomiska förening | active | 2008-01-08 | 769618-1010 | SIKÅS NORRA BYVÄGEN 180 | Hammerdal | 833 49 | | 63.67432 | 15.06297 | https://www.ollebackenvind.se | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarans ekonomiska intressen genom att i egen regi producera miljö vänlig energi. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly energy on their own. | | SWE | C61P |
Jamtkulingen ekonomiska förening | active | 2009-01-20 | 769619-7420 | Södra Strandvägen 19 A | Frösön | 832 44 | Sven Erik Eriksson | 63.17622 | 14.61152 | http://www.jamtkulingen.se/ | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig energi. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly energy on their own. | | SWE | C61P |
Hällingarna Vind | active | 2005-08-02 | 769612-8318 | OLLEBACKEN 130 | Hammerdal | 833 49 | | 63.59838 | 15.05107 | | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig engeri. Medlemmarna deltar i verksamheten som konsumenter. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly areas on their own. The members participate in the business as consumers. | | SWE | C61P |
Offerdalsvind Ekonomiska förening | active | 2000-08-31 | 769606-0719 | BERGE 718, | Offerdal | 835 97 | | 63.46154 | 14.09483 | http://www.offerdalsvind.se | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig energi. Medlemmarna deltar i verksamheten som konsumenter. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly areas on their own. The members participate in the business as consumers. | | SWE | C61P |
Trärike vindkraft ekonomisk förening | liquidation | 1996-08-07 | 769601-6331 | VIKINGAVÄGEN 36 | Sundsvall | 857 41 | | 62.40317 | 17.26335 | http://www.trarikevindkraft.se/index.htm | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intresse genom att förse medlemmarna med egen vindkraft- producerad el och även främja medlemmarnas miljöintresse och vindkraftens utveckling. Föreningen skall bygga upp ett kapital som säkrar uppbyggnad, drift, underhåll och demontering av föreningens vindkraftverk. | The purpose of the association is to promote the members 'financial interest by providing the members with their own wind-powered electricity and also promoting the members' environmental interest and the development of wind power. The association will build up a capital that ensures the construction, operation, maintenance and dismantling of the association's wind turbines. | | SWE | C61P |
Dala Vindkraft Ekonomisk förening | active | 2006-02-18 | 769613-8911 | RIKSVÄGEN 15 | Rättvik | 795 32 | | 60.88933 | 15.11092 | http://dalavind.se/vindandelar-foreningar/dala-vindkraft-ekonomisk-forening/medlemsinformation | wind onshore, E-trade | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen, samt deras miljöintresse, genom att tillhandahålla medlemmarna egen vindkraftsproducerad elkraft. | The purpose of the association is to promote the members' financial interests, as well as their environmental interests, by providing the members with their own wind-powered electricity. | | SWE | C61P |
Vindela | active | 2004-08-17 | 769611-2411 | BOX 4 | Malung | 782 21 | | 60.6834 | 13.71603 | http://dalavind.se/vindandelar-foreningar/vindela/ | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig elkraft. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly electricity on their own. | | SWE | C61P |
Äppelbovind | active | 2000-09-25 | 769606-1485 | BOX 4 | Malung | 782 21 | | 60.6834 | 13.71603 | http://dalavind.se/vindandelar-foreningar/appelbovind/kontakt/ | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig elkraft. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly electricity on their own. | | SWE | C61P |
Fjällbergsvind ekonomisk förening | liquidation | 2005-09-13 | 769613-0587 | Djupuddsvägen 35 | Grängesberg | 772 40 | | 60.08136 | 14.98449 | http://dalavind.se/vindandelar-foreningar/fjallbergs-vind-ekonomiskforening | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att tillhandahålla medlemmarna egen vindkrafts- producerad elkraft. | The purpose of the association is to promote the members' financial interests by providing the members with their own wind power produced electricity. | | SWE | C61P |
Kyrkvinden ekonomiska förening | active | 2005-05-09 | 769613-0025 | GIMOGATAN 6 B 3TR | Uppsala | 752 20 | | 59.8687 | 17.6083 | https://www.kyrkvinden.se | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att förmedla och i egen regi eller genom samarbetspartner producera miljövänlig elkraft. | The purpose of the association is to promote the members' financial interests by conveying, on their own account or through partners, environmentally friendly electricity | | SWE | C61P |
Ljusterö Vind ekonomiska förening | active | 2008-04-02 | 769618-5961 | LJUSTERÖ TORG | Ljusterö | 184 95 | | 59.52403 | 18.60869 | http://www.ljusterovind.se/ | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig energi samt annan därmed förenlig verksamhet. Medlemmarna deltar i verksamheten som konsumenter. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly energy and other related activities on their own behalf. The members participate in the business as consumers. | | SWE | C61P |
Windy ekonomisk förening | active | 2000-12-11 | 769606-4802 | SVARTEDALSBACKEN 9 | Lerum | 443 39 | Mattias Skjöldebrandt | 57.76418 | 12.26767 | http://windy-vindkraft.se/ | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarna ekonomiska intressen genom att tillhandahålla medlemmarna egen vindkraft- producerad el, därigenom också främjande medlemmarnas intresse för miljö och energihushållning samt bedriva därmed förenlig verksamhet. | The purpose of the association is to promote the members 'financial interests by providing the members with their own electricity produced by wind power, thereby also promoting the members' interest in the environment and energy management, and conducting compatible activities therewith. | | SWE | C61P |
The CSV file contains, for each initiative, its name, its legal status, its year of foundation, its national identifier, its street address, the city it is located in, the corresponding postal code, optional C/O information, the latitude (lat) and longitude (lon) of the location, the website of the initiative, information on its activities, a national industrial sector classification, a purpose statement in the original language, the same purpose statement translated into English, the date of removal, the country code, and the legal form. To relate metadata to the contents of the CSV file, we create a second file that holds this metadata. The format of the metadata file is JSON. Let us assume that the CSV file has the filename "SWE_initiatives_sample.csv". According to the CSV on the Web standard, the metadata file should then have the filename "SWE_initiatives_sample.csv-metadata.json". A minimal form of the metadata file contains the following information:
    {
      "@context": "http://www.w3.org/ns/csvw",
      "url": "SWE_initiatives_sample.csv"
    }
The @context property links to the language conventions of the CSVW standard, and the url property states the filename of the CSV file. This minimal file can be extended with more specific metadata. All entries are encoded as properties with corresponding values.
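The naming convention and the minimal metadata document can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names metadata_path, minimal_metadata, and write_minimal_metadata are hypothetical helpers introduced here, not part of any CSVW tool.

```python
import json
from pathlib import Path


def metadata_path(csv_path: str) -> str:
    """Return the metadata file name used on this page: CSV name plus '-metadata.json'."""
    return f"{csv_path}-metadata.json"


def minimal_metadata(csv_path: str) -> dict:
    """Build the minimal metadata document: CSVW context plus the CSV file name."""
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": Path(csv_path).name,
    }


def write_minimal_metadata(csv_path: str) -> str:
    """Write the minimal metadata file next to the CSV file and return its path."""
    target = metadata_path(csv_path)
    text = json.dumps(minimal_metadata(csv_path), indent=2)
    Path(target).write_text(text, encoding="utf-8")
    return target
```

Calling write_minimal_metadata("SWE_initiatives_sample.csv") would create "SWE_initiatives_sample.csv-metadata.json" in the same directory, which can then be extended by hand as described in the following sections.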
General information about the csv file
As a first step, we include general information about the CSV file. We start with a code snippet for specifics such as title, description, and creator:
    {
      "@context": "http://www.w3.org/ns/csvw",
      "url": "SWE_initiatives_sample.csv",
      "dc:title": "Example - list of citizen-led initiatives in Sweden",
      "dc:description": "List of citizen-led initiatives in Sweden, example dataset to be used for illustrating the use of csv on the web",
      "dc:creator": {
        "schema:name": "August Wierling",
        "schema:url": "https://orcid.org/0000-0002-7443-7593",
        "schema:contactPoint": {
          "email": "augustw@hvl.no"
        }
      }
    }
As in the example above, properties can be terms from popular metadata vocabularies, e.g. Dublin Core, schema.org, or DCAT. These vocabularies can be used independently or together. In the above example, terms from the Dublin Core vocabulary are mixed with terms from schema.org. The title of the CSV file and its description are stated using Dublin Core terms. The contents of the dc:creator entry are in turn specified using the schema.org vocabulary. They describe the creator in more detail: a human-readable name, a URL of the creator (here: his ORCID iD), and contact point details such as the email address. The contact point information can be extended with a telephone or fax number. We continue with a more extensive list of details about the file as a whole:
    {
      "@context": "http://www.w3.org/ns/csvw",
      "url": "SWE_initiatives_sample.csv",
      "dc:title": "Example - list of citizen-led initiatives in Sweden",
      "dc:description": "List of citizen-led initiatives in Sweden, example dataset to be used for illustrating the use of csv on the web",
      "dc:date": "2022-10-07",
      "dc:format": "text/csv",
      "dc:language": "en-US",
      "dc:publisher": {
        "schema:name": "EERAdata project",
        "schema:url": "https://cordis.europa.eu/project/id/883823",
        "schema:contactPoint": {
          "email": "info@eeradata.eu",
          "url": "https://www.eeradata.eu"
        }
      },
      "dc:rights": "https://creativecommons.org/licenses/by-sa/4.0/",
      "dc:subject": "Energy communities, Sweden, Community energy, Energy cooperatives, Renewable Energy",
      "dc:source": {
        "schema:name": "ENBP Inventory \"Energy by people\" - First Europe-wide inventory on energy communities",
        "schema:url": "https://doi.org/10.18710/2CPQHQ"
      },
      "dc:type": "dataset",
      "dc:creator": {
        "schema:name": "August Wierling",
        "schema:url": "https://orcid.org/0000-0002-7443-7593",
        "schema:contactPoint": {
          "email": "augustw@hvl.no"
        }
      },
      "dc:coverage": "Sweden",
      "dc:identifier": "https://eeradata-platform.eu/"
    }
The date follows ISO 8601. The language is specified following RFC 4646 (since obsoleted by RFC 5646). For the media type, RFC 7111 has been used as the specification. The type is taken from the DCMI Type Vocabulary. The dc:publisher information has several details which are grouped into one object by curly brackets: the EERAdata project as the name of the publisher, the corresponding CORDIS entry as a persistent identifier, and contact information in the form of an email address and a website. The entry for dc:rights contains the license information and points to a web page provided by the Creative Commons organization. It states that the CSV file is licensed under Attribution-ShareAlike 4.0 International (CC BY-SA 4.0). As such, anybody is free to share and adapt the file, provided that attribution is given and adaptations are shared under the same license. The dc:subject contains a list of keywords describing the contents of the CSV file in more detail. Only a basic example is given here; the subject information should be more extensive in a real dataset. The dc:type information declares the CSV file as a dataset according to the possible types listed by the DCMI type element working draft. As DCMI describes it, the dc:identifier holds an unambiguous reference to the resource within a given context. Ideally, the resource is the final FAIRified object, so the identifier applies to the JSON file created out of the original CSV file and its JSON metadata document. Best practice is to assign a persistent identifier.
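Before publishing, it can help to check mechanically that the general properties discussed above are actually present. The following Python sketch does this for a metadata dictionary loaded from the JSON file. Note the assumptions: the list REQUIRED_KEYS is a hypothetical project-level checklist chosen for this illustration (the CSVW standard does not require these properties), and check_general_metadata is a helper name introduced here.

```python
import re
from datetime import date

# Hypothetical checklist for this illustration; not mandated by the CSVW standard.
REQUIRED_KEYS = ("dc:title", "dc:description", "dc:rights", "dc:identifier")

# Calendar dates in the ISO 8601 form YYYY-MM-DD.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_general_metadata(meta: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    for key in REQUIRED_KEYS:
        if not meta.get(key):
            problems.append(f"missing {key}")

    raw_date = meta.get("dc:date")
    if raw_date is not None:
        ok = isinstance(raw_date, str) and ISO_DATE.fullmatch(raw_date) is not None
        if ok:
            try:
                # Rejects impossible dates such as 2022-13-40.
                date.fromisoformat(raw_date)
            except ValueError:
                ok = False
        if not ok:
            problems.append(f"dc:date is not a valid YYYY-MM-DD date: {raw_date!r}")
    return problems
```

For the metadata shown above, loaded with json.load, the function is expected to report no problems; removing the dc:rights entry would produce the message "missing dc:rights".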
Now, how does this contribute to making the original CSV file FAIR?
Specifying information about table headers
This section describes how to specify the entries in the various columns of the CSV file in more detail. Note that the CSV on the Web standard allows a single JSON metadata file to be connected to several CSV files that share a certain layout. For our purposes here, we focus on a single table: the one illustrated above with the information on Swedish energy cooperatives. Before giving a full-fledged description of all the columns, we start with the first four columns from the left, which specify the name of the initiative, its legal status, the year of foundation, and a national identifier. We start with a simple set of specifications before assigning more information to the columns. For more information, please also see the primer as well as the recommendation itself.
    "tableSchema": {
      "columns": [{
        "titles": "Name",
        "dc:description": "Name of the initiative",
        "datatype": "string",
        "required": true
      },{
        "titles": "Status",
        "dc:description": "Legal status",
        "datatype": {
          "base": "string",
          "format": "active|inactive|liquidation"
        },
        "required": true
      },{
        "titles": "Year of foundation",
        "dc:description": "Year of foundation of the initiative",
        "datatype": "date"
      },{
        "titles": "National identifier",
        "datatype": {
          "@type": "https://www.wikidata.org/wiki/Property:P6460",
          "dc:title": "National identifier for Sweden",
          "dc:description": "National identifier for Sweden",
          "base": "string",
          "format": "\\d{6}-\\d{4}"
        }
      }]
    }
The general property for table-specific attributes is tableSchema. Details on the columns are specified by columns. For each column, a title, a description, and details about the datatype are fixed. For example, the first column has the title Name, and the dc:description entry gives further information on what the name actually means. The datatype for all entries in the first column is string. For further predefined datatypes, see the Metadata Vocabulary for Tabular Data. Setting required to true leads to an error message if the corresponding entry in the CSV file is empty. In the entry for the second column, the format property lists the allowed values. If there is any entry other than active, inactive, or liquidation, an error will be reported. The entries of the third column have the datatype date, so they must comply with the ISO 8601 form YYYY-MM-DD. Finally, the national identifier for organizations in Sweden is listed in the fourth column. It consists of 6 digits, followed by a dash, followed by another 4 digits. The format property allows patterns of this type to be specified with the help of regular expressions, as shown in the example. In addition, the @type ties each entry to the Wikidata property P6460 and in that way declares that all entries are Swedish organizational numbers.
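To make the effect of these four column specifications concrete, the following Python sketch applies the same checks to a single row. It is a hand-written approximation and not a CSVW validator: Python's re module stands in for the regular-expression dialect of the standard, the row keys are assumed to be the header names of the table above, and validate_row is a hypothetical helper name.

```python
import re
from datetime import date

# Patterns taken from the "format" entries of the metadata file.
STATUS_PATTERN = re.compile(r"active|inactive|liquidation")
NATIONAL_ID_PATTERN = re.compile(r"\d{6}-\d{4}")
ISO_DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_row(row: dict) -> list[str]:
    """Check the first four columns of one row; return a list of error messages."""
    errors = []

    # Column 1: required string.
    if not row.get("Name"):
        errors.append("Name is required")

    # Column 2: required, restricted to the allowed status values.
    status = row.get("status", "")
    if not status:
        errors.append("status is required")
    elif not STATUS_PATTERN.fullmatch(status):
        errors.append(f"status has a value that is not allowed: {status!r}")

    # Column 3: optional, but must be a valid YYYY-MM-DD date if present.
    founded = row.get("year of foundation", "")
    if founded:
        valid = ISO_DATE_PATTERN.fullmatch(founded) is not None
        if valid:
            try:
                date.fromisoformat(founded)
            except ValueError:
                valid = False
        if not valid:
            errors.append(f"year of foundation is not a valid date: {founded!r}")

    # Column 4: optional, six digits, a dash, four digits.
    national_id = row.get("national identifier", "")
    if national_id and not NATIONAL_ID_PATTERN.fullmatch(national_id):
        errors.append(f"national identifier has the wrong pattern: {national_id!r}")

    return errors
```

Rows can be obtained with csv.DictReader from the standard library. For real validation, a CSVW-aware tool should be preferred, since it reads the rules directly from the metadata file.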
The next three columns contain address information and relate to the schema.org vocabulary to specify the street address, the name of the municipality, and the postal code of the initiative. The corresponding entries for the metadata file are:
    {
      "titles": "Street address",
      "dc:description": "Street address of the initiative",
      "datatype": "string",
      "@type": "schema:streetAddress"
    },{
      "titles": "Municipality",
      "dc:description": "Municipality where the initiative is located",
      "datatype": "string",
      "@type": "schema:addressLocality"
    },{
      "titles": "Postal code",
      "dc:description": "Postal code of the location of the initiative",
      "datatype": "string",
      "@type": "schema:postalCode"
    }
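The intent of tying these three columns to schema.org can be illustrated by mapping one row to a PostalAddress-like structure. The Python sketch below is an assumption-laden illustration: postal_address is a hypothetical helper, the row keys are assumed to be the header names of the table above, and the output is not claimed to be what a CSVW processor would generate. It uses the schema.org property names streetAddress, addressLocality (the schema.org term for the locality or municipality), and postalCode.

```python
def postal_address(row: dict) -> dict:
    """Map the three address columns of one CSV row to a schema.org-style address object."""
    return {
        "@type": "schema:PostalAddress",
        "schema:streetAddress": row["Street address"],
        "schema:addressLocality": row["city"],
        "schema:postalCode": row["postal code"],
    }


# Example with the first row of the table above.
example = postal_address({
    "Street address": "Norra Obbolavägen 89",
    "city": "Umeå",
    "postal code": "904 22",
})
```

Grouping the three values under one typed object is one way to make their relation explicit for consumers that understand schema.org.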