CSVW for energy datasets
This page collects examples of improving the FAIRness of datasets using csv on the web (CSVW), an extension of the csv format. Most importantly, csv on the web offers a way to tie metadata and data together, starting from a well-known and widely used data format. The standard provides a rich framework for annotating existing csv documents with additional information and for transforming them into other structured data exchange formats such as JSON(-LD) and RDF. At the same time, csv on the web is user-friendly: its flexible mechanism ranges from minimal FAIR extensions to elaborate context building for the data to be shared. CSV on the web is a W3C recommendation in coherence with ...
FAIR principles
The example illustrates how csv on the web contributes to realizing the FAIR principles:
To be Findable:
F1. (meta)data are assigned a globally unique and eternally persistent identifier.
F2. data are described with rich metadata.
F3. (meta)data are registered or indexed in a searchable resource.
F4. metadata specify the data identifier.
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol.
A1.1. the protocol is open, free, and universally implementable.
A1.2. the protocol allows for an authentication and authorization procedure, where necessary.
A2. metadata are accessible, even when the data are no longer available.
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles.
I3. (meta)data include qualified references to other (meta)data.
To be Re-usable:
R1. (meta)data have a plurality of accurate and relevant attributes.
R1.1. (meta)data are released with a clear and accessible data usage license.
R1.2. (meta)data are associated with their provenance.
R1.3. (meta)data meet domain-relevant community standards.
Worked example: citizen-led initiatives
We start with a csv file whose contents are shown in this table:
name | status | year of foundation | national identifier | street address | city | postal code | C/O | lat | lon | website | activity | national industrial sector classification (if no other information on area of activity available) | purpose (original language) | purpose (translation) | date of removal | country code | legal form |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Kvarkenvinden 1 | active | 1998-01-27 | 769602-8096 | Norra Obbolavägen 89 | Umeå | 904 22 | | 63.80667 | 20.27364 | http://kvarkenvinden.se | wind onshore | 35.110 | Föreningen har till ändamål att främja sina medlemmars ekonomiska intressen och dess miljöintresse genom att utöva driftsansvar över vindkraftverk i syfte att tillhandahålla vindenergi för medlemmarnas konsumtion. All genom föreningen genererad vindenergi ägs av medlemmarna. | The purpose of the association is to promote the financial interests of its members and its environmental interests by exercising operational responsibility for wind turbines in order to provide wind energy for the members' consumption. All wind energy generated by the association is owned by the members. | | SWE | C61P |
Ollebacken vind ekonomiska förening | active | 2008-01-08 | 769618-1010 | SIKÅS NORRA BYVÄGEN 180 | Hammerdal | 833 49 | | 63.67432 | 15.06297 | https://www.ollebackenvind.se | wind onshore | 35.110 | Föreningen har till ändamål att främja medlemmarans ekonomiska intressen genom att i egen regi producera miljö vänlig energi. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly energy on their own. | | SWE | C61P |
Jamtkulingen ekonomiska förening | active | 2009-01-20 | 769619-7420 | Södra Strandvägen 19 A | Frösön | 832 44 | Sven Erik Eriksson | 63.17622 | 14.61152 | http://www.jamtkulingen.se/ | wind onshore | 35.110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig energi. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly energy on their own. | | SWE | C61P |
Hällingarna Vind | active | 2005-08-02 | 769612-8318 | OLLEBACKEN 130 | Hammerdal | 833 49 | | 63.59838 | 15.05107 | | wind onshore | 35.110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig engeri. Medlemmarna deltar i verksamheten som konsumenter. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly areas on their own. The members participate in the business as consumers. | | SWE | C61P |
Offerdalsvind Ekonomiska förening | active | 2000-08-31 | 769606-0719 | BERGE 718, | Offerdal | 835 97 | | 63.46154 | 14.09483 | http://www.offerdalsvind.se | wind onshore | 35.110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig energi. Medlemmarna deltar i verksamheten som konsumenter. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly areas on their own. The members participate in the business as consumers. | | SWE | C61P |
Trärike vindkraft ekonomisk förening | liquidation | 1996-08-07 | 769601-6331 | VIKINGAVÄGEN 36 | Sundsvall | 857 41 | | 62.40317 | 17.26335 | http://www.trarikevindkraft.se/index.htm | wind onshore | 35.110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intresse genom att förse medlemmarna med egen vindkraft- producerad el och även främja medlemmarnas miljöintresse och vindkraftens utveckling. Föreningen skall bygga upp ett kapital som säkrar uppbyggnad, drift, underhåll och demontering av föreningens vindkraftverk. | The purpose of the association is to promote the members 'financial interest by providing the members with their own wind-powered electricity and also promoting the members' environmental interest and the development of wind power. The association will build up a capital that ensures the construction, operation, maintenance and dismantling of the association's wind turbines. | | SWE | C61P |
Dala Vindkraft Ekonomisk förening | active | 2006-02-18 | 769613-8911 | RIKSVÄGEN 15 | Rättvik | 795 32 | | 60.88933 | 15.11092 | http://dalavind.se/vindandelar-foreningar/dala-vindkraft-ekonomisk-forening/medlemsinformation | wind onshore, E-trade | 35.110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen, samt deras miljöintresse, genom att tillhandahålla medlemmarna egen vindkraftsproducerad elkraft. | The purpose of the association is to promote the members' financial interests, as well as their environmental interests, by providing the members with their own wind-powered electricity. | | SWE | C61P |
Vindela | active | 2004-08-17 | 769611-2411 | BOX 4 | Malung | 782 21 | | 60.6834 | 13.71603 | http://dalavind.se/vindandelar-foreningar/vindela/ | wind onshore | 35.110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig elkraft. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly electricity on their own. | | SWE | C61P |
Äppelbovind | active | 2000-09-25 | 769606-1485 | BOX 4 | Malung | 782 21 | | 60.6834 | 13.71603 | http://dalavind.se/vindandelar-foreningar/appelbovind/kontakt/ | wind onshore | 35.110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig elkraft. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly electricity on their own. | | SWE | C61P |
Fjällbergsvind ekonomisk förening | liquidation | 2005-09-13 | 769613-0587 | Djupuddsvägen 35 | Grängesberg | 772 40 | | 60.08136 | 14.98449 | http://dalavind.se/vindandelar-foreningar/fjallbergs-vind-ekonomiskforening | wind onshore | 35.110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att tillhandahålla medlemmarna egen vindkrafts- producerad elkraft. | The purpose of the association is to promote the members' financial interests by providing the members with their own wind power produced electricity. | | SWE | C61P |
Kyrkvinden ekonomiska förening | active | 2005-05-09 | 769613-0025 | GIMOGATAN 6 B 3TR | Uppsala | 752 20 | | 59.8687 | 17.6083 | https://www.kyrkvinden.se | wind onshore | 35.110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att förmedla och i egen regi eller genom samarbetspartner producera miljövänlig elkraft. | The purpose of the association is to promote the members' financial interests by conveying, on their own account or through partners, environmentally friendly electricity | | SWE | C61P |
Ljusterö Vind ekonomiska förening | active | 2008-04-02 | 769618-5961 | LJUSTERÖ TORG | Ljusterö | 184 95 | | 59.52403 | 18.60869 | http://www.ljusterovind.se/ | wind onshore | 35.110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig energi samt annan därmed förenlig verksamhet. Medlemmarna deltar i verksamheten som konsumenter. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly energy and other related activities on their own behalf. The members participate in the business as consumers. | | SWE | C61P |
Windy ekonomisk förening | active | 2000-12-11 | 769606-4802 | SVARTEDALSBACKEN 9 | Lerum | 443 39 | Mattias Skjöldebrandt | 57.76418 | 12.26767 | http://windy-vindkraft.se/ | wind onshore | 35.110 | Föreningen har till ändamål att främja medlemmarna ekonomiska intressen genom att tillhandahålla medlemmarna egen vindkraft- producerad el, därigenom också främjande medlemmarnas intresse för miljö och energihushållning samt bedriva därmed förenlig verksamhet. | The purpose of the association is to promote the members 'financial interests by providing the members with their own electricity produced by wind power, thereby also promoting the members' interest in the environment and energy management, and conducting compatible activities therewith. | | SWE | C61P |
For each initiative, the csv file contains its name, its legal status, its year of foundation (given as a full date), its national identifier, its street address, the city it is located in, the corresponding postal code, an optional care-of (C/O) contact, the latitude (lat) and longitude (lon) of its location, its website, information on its activities, a national industrial sector classification, a purpose statement in the original language, the same purpose statement translated into English, the date of removal, the country code, and its legal form. To relate metadata to the information in the csv file, we create a second file containing the metadata. This metadata file is written in json. Let us assume that the csv file itself has the filename "SWE_initiatives_sample.csv". Following the csv on the web standard, the metadata file should then be named "SWE_initiatives_sample.csv-metadata.json", which is one of the locations a CSVW processor checks by default. A minimal metadata file contains the following information:
```json
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "SWE_initiatives_sample.csv"
}
```
The @context entry links to the vocabulary of the CSVW standard, and the url entry states the filename of the csv file. This minimal file can be extended to contain more specific metadata. All entries are encoded as property specifications (keys) with corresponding values.
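The naming convention and the minimal content can be reproduced with a few lines of Python. This is a sketch: the helper names are hypothetical, and it covers only the default "-metadata.json" location. A processor may also locate metadata in other ways, such as an HTTP Link header, which the sketch ignores.

```python
import json
from pathlib import Path

CSVW_CONTEXT = "http://www.w3.org/ns/csvw"

def metadata_path(csv_path: Path) -> Path:
    """Default CSVW location: the csv file name with "-metadata.json" appended."""
    return csv_path.with_name(csv_path.name + "-metadata.json")

def minimal_metadata(csv_path: Path) -> dict:
    """The two entries of the minimal metadata file: @context and url."""
    return {"@context": CSVW_CONTEXT, "url": csv_path.name}

def write_minimal_metadata(csv_path: Path) -> Path:
    """Write the minimal metadata document next to the csv file."""
    target = metadata_path(csv_path)
    target.write_text(json.dumps(minimal_metadata(csv_path), indent=2), encoding="utf-8")
    return target
```

Because url is resolved relative to the metadata file, writing the metadata next to the csv file lets url contain only the file name.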
General information about the csv file
As a first step, we include general information about the csv file. We start with a code snippet covering title, description, and creator:
```json
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "SWE_initiatives_sample.csv",
  "dc:title": "Example - list of citizen-led initiatives in Sweden",
  "dc:description": "List of citizen-led initiatives in Sweden, example dataset to be used for illustrating the use of csv on the web",
  "dc:creator": {
    "schema:name": "August Wierling",
    "schema:url": "https://orcid.org/0000-0002-7443-7593",
    "schema:contactPoint": {
      "schema:email": "augustw@hvl.no"
    }
  }
}
```
As the example shows, property specifications can be terms from popular metadata vocabularies such as Dublin Core, schema.org, or DCAT. These vocabularies can be used independently or together. In the example above, terms from the Dublin Core vocabulary are mixed with terms from schema.org: the title and the description of the csv file are stated with Dublin Core terms, while the information inside dc:creator is specified with the schema.org vocabulary. The creator is described in more detail by a human-readable name, a URL (here, his ORCID iD), and contact point details such as an email address. The contact point can also be extended with a telephone or fax number. We continue with a more extensive list of details about the file as a whole:

```json
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "SWE_initiatives_sample.csv",
  "dc:title": "Example - list of citizen-led initiatives in Sweden",
  "dc:description": "List of citizen-led initiatives in Sweden, example dataset to be used for illustrating the use of csv on the web",
  "dc:date": "2022-10-07",
  "dc:format": "text/csv",
  "dc:language": "en-US",
  "dc:publisher": {
    "schema:name": "EERAdata project",
    "schema:url": "https://cordis.europa.eu/project/id/883823",
    "schema:contactPoint": {
      "schema:email": "info@eeradata.eu",
      "schema:url": "https://www.eeradata.eu"
    }
  },
  "dc:rights": "https://creativecommons.org/licenses/by-sa/4.0/",
  "dc:subject": "Energy communities, Sweden, Community energy, Energy cooperatives, Renewable Energy",
  "dc:source": {
    "schema:name": "ENBP Inventory \"Energy by people\" - First Europe-wide inventory on energy communities",
    "schema:url": "https://doi.org/10.18710/2CPQHQ"
  },
  "dc:type": "Dataset",
  "dc:creator": {
    "schema:name": "August Wierling",
    "schema:url": "https://orcid.org/0000-0002-7443-7593",
    "schema:contactPoint": {
      "schema:email": "augustw@hvl.no"
    }
  },
  "dc:coverage": "Sweden",
  "dc:identifier": "https://eeradata-platform.eu/"
}
```
The date follows ISO 8601. The language tag follows RFC 4646, which has since been superseded by RFC 5646 (BCP 47). The media type text/csv is registered in RFC 4180, and RFC 7111 defines URI fragment identifiers for it. The dc:publisher entry groups several details into one object in curly brackets: the EERAdata project as the name of the publisher, the corresponding CORDIS entry as a persistent identifier, and contact information in the form of an email address and a website. The dc:rights entry contains the license information and points to a web page of the Creative Commons organization. It states that the csv file is licensed under Attribution-ShareAlike 4.0 International (CC BY-SA 4.0). Anybody is therefore free to share and adapt the file, provided that they give appropriate credit and distribute adaptations under the same license. The dc:subject entry contains a list of keywords describing the contents of the csv file in more detail. In a real example, this list should be more extensive; here, only a basic example is given. The dc:type entry declares the csv file to be a dataset, one of the types listed in the DCMI Type Vocabulary. In DCMI's words, the dc:identifier holds "an unambiguous reference to the resource within a given context". Ideally, this resource is the final FAIRified object, that is, the json file created from the original csv file and its json metadata document. Best practice is to assign a persistent identifier.
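The conventions described above, together with the use of prefixed property names in the creator block, lend themselves to simple automated checks. The Python sketch below is hypothetical tooling, not part of any CSVW library. It uses a simplified language-tag pattern rather than the full BCP 47 grammar, and it knows only the handful of CSVW terms used on this page. It flags any other key without a prefix, such as a bare email inside a contact point, because CSVW expects common properties to be prefixed names or absolute URLs.

```python
import re
from datetime import date

# Simplified language tag (language plus optional region, e.g. en-US); the full
# BCP 47 grammar allows scripts, variants and more, so this is an assumption.
LANGUAGE_TAG = re.compile(r"[a-z]{2,3}(-[A-Z]{2})?")

# Small subset of CSVW terms used on this page, not the full vocabulary.
CSVW_TERMS = {"@context", "@id", "@type", "url", "tableSchema", "columns",
              "titles", "datatype", "required", "base", "format", "minimum",
              "maximum", "propertyUrl", "valueUrl", "aboutUrl", "separator"}

def unprefixed_keys(node, path=""):
    """Yield paths of keys that are neither known CSVW terms nor prefix:name."""
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}/{key}"
            if key not in CSVW_TERMS and ":" not in key:
                yield here
            yield from unprefixed_keys(value, here)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from unprefixed_keys(item, f"{path}[{index}]")

def check_file_level(meta: dict) -> list[str]:
    """Return problems found in the file-level entries of a metadata document."""
    problems = []
    try:
        date.fromisoformat(meta.get("dc:date", ""))
    except ValueError:
        problems.append("dc:date is not an ISO 8601 date (YYYY-MM-DD)")
    if meta.get("dc:format") != "text/csv":
        problems.append("dc:format is not the text/csv media type")
    if not LANGUAGE_TAG.fullmatch(meta.get("dc:language", "")):
        problems.append("dc:language is not a simple language tag")
    if not meta.get("dc:rights"):
        problems.append("dc:rights (license) is missing")
    problems += [f"unprefixed key {p}" for p in unprefixed_keys(meta)]
    return problems

example = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "SWE_initiatives_sample.csv",
    "dc:date": "2022-10-07",
    "dc:format": "text/csv",
    "dc:language": "en-US",
    "dc:rights": "https://creativecommons.org/licenses/by-sa/4.0/",
    "dc:creator": {"schema:contactPoint": {"email": "augustw@hvl.no"}},
}
print(check_file_level(example))  # prints ['unprefixed key /dc:creator/schema:contactPoint/email']
```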
Now, how does this contribute to making the original csv file FAIR?
Specifying information about table headers
This section describes how to further specify the entries in the various columns of the csv file. Note that the csv on the web standard allows one json metadata file to describe several csv files that share a certain layout. For our purposes here, we focus on a single table: the one shown above with the information on Swedish energy cooperatives. Before giving a full-fledged description of all the columns, we start with the first four columns from the left, which specify the name of the initiative, its legal status, the year of foundation, and a national identifier. We begin with a simple set of specifications before assigning more information to the columns. For more information, please also see the CSVW primer as well as the recommendation itself.

```json
"tableSchema": {
  "columns": [{
    "titles": "name",
    "dc:description": "Name of the initiative",
    "datatype": "string",
    "required": true
  }, {
    "titles": "status",
    "dc:description": "Legal status",
    "datatype": {
      "base": "string",
      "format": "active|inactive|liquidation"
    },
    "required": true
  }, {
    "titles": "year of foundation",
    "dc:description": "Year of foundation of the initiative",
    "datatype": "date"
  }, {
    "titles": "national identifier",
    "propertyUrl": "http://www.wikidata.org/prop/direct/P6460",
    "datatype": {
      "dc:title": "National identifier for Sweden",
      "dc:description": "National identifier for Sweden",
      "base": "string",
      "format": "\\d{6}-\\d{4}"
    }
  }]
}
```
The general property for table-specific attributes is tableSchema, and the individual columns are described in the columns array. For each column, a title, a description, and details about the datatype are fixed. For example, the first column has the title name, and the dc:description entry explains what name actually means. The datatype for all entries in the first column is string. For further predefined datatypes, see the Metadata Vocabulary for Tabular Data. Setting required to true means that a validator reports an error if the corresponding cell in the csv file is empty. For the second column, the format property gives a regular expression listing the allowed values: any entry other than active, inactive, or liquidation is reported as an error. The entries of the third column have the datatype date, so they must comply with the ISO 8601 form YYYY-MM-DD. Finally, the fourth column lists the national identifier for organizations in Sweden, which consists of six digits, a dash, and another four digits. As the example shows, the format property lets you specify such patterns with regular expressions. Note that each backslash in a regular expression must be doubled inside a json string (\\d rather than \d). In addition, the propertyUrl of this column ties each entry to Wikidata property P6460 and thereby states that all entries are Swedish organizational numbers.
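What a validator does with these four column descriptions can be sketched in plain Python. This is a hand-written mirror of the metadata, not a CSVW implementation. It assumes that a format pattern must match the whole cell value. The first sample row is taken from the table above; the second is made up so that it violates every rule.

```python
import csv
import io
import re
from datetime import date

# Hand-written mirror of the four column descriptions; a CSVW validator would
# read these constraints from the metadata file instead.
STATUS = re.compile(r"active|inactive|liquidation")
ORG_NUMBER = re.compile(r"\d{6}-\d{4}")

def validate_row(row: dict) -> list[str]:
    """Return constraint violations for the first four columns of one row."""
    errors = []
    if not row["name"]:  # required: true
        errors.append("name is required but empty")
    if not STATUS.fullmatch(row["status"]):  # assumes whole-value matching
        errors.append(f"status {row['status']!r} is not an allowed value")
    founded = row["year of foundation"]
    if founded:  # not required, so an empty cell is allowed
        try:
            date.fromisoformat(founded)
        except ValueError:
            errors.append("year of foundation is not a YYYY-MM-DD date")
    identifier = row["national identifier"]
    if identifier and not ORG_NUMBER.fullmatch(identifier):
        errors.append("national identifier is not of the form NNNNNN-NNNN")
    return errors

SAMPLE = """name,status,year of foundation,national identifier
Kvarkenvinden 1,active,1998-01-27,769602-8096
,dissolved,1998/01/27,7696028096
"""

for row in csv.DictReader(io.StringIO(SAMPLE)):
    print(validate_row(row))  # [] for the first row, four errors for the second
```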
The next three columns contain address information. They relate to the schema.org vocabulary to specify the street address, the name of the municipality, and the postal code of the initiative. The corresponding entries in the metadata file are:
```json
{
  "titles": "street address",
  "dc:description": "Street address of the initiative",
  "datatype": "string",
  "propertyUrl": "schema:streetAddress"
}, {
  "titles": "city",
  "dc:description": "Municipality where the initiative is located",
  "datatype": "string",
  "propertyUrl": "schema:addressLocality"
}, {
  "titles": "postal code",
  "dc:description": "Postal code of the location of the initiative",
  "datatype": "string",
  "propertyUrl": "schema:postalCode"
}
```
Note that, for Sweden, the format property can additionally be used to define allowed patterns for street addresses and postal codes. The csv file also reports the location of the initiative's headquarters as geo-coordinates: the column titled lat contains latitudes, while the column titled lon holds longitudes. Here, too, schema.org provides properties that tie the values to a standard:
```json
{
  "titles": "lat",
  "dc:description": "geo location of headquarters of initiative, latitude, WGS84",
  "datatype": {
    "base": "number",
    "minimum": "-90",
    "maximum": "90"
  },
  "propertyUrl": "schema:latitude"
}, {
  "titles": "lon",
  "dc:description": "geo location of headquarters of initiative, longitude, WGS84",
  "datatype": {
    "base": "number",
    "minimum": "-180",
    "maximum": "180"
  },
  "propertyUrl": "schema:longitude"
}
```
As the example shows, CSVW allows values to be restricted to a range: latitudes lie between -90 and 90, and longitudes between -180 and 180. Using the schema.org definitions makes it implicitly clear that the WGS84 standard is used to describe the geo locations.
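The postal code pattern mentioned above and the coordinate ranges both reduce to simple per-cell checks. The Python sketch below assumes that Swedish postal codes are written as three digits, a space, and two digits, as in the sample table. The function names are hypothetical. The sketch also shows that json.dumps doubles the backslashes of the regular expression, which is the form the metadata file must contain. A non-numeric coordinate is reported as an error. This is what happens, for instance, when a value lands in the wrong column.

```python
import json
import re

# Assumed pattern for Swedish postal codes as written in the sample ("904 22").
POSTAL_CODE = r"\d{3} \d{2}"
# Bounds taken from the lat/lon column descriptions (WGS84 degrees).
BOUNDS = {"lat": (-90.0, 90.0), "lon": (-180.0, 180.0)}

def postal_code_datatype() -> str:
    """Serialize the datatype; json.dumps doubles each backslash of the regex."""
    return json.dumps({"base": "string", "format": POSTAL_CODE})

def row_errors(row: dict) -> list[str]:
    """Check the postal code pattern and the lat/lon ranges of one row."""
    errors = []
    if not re.fullmatch(POSTAL_CODE, row["postal code"]):
        errors.append(f"postal code: {row['postal code']!r} does not match")
    for column, (minimum, maximum) in BOUNDS.items():
        try:
            value = float(row[column])
        except ValueError:
            errors.append(f"{column}: {row[column]!r} is not a number")
            continue
        if not minimum <= value <= maximum:
            errors.append(f"{column}: {value} outside [{minimum}, {maximum}]")
    return errors

print(postal_code_datatype())
print(row_errors({"postal code": "904 22", "lat": "63.80667", "lon": "20.27364"}))
print(row_errors({"postal code": "90422", "lat": "63.80667", "lon": "http://kvarkenvinden.se"}))
```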
The next column contains a link to the web presence of the initiative. A minimal way to specify it, again with the help of schema.org, is:
```json
{
  "titles": "website",
  "dc:description": "Link to the web presence of the initiative",
  "propertyUrl": "schema:url"
}
```
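For stricter checking of the website column, one option would be to add the datatype anyURI to the column description. However, anyURI also accepts relative references. The Python sketch below therefore applies a stricter rule of its own, which is an assumption: each value must be an absolute http or https URL with a host. The function name is hypothetical.

```python
from urllib.parse import urlsplit

def is_web_address(value: str) -> bool:
    """Assumed rule: an absolute http(s) URL with a network location (host)."""
    parts = urlsplit(value)
    return parts.scheme in {"http", "https"} and bool(parts.netloc)

for value in ["http://kvarkenvinden.se", "https://www.ollebackenvind.se", "wind onshore", ""]:
    print(repr(value), is_web_address(value))
```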
The column titled activity contains information about the activities of the initiative. In general, the activities of citizen-led initiatives can be quite broad, ranging from electricity and heat generation by different means to distribution activities and energy efficiency measures. Again, the task is to find a resource on the web that allows expressing that all entries in this column are activities. The makesOffer property provided by schema.org is one way to state this. According to its definition, makesOffer is 'A pointer to products or services offered by the organization or person.' The specification of the column reads:
```json
{
  "titles": "activity",
  "dc:description": "Describes activities by citizen-led energy initiatives",
  "propertyUrl": "schema:makesOffer"
}
```
Note that the different activities should ideally be drawn from a controlled vocabulary. More details will be discussed elsewhere.
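In the sample table, the row for Dala Vindkraft lists two activities in a single cell ("wind onshore, E-trade"). CSVW's separator column property can declare such cells as lists of values. The Python sketch below imitates this step. Splitting on a comma and stripping whitespace is the sketch's own choice. Each resulting item is then checked against a controlled vocabulary. The vocabulary set and the helper names are hypothetical placeholders, since this page leaves the choice of vocabulary open.

```python
# Hypothetical controlled vocabulary; a real one would be published and referenced by URL.
ACTIVITIES = {"wind onshore", "wind offshore", "solar PV", "E-trade", "district heating"}

def split_cell(cell: str, separator: str = ",") -> list[str]:
    """Split a list-valued cell, stripping whitespace and dropping empty items."""
    return [item.strip() for item in cell.split(separator) if item.strip()]

def unknown_activities(cell: str) -> list[str]:
    """Return the items of an activity cell that are not in the vocabulary."""
    return [item for item in split_cell(cell) if item not in ACTIVITIES]

print(split_cell("wind onshore, E-trade"))          # ['wind onshore', 'E-trade']
print(unknown_activities("wind onshore, hydrogen"))  # ['hydrogen']
```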
The next column contains the national industrial sector classification. This field describes the type of activities the initiative is engaged in, based on the classification of economic activities published by Statistics Sweden. Note that this information partly overlaps with the information in the activity column. However, a domain-specific controlled vocabulary can usually express much finer details than a general classification scheme covering the whole national industry. On the other hand, initiatives may engage in activities that are captured by the general scheme but not contained in a domain-specific vocabulary. The Swedish Standard Industrial Classification (SNI 2007) is based on the EU's recommended standard, NACE Rev. 2, but allows for more detailed specifications. The official codes for activity groups consist of two digits, a dot, and three digits. The example here is the code 35.110, which encodes 'Production of electricity'. As in the national identifier example above, the format of the entry can be specified with the format property:
```json
{
  "titles": "national industrial sector classification (if no other information on area of activity available)",
  "datatype": {
    "@id": "https://www.wikidata.org/wiki/Q2976602",
    "dc:title": "Swedish Standard Industrial Classification",
    "dc:description": "Swedish Standard Industrial Classification for activities by the citizen-led initiative",
    "base": "string",
    "format": "\\d{2}\\.\\d{3}"
  }
}
```
While Wikidata offers resources for the NACE classification codes and for the Belgian classification codes, no resource is available for the Swedish case. At a minimum, Wikidata provides items for economic classification schemes in general: wikidata:Q2976602 or wikidata:Q27048688. To state that all values in a column are of a particular type, csv on the web lets the datatype description carry an @id that identifies this datatype. The aboutUrl property, by contrast, identifies what a row or cell is about, not the type of its values.
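The format pattern for the classification column can be tested in Python. The sketch below also illustrates the relation between SNI 2007 and NACE Rev. 2. The mapping rule is an assumption that should be verified against Statistics Sweden's documentation. It takes SNI 2007 to add one national digit to the four-digit NACE class, so that dropping the last digit of 35.110 yields NACE class 35.11. The function name is hypothetical.

```python
import re

# Same pattern as in the metadata, without the json escaping of backslashes.
SNI_CODE = re.compile(r"\d{2}\.\d{3}")

def nace_class(sni_code: str) -> str:
    """Map a five-digit SNI 2007 code to its NACE Rev. 2 class (assumed rule)."""
    if not SNI_CODE.fullmatch(sni_code):
        raise ValueError(f"not an SNI code of the form NN.NNN: {sni_code!r}")
    # Assumption: the fifth digit is the national refinement, so drop it.
    return sni_code[:-1]

print(nace_class("35.110"))  # 35.11
```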
Additional type specifications can be given to the columns on the year of foundation and the date of removal (dissolution). schema.org provides definitions for both. Entries such as

```json
"propertyUrl": "schema:foundingDate"
"propertyUrl": "schema:dissolutionDate"
```

added to the respective column descriptions would serve as a means of specification.
Worked example: Power plant information
In the second part of this tutorial, we consider data about power plants, using a list of wind farms in Germany as an example. The data is again organized as a csv file. It contains the name of the power plant, its type, a classification of the energy product used as input, its location in terms of latitude and longitude, the nameplate capacity, the commissioning date, the decommissioning date, and information on the owner of the power plant. The table here shows six different wind farms with the associated information.
name | type | using energy product | latitude | longitude | nameplate capacity [kW] | commissioning date | decommissioning date | owner |
---|---|---|---|---|---|---|---|---|
Langwedel dritte | onshore wind farm | RA310 | 53.013274 | 9.158455 | 3050 | 2017-12-29 | | Bürger Energie Bremen |
WEA Kammerberg | onshore wind farm | RA310 | 48.387257 | 11.518869 | 3000 | 2015-11-03 | | Bürger Energie Genossenschaft Freisinger Land |
Windpark Söhrewald / Niestetal | onshore wind farm | RA310 | 51.241938 | 9.518432 | 21525 | 2015-09-19 | | Bürger Energie Kassel & Söhre |
Windpark Rohrberg | onshore wind farm | RA310 | 51.23638 | 9.710966 | 15000 | 2016-03-23 | | Bürger Energie Kassel & Söhre |
Windpark Stiftswald | onshore wind farm | RA310 | 51.245691 | 9.658835 | 27000 | 2017-06-28 | | Bürger Energie Kassel & Söhre |
Windpark Kreuzstein | onshore wind farm | RA310 | 51.274447 | 9.730573 | 24000 | 2019-01-01 | | Bürger Energie Kassel & Söhre |
As before, the csv file is supplemented by a json metadata file. Following the naming convention of ...
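Much of the file-level metadata can be reused from the Swedish example. One way to keep such blocks consistent across datasets is to generate them from a shared template and override only the per-file entries. The Python sketch below assumes that the entries in SHARED, which are copied from the snippets on this page, are the common ones. The function name file_metadata is hypothetical.

```python
import json

# Entries shared by both example metadata files (a subset, for brevity).
SHARED = {
    "@context": "http://www.w3.org/ns/csvw",
    "dc:format": "text/csv",
    "dc:language": "en-US",
    "dc:rights": "https://creativecommons.org/licenses/by-sa/4.0/",
    "dc:identifier": "https://eeradata-platform.eu/",
}

def file_metadata(csv_name: str, overrides: dict) -> dict:
    """Combine the shared entries with the csv file name and per-file values."""
    return {**SHARED, "url": csv_name, **overrides}

swe = file_metadata("SWE_initiatives_sample.csv",
                    {"dc:date": "2022-10-07", "dc:coverage": "Sweden"})
deu = file_metadata("DEU_powerPlants_sample.csv",
                    {"dc:date": "2022-10-27", "dc:coverage": "Germany"})
print(json.dumps(deu, indent=2))
```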
As a first step, we again consider metadata that relate to the csv file as a whole, such as the creator of the file, access rights for the entire file, etc. The corresponding part of the metadata file may look like this:

```json
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "DEU_powerPlants_sample.csv",
  "dc:title": "Example - list of power plants in Germany",
  "dc:description": "List of power plants in Germany owned by citizen-led initiatives, example dataset to be used for illustrating the use of csv on the web",
  "dc:date": "2022-10-27",
  "dc:format": "text/csv",
  "dc:language": "en-US",
  "dc:publisher": {
    "schema:name": "EERAdata project",
    "schema:url": "https://cordis.europa.eu/project/id/883823",
    "schema:contactPoint": {
      "schema:email": "info@eeradata.eu",
      "schema:url": "https://www.eeradata.eu"
    }
  },
  "dc:rights": "https://creativecommons.org/licenses/by-sa/4.0/",
  "dc:subject": "Energy communities, Germany, Community energy, Energy cooperatives, Renewable Energy, power plants",
  "dc:source": {
    "schema:name": "ENBP Inventory \"Energy by people\" - First Europe-wide inventory on energy communities",
    "schema:url": "https://doi.org/10.18710/2CPQHQ"
  },
  "dc:type": "Dataset",
  "dc:creator": {
    "schema:name": "August Wierling",
    "schema:url": "https://orcid.org/0000-0002-7443-7593",
    "schema:contactPoint": {
      "schema:email": "augustw@hvl.no"
    }
  },
  "dc:coverage": "Germany",
  "dc:identifier": "https://eeradata-platform.eu/"
}
```
As before, we continue by providing information about the contents of the columns in the csv file. We start with the name of the wind farm in the first column. It can be referenced with schema:name, which according to schema.org is a property assigning a 'name' to a 'thing'. Alternatively, rdfs:label can be used for this purpose. The code snippet describing the leftmost column would look like this:

```json
"tableSchema": {
  "columns": [{
    "titles": "name",
    "dc:description": "Name of the power plant",
    "datatype": "string",
    "propertyUrl": "schema:name"
  }]
}
```
For the commissioning date, the corresponding entry in the columns array looks like this:

```json
{
  "titles": "commissioning date",
  "dc:description": "Commissioning date of the power plant",
  "datatype": "date",
  "propertyUrl": "http://www.wikidata.org/prop/direct/P729"
}
```
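Here, the commissioning date is linked to Wikidata property P729. Because the CSVW context does not define a wikidata: prefix, the property is best written as a full IRI, for example http://www.wikidata.org/prop/direct/P729.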
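The following much simplified Python sketch of the csv-to-RDF conversion shows what the propertyUrl entries achieve for the two column descriptions above. It is not the W3C CSV2RDF algorithm. It emits only one triple per mapped, non-empty cell and uses a blank node per row as the subject, which is roughly what happens when no aboutUrl is given. It performs no escaping of literals. It also assumes that the schema: prefix expands to http://schema.org/, as in the CSVW context.

```python
import csv
import io

SCHEMA = "http://schema.org/"  # expansion of the schema: prefix in the CSVW context
WDT = "http://www.wikidata.org/prop/direct/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"

# Column title -> (predicate IRI, datatype IRI or None), mirroring the two
# column descriptions above.
COLUMNS = {
    "name": (SCHEMA + "name", None),
    "commissioning date": (WDT + "P729", XSD_DATE),
}

def row_triples(row: dict, subject: str) -> list[str]:
    """Emit one N-Triples line per mapped, non-empty cell (no literal escaping)."""
    triples = []
    for title, (predicate, datatype) in COLUMNS.items():
        value = row.get(title, "")
        if not value:
            continue  # empty cells produce no triple
        literal = f'"{value}"' + (f"^^<{datatype}>" if datatype else "")
        triples.append(f"{subject} <{predicate}> {literal} .")
    return triples

SAMPLE = """name,commissioning date,decommissioning date
WEA Kammerberg,2015-11-03,
Windpark Kreuzstein,2019-01-01,
"""

for number, row in enumerate(csv.DictReader(io.StringIO(SAMPLE)), start=1):
    for triple in row_triples(row, f"_:row{number}"):
        print(triple)
```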