CSVW for energy datasets

From EERAdata Wiki
Revision as of 06:18, 17 October 2022 by Valerias (talk | contribs)

This page collects examples of improving the FAIRness of datasets using CSV on the Web (csvw), an extension of the csv format. Most importantly, csv on the web makes it possible to tie metadata and data together, starting from a well-known and widely used data format. The standard offers a rich framework for annotating existing csv documents with additional information and for transforming them into other structured data exchange formats such as JSON(-LD) and RDF. At the same time, csv on the web is user-friendly: it offers a flexible mechanism that ranges from minimal FAIR extensions to elaborate context building for the data to be shared. CSV on the web is a W3C recommendation in coherence with ...

FAIR principles

The example illustrates how csv on the web contributes to realizing the FAIR principles:

To be Findable:

F1. (meta)data are assigned a globally unique and eternally persistent identifier.

F2. data are described with rich metadata.

F3. (meta)data are registered or indexed in a searchable resource.

F4. metadata specify the data identifier.

To be Accessible:

A1. (meta)data are retrievable by their identifier using a standardized communications protocol.

A1.1. the protocol is open, free, and universally implementable.

A1.2. the protocol allows for an authentication and authorization procedure, where necessary.

A2. metadata are accessible, even when the data are no longer available.

To be Interoperable:

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

I2. (meta)data use vocabularies that follow FAIR principles.

I3. (meta)data include qualified references to other (meta)data.

To be Re-usable:

R1. (meta)data have a plurality of accurate and relevant attributes.

R1.1. (meta)data are released with a clear and accessible data usage license.

R1.2. (meta)data are associated with their provenance.

R1.3. (meta)data meet domain-relevant community standards.

Worked example

We start with a csv file whose contents are shown in this table:

name | status | year of foundation | national identifier | street address | city | postal code | C/O | lat | lon | website | activity | national industrial sector classification (if no other information on area of activity available) | purpose (original language) | purpose (translation) | date of removal | country code | legal form
Kvarkenvinden 1 | active | 1998-01-27 | 769602-8096 | Norra Obbolavägen 89 | Umeå | 904 22 |  | 63.80667 | 20.27364 | http://kvarkenvinden.se | wind onshore | 35110 | Föreningen har till ändamål att främja sina medlemmars ekonomiska intressen och dess miljöintresse genom att utöva driftsansvar över vindkraftverk i syfte att tillhandahålla vindenergi för medlemmarnas konsumtion. All genom föreningen genererad vindenergi ägs av medlemmarna. | The purpose of the association is to promote the financial interests of its members and its environmental interests by exercising operational responsibility for wind turbines in order to provide wind energy for the members' consumption. All wind energy generated by the association is owned by the members. |  | SWE | C61P
Ollebacken vind ekonomiska förening | active | 2008-01-08 | 769618-1010 | SIKÅS NORRA BYVÄGEN 180 | Hammerdal | 833 49 |  | 63.67432 | 15.06297 | https://www.ollebackenvind.se | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarans ekonomiska intressen genom att i egen regi producera miljö vänlig energi. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly energy on their own. |  | SWE | C61P
Jamtkulingen ekonomiska förening | active | 2009-01-20 | 769619-7420 | Södra Strandvägen 19 A | Frösön | 832 44 | Sven Erik Eriksson | 63.17622 | 14.61152 | http://www.jamtkulingen.se/ | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig energi. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly energy on their own. |  | SWE | C61P
Hällingarna Vind | active | 2005-08-02 | 769612-8318 | OLLEBACKEN 130 | Hammerdal | 833 49 |  | 63.59838 | 15.05107 |  | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig engeri. Medlemmarna deltar i verksamheten som konsumenter. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly areas on their own. The members participate in the business as consumers. |  | SWE | C61P
Offerdalsvind Ekonomiska förening | active | 2000-08-31 | 769606-0719 | BERGE 718, | Offerdal | 835 97 |  | 63.46154 | 14.09483 | http://www.offerdalsvind.se | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig energi. Medlemmarna deltar i verksamheten som konsumenter. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly areas on their own. The members participate in the business as consumers. |  | SWE | C61P
Trärike vindkraft ekonomisk förening | liquidation | 1996-08-07 | 769601-6331 | VIKINGAVÄGEN 36 | Sundsvall | 857 41 |  | 62.40317 | 17.26335 | http://www.trarikevindkraft.se/index.htm | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intresse genom att förse medlemmarna med egen vindkraft- producerad el och även främja medlemmarnas miljöintresse och vindkraftens utveckling. Föreningen skall bygga upp ett kapital som säkrar uppbyggnad, drift, underhåll och demontering av föreningens vindkraftverk. | The purpose of the association is to promote the members 'financial interest by providing the members with their own wind-powered electricity and also promoting the members' environmental interest and the development of wind power. The association will build up a capital that ensures the construction, operation, maintenance and dismantling of the association's wind turbines. |  | SWE | C61P
Dala Vindkraft Ekonomisk förening | active | 2006-02-18 | 769613-8911 | RIKSVÄGEN 15 | Rättvik | 795 32 |  | 60.88933 | 15.11092 | http://dalavind.se/vindandelar-foreningar/dala-vindkraft-ekonomisk-forening/medlemsinformation | wind onshore, E-trade | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen, samt deras miljöintresse, genom att tillhandahålla medlemmarna egen vindkraftsproducerad elkraft. | The purpose of the association is to promote the members' financial interests, as well as their environmental interests, by providing the members with their own wind-powered electricity. |  | SWE | C61P
Vindela | active | 2004-08-17 | 769611-2411 | BOX 4 | Malung | 782 21 |  | 60.6834 | 13.71603 | http://dalavind.se/vindandelar-foreningar/vindela/ | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig elkraft. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly electricity on their own. |  | SWE | C61P
Äppelbovind | active | 2000-09-25 | 769606-1485 | BOX 4 | Malung | 782 21 |  | 60.6834 | 13.71603 | http://dalavind.se/vindandelar-foreningar/appelbovind/kontakt/ | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig elkraft. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly electricity on their own. |  | SWE | C61P
Fjällbergsvind ekonomisk förening | liquidation | 2005-09-13 | 769613-0587 | Djupuddsvägen 35 | Grängesberg | 772 40 |  | 60.08136 | 14.98449 | http://dalavind.se/vindandelar-foreningar/fjallbergs-vind-ekonomiskforening | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att tillhandahålla medlemmarna egen vindkrafts- producerad elkraft. | The purpose of the association is to promote the members' financial interests by providing the members with their own wind power produced electricity. |  | SWE | C61P
Kyrkvinden ekonomiska förening | active | 2005-05-09 | 769613-0025 | GIMOGATAN 6 B 3TR | Uppsala | 752 20 |  | 59.8687 | 17.6083 | https://www.kyrkvinden.se | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att förmedla och i egen regi eller genom samarbetspartner producera miljövänlig elkraft. | The purpose of the association is to promote the members' financial interests by conveying, on their own account or through partners, environmentally friendly electricity |  | SWE | C61P
Ljusterö Vind ekonomiska förening | active | 2008-04-02 | 769618-5961 | LJUSTERÖ TORG | Ljusterö | 184 95 |  | 59.52403 | 18.60869 | http://www.ljusterovind.se/ | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarnas ekonomiska intressen genom att i egen regi producera miljövänlig energi samt annan därmed förenlig verksamhet. Medlemmarna deltar i verksamheten som konsumenter. | The purpose of the association is to promote the members' financial interests by producing environmentally friendly energy and other related activities on their own behalf. The members participate in the business as consumers. |  | SWE | C61P
Windy ekonomisk förening | active | 2000-12-11 | 769606-4802 | SVARTEDALSBACKEN 9 | Lerum | 443 39 | Mattias Skjöldebrandt | 57.76418 | 12.26767 | http://windy-vindkraft.se/ | wind onshore | 35110 | Föreningen har till ändamål att främja medlemmarna ekonomiska intressen genom att tillhandahålla medlemmarna egen vindkraft- producerad el, därigenom också främjande medlemmarnas intresse för miljö och energihushållning samt bedriva därmed förenlig verksamhet. | The purpose of the association is to promote the members 'financial interests by providing the members with their own electricity produced by wind power, thereby also promoting the members' interest in the environment and energy management, and conducting compatible activities therewith. |  | SWE | C61P

The csv file contains, for each initiative, its name, its legal status, its date of foundation (column "year of foundation"), its national identifier, its street address, the city it is located in, the corresponding postal code, optional C/O information, the latitude (lat) and longitude (lon) of its location, its website, information on its activities, a national industrial sector classification, a purpose statement in the original language, the same purpose statement translated into English, the date of removal, the country code, and its legal form. To relate metadata to the contents of the csv file, we create a second file that contains this metadata. The file format of this metadata file is JSON. Let us assume that the csv file has the filename "SWE_initiatives_sample.csv". According to the csv on the web standard, the metadata file should then have the filename "SWE_initiatives_sample.csv-metadata.json". A minimal form of the metadata file contains the following information:

  {
     "@context": "http://www.w3.org/ns/csvw",
     "url": "SWE_initiatives_sample.csv"
  }    
 

The @context property links to the language conventions of the csvw standard; the url property states the filename of the csv file. This minimal file can be extended to contain more specific metadata. All entries are encoded as properties with corresponding values.
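The naming convention and the minimal document can also be produced programmatically. The following is a minimal sketch in Python using only the standard library; the function names metadata_filename and minimal_metadata are hypothetical helpers chosen for this illustration and are not part of any csvw tool.

```python
import json

def metadata_filename(csv_filename: str) -> str:
    """Derive the metadata filename by appending '-metadata.json' to the csv filename."""
    return f"{csv_filename}-metadata.json"

def minimal_metadata(csv_filename: str) -> dict:
    """Build the minimal description: the csvw context plus the csv file it refers to."""
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_filename,
    }

csv_name = "SWE_initiatives_sample.csv"
# Serialize the minimal description; this string is what the metadata file would contain.
metadata_json = json.dumps(minimal_metadata(csv_name), indent=2)

print(metadata_filename(csv_name))
print(metadata_json)
```

Writing metadata_json to a file named as returned by metadata_filename, next to the csv file, gives the pair of files described above.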

General information about the csv file

In a first step, we include general information about the csv file. We start with a code snippet for specifics such as title, description, and creator:

   {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "SWE_initiatives_sample.csv",
    "dc:title": "Example - list of citizen-led initiatives in Sweden",
    "dc:description": "List of citizen-led initiatives in Sweden, example dataset to be used for illustrating the use of csv on the web",
    "dc:creator": {
        "schema:name": "August Wierling",
        "schema:url": "https://orcid.org/0000-0002-7443-7593",
        "schema:contactPoint": { "email": "augustw@hvl.no" }
    }
   }
 

As in the example above, property names can be terms from popular metadata vocabularies, e.g. Dublin Core, schema.org, or DCAT. These vocabularies can be used independently or together. In the above example, metadata terms from the Dublin Core vocabulary are mixed with terms from schema.org. The title of the csv file and its description are stated using Dublin Core terms. The value of the dc:creator term is in turn specified using the schema.org vocabulary. It describes the creator in more detail, giving a human-readable name, a url of the creator (here: his ORCID iD), and contact point details such as the email address. The contact point information can also be extended with a telephone or fax number. We continue with a more extensive list of details about the file as a whole:
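To see what the mixed vocabularies amount to, the prefixed terms can be expanded to full IRIs. The sketch below assumes that the prefixes dc and schema stand for the Dublin Core terms namespace and the schema.org namespace, as predefined by the csvw context; the helper names expand and expand_keys are hypothetical.

```python
# Prefix-to-namespace mapping (assumption: these match the prefixes predefined by the csvw context).
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "schema": "http://schema.org/",
}

def expand(term: str) -> str:
    """Expand a prefixed name such as 'dc:title'; leave anything else unchanged."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term

def expand_keys(value):
    """Recursively expand the keys of nested metadata objects; values are not touched."""
    if isinstance(value, dict):
        return {expand(key): expand_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_keys(item) for item in value]
    return value

creator = {
    "dc:creator": {
        "schema:name": "August Wierling",
        "schema:url": "https://orcid.org/0000-0002-7443-7593",
    }
}
expanded = expand_keys(creator)
print(expanded)
```

Terms from both vocabularies end up as ordinary IRIs, which is why they can be mixed freely within one metadata document.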

   {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "SWE_initiatives_sample.csv",
    "dc:title": "Example - list of citizen-led initiatives in Sweden",
    "dc:description": "List of citizen-led initiatives in Sweden, example dataset to be used for illustrating the use of csv on the web",
    "dc:date": "2022-10-07",
    "dc:format": "text/csv",
    "dc:language": "en-US",
    "dc:publisher": {
        "schema:name": "EERAdata project",
        "schema:url": "https://cordis.europa.eu/project/id/883823",
        "schema:contactPoint": {
            "email": "info@eeradata.eu",
            "url": "https://www.eeradata.eu"
        }
    },
    "dc:rights": "https://creativecommons.org/licenses/by-sa/4.0/",
    "dc:subject": "Energy communities, Sweden, Community energy, Energy cooperatives, Renewable Energy",
    "dc:source": {
        "schema:name": "ENBP Inventory \"Energy by people\" - First Europe-wide inventory on energy communities",
        "schema:url": "https://doi.org/10.18710/2CPQHQ"
    },
    "dc:type": "dataset",
    "dc:creator": {
        "schema:name": "August Wierling",
        "schema:url": "https://orcid.org/0000-0002-7443-7593",
        "schema:contactPoint": { "email": "augustw@hvl.no" }
    },
    "dc:coverage": "Sweden",
    "dc:identifier": "https://eeradata-platform.eu/"
   }
 

The date follows ISO 8601. The language is specified following RFC 4646. For the media type, RFC 7111 has been used as a specification. The type is taken from the DCMI type vocabulary. The dc:publisher information has several details which are grouped into one object by curly brackets: the EERAdata project as the name of the publisher, the corresponding CORDIS entry as a persistent identifier, and contact information in the form of an email address and a website. The entry for dc:rights contains the license information and points to a website provided by the Creative Commons organization. It states that the csv file is licensed under Attribution-ShareAlike 4.0 International (CC BY-SA 4.0). As such, anybody is free to share and adapt the file under the terms of that license. The dc:subject contains a list of keywords describing the contents of the csv file in more detail. The subject information should be more extensive in a real example; here, only a basic example is given. The dc:type information declares the csv file as a dataset according to the possible types listed by the DCMI type element working draft. The dc:identifier holds, as DCMI describes it, an unambiguous reference to the resource within a given context. Ideally, the resource is the final FAIRified object. Thus, it applies to the JSON file created from the original csv file and its JSON metadata document. Best practice is to assign a persistent identifier.
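Some of these conventions are easy to check automatically before publishing. The following sketch is only an illustration under simple assumptions: it tests that dc:date parses as an ISO 8601 calendar date and that dc:rights looks like a web address, and it splits the comma-separated dc:subject string into individual keywords. The function name check_file_metadata is hypothetical.

```python
from datetime import date
from urllib.parse import urlparse

file_metadata = {
    "dc:date": "2022-10-07",
    "dc:rights": "https://creativecommons.org/licenses/by-sa/4.0/",
    "dc:subject": "Energy communities, Sweden, Community energy, Energy cooperatives, Renewable Energy",
}

def check_file_metadata(meta: dict) -> list:
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    try:
        date.fromisoformat(meta["dc:date"])
    except (KeyError, ValueError):
        problems.append("dc:date is missing or not an ISO 8601 calendar date")
    rights = urlparse(meta.get("dc:rights", ""))
    if rights.scheme not in ("http", "https") or not rights.netloc:
        problems.append("dc:rights should be a URL pointing to the license")
    return problems

# Split the keyword string into a clean list of individual keywords.
keywords = [keyword.strip() for keyword in file_metadata["dc:subject"].split(",")]

print(check_file_metadata(file_metadata))
print(keywords)
```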

Now, how does this contribute to making the original csv file FAIR?

Specifying information about table headers

This section describes how to specify the entries in the various columns of the csv file in more detail. Note that the csv on the web standard allows the JSON metadata file to be connected to several csv files that share a certain layout. For our purposes here, we focus on a single table: the one illustrated above with the information on Swedish energy cooperatives. Before giving a full-fledged description of all the columns, we start with the first four columns from the left, which specify the name of the initiative, its legal status, the year of foundation, and a national identifier. We start with a simple set of specifications before assigning more information to the columns. For more information, please also see the primer as well as the recommendation itself.

    "tableSchema": {
        "columns": [{
            "titles": "name",
            "dc:description": "Name of the initiative",
            "datatype": "string",
            "required": true
        },{
            "titles": "status",
            "dc:description": "Legal status",
            "datatype": {
                "base": "string",
                "format": "active|inactive|liquidation"
            },
            "required": true
        },{
            "titles": "year of foundation",
            "dc:description": "Year of foundation of the initiative",
            "datatype": "date"
        },{
            "titles": "national identifier",
            "propertyUrl": "https://www.wikidata.org/wiki/Property:P6460",
            "datatype": {
                "dc:title": "National identifier for Sweden",
                "dc:description": "National identifier for Sweden",
                "base": "string",
                "format": "\\d{6}-\\d{4}"
            }
        }]
    }
 

The general property for table-specific attributes is tableSchema. Details on the columns are specified by columns. Per column, a title, a description and details about the datatype are fixed. For example, the first column has the title name, and the dc:description entry gives further information on what name actually means. The datatype for all entries in the first column is string. For further pre-defined datatypes, see the Metadata Vocabulary for Tabular Data. Specifying true for required leads to an error message if the corresponding entry in the csv file is empty. In the entry for the second column, the format property lists the allowed values for the second column. If there is any entry other than active, inactive or liquidation, an error will be reported. The entries of the third column have the datatype date, so entries must comply with the ISO 8601 format YYYY-MM-DD. Finally, the national identifier for organizations in Sweden is listed in the fourth column. It consists of 6 digits, followed by a dash, followed by another 4 digits. The format property allows patterns of this type to be specified with the help of regular expressions, as shown in the example; in JSON, each backslash of the regular expression has to be escaped by a second backslash. In addition, the propertyUrl property (note the spelling used by the standard), which the standard defines as a property of the column, ties each entry to the Wikidata property P6460 and in that way declares that all entries are Swedish organization numbers.
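The checks described above can be mimicked in a few lines of Python. This is a simplified sketch and not a csvw validator: it only covers the required flag, regular-expression formats for strings, and ISO 8601 dates, and it assumes that a format must match the whole cell value. The function names check_cell and check_row are hypothetical.

```python
import re
from datetime import date

# Column descriptions for the first four columns, mirroring the metadata discussed above.
columns = [
    {"titles": "name", "datatype": "string", "required": True},
    {"titles": "status",
     "datatype": {"base": "string", "format": "active|inactive|liquidation"},
     "required": True},
    {"titles": "year of foundation", "datatype": "date"},
    {"titles": "national identifier",
     "datatype": {"base": "string", "format": r"\d{6}-\d{4}"}},
]

def check_cell(column: dict, value: str) -> list:
    """Return the list of problems found for one cell value."""
    title = column["titles"]
    if value == "":
        # Empty cells are only a problem for required columns.
        return [f"{title}: required value is missing"] if column.get("required", False) else []
    datatype = column.get("datatype", "string")
    if isinstance(datatype, str):
        datatype = {"base": datatype}
    base = datatype.get("base", "string")
    errors = []
    if base == "date":
        try:
            date.fromisoformat(value)
        except ValueError:
            errors.append(f"{title}: '{value}' is not a date of the form YYYY-MM-DD")
    elif base == "string" and "format" in datatype:
        if re.fullmatch(datatype["format"], value) is None:
            errors.append(f"{title}: '{value}' does not match {datatype['format']}")
    return errors

def check_row(row: list) -> list:
    """Check the leading cells of a row against the column descriptions."""
    return [error for column, value in zip(columns, row) for error in check_cell(column, value)]

good_row = ["Kvarkenvinden 1", "active", "1998-01-27", "769602-8096"]
bad_row = ["", "dormant", "27.01.1998", "7696028096"]
print(check_row(good_row))
print(check_row(bad_row))
```

The first row is taken from the table above and passes all checks; the second row is made up for this illustration and violates each of the four column descriptions.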

The next three columns contain address information and relate to the schema.org vocabulary to specify the street address, the name of the municipality and the postal code of the initiative. The corresponding entries for the metadata file are:

         {
            "titles": "street address",
            "dc:description": "Street address of the initiative",
            "datatype": "string",
            "propertyUrl": "schema:streetAddress"
         },{
            "titles": "city",
            "dc:description": "Municipality where the initiative is located",
            "datatype": "string",
            "propertyUrl": "schema:addressLocality"
         },{
            "titles": "postal code",
            "dc:description": "Postal code of the location of the initiative",
            "datatype": "string",
            "propertyUrl": "schema:postalCode"
         }
 

Note that for the case of Sweden, the format property can further be used to define allowed patterns for street addresses and postal codes. The location of the headquarters of the initiative is also reported in terms of geo-coordinates in the csv file. The column entitled lat contains the latitudes, while the column entitled lon holds the longitudes.
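The page does not give column descriptions for lat and lon. One possible form, shown here as a Python dictionary that mirrors the JSON layout, is sketched below. It is an assumption for illustration only: the choice of the number datatype with minimum and maximum bounds and of the schema.org terms latitude and longitude is not taken from the original metadata file, and the helper in_range is hypothetical.

```python
# Possible column descriptions for the geo-coordinates (assumed, not from the original metadata file).
geo_columns = [
    {
        "titles": "lat",
        "dc:description": "Latitude of the location of the initiative",
        "datatype": {"base": "number", "minimum": -90, "maximum": 90},
        "propertyUrl": "schema:latitude",
    },
    {
        "titles": "lon",
        "dc:description": "Longitude of the location of the initiative",
        "datatype": {"base": "number", "minimum": -180, "maximum": 180},
        "propertyUrl": "schema:longitude",
    },
]

def in_range(column: dict, value: str) -> bool:
    """Check that a cell is a number within the bounds given in the column description."""
    bounds = column["datatype"]
    try:
        number = float(value)
    except ValueError:
        return False
    return bounds["minimum"] <= number <= bounds["maximum"]

lat_column, lon_column = geo_columns
print(in_range(lat_column, "63.80667"), in_range(lon_column, "20.27364"))
```

The two values checked here are the coordinates of the first row of the table.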

Additional semantic typing can be given to the columns on the year of foundation and the year of dissolution. Here, schema.org offers suitable definitions, and corresponding entries such as

            "propertyUrl": "schema:foundingDate"

for the column on the year of foundation and

            "propertyUrl": "schema:dissolutionDate"

for the column on the year of dissolution would serve as a means of specification.
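Adding such entries to existing column descriptions can be scripted. The sketch below assumes that the column titled "date of removal" is the one holding the dissolution date, and it uses the spelling propertyUrl from the csvw standard; the mapping SEMANTIC_TYPES and the function annotate are hypothetical names.

```python
# Mapping from column titles to schema.org terms
# (assumption: "date of removal" is the column that holds the dissolution date).
SEMANTIC_TYPES = {
    "year of foundation": "schema:foundingDate",
    "date of removal": "schema:dissolutionDate",
}

def annotate(columns: list) -> list:
    """Return copies of the column descriptions with propertyUrl added where a term is known."""
    annotated = []
    for column in columns:
        column = dict(column)  # shallow copy so the input list is left untouched
        term = SEMANTIC_TYPES.get(column["titles"])
        if term is not None:
            column["propertyUrl"] = term
        annotated.append(column)
    return annotated

date_columns = [
    {"titles": "year of foundation", "datatype": "date"},
    {"titles": "date of removal", "datatype": "date"},
    {"titles": "country code", "datatype": "string"},
]
result = annotate(date_columns)
print(result)
```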

How to test the metadata document?

Resources