Analyse data


A proper understanding of the data is essential for carrying out any FAIRification activity. If the data are one's own or come from an in-house activity, such an understanding may come easily. But if the data are provided by a third party, a detailed analysis might be necessary. This step analyses the data to support the subsequent FAIRification step. Issues that may be considered here are:

  • Are the data structured following a common framework?
  • Do the data meet the intended formatting?
  • Are data missing? (A basic scripted check is sketched after this list.)
  • Are there licensing issues that may prevent the data from being used at all, even by a restricted community? OpenAIRE offers a guide, Can I reuse someone else’s research data?, that provides guidance on these issues.
  • Are the data in a non-proprietary format such as txt, csv, jpg or png?
  • Are some FAIR features, such as persistent identifiers, already present in the data? If the data are extensive, running a (semi-)automatic FAIR assessment tool is helpful.
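Several of these questions can be answered with a quick scripted inspection. The following minimal sketch, assuming the data arrive as a csv file (the file name dataset.csv and the output are placeholders for illustration), uses pandas to report missing values and column types:

```python
import pandas as pd

# Load the dataset (placeholder file name; adjust to the actual data file).
df = pd.read_csv("dataset.csv")

# Report the number of missing values per column.
print("Missing values per column:")
print(df.isna().sum())

# Report the inferred type of each column, which helps to spot columns that
# do not meet the intended formatting (e.g. numbers read as text).
print("\nColumn types:")
print(df.dtypes)

# Flag rows that are entirely empty, a common artefact of exported spreadsheets.
empty_rows = df.index[df.isna().all(axis=1)]
print(f"\nCompletely empty rows: {list(empty_rows)}")
```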

Here is a list of tools:

Having the data already in a structured form, such as the tidy data framework proposed by Wickham [1], is helpful. He considers data that consist of several observations of some variables, organized in a table such as a csv file. He suggests that every column represents a variable, every row corresponds to an observation, and every cell holds a single value. In the case of interlinked data, the csv files can be linked by primary and foreign keys, which leads to a so-called normalized set of data.
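To illustrate the tidy layout, the following sketch (using pandas; the column names and values are invented for illustration) reshapes a small "wide" table, in which yearly measurements sit in separate columns, into tidy form with one observation per row:

```python
import pandas as pd

# A small "wide" table: one row per site, one column per year (invented example data).
wide = pd.DataFrame({
    "site": ["A", "B"],
    "2019": [4.1, 3.7],
    "2020": [4.3, 3.9],
})

# Reshape to tidy form: every row is one observation (site, year, value),
# every column is a single variable, and every cell holds a single value.
tidy = wide.melt(id_vars="site", var_name="year", value_name="value")
print(tidy)
#   site  year  value
# 0    A  2019    4.1
# 1    B  2019    3.7
# 2    A  2020    4.3
# 3    B  2020    3.9
```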

General suggestions and practices on handling research data are compiled, for example, at the Princeton University Library.

References

  1. Wickham, H. Tidy data. Journal of Statistical Software 59(10), 1–23 (2014). doi:10.18637/jss.v059.i10