A proper understanding of the data is essential for any FAIRification activity. If the data are one's own or come from an in-house activity, such an understanding may come easily; but if the data are provided by a third party, a detailed analysis may be necessary. This step analyses the data to support the subsequent FAIRification steps. Issues that may be considered here are:
- Are the data structured following a common framework?
- Do the data meet the intended formatting?
- Are data missing?
- Are there licensing issues that may prevent the data from being used at all, or restrict use to a limited community? OpenAIRE offers a report, "Can I reuse someone else's research data?", that provides guidance on these issues.
- Are the data in a non-proprietary format such as txt, csv, jpg or png?
- Do some FAIR features, such as persistent identifiers, already exist in the data? If the data are extensive, running a (semi-)automatic FAIR assessment tool is helpful.
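Some of these checks can be partly automated. Below is a minimal sketch using pandas; the file content and column names are invented for illustration:

```python
import io

import pandas as pd

# In-memory CSV standing in for third-party data (hypothetical content)
raw = io.StringIO(
    "sample_id,temperature_c,ph\n"
    "S1,21.5,7.0\n"
    "S2,,6.8\n"
    "S3,22.1,\n"
)
df = pd.read_csv(raw)

# Are data missing? Count empty cells per column.
missing = {column: int(n) for column, n in df.isna().sum().items()}
print(missing)  # {'sample_id': 0, 'temperature_c': 1, 'ph': 1}

# Do the data meet the intended formatting? Check the expected columns exist.
expected_columns = {"sample_id", "temperature_c", "ph"}
print(expected_columns.issubset(df.columns))  # True
```

A real check would go further, for example validating value ranges and data types per column.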
Here is a list of tools:
Having the data already structured, for example following the tidy data framework proposed by Wickham, is helpful. Wickham considers data that consist of several observations of some variables, organized in a table such as a csv file. He suggests that every column represents a variable, every row corresponds to an observation, and every cell holds a single value. Interlinked csv files can be connected by primary and foreign keys, which leads to a so-called normalized set of data.
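The idea of normalized tidy tables linked by keys can be sketched with pandas; the table names and values below are invented:

```python
import pandas as pd

# One table per kind of observational unit; each row is one observation,
# each column one variable (tidy layout). All names and values are invented.
samples = pd.DataFrame({
    "sample_id": ["S1", "S2"],   # primary key
    "site": ["river", "lake"],
})
measurements = pd.DataFrame({
    "sample_id": ["S1", "S2"],   # foreign key referencing samples
    "ph": [7.0, 6.8],
    "temperature_c": [21.5, 22.0],
})

# The normalized tables can be recombined when an analysis needs both.
joined = samples.merge(measurements, on="sample_id")
print(joined.shape)  # (2, 4)
```

Keeping each observational unit in its own table avoids duplicating the sample metadata in every measurement row, while the key makes the link explicit.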
General suggestions and practices on handling research data are compiled, for example, at the Princeton University Library.
- Wickham, H. Tidy data. Journal of Statistical Software 59(10), 1–23 (2014). doi:10.18637/jss.v059.i10