FAIR - Reusable
- 1 Data should be accompanied by enough information on how it was collected or processed, as to guarantee its quality and hence make it usable by other. It should have a license that allows and facilitates reuse
Data should be accompanied by enough information on how it was collected or processed, as to guarantee its quality and hence make it usable by other. It should have a license that allows and facilitates reuse
Data has a detailed provenance
Provenance indicates the history of your data, so it should include as much information as possible on the code, datasets and processes used to produce the data. Provenance can be recorded as a separate document, but it is often composed of several elements. The metadata attached to the publication is part of the provenance, as is metadata available in the data files, any relevant technical report, links to source code and data documentation and other references. They can all contribute to your data provenance.
Provenance is central to data reproducibility and hence to build trust in the data.
Data has a license
A dataset without a license cannot be used. A potential user would have to contact the owner and ask for permission to use the data. A license tells immediately to a user what can be done with the data.
The license should be clear, it is always better to use an internationally recognised license, rather than a custom one. Widely used licenses are easily recognised by other users, so they know what the license cover without having to read it. These licenses are also more machine-readable as software to run queries on repositories will recognised them.
Data uses community standards
Data that uses file formats, standards and conventions used by the community are more reusable. Applying discipline conventions makes the data more readable both by other researchers, and by software developed for the same community. Discipline specific software modules often adopt the same conventions and make assumptions on how data might be structured, or variables named.
Using accepted vocabularies, for example to name variables, reduces the risk of the data being misinterpreted and misused.