FAIR - Accessible
Data should have open access whenever possible and be available through standardised protocols
Open Access means that there are no licensing restrictions on accessing and using the data: the data is available to anyone, free of charge and for any use, and the only requirement is that users attribute the data to its author(s).
Open Access does not mean unlicensed data. In fact, a license is what gives others access to the data: you cannot use an unlicensed dataset without first asking the owner for permission.
Applying an embargo period is also acceptable, since the data will eventually become available; however, it is important to set the date on which the embargo ends.
If you are working with data that is sensitive for security or ethical reasons, it is acceptable to control access to it.
This is usually achieved in one of two ways:
- By removing the sensitive part of the data, a process referred to as de-identification, in which you remove any information that could identify people or places.
- By applying mediated access: the information about the data is available to anyone, but to access the actual data a user has to contact the data owner or the repository curator. In this case you should state clearly why access is mediated and which requirements a user has to satisfy to gain access.
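The simplest form of de-identification, dropping fields that directly identify people or places, can be sketched as below. This is a minimal illustration, not a complete de-identification procedure: the field names (`name`, `address`, `latitude`, `longitude`) and the sample records are hypothetical, and real datasets also need checking for indirect identifiers that could re-identify someone when combined.

```python
from typing import Collection, Iterable

# Hypothetical set of fields assumed to identify people or places.
IDENTIFYING_FIELDS = frozenset({"name", "address", "latitude", "longitude"})


def deidentify(
    records: Iterable[dict], identifying: Collection[str] = IDENTIFYING_FIELDS
) -> list[dict]:
    """Return new records with the identifying fields removed.

    The input records are not modified; each output record is a fresh dict.
    """
    return [
        {key: value for key, value in record.items() if key not in identifying}
        for record in records
    ]


# Hypothetical survey records used only for illustration.
survey = [
    {"name": "A. Smith", "latitude": -33.9, "longitude": 151.2, "rainfall_mm": 12.4},
    {"name": "B. Jones", "latitude": -37.8, "longitude": 144.9, "rainfall_mm": 3.1},
]
print(deidentify(survey))  # [{'rainfall_mm': 12.4}, {'rainfall_mm': 3.1}]
```

Returning copies rather than editing the records in place keeps the original (sensitive) version intact, so it can still be held under controlled access.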
For Open Access to be truly achieved, you need to make sure that your data is not only available online but also easy to obtain.
Data should be published online in a way that makes it easy to discover and download.
Wherever possible, choose a repository that is open to anyone and uses common, free internet protocols.
An internet protocol can be simply defined as a "set of rules for a particular type of communication". HTTP and FTP are widely used protocols and the most common ways of making data available online.
However, if the data you want to share is large, it is worth publishing it in a repository that uses specialised protocols. Some protocols allow users to select and download a subset of the data, so they can download exactly what they need. Although these protocols are specialised, they are usually common within a specific community. An example is OPeNDAP, which is widely used in the climate science community.
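As an illustration of subsetting, an OPeNDAP (DAP2) request can select part of a variable by appending a constraint expression to the dataset URL, using `[start:stride:stop]` index ranges where the stop index is inclusive. The sketch below only builds such a URL; it assumes a DAP2 server (for example THREDDS or Hyrax) that supports the `.ascii` response, and the dataset URL and variable name `tas` are hypothetical. The square brackets are percent-encoded because some servers reject them unencoded.

```python
from urllib.parse import quote


def opendap_subset_url(
    dataset_url: str, variable: str, slices: list[tuple[int, int, int]]
) -> str:
    """Build a DAP2 URL requesting a hyperslab of one variable as ASCII.

    Each slice is (start, stride, stop); in DAP2 the stop index is inclusive.
    """
    for start, stride, stop in slices:
        if stride < 1 or stop < start:
            raise ValueError(f"invalid slice {(start, stride, stop)}")
    hyperslab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)
    # Keep ':' readable but percent-encode '[' and ']'.
    return f"{dataset_url}.ascii?{quote(variable + hyperslab, safe=':')}"


# Hypothetical dataset: first 12 time steps of 'tas' at a single grid point.
url = opendap_subset_url(
    "https://example.org/thredds/dodsC/hypothetical/tas_monthly.nc",
    "tas",
    [(0, 1, 11), (0, 1, 0), (0, 1, 0)],
)
print(url)
```

The resulting URL can be fetched with any HTTP client, so the user downloads 12 values instead of the whole file.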
There are some situations in which it might not be possible to keep the data online:
- In some cases, particularly with model output, it is not possible to publish all the data. In these cases it is important to make the metadata available instead, together with, wherever possible, the code and tools used to produce the data. The CLEX data policy illustrates some of the situations where this is considered an acceptable alternative.
- Storing data is expensive, and eventually a repository might decide to retire older data while references to it are still circulating in papers. In this case it is essential to keep the metadata online and provide an alternative way to access the data. This is why a DOI always points to the metadata, usually a landing page, and never directly to the data itself: the way the data is shared is more likely to change than its description.
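Because a DOI resolves to metadata, you can also ask the resolver for a machine-readable metadata record instead of the HTML landing page. The sketch below uses DOI content negotiation at `https://doi.org/` with the CSL JSON media type, which is supported for Crossref and DataCite DOIs; the DOI `10.1234/hypothetical` is a placeholder, and `fetch_doi_metadata` needs network access and a registered DOI to return anything.

```python
import json
import urllib.request

DOI_RESOLVER = "https://doi.org/"
CSL_JSON = "application/vnd.citationstyles.csl+json"


def doi_metadata_request(doi: str) -> urllib.request.Request:
    """Build a request asking the DOI resolver for CSL JSON metadata
    rather than the HTML landing page."""
    return urllib.request.Request(DOI_RESOLVER + doi, headers={"Accept": CSL_JSON})


def fetch_doi_metadata(doi: str) -> dict:
    """Resolve the DOI and parse the returned metadata record (network required)."""
    with urllib.request.urlopen(doi_metadata_request(doi), timeout=30) as response:
        return json.load(response)


# Hypothetical DOI; replace with a real one before calling fetch_doi_metadata.
request = doi_metadata_request("10.1234/hypothetical")
print(request.full_url, request.get_header("Accept"))
```

Even after a repository retires the files, a request like this still returns the dataset's description, which is what citations in papers keep pointing to.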