Data conventions and standards are an important tool for managing your data so that it can be easily and effectively shared with others. Conventions help achieve this in two ways:
- formatting the data in a way that is easy for others in the same community to use;
- providing enough information about the data (metadata) in a shared "language", so that others can understand the data as its creator intended.
Conventions are also convenient for anyone using them, as they provide an easy-to-adopt template for your data, so you do not need to invent a new data model for every new project. They also help you stay consistent, making you less likely to misinterpret your own data later on.
Some conventions, such as the metric system, are universally adopted, and we all use them without even noticing anymore. Others are community-specific and are developed to address the particular data formats and needs of a scientific discipline.
For more on why data standards are important, see this article from the ARDC.
The climate research community mainly uses netCDF (Network Common Data Format) as its data format. NetCDF is a self-describing binary format, which means that the metadata are stored together with the data themselves. The Climate and Forecast Conventions (CF Conventions) were developed to standardise these metadata.
The CF Conventions are specifically designed to facilitate the processing and sharing of netCDF files. They are based on, and extend, the older COARDS conventions. The first version (v1.0) of the CF Conventions was released in 2003; the current version (in 2021) is v1.8. Each new version tries, as far as possible, to remain compatible with older versions. As the name implies, the first versions focused on climate and forecast data; since then the Conventions have broadened their scope to Earth science data in general, including observational data.
CF is now widely adopted as the main standard, both in netCDF-related software and in the publication of netCDF data. Because the initial focus was interoperability between netCDF-based software packages, the Conventions' main aim is to clearly define each variable and the spatial and temporal properties of the data.
As a consequence, applying these Conventions to your netCDF files makes them more reusable. Most software used in climate science can open and correctly process such files. The required metadata clearly describe the characteristics of the data in the files, making it easier for a potential user to identify the variables correctly and compare them with similar data.
The CF Conventions focus mostly on describing variables and dimensions. The full Conventions document is quite long but, in most cases, you will be using the same few attributes. This CMS Blog provides an example of how to apply them to your data, covering the most commonly required attributes.
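To make the commonly used attributes more concrete, here is a minimal Python sketch. It does not read a real netCDF file: it assumes the file's header has already been loaded into plain dictionaries (a library such as netCDF4 or xarray would normally do this). The example contents, the `missing_attributes` helper and the checklist it applies (`REQUIRED_GLOBAL`, `REQUIRED_PER_VARIABLE`, `DESCRIPTIVE`) are hypothetical simplifications, not the full set of CF requirements.

```python
# Hypothetical in-memory stand-in for a netCDF file's header (roughly what
# `ncdump -h` would show). All names and values are illustrative.
dataset = {
    "global_attributes": {
        "Conventions": "CF-1.8",
        "title": "Example near-surface air temperature",
        "institution": "Example institute",
        "source": "Example model output",
        "history": "created with a hypothetical script",
    },
    "variables": {
        "time": {"standard_name": "time", "long_name": "time",
                 "units": "days since 1970-01-01 00:00:00",
                 "calendar": "standard", "axis": "T"},
        "lat": {"standard_name": "latitude", "long_name": "latitude",
                "units": "degrees_north", "axis": "Y"},
        "lon": {"standard_name": "longitude", "long_name": "longitude",
                "units": "degrees_east", "axis": "X"},
        "tas": {"standard_name": "air_temperature",
                "long_name": "Near-surface air temperature",
                "units": "K", "_FillValue": 1.0e20},
    },
}

# Hypothetical minimal checklist, loosely based on commonly used CF attributes.
# (CF itself allows e.g. dimensionless variables without units.)
REQUIRED_GLOBAL = ("Conventions", "title", "institution", "source", "history")
REQUIRED_PER_VARIABLE = ("units",)
DESCRIPTIVE = ("standard_name", "long_name")  # at least one of these


def missing_attributes(ds):
    """Return (location, attribute) pairs missing from the checklist above."""
    missing = [("global", a) for a in REQUIRED_GLOBAL
               if a not in ds["global_attributes"]]
    for var, attrs in ds["variables"].items():
        missing += [(var, a) for a in REQUIRED_PER_VARIABLE if a not in attrs]
        if not any(a in attrs for a in DESCRIPTIVE):
            missing.append((var, "standard_name or long_name"))
    return missing


print(missing_attributes(dataset))  # [] : every checklist item is present
```

An empty list only means the header covers this small checklist. It does not mean the file is CF-compliant; that is what dedicated CF checkers are for.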
Important elements of the Conventions are:
- the UDUNITS2 package, used as the standard for units
- the standard_name attribute, which provides a common terminology for variable names. For example, every variable with the standard_name air_temperature is defined as "Air temperature is the bulk temperature of the air, not the surface (skin) temperature." and has units of K or equivalent, regardless of the variable's actual name in the file. standard_name is a very useful attribute but should be applied with care: if no suitable standard name is available, it is better to leave it out.
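The sketch below illustrates the two elements above. A standard_name identifies the quantity independently of the variable's name, and the variable's units must be compatible with the canonical units listed in the standard name table. The three-entry table excerpt (`CANONICAL_UNITS`), the hard-coded `CONVERTIBLE` sets and the `check_standard_name` helper are hypothetical stand-ins. A real check would use the full standard name table and ask UDUNITS-2 (for example through a Python wrapper such as cf-units) whether two units are convertible.

```python
# Hypothetical excerpt of the CF standard name table:
# standard_name -> canonical units.
CANONICAL_UNITS = {
    "air_temperature": "K",
    "air_pressure": "Pa",
    "precipitation_flux": "kg m-2 s-1",
}

# Hypothetical stand-in for a UDUNITS-2 convertibility test: unit strings
# treated here as convertible to each canonical unit.
CONVERTIBLE = {
    "K": {"K", "degC", "celsius"},
    "Pa": {"Pa", "hPa", "kPa"},
    "kg m-2 s-1": {"kg m-2 s-1"},
}


def check_standard_name(attrs):
    """Return a list of problems with a variable's standard_name/units pair.

    Only the attributes are inspected, never the variable's name, since the
    standard_name is what identifies the quantity.
    """
    name = attrs.get("standard_name")
    if name is None:
        return []  # omitting standard_name is allowed
    canonical = CANONICAL_UNITS.get(name)
    if canonical is None:
        return [f"unknown standard_name {name!r}: better to leave it out"]
    units = attrs.get("units")
    if units not in CONVERTIBLE[canonical]:
        return [f"units {units!r} not convertible to canonical units {canonical!r}"]
    return []


print(check_standard_name({"standard_name": "air_temperature", "units": "degC"}))
print(check_standard_name({"standard_name": "air_temp", "units": "K"}))
```

The first call reports no problems, because degC is convertible to K; the second flags a name that is not in the (excerpted) table, in line with the advice to leave standard_name out rather than guess one.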
Various tools are available to help you check your files against a given version of the CF Conventions. We cover some of them on this wiki page: CF checker
Attribute Convention for Data Discovery
Other conventions specific to sub-domains