Difference between revisions of "Conventions"

Revision as of 20:30, 1 June 2021

New page under construction

Data conventions and standards are an important tool for managing your data so that it can be easily and effectively shared with others. Conventions help achieve this in two ways:

  • formatting the data in a way which is easy for others in the same community to use;
  • providing enough information about the data (metadata) in a shared "language" so that others can understand the data as its creator intended.

Conventions are also convenient for anyone using them: they provide an easy-to-adopt template for your data, so you do not need to invent a new data model with every new project. What's more, they help you stay consistent, so you are more likely to still understand your own data after a few years, or even months.

Some conventions, such as the metric system of units, are so universally adopted that we all use them without even noticing. Others are community-specific and are developed to address the particular data formats and needs of a scientific discipline.

To read more about why data standards are important, see this article from the ARDC: https://ardc.edu.au/resources/community-endorsed-data-standards/

The climate research community mostly uses netCDF (Network Common Data Form, https://www.unidata.ucar.edu/software/netcdf/) as a data format. NetCDF is a self-describing binary format, which means that the metadata is stored with the data itself. The Climate and Forecast Conventions (CF Conventions) were developed to standardise this metadata.

 

CF Conventions

The CF conventions (https://cfconventions.org) are specifically designed to facilitate the processing and sharing of netCDF files. They are based on, and extend, the older COARDS conventions (https://ferret.pmel.noaa.gov/noaa_coop/coop_cdf_profile.html). The first version (v1.0) of the CF Conventions was released in 2003; the current version (as of 2021) is v1.8. Each new version tries, as much as possible, to remain compatible with older versions. As the name implies, the first versions focused on climate and forecast data; since then the scope has broadened to earth data in general, including observational data.

CF is now widely adopted as the main standard both in the production of netCDF-related code and for the publication of netCDF data. As the initial focus was to allow interoperability between netCDF-based software packages, the conventions' main aim is to define each variable clearly, along with the spatial and temporal properties of the data.
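As one illustration of how temporal properties are made explicit, CF time coordinates store numbers together with a units string of the form "<unit> since <reference date>". The sketch below decodes such a value using only the Python standard library. The function name `decode_time` is hypothetical, and the sketch assumes the standard (Gregorian) calendar and handles only a few unit spellings; real files may use other calendars and should be read with a CF-aware library.

```python
from datetime import datetime, timedelta

# Seconds per supported unit; CF allows other spellings that are not handled here.
_SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def decode_time(value, units):
    """Convert a numeric time value to a datetime, given CF-style time units.

    Assumes the standard (Gregorian) calendar and a reference date written
    as YYYY-MM-DD or YYYY-MM-DD HH:MM:SS.
    """
    unit, since, reference = units.split(" ", 2)
    if since != "since":
        raise ValueError(f"not a time units string: {units!r}")
    origin = datetime.fromisoformat(reference)
    return origin + timedelta(seconds=value * _SECONDS_PER_UNIT[unit])


print(decode_time(1.5, "days since 2000-01-01"))
```

Because the reference date and unit travel with the data, any software reading the file can reconstruct the same timestamps without outside knowledge.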

As a consequence, applying these Conventions to your netCDF files makes them more reusable. Most software used in climate science will know how to open and process the files correctly. The required metadata clearly describes the characteristics of the data in the files, making it easier for a potential user to identify the variables correctly and compare them to similar data.

The CF Conventions focus mostly on describing variables and dimensions. The full Conventions document is quite long but, in most cases, you will be using the same few attributes. This CMS Blog post (https://climate-cms.org/2018/10/26/Setting-up-NetCDF-file-attributes.html) provides an example of how to apply them to your data, covering the most commonly required attributes.
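As a rough illustration of the attributes that come up most often, the sketch below models a file's metadata as plain Python dictionaries and reports which commonly used attributes are missing. The attribute names (`Conventions`, `units`, `standard_name`, `long_name`) are defined by CF; the helper `missing_attributes`, the variable names `tas` and `pr`, and the choice of which attributes to expect are assumptions made for this example, not a statement of what the Conventions strictly require.

```python
# Hypothetical in-memory picture of a netCDF file's metadata.
# With a real file these values would be read through a netCDF library.
global_attrs = {"Conventions": "CF-1.8", "title": "Example dataset"}

variables = {
    "tas": {
        "units": "K",
        "standard_name": "air_temperature",
        "long_name": "Near-surface air temperature",
    },
    "pr": {"long_name": "Precipitation"},  # units deliberately left out
}

# Assumption for this sketch: attributes we expect on every data variable.
EXPECTED_VARIABLE_ATTRS = ("units", "long_name")


def missing_attributes(global_attrs, variables):
    """Return a list of human-readable messages about missing attributes."""
    problems = []
    if "Conventions" not in global_attrs:
        problems.append("global: missing 'Conventions'")
    for name, attrs in sorted(variables.items()):
        for key in EXPECTED_VARIABLE_ATTRS:
            if key not in attrs:
                problems.append(f"{name}: missing '{key}'")
    return problems


report = missing_attributes(global_attrs, variables)
print(report)
```

A check like this is only a first pass; a dedicated CF checking tool remains the proper way to validate a file.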

Important elements of the Conventions are:

  • the UDUNITS package, which provides the standard for units
  • the standard_name attribute (https://cfconventions.org/Data/cf-standard-names/77/build/cf-standard-name-table.html), whose purpose is to provide a common terminology for variable names. For example, every variable with the standard_name air_temperature is defined as "Air temperature is the bulk temperature of the air, not the surface (skin) temperature.", with units of K or equivalent, regardless of the actual variable name in the file. standard_name is a very useful attribute but should be applied with care: it is better to leave it out if a suitable one is not available.
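To make the link between standard_name and units concrete, here is a minimal sketch that checks a variable's attributes against a tiny hand-copied excerpt of the standard name table. The function `check_standard_name` and its return strings are hypothetical, and the excerpt covers only three names; the full table on the CF site is the authority. Deciding whether two unit strings are equivalent is UDUNITS' job and is not attempted here.

```python
# A tiny hand-copied excerpt of the CF standard name table: name -> canonical units.
# The real table is far larger and is the authority.
STANDARD_NAME_UNITS = {
    "air_temperature": "K",
    "eastward_wind": "m s-1",
    "precipitation_flux": "kg m-2 s-1",
}


def check_standard_name(attrs):
    """Classify a variable's attributes against the excerpt above.

    Returns one of: "no standard_name", "unknown standard_name",
    "units differ from canonical", "ok".
    """
    name = attrs.get("standard_name")
    if name is None:
        # Leaving standard_name out is acceptable when nothing suitable exists.
        return "no standard_name"
    if name not in STANDARD_NAME_UNITS:
        return "unknown standard_name"
    if attrs.get("units") != STANDARD_NAME_UNITS[name]:
        # Equivalent units (degrees Celsius for kelvin, say) are allowed by CF,
        # but deciding equivalence needs UDUNITS, which this sketch skips.
        return "units differ from canonical"
    return "ok"


print(check_standard_name({"standard_name": "air_temperature", "units": "K"}))
print(check_standard_name({"standard_name": "temperature_of_air", "units": "K"}))
print(check_standard_name({"long_name": "My custom index"}))
```

The second call shows why a made-up name is worse than none: software that relies on the table cannot interpret it, whereas a variable without a standard_name is simply treated as undescribed.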

There are various tools available to help you check your files against a version of the CF Conventions. We cover some of them on this wiki page: CF checker

Attribute Convention for Data Discovery  

Other conventions specific to sub-domains

  Land :  https://www.lmd.jussieu.fr/~polcher/ALMA/