Controlled vocabularies

Revision as of 21:48, 17 June 2021 by P.petrelli (talk | contribs)
Template:Working on New page under construction


A controlled vocabulary is an agreed list of terms definitions used to provide a unique label to a concept. Controlled vocabularies are usually discipline related; their main aim is to facilitate sharing of data in the same community. For this reason, is important that the community participate in the development of the vocabulary and agrees to its adoption for them to be useful.

In some case vocabularies have been created in relation to a specific project, and then more widely adopted. As an example, since CMIP is an intercomparison project with modelling groups participating from across the world, it was essential to its success to define and use controlled vocabularies. CMIP6 controlled vocabularies cover many different aspects: experiments, variables, realms, models, sub-projects, frequency, resolution and grid labels. Their definition and labels for variables, frequency and realms are often adopted by other climate data producers.

Another example of controlled vocabulary is the CF conventions standard_name table, anyone can contribute by proposing a definition for variable which are not yet covered.


Controlled vocabularies also provide keywords to use when publishing data. Keywords are a powerful instrument when used properly. They can greatly increase the discoverability of a dataset, which is why it is one of the few highly recommended attributes in the ACDD conventions. Unfortunately, there is not yet an agreed controlled vocabulary to be used specifically for climate science. Lots of climate terms are however covered by the Global Change Master Directory Keywords, maintained by NASA.

There are few categories you should try to cover when assigning keywords:

dataset acronym: if your data is strictly related to another dataset, or your code is applied to a specific dataset

project acronym: if your dataset and or code relates to a specific project 

programming language:  you should add this to your code records and be as specific as possible, for example use python3, rather than just python

data type: observation, model output, etc.

realm or discipline: like ocean, land and/or physical ocanography, climate science etc. For the disciplines you can use the Fields_of_Research from the Bureau of statistics

variable names: if you have many just list the more relevant

spatiotemporal characteristic of the data: frequency, resolution, region covered

Every time you can try to use terms provided in a vocabulary, the GCMD keywords will cover most of the categories listed above. If you are using a speciifc name as for datasets and projects, then use the official acronyms and soecify the versions whenever possible.

Also remember that if a portal has a free text search any word in yout title will be also used as a keyword, which is why it is useful to have a descriptive_title for your dataset or code.

Research Vocabulary Australia

ARDC manage a controlled vocabulary service Research Vocabulary Australia (RVA) to list vocabularies used by Australian research community. As well as making it easier to find controlled vocabularies, RVA also allows research organisation to contribute and publish new vocabularies.