Difference between revisions of "Controlled vocabularies"


Revision as of 21:48, 17 June 2021

Template:Working on New page under construction

 

A controlled vocabulary is an agreed list of terms and their definitions, used to give each concept a unique label. Controlled vocabularies are usually discipline specific; their main aim is to facilitate the sharing of data within a community. For this reason, it is important that the community participates in the development of a vocabulary and agrees to adopt it, otherwise the vocabulary is of little use.

In some cases vocabularies have been created for a specific project and then more widely adopted. As an example, since CMIP is an intercomparison project with modelling groups participating from across the world, defining and using controlled vocabularies was essential to its success. The CMIP6 controlled vocabularies cover many different aspects: experiments, variables, realms, models, sub-projects, frequency, resolution and grid labels. Their definitions and labels for variables, frequency and realms are often adopted by other climate data producers.
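As a minimal sketch of how such a vocabulary can be used in practice, the snippet below checks a record's labels against a small set of allowed terms. The frequency and realm labels shown are only an illustrative subset of the CMIP6 lists, and the function name `check_labels` and the record layout are assumptions made for this example.

```python
# Illustrative subset of CMIP6-style labels; the official lists are longer.
CONTROLLED_VOCAB = {
    "frequency": {"fx", "yr", "mon", "day", "6hr", "3hr", "1hr"},
    "realm": {"atmos", "ocean", "land", "seaIce", "landIce", "aerosol"},
}


def check_labels(metadata, vocab=CONTROLLED_VOCAB):
    """Return {field: value} for every value that is not in the vocabulary.

    Fields that have no controlled vocabulary are ignored.
    """
    return {
        field: value
        for field, value in metadata.items()
        if field in vocab and value not in vocab[field]
    }


# Hypothetical record: "monthly" is a free-text label, not the agreed "mon".
record = {"frequency": "monthly", "realm": "ocean"}
print(check_labels(record))  # {'frequency': 'monthly'}
```

A check like this only flags labels that differ from the agreed terms; choosing the correct replacement still requires consulting the vocabulary itself.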

Another example of a controlled vocabulary is the CF conventions standard_name table; anyone can contribute by proposing a definition for variables which are not yet covered.

Keywords

Controlled vocabularies also provide keywords to use when publishing data. Keywords are a powerful instrument when used properly. They can greatly increase the discoverability of a dataset, which is why keywords are one of the few highly recommended attributes in the ACDD conventions. Unfortunately, there is not yet an agreed controlled vocabulary to be used specifically for climate science. Many climate terms are, however, covered by the Global Change Master Directory (GCMD) Keywords (https://earthdata.nasa.gov/earth-observation-data/find-data/idn/gcmd-keywords), maintained by NASA.

There are a few categories you should try to cover when assigning keywords:

dataset acronym: if your data is strictly related to another dataset, or your code is applied to a specific dataset

project acronym: if your dataset and/or code relates to a specific project

programming language: you should add this to your code records and be as specific as possible, for example use python3 rather than just python

data type: observation, model output, etc.

realm or discipline: such as ocean or land, and/or physical oceanography, climate science etc. For the disciplines you can use the Fields of Research from the Australian Bureau of Statistics

variable names: if you have many, list only the most relevant

spatiotemporal characteristics of the data: frequency, resolution, region covered


Whenever you can, use terms provided in a vocabulary; the GCMD keywords cover most of the categories listed above. If you are using a specific name, as for datasets and projects, use the official acronym and specify the version whenever possible.

Also remember that if a portal has a free-text search, any word in your title will also be used as a keyword, which is why it is useful to have a descriptive title for your dataset or code.
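The categories listed in this section can be collected into a single comma-separated keywords string, which is the form the ACDD keywords attribute expects. The following is only a sketch: the function name `build_keywords`, the category labels and all the example values are assumptions chosen for illustration, not terms checked against any vocabulary.

```python
def build_keywords(categories):
    """Flatten per-category keyword lists into one comma-separated string.

    Empty entries and case-insensitive duplicates are dropped; the order in
    which keywords first appear is preserved.
    """
    seen = set()
    keywords = []
    for terms in categories.values():
        for term in terms:
            term = term.strip()
            key = term.lower()
            if term and key not in seen:
                seen.add(key)
                keywords.append(term)
    return ", ".join(keywords)


# Illustrative values for a hypothetical code record; replace them with
# terms taken from the vocabulary you are actually using.
categories = {
    "dataset acronym": ["ERA5"],
    "project acronym": ["CMIP6"],
    "programming language": ["python3"],
    "data type": ["model output"],
    "realm or discipline": ["ocean", "physical oceanography"],
    "variable names": ["sea_surface_temperature"],
    "spatiotemporal characteristics": ["monthly", "global", "Ocean"],
}

print(build_keywords(categories))
```

Grouping keywords by category before joining them makes it easy to see at a glance which categories are still empty for a given record.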

Research Vocabulary Australia

ARDC manages a controlled vocabulary service, Research Vocabulary Australia (RVA, https://vocabs.ardc.edu.au), which lists vocabularies used by the Australian research community. As well as making it easier to find controlled vocabularies, RVA also allows research organisations to contribute and publish new vocabularies.