Data induction

Revision as of 19:16, 7 June 2021 by P.petrelli

New page under construction

As a researcher or a postgraduate student in climate science you do most of your work on a computer. If you were working in a laboratory you would follow a set of rules and protocols, and you would document your experiments so that you could reproduce them later or describe them to others in exact detail. When we are sitting at our desks it is easy to forget that what we are doing will eventually be shared with others. When you write a paper you are used to receiving a set of guidelines, to justifying and referencing everything you write, and to making sure it is readable and well presented. We are less used to sharing our data and code, but they are just another research product. As such, they are also covered by requirements; however, many of these have been introduced only recently and the guidelines are continuously evolving.

This is why we prepared this data induction. It is an attempt to clarify what your responsibilities are by covering all the applicable data policies and requirements, and by providing a set of guidelines and services which will help you satisfy them.

This induction is structured in three parts, starting with your responsibilities as a researcher.

Part 1: Policy 

In the first part we cover the ARC and institutional requirements, the journal publishers' requirements and the CLEX data policy. Trying to put all these requirements together can be confusing and overwhelming. The key is to remember that the CLEX data policy is an interpretation of how these requirements can be applied to climate science, and all our guidelines aim to help you satisfy them. We also try to cover gaps in data services and are always available to help you.

All the policies are based upon the FAIR principles (Findable, Accessible, Interoperable, Reusable), so these are the best place to start to better understand what your data and code should look like and why this is important.

Part 2: Publishing

You can satisfy most of the requirements by publishing your data, which is why in the second part we focus on publishing. Publishing is also the best way of sharing your data with others. While just putting your data online somewhere might seem the quickest solution, data that is not properly described and formatted is of little use to anyone.

Part 3: Best practices

The final part covers the best practices to manage your data and code. Publishing is just the last step in your data workflow. Proper planning and the adoption of best practices from the start of your project make publishing much easier, and also optimise the use of our shared storage and computational resources.

Induction outcomes

At the end of this induction you should have all the information necessary to manage, share and publish your data. In particular:

  • you will know what data you are required to archive, share or publish to satisfy your institution, the ARC or a journal when publishing a paper
  • you will be familiar with the key data concepts and terminology
  • you will know how to choose and apply data and software licensing
  • you will be familiar with, and know how to use, the conventions and standards used by the climate community
  • you will know how to manage storage and computational resources available to you
  • when asked to publish data, you will be ready to make all the relevant choices:
    • which data to publish,
    • where to publish,
    • how to prepare your data,
    • and finally how to advertise your newly published dataset


Managing your data properly, in a way that makes your research more valuable and easier to reproduce and share, is not complicated, and it will eventually become just another working habit. However, you might encounter concepts and terminology that are new to you. To help, we have listed the definitions of the most commonly used terms, key concepts and acronyms.