Data induction


Revision as of 20:16, 7 June 2021


New page under construction.

As a researcher or a postgraduate student in climate science, you do most of your work on a computer. If you were working in a laboratory, you would follow a set of rules and protocols, and you would document your experiments so that you could reproduce them later or describe them to others in a detailed and exact way. When we are sitting at our desks, it is easy to forget that what we are doing will eventually be shared with others. When you write a paper, you are used to receiving a set of guidelines, justifying and referencing everything you write, and making sure it is readable and well presented. We are not used to sharing our data and code, but they are just another research product. As such, they are also subject to requirements; however, many of these have been introduced only recently and the guidelines are continuously evolving.

This is why we prepared this data induction: to clarify what your responsibilities are by covering all the applicable data policies and requirements, and to provide a set of guidelines and services that will help you satisfy them.

This induction is structured in three parts, starting from your responsibilities as a researcher.

Part 1: Policy 

In the first part we cover the ARC and institutional requirements, the journal publishers' requirements and the CLEX data policy. Trying to put all these requirements together can be confusing and overwhelming. The key is to remember that the CLEX data policy is an interpretation of how these requirements apply to climate science, and all our guidelines aim to help you satisfy them. We also try to fill gaps in data services and are always available to help you.

All the policies are based on the FAIR (Findable, Accessible, Interoperable, Reusable) principles, so this is the best place to start to better understand what your data and code should look like and why this is important.

Part 2: Publishing

You can satisfy most of the requirements by publishing your data, which is why the second part focuses on publishing. Publishing is also the best way to share your data with others. While just putting your data online somewhere might seem the quickest solution, if your data is not properly described and formatted, it is of little use to anyone.

Part 3: Best practices

The final part covers best practices for managing your data and code. Publishing is just the last step in your data workflow: proper planning and adopting best practices from the start of your project make publishing much easier and optimise the use of our shared storage and computational resources.

Induction outcomes

At the end of this induction you should have all the information necessary to manage, share and publish your data. In particular:

  • you will know what data you are required to archive, share or publish according to your institution, the ARC or in order to publish a paper;
  • you will be familiar with the key data concepts and terminology;
  • you will know how to choose and apply data and software licences;
  • you will be familiar with, and know how to use, the conventions and standards used by the climate community;
  • you will know how to manage the storage and computational resources available to you;
  • when asked to publish data, you will be ready to make all the relevant choices:
    • which data to publish,
    • where to publish,
    • how to prepare your data and, finally, how to advertise your newly published dataset.


Managing your data properly, in a way that makes your research easier to reproduce and share and therefore more valuable, is not complicated; it will eventually become just another working habit. However, you might encounter concepts and terminology that are new to you. To help you, we have listed the definitions of the most commonly used terms, key concepts and acronyms.