Why should I care?

Latest revision as of 01:41, 8 July 2021

Data Publication in CLEX: A Quick Guide for University Researchers

by Ian Macadam

Why should we publish data?

This is an obvious question for any researcher of the current and previous generation to ask. However, I suggest that a better question would be “Why aren’t we all routinely publishing our data?” By the time CLEX “closes its doors” in 2024, I expect (and hope) that an academic asking “Why should we publish data?” will be greeted with as many raised eyebrows as one asking “Why should we publish papers?”

I once heard a professor say “research that has not been published does not exist”. He is, of course, wrong – there is plenty of unpublished research undertaken by governments, private corporations and universities (and in many cases there are some extremely good reasons why this research is not published!). However, the quote is a good one in the context of CLEX academics. In this context, we publish our research to let the world know it exists, thus gaining credit for our work and allowing others to reproduce, test and build on it. Publishing scientific papers only partially serves these objectives, because our research ≠ our papers. An ever-greater portion of our research takes the form of data and computer code that, in most cases, cannot easily be reproduced just by reading the relevant paper (e.g. it’s kind of a nuisance to recode and rerun all the CMIP6 models from scratch on the basis of a 4-page Nature paper to see if the authors’ conclusions about ice albedo effects stack up). More rigorous publication processes are therefore increasingly being applied to data and software.

What is published data?

So what is published data? A useful definition is data that has been assigned a Digital Object Identifier (DOI) and whose DOI is visible, together with at least a brief description of the dataset, via at least one relevant online repository. Meeting this definition does not necessarily mean that a dataset adheres to the FAIR (Findable, Accessible, Interoperable, Reusable) Data Principles, but it will be increasingly difficult to claim that a dataset is FAIR if it does not meet this definition.

Why should I publish data?

Good practice in science is rapidly evolving around data. Today, it would be bizarre, to say the least, not to have a DOI assigned to a scientific paper. Publishers assign DOIs to papers so that they are easy to reference and so that computer systems can easily track citations. It is rapidly becoming the norm for DOIs to be assigned to datasets as well. In the future, datasets without DOIs will be less likely to be used, and researchers who have DOIs assigned to their datasets will receive extra credit through citations of their data. On a more basic level, some journals may not accept a paper if the data underlying its analysis is not available to readers.

Of course, having a DOI assigned to a dataset is of no use if the DOI and a description of the dataset are not visible to those with a potential interest in the data, so an important part of data publication is ensuring that potential users can find the data in an appropriate online repository. The analogy would be assigning a DOI to a scientific paper but then not having it included in a journal!

When should I NOT publish my data?

It is not appropriate to publish all data. There are a number of cases where one would not want to publish a dataset:

  1. The data are “intermediate” or “working” data, produced as a step towards your final results; they are not critical to reproducing those results and are often subject to change as you work on your methods.
  2. Publishing the data now would allow other scientists to analyse it and publish key conclusions before you have completed your own analysis.
  3. You have derived the data using underlying data or methods that do not allow you to publish the data.
  4. Publishing the data would prejudice a patent application or give away IP that has commercial value.
  5. The data contain sensitive personal information (e.g. health records for individuals).

Of course, many of these reasons depend on timing. It may not be a good idea to publish your data this summer, before you have completed your analysis or applied for a patent, but fine to do so next summer after your paper is published and your patent application has been submitted.