Difference between revisions of "Which data should I publish"


Revision as of 00:24, 12 December 2019

Which data should I publish?

While there is no ready-made answer to this question, there are guidelines and principles to help you formulate your own. A few are listed here.

  • Sharing and publishing some data is better than none, so start with what is easiest. Look at the practices and services on offer at your institution to find examples and guidelines.
  • Provide the information needed to interpret, reuse and reproduce your results. This is what journal publishers usually require; most of them provide guidelines and examples of which data you should share (see our publisher policies page).
  • If the output is big, publish only a subset. If your methods are well described and the code you used is easily available, then in most cases you can publish only the subset of data that underpins your publication. For example, if you ran a model but used only some of the variables in the output, publishing only the post-processed output of the model is generally sufficient from a journal's point of view. You still need to document clearly which model version and configuration you used and which input data, and wherever possible point to an online source for them. Some thoughts on what kind of data you should publish along these lines are shared in the Centre position on data publishing.
  • Conversely, dumping the data somewhere without providing any information on the process makes the data available but completely useless. Think about what you look for when you are considering using a dataset for your own research. What kind of information do you consider essential for the data to be usable, and which additional information would make its use easier?
  • While you might be required to share just part of your data output, there is often a lot of data (especially if you run a model simulation) that you will end up not using but that might be useful to someone else. So even if you do not have the time or resources to publish all of your data output, providing information on its existence and on how to request access to it is often enough. This should include details on the license, possible use restrictions and how to create accounts with other institutions if necessary.
  • Any data you share is an asset for your institution. Remember we have a data expert in CLEx and there will be others in your institution; do not hesitate to ask for advice and practical help. Whether you share data internally or externally, you are creating a new asset for other researchers, and it is in our interest to help you.
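
As an illustration of the points above on publishing a subset and documenting it, the following is a minimal Python sketch of a metadata record that could accompany a published subset. It is not a required schema: the field names, the `build_record` helper, the variable names and all values are hypothetical placeholders, so adapt them to whatever your repository or publisher asks for.

```python
import json

# Hypothetical list of fields a reader would need; not an official standard.
REQUIRED_FIELDS = (
    "title", "model_version", "model_configuration",
    "input_data", "variables", "license", "access_contact",
)


def build_record(all_variables, published, **details):
    """Describe the published subset and advertise what was left out."""
    unknown = sorted(set(published) - set(all_variables))
    if unknown:
        raise ValueError(f"not in the model output: {unknown}")

    record = dict(details)
    record["variables"] = sorted(published)
    # List the unpublished output so others know it exists and can request it.
    record["unpublished_variables"] = sorted(set(all_variables) - set(published))

    # Refuse to produce a record that omits information needed for reuse.
    missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
    if missing:
        raise ValueError(f"missing metadata: {missing}")
    return record


# Placeholder values for illustration only.
record = build_record(
    all_variables=["tas", "pr", "psl", "ua", "va"],
    published=["tas", "pr"],
    title="Example post-processed model output",
    model_version="example-model 1.0",
    model_configuration="placeholder description of the configuration used",
    input_data="placeholder description of, or link to, the input data",
    license="CC BY 4.0",
    access_contact="data-manager@example.org",
)

print(json.dumps(record, indent=2))
```

Saving a record like this next to the data, or pasting its content into the repository's description form, covers both the subset you publish and the existence of the output you do not.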