Difference between revisions of "Versioning"

Latest revision as of 01:04, 6 July 2021

Versioning is the process of creating and managing multiple releases of a research output, such as a dataset or code, each labelled by a version. Applying versions consistently and clearly matters because different releases can have quite different characteristics, and using them can produce very different results. Several discussion groups and documents are dedicated to the topic of versioning; here we cover only the essential aspects.

Code versioning

Versions are important for identifying your code, even if you are already using a version control system. Even if you are not planning new releases, code is commonly updated. Consider following the Semantic Versioning convention (https://semver.org/), which uses a three-part version number:

    MAJOR.MINOR.PATCH  as in v1.3.0

  1. MAJOR changes when an update breaks previous behaviour; use “0” to indicate code that is still under development.
  2. MINOR changes when you add new functionality in a backwards compatible manner.
  3. PATCH changes when you make backwards compatible bug fixes.

In the example above, the version went from 1.2.3 to 1.3.0 because new functionality was added without breaking existing behaviour.
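As an illustration only, the three rules can be sketched as a small Python helper. The function name `bump` and the change labels "major", "minor" and "patch" are assumptions made for this example; they are not part of the Semantic Versioning specification or of any particular tool.

```python
def bump(version: str, change: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a given kind of change.

    `bump` and its `change` labels are hypothetical names for this sketch.
    """
    # Accept both "1.2.3" and "v1.2.3".
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    if change == "major":   # update breaks previous behaviour
        return f"{major + 1}.0.0"
    if change == "minor":   # new, backwards compatible functionality
        return f"{major}.{minor + 1}.0"
    if change == "patch":   # backwards compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump("1.2.3", "minor"))  # 1.3.0
```

Note that the lower-order numbers reset to zero whenever a higher-order number increases, which is why 1.2.3 becomes 1.3.0 rather than 1.3.3.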

If you are using GitHub, produce a release before publishing. While a commit URL is a persistent indicator of a specific code snapshot, it is not easy to share, and commit messages usually describe only the latest change.

Above all, be consistent: versions should always progress from lower to higher.
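A minimal sketch of how this ordering could be checked for MAJOR.MINOR.PATCH labels is shown below. The helper names `as_tuple` and `is_increasing` are hypothetical. The sketch compares the numeric parts rather than the raw strings, because plain string comparison would place "1.10.0" before "1.9.0".

```python
def as_tuple(version: str) -> tuple:
    """Turn "v1.10.0" into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def is_increasing(versions: list) -> bool:
    """Return True if each version is strictly higher than the one before it."""
    parsed = [as_tuple(v) for v in versions]
    return all(a < b for a, b in zip(parsed, parsed[1:]))


print(is_increasing(["1.2.3", "1.3.0", "1.10.0"]))  # True
print(is_increasing(["1.3.0", "1.2.3"]))            # False
```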

Dataset versioning

As with code, even if you are not planning a new version of the dataset, you might change your mind or need to extend or correct the data in the future.

Conventions around data versioning are still being developed. The ARDC provides some guidance (https://ardc.edu.au/resources/working-with-data/data-versioning/).

Completed dataset

Here are some suggestions for working out a versioning strategy for a “stable” dataset:

  • Any change in the actual data should be accompanied by a new version.
  • Changes to the metadata do not need a new version.
  • Most often, numbers or time stamps are used to identify versions. Whichever approach you choose, it is important to be consistent: a user cannot work out which version is the latest if two versions are called v2.0 and v2020.
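The consistency point can be illustrated with a short Python check that all version labels of a dataset follow one scheme. The two patterns below (a "numeric" label such as v2.0 and a "date" label such as v2020 or v2020-07-06) and the function names are assumptions chosen for this sketch, not an established convention.

```python
import re

# Hypothetical label schemes, chosen only for this example.
SCHEMES = {
    "numeric": re.compile(r"^v\d+\.\d+$"),           # e.g. v2.0
    "date": re.compile(r"^v\d{4}(-\d{2}){0,2}$"),    # e.g. v2020 or v2020-07-06
}


def scheme_of(label: str):
    """Return the name of the scheme a label follows, or None if it fits neither."""
    for name, pattern in SCHEMES.items():
        if pattern.match(label):
            return name
    return None


def uses_single_scheme(labels: list) -> bool:
    """Return True if every label follows the same known scheme."""
    schemes = {scheme_of(label) for label in labels}
    return len(schemes) == 1 and None not in schemes


print(uses_single_scheme(["v1.0", "v2.0"]))   # True
print(uses_single_scheme(["v2.0", "v2020"]))  # False
```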

Continuous dataset

Datasets which are continuously updated require particular attention. As a starting point you should apply the same versioning strategy used for a “stable” dataset. However, you also need to consider the following:

  • The documentation should report how the dataset is updated and how frequently.
  • It is better to add new files, if possible, rather than updating existing files.
  • It is even more important to have a versioning strategy that distinguishes between the constant changes to the data that are allowed within the same version and the circumstances that warrant the creation of a new version.
  • When the way the data is produced changes, there should be a new version, even if the older data in the time series is not affected by the change in methodology.

As a DOI should always refer to exactly the same object, there is no consensus on how to treat continuous data. Some of the common strategies are:

  • Publishing the data at regular time intervals, e.g. every year. This is done either by delaying the data release, so that the data is always covered by a DOI, or by updating the data continuously and leaving gaps in the DOI coverage. In both cases each new DOI corresponds to a new version.
  • Publishing the DOI and updating the data continuously until a change in data production warrants the creation of a new version. Users are then required to add a timestamp to the dataset citation indicating when the data was accessed.
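For the second strategy, a citation with an access timestamp could be assembled along the following lines. This is only a sketch: the function `cite`, the citation layout, and the creator, title and DOI values are placeholders invented for the example and do not refer to a real dataset or a prescribed citation style.

```python
from datetime import date


def cite(creator: str, year: int, title: str, version: str,
         doi: str, accessed: date) -> str:
    """Build a dataset citation that records when the data was accessed."""
    return (
        f"{creator} ({year}). {title}, version {version}. "
        f"https://doi.org/{doi} (accessed {accessed.isoformat()})"
    )


# All values below are placeholders, not a real dataset or DOI.
print(cite("Example Group", 2021, "Example continuous dataset", "v1.0",
           "10.xxxx/example", date(2021, 7, 6)))
```

Recording the access date lets a reader tell which state of a continuously updated dataset was used, even though the DOI itself has not changed.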

NB: if your main data is stored on Zenodo, adding or removing files will force a new version to be created.