<p>Provenance - Revision history (http://climate-cms.wikis.unsw.edu.au/index.php?title=Provenance&feed=atom&action=history), updated 2024-03-28T17:35:27Z. Revision history for this page on the wiki (MediaWiki 1.31.0).</p>
<p>P.petrelli at 06:49, 8 July 2021 (2021-07-08T06:49:28Z): http://climate-cms.wikis.unsw.edu.au/index.php?title=Provenance&diff=3075&oldid=prev</p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 06:49, 8 July 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l2" >Line 2:</td>
<td colspan="2" class="diff-lineno">Line 2:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Provenance, also referred to&nbsp;as lineage, is the documentation of a dataset's origin. It includes how the data was collected or generated, and which methodologies, instruments and/or&nbsp;software were used to create it.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">You can think of it as the data workflow from the start of a project to the point where a dataset is published as the project's&nbsp;final product.</span></span></div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Provenance, also referred to&nbsp;as lineage, is the documentation of a dataset's origin. It includes how the data was collected or generated, and which methodologies, instruments and/or&nbsp;software were used to create it.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">You can think of it as the data workflow from the start of a project to the point where a dataset is published as the project's&nbsp;final product.</span></span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Provenance <del class="diffchange diffchange-inline">is complicated</del>, as usually data gets through a lot of steps and re-iterations. Given the nature of research itself, the objectives and methods of&nbsp;your analysis might change&nbsp;various times, you might keep some of the steps and modify&nbsp;others.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Which is why having a good provenance is so important for the reproducibility of your research.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">There are&nbsp;tools available to help recording some of the steps automatically, but ultimately none of them will produce a good provenance record without regular manual intervention. It is not enough to track the changes you also need to know why they happened.</span></span> &nbsp;</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Provenance <ins class="diffchange diffchange-inline">can be complex</ins>, as data usually goes through many steps and iterations. Given the nature of research itself, the objectives and methods of&nbsp;your analysis might change&nbsp;several times: you might keep some of the steps and modify&nbsp;others.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">This is why good provenance is so important for the reproducibility of your research.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">There are&nbsp;tools that can record some of the steps automatically, but none of them will produce a good provenance record without regular manual intervention. It is not enough to track the changes; you also need to know why they happened.</span></span> &nbsp;</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Important things to keep track of are:</span></span></div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">'''</ins><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Important things to keep track of are:</span></span><ins class="diffchange diffchange-inline">'''</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">what data&nbsp;you used as input, if any. It helps if the dataset has been published and has a DOI&nbsp;you can refer to,&nbsp;or is at least well documented and properly versioned. While sometimes you have no choice, when you do have one you should always prefer&nbsp;a well-documented dataset to one that is poorly documented.</span></span>  </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">what data&nbsp;you used as input, if any. It helps if the dataset has been published and has a DOI&nbsp;you can refer to,&nbsp;or is at least well documented and properly versioned. While sometimes you have no choice, when you do have one you should always prefer&nbsp;a well-documented dataset to one that is poorly documented.</span></span>  </div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>*</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">use a version control system for your analysis code; we recommend [[Git_Introduction|git]]. Git and GitHub come with many tools and options, so make the most of them:&nbsp;README files, releases, issues, project plans and commit messages all help you track not only the changes but also why they happened.</span></span>  </div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">use a version control system for your analysis code, we recommend [[Git_Introduction|git]]. Git and GitHub come with lots of tools and options, make the most of them:&nbsp;readme files, releases, issues, project plans and commit messages, they all help you not only tracking the changes but why they happened.</span></span></div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">if you are using someone else's code, as with input data,&nbsp;make sure it is properly documented and versioned.</span></span>  </div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>*</div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">if you are using someone else code, as for input data,&nbsp;make sure is properly documented and versioned.</span></span></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">use good coding practices and metadata conventions for your dataset. Metadata and standards&nbsp;make it easier to share data and code, and may in fact be required. They will also help you if you need to come back to your work after a long break.</span></span>  </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">use good coding practices and metadata conventions for your dataset. Metadata and standards&nbsp;make it easier to share data and code, and may in fact be required. They will also help you if you need to come back to your work after a long break.</span></span>  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">review your provenance often, before you forget. You could make it a&nbsp;habit at the end of each working day to check that your previous notes, metadata, etc. are all still relevant. It will only take a few minutes.</span></span>  </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">review your provenance often, before you forget. You could make it a&nbsp;habit at the end of each working day to check that your previous notes, metadata, etc. are all still relevant. It will only take a few minutes.</span></span>  </div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l20" >Line 20:</td>
<td colspan="2" class="diff-lineno">Line 16:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Our [[Data_Management_Plan|data management plan webtool]] is set up to help you do so, as you can update your plans at any time. It also includes references to documentation and questions to remind you what to cover.</span></span></div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Our [[Data_Management_Plan|data management plan webtool]] is set up to help you do so, as you can update your plans at any time. It also includes references to documentation and questions to remind you what to cover.</span></span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">The ARDC also has a lot of resources <del class="diffchange diffchange-inline">online for </del>[https://ardc.edu.au/resources/working-with-data/data-provenance/ data provenance].</span></span></div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">The ARDC also has a lot of <ins class="diffchange diffchange-inline">online </ins>resources<ins class="diffchange diffchange-inline">&nbsp;covering&nbsp;</ins>[https://ardc.edu.au/resources/working-with-data/data-provenance/ data provenance].</span></span></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">&nbsp;</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">[[Category:Data]] </del>[[Category:Data induction]]</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Data induction]]</div></td></tr>
</table>
<p>P.petrelli at 06:20, 30 June 2021 (2021-06-30T06:20:37Z): http://climate-cms.wikis.unsw.edu.au/index.php?title=Provenance&diff=3030&oldid=prev</p>
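<p>Part of the record keeping described on this page (which input data and which code version you used, and why each change was made) can be automated. The following is a minimal, hypothetical sketch in Python using only the standard library. It assumes your analysis code lives in a git repository, that <code>git</code> is on your <code>PATH</code>, and that a JSON Lines log file kept next to your output is an acceptable place for the record. The function names <code>sha256sum</code>, <code>git_commit</code>, <code>provenance_record</code> and <code>append_record</code> are illustrative and not part of any existing tool. The <code>reason</code> field still has to be written by you, because no tool can record why a change happened.</p>

```python
import hashlib
import json
import subprocess
from datetime import datetime, timezone


def sha256sum(path):
    """Return the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def git_commit(repo="."):
    """Return the HEAD commit hash of `repo`, or None if git or the repo is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def provenance_record(inputs, reason, repo="."):
    """Build one provenance entry: which inputs, which code version, when, and why."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        # Checksums let you tell later whether an input file has silently changed.
        "inputs": {str(path): sha256sum(path) for path in inputs},
        "code_commit": git_commit(repo),
        # The "why" cannot be captured automatically; it must be written by hand.
        "reason": reason,
    }


def append_record(log_path, record):
    """Append one entry to a JSON Lines log (one JSON object per line)."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

<p>A typical call after a processing step might be <code>append_record("provenance.jsonl", provenance_record(["input.nc"], "regridded before comparing with observations"))</code>, where both file names are placeholders. For NetCDF outputs, the same information could instead be appended to the global <code>history</code> attribute defined by the CF conventions.</p>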
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 06:20, 30 June 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l1" >Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">{{Template</del>:<del class="diffchange diffchange-inline">Working on}}</del></div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"><span style="font-size</ins>:<ins class="diffchange diffchange-inline">medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Provenance, also referred to&nbsp;as lineage, is the documentation of a dataset's origin. It includes how the data was collected or generated, and which methodologies, instruments and/or&nbsp;software were used to create it.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">You can think of it as the data workflow from the start of a project to the point where a dataset is published as the project's&nbsp;final product.</span></span></ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Provenance<del class="diffchange diffchange-inline">, also referred to sometimes as lineage, </del>is <del class="diffchange diffchange-inline">the documentation of a dataset origin. It includes how the data was collected or generated, which methodologies, instruments and/or&nbsp;software were used for its creation.</span></span> <span style="font-size:medium;"><span style="font-family:Arial</del>,<del class="diffchange diffchange-inline">Helvetica,sans-serif;">You could think of it as the data workflow from the start of a project to the point a dataset is published and you have the project&nbsp;final product. Provenance is complicated </del>as usually data gets through a lot of steps and re-iterations. Given the nature of research itself, the objectives and methods <del class="diffchange diffchange-inline">or </del>your analysis might change&nbsp;various <del class="diffchange diffchange-inline">time</del>, you might keep some of the steps and modify&nbsp;others.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Which is why having a good provenance is so important for the reproducibility of your research.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">There are&nbsp;tools available to help recording some of steps automatically, but ultimately none of them will produce a good provenance record without regular manual intervention. It is not enough to track the changes you also need to know why they happened.</span></span> &nbsp;</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Provenance is <ins class="diffchange diffchange-inline">complicated</ins>, as data usually goes through many steps and iterations. Given the nature of research itself, the objectives and methods <ins class="diffchange diffchange-inline">of&nbsp;</ins>your analysis might change&nbsp;several <ins class="diffchange diffchange-inline">times</ins>: you might keep some of the steps and modify&nbsp;others.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">This is why good provenance is so important for the reproducibility of your research.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">There are&nbsp;tools that can record some of <ins class="diffchange diffchange-inline">the </ins>steps automatically, but none of them will produce a good provenance record without regular manual intervention. It is not enough to track the changes; you also need to know why they happened.</span></span> &nbsp;</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Important things to keep track of are:</span></span></div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Important things to keep track of are:</span></span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">what data&nbsp;you used as input, if any, it helps if the dataset has been published and has a DOI&nbsp;you can refer to,&nbsp;or it is at least well documented and properly versioned. While there are situations where you do not have choice, when you do a well-documented dataset is <del class="diffchange diffchange-inline">a safer option for your analysis</del>.</span></span>  </div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">what data&nbsp;you used as input, if any. It helps if the dataset has been published and has a DOI&nbsp;you can refer to,&nbsp;or is at least well documented and properly versioned. While sometimes you have no choice, when you do have one <ins class="diffchange diffchange-inline">you should always prefer&nbsp;</ins>a well-documented dataset <ins class="diffchange diffchange-inline">to one that </ins>is <ins class="diffchange diffchange-inline">poorly documented</ins>.</span></span>  </div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">use a version control system for your analysis code, <del class="diffchange diffchange-inline">again as for data</del>, <del class="diffchange diffchange-inline">if you are using someone else code </del>make <del class="diffchange diffchange-inline">sure is properly documented and versioned.(link) Use all </del>the <del class="diffchange diffchange-inline">options given you but a version control system, for example Github has </del>readme files, issues, project plans and commit messages, they all help you not only tracking the changes but why they happened.</span></span>  </div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>*</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">use good coding practices and metadata conventions for your dataset<del class="diffchange diffchange-inline">, not only that is necessary when you will eventually want to share the </del>data <del class="diffchange diffchange-inline">or </del>code, <del class="diffchange diffchange-inline">but it </del>will <del class="diffchange diffchange-inline">make it easier for </del>you to <del class="diffchange diffchange-inline">remember what the data is and what the code is doing</del></span></span>  </div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">use a version control system for your analysis code; <ins class="diffchange diffchange-inline">we recommend [[Git_Introduction|git]]. Git and GitHub come with many tools and options</ins>, so make the <ins class="diffchange diffchange-inline">most of them:&nbsp;</ins>README files<ins class="diffchange diffchange-inline">, releases</ins>, issues, project plans and commit messages all help you track not only the changes but also why they happened.</span></span></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">review often, before you can forget, you could make it a&nbsp;habit at the end of a working day to make sure your previous notes, metadata etc are all still relevant. It will only take a few minutes.</span></span>  </div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">*</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">if you are using someone else's code, as with input data,&nbsp;make sure it is properly documented and versioned.</span></span></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">use good coding practices and metadata conventions for your dataset<ins class="diffchange diffchange-inline">. Metadata and the use of standards&nbsp;make it easier to share </ins>data <ins class="diffchange diffchange-inline">and </ins>code, <ins class="diffchange diffchange-inline">and may in fact be required. They </ins>will <ins class="diffchange diffchange-inline">also help you in the future if </ins>you <ins class="diffchange diffchange-inline">need to return </ins>to <ins class="diffchange diffchange-inline">them after a long break.</ins></span></span>  </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">review <ins class="diffchange diffchange-inline">your provenance </ins>often, before you forget: you could make it a&nbsp;habit at the end of each working day to check that your previous notes, metadata, etc. are all still relevant. It will only take a few minutes.</span></span>  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">In conclusion, provenance is a progressive account of your research. Part of the provenance will be directly attached to the data or the code you used, but it is good to have one document that collects all the other sources. A data management plan is a good template for such a document: if you create one at the start of your project and update it regularly, your work will already be done when you want to publish the data, when you need to describe your research&nbsp;in a paper, or even before you leave an institution at the end of your PhD or postdoc.</span></span></div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">In conclusion, provenance is a progressive account of your research. Part of the provenance will be directly attached to the data or the code you used, but it is good to have one document that collects all the other sources. A data management plan is a good template for such a document: if you create one at the start of your project and update it regularly, your work will already be done when you want to publish the data, when you need to describe your research&nbsp;in a paper, or even before you leave an institution at the end of your PhD or postdoc.</span></span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">The ARDC has a lot of resources online for [https://ardc.edu.au/resources/working-with-data/data-provenance/ data provenance].</span></span></div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Our [[Data_Management_Plan|data management plan webtool]] is set up to help you do so, since you can update your plans at any time. It also includes references to documentation and questions that remind you what to cover.</span></span></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">The ARDC <ins class="diffchange diffchange-inline">also </ins>has a lot of resources online for [https://ardc.edu.au/resources/working-with-data/data-provenance/ data provenance].</span></span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Data]][[Category:Data induction]]</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Data]] [[Category:Data induction]]</div></td></tr>
</table>P.petrellihttp://climate-cms.wikis.unsw.edu.au/index.php?title=Provenance&diff=3006&oldid=prevP.petrelli at 03:41, 21 June 20212021-06-21T03:41:32Z<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 03:41, 21 June 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l14" >Line 14:</td>
<td colspan="2" class="diff-lineno">Line 14:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">The ARDC has a lot of resources online for [https://ardc.edu.au/resources/working-with-data/data-provenance/ data provenance].</span></span></div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">The ARDC has a lot of resources online for [https://ardc.edu.au/resources/working-with-data/data-provenance/ data provenance].</span></span></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">[[Category:Data]][[Category:Data induction]]</ins></div></td></tr>
</table>P.petrellihttp://climate-cms.wikis.unsw.edu.au/index.php?title=Provenance&diff=3005&oldid=prevP.petrelli at 03:41, 21 June 20212021-06-21T03:41:07Z<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 03:41, 21 June 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l1" >Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">{{Template:Working on}} <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Provenance , also referred to somethimes as lineage, is the documentation of a dataset origin. It includes how the data was collected or generated, which methodologies, instruments and/or&nbsp;software were used for it's creation.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">You could think of it as the data workflow from the start of a project to hte point a dataset is published and so you have the project&nbsp;final product. Provenance is complicated as usually data gets through a lot of steps an re-iterations. Given the nature of research itself, the objectives and methods or your analysis might changed various time, you might keep some of the steps and change others.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Which is why having a good provenance is so important for the reproducibility and to be able to share your data.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">There are&nbsp;tools available to help recording some of steps automatically, but ultimately non of them will produce a good provenance record without regular manual intervention. It is not enough to track the changes you also need to know why they happened.</span></span> &nbsp; <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Important things to keep track of</span></span> </del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">what data&nbsp;you used as input, if any, it helps if the dataset has been published and has a doi you can refer to,&nbsp;or it is at least well documented and properly versioned. While there are situations were you do not have choice, when you do a well documented dataset is a safer option for your analysis.</span></span> </del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">use a version control system for your analysis code, again as for data if you are using someone else code make sure is properly documented and versioned.(link) Use all the options given you but a version control system, for example github has readme files, issues, project plans and commit messages, they all help you not only tracking the chnages but why they happened.</span></span> </del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">*</del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;"><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">use good coding practices and metadata conventions for your dataset, not only that is necessary when you will ventually want to share the data or code, but it will make it easier for you to remember what the data is and what the code is doing</span></span></del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">*</del></div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">{{Template</ins>:<ins class="diffchange diffchange-inline">Working on}}</ins></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline"><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">review often, before you can forget, you could make it an habit at the end of a working day to make sure your previous notes, metadata etc are all sitll relevant.</span></span></del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Provenance, also referred to sometimes as lineage, is the documentation of a dataset origin. It includes how the data was collected or generated, which methodologies, instruments and/or&nbsp;software were used for its creation.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">You could think of it as the data workflow from the start of a project to the point a dataset is published and you have the project&nbsp;final product. Provenance is complicated as usually data gets through a lot of steps and re-iterations. Given the nature of research itself, the objectives and methods or your analysis might change&nbsp;various time, you might keep some of the steps and modify&nbsp;others.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Which is why having a good provenance is so important for the reproducibility of your research.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">There are&nbsp;tools available to help recording some of steps automatically, but ultimately none of them will produce a good provenance record without regular manual intervention. It is not enough to track the changes you also need to know why they happened.</span></span> &nbsp;</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">In conclusion provenance is a progressive account of your research, part of the provenance will be directly attached to the data or the code you used, but it is good to have one document that collect all the other sources. A data management plan is a good template for such a document, if you create one at the start of your project and update it regularly you will have your work done when you want to publish the data, when you need to describe your research&nbsp;in a paper, or even before leaving an institution at the end of your PhD or postdoc. <del class="diffchange diffchange-inline">&nbsp;</del></span></span></div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Important things to keep track of are:</span></span></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">what data&nbsp;you used as input, if any, it helps if the dataset has been published and has a DOI&nbsp;you can refer to,&nbsp;or it is at least well documented and properly versioned. While there are situations where you do not have choice, when you do a well-documented dataset is a safer option for your analysis.</span></span> </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">use a version control system for your analysis code, again as for data, if you are using someone else code make sure is properly documented and versioned.(link) Use all the options given you but a version control system, for example Github has readme files, issues, project plans and commit messages, they all help you not only tracking the changes but why they happened.</span></span> </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">use good coding practices and metadata conventions for your dataset, not only that is necessary when you will eventually want to share the data or code, but it will make it easier for you to remember what the data is and what the code is doing</span></span> </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">review often, before you can forget, you could make it a&nbsp;habit at the end of a working day to make sure your previous notes, metadata etc are all still relevant. It will only take a few minutes.</span></span> </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">In conclusion provenance is a progressive account of your research, part of the provenance will be directly attached to the data or the code you used, but it is good to have one document that collect all the other sources. A data management plan is a good template for such a document, if you create one at the start of your project and update it regularly you will have your work done when you want to publish the data, when you need to describe your research&nbsp;in a paper, or even before leaving an institution at the end of your PhD or postdoc.</span></span></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"><span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">The ARDC has a lot of resources online for [https://ardc.edu.au/resources/working-with-data/data-provenance/ data provenance].</span></span></ins></div></td></tr>
</table>P.petrellihttp://climate-cms.wikis.unsw.edu.au/index.php?title=Provenance&diff=3004&oldid=prevP.petrelli: Created page with "{{Template:Working on}} <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Provenance , also referred to somethimes as lineage, is the docum..."2021-06-21T03:33:37Z<p>Created page with "{{Template:Working on}} <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Provenance , also referred to somethimes as lineage, is the docum..."</p>
<p><b>New page</b></p><div>{{Template:Working on}} <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Provenance , also referred to somethimes as lineage, is the documentation of a dataset origin. It includes how the data was collected or generated, which methodologies, instruments and/or&nbsp;software were used for it's creation.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">You could think of it as the data workflow from the start of a project to hte point a dataset is published and so you have the project&nbsp;final product. Provenance is complicated as usually data gets through a lot of steps an re-iterations. Given the nature of research itself, the objectives and methods or your analysis might changed various time, you might keep some of the steps and change others.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Which is why having a good provenance is so important for the reproducibility and to be able to share your data.</span></span> <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">There are&nbsp;tools available to help recording some of steps automatically, but ultimately non of them will produce a good provenance record without regular manual intervention. It is not enough to track the changes you also need to know why they happened.</span></span> &nbsp; <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Important things to keep track of</span></span> <br />
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">what data&nbsp;you used as input, if any, it helps if the dataset has been published and has a doi you can refer to,&nbsp;or it is at least well documented and properly versioned. While there are situations were you do not have choice, when you do a well documented dataset is a safer option for your analysis.</span></span> <br />
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">use a version control system for your analysis code, again as for data if you are using someone else code make sure is properly documented and versioned.(link) Use all the options given you but a version control system, for example github has readme files, issues, project plans and commit messages, they all help you not only tracking the chnages but why they happened.</span></span> <br />
*<br />
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">use good coding practices and metadata conventions for your dataset, not only that is necessary when you will ventually want to share the data or code, but it will make it easier for you to remember what the data is and what the code is doing</span></span><br />
<br />
*<br />
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">review often, before you can forget, you could make it an habit at the end of a working day to make sure your previous notes, metadata etc are all sitll relevant.</span></span><br />
<br />
<br />
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">In conclusion provenance is a progressive account of your research, part of the provenance will be directly attached to the data or the code you used, but it is good to have one document that collect all the other sources. A data management plan is a good template for such a document, if you create one at the start of your project and update it regularly you will have your work done when you want to publish the data, when you need to describe your research&nbsp;in a paper, or even before leaving an institution at the end of your PhD or postdoc. &nbsp;</span></span></div>P.petrelli