Leaving the Centre guidelines
Revision as of 20:32, 11 March 2020
The goal of these guidelines is to make sure you do not leave a mess behind for someone else to sort out. It is also to make it easier to identify data and code that others in the Centre might find useful.
Sorting your data
Find out what you will be required to do before you leave, so that you have time to prepare for it. Sorting through your files might take longer than you think.
You might need to discuss the following questions with your supervisor:
- Will you require access to the files after you leave (e.g. for a paper review)?
- What files will be useful to others? Should they be published?
- What files won't be useful to others? Should they be archived or deleted?
If you need specific advice on how to transfer or archive your files, you can always ask our helpdesk for assistance: .
You require access to the files after you leave
Files at NCI
When you leave, you will keep your NCI credentials, but only if you keep your contact details up to date through https://my.nci.org.au.
Make sure to quit all the NCI projects you no longer need access to. You can be ruthless, as it is quite easy to regain access to a project later on if needed.
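Before quitting projects, it helps to see which ones you still belong to. The sketch below assumes that, as on NCI systems, each project you are a member of appears as a Unix group on your account. It only lists the groups. Leaving a project is handled separately.

```python
from __future__ import annotations

import grp
import os


def my_groups() -> list[str]:
    """Return the sorted names of the Unix groups the current user belongs to."""
    names = []
    for gid in os.getgroups():
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A gid with no entry in the group database: keep the number instead.
            names.append(str(gid))
    return sorted(set(names))
```

Run it on a login node and print the result. Any listed project you no longer need is a candidate to quit. Your own primary group may also appear in the list.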
Some projects have strict license terms attached (e.g. the access project). Your membership might be revoked at any time after you leave the Centre if you haven't negotiated extended access to those projects. Please contact the Lead CI or the project manager to discuss your needs.
Files at your institution
Most universities will close your university and e-mail accounts when you leave. University data services are often accessible only through your university account. If that is the case, arrange continued access before you leave by contacting IT services or the CI of the projects in which you deposited data. Specifics on each university's data services, and on what happens when you leave, can be found by following the relevant link on the data services page.
Your files will be used by others
Publish the files
You should publish your files if they are frequently used by several other people. Ideally, such data should be published as soon as its usefulness to others is clear, not just when you leave.
If your files are at NCI, that is all you need to do. If your files are held at your institution, you may need to make sure there is a copy that is accessible to others and owned by someone who is likely to stay for years to come. This might depend on your institution's data services.
Change ownership of the files
If someone else takes over from you and simply uses your files to continue related work, you only need to change the ownership of the files. If the files are at NCI, you cannot change their ownership yourself. You and the Lead CI of the project owning the files will need to contact [] so that NCI staff can do it for you.
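Because NCI staff make the change, you will need to tell them which files to transfer. Below is a minimal sketch for listing everything you own under a directory. It assumes Python 3 is available where your files live. The directory you pass in is up to you.

```python
from __future__ import annotations

import os


def files_owned_by(root: str, uid: int | None = None) -> list[str]:
    """Return the sorted paths under root (files and directories) owned by uid.

    Defaults to the current user. Symbolic links are not followed.
    """
    if uid is None:
        uid = os.getuid()
    owned = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_uid == uid:
                    owned.append(path)
            except OSError:
                # The file may have vanished or be unreadable: skip it.
                continue
    return sorted(owned)
```

For example, `files_owned_by("/g/data/<project>/<your-directory>")` returns the paths to include in your request. The directory name is a placeholder.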
Files not useful anymore
It can be difficult to know what to keep and what to delete. A general rule is to keep what is needed to reproduce your work and delete everything else. This gets complicated because the rule has to be weighed against the cost, in money and time, of reproducing your work. Still, the rule lets us clearly identify files that always need to be kept and files that never need to be kept.
To keep, always
There are 2 types of codes: codes distributed by others (e.g. climate models) and codes written by you.
For codes distributed by others, you simply need to keep a reference to the version used as long as no modification to the code was made by you. If you modified the code and this modification is part of a standard version of the model, you might be able to simply reference this version as long as it is exactly the version you used. If you modified the code for yourself only, this is now a code written by you and falls into the second category.
For codes written by you, the simplest approach is to save the code in a Git repository on GitHub, then publish this repository via Zenodo. We can help with the publication. It is absolutely fine to create one repository per project or paper containing all your codes. You can then use the repository's README file to explain how to reproduce your results. Don't forget to clearly reference everything one might need in addition to this repository. Alternatively, you can have one repository per code, especially if you expect to reuse the same code for other work.
Configuration files for running the codes and some input files
In addition to the codes themselves, you need to keep everything that enables someone to run the codes the way you did. The most complicated configurations are usually those for climate models. Some climate models save your configurations in version control repositories (e.g. UM, ACCESS, ACCESS-OM2). In that case, you only need to keep the information on how to retrieve these configurations. Other models don't save your configurations, and you need to do it yourself.
Some input files are published data. For those, you need to keep the reference to the data, including the version. If your work chains several of your own codes, the output of one code is often the input of the next. In that case you do not necessarily need to keep the intermediate data, but you do need to keep the information on your workflow.
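One lightweight way to keep the references to your input data, including versions, is a small manifest saved alongside your code. The sketch below is only one way you might structure it, not a Centre standard. Each record holds a name, a source (such as a DOI) and a version. It can optionally hold a local path, from which a SHA-256 checksum is computed so that someone can later confirm they have the same file.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class InputRecord:
    """One input to the workflow: where it came from and which version."""
    name: str
    source: str               # e.g. a DOI or dataset name
    version: str
    path: str | None = None   # local copy, if you want a checksum recorded
    sha256: str | None = None


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(records: list[InputRecord], out_path: str) -> None:
    """Fill in missing checksums for records with a path, then save as JSON."""
    for rec in records:
        if rec.path is not None and rec.sha256 is None:
            rec.sha256 = sha256_of(rec.path)
    with open(out_path, "w") as f:
        json.dump([asdict(r) for r in records], f, indent=2)
```

For example, `write_manifest([InputRecord("forcing", "hypothetical DOI", "v1")], "inputs_manifest.json")` records a published input without a local checksum.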
A description of your workflow
This can be tricky, as there is no one-size-fits-all format for saving this information. It is fine to write a README file and archive it with the other project files that need archiving. You can also have a dedicated GitHub repository just for this README file, or one repository for all your projects with a README for each project. Whatever format you choose, this information must be publicly available (unless your project was restricted) and not just a personal note.
This description should clearly describe step by step what someone should do to reproduce your work.
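One way to describe the work step by step is a plain numbered list. Each entry gives the command that was run, along with its inputs and outputs. The sketch below renders such a list as a Markdown section for a README. The `Step` fields and the layout are our own suggestion.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of the workflow, in the order it must be run."""
    description: str
    command: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


def workflow_readme(title: str, steps: list[Step]) -> str:
    """Render the steps as a numbered Markdown section for a README."""
    lines = [f"# {title}", "", "## How to reproduce", ""]
    for i, step in enumerate(steps, start=1):
        lines.append(f"{i}. {step.description}")
        lines.append(f"   - Command: `{step.command}`")
        if step.inputs:
            lines.append(f"   - Inputs: {', '.join(step.inputs)}")
        if step.outputs:
            lines.append(f"   - Outputs: {', '.join(step.outputs)}")
    return "\n".join(lines) + "\n"
```

For example, `workflow_readme("Paper title", [Step("Regrid the forcing", "python regrid.py")])` produces a one-step section. The script name is hypothetical.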
This information has to be kept for at least 5 years. The time requirements differ slightly depending on institutions and funding bodies.
To delete, always
You do not need to keep files that are not necessary to reproduce your work:
- log files
- failed experiments
- temporary files, such as those created by successive cdo/nco commands
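Before deleting anything, it is safer to list the candidates first and review them. The sketch below finds files by name pattern and only deletes them when explicitly asked. The patterns are hypothetical examples and should be adapted to how your log and temporary files are actually named. Failed experiments are usually whole directories, which are easier to review and remove by hand.

```python
from __future__ import annotations

import fnmatch
import os

# Hypothetical patterns: adapt them to your own file naming.
DEFAULT_PATTERNS = ("*.log", "*.tmp", "tmp_*.nc")


def deletion_candidates(root: str, patterns: tuple[str, ...] = DEFAULT_PATTERNS) -> list[str]:
    """List files under root whose names match any of the patterns."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, p) for p in patterns):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def delete_files(paths: list[str], dry_run: bool = True) -> None:
    """Print what would be deleted; only remove files when dry_run is False."""
    for path in paths:
        if dry_run:
            print(f"would delete {path}")
        else:
            os.remove(path)
```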
To keep or to delete?
Climate model outputs. These are at least reproducible, although bitwise reproducibility might not be possible if the underlying libraries change or the machine you used is retired. The problem is that this data is usually very large, so storing it is costly, but reproducing it is also time-consuming. This is where you need to discuss the question with your supervisor, as the answer depends on how likely it is that someone else will find this data useful.
It is also worth considering how many restart files you need to keep, if any. In most cases, you need to archive fewer restart files for the long term than you kept while actively working on the project.
Output of lengthy processing. It might feel necessary to keep these files, but we would argue it is usually not worth the storage cost unless there is a clear indication they will be used again soon. First, modern programming techniques can often shorten lengthy processing significantly. Second, the very real cost of storage is not worth the very hypothetical time saved in the future.
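How many restarts to keep is a judgement call to make with your supervisor. The sketch below illustrates one possible policy, which is our assumption and not a Centre rule. Given the restart names in chronological order, it keeps every N-th restart plus the final one, so a run can be continued from its end or re-run from a few intermediate points.

```python
from __future__ import annotations


def restarts_to_keep(restarts: list[str], every: int) -> list[str]:
    """Choose which restart files to archive long term.

    `restarts` must be in chronological order. Keeps every `every`-th
    restart (counting from the first) plus the final one.
    """
    if every < 1:
        raise ValueError("every must be >= 1")
    if not restarts:
        return []
    keep = restarts[::every]  # a new list, so the input is not modified
    if restarts[-1] not in keep:
        keep.append(restarts[-1])
    return keep
```

With hypothetical yearly restarts, `restarts_to_keep([f"restart_{y}" for y in range(1950, 1960)], every=5)` returns `["restart_1950", "restart_1955", "restart_1959"]`.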
If you are leaving, or even if you are only changing position in the Centre, one or more of the following might apply to you.
Your NCI user-id will stay the same unless you (or your new institution) specifically ask for it to be changed. NCI will suspend your account once you are no longer a member of any group or your contact details are not up to date. Projects occasionally review their active members by sending e-mails, so keep your contact e-mail up to date through my.nci.org.au.
Leaving a project
If you leave a specific project:
- Tidy up and document your files and directory structure, and contact the Lead CI or project representative
- If you haven't already, set r-X group access on all your files and directories
- If the project representative agrees, ask email@example.com to transfer ownership of your files to someone else in the project (specify the project and filesystem)
- If you want to transfer files externally:
  - use sftp, scp or rsync to transfer files securely (rsync can be resumed)
  - use the dedicated data-mover nodes, g-dm.nci.org.au, for large file transfers
  - use copyq if you want to queue a job
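Setting r-X group access is normally done with `chmod -R g+rX` on your directory. For reference, the sketch below does the same in Python. It adds group read everywhere and group execute on directories, so the group can enter them. Files get group execute only if someone can already execute them, which is what the capital X means. It skips symbolic links.

```python
import os
import stat

# Any execute bit (user, group or other), used to mimic chmod's capital X.
ANY_EXEC = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def _add_group_bits(path: str, is_dir: bool) -> None:
    """Add group read, plus group execute for directories and executables."""
    mode = os.lstat(path).st_mode
    new_mode = mode | stat.S_IRGRP
    if is_dir or mode & ANY_EXEC:
        new_mode |= stat.S_IXGRP
    if new_mode != mode:
        os.chmod(path, stat.S_IMODE(new_mode))


def grant_group_read(root: str) -> None:
    """Apply the equivalent of `chmod -R g+rX root`, skipping symbolic links."""
    _add_group_bits(root, is_dir=True)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                _add_group_bits(path, is_dir=True)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                _add_group_bits(path, is_dir=False)
```

For example, `grant_group_read("/scratch/<project>/<user>")` opens up your directory to the project group. The path is a placeholder.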
Leaving your institution