Scratch file expiry
On 17th May 2022, NCI is introducing an automatic system to purge unused data from the scratch filesystem. Below are some steps on how best to prepare for that change, followed by some useful tools for building a good data workflow going forward.
Preparation
Your best course of action is to start preparing now, to avoid having any useful data go into quarantine on 17th May. We recommend the following steps:
- Read the information provided by NCI
- Clean up /g/data: delete what you can, and archive what you can to tape or outside NCI.
- Clean up /scratch: delete what you can, and move data that needs long-term storage but will not be accessed to /g/data, tape or outside NCI.
- Run “nci-file-expiry list-warnings -p <project> > expiry_warning_<project>.txt” for every project you are a member of.
- Check the output of nci-file-expiry. If you identify anything here that is important and at risk of deletion, decide if it should be put on /g/data or tape instead.
- Run “nci-file-expiry list-warnings”: this will catch any file you own in a project you may have forgotten about. Decide what to do with anything that appears there. Since October 2022, Gadi shows on login a summary of files that are about to expire or are quarantined.
- Rethink your data pipeline: do not leave data you are no longer using in /scratch. Decide what to do with it when you stop using it: delete it, move it to /g/data or outside NCI, or archive it to tape.
- Manage your data under /g/data regularly: review your data, archive to tape or outside NCI as necessary.
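The per-project nci-file-expiry step above can be scripted. The following is a minimal Python sketch, assuming nci-file-expiry is available on your PATH (as on a Gadi login node). The helper names and the project codes in the example comment are hypothetical; only the “list-warnings -p <project>” invocation and the output file name come from the steps above.

```python
import subprocess


def expiry_command(project):
    """Build the argument list for the per-project warning listing."""
    return ["nci-file-expiry", "list-warnings", "-p", project]


def save_expiry_warnings(projects, run=subprocess.run):
    """Write each project's warnings to expiry_warning_<project>.txt.

    `run` can be replaced by a stand-in so the loop can be tried on a
    machine where the NCI tool is not installed.
    Returns the list of file names written (in the current directory).
    """
    written = []
    for project in projects:
        outfile = f"expiry_warning_{project}.txt"
        with open(outfile, "w") as handle:
            # Send the command's standard output to the file,
            # like the shell redirection ">" in the step above.
            run(expiry_command(project), stdout=handle, check=False)
        written.append(outfile)
    return written


# Example, to be run on Gadi with your own project codes
# ("ab12" and "xy34" are made up):
# save_expiry_warnings(["ab12", "xy34"])
```

Each output file can then be reviewed as described in the steps above. The sketch does not interpret the warnings in any way; deciding what to keep remains a manual step.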
Additional information
Below is a list of resources you might find useful to prepare for the automatic file expiry and build a good data workflow for the future.
Blog on building a sustainable data workflow
Description of the various filesystems at NCI
Long term strategy
- Keep using /scratch. There is not enough disk space elsewhere if nobody uses /scratch. Your data is safe on /scratch as long as you access it, i.e. read, modify, create. Only the data you won't use for some time needs to be managed.
- Managing your data does not mean storing everything on /g/data. You need to think about what future use you have for that data. If you don't need it anymore and it can be reproduced, delete it. If you will need to publish the data, publish it now: it is a lot easier to publish a new version after a small change than to publish the initial version. For information on what data needs to be published, see the CLEX Data Policy and the Which_data_should_I_publish page. If you can't publish now but won't need the data for a long time, put it on tape. If you won't need the data for a short time but know you will get back to it soon, put it on /g/data.
- Learn how to use the tape system. See how to use the MDSS tape system or our blog.
- Try to avoid your data going into quarantine. Make managing your data a regular, frequent task in your calendar. This includes all data: /scratch, /g/data and tape.
- Be careful when going on leave! Make sure to check your data holdings under /scratch before you leave.
- For extended leave (e.g. ship cruises, parental leave, sabbatical), do not leave any important data under /scratch.
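The decision process described in the list above can be summarised as a small function. This is only an illustrative sketch: the function name, its arguments and the returned strings are hypothetical, and what counts as "soon" or "a long time" is left to your own judgement, as in the text.

```python
def suggest_action(in_use, needed_again=True, reproducible=False,
                   publishable_now=False, needed_soon=False):
    """Suggest what to do with a dataset, following the strategy above.

    All arguments are yes/no answers you supply yourself; nothing is
    measured on the filesystem.
    """
    if in_use:
        # Data you keep accessing is safe on /scratch.
        return "keep on /scratch"
    if not needed_again and reproducible:
        # No future use and it can be regenerated.
        return "delete"
    if publishable_now:
        # Data that must be published and can be published already.
        return "publish"
    if needed_soon:
        # A short break before you get back to it.
        return "move to /g/data"
    # Not needed for a long time and not publishable yet.
    return "archive to tape"
```

For example, suggest_action(False, needed_soon=True) suggests moving the data to /g/data, while suggest_action(False) falls through to archiving on tape.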