Scratch file expiry

Revision as of 18:40, 26 April 2022 by C.carouge

On 17th May 2022, NCI is introducing an automatic system to purge unused data from the scratch filesystem. The steps below explain how best to prepare for that change. We also point to some useful tools for building a good data workflow going forward.

Preparation

Your best course of action is to start preparing now, so that no useful data goes into quarantine on the 17th May. We recommend the following steps:

  • Read the information provided by NCI
  • Clean up /g/data: delete what you can, and archive what you can to tape or outside NCI.
  • Clean up /scratch: delete what you can, and move anything that needs long-term storage but not regular access to /g/data, to tape or outside NCI.
  • Run “nci-file-expiry list-warnings -p <project> > expiry_warning_<project>.txt” for every project you are a member of.
  • Check the output of nci-file-expiry. If you identify anything here that is important and at risk of deletion, decide if it should be put on /g/data or tape instead.
  • Run “nci-file-expiry list-warnings”: this will catch any file you own in a project you may have forgotten about. Decide what to do with anything that appears there.
  • Rethink your data pipeline: do not leave data you no longer use in /scratch. Decide what to do with it when you stop using it: delete it, move it to /g/data or outside NCI, or archive it to tape.
  • Manage your data under /g/data regularly: review your data, archive to tape or outside NCI as necessary.
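
The per-project step above can be scripted if you belong to several projects. The following is only a minimal sketch: the function name save_expiry_warnings and the EXPIRY_CMD variable are our own inventions for illustration (EXPIRY_CMD lets you do a dry run with "echo"), and the only nci-file-expiry options used are the ones quoted in the steps above.

```shell
# Sketch: save the expiry warnings of each project to its own file.
# Assumption: project codes are passed as arguments to the function.
# EXPIRY_CMD is a hypothetical override, e.g. EXPIRY_CMD=echo for a dry run.
save_expiry_warnings() {
    for project in "$@"; do
        out="expiry_warning_${project}.txt"
        # Same command as in the steps above, one output file per project
        "${EXPIRY_CMD:-nci-file-expiry}" list-warnings -p "$project" > "$out"
        echo "wrote $out"
    done
}
```

Call it with your project codes as arguments, for example "save_expiry_warnings proj1 proj2", where proj1 and proj2 are placeholders. If your project codes match your Unix groups, which we assume but have not checked for every account, "save_expiry_warnings $(id -Gn)" would cover them all; any group that is not a project may simply produce an error or an empty file.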

Additional information

Below is a list of resources you might find useful to prepare for the automatic file expiry and build a good data workflow for the future.

Archiving data at NCI

Blog on building a sustainable data workflow

Description of the various filesystems at NCI