Out of space

Revision as of 23:50, 17 October 2021 by P.petrelli

You're out of space in your project on gadi. What now? Broadly there are four options: delete, compress, move or archive.

Delete

The best option is to delete files you no longer require, e.g.:

  • failed model runs
  • duplicated data (stored in another location) 
  • intermediate data in analysis (delete intermediates but keep scripts to recreate if required)
  • model fields not required (many models default to writing out many more fields than most people require or use)
  • code (should be in a version-controlled repository, e.g. GitHub)
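
Before deleting anything it helps to know where the space has actually gone. The following is a minimal sketch, assuming GNU du and find; the helper names largest_dirs and big_files are hypothetical and not part of any NCI tooling.

```shell
# Sketch only: helper names are hypothetical; GNU du and find are assumed.

# largest_dirs DIR [N]: the N largest entries one level below DIR, sizes in KiB.
# DIR itself (the total) is included in the listing.
largest_dirs() {
    local dir="${1:?usage: largest_dirs DIR [N]}"
    local n="${2:-10}"
    du -k --max-depth=1 -- "$dir" | sort -rn | head -n "$n"
}

# big_files DIR [SIZE]: regular files larger than SIZE (find -size syntax,
# e.g. 500M or 10G; the default is 1G).
big_files() {
    local dir="${1:?usage: big_files DIR [SIZE]}"
    local min="${2:-1G}"
    find "$dir" -type f -size +"$min" -print
}
```

For example, largest_dirs /path/to/project 5 followed by big_files /path/to/project 10G (both paths are placeholders) gives a short list of candidates to review against the categories above.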

Compress

All netCDF files you need to keep should be compressed. There is a detailed page on compression tools.
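
As an illustration of what compressing a netCDF file can look like, here is a minimal sketch that assumes the standard netCDF utility nccopy is available on the system. The compression page may recommend a different tool, and the function name compress_netcdf is hypothetical.

```shell
# Sketch only: assumes nccopy (from the netCDF utilities) is on the PATH.

# compress_netcdf IN.nc OUT.nc: write a deflate-compressed copy of IN.nc.
# The original is left untouched so the copy can be checked first.
compress_netcdf() {
    local src="${1:?usage: compress_netcdf IN.nc OUT.nc}"
    local dst="${2:?usage: compress_netcdf IN.nc OUT.nc}"
    if ! command -v nccopy >/dev/null 2>&1; then
        echo "nccopy not found; load a netCDF module first" >&2
        return 127
    fi
    # -d 4: deflate level 4; -s: shuffle filter, which often helps deflate
    nccopy -d 4 -s "$src" "$dst" || return
    printf '%s bytes -> %s bytes\n' "$(wc -c < "$src")" "$(wc -c < "$dst")"
}
```

Once the compressed copy has been checked, it can replace the original.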

If it is the inode (file count) quota that needs to be reduced, tools like tar and zip can be used to collect smaller files, such as text files, into a single archive file.
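
A minimal sketch of bundling a directory of small files with tar follows. The function name bundle_small_files is hypothetical, and the entry-count check assumes file names that contain no newlines.

```shell
# Sketch only: bundles DIR into DIR.tar.gz next to it and checks the result.

# bundle_small_files DIR: create DIR.tar.gz, compare the number of archived
# entries with the number on disk, and print the archive path on success.
# The originals are NOT deleted.
bundle_small_files() {
    local dir="${1:?usage: bundle_small_files DIR}"
    local archive="${dir%/}.tar.gz"
    local parent name n_disk n_tar
    parent=$(dirname -- "$dir")
    name=$(basename -- "$dir")
    # -C makes the stored paths relative to the parent directory
    tar -czf "$archive" -C "$parent" "./$name" || return
    n_disk=$(find "$dir" | wc -l)
    n_tar=$(tar -tzf "$archive" | wc -l)
    if [ "$n_disk" -ne "$n_tar" ]; then
        echo "entry count mismatch: $n_disk on disk, $n_tar in archive" >&2
        return 1
    fi
    echo "$archive"
}
```

Only once the archive has been checked should the original directory be removed, since the inode count does not drop until it is.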

Move

Project allocations differ in size, and some projects have more free space than others. If it is appropriate to do so, files can be counted against a different project by changing their group with the chgrp command. Note that for accounting purposes at NCI it is the group of a file, not its physical location on disk, that determines which quota it counts against.
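
The group change described above can be sketched as follows. The function names are hypothetical, and any project code passed to them (such as ab12 or xy34 below) is a made-up placeholder.

```shell
# Sketch only: function names and project codes are hypothetical.

# files_in_group DIR GROUP: list everything under DIR that belongs to GROUP,
# i.e. what currently counts against that project's quota.
files_in_group() {
    local dir="${1:?usage: files_in_group DIR GROUP}"
    local group="${2:?usage: files_in_group DIR GROUP}"
    find "$dir" -group "$group" -print
}

# regroup DIR GROUP: change the group of DIR and everything below it.
# -h changes symbolic links themselves rather than the files they point to.
# A non-root user must own the files and be a member of GROUP.
regroup() {
    local dir="${1:?usage: regroup DIR GROUP}"
    local group="${2:?usage: regroup DIR GROUP}"
    chgrp -R -h "$group" "$dir"
}
```

For example, files_in_group /path/to/data ab12 shows what is charged to ab12, and regroup /path/to/data xy34 moves the charge to xy34 without moving any data.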

There is some short-term storage available under the Climate LIEF grant, which is administered by the CMS Team. All eligible researchers associated with climate research at NCI can apply for short-term storage.

Most home institutions have local storage options for data. These are worth considering if it is not essential for the data to remain on NCI systems for further processing or analysis.

When copying data from one location to another, the best tool to use is rsync. Note that the normal archive option (-a) also preserves the group of each file, so the copies end up with the same group as the originals. On NCI systems this can be undesirable because of the way the accounting system works. The -a option is shorthand for -rlptgoD; the combination below leaves out -g (preserve group) and -p (preserve permissions), so the best combination of options is:

rsync -vrltoD --safe-links srcdir targetdir
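
As a usage sketch for the command above, the options can be wrapped in a small function so that a dry run (-n, a standard rsync option) can be tried before the real copy. The function name nci_copy is hypothetical.

```shell
# Sketch only: the wrapper name is hypothetical; the options are the ones above.

# nci_copy [EXTRA_RSYNC_OPTIONS] SRCDIR TARGETDIR
# Copies without preserving group, so new files take the group assigned at
# the destination. Extra options such as -n (dry run) may precede the paths.
nci_copy() {
    rsync -vrltoD --safe-links "$@"
}

# Typical sequence (paths are placeholders):
#   nci_copy -n srcdir targetdir   # list what would be copied, change nothing
#   nci_copy srcdir targetdir      # perform the copy
```

Note that rsync treats srcdir and srcdir/ differently: without the trailing slash the directory itself is created inside targetdir, while with it only the contents are copied.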

Archive

NCI has a tape-based data store (MDSS). This is by far the cheapest storage option at NCI, and so also has the largest available capacity. In general this is where data is stored when it is no longer required for analysis. Archived data can be retrieved from MDSS, so it is feasible to archive data before you have completely finished analysing it and keep on disk only the fraction that is being actively analysed. MDSS is not a suitable location for data files that need to be accessed regularly, e.g. model inputs, or for data that is likely to still be modified or updated.
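
Because archived data may be retrieved long after the disk copy has been deleted, it can be useful to record checksums before archiving. The following is a minimal sketch, assuming GNU coreutils (sha256sum) and findutils. The function names are hypothetical and this is not part of the MDSS tooling.

```shell
# Sketch only: function names are hypothetical; GNU sha256sum/find/xargs assumed.

# write_manifest DIR: record a SHA-256 checksum for every file under DIR
# in DIR.sha256, using paths relative to DIR.
write_manifest() {
    local dir="${1:?usage: write_manifest DIR}"
    local out="${dir%/}.sha256"
    ( cd -- "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$out"
}

# check_manifest DIR: verify the files under DIR against DIR.sha256.
# Returns non-zero if any file is missing or has changed.
check_manifest() {
    local dir="${1:?usage: check_manifest DIR}"
    ( cd -- "$dir" && sha256sum --quiet -c - ) < "${dir%/}.sha256"
}
```

Archiving the manifest alongside the data means that check_manifest can be run on a retrieved copy to confirm it matches what was originally stored.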

There are detailed instructions about archiving your data to the Mass Data Store on gadi.