How to use MDSS tape storage at NCI

Revision as of 18:55, 9 May 2021 by P.petrelli

Massdata (Mass Data Storage System, MDSS for short) is the tape storage available at NCI. This kind of storage is intended for long-term archiving of large files. Each project has a directory on the MDSS; the amount of storage allocated depends on the project allocation and can be checked using the nci_account command.

NCI provides a comprehensive guide to using MDSS.

MDSS proper usage

MDSS is designed for medium- to long-term archiving of large files, so it is suitable for:

  • Files you are required to keep, for example model outputs or configurations underlying published datasets, publications, PhD theses, etc.
  • Files that you or someone else are likely to reuse or analyse again in the future, but not in the next few months. For example, restart files or other model output you are not immediately using should be moved from disk to MDSS as soon as possible.

Guidelines for storage

  • Big files: if your files are small (less than 20 MB), use tools like tar to bundle them into a single archive file.
  • Files should be group readable, and directories should also have group execute permission. This helps with long-term maintenance, allowing administrators to track the type and size of archived data.
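The permission guideline above can be applied to a directory tree before it is archived. The following is a minimal Python sketch, not an NCI tool: the function name make_group_readable is hypothetical, and it only adds permission bits, it never removes any.

```python
import stat
from pathlib import Path


def make_group_readable(root):
    """Add group read to every file, and group read + execute to every
    directory, under root (root itself included). Returns the changed paths."""
    root = Path(root)
    changed = []
    # Materialise the listing first so that changing modes cannot affect the walk.
    for path in [root, *root.rglob("*")]:
        if path.is_symlink():
            continue  # leave symbolic links untouched
        mode = stat.S_IMODE(path.stat().st_mode)
        wanted = mode | stat.S_IRGRP
        if path.is_dir():
            wanted |= stat.S_IXGRP
        if wanted != mode:
            path.chmod(wanted)
            changed.append(path)
    return changed
```

Calling make_group_readable("my_experiment") (a hypothetical directory name) before transferring the directory leaves owner and other permissions as they were.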

MDSS is not a backup service, and it is not suitable for code files you might want to keep. For code you might prefer to use online services such as GitHub or Bitbucket.

Accessing MDSS

Massdata cannot be accessed directly via a directory path. All access to MDSS is through the mdss command, in the form mdss -P project-id followed by a subcommand (ls, put, get, ...). See mdss -help for a full list of the subcommands and mdss --help subcommand for help on a specific one. Please note that mdss commands work only interactively or in jobs submitted to the ‘copyq’ queue.

  • Users who are members of the project have rwx permissions in the project directory and so may create their own files there.
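The mdss -P project-id subcommand pattern described in this section can be wrapped in a small script. This is only a sketch under stated assumptions: it assumes the mdss executable is on the PATH (so it must run on a login node or in a copyq job), the helper names are hypothetical, and the project code w35 and the directory name are used purely for illustration. Arguments after the subcommand are passed through unchanged, so check mdss help for what each subcommand accepts.

```python
import shutil
import subprocess


def mdss_command(project, subcommand, *args):
    """Build the argument list for: mdss -P <project> <subcommand> [args...]."""
    return ["mdss", "-P", project, subcommand, *args]


def run_mdss(project, subcommand, *args):
    """Run an mdss subcommand and return its standard output as text."""
    if shutil.which("mdss") is None:
        # Assumption: mdss is only available on NCI login nodes and copyq jobs.
        raise RuntimeError("mdss not found on PATH")
    result = subprocess.run(
        mdss_command(project, subcommand, *args),
        check=True,           # raise CalledProcessError on a non-zero exit status
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical project code and directory name, for illustration only.
    print(mdss_command("w35", "ls", "my_experiment"))
```

Building the argument list separately from running it makes it possible to inspect or log the exact command before anything touches the tape system.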


Preparing your data for mdss

  1. Organise your files and delete anything which you won’t be re-using. It is tempting to copy entire directories to MDSS as they are, thinking you will get back to them later. There is currently no easy way to list what you are storing on massdata, so trying to tidy up after you have uploaded your files would be slow and painful. Even more than with other storage options, it is really important to put only suitable files there, and to make sure that they have been compressed and tarred together if necessary.
  2. NCI guidelines suggest a minimum size of 20MB per file and an average size of 250MB.
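Bundling small files, as recommended in the steps above, could look like the following Python sketch using the standard tarfile module. The function name bundle_small_files is hypothetical, the 20 MB threshold is read as decimal megabytes (an assumption), and the original files are left in place so they can be checked before anything is deleted.

```python
import tarfile
from pathlib import Path

MIN_SIZE = 20 * 1000 * 1000  # 20 MB guideline; decimal megabytes assumed


def bundle_small_files(src_dir, archive_path, min_size=MIN_SIZE):
    """Pack every regular file under src_dir that is smaller than min_size
    into one gzip-compressed tar archive.

    Returns the bundled paths relative to src_dir. Originals are not removed.
    """
    src_dir = Path(src_dir)
    archive_path = Path(archive_path)
    small = sorted(
        p for p in src_dir.rglob("*")
        if p.is_file()
        and not p.is_symlink()
        and p.resolve() != archive_path.resolve()  # never pack the archive itself
        and p.stat().st_size < min_size
    )
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in small:
            # Store paths relative to src_dir so the archive unpacks cleanly.
            tar.add(p, arcname=p.relative_to(src_dir).as_posix())
    return [p.relative_to(src_dir) for p in small]
```

Files at or above the threshold are skipped and can be transferred individually, while the resulting archive is transferred as a single file.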

Preparing your data to be moved is also an opportunity to document, if you haven’t done so already, what you are archiving and how. Even a simple readme file added to your main directory can help others and your future self. If you are archiving data underlying a publication or published dataset, it is important that a summary of what is stored in /massdata, and how, is part of the dataset management plan.
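One possible way to produce such a readme is to generate a file listing alongside a short free-text description. The sketch below is only an illustration: the function name write_manifest, the default file name README.txt and the layout of the listing are all assumptions, not an NCI or CMS convention.

```python
from pathlib import Path


def write_manifest(root, description, readme_name="README.txt"):
    """Write a readme in root containing a description followed by the size
    and relative path of every file under root. Returns the readme path."""
    root = Path(root)
    lines = [description, "", "Contents (size in bytes, path):"]
    for p in sorted(root.rglob("*")):
        if p.is_file() and p.name != readme_name:
            rel = p.relative_to(root).as_posix()
            lines.append(f"{p.stat().st_size:>12}  {rel}")
    readme = root / readme_name
    readme.write_text("\n".join(lines) + "\n")
    return readme
```

Keeping a copy of the generated readme on disk as well gives a record of what was archived without having to query the tape system.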

Useful tools:

  • tar, to create archives
  • Compression tools

Monitoring mdss usage

nci_account -P <project-id> reports the total massdata allocation, usage and availability, both as size and as i-nodes (i.e. number of files and directories):

Usage Report: Project=w35 Storage Period=2017.9 (01/07/2017-30/09/2017)
=======================================================================
-------------------------------------------------------------------------------------------------
System    StoragePt             Grant       Usage       Avail      iGrant      iUsage      iAvail
-------------------------------------------------------------------------------------------------
dmf       massdata          4048.00GB   1209.00GB   2839.00GB     323.00K       8.88K     314.12K
global    gdata1              76.00TB     59.69TB     16.31TB    3883.00K    2299.01K    1583.99K
global    gdata1a             76.00TB     60.47TB     15.53TB    3883.00K    2313.58K    1569.42K
raijin    short               15.00TB      7.54TB      7.46TB    3280.00K    2767.07K     512.93K
-------------------------------------------------------------------------------------------------
Total                        170.95TB    128.88TB     42.08TB      11.37M       7.39M       3.98M

Unfortunately, there is no command to quickly check usage by user-id, as there is for /g/data and /short. Currently the only way to get this information is to ask help@nci.org.au; administrators can access this information for any CI of the group.
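If the project-level figures need to be tracked over time, the usage report shown above can be parsed with a short script. This Python sketch assumes the report keeps the eight-column layout of the 2017 example and that sizes are given in GB or TB; current nci_account output may differ. The conversion of 1 TB to 1024 GB is inferred from the Total row of the example, which only adds up with that factor. The function names are hypothetical.

```python
import re

SIZE = re.compile(r"^(\d+(?:\.\d+)?)(GB|TB)$")
GB_PER_UNIT = {"GB": 1.0, "TB": 1024.0}  # factor inferred from the example totals


def to_gb(text):
    """Convert a size such as '4048.00GB' or '15.00TB' to a number of GB."""
    match = SIZE.match(text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * GB_PER_UNIT[match.group(2)]


def storage_rows(report):
    """Return {storage point: sizes in GB} for each data row of the report."""
    rows = {}
    for line in report.splitlines():
        fields = line.split()
        # Data rows have 8 columns and a size in the third one; this skips
        # the header, separator and Total lines.
        if len(fields) == 8 and SIZE.match(fields[2]):
            rows[fields[1]] = {
                "system": fields[0],
                "grant_gb": to_gb(fields[2]),
                "usage_gb": to_gb(fields[3]),
                "avail_gb": to_gb(fields[4]),
            }
    return rows


# Two rows copied from the example report above.
SAMPLE = """\
System    StoragePt             Grant       Usage       Avail      iGrant      iUsage      iAvail
dmf       massdata          4048.00GB   1209.00GB   2839.00GB     323.00K       8.88K     314.12K
raijin    short               15.00TB      7.54TB      7.46TB    3280.00K    2767.07K     512.93K
"""

if __name__ == "__main__":
    massdata = storage_rows(SAMPLE)["massdata"]
    print(f"massdata used: {massdata['usage_gb'] / massdata['grant_gb']:.1%}")
```

This only gives project-level totals; per-user usage still has to be requested from NCI as described above.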

Transferring data to and from MDSS

NCI supports several commands for working with MDSS, as explained in their User Guide. The CMS team has also developed a utility called mdssdiff, which allows users to compare the contents of a local directory with a directory under /massdata. It can also recursively update the contents of the massdata directory to match the local directory, or vice versa.

Modifications to MDSS datasets

Contact NCI at help@nci.org.au if large metadata operations are needed on massdata, such as changing the ownership, project code or permissions of existing datasets.