During your work at the Centre, you are likely to produce, use and share data on different systems. You will probably have access to two different systems: your University system and NCI.
Storage at NCI
NCI provides two types of storage: tape and disk. Tape is for long-term storage, while disk is better suited to data you need to access often.
Tape at NCI
The tape system at NCI is called /massdata. Please read the archiving data wiki page to learn how to use this system. Here are a few important points to keep in mind:
- Tape is mostly appropriate for archiving data.
- You should only store big files on tape. If you want to migrate a lot of small files, you should first archive them together. To learn how to do that, please have a look at the "File compression and archiving" section below and email us any questions.
- /massdata is only accessible from the login nodes (interactive) or via a script submitted to the copyq queue. It is generally recommended to use the copyq queue as you then have a much longer run time.
- Tape access (writing and reading) is slow.
- Since data on /massdata are likely to stay unused for a long time, it is essential to document your data. For example, adding a detailed README file to your data folder can help a lot.
- The quota usage for /massdata is only updated once a day, overnight, because of the size of the system. It is therefore recommended to clean up promptly, if possible before breaching the quota.
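As an illustration of the points above, here is a minimal sketch of a copyq job that bundles a run directory into a single archive before sending it to tape. The project code ab12, the paths and the requested resources are hypothetical. The "mdss" client used for the transfer is an assumption, since it is not named on this page: please check the archiving data wiki page for the supported commands and options.

```shell
#!/bin/bash
# Hypothetical job header: adapt the project code and resources to your case.
#PBS -q copyq
#PBS -P ab12
#PBS -l ncpus=1
#PBS -l mem=4GB
#PBS -l walltime=04:00:00
#PBS -l storage=scratch/ab12

project=ab12                               # hypothetical project code
src_dir=/scratch/$project/$USER/run01      # hypothetical directory to archive
archive=$src_dir.tar.gz

if command -v mdss >/dev/null 2>&1 && [ -d "$src_dir" ]; then
    # Bundle the many small files into one compressed archive first.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
    # Assumed mdss syntax: copy the archive to an existing directory on tape.
    mdss -P "$project" put "$archive" "$USER/"
    status=archived
else
    echo "mdss or $src_dir not available here, nothing done"
    status=skipped
fi
```

Such a script would be submitted with "qsub" from a login node. Remember to add a README describing the content of the archive.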
It might be possible to increase the quota on /massdata for your project. To request this, please send us an email detailing how much additional space you would like and which NCI project it is for. Again, because this is tape storage, the request might take some days to be processed, so please plan ahead.
Disk at NCI
There are three different disk filesystems at NCI, each with a slightly different purpose. NCI also has a Users' Guide. All the disk filesystems are accessible from the login nodes and the compute nodes, so you can read from or write to one filesystem while working in another. All of these filesystems can also exchange data with /massdata, either from the login nodes or through a script submitted to the copyq queue.
- /home is your home directory.
- This space is strictly limited to 10 GB for each user, but it is backed up.
- It is most suitable for storing source code rather than model outputs or observation datasets.
- You can monitor your use on home with the "quota" command.
- All projects have some storage on /scratch.
- The amount of space varies from one project to another.
- The management of the space is the responsibility of the members of each project.
- When a project fills its quota on /scratch, the project's members will not be able to use the computing queues, except for the copyq queue, which remains available to help with moving data around.
- To monitor the overall usage, please use the "nci_account" command. To monitor the usage per user, please use "nci-files-report -f scratch".
- Increasing the quota on /scratch for a project might be possible but is left to the decision of NCI staff. An extension can only be requested by the Lead CI for the project. You can check who the Lead CI is on my.nci.org.au.
- Most projects now have some storage on /g/data.
- As for /scratch, the quota on /g/data is per project with management of the usage the sole responsibility of the project's members.
- /g/data can be less stable than /scratch.
- For compute nodes to have read and/or write access to this filesystem, you need to mount the specific area you need with the "-l storage" PBS flag.
- A project's quota on /g/data cannot be increased with a simple request. There is a review of some of the quotas every year, at which point some projects might be granted an increase. If you need additional storage before then, please consider deleting old or incorrect data, archiving old data to /massdata, using temporary storage or your University system. If you still want an increase to be considered at review time, please make sure to discuss it with the Lead CI of your project who will be part of the review.
- To monitor usage, please use "nci_account". To see a per-user summary, run the command "nci-files-report -f gdata".
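To illustrate the "-l storage" flag mentioned above, here is a minimal sketch of a job header. The project code ab12, the queue and the resources are hypothetical. The form storage=gdata/<project>+scratch/<project> reflects our understanding of the flag and should be checked against the NCI Users' Guide.

```shell
#!/bin/bash
# Hypothetical job header: adapt the project codes and resources to your case.
#PBS -q normal
#PBS -P ab12
#PBS -l ncpus=1
#PBS -l mem=4GB
#PBS -l walltime=01:00:00
# Mount the /g/data area of project hh5 and the /scratch area of project ab12.
#PBS -l storage=gdata/hh5+scratch/ab12

data_dir=/g/data/hh5

# An area that was not requested with "-l storage" is not visible from the job.
if [ -d "$data_dir" ]; then
    ls "$data_dir"
else
    echo "$data_dir is not mounted in this job"
fi
```

Only the areas listed in the flag are mounted, so list every project area your job reads from or writes to.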
The CMS also manages two projects on NCI that can be used for temporary storage of data. Both are mounted on /g/data and have the same characteristics as other /g/data storage spaces, as explained above.
To use any space on these projects, you need to:
- request connection to the project if you are not yet a member. You can check which projects you are part of with the "groups" command.
- fill in a storage request using the CLEX DMPonline tool. If you do not yet have an account on this tool, please be patient while the account is created: to avoid robots and unauthorised access, account creation requires human verification on our end. Please email us if you have any questions about filling in the form. Note that this form is principally there to enable us to monitor the space used, requested and available. It also enables us to prepare a folder for you with appropriate permissions. The forms are very short and quick to fill in, and the storage is usually ready for use within a few hours. See this page for more detailed instructions on how to fill in the form. Note:
- you do not need to email us, please just fill in the form to request the allocation you would like.
- also, you can request space for use by a whole group instead of per user, but all users of the group must request connection to the project.
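The membership check mentioned above can be scripted. This is a small sketch, assuming project membership shows up as a Unix group of the same name (which is what the "groups" command reports). The helper name in_group is hypothetical.

```shell
# Return success if the current user belongs to the given Unix group.
in_group() {
    id -Gn | tr ' ' '\n' | grep -qx -- "$1"
}

# Check the two temporary storage projects.
for project in hh5 ua8; do
    if in_group "$project"; then
        echo "$project: already a member"
    else
        echo "$project: not a member yet, please request connection first"
    fi
done
```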
The temporary storage projects are:
- /g/data/hh5: this project is for short temporary use (~3 months). It could be used, for example, to write your raw model outputs; you would then save a subset or a reformatted version to your project's space and move the raw outputs to /massdata for safekeeping.
- /g/data/ua8: the main purpose of this project is to store replicas of datasets which do not have a specific data project assigned. However, the free space in this project can be used as temporary storage for data that is being processed for publication.
Storage at Universities
File compression and archiving
For an efficient use of storage, there are a few rules to keep in mind:
- It is more efficient to store a few larger files than lots of smaller files. It is hard to define large and small, but files of several tens of gigabytes are absolutely acceptable. The size of the files should also take into account how you or others are going to use them. Files nearing 100 GB become unmanageable and should be produced only if there is no other option. In that case you should read about ...
- It is always best practice to compress your data when possible. NetCDF files are now easily compressible; see this wiki page for detailed explanations of the tools available at NCI.
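To judge whether a directory holds "lots of smaller files", a quick count can help. This is a minimal sketch: the helper name count_small_files and the 1 MiB threshold are arbitrary choices for illustration, not an NCI rule.

```shell
# Count the files smaller than 1 MiB under a directory (hypothetical helper).
count_small_files() {
    # "-size -1024k" matches files that occupy fewer than 1024 KiB.
    find "$1" -type f -size -1024k | wc -l | tr -d ' '
}

# Usage: count_small_files /path/to/your/experiment
```

If the count runs into the thousands, the directory is a good candidate for archiving into a single file before it is moved or stored.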
To store small setup files that define your experiments, think about using the "tar" command. This is a shell command whose manual is accessible through "man tar".
This command saves many files together in a single archive. It can be used on a directory tree and restores the directory structure when the files are extracted from the archive. So if you need to save the setup of several experiments, the best way might be to create a directory tree containing the setup files of all the experiments, then create a single archive file for the whole tree. Archive files can also easily be compressed and uncompressed with the gzip utility, either when the archive is created or afterwards: see the 2017 training material for usage of these commands, and the acls and tar cheat sheets. For more details see Archiving Data.
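The workflow described above can be sketched as follows. The directory layout and file names are hypothetical, and the example works in a temporary directory so that it can be tried safely.

```shell
# Hypothetical layout: one sub-directory of setup files per experiment.
workdir=$(mktemp -d)
mkdir -p "$workdir/setups/exp01" "$workdir/setups/exp02"
echo "dt = 600" > "$workdir/setups/exp01/namelist.txt"
echo "dt = 300" > "$workdir/setups/exp02/namelist.txt"

# Create one gzip-compressed archive of the whole tree
# (c = create, z = gzip, f = archive file name).
tar -czf "$workdir/setups.tar.gz" -C "$workdir" setups

# List the content of the archive without extracting it.
tar -tzf "$workdir/setups.tar.gz"

# Extract into another directory: the directory structure is recreated.
mkdir "$workdir/restored"
tar -xzf "$workdir/setups.tar.gz" -C "$workdir/restored"
```

The "-C" option makes tar change directory first, so the archive stores the relative path "setups/..." rather than the full path of the temporary directory.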