Accounting at NCI
This page describes the tools available for accounting of computing and storage resources at NCI. These tools have been developed by NCI and the CMS team.
CMS has put in place a Grafana server for visualising a range of accounting statistics for CLEx. You can access this server using your NCI credentials at: https://accessdev.nci.org.au/grafana/login
Unfortunately, we haven't yet put in place the collection of all the statistics for Gadi. Currently, you can see timeseries of your storage and SU usage on the User Report.
We will update this section as more statistics become available.
You can see the current allocation and the usage so far in the quarter for any computing project you are a member of. With the -v option, you can also see the usage per user.
qstat allows you to monitor your jobs at NCI.
nqstat allows you to see the jobs currently submitted:
- for a given project
- for a given queue and project
- for a given user and project
The information output by nqstat isn't always enough. CMS has developed uqstat to output more information by default, such as the job efficiency (cpu%), the queueing time, the walltime, the cost in SU, etc.
To use this command, please load the nci-scripts module:
module use /g/data/hh5/public/modules
module load nci-scripts
Contrary to what we usually recommend, it should be safe to load this module in your .bashrc file so this module is loaded by default.
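If you do add it to your .bashrc, a guarded form avoids errors on systems where the module command or the hh5 module directory is not available. This is a sketch rather than an official CMS recommendation: the guard conditions are an addition, while the two module commands are the ones given above.

```shell
# Sketch for ~/.bashrc: load nci-scripts only where it can work.
# The guard (module command present, directory readable) is an assumption
# added for safety; it is not part of the original instructions.
if command -v module >/dev/null 2>&1 && [ -d /g/data/hh5/public/modules ]; then
    module use /g/data/hh5/public/modules
    module load nci-scripts
fi
```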
The -x option allows you to see the same information for current jobs and for jobs that finished within the last 24 hours.
CMS has also developed qcost, a tool to help you choose the best PBS configuration for your job.
qcost calculates the cost, in SU, for a job submitted to the PBS system. The PBS queue information is provided by NCI but it can be tedious to determine which configuration of queue and memory request should be used to minimise job cost. qcost was created to make this process easier.
To use this command, please load the nci-scripts module (see above). Usage:
qcost -h
usage: qcost [-h] -q QUEUE -n NCPUS -m MEM [-t TIME]

Return the cost (in SUs) for a PBS job submitted on gadi. No checking is done
to ensure requests are within queue limits.

optional arguments:
  -h, --help            show this help message and exit
  -q QUEUE, --queue QUEUE
                        PBS queue
  -n NCPUS, --ncpus NCPUS
                        Number of cpus
  -m MEM, --mem MEM     Requested memory
  -t TIME, --time TIME  Job walltime (hours)

e.g. qcost -q normal -n 4 -m 60GB -t 3:00:00
Note that if no time is specified it defaults to 1 hour (1:00:00). Walltime can be specified as H:M or H:M:S. Memory must be specified in units of bytes (B), e.g. 160GB, 2000MB. For example:
qcost -q normal -n 4 -m 60GB -t 3:00:00
90 SU
qcost -q express -n 4 -m 60GB -t 3:00:00
270 SU
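The two results above are consistent with a simple charging rule: the job is billed for whichever is larger, the CPUs requested or the CPUs implied by the memory request, multiplied by the queue's charge rate and the walltime. The sketch below reproduces that arithmetic in shell. It is not qcost's source code: the function name estimate_su is hypothetical, and the rates (2 SU per CPU-hour for normal, 6 for express) and the 4 GB of memory per CPU are assumptions chosen to match the two examples. Check NCI's queue documentation for current values.

```shell
# Hypothetical helper, not part of nci-scripts.
# Usage: estimate_su QUEUE NCPUS MEM_GB HOURS
estimate_su() {
    local queue="$1" ncpus="$2" mem_gb="$3" hours="$4" rate
    case "$queue" in
        normal)  rate=2 ;;   # assumed SU per CPU-hour
        express) rate=6 ;;   # assumed SU per CPU-hour
        *) echo "unknown queue: $queue" >&2; return 1 ;;
    esac
    awk -v n="$ncpus" -v m="$mem_gb" -v h="$hours" -v r="$rate" 'BEGIN {
        mem_cpus = m / 4                      # assumed 4 GB of memory per CPU
        eff = (n > mem_cpus) ? n : mem_cpus   # charge for the larger request
        printf "%g SU\n", eff * r * h
    }'
}

estimate_su normal 4 60 3    # 60 GB implies 15 CPUs: 15 * 2 * 3
estimate_su express 4 60 3   # same job at the assumed express rate
```

Under these assumptions the 60 GB memory request, not the 4 CPUs, sets the cost, which is why reducing the memory request is often the easiest way to make a job cheaper.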
This command will list the quota and usage for all the projects you are a member of on both /scratch and /g/data. It will give usage and quota for both space and number of files.
nci-files-report lists the usage in both space and number of files, broken down along different dimensions depending on the options given.
Usage owned by a project per user
If you want to know who owns files in your project or where those files are, you should use:
nci-files-report --group w35 --filesystem gdata
This is probably the most useful option for project managers who want to find which users own the most data in a project.
Usage in a project directory per user
If you want to know who owns files in the main directory of one of your projects, you should use:
nci-files-report --project w35 --filesystem gdata
This is probably the least useful option for this function.
Usage per user
If you want to know your data footprint across all your projects in a filesystem, you should use:
nci-files-report --user --filesystem gdata
Why are the totals different in nci-files-report and Grafana?
CMS collects the data shown in Grafana, and we can only scan data that has group read permissions, so anything without them is missing from the Grafana totals. We ask everyone to set group read permissions unless your scientific project has restrictions that prevent it.
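One way to check for, and then grant, group read permission is with the standard find and chmod commands. This is a sketch only: the two function names are hypothetical, and you should confirm that sharing is appropriate for your data before changing permissions.

```shell
# Hypothetical helpers built on standard find and chmod.

# Print every file or directory under $1 that the group cannot read.
list_not_group_readable() {
    find "$1" ! -perm -g=r -print
}

# Grant group read on everything under $1. The capital X adds execute
# permission (needed to enter a directory) only to directories and to
# files that are already executable.
grant_group_read() {
    chmod -R g+rX "$1"
}
```

Call list_not_group_readable with your own directory as the argument to see what a scan would miss, then grant_group_read on the same directory if the data can be shared.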
ncfind and nccompress
These utilities are maintained by the CMS team. They allow you to find uncompressed netCDF files and to compress them easily. Complete documentation is available on this wiki page.
du, find and wc
These commands can help you identify where the large files are in your area, or which directories contain lots of files. They are standard Unix commands, and each has a man page.
- Check the sizes of sub-directories:
du -shc *
- Find files larger than a given size (100 MiB in this example):
find . -type f -size +100M
- Count the number of files and subdirectories in the current directory:
ls -1 | wc -l
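Because ls -1 | wc -l only counts the entries at the top level, it can miss a subdirectory holding many small files, which matters when you are close to a file-count quota. The sketch below counts files recursively for each immediate subdirectory and sorts the result. The function name count_files_per_dir is hypothetical, and only the standard find, wc, tr and sort commands are used.

```shell
# Hypothetical helper: recursive file count for each immediate
# subdirectory of $1 (default: current directory), largest first.
# Hidden subdirectories are skipped because the * glob does not match them.
count_files_per_dir() {
    local top="${1:-.}"
    local dir
    for dir in "$top"/*/; do
        [ -d "$dir" ] || continue   # no subdirectories: the glob stays literal
        printf '%s\t%s\n' "$(find "$dir" -type f | wc -l | tr -d ' ')" "${dir%/}"
    done | sort -rn
}
```

Running count_files_per_dir with no argument reports on the current directory, and you can repeat it on the largest entry to drill down.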
Note: the Internet is a good source of variations on these commands and their options for finding this kind of information about your files and directories.
ncdu is a disk usage analyzer with an ncurses interface. It is designed to find space hogs and aims to be fast, simple and easy to use.
This command is part of the conda environment maintained by CMS. To use:
module purge
module use /g/data3/hh5/public/modules
module load conda/analysis3
To invoke it, run ncdu directory_path, replacing directory_path with the path to the directory you wish to check.
Some example output:
This gives a screen listing the items in the directory, sorted by size:
--- /g/data3/hh5/public/apps ----------------------------------------------------
   64.0 GiB [##########] /miniconda3
  799.6 MiB [          ] /lrose
  152.3 MiB [          ] /easybuild
    4.0 KiB [          ]  README

 Total disk usage:  64.9 GiB  Apparent size:  64.6 GiB  Items: 2461484
Pushing <g> twice changes the way the usage is displayed:
ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
--- /g/data3/hh5/public/apps ----------------------------------------------------
   64.0 GiB [ 98.6% ##########] /miniconda3
  799.6 MiB [  1.2%           ] /lrose
  152.3 MiB [  0.2%           ] /easybuild
    4.0 KiB [  0.0%           ]  README

 Total disk usage:  64.9 GiB  Apparent size:  64.6 GiB  Items: 2461484
Select a directory and push <return> to show a view of that directory, sorted by size:
ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
--- /g/data3/hh5/public/apps/miniconda3 ------------------------------------------
                                /..
   33.2 GiB [ 51.9% ##########] /pkgs
   22.6 GiB [ 35.3% ######    ] /envs
   11.8 GiB [ 18.5% ###       ] /old-envs
    4.6 GiB [  7.1% #         ] /conda-bld
  236.6 MiB [  0.4%           ] /lib
    8.7 MiB [  0.0%           ] /share
    6.1 MiB [  0.0%           ] /bin
    5.1 MiB [  0.0%           ] /include
    4.9 MiB [  0.0%           ] /conda-meta
    4.3 MiB [  0.0%           ] /ssl
    1.9 MiB [  0.0%           ] /compiler_compat
   28.0 KiB [  0.0%           ] /etc
   12.0 KiB [  0.0%           ] /x86_64-conda_cos6-linux-gnu
    4.0 KiB [  0.0%           ] /locks
    4.0 KiB [  0.0%           ]  LICENSE.txt

 Total disk usage:  64.0 GiB  Apparent size:  63.7 GiB  Items: 2446705
Select /.. at the top to move back up the directory tree. In this way it is possible to navigate the directory structure and pinpoint the largest users of disk space. The disk usage information is cached, so once it has analysed a directory tree it is very fast to navigate.
Push <q> to quit the program.