Accounting at NCI

Revision as of 16:31, 6 October 2021 by C.carouge

This page describes the different tools available for the accounting of computing and storage resources at NCI. Those tools have been developed by NCI and the CMS team.

Grafana

CMS has put in place a Grafana server for visualising a range of accounting statistics for CLEx. You can access this server using your NCI credentials at: https://accessdev.nci.org.au/grafana/login

Unfortunately, we haven't yet put in place the collection of all the statistics for Gadi. Currently, you can see timeseries of your storage and SU usage on the User Report.

We will update this section as more statistics become available.

Computing resources

nci_account

nci_account shows the current allocation and the usage so far this quarter for any computing project you are a member of. With the -v option, it also shows the usage per user:

nci_account -v

nqstat

qstat allows you to monitor your jobs at NCI. nqstat allows you to see the jobs currently submitted:

  • for a given project
  • for a given queue and project
  • for a given user and project

uqstat

The information output by qstat and nqstat isn't always enough. CMS has developed uqstat to output more information by default, such as the job efficiency (cpu%), the queueing time, the walltime and the cost in SU.

To use this command, please load the nci-scripts module:

module use /g/data/hh5/public/modules
module load nci-scripts

Contrary to what we usually recommend, it should be safe to load this module in your .bashrc file so that it is loaded by default.

uqstat 

The -x option allows you to see the same information on current jobs and jobs finished up to 24 hours previously.

qcost

CMS has also developed qcost, a tool to help you choose the best PBS configuration for your job. qcost calculates the cost, in SU, of a job submitted to the PBS system. NCI provides the PBS queue information, but it can be tedious to work out which combination of queue and memory request minimises the job cost. qcost was created to make this process easier.

To use this command, please load the nci-scripts module (see above). Usage:

 qcost -h 
 usage: qcost [-h] -q QUEUE -n NCPUS -m MEM [-t TIME]
 
 Return the cost (in SUs) for a PBS job submitted on gadi. No checking is done to ensure requests are within queue limits.
 
 optional arguments:
   -h, --help                show this help message and exit
   -q QUEUE, --queue QUEUE   PBS queue
   -n NCPUS, --ncpus NCPUS   Number of cpus
   -m MEM, --mem MEM         Requested memory
   -t TIME, --time TIME      Job walltime (hours)
 
 e.g. qcost -q normal -n 4 -m 60GB -t 3:00:00

Note that if no time is specified, it defaults to 1 hour (1:00:00). Walltime can be specified as H:M or H:M:S. Memory must be given with a unit in bytes (B), e.g. 160GB or 2000MB. For example:

 qcost -q normal -n 4 -m 60GB -t 3:00:00
 90 SU

 qcost -q express -n 4 -m 60GB -t 3:00:00
 270 SU
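
The two results above are consistent with a charge of queue rate × max(CPUs, requested memory / memory per CPU) × walltime in hours. The sketch below reproduces that arithmetic; it is not qcost's implementation. The helper names estimate_su and walltime_hours are hypothetical, and the rates (2 SU per CPU-hour for normal, 6 for express) and the 4 GB of memory per CPU are assumptions chosen to match the examples above, so check NCI's queue documentation for the current values.

```shell
# Hypothetical helpers, not part of nci-scripts.

# Convert a walltime given as H:M or H:M:S to decimal hours.
walltime_hours() {
    echo "$1" | awk -F: '{ printf "%g\n", $1 + $2 / 60 + $3 / 3600 }'
}

# Usage: estimate_su RATE NCPUS MEM_GB HOURS [MEM_PER_CPU_GB]
# RATE and MEM_PER_CPU_GB are assumptions; the default of 4 GB per CPU
# is chosen to match the qcost examples above.
estimate_su() {
    awk -v rate="$1" -v ncpus="$2" -v mem="$3" -v hours="$4" \
        -v per_cpu="${5:-4}" 'BEGIN {
        mem_cpus = mem / per_cpu                          # CPUs equivalent to the memory request
        charged = (ncpus > mem_cpus) ? ncpus : mem_cpus   # charge for whichever is larger
        printf "%g SU\n", rate * charged * hours
    }'
}

estimate_su 2 4 60 "$(walltime_hours 3:00:00)"   # assumed normal rate: prints 90 SU
estimate_su 6 4 60 "$(walltime_hours 3:00:00)"   # assumed express rate: prints 270 SU
```

In this sketch the 60GB request, not the 4 CPUs, sets the cost: 60GB corresponds to 15 CPUs' worth of memory, so reducing the memory request is what would lower the charge.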

Storage resources

lquota

This command will list the quota and usage for all the projects you are a member of on both /scratch and /g/data. It will give usage and quota for both space and number of files.

nci-files-report

This command will list the usage in both space and number of files along different dimensions depending on options.

Usage owned by a project per user

If you want to know who owns files in your project or where those files are, you should use:

nci-files-report --group w35 --filesystem gdata

This is probably the most useful option for project managers who want to find who has the most data owned by a project.

Usage in a project directory per user

If you want to know who owns files in the main directory of one of your projects, you should use:

nci-files-report --project w35 --filesystem gdata

This is probably the least useful option for this function.

Usage per user

If you want to know your data footprint across all your projects in a filesystem, you should use:

nci-files-report --user --filesystem gdata

Why are the totals different in nci-files-report and Grafana?

CMS collects the data for Grafana, and we can only scan data that has group read permissions, so files without those permissions are missing from the Grafana totals. We ask everyone to set group read permissions on their files if there are no restrictions on your scientific project.
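
Group read permission can be checked and added with the standard find and chmod commands. A minimal sketch, assuming a POSIX shell; the function names are hypothetical, and it should only be run on directories you own and are allowed to share:

```shell
# Hypothetical helpers, not part of any NCI or CMS module.

# List files and directories under a path that lack group read permission.
list_not_group_readable() {
    find "$1" ! -perm -040
}

# Add group read permission recursively. The capital X adds execute
# (search) permission only to directories and to files that are
# already executable, so directories can be traversed by the group.
add_group_read() {
    chmod -R g+rX "$1"
}
```

Running list_not_group_readable on a directory before and after add_group_read shows what changed; after the change it should print nothing for that directory.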

ncfind and nccompress

These utilities are maintained by the CMS team. They allow you to find uncompressed netCDF files and to compress them easily. You'll find complete documentation on this wiki page.

du, find and wc

These commands can help you identify where the large files are in your area, or which directories contain lots of files. They are standard Unix commands and all have man pages with more information.

Useful options:

  • Check the sizes of sub-directories:
 du -shc *
  • Find files larger than a given size (100 MiB here):
 find . -type f -size +100M
  • Count the number of files and subdirectories in the current directory:
 ls -1 | wc -l

Note: there are many other commands and option variations for finding this type of information about your files and directories, and an Internet search is a good way to find them.
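
To see which subdirectories hold the most files, find and wc can be combined in a loop. A minimal sketch; the function name count_files_per_dir is hypothetical, hidden subdirectories are skipped by the * pattern, and file names containing newlines would be miscounted:

```shell
# Hypothetical helper: print the number of regular files under each
# immediate subdirectory of a directory (default: current directory),
# largest count first. The body runs in a subshell so that its
# variables do not leak into the calling shell.
count_files_per_dir() (
    top="${1:-.}"
    for d in "$top"/*/; do
        [ -d "$d" ] || continue                        # no subdirectories matched
        n=$(find "$d" -type f | wc -l | tr -d ' ')     # recursive file count
        printf '%s\t%s\n' "$n" "$d"
    done | sort -rn
)
```

This can be slow on very large directory trees, since every file is visited once.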

ncdu

ncdu is a disk usage analyzer with an ncurses interface. It is designed to find space hogs and aims to be fast, simple and easy to use.

This command is part of the conda environment maintained by CMS. To use:

 module purge
 module use /g/data3/hh5/public/modules
 module load conda/analysis3

To invoke it, run ncdu directory_path, replacing directory_path with the path to the directory you wish to check.

Some example output:

 ncdu /g/data3/hh5/public/apps

This gives a screen listing the items in the directory, sorted by size:

 --- /g/data3/hh5/public/apps ----------------------------------------------------
    64.0 GiB [##########] /miniconda3
   799.6 MiB [          ] /lrose
   152.3 MiB [          ] /easybuild
     4.0 KiB [          ]  README
 
  Total disk usage:  64.9 GiB  Apparent size:  64.6 GiB  Items: 2461484

Pressing <g> twice changes the way the usage is displayed:

 ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
 --- /g/data3/hh5/public/apps ----------------------------------------------------
    64.0 GiB [ 98.6% ##########] /miniconda3
   799.6 MiB [  1.2%           ] /lrose
   152.3 MiB [  0.2%           ] /easybuild
     4.0 KiB [  0.0%           ]  README
  Total disk usage:  64.9 GiB  Apparent size:  64.6 GiB  Items: 2461484

Select a directory, push <return> and it will show you a view of that directory sorted by size

 ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
 --- /g/data3/hh5/public/apps/miniconda3 ------------------------------------------
                               /..
    33.2 GiB [ 51.9% ##########] /pkgs
    22.6 GiB [ 35.3% ######    ] /envs  
    11.8 GiB [ 18.5% ###       ] /old-envs
     4.6 GiB [  7.1% #         ] /conda-bld
   236.6 MiB [  0.4%           ] /lib
     8.7 MiB [  0.0%           ] /share
     6.1 MiB [  0.0%           ] /bin
     5.1 MiB [  0.0%           ] /include
     4.9 MiB [  0.0%           ] /conda-meta
     4.3 MiB [  0.0%           ] /ssl
     1.9 MiB [  0.0%           ] /compiler_compat
    28.0 KiB [  0.0%           ] /etc
    12.0 KiB [  0.0%           ] /x86_64-conda_cos6-linux-gnu
     4.0 KiB [  0.0%           ] /locks
     4.0 KiB [  0.0%           ]  LICENSE.txt
  Total disk usage:  64.0 GiB  Apparent size:  63.7 GiB  Items: 2446705

Select /.. at the top to move back up the directory tree. In this way it is possible to navigate the directory structure and pinpoint the largest users of disk space. The disk usage information is cached, so once it has analysed a directory tree it is very fast to navigate.

Press <q> to quit the program.