Dusql

Revision as of 00:54, 29 August 2019 by S.wales

dusql is a disk usage analysis tool developed by CMS to help deal with data on our storage areas at NCI.

dusql is installed in the 'unstable' CMS conda environment. To use it, run:

module use /g/data/hh5/public/modules
module load conda/analysis3-unstable

dusql --help

Commands

ncdu: Finding Files Interactively

The simplest way to find files is to use the interactive viewer, dusql ncdu. This is a basic text interface that shows how many files match a given condition in each directory.

Say you want to find big files in your /short directory. You might run dusql ncdu /short/$PROJECT/$USER --size=10gb to find all the files larger than 10 GB.

du: Summarising a Directory

dusql du works the same way as ncdu: it shows the total size and file count of files matching some constraint under a directory, but rather than opening the text interface it prints a summary for each directory to the screen. You can also give it multiple directories, e.g. to find files under the current directory older than 3 years:

$ dusql du * --mtime=-3y | sort -hr
304.99GB,     6624 files, um-ostia
  4.76GB,      223 files, wrf-era
  3.41GB,     1003 files, access-cm2-ukca
  1.94GB,       98 files, mpas
919.57MB,        1 files, nu-wrf_v8-wrf371-lis71rp7.tgz

It's helpful to pipe the output of dusql du to sort -hr as shown above to order the paths by size, or sort -nr -k 2 to sort by file count.
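The two sort orders can be compared without access to the filesystem by replaying saved output. The sketch below is only an illustration: the helper name sample_output is hypothetical, and it prints the summary lines from the example above in shuffled order in place of a real dusql du run. It assumes a sort that supports -h, such as GNU sort.

```shell
# Hypothetical helper: prints the sample `dusql du` lines from the example
# above, deliberately out of order, instead of calling dusql itself.
sample_output() {
cat <<'EOF'
  1.94GB,       98 files, mpas
304.99GB,     6624 files, um-ostia
919.57MB,        1 files, nu-wrf_v8-wrf371-lis71rp7.tgz
  3.41GB,     1003 files, access-cm2-ukca
  4.76GB,      223 files, wrf-era
EOF
}

# Largest total size first: -h compares human-readable sizes (MB < GB),
# -r reverses the order.
by_size=$(sample_output | sort -hr)

# Most files first: -k 2 starts the sort key at the second
# whitespace-separated field (the file count), -n compares it numerically.
by_count=$(sample_output | sort -nr -k 2)

echo "Ordered by size:"
printf '%s\n' "$by_size"
echo
echo "Ordered by file count:"
printf '%s\n' "$by_count"
```

In real use, replace sample_output with the dusql du command itself. Note that the two orderings differ: access-cm2-ukca is third by size but second by file count.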

find: Listing Individual Files

dusql find prints the paths of all matching files. It can be helpful if there are just a few files you're trying to track down:

$ dusql find . --mtime=-7y | head
/short/w35/saw562/scratch/spherepack3.2/Makefile
/short/w35/saw562/scratch/wrf-era/FILE:2006-03-02_18
/short/w35/saw562/scratch/wrf-era/SST:2006-03-03_12
/short/w35/saw562/scratch/wrf-era/FILE:2006-03-01_00
/short/w35/saw562/scratch/wrf-era/SST:2006-03-01_12
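Because dusql find prints one path per line, its output can be passed to standard text tools. The sketch below is one possible post-processing step, not a dusql feature: it counts how many listed paths fall in each parent directory. The helper name sample_paths is hypothetical and replays the paths from the example above instead of running dusql.

```shell
# Hypothetical helper: prints the sample paths from the example above
# instead of calling `dusql find`.
sample_paths() {
cat <<'EOF'
/short/w35/saw562/scratch/spherepack3.2/Makefile
/short/w35/saw562/scratch/wrf-era/FILE:2006-03-02_18
/short/w35/saw562/scratch/wrf-era/SST:2006-03-03_12
/short/w35/saw562/scratch/wrf-era/FILE:2006-03-01_00
/short/w35/saw562/scratch/wrf-era/SST:2006-03-01_12
EOF
}

# Strip the final path component to get each file's parent directory,
# count how many paths share each directory, and list the busiest first.
per_dir=$(sample_paths | sed 's|/[^/]*$||' | sort | uniq -c | sort -nr)

printf '%s\n' "$per_dir"
```

With real data, pipe the dusql find command into the same sed, sort and uniq chain. The sed expression assumes that no file name contains a newline.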