Dusql
Revision as of 22:58, 29 August 2019
dusql is a disk usage analysis tool developed by CMS to help deal with data on our storage areas at NCI.
dusql is installed in the 'unstable' CMS conda environment. To use it, run:
module use /g/data/hh5/public/modules
module load conda/analysis3-unstable
dusql --help
Dusql looks at a file list stored in a database in order to reduce the load on the filesystem. This database is updated nightly, just like short_files_report, so it won't notice changed or deleted files until the next day.
We only scan the CLEX projects w35, w40, w42, w48, w97 and v45, and only members of these projects can access the database.
Commands
ncdu: Finding Files Interactively
The simplest way to find files is to use the interactive viewer, dusql ncdu. This is a basic text interface that shows how many files match a given condition in each directory.
Say you want to find big files in your /short directory. To find all the files larger than 10 GB, you might run:
dusql ncdu /short/$PROJECT/$USER --size=10gb
du: Summarising a Directory
dusql du works the same way as ncdu: it shows the total size and file count of files matching some constraint under a directory, but rather than opening the text interface it prints a summary for each directory to the screen. You can also give it multiple directories, e.g. to find files older than 3 years under each entry in the current directory:
$ dusql du * --mtime=-3y | sort -hr
304.99GB, 6624 files, um-ostia
4.76GB, 223 files, wrf-era
3.41GB, 1003 files, access-cm2-ukca
1.94GB, 98 files, mpas
919.57MB, 1 files, nu-wrf_v8-wrf371-lis71rp7.tgz
It's helpful to pipe the output of dusql du to sort -hr, as shown above, to order the paths by size, or to sort -nr -k 2 to order them by file count.
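Because dusql du prints one line per path in the form `SIZE, COUNT files, PATH`, both sort orders can be tried on the sample output above without running dusql at all. The sketch below simply replays those sample lines (shuffled) through `printf`; in practice you would pipe the real `dusql du` command into the same `sort` calls. The `LC_ALL=C` prefix is our addition, so that `.` is always read as the decimal point regardless of your locale.

```shell
# Sample lines copied from the `dusql du * --mtime=-3y` output, deliberately unordered.
sample='919.57MB, 1 files, nu-wrf_v8-wrf371-lis71rp7.tgz
3.41GB, 1003 files, access-cm2-ukca
304.99GB, 6624 files, um-ostia
1.94GB, 98 files, mpas
4.76GB, 223 files, wrf-era'

# Largest total first: -h understands the MB/GB suffixes, -r reverses the order.
by_size=$(printf '%s\n' "$sample" | LC_ALL=C sort -hr)

# Most files first: -k 2 starts the sort key at the second blank-separated
# field (the file count), and -n compares it as a number.
by_count=$(printf '%s\n' "$sample" | LC_ALL=C sort -nr -k 2)

printf 'By size:\n%s\n\nBy file count:\n%s\n' "$by_size" "$by_count"
```

Note that -k 2 relies on the size field containing no spaces, which holds for the output format shown here.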
find: Listing Individual Files
dusql find prints the paths of all matching files. It can be helpful if there are just a few files you're trying to track down:
$ dusql find . --mtime=-7y | head
/short/w35/saw562/scratch/spherepack3.2/Makefile
/short/w35/saw562/scratch/wrf-era/FILE:2006-03-02_18
/short/w35/saw562/scratch/wrf-era/SST:2006-03-03_12
/short/w35/saw562/scratch/wrf-era/FILE:2006-03-01_00
/short/w35/saw562/scratch/wrf-era/SST:2006-03-01_12
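Since dusql find prints one path per line, its output combines with ordinary text tools. As one possible sketch, the pipeline below counts matching files per parent directory. To keep it runnable without dusql it replays the sample paths above through `printf`; in practice you would pipe `dusql find . --mtime=-7y` into the same `sed | sort | uniq -c | sort -nr` stages.

```shell
# Sample paths copied from the `dusql find . --mtime=-7y | head` output.
sample='/short/w35/saw562/scratch/spherepack3.2/Makefile
/short/w35/saw562/scratch/wrf-era/FILE:2006-03-02_18
/short/w35/saw562/scratch/wrf-era/SST:2006-03-03_12
/short/w35/saw562/scratch/wrf-era/FILE:2006-03-01_00
/short/w35/saw562/scratch/wrf-era/SST:2006-03-01_12'

# sed drops the last path component, leaving each file's parent directory.
# uniq -c only merges adjacent duplicates, so sort first; the final sort -nr
# puts the directory with the most matching files at the top.
per_dir=$(printf '%s\n' "$sample" | sed 's|/[^/]*$||' | sort | uniq -c | sort -nr)

printf '%s\n' "$per_dir"
```

This assumes no path contains a newline, which is safe for typical model output directories.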
Filters
All the dusql commands accept a common set of filters. If a file doesn't match the filters, its size isn't included in the totals reported by ncdu and du:
--user=USER
Only matches a file if it is owned by username USER. Use --user=-USER to only match if the file is not owned by USER.
--group=GROUP
Only matches a file if it is owned by group GROUP. Use --group=-GROUP to only match if the file is not owned by GROUP.
--mtime=TIME
Only matches a file if it was last modified after TIME. Use --mtime=-TIME to only match if the file was last modified before TIME. TIME may be:
- A year, e.g. 2015
- A date, e.g. 20150326
- A time delta readable by Pandas, e.g. 1y6m (counted back from now, so --mtime=-1y matches files older than one year)
--size=SIZE
Only matches a file if it is larger than SIZE. Use --size=-SIZE to only match if the file is smaller than SIZE. SIZE can accept standard units, e.g. 10gb. If units aren't specified, the size is assumed to be in bytes.
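Since every command shares these filters, they can presumably be combined in a single call to narrow the match further. This page does not show a combined example, so treat that as an assumption. The helper below is hypothetical (the name `old_big_files` is ours, not part of dusql) and uses only the --size and --mtime forms documented above to list files that are both large and old.

```shell
# Hypothetical helper (not part of dusql): list files under a directory that
# are larger than SIZE and were last modified more than AGE ago.
# ASSUMPTION: dusql accepts several filters in one call.
#   $1  directory to search       (default: .)
#   $2  minimum size for --size   (default: 1gb)
#   $3  minimum age for --mtime   (default: 1y)
old_big_files() {
    dusql find "${1:-.}" --size="${2:-1gb}" --mtime=-"${3:-1y}"
}

# Example: files over 10gb that are older than 2 years in your /short space
#   old_big_files "/short/$PROJECT/$USER" 10gb 2y
```

Swapping `find` for `du` in the function body would give totals instead of individual paths.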
Things to Search For
- Files in your /short space not in the proper group
  dusql ncdu /short/$PROJECT/$USER --group=-$PROJECT
- Files in your /short space older than 1 year
  dusql ncdu /short/$PROJECT/$USER --mtime=-1y
  (Note: in some circumstances the file age can be inaccurate, e.g. if the file came from a tar file.)
- Files in your /g/data space larger than 10gb
  dusql ncdu /g/data/$PROJECT/$USER --size=10gb
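ncdu is interactive, so for a quick printed report the same three checks can be run with dusql du, which writes its summary to the screen instead. A minimal sketch, assuming `PROJECT` and `USER` are set as they are in the commands above; the function name `dusql_checks` is hypothetical.

```shell
# Hypothetical helper (not part of dusql): print the three checks above as
# plain dusql du summaries instead of opening the interactive ncdu viewer.
# Uses $PROJECT and $USER from the environment, like the ncdu commands above.
dusql_checks() {
    echo "== /short files not in group $PROJECT =="
    dusql du "/short/$PROJECT/$USER" --group=-"$PROJECT"

    echo "== /short files older than 1 year =="
    dusql du "/short/$PROJECT/$USER" --mtime=-1y

    echo "== /g/data files larger than 10gb =="
    dusql du "/g/data/$PROJECT/$USER" --size=10gb
}
```

Call it as `dusql_checks` once the conda module from the start of this page is loaded.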