Gadi
Latest revision as of 02:37, 1 June 2022
Gadi processors
Gadi has been NCI's primary supercomputer since January 2020. It is composed of an assortment of processors, the majority of which are Cascade Lake processors. Different queues give access to different processors and have different charging rates.
Queue | Memory | Priority | Charging rate per walltime-hour | CPUs per node | Processor type and node count |
---|---|---|---|---|---|
normal | 192 GB | Normal | 2 SU | 48 | Cascade Lake (CL), 3200 nodes |
normalbw | 256 GB | Normal | 1.25 SU | 28 | Broadwell (BW), 800 nodes |
express | 192 GB | High | 6 SU | 48 | Cascade Lake (CL), 3200 nodes |
copyq | 192 GB | Normal | 2 SU | 1-CPU jobs only | Cascade Lake (CL), 3200 nodes |
gpuvolta | 340 GB | Normal | 3 SU | 48 CPUs, 4 GPUs | 640 Nvidia V100 GPUs, 160 nodes |
Full details of all available queues are on the NCI help pages (https://opus.nci.org.au/display/Help/Queue+Limits).
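As a rough worked example of how the charging rates in the table translate into cost, the sketch below multiplies a queue's rate by the number of CPUs and the walltime in hours. It assumes the rate is charged per CPU per walltime-hour and ignores any extra charge for large memory requests. `job_su` is a hypothetical helper, not an NCI command, so check the NCI help pages for the authoritative formula.

```shell
#!/bin/bash
# Hypothetical helper: estimate the service units (SU) charged for a job.
# Assumption: cost = rate (SU per CPU-hour) x CPUs x walltime in hours.
job_su() {
    local rate=$1 ncpus=$2 hours=$3
    awk -v r="$rate" -v n="$ncpus" -v h="$hours" 'BEGIN { printf "%g\n", r * n * h }'
}

# One full Cascade Lake node (48 CPUs) in the normal queue for 1 hour
job_su 2 48 1
# One full Broadwell node (28 CPUs) in the normalbw queue for 2 hours
job_su 1.25 28 2
```

Under these assumptions the two calls print 96 and 70, which illustrates why the express queue (6 SU) is best kept for short, urgent jobs.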
Getting an account
To get a new account at NCI, you will need to join an NCI project. Before you start the process, ask your CI or supervisor which project code to use. You will need to apply via my.nci.org.au (https://my.nci.org.au/). NCI will send you a password via SMS once your application has been processed, which usually takes under a day.
Once you have an account, my.nci.org.au will allow you to ask for membership of other projects you might need. Those could be projects for additional compute time or projects to access data etc.
Connecting to Gadi
To connect to Gadi, you'll need to use an SSH connection to gadi.nci.org.au. If you're using Windows, you'll need something like PuTTY (http://www.putty.org/). If you're connecting from Linux or macOS, run the following on the command line (replacing abc123 with your own username):
ssh -Y abc123@gadi.nci.org.au
You can make a shortcut for this by editing (or creating) the file ~/.ssh/config and adding the lines:
Host gadi
HostName gadi.nci.org.au
User abc123
ForwardX11 true
ForwardX11Trusted true
This way you just need to type 'ssh gadi' to connect.
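If you would rather script the shortcut than edit the file by hand, something along these lines could work. `add_gadi_host` is a hypothetical helper name, and the sketch assumes a "Host gadi" line is the only thing that identifies an existing entry. It writes the same settings shown above and skips the file if the entry is already present.

```shell
#!/bin/bash
# Hypothetical helper: append a "Host gadi" block to an SSH config file
# unless one is already there. Arguments: config file path, NCI username.
add_gadi_host() {
    local cfg=$1 user=$2
    mkdir -p "$(dirname "$cfg")"
    touch "$cfg"
    # Do nothing if a "Host gadi" entry already exists
    if grep -qE '^[[:space:]]*Host[[:space:]]+gadi[[:space:]]*$' "$cfg"; then
        return 0
    fi
    cat >> "$cfg" <<EOF
Host gadi
    HostName gadi.nci.org.au
    User $user
    ForwardX11 true
    ForwardX11Trusted true
EOF
    # SSH expects the config file to be writable only by its owner
    chmod 600 "$cfg"
}

# Try it on a scratch file first rather than the real ~/.ssh/config
demo_cfg=$(mktemp)
add_gadi_host "$demo_cfg" abc123
cat "$demo_cfg"
```

Once you are happy with the result, the same call can be pointed at `~/.ssh/config` with your own username.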
Swapping Projects
If you use more than one project you can swap between them with the command 'switchproj', e.g.
switchproj w35
will start a new shell and change your current project to w35. This is useful if you want to interactively create files that will be owned by a specific project. You can also change your default project by editing the file ~/.config/gadi-login.conf on Gadi, which should contain a line like
PROJECT w35
Change the project code to the one you'd like, then log out and back in for the change to take effect. In the same ~/.config/gadi-login.conf file you can also change your default shell, which is the shell that is active when you log into Gadi.
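The edit to the PROJECT line can also be done non-interactively. The following is only a sketch: `set_default_project` is a hypothetical helper, and the `SHELL /bin/bash` line in the demonstration file is an assumed example of what else the file might contain. It is run here against a scratch copy, not the real configuration file.

```shell
#!/bin/bash
# Hypothetical helper: rewrite the "PROJECT ..." line of a login config file.
# Arguments: config file path, new project code.
set_default_project() {
    local conf=$1 project=$2 tmp status
    tmp=$(mktemp) || return 1
    # Replace the code on the PROJECT line; leave every other line untouched.
    # Writing back with cat keeps the original file's permissions.
    sed "s/^PROJECT[[:space:]].*/PROJECT $project/" "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}

# Demonstrate on a scratch file (the SHELL line is an assumed example)
demo_conf=$(mktemp)
printf 'PROJECT w35\nSHELL /bin/bash\n' > "$demo_conf"
set_default_project "$demo_conf" v45
cat "$demo_conf"
```

As with a manual edit, the new default only applies after you log out and back in.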
Resources on Gadi
To see how much compute time you have available, run the command
nci_account -P $PROJECT -q 2013.q3
To see how much storage space you have available, run the command
lquota -P $PROJECT
For more information, see the Accounting at NCI page.
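The quarter in the nci_account example above is hard-coded. A small sketch for building the label from a date follows. It assumes the label has the form YEAR.qN with calendar quarters, as the 2013.q3 example suggests, and `quarter_label` is a hypothetical helper. The final line only prints the command it would run.

```shell
#!/bin/bash
# Hypothetical helper: build a quarter label such as 2013.q3 from a year and month.
# Assumption: quarters follow the calendar (Jan-Mar is q1, and so on).
quarter_label() {
    local year=$1 month=$2
    # 10# forces base 10 so months like 08 are not read as octal
    local quarter=$(( (10#$month - 1) / 3 + 1 ))
    echo "${year}.q${quarter}"
}

# Label for today's date, shown inside the command from the text
current_quarter=$(quarter_label "$(date +%Y)" "$(date +%m)")
echo "nci_account -P \$PROJECT -q $current_quarter"
```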
Submitting Jobs
To run a job on the supercomputer you submit it to a job queue using the 'qsub' command. Jobs are shell script files that contain special '#PBS' marker lines saying what resources the job needs.
As an example, the script "hello.sh"
#!/bin/bash
#PBS -l ncpus=2
#PBS -l walltime=10:00
#PBS -l mem=1gb
#PBS -v PROJECT
echo "Hello"
says to run with 2 CPUs for a maximum time of 10 minutes, using up to 1 GB of memory. Anything after the #PBS lines is what gets run on the supercomputer; in this instance it just prints "Hello". Any output goes to a file in the directory you submitted the job from, named like "hello.sh.o123456", and error messages go to a file named like "hello.sh.e123456". The option '-v PROJECT' means run using the current project; you can also specify a project to use, like '-v PROJECT=w35'.
If the job tries to use more resources than it has asked for, it will be stopped automatically. The fewer resources you ask for, the more likely it is that your job will start quickly, so you should request an amount close to what the job actually uses.
See the NCI PBS documentation for more detail.
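When you need several similar jobs, it can help to generate the script from a template. The sketch below recreates the hello.sh example from its parameters. `write_job_script` is a hypothetical helper, and it uses only the #PBS lines shown above, so real jobs may need further directives described in the NCI PBS documentation.

```shell
#!/bin/bash
# Hypothetical helper: write a minimal PBS job script.
# Arguments: output file, number of CPUs, walltime, memory, command to run.
write_job_script() {
    local out=$1 ncpus=$2 walltime=$3 mem=$4 cmd=$5
    cat > "$out" <<EOF
#!/bin/bash
#PBS -l ncpus=$ncpus
#PBS -l walltime=$walltime
#PBS -l mem=$mem
#PBS -v PROJECT

$cmd
EOF
}

# Recreate the hello.sh example in a scratch directory
demo_dir=$(mktemp -d)
write_job_script "$demo_dir/hello.sh" 2 10:00 1gb 'echo "Hello"'
cat "$demo_dir/hello.sh"
```

On Gadi the generated file would then be submitted with qsub in the usual way.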
Managing Jobs
To see a list of your submitted and currently running jobs, run
nqstat
This also shows the resources each job has requested and is currently using. Each job in the queue has a run ID number associated with it (this is also printed when you submit a job with qsub). To get more information on a job, run
qstat -s 123456 # Show any status information, e.g. why the job isn't currently running
qstat -f 123456 # Show full information, including resources requested & environment variables
To remove a job from the queue, use qdel:
qdel 123456 # Remove the job 123456 from the queue
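In scripts it is convenient to capture the job ID when the job is submitted and reuse it later. The sketch below assumes qsub prints an identifier of the form NUMBER.SERVER (the value 123456.gadi-pbs is an assumed example, not real output). `job_number` is a hypothetical helper that keeps only the numeric part, and the last two lines only print the commands they would run.

```shell
#!/bin/bash
# Hypothetical helper: strip the server suffix from a PBS job identifier,
# e.g. "123456.gadi-pbs" -> "123456".
job_number() {
    echo "${1%%.*}"
}

# On Gadi the identifier would come from qsub, e.g. full_id=$(qsub hello.sh)
full_id="123456.gadi-pbs"   # assumed example of what qsub prints
id=$(job_number "$full_id")
echo "qstat -s $id"
echo "qdel $id"
```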
Copying and moving files
In general, the project code is retained when files are moved from one filesystem to another on Gadi (using the mv command). If you instead copy the files (using cp), the project code changes to that of the directory they are copied into if the setgid bit is set on it, which it often is, e.g.
$ ls -ld /scratch/v45
drwxrws--- 170 root v45 16384 May 20 20:16 /scratch/v45
The s in the group permissions indicates that the setgid bit is set, which means any files created there will have the group v45 rather than your default project code. Copying creates a new file, so the copy picks up the directory's group. When mv is used across filesystems, it effectively does a copy, on success changes the attributes to match the original, and then deletes the original. rsync behaves similarly, depending on the options used.
For more details on the setgid bit, see https://linuxconfig.org/how-to-use-special-permissions-the-setuid-setgid-and-sticky-bits
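To check for the setgid bit without reading the ls output by eye, the shell's -g test can be used. This is a small sketch run on a scratch directory. `has_setgid` is a hypothetical helper name, and on Gadi you would point it at a project directory such as /scratch/v45 instead.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the argument is a directory with setgid set.
has_setgid() {
    [ -d "$1" ] && [ -g "$1" ]
}

# Demonstrate on a scratch directory
demo_dir=$(mktemp -d)
chmod g+s "$demo_dir"
if has_setgid "$demo_dir"; then
    echo "setgid is set: new files inherit the directory's group"
fi
ls -ld "$demo_dir"
rmdir "$demo_dir"
```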