Difference between revisions of "Gadi"

m (A.heerdegen moved page Raijin to Gadi: Main compute resource is no longer raijin, changed to gadi)
Line 21: Line 21:
 
| 48
 
| 48
 
| Cascade Lake (CL), 3200 nodes
 
| Cascade Lake (CL), 3200 nodes
 +
|-
 +
| normalbw
 +
| 256 GB
 +
| Normal
 +
| 1.25SU
 +
| 28
 +
| Broadwell (BW), 800 nodes
 
|-
 
|-
 
| express
 
| express
Line 44: Line 51:
 
|}
 
|}
  
More processors will be added in the coming weeks.
+
Full [https://opus.nci.org.au/display/Help/Queue+Limits details of all available queues are available from the NCI help pages].
 
 
NCI does not yet have a User Guide for Gadi. But the notes to get [https://opus.nci.org.au/display/Help/Preparing+for+Gadi prepared for Gadi] will provide you with a wealth of information on the machine and its use.
 
  
 
= Getting an account =
 
= Getting an account =
Line 57: Line 62:
  
 
To connect to Gadi, you'll need to use a SSH connection to <span style="font-family:monospace">gadi.nci.org.au</span>. If you're using Windows, you'll need to use something like [http://www.putty.org/ PuTTY], or if you're connecting from linux or mac run on the commandline (substitute <span style="font-family:monospace">abc123</span> with your own username)
 
To connect to Gadi, you'll need to use a SSH connection to <span style="font-family:monospace">gadi.nci.org.au</span>. If you're using Windows, you'll need to use something like [http://www.putty.org/ PuTTY], or if you're connecting from linux or mac run on the commandline (substitute <span style="font-family:monospace">abc123</span> with your own username)
<syntaxhighlight lang="bash">
+
<syntaxhighlight lang="bash">ssh -Y abc123@gadi.nci.org.au
ssh -Y abc123@gadi.nci.org.au
 
 
</syntaxhighlight>
 
</syntaxhighlight>
  
Line 73: Line 77:
  
 
If you use more than one project you can swap between them with the command '<span style="font-family:monospace">switchproj</span>', e.g.
 
If you use more than one project you can swap between them with the command '<span style="font-family:monospace">switchproj</span>', e.g.
<syntaxhighlight lang="text">
+
<syntaxhighlight lang="text">switchproj w35
switchproj w35
 
 
</syntaxhighlight>
 
</syntaxhighlight>
will change your current project to w35. This is useful if you want to interactively create files that will be owned by a specific project.  
+
will start a new shell and change your current project to w35. This is useful if you want to interactively create files that will be owned by a specific project. You can also change your default project by editing the file on Gadi <span style="font-family:monospace">~/.config/gadi-login.conf</span>, it should have a line like <syntaxhighlight lang="text">PROJECT w35
You can also change your default project by editing the file on Gadi <span style="font-family:monospace">~/.config/gadi-login.conf</span>, it should have a line like
 
<syntaxhighlight lang="text">
 
PROJECT w35
 
 
</syntaxhighlight>
 
</syntaxhighlight>
You simply need to change the project code to the one you'd like. Then you need to log out and back in for the change to take effect.  
+
You simply need to change the project code to the one you'd like. Then you need to log out and back in for the change to take effect. In the <span style="font-family:monospace">~/.config/gadi-login.conf</span> file, you can also change your default shell. That is the active shell when you log into Gadi.  
In the <span style="font-family:monospace">~/.config/gadi-login.conf</span> file, you can also change your default shell. That is the active shell when you log into Gadi.
 
 
 
 
= Resources on Gadi =
 
= Resources on Gadi =
  
 
To see how much compute time you have available run the command
 
To see how much compute time you have available run the command
<syntaxhighlight lang="text">
+
<syntaxhighlight lang="text">nci_account -P $PROJECT -q 2013.q3
nci_account -P $PROJECT -q 2013.q3
 
 
</syntaxhighlight>
 
</syntaxhighlight>
  
 
To see how much storage space you have available run the command
 
To see how much storage space you have available run the command
<syntaxhighlight lang="text">
+
<syntaxhighlight lang="text">lquota -P $PROJECT
lquota -P $PROJECT
 
 
</syntaxhighlight>
 
</syntaxhighlight>
 +
 +
For more information see [[Accounting_at_NCI|Accounting_at_NCI]].
 +
 +
&nbsp;
  
 
= Submitting Jobs =
 
= Submitting Jobs =
Line 101: Line 101:
  
 
As an example the script "hello.sh"
 
As an example the script "hello.sh"
<syntaxhighlight lang="text">
+
<syntaxhighlight lang="text">#!/bin/bash
#!/bin/bash
 
 
#PBS -l ncpus=2
 
#PBS -l ncpus=2
 
#PBS -l walltime=10:00
 
#PBS -l walltime=10:00
Line 110: Line 109:
 
echo "Hello"
 
echo "Hello"
 
</syntaxhighlight>
 
</syntaxhighlight>
says to run with 2 cpus for a maximum time of 10 minutes. The job can use up to 1 GB of memory. Anything after the #PBS lines is what gets run on the supercomputer, in this instance it just prints "Hello" (any output goes to files in the directory you submitted the job named like "hello.sh.o123456", error messages go to files named like "hello.sh.e123456). The command '-v PROJECT' means run using the current project, you can also specify a project to use like '-v PROJECT=w35'.  
+
 
If the job tries to use more resources than it's asked for it will be automatically stopped. The less resources you ask for the more likely it is that your job will run quickly however, you should try to request an amount close to what the job actually uses.
+
says to run with 2 cpus for a maximum time of 10 minutes. The job can use up to 1 GB of memory. Anything after the #PBS lines is what gets run on the supercomputer, in this instance it just prints "Hello" (any output goes to files in the directory you submitted the job named like "hello.sh.o123456", error messages go to files named like "hello.sh.e123456). The command '-v PROJECT' means run using the current project, you can also specify a project to use like '-v PROJECT=w35'. If the job tries to use more resources than it's asked for it will be automatically stopped. The less resources you ask for the more likely it is that your job will run quickly however, you should try to request an amount close to what the job actually uses.
 +
 
 +
See the [https://opus.nci.org.au/display/Help/How+to+submit+a+job NCI PBS documentation] for more detail.
  
 
= Managing Jobs =
 
= Managing Jobs =
  
 
To see a list of your submitted & currently running jobs run
 
To see a list of your submitted & currently running jobs run
<syntaxhighlight lang="text">
+
<syntaxhighlight lang="text">nqstat
nqstat
 
 
</syntaxhighlight>
 
</syntaxhighlight>
This also shows how much resources each job has requested & is currently using.  
+
This also shows how much resources each job has requested & is currently using. Each job in the queue has a run id number associated with it (this is also printed when you submit a job with <span style="font-family:monospace">qsub</span>). To get more information on a job run <syntaxhighlight lang="text">qstat -s 123456 # Show any status information, e.g. why the job isn't currently running
Each job in the queue has a run id number associated with it (this is also printed when you submit a job with <span style="font-family:monospace">qsub</span>). To get more information on a job run
 
<syntaxhighlight lang="text">
 
qstat -s 123456 # Show any status information, e.g. why the job isn't currently running
 
 
qstat -f 123456 # Show full information, including resources requested & environment variables
 
qstat -f 123456 # Show full information, including resources requested & environment variables
 
</syntaxhighlight>
 
</syntaxhighlight>
Line 129: Line 126:
 
</syntaxhighlight>
 
</syntaxhighlight>
  
= Changes from Vayu =
+
= &nbsp; =
 
 
Vayu was NCI's previous supercomputer. There are some changes that need to be made to run jobs designed for Vayu run on Raijin.
 
 
 
The PBS flags
 
<syntaxhighlight lang="text">
 
#PBS -l vmem=2gb
 
#PBS -wd
 
</syntaxhighlight>
 
should be changed to <syntaxhighlight lang="text">
 
#PBS -l mem=2gb
 
#PBS -l wd
 
</syntaxhighlight>
 
 
 
The environment variable $PROJECT should be set before submitting a job, or a line like
 
<syntaxhighlight lang="text">
 
#PBS -v PROJECT=w35
 
</syntaxhighlight>
 
should be added to scripts.
 
Shared ACCESS data that used to be in the path
 
<syntaxhighlight lang="text">
 
/data/projects/access
 
</syntaxhighlight>
 
is now available under <syntaxhighlight lang="text">
 
~access/data
 
</syntaxhighlight>
 
  
 
[[Category:NCI]]
 
[[Category:NCI]]

Revision as of 21:36, 10 May 2021

This page needs updating.

Gadi processors

Gadi has been NCI's primary supercomputer since January 2020. It is composed of an assortment of processors, the majority of which are Cascade Lake processors. Different queues give access to different processors and have different charging rates.

Queue | Memory | Priority | Charging rate per walltime-hour | CPUs per node | Processor type
normal | 192 GB | Normal | 2 SU | 48 | Cascade Lake (CL), 3200 nodes
normalbw | 256 GB | Normal | 1.25 SU | 28 | Broadwell (BW), 800 nodes
express | 192 GB | High | 6 SU | 48 | Cascade Lake (CL), 3200 nodes
copyq | 192 GB | Normal | 2 SU | 1 CPU jobs only | Cascade Lake (CL), 3200 nodes
gpuvolta | 340 GB | Normal | 3 SU | 48 CPUs, 4 GPUs | 640 Nvidia V100 GPUs, 160 nodes

Full details of all queues are available from the NCI help pages.

Getting an account

To get a new account at NCI, you will need to be connected to an NCI project. Before you start the process, ask your CI or supervisor which project code to use. You will need to apply via my.nci.org.au. NCI will send you a password via SMS once your application has been processed; this usually takes under a day.

Once you have an account, my.nci.org.au will allow you to ask for membership of other projects you might need. Those could be projects for additional compute time or projects to access data etc.

Connecting to Gadi

To connect to Gadi, you'll need to use an SSH connection to gadi.nci.org.au. If you're using Windows, you'll need to use something like PuTTY. If you're connecting from Linux or macOS, run the following on the command line (replace abc123 with your own username):

ssh -Y abc123@gadi.nci.org.au

You can make a shortcut for this by editing (or creating) the file ~/.ssh/config and adding the lines:

Host              gadi
HostName          gadi.nci.org.au
User              abc123
ForwardX11        true
ForwardX11Trusted true

This way you just need to type 'ssh gadi' to connect.

Swapping Projects

If you use more than one project you can swap between them with the command 'switchproj', e.g.

switchproj w35

will start a new shell and change your current project to w35. This is useful if you want to interactively create files that will be owned by a specific project. You can also change your default project by editing the file ~/.config/gadi-login.conf on Gadi. It should have a line like

PROJECT w35

You simply need to change the project code to the one you'd like, then log out and back in for the change to take effect. In the same ~/.config/gadi-login.conf file you can also change your default shell, which is the shell that is active when you log into Gadi.
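The edit to the PROJECT line can also be scripted. The following is a minimal sketch, assuming the file contains a line that starts with "PROJECT " exactly as shown above; the function name set_default_project is hypothetical and is not an NCI tool.

```shell
#!/bin/bash
# set_default_project FILE CODE
# Rewrite the "PROJECT <code>" line in FILE and leave every other line alone.
# Hypothetical helper for illustration only.
set_default_project() {
    local conf=$1 project=$2 tmp
    tmp=$(mktemp) || return 1
    # Replace only lines that begin with the word PROJECT.
    sed "s/^PROJECT[[:space:]].*/PROJECT ${project}/" "$conf" > "$tmp" || return 1
    # Copy the result back in place so the file keeps its permissions.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Example (then log out and back in):
# set_default_project ~/.config/gadi-login.conf w35
```

Writing to a temporary file first means the original is only overwritten once sed has succeeded.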

Resources on Gadi

To see how much compute time you have available run the command

nci_account -P $PROJECT -q 2013.q3

To see how much storage space you have available run the command

lquota -P $PROJECT

For more information see Accounting_at_NCI.

 

Submitting Jobs

To run a job on the supercomputer you submit it to a job queue using the 'qsub' command. Jobs are shell scripts that contain special markers to say what resources the job needs.

As an example the script "hello.sh"

#!/bin/bash
#PBS -l ncpus=2
#PBS -l walltime=10:00
#PBS -l mem=1gb
#PBS -v PROJECT

echo "Hello"

says to run with 2 CPUs for a maximum time of 10 minutes. The job can use up to 1 GB of memory. Anything after the #PBS lines is what gets run on the supercomputer; in this instance it just prints "Hello". Any output goes to files in the directory you submitted the job from, named like "hello.sh.o123456"; error messages go to files named like "hello.sh.e123456". The option '-v PROJECT' means run using the current project; you can also specify a project to use like '-v PROJECT=w35'. If the job tries to use more resources than it has asked for it will be automatically stopped. The fewer resources you ask for, the more likely it is that your job will start running quickly; however, you should try to request an amount close to what the job actually uses.

See the NCI PBS documentation for more detail.
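As a minimal sketch of the whole workflow, the commands below write the hello.sh example to a file and submit it with qsub. The check for qsub is an addition for illustration, so that the snippet does nothing harmful when run somewhere other than a Gadi login node.

```shell
#!/bin/bash
# Write the example job script from above to hello.sh.
cat > hello.sh <<'EOF'
#!/bin/bash
#PBS -l ncpus=2
#PBS -l walltime=10:00
#PBS -l mem=1gb
#PBS -v PROJECT

echo "Hello"
EOF

# Submit only when qsub is available (i.e. on a Gadi login node).
if command -v qsub >/dev/null 2>&1; then
    qsub hello.sh    # prints the id of the submitted job
else
    echo "qsub not found; run this on a Gadi login node" >&2
fi
```

The quoted 'EOF' delimiter stops the shell from expanding anything inside the job script while it is being written.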

Managing Jobs

To see a list of your submitted & currently running jobs run

nqstat

This also shows how many resources each job has requested and is currently using. Each job in the queue has a run id number associated with it (this is also printed when you submit a job with qsub). To get more information on a job run

qstat -s 123456 # Show any status information, e.g. why the job isn't currently running
qstat -f 123456 # Show full information, including resources requested & environment variables

To remove a job from the queue use qdel

qdel 123456 # Remove the job 123456 from the queue
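The id printed by qsub can be captured in a variable and passed to qstat or qdel. This is a minimal sketch, assuming qsub prints an id of the form "123456.server"; the helper job_number is hypothetical and only strips everything after the first dot.

```shell
#!/bin/bash
# Strip the server suffix from a PBS job id, e.g. "123456.gadi-pbs" -> "123456".
# Hypothetical helper; the suffix format is an assumption.
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Only talk to the queue when the PBS commands are available.
if command -v qsub >/dev/null 2>&1; then
    jobid=$(qsub hello.sh)               # qsub prints the id of the new job
    qstat -s "$(job_number "$jobid")"    # show its status
    # qdel "$(job_number "$jobid")"      # uncomment to remove it from the queue
fi
```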