Difference between revisions of "Gadi"


Revision as of 23:55, 11 December 2019

This page needs updating.

Raijin was NCI's primary supercomputer from July 2013 to December 2019. It had 3592 compute nodes, each containing two 8-core Intel Sandy Bridge processors. Most nodes had 32 GB of memory, with around 1000 having 64 GB.

NCI have a user guide for Raijin on their website, which you should also read through.

Getting an account

If you don't already have an NCI account, you should apply for one. You'll also need to be connected to an NCI project for accounting purposes; your CI or supervisor should be able to give you the project code to use in the connection form. Once your application has been processed, NCI will send you a password via SMS; this usually takes under a day.

Connecting to Raijin

To connect to Raijin you'll need to make an SSH connection to raijin.nci.org.au. If you're using Windows you'll need an SSH client such as PuTTY; if you're connecting from Linux or Mac, run the following on the command line (substituting your own username for abc123):

ssh -Y abc123@raijin.nci.org.au

You can make a shortcut for this by editing the file ~/.ssh/config and adding the lines:

Host              raijin
HostName          raijin.nci.org.au
User              abc123
ForwardX11        true
ForwardX11Trusted true

This way you just need to type 'ssh raijin' to connect.
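If you set up several machines, a small helper can add the entry for you. The sketch below is a hypothetical convenience function (add_raijin_host is not an NCI tool). It appends the same Host block to a config file only if a "Host raijin" entry isn't already there. It also tightens the file's permissions, since ssh rejects a config file that other users can write to.

```shell
# Hypothetical helper (not an NCI tool): append the "raijin" shortcut to an
# ssh config file unless a "Host raijin" entry already exists.
# Usage: add_raijin_host ~/.ssh/config abc123
add_raijin_host() {
    local config="$1" user="$2"
    if [ -f "$config" ] && grep -Eq '^Host[[:space:]]+raijin$' "$config"; then
        return 0    # entry already present, leave the file alone
    fi
    mkdir -p "$(dirname "$config")"
    cat >> "$config" <<EOF
Host              raijin
HostName          raijin.nci.org.au
User              $user
ForwardX11        true
ForwardX11Trusted true
EOF
    chmod 600 "$config"    # ssh refuses config files writable by other users
}
```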

If you want to use VNC instead of X11 forwarding, see the guide VNC to Raijin.

Swapping Projects

If you use more than one project you can swap between them with the command 'switchproj', e.g.

switchproj w35

will change your current project to w35.

You can also change your default project by editing the file ~/.rashrc on Raijin, which should contain a line like

setenv PROJECT w35
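You can make this edit by hand. As a non-authoritative sketch, the hypothetical function below (set_default_project is an illustrative name) swaps the project code on the "setenv PROJECT" line and leaves every other line unchanged. It writes through a temporary file instead of relying on sed -i, whose syntax differs between GNU and BSD sed. If the file has no such line, nothing is changed.

```shell
# Hypothetical helper: change the project named on the "setenv PROJECT ..."
# line of a .rashrc-style file. Usage: set_default_project ~/.rashrc w35
set_default_project() {
    local rcfile="$1" project="$2" tmp
    tmp=$(mktemp) || return 1
    # Rewrite only the PROJECT line; all other lines are copied through unchanged.
    if sed "s/^setenv PROJECT .*/setenv PROJECT $project/" "$rcfile" > "$tmp"; then
        cat "$tmp" > "$rcfile"    # write back in place, keeping the file's permissions
    fi
    rm -f "$tmp"
}
```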

Resources on Raijin

To see how much compute time your project has available in a given quarter (here the third quarter of 2013), run the command

nci_account -P $PROJECT -q 2013.q3
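The -q argument names a quarter in the form YEAR.qN. The sketch below builds that label from a year and month so you don't have to work it out by hand. It assumes the quarters are calendar quarters (q1 = January to March), which fits the 2013.q3 example, but check this against NCI's documentation. The function name quarter_label is hypothetical.

```shell
# Hypothetical helper: print the quarter label (YEAR.qN) for a year and month,
# assuming calendar quarters, e.g. quarter_label 2013 8 prints 2013.q3.
quarter_label() {
    local year="$1" month="${2#0}"    # drop a leading zero so "08" is not read as octal
    echo "$year.q$(( (month - 1) / 3 + 1 ))"
}

# On Raijin, the current quarter could then be queried with:
#   nci_account -P "$PROJECT" -q "$(quarter_label "$(date +%Y)" "$(date +%m)")"
```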

To see how much storage space you have available, run the command

lquota -P $PROJECT

Submitting Jobs

To run a job on the supercomputer, you submit it to a job queue using the 'qsub' command. Jobs are shell scripts containing special #PBS lines that say what resources the job needs.

As an example, the script "hello.sh"

#!/bin/bash
#PBS -l ncpus=2
#PBS -l walltime=10:00
#PBS -l mem=1gb
#PBS -v PROJECT

echo "Hello"

says to run with 2 CPUs for a maximum time of 10 minutes, using up to 1 GB of memory. Everything after the #PBS lines is what gets run on the supercomputer; in this instance it just prints "Hello". Any output goes to a file in the directory you submitted the job from, named like "hello.sh.o123456", and error messages go to a file named like "hello.sh.e123456". The line '#PBS -v PROJECT' means the job runs using your current project; you can also specify a project explicitly, e.g. '-v PROJECT=w35'.
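PBS walltime values are written as [[hours:]minutes:]seconds, which is why walltime=10:00 above means 10 minutes rather than 10 hours. The sketch below converts such a value to seconds, which can help when comparing a request against how long a job actually ran. The function name is illustrative.

```shell
# Hypothetical helper: convert a PBS walltime ([[hh:]mm:]ss) to seconds,
# e.g. 10:00 -> 600 and 1:30:00 -> 5400.
walltime_to_seconds() {
    local total=0 rest="$1:" field
    while [ -n "$rest" ]; do
        field=${rest%%:*}                    # text up to the next colon
        rest=${rest#*:}                      # drop that field and its colon
        field=${field#"${field%%[!0]*}"}     # strip leading zeros ("08" is not octal)
        total=$(( total * 60 + ${field:-0} ))
    done
    echo "$total"
}
```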

If a job tries to use more resources than it asked for, it will be stopped automatically. The fewer resources you ask for, the sooner your job is likely to run, so try to request an amount close to what the job actually uses.
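To find out how long a job really needs, one option is to time the main command inside the job script. The hypothetical wrapper below runs a command, reports the elapsed wall time on standard error (so it ends up in the .e file), and passes through the command's exit status. The program name in the usage comment is a placeholder.

```shell
# Hypothetical helper: run a command and report its elapsed wall time on
# stderr, returning the command's own exit status.
# Usage inside a job script: run_timed ./my_model input.nml   (placeholder names)
run_timed() {
    local start end status
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $(( end - start )) s" >&2
    return "$status"
}
```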

Managing Jobs

To see a list of your submitted and currently running jobs, run

nqstat

This also shows how many resources each job has requested and is currently using.

Each job in the queue has a job ID number associated with it (this is also printed when you submit a job with qsub). To get more information on a job, run

qstat -s 123456 # Show any status information, e.g. why the job isn't currently running
qstat -f 123456 # Show full information, including resources requested & environment variables

To remove a job from the queue, use qdel

qdel 123456 # Remove the job 123456 from the queue
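In scripts it is handy to capture the ID that qsub prints instead of copying it by hand. The sketch below assumes qsub prints an ID of the form "123456.servername" (the server part varies). It strips that suffix to give the bare number, which is the form used in the output file names such as hello.sh.o123456. The helper name is hypothetical.

```shell
# Hypothetical helper: reduce a PBS job ID such as "123456.servername"
# to its numeric part, e.g. for matching hello.sh.o123456.
job_number() {
    echo "${1%%.*}"
}

# Example on the cluster (not run here):
#   jobid=$(qsub hello.sh)
#   qstat -s "$jobid"
#   ls hello.sh.o"$(job_number "$jobid")"
```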

Changes from Vayu

Vayu was NCI's previous supercomputer. Jobs designed for Vayu need a few changes before they will run on Raijin.

The PBS flags

#PBS -l vmem=2gb
#PBS -wd

should be changed to

#PBS -l mem=2gb
#PBS -l wd
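For a batch of old job scripts, the two substitutions above can be automated with sed. This is a minimal sketch (the function name is hypothetical). It only handles directives written exactly as shown, with vmem at the start of its own "-l" line and -wd alone on its line. Combined resource lists such as "-l walltime=1:00,vmem=2gb" still need editing by hand.

```shell
# Hypothetical helper: print a Vayu job script with the vmem and -wd
# directives rewritten for Raijin. Usage: vayu_to_raijin old.sh > new.sh
vayu_to_raijin() {
    sed -e 's/^#PBS -l vmem=/#PBS -l mem=/' \
        -e 's/^#PBS -wd[[:space:]]*$/#PBS -l wd/' \
        "$1"
}
```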

The environment variable $PROJECT should be set before submitting a job, or a line like

#PBS -v PROJECT=w35

should be added to scripts.
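Alternatively, you can check $PROJECT at submission time. The hypothetical wrapper below refuses to submit when PROJECT is empty and otherwise passes it to the job with qsub's -v option. It is a sketch of one approach, not an NCI-provided command.

```shell
# Hypothetical wrapper: only submit if $PROJECT is set, and pass it to the job.
# Usage: submit_with_project hello.sh
submit_with_project() {
    if [ -z "${PROJECT:-}" ]; then
        echo "PROJECT is not set; not submitting" >&2
        return 1
    fi
    qsub -v PROJECT="$PROJECT" "$@"
}
```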

Shared ACCESS data that used to be in the path

/data/projects/access

is now available under

~access/data
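Old scripts that mention the Vayu path can be updated with sed, as in the sketch below (the helper name is hypothetical). It only rewrites the path when it is followed by a slash, punctuation, or the end of the line, so longer names that merely start with "access" are left alone. Note that the shell only expands ~access when it is unquoted at the start of a word (or after = in a variable assignment). Paths that end up inside double quotes or in options like --dir=~access/data must be checked by hand.

```shell
# Hypothetical helper: print a script with /data/projects/access rewritten to
# ~access/data. Usage: update_access_paths old.sh > new.sh
update_access_paths() {
    sed -e 's|/data/projects/access\([^A-Za-z0-9_.-]\)|~access/data\1|g' \
        -e 's|/data/projects/access$|~access/data|' \
        "$1"
}
```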