CLEx induction

Revision as of 23:55, 27 November 2019 by H.wolff

The CMS team is here to help you with the technical aspects of your work. A lot of you (probably all) will have to work with computer code and software, usually in a Linux/Unix environment. You might also need to access and/or share large amounts of data, work on different servers, etc.

Some of you will be familiar with these tasks, whereas they will be new to others. We are here to help whatever your level of technical knowledge. We are happy to answer any question. We can't necessarily do everything, but feel free to ask: we might have ideas, or we might know who you should contact and exactly what to ask.

The members of the CMS team are:

Name | Home-based Institution | Email
Claire Carouge | ANU | c.carouge@unsw.edu.au
Aidan Heerdegen | ANU | aidan.heerdegen@anu.edu.au
Paola Petrelli | U Tasmania | paola.petrelli@utas.edu.au
Scott Wales | U Melbourne | scott.wales@unimelb.edu.au
Holger Wolff | Monash | holger.wolff@monash.edu
Danny Eisenberg | UNSW | d.eisenberg@unsw.edu.au

(Note: this is not a typo. Claire sits at ANU but has a UNSW email address.) Although we won't ignore emails sent to our individual addresses, our preferred method of contact is our help desk email address: cws_help@nf.nci.org.au. Contacting the help desk ensures your email is seen by one of us even if, for example, the person at your institution is away. Don't worry that the help desk address is an NCI account; you can still ask any question there.


Step 1: Get an account with NCI

NCI operates the computers that you will be doing most of your work on. NCI provides services to the Centre; it is not part of the Centre of Excellence.

To get access to NCI servers, you need to do three things:

1. You need to find out which project(s) you need to be connected to.

Your project decides who gets billed for what you do on the NCI servers. Your supervisor should be able to tell you which project(s) you should be connected to. Be mindful that there are two types of projects: projects for computation and projects for datasets. The mapping of the computational projects for CLEX can be found here. We recommend joining all the projects relevant to your research but choosing one default project to work from. The same page that lists the computational projects explains how to set your default project. The projects for datasets do not have any computational time allocated to them, so make sure you request at least one computation project. Your supervisor should know which one. Note that if you will run the ACCESS model or the UM model, you need to request connection to the "access" project in addition to at least one computation project.

2. You need to register with NCI.

This can be done on the NCI registration page. You will need to supply some information, and read and accept their policies. In order to be added to the system, you need to request connection to at least one project and be accepted on that project. This can take some time, because the lead chief investigator of the project has to receive and confirm your request via email. The email is sent automatically, so you don't need to write to the lead chief investigator yourself, although you might want to do so if you think you need to introduce yourself to them. You should be able to request connection to several projects while signing up.

3. You might want to ask your supervisor whether you also should be connected to any of the following data projects.

These don’t provide any computing allowances, but they give access to certain datasets or models that you might need.

Project | Purpose | Administrator
ua8 | access to ARCCSS/CLEx published data, GSWP3, NCEP Polar sst, ostia sst, CMIP5 ocean processing | CMS
rq7 | ECMWF Year of Tropical Convection re-analysis | CMS
rq5 | OFES - OGCM for the Earth Simulator ocean re-analysis | CMS
ub4 | ECMWF ERA Interim re-analysis 6 hrs data | CMS
cb20 | CMIP3 replica data | NCI
al33 | CMIP5 replica data | NCI
rr3 | CMIP5 published data | NCI
oi10 | CMIP6 replica data | NCI
rr7 | BoM re-analysis collection including ERA-Interim monthly data and JRA55, MERRA2 | BoM
cable | Give access to the CABLE model and CABLE benchmarking dataset. Please read the CABLE registration information. | CSIRO
access | Give access to the Unified Model and ACCESS tools | CMS - BoM - CSIRO

 

Step 2: Set up your Connection

The standard method of connecting to the NCI systems is SSH. We want you to use a passphrase protected private key authorisation method with an ssh-agent and agent forwarding for convenience. If you understood every part of that sentence, go ahead and set it up. If not, the next part describes what to do.

The instructions describe how to set up your connection to both gadi.nci.org.au and accessdev.nci.org.au. You cannot set up a connection to accessdev unless you are going to use the ACCESS or UM models, so please only follow the instructions you need!

 

Linux and MacOS:

  1. Create the ssh directory on your computer
    $ mkdir -p ~/.ssh
    
  2. Create a config file
    $ nano ~/.ssh/config
    
    Enter the following contents, replacing xx0000 with your NCI user name.
    Host gadi
        HostName gadi.nci.org.au
        User xx0000
        ForwardX11 yes
        ForwardX11Trusted yes
    Host access
        HostName accessdev.nci.org.au
        User xx0000
        ForwardX11 yes
        ForwardX11Trusted yes
        ForwardAgent yes
  3. Create a key pair. It is imperative that you select a strong passphrase when asked for it.
    $ ssh-keygen -t rsa
    
  4. Distribute the public key. The private key stays on your computer. On most systems, there is a useful script called ssh-copy-id that copies the public key to the server:
    $ ssh-copy-id gadi
    $ ssh-copy-id access
    
    If that script fails, you have to copy the public key manually with these commands:
    $ cat ~/.ssh/id_rsa.pub | ssh gadi "mkdir -p ~/.ssh/; cat >> ~/.ssh/authorized_keys"
    $ cat ~/.ssh/id_rsa.pub | ssh access "mkdir -p ~/.ssh/; cat >> ~/.ssh/authorized_keys"
    
  5. Test whether you have an agent. If you have an agent already running, this command will ask for the passphrase for the just-created key pair:
    $ ssh-add
    

If the last command told you that it couldn't open a connection, then you don't have an agent. Come and talk to someone from the CMS team, and we will help you set it up.
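The check in step 5 can also be scripted. The following is a minimal sketch, assuming an OpenSSH client; the function name describe_agent_status is hypothetical. It relies on the documented exit status of `ssh-add -l`: 0 when the agent holds at least one key, 1 when it holds none, and 2 when no agent can be contacted.

```shell
#!/bin/bash
# Translate the exit status of `ssh-add -l` into a readable message.
# describe_agent_status is a hypothetical helper name, not part of OpenSSH.
describe_agent_status() {
    case "$1" in
        0) echo "agent running, key loaded" ;;
        1) echo "agent running, no key loaded: run ssh-add" ;;
        2) echo 'no agent reachable: try eval "$(ssh-agent -s)" and then ssh-add' ;;
        *) echo "unexpected status $1 (is ssh-add installed?)" ;;
    esac
}

# Query the agent if ssh-add is installed; 127 stands for "command missing".
rc=127
if command -v ssh-add >/dev/null 2>&1; then
    rc=0
    ssh-add -l >/dev/null 2>&1 || rc=$?
fi
describe_agent_status "$rc"
```

An agent started by hand with `eval "$(ssh-agent -s)"` only lives in the current terminal session, so it is a stopgap rather than a permanent setup. Once the agent holds your key, `ssh gadi` should log you in without asking for a password.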

Windows

Windows is not Unix based, so it doesn't come with standard SSH programs. The most popular SSH program for Windows is PuTTY, available here. You will need at least PuTTY, Pageant (the agent), and PuTTYgen (the key generator). A nice video on how to set it up is on YouTube.

Windows also doesn't come with an X11 server, which is needed to display graphical user interfaces. At this point in time, we suggest something like Xming. The last free version of Xming came out in 2007, and this version should be sufficient. If you really want to, you can 'donate' GBP10 (ca. $20) to get an up-to-date version of Xming.

Another option is to install Cygwin, which is a large collection of Linux utilities that run on Windows. Cygwin includes a shell, SSH, and an X11 server, as well as many other useful tools.

SSH-keys for file transfers

This section describes how to transfer data to the raijin supercomputer. It has not been updated for the new gadi supercomputer.

Additionally, some groups transfer data from NCI to outside NCI, for example for storage on university-maintained servers. For these transfers, the best way to proceed is to create a restricted ssh key pair used only for file transfer. The setup is explained on this NCI page. This page has not been updated since the Vayu machine. Most of the instructions are still valid, except for step 3. The restricted command prefix should now be:

from="chipmunk*.nci.org.au,r-dm*.nci.org.au,gopher*.nci.org.au,raijin*.nci.org.au",command="~/bin/rrsync /data/archive",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,no-user-rc ssh-rsa

Step 3: Basics of the NCI system

We mainly use three NCI systems: raijin, accessdev and VDI. Raijin is the supercomputer that we run our models on. Accessdev is a virtual server on which the software required to run the models is installed. VDI is a virtual desktop designed for interactive analysis of your results. All these servers run Linux as their operating system.

Read the manual!

Considering you will be using systems that are provided by the NCI, you should take the time to read the documentation provided by them about their systems. That is everything on their help knowledge base.

Additional notes

Modules:

The servers have a lot of software installed, some of it in multiple versions. To use a piece of software, you will usually need to load a certain module. Full help for the module command is available through the man pages on gadi. Here are the most important module commands:

$ module help                              Show help about the module system
$ module avail                             List all available modules (a lot!)
$ module list                              List the currently loaded modules
$ module load intel-compiler               Load the default version of the intel compiler
$ module load intel-compiler/2019.3.199    Load a specific version of the intel compiler
$ module switch intel-compiler/2019.3.199  Switch to a different version of the intel compiler (unload and load)
$ module unload intel-compiler/2019.3.199  Unload the specified version of the intel compiler
$ module use /g/data/hh5/public/modules    Add our Centre's curated modules to the list of available modules

Some modules conflict with one another, for example different versions of the same software. You might have to unload one module first to load a different one.
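As an illustration of handling such a conflict in a script, here is a minimal sketch that unloads whatever version of a package is loaded before loading the one you want. The helper name switch_compiler is hypothetical; the module names and the path are the ones from the command list above. The guard at the end means the script does nothing on a machine without the module command.

```shell
#!/bin/bash
# Load one specific compiler version, unloading any other version first.
# switch_compiler is a hypothetical helper name.
switch_compiler() {
    local wanted="$1"            # e.g. intel-compiler/2019.3.199
    local name="${wanted%%/*}"   # strip the version: intel-compiler
    module unload "$name"        # avoid a conflict with a loaded version
    module load "$wanted"
    module list                  # show what is loaded now
}

# Only run where the module command exists (for example on gadi).
if command -v module >/dev/null 2>&1; then
    module use /g/data/hh5/public/modules
    switch_compiler intel-compiler/2019.3.199
fi
```

The single command `module switch intel-compiler/2019.3.199` does the same unload-and-load in one step; the explicit version here just makes the two steps visible.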

PBS:

The other important system you need to understand in order to work on raijin is the PBS scheduler, which runs software on dedicated nodes inside raijin. Basically, a user tells PBS that they want to run a job on so many CPUs, with so much memory, for up to so long. PBS places the job in a queue and, when the required resources become available, starts it on them.

Again, NCI has very good documentation about how to run jobs here.

The most important commands are:

qsub           Submit a job to the queue
qstat, nqstat  Display the status of the queued jobs
qdel           Remove a job from the queue

Of these, qsub is the most important, and it has so many parameters that we won't list them all here.
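To give an idea of what a submission looks like, here is a minimal job script sketch. The project code (a00), the queue name (normal), the job name and all resource values are placeholders chosen for illustration, not recommendations; check the NCI documentation for the options that apply to your project. Lines starting with #PBS are read by qsub as options and ignored by the shell.

```shell
#!/bin/bash
# Minimal PBS job script sketch. Project code, queue, job name and
# resource values below are placeholders (assumptions), not recommendations.
#PBS -P a00
#PBS -q normal
#PBS -l ncpus=16
#PBS -l mem=31GB
#PBS -l walltime=01:00:00
#PBS -N example_job

# PBS sets PBS_O_WORKDIR to the directory qsub was called from.
# Outside PBS the variable is unset, so fall back to the current directory.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# PBS_JOBID is set by PBS when the job runs; use a label otherwise.
job_label="${PBS_JOBID:-interactive}"
echo "Job ${job_label} running in $(pwd)"
```

Saved as job.sh, the script would be submitted with `qsub job.sh`, monitored with `qstat` and removed with `qdel` followed by the job ID that qsub printed.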

In general, the fewer resources you request, the earlier your job will run. But if your program exceeds the requested resources at any time, PBS will kill the job, and you have to start over.

More specifically:

  • Raijin is made up of nodes with 16 CPUs each. If you request fewer than 16 CPUs, you will get the number of CPUs that you request. If you request more, you will get as many nodes as you need. So for jobs with more than 16 CPUs it is usually a good idea to request a multiple of 16 CPUs.
  • Nodes come with between 32GB and 126GB of memory, shared among the 16 CPUs. So if your job requires 16 or more CPUs, you might as well request just under 32GB per node, for example 95GB for a 48-CPU job (3 nodes).
  • If you request more than 32GB per 16 CPUs, your job may have to wait longer for enough high-memory nodes to become available.
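The arithmetic behind these rules of thumb can be sketched as a small shell function. It assumes the figures given above (16 CPUs and 32GB on a standard node); the function name suggest_request is hypothetical. The bullets give no memory rule for jobs under 16 CPUs, so the function leaves those requests unchanged.

```shell
#!/bin/bash
# Round a CPU request up to whole nodes and derive a memory request that
# stays just under the capacity of the smallest nodes.
# Assumptions from the text: 16 CPUs and 32 GB per standard node.
# suggest_request is a hypothetical helper name.
suggest_request() {
    local cpus="$1"
    local cpus_per_node=16
    local mem_per_node_gb=32

    # Small jobs get exactly the CPUs they ask for.
    if [ "$cpus" -lt "$cpus_per_node" ]; then
        echo "ncpus=${cpus}"
        return 0
    fi

    # Integer ceiling division gives the number of nodes needed.
    local nodes=$(( (cpus + cpus_per_node - 1) / cpus_per_node ))
    local total_cpus=$(( nodes * cpus_per_node ))
    local mem_gb=$(( nodes * mem_per_node_gb - 1 ))
    echo "ncpus=${total_cpus} mem=${mem_gb}GB"
}

suggest_request 48   # prints: ncpus=48 mem=95GB
suggest_request 40   # prints: ncpus=48 mem=95GB
```

A request for 40 CPUs is rounded up to 48 because it needs three nodes anyway, and three nodes of 32GB give 96GB, hence the 95GB request.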

Step 4: How to run models

A lot of information on gaining access to the models and running them can be found on the ARCCSS CMS wiki (see link at the end).

A very important message from the CMS team: Often you will work with source code. Please ensure that your version of the source code is under version control! We use SVN and git (usually hosted on github) for the model codes and smaller utilities. DO NOT JUST COPY SOURCE CODE! CHECK IT OUT, OR CHECK IT IN!
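One quick way to find out whether a directory of source code is already under version control is to ask git and SVN directly. The sketch below is only an illustration; the function name is_versioned is hypothetical, and it uses `git rev-parse --is-inside-work-tree` and `svn info`, both of which fail outside a working copy.

```shell
#!/bin/bash
# Report whether a directory is inside a git or SVN working copy.
# is_versioned is a hypothetical helper name.
is_versioned() {
    local dir="${1:-.}"
    if git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        echo "git"
    elif svn info "$dir" >/dev/null 2>&1; then
        echo "svn"
    else
        echo "none"
        return 1
    fi
}

# Check the current directory before editing any source code in it.
is_versioned . || echo "Not under version control: check the code out first."
```

If the answer is "none", the code was most likely copied rather than checked out, and the safest fix is to check out a fresh copy from the repository and work there.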

UM

The Unified Model is currently transitioning from the old user environment, umui, to the new rose/cylc user interface. Either way, you need to log into accessdev.nci.org.au.

umui(x)

The old user interface has to be used for versions up to 7.x, and can be used for 8.x. umui was developed by the UM developers; umuix, developed by CSIRO, is very similar but more convenient. An introduction to its use is on our wiki: http://climate-cms.unsw.wikispaces.net/Introduction+to+UMUI

rose/cylc

The new user interface can be used from version 8.x, and has to be used for versions 9.0 and later.

WRF

The CMS team ports the WRF model to Raijin. The latest versions can be found under /projects/WRF on Raijin. From WRF v3.6.0 onwards, the WRF codes under /projects are managed with Git. Information on using the model is on the CMS wiki. Users should be aware that we do not reproduce the information from the WRF model users website, which remains a very important resource for all users. In particular, every new user is strongly encouraged to run through the official tutorials (found under User Support on the WRF model users site).

CABLE

CABLE is under licence, so you need to request access to the code. All the information you need to get access to and use CABLE is on the CABLE trac site.

Data management

Most of the information on how data is managed is on the Data Services page of this wiki. There you can find information on the datasets managed by the Centre as well as the relevant data policies. Take some time to get familiar with it. Here are some suggestions:

  1. Read the Centre data policy and how to prepare a data management plan. You can use the ARCCSS DMPonline tool to do this. Because this tool stores your plan in a database, you can save it and access it as many times as you want, which means you can build your data management plan as you progress with your research. It is good to start as soon as you can, though. DMPonline is structured to teach you about data practices and the resources available to you. It also tells the CMS team which data, software and/or infrastructure you are planning to use, so we can better tailor our efforts to your needs.
  2. Get familiar with publishing requirements. While this might seem far away, it is a good idea to keep track from now on of what you are doing with data and metadata, as it is often much more difficult to retrieve this information at the end of your project. Getting data properly published also takes some technical processing time.
  3. Consider creating a researcher-ID to uniquely identify your work.
  4. Make sure you browse through all the available data sources before you download data. Better still, send an e-mail to climate_help to make sure you are not wasting time downloading something which might already be available, or which we could download and manage for you.
  5. Become familiar with the different filesystems available on any server you are using. Make sure you know which are meant to be working spaces, where you can archive your data, how to share data with collaborators, and the best practices for transferring data within and out of the system. If you are using raijin on NCI, you can find this information in their user support and training sections.
  6. You can explore a lot of data-related information on the Australian National Data Service website.
  7. Don't hesitate to contact climate_help with any question you might have. You can also e-mail paola.petrelli@utas.edu.au directly if you have a more general question or are confused about data management. We are always available to discuss the details of your project with you and give you advice.

How to learn more

General setup:

a blog post by a UNSW PhD student on a few things to know and learn: Oliver Angélil's post

CMS:

this wiki

NCI:

Guides and reference pages

Training material and this

Version Control with git:

(Note: git might also be very useful for your thesis!) Git video tutorial by GitHub

Data management:

Data induction

Getting help

Your first port of call for help should be the CMS team via the help desk: cws_help@nci.org.au. This help desk is monitored by all the members of the CMS team and by some staff from NCI.