CLEx induction
Revision as of 17:45, 11 March 2020
The CMS team is here to help you with the technical aspects of your work. A lot of you (probably all) will have to work with computer code and software, usually in a Linux/Unix environment. You might also need to access and/or share large amounts of data, work on different servers, etc.
Some of you will be somewhat familiar with these tasks, whereas they will be new to others. We are here to help whatever your technical knowledge. We are happy to answer any question. We can't necessarily do everything, but feel free to ask: we might have ideas, or we might know whom you should contact and exactly what to ask.
The members of the CMS team are:
|Paola Petrelli||U Tasmania||firstname.lastname@example.org|
|Scott Wales||U Melbourne||email@example.com|
(Note: this is not a typo; Claire is based at ANU but has a UNSW email address.) Although we won't ignore emails sent to our individual addresses, our preferred contact is our help desk email address: firstname.lastname@example.org. Contacting the help desk ensures that one of us sees your email even if, for example, the person at your institution is away. Don't worry about the help desk address being an NCI account; you can still ask any question there.
- 1 Step 1: Get an account with NCI
- 2 Step 2: Set up your Connection
- 3 Step 3: Basics of the NCI system
- 4 Step 4: How to run models
- 5 Data management
- 6 How to learn more
- 7 Getting help
Step 1: Get an account with NCI
NCI operates the computers that you will be doing most of your work on. NCI provides services to the Centre but is not part of the Centre of Excellence.
To get access to NCI servers, you need to do three things:
1. You need to find out which project(s) you need to be connected to.
Your project decides who gets billed for what you do on the NCI servers. Your supervisor should be able to tell you which project(s) you should be connected to. Be mindful that there are two types of projects: projects for computation and projects for datasets. The mapping of the computational projects for CLEx can be found here. We recommend joining all the projects relevant to your research but choosing one default project to work from; the same page with the list of computational projects explains how to set your default project. The dataset projects have no computational time allocated to them, so make sure you ask for at least one computational project (your supervisor should know which). Note that if you will run the ACCESS model or the UM, you need to request connection to the "access" project in addition to at least one computational project.
2. You need to register with NCI.
This can be done on the NCI registration page. You will need to supply some information, and read and accept their policies. To be added to the system, you need to request connection to at least one project and be accepted on it. This can take some time, because the lead chief investigator of the project has to receive and confirm your request via email. This email process is automatic, so you don't need to email the lead chief investigator yourself, although you might want to if you wish to introduce yourself. You should be able to request connection to several projects while signing up.
3. You might want to ask your supervisor whether you should also be connected to any of the following data projects.
These don’t provide any computing allowances, but they give access to certain datasets or models that you might need.
|ua8||Access to ARCCSS/CLEx published data, GSWP3, NCEP polar SST, OSTIA SST, CMIP5 ocean processing||CMS|
|rq7||ECMWF Year of Tropical Convection re-analysis||CMS|
|rq5||OFES - OGCM for the Earth Simulator ocean re-analysis||CMS|
|ub4||ECMWF ERA-Interim re-analysis, 6-hourly data||CMS|
|cb20||CMIP3 replica data||NCI|
|al33||CMIP5 replica data||NCI|
|rr3||CMIP5 published data||NCI|
|oi10||CMIP6 replica data||NCI|
|rr7||BoM re-analysis collection including ERA-Interim monthly data and JRA55, MERRA2||BoM|
|cable||Gives access to the CABLE model and the CABLE benchmarking dataset. Please read the CABLE registration information.||CSIRO|
|access||Gives access to the Unified Model and ACCESS tools||CMS - BoM - CSIRO|
Step 2: Set up your Connection
The standard method of connecting to the NCI systems is SSH. We want you to use passphrase-protected private key authentication with an ssh-agent, and agent forwarding for convenience. If you understood every part of that sentence, go ahead and set it up. If not, the next part describes what to do.
The instructions describe how to set up your connection to both gadi.nci.org.au and accessdev.nci.org.au. You cannot set up a connection to accessdev unless you are going to use the ACCESS or UM models, so please follow only the instructions you need!
Linux and MacOS:
- Create the ssh directory on your computer
$ mkdir -p ~/.ssh
- Create a config file
$ nano ~/.ssh/config
Enter the following contents, replacing xx0000 with your NCI user name.
Host gadi
    HostName gadi.nci.org.au
    User xx0000
    ForwardX11 yes
    ForwardX11Trusted yes
Host access
    HostName accessdev.nci.org.au
    User xx0000
    ForwardX11 yes
    ForwardX11Trusted yes
    ForwardAgent yes
If you use a Mac, and you want to store the key in your keychain so that it gets automatically loaded when you log in, you can add this section to your config file:
Host *
    AddKeysToAgent yes
    UseKeychain yes
- Create a key pair. It is imperative that you select a strong passphrase when asked for it.
$ ssh-keygen -t rsa
- Distribute the public key. On most systems, there is a useful script for this called ssh-copy-id:
$ ssh-copy-id gadi
$ ssh-copy-id access
If that script fails, you have to copy the key manually with these commands:
$ cat ~/.ssh/id_rsa.pub | ssh gadi "mkdir -p ~/.ssh/; cat >> ~/.ssh/authorized_keys"
$ cat ~/.ssh/id_rsa.pub | ssh access "mkdir -p ~/.ssh/; cat >> ~/.ssh/authorized_keys"
- Test whether you have an agent. If an agent is already running, this command will ask for the passphrase of the key pair you just created:
$ ssh-add
If the last command told you that it couldn't open a connection, then you don't have an agent. Come and talk to someone in the CMS team; we will help you set it up.
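If no agent is running, the sketch below shows one common way to start one for the current shell session. It assumes a bash-like shell with the OpenSSH tools installed. It is not the CMS team's official setup, and many desktops already start an agent for you at login.

```shell
# Sketch: make sure an ssh-agent is reachable from this shell.
# ssh-add -l exits with 0 (keys loaded), 1 (agent running, no keys) or 2 (no agent).
agent_status=0
ssh-add -l >/dev/null 2>&1 || agent_status=$?

if [ "$agent_status" -eq 2 ]; then
    # No agent found: start one and export its variables into this shell.
    eval "$(ssh-agent -s)" >/dev/null
fi

# Then load your key; this asks for its passphrase once per session:
#   ssh-add ~/.ssh/id_rsa
```

An agent started this way only lasts as long as the session that started it, so you will need to repeat this (or ask us about a more permanent setup) after logging in again.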
Windows is not Unix-based, so it doesn't traditionally come with standard SSH programs. The most popular SSH program for Windows is PuTTY, available here. You will need at least PuTTY, Pageant (the agent) and PuTTYgen (the key generator). A nice video on how to set it up is on YouTube. Windows also doesn't come with an X11 server, which is needed to display graphical user interfaces. We currently suggest an X server such as Xming.
Another option is to install Cygwin, which is a large collection of Linux utilities that run on Windows. Cygwin includes a shell, SSH, and an X11 server, as well as many other useful tools.
SSH keys for file transfers
Some groups also transfer data from NCI to external systems, for example for storage on university-maintained servers. For these transfers, the best approach is to create a restricted SSH key pair used only for file transfer. The setup is explained in Using restricted commands for transferring files. Most of those instructions are still valid, except for step 3, where the restricted command prefix should now be:
from="gadi-dm*.nci.org.au,gadi*.nci.org.au",command="~/bin/rrsync /data/archive",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,no-user-rc ssh-rsa
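As an illustration, the sketch below assembles the line that goes into ~/.ssh/authorized_keys on the receiving (university) server. That line is the restriction prefix above followed by the public half of a dedicated key, which you would create on gadi. The key file name id_rsa_transfer is hypothetical. Creating the key without a passphrase, so that unattended copies work, is our assumption, so check the referenced instructions before relying on it.

```shell
# Sketch: build the authorized_keys line for a dedicated file-transfer key.
# id_rsa_transfer is a hypothetical file name.
TRANSFER_KEY="$HOME/.ssh/id_rsa_transfer"
mkdir -p "$HOME/.ssh"

# Create the key only if it does not exist yet. -N "" means no passphrase,
# assuming the key is used for unattended transfers (check the NCI page first).
if [ ! -f "$TRANSFER_KEY" ]; then
    ssh-keygen -t rsa -f "$TRANSFER_KEY" -N "" -q
fi

# Restriction prefix from the text; the public key (starting "ssh-rsa") follows it.
PREFIX='from="gadi-dm*.nci.org.au,gadi*.nci.org.au",command="~/bin/rrsync /data/archive",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,no-user-rc'
AUTH_LINE="$PREFIX $(cat "$TRANSFER_KEY.pub")"

# Print the line to paste into ~/.ssh/authorized_keys on the receiving server.
echo "$AUTH_LINE"
```

The prefix is single-quoted so that ~/bin/rrsync is written literally and only expanded on the receiving server.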
Step 3: Basics of the NCI system
We mainly use three NCI systems: gadi, accessdev and the VDI. Gadi is the supercomputer we run our models on. Accessdev is a virtual server with the software required to set up and run the models. The VDI (Virtual Desktop Infrastructure) provides virtual desktops designed for interactive analysis of your results. All these servers run Linux.
Read the manual!
Because you will be using systems provided by NCI, take the time to read the documentation NCI provides about them, that is, everything in their help knowledge base.
The servers have a lot of software installed, often in several versions. To use a piece of software, you will usually need to load the corresponding module. Full help for the module command is available through its man page on gadi (man module). Here are the most important module commands:
|$ module help||Show help about the module system|
|$ module avail||List all available modules (a lot!)|
|$ module list||List the currently loaded modules|
|$ module load intel-compiler||Load the default version of the intel compiler|
|$ module load intel-compiler/2019.3.199||Load a specific version of the intel compiler|
|$ module switch intel-compiler/2019.3.199||Switch to a different version of the intel compiler (unload and load)|
|$ module unload intel-compiler/2019.3.199||Unload the specified version of the intel compiler|
|$ module use /g/data/hh5/public/modules||Add our Centre's curated modules to the list of available modules|
Some modules conflict with one another, for example different versions of the same software. You might have to unload one module first to load a different one.
The other important system you need to understand to work on gadi is the PBS scheduler, which runs your software on dedicated compute nodes inside gadi. Basically, you tell PBS that you want to run a job on this many CPUs, with these resources, for at most this long. PBS places the job in a queue and, when the required resources become available, runs the job on them.
Again, NCI has very good documentation on how to use it here.
The most important commands are:
- qsub – Submit a job to the queue
- qstat – Display the status of the queued jobs
- qdel – Remove a job from the queue
Of these, qsub is the most important. It has so many parameters that we won't list them all here.
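Here is a sketch of a small job script. The #PBS lines hold the qsub options, so you do not have to type them on the command line. The project code xx00, the module and the program name my_program are hypothetical placeholders. The queue, resources and storage line are example values, not recommendations, so see the NCI PBS documentation for the full list of options.

```shell
#!/bin/bash
# Sketch of a gadi job script; all values are examples to adapt.
# Hypothetical project code: use your own compute project.
#PBS -P xx00
# Queue (assumption: the general-purpose "normal" queue).
#PBS -q normal
# A partial node: 4 CPUs with a matching share of memory.
#PBS -l ncpus=4
#PBS -l mem=16GB
# Maximum run time; PBS kills the job if it runs longer.
#PBS -l walltime=01:00:00
# /g/data areas the job needs (here the Centre's hh5 modules).
#PBS -l storage=gdata/hh5
# Start in the directory the job was submitted from.
#PBS -l wd

# Load the software the job needs, as on the login node.
module use /g/data/hh5/public/modules
module load intel-compiler

# Run your program (hypothetical executable name).
./my_program

echo "Job ${PBS_JOBID:-unknown} finished"
```

Assuming the script is saved as job.sh (a hypothetical name), you can submit it with qsub job.sh, follow its progress with qstat -u $USER, and remove it with qdel followed by its job ID.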
In general, the fewer resources you request, the earlier your job will run. But if your program exceeds the requested resources at any time, PBS will kill the job, and you have to start over.
- Gadi consists of nodes with 48 CPUs each. If you request fewer than 48 CPUs, you will get the number of CPUs that you request. If you request more, you will need to request a multiple of 48 CPUs.
- Each node has 192GB of memory shared among its 48 CPUs. So if your job requires 48 or more CPUs, you might as well request all the memory, for example 384GB for a 96-CPU job.
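The two rules above can be captured in a small helper. The helper function gadi_request is hypothetical, not an NCI tool. The node size (48 CPUs, 192GB) comes from the text. Suggesting 4GB per CPU for partial nodes is our assumption, chosen so that memory does not dominate the SU charge described below.

```shell
# Sketch: check a CPU request against gadi's node size and suggest a memory request.
# Node size is from the text; 4GB per CPU for partial nodes is an assumption.
CPUS_PER_NODE=48
MEM_PER_NODE_GB=192

gadi_request() {
    local ncpus=$1
    if [ "$ncpus" -lt 1 ]; then
        echo "error: request at least 1 CPU" >&2
        return 1
    elif [ "$ncpus" -lt "$CPUS_PER_NODE" ]; then
        # Part of a node: ask for a proportional share of its memory.
        echo "ncpus=$ncpus mem=$(( ncpus * MEM_PER_NODE_GB / CPUS_PER_NODE ))GB"
    elif [ $(( ncpus % CPUS_PER_NODE )) -ne 0 ]; then
        echo "error: $ncpus CPUs is not a multiple of $CPUS_PER_NODE" >&2
        return 1
    else
        # Whole nodes: request all of their memory.
        echo "ncpus=$ncpus mem=$(( ncpus / CPUS_PER_NODE * MEM_PER_NODE_GB ))GB"
    fi
}

gadi_request 5     # ncpus=5 mem=20GB
gadi_request 96    # ncpus=96 mem=384GB
```

A request such as gadi_request 50 prints an error and returns a non-zero status, because 50 CPUs is more than one node but not a multiple of 48.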
Running a job on the queue costs service units (SU). On the old Raijin, one SU was equivalent to one core for one hour. On the much faster new gadi, one core for one hour costs 2 SU. Note that on gadi, the SU cost is calculated using the higher of the number of cores requested and the memory requested, expressed as a share of a node. So if you request 5 cores but only 2GB of memory, the SU cost is based on the 5 cores. If you request 1 core but 96GB of memory (half of a node), the cost is based on 24 cores (half of a node).
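The charging rule can be written as a short calculation. The function su_cost is hypothetical. It uses only the rates quoted above: 2 SU per core-hour, with memory converted to cores at 4GB per core (192GB / 48 cores). It also assumes whole numbers of GB and hours and rounds memory up to whole cores. Actual NCI accounting may differ in detail.

```shell
# Sketch of the gadi charging rule described above:
# 2 SU per core-hour, charged on the larger of the cores requested and the
# memory request converted to cores at 4GB per core.
SU_PER_CORE_HOUR=2
GB_PER_CORE=4

# Usage: su_cost NCPUS MEM_GB HOURS   (whole numbers only)
su_cost() {
    local ncpus=$1 mem_gb=$2 hours=$3
    # Memory expressed in cores, rounded up.
    local mem_cores=$(( (mem_gb + GB_PER_CORE - 1) / GB_PER_CORE ))
    # Charge whichever is larger: cores or memory-equivalent cores.
    local charged=$(( ncpus > mem_cores ? ncpus : mem_cores ))
    echo $(( charged * hours * SU_PER_CORE_HOUR ))
}

su_cost 5 2 1     # 10: the 5 cores dominate
su_cost 1 96 1    # 48: 96GB counts as 24 cores
```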
Step 4: How to run models
Much of the information on gaining access to models and running them can be found on the ARCCSS CMS wiki (see link at the end).
A very important message from the CMS team: you will often work with source code. Please ensure that your version of the source code is under version control! We use SVN and git (usually hosted on GitHub) for the model codes and smaller utilities. DO NOT JUST COPY SOURCE CODE! CHECK IT OUT, OR CHECK IT IN!
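For code of your own that is not yet under version control, here is a minimal git sketch. It covers git only, not SVN. The directory name my_analysis, the stand-in file and the identity values are placeholders.

```shell
# Sketch: put a (hypothetical) directory of your own code under git.
mkdir -p my_analysis
cd my_analysis
git init -q

# Tell git who you are; use --global instead to set this once for all repositories.
git config user.name "Your Name"
git config user.email "you@example.edu.au"

# Stand-in for your real code, then record a first version of it.
echo 'print("hello")' > analysis.py
git add analysis.py
git commit -q -m "First version of analysis script"

# Show the history recorded so far.
git log --oneline
```

From then on, record each meaningful change with git add and git commit. Pushing the repository to GitHub also keeps a copy outside NCI.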
The Unified Model is currently transitioning from the old user environment, UMUI, to the new Rose/Cylc user interface. Either way, you need to log into accessdev.nci.org.au.
The old user interface has to be used for versions up to 7.x and can be used for 8.x. UMUI was developed by the Met Office (the UM developers); UMUIX, developed by CSIRO, is very similar but more convenient. An introduction to its use is on our wiki: http://climate-cms.unsw.wikispaces.net/Introduction+to+UMUI
The new user interface can be used from version 8.x, and has to be used for versions 9.0 and later.
The CMS team ports the WRF model to NCI. Information on using the model is on the CMS wiki. Note that we do not reproduce the information from the WRF model users' website, which remains a very important resource for all users. In particular, every new user is strongly encouraged to work through the official tutorials (found under User Support on the WRF model users' site). As explained on the WRF pages of this wiki, only the compilation steps differ from the NCAR tutorial.
CABLE is under licence, so you need to request access to the code. All the information you need to get access to and use CABLE is on the CABLE Trac site.
Data management
Most of the information on how data is managed is in the Data Services page of this wiki. There you can find information on the datasets managed by the Centre, as well as the relevant data policies. Take some time to get familiar with it. Here are some suggestions:
- Read the Centre data policy and how to prepare a data management plan. You can use the ARCCSS DMPonline tool to do this. Because this tool stores your plan in a database, you can save it and access it as many times as you want, which means you can build your data management plan as you progress with your research. It is still good to start as soon as you can. DMPonline is structured to teach you about data practices and the resources available to you. It also informs the CMS team about which data, software and/or infrastructure you are planning to use, so we can better tailor our efforts to your needs.
- Get familiar with publishing requirements. While this might seem far away, it is a good idea to keep track from now on of what you are doing with data and metadata, as this information is often much more difficult to retrieve at the end of your project. Getting data properly published also takes some technical time.
- Consider creating a researcher-ID to uniquely identify your work.
- Browse through all the available data sources before you download data. Even better, send an email to climate_help to make sure you are not wasting time downloading something that might already be available, or that we could download and manage for you.
- Become familiar with the different filesystems available on any server you are using. Make sure you know which are meant to be working spaces, where you can archive your data, how to share data with collaborators, and the best practices for transferring data within and out of the system. If you are using gadi at NCI, you can find this information in their user support and training sections.
- You can explore a lot of data-related information on the Australian National Data Service website.
- Don't hesitate to contact climate_help with any question you might have. You can also email email@example.com directly if you have a more general question or are confused about data management. We are always available to discuss the details of your project with you and give you advice.
How to learn more
A blog post by a UNSW PhD student on a few things to know and learn: Oliver Angélil's post
- E-Mail: firstname.lastname@example.org
- Chat with us on Slack (use your university email address when you sign up).
- This Wiki
- In person.
Version control with git (note: git might also be very useful for your thesis!): Git video tutorial by GitHub
Getting help
Your first port of call for help should be the CMS team via the help desk: email@example.com. This help desk is followed by all the members of the CMS team and by some NCI staff.