Difference between revisions of "Running the UM with Rose"

For a general overview of how to use the UM with Rose and Cylc, see the [https://code.metoffice.gov.uk/doc/um/latest/um-training/index.html Met Office's UM tutorial].
==Copying STASH fields==
 
 
 
A pair of macros are provided with UM suites for copying STASH settings between jobs, named STASHExport and STASHImport. You can find the macros in the editor under <span style="font-family:monospace">Metadata -> um</span>
 
 
 
[[File:stashexport.png|800x558px]]
 
 
 
The exported STASH configuration will be saved into '<span style="font-family:monospace">app/um/STASHexport.ini</span>'. To import the settings into a new job copy this file to the new suite's '<span style="font-family:monospace">app/um/STASHImport.ini</span>' and run the '<span style="font-family:monospace">STASHImport</span>' macro in the Rose editor on that suite.
 
  
 
== Restarting and Extending Rose Suites ==
 
To restart a stopped job, run
<syntaxhighlight lang=text>
rose suite-run --restart
</syntaxhighlight>

If the configuration has changed (say you have edited the suite end date to make it run for another year) you need to reload the configuration when you restart it, which you can do with
<syntaxhighlight lang=text>
rose suite-run --reload --restart
</syntaxhighlight>

When a failed UM task is resubmitted, it will continue from the most recent restart file, provided that it is not the very first task. Resubmitting the first UM task will restart the run from the beginning.
  
==Porting suites to NCI (work in progress)==  

===Basics===

Site-specific information goes into the `<span style="font-family:monospace">site</span>` directory of the suite. If this already exists follow the convention already in place, otherwise create a file `<span style="font-family:monospace">site/nci-raijin.rc</span>` which contains at a minimum:

<syntaxhighlight>
[ runtime ]
    # Settings inherited by every task
    [[ root ]]
        [[[ environment ]]]
            UMDIR = /projects/access/umdir
            TIDS = /g/data1/access/TIDS

    # Tasks run in the background on accessdev
    [[ ACCESSDEV ]]
        init-script = ""
        [[[ job submission ]]]
            method = background
        [[[ remote ]]]
            host = accessdev.nci.org.au

    # Tasks submitted to the PBS queue on raijin
    [[ RAIJIN ]]
        init-script = """
            module purge
            export PATH=~access/bin:$PATH
            export ROSE_VERSION={{ ROSE_VERSION }}
            ulimit -s unlimited
            module load openmpi/1.10.2
            """
        [[[ remote ]]]
            host = raijin.nci.org.au
        [[[ job submission ]]]
            method = pbs
        [[[ directives ]]]
            -P = {{ NCI_PROJECT | default(environ['PROJECT']) }}
            -q = {{ NCI_QUEUE | default('normal') }}
            -l ncpus = 1
            -l mem = 1gb
            -l walltime = 0:10:00
            -l jobfs = 1gb
            -W umask = 0022
</syntaxhighlight>
 
 
 
These sections provide default settings for jobs running on NCI servers: jobs on accessdev (e.g. code downloads) run in the background, and jobs on raijin are submitted to the PBS queue.
 
 
 
To link this into the main suite configuration add a line at the end of `<span style="font-family:monospace">suite.rc</span>`:
<syntaxhighlight>
{% include 'site/'+SITE+'.rc' %}
</syntaxhighlight>
and in `<span style="font-family:monospace">rose-suite.conf</span>` add a new Jinja setting (in the `<span style="font-family:monospace">[jinja2:suite.rc]</span>` section):
<syntaxhighlight>
SITE = 'nci-raijin'
</syntaxhighlight>
 
 
 
With this done Rose and Cylc will load the site configuration, but individual tasks still need to be hooked up. How to do this will depend on the suite layout. As an example the Nested suite has two top-level groups `<span style="font-family:monospace">[[HOST_LOCAL]]</span>` and `<span style="font-family:monospace">[[HOST_HPC]]</span>` for tasks that should be run on the Cylc server and the HPC respectively, which don't inherit from anything else. In this case you can add to `<span style="font-family:monospace">site/nci-raijin.rc</span>`:

<syntaxhighlight>
[ runtime ]
    [[ HOST_LOCAL ]]
        inherit = ACCESSDEV
    [[ HOST_HPC ]]
        inherit = RAIJIN
</syntaxhighlight>
 
 
 
===Building the UM===
 
  
A number of extra modules are required to build the UM. The best reference for the current recommendation is the rose-stem suite for the version you are running: https://code.metoffice.gov.uk/trac/um/browser/main/trunk/rose-stem/site/nci/family.rc
  
At NCI we use a two-stage build: fcm extracts the code on Accessdev, then copies it over to Raijin where it is built. The configuration might look like:

<syntaxhighlight>
[ runtime ]
    # Code extraction runs on the Cylc server
    [[ FCM_EXTRACT_RESOURCES ]]
        inherit = HOST_LOCAL

    # Compilation runs on the HPC with the build modules loaded
    [[ FCM_BUILD_RESOURCES ]]
        inherit = HOST_HPC
        init-script = """
            module purge
            export PATH=~access/bin:$PATH
            export ROSE_VERSION={{ ROSE_VERSION }}
            ulimit -s unlimited
            module load intel-fc/15.0.1.133
            module load intel-cc/15.0.1.133
            module load openmpi/1.10.2
            module load gcom/6.3_ompi.1.10.2
            module load netcdf/4.3.0
            module load grib-api/1.10.4
            module load drhook
            module load fcm
            module load shumlib/2017.06.1
            """
</syntaxhighlight>

See [https://accessdev.nci.org.au/trac/wiki/gadi#ConfiguringACCESSJobsforGadi the ACCESS wiki] for Rose/Cylc configuration settings on Gadi.
 
  
 
==Resources==

* [https://accessdev.nci.org.au/trac/wiki/GettingConnected Getting Connected to Accessdev]
* [https://code.metoffice.gov.uk/doc/um/latest/um-training/index.html Unified Model Rose Tutorial]
* [https://metomi.github.io/rose/doc/rose.html Rose Documentation]
* [https://cylc.github.io/cylc/html/single/cug-html.html Cylc Documentation]

* [https://github.com/metomi/rose Rose on GitHub]
* [https://github.com/cylc/cylc Cylc on GitHub]

[[Category:Unified Model]][[Category:Rose]]

Latest revision as of 22:14, 29 April 2021

For a general overview of how to use the UM with Rose and Cylc see the Met Office's UM tutorial

Restarting and Extending Rose Suites

To restart a stopped job run

rose suite-run --restart

This will put the suite back into the exact state it was in when Cylc stopped: failed tasks will still be failed, and if it had reached the end of the run Cylc will promptly stop again.

Resubmit a failed task by right-clicking on it and selecting 'Trigger Task'.

To extend the run dates you'll need to change the end time in the Rose editor, and then reload the configuration.

If the configuration has changed (say you have edited the suite end date to make it run for another year) you need to reload the configuration when you restart it, which you can do with

rose suite-run --reload --restart
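
The choice between the two restart commands can be written as a small shell helper. This is only a sketch: the function name `restart_command` and its "yes"/"no" argument are hypothetical, and the two command lines are exactly those given above. The function prints the command rather than running it, so it can be checked before use.

```shell
# Print the restart command for a stopped suite.
# $1 = "yes" if the suite configuration was edited since the suite stopped
#      (hypothetical convention); anything else means it is unchanged.
restart_command() {
    if [ "${1:-no}" = "yes" ]; then
        # Configuration changed: reload it as part of the restart
        echo "rose suite-run --reload --restart"
    else
        # Configuration unchanged: plain restart
        echo "rose suite-run --restart"
    fi
}
```

For example, after editing the suite end date, `restart_command yes` prints the reload variant, which can then be run from the suite directory.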

Resubmitting Tasks

If a UM task has failed (i.e. it has a red box in the Cylc GUI) you can re-submit it by right clicking the task and selecting 'Trigger (run now)'

The task will continue from the most recent restart file, provided that it is not the very first task. Resubmitting the first UM task will restart the run from the beginning.

Copying STASH fields

A pair of macros are provided with UM suites for copying STASH settings between jobs, named STASHExport and STASHImport. You can find the macros in the editor under Metadata -> um

(Screenshot: Stashexport.png)

The exported STASH configuration will be saved into 'app/um/STASHexport.ini'. To import the settings into a new job copy this file to the new suite's 'app/um/STASHImport.ini' and run the 'STASHImport' macro in the Rose editor on that suite.
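
The copy step between the two macros can be sketched in shell. This is a minimal sketch, assuming both suites are available as working copies on the same machine; the function name `copy_stash` and the directory arguments are hypothetical, and only the two file names come from the text above.

```shell
# Copy the STASH settings exported from one suite into another suite,
# ready for the STASHImport macro to read.
# $1 = path to the source suite, $2 = path to the destination suite
# (both are hypothetical working-copy directories).
copy_stash() {
    src_file="$1/app/um/STASHexport.ini"
    dst_dir="$2/app/um"

    # STASHExport must already have been run on the source suite
    if [ ! -f "$src_file" ]; then
        echo "No exported STASH file at $src_file" >&2
        return 1
    fi

    # The destination suite must already contain a um app
    if [ ! -d "$dst_dir" ]; then
        echo "No um app directory at $dst_dir" >&2
        return 1
    fi

    cp "$src_file" "$dst_dir/STASHImport.ini"
}
```

After the copy, open the destination suite in the Rose editor and run the STASHImport macro as described above.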

Porting suites to NCI

See the ACCESS wiki for Rose/Cylc configuration settings on Gadi

Resources