UM Run Scripts

Revision as of 02:32, 6 February 2013 by ScottWales (talk | contribs) (Imported from Wikispaces)

The UM is operated through a system of shell scripts generated by the UMUI. In most cases users don't interact with these scripts directly, but the process is documented here for advanced users.

The run scripts are generated when you press the 'Process' button in the UMUI; 'Submit' runs them. Submit copies the umui_jobs/$RUNID directory to umui_runs/$RUNID-$TIMESTAMP, runs SUBMIT on the supercomputer to generate the batch scripts, then runs FCM_MAIN_SCR on accesscollab to run fcm and qsub the batch scripts. This is handled by UM/um_nav_actions.tcl in the UMUI code.
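The copy step performed by Submit can be sketched in shell as follows. This is only an illustration: the run ID, the timestamp format and the temporary directory standing in for the user's home directory are all assumptions, and the real logic is written in Tcl in UM/um_nav_actions.tcl.

```shell
#!/bin/sh
# Illustrative sketch only, not the UMUI's actual code.
set -eu

RUNID=abcde                          # hypothetical run ID
WORK=$(mktemp -d)                    # stand-in for the user's home directory
TIMESTAMP=$(date +%Y%m%d%H%M%S)      # assumed format; the real one may differ

# Pretend 'Process' has already generated the job scripts.
mkdir -p "$WORK/umui_jobs/$RUNID"
touch "$WORK/umui_jobs/$RUNID/SUBMIT" "$WORK/umui_jobs/$RUNID/FCM_MAIN_SCR"

# 'Submit' copies the processed job to a timestamped run directory.
RUNDIR="$WORK/umui_runs/$RUNID-$TIMESTAMP"
mkdir -p "$WORK/umui_runs"
cp -r "$WORK/umui_jobs/$RUNID" "$RUNDIR"

ls "$RUNDIR"
```

Because each submission gets its own timestamped directory, the scripts that were actually run remain available even after the job is processed again.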

UMUI Scripts

These are generated by the UMUI.

  • SUBMIT

Generates the run scripts for the supercomputer: umuisubmit_compile, umuisubmit_run and umuisubmit_clr, for build, run and build & run jobs respectively.

    • umuisubmit_compile

Sets up the PBS information for the queue system, loads modules then runs FCM_BUILD_COMMAND.

      • FCM_BUILD_COMMAND

Runs 'fcm build' on the umbase, ummodel and umrecon directories to compile the runscripts, model and reconfiguration programs respectively, then puts all the programs in the bin directory.

    • umuisubmit_run

Sets up PBS and the variables for the job decomposition, then runs SCRIPT.

      • SCRIPT

Sets up environment variables for the model configuration, then calls the runscript UMScr_TopLevel.

    • umuisubmit_clr

Identical to umuisubmit_compile, except that once the build is done it submits umuisubmit_run to the queue to run the job automatically.

  • FCM_MAIN_SCR

Handles the compilation of a job. It extracts the source code from subversion with FCM_EXTR_SCR if needed, then submits the relevant batch script (umuisubmit_{compile,run,clr}) to the queue.

    • FCM_EXTR_SCR

Extracts the source code from subversion to accesscollab using the fcm tool. If the job uses multiple branches, fcm merges them into a single source folder. Once any merging has been done, the source code is copied across to the supercomputer, ready to be compiled.
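As a rough sketch of how the job type determines which batch script reaches the queue, consider the shell fragment below. The job-type names (build, run, build_and_run) and both function names are hypothetical, and the qsub call is printed rather than executed so the example can be run safely on any machine.

```shell
#!/bin/sh
# Illustrative sketch only; the job-type names and functions are hypothetical.

# Map a job type to the batch script that SUBMIT generates for it.
batch_script_for() {
    case "$1" in
        build)         echo umuisubmit_compile ;;
        run)           echo umuisubmit_run ;;
        build_and_run) echo umuisubmit_clr ;;
        *)             echo "unknown job type: $1" >&2; return 1 ;;
    esac
}

# Print the queue submission instead of executing it.
submit() {
    echo "qsub $(batch_script_for "$1")"
}

submit build_and_run
```

For a build & run job only umuisubmit_clr is submitted at this stage; umuisubmit_run is queued later by umuisubmit_clr itself, once the build has finished.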

UM Scripts

These come from the UM repository, in src/script/control.

  • UMScr_TopLevel

Sets up namelist files, calls qsmaster, then prints copious amounts of output to the log file. Once this is done, it calls submitchk to check for resubmission.

    • qsmaster

Chooses which of the UM, NEMO or CICE to run, calls qsexecute to run the UM, then calls qsfinal to clean up.

      • qsexecute

Sets up control files, runs the reconfiguration if needed then runs the UM itself.

Search for 'LINUX_MPP' to find the mpirun commands that get run on NCI. In order these are the reconfiguration, the UM with automatic post-processing enabled and the UM with automatic post-processing disabled. PAREXE is a script that makes sure all of the environment variables are set on every node; this isn't needed on the NCI systems, it's a relic from the Met Office.

The giant if statement at the start of the script handles restart data. Most of it is error recovery for when the UM is being run in operational mode.

      • qsfinal

Updates restart information with qspickup, qshistreset and qshistprint, then creates a resubmit script using qsresubmit.

        • qsresubmit

Checks if the job is a CRUN, then creates a script $JOBID.resub that will submit the next section. The resubmit script isn't submitted at this point.

    • submitchk

Does some error checking of the archiving system, then submits the resubmit script created by qsresubmit.
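The call order described above, including the fact that the resubmit script is written by qsresubmit but only acted on later by submitchk, can be sketched with stub functions. Everything inside the stubs is an assumption made for illustration: the job ID, the contents of the .resub file and the temporary directory are hypothetical, the CRUN check and the qspickup, qshistreset and qshistprint steps are left out, and the submitchk stub simply runs the file where the real script would submit it to the queue.

```shell
#!/bin/sh
# Illustrative sketch only: stub functions standing in for the real scripts.
set -eu

JOBID=abcde000                       # hypothetical job ID
TMP=$(mktemp -d)                     # stand-in for the job's working directory
CALLS=""                             # records the order in which stubs are called

qsexecute()  { CALLS="$CALLS qsexecute"; }
qsresubmit() {
    CALLS="$CALLS qsresubmit"
    # Write the resubmit script, but do not submit it yet.
    echo "echo next section of $JOBID" > "$TMP/$JOBID.resub"
}
qsfinal()    { CALLS="$CALLS qsfinal"; qsresubmit; }
qsmaster()   { CALLS="$CALLS qsmaster"; qsexecute; qsfinal; }
submitchk()  {
    CALLS="$CALLS submitchk"
    # The real script submits to the queue; this stub just runs the file.
    if [ -f "$TMP/$JOBID.resub" ]; then sh "$TMP/$JOBID.resub"; fi
}
UMScr_TopLevel() { CALLS="$CALLS UMScr_TopLevel"; qsmaster; submitchk; }

UMScr_TopLevel
echo "call order:$CALLS"
```

Keeping the creation of the resubmit script separate from its submission means the archiving checks in submitchk get a chance to run before the next section is queued.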