{{Unsupported}}

=ACCESS-S=
''Revision as of 00:17, 12 March 2019''
This is an as-yet-unlisted page to chart my progress with ACCESS-S. Once it's running, hopefully this will make it easier to turn into a proper document.
==Getting ACCESS-S==
This is the initial email I got from Hailin:

<pre>
Hi Holger,

If you'd like to play with the ACCESS-S1 suite, you can copy my suite au-aa563.
The following are the key parameters to run the suite.

In rose-suite.conf:
MAKE_BUILDS=true    # set true to compile the source code
N_GSHC_MEMBERS=3    # number of ensemble members, same as MEMBERS=-m in app/glosea_init_cntl_file/rose-app.conf
N_GSHC_STEPS=2      # number of RESUBMITs (number of chunk runs)
RESUB_DAYS=1        # number of days per chunk run

In app/glosea_init_cntl_file/rose-app.conf:
GS_HCST_START_DATE=1990050100   # start date, 1 May in this case
MEMBERS=-m 3                    # total number of ensemble members, must match N_GSHC_MEMBERS in rose-suite.conf
GS_YEAR_LIST=1997               # the year of the run

After you have compiled the code and run the job successfully, you can maintain
your own INSTALL_DIR, which is defined in suite.rc:
INSTALL_DIR = "/short/dx2/hxy599/gc2-install"

If you have any problems please let me know.

Regards,
Hailin
</pre>
So I made a copy of that; the new suite is au-aa566. Most of the settings were already at the values Hailin listed in his email. I've changed <code>INSTALL_DIR</code> to <code>/short/${PROJECT}/${USER}/gc2-install</code>, but I'm not a member of the groups <code>dx2</code> or <code>ub7</code>, so I'm also trying to copy the <code>DUMP_DIR</code> and <code>DUMP_DIR_BOM</code> directories to my <code>/short/${PROJECT}/${USER}/dump</code> and <code>/short/${PROJECT}/${USER}/dump-bom</code> respectively, but there is 27 TB of data, and I can't do that.
==Getting ACCESS-S to run==
I've copied the job and just tried to run it, but it failed with error messages, culminating in <code>Illegal item: [scheduling]initial cycle time</code>.
The solution to this is to use older versions of cylc and rose:
<pre>
$ CYLC_VERSION=6.9.1 ROSE_VERSION=2016.06.1 rosie go
</pre>
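Since every rose/cylc command needs the same pinning, a small wrapper function saves retyping it. This is a sketch of my own; the version numbers come from the workaround above, and <code>run_old</code> is just an illustrative name, not part of the suite:

```shell
# run_old: run any rose/cylc command under the older, compatible versions.
# The prefixed assignments are exported only to the command being run.
run_old() {
    CYLC_VERSION=6.9.1 ROSE_VERSION=2016.06.1 "$@"
}

# e.g.  run_old rosie go
#       run_old rose suite-run
```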
First hurdles:
* <code>gsfc_get_analysis</code> gets a submit-failed
* the <code>GSHC_M1-3</code> tasks fail
For now, I've reset the suite.rc to point to the BoM directories, to see whether that changes anything. It didn't.
Looking at the job activity log and the job script of <code>gsfc_get_analysis</code>, I notice strange PBS directives: <code>ConsumableMemory(2GB)</code> and <code>wall_clock_limit</code>. I find these strings in suite.rc and replace them with <code>-l vmem=2GB</code> and <code>-l walltime=01:11:00</code>. (I also find another reference to these values for <code>glosea_joi_prods</code>, and change them as well.)
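The same substitution can be scripted rather than done by hand in an editor. A sketch, demonstrated on a throwaway file rather than the real suite.rc: the two directive strings are taken from the text (they look like IBM LoadLeveler directives, which PBS doesn't accept), but the surrounding file contents are reconstructed:

```shell
# Demonstrate the directive rewrite on a scratch copy containing the two
# offending lines (reconstructed from the text; not the real suite.rc).
rc=$(mktemp)
cat > "$rc" <<'EOF'
resources = ConsumableMemory(2Gb)
wall_clock_limit = "01:11:00,01:10:00"
EOF

# Replace the LoadLeveler-style directives with PBS equivalents.
sed -i -e 's/resources *= *ConsumableMemory(2Gb)/-l = "vmem=2GB"/' \
       -e 's/wall_clock_limit *= *"01:11:00,01:10:00"/-l = "walltime=01:11:00"/' "$rc"

cat "$rc"
```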
This seems to have succeeded for <code>gsfc_get_analysis</code>, but the <code>GSHC_M1-3</code> tasks still fail. I found this error message:
<pre>
????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!!       ERROR       ???!!!???!!!???!!!???!!!???!!!
?  Error Code: 19
?  Error Message: Error reading namelist NLSTCALL. Please check input list against code.
?  Error from processor: 0
?  Error number: 0
????????????????????????????????????????????????????????????????????????????????
</pre>
It seems the namelist contains a value for <code>control_resubmit</code>, which the UM doesn't understand. Since rose considers this variable compulsory, I've had to remove it from the file <code>~/roses/au-aa566/app/coupled/rose-app.conf</code>, and now I've submitted it again. (Alternatively, I could have disabled all metadata from the menu option.)
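Deleting the entry can also be done non-interactively. A sketch against a mock rose-app.conf: only the <code>control_resubmit</code> entry itself comes from the text; the section name and the other entry are invented for illustration:

```shell
# Drop the control_resubmit entry that the UM can't parse.
# Demonstrated on a mock file; section name and neighbouring entry
# are made up, only control_resubmit is from the actual problem.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[namelist:nlstcall]
control_resubmit='y'
model_basis_time=1990,5,1,0,0,0
EOF

sed -i '/^control_resubmit=/d' "$conf"
cat "$conf"
```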
Second issue: <code>gsfc_get_analysis</code> fails at the end, but it seems that it's not actually doing much:
<pre>
======================================================================================
                  Resource Usage on 2017-06-06 15:39:23:
   Job Id:             5497709.r-man2
   Project:            w35
   Exit Status:        1
   Service Units:      1.24
   NCPUs Requested:    1                      NCPUs Used: 1
   CPU Time Used:      00:00:03
   Memory Requested:   500.0MB               Memory Used: 9.56MB
   Walltime requested: 01:11:00            Walltime Used: 01:14:15
   JobFS requested:    100.0MB                JobFS used: 0B
======================================================================================
</pre>
CPU time used is only 3 seconds, while it ran out of walltime after almost 1h15m.
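This mismatch can be checked mechanically. A small sketch using the two numbers from the epilogue above; the 1% CPU-to-walltime threshold is an arbitrary choice of mine, not anything PBS defines:

```shell
# Convert the H:M:S fields from the PBS epilogue to seconds, and flag a
# job that burned almost no CPU while exhausting its walltime (usually a
# task blocked waiting on something, not one that is computing).
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

cpu=$(to_seconds 00:00:03)    # CPU Time Used, from the log above
wall=$(to_seconds 01:14:15)   # Walltime Used, from the log above

if [ "$cpu" -lt $(( wall / 100 )) ]; then
    echo "likely hung: ${cpu}s of CPU over ${wall}s of walltime"
fi
```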
So it seems that, since <code>SUITE_TYPE</code> is set to <code>research</code> (and thereby <code>GS_SUITE_TYPE</code> is also <code>research</code>), some environment variables are set to directories that might exist on the Met Office computer, but not on raijin:
<pre>
{%- if RUN_GSFC or RUN_GSMN %}
    [[gsfc_get_analysis]]
        environment scripting = """eval $(rose task-env)
export SHORT_DATE=${ROSE_TASK_CYCLE_TIME%%00}"""
        [[[environment]]]
            ANALYSES_DATADIR = ${ROSE_DATAC}/analyses/${ROSE_TASK_CYCLE_TIME}
            ROSE_TASK_APP = glosea_get_fcst_analyses
{% if GS_SUITE_TYPE != 'research' %}
            FOAM_SUITE_NAME = $(os_get_suiteid --mode={{ SUITE_TYPE }} ocean)
            GLOBAL_SUITE_NAME = $(os_get_suiteid --mode={{ SUITE_TYPE }} global)
            ROSE_DATAC_GLOBAL = ${ROSE_DATAC/$CYLC_SUITE_NAME/$GLOBAL_SUITE_NAME}
            ROSE_DATAC_FOAM = ${ROSE_DATAC/$CYLC_SUITE_NAME/$FOAM_SUITE_NAME}
{% else %}
            ROSE_DATAC_GLOBAL = /critical/opfc/suites-oper/global/share/data/${ROSE_TASK_CYCLE_TIME}
            ROSE_DATAC_FOAM = /critical/opfc/suites-oper/ocean/share/data/${ROSE_TASK_CYCLE_TIME}
{% endif %}
        [[[directives]]]
            -l = "vmem=2GB,walltime=01:11:00"
            # resources = ConsumableMemory(2Gb)
            # wall_clock_limit = "01:11:00,01:10:00"
</pre>
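Before rerunning, it's worth confirming that the hard-coded operational paths from the research branch even exist on the current machine. A minimal check, with the paths copied from the snippet above:

```shell
# The research branch hard-codes Met Office operational paths. On raijin
# these don't exist, which would explain a task sitting idle until it
# exhausts its walltime.
for d in /critical/opfc/suites-oper/global/share/data \
         /critical/opfc/suites-oper/ocean/share/data; do
    if [ -d "$d" ]; then
        echo "ok:      $d"
    else
        echo "missing: $d"
    fi
done
```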
For now, I replaced the else clause above with the same data as the original, and will try again.
==Full Reset==
Scott noticed that there were some new changes in the configuration file: <code>RUN_GSFC</code> and <code>RUN_GSMN</code> were set to true.
Since I couldn't remember ever changing them, I just did a full reset and changed only the project.