Archiving Output

Revision as of 23:25, 11 December 2019
Note: This information is only for versions of the UM before 10.0.
Note: This page needs updating.

NCI provides a data archive space called MDSS to its users. Use this space for long-term storage of large data sets: your data is kept in a secure location and does not take up quota on /short.

MDSS is a tape-based store and is generally slower than the /short disks. Unlike /short, however, MDSS is backed up with a redundant data store, making data stored there much more secure. For information about MDSS see NCI's user guide. You can request space on MDSS by emailing the helpdesk.

The UM can save its output files directly to MDSS using its automatic archiving system. Enabling this requires adding a branch to your job's source code and changing some settings within the UMUI to select what to archive.

Runscripts Branch

The archiving system at NCI is different to the default Met Office system, and requires some minor code changes in the run scripts to work.

In the panel FCM Configuration -> FCM Options for Atmosphere... add the following to the table of branches if not already present:

fcm:um_dev/Share/VN7.3/runscripts/src  |  HEAD  | Y

Since this is a code change you will need to rebuild the model. If you get extract errors, let the helpdesk know and we can manually merge the branches.

UMUI Settings

Go to the UMUI panel Post Processing -> Main Switch..., enable post processing at the top of the panel, and select the 'MOOSE' archive system. The rest of the settings here are for the Met Office system and are not relevant at NCI.

Next you need to enable archiving for each type of file. The panel Post Processing -> Initialization... holds the settings for each of the STASH output streams. Scroll all the way to the right to the 'Archiving' column and put a 'Y' next to the streams you want archived. The system expects the streams to be used in the following way (you can set them up differently, but the file names may then come out wrong):

Stream  Type
a       Monthly means
e       Daily means
j       3-Hourly means
h       Hourly means
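
The stream table above can be written as a small lookup, which may be handy when scripting around archived output. This is only a sketch: the function name stream_label and the label strings it prints are hypothetical and are not taken from the NCI archive script.

```shell
#!/bin/sh
# Map a UM output stream letter to the averaging period the archive
# system expects, following the table above.
# stream_label and its label strings are illustrative, not NCI's own names.
stream_label() {
    case "$1" in
        a) echo "monthly" ;;
        e) echo "daily" ;;
        j) echo "3hourly" ;;
        h) echo "hourly" ;;
        *) echo "unknown"; return 1 ;;   # stream not listed in the table
    esac
}
```

For example, stream_label e prints daily, and an unlisted letter returns a non-zero status so a calling script can detect it.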

What happens

As the model produces output it will call the script ~access/bin/archive/ to save each file to MDSS. This script does the following:

  • Converts the file to NetCDF format, using the ACCESS CMIP mappings to name the variables
  • Renames the file, giving it an ISO date instead of the UM's encoding and specifying if it is a monthly, daily etc. file
  • Compresses the file using gzip
  • Copies the file to MDSS, in the folder $USER/$RUNID
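
The last two steps (compress, then copy to MDSS) could look roughly like the sketch below. It is not the NCI script itself: archive_to_mdss is a hypothetical helper, the NetCDF conversion and renaming steps are left out, and it assumes that mdss put takes a local file followed by a destination path and that the destination folder already exists.

```shell
#!/bin/sh
# Sketch of the compress and copy steps only.
# archive_to_mdss is a hypothetical helper, not the NCI archive script.
# Usage: archive_to_mdss <local_file> <runid>
archive_to_mdss() {
    file=$1
    runid=$2
    # gzip replaces <file> with <file>.gz
    gzip "$file" || return 1
    # Copy into the run's folder on MDSS ($USER/$RUNID, as described above).
    # Assumption: the folder already exists on MDSS.
    mdss put "$file.gz" "$USER/$runid/"
}
```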

To see the output files you can use the command mdss ls, e.g.

$ mdss ls saw562/uajra

Retrieve a file with mdss get, e.g.

$ mdss get saw562/uajra/
$ gunzip
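
The two retrieval commands can be wrapped in one helper if you fetch files often. This is a minimal sketch: fetch_from_mdss is a hypothetical name, the remote path is whatever mdss ls reported for your run, and it assumes mdss get places the file in the current directory under its original name.

```shell
#!/bin/sh
# Retrieve one archived file from MDSS and decompress it in the
# current directory. fetch_from_mdss is a hypothetical helper.
# Usage: fetch_from_mdss <user>/<runid>/<file>.gz
fetch_from_mdss() {
    remote=$1
    local_gz=$(basename "$remote")
    # Assumption: mdss get copies the file into the current directory
    mdss get "$remote" || return 1
    # gunzip replaces <file>.gz with <file>
    gunzip "$local_gz"
}
```

Because files come back from tape, each call may take a while; the function stops without running gunzip if the mdss get step fails.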

If you have problems

If you have any problems using the NCI archive system, email the helpdesk. We can also fix missing variable names in the NetCDF file (e.g. if a variable comes out named something like field1234).

The section on archiving in the UM user guide may also be helpful, although much of it is Met Office specific.

Missing Features

  • At the moment the archive server doesn't delete the original files from /short once they have been archived, so the disk still fills up over time
  • The UMUI has settings for automatic archiving of restart files, so that, for example, you can archive a restart file every 2 years in case you need to rerun parts of the simulation. This doesn't seem to work with the AMIP runs and needs looking into