CodeBreak 13/10/2021


Summary of topics

  • Applying land masking to CMIP data
  • netCDF compression
  • Understanding Dask better
  • Bathymetry data

Land masking

Requirement: compare CMIP6 outputs from several models and sometimes mask the data over the ocean. CMIP6 data does not come with a land mask as such, but it does include a land-fraction field, sftlf. To mask the ocean (resp. the land), multiply the field by sftlf (resp. 1.-sftlf), then use .where() to set the points where sftlf is 0. (resp. 1.) to NaN.

When comparing several models on different grids, the problem becomes more complicated because each model has a different mask. There are several ways around this:

  • Mask the original data then regrid. Then, only keep the data on points where all the models are non-NaNs.
  • Regrid the land fraction from one model and apply it to all models.

Both solutions introduce some approximation. Since both the masked and the unmasked data need to be on the common grid anyway, the second solution seems the most straightforward.
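The masking step described above can be sketched as follows. This is a minimal illustration with plain numpy arrays standing in for the xarray fields (where xarray's .where() method plays the role of np.where here); mask_ocean and the assumption that sftlf is given as a percentage (as in CMIP6) are mine, not from the session.

```python
import numpy as np

def mask_ocean(field, sftlf):
    """NaN-out ocean points and weight the rest by land fraction.

    field  : 2-D data array on the model grid
    sftlf  : land-fraction field on the same grid, in percent (CMIP6 convention)
    """
    frac = sftlf / 100.0          # convert percent to a 0-1 fraction
    weighted = field * frac       # multiply the field by the land fraction
    # Keep only points with some land; everything else becomes NaN,
    # mirroring xarray's field.where(sftlf > 0) idiom.
    return np.where(frac > 0.0, weighted, np.nan)
```

Masking the land instead uses 1 - frac and the condition frac < 1, as described above.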

NetCDF compression

Some NetCDF files created by a custom Python program were very large (16 GB). These files were not compressed. Closer inspection with ncdump -hs shows that some dimension fields are compressed, but the actual data field is not. Attempts to compress after the fact with nccompress fail: nc2nc does not work because the files are in NETCDF4 format, and nccopy fails with a chunking error, presumably because it gets confused by the fields that are already compressed.
We shared a blog post that explains how to create compressed NetCDF files with Python, and settled on a solution: run a Python script in a dedicated queued job with sufficient memory to read each old file and write a new one with the encoding={field: {'shuffle': True, 'zlib': True, 'complevel': 5}} argument. This takes a few minutes per file.
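A sketch of that solution, assuming xarray is used for the rewrite; the helper name compression_encoding and the file paths are hypothetical. Only the encoding dictionary is shown executing here, since the actual rewrite needs a real input file:

```python
def compression_encoding(varnames, complevel=5):
    """Build the per-variable encoding dict passed to xarray's to_netcdf,
    enabling shuffle + zlib compression for each listed variable."""
    return {v: {"shuffle": True, "zlib": True, "complevel": complevel}
            for v in varnames}

# Usage sketch (requires xarray and an actual uncompressed file):
# import xarray as xr
# ds = xr.open_dataset("old_file.nc")                       # hypothetical path
# ds.to_netcdf("new_file.nc",
#              encoding=compression_encoding(ds.data_vars))
```

Running this inside a queued job with enough memory, as noted above, avoids the nccopy/nc2nc problems because the new file is written from scratch with a consistent encoding.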

Bathymetry Data

A follow-up from last session: projecting a south-polar stereographic bathymetry with coordinates in metres to latitude/longitude. That was solved satisfactorily, but the next required step is to create a new bathymetry in rectilinear lat/lon coordinates on the same grid as the existing ACCESS-OM2 0.1-degree model. This requires some work and follow-up offline.
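One simple starting point for putting the bathymetry onto the target rectilinear grid is nearest-neighbour lookup along each axis. This is only a sketch of that idea, not the method settled on in the session; a real ACCESS-OM2 regridding would more likely use a dedicated tool (e.g. conservative regridding) to preserve volumes.

```python
import numpy as np

def regrid_nearest(src_lon, src_lat, src_data, tgt_lon, tgt_lat):
    """Nearest-neighbour regrid between two rectilinear lat/lon grids.

    src_data is indexed (lat, lon); returns an array on the target grid.
    """
    # For each target coordinate, find the index of the closest source coordinate.
    i = np.abs(src_lat[:, None] - tgt_lat[None, :]).argmin(axis=0)
    j = np.abs(src_lon[:, None] - tgt_lon[None, :]).argmin(axis=0)
    # Pick out the matching rows and columns of the source field.
    return src_data[np.ix_(i, j)]
```

For a 0.1-degree target grid this stays cheap, but it does not conserve the bathymetry's volume, which is one reason the step needs more careful offline work.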