CodeBreak 1/9/2021

Revision as of 02:33, 6 September 2021 by C.carouge (talk | contribs) (C.carouge moved page CodeBreak to CodeBreak 1/9/2021 without leaving a redirect)

Code Breaks

  1/9/2021   Summary of topics:

  • How to organise plotting large numbers of plots from heterogeneous models in python
  • Layout of subplots with matplotlib in python
  • Fast percentile calculation in python


Organising workflows in python

cf-xarray is a useful library for writing general code for a heterogeneous set of models that might not share the same names for dimensions (e.g. lat, latitude, LAT) and variables. It infers this information from the CF metadata attributes on the data and lets you refer to dimensions and variables in a general way. It is available in the conda environments, and is described in the cf-xarray documentation.

The python3 pathlib library is an object-oriented path manipulation library that makes working with paths a lot simpler and cleaner, e.g. when opening data files and saving processed output or plots to disk.
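A minimal sketch of this style of path handling is shown below. The directory layout (one sub-directory per model, each holding a `tas.nc` file) and the model names are hypothetical, and a temporary directory stands in for a real data location.

```python
from pathlib import Path
import tempfile

# Hypothetical layout: <root>/<model>/tas.nc for input,
# <root>/plots/<model>_tas.png for output
root = Path(tempfile.mkdtemp())
for model in ["ModelA", "ModelB"]:
    (root / model).mkdir()
    (root / model / "tas.nc").touch()  # empty placeholder for a data file

# The "/" operator joins path components
plot_dir = root / "plots"
plot_dir.mkdir(parents=True, exist_ok=True)

outputs = []
for infile in sorted(root.glob("*/tas.nc")):
    model = infile.parent.name  # name of the directory holding the file
    outfile = plot_dir / f"{model}_{infile.stem}.png"  # stem drops ".nc"
    outputs.append(outfile)
    print(infile.name, "->", outfile.name)
```

Attributes such as `.parent`, `.name` and `.stem` replace most of the string splitting that is otherwise needed to build output file names from input file names.
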

As for how to organise the plotting of a large number of different plots from a range of models, there is a range of data structures that might suit this purpose, and the choice comes down to the specifics of what needs to be done and personal preference.

For more information, this is a pretty comprehensive write-up of some of the commonly used data structures in python.
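As one possible illustration (not necessarily one of the options discussed in the session), the sketch below describes each plot as a small record and then groups the records in a dictionary. The `PlotJob` class, the model names and the plot kinds are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PlotJob:
    """Hypothetical description of one plot to be made."""
    model: str
    variable: str
    kind: str


models = ["ModelA", "ModelB"]
kinds = ["timeseries", "map"]

# One job for every combination of model and plot kind
jobs = [PlotJob(model, "tas", kind) for model, kind in product(models, kinds)]

# Group the jobs by plot kind so each kind can be handled by its own function
by_kind = {}
for job in jobs:
    by_kind.setdefault(job.kind, []).append(job)

for kind, kind_jobs in by_kind.items():
    print(kind, [job.model for job in kind_jobs])
```

Keeping the description of what to plot separate from the plotting code makes it easier to add a model or a plot kind without touching the loops.
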

Subplots in matplotlib

Scott wrote a blog post showing a sophisticated use of subplot, which also has some tips for organising the plots by saving a reference to each one in a dictionary keyed by plot type.
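The dictionary idea can be sketched as follows. This is not the code from the blog post, just a minimal example under assumed names: the plot types and the data are made up, and the non-interactive Agg backend is selected so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt

# Hypothetical plot types, one per panel of a 2 x 2 grid
plot_types = ["timeseries", "histogram", "scatter", "map"]

fig, axs = plt.subplots(2, 2, figsize=(8, 6))

# Save a reference to each set of axes under the name of its plot type
axes = dict(zip(plot_types, axs.flat))

# Panels can now be addressed by name rather than by grid position
axes["timeseries"].plot([0, 1, 2, 3], [1.0, 3.0, 2.0, 4.0])
axes["histogram"].hist([1, 2, 2, 3, 3, 3])
axes["scatter"].scatter([0, 1, 2], [2, 0, 1])

for name, ax in axes.items():
    ax.set_title(name)

fig.tight_layout()
```

Referring to `axes["histogram"]` instead of `axs[0, 1]` means the layout can be rearranged later without changing the code that draws each panel.
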

Unrelated to the original topics, some of the attendees didn't know it was possible to connect a Jupyter notebook directly to Gadi compute nodes. This is useful for anyone who must access data on /scratch, or who has workloads that are currently too onerous for VDI or OOD (though OOD is designed to cater for larger jobs than VDI). This is all covered on our wiki, which also covers creating symbolic links to access files in other locations on the file system, e.g. /g/data.

Fast percentile calculation in python

The aim is to calculate a percentile climatology: for each day of the year, that day plus 15 days either side are gathered together over all years, to make a (dayofyear, lat, lon) dataset of 90th percentiles from each sample.

Rolling will get a 31 day sample centred around the day of interest (at the start and end of the dataset the samples are padded with NaN):

    tas.rolling(time=31, center=True)

Construct will take the rolling samples and convert them to a new dimension, so the data now has dimensions (time, lat, lon, window):

    tas.rolling(time=31, center=True).construct(time='window')

Groupby will collect all the equivalent days of the year (remember the 'time' axis is the centre of the sample):

    (tas.rolling(time=31, center=True)
        .construct(time='window')
        .groupby('time.dayofyear'))

Normally you can just add a reduction operation (e.g. mean) to the end here, but that doesn't work with percentiles in this case. Instead do a loop:

    doy_pct = []
    for doy, sample in (tas.rolling(time=31, center=True)
                           .construct(time='window')
                           .groupby('time.dayofyear')):
        # Load each sample into memory before taking the percentile
        doy_pct.append(sample.load().quantile(0.9, dim=['time', 'window']))

    xarray.concat(doy_pct, dim='dayofyear')

Note that `.load()` is called inside the loop, which avoids a Dask chunking error when doing a percentile. A list is used to gather the percentile arrays for each day, and these are then concatenated along a new dimension 'dayofyear'.

Try it with just a single point to help understand how this is working.
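A single-point trial of these steps could look like the sketch below. The data are synthetic (three non-leap years of random daily values standing in for `tas` at one grid point), and attaching the day-of-year labels with `assign_coords` is an addition for convenience rather than part of the recipe above. Because the synthetic array is already in memory, `.load()` is not needed here.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for tas at a single grid point: 3 years of daily data
time = pd.date_range("2001-01-01", "2003-12-31", freq="D")
rng = np.random.default_rng(0)
tas = xr.DataArray(
    rng.normal(size=time.size), dims="time", coords={"time": time}, name="tas"
)

# 31 day windows centred on each day, stored along a new 'window' dimension
windows = tas.rolling(time=31, center=True).construct(time="window")

doys = []
doy_pct = []
for doy, sample in windows.groupby("time.dayofyear"):
    # sample holds one 31 day window per year for this day of the year
    doys.append(doy)
    doy_pct.append(sample.quantile(0.9, dim=["time", "window"]))

# Concatenate along a new dimension and label it with the day of the year
pct90 = xr.concat(doy_pct, dim="dayofyear").assign_coords(dayofyear=doys)
print(pct90.sizes)
```

Printing `sample` for one value of `doy` inside the loop is a quick way to see that each group has dimensions (time, window), with one entry along `time` per year.
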