CodeBreak 1/9/2021

Revision as of 03:40, 6 September 2021 by C.carouge

Summary of topics:

  • How to organise the plotting of large numbers of plots from heterogeneous models in python
  • Layout of subplots with matplotlib in python
  • Fast percentile calculation in python

Organising workflows in python

cf-xarray is a useful library for writing general code for a heterogeneous set of models that might not share the same names for dimensions (e.g. lat, latitude, LAT) and variables. cf-xarray can infer this information and lets you refer to dimensions and variables in a general way. It is available in the conda environments and the documentation is here:

The Python 3 pathlib library is an object-oriented path manipulation library that makes working with paths simpler and cleaner, e.g. when opening data files and when saving processed output or plots to disk:
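A minimal sketch of this kind of path handling follows. The directory layout, the file names and the plot_path helper are hypothetical and only serve the example; a temporary directory stands in for a real data location.

```python
from pathlib import Path
import tempfile


def plot_path(output_root, model, variable, suffix=".png"):
    """Hypothetical helper: build (and create) an output location for a plot."""
    out_dir = Path(output_root) / model        # "/" joins path components
    out_dir.mkdir(parents=True, exist_ok=True)  # create missing directories
    return out_dir / f"{variable}{suffix}"


with tempfile.TemporaryDirectory() as tmp:
    # Fake input files standing in for model output
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    for name in ("modelA_tas.nc", "modelB_tas.nc"):
        (data_dir / name).touch()

    # glob() finds the inputs; stem is the file name without its extension
    for infile in sorted(data_dir.glob("*_tas.nc")):
        model = infile.stem.split("_")[0]
        outfile = plot_path(Path(tmp) / "plots", model, "tas")
        print(infile.name, "->", outfile.relative_to(tmp))
```

Path objects can be passed directly to most functions that accept a file name, so string concatenation of paths is rarely needed.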

There are several data structures that could suit organising the plotting of a large number of different plots from a range of models. The choice comes down to the specifics of what needs to be done and to personal preference, but some options are:
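As one possible illustration (not necessarily one of the options discussed in the session), a dictionary of small dataclass records keyed by model name keeps the per-model settings and the plots produced for each model in one place. The ModelRun class, its fields and the file names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRun:
    """Hypothetical record describing one model and the plots made for it."""
    name: str
    path: str
    lat_name: str = "lat"                       # model-specific naming quirk
    plots: dict = field(default_factory=dict)   # plot type -> output file


# One record per model, keyed by model name
runs = {
    "modelA": ModelRun("modelA", "modelA.nc"),
    "modelB": ModelRun("modelB", "modelB.nc", lat_name="latitude"),
}

plot_types = ["timeseries", "map"]

# The same loop handles every model and every plot type
for run in runs.values():
    for kind in plot_types:
        run.plots[kind] = f"{run.name}_{kind}.png"

print(runs["modelB"].plots)
```

A plain nested dictionary would work just as well for small cases; the dataclass mainly adds named fields and default values.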

For more information, this is a fairly comprehensive write-up of some of the commonly used data structures in python

Subplots in matplotlib

Scott wrote a blog post showing a sophisticated use of subplot. It also has some tips for organising the plots by saving references to each in a dictionary named for the plot type:
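The blog post itself is not reproduced here. As a minimal sketch of the general idea, the example below stores each subplot's Axes object in a dictionary keyed by plot type, so later code can refer to a panel by name rather than by grid position. The plot types and data are invented, and reading the tip as "a dictionary keyed by plot type" is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

plot_types = ["timeseries", "histogram", "map", "profile"]

# A 2x2 grid of subplots; axes is a 2D array of Axes objects
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))

# Keep a reference to each Axes under a meaningful name
panels = dict(zip(plot_types, axes.flat))

# Panels can now be addressed by plot type instead of by index
panels["timeseries"].plot([0, 1, 2, 3], [1, 3, 2, 4])
panels["histogram"].hist([1, 2, 2, 3, 3, 3])

for name, ax in panels.items():
    ax.set_title(name)

fig.tight_layout()
```

Calling fig.savefig() with a file name would write the figure to disk.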

Jupyter Notebook

Unrelated to the original topics, but some of the attendees didn't know it was possible to connect a jupyter notebook directly to gadi compute nodes. This is useful for anyone who must access data on /scratch, or who has workloads that are currently too onerous for VDI or OOD (though OOD is designed to cater for larger jobs than VDI). This is all covered on our wiki, which also covers creating symbolic links to access files in other locations on the file system, e.g. /g/data

Fast percentile calculation in python

The question was how to calculate a percentile climatology: for each day of the year, the 90th percentile of the values in the 31 days surrounding that date.

The notebook illustrates:

  • the use of rolling() and construct() to build a DataArray with the 31-day windows for each day of the timeseries.
  • the use of groupby() to do calculations for each day of the year.
  • the use of quantile() to calculate percentiles on DataArrays.
  • the use of load() if a function complains about chunking.
  • the use of a list and xarray.concat() to create a DataArray of results.
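The notebook itself is not reproduced here. The steps listed above can be sketched roughly as follows on a synthetic daily timeseries; the data, the variable names and the choice of a centred window are assumptions made for the example, not the notebook's code.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily data for four years (2004 is a leap year)
time = pd.date_range("2001-01-01", "2004-12-31", freq="D")
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.normal(size=time.size), coords={"time": time}, dims="time", name="tas"
)

# rolling() + construct(): add a "window" dimension holding the 31 values
# centred on each time step (positions past the ends are filled with NaN)
windows = da.rolling(time=31, center=True).construct("window")

# With dask-backed data, windows = windows.load() may be needed here
# if quantile() complains about chunking.

# groupby(): loop over the days of the year and collect results in a list
days, results = [], []
for doy, group in windows.groupby("time.dayofyear"):
    # quantile(): 90th percentile over all years and all window positions
    results.append(group.quantile(0.9, dim=["time", "window"], skipna=True))
    days.append(doy)

# concat(): stack the per-day results along a new "dayofyear" dimension
clim = xr.concat(results, dim=pd.Index(days, name="dayofyear"))
print(clim.sizes)
```

Each entry of clim is computed from up to 31 values per year, so the percentile estimate is much smoother than one based only on the values for a single calendar day.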