Summary of topics:

How to organise plotting a large number of plots from heterogeneous models in python
Layout of subplots with matplotlib in python
Fast percentile calculation in python

Organising workflows in python

cf-xarray is a useful library for writing general code for a heterogeneous set of models that might not use the same names for dimensions (e.g. lat/latitude/LAT) and variables. cf-xarray infers this information from CF metadata attributes (such as standard_name and units) and lets you refer to dimensions and variables in a general way. It is available in the conda environments, and the documentation is here:

https://cf-xarray.readthedocs.io/en/latest/

The python 3 pathlib module is an object-oriented path manipulation library that makes working with paths simpler and cleaner, e.g. when opening data files and saving processed output or plots to disk:

https://docs.python.org/3/library/pathlib.html
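A minimal sketch of the kind of thing pathlib makes tidy: finding input files and building an output location for each plot. The directory layout (`plots/<model>/<variable>.png`) and the helper names are hypothetical, chosen only for illustration:

```python
from pathlib import Path


def find_model_files(data_dir, pattern="*.nc"):
    """Return files in data_dir matching pattern, sorted so runs are reproducible."""
    return sorted(Path(data_dir).glob(pattern))


def plot_path(out_root, model, variable, suffix=".png"):
    """Return out_root/plots/<model>/<variable><suffix>, creating the directory if needed."""
    out_dir = Path(out_root) / "plots" / model
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / f"{variable}{suffix}"
```

The `/` operator joins path components, and `mkdir(parents=True, exist_ok=True)` means the function can be called repeatedly without checking whether the directory already exists. The returned `Path` can be passed straight to `fig.savefig()` or `xarray.open_dataset()`.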

There are several data structures that could suit organising a large number of different plots from a range of models. The best choice depends on the specifics of what needs to be done and on personal preference, but some options are:

For more information, this is a fairly comprehensive write-up of some of the commonly used data structures in python: https://realpython.com/python-data-structures/
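As one illustrative option (not necessarily one of those discussed at the session), each plot can be described by a small frozen dataclass, all combinations generated into a flat list, and the list grouped into a dictionary where that is convenient. The model, variable and plot-type names are hypothetical placeholders:

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PlotJob:
    """One plot to make: which model, which variable, and what kind of plot."""
    model: str
    variable: str
    plot_type: str


# Hypothetical names standing in for the models/variables actually being compared
models = ["model-a", "model-b"]
variables = ["tas", "pr"]
plot_types = ["map", "timeseries"]

# A flat list of every combination is easy to loop over, or to split across batch jobs
jobs = [PlotJob(m, v, p) for m, v, p in product(models, variables, plot_types)]

# A dictionary groups the same jobs by model, e.g. to put one model's plots on one page
jobs_by_model = {}
for job in jobs:
    jobs_by_model.setdefault(job.model, []).append(job)
```

The list keeps "what to plot" separate from "how to plot it", so adding a model or variable means editing one list rather than a set of nested loops.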

Subplots in matplotlib

Scott wrote a blog post showing a sophisticated use of subplots, which also has some tips for organising the plots by saving references to each one in a dictionary named for the plot type: https://climate-cms.org/2018/04/27/subplots.html
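A minimal sketch of that idea, assuming one row per model and one column per plot type: the Axes from `plt.subplots` are stored in one dictionary per plot type, keyed by model, so later code can refer to a panel by name rather than by grid position. The model names and random data are hypothetical stand-ins, and this is our reading of the approach rather than the blog's exact code:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the example runs without a display
import matplotlib.pyplot as plt
import numpy as np

models = ["model-a", "model-b", "model-c"]  # hypothetical model names
plot_types = ["timeseries", "histogram"]

fig, axs = plt.subplots(len(models), len(plot_types), figsize=(8, 8), squeeze=False)

# One dictionary per plot type, mapping model name -> the Axes for that panel
axes = {ptype: {} for ptype in plot_types}
for row, model in enumerate(models):
    for col, ptype in enumerate(plot_types):
        axes[ptype][model] = axs[row, col]

rng = np.random.default_rng(0)
for model in models:
    data = rng.normal(size=100)  # stand-in for real model output
    axes["timeseries"][model].plot(data)
    axes["timeseries"][model].set_title(f"{model}: timeseries")
    axes["histogram"][model].hist(data, bins=20)
    axes["histogram"][model].set_title(f"{model}: histogram")

fig.tight_layout()
```

Changing the layout (e.g. transposing the grid) then only touches the loop that fills `axes`; the plotting code stays the same.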

Jupyter Notebook

Unrelated to the original topics, but some of the attendees didn't know it is possible to connect a Jupyter notebook directly to Gadi compute nodes. This is useful for anyone who needs to access data on /scratch, or who has workloads that are currently too onerous for VDI or ood.nci.org.au (though ood is designed to cater for larger jobs than VDI). This is all covered on our wiki, http://climate-cms.wikis.unsw.edu.au/Running_Jupyter_Notebook#On_Gadi, which also covers creating symbolic links to access files in other locations on the file system, e.g. /g/data.

Fast percentile calculation in python

The question was how to calculate a percentile climatology: the 90th percentile for each day of the year, using the values in the 31-day window centred on each date.

The notebook illustrates:

the use of rolling() and construct() to build a DataArray with the 31-day window for each day of the timeseries.
the use of groupby() to do calculations for each day of the year.
the use of quantile() to calculate percentiles on DataArrays.
the use of load() if a function complains about chunking.
the use of a list and xarray.concat() to create a DataArray of results.
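To see what rolling() and construct() produce, here is a toy 5-value series with a 3-point window (made-up data, and a smaller window than the 31 days in the question):

```python
import numpy as np
import xarray as xr

# Toy series 0, 1, 2, 3, 4 along a 'time' dimension (made-up values)
da = xr.DataArray(np.arange(5.0), dims="time")

# Centred 3-point windows exposed as a new 'window' dimension:
# row i holds the values at i-1, i, i+1, and the first/last rows are padded with NaN
windows = da.rolling(time=3, center=True).construct(time="window")

print(windows.dims)  # ('time', 'window')
print(windows.values)
```

Each row along `time` is the sample for that date, which is why a later reduction over both `time` and `window` pools every day in every window.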


Calculating a percentile climatology: for each day of the year, that day plus the 15 days either side are gathered together over all years, and the 90th percentile of each sample gives a (dayofyear, lat, lon) dataset. The steps below assume a DataArray tas with dimensions (time, lat, lon).

Rolling gets a 31-day sample centred on the day of interest (at the start and end of the dataset the values will be NaN):

    tas.rolling(time=31, center=True)

Construct takes the rolling samples and converts them to a new dimension, so the data now has dimensions (time, lat, lon, window):

    tas.rolling(time=31, center=True).construct(time='window')

Groupby collects all the equivalent days of the year (remember the 'time' axis is the centre of the sample):

    (tas.rolling(time=31, center=True)
        .construct(time='window')
        .groupby('time.dayofyear'))

Normally you could just add a reduction operation (e.g. mean) to the end here, but that doesn't work with percentiles in this case. Instead, use a loop:

    import xarray

    doy_pct = []
    for doy, sample in (tas
                        .rolling(time=31, center=True)
                        .construct(time='window')
                        .groupby('time.dayofyear')):
        print(doy)  # progress indicator
        # load() pulls the sample into memory, avoiding a Dask chunking error in quantile()
        doy_pct.append(sample.load().quantile(0.9, dim=['time', 'window']))

    # Stack the per-day results along a new 'dayofyear' dimension
    climatology = xarray.concat(doy_pct, dim='dayofyear')

Note that .load() is called inside the loop, which avoids a Dask chunking error when computing a percentile. Try it with just a single point (e.g. tas.isel(lat=0, lon=0)) to help understand how this works. Note also the use of a list to gather the percentile arrays for each day, which are then concatenated along a new dimension 'dayofyear'.
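A self-contained version of the loop above on synthetic single-point data, as suggested, can be useful for checking the method. The data (three years of standard-normal daily values) and the function name `percentile_climatology` are hypothetical. One small change of ours: the day-of-year labels are passed to `concat` as a pandas Index, so the result carries a `dayofyear` coordinate:

```python
import numpy as np
import pandas as pd
import xarray as xr


def percentile_climatology(tas, q=0.9, window=31):
    """Day-of-year percentile of all values in a centred window, pooled over years."""
    windowed = tas.rolling(time=window, center=True).construct(time="window")
    doys, doy_pct = [], []
    for doy, sample in windowed.groupby("time.dayofyear"):
        doys.append(doy)
        # NaN padding at the ends of the record is skipped by quantile() for float data
        doy_pct.append(sample.load().quantile(q, dim=["time", "window"]))
    return xr.concat(doy_pct, dim=pd.Index(doys, name="dayofyear"))


# Synthetic single-point data (hypothetical): three non-leap years of daily values
time = pd.date_range("2001-01-01", "2003-12-31", freq="D")
rng = np.random.default_rng(0)
tas = xr.DataArray(rng.normal(size=time.size), dims="time", coords={"time": time}, name="tas")

clim = percentile_climatology(tas)
print(clim.sizes)  # one value per day of year
```

For standard-normal data the 90th percentile is about 1.28, so the values in `clim` should scatter around that, which is a quick sanity check before running on real data.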

CodeBreak 1/9/2021

Namespaces

Page actions

Contents

Summary of topics:

Organising workflows in python

Subplots in matplotlib

Jupyter Notebook

Fast percentile calculation in python

Anonymous

Search

Navigation

Wiki tools

Page tools

CodeBreak 1/9/2021

Contents

Summary of topics:

Organising workflows in python

Subplots in matplotlib

Jupyter Notebook

Fast percentile calculation in python