CodeBreak 1/9/2021

= Code Breaks =

1/9/2021

Summary of topics:

*How to organise plotting large numbers of plots from heterogeneous models in python
*Layout of subplots with matplotlib in python
*Fast percentile calculation in python
== Organising workflows in python ==

cf-xarray is a useful library for writing general code for a heterogeneous set of models that might not share the same names for things like dimensions (e.g. lat/latitude/LAT) and variables. cf-xarray can infer this information and allows you to refer to dimensions and variables in a general way. It is available in the conda environments, and the documentation is here:

[https://cf-xarray.readthedocs.io/en/latest/ https://cf-xarray.readthedocs.io/en/latest/]

The python3 pathlib library is an object-oriented path manipulation library that makes working with paths a lot simpler and cleaner, e.g. when opening data files and saving processed output or plots to disk:

[https://docs.python.org/3/library/pathlib.html https://docs.python.org/3/library/pathlib.html]
  
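As a minimal sketch of the pathlib style (the output directory layout and model name here are hypothetical, not from the session):

```python
from pathlib import Path

# Hypothetical layout: one output directory per model, with plots saved
# inside it. Path's "/" operator joins path components cleanly, and
# mkdir(parents=True, exist_ok=True) creates the whole tree if needed.
out_root = Path("plots")              # assumed output root
model = "ACCESS-ESM1-5"               # hypothetical model name
out_dir = out_root / model
out_dir.mkdir(parents=True, exist_ok=True)

plot_file = out_dir / "tas_timeseries.png"
print(plot_file)
print(plot_file.suffix)    # .png
```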
How best to organise plotting a large number of different plots from a range of models comes down to the specifics of what needs to be done and personal preference, but some data structures that might suit this purpose are:

*[https://realpython.com/python-data-structures/#dict-simple-data-objects python dict]
*[https://datatofish.com/create-pandas-dataframe/ pandas dataframe]
*[https://realpython.com/python-data-classes/ python data class]

For more information, this is a pretty comprehensive write-up of some of the commonly used data structures in python:

[https://realpython.com/python-data-structures/ https://realpython.com/python-data-structures/]
 
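For example, a plain dict can key each plot by (model, plot type), so every figure is easy to look up later. This is only a sketch with made-up model names and stand-in plotting functions:

```python
# Hypothetical data: a dict keyed by model name holding each model's
# loaded results (lists here stand in for real datasets).
model_data = {
    "ACCESS-ESM1-5": [1.1, 1.3, 0.9],
    "GFDL-CM4": [0.8, 1.0, 1.2],
}

def plot_timeseries(name, data):
    # In real code this would call matplotlib; here we just return a
    # description of the plot that would be produced.
    return f"timeseries of {name} ({len(data)} points)"

def plot_histogram(name, data):
    return f"histogram of {name} ({len(data)} points)"

# A dict of plotting functions keyed by plot type...
plotters = {"timeseries": plot_timeseries, "histogram": plot_histogram}

# ...looped over every (model, plot type) combination, so each result is
# addressable by a (model, kind) key rather than a position in a list.
plots = {
    (model, kind): func(model, data)
    for model, data in model_data.items()
    for kind, func in plotters.items()
}
print(plots[("GFDL-CM4", "histogram")])  # histogram of GFDL-CM4 (3 points)
```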
== Subplots in matplotlib ==

Scott wrote a blog post showing a sophisticated use of subplots, which also has some tips for organising the plots by saving a reference to each axes in a dictionary keyed by plot type:

[https://climate-cms.org/2018/04/27/subplots.html https://climate-cms.org/2018/04/27/subplots.html]

Unrelated to the original topics, but some of the attendees didn't know it is possible to connect a jupyter notebook directly to gadi compute nodes. This is useful for anyone who must access data on /scratch, or whose workloads are currently too onerous for VDI or ood.nci.org.au (though ood is designed to cater for larger jobs than VDI). This is all covered on our wiki:

[http://climate-cms.wikis.unsw.edu.au/Running_Jupyter_Notebook#On_Gadi http://climate-cms.wikis.unsw.edu.au/Running_Jupyter_Notebook#On_Gadi]

which also covers creating symbolic links to access files in other locations on the file system, e.g. /g/data.
 
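A minimal sketch of the dictionary-of-axes pattern (the grid shape and plot names here are made up, not taken from the blog post):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Build a 2x2 grid, then store each axes under a descriptive name so the
# rest of the script refers to plots by type rather than grid position.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
ax = {
    "timeseries": axes[0, 0],
    "histogram": axes[0, 1],
    "map": axes[1, 0],
    "scatter": axes[1, 1],
}

ax["timeseries"].plot([0, 1, 2], [1.0, 0.5, 1.5])
ax["timeseries"].set_title("timeseries")
ax["histogram"].hist([1, 1, 2, 3, 3, 3])
ax["histogram"].set_title("histogram")
```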
 
== Fast percentile calculation in python ==

The goal: a percentile climatology where, for each day of the year, that day plus 15 days either side are gathered together over all years to make a (dayofyear, lat, lon) dataset of 90th percentiles from each sample.

Rolling will get a 31 day sample centred around the day of interest (at the start and end of the dataset the values will be NaN):

 tas.rolling(time=31, center=True)

Construct will take the rolling samples and convert them to a new dimension, so the data now has dimensions (time, lat, lon, window):

 tas.rolling(time=31, center=True).construct(time='window')

Groupby will collect all the equivalent days of the year (remember the 'time' axis is the centre of the sample):

 (tas.rolling(time=31, center=True)
     .construct(time='window')
     .groupby('time.dayofyear'))

Normally you could just add a reduction operation (e.g. mean) to the end here, but that doesn't work with percentiles in this case. Instead do a loop:

 doy_pct = []
 for doy, sample in (tas
         .rolling(time=31, center=True)
         .construct(time='window')
         .groupby('time.dayofyear')):
     print(doy)
     doy_pct.append(sample.load().quantile(0.9, dim=['time', 'window']))
 xarray.concat(doy_pct, dim='dayofyear')

Note that we've called `.load()` inside the loop, which avoids a Dask chunking error when computing a percentile. Also note the use of the list to gather the percentile arrays for each day, which are then concatenated along a new dimension 'dayofyear'.

Try it with just a single point to help understand how this is working.
 
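A self-contained version of the steps above, run on a small synthetic dataset (two non-leap years of random data on a 2x2 grid; the sizes are made up for illustration, and `.load()` is omitted because this toy data is already in memory rather than dask-backed):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two non-leap years (2001-2002) of daily data on a 2x2 grid, so each
# dayofyear from 1 to 365 appears exactly twice.
time = pd.date_range("2001-01-01", periods=730, freq="D")
tas = xr.DataArray(
    np.random.default_rng(0).normal(size=(730, 2, 2)),
    dims=("time", "lat", "lon"),
    coords={"time": time},
)

# 31-day windows centred on each day, moved into a new 'window' dimension,
# giving dimensions (time, lat, lon, window).
windowed = tas.rolling(time=31, center=True).construct(time="window")

# For each day of year, take the 90th percentile over all years and all
# window positions (skipna is on by default, so the NaN edges of the
# rolling window are ignored).
doy_pct = []
for doy, sample in windowed.groupby("time.dayofyear"):
    doy_pct.append(sample.quantile(0.9, dim=["time", "window"]))
clim = xr.concat(doy_pct, dim="dayofyear")

print(dict(clim.sizes))
```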

Revision as of 01:54, 6 September 2021