Revision as of 01:54, 6 September 2021
Code Breaks
1/9/2021

Summary of topics:
- How to organise plotting large numbers of plots from heterogeneous models in python
- Layout of subplots with matplotlib in python
- Fast percentile calculation in python
Organising workflows in python
cf-xarray is a useful library for writing general code for a heterogeneous set of models that might not share the same names for things like dimensions (e.g. lat/latitude/LAT) and variables. cf-xarray can infer this information and allows you to refer to dimensions and variables in a general way. It is available in the conda environments and the documentation is here:
https://cf-xarray.readthedocs.io/en/latest/
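The idea cf-xarray automates can be sketched without the library at all. This is a hypothetical, stdlib-only illustration of the name-resolution problem: the alias map and the `resolve` helper are made up for this example, whereas cf-xarray infers the equivalences from CF metadata for you.

```python
# Hypothetical sketch of the problem cf-xarray solves: different models
# use different names for the same dimension. Here we fake the inference
# with a hand-made alias map; cf-xarray reads it from CF metadata instead.
ALIASES = {
    "latitude": {"lat", "latitude", "LAT", "y"},
    "longitude": {"lon", "longitude", "LON", "x"},
}

def resolve(dim_names, standard_name):
    """Return whichever of dim_names corresponds to standard_name."""
    matches = [d for d in dim_names if d in ALIASES[standard_name]]
    if len(matches) != 1:
        raise KeyError(f"could not uniquely resolve {standard_name!r}")
    return matches[0]

# one model calls it 'lat', another 'latitude' - same code handles both
print(resolve(["time", "lat", "lon"], "latitude"))            # lat
print(resolve(["time", "latitude", "longitude"], "latitude"))  # latitude
```

With cf-xarray the equivalent lookup happens through the `.cf` accessor on an xarray object, so your plotting code never hard-codes a particular model's dimension names.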
The python3 pathlib library is an object oriented path manipulation library that makes working with paths a lot simpler and cleaner, e.g. when opening data files, and saving processed output or plots to disk:
https://docs.python.org/3/library/pathlib.html
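As a small sketch of the kind of path handling mentioned above, here is a hedged example of building per-model output paths for plots with pathlib. The directory and file names are illustrative, not from the discussion.

```python
# Building per-model output paths for plots with pathlib.
# The model and variable names here are made-up examples.
from pathlib import Path

out_dir = Path("plots") / "ACCESS-ESM1-5"      # hypothetical model name
out_dir.mkdir(parents=True, exist_ok=True)     # create the tree if missing
plot_path = (out_dir / "tas_timeseries").with_suffix(".png")
print(plot_path)   # e.g. plots/ACCESS-ESM1-5/tas_timeseries.png
```

Compared with string concatenation, the `/` operator and `with_suffix` keep the path logic readable and portable across operating systems.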
As for how to organise plotting a large number of different plots from a range of models, several data structures might suit this purpose; it comes down to the specifics of what needs to be done and personal preference, but some options are:

- python dict (https://realpython.com/python-data-structures/#dict-simple-data-objects)
- pandas dataframe (https://datatofish.com/create-pandas-dataframe/)
- python data class (https://realpython.com/python-data-classes/)
For more information, this is a pretty comprehensive write-up of some of the commonly used data structures in python: https://realpython.com/python-data-structures/
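As a hedged sketch of the data-class option, here is one way a set of plot jobs across models could be organised. All field and class names (`PlotJob`, `model`, `plot_type`) are invented for this illustration, not from the discussion.

```python
# Hypothetical sketch: a data class describing one plot job, so a list of
# jobs can drive a loop over many models and plot types.
from dataclasses import dataclass

@dataclass
class PlotJob:
    model: str
    variable: str
    plot_type: str

    def filename(self) -> str:
        """Derive an output file name from the job's fields."""
        return f"{self.model}_{self.variable}_{self.plot_type}.png"

jobs = [
    PlotJob("ACCESS-ESM1-5", "tas", "timeseries"),
    PlotJob("UKESM1-0-LL", "tas", "map"),
]
for job in jobs:
    print(job.filename())
```

A plain dict of dicts would work just as well; the data class mainly buys you named fields and a place to hang small helper methods.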
Subplots in matplotlib
Scott wrote a blog post showing a sophisticated use of subplots, with some tips for organising the plots by saving a reference to each in a dictionary keyed by plot type: https://climate-cms.org/2018/04/27/subplots.html
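A minimal sketch of that dictionary-of-axes idea (the key names and data here are invented; see the blog post for the full version):

```python
# Keep a named reference to each Axes in a dict, so later code can say
# ax_map["timeseries"] instead of remembering positional indices.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 1, figsize=(6, 8))
ax_map = {"timeseries": axes[0], "histogram": axes[1]}

ax_map["timeseries"].plot([0, 1, 2], [1.0, 0.5, 0.8])
ax_map["timeseries"].set_title("timeseries")
ax_map["histogram"].hist([1, 2, 2, 3, 3, 3])
fig.savefig("example.png")
```

This scales nicely when the set of panels differs between models: build the dict from whatever plot types a given model actually has.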
Unrelated to the original topics, but some of the attendees didn't know it is possible to connect a Jupyter notebook directly to Gadi compute nodes, which is useful for anyone who must access data on /scratch, or has workloads that are currently too onerous for VDI or ood.nci.org.au (though ood is designed to cater for larger jobs than VDI). This is all covered on our wiki: http://climate-cms.wikis.unsw.edu.au/Running_Jupyter_Notebook#On_Gadi which also covers creating symbolic links to access files in other locations on the file system, e.g. /g/data
Fast percentile calculation in python
Calculating a percentile climatology: for each day in the year, that day plus 15 days either side are gathered together over all years to make a (dayofyear, lat, lon) dataset of 90th percentiles from each sample.

Rolling will get a 31-day sample centred around the day of interest (at the start and end of the dataset the values will be NaN):

    tas.rolling(time=31, center=True)

Construct will take the rolling samples and convert them to a new dimension, so the data now has dimensions (time, lat, lon, window):

    tas.rolling(time=31, center=True).construct(time='window')

Groupby will collect all the equivalent days of the year (remember the 'time' axis is the centre of the sample):

    (tas.rolling(time=31, center=True)
        .construct(time='window')
        .groupby('time.dayofyear'))

Normally you can just add a reduction operation (e.g. mean) to the end here, but that doesn't work with percentiles in this case. Instead do a loop:

    doy_pct = []
    for doy, sample in (tas
            .rolling(time=31, center=True)
            .construct(time='window')
            .groupby('time.dayofyear')):
        print(doy)
        doy_pct.append(sample.load().quantile(0.9, dim=['time', 'window']))

    xarray.concat(doy_pct, dim='dayofyear')

See that we've called `.load()` inside the loop, which avoids a Dask chunking error when doing a percentile. Try it with just a single point to help understand how this is working. Note the use of the list to gather the percentile arrays for each day, then concatenate along a new dimension 'dayofyear'.
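To follow the "try it with just a single point" suggestion without needing xarray at all, here is a hedged, stdlib-only sketch of the same calculation on a 1-D daily series: for each day of year, pool a 31-day window across all years and take the 90th percentile. The function name, the 365-day calendar, and the nearest-rank percentile are all simplifying assumptions for illustration.

```python
# Stdlib-only sketch of the rolling day-of-year percentile climatology
# above, for a single grid point. Assumes 365-day years; real calendars
# (leap days, cftime) need more care.
import math

def doy_percentile(series, years, window=31, q=0.9):
    """For each day of year (0-364), pool `window` days centred on it
    across all years and return the q-th percentile of each pool."""
    half = window // 2
    n = 365
    out = []
    for doy in range(n):
        pool = []
        for year in range(years):
            for offset in range(-half, half + 1):
                idx = year * n + doy + offset
                if 0 <= idx < len(series):   # edges get a shorter sample
                    pool.append(series[idx])
        pool.sort()
        # nearest-rank percentile (simpler than numpy's interpolation)
        out.append(pool[min(len(pool) - 1, int(q * len(pool)))])
    return out

# usage: 10 years of synthetic data with an annual cycle
series = [math.sin(2 * math.pi * (d % 365) / 365) for d in range(3650)]
clim = doy_percentile(series, years=10)
```

The xarray version does exactly this, but vectorised over (lat, lon) and with proper calendar handling, which is why rolling/construct/groupby is the better tool for real data.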