CodeBreak 1/9/2021
Latest revision as of 23:55, 6 September 2021
Summary of topics:
- How to organise plotting large numbers of plots from heterogeneous models in python
- Layout of subplots with matplotlib in python
- Fast percentile calculation in python
Organising workflows in python
cf-xarray is a useful library for writing general code for a heterogeneous set of models that might not share the same names for things like dimensions (e.g. lat/latitude/LAT) and variables. cf-xarray infers this information from the metadata and lets you refer to dimensions and variables in a general way. It is available in the conda environments and the documentation is here:
https://cf-xarray.readthedocs.io/en/latest/
The python3 pathlib library is an object-oriented path manipulation library that makes working with paths a lot simpler and cleaner, e.g. when opening data files, and when saving processed output or plots to disk:
https://docs.python.org/3/library/pathlib.html
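A minimal sketch of the kind of path handling this makes easier: finding input files and deriving a matching output path for a plot. The function names, the "*.nc" pattern and the directory layout are assumptions made up for illustration.

```python
from pathlib import Path


def find_data_files(data_dir, pattern="*.nc"):
    """Return the files in data_dir matching pattern, sorted by name."""
    return sorted(Path(data_dir).glob(pattern))


def plot_path_for(data_file, plot_dir, suffix=".png"):
    """Return the path of the plot for data_file, creating plot_dir if needed."""
    plot_dir = Path(plot_dir)
    plot_dir.mkdir(parents=True, exist_ok=True)
    # Same file name as the input, with the extension swapped for the plot suffix.
    return plot_dir / Path(data_file).with_suffix(suffix).name
```

For example, looping over find_data_files("data") and calling plot_path_for(f, "plots/run1") for each file gives one output name per input file without any string splitting or os.path.join calls.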
There is no single best way to organise a large number of different plots from a range of models: several data structures could suit, and the choice comes down to the specifics of what needs to be done and personal preference. One option is the python dict: https://realpython.com/python-data-structures/#dict-simple-data-objects
For more information, this is a fairly comprehensive write-up of the commonly used data structures in python: https://realpython.com/python-data-structures/
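One possible way to use a dict for this, sketched under assumptions: a nested dict keyed first by model and then by plot type, holding whatever each plot needs (here just an output file name). The model names, plot types and the build_plot_registry helper are all hypothetical.

```python
def build_plot_registry(models, plot_types):
    """Map model -> plot type -> output file name for that plot."""
    return {
        model: {ptype: f"{model}_{ptype}.png" for ptype in plot_types}
        for model in models
    }


# Hypothetical model names and plot types.
registry = build_plot_registry(["model_a", "model_b"], ["mean", "bias"])

# Visit every (model, plot type) combination in a predictable order.
jobs = [
    (model, ptype, filename)
    for model, plots in registry.items()
    for ptype, filename in plots.items()
]
```

Looking up a single plot is then just registry["model_b"]["bias"], and the same structure could hold axes, settings or data instead of file names.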
Subplots in matplotlib
Scott wrote a blog post showing a sophisticated use of subplot. It also has some tips for organising the plots by saving a reference to each one in a dictionary, keyed by plot type: https://climate-cms.org/2018/04/27/subplots.html
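The dictionary tip could look something like the sketch below. This is not the code from the blog post, just a minimal illustration of the idea; the make_axes helper, the plot type names and the two-column layout are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt


def make_axes(plot_types, ncols=2):
    """Create a grid of subplots and return (figure, {plot type: axes})."""
    nrows = -(-len(plot_types) // ncols)  # ceiling division
    fig, grid = plt.subplots(
        nrows, ncols, squeeze=False, figsize=(4 * ncols, 3 * nrows)
    )
    flat = list(grid.flat)
    axes = dict(zip(plot_types, flat))
    # Hide any leftover cells in the grid.
    for ax in flat[len(plot_types):]:
        ax.set_visible(False)
    for name, ax in axes.items():
        ax.set_title(name)
    return fig, axes


# Hypothetical plot types; each axes is then addressed by name, not by grid index.
fig, axes = make_axes(["mean", "bias", "trend"])
axes["bias"].plot([0, 1, 2], [0.5, -0.2, 0.1])
```

Referring to axes["bias"] rather than grid[0, 1] means the plotting code does not need to change if the layout does.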
Jupyter Notebook
Although unrelated to the original topics, some of the attendees didn't know it was possible to connect a jupyter notebook directly to gadi compute nodes. This is useful for anyone who must access data on /scratch, or whose workloads are currently too onerous for VDI or ood.nci.org.au (though ood is designed to cater for larger jobs than VDI). This is all covered on our wiki page, which also covers creating symbolic links to access files in other locations on the file system, e.g. /g/data: http://climate-cms.wikis.unsw.edu.au/Running_Jupyter_Notebook#On_Gadi
Fast percentile calculation in python
The question was how to calculate a percentile climatology: for each day of the year, calculate the 90th percentile of the values from the 31 days surrounding that date.
The notebook (https://nbviewer.jupyter.org/gist/ScottWales/205460f23ca4490b82578cb12ac07e7c) illustrates:
- the use of rolling() and construct() to build a DataArray with the 31-day windows for each day of the timeseries.
- the use of groupby() to do calculations for each day of the year.
- the use of quantile() to calculate percentiles on DataArrays.
- the use of load() if a function complains about chunking.
- the use of a list and xarray.concat() to create a DataArray of results.
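The steps listed above can be combined roughly as follows. This is a sketch written for these notes rather than the code from the notebook; the percentile_climatology function, the "window" dimension name and the assumption that the input has a daily "time" dimension are all illustrative choices.

```python
import numpy as np
import pandas as pd
import xarray as xr


def percentile_climatology(da, q=0.9, window=31):
    """For each day of the year, return the q-quantile of da over all years,
    using the window-day window centred on each date."""
    # Add a "window" dimension holding the surrounding days for every time step.
    # Windows that run off the ends of the series are padded with NaN.
    windows = da.rolling(time=window, center=True).construct("window")

    results = []
    for doy, group in windows.groupby("time.dayofyear"):
        group = group.load()  # in case the data is chunked (dask-backed)
        # NaN padding is skipped by default for floating point data.
        pct = group.quantile(q, dim=("time", "window"))
        results.append(pct.assign_coords(dayofyear=doy))

    # Stack the per-day results along a new "dayofyear" dimension.
    return xr.concat(results, dim="dayofyear")


# Made-up example data: three non-leap years of a constant daily series.
time = pd.date_range("2001-01-01", "2003-12-31", freq="D")
da = xr.DataArray(np.full(time.size, 5.0), coords={"time": time}, dims="time")
clim = percentile_climatology(da)
```

Looping over the groups is simple to follow but not necessarily the fastest approach for large datasets; the notebook linked above is the place to look for the version discussed at the meeting.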