Latest revision as of 02:05, 8 July 2022

 

Conda Python Environments

CMS maintain an Anaconda Python environment at NCI, with a wide variety of climate and weather related libraries.

You can find the most recent list of libraries at our github repository, or run conda list with an environment loaded.

To use any of the conda environments:

  1. Request access to the hh5 project (only needed once)
  2. Run the following command at the start of each session (a qsub job counts as a new session):

module use /g/data/hh5/public/modules

You can safely put this in your ~/.bashrc file. It can go in the if in_interactive_shell; then section of the file.

If you need to use the conda environment in a PBS job you will need to add the hh5 project to your storage flags, e.g.

#PBS -l storage=gdata/hh5
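
As a rough sketch of how this flag fits into a complete job, the helper below prints a minimal PBS script. The function name make_conda_job, the project code ab12, the script name and the resource requests are placeholders chosen for illustration, not values from this page; only the storage flag and the module lines come from the instructions above.

```shell
# Hypothetical helper: print a minimal PBS job script that loads the hh5 conda environment.
# $1 = project code to charge (placeholder), $2 = Python script to run (placeholder).
make_conda_job() {
    local project="$1"
    local script="$2"
    cat <<EOF
#!/bin/bash
#PBS -P ${project}
#PBS -l storage=gdata/hh5
#PBS -l ncpus=1
#PBS -l mem=4GB
#PBS -l walltime=00:10:00

# A qsub job is a new session, so the module path must be added again
module use /g/data/hh5/public/modules
module load conda/analysis3

python ${script}
EOF
}
```

For example, make_conda_job ab12 analysis.py > job.pbs writes the script to a file that could then be submitted with qsub job.pbs. If the job also reads data from other projects, their storage flags would need to be added to the same storage line.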

Note that these conda environments will work on gadi login and compute nodes, and in the OOD cloud environment. They do not work on accessdev, as this has a very old system version that is no longer compatible.

Stable Environment

We update the stable environment once a quarter, around when NCI do their quarterly maintenance of Gadi. Otherwise everything in the environment stays fixed: we don't update packages or install new packages unless something is very broken.

module load conda/analysis3

Unstable Environment

The unstable environment gets updated more often, as we install new packages or apply updates to existing ones. If you ask for a new package it will be installed here.

module load conda/analysis3-unstable

When we do our quarterly update the unstable environment becomes the new stable environment.

Removed Environments

Old environments are normally removed after three quarters, to reduce disk usage and support burden. Conda environment.yml descriptions of past environments are available at https://github.com/coecms/conda-history.


Creating personal environments

You can create your own environment if needed, but please be cautious of both the size on disk and number of files that Conda environments can create.

Make a file ~/.condarc like:

auto_activate_base: false
envs_dirs:
  - /scratch/$PROJECT/$USER/conda/envs
  - /g/data/hh5/public/apps/miniconda3/envs
pkgs_dirs:
  - /scratch/$PROJECT/$USER/conda/pkgs
conda-build:
  root-dir: /scratch/$PROJECT/$USER/conda/bld

This will set up Conda to create environments in /scratch. By default it puts them in your home directory, which will rapidly use up your disk quota.

 

To create the conda environment, load the conda module, deactivate the environment it activates, then create and activate your own:

conda deactivate
conda env create ...
conda activate ...
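
The sequence above could be wrapped up roughly as follows. This is a sketch under assumptions: the function name create_personal_env, the environment name and the environment file are hypothetical, and the conda config line is an added check that environments will be created under /scratch rather than part of the original steps.

```shell
# Hypothetical helper: create and activate a personal environment.
# $1 = environment name (placeholder), $2 = environment file (placeholder).
create_personal_env() {
    local name="$1"
    local file="$2"
    module use /g/data/hh5/public/modules &&
    module load conda/analysis3 &&      # provides the conda command
    conda deactivate &&                 # leave the shared analysis3 environment
    conda config --show envs_dirs &&    # added check: the /scratch path should be listed first
    conda env create --name "$name" --file "$file" &&
    conda activate "$name"
}
```

It would be called as, for example, create_personal_env myenv environment.yml, where environment.yml lists the packages you need.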

 

Create an environment file, environment.yml. Files on scratch are deleted if they are no longer used, so this file allows you to re-create your environment if some of its files are deleted. You can use any name for the environment file. Keep it in a safe place.

conda env export > environment.yml

To recreate the environment from the environment file:

conda env create -f environment.yml
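
One possible way to keep the environment file out of scratch is a small helper like the one below. The function name save_env_definition and the default directory under your home directory are assumptions made for this sketch, not locations recommended by this page.

```shell
# Hypothetical helper: export the definition of a named environment to a file outside /scratch.
# $1 = environment name, $2 = output directory (assumed default: $HOME/conda-envs).
save_env_definition() {
    local name="$1"
    local outdir="${2:-$HOME/conda-envs}"
    mkdir -p "$outdir" &&
    conda env export --name "$name" > "$outdir/$name.yml"
}
```

After save_env_definition myenv, the environment could be rebuilt later with conda env create -f ~/conda-envs/myenv.yml.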

Interactive Analysis / Jupyter

Jupyter provides a 'notebook' interface for working with Python: you can combine Python code, text, LaTeX equations and plots in a web interface.

The preferred method of running Jupyter at NCI is through the 'Open on Demand' (OOD) service https://ood.nci.org.au. This runs a Jupyter instance in NCI's cloud that you can access directly from your browser. To use CLEX Conda in OOD, start Jupyter with the advanced options:

  • Module Directories: /g/data/hh5/public/modules
  • Modules: conda/analysis3

The Centre has developed a script (gadi_jupyter) that can run Jupyter on Gadi compute nodes and display the notebook interface on your local computer. The script is available at https://github.com/coecms/nci_scripts; see the instructions there for usage.

You can also run Jupyter directly on VDI, by loading the Conda environment and running 'jupyter lab'.

Note that on Windows the Jupyter scripts must be run through a Bash terminal (from WSL or Cygwin).

Requesting new packages

You can ask for a new package to be installed or for an existing package to be updated by emailing cws_help@nci.org.au. Please include a link to the package documentation in your request.

Before putting in a request, please check that the package isn't already installed. To do so, load the unstable environment and use conda list to list the packages included in that environment.
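
A minimal sketch of such a check is shown below. The function name conda_has_package is made up for illustration; it only relies on conda list printing one package per line with the package name in the first column.

```shell
# Hypothetical helper: succeed if the currently loaded environment contains package $1.
conda_has_package() {
    # Header lines of `conda list` start with '#', so they never match a package name
    conda list | awk -v pkg="$1" '$1 == pkg { found = 1 } END { exit !found }'
}
```

After module load conda/analysis3-unstable, running conda_has_package lmfit && echo "already installed" would report whether a request is needed.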

As a general rule we will only install packages from the 'conda-forge' channel. Newly installed packages will be available in the conda/analysis3-unstable environment.


Update History

22.04 (Current Stable)

Notable new packages

  • Statistics/ML
    • lmfit Non-Linear Least-Squares Minimization and Curve-Fitting for Python
  • Climate specific
    • benchcab CABLE benchmarking
    • xclim library of functions to compute climate indices from observations or model simulations. It is built using xarray and can benefit from the parallelization handling provided by dask
    • acs-replica-intake Intake-esm catalogue for the Australian Reference Climate Data at NCI collection
  • Developer
    • jupyter-resource-usage extension for Jupyter Notebooks and JupyterLab that displays an indication of how much resources your current notebook server and its children (kernels, terminals, etc) are using

22.01 (Unsupported)

Notable new packages

  • Statistics/ML
    • pygam build Generalized Additive Models in Python
    • spectrum tools to estimate Power Spectral Densities based on Fourier transform, parametric methods or eigenvalues analysis
  • Climate specific
    • xmhw xarray compatible Marine Heatwave Detection
    • argopy library dedicated to Argo data access, manipulation and visualisation for standard users as well as Argo experts.
    • ilamb the International Land Model Benchmarking (ILAMB) project is a model-data intercomparison and integration project designed to improve the performance of land models
  • Developer
    • dataclasses-json provides a simple API for encoding and decoding dataclasses to and from JSON
  • Geospatial
    • pysal spatial analysis library for open, cross platform geospatial data science

21.10

Notable new packages

  • Statistics/ML
    • xarraymannkendall compute linear trends over 2D and 3D arrays
    • hdbscan tools to use unsupervised learning to find clusters, or dense regions, of a dataset
  • Geospatial
    • timezonefinder looking up the corresponding timezone for given coordinates on earth entirely offline
    • iris-grib converting between weather and climate datasets that are stored as GRIB files and Iris cubes
    • intake-thredds Intake interface to THREDDS data catalogs
  • Developer Tools
    • lftp sophisticated file transfer program supporting a number of network protocols
    • watchdog monitor file system events
    • numpy_groupies Optimised tools for group-indexing operations: aggregated sum and more
    • anytree Simple, lightweight and extensible Tree data structure
    • fortran-language-server Fortran implementation of the Language Server Protocol
    • jupyter-book building beautiful, publication-quality books and documents from computational material
    • kerchunk Cloud-friendly access to archival data
    • ujson ultra fast JSON encoder and decoder

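Known Issues

  • MetPy is incompatible with the Matplotlib version in this environment: https://github.com/Unidata/MetPy/issues/2281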

21.07

Python has been updated to version 3.9. See what's new at https://docs.python.org/3/whatsnew/3.9.html

Notable new packages

  • Statistics/ML
    • tensorflow TensorFlow is an end-to-end open source platform for machine learning
      • Note: You may need to load the NCI modules 'cuda' and 'cudnn' to use tensorflow on GPU nodes
    • shap a game theoretic approach to explain the output of any machine learning model
  • Geospatial
    • gcm_filters Diffusion-based Spatial Filtering of Gridded Data from General Circulation Models
  • Visualisation
    • mayavi 3D scientific data visualization and plotting in Python
  • Developer Tools
    • gh a tool to open Github projects in a browser from the command line

21.04

Conda has dropped support for the old operating system version used on the Accessdev VM. It is likely that versions of analysis3 after 21.04 will not work on Accessdev, so consider setting up tasks to run on Gadi if they require the Conda environment.

Notable new packages

  • Statistics/ML
    • cvxpy modeling language for convex optimization problems
    • dask-xgboost optimized distributed gradient boosting library
    • pymannkendall analyze time series data for consistently increasing or decreasing trends
  • Geospatial
    • xarray-spatial common raster analysis functions
  • Visualisation
    • xmovie simple way of creating beautiful movies from xarray objects
  • Developer Tools
    • git-subtree subtree extension for Git
    • rechunker efficient and scalable manipulation of the chunk structure of chunked array formats such as Zarr and TileDB
    • spyder-kernels allow conda environment to work with Spyder IDE
    • watermark IPython magic extension for printing date and time stamps, version numbers, and hardware information

21.01

The conda environment now uses a conda provided OpenMPI, rather than Gadi's OpenMPI module. This impacts users of mpi4py, esmf, esmpy and xesmf.

era5grib, the tool for converting NCI's ERA5 archive to GRIB format for use in UM/WRF limited area runs, now uses the NCI managed ERA5 archive in projects rt52 and zz93. The new archive has global coverage, allowing limited area models to be run anywhere on the globe. The previously used CLEX archive will be removed on the 28th April to free up disk space; until then, the previous behaviour can be accessed using the flag '--source CLEX'.

Notable new packages

  • Statistics
    • dask-ml Scalable machine learning
    • pykrige Gaussian process regression toolkit
    • nctoolkit Toolkit for analysing NetCDF data
  • Geospatial
    • xarray-spatial Raster-based spatial analysis
    • fiona GDAL vector API for Python
  • Visualisation
    • geoplot Geospatial data visualisation
    • folium Leaflet webpage map manipulation
    • mplleaflet Use matplotlib on Leaflet webpage maps
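  • Developer Tools
    • mamba Faster conda environment setup
    • black Python code formatter
    • intake-esm Data cataloguing tool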

20.10 (Unsupported)

Python has been updated to 3.8 (changes)

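A small number of packages in analysis3-20.07 are not compatible with Python 3.8; these have been disabled until we can get them working:

  • pynio (https://github.com/conda-forge/pynio-feedstock/issues/90)
  • cf_units (resolved: renamed to 'cf-units')
  • pymunge (fixed)
  • ants (fixed)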

Notable new packages

  • rioxarray geospatial xarray extension powered by rasterio
  • cupy CUDA accelerated numpy
  • celluloid simplified animations with matplotlib
  • rasterstats summarizing geospatial raster datasets based on vector geometries
  • windrose manage wind data, draw windrose (also known as a polar rose plot), draw probability density function and fit Weibull distribution
  • pyam analysis and visualization of integrated-assessment scenarios
  • pyarrow a cross-language development platform for in-memory data
  • python-gist a command line interface for working with github gists
  • jags statistical analysis of Bayesian hierarchical models by Markov Chain Monte Carlo

20.07 (Unsupported)

Notable New Packages

  • xesmf - regrids xarray data (NOTE: uses Gadi's ESMF install, won't work on VDI)
  • sharppy - sounding and hodograph analysis
  • earthpy - spatial raster and vector tools (e.g. rasterise shapefiles)
  • descartes - plot shapefiles
  • era5grib - convert data from NCI ERA5 archive to GRIB for use in WRF/UM (beta tool)
  • xmitgcm - read mitgcm binary output

20.04 (Unsupported)

Cartopy NaturalEarth source data has been centrally installed, so coastlines etc. can be drawn on compute nodes

Clef has been updated to 1.0 and can now find ACCESS model CMIP6 data published by NCI

Notable New Packages

  • geopy: Locate lat/lon coordinates of places
  • ninja: mom6 build system

20.01 (Unsupported)

Python has been updated to 3.7

Notable New Packages

  • xlrd: Read excel files
  • ants: Unified Model Ancillary tools
  • climtas: Dask-aware Xarray timeseries processing

19.10 (Unsupported)

basemap has been removed as it is no longer supported and caused conflicts with other packages

Notable New Packages

19.07 (Unsupported)

Notable New Packages

  • pyferret
  • pyngl
  • pynio
  • xgcm
  • xrft

19.04 (Unsupported)

Notable Changes

  • arccssive has been renamed to clef
  • A bug preventing Iris from opening UM files without a date has been fixed

Notable New Packages

  • bottleneck Fast rolling operations
  • cfgrib CF metadata for GRIB files
  • cfunits Convert between CF units
  • h5netcdf Pythonic interface to netCDF4 via h5py
  • intake Lightweight data catalogues
  • nccmp Compare netcdf files
  • sparse Sparse multi-dimensional arrays