Latest revision as of 23:44, 13 April 2021

Conda Python Environments

CMS maintain an Anaconda Python environment at NCI, with a wide variety of climate and weather related libraries.

You can find the most recent list of libraries at our GitHub repository, or run conda list with an environment loaded.

To use any of the conda environments:

  1. Request access to the hh5 project (this only needs to be done once)
  2. Run the following command at the start of each session:

module use /g/data3/hh5/public/modules

You can safely put this in your ~/.profile or ~/.login file.
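
As a minimal sketch of adding that line to a startup file without duplicating it on repeated runs, something like the following could be used. The function name add_hh5_modules and its optional argument are illustrative only and not part of any NCI tooling.

```shell
# Append the hh5 'module use' line to a startup file unless it is already there.
# add_hh5_modules is a hypothetical helper name used for this sketch.
add_hh5_modules() {
    profile="${1:-$HOME/.profile}"
    line='module use /g/data3/hh5/public/modules'
    touch "$profile"
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "$line" "$profile"; then
        printf '%s\n' "$line" >> "$profile"
    fi
}

# Usage:
#   add_hh5_modules            # edits ~/.profile
#   add_hh5_modules ~/.login   # or another startup file
```

Running the function twice leaves a single copy of the line in the file.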

If you need to use the conda environment in a PBS job you will need to add the hh5 project to your storage flags, e.g.

#PBS -l storage=gdata/hh5
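
For context, a complete job script might look roughly like the sketch below, which writes an example script to the current directory. Only the storage flag and the two module commands come from this page. The project code ab12 is a placeholder, and the queue name and resource values are assumptions that should be replaced with your own.

```shell
# Write a small example job script to the current directory.
# ab12 is a placeholder project code; queue and resource values are assumptions.
job_script="conda_job.pbs"
cat > "$job_script" <<'EOF'
#!/bin/bash
#PBS -P ab12
#PBS -q normal
#PBS -l ncpus=1
#PBS -l mem=4GB
#PBS -l walltime=00:10:00
#PBS -l storage=gdata/hh5
#PBS -l wd

# Make the hh5 modules visible, then load the stable environment
module use /g/data3/hh5/public/modules
module load conda/analysis3

python --version
EOF

# Submit on Gadi with: qsub conda_job.pbs
```

If the job also reads or writes other project directories, those would presumably need to be added to the same storage flag.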

Stable Environment

We update the stable environment once a quarter, around when NCI do their quarterly maintenance of Gadi. Otherwise everything in the environment stays fixed; we don't update packages or install new packages unless something is very broken.

module load conda/analysis3

Unstable Environment

The unstable environment gets updated more often, as we install new packages or apply updates to existing ones. If you ask for a new package it will be installed here.

module load conda/analysis3-unstable

When we do our quarterly update the unstable environment becomes the new stable environment.
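
If a script needs to switch between the two environments, the choice can be kept in one place. The helper below is a sketch only: the function name analysis3_module is hypothetical, and it simply maps a label to the two module names given above.

```shell
# Map a label to one of the two module names described above.
# analysis3_module is a hypothetical helper name used for this sketch.
analysis3_module() {
    case "${1:-stable}" in
        stable)   echo "conda/analysis3" ;;
        unstable) echo "conda/analysis3-unstable" ;;
        *)        echo "unknown environment: $1" >&2; return 1 ;;
    esac
}

# On Gadi, after 'module use /g/data3/hh5/public/modules':
#   module load "$(analysis3_module unstable)"
```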

Removed Environments

Old environments are normally removed after three quarters have passed, to reduce disk usage and support burden. Conda environment.yml descriptions of past environments are available at https://github.com/coecms/conda-history.

Creating personal environments

You can create your own environment if needed, but please be mindful of both the size on disk and the number of files that Conda environments can create. Make a file ~/.condarc like:

auto_activate_base: false
envs_dirs:
  - /scratch/$PROJECT/$USER/conda/envs
  - /g/data/hh5/public/apps/miniconda3/envs
pkgs_dirs:
  - /scratch/$PROJECT/$USER/conda/pkgs
conda-build:
  root-dir: /scratch/$PROJECT/$USER/conda/bld

This will set up Conda to create environments in /scratch. By default it puts them in your home directory, which will rapidly use up your disk quota.
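
To keep an eye on the size on disk and the number of files of a personal environment, a small helper built from standard tools can be used. This is a sketch: env_footprint is a hypothetical name, and the environment name myenv in the usage comment is a placeholder.

```shell
# Report the size on disk and the number of files under a directory,
# for example a personal Conda environment.
# env_footprint is a hypothetical helper name used for this sketch.
env_footprint() {
    dir="$1"
    # du -sk prints "<size in KiB><TAB><path>"; keep only the size
    size_kb="$(du -sk "$dir" | cut -f1)"
    # Count regular files; tr strips padding that some wc versions add
    files="$(find "$dir" -type f | wc -l | tr -d ' ')"
    printf '%s: %s KiB in %s files\n' "$dir" "$size_kb" "$files"
}

# On Gadi, with the ~/.condarc above ('myenv' is a placeholder name):
#   env_footprint "/scratch/$PROJECT/$USER/conda/envs/myenv"
```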

Interactive Analysis / Jupyter

Jupyter provides a 'notebook' interface for working with Python - you can combine Python code, text, LaTeX equations and plots in a web interface.

The Centre has developed scripts that can run Jupyter on NCI facilities - VDI or Gadi - and display the notebook interface on your local computer. These scripts are available at https://github.com/coecms/nci_scripts; see the instructions there for usage.

You can also run Jupyter directly on VDI, by loading the Conda environment and running 'jupyter lab'.

Note that on Windows the Jupyter scripts must be run through a Bash terminal (from WSL or Cygwin).

Requesting new packages

You can ask for a new package to be installed or for an existing package to be updated by emailing cws_help@nci.org.au. Please include a link to the package documentation in your request.

Before putting in a request, please check that the package isn't already installed. To do so, load the unstable environment and use conda list to list the packages included in that environment.
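
As a sketch of that check, the helper below reads conda list output on standard input and succeeds only if the first column exactly matches the given package name. The function name has_package is hypothetical, and xarray in the usage comment is just an example name.

```shell
# Succeed if a package name appears in 'conda list' output read from stdin.
# has_package is a hypothetical helper name used for this sketch.
has_package() {
    # Compare the first column exactly; header lines start with '#' and never match
    awk -v pkg="$1" '$1 == pkg { found = 1 } END { exit !found }'
}

# On Gadi:
#   module use /g/data3/hh5/public/modules
#   module load conda/analysis3-unstable
#   conda list | has_package xarray && echo "already installed"
```

Matching on the whole first column avoids false positives from packages whose names merely contain the search string.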

As a general rule we will only install packages from the 'conda-forge' channel. Newly installed packages will be available in the conda/analysis3-unstable environment.

Update History

21.04 (Current Unstable)

21.01 (Current Stable)

The conda environment now uses a conda-provided OpenMPI, rather than Gadi's OpenMPI module. This impacts users of mpi4py, esmf, esmpy and xesmf.

era5grib, the tool for converting NCI's ERA5 archive to GRIB format for use in UM/WRF limited area runs, now uses the NCI managed ERA5 archive in projects rt52 and zz93. The new archive has global coverage, allowing limited area models to be run anywhere on the globe. The previously used CLEX archive will be removed on 28 April to free up disk space; until then, the previous behavior can be accessed using the flag '--source CLEX'.

Notable new packages

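  • Statistics
    • dask-ml Scalable machine learning
    • pykrige Gaussian process regression toolkit
    • nctoolkit Toolkit for analysing NetCDF data
  • Geospatial
    • xarray-spatial Raster-based spatial analysis
    • fiona GDAL vector API for Python
  • Visualisation
    • geoplot Geospatial data visualisation
    • folium Leaflet webpage map manipulation
    • mplleaflet Use matplotlib on Leaflet webpage maps
  • Developer Tools
    • mamba Faster conda environment setup
    • black Python code formatter
    • intake-esm Data cataloguing tool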

20.10 (Unsupported)

Python has been updated to 3.8 (changes)

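A small number of packages in analysis3-20.07 are not compatible with Python 3.8; these were disabled until we could get them working. Their current status:

  • pynio (https://github.com/conda-forge/pynio-feedstock/issues/90)
  • cf_units (was renamed to 'cf-units')
  • pymunge (fixed)
  • ants (fixed)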

Notable new packages

  • rioxarray geospatial xarray extension powered by rasterio
  • cupy CUDA accelerated numpy
  • celluloid simplified animations with matplotlib
  • rasterstats summarizing geospatial raster datasets based on vector geometries
  • windrose manage wind data, draw windrose (also known as a polar rose plot), draw probability density function and fit Weibull distribution
  • pyam analysis and visualization of integrated-assessment scenarios
  • pyarrow a cross-language development platform for in-memory data
  • python-gist a command line interface for working with GitHub gists
  • jags statistical analysis of Bayesian hierarchical models by Markov Chain Monte Carlo

20.07 (Unsupported)

Notable New Packages

  • xesmf - regrids xarray data (NOTE: uses Gadi's ESMF install, won't work on VDI)
  • sharppy - sounding and hodograph analysis
  • earthpy - spatial raster and vector tools (e.g. rasterise shapefiles)
  • descartes - plot shapefiles
  • era5grib - convert data from NCI ERA5 archive to GRIB for use in WRF/UM (beta tool)
  • xmitgcm - read mitgcm binary output

20.04 (Unsupported)

Cartopy NaturalEarth source data has been centrally installed, so coastlines etc. can be drawn on compute nodes

Clef has been updated to 1.0, and can now find ACCESS model CMIP6 data published by NCI

Notable New Packages

  • geopy: Locate lat/lon coordinates of places
  • ninja: mom6 build system

20.01 (Unsupported)

Python has been updated to 3.7

Notable New Packages

  • xlrd: Read excel files
  • ants: Unified Model Ancillary tools
  • climtas: Dask-aware Xarray timeseries processing

19.10 (Unsupported)

basemap has been removed as it is no longer supported and caused conflicts with other packages

Notable New Packages

19.07 (Unsupported)

Notable New Packages

  • pyferret
  • pyngl
  • pynio
  • xgcm
  • xrft

19.04 (Unsupported)

Notable Changes

  • arccssive has been renamed to clef
  • A bug preventing Iris from opening UM files without a date has been fixed

Notable New Packages

  • bottleneck Fast rolling operations
  • cfgrib CF metadata for GRIB files
  • cfunits Convert between CF units
  • h5netcdf Pythonic interface to netCDF4 via h5py
  • intake Lightweight data catalogues
  • nccmp Compare netcdf files
  • sparse Sparse multi-dimensional arrays