Difference between revisions of "Programming"

Here are some starting points for topics related to programming. Generally we rely on external sources for basic how-tos and only provide domain-specific advice ourselves. You can also check out our [https://www.youtube.com/user/COECSSCMS/videos YouTube channel] and [https://climate-cms.org blog] for more topics and demonstrations.
  
== Python ==
  
[https://www.python.org Python] is a language widely used in the climate community for data analysis. It's also one of the most commonly used languages in general, so knowing Python is a very useful skill.
  
'''Beginner'''
* [https://swcarpentry.github.io/python-novice-inflammation/ Software Carpentry Introduction to Python]
* [http://swcarpentry.github.io/python-novice-gapminder/ Software Carpentry Plotting and Programming in Python]
* [https://carpentrieslab.github.io/python-aos-lesson/ Python for Atmosphere and Ocean Scientists]
* [[Conda|CLEX Conda environment at NCI]]
 
'''Intermediate'''
* [https://docs.python.org/3/ Python Language Documentation]
 
'''Advanced'''
* [https://dask.org/ Dask Documentation] (parallelisation library)
 
=== Jupyter Notebooks ===
 
A handy way to work with Python is to use the [https://jupyter.org/ Jupyter Notebook] interface. This lets you make documents combining text, Python code and images. You can run Jupyter at NCI using the [[Conda#Interactive_Analysis_.2F_Jupyter|CLEX Conda environment]].
 
=== Plotting ===
 
[https://matplotlib.org/ Matplotlib] is the starting point for Python plotting. Most of the time you use its [https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.html pyplot] interface, which is much easier to work with than the lower-level parts of the library.
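As a minimal sketch of the pyplot interface, the example below plots a made-up seasonal temperature cycle and saves it to a file. The data, the labels and the file name <code>temperature.png</code> are illustrative assumptions. The <code>Agg</code> backend is selected only so the script also runs where no display is available; it is not needed inside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: a sinusoidal "temperature" over 12 months
months = np.arange(1, 13)
temperature = 15 + 10 * np.sin((months - 4) * np.pi / 6)

# One figure containing one set of axes
fig, ax = plt.subplots()
ax.plot(months, temperature, marker="o", label="Monthly mean")
ax.set_xlabel("Month")
ax.set_ylabel("Temperature (°C)")
ax.set_title("Illustrative seasonal cycle")
ax.legend()

fig.savefig("temperature.png")  # hypothetical output file name
plt.close(fig)
```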
 
If you're making plots on a map, check out the [https://scitools.org.uk/cartopy/docs/latest/ Cartopy] library, which handles things like map projections, drawing coastlines and adding background images.
 
=== Analysing data ===
 
The [http://xarray.pydata.org/en/stable/ xarray] library is a great place to start if you're working with gridded NetCDF data. It lets you open a file, perform common analyses such as climatologies, and plot the results with very little code.
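As a rough sketch of that workflow, the example below builds a small synthetic dataset in place of a real file, then computes a monthly climatology and the anomalies from it. The variable name <code>tas</code>, the grid and the values are made up for illustration; with real data you would start from <code>xr.open_dataset</code> and a file path instead.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a NetCDF variable: 2 years of monthly values on a 2x3 grid
time = pd.date_range("2000-01-01", periods=24, freq="MS")
values = np.arange(24, dtype=float)[:, None, None] + np.zeros((24, 2, 3))
tas = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": [-10.0, 10.0], "lon": [100.0, 110.0, 120.0]},
    name="tas",  # hypothetical variable name
)

# Monthly climatology: the mean of each calendar month over all years
climatology = tas.groupby("time.month").mean("time")

# Anomalies: each month minus its climatological mean
anomalies = tas.groupby("time.month") - climatology

print(climatology.sel(month=1).values)
```

The result is itself an xarray object, so it can be plotted directly, for example with <code>climatology.isel(lat=0, lon=0).plot()</code>.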
 
[https://pandas.pydata.org/ Pandas] is a similar library for tabular data, such as observations from a weather station.
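For instance, a few twice-daily station observations can be reduced to daily values. The column names and numbers below are invented for illustration; real data would more likely be loaded with <code>pd.read_csv</code>.

```python
import pandas as pd

# Invented station observations: timestamps, temperature and rainfall
obs = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2021-01-01 09:00", "2021-01-01 15:00",
             "2021-01-02 09:00", "2021-01-02 15:00"]
        ),
        "temp_c": [21.0, 29.0, 19.0, 25.0],
        "rain_mm": [0.0, 1.2, 0.0, 0.0],
    }
).set_index("time")

# Daily mean temperature and daily total rainfall
daily = obs.resample("D").agg({"temp_c": "mean", "rain_mm": "sum"})
print(daily)
```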
 
[https://numpy.org/ NumPy] and [https://www.scipy.org/ SciPy] are the grandfathers of scientific Python. They provide optimised versions of standard mathematical functions, especially for working with arrays. Xarray and Pandas both use NumPy arrays to store data.
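To illustrate both points, the sketch below compares a Python loop with the equivalent array operation, then shows the array underneath a Pandas object. The values are arbitrary.

```python
import numpy as np
import pandas as pd

temps_c = np.array([12.5, 18.0, 25.3, 31.1])

# Element-by-element loop in pure Python
kelvin_loop = [t + 273.15 for t in temps_c]

# The same conversion as one array operation, run in optimised compiled code
kelvin = temps_c + 273.15

# Array-aware versions of standard functions
print(kelvin.mean(), np.sqrt(temps_c).max())

# A Pandas Series hands back its data as a NumPy array
series = pd.Series(temps_c)
print(type(series.to_numpy()))
```

Xarray objects expose their underlying array in a similar way through the <code>.values</code> attribute.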
 
== Fortran ==
 
[https://fortran-lang.org/ Fortran] is the language most commonly used in numerical climate and weather models, like MOM and WRF. It's not commonly used outside of science; its main advantage is the highly optimised speed it achieves on supercomputers.
 
'''Beginner'''
* [https://fortran-lang.org/learn/quickstart Quickstart Tutorial]
 
'''Intermediate'''
* [https://software.intel.com/content/www/us/en/develop/documentation/fortran-compiler-oneapi-dev-guide-and-reference/top.html Intel Compiler Reference]
* [https://www.unidata.ucar.edu/software/netcdf/docs-fortran/ NetCDF Fortran documentation]
 
=== Parallel Programming ===
 
There are two common ways to write parallel programs in Fortran. The first is to use an [https://en.wikipedia.org/wiki/Message_Passing_Interface MPI (Message Passing Interface)] library, which allows your program to run on multiple nodes of a supercomputer, with the processes (ranks) on those nodes communicating by passing messages to each other. The second is to use [https://en.wikipedia.org/wiki/OpenMP OpenMP], which lets parallel threads share arrays in memory, but only works within a single node. MPI is used in all large climate and weather models; OpenMP is an extra layer used in only some of them.
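The difference between the two models can be sketched without MPI or OpenMP at all. The example below is only an analogy using Python's standard <code>threading</code> and <code>queue</code> modules, not how a Fortran model is written. In the message-passing version each worker is given its own chunk of data and sends back a partial result as a message, while in the shared-memory version all workers write into one shared array. The function names and the number of workers are made up for illustration.

```python
import queue
import threading

data = list(range(1, 101))
n_workers = 4  # illustrative; plays the role of MPI ranks or OpenMP threads
chunks = [data[i::n_workers] for i in range(n_workers)]


def message_passing_sum():
    """MPI-like: workers own private chunks and send results as messages."""
    inbox = queue.Queue()

    def worker(rank, chunk):
        inbox.put((rank, sum(chunk)))  # "send" the partial sum

    threads = [threading.Thread(target=worker, args=(r, c))
               for r, c in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # "Receive" one message per worker and combine them (a reduction)
    return sum(inbox.get()[1] for _ in range(n_workers))


def shared_memory_sum():
    """OpenMP-like: workers write directly into one shared array."""
    partial = [0] * n_workers  # visible to every thread

    def worker(rank):
        partial[rank] = sum(chunks[rank])  # each thread fills its own slot

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


print(message_passing_sum(), shared_memory_sum())
```

Both functions return the same total. The practical difference in a real model is that the message-passing pattern still works when the workers sit on different nodes with separate memory, whereas the shared array only exists within one node.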
 
=== Debugging ===
 
The [https://opus.nci.org.au/display/Help/Arm+HPC+Tools+--+formally+Allinea ARM HPC tools] include a debugger called [https://developer.arm.com/tools-and-software/server-and-hpc/debug-and-profile/arm-forge/arm-ddt DDT] that can be used with both serial and parallel Fortran programs. It lets you pause the program and inspect the values of variables, as well as switch between MPI ranks to see the state of the whole parallel program.
 
The ARM tools also include a profiler called [https://developer.arm.com/tools-and-software/server-and-hpc/debug-and-profile/arm-forge/arm-map MAP] that shows which parts of a Fortran program are slow.
 
== Bash ==
 
Bash is the most common language used in the Linux terminal, though there are alternatives like 'csh' and 'zsh'. There is a wide variety of command line programs available.
 
'''Beginner'''
* [https://swcarpentry.github.io/shell-novice/ Software Carpentry Introduction to Shell]
 
'''Intermediate'''
* [https://www.grymoire.com/Unix/Sed.html Sed Introduction and Tutorial] (batch editing files)
 
=== Analysing Data ===
 
There are a number of useful tools for working with climate data on the command line, most notably [https://code.mpimet.mpg.de/projects/cdo/ CDO (Climate Data Operators)] and [http://nco.sourceforge.net/ NCO (NetCDF Operators)]. Both provide easy access to a wide variety of operations.
 
[[Category:Training]]

Latest revision as of 02:23, 29 April 2021
