Difference between revisions of "Programming"

Latest revision as of 02:23, 29 April 2021

Here are some starting points for topics related to programming. Generally we rely on external sources for basic how-tos and only provide domain-specific advice ourselves. You can also check out our YouTube channel and blog for more topics and demonstrations.


Python is a language widely used in the climate community for data analysis. It's also one of the most commonly used languages in general, so knowing Python is a very useful skill.




Jupyter Notebooks

A handy way to work with Python is to use the Jupyter Notebook interface. This lets you make documents combining text, Python code and images. You can run Jupyter at NCI using the CLEX Conda environment.


Matplotlib is the starting point for Python plotting. Most of the time you use its pyplot interface, which is much easier to work with.
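As a rough illustration of the pyplot interface, here is a minimal sketch that draws a line plot. The data is made up for the example (an idealised seasonal cycle), and the `Agg` backend is selected only so that the script also runs on a machine without a display.

```python
import io
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up data: an idealised seasonal cycle of monthly mean temperature
months = list(range(1, 13))
temperature = [15 + 8 * math.cos(2 * math.pi * (m - 1) / 12) for m in months]

# Create a figure with a single set of axes and draw the line
fig, ax = plt.subplots()
ax.plot(months, temperature, marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Temperature (degC)")
ax.set_title("Idealised seasonal cycle")

# Save to an in-memory buffer; pass a filename instead to write a PNG file
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```

In a Jupyter Notebook you would normally leave out the `matplotlib.use("Agg")` line and let the figure display inline.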

If you're making plots on a map, check out the Cartopy library, which handles things like map projections, drawing coastlines and adding background images.

Analysing data

The xarray library is a great place to start if you're working with gridded NetCDF data. It lets you open a file, perform common types of analyses like climatologies, and plot the results with little code.
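To give a feel for what a climatology looks like in xarray, here is a small sketch. It builds a synthetic array instead of opening a real file, so the variable name `tas`, the grid and the values are all assumptions made for the example. With real data you would typically start from `xr.open_dataset` with the path to your own NetCDF file.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a gridded NetCDF variable:
# two years of monthly values on a 3 x 4 latitude-longitude grid
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.array([-30.0, 0.0, 30.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])

# The same idealised seasonal cycle at every grid point
cycle = 10.0 * np.cos(2 * np.pi * (np.asarray(time.month) - 1) / 12)
data = cycle[:, None, None] + np.zeros((24, 3, 4))

tas = xr.DataArray(
    data,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="tas",  # hypothetical variable name
)

# Monthly climatology: the mean of all Januaries, all Februaries, and so on
climatology = tas.groupby("time.month").mean("time")

# Anomalies: each month minus the climatological value for that month
anomaly = tas.groupby("time.month") - climatology
```

Because the synthetic data repeats the same cycle each year, the anomalies here come out as zero. Calling `climatology.sel(month=1).plot()` would draw a map of the January mean.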

Pandas is a similar library for tabular data, like you'd get from observations at a weather station.

NumPy and SciPy are the grandfathers of Python science. They provide optimised versions of standard mathematical functions, especially for working with arrays. Xarray and Pandas both use NumPy arrays to store data.
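As a small sketch of what working with arrays means in practice, the example below applies operations to a whole array at once instead of looping over elements. The temperature values are invented for the illustration.

```python
import numpy as np

# Invented data: rows are days, columns are three weather stations (degC)
temps_c = np.array([
    [12.0, 14.5, 13.1],
    [18.2, 21.0, 19.4],
])

# Arithmetic applies to every element at once, with no explicit loop
temps_k = temps_c + 273.15

# Reductions along an axis: axis=0 collapses the days, axis=1 the stations
station_means = temps_c.mean(axis=0)  # one mean per station
day_max = temps_c.max(axis=1)         # hottest station reading on each day
```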


Fortran is the language most commonly used in numerical climate and weather models, like MOM and WRF. It's not commonly used outside of science; its main advantage is its speed on supercomputers.



Parallel Programming

There are two common ways to write parallel programs in Fortran. The first is to use an MPI (Message Passing Interface) library, which allows your program to run on multiple nodes of a supercomputer, with the nodes communicating by passing messages to each other. The second is to use OpenMP, which lets parallel threads share arrays, but only works within a single node. MPI is used in all large climate and weather models; OpenMP is an extra layer used in only some of them.
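The difference between the two styles can be imitated in a few lines of Python. This is only an analogy built on the standard library's threads and queues, not MPI or OpenMP themselves, and all the names in it are made up for the illustration. In the first half each worker sees only its own chunk of data and reports back by sending a message. In the second half every worker reads and writes arrays that all of them can see.

```python
import queue
import threading

NUM_WORKERS = 4
data = list(range(1, 101))  # the numbers 1..100, to be summed in parallel

# Style 1: message passing (the idea behind MPI).
# Each worker is handed only its own chunk and sends its result as a message.
def message_worker(rank, chunk, outbox):
    outbox.put((rank, sum(chunk)))

chunks = [data[rank::NUM_WORKERS] for rank in range(NUM_WORKERS)]
outbox = queue.Queue()
workers = [
    threading.Thread(target=message_worker, args=(rank, chunk, outbox))
    for rank, chunk in enumerate(chunks)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
# Collect one message per worker and combine the partial sums
message_total = sum(outbox.get()[1] for _ in range(NUM_WORKERS))

# Style 2: shared memory (the idea behind OpenMP).
# Every worker reads the same data and writes into its own slot of a shared array.
partial_sums = [0] * NUM_WORKERS

def shared_worker(rank):
    partial_sums[rank] = sum(data[rank::NUM_WORKERS])

workers = [
    threading.Thread(target=shared_worker, args=(rank,))
    for rank in range(NUM_WORKERS)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
shared_total = sum(partial_sums)
```

Both halves compute the same total. In a real model the message-passing workers would be separate processes, possibly on different nodes, which is why only that style scales beyond a single node.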


The ARM HPC tools include a debugger called DDT that can be used with both serial and parallel Fortran programs. This lets you stop the program and see the status of variables, as well as swap between different MPI ranks to see the state of the whole parallel program.

The ARM tools also include a profiler called MAP that shows slow spots in a Fortran program.


Bash is the most common language used in the Linux terminal, though there are alternatives like 'csh' and 'zsh'. A wide variety of command line programs is available.



Analysing Data

There are a number of useful tools for working with climate data on the command line, most notably CDO (Climate Data Operators) and NCO (netCDF Operators). Both provide easy access to a wide variety of operations.