# Python for Scientists

# Python Course Outline

*Note: This is an outline for a future course currently in development.*

## Introduction to Python

### Installation

This is not intended as a CMS support topic; it only covers how to get started quickly.

- Enthought (self-contained python/numpy/scipy/matplotlib installation)
- Distribution packages (apt-get, yum, MacPorts)

### Starting Python

- Python's interactive shell
- Python scripting (the #!/usr/bin/env python shebang line and related conventions)
- IPython, PyLab: Interactive Matlab-like environments
- Spyder: A python-based GUI to emulate Matlab (not recommended?)
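
A possible example for the scripting item: a minimal sketch of a standalone script, assuming a Unix-like system where `env` can find a `python` on the path. The file name `hello.py` and the function `main` are illustrative choices, not part of any existing course material.

```python
#!/usr/bin/env python
"""hello.py: a hypothetical minimal script (run as ./hello.py after chmod +x)."""
import sys


def main(args):
    # args holds the command-line arguments, excluding the script name
    name = args[0] if args else "world"
    message = "Hello, {}!".format(name)
    print(message)
    return message


if __name__ == "__main__":
    # Only runs when the file is executed as a script, not when imported
    main(sys.argv[1:])
```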

### Basics

Focus on the major differences from Matlab and other programming languages:

- Forced indentation
- Built-in data types: bool, int, float, complex, strings, lists, tuples, dicts, files, ...
- Indexing and Slicing (0..N-1, -1, -2, etc.)
- Aliasing: variables as labels
- Conditional syntax (if-then-else, while, etc.)
- Iterators: Looping without indexing ('for x in list:')
  - Possibly some itertools examples?
- Functions
- Simple string manipulation
- Objects: Not OO design, just the persistence of objects
  - e.g. anything (variables, functions, etc.) can be function arguments
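
Several of the items above (zero-based indexing, slicing, aliasing, index-free loops, functions as arguments) could be illustrated together. The following is only a sketch; the variable names and the helper `apply_to_each` are made up for illustration.

```python
# Indexing is zero-based; negative indices count from the end.
values = [10, 20, 30, 40, 50]
first = values[0]        # first element
last = values[-1]        # last element
middle = values[1:4]     # elements 1, 2, 3; the stop index is excluded

# Aliasing: assignment attaches a second label to the same list object.
alias = values
alias[0] = 99            # the change is visible through 'values' too
duplicate = values[:]    # a full slice creates a new (shallow) copy
duplicate[1] = -1        # 'values' is unaffected

# Iteration without an index variable.
total = 0
for x in values:
    total += x


# Functions are objects and can be passed as arguments.
def apply_to_each(func, items):
    return [func(item) for item in items]


as_floats = apply_to_each(float, middle)
```

The aliasing behaviour is usually the biggest surprise for Matlab users, since Matlab assignment behaves like a copy.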

### Modules

- Using modules (import xyz, import xyz as x, from xyz import y)
- There are many modules for any given task. We should settle on a few recommendations, but not be afraid to change our minds in the future.
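
The three import forms could be shown side by side. This sketch uses only standard library modules; the file name `run01.nc` is hypothetical.

```python
import math                  # names are accessed as math.sqrt, math.pi, ...
import itertools as it       # a shorter alias for a long module name
from os.path import join     # bring a single name into the namespace

root = math.sqrt(16.0)
pairs = list(it.combinations("abc", 2))   # all 2-element combinations
path = join("data", "run01.nc")           # hypothetical file name
```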

## Numpy

### Numpy vs Matlab

- numpy arrays vs. Matlab matrices
- Array generation and manipulation (arange, linspace, zeros, meshgrid)

Review basic operations:

- Arithmetic on arrays
- Built-in functions (sqrt, sin, etc.) and constants (pi)
- Numpy array indexing (syntax, slicing, etc.)
- Basic manipulations (concatenation, reshaping, tiling, vstack/hstack, etc.)
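
A short sketch covering the generation and manipulation functions listed above, assuming numpy is imported as `np` (the usual convention). The array contents are arbitrary.

```python
import numpy as np

x = np.arange(5)                 # integers 0..4 (stop value excluded)
y = np.linspace(0.0, 1.0, 5)     # 5 points, both endpoints included
z = np.zeros((2, 3))             # 2 rows, 3 columns of zeros

# Arithmetic is elementwise; '*' is not matrix multiplication.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
elementwise = a * a
matrix_product = np.dot(a, a)

# Slicing works along each axis: first column of a.
col = a[:, 0]

# Reshaping and stacking.
grid = x[:4].reshape(2, 2)
stacked = np.vstack([grid, grid])        # shape (4, 2)
side_by_side = np.hstack([grid, grid])   # shape (2, 4)

# meshgrid builds 2D coordinate arrays from 1D axes.
xx, yy = np.meshgrid(np.arange(3), np.arange(2))   # both have shape (2, 3)

s = np.sin(np.pi / 2.0)
```

The elementwise meaning of `*` is the main difference from Matlab matrices worth stressing here.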

### Performance

- Introduce numpy as a series of job submissions to the C libraries
- Vectorised arithmetic (avoiding index loops)
- A numpy linked against Intel MKL is reportedly often several times faster than one built with gcc against a reference BLAS (though I have never actually confirmed this)
- numexpr: An easy-to-use, high-performance alternative to numpy for certain tasks, includes limited parallelisation options.
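
To illustrate vectorised arithmetic, the same calculation can be written with an index loop and with whole-array operations. This is a sketch only; the function names `rms_loop` and `rms_vectorised` are made up, and no timings are claimed here.

```python
import numpy as np


def rms_loop(values):
    # Explicit index loop: every iteration runs in the Python interpreter.
    total = 0.0
    for i in range(len(values)):
        total += values[i] * values[i]
    return (total / len(values)) ** 0.5


def rms_vectorised(values):
    # Whole-array operations: the loops run inside numpy's compiled code.
    return np.sqrt(np.mean(values * values))


data = np.linspace(-1.0, 1.0, 10001)
slow = rms_loop(data)
fast = rms_vectorised(data)
```

Both functions return the same value to within rounding; the standard library `timeit` module could be used in class to compare their run times.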

### Masked arrays (numpy.ma)

(I have only limited experience with ma, but it is very useful for land masks in CMIP ocean output)

- Mask creation (including logical numpy operators)
- Masked array manipulation
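
A land mask example might look like the following sketch. The temperature values and the -999.0 fill value are invented for illustration; real model output will define its own missing-value convention.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical 2x3 sea surface temperature field; -999.0 marks land.
sst = np.array([[15.0, -999.0, 17.0],
                [16.0, 18.0, -999.0]])

land = (sst == -999.0)                  # boolean mask from a comparison
masked_sst = ma.masked_array(sst, mask=land)

ocean_mean = masked_sst.mean()          # masked (land) points are ignored
n_ocean = masked_sst.count()            # number of unmasked points
filled = masked_sst.filled(np.nan)      # back to a plain array, land as NaN

# Masks can be combined with numpy's logical operators.
warm_ocean = ma.masked_where(land | (sst < 16.5), sst)
```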

## SciPy

I use scipy in a very ad-hoc manner and don't have a comprehensive grasp of its features, but the following come to mind (with examples):

- Interpolation (scipy.interpolate)
  - Earth grid generation
- Statistics (scipy.stats, numpy.random)
- Signal processing (scipy.signal)
  - Time series analysis, filtering
- Linear Algebra (scipy.linalg)
  - SVD, EOF/PC analysis
## Image Analysis

*Note: ANU GFD has a strong fluid dynamics laboratory, and CoE researcher Andy Hogg often supervises students in lab experiments.*

### Python Imaging Library (PIL)

- I have no experience with PIL, but it has been used by at least one student here. It is a natural alternative to Matlab's built-in image analysis tools.
- I don't regard image analysis as CMS work, but including it in a course would help promote collaboration here at ANU.

## I/O

### NetCDF

- scipy.io.netcdf:
  - Good performance
  - Included with scipy
  - Imperfect NetCDF implementation (occasional garbage 'inf' data)
- netcdf4-python:
  - Complete(?) NetCDF3/4 support
  - Reduced performance
  - Nontrivial installation

### ASCII/raw text input

- I haven't done this in python, but there is still an occasional need for this (esp. for old data sets)
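
One possible approach for simple column data is `numpy.loadtxt`. The sketch below reads from an in-memory string standing in for a file, so the column layout and values are hypothetical; a file name could be passed to `loadtxt` instead.

```python
import io

import numpy as np

# Stand-in for an old whitespace-delimited data file (hypothetical contents).
text = """# depth(m)  temp(C)  salt(psu)
0.0   18.2  35.1
10.0  17.9  35.2
50.0  14.3  35.0
"""

# loadtxt skips lines starting with '#' by default.
data = np.loadtxt(io.StringIO(text))
depth, temp, salt = data.T     # one 1D array per column
```

Files with irregular layouts or missing values would need more work than this, for example line-by-line parsing with string methods.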

### pydap

- Simplified interface for accessing NetCDF files via OPeNDAP

### PyTables

- HDF5 support, good performance
- (I haven't used this much)

## Plotting

### Matplotlib (2D plotting)

- Line/curve plots
- Scatter plots
- Field plots: Contours, image maps, etc.
- Subplots
- Frills (labels, legends, arrows, etc.)
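
A single figure could touch most of the items above. This is a sketch, assuming the non-interactive Agg backend so that it runs without a display; the output file name `example.png` and the plotted functions are arbitrary.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")          # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 100)
xx, yy = np.meshgrid(x, x)

# Two subplots side by side.
fig, (ax_line, ax_field) = plt.subplots(1, 2, figsize=(8, 3))

# Line plots with an axis label and a legend.
ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.plot(x, np.cos(x), "--", label="cos(x)")
ax_line.set_xlabel("x")
ax_line.legend()

# Filled contour plot of a 2D field with a colour bar.
contours = ax_field.contourf(xx, yy, np.sin(xx) * np.cos(yy))
fig.colorbar(contours, ax=ax_field)
ax_field.set_title("sin(x) cos(y)")

# Save to a scratch directory (hypothetical file name).
out_path = os.path.join(tempfile.mkdtemp(), "example.png")
fig.savefig(out_path)
plt.close(fig)
```

In an interactive session, `plt.show()` would replace the `savefig` call.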

### Mayavi (3D plotting)

- I have never used this, but it's the only option that I know of

## Shell Interface (OS)

The ANU group uses python for job submissions of numerical models on vayu, so there may be some interest in how to run subprocesses and manipulate files through python scripts, as if it were a traditional shell script.

- os, sys, shutil
- subprocess
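
A sketch of shell-style file handling and subprocess use follows. The file names are hypothetical, and the Python interpreter itself stands in for a model executable or job submission command, since the actual commands used on vayu are not covered here.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Work in a scratch directory so nothing outside it is touched.
work_dir = tempfile.mkdtemp()
config = os.path.join(work_dir, "input.nml")       # hypothetical file names
backup = os.path.join(work_dir, "input.nml.bak")

with open(config, "w") as f:
    f.write("&run\n  nsteps = 10\n/\n")

shutil.copy(config, backup)                         # like 'cp'
listing = sorted(os.listdir(work_dir))              # like 'ls'

# Run a child process and capture its standard output (returned as bytes).
output = subprocess.check_output(
    [sys.executable, "-c", "print('model finished')"]
)

shutil.rmtree(work_dir)                             # like 'rm -r'
```

`check_output` raises `subprocess.CalledProcessError` if the command exits with a non-zero status, which is usually the desired behaviour in a job script.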

## Earth Science Tools and Modules

### basemap (matplotlib)

- geographic plotting
- similar to m_map in Matlab

### datetime, calendar

- Useful for calendar tracking
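
A typical use is converting "days since" time offsets into dates. The reference date and offsets below are hypothetical. Note that `datetime` assumes the standard Gregorian calendar, so model output on a no-leap or 360-day calendar would need separate handling.

```python
import calendar
from datetime import datetime, timedelta

# Convert "days since" offsets to dates (hypothetical reference and offsets).
reference = datetime(2000, 1, 1)
offsets = [0, 31, 59.5]
dates = [reference + timedelta(days=d) for d in offsets]

leap = calendar.isleap(2000)                     # is 2000 a leap year?
days_in_feb = calendar.monthrange(2000, 2)[1]    # number of days in Feb 2000
elapsed = datetime(2001, 1, 1) - reference       # difference as a timedelta
```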

### gridding?

- Are there generic gridding modules in python, e.g. for Mercator, cubed-sphere, or tripolar grids?