Difference between revisions of "Python for Scientists"

(Imported from Wikispaces)
 
Revision as of 01:06, 18 March 2019

This page needs updating.

Python Course Outline

Note: This is an outline for a future course currently in development.

Introduction to Python

Installation

This is not intended as a CMS support topic; it only covers how to get started quickly.

  • Enthought (self-contained python/numpy/scipy/matplotlib installation)
  • Distribution packages (apt-get, yum, MacPorts)

Starting Python

  • Python's interactive shell
  • Python scripting (the #!/usr/bin/env python shebang line)
  • IPython, PyLab: Interactive Matlab-like environments
  • Spyder: A python-based GUI to emulate Matlab (not recommended?)
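As a minimal sketch of the scripting item above: a script can begin with the shebang line so that, once made executable with chmod +x, it runs directly from the shell. The file name hello.py and the function name are hypothetical, chosen only for illustration.

```python
#!/usr/bin/env python
"""Minimal script sketch; the file name hello.py is hypothetical."""


def greeting(name):
    # Build the message in a function so it can also be imported and reused.
    return "Hello, " + name


if __name__ == "__main__":
    # Runs only when executed as a script, not when imported as a module.
    print(greeting("world"))
```

The same file can be run as ./hello.py, as python hello.py, or imported from the interactive shell.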

Basics

Focus on major differences from Matlab and other programming languages:

  • Forced indentation
  • Built-in data types: bool, int, float, complex, strings, lists, tuples, dicts, files, ...
  • Indexing and Slicing (0..N-1, -1, -2, etc.)
  • Aliasing: variables as labels
  • Conditional and loop syntax (if/elif/else, while, etc.)
  • Iterators: Looping without indexing ('for x in list:')
    • Possibly some itertools examples?
  • Functions
  • Simple string manipulation
  • Objects: Not OO design, just the persistence of objects
    • e.g. anything (variables, functions, etc.) can be function arguments
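Several of the points above can be sketched in a few lines of plain Python. The variable and function names are illustrative only.

```python
# Indexing and slicing: indices run from 0 to N-1; negative indices count
# back from the end.
values = [10, 20, 30, 40, 50]
first = values[0]        # 10
last = values[-1]        # 50
middle = values[1:4]     # [20, 30, 40]; the end index is excluded

# Aliasing: assignment attaches a second label to the same list; it does
# not make a copy.
alias = values
alias[0] = 99            # values[0] is now 99 as well
copy = list(values)      # an independent (shallow) copy
copy[1] = -1             # values[1] is unchanged

# Iterators: loop over the items directly instead of over an index.
total = 0
for x in values:
    total += x

# Objects: functions are ordinary objects and can be passed as arguments.
def apply_twice(func, arg):
    return func(func(arg))

def double(x):
    return 2 * x

result = apply_twice(double, 3)
```

The aliasing behaviour is the point most likely to surprise Matlab users, since Matlab assignment behaves like a copy.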

Modules

  • Using modules (import xyz, import xyz as x, from xyz import y)
  • There are many modules for any given task; we should settle on a few recommendations (but also not be afraid to change our minds in the future)
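The three import forms listed above, sketched with the standard library math module standing in for xyz:

```python
import math                 # names are accessed as math.sqrt, math.pi
import math as m            # the same module under a shorter alias
from math import pi, sqrt   # selected names placed in the current namespace

a = math.sqrt(16.0)
b = m.sqrt(16.0)
c = sqrt(16.0)

# Both names refer to one module object; nothing is loaded twice.
same_module = math is m
```

The first two forms keep the module prefix and so make it clear where each name comes from, which helps once several modules are in use.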

Numpy

Numpy vs Matlab

  • numpy arrays vs. Matlab matrices
  • Array generation and manipulation (arange, linspace, zeros, meshgrid)
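A short sketch of the array generation functions named above. The longitude/latitude grid is an assumed example, not part of the outline.

```python
import numpy as np

x = np.arange(0, 5)              # integers 0..4; the stop value is excluded
y = np.linspace(0.0, 1.0, 5)     # 5 evenly spaced points, endpoints included
z = np.zeros((2, 3))             # 2x3 array of zeros; the shape is a tuple

# meshgrid builds 2D coordinate arrays from 1D axes, e.g. for a lon/lat grid.
lon = np.linspace(0.0, 360.0, 4, endpoint=False)   # 0, 90, 180, 270
lat = np.linspace(-90.0, 90.0, 3)                  # -90, 0, 90
lon2d, lat2d = np.meshgrid(lon, lat)               # both have shape (3, 4)
```

Unlike Matlab's 0:5, arange(0, 5) stops before 5, while linspace includes both endpoints unless told otherwise.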

Review basic operations:

  • Arithmetic on arrays
  • Built-in functions (sqrt, sin, etc.) and constants (pi)
  • Numpy array indexing (syntax, slicing, etc.)
  • Basic manipulations (concatenation, reshaping, tiling, vstack/hstack, etc.)
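The basic operations above might be demonstrated along these lines. The small arrays are made up for illustration.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[10.0, 20.0], [30.0, 40.0]])

elementwise = a * b          # element-wise product (Matlab's .*)
matrix_product = a.dot(b)    # matrix product (Matlab's *)
roots = np.sqrt(a)           # functions apply element by element
one = np.sin(np.pi / 2.0)

first_row = a[0, :]          # zero-based indexing: the first row
last_column = a[:, -1]       # negative index: the last column

stacked = np.vstack((a, b))               # shape (4, 2)
side_by_side = np.hstack((a, b))          # shape (2, 4)
joined = np.concatenate((a, b), axis=0)   # same result as vstack here
flat = a.reshape(4)                       # the 4 elements as a 1D array
tiled = np.tile(a, (1, 2))                # a repeated twice along the columns
```

The difference between * on numpy arrays and * on Matlab matrices is worth stating early, since it silently changes results when code is ported.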

Performance

  • Introduce numpy as a series of job submissions to the C libraries
  • Vectorised arithmetic (avoiding index loops)
  • A numpy built against Intel MKL is reportedly several times faster than one built against a gcc-compiled BLAS (though I have never actually confirmed this)
  • numexpr: An easy-to-use, high-performance alternative to numpy for certain tasks; it includes limited parallelisation options.
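To illustrate the vectorisation point, the sketch below computes the same polynomial with an index loop and with whole-array arithmetic. The array size and the polynomial are arbitrary choices; no timings are quoted because they depend on the machine and the numpy build.

```python
import numpy as np

n = 100000
x = np.linspace(0.0, 1.0, n)

# Index loop: every iteration is executed by the Python interpreter.
y_loop = np.empty(n)
for i in range(n):
    y_loop[i] = 3.0 * x[i] ** 2 + 1.0

# Vectorised form: each operation is a single call into compiled code.
y_vec = 3.0 * x ** 2 + 1.0
```

The two versions can be compared with the standard library timeit module or IPython's %timeit.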

Masked arrays (numpy.ma)

(I have only limited experience with numpy.ma, but it is very useful for land masks in CMIP ocean output.)

  • Mask creation (inc. logical numpy operators)
  • Masked array manipulation
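A minimal sketch of a land mask with numpy.ma. The temperature values and the -999.0 fill value are invented for the example.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical 2x3 sea surface temperature field; -999.0 marks land points.
sst = np.array([[20.0, 21.0, -999.0],
                [-999.0, 19.0, 18.0]])

land = (sst == -999.0)                 # boolean mask from a comparison
land_or_warm = land | (sst > 20.5)     # masks combine with logical operators
sst_ocean = ma.masked_array(sst, mask=land)

ocean_mean = sst_ocean.mean()          # masked points are ignored
n_ocean = sst_ocean.count()            # number of unmasked points
filled = sst_ocean.filled(np.nan)      # back to a plain array, NaN on land
```

Without the mask, the fill values would dominate the mean, which is the usual reason to reach for numpy.ma with ocean output.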

SciPy

I use scipy in a very ad hoc manner and don't have a comprehensive grasp of its features, but the following come to mind (with examples):

  • Interpolation (scipy.interpolate)
    • Earth grid generation
  • Statistics (scipy.stats, numpy.random)
  • Signal processing (scipy.signal)
    • Time series analysis, filtering
  • Linear algebra (scipy.linalg)
    • SVD, EOF/PC analysis
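One possible example for the SVD and EOF/PC item is sketched below using scipy.linalg.svd. The data are synthetic (a single made-up spatial pattern with a sinusoidal amplitude), so the first EOF should recover that pattern up to sign. This is one common way to compute EOFs, not necessarily the one the course would settle on.

```python
import numpy as np
from scipy import linalg

# Synthetic data (an assumption for illustration): 50 time steps at 4 grid
# points, built from one spatial pattern with a time-varying amplitude.
time = np.arange(50)
pattern = np.array([1.0, -1.0, 2.0, 0.5])
amplitude = np.sin(2.0 * np.pi * time / 10.0)
data = np.outer(amplitude, pattern)          # shape (time, space)

# Remove the time mean at each point, then decompose.
anomalies = data - data.mean(axis=0)
u, s, vt = linalg.svd(anomalies, full_matrices=False)

eofs = vt                                    # rows are spatial patterns (EOFs)
pcs = u * s                                  # columns are the PC time series
variance_fraction = s ** 2 / np.sum(s ** 2)  # variance explained by each mode
```

Multiplying pcs by eofs reconstructs the anomalies, which makes a convenient sanity check in a course exercise.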

Image Analysis

Note: ANU GFD has a strong fluid dynamics laboratory, and CoE researcher Andy Hogg often supervises students in lab experiments.

Python Imaging Library (PIL)

  • I have no experience with PIL, but it has been used by at least one student here. It is a natural alternative to Matlab's built-in image analysis tools.
  • I don't regard image analysis as CMS work, but including it in a course would help promote collaboration here at ANU.

I/O

NetCDF

  • scipy.io.netcdf:
    • Good performance
    • Included with scipy
    • Imperfect NetCDF implementation (occasional garbage 'inf' data)
  • netcdf4-python:
    • Complete(?) NetCDF3/4 support
    • Reduced performance
    • Nontrivial installation
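As a rough sketch of the scipy option: in the SciPy versions I am familiar with, the reader and writer are exposed as scipy.io.netcdf_file, which handles NetCDF3 only. The file name, variable name and values below are hypothetical.

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file

# Write a small NetCDF3 file into a temporary directory (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "example.nc")

f = netcdf_file(path, "w")
f.createDimension("time", 5)
time_var = f.createVariable("time", "d", ("time",))   # "d" = 64-bit float
time_var[:] = np.arange(5.0)
time_var.units = "days since 2000-01-01"
f.close()

# Read it back. mmap=False together with .copy() keeps the data valid
# after the file has been closed.
f = netcdf_file(path, "r", mmap=False)
values = f.variables["time"][:].copy()
shape = f.variables["time"].shape
f.close()
```

netcdf4-python offers a broadly similar Dataset interface with createDimension and createVariable methods, so an example like this should carry over with modest changes.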

ASCII/raw text input

  • I haven't done this in python, but there is still an occasional need for it (esp. for old data sets)
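One simple option for whitespace-delimited text is numpy.loadtxt, sketched below. The in-memory text stands in for an old data file, and its contents are invented.

```python
import io

import numpy as np

# Stand-in for an old whitespace-delimited data file; in practice a file
# name would be passed to loadtxt instead of this in-memory text.
text = io.StringIO(
    "# year  temperature\n"
    "1990    14.2\n"
    "1991    14.5\n"
    "1992    14.1\n"
)

# Lines starting with '#' are treated as comments by default.
table = np.loadtxt(text)
years = table[:, 0].astype(int)
temperature = table[:, 1]
```

For files with missing values or mixed column types, numpy.genfromtxt is the more forgiving relative of loadtxt.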

pydap

  • Simplified interface to access NetCDF files via OPeNDAP

PyTables

  • HDF5 support, good performance
  • (I haven't used this much)

Plotting

Matplotlib (2D plotting)

  • Line/curve plots
  • Scatter plots
  • Field plots: Contours, image maps, etc.
  • Subplots
  • Frills (labels, legends, arrows, etc.)
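A compact sketch covering most of the items above in one figure: a line plot, a scatter overlay, a filled contour plot, subplots, labels and a legend. The plotted functions are arbitrary, and the Agg backend is selected only so that the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")          # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 100)
xx, yy = np.meshgrid(x, x)

# Two subplots side by side.
fig, (ax_line, ax_field) = plt.subplots(1, 2, figsize=(10, 4))

# Line plot with a scatter overlay, axis labels and a legend.
ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.scatter(x[::10], np.cos(x[::10]), color="k", label="cos(x) samples")
ax_line.set_xlabel("x")
ax_line.set_ylabel("amplitude")
ax_line.legend()

# Filled contour plot of a 2D field.
contours = ax_field.contourf(xx, yy, np.sin(xx) * np.cos(yy))
ax_field.set_title("sin(x) cos(y)")

# Render to an in-memory PNG; a file name could be passed instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In an interactive IPython session, plt.show() would take the place of the savefig call.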

Mayavi (3D plotting)

  • I have never used this, but it's the only option that I know of

Shell Interface (OS)

The ANU group uses python for job submissions of numerical models on vayu, so there may be some interest in how to run subprocesses and manipulate files from python scripts, as one would in a traditional shell script.

  • os, sys, shutil
  • subprocess
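A small sketch of shell-style work from python, using only the standard library. The file names are hypothetical, and the child process is simply the Python interpreter standing in for a model executable or a job submission command.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Work in a temporary directory so the sketch leaves nothing behind.
workdir = tempfile.mkdtemp()
config = os.path.join(workdir, "input.nml")        # hypothetical file names
backup = os.path.join(workdir, "input.nml.bak")

with open(config, "w") as f:
    f.write("&settings\n/\n")

shutil.copy(config, backup)                        # like `cp`
listing = sorted(os.listdir(workdir))              # like `ls`

# Run a child process and capture its output; a model executable or a
# job submission command would go here instead of the Python interpreter.
output = subprocess.check_output(
    [sys.executable, "-c", "print('model finished')"]
)
message = output.decode().strip()

shutil.rmtree(workdir)                             # like `rm -r`
```

Passing the command as a list avoids shell quoting problems, and check_output raises an exception if the command exits with a non-zero status.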

Earth Science Tools and Modules

basemap (matplotlib)

  • geographic plotting
  • similar to m_map in Matlab

datetime, calendar

  • Useful for calendar tracking
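For example, converting a "days since" model time to a calendar date might look like the sketch below. The reference date and offsets are made up. Note that datetime assumes the standard Gregorian calendar, so model calendars such as no-leap or 360-day years would need separate handling.

```python
import calendar
import datetime

start = datetime.date(2000, 1, 1)

# Convert "days since 2000-01-01" to a calendar date.
model_date = start + datetime.timedelta(days=59)
label = model_date.strftime("%Y-%m-%d")

# Differences between dates are timedelta objects.
elapsed = (datetime.date(2001, 1, 1) - start).days

leap = calendar.isleap(2000)
days_in_feb = calendar.monthrange(2000, 2)[1]   # (first weekday, month length)
```

Here 59 days after 1 January 2000 falls on 29 February, because 2000 is a leap year.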

gridding?

  • Are there generic gridding modules in python, e.g. for Mercator, cubed-sphere or tripolar grids?