CodeBreak 29/9/2021

Working with coordinates

When you pull data from different files, coordinates can appear to be the same but in fact differ slightly because of floating point representation. When doing calculations with two DataArrays, Xarray uses the coordinates to find data that is co-located. If the coordinates are slightly different, Xarray will consider that the data is not co-located, and will likely either silently drop the points that do not match or return an indexing error. If you know the coordinates should be the same, the simplest fix is to assign the coordinates of one DataArray to the other, so that Xarray treats them as identical. We’ve put a simple example on this blog to illustrate the issue and how to solve it.
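A minimal sketch of the problem and the fix. The latitude values and the 1e-12 offset are made up to stand in for the small floating point differences between files:

```python
import numpy as np
import xarray as xr

lat = np.array([10.0, 20.0, 30.0])

# Two arrays whose coordinates look identical when printed but differ slightly
a = xr.DataArray([1.0, 2.0, 3.0], coords={"lat": lat}, dims="lat")
b = xr.DataArray([10.0, 20.0, 30.0], coords={"lat": lat + 1e-12}, dims="lat")

# Arithmetic aligns on exact coordinate matches, so no points are kept
mismatched = a + b
print(mismatched.sizes["lat"])

# Overwrite b's coordinate with a's, then the arrays align as expected
b_fixed = b.assign_coords(lat=a["lat"].values)
fixed = a + b_fixed
print(fixed.values)
```

Only do this when you are sure the two grids are meant to be the same, since assigning coordinates bypasses the check that Xarray would otherwise perform.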

Incompatible datetime and cftime error

When working with COSIMA Cookbook data, trying to add uniform coordinates threw an error because the time coordinates were not of the same type. The solution is to add the argument use_cftime=True to the getvar function. This forces xarray to always use the more general cftime type for time axes, so the time axes will always have compatible types.

Background: The standard numpy datetime type only supports a subset of the time axes commonly found in earth system data files, particularly model output. The xarray library defaults to the numpy datetime type because it is more compatible with other python packages (e.g. pandas) and so potentially offers more functionality. See the xarray docs for more information.

Reprojecting data from metres to lat/lon in python

Had some trouble converting the BedMachine Antarctic topography/bathymetry map from metres to lat/lon in python. See this notebook for an example using the pyproj library. Another possibility: there are some scripts for performing coordinate transforms on this data.
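A minimal sketch of the pyproj approach, separate from the notebook. It assumes the data is on the Antarctic Polar Stereographic grid (EPSG:3031), which should be checked against the projection metadata in the file, and the x/y values are made-up points rather than real BedMachine grid cells:

```python
import numpy as np
from pyproj import Transformer

# Assumption: source grid is EPSG:3031 (metres); target is EPSG:4326 (lat/lon)
# always_xy=True keeps the argument order as (x, y) in and (lon, lat) out
transformer = Transformer.from_crs("EPSG:3031", "EPSG:4326", always_xy=True)

# Made-up coordinates in metres: the pole and two points 1000 km away from it
x = np.array([0.0, 1.0e6, 0.0])
y = np.array([0.0, 0.0, 1.0e6])

lon, lat = transformer.transform(x, y)
print(lat)
print(lon)
```

For a full 2D grid, the 1D x and y axes can be expanded with np.meshgrid before calling transform.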

Calculate daily means of ERA5 pressure-level data using python

ERA5 is a big dataset: the spatial resolution is 0.25 x 0.25 degrees and the temporal resolution is 1 hr, on 37 levels. The full timeseries for one pressure-level variable is ~17TB. So while calculating a daily mean using xarray is quite straightforward, we need to handle the data size to manage the memory usage.

This notebook shows in detail some strategies to handle the data and parallelise the computation using xarray and dask. While the example uses surface variables and the older ERA5 collection, it offers a good step-by-step explanation of the strategy it adopts. It also introduces the climtas module, which has some useful functions to make the task more manageable and efficient.
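For reference, the daily-mean step on its own looks like the sketch below. It uses a tiny synthetic hourly series (made-up values, two days long) that fits in memory, so it does not show the chunking and parallelisation that the real collection needs:

```python
import numpy as np
import xarray as xr

# Two days of hourly timestamps, cast to nanosecond precision for xarray
time = np.arange("2020-01-01", "2020-01-03", dtype="datetime64[h]")
time = time.astype("datetime64[ns]")

# Made-up hourly values 0, 1, ..., 47 standing in for a real variable
hourly = xr.DataArray(
    np.arange(time.size, dtype=float), coords={"time": time}, dims="time"
)

# Group the hourly values into calendar days and average each group
daily = hourly.resample(time="1D").mean()
print(daily.values)
```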

Converting UM outputs from pressure levels to model levels

A question has arisen about interpolating a field that was created on pressure levels to model levels.

The method, using metpy.interpolate.log_interpolate_1d, has been tested with air temperature, a field that was output on both model levels (UM Stash Code m1s30i204) and pressure levels (m1s16i004).

Unfortunately, the pressure-level temperature interpolated to model levels did not agree with the model-level version. The differences were up to 30 degrees, beyond what could be considered interpolation uncertainty.

Log interpolation is the correct approach for converting between model levels and pressure levels, but to do this you need to know the pressure on model levels. In this instance the pressure had been approximated with the ideal gas law, whereas what you really need is the actual pressure values from the model. The difference between the two is the most likely cause of the error. To verify this, the model will need to be re-run with the pressure field output.
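The sensitivity to the model-level pressure can be illustrated with a small sketch. It uses numpy.interp on log-pressure as a stand-in for metpy's log_interpolate_1d, and every number in it is made up: an idealised temperature profile, three hypothetical model-level pressures, and a 5% pressure error chosen only for illustration.

```python
import numpy as np


def log_interp(p_target, p_source, field):
    """Interpolate field to p_target, linearly in log(pressure)."""
    order = np.argsort(p_source)  # np.interp needs increasing x values
    return np.interp(np.log(p_target), np.log(p_source[order]), field[order])


# Made-up temperature profile on pressure levels (hPa, K)
p_levels = np.array([1000.0, 850.0, 700.0, 500.0, 300.0])
temp_on_p = 288.0 + 40.0 * np.log(p_levels / 1000.0)

# Hypothetical model-level pressures: the true values and a 5% overestimate
p_model_true = np.array([900.0, 600.0, 400.0])
p_model_approx = p_model_true * 1.05

t_true = log_interp(p_model_true, p_levels, temp_on_p)
t_approx = log_interp(p_model_approx, p_levels, temp_on_p)

# The pressure error carries straight through to the interpolated temperature
error = t_approx - t_true
print(error)
```

In this idealised profile a 5% pressure error shifts the temperature by about 2 K at every level. Real profiles and real approximation errors will differ, but the same mechanism applies.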