NetCDF Compression Tools
Contents
Why compress netCDF files?
Space. You are likely to store your netCDF files on a shared filesystem, either at NCI or at your university, so it is your responsibility to manage your allocation and avoid wasting storage space. Compressing your netCDF data files can shrink them to around one third of their original size, which is the equivalent of being given three times as much disk space.
NetCDF compression is lossless: the data read back is exactly the data that was written. It can still be read using the same programming interface. As long as the program reading the data has been compiled against version 4 of the netCDF library, decompression is handled transparently by the library, and as far as the program is concerned there is no difference in the data. The usual tools, such as ncdump, can be used to examine the variables contained within the file. However, if you rely on an old piece of software that you suspect was not compiled with netCDF4, you should test that it can read compressed netCDF4 files before converting all your data.
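To check whether an existing file is already compressed, you can ask ncdump for the header along with the "special" virtual attributes, which report the storage settings for each variable (the filename below is just a placeholder):

<syntaxhighlight lang="text"> ncdump -hs file.nc </syntaxhighlight>

Compressed variables will show attributes such as _DeflateLevel, _Shuffle and _ChunkSizes in the output.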
Compression through generic tools such as gzip is possible but not recommended: the file must be decompressed before it can be read and recompressed when you have finished, which is time consuming and degrades your productivity, and the data takes up much more room while it is being analysed.
General guidelines
The netCDF library has several options for compressing data, which all compression programs use, as they all rely on the underlying library to perform the compression. There is a <a href="http://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf_compression">more detailed explanation</a> if you wish to understand more, but briefly:
Deflate level
This is an integer value ranging from 0 to 9. A value of 0 means no compression, and 9 is the highest level of compression possible. The higher the value, the smaller your file will be once compressed. However, there is a trade-off: the higher the deflate level, the longer compression takes, particularly at very high levels. At deflate level 9 compression can take six times longer for only a few percent improvement in size. The recommended deflate level is 5, which combines good compression with a small increase in compression time.
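You can get a feel for this trade-off without netCDF at all, since netCDF deflation uses the same zlib "deflate" algorithm. This Python sketch (standard library only; the data buffer is made up purely for illustration) compresses the same buffer at levels 1, 5 and 9 and times each:

```python
import time
import zlib

# Compressible sample data: ~10 MB of repeating structured bytes,
# standing in for a typical model field. (Illustrative data only.)
data = bytes(range(256)) * 40000

for level in (1, 5, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Deflate is lossless: decompressing restores the data exactly.
    assert zlib.decompress(compressed) == data
    print(f"level {level}: {len(compressed)} bytes in {elapsed:.3f} s")
```

On realistic model output, level 9 typically shaves only a little more off the file than level 5 while taking noticeably longer, which is why level 5 is the recommended default.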
Shuffle
Turn shuffle on. Simple. It usually results in a smaller compressed file with little performance overhead.
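Shuffle works by storing the first byte of every value together, then all the second bytes, and so on. Neighbouring values in model output are often numerically close, so the shuffled stream contains long runs of similar bytes, which deflate compresses much better. This standard-library Python sketch shows the effect on slowly varying integers (illustrative only; in a real netCDF4 file the shuffle filter is applied inside the HDF5 layer, not by you):

```python
import struct
import zlib

# 100,000 slowly varying 32-bit integers, like a typical model field.
values = [i // 10 for i in range(100000)]
raw = struct.pack("<100000i", *values)

def shuffle(data: bytes, itemsize: int) -> bytes:
    """Group byte 0 of every value together, then byte 1, and so on."""
    return b"".join(data[i::itemsize] for i in range(itemsize))

plain = zlib.compress(raw, 5)
shuffled = zlib.compress(shuffle(raw, 4), 5)
print(f"no shuffle: {len(plain)} bytes, with shuffle: {len(shuffled)} bytes")
```

For this kind of data the shuffled stream compresses noticeably smaller, because the high-order bytes of neighbouring values form long identical runs.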
Chunking
The netCDF library writes the data to disk in "chunks". There is a <a href="http://www.unidata.ucar.edu/blogs/developer/entry/chunking_data_why_it_matters">very good description of chunking and how it works</a>. All you really have to know is that in order to use netCDF compression your data must be chunked. The question then is: do I care how the program I use chooses the size of my data chunks? The answer is almost certainly yes, but maybe not a lot. An optimal chunking strategy is <a href="http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_choosing_shapes">largely determined by the structure of your data and how you will access it</a>.
Specific details of chunking strategy depend largely on the tool used to compress your data, and are covered in more detail in the next section. However, all tools use the underlying netCDF4 library, and so may fall back on its default chunking strategy, which has changed over time. For many versions the default was to create chunks the same size as the dimensions of the variable, which can be a disastrous choice for performance if the data is also compressed: the entire variable must be read into memory and uncompressed even if only a single slice is required.
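A back-of-the-envelope sketch of why this matters: when data is compressed, every chunk that intersects a read request must be read and uncompressed in full. If a (time, lat, lon) variable is stored as one whole-variable chunk, extracting a single time slice costs the whole variable; with per-timestep chunks it costs only that slice. The shapes below are hypothetical illustrative numbers, not from any particular file:

```python
import math

def bytes_read_for_time_slice(shape, chunk, itemsize=4):
    """Bytes that must be read (and uncompressed) to extract one time
    slice: every chunk intersecting the slice is read in full."""
    ntime, nlat, nlon = shape
    ctime, clat, clon = chunk
    chunks_touched = math.ceil(nlat / clat) * math.ceil(nlon / clon)
    return chunks_touched * (ctime * clat * clon) * itemsize

shape = (1000, 180, 360)  # hypothetical (time, lat, lon) variable

whole = bytes_read_for_time_slice(shape, (1000, 180, 360))  # one big chunk
small = bytes_read_for_time_slice(shape, (1, 180, 360))     # per-timestep chunks
print(f"whole-variable chunk: {whole / 1e6:.0f} MB per slice")
print(f"per-timestep chunks:  {small / 1e6:.2f} MB per slice")
```

With the whole-variable default the single slice costs a thousand times more I/O and decompression work than with per-timestep chunks.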
Compression tools
There are some software packages available on gadi that can be used to compress netCDF data:
nco
The <a href="http://nco.sourceforge.net">netCDF Operators (NCO) program suite</a> can compress netCDF files and has recently <a href="http://nco.sourceforge.net/nco.html#Compression">included some ability to choose different chunking strategies</a>. For some cases this may be a reasonable solution based on the available options, but a weakness is the inability to use its optimised chunking strategy for variables with four dimensions or more.
cdo
<a href="https://code.zmaw.de/projects/cdo">Climate Data Operators</a> (cdo) can also compress netCDF files and offers limited chunking options: auto, grid or lines.
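As a sketch of typical usage (option syntax can vary between cdo versions, so check the documentation for your installed version), the -z zip_5 option deflates at level 5 while -f nc4 selects netCDF4 output:

<syntaxhighlight lang="text"> cdo -f nc4 -z zip_5 copy input.nc output.nc </syntaxhighlight>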
netcdf
One of the standard tools included in a netCDF installation is <a href="https://docs.unidata.ucar.edu/nug/current/netcdf_utilities_guide.html#guide_nccopy">nccopy</a>. nccopy can compress files and define the chunking using a command line argument (-c). nccopy is a good option if your data file structure changes little, so a chunking scheme can be decided upon and hard coded into scripts. It is less useful if the dimensions and variables change. Another major limitation is that chunking is defined per dimension, not per variable: if your data file has variables that share dimensions but have different combinations or numbers of dimensions, it is not possible to set an optimal chunking strategy for each variable.
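For example, a sketch of compressing a file with nccopy at deflate level 5, with shuffle enabled, specifying a chunk size for each dimension (the dimension names and chunk sizes here are hypothetical and must match the dimensions in your own file):

<syntaxhighlight lang="text"> nccopy -d 5 -s -c time/1,lat/180,lon/360 input.nc output.nc </syntaxhighlight>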
nccompress
The nccompress package is available on gadi in the CMS conda environment. At present it consists of three python programs, ncfind, nc2nc and nccompress, written and supported by <a href="http://www.climatescience.org.au/staff/profile/AHeerdegen">Aidan</a>. nc2nc can copy netCDF files with compression and an optimised chunking strategy that has reasonable performance for many datasets. It has two main limitations: it is slower than the other programs, and it can only compress netCDF3 or netCDF4 classic format files. There is more detail in the following sections.
ncvarinfo
The utility ncvarinfo is also included; though it has no direct relevance to compression, it is a convenient way to get a summary of the contents of a netCDF file.
Identifying files to be compressed
ncfind, part of the nccompress package, can be used to find netCDF files and discriminate between compressed and uncompressed:
<syntaxhighlight lang="text">
$ ncfind -h
usage: ncfind [-h] [-r] [-u | -c] [inputs [inputs ...]]

Find netCDF files. Can discriminate by compression

positional arguments:
  inputs                netCDF files or directories (-r must be specified to
                        recursively descend directories). Can accept piped
                        arguments.

optional arguments:
  -h, --help            show this help message and exit
  -r, --recursive       Recursively descend directories to find netCDF files
                        (default False)
  -u, --uncompressed    Find only uncompressed netCDF files (default False)
  -c, --compressed      Find only compressed netCDF files (default False)
</syntaxhighlight>
There are other methods for finding files, most notably the unix find utility. For example, to find all files in the directory "directoryname" which end in ".nc":
<syntaxhighlight lang="text">
find directoryname -iname "*.nc"
</syntaxhighlight>
However, if your netCDF files do not follow the convention of ending in ".nc", or cannot be systematically found based on filename, you can use ncfind to recursively descend into a directory structure looking for netCDF files:
<syntaxhighlight lang="text"> ncfind -r directoryname </syntaxhighlight>
You can refine the search further by requesting to return only those files that are uncompressed:
<syntaxhighlight lang="text"> ncfind -r -u directoryname </syntaxhighlight>
If you want to find out how much space these uncompressed files occupy you can combine this command with other unix utilities such as xargs and du:
<syntaxhighlight lang="text"> ncfind -r -u directoryname | xargs du -h </syntaxhighlight>
du is the disk usage utility. The output looks something like this:
<syntaxhighlight lang="text">
 67M    output212/ice__212_223.nc
1003M   output212/ocean__212_223.nc
1.1G    total
</syntaxhighlight>
It is even possible to combine the system find utility with ncfind, using a unix pipe (|). This command will find all files ending in ".nc", pipe the results to ncfind, and only those that are uncompressed will be printed to the screen:
<syntaxhighlight lang="text">
find directoryname -iname "*.nc" | ncfind -u
</syntaxhighlight>
Batch Compressing files
Having identified where the netCDF files you wish to compress are located, there is a convenience program, nccompress, which can be used to easily step through and compress each file in turn:
<syntaxhighlight lang="text">
$ nccompress -h
usage: nccompress [-h] [-d {1-9}] [-n] [-b BUFFERSIZE] [-t TMPDIR] [-v] [-r]
                  [-o] [-m MAXCOMPRESS] [-p] [-f] [-c] [-pa] [-np NUMPROC]
                  [--nccopy]
                  inputs [inputs ...]

Run nc2nc (or nccopy) on a number of netCDF files

positional arguments:
  inputs                netCDF files or directories (-r must be specified to
                        recursively descend directories)

optional arguments:
  -h, --help            show this help message and exit
  -d {1-9}, --dlevel {1-9}
                        Set deflate level. Valid values 0-9 (default=5)
  -n, --noshuffle       Don't shuffle on deflation (default is to shuffle)
  -b BUFFERSIZE, --buffersize BUFFERSIZE
                        Set size of copy buffer in MB (default=50)
  -t TMPDIR, --tmpdir TMPDIR
                        Specify temporary directory to save compressed files
  -v, --verbose         Verbose output
  -r, --recursive       Recursively descend directories compressing all
                        netCDF files (default False)
  -o, --overwrite       Overwrite original files with compressed versions
                        (default is to not overwrite)
  -m MAXCOMPRESS, --maxcompress MAXCOMPRESS
                        Set a maximum compression as a paranoid check on
                        success of nccopy (default is 10, set to zero for no
                        check)
  -p, --paranoid        Paranoid check : run nco ndiff on the resulting file
                        ensure no data has been altered
  -f, --force           Force compression, even if input file is already
                        compressed (default False)
  -c, --clean           Clean tmpdir by removing existing compressed files
                        before starting (default False)
  -pa, --parallel       Compress files in parallel
  -np NUMPROC, --numproc NUMPROC
                        Specify the number of processes to use in parallel
                        operation
  --nccopy              Use nccopy instead of nc2nc (default False)
</syntaxhighlight>
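The '-m/--maxcompress' option described above is just a size-ratio sanity test. The sketch below illustrates the idea only; the function name and the exact comparison are our assumptions, not nccompress's actual code:

```python
def compression_suspicious(original_bytes, compressed_bytes, maxcompress=10):
    """Flag a compressed copy that has shrunk by more than maxcompress
    times. A ratio that large often indicates something went wrong
    (e.g. data was lost) rather than unusually good compression."""
    return compressed_bytes * maxcompress < original_bytes


# A file shrinking from 1 GB to 50 MB (20x) would be flagged; a
# typical 3x reduction would pass the check.
```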
The simplest way to invoke the program would be with a single file:
<syntaxhighlight lang="text"> nccompress ice_daily_0001.nc </syntaxhighlight>
or using a wildcard expression:
<syntaxhighlight lang="text"> nccompress ice*.nc </syntaxhighlight>
You can also specify one or more directory names in combination with the recursive flag (-r) and the program will recursively descend into those directories and find all netCDF files contained therein. For example, a directory listing might look like so:
<syntaxhighlight lang="text">
$ ls data/
output001  output003  output005  output007  output009  restart001  restart003  restart005  restart007  restart009
output002  output004  output006  output008  output010  restart002  restart004  restart006  restart008  restart010
</syntaxhighlight>
with a number of sub-directories, all containing netCDF files. It is a good idea to do a trial run and make sure it functions properly. For example, this will compress the netCDF files in just one of the directories:
<syntaxhighlight lang="text"> nccompress -p -r data/output001 </syntaxhighlight>
Once completed there will be a new subdirectory called tmp.nc_compress inside the directory output001, containing compressed copies of all the netCDF files from the directory above. The paranoid option (-p) calls an nco command to check that the variables contained in the two files are the same. You can use the paranoid option routinely, though it will make the process more time consuming; it is a good idea to use it at least in the testing phase. You should also check the compressed copies manually to make sure they look ok, and if so, re-run the command with the -o (overwrite) option:
<syntaxhighlight lang="text"> nccompress -r -o data/output001 </syntaxhighlight>
and it will find the already compressed files, copy them over the originals and delete the temporary directory tmp.nc_compress. It won’t try to compress the files again. It also won’t compress already compressed files, so, for example, if you were happy that the compression was working well you could compress the entire data directory, and the already compressed files in output001 will not be re-compressed.
So, by default, nccompress does not overwrite the original files. If you invoke it without the '-o' option it will create compressed copies in the tmp.nc_compress subdirectory and leave them there, which will consume more disk space! This is a feature, not a bug, but you need to be aware that this is how it functions.
With large variables, which usually means large files (> 1GB), it is a good idea to specify a larger buffer size with the '-b' option, as it will run faster. On gadi this may mean you need to run interactively with a higher memory limit (~10GB) or submit it as a copyq job. A typical buffer size might be 1000-5000 (1-5 GB).
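A rough buffer size can be estimated from the shape of the largest variable in the file. The helper below is purely illustrative (the function name and the assumption of 4-byte floats are ours, not part of nccompress):

```python
def variable_mb(shape, dtype_bytes=4):
    """Approximate in-memory size in MB of a variable with the given
    shape, assuming dtype_bytes per element (4 for 32-bit floats)."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * dtype_bytes / 2**20


# A daily (365, 1080, 1440) float variable is a little over 2 GB, so
# a buffer of 3000 (MB) or more would be a reasonable choice for -b.
print(round(variable_mb((365, 1080, 1440))))
```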
It is also possible to use wildcard expressions, e.g.
<syntaxhighlight lang="text">
nccompress -r -o output*
nccompress -r -o output00[1-5]
nccompress -r -o run[1-5]/output*/ocean*.nc random.nc ice*.nc
</syntaxhighlight>
The nccompress program just handles finding files and directories; it calls nc2nc to do the actual compression. Using the '--nccopy' option forces nccompress to use the nccopy program in place of nc2nc, though the netcdf package must already be loaded for this to work. You can tell nccompress to work on multiple files simultaneously with the '-pa' option. By default this will use all the physical processors on the machine, or you can specify how many simultaneous processes you want with '-np', e.g.
<syntaxhighlight lang="text"> nccompress -r -o -np 16 run[1-5]/output*/ocean*.nc random.nc ice*.nc </syntaxhighlight>
will compress 16 netCDF files at a time (the -np option implies the parallel option). As each directory is processed before beginning a new directory, there will be little reduction in execution time if there are few netCDF files in each directory.
nc2nc
The nc2nc program was written because no existing tool had a generalised per-variable chunking algorithm. The total chunk size is defined to be the file system block size (4096KB). The dimensions of the chunk are sized to be as close as possible to the same ratio as the dimensions of the data, with the limit that no dimension can be less than 1. This chunking scheme performs well for a wide range of data, but there will always be access patterns or variable shapes for which it is not optimal. In those cases a different approach may be required.
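The scheme can be sketched in a few lines of python. This is an illustration of the idea only, not nc2nc's actual code: the function name, the 4-byte element size and the rounding details are all our assumptions:

```python
from functools import reduce
from operator import mul


def proportional_chunks(shape, elem_bytes=4, chunk_bytes=4096 * 1024):
    """Illustrative sketch of proportional chunking: scale every
    dimension by the same factor, so the chunk keeps the shape of the
    data, the total chunk size stays close to the target, and no
    dimension falls below 1."""
    target_elems = max(1, chunk_bytes // elem_bytes)
    total_elems = reduce(mul, shape, 1)
    # The same scale factor for every dimension preserves the ratios.
    factor = (target_elems / total_elems) ** (1.0 / len(shape))
    return tuple(max(1, min(dim, round(dim * factor))) for dim in shape)
```

For a (365, 1080, 1440) variable this yields a chunk whose sides are in roughly the same 365:1080:1440 proportion, scaled down to fit the target size.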
Be aware that nc2nc takes at least twice as long to compress an equivalent file as nccopy. In some cases with large files containing many variables it can be up to five times slower.
You can use nc2nc “stand alone”. It has a couple of extra features that can only be accessed by calling it directly:
<syntaxhighlight lang="text">
$ nc2nc -h
usage: nc2nc [-h] [-d {1-9}] [-m MINDIM] [-b BUFFERSIZE] [-n] [-v] [-c] [-f]
             [-va VARS] [-q QUANTIZE] [-o]
             origin destination

Make a copy of a netCDF file with automatic chunk sizing

positional arguments:
  origin                netCDF file to be compressed
  destination           netCDF output file

optional arguments:
  -h, --help            show this help message and exit
  -d {1-9}, --dlevel {1-9}
                        Set deflate level. Valid values 0-9 (default=5)
  -m MINDIM, --mindim MINDIM
                        Minimum dimension of chunk. Valid values 1-dimsize
  -b BUFFERSIZE, --buffersize BUFFERSIZE
                        Set size of copy buffer in MB (default=50)
  -n, --noshuffle       Don't shuffle on deflation (default is to shuffle)
  -v, --verbose         Verbose output
  -c, --classic         use NETCDF4_CLASSIC output instead of NETCDF4
                        (default true)
  -f, --fletcher32      Activate Fletcher32 checksum
  -va VARS, --vars VARS
                        Specify variables to copy (default is to copy all)
  -q QUANTIZE, --quantize QUANTIZE
                        Truncate data in variable to a given decimal
                        precision, e.g. -q speed=2 -q temp=0 causes variable
                        speed to be truncated to a precision of 0.01 and temp
                        to a precision of 1
  -o, --overwrite       Write output file even if already it exists (default
                        is to not overwrite)
</syntaxhighlight>
With the vars option (-va) it is possible to select only a subset of variables to be copied to the destination file. By default the output file is netCDF4 classic, but this can be changed to netCDF4 using the '-c' option. It is also possible to specify a minimum dimension size for the chunks (-m). This may be desirable for a dataset that has one particularly long dimension: the chunk dimensions would mirror this and be very large in that direction. If fast access is required for slices orthogonal to this direction, performance might be improved by setting this option to a number greater than 1.
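The effect of the quantize option can be illustrated in a couple of lines of python. This is only a sketch of the idea (rounding to a fixed number of decimal places so the low-order bits of the data become highly compressible); it is not nc2nc's implementation, and the function name is ours:

```python
def quantize(value, digits):
    """Round value to the given number of decimal digits, e.g.
    digits=2 keeps a precision of 0.01. After quantizing, values share
    many identical low-order bits, which deflate compresses well."""
    scale = 10.0 ** digits
    return round(value * scale) / scale


# '-q speed=2' corresponds to applying quantize(x, 2) to every
# element of the (hypothetical) variable 'speed'.
```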
ncvarinfo
ncvarinfo is a convenient way to get a summary of the contents of a netCDF file.
<syntaxhighlight lang="text">
./ncvarinfo -h
usage: ncvarinfo [-h] [-v] [-t] [-d] [-a] [-va VARS] inputs [inputs ...]

Output summary information about a netCDF file

positional arguments:
  inputs                netCDF files

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Verbose output
  -t, --time            Show time variables
  -d, --dims            Show dimensions
  -a, --aggregate       Aggregate multiple netCDF files into one dataset
  -va VARS, --vars VARS
                        Show info for only specify variables
</syntaxhighlight>
By default it prints a simple summary of the variables in a netCDF file, omitting dimensions and time related variables, e.g.
<syntaxhighlight lang="text">
ncvarinfo output096/ocean_daily.nc
output096/ocean_daily.nc
Time steps: 365 x 1.0 days
tau_x :: (365, 1080, 1440) :: i-directed wind stress forcing u-velocity
tau_y :: (365, 1080, 1440) :: j-directed wind stress forcing v-velocity
geolon_t :: (1080, 1440) :: tracer longitude
geolat_t :: (1080, 1440) :: tracer latitude
geolon_c :: (1080, 1440) :: uv longitude
geolat_c :: (1080, 1440) :: uv latitude
</syntaxhighlight>
If you specify more than one file it will print the information for each file in turn:
<syntaxhighlight lang="text">
ncvarinfo output09?/ocean_daily.nc
output096/ocean_daily.nc
Time steps: 365 x 1.0 days
tau_x :: (365, 1080, 1440) :: i-directed wind stress forcing u-velocity
tau_y :: (365, 1080, 1440) :: j-directed wind stress forcing v-velocity
geolon_t :: (1080, 1440) :: tracer longitude
geolat_t :: (1080, 1440) :: tracer latitude
geolon_c :: (1080, 1440) :: uv longitude
geolat_c :: (1080, 1440) :: uv latitude
output097/ocean_daily.nc
Time steps: 365 x 1.0 days
tau_x :: (365, 1080, 1440) :: i-directed wind stress forcing u-velocity
tau_y :: (365, 1080, 1440) :: j-directed wind stress forcing v-velocity
geolon_t :: (1080, 1440) :: tracer longitude
geolat_t :: (1080, 1440) :: tracer latitude
geolon_c :: (1080, 1440) :: uv longitude
geolat_c :: (1080, 1440) :: uv latitude
output098/ocean_daily.nc
Time steps: 365 x 1.0 days
tau_x :: (365, 1080, 1440) :: i-directed wind stress forcing u-velocity
tau_y :: (365, 1080, 1440) :: j-directed wind stress forcing v-velocity
geolon_t :: (1080, 1440) :: tracer longitude
geolat_t :: (1080, 1440) :: tracer latitude
geolon_c :: (1080, 1440) :: uv longitude
geolat_c :: (1080, 1440) :: uv latitude
output099/ocean_daily.nc
Time steps: 365 x 1.0 days
tau_x :: (365, 1080, 1440) :: i-directed wind stress forcing u-velocity
tau_y :: (365, 1080, 1440) :: j-directed wind stress forcing v-velocity
geolon_t :: (1080, 1440) :: tracer longitude
geolat_t :: (1080, 1440) :: tracer latitude
geolon_c :: (1080, 1440) :: uv longitude
geolat_c :: (1080, 1440) :: uv latitude
</syntaxhighlight>
If the files have the same structure it is possible to aggregate the data and display it as if it were contained in a single dataset:
<syntaxhighlight lang="text">
ncvarinfo -a output09?/ocean_daily.nc
Time steps: 1460 x 1.0 days
tau_x :: (1460, 1080, 1440) :: i-directed wind stress forcing u-velocity
tau_y :: (1460, 1080, 1440) :: j-directed wind stress forcing v-velocity
geolon_t :: (1080, 1440) :: tracer longitude
geolat_t :: (1080, 1440) :: tracer latitude
geolon_c :: (1080, 1440) :: uv longitude
geolat_c :: (1080, 1440) :: uv latitude
</syntaxhighlight>
You can also restrict the output to just the variables you are interested in:
<syntaxhighlight lang="text">
ncvarinfo -va tau_x -va tau_y output09?/ocean_daily.nc
output096/ocean_daily.nc
Time steps: 365 x 1.0 days
tau_x :: (365, 1080, 1440) :: i-directed wind stress forcing u-velocity
tau_y :: (365, 1080, 1440) :: j-directed wind stress forcing v-velocity
output097/ocean_daily.nc
Time steps: 365 x 1.0 days
tau_x :: (365, 1080, 1440) :: i-directed wind stress forcing u-velocity
tau_y :: (365, 1080, 1440) :: j-directed wind stress forcing v-velocity
output098/ocean_daily.nc
Time steps: 365 x 1.0 days
tau_x :: (365, 1080, 1440) :: i-directed wind stress forcing u-velocity
tau_y :: (365, 1080, 1440) :: j-directed wind stress forcing v-velocity
output099/ocean_daily.nc
Time steps: 365 x 1.0 days
tau_x :: (365, 1080, 1440) :: i-directed wind stress forcing u-velocity
tau_y :: (365, 1080, 1440) :: j-directed wind stress forcing v-velocity
</syntaxhighlight>