Difference between revisions of "Finding datasets"

== <span style="font-family:Calibri, sans-serif">'''External data resources'''</span> ==
*[[GIOVANNI|GIOVANNI]]&nbsp;- GES-DISC Interactive Online Visualization ANd aNalysis Infrastructure
*[http://dl.tpac.org.au TPAC]&nbsp;Climate and Ocean data portal
*[http://www.ands.org.au The Australian National Data Service (ANDS)]
*[http://researchdata.ands.org.au Research Data Australia]
*[http://www.bom.gov.au/cyclone/history/tracks/ BoM Tropical Cyclone data services]
*[https://tropicaldatahub.org/ Tropical Data Hub]

Revision as of 21:46, 1 May 2019

Finding climate data at NCI

Many climate data resources are available at NCI, but only some of them are listed in the NCI data catalogue. Even for those that are listed, a few tips can help you work out whether the dataset you are looking for is already available.


Step 1 - check the NCI data catalogue

NCI uses a GeoNetwork catalogue to list its collections; it covers most of the larger data collections hosted by NCI. To find a dataset you can run a text search or select from a list of available attributes.

Free text search

The GeoNetwork free text search looks for an exact match in the record title and description. For example, if I am looking for "precipitation" datasets I need to type the entire word; if I type only "precip" it will return only this dataset:

"Proportion of days per month with precip > 0.2mm: ..."

Typing "precipitation", on the other hand, returns 62 datasets: every record that has the exact word in its title or description.
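The whole-word behaviour described above can be illustrated with a minimal Python sketch. This is not GeoNetwork's actual implementation; the `whole_word_search` function, the case-insensitive matching and the second record are assumptions made purely for illustration.

```python
import re

def whole_word_search(term, records):
    """Return the records whose title or description contains `term` as a whole word."""
    # \b only matches at a word boundary, so "precip" does not match inside "precipitation".
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [
        r for r in records
        if pattern.search(r["title"]) or pattern.search(r["description"])
    ]

# Hypothetical catalogue records, for illustration only.
records = [
    {"title": "Proportion of days per month with precip > 0.2mm",
     "description": "Gridded monthly statistics"},
    {"title": "Monthly rainfall analysis",
     "description": "Total precipitation over Australia"},
]

print([r["title"] for r in whole_word_search("precip", records)])
print([r["title"] for r in whole_word_search("precipitation", records)])
```

In this sketch each search term finds a different record, which is why it is worth trying both the full word and any common abbreviation when searching the catalogue.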

Selection by topic and other attributes

NCI has created categories that group the datasets by "topic"; they are shown on the catalogue main page. Once you click on one of them, a panel on the left side of the screen shows all the attributes you can use to select datasets. The same panel appears after a text search, so you can use it to refine your selection. Probably the most useful attributes are the keywords, which are either set by the data managers themselves or based on the Field Of Research (FOR) standards.

Unfortunately the selection panel can be a bit obscure. For example, the FOR codes often appear as a sequence of numbers rather than their definition: "0401" rather than "atmospheric science". This is because the dataset descriptions and attributes are provided by the owners of the data, so there is a lot of variation in what is included and how keywords are chosen or interpreted.
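If you come across bare FOR codes in the selection panel, a small lookup table can help you read them. The sketch below lists only a few codes from the Earth Sciences division of the ANZSRC 2008 classification; the `FOR_CODES` dictionary and the `describe_keyword` helper are hypothetical names introduced here for illustration, not part of the catalogue.

```python
# A few ANZSRC 2008 Field of Research codes from the Earth Sciences division (04).
# This is a partial list chosen for illustration, not the full standard.
FOR_CODES = {
    "0401": "Atmospheric Sciences",
    "0405": "Oceanography",
    "0406": "Physical Geography and Environmental Geoscience",
}

def describe_keyword(keyword):
    """Return 'code (name)' for a known FOR code, otherwise the keyword unchanged."""
    code = keyword.strip()
    name = FOR_CODES.get(code)
    return f"{code} ({name})" if name else keyword

print(describe_keyword("0401"))
print(describe_keyword("rainfall"))
```

Keywords that are not in the table, such as free-text keywords chosen by a data manager, are passed through untouched.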

Another limitation is that a dataset might be available as part of a data collection without having its own record. If the collection is well described you should at least be able to locate it, but sometimes the record descriptions are very generic and don't necessarily list all of the datasets included.

NCI is working on an improved, more user-friendly interface and is also working with all the data managers to improve the quality of the records. Feel free to send them feedback or, if you prefer, tell us and we will pass it on.

Step 2 - check the CMS wiki

The CMS wiki lists all the datasets we download and manage for our researchers. Some of these are also listed in GeoNetwork, but most are not, in some cases because we have downloaded only a small subset.

Step 3 - ask the helpdesk:  cws_help@nci.org.au 

If you still can’t locate the dataset you are looking for, or you found it but the description isn’t sufficient to determine whether it covers your needs, feel free to e-mail the helpdesk.

Both GeoNetwork and the wiki are potentially incomplete and some of the records might be out of date, so it is always a good idea to double check with us. We might also know about other data which is on raijin but not necessarily listed, or we can enquire with our partners on your behalf to help you locate a specific product.

If you still cannot find what you are looking for, we can help you download the data. When we receive a request to download data we try to get back to you as fast as possible with an answer and a timeframe. We are usually able to download the data for you and put it in a shared environment where others can also access it. If the dataset requires a lot of storage, the download is time consuming, or it needs ongoing maintenance, we have to check with the infrastructure committee that it fits the Centre's objectives before going ahead.

It is rare that we have to say no to someone, and we don't do so without a fairly strong reason, because we prefer to download and manage the dataset for you. Having the data shared in a central location where others can access it is also a better use of disk storage.


Accessing the data

Once you know where the dataset is located, you need to join the relevant project. As for any other NCI project, go to https://my.nci.org.au, search for the project and put in a request.

The lead CI for that project will receive an e-mail and either approve your request or contact you for further information. While this might feel frustrating, there are very good reasons why a data manager might want to know how you will use the data. They might want to be sure you are aware of the dataset's limitations and are not using it improperly, they might want to know which subset of the data users are actually interested in, and they usually need to justify the time and effort that goes into maintaining a dataset.
This is becoming increasingly important for NCI too, as they have to prove that their funding is spent in useful ways. NCI is currently reviewing all data projects; their aim is to have a separate project for each dataset, if possible, and to make sure every user has to request access to the data project, as opposed to having world-readable files.
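Once your request has been approved, you may want to confirm that you can actually read the data. The following is a minimal sketch, assuming that project membership shows up as Unix group membership on the system and that the data lives under a project directory; the project code `ab12`, the path `/g/data/ab12` and the `check_access` helper are hypothetical and not taken from this page.

```python
import grp
import os

def user_group_names():
    """Return the names of the Unix groups the current user belongs to."""
    names = set()
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # The group id has no name on this system; skip it.
            pass
    return names

def check_access(project, path):
    """Report whether the user is in the `project` group and can read `path`."""
    return {
        "member": project in user_group_names(),
        # For a directory, listing its contents needs both read and execute permission.
        "readable": os.access(path, os.R_OK | os.X_OK),
    }

# Hypothetical project code and path, for illustration only.
print(check_access("ab12", "/g/data/ab12"))
```

If `member` is false after your request has been approved, logging out and back in is usually needed before new group memberships take effect; if it is still false, contact the helpdesk.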

Datasets hosted on raijin and managed by CLEX


Other datasets hosted on raijin

  • CMIP5 - Coupled Model Intercomparison Project Phase 5 data on raijin
  • CMIP6 - Coupled Model Intercomparison Project Phase 6 data on raijin


ARCCSS and CLEX datasets and software published on Research Data Australia (RDA)

The Centre of Excellence for Climate System Science has started publishing its datasets on Research Data Australia (RDA), the Australian National Data Service (ANDS) metadata repository. The first datasets to be published were from the Climate Model Downscaling Data for Impacts Research (CliMDDIR), then the ACCESS CMIP5 simulations.

  • ACCESS - CMIP5 simulations
  • CliMDDIR - Climate Model Downscaling Data for Impacts Research
  • ARCCSS collection - ARCCSS datasets on the NCI Data Catalogue
  • C20C+ ACCESS - Atmospheric ACCESS1.3 historical all forcing model output for the Climate of the 20th Century Plus (C20C+) Detection and Attribution sub-project
  • MarineHeatWaves - Marine heatwaves detection code


External data resources