Difference between revisions of "Finding datasets"

m
 
(37 intermediate revisions by 3 users not shown)
Line 1: Line 1:
  
=== <span style="font-size:large;">'''<span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">Finding climate data at&nbsp;NCI</span></span></span>'''</span> ===
+
== <span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:large;">'''<span style="caret-color:#000000"><span style="color:#000000">Finding climate data at&nbsp;NCI</span></span>'''</span></span> ==
  
<span style="font-size:medium;"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">There are a lot of climate data resources which are available at NCI. Only some of them have been listed in the NCI&nbsp;data catalogue and even then there are a few tips that can help you working out if the dataset you are looking for is already available or not.</span></span></span></span>
+
<span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium;"><span style="caret-color:#000000"><span style="color:#000000">Many climate data resources are available at NCI. Only some of them are listed in the NCI&nbsp;data catalogue, and even then a few tips can help you work out whether the dataset you are looking for is already available.</span></span></span></span>
  
 
&nbsp;
 
&nbsp;
  
'''<span style="font-size:medium"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">Step 1 -&nbsp;check the [http://geonetwork.nci.org.au/geonetwork/srv/eng/main.home NCI data catalogue]</span></span></span></span>'''
+
=== '''<span style="font-size:medium"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">Check the [https://geonetwork.nci.org.au/geonetwork/srv/eng/catalog.search#/home NCI data catalogue]</span></span></span></span>''' ===
  
<span style="font-size:medium"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">NCI uses a geonetwork catalogue to list its collections, this covers most of the bigger data collections hosted by NCI. To find a dataset you can run a text search or select from a list of available attributes.</span></span></span></span>
+
<span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium"><span style="caret-color:#000000"><span style="color:#000000">NCI uses a geonetwork catalogue to list its collections; it covers most of the bigger data collections hosted by NCI. To find a dataset, you can run a text search or select from a list of available attributes.</span></span></span></span>
  
<font color="#000000"><font face="Calibri, sans-serif"><font size="3"><span style="caret-color:#000000">Free text search</span></font></font></font>
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">'''<font color="#000000"><span style="caret-color:#000000">Free text search</span></font>'''</span></span>
  
<span style="font-size:medium"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">Geonetwork free text search will look for an exact match in the record title and description. For example if I am looking for "precipitation" datasets I need to type the entire word, if I type only "precip" it will return only this dataset:</span></span></span></span>
+
<span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium"><span style="caret-color:#000000"><span style="color:#000000">Geonetwork free text search will look for an exact match in the record title and description. For example, if I am looking for "precipitation" datasets I need to type the entire word. If I type only "precip", it will return only one&nbsp;dataset, which contains this sentence in its description:</span></span></span></span>
  
<span style="font-size:medium"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">"</span></span></span></span>Proportion of days per month with precip > 0.2mm: ..."
+
<span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium"><span style="caret-color:#000000"><span style="color:#000000">"</span></span></span>Proportion of days per month with precip > 0.2mm: ..."</span>
  
<span style="font-size:medium"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">Likewise typing "precipitation" will return 62 datasets including any record that has the exact word in their title or description.</span></span></span></span>
+
<span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium"><span style="caret-color:#000000"><span style="color:#000000">Likewise, typing "precipitation" will return 55&nbsp;datasets, including any record that has the exact word in its title or description.</span></span></span></span>
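The exact-match behaviour described above can be illustrated with a minimal Python sketch. This is not how geonetwork is implemented; it only mimics whole-word matching on a title and description. The two records and the function name `whole_word_search` are invented for illustration.

```python
import re

# Hypothetical catalogue records, invented for illustration only.
RECORDS = [
    {"title": "Monthly rainfall statistics",
     "description": "Proportion of days per month with precip > 0.2mm"},
    {"title": "Gridded precipitation analysis",
     "description": "Daily precipitation totals on a regular grid"},
]


def whole_word_search(term, records):
    """Return records whose title or description contains `term` as a whole word."""
    # \b marks a word boundary, so "precip" does not match inside "precipitation".
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [r for r in records
            if pattern.search(r["title"]) or pattern.search(r["description"])]


precip_hits = whole_word_search("precip", RECORDS)
precipitation_hits = whole_word_search("precipitation", RECORDS)
print([r["title"] for r in precip_hits])
print([r["title"] for r in precipitation_hits])
```

In this sketch the short term only finds the record that literally contains "precip" as a word, which is why typing the full word matters when searching the catalogue.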
  
<span style="font-size:medium;"><span style="font-family:Calibri, sans-serif">Selection&nbsp;by topic and other attributes</span></span>
+
'''<span style="font-size:medium;"><span style="font-family:Calibri, sans-serif">Selection&nbsp;by topic and other attributes</span></span>'''
  
<span style="font-size:medium;"><span style="font-family:Calibri, sans-serif">NCI created some categories dividing the datasets based on "topic", they are&nbsp;shown on the catalogue main page. &nbsp;Once you click on one of them&nbsp;you will see a panel on the left side of the screen showing all the available attributes you can use to select datasets. The same is true if you run a text search, you could then refine your selection using this panel.&nbsp;Probably the most useful attributes are&nbsp;the keywords, they are based on keywords set by the data manager themselves or on&nbsp;[http://www.abs.gov.au/Ausstats/abs@.nsf/Latestproducts/6BB427AB9696C225CA2574180004463E?opendocument Field Of Research (FOR) standards].</span></span>
+
<span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium;">NCI created some categories that divide the datasets by "topic"; they are&nbsp;shown on the catalogue main page. &nbsp;Once you click on one of them, you will see a panel on the left side of the screen showing all the available attributes you can use to select datasets. The same is true if you run a text search: you can then refine your selection using this panel.&nbsp;Probably the most useful attributes are&nbsp;the keywords, which are either set by the data managers themselves or based on&nbsp;[http://www.abs.gov.au/Ausstats/abs@.nsf/Latestproducts/6BB427AB9696C225CA2574180004463E?opendocument Field Of Research (FOR) standards].</span></span>
  
<span style="font-size:medium;"><span style="font-family:Calibri, sans-serif">Unfortunately the&nbsp;selection&nbsp;panel can be a bit obscure, for example often the FOR codes appears as a sequence of numbers&nbsp;rather then their definition for example "0401" rather then&nbsp;"atmospheric science". &nbsp;This is because the dataset descriptions and attributes&nbsp;are provided by the owners of the data, so there is a big variance in what is included and how keywords are chosen or interpreted.</span></span>
+
<span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium;">The dataset descriptions and attributes&nbsp;are provided by the owners of the data, and sometimes overruled by NCI, so there is a big variance in what is included and how keywords are chosen or interpreted.&nbsp;For example, the FOR code "0401 Atmospheric Sciences" can also&nbsp;appear&nbsp;as "0401",&nbsp;"Atmospheric Sciences" or "ATMOSPERIC&nbsp;SCIENCES", or as one of its sub-categories, such as "040107 - Meteorology".&nbsp;</span></span>
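If you collect keywords from several records yourself, it can help to map the variants onto a single code before filtering. The following is a minimal sketch, assuming only the variants quoted above; the lookup table `FOR_NAMES` and the function `for_code` are hypothetical helpers, not part of any NCI tool.

```python
import re

# Names taken from the examples above; the table itself is an illustrative assumption.
FOR_NAMES = {
    "atmospheric sciences": "0401",
    "meteorology": "040107",
}


def for_code(keyword):
    """Map a keyword variant to a FOR code, or None if it is not recognised."""
    text = keyword.strip()
    # A leading run of digits is taken to be the code itself.
    match = re.match(r"\d+", text)
    if match:
        return match.group(0)
    # Otherwise fall back to a case-insensitive name lookup.
    return FOR_NAMES.get(text.lower())


variants = ["0401 Atmospheric Sciences", "0401", "Atmospheric Sciences",
            "ATMOSPERIC SCIENCES", "040107 - Meteorology"]
codes = {v: for_code(v) for v in variants}
for variant, code in codes.items():
    print(f"{variant!r} -> {code}")
```

Note that a misspelled name with no numeric code cannot be resolved by a simple lookup like this, so such records would still need to be checked by hand.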
  
<span style="font-size:medium"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">Another limitation is that a dataset might be available as&nbsp;part of a data collection but not having his own record. If the collection is well described then at least you should be able to locate it,&nbsp;&nbsp;sometimes the record descriptions are very generic and don't necessarily list of the datasets included.</span></span></span></span>
+
<span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium"><span style="caret-color:#000000"><span style="color:#000000">Another limitation is that a dataset might be available as&nbsp;part of a data collection but not have its&nbsp;own record. If the collection is well described, then at least you should be able to locate it, but&nbsp;sometimes the record descriptions are very generic and don't necessarily list all the datasets included.</span></span></span></span>
  
<span style="font-size:medium"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">NCI is working on an improved and more user-friendly interface and is also working with all the data managers to improve the quality of the records. Feel free to send them feedback or tell us if you prefer and we will pass it on.</span></span></span></span>
+
<span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium"><span style="caret-color:#000000"><span style="color:#000000">NCI is working on an improved and more user-friendly interface and is also working with all the data managers to improve the quality of the records. How the data projects will be re-organised is not yet defined, but [[Data_projects_update|this page]] has some information on how this is likely to impact climate datasets.&nbsp;Feel free to send NCI feedback, or tell us if you prefer and we will pass it on.</span></span></span></span>
  
'''<span style="font-size:medium"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">Step 2 -&nbsp;check the CMS wiki</span></span></span></span>'''
+
&nbsp;
 +
 
 +
=== '''<span style="font-size:medium"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">Check the CMS wiki</span></span></span></span>''' ===
 +
 
 +
<span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium"><span style="caret-color:#000000"><span style="color:#000000">The CMS wiki [http://climate-cms.wikis.unsw.edu.au/Category:Clex-managed-data lists all the datasets] we download and manage for our researchers. Some of these are also listed in geonetwork, but most are not, in some cases because we have downloaded only a small subset.</span></span></span></span>
 +
 
 +
&nbsp;
 +
 
 +
&nbsp;
 +
 
 +
=== '''<span style="font-size:medium"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">Ask the helpdesk:&nbsp;&nbsp;[mailto:cws_help@nci.org.au cws_help@nci.org.au]&nbsp;</span></span></span></span>''' ===
 +
 
 +
<span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium"><span style="caret-color:#000000"><span style="color:#000000">If you still cannot locate the datasets you were looking for, or you find the dataset but the description was not sufficient to work out if it is what you need, then feel free to e-mail us on the helpdesk.</span></span></span></span>
 +
 
 +
<span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium"><span style="caret-color:#000000"><span style="color:#000000">As both geonetwork and the wiki are potentially incomplete, and some of the records might be out of date, it is always a good idea to double check with us. We might also know about other data that is available on the NCI server&nbsp;but not necessarily listed, or we can enquire with our partners on your behalf to help you locate a specific product.</span></span></span></span>
 +
 
 +
<span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium"><span style="caret-color:#000000"><span style="color:#000000">If what you are looking for is not available yet, we can help you download the data. When we receive a request to download data, we try to get back to you as fast as possible with an answer and a timeframe.&nbsp;We are usually able to download the data for you and put it in a shared environment where others can also access it. If the dataset requires a lot of storage, or the download is time consuming or needs ongoing maintenance, we have to check with the Infrastructure Committee that it fits the Centre&nbsp;objectives before going ahead.</span></span></span></span>
  
<span style="font-size:medium"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">The CMS wiki lists all the datasets we download and manage for our researchers. Some of these are also listed in geonetwork, mostly are not because in some cases we have downloaded only a small subset.</span></span></span></span>
+
<span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium"><span style="caret-color:#000000"><span style="color:#000000">It is rare that we have to say no to someone, and we do not do it without a fairly strong reason, because we prefer to download and manage the dataset for you.&nbsp;Having&nbsp;the data shared in a central location, where others can also access it,&nbsp;is a better use of disk storage.</span></span></span></span>
  
'''<span style="font-size:medium"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">Step 3 -&nbsp;ask the helpdesk:&nbsp;&nbsp;[mailto:cws_help@nci.org.au cws_help@nci.org.au]&nbsp;</span></span></span></span>'''
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">This [[Requesting_data|wiki page]] covers how to request data in more detail.</span></span>
  
<span style="font-size:medium"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">If you still can’t locate the datasets you were looking for or you find the dataset but the description wasn’t sufficient to determine that it covers your needs then feel free to e-mail us on the helpdesk.</span></span></span></span>
+
== <span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:large;">'''Accessing the data'''</span></span> ==
  
<span style="font-size:medium"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">As both geonetwork and the wiki are potentially incomplete or some of the records might be out of date, it is always a good idea to double check with us. We also might know about other data which is on raijin but not necessarily listed or enquire on your behalf to our partners to help you locate a specific product.</span></span></span></span>
+
<span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium;">Once you know where the dataset is located, you need to join the relevant project. As for any other NCI project, go to</span></span>
  
<span style="font-size:medium"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">If you still cannot&nbsp;find what you are looking for we can help you downloading the data. When we receive a request to download data&nbsp;we try to get back to you as fast as possible with an answer and a timeframe.&nbsp;We are usually able to download the data for you and put in a shared environment where others can also access it. If the dataset requires a lot of storage, the download is time consuming or needs ongoing maintenance we have to check if it is in the center objectives with the infrastructure committee before going ahead.</span></span></span></span>
+
<span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium;">[https://my.nci.org.au https://my.nci.org.au],&nbsp;search for the project and put in a request.</span></span>
  
<span style="font-size:medium"><span style="font-family:Calibri, sans-serif"><span style="caret-color:#000000"><span style="color:#000000">It is rare that we have to say no to someone and we don't do it without a fairly strong reason because we prefer to download and manage the dataset for you.&nbsp;Having&nbsp;the data shared in a central location where others can access it is also&nbsp;a better use of disk storage.</span></span></span></span>
+
<span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium;">The lead CI for that project will receive an e-mail and either approve your request or contact you for further information. While this might feel frustrating, there are very good reasons why a data manager might want to know how you will use the data.&nbsp;They might want to be sure you are aware of the dataset limitations and are not using it improperly. They might want to know which subset of the data users are actually interested in, and they usually need to justify the time and effort that goes into maintaining a dataset.<br/> This is becoming increasingly important for NCI, too. They have to prove that their funding is spent in useful ways. NCI is currently reviewing all data projects; their aim is to have a separate project for each dataset, if possible, and to make sure every single user&nbsp;has&nbsp;to request access to the data project, as opposed to having world-readable files.</span></span>
  
 
&nbsp;
 
&nbsp;
  
=== <span style="font-size:large;">'''Accessing the data'''</span> ===
+
==== <span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium;">'''Datasets hosted on NCI servers&nbsp;and managed by CLEX'''</span></span> ====
 +
 
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">'''[[:Category:Clex-managed-data|Replicas of external datasets]]'''&nbsp;</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">'''[[:Category:Published_Data|ARCCSS and CLEX published datasets]]'''&nbsp;</span></span>
 +
 
 +
 
 +
==== <span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium;">'''Other datasets hosted by NCI'''</span></span> ====
  
<span style="font-size:medium;">Once you know where the dataset is located, you need to join the relevant project, as for any other NCI project you go to</span>
+
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">CMIP3 -&nbsp;Coupled Model Intercomparison Project Phase 3&nbsp;</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[[CMIP|CMIP5]]&nbsp;- Coupled Model Intercomparison Project Phase 5&nbsp;</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[[CMIP|CMIP6]]&nbsp;- Coupled Model Intercomparison Project Phase 6&nbsp;</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://cordex.org CORDEX] - Coordinated Regional Climate Downscaling Experiment</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[http://climate-cms.wikis.unsw.edu.au/ERA5 ERA5] - ECMWF latest reanalysis</span></span>  
  
<span style="font-size:medium;">[https://my.nci.org.au https://my.nci.org.au]&nbsp;,&nbsp;search for the project and put in a request.</span>
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">'''NB''' we have a tool, [https://clef.readthedocs.io/en/latest/readme.html CleF - climate finder], to help you locate the CMIP and CORDEX data on gadi, as well as to request new CMIP and CORDEX data to be downloaded. All the datasets listed above are also managed by NCI.</span></span>
  
<span style="font-size:medium;">The lead CI for that project will receive an e-mail and either approve your request or contact you for further information. While this might feel frustrating there are very good reasons why a data manager might want to know how you will use the data.&nbsp;They might want to be sure you are aware of the dataset limitations and not using it improperly, they might want to know which subset of the data users are actually interested into and they usually need to justify the time and effort that goes into maintaining a dataset.<br/> This is becoming increasingly important for NCI too. They have to prove that their funding is spent in useful ways. NCI is currently reviewing all data projects, their aim is to have a separate project for each dataset, if possible, and to make sure every single users have to request access to the data project as opposed to have world readable files.</span>
+
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[[AGCD|AGCD&nbsp;- Australian Gridded&nbsp;Climate Data]]</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[http://openradar.io Australian Operational Weather Radar Archive]</span></span>  
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://geonetwork.nci.org.au/geonetwork/srv/eng/catalog.search#/home NCI geonetwork records]&nbsp;- to see the metadata record</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://dapds00.nci.org.au/thredds/catalog.html NCI Thredds catalog]&nbsp;- to access the actual files</span></span>  
  
 +
&nbsp;
  
 +
==== <span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium;">'''ARCCSS and CLEX&nbsp;datasets and software published on Research Data Australia (RDA)'''</span></span> ====
  
 +
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Since the start of ARCCSS we have published our datasets on Research Data Australia (RDA), the Australian Research&nbsp;Data Commons&nbsp;(ARDC) metadata repository. This is a metadata catalogue, so it doesn't provide direct access to the data itself. However, it allows us&nbsp;to extend our data description and to have records for research programs, authors and software all in one place.&nbsp;The first datasets to be published were from the Climate Model Downscaling Data for Impacts Research (CliMDDIR), then the ACCESS CMIP5 simulations.</span></span>
  
==== <span style="font-size:medium;">'''Datasets hosted on raijin and managed by CLEX'''</span> ====
+
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[[ACCESS_CoE_simulations|ACCESS]]&nbsp;- CMIP5 simulations</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[http://climddir.org/ CliMDDIR]&nbsp;- Climate Model Downscaling Data for Impacts Research</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[[ARCCSS_collection_datasets|CLEX&nbsp;and ARCCSS collections]]&nbsp;- CLEX&nbsp;and&nbsp;ARCCSS datasets from&nbsp;the NCI Data Catalogue</span></span>  
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://researchdata.ands.org.au/atmospheric-access13-historical-sub-project/645160 C20C+ ACCESS]&nbsp;-&nbsp;Atmospheric ACCESS1.3 historical all forcing model output for the Climate of the 20th Century Plus (C20C+) Detection and Attribution sub-project</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://researchdata.ands.org.au/marine-heatwaves-detection-code/814983 MarineHeatWaves]&nbsp;- Marine heatwaves detection code</span></span>
  
*'''[[:Category:Clex-managed-data|Replicas of external datasets]]'''&nbsp;
+
&nbsp;
*'''[[:Category:Published_Data|ARCCSS and CLEX published datasets]]'''&nbsp;
 
  
==== <span style="font-size:medium;">'''Other datasets hosted on raijin'''</span> ====
+
==== <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">'''CLEX Code&nbsp;Collection on Zenodo'''</span></span> ====
  
*[[CMIP|CMIP5]]- Coupled Model Intercomparison Project Phase 5 data on raijin
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">In 2020 we started a Zenodo community to collect and publish software. Zenodo allows us to get a DOI&nbsp;for the code&nbsp;records. Publishing the code associated with a paper is now often required when submitting an article to a journal. It also allows&nbsp;us to publish&nbsp;software created by researchers and students, and by&nbsp;the CMS team itself, that could be useful to others. Zenodo is an international repository and as such can&nbsp;reach a wider audience. If you would like to contribute to this collection, just let us know.</span></span>
*[[CMIP|CMIP6]]- Coupled Model Intercomparison Project Phase 6 data on raijin
 
  
*[[AWAP|AWAP - Australian Water Availability Project]]
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">'''&nbsp;'''[https://zenodo.org/communities/arc-coe-clex/?page=1&size=20 CLEX Code Collection Zenodo community]</span></span>
*[http://openradar.io Australian Operational Weather Radar Archive]
 
*[http://geonetwork.nci.org.au/geonetwork/srv/eng/main.home NCI geonetwork records]&nbsp;- to see the meatadata record
 
*[http://dap.nci.org.au/thredds/catalog.html NCI Thredds catalog]&nbsp;- to access the actual files
 
  
 
&nbsp;
 
&nbsp;
  
==== <span style="font-size:medium;">'''ARCCSS and CLEX&nbsp;datasets and software published on Research Data Australia (RDA)'''</span> ====
+
== <span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:large;">'''External data resources'''</span></span> ==
  
The Centre of Excellence for Climate System Science has started publishing its datasets on Research Data Australia (RDA), the Australian National Data Service (ANDS) metadata repository. The first datasets to be published were from the Climate Model Downscaling Data for Impacts Research (CliMDDIR), then the ACCESS CMIP5 simulations.
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">If you have not yet found&nbsp;a dataset that suits your needs on NCI, there are still a lot of external data resources that you can browse online and sometimes easily access data from.</span></span>
  
*[[ACCESS_CoE_simulations|ACCESS]]- CMIP5 simulations
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Rather than going straight to&nbsp;a specific dataset, you should first make sure the data you want to use is fit for purpose.&nbsp;There are sites that offer reviews of climate&nbsp;datasets that can help you do just that. It is always important to know how the data was generated before using it, even if a dataset was recommended by someone else. There might be new, better alternative datasets, or others might have discovered and shared issues with the data. Doing a bit of research before using the data might save you a lot of time and stress.&nbsp;</span></span>
*[http://climddir.org/ CliMDDIR]- Climate Model Downscaling Data for Impacts Research
+
 
*[[ARCCSS_published_datasets|ARCCSS collection]]- ARCCSS datasets on the NCI Data Catalogue
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">These two are a very good place to start.</span></span>
*[https://researchdata.ands.org.au/atmospheric-access13-historical-sub-project/645160 C20C+ ACCESS]-&nbsp;Atmospheric ACCESS1.3 historical all forcing model output for the Climate of the 20th Century Plus (C20C+) Detection and Attribution sub-project
+
 
*[https://researchdata.ands.org.au/marine-heatwaves-detection-code/814983 MarineHeatWaves]- Marine heatwaves detection code
+
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://climatedataguide.ucar.edu NCAR Climate data guide]&nbsp;</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://reanalyses.org Reanalyses.org]&nbsp;</span></span>
 +
 
 +
=== '''<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Australian data portals</span></span>''' ===
 +
 
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://researchdata.edu.au Research Data Australia]&nbsp;- metadata repository for all Australian research datasets&nbsp;</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[http://www.bom.gov.au/climate/data/ BoM Climate data services]</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[http://www.bom.gov.au/cyclone/history/tracks/ BoM Tropical Cyclone data services]</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[http://dl.tpac.org.au TPAC][http://dl.tpac.org.au/thredds/catalog.html &nbsp;&nbsp;Climate and Ocean data portal]</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://portal.aodn.org.au AODN]&nbsp;- Australian Ocean Data Network (includes IMOS observation&nbsp;data)</span></span>
 +
 
 +
=== '''<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Global data portals</span></span>''' ===
 +
 
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://www.ncdc.noaa.gov/data-access/paleoclimatology-data/datasets NOAA paleoclimatology data repository&nbsp;]&nbsp;- I am highlighting the paleo data because we are also publishing our paleoclimate datasets in this repository, but the NOAA data collection also includes much more ocean and atmospheric data, which you can reach starting from the same link.</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://ooi-website.whoi.edu/data-portal/ OOI -&nbsp;Ocean Observatories Initiative]</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://podaac.jpl.nasa.gov/ NASA JPL data portal&nbsp;]&nbsp;- including visualisation tool</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://earthdata.nasa.gov/ NASA earthdata portal]</span></span>
 +
 
 +
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">A relatively recent option is to use remote desktops or other cloud resources with integrated data access:</span></span>
 +
 
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://cds.climate.copernicus.eu/cdsapp#!/toolbox Copernicus Climate Change Service Toolbox]</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://giovanni.gsfc.nasa.gov/giovanni/ GIOVANNI]&nbsp;- GES-DISC Interactive Online Visualization ANd aNalysis Infrastructure</span></span>
 +
 
 +
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Interdisciplinary data repositories that include climate-related datasets:</span></span>
 +
 
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://research.jcu.edu.au/data/default/rdmp/home Research Data JCU (James Cook University)]</span></span>
 +
*<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://aurin.org.au AURIN] - Australian Urban&nbsp;Research Infrastructure Network&nbsp;</span></span>
 +
 
 +
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Google has developed a [https://toolbox.google.com/datasetsearch dataset search toolbox], which is currently only a beta version; an [https://www.blog.google/products/search/making-it-easier-discover-datasets/ overview]&nbsp;is available. All the datasets published via the centre&nbsp;are also discoverable with this tool.</span></span>
  
 
&nbsp;
 
&nbsp;
  
&nbsp;
+
=== <span style="font-family:Arial,Helvetica,sans-serif;"><span style="font-size:medium;">'''Accessing remote data: OPeNDAP'''</span></span> ===
 +
 
 +
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Many of these repositories allow you to access the data remotely without downloading it. Some have developed their own tools, as we saw above for the cloud and remote analysis web tools, but most do so by making their data available through OPeNDAP, a web-based software that allows users to access datasets remotely. Many analysis software&nbsp;packages recognise an OPeNDAP URL as a filename. An OPeNDAP URL usually consists of the remote address of the file followed by optional constraints.</span></span>
 +
 
 +
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Usually any data repository that uses a thredds server, including the NCI data collection, makes the data available via OPeNDAP.</span></span>
  
== <span style="font-family:Calibri, sans-serif">'''External data resources'''</span> ==
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">We have a blog post that demonstrates [https://climate-cms.org/2019/01/18/using-opendap.html how to build and use an OPeNDAP&nbsp;URL]; more in-depth information on OPeNDAP is available from their [https://www.opendap.org/ website], including a list of software&nbsp;that understands this protocol.</span></span>
  
If you haven't found a dataset that suits your needs on NCI,&nbsp;there are still a lot of external data resources you can browse and sometimes easily access data from.
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">Using OPeNDAP is particularly useful when you need only a subset of a dataset, such as a limited region or a time series. It is also a good idea if you are accessing a dataset that gets updated often, so you can always get the latest available version of the data.</span></span>
  
Obviously you can try your luck with Google, but there are some websites which have been created on purpose to help you. So rather than looking straight for the data, you should look for sites that offer reviews of climate&nbsp;datasets. It is always important to know how the data was generated before using it, even if a dataset was recommended by someone else. There might be new, better alternative datasets, or others might have discovered issues with the data. Doing a bit of research before using the data might save you a lot of time and stress.&nbsp;
+
==== <span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">'''Climate related repositories using OPeNDAP'''</span></span> ====
  
These two are a very good place to start.
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[http://dap.nci.org.au/thredds/catalog.html NCI thredds catalogue]</span></span>
  
*[https://climatedataguide.ucar.edu NCAR Climate data guide]&nbsp;
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://rda.ucar.edu/thredds/catalog/catalog.html NCAR thredds catalogue]</span></span>
*[https://reanalyses.org Reanalysis.org]&nbsp;
 
  
There are then of course data portals, for Australian resources:
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[http://dl.tpac.org.au TPAC][http://dl.tpac.org.au/thredds/catalog.html &nbsp;&nbsp;Climate and Ocean data portal]</span></span>
  
*[http://researchdata.ands.org.au Research Data Australia]&nbsp;- metadata repository for all Australian research datasets&nbsp;  
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://erddap-uncabled.oceanobservatories.org/uncabled/erddap/index.html OOI ERDDAP catalogue]&nbsp;- Ocean Observatory Initiative</span></span>
*[http://www.bom.gov.au/climate/data/ BoM Climate data services]
 
*[http://www.bom.gov.au/cyclone/history/tracks/ BoM Tropical Cyclone data services]
 
*[http://dl.tpac.org.au TPAC][http://dl.tpac.org.au/thredds/catalog.html &nbsp;&nbsp;Climate and Ocean data portal]
 
*[https://portal.aodn.org.au AODN]&nbsp;- Australian Ocean Data Network (includes IMOS observation&nbsp;data)
 
  
Global
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://podaac-opendap.jpl.nasa.gov/opendap/hyrax/allData/ NASA JPL PO.DAAC repository]&nbsp;- Physical Oceanography Distributed Active Archive Center</span></span>
  
*[https://www.ncdc.noaa.gov/data-access/paleoclimatology-data/datasets NOAA paleoclimatology data repository&nbsp;]&nbsp;- I'm highlighting the paleo data because we are also publishing our paleoclimate datasets in this repository, but the NOAA data collection also includes much more ocean and atmospheric data, which you can access starting from the same link.
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://search.earthdata.nasa.gov/search NASA Earth Observing System Data and Information System (EOSDIS)&nbsp;OPeNDAP&nbsp;servers]&nbsp;- this also includes a [https://developer.earthdata.nasa.gov/opendap/resources/eosdis-opendap-servers tutorial]&nbsp;on using OPeNDAP</span></span>
  
A relatively recent option are remote desktops or other cloud resources with integrated data access
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[http://thredds.ucar.edu/thredds/catalog.html UCAR thredds catalogue]</span></span>
  
*[https://cds.climate.copernicus.eu/toolbox-editor Copernicus Climate Change Service Toolbox]
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://thredds.daac.ornl.gov/thredds/catalogs/ornldaac/ornldaac.html NASA ORNL DAAC repository] - Oak Ridge National Laboratory Distributed Active Archive Center for biogeochemical dynamics</span></span>
*[[GIOVANNI|GIOVANNI]]&nbsp;- GES-DISC Interactive Online Visualization ANd aNalysis Infrastructure
 
  
Interdisciplinary data repositories that include climate related dataset
+
<span style="font-size:medium;"><span style="font-family:Arial,Helvetica,sans-serif;">[https://disc.gsfc.nasa.gov/information/tools?title=OPeNDAP%20and%20GDS NASA GES DISC repository] - atmospheric composition, water & energy cycles and climate variability&nbsp;</span></span>
  
*[https://tropicaldatahub.org/ Tropical Data Hub]
+
&nbsp;
*[https://aurin.org.au AURIN] - Australian Urban&nbsp;Research Infrastructure Network&nbsp;  
 
  
If you already know a dataset name and just want to find it, Google has a new initiative, a [https://toolbox.google.com/datasetsearch dataset search toolbox], which is currently only a beta version; you can find an [https://www.blog.google/products/search/making-it-easier-discover-datasets/ overview] here. All the datasets published via the center are also discoverable with this tool.
+
[[Category:Data induction]]

Latest revision as of 16:55, 6 November 2022

Finding climate data at NCI

Many climate data resources are available at NCI. Only some of them are listed in the NCI data catalogue, and even then there are a few tips that can help you work out whether the dataset you are looking for is already available.

 

Check the NCI data catalogue

NCI uses a geonetwork catalogue to list its collections; this covers most of the bigger data collections hosted by NCI. To find a dataset you can run a text search or select from a list of available attributes.

Free text search

Geonetwork free text search looks for an exact match in the record title and description. For example, if I am looking for "precipitation" datasets I need to type the entire word; if I type only "precip" it returns only one dataset, which contains this sentence in its description:

"Proportion of days per month with precip > 0.2mm: ..."

Likewise, typing "precipitation" will return 55 datasets, including any record that has the exact word in its title or description.
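The behaviour described above can be mimicked with a toy whole-word search. This is only a sketch of the idea, not Geonetwork's actual implementation: the function name `whole_word_search` and the three records are made up for illustration.

```python
import re
from typing import Dict, List


def whole_word_search(query: str, records: Dict[str, str]) -> List[str]:
    """Return the titles of records whose text contains `query` as a complete word."""
    # \b marks a word boundary, so "precip" does not match inside "precipitation"
    pattern = re.compile(rf"\b{re.escape(query)}\b", re.IGNORECASE)
    return [title for title, text in records.items() if pattern.search(text)]


# Made-up records, for illustration only
records = {
    "Wet days": "Proportion of days per month with precip > 0.2mm",
    "Rainfall grid": "Daily precipitation totals on a regular grid",
    "Surface winds": "10 m wind speed and direction",
}

short_hits = whole_word_search("precip", records)
full_hits = whole_word_search("precipitation", records)
print(short_hits)  # ['Wet days']
print(full_hits)   # ['Rainfall grid']
```

The shortened query only finds the record that literally contains "precip" as a word, which is why it pays to type the full term when searching the catalogue.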

Selection by topic and other attributes

NCI created some categories that divide the datasets by "topic"; they are shown on the catalogue main page. Once you click on one of them, you will see a panel on the left side of the screen showing all the available attributes you can use to select datasets. The same is true if you run a text search: you can then refine your selection using this panel. Probably the most useful attributes are the keywords, which are based on keywords set by the data managers themselves or on Field Of Research (FOR) standards.

The dataset descriptions and attributes are provided by the owners of the data, and sometimes overruled by NCI, so there is a big variance in what is included and how keywords are chosen or interpreted. For example, the FOR code "0401 Atmospheric Sciences" can also appear as "0401", "Atmospheric Sciences" and "ATMOSPERIC SCIENCES", or as one of its sub-categories, "040107 - Meteorology".

Another limitation is that a dataset might be available as part of a data collection but not have its own record. If the collection is well described, then at least you should be able to locate it; however, sometimes the record descriptions are very generic and don't necessarily list all the datasets included.

NCI is working on an improved and more user-friendly interface and is also working with all the data managers to improve the quality of the records. How the data projects will be re-organised is not yet defined, but this page has some information on how this is likely to impact climate datasets. Feel free to send them feedback, or tell us if you prefer and we will pass it on.

 

Check the CMS wiki

The CMS wiki lists all the datasets we download and manage for our researchers. Some of these are also listed in geonetwork, but most are not, because in some cases we have downloaded only a small subset.

 

 

Ask the helpdesk:  cws_help@nci.org.au 

If you still cannot locate the datasets you were looking for, or you find the dataset but the description was not sufficient to work out if it is what you need, then feel free to e-mail us on the helpdesk.

As both geonetwork and the wiki are potentially incomplete, and some of the records might be out of date, it is always a good idea to double check with us. We might also know about other data that is available on the NCI servers but not necessarily listed, or we can enquire with our partners on your behalf to help you locate a specific product.

If what you are looking for is not available yet, we can help you download the data. When we receive a request to download data, we try to get back to you as fast as possible with an answer and a timeframe. We are usually able to download the data for you and put it in a shared environment where others can also access it. If the dataset requires a lot of storage, the download is time consuming or it needs ongoing maintenance, we have to check with the Infrastructure Committee whether it fits the Centre objectives before going ahead.

It is rare that we have to say no to someone, and we do not do it without a fairly strong reason, because we prefer to download and manage the dataset for you. Having the data shared in a central location, where others can also access it, is a better use of disk storage.

This wiki page covers how to request data in more detail.

Accessing the data

Once you know where the dataset is located, you need to join the relevant project. As for any other NCI project, go to https://my.nci.org.au, search for the project and put in a request.

The lead CI for that project will receive an e-mail and either approve your request or contact you for further information. While this might feel frustrating, there are very good reasons why a data manager might want to know how you will use the data. They might want to be sure you are aware of the dataset limitations and are not using it improperly. They might want to know which subset of the data users are actually interested in, and they usually need to justify the time and effort that goes into maintaining a dataset.
This is becoming increasingly important for NCI, too. They have to prove that their funding is spent in useful ways. NCI is currently reviewing all data projects; their aim is to have a separate project for each dataset, if possible, and to make sure every single user has to request access to the data project, as opposed to having world-readable files.

 

Datasets hosted on NCI servers and managed by CLEX


Other datasets hosted by NCI

  • CMIP3 - Coupled Model Intercomparison Project Phase 3 
  • CMIP5 - Coupled Model Intercomparison Project Phase 5 
  • CMIP6 - Coupled Model Intercomparison Project Phase 6 
  • CORDEX - Coordinated Regional Climate Downscaling Experiment
  • ERA5 - ECMWF latest reanalysis

NB: we have a tool, CleF - climate finder, to help you locate the CMIP and CORDEX data on gadi, as well as to request new CMIP and CORDEX data to be downloaded. All the datasets listed above are also managed by NCI.

 

ARCCSS and CLEX datasets and software published on Research Data Australia (RDA)

Since the start of ARCCSS we have published our datasets on Research Data Australia (RDA), the Australian Research Data Commons (ARDC) metadata repository. This is a metadata catalogue, so it doesn't provide direct access to the data itself. However, it allows us to extend our data descriptions and to have records for research programs, authors and software all in one place. The first datasets to be published were from the Climate Model Downscaling Data for Impacts Research (CliMDDIR), then the ACCESS CMIP5 simulations.

  • ACCESS - CMIP5 simulations
  • CliMDDIR - Climate Model Downscaling Data for Impacts Research
  • CLEX and ARCCSS collections - CLEX and ARCCSS datasets from the NCI Data Catalogue
  • C20C+ ACCESS - Atmospheric ACCESS1.3 historical all forcing model output for the Climate of the 20th Century Plus (C20C+) Detection and Attribution sub-project
  • MarineHeatWaves - Marine heatwaves detection code

 

CLEX Code Collection on Zenodo

In 2020 we started a Zenodo community to collect and publish software; Zenodo allows us to get a DOI for the code records. Publishing the code associated with a paper is now often required when submitting an article to a journal. It also allows us to publish software created by researchers and students that could be useful to others, as well as software created by the CMS team itself. Zenodo is an international repository and as such can reach a wider audience. If you would like to contribute to this collection, just let us know.

 CLEX Code Collection Zenodo community

 

External data resources

If you have not yet found a dataset that suits your needs on NCI, there are still a lot of external data resources that you can browse online and sometimes easily access data from.

Rather than going straight to a specific dataset, you should first make sure the data you want to use is fit for purpose. There are sites that offer reviews of climate datasets that can help you do just that. It is always important to know how the data was generated before using it, even if a dataset was recommended by someone else. There might be new, better alternative datasets, or others might have discovered and shared issues with the data. Doing a bit of research before using the data might save you a lot of time and stress.

These two are a very good place to start.

  • NCAR Climate data guide
  • Reanalysis.org

Australian data portals

Global data portals

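A relatively recent option is to use remote desktops or other cloud resources with integrated data access:

  • Copernicus Climate Change Service Toolbox
  • GIOVANNI - GES-DISC Interactive Online Visualization ANd aNalysis Infrastructure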

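Interdisciplinary data repositories that include climate-related datasets:

  • Research Data JCU (James Cook University)
  • AURIN - Australian Urban Research Infrastructure Network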

Google has developed a dataset search toolbox, which is currently only a beta version; an overview is also available. All the datasets published via the centre are also discoverable with this tool.

 

Accessing remote data: OPeNDAP

Many of these repositories allow you to access the data remotely without downloading it. Some have developed their own tools, as we saw above for the cloud and remote analysis web tools, but most do so by making their data available through OPeNDAP, a web-based software that allows users to access datasets remotely. Many analysis software packages recognise an OPeNDAP URL as a filename. An OPeNDAP URL usually consists of the remote address of the file followed by optional constraints.
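As a minimal sketch of how such a URL is put together, the snippet below appends a DAP2-style constraint (a variable name followed by one `[start:stride:stop]` index range per dimension, zero-based and with an inclusive stop) to a file address. The server address, file name and variable name are hypothetical, and the helper functions `hyperslab` and `opendap_url` are introduced here only for illustration.

```python
from typing import Optional, Sequence, Tuple

# (start, stop) or (start, stride, stop); indices are zero-based, stop is inclusive
IndexRange = Tuple[int, ...]


def hyperslab(index_range: IndexRange) -> str:
    """Format the index range of one dimension as [start:stride:stop]."""
    if len(index_range) == 2:
        start, stop = index_range
        stride = 1
    elif len(index_range) == 3:
        start, stride, stop = index_range
    else:
        raise ValueError("expected (start, stop) or (start, stride, stop)")
    if start < 0 or stop < start or stride < 1:
        raise ValueError(f"invalid index range: {index_range}")
    return f"[{start}:{stride}:{stop}]"


def opendap_url(base_url: str, variable: Optional[str] = None,
                ranges: Sequence[IndexRange] = ()) -> str:
    """Return base_url followed by an optional constraint on one variable."""
    if variable is None:
        if ranges:
            raise ValueError("index ranges need a variable name")
        return base_url  # no constraint: the whole remote dataset
    return f"{base_url}?{variable}" + "".join(hyperslab(r) for r in ranges)


# Hypothetical file on a hypothetical thredds server
base = "https://thredds.example.org/thredds/dodsC/demo/tas_monthly.nc"
# First 12 time steps and a latitude/longitude index box of the variable "tas"
url = opendap_url(base, "tas", [(0, 11), (100, 120), (200, 240)])
print(url)
```

The resulting string can then be passed wherever an OPeNDAP-aware program expects a filename. The exact constraint syntax a given server accepts may vary, so check its documentation before relying on this form.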

Usually any data repository that uses a thredds server, including the NCI data collection, makes the data available via OPeNDAP.

We have a blog post that demonstrates how to build and use an OPeNDAP URL; more in-depth information on OPeNDAP is available from their website, including a list of software that understands this protocol.

Using OPeNDAP is particularly useful when you need only a subset of a dataset, such as a limited region or a time series. It is also a good idea if you are accessing a dataset that gets updated often, so you can always get the latest available version of the data.
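To request only a limited region, the coordinate bounds first have to be translated into index ranges along each dimension. The sketch below does this for a made-up regular 2.5 degree grid and a hypothetical variable called `tas`; the helper `index_range` is illustrative and assumes the coordinate values are sorted in ascending order.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence, Tuple


def index_range(coords: Sequence[float], lower: float, upper: float) -> Tuple[int, int]:
    """Return the (first, last) indices of the values inside [lower, upper].

    Assumes `coords` is sorted in ascending order.
    """
    first = bisect_left(coords, lower)
    last = bisect_right(coords, upper) - 1
    if first > last:
        raise ValueError("no coordinate values inside the requested range")
    return first, last


# Made-up 2.5 degree grid, for illustration only
lats = [-90 + 2.5 * i for i in range(73)]   # -90.0 ... 90.0
lons = [2.5 * i for i in range(144)]        # 0.0 ... 357.5

# A box roughly covering Australia
lat_idx = index_range(lats, -45.0, -10.0)
lon_idx = index_range(lons, 110.0, 155.0)

# First 12 time steps of the hypothetical variable "tas" inside the box
constraint = (f"tas[0:1:11][{lat_idx[0]}:1:{lat_idx[1]}]"
              f"[{lon_idx[0]}:1:{lon_idx[1]}]")
print(lat_idx, lon_idx)  # (18, 32) (44, 62)
print(constraint)        # tas[0:1:11][18:1:32][44:1:62]
```

In practice, analysis libraries that understand OPeNDAP (for example xarray or netCDF4 in Python) typically work out these index ranges for you when you open the remote URL and select by coordinate values, so only the requested subset is transferred.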

Climate related repositories using OPeNDAP

NCI thredds catalogue

NCAR thredds catalogue

TPAC  Climate and Ocean data portal

OOI ERDDAP catalogue - Ocean Observatory Initiative

NASA JPL PO.DAAC repository - Physical Oceanography Distributed Active Archive Center

NASA Earth Observing System Data and Information System (EOSDIS) OPeNDAP servers - this also includes a tutorial on using OPeNDAP

UCAR thredds catalogue

NASA ORNL DAAC repository - Oak Ridge National Laboratory Distributed Active Archive Center for biogeochemical dynamics

NASA GES DISC repository - atmospheric composition, water & energy cycles and climate variability