Difference between revisions of "Finding datasets"
P.petrelli (talk | contribs) m |
Revision as of 22:21, 29 April 2019
Contents
- 1 Finding data at NCI
- 1.1 Step 1: check the NCI data catalogue
- 1.2 Free text search
- 1.3 Step 2: check the CMS wiki
- 1.4 Step 3: ask at cws_help@nci.org.au helpdesk
- 1.5 Step 4: request us to download the data
- 1.6 Datasets hosted on raijin and managed by the ARCCSS
- 1.7 Other datasets hosted on raijin
- 1.8 ARCCSS datasets and software published on Research Data Australia (RDA)
- 2 External data resources
Finding data at NCI
Many climate data resources are available at NCI. Only some of them are listed in the NCI data catalogue, and even for those a few tips can help you work out whether the dataset you are looking for is already available.
Step 1: check the NCI data catalogue
NCI uses a geonetwork catalogue to list its collections; this covers most of the bigger data collections hosted by NCI. To find a dataset you can run a text search or select from a list of available attributes.
Free text search
The Geonetwork free text search looks for an exact, whole-word match in the record title and description. For example, to find "precipitation" datasets you need to type the entire word; typing only "precip" returns a single dataset:
"Proportion of days per month with precip > 0.2mm: ..."
Likewise, typing "precipitation" returns 62 datasets: every record that has the exact word in its title or description.
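The whole-word behaviour described above can be illustrated with a minimal Python sketch. This is not Geonetwork's actual implementation, only an imitation of the matching rule; the function names and the two record titles are made up for the example.

```python
import re


def matches_whole_word(query: str, text: str) -> bool:
    """Return True if `query` appears in `text` as a complete word."""
    # \b marks a word boundary, so "precip" does not match inside "precipitation".
    pattern = r"\b" + re.escape(query) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def search(query: str, titles: list[str]) -> list[str]:
    """Return the titles that contain `query` as a whole word."""
    return [title for title in titles if matches_whole_word(query, title)]


# Hypothetical record titles, used only to illustrate the behaviour.
records = [
    "Proportion of days per month with precip > 0.2mm",
    "Monthly precipitation totals",
]

print(search("precip", records))
print(search("precipitation", records))
```

With these two titles, "precip" finds only the first record and "precipitation" only the second, which is why it is worth trying both the full word and common abbreviations.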
Selection by topic and other attributes
NCI has created categories that divide the datasets by "topic"; they are shown on the catalogue main page. Once you click on one of them, a panel on the left side of the screen shows all the attributes you can use to select datasets. The same panel appears when you run a text search, so you can use it to refine your selection. Probably the most useful attributes are the keywords, which are either set by the data managers themselves or based on Field Of Research (FOR) standards.
Unfortunately the selection panel can be a bit obscure. For example, FOR codes often appear as a sequence of numbers rather than their definition: "0401" rather than "atmospheric science". This is because the dataset descriptions and attributes are provided by the owners of the data, so there is a lot of variation in what is included and in how keywords are chosen or interpreted.
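If you run into bare FOR codes in the panel, a small lookup table can help translate them. The sketch below assumes the 2008 ANZSRC classification and includes only two Earth Sciences codes as examples; the dictionary and function names are illustrative, and the full list is available from the ABS page linked above.

```python
# Partial lookup of 2008 ANZSRC Field Of Research codes (Earth Sciences only).
# Illustrative subset: extend it with the codes you meet in the catalogue.
FOR_CODES = {
    "0401": "Atmospheric Sciences",
    "0405": "Oceanography",
}


def describe_for_code(code: str) -> str:
    """Return the field name for a FOR code, or the code itself if unknown."""
    return FOR_CODES.get(code, code)


print(describe_for_code("0401"))
```

Returning the code unchanged when it is not in the table keeps the helper safe to use on keywords that are not FOR codes at all.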
Another limitation is that sometimes there is a record for a data collection but not for the individual datasets included in it. Some data collections are self-explanatory (CMIP6 contains only CMIP6 data), but atmospheric re-analysis or even the ARCCSS's own data collection are more heterogeneous. If a child record is not present for each dataset in the collection, it can be hard to get an idea of what is actually available.
NCI is working on an improved, more user-friendly interface and is also working with all the data managers to improve the quality of the records. Feel free to send them feedback or, if you prefer, tell us and we will pass it on.
Step 2: check the CMS wiki
The CMS wiki lists all the datasets we download and manage for our researchers. Some of these are also listed in geonetwork, but most are not, for example because we have downloaded only a small subset.
Step 3: ask at cws_help@nci.org.au helpdesk
If you still can’t locate the datasets you were looking for, or you found the dataset but the description wasn’t sufficient to determine whether it covers your needs, feel free to e-mail us at the helpdesk.
As both geonetwork and the wiki are potentially incomplete, and some of the records might be out of date, it is always a good idea to double check with us. We might also know about other data which is on raijin but not necessarily listed, or we can make enquiries on your behalf with our partners to help you locate a specific product.
Step 4: request us to download the data
If you still couldn’t find what you were looking for, we can help you download the data. When we receive a request to download data, we quickly check the storage and time required for the task. Unless these are “enormous” we usually download the data for you and put it in a shared environment where others can also access it. If the dataset requires a lot of storage, a long time to download, or ongoing maintenance, we might need to check with the infrastructure committee before going ahead.
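Before sending a request you can make a rough version of this storage and time check yourself. The sketch below is only back-of-the-envelope arithmetic, not the procedure we actually follow: the function name is hypothetical, and the dataset size and transfer rate are placeholder figures you would replace with your own estimates.

```python
def estimate_download_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Estimate the hours needed to transfer `size_gb` at `rate_mb_per_s`."""
    if rate_mb_per_s <= 0:
        raise ValueError("transfer rate must be positive")
    size_mb = size_gb * 1000  # decimal units: 1 GB = 1000 MB
    return size_mb / rate_mb_per_s / 3600  # 3600 seconds per hour


# Placeholder figures: a 500 GB dataset at a sustained 50 MB/s.
hours = estimate_download_hours(size_gb=500, rate_mb_per_s=50)
print(f"Estimated transfer time: {hours:.1f} hours")
```

Including an estimate like this in your request, along with the expected size on disk, makes it quicker for us to assess.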
It is rare that we have to say no to someone, and we don’t do it without a fairly strong reason, because we prefer to download the data and manage its updates ourselves and to have it shared in a central location where others can access it too. We also want to avoid precious disk space, which should be used for analysis by your whole group, getting clogged up with data.
Datasets hosted on raijin and managed by the ARCCSS
- ERA INTERIM - ECMWF re-analysis on raijin
- ERA5 - ECMWF re-analysis
- MACC - ECMWF
- YOTC - ECMWF re-analysis on raijin
- CABLE datasets collection (in collaboration with the CABLE users group)
- OSTIA-SST
- NOAA_OISST - Optimum Interpolation Sea Surface Temperature from NOAA
- CMIP5_ocean_processing
- NCEP Polar SST
- OFES - OGCM For the Earth Simulator
- MERRA2 - Modern-Era Retrospective Analysis for Research and Applications 2
- JRA55 - Japanese 55-year Reanalysis
- JRA55-do - JRA‐55 based data set for Driving Ocean ‐ sea ice model
- Upper Air Sounding Observations for Australia 2000-2015
- Lightning Stroke Counts on ECMWF Era Interim Grid 20080301-20151031
- NASA-TRMM - Real-Time TRMM Multi-Satellite Precipitation Analysis
- C20C+ - International CLIVAR C20C+ Detection and Attribution project
- CESM1-LME - CESM1 Last Millennium Ensemble
- CESM1-CAM5-BGC-LE - CESM1-CAM5 BioGeoChemistry 20C + RCP8.5 Large Ensemble model output
- sealevel_GLO_PHY_L4_REP_observations_008_047 - Global Ocean - Multimission altimeter satellite gridded sea surface heights and derived variables (previously known as AVISO)
- CMORPH - NOAA CPC MORPHing technique: high resolution precipitation (60S-60N) v1.0
- GSMaP - Global Satellite Mapping of Precipitation
- GHCN - Global Historical Climatology Network
Other datasets hosted on raijin
- CMIP5 - Coupled Model Intercomparison Project Phase 5 data on raijin
- CMIP6 - Coupled Model Intercomparison Project Phase 6 data on raijin
- AWAP - Australian Water Availability Project
- Australian Operational Weather Radar Archive
- NCI geonetwork records
- NCI Thredds catalog
ARCCSS datasets and software published on Research Data Australia (RDA)
The Centre of Excellence for Climate System Science has started publishing its datasets on Research Data Australia (RDA), the Australian National Data Service (ANDS) metadata repository. The first datasets to be published were from the Climate Model Downscaling Data for Impacts Research (CliMDDIR), then the ACCESS CMIP5 simulations.
- ACCESS - CMIP5 simulations
- CliMDDIR - Climate Model Downscaling Data for Impacts Research
- ARCCSS collection - ARCCSS datasets on the NCI Data Catalogue
- C20C+ ACCESS - Atmospheric ACCESS1.3 historical all forcing model output for the Climate of the 20th Century Plus (C20C+) Detection and Attribution sub-project
- MarineHeatWaves - Marine heatwaves detection code
External data resources
- GIOVANNI - GES-DISC Interactive Online Visualization ANd aNalysis Infrastructure
- TPAC Climate and Ocean data portal
- The Australian National Data Service (ANDS)
- Research Data Australia
- CF metadata conventions website
- BoM Climate data services
- BoM Tropical Cyclone data services
- Tropical Data Hub