Difference between revisions of "Data terminology"


Revision as of 01:42, 14 July 2021

This page lists key data management concepts and frequently recurring terms and acronyms.

NB: this is a work in progress, so the list is not yet exhaustive.

 

Key concepts   

FAIR

The FAIR Data Principles:

  • Findable: data should be easy to find and identify.
  • Accessible: data should have open access whenever possible.
  • Interoperable: data should be well formatted and use discipline conventions and vocabularies, both for the data itself and for the metadata used to describe it.
  • Reusable: data should be accompanied by enough information on how it was collected or processed to guarantee its quality and hence make it usable by others.

File Management

Methods for storing, organising, naming, discovering and retrieving files in a structured consistent manner. 

Data Storage

The location and/or system you use to store your data during a research project. This could include disks on personal computers, disk or tape on a shared server, external storage devices such as hard drives or SD cards, networked drives managed by your institution, and commercial or research cloud storage.

Data Back Up

The process of saving your data to protect against data loss. This can be an automatic process, where the storage location automatically retains previous versions of your data, or a manual process, where you need to actively save the data in another location.
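
As an illustration of the manual case, the sketch below copies a file to a second location and checks that the copy matches the original by comparing checksums. The function names (`sha256sum`, `backup_file`) and the choice of SHA-256 are assumptions made for this example, not a prescribed procedure.

```python
import hashlib
import shutil
from pathlib import Path


def sha256sum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and verify the copy by checksum."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256sum(source) != sha256sum(target):
        raise IOError(f"Backup of {source} does not match the original")
    return target
```

A copy on the same disk as the original does not protect against a disk failure, so in practice `backup_dir` would point to a different device or server.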

Data Archiving or Preservation

The process of putting your data in long-term storage for a minimum of 5 years following the completion of a project or publication. This includes identifying who can access the data and how it can be accessed. Many institutions have repositories which can be used by staff and students.

Data Sharing

Making your data available for use by other researchers for their own research projects. To allow for reuse, this requires quality metadata that identifies the data source and any changes made. The best way to share data is to publish it: it will then be more discoverable and will be assigned a persistent identifier (such as a DOI), which helps others to cite the data.

Data Provenance

Data provenance describes the journey data goes through. It documents the evolution of a dataset from the original source including all the processes and methodology by which it was produced.

Data Management Plan (DMP)

A tool to help you manage the data for a specific research project. It can take different forms depending on the stage of your project; for example, a DMP submitted with a grant application will be different from the DMP required to publish your data. A DMP evolves with your project and is useful for recording your data provenance.

Metadata

Metadata is information about data. Examples are the metadata files accompanying observations, with details of instrumentation and location, and the attributes of a NetCDF file. A metadata record or repository will contain information on a dataset but not the data itself.

Open Access

A set of principles and a range of practices through which research outputs are distributed online, free of cost or other access barriers.

Other terms

attribution - is the act of recognising the author/s of a piece of work that you used in your research. It is a common requirement of licenses.

citation - is the way you attribute a piece of work. It should contain all the information necessary to locate the original work.

copyright - is a form of intellectual property meant to protect the right of the author of a creative work to control how the work is used. More comprehensive but readable information on copyright is available here.

license - a copyright license is a legal document stating what someone else is allowed or not allowed to do with a research product

 

Acronyms

ARDC (ex ANDS) - Australian Research Data Commons is an NCRIS project that aims to give the Australian research community and industry access to nationally significant, data-intensive digital research infrastructure, platforms, skills and collections of high quality data.

AURIN - Australian Urban Research Infrastructure Network, an NCRIS project that provides e-research infrastructure and expertise to support urban, regional and social science research in academia, government and industry.

CC - Creative Commons, a non-profit organisation that produces licenses to encourage sharing of knowledge, commonly used for data products

CDM - Common Data Model

CF - Climate and Forecast conventions, conventions used to set metadata attributes in NetCDF files
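
To make the CF entry concrete, here is a minimal sketch of what CF-style metadata attributes can look like for one variable. The attribute names (`Conventions`, `standard_name`, `long_name`, `units`) come from the CF conventions; the values, the variable and the `missing_attributes` helper are made up for illustration and are not a substitute for a real CF compliance checker.

```python
# Global (file-level) attributes; the values are illustrative only.
global_attrs = {
    "Conventions": "CF-1.8",
    "title": "Example near-surface air temperature",
}

# Attributes attached to a single variable.
tas_attrs = {
    "standard_name": "air_temperature",
    "long_name": "Near-surface air temperature",
    "units": "K",
}

# Hypothetical minimal checklist chosen for this example.
EXPECTED = ("standard_name", "long_name", "units")


def missing_attributes(attrs, expected=EXPECTED):
    """Return the expected attribute names that are absent from attrs."""
    return [name for name in expected if name not in attrs]
```

Calling `missing_attributes(tas_attrs)` returns an empty list, while a variable described only by `{"units": "K"}` would be reported as missing `standard_name` and `long_name`.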

COSIMA -

DAP - Data Access Protocol is a data transmission protocol designed specifically for science data. The protocol provides data types to accommodate gridded data, relational data, and time series, regardless of the original format. It is recognised by many scientific software packages, so you can pass a DAP URL instead of a filename to open a file or a subset.

DMP - Data Management Plan
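
As a rough sketch of how a subset can be requested through a DAP URL, the function below appends a DAP2 constraint expression of the form `variable[start:stride:stop]`, one bracket group per dimension. The server address, the variable name `tas` and the helper name `dap_subset_url` are hypothetical; only the bracket syntax comes from DAP2, where the stop index is inclusive.

```python
def dap_subset_url(base_url, variable, index_ranges):
    """Build a DAP2 URL selecting an index range along each dimension.

    index_ranges is a list of (start, stop) pairs, one per dimension,
    with stop inclusive as in DAP2; the stride is fixed to 1 here.
    """
    hyperslab = "".join(f"[{start}:1:{stop}]" for start, stop in index_ranges)
    return f"{base_url}?{variable}{hyperslab}"


# Hypothetical dataset URL, used only to show the shape of the result.
BASE = "https://example.org/thredds/dodsC/demo/tas.nc"

# First 12 time steps and a small latitude/longitude window, by index.
url = dap_subset_url(BASE, "tas", [(0, 11), (100, 120), (200, 240)])
```

A DAP-aware library would typically accept the base URL wherever it accepts a local filename; how constraint expressions are handled can vary between clients, so check the documentation of the software you use.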

FAIR - see definition in key concepts

GeoNetwork - GeoNetwork is an open source web interface to serve geospatial data across multiple catalogs. NCI uses a GeoNetwork catalogue to manage the metadata of data collections hosted on their servers.

IMOS -

NCSS - NetCDF Subset Service allows subsetting certain CDM datasets in coordinate space, using a REST API. Gridded data subsets can be returned in CF-compliant netCDF3 or netCDF4. Point data subsets can be returned in CSV, XML, or CF-DSG NetCDF files.
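
Because NCSS is a REST API, a subset request is just a URL with query parameters. The sketch below assembles one for a gridded dataset using the parameter names documented for NCSS (`var`, `north`, `south`, `east`, `west`, `time_start`, `time_end`, `accept`). The endpoint, variable name and helper name are hypothetical, and the accepted `accept` values can differ between THREDDS versions, so treat this as an assumption to check against your server.

```python
from urllib.parse import urlencode


def ncss_grid_url(endpoint, variable, north, south, east, west,
                  time_start, time_end, accept="netcdf4"):
    """Build an NCSS request for a lat/lon box and a time range."""
    params = {
        "var": variable,
        "north": north,
        "south": south,
        "east": east,
        "west": west,
        "time_start": time_start,  # ISO 8601 timestamps
        "time_end": time_end,
        "accept": accept,  # requested output format
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and variable, roughly covering south-east Australia.
request_url = ncss_grid_url(
    "https://example.org/thredds/ncss/demo/tas.nc", "tas",
    north=-30.0, south=-40.0, east=155.0, west=140.0,
    time_start="2000-01-01T00:00:00Z", time_end="2000-12-31T00:00:00Z",
)
```

Unlike a DAP constraint, which selects by array index, the box and time range here are given in coordinate values.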

OGC - the Open Geospatial Consortium is an international consortium whose aim is to create free, publicly available geospatial standards to improve access to geospatial data. Examples are WCS, WMS and WFS described below.

OPeNDAP - Open-source Project for a Network Data Access Protocol is the client/server software associated with DAP. OPeNDAP is a widely used, subsetting data access method extending the HTTP protocol. We have a blog that demonstrates how to build and use an OPeNDAP URL. More in-depth information on OPeNDAP is available from their website, including a list of software that understands this protocol.

RDA - Research Data Australia is the data discovery service of the Australian Research Data Commons (ARDC). Most universities and research centers across Australia are now listing their data collections on RDA. RDA does not hold the data itself; instead, it is often used to list records that exist on other repositories. Universities and data centers often automatically create an RDA record for any newly published dataset. It is a useful tool for data discovery and, more recently, for software discovery. RDA also holds records for research programs, institutions and the researchers themselves. Datasets listed on RDA are automatically added to the Google Dataset Search tool. For all these reasons we try to list our datasets here; ARCCSS and CLEX have their own data source.

RDA - Research Data Alliance is a global community-driven initiative with the goal of building the social and technical infrastructure to enable open sharing and re-use of data.

TDS - THREDDS Data Server is a data and metadata catalogue server based on THREDDS.

TERN -

THREDDS - Thematic Real-Time Environmental Distributed Data Services provides metadata and data access for scientific datasets, using a variety of remote data access protocols, including DAP. NCI uses a THREDDS server to make datasets available remotely. Our CLEX and ARCCSS collections are also available on it. A list of other useful THREDDS data servers is available from our data access page.

WCS - Web Coverage Service is a protocol used to transfer "coverages", i.e. objects covering a geographical area.

WFS - Web Feature Service is a protocol that offers direct fine-grained access to geographic information at the feature and feature property level.

WMS - Web Map Service is a protocol used by map servers to deliver map images.
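
The OGC web services are usually called with key-value pairs in the URL. As an illustrative sketch, the function below builds a WMS 1.3.0 `GetMap` request using the standard parameter names (`SERVICE`, `VERSION`, `REQUEST`, `LAYERS`, `STYLES`, `CRS`, `BBOX`, `WIDTH`, `HEIGHT`, `FORMAT`). The endpoint, layer name and helper name are hypothetical. `CRS:84` is used so that the bounding box is given in longitude, latitude order.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, width, height,
                   crs="CRS:84", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat) when crs is CRS:84.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty string requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical map server and layer; the box roughly covers Australia.
map_url = wms_getmap_url(
    "https://example.org/thredds/wms/demo/tas.nc", "tas",
    bbox=(110.0, -45.0, 155.0, -10.0), width=800, height=600,
)
```

WCS and WFS requests follow the same key-value pattern with their own `SERVICE` and `REQUEST` values, returning data or features rather than a rendered image.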