What is metadata?

Here you'll find more information on metadata, that is, on how to answer the questions what, how, by whom, where and when about a dataset.

Metadata, or "data from/about data", refers to data (information) used to define or describe other data (information), whatever its medium (paper or electronic). A simple way of defining metadata is to answer the following questions about the data concerned:

  • What? What does this data represent and how is it described (is it a string or a number and, if so, what are its limits, etc.)?
  • Who? Who produced this data, who owns it, and who can be contacted to find out more?
  • When? How old is this data, and what period does it cover? When was it last updated? This ties in with the notion of status (finalized or not, deprecated, etc.).
  • Where? Where was the data produced?
  • How? How was the data produced (what protocol and equipment were used, what processing did it undergo before being disseminated, etc.)?

With SISelune, metadata is created in the form of a file (one file for one dataset, i.e. one piece of data), so each file must be able to provide answers to all these questions.
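As a rough illustration of "one file answers all five questions", here is a minimal Python sketch. The `MetadataRecord` class and its field names are hypothetical and are not the SISelune file format; the point is only that a missing answer can be detected mechanically.

```python
from dataclasses import dataclass, fields


@dataclass
class MetadataRecord:
    """Hypothetical minimal record: one free-text field per question."""
    what: str = ""
    who: str = ""
    when: str = ""
    where: str = ""
    how: str = ""


def unanswered_questions(record: MetadataRecord) -> list[str]:
    """Return the names of the questions the record leaves blank."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Illustrative values only; "where" is deliberately left empty.
record = MetadataRecord(
    what="Chemical analyses of water samples",
    who="UMR SAS",
    when="2014-2027",
    where="",
    how="Manual and automatic sampling",
)
print(unanswered_questions(record))
```

A record would be considered complete, in this simplified view, when the returned list is empty.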

Why is metadata so important?

Metadata makes it possible, and easier, to consult and exchange data, and so helps ensure that the data can be used over the long term. When working with data (geographic or not), it is strongly recommended to systematically:

  • Associate metadata with any data you create, to describe it and facilitate its reuse.
  • Consult the metadata of existing data before using it.

There are established standards for geographic metadata, such as ISO 19115. Organizing data according to these standards makes it easier to manage it efficiently, ensure its quality and share it. Most GIS software builds on existing standards for metadata management, and it is possible to enter metadata in file properties according to a standard.
Relying on these standards also enables harvesting mechanisms to be triggered automatically (via the geocatalog hosting the metadata records). Harvesting is the automatic dissemination of metadata records between interconnected catalogs. For example, the Brittany region's GéoBretagne metadata geocatalog harvests GéoSAS geocatalog records, so that a user of the first catalog will find the metadata records, and therefore the data, of the second.
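The idea of harvesting can be sketched in a few lines of Python. This is a simplified model, not how GéoBretagne or GéoSAS actually exchange records: real catalogs use a dedicated exchange protocol, whereas here each catalog is just a dictionary keyed by record identifier. The `harvest` function, the identifiers and the titles are all made up for illustration.

```python
def harvest(target: dict[str, dict], source: dict[str, dict], source_name: str) -> int:
    """Copy records from `source` into `target`, keyed by identifier.

    Records whose identifier already exists in the target are left untouched.
    Returns the number of records added.
    """
    added = 0
    for identifier, record in source.items():
        if identifier not in target:
            # Keep track of where the record came from.
            target[identifier] = {**record, "harvested_from": source_name}
            added += 1
    return added


# Hypothetical catalog contents.
source_catalog = {"src-0001": {"title": "Water chemistry analyses"}}
regional_catalog = {"reg-0001": {"title": "Some regional dataset"}}

added = harvest(regional_catalog, source_catalog, "source catalog")
print(added, sorted(regional_catalog))
```

After the call, a user searching the regional catalog also finds the record that originated in the source catalog; running the harvest again adds nothing, because the identifier is already present.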

Metadata by example

Let's take the example of the dataset: "Analyses chimiques de l'eau (2014-2027) - Observatoire Sélune" with the associated metadata sheet: https://geosas.fr/geonetwork/srv/fre/catalog.search#/metadata/35dfec55-ce37-46d3-9a75-e33330170ed1
How does this metadata sheet meet the information needs for these water chemistry analyses?

What? What is this data?

As described in the summary of the metadata sheet, these are chemical analyses carried out on manual or automatic samples of dissolved-phase elements (dissolved phosphorus, major anions, dissolved silica, ammonium) and particulate-phase elements (suspended solids, their carbon and nitrogen content, total phosphorus). The metadata sheet also specifies that the data are described in French.

The keywords associated with the metadata record further characterize this data: "Hydrologie", "Installations de suivi environnemental", "Analyses de l'eau", etc. Most of the keywords are not typed in freely by the creator of the metadata record but are selected from thesauri (thematic lists), so that records can be searched and compared on the basis of shared keywords. For example, the geocatalog hosting the metadata records of the Sélune scientific program also hosts other metadata records, so searching this catalog for all the records tagged "Installations de suivi environnemental" finds other datasets characterized by the same keyword. This matters even more in regional or national catalogs, where the records of the scientific program are listed, and can be found, alongside records produced in other contexts (other scientific programs, monitoring observatories, etc.).
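To make the benefit of controlled keywords concrete, here is a small Python sketch of a keyword search. It assumes each record is a plain dictionary with a `title` and a list of `keywords`; this layout and the second record are invented for the example and do not reflect the geocatalog's internal model.

```python
def find_by_keyword(records: list[dict], keyword: str) -> list[str]:
    """Return the titles of all records tagged with `keyword` (case-insensitive)."""
    wanted = keyword.casefold()
    return [
        record["title"]
        for record in records
        if any(k.casefold() == wanted for k in record["keywords"])
    ]


records = [
    {
        "title": "Analyses chimiques de l'eau (2014-2027) - Observatoire Sélune",
        "keywords": ["Hydrologie", "Installations de suivi environnemental", "Analyses de l'eau"],
    },
    {
        # Hypothetical second record sharing one thesaurus keyword.
        "title": "Hypothetical monitoring dataset",
        "keywords": ["Installations de suivi environnemental"],
    },
]

print(find_by_keyword(records, "Installations de suivi environnemental"))
print(find_by_keyword(records, "Hydrologie"))
```

Because both records draw the keyword from the same list, an exact match is enough to bring them together; free-text keywords with spelling variants would not match this reliably.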

Another important piece of information for answering this question is genealogy (usually called lineage in English-language metadata standards). This information provides details about the data and its production. In the case of chemical analysis of water, the genealogy states that the data is derived from water sampling in the field, whether automatic or not (and also specifies the circumstances of sampling), and describes the equipment used.

Who? Who are the stakeholders (producers, owners, contacts) of this data?

The metadata sheet provides a wealth of information:

  • Two points of contact: UMR SAS and AESN.
  • The supplier: GeoSAS (in this case, both producer and supplier).

When? When was this data acquired, created, distributed, and what period does it cover?

The metadata sheet indicates that this dataset has been finalized. The title already indicates the time period covered (2014-2027), and this information is also given in detail as the period covered (from Mon Oct 27 2014 00:00:00 GMT+0100 to Fri Dec 31 2027 00:00:00 GMT+0100).

The publication date of the metadata sheet is also available (14/04/2021).
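The period covered can also be handled programmatically. The sketch below parses the two dates in the form shown on the metadata sheet and checks whether a given moment falls within the period. The helper names are hypothetical, and the parsing assumes English day and month abbreviations (Python's default locale).

```python
from datetime import datetime, timezone

# Date layout as displayed on the metadata sheet, e.g.
# "Mon Oct 27 2014 00:00:00 GMT+0100". Assumes English abbreviations.
SHEET_FORMAT = "%a %b %d %Y %H:%M:%S GMT%z"


def parse_sheet_date(text: str) -> datetime:
    """Parse one date of the period covered into a timezone-aware datetime."""
    return datetime.strptime(text, SHEET_FORMAT)


def covers(start: datetime, end: datetime, moment: datetime) -> bool:
    """Return True if `moment` lies within the period, bounds included."""
    return start <= moment <= end


start = parse_sheet_date("Mon Oct 27 2014 00:00:00 GMT+0100")
end = parse_sheet_date("Fri Dec 31 2027 00:00:00 GMT+0100")

# The sheet's publication date (14/04/2021), taken here at midnight UTC.
published = datetime(2021, 4, 14, tzinfo=timezone.utc)
print(covers(start, end, published))
```

Keeping the timezone offset in the parsed values avoids off-by-one-day errors when comparing against dates expressed in another timezone.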

Where? Which geographical area is concerned by this data?

The geographical area is described by 2 elements:

  • A spatial extent. This very practical cartographic view of the dataset is based on the extent of the GPS coordinates of the geographical objects identified. It is calculated automatically from the data, so you can see at a glance the area the data covers.
  • A coordinate system and a scale. Both are essential for using the spatial representation recorded in the dataset.
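A spatial extent of this kind is typically a bounding box: the smallest rectangle containing all the coordinates. Here is a minimal Python sketch of that calculation, assuming points given as (longitude, latitude) pairs in decimal degrees. The coordinates are invented for the example and are not actual sampling locations, and this is not necessarily how the geocatalog computes its extent.

```python
def bounding_box(points: list[tuple[float, float]]) -> tuple[float, float, float, float]:
    """Return (west, south, east, north) for a list of (longitude, latitude) points."""
    if not points:
        raise ValueError("at least one point is required")
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return (min(lons), min(lats), max(lons), max(lats))


# Hypothetical point locations (longitude, latitude).
points = [(-1.25, 48.60), (-1.10, 48.55), (-1.18, 48.65)]
print(bounding_box(points))
```

The four numbers only make sense together with the coordinate system they are expressed in, which is why the metadata records both.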

How? How was this data acquired? Does it undergo any specific processing or modification?

This notion of "How" can be presented in several ways. It may be covered briefly in the summary and in certain keywords (as here with "Installations de suivi environnemental", which indicates that equipment is used to monitor this data).
But it is mainly the genealogy that goes into more detail on:

  • The data acquisition method (in this case, hybrid, sometimes manual, sometimes automatic).
  • The type of equipment used in the case of automatic acquisition.
  • The protocol used to bring samples back and preserve them before analysis.