The pathway of the data

The pathway of the data, from the field to the user

Like any information system (IS), SISelune is made up of a set of computer tools (software, website, storage space, ...) known as its technical base. The field data passes through these tools, undergoing numerous verification, reorganization, cleaning and consolidation operations before finally reaching its user.

This article describes the different steps followed by the data from the field to its end user.

The whole pathway in a single diagram

Different steps

1. Acquire data in the field

Depending on the context (social sciences, biology, geology, archaeology), various means and methods are used to acquire data in the field. All of them rely on fieldwork carried out by the scientific teams working on the program. Some are manual surveys (scheduled campaigns), others are automatic surveys (data recorded by equipment deployed in the field):

  • regarding territorial dynamics (dynamique du territoire), the team of geographers and humanities and social sciences experts relies on field surveys, written documents (minutes of meetings, press articles, ...), audio and video recordings (speech analysis), and photographs for the landscape observatory.
  • regarding aquatic and terrestrial biodiversity (biodiversité aquatique et terrestre), the scientific teams rely on scheduled fishing campaigns (electrofishing, traps, trapping, ...) and field surveys (vegetation, bio-indicator species), but also on acoustic recordings (DIDSON camera) and vegetation layer data (LIDAR acquisitions).
  • regarding river dynamics and water quality (dynamique fluviale et qualité de l'eau), the geoscience teams rely on physico-chemical data recorded by field facilities (hydrological stations), deployed either over the long term or as needed (measurement campaigns), but also on river bed morphology data (sediment) from LIDAR acquisitions.

All these field acquisitions are stored by the scientific teams as files (text, images, GPS data, point clouds, ...). These files are the raw material for the next stage of data work.

2. Prepare data

Complying with the FAIR principles, in particular Interoperability and Reusability, requires the acquired data to be prepared before it is loaded into SISélune. Field acquisition can indeed suffer from imperfections (input errors, incorrect GPS coordinates, non-standard timestamps, ...), storage problems (data volume, especially for LIDAR or the acoustic camera), the need to complete the data (adding contextual information) or, conversely, to delete unnecessary data (technical or personal data).

This is why the data must be verified, reorganized, cleaned and consolidated with any data already recorded before it is loaded into the IS. This is one of the roles of the IS administrator who, in close collaboration with the referent scientist for each dataset, transforms the initial data and prepares it for import into the IS.

These operations on the raw data are performed with Python scripts that are run step by step, so that each sub-step can be followed.
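As an illustration, the sketch below shows what such a step-by-step preparation script could look like. It is not the SISelune code: the column names (`timestamp`, `lat`, `lon`, `observer_name`), the accepted timestamp formats and the cleaning rules are hypothetical, and timestamps without a time zone are assumed to be UTC. Each sub-step is a separate function, and the row count is printed after each one so the administrator can see what every sub-step removed.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical rules, for illustration only: adapt them to each dataset.
PERSONAL_FIELDS = {"observer_name", "observer_phone"}
TIMESTAMP_FORMATS = ("%d/%m/%Y %H:%M", "%Y-%m-%dT%H:%M:%S")


def drop_personal_fields(rows):
    """Sub-step 1: remove columns holding personal data."""
    return [{k: v for k, v in r.items() if k not in PERSONAL_FIELDS} for r in rows]


def normalize_timestamps(rows):
    """Sub-step 2: rewrite known timestamp formats as ISO 8601, drop the rest."""
    out = []
    for r in rows:
        raw = (r.get("timestamp") or "").strip()
        for fmt in TIMESTAMP_FORMATS:
            try:
                ts = datetime.strptime(raw, fmt)
            except ValueError:
                continue
            # Assumption: field timestamps without a time zone are UTC.
            out.append({**r, "timestamp": ts.replace(tzinfo=timezone.utc).isoformat()})
            break
    return out


def check_coordinates(rows):
    """Sub-step 3: keep rows whose coordinates parse and lie within WGS84 bounds."""
    out = []
    for r in rows:
        try:
            lat, lon = float(r["lat"]), float(r["lon"])
        except (KeyError, TypeError, ValueError):
            continue
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            out.append({**r, "lat": lat, "lon": lon})
    return out


STEPS = [drop_personal_fields, normalize_timestamps, check_coordinates]


def prepare(raw_csv):
    """Run the sub-steps in order, printing row counts so each one can be followed."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    for step in STEPS:
        before = len(rows)
        rows = step(rows)
        print(f"{step.__name__}: {before} -> {len(rows)} rows")
    return rows


# Hypothetical raw export: one valid row, one bad latitude, one bad timestamp.
SAMPLE = """timestamp,lat,lon,observer_name
12/05/2021 08:30,48.6,-1.2,X
2021-05-12T09:00:00,148.6,-1.2,X
not a date,48.6,-1.2,X
"""

prepared = prepare(SAMPLE)
```

Keeping each sub-step as its own function makes it easy to rerun, reorder or inspect one sub-step at a time, which matches the step-by-step way the scripts are described as being run.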

3. Import data

Once prepared, the data is loaded into the relevant tables of the database. This first step makes it possible to check that the stored information remains consistent with the initial data. A second step consists of creating a specific object, a view, which defines how the disseminated data is formatted. This view is the one used for dissemination.

Like the initial data preparation, these two steps are partially automated within the same scripts.
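A minimal sketch of these two steps, using SQLite from the Python standard library as a stand-in for the IS database (the article does not name the database engine). The table, column and view names are hypothetical. The records are imported in a single transaction; a consistency check compares the stored row count and station coordinates with the source and rolls the whole import back on any mismatch; the view then fixes the format of the disseminated data.

```python
import sqlite3

# Hypothetical schema: SQLite stands in for the IS database here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS station (
    station_id TEXT PRIMARY KEY,
    lat REAL NOT NULL,
    lon REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS measurement (
    station_id TEXT NOT NULL REFERENCES station (station_id),
    measured_at TEXT NOT NULL,
    temperature_c REAL
);
"""

# The view fixes the format of the disseminated data (one flat row per measurement).
VIEW = """
CREATE VIEW IF NOT EXISTS v_measurement_public AS
SELECT s.station_id, s.lat, s.lon, m.measured_at, m.temperature_c
FROM measurement AS m
JOIN station AS s ON s.station_id = m.station_id
"""


def import_batch(conn, records):
    """Step 1: load records into the tables and check them; step 2: create the view."""
    conn.executescript(SCHEMA)
    with conn:  # one transaction: any exception below rolls the whole batch back
        before = conn.execute("SELECT COUNT(*) FROM measurement").fetchone()[0]
        conn.executemany(
            "INSERT OR IGNORE INTO station (station_id, lat, lon) "
            "VALUES (:station_id, :lat, :lon)",
            records,
        )
        conn.executemany(
            "INSERT INTO measurement (station_id, measured_at, temperature_c) "
            "VALUES (:station_id, :measured_at, :temperature_c)",
            records,
        )
        # Consistency checks: every record stored, station coordinates unchanged.
        after = conn.execute("SELECT COUNT(*) FROM measurement").fetchone()[0]
        if after - before != len(records):
            raise ValueError("row count differs from the source data")
        for r in records:
            stored = conn.execute(
                "SELECT lat, lon FROM station WHERE station_id = ?", (r["station_id"],)
            ).fetchone()
            if stored != (r["lat"], r["lon"]):
                raise ValueError(f"station {r['station_id']}: coordinates differ from source")
    conn.execute(VIEW)
    return after - before


conn = sqlite3.connect(":memory:")
records = [
    {"station_id": "S1", "lat": 48.6, "lon": -1.2,
     "measured_at": "2021-05-12T08:30:00+00:00", "temperature_c": 12.4},
    {"station_id": "S1", "lat": 48.6, "lon": -1.2,
     "measured_at": "2021-05-12T09:30:00+00:00", "temperature_c": 12.9},
]
print(import_batch(conn, records))  # 2
```

Running the check inside the transaction means an inconsistent batch never becomes visible in the tables, so the view only ever exposes data that passed the check.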

4. Disseminate data

Once the view has been created in the database, the next steps are, for now, carried out manually by the IS administrator.

This involves creating a geographic data layer in GeoServer, based on the view mentioned above, so that the data is disseminated through various standardized channels (OGC standards):

  • consulting the data through standard GIS web services: WMS, WFS, ...
  • downloading the data as a text file (CSV) or a GIS-specific file (SHP)
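For illustration, the requests behind these channels can be built as plain URLs. The GeoServer address, the workspace and layer name (`selune:v_measurement_public`) and the bounding box are hypothetical. `csv` and `SHAPE-ZIP` are WFS output formats offered by GeoServer, and the other parameters are standard OGC WMS 1.1.1 and WFS 2.0.0 key-value parameters.

```python
from urllib.parse import urlencode

# Hypothetical GeoServer instance and layer (workspace:layer), for illustration only.
GEOSERVER = "https://geoserver.example.org/geoserver"
LAYER = "selune:v_measurement_public"


def wfs_download_url(layer, output_format="csv"):
    """WFS GetFeature request; GeoServer accepts e.g. 'csv' or 'SHAPE-ZIP'."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": layer,
        "outputFormat": output_format,
    }
    return f"{GEOSERVER}/ows?{urlencode(params)}"


def wms_map_url(layer, bbox, width=800, height=600):
    """WMS 1.1.1 GetMap request; bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326."""
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": layer,
        "styles": "",
        "srs": "EPSG:4326",
        "bbox": ",".join(str(v) for v in bbox),
        "width": width,
        "height": height,
        "format": "image/png",
    }
    return f"{GEOSERVER}/wms?{urlencode(params)}"


print(wfs_download_url(LAYER, "SHAPE-ZIP"))
print(wms_map_url(LAYER, (-1.5, 48.4, -1.0, 48.8)))
```

The same layer therefore serves both uses: WMS returns a rendered image for consultation, while WFS returns the features themselves in the requested download format.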

From this step on, the data can also be viewed and used directly through GeoServer, in GIS software, or in scripts (Python, R, ...).
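A hedged example of such scripted use in Python: the CSV returned by a WFS `GetFeature` request is parsed with the standard `csv` module and aggregated per station. The field names (`station_id`, `temperature_c`) are hypothetical, and `fetch_layer_csv` expects a URL that already requests `outputFormat=csv`. The example at the bottom runs offline on a made-up response body.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from urllib.request import urlopen


def parse_wfs_csv(text):
    """Turn a CSV GetFeature response into a list of dicts, one per feature."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_layer_csv(url, timeout=30):
    """Download and parse a layer; `url` must request outputFormat=csv."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_wfs_csv(resp.read().decode("utf-8"))


def mean_by_station(rows, field="temperature_c"):
    """Average a numeric field per station, skipping empty values."""
    values = defaultdict(list)
    for r in rows:
        if r.get(field):
            values[r["station_id"]].append(float(r[field]))
    return {station: mean(v) for station, v in values.items()}


# Offline example with a hypothetical response body.
sample = (
    "station_id,measured_at,temperature_c\n"
    "S1,2021-05-12T08:30Z,12.0\n"
    "S1,2021-05-12T09:30Z,13.0\n"
    "S2,2021-05-12T08:30Z,\n"
)
print(mean_by_station(parse_wfs_csv(sample)))  # {'S1': 12.5}
```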

An additional, optional step makes this visualization easier, especially for the general public: updating the cartographic portal. The portal does not present all the IS data, but puts the data it shows in context with other data from outside the Selune program (land use, agriculture, historical maps, ...).

5. Publish metadata

The final step in processing the field data is creating or updating the metadata and publishing it. The metadata describes the disseminated data by answering the following questions: what is the disseminated data? Where was it acquired? How? When? And why?
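The five questions can be read as the minimum content of a metadata sheet. The sketch below models them as a hypothetical Python dataclass and lists the questions left unanswered before publication. In standard metadata profiles such as ISO 19115 they loosely correspond to the abstract, geographic extent, lineage, temporal extent and purpose elements; the field names and the completeness rule here are assumptions, not the catalogue's actual schema.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


# Hypothetical minimal model of a metadata sheet: one field per question.
@dataclass
class MetadataSheet:
    what: str   # which data is disseminated (title, abstract)
    where: Optional[Tuple[float, float, float, float]]  # bbox: min_lon, min_lat, max_lon, max_lat
    how: str    # acquisition method and processing (lineage)
    when: str   # temporal extent, e.g. an ISO 8601 interval
    why: str    # purpose of the acquisition

    def missing(self):
        """Return the questions still unanswered (empty or None fields)."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


sheet = MetadataSheet(
    what="Water temperature at hydrological stations",
    where=(-1.5, 48.4, -1.0, 48.8),
    how="",
    when="2021-05-01/2021-06-30",
    why="Monitoring of river dynamics and water quality",
)
print(sheet.missing())  # ['how']
```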

The metadata sheet then enriches the metadata catalogue, which makes it possible to search the IS data precisely and, above all, supports its dissemination. The catalogue deployed as part of the IS is configured so that its sheets can be harvested by other catalogues (regional, national, thematic, ...), which makes the IS data available through other information systems.
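The article does not say which protocol the harvesting relies on. Assuming the catalogue exposes an OGC CSW 2.0.2 endpoint (a common choice for geographic catalogues; OAI-PMH is another), a harvesting catalogue pages through the records with `GetRecords` requests such as the ones built below. The endpoint URL is hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical CSW endpoint of the IS metadata catalogue.
CSW_ENDPOINT = "https://catalogue.example.org/csw"


def getrecords_url(start_position, max_records):
    """CSW 2.0.2 GetRecords request (KVP encoding) for one page of records."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "full",
        "startPosition": start_position,
        "maxRecords": max_records,
    }
    return f"{CSW_ENDPOINT}?{urlencode(params)}"


def harvest_page_urls(total_records, page_size=50):
    """One GetRecords URL per page; CSW record positions start at 1."""
    return [getrecords_url(start, page_size)
            for start in range(1, total_records + 1, page_size)]


for url in harvest_page_urls(120):
    print(url)
```

In practice a harvester does not know the total in advance: it reads it from the `numberOfRecordsMatched` attribute of the first response and keeps requesting pages until every record has been retrieved.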

6. Use data

Once the field data has been disseminated and its metadata published through the IS, the data can be used on the cartographic portal (visualization, download), in GIS software or in dedicated scripts (calculations, map making, ...), and its metadata can be searched for more information (what? where? when? how?).

See also