
Data Saving

make run-save-data

Summary

This script loads data from different providers, aggregates it into a standardized format and saves it in .txt files.

Configuration

The configuration file for this script is config/save_data.toml (based on config/default_save_data.toml). All the parameters and their functionality are listed below:

Input/Output

SAVING_DIR

Directory in which to save the dataframes.

default: "bgc_data"

Expected type: str

Data Selection

PROVIDERS

List of providers to use data from.

default: ["GLODAP_2022", "CMEMS", "ARGO", "NMDC", "CLIVAR", "IMR", "ICES"]

Expected type: list[str]

VARIABLES

Variables to include in the output file. The names of the variables are the ones defined in config/variables.toml, in the NAME field.

default: ["PROVIDER", "EXPOCODE", "YEAR", "MONTH", "DAY", "HOUR", "LONGITUDE","LATITUDE", "DEPH", "TEMP", "PSAL", "DOXY", "PHOS", "NTRA", "SLCA", "CPHL"]

Expected type: list[str]

DATE_MIN

Beginning of the data to load (included).

default: "20070101"

Expected type: str (respecting the YYYYMMDD format)

DATE_MAX

End of the data to load (included).

default: "20071231"

Expected type: str (respecting the YYYYMMDD format)
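
As an illustration of the YYYYMMDD format and the inclusive bounds, here is a minimal Python sketch. The helper names `parse_yyyymmdd` and `in_date_range` are hypothetical and are not taken from the script itself; the script's actual date handling may differ.

```python
from datetime import date, datetime


def parse_yyyymmdd(value: str) -> date:
    """Parse a strict YYYYMMDD string (hypothetical helper)."""
    # strptime alone tolerates single-digit fields, so check the shape first.
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"expected YYYYMMDD, got {value!r}")
    return datetime.strptime(value, "%Y%m%d").date()


def in_date_range(day: date, date_min: str, date_max: str) -> bool:
    """Return True if day lies between date_min and date_max, both included."""
    return parse_yyyymmdd(date_min) <= day <= parse_yyyymmdd(date_max)
```

With the default values, both `date(2007, 1, 1)` and `date(2007, 12, 31)` fall inside the range, since both boundaries are included.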

INTERVAL

Horizontal resolution of the plot. If set to 'day': will group data points by day. If set to 'week': will group data points by their week number. If set to 'month': will group data points by month. If set to 'year': will group data points by year. If set to 'custom': will group data points based on a custom interval.

default: "month"

Expected type: str

CUSTOM_INTERVAL

Length of the custom interval (in days). Only used when INTERVAL is set to 'custom'.

default: 8

Expected type: int
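
The grouping options of INTERVAL and CUSTOM_INTERVAL can be sketched as a function that maps a date to a group key. This is only an illustration: the function name `interval_key` is hypothetical, the use of ISO week numbers for 'week' is an assumption, and so is the choice of counting custom intervals from a `start` date (for example DATE_MIN). The script may define these details differently.

```python
from datetime import date


def interval_key(day: date, interval: str, start: date, custom_interval: int = 8) -> tuple:
    """Return a hashable group key for `day` (hypothetical helper)."""
    if interval == "day":
        return (day.year, day.month, day.day)
    if interval == "week":
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return (iso[0], iso[1])  # assumption: ISO week numbering
    if interval == "month":
        return (day.year, day.month)
    if interval == "year":
        return (day.year,)
    if interval == "custom":
        if custom_interval <= 0:
            raise ValueError("custom_interval must be a positive number of days")
        # Assumption: bins of `custom_interval` days are counted from `start`.
        return ((day - start).days // custom_interval,)
    raise ValueError(f"unknown interval: {interval!r}")
```

Under these assumptions, with a custom interval of 8 days starting on 2007-01-01, the dates 2007-01-01 to 2007-01-08 share one group and 2007-01-09 opens the next.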

LATITUDE_MIN

Minimum latitude boundary for the loaded data (included).

default: 50

Expected type: int or float

LATITUDE_MAX

Maximum latitude boundary for the loaded data (included).

default: 90

Expected type: int or float

LONGITUDE_MIN

Minimum longitude boundary for the loaded data (included).

default: -180

Expected type: int or float

LONGITUDE_MAX

Maximum longitude boundary for the loaded data (included).

default: 180

Expected type: int or float

DEPTH_MIN

Minimum depth boundary for the loaded data (included).

default: nan

Expected type: int or float

DEPTH_MAX

Maximum depth boundary for the loaded data (included).

default: 0

Expected type: int or float
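
The latitude, longitude and depth boundaries are all inclusive. A minimal sketch of such a check is given below. It assumes that a nan boundary (as in the DEPTH_MIN default) means "no limit on that side"; this interpretation is not stated in this page, and the helper name `within` is hypothetical.

```python
import math


def within(value: float, lower: float, upper: float) -> bool:
    """Inclusive boundary check (hypothetical helper).

    Assumption: a nan boundary disables the check on that side.
    """
    if not math.isnan(lower) and value < lower:
        return False
    if not math.isnan(upper) and value > upper:
        return False
    return True
```

For example, `within(50, 50, 90)` is true because the lower boundary is included, and under the stated assumption `within(-5000.0, float("nan"), 0)` is true because only the upper depth boundary applies.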

EXPOCODES_TO_LOAD

Specific expocodes to load, excluding all others. If empty, no filtering on expocode is applied.

default: []

Expected type: list[str]

PRIORITY

Providers priority list to use when removing duplicates.

default: ["GLODAP_2022", "CMEMS", "ARGO", "NMDC", "CLIVAR", "IMR", "ICES"]

Expected type: list[str]
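
To illustrate how a priority list can drive duplicate removal, here is a minimal sketch. It assumes that providers earlier in PRIORITY win, that rows are dictionaries with a "PROVIDER" entry, and that duplicates are rows sharing the same values for a chosen set of fields. The function name `drop_duplicates_by_priority` and the `key_fields` parameter are hypothetical; this page does not describe how the script actually detects duplicates.

```python
def drop_duplicates_by_priority(rows, priority, key_fields):
    """Keep one row per duplicate group, preferring higher-priority providers.

    Hypothetical helper: rows sharing the same `key_fields` values are
    treated as duplicates (an assumption, not the script's documented rule).
    """
    rank = {name: index for index, name in enumerate(priority)}
    lowest = len(rank)  # providers missing from the list rank last
    best = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        kept = best.get(key)
        if kept is None or rank.get(row["PROVIDER"], lowest) < rank.get(kept["PROVIDER"], lowest):
            best[key] = row
    return list(best.values())
```

With the default PRIORITY, a measurement reported by both "ARGO" and "GLODAP_2022" at the same key would be kept only in its "GLODAP_2022" version.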

Others

VERBOSE

Verbosity level: the higher the value, the more information is displayed. If set to 0 or below: no information displayed. If set to 1: minimal information displayed. If set to 2: very complete information displayed. If set to 3 or higher: exhaustive information displayed.

default: 2

Expected type: int
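
The expected types listed for each parameter can be checked before running the script. The sketch below is an illustration only: the table `EXPECTED_TYPES` and the function `type_errors` are hypothetical, the configuration is assumed to be already loaded as a flat dictionary, and the element types of lists are not checked.

```python
# Expected Python types, transcribed from the parameter descriptions.
EXPECTED_TYPES = {
    "SAVING_DIR": str,
    "PROVIDERS": list,
    "VARIABLES": list,
    "DATE_MIN": str,
    "DATE_MAX": str,
    "INTERVAL": str,
    "CUSTOM_INTERVAL": int,
    "LATITUDE_MIN": (int, float),
    "LATITUDE_MAX": (int, float),
    "LONGITUDE_MIN": (int, float),
    "LONGITUDE_MAX": (int, float),
    "DEPTH_MIN": (int, float),
    "DEPTH_MAX": (int, float),
    "EXPOCODES_TO_LOAD": list,
    "PRIORITY": list,
    "VERBOSE": int,
}


def type_errors(config: dict) -> list:
    """Return the names of parameters whose value has an unexpected type."""
    return [
        name
        for name, expected in EXPECTED_TYPES.items()
        if name in config and not isinstance(config[name], expected)
    ]
```

For instance, passing `{"DATE_MIN": 20070101}` reports `["DATE_MIN"]`, because the date must be given as a string rather than as an integer.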

Script Output

When executed, this script creates files in the SAVING_DIR folder containing the data from all sources specified in PROVIDERS, in a standardized format.

This is an example of what this data could look like:

[Figure: saving data output example]

Source code: