bgc_data_processing.core.sources

Data Source objects.

DataSource(provider_name, data_format, dirin, data_category, excluded_files, files_pattern, variable_ensemble, **kwargs)

Data Source.

Parameters:

Name                 Type                 Description                            Default
provider_name        str                  Name of the data provider.             required
data_format          str                  Data format.                           required
dirin                Path | str           Input data directory.                  required
data_category        str                  Category of the data.                  required
excluded_files       list[str]            Files not to load.                     required
files_pattern        FileNamePattern      Pattern to match to load files.        required
variable_ensemble    SourceVariableSet    Ensembles of variables to consider.    required
Source code in src/bgc_data_processing/core/sources.py
def __init__(
    self,
    provider_name: str,
    data_format: str,
    dirin: Path | str,
    data_category: str,
    excluded_files: list[str],
    files_pattern: "FileNamePattern",
    variable_ensemble: "SourceVariableSet",
    **kwargs,
) -> None:
    self._format = data_format
    self._category = data_category
    self._vars_ensemble = variable_ensemble
    self._store_vars = variable_ensemble.storing_variables
    self._files_pattern = files_pattern
    self._dirin = Path(dirin)
    self._provider = provider_name
    self._read_kwargs = kwargs
    self._prov_name = provider_name
    self._excl = excluded_files
    self._loader = None
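
A minimal construction sketch. The import path of DataSource matches this module; the provider, format, and category strings are placeholders, and `files_pattern` and `variables` stand for a FileNamePattern and a SourceVariableSet built elsewhere in the configuration.

from pathlib import Path

from bgc_data_processing.core.sources import DataSource

# `files_pattern` (FileNamePattern) and `variables` (SourceVariableSet) are
# assumed to have been built elsewhere; the string values are placeholders.
source = DataSource(
    provider_name="GLODAP",
    data_format="csv",
    dirin=Path("data/glodap"),
    data_category="in_situ",
    excluded_files=["README.txt"],
    files_pattern=files_pattern,
    variable_ensemble=variables,
    # any extra keyword arguments are kept as reading options (**kwargs)
    low_memory=False,
)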

as_template: dict[str, Any] property

Create template to easily re-create a similar data source.

Returns:

Type              Description
dict[str, Any]    Arguments to create a similar data source.

dirin: Path property

Directory with data to load.

files_pattern: FileNamePattern property

Pattern to match for files in input directory.

provider: str property

Name of the data provider.

data_format: str property

Name of the data format.

data_category: str property

Name of the data category.

variables: SourceVariableSet property

Ensemble of all variables.

loader: BaseLoader property

Data loader.

saving_order: list[str] property writable

Saving order for variables.
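
The properties above expose the configuration read-only (saving_order is the writable exception), and as_template packages the constructor arguments for reuse. A short sketch, continuing the hypothetical `source` from the constructor example and assuming the returned dictionary maps directly onto the constructor keywords:

# Read back the configuration (hypothetical values from the sketch above).
print(source.provider)      # "GLODAP"
print(source.data_format)   # "csv"
print(source.dirin)         # PosixPath('data/glodap')

# `as_template` returns the arguments needed to rebuild a similar source,
# so a sibling source can be created with a single override.
template = source.as_template
sibling = DataSource(**{**template, "dirin": Path("data/glodap_v2")})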

load_and_save(saving_directory, dateranges_gen, constraints)

Save data in files as soon as the data is loaded to relieve memory.

Parameters:

Name                Type                  Description                                 Default
saving_directory    Path | str            Path to the directory to save in.           required
dateranges_gen      DateRangeGenerator    Generator to use to retrieve dateranges.    required
constraints         Constraints           Constraints to apply on data.               required
Source code in src/bgc_data_processing/core/sources.py
def load_and_save(
    self,
    saving_directory: Path | str,
    dateranges_gen: "DateRangeGenerator",
    constraints: "Constraints",
) -> None:
    """Save data in files as soon as the data is loaded to relieve memory.

    Parameters
    ----------
    saving_directory : Path | str
        Path to the directory to save in.
    dateranges_gen : DateRangeGenerator
        Generator to use to retrieve dateranges.
    constraints : Constraints
        Constraints to apply on data.
    """
    date_label = self._vars_ensemble.get(self._vars_ensemble.date_var_name).label
    date_constraint = constraints.get_constraint_parameters(date_label)
    pattern_matcher = self._files_pattern.build_from_constraint(date_constraint)
    pattern_matcher.validate = self.loader.is_file_valid
    filepaths = pattern_matcher.select_matching_filepath(
        research_directory=self._dirin,
    )
    for filepath in filepaths:
        storer = self._create_storer(filepath=filepath, constraints=constraints)
        saver = StorerSaver(storer)
        saver.save_from_daterange(
            dateranges_gen=dateranges_gen,
            saving_directory=Path(saving_directory),
        )
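
A hedged usage sketch: `dateranges` (a DateRangeGenerator) and `constraints` (a Constraints object) are assumed to be configured elsewhere, and the output directory is a placeholder. As the source above shows, each matching file is loaded, wrapped in a storer, and saved per date range before the next file is read, so only one file's data sits in memory at a time.

from pathlib import Path

# Assumed to be configured elsewhere: `dateranges` is a DateRangeGenerator,
# `constraints` is a Constraints object (e.g. carrying a date-range constraint).
source.load_and_save(
    saving_directory=Path("output/glodap"),
    dateranges_gen=dateranges,
    constraints=constraints,
)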

load_all(constraints)

Load all files for the loader.

Parameters:

Name           Type           Description            Default
constraints    Constraints    Constraints slicer.    required

Returns:

Type      Description
Storer    Storer for the loaded data.

Source code in src/bgc_data_processing/core/sources.py
def load_all(self, constraints: "Constraints") -> "Storer":
    """Load all files for the loader.

    Parameters
    ----------
    constraints : Constraints
        Constraints slicer.

    Returns
    -------
    Storer
        Storer for the loaded data.
    """
    date_label = self._vars_ensemble.get(self._vars_ensemble.date_var_name).label
    date_constraint = constraints.get_constraint_parameters(date_label)
    pattern_matcher = self._files_pattern.build_from_constraint(date_constraint)
    pattern_matcher.validate = self.loader.is_file_valid
    filepaths = pattern_matcher.select_matching_filepath(
        research_directory=self._dirin,
    )
    storers = []
    for filepath in filepaths:
        storer = self._create_storer(filepath=filepath, constraints=constraints)
        storers.append(storer)
    return sum(storers)
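
The eager counterpart, sketched under the same assumptions: every matching file is loaded into its own storer and the storers are then summed into a single Storer.

# Load every matching file and combine the per-file storers into one Storer.
storer = source.load_all(constraints=constraints)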