
bgc_data_processing.core.loaders.base

Base Loaders.

BaseLoader(provider_name, category, exclude, variables)

Bases: ABC

Base class to load data.

Parameters:

provider_name (str, required): Data provider name.
category (str, required): Category the provider belongs to.
exclude (list[str], required): Filenames to exclude from loading.
variables (LoadingVariablesSet, required): Storer object containing all variables to consider for this data, both those present in the data file and those not represented in it.
Source code in src/bgc_data_processing/core/loaders/base.py
def __init__(
    self,
    provider_name: str,
    category: str,
    exclude: list[str],
    variables: "LoadingVariablesSet",
) -> None:
    self._provider = provider_name
    self._category = category
    self._exclude = exclude
    self._variables = variables

provider: str property

_provider attribute getter.

Returns:

str: Data provider name.

category: str property

Returns the category of the provider.

Returns:

str: Category the provider belongs to.

variables: LoadingVariablesSet property

_variables attribute getter.

Returns:

LoadingVariablesSet: Loading variables storer.

excluded_filenames: list[str] property

Filenames to exclude from loading.
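The constructor only stores its arguments, and the properties above expose them read-only. The sketch below reproduces that pattern in a hypothetical stand-in class (`LoaderSketch`) rather than importing the package. It assumes that `excluded_filenames` simply returns the stored `exclude` list (its body is not shown here), and it uses a plain `object()` in place of a real `LoadingVariablesSet`. The provider and category values are illustrative.

```python
class LoaderSketch:
    """Hypothetical stand-in mirroring BaseLoader's constructor and properties."""

    def __init__(
        self,
        provider_name: str,
        category: str,
        exclude: list[str],
        variables: object,
    ) -> None:
        self._provider = provider_name
        self._category = category
        self._exclude = exclude
        self._variables = variables

    @property
    def provider(self) -> str:
        return self._provider

    @property
    def category(self) -> str:
        return self._category

    @property
    def variables(self) -> object:
        return self._variables

    @property
    def excluded_filenames(self) -> list[str]:
        # Assumption: the real property returns the stored exclusion list.
        return self._exclude


# Illustrative values only; object() stands in for a LoadingVariablesSet.
loader = LoaderSketch("GLODAP", "in_situ", ["bad_file.csv"], variables=object())
print(loader.provider, loader.category)

try:
    loader.provider = "other"  # a property without a setter cannot be assigned
except AttributeError:
    print("provider is read-only")
```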

is_file_valid(filepath)

Indicate whether a file is valid to be kept or not.

Parameters:

filepath (Path | str, required): Path or name of the file.

Returns:

bool: True if neither the full path nor the bare filename appears in the excluded filenames.

Source code in src/bgc_data_processing/core/loaders/base.py
def is_file_valid(self, filepath: Path | str) -> bool:
    """Indicate whether a file is valid to be kept or not.

    Parameters
    ----------
    filepath : Path | str
        Name of the file

    Returns
    -------
    bool
        True if the name is not to be excluded.
    """
    keep_path = str(filepath) not in self.excluded_filenames
    keep_name = Path(filepath).name not in self.excluded_filenames

    return keep_name and keep_path

load(filepath) abstractmethod

Load data.

Parameters:

filepath (str, required): Path to the file to load.

Returns:

DataFrame: Loaded data.

Source code in src/bgc_data_processing/core/loaders/base.py
@abstractmethod
def load(self, filepath: str) -> pd.DataFrame:
    """Load data.

    Returns
    -------
    Any
        Data object.
    """
    ...
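Because `BaseLoader` derives from `ABC` and declares `load` as an `abstractmethod`, it cannot be instantiated directly, and a concrete loader has to override `load`. The sketch below uses a hypothetical stand-in base (`AbstractLoaderSketch`) and a hypothetical subclass (`CSVLoaderSketch`); neither belongs to the package. Reading with `pandas.read_csv` and the column names `DEPH` and `TEMP` are illustrative assumptions about what a concrete loader might handle.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path

import pandas as pd


class AbstractLoaderSketch(ABC):
    """Hypothetical stand-in for BaseLoader's abstract interface."""

    @abstractmethod
    def load(self, filepath: str) -> pd.DataFrame:
        ...


class CSVLoaderSketch(AbstractLoaderSketch):
    """Hypothetical concrete loader that reads a CSV file."""

    def load(self, filepath: str) -> pd.DataFrame:
        return pd.read_csv(filepath)


# The abstract base cannot be instantiated because load is not implemented.
try:
    AbstractLoaderSketch()
except TypeError as err:
    print(f"TypeError: {err}")

# Write a small CSV to a temporary directory and load it with the subclass.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.csv"
    path.write_text("DEPH,TEMP\n5.0,12.1\n10.0,11.8\n")
    df = CSVLoaderSketch().load(str(path))

print(df.shape)  # (2, 2)
```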

remove_nan_rows(df)

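Remove rows with missing values. A row is dropped if any of the variables in `to_remove_if_any_nan` is NaN, or if all of the variables in `to_remove_if_all_nan` are NaN.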

Parameters:

df (DataFrame, required): DataFrame from which to remove rows.

Returns:

DataFrame: DataFrame with the rows removed.

Source code in src/bgc_data_processing/core/loaders/base.py
def remove_nan_rows(self, df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows.

    Parameters
    ----------
    df : pd.DataFrame
        DataFrame from which to remove rows.

    Returns
    -------
    pd.DataFrame
        DataFrame with rows removed
    """
    # Load keys
    vars_to_remove_when_any_nan = self._variables.to_remove_if_any_nan
    vars_to_remove_when_all_nan = self._variables.to_remove_if_all_nan
    # Check for nans
    if vars_to_remove_when_any_nan:
        any_nans = df[vars_to_remove_when_any_nan].isna().any(axis=1)
    else:
        any_nans = pd.Series(False, index=df.index)
    if vars_to_remove_when_all_nan:
        all_nans = df[vars_to_remove_when_all_nan].isna().all(axis=1)
    else:
        all_nans = pd.Series(False, index=df.index)
    # Get indexes to drop
    indexes_to_drop = df[any_nans | all_nans].index
    return df.drop(index=indexes_to_drop)
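The two rules are combined with a logical OR. The standalone sketch below applies the same logic with the column lists passed explicitly; in the class they come from the `LoadingVariablesSet`. The function name `remove_nan_rows_sketch` and the column names `LATITUDE`, `NTRA` and `PHOS` are hypothetical.

```python
import numpy as np
import pandas as pd


def remove_nan_rows_sketch(
    df: pd.DataFrame,
    any_nan_cols: list[str],
    all_nan_cols: list[str],
) -> pd.DataFrame:
    """Drop rows where any of any_nan_cols, or all of all_nan_cols, are NaN."""
    if any_nan_cols:
        any_nans = df[any_nan_cols].isna().any(axis=1)
    else:
        any_nans = pd.Series(False, index=df.index)
    if all_nan_cols:
        all_nans = df[all_nan_cols].isna().all(axis=1)
    else:
        all_nans = pd.Series(False, index=df.index)
    return df.drop(index=df[any_nans | all_nans].index)


df = pd.DataFrame(
    {
        "LATITUDE": [60.0, np.nan, 61.0, 62.0],
        "NTRA": [1.0, 2.0, np.nan, np.nan],
        "PHOS": [0.5, 0.6, np.nan, 0.7],
    }
)
# Row 1 lacks LATITUDE; row 2 lacks both NTRA and PHOS; row 3 lacks only NTRA.
cleaned = remove_nan_rows_sketch(df, ["LATITUDE"], ["NTRA", "PHOS"])
print(cleaned.index.tolist())  # [0, 3]
```

Since rows are dropped by index label, a DataFrame with duplicate index labels could also lose rows that passed both checks. Filtering with the mask directly (`df[~(any_nans | all_nans)]`) avoids that.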