stats⚓︎
Statistics for the assessment of DA methods.

`Stats` is a data container for ([mostly] time series of) statistics.
It comes with a battery of methods to compute the default statistics.

`Avrgs` is a data container for the same statistics,
but after they have been averaged in time (after the assimilation has finished).

Instances of these objects are created by `da_methods.da_method`
(i.e. "`xp`") objects and written to their `.stats` and `.avrgs` attributes.
Default statistics⚓︎
List them using
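(a sketch; assumes an experiment `xp` has already been run, so that these attributes exist):

```python
list(vars(xp.stats))   # names of the (time-series) statistics
list(vars(xp.avrgs))   # names of their time averages
```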
The `FAUSt` key/attribute⚓︎

The time series of statistics (the attributes of `.stats`) may have attributes
`.f`, `.a`, `.s`, `.u`, referring to whether the statistic is for a "forecast",
"analysis", or "smoothing" estimate (as decided when the call to
`Stats.assess` is made), or a "universal" (forecast, but at an intermediate
[non-obs.-time]) estimate.

The same applies to the time averages in `.avrgs`.
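For example (a sketch, assuming the usual default statistics were stored):

```python
xp.stats.err.rms.f   # time series of forecast rmse
xp.stats.err.rms.a   # time series of analysis rmse
xp.avrgs.err.rms.a   # its time average
```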
Field summaries⚓︎
The statistics are also averaged in space.
This is done according to the methods listed in
`rc.field_summaries` of the `dpr_config`.
Note
Although sometimes pretty close, `rmv` (a.k.a. `spread.rms`) is not (supposed
to be) an unbiased estimator of `rmse` (a.k.a. `err.rms`). This is because
of the square roots involved in the field summary. Instead, `spread.ms` (i.e.
the mean variance) is the unbiased estimator of `err.ms`.
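To see the effect of the square roots, here is a small Monte Carlo sketch (not library code; the field size, spreads, and variable names are made up). The mean-square quantities agree on average, while the root-mean-square ones do not:

```python
import numpy as np

rng = np.random.default_rng(42)
K, trials = 3, 100_000               # a tiny "field" makes the bias visible
spread = np.array([1.0, 2.0, 0.5])   # per-gridpoint ensemble spread (std. dev.)

# Sample truth-minus-estimate errors consistent with the spread.
err = rng.standard_normal((trials, K)) * spread

ms_err = np.mean(err**2, axis=1)     # err.ms per trial
rmse   = np.sqrt(ms_err)             # err.rms per trial
ms_spr = np.mean(spread**2)          # spread.ms (constant here)
rmv    = np.sqrt(ms_spr)             # spread.rms

print(ms_err.mean(), ms_spr)         # approximately equal: the mean square is unbiased
print(rmse.mean(), rmv)              # rmse average < rmv: the root mean square is biased (Jensen)
```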
Regional field summaries⚓︎

If the `HiddenMarkovModel` has the attribute `.sectors`, whose value is a dict
that maps sector names to the corresponding state indices (e.g. as sketched below),
then `.stats.rms` and `.avrgs.rms` will also have attributes
named after the keys of `HMM.sectors`, e.g. `stats.err.rms.ocean`.
This also goes for any other (than `rms`) type of field summary method.
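For illustration, a hypothetical `.sectors` dict (the sector names and index ranges are made up):

```python
import numpy as np

# Hypothetical example: a 10-dimensional state split into two named sectors.
HMM.sectors = {
    "ocean": np.arange(0, 6),    # state indices belonging to the "ocean"
    "atmos": np.arange(6, 10),   # state indices belonging to the "atmosphere"
}
```

With this in place, e.g. `xp.avrgs.err.rms.ocean` would hold the rmse averaged over only the "ocean" indices.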
Declaring new, custom statistics⚓︎

Only the time series created with `Stats.new_series` will be in the format
operated on by `Stats.average_in_time`. For example, create an `ndarray` of
length `Ko+1` to hold the time series of estimated inflation values:
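```python
# A sketch: assumes this runs inside a DA method, where self.stats exists and
# Ko is the index of the last observation time (so the series gets Ko+1 entries).
self.stats.new_series('infl', 1, Ko + 1)
```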
Alternatively you can overwrite a default statistic; for example:

```python
# xx: the truth time series; ensemble_time_series_a: the analysis ensembles over time.
error_time_series_a = xx - ensemble_time_series_a.mean(axis=1)
self.stats.err.rms.a = np.sqrt(np.mean(error_time_series_a**2, axis=-1))
```
Of course, you could just do this
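(sketched here with placeholder names):

```python
self.stats.my_custom_stat = value
```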
However, `xp_launch.run_experiment` (unless called with `free=False`) will delete
the `Stats` object from `xp` after the assimilation, in order to save memory.
Therefore, in order to have `my_custom_stat` be available among `xp.avrgs`, it
must be "registered":
Alternatively, you can do both at once using `register_stat` (documented further below), which performs the assignment and the registration together; for instance (a sketch):
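```python
register_stat(self.stats, "my_custom_stat", value)  # placeholder stat name and value
```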
Modules:

Name | Description |
---|---|
`liveplotting` | On-line (live) plots of the DA process for various models and methods. |
`series` | Time series management and processing. |
Avrgs⚓︎

Bases: `StatPrint`, `DotDict`

A `dict` specialized for the averages of statistics.

Embellishments:

- `tools.series.StatPrint`
- `Avrgs.tabulate`
- `getattr` that supports abbreviations.

__getattribute__(key)⚓︎

Support deep and abbreviated lookup.
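For illustration, a sketch of typical lookups (hypothetical; the exact abbreviations supported depend on the lookup logic, which is not documented here):

```python
xp.avrgs.err.rms.a   # full (deep) attribute path to the average analysis rmse
xp.avrgs.rmse.a      # assumed abbreviation for the same quantity
```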
tabulate(statkeys=(), decimals=None)⚓︎

Tabulate using `tabulate_avrgs`.
Stats⚓︎

Bases: `StatPrint`

Contains and computes statistics of the DA methods.

__init__(xp, HMM, xx, yy, liveplots=False, store_u=rc.store_u)⚓︎

Init the default statistics.

assess(k, ko=None, faus=None, E=None, w=None, mu=None, Cov=None)⚓︎

Common interface for both `Stats.assess_ens` and `Stats.assess_ext`.

The `_ens` assessment function gets called if `E is not None`,
and `_ext` if `mu is not None`.

faus: one or more of `['f', 'a', 'u', 's']`, indicating
that the result should be stored in (respectively)
the forecast/analysis/universal/smoothing attribute.
Default: `'u' if ko is None else 'au'` (i.e. both `'a'` and `'u'`).
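For illustration, a rough sketch of how a DA method might call `assess` inside its loop (the loop indices `k`, `ko` and the ensembles/moments are assumed to be provided by the method itself):

```python
self.stats.assess(k, ko, 'f', E=E)               # forecast statistics from an ensemble
self.stats.assess(k, ko, 'a', E=E)               # analysis statistics (after the update)
self.stats.assess(k, ko, 'a', mu=mu_a, Cov=P_a)  # Gaussian (Kalman-type) alternative
```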
assess_ens(now, x, E, w)⚓︎

Ensemble and Particle filter (weighted/importance) assessment.

assess_ext(now, x, mu, P)⚓︎

Kalman filter (Gaussian) assessment.

average_in_time(kk=None, kko=None, free=False)⚓︎

Average all univariate (scalar) time series.

- `kk`: time inds for averaging
- `kko`: time inds for averaging obs

derivative_stats(now)⚓︎

Stats that derive from others, and are not specific to `_ens` or `_ext`.
new_series(name, shape, length='FAUSt', field_mean=False, **kws)⚓︎

Create (and register) a statistics time series, initialized with `nan`s.

If `length` is an integer, a `DataSeries` (a trivial subclass of
`numpy.ndarray`) is made. By default, though, a `tools.series.FAUSt`
is created.

NB: The `sliding_diagnostics` liveplotting relies on detecting `nan`'s
to avoid plotting stats that are not being used.
Thus, you cannot use `dtype=bool` or `int` for stats that get plotted.
replay(figlist='default', speed=np.inf, t1=0, t2=None, **kwargs)⚓︎

Replay LivePlot with what's been stored in 'self'.

- t1, t2: time window to plot.
- 'figlist' and 'speed': See LivePlot's doc.

Note

`store_u` (whether to store non-obs-time stats) must
have been `True` to have smooth graphs as in the actual LivePlot.

Note

Ensembles are generally not stored in the stats and so cannot be replayed.

summarize_marginals(now)⚓︎

Compute Mean-field and RMS values.

write(stat_dict, k, ko, sub)⚓︎

Write `stat_dict` to series at `(k, ko, sub)`.
align_col(col, pad='␣', missingval='', just='>')⚓︎

center(E, axis=0, rescale=False)⚓︎

Center ensemble.

Makes use of `np` features: keepdims and broadcasting.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`E` | `ndarray` | Ensemble to be centered | required |
`axis` | `int` | The axis to be centered | 0 |
`rescale` | `bool` | If True, inflate to compensate for the reduction in the expected variance. The inflation factor is \(\sqrt{\frac{N}{N - 1}}\), where N is the ensemble size. | False |

Returns:

Name | Type | Description |
---|---|---|
X | `ndarray` | Ensemble anomaly |
x | `ndarray` | Mean of the ensemble |
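As a rough illustration (a sketch, not the library implementation), centering amounts to the following numpy operations:

```python
import numpy as np

def center_sketch(E, axis=0, rescale=False):
    """Sketch of ensemble centering using keepdims and broadcasting."""
    x = np.mean(E, axis=axis, keepdims=True)   # ensemble mean (kept for broadcasting)
    X = E - x                                  # anomalies
    if rescale:
        N = E.shape[axis]
        X = X * np.sqrt(N / (N - 1))           # compensate the reduction in expected variance
    return X, x.squeeze(axis=axis)
```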
inflate_ens(E, factor)⚓︎

Inflate the ensemble (center, inflate, re-combine).

Parameters:

Name | Type | Description | Default |
---|---|---|---|
`E` | `ndarray` | Ensemble to be inflated | required |
`factor` | `float` | Inflation factor | required |

Returns:

Type | Description |
---|---|
`ndarray` | Inflated ensemble |
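Conceptually (a sketch under the same assumptions, not the actual implementation):

```python
def inflate_ens_sketch(E, factor):
    """Sketch: center, scale the anomalies by `factor`, and re-combine with the mean."""
    x = E.mean(axis=0, keepdims=True)
    return x + factor * (E - x)
```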
mean0(E, axis=0, rescale=True)⚓︎

Like `center`, but only return the anomalies (not the mean).

Uses `rescale=True` by default, which is beneficial
when used to center observation perturbations.

register_stat(self, name, value)⚓︎

Do `self.name = value` and register `name` in self's `stat_register`.

Note: `self` is not always a `Stats` object, but could be a "child" of it.
tabulate_avrgs(avrgs_list, statkeys=(), decimals=None)⚓︎

Tabulate avrgs (val±prec).
unbias_var(w=None, N_eff=None, avoid_pathological=False)⚓︎

Compute the unbiasing factor for variance estimation.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
`w` | `ndarray` | Importance weights. Must sum to 1. Only one of `w` and `N_eff` needs to be given. | None |
`N_eff` | `float` | The "effective" size of the weighted ensemble. If not provided, it is computed from the weights. The unbiasing factor is \(N_{eff} / (N_{eff} - 1)\). | None |
`avoid_pathological` | `bool` | Avoid weight collapse. | False |

Returns:

Name | Type | Description |
---|---|---|
ub | `float` | Factor used for unbiasing the variance |

Reference
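A sketch of the computation implied by the formula above (assuming the standard effective-sample-size formula \(N_{eff} = 1 / \sum_i w_i^2\) when `N_eff` is not given; `avoid_pathological` is omitted since its behaviour is not detailed here):

```python
import numpy as np

def unbias_var_sketch(w=None, N_eff=None):
    """Sketch of the unbiasing factor N_eff / (N_eff - 1)."""
    if N_eff is None:
        N_eff = 1.0 / np.sum(w**2)   # assumed effective sample size of the weights
    return N_eff / (N_eff - 1)
```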
unpack_uqs(uq_list, decimals=None)⚓︎

Convert a list of `uq`s into a dict of (equal-length) lists of attributes.

The attributes are obtained by `vars(uq)`,
and may get formatted somehow (e.g. cast to strings) in the output.

If `uq` is `None`, then `None` is inserted in each list.
Else, `uq` must be an instance of `tools.rounding.UncertainQtty`.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
`uq_list` | `list` | List of `uq`s. | required |
`decimals` | `int` | Desired number of decimals. Used only for the columns "val" and "prec". | None |
weight_degeneracy(w, prec=1e-10)⚓︎

Check if the weights are degenerate.

If they are degenerate, the maximum weight should be nearly one, since sum(w) = 1.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
`w` | `ndarray` | Importance weights. Must sum to 1. | required |
`prec` | `float` | Tolerance of the distance between w and one. | 1e-10 |

Returns:

Type | Description |
---|---|
`bool` | True if the weights are degenerate, else False |
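A sketch of the check described above (degeneracy taken to mean that the largest weight is within `prec` of one):

```python
import numpy as np

def weight_degeneracy_sketch(w, prec=1e-10):
    """True if the maximum weight is within `prec` of one."""
    return (1 - np.max(w)) < prec
```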