xp_process⚓︎

Tools (notably xpSpace) for processing and presenting experiment data.

`SparseSpace` ⚓︎

Bases: dict

Subclass of dict that enforces key conformity to a given namedtuple.

Like a normal dict, it can hold any type of objects. But, since the keys must conform, they effectively follow a coordinate system, so that the dict becomes a vector space.

Examples:

```
>>> dct = xpSpace(["x", "y", "z"])
>>> dct[(1, 2, 3)] = "pointA"
```

The coordinate system is specified by the dims: a list of keys defining the namedtuple of self.Coord. The above dict only has three dims, so this fails:

```
>>> dct[(1, 2, 3, 4)] = "pointB"
Traceback (most recent call last):
...
TypeError: The key (1, 2, 3, 4) did not fit the coord.  system
which has dims ('x', 'y', 'z')
```

Coordinates can contain any value, including None:

```
>>> dct[(1, 2, None)] = "pointB"
```
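The key-conformity check described above can be imitated with the standard library alone. The following is a minimal sketch, not the library's implementation: the class name `ConformingDict` is hypothetical, and the error message is only modelled on the traceback shown above.

```python
from collections import namedtuple


class ConformingDict(dict):
    """Hypothetical stand-in for SparseSpace: keys must fit a namedtuple."""

    def __init__(self, dims):
        super().__init__()
        self.dims = tuple(dims)
        self.Coord = namedtuple("Coord", self.dims)

    def __setitem__(self, key, val):
        try:
            # A key with the wrong number of entries raises TypeError here.
            key = self.Coord(*key)
        except TypeError as err:
            raise TypeError(
                f"The key {key!r} did not fit the coord. system "
                f"which has dims {self.dims}"
            ) from err
        super().__setitem__(key, val)


dct = ConformingDict(["x", "y", "z"])
dct[(1, 2, 3)] = "pointA"
dct[(1, 2, None)] = "pointB"  # None is a legitimate coordinate value

try:
    dct[(1, 2, 3, 4)] = "pointC"  # four entries, but only three dims
except TypeError:
    rejected = True
else:
    rejected = False
```

Because a namedtuple hashes and compares like a plain tuple, entries stored this way can still be looked up with `dct[(1, 2, 3)]`.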

In intended usage, this space is highly sparse, meaning there are many coordinates with no entry. Indeed, as a data format for nd-arrays, it may be called "coordinate list representation", used e.g. by scipy.sparse.coo_matrix.

Thus, operations across (potentially multiple) dims, such as optimization or averaging, should be carried out by iterating over the list of items, not over the dims.
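As an illustration of iterating over items rather than dims, here is a small sketch that averages over a `seed` dim using a plain dict with namedtuple keys. The dims and the numbers are made up for the example.

```python
from collections import defaultdict, namedtuple

Coord = namedtuple("Coord", ["da_method", "seed"])

# Hypothetical results: only some coordinates have an entry (sparse).
results = {
    Coord("EnKF", 1): 0.5,
    Coord("EnKF", 2): 0.7,
    Coord("PartFilt", 1): 0.9,
}

# Single pass over the items, collecting values by the dim that is kept.
groups = defaultdict(list)
for coord, value in results.items():
    groups[coord.da_method].append(value)

# Average over the dim that was dropped (seed).
means = {method: sum(vals) / len(vals) for method, vals in groups.items()}
```

The loop touches each stored entry exactly once, so its cost scales with the number of entries and not with the size of the full coordinate grid.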

The most important method is nest, which is used (by xpSpace.table_tree) to print and plot results. This is essentially a "groupby" operation, and indeed the case could be made that this class should be replaced by pandas.DataFrame, or better yet: https://github.com/pydata/xarray.

The __getitem__ is quite flexible, allowing access by:

- The actual key, a self.Coord object, or a standard tuple. Returns a single item. Example:
```
>>> dct[1, 2, 3] == dct[(1, 2, 3)] == dct[dct.Coord(1, 2, 3)] == "pointA"
True
```
- A slice or list. Returns a list.

PS: indexing by slice or list assumes that the dict is ordered, which we inherit from the builtin dict since Python 3.7. Moreover, it is a reflection of the fact that the internals of this class work by looping over items.
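The reliance on insertion order can be seen in a short sketch. The class `ListIndexableDict` is hypothetical, and treating the list elements as positions is an assumption made for this example; the library may interpret them differently.

```python
class ListIndexableDict(dict):
    """Hypothetical dict that also accepts positional slices and lists."""

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Values come out in insertion order (guaranteed since Python 3.7).
            return list(self.values())[key]
        if isinstance(key, list):
            values = list(self.values())
            return [values[i] for i in key]
        return super().__getitem__(key)


dct = ListIndexableDict()
dct[(1, 2, 3)] = "pointA"
dct[(1, 2, None)] = "pointB"
dct[(4, 5, 6)] = "pointC"

first_two = dct[:2]   # slice: the first two inserted values
picked = dct[[0, 2]]  # list: the values at positions 0 and 2
```

Slices and lists are unhashable, so they can never collide with a real key, which is what makes this overloading unambiguous.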

In addition, the subspace method (aliased to __call__ and implemented via coords_matching) can be used to select items by the values of a subset of their attributes. It returns a SparseSpace. If there is only a single item, it can be accessed as dct[()].
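Selecting by a subset of attributes amounts to filtering the keys. The sketch below uses free functions on a plain dict and, as a simplification, keeps the full coordinates in the result; the fact that a single remaining item is reachable as `dct[()]` suggests the real method drops the matched dims.

```python
from collections import namedtuple

Coord = namedtuple("Coord", ["da_method", "infl", "seed"])

# Hypothetical entries; the values stand in for experiment objects.
space = {
    Coord("EnKF", 1.0, 3): "xp0",
    Coord("EnKF", 1.1, 3): "xp1",
    Coord("PartFilt", 1.0, 3): "xp2",
}


def coords_matching(dct, **kwargs):
    """Return the keys whose named fields equal all the given values."""
    return [
        coord
        for coord in dct
        if all(getattr(coord, dim) == val for dim, val in kwargs.items())
    ]


def subspace(dct, **kwargs):
    """Return a new dict holding only the matching entries."""
    return {coord: dct[coord] for coord in coords_matching(dct, **kwargs)}


enkf_keys = coords_matching(space, da_method="EnKF")
single = subspace(space, da_method="EnKF", infl=1.1)
```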

Inspired by

https://stackoverflow.com/a/7728830
https://stackoverflow.com/q/3387691

`__call__(**kwargs)` ⚓︎

Shortcut (syntactic sugar) for xp_process.SparseSpace.subspace.

`__getitem__(key)` ⚓︎

Also allows list-indexing by list and slice.

`__init__(dims)` ⚓︎

Usually initialized through xpSpace.from_list.

Parameters:

Name	Type	Description	Default
`dims`	`list or tuple`	The attributes defining the coordinate system.	required

`__setitem__(key, val)` ⚓︎

Set an item, ensuring that the key conforms to the coordinate system.

`append_dim(dim)` ⚓︎

Expand self.Coord by dim. For each item, insert None in new dim.
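A rough sketch of what expanding the coordinate system involves is given below. It returns new objects instead of modifying anything in place, and the function signature is an assumption made for the example.

```python
from collections import namedtuple

Coord = namedtuple("Coord", ["x", "y"])
space = {Coord(1, 2): "pointA", Coord(3, 4): "pointB"}


def append_dim(dct, coord_type, dim):
    """Return (new_dict, new_coord_type) with `dim` added and set to None."""
    new_type = namedtuple("Coord", coord_type._fields + (dim,))
    # Every existing key is re-created with None in the new, last position.
    new_dct = {new_type(*coord, None): val for coord, val in dct.items()}
    return new_dct, new_type


space3, Coord3 = append_dim(space, Coord, "z")
```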

`coord_from_attrs(obj)` ⚓︎

Form a coord for this xpSpace by extracting attrs. from obj.

For instances of self.Coord, this is the identity operator, i.e.

self.coord_from_attrs(coord) == coord
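The extraction can be sketched with `getattr`. Filling missing attributes with `None` is an assumption of this example, as are the dims and the `xp` object.

```python
from collections import namedtuple
from types import SimpleNamespace

Coord = namedtuple("Coord", ["da_method", "N", "seed"])


def coord_from_attrs(obj):
    """Build a Coord from obj's attributes; missing ones become None."""
    return Coord(*(getattr(obj, dim, None) for dim in Coord._fields))


# A hypothetical experiment object with one attribute that is not a dim.
xp = SimpleNamespace(da_method="EnKF", N=10, seed=3, unrelated="ignored")
coord = coord_from_attrs(xp)
```

Since a namedtuple exposes its fields as attributes, passing `coord` back through `coord_from_attrs` returns an equal coordinate, which is the identity property stated above.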

`coords_matching(**kwargs)` ⚓︎

Get all coords matching kwargs.

Used by xp_process.SparseSpace.label_xSection and xp_process.SparseSpace.subspace. Unlike the latter, this function returns a list of keys of the original subspace.

Note that the missingval shenanigans of xp_launch.xpList.inds are here unnecessary since each coordinate is complete.

`intersect_dims(attrs)` ⚓︎

Remove those a in attrs that are not in self.dims.

This enables sloppy dims allotment, for ease-of-use.
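In essence this is a filter that preserves the order of `attrs`. A minimal sketch, written as a free function that takes `dims` explicitly (an assumption for the example):

```python
def intersect_dims(attrs, dims):
    """Keep only those attrs that are actual dims, preserving their order."""
    return [a for a in attrs if a in dims]


dims = ("da_method", "N", "seed")
# "infl" is not a dim of this (hypothetical) space, so it is dropped.
kept = intersect_dims(["seed", "infl", "N"], dims)
```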

`label_xSection(label, *NoneAttrs, **sub_coord)` ⚓︎

Insert duplicate entries for the given cross-section.

Works by adding the attr. xSection to the dims of SparseSpace, and setting it to label for entries matching sub_coord, reflecting the "constance/constraint/fixation" this represents. This distinguishes the entries in this fixed-affine subspace, preventing them from being gobbled up by the operations of nest.

If you wish, you can specify the NoneAttrs, which are consequently set to None for the duplicated entries, preventing them from being shown in plot labels and tuning panels.

`nest(inner_dims=None, outer_dims=None)` ⚓︎

Project along inner_dims to yield a new xpSpace with dims outer_dims.

The entries of this xpSpace are themselves xpSpaces, with dims inner_dims, each one regrouping the entries with the same (projected) coordinate.

Note: this method could also be called groupby.

Note: this method is also called by __getitem__(key) if key is a dict.
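The grouping can be sketched with plain dicts. This is an illustration of the idea only: the function takes just `outer_dims` and derives the inner dims as the remainder, which is an assumption about how the two arguments relate.

```python
from collections import namedtuple

Coord = namedtuple("Coord", ["da_method", "N", "seed"])

# Hypothetical entries.
space = {
    Coord("EnKF", 10, 1): 0.5,
    Coord("EnKF", 10, 2): 0.7,
    Coord("EnKF", 20, 1): 0.4,
    Coord("PartFilt", 10, 1): 0.9,
}


def nest(dct, outer_dims):
    """Group entries by their projection onto outer_dims."""
    inner_dims = [d for d in Coord._fields if d not in outer_dims]
    Outer = namedtuple("Outer", outer_dims)
    Inner = namedtuple("Inner", inner_dims)
    nested = {}
    for coord, val in dct.items():
        outer = Outer(*(getattr(coord, d) for d in outer_dims))
        inner = Inner(*(getattr(coord, d) for d in inner_dims))
        # Entries sharing the same outer coordinate land in the same inner dict.
        nested.setdefault(outer, {})[inner] = val
    return nested


by_method = nest(space, ["da_method"])
```

Each value of `by_method` is itself a dict keyed by the remaining dims (`N`, `seed`), mirroring the nested spaces described above.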

`subspace(**kwargs)` ⚓︎

Get an affine subspace.

NB: If you're calling this repeatedly (for all values of the same kwargs) then you should consider using xp_process.SparseSpace.nest instead.

Examples:

```
>>> xp_dict.subspace(da_method="EnKF", infl=1, seed=3)
```

`update(items)` ⚓︎

Update dict, using the custom __setitem__ to ensure key conformity.

NB: the kwargs syntax is not supported because it only works for keys that consist of (a single) string, which is not very interesting for SparseSpace.

`xpSpace` ⚓︎

Bases: SparseSpace

Functionality to facilitate working with xps and their results.

`fill(xps)` ⚓︎

Mass insertion.

`from_list(xps, tick_ordering=None)` ⚓︎

Init. from a list of objects, typically experiments referred to as xps.

1. Computes the relevant dims from the attributes.
2. Fills the dict with xps.
3. Computes and writes the attribute ticks.

This creates a SparseSpace of xps. However, the nested subspaces generated by xpSpace.table_tree (for printing and plotting) will hold objects of type UncertainQtty, because it calls mean which calls get_stat(statkey).

`get_stat(statkey)` ⚓︎

Make xpSpace with same Coord as self, but values xp.avrgs.statkey.

`make_ticks(dct, ordering=None)` ⚓︎

Unique & sort, for each individual "dim" in dct. Assign to self.ticks.

NB: self.ticks will not "propagate" through SparseSpace.nest or the like.
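The "unique and sort" step can be sketched as follows. Placing `None` first is an assumption of this example (it cannot be compared with numbers, so it has to be handled separately), and the helper `tickz` merely mirrors the idea of ticks without `None`.

```python
def make_ticks(coords, dims):
    """For each dim, collect the unique values and sort them."""
    ticks = {}
    for i, dim in enumerate(dims):
        unique = {coord[i] for coord in coords}
        present = sorted(v for v in unique if v is not None)
        # Assumption: None, if present, is listed before the sorted values.
        ticks[dim] = ([None] if None in unique else []) + present
    return ticks


def tickz(ticks, dim_name):
    """Ticks of one dim, without None."""
    return [v for v in ticks[dim_name] if v is not None]


# Hypothetical coordinates with dims (da_method, N).
coords = [("EnKF", 10), ("EnKF", 20), ("PartFilt", None), ("EnKF", 10)]
ticks = make_ticks(coords, ["da_method", "N"])
```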

`mean(dims=None)` ⚓︎

Compute mean over dims (a list). Returns xpSpace without those dims.

`plot(statkey, dims, get_style=default_styles, fignum=None, figsize=None, panels=None, costfun=None, title1=None, title2=None, unique_labels=True, squeeze_labels=True)` ⚓︎

Plot (tables of) results.

Analogously to xpSpace.print, the averages are grouped by dims["inner"], which here plays the role of the x-axis.

The averages can also be grouped by dims["outer"], producing a figure with multiple (columns of) panels.

The optimal points/parameters/attributes are plotted in smaller panels below the main plot. This can be turned off by providing the figure dims through the panels argument.

The parameters statkey, dims, costfun, squeeze_labels are documented in xpSpace.print.

Parameters:

Name	Type	Description	Default
`get_style`	`function`	A function that takes an object, and returns a dict of line styles, usually as a function of the object's attributes.	`default_styles`
`title1`	`anything`	Figure title (in addition to the defaults).	`None`
`title2`	`anything`	Figure title (in addition to the defaults). Goes on a new line.	`None`
`unique_labels`	`bool`	Only show a given line label once, even if it appears in several panels.	`True`
`squeeze_labels`	`bool`	Don't include redundant attributes in the labels.	`True`

`print(statkey, dims, subcols=True, decimals=None, costfun=None, squeeze_labels=True, colorize=True, title=None)` ⚓︎

Print tables of results.

Parameters:

Name	Type	Description	Default
`statkey`	`str`	The statistic to extract from the `xp.avrgs` for each `xp`. Examples: `"rmse.a"` (i.e. `"err.rms.a"`), `"rmse.ocean.a"`, `"duration"`.	required
`dims`	`dict`	Allots (maps) the dims of `xpSpace` to different roles in the tables. The "role" `outer` should list the dims/attributes used to define the splitting of the results into separate tables: one table for each distinct combination of attributes. Similarly, the role `inner` determines which attributes split a table into its columns. `mean` lists the attributes over which the mean is taken (for that row & column). `optim` lists the attributes over which the optimum is searched for (after taking the mean). Example: `dict(outer='da_method', inner='N', mean='seed', optim=('infl','loc_rad'))`. Equivalently, use `mean=("seed",)`. It is acceptable to leave this empty: `mean=()` or `mean=None`.	required
`subcols`	`bool`	If `True`, then subcolumns are added to indicate the following. `1σ`: the confidence interval. If `mean=None` is used, this simply reports the value `.prec` of the `statkey`, provided this is an `UncertainQtty`. Otherwise, it is computed as `sqrt(var(xps)/N)`, where `xps` is the set of statistics gathered over the `mean` dimensions. `*(optim)`: the optimal point (among all `optim` attributes), as defined by `costfun`. `☠`: the number of failures (non-finite values) at that point. `✓`: the number of successes that go into the value.	`True`
`decimals`	`int`	Number of decimals to print. If `None`, this is determined for each statistic by its uncertainty.	`None`
`costfun`	`str or function`	Use `'increasing'` (default) or `'decreasing'` to indicate that the optimum is defined as the lowest or highest value of the `statkey` found.	`None`
`squeeze_labels`	`bool`	Don't include redundant attributes in the line labels. Caution: `get_style` will not be able to access the eliminated attrs.	`True`
`colorize`	`bool`	Add color to tables for readability.	`True`
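The `1σ`, `☠` and `✓` subcolumns described for `subcols` can be reproduced by hand for a single table cell. The sketch below uses made-up values and assumes the sample variance (divisor N-1); the text above only states `sqrt(var(xps)/N)` and does not say which variance estimator is used.

```python
import math
import statistics

# Hypothetical values of one statistic, gathered over the `mean` dims (e.g. seeds).
values = [0.52, 0.48, float("nan"), 0.55, 0.45]

finite = [v for v in values if math.isfinite(v)]
n_success = len(finite)               # what the ✓ subcolumn counts
n_failure = len(values) - n_success   # what the ☠ subcolumn counts

mean = statistics.fmean(finite)
# 1σ of the mean: sqrt(var / N), here with the sample variance (an assumption).
one_sigma = math.sqrt(statistics.variance(finite) / n_success)
```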

`squeeze()` ⚓︎

Eliminate unnecessary dimensions.

`table_tree(statkey, dims, *, costfun=None)` ⚓︎

Make hierarchy outer > inner > mean > optim using SparseSpace.nest.

The dimension passed to nest (at each level) is specified by dims. The dimensions of dims['mean'] and dims['optim'] get eliminated by the mean/tune operations. The dims['outer'] and dims['inner'] become the keys for the output hierarchy.

Note

Cannot support multiple statkeys because it's not (obviously) meaningful when optimizing over dims['optim'].

`tickz(dim_name)` ⚓︎

Dimension (axis) ticks without None.

`tune(dims=None, costfun=None)` ⚓︎

Get (compile/tabulate) a stat. optimised wrt. tuning params (dims).
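Optimising over the tuning dims reduces to picking the best entry among those that differ only in the tuning parameters. A minimal sketch follows, with made-up values, and with the `'increasing'`/`'decreasing'` convention taken from the `costfun` description of `print`:

```python
from collections import namedtuple

Coord = namedtuple("Coord", ["infl", "loc_rad"])

# Hypothetical mean values of a statistic for each tuning setting.
means = {
    Coord(1.0, 2): 0.62,
    Coord(1.1, 2): 0.48,
    Coord(1.1, 4): 0.51,
}


def tune(dct, costfun="increasing"):
    """Return (coord, value) of the optimum.

    'increasing' means the cost grows with the value, so the lowest value wins;
    'decreasing' means the highest value wins.
    """
    pick = min if costfun == "increasing" else max
    return pick(dct.items(), key=lambda item: item[1])


best_coord, best_val = tune(means)
worst_coord, worst_val = tune(means, costfun="decreasing")
```

Returning the coordinate along with the value keeps the optimal tuning parameters available, which is what the smaller panels of `plot` display.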
