xp_process⚓︎

Tools (notably xpSpace) for processing and presenting experiment data.

SparseSpace ⚓︎

Bases: dict

Subclass of dict that enforces key conformity to a given namedtuple.

Like a normal dict, it can hold objects of any type. But, since the keys must conform, they effectively follow a coordinate system, so that the dict becomes a vector space.

Examples:

>>> dct = xpSpace(["x", "y", "z"])
>>> dct[(1, 2, 3)] = "pointA"

The coordinate system is specified by the dims: a list of keys defining the namedtuple of self.Coord. The above dict only has three dims, so this fails:

>>> dct[(1, 2, 3, 4)] = "pointB"
Traceback (most recent call last):
...
TypeError: The key (1, 2, 3, 4) did not fit the coord.  system
which has dims ('x', 'y', 'z')

Coordinates can contain any value, including None:

>>> dct[(1, 2, None)] = "pointB"

In intended usage, this space is highly sparse, meaning there are many coordinates with no entry. Indeed, as a data format for nd-arrays, it may be called "coordinate list representation", used e.g. by scipy.sparse.coo_matrix.

Thus, operations across (potentially multiple) dims, such as optimization or averaging, should be carried out by iterating -- not over the dims -- but over the list of items.
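A minimal sketch of this item-wise style, using a plain dict keyed by a namedtuple rather than the actual SparseSpace class. The names Coord and space and all the values are illustrative assumptions:

```python
from collections import namedtuple
from math import prod

# Hypothetical coordinate system; stands in for self.Coord.
Coord = namedtuple("Coord", ["da_method", "N", "seed"])

# Coordinate-list storage: only the populated points are stored.
space = {
    Coord("EnKF", 10, 1): 0.50,
    Coord("EnKF", 10, 2): 0.70,
    Coord("EnKF", 20, 1): 0.40,
    Coord("PF", 10, 1): 0.90,
}

# Size of the dense grid spanned by the values found along each dim.
n_grid = prod(len({getattr(c, dim) for c in space}) for dim in Coord._fields)

# Optimization across all dims at once: a single pass over the items.
best_coord, best_value = min(space.items(), key=lambda item: item[1])
```

Here the dense grid would have more cells than there are stored entries, yet the search only ever touches the stored entries.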

The most important method is nest, which is used (by xpSpace.table_tree) to print and plot results. This is essentially a "groupby" operation, and indeed the case could be made that this class should be replaced by pandas.DataFrame, or better yet: https://github.com/pydata/xarray.

The __getitem__ is quite flexible, allowing accessing by:

  • The actual key, a self.Coord object, or a standard tuple.
    Returns single item. Example:

    >>> dct[1, 2, 3] == dct[(1, 2, 3)] == dct[dct.Coord(1, 2, 3)] == "pointA"
    True
    
  • A slice or list.
    Returns list.
    PS: indexing by slice or list assumes that the dict is ordered, which we inherit from the builtin dict since Python 3.7. Moreover, it is a reflection of the fact that the internals of this class work by looping over items.
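The positional access described above can be sketched with a small dict subclass. This is an illustration under the assumption that a list key holds integer positions; ListIndexableDict is a hypothetical name, not the SparseSpace implementation:

```python
class ListIndexableDict(dict):
    """Hypothetical dict subclass that also supports positional access."""

    def __getitem__(self, key):
        if isinstance(key, list):
            # A list of integer positions: pick the values at those positions.
            values = list(self.values())
            return [values[i] for i in key]
        if isinstance(key, slice):
            # A slice over the insertion order of the items.
            return list(self.values())[key]
        # Anything else is treated as an ordinary key.
        return super().__getitem__(key)


d = ListIndexableDict()
d[(1, 2, 3)] = "pointA"
d[(1, 2, None)] = "pointB"
d[(4, 5, 6)] = "pointC"

by_list = d[[0, 2]]   # values at positions 0 and 2
by_slice = d[1:]      # values from position 1 onwards
```

Both forms rely on the insertion order of the dict, which is why the ordering guarantee matters.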

In addition, the subspace method (aliased to __call__ and implemented via coords_matching) can be used to select items by the values of a subset of their attributes. It returns a SparseSpace. If there is only a single item, it can be accessed as dct[()].

Inspired by

  • https://stackoverflow.com/a/7728830
  • https://stackoverflow.com/q/3387691

__call__(**kwargs) ⚓︎

Shortcut (syntactic sugar) for xp_process.SparseSpace.subspace.

__getitem__(key) ⚓︎

Also allows list-indexing by list and slice.

__init__(dims) ⚓︎

Usually initialized through xpSpace.from_list.

Parameters:

Name Type Description Default
dims list or tuple

The attributes defining the coordinate system.

required

__setitem__(key, val) ⚓︎

Setitem ensuring coordinate conforms.

append_dim(dim) ⚓︎

Expand self.Coord by dim. For each item, insert None in new dim.
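A rough sketch of the idea, written as a free function on a plain dict. Unlike the method, which presumably modifies the space in place, this hypothetical helper returns a new coordinate type and a new dict:

```python
from collections import namedtuple

Coord = namedtuple("Coord", ["x", "y"])  # illustrative dims
space = {Coord(1, 2): "pointA", Coord(3, 4): "pointB"}


def append_dim(space, coord_type, dim):
    """Return (new_coord_type, new_space) with `dim` appended, set to None."""
    new_type = namedtuple("Coord", coord_type._fields + (dim,))
    # Every existing coordinate gets None in the new, last position.
    new_space = {new_type(*coord, None): val for coord, val in space.items()}
    return new_type, new_space


Coord3, space3 = append_dim(space, Coord, "z")
```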

coord_from_attrs(obj) ⚓︎

Form a coord for this xpSpace by extracting attrs. from obj.

For instances of self.Coord, this is the identity operator, i.e.

self.coord_from_attrs(coord) == coord
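As an illustration, the extraction could look like the following. The fallback to None for missing attributes is an assumption, and Coord and xp are made-up stand-ins:

```python
from collections import namedtuple
from types import SimpleNamespace

Coord = namedtuple("Coord", ["da_method", "N", "seed"])  # illustrative dims


def coord_from_attrs(obj):
    """Build a Coord from the same-named attributes of obj."""
    # Assumption: an attribute that obj lacks is recorded as None.
    return Coord(*(getattr(obj, dim, None) for dim in Coord._fields))


# Any object with matching attributes will do; extra attributes are ignored.
xp = SimpleNamespace(da_method="EnKF", N=10, seed=3, extra="ignored")
coord = coord_from_attrs(xp)
```

Since a namedtuple exposes its fields as attributes, applying the function to coord itself returns an equal coordinate.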

coords_matching(**kwargs) ⚓︎

Get all coords matching kwargs.

Used by xp_process.SparseSpace.label_xSection and xp_process.SparseSpace.subspace. Unlike the latter, this function returns a list of keys of the original subspace.

Note that the missingval shenanigans of xp_launch.xpList.inds are here unnecessary since each coordinate is complete.
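A simplified sketch of the matching, on a plain dict with made-up dims and values. The last step, which re-keys the matches by the dims that were not fixed, is an assumption suggested by the dct[()] remark above:

```python
from collections import namedtuple

Coord = namedtuple("Coord", ["da_method", "infl", "seed"])  # illustrative dims

space = {
    Coord("EnKF", 1.0, 3): "xp0",
    Coord("EnKF", 1.1, 3): "xp1",
    Coord("PF", 1.0, 3): "xp2",
}


def coords_matching(space, **kwargs):
    """Return the keys whose named fields equal all of the given values."""
    return [
        coord
        for coord in space
        if all(getattr(coord, name) == val for name, val in kwargs.items())
    ]


matches = coords_matching(space, da_method="EnKF", seed=3)

# Re-key the matching entries by the remaining (non-fixed) dim only.
subspace = {(c.infl,): space[c] for c in matches}
```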

intersect_dims(attrs) ⚓︎

Remove those a in attrs that are not in self.dims.

This enables sloppy dims allotment, for ease-of-use.
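In essence this is an order-preserving filter. A sketch as a free function, where dims is passed explicitly instead of being read from self.dims:

```python
def intersect_dims(attrs, dims):
    """Keep only those attrs that are actual dims, preserving their order."""
    return [a for a in attrs if a in dims]


dims = ("da_method", "N", "seed")  # illustrative dims

# "infl" is not a dim of this space, so it is silently dropped.
kept = intersect_dims(["seed", "infl", "N"], dims)
```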

label_xSection(label, *NoneAttrs, **sub_coord) ⚓︎

Insert duplicate entries for the given cross-section.

Works by adding the attr. xSection to the dims of SparseSpace, and setting it to label for entries matching sub_coord, reflecting the "constance/constraint/fixation" this represents. This distinguishes the entries in this fixed-affine subspace, preventing them from being gobbled up by the operations of nest.

If you wish, you can specify the NoneAttrs, which are consequently set to None for the duplicated entries, preventing them from being shown in plot labels and tuning panels.

nest(inner_dims=None, outer_dims=None) ⚓︎

Project along inner_dims to yield a new xpSpace with dims outer_dims.

The entries of this xpSpace are themselves xpSpaces, with dims inner_dims, each one regrouping the entries with the same (projected) coordinate.

Note: this method could also be called groupby.

Note: this method is also called by __getitem__(key) if key is a dict.
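The grouping can be sketched with plain dicts and tuples. This is an illustration of the projection only; the real method returns nested xpSpace objects, and the dims and values below are made up:

```python
from collections import namedtuple

Coord = namedtuple("Coord", ["da_method", "N", "seed"])  # illustrative dims
space = {
    Coord("EnKF", 10, 1): 0.5,
    Coord("EnKF", 10, 2): 0.7,
    Coord("PF", 10, 1): 0.9,
}


def nest(space, dims, outer_dims):
    """Group entries by their projection onto outer_dims."""
    inner_dims = [d for d in dims if d not in outer_dims]
    nested = {}
    for coord, val in space.items():
        outer = tuple(getattr(coord, d) for d in outer_dims)
        inner = tuple(getattr(coord, d) for d in inner_dims)
        # Entries sharing the same outer coordinate land in the same group.
        nested.setdefault(outer, {})[inner] = val
    return nested


nested = nest(space, Coord._fields, outer_dims=["da_method"])
```

Each outer key maps to an inner dict whose keys only carry the remaining dims.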

subspace(**kwargs) ⚓︎

Get an affine subspace.

NB: If you're calling this repeatedly (for all values of the same kwargs) then you should consider using xp_process.SparseSpace.nest instead.

Examples:

>>> xp_dict.subspace(da_method="EnKF", infl=1, seed=3)

update(items) ⚓︎

Update dict, using the custom __setitem__ to ensure key conformity.

NB: the kwargs syntax is not supported because it only works for keys that consist of (a single) string, which is not very interesting for SparseSpace.
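The reason a custom update is needed is that the builtin dict.update does not go through an overridden __setitem__. A minimal sketch, where ConformingDict is a hypothetical class that only checks the key length rather than a namedtuple:

```python
class ConformingDict(dict):
    """Hypothetical dict that only accepts tuple keys of a fixed length."""

    def __init__(self, ndim):
        super().__init__()
        self.ndim = ndim

    def __setitem__(self, key, val):
        if not isinstance(key, tuple) or len(key) != self.ndim:
            raise TypeError(f"The key {key!r} does not have {self.ndim} dims")
        super().__setitem__(key, val)

    def update(self, items):
        # Insert one item at a time so that every key passes the check above.
        if hasattr(items, "items"):
            items = items.items()
        for key, val in items:
            self[key] = val


d = ConformingDict(3)
d.update({(1, 2, 3): "pointA"})

try:
    d.update({(1, 2, 3, 4): "pointB"})
    rejected = False
except TypeError:
    rejected = True
```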

xpSpace ⚓︎

Bases: SparseSpace

Functionality to facilitate working with xps and their results.

fill(xps) ⚓︎

Mass insertion.

from_list(xps, tick_ordering=None) ⚓︎

Init. from a list of objects, typically experiments referred to as xps.

  • Computes the relevant dims from the attributes.
  • Fills the dict with the xps.
  • Computes and writes the attribute ticks.

This creates a SparseSpace of xps. However, the nested subspaces generated by xpSpace.table_tree (for printing and plotting) will hold objects of type UncertainQtty, because it calls mean which calls get_stat(statkey).

get_stat(statkey) ⚓︎

Make xpSpace with same Coord as self, but values xp.avrgs.statkey.

make_ticks(dct, ordering=None) ⚓︎

Unique & sort, for each individual "dim" in dct. Assign to self.ticks.

NB: self.ticks will not "propagate" through SparseSpace.nest or the like.
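A simplified sketch of the "unique & sort" step. It returns a new dict instead of assigning to self.ticks, ignores the ordering argument, and assumes natural sort order with None placed last:

```python
def make_ticks(dct):
    """Return the sorted unique values of each dim, with None placed last."""
    ticks = {}
    for name, values in dct.items():
        # The set holds at most one None, so the sort key never has to
        # compare None against another value.
        ticks[name] = sorted(set(values), key=lambda v: (v is None, v))
    return ticks


ticks = make_ticks({
    "N": [20, 10, 20, None],
    "da_method": ["PF", "EnKF", "PF"],
})
```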

mean(dims=None) ⚓︎

Compute mean over dims (a list). Returns xpSpace without those dims.
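The reduction can be sketched as follows, using plain floats and a plain dict. The actual method works on xpSpace objects and, per the notes under from_list, yields UncertainQtty values, so this is only an illustration of how the dims disappear:

```python
from collections import defaultdict, namedtuple
from statistics import fmean

Coord = namedtuple("Coord", ["da_method", "N", "seed"])  # illustrative dims
space = {
    Coord("EnKF", 10, 1): 0.5,
    Coord("EnKF", 10, 2): 0.7,
    Coord("PF", 10, 1): 0.9,
}


def mean(space, coord_type, dims):
    """Average the values over dims; return (reduced_type, reduced_space)."""
    kept = [d for d in coord_type._fields if d not in dims]
    reduced_type = namedtuple("Coord", kept)
    groups = defaultdict(list)
    for coord, val in space.items():
        # Entries that differ only in the averaged dims share a reduced key.
        groups[reduced_type(*(getattr(coord, d) for d in kept))].append(val)
    return reduced_type, {c: fmean(vals) for c, vals in groups.items()}


Reduced, averaged = mean(space, Coord, dims=["seed"])
```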

plot(statkey, dims, get_style=default_styles, fignum=None, figsize=None, panels=None, costfun=None, title1=None, title2=None, unique_labels=True, squeeze_labels=True) ⚓︎

Plot (tables of) results.

Analogously to xpSpace.print, the averages are grouped by dims["inner"], which here plays the role of the x-axis.

The averages can also be grouped by dims["outer"], producing a figure with multiple (columns of) panels.

The optimal points/parameters/attributes are plotted in smaller panels below the main plot. This can be turned off by providing the figure dims through the panels argument.

The parameters statkey, dims, costfun, squeeze_labels are documented in xpSpace.print.

Parameters:

Name Type Description Default
get_style function

A function that takes an object, and returns a dict of line styles, usually as a function of the object's attributes.

default_styles
title1 anything

Figure title (in addition to the defaults).

None
title2 anything

Figure title (in addition to the defaults). Goes on a new line.

None
unique_labels bool

Only show a given line label once, even if it appears in several panels.

True
squeeze_labels bool

Don't include redundant attributes in the labels.

True

print(statkey, dims, subcols=True, decimals=None, costfun=None, squeeze_labels=True, colorize=True, title=None) ⚓︎

Print tables of results.

Parameters:

Name Type Description Default
statkey str

The statistic to extract from the xp.avrgs for each xp. Examples: "rmse.a" (i.e. "err.rms.a"), "rmse.ocean.a", "duration".

required
dims dict

Allots (maps) the dims of xpSpace to different roles in the tables.

  • The "role" outer should list the dims/attributes used to define the splitting of the results into separate tables: one table for each distinct combination of attributes.
  • Similarly, the role inner determines which attributes split a table into its columns.
  • mean lists the attributes over which the mean is taken (for that row & column).
  • optim lists the attributes over which the optimum is searched for (after taking the mean).

Example:

dict(outer='da_method', inner='N', mean='seed',
     optim=('infl','loc_rad'))

Equivalently, use mean=("seed",). It is acceptable to leave this empty: mean=() or mean=None.

required
subcols bool

If True, then subcolumns are added to indicate

  • : the confidence interval. If mean=None is used, this simply reports the value .prec of the statkey, provided this is an UncertainQtty. Otherwise, it is computed as sqrt(var(xps)/N), where xps is the set of statistics gathered over the mean dimensions.
  • *(optim): the optimal point (among all optim attributes), as defined by costfun.
  • : the number of failures (non-finite values) at that point.
  • : the number of successes that go into the value.

True
decimals int

Number of decimals to print. If None, this is determined for each statistic by its uncertainty.

None
costfun str or function

Use 'increasing' (default) or 'decreasing' to indicate that the optimum is defined as the lowest or highest value of the statkey found, respectively.

None
squeeze_labels bool

Don't include redundant attributes in the line labels. Caution: get_style will not be able to access the eliminated attrs.

True
colorize bool

Add color to tables for readability.

True
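For reference, the confidence-interval formula sqrt(var(xps)/N) quoted under subcols, together with the failure and success counts, can be worked through on a few made-up numbers. Using the sample variance (N-1 in the denominator) and taking N as the number of finite values are assumptions; the text above does not specify either:

```python
import math
from statistics import fmean, variance

# Made-up values of one statistic, gathered over the mean dims (e.g. seeds).
values = [0.50, 0.70, 0.60, float("nan")]

# Non-finite values count as failures and are left out of the average.
finite = [v for v in values if math.isfinite(v)]
n_fail = len(values) - len(finite)
n_succ = len(finite)

avg = fmean(finite)
# Assumption: sample variance, divided by the number of successes.
conf = math.sqrt(variance(finite) / n_succ)
```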

squeeze() ⚓︎

Eliminate unnecessary dimensions.

table_tree(statkey, dims, *, costfun=None) ⚓︎

Make hierarchy outer > inner > mean > optim using SparseSpace.nest.

The dimension passed to nest (at each level) is specified by dims. The dimensions of dims['mean'] and dims['optim'] get eliminated by the mean/tune operations. The dims['outer'] and dims['inner'] become the keys for the output hierarchy.

Note

Cannot support multiple statkeys because it's not (obviously) meaningful when optimizing over dims['optim'].

tickz(dim_name) ⚓︎

Dimension (axis) ticks without None.

tune(dims=None, costfun=None) ⚓︎

Get (compile/tabulate) a stat. optimised wrt. tuning params (dims).
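A sketch of such a tuning step on a plain dict. It assumes that costfun is one of the strings 'increasing' or 'decreasing' (as described under xpSpace.print) and that the result should record both the optimal value and the tuning values at which it was found; the dims and numbers are made up:

```python
from collections import namedtuple

Coord = namedtuple("Coord", ["N", "infl"])  # illustrative dims
space = {
    Coord(10, 1.0): 0.9,
    Coord(10, 1.1): 0.6,
    Coord(20, 1.0): 0.5,
    Coord(20, 1.1): 0.7,
}


def tune(space, dims, tuning_dims, costfun="increasing"):
    """For each remaining coordinate, keep the optimal value and its location."""
    # "increasing": lower is better. "decreasing": higher is better.
    sign = {"increasing": 1, "decreasing": -1}[costfun]
    kept = [d for d in dims if d not in tuning_dims]
    best = {}
    for coord, val in space.items():
        key = tuple(getattr(coord, d) for d in kept)
        if key not in best or sign * val < sign * best[key][0]:
            tuned = {d: getattr(coord, d) for d in tuning_dims}
            best[key] = (val, tuned)
    return best


tuned = tune(space, Coord._fields, ["infl"])
```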