
xp_launch⚓︎

Tools (notably xpList) for setup and running of experiments (known as xps).

See da_methods.da_method for the strict definition of xps.

Modules:

dapper : Root package of DAPPER.
pb : Make progbar (wrapper around tqdm) and read1.

xpList ⚓︎

Bases: list

Subclass of list specialized for experiment ("xp") objects.

Main use: administrate experiment launches.

Modifications to list:

  • xpList.append supports unique to enable lazy xp declaration.
  • __iadd__ (+=) supports adding single xps. This is hacky, but convenience is king.
  • __getitem__ supports lists, similar to np.ndarray
  • __repr__: prints the list as rows of a table, where the columns represent attributes whose value is not shared among all xps. Refer to xpList.prep_table for more information.

Add-ons:

  • xpList.launch: run the experiments in current list.
  • xpList.prep_table: find all attributes of the xps in the list; classify as distinct, redundant, or common.
  • xpList.gen_names: use xpList.prep_table to generate a short & unique name for each xp in the list.
  • xpList.tabulate_avrgs: tabulate time-averaged results.
  • xpList.inds to search by kw-attrs.
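The unique-append and += behaviours listed above can be illustrated with a minimal stand-in class (simplified sketch, not the actual DAPPER implementation):

```python
# Stand-in illustrating xpList's unique-append and `+=` conveniences
# (simplified; not the DAPPER source).
class UniqueList(list):
    def __init__(self, *args, unique=False):
        super().__init__(*args)
        self.unique = unique

    def append(self, item):
        # With `unique`, skip duplicates (the linear scan is what
        # makes this "relatively slow").
        if self.unique and item in self:
            return
        super().append(item)

    def __iadd__(self, item):
        # Convenience: `+=` appends a single item.
        self.append(item)
        return self

xps = UniqueList(unique=True)
xps += "EnKF(N=10)"
xps += "EnKF(N=10)"  # duplicate: silently skipped
xps += "Var3D()"
print(xps)  # ['EnKF(N=10)', 'Var3D()']
```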

Parameters:

args : entries
    Nothing, or a list of xps. Default: ().
unique : bool
    If True, duplicates won't get appended. Makes append (and __iadd__) relatively slow. Use extend or __add__ or combinator to bypass this validation. Default: False.

Also see
  • Examples: docs/examples/basic_2, docs/examples/basic_3
  • xp_process.xpSpace, which is used for presenting experiment results, as opposed to this class (xpList), which handles launching experiments.

da_methods ⚓︎

List da_method attributes in this list.

__getitem__(keys) ⚓︎

Indexing, also by a list of indices.

append(xp) ⚓︎

Append xp, unless self.unique is set and xp is already present.

gen_names(abbrev=6, tab=False) ⚓︎

Similar to self.__repr__(), but:

  • returns list of names
  • tabulation is optional
  • attaches (abbreviated) labels to each attribute

inds(strict=True, missingval='NONSENSE', **kws) ⚓︎

Find (all) indices of xps whose attributes match kws.

If strict, then xps lacking a requested attribute will not match, unless missingval matches the required value.
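For illustration, here is a toy version of this lookup over stand-in objects (hypothetical helper; the real method lives on xpList):

```python
from types import SimpleNamespace

def inds(xps, strict=True, missingval="NONSENSE", **kws):
    """Toy version of xpList.inds: indices of xps matching the kw-attrs."""
    def match(xp):
        for k, v in kws.items():
            # If not strict, a missing attribute counts as a match.
            attr = getattr(xp, k, missingval if strict else v)
            if attr != v:
                return False
        return True
    return [i for i, xp in enumerate(xps) if match(xp)]

xps = [SimpleNamespace(da_method="EnKF", N=10),
       SimpleNamespace(da_method="EnKF", N=40),
       SimpleNamespace(da_method="Var3D")]
print(inds(xps, da_method="EnKF"))    # [0, 1]
print(inds(xps, N=40))                # [1]   (strict: xps[2] lacks N)
print(inds(xps, strict=False, N=40))  # [1, 2]
```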

launch(HMM, save_as='noname', mp=False, fail_gently=None, **kwargs) ⚓︎

Essentially: for xp in self: run_experiment(xp, ..., **kwargs).

See run_experiment for documentation on the kwargs and fail_gently. See tools.datafiles.create_run_dir for documentation on save_as.

Depending on mp, run_experiment is delegated as follows:

  • False: caller process (no parallelisation)
  • True or "MP" or an int: multiprocessing on this host
  • "GCP" or "Google" or dict(server="GCP"): the DAPPER server (Google Cloud Computing with HTCondor).
    • Specify a list of files as mp["files"] to include them in working directory of the server workers.
    • In order to use absolute paths, the list should consist of tuples, where the first item is relative to the second (which is an absolute path). The root is then not included in the working directory of the server.
    • If this dict field is empty, then all python files in sys.path[0] are uploaded.

See docs/examples/basic_2.py and docs/examples/basic_3.py for example use.
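As a rough sketch of how an mp argument like the above could be normalised into a backend choice (hypothetical helper, for illustration only; the real dispatch is inside xpList.launch):

```python
# Hypothetical normalisation of an `mp` argument as documented above
# (illustrative only; not DAPPER's actual dispatch code).
def resolve_backend(mp):
    if mp is False or mp is None:
        return {"backend": "serial"}            # caller process
    if mp is True or mp == "MP":
        return {"backend": "multiprocessing", "n_workers": None}
    if isinstance(mp, int):
        return {"backend": "multiprocessing", "n_workers": mp}
    if mp in ("GCP", "Google"):
        mp = {"server": "GCP"}
    if isinstance(mp, dict) and mp.get("server") == "GCP":
        # Empty "files" would mean: upload all python files in sys.path[0].
        return {"backend": "GCP", "files": mp.get("files", [])}
    raise ValueError(f"Unrecognised mp: {mp!r}")

print(resolve_backend(4))  # {'backend': 'multiprocessing', 'n_workers': 4}
```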

prep_table(nomerge=()) ⚓︎

Classify all attrs. of all xps as distinct, redundant, or common.

An attribute of the xps is inserted in one of the 3 dicts as follows: The attribute names become dict keys. If the values of an attribute (collected from all of the xps) are all equal, then the attribute is inserted in common, but only with a single value. If they are all the same or missing, then it is inserted in redundant with a single value. Otherwise, it is inserted in distinct, with its full list of values (filling with None where the attribute was missing in the corresponding xp).

The attrs in distinct are sufficient to (but not generally necessary, since there might exist a subset of attributes that) uniquely identify each xp in the list (the redundant and common can be "squeezed" out). Thus, a table of the xps does not need to list all of the attributes. This function also does the heavy lifting for xp_process.xpSpace.squeeze.

Parameters:

nomerge : list
    Attributes that should always be seen as distinct. Default: ().
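The classification rule can be illustrated with plain dicts standing in for xps (simplified toy; not DAPPER's implementation, and nomerge is omitted):

```python
# Toy illustration of the distinct/redundant/common classification
# performed by xpList.prep_table (simplified; not the DAPPER source).
def classify(dicts):
    distinct, redundant, common = {}, {}, {}
    keys = {k for d in dicts for k in d}
    for k in sorted(keys):
        vals = [d.get(k) for d in dicts]        # None marks a missing attr
        present = [v for v in vals if v is not None]
        if all(v == vals[0] for v in vals):
            common[k] = vals[0]                 # equal in every xp
        elif all(v == present[0] for v in present):
            redundant[k] = present[0]           # equal where present, missing elsewhere
        else:
            distinct[k] = vals                  # varies: keep the full column
    return distinct, redundant, common

xps = [dict(da_method="EnKF", N=10, infl=1.02),
       dict(da_method="EnKF", N=40, infl=1.02),
       dict(da_method="EnKF", N=40)]
d, r, c = classify(xps)
print(d)  # {'N': [10, 40, 40]}
print(r)  # {'infl': 1.02}
print(c)  # {'da_method': 'EnKF'}
```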

combinator(param_dict, **glob_dict) ⚓︎

Mass creation of xp's by combining the value lists in the param_dict.

Returns a function (for_params) that creates all possible combinations of parameters (from their value lists) for a given da_methods.da_method. This is a good deal more efficient than relying on xpList's unique. Parameters that are:

  • not found among the args of the given DA method are ignored by for_params.
  • specified as keywords to for_params get fixed to that value, overriding the corresponding (if any) value list in the param_dict.

Warning

Beware! If, e.g., infl or rot are in param_dict, aimed at the EnKF, but you forget that they are also attributes of some method where you don't actually want to use them (e.g. SVGDF), then you'll create many more xps than you intend.
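The combination logic itself can be sketched with itertools.product. In this toy stand-in the DA method is represented by a set of accepted argument names, whereas the real for_params takes the method class:

```python
# Hedged sketch of combinator-style mass xp creation
# (mimics the idea; not DAPPER's implementation).
from itertools import product

def combinator(param_dict, **glob_dict):
    def for_params(da_method_args, **fixed):
        # Keep only params the DA method accepts; keywords to
        # for_params fix a value and drop the corresponding value list.
        pd = {k: v for k, v in param_dict.items()
              if k in da_method_args and k not in fixed}
        combos = []
        for values in product(*pd.values()):
            xp = dict(glob_dict, **fixed, **dict(zip(pd.keys(), values)))
            combos.append(xp)
        return combos
    return for_params

for_params = combinator(dict(N=[10, 20], infl=[1.0, 1.05]), seed=3000)
xps = for_params({"N", "infl"}, infl=1.02)  # fix infl; only N varies
print(len(xps))  # 2
```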

run_experiment(xp, label, savedir, HMM, setup=seed_and_simulate, free=True, statkeys=False, fail_gently=False, **stat_kwargs) ⚓︎

Used by xp_launch.xpList.launch to run each single (DA) experiment ("xp").

This involves steps similar to docs/examples/basic_1.py, i.e.:

  • setup : Initialize experiment.
  • xp.assimilate : run DA, pass on exception if fail_gently
  • xp.stats.average_in_time : result averaging
  • xp.avrgs.tabulate : result printing
  • dill.dump : result storage

Parameters:

xp : object (required)
    Type: a da_methods.da_method-decorated class.
label : str (required)
    Name attached to progressbar during assimilation.
savedir : str (required)
    Path of folder wherein to store the experiment data.
HMM : HiddenMarkovModel (required)
    Container defining the system.
free : bool
    Whether (or not) to del xp.stats after the experiment is done, so as to free up memory and/or not save this data (just keeping xp.avrgs). Default: True.
statkeys : list
    A list of names (possibly in the form of abbreviations) of the statistical averages that should be printed immediately after this xp. Default: False.
fail_gently : bool
    Whether (or not) to propagate exceptions. Default: False.
setup : function
    This function must take two arguments: HMM and xp, and return the HMM to be used by the DA methods (typically the same as the input HMM, but possibly modified), along with the (typically synthetic) truth and obs time series.

This gives you the ability to customize almost any aspect of the individual experiments within a batch launch of experiments (i.e. not just the parameters of the DA method). Typically you will grab one or more parameter values stored in the xp (see da_methods.da_method) and act on them, usually by assigning them to some object that impacts the experiment. Thus, by generating a new xp for each such parameter value, you can investigate the impact/sensitivity of the results to this parameter. Examples include:

  • Setting the seed. See the default setup, namely seed_and_simulate, for how this is, or should be, done.
  • Setting some aspect of the HMM such as the observation noise, or the interval between observations. This could be achieved for example by:

    def setup(hmm, xp):
        hmm.Obs.noise = GaussRV(M=hmm.Nx, C=xp.obs_noise)
        hmm.tseq.dkObs = xp.time_between_obs
        import dapper as dpr
        return dpr.seed_and_simulate(hmm, xp)
    

    This process could involve more steps, for example loading a full covariance matrix from a data file, as specified by the obs_noise parameter, before assigning it to C. Also note that the import statement is not strictly necessary (assuming dapper was already imported in the outer scope, typically the main script), except when running the experiments on a remote server.

    Sometimes, the parameter you want to set is not accessible as one of the conventional attributes of the HMM. For example, the Force in the Lorenz-96 model. In that case you can add these lines to the setup function:

    import dapper.mods.Lorenz96 as core
    core.Force = xp.the_force_parameter
    

    However, if your model is an OOP instance, the import approach will not work because it will serve you the original model instance, while setup() deals with a copy of it. Instead, you could re-initialize the entire model in setup() and overwrite HMM.Dyn. However, it is probably easier to just assign the instance to some custom attribute before launching the experiments, e.g. HMM.Dyn.object = the_model_instance, enabling you to set parameters on HMM.Dyn.object in setup(). Note that this approach won't work for modules (for ex., combining the above examples, HMM.Dyn.object = core) because modules are not serializable.

  • Using a different HMM entirely for the truth/obs (xx/yy) generation, than the one that will be used by the DA. Or loading the truth/obs time series from file. In both cases, you might also have to do some cropping or slicing of xx and yy before returning them.
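For the last case, a hedged sketch (the file name, the xp.truth_file attribute, and the pickled layout are all hypothetical):

```python
import pickle

def setup_from_file(hmm, xp):
    # Load pre-generated truth/obs instead of simulating them.
    with open(xp.truth_file, "rb") as f:   # xp.truth_file: hypothetical attribute
        xx, yy = pickle.load(f)
    # Crop to this experiment's time sequence: K+1 truth states, Ko+1 obs
    # (assumed layout; adapt to however the series were stored).
    return hmm, xx[: hmm.tseq.K + 1], yy[: hmm.tseq.Ko + 1]
```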

Default: seed_and_simulate.

seed_and_simulate(HMM, xp) ⚓︎

Default experiment setup (sets seed and simulates truth and obs).

Used by xp_launch.xpList.launch via xp_launch.run_experiment.

Parameters:

HMM : HiddenMarkovModel (required)
    Container defining the system.
xp : object (required)
    Type: a da_methods.da_method-decorated class.
    xp.seed should be set (and be an int).
    Without xp.seed the seed does not get set, and different xps will use different seeds (unless you do some funky hacking). Reproducibility for a script as a whole can still be achieved by setting the seed at the outset of the script. To avoid even that, set xp.seed to None or "clock".

Returns:

tuple : (xx, yy)
    The simulated truth and observations.
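The seeding logic described above can be sketched as follows (stand-in simulation with the stdlib random module; not the actual DAPPER source, which uses numpy's RNG):

```python
import random

_MISSING = object()

def seed_and_simulate_sketch(HMM, xp):
    seed = getattr(xp, "seed", _MISSING)
    if seed is _MISSING:
        pass                      # seed not set: RNG state left alone
    elif seed in (None, "clock"):
        random.seed()             # explicitly non-reproducible: reseed from entropy
    else:
        random.seed(seed)         # reproducible per-xp
    xx = [random.random() for _ in range(5)]       # stand-in "truth"
    yy = [x + 0.1 * random.random() for x in xx]   # stand-in "obs"
    return xx, yy
```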