API reference

This page provides an auto-generated summary of pyXpcm’s API. For more details and examples, refer to the relevant chapters in the main part of the documentation.

Top-level PCM functions

Creating a PCM

pcm(K, features[, scaling, reduction, …]) Profile Classification Model class constructor
pyxpcm.load_netcdf(ncfile) Load a PCM model from netcdf file

Attributes

pcm.K Return the number of classes
pcm.F Return the number of features
pcm.features Return features definition dictionary

Computation

pcm.fit(self, ds[, features, dim]) Estimate PCM parameters
pcm.fit_predict(self, ds[, features, dim, …]) Estimate PCM parameters and predict classes.
pcm.predict(self, ds[, features, dim, …]) Predict labels for profile samples
pcm.predict_proba(self, ds[, features, dim, …]) Predict posterior probability of each component given the data
pcm.score(self, ds[, features, dim]) Compute the per-sample average log-likelihood of the given data
pcm.bic(self, ds[, features, dim]) Compute Bayesian information criterion for the current model on the input dataset

Low-level PCM properties and functions

pcm.timeit Return a pandas.DataFrame with the execution time of methods called on this instance
pcm.ravel(self, da[, dim, feature_name]) Extract from an N-d array a 2-d array X(feature, sample) and the vertical dimension z
pcm.unravel(self, ds, sampling_dims, X) Create a DataArray from a numpy array and sampling dimensions

Plotting

pcm.plot Access plotting functions

Plot PCM Contents

plot.quantile(m, da[, xlim, classdimname, …]) Plot q-th quantiles of a DataArray for each PCM component
plot.scaler(m[, style, plot_kw, subplot_kw]) Plot PCM scalers properties
plot.reducer(m[, pcalist, style, maxcols, …]) Plot PCM reducers properties
plot.preprocessed(m, ds[, features, dim, n, …]) Plot preprocessed features as pairwise scatter plots
plot.timeit(m[, group, split, subplot_kw, …]) Plot PCM registered timing of operations

Tools

plot.cmap(m, name[, palette, usage]) Return categorical colormaps
plot.colorbar(m[, cmap]) Add a colorbar to the current plot with centered ticks on discrete colors
plot.subplots(m[, maxcols, K, subplot_kw]) Return (figure, axis) with one subplot per cluster
plot.latlongrid(ax[, dx, dy, fontsize]) Add latitude/longitude grid line and labels to a cartopy geoaxes

Statistics

pcm.stat Access statistics functions
stat.quantile(ds[, q, of, using, outname, …]) Compute q-th quantile of a xarray.DataArray for each PCM component
stat.robustness(ds[, name, classdimname, …]) Compute classification robustness
stat.robustness_digit(ds[, name, …]) Digitize classification robustness

Save/load PCM models

pcm.to_netcdf(self, ncfile, **ka) Save a PCM to a netcdf file
pyxpcm.load_netcdf(ncfile) Load a PCM model from netcdf file

Helper

tutorial.open_dataset(name) Open a dataset from the pyXpcm online data repository (requires internet).

Xarray pyxpcm name space

Provide accessor to enhance interoperability between xarray and pyxpcm.

Provide a scope named pyxpcm as accessor to xarray.Dataset objects.

class pyxpcm.xarray.pyXpcmDataSetAccessor

Class registered under scope pyxpcm to access xarray.Dataset objects.

add(self, da)

Add a xarray.DataArray to this xarray.Dataset

bic(self, this_pcm, **kwargs)

Compute Bayesian information criterion for the current model on the input dataset

Only for a GMM classifier

Parameters:
ds: xarray.Dataset

The dataset to work with

features: dict()

Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.

dim: str

Name of the vertical dimension in the input xarray.Dataset

Returns:
bic: float

The lower the better
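
For reference, the Bayesian information criterion is conventionally defined as p·ln(N) − 2·ln(L), where p is the number of free parameters, N the number of samples and L the maximized likelihood. The sketch below applies that textbook definition in plain Python; it does not call pyXpcm, and the function name and the numbers are illustrative assumptions.

```python
import math

def bic_from_log_likelihood(total_log_likelihood, n_parameters, n_samples):
    """Textbook BIC: p * ln(N) - 2 * ln(L). Lower values are better."""
    return n_parameters * math.log(n_samples) - 2.0 * total_log_likelihood

# Two hypothetical models scored on the same 1000 samples:
# the larger model fits slightly better but pays a bigger complexity penalty.
simple_model = bic_from_log_likelihood(-2500.0, n_parameters=10, n_samples=1000)
complex_model = bic_from_log_likelihood(-2490.0, n_parameters=40, n_samples=1000)

best = "simple" if simple_model < complex_model else "complex"
```

In practice the criterion is typically evaluated for a range of K values and the model with the lowest value is preferred.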

drop_all(self)

Remove xarray.DataArray created with pyXpcm from this xarray.Dataset

feature_dict(self, this_pcm, features=None)

Return dictionary of features for this xarray.Dataset and a PCM

Parameters:
pcm : pyxpcm.pcmmodel.pcm
features : dict

Keys are PCM feature names, values are the corresponding xarray.Dataset variable names

Returns:
dict()

Dictionary where keys are PCM feature names and values the corresponding xarray.Dataset variables

fit(self, this_pcm, **kwargs)

Estimate PCM parameters

For a PCM, the fit method consists in the following operations:

  • pre-processing
    • interpolation to the feature_axis levels of the model
    • scaling
    • reduction
  • estimate classifier parameters
Parameters:
ds: xarray.Dataset

The dataset to work with

features: dict()

Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.

dim: str

Name of the vertical dimension in the input xarray.Dataset

Returns:
self
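
The first two pre-processing steps listed above can be illustrated on toy data. The sketch below is plain Python and does not call pyXpcm: it linearly interpolates hypothetical temperature profiles onto a made-up feature axis, then standardizes each level across samples. The reduction and classifier steps are omitted, and all helper names and values are assumptions made for illustration.

```python
from statistics import mean, pstdev

def interp_profile(levels, values, target_levels):
    """Linearly interpolate one profile onto target levels.

    Assumes at least two levels, sorted in increasing order.
    """
    out = []
    for z in target_levels:
        if z < levels[0] or z > levels[-1]:
            out.append(float("nan"))  # no extrapolation outside the sampled range
            continue
        for i in range(len(levels) - 1):
            z0, z1 = levels[i], levels[i + 1]
            if z0 <= z <= z1:
                w = (z - z0) / (z1 - z0)
                out.append(values[i] * (1 - w) + values[i + 1] * w)
                break
    return out

def standardize(columns):
    """Scale each level (column) to zero mean and unit variance across samples."""
    scaled = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        scaled.append([(v - mu) / sigma for v in col])
    return scaled

# Hypothetical feature axis and three profiles sampled on different levels
feature_axis = [10.0, 50.0, 100.0]
profiles = [
    ([0.0, 100.0, 200.0], [20.0, 15.0, 10.0]),
    ([5.0, 60.0, 150.0], [18.0, 12.0, 6.0]),
    ([0.0, 120.0], [22.0, 10.0]),
]

# One row per sample, one column per feature-axis level
X = [interp_profile(z, t, feature_axis) for z, t in profiles]
# Transpose so that each inner list holds one level across all samples
X_scaled = standardize([list(col) for col in zip(*X)])
```
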
fit_predict(self, this_pcm, **kwargs)

Estimate PCM parameters and predict classes.

This method adds these properties to the PCM object:

  • llh: The log likelihood of the model with regard to new data
Parameters:
ds: xarray.Dataset

The dataset to work with

features: dict()

Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.

dim: str

Name of the vertical dimension in the input xarray.Dataset

inplace: boolean, False by default

If False, return a xarray.DataArray with predicted labels. If True, return the input xarray.Dataset with labels added as a new xarray.DataArray.

name: string (‘PCM_LABELS’)

Name of the DataArray holding labels.

Returns:
xarray.DataArray

Component labels (if option ‘inplace’ = False)

or
xarray.Dataset

Input dataset with component labels as a ‘PCM_LABELS’ new xarray.DataArray (if option ‘inplace’ = True)

mask(self, this_pcm, features=None, dim=None)

Create a mask where all PCM features are defined

Create a mask where all feature profiles are not null over the PCM feature axis.

Parameters:
this_pcm : pyxpcm.pcmmodel.pcm
features : dict()

Definitions of this_pcm features in the xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.

dim : str

Name of the vertical dimension in the xarray.Dataset. If not specified or set to None, dim is identified as the xarray.DataArray variables with attributes ‘axis’ set to ‘z’.

Returns:
xarray.DataArray
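
The idea of a mask that is true only where every feature profile is fully defined can be sketched without xarray. The example below is a plain-Python illustration on nested lists; the helper names and the sample values are hypothetical, and it is not the pyXpcm implementation.

```python
import math

def profile_is_defined(profile):
    """True if every value along the vertical axis is a real number."""
    return all(v is not None and not math.isnan(v) for v in profile)

def all_features_mask(features):
    """features: dict of feature name -> list of profiles (one per sample)."""
    n_samples = len(next(iter(features.values())))
    return [
        all(profile_is_defined(profiles[i]) for profiles in features.values())
        for i in range(n_samples)
    ]

nan = float("nan")
features = {
    "temperature": [[20.0, 15.0], [18.0, nan], [22.0, 10.0]],
    "salinity": [[35.0, 34.8], [35.1, 34.9], [nan, 34.7]],
}

# Only the first sample has complete temperature and salinity profiles
mask = all_features_mask(features)
```
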
predict(self, this_pcm, inplace=False, **kwargs)

Predict labels for profile samples

This method adds these properties to the PCM object:

  • llh: The log likelihood of the model with regard to new data
Parameters:
ds: xarray.Dataset

The dataset to work with

features: dict()

Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.

dim: str

Name of the vertical dimension in the input xarray.Dataset

inplace: boolean, False by default

If False, return a xarray.DataArray with predicted labels. If True, return the input xarray.Dataset with labels added as a new xarray.DataArray.

name: str, default is ‘PCM_LABELS’

Name of the xarray.DataArray with labels

Returns:
xarray.DataArray

Component labels (if option ‘inplace’ = False)

or
xarray.Dataset

Input dataset with Component labels as a ‘PCM_LABELS’ new xarray.DataArray (if option ‘inplace’ = True)

predict_proba(self, this_pcm, **kwargs)

Predict posterior probability of each component given the data

This method adds these properties to the PCM instance:

  • llh: The log likelihood of the model with regard to new data
Parameters:
ds: xarray.Dataset

The dataset to work with

features: dict()

Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.

dim: str

Name of the vertical dimension in the input xarray.Dataset

inplace: boolean, False by default

If False, return a xarray.DataArray with predicted probabilities. If True, return the input xarray.Dataset with probabilities added as a new xarray.DataArray.

name: str, default is ‘PCM_POST’

Name of the DataArray with prediction probability (posteriors)

classdimname: str, default is ‘pcm_class’

Name of the dimension holding classes

Returns:
xarray.DataArray

Probability of each Gaussian (state) in the model given each sample (if option ‘inplace’ = False)

or
xarray.Dataset

Input dataset with Component Probability as a ‘PCM_POST’ new xarray.DataArray (if option ‘inplace’ = True)
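
For a Gaussian mixture, the posterior probability of a component given a sample follows from Bayes' rule. The sketch below shows this for a hypothetical one-dimensional, two-component mixture in plain Python; the weights, means and standard deviations are made up, and a real PCM works on multi-dimensional pre-processed profiles.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D normal distribution."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posteriors(x, weights, means, sigmas):
    """Bayes' rule: p(k | x) = w_k N(x | mu_k, s_k) / sum_j w_j N(x | mu_j, s_j)."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical mixture parameters
weights, means, sigmas = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]

post = posteriors(1.0, weights, means, sigmas)
# The predicted label is the component with the largest posterior
label = max(range(len(post)), key=post.__getitem__)
```

The posteriors of one sample sum to 1 along the class dimension, which is why the result carries one extra dimension compared with the labels.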

quantile(self, this_pcm, inplace=False, **kwargs)

Compute q-th quantile of a xarray.DataArray for each PCM component

Parameters:
q: float in the range of [0,1] (or sequence of floats)

Quantiles to compute, which must be between 0 and 1 inclusive.

of: str

Name of the xarray.Dataset variable to compute quantiles for.

using: str

Name of the xarray.Dataset variable with classification labels to use. Use ‘PCM_LABELS’ by default.

outname: ‘PCM_QUANT’ or str

Name of the xarray.DataArray with quantile

keep_attrs: boolean, False by default

Whether or not to preserve the xarray.Dataset attributes in the new quantile variable.

Returns:
xarray.Dataset with shape (K, n_quantiles, N_z=n_features)
or
xarray.DataArray with shape (K, n_quantiles, N_z=n_features)
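
Conceptually, per-class quantiles group the samples by label and compute the quantiles within each group. The sketch below does this in plain Python for a single vertical level, using linear interpolation between order statistics; the variable names and values are hypothetical, and the real method operates on whole profiles.

```python
def quantile(values, q):
    """q-th quantile (0 <= q <= 1) with linear interpolation between order statistics."""
    data = sorted(values)
    pos = q * (len(data) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def quantiles_by_class(values, labels, qs):
    """Return {class label: [quantile for each q in qs]}."""
    groups = {}
    for v, k in zip(values, labels):
        groups.setdefault(k, []).append(v)
    return {k: [quantile(vs, q) for q in qs] for k, vs in sorted(groups.items())}

# Hypothetical temperatures at one level, with their class labels
temperature = [10.0, 12.0, 14.0, 20.0, 22.0, 24.0, 26.0]
labels = [0, 0, 0, 1, 1, 1, 1]

result = quantiles_by_class(temperature, labels, qs=[0.05, 0.5, 0.95])
```
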
robustness(self, this_pcm, inplace=False, **kwargs)

Compute classification robustness

Parameters:
name: str, default is ‘PCM_POST’

Name of the xarray.DataArray with prediction probability (posteriors)

classdimname: str, default is ‘pcm_class’

Name of the dimension holding classes

outname: ‘PCM_ROBUSTNESS’ or str

Name of the xarray.DataArray with robustness

inplace: boolean, False by default

If False, return a xarray.DataArray with robustness. If True, return the input xarray.Dataset with robustness added as a new xarray.DataArray.

Returns:
xarray.Dataset if inplace=True
or
xarray.DataArray if inplace=False
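
One way to derive a robustness score from posteriors is to rescale the largest posterior so that a uniform posterior (1/K) maps to 0 and a certain classification maps to 1. The sketch below assumes that definition; whether it matches the pyXpcm computation exactly is an assumption, and the function name is illustrative.

```python
def robustness(posteriors):
    """Rescale the largest posterior from [1/K, 1] to [0, 1] (assumed definition)."""
    k = len(posteriors)
    return (max(posteriors) - 1.0 / k) * k / (k - 1.0)

uniform = robustness([0.25, 0.25, 0.25, 0.25])   # no preferred class
certain = robustness([1.0, 0.0, 0.0, 0.0])       # one class only
partial = robustness([0.7, 0.1, 0.1, 0.1])       # in between
```
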
robustness_digit(self, this_pcm, inplace=False, **kwargs)

Digitize classification robustness

Parameters:
ds: xarray.Dataset

Input dataset

name: str, default is ‘PCM_POST’

Name of the xarray.DataArray with prediction probability (posteriors)

classdimname: str, default is ‘pcm_class’

Name of the dimension holding classes

outname: ‘PCM_ROBUSTNESS_CAT’ or str

Name of the xarray.DataArray with robustness categories

inplace: boolean, False by default

If False, return a xarray.DataArray with robustness categories. If True, return the input xarray.Dataset with robustness categories added as a new xarray.DataArray.

Returns:
xarray.Dataset if inplace=True
or
xarray.DataArray if inplace=False
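
Digitizing means mapping a continuous robustness value onto a small set of ordered categories. The sketch below uses the standard-library bisect module; the bin edges and category names are hypothetical choices for illustration and are not taken from pyXpcm.

```python
from bisect import bisect_right

# Hypothetical bin edges and category names, chosen for illustration only
EDGES = [0.33, 0.66, 0.90, 0.99]
NAMES = ["Unlikely", "As likely as not", "Likely", "Very likely", "Virtually certain"]

def digitize(robustness_value):
    """Return the index of the category that contains the value."""
    return bisect_right(EDGES, robustness_value)

categories = [digitize(r) for r in [0.10, 0.50, 0.70, 0.95, 1.00]]
names = [NAMES[c] for c in categories]
```
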
sampling_dim(self, this_pcm, features=None, dim=None)

Return the list of dimensions to be stacked for sampling

Parameters:
pcm : pyxpcm.pcm
features : None (default) or dict()

Keys are PCM feature names, values are the corresponding xarray.Dataset variable names. If set to None, all PCM features are used.

dim : None (default) or str()

The xarray.Dataset dimension to use as vertical axis in all features. If set to None, it is automatically set to the dimension with an attribute axis set to Z.

Returns:
dict()

Dictionary where keys are xarray.Dataset variable names of features and values are another dictionary with the list of sampling dimensions in the DIM_SAMPLING key and the name of the vertical axis in the DIM_VERTICAL key.
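
The shape of the returned dictionary can be sketched in plain Python: for each variable, every dimension other than the vertical one is a sampling dimension. The function below is a hypothetical stand-in working on tuples of dimension names, not the accessor itself; the variable and dimension names are made up.

```python
def sampling_dims(variable_dims, vertical_dim):
    """variable_dims: dict of variable name -> tuple of dimension names."""
    out = {}
    for name, dims in variable_dims.items():
        if vertical_dim not in dims:
            raise ValueError(f"{name} has no dimension named {vertical_dim!r}")
        out[name] = {
            "DIM_SAMPLING": [d for d in dims if d != vertical_dim],
            "DIM_VERTICAL": vertical_dim,
        }
    return out

# Hypothetical gridded dataset layout
dims = {
    "TEMP": ("time", "lat", "lon", "depth"),
    "PSAL": ("time", "lat", "lon", "depth"),
}
result = sampling_dims(dims, vertical_dim="depth")
```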

score(self, this_pcm, **kwargs)

Compute the per-sample average log-likelihood of the given data

Parameters:
ds: xarray.Dataset

The dataset to work with

features: dict()

Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.

dim: str

Name of the vertical dimension in the input xarray.Dataset

Returns:
log_likelihood: float

In the case of a GMM classifier, this is the log likelihood of the Gaussian mixture given the data

split(self)

Split pyXpcm variables from the original xarray.Dataset

Returns:
xarray.Dataset, xarray.Dataset

Two datasets: one with the pyXpcm variables, one with the original dataset