API reference

This page provides an auto-generated summary of pyXpcm’s API. For more details and examples, refer to the relevant chapters in the main part of the documentation.

Top-level PCM functions

Creating a PCM

`pcm`(K, features[, scaling, reduction, …])	Profile Classification Model class constructor
`pyxpcm.load_netcdf`(ncfile)	Load a PCM model from netcdf file

Attributes

`pcm.K`	Return the number of classes
`pcm.F`	Return the number of features
`pcm.features`	Return the feature definition dictionary

Computation

`pcm.fit`(self, ds[, features, dim])	Estimate PCM parameters
`pcm.fit_predict`(self, ds[, features, dim, …])	Estimate PCM parameters and predict classes.
`pcm.predict`(self, ds[, features, dim, …])	Predict labels for profile samples
`pcm.predict_proba`(self, ds[, features, dim, …])	Predict posterior probability of each component given the data
`pcm.score`(self, ds[, features, dim])	Compute the per-sample average log-likelihood of the given data
`pcm.bic`(self, ds[, features, dim])	Compute Bayesian information criterion for the current model on the input dataset

Low-level PCM properties and functions

`pcm.timeit`	Return a `pandas.DataFrame` with the execution time of methods called on this instance
`pcm.ravel`(self, da[, dim, feature_name])	Extract from an N-d array a 2-d array X(feature, sample) and the vertical dimension z
`pcm.unravel`(self, ds, sampling_dims, X)	Create a DataArray from a numpy array and sampling dimensions
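
As a rough illustration of what ravelling and unravelling mean here, the pure-Python sketch below flattens a gridded field of profiles into a 2-d table with one row per sample, then puts per-sample results back on the grid. The helper names `ravel_profiles` and `unravel_values` and the nested-list layout are assumptions made for this example; pyXpcm itself works on xarray objects and may order the axes differently.

```python
def ravel_profiles(field):
    """Flatten field[i][j] -> profile into rows of a (sample, level) table.

    Also returns the (i, j) grid index of every sample so that results
    can be mapped back later.
    """
    table, index = [], []
    for i, row in enumerate(field):
        for j, profile in enumerate(row):
            table.append(list(profile))
            index.append((i, j))
    return table, index


def unravel_values(values, index, shape):
    """Put one value per sample back on the original (i, j) grid."""
    n_i, n_j = shape
    grid = [[None] * n_j for _ in range(n_i)]
    for value, (i, j) in zip(values, index):
        grid[i][j] = value
    return grid


# Hypothetical 2 x 2 grid with 2 vertical levels per profile
field = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
table, index = ravel_profiles(field)
labels_on_grid = unravel_values([0, 1, 1, 0], index, (2, 2))
```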

Plotting

`pcm.plot`	Access plotting functions

Plot PCM Contents

`plot.quantile`(m, da[, xlim, classdimname, …])	Plot q-th quantiles of a DataArray for each PCM component
`plot.scaler`(m[, style, plot_kw, subplot_kw])	Plot PCM scalers properties
`plot.reducer`(m[, pcalist, style, maxcols, …])	Plot PCM reducers properties
`plot.preprocessed`(m, ds[, features, dim, n, …])	Plot preprocessed features as pairwise scatter plots
`plot.timeit`(m[, group, split, subplot_kw, …])	Plot PCM registered timing of operations

Tools

`plot.cmap`(m, name[, palette, usage])	Return categorical colormaps
`plot.colorbar`(m[, cmap])	Add a colorbar to the current plot with centered ticks on discrete colors
`plot.subplots`(m[, maxcols, K, subplot_kw])	Return (figure, axis) with one subplot per cluster
`plot.latlongrid`(ax[, dx, dy, fontsize])	Add latitude/longitude grid lines and labels to a cartopy geoaxes

Statistics

`pcm.stat`	Access statistics functions
`stat.quantile`(ds[, q, of, using, outname, …])	Compute q-th quantile of a `xarray.DataArray` for each PCM component
`stat.robustness`(ds[, name, classdimname, …])	Compute classification robustness
`stat.robustness_digit`(ds[, name, …])	Digitize classification robustness

Save/load PCM models

`pcm.to_netcdf`(self, ncfile, **ka)	Save a PCM to a netcdf file
`pyxpcm.load_netcdf`(ncfile)	Load a PCM model from netcdf file

Helper

`tutorial.open_dataset`(name)	Open a dataset from the pyXpcm online data repository (requires internet).

Xarray pyxpcm name space

Provide accessor to enhance interoperability between xarray and pyxpcm.

Provide a scope named pyxpcm as accessor to xarray.Dataset objects.

class pyxpcm.xarray.pyXpcmDataSetAccessor

Class registered under scope pyxpcm to access xarray.Dataset objects.

add(self, da): Add a xarray.DataArray to this xarray.Dataset

bic(self, this_pcm, **kwargs)

Compute Bayesian information criterion for the current model on the input dataset

Only for a GMM classifier

Parameters:

ds: `xarray.Dataset`: The dataset to work with
features: dict(): Definitions of PCM features in the input `xarray.Dataset`. If not specified or set to None, features are identified using `xarray.DataArray` attributes ‘feature_name’.
dim: str: Name of the vertical dimension in the input `xarray.Dataset`

Returns:

bic: float: The lower the better
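
The criterion is not spelled out above. As a minimal sketch, the standard definition is BIC = p ln(n) - 2 ln(L), with p the number of free parameters, n the number of samples and L the maximised likelihood. The function names and the parameter count for a full-covariance Gaussian mixture below are illustrative assumptions, not pyXpcm internals.

```python
import math


def gmm_n_parameters(n_components, n_dims):
    """Free parameters of a full-covariance Gaussian mixture (assumed model)."""
    covariances = n_components * n_dims * (n_dims + 1) // 2  # symmetric matrices
    means = n_components * n_dims
    weights = n_components - 1  # the weights sum to 1
    return covariances + means + weights


def bic_from_loglik(total_log_likelihood, n_parameters, n_samples):
    """Standard BIC: a penalised negative log-likelihood, lower is better."""
    return n_parameters * math.log(n_samples) - 2.0 * total_log_likelihood


# Hypothetical numbers: two candidate models scored on the same 1000 samples.
# The larger model fits slightly better but pays a heavier complexity penalty.
bic_small = bic_from_loglik(-2500.0, gmm_n_parameters(2, 3), 1000)
bic_large = bic_from_loglik(-2490.0, gmm_n_parameters(8, 3), 1000)
```

Comparing the criterion across several values of K on the same dataset is the usual way to use it: the model with the lowest value is preferred.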

drop_all(self): Remove xarray.DataArray variables created with pyXpcm from this xarray.Dataset

feature_dict(self, this_pcm, features=None)

Return dictionary of features for this xarray.Dataset and a PCM

Parameters:

pcm: `pyxpcm.pcmmodel.pcm`
features: dict: Keys are PCM feature names, values are the corresponding `xarray.Dataset` variable names

Returns:

dict(): Dictionary where keys are PCM feature names and values the corresponding `xarray.Dataset` variables
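
The mapping logic can be sketched without xarray by standing in for the dataset with a plain dictionary of variable attributes. The function name `resolve_features`, the `ds_attrs` layout and the choice to raise an error on missing features are assumptions for illustration only; the fallback on the ‘feature_name’ attribute follows the parameter descriptions elsewhere on this page.

```python
def resolve_features(pcm_features, ds_attrs, features=None):
    """Map PCM feature names to dataset variable names.

    pcm_features: list of feature names known to the PCM.
    ds_attrs: {variable name: attribute dict}, a stand-in for a dataset.
    features: optional explicit {PCM feature name: variable name} mapping.
    """
    if features is not None:
        # Explicit mapping: keep only the entries the PCM knows about
        found = {name: features[name] for name in pcm_features if name in features}
    else:
        # Fallback: look for a 'feature_name' attribute on each variable
        found = {}
        for var_name, attrs in ds_attrs.items():
            name = attrs.get("feature_name")
            if name in pcm_features:
                found[name] = var_name
    missing = [name for name in pcm_features if name not in found]
    if missing:
        raise ValueError(f"No dataset variable found for PCM features: {missing}")
    return found


# Hypothetical dataset with two tagged variables and one untagged coordinate
ds_attrs = {
    "TEMP": {"feature_name": "temperature"},
    "PSAL": {"feature_name": "salinity"},
    "LATITUDE": {},
}
mapping = resolve_features(["temperature", "salinity"], ds_attrs)
```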

fit(self, this_pcm, **kwargs)

Estimate PCM parameters

For a PCM, the fit method consists of the following operations:

- pre-processing:
  - interpolation to the feature_axis levels of the model
  - scaling
  - reduction
- estimation of the classifier parameters

Parameters:

ds: `xarray.Dataset`: The dataset to work with
features: dict(): Definitions of PCM features in the input `xarray.Dataset`. If not specified or set to None, features are identified using `xarray.DataArray` attributes ‘feature_name’.
dim: str: Name of the vertical dimension in the input `xarray.Dataset`

Returns:

self
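
The first two pre-processing steps listed above (interpolation and scaling) can be sketched in pure Python as follows. This is only an illustration under simplifying assumptions: the vertical axis is strictly increasing, no extrapolation is attempted, and scaling is a plain standardisation per level. The helper names are hypothetical, and the reduction and classifier steps are left out.

```python
from statistics import mean, pstdev


def interp_profile(z_src, values, z_target):
    """Linearly interpolate one profile onto target levels.

    Assumes z_src is strictly increasing; targets outside the sampled
    range are set to NaN rather than extrapolated.
    """
    out = []
    for z in z_target:
        if z < z_src[0] or z > z_src[-1]:
            out.append(float("nan"))
            continue
        for i in range(len(z_src) - 1):
            z0, z1 = z_src[i], z_src[i + 1]
            if z0 <= z <= z1:
                w = (z - z0) / (z1 - z0)
                out.append(values[i] * (1 - w) + values[i + 1] * w)
                break
    return out


def standardize(samples):
    """Scale each level (column) to zero mean and unit variance across samples."""
    columns = list(zip(*samples))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]  # constant columns are left unscaled
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in samples]


# Hypothetical profile sampled at 0, 10 and 20 and interpolated to 5 and 15
on_axis = interp_profile([0, 10, 20], [20.0, 10.0, 0.0], [5, 15])
scaled = standardize([[1.0, 2.0], [3.0, 2.0]])
```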

fit_predict(self, this_pcm, **kwargs)

Estimate PCM parameters and predict classes.

This method adds these properties to the PCM object:

llh: The log likelihood of the model with regard to new data

Parameters:

ds: xarray.Dataset: The dataset to work with
features: dict(): Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.
dim: str: Name of the vertical dimension in the input xarray.Dataset
inplace: boolean, False by default: If False, return a xarray.DataArray with predicted labels. If True, return the input xarray.Dataset with labels added as a new xarray.DataArray.
name: string (‘PCM_LABELS’): Name of the DataArray holding labels.

Returns:

xarray.DataArray: Component labels (if option ‘inplace’ = False)
or
xarray.Dataset: Input dataset with component labels as a ‘PCM_LABELS’ new xarray.DataArray (if option ‘inplace’ = True)

mask(self, this_pcm, features=None, dim=None)

Create a mask where all PCM features are defined

Create a mask where all feature profiles are not null over the PCM feature axis.

Parameters:

this_pcm : pyxpcm.pcmmodel.pcm
features : dict(): Definitions of this_pcm features in the xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.
dim : str: Name of the vertical dimension in the xarray.Dataset. If not specified or set to None, dim is identified as the xarray.DataArray variables with attributes ‘axis’ set to ‘z’.

Returns:

xarray.DataArray
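
The idea behind the mask can be sketched without xarray: a sample is kept only if every feature profile is free of missing values. The function name `profile_mask` and the list-based layout are assumptions for illustration; the profiles are taken to be already on the PCM feature axis.

```python
import math


def profile_mask(features):
    """Return one boolean per sample: True where all feature profiles are defined.

    features: {feature name: list of profiles}, each profile a list of floats.
    All features are assumed to list their samples in the same order.
    """
    n_samples = len(next(iter(features.values())))
    mask = []
    for i in range(n_samples):
        is_complete = all(
            not math.isnan(value)
            for profiles in features.values()
            for value in profiles[i]
        )
        mask.append(is_complete)
    return mask


# Hypothetical data: sample 1 lacks a temperature value, sample 2 a salinity value
nan = float("nan")
features = {
    "temperature": [[1.0, 2.0], [nan, 2.0], [3.0, 4.0]],
    "salinity": [[35.0, 35.1], [35.0, 35.2], [35.0, nan]],
}
mask = profile_mask(features)
```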

predict(self, this_pcm, inplace=False, **kwargs)

Predict labels for profile samples

This method adds these properties to the PCM object:

llh: The log likelihood of the model with regard to new data

Parameters:

ds: xarray.Dataset: The dataset to work with
features: dict(): Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.
dim: str: Name of the vertical dimension in the input xarray.Dataset
inplace: boolean, False by default: If False, return a xarray.DataArray with predicted labels. If True, return the input xarray.Dataset with labels added as a new xarray.DataArray.
name: str, default is ‘PCM_LABELS’: Name of the xarray.DataArray with labels

Returns:

xarray.DataArray: Component labels (if option ‘inplace’ = False)
or
xarray.Dataset: Input dataset with Component labels as a ‘PCM_LABELS’ new xarray.DataArray (if option ‘inplace’ = True)

predict_proba(self, this_pcm, **kwargs)

Predict posterior probability of each component given the data

This method adds these properties to the PCM instance:

llh: The log likelihood of the model with regard to new data

Parameters:

ds: xarray.Dataset: The dataset to work with
features: dict(): Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.
dim: str: Name of the vertical dimension in the input xarray.Dataset
inplace: boolean, False by default: If False, return a xarray.DataArray with predicted probabilities. If True, return the input xarray.Dataset with probabilities added as a new xarray.DataArray.
name: str, default is ‘PCM_POST’: Name of the DataArray with prediction probability (posteriors)
classdimname: str, default is ‘pcm_class’: Name of the dimension holding classes

Returns:

xarray.DataArray: Probability of each Gaussian (state) in the model given each sample (if option ‘inplace’ = False)
or
xarray.Dataset: Input dataset with Component Probability as a ‘PCM_POST’ new xarray.DataArray (if option ‘inplace’ = True)
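
For a Gaussian mixture, the posterior of component k given a sample x is the weighted density w_k N(x | mu_k, sigma_k) divided by the sum of the same quantity over all components. The sketch below works this out for scalar samples in pure Python; the function names are hypothetical, and the real model operates on reduced multi-dimensional features rather than scalars.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a 1-d normal distribution."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def posteriors(x, weights, mus, sigmas):
    """Posterior probability of each mixture component given one sample x.

    Note: a sample extremely far from every component would underflow to
    a zero total; this sketch does not guard against that case.
    """
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(joint)
    return [j / total for j in joint]


def predict_label(post):
    """Hard label: index of the most probable component."""
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical two-class mixture with well separated means
weights, mus, sigmas = [0.5, 0.5], [0.0, 10.0], [1.0, 1.0]
post_near_first = posteriors(0.5, weights, mus, sigmas)
label = predict_label(post_near_first)
```

Taking the most probable component of each posterior vector gives the hard labels returned by the prediction methods.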

quantile(self, this_pcm, inplace=False, **kwargs)

Compute q-th quantile of a xarray.DataArray for each PCM component

Parameters:

q: float in the range of [0,1] (or sequence of floats): Quantiles to compute, which must be between 0 and 1 inclusive.
of: str: Name of the xarray.Dataset variable to compute quantiles for.
using: str: Name of the xarray.Dataset variable with classification labels to use. Use ‘PCM_LABELS’ by default.
outname: ‘PCM_QUANT’ or str: Name of the xarray.DataArray with quantile
keep_attrs: boolean, False by default: Whether to preserve the attributes of the `of` variable in the new quantile variable.

Returns:

xarray.Dataset with shape (K, n_quantiles, N_z=n_features)
or
xarray.DataArray with shape (K, n_quantiles, N_z=n_features)
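
A per-class quantile amounts to grouping samples by their label and computing the quantiles within each group. The pure-Python sketch below does this for one scalar value per sample, using linear interpolation between order statistics; in the library the same operation applies to whole profiles, level by level. The function names are hypothetical.

```python
def quantile(sorted_values, q):
    """q-th quantile (0 <= q <= 1) with linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def class_quantiles(values, labels, q):
    """Group values by class label and compute the quantiles q within each class."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {
        label: [quantile(sorted(members), qi) for qi in q]
        for label, members in sorted(groups.items())
    }


# Hypothetical values with labels from a two-class model
values = [1.0, 2.0, 3.0, 10.0, 20.0]
labels = [0, 0, 0, 1, 1]
medians = class_quantiles(values, labels, q=[0.5])
```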

robustness(self, this_pcm, inplace=False, **kwargs)

Compute classification robustness

Parameters:

name: str, default is ‘PCM_POST’: Name of the xarray.DataArray with prediction probability (posteriors)
classdimname: str, default is ‘pcm_class’: Name of the dimension holding classes
outname: ‘PCM_ROBUSTNESS’ or str: Name of the xarray.DataArray with robustness
inplace: boolean, False by default: If False, return a xarray.DataArray with robustness. If True, return the input xarray.Dataset with robustness added as a new xarray.DataArray.

Returns:

xarray.Dataset if inplace=True
or
xarray.DataArray if inplace=False
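
The formula for robustness is not given on this page. One plausible definition, assumed here for illustration, rescales the largest posterior so that a uniform posterior (1/K for every class) maps to 0 and a certain classification maps to 1.

```python
def robustness(post):
    """Scaled maximum posterior: 0 for a uniform posterior, 1 for a certain one.

    Assumed formula: (max(p) - 1/K) * K / (K - 1), with K the number of classes.
    """
    k = len(post)
    return (max(post) - 1.0 / k) * k / (k - 1.0)


# Hypothetical posteriors for a four-class model
r_uniform = robustness([0.25, 0.25, 0.25, 0.25])
r_confident = robustness([0.91, 0.03, 0.03, 0.03])
```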

robustness_digit(self, this_pcm, inplace=False, **kwargs)

Digitize classification robustness

Parameters:

ds: xarray.Dataset: Input dataset
name: str, default is ‘PCM_POST’: Name of the xarray.DataArray with prediction probability (posteriors)
classdimname: str, default is ‘pcm_class’: Name of the dimension holding classes
outname: ‘PCM_ROBUSTNESS_CAT’ or str: Name of the xarray.DataArray with robustness categories
inplace: boolean, False by default: If False, return a xarray.DataArray with robustness categories. If True, return the input xarray.Dataset with robustness categories added as a new xarray.DataArray.

Returns:

xarray.Dataset if inplace=True
or
xarray.DataArray if inplace=False
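
Digitizing means binning a continuous robustness value between 0 and 1 into a small number of ordered categories. The bin edges and category names below are assumptions chosen for illustration and are not stated on this page; the sketch only shows the binning mechanics with the standard-library `bisect` module.

```python
from bisect import bisect_right

# Assumed bin edges and labels, for illustration only
BOUNDS = [0.0, 0.33, 0.66, 0.9, 0.99, 1.0]
LEGEND = ["Unlikely", "As likely as not", "Likely", "Very likely", "Virtually certain"]


def robustness_category(r):
    """Return the index of the category whose bin [lower, upper) contains r."""
    idx = bisect_right(BOUNDS, r) - 1
    # Clamp so that r == 1.0 falls in the last category
    return min(max(idx, 0), len(LEGEND) - 1)


category = LEGEND[robustness_category(0.95)]
```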

sampling_dim(self, this_pcm, features=None, dim=None)

Return the list of dimensions to be stacked for sampling

Parameters:

pcm: `pyxpcm.pcm`
features: None (default) or dict(): Keys are PCM feature names, values are the corresponding `xarray.Dataset` variable names. If set to None, all PCM features are used.
dim: None (default) or str(): The `xarray.Dataset` dimension to use as vertical axis in all features. If set to None, it is automatically set to the dimension with an attribute `axis` set to `Z`.

Returns:

dict(): Dictionary where keys are `xarray.Dataset` variable names of features and values are another dictionary with the list of sampling dimensions in the DIM_SAMPLING key and the name of the vertical axis in the DIM_VERTICAL key.
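
The shape of the returned dictionary can be illustrated with a small pure-Python sketch: for each feature variable, every dimension other than the vertical one is treated as a sampling dimension. The function name `sampling_dims` and the `var_dims` input layout are assumptions; only the DIM_SAMPLING and DIM_VERTICAL keys come from the description above.

```python
def sampling_dims(var_dims, vertical_dim):
    """Split the dimensions of each variable into sampling and vertical ones.

    var_dims: {variable name: tuple of dimension names}.
    vertical_dim: name of the vertical dimension shared by all variables.
    """
    out = {}
    for name, dims in var_dims.items():
        if vertical_dim not in dims:
            raise ValueError(f"Variable {name!r} has no dimension {vertical_dim!r}")
        out[name] = {
            "DIM_SAMPLING": [d for d in dims if d != vertical_dim],
            "DIM_VERTICAL": vertical_dim,
        }
    return out


# Hypothetical gridded dataset: every (time, lat, lon) point is one sample
dims = sampling_dims({"TEMP": ("time", "depth", "lat", "lon")}, "depth")
```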

score(self, this_pcm, **kwargs)

Compute the per-sample average log-likelihood of the given data

Parameters:

ds: `xarray.Dataset`: The dataset to work with
features: dict(): Definitions of PCM features in the input `xarray.Dataset`. If not specified or set to None, features are identified using `xarray.DataArray` attributes ‘feature_name’.
dim: str: Name of the vertical dimension in the input `xarray.Dataset`

Returns:

log_likelihood: float: In the case of a GMM classifier, this is the log-likelihood of the Gaussian mixture given the data
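
For a Gaussian mixture, the per-sample average log-likelihood is the mean over samples of log(sum_k w_k N(x | mu_k, sigma_k)). The sketch below evaluates it for scalar samples in pure Python, using the log-sum-exp trick for numerical stability. The function names are hypothetical and the 1-d setting is a simplification.

```python
import math


def log_gaussian(x, mu, sigma):
    """Log-density of a 1-d normal distribution."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def average_log_likelihood(samples, weights, mus, sigmas):
    """Mean over samples of the mixture log-density."""
    total = 0.0
    for x in samples:
        terms = [
            math.log(w) + log_gaussian(x, m, s)
            for w, m, s in zip(weights, mus, sigmas)
        ]
        peak = max(terms)  # log-sum-exp: factor out the largest term
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total / len(samples)


# Hypothetical two-class mixture scored on three samples
score = average_log_likelihood([0.1, 9.8, 5.0], [0.5, 0.5], [0.0, 10.0], [1.0, 1.0])
```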

split(self)

Split pyXpcm variables from the original xarray.Dataset

Returns:

`xarray.Dataset`, `xarray.Dataset`: Two Datasets: one with the pyXpcm variables, the other with the original Dataset