pyxpcm.pcm

class pyxpcm.pcm(K: int, features: {}, scaling=1, reduction=1, maxvar=15, classif='gmm', covariance_type='full', verb=False, debug=False, timeit=False, timeit_verb=False, chunk_size='auto', backend='sklearn')[source]

Profile Classification Model class constructor

Consumes and returns xarray objects

__init__(self, K:int, features:{}, scaling=1, reduction=1, maxvar=15, classif='gmm', covariance_type='full', verb=False, debug=False, timeit=False, timeit_verb=False, chunk_size='auto', backend='sklearn')[source]

Create the PCM instance

Parameters:
K: int

The number of classes (clusters) in the classification model.

features: dict()

The vertical axis to use for each feature, e.g. {'temperature': np.arange(-2000, 0, 1)}

scaling: int (default: 1)

Define the scaling method:

  • 0: No scaling
  • 1: Center on sample mean and scale by sample std
  • 2: Center on sample mean only
reduction: int (default: 1)

Define the dimensionality reduction method:

  • 0: No reduction
  • 1: Reduction using sklearn.decomposition.PCA
maxvar: float (default: 15)

Maximum feature variance to preserve, in %, in the reduced dataset when using sklearn.decomposition.PCA.

classif: str (default: ‘gmm’)

Define the classification method. The only method currently available is the Gaussian Mixture Model; see sklearn.mixture.GaussianMixture for details.

covariance_type: str (default: ‘full’)

Define the type of covariance matrix shape to be used in the default classifier GMM. It can be ‘full’ (default), ‘tied’, ‘diag’ or ‘spherical’.

verb: boolean (default: False)

More verbose output

timeit: boolean (default: False)

Record execution times of operations for performance evaluation

timeit_verb: boolean (default: False)

Print time of operation during execution

chunk_size: ‘auto’ or int (default: ‘auto’)

Sampling chunk size of the feature array after pre-processing

backend: str

Statistic library backend, ‘sklearn’ (default) or ‘dask_ml’

Methods

__init__(self, K, features[, scaling, …]) Create the PCM instance
bic(self, ds[, features, dim]) Compute Bayesian information criterion for the current model on the input dataset
display(self[, deep]) Display detailed parameters of the PCM. This is not get_params because it does not return a dictionary; set the Boolean option ‘deep’ to True to display all properties
fit(self, ds[, features, dim]) Estimate PCM parameters
fit_predict(self, ds[, features, dim, …]) Estimate PCM parameters and predict classes.
predict(self, ds[, features, dim, inplace, name]) Predict labels for profile samples
predict_proba(self, ds[, features, dim, …]) Predict posterior probability of each components given the data
preprocessing(self, ds[, features, dim, …]) Dataset pre-processing of feature(s)
preprocessing_this(self, da[, dim, …]) Pre-process data before anything
ravel(self, da[, dim, feature_name]) Extract from N-d array a X(feature,sample) 2-d array and vertical dimension z
score(self, ds[, features, dim]) Compute the per-sample average log-likelihood of the given data
to_netcdf(self, ncfile, **ka) Save a PCM to a netcdf file
unravel(self, ds, sampling_dims, X) Create a DataArray from a numpy array and sampling dimensions

Attributes

F Return the number of features
K Return the number of classes
backend Return the name of the statistic backend
features Return features definition dictionary
fitstats Estimator fit properties
plot Access plotting functions
stat Access statistics functions
timeit Return a pandas.DataFrame with execution times of methods called on this instance