pyxpcm.models.pcm

class pyxpcm.models.pcm(K: int, features: {}, scaling=1, reduction=1, maxvar=15, classif='gmm', covariance_type='full', verb=False, debug=False, timeit=False, timeit_verb=False, chunk_size='auto', backend='sklearn')

Profile Classification Model class constructor

Consume and return xarray objects

__init__(K: int, features: {}, scaling=1, reduction=1, maxvar=15, classif='gmm', covariance_type='full', verb=False, debug=False, timeit=False, timeit_verb=False, chunk_size='auto', backend='sklearn')

Create the PCM instance

Parameters:
K: int

The number of classes (clusters) in the classification model.

features: dict

The vertical axis to use for each feature, e.g. {'temperature': np.arange(-2000, 0, 1)}

scaling: int (default: 1)

Define the scaling method:

  • 0: No scaling

  • 1: Center on sample mean and scale by sample std

  • 2: Center on sample mean only

reduction: int (default: 1)

Define the dimensionality reduction method:

  • 0: No reduction

  • 1: Reduction using sklearn.decomposition.PCA

maxvar: float (default: 15)

Maximum feature variance to preserve, in %, in the dataset reduced with sklearn.decomposition.PCA.

classif: str (default: 'gmm')

Define the classification method. The only method currently available is a Gaussian Mixture Model; see sklearn.mixture.GaussianMixture for more details.

covariance_type: str (default: 'full')

Define the shape of the covariance matrix used by the default GMM classifier. It can be 'full' (default), 'tied', 'diag' or 'spherical'.

verb: boolean (default: False)

More verbose output

debug: boolean (default: False)

Turn on debug output

timeit: boolean (default: False)

Record the execution time of operations for performance evaluation

timeit_verb: boolean (default: False)

Print the execution time of operations as they run

chunk_size: 'auto' or int (default: 'auto')

Sampling chunk size of the array of features after pre-processing

backend: str (default: 'sklearn')

Statistics library backend: 'sklearn' or 'dask_ml'
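
For example, a minimal construction sketch based on the signature above (the feature names and vertical axis are illustrative choices, not required values):

>>> import numpy as np
>>> from pyxpcm.models import pcm
>>> z = np.arange(0., -2000., -10.)  # vertical axis, 0 to -1990 m, negative downward
>>> m = pcm(K=8, features={'temperature': z, 'salinity': z})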

Methods

__init__(K, features[, scaling, reduction, ...])

Create the PCM instance

bic(ds[, features, dim])

Compute Bayesian information criterion for the current model on the input dataset

display([deep])

Display detailed parameters of the PCM. Unlike get_params, this does not return a dictionary. Set the Boolean option 'deep' to True to display all properties.

fit(ds[, features, dim])

Estimate PCM parameters (see the usage sketch after this list of methods)

fit_predict(ds[, features, dim, inplace, name])

Estimate PCM parameters and predict classes.

predict(ds[, features, dim, inplace, name])

Predict labels for profile samples

predict_proba(ds[, features, dim, inplace, ...])

Predict the posterior probability of each component given the data

preprocessing(ds[, features, dim, action, mask])

Dataset pre-processing of feature(s)

preprocessing_this(da[, dim, feature_name, ...])

Pre-process a single feature DataArray before anything else

ravel(da[, dim, feature_name])

Extract from an N-d array a 2-d X(feature, sample) array and the vertical dimension z

score(ds[, features, dim])

Compute the per-sample average log-likelihood of the given data

to_netcdf(ncfile, **ka)

Save the PCM to a netCDF file

unravel(ds, sampling_dims, X)

Create a DataArray from a numpy array and sampling dimensions
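
A sketch of a typical workflow chaining the methods above, continuing from the constructor example; here ds is assumed to be an xarray.Dataset holding 'TEMP' and 'PSAL' variables along a 'DEPTH' dimension (these variable and dimension names are assumptions for illustration):

>>> features_in_ds = {'temperature': 'TEMP', 'salinity': 'PSAL'}       # map PCM features to dataset variables
>>> m.fit(ds, features=features_in_ds, dim='DEPTH')                    # estimate model parameters
>>> labels = m.predict(ds, features=features_in_ds, dim='DEPTH')       # class label per profile
>>> proba = m.predict_proba(ds, features=features_in_ds, dim='DEPTH')  # posterior probabilities
>>> m.bic(ds, features=features_in_ds, dim='DEPTH')                    # model selection criterion
>>> m.to_netcdf('my_pcm.nc')                                           # save the fitted model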

Attributes

F

Return the number of features

K

Return the number of classes

backend

Return the name of the statistics backend

features

Return the features definition dictionary

fitstats

Estimator fit properties

plot

Access plotting functions

stat

Access statistics functions

timeit

Return a pandas.DataFrame with the execution time of each method called on this instance
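
For instance, after fitting the two-feature, K=8 model sketched above (values shown assume that example):

>>> m.K        # number of classes
8
>>> m.F        # number of features
2
>>> m.backend
'sklearn'
>>> m.timeit   # execution times, populated when the PCM was created with timeit=True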