pyxpcm.pcm

class pyxpcm.pcm(K: int, features: {}, scaling=1, reduction=1, maxvar=15, classif='gmm', covariance_type='full', verb=False, debug=False, timeit=False, timeit_verb=False, chunk_size='auto', backend='sklearn')[source]

Profile Classification Model class constructor

Consumes and returns xarray objects

__init__(self, K:int, features:{}, scaling=1, reduction=1, maxvar=15, classif='gmm', covariance_type='full', verb=False, debug=False, timeit=False, timeit_verb=False, chunk_size='auto', backend='sklearn')[source]

Create the PCM instance

Parameters:
K: int

The number of classes (clusters) in the classification model.

features: dict()

The vertical axis to use for each feature, e.g. {'temperature': np.arange(-2000, 0, 1)}

scaling: int (default: 1)

Define the scaling method:

  • 0: No scaling
  • 1: Center on sample mean and scale by sample std
  • 2: Center on sample mean only
reduction: int (default: 1)

Define the dimensionality reduction method:

  • 0: No reduction
  • 1: Reduction using sklearn.decomposition.PCA
maxvar: float (default: 15)

Maximum feature variance to preserve, in %, in the reduced dataset when using sklearn.decomposition.PCA.

classif: str (default: ‘gmm’)

Define the classification method. The only method currently available is the Gaussian Mixture Model; see sklearn.mixture.GaussianMixture for details.

covariance_type: str (default: ‘full’)

Define the type of covariance matrix shape to be used in the default classifier GMM. It can be ‘full’ (default), ‘tied’, ‘diag’ or ‘spherical’.

verb: boolean (default: False)

More verbose output

timeit: boolean (default: False)

Record execution times of operations for performance evaluation

timeit_verb: boolean (default: False)

Print time of operation during execution

chunk_size: ‘auto’ or int (default: ‘auto’)

Sampling chunk size of the feature array after pre-processing

backend: str

Statistic library backend, ‘sklearn’ (default) or ‘dask_ml’

Methods

__init__(self, K, features[, scaling, …]) Create the PCM instance
bic(self, ds[, features, dim]) Compute Bayesian information criterion for the current model on the input dataset
display(self[, deep]) Display detailed parameters of the PCM. This is not get_params because it does not return a dictionary; set the Boolean option ‘deep’ to True to display all properties
fit(self, ds[, features, dim]) Estimate PCM parameters
fit_predict(self, ds[, features, dim, …]) Estimate PCM parameters and predict classes.
predict(self, ds[, features, dim, inplace, name]) Predict labels for profile samples
predict_proba(self, ds[, features, dim, …]) Predict posterior probability of each components given the data
preprocessing(self, ds[, features, dim, …]) Dataset pre-processing of feature(s)
preprocessing_this(self, da[, dim, …]) Pre-process data before anything
ravel(self, da[, dim, feature_name]) Extract from N-d array a X(feature,sample) 2-d array and vertical dimension z
score(self, ds[, features, dim]) Compute the per-sample average log-likelihood of the given data
to_netcdf(self, ncfile, **ka) Save a PCM to a netcdf file
unravel(self, ds, sampling_dims, X) Create a DataArray from a numpy array and sampling dimensions

Attributes

F Return the number of features
K Return the number of classes
backend Return the name of the statistic backend
features Return features definition dictionary
fitstats Estimator fit properties
plot Access plotting functions
stat Access statistics functions
timeit Return a pandas.DataFrame with execution times of methods called on this instance