pyXpcm: Ocean Profile Classification Model

pyXpcm is a Python package to create and work with ocean Profile Classification Models (PCM). It consumes and produces Xarray objects, which are N-D labeled arrays and datasets in Python.

An ocean Profile Classification Model automatically groups ocean profiles into clusters according to the similarity of their vertical structure. The geospatial properties of these clusters can be used to address a large variety of oceanographic problems: front detection, water mass identification, natural region contouring (gyres, eddies), reference profile selection for QC validation, etc. The vertical structure of these clusters furthermore provides a highly synthetic representation of large ocean areas that can be used for dimensionality reduction and coherent intercomparisons of ocean data (re)analyses or simulations.

Documentation

Getting Started

Overview

What is an ocean PCM?

An ocean PCM is a Profile Classification Model for ocean data, a statistical procedure to classify ocean vertical profiles into a finite set of “clusters”. Depending on the dataset, such clusters can show space/time coherence that can be used in many different ways to study the ocean.

Statistical method

The method consists in conducting an unsupervised classification (clustering) of vertical profiles of one or more ocean variables.

Each level of the vertical axis of each ocean variable is considered a feature. One ocean vertical profile, with all its ocean variables, is considered a sample.
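
As a rough illustration of this vocabulary, the sketch below builds a sample-by-feature matrix from synthetic profiles with NumPy. The array names and sizes are made up for the example and this is not pyXpcm code.

```python
import numpy as np

rng = np.random.default_rng(0)

n_profiles = 5   # number of samples (hypothetical)
n_levels = 4     # number of vertical levels per variable (hypothetical)

# Synthetic profiles: one row per profile, one column per vertical level
temperature = rng.normal(loc=15.0, scale=2.0, size=(n_profiles, n_levels))
salinity = rng.normal(loc=35.0, scale=0.2, size=(n_profiles, n_levels))

# Each level of each variable is one feature, so the two variables are
# placed side by side: 2 variables x 4 levels = 8 features per sample.
X = np.hstack([temperature, salinity])

print(X.shape)  # (5, 8)
```

With real data the number of features grows quickly with the vertical resolution, which is one reason a dimensionality reduction step is usually applied before clustering.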

All the details of the Profile Classification Modelling (PCM) statistical methodology can be found in Maze et al, 2017.

Illustration

Given a collection of Argo temperature profiles in the North Atlantic, a PCM analysis is applied and produces an optimal set of 8 ocean temperature profile classes. The PCM clusters synthesize the structural information of heat distribution in the North Atlantic. Each cluster objectively defines an ocean region where the dynamics give rise to a unique vertical stratification pattern.

_images/graphical-abstract.png

Maze et al, 2017 applied it to the North Atlantic with Argo temperature data. Jones et al, 2019 later applied it to the Southern Ocean, also with Argo temperature data. Rosso et al (in prep) have applied it to the Southern Indian Ocean using both temperature and salinity Argo data.

pyXpcm

pyXpcm is a Python implementation of the PCM method that consumes and produces Xarray objects (xarray.Dataset and xarray.DataArray), hence the x.

With pyXpcm you can conduct a PCM analysis for a collection of profiles (gridded or not), of one or more ocean variables, stored in an xarray.Dataset. pyXpcm also provides basic statistics and plotting functions to get you started with your analysis.

The philosophy of pyXpcm is that a PCM can be created from, and used on, different ocean datasets and variables. To achieve this, a PCM is created with information about the ocean variables to classify and the vertical axis of these variables. The PCM can then be fitted on, and subsequently classify, ocean profiles from any dataset, as long as the dataset contains the PCM variables.

The pyXpcm procedure is to preprocess the data (stack, scale, reduce and combine) and then to fit a classifier on it. Once the model is fitted, pyXpcm can classify data. The library borrows much of its vocabulary and logic from Scikit-learn, but the PCM class does not inherit from sklearn.base.BaseEstimator.
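
To make the scale, reduce and combine steps more concrete, here is a minimal NumPy-only sketch on synthetic, already stacked data. The helper names `scale` and `reduce` are hypothetical and written for this example. pyXpcm itself relies on Scikit-learn transformers, so this shows the idea rather than the library's implementation, and the classifier fit is left out.

```python
import numpy as np

rng = np.random.default_rng(42)
n_profiles, n_levels = 200, 50


def scale(x):
    """Standardise each vertical level to zero mean and unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def reduce(x, n_components):
    """Project centred data onto its first principal components (via SVD)."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:n_components].T


# "Stacked" synthetic data: one 2D array (profiles x levels) per feature
stacked = {
    "temperature": rng.normal(size=(n_profiles, n_levels)),
    "salinity": rng.normal(size=(n_profiles, n_levels)),
}

# Scale and reduce each feature separately, then combine them side by side
reduced = [reduce(scale(x), n_components=2) for x in stacked.values()]
X = np.hstack(reduced)

print(X.shape)  # (200, 4)
```

The combined array `X` is the kind of input a classifier, such as a Gaussian Mixture Model, would then be fitted on.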

Installation

Required dependencies

  • Python 3.6
  • Xarray 0.12
  • Dask 0.16
  • Scikit-learn 0.19

Note that Scikit-learn is the default statistical backend, but if Dask-ML (dask_ml) is installed you can use it as well (see the API reference).

For full plotting functionality (see the Plotting API) the following packages are required:

  • Matplotlib 3.0 (mandatory)
  • Cartopy 0.17 (for some methods only)
  • Seaborn 0.9.0 (for some methods only)

Instructions

For the latest public release:

pip install pyxpcm

For the latest development version:

pip install git+http://github.com/obidam/pyxpcm.git

Pre-trained PCM

We’ll try to list here pre-trained PCM models ready for you to use and classify your data with.

Features                      K [1]   Relevant domain   Training data      Access link   Reference
Temperature (0-1400m, 5m)     8       North Atlantic    Argo (2000-2014)   Archimer      Maze et al (2017)
Temperature (15-980db, 5db)   8       Southern Ocean    Argo (2001-2017)   Zenodo        Jones et al (2019)

[1] Number of classes

User guide

Standard procedure

Here is a standard procedure based on pyXpcm. This will show you how to create a model, how to fit/train it, how to classify data and to visualise results.

Create a model

Let’s import the Profile Classification Model (PCM) constructor:

[2]:
from pyxpcm.models import pcm

A PCM can be created independently of any dataset using the class constructor.

To create a PCM, you need to provide a number of classes (or clusters) and a dictionary that defines the list of features and their vertical axis:

[3]:
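import numpy as np

# Vertical axis: from the surface downward, every 10 m (negative values are depths)
z = np.arange(0.,-1000,-10.)
# Feature names and the vertical axis of each feature
pcm_features = {'temperature': z, 'salinity':z}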

We can now instantiate a PCM, say with 8 classes:

[4]:
m = pcm(K=8, features=pcm_features)
m
[4]:
<pcm 'gmm' (K: 8, F: 2)>
Number of class: 8
Number of feature: 2
Feature names: odict_keys(['temperature', 'salinity'])
Fitted: False
Feature: 'temperature'
         Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
         Scaler: 'normal', <class 'sklearn.preprocessing._data.StandardScaler'>
         Reducer: True, <class 'sklearn.decomposition._pca.PCA'>
Feature: 'salinity'
         Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
         Scaler: 'normal', <class 'sklearn.preprocessing._data.StandardScaler'>
         Reducer: True, <class 'sklearn.decomposition._pca.PCA'>
Classifier: 'gmm', <class 'sklearn.mixture._gaussian_mixture.GaussianMixture'>

Here we created a PCM with 8 classes (K=8) and 2 features (F=2) that are temperature and salinity profiles defined between the surface and 1000m depth.

We furthermore note the list of transform methods that will be used to preprocess each of the features (see the preprocessing documentation page for more details).
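
The first of these transforms puts every profile on the PCM vertical axis. The sketch below shows what such a vertical interpolation could look like with `np.interp`. The observed profile is invented for the example, and this is not the code of `pyxpcm.utils.Vertical_Interpolator`.

```python
import numpy as np

# PCM vertical axis: 0 m down to -990 m every 10 m (same definition as above)
z_pcm = np.arange(0., -1000, -10.)

# A hypothetical observed profile on an irregular, deeper axis
z_obs = np.array([0., -20., -50., -120., -300., -700., -1500.])
t_obs = np.array([20., 19.5, 18., 15., 11., 7., 4.])

# np.interp expects increasing x-coordinates, so both axes are flipped
# (depths are negative and listed from the surface downward).
t_on_pcm = np.interp(z_pcm[::-1], z_obs[::-1], t_obs[::-1])[::-1]

print(t_on_pcm.shape)   # (100,)
print(t_on_pcm[0])      # 20.0, the surface value
```

A profile can only be mapped this way if it covers the depth range of the PCM axis, which is why the dataset's vertical axis has to be at least as deep as the model's.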

Note that the number of classes and features are PCM properties accessible at pyxpcm.pcm.K and pyxpcm.pcm.F.

Load training data

pyXpcm is able to work with both gridded datasets (eg: model outputs with longitude, latitude, time dimensions) and collections of profiles (eg: Argo, XBT, CTD section profiles).

In this example, let’s import a sample of North Atlantic Argo data that comes with pyXpcm:

[5]:
import pyxpcm
ds = pyxpcm.tutorial.open_dataset('argo').load()
print(ds)
<xarray.Dataset>
Dimensions:    (DEPTH: 282, N_PROF: 7560)
Coordinates:
  * DEPTH      (DEPTH) float32 0.0 -5.0 -10.0 -15.0 ... -1395.0 -1400.0 -1405.0
Dimensions without coordinates: N_PROF
Data variables:
    LATITUDE   (N_PROF) float32 ...
    LONGITUDE  (N_PROF) float32 ...
    TIME       (N_PROF) datetime64[ns] ...
    DBINDEX    (N_PROF) float64 ...
    TEMP       (N_PROF, DEPTH) float32 ...
    PSAL       (N_PROF, DEPTH) float32 ...
    SIG0       (N_PROF, DEPTH) float32 ...
    BRV2       (N_PROF, DEPTH) float32 ...
Attributes:
    Sample test prepared by:  G. Maze
    Institution:              Ifremer/LOPS
    Data source DOI:          10.17882/42182

Fit the model on data

Fitting can be done on any dataset coherent with the PCM definition, in the sense that it must have the feature variables of the PCM.

To tell the PCM how to identify features in any xarray.Dataset, we need to provide a dictionary mapping PCM feature names to dataset variable names:

[6]:
features_in_ds = {'temperature': 'TEMP', 'salinity': 'PSAL'}

which means that the PCM feature temperature is to be found in the dataset variable TEMP.
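
The role of this mapping can be pictured with a small pure-Python sketch. The function `resolve_features` and the list of variable names are hypothetical stand-ins written for this example and are not part of the pyXpcm API.

```python
# Feature names known to the model and their (hypothetical) vertical axes
pcm_features = {"temperature": [0.0, -10.0, -20.0], "salinity": [0.0, -10.0, -20.0]}

# Mapping from model feature names to variable names in the dataset
features_in_ds = {"temperature": "TEMP", "salinity": "PSAL"}

# Stand-in for the variable names of an xarray.Dataset
dataset_variables = ["LATITUDE", "LONGITUDE", "TIME", "TEMP", "PSAL", "SIG0"]


def resolve_features(pcm_features, mapping, dataset_variables):
    """Return {feature name: dataset variable}, or raise if one is missing."""
    resolved = {}
    for name in pcm_features:
        if name not in mapping:
            raise KeyError(f"No dataset variable given for feature '{name}'")
        if mapping[name] not in dataset_variables:
            raise KeyError(f"Variable '{mapping[name]}' not found in dataset")
        resolved[name] = mapping[name]
    return resolved


print(resolve_features(pcm_features, features_in_ds, dataset_variables))
# {'temperature': 'TEMP', 'salinity': 'PSAL'}
```

Keeping feature names separate from variable names is what lets one model be applied to datasets that name their variables differently.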

We also need to specify what is the vertical dimension of the dataset variables:

[7]:
features_zdim='DEPTH'

Now we’re ready to fit the model on this dataset:

[8]:
m.fit(ds, features=features_in_ds, dim=features_zdim)
m
[8]:
<pcm 'gmm' (K: 8, F: 2)>
Number of class: 8
Number of feature: 2
Feature names: odict_keys(['temperature', 'salinity'])
Fitted: True
Feature: 'temperature'
         Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
         Scaler: 'normal', <class 'sklearn.preprocessing._data.StandardScaler'>
         Reducer: True, <class 'sklearn.decomposition._pca.PCA'>
Feature: 'salinity'
         Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
         Scaler: 'normal', <class 'sklearn.preprocessing._data.StandardScaler'>
         Reducer: True, <class 'sklearn.decomposition._pca.PCA'>
Classifier: 'gmm', <class 'sklearn.mixture._gaussian_mixture.GaussianMixture'>
         log likelihood of the training set: 38.825127

Note

pyXpcm can also identify PCM features and axis within an xarray.Dataset using variable attributes. From the example above we can set:

ds['TEMP'].attrs['feature_name'] = 'temperature'
ds['PSAL'].attrs['feature_name'] = 'salinity'
ds['DEPTH'].attrs['axis'] = 'Z'

And then simply call the fit method without arguments:

m.fit(ds)

Note that if the data follow the CF conventions, the axis attribute of the vertical dimension should already be set to Z.
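
The attribute-based lookup can be sketched in pure Python as follows. The dictionary `variable_attrs` stands in for the attributes of a dataset, and both helper functions are hypothetical, written only to illustrate the idea.

```python
# Stand-in for variable attributes of a dataset: {variable name: attrs dict}
variable_attrs = {
    "TEMP": {"feature_name": "temperature"},
    "PSAL": {"feature_name": "salinity"},
    "SIG0": {},
    "DEPTH": {"axis": "Z"},
}


def find_features(variable_attrs):
    """Map feature names to variables using a 'feature_name' attribute."""
    return {
        attrs["feature_name"]: name
        for name, attrs in variable_attrs.items()
        if "feature_name" in attrs
    }


def find_vertical_dim(variable_attrs):
    """Return the first variable whose 'axis' attribute is 'Z', else None."""
    for name, attrs in variable_attrs.items():
        if attrs.get("axis") == "Z":
            return name
    return None


print(find_features(variable_attrs))      # {'temperature': 'TEMP', 'salinity': 'PSAL'}
print(find_vertical_dim(variable_attrs))  # DEPTH
```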

Classify data

Now that the PCM is fitted, we can predict the class label of each profile:

[9]:
m.predict(ds, features=features_in_ds, inplace=True)
ds
[9]:
<xarray.Dataset>
Dimensions:     (DEPTH: 282, N_PROF: 7560)
Coordinates:
  * N_PROF      (N_PROF) int64 0 1 2 3 4 5 6 ... 7554 7555 7556 7557 7558 7559
  * DEPTH       (DEPTH) float32 0.0 -5.0 -10.0 -15.0 ... -1395.0 -1400.0 -1405.0
Data variables:
    LATITUDE    (N_PROF) float32 ...
    LONGITUDE   (N_PROF) float32 ...
    TIME        (N_PROF) datetime64[ns] ...
    DBINDEX     (N_PROF) float64 ...
    TEMP        (N_PROF, DEPTH) float32 27.422163 27.422163 ... 4.391791
    PSAL        (N_PROF, DEPTH) float32 36.35267 36.35267 ... 34.910286
    SIG0        (N_PROF, DEPTH) float32 ...
    BRV2        (N_PROF, DEPTH) float32 ...
    PCM_LABELS  (N_PROF) int64 7 7 7 7 7 7 7 7 7 7 7 7 ... 2 2 2 2 2 2 2 2 2 2 2
Attributes:
    Sample test prepared by:  G. Maze
    Institution:              Ifremer/LOPS
    Data source DOI:          10.17882/42182

Prediction labels are automatically added to the dataset as PCM_LABELS because the option inplace was set to True. We didn’t specify the dim option because our dataset is CF compliant.

pyXpcm uses a GMM classifier by default, which is a fuzzy classifier. We can therefore also predict the probability of each class for every profile, the so-called posteriors:

[10]:
m.predict_proba(ds, features=features_in_ds, inplace=True)
ds
[10]:
<xarray.Dataset>
Dimensions:     (DEPTH: 282, N_PROF: 7560, pcm_class: 8)
Coordinates:
  * N_PROF      (N_PROF) int64 0 1 2 3 4 5 6 ... 7554 7555 7556 7557 7558 7559
  * DEPTH       (DEPTH) float32 0.0 -5.0 -10.0 -15.0 ... -1395.0 -1400.0 -1405.0
Dimensions without coordinates: pcm_class
Data variables:
    LATITUDE    (N_PROF) float32 ...
    LONGITUDE   (N_PROF) float32 ...
    TIME        (N_PROF) datetime64[ns] ...
    DBINDEX     (N_PROF) float64 ...
    TEMP        (N_PROF, DEPTH) float32 27.422163 27.422163 ... 4.391791
    PSAL        (N_PROF, DEPTH) float32 36.35267 36.35267 ... 34.910286
    SIG0        (N_PROF, DEPTH) float32 ...
    BRV2        (N_PROF, DEPTH) float32 ...
    PCM_LABELS  (N_PROF) int64 7 7 7 7 7 7 7 7 7 7 7 7 ... 2 2 2 2 2 2 2 2 2 2 2
    PCM_POST    (pcm_class, N_PROF) float64 3.999e-41 2.313e-41 ... 0.0 0.0
Attributes:
    Sample test prepared by:  G. Maze
    Institution:              Ifremer/LOPS
    Data source DOI:          10.17882/42182

which are added to the dataset as the PCM_POST variable. The class probabilities of each profile are stored along a new dimension, named pcm_class by default, that goes from 0 to K-1.
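
To see what posteriors are, the sketch below computes them by hand for a small one-dimensional Gaussian mixture with NumPy. The weights, means and standard deviations are invented for the example, and the array is laid out as (samples, classes), whereas PCM_POST above is stored as (pcm_class, N_PROF).

```python
import numpy as np

# Hypothetical 1D mixture with K=3 classes: weights, means, standard deviations
weights = np.array([0.5, 0.3, 0.2])
means = np.array([-2.0, 0.0, 3.0])
stds = np.array([1.0, 0.5, 1.5])

x = np.array([-2.1, 0.1, 2.5, 10.0])  # four samples

# Log of weight * Gaussian density, for each (sample, class) pair
log_prob = (
    np.log(weights)
    - np.log(stds * np.sqrt(2 * np.pi))
    - 0.5 * ((x[:, None] - means) / stds) ** 2
)

# Normalise over classes (log-sum-exp for numerical stability)
log_norm = np.logaddexp.reduce(log_prob, axis=1, keepdims=True)
post = np.exp(log_prob - log_norm)      # shape (n_samples, K)

labels = post.argmax(axis=1)            # hard labels in 0..K-1

print(post.sum(axis=1))  # each row sums to 1
print(labels)
```

The hard label of a sample is simply the class with the largest posterior, so labels and posteriors are two views of the same fitted model.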

Note

You can delete variables added by pyXpcm to the xarray.Dataset with the pyxpcm.xarray.pyXpcmDataSetAccessor.drop_all() method:

ds.pyxpcm.drop_all()

Or you can split pyXpcm variables out of the original xarray.Dataset:

ds_pcm, ds = ds.pyxpcm.split()

It is important to note that once the PCM is fitted, you can predict labels for any dataset, as long as it has the PCM features.

For instance, let’s predict labels for a gridded dataset:

[12]:
ds_gridded = pyxpcm.tutorial.open_dataset('isas_snapshot').load()
ds_gridded
[12]:
<xarray.Dataset>
Dimensions:      (depth: 152, latitude: 53, longitude: 61)
Coordinates:
  * latitude     (latitude) float32 30.023445 30.455408 ... 49.41288 49.737103
  * longitude    (longitude) float32 -70.0 -69.5 -69.0 ... -41.0 -40.5 -40.0
  * depth        (depth) float32 -1.0 -3.0 -5.0 ... -1960.0 -1980.0 -2000.0
Data variables:
    TEMP         (depth, latitude, longitude) float32 dask.array<chunksize=(152, 53, 61), meta=np.ndarray>
    TEMP_ERR     (depth, latitude, longitude) float32 dask.array<chunksize=(152, 53, 61), meta=np.ndarray>
    TEMP_PCTVAR  (depth, latitude, longitude) float32 dask.array<chunksize=(152, 53, 61), meta=np.ndarray>
    PSAL         (depth, latitude, longitude) float32 dask.array<chunksize=(152, 53, 61), meta=np.ndarray>
    PSAL_ERR     (depth, latitude, longitude) float32 dask.array<chunksize=(152, 53, 61), meta=np.ndarray>
    PSAL_PCTVAR  (depth, latitude, longitude) float32 dask.array<chunksize=(152, 53, 61), meta=np.ndarray>
    SST          (latitude, longitude) float32 dask.array<chunksize=(53, 61), meta=np.ndarray>
[13]:
m.predict(ds_gridded, features={'temperature':'TEMP','salinity':'PSAL'}, dim='depth', inplace=True)
ds_gridded
[13]:
<xarray.Dataset>
Dimensions:      (depth: 152, latitude: 53, longitude: 61)
Coordinates:
  * latitude     (latitude) float64 30.02 30.46 30.89 ... 49.09 49.41 49.74
  * longitude    (longitude) float64 -70.0 -69.5 -69.0 ... -41.0 -40.5 -40.0
  * depth        (depth) float32 -1.0 -3.0 -5.0 ... -1960.0 -1980.0 -2000.0
Data variables:
    TEMP         (depth, latitude, longitude) float32 dask.array<chunksize=(152, 53, 61), meta=np.ndarray>
    TEMP_ERR     (depth, latitude, longitude) float32 dask.array<chunksize=(152, 53, 61), meta=np.ndarray>
    TEMP_PCTVAR  (depth, latitude, longitude) float32 dask.array<chunksize=(152, 53, 61), meta=np.ndarray>
    PSAL         (depth, latitude, longitude) float32 dask.array<chunksize=(152, 53, 61), meta=np.ndarray>
    PSAL_ERR     (depth, latitude, longitude) float32 dask.array<chunksize=(152, 53, 61), meta=np.ndarray>
    PSAL_PCTVAR  (depth, latitude, longitude) float32 dask.array<chunksize=(152, 53, 61), meta=np.ndarray>
    SST          (latitude, longitude) float32 dask.array<chunksize=(53, 61), meta=np.ndarray>
    PCM_LABELS   (latitude, longitude) float64 7.0 7.0 7.0 7.0 ... 4.0 4.0 4.0

where you can see the addition of the PCM_LABELS variable.
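
Conceptually, classifying a gridded field means turning each grid point into one profile, classifying the profiles, and putting the labels back on the grid. The NumPy sketch below illustrates this with synthetic data and a made-up labelling rule in place of a real classifier. It is an illustration of the idea, not pyXpcm's implementation.

```python
import numpy as np

n_depth, n_lat, n_lon = 6, 3, 4
rng = np.random.default_rng(1)

# Gridded field with dimensions (depth, latitude, longitude)
temp = rng.normal(size=(n_depth, n_lat, n_lon))
temp[:, 0, 0] = np.nan          # pretend this grid point is land

# Stack: move depth last, then merge latitude/longitude into one sample axis
profiles = np.moveaxis(temp, 0, -1).reshape(n_lat * n_lon, n_depth)

# Keep only profiles without missing values
valid = ~np.isnan(profiles).any(axis=1)

# Stand-in for a classifier: a made-up rule returning an integer per profile
fake_labels = (profiles[valid].mean(axis=1) > 0).astype(int)

# Unstack: put labels back on the grid, leaving NaN where no label exists
labels = np.full(n_lat * n_lon, np.nan)
labels[valid] = fake_labels
labels = labels.reshape(n_lat, n_lon)

print(labels.shape)            # (3, 4)
print(np.isnan(labels[0, 0]))  # True
```

Storing NaN for unlabelled grid points requires a floating-point array, which is one plausible reason why PCM_LABELS appears as float64 in the gridded output above.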

Vertical structure of classes

One key outcome of the PCM analysis is the vertical structure of each class. This can be computed using the pyxpcm.stat.quantile() method.

Below we compute the 5%, 50% and 95% quantiles of temperature and salinity for each class:

[14]:
for vname in ['TEMP', 'PSAL']:
    ds = ds.pyxpcm.quantile(m, q=[0.05, 0.5, 0.95], of=vname, outname=vname + '_Q', keep_attrs=True, inplace=True)
ds
[14]:
<xarray.Dataset>
Dimensions:     (DEPTH: 282, N_PROF: 7560, pcm_class: 8, quantile: 3)
Coordinates:
  * pcm_class   (pcm_class) int64 0 1 2 3 4 5 6 7
  * N_PROF      (N_PROF) int64 0 1 2 3 4 5 6 ... 7554 7555 7556 7557 7558 7559
  * DEPTH       (DEPTH) float32 0.0 -5.0 -10.0 -15.0 ... -1395.0 -1400.0 -1405.0
  * quantile    (quantile) float64 0.05 0.5 0.95
Data variables:
    LATITUDE    (N_PROF) float32 27.122 27.818 27.452 26.976 ... 4.243 4.15 4.44
    LONGITUDE   (N_PROF) float32 -74.86 -75.6 -74.949 ... -1.263 -0.821 -0.002
    TIME        (N_PROF) datetime64[ns] 2008-06-23T13:07:30 ... 2013-03-09T14:52:58.124999936
    DBINDEX     (N_PROF) float64 1.484e+04 1.622e+04 ... 8.557e+03 1.063e+04
    TEMP        (N_PROF, DEPTH) float32 27.422163 27.422163 ... 4.391791
    PSAL        (N_PROF, DEPTH) float32 36.35267 36.35267 ... 34.910286
    SIG0        (N_PROF, DEPTH) float32 23.601229 23.601229 ... 27.685583
    BRV2        (N_PROF, DEPTH) float32 0.00029447526 ... 4.500769e-06
    PCM_LABELS  (N_PROF) int64 7 7 7 7 7 7 7 7 7 7 7 7 ... 2 2 2 2 2 2 2 2 2 2 2
    PCM_POST    (pcm_class, N_PROF) float64 3.999e-41 2.313e-41 ... 0.0 0.0
    TEMP_Q      (pcm_class, quantile, DEPTH) float64 11.22 11.22 ... 5.266 5.241
    PSAL_Q      (pcm_class, quantile, DEPTH) float64 35.13 35.13 ... 35.12 35.12
Attributes:
    Sample test prepared by:  G. Maze
    Institution:              Ifremer/LOPS
    Data source DOI:          10.17882/42182
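
The per-class quantile computation can be pictured with plain NumPy. The sketch below uses synthetic profiles and labels, and only mirrors the (pcm_class, quantile, DEPTH) layout of TEMP_Q and PSAL_Q. It is not the code behind `pyxpcm.stat.quantile`.

```python
import numpy as np

rng = np.random.default_rng(7)
n_profiles, n_levels, K = 300, 5, 3

profiles = rng.normal(size=(n_profiles, n_levels))   # synthetic values
labels = np.arange(n_profiles) % K                    # synthetic class labels

q = [0.05, 0.5, 0.95]

# For each class, compute the quantiles over its profiles at every level.
# Result has dimensions (class, quantile, level), like TEMP_Q above.
class_quantiles = np.stack(
    [np.quantile(profiles[labels == k], q, axis=0) for k in range(K)]
)

print(class_quantiles.shape)  # (3, 3, 5)
```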

Quantiles can be plotted using the pyxpcm.plot.quantile() method.

[15]:
fig, ax = m.plot.quantile(ds['TEMP_Q'], maxcols=4, figsize=(10, 8), sharey=True)
_images/example_38_0.png

Geographic distribution of classes

Warning

To follow this section you’ll need to have Cartopy installed and working.

A map of labels can now easily be plotted:

[16]:
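import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature

proj = ccrs.PlateCarree()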
subplot_kw={'projection': proj, 'extent': np.array([-80,1,-1,66]) + np.array([-0.1,+0.1,-0.1,+0.1])}
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(5,5), dpi=120, facecolor='w', edgecolor='k', subplot_kw=subplot_kw)

kmap = m.plot.cmap()
sc = ax.scatter(ds['LONGITUDE'], ds['LATITUDE'], s=3, c=ds['PCM_LABELS'], cmap=kmap, transform=proj, vmin=0, vmax=m.K)
cl = m.plot.colorbar(ax=ax)

gl = m.plot.latlongrid(ax, dx=10)
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.COASTLINE)
ax.set_title('LABELS of the training set')
plt.show()
_images/example_42_0.png

Since we predicted labels for 2 datasets, we can superimpose them:

[17]:
proj = ccrs.PlateCarree()
subplot_kw={'projection': proj, 'extent': np.array([-75,-35,25,55]) + np.array([-0.1,+0.1,-0.1,+0.1])}
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(5,5), dpi=120, facecolor='w', edgecolor='k', subplot_kw=subplot_kw)

kmap = m.plot.cmap()
sc = ax.pcolor(ds_gridded['longitude'], ds_gridded['latitude'], ds_gridded['PCM_LABELS'], cmap=kmap, transform=proj, vmin=0, vmax=m.K)
sc = ax.scatter(ds['LONGITUDE'], ds['LATITUDE'], s=10, c=ds['PCM_LABELS'], cmap=kmap, transform=proj, vmin=0, vmax=m.K, edgecolors=[0.3]*3, linewidths=0.3)
cl = m.plot.colorbar(ax=ax)

gl = m.plot.latlongrid(ax, dx=10)
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.COASTLINE)
ax.set_title('LABELS of the training set (dots) and another product (shade)')
plt.show()
_images/example_44_0.png

Posteriors are defined for each data point and give the probability that the point belongs to each of the classes. They can be plotted this way:

[18]:
import seaborn as sns

cmap = sns.light_palette("blue", as_cmap=True)
proj = ccrs.PlateCarree()
subplot_kw={'projection': proj, 'extent': np.array([-80,1,-1,66]) + np.array([-0.1,+0.1,-0.1,+0.1])}
fig, ax = m.plot.subplots(figsize=(10,22), maxcols=2, subplot_kw=subplot_kw)

for k in m:
    sc = ax[k].scatter(ds['LONGITUDE'], ds['LATITUDE'], s=3, c=ds['PCM_POST'].sel(pcm_class=k),
                       cmap=cmap, transform=proj, vmin=0, vmax=1)
    cl = plt.colorbar(sc, ax=ax[k], fraction=0.03)
    gl = m.plot.latlongrid(ax[k], fontsize=8, dx=20, dy=10)
    ax[k].add_feature(cfeature.LAND)
    ax[k].add_feature(cfeature.COASTLINE)
    ax[k].set_title('PCM Posteriors k=%i' % k)
_images/example_46_0.png

PCM properties

The PCM class does a lot of data preprocessing under the hood in order to classify profiles.

Here is how to access PCM preprocessed results and data.

Import and set-up

Import the library and toy data

[2]:
import numpy as np
import pyxpcm
from pyxpcm.models import pcm

Let’s work with a standard PCM of temperature and salinity, from the surface down to -1000m:

[3]:
# Define a vertical axis to work with
z = np.arange(0.,-1000,-10.)

# Define features to use
features_pcm = {'temperature': z, 'salinity': z}

# Instantiate the PCM
m = pcm(K=4, features=features_pcm, maxvar=2)
print(m)
<pcm 'gmm' (K: 4, F: 2)>
Number of class: 4
Number of feature: 2
Feature names: odict_keys(['temperature', 'salinity'])
Fitted: False
Feature: 'temperature'
         Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
         Scaler: 'normal', <class 'sklearn.preprocessing._data.StandardScaler'>
         Reducer: True, <class 'sklearn.decomposition._pca.PCA'>
Feature: 'salinity'
         Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
         Scaler: 'normal', <class 'sklearn.preprocessing._data.StandardScaler'>
         Reducer: True, <class 'sklearn.decomposition._pca.PCA'>
Classifier: 'gmm', <class 'sklearn.mixture._gaussian_mixture.GaussianMixture'>

Note that here we used a strong dimensionality reduction to limit the dimensions and size of the plots to come (maxvar=2 tells the PCM to use the first 2 principal components of each variable).

Now we can load a dataset to be used for fitting.

[4]:
ds = pyxpcm.tutorial.open_dataset('argo').load()

Fit and predict model on data:

[5]:
features_in_ds = {'temperature': 'TEMP', 'salinity': 'PSAL'}
ds = ds.pyxpcm.fit_predict(m, features=features_in_ds, inplace=True)
print(ds)
<xarray.Dataset>
Dimensions:     (DEPTH: 282, N_PROF: 7560)
Coordinates:
  * N_PROF      (N_PROF) int64 0 1 2 3 4 5 6 ... 7554 7555 7556 7557 7558 7559
  * DEPTH       (DEPTH) float32 0.0 -5.0 -10.0 -15.0 ... -1395.0 -1400.0 -1405.0
Data variables:
    LATITUDE    (N_PROF) float32 ...
    LONGITUDE   (N_PROF) float32 ...
    TIME        (N_PROF) datetime64[ns] ...
    DBINDEX     (N_PROF) float64 ...
    TEMP        (N_PROF, DEPTH) float32 27.422163 27.422163 ... 4.391791
    PSAL        (N_PROF, DEPTH) float32 36.35267 36.35267 ... 34.910286
    SIG0        (N_PROF, DEPTH) float32 ...
    BRV2        (N_PROF, DEPTH) float32 ...
    PCM_LABELS  (N_PROF) int64 1 1 1 1 1 1 1 1 1 1 1 1 ... 0 0 0 0 0 0 0 0 0 0 0
Attributes:
    Sample test prepared by:  G. Maze
    Institution:              Ifremer/LOPS
    Data source DOI:          10.17882/42182

Scaler properties

[6]:
fig, ax = m.plot.scaler()

# More options:
# m.plot.scaler(style='darkgrid')
# m.plot.scaler(style='darkgrid', subplot_kw={'ylim':[-1000,0]})
_images/pcm_prop_11_0.png

Reducer properties

Plot the eigenvectors of a PCA reducer, or nothing if no reducer is used:

[7]:
fig, ax = m.plot.reducer()
# Equivalent to:
# pcmplot.reducer(m)

# More options:
# m.plot.reducer(pcalist = range(0,4));
# m.plot.reducer(pcalist = [0], maxcols=1);
# m.plot.reducer(pcalist = range(0,4), style='darkgrid',  plot_kw={'linewidth':1.5}, subplot_kw={'ylim':[-1400,0]}, figsize=(12,10));
_images/pcm_prop_13_0.png

Scatter plot of features, as seen by the classifier

You can have access to pre-processed data for your own plot/analysis through the pyxpcm.pcm.preprocessing() method:

[8]:
X, sampling_dims = m.preprocessing(ds, features=features_in_ds)
X
[8]:
<xarray.DataArray (n_samples: 7560, n_features: 4)>
array([[ 1.9281656 , -0.09149919,  1.7340997 , -0.27024782],
       [ 2.314077  ,  0.10684185,  2.0836833 , -0.18765019],
       [ 1.6755121 , -0.17313023,  1.5637012 , -0.43244886],
       ...,
       [-0.802601  , -0.5783772 , -1.5761338 , -0.31184074],
       [-0.9552184 , -0.6094395 , -1.8049222 , -0.4273216 ],
       [-0.8925139 , -0.6237318 , -1.7922652 , -0.46551177]],
      dtype=float32)
Coordinates:
  * n_samples   (n_samples) int64 0 1 2 3 4 5 ... 7554 7555 7556 7557 7558 7559
  * n_features  (n_features) <U13 'temperature_0' ... 'salinity_1'

pyXpcm returns a 2-dimensional xarray.DataArray, for which pairwise relationships can easily be visualised with the pyxpcm.plot.preprocessed() method (this requires Seaborn):

[9]:
g = m.plot.preprocessed(ds, features=features_in_ds, style='darkgrid')

# A posteriori adjustments:
# g.set(xlim=(-3,3),ylim=(-3,3))
# g.savefig('toto.png')
_images/pcm_prop_17_0.png
[10]:
# Combine KDE with histograms (very slow plot):
g = m.plot.preprocessed(ds, features=features_in_ds, kde=True)
_images/pcm_prop_18_0.png

Save and load PCM from local files

PCM instances are lightweight Python objects that can easily be saved to and loaded from files. pyXpcm uses the netCDF file format because it makes it easy to add metadata to numerical arrays.

Import and set-up

Import the library and toy data

[2]:
import numpy as np
import pyxpcm
from pyxpcm.models import pcm

# Load tutorial data:
ds = pyxpcm.tutorial.open_dataset('argo').load()

Saving a model

Let’s first create a PCM and fit it onto the tutorial dataset:

[3]:
# Define a vertical axis to work with
z = np.arange(0.,-1000,-10.)

# Define features to use
features_pcm = {'temperature': z, 'salinity': z}

# Instantiate the PCM:
m = pcm(K=4, features=features_pcm)

# Fit:
m.fit(ds, features={'temperature': 'TEMP', 'salinity': 'PSAL'})
[3]:
<pcm 'gmm' (K: 4, F: 2)>
Number of class: 4
Number of feature: 2
Feature names: odict_keys(['temperature', 'salinity'])
Fitted: True
Feature: 'temperature'
         Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
         Scaler: 'normal', <class 'sklearn.preprocessing._data.StandardScaler'>
         Reducer: True, <class 'sklearn.decomposition._pca.PCA'>
Feature: 'salinity'
         Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
         Scaler: 'normal', <class 'sklearn.preprocessing._data.StandardScaler'>
         Reducer: True, <class 'sklearn.decomposition._pca.PCA'>
Classifier: 'gmm', <class 'sklearn.mixture._gaussian_mixture.GaussianMixture'>
         log likelihood of the training set: 33.234424

We can now save the fitted model to a local file:

[4]:
m.to_netcdf('my_pcm.nc')

Loading a model

To load a PCM from file, use:

[5]:
m_loaded = pyxpcm.load_netcdf('my_pcm.nc')
m_loaded
[5]:
<pcm 'gmm' (K: 4, F: 2)>
Number of class: 4
Number of feature: 2
Feature names: odict_keys(['temperature', 'salinity'])
Fitted: True
Feature: 'temperature'
         Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
         Scaler: 'normal', <class 'sklearn.preprocessing._data.StandardScaler'>
         Reducer: True, <class 'sklearn.decomposition._pca.PCA'>
Feature: 'salinity'
         Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
         Scaler: 'normal', <class 'sklearn.preprocessing._data.StandardScaler'>
         Reducer: True, <class 'sklearn.decomposition._pca.PCA'>
Classifier: 'gmm', <class 'sklearn.mixture._gaussian_mixture.GaussianMixture'>
         log likelihood of the training set: 33.234424
{
“cells”: [
{

“cell_type”: “raw”, “metadata”: {

“papermill”: {
<<<<<<< Updated upstream
“duration”: 0.013553, “end_time”: “2020-02-11T21:00:30.433971”, “exception”: false, “start_time”: “2020-02-11T21:00:30.420418”,
“exception”: false, “start_time”: “2020-02-11T23:09:46.832799”,
>>>>>>> Stashed changes
“status”: “completed”

}, “raw_mimetype”: “text/restructuredtext”, “tags”: []

}, “source”: [

“.. _preprocessing:”

]

}, {

“cell_type”: “markdown”, “metadata”: {

“papermill”: {
<<<<<<< Updated upstream
“duration”: 0.009143, “end_time”: “2020-02-11T21:00:30.453308”, “exception”: false, “start_time”: “2020-02-11T21:00:30.444165”,
“exception”: false, “start_time”: “2020-02-11T23:09:46.857003”,
>>>>>>> Stashed changes
“status”: “completed”

}, “tags”: []

}, “source”: [

“# Features preprocessing”

]

}, {

“cell_type”: “code”, “execution_count”: 1, “metadata”: {

“nbsphinx”: “hidden”, “papermill”: {
<<<<<<< Updated upstream
“duration”: 2.487577, “end_time”: “2020-02-11T21:00:32.948821”, “exception”: false, “start_time”: “2020-02-11T21:00:30.461244”,
“exception”: false, “start_time”: “2020-02-11T23:09:46.875302”,
>>>>>>> Stashed changes
“status”: “completed”

}, “tags”: []

}, “outputs”: [], “source”: [

“# Hidden notebook set-upn”, “n”, “import os, sysn”, “import numpy as npn”, “import pandas as pdn”, “import xarray as xrn”, “import matplotlib.pyplot as pltn”, “%matplotlib inlinen”, “sys.path.insert(0, os.path.abspath(‘/Users/gmaze/git/github/gmaze/pyxpcm’))n”, “n”, “import pyxpcmn”, “from pyxpcm.models import pcmn”, “n”, “import seaborn as snsn”, “import cartopy.crs as ccrsn”, “import cartopy.feature as cfeaturen”, “import matplotlib.ticker as mtickern”, “import matplotlib as mpln”, “n”, “# Load sample data:n”, “ds = pyxpcm.tutorial.open_dataset(‘isas_snapshot’).load()n”, “n”, “# Define vertical axis and features to use:n”, “z = ds[‘depth’].where(ds[‘depth’]>=-1200, drop=True)n”, “features_pcm = {‘TEMP’: z, ‘TEMP’: z}n”, “n”, “m = pcm(K=3, features=features_pcm)”

]

}, {

“cell_type”: “raw”, “metadata”: {

“papermill”: {
<<<<<<< Updated upstream
“duration”: 0.006653, “end_time”: “2020-02-11T21:00:32.963525”, “exception”: false, “start_time”: “2020-02-11T21:00:32.956872”,
“exception”: false, “start_time”: “2020-02-11T23:09:49.663360”,
>>>>>>> Stashed changes
“status”: “completed”

}, “raw_mimetype”: “text/restructuredtext”, “tags”: []

}, “source”: [

“The Profile Classification Model (PCM) requires data to be preprocessed in order to match the model vertical axis, to scale feature dimensions with each others and to reduce the dimensionality of the problem. Some of these steps are mandatory and they all can be user parameterised.n”, “n”, “The PCM preprocessing operations are organised into 4 steps:n”, “n”, “.. image:: _static/Preprocessing_pipeline_2lines.pngn”, ” :width: 100%n”, ” :align: center”

]

}, {

“cell_type”: “markdown”, “metadata”: {

“papermill”: {
<<<<<<< Updated upstream
“duration”: 0.006608, “end_time”: “2020-02-11T21:00:32.977136”, “exception”: false, “start_time”: “2020-02-11T21:00:32.970528”,
“exception”: false, “start_time”: “2020-02-11T23:09:49.677573”,
>>>>>>> Stashed changes
“status”: “completed”

}, “tags”: []

}, “source”: [

“## Stack”

]

}, {

“cell_type”: “markdown”, “metadata”: {

“papermill”: {
<<<<<<< Updated upstream
“duration”: 0.00741, “end_time”: “2020-02-11T21:00:32.991539”, “exception”: false, “start_time”: “2020-02-11T21:00:32.984129”,
“exception”: false, “start_time”: “2020-02-11T23:09:49.691422”,
>>>>>>> Stashed changes
“status”: “completed”

}, “tags”: []

}, “source”: [

“This step mask, extract, flatten and transform any ND-array set of feature variables (eg: temperature, salinity) into a plain 2D-array collection of vertical profiles usable for machine learning methods.”

]

}, {

“cell_type”: “markdown”, “metadata”: {

“papermill”: {
<<<<<<< Updated upstream
“duration”: 0.006983, “end_time”: “2020-02-11T21:00:33.006119”, “exception”: false, “start_time”: “2020-02-11T21:00:32.999136”,
“exception”: false, “start_time”: “2020-02-11T23:09:49.704866”,
>>>>>>> Stashed changes
“status”: “completed”

}, “tags”: [], “toc-hr-collapsed”: false

}, “source”: [

“### Mask”

]

}, {

“cell_type”: “raw”, “metadata”: {

“papermill”: {
<<<<<<< Updated upstream
“duration”: 0.00649, “end_time”: “2020-02-11T21:00:33.019490”, “exception”: false, “start_time”: “2020-02-11T21:00:33.013000”,
“exception”: false, “start_time”: “2020-02-11T23:09:49.719397”,
>>>>>>> Stashed changes
“status”: “completed”

}, “raw_mimetype”: “text/restructuredtext”, “tags”: []

}, “source”: [

“This step computes a mask of the input data that will reject all profiles that are full of nans over the depth range of feature vertical axis. This ensure that all feature variables will be successfully retrieved to fill in the plain 2D-array collection of profiles.n”, “n”, “This operation is conducted by pyxpcm.xarray.pyXpcmDataSetAccessor.mask(), so that the mask can be computed (and plotted) this way:”

]

}, {

“cell_type”: “code”, “execution_count”: 2, “metadata”: {

“papermill”: {
<<<<<<< Updated upstream
“duration”: 0.021824, “end_time”: “2020-02-11T21:00:33.048361”, “exception”: false, “start_time”: “2020-02-11T21:00:33.026537”,
“exception”: false, “start_time”: “2020-02-11T23:09:49.732762”,
>>>>>>> Stashed changes
“status”: “completed”

}, “raw_mimetype”: “text/restructuredtext”, “tags”: []

}, “outputs”: [

{

“name”: “stdout”, “output_type”: “stream”, “text”: [

“<xarray.DataArray ‘pcm_MASK’ (latitude: 53, longitude: 61)>n”, “dask.array<eq, shape=(53, 61), dtype=bool, chunksize=(53, 61), chunktype=numpy.ndarray>n”, “Coordinates:n”, ” * longitude (longitude) float32 -70.0 -69.5 -69.0 -68.5 … -41.0 -40.5 -40.0n”, ” * latitude (latitude) float32 30.023445 30.455408 … 49.41288 49.737103n”

]

}

], “source”: [

“mask = ds.pyxpcm.mask(m)\n”, “print(mask)”

]

}, {

“cell_type”: “code”, “execution_count”: 3, “metadata”: {

“papermill”: {
“duration”: 0.221273, “end_time”: “2020-02-11T21:00:33.276820”, “exception”: false, “start_time”: “2020-02-11T21:00:33.055547”,
“status”: “completed”

}, “raw_mimetype”: “text/restructuredtext”, “tags”: []

}, “outputs”: [

{
“data”: {

“text/plain”: [

“<Figure size 432x288 with 2 Axes>”

]

}, “metadata”: {

“needs_background”: “light”

}, “output_type”: “display_data”

}

], “source”: [

“mask.plot();”

]

}, {

“cell_type”: “markdown”, “metadata”: {

“papermill”: {
“duration”: 0.007343, “end_time”: “2020-02-11T21:00:33.291512”, “exception”: false, “start_time”: “2020-02-11T21:00:33.284169”,
“status”: “completed”

}, “tags”: []

}, “source”: [

“### Ravel”

]

}, {

“cell_type”: “raw”, “metadata”: {

“papermill”: {
“duration”: 0.006591, “end_time”: “2020-02-11T21:00:33.305412”, “exception”: false, “start_time”: “2020-02-11T21:00:33.298821”,
“status”: “completed”

}, “raw_mimetype”: “text/restructuredtext”, “tags”: []

}, “source”: [

“For an ND-array to be used as a feature, it has to be ravelled (flattened) along the N-1 dimensions that are not the vertical one. This operation thus transforms any ND-array into a 2D-array (sampling and vertical_axis dimensions) and additionally drops profiles according to the PCM mask determined above.\n”, “\n”, “This operation is conducted by pyxpcm.pcm.ravel().\n”, “\n”, “The output 2D-array is an xarray.DataArray that can be chunked along the sampling dimension with the PCM constructor option chunk_size:”

]

}, {

“cell_type”: “code”, “execution_count”: 4, “metadata”: {

“papermill”: {
“duration”: 0.326244, “end_time”: “2020-02-11T21:00:33.638566”, “exception”: false, “start_time”: “2020-02-11T21:00:33.312322”,
“status”: “completed”

}, “tags”: []

}, “outputs”: [], “source”: [

“m = pcm(K=3, features=features_pcm, chunk_size=1e3).fit(ds)”

]

}, {

“cell_type”: “markdown”, “metadata”: {

“papermill”: {
“duration”: 0.007209, “end_time”: “2020-02-11T21:00:33.653698”, “exception”: false, “start_time”: “2020-02-11T21:00:33.646489”,
“status”: “completed”

}, “tags”: []

}, “source”: [

“By default, chunk_size='auto'.”

]

}, {

“cell_type”: “code”, “execution_count”: 5, “metadata”: {

“papermill”: {
“duration”: 0.034957, “end_time”: “2020-02-11T21:00:33.696022”, “exception”: false, “start_time”: “2020-02-11T21:00:33.661065”,
“status”: “completed”

}, “tags”: []

}, “outputs”: [

{
“data”: {
“text/plain”: [

“<xarray.DataArray 'TEMP' (sampling: 2289, depth: 152)>\n”, “dask.array<rechunk-merge, shape=(2289, 152), dtype=float32, chunksize=(1000, 152), chunktype=numpy.ndarray>\n”, “Coordinates:\n”, “ * depth (depth) float32 -1.0 -3.0 -5.0 -10.0 … -1960.0 -1980.0 -2000.0\n”, “ * sampling (sampling) MultiIndex\n”, “ - latitude (sampling) float64 30.02 30.02 30.02 30.02 … 49.74 49.74 49.74\n”, “ - longitude (sampling) float64 -70.0 -69.5 -69.0 -68.5 … -41.0 -40.5 -40.0\n”, “Attributes:\n”, “ long_name: Temperature \n”, “ standard_name: sea_water_temperature\n”, “ units: degree_Celsius\n”, “ valid_min: -23000\n”, “ valid_max: 20000”

]

}, “execution_count”: 5, “metadata”: {}, “output_type”: “execute_result”

}

], “source”: [

“X, z, sampling_dims = m.ravel(ds['TEMP'], dim='depth', feature_name='TEMP')\n”, “X”

]

}, {

“cell_type”: “raw”, “metadata”: {

“papermill”: {
“duration”: 0.007559, “end_time”: “2020-02-11T21:00:33.711220”, “exception”: false, “start_time”: “2020-02-11T21:00:33.703661”,
“status”: “completed”

}, “raw_mimetype”: “text/restructuredtext”, “tags”: []

}, “source”: [

“Note the chunksize of the dask.array.Array backing this feature: (1000, 152), as requested with chunk_size=1e3.”

]

}, {

“cell_type”: “markdown”, “metadata”: {

“papermill”: {
“duration”: 0.007467, “end_time”: “2020-02-11T21:00:33.726199”, “exception”: false, “start_time”: “2020-02-11T21:00:33.718732”,
“status”: “completed”

}, “tags”: []

}, “source”: [

“### Interpolate”

]

}, {

“cell_type”: “raw”, “metadata”: {

“papermill”: {
“duration”: 0.007477, “end_time”: “2020-02-11T21:00:33.741072”, “exception”: false, “start_time”: “2020-02-11T21:00:33.733595”,
“status”: “completed”

}, “raw_mimetype”: “text/restructuredtext”, “tags”: []

}, “source”: [

“Even if the input data vertical axis is within the range of the PCM feature axis, the two may not be defined on the same level values. In this step, if the input data are not defined on the same vertical axis as the PCM, an interpolation is triggered. The interpolation follows these rules:\n”, “\n”, “- If the PCM axis levels are all found in the input data vertical axis, a simple intersection is used.\n”, “- If the PCM axis starts at the surface (0 value) and the input data do not, the first non-NaN value is replicated to the surface, as a mixed layer.\n”, “- If the PCM axis levels are not in the input data vertical axis, a linear interpolation through the xarray.DataArray.interp() method is triggered for each profile.\n”, “\n”, “The entire interpolation process is managed by a pyxpcm.utils.Vertical_Interpolator instance that is created at the time of PCM instantiation.”

]

}, {

“cell_type”: “markdown”, “metadata”: {

“papermill”: {
“duration”: 0.007292, “end_time”: “2020-02-11T21:00:33.756155”, “exception”: false, “start_time”: “2020-02-11T21:00:33.748863”,
“status”: “completed”

}, “tags”: []

}, “source”: [

“Scale\n”, “-----”

]

}, {

“cell_type”: “raw”, “metadata”: {

“papermill”: {
“duration”: 0.007415, “end_time”: “2020-02-11T21:00:33.771215”, “exception”: false, “start_time”: “2020-02-11T21:00:33.763800”,
“status”: “completed”

}, “raw_mimetype”: “text/restructuredtext”, “tags”: []

}, “source”: [

“Each variable can be normalised at each vertical level. This step ensures that structures/patterns located at depth in the profile are considered similarly to those close to the surface by the classifier.\n”, “\n”, “Scaling is defined at PCM creation (pyxpcm.models.pcm) with the option scale. It is an integer value with the following meaning:\n”, “\n”, “ - 0: No scaling\n”, “ - 1: Center on sample mean and scale by sample std\n”, “ - 2: Center on sample mean only”

]

}, {

“cell_type”: “markdown”, “metadata”: {

“papermill”: {
“duration”: 0.00727, “end_time”: “2020-02-11T21:00:33.786008”, “exception”: false, “start_time”: “2020-02-11T21:00:33.778738”,
“status”: “completed”

}, “tags”: []

}, “source”: [

“## Reduce\n”, “\n”, “[TBC]”

]

}, {

“cell_type”: “markdown”, “metadata”: {

“papermill”: {
“duration”: 0.007432, “end_time”: “2020-02-11T21:00:33.801243”, “exception”: false, “start_time”: “2020-02-11T21:00:33.793811”,
“status”: “completed”

}, “tags”: []

}, “source”: [

“## Combine\n”, “\n”, “[TBC]”

]

}

], “metadata”: {

“kernelspec”: {
“display_name”: “obidam36”, “language”: “python”, “name”: “obidam36”

}, “language_info”: {

“codemirror_mode”: {
“name”: “ipython”, “version”: 3

}, “file_extension”: “.py”, “mimetype”: “text/x-python”, “name”: “python”, “nbconvert_exporter”: “python”, “pygments_lexer”: “ipython3”, “version”: “3.6.7”

}, “papermill”: {

“duration”: 5.112727, “end_time”: “2020-02-11T21:00:34.335987”,
“environment_variables”: {}, “exception”: null, “input_path”: “preprocessing.ipynb”, “output_path”: “../preprocessing.ipynb”, “parameters”: {},
“start_time”: “2020-02-11T21:00:29.223260”,
“version”: “1.2.1”

}, “toc-showmarkdowntxt”: false

}, “nbformat”: 4, “nbformat_minor”: 4

}
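
The interpolation rules and the three scale options described in the notebook above can be illustrated with plain NumPy. This is only a sketch under stated assumptions, not the pyXpcm implementation: interp_profile and scale_features are hypothetical helper names, np.interp stands in for xarray.DataArray.interp(), depths are assumed to be negative and ordered from the surface downward, and the standard deviation is taken as the population value (ddof=0).

```python
import numpy as np


def interp_profile(z_in, prof, z_out):
    """Hypothetical helper: map one profile from axis z_in onto axis z_out."""
    z_in = np.asarray(z_in, dtype=float)
    prof = np.asarray(prof, dtype=float)
    z_out = np.asarray(z_out, dtype=float)
    # Rule 1: every target level exists in the input axis, so select them
    if np.isin(z_out, z_in).all():
        return prof[np.searchsorted(-z_in, -z_out)]
    # Rule 2: target axis starts at the surface but the input does not,
    # so replicate the first non-NaN value up to z=0 (mixed layer)
    if z_out[0] == 0 and z_in[0] != 0:
        first_valid = prof[~np.isnan(prof)][0]
        z_in = np.concatenate(([0.0], z_in))
        prof = np.concatenate(([first_valid], prof))
    # Rule 3: linear interpolation (np.interp needs increasing x, hence -z)
    return np.interp(-z_out, -z_in, prof)


def scale_features(X, scale=1):
    """Hypothetical helper: apply option scale to a (n_samples, n_levels) array."""
    X = np.asarray(X, dtype=float)
    if scale == 0:      # no scaling
        return X
    centered = X - X.mean(axis=0)   # per-level mean over all samples
    if scale == 2:      # center only
        return centered
    return centered / X.std(axis=0)  # center and scale by per-level std


profile = interp_profile([-5., -10., -20.], [15., 14., 12.], [0., -5., -15.])
print(profile)
print(scale_features([[1., 10.], [3., 30.]], scale=1))
```

Because the mean and standard deviation are computed per column, each vertical level is normalised independently, which is what lets deep, low-variance levels weigh as much as surface levels in the classifier.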

Debugging and performances

Import and set-up

Import the library and toy data

[2]:
import numpy as np
import pyxpcm
from pyxpcm.models import pcm

# Load a dataset to work with:
ds = pyxpcm.tutorial.open_dataset('argo').load()

# Define vertical axis and features to use:
z = np.arange(0.,-1000.,-10.)
features_pcm = {'temperature': z, 'salinity': z}
features_in_ds = {'temperature': 'TEMP', 'salinity': 'PSAL'}
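
As a quick sanity check on the vertical axis defined above, the snippet below (NumPy only, no pyXpcm call) shows that np.arange excludes the stop value, so the axis holds 100 levels from 0 down to -990. This is consistent with the (100,) vertical chunk reported after interpolation in the debug log.

```python
import numpy as np

# Same vertical axis as in the set-up cell: every 10 m from the surface,
# stop value (-1000) excluded by np.arange
z = np.arange(0., -1000., -10.)

print(z.size)   # number of vertical levels
print(z[0])     # shallowest level
print(z[-1])    # deepest level
```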

Debugging

Use option debug to print log messages

[3]:
# Instantiate a new PCM:
m = pcm(K=8, features=features_pcm, debug=True)

# Fit with log:
m.fit(ds, features=features_in_ds);
> Start preprocessing for action 'fit'

        > Preprocessing xarray dataset 'TEMP' as PCM feature 'temperature'
         [<class 'xarray.core.dataarray.DataArray'>, <class 'dask.array.core.Array'>, ((7560,), (282,))] X RAVELED with success
                Output axis is in the input axis, not need to interpolate, simple intersection
         [<class 'xarray.core.dataarray.DataArray'>, <class 'dask.array.core.Array'>, ((7560,), (100,))] X INTERPOLATED with success)
         [<class 'xarray.core.dataarray.DataArray'>, <class 'numpy.ndarray'>, None] X SCALED with success)
         [<class 'xarray.core.dataarray.DataArray'>, <class 'numpy.ndarray'>, None] X REDUCED with success)
        temperature pre-processed with success,  [<class 'xarray.core.dataarray.DataArray'>, <class 'numpy.ndarray'>, None]
        Homogenisation for fit of temperature

        > Preprocessing xarray dataset 'PSAL' as PCM feature 'salinity'
         [<class 'xarray.core.dataarray.DataArray'>, <class 'dask.array.core.Array'>, ((7560,), (282,))] X RAVELED with success
                Output axis is in the input axis, not need to interpolate, simple intersection
         [<class 'xarray.core.dataarray.DataArray'>, <class 'dask.array.core.Array'>, ((7560,), (100,))] X INTERPOLATED with success)
         [<class 'xarray.core.dataarray.DataArray'>, <class 'numpy.ndarray'>, None] X SCALED with success)
         [<class 'xarray.core.dataarray.DataArray'>, <class 'numpy.ndarray'>, None] X REDUCED with success)
        salinity pre-processed with success,  [<class 'xarray.core.dataarray.DataArray'>, <class 'numpy.ndarray'>, None]
        Homogenisation for fit of salinity
        Features array shape and type for xarray: (7560, 30) <class 'numpy.ndarray'> <class 'memoryview'>
> Preprocessing done, working with final X (<class 'xarray.core.dataarray.DataArray'>) array of shape: (7560, 30)  and sampling dimensions: ['N_PROF']

Performance / Optimisation

Use the timeit and timeit_verb options to measure the computation time of PCM operations

Times are accessible as a pandas DataFrame through the timeit property of the pyXpcm instance.

The m.plot.timeit() plot method provides a simple visualisation of these times.

Time readings during execution
[4]:
# Create a PCM and execute methods:
m = pcm(K=8, features=features_pcm, timeit=True, timeit_verb=1)
m.fit(ds, features=features_in_ds);
  fit.1-preprocess.1-mask: 20 ms
  fit.1-preprocess.2-feature_temperature.1-ravel: 29 ms
  fit.1-preprocess.2-feature_temperature.2-interp: 0 ms
  fit.1-preprocess.2-feature_temperature.3-scale_fit: 7 ms
  fit.1-preprocess.2-feature_temperature.4-scale_transform: 4 ms
  fit.1-preprocess.2-feature_temperature.5-reduce_fit: 10 ms
  fit.1-preprocess.2-feature_temperature.6-reduce_transform: 2 ms
  fit.1-preprocess.2-feature_temperature.total: 55 ms
  fit.1-preprocess: 56 ms
  fit.1-preprocess.3-homogeniser: 1 ms
  fit.1-preprocess.2-feature_salinity.1-ravel: 25 ms
  fit.1-preprocess.2-feature_salinity.2-interp: 0 ms
  fit.1-preprocess.2-feature_salinity.3-scale_fit: 7 ms
  fit.1-preprocess.2-feature_salinity.4-scale_transform: 4 ms
  fit.1-preprocess.2-feature_salinity.5-reduce_fit: 9 ms
  fit.1-preprocess.2-feature_salinity.6-reduce_transform: 2 ms
  fit.1-preprocess.2-feature_salinity.total: 51 ms
  fit.1-preprocess: 51 ms
  fit.1-preprocess.3-homogeniser: 1 ms
  fit.1-preprocess.4-xarray: 0 ms
  fit.1-preprocess: 132 ms
  fit.fit: 2675 ms
  fit.score: 9 ms
  fit: 2817 ms
A posteriori Execution time analysis
[5]:
# Create a PCM and execute methods:
m = pcm(K=8, features=features_pcm, timeit=True, timeit_verb=0)
m.fit(ds, features=features_in_ds);
m.predict(ds, features=features_in_ds);
m.fit_predict(ds, features=features_in_ds);

Execution times are accessible through a dataframe with the pyxpcm.pcm.timeit property

[6]:
m.timeit
[6]:
Method       Sub-method    Sub-sub-method         Sub-sub-sub-method
fit          1-preprocess  1-mask                 total                   19.667864
                           2-feature_temperature  1-ravel                 28.775930
                                                  2-interp                 0.649929
                                                  3-scale_fit             10.529041
                                                  4-scale_transform        4.294872
                                                  5-reduce_fit            11.734009
                                                  6-reduce_transform       2.493382
                                                  total                   58.599949
                           total                                         232.898712
                           3-homogeniser          total                    2.095938
                           2-feature_salinity     1-ravel                 20.290136
                                                  2-interp                 0.611067
                                                  3-scale_fit              7.843971
                                                  4-scale_transform        4.334688
                                                  5-reduce_fit            10.154247
                                                  6-reduce_transform       2.441883
                                                  total                   45.794964
                           4-xarray               total                    0.997305
             fit           total                                        1721.768379
             score         total                                           8.548021
             total                                                      1859.453201
predict      1-preprocess  1-mask                 total                   18.440008
                           2-feature_temperature  1-ravel                 29.564142
                                                  2-interp                 0.645876
                                                  3-scale_fit              0.000954
                                                  4-scale_transform        4.281998
                                                  5-reduce_fit             0.002146
                                                  6-reduce_transform       2.423048
                                                  total                   37.017107
                           total                                         159.928322
                                                                           ...
                           2-feature_salinity     6-reduce_transform       2.339125
                                                  total                   32.124043
                           4-xarray               total                    1.005888
             predict       total                                           8.524895
             score         total                                           8.650064
             xarray        total                                           6.422997
             total                                                       114.479065
fit_predict  1-preprocess  1-mask                 total                   21.162987
                           2-feature_temperature  1-ravel                 20.349026
                                                  2-interp                 0.633001
                                                  3-scale_fit              0.001907
                                                  4-scale_transform        6.472826
                                                  5-reduce_fit             0.000954
                                                  6-reduce_transform       3.720999
                                                  total                   31.296015
                           total                                         149.966955
                           3-homogeniser          total                    2.356052
                           2-feature_salinity     1-ravel                 23.428917
                                                  2-interp                 0.654697
                                                  3-scale_fit              0.000954
                                                  4-scale_transform        4.308939
                                                  5-reduce_fit             0.001192
                                                  6-reduce_transform       2.245188
                                                  total                   30.743122
                           4-xarray               total                    0.961065
             fit           total                                        1801.964045
             score         total                                           8.421898
             predict       total                                           7.369041
             xarray        total                                           5.813122
             total                                                      1911.516190
Length: 66, dtype: float64

Visualisation help

To facilitate your analysis of execution times, you can use pyxpcm.plot.timeit().

Main steps by method
[7]:
fig, ax, df = m.plot.timeit(group='Method', split='Sub-method', style='darkgrid') # Default group/split
df
[7]:
Sub-method   1-preprocess          fit   predict     score    xarray
Method
fit            464.207888  1721.768379       NaN  8.548021       NaN
fit_predict    298.304796  1801.964045  7.369041  8.421898  5.813122
predict        318.270445          NaN  8.524895  8.650064  6.422997
_images/debug_perf_16_1.png
Preprocessing main steps by method
[8]:
fig, ax, df = m.plot.timeit(group='Method', split='Sub-sub-method')
df
[8]:
Sub-sub-method     1-mask  2-feature_salinity  2-feature_temperature  3-homogeniser  4-xarray
Method
fit             19.667864           91.470957             117.077112       2.095938  0.997305
fit_predict     21.162987           61.383009              62.474728       2.356052  0.961065
predict         18.440008           64.152002              73.935270       0.808954  1.005888
_images/debug_perf_18_1.png
Preprocessing details by method
[9]:
fig, ax, df = m.plot.timeit(group='Method', split='Sub-sub-sub-method')
df
[9]:
Sub-sub-sub-method    1-ravel  2-interp  3-scale_fit  4-scale_transform  5-reduce_fit  6-reduce_transform
Method
fit                 49.066067  1.260996    18.373013           8.629560     21.888256            4.935265
fit_predict         43.777943  1.287699     0.002861          10.781765      0.002146            5.966187
predict             54.390907  1.282930     0.002146           8.505106      0.002861            4.762173
_images/debug_perf_20_1.png
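The fit_predict rows of the three tables above can be reproduced from the timing Series by summing its entries over index levels. The sketch below does this in plain Python with values copied from the Series. The `pivot` helper is hypothetical, and the rule of skipping `total` labels at the split level is an assumption inferred from the numbers shown, not pyXpcm's actual implementation.

```python
from collections import defaultdict

# fit_predict entries copied from the timing Series shown earlier.
# Key levels: (Method, Sub-method, Sub-sub-method, Sub-sub-sub-method);
# "" marks a level that is not set for that entry.
P = ("fit_predict", "1-preprocess")
records = {
    P + ("1-mask", "total"): 21.162987,
    P + ("2-feature_temperature", "1-ravel"): 20.349026,
    P + ("2-feature_temperature", "2-interp"): 0.633001,
    P + ("2-feature_temperature", "3-scale_fit"): 0.001907,
    P + ("2-feature_temperature", "4-scale_transform"): 6.472826,
    P + ("2-feature_temperature", "5-reduce_fit"): 0.000954,
    P + ("2-feature_temperature", "6-reduce_transform"): 3.720999,
    P + ("2-feature_temperature", "total"): 31.296015,
    P + ("total", ""): 149.966955,
    P + ("3-homogeniser", "total"): 2.356052,
    P + ("2-feature_salinity", "1-ravel"): 23.428917,
    P + ("2-feature_salinity", "2-interp"): 0.654697,
    P + ("2-feature_salinity", "3-scale_fit"): 0.000954,
    P + ("2-feature_salinity", "4-scale_transform"): 4.308939,
    P + ("2-feature_salinity", "5-reduce_fit"): 0.001192,
    P + ("2-feature_salinity", "6-reduce_transform"): 2.245188,
    P + ("2-feature_salinity", "total"): 30.743122,
    P + ("4-xarray", "total"): 0.961065,
    ("fit_predict", "fit", "total", ""): 1801.964045,
    ("fit_predict", "score", "total", ""): 8.421898,
    ("fit_predict", "predict", "total", ""): 7.369041,
    ("fit_predict", "xarray", "total", ""): 5.813122,
    ("fit_predict", "total", "", ""): 1911.516190,
}


def pivot(records, group_level, split_level):
    """Sum timings by (group label, split label).

    Assumption: entries whose split-level label is "total" or unset are
    skipped, which is what the tables above appear to do.
    """
    table = defaultdict(float)
    for key, value in records.items():
        split_label = key[split_level]
        if split_label in ("", "total"):
            continue
        table[(key[group_level], split_label)] += value
    return dict(table)


by_sub_method = pivot(records, group_level=0, split_level=1)
by_detail = pivot(records, group_level=0, split_level=3)

# Matches the fit_predict / 1-preprocess cell of the Sub-method table.
print(round(by_sub_method[("fit_predict", "1-preprocess")], 6))
# Matches the fit_predict / 1-ravel cell of the Sub-sub-sub-method table.
print(round(by_detail[("fit_predict", "1-ravel")], 6))
```

Note that the 1-preprocess figure includes the nested `total` entries of deeper levels, so it is larger than the preprocessing total reported in the Series itself.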
Preprocessing details by features
[10]:
fig, ax, df = m.plot.timeit(split='Sub-sub-sub-method', group='Sub-sub-method', unit='s')
df
[10]:
Sub-sub-sub-method      1-ravel  2-interp  3-scale_fit  4-scale_transform  5-reduce_fit  6-reduce_transform
Sub-sub-method
2-feature_salinity     0.068546  0.001903     0.007846           0.012867      0.010156            0.007026
2-feature_temperature  0.078689  0.001929     0.010532           0.015050      0.011737            0.008637
_images/debug_perf_22_1.png

Help & reference

Bibliography

What’s New

v0.4.1 (21 Feb. 2020)

  • Improved documentation
  • Improved unit testing
  • Bug fix:
    • Fix a bug in the preprocessing step with the dask_ml backend that caused an error for data already in dask arrays

v0.4.0 (1 Nov. 2019)

Warning

The API has changed and breaks backward compatibility.

  • Enhancements:

    • Multiple-features classification
    • ND-Array classification (so that you can directly classify profiles from gridded products, e.g. a latitude/longitude/time grid, and not only a collection of profiles already in a 2D array)
    • pyXpcm methods can be accessed through the xarray.Dataset accessor namespace pyxpcm
    • Allow choosing the statistics backend (sklearn, dask_ml or user-defined)
    • Save/load PCM to/from netcdf files
  • pyXpcm now consumes xarray/dask objects all along, not only on the user front-end. This adds a small overhead for small datasets but allows a PCM to handle larger and more complex datasets.

v0.3 (5 Apr. 2019)

  • Removed support for python 2.7
  • Added more data input consistency checks
  • Fix bug in interpolation and plotting methods
  • Added custom colormap and colorbar to plot module

v0.2 (26 Mar. 2019)

  • Upgraded to Python 3.6 (still compatible with 2.7)
  • Added test for continuous coverage
  • Added score and bic methods
  • Improved vocabulary consistency in methods

v0.1.3 (12 Nov. 2018)

  • Initial release.

API reference

This page provides an auto-generated summary of pyXpcm’s API. For more details and examples, refer to the relevant chapters in the main part of the documentation.

Top-level PCM functions

Creating a PCM
pcm(K, features[, scaling, reduction, …]) Profile Classification Model class constructor
pyxpcm.load_netcdf(ncfile) Load a PCM model from netcdf file
Attributes
pcm.K Return the number of classes
pcm.F Return the number of features
pcm.features Return features definition dictionary
Computation
pcm.fit(self, ds[, features, dim]) Estimate PCM parameters
pcm.fit_predict(self, ds[, features, dim, …]) Estimate PCM parameters and predict classes.
pcm.predict(self, ds[, features, dim, …]) Predict labels for profile samples
pcm.predict_proba(self, ds[, features, dim, …]) Predict posterior probability of each component given the data
pcm.score(self, ds[, features, dim]) Compute the per-sample average log-likelihood of the given data
pcm.bic(self, ds[, features, dim]) Compute Bayesian information criterion for the current model on the input dataset

Low-level PCM properties and functions

pcm.timeit Return a pandas.DataFrame with the execution times of the methods called on this instance
pcm.ravel(self, da[, dim, feature_name]) Extract from N-d array a X(feature,sample) 2-d array and vertical dimension z
pcm.unravel(self, ds, sampling_dims, X) Create a DataArray from a numpy array and sampling dimensions

Plotting

pcm.plot Access plotting functions
Plot PCM Contents
plot.quantile(m, da[, xlim, classdimname, …]) Plot q-th quantiles of a DataArray for each PCM component
plot.scaler(m[, style, plot_kw, subplot_kw]) Plot PCM scalers properties
plot.reducer(m[, pcalist, style, maxcols, …]) Plot PCM reducers properties
plot.preprocessed(m, ds[, features, dim, n, …]) Plot preprocessed features as pairwise scatter plots
plot.timeit(m[, group, split, subplot_kw, …]) Plot PCM registered timing of operations
Tools
plot.cmap(m, name[, palette, usage]) Return categorical colormaps
plot.colorbar(m[, cmap]) Add a colorbar to the current plot with centered ticks on discrete colors
plot.subplots(m[, maxcols, K, subplot_kw]) Return (figure, axis) with one subplot per cluster
plot.latlongrid(ax[, dx, dy, fontsize]) Add latitude/longitude grid line and labels to a cartopy geoaxes

Statistics

pcm.stat Access statistics functions
stat.quantile(ds[, q, of, using, outname, …]) Compute q-th quantile of a xarray.DataArray for each PCM component
stat.robustness(ds[, name, classdimname, …]) Compute classification robustness
stat.robustness_digit(ds[, name, …]) Digitize classification robustness

Save/load PCM models

pcm.to_netcdf(self, ncfile, \*\*ka) Save a PCM to a netcdf file
pyxpcm.load_netcdf(ncfile) Load a PCM model from netcdf file

Helper

tutorial.open_dataset(name) Open a dataset from the pyXpcm online data repository (requires internet).

Xarray pyxpcm name space

Provide accessor to enhance interoperability between xarray and pyxpcm.

Provide a scope named pyxpcm as accessor to xarray.Dataset objects.

class pyxpcm.xarray.pyXpcmDataSetAccessor[source]

Class registered under scope pyxpcm to access xarray.Dataset objects.

add(self, da)[source]

Add a xarray.DataArray to this xarray.Dataset

bic(self, this_pcm, **kwargs)[source]

Compute Bayesian information criterion for the current model on the input dataset

Only for a GMM classifier

Parameters:
ds: xarray.Dataset

The dataset to work with

features: dict()

Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.

dim: str

Name of the vertical dimension in the input xarray.Dataset

Returns:
bic: float

The lower the better
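As an illustration of what this score measures, the sketch below computes the standard BIC formula, BIC = -2 ln(L) + p ln(n), in plain Python. The helper names are hypothetical, the full-covariance parameter count is an assumption about the GMM configuration, and the log-likelihood values are made up for the example; this is not pyXpcm's code.

```python
import math


def gmm_n_parameters(n_components, n_dims):
    """Free parameters of a Gaussian mixture (assuming full covariances)."""
    weights = n_components - 1  # mixture weights sum to 1
    means = n_components * n_dims
    covariances = n_components * n_dims * (n_dims + 1) // 2
    return weights + means + covariances


def bic(total_log_likelihood, n_parameters, n_samples):
    """Bayesian information criterion: lower is better."""
    return -2.0 * total_log_likelihood + n_parameters * math.log(n_samples)


# Made-up log-likelihoods for two candidate numbers of classes K,
# fitted on 1000 samples with 3 (reduced) dimensions.
n_samples, n_dims = 1000, 3
candidates = {4: -5200.0, 8: -5150.0}
scores = {
    K: bic(llh, gmm_n_parameters(K, n_dims), n_samples)
    for K, llh in candidates.items()
}
best_K = min(scores, key=scores.get)
```

In this made-up case the larger model improves the likelihood only slightly, so the parameter penalty makes the smaller K win.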

drop_all(self)[source]

Remove the xarray.DataArray variables created with pyXpcm from this xarray.Dataset

feature_dict(self, this_pcm, features=None)[source]

Return dictionary of features for this xarray.Dataset and a PCM

Parameters:
pcm : pyxpcm.pcmmodel.pcm
features : dict

Keys are PCM feature names, values are the corresponding xarray.Dataset variable names

Returns:
dict()

Dictionary where keys are PCM feature names and values the corresponding xarray.Dataset variables
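A minimal sketch of this mapping, in plain Python. The `resolve_features` helper, the variable names `TEMP` and `PSAL`, and the plain-dict stand-in for dataset attributes are all hypothetical; only the rule itself (explicit dictionary first, otherwise the `feature_name` attribute) comes from the description above.

```python
def resolve_features(pcm_features, variable_attrs, features=None):
    """Map PCM feature names to dataset variable names.

    pcm_features: list of feature names known to the PCM
    variable_attrs: {variable name: attributes dict}, standing in for a dataset
    features: optional explicit {feature name: variable name} mapping
    """
    if features is not None:
        missing = [name for name in pcm_features if name not in features]
        if missing:
            raise KeyError(f"no dataset variable given for features: {missing}")
        return {name: features[name] for name in pcm_features}

    # Fall back on the 'feature_name' attribute of each variable.
    resolved = {}
    for name in pcm_features:
        matches = [var for var, attrs in variable_attrs.items()
                   if attrs.get("feature_name") == name]
        if not matches:
            raise KeyError(f"no variable with feature_name={name!r}")
        resolved[name] = matches[0]
    return resolved


variable_attrs = {
    "TEMP": {"feature_name": "temperature"},
    "PSAL": {"feature_name": "salinity"},
    "depth": {"axis": "Z"},
}
by_attribute = resolve_features(["temperature", "salinity"], variable_attrs)
explicit = resolve_features(["temperature"], variable_attrs,
                            features={"temperature": "TEMP"})
```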

fit(self, this_pcm, **kwargs)[source]

Estimate PCM parameters

For a PCM, the fit method consists in the following operations:

  • pre-processing
    • interpolation to the feature_axis levels of the model
    • scaling
    • reduction
  • estimate classifier parameters
Parameters:
ds: xarray.Dataset

The dataset to work with

features: dict()

Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.

dim: str

Name of the vertical dimension in the input xarray.Dataset

Returns:
self
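The first two pre-processing operations listed for fit can be sketched in plain Python as follows. This is an illustration under simplifying assumptions (linear interpolation without extrapolation, standardization to zero mean and unit variance at each level), with hypothetical helper names and made-up profiles; reduction and classifier estimation are left out, and pyXpcm delegates the real work to its statistics backend.

```python
from statistics import fmean, pstdev


def interp_profile(z, values, z_target):
    """Linearly interpolate one profile onto target levels (no extrapolation)."""
    pairs = sorted(zip(z, values))  # ascending in z, levels assumed distinct
    out = []
    for zt in z_target:
        for (z0, v0), (z1, v1) in zip(pairs, pairs[1:]):
            if z0 <= zt <= z1:
                out.append(v0 + (v1 - v0) * (zt - z0) / (z1 - z0))
                break
        else:
            raise ValueError(f"level {zt} is outside the profile range")
    return out


def standardize(samples):
    """Centre and scale each level (column) across all samples."""
    columns = list(zip(*samples))
    means = [fmean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard against constant levels
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in samples]


depth = [0, -10, -20]        # vertical axis of the input profiles
feature_axis = [-5, -15]     # levels expected by the model
profiles = [[20, 18, 10], [22, 19, 12], [18, 17, 9]]  # made-up temperatures

interpolated = [interp_profile(depth, p, feature_axis) for p in profiles]
scaled = standardize(interpolated)
```

Each row of `scaled` is one sample and each column one feature, which is the layout a classifier would then be fitted on.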
fit_predict(self, this_pcm, **kwargs)[source]

Estimate PCM parameters and predict classes.

This method adds these properties to the PCM object:

  • llh: The log likelihood of the model with regard to new data
Parameters:
ds: xarray.Dataset

The dataset to work with

features: dict()

Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.

dim: str

Name of the vertical dimension in the input xarray.Dataset

inplace: boolean, False by default

If False, return a xarray.DataArray with predicted labels. If True, return the input xarray.Dataset with labels added as a new xarray.DataArray.

name: string (‘PCM_LABELS’)

Name of the DataArray holding labels.

Returns:
xarray.DataArray

Component labels (if option ‘inplace’ = False)

or
xarray.Dataset

Input dataset with component labels as a ‘PCM_LABELS’ new xarray.DataArray (if option ‘inplace’ = True)

mask(self, this_pcm, features=None, dim=None)[source]

Create a mask where all PCM features are defined

Create a mask where all feature profiles are not null over the PCM feature axis.

Parameters:
this_pcm : pyxpcm.pcmmodel.pcm
features : dict()

Definitions of this_pcm features in the xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.

dim : str

Name of the vertical dimension in the xarray.Dataset. If not specified or set to None, dim is identified as the xarray.DataArray variables with attributes ‘axis’ set to ‘z’.

Returns:
xarray.DataArray
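The idea behind this mask can be sketched in plain Python: a sample is kept only if every feature profile is fully defined. The `valid_mask` helper and the data are hypothetical, and the profiles are assumed to be already on the PCM feature axis, with NaN marking missing values.

```python
import math


def valid_mask(features):
    """Return one boolean per sample: True if no feature has a missing value.

    features: {feature name: list of profiles}, same number of profiles each
    """
    n_samples = len(next(iter(features.values())))
    return [
        all(not math.isnan(v)
            for profiles in features.values()
            for v in profiles[i])
        for i in range(n_samples)
    ]


nan = float("nan")
features = {
    "temperature": [[20.0, 18.0], [nan, 17.0], [19.0, 16.0]],
    "salinity": [[35.0, 35.1], [35.2, 35.3], [35.0, nan]],
}
mask = valid_mask(features)  # only the first sample is complete
```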
predict(self, this_pcm, inplace=False, **kwargs)[source]

Predict labels for profile samples

This method adds these properties to the PCM object:

  • llh: The log likelihood of the model with regard to new data
Parameters:
ds: xarray.Dataset

The dataset to work with

features: dict()

Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.

dim: str

Name of the vertical dimension in the input xarray.Dataset

inplace: boolean, False by default

If False, return a xarray.DataArray with predicted labels. If True, return the input xarray.Dataset with labels added as a new xarray.DataArray.

name: str, default is ‘PCM_LABELS’

Name of the xarray.DataArray with labels

Returns:
xarray.DataArray

Component labels (if option ‘inplace’ = False)

or
xarray.Dataset

Input dataset with Component labels as a ‘PCM_LABELS’ new xarray.DataArray (if option ‘inplace’ = True)

predict_proba(self, this_pcm, **kwargs)[source]

Predict posterior probability of each component given the data

This method adds these properties to the PCM instance:

  • llh: The log likelihood of the model with regard to new data
Parameters:
ds: xarray.Dataset

The dataset to work with

features: dict()

Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.

dim: str

Name of the vertical dimension in the input xarray.Dataset

inplace: boolean, False by default

If False, return a xarray.DataArray with predicted probabilities. If True, return the input xarray.Dataset with probabilities added as a new xarray.DataArray.

name: str, default is ‘PCM_POST’

Name of the DataArray with prediction probability (posteriors)

classdimname: str, default is ‘pcm_class’

Name of the dimension holding classes

Returns:
xarray.DataArray

Probability of each Gaussian (state) in the model given each sample (if option ‘inplace’ = False)

or
xarray.Dataset

Input dataset with Component Probability as a ‘PCM_POST’ new xarray.DataArray (if option ‘inplace’ = True)
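For a Gaussian mixture, these posteriors follow from Bayes' rule: the weighted density of each component divided by the sum over all components. The sketch below shows this for a one-dimensional mixture in plain Python; the helper names and the mixture parameters are made up, and a real PCM works on multi-dimensional reduced features.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def posteriors(x, weights, means, stds):
    """Posterior probability of each mixture component given a sample x."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]


# Made-up two-class mixture (e.g. a cold and a warm class).
weights, means, stds = [0.5, 0.5], [5.0, 15.0], [2.0, 2.0]

post_near_cold = posteriors(6.0, weights, means, stds)
post_midway = posteriors(10.0, weights, means, stds)

# The predicted label is the component with the highest posterior.
label = max(range(len(post_near_cold)), key=post_near_cold.__getitem__)
```

A sample halfway between the two means gets equal posteriors, which is the kind of ambiguous classification the robustness metric is meant to flag.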

quantile(self, this_pcm, inplace=False, **kwargs)[source]

Compute q-th quantile of a xarray.DataArray for each PCM component

Parameters:
q: float in the range of [0,1] (or sequence of floats)

Quantiles to compute, which must be between 0 and 1 inclusive.

of: str

Name of the xarray.Dataset variable to compute quantiles for.

using: str

Name of the xarray.Dataset variable with classification labels to use. Use ‘PCM_LABELS’ by default.

outname: ‘PCM_QUANT’ or str

Name of the xarray.DataArray with quantile

keep_attrs: boolean, False by default

Whether to preserve the xarray.Dataset attributes in the new quantile variable.

Returns:
xarray.Dataset with shape (K, n_quantiles, N_z=n_features)
or
xarray.DataArray with shape (K, n_quantiles, N_z=n_features)
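The output shape (K, n_quantiles, N_z) can be illustrated with a plain-Python sketch that groups profiles by class label and computes quantiles level by level. The helper names and the data are hypothetical, and the linear-interpolation quantile rule is an assumption chosen for the example.

```python
def quantile(values, q):
    """q-th quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def class_quantiles(profiles, labels, n_classes, qs):
    """Nested list with shape (K, n_quantiles, N_z)."""
    result = []
    for k in range(n_classes):
        members = [p for p, lab in zip(profiles, labels) if lab == k]
        levels = list(zip(*members))  # one tuple of values per vertical level
        result.append([[quantile(level, q) for level in levels] for q in qs])
    return result


# Four made-up profiles on two vertical levels, classified into K=2 classes.
profiles = [[1, 10], [3, 30], [5, 50], [7, 70]]
labels = [0, 0, 1, 1]
quantiles = class_quantiles(profiles, labels, n_classes=2, qs=[0.05, 0.5, 0.95])
median_class_0 = quantiles[0][1]
```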
robustness(self, this_pcm, inplace=False, **kwargs)[source]

Compute classification robustness

Parameters:
name: str, default is ‘PCM_POST’

Name of the xarray.DataArray with prediction probability (posteriors)

classdimname: str, default is ‘pcm_class’

Name of the dimension holding classes

outname: ‘PCM_ROBUSTNESS’ or str

Name of the xarray.DataArray with robustness

inplace: boolean, False by default

If False, return a xarray.DataArray with robustness. If True, return the input xarray.Dataset with robustness added as a new xarray.DataArray.

Returns:
xarray.Dataset if inplace=True
or
xarray.DataArray if inplace=False
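One way to turn posteriors into a robustness value is to rescale the largest posterior from its possible range [1/K, 1] to [0, 1]. The sketch below assumes this definition for illustration; the formula is not stated on this page, so check the pyXpcm source before relying on it.

```python
def robustness(posteriors):
    """Rescale the largest posterior from [1/K, 1] to [0, 1].

    Assumed definition: 0 means all classes are equally likely,
    1 means one class is certain.
    """
    K = len(posteriors)
    return (max(posteriors) - 1.0 / K) * K / (K - 1.0)


ambiguous = robustness([0.25, 0.25, 0.25, 0.25])
certain = robustness([1.0, 0.0, 0.0, 0.0])
intermediate = robustness([0.7, 0.1, 0.1, 0.1])
```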
robustness_digit(self, this_pcm, inplace=False, **kwargs)[source]

Digitize classification robustness

Parameters:
ds: xarray.Dataset

Input dataset

name: str, default is ‘PCM_POST’

Name of the xarray.DataArray with prediction probability (posteriors)

classdimname: str, default is ‘pcm_class’

Name of the dimension holding classes

outname: ‘PCM_ROBUSTNESS_CAT’ or str

Name of the xarray.DataArray with robustness categories

inplace: boolean, False by default

If False, return a xarray.DataArray with robustness categories. If True, return the input xarray.Dataset with robustness categories added as a new xarray.DataArray.

Returns:
xarray.Dataset if inplace=True
or
xarray.DataArray if inplace=False
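Digitizing means mapping a continuous robustness value in [0, 1] to a small number of ordered categories. A minimal sketch in plain Python, where the bin edges are hypothetical thresholds chosen for the example and not necessarily those used by pyXpcm:

```python
from bisect import bisect_right

# Hypothetical category thresholds (assumption, for illustration only).
BIN_EDGES = [0.33, 0.66, 0.90, 0.99]


def digitize(robustness_value, edges=BIN_EDGES):
    """Return the index of the category a robustness value falls into."""
    return bisect_right(edges, robustness_value)


categories = [digitize(r) for r in (0.10, 0.50, 0.95, 1.00)]
```

With four edges there are five categories, numbered 0 (least robust) to 4 (most robust).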
sampling_dim(self, this_pcm, features=None, dim=None)[source]

Return the list of dimensions to be stacked for sampling

Parameters:
pcm : pyxpcm.pcm
features : None (default) or dict()

Keys are PCM feature names, values are the corresponding xarray.Dataset variable names. If set to None, all PCM features are used.

dim : None (default) or str()

The xarray.Dataset dimension to use as vertical axis in all features. If set to None, it is automatically set to the dimension with an attribute axis set to Z.

Returns:
dict()

Dictionary where keys are xarray.Dataset variable names of features and values are another dictionary with the list of sampling dimension in DIM_SAMPLING key and the name of the vertical axis in the DIM_VERTICAL key.
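The structure of the returned dictionary can be sketched in plain Python: for each feature variable, every dimension other than the vertical one is a sampling dimension. The `sampling_dims` helper and the variable and dimension names are hypothetical; only the DIM_SAMPLING and DIM_VERTICAL keys come from the description above.

```python
def sampling_dims(variable_dims, vertical_dim):
    """Split the dimensions of each variable into sampling and vertical ones.

    variable_dims: {variable name: tuple of dimension names}
    """
    out = {}
    for name, dims in variable_dims.items():
        if vertical_dim not in dims:
            raise ValueError(f"{name!r} has no dimension {vertical_dim!r}")
        out[name] = {
            "DIM_SAMPLING": [d for d in dims if d != vertical_dim],
            "DIM_VERTICAL": vertical_dim,
        }
    return out


# A gridded product: profiles are spread over time, latitude and longitude.
variable_dims = {
    "TEMP": ("time", "depth", "latitude", "longitude"),
    "PSAL": ("time", "depth", "latitude", "longitude"),
}
dims = sampling_dims(variable_dims, vertical_dim="depth")
```

Stacking the sampling dimensions into a single one turns an N-d array into the 2-d (sample, feature) layout used for classification.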

score(self, this_pcm, **kwargs)[source]

Compute the per-sample average log-likelihood of the given data

Parameters:
ds: xarray.Dataset

The dataset to work with

features: dict()

Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes ‘feature_name’.

dim: str

Name of the vertical dimension in the input xarray.Dataset

Returns:
log_likelihood: float

In the case of a GMM classifier, this is the Log likelihood of the Gaussian mixture given data
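For a Gaussian mixture, the per-sample average log-likelihood is the mean over samples of the log of the weighted sum of component densities. The sketch below computes it for a one-dimensional mixture in plain Python, using a log-sum-exp step for numerical stability; the helper names and parameters are made up for illustration.

```python
import math


def log_gaussian(x, mean, std):
    """Log-density of a 1-D normal distribution."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def average_log_likelihood(samples, weights, means, stds):
    """Mean over samples of log(sum_k w_k N(x | mean_k, std_k))."""
    total = 0.0
    for x in samples:
        terms = [math.log(w) + log_gaussian(x, m, s)
                 for w, m, s in zip(weights, means, stds)]
        peak = max(terms)  # log-sum-exp: factor out the largest term
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total / len(samples)


weights, means, stds = [0.5, 0.5], [5.0, 15.0], [2.0, 2.0]
well_described = average_log_likelihood([5.0, 15.0, 6.0], weights, means, stds)
poorly_described = average_log_likelihood([40.0, -20.0], weights, means, stds)
```

Samples close to the class means score higher than samples far from every class, so the value can be compared between datasets scored against the same model.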

split(self)[source]

Split pyXpcm variables from the original xarray.Dataset

Returns:
xarray.Dataset, xarray.Dataset

Two Datasets: one with the pyXpcm variables, one with the original Dataset