PCM properties
Before it classifies profiles, the PCM class interpolates, scales and reduces the data under the hood.
Here is how to access the results of these preprocessing steps and the preprocessed data.
Import and set-up
Import the library and toy data
[2]:
import numpy as np
import pyxpcm
from pyxpcm.models import pcm
Let’s work with a standard PCM of temperature and salinity, from the surface down to -1000m:
[3]:
# Define a vertical axis to work with
z = np.arange(0.,-1000,-10.)
# Define features to use
features_pcm = {'temperature': z, 'salinity': z}
# Instantiate the PCM
m = pcm(K=4, features=features_pcm, maxvar=2)
print(m)
<pcm 'gmm' (K: 4, F: 2)>
Number of class: 4
Number of feature: 2
Feature names: odict_keys(['temperature', 'salinity'])
Fitted: False
Feature: 'temperature'
Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
Scaler: 'normal', <class 'sklearn.preprocessing.data.StandardScaler'>
Reducer: True, <class 'sklearn.decomposition.pca.PCA'>
Feature: 'salinity'
Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
Scaler: 'normal', <class 'sklearn.preprocessing.data.StandardScaler'>
Reducer: True, <class 'sklearn.decomposition.pca.PCA'>
Classifier: 'gmm', <class 'sklearn.mixture.gaussian_mixture.GaussianMixture'>
Note that we used a strong dimensionality reduction here to limit the dimensions and size of the plots to come (maxvar=2
tells the PCM to keep the first 2 principal components of each variable).
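To make the reduction step more concrete, here is a minimal NumPy sketch of keeping the first 2 principal components of one feature. It is not pyXpcm's own code: the data are synthetic, and the names `profiles`, `n_keep` and `reduced` are made up for this illustration.

```python
import numpy as np

# Synthetic stand-in for one feature: 200 profiles on 100 vertical levels.
rng = np.random.default_rng(0)
profiles = rng.normal(size=(200, 100))

# PCA works on centered data: remove the mean of each vertical level.
centered = profiles - profiles.mean(axis=0)

# SVD of the centered data; the rows of vt are the principal axes (eigenvectors).
u, s, vt = np.linalg.svd(centered, full_matrices=False)

n_keep = 2  # plays the role of maxvar=2 in this sketch
reduced = centered @ vt[:n_keep].T  # one row per profile, one column per component

# Fraction of the total variance carried by the retained components.
explained = (s[:n_keep] ** 2).sum() / (s ** 2).sum()
print(reduced.shape, float(explained))
```

With two features each reduced to 2 components, the classifier ends up working with 4 columns in total.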
Now we can load a dataset to be used for fitting.
[4]:
ds = pyxpcm.tutorial.open_dataset('argo').load()
Fit the model and predict the class labels on this dataset:
[5]:
features_in_ds = {'temperature': 'TEMP', 'salinity': 'PSAL'}
ds = ds.pyxpcm.fit_predict(m, features=features_in_ds, inplace=True)
print(ds)
<xarray.Dataset>
Dimensions: (DEPTH: 282, N_PROF: 7560)
Coordinates:
* N_PROF (N_PROF) int64 0 1 2 3 4 5 6 ... 7554 7555 7556 7557 7558 7559
* DEPTH (DEPTH) float32 0.0 -5.0 -10.0 -15.0 ... -1395.0 -1400.0 -1405.0
Data variables:
LATITUDE (N_PROF) float32 ...
LONGITUDE (N_PROF) float32 ...
TIME (N_PROF) datetime64[ns] ...
DBINDEX (N_PROF) float64 ...
TEMP (N_PROF, DEPTH) float32 27.422163 27.422163 ... 4.391791
PSAL (N_PROF, DEPTH) float32 36.35267 36.35267 ... 34.910286
SIG0 (N_PROF, DEPTH) float32 ...
BRV2 (N_PROF, DEPTH) float32 ...
PCM_LABELS (N_PROF) int64 1 1 1 1 1 1 1 1 1 1 1 1 ... 3 3 3 3 3 3 3 3 3 3 3
Attributes:
Sample test prepared by: G. Maze
Institution: Ifremer/LOPS
Data source DOI: 10.17882/42182
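Once PCM_LABELS is available, a common first check is how many profiles fall into each class. The sketch below uses a short, hypothetical label array in place of `ds['PCM_LABELS'].values`, and assumes the labels are integers between 0 and K-1.

```python
import numpy as np

K = 4  # number of classes, as in the PCM defined above

# Hypothetical labels standing in for ds['PCM_LABELS'].values
labels = np.array([1, 1, 1, 0, 2, 3, 3, 3, 2, 1])

# Count the profiles in each class; minlength keeps empty classes in the result.
counts = np.bincount(labels, minlength=K)
fractions = counts / counts.sum()

for k in range(K):
    print(f"class {k}: {counts[k]} profiles ({fractions[k]:.0%})")
```

A class with very few profiles can be a hint that K is larger than the data support.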
Scaler properties
[6]:
fig, ax = m.plot.scaler()
# More options:
# m.plot.scaler(style='darkgrid')
# m.plot.scaler(style='darkgrid', subplot_kw={'ylim':[-1000,0]})
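The model summary printed earlier lists a StandardScaler for each feature. Assuming it is used in the usual scikit-learn way, each vertical level is centered on its mean and divided by its standard deviation. The following is a rough NumPy sketch of that operation on synthetic profiles; the profile shape and the variable names are invented for the example.

```python
import numpy as np

# Vertical axis as defined for the PCM above.
z = np.arange(0., -1000., -10.)

# Synthetic temperature-like profiles: warm at the surface, colder at depth, plus noise.
rng = np.random.default_rng(1)
profiles = 4.0 + 20.0 * np.exp(z / 300.0) + rng.normal(scale=0.5, size=(50, z.size))

# One mean and one standard deviation per vertical level.
level_mean = profiles.mean(axis=0)
level_std = profiles.std(axis=0)

# Standardized profiles: zero mean and unit variance at every level.
scaled = (profiles - level_mean) / level_std
print(scaled.shape)
```

Plotting `level_mean` and `level_std` against `z` would give mean and standard deviation profiles for this synthetic feature.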
Reducer properties
Plot the eigenvectors of a PCA reducer, or nothing if no reducer is used:
[7]:
fig, ax = m.plot.reducer()
# Equivalent to:
# pcmplot.reducer(m)
# More options:
# m.plot.reducer(pcalist = range(0,4));
# m.plot.reducer(pcalist = [0], maxcols=1);
# m.plot.reducer(pcalist = range(0,4), style='darkgrid', plot_kw={'linewidth':1.5}, subplot_kw={'ylim':[-1400,0]}, figsize=(12,10));
Scatter plot of features, as seen by the classifier
You can access the pre-processed data for your own plots and analysis through the pyxpcm.pcm.preprocessing()
method:
[8]:
X, sampling_dims = m.preprocessing(ds, features=features_in_ds)
X
[8]:
<xarray.DataArray (n_samples: 7560, n_features: 4)>
array([[ 1.928166, -0.091499, 1.7341 , -0.270248],
[ 2.314077, 0.106842, 2.083683, -0.18765 ],
[ 1.675512, -0.17313 , 1.563701, -0.432449],
...,
[-0.802601, -0.578377, -1.576134, -0.311841],
[-0.955218, -0.609439, -1.804922, -0.427322],
[-0.892514, -0.623732, -1.792266, -0.465512]], dtype=float32)
Coordinates:
* n_samples (n_samples) int64 0 1 2 3 4 5 ... 7554 7555 7556 7557 7558 7559
* n_features (n_features) <U13 'temperature_0' ... 'salinity_1'
pyXpcm returns a 2-dimensional xarray.DataArray,
whose pairwise relationships can easily be visualised with the pyxpcm.plot.preprocessed()
method (this requires Seaborn):
[9]:
g = m.plot.preprocessed(ds, features=features_in_ds, style='darkgrid')
# A posteriori adjustments:
# g.set(xlim=(-3,3),ylim=(-3,3))
# g.savefig('toto.png')
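Pairwise relationships between the preprocessed features can also be summarized numerically with a correlation matrix. The sketch below is only illustrative: it uses the six rows of X displayed above rather than the full array, so the values are not representative of the whole dataset. With the real data, `X.values` could be passed to the same call.

```python
import numpy as np

# The six rows of X shown in the output above (4 reduced features per profile).
X_rows = np.array([
    [ 1.928166, -0.091499,  1.7341,   -0.270248],
    [ 2.314077,  0.106842,  2.083683, -0.18765 ],
    [ 1.675512, -0.17313,   1.563701, -0.432449],
    [-0.802601, -0.578377, -1.576134, -0.311841],
    [-0.955218, -0.609439, -1.804922, -0.427322],
    [-0.892514, -0.623732, -1.792266, -0.465512],
])

# rowvar=False: columns are the variables (features), rows are the samples.
corr = np.corrcoef(X_rows, rowvar=False)
print(np.round(corr, 2))
```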
[10]:
# Combine KDE with histograms (very slow plot):
g = m.plot.preprocessed(ds, features=features_in_ds, kde=True)