Overview

What is an ocean PCM?

An ocean PCM is a Profile Classification Model for ocean data, a statistical procedure to classify ocean vertical profiles into a finite set of “clusters”. Depending on the dataset, such clusters can show space/time coherence that can be used in many different ways to study the ocean.

Statistical method

A PCM consists of unsupervised classification (clustering) applied to vertical profiles of one or more ocean variables.

Each level of the vertical axis of each ocean variable is considered a feature. One ocean vertical profile, with its one or more ocean variables, is considered a sample.
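The feature/sample layout described above can be sketched with plain NumPy. This is only an illustration on made-up data: the variable names, the depth levels and the synthetic values are assumptions, not part of pyXpcm.

```python
import numpy as np

# Hypothetical toy data: 5 profiles on a 4-level vertical axis (depth in m, negative downward).
depth = np.array([0.0, -100.0, -500.0, -1000.0])
n_profiles = 5
rng = np.random.default_rng(0)

# One array per ocean variable, shaped (n_profiles, n_levels).
temperature = 20.0 + 0.015 * depth + rng.normal(0.0, 0.5, size=(n_profiles, depth.size))
salinity = 35.0 + 0.0005 * depth + rng.normal(0.0, 0.05, size=(n_profiles, depth.size))

# Each row is one sample (a profile); each column is one feature
# (one variable at one vertical level). Variables are placed side by side.
X = np.hstack([temperature, salinity])

print(X.shape)  # (5, 8): 5 samples, 2 variables x 4 levels = 8 features
```

With two variables on a four-level axis, each profile therefore contributes one row of eight features to the array handed to the clustering step.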

Full details of the Profile Classification Modelling (PCM) statistical methodology can be found in Maze et al., 2017.

Illustration

Given a collection of Argo temperature profiles in the North Atlantic, a PCM analysis is applied and produces an optimal set of 8 ocean temperature profile classes. The PCM clusters synthesize the structural information of heat distribution in the North Atlantic. Each cluster objectively defines an ocean region where the dynamics give rise to a unique vertical stratification pattern.

_images/graphical-abstract.png

Maze et al., 2017 applied it to the North Atlantic with Argo temperature data. Jones et al., 2019 later applied it to the Southern Ocean, also with Argo temperature data. Rosso et al. (in prep) have applied it to the Southern Indian Ocean using both temperature and salinity Argo data.

pyXpcm

pyXpcm is a Python implementation of the PCM method that consumes and produces Xarray objects (xarray.Dataset and xarray.DataArray), hence the “X” in its name.

With pyXpcm you can conduct a PCM analysis for a collection of profiles (gridded or not), of one or more ocean variables, stored in an xarray.Dataset. pyXpcm also provides basic statistics and plotting functions to get you started with your analysis.

The philosophy of the pyXpcm toolbox is that a PCM can be created from, and applied to, different ocean datasets and variables. To achieve this, a PCM is created with information about the ocean variables to classify and the vertical axis of these variables. The PCM can then be fitted and subsequently used to classify ocean profiles from any dataset, as long as that dataset contains the PCM variables.

The pyXpcm procedure is to preprocess the data (stack, scale, reduce and combine) and then to fit a classifier on it. Once the model is fitted, pyXpcm can classify data. The library borrows much of its vocabulary and logic from scikit-learn but does not inherit from sklearn.base.BaseEstimator.
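The preprocess, fit and classify steps can be sketched with scikit-learn components directly. This is a minimal illustration, not the pyXpcm API: the choice of StandardScaler, PCA and GaussianMixture is an assumption made for the example, and the synthetic profiles, the helper functions make_profiles and preprocess, and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
depth = np.linspace(0.0, -1000.0, 20)  # hypothetical vertical axis (m)


def make_profiles(n, surface_temp):
    """Toy temperature profiles relaxing exponentially to 4 degC at depth."""
    base = 4.0 + (surface_temp - 4.0) * np.exp(depth / 300.0)
    return base + rng.normal(0.0, 0.3, size=(n, depth.size))


# Stack: samples (profiles) in rows, vertical levels (features) in columns.
X_train = np.vstack([make_profiles(100, 25.0), make_profiles(100, 10.0)])

# Scale: standardize each vertical level.
scaler = StandardScaler().fit(X_train)
# Reduce: keep a few principal components (3 is an arbitrary choice here).
reducer = PCA(n_components=3).fit(scaler.transform(X_train))
# Combine: with several variables, their reduced arrays would be
# concatenated column-wise; a single variable is used in this sketch.


def preprocess(X):
    """Apply the fitted scaling and reduction to an array of profiles."""
    return reducer.transform(scaler.transform(X))


# Fit a classifier on the preprocessed data (a Gaussian mixture is assumed).
classifier = GaussianMixture(n_components=2, random_state=0)
classifier.fit(preprocess(X_train))

# Classify profiles from another dataset defined on the same vertical axis.
X_new = np.vstack([make_profiles(5, 25.0), make_profiles(5, 10.0)])
labels = classifier.predict(preprocess(X_new))
posteriors = classifier.predict_proba(preprocess(X_new))

print(labels.shape, posteriors.shape)
```

The scaler and reducer are fitted once on the training profiles and then reused unchanged on the new profiles, which is what lets a fitted model classify a different dataset, provided it carries the same variables on the same vertical axis.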