Cubical complex persistence scikit-learn like interface

Since: GUDHI 3.6.0

License: MIT

Requires: Scikit-learn

Cubical complex persistence scikit-learn like interface example

In this example, handwritten digits are used as input. A TDA scikit-learn pipeline is constructed, composed of:

CubicalPersistence that builds a cubical complex from the inputs and returns its persistence diagrams
DiagramSelector that removes non-finite values from the persistence diagrams
PersistenceImage that builds the persistence images from persistence diagrams
SVC which is a scikit-learn support vector classifier.
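As a rough illustration of what the finite_diags step is for, the sketch below drops diagram points that have an infinite coordinate, using plain Python lists. The helper name keep_finite_points and the sample diagram are hypothetical; this is a conceptual stand-in, not the DiagramSelector implementation.

```python
import math


def keep_finite_points(diagram):
    """Return only the (birth, death) pairs whose coordinates are both finite."""
    return [(b, d) for (b, d) in diagram if math.isfinite(b) and math.isfinite(d)]


# Hypothetical H0 diagram: one class that never dies (infinite death) and two finite ones.
diagram = [(0.0, math.inf), (0.0, 255.0), (12.0, 87.0)]
finite_diagram = keep_finite_points(diagram)
print(finite_diagram)
```

A point with an infinite death value cannot be placed on a bounded image grid, which is one reason to filter before vectorizing.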

This ML pipeline is trained to detect whether a handwritten digit is an ‘8’, relying on the fact that an ‘8’ has two holes in \(\mathbf{H}_1\) or, as in this example, three connected components in \(\mathbf{H}_0\).
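The "three connected components" claim can be checked by hand on a toy image. The sketch below assumes that dark background pixels enter the filtration before the brighter ink pixels, so the outer background and the two loops of the ‘8’ start as separate components. It counts those background regions with a flood fill, treating diagonally touching pixels as connected (an assumption of this sketch). The images and the function name are hypothetical, and no persistence is computed here.

```python
def count_background_components(image):
    """Count 8-connected components of background ('.') pixels with a flood fill."""
    rows, cols = len(image), len(image[0])
    seen = set()
    components = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != "." or (r, c) in seen:
                continue
            # Found an unvisited background pixel: start a new component.
            components += 1
            stack = [(r, c)]
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == "."
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return components


# '#' is ink, '.' is background.
EIGHT = [
    ".......",
    ".#####.",
    ".#...#.",
    ".#####.",
    ".#...#.",
    ".#####.",
    ".......",
]
ZERO = [
    ".......",
    ".#####.",
    ".#...#.",
    ".#...#.",
    ".#...#.",
    ".#####.",
    ".......",
]

eight_components = count_background_components(EIGHT)
zero_components = count_background_components(ZERO)
print(eight_components, zero_components)
```

The toy ‘8’ yields three background regions and the toy ‘0’ only two, which is the kind of difference the H0 diagrams give the classifier to work with.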

# Standard scientific Python imports
import numpy as np

# Standard scikit-learn imports
from sklearn.datasets import fetch_openml
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn import metrics

# Import TDA pipeline requirements
from gudhi.sklearn.cubical_persistence import CubicalPersistence
from gudhi.representations import PersistenceImage, DiagramSelector

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)

# Target is: "is an eight ?"
y = (y == "8") * 1
print("There are", np.sum(y), "eights out of", len(y), "numbers.")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
pipe = Pipeline(
    [
        ("cub_pers", CubicalPersistence(homology_dimensions=0, newshape=[-1, 28, 28], n_jobs=-2)),
        # Or for multiple persistence dimension computation
        # ("cub_pers", CubicalPersistence(homology_dimensions=[0, 1], newshape=[-1, 28, 28])),
        # ("H0_diags", DimensionSelector(index=0)),  # where index is the index in the homology_dimensions array
        ("finite_diags", DiagramSelector(use=True, point_type="finite")),
        (
            "pers_img",
            PersistenceImage(bandwidth=50, weight=lambda x: x[1] ** 2, im_range=[0, 256, 0, 256], resolution=[20, 20]),
        ),
        ("svc", SVC()),
    ]
)

# Learn from the train subset
pipe.fit(X_train, y_train)
# Predict from the test subset
predicted = pipe.predict(X_test)

print(f"Classification report for TDA pipeline {pipe}:\n" f"{metrics.classification_report(y_test, predicted)}\n")

There are 6825 eights out of 70000 numbers.
Classification report for TDA pipeline Pipeline(steps=[('cub_pers',
                 CubicalPersistence(newshape=[28, 28], n_jobs=-2)),
                ('finite_diags', DiagramSelector(use=True)),
                ('pers_img',
                 PersistenceImage(bandwidth=50, im_range=[0, 256, 0, 256],
                                  weight=<function <lambda> at 0x7f3e54137ae8>)),
                ('svc', SVC())]):
              precision    recall  f1-score   support

           0       0.97      0.99      0.98     25284
           1       0.92      0.68      0.78      2716

    accuracy                           0.96     28000
   macro avg       0.94      0.84      0.88     28000
weighted avg       0.96      0.96      0.96     28000
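To read the report, it can help to recompute the f1-score column from the precision and recall columns. The sketch below uses the rounded values printed above, so the results only agree to two decimals; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded precision/recall values copied from the report above.
f1_not_eight = f1_score(0.97, 0.99)
f1_eight = f1_score(0.92, 0.68)
# The macro average is the unweighted mean over the two classes.
macro_f1 = (f1_not_eight + f1_eight) / 2

print(round(f1_not_eight, 2), round(f1_eight, 2), round(macro_f1, 2))
```

The recall of 0.68 for class 1 means that roughly a third of the eights in the test subset are missed, which the overall accuracy of 0.96 hides because eights make up under 10% of the samples.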

Cubical complex persistence scikit-learn like interface reference

class gudhi.sklearn.cubical_persistence.CubicalPersistence(homology_dimensions, newshape=None, homology_coeff_field=11, min_persistence=0.0, n_jobs=None)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

This is a class for computing the persistence diagrams from a cubical complex.

__init__(homology_dimensions, newshape=None, homology_coeff_field=11, min_persistence=0.0, n_jobs=None)

Constructor for the CubicalPersistence class.

Parameters

homology_dimensions (int or list of int) – The dimension(s) of the returned persistence diagrams. Passing an int short-circuits the use of DimensionSelector when only one dimension matters.
newshape (tuple of ints) – If the cells' filtration values need to be reshaped (cf. transform()), set newshape to perform numpy.reshape(X, newshape, order='C') in the transform() method.
homology_coeff_field (int) – The homology coefficient field. Must be a prime number. Default value is 11.
min_persistence (float) – The minimum persistence value to take into account (strictly greater than min_persistence). Default value is 0.0. Set min_persistence to -1.0 to see all values.
n_jobs (int) – cf. https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html
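The "strictly greater than" wording for min_persistence explains why -1.0 is suggested to see all values: a point with zero persistence fails the test 0 > 0.0 but passes 0 > -1.0. The sketch below mimics that rule on plain Python lists; filter_by_persistence and the sample diagram are hypothetical, and this is not the library's implementation.

```python
def filter_by_persistence(diagram, min_persistence=0.0):
    """Keep (birth, death) pairs whose persistence is strictly greater than the threshold."""
    return [(b, d) for (b, d) in diagram if d - b > min_persistence]


# Hypothetical diagram that includes a zero-persistence point.
diagram = [(0.0, 0.0), (3.0, 5.0), (10.0, 60.0)]

default_kept = filter_by_persistence(diagram)          # threshold 0.0 drops (0.0, 0.0)
all_kept = filter_by_persistence(diagram, -1.0)        # threshold -1.0 keeps every point
long_lived = filter_by_persistence(diagram, 10.0)      # only persistence > 10 survives

print(default_kept, all_kept, long_lived, sep="\n")
```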

fit(X, Y=None)

Nothing to be done, but useful when included in a scikit-learn Pipeline.

transform(X, Y=None)

Compute all the cubical complexes and their associated persistence diagrams.

Parameters

X (list of list of float OR list of numpy.ndarray) – List of cells filtration values (reshaped with numpy.reshape(X, newshape, order='C') if newshape is set with a tuple of ints).
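As a small illustration of the reshape that newshape triggers, the sketch below applies numpy.reshape with order='C' to two flattened toy images. The array values and the 2x3 image size are made up for the example; the MNIST pipeline above uses [-1, 28, 28] in the same way.

```python
import numpy as np

# Two hypothetical 2x3 "images", each stored as a flat row of 6 filtration values.
X = np.array([
    [0, 1, 2, 3, 4, 5],
    [10, 11, 12, 13, 14, 15],
])

# -1 lets NumPy infer the number of images from the total size;
# order="C" fills each image row by row.
reshaped = np.reshape(X, [-1, 2, 3], order="C")

print(reshaped.shape)
print(reshaped[1])
```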

Returns

Persistence diagrams in the format:

If homology_dimensions was set to n: [array( Hn(X[0]) ), array( Hn(X[1]) ), …]
If homology_dimensions was set to [i, j]: [[array( Hi(X[0]) ), array( Hj(X[0]) )], [array( Hi(X[1]) ), array( Hj(X[1]) )], …]

Return type

list of (n, 2) array_like or list of list of (n, 2) array_like
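The two return layouts differ only in nesting depth, which matters when indexing the result. The sketch below uses plain lists of (birth, death) pairs with invented values in place of the arrays that transform() returns; the variable names are illustrative.

```python
# Layout when homology_dimensions is an int (say 0): one diagram per input.
single_dim = [
    [(0.0, 255.0), (0.0, 40.0)],   # H0(X[0])
    [(0.0, 255.0)],                # H0(X[1])
]

# Layout when homology_dimensions is a list (say [0, 1]): one list of diagrams per input.
multi_dim = [
    [[(0.0, 255.0), (0.0, 40.0)], [(30.0, 90.0)]],   # [H0(X[0]), H1(X[0])]
    [[(0.0, 255.0)], []],                            # [H0(X[1]), H1(X[1])]
]

h0_of_second_input = single_dim[1]
h1_of_first_input = multi_dim[0][1]   # input 0, then position 1 in [0, 1]

# Picking one position out of every input recovers the flat layout.
h0_only = [per_input[0] for per_input in multi_dim]
print(h0_only == single_dim)
```

The last step is the kind of selection that the DimensionSelector step mentioned in the example pipeline is there to perform.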
