Classes:

- class Gudhi::Persistence_representations::Persistence_heat_maps< Scalling_of_kernels >: A class implementing persistence heat maps.
- class Gudhi::Persistence_representations::Persistence_landscape: A class implementing persistence landscapes data structures.
- class Gudhi::Persistence_representations::Persistence_landscape_on_grid: A class implementing persistence landscapes by approximating them on a collection of grid points.
- class Gudhi::Persistence_representations::Vector_distances_in_diagram< F >: A class implementing persistence vectors.
- class Gudhi::Persistence_representations::Sliced_Wasserstein: A class implementing the Sliced Wasserstein kernel.
In order to perform most statistical tests and machine learning algorithms on data, one needs only a very limited number of operations. Let us fix a representation of data of a type A. To perform most statistical and machine learning operations, one needs to be able to compute the average of objects of type A (so that the averaged object is also of type A), to compute the distance between objects of type A, to vectorize an object of type A, and to compute the scalar product of a pair of objects of type A.
To put this statement into context, consider a permutation test: suppose we are given two collections of objects of type A whose averages we want to compare. We merge the two collections into a single collection B, shuffle B, and divide the shuffled version of B into two classes of the original sizes; repeating this and comparing the distances between the averages of the random classes with the distance between the averages of the original collections yields a p-value. The permutation test recalled above can be performed for any type A that can be averaged and that allows for computation of distances.
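As an illustration, the permutation test above can be sketched in a few lines of Python. Here type A is a plain numeric vector, and all names are illustrative, not part of the GUDHI API:

```python
import random

def average(objects):
    # coordinate-wise average of a collection of vectors (here, type A = vector)
    n = len(objects)
    return [sum(v[i] for v in objects) / n for i in range(len(objects[0]))]

def distance(u, v):
    # L1 distance between two vectors
    return sum(abs(a - b) for a, b in zip(u, v))

def permutation_test(C, B, trials=1000, seed=0):
    """Estimate a p-value for the hypothesis that C and B have equal averages."""
    rng = random.Random(seed)
    observed = distance(average(C), average(B))
    pooled = C + B
    count = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        c, b = pooled[:len(C)], pooled[len(C):]
        # how often does a random split separate the averages at least as much?
        if distance(average(c), average(b)) >= observed:
            count += 1
    return count / trials

# Two collections whose averages clearly differ, so the p-value should be small:
C = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0]]
B = [[1.0, 1.1], [1.1, 1.0], [1.0, 1.0]]
p = permutation_test(C, B)
```

Any type supporting `average` and `distance`, such as one of the persistence representations discussed below, could be substituted for the vectors.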
The Persistence_representations package contains a collection of various representations of persistent homology that implement the concepts described below:
At the moment, implementations of the following representations of persistence are available (further details of these representations are discussed later):
Note that while functionalities like averaging, distances, and scalar products are fixed, there is no canonical way of vectorizing or computing real-valued characteristics of the objects. The vectorization and real-valued characteristic computation procedures are therefore quite likely to evolve in future versions of the library.
The main aim of this implementation is to enable various statistical methods, both at the C++ level and at the Python level. The methods operate on the functionalities offered by the concepts, which means that the statistical and ML methods can operate on any representation that implements the required concept (including ones that are not in the library at the moment). This provides a framework for topological statistics that is very easy to extend.
Below we discuss the representations currently implemented in the Persistence_representations package:
Reference manual: Gudhi::Persistence_representations::Persistence_landscape
Persistence landscapes were originally proposed by Bubenik in [11]. Efficient algorithms to compute them rigorously were proposed by Bubenik and Dlotko in [10]. The idea of persistence landscapes is briefly summarized below.
To begin with, suppose we are given a birth-death pair (b, d) in a persistence diagram. With this pair we associate a piecewise linear "tent" function f_{(b,d)}(x) = max(0, min(x - b, d - x)), which is zero outside [b, d] and reaches its maximal value (d - b)/2 at the midpoint of that interval.
A persistence landscape of the birth-death pairs (b_i, d_i), i = 1, ..., m, is the sequence of functions lambda_k, k >= 1, where lambda_k(x) is the k-th largest value among f_{(b_1,d_1)}(x), ..., f_{(b_m,d_m)}(x) (and 0 if k > m).
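As a standalone illustration of the definition (not the GUDHI implementation, which stores landscapes exactly as piecewise linear functions), the k-th landscape function can be evaluated pointwise as follows:

```python
def tent(b, d, x):
    # piecewise linear function associated with a birth-death pair (b, d)
    return max(0.0, min(x - b, d - x))

def landscape(k, x, pairs):
    # k-th largest value (1-indexed) of the tent functions at x; 0 if k exceeds
    # the number of birth-death pairs
    values = sorted((tent(b, d, x) for b, d in pairs), reverse=True)
    return values[k - 1] if k <= len(values) else 0.0

# a small diagram with a wide pair and a nested narrower pair
pairs = [(0.0, 4.0), (1.0, 3.0)]
```

For instance, at x = 2 the first landscape level takes the value of the wide tent and the second level the value of the narrow one.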
The detailed description of the algorithms used to compute persistence landscapes can be found in [10]. Note that this implementation provides an exact representation of landscapes. That has many advantages, but also a few drawbacks. For instance, as discussed in [10], the exact representation of a landscape may be of quadratic size with respect to the input persistence diagram. It may therefore happen that, for very large diagrams, using this representation is memory-prohibitive. In such a case, there are two possible ways to proceed:
Reference manual: Gudhi::Persistence_representations::Persistence_landscape_on_grid
This is an alternative, non-exact representation of the persistence landscapes defined in the Section Persistence Landscapes. Unlike in that section, we build a representation of a persistence landscape by sampling its values on a finite, evenly spaced grid of points. Since persistence landscapes that originate from persistence diagrams have slope 0, 1, or -1, the value of a landscape between two adjacent grid points is controlled by its values at those grid points, which bounds the sampling error.
Due to the lack of a rigorous description in the literature of algorithms for this non-rigorous representation of persistence landscapes, we provide a short discussion of them below.
Let us assume that we want to compute a persistence landscape on a given interval, sampled at a fixed number of evenly spaced grid points.
When averaging two persistence landscapes represented by a grid, we need to make sure that they are defined on compatible grids, i.e. that the grid intervals and the numbers of grid points of both landscapes coincide.
Computing distances between two persistence landscapes on a grid is not much different from the rigorous case. We sum up the distances between the same levels of the corresponding landscapes. For a fixed level, we approximate the landscapes between corresponding consecutive grid points by linear functions, and compute the distance between the resulting piecewise linear functions.
Similarly to the case of distances, when computing the scalar product of two persistence landscapes on a grid, we sum up the scalar products of the corresponding levels of the landscapes. For each level, we assume that the persistence landscape on a grid between two grid points is approximated by a linear function. Therefore, to compute the scalar product of two corresponding levels, we sum up the integrals of the products of the line segments over every pair of consecutive grid points.
Note that for this representation we need to specify a few parameters:
Note that the same representation is used in TDA R-package [31].
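The per-level distance and scalar-product computations described above can be sketched as follows; this is a minimal standalone Python illustration (function names are made up, and the L1 norm is chosen as an example of the distance between levels):

```python
def segment_product_integral(a0, a1, b0, b1, h):
    # exact integral over one grid cell of width h of the product of two
    # linear functions interpolating the values (a0, a1) and (b0, b1)
    return h * (2 * a0 * b0 + 2 * a1 * b1 + a0 * b1 + a1 * b0) / 6.0

def grid_scalar_product(f, g, h):
    # f, g: sampled values of one landscape level on the same grid, spacing h;
    # the full scalar product sums this quantity over all levels
    return sum(segment_product_integral(f[i], f[i + 1], g[i], g[i + 1], h)
               for i in range(len(f) - 1))

def grid_l1_distance(f, g, h):
    # integral of |f - g|, with f - g linear on each cell (handles sign changes)
    total = 0.0
    for i in range(len(f) - 1):
        d0, d1 = f[i] - g[i], f[i + 1] - g[i + 1]
        if d0 * d1 >= 0:
            # no sign change: ordinary trapezoid rule
            total += h * (abs(d0) + abs(d1)) / 2.0
        else:
            # sign change inside the cell: two triangles on either side of the root
            total += h * (d0 * d0 + d1 * d1) / (2.0 * abs(d0 - d1))
    return total
```

For example, the scalar product of the unit tent [0, 1, 0] with itself on a grid of spacing 1 is 2/3, the exact value of the integral of its square.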
Reference manual: Gudhi::Persistence_representations::Persistence_heat_maps
This is a general class of discrete structures based on the idea of placing a kernel at the points of a persistence diagram. This idea has appeared in the work of many authors over the last 15 years. As far as we know, it was first described in the work of the Bologna group in [28] and [32]. Later it was described by the Colorado State University group in [1]; that paper was the first to provide a discussion of the stability of the representation. The same ideas are also used in the construction of two recent kernels for machine learning: [39] and [43]. Both kernel constructions use interesting ideas to ensure stability of the representation with respect to the Wasserstein metric. In the kernel presented in [39], a scaling function is used to multiply the Gaussian kernel in such a way that points close to the diagonal get low weight and consequently do not have a big influence on the resulting distribution. In [43], for every point of the diagram, its mirror image across the diagonal is added with negative weight, which ensures that the resulting function vanishes on the diagonal.
In the Persistence_representations package we currently implement a discretization of the distributions described above. The base of this implementation is a 2-dimensional array of pixels. Each pixel is assigned a real value which is the sum of the values of the distributions induced by each point of the persistence diagram. At the moment we compute this sum at the center of each pixel. This can easily be extended to other functions (for instance, the sum of the integrals of the intermediate distribution over a pixel).
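A minimal standalone sketch of this discretization, assuming a Gaussian is placed at each diagram point and evaluated at pixel centers (illustrative names, not the GUDHI API):

```python
import math

def heat_map(diagram, n_pixels, x_min, x_max, sigma):
    # returns an n_pixels x n_pixels array; each pixel holds the sum, over the
    # diagram points, of a Gaussian evaluated at the pixel center
    step = (x_max - x_min) / n_pixels
    grid = [[0.0] * n_pixels for _ in range(n_pixels)]
    for i in range(n_pixels):
        for j in range(n_pixels):
            cx = x_min + (i + 0.5) * step  # pixel center, birth axis
            cy = x_min + (j + 0.5) * step  # pixel center, death axis
            for (b, d) in diagram:
                grid[i][j] += math.exp(-((cx - b) ** 2 + (cy - d) ** 2)
                                       / (2.0 * sigma ** 2))
    return grid

# a single point at the origin produces a symmetric 2x2 map
g = heat_map([(0.0, 0.0)], n_pixels=2, x_min=-1.0, x_max=1.0, sigma=1.0)
```

Averaging heat maps is then pixel-wise averaging, and distances and scalar products reduce to the corresponding operations on the flattened arrays.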
The parameters that determine the structure are the following:
In addition to the previous method, we also provide two more methods to perform exact calculations, in the sense that we use functions instead of matrices to define the kernel between the points of the diagrams. Indeed, in both of these exact methods, the kernel is no longer provided as a square matrix, or a filter (see parameters above), but rather as a function assigning a real value to a pair of points in the plane.
In the first of these exact methods, we aim at obtaining a finite-dimensional representation of the diagram, so we still use a grid of pixels. On the other hand, in the second exact method, we represent diagrams implicitly as functions (i.e. infinite-dimensional representations). This way, we no longer require grids, but only scalar products and distances are available with these implicit representations. This type of representation is known as a kernel method (see Kernels on persistence diagrams below for more details on kernels).
Names can be a bit confusing so we recall that, with this second exact method, we implicitly define a kernel representation of diagrams that is built from a kernel between points in the plane. Hence, we have two kernels here, which are independent. One is defined between points in the plane (its type in the code is Kernel2D), and is a template parameter, whereas the other is defined between persistence diagrams (it is the scalar product of the infinite-dimensional representations of the diagrams).
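The relation between the two kernels can be sketched as follows: the kernel between diagrams is obtained by summing the kernel between points in the plane over all pairs of points taken from the two diagrams. The code below is a standalone Python illustration under that assumption, with a Gaussian playing the role of Kernel2D:

```python
import math

def gaussian_2d(p, q, sigma=1.0):
    # kernel between two points in the plane (the role played by Kernel2D)
    dx, dy = p[0] - q[0], p[1] - q[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))

def diagram_kernel(D1, D2, k=gaussian_2d):
    # scalar product of the implicit representations: sum of the 2D kernel
    # over all pairs of points taken from the two diagrams
    return sum(k(p, q) for p in D1 for q in D2)

def diagram_distance(D1, D2, k=gaussian_2d):
    # distance induced by the kernel: sqrt(<D1,D1> - 2<D1,D2> + <D2,D2>)
    return math.sqrt(max(0.0, diagram_kernel(D1, D1, k)
                         - 2.0 * diagram_kernel(D1, D2, k)
                         + diagram_kernel(D2, D2, k)))
```

Note how only scalar products and distances are computable here; no finite-dimensional vector is ever materialized.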
Reference manual: Gudhi::Persistence_representations::Vector_distances_in_diagram
This is a representation of persistent homology in the form of a vector, which was designed for applications in 3d graphics in [15]. Below we provide a short description of this representation.
Given a persistence diagram, for every pair of its points we compute three quantities: the distance between the two points, and the distance of each of them from the diagonal.
We pick the smallest of those and add it to a vector. The obtained vector of numbers is then sorted in decreasing order. This way we obtain a persistence vector representing the diagram.
Given two persistence vectors, the computation of distances, averages, and scalar products is straightforward. The average is simply the coordinate-wise average of a collection of vectors; in this section we assume that vectors of different sizes are extended by zeros. To compute distances, we sum up the absolute values of the differences between corresponding coordinates. A scalar product is the sum of the products of the values at corresponding positions of the two vectors.
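A minimal standalone sketch of the construction described above (assuming the sup-norm in the plane and pairs of distinct points; names are illustrative, not the GUDHI API):

```python
def diagonal_distance(p):
    # sup-norm distance from a diagram point (b, d) to the diagonal
    return (p[1] - p[0]) / 2.0

def persistence_vector(diagram):
    # for every pair of distinct points, keep the smallest of: the sup-norm
    # distance between them, and their two distances to the diagonal;
    # then sort the collected values in decreasing order
    values = []
    pts = list(diagram)
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            p, q = pts[i], pts[j]
            linf = max(abs(p[0] - q[0]), abs(p[1] - q[1]))
            values.append(min(linf, diagonal_distance(p), diagonal_distance(q)))
    return sorted(values, reverse=True)
```

With the vectors in hand, averages, distances, and scalar products are computed coordinate-wise as described above.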
Reference manual: Gudhi::Persistence_representations::Sliced_Wasserstein
Reference manual: Gudhi::Persistence_representations::Persistence_heat_maps
Kernels for persistence diagrams can be regarded as infinite-dimensional vectorizations. More specifically, they are similarity functions whose evaluation on a pair of persistence diagrams equals the scalar product between the images of the two diagrams under a feature map into a Hilbert space.
There have been several attempts at defining kernels, i.e., positive semi-definite functions, between persistence diagrams within the last few years. We provide an implementation of the Sliced Wasserstein kernel (see [16]), which takes the form of a Gaussian kernel with a specific distance between persistence diagrams, called the Sliced Wasserstein distance: roughly, the diagrams (each augmented with the projections of the other's points onto the diagonal) are projected onto every line through the origin, the 1D Wasserstein distance between the two projected point sets is computed for each line, and the results are averaged over all directions.
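A standalone sketch of the Sliced Wasserstein distance following the description above, using a fixed set of evenly spaced directions as the approximation (illustrative, not the GUDHI implementation):

```python
import math

def _diag_proj(p):
    # orthogonal projection of a diagram point onto the diagonal
    m = (p[0] + p[1]) / 2.0
    return (m, m)

def sliced_wasserstein(D1, D2, directions=100):
    # each diagram is augmented with the diagonal projections of the other's
    # points so both sides have the same cardinality; then, for each direction
    # theta, the points are projected onto the line of angle theta and matched
    # by sorting (the 1D Wasserstein distance); results are averaged
    A = list(D1) + [_diag_proj(q) for q in D2]
    B = list(D2) + [_diag_proj(p) for p in D1]
    total = 0.0
    for k in range(directions):
        theta = -math.pi / 2.0 + math.pi * k / directions
        c, s = math.cos(theta), math.sin(theta)
        pa = sorted(p[0] * c + p[1] * s for p in A)
        pb = sorted(q[0] * c + q[1] * s for q in B)
        total += sum(abs(x - y) for x, y in zip(pa, pb))
    return total / directions
```

The Sliced Wasserstein kernel is then a Gaussian kernel in this distance, i.e. exp(-SW(D1, D2) / (2 * sigma^2)) for a bandwidth parameter sigma.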