Point cloud utilities manual#

File Readers#

gudhi.read_points_from_off_file()#

Read points from an OFF file.

Parameters:

off_file (string) – An OFF file style name.

Returns:

The point set.

Return type:

List[List[float]]

gudhi.read_lower_triangular_matrix_from_csv_file()#

Read lower triangular matrix from a CSV style file.

Parameters:
  • csv_file (string) – A CSV file style name.

  • separator (char) – The value separator in the CSV file. Default value is ‘;’

Returns:

The lower triangular matrix.

Return type:

List[List[float]]

File Writers#

gudhi.write_points_to_off_file(fname, points)#

Write points to an OFF file.

A simple wrapper for numpy.savetxt.

Parameters:
  • fname (str or file handle) – Name of the OFF file.

  • points (numpy array of shape (n, dim)) – Point coordinates.

Subsampling#

gudhi.subsampling.choose_n_farthest_points()#

Subsample by a greedy strategy of iteratively adding the farthest point from the current chosen point set to the subsampling. The iteration starts with the landmark starting point.

Parameters:

points (Iterable[Iterable[float]]) – The input point set.

Or

Parameters:

off_file (string) – An OFF file style name.

And in both cases

Parameters:
  • nb_points (int) – Number of points of the subsample (the subsample may be smaller if there are fewer than nb_points distinct input points). Default: all of them.

  • starting_point (int) – The iteration starts with the landmark starting point, which is the index of the point to start with. If not set, this index is chosen randomly.

  • fast (bool) – If True (default), use an implementation that is efficient when the doubling dimension and spread are small, but slow otherwise. If False, use the standard quadratic algorithm.

Returns:

The subsample point set, in the order they were selected by the greedy strategy.

Return type:

List[List[float]]

gudhi.subsampling.pick_n_random_points()#

Subsample a point set by picking random vertices.

Parameters:

points (Iterable[Iterable[float]]) – The input point set.

Or

Parameters:

off_file (string) – An OFF file style name.

And in both cases

Parameters:

nb_points (int) – Number of points of the subsample.

Returns:

The subsample point set.

Return type:

List[List[float]]

gudhi.subsampling.sparsify_point_set()#

Outputs a subset of the input points so that the squared distance between any two points is greater than min_squared_dist.

Parameters:

points (Iterable[Iterable[float]]) – The input point set.

Or

Parameters:

off_file (string) – An OFF file style name.

And in both cases

Parameters:

min_squared_dist (float) – Minimum squared distance separating the output points.

Returns:

The subsample point set.

Return type:

List[List[float]]

Time Delay Embedding#

class gudhi.point_cloud.timedelay.TimeDelayEmbedding[source]#

Point cloud transformation class. Embeds time-series data in the R^d according to Takens’ Embedding Theorem and obtains the coordinates of each point.

Example

Given delay=3 and skip=2, a point cloud which is obtained by embedding a scalar time-series into R^3 is as follows:

time-series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
point cloud = [[1, 4, 7],
               [3, 6, 9]]

Given delay=1 and skip=1, a point cloud which is obtained by embedding a 2D vector time-series data into R^4 is as follows:

time-series = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
point cloud = [[0, 1, 2, 3],
               [2, 3, 4, 5],
               [4, 5, 6, 7],
               [6, 7, 8, 9]]
__call__(ts)[source]#

Transform method for single time-series data.

Parameters:

ts (Iterable[float] or Iterable[Iterable[float]]) – A single time-series data, with scalar or vector values.

Returns:

point cloud – Makes point cloud from a single time-series data.

Return type:

n x dim numpy arrays

__init__(dim=3, delay=1, skip=1)[source]#

Constructor for the TimeDelayEmbedding class.

Parameters:
  • dim (int) – d of R^d to be embedded. Optional (default=3).

  • delay (int) – Time-Delay embedding. Optional (default=1).

  • skip (int) – How often to skip embedded points. Optional (default=1).

transform(ts)[source]#

Transform method for multiple time-series data.

Parameters:

ts (Iterable[Iterable[float]] or Iterable[Iterable[Iterable[float]]]) – Multiple time-series data, with scalar or vector values.

Returns:

point clouds – Makes point cloud from each time-series data.

Return type:

list of n x dim numpy arrays

K nearest neighbors#

class gudhi.point_cloud.knn.KNearestNeighbors[source]#

Class wrapping several implementations for computing the k nearest neighbors in a point set.

Requires:

PyKeOps, SciPy, Scikit-learn, and/or Hnswlib in function of the selected implementation.

__init__(k, return_index=True, return_distance=False, metric='euclidean', **kwargs)[source]#
Parameters:
  • k (int) – number of neighbors (possibly including the point itself).

  • return_index (bool) – if True, return the index of each neighbor.

  • return_distance (bool) – if True, return the distance to each neighbor.

  • implementation (str) –

    choice of the library that does the real work.

    • ’keops’ for a brute-force, CUDA implementation through pykeops. Useful when the dimension becomes large (10+) but the number of points remains low (less than a million). Only “minkowski” and its aliases are supported.

    • ’ckdtree’ for scipy’s cKDTree. Only “minkowski” and its aliases are supported.

    • ’sklearn’ for scikit-learn’s NearestNeighbors. Note that this provides in particular an option algorithm=”brute”.

    • ’hnsw’ for hnswlib.Index. It can be very fast but does not provide guarantees. Only supports “euclidean” for now.

    • None will try to select a sensible one (scipy if possible, scikit-learn otherwise).

  • metric (str) – see sklearn.neighbors.NearestNeighbors.

  • eps (float) – relative error when computing nearest neighbors with the cKDTree.

  • p (float) – norm L^p on input points (including numpy.inf) if metric is “minkowski”. Defaults to 2.

  • n_jobs (int) – number of jobs to schedule for parallel processing of nearest neighbors on the CPU. If -1 is given all processors are used. Default: 1.

  • sort_results (bool) – if True, then distances and indices of each point are sorted on return, so that the first column contains the closest points. Otherwise, neighbors are returned in an arbitrary order. Defaults to True.

  • enable_autodiff (bool) – if the input is a torch.tensor or tensorflow.Tensor, this instructs the function to compute distances in a way that works with automatic differentiation. This is experimental, not supported for all metrics, and requires the package EagerPy. Defaults to False.

  • kwargs – additional parameters are forwarded to the backends.

fit(X, y=None)[source]#
Parameters:

X (numpy.array) – coordinates for reference points.

fit_transform(X, y=None)[source]#
transform(X)[source]#
Parameters:

X (numpy.array) – coordinates for query points, or distance matrix if metric is “precomputed”.

Returns:

if return_index, an array of shape (len(X), k) with the indices (in the argument of fit()) of the k nearest neighbors to the points of X. If return_distance, an array of the same shape with the distances to those neighbors. If both, a tuple with the two arrays, in this order.

Return type:

numpy.array

Distance to measure#

class gudhi.point_cloud.dtm.DTMDensity[source]#

Density estimator based on the distance to the empirical measure defined by a point set, as defined in [2]. Note that this implementation only renormalizes when asked, and the renormalization only works for a Euclidean metric, so in other cases the total measure may not be 1.

Note

When the dimension is high, using it as an exponent can quickly lead to under- or overflows. We recommend using a small fixed value instead in those cases, even if it won’t have the same nice theoretical properties as the dimension.

__init__(k=None, weights=None, q=None, dim=None, normalize=False, n_samples=None, **kwargs)[source]#
Parameters:
  • k (int) – number of neighbors (possibly including the point itself). Optional if it can be guessed from weights or metric=”neighbors”.

  • weights (numpy.array) – weights of each of the k neighbors, optional. They are supposed to sum to 1.

  • q (float) – order used to compute the distance to measure. Defaults to dim.

  • dim (float) – final exponent representing the dimension. Defaults to the dimension, and must be specified when the dimension cannot be read from the input (metric is “neighbors” or “precomputed”).

  • normalize (bool) – normalize the density so it corresponds to a probability measure on ℝᵈ. Only available for the Euclidean metric, defaults to False.

  • n_samples (int) – number of sample points used for fitting. Only needed if normalize is True and metric is “neighbors”.

  • kwargs – same parameters as KNearestNeighbors, except that metric=”neighbors” means that transform() expects an array with the distances to the k nearest neighbors.

fit(X, y=None)[source]#
Parameters:

X (numpy.array) – coordinates for mass points.

fit_transform(X, y=None)[source]#
transform(X)[source]#
Parameters:

X (numpy.array) – coordinates for query points, or distance matrix if metric is “precomputed”, or distances to the k nearest neighbors if metric is “neighbors” (if the array has more than k columns, the remaining ones are ignored).

class gudhi.point_cloud.dtm.DistanceToMeasure[source]#

Class to compute the distance to the empirical measure defined by a point set, as introduced in [13].

__init__(k, q=2, **kwargs)[source]#
Parameters:
  • k (int) – number of neighbors (possibly including the point itself).

  • q (float) – order used to compute the distance to measure. Defaults to 2.

  • kwargs – same parameters as KNearestNeighbors, except that metric=”neighbors” means that transform() expects an array with the distances to the k nearest neighbors.

fit(X, y=None)[source]#
Parameters:

X (numpy.array) – coordinates for mass points.

fit_transform(X, y=None)[source]#
transform(X)[source]#
Parameters:

X (numpy.array) – coordinates for query points, or distance matrix if metric is “precomputed”, or distances to the k nearest neighbors if metric is “neighbors” (if the array has more than k columns, the remaining ones are ignored).

Returns:

a 1-d array with, for each point of X, its distance to the measure defined by the argument of fit().

Return type:

numpy.array