Point cloud utilities manual

File Readers

gudhi.read_points_from_off_file()

Read points from an OFF file.

Parameters

off_file (string) – An OFF file style name.

Returns

The point set.

Return type

List[List[float]]

gudhi.read_lower_triangular_matrix_from_csv_file()

Read lower triangular matrix from a CSV style file.

Parameters
  • csv_file (string) – A CSV file style name.

  • separator (char) – The value separator in the CSV file. Default value is ‘;’

Returns

The lower triangular matrix.

Return type

List[List[float]]

File Writers

gudhi.write_points_to_off_file(fname, points)

Write points to an OFF file.

A simple wrapper for numpy.savetxt.

Parameters
  • fname (str or file handle) – Name of the OFF file.

  • points (numpy array of shape (n, dim)) – Point coordinates.

Subsampling

Requires

Eigen \(\geq\) 3.1.0 and CGAL \(\geq\) 4.11.0

gudhi.subsampling.choose_n_farthest_points()

Subsample by a greedy strategy of iteratively adding the farthest point from the current chosen point set to the subsampling. The iteration starts with the landmark starting point.

Parameters

points (Iterable[Iterable[float]]) – The input point set.

Or

Parameters

off_file (string) – An OFF file style name.

And in both cases

Parameters
  • nb_points (int) – Number of points of the subsample (the subsample may be smaller if there are fewer than nb_points distinct input points)

  • starting_point (int) – The iteration starts with the landmark starting point, which is the index of the point to start with. If not set, this index is chosen randomly.

Returns

The subsample point set.

Return type

List[List[float]]

gudhi.subsampling.pick_n_random_points()

Subsample a point set by picking random vertices.

Parameters

points (Iterable[Iterable[float]]) – The input point set.

Or

Parameters

off_file (string) – An OFF file style name.

And in both cases

Parameters

nb_points (int) – Number of points of the subsample.

Returns

The subsample point set.

Return type

List[List[float]]

gudhi.subsampling.sparsify_point_set()

Outputs a subset of the input points so that the squared distance between any two points is greater than min_squared_dist.

Parameters

points (Iterable[Iterable[float]]) – The input point set.

Or

Parameters

off_file (string) – An OFF file style name.

And in both cases

Parameters

min_squared_dist (float) – Minimum squared distance separating the output points.

Returns

The subsample point set.

Return type

List[List[float]]

Time Delay Embedding

class gudhi.point_cloud.timedelay.TimeDelayEmbedding(dim=3, delay=1, skip=1)[source]

Point cloud transformation class. Embeds time-series data in the R^d according to Takens’ Embedding Theorem and obtains the coordinates of each point.

Parameters
  • dim (int, optional (default=3)) – d of R^d to be embedded.

  • delay (int, optional (default=1)) – Time-Delay embedding.

  • skip (int, optional (default=1)) – How often to skip embedded points.

Example

Given delay=3 and skip=2, a point cloud which is obtained by embedding a scalar time-series into R^3 is as follows:

time-series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
point cloud = [[1, 4, 7],
               [3, 6, 9]]

Given delay=1 and skip=1, a point cloud which is obtained by embedding a 2D vector time-series data into R^4 is as follows:

time-series = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
point cloud = [[0, 1, 2, 3],
               [2, 3, 4, 5],
               [4, 5, 6, 7],
               [6, 7, 8, 9]]
__call__(ts)[source]

Transform method for single time-series data.

Parameters

ts (Iterable[float] or Iterable[Iterable[float]]) – A single time-series data, with scalar or vector values.

Returns

point cloud – Makes point cloud from a single time-series data.

Return type

n x dim numpy arrays

transform(ts)[source]

Transform method for multiple time-series data.

Parameters

ts (Iterable[Iterable[float]] or Iterable[Iterable[Iterable[float]]]) – Multiple time-series data, with scalar or vector values.

Returns

point clouds – Makes point cloud from each time-series data.

Return type

list of n x dim numpy arrays

K nearest neighbors

class gudhi.point_cloud.knn.KNearestNeighbors(k, return_index=True, return_distance=False, metric='euclidean', **kwargs)[source]

Class wrapping several implementations for computing the k nearest neighbors in a point set.

Requires

PyKeOps, SciPy, Scikit-learn, and/or Hnswlib in function of the selected implementation.

__init__(k, return_index=True, return_distance=False, metric='euclidean', **kwargs)[source]
Parameters
  • k (int) – number of neighbors (possibly including the point itself).

  • return_index (bool) – if True, return the index of each neighbor.

  • return_distance (bool) – if True, return the distance to each neighbor.

  • implementation (str) –

    choice of the library that does the real work.

    • ’keops’ for a brute-force, CUDA implementation through pykeops. Useful when the dimension becomes large (10+) but the number of points remains low (less than a million). Only “minkowski” and its aliases are supported.

    • ’ckdtree’ for scipy’s cKDTree. Only “minkowski” and its aliases are supported.

    • ’sklearn’ for scikit-learn’s NearestNeighbors. Note that this provides in particular an option algorithm=”brute”.

    • ’hnsw’ for hnswlib.Index. It can be very fast but does not provide guarantees. Only supports “euclidean” for now.

    • None will try to select a sensible one (scipy if possible, scikit-learn otherwise).

  • metric (str) – see sklearn.neighbors.NearestNeighbors.

  • eps (float) – relative error when computing nearest neighbors with the cKDTree.

  • p (float) – norm L^p on input points (including numpy.inf) if metric is “minkowski”. Defaults to 2.

  • n_jobs (int) – number of jobs to schedule for parallel processing of nearest neighbors on the CPU. If -1 is given all processors are used. Default: 1.

  • sort_results (bool) – if True, then distances and indices of each point are sorted on return, so that the first column contains the closest points. Otherwise, neighbors are returned in an arbitrary order. Defaults to True.

  • enable_autodiff (bool) – if the input is a torch.tensor or tensorflow.Tensor, this instructs the function to compute distances in a way that works with automatic differentiation. This is experimental, not supported for all metrics, and requires the package EagerPy. Defaults to False.

  • kwargs – additional parameters are forwarded to the backends.

fit(X, y=None)[source]
Parameters

X (numpy.array) – coordinates for reference points.

fit_transform(X, y=None)[source]
transform(X)[source]
Parameters

X (numpy.array) – coordinates for query points, or distance matrix if metric is “precomputed”.

Returns

if return_index, an array of shape (len(X), k) with the indices (in the argument of fit()) of the k nearest neighbors to the points of X. If return_distance, an array of the same shape with the distances to those neighbors. If both, a tuple with the two arrays, in this order.

Return type

numpy.array

Distance to measure

class gudhi.point_cloud.dtm.DTMDensity(k=None, weights=None, q=None, dim=None, normalize=False, n_samples=None, **kwargs)[source]

Density estimator based on the distance to the empirical measure defined by a point set, as defined in [2]. Note that this implementation only renormalizes when asked, and the renormalization only works for a Euclidean metric, so in other cases the total measure may not be 1.

Note

When the dimension is high, using it as an exponent can quickly lead to under- or overflows. We recommend using a small fixed value instead in those cases, even if it won’t have the same nice theoretical properties as the dimension.

__init__(k=None, weights=None, q=None, dim=None, normalize=False, n_samples=None, **kwargs)[source]
Parameters
  • k (int) – number of neighbors (possibly including the point itself). Optional if it can be guessed from weights or metric=”neighbors”.

  • weights (numpy.array) – weights of each of the k neighbors, optional. They are supposed to sum to 1.

  • q (float) – order used to compute the distance to measure. Defaults to dim.

  • dim (float) – final exponent representing the dimension. Defaults to the dimension, and must be specified when the dimension cannot be read from the input (metric is “neighbors” or “precomputed”).

  • normalize (bool) – normalize the density so it corresponds to a probability measure on ℝᵈ. Only available for the Euclidean metric, defaults to False.

  • n_samples (int) – number of sample points used for fitting. Only needed if normalize is True and metric is “neighbors”.

  • kwargs – same parameters as KNearestNeighbors, except that metric=”neighbors” means that transform() expects an array with the distances to the k nearest neighbors.

fit(X, y=None)[source]
Parameters

X (numpy.array) – coordinates for mass points.

fit_transform(X, y=None)[source]
transform(X)[source]
Parameters

X (numpy.array) – coordinates for query points, or distance matrix if metric is “precomputed”, or distances to the k nearest neighbors if metric is “neighbors” (if the array has more than k columns, the remaining ones are ignored).

class gudhi.point_cloud.dtm.DistanceToMeasure(k, q=2, **kwargs)[source]

Class to compute the distance to the empirical measure defined by a point set, as introduced in [13].

__init__(k, q=2, **kwargs)[source]
Parameters
  • k (int) – number of neighbors (possibly including the point itself).

  • q (float) – order used to compute the distance to measure. Defaults to 2.

  • kwargs – same parameters as KNearestNeighbors, except that metric=”neighbors” means that transform() expects an array with the distances to the k nearest neighbors.

fit(X, y=None)[source]
Parameters

X (numpy.array) – coordinates for mass points.

fit_transform(X, y=None)[source]
transform(X)[source]
Parameters

X (numpy.array) – coordinates for query points, or distance matrix if metric is “precomputed”, or distances to the k nearest neighbors if metric is “neighbors” (if the array has more than k columns, the remaining ones are ignored).

Returns

a 1-d array with, for each point of X, its distance to the measure defined by the argument of fit().

Return type

numpy.array