Subsampling

Enumerations

enum  : std::size_t { Gudhi::subsampling::random_starting_point = std::size_t(-1) }
 

Functions

template<typename Distance , typename Point_range , typename PointOutputIterator , typename DistanceOutputIterator = Null_output_iterator>
void Gudhi::subsampling::choose_n_farthest_points (Distance dist, Point_range const &input_pts, std::size_t final_size, std::size_t starting_point, PointOutputIterator output_it, DistanceOutputIterator dist_it={})
 Subsample by an iterative, greedy strategy. More...
 
template<typename Distance , typename Point_range , typename PointOutputIterator , typename DistanceOutputIterator = Null_output_iterator>
void Gudhi::subsampling::choose_n_farthest_points_metric (Distance dist_, Point_range const &input_pts, std::size_t final_size, std::size_t starting_point, PointOutputIterator output_it, DistanceOutputIterator dist_it={})
 Subsample by an iterative, greedy strategy. More...
 
template<typename Point_container , typename OutputIterator >
void Gudhi::subsampling::pick_n_random_points (Point_container const &points, std::size_t final_size, OutputIterator output_it)
 Subsample a point set by picking random vertices. More...
 
template<typename Kernel , typename Point_range , typename OutputIterator >
void Gudhi::subsampling::sparsify_point_set (const Kernel &k, Point_range const &input_pts, typename Kernel::FT min_squared_dist, OutputIterator output_it)
 Outputs a subset of the input points so that the squared distance between any two points is greater than min_squared_dist. More...
 

Detailed Description

Author
Clément Jamin, Siargey Kachanovich, Marc Glisse

Introduction

This Gudhi component offers methods to subsample a set of points.

Example: sparsify_point_set

This example outputs a subset of the input points so that the squared distance between any two points is greater than or equal to 0.4.

#include <gudhi/sparsify_point_set.h>
#include <CGAL/Epick_d.h>
#include <CGAL/Random.h>
#include <iostream>
#include <vector>
#include <iterator>
int main(void) {
typedef CGAL::Epick_d<CGAL::Dimension_tag<4> > K;
typedef typename K::Point_d Point_d;
CGAL::Random rd;
std::vector<Point_d> points;
for (int i = 0; i < 500; ++i)
points.push_back(Point_d(rd.get_double(-1., 1), rd.get_double(-1., 1),
rd.get_double(-1., 1), rd.get_double(-1., 1)));
K k;
std::vector<Point_d> results;
Gudhi::subsampling::sparsify_point_set(k, points, 0.4, std::back_inserter(results));
std::clog << "Before sparsification: " << points.size() << " points.\n";
std::clog << "After sparsification: " << results.size() << " points.\n";
return 0;
}
void sparsify_point_set(const Kernel &k, Point_range const &input_pts, typename Kernel::FT min_squared_dist, OutputIterator output_it)
Outputs a subset of the input points so that the squared distance between any two points is greater t...
Definition: sparsify_point_set.h:50

Example: choose_n_farthest_points

This example outputs a subset of 100 points obtained by González algorithm, starting with a random point, and greedily adding the point farthest from those already picked.

#include <gudhi/choose_n_farthest_points.h>
#include <CGAL/Epick_d.h>
#include <CGAL/Random.h>
#include <iostream>
#include <vector>
#include <iterator>
int main(void) {
typedef CGAL::Epick_d<CGAL::Dimension_tag<4> > K;
typedef typename K::Point_d Point_d;
CGAL::Random rd;
std::vector<Point_d> points;
for (int i = 0; i < 500; ++i)
points.push_back(Point_d(rd.get_double(-1., 1), rd.get_double(-1., 1),
rd.get_double(-1., 1), rd.get_double(-1., 1)));
K k;
std::vector<Point_d> results;
Gudhi::subsampling::choose_n_farthest_points(k.squared_distance_d_object(), points, 100,
std::back_inserter(results));
std::clog << "Before sparsification: " << points.size() << " points.\n";
std::clog << "After sparsification: " << results.size() << " points.\n";
return 0;
}
void choose_n_farthest_points(Distance dist, Point_range const &input_pts, std::size_t final_size, std::size_t starting_point, PointOutputIterator output_it, DistanceOutputIterator dist_it={})
Subsample by an iterative, greedy strategy.
Definition: choose_n_farthest_points.h:75
@ random_starting_point
Definition: choose_n_farthest_points.h:43

There is an alternative version choose_n_farthest_points_metric() with the same syntax, which can be faster in many cases.

Example: pick_n_random_points

This example outputs a subset of 100 points picked randomly.

#include <gudhi/pick_n_random_points.h>
#include <CGAL/Epick_d.h>
#include <CGAL/Random.h>
#include <iostream>
#include <vector>
#include <iterator>
int main(void) {
typedef CGAL::Epick_d<CGAL::Dimension_tag<4> > K;
typedef typename K::Point_d Point_d;
CGAL::Random rd;
std::vector<Point_d> points;
for (int i = 0; i < 500; ++i)
points.push_back(Point_d(rd.get_double(-1., 1), rd.get_double(-1., 1),
rd.get_double(-1., 1), rd.get_double(-1., 1)));
K k;
std::vector<Point_d> results;
Gudhi::subsampling::pick_n_random_points(points, 100, std::back_inserter(results));
std::clog << "Before sparsification: " << points.size() << " points.\n";
std::clog << "After sparsification: " << results.size() << " points.\n";
return 0;
}
void pick_n_random_points(Point_container const &points, std::size_t final_size, OutputIterator output_it)
Subsample a point set by picking random vertices.
Definition: pick_n_random_points.h:42

Enumeration Type Documentation

◆ anonymous enum

anonymous enum : std::size_t
Enumerator
random_starting_point 

Argument for choose_n_farthest_points to indicate that the starting point should be picked randomly.

Function Documentation

◆ choose_n_farthest_points()

template<typename Distance , typename Point_range , typename PointOutputIterator , typename DistanceOutputIterator = Null_output_iterator>
void Gudhi::subsampling::choose_n_farthest_points ( Distance  dist,
Point_range const &  input_pts,
std::size_t  final_size,
std::size_t  starting_point,
PointOutputIterator  output_it,
DistanceOutputIterator  dist_it = {} 
)

Subsample by an iterative, greedy strategy.

The algorithm starts with the landmark starting point or, if starting point==random_starting_point, with a landmark chosen randomly from the point set. At each iteration, it finds the point farthest from the set of current landmarks, outputs this distance, and promotes the point to landmark. It stops after finding final_size landmarks.

Template Parameters
Distancemust provide an operator() that takes 2 points (value type of the range) and returns their distance (or some more general proximity measure) as a double.
Point_rangeRandom access range of points.
PointOutputIteratorOutput iterator whose value type is the point type.
DistanceOutputIteratorOutput iterator for distances.
Parameters
[in]distDistance function.
[in]input_ptsThe input points.
[in]final_sizeThe size of the subsample to compute (reduced to the number of input points if final_size is larger).
[in]starting_pointThe seed in the farthest point algorithm.
[out]output_itThe output iterator where landmarks are written.
[out]dist_itThe optional output iterator where the distance from a landmark to the previous landmarks is written.
Warning
Older versions of this function took a CGAL kernel as argument. Users need to replace k with k.squared_distance_d_object() in the first argument of every call to choose_n_farthest_points.
Examples
example_choose_n_farthest_points.cpp, example_custom_distance.cpp, example_strong_witness_complex_off.cpp, example_witness_complex_off.cpp, example_witness_complex_sphere.cpp, strong_witness_persistence.cpp, and weak_witness_persistence.cpp.

◆ choose_n_farthest_points_metric()

template<typename Distance , typename Point_range , typename PointOutputIterator , typename DistanceOutputIterator = Null_output_iterator>
void Gudhi::subsampling::choose_n_farthest_points_metric ( Distance  dist_,
Point_range const &  input_pts,
std::size_t  final_size,
std::size_t  starting_point,
PointOutputIterator  output_it,
DistanceOutputIterator  dist_it = {} 
)

Subsample by an iterative, greedy strategy.

This computes the same thing as choose_n_farthest_points(), but relies on the triangle inequality to reduce the amount of computation when the doubling dimension and spread are small. In the worst case, this can be much slower than choose_n_farthest_points() though. See [20] and its associated video for details about this algorithm.

Template Parameters
Distancemust provide an operator() that takes 2 points (value type of the range) and returns their distance as a double. It must be a true metric (not squared Euclidean), the algorithm relies on the triangle inequality.
Point_rangeRandom access range of points.
PointOutputIteratorOutput iterator whose value type is the point type.
DistanceOutputIteratorOutput iterator for distances.
Parameters
[in]dist_Distance function.
[in]input_ptsThe input points.
[in]final_sizeThe size of the subsample to compute (reduced to the number of input points if final_size is larger).
[in]starting_pointThe seed in the farthest point algorithm.
[out]output_itThe output iterator where landmarks are written.
[out]dist_itThe optional output iterator where the distance from a landmark to the previous landmarks is written.

◆ pick_n_random_points()

template<typename Point_container , typename OutputIterator >
void Gudhi::subsampling::pick_n_random_points ( Point_container const &  points,
std::size_t  final_size,
OutputIterator  output_it 
)

Subsample a point set by picking random vertices.

It chooses final_size distinct points from a random access range points and outputs them to the output iterator output_it. Point_container::iterator should be ValueSwappable and RandomAccessIterator.

Examples
example_pick_n_random_points.cpp.

◆ sparsify_point_set()

template<typename Kernel , typename Point_range , typename OutputIterator >
void Gudhi::subsampling::sparsify_point_set ( const Kernel &  k,
Point_range const &  input_pts,
typename Kernel::FT  min_squared_dist,
OutputIterator  output_it 
)

Outputs a subset of the input points so that the squared distance between any two points is greater than min_squared_dist.

Template Parameters
Kernelmust be a model of the SearchTraits concept, such as the CGAL::Epick_d class, which can be static if you know the ambiant dimension at compile-time, or dynamic if you don't.
Point_rangeRange whose value type is Kernel::Point_d. It must provide random-access via operator[] and the points should be stored contiguously in memory.
OutputIteratorOutput iterator whose value type is Kernel::Point_d.
Parameters
[in]kA kernel object.
[in]input_ptsConst reference to the input points.
[in]min_squared_distMinimum squared distance separating the output points.
[out]output_itThe output iterator.
Examples
example_sparsify_point_set.cpp, and example_with_perturb.cpp.