Symbolic Regression (UDP)

class symbolic_regression

A Symbolic Regression problem.


Symbolic regression is a type of regression analysis that searches the space of mathematical expressions for the model that best fits a given dataset in terms of both accuracy and simplicity (ref: https://en.wikipedia.org/wiki/Symbolic_regression). It is also one of the core applications of Differentiable Cartesian Genetic Programming.

This class provides an easy way to instantiate symbolic regression problems as optimization problems having a continuous part (i.e. the values of the parameters in the model) and an integer part (i.e. the representation of the model's computational graph). The instantiated object can be used as a UDP (User Defined Problem) in the pagmo optimization suite.

The symbolic regression problem can be instantiated as either a single-objective or a two-objective problem. In the second case, the formula complexity is considered as an objective alongside the Mean Squared Error.
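To make the two-objective formulation concrete, the sketch below evaluates a candidate model on a dataset and returns the pair {error, complexity}. It uses only the standard library and does not call dCGP. The names `scalar_model`, `mean_squared_error` and `two_objective_fitness` are hypothetical, and the complexity is simply a number supplied by the caller, since how the class measures it is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical alias: a model mapping one input point to one scalar prediction.
using scalar_model = std::function<double(const std::vector<double> &)>;

// Mean squared error of the model over the dataset (illustrative, not dCGP code).
double mean_squared_error(const scalar_model &model,
                          const std::vector<std::vector<double>> &points,
                          const std::vector<double> &labels)
{
    assert(points.size() == labels.size() && !points.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < points.size(); ++i) {
        const double err = model(points[i]) - labels[i];
        sum += err * err;
    }
    return sum / static_cast<double>(points.size());
}

// Two-objective fitness vector {error, complexity}; both entries are minimized.
std::vector<double> two_objective_fitness(const scalar_model &model,
                                          const std::vector<std::vector<double>> &points,
                                          const std::vector<double> &labels,
                                          double complexity)
{
    return {mean_squared_error(model, points, labels), complexity};
}
```

In the single-objective case only the first entry would be returned.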

Public Functions

inline symbolic_regression()

Default constructor.

A default constructor is needed by the pagmo UDP interface, but it should not be used. It constructs points and labels lists that each hold one empty vector, plus a dummy cgp member. It is guaranteed that m_points[0] and m_labels[0] can be accessed.

inline symbolic_regression(const std::vector<std::vector<double>> &points, const std::vector<std::vector<double>> &labels, unsigned r = 1u, unsigned c = 10u, unsigned l = 11u, unsigned arity = 2u, std::vector<kernel<double>> f = kernel_set<double>({"sum", "diff", "mul", "pdiv"})(), unsigned n_eph = 0u, bool multi_objective = false, unsigned parallel_batches = 0u, std::string loss_s = "MSE", unsigned seed = random_device::next())

Constructor.

Constructs a symbolic_regression optimization problem compatible with the pagmo UDP interface.

Parameters

points – [in] input data.
labels – [in] output data.
r – [in] number of rows of the dCGP.
c – [in] number of columns of the dCGP.
l – [in] number of levels-back allowed in the dCGP.
arity – [in] arity of the basis functions.
f – [in] function set. An std::vector of dcgp::kernel<expression::type>.
n_eph – [in] number of ephemeral constants.
multi_objective – [in] when true, it will consider the model complexity as a second objective.
parallel_batches – [in] number of parallel batches.
loss_s – [in] loss type as a string, either "MSE" (mean squared error) or "CE" (cross entropy).
seed – [in] seed used for the random engine.

Throws

std::invalid_argument – if points and labels are not consistent.
std::invalid_argument – if the CGP related parameters (i.e. r, c, etc…) are malformed.
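The exact checks behind "not consistent" are not listed above. As a minimal sketch, assuming that consistency means a non-empty dataset with as many labels as points and uniform input and output dimensions, a validation step could look as follows. `check_dataset` is a hypothetical helper, not part of the class.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical validation helper: throws std::invalid_argument when the
// dataset violates the (assumed) consistency rules described in the lead-in.
void check_dataset(const std::vector<std::vector<double>> &points,
                   const std::vector<std::vector<double>> &labels)
{
    if (points.empty()) {
        throw std::invalid_argument("The dataset is empty.");
    }
    if (points.size() != labels.size()) {
        throw std::invalid_argument("The number of points and labels differs.");
    }
    const std::size_t n_in = points[0].size();
    const std::size_t n_out = labels[0].size();
    if (n_in == 0 || n_out == 0) {
        throw std::invalid_argument("Points and labels must have at least one component.");
    }
    for (std::size_t i = 0; i < points.size(); ++i) {
        if (points[i].size() != n_in || labels[i].size() != n_out) {
            throw std::invalid_argument("Points or labels have inconsistent dimensions.");
        }
    }
}
```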

inline pagmo::vector_double::size_type get_nobj() const

Number of objectives.

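Returns the number of objectives: 1 for the single-objective problem, 2 when the problem was constructed with multi_objective set to true.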

Returns: the number of objectives.

inline pagmo::vector_double fitness(const pagmo::vector_double &x) const

Fitness computation.

Computes the fitness for this UDP.

Parameters: x – the decision vector.
Returns: the fitness of x.

inline pagmo::vector_double gradient(const pagmo::vector_double &x) const

Gradient computation.

Computes the gradient of the loss with respect to the ephemeral constants (i.e. the continuous part of the chromosome).

Parameters: x – the decision vector.
Returns: the gradient at x.
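As a worked illustration of what this gradient means, consider the fixed model y = c * x with a single ephemeral constant c and an MSE loss. The sketch below hand-codes the derivative of the loss with respect to c and provides a central finite difference for comparison. It is a standalone example with hypothetical function names; it does not show how dCGP itself computes derivatives.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Loss L(c) = (1/N) * sum_i (c * x_i - y_i)^2 for the fixed model y = c * x.
double mse(double c, const std::vector<double> &xs, const std::vector<double> &ys)
{
    assert(xs.size() == ys.size() && !xs.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double r = c * xs[i] - ys[i];
        sum += r * r;
    }
    return sum / static_cast<double>(xs.size());
}

// Analytic derivative dL/dc = (2/N) * sum_i (c * x_i - y_i) * x_i.
double mse_gradient(double c, const std::vector<double> &xs, const std::vector<double> &ys)
{
    assert(xs.size() == ys.size() && !xs.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        sum += (c * xs[i] - ys[i]) * xs[i];
    }
    return 2.0 * sum / static_cast<double>(xs.size());
}

// Central finite difference, used only to cross-check the analytic formula.
double mse_gradient_fd(double c, const std::vector<double> &xs, const std::vector<double> &ys,
                       double h = 1e-6)
{
    return (mse(c + h, xs, ys) - mse(c - h, xs, ys)) / (2.0 * h);
}
```

With several ephemeral constants, the gradient would hold one such partial derivative per constant.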

inline pagmo::sparsity_pattern gradient_sparsity() const

Sparsity pattern (gradient)

Returns the sparsity pattern of the gradient. The sparsity pattern is dense in the continuous part of the chromosome. This results from assuming that all ephemeral constants actually appear in the expression; for those that do not, zeros are returned.

Returns: the gradient sparsity pattern.
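A pagmo sparsity pattern is a vector of index pairs (i, j), where i is the fitness component and j is the decision vector component. The following sketch builds a pattern that is dense in the continuous part, assuming a single objective and assuming the n_eph ephemeral constants occupy the first n_eph entries of the chromosome. The local alias mirrors `pagmo::sparsity_pattern`, and `dense_gradient_sparsity` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Local stand-in for pagmo::sparsity_pattern (a vector of index pairs).
using sparsity_pattern = std::vector<std::pair<std::size_t, std::size_t>>;

// Dense pattern for one objective: entries (0, j) for each ephemeral constant j.
sparsity_pattern dense_gradient_sparsity(std::size_t n_eph)
{
    sparsity_pattern sp;
    sp.reserve(n_eph);
    for (std::size_t j = 0; j < n_eph; ++j) {
        sp.emplace_back(0u, j);
    }
    assert(sp.size() == n_eph); // one entry per continuous variable
    return sp;
}
```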

inline std::vector<pagmo::vector_double> hessians(const pagmo::vector_double &x) const

Hessian computation.

Computes the Hessian of the loss with respect to the ephemeral constants (i.e. the continuous part of the chromosome).

Parameters: x – the decision vector.
Returns: the Hessian at x.

inline std::vector<pagmo::sparsity_pattern> hessians_sparsity() const

Sparsity pattern (hessian)

Returns the sparsity pattern of the Hessian. The sparsity pattern is dense in the continuous part of the chromosome. This results from assuming that all ephemeral constants actually appear in the expression; for those that do not, zeros are returned.

Returns: the hessian sparsity pattern.
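pagmo expects one Hessian sparsity pattern per fitness component, each listing only lower-triangular entries (i, j) with j <= i. A minimal sketch of a pattern that is dense in the continuous part, assuming the n_eph ephemeral constants come first in the chromosome, is given below. The alias and the function name `dense_hessians_sparsity` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Local stand-in for pagmo::sparsity_pattern (a vector of index pairs).
using sparsity_pattern = std::vector<std::pair<std::size_t, std::size_t>>;

// One dense lower-triangular pattern per objective over the n_eph constants.
std::vector<sparsity_pattern> dense_hessians_sparsity(std::size_t n_eph, std::size_t n_obj = 1)
{
    assert(n_obj >= 1);
    sparsity_pattern lower;
    for (std::size_t i = 0; i < n_eph; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            lower.emplace_back(i, j); // only j <= i: the Hessian is symmetric
        }
    }
    return std::vector<sparsity_pattern>(n_obj, lower);
}
```

For n_eph constants each pattern holds n_eph * (n_eph + 1) / 2 entries.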

inline std::pair<pagmo::vector_double, pagmo::vector_double> get_bounds() const

Box-bounds.

Returns the box-bounds for this UDP.

Returns: the lower and upper bounds for each of the decision vector components.

inline pagmo::vector_double::size_type get_nix() const

Integer dimension.

Returns the integer dimension of the problem.

Returns: the integer dimension of the problem.
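In pagmo, the integer part of a decision vector is stored in its last components, and integer values are held as doubles. Under that convention, a chromosome of this UDP can be split into the ephemeral constants and the graph encoding as sketched below. `split_chromosome` is a hypothetical helper, and nix stands for the value returned by get_nix().

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Splits x into {continuous part, integer part}, assuming the last nix
// components of x are the integer genes (pagmo convention).
std::pair<std::vector<double>, std::vector<unsigned>>
split_chromosome(const std::vector<double> &x, std::size_t nix)
{
    assert(nix <= x.size());
    const std::size_t ncx = x.size() - nix; // size of the continuous part
    std::vector<double> constants(x.begin(), x.begin() + static_cast<std::ptrdiff_t>(ncx));
    std::vector<unsigned> genes;
    genes.reserve(nix);
    for (std::size_t i = ncx; i < x.size(); ++i) {
        genes.push_back(static_cast<unsigned>(x[i])); // integers are stored as doubles
    }
    return {std::move(constants), std::move(genes)};
}
```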

inline std::string get_name() const

Problem name.

Returns a string containing the problem name.

Returns: a string containing the problem name.

inline std::string get_extra_info() const

Extra info.

Returns: a string containing extra problem information.

inline std::string pretty(const pagmo::vector_double &x) const

Human-readable representation of a decision vector.

A human-readable representation of the chromosome is obtained by calling expression::operator() directly, using \(x_1, x_2, ...\) as input variable names and \(c_1, c_2, ...\) as ephemeral constant names.

Parameters: x – [in] a valid chromosome.
Returns: a string containing the mathematical expression represented by x.
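To show how an integer encoding can be turned into such a string, the sketch below decodes a deliberately simplified graph. It assumes arity 2, one output, genes laid out as [function, connection, connection] per node followed by one output gene, and function ids 0 to 3 standing for sum, diff, mul and pdiv. Both the layout and the name `pretty_sketch` are assumptions made for illustration; the real work is done by expression::operator().

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Decodes a simplified CGP chromosome into an infix string.
// symbols holds the names of the inputs (variables first, then constants).
std::string pretty_sketch(const std::vector<unsigned> &genes,
                          const std::vector<std::string> &symbols,
                          std::size_t n_nodes)
{
    static const char ops[] = {'+', '-', '*', '/'}; // sum, diff, mul, pdiv
    if (genes.size() != 3 * n_nodes + 1) {
        throw std::invalid_argument("Unexpected chromosome length.");
    }
    std::vector<std::string> expr = symbols; // expr[k] is the string for node id k
    for (std::size_t k = 0; k < n_nodes; ++k) {
        const unsigned f = genes[3 * k];
        const unsigned a = genes[3 * k + 1];
        const unsigned b = genes[3 * k + 2];
        // A node may only connect to inputs or to previously built nodes.
        if (f >= 4 || a >= expr.size() || b >= expr.size()) {
            throw std::invalid_argument("Malformed gene.");
        }
        expr.push_back("(" + expr[a] + ops[f] + expr[b] + ")");
    }
    const unsigned out = genes[3 * n_nodes];
    if (out >= expr.size()) {
        throw std::invalid_argument("Malformed output gene.");
    }
    return expr[out];
}
```

For example, with symbols {"x1", "c1"}, two nodes and genes {2, 0, 0, 0, 2, 1, 3}, the first node encodes x1 times x1, the second adds c1 to it, and the output gene selects the second node.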

inline std::string prettier(const pagmo::vector_double &x) const

Human-readable representation of a decision vector.

A human-readable representation of the chromosome is obtained by using symengine on the output of expression::operator(), using \(x_1, x_2, ...\) as input variable names and \(c_1, c_2, ...\) as ephemeral constant names.

Parameters: x – [in] a valid chromosome.
Returns: a string containing the mathematical expression represented by x.

inline const expression<double> &get_cgp() const

Gets the inner CGP.

Access to the inner CGP is offered in the public interface so that the evolve methods of UDAs can reuse the same object and perform mutations through it. This is a hack to interface pagmo with dCGP. The alternatives would be a friendship relation, or constructing a new CGP object within each evolve call, which seems expensive. This method is intended for use only in udas::evolve methods.

inline void set_cgp(const pagmo::vector_double &x) const

Sets the inner CGP.

Access to the inner CGP is offered in the public interface so that the evolve methods of UDAs can reuse the same object and perform mutations through it.

inline std::vector<double> predict(const std::vector<double> &point, pagmo::vector_double x) const

Model predictions.

Uses the model encoded in x to predict the label of point.

Parameters

point – [in] point to be predicted.
x – [in] chromosome encoding the model.

Returns

the predicted label for point.

inline std::vector<std::vector<double>> predict(const std::vector<std::vector<double>> &points, pagmo::vector_double x) const

Model predictions.

Uses the model encoded in x to predict the labels of points.

Parameters

points – [in] points to be predicted.
x – [in] chromosome encoding the model.

Returns

the predicted labels for points.
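The relation between the two overloads can be pictured as applying the single-point prediction to every point in turn. The sketch below shows that idea with a plain callable standing in for the model encoded in x. `point_model` and `predict_all` are hypothetical names, and the sketch does not claim to mirror the class internals.

```cpp
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for "the model encoded in x": maps one point to its label.
using point_model = std::function<std::vector<double>(const std::vector<double> &)>;

// Predicts the labels of all points by applying the model point by point.
std::vector<std::vector<double>> predict_all(const point_model &model,
                                             const std::vector<std::vector<double>> &points)
{
    if (!model) {
        throw std::invalid_argument("The model callable is empty.");
    }
    std::vector<std::vector<double>> labels;
    labels.reserve(points.size());
    for (const auto &p : points) {
        labels.push_back(model(p));
    }
    return labels;
}
```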

inline pagmo::thread_safety get_thread_safety() const

Thread safety for this UDP.

This is set to none, as pythonic kernels could be in the inner expression.

inline void set_phenotype_correction(typename expression<double>::pc_fun_type pc, typename expression<audi::gdual_v>::pc_fun_type dpc)

Sets the phenotype correction.

Sets the phenotype correction for both the internal cgp and dcgp. No checks are made and the user must take care that the two functions passed are actually computing the same quantity using different types (double and gdual_v).

Parameters

pc – callable to be applied to correct a double expression.
dpc – callable to be applied to correct a gdual_v expression.

inline void unset_phenotype_correction()

Unsets the phenotype correction.

template<typename Archive> inline void serialize(Archive &ar, unsigned)

Object serialization.

This method will save this object into, or load it from, the archive ar.

Parameters: ar – target archive.
Throws: unspecified – any exception thrown by the serialization of the expression and of primitive types.